# session4 by shuifanglj

```
TCOM 540

Session 4
Agenda
•   Review Session 2 assignment and Quiz
•   Economies of Scale
•   Traffic and Cost Generation Techniques
•   Case Study of Traffic Generation
Economies of Scale
• Highly important in telecommunications
– Big pipes often (but not always!) cheaper than
small ones per unit capacity
– Big pipes carry traffic more efficiently – lower
blocking/more effective capacity
Big Pipes are (Usually) Cheaper
per Unit capacity
• FTS2001 price for dedicated circuit from
Falls Church, VA to Englewood, CO

Circuit    Component charges (3 columns)    Total    Total per DS0
DS0          $40      $276       $59         $375        $375
4xDS0       $155      $798      $175        $1128        $282
T1          $155     $1787      $175        $2117         $88
3xDS1      $1813     $6025     $1245        $9083        $126
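
# Illustrative sketch (not part of the original slides): the last two columns
# of the price table can be re-derived from the first three. The channel
# counts (24 DS0s per T1/DS1, hence 72 per 3xDS1) are the standard ones and
# are an assumption here, as are all names in this snippet.
PRICES = {
    # circuit: (number of DS0 channels, three component charges in dollars)
    "DS0": (1, [40, 276, 59]),
    "4xDS0": (4, [155, 798, 175]),
    "T1": (24, [155, 1787, 175]),
    "3xDS1": (72, [1813, 6025, 1245]),
}

def total_cost(circuit):
    """Sum of the three component charges for one circuit type."""
    return sum(PRICES[circuit][1])

def cost_per_ds0(circuit):
    """Total charge divided by the number of DS0 channels carried."""
    channels, _ = PRICES[circuit]
    return total_cost(circuit) / channels

for name in PRICES:
    print(name, total_cost(name), round(cost_per_ds0(name)))
# The T1 is cheapest per DS0, and 3xDS1 costs more per DS0 than a single T1,
# which is the "usually, but not always" caveat in the slide title.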
Efficiency of Big Pipes
• Example: Max minutes per circuit for p = 0.03

Circuits       Minutes   Min/Ckt
1         250       250
2        2300      1150
3        5850      1950
4      10300       2575
5      15300       3060
6      20800       3467
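
# A hedged sketch of why per-circuit efficiency rises with group size. It
# ASSUMES the Erlang B blocking model, which the slide does not name, and it
# works in erlangs because the slide does not say how minutes were derived.
# Function names are hypothetical.
def erlang_b(circuits, load):
    """Blocking probability for `load` erlangs offered to `circuits` circuits."""
    blocking = 1.0
    for n in range(1, circuits + 1):
        blocking = load * blocking / (n + load * blocking)
    return blocking

def max_load(circuits, p=0.03):
    """Largest offered load (erlangs) with blocking at most p, by bisection."""
    low, high = 0.0, 10.0 * circuits
    for _ in range(60):
        mid = (low + high) / 2.0
        if erlang_b(circuits, mid) > p:
            high = mid
        else:
            low = mid
    return low

for n in range(1, 7):
    load = max_load(n)
    print(n, round(load, 3), round(load / n, 3))  # circuits, erlangs, erlangs/circuit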
Traffic Models
• Topics:
–   Uniform
–   Random
–   Population power
–   Modified population power
–   Normalized model
–   Asymmetric model
Uniform
• Traffic from site a to site b is
T(a,b) = C
• Not realistic for most situations
Random
• T(a,b) = R
– Where R is a random number generated on a
defined interval [Tmin, Tmax]
• This simple model is useful in some
applications
– WWW-type traffic
– As one component of a more complex model
Population Power
• If the sites a and b have populations Pa and
Pb, and are distance Da,b apart, then

T(a,b) = α * (Pa*Pb)^β / (Da,b)^γ

where α, β, γ are suitably chosen constants
Modified Population Power
• Large, close sites can dominate the simple
population power model
• Fails if D = 0
• Use offsets Doff and Poff

T(a,b) = α * (Pa*Pb + Poff)^β / (Da,b + Doff)^γ
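
# Minimal sketch of the modified population power formula above. The function
# name, parameter names, and default constants are hypothetical; with both
# offsets set to zero it reduces to the simple population power model.
def modified_population_power(pa, pb, dist, alpha=1.0, beta=1.0, gamma=1.0,
                              p_off=0.0, d_off=0.0):
    """Traffic estimate between two sites with populations pa and pb."""
    return alpha * (pa * pb + p_off) ** beta / (dist + d_off) ** gamma

# A distance offset keeps the model defined for co-located sites (D = 0).
print(modified_population_power(2.0, 3.0, 0.0, d_off=1.0))
# A larger gamma makes traffic fall off faster with distance.
print(modified_population_power(2.0, 3.0, 4.0, gamma=2.0))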
Normalization
• Choose α so as to give the desired level of
traffic on the network by matching
– Total traffic on network
– Traffic from each site (row normalization)
– Traffic to and from each site (row and column
normalization)
• Must have Σ traffic in = Σ traffic out for this to be
possible!
• Algorithmic iterative approach can be used
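
# One possible iterative approach, sketched under assumptions: alternately
# rescale rows and columns until both sets of totals match (often called
# iterative proportional fitting). The slides do not say which algorithm is
# meant, and the seed matrix and targets below are made-up illustration data.
def normalize(traffic, row_targets, col_targets, iterations=200):
    """Scale `traffic` so row sums and column sums approach the targets."""
    t = [row[:] for row in traffic]
    n = len(t)
    for _ in range(iterations):
        for i in range(n):                      # row normalization
            s = sum(t[i])
            if s > 0:
                t[i] = [v * row_targets[i] / s for v in t[i]]
        for j in range(n):                      # column normalization
            s = sum(t[i][j] for i in range(n))
            if s > 0:
                for i in range(n):
                    t[i][j] *= col_targets[j] / s
    return t

seed = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
rows_out = [30.0, 20.0, 50.0]   # traffic leaving each site
cols_in = [40.0, 30.0, 30.0]    # traffic arriving at each site (same total)
result = normalize(seed, rows_out, cols_in)
for row in result:
    print([round(v, 2) for v in row])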
Asymmetric Traffic
• Models considered so far are symmetric
T(a,b) = T(b,a)
• Real traffic is often not symmetric
– E.g., WWW access
• Introduce concept of Levels
– Each site is assigned to a level Li, i=1, … , n
• Matrix of multipliers M(Li, Lj)
Asymmetric Traffic (2)

• If M = ( 0 1 )
         ( 3 0 )
then traffic from a level 1 node to
a level 2 node will be one-third of the traffic
from a level 2 node to a level 1 node
• Revised model is then
T(a,b) = α * M(La, Lb) * (Pa*Pb + Poff)^β / (Da,b + Doff)^γ
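
# Small sketch of the level multiplier idea, using the example matrix M from
# the slide. The dictionary layout and function name are hypothetical; the
# symmetric part of the model is passed in as a plain number.
M = {
    (1, 1): 0.0, (1, 2): 1.0,
    (2, 1): 3.0, (2, 2): 0.0,
}

def asymmetric_traffic(level_from, level_to, symmetric_traffic):
    """Apply the level multiplier M(La, Lb) to a symmetric traffic estimate."""
    return M[(level_from, level_to)] * symmetric_traffic

up = asymmetric_traffic(1, 2, 6.0)     # level 1 node to level 2 node
down = asymmetric_traffic(2, 1, 6.0)   # level 2 node to level 1 node
print(up, down)                        # "up" is one-third of "down"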
More Complex Models
• Introduce a random element into the
previous model
• Superimpose multiple components
representing different types of traffic
• Redefine the distance function
– “Organizational distance”
Tariff Structures
• Fundamental distinction
– Fixed cost per month
• Private lines, PVCs, some internet access
• Cost may also depend on bandwidth, distance,
“quality”, …
– Usage based
• Switched pipes – e.g., switched voice
– Price may also depend on distance, bandwidth, …
• Data – per packet
Tariff Structures (2)
–   Initiation charge
–   Cancellation charge
–   Features charges
–   Access charges
Tariff Structures (3)
• Tariff structures are not simple
– Depend on level of competition, administrative
and other boundaries, other factors
– Best deals are not tariffed
• Usually competed/negotiated by large customers
Tariff Structures (4)
• In some cases (especially international),
tariffs may not exist, or may be for half-
circuit only
– I.e., to a notional mid-ocean meeting point
• Commercial tariff services (e.g., Valucom,
CCMI) are not cheap
Linear Distance-Based Charges
• Underlying tariff structure for many dedicated
circuits and PVCs is distance-based, e.g.,
Cost = a + b*distance
• Rates for individual location-pairs may vary
– Carrier may have excess capacity on certain routes =>
may be cheaper
– Carrier may have to buy capacity from others => may
be more expensive
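
# Hedged sketch of a linear distance-based cost with per-pair exceptions.
# The coefficients, site names, and override price are illustrative values,
# not a real tariff, and the function name is hypothetical.
OVERRIDES = {("NYC", "LON"): 5000.0}   # made-up monthly price for one pair

def monthly_cost(site_a, site_b, distance, fixed=80.0, per_mile=8.0):
    """Cost = a + b*distance unless an override exists for the pair."""
    pair = (site_a, site_b)
    if pair in OVERRIDES:
        return OVERRIDES[pair]
    if pair[::-1] in OVERRIDES:        # treat overrides as direction-free
        return OVERRIDES[pair[::-1]]
    return fixed + per_mile * distance

print(monthly_cost("A", "B", 100.0))        # linear rule applies
print(monthly_cost("LON", "NYC", 3500.0))   # exception replaces the linear rule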
Piecewise-Linear Charges

[Figure: cost per month plotted against distance]

• Each segment is linear
Step Function

[Figure: cost per month plotted against distance]
Cost Generators
• Cahn provides 5 cost generators
– 1. Linear (TARIFF-UNIVERSAL)
– 2. Piecewise linear (2 pieces) (TARIFF-UNIVERSAL)
– 3. Piecewise linear (limited international) (TARIFF-NATIONAL)
– 4. International half-circuit (TARIFF-HCKT)
– 5. Exceptions (TARIFF-OVERRIDE)
Case Study of
Traffic Generation
Outline
• Background and Problem Statement
– Current Academic Researchers in this Problem
•   Proposed Algorithm
•   Sample NetHealth Data Description
•   Numerical Example with Proposed Algorithm
•   O-D Matrix Tool Interface
•   O-D Matrix Tool Outputs
•   Next Steps
•   References
Background and Problem
Statement
• Background
– Client uses Concord’s NetHealth to monitor performance
– “What if” analyses on the network require the origin-destination
(O-D) traffic matrix that generated the measured link
utilization reported by NetHealth
– The O-D traffic is the matrix of offered loads that originate at one
node and are destined for another node [1]
• Problem:
– To estimate the O-D traffic matrix given aggregate link utilization

[1] Note: nodes are groups of users that enter a router on a common interface, not single users
Problem Background
• Vardi (1996) first used the term “network tomography” to
refer to this problem due to the similarity between network
inference and medical tomography [1].
• There are two forms of network tomography (our problem
is the 2nd) (Coates, 2001)
– Link level parameter estimation based on end-to-end, path level
traffic measurements, or
– Sender-receiver path-level traffic intensity estimation based on
link-level traffic measurements (antithesis of first form)

[1] Tomography: a method of producing a three-dimensional image of the internal structures
of a solid object (as the human body or the earth) by the observation and recording of
the differences in the effects on the passage of waves of energy impinging on those
structures (Merriam-Webster Dictionary).
Current Academic Researchers in this Problem
• Bell Labs: Cao, Cleveland, Lin, Sun,
Vander Wiel, Davis, Yu, Zhu
• UC Berkeley: Coates, Hero, Nowak, Yu
• Vardi (Rutgers)
Used
• Frequently a linear model is assumed for the O-D traffic matrix
estimation problem (Vardi (1996), Coates et al. (2001), Cao et al.
(2001), Cao et al. (2000))
y = Ax
where
y = (y_1, …, y_nL)’ is the observed column vector of incoming and
outgoing byte counts for each of nL links
x = (x_1, …, x_nP)’ is the unobserved vector of byte counts for all nP
O-D pairs in the network, where nP = n(n-1) in a network of n nodes
A = nL x nP routing matrix (0/1)
– Element a_ij of A is “1” if link i belongs to the directed path of
O-D pair j
Used
• Often the matrix A has a very large dimension
(thousands of rows and columns for a moderate
number of sites), and thus iterative algorithms are
used
• Although the model is linear, it is not a typical
linear regression because of the nonnegativity
constraints on the parameters x
• Also, because nP is typically larger than nL (more unknowns
than equations), identifiability of the parameters is a problem

[Figure: example network with four nodes a, b, c, d]

A matrix (rows: links; columns: O-D pairs)

                 1  2  3  4  5  6  7  8  9 10 11 12
Link            ab ac ad ba bc bd ca cb cd da db dc
1 (a to b)       1  0  0  0  0  0  0  0  0  0  0  0
2 (a to c)       0  1  1  0  0  1  0  0  0  0  0  0
3 (b to a)       0  0  0  1  0  1  1  0  0  1  0  0
4 (b to c)       0  0  0  0  1  0  0  0  0  0  0  0
5 (c to b)       0  0  0  0  0  0  1  1  0  1  1  0
6 (c to d)       0  0  1  0  0  1  0  0  1  0  0  0
7 (d to c)       0  0  0  0  0  0  0  0  0  1  1  1
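
# Sketch of the linear model y = Ax using the 7 x 12 routing matrix above.
# The O-D byte counts are made-up numbers chosen only to illustrate the
# identifiability problem: two different O-D vectors give identical link
# counts. Variable and function names are hypothetical.
OD_PAIRS = ["ab", "ac", "ad", "ba", "bc", "bd", "ca", "cb", "cd", "da", "db", "dc"]
A = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # link 1 (a to b)
    [0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0],  # link 2 (a to c)
    [0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0],  # link 3 (b to a)
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],  # link 4 (b to c)
    [0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0],  # link 5 (c to b)
    [0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0],  # link 6 (c to d)
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],  # link 7 (d to c)
]

def link_counts(routing, od_counts):
    """Compute y = Ax: the load each link carries for given O-D counts."""
    return [sum(a * x for a, x in zip(row, od_counts)) for row in routing]

x1 = [2] * 12
x2 = list(x1)
x2[OD_PAIRS.index("ad")] += 1   # one more unit routed a -> c -> d ...
x2[OD_PAIRS.index("ac")] -= 1   # ... offset on link a to c
x2[OD_PAIRS.index("cd")] -= 1   # ... and on link c to d
print(link_counts(A, x1))
print(link_counts(A, x2))       # same link counts from a different O-D vector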
• Vardi (1996) assumes the O-D byte counts are Poisson
– Maximum likelihood via the Expectation-Maximization [1] (EM)
algorithm is used to solve for O-D matrix
– O-D byte counts can be assumed Normal as an approximation to
Poisson; may allow simpler solution techniques
– A moment method for estimation is also proposed
• Cao et al. (2000) embellish the above Poisson model by
assuming all quantities are time-varying
– Maximum likelihood estimation is done via a combination of the
EM algorithm and a second-order global optimization method

[1] EM algorithm is an algorithm for finding likelihood estimators from incomplete data. It is
an iterative algorithm in which a starting estimate is updated using a transformation
called the EM operator. (Vanderbei and Iannone, 1994).
• Cao et al. (2001) describe a divide-and-conquer approach
that can be used for large networks
– O-D pairs are partitioned into a number of disjoint sets
• Clustering methods used to group the O-D pairs
– For each O-D set, a corresponding set of links are selected for
estimation (not disjoint)
• Heuristics used to select links, balancing estimation accuracy
and computational cost
– Parameters are estimated for each O-D set, which aggregates the
remaining rates for other O-D pairs not in the current set
– Computational complexity can be reduced from O(Ne^5) to O(Ne^2),
where Ne is the number of edge nodes
[Table: feature comparison of three methods; the rating symbol in each cell
was lost in extraction]
Methods compared:
– Vardi
– Cao et al. (2000) ("Time-Varying…")
– Cao et al. (2001) ("Scalable Method...")
Features rated:
– Scalability
– Computational Complexity
– Mathematical/Programming Complexity
– Robustness to model misspecification
– Usable w/ missing data
Rating scale:
– Exceeds requirements, highly desirable
– Meets requirements
– Does not meet requirements, less desirable
[1] This method can also be used with parallel computing (the cell carrying
this footnote is not recoverable)
Need for New Algorithm
• Academic methods are not feasible due to
number of nodes in Client’s network
(approximately 1000 nodes)
• Proposed algorithm is a heuristic; faster
Proposed Algorithm
• Compute the probability of originating at node i and
terminating at node j
– p(i,j) is the fraction of all (unidirectional) network traffic coming in
to node j, if j is within a certain number of hops of i
• Estimate the total load originating at each node as the
• Compute TM(i,j), the estimate of packets per second
originating at node i and terminating at node j, using
estimate of the total load originating at each node times the
probability p(i,j)
• Route traffic via Enhanced Interior Gateway Routing
Protocol (EIGRP)
Proposed Algorithm (2)
• Compute an adjustment factor based on the ratio of the
achieved
Sample NetHealth Data
Description
• Net data from January 2001
• Data
– 991 nodes
• Data fields for each link record:
– Originating and terminating nodes for each link
– Link utilizations in each direction
O-D Matrix Tool Interface
• Input Parameters:
– Max Hops: max number of hops between nodes for
nonzero traffic probability
– Link Factor: maximum deviation of estimated from
Numerical Example with
Proposed Algorithm
• Input Parameters:
– Number of Hops varied from 2 to 10
Numerical Example with
Proposed Algorithm (2)
• Results (run times on 866 MHz PC):
–   Problem with Backbone Links only runs quickly (1 min)
–   Full Network problem about 4 hours
–   Error in link utilizations under 1 percent for either problem
–   Small increase in run times with number of hops (e.g., 20%
increase as number of hops doubles from 4 to 8)

Problem            Error (%)   Data Setup (min)   Analysis (min)
Backbone Only        0.1%            0.1               1.1
Full Network         0.6%           26.5             229.4
O-D Matrix Tool Outputs
• Network Performance
– Expected packet delay
– Average number of hops
– Estimated and measured packets per second in each direction
– Expected packet delay
• Node Performance
– Originating packets per second
– Total packets per second in and out of each node
– Number connected to each node
Sample Map Output
NIPRNet: Network
Client CONUS
References
• Jin Cao, D. Davis, Scott Vander Wiel, Bin Yu, and Zhengyuan Zhu (2001).
“A Scalable Method for Estimating Network Traffic Matrices from
Link Counts”. Bell Labs Tech Report.
• Jin Cao, D. Davis, Scott Vander Wiel, and Bin Yu (2000). “Time-Varying
Network Tomography: Router Link Data”. Journal of the American
Statistical Association, 95, 1063-1075.
• Mark Coates, Alfred Hero, Robert Nowak, and Bin Yu (2001). “Large
scale inference and tomography for network monitoring and
diagnosis”. Technical Report 604, Stat Dept, UC Berkeley, August
2001.
• Y. Vardi (1996). “Network Tomography: Estimating Source-Destination
Traffic Intensities From Link Data”. Journal of the American
Statistical Association, Vol. 91, No. 433, Theory and Methods,
March 1996.
• R.J. Vanderbei and J. Iannone (1994). “An EM approach to OD matrix
estimation”. Technical Report, Princeton University.
Assignment Session 4
For the following set of sites and traffic, design a minimum-cost
network with reasonable performance:
Sites (V and H coordinates)

Site       V       H
A         156     998
B        1929    1537
C        2112     542
D        1526    1090
E        2727    1210

Traffic (kbps)

         A      B      C      D      E
A        -     250    250    400     -
B       250     -     250    350    200
C       250    250     -     450    150
D       400    350    450     -      -
E        -     200    150     -      -
Assignment Session 4 (2)
You have two types of links available for this design:

1.   Capacity 1.5 Mbps, cost $80 + $8/mile
2.   Capacity 64 kbps, cost $10 + $1/mile

You may use multiple links to satisfy demand.

Note: What is the maximum utilization you will allow per