      An Information-theoretic Approach to
      Network Measurement and Monitoring

      Yong Liu, Don Towsley, Tao Ye, Jean Bolot


q   motivation
q   background
q   flow-based network model
q   full packet trace compression
    § marginal/joint
q coarser granularity
    § netflow and SNMP
q future work

q network monitoring: sensing a network
    § traffic engineering, anomaly detection, …
    § single point vs. distributed
q different granularities
    § full traffic trace: packet headers
    § flow level record: timing, volume
    § summary statistics:
      byte/packet counts
q challenges
    § growing scales:
      high speed link, large topology
    § constrained resources:
      processing, storage, transmission
    § 30G headers/hour at UMass gateway
q solutions
    § sampling: temporal/spatial
    § compression: marginal/distributed


    q how much can we compress monitoring traces?
    q how much information is captured by different
      monitoring granularity?
       § packet trace/NetFlow/SNMP
    q how much joint information is there in traces from multiple
      monitors?
       § joint compression
       § trace aggregation
       § monitor placement

                 Our Contribution
    q flow-based network models
       § explore temporal/spatial correlation in network traces
       § projection to different granularity
    q information theoretic framework
       § entropy: bound/guideline on trace compression
       § quantitative approach for more general problems
    q validation against measurements from operational
      networks

               Entropy & Compression
    q Shannon entropy of discrete r.v.: H(X) = -\sum_{x} p(x) \log_2 p(x)

    q compression of i.i.d. symbols (length M) by coding
       § coding:
       § expected code length:

       § info. theoretic bound on compression ratio:

    q Shannon/Huffman coding
       § assign short codeword to frequent outcome
       § expected code length within 1 bit of the H(X) bound
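
The relation between entropy and code length can be checked numerically. The following is a minimal sketch, not the authors' code: the four-symbol distribution `probs` is a made-up example, and the comparison against a fixed-length encoding (`fixed_len`) is one plausible reading of "bound on compression ratio".

```python
import heapq
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: H(X) = -sum p(x) log2 p(x)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_code_lengths(probs):
    """Return the codeword length of each symbol in a binary Huffman code."""
    if len(probs) == 1:
        return {next(iter(probs)): 1}
    # Heap items: (probability, unique tie-breaker, symbols in this subtree).
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol below them.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

# Hypothetical distribution over four values of some header field.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
H = shannon_entropy(probs.values())
lengths = huffman_code_lengths(probs)
avg_len = sum(probs[s] * lengths[s] for s in probs)

# Fixed-length encoding needs ceil(log2 |alphabet|) bits per symbol.
fixed_len = math.ceil(math.log2(len(probs)))
bound_ratio = H / fixed_len
print(H, avg_len, bound_ratio)
```

For this dyadic distribution the Huffman code meets the entropy exactly; in general its expected length lies between H(X) and H(X) + 1 bits.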

                Entropy & Correlation
q joint entropy

q entropy rate of stochastic process
    §   exploit temporal correlation

    §   Lempel-Ziv Coding: (LZ77, gzip, winzip)
        asymptotically achieve the bound for stationary process
q joint entropy rate of correlated processes
    §   exploit spatial correlation

    §   Slepian-Wolf Coding: (distributed compression)
        encode each process individually, achieve joint entropy rate in limit
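
The formulas for this slide did not survive extraction. The standard textbook definitions of the three quantities named above are given here as a reminder; the notation (X_i and Y_i for the samples of two processes) is an assumption and may differ from the original slides.

```latex
\begin{align*}
H(X,Y) &= -\sum_{x,y} p(x,y)\,\log_2 p(x,y) \;\le\; H(X) + H(Y), \\
H(\mathcal{X}) &= \lim_{n\to\infty} \frac{1}{n}\, H(X_1,\dots,X_n), \\
H(\mathcal{X},\mathcal{Y}) &= \lim_{n\to\infty} \frac{1}{n}\, H(X_1,\dots,X_n,\,Y_1,\dots,Y_n).
\end{align*}
```

Equality in the first line holds only when X and Y are independent, which is why correlated traces leave room for joint compression. The limits exist for stationary processes.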

           Network Trace Compression
    q naïve way: treat as byte stream, compress by generic tools
       §   gzip compresses UMass traces by a factor of 2
    q network traces are highly structured data
       §   multiple fields per packet
           • diversity in information richness
           • correlation among fields
       §   multiple packets per flow
           •   packets within a flow share information
           •   temporal correlation
       §   multiple monitors traversed by a flow
           •   most fields unchanged within the network
           •   spatial correlation
    q network models
       § explore correlation structure
       § quantify information content of network traces
       § serve as lower bounds/guidelines for compression algorithms
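
The naive byte-stream baseline can be sketched with Python's `zlib` module, which implements DEFLATE, the same algorithm gzip uses. The trace below is synthetic (50 random flow IDs, random packet spacing) and the 20-byte record layout is an assumption for illustration, so the resulting ratio says nothing about the UMass traces.

```python
import random
import struct
import zlib

random.seed(0)

# Synthetic stand-in for a packet header trace (NOT real measurement data).
# Each flow ID is (src_ip, dst_ip, src_port, dst_port).
flows = [
    (random.getrandbits(32), random.getrandbits(32),
     random.randrange(1024, 65536), 80)
    for _ in range(50)
]

records = []
t = 0
for _ in range(5000):
    t += random.randrange(1, 2000)             # microseconds between packets
    src, dst, sport, dport = random.choice(flows)
    # 8-byte timestamp + 4 + 4 + 2 + 2 bytes of flow ID = 20 bytes per packet.
    records.append(struct.pack("!QIIHH", t, src, dst, sport, dport))

raw = b"".join(records)
compressed = zlib.compress(raw, 9)             # treat the trace as a byte stream
ratio = len(compressed) / len(raw)
print(f"{len(raw)} bytes -> {len(compressed)} bytes, ratio {ratio:.3f}")
```

A generic compressor only sees repeated byte patterns; it does not know that records belong to flows, which is the structure the models in this talk exploit.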

                  Packet Header Trace
 bit:         0                        16                           31
 Timing       time stamp (sec.)
              time stamp (sub-sec.)
 IP Header    vers. | HLen | ToS | total length
              IPID | flags | fragment offset
              TTL | protocol | header checksum
              source IP address
              destination IP address
 TCP Header   source port | destination port
              data sequence number
              acknowledgment number
              HLen | TCP flags | window size
              checksum | urgent pointer

                  Header Field Entropy
     q same header layout as on the Packet Header Trace slide, with two groups of fields marked
        § Timing: time stamp (sec.), time stamp (sub-sec.)  =>  time
        § source/destination IP address, source/destination port  =>  flow id

            Single Point Packet Trace

      T0   F0   T1    F1   T3       F0                 Tm   F0       Tn    Fn

     q packet inter-arrival:
        # bits per packet:
     q temporal correlation introduced by flows
        § packets from same flow closely spaced in time
        § they share header information

     q flow based trace:       T0    F0      T3   F0               Tm     F0

     q flow record:            F0        K   T0   (packet inter-arrivals)
        § F0: flow ID
        § K: flow size
        § T0: flow arrival time
        § remaining entries: packet inter-arrival times within the flow
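
The step from a packet trace to flow records can be sketched as follows. This is an illustrative sketch, not the authors' tool: the function name `to_flow_records` and the tuple layout are assumptions, and the input is assumed to be sorted by timestamp.

```python
def to_flow_records(packets):
    """Group (timestamp, flow_id) packets into flow records.

    Each record is (flow_id, K, T0, gaps): the flow ID, the number of
    packets K, the arrival time T0 of the first packet, and the K-1
    packet inter-arrival times within the flow.
    """
    times = {}
    for ts, fid in packets:
        times.setdefault(fid, []).append(ts)
    records = []
    for fid, ts_list in times.items():         # dicts keep insertion order
        gaps = [b - a for a, b in zip(ts_list, ts_list[1:])]
        records.append((fid, len(ts_list), ts_list[0], gaps))
    return records

# Toy trace: two interleaved flows.
packets = [(0, "F0"), (2, "F1"), (5, "F0"), (9, "F0"), (11, "F1")]
records = to_flow_records(packets)
print(records)
```

The flow ID is stored once per flow instead of once per packet, and the small inter-arrival values replace full timestamps.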

                     Network Models
     q flow-based model
       § flow arrivals follow Poisson with rate
       § flows are classified to independent flow classes
         according to routing (the set of routers traversed)
       § flow i is described by:
          • flow inter-arrival time:
          • flow ID:
          • flow length:
          • packet inter-arrival time within the flow:
       § packet arrival stochastic process:
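
A toy simulation of the flow-based model may help make the four per-flow quantities concrete. Only the Poisson flow arrivals come from the slide; the geometric flow length, the exponential within-flow gaps, and all parameter names are assumptions chosen for brevity.

```python
import random

def simulate_packet_trace(rate, mean_len, mean_gap, n_flows, seed=0):
    """Generate a (timestamp, flow_id) packet trace from a toy flow model."""
    rng = random.Random(seed)
    packets = []
    t_flow = 0.0
    for fid in range(n_flows):
        # Poisson flow arrivals: exponential flow inter-arrival times.
        t_flow += rng.expovariate(rate)
        # Flow length K >= 1; the geometric distribution is an assumption.
        k = 1
        while rng.random() > 1.0 / mean_len:
            k += 1
        t = t_flow
        packets.append((t, fid))
        for _ in range(k - 1):
            # Within-flow packet inter-arrival; exponential is an assumption.
            t += rng.expovariate(1.0 / mean_gap)
            packets.append((t, fid))
    packets.sort()                              # interleave flows in time
    return packets

trace = simulate_packet_trace(rate=10.0, mean_len=5, mean_gap=0.01, n_flows=200)
print(len(trace), "packets from", len({fid for _, fid in trace}), "flows")
```

The sorted output is the packet arrival process seen by a single monitor: packets of many flows interleaved in time.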

          Entropy in Flow Record

 q # bits per flow:

 q # bits per second:
 q marginal compression ratio

 q determined by flow length (pkts.) and
  variability in pkt. inter-arrival.
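
The formulas on this slide were lost in extraction. One way to write them, assuming the four components of a flow record are mutually independent and the within-flow packet inter-arrival times are i.i.d., is sketched below. All symbols are hypothetical: \Delta is the flow inter-arrival time, F the flow ID, K the flow length in packets, \tau a within-flow packet inter-arrival time (quantized to the timestamp resolution), \lambda the flow arrival rate, and b the number of uncompressed bits per packet record.

```latex
\begin{align*}
H(\text{flow record}) &\approx H(\Delta) + H(F) + H(K) + \bigl(E[K]-1\bigr)\,H(\tau)
  && \text{(bits per flow)} \\
R &= \lambda \, H(\text{flow record})
  && \text{(bits per second)} \\
\rho &= \frac{H(\text{flow record})}{E[K]\; b}
  && \text{(marginal compression ratio)}
\end{align*}
```

Under these assumptions the per-flow terms are amortized over E[K] packets, so for long flows the ratio approaches H(\tau)/b. This matches the remark above that the ratio is determined by flow length and by the variability of packet inter-arrival times.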

     Single Point Compression: Results

          Trace          H (total)      Model Ratio     Compression Algorithm
          C1-in          706.3772         0.2002              0.6425
          BB1-out        736.1722         0.2139              0.6574
          BB2-out        689.9066         0.2186              0.6657

 q compression ratio lower bound calculated from entropy (about 0.20 to 0.22)
  is much lower than the ratio achieved by the real compression algorithm
  (about 0.64 to 0.67)
 q why the real compression algorithm differs
     § records IPID, packet size, TCP/UDP fields
     § fixed packet buffer for each flow => many flow records for long flows

     Distributed Network Monitoring
 q single flow recorded by multiple
   monitors
 q spatial correlation:
   traces collected at distributed
   monitors are correlated
 q marginal node view:
   #bits/sec to represent flows seen
   by one node, bound on single point
   compression
 q network system view:
   #bits/sec to represent flows crossing
   the network, bound on joint
   compression
 q joint compression ratio:
   quantify gain of joint compression

            Baseline Joint Entropy Model
     q “perfect” network
        § fixed routes/constant link delay/no packet loss
     q flow classes based on routes
        § flows arrive with rate:
        § # of monitors traversed:
        § #bits per flow record:
     q info. rate at node v:

     q network view info. rate:

     q joint compression ratio:

     q dependence on # of monitors traversed
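
The baseline model's formulas are missing from the extracted text. A minimal sketch of one reading follows: in a "perfect" network a flow record is identical at every monitor it crosses, so the network view needs it once while the marginal views store it once per monitor. The function name and the example flow classes are hypothetical.

```python
def joint_compression_ratio(classes):
    """Ratio of network-view info rate to the sum of per-node info rates.

    classes: list of (rate, n_monitors, bits_per_record) per flow class,
    i.e. flows/sec, number of monitors traversed, bits per flow record.
    """
    # Network view: each flow record needs to be described only once.
    network_rate = sum(lam * bits for lam, n, bits in classes)
    # Marginal views: the record is stored at every monitor the flow crosses.
    marginal_sum = sum(lam * n * bits for lam, n, bits in classes)
    return network_rate / marginal_sum

# Hypothetical flow classes, not taken from the measured traces.
classes = [(100.0, 1, 700.0), (50.0, 2, 700.0), (25.0, 4, 700.0)]
ratio = joint_compression_ratio(classes)
print(round(ratio, 4))

# If every flow crosses exactly two monitors the ratio is 1/2.
two_hop = joint_compression_ratio([(10.0, 2, 500.0), (3.0, 2, 800.0)])
print(two_hop)
```

The more monitors a typical flow traverses, the smaller the ratio and the larger the gain from joint compression.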
        Joint Compression: Results

              Set of Traces             Joint Compression Ratio
     {C1-in, BB1-out, C2-in, BB2-out}             0.5
            {C1-in, BB1-out}                    0.8649
            {C1-in, BB2-out}                    0.8702
            {C2-in, BB1-out}                    0.7125
            {C2-in, BB2-out}                    0.6679

           Coarser Granularity Models
     q NetFlow model
       § similar to flow model:
       § joint compression result similar to full trace
     q SNMP model
       § any link SNMP rate process is sum of rate processes of all
         flow classes passing through that link
       § traffic rates of flow classes are independent Gaussian
       § entropy can be calculated by covariance of these processes
       § information loss due to summation

       § small joint information between monitors
       § difficult to recover rates of flow classes from SNMP data
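
For the Gaussian SNMP model, the joint information between two links can be computed in closed form. The sketch below assumes a hypothetical two-link topology: link X carries class A plus a shared class S, link Y carries class B plus S, and A, B, S are independent Gaussians. The variances are made-up numbers.

```python
import math

def gaussian_entropy_bits(var):
    """Differential entropy (bits) of a 1-D Gaussian with variance var."""
    return 0.5 * math.log2(2 * math.pi * math.e * var)

def joint_entropy_bits(var_x, var_y, cov):
    """Differential entropy (bits) of a 2-D Gaussian from its covariance."""
    det = var_x * var_y - cov ** 2
    return 0.5 * math.log2((2 * math.pi * math.e) ** 2 * det)

def link_mutual_information(var_a, var_b, var_shared):
    """Mutual information (bits) between link rates X = A + S and Y = B + S."""
    var_x = var_a + var_shared
    var_y = var_b + var_shared
    # Only the shared class S contributes to the covariance of X and Y.
    rho_sq = var_shared ** 2 / (var_x * var_y)
    return -0.5 * math.log2(1.0 - rho_sq)

# Shared traffic is a small part of each link's variability.
mi_small = link_mutual_information(var_a=9.0, var_b=9.0, var_shared=1.0)
# No shared flow class at all.
mi_none = link_mutual_information(var_a=9.0, var_b=9.0, var_shared=0.0)
print(mi_small, mi_none)
```

When the shared class is only a small part of each link's aggregate, the mutual information is a tiny fraction of a bit, which illustrates how summation can hide the common flows.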

     Joint Compression Ratio of Different

       Set of Traces    SNMP     NetFlow   Packet Trace
     {C1-in, BB1-out}   1.0021   0.8597      0.8649
     {C1-in, BB2-out}   0.9997   0.8782      0.8702

 q information-theoretic bound on marginal
  compression ratio: about 20% (time + flow id;
  even lower if other low-entropy fields are included)
 q marginal compression ratio is high (not very
  compressible) in SNMP, lower in NetFlow, and
  lowest in full trace
 q joint coding is much more useful/necessary
  in the full trace case than in SNMP
 q “More entropy for your buck”

                   Future Work

 q network impairments
     § how many more bits for delay/loss/route change
 q   model netflow with sampling
 q   distributed compression algorithms
 q   lossless vs. lossy compression
 q   entropy based monitor placement
     § maximize information under constraints


