					Network Anomaly Diagnosis

        Status Report
•   Network performance metrics
•   Active monitoring tools
•   Preparation of canonical dataset
•   Anomaly detection
•   Anomaly diagnosis
 Network performance metrics
• One-Way Delay (OWD)
   –   Serialization Delay
   –   Propagation Delay
   –   Queuing Delay
   –   Forwarding Delay
• Round-Trip Time (RTT)
• Delay Variation (Jitter)
• Packet Loss
   – Congestion
   – Errors
• Packet Reordering
• Maximum Transmission Unit (MTU)
• Bandwidth Delay Product (BDP)
        One-Way Delay (OWD)
• The time it takes for a packet to reach its end-to-
  end destination
• It can be broken down into:
   – per-hop one-way delays, and these in turn into:
      • per-link and
      • per-node delay components
• The per-link component of one-way delay consists
  of two sub-components: propagation delay and
  serialization delay.
• The per-node component of one-way delay also
  consists of two sub- components: forwarding
  delay and queuing delay.
          Serialization Delay
• It is the time taken to break a packet into
  sequential link transmission units (bits).
• It is obtained by dividing the packet size (in
  bits) by the capacity of the link (in bits per
  second).
• Nowadays, as links increasingly have
  higher bit rates, serialization delay is less
  significant.
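As an illustration, serialization delay follows directly from the definition above; the packet size and link rates below are arbitrary example values.

```python
def serialization_delay(packet_bytes, link_bps):
    """Time (in seconds) to clock all of a packet's bits onto the link."""
    return packet_bytes * 8 / link_bps

# A 1500-byte packet (example size) on two link speeds:
fast_ethernet = serialization_delay(1500, 100e6)  # 100 Mb/s -> 120 microseconds
ten_gig = serialization_delay(1500, 10e9)         # 10 Gb/s  -> 1.2 microseconds
```

The hundredfold rate increase shrinks the delay by the same factor, which is why it matters less on modern links.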
           Propagation Delay
• Propagation delay is the duration of time for
  signals to move from the transmitting to the
  receiving end of a link.
• On simple links, this is the product of:
  – the link's physical length and
  – the characteristic propagation speed of the medium.
• On high-speed wide-area network (WAN)
  paths, delay is usually dominated by
  propagation times.
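A minimal sketch of the product above, assuming signals in fibre propagate at roughly 2×10^8 m/s (about two-thirds of c; the 6000 km length is an example value):

```python
SPEED_IN_FIBRE = 2.0e8  # m/s, an assumed figure (~2/3 of c in vacuum)

def propagation_delay(length_m, speed=SPEED_IN_FIBRE):
    """One-way propagation delay: physical length / propagation speed."""
    return length_m / speed

# A transatlantic-scale 6000 km link takes ~30 ms one way,
# dwarfing the per-packet serialization delay on a fast link.
transatlantic = propagation_delay(6_000_000)
```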
             Queuing Delay
• Queuing delay is defined as the time a
  packet has to spend inside a node such as a
  router while waiting for availability of the
  output link.
• It depends on
  – the amount of traffic competing to send packets
    towards the output link and
  – on the priority of the packet.
           Forwarding Delay
• Processing at the node
  – Reading forwarding-relevant information
     • Dest address + other headers
  – Forwarding decision
     • Based on the routing table
    Round-Trip Time (RTT)
• It is the sum of the one-way delays from the
  source to the destination and back, plus the time
  the destination takes to formulate its response.
• Large RTT values can cause problems for
  TCP and other window-based transport
  protocols.
• The round-trip time influences the
  achievable throughput, as there can only be
  a window's worth of unacknowledged data
  in the network.
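The window limit above can be sketched as follows; the 64 KiB window and 100 ms RTT are assumed example figures:

```python
def max_throughput(window_bytes, rtt_s):
    """Upper bound on throughput for a window-based protocol:
    at most one window of unacknowledged data per round trip."""
    return window_bytes / rtt_s          # bytes per second

# Hypothetical figures: a 64 KiB window and a 100 ms RTT cap throughput
# at about 5.2 Mb/s, no matter how fast the links themselves are.
bps = max_throughput(64 * 1024, 0.1) * 8
```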
       Delay Variation (Jitter)
• OWD is not constant on a real network
  because of competing traffic and contention
  for processing resources.
• The difference between a given packet's
  actual and average OWD is termed 'delay
  variation' or jitter.
• It only compares the delays experienced by
  packets of equal size, as OWD depends on
  packet size because of serialization delay.
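A small illustration of delay variation as deviation from the average OWD; the sample values (in ms, for equal-size packets) are made up:

```python
def jitter(owd_samples_ms):
    """Delay variation: each packet's deviation from the mean OWD.
    Samples are assumed to come from packets of equal size."""
    mean = sum(owd_samples_ms) / len(owd_samples_ms)
    return [d - mean for d in owd_samples_ms]

# Four OWD samples (ms); the mean is 12.0, so deviations sum to zero.
deviations = jitter([10.0, 12.0, 11.0, 15.0])
```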
                Packet Loss
• Packet loss is determined as the probability of a
  packet being lost in transit from a source A to a
  destination B.
• Applications requiring reliable transmission,
  e.g. bulk data transfers, use retransmission,
  which reduces performance.
• In addition, congestion-sensitive protocols such as
  standard TCP assume that packet loss is due to
  congestion, and respond by reducing their
  transmission rate accordingly.
• Congestion and errors are the two main reasons
  for packet loss.
• When the offered load exceeds the capacity of a
  part of the network, packets are buffered in
  queues.
• Since these buffers are also of limited capacity,
  congestion can lead to queue overflows, which
  leads to packet losses.
• Congestion can be caused by
   – moderate overload condition maintained for an
     extended amount of time or
   – by the sudden arrival of a very large amount of traffic
     (traffic burst).
• Another reason for loss of packets is
  corruption, where parts of the packet are
  modified in-transit.
• When such corruption happens on a link
  (due to noisy lines etc.), it is usually
  detected by a link-layer checksum at the
  receiving end, which then discards the
  packet.
            Packet Reordering
• The Internet Protocol (IP) does not guarantee the
  packet ordering.
• Packet reordering concerns packets of different
  sizes:
   – larger packets take longer to transfer and may be
     overtaken by smaller packets in transit.
• It can be measured by:
   – injecting the same traffic pattern via traffic generator
     and calculating the reordering.
   – To measure maximal reordering: send a short burst
     of long packets immediately followed by a short
     burst of short packets.
 Maximum Transmission Unit (MTU)
• It describes the maximum size of an IP
  packet that can be transferred over the link
  without fragmentation.
• Common MTU sizes are:
  – 1500 bytes (Ethernet, 802.11 WLAN)
  – 4470 bytes (FDDI, common default for POS
    and serial links)
  – 9000 bytes (Internet2 and GÉANT convention,
    limit of some Gigabit Ethernet adapters)
  – 9180 bytes (ATM, SMDS)
 Bandwidth Delay Product (BDP)
• The Bandwidth Delay Product (BDP) of an end-to-
  end path is the product of the bottleneck bandwidth
  and the delay of the path.
• It is often useful to think of BDP as the "memory
  capacity" of a path, i.e. the amount of data that fits
  entirely into the path between two end-systems.
• This relates to throughput, which is the rate at
  which data is sent and received.
• Network paths with a large BDP are called Long
  Fat Networks or LFNs.
• BDP is an important parameter for the
  performance of window-based protocols such as
  TCP.
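The BDP definition above can be sketched directly; the 1 Gb/s bottleneck and 100 ms delay are assumed example figures:

```python
def bdp_bytes(bottleneck_bps, rtt_s):
    """Bandwidth-delay product: the amount of data (bytes) that fits
    entirely 'in flight' inside the path."""
    return bottleneck_bps * rtt_s / 8

# 1 Gb/s bottleneck, 100 ms delay -> 12.5 MB; a TCP window smaller
# than this cannot keep such a Long Fat Network full.
window_needed = bdp_bytes(1e9, 0.1)
```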
         Available bandwidth
• Dynamic bandwidth capacity (Cap) is estimated
  by analyzing the minimum inter-packet delay;
• the cross-traffic (Xtr) by analyzing the
  inter-packet delay variability;
• and the available bandwidth as
  – Abw = Cap – Xtr
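The Abw = Cap – Xtr relation in a minimal sketch (the capacity and cross-traffic figures are assumed):

```python
def available_bandwidth(capacity_bps, cross_traffic_bps):
    """Abw = Cap - Xtr, clamped at zero when cross-traffic saturates the link."""
    return max(0.0, capacity_bps - cross_traffic_bps)

# Hypothetical 100 Mb/s capacity carrying 35 Mb/s of cross-traffic
# leaves 65 Mb/s available to new flows.
abw = available_bandwidth(100e6, 35e6)
```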
           Active Probing Tools
• Some publicly available active network monitoring
  tools are:
• Throughput & Delay Measurement Tools
   –   Ping
   –   Traceroute
   –   Iperf
   –   Thrulay
• Path Characterization & Bandwidth Estimation
   –   pathChirp
   –   Pathload
   –   ABwE
   –   Netperf
   –   Nettest
                     Ping
• Uses the ICMP echo mechanism
• Measures
   – RTT
   – Packet loss
   – Reachability
• Source and destination use timestamps to calculate
  the RTT.
• Packet loss can be measured by sending a set of
  packets from a source to a destination and
  comparing the number of received packets against
  the number of packets sent.
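A minimal sketch of this counting method; the 6-sent/5-received figures are illustrative:

```python
def loss_ratio(sent, received):
    """Fraction of probe packets lost between source A and destination B."""
    return (sent - received) / sent

# 6 probes sent, 5 received -> ~16.7% loss,
# the kind of figure ping reports in its summary line.
loss = loss_ratio(6, 5)
```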
                       Ping example
                                         Remote host
  Repeat count           Packet size

syrup:/home$ ping -c 6 -s 64
PING ( 64 data bytes
72 bytes from icmp_seq=0 ttl=240 time=641.8 ms
72 bytes from icmp_seq=2 ttl=240 time=1072.7 ms  Missing seq #1
72 bytes from icmp_seq=3 ttl=240 time=1447.4 ms
72 bytes from icmp_seq=4 ttl=240 time=758.5 ms
72 bytes from icmp_seq=5 ttl=240 time=482.1 ms
--- ping statistics ---
6 packets transmitted, 5 packets received, 16% packet loss
round-trip min/avg/max = 482.1/880.5/1447.4 ms

                  Traceroute
• Determines the route a packet takes through the
  Internet to reach its destination, i.e. the number of
  "hops" it takes.
• It sends "probe" packets with TTL values
  incrementing from one, and uses ICMP "Time
  Exceeded" messages to detect "hops" on the way
  to the specified destination.
• It also records "response" times for each hop, and
  displays losses and other types of failure.
• Used to measure
   – hop-by-hop connectivity and
   – RTT
• UDP/ICMP tool to show route packets take from local to
  remote host  Max hops
                                   Remote host
cottrell@flora06:~> traceroute -q 1 -m 20
traceroute to (, 20 hops max, 40 byte packets
 1 RTR-CORE1.SLAC.Stanford.EDU ( 0.642 ms
 2 RTR-MSFC-DMZ.SLAC.Stanford.EDU ( 0.616 ms
 3 ESNET-A-GATEWAY.SLAC.Stanford.EDU ( 0.716 ms
 4 ( 1.377 ms
 5 ( 75.536 ms
 6 ( 80.629 ms
 7 ( 154.742 ms
 8 ( 137.403 ms
 9 ( 135.850 ms
10 ( 128.648 ms
11 ( 762.150 ms
12 ( 751.851 ms
13 *                      No response: lost packet, or router ignores probes
14 ( 827.301 ms
                      Iperf
• It reports bandwidth, delay jitter and datagram loss.
• It has two modes
   – TCP
   – UDP
• In TCP mode it is used to measure the bandwidth.
• It reports MSS (Maximum Segment Size) and the
  observed read sizes. It supports TCP window size
  adjustment via socket buffers.
• In UDP mode, packet loss and delay jitter can be
  measured.

                     Thrulay
• Thrulay stands for THRUput and deLAY.
• It measures the capacity of a network by
  sending a bulk TCP stream over it.
• One-way delay can be measured by sending
  a Poisson stream (arrivals at random) of
  very precisely positioned UDP packets.
                    Pathload
• Pathload is based on the technique of Self-Loading
  Periodic Streams (SLoPS) for measuring available
  bandwidth.
• A periodic stream in SLoPS consists of K packets of size
  L, sent to the path at a constant rate R.
• If the stream rate R is higher than the available bandwidth,
  the one-way delays of successive packets at the receiver
  show an increasing trend.
• In Pathload, the sender uses UDP for the packet
  stream and a TCP connection for controlling
  measurements.
• Sender timestamps each packet upon transmission, which
  provides relative one-way delay at the receiver side.
                    pathChirp
• pathChirp is used to estimate unused
  capacity, or available bandwidth.
• It is based on the concept of self-induced
  congestion:
  – Probing rate < available bandwidth → no delay increase
  – Probing rate > available bandwidth → delay increases
• Unique to pathChirp is an exponentially
  spaced chirp probing train.
                      ABwE
• ABwE is a lightweight available bandwidth
  measurement tool based on the packet-pair
  dispersion technique.
• It uses fixed size packets with specific delay
  between each packet pair.
• The observed packet pair delay is converted into
  available bandwidth calculations.
• The tool can be used in continuous mode and
  detects all substantial bandwidth changes caused
  by improper routing or by congestion.
                     Netperf
• Netperf is a benchmark that can be used to
  measure the performance of many different types
  of networking.
• It provides tests for both unidirectional
  throughput, and end-to-end latency.
• Essentially, these tests will measure how fast one
  system can send data to another and/or how fast
  that other system can receive it.
• Request/response performance is the second area
  that can be investigated with netperf.
                 Selection of tools
• Shriram et al. conclude their comparison of public end-to-end
  bandwidth estimation tools on high-speed links with the
  following remark:
   – "Pathload and pathChirp are the most accurate. Iperf requires
     maximum buffer size and is sensitive to small packet loss."
• Ribeiro et al., in comparisons of pathChirp with Pathload
  and TOPP, clearly show that pathChirp performs far better
  than the other two tools in single-hop and multi-hop scenarios,
  and on the real Internet as well.
• Cottrell et al. state: "The ping utility does not require any extra
  installation/deployment effort and has become a de-facto
  standard for measuring RTT and packet loss."
• The tools selected are pathChirp, PingER, Thrulay and
  Traceroute.
                   References
• Cottrell L., Internet Monitoring. Presented at NUST
  Institute of Information Technology (NIIT), Rawalpindi,
  Pakistan, March 15, 2005.
• Cottrell L., Matthews W. and Logg C., Tutorial on Internet
  Monitoring & PingER at SLAC
• Jacobson V., "pathchar — a tool to infer characteristics of
  Internet paths".
• Ribeiro V., Riedi R., Baraniuk R., Navratil J., Cottrell L.
  pathChirp: Efficient Available Bandwidth Estimation for
  Network Paths.
• Jain M., Dovrolis C., End-to-End Available Bandwidth:
  Measurement Methodology, Dynamics, and Relation with
  TCP Throughput. ACM SIGCOMM 2002.
• Navratil J. and. Cottrell L., ABwE: A Practical Approach
  to Available Bandwidth Estimation.
•   Beginner's Guide to Network-Distributed Resource Usage
•   Cottrell, L., Logg, C.: Overview of IEPM-BW Bandwidth Testing of Bulk
    Data Transfer. In: Sc2002: High Performance Networking and Computing.
•   Shriram A., Murray M., Hyun Y., Brownlee N., Broido A., Fomenkov M.,
    claffy k., "Comparison of Public End-to-End Bandwidth Estimation Tools on
    High-Speed Links".
•   Prasad R. S., Dovrolis C., "Capacity estimation using packet dispersion
    techniques: recent progress and open issues". Presentation.
•   Labit Y., Owezarski P., Larrieu N., Evaluation of Active Measurement Tools
    for Bandwidth Estimation in Real Environment.
•   Dovrolis C., Jain M., "End-to-end available bandwidth estimation using
    Pathload". Presentation.
•   claffy k., Dovrolis C., "Bandwidth estimation: measurement methodologies
    and applications". Presentation.
•   GÉANT2 Performance Enhancement and Response Team (PERT) User Guide
    and Best Practice Guide
    Anomaly Detection Techniques
•   Exponential weighted moving average (EWMA)
•   Holt-Winters forecasting algorithm
•   Plateau Algorithm
•   Principal Component Analysis (PCA)
•   ARIMA-based Box-Jenkins forecasting models
•   Signal analysis techniques, e.g. the wavelet
    analysis scheme
                      EWMA
• EWMA is a forecasting-based technique.
• EWMA predicts the next value in a given
  time series based on recent history.
• Specifically, if z_t denotes the traffic at time t,
  then the EWMA prediction ẑ_{t+1} for time t+1 is
  given by:

     ẑ_{t+1} = α·z_t + (1 − α)·ẑ_t,   0 ≤ α ≤ 1

• Anomalies can then be quantified by taking
  the difference between the forecasted and the
  actual value, i.e., e_t = |z_t − ẑ_t|.
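A sketch of the EWMA forecast and anomaly score described above; the smoothing parameter (α = 0.3) and the sample series are assumptions:

```python
def ewma_anomaly_scores(series, alpha=0.3):
    """Forecast each point as z_hat[t+1] = alpha*z[t] + (1-alpha)*z_hat[t]
    and return |z[t] - z_hat[t]| as the anomaly score for each point."""
    z_hat = series[0]                 # seed the forecast with the first sample
    scores = []
    for z in series:
        scores.append(abs(z - z_hat))            # deviation from forecast
        z_hat = alpha * z + (1 - alpha) * z_hat  # update the prediction
    return scores

# The spike at 30 stands far above the forecast built from recent history.
scores = ewma_anomaly_scores([10, 11, 10, 12, 30, 11])
```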
   Holt-Winters (HW) Algorithm
• Time series of service network variables exhibit the
  following regularities:
   – A trend over time (e.g., a gradual increase in application
     daemon requests over a two-month period due to
     increased subscriber load).
   – A seasonal trend or cycle (e.g., every day, bytes per
     second increase in the morning hours, peak in the
     afternoon and decline late at night).
   – Seasonal variability (e.g., application requests fluctuate
     wildly minute by minute during the peak hours of 4–8
     pm, but at 1 am application requests hardly vary at all).
     Holt-Winters (HW) Algorithm
• The Holt-Winters (HW) algorithm uses a triple
  Exponential Weighted Moving Average (EWMA)
  approximation to characterize the time series behavior
  as a superposition of three components: a baseline, a
  linear trend and a seasonal effect.
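A sketch of this triple-EWMA decomposition, using an additive seasonal model; the smoothing parameters (alpha, beta, gamma) and the sample series are assumptions, as the slide does not fix them:

```python
def holt_winters(series, period, alpha=0.5, beta=0.1, gamma=0.1):
    """Maintain three EWMA-updated components -- baseline (level),
    linear trend and seasonal effect -- and return the one-step-ahead
    forecast made before each observation arrives."""
    level, trend = series[0], 0.0
    season = [0.0] * period                  # one seasonal offset per slot
    forecasts = []
    for t, z in enumerate(series):
        s = season[t % period]
        forecasts.append(level + trend + s)  # forecast first, then update
        prev_level = level
        level = alpha * (z - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % period] = gamma * (z - level) + (1 - gamma) * s
    return forecasts

# A short series with period-2 seasonality (alternating low/high values)
fc = holt_winters([10, 12, 10, 12, 11, 13, 11, 13], period=2)
```

As with plain EWMA, anomalies can be scored as the gap between each forecast and the observed value.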
             Plateau Algorithm
• It is used to detect a significant change in the base RTT
  of a path.
• A change in the base RTT means that most of the
  measured RTT values shift to a new, higher level.
• The plateau detector is based around two windows,
   – the summary window and
   – the sample window.
• Summary window is used to characterize the normal
  state of the path which is based on the previous samples.
• The number of samples stored in the summary window is
  a user settable parameter.
              Plateau Algorithm
• Samples are first stored in the summary window;
  a trigger selector decides whether a sample forms
  part of a plateau.
• In that case, the sample is stored in the sample
  window.
• This decision is based on the value of the sample,
  the mean and variance of the summary window,
  and on parameters that the user has chosen:
   – RTT > summary_mean + (summary_variance × sensitivity)
            Plateau Algorithm
• A trigger is generated when a number of samples
  (a user-settable parameter) that meet the above
  condition have been received.
• Each time a sample that fulfils the condition is
  received, a counter is incremented.
• When the counter is non-zero and a sample that
  does not meet the condition arrives, the counter is
  decremented.
• If the counter reaches the user-set trigger
  duration, a trigger is generated. If the counter
  returns to zero, the trigger is aborted.
• In either case, data in the sample window is passed
  to the summary window.
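The two-window detector described above can be sketched as follows; the window length, sensitivity and trigger duration stand in for the user-settable parameters, and the RTT series is an assumed example:

```python
from collections import deque
from statistics import mean, pvariance

def plateau_detector(samples, summary_len=20, sensitivity=2.0,
                     trigger_duration=5):
    """Sketch of the two-window plateau detector: a sample joins the
    sample window when rtt > summary_mean + summary_variance * sensitivity;
    a counter rises/falls with qualifying samples and a trigger fires
    when it reaches trigger_duration."""
    summary = deque(maxlen=summary_len)   # characterizes the normal state
    sample_win = []                       # candidate plateau samples
    counter = 0
    triggers = []
    for i, rtt in enumerate(samples):
        if len(summary) < 2:              # need history before judging
            summary.append(rtt)
            continue
        threshold = mean(summary) + pvariance(summary) * sensitivity
        if rtt > threshold:
            sample_win.append(rtt)
            counter += 1
            if counter >= trigger_duration:
                triggers.append(i)            # trigger generated
                summary.extend(sample_win)    # pass sample window back
                sample_win.clear()
                counter = 0
        else:
            summary.append(rtt)
            if counter > 0:
                counter -= 1
                if counter == 0:              # trigger aborted
                    summary.extend(sample_win)
                    sample_win.clear()
    return triggers

# A base RTT of 10 ms stepping up to a sustained 30 ms plateau
events = plateau_detector([10.0] * 20 + [30.0] * 10)
```

Passing the sample window back into the summary window is what lets the detector adopt the new, higher level as the normal state.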
  Principal Component Analysis (PCA)
• PCA is a coordinate transformation method that maps a given set of
  data points onto new axes. These axes are called the principal axes or
  principal components.
• When working with zero-mean data, each principal component has the
  property that it points in the direction of maximum variance remaining
  in the data, given the variance already accounted for in the preceding
  components.
• As such, the first principal component captures the variance of the
  data to the greatest degree possible on a single axis. The next principal
  components then each capture the maximum variance among the
  remaining orthogonal directions.
• Thus, the principal axes are ordered by the amount of data variance
  that they capture.
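A sketch of the first principal axis for 2-D data, found as the dominant eigenvector of the covariance matrix of the zero-meaned points; the sample points are hypothetical:

```python
import math

def first_principal_axis(points):
    """First principal axis of 2-D data: the unit eigenvector of the
    2x2 covariance matrix belonging to its largest eigenvalue."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    xs = [p[0] - mx for p in points]          # zero-mean the data
    ys = [p[1] - my for p in points]
    sxx = sum(x * x for x in xs) / n          # covariance matrix entries
    syy = sum(y * y for y in ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam = tr / 2 + math.sqrt(max(0.0, tr * tr / 4 - det))  # largest eigenvalue
    if sxy != 0.0:
        vx, vy = lam - syy, sxy               # eigenvector of [[sxx,sxy],[sxy,syy]]
    else:                                     # axis-aligned data
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

# Points scattered along y = x: the first axis points roughly along (0.71, 0.71),
# the direction of maximum variance.
axis = first_principal_axis([(0, 0), (1, 1.1), (2, 1.9), (3, 3.0)])
```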
•   McGregor A.J. and Braun H-W.,
    "Automated Event Detection for Active
    Measurement Systems", Proceedings of
    PAM2001, Amsterdam, Netherlands,
    April 2001.
    Preparation of Canonical dataset
•    Fetch data from monitoring nodes
     (Pathchirp, Thrulay, Ping, Traceroute) -
     Adnan, Ali
•    Study/Analyze Data - Adnan, Ali
•    Run Run-period - Adnan
•    Analyze Alerts - Adnan, Ali
•    Design a format - Adnan, Ali
•    Rearrange data according to new format -
     Adnan, Ali
                       Topology to be used

    Monitoring             Monitoring   Monitoring   Monitoring
      Node                   Node         Node         Node

Monitored Nodes = 21
Data being collected using this topology:
 •   PingER: round-trip time (min, max, avg)
 •   Pathchirp: Throughput (Avg, Max, Min, Stdev)
 •   Thrulay: Throughput (Avg, Max, Min, Stdev)
     RTT (Avg, Max, Min, Stdev)
 •   Traceroute summaries
 •   Host Monitoring data (e.g. data from LISA,
     Nagios, Ganglia) --- candidate
   Format of canonical data set
• Not sure, needs some more investigation of
  available data from multiple sources.
• Data is fetched from 1st Jan '06 to 28th Feb '06.
