Network reliability and QoS measurements





      Henning Schulzrinne
     University of Cincinnati
          March 2003
Overview
   The IRT Lab at Columbia University
   Application: Internet multimedia
   Quality of service =
       scheduling and admission control -> thousands of papers…
       network signaling
       end-system performance -> embedded end systems + PCs
       QoS -> network application reliability
Laboratory overview
   11 PhDs
       3 at IBM, Lucent, Telcordia
   5 MS
   Visitors (Ericsson, Fujitsu, Mitsubishi,
    Nokia, U. Coimbra, U. Oulu, …)
   China, Finland, Greece, India, Japan,
    Portugal, Spain, Sweden, US, Taiwan
IRT topics
   Internet multimedia protocols and systems
        Internet telephony and radio (SIP, RTSP, RTP)
        Content distribution networks
        Internet-scale event distribution
        Service creation
        Ubiquitous, context-aware computing and communications
   Protocols and services for wireless ad-hoc networks
   Service discovery
   Quality of service
        Pricing for adaptive services
        Scalable resource reservation protocols (CASP, BGRP, YESSIR)
        End-system evaluation
        Network measurements
        Service reliability
Internet multimedia
   Internet telephony = replacing the existing
    circuit-switched system with Internet-based
    systems
       Signaling and services
       Quality of service philosophies:
            end systems adapt and compensate
                  end systems use FEC (forward error correction), LBR
                   (low-bit-rate redundancy), PLC (packet loss concealment)
                  jitter -> playout delay compensation
            network offers guarantees -> difficult architecturally
             and for business reasons, not necessarily technically
            we pursue both
Assessment of VoIP Service Availability




        Wenyu Jiang
     Henning Schulzrinne
     IRT Lab, Dept. of Computer Science
            Columbia University
Overview
(on-going work, preliminary results, still
  looking for measurement sites, …)
 Service availability

 Measurement setup

 Measurement results
     call success probability
     overall network loss
     network outages
     outage induced call abortion probability
Service availability
   Users do not care about QoS
   at least not about packet loss, jitter, delay
       FEC and PLC can deal with losses up to 5-8%
   rather, it’s service availability: how likely is it that I
    can place a call and not get interrupted?
   availability = MTBF / (MTBF + MTTR)
       MTBF = mean time between failures
       MTTR = mean time to repair
   availability = successful calls / first call attempts
       equipment availability: 99.999% (“5 nines”) -> about 5
        minutes of downtime/year
       AT&T: 99.98% availability (1997)
       IP frame relay SLA: 99.9%
       UK mobile phone survey: 97.1-98.8%
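The two availability definitions above are simple ratios. A minimal Python sketch, assuming a 365.25-day year (the helper names are illustrative, not from the talk), converts an availability figure into expected downtime per year:

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumes a 365.25-day year (525,960 min)


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected unavailable minutes per year for a given availability."""
    return (1.0 - avail) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, a in [
        ("5 nines", 0.99999),
        ("AT&T (1997)", 0.9998),
        ("frame relay SLA", 0.999),
        ("UK mobile, low end", 0.971),
    ]:
        print(f"{label:20s} {a:.5f} -> {downtime_minutes_per_year(a):8.1f} min/year")
```

Under that assumption, 99.999% comes to about 5.3 minutes per year, in line with the "5 minutes/year" figure, while the 99.9% frame relay SLA allows roughly 8.8 hours.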
Availability – PSTN metrics
   PSTN metrics (World Bank study):
       fault rate
            “should be less than 0.2 per main line”
       fault clearance (~ MTTR)
            “next business day”
       call completion rate
            during network busy hour
            “varies from about 60% - 75%”
       dial tone delay
Example PSTN statistics
   [Figure: example PSTN statistics. Source: World Bank]
Measurement setup
Node name Location                                   Connectivity   Network
columbia         Columbia University, NY             >= OC3         I2
wustl            Washington U., St. Louis                           I2
unm              Univ. of New Mexico                                I2
epfl             EPFL, Lausanne, CH                                 I2+
hut              Helsinki University of Technology                  I2+
rr               NYC                                 cable modem    ISP
rrqueens         Queens, NY                          cable modem    ISP
njcable          New Jersey                          cable modem    ISP
newport          New Jersey                          ADSL           ISP
sanjose          San Jose, California                cable modem    ISP
suna             Kitakyushu, Japan                   3 Mb/s         ISP
sh               Shanghai, China                     cable modem    ISP
Shanghaihome     Shanghai, China                     cable modem    ISP
Shanghaioffice   Shanghai, China                     ADSL           ISP
Measurement setup
   Active measurements
   call duration 3 or 7 minutes
   UDP packets:
       36 bytes alternating with 72 bytes (FEC)
       40 ms spacing
   September 10 to December 6, 2002
   13,500 call hours
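The probe stream parameters above determine the packet count and data rate of each call. The sketch below assumes the stream starts with a 36-byte packet and counts payload only (no UDP/IP headers); `probe_schedule` and `payload_bitrate_bps` are hypothetical helpers, not the measurement tool used in the study:

```python
PACKET_SPACING_S = 0.040      # 40 ms between probe packets
PAYLOAD_SIZES = (36, 72)      # bytes; assumed to start with the 36-byte packet
OUTAGE_MIN_PACKETS = 8        # outage threshold from the "Network Outages" slide


def probe_schedule(call_duration_s: float):
    """Return (send_time_s, payload_bytes) pairs for one probe call."""
    n_packets = round(call_duration_s / PACKET_SPACING_S)
    return [(i * PACKET_SPACING_S, PAYLOAD_SIZES[i % 2]) for i in range(n_packets)]


def payload_bitrate_bps() -> float:
    """Average UDP payload rate in bits/s, excluding UDP/IP/link headers."""
    avg_bytes = sum(PAYLOAD_SIZES) / len(PAYLOAD_SIZES)
    return avg_bytes * 8 / PACKET_SPACING_S


if __name__ == "__main__":
    for minutes in (3, 7):
        print(minutes, "min call:", len(probe_schedule(minutes * 60)), "packets")
    print("payload rate:", round(payload_bitrate_bps()), "b/s")
    print("minimum outage length:", OUTAGE_MIN_PACKETS * PACKET_SPACING_S, "s")
```

With these assumptions a 3-minute call carries 4,500 packets, a 7-minute call 10,500, the payload averages 10.8 kb/s, and the 8-packet outage threshold corresponds to 320 ms.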
Call success probability
   62,027 calls succeeded, 292 failed -> 99.53% availability
   roughly constant across I2, I2+, and commercial ISPs

     Paths                       Call success
     All                         99.53%
     Internet2                   99.52%
     Internet2+                  99.56%
     Commercial                  99.51%
     Domestic (US)               99.45%
     International               99.58%
     Domestic commercial         99.39%
     International commercial    99.59%
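The call success figure is successful calls divided by all call attempts (62,027 + 292). A minimal sketch of the per-category breakdown, assuming call records of the hypothetical form (category, succeeded):

```python
from collections import Counter


def success_by_category(calls):
    """calls: iterable of (category, succeeded) pairs.

    Returns {category: successful calls / call attempts}."""
    attempts, successes = Counter(), Counter()
    for category, succeeded in calls:
        attempts[category] += 1
        if succeeded:
            successes[category] += 1
    return {c: successes[c] / attempts[c] for c in attempts}


if __name__ == "__main__":
    # Aggregate figures from the slide: 62,027 successes, 292 failures.
    overall = 62_027 / (62_027 + 292)
    print(f"overall call success: {overall:.2%}")  # 99.53%
```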
Overall network loss
   PSTN: once connected, call usually of good quality
       exception: mobile phones
   compute periods of time below loss threshold
       5% causes degradation for many codecs
       others remain acceptable up to 20%

   % of time with loss at or below threshold:
                0%      5%      10%     20%
     All        82.3    97.48   99.16   99.75
     ISP        78.6    96.72   99.04   99.74
     I2         97.7    99.67   99.77   99.79
     I2+        86.8    98.41   99.32   99.76
     US         83.6    96.95   99.27   99.79
     Int.       81.7    97.73   99.11   99.73
     US ISP     73.6    95.03   98.92   99.79
     Int. ISP   81.2    97.60   99.10   99.71
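The loss-threshold table can be produced by cutting each call's per-packet loss trace into fixed intervals and counting the intervals whose loss rate does not exceed each threshold. The slides do not give the interval length; the 25-packet (1 s) interval and the function names below are assumptions for illustration:

```python
def interval_loss_rates(lost, packets_per_interval):
    """Split a per-packet loss trace (True = lost) into consecutive fixed-size
    intervals and return the loss rate of each complete interval."""
    rates = []
    for start in range(0, len(lost) - packets_per_interval + 1, packets_per_interval):
        window = lost[start:start + packets_per_interval]
        rates.append(sum(window) / packets_per_interval)
    return rates


def fraction_within(rates, threshold):
    """Fraction of intervals whose loss rate is at or below `threshold`."""
    return sum(1 for r in rates if r <= threshold) / len(rates)


if __name__ == "__main__":
    trace = [False] * 100 + [True] * 5 + [False] * 20 + [True] * 25
    rates = interval_loss_rates(trace, 25)  # 25 packets = 1 s at 40 ms spacing
    for t in (0.0, 0.05, 0.10, 0.20):
        print(f"<= {t:.0%}: {fraction_within(rates, t):.1%} of intervals")
```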
Network Outages
   sustained packet losses
       arbitrarily defined as 8 consecutive lost packets (320 ms)
       far beyond any recoverable loss (FEC, interpolation)
   23% outages
   make up a significant part of the 0.25% unavailability
   symmetric: outage on both A->B and B->A
   spatially correlated: outage on both A->B and A->X
   not correlated across networks (e.g., I2 and commercial)
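Outage detection as defined above amounts to finding runs of consecutive lost packets. A minimal sketch, assuming a per-packet loss trace (True = lost) and treating any run of 8 or more losses as an outage; `find_outages` is a hypothetical helper:

```python
PACKET_SPACING_S = 0.040  # probe spacing from the measurement setup


def find_outages(lost, min_run=8):
    """Return (start_index, length_in_packets) for every run of at least
    `min_run` consecutive lost packets in a per-packet trace (True = lost)."""
    outages = []
    run_start = None
    for i, is_lost in enumerate(lost):
        if is_lost and run_start is None:
            run_start = i                      # a loss run begins
        elif not is_lost and run_start is not None:
            if i - run_start >= min_run:       # run just ended; long enough?
                outages.append((run_start, i - run_start))
            run_start = None
    # A run lasting until the end of the trace (e.g., the call ended mid-outage)
    if run_start is not None and len(lost) - run_start >= min_run:
        outages.append((run_start, len(lost) - run_start))
    return outages


if __name__ == "__main__":
    trace = [False] * 3 + [True] * 7 + [False] + [True] * 8 + [False] * 2 + [True] * 10
    for start, length in find_outages(trace):
        print(f"outage at packet {start}: {length} packets = {length * PACKET_SPACING_S:.2f} s")
```

The 7-packet run in the example is treated as recoverable loss; only the 8- and 10-packet runs count as outages.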
Network outages

   [Figure: complementary CDF (log scale, 1e-05 to 1) of outage duration
    (sec, 0-400); left panel: US domestic vs. international paths;
    right panel: all paths vs. Internet2]
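Each curve in the outage-duration figure is an empirical complementary CDF, P[duration > x]. A short sketch of how such a curve can be computed from a list of outage durations; the sample durations are made up:

```python
from bisect import bisect_right


def ccdf(durations, xs):
    """Empirical complementary CDF: for each x, the fraction of outages
    whose duration is strictly greater than x."""
    s = sorted(durations)
    n = len(s)
    return [(n - bisect_right(s, x)) / n for x in xs]


if __name__ == "__main__":
    durations_s = [0.4, 0.6, 1.0, 2.5, 30.0, 120.0, 400.0]  # made-up sample
    xs = (0, 50, 100, 200)
    for x, p in zip(xs, ccdf(durations_s, xs)):
        print(f"P[duration > {x:3d} s] = {p:.3f}")
```

Plotting the result on a logarithmic y-axis, as in the figure, makes the long tail of multi-minute outages visible.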
Network outages
          no. of    %          duration      duration        total time   time in outages
          outages   symmetric  (mean, pkts)  (median, pkts)  (all, h:m)   > 1000 pkts (h:m)
all       10,753    30%        145           25              17:20        10:58
I2           819    14.5%      360           25               3:17         2:33
I2+        2,708    10%        259           26               7:47         5:37
ISP        8,045    37%        107           24               9:33         4:58
US         1,777    18%        269           20               5:18         3:53
Int.       8,976    33%        121           26              12:02         6:42
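The mean and median durations in the outage table appear to be counted in packets rather than seconds: at the 40 ms probe spacing, 10,753 outages × 145 packets comes to about 17 h 19 min, close to the 17:20 total. A minimal sketch of that consistency check (the helper name is hypothetical):

```python
PACKET_SPACING_S = 0.040  # 40 ms probe spacing (measurement setup)


def total_outage_time_hm(n_outages, mean_duration_packets):
    """Total outage time (hours, minutes) implied by an outage count and a
    mean duration expressed in packets."""
    seconds = n_outages * mean_duration_packets * PACKET_SPACING_S
    total_minutes = round(seconds / 60)
    return divmod(total_minutes, 60)


if __name__ == "__main__":
    rows = {"all": (10_753, 145), "I2": (819, 360), "ISP": (8_045, 107)}
    for name, (n, mean_pkts) in rows.items():
        h, m = total_outage_time_hm(n, mean_pkts)
        print(f"{name:4s} {h}:{m:02d}")
```

For I2 this gives 3:17, matching the table exactly; the other rows come within one or two minutes, presumably because the published means are rounded.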
Outage-induced call abortion probability
   Long interruption -> user likely to abandon call
   from E.855 survey: P[holding] = e^(-t/17.26) (t in seconds)
       half the users will abandon the call after about 12 s
   2,566 calls have at least one outage
   946 of the 2,566 expected to be dropped -> 1.53% of all calls

     Paths      Expected abandonment
     all        1.53%
     I2         1.16%
     I2+        1.15%
     ISP        1.82%
     US         0.99%
     Int.       1.78%
     US ISP     0.86%
     Int. ISP   2.30%
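The abandonment model above is an exponential holding curve. The sketch below computes the median abandonment time and an expected number of dropped calls. How the study combined several outages within one call is not stated, so treating each outage as an independent chance to hang up is an assumption, as are the helper names:

```python
import math

TAU_S = 17.26  # time constant of the E.855-based holding model on the slide


def p_holding(t_s: float) -> float:
    """Probability that a user is still holding after an interruption of t_s seconds."""
    return math.exp(-t_s / TAU_S)


def p_abandon(outage_durations_s) -> float:
    """Probability that a call is abandoned, ASSUMING the user independently
    survives each outage with probability p_holding(duration)."""
    survive = 1.0
    for d in outage_durations_s:
        survive *= p_holding(d)
    return 1.0 - survive


def expected_dropped(calls) -> float:
    """calls: list of per-call outage-duration lists (seconds)."""
    return sum(p_abandon(outages) for outages in calls)


if __name__ == "__main__":
    print(f"median time to abandon: {TAU_S * math.log(2):.2f} s")  # 11.96 s
    print(f"946 of 62,027 calls: {946 / 62_027:.2%}")             # 1.53%
```

17.26 s × ln 2 ≈ 11.96 s reproduces the "half the users ... after 12 s" figure, and 946 / 62,027 ≈ 1.53% matches the overall rate, which suggests the percentage is taken over successful calls.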
Conclusion
   Availability in space is (mostly) solved, but
    availability in time restricts usability for new
    applications
   initial investigation into service availability for
    VoIP
   need to define metrics for, say, web access
   unify packet loss and “no Internet dial tone”
   far less than “5 nines”
   working on identifying fault sources and
    locations
   looking for additional measurement sites
Quality and Performance Evaluation of VoIP End-points




         Wenyu Jiang
       Henning Schulzrinne
        Columbia University
Motivations
   The quality of VoIP depends on both
    the network and the end-points
   Extensive QoS literature on network
    performance, e.g., IntServ, DiffServ
       Focus is on limiting network loss & delay
   Little is known about the behavior of
    VoIP end-points
Performance Metrics for VoIP End-points
   Mouth-to-ear (M2E) delay
       compared with network delay
   Clock skew
       whether it causes any voice glitches
       amount of clock drift
   Silence suppression behavior
       whether the voice is clipped (depends largely on
        hangover time)
       robustness to non-speech input, e.g., music
   Robustness to packet loss
       voice quality under packet loss
   Acoustic echo cancellation
   Jitter adaptation: delay > max(jitter)?
Measurement Approach
     Capture both original and output audio
     Use adelay program to measure M2E delay
          auto correlation
          no clock synchronization needed
     Assume a LAN environment by default
          Serve as a baseline of reference, or lower bound
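   [Figure: measurement setup on a LAN. A notebook plays the original
    audio ("mouth") through a coupler into the first IP phone; the output
    audio ("ear") of the second IP phone is captured through a coupler;
    original and output audio feed the two channels of a PC's stereo
    line-in; both IP phones connect via Ethernet to the LAN]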
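Measuring M2E delay from the stereo recording means finding the time shift at which the output channel best matches the original channel. This is a brute-force cross-correlation sketch under simple assumptions (both channels at the same sample rate, output delayed by 0 to `max_lag` samples); it is not the adelay program itself, whose internals the slides describe only as "auto correlation":

```python
import random


def estimate_delay_s(original, output, sample_rate, max_lag):
    """Estimate how far `output` lags `original` by finding the lag (in samples,
    0..max_lag) that maximizes their average cross-product (cross-correlation)."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        m = min(len(original), len(output) - lag)  # number of overlapping samples
        if m <= 0:
            break
        score = sum(original[t] * output[t + lag] for t in range(m)) / m
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


if __name__ == "__main__":
    rng = random.Random(1)
    fs = 8000                                               # 8 kHz, as for G.711
    mouth = [rng.uniform(-1, 1) for _ in range(2000)]
    delay_samples = 472                                     # 59 ms
    ear = [0.0] * delay_samples + [0.5 * x for x in mouth]  # attenuated, delayed copy
    print(f"estimated M2E delay: {estimate_delay_s(mouth, ear, fs, 600) * 1000:.1f} ms")
```

Because both channels are captured by the same sound card, no clock synchronization between the phones is needed. A brute-force search costs O(N × max_lag); an FFT-based correlation would be the usual choice for long recordings.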
VoIP End-points Tested
   Hardware End-points
       Cisco, 3Com and Pingtel IP phones
       Mediatrix 1-line SIP/PSTN Gateway
   Software clients
       Microsoft Messenger, NetMeeting (Win2K, WinXP)
       Net2Phone (NT, Win2K, Win98)
       Sipc/RAT (Solaris, Ultra-10)
            Robust Audio Tool (RAT) from UCL as media client
   Operating parameters:
       In most cases, codec is G.711 µ-law, packet
        interval is 20ms
IP Phone Hardware




•   DSP for audio coding, AEC
•   µC (microcontroller) for protocol processing
•   embedded OS (Linux, Wind River, …) with web browser
•   Ethernet interface, maybe with hub
Example M2E Delay Plot
   3Com to Cisco, shown with gaps > 1 sec
   Delay adjustments correlate with gaps,
    even though the 3Com phone has no silence
    suppression
   [Figure: M2E delay (ms, 35-60) vs. time (sec, 0-350) for experiments
    1-1 and 1-2, with silence gaps marked]
Visual Illustration of M2E Delay Drop, Snapshot #1
   3Com to Cisco, 1-1 case
   Left/upper channel is original audio
   Highlighted section shows M2E delay (59 ms)
Snapshot #2
   M2E delay drops to 49 ms, at time 4:16
Snapshot #3
   Presence of a gap during the delay change
Effect of RTP Marker Bits on Delay Adjustments
   Cisco phone sends M-bits, whereas Pingtel
    phone does not
       Presence of M-bits results in more adjustments
   [Figure: M2E delay (ms, 20-100) vs. time (sec, 0-300) for Cisco to 3Com
    (1-1) and Pingtel to 3Com (2-1); new talkspurts (M-bit=1) marked]
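The RTP marker bit is defined in RFC 3550; for audio, the RTP/AVP profile (RFC 3551) sets it on the first packet of a talkspurt, which gives a receiver a natural point to change its playout delay without cutting into speech. The sketch below parses the fixed RTP header and shows a hypothetical receiver that only moves its playout delay when the M bit is set; it is not a description of the 3Com phone's actual algorithm:

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header defined in RFC 3550."""
    if len(packet) < 12:
        raise ValueError("shorter than the RTP fixed header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "marker": bool(b1 & 0x80),   # M bit: for audio, first packet of a talkspurt
        "payload_type": b1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


class TalkspurtPlayout:
    """Hypothetical receiver policy: apply a new playout delay only at
    talkspurt boundaries, signalled by the RTP marker bit."""

    def __init__(self, initial_delay_ms: float):
        self.delay_ms = initial_delay_ms

    def on_packet(self, header: dict, target_delay_ms: float) -> float:
        if header["marker"]:
            # Shifting here stretches or shrinks the preceding silence, not speech.
            self.delay_ms = target_delay_ms
        return self.delay_ms


if __name__ == "__main__":
    pkt = struct.pack("!BBHII", 0x80, 0x80, 1, 160, 0x1234) + bytes(160)
    print(parse_rtp_header(pkt))  # version 2, marker set, payload type 0 (PCMU)
```

A receiver following such a policy would adjust more often with a sender that marks every talkspurt, which is consistent with the Cisco vs. Pingtel comparison on this slide.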
Sender Characteristics
   Certain senders may introduce delay
    spikes, despite operating on a LAN
   [Figure: M2E delay (ms, 50-300) vs. time (sec, 0-300) for Mediatrix to
    3Com (3-1), Mediatrix to Cisco (1-1) and Mediatrix to Pingtel (1-1)]
Average M2E Delays for IP phones and sipc
   Averaging the M2E delay allows more compact
    presentation of end-point behaviors
   Receiver (especially RAT) plays an important role in
    M2E delay
   [Figure: bar chart of average M2E delay (ms, 0-250) by receiver (3Com,
    Cisco, Mediatrix, Pingtel, RAT), for senders 3Com and Cisco]
Average M2E Delays for PC Software Clients
   Messenger 2000 wins the day
       Its delay as receiver (68ms) is even lower than Messenger
        XP, on the same hardware
       It also results in slightly lower delay as sender
   NetMeeting is a lot worse (> 400ms)
   Messenger’s delay performance is similar to or better
    than a GSM mobile phone.

    A               B                      A->B     B->A
    MgrXP (pc)      MgrXP (notebook)       109ms    120ms
    Mgr2K (pc)      MgrXP (notebook)       96.8ms   68.5ms
    NM2K (pc)       NM2K (notebook)        401ms    421ms
    Mobile (GSM)    PSTN (local number)    115ms    109ms
Delay Behaviors for PC Clients, contd.
   Net2Phone’s delay is also high
       ~200-500ms
       V1.5 reduces PC->PSTN delay
       PC-to-PC calls have fairly high delays

A                          B                              A->B     B->A
N2P v1.1 NT P-2 (pc2)      PSTN (local number)            292ms    372ms
N2P v1.5 NT P-2 (pc2)      PSTN (local number)            201ms    373ms
N2P v1.5 W2K K7 (pc)       PSTN (local number)            196ms    401ms
N2P v1.5 W2K K7 (pc)       N2P v1.5 W98 P-3 (notebook2)   525ms    350ms
Effect of Clock Skew: Cisco to 3Com, Experiment 1-1
   Symptom of playout buffer underflow
   Waveforms are dropped
   Occurred at point of delay adjustment
   Bugs in software?
Clock Skew Rates
   Mostly symmetric between two devices
   RAT (Sun Ultra-10) has unusually high drift rates, > 300
    ppm (parts per million)
        High clock skews confirmed in many (but not all) PCs and
         workstations
Drift rates (ppm)  3Com     Cisco    Mediatrix  Pingtel   RAT
3Com               -8.3     55.4     43.3       41.2      -333
Cisco              -55.2    -0.4     -11.8      -12.1     -381
Mediatrix          -43.1    11.7     1.3        -0.8
Pingtel            -40.9    12.7     2.8        -3.5      -380
RAT                343      403                 376       12.3
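The slides do not say how drift was measured. One common approach, assumed here, is to fit a straight line to delay (or timestamp offset) versus time and read off the slope: 1 ppm is 1 µs of drift per second. A minimal least-squares sketch with synthetic data (the sign convention, positive = delay grows, is also an assumption):

```python
def drift_ppm(times_s, delays_s):
    """Least-squares slope of delay vs. time, in parts per million.

    A slope of 1e-6 s/s (1 microsecond per second) is 1 ppm."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_d = sum(delays_s) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(times_s, delays_s))
    var = sum((t - mean_t) ** 2 for t in times_s)
    return cov / var * 1e6


if __name__ == "__main__":
    times = [i * 0.02 for i in range(15_000)]       # 300 s of 20 ms packets
    delays = [0.050 + 333e-6 * t for t in times]    # delay grows 333 us per second
    print(f"estimated drift: {drift_ppm(times, delays):.1f} ppm")
```

At the 300+ ppm rates seen for RAT, the playout buffer gains or loses roughly 100 ms over a 5-minute call, so the receiver must periodically drop or insert audio to keep its buffer bounded.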
Drift Rates for PC Clients
   Drift rates are not always symmetric!
       But they appear to be consistent between Messenger
        2K/XP and Net2Phone on the same PC
       Existence of 2 clocking circuits in sound card?

A                     B                  A->B (ppm)   B->A (ppm)
MgrXP (pc)            MgrXP (notebook)   172          87.7
Mgr2K (pc)            MgrXP (notebook)   165          85.6
NM2K (pc)             NM2K (notebook)    ?            -33?
Net2Phone NT (pc2)    PSTN               290          -287
Net2Phone 2K (pc)     PSTN               166          82
Mobile (GSM)          PSTN               0            0
Packet Loss Concealment
   Common PLC methods
       Silence substitution (worst)
       Packet repetition, with optional fading
       Extrapolation (one-sided)
       Interpolation (two-sided), best quality
   Use deterministic bursty loss pattern
       3/100 means 3 consecutive losses out of every
        100 packets
       Easier to locate packet losses
       Tested 1/100, 3/100, 1/20, 5/100, etc.
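The deterministic loss patterns and the repetition-with-fading PLC method listed above can be sketched as follows. The placement of each burst at the start of its period, the 0.5 fading factor, and the function names are assumptions for illustration; the slides do not describe how the phones conceal losses:

```python
def loss_pattern(burst, period, n_packets, offset=0):
    """Deterministic bursty loss: `burst` consecutive losses in every `period`
    packets (e.g., 3/100). True = lost. `offset` shifts where bursts start."""
    return [((i - offset) % period) < burst for i in range(n_packets)]


def conceal_repeat_with_fading(frames, lost, fade=0.5):
    """Packet-repetition PLC: replace each lost frame with the last good frame,
    scaled down by `fade` for every further consecutive loss. Falls back to
    silence substitution if no good frame has been received yet.

    `frames` has one entry per packet slot; the contents of lost slots are
    ignored except for their length."""
    out, last_good, gain = [], None, 1.0
    for frame, is_lost in zip(frames, lost):
        if not is_lost:
            out.append(list(frame))
            last_good, gain = frame, 1.0
        elif last_good is None:
            out.append([0.0] * len(frame))          # silence substitution
        else:
            gain *= fade
            out.append([s * gain for s in last_good])
    return out


if __name__ == "__main__":
    pattern = loss_pattern(3, 100, 1000)
    print(sum(pattern), "of", len(pattern), "packets lost")
```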
PLC Behaviors
   Loss tolerance (at 20ms interval)
       By measuring loss-induced gaps in output audio
       3Com and Pingtel phones: 2 packet losses
       Cisco phone: 3 packet losses
   Level of audio distortion by packet loss
       Inaudible at 1/100 for all 3 phones
       Inaudible at 3/100 and 1/20 for Cisco phone, yet
        audible to very audible for the other two.
   Cisco phone is the most robust
       Probably uses interpolation
Effect of PLC on Delay
   No noticeable effect on M2E delay
       E.g., sipc to Pingtel
   [Figure: mouth-to-ear delay (ms, 50-80) vs. time (sec, 0-60), sipc to
    Pingtel, for loss patterns 0/100, 3/100 and 1/20]
Silence Suppression
   Why?
       Saves bandwidth
       May reduce processing power (e.g., in
        conferencing mixer)
       Facilitates per-talkspurt delay adjustment
   Key parameters
       Silence detection threshold
       Hangover time, to delay silence suppression and
        avoid end clipping of speech
            Usually 200ms is long enough [Brady ’68]
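An energy-threshold silence detector with hangover, as described above, can be sketched in a few lines. The -40 dBFS threshold and 20 ms frames are assumptions; the 200 ms default hangover follows the [Brady ’68] figure on the slide:

```python
import math


def frame_level_db(frame) -> float:
    """Mean-square level of one frame in dBFS (samples scaled to [-1, 1])."""
    ms = sum(s * s for s in frame) / len(frame)
    return 10 * math.log10(ms) if ms > 0 else float("-inf")


def vad_with_hangover(frames, threshold_db=-40.0, frame_ms=20, hangover_ms=200):
    """Return one send/suppress decision per frame (True = send).

    A frame is 'active' if its level reaches threshold_db; after the last active
    frame, packets keep being sent for hangover_ms to avoid clipping word endings."""
    hangover_frames = hangover_ms // frame_ms
    remaining = 0
    decisions = []
    for frame in frames:
        if frame_level_db(frame) >= threshold_db:
            remaining = hangover_frames
            decisions.append(True)
        elif remaining > 0:
            remaining -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions


if __name__ == "__main__":
    talk, silence = [0.5] * 160, [0.0] * 160      # 160 samples = 20 ms at 8 kHz
    d = vad_with_hangover([talk] * 5 + [silence] * 20)
    print(sum(d), "of", len(d), "frames sent")     # 15 of 25 frames sent
```

In this sketch any input that stays below the threshold, including quiet music, is suppressed once the hangover expires.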
Hangover Time
   Measured by feeding ON-OFF
    waveforms and monitoring RTP packets
   Cisco phone’s is the longest (2.3-2.36
    sec), then Messenger (1.06-1.08 sec),
    then NetMeeting (0.56-0.58 sec)
   A long hangover time is not necessarily
    bad, as it reduces voice clipping
       Indeed, no unnatural gaps are found
       But it does waste somewhat more bandwidth
Robustness of Silence Detectors to Music
   On-hold music is often used in
    customer support centers
       Need to ensure music is played without
        any interruption due to silence suppression
   Tested with a 2.5-min long soundtrack
   Messenger starts to generate many
    unwanted gaps at input levels of -24 dB and below
   Cisco phone is more robust, but still
    fails at input levels of -41.4 dB and below
Acoustic Echo Cancellation
   Important for hands-free/conferencing
    (business) applications
   Primary metric: Echo Return Loss (ERL)
        Measured by LAN-sniffing RTP packets
   Most IP phones support AEC
        ERL depends slightly on input level and
         speaker-phone volume
        Usually > 40 dB (good AEC performance)
    IP phone    ERL (dB)
    3Com        40-45
    Cisco       53-
    ipDialog    49-54
    Pingtel     33-42
    Snom-100    -5 (no AEC)
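ERL compares the level of the far-end signal played through the phone's speaker with the level of its echo in the audio the phone sends back. A minimal sketch, assuming both streams have already been decoded from the sniffed RTP packets to linear PCM, time-aligned, and recorded without near-end speech (function names are hypothetical):

```python
import math


def level_db(samples) -> float:
    """Mean-square level in dB relative to a full-scale amplitude of 1.0."""
    ms = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(ms) if ms > 0 else float("-inf")


def echo_return_loss_db(far_end, echo) -> float:
    """ERL: level of the far-end signal played out at the phone minus the level
    of its echo in the audio the phone sends back."""
    return level_db(far_end) - level_db(echo)


if __name__ == "__main__":
    far_end = [0.5, -0.5] * 400
    echo = [0.005, -0.005] * 400      # echo amplitude 1/100 of the far-end signal
    print(f"ERL = {echo_return_loss_db(far_end, echo):.1f} dB")
```

A negative ERL, as listed for the phone without AEC, means the returned echo is louder than the signal that produced it.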
M2E Delay under Jitter
   Delay properties under the LAN environment
    serve as a baseline of reference
   When operating over the Internet:
       Fixed portion of delay adds to M2E delay as a constant
       Variable portion (jitter) has a more complex effect
   Initial test
       Used typical cable modem delay traces
       Tested RAT to Cisco
       No audible distortion due to late loss
       Added delay is normal
   [Figure: mouth-to-ear delay (ms, 90-180) vs. time (sec, 0-180) for
    high-jitter (uplink) and low-jitter (downlink) traces]
M2E Delay under Jitter, contd.
   Cisco phone generally within expectation
       Follows network delay changes promptly
           Takes longer (10-20 sec) to adapt to decreasing delay
       Does not overshoot playout delay
   More end-points to be examined
   [Figure: two panels of M2E delay vs. time, each with curves Trace, test1
    and test2. Left, "Artificial Trace": M2E delay (ms, 0-160) vs. time
    (sec, 0-100). Right, "Real Trace with Spikes": M2E delay (ms, 0-800)
    vs. time (sec, 0-80)]
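For comparison with the observed Cisco behavior, one classic adaptive playout estimator (Ramjee et al., 1994) tracks a smoothed delay d and delay variation v and sets the playout delay to d + 4v, usually applied at the start of each talkspurt. A minimal sketch; it is not claimed to be what any tested phone implements, and the default constants are the commonly cited values:

```python
class AdaptivePlayout:
    """Exponentially weighted playout-delay estimator (Ramjee et al. style).

    d: smoothed network delay, v: smoothed delay variation (both in ms).
    Playout delay = d + beta * v."""

    def __init__(self, alpha: float = 0.998002, beta: float = 4.0):
        self.alpha = alpha
        self.beta = beta
        self.d = None
        self.v = 0.0

    def observe(self, network_delay_ms: float) -> None:
        """Update the estimates with one packet's network delay."""
        if self.d is None:               # first packet initializes the estimate
            self.d = network_delay_ms
            return
        a = self.alpha
        self.d = a * self.d + (1 - a) * network_delay_ms
        self.v = a * self.v + (1 - a) * abs(self.d - network_delay_ms)

    def playout_delay_ms(self) -> float:
        """Delay to use for the next talkspurt."""
        if self.d is None:
            raise ValueError("no packets observed yet")
        return self.d + self.beta * self.v


if __name__ == "__main__":
    est = AdaptivePlayout()
    for delay in [50.0] * 500 + [100.0] * 500:   # step increase in network delay
        est.observe(delay)
    print(f"playout delay after step: {est.playout_delay_ms():.1f} ms")
```

With alpha close to 1 the estimate moves slowly in both directions, whereas the Cisco phone reportedly follows delay increases quickly and takes 10-20 sec to follow decreases.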
Conclusions
   Average M2E delay:
       Low (mostly < 80ms) for hardware IP phones
       Software clients: lowest for Messenger 2000 (68.5ms)
       Application (receiver) most vital in determining delay
            Poor implementation easily undoes good network QoS
   Clock skew high on SW clients (RAT, Net2Phone)
   Packet loss concealment quality
       Acceptable in all 3 IP phones tested, with Cisco most robust
   Silence detector behavior
       Long hangover time, works well for speech input
       But may misclassify music as silence
   Acoustic Echo Cancellation: good on most IP phones
   Playout delay behavior: good based on initial tests
Future Work
   Further tests with more end-points on
    how jitter influences M2E delay
   Measure the sensitivity (threshold) of
    various silence detectors
   Investigate the non-symmetric clock
    drift phenomena
   Additional experiments as more brands
    of VoIP end-points become available

				