					       Wireless Proliferation




• Sharp increase in deployment
  – Enterprises, airports, malls, coffee shops, homes…
  – 4.5 million APs sold in 3rd quarter of 2004!




                                                         1
 Wireless Network Management
• Residential wireless networks
  – Unplanned, unmanaged → chaotic
    deployments
  – Independent users set up APs with default
    configuration


• Enterprise wireless networks
  – Need to automate and scale to large user
    bases

                                                2
Self Management in Chaotic
    Wireless Networks
         Aditya Akella, Glenn Judd,
       Srini Seshan, Peter Steenkiste
         Carnegie Mellon University



                                        3
 Chaotic Wireless Networks
• Unplanned:
  –   Independent users set up APs
  –   Spontaneous
  –   Variable densities
  –   Other wireless devices

• Unmanaged:
  – Configuring is a pain
  – ESSID, channel, placement, power
  – Use default configuration

 → “Chaotic” Deployments
                                      4
Implications of Dense Chaotic Networks

   • Benefits
     – Great for ubiquitous
       connectivity, new
       applications


   • Challenges
     – Serious contention
     – Poor performance
     – Access control,
       security

                                    5
                  Outline
• Quantify deployment densities and other
  characteristics
• Impact on end-user performance
• Initial work on mitigating negative effects
• Conclusion




                                                6
Characterizing Current Deployments

   Datasets
   • Place Lab: 28,000 APs
      – MAC, ESSID, GPS
      – Selected US cities
      – www.placelab.org
   • Wifimaps: 300,000 APs
      – MAC, ESSID, Channel, GPS (derived)
      – wifimaps.com
   • Pittsburgh Wardrive: 667 APs
      – MAC, ESSID, Channel, Supported Rates, GPS




                                                    7
AP Stats, Degrees: Place Lab
          (Place Lab: 28,000 APs; MAC, ESSID, GPS)

City            #APs   Max. degree
Portland        8683   54
San Diego       7934   76
San Francisco   3037   85
Boston          2551   39

[Figure: example topology with a 50 m range; node degrees 1, 2, 1]

                                                   8
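
A minimal sketch of how a per-AP degree could be computed from GPS fixes like those in the Place Lab data. It assumes that two APs count as neighbors when they are within 50 m of each other, which is one reading of the "50 m" label on the slide; the function names and the input layout (MAC, latitude, longitude) are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def ap_degrees(aps, range_m=50.0):
    """Map each AP's MAC to the number of other APs within range_m meters.

    aps is a list of (mac, lat, lon) tuples. The 50 m default is an assumption.
    """
    degrees = {mac: 0 for mac, _, _ in aps}
    for i, (mac_a, lat_a, lon_a) in enumerate(aps):
        for mac_b, lat_b, lon_b in aps[i + 1:]:
            if haversine_m(lat_a, lon_a, lat_b, lon_b) <= range_m:
                degrees[mac_a] += 1
                degrees[mac_b] += 1
    return degrees


# Three APs in a line, roughly 33 m apart: the middle one sees both ends.
example = [("a", 40.4400, -79.99), ("b", 40.4403, -79.99), ("c", 40.4406, -79.99)]
print(ap_degrees(example))
```

The pairwise loop is quadratic in the number of APs, which is fine for a few thousand APs per city; a spatial index would be the natural next step for larger datasets.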
Degree Distribution: Place Lab




                                 9
            Unmanaged Devices
         WifiMaps.com
(300,000 APs; MAC, ESSID, Channel)

Channel   %
6         41.2
2         12.3
11        11.5
3         3.6

• Most users don’t change default channel
• Channel selection must be automated

                                                          10
    Opportunities for Change
        Wardrive
     (667 APs; MAC, ESSID, Channel, Rates, GPS)

Vendor                  %
Linksys (Cisco)        33.5
Aironet (Cisco)        12.2
Agere                   9.6
D-Link                  4.9
Apple                   4.6
Netgear                 4.4
ANI Communications      4.3
Delta Networks          3
Lucent                  2.5
Acer                    2.3
Others                 16.7

• Major vendors dominate
• Incentive to reduce “vendor self interference”
                                                      11
     Impact on Performance
• Glomosim trace-driven simulations
• “D” clients per AP
• Each client runs HTTP/FTP workloads
• Vary stretch “s”: scaling factor for inter-AP distances
[Figure: Map Showing Portion of Pittsburgh Data]
                                                           13
Impact on HTTP Performance
   3 clients per AP. 2 clients run FTP sessions.
                All others run HTTP.
                    300 seconds
[Figure: two panels (5s sleep time, 20s sleep time); labels: Degradation, Max interference, No interference]
                                                        14
What can we do to reduce
     interference?




                           15
       Optimal Channel Allocation vs.
Optimal Channel Allocation + Tx Power Control

         Channel Only      Channel + Tx Power Control




                                                    16
Incentives for Self-management
• Clear incentives for automatically selecting
  different channels
  – Disputes can arise when configured manually
• Selfish users have no incentive to reduce
  transmit power
• Power control implemented by vendors
  – Vendors want dense deployments to work
• Regulatory mandate could provide further
  incentive
  – e.g. higher power limits for devices that implement
    intelligent power control

                                                          17
Power and Rate Selection Algorithms
 • Rate Selection
    – Auto Rate Fallback: ARF
    – Estimated Rate Fallback: ERF
 • Joint Power and Rate Selection
    – Power Auto Rate Fallback: PARF
       • TxPower is reduced after a given number of successful pkts until
         MinPower is reached or a threshold number of failures is reached
       • If failure continues, txPower is increased
    – Power Estimated Rate Fallback: PERF
       • TxPower is reduced until estimatedSNR =
         decisionThreshold+powerMargin
    – Conservative Algorithms
       • Always attempt to achieve highest possible modulation rate
 • Implementation
    – Modified HostAP Prism 2.5 driver
       • Can’t control power on control and management frames         19
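
The PARF power rule above can be sketched as a small controller. This is only an illustration of the stated behavior (lower power after a run of successes, raise it after repeated failures); the thresholds, step size, and power limits are hypothetical values, and the rate-selection half of PARF is left out.

```python
from dataclasses import dataclass


@dataclass
class PowerController:
    """Power half of a PARF-style controller (rate adaptation omitted)."""

    max_power_dbm: int = 20      # hypothetical limits, not taken from the slides
    min_power_dbm: int = 0
    step_db: int = 1
    success_threshold: int = 10  # consecutive successes before lowering power
    failure_threshold: int = 2   # consecutive failures before raising power
    power_dbm: int = 20
    successes: int = 0
    failures: int = 0

    def on_tx_result(self, acked: bool) -> int:
        """Update counters for one transmitted packet and return the new power."""
        if acked:
            self.successes += 1
            self.failures = 0
            if self.successes >= self.success_threshold:
                # Enough successes in a row: try a lower power, never below MinPower.
                self.power_dbm = max(self.min_power_dbm, self.power_dbm - self.step_db)
                self.successes = 0
        else:
            self.failures += 1
            self.successes = 0
            if self.failures >= self.failure_threshold:
                # Failures continue: back off to a higher power, never above the cap.
                self.power_dbm = min(self.max_power_dbm, self.power_dbm + self.step_db)
                self.failures = 0
        return self.power_dbm


pc = PowerController()
for _ in range(10):
    pc.on_tx_result(True)
print(pc.power_dbm)  # one step below the starting power
```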
                                     Lab Interference Test
Topology
[Figure: Victim Pair (TCP benchmark) and Aggressor Pair (rate limited file transfer); path loss labels: 79 dB, 110 dB, 95 dB]
Results
[Figure: bar chart of Throughput (Mbps), axis 0 to 4, for No Interference, ARF, ERF, PERF]
                                                                                                                                    20
                    Conclusion
• Significant densities of APs in many metro areas
   – Many APs not managed

   – High densities could seriously affect performance

• Static channel allocation alone does not solve the
  problem
• Transmit power control effective at reducing impact

• What other control knobs can we use to reduce
  interference?


                                                         21
Architecture and Techniques for
Diagnosing Faults in IEEE 802.11
   Infrastructure Networks

      Atul Adya, Victor Bahl,
     Ranveer Chandra, Lili Qiu


       Microsoft Research
                                 22
    Wireless Network Woes
• How many times have you heard users say:
  – “My machine says: wireless connection unavailable”
  – “Why can’t my machine authenticate?”
  – “My performance on wireless really sucks”

IT Dept: Several hundred complaints per month

• You may have heard network admins say:
  – “I wonder if someone has sneakily installed an
    unauthorized access point”
  – “Do we have complete coverage in all the
    buildings?”
                                                      23
Enterprise Wireless Problems

Main problems observed by IT department:
  – Connectivity: RF Holes
  – Performance: Unexplained delay
  – Security: Rogue APs
  – Authentication: 802.1x protocol issues




                                             24
             Existing Products
• Provide management/diagnostic functions
  – E.g., AirWave, CA’s NSM, AirDefense, AirMagnet
  – Proprietary and no publicly available technology
    details


• Insufficient functionality:
  –   No support for disconnected clients
  –   Weak root-cause analysis (raw data, mostly)
  –   Diagnosis only from the AP perspective
  –   Sometimes need expensive sensor deployment

                                                       25
         Our Contributions
• Flexible client-based framework for
  detection and diagnosis of wireless faults

• Client Conduit: communication for disconnected
  clients via nearby connected clients

• Diagnostic mechanisms
  – Approximate location of disconnected clients
  – Rogue AP detection
  – Performance problem analysis
                                                   26
                   Outline
• Diagnostics architecture and implementation
• Client Conduit: diagnosing disconnected
  clients
• Diagnostic mechanisms
  – Locating disconnected clients
  – Detecting unauthorized APs
  – Analyzing performance problems
• Summary and Future Work

                                                27
               Assumptions
• Can install diagnostic software on clients
  – APs are typically closed platforms
  – Can provide improved diagnosis with modified APs

• Nearby clients available for fault diagnosis
  – At least 13 active clients on our floor (approx.
    2500 sq. feet)

• Network admins maintain AP Location
  Database


                                                       28
     Client-Centric Architecture
[Figure: architecture diagram. Components: Diagnostic Server (DS); Authentication/User Info (RADIUS, Kerberos); Diagnostic AP Module (DAP); Legacy AP; Diagnostic Client Module (DC); Disconnected Client, reached via the Client Conduit]

                                                                 29
      Diagnostic Architecture
            Properties
• Exploits client-view of network (not just APs)

• Supports proactive and reactive mechanisms

• Scalable

• Secure: doesn’t introduce new security
  problems

                                               30
            Client Implementation
• Prototype system on Windows
• Native WiFi: Extensibility framework for 802.11
  [Microsoft Networking 2003]
• Daemon: most of functionality and main control flow
• IM driver: limited changes
  – Packet capture & monitoring
[Figure: software stack. User Mode: Diagnostics Daemon. Kernel Mode: TCP/IP; NDIS with the Diagnostics IM Module in the Native WiFi IM Driver and the Diagnostics Miniport Module in the Native WiFi Miniport Driver; Native WiFi NIC]
                                                                         31
What are causes for
  disconnection?




                      33
     Cause of Disconnection

• Lack of coverage
  – In an RF Hole
  – Just outside AP range
• Authentication issues, e.g., stale certificates
• Protocol problems, e.g., no DHCP address

Can we communicate via nearby connected
  clients?

                                                    34
  Communication via Nearby Clients
[Figure: Disconnected Client “Grumpy” (Adhoc Mode) sends to Connected Client “Happy” (Infrastructure), which is associated with the Access Point. “Happy” cannot be on 2 networks. Packet dropped!]
 Possible (unsatisfactory) solutions:
 • Multiple radios: extra radio for diagnostics
 • MultiNet [InfoCom04]: Multiplex “Happy”
   between Infrastructure/Adhoc modes
 Penalizing normal case behavior for rare scenario
                                                                  35
   Our Solution: Client Conduit
[Figure: animated exchange between Disconnected Client “Grumpy” and Connected Client “Happy”, which is associated with the Access Point. Labels: “Grumpy” becomes an Access Point (starts beaconing); SOS (Beacon); “Happy”: disconnected station detected; SOS Ack (Probe Req); “Grumpy” stops beaconing; ad hoc network via MultiNet; “Grumpy” becomes “Not-so-Grumpy”]
         Help disconnected wireless clients with:
         • Online diagnosis
         • Certificate bootstrapping
                                                                  36
     Client Conduit Features

• Incurs no extra overhead for connected
  clients
  – Use existing 802.11 messages: beacons & probes


• Works with legacy APs


• Includes security mechanisms to avoid abuses


                                                     37
                     Client Conduit Performance
[Figure: stacked bar chart of Time (seconds), axis 0 to 8. Bar labels: 6.7 seconds; 2.7 seconds; No mode changes. Stack components: Set channel, Become AP, Set SSID, Set Beacon Period, Get Ack, Become Station, Adhoc-mode association]

• Time for “Grumpy” to get connected < 7 seconds
    – Reduced time can enable transparent recovery
• Bandwidth available for diagnosis > 400 Kbps
  (when “Happy” donates only 20% of time)
                                                                           38
Security issues?




                   39
    Client Conduit Security and
              Attacks
• Performance degradation of helping
  disconnected clients
  – Connection setup: low overhead
  – Data transfer potential issues
     • Enter multinet mode unnecessarily
     • Waste bandwidth for bad client
  – Countermeasure
     • Limit how often it enters multinet mode
     • Help only authenticated clients
     • If clients have trouble authenticating, relay the disconnected
       client’s latest diagnosis logs to DS


                                                               40
    Client Conduit Security and
              Attacks
• Preventing disguised rogue APs
  – An AP that beacons the SOS SSID and
    sends/receives data packets → flag as a potential
    rogue device
  – A disconnected client that continues to beacon after
    receiving a Probe Response → flag as a potential
    rogue device




                                                    41
 Locating Disconnected Clients
Goal: Approximately locate to determine RF Holes
Solution: Use nearby connected clients

• “Grumpy” starts beaconing
• Nearby clients report signal strength to server
• Diagnostic server uses RADAR [InfoCom00]
  twice
  – Locates connected clients
  – Locates “Grumpy” with connected clients as “anchor
    points”
• Location error: 10 – 15 meters
                                                     43
         Rogue AP Problems
Why problematic?
• Allow network access to unauthorized users
• Hurt performance: interfere with existing
  APs

Detection goals:
• Common case: mistakes by employees
• Detect unauthorized IEEE 802.11 APs
   – Not considering non-compliant APs
Solution: Use clients for monitoring nearby APs
                                               45
        Rogue AP Detection
• Clients monitor nearby APs. Send to server:
  – MAC address, Channel, SSID, RSSI (for location)
• Server checks 4-tuple in AP Location Database




                                                      46
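
A rough sketch of the server-side check described above, assuming the AP Location Database is keyed by MAC address and that the RSSI reports have already been turned into an estimated (x, y) position. The record layout, the function name, and the 20 m distance tolerance are all hypothetical; the slides do not give the exact matching rule.

```python
import math
from typing import NamedTuple


class APRecord(NamedTuple):
    mac: str
    ssid: str
    channel: int
    x_m: float  # position in meters; for a report, estimated from client RSSI
    y_m: float


def is_potential_rogue(report, ap_db, max_distance_m=20.0):
    """Return True if the reported AP does not match the AP Location Database.

    ap_db maps MAC address to the registered APRecord.
    """
    known = ap_db.get(report.mac)
    if known is None:
        return True  # MAC address was never registered
    if (known.ssid, known.channel) != (report.ssid, report.channel):
        return True  # registered MAC, but wrong SSID or channel
    # Registered MAC at an unexpected location may be a spoofed address.
    return math.hypot(known.x_m - report.x_m, known.y_m - report.y_m) > max_distance_m


db = {"00:11:22:33:44:55": APRecord("00:11:22:33:44:55", "corp", 6, 10.0, 10.0)}
seen = APRecord("00:11:22:33:44:55", "corp", 11, 12.0, 10.0)
print(is_potential_rogue(seen, db))  # channel differs from the database entry
```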
         Rogue AP Detection
             Overheads
• Obtaining AP Information at clients:
  – Same/overlapping channel as client: from Beacons
  – AP on non-overlapping channel:
     • Active Scan periodically
     • AP information from Probe Response




                                                       47
 Rogue AP Detection Overheads
            (Cont.)
• Bandwidth usage < 0.2 Kbps per client

• Can active scans be performed without
  disruption?
  – Sufficient idleness available (2.5 – 3 min.)
  – Simple threshold-based prediction:
    Active scan completed in idle period for 95% cases




                                                     48
Analyzing Performance Problems
• Detect performance problems
  – Why not use throughput for detection?
  – Passively measure delay and loss rates at
    DC
  – How to detect high e2e delay?
  – How to detect high e2e loss rates?




                                                50
Analyzing Performance Problems
• Detect performance problems
  – Don’t use throughput for detection. Why?
  – Passively measure delay and loss rates at DC
  – High delay: a significant number of packets
    experience delay > 250 msec or higher than
    2*current TCP
     • Sender: measure RTT directly
     • Receiver: infer RTT by grouping packets into flights
       and matching with slow start and congestion avoidance
       patterns
  – High loss rates: > 5%
     • Sender: # retransmissions/#transmissions
     • Receiver: # out of sequence pkts/# transmissions


                                                               51
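
The detection thresholds above can be expressed as a few small checks. This is a sketch under assumptions: the slide only says "a significant number of packets", so the 10% fraction used here is a hypothetical stand-in, and only the 250 msec delay condition and the sender-side loss ratio are covered. Function names are illustrative.

```python
def sender_loss_rate(retransmissions, transmissions):
    """Sender-side loss estimate: # retransmissions / # transmissions."""
    if transmissions == 0:
        return 0.0
    return retransmissions / transmissions


def has_high_loss(retransmissions, transmissions, threshold=0.05):
    """High loss means a loss rate strictly above 5%."""
    return sender_loss_rate(retransmissions, transmissions) > threshold


def has_high_delay(delays_ms, delay_threshold_ms=250.0, min_fraction=0.1):
    """High delay means at least min_fraction of packets exceed the delay threshold.

    min_fraction is an assumed value for "a significant number of packets".
    """
    if not delays_ms:
        return False
    slow = sum(1 for d in delays_ms if d > delay_threshold_ms)
    return slow / len(delays_ms) >= min_fraction


print(has_high_loss(6, 100))                       # 6% loss
print(has_high_delay([100, 300, 120, 90], 250.0))  # 1 of 4 packets is slow
```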
Analyzing Performance Problems
• Isolate problem between wired and wireless
   – DAP measures TCP delays in wired and sends to DC
• Diagnose a wireless problem: Estimate Delay using
  Eavesdropping Neighbors (EDEN)
   – DAP and DC exchange snoop requests/responses
   – DAP initiated measures client delay
   – DC initiated measures AP delay




                                                        52
   EDEN Accuracy




Very low error in delay estimate (< 5%)
Estimates improve with increasing delay
                                          53
                   Summary
•   Diagnostics critical for 802.11 deployments
•   Client-centric architecture
•   Client Conduit
•   Diagnosis using nearby clients
    – Locate disconnected clients
    – Detect rogue APs
    – Analyze performance problems
• Prototype in Windows using Native WiFi
    – Mechanisms are effective with low overheads


                                                    54
Jigsaw: Solving the Puzzle of
 Enterprise 802.11 Analysis
               Yu-Chung Cheng
   John Bellardo, Peter Benko, Alex C. Snoeren,
         Geoff Voelker, Stefan Savage




                                                  55
        How to monitor 802.11?

Measurement                          Limitations
AP traces                            Only packets that AP sees
1 passive sniffer                    Limited coverage
N passive sniffers in 1 channel      Limited frequency (roaming, broadband interference, AP channel assignments)
N passive sniffers of all channels   Need synchronized traces
                                                                56
                        Jigsaw
• Measure real large wireless networks
  – Collect all possible information
     • PHY/Link/IP/TCP/App layer trace
     • Collect every single wireless packet
  – Need many sniffers for 100% coverage


• Provide global view of wireless networks across
  time, locations, channels, and protocol layers




                                                57
    New CSE building at UCSD
• 150k square feet
  – 4 floors
• >500 occupants
  – 150 faculty/staff
  – 350 students
• Building-wide WiFi
  – 39 access points
  – 802.11b/g
    • Channel 1, 6, 11
  – 10 - 90 active
    clients anytime
  – Daily traffic ~5 GB        58
UCSD passive monitor
      system
• Overlays existing WiFi
  network
  – Series of passive sniffers
  – Blanket deployment over 4
    floors
• 39 sensor pods (156
  radios)
  – 4 radios per pod, cover all
    channels in use
  – Captures all 802.11 activities
    • Including CRC/PHY events
  – Stream back over wired
    network to a centralized
    storage
                                     59
 Jigsaw design
        Traces
        synchronization
        and unification



       L2 state
       reconstruction




TCP flow
reconstruction
                          60
               Synchronization
•   Create a virtual global clock
    – To keep unification working
    – Critical evidence for analysis
      • If A and B are transmitting at the same time they could
        interfere
      • If A starts transmitting after B has started then A
        can’t hear B
•   Require fine time-scales (10-50us)
    – NTP is >100 usec accuracy
    – 802.11 HW clocks (TSF) have 100PPM stability
[Figure: TSF diff of two sniffers; axes: TSF diff (us) vs. Time (s)]
                                                                            61
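
A quick piece of arithmetic shows why 100 PPM clocks need continuous correction. One part per million corresponds to 1 us of drift per second, so a 100 PPM clock can be off by up to 100 us after one second relative to an ideal clock. The helper names below are hypothetical, and the 20 us budget is only an example value within the 10-50us range named above.

```python
def max_drift_us(stability_ppm, elapsed_s):
    """Worst-case drift in microseconds: 1 PPM equals 1 us of error per second."""
    return stability_ppm * elapsed_s


def resync_interval_s(stability_ppm, budget_us):
    """Longest time between corrections that keeps drift within budget_us."""
    return budget_us / stability_ppm


print(max_drift_us(100, 1.0))      # drift after one second at 100 PPM
print(resync_interval_s(100, 20))  # seconds until a 20 us budget is used up
```

Two free-running sniffers can drift in opposite directions, so their relative error can grow up to twice as fast as either clock alone.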
     Traces synchronization and
             unification
•   Sniffers label packets w/ local timestamp (TSF)
•   Need a global clock
•   Estimate the offset between TSF and the global clock for
    each sniffer




                                                               62
       Trace unification (ideal)
[Figure: per-sniffer traces aligned along a Time axis]
                                   63
       Trace unification (reality)
[Figure: per-sniffer traces along a Time axis merged into the Jigsaw unified trace: JFrame 1, JFrame 2, JFrame 3, JFrame 4, JFrame 5]
                                           64
Challenge: sync at large-scale
[Figure: sniffers 1, 2, 3, 4 relative to reference time To, with offsets ∆t1 and ∆t2]

• How to bootstrap?
     – Goal: estimate the offset between TSF and the global clock for
       each sniffer
     – Time reference from one sniffer to the other
• Sync across channels
     – Dual radios on same sniffer slaved to same clock
• Manage TSF clock skews
     – Continuously re-adjust offsets when unifying frames
                                                                        65
Can all frames be used for
     synchronization?




                             66
         Frame Unification
• Use a search window to identify candidate
  frames for unification among radio queues

• Unify identical frames into a jframe
  timestamped with the median offset frame

• Adjust the time offsets of the radio queues
  to account for skew, and use the search
  window for the next set of frames

                                                67
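
The three unification steps can be sketched as a simple merge loop. This is a simplified illustration, not the Jigsaw implementation: it only compares the head of each radio queue, matches frames by identical bytes, and the queue layout, function name, and window size are assumptions.

```python
from collections import deque
from statistics import median


def unify(queues, offsets, window_us=50):
    """Merge per-sniffer queues into a list of jframes.

    queues:  {sniffer: deque of (local_tsf_us, frame_bytes)}, in local time order.
    offsets: {sniffer: offset_us} mapping local TSF to global time; updated in place.
    Returns a list of (global_ts_us, frame_bytes, [sniffers that heard it]).
    """
    jframes = []
    while any(queues.values()):
        # Start from the queue head with the earliest estimated global time.
        lead = min((s for s in queues if queues[s]),
                   key=lambda s: queues[s][0][0] + offsets[s])
        lead_ts = queues[lead][0][0] + offsets[lead]
        content = queues[lead][0][1]
        # Collect identical frames at other queue heads inside the search window.
        matches = {}
        for s, q in queues.items():
            if q and q[0][1] == content and abs(q[0][0] + offsets[s] - lead_ts) <= window_us:
                matches[s] = q.popleft()[0] + offsets[s]
        # The jframe takes the median timestamp of its instances.
        ts = median(matches.values())
        # Re-adjust each contributing sniffer's offset to absorb clock skew.
        for s, seen in matches.items():
            offsets[s] += ts - seen
        jframes.append((ts, content, sorted(matches)))
    return jframes


queues = {
    "A": deque([(1000, b"f1"), (2000, b"f2")]),
    "B": deque([(5004, b"f1"), (6010, b"f2")]),
    "C": deque([(2995, b"f2")]),  # sniffer C missed the first frame
}
offsets = {"A": 0, "B": -4000, "C": -1000}
for jframe in unify(queues, offsets):
    print(jframe)
```

Adjusting the offsets after every jframe is what lets a small search window keep working as the sniffer clocks skew apart.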
             Jigsaw in action
• Jigsaw unifies 156 traces into one global trace
• Covers 99% of AP frames, 96% of client frames

Starts                       Jan 24, 2006 (Tuesday)
Duration                     24 hr
Total APs                    107 (39 CSE)
CSE Clients                  1026
Active CSE clients anytime   10 - 90
Total Events                 2,700M
PHY/CRC Errors               48%
Valid Frames                 52%
JFrames                      530M
Events per Jframe            2.97

                                                               68
Jigsaw syncs 99% frames < 20us
• Measure sync. quality by
  max dispersion per Jframe




• 20 us is important threshold
  –   802.11 back-off time is 20 us
  –   802.11 inter frame time is 50 us
  –   Sufficient to infer many 802.11 events




                                      69
        Hidden terminal problems
    •    How many packets are lost due to hidden terminals?
[Figure: sender, receiver, and hidden terminal]

•       Infer transmission failure by absence of ACK
•       Estimate conditional probability of loss given
        simultaneous transmission by some hidden-terminal


                                                            70
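
The conditional loss estimate above can be sketched as a ratio over observed transmissions. This assumes each transmission in the unified trace has already been labeled with whether an ACK followed and whether some hidden terminal was transmitting at the same time; the input layout and function name are hypothetical.

```python
def loss_given_overlap(transmissions):
    """Estimate P(loss | simultaneous transmission by a hidden terminal).

    transmissions is an iterable of (acked, overlapped) booleans. A missing
    ACK is treated as a transmission failure. Returns None when no
    transmission overlapped, since the estimate is then undefined.
    """
    outcomes = [acked for acked, overlapped in transmissions if overlapped]
    if not outcomes:
        return None
    lost = sum(1 for acked in outcomes if not acked)
    return lost / len(outcomes)


trace = [(False, True), (True, True), (True, False), (False, False)]
print(loss_given_overlap(trace))  # half of the overlapped transmissions were lost
```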
      Hidden Terminal Problems




•   10% of sender-receiver pairs have over 10% losses
    due to hidden terminals                             71
            Trace analysis


[Figures: 802.11 b/g interactions; TCP loss rate in wireless vs. in Internet]




                                                             72
            Moving forward
• Developed “Jigsaw” that allows
  – 24x7 monitor system in UCSD CSE w/ 156 sniffers
  – Global fine-grained view of large wireless network
    (time, locations, channels)
  – Jigsaw software is available

• Ongoing work
  – Root cause diagnoses of end-to-end performance in
    wireless networks
  – Standard wireless problem analysis
     • Ex. Exposed terminal problems

                                                    73

				