					Security services and the IXP

             Wu-chang Feng

       Systems Software Laboratory
Dept. of Computer Science and Engineering
                 About the project..
• 6 months old
   – Just started, pardon the vapor
• Supported by Intel (12/2001) and ETIC (4/2002)
   – Graduate Students
      • Francis Chang:
      • Deepa Srinivasan:
      • Jin Choi (1/2003):
   – Undergraduate Interns from Charles Consel’s group
      • Ludovic Martorel
      • Damien Berger
                      Talk outline
•   IXP and network security research
•   Packet classification
•   Packet classification caching strategies
•   Curriculum
The IXP and network security
           A research opportunity
  – Provides an open high-speed networking platform
  – Research enabler
     • Analyzing packet classification/routing algorithms
     • Analyzing packet classification/routing lookup caching
     • Security functions
  – Sandbox to test and compare algorithms on a real platform
                   IXP and research
• Quickly becoming the ns (network simulator) of experimental networking
   – Open hardware
   – Open software
• What’s needed?
• A library of reference implementations and benchmarks
   –   IP route lookup (longest-prefix match) algorithms
   –   General packet classification algorithms
   –   Route and classification lookup caching algorithms
   –   Security functions
                  Our focus: Security
• Borrow and use liberally…
   –   Princeton (VERA)
   –   Columbia (NetBind)
   –   Georgia Tech (IDS)
   –   Utah (Emulab)
   –   Others..
• Build what’s missing
   – Range of full packet classifiers
   – Range of lookup caching algorithms
   – Merging the goals of research and education
      • A security-focused IXP laboratory course
• Eventually, examine additional security services
   – Anomaly detection
   – Content filtering
   – etc.
Packet classification

 Student: Deepa Srinivasan
               Packet classification
• Use the IXP and open-source tools to
   – Compare full packet classification algorithms
   – Benchmark algorithms via real rule sets and real traffic traces
   – Explore adaptive packet classifiers
   A hard, but well-studied problem
• What are the key issues?
   – Storage
   – Search time
   – Update time
• General filter matching problem ~ Problems in
  computational geometry
   – N=number of filters or rules, d=number of dimensions
   – Requires
      • O(log N) time with O(N^d) space, or
      • O((log N)^(d-1)) time with O(N) space
• Classic space-time tradeoff problem
     A space-time tradeoff example
• Hierarchical tries: slow and compact
• Set-pruning tries: fast and large
            Hierarchical Trie

(Figure should terminate at R2)
Set-pruning Trie
          A space-time tradeoff example
• Hierarchical tries vs. Set-pruning tries (worst-case)
 Algorithm            Time       Storage    Updates   Notes
 Linear Search        N          N          1         Simple, poor scaling, iptables
 Hierarchical trie    W^d        NdW        d^2 W     Backtracking search
 Set-pruning trie     dW         N^d        N^d       Fast retrieval at the cost of storage. Good for
                                                      relatively static classifiers.

    N = number of rules, W = width of each dimension, d = number of dimensions
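
The backtracking search listed for the hierarchical trie can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the project: it assumes two dimensions (source and destination), prefixes written as bit strings such as "00" (the empty string is a wildcard), and a lower number meaning a higher priority. The names Node, insert, and classify are hypothetical.

```python
class Node:
    """One trie node; source-trie nodes may point to a destination trie."""
    def __init__(self):
        self.child = {}       # '0' or '1' -> Node
        self.next_dim = None  # root of the destination trie (source nodes only)
        self.rules = []       # rule priorities (destination nodes only)


def insert(root, src_prefix, dst_prefix, priority):
    """Store a rule under its source prefix, then under its destination prefix."""
    node = root
    for bit in src_prefix:
        node = node.child.setdefault(bit, Node())
    if node.next_dim is None:
        node.next_dim = Node()
    node = node.next_dim
    for bit in dst_prefix:
        node = node.child.setdefault(bit, Node())
    node.rules.append(priority)


def _nodes_along(root, bits):
    """Yield every node on the path that `bits` traces from `root`."""
    node = root
    yield node
    for bit in bits:
        node = node.child.get(bit)
        if node is None:
            return
        yield node


def classify(root, src_bits, dst_bits):
    """Backtracking search: every matching source prefix triggers a
    separate walk of that prefix's destination trie."""
    best = None
    for s_node in _nodes_along(root, src_bits):
        if s_node.next_dim is None:
            continue
        for d_node in _nodes_along(s_node.next_dim, dst_bits):
            for priority in d_node.rules:
                if best is None or priority < best:
                    best = priority
    return best
```

Each rule is stored once, which keeps the structure compact, but a lookup may walk one destination trie per matching source prefix, which is where the W^d worst-case search time comes from. A set-pruning trie avoids the repeated walks by copying rules into every destination trie they could match, at the cost of storage.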
              Packet classification
• Approaches
  – Generic classifiers
     • Optimized for best worst-case performance
  – Heuristic classifiers
     • Take advantage of structure in rule sets (as done with IP
       router lookups)
     • Trade off worst-case speed, storage, and update time
       for better speed and storage in the common case
  – Hardware classifiers
     • Throw hardware and parallel processing at the problem
     • Serves as a wish-list for the IXP
         – Is a hardware-based packet classification engine in the works?
         – Can I go home?
         – Will I need to shoot myself when the IXP4xxx comes out?
 So many algorithms, so little time…
• Which one to choose?
   –   Hierarchical tries with backtracking search
   –   Set-pruning tries
   –   Bit vector, Fractional cascading [Lakshman98]
   –   Aggregated bit vector [Baboescu00]
   –   Grid of tries, Cross-producting [Srinivasan98]
   –   Area-based quadtrees [Buddhikot99]
   –   Fat inverted segment tree [Feldman00]
   –   Tuple-space search [Srinivasan99]
   –   Recursive flow classification [Gupta99]
   –   Hierarchical intelligent cuttings [Gupta00]
• Performance and cost a function of
   –   d = number of dimensions
   –   W = width of dimensions
   –   N = number of rules
   –   l = number of levels in tree (FIS-tree only)
             Summary of schemes [Gupta00]
Algorithm                   Time              Storage           Updates     Notes
Linear Search               N                 N                 1           Simple, poor scaling
Hierarchical trie           W^d               NdW               d^2 W
Set-pruning trie,           dW                N^d               N^d         Fast retrieval at the cost of storage. Good for
Cross-producting                                                            relatively static classifiers.
Grid-of-tries               W^(d-1)           NdW               NdW         Rebuild for each update; Could be used for last 2
                                                                            dimensions of a multi-dimensional hierarchical trie.
AQT                         aW                NW                a N^(1/a)   a is a tunable integer parameter
FIS-tree                    (l + 1) W         l x N^(1 + 1/l)   --          Tree must be recomputed on update
RFC                         d                 N^d               ---         Not suitable for large sets of rules (> 6000);
                                                                            preprocessing and large storage space. 10Gbps line
                                                                            rates in hardware and 2.5Gbps rates in software.
Hierarchical Intelligent    d                 N^d               ---         Parameters can be tuned to trade-off query time
Cuttings                                                                    against storage requirements.
Tuple-space search          M                 N                 1           Performs well for multiple dimensions if the number
                                                                            of tuples (i.e. hash entries) are small. Only supports
                                                                            prefixes; generic rules increase storage complexity.
Ternary CAM                 1                 N                 1*          Simple; Good for small classifiers; Costly
Bit vector                  dW + N/memwidth   dN^2              ---         Incremental updates not supported; Good for multiple
                                                                            dimension and a small number of rules

N=# of rules, W=Width of dimensions, d=# of dimensions, l=levels of tree, M=# of Tuples
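
To show where the M in the tuple-space row comes from, here is a minimal Python sketch of tuple-space search. It is an illustration under assumptions, not a reference implementation: two dimensions only, prefixes as bit strings, lower numbers meaning higher priority, and a hypothetical class name TupleSpace.

```python
from collections import defaultdict


class TupleSpace:
    """Rules are grouped by their (source length, destination length) tuple;
    each tuple gets its own hash table keyed on the exact prefix bits."""

    def __init__(self):
        # (src_len, dst_len) -> {(src_prefix, dst_prefix): best priority}
        self.tables = defaultdict(dict)

    def add(self, src_prefix, dst_prefix, priority):
        table = self.tables[(len(src_prefix), len(dst_prefix))]
        key = (src_prefix, dst_prefix)
        table[key] = min(priority, table.get(key, priority))

    def classify(self, src_bits, dst_bits):
        """Probe one hash table per tuple: M probes for M distinct tuples."""
        best = None
        for (src_len, dst_len), table in self.tables.items():
            priority = table.get((src_bits[:src_len], dst_bits[:dst_len]))
            if priority is not None and (best is None or priority < best):
                best = priority
        return best
```

Lookup cost grows with the number of distinct prefix-length combinations rather than with the number of rules, which is why the scheme does well when rule sets use only a few tuples.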
                  Is there a winner?
• Not really, it depends on….
   –   Rule sets
   –   Incoming traffic characteristics
   –   Metric desired (average vs. worst-case lookup time)
   –   Hardware cost (memory, ternary CAM)
        • How much chip area did that 16-entry CAM on the
          IXP2xxx take?
           Adaptive packet classifiers
• Hypothesis
   – Value in adaptation
   – Reconfigure for high speed based on the available memory and the rule set,
     given a fixed hardware configuration and performance metric
• Approach
   – Implement a small set of classifiers
   – Build modules that translate ipchains/iptables/netfilter rule sets into data
     structures of individual classifiers
   – Study adaptation policies for classifiers based on rule analysis
   – Implement seamless switching between implementations (i.e. double
     buffering [Partridge98])
   – Performance evaluation using
       • Library of publicly available rule sets
       • Publicly available traffic traces
       • An Emulab with loadable IXPs 
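
As a rough idea of what a rule-translation module might look like, the Python sketch below turns a simplified iptables-style rule line into the bit-string prefixes a trie-based classifier could consume. It is an assumption-laden illustration: only the -s, -d, and -j options are handled, ports and protocols are ignored, and the function names prefix_bits and translate are hypothetical.

```python
import ipaddress
import shlex


def prefix_bits(cidr):
    """Return the prefix of an IPv4 CIDR block as a string of '0'/'1' bits."""
    net = ipaddress.ip_network(cidr, strict=False)
    return format(int(net.network_address), "032b")[:net.prefixlen]


def translate(rule_line):
    """Map a simplified rule line to (source bits, destination bits, target).

    Missing -s or -d options default to the wildcard 0.0.0.0/0, which
    becomes the empty bit string.
    """
    tokens = shlex.split(rule_line)
    fields = {"-s": "0.0.0.0/0", "-d": "0.0.0.0/0", "-j": None}
    # Pair every token with its successor and keep the options we know.
    for flag, value in zip(tokens, tokens[1:]):
        if flag in fields:
            fields[flag] = value
    return prefix_bits(fields["-s"]), prefix_bits(fields["-d"]), fields["-j"]


if __name__ == "__main__":
    print(translate("-A FORWARD -s 10.0.0.0/8 -d 192.168.1.0/24 -j DROP"))
```

Keeping the translation separate from any one classifier means the same rule set can be loaded into several data structures and compared.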
Classification lookup caching

       Student: Francis Chang
       Caching and IP route lookups
• IP destination-based routing
   – A one-dimensional packet classifier
• Caching instrumental in building gigabit IP routers
   – Full lookup extremely expensive to support at high rates
   – Cache of 12,000 entries gives 95% hit rate [Jain86, Feldmeier88,
     Heimlich90, Jain90, Newman97, Partridge98]
   – “A 50 Gb/s IP Router” [Partridge98]
      • Switched interconnection fabric
      • Alpha 21164-based forwarding cards (separate from line cards)
      • First-level on-chip caches Icache=8kB (2048 instructions), Dcache=8kB
      • Secondary on-chip cache=96kB
           – Fits 12000 entry route cache in memory
           – 64 bytes per entry presumably due to cache line size
       • Tertiary cache=16MB (full, double-buffered route table)
Caching and multi-dimension lookups
• Flow-based firewalls
   – A five-dimensional packet classifier
• Caching even more important
   – Full classification algorithms will not run anywhere near line-
     speed on the current incarnation of the IXP
   – Inherently harder to do
   – Much lower hit rates [Xu00]
   – Rule and traffic dependent
                Current approaches
• Direct-mapped hashing with LRU replacement
   – Typical for IP route caches [Partridge98]
• Parallel hashing and searching with set-associative
  hardware [Xu00]
   – ASIC solution with parallel processing and a fixed, LRU
     replacement scheme
• Proprietary vendor solutions
   – ?
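
For reference, a direct-mapped classification cache can be sketched in a few lines of Python. This is a simplified illustration, not the design of any router cited above: the flow key is assumed to be a 5-tuple of integers, Python's built-in hash stands in for a hardware hash function, and the class name DirectMappedCache is hypothetical. With one entry per slot, a colliding flow simply overwrites the previous occupant.

```python
class DirectMappedCache:
    """Each flow hashes to exactly one slot; a miss falls back to the
    full (slow) classifier and overwrites whatever the slot held."""

    def __init__(self, slots, classify):
        self.slots = [None] * slots
        self.classify = classify   # full classifier, called only on a miss
        self.hits = 0
        self.misses = 0

    def lookup(self, flow):
        index = hash(flow) % len(self.slots)
        entry = self.slots[index]
        if entry is not None and entry[0] == flow:
            self.hits += 1
            return entry[1]
        self.misses += 1
        result = self.classify(flow)
        self.slots[index] = (flow, result)
        return result


if __name__ == "__main__":
    cache = DirectMappedCache(1024, lambda flow: "ACCEPT")
    flow = (0x0A000001, 0x0A000002, 1234, 80, 6)  # src, dst, sport, dport, proto
    cache.lookup(flow)
    cache.lookup(flow)
    print(cache.hits, cache.misses)
```

Two active flows that hash to the same slot evict each other on every packet, which is the weakness that set-associative designs try to reduce.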
                 Class-based caching
• Structure of application traffic can provide useful information
• W. Feng, F. Chang, W. Feng, J. Walpole, “Provisioning On-line
  Games: A Traffic Analysis of a Busy Counter-Strike Server”
   – Packet load of an on-line game server over 10ms intervals
• Game traffic
   –   Large number of periodic packets
   –   Extremely small packet sizes
   –   Persistent flows
   –   Small number of clients per server
   –   Without caching, a packet classification disaster
   –   With caching, a poster-child for LFU replacement?
• Web traffic
   – Bursty, heavy-tailed packet arrival
   – Many more clients per server
   – Small number of packets per flow
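
A toy simulation hints at why persistent game flows might favor LFU. The Python sketch below is purely illustrative and uses a synthetic trace, not measured traffic: two long-lived "game" flows send two packets each per round, followed by two one-packet "web" flows, and the cache holds three entries. All names and parameters are assumptions.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity):
    """Count hits for a cache that evicts the least recently used flow."""
    cache, hits = OrderedDict(), 0
    for flow in trace:
        if flow in cache:
            hits += 1
            cache.move_to_end(flow)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)
            cache[flow] = True
    return hits


def simulate_lfu(trace, capacity):
    """Count hits for a cache that evicts the least frequently used flow
    (ties go to the entry inserted earliest)."""
    counts, hits = {}, 0
    for flow in trace:
        if flow in counts:
            hits += 1
            counts[flow] += 1
        else:
            if len(counts) >= capacity:
                victim = min(counts, key=counts.get)
                del counts[victim]
            counts[flow] = 1
    return hits


def mixed_trace(rounds):
    """Two persistent game flows interleaved with one-shot web flows."""
    trace = []
    for r in range(rounds):
        trace += ["game-1", "game-2", "game-1", "game-2",
                  f"web-{2 * r}", f"web-{2 * r + 1}"]
    return trace


if __name__ == "__main__":
    trace = mixed_trace(10)
    print("LRU hits:", simulate_lru(trace, 3))
    print("LFU hits:", simulate_lfu(trace, 3))
```

On this synthetic 60-packet trace, LRU scores 20 hits and LFU 38, because the one-shot web flows keep pushing a game flow out of the LRU cache while LFU retains both. Real traces would be needed to see whether the effect holds in practice.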
                      Goal of study
• Attack the packet classification caching problem
• Resource requirements and data structures for high
  performance packet classification caches
• “Segregate, Hash, and Cache”
   – Understand traffic characteristics
   – Examine hierarchical class-based partitioning of cache
   – Examine class-based partitioning of classification function (i.e.
   – Examine alternative replacement algorithms per class such as

Student: Jin Choi
    An IXP course for OGI/OHSU
• Goal
  – Spread the IXP gospel
  – Provide students with experience on a modern networking platform
     • Train (and test drive) potential Ph.D. students
     • Train future Intel employees
         – 171 OGI/OHSU alums @ Intel
         – Intel is the single largest employer of OGI/OHSU graduates
• Ask for help
   – Dirk & Raj (PCs, IXP boards, and support)
   – Ken Mackenzie (course material and advice)
• Keep it simple
• Align with security research project
• Ask for feedback
   – Curriculum completed
   – Guide and slide presentation available at
   – Course will be offered as CSE58?: Networking Practicum
   – Scheduled for Spring 2003
                    The course itself
• Errata
   – Weekly 3-hour sessions
   – Dedicated laboratory of 10 IXP workstations
      • Cloned via Norton Ghost
• Week #1
   – Conceptual framework
   – IXP architecture
      • Hardware: StrongARM, memory resources, micro-engines
      • Software: ACEs, microACEs
• Week #2
   – Introduce Linux/Windows2000/VMware, and the IXP platform
   – Remedial Linux network administration material
       • ifconfig, route, netstat, ipchains, ping, traceroute, arp etc.
   – Learn the IXP environment setup/configuration
       • Building core components on Linux using standard GNU toolchain
       • Building microcode using microengine toolchain on Windows2000
            The course itself (cont.)
• Week #3
  – Build and run the L3 forwarder application
     • Test with external sources and sinks
• Week #4
  – Add a packet counter to the L3 forwarder
     • Makes sure that everyone with a CS degree from OGI/OHSU has
       programmed in assembly code at some point.
• Week #5
  – In-line port filter
      • Add microcode to block TCP segments based on destination port
  – Code review of L3 forwarder to design full port filter
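
The decision logic of the Week #5 in-line port filter can be previewed in Python before it is written in microcode. The sketch below is a stand-in under assumptions, not course material: it takes a raw IPv4 packet without the link-layer header, ignores fragmentation, and uses a hypothetical blocked port of 23. The helper make_packet exists only to build test input.

```python
import struct

BLOCKED_PORT = 23  # hypothetical choice for illustration


def should_drop(packet, blocked_port=BLOCKED_PORT):
    """Return True if `packet` is an IPv4 TCP segment for the blocked port."""
    if len(packet) < 20:
        return False                      # too short for an IPv4 header
    version_ihl = packet[0]
    if version_ihl >> 4 != 4:
        return False                      # not IPv4
    header_len = (version_ihl & 0x0F) * 4  # IHL is counted in 32-bit words
    if packet[9] != 6:
        return False                      # protocol field: 6 means TCP
    if len(packet) < header_len + 4:
        return False                      # TCP ports are missing
    _src_port, dst_port = struct.unpack_from("!HH", packet, header_len)
    return dst_port == blocked_port


def make_packet(protocol, dst_port):
    """Build a minimal IPv4 header followed by the two TCP/UDP port fields."""
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 24, 0, 0, 64, protocol, 0,
                         bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    return header + struct.pack("!HH", 40000, dst_port)


if __name__ == "__main__":
    print(should_drop(make_packet(6, 23)))   # TCP to the blocked port
    print(should_drop(make_packet(6, 80)))   # TCP to another port
```

The same three checks (IP version, protocol, destination port at an offset derived from the header length) are what the microcode has to perform on each packet.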
           The course itself (cont.)
• Week #6: continued
           The course itself (cont.)
• Week #6
  – Full port filtering functionality
     • Pass port numbers to be blocked as arguments
     • SRAM management (allocation and initialization of multi-
       stride trie in the core component, access to data structure
       from the microengine)
     • Add logic in core component to handle port filtering of
       exceptional packets
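
The multi-stride trie used for the blocked-port set can be sketched in Python to show the data structure before it is laid out in SRAM. This is an illustration under assumptions: a stride of 4 bits over the 16-bit port number (so four levels of 16-entry nodes), Python lists in place of SRAM blocks, and hypothetical function names.

```python
STRIDE = 4                      # bits consumed per level (assumed)
FANOUT = 1 << STRIDE            # 16 entries per node
LEVELS = 16 // STRIDE           # 4 levels cover a 16-bit port number


def new_node():
    return [None] * FANOUT


def _index(port, level):
    """Extract the STRIDE bits of `port` that select the entry at `level`."""
    shift = 16 - STRIDE * (level + 1)
    return (port >> shift) & (FANOUT - 1)


def block_port(root, port):
    """Allocate nodes along the path and mark the final entry as blocked."""
    node = root
    for level in range(LEVELS - 1):
        i = _index(port, level)
        if node[i] is None:
            node[i] = new_node()
        node = node[i]
    node[_index(port, LEVELS - 1)] = True


def is_blocked(root, port):
    """Follow one entry per level; a missing entry means the port is allowed."""
    node = root
    for level in range(LEVELS):
        node = node[_index(port, level)]
        if node is None:
            return False
    return node is True


if __name__ == "__main__":
    root = new_node()
    for p in (23, 80):
        block_port(root, p)
    print(is_blocked(root, 23), is_blocked(root, 8080))
```

A lookup costs a fixed four memory reads regardless of how many ports are blocked, and nodes are allocated only along paths that are actually used. In the course setting the core component would build the structure and the microengine would perform the reads.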
            The course itself (cont.)
• Week #7-#10
  – Propose and implement functions of their own for a final project
     • Packet classifiers
     • Classification lookup caching
                      Future work
• Support for high-speed intrusion and anomaly detection
  (E-boxes and A-boxes)
   – Content-based filters
      • Basic network-level filters (Snort)
      • Application-specific filters (Bro)
   – Usage-based filters
      • Accounting
      • Logging
     What makes sense on an IXP?
• Function-based decomposition used in security
   – Common Intrusion Detection Framework (CIDF) [Porras01]
      • Event generators (E-boxes)
         – produce entries based on filtered activities
      • Event databases (D-boxes)
         – store events in a persistent manner
      • Event analyzers (A-boxes)
         – synthesize higher-level activity from a range of individual events
      • Response units (R-boxes)
         – perform actions based on events
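
To make the four CIDF roles concrete, the Python sketch below chains them into a toy pipeline. It is a hypothetical illustration, not part of CIDF or of this project: events are plain dictionaries, the analyzer flags a source that touches many distinct destination ports, and the threshold and all names are assumptions.

```python
from collections import defaultdict


def event_generator(packets):
    """E-box: turn filtered packet observations into event records."""
    for src, dst_port in packets:
        yield {"src": src, "dst_port": dst_port}


class EventDatabase:
    """D-box: keep events (in memory here; persistent in a real system)."""

    def __init__(self):
        self.events = []

    def store(self, event):
        self.events.append(event)


def analyzer(events, threshold):
    """A-box: report sources that contacted at least `threshold` distinct ports."""
    ports = defaultdict(set)
    for event in events:
        ports[event["src"]].add(event["dst_port"])
    return sorted(src for src, seen in ports.items() if len(seen) >= threshold)


def response_unit(suspects):
    """R-box: produce an action for each suspect source."""
    return [f"block {src}" for src in suspects]


if __name__ == "__main__":
    packets = [("10.0.0.1", port) for port in range(1, 6)]
    packets += [("10.0.0.2", 80)] * 3
    db = EventDatabase()
    for event in event_generator(packets):
        db.store(event)
    print(response_unit(analyzer(db.events, threshold=4)))
```

Of the four roles, the event generator sits on the per-packet path and the others work on stored events, which is one way to think about which pieces belong on the microengines.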
