					              Packet Classification # 3
              Ozgur Ozturk
              CSE 581: Internet Technology
              Winter 2002




Packet Classification # 3   CSE 581: Internet Technology (Winter 2002)   Ozgur Ozturk   02/11/02
        Introduction
                Importance
                     Identify the context of packets 
                      Apply necessary actions
                     Differentiated services
                Memory and Time Efficiency
                     Must handle thousands of rules
                     Must run at wire speed (no queuing)


        Packet Classification # 3


        Paper List
                T. Lakshman, D. Stiliadis, "High-Speed Policy-based
                Packet Forwarding Using Efficient Multi-dimensional
                Range Matching" [Bit-Parallelism]
                     http://www.bell-labs.com/user/stiliadi/filter/paper.html
                F. Baboescu, G. Varghese, "Scalable Packet
                Classification" [ABV: Aggregated Bit Vector]
                M. Buddhikot, S. Suri, M. Waldvogel, "Space
                Decomposition Techniques for Fast Layer-4
                Switching" [Space Decomposition]
                V. Srinivasan, G. Varghese, S. Suri, M. Waldvogel,
                "Fast and Scalable Layer Four Switching" [Paper4]

        Bit-Parallelism Paper: Introduction
                Presents packet classification schemes that are
                     traffic-independent, with worst-case
                      performance as the metric
                     able to handle a few thousand rules at rates of
                      millions of packets per second, using range
                      matches on more than 4 packet header fields




        Bit-Parallelism Paper

        Requirement for Real-Time Operation

           Traditional router architectures
                Use flow-cache architectures to classify packets
                Packets of an identified flow are expected to arrive again in
                 the near future
                Current backbone routers
                    Number of active flows is extremely high
                               e.g., 256K flows on OC-3 links
                    Caches are implemented as hash tables
                               which scale well to that size
        Bit-Parallelism Paper

        Requirement for Real-Time Operation (2)
        Hash-Table Problems
            A good hash function is non-trivial
                 100 to 200 bits of header must be distributed randomly over no
                  more than 20 to 24 bits of hash index
                 The distribution of header values is unknown
            Performance of cache-based schemes is heavily
            traffic dependent
            Malicious users
              can exploit the limitations of hashing algorithms and caching techniques

            Packet queuing delays are acceptable only after classification

        Bit-Parallelism Paper

        Packet Classification Constraints
                Scale to large routers with Gigabit links
                Process at wire speed
                     75% of packets are smaller than the typical TCP packet size (552 bytes)
                     Nearly half are 40 to 44 bytes (TCP ACKs)
                Rules on several fields, specifying ranges, exact matches,
                and prefixes
                     Two prefix fields in some cases
                Allow arbitrary priorities for policies, to resolve
                multiple matches
                Optimize for lookups, sacrifice update performance
                     lookup rate / update rate: 10^7


     Bit-Parallelism Paper

     Packet Classification Constraints-2

                Memory access time is the dominant factor in
                worst-case lookup execution time
                Must be amenable to hardware implementation
                Time vs. space trade-off




     Bit-Parallelism Paper

     General Packet Classification

                Decomposable search to perform
                multi-dimensional search for packet filtering
                     k-dimensional query → a set of 1-dimensional queries
                      on 1-dimensional intervals
                     Exploit parallelism where possible
                     Seek poly-logarithmic solution
                Packet header fields → k dimensions
                Filters → overlapping regions in the
                k-dimensional space

     Bit-Parallelism Paper

     Efficiency of Proposed Algorithms

                1st Algorithm
                  Memory: k*n^2 + O(n) bits per dimension

                  Time: log(2n)+1

                  Memory access: n/w

                2nd Algorithm
                  Memory reduce to O(n log n) bits

                  Time increase constant

                  Can be optimized for time and memory budget

                  Exploit on-chip memory in traffic-independent
                   manner, to speed up worst case.

        Notation
                Rule r_m in k dimensions
                     r_m = (e_{1,m}, e_{2,m}, …, e_{k,m})
                     each e_{i,m} is a range in dimension i




     Bit-Parallelism Paper

     Algorithm demo on 2-D: Preprocessing (1)




     Bit-Parallelism Paper


       Algorithm demo on 2-D: Preprocessing (2)

       Max 2n+1 intervals for n rules

     Bit-Parallelism Paper


       Algorithm demo on 2-D: Preprocessing (3)

       Sets of rules formed corresponding to each region


     Bit-Parallelism Paper


       Algorithm demo on 2-D: Online (1)


                Packet P1 = (x*, y*) to be classified
                     Find the intervals that x* and y* belong to
                         binary search → log(2n+1)+1
                            comparisons per dimension
                     Create the intersection of the rule sets of those intervals
                         conjunction (bitwise AND) of the corresponding bit vectors
                     Select the highest-priority entry in the resulting bit
                      vector
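The preprocessing and online steps can be sketched in Python as follows. This is a minimal illustration rather than the paper's implementation: it assumes non-negative integer field values, inclusive ranges, and rules listed with the highest priority first. The names `preprocess` and `classify` and the sample rule set are hypothetical, and Python integers stand in for the hardware bit vectors.

```python
from bisect import bisect_right


def preprocess(rules, dim):
    """Project the rules onto one dimension.

    Returns the sorted start points of the elementary intervals (at most
    2n+1 of them) and, for each interval, a bitmap whose bit m is set when
    rule m covers that interval.
    """
    points = {0}
    for rule in rules:
        lo, hi = rule[dim]
        points.update((lo, hi + 1))
    starts = sorted(points)
    bitmaps = []
    for s in starts:
        bits = 0
        for m, rule in enumerate(rules):
            lo, hi = rule[dim]
            if lo <= s <= hi:
                bits |= 1 << m
        bitmaps.append(bits)
    return starts, bitmaps


def classify(packet, tables, n):
    """Return the index of the highest-priority matching rule, or None."""
    match = (1 << n) - 1
    for value, (starts, bitmaps) in zip(packet, tables):
        i = bisect_right(starts, value) - 1  # binary search for the interval
        match &= bitmaps[i]                  # intersect the rule sets
    if match == 0:
        return None
    # Rule 0 has the highest priority, so take the lowest set bit.
    return (match & -match).bit_length() - 1


# Hypothetical 2-D rule set: ((x_lo, x_hi), (y_lo, y_hi)), highest priority first.
rules = [((2, 5), (3, 7)), ((0, 9), (0, 4)), ((4, 8), (6, 9))]
tables = [preprocess(rules, d) for d in range(2)]
print(classify((4, 6), tables, len(rules)))
```

In hardware the AND is performed w bits at a time, which is where the n/w memory accesses per dimension come from; the Python integer hides that detail.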

     Bit-Parallelism Paper


       Algorithm demo on 2-D: Online (2)


                Max set cardinality = O(n)
                Intersection step examines all rules at least
                once → time complexity = O(n)
                With bit-level parallelism
                     The bitmaps representing the sets are stored in a
                      (2n+1)*n array B_j[i, 1..n] (set R_{i,j} stored for each
                      dimension j)
                     k*n/w memory accesses
                Different processing elements for each
                dimension in hardware implementation
                     Prototype




        Packet Classification Based on Incremental Reads
                     Algorithm uses incremental reads to reduce the required
                     memory
                     Allows time-space optimization and increases
                     locality for off-chip SDRAM and wide on-chip
                     memory implementations
                     Consider a specific dimension j
                           Assume a maximum of 2n+1 non-overlapping intervals
                           Each interval corresponds to an n-bit bitmap in which the
                            positions of the 1s indicate the filter rules that overlap this
                            interval
                           Bitmaps of adjacent intervals differ in only one bit
                           A single bitmap and 2n pointers of size log n to the differing
                            bits can be used to reconstruct any bitmap
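The reconstruction idea can be sketched in a few lines of Python. This is an illustration under the slide's assumption that adjacent bitmaps differ in exactly one bit; the function names and the sample bitmaps (three hypothetical rules covering [2,5], [1,8] and [4,6] in one dimension) are not from the paper.

```python
def build_incremental(bitmaps):
    """Keep only the first bitmap plus, per interval boundary, the position
    of the single bit that changes (a pointer of about log n bits)."""
    base = bitmaps[0]
    flips = []
    for prev, cur in zip(bitmaps, bitmaps[1:]):
        diff = prev ^ cur
        if diff == 0 or diff & (diff - 1):
            raise ValueError("adjacent bitmaps must differ in exactly one bit")
        flips.append(diff.bit_length() - 1)
    return base, flips


def reconstruct(base, flips, i):
    """Rebuild the bitmap of interval i by replaying the first i pointers."""
    bits = base
    for pos in flips[:i]:
        bits ^= 1 << pos
    return bits


# Bitmaps of the 2n+1 = 7 intervals produced by the three sample rules
# (bit m set means rule m overlaps the interval).
bitmaps = [0b000, 0b010, 0b011, 0b111, 0b110, 0b010, 0b000]
base, flips = build_incremental(bitmaps)
print(flips)
print(bin(reconstruct(base, flips, 3)))
```

Storage drops from (2n+1) bitmaps of n bits to one bitmap plus 2n small pointers, at the price of reading up to 2n pointers to rebuild the last interval's bitmap.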
          Bit-Parallelism Paper- Algorithm 2


          Packet Classification Based on Incremental Reads (2)
                     Reduces space requirement from O(n^2) to
                      O(n log n)
                Further generalization
                     Store (2n+1)/l bitmaps instead of 1
                     (2n+1)/2l pointers needed
                     Choose l as needed
                         l = 2n+1 → memory reduces to O(n log n)
                             Memory accesses increase: n/w → 2n log n / w


                Trade-off decision depends on the on-chip/off-chip
                memory ratio

        Bit-Parallelism Paper- Algorithm 2

        Special Case: 2-D Classification
                Necessary for best-effort traffic aggregation in Internet backbone
                Determine next hop and resource allocations based on destination
                and source addresses only
                  Longest prefix match lookups

                      Restrict source prefix ranges to powers of 2 in order to
                        reduce space
                      space requirement O(n) with trie implementation
                Virtual intervals
                      Map intervals of prefix lengths to both dimensions, sorted
                        by length
                      “Virtual Intervals” allow a worst-case lookup time of
                        O(l_s + log n), where l_s is the number of possible prefix lengths
                  Multicast group identification requires only two additional
                    memory accesses

        Bit-Parallelism Paper- Algorithm 2


        Conclusions
                Packet classification, or filtering, is a useful primitive in
                connectionless networks to provide differentiated
                service and policy-based routing
                More recently, security and active processing
                     Two multi-dimensional range matching algorithms allow
                      millions of packets per second to be processed on a set of
                      thousands of filter rules
                     Robust and predictable worst-case performance
                Efficient 2-D algorithm for backbone routers with
                hundreds of thousands of routing entries
                Algorithms demonstrate that there may be no need to
                restrict filtering to edge routers
Paper4

 Layer Four Switching

     A traditional router performs lookups based on the destination
     address only
     Layer four switching provides increased flexibility: it gives a
     router the capability to distinguish between kinds of traffic and
     treat them differently:
            Block traffic from a dangerous site
            Provide QoS for certain traffic
            Give preferential treatment to certain traffic (say, database flows)
     Difficulties: needs layer four header information, which may not
     always be available
            Any modification of the layer four header may cause problems
            Unclear how to get header information when packets are encrypted
     Some variants of L4S:
            Firewalls
            Reservation protocols such as RSVP
            Routing based on traffic type, say web traffic
Paper4

The Best Matching Filter Problem
  A packet P has k distinct header fields for lookup: H[1], … , H[k]
  The filter database of a Layer 4 router consists of a finite set of
  filters F_1, F_2, …, F_N; each filter F_i has an associated directive act_i
  Match: P matches F if each field of P matches the corresponding field of F
  Cost: used to pick an unambiguous match when several filters match
  (say, the order of the filters)
  An address range can always be converted into a sequence of
  prefixes, so prefix matching can be used
         A packet example: (M, S, UDP, 53, 125)

         A filter database (DP = destination port, SP = source port)
              Dest   Src    DP    SP    Flags
              M      *      25    *     *
              M      *      53    *     UDP
              M      S      53    *     *
              M      *      23    *     *
              T1     T0     123   123   UDP
              *      Net    *     *     *
              Net    *      *     *     TCP-ACK
              *      *      *     *     *
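The range-to-prefix conversion mentioned on this slide can be sketched as below. This is a standard greedy split, shown only as an illustration; the function name `range_to_prefixes` and the bit-string output format are assumptions, not something the paper prescribes.

```python
def range_to_prefixes(lo, hi, width):
    """Split the inclusive integer range [lo, hi] of a width-bit field into
    binary prefixes, returned as bit strings ending in '*'."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = (lo & -lo) if lo else (1 << width)
        # ... shrunk until it no longer runs past hi.
        while size > hi - lo + 1:
            size >>= 1
        plen = width - (size.bit_length() - 1)  # prefix length in bits
        bits = format(lo >> (width - plen), "0{}b".format(plen)) if plen else ""
        prefixes.append(bits + "*")
        lo += size
    return prefixes


# A 4-bit field: the range 1..14 needs six prefixes.
print(range_to_prefixes(1, 14, 4))
# A 16-bit port field: the range 1024..65535.
print(range_to_prefixes(1024, 65535, 16))
```

A range over a W-bit field can expand into up to 2W-2 prefixes, so one range filter may turn into several prefix filters.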

Paper4

Set Pruning Trees (1)

• Build a trie on the destination prefixes in the database
• Each valid prefix in the destination trie points to a trie
  containing some source prefixes.
• A single filter may apply to multiple destination prefixes,
  and is then copied into multiple source tries.
• Memory space: O(N^2)
• Time complexity: O(W), where W is the maximum prefix length




  Set Pruning Trees (2)

Filter   Destination   Source
 F1          0*          10*
 F2          0*          01*
 F3          0*           1*
 F4         00*           1*
 F5         00*          11*
 F6         10*           1*
 F7           *          00*

[Figure: Dest-Trie whose valid prefixes each point to a Src-Trie; filters are copied into every Src-Trie they apply to]

E.g.: Looking for: (001, 001)
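The following Python sketch builds set-pruning tries for the seven filters of this slide and runs the example lookup. It is a simplified illustration: prefixes are written as bit strings without the trailing `*`, the slides give no filter costs so list order is assumed to be the cost order, and the class and function names are hypothetical.

```python
class Node:
    def __init__(self):
        self.child = {}       # '0' or '1' -> Node
        self.filters = []     # (cost, name) pairs stored at a source-trie node
        self.src_trie = None  # source trie hanging off a valid destination prefix


def insert(root, prefix):
    node = root
    for bit in prefix:
        node = node.child.setdefault(bit, Node())
    return node


def build_set_pruning(filters):
    """filters: list of (name, dest_prefix, src_prefix), cheapest first."""
    dest_root = Node()
    for d in {dest for _, dest, _ in filters}:
        trie = Node()
        insert(dest_root, d).src_trie = trie
        # Copy in every filter whose destination is a prefix of d.
        for cost, (name, fd, fs) in enumerate(filters):
            if d.startswith(fd):
                insert(trie, fs).filters.append((cost, name))
    return dest_root


def lookup(dest_root, dest, src):
    # Step 1: longest matching destination prefix that has a source trie.
    node, best_trie = dest_root, dest_root.src_trie
    for bit in dest:
        node = node.child.get(bit)
        if node is None:
            break
        if node.src_trie is not None:
            best_trie = node.src_trie
    if best_trie is None:
        return None
    # Step 2: walk that source trie, keeping the least-cost filter on the path.
    best = min(best_trie.filters, default=None)
    node = best_trie
    for bit in src:
        node = node.child.get(bit)
        if node is None:
            break
        if node.filters:
            cand = min(node.filters)
            best = cand if best is None else min(best, cand)
    return None if best is None else best[1]


FILTERS = [("F1", "0", "10"), ("F2", "0", "01"), ("F3", "0", "1"),
           ("F4", "00", "1"), ("F5", "00", "11"), ("F6", "10", "1"),
           ("F7", "", "00")]
root = build_set_pruning(FILTERS)
print(lookup(root, "001", "001"))
```

Because the source trie under `00*` already holds copies of the filters for `0*` and `*`, a single pass down one source trie is enough; the copying is what causes the memory blowup.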



Avoid the Memory Blowup (1)



  Avoid the copying by having each destination prefix D
  point to a source trie that stores only the filters whose
  destination field is exactly D
  When searching, may need to go back to the
  destination trie multiple times (backtracking)
  Time complexity: O(W^2)
  Space complexity: O(NW)




  Avoid the Memory Blowup (2)


Filter   Destination   Source
 F1          0*          10*
 F2          0*          01*
 F3          0*           1*
 F4         00*           1*
 F5         00*          11*
 F6         10*           1*
 F7           *          00*

[Figure: Dest-Trie with one Src-Trie per destination prefix; each filter is stored exactly once]

E.g.: Looking for: (001, 001)

Memory requirement = O(NW)
Lookup worst case = O(W^2)
Improving Search Time: Basic Grid-of-Tries (1)



  Basic idea:
      Use pre-computation and switch pointers (in the lower-level
       tries) to speed up the search in a later source trie based on the
       search in an earlier source trie (remember the previous
       search result)
  Role of switch pointers
      Allow us to increase the length of the matching source
       prefix without having to restart at the root of the next
       ancestor source trie
      Stored filter: node (D,S) stores the least-cost filter whose
       dest field is a prefix of D and src field is a prefix of S
  Time complexity: at most 2W steps, i.e., O(W)
  Space complexity: O(NW)
  Improving Search Time: Basic Grid-of-Tries (2)


Filter   Destination   Source
 F1          0*          10*
 F2          0*          01*
 F3          0*           1*
 F4         00*           1*
 F5         00*          11*
 F6         10*           1*
 F7           *          00*

[Figure: Dest-Trie and Src-Tries linked by switch pointers, with nodes x and y marked]

E.g.: Looking for: (001, 001)




Further Improvement & Extension

  Use some faster scheme for destination address
  matching
     Time complexity O(W) → O(log W)
  Use multi-bit tries for source address matching
     Time complexity O(W) → O(W/k)
  Extend grid-of-tries to handle protocol and port fields
     3 GOT copies, for TCP, UDP and OTHER respectively
     4 hash tables for the 4 port combinations:
        both unspecified, destination only, source only, both specified




                   Cross-Producting (1)
How-to
   Slice the filter database into columns, the i-th column storing all distinct
    prefixes in field i
   Make a cross-product table of all k columns
   Pre-compute the least-cost filter that matches each cross-product
    entry
   When a packet comes in, do best prefix matching for each field
    separately
   With the matching results, look up the corresponding entry in the
    cross-product table
Discussion
   Very fast (for matching)
   Problem: memory explosion, up to N^k entries
   Solution: On-Demand Cross-Producting


                            Cross-Producting (2)
  Filter database
       Dest   Src    DP    SP    Flags
       M      *      25    *     *
       M      *      53    *     UDP
       M      S      53    *     *
       M      *      23    *     *
       T1     T0     123   123   UDP
       *      Net    *     *     *
       Net    *      *     *     TCP-ACK
       *      *      *     *     *

  Distinct prefixes per field
       Dest Prefix   Src Prefix   DestPort Prefix   SrcPort Prefix   Flags Prefixes
       M             S            25                123              UDP
       T1            T0           53                Default          TCP-ACK
       Net           Net          23                                 Default
       Default       Default      123
                                  Default

  E.g. Looking for: (M,S,UDP,25,57)

  Cross-product table
       Num   CrossProduct                                  Matching Filter
       1     M, S, 25, 123, UDP                            F1
       2     M, S, 25, 123, TCP-ACK                        F1
       3     M, S, 25, 123, default                        F1
       4     M, S, 25, default, UDP                        F1
       5     M, S, 25, default, TCP-ACK                    F1
       6     M, S, 25, default, default                    F1
       …     ……                                            …
       479   default, default, default, default, TCP-ACK   F8
       480   default, default, default, default, default   F8
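The table construction can be sketched in Python using the filter database from this slide. The sketch makes a strong simplifying assumption: each field value is treated as an opaque symbol that matches only itself or a wildcard, so real prefix containment (for example, a host address inside Net) is not modelled. The function names and the F1..F8 numbering by row order are illustrative.

```python
from itertools import product

# (Dest, Src, DestPort, SrcPort, Flags); "*" is a wildcard; row order = cost order.
FILTERS = [
    ("M", "*", "25", "*", "*"),
    ("M", "*", "53", "*", "UDP"),
    ("M", "S", "53", "*", "*"),
    ("M", "*", "23", "*", "*"),
    ("T1", "T0", "123", "123", "UDP"),
    ("*", "Net", "*", "*", "*"),
    ("Net", "*", "*", "*", "TCP-ACK"),
    ("*", "*", "*", "*", "*"),
]


def build_cross_product(filters):
    """Return the per-field columns and the precomputed cross-product table."""
    columns = []
    for i in range(len(filters[0])):
        values = []
        for f in filters:
            if f[i] != "*" and f[i] not in values:
                values.append(f[i])
        columns.append(values + ["default"])
    table = {}
    for combo in product(*columns):
        # The first (least-cost) filter that matches every field wins.
        for cost, f in enumerate(filters):
            if all(fv == "*" or fv == cv for fv, cv in zip(f, combo)):
                table[combo] = "F{}".format(cost + 1)
                break
    return columns, table


def classify(packet, columns, table):
    """One best-match step per field, then a single table lookup."""
    key = tuple(v if v in col else "default" for v, col in zip(packet, columns))
    return table[key]


columns, table = build_cross_product(FILTERS)
print(len(table))
# The slide's example packet, reordered to (Dest, Src, DestPort, SrcPort, Flags).
print(classify(("M", "S", "25", "57", "UDP"), columns, table))
```

The table size is the product of the column sizes (4 x 4 x 5 x 2 x 3 here), which is why the approach becomes impractical for large databases without the on-demand variant.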

        Conclusions
                GOT solution: scalable (linear) storage and
                fast lookups for destination-source (D-S) filters
                     More general filters → high lookup cost
                Cross-producting solution: faster lookups on
                average, but higher variance because it
                relies on caching
                Hybrid scheme combines flexibility with
                efficiency

        ABV:

          "Scalable Packet Classification"
         F. Baboescu, G. Varghese


                GOAL
                     Packet classification that is
                         scalable (in the number of rules, up to 100,000)
                         wire speed

                Past Work
                     Linear-time search
                     Linear amount of TCAMs
                     Lucent scheme (bit vector)
                         worst case doesn't scale


        SOLUTION
                Aggregated Bit Vector
                     improvement on Lucent bit vector
                     rule aggregation
                     rule rearrangement
                Rule Aggregation
                     bit vectors are sparse
                         i.e., few rules match
                     Aggregation acts as a compression scheme: each group of bits is summarized by one bit
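The aggregation idea can be sketched as follows. This is a simplified single-level illustration, not the paper's full scheme: the aggregate size of 4 bits, the word layout (rule r lives in word r // a, bit r % a), and the function names are all assumptions made for the example.

```python
def to_words(rule_ids, n, a):
    """Pack the set of matching rule indices into a-bit words."""
    words = [0] * ((n + a - 1) // a)
    for r in rule_ids:
        words[r // a] |= 1 << (r % a)
    return words


def aggregate(words):
    """One summary bit per word: 1 if any bit in that word is set."""
    return [1 if w else 0 for w in words]


def abv_search(vectors, a):
    """vectors: one list of words per field.

    Returns (lowest-numbered rule common to all fields or None,
             number of full-vector words actually read).
    """
    aggs = [aggregate(v) for v in vectors]
    reads = 0
    for j in range(len(vectors[0])):
        # Only fetch word j when every field's summary bit is set.
        if all(agg[j] for agg in aggs):
            word = ~0
            for v in vectors:
                word &= v[j]
                reads += 1
            if word:  # otherwise it was a false match in the summary
                return j * a + (word & -word).bit_length() - 1, reads
    return None, reads


# Hypothetical example: 16 rules, two fields, sparse match sets.
v1 = to_words({1, 9, 14}, 16, 4)
v2 = to_words({2, 9, 12}, 16, 4)
print(abv_search([v1, v2], 4))
```

In this example word 0 is a false match (both summaries are 1 but the words share no bit), which is the effect that rule rearrangement is meant to reduce.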

        SOLUTION continued
                Rule Rearrangement
                     overlap is rare
                     place rules w/ common values together
                     sort out rule ordering later




        Comparing ABV w/ BV of Lucent




        Results
                At least an order of magnitude faster than
                BV
                Scales well in number of memory accesses




        Paper # 3

        "Space Decomposition Techniques for
        Fast Layer-4 Switching"
        M. Buddhikot, S. Suri, M. Waldvogel


                new scheme, based on space decomposition,
                whose search time is comparable to the best
                existing schemes, but which also offers fast
                worst-case filter update time.
                three key ideas
                     innovative data-structure based on quadtrees for a
                      hierarchical representation of the recursively
                      decomposed search space
                     fractional cascading and precomputation to
                      improve packet classification time
                     prefix partitioning to improve update time
        Space Decomposition Evaluation
                Depending on the actual requirements of the
                system this algorithm is deployed in, a single
                parameter α can be used to trade off search time against
                update time
                Amenable to fast software and hardware
                implementation
                For N two-dimensional filters specified using prefixes
                of up to W bits in length, the Area-based Quadtree
                (AQT) data structure requires O(N) space, O(αW)
                search time, and O(α N^(1/α)) update time
                Both the average and worst-case search times and
                memory consumption are comparable to or better than
                other schemes known in the literature