					                  Advanced topics
                                in
  Computer Networks
          Lecture 9: Tree-based lookup

                   University of Tehran
          Dept. of EE and Computer Engineering
                              By:
                    Dr. Nasser Yazdani
Univ. of Tehran       Adv. topics in Computer Network   1
      Outline
   Issues
   Multiway and Multicolumn search
   DMP-Tree
   Some implementation issues




      Issues
   How to sort prefixes
       Prefixes as ranges
       Comparing prefixes
             Based on length
             Add extra bits at the end.
             New definition (DMP-tree)
   How to apply tree structures like binary tree
    or m_way tree to prefixes


       Multiway tree lookup.
    Proposed by G. Varghese and his students.
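   Consider prefixes as ranges.
   First try: pad prefixes with 0s in order to apply
    a binary search tree.
    Consider the prefixes {1*, 101*, 10101*}:
        100000    (1*)
        101000    (101*)
        101010    (10101*)
    Binary search for 101011, 101110 and 111110 ends at the
    same place (after 101010) for all of them, although their
    longest matches are 10101*, 101* and 1* respectively.
    For example, 111110 should match 100000 (1*).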
        Multiway tree lookup(cont)
   Two problems in the previous example:
       Being far away from the matching prefix.
       Multiple addresses matching different prefixes end up in
        the same region.
   Solution: treat prefixes as ranges and put the end of each
    range in the table (L = low end, H = high end).
        Entry       Type     Search key
        100000      L
        101000      L
        101010      L
        101011      H        101011
                             101110
        101111      H
                             111110
        111111      H
    We have the explicit ranges. Each search maps to one range only.
     Multiway tree lookup(cont)
         Entry       Type     Search key
         100000      L
         101000      L
         101010      L
         101011      H        101011
                              101110
         101111      H
                              111110
         111111      H

For 101011, we try to find the first L above it that is not matched by an H.
For the rest, we can use a stack operation to find that first L.
Problem: a linear search is needed to find the L.

    Multiway tree lookup(cont)
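Solution: Precompute the prefixes corresponding to
  the ranges.
        Entry            >    =
        P1) 100000       P1   P1
        P2) 101000       P2   P2
        P3) 101010       P3   P3
            101011       P2   P3
            101111       P1   P2
            111111       -    P1

  The ">" column gives the matching prefix when the search key falls
  after an entry (and before the next one). The "=" column gives it
  when the key equals the entry.
  Example: 111110 falls after 101111, so 1* (P1) is the matching prefix.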
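The precomputed table can be searched with an ordinary binary search. The following is a minimal sketch, assuming the six 6-bit entries from the slide; the names `TABLE`, `KEYS` and `lookup` are illustrative and not part of the lecture.

```python
from bisect import bisect_right

# (range endpoint, prefix for ">", prefix for "=") as listed on the slide.
# P1 = 1*, P2 = 101*, P3 = 10101*, all padded to 6 bits.
TABLE = [
    (0b100000, "P1", "P1"),
    (0b101000, "P2", "P2"),
    (0b101010, "P3", "P3"),
    (0b101011, "P2", "P3"),
    (0b101111, "P1", "P2"),
    (0b111111, None, "P1"),
]
KEYS = [entry[0] for entry in TABLE]


def lookup(addr):
    """Return the precomputed prefix label for a 6-bit address, or None."""
    i = bisect_right(KEYS, addr) - 1   # last entry that is <= addr
    if i < 0:
        return None                    # addr lies before every range
    value, gt, eq = TABLE[i]
    return eq if addr == value else gt
```

With this table, `lookup(0b101011)`, `lookup(0b101110)` and `lookup(0b111110)` each land in a different row, which is the property the plain padded table lacked.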
    DMP-Tree
   Comparing prefixes.
   Sorting prefixes
   Binary prefix Tree.
   M_way prefix tree.




    Trie structure
   Trie or radix tree




    Sorting prefixes
   Question: Why can't well-known tree structures
    be applied to the longest prefix matching problem?
   Answer: There is no well-known method for sorting prefixes.
   Definition: Assume A = a1a2…an and B = b1b2…bm
    are prefixes over an alphabet Σ, and there is a character ⊥ in Σ.
       1. If n = m, the numerical values of A and B are
        compared.
       2. If n ≠ m (assume n < m), the two substrings a1a2…an
        and b1b2…bn are compared. If a1a2…an and b1b2…bn are
        equal, then the (n+1)th character of string B is checked.
        B ≤ A if bn+1 is equal to or before ⊥, and B > A
        otherwise.
      Sorting prefixes (cont)
   Example: Assume ⊥ is M. Then, BOAT is smaller
    than GOAT and SAD is bigger than BALLOON. CAT is
    considered bigger than CATEGORY since the fourth
    character in CATEGORY, E, is smaller than M.

   Sorting is a function to determine the position of each
    prefix.

 The prefixes of the table are sorted as:
00010*, 0001*, 001100*, 01001100*, 0100110*, 01011001*,
  01011*, 01*, 10*, 10110001*, 1011001*, 10110011*,
  1011010*, 1011*, 110*
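The ordering can be written as a three-way comparison function. This is a sketch under one assumption that the slide does not state here: for bit strings the special character is taken to be 0, which reproduces the sorted list above. The names `compare`, `PREFIXES` and `SORTED` are illustrative.

```python
from functools import cmp_to_key


def compare(a, b, bottom="0"):
    """Return -1, 0 or 1 for a < b, a == b, a > b under the prefix ordering."""
    if len(a) == len(b):
        return (a > b) - (a < b)
    a_is_short = len(a) < len(b)
    short, long_ = (a, b) if a_is_short else (b, a)
    head = long_[:len(short)]
    if head != short:
        short_vs_long = -1 if short < head else 1
    else:
        # Equal heads: the longer string is the smaller one when its next
        # character is at or before the special character.
        short_vs_long = 1 if long_[len(short)] <= bottom else -1
    return short_vs_long if a_is_short else -short_vs_long


# The 15 prefixes of the example, in arbitrary order.
PREFIXES = ["10", "01", "110", "1011", "0001", "01011", "00010", "001100",
            "1011001", "1011010", "0100110", "01001100", "10110011",
            "10110001", "01011001"]

SORTED = sorted(PREFIXES, key=cmp_to_key(compare))
```

The same function covers the word example when called with `bottom="M"`, for instance `compare("CAT", "CATEGORY", "M")`.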
     Binary prefix tree




•Unfortunately, it fails for 101100001000. Why?
•Prefixes are ranges and not just data points in the search space.


          Binary prefix tree (cont)
        Definition: prefixes A and B are disjoint if
         neither of them is a prefix of the other.
        Definition: prefix A is called an enclosure if
         there exists at least one element in the set such that
         A is a prefix of that element.
        We modify the sort structure:
    1.      Each enclosure has a bag to put its data elements in.
    2.      Sort the remaining elements.
    3.      Distribute the bag elements to the right and left according
           to the sort definition.
    4.      Apply the algorithm recursively.
     Binary prefix tree (cont)
   Example- Prefixes in table 1. First step.




The second step,



Note- enclosures are in the higher level than the
contained elements. (important!)
      Binary prefix tree (cont)
   The final tree structure




    Sorting prefixes (cont)
   Sorting algorithms
       Based on bubble sort.
       Based on radix sort.
          Sort(list):
            tmp = MinLength(list)
            for all i in list except tmp do
              compare i with tmp
              if i matches tmp then
                       put i in tmp's bag
              else if i < tmp then
                       put i in leftList
              else if i > tmp then
                       put i in rightList
            endfor
            distribute tmp's bag into leftList and rightList
            return Sort(leftList), tmp, Sort(rightList)
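The recursive scheme above can be turned into a small binary prefix tree with a longest-prefix search. This is a sketch, not the lecture's implementation: it assumes distinct bit-string prefixes, takes 0 as the special character, and folds the "bag" step into a direct comparison with the shortest prefix (the bag elements end up on the same side either way). `Node`, `build` and `longest_match` are illustrative names.

```python
def compare(a, b):
    """Prefix ordering for bit strings, with 0 as the special character."""
    n = min(len(a), len(b))
    if a[:n] != b[:n]:
        return -1 if a[:n] < b[:n] else 1
    if len(a) == len(b):
        return 0
    # One string is a prefix of the other: the longer one is the smaller
    # of the two when its next bit is 0, and the bigger one otherwise.
    if len(a) > len(b):
        return -1 if a[n] == "0" else 1
    return 1 if b[n] == "0" else -1


class Node:
    def __init__(self, prefix, left, right):
        self.prefix, self.left, self.right = prefix, left, right


def build(prefixes):
    """Put the shortest prefix at the root so enclosures sit above their elements."""
    if not prefixes:
        return None
    tmp = min(prefixes, key=len)
    left, right = [], []
    for p in prefixes:
        if p != tmp:
            (left if compare(p, tmp) < 0 else right).append(p)
    return Node(tmp, build(left), build(right))


def longest_match(root, addr):
    """Walk one root-to-leaf path, remembering the last prefix that matched."""
    best, node = None, root
    while node is not None:
        if addr.startswith(node.prefix):
            best = node.prefix   # deeper matches are never shorter
        node = node.left if compare(addr, node.prefix) < 0 else node.right
    return best


PREFIXES = ["10", "01", "110", "1011", "0001", "01011", "00010", "001100",
            "1011001", "1011010", "0100110", "01001100", "10110011",
            "10110001", "01011001"]
ROOT = build(PREFIXES)
```

For the address 101100001000, which the plain binary tree failed on, `longest_match(ROOT, "101100001000")` follows the path through 10 and 1011 and keeps 1011 as the answer.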
          M_way prefix tree
        Problems with the binary prefix tree:
          Two-way branching.
          The structure is not dynamic and insertion may
           cause problems.
1.       Divide by m after sorting the strings
           Static m_way tree.
2.       Build a dynamic data structure like a B-tree.
          How to guarantee that an enclosure is at a higher
           level than its contained elements.
          Define node splitting and insertion.
         M_way prefix tree (Cont)
    Node splitting: Finding the split point.
    1.      Take the median if the data elements are
            disjoint.
    2.      If there is an enclosure containing other
            elements, take it as split point.
    3.      Otherwise, take an element which gives the best
            splitting result.
    Note, this does not guarantee the final tree
     will be balanced.

     M_way prefix tree (Cont)
    Insertion:
    1.      If the new element is not an enclosure of
            others, find its place and insert it in the
            corresponding leaf, as in a B-tree.
    2.      Otherwise, replace the closest element with the new
            element and reinsert the replaced elements.
    3.      Re-sort the resulting subtree (space division)
            if necessary.
    Building the tree is similar to building a B-tree.

      M-way prefix tree (cont)
   Example               Prefix              Abbrv.             Prefix     Abbrv.
                        10                  -               1101110010    K
                        01                  -               10001101      L
                        110                 -               11101101      M
                        1011                -               01010110      N
                        0001                -               00100101      O
                        01011               -               100110100     P
                        00010               -               101011011     Q
                        001100              A               11101110      R
                        1011001             B               10110111      S
                        1011010             C               011010        T
                        0100110             D               011011        U
                        01001100            E               011101        V
                        10110011            F               0110010       W
                        10110001            G               101101000     X
                        01011001            H               101101110     Y
                        001011              I               00011101      Z
                        00111010            J               011110110     II
    M-way prefix tree (cont)
 We insert prefixes randomly.
 The tree uses a branching factor of 5 (at most 4
  prefixes in each node).
 Insert 01011, 1011010, 10110001 and
  0100110. Then, adding 110 causes an overflow.
  Split the node:
                  10110001
      (0100110,01011)      (1011010, 110)
(all elements are disjoint)
      M-way prefix tree (cont)
   Insert 10110011, 1101110010, 00010.
    Adding 1011001 causes an overflow.
                 10110001           1011010
(00010,0100110,01011) (1011001,10110011) (110,1101110010)

(case 3 of splitting)
 Later, adding 1011 causes a problem. It is the
  case of adding an enclosure. We will have
  space division.


       M-way prefix tree (cont)
   The final tree




    • The tree generalizes the B-tree: the B-tree is a special
    case of this tree. Thus, when the data elements are
    relatively disjoint, the height of the tree is log_M N.
   DMP-Tree
  [Figure: max. height of the tree (0 to 25) versus branching factor (BF = 3 to 10), one series per number of data prefixes (50 to 100).]
• BF is the branching factor in the internal nodes.
• No. of Data is in 1000s.

  DMP-Tree

  [Figure: min. memory utilization (about 0.62 to 0.68) versus branching factor (3 to 10), one series per number of data prefixes (50K to 100K).]
• The number of prefixes is shown on the right.

   DMP-Tree

• Height of the tree for 100K data prefixes.
  [Figure: min., average and max. height of the tree (0 to 25) versus branching factor (3 to 10).]




       DMP-Tree
Analysis of the results.
      With increasing BF (branching factor), the
      height decreases.
      The results are for the worst case (max.
      height); the average case is much lower.
      After BF=9, increasing the branching factor
      does not decrease the max. height.
      The results are for sets of 50,000-100,000
      prefixes with lengths from 8 to 31.
      The number of actual prefixes in use is around
      50,000 and their length is 8-31.
       DMP-Tree
Memory utilization:
      Mem. utilization is 64%-67% without
      considering the tree branching overhead.
      Mem. utilization is 53%-62% with the tree
      branching overhead (pointers).
      Without considering branching pointers, the
      mem. utilization decreases with increasing
      branching factor.
      Total mem. utilization increases with
      increasing branching factor.

    DMP-Tree
Therefore,
 The longest matching prefix of a network address
  can be determined in 5 steps with a branching
  factor of 9 or more.
 In the worst case, we need at most 2 times the
  total prefix data size in memory to implement
  the scheme. For instance, for 50,000 prefixes
  of 32 bits, we need at most 3.2 Mbit of memory
  (2 x 50,000 x 32 bits).

                    Overall Design
    All operations first need a search in the tree
     structure.
    There are two search procedures, one for the
     longest matching prefix and another for
     update.
    The prefix tree data structure is on the
     chip.
    The policy table is in the off chip memory.
    There is a port to data link layer mapping
     module.
                           Tree Nodes
   Internal nodes.

Addr 1   Prefix 1   Port     Addr 2   Prefix 2   Port     …     Addr
[?]      [33]       [?]      [?]      [33]       [?]            [?]

            Each prefix has a left and a right pointer, which
             point to its left and right subtrees
             respectively.
            We can have N prefixes in each internal node.
             Then, N+1 is the branching factor.
            The bigger N, the faster the search, but the
             more logic is needed.
            Port is the address of the port in the switch to …
                             Tree Nodes
   Leaf nodes.
          Prefix 1    port        Prefix 2     port
          [33]        [?]         [33]         [?]             …

       There are no left and right subtree pointers.
       The number of prefixes in the leaf node is M.
       The leaf nodes are stored in an off chip memory
        to make the scheme scalable to a large
        number of prefixes.

               Branching Factor
   What is the best number for N? (Branching
    factor)
       The bigger N, the faster the search process (Fact 1).
       The bigger N, the more memory pins and
        usually the more memory bandwidth are needed
        (Fact 2).
       The bigger N, the more logic we need to process
        the node (Fact 3).
   Simulation results show:
        The bigger N, the better the memory
         utilization.
        For N ≥ 8, the max. height of the tree does not
         decrease.

                  Simulation result
           Total memory: assuming one memory block and OC-192.
    # of       Required   Branching   Mem.   Mem BW      Max mem   Mem. size       Max
    prefixes   mem.       factor      pins   (G/s) max   access    (on chip) mm2   height
    64K        5.4 Mbit   15          897    89.7        4         81              5
    64K        5.5 Mbit   11          655    65.5        4         82              5
    64K        5.4 Mbit   9           527    52.7        4         81              5
    64K        6 Mbit     6           335    46.9        6         96              7
    64K        6.6 Mbit   5           275    44          7         110             8
    64K        6.5 Mbit   4           207    62.1        14        112             15
    100K       8.3 Mbit   15          897    89.7        4         122             5
    100K       8.5 Mbit   11          655    65.5        4         125             5
    100K       8.3        9           527    63.24       5         122             6
    100K       9 Mbit     6           335    53.6        7         135             8
    100K       9.1 Mbit   5           275    49.5        8         140             9
    100K       9.5        4           207    62.1        14        150             15
    30K        2.6        9           527    52.7        4         40              5      (expected)
               Branching Factor
   It seems any number between 8 and 16 is
    reasonable. But N=9 gives a better search
    time and memory size.
   Assuming a branching factor of 9 in the internal
    nodes, 50% node utilization and 128K
    prefixes, we need at most 128K/4.5 = 28.5K
    addresses. Then, 15 bit addresses for the left and
    right pointers are more than enough. But
    we need more for off chip addressing.
   The number of switch ports is usually
    limited, around 64. We can assume 256;
    then 8 bits are enough to address them.
                      Branching Factor
   In order to make the internal node
    branching and leaf node branching even,
    M=10.
   If we want to read a node at once, we will
    need 41x10=410 pins, which is difficult to
    support in one chip.
   We can divide a node in two and read/write it
    in two clock cycles. This reduces the memory
    pins to 205, which is affordable.
                Memory requirement
    Prefix tree: assuming 128K prefixes.
    N = 9 (BF) and M = 10 (BF in leaves); the majority of
       prefixes, 80%, will be in leaves. Assume 65% node
       utilization.
    Average # of prefixes in a leaf node = 10 x 0.65 = 6.5
    # of leaf nodes = 128K x 80% x 2/6.5 = 31.5K; with 10%
       overhead → 35K
    Total off chip memory = 35K x 205 (Mem BW) = 7.2 Mbits
    Then, we need 16 bits for addressing and 1 bit for
       internal/external.
    # of internal nodes = 128K x 20%/5.8 = 4.41K; with 10%
       overhead → 4.9K
    Total on chip memory = 4.9K x 529 ≈ 2.6 Mbits
    Port to link address mapping table: Link Addr, 48 bit addr.
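The arithmetic on this slide can be replayed directly. The sketch below only reproduces the slide's own figures: "128K" is read as 128,000, the factor of 2 in the leaf count and the 5.8 prefixes per internal node are taken from the slide as given, and the rounded node counts (35K and 4.9K) are the slide's. All variable names are illustrative.

```python
PREFIXES = 128_000           # "128K" on the slide, read as 128,000
LEAF_SHARE = 0.80            # share of prefixes stored in leaves
INTERNAL_SHARE = 0.20        # share stored in internal nodes
M = 10                       # prefixes per leaf node
NODE_UTILIZATION = 0.65
LEAF_WORD_BITS = 205         # bits read per memory access (Mem BW)
INTERNAL_NODE_BITS = 529     # on chip node width used on the slide
AVG_PER_INTERNAL = 5.8       # average prefixes per internal node (slide value)

avg_per_leaf = M * NODE_UTILIZATION                     # 6.5
# The factor of 2 is taken from the slide as-is.
leaf_nodes = PREFIXES * LEAF_SHARE * 2 / avg_per_leaf   # about 31.5K
internal_nodes = PREFIXES * INTERNAL_SHARE / AVG_PER_INTERNAL  # about 4.41K

# The slide adds 10% overhead and rounds up to 35K and 4.9K nodes.
off_chip_mbit = 35_000 * LEAF_WORD_BITS / 1e6
on_chip_mbit = 4_900 * INTERNAL_NODE_BITS / 1e6

print(round(leaf_nodes), round(internal_nodes))
print(round(off_chip_mbit, 1), round(on_chip_mbit, 1))
```

Changing `PREFIXES` or `NODE_UTILIZATION` shows how the memory size scales with the number of prefixes.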
                      Memory requirement
      In summary:
# of        On chip   Branching   On chip     Off chip   Mem. access   Mem. size       Off chip
prefixes    memory    factor      mem. pins   memory     (search)      (on chip) mm2   mem. pins
            (Mbits)                           (Mbits)
128K        2.6       10          529         7.2        5             40              250
Note:
          Branching factor is the # of branching in internal nodes.
          The size of the memory scales with the size of data or
            # of prefixes.
          Power dissip. depends on the r/w freq, current & core voltage
          Considering Faraday Mem. Modules
             A 10Kx32 bits single port mem size is 36x1.45 mm2.


                             Overall Design

  [Figure: block diagram of the overall design. Blocks: Memory, Mem. Ctrl, root addr, root content, Search, Update Search, Insertion/update/delete, Port2Addr (Ctrl, MEM), CPU Interface, Output mem Ctrl. External interfaces: To/From NP, To/From CPU, To/From out Mem.]
                                             Search Path
  [Figure: search path block diagram. Blocks: Input, Piping, GetLen, Compare, Next, Caching, Dispatch, Mem Ctrl, Off mem, Root node. Signals: Data[32], SOA[1], InClk[1], Prty[1], Addr[32], PackAdd[29], Found[1], First[1], MemAddr[14], RdAdd[19], Len[Nx6], CResult[1], Match[1], IpAddr[32], LinkAddr[48], Port[8], DataOut[32], OutMemAddr. Output: To Scheduler.]
There are data assertion signals between blocks which have not been
shown everywhere because of space limitations.
                              Search Path
    Input Module:
           Gets the packet destination addresses from the
            parser.
           Does parity checking.
           It has the following input signals:
                Input data, which is 32 bits.
                Start of Address, 1 bit (SOA)
                Parity, 1 bit (Prty)
                Input clock (InClk)
           It gets the data in two clock cycles: first the IP
            address and then the packet address in the
            memory or the packet id (cid).
                               Search Path

    Input Module:
      29 bits are used for the packet address and the
         last 3 bits for the policy. Then, 512 Mbytes can
         be supported to store the packets before
         sending them out.
        The 2nd clock cycle data format:
                          31                       3   2       0
                          |     Packet address     |   Policy  |

        The timing:
          InClk   clock
          SOA     marks the start of the address
          Data    IpAddr in the first cycle, then PackAddr or cid
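The second-cycle word can be packed and unpacked with shifts and masks. This is a minimal sketch of the 29 + 3 bit layout described above; the function names `pack_word` and `unpack_word` are illustrative and do not come from the lecture.

```python
ADDR_BITS = 29     # packet address, bits 31..3
POLICY_BITS = 3    # policy, bits 2..0


def pack_word(packet_addr, policy):
    """Build the 32-bit second-cycle word from its two fields."""
    if not 0 <= packet_addr < (1 << ADDR_BITS):
        raise ValueError("packet address does not fit in 29 bits")
    if not 0 <= policy < (1 << POLICY_BITS):
        raise ValueError("policy does not fit in 3 bits")
    return (packet_addr << POLICY_BITS) | policy


def unpack_word(word):
    """Split a 32-bit second-cycle word into (packet address, policy)."""
    return word >> POLICY_BITS, word & ((1 << POLICY_BITS) - 1)


# 2**29 byte addresses correspond to the 512 Mbytes mentioned on the slide.
ADDRESSABLE_MBYTES = (1 << ADDR_BITS) // (1 << 20)
```

For example, `unpack_word(pack_word(1, 5))` gives back the pair `(1, 5)`.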



                                 Search Path

    Piping Module:
           Pipelines the search process.
            For new elements from the input block:
             For each new IP address do
                  If found in the hash table
                     send the packet memory address to the dispatcher
                  Else
                     enter the IP address and the policy into the pipe FIFO
             End do



                                  Search Path

    Piping Module:
      For elements in the FIFO:
             For the first IP address in the FIFO do
                  If the IP address is new then
                     assert the First signal and send the IP address and policy out.
                  Else if the next address is on chip then
                     send the next node address to Mem. Ctrl.
                  Else
                     send the next node address to OffMemCtrl.
                  Send the IP address and policy to the pipe.
      For the recirculated address:
             If the node was a leaf then
                  Send the longest matching address to OutMemCtrl.
                  Send the policy to Extract port and the packet address to Dispatch.
             Else
                  Put the IP address into the FIFO.
                  Replace the longest matching prefix address if a new one is found.
                              Search Path

    Piping Module:
           FIFO: keeps the current information of the IPs.
                IP Addr [32]   Port [8]   Next Node [19]   LMPA []   New [1]
      LMPA = Longest Matching Prefix Address
      New = 1 for new, 0 for old
      If the packet is new, the next address will be zero
          and we can read the root cache content instead of
          reading from memory.
      The address is off chip if the first (most
          significant) bit is 1; otherwise it is on chip.
                         Search Path

    GetLen Module: This module gets the length
     of prefixes. We add a 1 to the end of a prefix
     and then pad it with 0s to make it 33
     bits.
      Ex. 11011010 → 110110101000…0 (33 bits).
      Then, scanning from the right, the first 1
         we meet is the marker; the bits to its left are
         the prefix, which gives the prefix length.
      GetLen can be implemented as a multiplexer with
         a case statement (32 cases) and it can
         be done in one clock cycle.
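The 33-bit encoding and the length recovery can be modelled in software as follows. This is a sketch only: it uses a lowest-set-bit trick instead of the 32-way case statement a hardware multiplexer would use, and the names `encode`, `get_len` and `decode` are illustrative.

```python
WIDTH = 33  # stored width: up to 32 prefix bits plus the marker bit


def encode(prefix):
    """Append the marker 1 to a bit-string prefix and pad with 0s to 33 bits."""
    return int(prefix + "1" + "0" * (WIDTH - 1 - len(prefix)), 2)


def get_len(word):
    """Prefix length = 32 minus the number of 0s to the right of the marker."""
    trailing_zeros = (word & -word).bit_length() - 1
    return WIDTH - 1 - trailing_zeros


def decode(word):
    """Recover the original bit-string prefix from its 33-bit encoding."""
    n = get_len(word)
    return format(word >> (WIDTH - n), "0{}b".format(n)) if n else ""
```

For the slide's example, `get_len(encode("11011010"))` is 8 and `decode(encode("11011010"))` returns the original string.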
                                 Search Path

      Compare Module: compares two prefixes A
       and B with lengths L1 and L2.
      Assume L1 > L2 and A[1:L2] is the first L2 bits of
         A. Then,
             1.       If A[1:L2] = B → A and B match. If A[L2+1] = 0 →
                       A ≤ B. Otherwise, A > B.
             2.       Otherwise, if A[1:L2] > B → A > B, else A < B.
      One of the prefixes here is the IP address, with length
        32.
      We assume there are no two identical elements
        in the tree.
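The two rules can be sketched for the case where A is a 32-bit IP address and B is a stored prefix. This is a software model under stated assumptions, not the hardware design: the prefix is passed as an integer holding its `prefix_len` bits, and the function name `compare_ip` and its return convention are illustrative.

```python
def compare_ip(ip, prefix_bits, prefix_len):
    """Compare a 32-bit address with a prefix.

    Returns (match, ip_is_bigger):
      match        -- True when the prefix is a prefix of the address
      ip_is_bigger -- True when the address sorts after the prefix
    """
    head = ip >> (32 - prefix_len)           # first prefix_len bits of the address
    if head == prefix_bits:
        if prefix_len == 32:
            return True, False               # identical 32-bit values
        next_bit = (ip >> (31 - prefix_len)) & 1
        return True, next_bit == 1           # rule 1: next bit 0 means A <= B
    return False, head > prefix_bits         # rule 2: compare the heads


IP = 0xB0800000   # 1011 0000 1000 0000 ... in binary
```

With this address, the prefix 1011 matches and the address sorts at or below it, while the prefix 10 matches and the address sorts above it.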
                             Search Path

      Next Module: gets the next node address to
       read and also the matching prefix and its
       corresponding port number.
      It gets two signals for each prefix, Match and
          ComResult (compare):
             Match = 1 → the prefix matches.
             ComResult = 1 → the prefix is bigger.
             It takes the left address of the first prefix, from the
                 left, whose ComResult signal is 1.
             It compares the lengths of the matching prefixes and takes
                 the one with the largest length.
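The selection logic of the Next module can be modelled with two short scans over the per-prefix signals. This sketch assumes a node with N prefixes and N+1 child addresses, where `children[i]` is the left address of prefix i and `children[N]` is the rightmost one, and it assumes the rightmost address is used when no ComResult is 1. The function name and argument layout are illustrative.

```python
def next_step(entries, children, match, com_result):
    """Pick the next node address and the best match in one node.

    entries    -- list of (prefix_length, port), one per prefix
    children   -- list of N+1 child addresses
    match      -- list of Match signals, one per prefix
    com_result -- list of ComResult signals (True: prefix is bigger)
    """
    next_addr = children[-1]                 # assumed default: rightmost child
    for i, prefix_is_bigger in enumerate(com_result):
        if prefix_is_bigger:
            next_addr = children[i]          # left address of that prefix
            break
    best = None
    for (length, port), matched in zip(entries, match):
        if matched and (best is None or length > best[0]):
            best = (length, port)
    return next_addr, best


# Hypothetical node holding 10* (port 1), 1011* (port 2) and 110* (port 3),
# with signals for an address that starts with 10110000.
ENTRIES = [(2, 1), (4, 2), (3, 3)]
CHILDREN = [100, 101, 102, 103]
RESULT = next_step(ENTRIES, CHILDREN, [True, True, False], [False, True, True])
```

Here `RESULT` selects child address 101 and keeps the 4-bit prefix with port 2 as the longest match so far.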
                            Search Path

      Dispatch Module: forms the Routing Group
       Address (RGA) from the port number and
       sends it with the packet stored memory address
       (PSMA) or CID.
             RGA is a 64 bit bitmap. The bit corresponding to
                the port number is set to 1.
             PSMA is dispatched first and the port and DLL address
                follow.
      Caching Module: keeps a cache of IP addresses
       and corresponding ports.
                           IP address [32]     Port [8]
                             Search Path

      Caching Module:
      The cache is kept as a FIFO and its depth depends
         on the technology.
             Check for the IP address in the FIFO.
             If the address is found then
                 assert the Found signal.
                 write the IP address on top of the FIFO if it is not there
                 already.
             Else
                  write the IP address on top of the FIFO.
      The caching system always removes the last
         referenced IP address from the cache.
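One way to model this cache in software is an ordered map whose end plays the role of the top of the FIFO. This is a sketch under an interpretation that the slide leaves open: the entry removed on overflow is taken to be the one at the bottom, that is, the one referenced longest ago. The class name `RouteCache` and its methods are illustrative.

```python
from collections import OrderedDict


class RouteCache:
    """IP address -> port cache; the last item is the top of the FIFO."""

    def __init__(self, depth):
        self.depth = depth
        self.entries = OrderedDict()

    def lookup(self, ip):
        """Return the cached port (Found = 1) or None (Found = 0)."""
        if ip in self.entries:
            self.entries.move_to_end(ip)       # rewrite it on top of the FIFO
            return self.entries[ip]
        return None

    def store(self, ip, port):
        """Write an address on top and drop the bottom entry on overflow."""
        self.entries[ip] = port
        self.entries.move_to_end(ip)
        if len(self.entries) > self.depth:
            self.entries.popitem(last=False)   # assumed eviction: bottom entry


cache = RouteCache(depth=2)
cache.store("10.0.0.1", 3)
cache.store("10.0.0.2", 7)
hit = cache.lookup("10.0.0.1")      # refreshes 10.0.0.1
cache.store("10.0.0.3", 9)          # evicts 10.0.0.2
```

The depth is a constructor parameter because the slide ties it to the implementation technology.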
                               Search Process

    Operations: this is for large prefix tables (50K and up).
    [Timing diagram, in clock cycles (0, 2, 5, 8, 11, 14, 16, 19): Caching, then Piping for the root, Piping for each on chip memory node, Off Mem and Piping for the leaf node, then Dispatch.]
     • This operation is for one IP address lookup.
     • Piping is the bottleneck in the system and takes 5 cycles on average.
     • Assuming 100 MHz operation (10 ns per cycle, 50 ns per packet):
          # of packets = 10^9/50 = 20 million per second
          Line speed = 512 x 20 = 10.24 Gbit/s for 64 byte packets
                     = 256 x 8 x 20 = 41 Gbit/s for 256 byte packets.
     • It is possible to support higher speeds by duplicating the pipe.
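The throughput figures follow from the clock rate and the 5-cycle bottleneck. A minimal sketch of that arithmetic, with illustrative names:

```python
CLOCK_HZ = 100_000_000      # 100 MHz operation
CYCLES_PER_LOOKUP = 5       # average cost of the piping bottleneck

LOOKUPS_PER_SECOND = CLOCK_HZ // CYCLES_PER_LOOKUP


def line_rate_gbps(packet_bytes):
    """Line rate sustained when every packet needs one lookup."""
    return LOOKUPS_PER_SECOND * packet_bytes * 8 / 1e9


print(LOOKUPS_PER_SECOND)       # packets per second
print(line_rate_gbps(64))       # 64 byte packets
print(line_rate_gbps(256))      # 256 byte packets
```

Duplicating the pipe, as the last bullet suggests, would scale `LOOKUPS_PER_SECOND` by the number of pipes in this model.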
                            Output pins

             Pin Name      Type         Number             Comment
     DataIn (IP Addr)    Input      32             IP address, from parser
     DataOut(Port&cid)   Output    32              Port and cid, to scheduler
     DataBus(cpu)        In/out    32              CPU data bus
     CtrlBus(cpu)        In/out    12              CPU control bus
     MemData2(Tree)      In/out    205             If off chip is used
     MemAddr2(Tree)      In/out    18              If off chip is used
     MemCtrl1(Tree)      In/out    8?              If off chip is used
     Total                         340             This value can change
                                                   around 10 percent



