# tree-lookup

```
Advanced Topics in Computer Networks
Lecture 9: Tree-based lookup

University of Tehran
Dept. of EE and Computer Engineering
By:
Dr. Nasser Yazdani
Univ. of Tehran       Adv. topics in Computer Network   1
Outline
   Issues
   Multiway and Multicolumn search
   DMP-Tree
   Some implementation issues

Issues
   How to sort prefixes
   Prefixes as ranges
   Comparing prefixes
   Based on length
   Add extra bits at the end.
   New definition (DMP-tree)
   How to apply tree structures like binary tree
or m_way tree to prefixes

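Multiway tree lookup
   Proposed by G. Varghese and his students.
   Consider prefixes as ranges.
   First try: pad prefixes with 0s in order to apply
a binary search tree.
Consider the prefixes {1*, 101*, 10101*}, padded to 6 bits:
    Table entries:      100000   101000   101010
    Search addresses:   101011   101110   111110
Binary search ends after the last entry (101010) for all three
addresses, although each has a different longest matching prefix
(111110, for example, should match 100000, i.e. 1*).
Binary search fails for all of them!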
Multiway tree lookup (cont)
   Two problems in the previous example:
   The search ends far away from the matching prefix.
   Multiple addresses matching different prefixes end up in
the same region.
   Solution: treat prefixes as ranges and also put the end of
each range in the table.
    Entry     Type   Search address
    100000    L
    101000    L
    101010    L
    101011    H      101011
                     101110
    101111    H
                     111110
    111111    H
L marks the low end and H the high end of a range. We have the
explicit ranges, and each search maps to one range only.
Multiway tree lookup (cont)
    Entry     Type   Search address
    100000    L
    101000    L
    101010    L
    101011    H      101011
                     101110
    101111    H
                     111110
    111111    H

For 101011, we look for the first L that is not followed by an H.
For the rest, we can use a stack operation to find the first
L.
Problem: finding that L takes a linear search.

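Multiway tree lookup (cont)
Solution: Precompute the prefixes corresponding to
ranges.
    Entry           >     =
    P1) 100000      P1    P1
    P2) 101000      P2    P2
    P3) 101010      P3    P3
        101011      P2    P3
        101111      P1    P2
        111111      -     P1
P1 = 1*, P2 = 101*, P3 = 10101*. The ">" column gives the matching
prefix for an address that falls after the entry, the "=" column for an
address equal to it. For example, 111110 falls after 101111, so 1* (P1)
is its matching prefix.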
DMP-Tree
   Comparing prefixes.
   Sorting prefixes
   Binary prefix Tree.
   M_way prefix tree.

Trie structure

Sorting prefixes
   Question: Why can't well-known tree structures
be applied to the longest prefix matching problem?
   Answer: There is no well-known method for sorting prefixes.
   Definition: Assume A = a1a2…an and B = b1b2…bm
to be prefixes over an alphabet in which one character is
designated as the reference character.
   1. If n = m, the numerical values of A and B are
compared.
   2. If n ≠ m (assume n < m), the two substrings a1a2…an
and b1b2…bn are compared. If they differ, that comparison
decides the order. If a1a2…an and b1b2…bn are
equal, then the (n+1)th character of string B is checked:
B < A if bn+1 comes before the reference character, and B > A
otherwise.
Sorting prefixes (cont)
   Example: Assume M is the reference character. Then BOAT is smaller
than GOAT, and SAD is bigger than BALLOON. CAT is
considered bigger than CATEGORY, since the fourth
character in CATEGORY, E, is smaller than M.

   Sorting is a function that determines the position of each
prefix.

   The prefixes of the table are sorted as:
00010*, 0001*, 001100*, 01001100*, 0100110*, 01011001*,
01011*, 01*, 10*, 10110001*, 1011001*, 10110011*,
1011010*, 1011*, 110*
Binary prefix tree

• Unfortunately, it fails for 101100001000. Why?
• Prefixes are ranges and not just data points in the search space.

Binary prefix tree (cont)
   Definition: prefixes A and B are disjoint if
neither of them is a prefix of the other.
   Definition: prefix A is called an enclosure if
there exists at least one element in the set such that
A is a prefix of that element.
   We modify the sort structure:
1.      Each enclosure has a bag to hold the data elements it encloses.
2.      Sort the remaining elements.
3.      Distribute the bag elements to the right and left according
to the sort definition.
4.      Apply the algorithm recursively.
Binary prefix tree (cont)
   Example- Prefixes in table 1. First step.

The second step,

Note: enclosures are at a higher level than the
elements they contain (important!).
Binary prefix tree (cont)
   The final tree structure

Sorting prefixes (cont)
   Sorting algorithms
   Based on bubble sort
Sort(list):
    tmp = MinLength(list)
    for all i in list except tmp do
        compare i with tmp
        if i matches tmp then
            put i in tmp's bag
        else if i < tmp then
            put i in leftList
        else if i > tmp then
            put i in rightList
    endfor
    list = Sort(leftList), tmp, Sort(rightList)
M_way prefix tree
   Problems with the binary prefix tree:
     Two-way branching.
     The structure is not dynamic, and insertion may
cause problems.
1.       Divide by m after sorting the strings:
a static m_way tree.
2.       Build a dynamic data structure like a B-tree.
     How do we guarantee that an enclosure is at a higher
level than the elements it contains?
     Define node splitting and insertion.
M_way prefix tree (Cont)
    Node splitting: Finding the split point.
1.      Take the median if the data elements are
disjoint.
2.      If there is an enclosure containing other
elements, take it as split point.
3.      Otherwise, take an element which gives the best
splitting result.
    Note, this does not guarantee the final tree
will be balanced.

M_way prefix tree (Cont)
    Insertion:
1.      If the new element is not an enclosure of
others, find its place and insert it in the
corresponding leaf, as in a B-tree.
2.      Otherwise, replace the closest element with the new
element and reinsert the replaced elements.
3.      Re-sort the resulting subtree (space division)
if necessary.
    Building the tree is similar to building a B-tree.

M-way prefix tree (cont)
   Example
Prefix      Abbrv.      Prefix        Abbrv.
10          -           1101110010    K
01          -           10001101      L
110         -           11101101      M
1011        -           01010110      N
0001        -           00100101      O
01011       -           100110100     P
00010       -           101011011     Q
001100      A           11101110      R
1011001     B           10110111      S
1011010     C           011010        T
0100110     D           011011        U
01001100    E           011101        V
10110011    F           0110010       W
10110001    G           101101000     X
01011001    H           101101110     Y
001011      I           00011101      Z
00111010    J           011110110     II
M-way prefix tree (cont)
   We insert the prefixes randomly.
   The tree uses a branching factor of 5 (at most 4
prefixes in each node).
   Insert 01011, 1011010, 10110001 and
0100110. Then adding 110 causes an overflow.
Split the node:
                 10110001
(0100110, 01011)          (1011010, 110)
(all elements are disjoint)
M-way prefix tree (cont)
   Insert 10110011, 1101110010, 00010.
                  10110001            1011010
(00010, 0100110, 01011)   (1011001, 10110011)   (110, 1101110010)

(case 3 of splitting)
   Later, adding 1011 causes a problem. It is the
case of adding an enclosure. We will have
space division.

M-way prefix tree (cont)
   The final tree

• The tree generalizes the B-tree; the B-tree is a special
case of this tree. Hence, when the data elements are
relatively disjoint, the height of the tree is log_M N.
DMP-Tree
[Figure: max. height of the tree (axis 0 to 25) versus branching factor (BF=3 to BF=10), one series per No. of Data (50 to 100).]
• BF is the branching factor in the internal nodes.
• No. of Data is in 1000s.

DMP-Tree

[Figure: min. memory utilization (axis 0.62 to 0.68) versus branching factor (3 to 10), one series per No. of Data (50K to 100K).]

• The number of prefixes is shown on the right.

DMP-Tree

• Height of the tree for 100K data prefixes.
[Figure: height (axis 0 to 25) versus branching factor (3 to 10), with three series: Min, Ave. and Max.]

DMP-Tree
Analysis of results.
    With increasing BF (branching factor), the
height decreases.
    The results are for the worst case (max.
height); the average case is much lower.
    After BF=9, increasing the branching factor
does not decrease the max. height.
    The results are for sets of 50,000-100,000
prefixes with lengths from 8 to 31.
The number of actual prefixes in use is around
50,000, and their length is 8-31.
DMP-Tree
Memory utilization:
    Mem. utilization is 0.64%-0.67% without
    Mem. utilization is 0.53%-0.62% with tree
    Without considering branching pointers,
mem. utilization decreases as the branching factor
increases.
    Total mem. utilization increases as the
branching factor increases.

DMP-Tree
Therefore,
   The longest matching prefix of a
network can be determined in 5 steps
with a branching factor of 9 or more.
   In the worst case, we need memory of at most 2
times the total prefix data size
to implement the scheme. For instance,
for 50,000 prefixes of 32 bits, we need at
most 50,000 x 32 x 2 = 3.2 Mbit of memory.

Overall Design
    All operations first need a search in the tree
structure.
    There are two search procedures, one for the
longest matching prefix and another for
update.
    The prefix tree data structure is on the
chip.
    The policy table is in off-chip memory.
    There is a port to data link layer mapping
module.
Tree Nodes
   Internal nodes.
[Figure: internal node layout. Entries 1, 2, … each hold a prefix field [33] and a field of unspecified width [?], with further fields [?] around them; internal nodes sit above the leaf nodes.]
   Each prefix has a left and a right pointer, which
point to its left and right subtrees
respectively.
   We can have N prefixes in each internal node.
Then N+1 is the branching factor.
   The bigger N, the faster the search, but the
more logic is needed.
   Port is the address of the port in the switch to
Tree Nodes
   Leaf nodes.
Prefix 1    port        Prefix 2     port
[33]        [?]         [33]         [?]             …

   There are no left and right subtree pointers.
   The number of prefixes in the leaf node is M.
   The leaf nodes are stored in off-chip memory
to make the scheme scalable to a large
number of prefixes.

Branching Factor
   What is the best number for N (the branching
factor)?
   The bigger N, the faster the search process (Fact 1).
   The bigger N, the more memory pins and
usually the more memory bandwidth are needed
(Fact 2).
   The bigger N, the more logic we need to process
the node (Fact 3).
   Simulation results show:
    The bigger N, the better the memory
utilization.
    For N ≥ 8, the max. height of the tree does not
decrease any further.

Simulation result
Total memory: assuming one memory block and
OC-192.
# of       Required      Branching  Mem.   Mem. BW  Max. mem.  Mem. size       Max.
prefixes   mem. (Mbit)   factor     pins   (G/s)    accesses   (on chip, mm2)  height
64K        5.4           15         897    89.7     4          81              5
64K        5.5           11         655    65.5     4          82              5
64K        5.4           9          527    52.7     4          81              5
64K        6             6          335    46.9     6          96              7
64K        6.6           5          275    44       7          110             8
64K        6.5           4          207    62.1     14         112             15
100K       8.3           15         897    89.7     4          122             5
100K       8.5           11         655    65.5     4          125             5
100K       8.3           9          527    63.24    5          122             6
100K       9             6          335    53.6     7          135             8
100K       9.1           5          275    49.5     8          140             9
100K       9.5           4          207    62.1     14         150             15
30K        2.6           9          527    52.7     4          40              5     (expected)
Branching Factor
   It seems any number between 8 and 16 is
reasonable, but N=9 gives a better search
time and memory size.
   Assuming a branching factor of 9 in the internal
nodes, 50% node utilization and 128K
prefixes, we need at most 128K/4.5 = 28.5K nodes,
so the right pointers are more than enough. But
we need more for off-chip addressing.
   The number of switch ports is usually
limited, around 64. We can assume 256;
then 8 bits are enough for them.
Branching Factor
   In order to make the internal node
branching and the leaf node branching even,
M=10.
   If we want to read a node at once, we will
need 41x10=410 pins, which is difficult to
support in one chip.
   We can divide a node in two and read/write it
in two clock cycles. This reduces the memory
pins to 205, which is affordable.
Memory requirement
    Prefix tree: assuming 128K prefixes,
N = 9 (BF) and M = 10 (BF in leaves). The majority of
prefixes, 80%, will be in leaves; assume 65% node
utilization.
Avg. # of prefixes in a leaf node = 10 x 0.65 = 6.5
# of leaf nodes ≈ 128K x 80% x 2 / 6.5 = 31.5K, plus 10% ≈ 35K
Total off-chip memory = 35K x 205 (mem. BW) = 7.2 Mbit
Then we need 16 bits for addressing, plus 1 bit for
internal/external.
# of internal nodes = 128K x 20% / 5.8 = 4.41K, plus 10% ≈ 4.9K
Total on-chip memory = 4.9K x 529 ≈ 2.6 Mbit; 48 bit addr.
Memory requirement
      In summary:
# of       On-chip   Branching  On-chip    Off-chip  Mem. accesses  Mem. size       Off-chip
prefixes   memory    factor     mem. pins  memory    (search)       (on chip, mm2)  mem. pins
           (Mbit)                          (Mbit)
128K       2.6       10         529        7.2       5              40              250
Note:
   Branching factor is the # of branches in internal nodes.
   The size of the memory scales with the size of the data, i.e. the
# of prefixes.
   Power dissipation depends on the r/w frequency, current and core voltage.
   A 10K x 32-bit single-port memory measures 36 x 1.45 mm2.

Overall Design

[Figure: block diagram of the overall design. Labels: Memory, Mem. Ctrl, Update, Search (root content), To/From NP, Insertion/update/delete, CPU interface Ctrl (To/From CPU), MEM, Output mem Ctrl (To/From out Mem).]
Search Path
[Figure: block diagram of the search path. Blocks: Input, Piping, GetLen, Compare, Next, Caching, Dispatch, and Mem Ctrl (root node, node, to/from off-chip memory). Signals: Data[32], InClk[1], SOA[1], Len[Nx6], First[1], Match[1], Found[1], Port[8], DataOut[32] to the scheduler.]

There are data assertion signals between blocks which have not been
shown everywhere because of space limitations.
Search Path
    Input Module:
     Gets the packet destination addresses from the
parser.
     Does parity checking.
     It has the following input signals:
    Input data, which is 32 bits.
    Start of Address, 1 bit (SOA)
    Parity, 1 bit (prty)
    Input clock (InClk)
     It gets data in two clock cycles: first the IP
address, then the packet memory address or packet id (cid).
Search Path

    Input Module:
29 bits are used for the packet address and the
last 3 bits for the policy. Then 512 Mbytes can
be supported to store the packets before
sending them out.
  The 2nd clock cycle data format:
    bits 31-3: packet address (or cid)    bits 2-0: policy
[Figure: timing of InClk and SOA.]

Search Path

    Piping Module:
     Pipelines the search process.
For new elements from the input block:
For each new IP address do
    If found in the hash table
        send the packet memory address to the dispatcher
    Else
        enter the IP address and the policy into the pipe FIFO
End do

Search Path

    Piping Module:
For elements in the FIFO:
For the first IP address in the FIFO do
    If the IP address is new then
        assert the First signal and send the IP address and policy out.
    Else if the next addr. is on chip, send the next node address to Mem. Ctrl.
    Else send the next node address to OffMemCtrl.
    Send the IP address and policy to the pipe.
    If the node was a leaf then
        Send the longest matching address to OutMemCtrl.
        Send the policy to Extract port and the packet address to Dispatch.
    Else
        Put the IP address into the FIFO.
        Replace the longest matching prefix address if a new one is found.
Search Path

    Piping Module:
     FIFO: keeps the current information of the IPs.
IP Addr [32] | Port [8] | Next Node [19] | LMPA [] | New [1]
LMPA = Longest Matching Prefix Address
New = 1 new, 0 old
If the packet is new, the next address will be zero.
The address is off chip if the first (most
significant) bit is 1; otherwise it is on chip.
Search Path

    GetLen Module: This module gets the length
of a prefix. We add a 1 to the end of the prefix
and then pad it with '0's to make it 33
bits.
Ex. 11011010 → 110110101000…0 (33 bits).
Then we scan from the right: everything to the left of
the first '1' we meet is the prefix, which gives its length.
GetLen can be implemented as a multiplexer with
a case statement (32 cases), and it can
be done in one clock cycle.
Search Path

      Compare Module: compares two prefixes A
and B with lengths L1 and L2.
Assume L1 > L2, and let A[1:L2] be the first L2 bits of
A. Then:
1.       If A[1:L2] = B → A and B match. If A[L2+1] = 0
→ A < B. Otherwise, A > B.
2.       If A[1:L2] ≠ B: A > B when A[1:L2] > B; otherwise A < B.
One of the prefixes here is the IP address, with length
32.
We assume there are no two identical elements
in the tree.
Search Path

      Next Module: gets the next node address to
read, and also the matching prefix and its
corresponding port number.
It gets two signals for each prefix, Match and
ComResult (compare):
Match = 1 → the prefix matches,
ComResult = 1 → the prefix is bigger.
It takes the left address of the first prefix, from the
left, whose ComResult signal is 1.
It compares the lengths of the matching prefixes and takes
the one with the largest length.
Search Path

      Dispatch Module: forms the Routing Group
Address (RGA) from the port number and
sends it with the packet stored memory address
(PSMA) or CID.
RGA is a 64-bit bitmap. The bit corresponding to the
port number is set to 1.
PSMA is dispatched first, and the port and DLL address
follow.
      Caching Module: keeps a cache of IP addresses.
Entry format: IP [32] | Port [8]
Search Path

      Caching Module:
The cache is kept as a FIFO, and its depth depends
on the technology.
If the IP address is found in the cache
    assert the Found signal.
    write the IP address on top of the FIFO if it is not there
Else
    write the IP address on top of the FIFO
The caching system always removes the last
referenced IP address from the cache.
Search Process

[Figure: timeline of the operations for one lookup, for large prefix sets (50K and up): caching, then piping stages alternating with reads of the root, the on-chip memory nodes and the off-chip leaf node, and finally dispatch; time axis marked 0, 2, 5, 8, 11, 14, 16, 19.]
• This operation is for one IP address lookup.
• Piping is the bottleneck in the system and on average takes 5 cycles.
• Assuming 100 MHz operation (5 cycles = 50 ns):
# of packets = 10^9/50 = 20 million per second
Line speed = 512 x 20 M = 10.24 G for 64-byte packets
           = 256 x 8 x 20 M = 41 G for 256-byte packets.
• It is possible to support higher speeds by duplicating the pipe.
Output pins

Pin Name             Type      Number   Comment
DataOut(Port&cid)    Output    32       Port and cid, to scheduler
DataBus(cpu)         In/out    32       CPU data bus
CtrlBus(cpu)         In/out    12       CPU control bus
MemData2(Tree)       In/out    205      If off chip is used
MemAddr2(Tree)       In/out    18       If off chip is used
MemCtrl1(Tree)       In/out    8?       If off chip is used
Total                          340      This value can change by around 10 percent


```
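
The prefix ordering from the "Sorting prefixes" slides can be tried out on the example table. The sketch below is only an illustration under stated assumptions, not the lecture's implementation: prefixes are written as bit strings without the trailing `*`, the alphabet is binary, and a next bit of 0 is taken to come before the reference character while 1 does not (this is inferred from the sorted list on the slide). The names `compare_prefixes`, `sort_prefixes` and `TABLE` are hypothetical.

```python
from functools import cmp_to_key

def compare_prefixes(a: str, b: str) -> int:
    """Return -1, 0 or 1 for a < b, a == b, a > b under the slide's order.

    Assumption: prefixes are bit strings such as "1011" (no trailing '*').
    """
    n = min(len(a), len(b))
    if a[:n] != b[:n]:
        # The common-length parts differ. For equal-length bit strings,
        # string order and numerical order agree.
        return -1 if a[:n] < b[:n] else 1
    if len(a) == len(b):
        return 0
    # One prefix encloses the other: check the next bit of the longer one.
    # Assumption: next bit 0 makes the longer prefix smaller, 1 makes it bigger.
    a_is_longer = len(a) > len(b)
    longer = a if a_is_longer else b
    longer_is_smaller = longer[n] == "0"
    if a_is_longer:
        return -1 if longer_is_smaller else 1
    return 1 if longer_is_smaller else -1

def sort_prefixes(prefixes):
    """Sort a list of bit-string prefixes with the comparison above."""
    return sorted(prefixes, key=cmp_to_key(compare_prefixes))

# The first 15 prefixes of the example table.
TABLE = [
    "10", "01", "110", "1011", "0001", "01011", "00010", "001100",
    "1011001", "1011010", "0100110", "01001100", "10110011",
    "10110001", "01011001",
]

if __name__ == "__main__":
    print(", ".join(p + "*" for p in sort_prefixes(TABLE)))
```

With these 15 prefixes, 00010 sorts first and 110 last, and an enclosure such as 1011 lands after the longer prefixes that continue with a 0 bit (10110001, 1011001, 10110011, 1011010).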
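
The 33-bit prefix encoding described for the GetLen module (append a 1 to the prefix, pad with 0s, then scan from the right for the first 1) can be checked with a few lines of Python. This is a software sketch under assumptions, not the hardware multiplexer from the slides: prefixes are given as bit strings of length 0 to 32, and the names `encode_prefix`, `get_len` and `decode_prefix` are hypothetical.

```python
WIDTH = 33  # encoded word width from the slides

def encode_prefix(prefix: str) -> int:
    """Append a marker 1 to the prefix bits and pad with 0s to 33 bits."""
    if len(prefix) > WIDTH - 1 or any(c not in "01" for c in prefix):
        raise ValueError("prefix must be a bit string of at most 32 bits")
    bits = prefix + "1" + "0" * (WIDTH - 1 - len(prefix))
    return int(bits, 2)

def get_len(word: int) -> int:
    """Recover the prefix length by locating the rightmost 1 (the marker)."""
    # word & -word isolates the lowest set bit; its position is the
    # number of padding zeros to the right of the marker.
    trailing_zeros = (word & -word).bit_length() - 1
    return WIDTH - 1 - trailing_zeros

def decode_prefix(word: int) -> str:
    """Return the prefix bits stored to the left of the marker."""
    length = get_len(word)
    if length == 0:
        return ""
    return format(word >> (WIDTH - length), "b").zfill(length)

if __name__ == "__main__":
    w = encode_prefix("11011010")
    print(format(w, "033b"), get_len(w), decode_prefix(w))
```

Because the marker bit is always present, the encoded word is never zero, so the lowest-set-bit trick is always well defined, including for the empty prefix and for a full 32-bit address.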