Distributed Hash Tables: A Modular Approach

Gurmeet Singh Manku
Ph.D. Thesis at Stanford University, 2004
Advisor: Prof. Rajeev Motwani

27 May 2004
Design Principle I: Flat Address Space

[Figure: hosts with IDs 0.00, 0.010, 0.011, 0.100, 0.101, 0.110, 0.111 placed along the unit interval [0, 1)]

  Each host acquires an ID in [0, 1) and “manages” a “partition”.
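A minimal sketch of the flat address space: a key is hashed to a point in [0, 1), and its manager is the host whose partition contains that point. The helper names `to_unit_interval` and `manager_of` are illustrative, not part of the thesis.

```python
import hashlib
from bisect import bisect_right

def to_unit_interval(key: str) -> float:
    """Hash an arbitrary key to a point in [0, 1)."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def manager_of(point: float, host_ids: list[float]) -> float:
    """Return the ID of the host that manages `point`: each host
    manages the partition from its own ID up to the next host's ID,
    wrapping around the circle at 1.0."""
    ids = sorted(host_ids)
    i = bisect_right(ids, point) - 1
    return ids[i] if i >= 0 else ids[-1]  # wrap-around past 0

hosts = [0.0, 0.25, 0.5, 0.75]
print(manager_of(to_unit_interval("some-key"), hosts))
```

With equi-spaced hosts as above, each host manages an interval of length 0.25; the slides that follow ask how close real ID-assignment schemes can get to this ideal.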
Design Principle II: Structured Overlay

Overlay = “Circle” + “Routing Network”

Problem: how to insert/lookup hash entries with no global knowledge of who manages what?
Solution: an overlay of TCP links; insert/lookup hash entries by “routing”.

“Circle”: for a strong core, for “correctness”
“Routing Network”: for short routes, for “efficiency”

A Modular Framework for DHTs

DHT Data Management Layer

DHT Routing Layer:
- Choice of Routing Network (description of any family of deterministic/randomized ICNs defined over powers-of-two nodes)
- “Circle” + Emulation Engine (handles dynamism & scale; aware of physical network-proximity)
- ID Management (ensures fine-grained partition balance)

Low-level Routines
The ID Assignment Problem

How does a new host acquire an ID in [0, 1)?

Desired properties:
- No global knowledge of the “current set of IDs”
- Low cost (# messages)
- Almost equi-sized partitions

Can a system enjoy all 3 properties simultaneously?
Naïve ID Assignment   [SMK+01, RD01, RFHK01] [MNR02, FG03, KK03] [MBR03, HJS+03, ZHS+04]

Choose “r”, a random number in [0, 1).

σ = Partition-balance ratio
  = Ratio of the largest to the smallest partition

σ = Θ(log² n) with n hosts in the system; σ > 100 when n = 4K.

Can we do better? Perhaps, if we could “learn” a few IDs.
One RANDOM PROBE and split the manager encountered
  = one “Random Walk” down the tree and split the leaf encountered.

A host with a k-bit ID “manages” an interval of size 2^-k.
Only leaf nodes of the tree correspond to IDs in [0, 1).

Leaves lie in O(log log n) different levels.
σ = Θ(log n) with n hosts in the system.

Can we make σ = Θ(1)?
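Under this trie view, σ is determined purely by leaf depths. A minimal sketch (the helper name `balance_ratio` is mine):

```python
def balance_ratio(leaf_ids: list[str]) -> float:
    """Partition-balance ratio sigma: a host with a k-bit ID manages
    an interval of size 2**-k, so sigma = 2**(deepest - shallowest)."""
    depths = [len(leaf) for leaf in leaf_ids]
    return 2.0 ** (max(depths) - min(depths))

# The hosts from the first slide: .00, .010, .011, .100, .101, .110, .111
print(balance_ratio(["00", "010", "011", "100", "101", "110", "111"]))  # → 2.0
```

σ = Θ(1) therefore amounts to keeping all leaves within a constant number of levels of each other.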
A New ID Assignment Algorithm [M04]

1. RANDOM walk down the tree.
2. Walk up until the sub-tree has (c log n) leaves.
3. Split the “shallowest leaf” below the sub-tree.

Leaves lie in at most 3 different levels w.h.p.:
  [log2 n], [log2 n] ± 1, where [x] denotes the integer closest to x.
So σ ≤ 4.
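The three steps can be sketched over an explicit set of binary leaf IDs. This is an illustrative reconstruction, not the paper's data structures: the function name `new_host_id` and the choice c = 4 are my own.

```python
import math
import random

def new_host_id(leaves: set[str], c: float = 4.0) -> None:
    """One arrival under the [M04] sketch: random walk down, walk up
    to a sub-tree with >= c*log2(n) leaves, split its shallowest leaf."""
    n = len(leaves)
    # 1. RANDOM walk down the tree until the prefix is a leaf.
    prefix = ""
    while prefix not in leaves:
        prefix += random.choice("01")
    # 2. Walk up until the sub-tree rooted at `prefix` has (c log n) leaves.
    target = min(n, math.ceil(c * math.log2(max(n, 2))))
    while prefix and sum(leaf.startswith(prefix) for leaf in leaves) < target:
        prefix = prefix[:-1]
    # 3. Split the shallowest leaf below the sub-tree: the new host
    #    takes one half of that leaf's interval.
    subtree = [leaf for leaf in leaves if leaf.startswith(prefix)]
    shallowest = min(subtree, key=len)
    leaves.remove(shallowest)
    leaves.update({shallowest + "0", shallowest + "1"})

random.seed(0)
leaves = {"0", "1"}
for _ in range(500):
    new_host_id(leaves)
print(sorted({len(leaf) for leaf in leaves}))  # [M04]: at most 3 levels w.h.p.
```

Each split preserves total measure, so the leaf intervals always tile [0, 1) exactly; the interesting claim of [M04] is how tightly the depths cluster.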

Improved ID Assignment [KM04*]

1 RANDOM PROBE followed by 1 LOCAL PROBE of size (c log n):
  Cost = Θ(R + log n), where R is the cost of one RANDOM PROBE.

Compare: (c log n) RANDOM PROBES [AAABMP03] [NW03]:
  Cost = Θ(R log n)

In general [KM04*]: “a” RANDOM PROBEs followed by LOCAL PROBEs of size “b”,
with ab > c log n, give σ ≤ 4 at Cost = Θ(aR + b); a_opt ≈ log log n.
Emulation Engine in Action

Choice of Topology → “Circle” + Emulation Engine + ID Assignment

Deterministic topologies:
- Hypercube → Pastry [RD01]
- Variant of hypercube → Chord [SMK+01]
- Multi-dimensional mesh
- de Bruijn graph → D2B [FG03], Koorde [KK03], [NW03], [AAA+03], [LKRG03]

Randomized topologies:
- Kleinberg’s construction [Kle00] → Symphony [MBR03]
- Randomized “standard” butterfly → Viceroy [MNR02]
- Randomized “Kleinberg-style” butterfly
- Randomized Chord & hypercubes → [CDHR03] [GGG03]
- Skip-lists / skip-graphs → [AS03] [HJS+03]
The Emulation Engine [M03, M04]

Input: a description of a parallel network over static sets of nodes
(successive powers of two: 2^0, 2^1, 2^2, 2^3, 2^4, ...).

ID Assignment Schemes [M04] [KM04*] guarantee that leaves are in at most 3 levels:
  [log2 n], [log2 n] ± 1, where [x] denotes the integer closest to x.

Emulation with “Naïve ID Assignment” was shown in [M03].
Emulation Engine Details

THREE sets of links per node:
- 1 set with d-bit ID (top d bits)
- 1 set with (d-1)-bit ID (top d-1 bits)
- 1 set with (d-2)-bit ID (top d-2 bits)

Example: HYPERCUBE. A node with 5-bit ID .01101 makes links with the managers of:
  Level [log2 n] - 1 (3-bit prefix .011):  .111   .001   .010
  Level [log2 n]     (4-bit prefix .0110): .1110  .0010  .0100  .0111
  Level [log2 n] + 1 (5-bit ID .01101):    .11101 .00101 .01001 .01111

EVERY node makes a set of links assuming a ([log2 n] – 1)-bit ID.
ROUTING: If the source has a d-bit ID, routing starts off at level (d-2),
moving down one level at an intermediate node, if required.
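The three link sets for the .01101 example can be computed mechanically: each target flips exactly one bit of the relevant prefix. The function name `hypercube_links` is illustrative; note the code emits all one-bit flips per prefix, while the slide's 5-bit column omits the flip of the lowest bit.

```python
def hypercube_links(node_id: str) -> dict[int, list[str]]:
    """For a d-bit ID, emit link targets for the top-(d-2), top-(d-1)
    and top-d bit prefixes: every string obtained by flipping exactly
    one bit of that prefix."""
    d = len(node_id)
    links: dict[int, list[str]] = {}
    for k in (d - 2, d - 1, d):
        prefix = node_id[:k]
        links[k] = [
            prefix[:j] + ("1" if prefix[j] == "0" else "0") + prefix[j + 1:]
            for j in range(k)
        ]
    return links

for k, targets in hypercube_links("01101").items():
    print(k, targets)
```

Each target string is then resolved to an actual host via the circle: the link goes to the manager of that prefix's interval.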
Physical-Network Proximity [MG04*]

The Problem:
  All links in the “Routing Network” have high latency (ping-times)!
  How do we make these links low-latency?

Existing solutions [GGG+03] [ZGG03] are tied to specific networks (Chord/hypercube).

A Generic Solution [MG04*]:
  A generic scheme to incorporate network proximity into any
  deterministic/randomized family of ICNs.
Symphony: A Randomized DHT Routing Network [MBR03]

Each host makes k links. The clockwise distance of a link is drawn from the probability density function
  p(x) = 1 / (x ln N)   for x in [1/N, 1],
which integrates to 1:
  ∫ from 1/N to 1 of p(x) dx = ∫ from 1/N to 1 of dx / (x ln N) = 1.

With k = 1, GREEDY routing requires O(log² n) hops.
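This pdf is easy to sample by inverting its CDF: F(x) = 1 + ln x / ln N on [1/N, 1], so a uniform u in [0, 1) maps to x = N^(u-1). A sketch (the function name is mine):

```python
import random

def symphony_distance(N: int) -> float:
    """Sample a clockwise link distance from p(x) = 1/(x ln N) on
    [1/N, 1) by inverse-CDF: u uniform in [0, 1) gives x = N**(u - 1)."""
    return N ** (random.random() - 1)

random.seed(1)
N = 2**20
samples = [symphony_distance(N) for _ in range(100_000)]
# Under this pdf, exactly half the probability mass lies below 1/sqrt(N),
# since F(1/sqrt(N)) = 1/2.
below = sum(s < N ** -0.5 for s in samples) / len(samples)
print(below)
```

The heavy weight on short distances, combined with occasional long jumps, is what gives the small-world routing behavior.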
Greedy Routing

Chord (log N links per node):
- Clockwise distance N/2
- Clockwise distance N/4
- Clockwise distance N/8
  ...
- Clockwise distance 1

R-Chord (log N links per node) [ZGG03, GGG+03]:
- Clockwise distance uniformly at random from [N/2, N)
- Clockwise distance uniformly at random from [N/4, N/2)
- Clockwise distance uniformly at random from [N/8, N/4)
  and so on ...

GREEDY: “Choose the link that makes ‘most progress’ towards the destination.”

GREEDY with 1-LOOKAHEAD: “GREEDY augmented by taking IDs of neighbors’ neighbors into account.”

How good are these?
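GREEDY on the deterministic Chord links can be sketched directly. Assuming every ID in [0, N) hosts a node (an idealization; in a real DHT each finger points at the manager of that point), the hop count from source to destination is the number of 1-bits in their clockwise distance:

```python
def greedy_route(src: int, dst: int, log_N: int) -> int:
    """GREEDY on deterministic Chord links (clockwise distances
    N/2, N/4, ..., 1): repeatedly take the largest jump that does
    not overshoot the destination. Returns the hop count, assuming
    every ID in [0, N) hosts a node."""
    N = 1 << log_N
    fingers = [N >> i for i in range(1, log_N + 1)]  # N/2, N/4, ..., 1
    cur, hops = src, 0
    while cur != dst:
        remaining = (dst - cur) % N
        jump = max(d for d in fingers if d <= remaining)
        cur = (cur + jump) % N
        hops += 1
    return hops

# Hop count equals the popcount of the clockwise distance:
print(greedy_route(0, 0b1011, 6))  # → 3
```

Averaged over random destinations this gives (log N)/2 hops, i.e. Θ(log n) steps, which is why the randomized constructions on the next slide are interesting.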
The Story that Emerges [MNW04]

In topologies with n nodes and log n links per node ...

A. Deterministic topologies (Hypercube/Chord) have diameter Θ(log n).
   GREEDY routing is optimal. 1-LOOKAHEAD is no help.

B. Randomization reduces the diameter to Θ(log n / log log n) in expectation.
   R-Hypercube/R-Chord/Symphony have diameter Θ(log n / log log n).

C. GREEDY is unable to discover short routes.
   Path lengths are Θ(log n) steps on average.

D. GREEDY with 1-LOOKAHEAD is able to discover short routes.
   Path lengths reduce to Θ(log n / log log n) steps on average.

E. GREEDY with 1-LOOKAHEAD is competitive with de Bruijn networks.
   Average path lengths are within 10% of each other.

                              More results in [M03] [GM04] [AMM04*]
Papers & Manuscripts

[BMR03]  “SETS: Search Enhanced by Topic Segmentation”
         by Bawa, Manku & Raghavan, SIGIR 2003.

[MBR03]  “Symphony: Distributed Hashing in a Small-World”
         by Manku, Bawa & Raghavan, USITS 2003.

[M03]    “Routing Networks for Distributed Hash Tables”
         by Manku, PODC 2003.

[GM04]   “Optimal Routing in Chord”
         by Ganesan & Manku, SODA 2004.

[MNW04]  “Know thy Neighbor’s Neighbor: The Power of Lookahead in Randomized P2P Routing”
         by Manku, Naor & Wieder, STOC 2004.

[M04]    “Balanced Binary Trees for ID Management in Distributed Hash Tables”
         by Manku, PODC 2004.

[AMM04*] “The Degree-Diameter Greedy Routing Problem”
         by Abraham, Malkhi & Manku, submitted to DISC 2004.

[KM04*]  “Structured Coupon Collection over Cliques”
         by Kenthapadi & Manku, manuscript.

[MG04*]  “Low-Latency DHTs based on Arbitrary Geometries”
         by Manku & Ganesan, manuscript.
