Distributed Hash Tables: An Overview

    Ashwin Bharambe
    Carnegie Mellon University
Definition of a DHT

- Hash table → supports two operations
  - insert(key, value)
  - value = lookup(key)
- Distributed
  - Map hash buckets to nodes
- Requirements
  - Uniform distribution of buckets
  - Cost of insert and lookup should scale well
  - Amount of local state (routing table size) should scale well
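As a concrete rendering of this interface, here is a toy, centralized sketch (class and helper names are hypothetical, not from the talk). A real DHT distributes the per-node stores across machines and replaces the modulo mapping with the routing schemes covered next:

```python
import hashlib

class ToyDHT:
    """Centralized stand-in for the DHT interface above."""

    def __init__(self, node_ids):
        self.nodes = sorted(node_ids)             # participating nodes
        self.store = {n: {} for n in self.nodes}  # each node's local state

    def _node_for(self, key):
        # Hash the key into a bucket, then map the bucket to a node.
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def insert(self, key, value):
        self.store[self._node_for(key)][key] = value

    def lookup(self, key):
        return self.store[self._node_for(key)].get(key)
```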
Fundamental Design Idea - I

- Consistent Hashing
  - Map keys and nodes to an identifier space; implicit
    assignment of responsibility
[Figure: nodes A, B, C and D and a key placed on the identifier
 space from 0000000000 to 1111111111]
- Mapping performed using hash functions (e.g., SHA-1)
  - Spreads nodes and keys uniformly throughout the space
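A minimal sketch of this assignment, assuming a small identifier space (the 16-bit size and helper names are mine): nodes and keys hash onto the same circle, and a key is owned by the first node at or after its identifier, wrapping around.

```python
import hashlib
from bisect import bisect_left

BITS = 16  # toy identifier space: [0, 2^16)

def ident(name):
    """Map a node name or key onto the identifier space via SHA-1."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (1 << BITS)

def owner(ring, key_id):
    """Consistent hashing: the key belongs to its successor node."""
    ring = sorted(ring)
    return ring[bisect_left(ring, key_id) % len(ring)]  # wrap around

nodes = [ident(n) for n in ("A", "B", "C", "D")]
print(owner(nodes, ident("some-key")))
```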
Fundamental Design Idea - II

- Prefix / Hypercube routing

[Figure: route from Source to Destination, correcting one digit
 of the identifier per hop; an inset zooms in on a single step]
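For intuition, a bit-fixing sketch of hypercube routing on integer IDs (the function and its encoding are mine, not the talk's): each hop corrects one leading bit, so the prefix shared with the destination grows every step.

```python
def hypercube_route(src, dst, bits):
    """Route by fixing one differing bit per hop, highest-order bit first."""
    node, path = src, [src]
    for i in reversed(range(bits)):
        mask = 1 << i
        if (node ^ dst) & mask:  # this bit still differs
            node ^= mask         # one overlay hop corrects it
            path.append(node)
    return path

print([f"{n:04b}" for n in hypercube_route(0b0000, 0b1011, 4)])
# -> ['0000', '1000', '1010', '1011']
```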
But, there are so many of them!

- DHTs are hot!
- Scalability trade-offs
  - Routing table size at each node vs.
  - Cost of lookup and insert operations
- Simplicity
  - Routing operations
  - Join-leave mechanisms
- Robustness
Talk Outline

- DHT Designs
  - Plaxton Trees, Pastry/Tapestry
  - Chord
  - Overview: CAN, Symphony, Koorde, Viceroy, etc.
  - SkipNet
- DHT Applications
  - File systems, Multicast, Databases, etc.
- Conclusions / New Directions
Plaxton Trees [Plaxton, Rajaraman, Richa]

- Motivation
  - Access nearby copies of replicated objects
  - Time-space trade-off
    - Space = routing table size
    - Time = access hops
Plaxton Trees
   Algorithm
    1. Assign labels to objects and nodes
       - using randomizing hash functions
[Figure: an object labeled 9 A E 4 and a node labeled 2 4 7 B]

   Each label has log_{2^b} n digits
Plaxton Trees
   Algorithm
2. Each node knows about other nodes with varying
   prefix matches

[Figure: routing table of node 2 4 7 B --
 prefix match of length 0: nodes 1..., 3...
 prefix match of length 1: nodes 2 3..., 2 5...
 prefix match of length 2: nodes 2 4 6..., 2 4 8...
 prefix match of length 3: nodes 2 4 7 A, 2 4 7 C]
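A sketch of how such a table might be populated, assuming labels are equal-length digit strings (all names here are mine):

```python
def build_routing_table(my_id, all_ids):
    """Row i maps each digit d to some node that shares our first i
    digits and has digit d at position i, as in the figure above."""
    table = [{} for _ in my_id]
    for other in all_ids:
        if other == my_id:
            continue
        i = 0  # length of the common prefix between the two labels
        while i < len(my_id) and other[i] == my_id[i]:
            i += 1
        table[i].setdefault(other[i], other)  # keep one candidate per slot
    return table

table = build_routing_table("247B", ["1A00", "3C12", "23FF", "25D2",
                                     "2460", "2481", "247A", "247C"])
```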
Plaxton Trees
  Object Insertion and Lookup

   Given an object, route successively towards nodes
   with greater prefix matches

[Figure: from node 2 4 7 B, the route for object 9 A E 4 passes
 through 9 F 1 0, 9 A 7 6 and 9 A E 2 -- one more matching digit
 per hop]

   log(n) steps to insert or locate an object

   Store the object at each of these locations
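Continuing the sketch above, a hypothetical lookup loop over those tables -- each hop moves to a node matching one more digit of the object's label, and routing stops at the root, the node with the longest match:

```python
def plaxton_route(start, object_id, tables):
    """`tables` maps a node ID to its routing table (built as above)."""
    node, hops = start, [start]
    for i, digit in enumerate(object_id):
        if node[i] == digit:
            continue                      # digit i already matches
        nxt = tables[node][i].get(digit)  # a node matching one more digit
        if nxt is None:
            break                         # no closer node known: the root
        node = nxt
        hops.append(node)
    return hops
```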
Plaxton Trees
  Why is it a tree?

[Figure: the object is replicated at 9 A E 2, 9 A 7 6, 9 F 1 0
 and 2 4 7 B; the storing nodes form a tree rooted at the node
 closest to the object's label]
Plaxton Trees
  Network Proximity
- Overlay tree hops could be totally unrelated
  to the underlying network hops

[Figure: a single overlay hop may cross between the USA, Europe
 and East Asia in the underlying network]

- Plaxton trees guarantee a constant-factor
  approximation!
  - Only when the topology is uniform in some sense
Pastry

- Based directly upon Plaxton Trees
- Exports a DHT interface
- Stores an object only at the node whose ID is
  closest to the object ID
- In addition to the main routing table
  - Maintains a leaf set of nodes
  - The closest L nodes (in ID space)
    - Typically L = 2^(b+1) -- one digit's worth to the left and right
Pastry
[Figure: the object 9 A E 2 is stored only at the root; routes
 from 2 4 7 B, 9 F 1 0 and 9 A 7 6 all lead to it]

   Key Insertion and Lookup = Routing to the Root
   → Takes O(log n) steps
Pastry
 Self Organization
- Node join
  - Start with a node "close" to the joining node
  - Route a message to the nodeID of the new node
  - Take the union of the routing tables of the nodes
    on the path
- Joining cost: O(log n)
- Node leave
  - Update the routing table
    - Query nearby members in the routing table
  - Update the leaf set
Chord [Karger, et al]
- Map nodes and keys to identifiers
  - Using randomizing hash functions
- Arrange them on a circle

[Figure: identifier circle with x = 010110110,
 succ(x) = 010111110 and pred(x) = 010110000]
Chord
 Efficient routing
- Routing table
  - i-th entry = succ(n + 2^i)
  - log(n) finger pointers

[Figure: identifier circle with exponentially spaced
 finger pointers]
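A toy construction of the finger table (the 9-bit space and names are mine):

```python
BITS = 9  # toy identifier space: [0, 2^9)

def successor(ring, x):
    """First node at or after identifier x; `ring` must be sorted."""
    x %= 1 << BITS
    return next((n for n in ring if n >= x), ring[0])  # wrap around

def finger_table(n, ring):
    """The i-th finger of node n is succ(n + 2^i), as on the slide."""
    ring = sorted(ring)
    return [successor(ring, n + (1 << i)) for i in range(BITS)]
```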
Chord
 Key Insertion and Lookup

   To insert or look up a key x,
   route to succ(x)

[Figure: the source routes around the identifier circle
 to succ(x)]

   O(log n) hops for routing
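A greedy lookup sketch over those finger tables (names are mine; real Chord adds successor lists and handles joins and failures): each hop follows the finger that lands closest to the key without passing it.

```python
BITS = 9  # same toy identifier space as the finger-table sketch

def chord_route(start, key, fingers):
    """`fingers` maps a node to its finger table (see sketch above)."""
    space = 1 << BITS
    dist = lambda a, b: (b - a) % space  # clockwise distance on the circle
    node, hops = start, [start]
    while True:
        # fingers lying strictly inside the arc (node, key]
        closer = [f for f in fingers[node]
                  if f != node and dist(node, f) <= dist(node, key)]
        if not closer:
            return hops  # succ(node) owns the key: routing is done
        node = min(closer, key=lambda f: dist(f, key))
        hops.append(node)
```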
Chord
 Self-organization
- Node join
  - Set up finger i: route to succ(n + 2^i)
  - log(n) fingers ⇒ O(log^2 n) cost
- Node leave
  - Maintain a successor list for ring connectivity
  - Update the successor list and finger pointers
CAN [Ratnasamy, et al]

- Map nodes and keys to coordinates in a
  multi-dimensional Cartesian space

[Figure: 2-D coordinate space partitioned into zones; a lookup
 routes from the source's zone to the zone holding the key]

   Routing through the shortest Euclidean path

   For d dimensions, routing takes O(d n^(1/d)) hops
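A greedy-forwarding sketch of this, with zone centers standing in for zones (the data layout and names are mine):

```python
import math

def can_route(source, key, neighbors):
    """Forward to the neighbor whose coordinates are closest to the key.
    `neighbors` maps a node's coordinate tuple to its neighbors' tuples."""
    node, path = source, [source]
    while True:
        nxt = min(neighbors[node], key=lambda p: math.dist(p, key))
        if math.dist(nxt, key) >= math.dist(node, key):
            return path  # no neighbor is closer: this zone holds the key
        node = nxt
        path.append(node)
```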
Symphony [Manku, et al]

- Similar to Chord -- mapping of nodes and keys
  - But the k links are constructed probabilistically!

[Figure: a long link spanning distance x is chosen with
 probability P(x) = 1/(x ln n)]

   Expected routing guarantee: O((1/k) log^2 n) hops
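That harmonic distribution is easy to sample by inverse transform; a sketch with distances normalized to (0, 1] (the function name is mine):

```python
import random

def symphony_link_distance(n):
    """Draw a long-link distance x in [1/n, 1] with density 1/(x ln n):
    if u is uniform on [0, 1], then x = n^(u - 1) has exactly that law."""
    return n ** (random.random() - 1)
```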
SkipNet [Harvey, et al]

- Previous designs distribute data uniformly
  throughout the system
  - Good for load balancing
  - But my data can be stored in Timbuktu!
  - Many organizations want stricter control over data
    placement
  - What about the routing path?
    - Should a Microsoft → Microsoft end-to-end path pass
      through Sun?
SkipNet
  Content and Path Locality
  Basic Idea: Probabilistic skip lists

[Figure: skip-list pointer levels of increasing height stacked
 over the list of nodes]

- Each node chooses a height at random
  - Choose height h with probability 1/2^h
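Sampling that height is just counting coin flips; a sketch:

```python
import random

def skip_height():
    """Return height h with probability 1/2^h: flip until the first tails."""
    h = 1
    while random.random() < 0.5:
        h += 1
    return h
```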
SkipNet
  Content and Path Locality
[Figure: skip list over nodes machine1.cmu.edu, machine2.cmu.edu
 and machine1.berkeley.edu]

   Still an O(log n) routing guarantee!

- Nodes are lexicographically sorted by name
Summary (Ah, at last!)

                  # Links per node      Routing hops
Pastry/Tapestry   O(2^b log_{2^b} n)   O(log_{2^b} n)
Chord             log n                O(log n)
CAN               d                    O(d n^(1/d))
SkipNet           O(log n)             O(log n)
Symphony          k                    O((1/k) log^2 n)
Koorde            d                    O(log_d n)
Viceroy           7                    O(log n)

   Koorde and Viceroy are optimal: they match the lower bound
   on the links-vs-hops trade-off
What can DHTs do for us?

- Distributed object lookup
  - Based on object ID
- Decentralized file systems
  - CFS, PAST, Ivy
- Application Layer Multicast
  - Scribe, Bayeux, SplitStream
- Databases
  - PIER
Decentralized file systems

- CFS [Chord]
  - Block-based read-only storage
- PAST [Pastry]
  - File-based read-only storage
- Ivy [Chord]
  - Block-based read-write storage
PAST

- Store a file
  - Insert (filename, file) into Pastry
  - Replicate the file at the leaf-set nodes
- Cache if there is empty space at a node
CFS

- Blocks are inserted into the Chord DHT
  - insert(blockID, block)
  - Replicated at the successor-list nodes
- Read the root block through the public key of the
  file system
- Look up other blocks from the DHT
  - Interpret them to be the file system
- Cache on the lookup path
CFS

[Figure: the root block is signed and named by the file system's
 public key; it points via H(D) to directory block D, which points
 via H(F) to file block F, which in turn points via H(B1) and H(B2)
 to data blocks B1 and B2]
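The content-hash naming in this figure could be sketched as follows (`dht.insert` is the generic interface from the start of the talk; the helper is mine). A block's DHT key is the hash of its contents, so parents embed their children's hashes and readers can verify every block they fetch:

```python
import hashlib

def put_block(dht, block: bytes) -> str:
    """Store a block under its content hash and return that hash (its key)."""
    bid = hashlib.sha1(block).hexdigest()
    dht.insert(bid, block)
    return bid

# Children go in first; the parent then names them by hash:
# b1 = put_block(dht, data1)
# b2 = put_block(dht, data2)
# f  = put_block(dht, (b1 + b2).encode())  # file block listing its data blocks
```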
CFS vs. PAST

- Block-based vs. file-based
  - Insertion, lookup and replication
- CFS has better performance for small,
  popular files
  - Performance comparable to FTP for larger files
- PAST is susceptible to storage imbalances
  - Plaxton trees can provide it network locality
Ivy

- Each user maintains a log of updates
- To construct the file system, scan the logs of all users

[Figure: Alice's log head chains create, write and delete
 records; Bob's log head chains delete, ex-create, link and
 write records]
Ivy

- Starting from the log head every time is stupid
  - Make periodic snapshots
- Conflicts will arise
  - For resolution, use any tactics (e.g., Coda's)
Application Layer Multicast

- Embed multicast tree(s) over the DHT graph
- Multiple sources; multiple groups
  - Scribe
  - CAN-based multicast
  - Bayeux
- Single source; multiple trees
  - SplitStream
Scribe

[Figure: the underlying Pastry DHT, with a new member about to
 join a multicast group]
Scribe
 Tree construction

[Figure: the new member routes towards the multicast groupID;
 the node whose ID is closest to the groupID is the rendezvous
 point]
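A sketch of the construction (all names are mine; `route(a, b)` stands in for Pastry's routing primitive and returns the overlay path): every node the join message passes through becomes a forwarder, and the message stops where it meets the existing tree.

```python
def scribe_join(member, group_id, route, forwarders):
    """`forwarders[group_id]` is the set of nodes already in the tree."""
    tree = forwarders.setdefault(group_id, set())
    for node in route(member, group_id):
        if node in tree:
            break  # reached the existing tree: graft here
        tree.add(node)
```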
Scribe
  Discussion
- Very scalable
  - Inherits scalability from the DHT
- Anycast is a simple extension
- How good is the multicast tree?
  - As compared to native IP multicast
  - Comparison to Narada
- Node heterogeneity not considered
SplitStream

- Single-source, high-bandwidth multicast
- Idea
  - Use multiple trees instead of one
  - Make them internal-node-disjoint
    - Every node is an internal node in only one tree
  - Satisfies bandwidth constraints
  - Robust
- Uses cute Pastry prefix-routing properties to
  construct node-disjoint trees
Databases, Service Discovery



       SOME    OTHER    TIME!
Where are we now?

- Many DHTs offering efficient and relatively
  robust routing
- Unanswered questions
  - Node heterogeneity
  - Network-efficient overlays vs. structured overlays
    - Conflict of interest!
  - What happens with a high user churn rate?
  - Security
Are DHTs a panacea?
- A useful primitive
- Tension between network-efficient construction
  and uniform key-value distribution
- Does every non-distributed application use
  only hash tables?
  - There are many rich data structures that cannot be built
    on top of hash tables alone
  - Exact-match lookups are not enough
  - Does any P2P file-sharing system use a DHT?

				