					         L3S Research Center, University of Hannover




  Dynamics of Distributed Hash Tables

                 Wolf-Tilo Balke and Wolf Siberski
                            21.11.2007



*Original slides provided by S. Rieche, H. Niedermayer, S. Götz, K. Wehrle (University of Tübingen)
 and A. Datta, K. Aberer (EPFL Lausanne)



Peer-to-Peer Systems and Applications, Springer LNCS 3485                                             1




Overview


1. CHORD (again)


2. New node joins: Stabilization


3. Reliability in Distributed Hash Tables


4. Storage Load Balancing in Distributed Hash Tables


5. The P-GRID Approach



                                            Distributed Hash Tables
L3S Research Center                                                                                   2
Distributed Hash Tables

  Distributed Hash Tables (DHTs)
     Also known as structured Peer-to-Peer systems
     Efficient, scalable, and self-organizing algorithms
     For data retrieval and management

  Chord (Stoica et al., 2001)
     Scalable Peer-to-peer Lookup Service for Internet Applications
     Nodes and data are mapped with a
     hash function onto a Chord ring
     Routing
         Routing (“Finger”) tables
         O(log N)



                                 Distributed Hash Tables
L3S Research Center                                                   3




CHORD

  A consistent hash function assigns each node and
  each key an m-bit identifier using SHA-1 (Secure Hash
  Standard).
     m = any number big enough to make collisions improbable
     Key identifier = SHA-1(key)
     Node identifier = SHA-1(IP address)

  Both are uniformly distributed
  Both exist in the same ID space
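
  A minimal Python sketch of this mapping; the modulus m = 6 and the sample
  inputs are illustrative, not taken from the slides:

      import hashlib

      M = 6                      # identifier length in bits (example value)
      ID_SPACE = 2 ** M          # size of the Chord identifier circle

      def chord_id(value: str) -> int:
          # SHA-1 yields a 160-bit digest; reduce it modulo 2^m
          digest = hashlib.sha1(value.encode("utf-8")).digest()
          return int.from_bytes(digest, "big") % ID_SPACE

      node_id = chord_id("134.169.32.171")    # node identifier = SHA-1(IP address)
      key_id  = chord_id("my-document.pdf")   # key identifier  = SHA-1(key)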




                                 Distributed Hash Tables
L3S Research Center                                                   4
CHORD



  Identifiers are arranged on a
  closed identifier circle
  (e.g. modulo 2^6)


 => Chord Ring




                             Distributed Hash Tables
L3S Research Center                                    5




CHORD


   A key k is assigned to the
   node whose identifier is equal
   to or greater than the key's
   identifier
   This node is called
   successor(k) and is the first
   node clockwise from k
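
   As an illustration, successor(k) over a sorted list of node identifiers can
   be written with a bisection search (a local helper, not the distributed
   lookup itself):

       from bisect import bisect_left

       def successor(node_ids, k):
           # node_ids: sorted node identifiers on the ring, k: key identifier;
           # returns the first node equal to or clockwise after k
           idx = bisect_left(node_ids, k)
           return node_ids[idx % len(node_ids)]    # wrap past the largest id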




                             Distributed Hash Tables
L3S Research Center                                    6
CHORD


 // ask node n to find the successor of id
 n.find_successor(id)
   if (id ∈ (n, successor])
     return successor;
   else
     // forward the query around the circle
     return successor.find_successor(id);

  => Number of messages linear in the
  number of nodes!
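
  A hedged Python rendering of this linear lookup; the wrap-around interval
  test is the detail the pseudocode leaves implicit, and Node is a simplified
  local object:

      def between(x, a, b):
          # x in the half-open ring interval (a, b], wrap-around aware
          if a < b:
              return a < x <= b
          return x > a or x <= b

      class Node:
          def __init__(self, ident):
              self.ident = ident
              self.successor = self      # fixed up when the node joins the ring

          def find_successor(self, key_id):
              # one hop per step around the circle: O(N) messages
              if between(key_id, self.ident, self.successor.ident):
                  return self.successor
              return self.successor.find_successor(key_id)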




                               Distributed Hash Tables
L3S Research Center                                            7




CHORD



    Additional routing information to accelerate lookups
    Each node n contains a routing table with up to m
    entries (m: number of bits of the identifiers)
    => finger table
    i-th entry in the table at node n contains the first node
    s that succeeds n by at least 2^(i-1)
    s = successor (n + 2^(i-1))
    s is called the i-th finger of node n




                               Distributed Hash Tables
L3S Research Center                                            8
CHORD




 Finger table:
 finger[i] :=
     successor (n + 2^(i-1))
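
 A sketch of how the finger table could be computed, assuming a global, sorted
 list of all node identifiers (purely illustrative; a real node fills its
 fingers via lookups):

     from bisect import bisect_left

     def ring_successor(node_ids, k):
         # first node whose identifier is equal to or follows k on the ring
         idx = bisect_left(node_ids, k)
         return node_ids[idx % len(node_ids)]

     def build_finger_table(node_ids, n, m):
         # finger[i] = successor(n + 2^(i-1)) for 1 <= i <= m, identifiers mod 2^m
         return [ring_successor(node_ids, (n + 2 ** (i - 1)) % (2 ** m))
                 for i in range(1, m + 1)]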




                                Distributed Hash Tables
L3S Research Center                                        9




CHORD




   Search in finger table for
   the node which most
   immediately precedes id
   Invoke find_successor
   from that node



=> Number of messages O(log N)
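
 A sketch of the finger-based lookup, assuming Node objects with .ident,
 .successor and a .fingers list whose first entry is the direct successor (all
 names are illustrative, as in the earlier sketches):

     def between(x, a, b):
         # x in the half-open ring interval (a, b], wrap-around aware
         return a < x <= b if a < b else (x > a or x <= b)

     def between_open(x, a, b):
         # x in the open ring interval (a, b), wrap-around aware
         return a < x < b if a < b else (x > a or x < b)

     def closest_preceding_finger(node, key_id):
         # scan the finger table from the farthest entry down to the nearest
         for finger in reversed(node.fingers):
             if between_open(finger.ident, node.ident, key_id):
                 return finger
         return node

     def find_successor(node, key_id):
         # each hop roughly halves the remaining distance: O(log N) messages
         while not between(key_id, node.ident, node.successor.ident):
             node = closest_preceding_finger(node, key_id)
         return node.successor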




                                Distributed Hash Tables
L3S Research Center                                       10
CHORD


    Important characteristics of this scheme:
        Each node stores information about only a small number of
        nodes (m)
        Each node knows more about nodes closely following it than
        about nodes farther away
        A finger table generally does not contain enough information
        to directly determine the successor of an arbitrary key k




                              Distributed Hash Tables
L3S Research Center                                                    11




Overview


1. CHORD (again)


2. New node joins: Stabilization


3. Reliability in Distributed Hash Tables


4. Storage Load Balancing in Distributed Hash Tables


5. The P-GRID Approach



                              Distributed Hash Tables
L3S Research Center                                                    12
Volatility

  Stable CHORD networks can always rely on their finger
  tables for routing


  But what happens, if the network is volatile (churn)?


  Lecture 3: Nodes can fail, depart or join
      Failure is handled by a ‘soft-state’ approach with periodical republishing
      by nodes
      Departure can be handled by clean shutdown (unpublish) or simply
      as failure
      Joining needs stabilization of the CHORD ring



                                Distributed Hash Tables
 L3S Research Center                                                          13




CHORD Stabilization




         [Figure: three ring snapshots (1), (2), (3) illustrating a new node joining]



                                Distributed Hash Tables
 L3S Research Center                                                          14
CHORD Stabilization



    Stabilization protocol for a node x:


    x.stabilize():
        ask successor y for its predecessor p
        if p ∈ (x; y] then p is x's new successor
    x.notify():
        notify x's successor y of x's existence
        the notified node may change its predecessor to x
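
 The protocol as a Python sketch, assuming Node objects with .ident, .successor
 and .predecessor attributes; this only illustrates the two steps above, not a
 full Chord implementation:

     def between(x, a, b):
         # x in the half-open ring interval (a, b], wrap-around aware
         return a < x <= b if a < b else (x > a or x <= b)

     def stabilize(x):
         # x asks its successor y for y's predecessor p; if p lies between x
         # and y on the ring, p becomes x's new successor; then x notifies it
         p = x.successor.predecessor
         if p is not None and between(p.ident, x.ident, x.successor.ident):
             x.successor = p
         notify(x.successor, x)

     def notify(y, x):
         # the notified node y may adopt x as its predecessor
         if y.predecessor is None or between(x.ident, y.predecessor.ident, y.ident):
             y.predecessor = x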




                               Distributed Hash Tables
L3S Research Center                                                 15




CHORD Stabilization




                               • N26 joins the system

                               • N26 acquires N32 as its successor

                               • N26 notifies N32

                               • N32 acquires N26 as its predecessor




                               Distributed Hash Tables
L3S Research Center                                                 16
CHORD Stabilization




                        • N26 copies keys

                        • N21 runs stabilize() and asks its
                        successor N32 for its predecessor
                        which is N26.




                      Distributed Hash Tables
L3S Research Center                                           17




CHORD Stabilization




                        • N21 acquires N26 as its successor

                        • N21 notifies N26 of its existence

                        • N26 acquires N21 as its predecessor




                      Distributed Hash Tables
L3S Research Center                                           18
CHORD Stabilization




        [Figure: ring snapshots (1), (2), (3) summarizing the join and stabilization]



                        Distributed Hash Tables
L3S Research Center                                     19




Overview


 1. CHORD (again)


 2. New node joins: Stabilization


 3. Reliability in Distributed Hash Tables


 4. Storage Load Balancing in Distributed Hash Tables


 5. The P-GRID Approach



                        Distributed Hash Tables
L3S Research Center                                     20
“Stabilize” Function

  Stabilize Function to correct inconsistent connections
  Remember:
      Periodically done by each node n
      n asks its successor for its predecessor p
      n checks if p equals n
      n also periodically refreshes random finger x
          by (re)locating successor

  Successor-list to find a new successor
      If the successor is not reachable, use the next node in the successor list
      Start the stabilize function
  But what happens to data in case of node failure?


                                  Distributed Hash Tables
 L3S Research Center                                                  21




Reliability of Data in Chord

  Original
      No Reliability of data
  Recommendation
      Use of Successor-List
      The reliability of data is an application task
      Replicate inserted data to the next f other nodes
      Chord informs application of arriving or failing nodes
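
 A sketch of this recommendation, assuming each node exposes a .storage dict and
 a .successor link; the replication factor f = 3 is only an example value:

     F = 3   # replication factor "f" from the slides (illustrative)

     def put_with_replicas(responsible_node, key_id, value):
         # store on successor(key_id) and on its next f successors
         node = responsible_node
         for _ in range(F + 1):
             node.storage[key_id] = value
             node = node.successor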




                                  Distributed Hash Tables
 L3S Research Center                                                  22
Properties

  Advantages
      After failure of a node, its successor already has the data stored
  Disadvantages
      Node stores f intervals
            More data load
      After breakdown of a node
          Find new successor
          Replicate data to next node
              More message overhead at breakdown
      Stabilize-function has to check every Successor-list
          Find inconsistent links
              More message overhead


                                    Distributed Hash Tables
 L3S Research Center                                                      23




Multiple Nodes in One Interval

  Fixed positive number f
      Indicates the minimum number of nodes that have to act within one interval
  Procedure
      First node takes a random position
      A new node is assigned to any existing node
      Node is announced to all other nodes in same interval
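
  A sketch of this procedure, assuming illustrative interval objects with a
  .nodes list and node objects with a .known_peers set and an .interval
  reference (none of these names come from the slides):

      def join_interval(new_node, existing_node):
          # the new node joins the interval of an arbitrary existing node and
          # is announced to every node already acting in that interval
          interval = existing_node.interval
          for peer in interval.nodes:
              peer.known_peers.add(new_node)
              new_node.known_peers.add(peer)
          interval.nodes.append(new_node)
          new_node.interval = interval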


         [Figure: ring intervals, each populated by several nodes (e.g. nodes 1-3,
         4-5, 6-8, 9-10)]

                                    Distributed Hash Tables
 L3S Research Center                                                      24
Multiple Nodes in One Interval

  Effects of algorithm
      Reliability of data
      Better load balancing
      Higher security




         [Figure: ring intervals, each populated by several nodes]

                                  Distributed Hash Tables
 L3S Research Center                                                          25




Reliability of Data

  Insertion
      Copy of documents
          Always necessary for replication
      Little additional expense
          Nodes only have to store pointers to nodes from the same interval
      Nodes store only data of one interval





                                  Distributed Hash Tables
 L3S Research Center                                                          26
Reliability of Data

  Reliability
      Failure: no copy of data needed
          Data are already stored within same interval
      Use stabilization procedure to correct fingers
          As in original Chord




         [Figure: ring intervals, each populated by several nodes]

                                      Distributed Hash Tables
 L3S Research Center                                                     27




Properties

  Advantages
      Failure: no copy of data needed
      Rebuild intervals with neighbors only if critical
      Requests can be answered by f different nodes


  Disadvantages
      Fewer intervals than in original Chord
          Solution: Virtual Servers




                                      Distributed Hash Tables
 L3S Research Center                                                     28
Fault Tolerance: Replication vs. Redundancy

  Replication
      Each data item is replicated K times
      K replicas are stored on different nodes
  Redundancy
      Each data item is split into M fragments
          K redundant fragments are computed
              Use of an "erasure code" (see e.g. V. Pless: Introduction to the
              Theory of Error-Correcting Codes. Wiley-Interscience, 1998)
          Any M fragments allow to reconstruct the original data
      For each fragment we compute its key
          M + K different fragments have different keys
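
 The slides do not prescribe how the fragment keys are derived; one plausible
 convention is to hash the original key together with the fragment number (the
 erasure code itself is out of scope here):

     import hashlib

     def fragment_keys(key, m, k):
         # one DHT key per fragment: fragment i of "key" is stored under
         # SHA-1(key || i), giving the M + K fragments distinct keys
         return [int.from_bytes(hashlib.sha1(f"{key}#{i}".encode()).digest(), "big")
                 for i in range(m + k)]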




                                  Distributed Hash Tables
 L3S Research Center                                                             29




Overview


 1. CHORD (again)


 2. New node joins: Stabilization


 3. Reliability in Distributed Hash Tables


 4. Storage Load Balancing in Distributed Hash Tables


 5. The P-GRID Approach



                                  Distributed Hash Tables
 L3S Research Center                                                             30
Storage Load Balancing in Distributed Hash Tables

  Suitable hash function (easy to compute, few collisions)

  Standard assumption 1: uniform key distribution
      Every node with equal load
      No load balancing is needed
  Standard assumption 2: equal distribution
      Nodes across address space
      Data across nodes

  But is this assumption justifiable?
      Analysis of distribution of data using simulation



                                Distributed Hash Tables
 L3S Research Center                                                                 31




Storage Load Balancing in Distributed Hash Tables

 Analysis of distribution of data

 [Chart: optimal distribution of documents across nodes]

 Example
     Parameters
         4,096 nodes
         500,000 documents
     Optimum
         ~122 documents per node



   No optimal distribution in Chord without load balancing



                                Distributed Hash Tables
 L3S Research Center                                                                 32
Storage Load Balancing in Distributed Hash Tables

   Number of nodes without
   storing any document
       Parameters
          4,096 nodes
          100,000 to 1,000,000
          documents


       Some nodes without any load


   Why is the load unbalanced?
   We need load balancing to keep the complexity
   of DHT management low


                                  Distributed Hash Tables
 L3S Research Center                                                                  33




Definitions

  Definitions
      System with N nodes
      The load is optimally balanced if
          the load of each node is around 1/N of the total load.


      A node is overloaded (heavy) if
          the node has a significantly higher load compared to the optimal
          distribution of load.
      Otherwise the node is light




                                  Distributed Hash Tables
 L3S Research Center                                                                  34
Load Balancing Algorithms

  Problem
      Significant difference in the load of nodes

  Several techniques to ensure an equal data distribution
      Power of Two Choices (Byers et al., 2003)
      Virtual Servers (Rao et al., 2003)
      Thermal-Dissipation-based Approach (Rieche et al., 2004)
      A Simple Address-Space and Item Balancing (Karger et al., 2004)
      …




                                Distributed Hash Tables
 L3S Research Center                                                    35




Outline

  Algorithms
      Power of Two Choices (Byers et al., 2003)
      Virtual Servers (Rao et al., 2003)




  John Byers, Jeffrey Considine, and Michael Mitzenmacher:
  “Simple Load Balancing for Distributed Hash Tables“ in
  Second International Workshop on Peer-to-Peer Systems
  (IPTPS), Berkeley, CA, USA, 2003.




                                Distributed Hash Tables
 L3S Research Center                                                    36
Power of Two Choices

  Idea
     One hash function for all nodes
         h0
     Multiple hash functions for data
         h1, h2, h3, …hd


  Two options
     Data is stored at one node
     Data is stored at one node &
     other nodes store a pointer




                                   Distributed Hash Tables
L3S Research Center                                                37




Power of Two Choices

  Inserting Data
     Results of all hash functions are calculated
         h1(x), h2(x), h3(x), …hd(x)
     Data is stored on the retrieved node with the lowest load
     Alternative
         Other nodes store a pointer
     The owner of a data item has to insert the document periodically
         Prevent removal of data after a timeout (soft state)
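
 A sketch of this insertion rule; lookup(id), load_of(node) and the .storage
 dict are assumed DHT primitives, and the d data hash functions are derived
 from SHA-1 with a salt purely for illustration:

     import hashlib

     def candidate_ids(key, d, id_space):
         # d data hash functions h_1 .. h_d applied to the same key
         return [int.from_bytes(hashlib.sha1(f"{i}:{key}".encode()).digest(), "big")
                 % id_space for i in range(1, d + 1)]

     def insert(key, value, d, id_space, lookup, load_of):
         # store the item on the least-loaded of the d candidate nodes
         candidates = [lookup(h) for h in candidate_ids(key, d, id_space)]
         target = min(candidates, key=load_of)
         target.storage[key] = value
         return target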




                                   Distributed Hash Tables
L3S Research Center                                                38
Power of Two Choices (cont'd)

  Retrieving
     Without pointers
         Results of all hash functions are calculated
         Request all of the possible nodes in parallel
         One node will answer
     With pointers
         Request only one of the possible nodes.
         Node can forward the request directly to the final node
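
 The matching retrieval without pointers, reusing candidate_ids from the
 insertion sketch; the slides query the candidates in parallel, the loop here
 is a sequential simplification:

     def retrieve(key, d, id_space, lookup):
         # ask every candidate node; whichever stores the key answers
         for h in candidate_ids(key, d, id_space):
             value = lookup(h).storage.get(key)
             if value is not None:
                 return value
         return None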




                                  Distributed Hash Tables
L3S Research Center                                                39




Power of Two Choices (cont'd)

  Advantages
     Simple


  Disadvantages
     Message overhead at inserting data
     With pointers
         Additional administration of pointers
              More load
     Without pointers
         Message overhead at every search




                                  Distributed Hash Tables
L3S Research Center                                                40
Outline

  Algorithms
      Power of Two Choices (Byers et al., 2003)
      Virtual Servers (Rao et al., 2003)




  Ananth Rao, Karthik Lakshminarayanan, Sonesh
  Surana, Richard Karp, and Ion Stoica: “Load Balancing
  in Structured P2P Systems” in Second International
  Workshop on Peer-to-Peer Systems (IPTPS), Berkeley,
  CA, USA, 2003.



                               Distributed Hash Tables
 L3S Research Center                                                     41




Virtual Server

  Each node is responsible for several intervals
      "Virtual server"
  Example
      Chord
      [Figure: Chord ring with the virtual servers of nodes A, B, and C]   [Rao 2003]

                               Distributed Hash Tables
 L3S Research Center                                                     42
Rules

   Rules for transferring a virtual server
        From heavy node to light node


   1. The transfer of a virtual server must not make the receiving node heavy
   2. The virtual server is the lightest virtual server that makes the heavy
      node light
   3. If there is no virtual server whose transfer can make the heavy node light,
      the heaviest virtual server from this node is transferred
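
   A sketch of how these rules could be applied, assuming virtual servers with a
   .load attribute and a target_load threshold separating light from heavy
   (both the data layout and the threshold are assumptions):

       def pick_virtual_server(heavy, light, target_load):
           movable = [vs for vs in heavy.virtual_servers
                      if light.load + vs.load <= target_load]           # rule 1
           if not movable:
               return None
           makes_light = [vs for vs in movable
                          if heavy.load - vs.load <= target_load]       # rule 2
           if makes_light:
               return min(makes_light, key=lambda vs: vs.load)          # lightest such
           return max(movable, key=lambda vs: vs.load)                  # rule 3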




                                Distributed Hash Tables
 L3S Research Center                                                           43




Virtual Server

  Each node is responsible for several intervals
      log (n) virtual servers
  Load balancing
      Different possibilities to change servers
          One-to-one
          One-to-many
          Many-to-many
      Copy of an interval is like removing
      and inserting a node in a DHT




                                Distributed Hash Tables
 L3S Research Center                                                           44
Scheme 1: One-to-One

  One-to-One
      Light node picks a random ID
      Contacts the node x responsible for it
      Accepts load if x is heavy
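
  A sketch of one probing round; lookup(), is_heavy() and transfer() are
  hypothetical helpers (transfer() could, for instance, be built on
  pick_virtual_server from the previous slide's sketch):

      import random

      def one_to_one_round(light, id_space, lookup, is_heavy, transfer):
          # a light node probes a random identifier; if the responsible node
          # is heavy, part of its load (a virtual server) moves to the light node
          x = lookup(random.randrange(id_space))
          if is_heavy(x):
              transfer(x, light)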


      [Figure: heavy (H) and light (L) nodes on the ring; a light node probes a
      random identifier and takes load from a heavy node]              [Rao 2003]

                               Distributed Hash Tables
L3S Research Center                                                            45




Scheme 2: One-to-Many

  One-to-Many
      Light nodes report their load information to directories
      Heavy node H gets this information by contacting a directory
      H contacts the light node which can accept the excess load
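
  A sketch of the directory matching step, assuming the directory is a dict from
  light nodes to their reported spare capacity; picking the tightest fit is an
  assumption, not taken from the slides:

      def one_to_many_match(directory, heavy):
          # the heavy node asks a directory for a light node able to absorb
          # its excess load
          excess = heavy.load - heavy.target_load
          fitting = [n for n, spare in directory.items() if spare >= excess]
          return min(fitting, key=lambda n: directory[n]) if fitting else None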


      [Figure: light nodes L1-L5 report to directories D1 and D2; heavy nodes
      H1-H3 query the directories]                                     [Rao 2003]

                               Distributed Hash Tables
L3S Research Center                                                            46
Scheme 3: Many-to-Many

   Many-to-Many
       Many heavy and light nodes rendezvous at each step
       Directories periodically compute the transfer schedule and report it
       back to the nodes, which then do the actual transfer

      [Figure: light nodes L1-L5 and heavy nodes H1-H3 rendezvous at directories
      D1 and D2, which compute the transfer schedule]                  [Rao 2003]

                                   Distributed Hash Tables
 L3S Research Center                                                             47




Virtual Server

  Advantages
      Easy shifting of load
           Whole Virtual Servers are shifted


  Disadvantages
      Increased administrative and message overhead
           Maintenance of all finger tables
      Much load is shifted




                                                                           [Rao 2003]

                                   Distributed Hash Tables
 L3S Research Center                                                             48
Simulation

     Scenario
         4,096 nodes (comparison with other measurements)
         100,000 to 1,000,000 documents
     Chord
         m = 22 bits
         Consequently, 2^22 = 4,194,304 possible identifiers for nodes and documents
     Hash function
         SHA-1 (mod 2^m)
         random
     Analysis
         Up to 25 runs per test


                                  Distributed Hash Tables
    L3S Research Center                                               49




Results

     Without load balancing
         + Simple
         + Original
         – Bad load balancing

     Power of Two Choices
         + Simple
         + Lower load
         – Nodes w/o load



                                  Distributed Hash Tables
    L3S Research Center                                               50
Results (cont'd)

             Virtual server




         + No nodes w/o load
         – Higher max. load than
           Power of Two Choices

                           Distributed Hash Tables
 L3S Research Center                                    51




Overview


 1. CHORD (again)


 2. New node joins: Stabilization


 3. Reliability in Distributed Hash Tables


 4. Storage Load Balancing in Distributed Hash Tables


 5. The P-GRID Approach



                           Distributed Hash Tables
 L3S Research Center                                    52
        L3S Research Center, University of Hannover



•     Toy example: Distributing skewed load

      [Figure: peers 1-8 placed over the key-space [0, 1] according to a skewed
      load-distribution]


Peer-to-Peer Systems and Applications, Springer LNCS 3485                                53




        L3S Research Center, University of Hannover

    • A globally coordinated recursive bisection approach
    • Key-space can be divided in two partitions
        Assign peers proportional to the load in the two sub-partitions

      [Figure: the key-space [0, 1] bisected, with peers assigned to the two
      halves according to the load-distribution]

Peer-to-Peer Systems and Applications, Springer LNCS 3485                                54
       L3S Research Center, University of Hannover

   • Recursively repeat the process to repartition the sub-partitions

     [Figure: further bisection of the sub-partitions over the key-space [0, 1]
     and its load-distribution]


Peer-to-Peer Systems and Applications, Springer LNCS 3485                 55




       L3S Research Center, University of Hannover


 • Partitioning of the key-space s.t. there is equal load in each partition
        Uniform replication of the partitions
        Important for fault-tolerance

 • Note: A novel and general load-balancing problem

   [Figure: resulting partitions of the key-space [0, 1], each replicated by
   several peers, matching the load-distribution]


Peer-to-Peer Systems and Applications, Springer LNCS 3485                 56
       L3S Research Center, University of Hannover


  Lessons from the globally coordinated algorithm


  • Achieves an approximate load-balance
  • The intermediate partitions may be such that
    they cannot be perfectly repartitioned
         There’s a fundamental limitation with any bisection based
         approach, as well as for any fixed key-space partitioned
         overlay network
  • Nonetheless practical
         For realistic load-skews and peer populations




Peer-to-Peer Systems and Applications, Springer LNCS 3485                                                   57




       L3S Research Center, University of Hannover

Distributed proportional partitioning for overlay construction

      A mechanism to meet other random peers
      A parameter p for partitioning the space
• Proportional partitioning: Peers partition proportional to the load
  distribution
      In a ratio p : 1-p
      Let's call the sub-partitions 0 and 1
• Referential integrity: Obtain a reference to the other partition
      Needed to enable overlay routing
• Sorting the load/keys: Peers exchange locally stored keys in order to store
  only keys for their own partition

  [Figure: two peers (1 and 3) meet in a random interaction, partition the
  key-space, exchange keys, and keep a reference to each other; legend: pid,
  routing table, keys (only part of the prefix is shown)]




Peer-to-Peer Systems and Applications, Springer LNCS 3485                                                   58
P-Grid

  Scalable Distributed Search Tries (prefix tree)


 [Figure: binary trie over the key-space; the index is split along bits 0 and 1,
 and peers 1-4 are responsible for its four leaves]



     Peer 1 stores all data
     with prefixes 000 or 001
                                Distributed Hash Tables
L3S Research Center                                                         59




P-Grid

  A single peer should not hold the entire index
  Distribute the index disjointly
  over peers




                                Distributed Hash Tables
L3S Research Center                                                         60
P-Grid

  Scalable Distribution of Index




                        Distributed Hash Tables
L3S Research Center                                        61




P-Grid

  Routing information at each peer is only logarithmic
  (height of trie)




  [Figure: trie with routing references at each level, e.g. to peer 3 and peer 4]
                        Distributed Hash Tables
L3S Research Center                                        62
P-Grid

  Prefix Routing




                               Distributed Hash Tables
L3S Research Center                                                      63




P-Grid

  Two basic approaches for new nodes to join
  Splitting approach (P-Grid) (see the sketch below)
     peers meet (randomly) and decide whether to extend the search tree by
     splitting the data space
     peers can perform load balancing considering their storage load
     networks with different origins can merge, like Gnutella, FreeNet
     (loose coupling)
  Node insertion approach (Plaxton, OceanStore, …)
     peers determine their "leaf position" based on their IP address
     nodes route from a gateway node to their node-id to populate the
     routing table
     network has to start from a single origin (strong coupling)
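
  A minimal sketch of the splitting step, assuming peers with a bit-string
  .path, a .keys dict keyed by bit strings, and a .routing dict indexed by trie
  level; load-proportional splitting and replication are omitted:

      def pgrid_split(a, b):
          # two peers responsible for the same prefix extend it by one bit each,
          # divide the keys of that partition, and keep a reference to the
          # other half for routing (referential integrity)
          if a.path != b.path:
              return
          pool = {**a.keys, **b.keys}              # keys held by either peer
          a.path, b.path = a.path + "0", b.path + "1"
          a.keys = {k: v for k, v in pool.items() if k.startswith(a.path)}
          b.keys = {k: v for k, v in pool.items() if k.startswith(b.path)}
          a.routing[len(a.path) - 1] = b           # link to the "1" partition
          b.routing[len(b.path) - 1] = a           # link to the "0" partition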

                               Distributed Hash Tables
L3S Research Center                                                      64
P-Grid

  Load balance effect
     Algorithm converges quickly
     Peers have similar load
     E.g. leaf load in case of 2^5 = 32 possible prefixes of length 5:




                                Distributed Hash Tables
L3S Research Center                                                     65




P-Grid

  Replication of data items and routing table entries is
  used to increase failure resilience




                                Distributed Hash Tables
L3S Research Center                                                     66

				