T-110.5140 Network Application
Frameworks and XML
Distributed Hash Tables (DHTs)
16.2.2009
Sasu Tarkoma
Contents

n   Motivation
n   Terminology
n   Overlays
    u   Clusters and wide-area
n   Applications for DHTs
    u   Internet Indirection Infrastructure (i3)
    u   Delegation Oriented Architecture (DOA)
n   Next week
    u   Multi-homing, mobility, and the Host Identity
        Protocol
Motivation for Overlays
n   Directories are needed
    u   Name resolution & lookup
    u   Mobility support
         F   Fast updates
n   Required properties
    u   Fast updates
    u   Scalability
n   DNS has limitations
    u   Update latency
    u   Administration
n   One solution
    u   Overlay networks
n   Alternative: DynDNS
    u   Still centralized management
    u   Inflexible, reverse DNS?
Terminology
n   Peer-to-peer (P2P)
    u   Different from client-server model
    u   Each peer has both client/server features
n   Overlay networks
    u   Routing systems that run on top of another
        network, such as the Internet.
n   Distributed Hash Tables (DHT)
    u   An algorithm for creating efficient distributed
        hash tables (lookup structures)
    u   Used to implement overlay networks
n   Typical features of P2P / overlays
    u   Scalability, resilience, high availability, and
        tolerance of frequent peer connections and
        disconnections
Peer-to-peer in more detail

n   A P2P system is distributed
    u   No centralized control
    u   Nodes are symmetric in functionality
n   A large fraction of nodes is unreliable
    u   Nodes come and go
n   P2P enabled by evolution in data
    communications and technology
n   Current challenges:
    u   Security (zombie networks, trojans), IPR
        issues
n   P2P systems are decentralized overlays
Evolution of P2P systems
n   Started from centralized servers
    u   Napster
         F   Centralized directory
         F   Single point of failure
n   Second generation used flooding
    u   Gnutella
         F   Local directory for each peer
         F   High cost, worst-case O(N) messages for
             lookup
n   Research systems use DHTs
    u   Chord, Tapestry, CAN, ..
    u   Decentralization, scalability
Challenges

n   Challenges in the wide-area
    u   Scalability
    u   Increasing number of users, requests, files,
        traffic
    u   Resilience: more components -> more
        failures
    u   Management: intermittent resource
        availability -> complex management
Overlay Networks

n   Origin in Peer-to-Peer (P2P)
n   Builds upon Distributed Hash Tables
    (DHTs)
n   Easy to deploy
    u   No changes to routers or TCP/IP stack
    u   Typically on application layer


n   Overlay properties
    u   Resilience
    u   Fault-tolerance
    u   Scalability
[Figure: layering of identifiers. Upper layers use DNS names and custom
 identifiers; the overlay layer uses overlay addresses; the end-to-end layer
 (congestion control) uses IP addresses; the routing layer uses routing paths.]
DHT interfaces
n   DHTs typically offer two basic operations,
    often complemented by removal (a sketch
    follows below)
    u   put(key, value)
    u   get(key) --> value
    u   delete(key)
n   Supports wide range of applications
    u   Similar interface to UDP/IP
         F   Send(IP address, data)
         F   Receive(IP address) --> data
n   No restrictions are imposed on the
    semantics of values and keys
n   An arbitrary data blob can be hashed to
    a key
n   Key/value pairs are persistent and global
[Figure: distributed applications issue put(key, value) and get(key) --> value
 to the Distributed Hash Table (DHT), which balances keys and data across the
 participating nodes.]
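To make the interface concrete, here is a minimal single-process sketch of the
put/get/delete operations in Python. The ToyDHT class and blob_to_key helper are
illustrative names only; the "table" is an in-memory dictionary standing in for the
distributed store, and SHA-1 hashes an arbitrary data blob to a key as suggested above.

import hashlib

def blob_to_key(blob: bytes) -> int:
    """Hash an arbitrary data blob to a 160-bit key (SHA-1)."""
    return int.from_bytes(hashlib.sha1(blob).digest(), "big")

class ToyDHT:
    """Single-process stand-in for the DHT put/get/delete interface."""

    def __init__(self):
        self._store = {}              # key -> value; a real DHT spreads this over nodes

    def put(self, key: int, value) -> None:
        self._store[key] = value      # no semantics imposed on keys or values

    def get(self, key: int):
        return self._store.get(key)   # --> value, or None if absent

    def delete(self, key: int) -> None:
        self._store.pop(key, None)

dht = ToyDHT()
k = blob_to_key(b"any data blob")
dht.put(k, "some value")
assert dht.get(k) == "some value"
dht.delete(k)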
Some DHT applications

n   File sharing
n   Web caching
n   Censor-resistant data storage
n   Event notification
n   Naming systems
n   Query and indexing
n   Communication primitives
n   Backup storage
n   Web archive
Cluster vs. Wide-area

n   Clusters are
    u   single, secure, controlled, administrative
        domains
    u   engineered to avoid network partitions
    u   low-latency, high-throughput SANs
    u   predictable behaviour, controlled environment
n   Wide-area
    u   Heterogeneous networks
    u   Unpredictable delays and packet drops
    u   Multiple administrative domains
    u   Network partitions possible
Wide-area requirements

n   Easy deployment
n   Scalability to millions of nodes and
    billions of data items
n   Availability
    u   Copes with routine faults
n   Self-configuring, adaptive to network
    changes
n   Takes locality into account
Distributed Data Structures
(DSS)
n   DHTs are an example of DSS
n   DHT algorithms are available for clusters
    and wide-area environments
    u   They are different!
n   Cluster-based solutions
    u   Ninja
    u   LH* and variants
n   Wide-area solutions
    u   Chord, Tapestry, ..
    u   Flat DHTs, peers are equal
    u   Maintain a subset of peers in a routing table
Distributed Data Structures
(DSS)
n   Ninja project (UCB)
    u   New storage layer for cluster services
    u   Partition conventional data structure across
        nodes in a cluster
    u   Replicate partitions with replica groups in
        cluster
         F   Availability
    u   Sync replicas to disk (durability)
n   Other DSS for data / clusters
    u   LH* Linear Hashing for Distributed Files
    u   Redundant versions for high-availability
Cluster-based Distributed
Hash Tables (DHT)
n   Directory for non-hierarchical data
n   Several different ways to implement
n   A distributed hash table
    u   Each “brick” maintains a partial map
         F   “local” keys and values
    u   Overlay addresses used to direct to the right
        “brick”
         F   “remote” key to the brick using hashing
n   Resilience through parallel, unrelated
    mappings
[Figure: cluster DHT architecture. Clients interact with any service "front-end";
 each service uses the DSS library, which offers a hash table API. Storage "bricks"
 (single-node, durable, replicated hash tables) are connected to the services by a
 redundant, low-latency, high-throughput SAN.]
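As a rough illustration of directing a "remote" key to the right brick by hashing,
here is a Python sketch. The brick names and the replica-group size are assumptions
made for illustration; they are not taken from the Ninja or LH* designs.

import hashlib

BRICKS = ["brick-0", "brick-1", "brick-2", "brick-3"]   # assumed cluster members
REPLICAS = 2                                            # assumed replica-group size

def home_bricks(key: bytes):
    """Map a key to its primary brick and replica group by hashing."""
    h = int.from_bytes(hashlib.sha1(key).digest(), "big")
    first = h % len(BRICKS)
    # Replica group: the next REPLICAS bricks, wrapping around the brick list.
    return [BRICKS[(first + i) % len(BRICKS)] for i in range(REPLICAS)]

print(home_bricks(b"user:42"))   # e.g. ['brick-1', 'brick-2']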
    Chord
n   Chord is an overlay algorithm from MIT
    u   Stoica et al., SIGCOMM 2001
n   Chord is a lookup structure (a directory)
    u   Resembles binary search
n   Uses consistent hashing to map keys to
    nodes
    u   Keys are hashed to m-bit identifiers
    u   Nodes have m-bit identifiers
         F   IP-address is hashed
    u   SHA-1 is used as the baseline algorithm
n   Support for rapid joins and leaves
    u   Churn
    u   Maintains routing tables
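A hedged Python sketch of the identifier assignment described above: node addresses
and keys are hashed with SHA-1, truncated to m bits, and each key is stored on its
successor node. The addresses are invented, and m is kept small here so the numbers
stay readable (Chord itself uses 160-bit identifiers).

import hashlib

M = 6   # identifier bits, small for illustration

def chord_id(data: bytes) -> int:
    """Hash to an m-bit identifier using SHA-1, as in Chord."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % (2 ** M)

nodes = sorted(chord_id(ip.encode()) for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"])

def successor(ident: int) -> int:
    """First node at or after ident on the identifier circle (with wrap-around)."""
    return next((n for n in nodes if n >= ident), nodes[0])

key = chord_id(b"some key")
print("key", key, "is stored on node", successor(key))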
    Chord routing I

n   A node has a well determined place
    within the ring
n   Identifiers are ordered on an identifier
    circle modulo 2^m
     u   The Chord ring with m-bit identifiers
n   A node has a predecessor and a
    successor
n   A node stores the keys between its
    predecessor and itself
     u   The (key, value) is stored on the successor
         node of key
n   A routing table (finger table) keeps track
    of other nodes
    Finger Table
n   Each node maintains a routing table with
    at most m entries
n   The i:th entry of the table at node n
    contains the identity of the first node, s,
    that succeeds n by at least 2^(i-1) on the
    identifier circle
n   s = successor(n + 2^(i-1))
     u   The i:th finger of node n
[Figure: example Chord ring with m = 6 and nodes N1, N8, N14, N21, N32, N38,
 N42, N51, N56; for j = 1,...,m the fingers of a node p point at
 successor(p + 2^(j-1)). Finger table of N8 (x = 8):
   fingers 1, 2, 3  map to x+1, x+2, x+4  -> real node N14
   finger 4         maps to x+8           -> real node N21
   finger 5         maps to x+16          -> real node N32
   finger 6         maps to x+32          -> real node N42]
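The rule s = successor((n + 2^(i-1)) mod 2^m) can be checked against the figure's
example ring with a few lines of Python. This is a standalone illustration; real
Chord nodes build their tables through lookups, not from a global node list.

M = 6
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 51, 56])   # identifiers from the figure

def successor(ident):
    """First node at or after ident on the circle modulo 2^M (with wrap-around)."""
    return next((n for n in NODES if n >= ident), NODES[0])

def finger_table(n):
    """i:th finger of n: successor((n + 2^(i-1)) mod 2^M), for i = 1..M."""
    return [successor((n + 2 ** (i - 1)) % 2 ** M) for i in range(1, M + 1)]

print(finger_table(8))   # N8's fingers: [14, 14, 14, 21, 32, 42], as in the figure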
Chord routing II

n   Routing steps
    u   check whether the key k is found between n
        and the successor of n
    u   if not, forward the request to the closest finger
        preceding k
n   Each node knows a lot about nearby nodes
    and less about nodes farther away
n   The target node will eventually be found
Chord lookup
[Figure: lookup on the example Chord ring (m = 6) with nodes N1, N8, N14, N21,
 N32, N38, N42, N51, N56.]
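The routing steps above can be simulated as follows, reusing the same example ring.
The sketch is iterative and uses a global view of the nodes, so it only illustrates
the decision rule, not the distributed protocol; the helper names are made up.

M = 6
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 51, 56])   # same example ring as above

def successor(ident):
    """First node at or after ident on the identifier circle."""
    return next((n for n in NODES if n >= ident), NODES[0])

def fingers(n):
    return [successor((n + 2 ** (i - 1)) % 2 ** M) for i in range(1, M + 1)]

def between(x, a, b, incl_b=False):
    """x in (a, b) or (a, b] on the circle, with wrap-around."""
    if a < b:
        return a < x < b or (incl_b and x == b)
    return x > a or x < b or (incl_b and x == b)

def lookup(start, key):
    """Find the node responsible for key, starting the search at node start."""
    n, path = start, [start]
    while True:
        succ = successor((n + 1) % 2 ** M)         # n's ring successor
        if between(key, n, succ, incl_b=True):     # key in (n, succ]: succ is responsible
            return succ, path
        # Otherwise forward to the closest finger preceding key.
        n = next((f for f in reversed(fingers(n)) if between(f, n, key)), succ)
        path.append(n)

print(lookup(8, 54))   # -> (56, [8, 42, 51]): key 54 is stored on N56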
Invariants

n   Two invariants:
    u   Each node's successor is correctly
        maintained.
    u   For every key k, node successor(k) is
        responsible for k.
n   A node stores the keys between its
    predecessor and itself
    u   The (key, value) is stored on the successor
        node of key
Join

n   A new node n joins
n   Needs to know an existing node n’
n   Three steps
    u   1. Initialize the predecessor and fingers of
        node
    u   2. Update the fingers and predecessors of
        existing nodes to reflect the addition of n
    u   3. Notify the higher layer software and
        transfer keys
n   Leave uses steps 2. (update removal)
    and 3. (relocate keys)
1. Initialize routing
information
n   Initialize the predecessor and fingers of
    the new node n
n   n asks n’ to look up its predecessor and fingers
    u   One predecessor and m fingers
n   Look up predecessor
    u   Requires O(log N) time, one lookup
n   Look up each finger (at most m fingers)
    u   O(log N) each, giving log N * log N
    u   O(log^2 N) time in total
Steps 2. And 3.

n   2. Updating fingers of existing nodes
    u   Existing nodes must be updated to reflect the
        new node (and any keys that are transferred
        to the new node)
    u   Performed counter-clockwise on the circle
         F   The algorithm takes the i:th finger of n and walks
             in the counter-clockwise direction until it
             encounters a node whose i:th finger precedes n
         F   Node n will become the i:th finger of this node
    u   O(log^2 N) time
n   3. Transfer keys
    u   Keys are transferred only from the node
        immediately following n
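A very compressed sketch of the effect of a join on key placement, using a global
view of an invented ring. In the real protocol the new node learns its predecessor
and fingers through O(log^2 N) lookups via the known node n', and only the immediate
successor hands over keys; here the ring is simply recomputed to show which keys move.

ring = sorted([1, 8, 21, 32, 42, 56])              # example node identifiers
keys = [5, 10, 24, 30, 38, 54]                     # stored keys (values omitted)

def successor(ident, nodes):
    return next((n for n in nodes if n >= ident), nodes[0])

def join(new_id):
    """Add new_id to the ring and report which keys are transferred to it."""
    old_home = {k: successor(k, ring) for k in keys}
    ring.append(new_id)
    ring.sort()
    return [k for k in keys
            if successor(k, ring) == new_id and old_home[k] != new_id]

print(join(14))   # -> [10]: the keys in (8, 14] move from node 21 to the new node 14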
    Chord Properties

n   Each node is responsible for K/N keys (K
    is the number of keys, N is the number of
    nodes)
n    When a node joins or leaves the
    network only O(K/N) keys will be
    relocated
n   Lookups take O(log N) messages
n   To re-establish routing invariants after
    join/leave O(log^2 N) messages are
    needed
    Tapestry
n   DHT developed at UCB
    u   Zhao et al., UC Berkeley TR 2001
n   Used in OceanStore
    u   Secure, wide-area storage service
n   Tree-like geometry
n   Suffix-based hypercube
    u   160-bit identifiers
n   Suffix routing from A to B
    u   hop h shares a suffix of length h digits with B
n   Tapestry Core API:
    u   publishObject(ObjectID,[serverID])
    u   routeMsgToObject(ObjectID)
    u   routeMsgToNode(NodeID)
    Suffix routing
[Figure: 0312 routes to 1643 via 0312 -> 2173 -> 3243 -> 2643 -> 1643; after hop h
 the current node shares a suffix of length h with 1643.]
Routing table with b*log_b(N) entries
Entry(i,j) - pointer to a neighbour with digit j followed by the shared (i-1)-digit suffix
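A small Python sketch of the suffix-matching step, using the route from the figure.
The neighbour list passed to next_hop is a stand-in for Tapestry's routing table;
surrogate routing and the real table layout are omitted.

def shared_suffix_len(a: str, b: str) -> int:
    """Number of trailing digits that two identifiers have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def next_hop(current: str, dest: str, neighbours):
    """Pick a neighbour that extends the suffix shared with dest by one more digit."""
    want = shared_suffix_len(current, dest) + 1
    return next((n for n in neighbours if shared_suffix_len(n, dest) >= want), None)

hops = ["2173", "3243", "2643", "1643"]            # the route from the figure
print(next_hop("0312", "1643", hops))              # -> '2173'
node = "0312"
for nxt in hops:                                   # the shared suffix grows at every hop
    assert shared_suffix_len(nxt, "1643") > shared_suffix_len(node, "1643")
    node = nxt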
Pastry I

n   A DHT based on a circular flat identifier
    space
n   Prefix-routing
    u   Message is sent towards a node which is
        numerically closest to the target node
    u   Procedure is repeated until the node is found
    u   Prefix match: number of identical digits before
        the first differing digit
    u   The prefix match increases with every hop
n   Similar performance to Chord
Pastry II

n   Routing a message
    u   If the destination is covered by the leaf
        set --> send to the numerically closest
        node in the leaf set
    u   else send to the identifier in the
        routing table with the longest common
        prefix (longer than that of the current node)
    u   else forward to a numerically closer
        node with the same prefix match as the
        current node
    u   Used in event notification (Scribe)
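A simplified sketch of one Pastry routing decision following the three cases above.
The identifiers, leaf set, and routing table are invented, and the routing table is
treated as a flat list rather than Pastry's row/column structure.

def shared_prefix_len(a: str, b: str) -> int:
    """Number of identical digits before the first differing digit."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def route(current: str, dest: str, leaf_set, routing_table):
    """Return the next hop for dest, or None if no progress can be made."""
    d = int(dest, 16)
    if leaf_set:                                   # case 1: leaf set covers the destination
        ids = [int(n, 16) for n in leaf_set]
        if min(ids) <= d <= max(ids):
            return min(leaf_set, key=lambda n: abs(int(n, 16) - d))
    here = shared_prefix_len(current, dest)
    longer = [n for n in routing_table if shared_prefix_len(n, dest) > here]
    if longer:                                     # case 2: strictly longer common prefix
        return max(longer, key=lambda n: shared_prefix_len(n, dest))
    closer = [n for n in leaf_set + routing_table  # case 3: same prefix, numerically closer
              if shared_prefix_len(n, dest) >= here
              and abs(int(n, 16) - d) < abs(int(current, 16) - d)]
    return min(closer, key=lambda n: abs(int(n, 16) - d)) if closer else None

print(route("65a1", "d46a",
            leaf_set=["6588", "65b3"],
            routing_table=["d13d", "a2fb", "d4c2"]))   # -> 'd4c2' (shares prefix 'd4')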
Security Considerations

n   Malicious nodes
    u   Attacker floods DHT with data
    u   Attacker returns incorrect data
         F   self-authenticating data
    u   Attacker denies data exists or supplies
        incorrect routing info
n   Basic solution: using redundancy
    u   k-redundant networks
n   What if attackers have quorum?
    u   Need a way to control creation of node Ids
    u   Solution: secure node identifiers
         F   Use public keys
Applications for DHTs

n   DHTs are used as a basic building block
    for an application-level infrastructure
    u   Internet Indirection Infrastructure (i3)
         F   New forwarding infrastructure based on Chord
    u   DOA (Delegation Oriented Architecture)
         F   New naming and addressing infrastructure
             based on overlays
Internet Indirection
Infrastructure (i3)
n   A DHT - based overlay network
    u   Based on Chord
n   Aims to provide a more flexible
    communication model than current IP
    addressing
n   Also a forwarding infrastructure
    u   i3 packets are sent to identifiers
    u   each identifier is routed to the i3 node
        responsible for that identifier
    u   the node maintains triggers that are installed
        by receivers
    u   when a matching trigger is found the packet is
        forwarded to the receiver
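A toy Python sketch of the trigger mechanism described above: receivers install
(id, address) triggers, and packets addressed to an identifier are forwarded to
whatever address the trigger currently holds. The function names are illustrative
and the whole "infrastructure" is a single dictionary.

triggers = {}                          # identifier -> receiver address

def insert_trigger(ident: str, receiver: str) -> None:
    """Receiver R installs (id, R) and then gets all packets sent to id."""
    triggers[ident] = receiver

def send(ident: str, data: bytes):
    """The i3 node responsible for ident matches the trigger and forwards."""
    receiver = triggers.get(ident)
    return ("deliver", receiver, data) if receiver else ("drop", None, data)

insert_trigger("id42", "R1")
print(send("id42", b"hello"))          # delivered to R1
insert_trigger("id42", "R2")           # mobility: the trigger now points to the new address
print(send("id42", b"hello"))          # delivered to R2, transparently to the sender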
i3 II

n   An i3 identifier may be bound to a host,
    object, or a session
n   i3 has been extended with ROAM
    u   Robust Overlay Architecture for Mobility
    u   Allows end hosts to control the placement of
        rendezvous-points (indirection points) for
        efficient routing and handovers
    u   Legacy application support
         F   user level proxy for encapsulating IP packets
             to i3 packets
[Figure: R inserts a trigger (id, R) and receives all packets sent to identifier id.
 Mobility is transparent to the sender: when the host changes its address from R1 to
 R2, it updates its trigger from (id, R1) to (id, R2).
 Source: http://i3.cs.berkeley.edu/]
[Figure: a multicast tree built using a hierarchy of triggers.
 Source: http://i3.cs.berkeley.edu/]
[Figure: anycast using the longest matching prefix rule.
 Source: http://i3.cs.berkeley.edu/]
[Figure: sender-driven and receiver-driven service composition using a stack of
 identifiers. Source: http://i3.cs.berkeley.edu/]
    Layered Naming Architecture

n   Presented in paper:
     u   A Layered Naming Architecture for the Internet,
         Balakrishnan et al. SIGCOMM 2004
n   Service Identifiers (SIDs) are host-independent
    data names
n   End-point Identifiers (EIDs) are location-
    independent host names
n   Protocols bind to names and resolve them
     u   Applications use SIDs as handles
n   SIDs and EIDs should be flat
     u   Stable-bame principle: A stable name should not
         impose restrictions on the entity it names
n   Inspiration: HIP + i3 + Semantic Free
    Referencing
n   Prototype: Delegation Oriented Architecture
    (DOA)
[Figure: user-level descriptors (e.g. a search query) are resolved to SIDs; the
 application session uses the SID as its handle; SIDs are resolved to EIDs, to which
 the transport binds; EIDs are resolved to IP addresses. The resulting packet carries
 IP HDR | EID | TCP | SID.]
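A minimal sketch of the resolution chain in the figure: a SID resolves to an EID,
and an EID resolves to an IP address (in DOA, via a DHT-based mapping service).
The identifiers and lookup tables below are invented for illustration.

sid_to_eid = {"sid:photo-album": "eid:host-a"}     # SID -> EID (host-independent name)
eid_to_ip = {"eid:host-a": "192.0.2.10"}           # EID -> IP (location-independent name)

def resolve(sid: str) -> str:
    eid = sid_to_eid[sid]      # the application uses the SID as its handle
    return eid_to_ip[eid]      # the transport binds to the EID; the EID maps to an IP

print(resolve("sid:photo-album"))                  # -> 192.0.2.10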
    DOA cont.

n   DOA header is located between IP and
    TCP headers and carries source and
    destination EIDs
n   The mapping service maps EIDs to IP
    addresses
n   This allows the introduction of various
    middleboxes to the routing of packets
    u   Service chain, end-point chain
n   Outsourcing of intermediaries
    u   Ideally clients may select the most useful
        network elements to use
n   Differences to i3
    u   Not a forwarding infrastructure
OpenDHT

n   A publicly accessible distributed hash table (DHT)
    service.
n   OpenDHT runs on a collection of 200-300 nodes on
    PlanetLab.
n   Clients do not need to participate as DHT nodes.
    Bamboo DHT
n   A test bed infrastructure
     u   Open infrastructure for puts and gets
     u   Organizing clients that handle application upcalls
n   www.opendht.org
n   OpenDHT: A Public DHT Service and Its Uses. Sean
    Rhea, Brighten Godfrey, Brad Karp, John Kubiatowicz,
    Sylvia Ratnasamy, Scott Shenker, Ion Stoica, and
    Harlan Yu. Proceedings of ACM SIGCOMM 2005,
    August 2005.
    Summary
n   Mobility and multi-homing require
    directories
    u   Scalability, low-latency updates
n   Overlay networks have been proposed
    u   Searching, storing, routing, notification,..
    u   Lookup (Chord, Tapestry, Pastry),
        coordination primitives (i3), middlebox
        support (DOA)
    u   Logarithmic scalability, decentralised,…
n   Many applications for overlays
    u   Lookup, rendezvous, data distribution and
        dissemination, coordination, service
        composition, general indirection support
n   Deployment remains an open question; PlanetLab
    is used for experimentation.
Discussion Points

n   Why do the most popular P2P programs
    use simple algorithms?
n   How secure should P2P / overlays be?
n   Will DHTs eventually be deployed in
    commercial software?
n   Will DHTs be part of the future core Internet?
T-110.5140 Network Application
Frameworks and XML
Additional material: Chord Joins and
Leaves
Invariants

n   Two invariants:
    u   Each node's successor is correctly
        maintained.
    u   For every key k, node successor(k) is
        responsible for k.
n   A node stores the keys between its
    predecessor and itself
    u   The (key, value) is stored on the successor
        node of key
Join

n   A new node n joins
n   Needs to know an existing node n’
n   Three steps
    u   1. Initialize the predecessor and fingers of
        node
    u   2. Update the fingers and predecessors of
        existing nodes to reflect the addition of n
    u   3. Notify the higher layer software and
        transfer keys
n   Leave uses steps 2. (update removal)
    and 3. (relocate keys)
1. Initialize routing
information
n   Initialize the predecessor and fingers of
    the new node n
n   n asks n’ to look up its predecessor and fingers
    u   One predecessor and m fingers
n   Look up predecessor
    u   Requires O(log N) time, one lookup
n   Look up each finger (at most m fingers)
    u   O(log N) each, giving log N * log N
    u   O(log^2 N) time in total
Steps 2. And 3.

n   2. Updating fingers of existing nodes
    u   Existing nodes must be updated to reflect the
        new node (and any keys that are transferred
        to the new node)
    u   Performed counter-clockwise on the circle
         F   The algorithm takes the i:th finger of n and walks
             in the counter-clockwise direction until it
             encounters a node whose i:th finger precedes n
         F   Node n will become the i:th finger of this node
    u   O(log^2 N) time
n   3. Transfer keys
    u   Keys are transferred only from the node
        immediately following n
References

n   http://www.pdos.lcs.mit.edu/chord/