Tapestry


A Resilient Global-scale Overlay
    for Service Deployment

   Ben Y. Zhao, Ling Huang, Jeremy Stribling,
   Sean C. Rhea, Anthony D. Joseph, and
   John Kubiatowicz

   IEEE Journal on Selected Areas in Communications, January 2004
   Limitations of Plaxton routing
1. Need for global knowledge to construct the neighbor table.
2. Static architecture: no provision for node addition or deletion.
3. A single root per object is a bottleneck and a single
   point of failure.

Tapestry provides innovative solutions to some
of the bottlenecks of classical Plaxton routing.


Tapestry is a self-organizing, robust, scalable wide-
area infrastructure for efficient location and delivery
of content in the presence of heavy load and node or
link failures. It is the backbone of OceanStore, a
persistent wide-area storage system.

    Major features of Tapestry

Most contemporary P2P systems did not take
network distances into account when constructing their
routing overlay; thus, a single overlay hop may span the
diameter of the network. In contrast, Tapestry constructs
locally optimal routing tables from initialization and
subsequently maintains them.
     Major features of Tapestry
DHT-based systems fix the number and location
of object replicas. In contrast, Tapestry allows
applications to place objects according to their needs.
Tapestry “publishes” location pointers throughout the
network to facilitate efficient routing to those objects
with low network stretch.

 Major features of Tapestry

Unlike classical Plaxton routing, Tapestry allows
node insertion and deletion to support a
dynamic environment. These algorithms are fairly

          Routing and Location
Namespace (both nodes and objects)
  160 bits using the hash function SHA-1
  Each object has one or more roots
  H (ObjectID) = RootID
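
As a rough illustration of the namespace above, the sketch below derives a 160-bit identifier with SHA-1. The function name tapestry_id and the choice to hash a UTF-8 name string are assumptions made for this example; the slides only state that IDs are 160 bits and come from SHA-1.

```python
import hashlib

def tapestry_id(name: str) -> str:
    """Hash a name to a 160-bit ID, written as 40 hexadecimal digits."""
    return hashlib.sha1(name.encode("utf-8")).hexdigest()

# Plays the role of H(ObjectID) = RootID for a hypothetical object name.
root_id = tapestry_id("my-object")
print(len(root_id) * 4)  # number of bits in the ID: 160
```

Nodes and objects share this namespace, so the same function could be applied to a node's name to obtain its NodeID.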

Suffix routing from A to B
   At the h-th hop, arrive at the nearest node N_h that
   shares a suffix of length h digits with B
   – Example: 5324 routes to 0629 via
      5324 --> 2349 --> 1429 --> 7629 --> 0629
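
The suffix-routing rule above can be sketched in a few lines. This is a toy model, not Tapestry's implementation: it assumes every node can see all other nodes (real nodes consult their own routing tables), and the position table standing in for network distance is invented for the example.

```python
def shared_suffix_len(a: str, b: str) -> int:
    """Number of trailing digits that a and b have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def route(src: str, dst: str, position: dict) -> list:
    """Each hop moves to the nearest node matching one more suffix digit.

    position maps node ID -> coordinate on a line (hypothetical distances);
    it must contain src and dst, and all IDs must have the same length.
    """
    path, current = [src], src
    while current != dst:
        need = shared_suffix_len(current, dst) + 1
        candidates = [n for n in position if shared_suffix_len(n, dst) >= need]
        current = min(candidates, key=lambda n: abs(position[n] - position[current]))
        path.append(current)
    return path

position = {"5324": 0, "2349": 1, "1429": 2, "7629": 3, "0629": 4}
print(route("5324", "0629", position))
# ['5324', '2349', '1429', '7629', '0629']
```

With these made-up positions the sketch reproduces the path in the example: the matched suffix grows from 9 to 29 to 629 to 0629.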
             Tapestry API
PUBLISH OBJECT : Publish (make available) object
on the local node.

UNPUBLISH OBJECT : Best-effort attempt to
remove location mappings for an object.

ROUTE TO OBJECT : Routes message to location
of an object with GUID (globally unique id).

ROUTE TO NODE : Route message to application
on the exact node.
     Requirements for Insert and Delete
• No central directory can be used
  – No hot spot or single point of failure
  – Reduced danger/threat of DoS.
• Must be fast (should contact only a few nodes)
• Keep objects available

               Node Insertion
• Need-to-know nodes are notified of the new node, since the
  inserted node fills a null entry in their routing tables.

• The new node might become the root for some
  existing objects. References to those objects must be
  moved to maintain object availability.

• The algorithms must construct a near optimal routing
  table for the inserted node. Nodes near the inserted
  node are notified, and such nodes may consider
  using it in their routing tables as an optimization.
  Choosing Root in Tapestry
Compute H (ObjectID) = RootID. Attempt to route to this node
(which should be the root) without knowing whether it exists. If it
exists, it becomes the root. Otherwise:

– Whenever a null entry is encountered, choose the next “higher”
  non-null pointer entry (thus, if xx53 does not exist, try xx63),
  or a secondary entry.
– If the current node S is the only non-null pointer in the rest of
  the map, terminate the route and choose S as the root.
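
A minimal sketch of this root-selection rule follows. It is a simplification under stated assumptions: the function sees the whole node set rather than per-node routing tables, secondary entries are ignored, and wrapping from the highest digit back to 0 is an assumption of this example. The name surrogate_root is hypothetical.

```python
def surrogate_root(target: str, nodes, digits: str = "0123456789") -> str:
    """Resolve target one digit at a time, starting from the last digit.

    If no node has the wanted digit at the current position, try the next
    higher digit (wrapping around). Stop once a single node remains.
    Assumes nodes is non-empty and all IDs have the same length as target.
    """
    candidates = set(nodes)
    suffix = ""  # digits resolved so far
    for level in range(1, len(target) + 1):
        if len(candidates) == 1:  # only one node left: it becomes the root
            break
        want = digits.index(target[-level])
        for step in range(len(digits)):
            d = digits[(want + step) % len(digits)]
            matching = {n for n in candidates if n.endswith(d + suffix)}
            if matching:
                candidates, suffix = matching, d + suffix
                break
    return next(iter(candidates))

# No node ends in 53, so the route falls through to 63.
print(surrogate_root("1253", {"0163", "2273", "9984"}))  # 0163
```

Because every node applies the same deterministic rule, routes that start at different nodes should agree on the same surrogate root, provided their routing tables are consistent.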

Acknowledged Multicast Algorithm

   Locates & Contacts all nodes with a given suffix

 • Popular tool: Useful in node insertion. Create a tree based on IDs
 • Starting node knows when all nodes reached

The node then sends to any ?0345, any ?1345, any ?2345, etc.,
if possible. ?4345 in turn sends to 04345, 54345, … if they exist.

[Figure: multicast tree]
   ??345
     ?1345
     ?4345
       04345
       54345
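
The tree construction above can be sketched as a recursion that extends the known suffix by one digit per step. This is an illustrative model only: it runs in one process over a set of IDs, the return value stands in for the acknowledgements flowing back to the starting node, and the function name is made up for the example.

```python
def acknowledged_multicast(suffix: str, nodes, id_len: int,
                           digits: str = "0123456789") -> set:
    """Return every node ID that ends in suffix.

    A node responsible for suffix forwards to one representative of each
    longer suffix (d + suffix) that exists; branches with no nodes are
    pruned. The union of the results models the collected acks.
    """
    if not any(n.endswith(suffix) for n in nodes):
        return set()            # nobody has this suffix: nothing to contact
    if len(suffix) == id_len:
        return {suffix}         # a complete ID: this node is reached
    reached = set()
    for d in digits:
        reached |= acknowledged_multicast(d + suffix, nodes, id_len, digits)
    return reached

nodes = {"04345", "54345", "11345", "20345", "99999"}
print(sorted(acknowledged_multicast("345", nodes, id_len=5)))
# ['04345', '11345', '20345', '54345']
```

The starting call returns only after all sub-calls return, which mirrors the property that the starting node knows when all nodes have been reached.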

       Three Parts of Insertion

1. Establish pointers from surrogates to new node.
2. Notify the need-to-know nodes
3. Create routing tables & notify other nodes

           Finding the surrogates
• The new node sends a join message to a surrogate.
• The primary surrogate multicasts to all other surrogates
  with a similar suffix.
• Each surrogate establishes a pointer to the new node.
• When all pointers are established, continue.

[Figure: new node 01234 joining through "Gate"; nodes 79334, 39334,
 and 01334; suffix patterns ????4 and ???34]
       Need-to-know nodes

• “Need-to-know” = a node with a hole in
  neighbor table filled by new node
  • If 01234 is new node, and no 234s existed, must
    notify ???34 nodes
  • Acknowledged multicast to all matching nodes

• During this time, object requests may go
  either to new node or former surrogate, but
  that’s okay

• Once done, delete pointers from surrogates.
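
To make the need-to-know rule concrete, the sketch below finds the longest suffix of the new node's ID that some existing node already shares, then lists the nodes with that suffix; those are the nodes whose next-level slot for the new node was empty. It assumes global knowledge of the node set purely for illustration (Tapestry reaches these nodes by acknowledged multicast), and need_to_know is a hypothetical name.

```python
def need_to_know(new_id: str, nodes) -> tuple:
    """Return (shared_suffix, nodes_to_notify) for a joining node."""
    p = 0  # length of the longest suffix of new_id found among existing nodes
    while p < len(new_id) and any(
        n.endswith(new_id[len(new_id) - (p + 1):]) for n in nodes
    ):
        p += 1
    shared = new_id[len(new_id) - p:]
    return shared, {n for n in nodes if n.endswith(shared)}

existing = {"79334", "39334", "01334", "55555", "12344"}
shared, notify = need_to_know("01234", existing)
print(shared, sorted(notify))  # 34 ['01334', '39334', '79334']
```

Here no existing node ends in 234, so every ???34 node has a hole that 01234 fills, matching the example on the slide.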
Constructing the Neighbor Table via a
      nearest neighbor search

• Suppose we have an algorithm A for finding the
  three nearest neighbors for a given node.
• To fill in a slot, apply A to the subnetwork of
  nodes that could fill that slot.
   (For ????1, run A on network of nodes ending
     in 1, and pick nearest neighbor.)
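
The idea of filling each slot from the subnetwork of eligible nodes can be sketched as below. This is a brute-force stand-in for algorithm A: it scans all nodes, which a real deployment cannot do, and the dist function and the positions used in the demo are invented for the example.

```python
def build_neighbor_table(me: str, nodes, dist,
                         digits: str = "0123456789", keep: int = 3) -> dict:
    """Map (level, digit) -> up to `keep` nearest nodes that fit the slot.

    Slot (level, d) holds nodes whose ID ends in d followed by the last
    level-1 digits of `me`.
    """
    table = {}
    for level in range(len(me)):
        tail = me[len(me) - level:]  # digits already shared at this level
        for d in digits:
            pool = [n for n in nodes if n != me and n.endswith(d + tail)]
            pool.sort(key=lambda n: (dist(me, n), n))  # nearest first
            table[(level + 1, d)] = pool[:keep]
    return table

position = {"1234": 0, "5671": 5, "9991": 2, "0001": 9,
            "4321": 1, "8834": 3, "7734": 4, "5234": 7}
dist = lambda a, b: abs(position[a] - position[b])
table = build_neighbor_table("1234", set(position), dist)
print(table[(1, "1")])  # ['4321', '9991', '5671']
print(table[(2, "3")])  # ['8834', '7734', '5234']
```

Slot (1, "1") corresponds to the ???1 pattern: of the four nodes ending in 1, the three nearest are kept.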

      Finding Nearest Neighbor
• Let G be the surrogate that matches the new node in the last j
  digits of the node ID.
• The j-list is the closest k = O(log n) nodes matching in j digits.

A. G sends its j-list to the new node; the new node pings all nodes
   on the j-list.
B. If one is closer, go to that node. If not, done with this
   level; let j = j-1 and go to A.

[Figure: nodes 32134, 61524, 11111, and 01234]
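
Steps A and B can be sketched as a loop that descends one level at a time. The code is a toy under several assumptions: the j-list is computed from a global node set instead of being supplied by the contacted node, distances come from a made-up position table, and the names nearest_per_level and j_list are hypothetical.

```python
def shared_suffix_len(a: str, b: str) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def nearest_per_level(new_id: str, start: str, nodes, dist, k: int) -> dict:
    """Return {j: node}, the node the search settles on at each level j.

    start is the surrogate G and must be in nodes; new_id must not be.
    """
    def j_list(center, j):
        # The k nodes closest to `center` that match new_id in j digits.
        tail = new_id[len(new_id) - j:]
        pool = [n for n in nodes if n.endswith(tail)]
        return sorted(pool, key=lambda n: (dist(center, n), n))[:k]

    j = shared_suffix_len(new_id, start)
    current, found = start, {}
    while j >= 0:
        # Step A: the new node pings everything on the current j-list.
        best = min(j_list(current, j), key=lambda n: (dist(new_id, n), n))
        if dist(new_id, best) < dist(new_id, current):
            current = best       # Step B: a closer node exists, move to it
        else:
            found[j] = current   # done with this level
            j -= 1
    return found

position = {"01234": 0, "11111": 2, "32134": 6, "61524": 11}
dist = lambda a, b: abs(position[a] - position[b])
print(nearest_per_level("01234", "32134", {"11111", "32134", "61524"}, dist, k=2))
# {2: '32134', 1: '32134', 0: '11111'}
```

Each move strictly reduces the distance to the new node, so the loop terminates; whether the result is truly the nearest node depends on k and on the network's growth properties.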

           Is this the nearest node?
                 Yes, with high probability under an assumption

• Pink circle = ball around the new node of radius d(G, new node)
• Progress = find any node in the pink ball
• Consider the ball around G containing all of its j-list. Two cases:
    – The black ball contains the pink ball; found the
      closest node
    – High overlap between the pink ball and the G-ball, so it is
      unlikely that the pink ball is empty while the G-ball
      has k nodes

[Figure: pink ball around the new node ("New"); ball around G,
 which matches in j digits]

        Deleting a node

[Figure: exiting node 12345; suffix patterns xxxx5 and xxx45]


                 Planned Delete
• Notify its out-neighbors: Exiting node says “I’m no
  longer pointing to you”
• To in-neighbors: Exiting node says it is going and
  proposes at least one replacement.
   – Exiting node republishes all object pointers it stores
• Objects rooted at exiting node get new roots






 Unplanned Delete

• Planned delete relied on the exiting node’s
  neighbor table.
  – List of out-neighbors
  – List of in-neighbors
  – Closest matching node for each level.
• Can we reconstruct this information?
  – Not easily
  – Fortunately, we probably don’t need to.
  Lazy Handling of Unplanned Delete
• A notices B is dead, A fixes its own state
  – A removes B from routing tables
     • If removing B produces a hole, A must fill the
       hole, or be sure that the hole cannot be filled—
       use acknowledged multicast
  – A republishes all objects with next hop = B
     • Use republish-on-delete as before
                                 Tapestry Mesh
                  Incremental suffix-based routing (slide
                    borrowed from the original authors)

[Figure: mesh of nodes 0x79FE, 0x23FE, 0x993E, 0x43FE, 0x44FE, 0x73FE,
 0xF990, 0x035E, 0x04FE, 0x13FE, 0x555E, 0xABFE, 0x9990, 0x73FF,
 0x239E, and 0x1290, connected by links labeled 1 to 4]

                Fault detection
• Soft-state vs. explicit fault-recovery
    – Soft-state periodic republish is more attractive
    – Expected additional traffic for periodic republish is low

• Redundant roots for better resilience
    – Object names hashed with small salts, i.e. multiple roots per object
    – Queries and publishing utilize all roots in parallel
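
One way to picture salted roots is to hash the object name together with a small counter, giving one root ID per salt. The separator "#" and the use of integers 0, 1, 2 as salts are assumptions of this sketch; the slides do not specify how salts are encoded.

```python
import hashlib

def salted_roots(object_name: str, num_roots: int = 3) -> list:
    """Derive num_roots root IDs by hashing the name with small salts."""
    return [
        hashlib.sha1(f"{object_name}#{salt}".encode("utf-8")).hexdigest()
        for salt in range(num_roots)
    ]

for root in salted_roots("my-object"):
    print(root)  # three distinct 160-bit IDs, 40 hex digits each
```

A publisher or a query would then route toward each of these IDs, so the loss of one root node does not make the object unreachable.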

 Summary of Insertion steps
Step 1: Build up N’s routing maps
    – Send messages to each hop along path from gateway to
      current node N
    – The ith hop along the path sends its ith level route table to N
    – N optimizes those tables where necessary
Step 2: Move appropriate data from N’ to N
Step 3: Use back pointers from N’ to find nodes which have null
   entries for N’s ID, tell them to add new entry to N
Step 4: Notify local neighbors to modify paths to route through N
   where appropriate
                Dynamic Insertion Example
                               borrowed from the original slides

[Figure: mesh of nodes 0x779FE, 0xA23FE, 0x6993E, 0x243FE, 0x244FE,
 0x973FE, 0x4F990, 0xC035E, 0x704FE, 0x913FE, 0xB555E, 0x0ABFE,
 0x09990, 0x5239E, and 0x71290, connected by links labeled 1 to 4;
 the new node (NEW) 0x143FE joins through the gateway 0xD73FF]
