Lecture 8-9
Naming services
Logistics / reminders
Assignment 1:
Share ideas / DO NOT share code.
Marks: You should have received them
Assignment 2:
due this Friday 11:59pm
Quizzes:
Q1: Tuesday in two weeks (10/18)
Q2: 11/17
EECE 411: Design of Distributed Software Applications
TA's comments: “Here is a list of mistakes …:
Not handling exceptions
Bad software design. E.g. The complete code is in
a single method with noticeable repetitions.
Not checking the user input
Incorrectly receives/parses the message. E.g.
Hard coded buffer size.
Hard coded server and port number.
Not implementing reliable read operation”
The code does not reattempt to read the rest of the message if the
message is not completely received from the first read operation.
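A reliable read can be sketched as a loop that keeps calling recv until the expected number of bytes has arrived. This is a minimal Python sketch, not the assignment's reference solution: the helper name recv_exact is hypothetical, and it assumes the caller already knows the message length n (for example from a length prefix).

```python
import socket

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because one recv may return fewer."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # empty result: the peer closed the connection
            raise ConnectionError(f"peer closed with {remaining} bytes missing")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

A single recv call may legally return only part of the message, which is why the loop tracks the bytes still missing instead of trusting the first read.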
Roadmap
A distributed system is:
a collection of independent computers that appears
to its users as a single coherent system
Components need to:
Communicate
point-to-point communication
Sockets, RPC, RMI
point-to-multipoint / data distribution [last time]
Multicast
Epidemic communication
Cooperate [next]
Quiz question
A distributed service operates on a large cluster. The service
has one component running on each node.
Nodes might be shut down for maintenance, they might
simply fail, or they might come back online.
To function correctly each service component needs an
accurate list of all other nodes/service components that are
active.
TO DO: Design a mechanism that provides this list
Describe the mechanism in natural language
Provide the pseudocode.
Evaluate overheads
Roadmap
A distributed system is:
a collection of independent computers that appears to its
users as a single coherent system
Components need to:
Communicate
Cooperate => support needed
Naming
Synchronization
Naming systems
Functionality
Map: names → access points (addresses)
Names are used to denote entities in a
distributed system.
To operate on an entity, we need to access it at an
access point (address).
Note: A location-independent name for an entity
E, is independent from the addresses of the
access points used to access E.
Names are valuable!
NYT, August '00
Functionality
Map: names → access points (addresses)
One challenge: scaling
#of names,
#clients,
geographical
distribution,
Management!
War stories (1): Pic database
Saving pics in a student registration database
Issues:
Naming conflict
System vs. user chosen names
War stories (2): ISPs merge!
Story
Bob's email is bob@superman.com
But the ISP running superman.com is bought by
superwoman.com and they want to translate all email
addresses to their new domain
bob@superwoman.com
bob99@superwoman.com
Issues:
Overloading (address is not location independent!)
Market solution: indirection service
War stories (3): ZIP codes – more overloading
US zip code structure: routing built into name.
1st digit: zone (e.g., New England, NW)
2nd-3rd digit: 'section'
4-5th digit: post-office
Story: Congestion at Boston section (021).
Solution adopted: split in two (021 and 024)
Result?
Issue:
Overloading
War stories (4): Running out of phone numbers
Phone numbers
10 digit: area(3)+switch(3)+identifier(4)
Story: Running out of numbers
Issue: Splitting vs. overlay
Terminology
Names
Names vs. identifiers
Identifiers have three properties
an identifier refers to at most one entity
each entity is referred to by at most one identifier
always refers to the same entity
Human friendly vs. arbitrary (random strings)
Namespace
Flat (names have no structure), vs.
Hierarchical (names have structure)
Naming system implementation
Functionality
Map: names → access points (addresses)
Strawman #1: Why not centralize?
Single point of failure
High latency
Distant centralized database
Scalability bottleneck:
Traffic volume
Management: Single point of update
Naming system implementation
Strawman #2: Why not use a replicated database
(old /etc/hosts)?
Original Name to Address Mapping
Flat namespace
/etc/hosts
SRI kept main copy
Downloaded regularly
Count of hosts was increasing: machine per domain → machine per user
Many more downloads
Many more updates
Still a scalability bottleneck
Naming system implementation
Strawman #3:…. ?
Naming system implementation
Idea: partition the namespace
Hierarchical namespace (e.g., DNS)
Naming system implementation
Idea: partition the namespace
What if I want to keep the namespace
flat?
Implementation options:
Flat namespace
Problem: Given an essentially unstructured name
how can we locate its associated address?
Possible designs:
Simplistic solutions
broadcasting, forwarding pointers
Limited scalability; reliability problems
Hash table-like approaches
Consistent hashing,
Distributed Hash Tables
Flat namespaces – simple solutions
Broadcasting: Simply broadcast the ID, requesting the
entity to return its current address.
Can never scale beyond local-area networks (think of
ARP/RARP)
Requires all processes to listen to incoming location
requests
Flat namespaces – simple solutions (II)
Forwarding pointers: Each time an entity moves, it
leaves behind a pointer telling where it has gone to.
Update a client's reference as soon as present location
has been found
Geographical scalability problems:
Long chains are not fault tolerant
Increased network latency at dereferencing
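Dereferencing a chain of forwarding pointers can be sketched as follows. This is an illustration only: the location names and the pointers dictionary are hypothetical stand-ins for the pointers that real hosts would hold.

```python
def dereference(start, pointers):
    """Follow forwarding pointers from `start`; return (location reached, hops)."""
    location, hops = start, 0
    visited = {start}
    while location in pointers:          # a pointer was left behind here
        location = pointers[location]
        hops += 1
        if location in visited:          # guard against a corrupted, cyclic chain
            raise RuntimeError("forwarding chain contains a cycle")
        visited.add(location)
    return location, hops

# Hypothetical history: the entity moved A -> B -> C -> D.
pointers = {"A": "B", "B": "C", "C": "D"}
client_ref = "A"
client_ref, hops = dereference(client_ref, pointers)  # client shortcuts its reference
```

Every hop is one extra network round trip, and if any host in the chain loses its pointer the walk stops at a stale location, which is the fault-tolerance problem noted above.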
Functionality to implement
Map: names → access points (addresses)
Similar to a hash-table: manage list of (name,
access point) pairs
API
Put (key, value)
Lookup (key) → value
Issue: scaling
Key idea: partitioning.
Allocate parts of the list to different nodes
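The Put/Lookup interface with partitioning can be sketched in a single process, where each "node" is just a dictionary. This is a minimal sketch under assumptions: the class name PartitionedTable and the modulo placement rule are illustrative choices, not the scheme the lecture settles on.

```python
import hashlib

class PartitionedTable:
    """(name, access point) pairs split across several in-memory 'nodes'."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]

    def _owner(self, key):
        # Stable hash of the key, reduced modulo the number of nodes.
        digest = hashlib.sha1(key.encode()).digest()
        return int.from_bytes(digest, "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._owner(key)][key] = value

    def lookup(self, key):
        return self.nodes[self._owner(key)].get(key)
```

With modulo placement, changing the number of nodes changes the owner of most keys, which is one reason to look for a placement rule that moves fewer keys.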
Why the put()/get() interface?
API supports a wide range of applications
imposes no structure/meaning on keys
Key/value pairs are persistent and global
Can store keys in other values (indirection)
And thus build complex data structures
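Indirection can be illustrated by storing a linked list in a plain key/value store: each value carries the key of the next element. In this sketch a local dictionary stands in for the distributed store, and the key naming scheme (head_key/i) is an assumption made for the example.

```python
store = {}                                # stand-in for the distributed table

def put(key, value):
    store[key] = value

def get(key):
    return store.get(key)

def put_list(head_key, items):
    """Store items as a linked list; each value holds the key of the next one."""
    next_key = None
    for i, item in reversed(list(enumerate(items))):
        key = f"{head_key}/{i}"
        put(key, {"item": item, "next": next_key})
        next_key = key
    put(head_key, next_key)               # the head stores only a key (indirection)

def read_list(head_key):
    items, key = [], get(head_key)
    while key is not None:
        node = get(key)
        items.append(node["item"])
        key = node["next"]
    return items
```

The store itself never interprets the values; the structure exists only in how the application chains keys together.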
Why Might The Design Be Hard?
Decentralized: no central authority
Scalable: low network traffic overhead
Efficient: find items quickly (latency)
Dynamic: nodes fail, new nodes join
General-purpose: flat naming
The Lookup Problem
[Figure: nodes N1 to N6 connected through the Internet. A publisher issues Put(Key=“title”, Value=file data…); a client issues Get(key=“title”). Which node should handle the request?]
• At the heart of all these services
Motivation: Centralized Lookup (Napster)
[Figure: nodes N1 to N9 around a central DB. Publisher@N4 holds Key=“title”, Value=file data… and registers SetLoc(“title”, N4) with the DB; the client sends Lookup(“title”) to the DB.]
Simple, but O(N) state and a single point of failure
Motivation: Flooded Queries (Gnutella)
[Figure: nodes N1 to N9 with no central database. The client's Lookup(“title”) is flooded through the nodes toward Publisher@N4, which holds Key=“title”, Value=file data…]
Robust, but worst case O(N) messages per lookup
Motivation: FreeDB, Routed DHT Queries (Chord, &c.)
[Figure: nodes N1 to N9. The publisher at N4 stores Key=H(audio data), Value={artist, album title, track title}; the client's Lookup(H(audio data)) is routed through the nodes.]
How would each of the previous schemes do in terms of:
Decentralized: no central authority
Scalable: low network traffic overhead
Efficient:
find items quickly (latency)
low overheads (generated traffic)
Dynamic: nodes fail, new nodes join
General-purpose: flat naming
Partition Solution: Consistent hashing
Consistent hashing:
the output range of a hash function is treated as a
fixed circular space or “ring”.
[Figure: circular ID space (“ring”) labelled from 0 to 128. Node IDs N10, N32, N60, N80, N100 and key IDs K5, K11, K30, K33, K52, K99 are placed on the same ring.]
Partition Solution: Consistent hashing
Mapping keys to nodes
Advantages: incremental scalability, load
balancing
[Figure: circular ID space with key IDs and node IDs. Keys stored at each node:]
N10: K5, K10
N32: K11, K30
N60: K33, K40, K52
N80: K65, K70
N100: K99
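The key-to-node mapping on the slide can be reproduced with a sorted list of node IDs and a binary search. This is a minimal sketch: the helper names ring_id and successor are hypothetical, SHA-1 is an assumed hash function, and the 7-bit ring is inferred from the 0 to 128 labels in the ring figure.

```python
import hashlib
from bisect import bisect_left

def ring_id(name: str, bits: int = 7) -> int:
    """Hash an arbitrary name onto a ring with 2**bits positions."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

def successor(node_ids, key_id):
    """First node ID that equals or follows key_id, wrapping around the ring."""
    i = bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]

nodes = [10, 32, 60, 80, 100]             # N10 ... N100 from the slide, sorted
keys = [5, 10, 11, 30, 33, 40, 52, 65, 70, 99]
placement = {k: successor(nodes, k) for k in keys}
```

A key past the last node, such as 120, wraps around to N10, which is what makes the ID space circular.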
Consistent hashing
How do store & lookup work?
[Figure: the same ring. The query “What node stores K5?” is answered with “Key 5 is at N10”.]
Properties
Lookup: O(?)
State at each node: O(?)
More on load balancing
Problem: How to do load balancing when nodes are heterogeneous?
Solution idea: Each node owns an ID space proportional to its „power‟
Virtual Nodes:
Each physical node is responsible for multiple (similar) virtual nodes.
Virtual nodes are treated the same
Advantages: load balancing, incremental scalability, dealing with failures
Dealing with heterogeneity: The number of virtual nodes that a node is
responsible for can be decided based on its capacity, accounting for
heterogeneity in the physical infrastructure.
When a node joins (if it supports many VN) it accepts a roughly equivalent
amount of load from each of the other existing nodes.
If a node becomes unavailable the load handled by this node is
evenly dispersed across the remaining available nodes.
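Virtual nodes can be sketched by hashing several labels per physical node onto the ring. The node names, the virtual-node counts (50 and 150), the "name#index" labelling and the 32-bit SHA-1 based ring are all assumptions made for this example.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

def ring_id(label, bits=32):
    digest = hashlib.sha1(label.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

def build_ring(virtual_counts):
    """virtual_counts maps a physical node name to its number of virtual nodes."""
    points = sorted(
        (ring_id(f"{name}#{v}"), name)
        for name, count in virtual_counts.items()
        for v in range(count)
    )
    ids = [p[0] for p in points]          # sorted ring positions
    owners = [p[1] for p in points]       # physical node behind each position
    return ids, owners

def owner(ids, owners, key):
    i = bisect_left(ids, ring_id(key))
    return owners[i % len(owners)]

# A node with three times the capacity gets three times the virtual nodes.
ids, owners = build_ring({"small": 50, "big": 150})
load = Counter(owner(ids, owners, f"key{i}") for i in range(20000))
```

Inspecting load shows how the keys split between the two machines; the split should track the ratio of virtual nodes only approximately, since positions are pseudo-random.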
Theoretical foundation
Notation:
N: number of nodes,
K: number of keys in the system
Theorems
[With high probability] Each node is responsible for
at most (1+ε)K/N keys
Load balancing
[With high probability] Joining or leaving of a node
relocates O(K/N) keys (and only to or from the
responsible node)
Local impact of failures
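Both properties can be observed in a small simulation. This sketch assumes one ring position per node, a 32-bit ID space, and arbitrary values N = 100, K = 20000 and seed 411; it measures the load and the effect of one join rather than proving anything.

```python
import random
from bisect import bisect_left
from collections import Counter

def successor(node_ids, key_id):
    i = bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]

rng = random.Random(411)                  # fixed seed so the run is repeatable
SPACE = 1 << 32
N, K = 100, 20000

all_ids = rng.sample(range(SPACE), N + 1) # distinct IDs; the last one joins later
new_node = all_ids.pop()
node_ids = sorted(all_ids)
key_ids = [rng.randrange(SPACE) for _ in range(K)]

before = [successor(node_ids, k) for k in key_ids]
after_ids = sorted(node_ids + [new_node])
after = [successor(after_ids, k) for k in key_ids]

moved = [(b, a) for b, a in zip(before, after) if b != a]
max_load = max(Counter(before).values())
print(f"K/N = {K / N:.0f}, max load = {max_load}, keys moved by the join = {len(moved)}")
```

Every key that changes owner moves to the joining node and comes from a single existing node, its successor. With only one ring position per node the maximum load can sit well above K/N.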
Consistent Hashing – Summary so far
Mechanism:
Nodes get an identity by hashing their IP address, keys are also hashed
into same space
A key with id (hashed into) k, is assigned to first node whose hashed id
is equal or follows k, in circular space: successor(k)
Properties
O(1) lookup; O(N) state
Advantages
Incremental scalability, load balancing
Theoretical results:
[With high probability] Each node is responsible for at most (1+ε)K/N
keys
[With high probability] Joining or leaving of a node relocates O(K/N)
keys (and only to or from the responsible node)
BUT …
How large is the state maintained at each node?
O(N); N number of nodes.
Can we do better?
[Figure: the same ring; the query for K5 is answered with “Key 5 is at N10”.]
Basic Lookup (nonsolution)
[Figure: ring with nodes N5, N10, N20, N32, N40, N60, N80, N99, N110. The query “Where is key 50?” travels around the ring and is answered with “Key 50 is at N60”.]
• Lookups find the ID's successor
• Correct if successors are correct
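The basic lookup can be sketched with one successor pointer per node. The function names are hypothetical, and starting the query at N5 is an assumption, since the slide does not say which node asks.

```python
def in_half_open(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b                # interval wraps past 0 (a == b: whole ring)

def basic_lookup(start, key_id, succ):
    """Walk successor pointers from `start`; return (responsible node, hops)."""
    node, hops = start, 0
    while not in_half_open(key_id, node, succ[node]):
        node = succ[node]
        hops += 1
    return succ[node], hops

ring = [5, 10, 20, 32, 40, 60, 80, 99, 110]
succ = {n: ring[(i + 1) % len(ring)] for i, n in enumerate(ring)}
owner, hops = basic_lookup(5, 50, succ)   # "Where is key 50?" asked at N5 (assumed)
```

Each node stores O(1) state, but a lookup may walk almost the whole ring, so the hop count grows linearly with the number of nodes.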
Successor Lists Ensure Robust Lookup
[Figure: ring in which each node is labelled with its successor list]
N5: 10, 20, 32
N10: 20, 32, 40
N20: 32, 40, 60
N32: 40, 60, 80
N40: 60, 80, 99
N60: 80, 99, 110
N80: 99, 110, 5
N99: 110, 5, 10
N110: 5, 10, 20
• Each node remembers r successors
• Lookup can skip over dead nodes
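Skipping dead nodes with a successor list can be sketched as follows, using the ring from the slide with r = 3. The function names and the choice of which nodes are dead are assumptions for illustration.

```python
def in_half_open(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

def successor_lists(node_ids, r):
    ids = sorted(node_ids)
    return {
        n: [ids[(i + j) % len(ids)] for j in range(1, r + 1)]
        for i, n in enumerate(ids)
    }

def first_live_successor(node, lists, alive):
    return next((s for s in lists[node] if s in alive), None)

def robust_lookup(start, key_id, lists, alive):
    """Like the basic lookup, but each step uses the first live successor."""
    node = start
    while True:
        nxt = first_live_successor(node, lists, alive)
        if nxt is None:                   # all r successors dead: ring is broken here
            raise LookupError(f"all successors of N{node} are dead")
        if in_half_open(key_id, node, nxt):
            return nxt
        node = nxt

ring = [5, 10, 20, 32, 40, 60, 80, 99, 110]
lists = successor_lists(ring, r=3)
alive = set(ring) - {20, 32}              # assume N20 and N32 have failed
```

With N20 and N32 down, a lookup for key 15 lands on N40, the first live node after the key. If N40 were down as well, N10 would have no live successor left and the lookup would fail.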
“Finger Table” Accelerates Lookups
[Figure: N80's fingers point ½, ¼, 1/8, 1/16, 1/32, 1/64 and 1/128 of the way around the ring.]
Lookups take O(log N) hops
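[Figure: ring with nodes N5, N10, N20, N32, N60, N80, N99, N110. Lookup(K19), shown next to N32, follows fingers around the ring to K19, which lies just before N20.]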
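A finger-table lookup in the style of Chord can be sketched on the ring from the slide. This is a single-process sketch under assumptions: a 7-bit ID space, finger i of node n pointing at successor(n + 2^i), and the lookup for K19 starting at N32.

```python
from bisect import bisect_left

M = 7                                     # ID bits: the ring has 2**7 = 128 positions

def successor(ids, x):
    i = bisect_left(ids, x)
    return ids[i % len(ids)]

def in_open(x, a, b):
    """True if x lies in the circular interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b

def in_half_open(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

def build_fingers(ids):
    # fingers[n][i] is the first node at or after n + 2**i on the ring.
    return {n: [successor(ids, (n + (1 << i)) % (1 << M)) for i in range(M)]
            for n in ids}

def find_successor(start, key_id, fingers):
    """Return (responsible node, hops), jumping via the closest preceding finger."""
    node, hops = start, 0
    while not in_half_open(key_id, node, fingers[node][0]):
        nxt = next((f for f in reversed(fingers[node])
                    if in_open(f, node, key_id)), None)
        if nxt is None:                   # defensive: no finger makes progress
            break
        node, hops = nxt, hops + 1
    return fingers[node][0], hops

ids = [5, 10, 20, 32, 60, 80, 99, 110]
fingers = build_fingers(ids)
owner, hops = find_successor(32, 19, fingers)
```

On this ring the lookup goes N32 → N99 → N5 → N10 and returns N20 after 3 hops. Each node stores M fingers here, which plays the role of the O(log N) state.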
Summary of Performance Characteristics
Efficient: O(log N) messages per lookup
Scalable: O(log N) state per node
Robust: survives massive membership changes
Joining the Ring
Three step process
Initialize all 'fingers' of new node
Update fingers of existing nodes
Transfer keys from successor to new node
Two invariants to maintain to ensure
correctness
Each node's successor list is maintained
successor(k) is responsible for monitoring k
Join: Initialize New Node's Finger Table
[Figure: ring with nodes N5, N20, N40, N60, N80, N99. The new node N36 issues 1. Lookup(37,38,40,…,100) to fill its finger table.]
Join: Update Fingers of Existing Nodes
New node calls update function on existing nodes
Existing nodes recursively update fingers of other
nodes
[Figure: ring with nodes N5, N20, N36, N40, N60, N80, N99.]
Join: Transfer Keys
Only keys in the range are transferred
[Figure: ring with nodes N5, N20, N36, N40, N60, N80, N99. Copy keys 21..36 from N40 to N36: K30 goes to N36, K38 stays at N40.]
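The key transfer can be sketched with the numbers from the slide: N36 joins between N20 and N40, so only keys in (20, 36] change owner. The function name join_transfer and the dictionary-of-sets store are assumptions; the slide says "copy", while this sketch also removes the keys from N40.

```python
def in_half_open(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

def join_transfer(store, predecessor, new_node, successor):
    """Move the keys in (predecessor, new_node] from successor to new_node."""
    moving = {k for k in store[successor] if in_half_open(k, predecessor, new_node)}
    store[successor] = store[successor] - moving
    store[new_node] = store.get(new_node, set()) | moving
    return moving

store = {40: {30, 38}}                    # before the join, N40 holds K30 and K38
moved = join_transfer(store, predecessor=20, new_node=36, successor=40)
```

Only N40 and N36 take part in the transfer; no other node's keys are touched.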
Handling Failures
Problem: Failures could cause incorrect lookup
Solution: Fallback: keep track of the successor's
successor (i.e., keep a list of r successors)
[Figure: ring with nodes N10, N80, N85, N102, N113, N120 and a Lookup(90).]
Quiz-like question
How should one dimension the successor list? Assume
that 50% of the nodes fail.
r: length of successor list
N: nodes in the system
P(successor list all dead for a specific node) = $(1/2)^r$
i.e., P(this node breaks the ring)
depends on independent failure assumption
P(no broken nodes in the entire system) = $(1 - (1/2)^r)^N$
$r = 2\log_2 N$ makes prob. ≈ $1 - 1/N$ (since then $(1/2)^r = 1/N^2$)
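The arithmetic can be checked numerically. This sketch takes the independent-failure assumption from the slide; the function name and the sample values of N are arbitrary.

```python
import math

def p_ring_intact(n_nodes, r, p_fail=0.5):
    """P(no node has all r successors dead), assuming independent failures."""
    return (1 - p_fail ** r) ** n_nodes

for n in (100, 1000, 10000):
    r = 2 * math.log2(n)                  # successor-list length suggested above
    print(n, round(r, 1), p_ring_intact(n, r), 1 - 1 / n)
```

The last two columns can be compared side by side for each N.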
DHT – Summary so far
Mechanism:
Nodes get an identity by hashing their IP address, keys are also hashed
into same space
A key with id (hashed into) k, is assigned to first node whose hashed id
is equal or follows k, in circular space: successor(k)
Finger table
Properties
Incremental scalability, good load balancing
Efficient: O(log N) messages per lookup
Scalable: O(log N) state per node
Robust: survives massive membership changes
Some applications
An Example Application:
The CD Database
[Figure: the client computes the disc fingerprint and asks the database “Recognize Fingerprint?”; the database returns Album & Track Titles.]
An Example Application:
The CD Database
[Figure: when the database answers “No Such Fingerprint”, the user types in album and track titles, which are sent to the database.]
A DHT-Based FreeDB Cache
FreeDB is a volunteer service
Has suffered outages as long as 48 hours
Service costs borne largely by volunteer mirrors
Idea: Build a cache of FreeDB with a DHT
Add to availability of main service
Goal: explore how easy this is to do
Cache Illustration
[Figure: cache illustration; only the label “DHT” survives.]
Some experimental results
Chord Lookup Cost Is O(log N)
[Plot: Average Messages per Lookup vs. Number of Nodes]
Failure Experimental Setup
Start 1,000 CFS/Chord servers
Successor list has 20 entries
Wait until they stabilize
Insert 1,000 key/value pairs
Five replicas of each
Stop X% of the servers
Immediately perform 1,000 lookups
DHash Replicates Blocks at r Successors
[Figure: ring with nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110; Block 17 is stored on the successors that follow its ID.]
• Replicas are easy to find if successor fails
• Hashed node IDs ensure independent failure
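Replica placement at successors can be sketched on the ring from the slide. The function names are hypothetical, and counting the block's immediate successor as the first of the r copies is an assumption; the slide does not say whether r includes it.

```python
from bisect import bisect_left

def replica_nodes(node_ids, block_id, r):
    """The r nodes that follow block_id on the ring, in ring order."""
    ids = sorted(node_ids)
    i = bisect_left(ids, block_id)
    return [ids[(i + j) % len(ids)] for j in range(min(r, len(ids)))]

def fetch_from(node_ids, block_id, r, alive):
    """First live replica holder, or None if all r holders are down."""
    return next((n for n in replica_nodes(node_ids, block_id, r) if n in alive), None)

ring = [5, 10, 20, 40, 50, 60, 68, 80, 99, 110]
holders = replica_nodes(ring, 17, r=3)    # Block 17 from the slide
```

Because the replicas sit on the nodes a lookup would reach next anyway, a reader that finds the first holder down simply tries the following one.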
Massive Failures Have Little Impact
[Plot: Failed Lookups (Percent), y-axis from 0 to 1.4, against Failed Nodes (Percent), x-axis from 5 to 50. Annotation: $(1/2)^6$ is 1.6%.]
Next
A distributed system is:
a collection of independent computers that appears
to its users as a single coherent system
Components need to:
Communicate
Cooperate => support needed
Naming – enables some resource sharing
Synchronization