Notion of Client/Server
To understand peer-to-peer, you first need to understand what it is not. On the internet, we
find on the one hand clients: applications you install on your computer to use internet services
or those offered by a private network (email, newsgroups, …). This kind of software is
often free, but the data you can download with it is not necessarily free.
On the other hand, there are servers. They are powerful computers designed to provide a service
to several clients that are online and connected to the same machine. It is therefore the
servers that make it possible to share data on the network.
In a peer-to-peer environment, by contrast, all users are clients and servers at the same time
(except in centralised networks, which we will explain later).
To conclude, there is no hierarchy between users on a peer-to-peer network: every user is
half-client, half-server, which allows each user to share and manage resources as they wish.
CHORD PROTOCOL
Chord is a protocol valued for its simplicity. It has all the features needed
to manage a peer-to-peer network. First, the load is shared fairly among the nodes of the
network thanks to the use of a consistent hash function. Consequently, each peer carries
roughly the same load in the operation of the network, which avoids bottlenecks and overload
problems. Secondly, Chord is a fully distributed protocol: each node is autonomous,
which makes the network extremely robust. Moreover, the cost of a request grows logarithmically
with the number of peers in the network, which makes Chord scalable and able to support a
large number of peers.
Thanks to its efficient routing-table algorithm, Chord is also a responsive protocol
that offers high availability. Chord was designed to work as a layer underneath applications.
The key principle of Chord is to use a hash function that is both fast and evenly
distributed, so that the load is equal on each peer of the network. Such a hash is called
consistent. In addition, when a peer leaves or joins the network, with high probability only a
fraction 1/N of the keys (N being the total number of peers) is moved. Chord's scalability is
maintained by managing a routing table containing only m entries (m being the number of bits of
a key). Thanks to this, a lookup is guaranteed to pass through only about log(N) peers.
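
To make the routing table more concrete, here is a minimal single-process sketch in Python. It assumes the usual Chord layout, in which entry i of node n points to the first node at or after n + 2^i on the identifier circle. The function names (successor, finger_table, lookup) are illustrative, and every table is computed from a global sorted list of node identifiers, which a real peer would not have.

```python
from bisect import bisect_left

def successor(ring, ident):
    """First node identifier equal to or following ident (ring is a sorted list)."""
    return ring[bisect_left(ring, ident) % len(ring)]

def finger_table(ring, n, m):
    """The m routing entries of node n: entry i is successor(n + 2**i)."""
    return [successor(ring, (n + 2 ** i) % 2 ** m) for i in range(m)]

def lookup(ring, start, key, m):
    """Route a query for key starting at node start; return (owner, hops)."""
    size = 2 ** m
    n, hops = start, 0
    while True:
        to_key = (key - n) % size              # clockwise distance from n to the key
        if to_key == 0:
            return n, hops                     # n's own identifier: n owns the key
        succ = finger_table(ring, n, m)[0]     # entry 0 is n's immediate successor
        to_succ = (succ - n) % size or size    # a lone node is its own successor
        if to_key <= to_succ:
            return succ, hops                  # key falls between n and its successor
        # Otherwise forward to the farthest routing entry that still precedes the key.
        for f in reversed(finger_table(ring, n, m)):
            if 0 < (f - n) % size < to_key:
                n, hops = f, hops + 1
                break

ring = [0, 1, 3]                    # a three-node network with m = 3
print(finger_table(ring, 0, 3))     # [1, 3, 0]
print(lookup(ring, 3, 1, 3))        # (1, 1)
```

Each forwarding step at least halves the remaining distance to the peer that precedes the key, which is why the number of hops stays within m and is about log(N) in practice.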
-Hash function
The hash function assigns each node and each key an m-bit identifier. The value m should
of course be large enough that the probability of collision between identifiers is low. The
identifier of a peer can be created, for example, by hashing its IP address or its host name.
The policy for assigning a hashed key to a peer is simple. The key is associated with the first
node whose identifier is equal to or follows the identifier of the key on the identifier
circle. This particular node is called the successor of the key.
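
As an illustration, the sketch below derives an m-bit identifier from a host address or a key name. SHA-1 is an assumption made for the example (the text does not name a hash function), and chord_id, the address and the key name are hypothetical.

```python
import hashlib

def chord_id(name: str, m: int) -> int:
    """Return an m-bit identifier for a node address or a key name."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    # Read the 160-bit digest as an integer and keep only the low m bits.
    return int.from_bytes(digest, "big") % (2 ** m)

node_id = chord_id("192.168.0.10", m=16)   # hypothetical peer address
key_id = chord_id("song.mp3", m=16)        # hypothetical key name
```

With a small m, two different names can easily receive the same identifier, which is why m must be chosen large enough in practice.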
The above figure shows an example of a Chord network consisting of three nodes whose
identifiers are 0, 1, and 3. The set of keys (or more precisely, keys' identifiers) is 1, 2, 6 and
they are assigned to the three nodes. Because the successor of key 1 among the nodes in the
network is node 1, key 1 is assigned to node 1. Similarly, the successor of key 2 is 3, the first
node found moving clockwise from 2 on the identifier circle. For key 6, the successor (node
0) is found by wrapping around the circle, so key 6 is assigned to node 0.
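
The assignment in this example can be checked with a few lines of Python. This is only a sketch: the successor function is an illustrative helper that works on a global sorted list of node identifiers.

```python
from bisect import bisect_left

def successor(node_ids, key_id):
    """Return the first node identifier equal to or following key_id on the circle."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    # Past the largest identifier, wrap around to the smallest one.
    return ring[i % len(ring)]

nodes = [0, 1, 3]
assignment = {key: successor(nodes, key) for key in (1, 2, 6)}
print(assignment)   # {1: 1, 2: 3, 6: 0}
```

If a node with identifier 6 later joined this network, only key 6 would move (from node 0 to node 6); keys 1 and 2 would stay where they are.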
Distributed hash table
Different types of network architectures
In fact, there are several « p2p » network architectures, each of which focuses on its own
type of data sharing.
1. Centralized networks
Centralized networks are the closest to traditional networks: there is a server
which is responsible for connecting all the clients that need it. The server identifies
each of them (it records their « name » and their IP address) and all the shared resources
available, using an indexing system called a Hash Table (HT).
The HT works like a directory: if a client is looking for a song (for example), it asks the
server: « who has a part of this song? ». The server reads the directory and
answers with the names (hashes) and IP addresses of all the other clients that have it.
This allows the client application to know « where » the other parts of the song are
among the millions of users of the network.
However, the hash is not really the name of the data requested; it is a kind of unique code
derived from the actual content of the resource. The HT thus avoids downloading
several files that seem different because they have different names
even though they have the same content. It also avoids downloading a file that
looks the same but has different content.
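
A toy version of such a central directory might look like the sketch below. CentralIndex, its methods, the client names and the addresses are all hypothetical, and SHA-1 is assumed as the content hash; the point is only that the index is keyed by a hash of the content, not by the file name.

```python
import hashlib
from collections import defaultdict

class CentralIndex:
    """Toy central directory: content hash -> set of (client name, IP) holders."""

    def __init__(self):
        self._holders = defaultdict(set)

    def register(self, client_name: str, ip: str, content: bytes) -> str:
        """Record that a client shares this content; return the content hash."""
        content_hash = hashlib.sha1(content).hexdigest()
        self._holders[content_hash].add((client_name, ip))
        return content_hash

    def who_has(self, content_hash: str):
        """Answer the question: which clients hold the data with this hash?"""
        return sorted(self._holders.get(content_hash, ()))

index = CentralIndex()
h1 = index.register("alice", "10.0.0.1", b"same song bytes")
h2 = index.register("bob", "10.0.0.2", b"same song bytes")
h3 = index.register("carol", "10.0.0.3", b"different bytes")
print(index.who_has(h1))   # [('alice', '10.0.0.1'), ('bob', '10.0.0.2')]
```

Here alice and bob are listed under the same entry whatever they called their file, while carol's different content gets a separate hash.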
Centralized networks were the most popular architecture in the first
generation of p2p networks. But this solution is increasingly being abandoned because of
its great fragility. Indeed, although the main advantage of this method is the convenience
and efficiency of searches, the server must not be overloaded, and the network needs a
community large enough to be interesting.
We will finish with the main defects of this type of architecture.
First, the security level of a centralized network is very low. There is only one point of
entry, the central server. Stopping that one server is enough to disconnect all the users
and block the network entirely.
Secondly, you need to register on the server. There is also no guarantee of
anonymity: the service records the IP addresses of all clients and the kind of data
they download.
2. Distributed networks
Distributed networks were created to replace the first generation of p2p networks. The
first change is that users who want to connect to this kind of network need a half-client,
half-server application in order to be accepted by the other « peers » (users) who run the
same software.
But, unlike centralized networks, where you only need to be connected to the server to
access information, a new peer needs to learn the topology of the network it
wants to join, to search for information across the nodes and to receive an answer from a
node that matches its search.
Once the new user is finally connected, it becomes a peer and an integral part of
the network. It can run a search with one or more keywords, like any good search
engine. There is one difference, though: a client's search never stops until the client
makes another search.
Another main difference from centralized networks is the use of the HT. Indeed, since
there is no central server, how can the HT be used if one of the peers is disconnected? In
fact, the Hash Table is still kept, but each peer owns only a part of the table: this is
called a Distributed Hash Table. When a peer makes a search, it asks the peer that takes
care of the first hash of its search. If the answer is negative, that peer tells it to ask
another peer which takes care of another part of the hash table, and so on until the
search succeeds. This approach avoids problems caused by peers going down or by hash
table corruption.
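
The search procedure just described can be sketched as a small simulation. The Peer class, the next_peer link used for redirection and the peer names are assumptions made for the example; the text does not say how a peer chooses which other peer to point to.

```python
import hashlib

class Peer:
    """A peer holding one slice of the distributed hash table."""

    def __init__(self, name):
        self.name = name
        self.table = {}        # content hash -> list of holder names
        self.next_peer = None  # peer to ask next when this slice has no answer

def content_hash(data: bytes) -> str:
    """Hash derived from the content itself (SHA-1 is an assumption)."""
    return hashlib.sha1(data).hexdigest()

def search(first_peer, wanted_hash):
    """Ask peers one after another; return (holders or None, names of peers asked)."""
    peer, asked = first_peer, []
    while peer is not None and peer.name not in asked:
        asked.append(peer.name)
        if wanted_hash in peer.table:
            return peer.table[wanted_hash], asked
        peer = peer.next_peer  # negative answer: redirected to another peer
    return None, asked

a, b, c = Peer("a"), Peer("b"), Peer("c")
a.next_peer, b.next_peer, c.next_peer = b, c, a
wanted = content_hash(b"some song")
c.table[wanted] = ["dave"]          # peer c owns this part of the table
print(search(a, wanted))            # (['dave'], ['a', 'b', 'c'])
```

In this sketch the search stops once every peer has been asked, which keeps the example finite; no single peer ever holds the whole table.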
Distributed networks, the second generation of peer-to-peer, are much more
powerful than the previous generation. They keep the advantage of centralized networks,
namely quick searches, and solve some defects of the first generation. Indeed, even if a
few peers leave the network, data sharing goes on. The result is a very robust
system which combines speed and security for peers. Indeed, no one can hold the entire
Hash Table, and each peer is connected anonymously.
One defect remains: the network is quickly polluted by searches which go on
indefinitely. That is why another architecture exists to solve the remaining problems.
// FIXME HYBRID