					                             Indexing data-oriented overlay networks
                                                                                                                     



                        Karl Aberer, Anwitaman Datta, Manfred Hauswirth, Roman Schmidt
                                             School of Computer and Communication Sciences
                                             École Polytechnique Fédérale de Lausanne (EPFL)
                                                     CH-1015 Lausanne, Switzerland


Abstract

The application of structured overlay networks to implement index structures
for data-oriented applications such as peer-to-peer databases or peer-to-peer
information retrieval requires highly efficient approaches for overlay
construction, as changing application requirements frequently lead to
re-indexing of the data and hence (re-)construction of overlay networks. This
problem has so far not been addressed in the literature and thus we describe
an approach for the efficient construction of data-oriented, structured
overlay networks from scratch in a self-organized way. Standard maintenance
algorithms for overlay networks cannot accomplish this efficiently, as they
are inherently sequential. Our proposed algorithm is completely
decentralized, parallel, and can construct a new overlay network with short
latency. At the same time it ensures good load-balancing for skewed data key
distributions which result from preserving key order relationships as
necessitated by data-oriented applications. We provide both a theoretical
analysis of the basic algorithms and a complete system implementation that
has been tested on PlanetLab. We use this implementation to support
peer-to-peer information retrieval and database applications.

* The work presented in this paper was supported (in part) by the National
Competence Center in Research on Mobile Information and Communication Systems
(NCCR-MICS), a center supported by the Swiss National Science Foundation
under grant number 5005-67322 and was (partly) carried out in the framework
of the EPFL Center for Global Computing and supported by the Swiss National
Funding Agency OFES as part of the European project Evergrow No 001935.

Permission to copy without fee all or part of this material is granted
provided that the copies are not made or distributed for direct commercial
advantage, the VLDB copyright notice and the title of the publication and its
date appear, and notice is given that copying is by permission of the Very
Large Data Base Endowment. To copy otherwise, or to republish, requires a fee
and/or special permission from the Endowment.
Proceedings of the 31st VLDB Conference,
Trondheim, Norway, 2005

1 Introduction

In standard database systems it is common practice to regularly (re-)index
attributes to meet changing requirements and optimize search performance.
Recently, structured peer-to-peer overlay networks have increasingly been
used as an access structure for highly distributed data-oriented
applications, such as relational query processing, metadata search or
information retrieval [5, 19]. Their use was motivated by the presence of
certain features that are supported by their design such as scalability,
decentralized maintenance, and robustness under network churn. Compared to
unstructured overlay networks, which are also being proposed for these
applications [13, 16], structured overlay networks additionally exhibit much
lower bandwidth consumption for search.

The standard maintenance model for peer-to-peer overlay networks assumes a
dynamic group of peers forming a network where peers can join and leave,
essentially in a sequential manner. In addition, proactive or reactive
maintenance schemes are used to repair inconsistencies resulting from node
and network failures or to re-balance load in order to react to data updates.
These approaches to maintenance, which have been extensively studied in the
literature, correspond essentially to updating database index structures in
reaction to updates.

In contrast to this, almost no results exist on how to efficiently construct
a large overlay network from scratch, i.e., how to bootstrap a new,
large-scale, structured overlay network in a practical way within reasonable
time. This is understandable insofar as most of the work on overlay networks
was done under the assumption of providing an efficient resource location
scheme using an application-specific, yet fairly stable, resource identifier
space (e.g., file names for file sharing).

With the increasing adoption of structured overlay network technology for
data-oriented applications this assumption no longer holds. Resources are
identified by dynamically changing predicates and different overlay networks
can be used simultaneously, each of them supporting a specific addressing
need. We can illustrate these requirements by a typical application case of
peer-to-peer information retrieval which we investigated recently.

The standard application of structured overlay networks in peer-to-peer
information retrieval is the implementation of a distributed inverted file
structure for efficient keyword-based search. In this scenario, several
situations occur in which the overlay network has to be constructed from
scratch:

  •   A set of documents that is distributed among a (potentially very
      large) number of peers is identified as holding information pertaining
      to a common topic. To support efficient retrieval for this specific
      document collection, a dedicated overlay network implementing
      inverted file access may have to be set up.
  •   A new indexing method, for example, a new text extraction function for
      identifying semantically relevant keywords or phrases, is being used
      to search a set of semantically related documents distributed among a
      large set of peers. Since the index keys change as a result of
      changing the indexing method, a new overlay network needs to be
      constructed to support efficient access.
  •   Due to updates to a distributed document collection an existing
      distributed inverted file has become obsolete. This may result either
      from not maintaining the inverted file during document updates or
      from changing characteristics of the global vocabulary and thus a
      changed indexing strategy (e.g., term selection based on inverse
      document frequency). Thus a complete reconstruction of the overlay
      network is required.
  •   Due to catastrophic network failures the standard maintenance
      mechanisms can no longer reconstruct a consistent overlay network.
      Thus the overlay network needs to be constructed from scratch. Of
      course, this scenario applies generally in any application, but
      becomes more probable when multiple overlay networks are deployed in
      parallel.

In principle a (re-)construction of an overlay network in any of these
scenarios can be achieved by the standard maintenance model of sequential
node joins and leaves. Most existing proposals for structured overlay
networks [17, 24, 25] do not offer a completely parallel construction process
involving all peers simultaneously. They assume a model of joins of peers in
an essentially sequential process. However, this approach encounters two
serious problems:

  •   The peer community will have to decide on a serialization of the
      process, e.g., electing a peer to initiate the process. Thus the peer
      community has to solve a leader election problem, which might turn out
      to be unsolvable for very large peer populations without making strong
      assumptions on coordination or limiting peer autonomy.
  •   Since the process is performed essentially in a serialized manner, it
      incurs a substantial latency. In particular it does not take any
      advantage of potential parallelization, which would be a natural
      approach.

In principle some systems like Pastry [24] would support concurrent
construction as they take an optimistic approach in which concurrent node
joins are possible as long as there are no conflicts. However, this assumes
that there already exists a large overlay, so that conflicts are rather
unlikely. In an early stage of bootstrapping and with a large number of peers
joining concurrently, conflicts will be very likely, however. Thus this type
of strategy is not applicable to the problem we are addressing. DKS [6]
avoids this problem by equipping joining peers with an approximate routing
table which in the course of the operation of the overlay will be corrected
(correction on use). While this approach is robust, it incurs considerable
effort, as on average the number of lookups per peer required to stabilize
the network is of the same order as the number of node joins and leaves.

In this paper we will address the problem of how a structured overlay network
can be constructed efficiently from scratch, a problem that the research
community has only recently identified and started to address [2, 8, 14]. Our
approach is a generic mechanism to autonomously partition a keyspace in a
completely parallel manner. The approach can potentially be used for
constructing any structured overlay with fixed key space partitioning [7].

In data-oriented applications there exists an additional factor that adds to
the difficulty of finding a solution to this problem: load balancing. When
using overlay networks for semantic processing of keys (range queries being a
popular example) the canonical method of uniform hashing of keys to remove
skew in the key distribution is no longer applicable. This has led to
substantial research on including load balancing features into overlay
networks [2, 12, 17]. During construction this must be taken into account,
thus the construction approach also has to solve load balancing problems. In
fact, we will address two types of load balancing problems simultaneously:
the balancing of storage load among peers under skewed key distributions and
the balancing of the number of replica peers across key space partitions. The
first problem is important to balance workload among peers and is solved by
adapting the overlay network structure to the key distribution. The second
one is important to guarantee approximately uniform availability of keys in
unreliable networks where peers have potentially low availability. This is a
classical “balls into bins” load balancing problem.

Our approach is based on a keyspace bisection process through a completely
decentralized, parallel, and randomized algorithm for assigning peers to key
space partitions in proportion to the key distributions of the partitions. By
recursively applying keyspace bisections, peers can incrementally construct
the overlay network while maintaining load balance. We will introduce our
approach in the context of the P-Grid overlay network structure [3], which we
have developed over the last years, though the essential elements of the
approach are applicable to all overlay networks using fixed key space
partitioning schemes, such as CAN [23] or Pastry [24]. We demonstrate the
theoretical correctness of the basic keyspace bisection process by analysis
and simulation and show the feasibility of building a complete system
matching the theoretically predicted behavior with experimental results
obtained from a full-fledged implementation deployed on the PlanetLab [11]
infrastructure. The resulting system (available at http://www.p-grid.org) is
currently used to implement both peer-to-peer retrieval
(http://www.alvis.info/) and peer-to-peer data management systems [1].

2 Overview of the Approach
2.1 A trie-structured overlay

We assume that data keys are taken from the key space consisting of the
interval [0, 1[. The design of the P-Grid overlay network is based on two
simple principal ideas: (1) Divide and conquer: The key space is recursively
bisected
such that the resulting partitions carry approximately the                                search in terms of the communication cost of O(log n)
same workload and peers are associated with those                                         messages, where n is the number of leaf nodes in the tree,
partitions. Using a bisection approach greatly simplifies                                 irrespective of the shape of the tree [2].
decentralized load balancing by local decision making. (2)
Canonical trie structure: Bisecting the key space induces                                 2.2 Overlay Network Construction
a canonical trie structure which is used as the basis for im-
plementing a standard, distributed prefix routing scheme for                               The process of constructing such an overlay network from
efficient search. The resulting overlay is illustrated in Fig-                             scratch should require low latency, i.e., be highly paral-
ure 1.                                                                                    lel and require minimal bandwidth consumption. At the
                                                                                          same time the following load balancing criteria should be
                                                                                          achieved:
                                                                                            1. The partitioning of the search space should be such
                                                                                                that each partition holds a maximal data load of d_max,
                                                                                                e.g., measured as the number of keys present in the
                                                                                                partition. We will also call d_max the maximal storage
                                                                                                load in the following.
                                                                                            2. Each resulting partition should be associated with a
                                                                                                constant number of peers n_min, such that the
                                                                                                availability of the different data keys is approximately
                                                                                                the same. We will also call n_min the minimal
                                                                                                replication factor in the following.
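The two criteria can be made concrete with a small check. The following is a minimal sketch and not part of the original system: it assumes that a partitioning is given as a mapping from partition paths (bit strings) to pairs (number of keys, number of peers). The names check_criteria, d_max and n_min are chosen here for illustration, and criterion 2 is checked as a lower bound (at least n_min peers per partition), which is an assumption of this sketch.

```python
from typing import Dict, List, Tuple


def check_criteria(partitions: Dict[str, Tuple[int, int]],
                   d_max: int, n_min: int) -> List[str]:
    """Return readable violations of criteria 1 and 2 (empty list if none).

    partitions maps a partition path to (number of keys, number of peers).
    """
    violations = []
    for path, (num_keys, num_peers) in sorted(partitions.items()):
        # Criterion 1: storage load of a partition must not exceed d_max.
        if num_keys > d_max:
            violations.append(f"{path}: storage load {num_keys} > d_max={d_max}")
        # Criterion 2 (as a lower bound): at least n_min replica peers.
        if num_peers < n_min:
            violations.append(f"{path}: {num_peers} peers < n_min={n_min}")
    return violations


# Hypothetical partitioning with three leaf partitions.
example = {"0": (40, 2), "10": (55, 2), "11": (30, 1)}
for line in check_criteria(example, d_max=50, n_min=2):
    print(line)
```

In this hypothetical example, partition 10 exceeds the storage bound and partition 11 is under-replicated, while partition 0 satisfies both criteria.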
                                                                                              With perfect load balancing these properties can be
                                                                                          achieved iff n · d_max = |D| · n_min, where |D| is the total
                                                                                          number of data keys and n is the number of peers.
                                                                                          Algorithm 1 shows our global partitioning algorithm Partition
                                                                                          that attempts to achieve these load balancing goals by best
                                                                                          effort while bisecting the key space, if the idealizing
                                                                                          assumptions are not met.
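To make the recursive bisection tangible, the following Python sketch partitions a list of keys from the unit interval among a given number of peers. It is a sketch under stated assumptions, not the authors' implementation: the stopping rule (stop when a partition holds at most d_max keys or has fewer than 2 * n_min peers), the rounding of the proportional peer share, and all function and variable names are choices made for this illustration.

```python
from typing import Dict, List, Tuple


def partition(path: str, lo: float, hi: float, n: int, keys: List[float],
              d_max: int, n_min: int, out: Dict[str, Tuple[int, int]]) -> None:
    """Recursively bisect [lo, hi) and record (peers, keys) per leaf partition."""
    d = len(keys)
    # Assumed stopping rule: load is acceptable, or a further split
    # could not give both halves at least n_min peers.
    if d <= d_max or n < 2 * n_min:
        out[path] = (n, d)
        return
    mid = (lo + hi) / 2
    keys0 = [k for k in keys if k < mid]
    keys1 = [k for k in keys if k >= mid]
    # Peers are assigned in proportion to the key counts, then clamped so
    # that each half keeps at least n_min peers.
    n0 = round(n * len(keys0) / d)
    n0 = max(n_min, min(n - n_min, n0))
    partition(path + "0", lo, mid, n0, keys0, d_max, n_min, out)
    partition(path + "1", mid, hi, n - n0, keys1, d_max, n_min, out)


# Skewed example: 30 keys below 0.25 and 10 keys in [0.5, 1).
keys = [i / 120 for i in range(30)] + [0.5 + i / 20 for i in range(10)]
result: Dict[str, Tuple[int, int]] = {}
partition("", 0.0, 1.0, 16, keys, d_max=10, n_min=2, out=result)
for leaf, (peers, load) in sorted(result.items()):
    print(leaf, peers, load)
```

With this skewed input the empty partition 01 still receives n_min = 2 peers, which illustrates how a bisection approach can send peers to underloaded partitions.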
         Figure 1: Trie-structured overlay network (figure labels, top to bottom: replica sub-network, recursive partitioning, load distribution)
                                                                                          Algorithm 1 Partition(p, n, d)
    At the bottom we see a possible skewed distribution                                    1: if n ≥ 2 n_min and d > d_max then
of data keys in the interval [0, 1[. We bisect the interval                                2:    if n_min ≤ n d_0/d ≤ n − n_min then
such that each resulting partition carries (approximately)                                 3:       n_0 = n d_0/d;  n_1 = n d_1/d;
the same load. Each partition can be uniquely identified                                   4:       Partition(p_0, n_0, d_0); Partition(p_1, n_1, d_1)
by a bit sequence. We associate one or more peers, in the                                  5:    else
example exactly two, with each of the partitions. We will                                  6:       if d_0 < d_1 then
call the bit sequence of a peer’s partition the peer’s path.                               7:          n_0 = n_min;  n_1 = n − n_min;
The bit sequences induce a trie structure which is used to                                 8:          Partition(p_0, n_0, d_0); Partition(p_1, n_1, d_1)
implement prefix routing. Each peer maintains references                                   9:       else
in its routing table that pertain to its path. More                                       10:          {analogous}
specifically, for each bit position of its path it maintains                              11:       end if
one or more randomly selected references to a peer that                                   12:    end if
has a path with the opposite bit at this position. Thus the                               13: end if
trie structure is represented in a distributed fashion by
the routing tables of the peers. This topology is analogous                                   The algorithm works as follows. Assume n peers are
to other prefix routing schemes that have been devised                                    associated with one key space partition p containing d data
[20, 24] and have been classified as a fixed key space                                    keys and two sub-partitions p_0 and p_1 containing
partitioning scheme for structured overlay networks in the                                respectively d_0 and d_1 data keys, such that d_0 + d_1 = d.
literature [7]. Search in such an overlay network is                                      To achieve load balance criterion 1, a fraction d_i/d of the
performed by resolving a requested key bit by bit. When                                   peers should be associated with partition p_i for i = 0, 1.
bits cannot be resolved locally, peers forward the request                                In case n d_i/d < n_min, at least n_min peers should be
to a peer from their routing tables.                                                      associated with p_i to achieve load balance criterion 2.
    We use replication in two ways in order to increase the                               Partition recursively applies this bisection step to the
resilience of the overlay network when nodes or network                                   key space.
links fail. Multiple references are kept in the routing                                       For various reasons this algorithm will achieve the load
tables, thus providing alternative access paths, and                                      balancing goals only approximately. Provided the number
multiple peers are associated with the same key space                                     of data keys |D| is large enough, i.e., |D| ≥ n d_max / n_min,
partitions (structural replication) in order to provide data                              the number of peers associated with a partition will be
redundancy. Since the routing choices are made by randomly                                between n_min and 2 n_min, instead of constant n_min. For
choosing peers from the complementary sub-tree at each                                    very skewed data distributions it can happen that very small
level, the resulting overlay network additionally provides                                partitions contain a large fraction of the data keys, and
efficient                                                                                 bisection “disperses” many peers to underloaded partitions even
before reaching such partitions. These are fundamental                                        We can thus reduce the problem of load-balanced overlay
problems of any bisection approach. However, for practical                                network construction to the problem of decentralized
data distributions and large peer populations these                                       partitioning of one key space partition. The problem is that
problems are more theoretical in nature and Partition                                     a large number of peers have to perform the decision to split
achieves good load balancing properties provided n_min                                    independently to allow a fast construction of the overlay
and d_max are chosen properly.                                                            network, while making these independent decisions in a
    In the following we will use Partition as an algorithm                                way that the ratio of the number of peers matches the ratio
that defines what we consider as an optimal partitioning                                  of the data load in the two partitions. In other words, the
of the search space among peers and a resulting optimal                                   global behavior of the distributed decision making process
overlay network. Since in a peer-to-peer system no global                                 should match the outcome of the partitioning step in the
coordination exists, the problem we intend to solve is to                                 global partitioning algorithm Partition (corresponding to
                                                                                                                                                                                                                PBRBRPGE
achieve the partitioning generated by                by a de-                              # QIQI H F
                                                                                           PBRBRPGE                                                   lines 3 and 7 in             ). The solution to this problem
                                                                                                                                                                                      # QIQI H F
                                                                                                                                                                                      PRRrRPGE
centralized process approximately. We will measure the                                                                                                 is one of the central contributions of the paper and will be
quality of a solution by determining the deviation from the                                                                                            discussed in detail in Section 3.
optimal partitioning.
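
To make the target of the decentralized process concrete, the following Python sketch computes the peer allocation that is proportional to the data load of two partitions and measures how far an achieved allocation deviates from it. It is only an illustration under stated assumptions: the function names, the rounding rule, and the use of an absolute difference of fractions as the deviation measure are hypothetical choices, not the metric used by the authors. The sketch also assumes that every peer knows the load share exactly, which a real decentralized process can only approximate.

```python
import random


def ideal_peer_counts(num_peers, keys_0, keys_1):
    """Split num_peers between two partitions in proportion to their key load."""
    total = keys_0 + keys_1
    if total == 0:
        raise ValueError("at least one data key is required")
    peers_0 = round(num_peers * keys_0 / total)
    return peers_0, num_peers - peers_0


def deviation(actual_peers_0, num_peers, keys_0, keys_1):
    """Absolute gap between the achieved and the load-proportional peer fraction.

    Assumption: this simple measure stands in for the paper's deviation metric.
    """
    target = keys_0 / (keys_0 + keys_1)
    return abs(actual_peers_0 / num_peers - target)


def independent_decisions(num_peers, keys_0, keys_1, rng):
    """Each peer independently picks partition 0 with probability equal to
    the load share of partition 0 (assumed to be known exactly)."""
    share_0 = keys_0 / (keys_0 + keys_1)
    return sum(1 for _ in range(num_peers) if rng.random() < share_0)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs give the same outcome
    n, k0, k1 = 1000, 300, 700
    print("ideal allocation:", ideal_peer_counts(n, k0, k1))
    achieved = independent_decisions(n, k0, k1, rng)
    print("achieved peers in partition 0:", achieved)
    print("deviation:", deviation(achieved, n, k0, k1))
```

Independent random choices match the load ratio only in expectation, so a single run typically shows a small nonzero deviation. The sketch does not address how peers learn about a peer from the other partition.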
In a decentralized process peers do not have precise information on the number of peers and keys present in a partition and cannot know which decision the other peers in a partition take with respect to associating themselves with a partition. The only available information is on the set of locally stored data keys and information gathered from local interactions with other peers.

The decentralized process we design is based on random peer encounters and a set of basic local interactions. The random encounters can be initiated by performing random walks on a pre-existing unstructured overlay network. The interactions peers can perform in their encounters can be classified in three categories, as shown in Figure 2.

[Figure 2: Network evolution. Peers from the same partition (or where one's path is the prefix of the other's) meet: Possibility 1: exchange content, split the key space, and update routing table. Possibility 2: become replicas and reconcile content; peers should also have a partial list of replicas (not shown here) for reconciling content later, using, e.g., an anti-entropy algorithm. Peers from different partitions meet: Possibility 3: peers can update their routing table entries (to add redundancy and randomization), apart from recommending the peers to meet some other peers (with better match of path); this induces the random interactions. Legend: pid; routing table (can have multiple entries for each level); index data (only part of the prefix is shown).]

If peers belong to the same partition they can either split the present partition (a divide-and-conquer strategy) or replicate the data keys they currently hold. If they do not belong to the same partition, they can refer each other to other peers using their routing table entries and thus route to a peer that belongs to the same partition.

If peers from the same partition meet, they may decide to split in case the current partition contains a sufficient number of data keys to justify a further split, i.e., the partition is overloaded (corresponding to line 1 in the global partitioning algorithm). They can coordinate their decision locally. In addition, peers keep a reference to the peer encountered after a split, and thus incrementally construct their routing tables.

3 Decentralized Partitioning

Consider a set $P$ of peers which hold data keys from key space $K$. The space is partitioned into two parts, $K_0$ and $K_1$, such that the shares of the load, measured in number of data keys related to the partitions $K_0$ and $K_1$, are $p$ and $1-p$. In the following we assume w.l.o.g. that $p \le 1/2$. Then the partitioning that we would ideally like to achieve should have the following properties:

   1. Proportional replication: Each peer has to decide for one of the two partitions such that (in expectation) a fraction $p$ of the peers decides for 0 and a fraction $1-p$ for 1. Thus the workload becomes uniformly distributed among the peers, meeting the load-balancing criteria in the resulting overlay.
   2. Referential integrity: During the process each peer has to encounter at least one peer that decided for the other partition. Thus the peers have the necessary information to construct a routing structure, i.e., the overlay infrastructure, for delegating requests for keys they are no longer associated with.

A peer can initiate interactions with any peer selected uniformly at random from $P$. We measure the cost of an algorithm solving the problem in terms of the number of interactions initiated by peers, and this cost should be minimized. The quality of an algorithm solving the problem is measured by the deviation of the resulting distribution of peers from an optimal distribution that can be achieved based on global knowledge and coordination. First we assume that the value of $p$ is known to all peers. We will analyze later the influence of having only approximate knowledge of $p$, obtained by sampling the locally stored data keys.

To clarify the critical issues we first discuss two simple heuristic approaches. In the case of $p = 1/2$, a simple strategy to adopt would be that peers which have not yet decided for a partition initiate a random interaction. If the contacted peer is also undecided, the peers decide for different partitions (balanced split); otherwise the peer initiating the interaction decides opposite to the contacted peer, which has decided already (unbalanced split). In this way it learns about a peer from the other partition. Since the algorithm is symmetric, in expectation the same number of peers will decide for each partition, and it provides the best possible performance within the model, since in each
interaction every possible decision is taken. We call this strategy eager partitioning. While the eager partitioning strategy works well for $p = 1/2$, it cannot be employed for other values of $p$.

For an arbitrary but known $p$, a possible strategy, which we call autonomous partitioning (AUT), would be that each peer makes a decision for one of the two partitions in advance, even without meeting any other peer, and then tries to meet some peer from the other partition in order to satisfy the referential integrity constraint. In this setting, obviously some of the peer interactions are “wasted,” whenever peers which have decided for the same partition meet. For the specific case of $p = 1/2$, by modeling the interactions as Markovian processes, we observed that […] interactions are initiated on average per peer asymptotically (i.e., for large numbers of peers), as compared to […] interactions per peer with eager partitioning. Thus autonomous partitioning is not an optimal strategy.

3.1 Adaptive eager partitioning

In the following we introduce a method for such an optimized solution to the partitioning problem, which has the characteristics of eager partitioning but works for all $p$. Due to space constraints we can only summarize the main points of the analysis. However, the full analysis can be found in the long version of this paper [4].

Adaptive eager partitioning (AEP) algorithm:

   1. Each undecided peer initiates interactions with a uniformly randomly selected peer until a decision is reached. Selecting peers uniformly at random is a non-trivial problem in itself, which we solve by a variant of random walks.

the partitioning proceeds as fast as possible, optimizing the required number of interactions. Then the model can be given as

[…]

To determine the proper value of […] for a given value of $p$, we have to solve this recursive system. The first important observation is that the recursion terminates as soon as no more undecided peers exist, i.e., as soon as […]. Thus we have first to find a value […] such that […]. In general this will not be an integer value, but in the context of mean value analysis we allow fractional steps. By standard solution methods we obtain

[…]

and evaluating the termination condition, we obtain

[…]    (1)

Note that […] does not depend on $p$, and thus the partitioning process requires the same number of interactions among peers independent of the load distribution. By defi-
                                                                                                                                                                         s™
   2. If the contacted peer is undecided the peers perform                        ˆ†¤
                                                                                  ‡ {                         §
                                                                                                              ‰{                         nition           , thus we obtain a relationship between the
                                                                                                                                                                              “
       a balanced split with probability                    and                                                                                            š —6
                                                                                                                                                         ™ ›˜C –                                     §                                                                                             ¥
                                                                                                                                         network size         and the load distribution with        ,
                                                                                                 ˜h
                                                                                                 $ –                                                                                                                                                                                   Š
       maintain references to each other.                                                                                                              eœ#                                                                                                               –                     $ –
                                                                                                                                                                                                                                                                                               %# h
   3. If the contacted peer has already decided for then
                                                                                                                   ¤                     the decision probability to be used.
                                                                                                                                                        ¥ Š
       the contacting peer decides for and maintains a ref-
                                                                       §                                                                     Having  %# Œ
                                                                                                                                                     $ –      dependent on is problematic for two                                         #
       erence to the contacted peer.                                                                                                     reasons: First the resulting equation is hard to solve, and
   4. If the contacted peer has already decided for then
                                                                                                                       §                 second, more importantly, is not necessarily known to                              #
       the contacting peer decides for with probability
                                                               ¤                                                                   {
                                                                                                                                   w¤    the peers. Since we are interested in situations where is                                                                                                     #
         Š         §
                   ‹{
                   and with probability            for 1. In the
                                                                   §                  Š                                                  (relatively) large we thus perform an asymptotic analysis.
                                                                                                                                         By letting             we obtain the following relationship
             $ –
             ˜h                                                              q           $ –
                                                                                          "Œ
       first case it maintains a reference to the contacted peer.                                                                                                     Š
                                                                                                                                                                     ž#                                  Ÿ
       In the second case it obtains a reference to a peer from                                                                          among and       –                                     $ –
                                                                                                                                                                                               ˜Œ
       the other partition from the contacted peer.
                                                                                                                                                                                                          §
    It is straightforward to see that condition (2) of the par-                                                                                                                                         §    §
                                                                                                                                                                                                                                                                                                           (2)
                                                                                                                                                                                                    
titioning problem is satisfied. The question is now to deter-                                                                                                                                    $ ’ |q  Š q
                                                                                                                                                                                                      p        C –
mine how to satisfy condition (1) by properly choosing the
                         ‡
probabilities         and
                                          Š
                               .
                                                                                                                                                                                                                        Š
                              ˜h
                              $ –             $ –
                                              "Œ                                                                                            Positive solutions for      cannot be obtained for all                         $ –
                                                                                                                                                                                                                            ˜h
    We model the peer interactions as a Markovian process                                                                                values of . From Equation 2 we derive that positive so-
                                                                                                                                                             –                                                    §
using mean value analysis. We assume that in each step                                                                                   lutions exist for                 . This means that the al-
                                                                                                                                                                                                  p˜p7¢q
                                                                                                                                                                                                  $ p € ~            
                                                                                                                                                                                                                      ¡–
Q  a peer which has not yet found its counterpart contacts                                                                               gorithm cannot partition correctly for too highly skewed        { g¤                                              §
another randomly selected peer. By            and    we denote                                                                           partitions. Therefore for                       we have to
                                                                            —                   ™ –                                                                                              k –
                                                                           4 –                   4                         ¤                                                                                                                                       $ p € ~
                                                                                                                                                                                                                                                                   ©†©9¢q
the number of peers that have decided in step for and
    §                                                                         ¤                       Q                                  pursue a different strategy, by reducing the probability of    § k         ‡
  , respectively. Initially,                . At the end of the                                                                          balanced splits, i.e.,        .
                                                 ™ – C — –               C                                                                                                                                    $ –
                                                                                                                                                                                                               ˜Œ
                                                 —— –   —                                                 §                                                                                                                                                                                    ¤
process in some step we have                          . We first                                                                              Through an analogous analysis, by setting
                                                                                                                                                                                                                                                                             Š
                     ‡               I§        4              e# C ™4 ze
                                                                       –                                                       ‡                                                                , we                            ‡                                                C "Œ
                                                                                                                                                                                                                                                                                   $ –
assume that              . Informally speaking, with this
                             C "Œ
                               $ –                                                                                                 ˜h
                                                                                                                                   $ –   can derive relationships for     :                                                         $ –
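The mean value model of the AEP steps can be sketched numerically. The following Python fragment is an illustration under stated assumptions, not the analysis of the long version [4]: it writes alpha for the balanced-split probability of step 2 and beta for the probability used in step 4, assumes that partition 0 is the smaller one, and iterates per-step expected increments that we reconstructed from the four steps. The function names and the closed-form limit for alpha = 1 are our own derivation from these assumed recurrences and are only meant as a plausibility check.

```python
import math


def aep_mean_value(n, alpha, beta):
    """Iterate assumed mean-value recurrences of AEP.

    Returns (fraction of peers in partition 0, number of steps).
    The recurrences are a reconstruction, not quoted from the paper.
    """
    if alpha <= 0.0:
        raise ValueError("alpha must be positive, otherwise no peer ever decides")
    p0 = p1 = 0.0  # expected number of peers decided for 0 and for 1
    steps = 0
    while p0 + p1 < n:
        undecided = 1.0 - (p0 + p1) / n  # contacted peer is undecided
        hit0 = p0 / n                    # contacted peer decided for 0
        hit1 = p1 / n                    # contacted peer decided for 1
        # Step 2: a balanced split adds one peer to each partition.
        # Step 3: contact with a 0-peer adds one peer to partition 1.
        # Step 4: contact with a 1-peer adds to 0 w.p. beta, else to 1.
        d0 = alpha * undecided + beta * hit1
        d1 = alpha * undecided + hit0 + (1.0 - beta) * hit1
        p0 += d0
        p1 += d1
        steps += 1
    return p0 / (p0 + p1), steps


def termination_step(n):
    """Step count ln 2 / (ln n - ln(n - 1)) for alpha = 1."""
    return math.log(2.0) / (math.log(n) - math.log(n - 1))


def closed_form_fraction(beta):
    """Large-n limit of the fraction in partition 0 for alpha = 1.

    Derived by us from the assumed recurrences (continuous approximation).
    """
    if beta == 0.0:
        return 1.0 - math.log(2.0)
    return 1.0 - (1.0 - 2.0 ** (-beta)) / beta


if __name__ == "__main__":
    for beta in (0.0, 0.25, 0.5, 1.0):
        frac, steps = aep_mean_value(10000, 1.0, beta)
        print(beta, round(frac, 4), steps, round(closed_form_fraction(beta), 4))
```

Under these assumptions the number of steps is the same for every beta, and the reachable fractions for alpha = 1 range from 1/2 (beta = 1) down to 1 - ln 2 (beta = 0), which agrees with the threshold below which the probability of balanced splits has to be reduced.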
[Equation (3): illegible in this copy]

and for the relation between $p$ and $\alpha(p)$ when $n \to \infty$

[Equation (4): illegible in this copy]

Before we continue with the discussion of different partitioning algorithms, a statement on the modeling approach is necessary: We use a sequential approach to model and analyze what is a concurrent process. This is a simplification as well as an appropriate approximation for our purpose. Assume that the latency in one interaction is such that [...] other interactions among peers occur concurrently. Then the concurrent behavior of [...] peers corresponds (approximately) to the sequential behavior of [...] groups. The analysis we perform shows that the models we use are sufficiently accurate for relatively small $n$. Thus for large numbers of peers the model is a sufficiently good approximation, whereas for small [...] concurrency is less likely to occur and less critical.

3.2 Error Analysis

Up to now we assumed that the value of $p$ is known to all peers. Practically peers will derive an estimate for $p$ by sampling. Therefore, in the following we analyze the effect of errors introduced by only approximate knowledge of $p$.

Since the sampling errors are presumably small we use a Taylor series expansion to approximate [...]. In fact, for reasons that will become clear later, we need to make a second order approximation to perform a proper error analysis. For a given value [...], we have

[Equation (6): illegible in this copy]

for small [...]. We now determine the expectation value and standard deviation for [...] (to simplify the presentation we will write [...] instead of [...] in the following). Since [...] and [...] we obtain for the expectation value using (5)

[Equation (7): illegible in this copy]

where [...]. This shows that sampling introduces a systematic shift of the balance between the resulting partitions. In a concrete implementation we will have to compensate for this systematic error, as will be discussed in more detail subsequently.

Since [...] we obtain for the standard deviation by a similar analysis

[Equation (8): illegible in this copy]

where [...].
                                                                                                                                                                                                                                     &I # gv   
Other potential sources of errors, such as taking the limit                                                                                                                                The impact of the errors depends in particular also on
                                                                                                                                                                                        the behavior of the functions
                                                                                                                                                                                                                              ¿ Š
                                                                                                                                                                                                                             and
                                                                                                                                                                                                               ¿¿ Š
case           and using mean value analysis turned out to
                    ª¢#
                    Ÿ                                                                                                                                                                                    ˜h
                                                                                                                                                                                                          $ –          $ –
                                                                                                                                                                                                                        "Π           . Using nu-
have a negligible influence.                                                                                                                                                             merical differentiation we observed that the functions are
    Assume peers obtain samples from their locally stored
                                                     «                                                                                                                                  well-behaved in the relevant region.                                                         § k
data keys. The samples correspond to Bernoulli variables                                                                                                                                   Performing an analogous analysis for                 the  ¿ ‡                                   – ¿¿ ‡                 $ p € ~
                                                                                                                                                                                                                                                                                                                  prp7Èq
¬
        ™
             V¬ ¨¨¥
             ­ ¥   
              with probability . The peers estimate by                       – ¬                          ­                                                               –             behavior of the functions       and        will be relevant        $ –
                                                                                                                                                                                                                                                           ˜h                           $ –
                                                                                                                                                                                                                                                                                         "Œ                    ¿¿ ‡
computing the mean value                         which is bi-                             C        ¯ ¬ ™ t¯ f™­
                                                                                                         ° ®                                                                            for the error behavior. We have included a plot of                                                                                    $ –
                                                                                                                                                                                                                                                                                                                              "Œ
nomially distributed. We would like to determine the effect                                                                                                                             in order to point out an important observation (Figure 3):
of an error in estimating on the values of           and
                                                                                                    ‡                                                                         Š         For very small values of the second derivative grows ex- –
                                                                                                                                                                                        tremely fast, and consequently the error will be large as
                                                         –                                                                                               $ –
                                                                                                                                                         "Œ                      $ –
                                                                                                                                                                                  ˜h
and the resulting effect on the partitioning process when
using approximate values of and . In the following we
                                                                                     ‡              Š                                                                                   well.
will use and instead of
                            ‡
                                    and
                                       Š
                                               as long as the
                                                                                 ‡                                  Š                                                                      The error analysis shows that in the presence of sam-
                                                                                                                                                                                        pling errors, we have to include correction terms in the
                                                                                         $ –
                                                                                         "Œ                                  $ –
                                                                                                                              "Œ
meaning is clear.                                                                                                                                                                                                 ‡
                                                                                                                                                                                        probabilities      and
                                                                                                                                                                                                                                         Š
    We provide an exemplary error analysis for the evolu-                                                                                                                                                           used in AEP.
                                                                                                                                                                                                                       $ –
                                                                                                                                                                                                                       ˜h                       $ –
                                                                                                                                                                                                                                                 "Œ
                                                                                                        §
tion of     for the case where                      since this
                          ™ –                                                              
                                                                                          ‹–                                 $ p € ~
                                                                                                                             p"p7Dq                                                                                                                                  §                         §
                           4
is algebraically the simplest case. Analogous analysis have                                                                                                                                           ‡
                                                                                                                                                                                                              "Œ ‚&A ©
                                                                                                                                                                                                              $ – ÉÉ             C
                                                                                                                                                                                                                                             ‡
                                                                                                                                                                                                                                                      o– ˜h  $ –
                                                                                                                                                                                                                                                                            ¿¿ ‡
                                                                                                                                                                                                                                                                              q $ –
                                                                                                                                                                                                                                                                               g"Œ
                                                                                                                                                                                                                                                                                                          §
                                                                                                                                                                                                                                                                                                               "„q
                                                                                                                                                                                                                                                                                                               $ –            (9)
been done for the other case, but they are substantially more                                                                                                                                                                                             «§             §p
complex.                                                                                                                                                                                                            Š                              §                ¿¿ Š             Š
                                                                                                                                                                                                          "Œ ‚&A ©                                                                                                       (10)
    We assume that in step the estimation value
                                                                                                                                                                                                          $ – ÉÉ                 C           $"–„q o– ˜h
                                                                                                                                                                                                                                                               $ –            q $ –
                                                                                                                                                                                                                                                                                €˜Œ
                                                             Q                                                                    dœo– C %– Š
                                                                                                                                  4 ² e      ±4                                                                                                              «              p
is used to determine an estimation value
                                                                                                                                                   Š
                                                         . The                                                                          4 ¢e
                                                                                                                                          ´     C
                                                                                                                                                ³4                                                               alpha’’ p
error is the sampling error obtained by the peer initiating
                    4
                    d²
                                                                                                                                                                                                                      60
step . Let us denote by                  the error introduced
                Q                                    ™ ¢e ™ – C ™ µ –
                                                      4 ¶  4     4                                                                                                                                                    50
into the result of the partitioning process due to sampling                                                                                                                                                           40
errors. We can derive the following closed-form expression                                                                                                                                                            30
for from analyzing the Markov model of the process.
             ™ ¶
              4                                                                                                                                                                                                       20
                                                                                                                                                                                                                      10
                                                                 ·                                                                            ¯                                                                                                                                                               p
                                                                             §                 §                ’                                            ¯ G#                                                                0.05 0.1 0.15 0.2 0.25 0.3
                                4 º 4 Š     ·                                                ½ l6 q ¼q»                                                        ´
                                          §
                                                                                                                                                                                  (5)
                                                                                                                                                    ¸
            C ™4 ¶          q           q                                                         Š Š                                                                                                         Figure 3: Numerical Solution for
                                                                                                                                                                                                                                                                                                        ¿¿ ‡
                                                                                                                                                                                                                                                                                                               "Œ
                                                                                                                                                                                                                                                                                                               $ –   .
                              ™ t¯ ¹#
                                 °   ¸                                                   $%#¤q 
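The sampling step analyzed in Section 3.2 can be illustrated with a minimal Python sketch. It draws Bernoulli samples with success probability p, uses their mean as the estimate of p, and compares the empirical spread of that estimate with the standard deviation sqrt(p(1-p)/s), which follows from the binomial distribution. The function names and the parameter name `s` for the sample count are assumptions chosen for illustration. The sketch only models the estimation of p, not the partitioning process or the correction terms.

```python
import random
from math import sqrt


def estimate_p(p_true: float, s: int, rng: random.Random) -> float:
    """Return the mean of s Bernoulli(p_true) draws (a peer's local estimate of p)."""
    hits = sum(1 for _ in range(s) if rng.random() < p_true)
    return hits / s


def standard_error(p_true: float, s: int) -> float:
    """Standard deviation of the estimate: hits ~ Binomial(s, p), estimate = hits / s."""
    return sqrt(p_true * (1.0 - p_true) / s)


def empirical_spread(p_true: float, s: int, trials: int, rng: random.Random):
    """Repeat the estimation and return (mean, standard deviation) of the estimates."""
    estimates = [estimate_p(p_true, s, rng) for _ in range(trials)]
    mean = sum(estimates) / trials
    variance = sum((e - mean) ** 2 for e in estimates) / trials
    return mean, sqrt(variance)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that runs are repeatable
    for s in (1, 2, 10, 100):  # hypothetical sample sizes
        mean, sd = empirical_spread(0.3, s, 20000, rng)
        print(f"s={s:3d}  mean estimate={mean:.3f}  "
              f"empirical sd={sd:.3f}  predicted sd={standard_error(0.3, s):.3f}")
```

The estimate itself is unbiased for any sample size. A systematic deviation can still arise once the estimate is passed through a nonlinear function of p, which is the situation the error analysis above addresses.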
3.3 Numerical Simulation of the Markov Model

To validate the correctness of our analytical models we performed numerical simulation experiments. We simulated five models:

  1. MVA: simulation of the mean value model for AEP with known $p$
  2. SAM: simulation of the mean value model for AEP; the value of $p$ is estimated from samples
  3. AEP: discrete simulation of AEP with peers taking discrete decisions based on $\alpha(p)$ and $\beta(p)$ instead of adding mean value contributions as in the mean value model
  4. COR: discrete simulation of AEP with the corrected probabilities in place of $\alpha(p)$ and $\beta(p)$
  5. AUT (Discrete Autonomous Partitioning): discrete simulation of autonomous decision making where $p$ is estimated from samples

We present the results for […] and […]. Each experiment has been repeated 100 times.

Figure 4 shows the deviation of the mean value of […] from the expected value […], averaged over all experiments. As expected, using sampling for estimating $p$ leads to a systematic deviation of the resulting distribution (SAM, AEP). The error correction strategy (COR) eliminates the deviation almost completely. Clearly, autonomous partitioning (AUT) on average achieves the desired distribution.

[Figure 4 plot: "Deviation from Mean"; x-axis $p$, ticks 0.1 to 0.5; curves MVA, SAM, AEP, COR, AUT]

Figure 4: Mean of […] over 100 experiments; the expected value […] is subtracted to highlight the deviation.

Figure 5 shows the cost of each algorithm measured in number of interactions. As theoretically predicted, we observe that adaptive eager partitioning performs better than AUT, except for small values of $p$ (approx. […]), independent of which version is considered (MVA, SAM, COR).

Further experiments with different sample sizes showed that the sample size has practically no influence. Even very small samples (1 or 2 samples) lead to the same results as larger sample sizes. Experiments also showed that adaptive eager partitioning has a further advantage over autonomous partitioning, as it reduces the standard deviation of the error in partitioning by approximately a factor of 2. Thus our AEP approach optimizes both performance in terms of number of required interactions and error control in terms of matching the partitioning ratio $p$.

[Figure 5 plot: "Number of Interactions"; y-axis Interactions, ticks 1000 to 3000; x-axis $p$, ticks 0.1 to 0.5; curves MVA, SAM, AEP, COR, AUT]

Figure 5: Mean total number of interactions over 100 experiments

Superficially, AEP appears to be a more complex algorithm than AUT while not considerably outperforming AUT. However, the complexity is in the analysis required to determine the correct decision probabilities, whereas for practical implementation AEP even has advantages, since it provides an invariant: when taking a decision for a partition, the availability of a reference is guaranteed.

We would like to point out that the problem studied in this section is a novel load distribution problem in the area of distributed systems, particularly because of the referential integrity constraint. A solution to this problem can be useful not only for overlay network construction as we use it here, but also in resource and task distribution and decentralized load-balancing in general.

4 Algorithmic Issues

In order to use AEP for implementing the […] algorithm in a decentralized fashion we have to address several issues related to the global organization of the indexing process.

4.1 Initiating the Indexing Process

In the absence of global coordination the mechanism to reach a decision to initiate the indexing process is not obvious. While it is not the focus of this paper, and the initiation process is orthogonal to the index evolution process, we nonetheless describe a simple, decentralized strategy.

Depending on locally observed queries, individual peers may make autonomous decisions on whether a new index may be necessary or re-indexing may be required. Any of the peers that locally decide that indexing is useful can initiate a vote by flooding the peer network. This flooding can use the pre-existing, generic, unstructured overlay network which we assume to exist.

When peers receive a voting request they can reply with their local decision. Additionally, helpful information, such as the locally available storage space that the peer is willing to contribute to store information for the new index and the number of local data items to be indexed, can be piggy-backed. Votes are sent back along the paths on which they arrived, and multiple votes are aggregated while flowing back to reduce bandwidth consumption. Based on the number of
positive and negative responses, the peer which initiated the                              tacted by another peer. In this way peers that are “ahead
voting can then decide whether to initiate index construc-                                 of the crowd”, e.g., due to faster network connections, are
tion or not, and can flood the decision back to all peers. Ad-                              forced to wait for the slower ones. The same mechanism
ditionally, based on the aggregate storage space available,                                also eventually leads to termination of the process, when
and the amount of storage required for all the data items                                  peers encounter only fully synchronized copies of them-
(references) in the system, the decision will contain the pa-                              selves.
rameters for ensuring optimized utilization of the available
resources and for synchronization of the indexing process.                                 4.3 Complexity
We assume a collaborative environment where the major-                                     The goal of our approach to index construction is to per-
ity of peers does not behave maliciously or in a Byzan-                                    form it with low bandwidth consumption and low latency.
tine manner, and adheres to the democratic decision of the                                 With regard to bandwidth consumption a necessary require-
group, and thus participates in the indexing irrespective of                               ment is to perform no worse than a sequential approach
their individual votes.                                                                    using standard construction mechanisms, i.e., $O(n \log^2 n)$.
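The vote collection described in Section 4.1 can be pictured with a small sketch. It is an illustration under assumptions, not the P-Grid implementation: the unstructured overlay is a plain adjacency dictionary, the voting request spreads breadth-first, the decision rule is a simple majority, and the names `Vote` and `collect_votes` are hypothetical. Each peer sends one aggregated reply to the neighbour from which it first received the request, so the initiator ends up with the totals it needs.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    yes: int = 0        # peers in favour of building the index
    no: int = 0         # peers against it
    storage: int = 0    # storage offered for the new index (piggy-backed)
    items: int = 0      # local data items to be indexed (piggy-backed)

    def __add__(self, other):
        return Vote(self.yes + other.yes, self.no + other.no,
                    self.storage + other.storage, self.items + other.items)


def collect_votes(neighbours, local_votes, initiator):
    """Flood a voting request, then aggregate votes along the reverse paths."""
    # Flooding: remember from which neighbour each peer first got the request.
    parent = {initiator: None}
    order = []
    queue = deque([initiator])
    while queue:
        peer = queue.popleft()
        order.append(peer)
        for nb in neighbours[peer]:
            if nb not in parent:
                parent[nb] = peer
                queue.append(nb)

    # Aggregation: peers reached last reply first, so every peer has merged
    # the replies of its children before it sends its own single reply.
    pending = {peer: local_votes[peer] for peer in order}
    replies = 0
    for peer in reversed(order):
        if parent[peer] is not None:
            pending[parent[peer]] = pending[parent[peer]] + pending[peer]
            replies += 1
    return pending[initiator], replies


# Hypothetical four-peer overlay and local decisions.
neighbours = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
local_votes = {
    "a": Vote(yes=1, storage=100, items=10),
    "b": Vote(no=1, storage=50, items=10),
    "c": Vote(yes=1, storage=80, items=10),
    "d": Vote(yes=1, storage=20, items=10),
}
total, replies = collect_votes(neighbours, local_votes, "a")
start_indexing = total.yes > total.no   # assumed rule: simple majority
```

In this example the initiator receives three positive votes and one negative vote, together with the aggregate storage offer and item count, using one reply message per non-initiating peer.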
                                                                                           To study this, we look at the complexity in the case of a
4.2 Synchronizing and Terminating the Indexing Process                                     balanced key distribution ($p = 1/2$). Then for partitioning at
                                                                                           one level, peers engage in           bilateral interactions on
                                                                                                                                             
                                                                                                                                       $©pÃ!
The partitioning algorithm introduced in Section 2 enables                                 average. In addition to locating a peer in the same partition
reaching a decision in parallel on bisecting the key space                                 at level , peers have to route on expectation
                                                                                                       Ø                                            steps        p n$ Ø 
                                                                                                                                                                 ›‚¦Ã!
proportionally among a group of autonomous peers. In the                                   when performing the refer interaction. This shows that the                   }
indexing process the algorithm is executed multiple times                                  total number of interactions is also of order $O(n \log^2 n)$.
and a synchronization mechanism is needed. In addition                                     However, the latency is               as opposed to
                                                                                                                                     $
                                                                                                                                     %# !
                                                                                                                                                    in              $ #
                                                                                                                                                                       %Ž
peers need to autonomously recognize when to terminate                                     the standard maintenance model.
the indexing process. We realize this as follows.
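To make the counting argument of Section 4.3 concrete, the following sketch sums per-level costs over all peers. The constants are assumptions chosen only for illustration (one bilateral interaction per level and a routing cost of half the level index), and the function names are hypothetical. The point is merely that a per-level cost growing linearly with the level gives a total of order n log^2 n.

```python
def total_interactions(n, per_level=1.0, route_factor=0.5):
    """Total interactions for n peers, with n assumed to be a power of two.

    per_level and route_factor are illustrative assumptions, not values
    taken from the paper.
    """
    levels = n.bit_length() - 1          # log2(n) partitioning levels
    per_peer = sum(per_level + route_factor * level
                   for level in range(1, levels + 1))
    return n * per_peer


def ratio_to_n_log_squared(n, **kwargs):
    """Total cost divided by n * (log2 n)^2; bounded if the order is right."""
    levels = n.bit_length() - 1
    return total_interactions(n, **kwargs) / (n * levels ** 2)


for n in (256, 512, 1024):
    print(n, total_interactions(n), round(ratio_to_n_log_squared(n), 3))
```

For the three population sizes used in the simulations the ratio stays between 0.37 and 0.41 under these assumed constants, which is what a total of order n log^2 n predicts.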
    The peer communicating the decision to start the in-                                   4.4 Simulation of the System
dexing process provides the parameters                and        31( '
                                                                 20                75( #
                                                                                   64      To study the global behavior of the indexing algorithms
as used in      PRRBRPGE. The values are chosen such that
               # QIQI H F                                                                 when integrating all the elements discussed so far, we per-
C 20
iX8( '        p 64       , where
                           ÍÌ
              pn 75( # d30 '            is the average number
                                                  d30 '
                                                  ÍÌ                                       formed simulation studies implemented in Mathematica.
of data keys peers hold (as mentioned in Section 4.1 this                                  We were mainly interested whether the desired load balanc-
information can be derived from information piggy-backed                                   ing properties would be achieved under the various approx-
to the votes). Additionally, it provides a time        . Before          @ ›74 I
                                                                           4 6             imations and whether the algorithm performs as predicted.
starting to partition, peers replicate their data keys at time                                 In the simulations we used peer populations of sizes
  4 64
@ ›7uIto   64 (
           98l#randomly chosen other peers. Thus at the start                              256, 512, and 1024. As data distributions we used a uni-
of the indexing process all data keys are already replicated
                                                                                                                                                                              Չ
                                                                                           form distribution, a Pareto distribution with PDF          with                      ÚÙ
                                                                                                                                                                             Õ v 0 2
the desired number of times in the network.                                                parameters          and
                                                                                                                        §           §¥  § 
                                                                                                                                   ©¦¤ p¨¥ ¦¤
                                                                                                                                      , and a Normal dis-
    Besides estimating the number of data keys in the cur-
                                                                                                             C Ø                            Ê     C F   Ê             ‚!§ ¤ ¦¤
                                                                                           tribution with mean value and standard deviation               ,
                                                                                                                                               ™}
rent partition, peers also have to estimate the number of
                                                                                                                                                                          Ê
                                                                                           and test data from text retrieval experiments (project Alvis).
current peers, in order to perform the proper decisions in                                 In Figure 6 these distributions are denoted as U, P0.5, P1,
algorithm              . Attempting this directly, by learning
               PBRBRPGE
               # QIQI H F                                                                 P1.5, N and A. The Pareto and Normal distributions repre-
about all existing replicas at each level of the partitioning                              sent cases with extremely skewed distributions. Initially,
process, would unnecessarily slow down the progress of                                     we randomly assigned 10 keys from the distributions to
indexing. Instead, we estimate the number of replicas in a                                 peers, so that they held samples. We tested with
partition by analyzing the overlap in the sets of data keys of                                                  ¤
                                                                                                                ¨§                                                    Ê ÛC 75`#
                                                                                                                                                                           64 (
                                                                                           and               such that at least 5 (respectively 10) repli-
two peers interacting in a balanced split. If       denotes the
                                                                                                 C 75%#
                                                                                                   64 (
                                       ¥         p§
                                                 ¥                    4 Î                  cas of the keys are generated. Typically the experiments
set of data keys peers                hold, and
                                    C Q ˜–
                                         4            p          ,Î         } Î g™ Î C
                                                                                Ï          had    C 31©'
                                                                                                    20 (
                                                                                                                   ¨§
                                                                                                                   ¤
                                                                                                                   . All experiments were repeated 10
                                                                                                                        64 (
                                                                                                                        78l#
then                is a maximum likelihood estimate of the
                   Ó
                   ›6
                hj &%Ó gÑ rÐ v ÐÑ
                Ô Ö Õ Ò Ð
                      Ð                                                                    times and the results were averaged. The algorithms were
expected number of peers in the current partition. For ex-                                 implemented as described above. The experiments were
                        h Ë ÐÑ
                          Ð
ample, if   } Î C ™ Î   and                   then it should be
                                             C × ™ ×Î     31©'
                                                          20 (                             executed on a workstation cluster using up to 36 machines
expected to have $n_{min}$ peers in the partition since initially                          and ran for more than a week. Note that there
data keys have been replicated $n_{min}$ times. To ensure the                               were 36 separate experiments, each conducted 10 times.
correctness of this estimation was the purpose of initially                                Furthermore, in a real network the peers would use exclu-
replicating the data.                                                                      sive resources, and thus the actual overlay construction pro-
    During partitioning, peers that have extended their paths                              cess is much faster.
attempt to immediately contact other peers to perform the                                      To evaluate the experiments, we primarily determined the
partitioning at the next level. If they do not succeed in                                  degree to which the load balancing of peers across key space
identifying a different peer in the same partition with which                              partitions worked. To do so, we compared the generated key
a useful interaction can take place, i.e., “divide and conquer”                            sets to the distribution that would be
or “replicate”, after a fixed number of attempts (e.g., 2),                                 generated by global coordination (                algorithm).
                                                                                                                                                   # QIQI H F
                                                                                                                                                   PRRrRPGE
using the refer interaction (see Figure 2), they stop to ini-                                  The
                                                                                           ¨PRR©§RPGE ¥
                                                                                           ¥  #  Q ¥ rQ I H F
                                                                                                  I             algorithm generates a distribution
                                                                                                                       ¥
tiate interactions and only will continue after being con-                                                          , where
                                                                                                                C Q Bl# ©
                                                                                                                    $4 4 Ø     x are the     partitions of
                                                                                                                                              4
                                                                                                                                              ©Ø             x
the key space generated and $n_i$ are the number of peers                                                                  grow gracefully in terms of the network size, as expected
associated with each partition. We compared this distribution
                                          ¥                                                                                from theory. However, skew in the data distribution can
to the distribution          generated by the decentralized
                                    $ h4 # h 4 Ø                                                                         significantly increase the bandwidth consumption.
algorithm.                                     Ü
                                            •®
                                            Ý                                            }                                 5 Experimental evaluation
                                                                $ h4 ¤q 4 Ž
                                                                     #         #
                                                                                                                           We used the PlanetLab infrastructure [11] to obtain re-
                                                      4
                                                                                                                           sults from large-scale experiments under realistic network-
                                                  ™                           Ý ®
                                              Ý                        h4 # 4
    As explained in Section 2, we consider the distribution                                                                ing conditions and to verify our theoretical predictions and
generated by                as the optimal distribution. Mea-
                             PBRBRPGE
                             # QIQI H F                                                                                   simulation experiments. PlanetLab (http://www.planet-lab.
suring the distance to this distribution provides a measure                                                                org/) is a global testbed for large-scale experiments with
for the quality of load balancing.                                                                                         distributed systems. At the moment it consists of ap-
    The first experiment (Fig 6(a)) for                      and                              Þ98( #
                                                                                             C 64                Ê         proximately 530 nodes geographically distributed over the
    wX8( '
    C 20
                       ¤
                       ¨§
             shows the quality of load balancing depending                                                                 whole planet running a modified version of Linux to sup-
on the peer population size for the different distributions.                                                               port efficient administration and resource sharing for large-
One can observe that the quality remains practically stable                                                                scale experiments. Nodes are connected via a diverse col-
independent of the size.                                                                                                   lection of links. Our experiments on PlanetLab ran on up
    We also investigated the influence of the replication fac-                                                              to 300 nodes depending on the number of available nodes.
tor       by comparing
          98( #
          64
                                            ¥ ¤ ¥ ¨¦!¥
                                                 p Ê
                                                     § ¥ ¤ §
                                                    (Fig 6(b)).
                                                             Ê C 64
                                                             ¡Ä98( #                                  Ê p                  Each node executed one instance of a P-Grid node. When
In principle, the load balancing properties should not be                                                                  interpreting the results presented in the following, it is
affected, as we measure deviations relative to the average                                                                 important to consider that PlanetLab is shared by a large
replication. This is confirmed for less skewed distributions,                                                              number of research groups for experiments that are executed
whereas for the strongly skewed distributions a certain                                                                    in parallel and thus influence each other's performance
degradation can be observed. We have yet to investigate in                                                                 considerably, especially with respect to absolute latency.
detail the reasons for this effect, but most likely it is
related to the relatively low number of partitions with high                                                               5.1 Experimental setup
replication factors.
                                                                                                                           We deployed the P-Grid software, i.e., the peers, on all
    We were also interested in the influence of the sample
                                                                                                                           available nodes at the times the experiments were con-
size        on the quality of load balancing. It might be ex-
                                                                                                                           ducted and assigned 10 keys from a real text collection
           20
           31( '
pected that more samples lead to higher accuracy. In fact,
                                                                                                                           (taken from our Alvis information retrieval project) to each
the result (Fig 6(c)) shows that no such influence exists.
                                                                                                                           peer. This relatively low number of keys was chosen to
This is important insofar as it shows that the partitioning                                                                speed up experiments and, as we have already seen, sample
can be done using very small samples, which enables several                                                                size has little influence on load balancing. To validate our
possibilities for optimization to reduce bandwidth                                                                         experiments, we also performed tests with larger numbers
consumption.
                                                                                                                           (up to 2000 keys per peer) and used various distributions,
    In order to understand the quality of the load distribu-
                                                                                                                           including uniform random distribution and Pareto distribu-
tions achieved we also analyzed the role of our theoretical                                                 ‡              tion.
framework (Fig 6(d)). We replaced the functions                                                                 "Œ›‚tA ©
                                                                                                                               The time-line of the experiments was as follows: In an
                     Š                                                                                          $ – É É
and            by heuristic functions which likely would be
           ˜Œ ttA ©
                                                                                                                           initial phase starting at time , peers join the system by
           $ – ÉÉ
chosen in the absence of a theoretical understanding of
                                                                                                                                                                                  I                            â ¤
                                                                                                                                                                                                               pp‚
                                                                                                                           contacting a bootstrap peer (until                   ) and form
their properties. The hypothesis we wanted to verify was
                                                                                                                                                                                                 e
                                                                                                                                                                                                 ¹I                  #
                                                                                                                                                                                                                     BQ           â
                                                                                                                           an unstructured overlay network (from until                    )
whether the concrete nature of these functions plays a sig-
                                                                                                                                                                                                           I              Ê –oI
                                                                                                                                                                                                                            ã e       #
                                                                                                                                                                                                                                      BQ
                                                                                                                           which is used later to replicate data a fixed number of times
nificant role in view of the many approximations made in                                                                                          â                          â ¤
                                                                                                                                                                            ›p„
                                                                                                                           (from              until             ). In the replication phase
the overall distributed algorithm. We chose
                                                                                                                                    Ê dI
                                                                                                                                      ã e            #
                                                                                                                                                     rQ           e
                                                                                                                                                                  )I                  #
                                                                                                                                                                                      BQ
                                                                                                                           peers randomly choose 5 peers from the unstructured over-
                                                                                                                           lay network to replicate their data. Subsequently, from
                                                                    §
                            áà ß
                            RP`‡                                               áà
                                                                               RPß Š ¥            ¤                              â ¤
                                                                                                                                 pp„                      â ¤ ¤
                                                                                                                                                          pp©‚
                                   C ˜Œ©É
                                     $ –
                                                              q ™
                                                                           §             C ˜ŒÉ
    These functions exhibit qualitatively the same behavior as the ones used by AEP. The experiment was executed for [...] and [...]. The conclusion is clear from the result: even a minor change to the theoretically correct functions degrades the quality of load balancing substantially. Thus the theoretical basis proves valuable despite many idealizing assumptions.
    We also analyzed the communication costs of the algorithm. We can see that both the number of interactions per peer (Fig. 6(e)) and the overall bandwidth consumption per peer, measured in terms of the total number of data keys exchanged among all peers during the interactions (Fig. 6(f))

[...] to [...], the structured overlay network is constructed using the approach presented in this paper. We were especially interested in evaluating the bandwidth consumption during this phase and in verifying whether the theoretically predicted load-balancing properties of the algorithm are achieved under realistic networking conditions. Then we run queries on the constructed overlay network ([...] to [...]) to analyze search performance. Each peer performed a search every 1–2 minutes. In the final phase ([...] to [...]) network churn is simulated to evaluate the failure resilience of P-Grid. Each peer independently decides to go offline for 1–5 minutes every 5–10 minutes, which causes considerable churn that the system has to compensate for.
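To get a feeling for how much churn the schedule described above produces, the following minimal Python sketch estimates the fraction of time a single peer spends offline. It rests on an assumption that the text does not spell out: a peer stays online for a uniformly random 5 to 10 minutes, then goes offline for a uniformly random 1 to 5 minutes, and repeats. The function name `offline_fraction` and its parameters are hypothetical and are not part of P-Grid.

```python
import random


def offline_fraction(total_minutes=100_000.0, seed=0):
    """Estimate the share of time one peer is offline under the assumed schedule."""
    rng = random.Random(seed)
    t = 0.0        # simulated clock in minutes
    offline = 0.0  # accumulated offline time in minutes
    while t < total_minutes:
        # Assumed: online interval drawn uniformly from 5 to 10 minutes.
        t += rng.uniform(5.0, 10.0)
        if t >= total_minutes:
            break
        # Assumed: offline interval drawn uniformly from 1 to 5 minutes,
        # truncated at the end of the simulated period.
        down = min(rng.uniform(1.0, 5.0), total_minutes - t)
        offline += down
        t += down
    return offline / total_minutes


# Closed form under the same assumption: mean offline / mean cycle length.
EXPECTED_FRACTION = 3.0 / (7.5 + 3.0)

if __name__ == "__main__":
    print(f"simulated: {offline_fraction():.3f}, expected: {EXPECTED_FRACTION:.3f}")
```

Under this reading, each peer is unavailable for roughly 29% of the churn phase. If the 5 to 10 minutes were instead meant as the full cycle length including the offline period, the fraction would be higher.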
[Figure 6 plots. Categories on the horizontal axis of each panel: U, P0.5, P1.0, P1.5, N, A.]
 (a) Deviation for various peer populations: n = 256, 512, 1024; [...]
 (b) Deviation for various desired replications n_min: n = 256; n_min = 5, 10, 15, 20, 25; [...]
 (c) Deviation for various data sample sizes: n = 256; sample size 10, 20, 30; [...]
 (d) Theory vs. heuristics: deviation for n_min = 5, 10 using the theoretical model vs. heuristics
 (e) Interactions required per peer for overlay construction for various population sizes: n = 256, 512, 1024; [...]
 (f) Bandwidth consumed (total number of data keys moved per peer) for overlay construction: n = 256, 512, 1024; [...]
Figure 6: Simulation results for various experiment scenarios.


5.2 Experimental Evaluation

We first verified that the system behavior matches the theoretical predictions and the simulations. The experiment was performed with 296 peers and compared to simulation results using the same number of peers and the same key set.
    The quality of load balancing is evaluated as defined in Section 4.4 and is practically identical for simulations and experiments, with an average of 0.38 over 10 simulations (standard deviation 0.05) and a value of 0.39 for the experiment. This indicates that the theoretically predicted load distribution properties are met quite accurately by the implementation even under realistic network conditions with slow connections and communication failures.
    We now report some system measurements that we made to evaluate the performance of the overlay network, both during the construction phase and during its operational lifetime, in a static situation (no change in peer population) as well as under churn (peers leave and join the network).
    Figure 7 shows the number of peers in the overlay at a given time. Peers first join the network, and the number of peers in the network increases to its maximum. During the construction phase this number is stable (approx. 300 peers); it decreases again in the final phase, where we simulate network churn and a substantial dynamic fraction of peers becomes unavailable.

[Figure 7 plot: number of peers (0 to 300) versus time in minutes (0 to 500).]
Figure 7: Number of participating peers

    Figure 8 shows the aggregate bandwidth consumption of all peers (maintenance and queries) in Bytes/sec. During the construction phase the bandwidth consumption reaches a peak of 250 Bytes/sec per peer. The maintenance consumption decreases quickly to less than 100 Bytes/sec and becomes negligible compared to the bandwidth consumed by queries.

[Figure 8 plot: bandwidth in Bps (0 to 300) versus time in minutes (0 to 500); series: maintenance, queries.]
Figure 8: Aggregate bandwidth consumption
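As a quick arithmetic check of the agreement between simulation and experiment reported at the start of this section, the short Python sketch below expresses the gap between the experimental load-balancing value (0.39) and the simulation average (0.38) in units of the simulation standard deviation (0.05). The helper name `deviation_in_sigmas` is hypothetical. Treating the ten simulation runs as the reference distribution is an assumption made here for illustration only.

```python
def deviation_in_sigmas(sim_mean, sim_std, observed):
    """Distance of an observed value from the simulation mean, in standard deviations."""
    return abs(observed - sim_mean) / sim_std


# Values reported in the text: simulations average 0.38 (std 0.05), experiment 0.39.
gap = deviation_in_sigmas(sim_mean=0.38, sim_std=0.05, observed=0.39)

if __name__ == "__main__":
    print(f"experiment lies {gap:.2f} standard deviations from the simulation mean")
```

The experimental value sits about 0.2 standard deviations from the simulation mean, well inside the run-to-run spread of the simulations themselves.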
   Figure 9 shows the average query latency and its stan-                                                 (range queries, etc.). Thus in standard overlay approaches,
dard deviation. The absolute values are relatively high                                                   typically an additional index on top of the overlay network
and essentially reflect the poor response time of PlanetLab                                                needs to be created [22]. The advantage of this approach
nodes. The response time is slightly higher with a larger de-                                             is its universal usability on top of any DHT. However, it is
viation during the network churn because requested peers                                                  considerably less efficient than our approach since seman-
may be offline which has to be compensated.                                                                tically close data items are not necessarily stored close to
                      60
                                                                               standard deviation
                                                                                                          each other in the overlay network (high fragmentation), and
                                                                               average
                                                                                                          hence, multiple overlay network queries are required to lo-
                      50
                                                                                                          cate all the semantically close content. Thus, apart from the
                                                                                                          additional effort of constructing an additional index, such
                      40                                                                                  schemes additionally suffer from inefficiencies throughout
                                                                                                          the operational phase of the system.
     Time [seconds]




                      30
                                                                                                              In contrast to that, we build a trie that clusters semanti-
                                                                                                          cally close data, thus realizing in-network indexing which
                      20
                                                                                                          enables more efficient query processing. This comes at the
                                                                                                          expense of a more sophisticated construction process for
                      10
                                                                                                          such data-oriented overlay networks. Additionally, more
                                                                                                          complex online load-balancing strategies have to be ap-
                      0
                      300   320   340   360   380        400       420   440   460      480         500   plied, as presented in this paper.
                                                    Time [minutes]

                                                                                                              Online load-balancing is widely researched area in the
                  Figure 9: Query latency                                                                 distributed systems domain which often been modeled as
    We observed that the number of query hops per query                                                   “balls into bins” [21]. Traditionally, randomized mech-
is as low as theoretically expected, i.e, approx. half of the                                             anisms for load assignment, including load-stealing and
mean path length, even during churn. The average path                                                     load-shedding and power of two choices [18], have been
length was slightly below 6 and the average number of                                                     used, some of which can partly be reused in the context of
query hops per query was approximately 3. Moreover after                                                  P2P systems [10, 15], but with limited applicability. For
the construction phase has led to full evolution of the over-                                             example, [15] provides storage load-balancing as well as
lay network, all peers discovered all their replicas, and the                                             key order preservation to support range queries, but at the
system had an expected mean replication factor of 5, as in-                                               cost that efficient searches of isolated keys can no longer
tended, and success rate for queries was between 95% and                                                  be guaranteed.
100% even during network churn. Queries were mainly                                                           The dynamic nature of P2P systems is also different
unsuccessful because of network problems such as lost or                                                  from the online load-balancing of temporary tasks [9] be-
corrupted messages.                                                                                       cause of the lack of global knowledge and coordination.
    Finally, we would like to point out that the current ex-                                              Moreover, for replication balancing, there are no real bins,
perimental evaluation is still limited in the following sense:                                            and actually the number of bins varies over time because
The moderate number of available peers does not allow us                                                  of storage load balancing, but the balls (peers) themselves
to obtain significant results on the reduction of latency dur-                                             have to autonomously migrate to replicate overloaded key
ing bootstrapping as predicted by our theoretical analysis                                                spaces. Also, for storage load balancing, the balls are es-
in Section 4.3 and which is one of the main properties of                                                 sentially already determined by the data distribution, and it
our approach.                                                                                             is essentially the bins that have to fit the balls by dynami-
                                                                                                          cally partitioning the key space, rather than the other way
                                                                                                          round.
6 Related work
                                                                                                              A distinguishing property of our approach to all other
The fundamental problems to address for any large-scale                                                   related load-balancing strategies is actually that we address
distributed indexing systems are distributed index construc-                                              two, sometimes conflicting load-balancing problems—
tion and load-balancing. Traditionally structured overlay                                                 storage load, i.e., balancing the amount of storage used
networks, mainly based on distributed hash tables (DHTs),                                                 at the nodes, and replication load, i.e., ensuring approx-
have followed sequential construction and maintenance                                                     imately uniform data availability by having roughly the
strategies (online balancing) [12, 17, 24, 25]. In contrast to                                            same number of replicas per data partition. The first step
this, our approach applies a highly parallel strategy which                                               in that direction was a heuristic key space bisection pro-
speeds up the construction process, takes advantage of the                                                posal [2]. In comparison to the heuristics, we now exhaus-
distributed computing resources by allowing the partici-                                                  tively analyze and refine the bisection mechanism, in order
pants to work independently and asynchronously on the                                                     to better understand and guarantee superior load-balancing
construction, and enables the merging of independently                                                    characteristics in the overlay network emerging from the
created indices.                                                                                          recursive use of the bisection algorithm. Additionally, we
   To address load-balancing, the standard strategy of over-                                              now not only simulate the construction process, but verify
lay approaches is to use uniform hashing of keys to remove                                                the analytically predicted properties using a fully-fledged
skew from the distribution. However, this defeats the appli-                                              implementation (P-Grid), deployed on PlanetLab, to back
cability of overlay networks to semantic processing of keys                                               up our analysis and simulation results with large-scale ex-
perimental data. The overlay network is already used as                    [2] K. Aberer, A. Datta, and M. Hauswirth. Multifaceted Simultaneous
a substrate for two data-oriented applications—a peer-to-                      Load Balancing in DHT-based P2P systems: A new game with old
                                                                               balls and bins. Self-* Properties in Complex Information Systems,
peer search engine (http://www.alvis.info/) and a semantic                     “Hot Topics” series, LNCS, 2005.
overlay network [1].
                                                                           [3] Karl Aberer. P-Grid: A self-organizing access structure for P2P
    Furthermore, most existing load-balancing as well as                       information systems. In CoopIS, 2001.
overlay network construction mechanisms have so far been                   [4] Karl Aberer, Anwitaman Datta, Manfred Hauswirth, and Roman
sequential. However, the need for faster overlay construc-                     Schmidt. Indexing data-oriented overlay networks. Technical
tion has recently generated interest in the research commu-                                                              e e
                                                                               Report IC/2005/008, Ecole Polytechnique F´ d´ rale de Lausanne
nity, as is evident from some recent publications [8, 14].                     (EPFL), 2005.
    Both [8] and [14] use random interactions among peers,                 [5] S. Abiteboul, I. Manolescu, and N. Preda. Constructing and Query-
                                                                               ing Peer-to-Peer Warehouses of XML Resources. In SWDB, 2004.
induced potentially by the original unstructured topology,
and try to build a desired topology, by essentially trying to              [6] L. Onana Alima, S. El-Ansary, P. Brand, and S. Haridi. DKS(N,k,f):
                                                                               A Family of Low Communication, Scalable and Fault-Tolerant In-
sort the peers according to their identifiers that are gener-                   frastructures for P2P Applications. In 3rd IEEE/ACM International
ated at the beginning of the process. These mechanisms                         Symposium on Cluster Computing and the Grid (CCGRID), 2003.
can again be used for overlay networks construction which                  [7] L. Onana Alima, A. Ghodsi, and S. Haridi. A Framework for Struc-
support search of keys generated by uniform hashing, since                     tured Peer-to-Peer Overlay Networks. In Post-proceedings of the
then peer identifiers can be simply generated using uni-                        Global Computing Conference, LNCS. Springer Verlag, 2004.
form hashing, as there is no skew in the load-distribution.                [8] D. Angluin, J. Aspnes, J. Chen, Y. Wu, and Y. Yin. Fast construction
                                                                               of overlay networks. In SPAA, 2005.
However, for data-oriented applications such a mechanism
has a critical limitation, since peers are predestined for the             [9] Y. Azar, B. Kalyanasundaram, S. Plotkin, K. Pruhs, and O. Waarts.
                                                                               On-line load balancing of temporary tasks. Journal of Algorithms,
amount of load (based on the whole set of peer identifiers                      22:93–110, 1997.
generated at the beginning of the process), and there is no               [10] J. Byers, J. Considine, and M. Mitzenmacher. Simple Load Balanc-
flexibility or adaptivity for load-balancing, particularly if                   ing for Distributed Hash Tables. In IPTPS, 2003.
the load is skewed. Our scheme—this paper as well as                      [11] B. Chun, D. Culler, T. Roscoe, A. Bavier, L. Peterson, M. Wawrzo-
[2]—on the other hand adaptively creates the key space par-                    niak, and M. Bowman. PlanetLab: An Overlay Testbed for Broad-
titions and assigns peers to these partitions based on load                    Coverage Services. ACM SIGCOMM Computer Communication
                                                                               Review, 33(3), July 2003.
characteristics, and is thus a more generic parallel overlay
construction mechanism. For the special case of uniform                   [12] P. Ganesan, M. Bawa, and H. Garcia-Molina. Online Balancing of
                                                                               Range-Partitioned Data with Applications to Peer-to-Peer Systems.
load distribution (as it is traditionally assumed in DHTs                      In VLDB, 2004.
using uniform hashing), we can easily construct a load-
                                                                         [13] A. Y. Halevy, Z. G. Ives, J. Madhavan, P. Mork, D. Suciu, and
balanced overlay by requiring $p = 1/2$ in each step of the                   I. Tatarinov. The Piazza Peer Data Management System. TKDE,
partitioning.                                                                  16(7), 2004.
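
The contrast drawn above, between identifiers fixed in advance and partitions created according to load, can be illustrated with a small sketch. This is a centralized toy model written for illustration only, not the decentralized protocol of this paper: the function name `bisect_partition`, the `max_depth` cap, and the rounding rule for dividing peers are all assumptions. It recursively halves the key interval and gives each half a share of the peers proportional to the number of keys it holds.

```python
from bisect import bisect_left


def bisect_partition(keys, n_peers, lo=0.0, hi=1.0, path="", max_depth=16):
    """Toy load-proportional recursive bisection (illustrative assumption).

    keys      -- sorted list of floats in [lo, hi)
    n_peers   -- number of peers available for this interval (>= 1)
    Returns a dict mapping each binary path to (peers, keys) for that partition.
    """
    # Stop when the interval cannot or need not be split any further.
    if n_peers <= 1 or len(keys) <= 1 or len(path) >= max_depth:
        return {path: (n_peers, len(keys))}

    mid = (lo + hi) / 2
    cut = bisect_left(keys, mid)          # keys[:cut] fall in the lower half
    left, right = keys[:cut], keys[cut:]

    # Fraction of the load in the lower half decides the peer split.
    p = len(left) / len(keys)
    left_peers = round(p * n_peers)
    if left and right:
        # Both halves hold keys, so each must keep at least one peer.
        left_peers = min(max(left_peers, 1), n_peers - 1)

    result = {}
    if left:
        result.update(
            bisect_partition(left, left_peers, lo, mid, path + "0", max_depth))
    if right:
        result.update(
            bisect_partition(right, n_peers - left_peers, mid, hi, path + "1", max_depth))
    return result


# Uniform load: 64 evenly spaced keys shared among 8 peers.
uniform = bisect_partition([i / 64 for i in range(64)], n_peers=8)

# Skewed load: all 48 keys lie in the lower three quarters of the key space.
skewed = bisect_partition([i / 64 for i in range(48)], n_peers=8)
```

With the uniform input every split divides keys and peers evenly, so the sketch ends with eight partitions of depth 3 holding one peer each. With the skewed input more peers are assigned to the denser half of the key space, which a scheme with identifiers fixed in advance cannot do.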
                                                                          [14] M. Jelasity and O. Babaoglu. T-Man: Gossip-based overlay topol-
7 Conclusions                                                                  ogy management. In Engineering Self-Organising Applications
                                                                               (ESOA’05), 2005.
The fast (re-)construction of data-oriented structured over-              [15] D. R. Karger and M. Ruhl. New Algorithms for Load Balancing in
lay networks is an emerging research topic which has                           Peer-to-Peer Systems, 2003. IRIS Student Workshop (ISW).
not yet been covered exhaustively in the literature. (Re-                 [16] G. Koloniari and E. Pitoura. Content-Based Routing of Path Queries
)indexing due to changing application requirements is a                        in Peer-to-Peer Systems. In EDBT, 2004.
frequent scenario in data-oriented applications and neces-                [17] G. S. Manku. Balanced binary trees for ID management and load
                                                                               balance in distributed hash tables. In ACM PODC, 2004.
sitates the efficient (re-)construction of overlay networks.
                                                                          [18] M. Mitzenmacher. The power of two choices in randomized load
Existing approaches are essentially serialized and do not                      balancing. IEEE Transactions on Parallel and Distributed Systems,
take into account inherent intricacies like preservation of                    12(10), 2001.
key-ordering relationships to enable semantic processing                  [19] W. Nejdl, M. Wolpers, W. Siberski, C. Schmitz, M. Schlosser,
on data keys. In this paper we have presented an efficient,
                                                                               I. Brunkhorst, and A. Löser. Super-peer-based routing strategies for
completely decentralized algorithm which supports                              RDF-based peer-to-peer networks. J. Web Sem., 1(2), 2004.
the fast, parallel construction of structured overlay net-                [20] C. G. Plaxton, R. Rajaraman, and A. W. Richa. Accessing Nearby
works from scratch based on a recursive bisection scheme                       Copies of Replicated Objects in a Distributed Environment. In
                                                                               SPAA, 1997.
that preserves key semantics and provides good load-
balancing for skewed distributions both for storage and                   [21] M. Raab and A. Steger. “Balls into Bins” - A Simple and Tight
                                                                               Analysis. In RANDOM, 1998.
replication load. We prove the efficiency of our approach
                                                                          [22] S. Ramabhadran, S. Ratnasamy, J. M. Hellerstein, and S. Shenker.
by analytical results which are verified by simulation and                      Brief Announcement: Prefix Hash Tree. In ACM PODC, 2004.
large-scale experiments of a complete system implementa-                  [23] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A
tion on PlanetLab. The implementation is available from                        Scalable Content-Addressable Network. In ACM SIGCOMM, 2001.
http://www.p-grid.org/.                                                   [24] A. Rowstron and P. Druschel. Pastry: Scalable, distributed object
                                                                               location and routing for large-scale peer-to-peer systems. In Mid-
References                                                                     dleware, 2001.
                                                                          [25] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan.
 [1] K. Aberer, P. Cudré-Mauroux, M. Hauswirth, and T. van Pelt. GridVine:     Chord: A Scalable Peer-To-Peer Lookup Service for Internet
     Building Internet-Scale Semantic Overlay Networks. In                     Applications. In Proceedings of ACM SIGCOMM, 2001.
     ISWC, 2004.

				