
Indexing data-oriented overlay networks

Karl Aberer, Anwitaman Datta, Manfred Hauswirth, Roman Schmidt
School of Computer and Communication Sciences
Ecole Polytechnique Fédérale de Lausanne (EPFL)
CH-1015 Lausanne, Switzerland

Abstract

The application of structured overlay networks to implement index structures for data-oriented applications such as peer-to-peer databases or peer-to-peer information retrieval requires highly efficient approaches for overlay construction, as changing application requirements frequently lead to re-indexing of the data and hence (re-)construction of overlay networks. This problem has so far not been addressed in the literature and thus we describe an approach for the efficient construction of data-oriented, structured overlay networks from scratch in a self-organized way. Standard maintenance algorithms for overlay networks cannot accomplish this efficiently, as they are inherently sequential. Our proposed algorithm is completely decentralized, parallel, and can construct a new overlay network with short latency. At the same time it ensures good load-balancing for skewed data key distributions which result from preserving key order relationships as necessitated by data-oriented applications. We provide both a theoretical analysis of the basic algorithms and a complete system implementation that has been tested on PlanetLab. We use this implementation to support peer-to-peer information retrieval and database applications.

The work presented in this paper was supported (in part) by the National Competence Center in Research on Mobile Information and Communication Systems (NCCR-MICS), a center supported by the Swiss National Science Foundation under grant number 5005-67322, and was (partly) carried out in the framework of the EPFL Center for Global Computing and supported by the Swiss National Funding Agency OFES as part of the European project Evergrow No 001935.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005

1 Introduction

In standard database systems it is common practice to regularly (re-)index attributes to meet changing requirements and optimize search performance. Recently, structured peer-to-peer overlay networks are increasingly being used as an access structure for highly distributed data-oriented applications, such as relational query processing, metadata search or information retrieval [5, 19]. Their use was motivated by the presence of certain features that are supported by their design such as scalability, decentralized maintenance, and robustness under network churn. Compared to unstructured overlay networks, which are also being proposed for these applications [13, 16], structured overlay networks additionally exhibit much lower bandwidth consumption for search.

The standard maintenance model for peer-to-peer overlay networks assumes a dynamic group of peers forming a network where peers can join and leave, essentially in a sequential manner. In addition, proactive or reactive maintenance schemes are used to repair inconsistencies resulting from node and network failures or to re-balance load in order to react to data updates. These approaches to maintenance, which have been extensively studied in the literature, correspond essentially to updating database index structures in reaction to updates.

In contrast to this, almost no results exist on how to efficiently construct a large overlay network from scratch, i.e., how to bootstrap a new, large-scale, structured overlay network in a practical way within reasonable time. This is understandable insofar as most of the work on overlay networks was done under the assumption of providing an efficient resource location scheme using an application-specific, yet fairly stable, resource identifier space (e.g., file names for file sharing).

With the increasing adoption of structured overlay network technology for data-oriented applications this assumption no longer holds. Resources are identified by dynamically changing predicates and different overlay networks can be used simultaneously, each of them supporting a specific addressing need. We can illustrate these requirements by a typical application case of peer-to-peer information retrieval which we investigated recently.

The standard application of structured overlay networks in peer-to-peer information retrieval is the implementation of a distributed inverted file structure for efficient keyword-based search. In this scenario, several situations occur in which the overlay network has to be constructed from scratch:

• A set of documents that is distributed among a (potentially very large) number of peers is identified as holding information pertaining to a common topic. To support efficient retrieval for this specific document collection, a dedicated overlay network implementing inverted file access may have to be set up.
• A new indexing method, for example, a new text extraction function for identifying semantically relevant keywords or phrases, is being used to search a set of semantically related documents distributed among a large set of peers. Since the index keys change as a result of changing the indexing method, a new overlay network needs to be constructed to support efficient access.

• Due to updates to a distributed document collection an existing distributed inverted file has become obsolete. This may either result from not maintaining the inverted file during document updates or be due to changing characteristics of the global vocabulary and thus a changing indexing strategy (e.g., term selection based on inverse document frequency). Thus a complete reconstruction of the overlay network is required.

• Due to catastrophic network failures the standard maintenance mechanisms can no longer reconstruct a consistent overlay network. Thus the overlay network needs to be constructed from scratch. Of course, this scenario applies generally in any application, but becomes more probable when multiple overlay networks are deployed in parallel.

In principle a (re-)construction of an overlay network in any of these scenarios can be achieved by the standard maintenance model of sequential node joins and leaves. Most existing proposals for structured overlay networks [17, 24, 25] do not offer a completely parallel construction process involving all peers simultaneously. They assume a model of joins of peers in an essentially sequential process. However, this approach encounters two serious problems:

• The peer community will have to decide on a serialization of the process, e.g., electing a peer to initiate the process. Thus the peer community has to solve a leader election problem, which might turn out to be unsolvable for very large peer populations without making strong assumptions on coordination or limiting peer autonomy.

• Since the process is performed essentially in a serialized manner, it incurs a substantial latency. In particular it does not take any advantage of potential parallelization, which would be a natural approach.

In principle some systems like Pastry [24] would support concurrent construction as they take an optimistic approach in which concurrent node joins are possible as long as there are no conflicts. However, this assumes that there already exists a large overlay, so that conflicts are rather unlikely. In an early stage of bootstrapping and with a large number of peers joining concurrently, conflicts will be very likely, however. Thus this type of strategy is not applicable to the problem we are addressing. DKS [6] avoids this problem by equipping joining peers with an approximate routing table which in the course of the operation of the overlay will be corrected (correction on use). While this approach is robust, it incurs considerable efforts as on average the number of lookups per peer required to stabilize the network is of the same order as the number of node joins and leaves.

In this paper we will address the problem of how a structured overlay network can be constructed efficiently from scratch, a problem that the research community has only recently identified and started to address [2, 8, 14]. Our approach is a generic mechanism to autonomously partition a key space in a completely parallel manner. The approach can potentially be used for constructing any structured overlay with fixed key space partitioning [7].

In data-oriented applications there exists an additional factor that adds to the difficulty of finding a solution to this problem: load balancing. When using overlay networks for semantic processing of keys (range queries being a popular example) the canonical method of uniform hashing of keys to remove skew in the key distribution is no longer applicable. This has led to substantial research on including load balancing features into overlay networks [2, 12, 17]. During construction this must be taken into account, thus the construction approach also has to solve load balancing problems. In fact, we will address two types of load balancing problems simultaneously: the balancing of storage load among peers under skewed key distributions and the balancing of the number of replica peers across key space partitions. The first problem is important to balance workload among peers and is solved by adapting the overlay network structure to the key distribution. The second one is important to guarantee approximately uniform availability of keys in unreliable networks where peers have potentially low availability. This is a classical "balls into bins" load balancing problem.

Our approach is based on a key space bisection process through a completely decentralized, parallel, and randomized algorithm for assigning peers to key space partitions in proportion to the key distributions of the partitions. By recursively applying key space bisections, peers can incrementally construct the overlay network while maintaining load balance. We will introduce our approach in the context of the P-Grid overlay network structure [3], which we have developed over the last years, though the essential elements of the approach are applicable to all overlay networks using fixed key space partitioning schemes, such as CAN [23] or Pastry [24]. We demonstrate the theoretical correctness of the basic key space bisection process by analysis and simulation and show the feasibility of building a complete system matching the theoretically predicted behavior with experimental results obtained from a full-fledged implementation deployed on the PlanetLab [11] infrastructure. The resulting system (available at http://www.p-grid.org) is currently used to implement both peer-to-peer retrieval (http://www.alvis.info/) and peer-to-peer data management systems [1].

2 Overview of the Approach

2.1 A trie-structured overlay

We assume that data keys are taken from the key space consisting of the interval $[0, 1[$. The design of the P-Grid overlay network is based on two simple principal ideas: (1) Divide and conquer: The key space is recursively bisected such that the resulting partitions carry approximately the same workload and peers are associated with those partitions. Using a bisection approach greatly simplifies decentralized load balancing by local decision making. (2) Canonical trie structure: Bisecting the key space induces a canonical trie structure which is used as the basis for implementing a standard, distributed prefix routing scheme for efficient search. The resulting overlay is illustrated in Figure 1.

[Figure 1: Trie-structured overlay network. Panels, top to bottom: replica sub-network, recursive partitioning, load distribution.]

At the bottom we see a possible skewed distribution of data keys in the interval $[0, 1[$. We bisect the interval such that each resulting partition carries (approximately) the same load. Each partition can be uniquely identified by a bit sequence. We associate one or more peers (in the example exactly two) with each of the partitions. We will call the bit sequence of a peer's partition the peer's path. The bit sequences induce a trie structure which is used to implement prefix routing. Each peer maintains references in its routing table that pertain to its path. More specifically, for each bit position of its path it maintains one or more randomly selected references to a peer that has a path with the opposite bit at this position. Thus the trie structure is represented in a distributed fashion by the routing tables of the peers. This topology is analogous to other prefix routing schemes that have been devised [20, 24] and have been classified as a fixed key space partitioning scheme for structured overlay networks in the literature [7]. Search in such an overlay network is performed by resolving a requested key bit by bit. When bits cannot be resolved locally, a peer forwards the request to a peer from its routing table.

We use replication in two ways in order to increase the resilience of the overlay network when nodes or network links fail. Multiple references are kept in the routing tables, thus providing alternative access paths, and multiple peers are associated with the same key space partitions (structural replication) in order to provide data redundancy. Since the routing choices are made by randomly choosing peers from the complementary sub-tree at each level, the resulting overlay network additionally provides efficient search in terms of the communication cost of $O(\log n)$ messages, where $n$ is the number of leaf nodes in the tree, irrespective of the shape of the tree [2].

2.2 Overlay Network Construction

The process of constructing such an overlay network from scratch should require low latency, i.e., be highly parallel, and require minimal bandwidth consumption. At the same time the following load balancing criteria should be achieved:

1. The partitioning of the search space should be such that each partition holds a maximal data load of $d_{\max}$, e.g., measured as the number of keys present in the partition. We will also call $d_{\max}$ the maximal storage load in the following.

2. Each resulting partition should be associated with a constant number of peers $n_{\min}$, such that the availability of the different data keys is approximately the same. We will also call $n_{\min}$ the minimal replication factor in the following.

With perfect load balancing these properties can be achieved iff $n / n_{\min} = |D| / d_{\max}$, where $|D|$ is the total number of data keys and $n$ is the number of peers. Algorithm 1 shows our global partitioning algorithm Partition that attempts to achieve these load balancing goals by best effort while bisecting the key space, if the idealizing assumptions are not met.

Algorithm 1 Partition(p, n, d)
 1: if $d > d_{\max}$ and $n \ge 2 n_{\min}$ then
 2:   if $n \, d_0 / d \ge n_{\min}$ and $n \, d_1 / d \ge n_{\min}$ then
 3:     $n_0 = n \, d_0 / d$; $n_1 = n \, d_1 / d$
 4:     Partition($p0$, $n_0$, $d_0$); Partition($p1$, $n_1$, $d_1$)
 5:   else
 6:     if $n \, d_0 / d < n_{\min}$ then
 7:       $n_0 = n_{\min}$; $n_1 = n - n_{\min}$
 8:       Partition($p0$, $n_0$, $d_0$); Partition($p1$, $n_1$, $d_1$)
 9:     else
10:       analogous
11:     end if
12:   end if
13: end if

The algorithm works as follows. Assume $n$ peers are associated with one key space partition $p$ containing $d$ data keys and two sub-partitions $p0$ and $p1$ containing respectively $d_0$ and $d_1$ data keys, such that $d_0 + d_1 = d$. To achieve load balance criterion 1, a fraction $d_i / d$ of the peers should be associated with partition $pi$ for $i = 0, 1$. In case $n \, d_i / d < n_{\min}$, at least $n_{\min}$ peers should be associated with $pi$ to achieve load balance criterion 2. Partition recursively applies this bisection step to the key space.

For various reasons this algorithm will achieve the load balancing goals only approximately. Provided the number of data keys is large enough relative to the number of peers, the number of peers associated with a partition will be between $n_{\min}$ and $2 n_{\min} - 1$, instead of a constant $n_{\min}$. For very skewed data distributions it can happen that very small partitions contain a large fraction of the data keys, and bisection "disperses" many peers to underloaded partitions even before reaching such partitions. These are fundamental problems of any bisection approach. However, for practical data distributions and large peer populations these problems are more theoretical in nature and Partition achieves good load balancing properties provided $n_{\min}$ and $d_{\max}$ are chosen properly.
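The recursive bisection performed by Partition can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: keys are floats in [0, 1), every bisection cuts the current interval at its midpoint, peer counts are rounded to integers, and the function name `partition`, its extra arguments, and the stopping rule (stop when the partition is not overloaded or fewer than `2 * n_min` peers remain) are illustrative choices.

```python
from typing import Dict, List


def partition(path: str, n: int, keys: List[float], lo: float, hi: float,
              d_max: int, n_min: int, out: Dict[str, int]) -> None:
    """Recursively bisect [lo, hi) and record the peer count of each leaf path.

    Hypothetical helper: `out` maps a bit-string path to the number of
    peers assigned to that key space partition.
    """
    d = len(keys)
    # Assumed stop rule: the partition is not overloaded, or there are too
    # few peers to give both halves the minimal replication factor.
    if d <= d_max or n < 2 * n_min:
        out[path] = n
        return
    mid = (lo + hi) / 2
    left = [k for k in keys if k < mid]
    right = [k for k in keys if k >= mid]
    # Criterion 1: peers in proportion to the data load of each half.
    n0 = round(n * len(left) / d)
    # Criterion 2: neither half gets fewer than n_min peers.
    n0 = max(n_min, min(n - n_min, n0))
    partition(path + "0", n0, left, lo, mid, d_max, n_min, out)
    partition(path + "1", n - n0, right, mid, hi, d_max, n_min, out)
```

Calling `partition("", 32, keys, 0.0, 1.0, 30, 2, leaves)` on a skewed key list fills `leaves` with a prefix-free set of paths; deeper paths appear where keys are dense, and every leaf keeps at least `n_min` peers.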
We will use Partition in the following as an algorithm that defines what we consider as an optimal partitioning of the search space among peers and a resulting optimal overlay network. Since in a peer-to-peer system no global coordination exists, the problem we intend to solve is to approximately achieve the partitioning generated by Partition by a decentralized process. We will measure the quality of a solution by determining the deviation from the optimal partitioning.

In a decentralized process peers do not have precise information on the number of peers and keys present in a partition and cannot know which decision the other peers in a partition take with respect to associating themselves with a partition. The only available information is the set of locally stored data keys and information gathered from local interactions with other peers.

The decentralized process we design is based on random peer encounters and a set of basic local interactions. The random encounters can be initiated by performing random walks on a pre-existing unstructured overlay network. The interactions peers can perform in their encounters can be classified in three categories, as shown in Figure 2.

[Figure 2: Network evolution. When peers from the same partition (or where one's path is a prefix of the other's) meet: Possibility 1, split the key space, exchange content, and update the routing table; Possibility 2, become replicas and reconcile content (peers should also keep a partial list of replicas, not shown, for reconciling content later, e.g., using an anti-entropy algorithm). When peers from different partitions meet: Possibility 3, peers can update their routing table entries (to add redundancy and randomization), apart from recommending that the peers meet some other peers with a better match of path; this induces the random interactions. Legend: pid, routing table (only part of the prefix is shown; can have multiple entries for each level), index data.]

If peers belong to the same partition they can either split the present partition (a divide-and-conquer strategy) or replicate the data keys they currently hold. If they do not belong to the same partition, they can refer each other to other peers using their routing table entries and thus route to a peer that belongs to the same partition. If peers from the same partition meet, they may decide to split in case the current partition contains a sufficient number of data keys to justify a further split, i.e., the partition is overloaded (corresponding to line 1 in Partition). They can coordinate their decision locally. In addition, peers keep a reference to the peer encountered after a split, and thus incrementally construct their routing tables.

We can thus reduce the problem of load-balanced overlay network construction to the problem of decentralized partitioning of one key space partition. The problem is that a large number of peers have to perform the decision to split independently to allow a fast construction of the overlay network, while making these independent decisions in a way that the ratio of the number of peers matches the ratio of the data load in the two partitions. In other words, the global behavior of the distributed decision making process should match the outcome of the partitioning step in the global partitioning algorithm (corresponding to lines 3 and 7 in Partition). The solution to this problem is one of the central contributions of the paper and will be discussed in detail in Section 3.

3 Decentralized Partitioning

Consider a set of $n$ peers which hold data keys from a key space. The space is partitioned into two parts, 0 and 1, such that the fractions of the load, measured in number of data keys, falling into partitions 0 and 1 are $p$ and $1 - p$, respectively. In the following we assume w.l.o.g. that $p \le 1/2$. Then the partitioning that we would ideally like to achieve should have the following properties:

1. Proportional replication: Each peer has to decide for one of the two partitions such that (in expectation) a fraction $p$ of the peers decides for 0 and a fraction $1 - p$ for 1. Thus the workload becomes uniformly distributed among the peers, meeting the load-balancing criteria in the resulting overlay.

2. Referential integrity: During the process each peer has to encounter at least one peer that decided for the other partition. Thus the peers have the necessary information to construct a routing structure, i.e., the overlay infrastructure, for delegating requests for keys they are no longer associated with.

A peer can initiate interactions with any peer selected uniformly at random from this set. We measure the cost of an algorithm solving the problem in terms of the number of interactions initiated by peers, and this cost should be minimized. The quality of an algorithm solving the problem is measured by the deviation of the resulting distribution of peers from an optimal distribution that can be achieved based on global knowledge and coordination. First we assume that the value of $p$ is known to all peers. We will analyze the influence of having only approximate knowledge of $p$ by sampling the locally stored data keys later.

To clarify the critical issues we first discuss two simple heuristic approaches. In the case of $p = 1/2$, a simple strategy to adopt would be that peers which have not yet decided for a partition initiate a random interaction. If the contacted peer is also undecided, the peers decide for different partitions (balanced split), otherwise the peer initiating the interaction decides opposite to the contacted peer which has decided already (unbalanced split). In this way it learns about a peer from the other partition. Since the algorithm is symmetric, in expectation the same number of peers will decide for each partition, and it provides the best possible performance within the model, since in each interaction every possible decision is taken. We call this strategy eager partitioning.

While the eager partitioning strategy works well for $p = 1/2$, it cannot be employed for other values of $p$. For an arbitrary but known $p$, a possible strategy, which we call autonomous partitioning (AUT), would be that each peer makes a decision for one of the two partitions in advance, even without meeting any other peer, and then tries to meet some peer from the other partition in order to satisfy the referential integrity constraint. In this setting, obviously some of the peer interactions are "wasted" whenever peers which have decided for the same partition meet. For the specific case of $p = 1/2$, by modeling the interactions as Markovian processes, we observed that [...] interactions are initiated on average per peer asymptotically (i.e., for large $n$), as compared to [...] interactions per peer with eager partitioning. Thus autonomous partitioning is not an optimal strategy.

3.1 Adaptive eager partitioning

In the following we introduce a method for such an optimized solution to the partitioning problem that has the characteristics of eager partitioning but works for all $p$. Due to space constraints we can only summarize the main points of the analysis. However, the full analysis can be found in the long version of this paper [4].

Adaptive eager partitioning (AEP) algorithm:

1. Each undecided peer initiates interactions with a uniformly randomly selected peer until a decision is reached. Selecting peers uniformly at random is a non-trivial problem in itself which we solve by a variant of random walks.

2. If the contacted peer is undecided the peers perform a balanced split with probability $\alpha$ and maintain references to each other.

3. If the contacted peer has already decided for 0 then the contacting peer decides for 1 and maintains a reference to the contacted peer.

4. If the contacted peer has already decided for 1 then the contacting peer decides for 0 with probability $\beta$ and with probability $1 - \beta$ for 1. In the first case it maintains a reference to the contacted peer. In the second case it obtains a reference to a peer from the other partition from the contacted peer.

It is straightforward to see that condition (2) of the partitioning problem is satisfied. The question is now how to satisfy condition (1) by properly choosing the probabilities $\alpha$ and $\beta$.

We model the peer interactions as a Markovian process using mean value analysis. We assume that in each step a peer which has not yet found its counterpart contacts another randomly selected peer. By $n_0(t)$ and $n_1(t)$ we denote the number of peers that have decided in step $t$ for 0 and 1, respectively. Initially, $n_0(0) = n_1(0) = 0$. At the end of the process in some step $t_e$ we have $n_0(t_e) + n_1(t_e) = n$. We first assume that $\alpha = 1$. Informally speaking, with this the partitioning proceeds as fast as possible, optimizing the required number of interactions. Then the model can be given as

[recursive equations for $n_0(t)$ and $n_1(t)$]

To determine the proper value of $\beta$ for a given value of $p$, we have to solve this recursive system. The first important observation is that the recursion terminates as soon as no more undecided peers exist, i.e., as soon as $n_0(t) + n_1(t) = n$. Thus we first have to find a value $t_e$ such that $n_0(t_e) + n_1(t_e) = n$. In general this will not be an integer value, but in the context of mean value analysis we allow fractional steps. By standard solution methods we obtain

[closed-form solutions for $n_0(t)$ and $n_1(t)$]

and evaluating the termination condition, we obtain

[Equation (1)]

Note that $t_e$ does not depend on the decision probability, and thus the partitioning process requires the same number of interactions among peers independent of the load distribution. By definition $n_0(t_e) = p\,n$, thus we obtain a relationship between the network size $n$ and the load distribution $p$ with $\beta$, the decision probability to be used.

Having $\beta$ dependent on $n$ is problematic for two reasons: First, the resulting equation is hard to solve, and second, more importantly, $n$ is not necessarily known to the peers. Since we are interested in situations where $n$ is (relatively) large we thus perform an asymptotic analysis. By letting $n \to \infty$ we obtain the following relationship among $\beta$ and $p$:

[Equation (2)]

Positive solutions for $\beta$ cannot be obtained for all values of $p$. From Equation 2 we derive that positive solutions exist for $1 - \ln 2 \le p \le 1/2$. This means that the algorithm cannot partition correctly for too highly skewed partitions. Therefore for $p < 1 - \ln 2$ we have to pursue a different strategy, by reducing the probability of balanced splits, i.e., $\alpha < 1$. Through an analogous analysis, by setting $\beta = 0$, we can derive relationships for $\alpha$:

[Equation (3)]

and for the relation between $\alpha$ and $p$ when $n \to \infty$

[Equation (4)]

Before we continue with the discussion of different partitioning algorithms, a statement on the modeling approach is necessary: We use a sequential approach to model and analyze what is a concurrent process. This is a simplification as well as an appropriate approximation for our purpose. Assume that the latency in one interaction is such that a certain number of other interactions among peers occur concurrently. Then the concurrent behavior of $n$ peers corresponds (approximately) to the sequential behavior of correspondingly fewer groups. The analysis we perform shows that the models we use are sufficiently accurate for relatively small $n$. Thus for large numbers of peers the model is a sufficiently good approximation, whereas for small $n$ concurrency is less likely to occur and less critical.

3.2 Error Analysis

Up to now we assumed that the value of $p$ is known to all peers. Practically, peers will derive an estimate for $p$ by sampling. Therefore, in the following we analyze the effect of errors introduced by only approximate knowledge of $p$. Other potential sources of errors, such as taking the limit case $n \to \infty$ and using mean value analysis, turned out to have a negligible influence.

Assume peers obtain a number of samples from their locally stored data keys. The samples correspond to Bernoulli variables with probability $p$. The peers estimate $p$ by computing the mean value of the samples, which is binomially distributed. We would like to determine the effect of an error in estimating $p$ on the values of $\alpha(p)$ and $\beta(p)$ and the resulting effect on the partitioning process when using approximate values of $\alpha$ and $\beta$. In the following we will use $\alpha$ and $\beta$ instead of $\alpha(p)$ and $\beta(p)$ as long as the meaning is clear.

We provide an exemplary error analysis for the evolution of $n_0(t)$ for the case where $1 - \ln 2 \le p \le 1/2$ since this is algebraically the simplest case. Analogous analyses have been done for the other case, but they are substantially more complex.

We assume that in step $t$ an estimate of $p$ is used to determine an estimate of the decision probability. The error in the estimate of $p$ is the sampling error obtained by the peer initiating step $t$. We then consider the error introduced into the result of the partitioning process due to sampling errors. We can derive the following closed-form expression for this error from analyzing the Markov model of the process.

[Equation (5)]

Since the sampling errors are presumably small we use a Taylor series expansion to approximate the decision probability at the estimated value. In fact, for reasons that will become clear later, we need to make a second order approximation to perform a proper error analysis. For a given value $p$, we have

[Equation (6)]

for small sampling errors. We now determine the expectation value and standard deviation of the resulting error (to simplify the presentation we will write [...] instead of [...] in the following). Since [...] and [...] we obtain for the expectation value using (5)

[Equation (7)]

where [...]. This shows that sampling introduces a systematic shift of the balance between the resulting partitions. In a concrete implementation we will have to compensate for this systematic error, as will be discussed in more detail subsequently.

Since [...] we obtain for the standard deviation by a similar analysis

[Equation (8)]

where [...].

The impact of the errors depends in particular also on the behavior of the first and second derivatives of the decision probability. Using numerical differentiation we observed that the functions are well-behaved in the relevant region. Performing an analogous analysis for $p < 1 - \ln 2$, the behavior of the corresponding derivatives will be relevant for the error behavior. We have included a plot of the second derivative in order to point out an important observation (Figure 3): For very small values of $p$ the second derivative grows extremely fast, and consequently the error will be large as well.

The error analysis shows that in the presence of sampling errors, we have to include correction terms in the probabilities $\alpha$ and $\beta$ used in AEP.

[Equation (9)]

[Equation (10)]

[Figure 3: Numerical solution for the second derivative (plot label: alpha''), plotted over $p$ from 0.05 to 0.3.]

3.3 Numerical Simulation of the Markov Model

To validate the correctness of our analytical models we performed numerical simulation experiments. We simulated five models:

1. MVA: simulation of the mean value model for AEP with known $p$

2. SAM: simulation of the mean value model for AEP; the value of $p$ is estimated from samples

3. AEP: discrete simulation of AEP with peers taking discrete decisions based on $\alpha$ and $\beta$ instead of adding mean value contributions as in the mean value model

4. COR: discrete simulation of AEP with the corrected probabilities from (9) and (10)

5. AUT (Discrete Autonomous Partitioning): discrete simulation of autonomous decision making where $p$ is estimated from samples

We present the results for [...]. Each experiment has been repeated 100 times.

Figure 4 shows the deviation of the mean value of $n_0$ from the expected value $p\,n$, averaged over all experiments. As expected, using sampling for estimating $p$ leads to a systematic deviation of the resulting distribution (SAM, AEP). The error correction strategy (COR) eliminates the deviation almost completely. Clearly, autonomous partitioning (AUT) on average achieves the desired distribution.

[Figure 4: Mean of $n_0$ over 100 experiments; the expected value $p\,n$ is subtracted to highlight the deviation. Series: MVA, SAM, AEP, COR, AUT; x-axis: $p$ from 0.1 to 0.5; y-axis: deviation from mean.]

Figure 5 shows the cost of each algorithm measured in number of interactions. As theoretically predicted, we observe that adaptive eager partitioning performs better than AUT, except for small values of $p$ (approx. [...]), independent of which version is considered (MVA, SAM, COR).

[Figure 5: Mean total number of interactions over 100 experiments. Series: MVA, SAM, AEP, COR, AUT; x-axis: $p$ from 0.1 to 0.5; y-axis: number of interactions.]

Superficially, AEP appears to be a more complex algorithm than AUT while not considerably outperforming AUT. However, the complexity is in the analysis required to determine the correct decision probabilities, whereas for practical implementation AEP even has advantages since it provides an invariant: When taking a decision for a partition, the availability of a reference is guaranteed.

We would like to point out that the problem studied in this section is a novel load distribution problem in the area of distributed systems, particularly because of the referential integrity constraint. A solution to this problem can be useful not only for overlay network construction as we use it here, but also in resource and task distribution and decentralized load-balancing in general.

4 Algorithmic Issues

In order to use AEP for implementing the Partition algorithm in a decentralized fashion we have to address several issues related to the global organization of the indexing process.

4.1 Initiating the Indexing Process

In the absence of global coordination the mechanism to reach a decision to initiate the indexing process is not obvious. While it is not the focus of this paper, and the initiation process is orthogonal to the index evolution process, we nonetheless describe a simple, decentralized strategy. Depending on locally observed queries, individual peers may make autonomous decisions on whether a new index may be necessary or re-indexing may be required. Any of the peers that locally decide that indexing is useful can initiate a vote by flooding the peer network.
This ﬂooding can Further experiments with different sample sizes showed use the pre-existing, generic, unstructured overlay network that the sample size has practically no inﬂuence. Even very which we assume to exist. small samples (1 or 2 samples) lead to the same results as When peers receive a voting request they can reply larger sample sizes. Experiments also showed that adaptive back their local decision. Additionally, helpful informa- eager partitioning has a further advantage over autonomous tion, such as locally available storage space that the peer is partitioning as it reduces the standard deviation of the er- willing to contribute to store information for the new index ror in partitioning by approximately a factor of 2. Thus and the number of local data items to be indexed can be our AEP approach optimizes both performance in terms of piggy-backed. Votes are sent back along the paths they ar- number of required interactions and error control in terms rived, and multiple votes are aggregated while ﬂowing back of matching the partitioning ratio . to reduce bandwidth consumption. Based on the number of positive and negative responses, the peer which initiated the tacted by another peer. In this way peers that are “ahead voting can then decide whether to initiate index construc- of the crowd”, e.g., due to faster network connections, are tion or not, and can ﬂood the decision back to all peers. Ad- forced to wait for the slower ones. The same mechanism ditionally, based on the aggregate storage space available, also eventually leads to termination of the process, when and the amount of storage required for all the data items peers encounter only fully synchronized copies of them- (references) in the system, the decision will contain the pa- selves. rameters for ensuring optimized utilization of the available resources and for synchronization of the indexing process. 
4.3 Complexity We assume a collaborative environment where the major- The goal of our approach to index construction is to per- ity of peers does not behave maliciously or in a Byzan- form it with low bandwidth consumption and low latency. tine manner, and adheres to the democratic decision of the With regard to bandwidth consumption a necessary require- group, and thus participates in the indexing irrespective of ment is to perform no worse than a sequential approach their individual votes. using standard construction mechanisms, i.e., . $ } # %# ! To study this, we look at the complexity in the case of a 4.2 Synchronizing and Terminating the Indexing Pro- balanced key distribution ( ). Then for partitioning at } C cess one level, peers engage in bilateral interactions on $©pÃ! The partitioning algorithm introduced in Section 2 enables average. In addition to locating a peer in the same partition reaching a decision in parallel on bisecting the key space at level , peers have to route on expectation Ø steps p n$ Ø ¦Ã! proportionally among a group of autonomous peers. In the when performing the refer interaction. This shows that the } indexing process the algorithm is executed multiple times total number of interactions is also of order } . $ # %# ! and a synchronization mechanism is needed. In addition However, the latency is as opposed to $ %# ! in $ # % peers need to autonomously recognize when to terminate the standard maintenance model. the indexing process. We realize this as follows. The peer communicating the decision to start the in- 4.4 Simulation of the System dexing process provides the parameters and 31( ' 20 75( # 64 To study the global behavior of the indexing algorithms as used in PRRBRPGE. The values are chosen such that # QIQI H F when integrating all the elements discussed so far, we per- C 20 iX8( ' p 64 , where ÍÌ pn 75( # d30 ' is the average number d30 ' ÍÌ formed simulation studies implemented in Mathematica. 
of data keys peers hold (as mentioned in Section 4.1 this We were mainly interested whether the desired load balanc- information can be derived from information piggy-backed ing properties would be achieved under the various approx- to the votes). Additionally, it provides a time . Before @ 74 I 4 6 imations and whether the algorithm performs as predicted. starting to partition, peers replicate their data keys at time In the simulations we used peer populations of sizes 4 64 @ 7uIto 64 ( 98l#randomly chosen other peers. Thus at the start 256, 512, and 1024. As data distributions we used a uni- of the indexing process all data keys are already replicated Õ form distribution, a Pareto distribution with PDF with ÚÙ Õ v 0 2 the desired number of times in the network. parameters and § §¥ § ©¦¤ p¨¥ ¦¤ , and a Normal dis- Besides estimating the number of data keys in the cur- C Ø Ê C F Ê !§ ¤ ¦¤ tribution with mean value and standard deviation , } rent partition, peers also have to estimate the number of Ê and test data from text retrieval experiments (project Alvis). current peers, in order to perform the proper decisions in In Figure 6 these distributions are denoted as U, P0.5, P1, algorithm . Attempting this directly, by learning PBRBRPGE # QIQI H F P1.5, N and A. The Pareto and Normal distributions repre- about all existing replicas at each level of the partitioning sent cases with extremely skewed distributions. Initially, process, would unnecessarily slow down the progress of we randomly assigned 10 keys from the distributions to indexing. Instead, we estimate the number of replicas in a peers, so that they held samples. We tested with partition by analyzing the overlap in the sets of data keys of ¤ ¨§ Ê ÛC 75`# 64 ( and such that at least 5 (respectively 10) repli- two peers interacting in a balanced split. If denotes the C 75%# 64 ( ¥ p§ ¥ 4 Î cas of the keys are generated. 
Typically the experiments set of data keys peers hold, and C Q 4 p ,Î } Î g Î C Ï had C 31©' 20 ( ¨§ ¤ . All experiments were repeated 10 64 ( 78l# then is a maximum likelihood estimate of the Ó 6 hj &%Ó gÑ rÐ v ÐÑ Ô Ö Õ Ò Ð Ð times and the results were averaged. The algorithms were expected number of peers in the current partition. For ex- implemented as described above. The experiments were h Ë ÐÑ Ð ample, if } Î C Î and then it should be C × ×Î 31©' 20 ( executed on a workstation cluster using up to 36 machines expected to have 6 748(%# peers in the partition since initially and were running for more than a week. Note that there data keys have been replicated times. To ensure the 75( # 64 were 36 separate experiments, each conducted 10 times. correctness of this estimation was the purpose of initially Furthermore, in a real network the peers would use exclu- replicating the data. sive resources, and thus the actual overlay construction pro- During partitioning, peers that have extended their paths cess is much faster. attempt to immediately contact other peers to perform the For evaluating the experiments we primarily were de- partitioning at the next level. If they do not succeed in iden- termining the degree to which the load balancing of peers tifying a different peer in the same partition with which a across key space partitions worked. To do so, we compared useful interaction can take place, i.e., “divide and conquer” the generated key sets to the distribution, that would be or “replicate”, after a ﬁxed number of attempts (e.g., 2), generated by global coordination ( algorithm). 
# QIQI H F PRRrRPGE using the refer interaction (see Figure 2), they stop to ini- The ¨PRR©§RPGE ¥ ¥ # Q ¥ rQ I H F I algorithm generates a distribution ¥ tiate interactions and only will continue after being con- , where C Q Bl# © $4 4 Ø x are the partitions of 4 ©Ø x the key space generated and are the number of peers as- 4 # grow gracefully in terms of the network size, as expected sociated with each partition. We compared this distribution ¥ from theory. However, skew in the data distribution can to the distribution generated by the decentralized $ h4 # h 4 Ø signiﬁcantly increase the bandwidth consumption. algorithm. Ü ® Ý } 5 Experimental evaluation $ h4 ¤q 4 # # We used the PlanetLab infrastructure [11] to obtain re- 4 sults from large-scale experiments under realistic network- Ý ® Ý h4 # 4 As explained in Section 2, we consider the distribution ing conditions and to verify our theoretical predictions and generated by as the optimal distribution. Mea- PBRBRPGE # QIQI H F simulation experiments. PlanetLab (http://www.planet-lab. suring the distance to this distribution provides a measure org/) is a global testbed for large-scale experiments with for the quality of load balancing. distributed systems. At the moment it consists of ap- The ﬁrst experiment (Fig 6(a)) for and Þ98( # C 64 Ê proximately 530 nodes geographically distributed over the wX8( ' C 20 ¤ ¨§ shows the quality of load balancing depending whole planet running a modiﬁed version of Linux to sup- on the peer population size for the different distributions. port efﬁcient administration and resource sharing for large- One can observe that the quality remains practically stable scale experiments. Nodes are connected via a diverse col- independent of the size. lection of links. Our experiments on PlanetLab ran on up We also investigated the inﬂuence of the replication fac- to 300 nodes depending on the number of available nodes. tor by comparing 98( # 64 ¥ ¤ ¥ ¨¦!¥ p Ê § ¥ ¤ § (Fig 6(b)). 
Ê C 64 ¡Ä98( # Ê p Each node executed one instance of a P-Grid node. When In principle the load balancing properties should not be interpreting the results presented in the following, it is im- affected as we measure deviations relative to the average portant to consider that PlanetLab is shared by a large num- replication. This is conﬁrmed for less skewed distribu- ber of research groups for experiments that are executed in tions, whereas for the strongly skewed distributions a cer- parallel and thus mutually inﬂuence the performance con- tain degradation can be observed. We have still to investi- siderably especially with respect to absolute latency. gate in detail the reasons for this effect, but most likely it is related to the relatively low number of partitions with high 5.1 Experimental setup replication factors. We deployed the P-Grid software, i.e., the peers, on all We were also interested in the inﬂuence of the sample available nodes at the times the experiments were con- size on the quality of load balancing. It might be ex- ducted and assigned 10 keys from a real text collection 20 31( ' pected that more samples lead to higher accuracy. In fact, (taken from our Alvis information retrieval project) to each the result (Fig 6(c)) shows that no such inﬂuence exists. peer. This relatively low number of keys was chosen to This is insofar important as it shows that the partitioning speed up experiments and as we have already seen, sample can be done using very small samples which enables sev- size has little inﬂuence on load balancing. To validate our eral possibilities for optimization to reduce bandwidth con- experiments, we also performed tests with larger numbers sumption. (up to 2000 keys per peer) and used various distributions, In order to understand the quality of the load distribu- including uniform random distribution and Pareto distribu- tions achieved we also analyzed the role of our theoretical tion. framework (Fig 6(d)). 
We replaced the functions "tA © The time-line of the experiments was as follows: In an $ É É and by heuristic functions which likely would be ttA © initial phase starting at time , peers join the system by $ ÉÉ chosen in the absence of a theoretical understanding of I â ¤ pp contacting a bootstrap peer (until ) and form their properties. The hypothesis we wanted to verify was e ¹I # BQ â an unstructured overlay network (from until ) whether the concrete nature of these functions plays a sig- I Ê oI ã e # BQ which is used later to replicate data a ﬁxed number of times niﬁcant role in view of the many approximations made in â â ¤ p (from until ). In the replication phase the overall distributed algorithm. We chose Ê dI ã e # rQ e )I # BQ peers randomly choose 5 peers from the unstructured over- lay network to replicate their data. Subsequently, from § áà ß RP` áà RPß ¥ ¤ â ¤ pp â ¤ ¤ pp© C ©É $ q § C É $ e oI to BQ # oI e , the structured overlay network # BQ is constructed using the approach presented in this paper. These functions exhibit qualitatively the same behavior We were especially interested in evaluating the bandwidth as the ones used by AEP. The experiment was executed for consumption during this phase and to verify whether the # C and Ê p . The conclusion is clear from Ê ¦C 75%# 64 ( theoretically predicted load balancing properties of the al- the result: Even a minor change to the theoretically correct gorithm are achieved under realistic networking conditions. functions degrades the quality of load balancing substan- Then we run queries on the constructed overlay network â ¤ ¤ p© â ¤ pp¤ tially. Thus the theoretical basis proves valuable despite (e XI to ã e I) to analyze search performance. # BQ # BQ many idealizing assumptions. Each peer performed a search every 1–2 minutes. In the ﬁ- ¤ â©¤ â ¤ p¤ We also analyzed the communication costs of the algo- nal phase ( #BQ to ã e ³I ) network churn is Ê ³I e # BQ rithm. 
We can see that both the number of interactions per simulated to evaluate the failure resilience of P-Grid. Each peer (Fig 6(e)), and the overall bandwidth consumption per peer independently decides to go ofﬂine 1–5 minutes ev- peer measured in terms of the total number of data keys ex- ery 5–10 minutes which causes considerable churn that the changed among all peers during the interactions (Fig 6(f)) system has to compensate. Deviation for various peer populations Deviation for various desired replications n_min Deviation for various data sample sizes 1 0.8 0.5 0.8 0.4 0.6 0.6 0.3 0.4 0.4 0.2 0.2 0.2 0.1 U P0.5 P1.0 P1.5 N A U P0.5 P1.0 P1.5 N A U P0.5 P1.0 P1.5 N U (a) Varying peer population: n = 256, 512, 1024; (b) Varying required replication: n = 256; (c) Varying data sample size: n = 256; y 3`W S aY ; æ fe fe b å ä y 9`W b hgW d&¤y 3`W S aY ; 5, 10, 15, 20, 25 10, 20, 30 ; fe b å ä a Y y hgW b 9gW d&zy 3oW S fe y hgW b hgW b æ fe fe Deviation for n_min 5,10 model theoretical using vs.heuristics Interactions required per peer for overlay construction for various population sizes Total number of data keys moved per peer bandwidth consumption for overlay construction 15000 1.2 12 12500 1 10 10000 0.8 8 7500 0.6 6 5000 0.4 4 2500 0.2 2 U P0.5 P1.0 P1.5 N A U 5 U 10 P0.5 5 P0.5 10 P1.0 5 P1.0 10 P1.5 5 P1.5 10 N 5 N 10 A 5 A 10 U P0.5 P1.0 P1.5 N A (d) Theory vs. Heuristics (e) Interactions per peer: n = 256, 512, 1024; (f) Bandwidth consumed (data keys moved): n = æ fe ; y hgW b 9gW d&z13oS fe b å ä y a Y W 256, 512, 1024; ; æ fe b å ä y a Y W y hgW b 9gW d&z13oS fe Figure 6: Simulation results for various experiment scenarios. 5.2 Experimental Evaluation bandwidth consumed by queries. 300 We ﬁrst veriﬁed that the system behavior matches the the- oretical predictions and the simulations. The experiment 250 was performed with 296 peers and compared to simulation results using the same number of peers and the same key 200 set. 
The quality of load balancing is evaluated as deﬁned in peers 150 Section 4.4 and is practically identical for simulations and experiments, with an average of 0.38 for 10 simulations 100 (the standard deviation is 0.05) resp. a value of 0.39 for the experiment. This indicates that the theoretically pre- 50 dicted load distribution properties are met quite accurately by the implementation even under realistic network condi- 0 0 50 100 150 200 250 300 350 400 450 500 tions with slow connections and communication failures. Time [minutes] We now report some system measurements that we Figure 7: Number of participating peers made to evaluate the performance of the overlay network, both during the construction phase, as well as in its opera- tional lifetime both in a static situation (no change in peer 300 population) as well as under churn (peers leave and join the maintenance queries network). 250 Figure 7 shows the number of peers in the overlay at a given time. We see how ﬁrst peers join the network and the 200 number of peers in the network increases to the maximal Bandwidth [Bps] number. Then during the construction phase this number 150 is stable (approx. 300 peers) while decreasing again in the ﬁnal phase where we simulate network churn and a sub- 100 stantial dynamic fraction of peers becomes unavailable. Figure 8 shows the aggregate bandwidth consumption 50 of all peers (maintenance and queries) in Bytes/sec. Dur- ing the construction phase the bandwidth consumption 0 0 50 100 150 200 250 300 350 400 450 500 reaches a peak of 250 Bytes/sec per peer. The mainte- Time [minutes] nance consumption decreases quickly down to less than Figure 8: Aggregate bandwidth consumption 100 Bytes/sec and becomes negligible compared to the Figure 9 shows the average query latency and its stan- (range queries, etc.). Thus in standard overlay approaches, dard deviation. 
The absolute values are relatively high typically an additional index on top of the overlay network and essentially reﬂect the poor response time of PlanetLab needs to be created [22]. The advantage of this approach nodes. The response time is slightly higher with a larger de- is its universal usability on top of any DHT. However, it is viation during the network churn because requested peers considerably less efﬁcient than our approach since seman- may be ofﬂine which has to be compensated. tically close data items are not necessarily stored close to 60 standard deviation each other in the overlay network (high fragmentation), and average hence, multiple overlay network queries are required to lo- 50 cate all the semantically close content. Thus, apart from the additional effort of constructing an additional index, such 40 schemes additionally suffer from inefﬁciencies throughout the operational phase of the system. Time [seconds] 30 In contrast to that, we build a trie that clusters semanti- cally close data, thus realizing in-network indexing which 20 enables more efﬁcient query processing. This comes at the expense of a more sophisticated construction process for 10 such data-oriented overlay networks. Additionally, more complex online load-balancing strategies have to be ap- 0 300 320 340 360 380 400 420 440 460 480 500 plied, as presented in this paper. Time [minutes] Online load-balancing is widely researched area in the Figure 9: Query latency distributed systems domain which often been modeled as We observed that the number of query hops per query “balls into bins” [21]. Traditionally, randomized mech- is as low as theoretically expected, i.e, approx. half of the anisms for load assignment, including load-stealing and mean path length, even during churn. 
The average path load-shedding and power of two choices [18], have been length was slightly below 6 and the average number of used, some of which can partly be reused in the context of query hops per query was approximately 3. Moreover after P2P systems [10, 15], but with limited applicability. For the construction phase has led to full evolution of the over- example, [15] provides storage load-balancing as well as lay network, all peers discovered all their replicas, and the key order preservation to support range queries, but at the system had an expected mean replication factor of 5, as in- cost that efﬁcient searches of isolated keys can no longer tended, and success rate for queries was between 95% and be guaranteed. 100% even during network churn. Queries were mainly The dynamic nature of P2P systems is also different unsuccessful because of network problems such as lost or from the online load-balancing of temporary tasks [9] be- corrupted messages. cause of the lack of global knowledge and coordination. Finally, we would like to point out that the current ex- Moreover, for replication balancing, there are no real bins, perimental evaluation is still limited in the following sense: and actually the number of bins varies over time because The moderate number of available peers does not allow us of storage load balancing, but the balls (peers) themselves to obtain signiﬁcant results on the reduction of latency dur- have to autonomously migrate to replicate overloaded key ing bootstrapping as predicted by our theoretical analysis spaces. Also, for storage load balancing, the balls are es- in Section 4.3 and which is one of the main properties of sentially already determined by the data distribution, and it our approach. is essentially the bins that have to ﬁt the balls by dynami- cally partitioning the key space, rather than the other way round. 
6 Related work A distinguishing property of our approach to all other The fundamental problems to address for any large-scale related load-balancing strategies is actually that we address distributed indexing systems are distributed index construc- two, sometimes conﬂicting load-balancing problems— tion and load-balancing. Traditionally structured overlay storage load, i.e., balancing the amount of storage used networks, mainly based on distributed hash tables (DHTs), at the nodes, and replication load, i.e., ensuring approx- have followed sequential construction and maintenance imately uniform data availability by having roughly the strategies (online balancing) [12, 17, 24, 25]. In contrast to same number of replicas per data partition. The ﬁrst step this, our approach applies a highly parallel strategy which in that direction was a heuristic key space bisection pro- speeds up the construction process, takes advantage of the posal [2]. In comparison to the heuristics, we now exhaus- distributed computing resources by allowing the partici- tively analyze and reﬁne the bisection mechanism, in order pants to work independently and asynchronously on the to better understand and guarantee superior load-balancing construction, and enables the merging of independently characteristics in the overlay network emerging from the created indices. recursive use of the bisection algorithm. Additionally, we To address load-balancing, the standard strategy of over- now not only simulate the construction process, but verify lay approaches is to use uniform hashing of keys to remove the analytically predicted properties using a fully-ﬂedged skew from the distribution. However, this defeats the appli- implementation (P-Grid), deployed on PlanetLab, to back cability of overlay networks to semantic processing of keys up our analysis and simulation results with large-scale ex- perimental data. The overlay network is already used as [2] K. Aberer, A. Datta, and M. Hauswirth. 
Multifaceted Simultaneous a substrate for two data-oriented applications—a peer-to- Load Balancing in DHT-based P2P systems: A new game with old balls and bins. Self-* Properties in Complex Information Systems, peer search engine (http://www.alvis.info/) and a semantic “Hot Topics” series, LNCS, 2005. overlay network [1]. [3] Karl Aberer. P-Grid: A self-organizing access structure for P2P Furthermore, most existing load-balancing as well as information systems. In CoopIS, 2001. overlay network construction mechanisms have so far been [4] Karl Aberer, Anwitaman Datta, Manfred Hauswirth, and Roman sequential. However, the need for faster overlay construc- Schmidt. Indexing data-oriented overlay networks. Technical tion has recently generated interest in the research commu- e e Report IC/2005/008, Ecole Polytechnique F´ d´ rale de Lausanne nity, as is evident from some recent publications [8, 14]. (EPFL), 2005. Both [8] and [14] use random interactions among peers, [5] S. Abiteboul, I. Manolescu, and N. Preda. Constructing and Query- ing Peer-to-Peer Warehouses of XML Resources. In SWDB, 2004. induced potentially by the original unstructured topology, and try to build a desired topology, by essentially trying to [6] L. Onana Alima, S. El-Ansary, P. Brand, and S. Haridi. DKS(N,k,f): A Family of Low Communication, Scalable and Fault-Tolerant In- sort the peers according to their identiﬁers that are gener- frastructures for P2P Applications. In 3rd IEEE/ACM International ated at the beginning of the process. These mechanisms Symposium on Cluster Computing and the Grid (CCGRID), 2003. can again be used for overlay networks construction which [7] L. Onana Alima, A. Ghodsi, and S. Haridi. A Framework for Struc- support search of keys generated by uniform hashing, since tured Peer-to-Peer Overlay Networks. In Post-proceedings of the then peer identiﬁers can be simply generated using uni- Global Computing Conference, LNCS. Springer Verlag, 2004. 
form hashing, as there is no skew in the load-distribution. [8] D. Angluin, J. Aspnes, J. Chen, Y. Wu, and Y. Yin. Fast construction of overlay networks. In SPAA, 2005. However, for data-oriented applications such a mechanism has a critical limitation, since peers are predestined for the [9] Y. Azar, B. Kalyanasundaram, S. Plotkin, K. Pruhs, and O. Waarts. On-line load balancing of temporary tasks. Journal of Algorithms, amount of load (based on the whole set of peer identiﬁers 22:93–110, 1997. generated at the beginning of the process), and there is no [10] J. Byers, J. Considine, and M. Mitzenmacher. Simple Load Balanc- ﬂexibility or adaptivity for load-balancing, particularly if ing for Distributed Hash Tables. In IPTPS, 2003. the load is skewed. Our scheme—this paper as well as [11] B. Chun, D. Culler, T. Roscoe, A. Bavier, L. Peterson, M. Wawrzo- [2]—on the other hand adaptively creates the key space par- niak, and M. Bowman. PlanetLab: An Overlay Testbed for Broad- titions and assigns peers to these partitions based on load Coverage Services. ACM SIGCOMM Computer Communication Review, 33(3), July 2003. characteristics, and is thus a more generic parallel overlay construction mechanism. For the special case of uniform [12] P. Ganesan, M. Bawa, and H. Garcia-Molina. Online Balancing of Range-Partitioned Data with Applications to Peer-to-Peer Systems. load distribution (as it is traditionally assumed in DHTs In VLDB, 2004. using uniform hashing), we can easily construct a load- ¦¤ [13] A. Y. Halevy, Z. G. Ives, J. Madhavan, P. Mork, D. Suciu, and balanced overlay by requiring C in each step of the Ê I. Tatarinov. The Piazza Peer Data Management System. TKDE, partitioning. 16(7), 2004. [14] M. Jelasity and O. Babaoglu. T-Man: Gossip-based overlay topol- 7 Conclusions ogy management. In Engineering Self-Organising Applications (ESOA’05), 2005. The fast (re-)construction of data-oriented structured over- [15] D. R. Karger and M. Ruhl. 
New Algorithms for Load Balancing in lay networks is an emerging research topic which has Peer-to-Peer Systems, 2003. IRIS Student Workshop (ISW). not yet been covered exhaustively in the literature. (Re- [16] G. Koloniari and E. Pitoura. Content-Based Routing of Path Queries )indexing due to changing application requirements is a in Peer-to-Peer Systems. In EDBT, 2004. frequent scenario in data-oriented applications and neces- [17] G. S. Manku. Balanced binary trees for ID management and load balance in distributed hash tables. In ACM PODC, 2004. sitates the efﬁcient (re-)construction of overlay networks. [18] M. Mitzenmacher. The power of two choices in randomized load Existing approaches are essentially serialized and do not balancing. IEEE Transactions on Parallel and Distributed Systems, take into account inherent intricacies like preservation of 12(10), 2001. key-ordering relationships to enable semantic processing [19] W. Nejdl, M. Wolpers, W. Siberski, C. Schmitz, M. Schlosser, on data keys. In this paper we have presented an efﬁ- o I. Brunkhorst, and A. L¨ ser. Super-peer-based routing strategies for cient, completely decentralized algorithm which supports RDF-based peer-to-peer networks. J. Web Sem., 1(2), 2004. the fast, parallel construction of structured overlay net- [20] C. G. Plaxton, R. Rajaraman, and A. W. Richa. Accessing Nearby works from scratch based on a recursive bisection scheme Copies of Replicated Objects in a Distributed Environment. In SPAA, 1997. that preserves key semantics and provides good load- balancing for skewed distributions both for storage and [21] M. Raab and A. Steger. “Balls into Bins” - A Simple and Tight Analysis. In RANDOM, 1998. replication load. We prove the efﬁciency of our approach [22] S. Ramabhadran, S. Ratnasamy, J. M. Hellerstein, and S. Shenker. by analytical results which are veriﬁed by simulation and Brief Announcement: Preﬁx Hash Tree. In ACM PODC, 2004. 
large-scale experiments of a complete system implementa- [23] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A tion on PlanetLab. The implementation is available from Scalable Content-Addressable Network. In ACM SIGCOMM, 2001. http://www.p-grid.org/. [24] A. Rowstron and P. Druschel. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Mid- References dleware, 2001. [25] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan. e [1] K. Aberer, P. Cudr´ -Mauroux, M. Hauswirth, and T. van Pelt. Grid- Chord: A Scalable Peer-To-Peer Lookup Service for Internet Appli- Vine: Building Internet-Scale Semantic Overlay Networks. In cations. In Proceedings of ACM SIGCOMM, 2001. ISWC, 2004.
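As an illustration of the vote aggregation described in Section 4.1, the following minimal sketch sums votes and piggy-backed information while replies flow back towards the initiating peer. It assumes that the flood has induced a spanning tree (each peer replies to the peer from which it first received the request) and that a simple majority decides; the names `Vote`, `aggregate`, `children`, and all field names are hypothetical and not taken from the P-Grid implementation.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    """One (possibly aggregated) reply to a voting request."""
    yes: int = 0      # peers in favour of (re-)indexing
    no: int = 0       # peers against it
    storage: int = 0  # piggy-backed: storage space offered for the new index
    items: int = 0    # piggy-backed: number of local data items to be indexed

    def merge(self, other: "Vote") -> "Vote":
        # Aggregation is a field-wise sum, so one message per link suffices.
        return Vote(self.yes + other.yes, self.no + other.no,
                    self.storage + other.storage, self.items + other.items)


def aggregate(peer, children, local):
    """Merge the local vote of `peer` with the aggregated replies of the
    peers that received the request from it (assumed spanning tree)."""
    total = local[peer]
    for child in children.get(peer, []):
        total = total.merge(aggregate(child, children, local))
    return total


# Hypothetical example: peer "a" initiates the vote.
children = {"a": ["b", "c"], "b": ["d"]}
local = {
    "a": Vote(yes=1, storage=100, items=10),
    "b": Vote(no=1, storage=50, items=10),
    "c": Vote(yes=1, storage=80, items=10),
    "d": Vote(yes=1, storage=20, items=10),
}
result = aggregate("a", children, local)
start_indexing = result.yes > result.no  # assumed decision rule: simple majority
```

Because each intermediate peer forwards a single merged record, the number of reply messages equals the number of tree links regardless of how many peers voted. The aggregated storage and item counts are the kind of information from which the initiator could derive the parameters it floods back with its decision.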

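The overlap-based estimate of the number of peers in a partition (Section 4.2) can be illustrated with a simple moment-based calculation. This is a sketch under stated assumptions and not necessarily the maximum likelihood estimator used in the paper: it assumes that every key in the partition exists in exactly `r` copies placed on distinct peers chosen uniformly at random, so that a key held by one peer is also held by a second given peer with probability (r - 1)/(n - 1). The function name `estimate_peers` and its parameters are hypothetical.

```python
def estimate_peers(keys_a, keys_b, r):
    """Estimate the number of peers n in a partition from the key sets of
    two interacting peers, assuming r uniformly placed copies per key.

    The observed overlap fraction f approximates (r - 1) / (n - 1),
    which gives n = 1 + (r - 1) / f.
    """
    if not keys_a or not keys_b:
        raise ValueError("both peers must hold at least one key")
    shared = len(keys_a & keys_b)
    # Symmetric overlap fraction relative to the average key set size.
    f = 2 * shared / (len(keys_a) + len(keys_b))
    if f == 0:
        return float("inf")  # no shared keys: the partition looks very large
    return 1 + (r - 1) / f


# Identical key sets: every peer in the partition holds every key,
# so the estimate equals the replication count.
same = estimate_peers({1, 2, 3, 4}, {1, 2, 3, 4}, r=5)

# Half of the keys are shared: the partition is estimated to be larger.
half = estimate_peers({1, 2, 3, 4}, {3, 4, 5, 6}, r=5)
```

With identical key sets the estimate reduces to the replication count, which matches the example given in Section 4.2; smaller overlaps indicate that the replicas are spread over more peers.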