           Exploiting Geographical Proximity in Overlay Networks
                        CS 8803H - Project Proposal

                                 Felix Loesch


                              February 5, 2004

1    Motivation and Objectives
Current large-scale distributed Internet applications build overlay networks
at the application level that span the world but are not congruent with the
underlying IP-level topology. This can lead to inefficient routing in which
nodes that are neighbours at the application level are very far apart at the
IP level, e.g. nodes overseas or on other continents. Most applications that
use overlay networks, such as unstructured P2P systems and replicated Web
servers, would benefit greatly from knowledge of the underlying IP-level
topology and of the relative proximity between their participating host
nodes. Introducing such proximity knowledge could significantly increase the
performance and routing efficiency of overlay networks.
    Apoidea [SSLM03], a completely distributed and decentralized Peer-to-Peer
web crawler, is based on Chord [SMK+01], a scalable peer-to-peer lookup
service for Internet applications. Currently Apoidea does not exploit the
geographical proximity of crawlers to the domains they are crawling. This
results in poor performance of the distributed crawling process, especially
when the domains being crawled are far away from the crawling peer. The
authors of Apoidea have conducted experiments on geographical proximity and
have shown that domains that are far away from the crawling peer could be
crawled twice as fast by peers that are geographically closer to those
domains. This emphasizes the need to exploit geographical proximity in order
to improve the performance of distributed web crawling applications.

2    Related Work
Peer-to-Peer (P2P) technology is not new, and numerous efforts have focused
on making it available to the public. However, in current P2P applications,
little effort is made to ensure that the application-level connectivity is
congruent with the underlying IP-level network topology. To my knowledge,
none of the current P2P systems exploits the relative proximity between its
participating host nodes. This leads to inefficient routing in those
networks and introduces unnecessary latency.
    S. Ratnasamy, M. Handley, R. Karp et al. from UC Berkeley present in
their paper [RHKS02] a simple way to exploit geographical proximity by
introducing a binning scheme whereby nodes partition themselves into bins
such that nodes that fall within a given bin are relatively close to one
another in terms of network latency. Their binning strategy is simple,
requiring only minimal support from any measurement infrastructure;
scalable, requiring no form of global knowledge; and completely distributed.
Their idea can easily be applied to overlay networks such as Chord and
Gnutella, and also to the distributed P2P crawler Apoidea.
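    As an illustration of the binning idea, the following Java sketch derives
a bin label from round-trip times to a fixed set of landmark hosts: the
landmarks are ordered by increasing latency, and each latency is additionally
mapped to a coarse level. The class name, the method names, the label format
and the level boundaries of 100 ms and 200 ms are my own assumptions for
illustration. They are not taken from Apoidea, and [RHKS02] remains the
reference for the actual scheme.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch of landmark-based binning in the spirit of [RHKS02]; all names are hypothetical. */
final class LandmarkBinning {

    /** Assumed level boundaries in milliseconds; illustrative values only. */
    static final double[] LEVEL_BOUNDS_MS = {100.0, 200.0};

    /** Maps an RTT to a coarse level: 0 below the first bound, 1 below the second, and so on. */
    static int level(double rttMs) {
        int level = 0;
        while (level < LEVEL_BOUNDS_MS.length && rttMs >= LEVEL_BOUNDS_MS[level]) {
            level++;
        }
        return level;
    }

    /**
     * Computes a bin label from measured RTTs, where rttMs[i] is the RTT to landmark i.
     * The label lists landmark indices by increasing RTT, followed by the matching levels.
     */
    static String binOf(double[] rttMs) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < rttMs.length; i++) {
            order.add(i);
        }
        // List.sort is stable, so landmarks with equal RTT keep index order.
        order.sort(Comparator.comparingDouble(i -> rttMs[i]));

        StringBuilder landmarks = new StringBuilder();
        StringBuilder levels = new StringBuilder();
        for (int i : order) {
            landmarks.append('l').append(i);
            levels.append(level(rttMs[i]));
        }
        return landmarks + ":" + levels;
    }

    public static void main(String[] args) {
        // Made-up measurements to three landmarks, in milliseconds.
        double[] nodeA = {232.0, 51.0, 117.0};
        double[] nodeB = {250.0, 40.0, 150.0};
        System.out.println(binOf(nodeA));
        System.out.println(binOf(nodeB));
    }
}
```

    With the made-up measurements in main, both nodes see the landmarks in
the same order and at the same levels, so both receive the label l1l2l0:012
and would be treated as close to one another.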
    More elaborate solutions to the problem of exploiting geographical
proximity also exist. These solutions try to improve the overlay structure
gradually over time [Fra00]. Depending on the time-scales on which
participant nodes join and leave the application, these solutions could be
useful for solving the geographical proximity problem. If the set of crawling
peers in Apoidea were largely static, these solutions could be applied, but
this would pose unnecessary constraints on the flexibility of Apoidea. The
binning strategy described in [RHKS02] is able to handle the joining and
leaving of nodes on short time-scales and does not pose any limitations on
the flexibility of Apoidea.

3    Proposed Work
The proposed work for this project builds on the existing Apoidea system,
which provides a completely distributed web crawling system and uses
Distributed Hash Table (DHT) based protocols. The goal of this project is to
improve Apoidea's awareness of geographical proximity by implementing the
binning scheme described in [RHKS02]. To achieve this goal, I plan to update
the overall mapping scheme in Apoidea so that the peer that is geographically
nearest to a domain is always the one that crawls that domain. This is far
more complex than providing only local proximity awareness, where a peer
just maintains a small list of peers that are geographically close to it and
hands the crawling job to the peer in that list that is closest to the
domain.
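    For contrast, the simpler local variant can be sketched in a few lines of
Java. The sketch assumes that every peer in the small neighbour list,
including the peer that currently owns the job, can report a round-trip time
estimate to the target domain. The Candidate class, its fields and
chooseCrawler are hypothetical names and not part of the Apoidea code base.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Sketch of the local hand-off variant; names are hypothetical, not Apoidea's API. */
final class LocalHandoff {

    /** A nearby peer with its (assumed) measured round-trip time to the target domain. */
    static final class Candidate {
        final String peerId;
        final double rttToDomainMs;

        Candidate(String peerId, double rttToDomainMs) {
            this.peerId = peerId;
            this.rttToDomainMs = rttToDomainMs;
        }
    }

    /**
     * Returns the id of the candidate with the lowest RTT to the domain,
     * or an empty Optional if the neighbour list is empty.
     */
    static Optional<String> chooseCrawler(List<Candidate> neighbours) {
        return neighbours.stream()
                .min(Comparator.comparingDouble(c -> c.rttToDomainMs))
                .map(c -> c.peerId);
    }

    public static void main(String[] args) {
        // Made-up RTT estimates to one target domain, in milliseconds.
        List<Candidate> neighbours = Arrays.asList(
                new Candidate("self", 180.0),
                new Candidate("peerA", 95.0),
                new Candidate("peerB", 35.0));
        System.out.println(chooseCrawler(neighbours).orElse("none"));
    }
}
```

    With the made-up estimates in main, the job is handed to peerB. The
global variant proposed above differs in that the choice would have to be
made over all peers in the overlay instead of a short local list.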

4    Plan of Action
The project will use and extend the existing Apoidea code. Apoidea is
implemented in Java, which will also be used for the implementation of the
geographical proximity part. The following table gives an overview of what
will be done in each week:

 Week    Action
 1       Familiarize with Apoidea Structure and Source Code
 2       Design for Proximity Implementation
 3       Coding Geographical Proximity
 4       More Coding Geographical Proximity
 5       Test Implementation
 6       Refine Implementation
 7       Begin Evaluation of Implementation
 8       Continue Evaluation and Update Implementation
 9       Begin Final Report
 10      Final Report

5    Evaluation and Testing Method
In order to evaluate the performance gain of exploiting geographical
proximity in Apoidea, I would like to run several experiments that compare
Apoidea without geographical proximity extensions against Apoidea with the
extensions added. The test parameters will be the same for both
configurations. The quantitative results of these experiments will allow me
to show the performance gain from exploiting geographical proximity. If time
permits, these experiments could be conducted with different numbers of
peers and different geographical locations of the domains being crawled.
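    One simple way to summarize such a comparison is the ratio of mean crawl
times between the two configurations, sketched below in Java. The choice of
this metric, the class and method names, and the run times in main are
assumptions for illustration only; the run times are invented placeholders,
not measurements.

```java
/** Sketch of a crawl-time comparison; names and numbers are illustrative assumptions. */
final class CrawlComparison {

    /** Arithmetic mean of the given run times in seconds. */
    static double mean(double[] seconds) {
        if (seconds.length == 0) {
            throw new IllegalArgumentException("at least one run is required");
        }
        double sum = 0.0;
        for (double s : seconds) {
            sum += s;
        }
        return sum / seconds.length;
    }

    /** Speedup of the proximity-aware crawler; values above 1.0 mean it was faster. */
    static double speedup(double[] baselineSeconds, double[] proximitySeconds) {
        return mean(baselineSeconds) / mean(proximitySeconds);
    }

    public static void main(String[] args) {
        // Placeholder run times for the same set of domains, in seconds.
        double[] baseline = {120.0, 130.0, 125.0};
        double[] withProximity = {60.0, 65.0, 62.5};
        System.out.println(speedup(baseline, withProximity));
    }
}
```

    With the placeholder values, the program prints 2.0, which would
correspond to crawling the same domains twice as fast.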

[Fra00]    P. Francis. Yoid: Extending the internet multicast architecture,
           2000.

[RHKS02] S. Ratnasamy, M. Handley, R. Karp, and S. Shenker.
         Topologically-aware overlay construction and server selection. In
         Proceedings of IEEE INFOCOM’02, June 2002.

[SMK+01] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and
         H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for
         internet applications. In Proceedings of the 2001 conference on
         applications, technologies, architectures, and protocols for
         computer communications, pages 149–160. ACM Press, 2001.

[SSLM03] A. Singh, M. Srivatsa, L. Liu, and T. Miller. Apoidea: A de-
         centralized peer-to-peer architecture for crawling the world wide
         web. In Proceedings of SIGIR 2003 Workshop on Distributed In-
         formation Retrieval, 2003.

