Peer-to-Peer Computing

          CSC8530 – Dr. Prasad
          Jon A. Preston
          April 21, 2004

   • Overview of Peer-to-Peer Computing
   • Parallel Downloading
   • Peer-to-Peer Media Streaming
   • References

   • Collaborative Software Engineering
Peer-to-Peer Computing

   • Autonomy from centralized servers
   • Dynamic (peers are added and removed)

   • File sharing (KaZaA – outpaces Web traffic,
    3,000 terabytes, 3 million peers online)
   • Communication (instant messaging)
   • Computation (SETI@home)
Peer-to-Peer Computing (cont.)

   • Decentralized data sharing
   • Dynamic growth of system capacity
   • Various data lookup/discovery schemes
    –   Centralized directory servers (Napster)
    –   Controlled request flooding (Gnutella)
    –   Hierarchy with supernodes (KaZaA)
   • Heterogeneous collection of peers
    –   Need a way of encouraging peers to report their true
        outgoing bandwidth
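The controlled flooding used by Gnutella can be sketched as a breadth-first search that stops forwarding once a query's time-to-live (TTL) reaches zero. This is a simplified model, not the Gnutella wire protocol: the `overlay` dictionary, the `has_file` predicate, and the way messages are counted are illustrative assumptions.

```python
from collections import deque


def flood_query(graph, source, has_file, ttl):
    """TTL-limited query flooding over an overlay graph.

    graph maps each peer to a list of neighbours; has_file(peer) reports
    whether a peer can answer the query. Returns (peers that answered,
    number of query messages sent).
    """
    seen = {source}
    hits = set()
    messages = 0
    frontier = deque([(source, ttl, None)])  # (peer, remaining TTL, sender)
    while frontier:
        peer, remaining, sender = frontier.popleft()
        if peer != source and has_file(peer):
            hits.add(peer)
        if remaining == 0:
            continue  # TTL exhausted: this peer does not forward
        for neighbour in graph[peer]:
            if neighbour == sender:
                continue  # never echo the query back to whoever sent it
            messages += 1  # each forward costs a message, even to a duplicate
            if neighbour not in seen:  # duplicates are dropped, not forwarded
                seen.add(neighbour)
                frontier.append((neighbour, remaining - 1, peer))
    return hits, messages


overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
print(flood_query(overlay, "A", lambda p: p == "E", ttl=2))  # (set(), 4)
print(flood_query(overlay, "A", lambda p: p == "E", ttl=3))  # ({'E'}, 6)
```

Raising the TTL from 2 to 3 reaches peer E but costs two extra messages; bounding that growth is what makes the flooding "controlled".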
Worldwide Computer
(P2P Computation)

   • “Moonlight” your computer
   • Share/lease processor and storage
   • Process others’ simulations, etc.
   • Archive others’ files (available even when the owner’s
    computer is off)
   • Receive micropayments for services rendered
   • Your PC becomes a component of the worldwide computer
   • “Internet-scale OS” – centralized structure
    –   Must handle resource allocation, coordination,
        security/privacy, etc.
Parallel Downloading

   • Potential for widespread use in P2P systems
   • Past work shows that parallel downloading (PD) achieves
    higher aggregate download throughput
   • Shorter download times for clients
Communication in PD

   • Client must determine which segments of the file to request
    from each server
   • Alternative: “Tornado codes”
    –   Servers keep sending until the client says “enough”
    –   Requires less communication about how much of the file,
        and which part of it, the client wants
    –   Does require heavy buffering on the client (entire file)
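Without Tornado codes, the client has to choose the segments itself. A minimal sketch, assuming the client already knows each server's bandwidth (in integer units) and gives each server one contiguous byte range proportional to it; `split_ranges` is a hypothetical helper, not part of any particular P2P client.

```python
def split_ranges(file_size, bandwidths):
    """Split bytes [0, file_size) into one contiguous range per server.

    Each range is sized in proportion to that server's bandwidth, so all
    servers should finish at about the same time. Bytes lost to integer
    rounding are given to the last server.
    """
    total = sum(bandwidths)
    ranges = []
    start = 0
    for i, bw in enumerate(bandwidths):
        if i == len(bandwidths) - 1:
            end = file_size  # last server absorbs any rounding remainder
        else:
            end = start + file_size * bw // total
        ranges.append((start, end))  # half-open range [start, end)
        start = end
    return ranges


# Three servers offering 50, 30 and 20 KB/s for a 1,000-byte file.
print(split_ranges(1000, [50, 30, 20]))  # [(0, 500), (500, 800), (800, 1000)]
```

Each range could then be fetched independently, for example with an HTTP `Range` request whose last byte is `end - 1`, since HTTP byte ranges are inclusive.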
Parallel vs. Sequential Download

   • Parallel downloading incurs non-trivial costs
    –   Synchronization
    –   Coordination
    –   Encoding/decoding
   • Adopt PD only if download performance improves
Large-Scale Deployment of PD

   • Koo et al. developed a model (May 2003) showing that
    sequential downloading (SD) is better than PD
    –   Assumes that Capacity_servers ≫ Capacity_clients
    –   Homogeneous network
    –   Analyzed average download time
    –   Performance is similar, but SD requires less overhead
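The intuition behind the capacity assumption can be checked with a back-of-the-envelope bottleneck model. This is an illustrative assumption, not the model from Koo et al.: download time is the file size divided by the smaller of the client's capacity and the servers' combined rate, and PD pays a fixed coordination overhead (all function names are hypothetical).

```python
def sd_time(file_size, client_cap, server_rate):
    """Sequential download from one server: the slower end is the bottleneck."""
    return file_size / min(client_cap, server_rate)


def pd_time(file_size, client_cap, server_rates, overhead):
    """Parallel download: server rates add up, but the client link still
    caps the total; overhead models synchronization/coordination cost."""
    return file_size / min(client_cap, sum(server_rates)) + overhead


def prefer_parallel(file_size, client_cap, server_rates, overhead):
    """Adopt PD only if it beats downloading from the single fastest server."""
    sequential = sd_time(file_size, client_cap, max(server_rates))
    return pd_time(file_size, client_cap, server_rates, overhead) < sequential


# Servers much faster than the client: the client link is the bottleneck.
print(prefer_parallel(100, 1, [5, 5, 5], overhead=2))   # False (102 s vs 100 s)
# Client faster than any single server: aggregating servers pays off.
print(prefer_parallel(100, 10, [2, 2, 2], overhead=2))  # True (about 18.7 s vs 50 s)
```

Under this toy model, once server capacity greatly exceeds client capacity, PD cannot go faster than the client link and only adds overhead.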
Peer-to-Peer Media Streaming

   • Peer-to-peer file sharing
    –   Peers act as both server and client
    –   “Open-after-download”
   • Media streaming
    –   “Play-while-downloading”
    –   A subset of peers “owns” a media file
    –   These peers stream the media to requesting peers
    –   Recipients become supplying peers themselves
Characteristics of P2P Media
Streaming Systems

   • Self-growing – requesting peers become supplying
    peers (total system capacity grows)
   • Serverless – no peer acts as a server (i.e., no peer opens
    a large number of simultaneous client connections)
   • Heterogeneous – peers contribute different
    outbound connection bandwidths
   • Many-to-one – many supplying peers serve one client
    that plays in real time (hard deadlines)
Two Problems

   • Media data assignment

   • Fast amplification
Media Data Assignment

   • Given
    –   Requesting peer
    –   Multiple supplying peers
    –   Heterogeneous outbound bandwidth on suppliers
   • Determine
    –   Subset of media to request from each supplier

Buffer Delays

Buffer delay depends on which segments of the media
file are obtained from each supplying peer, and in
what order.
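A small simulation makes the ordering effect concrete. It assumes (hypothetically) that each supplier sends its assigned segments back to back, that a supplier with bandwidth b, expressed as a fraction of the playback rate, needs 1/b time units per segment, and that each segment plays for one time unit. The greedy assignment is a simple heuristic for illustration, not the assignment algorithm from Xu et al.

```python
from fractions import Fraction


def arrival_times(assignment, bandwidths):
    """assignment[j] is the supplier that sends segment j; each supplier
    sends its segments back to back in playback order."""
    clock = [Fraction(0)] * len(bandwidths)
    arrivals = []
    for supplier in assignment:
        clock[supplier] += 1 / Fraction(bandwidths[supplier])
        arrivals.append(clock[supplier])
    return arrivals


def buffer_delay(assignment, bandwidths):
    """Smallest start-up delay D such that segment j, played during
    [D + j, D + j + 1], has fully arrived by time D + j."""
    arrivals = arrival_times(assignment, bandwidths)
    return max(t - j for j, t in enumerate(arrivals))


def greedy_assignment(num_segments, bandwidths):
    """Hypothetical heuristic: give each segment, in playback order, to the
    supplier that would finish sending it earliest (ties -> lowest index)."""
    next_done = [1 / Fraction(b) for b in bandwidths]
    assignment = []
    for _ in range(num_segments):
        s = min(range(len(bandwidths)), key=lambda i: (next_done[i], i))
        assignment.append(s)
        next_done[s] += 1 / Fraction(bandwidths[s])
    return assignment


bw = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # sums to the playback rate
contiguous = [0, 0, 0, 0, 1, 1, 2, 2]  # each supplier sends one contiguous block
print(buffer_delay(contiguous, bw))  # 5
greedy = greedy_assignment(8, bw)
print(greedy)  # [0, 0, 1, 2, 0, 0, 1, 2]
print(buffer_delay(greedy, bw))  # 3
```

Both assignments give each supplier the same number of segments; only the ordering differs, yet interleaving cuts the required start-up delay from 5 to 3 time units.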
Fast Amplification

   • Differential selection algorithm
    –   Favors higher-class peers (higher outbound bandwidth)
    –   Ultimately benefits all requesting peers
    –   Should not starve any lower-class peer
    –   Enforced via a purely distributed algorithm
    –   Probability of selection proportional to the requesting
        peer’s promised outbound bandwidth
Selection Algorithm

   • Each supplying peer
    –   Determines which requesting peer to serve
    –   Maintains a probability vector with one entry per class
        of peers (class defined by bandwidth)
    –   Receives “reminders” from requesting peers
            ◦ If the supplier (Ps) is busy, it can receive a reminder
             from a requesting peer (Pr)
            ◦ The reminder tells the supplier to remember Pr and not
             to elevate peers in classes below Pr when the current
             service completes
Admission Probability Vector

   • One entry per class-i set of peers
   • If not busy, Ps grants the request of Pr with probability
    Pr[i], where i = class of Pr
   • If Ps is a class-k peer, Pr[i] is defined as follows
    –   For i < k, Pr[i] = 1.0 (favored classes)
    –   For i >= k, Pr[i] = 1/2^(i-k)
   • If idle, elevate non-favored (and non-served) entries
    by a factor of 2 (i.e., Pr[i] = Pr[i] × 2)
   • Use reminders to decide what happens after service
    completes (raise or not)
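The vector and reminder rules can be sketched as a small class. The `SupplyingPeer` name, the class numbering (1 is the highest-bandwidth class), the cap at 1.0, and the reminder rule (after a session, elevate only classes at or above the best reminding class) are assumptions made for this sketch; the idle timeout is modeled as a method call rather than a timer. If class-i peers offer bandwidth proportional to 1/2^i (also an assumption), the initial entries for i >= k are proportional to the requester's bandwidth, matching the fast-amplification goal.

```python
import random


class SupplyingPeer:
    """Admission logic of a class-k supplying peer Ps (simplified sketch).

    Classes are numbered 1..num_classes; a lower number means a higher
    outbound bandwidth, so classes i < k are the favored ones.
    """

    def __init__(self, k, num_classes, rng=None):
        self.k = k
        self.num_classes = num_classes
        self.busy = False
        self.reminders = []  # classes of requesting peers that found Ps busy
        self.rng = rng or random.Random()
        # Pr[i] = 1.0 for i < k, and 1 / 2**(i - k) for i >= k.
        self.prob = {
            i: 1.0 if i < k else 1.0 / 2 ** (i - k)
            for i in range(1, num_classes + 1)
        }

    def handle_request(self, requester_class):
        """Return True if Ps admits a requesting peer of the given class."""
        if self.busy:
            self.reminders.append(requester_class)  # Pr leaves a reminder
            return False
        if self.rng.random() < self.prob[requester_class]:
            self.busy = True
            return True
        return False

    def on_idle_timeout(self):
        """Idle for a while: double every non-favored entry (capped at 1.0)."""
        self._elevate(up_to=self.num_classes)

    def on_service_complete(self):
        """Session over. Assumption: pending reminders stop Ps from elevating
        classes below the best (lowest-numbered) reminding class."""
        self.busy = False
        up_to = min(self.reminders) if self.reminders else self.num_classes
        self.reminders.clear()
        self._elevate(up_to)

    def _elevate(self, up_to):
        for i in range(self.k, up_to + 1):
            self.prob[i] = min(1.0, self.prob[i] * 2)


ps = SupplyingPeer(k=2, num_classes=4, rng=random.Random(7))
print(ps.prob)  # {1: 1.0, 2: 1.0, 3: 0.5, 4: 0.25}
ps.on_idle_timeout()
print(ps.prob)  # {1: 1.0, 2: 1.0, 3: 1.0, 4: 0.5}
```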
Making a Request

   • Pr knows the candidate supplying peers {Ps1, Ps2, … Psn}
   • Pr is admitted if it obtains permission from enough
    suppliers that their aggregated outbound bandwidth is
    sufficient to service the request
    –   The requesting peer then computes the media data assignment
   • If not admitted, Pr sends “reminders” to busy supplying
    peers that favor Pr and backs off exponentially
   • When the request is finished, Pr becomes a supplying
    peer, increasing the overall system capacity
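The requesting side can be sketched as a retry loop. Here `ask` stands in for a hypothetical network call to a supplier, suppliers are asked in descending order of bandwidth (an assumed policy), and the backoff delays are returned instead of slept so the logic is easy to inspect.

```python
def request_media(required_bw, suppliers, ask, max_attempts=4, base_delay=1.0):
    """Hypothetical request loop for a requesting peer Pr.

    suppliers: dict supplier_id -> promised outbound bandwidth.
    ask(supplier_id, attempt) -> True if that supplier grants permission.
    Returns (granted_ids, backoff_delays); granted_ids is None if Pr was
    never admitted.
    """
    delays = []
    for attempt in range(max_attempts):
        granted, total = [], 0.0
        # Ask the highest-bandwidth candidates first (an assumed policy) and
        # stop as soon as the aggregated bandwidth covers the request.
        for sid, bw in sorted(suppliers.items(), key=lambda kv: -kv[1]):
            if total >= required_bw:
                break
            if ask(sid, attempt):
                granted.append(sid)
                total += bw
        if total >= required_bw:
            return granted, delays
        # Not admitted: permissions from this round are assumed released and
        # reminders would be sent here; wait exponentially longer each time.
        delays.append(base_delay * 2 ** attempt)
    return None, delays


candidates = {"Ps1": 0.5, "Ps2": 0.25, "Ps3": 0.25, "Ps4": 0.25}


def ps1_busy_twice(sid, attempt):
    return not (sid == "Ps1" and attempt < 2)


print(request_media(1.0, candidates, ps1_busy_twice))
# (['Ps1', 'Ps2', 'Ps3'], [1.0, 2.0])
```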
Differential Acceptance Results
Non-differential Acceptance Results

References

   • S. Koo, C. Rosenberg, D. Xu, “Analysis of Parallel Downloading
    for Large File Distribution,” Proceedings of the IEEE International
    Workshop on Future Trends in Distributed Computing Systems
    (FTDCS 2003), San Juan, PR, May 2003.
   • D. Xu, M. Hefeeda, S. Hambrusch, B. Bhargava, “On Peer-to-Peer
    Media Streaming,” Proceedings of the IEEE International Conference
    on Distributed Computing Systems (ICDCS 2002), Vienna, Austria,
    July 2002.
   • M. Ripeanu, “Peer-to-Peer Architecture Case Study: Gnutella
    Network,” International Conference on Peer-to-Peer Computing, 2001.
   • J. Kangasharju, K. W. Ross, D. Turner, “Adaptive Content
    Management in Structured P2P Communities,” 2002.
   • S. Androutsellis-Theotokis, “A Survey of Peer-to-Peer File
    Sharing Technologies,” white paper, Athens University of Economics
    and Business, Greece, 2002.
Collaborative Software Engineering

   • Overview of Collaborative Computing
   • Synchronous and Asynchronous
   • Notification Algorithms
   • Distributed Mutex
   • Achieving “undo” and “redo”
   • Transparencies vs. Awareness
   • Distributed Software Engineering
Overview of Collaborative Computing

   • Use computing to improve workflow
    –   Shared displays/applications
    –   Online meetings
    –   Collaborative development (configuration management)
    –   Minimize the impact of physical distance
   • Collaboratories
    –   Emulate scientific labs
Synchronous and Asynchronous

   • Synchronous
    –   Same time, different place
    –   ICQ, chat, etc.
    –   Can store the session
   • Asynchronous
    –   Different time, same or different place
    –   Email, newsgroups, web forums
    –   Store the session, replay
Notification Algorithms

   • Unicast
    –   Latency is a potential issue
   • Multicast
    –   Significant bandwidth consumption
    –   Network flooding
   • Frequency
    –   Synchronous implies a high frequency of change notifications
    –   Asynchronous implies a low frequency of change notifications
   • Granularity
    –   Send differentials or the whole state
    –   How to incorporate new users (latecomers)
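The granularity trade-off can be illustrated with a hypothetical `SharedDocument` that sends each current member only the differential for an update, while a latecomer receives a single whole-state snapshot on joining. Delivery is a plain loop over members (unicast), and the in-memory inboxes stand in for network messages; both are assumptions of this sketch.

```python
class SharedDocument:
    """Hypothetical shared state with differential change notifications."""

    def __init__(self):
        self.state = {}
        self.members = {}  # member name -> list of messages received

    def join(self, name):
        # A latecomer gets one whole-state snapshot (a copy) to catch up.
        self.members[name] = [("snapshot", dict(self.state))]

    def update(self, key, value):
        self.state[key] = value
        diff = ("diff", {key: value})  # only what changed
        for inbox in self.members.values():
            inbox.append(diff)  # unicast: one send per member


doc = SharedDocument()
doc.join("ann")
doc.update("title", "Draft")
doc.join("bob")  # latecomer
doc.update("title", "Final")
print(doc.members["bob"])
# [('snapshot', {'title': 'Draft'}), ('diff', {'title': 'Final'})]
```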
Distributed Mutex

   • Token-based
    –   Only the process that holds the token can enter the critical
        section (CS)
    –   Token transmission algorithm (round-robin, or hold and wait
        for a request)
    –   How does a process know where to request the token?
   • Permission-based
    –   A process sends a request to enter the CS to the other processes
    –   The other processes get to “vote”
    –   The process enters the CS only if it obtains enough votes
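A permission-based scheme can be sketched with simple majority voting: each process grants at most one vote at a time, and a requester enters the CS only with a strict majority. Because any two majorities overlap, two processes cannot both hold the CS. This is a simplified illustration with hypothetical names; it omits the timestamps or request queues that practical algorithms (for example Ricart–Agrawala or Maekawa's algorithm) use, so concurrent requesters that split the vote simply give their votes back and must retry.

```python
class Voter:
    """A process that grants at most one vote at a time."""

    def __init__(self):
        self.voted_for = None

    def request_vote(self, candidate):
        if self.voted_for is None:
            self.voted_for = candidate
            return True
        return False

    def release(self, candidate):
        if self.voted_for == candidate:
            self.voted_for = None


def try_enter_cs(candidate, voters):
    """Ask every process for a vote; enter only with a strict majority.
    On failure, return the collected votes so others can proceed."""
    granted = [v for v in voters if v.request_vote(candidate)]
    if len(granted) > len(voters) // 2:
        return True
    for v in granted:
        v.release(candidate)
    return False


def leave_cs(candidate, voters):
    for v in voters:
        v.release(candidate)


voters = [Voter() for _ in range(5)]
print(try_enter_cs("P1", voters))  # True
print(try_enter_cs("P2", voters))  # False: P1 still holds the votes
leave_cs("P1", voters)
print(try_enter_cs("P2", voters))  # True
```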
Achieving “undo” and “redo”

   • Particularly important in collaborative systems
     –   High level of “what if” inherent in the system
     –   One user’s actions might adversely affect someone else’s work
   • In OO-based systems, undo and redo are inverses of each other
   • In text-based systems, insert and delete are inverses of each other
   • In bitmap-based systems, undo and redo are not so easy
     –   Save the entire image (too much space)
     –   Save only the differential area (replay the sequence of actions
         to recreate the image)
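For text, the inverse relationship makes undo and redo straightforward: record each edit, and undo it by applying its inverse. A minimal single-user sketch follows; `TextBuffer` is a hypothetical class, and it ignores concurrent edits from other users, whose positions would first need to be adjusted.

```python
class TextBuffer:
    """Undo/redo for text, using insert and delete as inverse operations."""

    def __init__(self, text=""):
        self.text = text
        self.undo_stack = []
        self.redo_stack = []

    def _apply(self, op):
        kind, pos, s = op
        if kind == "insert":
            self.text = self.text[:pos] + s + self.text[pos:]
        else:  # "delete": s is the exact text removed at pos
            self.text = self.text[:pos] + self.text[pos + len(s):]

    @staticmethod
    def _inverse(op):
        kind, pos, s = op
        return ("delete" if kind == "insert" else "insert", pos, s)

    def _do(self, op):
        self._apply(op)
        self.undo_stack.append(op)
        self.redo_stack.clear()  # a new edit invalidates the redo history

    def insert(self, pos, s):
        self._do(("insert", pos, s))

    def delete(self, pos, length):
        # Record the deleted text so the inverse (an insert) can restore it.
        self._do(("delete", pos, self.text[pos:pos + length]))

    def undo(self):
        op = self.undo_stack.pop()
        self._apply(self._inverse(op))
        self.redo_stack.append(op)

    def redo(self):
        op = self.redo_stack.pop()
        self._apply(op)
        self.undo_stack.append(op)


buf = TextBuffer("hello")
buf.insert(5, " world")
buf.delete(0, 1)
print(buf.text)  # ello world
buf.undo()
buf.undo()
print(buf.text)  # hello
buf.redo()
print(buf.text)  # hello world
```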
Transparencies vs. Awareness

   • Does the application know about the collaboration or not?
    –   Transparencies
            ◦ Communication layer sits on top of the application
            ◦ Useful for sharing legacy systems
            ◦ Applies when there is no access to the source (or it
             cannot be modified)
            ◦ Negative – no concurrency (one input/output at a time)
    –   Aware applications
            ◦ Collaboration is integrated into the application
            ◦ Requires centralized execution with distributed I/O
            ◦ Or requires a homogeneous architecture (same client on
             each user’s machine)
Distributed Software Engineering

   • Synchronous and asynchronous
   • Provide a meta view of others in the system
   • Allow viewing of the entire current system
   • Fine-grained source locking/check-out
   • Provide a sandbox for developers to test/build
    local source
   • How do we improve concurrency?
Handling Concurrent Development

   • Split-combine (low level of concurrency)
   • Copy-merge (high level of concurrency, but
    problematic to merge)
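Copy-merge can be illustrated with a toy three-way merge that compares each line of two edited copies against their common base. `merge_lines` is a hypothetical helper that only handles line-for-line edits (no inserted or deleted lines), a deliberate simplification; real merge tools first align the versions with a diff.

```python
def merge_lines(base, mine, theirs):
    """Toy copy-merge of two edited copies of the same base.

    All three versions must have the same number of lines. Returns
    (merged lines, indices of conflicting lines).
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("toy merge only handles line-for-line edits")
    merged, conflicts = [], []
    for n, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t or t == b:
            merged.append(m)  # same change on both sides, or only mine changed
        elif m == b:
            merged.append(t)  # only theirs changed
        else:
            merged.append(m)  # both changed differently: keep mine, flag it
            conflicts.append(n)
    return merged, conflicts


print(merge_lines(["a", "b", "c"], ["a", "B", "c"], ["a", "b", "C"]))
# (['a', 'B', 'C'], [])
print(merge_lines(["a", "b"], ["x", "b"], ["y", "b"]))
# (['x', 'b'], [0])
```

Edits to different lines merge cleanly; the second call shows the "problematic to merge" case, where both developers changed the same line.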
