					      Cluster Computing on the Fly
    Peer-to-Peer Scheduling
  of Idle Cycles in the Internet
Virginia Lo, Daniel Zappala, Dayi Zhou, Shanyu Zhao, and Yuhong Liu
                       Network Research Group
                         University of Oregon
Cycle Sharing Motivation
   A variety of users and their applications need
    additional computational resources
   Many machines throughout the Internet lie idle
    for long periods of time
    Many users are willing to donate cycles

   How to provide cycles to the widest range of
    users? (beyond institutional barriers)
Cycle Sharing Ancestors
   Condor - load sharing in one institution. -->
    Condor-G, Condor flocks
   Grid Computing - coordinated use of
    resources at supercomputer centers for large
    scale scientific applications --> peer-based
    Grid computing
   SETI@home - volunteer computing using
    client server model --> BOINC
Condor (U. of Wisconsin, Livny, 1990s)

         Online Access to
         Scientific Instruments

[Figure: Advanced Photon Source pipeline - real-time collection,
tomographic reconstruction, archival storage, and desktop & VR clients
with shared controls]

       DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago
The 13.6 TF TeraGrid:
Computing at 40 Gb/s

[Figure: four sites linked by external networks - SDSC (4.1 TF, 225 TB,
HPSS), NCSA/PACI (8 TF, 240 TB, UniTree), Caltech, and Argonne, each
with local site resources]

TeraGrid/DTF: NCSA, SDSC, Caltech, Argonne
Layered Grid Architecture and Globus Toolkit
(Foster and Kesselman)

Grid layers (Internet protocol architecture analogue in parentheses):

   Collective (Application): "Coordinating multiple resources" -
    ubiquitous infrastructure services, app-specific distributed services
   Resource: "Sharing single resources" - negotiating access, controlling use
   Connectivity (Transport): "Talking to things" - communication
    (Internet protocols) & security
   Fabric (Link): "Controlling things locally" - access to, & control of,
    resources

CCOF Goals and Assumptions
     Cycle sharing in an open peer-to-peer environment

     P2P model; adopt principles from P2P networking

     Application-specific scheduling

     Long term fairness

     Hosts retain local control, sandbox
CCOF Research Issues
   Incentives and fairness
       What incentives are needed to encourage hosts to donate cycles?
       How to keep track of resources consumed vs. resources contributed?
       How to prevent resource hogs from taking an unfair share?
   Resource discovery
       How to discover hosts in a highly dynamic environment
        (hosts come and go, withdraw cycles, fail)
       How to discover hosts that can be trusted, that will provide
        the needed resources?
CCOF Research Issues
   Verification, trust, and reputation
       How to check returned results?
       How to catch malicious or misbehaving hosts that change
        results with low frequency?
       Which reputation system?
   Application-based scheduling
       How does trust and reputation influence scheduling?
       How should a host decide from whom to accept work?
CCOF Research Issues
   Quality of service and performance monitoring
       How to provide local admission control?
       How to evaluate and provide QoS - guaranteed versus
        predictive service?
   Security
       How to prevent attacks launched from guest code running
        on the host?
       How to prevent denial of service attacks in which useless
        code occupies many hosts?
Related Work
    Systems most closely resembling CCOF
     SHARP (Fu, Chase, Chun, Schwab, Vahdat, 2003)
     Partage, Self-organizing Flock of Condors (Hu, Butt, Zhang, 2003)
     BOINC (Anderson, 2003) - limited to donation of cycles to workpile
    Resource discovery
    (Iamnitchi and Foster, 2002); Condor matchmaking
    Load sharing within and across institutions
     Condor, Condor Flocks, Grid computing
    Incentives and Fairness
     See Berkeley Workshop on Economics of P2P Systems
     OurGrid (Andrade, Cirne, Brasileiro, Roisenberg, 2003)
    Trust and Reputation
     EigenRep (Kamvar, Schlosser, Garcia-Molina, 2003); TrustMe(Singh
     and Liu, 2003)
Cycle Sharing Applications
Four classes of applications that can benefit
  from harvesting idle cycles:

   Infinite workpile
   Workpile with deadlines
   Tree-based search
   Point-of-Presence (PoP)
    Infinite workpile
   Consume huge amounts of compute time
   Master-slave model
   "Embarrassingly parallel": no communication among tasks

   Ex: SETI@home, Stanford Folding, etc.
    Workpile with deadlines
   Similar to infinite workpile but more moderate
   Must be completed by a deadline (days or weeks)
   Some capable of increasingly refined results given
    extra time

   Ex: simulations with a large parameter space, ray
    tracing, genetic algorithms
    Tree-based Search
   Tree of slave processes rooted in single master node
   Dynamic growth as search space is expanded
   Dynamic pruning as costly solutions are abandoned
   Low amount of communication among slave
    processes to share lower bounds

   Ex: distributed branch and bound, alpha-beta
    search, recursive backtracking
    Point-of-Presence (PoP)
   Minimal consumption of CPU cycles
   Require placement of application code dispersed
    throughout the Internet to meet specific location,
    topological distribution, or resource requirements

   Ex: security monitoring systems, traffic analysis
    systems, protocol testing, distributed games
CCOF Architecture
CCOF Architecture
   Cycle sharing communities based on factors such
    as interest, geography, performance, trust, or generic
    willingness to share.

        Span institutional boundaries without institutional barriers
       A host can belong to more than one community
       May want to control community membership
CCOF Architecture
   Application schedulers to discover hosts,
    negotiate access, export code, and collect and
    verify results

        Application-specific (tailored to the needs of the application)
        Resource discovery
        Monitors jobs for progress; checks jobs for correctness
       Kills or migrates jobs as needed
CCOF Architecture (cont.)
   Local schedulers enforce local policy
      Run in background mode v. preempt when
       user returns
      QoS through admission control and
       reservation policies
      Local machine protected through sandbox

      Tight control over communication
CCOF Architecture (cont.)
   Coordinated scheduling

       Across local schedulers, across application
       Enforce long-term fairness
       Enhance resource discovery through
        information exchange
CCOF Projects
   Wave Scheduler [JSSPP'05, IPDPS'06]
   Resource discovery [CCGRID‟03]
   Result verification [IEEE P2P‟05]
   Point-of-Presence Scheduler [HOTP2P‟05]
Wave Scheduler
   Well-suited for workpile with deadlines
   Provides on-going access to dedicated
    cycles by following night timezones
    around the globe
   Uses a CAN-based overlay to organize
    hosts by timezone
Wave Scheduler
Resource Discovery
(Zhou and Lo, WGP2P'04 at CC-Grid '04)

   Highly dynamic environment (hosts come, go)
   Hosts maintain profiles of blocks of idle time

Four basic search methods
 Rendezvous points

 Host advertisements

 Client expanding ring search

 Client random walk search
Resource Discovery

   Rendezvous points perform best:
    high job completion rate and low msg
    overhead, but they favor large jobs under
    heavy load
   ==> coordinated scheduling needed for long
    term fairness
CCOF Verification
Goal: Verify correctness of returned results for
  workpile and workpile with deadline

   Quizzes = easily verifiable computations that are
    indistinguishable from the actual work
   Standalone quiz v. Embedded quizzes
   Quiz performance stored in reputation system

   Quizzes v. replication
Point-of-Presence Scheduler
   Scalable protocols for identifying selected hosts in
    the community overlay network such that each
    ordinary node is within k hops of C of the selected
    hosts

   (C,k) dominating set problem

   Useful for leader election, rendezvous point
    placement, monitor location, etc.
    CCOF Dom(C,k) Protocol

   Round 1: Each node says HI to k-hop neighbors
    <Each node knows size of its own k-hop neighborhood>
   Round 2: Each node sends size of its k-hop neighborhood to
    all its neighbors.
    <Each node knows size of all nbrs k-hop nbrhoods.>
   Round 3: If a node is maximal among its nbrhood,
    it declares itself a dominator and notifies all nbrs.
     <Some nodes hear from some dominators, some don't>

For those not yet covered by C dominators, repeat Rounds 1-3
   excluding current dominators, until all nodes covered.
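The rounds above can be sketched in Python for a graph given as an adjacency list. This is an illustrative centralized simulation of the distributed protocol, not the CCOF implementation; the function names and the tie-breaking rule (ties all declare) are assumptions.

```python
from collections import deque

def k_hop_neighbors(adj, node, k):
    """All nodes within k hops of `node` (excluding itself), via BFS."""
    seen, out = {node}, set()
    frontier = deque([(node, 0)])
    while frontier:
        u, d = frontier.popleft()
        if d == k:
            continue
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                out.add(v)
                frontier.append((v, d + 1))
    return out

def dom_ck(adj, C, k):
    """Repeat Rounds 1-3, excluding current dominators, until every
    node is covered by >= C dominators (a dominator covers itself)."""
    dominators = set()
    covered = {v: 0 for v in adj}
    active = set(adj)
    while active and any(c < C for c in covered.values()):
        # Rounds 1-2: each active node learns its k-hop nbrhood size,
        # counting only nodes not already elected dominator
        nbrs = {v: k_hop_neighbors(adj, v, k) & active for v in active}
        size = {v: len(nbrs[v]) for v in active}
        # Round 3: a node maximal among its k-hop nbrhood declares itself
        new_doms = {v for v in active
                    if all(size[v] >= size[u] for u in nbrs[v])}
        for d in new_doms:
            covered[d] += 1
            for u in k_hop_neighbors(adj, d, k):
                covered[u] += 1
        dominators |= new_doms
        active -= new_doms          # exclude current dominators
    return dominators
```

Each iteration elects at least one node (the globally largest active neighborhood is maximal in its own neighborhood), so the loop terminates.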
        Direct Research Project
Result Verification and Trust-based
 Scheduling in Open Peer-to-Peer
      Cycle Sharing Systems
    Student: Shanyu Zhao
    Committee Members:
         Prof. Virginia Lo (Advisor)
         Prof. Jun Li
         Prof. Sarah Douglas
     CIS Department, University of Oregon
                Dec. 2, 2004
What's this DRP about?
   Problem: in open peer-to-peer cycle
    sharing systems, computational results
    can be faked by malicious hosts
   Current strategy: Replication
   We propose:
       Quiz as an alternative verification scheme
       Trust-based Scheduling to avoid
        scheduling on malicious hosts, and to lower
        the cost of verification
Outline
   Result Cheating is a Serious Problem
   Result Verification: Replication vs. Quiz
   Math Analysis of Repl. and Quiz
   Trust-based Scheduling
   Simulations
   Related Work
   Conclusion and Future Work
Berkeley's SETI@Home
   Uses Internet-connected computers in
    the Search for Extraterrestrial
    Intelligence (SETI)
   Users download and run programs
    (screen saver) to participate
   Currently
       Over 5 million users
       Utilize 1000 years of CPU time every day
       Faster than IBM ASCI White
Cheating in SETI@Home
To gain higher ranking on the website:
 Some teams fake the number of work
  units completed
        Every team member returns a copy of
         the result for the same task
    Unauthorized patched code makes SETI
     run faster but return wrong results

Current Strategies
   Specialized Schemes: e.g. Ringer,
    Encrypted Function
        They restrict the type of computation supported
   Generic Schemes: Replication
       Susceptible to collusion, e.g.:
            Use Distributed Hash Table
            Patches that return wrong results
       Fixed high overhead
Result Cheating Models
   Three basic types:
        Type I: foolish cheater, always returns wrong results
       Type II: ordinary cheater, returns wrong results
        with fixed probability
       Type III: smart cheater, performs well for a period,
        then turns to Type II
   Non-colluding vs. Colluding Scenarios
   Static vs. Dynamic Scenarios

Quiz: A Result Verification Scheme
   A quiz is an indistinguishable task
    with result known to the client
   Scheduling: mix t tasks and m quizzes
    to form a package. Send the package to
    a host
   Verification: discard all task results in a
    package if any quiz failed, otherwise
    accept them all
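The packaging and verification steps can be sketched in Python. This is a minimal sketch, not the CCOF implementation; the function names and data shapes are assumptions.

```python
import random

def make_package(tasks, quizzes, m):
    """Mix m quizzes (with answers precomputed by the client) into the
    tasks and shuffle, so the host cannot tell them apart."""
    items = [("task", t) for t in tasks] + \
            [("quiz", q) for q in random.sample(quizzes, m)]
    random.shuffle(items)
    return items

def verify_package(package, results, quiz_answers):
    """Accept the task results only if every quiz result matches;
    a single failed quiz discards the whole package (returns None)."""
    task_results = []
    for (kind, item), result in zip(package, results):
        if kind == "quiz" and result != quiz_answers[item]:
            return None
        if kind == "task":
            task_results.append(result)
    return task_results
```

A host that cheats on any item has a chance of hitting a quiz, and one failed quiz costs it the whole package plus a reputation penalty.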

Replication vs. Quiz
   Different principle:
       Replication: a voting principle
       Quiz: a sampling principle
   Quiz does not suffer from collusion
   With unlimited resources, Quiz may
    have unfavorable turnaround time

   Accuracy and Overhead Comparison
   of Repl. and Quiz

[Figure: Accuracy vs. Quiz Ratio / Repl. Factor; 30% Type II cheaters,
package size s=6 for Quiz]

System Model
   Philosophy: Trust-but-verify!
       Use reputation systems to choose hosts
       Reduce verification overhead for trusted hosts

Reputation System Classification
   Two steps to calculate reputation:
       Formation of local trust value
       Formation of aggregated trust value
   Classification by degree of trust info
    sharing (aggregation):
       Local Sharing
       Partial Sharing
       Global Sharing

Five Reputation Systems
   Local Sharing
       Local Reputation System
            Private trusted list and optional black list
   Partial Sharing
       NICE Reputation System
            Tree-based search for recruitment when
             trusted list is exhausted
       Gossip Reputation System
            Periodically ask most trusted peers in the
             trusted list, for their most trusted peers.
Five Reputation Systems (cont)
   Global Sharing
       Global Reputation System
            A shared trusted list and blacklist by all peers,
             trust values updated by every transaction.
       EigenTrust Reputation System
            A shared trusted list and blacklist, trust values
             are updated periodically by the algorithm
             described in the EigenTrust paper.

       Trust-based Replication Algorithm

Scheduling:
 while(task queue not empty):
   for each task in task queue:
     a. fetch a most trusted free host (aided by repu sys);
     b. calculate the replication factor k, and
        determine the # of extra hosts needed, say c;
     c. pick c most trusted free hosts (aided by repu sys);
        if(not enough hosts are allocated) then
          stall for a while;

Verification and Repu System Updating:
 for each task scheduled:
   upon receiving the result set from all replicas:
     a. cross check the results,
        if(# of majority results > (k + 1)/2) then
          accept the majority result;
        else
          reject all results, reschedule that task later;
     b. update reputation system,
        if(task result is accepted) then
          for each host H who was chosen in replication:
            if(host H gave majority result) then
              increase trust value for H;
            else if(host H gave minority result) then
              decrease trust value for H;
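The verification half of the algorithm above can be sketched in Python, assuming k+1 replicas per task. The trust increment `delta` and the [0, 1] clamping are assumptions, not values from the slides.

```python
from collections import Counter

def verify_replicated(results, k, trust, delta=0.05):
    """Cross-check one task's results from its k+1 replicas.
    `results` maps host -> returned value; `trust` maps host -> trust value.
    Returns the accepted value, or None to signal rescheduling."""
    value, count = Counter(results.values()).most_common(1)[0]
    if count <= (k + 1) / 2:            # no majority: reject all results
        return None
    for host, r in results.items():     # update the reputation system
        if r == value:
            trust[host] = min(1.0, trust[host] + delta)   # majority voter
        else:
            trust[host] = max(0.0, trust[host] - delta)   # minority voter
    return value
```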
        Trust-based Quiz Algorithm

Scheduling:
 while(task queue not empty):
   a. fetch a most trusted free host H (aided by repu sys);
   b. calculate the quiz ratio r, and determine the # of
      tasks to schedule, say t, and the # of quizzes to
      be inserted, say m;
   c. pick t tasks from task queue, mixed with m quizzes;

Verification and Repu System Updating:
 for each package scheduled:
   upon receiving all results from the host H:
     a. check the results for quizzes in that package,
        if(all results for quizzes are correct) then
          accept all the results in that package;
        else
          reject all results, reschedule later;
     b. update reputation system,
        if(task result is accepted) then
          increase trust value for H;
        else
          decrease trust value for H;
Accuracy On Demand
   Clients want a desired level of accuracy
   Accuracy of Trust-based Scheduling
    depends on:
       Trust value of the selected host
       Quiz ratio / Replication factor in the
        verification scheme
   AOD dynamically adjusts quiz ratio or
    replication factor based on trust value
  AOD Example
     Calculate how many quizzes (m') should be inserted into
      a package of size s to obtain accuracy A:

The relation between A and m, the total number of quizzes the host
must have passed (p = fraction of malicious hosts, f(x) = distribution
of their cheating probability x):

        A = (1 - p + p ∫₀¹ f(x)(1-x)^(m+1) dx) /
            (1 - p + p ∫₀¹ f(x)(1-x)^m dx)

v: how many quizzes this host has already passed, derived from its
trust value, so the number to insert is:

        m' = m - v

Assume p = 0.5, f(x) = 1:

        m' = 1/√(1-A) - v - 2
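With p = 0.5 and f(x) = 1 the integrals evaluate to 1/(m+2) and 1/(m+1), so the accuracy ratio collapses to A = 1 - 1/(m+2)², which inverts to m' = 1/√(1-A) - v - 2. A few lines of Python can sanity-check this reduction (the function names are mine, not from the slides):

```python
import math

def accuracy(m, p=0.5):
    """A(m) for f(x) = 1: the integral of (1-x)^n over [0,1] is 1/(n+1)."""
    num = 1 - p + p / (m + 2)    # exponent m+1 integrates to 1/(m+2)
    den = 1 - p + p / (m + 1)    # exponent m   integrates to 1/(m+1)
    return num / den

def quizzes_to_insert(A, v):
    """Invert A = 1 - 1/(m+2)^2 (the p=0.5, f(x)=1 case) and subtract
    the v quizzes this host has already passed."""
    return 1 / math.sqrt(1 - A) - v - 2
```

For example, a host that has already passed v = 2 quizzes needs 2 fewer inserted quizzes to reach the same accuracy target.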
Simulation Models
   Cycle Sharing Model
          1000 peers, a peer acts as both a client and a host
          Dedicated, no private job
          Tasks have the same length
          Topology not considered, every two peers are connected

   Synthetic Task Generation Model
              # of tasks             task runtime   inter-arrivals
   Syn1       Normal, u=20           1 round        Exponential, u=50 rounds
   Syn2       Normal, u~Exponential  1 round        Exponential, u=50 rounds
Simulation Models
   Trust Function (to adjust local trust value)
       Linear Increase Sudden Die (LISD)
       Additive Increase Multiplicative Decrease (AIMD)
       Blacklisting
   Quiz Ratio / Replication Factor Function

[Figure: quiz ratio / replication factor decreases with trust value,
leveling off once trust passes a threshold (axes: TrustVal, Threshold)]
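The trust functions and the ratio function can be sketched in Python. The increments, the halving factor, and the exact shape of the ratio curve are assumptions for illustration; only the LISD/AIMD/threshold behavior follows the slides.

```python
def aimd(trust, passed, inc=0.05, mult=0.5):
    """Additive Increase Multiplicative Decrease:
    add a small constant on a passed verification, halve on a failure."""
    return min(1.0, trust + inc) if passed else trust * mult

def lisd(trust, passed, inc=0.05):
    """Linear Increase Sudden Die: grow linearly, drop to zero on failure."""
    return min(1.0, trust + inc) if passed else 0.0

def quiz_ratio(trust, threshold=0.8, lo=0.05, hi=1.0):
    """Quiz ratio falls linearly with trust and bottoms out at `lo`
    once trust passes the threshold (curve shape is an assumption)."""
    if trust >= threshold:
        return lo
    return hi - (hi - lo) * (trust / threshold)
```

A blacklisting variant would simply refuse to schedule on any host whose trust ever hits zero.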
Simulation Metrics
   Accuracy =
     (# of tasks with correct results) /
     (# of tasks accepted by the clients)

   Overhead =
     (# of quizzes or replicas + # of rejected tasks) /
     (# of tasks accepted by the clients)

Simulation Results
   Contribution of different reputation systems
   Replication vs. Quiz
   Inspecting trust functions: LISD, AIMD
    and Blacklisting
   Accuracy on demand

Replication Overhead Reduced by
Reputation Systems

[Figure: no collusion; 30% Type II cheaters; repl factor 1]
  Quiz: Accuracy Boosted to 1, Overhead
  Reduced by Reputation Systems

[Figure: 30% Type II cheaters; quiz ratio in [0.05, 1]]
Quiz Accuracy Converges to 1 After Break
Out of Type III Malicious Hosts

[Figure: accuracy over time; 30% Type III cheaters]
    Replication vs. Quiz

[Figure: local repu system; calculated over the first 500,000 tasks;
quiz ratio and repl factor in [0.05, 1]]
Inspect Trust Functions for Quiz:
LISD vs. Blacklisting

[Figure: 30% Type II cheaters; quiz ratio in [0.05, 1]]

    Accuracy On Demand

[Figure: 50% Type II cheaters, b evenly distributed; global repu system]
  Related Work
  on Cycle Sharing Systems
     Three Threads
         Grid computing
         Peer-to-Peer cycle sharing
         Computational network overlay
     Taxonomy of Current Research Projects

                     Open                    Institution-based

      P2P            CCOF, Partage           Flock of Condor, SHARP,
                                             OurGrid

 Client-Server       SETI@Home, BOINC,       Globus, Condor-G,
                     Folding@Home            XenoServer
Related Work
on Result Verification (1)
Encrypted Function, Sander, 1998
 Basic Idea

        Client sends the host 2 functions: f and E(f)
        Host returns f(x) and E(f)(x)
        Client then verifies whether P(E(f)(x)) = f(x)
    But finding an encrypted function for a
     general computation is hard

Related Work
on Result Verification (2)
Ringer Scheme, Golle, 2001
   Basic Idea
       f(x) is a one-way function; the host should
        compute f(x) for all x in D
       Client pre-computes several values yi = f(xi)

       Client sends all the yi; the host must return
        every corresponding xi
   But this only applies when f(x) is strictly a
    one-way function

Related Work
on Result Verification (3)
Uncheatable Grid Computing, Du, 2003
 Basic Idea
        Host should compute f(x) for all x in D, |D|: 2^40
        Host builds a Merkle tree and sends the root of the tree (a hash
         value) to the client
        Client randomly chooses some x and asks the host for a proof from
         the Merkle tree
    Drawbacks:
        Strong restriction on the computation model
        Building a huge Merkle tree is costly
        Only combats cheating by incomplete computation
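The commit-then-spot-check idea can be sketched in Python with SHA-256. This is a generic Merkle construction for illustration (leaf count assumed a power of two), not Du's exact scheme; the function names are mine.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root hash the host commits to (leaf count assumed a power of two)."""
    level = [h(x) for x in leaves]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, i):
    """Sibling hashes on the path from leaf i up to the root."""
    level = [h(x) for x in leaves]
    proof = []
    while len(level) > 1:
        sib = i ^ 1
        proof.append((level[sib], sib < i))   # (hash, sibling-is-on-the-left)
        level = [h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return proof

def spot_check(leaf, proof, root):
    """Client side: rebuild the root from one claimed leaf and its proof."""
    cur = h(leaf)
    for sib, sib_is_left in proof:
        cur = h(sib + cur) if sib_is_left else h(cur + sib)
    return cur == root
```

A host that skipped part of the domain cannot answer a spot check on a skipped leaf without breaking the hash.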

Related Work
on Reputation Systems (1)
   Centralized Reputation Systems, eBay…
   Improved Gnutella protocol, Damiani
       Add a reputation polling phase
   EigenTrust, Kamvar, 2003
        Form global reputation values by calculating
         the eigenvector of the trust matrix
       Use a DHT to decide mother nodes who
        take care of the calculation
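The eigenvector calculation can be sketched with plain power iteration. This is a minimal centralized sketch of the idea, ignoring EigenTrust's pre-trusted peers and DHT-based distribution; function names are mine.

```python
def eigentrust(local_trust, iters=100):
    """Global trust vector t via power iteration: t <- C^T t, where C is
    the row-normalized local trust matrix (c_ii = 0, as in EigenTrust)."""
    n = len(local_trust)
    C = []
    for row in local_trust:
        s = sum(row)
        # peers with no opinions trust everyone equally
        C.append([x / s for x in row] if s else [1.0 / n] * n)
    t = [1.0 / n] * n                 # start from the uniform distribution
    for _ in range(iters):
        t = [sum(C[j][i] * t[j] for j in range(n)) for i in range(n)]
    return t
```

Because each row of C sums to 1, the iteration preserves the total trust mass, and a peer nobody trusts converges to a global trust of zero.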
Related Work
on Reputation Systems (2)
   TrustMe, Singh, 2003
        Bootstrap server decides THAs for a node
       Flooding trust query, anonymous THAs
        return trust value for the nodes
   NICE, Lee, 2003
       Infer trust value by a trust chain
   Weighted Majority Algorithm, Yu, 2004
       Assign weights to advisors, adjust weights
        based on the quality of advice
Conclusion & Future Work
   Contribution
       Quiz: a result verification scheme
       Math analysis of the two verification
       Trust-based scheduling, “trust-but-verify”
   Future Work
       Quiz Generation
        Accuracy on Demand: how to estimate the
         fraction of malicious hosts and the distribution
         of b
Thank You!

   Adjust Quiz Ratio or Replication Factor

[Figure: gossip repu system; 30% malicious hosts]