Paxos majority by alicejenny

									Paxos
                   Topics
 Distributed consensus
               Readings
 Paxos Made Simple, Leslie Lamport.
 Appears in ACM SIGACT News
 (Distributed Computing Column), Vol. 32,
 No. 4 (December 2001), pages 51-58.
 http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf
  Also,
             Introduction
 Bigtable requires Chubby
 Chubby uses a protocol called Paxos
         Consensus Problem
 Assume a collection of processes can
  propose values
 A consensus algorithm ensures that a single
  one among the proposed values is chosen
                       Example
 Replicate for availability
 Ensure all replicas see same ops in same order
 Primary orders requests, forwards to replicas
 All nodes must agree on primary
 All nodes must agree on view
    Participant with lowest address in view is primary
               Requirements
 Safety
    Only a value that has been proposed may be
     chosen
    Only a single value is chosen
    A process never learns that a value has been
     chosen unless it actually has been
 Liveness
    Ensure that some proposed value is eventually
     chosen
    If a value has been chosen, then a process can
     eventually learn the value
       What is the Paxos protocol?
 Paxos is a simple protocol that a group of
  machines in a distributed system can use to
  agree on a value proposed by a member of
  the group.
 Assumptions
     Asynchronous
       • Processes operate at arbitrary speed
     Non-Byzantine model
       • Processes operate at arbitrary speed
       • Fail by stopping
     Processes may fail and then restart; this
      requires that information can be remembered
                  Roles
 Proposer: offer proposals of the form
  [value, number].
 Acceptor: accept or reject offered
  proposals so as to reach consensus on the
  chosen proposal/value.
 Learner: become aware of the chosen
  proposal/value.
 A process can take on all roles
                 Approach 1
 Designate a single process X as acceptor
  (e.g. one with smallest identifier)
    Each proposer sends its value to X
    X decides on one of the values
    X announces its decision to all learners

 Problem?
    Failure of the single acceptor halts decision

 Need multiple acceptors!
                  Approach 2
 Each proposer proposes to all acceptors
 Each acceptor accepts the first proposal it
  receives and rejects the rest
 If the proposer receives positive replies
  from a majority of acceptors, it chooses
  its own value
     There is at most 1 majority, hence only a single
      value is chosen
 Proposer sends chosen value to all learners
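The claim that there is at most one majority rests on the fact that any two majorities of the same acceptor set share at least one member, and that acceptor has accepted only one proposal. A minimal sketch that checks this by brute force; the function names and the five acceptor labels are illustrative assumptions, not part of the slides:

```python
from itertools import combinations

def majority(n):
    """Smallest number of acceptors that forms a majority of n."""
    return n // 2 + 1

def all_majorities_intersect(acceptors):
    """Brute-force check that every pair of majority-sized sets overlaps."""
    q = majority(len(acceptors))
    quorums = [set(c) for c in combinations(acceptors, q)]
    return all(a & b for a in quorums for b in quorums)

# Hypothetical acceptor names, chosen only for illustration.
acceptors = ["a0", "a1", "a2", "a3", "a4"]
print(majority(len(acceptors)))             # 3
print(all_majorities_intersect(acceptors))  # True
```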
               Approach 2
 Problem:
   What if multiple proposers propose simultaneously,
    so that no proposal is accepted by a majority?
   What if the process fails?
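The split-vote problem can be made concrete with a small sketch. It assumes three hypothetical proposers P, Q and R and five acceptors that each accept only the first proposal to arrive, as in Approach 2; the arrival orders are made up for illustration:

```python
from collections import Counter

def first_come_votes(arrival_order):
    """Each acceptor accepts only the first proposal that reaches it.

    arrival_order maps an acceptor name to the proposers whose
    proposals arrive there, in arrival order.
    """
    return {acceptor: order[0] for acceptor, order in arrival_order.items()}

def majority_winner(votes):
    """Return the proposer accepted by a majority of acceptors, or None."""
    needed = len(votes) // 2 + 1
    proposer, count = Counter(votes.values()).most_common(1)[0]
    return proposer if count >= needed else None

# Illustrative arrival orders: P and Q each arrive first at two acceptors, R at one.
arrivals = {
    "a0": ["P", "Q", "R"],
    "a1": ["P", "R", "Q"],
    "a2": ["Q", "P", "R"],
    "a3": ["Q", "R", "P"],
    "a4": ["R", "P", "Q"],
}
votes = first_come_votes(arrivals)
print(majority_winner(votes))  # None: no proposer reaches 3 of 5 acceptors
```

Because every acceptor has already used its single acceptance, no later message can break the tie, which is why acceptors need to be able to accept more than one proposal.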
               Paxos’ solution
 Each acceptor must be able to accept
  multiple proposals
 Order proposals by proposal number
     If a proposal with value v is chosen, every higher-numbered
      proposal that is chosen also has value v
Paxos Operation: Process State
 Each node maintains:
   na, va: highest proposal number accepted and its
    corresponding accepted value
   nh: highest proposal number seen
   myn: node’s proposal number in the current
    Paxos run
          Paxos Operations
 Choosing a proposal number:
   Use last known proposal number + 1, append
    process’s identifier
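One way to read the rule above is to represent a proposal number as a pair (counter, process identifier) and compare pairs lexicographically. This is a sketch under that assumption; the slides do not fix a concrete encoding, and the function name is hypothetical:

```python
def next_proposal_number(highest_seen, pid):
    """Return a proposal number greater than highest_seen and unique to pid.

    Proposal numbers are (counter, pid) tuples; Python compares tuples
    lexicographically, so the counter dominates and pid breaks ties.
    """
    last_counter, _ = highest_seen
    return (last_counter + 1, pid)

# Two processes that have both seen (0, 0) pick distinct, ordered numbers.
n1 = next_proposal_number((0, 0), pid=1)
n2 = next_proposal_number((0, 0), pid=2)
print(n1, n2, n1 < n2)
```

Appending the identifier guarantees that two proposers never use the same number, even when they start from the same last known value.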
              Paxos Operation
 Phase 1 (Prepare)
   A node decides to propose
   Proposer chooses myn > nh
   Proposer sends <prepare, myn> to all nodes
   A node receiving <prepare, n> has this logic
      If n < nh
         reply <prepare-reject>
      Else
         nh = n   (this node will not accept any proposal lower than n)
         reply <prepare-ok, na, va>
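The receiving side of Phase 1 can be sketched as a small class. The class and method names are assumptions for illustration; the state fields na, va and nh and the reply names follow the slides, with -1 and None standing in for "nothing seen yet":

```python
class Acceptor:
    """Per-node state used when handling <prepare, n>."""

    def __init__(self):
        self.na = -1    # highest proposal number accepted
        self.va = None  # value accepted with na
        self.nh = -1    # highest proposal number seen

    def on_prepare(self, n):
        if n < self.nh:
            return ("prepare-reject",)
        # Promise: from now on, reject any proposal numbered below n.
        self.nh = n
        return ("prepare-ok", self.na, self.va)

node = Acceptor()
print(node.on_prepare(1))  # ('prepare-ok', -1, None)
print(node.on_prepare(0))  # ('prepare-reject',)
```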
               Paxos Operation
 Phase 2 (Accept):
     If a proposer gets prepare-ok from a majority
       • V = non-empty value corresponding to the highest na
         received
       • If V = null (no reply carried an accepted value), then proposer can pick any V
       • Send <accept, myn, V> to all nodes
     If proposer fails to get majority prepare-ok
       • Delay and restart Paxos
     Upon receiving <accept, n, V>
       If n < nh
          reply with <accept-reject>
       else
         na = n; va = V; nh = n
          reply with <accept-ok>
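The two halves of Phase 2 can be sketched as follows: the proposer's rule for picking V from the prepare-ok replies, and the node's handling of <accept, n, V>. Function and class names are hypothetical, and None plays the role of the null value:

```python
def choose_value(replies, own_value):
    """Pick V from prepare-ok replies given as (na, va) pairs.

    Uses the value attached to the highest na among non-empty replies,
    and falls back to the proposer's own value if every va is None.
    """
    accepted = [(na, va) for na, va in replies if va is not None]
    if not accepted:
        return own_value
    return max(accepted, key=lambda reply: reply[0])[1]

class Acceptor:
    def __init__(self, nh=-1):
        self.na, self.va, self.nh = -1, None, nh

    def on_accept(self, n, v):
        if n < self.nh:
            return "accept-reject"
        self.na, self.va, self.nh = n, v, n
        return "accept-ok"

print(choose_value([(-1, None), (-1, None), (-1, None)], "mine"))  # mine
print(choose_value([(-1, None), (3, "a"), (5, "b")], "mine"))      # b
```

Adopting the value with the highest na is what keeps a value that may already have been chosen from being overwritten by a later proposer.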
               Paxos Operation
 Phase 3 (Decide)
     If proposer gets accept-ok from a majority
       • Send <decide, va> to all nodes
     If proposer fails to get accept-ok from a majority
       • Delay and restart Paxos
           Paxos: Timeouts
 All processes wait a maximum period
  (timeout) for messages they expect
 Upon timeout, a process starts again
       Paxos with One Leader, No Failures:
                    Phase 1

               myn = 1
             “prepare(1,1)”
       0     1     2     3     4

na     -1    -1    -1    -1    -1

va     nil   nil   nil   nil   nil

nh     -1    -1    -1    -1    -1

done   F     F     F     F     F
       Paxos with One Leader, No Failures:
                    Phase 1


             “prepare-ok(-1, nil)”
       0     1     2     3     4

na     -1    -1    -1    -1    -1

va     nil   nil   nil   nil   nil

nh     1     1     1     1     1

done   F     F     F     F     F
       Paxos with One Leader, No Failures:
                    Phase 2

             prepare-ok from
             majority! all v’s nil

       0     1     2     3     4

na     -1    -1    -1    -1    -1

va     nil   nil   nil   nil   nil

nh     1     1     1     1     1

done   F     F     F     F     F
       Paxos with One Leader, No Failures:
                    Phase 2

             “accept(1,1,1)”

       0     1     2     3     4

na     -1    -1    -1    -1    -1

va     nil   nil   nil   nil   nil

nh     1     1     1     1     1

done   F     F     F     F     F
       Paxos with One Leader, No Failures:
                    Phase 2

              accept-ok from
              majority
       0     1     2     3     4

na     1     1     1     1     1

va     1     1     1     1     1

nh     1     1     1     1     1

done   F     F     F     F     F
       Paxos with One Leader, No Failures:
                    Phase 3


              Send (decide,1)
       0     1     2     3     4

na     1     1     1     1     1

va     1     1     1     1     1

nh     1     1     1     1     1

done   F     F     F     F     F
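The walkthrough above can be replayed with a short in-memory simulation. This is a sketch under simplifying assumptions: messages are plain method calls, nothing is lost or delayed, and the names Node and run_paxos are hypothetical. Unlike the last slide, which shows the state just after decide is sent, the simulation also delivers decide, so done ends up true:

```python
class Node:
    def __init__(self):
        self.na, self.va, self.nh, self.done = -1, None, -1, False

    def on_prepare(self, n):
        if n < self.nh:
            return ("prepare-reject",)
        self.nh = n
        return ("prepare-ok", self.na, self.va)

    def on_accept(self, n, v):
        if n < self.nh:
            return "accept-reject"
        self.na, self.va, self.nh = n, v, n
        return "accept-ok"

    def on_decide(self, v):
        self.va, self.done = v, True

def run_paxos(nodes, myn, value):
    """Run one proposer through all three phases; return the decided value or None."""
    majority = len(nodes) // 2 + 1
    # Phase 1: prepare
    oks = [r for r in (n.on_prepare(myn) for n in nodes) if r[0] == "prepare-ok"]
    if len(oks) < majority:
        return None
    accepted = [(na, va) for _, na, va in oks if va is not None]
    v = max(accepted, key=lambda r: r[0])[1] if accepted else value
    # Phase 2: accept
    acks = [n.on_accept(myn, v) for n in nodes]
    if acks.count("accept-ok") < majority:
        return None
    # Phase 3: decide
    for n in nodes:
        n.on_decide(v)
    return v

nodes = [Node() for _ in range(5)]
print(run_paxos(nodes, myn=1, value=1))                  # 1
print([(n.na, n.va, n.nh, n.done) for n in nodes][0])    # (1, 1, 1, True)
```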
       Understanding Paxos
 What if we get two nodes that send a
  prepare message?

 What if a proposer fails while sending
  accept?

 What if a proposer fails after receiving
  prepare-ok (before sending accept)?
        More Than One Proposer
 Can occur after a timeout during the Paxos
  algorithm, a partition, or lost packets
 Two proposers must use different n in
  their prepare messages.
 Suppose two proposers have proposals 1, 2
     More Than One Proposer
 Proposal 1 reaches all nodes, followed by
  proposal 2
 In both cases a prepare-ok message is sent
 Both proposers will send an accept message
 However, for proposal 1 an accept-reject
  message is sent, because nh is already 2
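The scenario on this slide can be traced step by step. The sketch below assumes five nodes, in-order delivery, and two made-up values "A" and "B" for proposals 1 and 2; the class name is hypothetical:

```python
class Acceptor:
    def __init__(self):
        self.na, self.va, self.nh = -1, None, -1

    def on_prepare(self, n):
        if n < self.nh:
            return "prepare-reject"
        self.nh = n
        return "prepare-ok"

    def on_accept(self, n, v):
        if n < self.nh:
            return "accept-reject"
        self.na, self.va, self.nh = n, v, n
        return "accept-ok"

acceptors = [Acceptor() for _ in range(5)]
prep1 = [a.on_prepare(1) for a in acceptors]    # proposal 1 arrives first
prep2 = [a.on_prepare(2) for a in acceptors]    # proposal 2 raises nh to 2
acc1 = [a.on_accept(1, "A") for a in acceptors]  # 1 < nh, so rejected
acc2 = [a.on_accept(2, "B") for a in acceptors]  # accepted everywhere
print(set(prep1), set(prep2), set(acc1), set(acc2))
```

The first proposer gets no accept-ok at all, so it delays and restarts with a higher number, at which point it will learn "B" in Phase 1.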
 Proposer Fails Before Sending
            Accept
 Some process will time out and become a
  proposer
 Old proposer didn’t send any decide, so no
  risk of non-agreement
         Risks: Leader Failures
 Suppose a proposer fails after sending accept
  to a minority of nodes
     Same as two proposers!
 Suppose a proposer fails after sending accept
  to a majority of nodes
     Same as two leaders!
             Process Fails
 Process fails after receiving accept and
  after sending accept-ok
 Process should remember va and na on disk
 If process doesn’t restart, possible
  timeout in Phase 3, new leader
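Remembering va and na on disk might look like the sketch below. It is an illustration under stated assumptions, not a prescribed design: the state is stored as JSON, the file name is hypothetical, and nh is saved as well so that a restarted node keeps its Phase 1 promise (the slide only names va and na):

```python
import json
import os
import tempfile

class DurableAcceptor:
    """Acceptor that writes its state to disk before replying accept-ok."""

    def __init__(self, path):
        self.path = path
        self.na, self.va, self.nh = -1, None, -1
        if os.path.exists(path):  # restart: reload what was remembered
            with open(path) as f:
                state = json.load(f)
            self.na, self.va, self.nh = state["na"], state["va"], state["nh"]

    def _save(self):
        # Write to a temporary file, then rename, so a crash never leaves
        # a half-written state file behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"na": self.na, "va": self.va, "nh": self.nh}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def on_accept(self, n, v):
        if n < self.nh:
            return "accept-reject"
        self.na, self.va, self.nh = n, v, n
        self._save()  # persist before the reply leaves this node
        return "accept-ok"

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "acceptor.json")
    print(DurableAcceptor(path).on_accept(3, "x"))
    restarted = DurableAcceptor(path)  # simulates a crash and restart
    print(restarted.na, restarted.va)
```

Saving before replying matters: once accept-ok has been sent, the proposer may count it toward a majority, so the node must not forget it after a restart.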
               Summary
 Distributed consensus protocols are
  important
 We have studied Paxos

								