The Ensemble system by jianghongl


The Ensemble system
Phuong Hoai Ha
Introduction to Lab. assignments
March 17th, 2006
DSII: Ensemble

• The Ensemble system
  – Introduction
  – Architecture and Protocols
  – How does Ensemble achieve the group communication properties?
• Programming with Ensemble in C
  – Framework

Ensemble history
• Three generations
  – ISIS
    • a fixed set of properties for applications
  – Horus
    • more flexible through modular architectures (layers)
  – Ensemble
    • adaptive protocols, performance, formal analysis

The Ensemble System
• A library of protocols that support group communication.
• Ensemble provides:
  – Reliable communication,
  – Group membership service,
  – Failure detector,
  – Secure communication.

Group membership service
• Endpoints
  – Abstraction for a process's communication.
• Groups
  – Just a name for endpoints to use when communicating ⇒ do not change
  – Correspond to a set of endpoints that coordinate to provide a …
• Views
  – A snapshot of the group membership at a specified point ⇒ may change from time to time
  – Maintaining membership

Reliable communication
• Multicast communication
  – Messages are delivered by all group members in the current view of the sender.
  – Based on IP-multicast
• Point-to-Point communication
• Properties:
  – Virtual synchrony
  – Stability
  – Ordering
Virtual synchrony
• Another name: View-synchronous group communication (previous talk)
• Properties:
  – Integrity
    • A process delivers a message at most once.
  – Validity
    • Correct processes always deliver the messages that they send.
  – Agreement
    • Correct processes deliver the same set of messages in any given view.

Virtual Synchrony
[Figure: timeline of members a, b, d, e through views V0={a,b,c}, V1={a,b,c,d}, V2={b,c,d}, V3={b,c,d,e}; d wants to join, a crashes, e wants to join]
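The Integrity property can be illustrated with a per-sender sequence-number filter. This is a minimal C sketch, not Ensemble's code: it assumes every message carries its sender's rank and a per-sender sequence number starting at 0, and the names `view_state_t`, `try_deliver` and `MAX_MEMBERS` are hypothetical. Out-of-order messages are simply rejected here; a real protocol would buffer them and ask for retransmission.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_MEMBERS 8  /* hypothetical upper bound on group size */

/* Per-view delivery state: next expected sequence number from each sender. */
typedef struct {
    int next_seqno[MAX_MEMBERS];
} view_state_t;

/* Deliver (return true) only the next in-sequence message from a sender.
   A message whose seqno was already delivered is a duplicate and is dropped,
   so no message is delivered twice (Integrity). */
static bool try_deliver(view_state_t *v, int sender, int seqno) {
    assert(sender >= 0 && sender < MAX_MEMBERS);
    if (seqno != v->next_seqno[sender])
        return false;  /* duplicate or out of order */
    v->next_seqno[sender]++;
    return true;
}
```

A fresh `view_state_t` (all zeros) would be created for each new view, since the property is stated per view.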

Schedule
• The Ensemble system
  – Introduction
  – Architecture & Protocols
  – How does Ensemble achieve the group communication properties?
• Programming with Ensemble in C
  – Framework

Infrastructure
• Layered protocol architecture
  – All features are implemented as micro-protocols/layers
  – A stack/combination ~ a high-level protocol
• A new stack is created for a new configuration at each endpoint
• Ability to change the group protocol on the fly
[Figure: example protocol stack with layers Gmp, Synch, Top_appl, Sequencer, Stable, Slander; label "low frequency"]

Messages vs. events
[Figure]

Interface between layers
[Figure]
Layers
• Layers are implemented as a set of callbacks that handle events passed to them.
  – Each layer gives the system 2 callbacks to handle events from its adjacent layers.
  – Layers use the 2 callbacks of their adjacent layers for passing events.
• Each instance of a layer maintains a private local state.

Stacks
• Combinations of layers that work together to provide high-level protocols
• Stack creation:
  – A new protocol stack is created at each endpoint of a group whenever the configuration (e.g. the view) of the group changes.
  – All endpoints in the same partition receive the same ViewState record to create their stack:
    • select appropriate layers according to the ViewState
    • create a new local state for each layer
    • compose the protocol layers
    • connect to the network
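A minimal C sketch of the callback structure described above. The types and functions (`event_t`, `layer_t`, `pass_down`, `pass_up`, `compose_stack`) are hypothetical illustrations, not Ensemble's interfaces: each layer exposes one callback for events from the layer above and one for events from the layer below, keeps a private counter as its local state, and forwards events by calling its neighbour's callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event record; real Ensemble events carry much more. */
typedef enum { EV_CAST, EV_SUSPECT } ev_type_t;

typedef struct {
    ev_type_t type;
    int       hops;   /* number of layers that have handled this event */
} event_t;

struct layer;

/* A handler receives the layer it belongs to and the event passed to it. */
typedef void (*handler_t)(struct layer *self, event_t *ev);

typedef struct layer {
    const char   *name;
    handler_t     handle_down;  /* callback for events coming from the layer above */
    handler_t     handle_up;    /* callback for events coming from the layer below */
    struct layer *above;        /* NULL for the top layer */
    struct layer *below;        /* NULL for the bottom layer */
    int           count;        /* private local state: events handled so far */
} layer_t;

/* Pass-through behaviour: update local state, then call the neighbour's callback. */
static void pass_down(layer_t *self, event_t *ev) {
    self->count++;
    ev->hops++;
    if (self->below != NULL)
        self->below->handle_down(self->below, ev);
}

static void pass_up(layer_t *self, event_t *ev) {
    self->count++;
    ev->hops++;
    if (self->above != NULL)
        self->above->handle_up(self->above, ev);
}

/* Compose n layers into a stack (layers[0] is the top) with fresh local state. */
static void compose_stack(layer_t *layers, int n) {
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        layers[i].above = (i > 0) ? &layers[i - 1] : NULL;
        layers[i].below = (i < n - 1) ? &layers[i + 1] : NULL;
        layers[i].count = 0;
    }
}
```

Building the stack for a new view then amounts to calling `compose_stack` on the selected layers, which resets every layer's local state as the stack-creation steps require.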

The basic stack
• Each group has a leader for the membership protocol.

  Layers      Functionality
  Gmp         Membership algorithm (7 layers)
  Slander     Failure suspicion sharing
  Synch       Block during membership change
  Top_appl    Interface to the application
  Sequencer   Total ordering
  Suspect     Failure detector
  Stable      Stability detection
  Mnak        Reliable FIFO
  Bottom      Interface to the network

Failure detector
• Suspect layer:
  – Regularly ping other members to check for suspected failures
  – Protocol:
    • If (#unacknowledged Ping messages for a member > threshold), send a Suspect event down
• Slander layer:
  – Share suspicions between members of a partition
    • The leader is informed so that faulty members are removed, even if the leader does not detect the failures.
  – Protocol:
    • The protocol multicasts slander messages to other members whenever it receives a new Suspect event

Stability
• Stable layer:
  – Track the stability of multicast messages
  – Protocol:
    • Maintain Acks[N][N] by unreliable multicast:
      – Acks[s][t]: #(s's messages) that t has acknowledged
      – Stability vector: StblVct = {(minimum of row s): ∀s}
      – NumCast vector: NumCast = {(maximum of row s): ∀s}
    • Occasionally, recompute StblVct and NumCast, then send them down in a Stable event.
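The StblVct/NumCast computation can be sketched in a few lines of C. This is a minimal illustration, not Ensemble's Stable layer: it assumes a fixed group size `N` and an `acks` matrix already filled in from the acknowledgement multicasts; `compute_stability` is a hypothetical name.

```c
#include <assert.h>

#define N 3  /* group size, fixed here for illustration */

/* acks[s][t] = number of s's multicast messages that t has acknowledged.
   StblVct[s] = minimum of row s: messages from s that everyone has received.
   NumCast[s] = maximum of row s: messages from s known to exist. */
static void compute_stability(int acks[N][N], int stbl_vct[N], int num_cast[N]) {
    for (int s = 0; s < N; s++) {
        int lo = acks[s][0], hi = acks[s][0];
        for (int t = 1; t < N; t++) {
            if (acks[s][t] < lo) lo = acks[s][t];
            if (acks[s][t] > hi) hi = acks[s][t];
        }
        assert(lo <= hi);  /* a stable count can never exceed the highest count */
        stbl_vct[s] = lo;
        num_cast[s] = hi;
    }
}
```

For row s = 0 with acknowledgements {5, 3, 4}, the first 3 messages from member 0 are stable (safe to garbage-collect), while the member that has acknowledged only 3 of the 5 known messages is missing 2.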
Reliable multicast
• Mnak layer:
  – Implement a reliable FIFO-ordered multicast protocol
    • Messages from live members are delivered reliably
    • Messages from faulty members are retransmitted by live members
  – Protocol:
    • Keep a record of all multicast messages to retransmit on demand
    • Use the Stable event from the Stable layer:
      – The StblVct vector is used for garbage collection
      – The NumCast vector gives an indication of lost messages ⇒ recover them

Ordering property
• Sequencer layer:
  – Provide total ordering
  – Protocol:
    • Members buffer all messages received from below in a local buffer
    • The leader periodically multicasts an ordering message
    • Members deliver the buffered messages according to the leader's instructions
  – See the Causal layer for causal ordering
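The Sequencer protocol above can be sketched as follows. This is a minimal C illustration under stated assumptions, not the real layer: messages are identified by a hypothetical `msg_id_t` (sender rank plus per-sender sequence number), buffers have a fixed size, and the leader's ordering message is modelled as an array of ids. Two members that receive the same messages in different orders still deliver them in the leader's order.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BUF 16

/* Hypothetical message id: sender rank plus per-sender sequence number. */
typedef struct { int sender; int seqno; } msg_id_t;

typedef struct {
    msg_id_t buf[MAX_BUF];        /* received from below, not yet delivered */
    int      nbuf;
    msg_id_t delivered[MAX_BUF];  /* delivery log, in the agreed total order */
    int      ndelivered;
} member_t;

static int find_buffered(const member_t *m, msg_id_t id) {
    for (int j = 0; j < m->nbuf; j++)
        if (m->buf[j].sender == id.sender && m->buf[j].seqno == id.seqno)
            return j;
    return -1;
}

/* Buffer a message received from the layer below. */
static void buffer_msg(member_t *m, msg_id_t id) {
    assert(m->nbuf < MAX_BUF);
    m->buf[m->nbuf++] = id;
}

/* Apply the leader's ordering message. If any listed message has not arrived
   yet, deliver nothing and return false so the caller can retry later. */
static bool apply_order(member_t *m, const msg_id_t *order, int norder) {
    for (int i = 0; i < norder; i++)
        if (find_buffered(m, order[i]) < 0)
            return false;
    for (int i = 0; i < norder; i++) {
        int j = find_buffered(m, order[i]);
        assert(j >= 0 && m->ndelivered < MAX_BUF);
        m->delivered[m->ndelivered++] = m->buf[j];
        m->buf[j] = m->buf[--m->nbuf];  /* remove by moving the last entry into slot j */
    }
    return true;
}
```

Checking for all listed messages before delivering anything keeps a member from delivering a prefix of the order and then getting stuck on a missing message.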

Maintaining membership (1)
• Handle failures by splitting a group into several subgroups: one primary and many non-primary subgroups (partitionable)
• Protocol:
  – Each member keeps a list of suspected members via the Suspect layer
  – A member shares its suspicions via the Slander layer
  – View leader l:
    • collects all suspicions
    • reliably multicasts a fail(p_i0, …, p_ik) message
    • synchronizes the view via the Synch layer
    • installs a new view without p_i0, …, p_ik
  – A new leader is elected for a view without a leader
    • If p_k in view V1 suspects that all lower-ranked members are faulty, it elects itself as leader and acts like l.
    • A member that agrees with p_k continues with p_k to the new view V2, with p_k as the leader.
    • A member that disagrees with p_k suspects p_k.

Maintaining membership (2)
• Recover from failures by merging non-primary subgroups into the primary subgroup
• Protocol (l: local leader, r: remote leader):
  1. l synchronizes its view
  2. l sends a merge request to r
  3. r synchronizes its view
  4. r installs a new view with its mergers and sends the view to l
  5. l installs the new view in its subgroup
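The leader-election rule can be sketched as two small C predicates. This is a minimal illustration, not Ensemble's Gmp code: it assumes a fixed group size `N`, ranks 0..N-1 where lower ranks take precedence, and a local `suspects` array per member. The slide does not define what "agrees" means, so `react_to_claim` assumes a member agrees with p_k exactly when it also suspects every member ranked below k; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define N 5  /* group size, fixed for illustration */

/* p_k elects itself if it suspects every member ranked below it. */
static bool should_elect_self(int k, const bool suspects[N]) {
    assert(k >= 0 && k < N);
    for (int r = 0; r < k; r++)
        if (!suspects[r])
            return false;  /* a lower-ranked member may still be alive */
    return true;
}

/* Assumed agreement rule: a member agrees with p_k's claim if it also suspects
   all members ranked below k; otherwise it disagrees and starts suspecting p_k. */
static bool react_to_claim(int k, bool suspects[N]) {
    if (should_elect_self(k, suspects))
        return true;
    suspects[k] = true;  /* disagree: suspect p_k */
    return false;
}
```

With suspicions {true, true, false, false, false}, the member of rank 2 elects itself while rank 3 does not; a member that still trusts rank 1 rejects rank 2's claim and adds it to its suspects.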

Join Group
[Figure: views V0={a,b,c}, V1={a,b,c,d}, V2={b,c,d}, V3={b,c,d,e}; d wants to join, a crashes, e wants to join]

Virtual synchrony
• Achieved by a simple leader-based protocol:
  – Idea:
    • Before a membership change from V1 to V2, all messages in V1 must become stable
  – Protocol: before any membership change
    • The leader activates the Synch protocol ⇒ the set M_V1 of messages that must be delivered in V1 is bounded.
    • The leader waits until live members agree on M_V1 by sending negative acknowledgements and recovering lost messages (i.e. StblVct = NumCast)
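The leader's waiting condition "StblVct = NumCast" can be sketched in C. This is a minimal illustration with hypothetical names (`unstable_count`, `view_change_ready`), assuming the leader already holds the two vectors from the latest Stable event: the difference NumCast[s] - StblVct[s] counts the messages from sender s that some live member still lacks.

```c
#include <assert.h>
#include <stdbool.h>

/* Messages sent in the current view that are not yet stable:
   the sum over senders s of NumCast[s] - StblVct[s]. */
static int unstable_count(const int *stbl_vct, const int *num_cast, int n) {
    assert(n > 0);
    int total = 0;
    for (int s = 0; s < n; s++) {
        assert(stbl_vct[s] <= num_cast[s]);
        total += num_cast[s] - stbl_vct[s];
    }
    return total;
}

/* The leader may install the next view only when StblVct = NumCast. */
static bool view_change_ready(const int *stbl_vct, const int *num_cast, int n) {
    return unstable_count(stbl_vct, num_cast, n) == 0;
}
```

While `unstable_count` is positive, members keep sending negative acknowledgements and recovering the missing messages; once it reaches 0, every message in M_V1 has been delivered by all live members and the new view can be installed.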
Virtual Synchrony
[Figure: timeline of members a, b, c, d through views V0={a,b,c}, V1={a,b,c,d}, V2={b,c,d}, V3={b,c,d,e}; d wants to join, a crashes, e wants to join]

Schedule
• The Ensemble system
  – Introduction
  – Architecture & Protocols
• Programming with Ensemble in C
  – Framework
• Examples

Framework

    #include <stdlib.h>   /* malloc */
    #include <string.h>   /* memset, strcpy */
    #include "ce.h"       /* Ensemble C interface */

    typedef struct env_t {
        ce_appl_intf_t *intf;
        <your variables>
    } env_t;

    /* Define the 7 callbacks; add the parameters required by the callback
       types declared in ce.h. The app_ prefix avoids a clash with exit(). */
    void app_install(){}        // new view installed
    void app_exit(){}           // the member leaves
    void app_receive_cast(){}   // multicast message
    void app_receive_send(){}   // point-to-point message
    void app_flow_block(){}     // flow control
    void app_block(){}          // view change
    void app_heartbeat(){}      // timeout

    /* Handler for your own input socket */
    void input(void *env) {
        …
        ce_flat_cast(((env_t *) env)->intf, …, msg);
    }

    int main(int argc, char **argv) {
        ce_appl_intf_t *intf;
        ce_jops_t jops;                        // join options for the endpoint
        env_t *env = malloc(sizeof(env_t));    // env must point to real storage

        /* Initialize Ensemble & process arguments */
        ce_Init(argc, argv);

        /* Create an interface for the group (check the callback order in ce.h) */
        intf = ce_create_flat_intf(env,
                   app_exit, app_install, app_flow_block, app_block,
                   app_receive_cast, app_receive_send, app_heartbeat);
        env->intf = intf;                      // keep the interface for later casts

        /* Create an endpoint to join */
        memset(&jops, 0, sizeof(jops));
        jops.hrtbt_rate = 3;
        strcpy(jops.group_name, "demo");
        strcpy(jops.properties, CE_DEFAULT_PROPERTIES);
        jops.use_properties = 1;
        ce_Join(&jops, intf);

        /* Add your own input socket (0 = standard input) */
        ce_AddSockRecv(0, input, env);

        /* Pass control to Ensemble */
        ce_Main_loop();
        return 0;
    }

Environment variables
• Environment variable
    setenv ENS_CONFIG_FILE <ensemble.conf>
• File ensemble.conf
    # The set of communication transports.
    ENS_MODES=DEERING
    # The port used for IP-multicast
    ENS_DEERING_PORT=6793
    # The user-id
    ENS_ID=your_name
• Makefile
    CC      = gcc
    CFLAGS  =
    ENSROOT = /users/mdstud/phuong/DSII/ensemble
    LIB_DIR = $(ENSROOT)/lib/sparc-solaris
    INCLUDE = -I$(ENSROOT)/lib/sparc-solaris
    LDFLAGS = -lsocket -lnsl -lm
    CELIB   = $(LIB_DIR)/libce.a

    .SUFFIXES: .c .o

    demo: demo.c
	$(CC) -o demo $(INCLUDE) $(CFLAGS) demo.c $(CELIB) $(LDFLAGS)
    clean:
	$(RM) *.o
    realclean:
	$(RM) demo
• Login to Sun machines
• Choose gcc-2.95
• All necessary material (e.g. a good tutorial) is in /users/mdstud/phuong/DSII/demo/ensemble
• …semble/

Lab 1: Create a BBS system with Ensemble
• Use group communication in your program.
• One program.
• Peer group structure.
• C, Java

Lab 2: Construct a reliable and ordered broadcast
• Fixed number of machines.
• Broadcast
  – A message that is received by any machine should also be received by all other machines.
• Reliable
  – Integrity, Validity, Agreement
• Ordered
  – All machines should agree on the order of all received messages.
