					    Rewind, Repair, Replay:
Three R’s to improve dependability

  Aaron Brown and David Patterson
           ROC Research Group
    University of California at Berkeley
 SIGOPS European Workshop, 23 September 2002
      What if computer systems
        could travel in time?
• We could have retroactive repair
  – travel back and fix problems before they had a
    chance to corrupt data
• We could eliminate human operator error
  – make a mistake? Just travel back and try it again.
• Our systems could be more robust
  – we could eliminate the dangers of upgrades
  – we could better tolerate buggy software
  – we might even be able to tolerate viruses and hackers

• We could make more dependable systems
                                                         Slide 2
   Sci-fi vs. computer time travel
• Sci-fi time travel vs. computer time travel
  – Sci-fi: our hero loses a loved one or lives through disaster
    Computer: human error, software bug, or attack causes data loss
  – Sci-fi: hero uses time machine to travel back in time
    Computer (Rewind): roll system state backwards in time
  – Sci-fi: hero alters the past to avert the future disaster
    Computer (Repair): make changes to avert foretold disaster
  – Sci-fi: hero returns to the present; past changes have been merged into the original timeline
    Computer (Replay): roll system state forward, merging the original timeline with the effects of repairs
• Three R’s are the fundamental primitives of
  computer time travel
                                                          Slide 3
      Key properties of the 3R’s
• Recovery from problems at any system layer
  – rewind, repair, replay cover OS through application
• Recovery from unanticipated problems
  – arbitrary repair
• No assumptions about correct application
  – physical rewind
• Integrated interface
  – provide "undo for sysadmins"

                                                     Slide 4
    What about existing approaches?
Approach                      Rewind      Repair  Replay   Comments
snapshots, …                  physical                     read-only view of history
…                             physical                     application-level only;
                                                           cannot alter committed …
Workflow w/ compensating      phys/log                     limited apps; mechanisms not usefully
  transactions                                             integrated for time travel
… (PARC collaborative         (limited)                    application-level only; repair limited
  productivity apps)                                       to well-understood history edits
                                                                Slide 5
          Designing a 3R system
• Goals
  – application-neutrality
  – provide abstractions for reasoning about 3R behavior
• Target domain: network services
  – accessed by remote users via well-defined interfaces
  – email, messaging, e-commerce, auctions, forums, web
    hosting, enterprise applications (J2EE, .NET), ...
• Challenges, learned from first attempt
  – integrating history and repair during replay
  – managing inconsistency in externally-visible state

                                                         Slide 6
               Basic architecture
• Application-independent undo manager
  – coordinates 3R cycle; manages external inconsistencies
  – linked via a set of APIs to application, time-travel
    storage, history log, and control UI

  Figure: App. Service (user state, application, operating system) above the storage layer, linked via the 3R API to the Undo Manager, which keeps the History Log
                                                            Slide 7
Abstracting the application service
• To the undo manager, the application is:
  – a collection of state
  – a history of events affecting the state
     » an event is typically a user interaction with the service
  – a model of acceptable external consistency

• These are encoded into application-defined verbs:
  – high-level encodings of user interactions (events)
      » records of intent to alter state, not actual state changes
  – reference application state by opaque UIDs
  – provide policies that define external consistency
                                                           Slide 8
           Verbs and the 3R cycle
• Normal operation
  – undo manager logs application-provided verbs to disk


  Figure: App. Service (user state, application, operating system) passes verbs to the Undo Manager, which appends them to the History Log; storage layer below
                                                              Slide 9
        Verbs and the 3R cycle
• Rewind
  – time-travel storage layer reverts system hard state
    to rewind point
  – all changes since rewind point are discarded

  Figure: storage layer beneath the App. Service (user state, application, operating system) reverted to the rewind point; Undo Manager and History Log shown alongside
                                                        Slide 10
           Verbs and the 3R cycle
• Repair
  – operator edits logged history and/or makes arbitrary
    changes to system

  Figure: Repairs and control UI acting on the App. Service (user state, application, operating system), Undo Manager, and History Log; storage layer below
                                                       Slide 11
           Verbs and the 3R cycle
• Replay
  – undo manager feeds verbs back to application for re-
    execution in the context of repaired system


  Figure: Undo Manager feeds verbs from the History Log back to the App. Service (user state, application, operating system); storage layer below
                                                              Slide 12
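The Rewind, Repair, and Replay cycle on the preceding slides can be sketched in a few dozen lines of Java, the language the prototype is written in. This is a toy model and not the prototype's code. `DeliverVerb`, `ToyMailApp`, and `ToyUndoManager` are hypothetical names. Application state is assumed to be an in-memory map from UIDs to message bodies. A copy of that map, taken when the undo manager is created, stands in for the time-travel storage layer. The "repair" here simply swaps in fixed delivery code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical verb: records the user's intent (deliver this input to this UID),
// not the bytes the application ended up writing.
class DeliverVerb {
    final String uid;
    final String input;

    DeliverVerb(String uid, String input) {
        this.uid = uid;
        this.input = input;
    }
}

// Hypothetical application: state is a map from opaque UIDs to message bodies.
class ToyMailApp {
    final Map<String, String> state = new HashMap<>();
    UnaryOperator<String> deliveryCode;  // a repair may replace this with fixed code

    ToyMailApp(UnaryOperator<String> deliveryCode) {
        this.deliveryCode = deliveryCode;
    }

    void execute(DeliverVerb v) {
        state.put(v.uid, deliveryCode.apply(v.input));
    }
}

// Hypothetical undo manager coordinating the 3R cycle.
class ToyUndoManager {
    private final ToyMailApp app;
    private final List<DeliverVerb> historyLog = new ArrayList<>();
    private final Map<String, String> rewindPoint;  // stand-in for time-travel storage

    ToyUndoManager(ToyMailApp app) {
        this.app = app;
        this.rewindPoint = new HashMap<>(app.state);
    }

    // Normal operation: log the verb, then let the application execute it.
    void submit(DeliverVerb v) {
        historyLog.add(v);
        app.execute(v);
    }

    // Rewind: revert hard state; every change since the rewind point is discarded.
    void rewind() {
        app.state.clear();
        app.state.putAll(rewindPoint);
    }

    // Replay: re-execute logged verbs in the context of the repaired system.
    void replay() {
        for (DeliverVerb v : historyLog) {
            app.execute(v);
        }
    }
}
```

Suppose the delivery code has a bug that reverses bodies. Submitting `new DeliverVerb("m", "Hello")` then stores "olleH". To repair, call `rewind()`, set `app.deliveryCode = UnaryOperator.identity()`, and call `replay()`; m then holds "Hello". Any change written straight into `app.state` without a verb is gone after the cycle. That illustrates how verbs scope the history that the 3Rs preserve.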
  The fundamental roles of verbs
• Providing application-independence
  – verbs encapsulate application semantics, but remain
    semi-opaque to undo manager
• Integration of repair into history
  – high-level specification of intent makes verbs
    relatively independent of system changes
  – verbs are re-executed, not restored, so they inherit
    effects of repairs
• Scoping restored history
  – only changes logged as verbs will be preserved by 3Rs
     » effects of bugs, corruption, human error are discarded
  – can reason about what is preserved/lost in 3R cycle
                                                        Slide 13
  Managing external inconsistency
• External inconsistency == time paradox?
  – system is internally consistent after a 3R cycle
  – but external observers see inexplicable state changes
  – external inconsistency is OK unless affected state
    was externalized (observed) before the 3R cycle

• Coping with external inconsistency
  – cannot eliminate
  – must manage: ignore, explain, compensate, encompass

• Verbs let us manage external inconsistency

                                                    Slide 14
 Managing inconsistency with verbs
• To detect inconsistencies:
  – verbs specify the state that they depend upon
  – undo manager tracks signatures of that state
  – if verb is altered or if signatures don’t match, there
    is an inconsistency
     » applications supporting relaxed consistency can replace
       signature-check with arbitrary consistency predicates
• To detect state viewed externally:
  – verbs indicate what state they externalize
     » example: IMAP fetch verb externalizes email message
• To handle externalized inconsistencies:
  – verb supplies compensation functions
                                                         Slide 15
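The detection rule on this slide works in three steps. The undo manager records signatures of the state a verb depends on, compares them during replay, and compensates only if the verb externalized that state. A minimal sketch follows, under these assumptions:
- a signature is the SHA-256 hash of the state's content, computed with the JDK's `MessageDigest`;
- state is read through a lookup function from UID to content;
- `DependencyTracker` and `checkAndCompensate` are hypothetical names.

The "verb is altered" case and application-supplied relaxed predicates are not modeled.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-verb tracker for content dependencies. Assumes the state
// exists; existence dependencies would be checked separately.
class DependencyTracker {
    private final Map<String, String> recorded = new HashMap<>();

    // Signature of one piece of state: hex-encoded SHA-256 of its UTF-8 content.
    static String signature(String content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);  // SHA-256 is required on every JDK
        }
    }

    // Original execution: remember the signature of each item the verb depends on.
    void record(List<String> uids, Function<String, String> read) {
        for (String uid : uids) {
            recorded.put(uid, signature(read.apply(uid)));
        }
    }

    // Replay: an inconsistency exists if any dependency's signature has changed.
    boolean inconsistent(List<String> uids, Function<String, String> read) {
        for (String uid : uids) {
            if (!signature(read.apply(uid)).equals(recorded.get(uid))) {
                return true;
            }
        }
        return false;
    }

    // Only inconsistencies in externalized state need the verb's compensation.
    boolean checkAndCompensate(List<String> uids, Function<String, String> read,
                               boolean externalizes, Runnable compensation) {
        boolean needed = externalizes && inconsistent(uids, read);
        if (needed) {
            compensation.run();
        }
        return needed;
    }
}
```

A cryptographic hash keeps the logged signature small while making accidental matches between different contents very unlikely. An application with relaxed consistency would substitute its own predicate for the equality test.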
     Email example: original timeline
• System state: m delivered to Inbox with input "Hello"; m's stored content then reads "olleH"; FetchMsg returns m as "olleH"

  Verb         Externalizes   ContentDep   ExistsDep      Logged
  DeliverMsg   none           none         Inbox          input "Hello"
  MoveMsg      none           none         Inbox, …
  FetchMsg     m              m            m, Folder1     Signature(m) = "olleH"

                                                                                      Slide 16
        Email example: replay timeline
• System state on replay: m, now in Folder1, reads "Hello" when FetchMsg is re-executed
• Verbs and logged data as in the original timeline; Signature(m) no longer matches the logged "olleH" => inconsistency

                                                                                      Slide 17
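The email example can be replayed in code. This minimal sketch makes several assumptions:
- a hypothetical delivery bug reverses message bodies (the slides show "Hello" becoming "olleH" but do not say why);
- folders are in-memory maps;
- `String.hashCode()` stands in for a real signature;
- constructing a fresh `EmailTimeline` stands in for rewinding.

`EmailTimeline` and `fetchIsInconsistent` are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mail store: folder UID -> (message UID -> body).
class EmailTimeline {
    final Map<String, Map<String, String>> folders = new HashMap<>();
    final boolean deliveryBug;  // true in the original timeline, false after repair

    EmailTimeline(boolean deliveryBug) {
        this.deliveryBug = deliveryBug;
        folders.put("Inbox", new HashMap<>());
        folders.put("Folder1", new HashMap<>());
    }

    // DeliverMsg: ExistsDep Inbox; the log keeps the input "Hello", not the stored body.
    void deliverMsg(String m, String input) {
        String body = deliveryBug ? new StringBuilder(input).reverse().toString() : input;
        folders.get("Inbox").put(m, body);
    }

    // MoveMsg: moves m from one folder to another.
    void moveMsg(String m, String from, String to) {
        folders.get(to).put(m, folders.get(from).remove(m));
    }

    // FetchMsg: externalizes m; ContentDep m; ExistsDep m, Folder1.
    String fetchMsg(String m, String folder) {
        return folders.get(folder).get(m);
    }

    // Execute the three logged verbs in order; return what FetchMsg showed the user.
    String run() {
        deliverMsg("m", "Hello");
        moveMsg("m", "Inbox", "Folder1");
        return fetchMsg("m", "Folder1");
    }

    // The original run logs Signature(m); a fresh, repaired run replays the same verbs.
    static boolean fetchIsInconsistent() {
        int loggedSignature = new EmailTimeline(true).run().hashCode();
        int replayedSignature = new EmailTimeline(false).run().hashCode();
        return loggedSignature != replayedSignature;
    }
}
```

The original run externalizes "olleH" and the repaired replay produces "Hello", so the signature check flags FetchMsg. Because FetchMsg externalized m, its compensation (for example, explanatory text) is what would handle the mismatch.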
         Recap: 3R architecture
• Goal: application-neutral implementation of 3R’s
  – verb abstraction couples generic undo manager to app.
  – verbs provide tools to reason about 3R behavior

• Challenges
  – integrating history and repair during replay
     » re-executing verbs restores intent of history
  – managing inconsistency in externally-visible state
     » verbs track externalization, state dependencies, and
       define compensations

                                                         Slide 18
• Prototype implementation of 3R primitives
  nearly complete
  – app-independent undo manager written in Java
  – all APIs defined as Java interfaces
  – Network Appliance filer as time-travel storage layer
  – BerkeleyDB as history log
• First target app: web-based email service
  – 3R-enhanced JavaMail API provider classes
       » plus additional hooks to verb-ify operator maintenance
         tasks like account creation
  – JWebMail web front-end
  – RDBMS-based backend mail store (DB2 or MySQL)
  – implementation in progress
                                                           Slide 19
      Open issues & future work
• Resource impact of the 3R’s
  – what are the performance/space penalties for the 3R’s?
• Verb definition
  – can we specify verbs & consistency policy declaratively?
• Providing the 3R’s at multiple granularities
  – can we track & manage cross-granularity dependencies?
• Measuring the dependability benefit of 3R’s
  – how do we build recovery/dependability benchmarks?
• Other uses for verb-based characterizations
  – easy georeplication? online self-checking? automatic
    verification of upgrades?
                                                      Slide 20
• We can build time travel for computers
  – using the 3R’s: Rewind, Repair, Replay
• An architecture for the 3R primitives
  – generic undo manager coupled to application by verbs
• Verbs are a useful abstraction for the 3R’s
  – can use to reason about effects of 3R’s on state
  – help address problem of external inconsistencies
• Prototype 3R-enabled email system under construction
  – hope to demonstrate increased dependability and
    faster recovery from problems

                                                       Slide 21
    Rewind, Repair, Replay:
Three R’s to improve dependability

       For more information:
Backup slides

                Slide 23
          Verbs vs. transactions
• Both encapsulate state-altering events
• But, unlike transactions:
  – verbs are higher-level, recording end-user intent, not
    specific state changes
  – verbs do not depend on internal data models (but do
    depend on external protocols)
     » transactions are the reverse
  – verbs do not necessarily conform to ACID consistency
     » verbs inherit consistency model provided by application
       at the external-protocol level

                                                         Slide 24
            Implementing verbs
• Verbs are defined by a type hierarchy
  – base type defines interfaces for state dependencies,
    externalizations, predicates, compensations
  – applications subclass the base type for their verbs
     » additions to the type are opaque to the undo manager
• Referencing state
  – all user-visible state named by time-invariant UIDs
  – undo manager requires signature method for all state
• Consistency predicates and compensations are
  application-supplied functions
  – they encode the app’s external consistency model

                                                        Slide 25
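Here is a sketch of the type hierarchy described above, under stated assumptions. `Verb`, `FetchMsg`, and every method name are hypothetical; the slides say only that the base type covers dependencies, externalizations, predicates, and compensations. Signatures are passed in as strings, and the `FetchMsg` dependencies are copied from the email example.

```java
import java.util.List;
import java.util.Map;

// Hypothetical base type seen by the undo manager; method names are illustrative.
abstract class Verb {
    // UIDs of state whose existence the verb depends on.
    abstract List<String> existsDeps();

    // UIDs of state whose content the verb depends on.
    abstract List<String> contentDeps();

    // UIDs of state the verb reveals to external observers.
    abstract List<String> externalizes();

    // Default consistency predicate: every content signature must match exactly.
    // Applications with relaxed consistency override this.
    boolean consistent(List<String> originalSigs, List<String> replaySigs) {
        return originalSigs.equals(replaySigs);
    }

    // Default compensation: none. Applications override to explain or repair.
    String compensate() {
        return "no compensation";
    }

    // Re-execution against application state; opaque to the undo manager.
    abstract String execute(Map<String, String> state);
}

// Application subclass: its fields are application-specific; the undo manager
// sees only the base-type methods.
class FetchMsg extends Verb {
    private final String msgUid;
    private final String folderUid;

    FetchMsg(String msgUid, String folderUid) {
        this.msgUid = msgUid;
        this.folderUid = folderUid;
    }

    @Override
    List<String> existsDeps() { return List.of(msgUid, folderUid); }

    @Override
    List<String> contentDeps() { return List.of(msgUid); }

    @Override
    List<String> externalizes() { return List.of(msgUid); }

    @Override
    String compensate() {
        return "insert explanatory text into " + msgUid;
    }

    @Override
    String execute(Map<String, String> state) {
        return state.get(msgUid);
    }
}
```

Keeping the default predicate strict means an application opts into relaxed consistency explicitly by overriding `consistent`, rather than getting it by accident.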
                Defining verbs
• Currently, verbs are defined procedurally
  – provide dependency information via lists of state IDs
  – provide functions for special consistency predicates
  – provide functions for compensation

• Better: declarative specification
  – compile textual specification into verb code using
    libraries of predicates and compensation fns
  – reduces complexity of adding 3R’s to the application
  – increases confidence in undo system via easier testing

                                                     Slide 26
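Declarative specification is listed as future work, so the following is only a guess at its shape. It parses the `Key: uid, uid` notation used in the email example and looks up a compensation by name in a library. `VerbSpecCompiler`, the `Compensate` key, the `none` keyword, and the library entries are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class VerbSpecCompiler {
    // Hypothetical library of reusable compensation functions, keyed by name.
    static final Map<String, Function<String, String>> COMPENSATIONS = Map.of(
            "explain", uid -> "insert explanatory text into " + uid,
            "ignore", uid -> "no action for " + uid);

    // Parses "Key: a, b" lines into field -> UID list; "none" denotes an empty list.
    static Map<String, List<String>> parse(String spec) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (String line : spec.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;  // skip lines without a "Key:" prefix
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            List<String> items = new ArrayList<>();
            if (!value.isEmpty() && !value.equals("none")) {
                for (String item : value.split(",")) {
                    items.add(item.trim());
                }
            }
            fields.put(key, items);
        }
        return fields;
    }

    // Resolves the "Compensate" field against the library, defaulting to "ignore".
    static Function<String, String> compensation(Map<String, List<String>> fields) {
        List<String> names = fields.getOrDefault("Compensate", List.of());
        String name = names.isEmpty() ? "ignore" : names.get(0);
        Function<String, String> f = COMPENSATIONS.get(name);
        if (f == null) {
            throw new IllegalArgumentException("unknown compensation: " + name);
        }
        return f;
    }
}
```

For example, parsing `"Externalizes: m\nContentDep: m\nExistsDep: m, Folder1\nCompensate: explain"` yields the FetchMsg dependency lists plus the `explain` compensation. A spec like this is easier to review and test than hand-written verb code.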
     External consistency policies
• Verbs capture external consistency policies
• Example: email
  – message order in folder is irrelevant
     » AppendMessage verb does not express dependency on
       content of target folder, only its existence
  – content of messages is relevant, except for headers
     » ReadMsg verb depends on hash of target message body;
       if changed, compensate by inserting explanatory text
• Example: e-commerce
  – order total depends on item prices, not descriptions
     » Checkout verb depends on prices of items in cart, not
       their hash-values; if sum of prices changed, compensate
       by emailing customer for approval
                                                        Slide 27
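The e-commerce policy can be expressed as a relaxed consistency predicate. This sketch assumes the application supplies prices as `BigDecimal` values and represents the approval email as a returned message string. `CheckoutVerb` and its methods are hypothetical.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical Checkout verb with a relaxed, application-supplied predicate.
class CheckoutVerb {
    private final BigDecimal originalTotal;  // sum of cart prices at original execution

    CheckoutVerb(List<BigDecimal> pricesAtCheckout) {
        this.originalTotal = sum(pricesAtCheckout);
    }

    static BigDecimal sum(List<BigDecimal> prices) {
        BigDecimal total = BigDecimal.ZERO;
        for (BigDecimal p : prices) {
            total = total.add(p);
        }
        return total;
    }

    // Consistent iff the order total is unchanged; item descriptions are never
    // consulted, so edits to them cannot trigger a compensation.
    boolean consistent(List<BigDecimal> pricesOnReplay) {
        return sum(pricesOnReplay).compareTo(originalTotal) == 0;  // compareTo ignores scale
    }

    // Compensation: stands in for emailing the customer for approval.
    String compensate(String customerEmail) {
        return "To " + customerEmail + ": your order total changed; please re-approve.";
    }
}
```

`compareTo` is used instead of `equals` because `BigDecimal.equals` treats 15.5 and 15.50 as different values.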
  External consistency policies (2)
• Example: auctions
  – new bid must be larger than prior bids
     » PlaceBid verb depends on content of all bids in bid set;
       if one is now larger than new bid, compensate by
       canceling new bid and informing bidder

                                                           Slide 28
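The auction rule follows the same pattern. This minimal sketch assumes bids are whole cents stored as `long` and that the rest of the bid set is handed to the predicate as a list. `PlaceBidVerb` is a hypothetical name.

```java
import java.util.List;

// Hypothetical PlaceBid verb for the auction policy.
class PlaceBidVerb {
    private final String bidder;
    private final long amountCents;

    PlaceBidVerb(String bidder, long amountCents) {
        this.bidder = bidder;
        this.amountCents = amountCents;
    }

    // Depends on the content of the whole bid set: on replay, the new bid must
    // still be strictly larger than every other bid.
    boolean consistent(List<Long> otherBidsCents) {
        for (long other : otherBidsCents) {
            if (other >= amountCents) {
                return false;
            }
        }
        return true;
    }

    // Compensation: cancel the new bid and inform the bidder.
    String compensate() {
        return "Bid of " + amountCents + " cents by " + bidder
                + " canceled: an equal or larger bid now exists";
    }
}
```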
         Application implications
• To support the 3R’s, an application must have:
  – a high-level, verb-structured interface/API for user,
    operator, and external actions
  – a state model where all user-visible state:
     » is nameable via the API
     » is tagged with GUIDs
     » supports a signature/hash method
  – a relaxed external consistency model that allows
    compensation for externalized inconsistent verbs

                                                       Slide 29
        Example: a 3R email store
        Figure: Transport, Store, and Auth. components with their interfaces (internal, IMAP, HTTP, LDAP), all sending verbs to the UndoMgr
• State
  – mailstores, folders, messages, user properties, aliases
• Verbs
  – transport: create/delete/alter mapping; deliver msg
  – directory: create/alter/delete user-entry;
    create/alter/delete filter-rule; add/remove maildrop
  – store: create/delete store; create/rename/delete
    folder; expunge folder; list folder; set folder flags;
    copy msg; append msg; fetch msg; set msg flags
                                                       Slide 30
