Live Data Center Migration across WANs:
A Robust Cooperative Context Aware Approach

Kobus Van der Merwe
with
K.K. Ramakrishnan and Prashant Shenoy

Motivation

• Most network-based services/applications involve components
  hosted in data centers
   • Internet:
       – Mail/Web servers, VoIP, IPTV, P2P directory services, etc.
   • VPNs:
       – Mail servers, financial/business applications, etc.
• Many of these services require 24x7 availability
   • Any downtime is unacceptable
       – At best it inconveniences users; at worst it has major business impact;
         typically it has financial implications
           • Recent well-publicized outages: Blackberry, Skype

• Objective of our work:
   • Business continuity in the face of data center outages, both
     planned (maintenance) and unplanned (disaster recovery)


Motivation cont.
Existing solutions to deal with outages are inadequate:
•   Local redundancy solutions
    – Component redundancy (hot-swappable), multiple network
      connections
         •   No protection against data center outages
•   Existing cross data center solutions
    – Instance replication
         •   Same content/service available in multiple locations
         •   Works well for stateless services (e.g., Web servers)
             – Not for stateful applications
    – Remote replication (either synchronous or asynchronous)
         •   Partial solutions
             – Typically deal only with storage
             – Not seamless; involve server downtime, IP address
               changes, etc.




Our approach
•   Basic approach:
    – Seamless live service migration across WANs
         •   Including all components: server, data, network
•   Cooperative, migration aware approach
    – Migration manager orchestrates migration across all three
      subsystems
•   In summary:
    – Planned outages
         •   Migration of both server and data
             – Live server migration
             – Performed once
         •   Atomic switchover of network to complete migration
    –    Unplanned outages
         •   “Continuous live migration”
             – Server and data continuously replicated to remote site
             – On failure, atomic switchover of network




Challenges

Seamless live server migration across WAN
•   LAN based live server migration enabled by virtual server
    technologies (Xen, VMware)
    – The image of the running virtual server is copied to a new physical
      platform (while the server is still running on the old platform)
    – Server state is synchronized between the two images
    – Migration software switches over to the new server with minimal
      downtime (tens of milliseconds)
    – New server is exactly the same as the old server (same IP address,
      network state, etc.)
    – Storage is handled through network attached storage (NAS), e.g., NFS
•   WAN based server migration
    – Use existing virtual server migration
         •   “Management” connectivity to remote site to enable image migration
    – Network stays intact
         •   Support to allow IP address to migrate with the (virtual) server
    – Migrate storage to remote site
         •   Server and storage remain consistent
•   Continuous live migration
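
The LAN based steps above (copy the image while the server runs, synchronize state, then switch over with a short pause) can be sketched as an iterative copy loop. This is a minimal illustration only, assuming a page-granular memory image and a made-up write trace; the function name `precopy_migrate`, the `stop_threshold` parameter, and the data are hypothetical and do not come from the slides or from any Xen or VMware API.

```python
def precopy_migrate(memory, write_trace, stop_threshold=1):
    """Copy a running server's pages, then pause briefly for the remainder.

    memory: dict page_id -> contents on the old platform.
    write_trace: list of dicts; entry i holds the pages the still-running
                 server rewrites while copy round i is in progress.
    """
    target = {}
    to_send = set(memory)               # first round: the whole image
    rounds = 0
    for writes in write_trace:
        for page in to_send:            # copy while the server keeps running
            target[page] = memory[page]
        rounds += 1
        memory.update(writes)           # pages dirtied during this round
        to_send = set(writes)
        if len(to_send) <= stop_threshold:
            break
    paused_pages = len(to_send)
    for page in to_send:                # server paused: synchronize the rest
        target[page] = memory[page]
    return target, rounds, paused_pages


old = {0: "a", 1: "b", 2: "c", 3: "d"}
trace = [{1: "b2", 2: "c2"}, {2: "c3"}, {}]
new, rounds, paused = precopy_migrate(old, trace)
print(new == old, rounds, paused)
```

Only the pages still dirty at the end are copied while the server is paused, which is why the downtime can stay small even when the full image is large.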




Networking Support

•   IP address migration:
    – Challenging to move IP addresses in current Internet
         •   Especially dynamically
         •   Isolate impact on the rest of the network
    – Routing protocols don’t change instantly
    – Connectivity changes not under data center control

•   Our approach:
    – Allow migration management system to initiate network
      connectivity change
         •   Network provides API to migration manager
    –    Time critical changes are kept local
         •   Network-wide (routing protocol) changes not time critical
    –    Use temporary tunnels to deal with mobility




IP Migration Primitive

[Figure: Migration software moves virtual server VSa from physical server PSa in Data Center A to physical server PSb in Data Center B. Network nodes shown: PEa, PEb, PEc, PEd. Legend: Physical Server (PS), Virtual Server (VS).]

Goal: Migrate Virtual Server “a” (VSa) with IP address IPa from Physical Server “a” (PSa) in data center “A” (DCa) to Physical Server “b” (PSb) in data center “B” (DCb)

Network part of migration
1. Migration software signals to “network” that IPa will (soon) migrate from PEa to PEb
2. “Network” creates a tunnel between PEa and PEb
3. Server migration executed between PSa and PSb
4. Migration software signals to “network” that switchover should take place
5. PEa switches all traffic towards IPa to tunnel between PEa and PEb which delivers the traffic to VSa in
   PSb. (Return traffic does not need to go through tunnel.)
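
The five steps can be read as a small orchestration sequence driven by the migration software. The sketch below is only an illustration of that ordering: the `Network` class, its method names, and the `migrate` helper are hypothetical stand-ins for whatever API the network would actually expose, which the slides do not specify.

```python
class Network:
    """Hypothetical stand-in for the API the network offers the migration manager."""

    def __init__(self):
        self.tunnels = set()
        self.redirects = {}   # ip -> (old PE, new PE) once traffic uses the tunnel
        self.events = []

    def announce_migration(self, ip, old_pe, new_pe):
        # Steps 1-2: the network learns of the move and creates the tunnel.
        self.tunnels.add((old_pe, new_pe))
        self.events.append("tunnel created")

    def switch_over(self, ip, old_pe, new_pe):
        # Steps 4-5: a local change at the old PE; no routing protocol involved.
        if (old_pe, new_pe) not in self.tunnels:
            raise RuntimeError("switchover requested before the tunnel exists")
        self.redirects[ip] = (old_pe, new_pe)
        self.events.append("traffic switched to tunnel")


def migrate(network, ip, old_pe, new_pe, move_server):
    network.announce_migration(ip, old_pe, new_pe)
    move_server()                                   # step 3: server migration
    network.events.append("server migrated")
    network.switch_over(ip, old_pe, new_pe)
    return network.redirects[ip]


net = Network()
moved = []
result = migrate(net, "IPa", "PEa", "PEb", lambda: moved.append("VSa on PSb"))
print(net.events)
```

Creating the tunnel before the server moves means the only time critical action at switchover is the redirect at PEa.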

IP Migration Primitive cont.

[Figure: Data Center A (PSa, VSa), Data Center B (PSb, VSa), network nodes PEa, PEb, PEc, PEd. Legend: Physical Server (PS), Virtual Server (VS).]

After the first five steps, server migration is done as far as migration software is concerned. Traffic towards IPa is
“dog-legged” through PEa, so a few more steps remain in the network:

1. PEb starts to advertise a route to IPa with high local preference. So at this point there are two valid paths
   towards IPa, one through PEa and the tunnel and another directly through PEb. As routers start to
   learn about the newly advertised path they will prefer the direct path towards IPa and the tunnel will
   “dry out”.
2. When PEa detects no more traffic flowing through the tunnel it withdraws the route for IPa (if it had a
   specific route for IPa) and tears down the tunnel.

IP Migration Primitive:
Takes care of planned maintenance without storage needs
(E.g., VoIP network element)
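
The way the tunnel “dries out” can be illustrated with a toy route-selection loop. This is a sketch under stated assumptions: the router names R1 to R3, the local preference values 100 and 200, and the one-router-per-step propagation are all made up for illustration, and real route selection considers more attributes than local preference.

```python
def best_route(routes):
    """Pick the route with the highest local preference (toy selection rule)."""
    return max(routes, key=lambda r: r["local_pref"])


via_pea = {"next_hop": "PEa", "local_pref": 100}   # old path, ends in the tunnel
via_peb = {"next_hop": "PEb", "local_pref": 200}   # newly advertised direct path

# Hypothetical routers that initially only know the old path.
routers = {name: [via_pea] for name in ("R1", "R2", "R3")}
tunnel_up = True
history = []

for name in routers:                    # the new route reaches one router per step
    routers[name].append(via_peb)
    through_tunnel = sum(
        best_route(table)["next_hop"] == "PEa" for table in routers.values()
    )
    history.append(through_tunnel)
    if through_tunnel == 0:
        # PEa sees no more tunnel traffic: withdraw its route, tear down the tunnel.
        for table in routers.values():
            table.remove(via_pea)
        tunnel_up = False

print(history, tunnel_up)
```

Both paths stay valid while the advertisement spreads, so no router is ever without a route to IPa.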
Data Storage

[Figure: two replication modes between a Local and a Remote storage system: Synchronous and Asynchronous.]



•   Existing WAN solutions: remote replication
    –    Maintain a primary/local and remote storage system
    –    Replicate data between primary and remote systems
    –    One of two modes:
         •   Synchronous: each write performed locally and remotely before return to
             “application”
             – Local and remote remain synchronized
             – Poor performance: both throughput and application latency
         •   Asynchronous: local and remote allowed to diverge, replicate a consistent
             “snapshot”
             – Good performance (high throughput, low (local) latency)
             – Potential data loss because of divergence
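
The difference between the two modes can be shown with a toy key-value store. This is a minimal sketch, assuming in-memory dictionaries stand in for the local and remote storage systems; the class `ReplicatedStore` and its methods are hypothetical and do not model any particular storage product.

```python
class ReplicatedStore:
    """Toy local/remote pair illustrating synchronous vs asynchronous replication."""

    def __init__(self, mode):
        if mode not in ("sync", "async"):
            raise ValueError("mode must be 'sync' or 'async'")
        self.mode = mode
        self.local = {}
        self.remote = {}
        self.pending = []               # writes not yet shipped (async only)

    def write(self, key, value):
        self.local[key] = value
        if self.mode == "sync":
            self.remote[key] = value    # remote write completes before returning
        else:
            self.pending.append((key, value))

    def ship_snapshot(self):
        # Async: bring the remote copy up to a consistent point in time.
        for key, value in self.pending:
            self.remote[key] = value
        self.pending.clear()

    def lost_on_failure(self):
        # Keys whose latest local value would be lost if the local site failed now.
        return {
            k for k, v in self.local.items()
            if k not in self.remote or self.remote[k] != v
        }


sync_store = ReplicatedStore("sync")
sync_store.write("a", 1)
sync_store.write("b", 2)

async_store = ReplicatedStore("async")
async_store.write("a", 1)
async_store.ship_snapshot()
async_store.write("b", 2)               # written after the last snapshot
print(sync_store.lost_on_failure(), async_store.lost_on_failure())
```

The synchronous store never diverges but pays for a remote write on every call; the asynchronous store returns immediately and risks losing whatever was written since the last shipped snapshot.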


Migration Aware Replication

[Figure: Local and Remote storage systems with a Switch selecting between Asynchronous and Synchronous replication.]

•   Our approach:
    – Remote replication that can seamlessly move between
      synchronous and asynchronous replication
    – Allow replication mode to be controlled by migration
      management system:
       •   Allow bulk of data to be replicated asynchronously
       •   Switch to synchronous when needed
           – Final part of server migration process

IP Migration Primitive + Migration Aware Replication:
Takes care of planned maintenance with storage needs
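
One way to picture a replication mode controlled by the migration management system is a replicator that ships a backlog in the background and is told when to become synchronous. The sketch below is illustrative only: `MigrationAwareReplicator`, `drain`, and `switch_to_sync` are hypothetical names, and flushing the backlog at the moment of the switch is an assumption about how the transition could be made safe.

```python
class MigrationAwareReplicator:
    """Toy replicator whose mode is set by a (hypothetical) migration manager."""

    def __init__(self):
        self.mode = "async"
        self.remote = []                # blocks present at the remote site, in order
        self.backlog = []               # blocks written locally but not yet shipped

    def write(self, block):
        if self.mode == "sync":
            self.remote.append(block)   # shipped before the write returns
        else:
            self.backlog.append(block)

    def drain(self, max_blocks):
        # Background shipping while asynchronous: the bulk of the data moves here.
        shipped = self.backlog[:max_blocks]
        self.backlog = self.backlog[max_blocks:]
        self.remote.extend(shipped)

    def switch_to_sync(self):
        # Called for the final part of server migration: flush, then go synchronous.
        self.remote.extend(self.backlog)
        self.backlog = []
        self.mode = "sync"


rep = MigrationAwareReplicator()
for block in range(5):
    rep.write(block)
rep.drain(3)
backlog_before_switch = list(rep.backlog)
rep.switch_to_sync()
rep.write(5)
print(rep.remote, rep.backlog)
```

After the switch the remote copy matches the local write order exactly, which is the property needed when the server itself is about to move.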


Unplanned Outages
•   Conflicting metrics of concern
    – Recovery point objective (RPO)
          •   How much data loss is acceptable?
    – Recovery time objective (RTO)
          •   How long can service be down?
    – Cost (overhead of protection)
•   Range of meanings for “unplanned”
    – Catastrophic instantaneous failure
          •   No notice whatsoever
    – But also imminent failure scenarios
          •   Imminent equipment failure (e.g., increase in disk errors; imminent
              failure of fiber)
          •   Developing natural/man-made disasters
              – E.g., flooding/steam pipe burst in NY, probably even with 911
              – Minutes to hours to react

•   Existing remote replication solutions deal with storage
    – No support for server migration

•   Our goal:
    – Replicate data and server to allow for seamless failover

Application state requirements
•   Limited application state:
    –     E.g., VoIP network element that maintains call state (for 3-way calling and
          mid-call events), or VoD servers (for fast-forward, random access events)
    –     Lost session state => application impact
          •   Inconvenience
    –     RTO small, RPO medium
          •   Some state loss is tolerable (drop a few calls), but service has to stay up
    –     Instrument application to initiate partial migration when new state has been
          created
•   Stateful applications:
    –     E.g., e-commerce applications (shopping cart, auction sites)
    –     Lost session state => application impact
          •   (At best) inconvenience, (at worst) application correctness, monetary impact
    –     RTO small, RPO small (minimize state loss, site has to stay up)
    –     Continuous (incremental) server migration
•   High integrity applications
    –     E.g., financial transactions, other database applications
    –     RTO medium, RPO very small (absolutely no data loss, rather some downtime)
    –     Reduce RTO with continuous (incremental) server migration


Continuous Server Migration
Enabling Technology: VS record/replay

[Figure: RECORD: on the physical server (PS), start recording by taking a snapshot of the virtual server (VS), then record execution state. REPLAY: restore the snapshot, then replay the execution state.]



 •   Virtual server record/replay: available from VMware
     –   Efficient recording: track “external” events + times
     –   Synchronize events with VM state during replay
     –   Developed as a debugging tool
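
The record/replay idea (snapshot once, log only external events and their times, then re-apply them in order) can be sketched with a toy deterministic server. This is an illustration under strong assumptions: `TinyServer`, `record`, and `replay` are hypothetical, the server's state depends only on the logged events, and nothing here reflects how VMware's feature is implemented.

```python
import copy


class TinyServer:
    """Toy deterministic server: state changes only through external events."""

    def __init__(self):
        self.state = {"handled": 0, "last": None}

    def handle(self, event):
        self.state["handled"] += 1
        self.state["last"] = event


def record(server, timed_events):
    """Take a snapshot, then log each external event with its time as it is handled."""
    snapshot = copy.deepcopy(server.state)
    log = []
    for when, event in timed_events:
        log.append((when, event))
        server.handle(event)
    return snapshot, log


def replay(snapshot, log):
    """Restore the snapshot and re-apply the logged events in time order."""
    replica = TinyServer()
    replica.state = copy.deepcopy(snapshot)
    for _when, event in sorted(log, key=lambda entry: entry[0]):
        replica.handle(event)
    return replica


primary = TinyServer()
primary.handle("boot")                  # state that exists before recording starts
snapshot, log = record(primary, [(1, "call-setup"), (2, "mid-call"), (3, "hangup")])
replica = replay(snapshot, log)
print(replica.state == primary.state)
```

Because only the events are logged rather than every state change, the log stays small relative to the server image, which is what makes shipping it to a remote site plausible.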
Continuous Server Migration
With migration aware replication

[Figure: Local: RECORD, REPLICATE to Remote: REPLAY.]

•    Asynchronously replicate initial snapshot
•    Replication of execution state
     –   Asynchronous if application can tolerate some state loss and execution state
         represents a consistent checkpoint
     –   Synchronous otherwise

IP Migration Primitive + Migration Aware Replication + Continuous Server Migration:
Takes care of unplanned outages
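
The rule for choosing how execution state is replicated reduces to a two-condition check. The sketch below simply encodes that rule; the function name and the two example inputs are hypothetical, and whether a given application's execution state forms a consistent checkpoint is assumed here rather than derived.

```python
def replication_mode(tolerates_state_loss, consistent_checkpoint):
    """Asynchronous only when both conditions hold; synchronous otherwise."""
    if tolerates_state_loss and consistent_checkpoint:
        return "asynchronous"
    return "synchronous"


# Hypothetical examples: a service that can drop a few sessions,
# and one that must not lose any data.
tolerant = replication_mode(tolerates_state_loss=True, consistent_checkpoint=True)
strict = replication_mode(tolerates_state_loss=False, consistent_checkpoint=True)
print(tolerant, strict)
```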


Status

•   Migration aware replication
    – Key building blocks prototyped
          •   “Semantic Aware Replication” project
          •   Gal Niv (UMass)
•   WAN live migration
    – Key building blocks prototyped (without storage)
          •   “Live virtual router migration” project
          •   Yi Wang (Princeton)
•   Continuous Server Migration
    – Just getting off the ground


•   Work in progress
    – Many open issues remain!

