On Replication


   Yin Chen
       July 2006
 • What is replication? Why do we need it? What types are there?
 • Investigation of existing technologies
   –   IBM SQL replication
   –   Sybase replication
   –   Oracle replication
   –   MySQL replication
   –   Globus DRS
   –   EGEE RMS
   –   SRB
 • Our project
   – Goals
   – Solutions
   – Features
What is replication?
  • Copying of data and synchronization of the copies
  • Is not caching
     – Client phenomenon
     – Only for improving response time
  • Is not a backup (a backup is not automatically
    overwritten when the original data is modified)
  • Is not a replicated system, which also has to:
     – Decide when and where to copy
     – Optimize (how many replicas are needed …)
     – Grow or shrink the replication tree
Why do we need it?
  • Data consolidation (central auditing and analysis)
  • Data distribution (for branch offices)
  • Performance
     – Access efficiency (moving data closer to applications)
     – Load balance (distributing access load)
     – Security (data protection)
     – Availability (off-line access)
     – Reliability (disaster recovery, avoiding single
       point of failure)
  • Data Grid (to improve availability, response
    time, fault tolerance)
  • Digital libraries (copying digital documents, indexes … )
Replication types
  • Synchronous Replication:
    What it is: updating two stores at the same time; rolling
      back if either update fails
    Benefits: high availability / automatic fail-over / minimal data loss
    Usages: disaster recovery
    Drawbacks: network efficiency / scalability / cost / less flexibility
  • Asynchronous Replication:
    What it is: changes are captured on the primary store
      and propagated either immediately or on a schedule
    Benefits: low cost / scalability / flexibility
    Usages: load balancing / off-line access / access efficiency
    Drawbacks: possible data loss / network bandwidth
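The difference between the two types can be sketched as a toy, in-memory model. It is a sketch under stated assumptions, not any product's implementation: `Store`, `SyncReplicator` and `AsyncReplicator` are hypothetical names, a store "fails" by raising an exception, and the asynchronous queue is a plain list drained by an explicit `propagate()` call that stands in for immediate or scheduled propagation.

```python
class StoreError(Exception):
    """Raised by a store that is unavailable (simulated failure)."""


class Store:
    """A toy key-value store that can be switched into a failing state."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.available = True

    def write(self, key, value):
        if not self.available:
            raise StoreError(f"{self.name} is unavailable")
        self.data[key] = value


class SyncReplicator:
    """Synchronous: update both stores together; roll back if one fails."""

    def __init__(self, primary, replica):
        self.primary, self.replica = primary, replica

    def write(self, key, value):
        # Remember the old primary value so the write can be undone.
        missing = object()
        old = self.primary.data.get(key, missing)
        self.primary.write(key, value)
        try:
            self.replica.write(key, value)
        except StoreError:
            # Roll back the primary so both copies stay identical.
            if old is missing:
                del self.primary.data[key]
            else:
                self.primary.data[key] = old
            raise


class AsyncReplicator:
    """Asynchronous: commit on the primary, queue the change for later."""

    def __init__(self, primary, replica):
        self.primary, self.replica = primary, replica
        self.pending = []  # captured changes not yet applied to the replica

    def write(self, key, value):
        self.primary.write(key, value)
        self.pending.append((key, value))

    def propagate(self):
        # Runs immediately or on a schedule. Changes still queued when the
        # primary is lost are the "possible data loss" drawback.
        while self.pending:
            key, value = self.pending[0]
            self.replica.write(key, value)  # raises if the replica is down
            self.pending.pop(0)             # drop only after it was applied
```

With a failing replica, `SyncReplicator.write` raises and leaves the primary unchanged, whereas `AsyncReplicator.write` succeeds at once and the change waits in `pending` until `propagate()` runs.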
Existing technologies

IBM Replication
  • WebSphere Information Integrator V8.2
  • Supports multi-vendor databases
  • Admin: creates replication criteria → control tables
  • Capture: uses the log or triggers to capture changes → temporary (staging) tables
  • Apply: applies the accumulated transactions to the target DB on a schedule
  • Alert Monitor: monitors replication and notifies users
  • Supports after-image copies / before-image copies (allowing rollback)
  • Allows copying subsets, simple views, and complex joins and unions
  • Asynchronous replication, with a user-specified schedule
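The Capture/Apply split can be illustrated with a small in-memory model. This is not the DB2 API: `Change`, `capture`, `apply_changes`, `undo`, the tuple log format and the staging list are hypothetical stand-ins for the recovery log, the staging tables and the Apply program's scheduled cycle. Apply copies after-images; the before-image kept with each change is what makes a rollback possible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """One captured row change, with before- and after-images."""
    key: str
    before: Optional[dict]  # None for an insert
    after: Optional[dict]   # None for a delete


def capture(log, staging, start):
    """Copy log entries from position `start` into the staging table.

    Returns the new log position so the next capture cycle resumes there.
    """
    for key, before, after in log[start:]:
        staging.append(Change(key, before, after))
    return len(log)


def apply_changes(staging, target):
    """Scheduled Apply cycle: replay accumulated after-images in order."""
    while staging:
        change = staging.pop(0)
        if change.after is None:
            target.pop(change.key, None)  # the source row was deleted
        else:
            target[change.key] = dict(change.after)


def undo(change, target):
    """Roll a single change back on the target using its before-image."""
    if change.before is None:
        target.pop(change.key, None)  # undo an insert
    else:
        target[change.key] = dict(change.before)
```

Keeping capture and apply decoupled through the staging table is what lets the apply side run on its own schedule, independent of when the source transactions committed.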
Sybase Replication
  • A pioneer, since 1993
  • “Publish-and-subscribe” approach
  • Replication Agent: runs on each publisher and detects changes based on the logs
  • Replication Server: applies changes to target DBs (using pre-configured
    intelligent routes)
  • Replication Server Manager: GUI-based; manages and monitors the P2P environment
  • Stable Queues: temporary storage of data that ensures no data is lost
  • Designed to provide high performance
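The role of the stable queues can be sketched with a hypothetical `StableQueue` backed by a local file: a message is written (and flushed to disk) before it is forwarded, and it is removed only after the subscriber has applied it, so a crash or network failure between those steps loses nothing. This illustrates the idea only; it is not Sybase's on-disk format or API, and `deliver` is an illustrative name.

```python
import json
import os


class StableQueue:
    """A durable FIFO: messages survive a restart until acknowledged."""

    def __init__(self, path):
        self.path = path
        self.items = []
        if os.path.exists(path):
            with open(path) as f:
                self.items = [json.loads(line) for line in f if line.strip()]

    def _persist(self):
        # Write a full copy, fsync it, then atomically swap it into place.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            for item in self.items:
                f.write(json.dumps(item) + "\n")
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def put(self, item):
        """Publisher side: store a change durably before forwarding it."""
        self.items.append(item)
        self._persist()

    def peek(self):
        return self.items[0] if self.items else None

    def ack(self):
        """Drop the head message once the subscriber has applied it."""
        self.items.pop(0)
        self._persist()


def deliver(queue, subscriber):
    """Forward queued changes in order; stop (keeping the rest) on failure."""
    while (msg := queue.peek()) is not None:
        try:
            subscriber(msg)
        except ConnectionError:
            return False  # message stays queued for the next attempt
        queue.ack()
    return True
```

Re-opening the queue from the same path after a failure resumes delivery from the first unacknowledged change.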
Oracle Replication
  Multimaster Replication
  • P2P structure
  • Changes are pushed to every other site (synchronously or asynchronously)
  • Conflicts may happen (update conflicts / uniqueness conflicts / delete conflicts)
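A common way to detect these conflicts is to ship the row's old values along with the new ones and compare them with the receiving site's current row. The sketch below assumes that scheme, with hypothetical names (`detect_conflict`, `apply_change`) and rows held as dicts keyed by primary key; it only classifies the three conflict types listed above and does not resolve them.

```python
def detect_conflict(table, change, unique_cols=()):
    """Classify a change pushed from another master site.

    `table` maps primary key -> row dict at the receiving site.
    `change` has "op" ("insert"/"update"/"delete"), "key", "old" (the row
    as the sender saw it before the change, or None) and "new" (the row
    after the change, or None).
    Returns None if the change can be applied, else the conflict type.
    """
    op, key = change["op"], change["key"]
    current = table.get(key)
    if op == "delete":
        # Row already gone here, or changed here since the sender saw it.
        if current is None or current != change["old"]:
            return "delete conflict"
        return None
    if op == "update" and current != change["old"]:
        return "update conflict"
    if op == "insert" and current is not None:
        return "uniqueness conflict"  # primary key already present
    # Inserts and updates must not duplicate a unique value of another row.
    new = change["new"]
    for col in unique_cols:
        for other_key, row in table.items():
            if other_key != key and row.get(col) == new.get(col):
                return "uniqueness conflict"
    return None


def apply_change(table, change):
    """Apply a change that detect_conflict() accepted."""
    if change["op"] == "delete":
        del table[change["key"]]
    else:
        table[change["key"]] = dict(change["new"])
```

A real system would then hand a detected conflict to a resolution method (for example "latest timestamp wins"); that step is deliberately left out here.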
  Materialized View Replication
  • One master site manages several non-master sites (each keeping a full or
    partial copy)
  • Updatable
  • Refresh (fast refresh / complete refresh / force refresh)
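The three refresh options differ in how much work they do. The sketch below assumes a hypothetical materialized view defined as a row filter over a master table (a partial copy), plus a list of changed keys recorded on the master, which plays the role that a materialized view log plays for Oracle's fast refresh. `complete_refresh`, `fast_refresh` and `force_refresh` are illustrative names, not Oracle calls.

```python
def complete_refresh(master, predicate):
    """Rebuild the whole view by re-running its defining query."""
    return {k: dict(r) for k, r in master.items() if predicate(r)}


def fast_refresh(view, master, mv_log, predicate):
    """Re-examine only the keys recorded in the change log."""
    for key in mv_log:
        row = master.get(key)
        if row is not None and predicate(row):
            view[key] = dict(row)
        else:
            # Deleted on the master, or no longer matches the view's filter.
            view.pop(key, None)
    mv_log.clear()  # the log has been consumed by this refresh
    return view


def force_refresh(view, master, mv_log, predicate):
    """Fast refresh when a change log is available, else a complete one."""
    if mv_log is not None:
        return fast_refresh(view, master, mv_log, predicate)
    return complete_refresh(master, predicate)
```

A fast refresh touches only the logged keys, so its cost tracks the volume of changes rather than the size of the master table.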
MySQL Replication
  • Basic replication services, using a lightweight master-slave model
  • The master writes updates to its logs; the slave reads and executes the
    queries from the master’s logs
  • The slave checks the results on both sites; replication stops if a query
    succeeds on only one site
  • This simple structure can be combined arbitrarily to build complex
    (hybrid) replication topologies:
     1. simple master/slave
     2. one slave, two masters
     3. dual masters
     4. dual masters with slaves
     5. master ring
     6. master ring with slaves
  • In a slow network, it is difficult for a slave to catch up with the
    master (improved in 4.0 by adding relay logs)
  • The master has to be locked or restarted to take the initial snapshot copy
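The master/slave mechanism above (a master log, a relay log on the slave, and replay of the logged statements) can be modelled as follows. This is a toy model, not MySQL code: statements are Python callables, `Master`, `Slave`, `fetch` and `replay` are hypothetical names, and the outcome check mirrors the rule that replication stops when a statement succeeds on only one site. Splitting `fetch` (copying) from `replay` (executing) is the idea behind the relay log: the slave can keep pulling the master's log even while execution lags behind.

```python
class ReplicationStopped(Exception):
    """Raised when a statement's outcome differs between master and slave."""


class Master:
    def __init__(self):
        self.data = {}
        self.log = []  # (statement, succeeded_on_master)

    def execute(self, stmt):
        try:
            stmt(self.data)
            ok = True
        except Exception:
            ok = False
        self.log.append((stmt, ok))
        return ok


class Slave:
    def __init__(self, master):
        self.master = master
        self.data = {}
        self.relay_log = []
        self.read_pos = 0   # how far into the master's log we have copied
        self.running = True

    def fetch(self):
        """Copy new master log entries into the local relay log."""
        new = self.master.log[self.read_pos:]
        self.relay_log.extend(new)
        self.read_pos += len(new)

    def replay(self):
        """Execute relay-log statements, comparing outcomes with the master."""
        while self.running and self.relay_log:
            stmt, master_ok = self.relay_log[0]
            try:
                stmt(self.data)
                slave_ok = True
            except Exception:
                slave_ok = False
            if slave_ok != master_ok:
                # Leave the statement in the relay log so it can be retried
                # after an administrator fixes the data.
                self.running = False
                raise ReplicationStopped("statement succeeded on only one site")
            self.relay_log.pop(0)
```

Making a change directly on the master's data (outside the logged statements) is enough to make the two sites diverge and stop replication, as the usage in the slide's rule suggests.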
Globus DRS

  • A client creates a request file (requested file name and target
    location) and sends it to DRS
  • The Replicator checks the user’s credentials and queries the RLI
    (Replica Location Index) to find the LRCs (Local Replica Catalogs)
    that contain mappings for the requested file
  • It also queries each of those remote LRCs to get the physical file
    names, and selects the best one
  • It then starts RFT (Reliable File Transfer) to transfer the files
  • Finally, it registers the new replica in its LRC; the LRC then
    updates the RLI to make the replica visible
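The DRS steps above can be traced with a small simulation. Everything here is hypothetical: the catalogs are dicts, `credentials_ok` stands in for the credential check, selection of the "best" replica is reduced to the lowest value of a caller-supplied `cost` function, and `transfer` stands in for an RFT transfer. None of these are Globus APIs.

```python
def replicate(request, rli, lrcs, site, credentials_ok, cost, transfer):
    """Run one DRS-style replication request at `site`.

    request: {"user": ..., "lfn": logical file name, "target": target URL}
    rli:     logical name -> set of sites whose LRC holds mappings for it
    lrcs:    site -> {logical name -> [physical file names]}
    """
    if not credentials_ok(request["user"]):
        raise PermissionError("invalid credentials")
    lfn = request["lfn"]
    # 1. Ask the index which catalogs know this logical file.
    sites = rli.get(lfn, set())
    # 2. Ask each of those catalogs for the physical copies.
    candidates = [pfn for s in sorted(sites)
                  for pfn in lrcs.get(s, {}).get(lfn, [])]
    if not candidates:
        raise FileNotFoundError(lfn)
    # 3. Pick the "best" source (here simply the cheapest one).
    source = min(candidates, key=cost)
    # 4. Transfer the file.
    transfer(source, request["target"])
    # 5. Register the new replica locally, then update the index.
    lrcs.setdefault(site, {}).setdefault(lfn, []).append(request["target"])
    rli.setdefault(lfn, set()).add(site)
    return source
```

After step 5, a later lookup of the same logical name through the index also finds the new copy at the requesting site.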

EGEE RMS

  • Designed for replicating large, read-only files among
    heterogeneous resources
  • Implements file catalogues:
     – Replica Location Service: maps a replica’s Grid Unique
       ID (GUID) to its physical location
     – Local Replica Catalogues: provide information about
       replicas for a single VO
     – Replica Metadata Catalogue: maps a file’s logical name
       to its GUID
     – LCG File Catalogue: used to address performance issues
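The two-step lookup these catalogues implement (logical name → GUID → physical locations) can be sketched with plain dictionaries. The class and method names are hypothetical and the file names are made up; the real services are remote catalogues, not in-process maps. One benefit of the GUID indirection is that replicas are recorded once per file, however many logical names point at it.

```python
import uuid


class ReplicaCatalogue:
    """Logical name -> GUID (metadata side); GUID -> replicas (location side)."""

    def __init__(self):
        self.lfn_to_guid = {}
        self.guid_to_pfns = {}

    def register_file(self, lfn, pfn):
        """Register a new file under a fresh GUID with its first replica."""
        if lfn in self.lfn_to_guid:
            raise ValueError(f"{lfn} already registered")
        guid = str(uuid.uuid4())
        self.lfn_to_guid[lfn] = guid
        self.guid_to_pfns[guid] = [pfn]
        return guid

    def add_replica(self, lfn, pfn):
        """Record an extra physical copy of an already registered file."""
        self.guid_to_pfns[self.lfn_to_guid[lfn]].append(pfn)

    def replicas(self, lfn):
        """Resolve a logical name to all of its physical locations."""
        guid = self.lfn_to_guid.get(lfn)
        return list(self.guid_to_pfns.get(guid, []))
```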
SRB

  • One or more Master daemon processes, each with an SRB Agent running on it
  • Dispatcher: monitors the input port and passes each incoming request
    to a handler:
     – High Level Request Handler (HLRH): can retrieve metadata from MCAT
     – Low Level Request Handler (LLRH): can retrieve data through file
       system drivers (Unitree, HPSS) and DBMS drivers (DB2, Oracle,
       ObjectStore, Illustra)
  • MCAT: a database system storing metadata; enables file searching by
    attributes
  • Supports synchronous/asynchronous replication
  [Figure: SRB architecture: Master with SRB Agent, Remote SRB, Dispatcher,
  HLRH, LLRH, MCAT, file system and DBMS drivers]
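The dispatcher's routing can be sketched as a simple lookup: metadata requests go to the high-level handler backed by the MCAT, and data requests go to the low-level handler, which picks a storage driver by resource type. All names, the request format and the driver table are hypothetical, as is the assumption that the low-level handler finds an object's location in the MCAT; SRB's actual protocol and driver interfaces are not shown here.

```python
class MCAT:
    """Metadata catalogue: object name -> attributes (including its location)."""

    def __init__(self):
        self.objects = {}

    def register(self, name, **attrs):
        self.objects[name] = attrs

    def search(self, **criteria):
        """Find objects whose attributes match all the given criteria."""
        return sorted(n for n, a in self.objects.items()
                      if all(a.get(k) == v for k, v in criteria.items()))


def high_level_handler(mcat, request):
    """Answer metadata queries, e.g. searching files by attribute."""
    return mcat.search(**request["criteria"])


def low_level_handler(mcat, drivers, request):
    """Fetch data through the driver for the object's storage type."""
    attrs = mcat.objects[request["name"]]
    driver = drivers[attrs["resource_type"]]  # a file system or DBMS driver
    return driver(attrs["location"])


def dispatch(mcat, drivers, request):
    """Route a request to the handler that can serve it."""
    if request["kind"] == "metadata":
        return high_level_handler(mcat, request)
    if request["kind"] == "data":
        return low_level_handler(mcat, drivers, request)
    raise ValueError(f"unknown request kind: {request['kind']}")
```

Keeping drivers in a table means a new storage system is supported by registering one more driver, without touching the dispatcher.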
Our Goals
 • Combining DB2 SQL Replication with OGSA-DAI technologies
 • Grid-enabling DB2 Replication to provide a grid
   service interface for managing replication
 • Supporting more scalable, secure, high-performance
   data access
 • Extending OGSA-DAI to provide more powerful
 • Exploring metadata technologies
System architecture

[Figure: overall architecture. Components: Data Resource, Data Replica,
Metadata Catalogue, Replication Control Service, GridFTP Transfer,
Relational Database Replication Mechanism]

[Figure: Replication Control Service. Components: Search Engine, Metadata,
Initiator, Starter, GridFTP Transfer, Relational Database Replication
Mechanism, Data Resource, Target]
 • Keeping the features of relational database replication
 • Adding Grid’s features
 • Using the Grid service discovery mechanism
 • Supporting more replication scenarios
 • Introduction of replication
 • Introduction of existing technologies
   – Relational database replication is advanced in flexibility,
     offering solutions for frequent updates, update-everywhere, data
   – Grid file replication is good at scalable, secure, and
     efficient file transfer
 • We studied both models and combined the two structures to
   gain the benefits of both
