Database replication is the creation and maintenance of multiple copies of the same database. In most implementations of database replication, one database server maintains the master copy of the database and additional database servers maintain slave copies of the database. Database writes are sent to the master database server and are then replicated by the slave database servers. Database reads are divided among all of the database servers, which results in a large performance advantage due to load sharing. In addition, database replication can also improve availability because the slave database servers can be configured to take over the master role if the master database server becomes unavailable. Replication is the process of sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility. It could be data replication if the same data is stored on multiple storage devices, or computation replication if the same computing task is executed many times. A computational task is typically replicated in space, i.e. executed on separate devices, or it could be replicated in time, if it is executed repeatedly on a single device. The access to a replicated entity is typically uniform with access to a single, non-replicated entity. The replication itself should be transparent to an external user. Also, in a failure scenario, a failover of replicas is hidden as much as possible. It is common to talk about active and passive replication in systems that replicate data or services. Active replication is performed by processing the same request at every replica. In passive replication, each single request is processed on a single replica and then its state is transferred to the other replicas. If at any time one master replica is designated to process all the requests, then we are talking about the primary-backup scheme (master-slave scheme) predominant in high-availability clusters. On the other side, if any replica processes a request and then distributes a new state, then this is a multi-primary scheme (called multi-master in database field). In the multi-primary scheme, some form of distributed concurrency control must be used, such as distributed lock manager. Load balancing is different from task replication, since it distributes a load of different (not the same) computations across machines, and allows a single computation to be dropped in case of failure. Load balancing, however, sometimes uses data replication (esp. multi-master) internally, to distribute its data among machines. Backup is different from replication, since it saves a copy of data unchanged for a long period of time. Replicas on the other hand are frequently updated and quickly lose any historical state. Replication in distributed systems Replication is one of the oldest and most important topics in the overall area of distributed systems. Whether one replicates data or computation, the objective is to have some group of processes that handle incoming events. If we replicate data, these processes are passive and operate only to maintain the stored data, reply to read requests, and apply updates. When we replicate computation, the usual goal is to provide fault-tolerance. For example, a replicated service might be used to control a telephone switch, with the objective of ensuring that even if the primary controller fails, the backup can take over its functions. But the underlying needs are the same in both cases: by ensuring that the replicas see the same events in equivalent orders, they stay in consistent states and hence any replica can respond to queries.  Replication models in distributed systems A number of widely cited models exist for data replication, each having its own properties and performance: 1. Transactional replication. This is the model for replicating transactional data, for example a database or some other form of transactional storage structure. The one-copy serializability model is employed in this case, which defines legal outcomes of a transaction on replicated data in accordance with the overall ACID properties that transactional systems seek to guarantee. 2. State machine replication. This model assumes that replicated process is a deterministic finite state machine and that atomic broadcast of every event is possible. It is based on a distributed computing problem called distributed consensus and has a great deal in common with the transactional replication model. This is sometimes mistakenly used as synonym of active replication. 3. Virtual synchrony. This computational model is used when a group of processes cooperate to replicate in-memory data or to coordinate actions. The model defines a new distributed entity called a process group. A process can join a group, which is much like opening a file: the process is added to the group, but is also provided with a checkpoint containing the current state of the data replicated by group members. Processes can then send events (multicasts) to the group and will see incoming events in the identical order, even if events are sent concurrently. Membership changes are handled as a special kind of platform-generated event that delivers a new membership view to the processes in the group. Levels of performance vary widely depending on the model selected. Transactional replication is slowest, at least when one-copy serializability guarantees are desired (better performance can be obtained when a database uses log-based replication, but at the cost of possible inconsistencies if a failure causes part of the log to be lost). Virtual synchrony is the fastest of the three models, but the handling of failures is less rigorous than in the transactional model. State machine replication lies somewhere in between; the model is faster than transactions, but much slower than virtual synchrony. The virtual synchrony model is popular in part because it allows the developer to use either active or passive replication. In contrast, state machine replication and transactional replication are highly constraining and are often embedded into products at layers where end-users would not be able to access them.  Database replication Database replication can be used on many database management systems, usually with a master/slave relationship between the original and the copies. The master logs the updates, which then ripple through to the slaves. The slave outputs a message stating that it has received the update successfully, thus allowing the sending (and potentially re-sending until successfully applied) of subsequent updates. Multi-master replication, where updates can be submitted to any database node, and then ripple through to other servers, is often desired, but introduces substantially increased costs and complexity which may make it impractical in some situations. The most common challenge that exists in multi-master replication is transactional conflict prevention or resolution. Most synchronous or eager replication solutions do conflict prevention, while asynchronous solutions have to do conflict resolution. For instance, if a record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions. A lazy replication system would allow both transactions to commit and run a conflict resolution during resynchronization. The resolution of such a conflict may be based on a timestamp of the transaction, on the hierarchy of the origin nodes or on much more complex logic, which decides consistently on all nodes. Database replication becomes difficult when it scales up. Usually, the scale up goes with two dimensions, horizontal and vertical: horizontal scale up has more data replicas, vertical scale up has data replicas located further away in distance. Problems raised by horizontal scale up can be alleviated by a multi-layer multi-view access protocol. Vertical scale up runs into less trouble when the internet reliability and performance are improving.  Disk storage replication Active (real-time) storage replication is usually implemented by distributing updates of a block device to several physical hard disks. This way, any file system supported by the operating system can be replicated without modification, as the file system code works on a level above the block device driver layer. It is implemented either in hardware (in a disk array controller) or in software (in a device driver). The most basic method is disk mirroring, typical for locally-connected disks. Notably, the storage industry narrows the definitions, so mirroring is a local (short-distance) operation. A replication is extendable across a computer network, so the disks can be located in physically distant locations. The purpose is to avoid damage done by, and improve availability in case of local failures or disasters. Typically the above master-slave theoretical replication model is applied. The main characteristic of such solutions is handling write operations: Synchronous replication - guarantees "zero data loss" by the means of atomic write operation, i.e. write either completes on both sides or not at all. Write is not considered complete until acknowledgement by both local and remote storage. Most applications wait for a write transaction to complete before proceeding with further work, hence overall performance decreases considerably. Inherently, performance drops proportionally to distance, as latency is caused by speed of light. For 10 km distance, the fastest possible roundtrip takes 67 μs, whereas nowadays a whole local cached write completes in about 10-20 μs. An often-overlooked aspect of synchronous replication is the fact, that failure of either remote replica or even just the interconnection stops by definition any and all writes (freezing the storage system). This is the behaviour that guarantees zero data loss. However, many commercial systems at such potentially dangerous point do not freeze, but just proceed with local writes, losing the desired zero recovery point objective. Asynchronous replication - write is considered complete as soon as local storage acknowledges it. Remote storage is updated, but probably with a small lag. Performance is greatly increased, but in case of losing a local storage, the remote storage is not guaranteed to have the current copy of data and most recent data may be lost. Semi-synchronous replication - introduced by some vendors, probably as a sales pitch; exact meaning varies. Point-in-time replication - introduces periodic snapshots that are replicated instead of primary storage. Most important implementations: DRBD module for Linux. EMC SRDF IBM PPRC and Global Mirror Hitachi TrueCopy Symantec Veritas Volume Replicator (VVR)  Distributed shared memory replication Another example of using replication appears in distributed shared memory systems, where it may happen that many nodes of the system share the same page of the memory - which usually means, that each node has a separate copy (replica) of this page.  Primary-backup and multi-primary replication Many classical approaches to replication are based on a primary/backup model where one device or process has unilateral control over one or more other processes or devices. For example, the primary might perform some computation, streaming a log of updates to a backup (standby) process, which can then take over if the primary fails. This approach the most common one for replicating databases, despite the risk that if a portion of the log is lost during a failure, the backup might not be in a state identical to the one the primary was in, and transactions could then be lost. A weakness of primary/backup schemes is that in settings where both processes could have been active, only one is actually performing operations. We're gaining fault-tolerance but spending twice as much money to get this property. For this reason, starting in the period around 1985, the distributed systems research community began to explore alternative methods of replicating data. An outgrowth of this work was the emergence of schemes in which a group of replicas could cooperate, with each process backup up the others, and each handling some share of the workload. Jim Gray, a towering figure within the database community, analyzed multi-primary replication schemes under the transactional model and ultimately published a widely cited paper skeptical of the approach ("The Dangers of Replication and a Solution"). In a nutshell, he argued that unless data splits in some natural way so that the database can be treated as n disjoint sub-databases, concurrency control conflicts will result in seriously degraded performance and the group of replicas will probably slow down as a function of n. Indeed, he suggests that the most common approaches are likely to result in degradation that scales as O(n³). His solution, which is to partition the data, is only viable in situations where data actually has a natural partitioning key. The situation is not always so bleak. For example, in the 1985-1987 period, the virtual synchrony model was proposed and emerged as a widely adopted standard (it was used in the Isis Toolkit, Horus, Transis, Ensemble, Totem, Spread, C-Ensemble, Phoenix and Quicksilver systems, and is the basis for the CORBA fault-tolerant computing standard; the model is also used in IBM Websphere to replicate business logic and in Microsoft's Windows Server 2008 enterprise clustering technology). Virtual synchrony permits a multi-primary approach in which a group of processes cooperate to parallelize some aspects of request processing. The scheme can only be used for some forms of in-memory data, but when feasible, provides linear speedups in the size of the group. A number of modern products support similar schemes. For example, the Spread Toolkit supports this same virtual synchrony model and can be used to implement a multi-primary replication scheme; it would also be possible to use C-Ensemble or Quicksilver in this manner. WANdisco permits active replication where every node on a network is an exact copy or replica and hence every node on the network is active at at one time; this scheme is optimized for use in a wide area network. Optimistic replication From Wikipedia, the free encyclopedia Jump to: navigation, search Optimistic replication (also known as lazy replication) is a strategy for replication in which replicas are allowed to diverge. Traditional pessimistic replication systems are based on the principle of single-copy consistency. that is, users should observe the system to behave as if there was only one copy of the data. Optimistic replication does away with this in favor of eventual consistency, meaning that replicas are guaranteed to converge only when a system is idle.  Algorithms An optimistic replication algorithm consists of five elements: 1. Operation submission: Users submit operations at independent sites. 2. Propagation: Each site shares the operations it knows about with the rest of the system. 3. Scheduling: Each site decides on an order for the operations it knows about. 4. Conflict resolution: If there are any conflicts among the operations a site has scheduled, it must modify them in some way. 5. Commitment: The sites agree on a final schedule and conflict resolution result, and the operations are made permanent. There are two strategies for propagation: state transfer, where sites propagate a representation of the current state, and operation transfer, where sites propagate the operations that were performed (essentially, a list of instructions on how to reach the new state). Scheduling and conflict resolution can either be syntactic or semantic. Syntactic systems rely on general information, such as when or where an operation was submitted. Semantic systems are able to make use of application-specific information to make smarter decisions. Note that state transfer systems generally have no information about the semantics of the data being transferred, and so they have to use syntactic scheduling and conflict resolution.  Examples One well-known example of a system based on optimistic replication is the CVS version control system, or any other version control system which uses the copy-modify-merge paradigm. CVS covers each of the five elements: 1. Operation submission: Users edit local versions of files. 2. Propagation: Users manually pull updates from a central server, or push changes out once the user feels they are ready. 3. Scheduling: Operations are scheduled in the order that they are received by the central server. 4. Conflict resolution: When a user pushes to or pulls from the central repository, any conflicts will be flagged for that user to fix manually. 5. Commitment: Once the central server accepts the changes which a user pushes, they are permanently committed. A special case of replication is synchronization, where there are only two replicas. For example, personal digital assistants (PDAs) allow users to edit data either on the PDA or a computer, and then to merge these two datasets together. Note, however, that replication is a broader problem than synchronization, since there may be more than two replicas. May 18, 2004 Database Replication in MySQL By Ian Gilfillan An introduction to replication Recently, while having the knots pounded out of my body during a particularly painful shiatsu lesson, I reflected on what put them there in the first place. Yes, 'the database' was once more to blame. A busy database I work with saw one of its tables jump from 3GB to 7GB overnight, as we imported archive data. As expected, this had some performance impact. Unfortunately, I had not expected quite the knock, and it turned out that this single database server could no longer handle the load. No matter how much more I tried to optimize the queries, tweak the variables or bump up the query cache, it was not enough. The machine could not take any more memory, and a hardware upgrade would do little good (at least with the kind of budget I have to play with). However, MySQL does not claim to be enterprise-ready for nothing, and Yahoo and other high-volume users of MySQL certainly do not run on one database server. There are a number of techniques to handle high volumes, one of which I will introduce this month - MySQL replication (I will look at others in future articles). Replication allows you to take one database, make an exact copy of it on another server, and set one of them (the slave) to take all its updates from the other (the master). The slave reads the master's binary logs, which store all statements that change a database, and repeats these on its database, keeping the two in exact sync. Since a replicating database simply repeats statements, the databases are not necessarily exactly in sync, and advanced users can take advantage of this. That is a topic for another article however, and we will look at simple replication this month - getting one database to be an exact copy of another one. What replication is not Replication is not a backup policy. A mistyped DELETE statement will be replicated on the slave too, and you could end up with two, perfectly synchronized, empty databases. Replication can help protect against hardware failure though. Replication is not an answer to all performance problems. Although updates on the slave are more optimized than if you ran the updates normally, if you use MyISAM tables, table- locking will still occur, and databases under high-load could still struggle. Replication is not a guarantee that the slave will be in sync with the master at any one point in time. Even assuming the connection is always up, a busy slave may not yet have caught up with the master, so you can't simply interchange SELECT queries across master and slave servers. How to start replicating - the master server Grant the slave permission to replicate with the REPLICATION SLAVE privilege, for example as follows: GRANT REPLICATION SLAVE ON *.* TO slave_user IDENTIFIED BY 'slave_password' If the master is not using the binary update log, add the following lines to the my.cnf or my.ini configuration file, and restart the server: log-bin server-id=1 By convention, the master is usually server-id 1, and any slaves from 2 onwards, though you can change this if you wish. If the master is already using the binary update log, either take note of the offset at the moment of the backup (the next step), or use the RESET MASTER statement to clear all binary logs and immediately begin the backup. You may want to make a copy of the binary logs before doing this, in case you need to use the binary logs to restore from backup. Make a backup of the database. You will use this to start the slave server. Note the comments about the binary log above. You can also skip this step if you use the LOAD DATA FROM MASTER statement, but see the comments about locking the master below first. Add the following to the configuration file on the slave: master-host=master-hostname master-user=slave-user master-password=slave-password server-id=2 The slave user and slave password are those to which you set when you granted REPLICATION SLAVE permission on the master. The server-id must be a unique number, different to the master or any other slaves in the system. There are also two other options: master-port, used if the master is running on a non-standard port (3306 is default), and master-connect-retry, a time in seconds for the slave to attempt to reconnect if the master goes down. 60 seconds is default. Restore the data from the master, either as you would normally restore a backup, or with the statement LOAD DATA FROM MASTER. The latter will lock the master for the duration of the operation, which could be quite lengthy, so you may not be able to spare the downtime. Replication in action Once the slave has started, replication should begin. Besides the obvious SELECT queries, you can make sure this is working correctly with the following statements: mysql> SHOW SLAVE STATUS\G *************************** 1. row *************************** Master_Host: masterhostname.co.za Master_User: slave_server Master_Port: 3306 Connect_retry: 60 Master_Log_File: master-bin.054 Read_Master_Log_Pos: 16664104 Relay_Log_File: slave-relay-bin.045 Relay_Log_Pos: 17657643 Relay_Master_Log_File: master-bin.054 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_do_db: vne Replicate_ignore_db: Last_errno: 0 Last_error: Skip_counter: 0 Exec_master_log_pos: 16664104 Relay_log_space: 17657643 This is a mature slave that has been running a while. The master is already on the 54th binary log. You can see if the slave is running correctly by looking at the Slave_IO_Running and Slave_SQL_Running. The most important field is the Last_error field. mysql> SHOW MASTER STATUS; +----------------+----------+--------------+------------------+ | File | Position | Binlog_do_db | Binlog_ignore_db | +----------------+----------+--------------+------------------+ | master-bin.054 | 16664104 | | | +----------------+----------+--------------+------------------+ The above is from a master that has been running a while. It is already on binlog 54. Starting to replicate from a particular point in the binary logs If you need to force the slave to begin at a certain point, usually when the master has been running with an active binary log, you can do so as follows. The following starts with the 3rd binary log, as position 420. You can find the position using mysqlbinlog. mysql> CHANGE MASTER TO MASTER_LOG_FILE='master-bin.003' MASTER_LOG_POS=420; Query OK, 0 rows affected (0.00 sec) mysql> SLAVE START The SLAVE START and SLAVE STOP commands are used to manually stop and start the slave. The slave will also always stop if it comes across an error while replicating. Removing old binary logs On active databases, the binary logs tend to grow quite quickly. You may have used RESET MASTER in the past to clear them, but you cannot do this to the master while replicating! The statement to use is PURGE MASTER LOGS. First, make sure that all slaves have replicated to at least the log beyond which you want to remove. For example, in our earlier example, both the slave and the master are on log master-bin.054, so we can safely remove master-log.053 and before, as follows: mysql> PURGE MASTER LOGS TO 'master-bin.053'; MySQL will not allow you to remove a log that the master is still using though. Replicating specific databases only As mentioned earlier, the master and slave database server do not need to be entirely in sync, with all databases and all tables from the master replicated onto the slave. By default, the slave will replicate everything, but you can change this behavior with the following options in the slave configuration file: replicate-do-db=db_name (replicate this database) replicate-ignore-db=db_name (don't replicate this database) replicate-do-table=db_name.table_name (replicate this table) replicate-ignore-table=db_name.table_name (don't replicate this table) replicate--wild-do-table=db_name.table_name (allows wildcards, e.g db% would be all databases beginning with db) replicate-wild-ignore-table=db_name.table_name (ignore all specified tables, with wildcards) These options can all be used multiple times in a single configuration. A couple of other useful options: replicate-rewrite-db=master_db->slave_db (allows you to use map databases that use different database names on each server) log-slave-update (writes replicated statements to the slaves binary logs) A few complexities Before shutting down a slave server (mysqladmin-shutdown, not STOP SLAVE, make sure it has no temporary tables open (these may be needed for a statement to be replicated). You can do this with SHOW STATUS to see the value of Slave_open_temp_tables. This annoying feature should be fixed soon, so please check the latest documentation. The USER(), UUID(), LOAD_FILE() and CONNECTION_ID() (before MySQL 4.1.1) functions do not work reliably on the slave (they are replicated without changes). Before MySQL 4.1.1, FLUSH, ANALYZE, OPTIMIZE, and REPAIR statements are not replicated. This means that if you change permissions on the master by editing the tables directly, you will need to manually FLUSH PRIVILEGES on the slave too. Make sure the slave and the master are using the same character set. Multi-master replication is a method of replication employed by databases to transfer data or changes to data across multiple computers within a group. Multi-master replication can be contrasted with a master-slave method (also known as single-master replication). The term Multi-master can also be applied to systems in general where a single piece of information can be updated by one of several systems. That is, no one system can be said to own the information and be able to control it consistency and accuracy. Benefits of Multi-master replication : If one master fails, other masters will continue to update the database. Masters can be located in several physical sites i.e. distributed across the network. Disadvantages of Multi-master replication : Most multi-master replication systems are only loosely consistent, i.e. lazy and asynchronous, violating ACID properties. Eager replication systems are complex and introduce some communication latency. Issues such as conflict resolution can become intractable as the number of nodes involved rises and the required latency decreases Methods  Log-Based A database transaction log is referenced to capture changes made to the database. For log-based transaction capturing, database changes can only be distributed asynchronously.  Trigger-Based Triggers at the subscriber capture changes made to the database and submit them to the publisher. With trigger-based transaction capturing, database changes can be distributed either synchronously or asynchronously.  Implementations Many directory servers based on LDAP implement multi-master replication. OpenDS implements "synchronization" instead.  Active Directory For example, one of the more prevalent of multi-master replication implementations in directory servers is Microsoft's Active Directory. Within Active Directory, objects that are updated on one Domain Controller are then replicated to other domain controllers through multi-master replication. It is not required for all domain controllers to replicate with each other's domain controller as this would cause excessive network traffic in large Active Directory implementations. Instead, domain controllers have a complex update pattern that ensures that all servers are updated in a timely fashion without excessive replication traffic. Some Active Directory needs are better served by Flexible single master operation.  Oracle Oracle database clusters implement multi-master replication using one of two methods. Asynchronous multi-master replication commits data changes to a deferred transaction queue which is periodically processed on all databases in the cluster. Synchronous multi-master replication uses Oracle's two phase commit functionality to ensure that all databases with the cluster have a consistent dataset.  MySQL MySQL ships with replication support. Using features introduced in MySQL 5.0, it is possible to achieve a multi-master replication scheme.  PostgreSQL PostgreSQL offers multiple solutions for multi-master replication, including solutions based on two phase commit. There's Bucardo, PgPool and PgPool-II, PgCluster and Sequoia as well as some proprietary solutions. Another promising approach, implementing eager but asynchronous replication is Postgres-R. It's still in development.  Ingres Ingres Within Ingres Replicator, objects that are updated on one Ingres server can then replicated to other servers whether local or remote through multi-master replication. If one server fails, client connections can be re-directed to another server. It is not required for all Ingres servers in an environment to replicate with each other as this could cause excessive network traffic in large implementations. Instead, Ingres Replicator provides an elegant and sophisticated design that allows the appropriate data to be replicated to the appropriate servers without excessive replication traffic. This means that some servers in the environment can serve as failover candidates while other servers can meet other requirements such as managing a subset of columns or tables for a departmental solution, a subset of rows for a geographical region or one-way replication for a reporting server. In the event of a source, target, or network failure, data integrity is enforced through this two-phase commit protocol by ensuring that either the whole transaction is replicated, or none of it is. In addition, Ingres Replicator can operate over RDBMS’s from multiple vendors to connect them.  DBReplicator DBReplicator is a powerful open-source Java application for network-based multi-master heterogeneous database replication or filtered synchronization. Supported databases include: CloudScape / Apache Derby / JavaDB, DaffodilDB / One$DB, IBM DB2, Firebird (database server), MySQL, Oracle Database, PointBase, PostgreSQL, SQLServer, Sybase.
Pages to are hidden for
"Database replication - Scott Streit Content"Please download to view full document