Learning Center
Plans & pricing Sign in
Sign Out

Distributed Databases - PowerPoint


									Distributed Databases

       by Chien-Pin Hsu
       CS157B Section 1
         Nov 11, 2004

        Dr. Sin-Min Lee
       Distributed Database System

A distributed database system consists of
 loosely coupled sites that share no physical
Appears to user as a single system
Database systems that run on each site are
 independent of each other
Processing maybe done at a site other than the
 initiator of request
 Homogenous Distributed Database

 All sites have identical software
 They are aware of each other and agree to
 cooperate in processing user requests
 It appears to user as a single system
   An Homogenous Distributed Database
           Systems example

 A distributed system connects three databases: hq, mfg, and sales
 An application can simultaneously access or modify the data in
  several databases in a single distributed environment.
              What can we do?

A single query from a Manufacturing client on
 local database mfg can retrieve joined data from
 the products table on the local database and the
 dept table on the remote hq database.
 For a client application, the location and
 platform of the databases are transparent.
             Makes life easier!!

For example, if you are connected to database
 mfg but want to access data on database hq,
 creating a synonym on mfg for the remote dept
 table enables you to issue this query:
           SELECT *
           FROM         dept
In this way, a distributed system gives the
 appearance of native data access.
 Users on mfg do not have to know that the data
 they access resides on remote databases.
Heterogeneous Distributed Database

   In a heterogeneous distributed
    database system, at least one of the
    databases uses different schemas and

     A database system having different schema may
      cause a major problem for query processing.
     A database system having different software may
      cause a major problem for transaction processing.
           Distributed Data Storage

 Replication
  – System maintains multiple copies of data, stored in
    different sites, for faster retrieval and fault tolerance.
 Fragmentation
  – Relation is partitioned into several fragments stored in
    distinct sites

Replication and fragmentation can be combined
  • Relation is partitioned into several fragments: system
    maintains several identical replicas of each such
      Advantages of Replication

Availability: failure of site containing relation
 r does not result in unavailability of r is
 replicas exist.
Parallelism: queries on r may be processed
 by several nodes in parallel.
Reduced data transfer: relation r is available
 locally at each site containing a replica of r.
    Disadvantages of Replication

Increased cost of updates: each replica of
 relation r must be updated.
Increased complexity of concurrency
 control: concurrent updates to distinct
 replicas may lead to inconsistent data unless
 special concurrency control mechanisms are
  • One solution: choose one copy as primary copy
    and apply concurrency control operations on
    primary copy.

 Data can be distributed by storing individual
 tables at different sites
 Data can also be distributed by decomposing a
 table and storing portions at different sites –
 called Fragmentation
 Fragmentation can be horizontal or vertical
            Why use Fragmentation?

 Usage - in general applications use views so it’s appropriate to
  work with subsets
 Efficiency - data stored close to where it is most frequently used
 Parallelism - a transaction can divided into several sub-queries to
  increase degree of concurrency
 Security - data more secure - only stored where it is needed

Performance - may be slower
Integrity - more difficult
        Horizontal Fragmentation

Each fragment, Ti , of table T contains a
 subset of the rows
Each tuple of T is assigned to one or more
Horizontal fragmentation is lossless
      Horizontal Fragmentation Example

 A bank account schema has a relation
   Account-schema = (branch-name, account-number, balance).
 It fragments the relation by location and stores each fragment
  locally: rows with branch-name = `Hillside` are stored in the Hillside
  in a fragment
             Vertical Fragmentation

 Each fragment, Ti, of T contains a subset of the
  columns, each column is in at least one fragment,
  and each fragment includes the key:
             Ti = attr_listi (T)
           T = T1        T2 …..    Tn

    All schemas must contain a common candidate key (or
     superkey) to ensure lossless join property.
    A special attribute, the tuple-id attribute may be added to
     each schema to serve as a candidate key.
      Vertical Fragmentation Example

A employee-info schema has a relation
  employee-info schema = (designation, name,
 Employee-id, salary).
It fragments the relation to put information in two
 tables for security concern.
                Commit Protocols

 Commit protocols are used to ensure
 atomicity across sites

   Atomicity states that database modifications must
    follow an “all or nothing” rule.
   a transaction which executes at multiple sites must
    either be committed at all the sites, or aborted at all
    the sites.
 The Two-Phase Commit (2 PC) Protocol
What is this?

   Two-phase commit is a transaction protocol designed
    for the complications that arise with distributed
    resource managers.
   Two-phase commit technology is used for hotel and
    airline reservations, stock market transactions,
    banking applications, and credit card systems.
   With a two-phase commit protocol, the distributed
    transaction manager employs a coordinator to
    manage the individual resource managers. The
    commit process proceeds as follows:
      Phase1: Obtaining a Decision

Step 1  Coordinator asks all participants
 to prepare to commit transaction Ti.

   Ci adds the records <prepare T> to the log
    and forces log to stable storage (a log is a file
    which maintains a record of all changes to the

   sends prepare T messages to all sites where
    T executed
         Phase1: Making a Decision

Step 2  Upon receiving message, transaction
 manager at site determines if it can commit the
   if not:
       add a record <no T> to the log and send abort T
      message to Ci
   if the transaction can be committed, then:
      1). add the record <ready T> to the log
      2). force all records for T to stable storage
      3). send ready T message to Ci
      Phase 2: Recording the Decision

 Step 1  T can be committed of Ci received a ready T
  message from all the participating sites: otherwise T
  must be aborted.

 Step 2  Coordinator adds a decision record, <commit
  T> or <abort T>, to the log and forces record onto stable
  storage. Once the record is in stable storage, it cannot
  be revoked (even if failures occur)

 Step 3  Coordinator sends a message to each
  participant informing it of the decision (commit or abort)

 Step 4  Participants take appropriate action locally.
Two-Phase Commit Diagram
           Costs and Limitations

There have been two performance issues
 with two phase commit:
  – If one database server is unavailable, none of
    the servers gets the updates.
  – This is correctable through network tuning
    and correctly building the data distribution
    through database optimization techniques.

To top