

									Distributed Databases

  Dr. Yousry Taha

      Copyright 2010

Distributed Database Technology emerged from:

1. Database Technology.
2. Network Technology.

•   The field has seen many research efforts and proposed
    prototypes, but a full-scale DDBMS has not yet been developed.
•   Commercial products have instead been directed toward a
    client-server model or toward heterogeneous DBMSs.

                   IS 533 - Distributed DB              2
               Distributed DB Concepts
Distributed Database (DDB):
   is a collection of multiple logically interrelated databases
        distributed over a computer network.
Distributed Database Management System (DDBMS):
   is software that manages a DDB while making the distribution
        transparent to the user.
Distributed Computing System (DCS):
  is a number of computing elements interconnected by a network
       and cooperating in performing certain tasks.
Advantages of DCS:
   1.   More processing power can be provided to solve complex tasks.
   2.   Autonomous elements can be managed independently.

                 Advantages of DDB
1. Management of distributed data with
   different levels of transparency.
  –   Distribution or Network transparency
      •   Location transparency
      •   Naming transparency
  –   Replication transparency
  –   Fragmentation transparency
      •   Vertical fragmentation
      •   Horizontal fragmentation

                         DDB design

•    Techniques to design DDB:
    1) Data fragmentation.
    2) Data replication.
    3) Data allocation.
[Figure: the Jeddah, Riyadh, and Dammam sites connected through a communication network; Jeddah data and Riyadh data are stored at their respective sites.]

      Advantages of DDB (continued)
2. Increased reliability and availability:
  • Reliability: the probability that a system is
    running at a certain point in time.
  • Availability: the probability that a system is
    continuously available during a time interval.
  • Distribution improves both: when one site fails,
    the other sites can continue to operate.
3. Improved performance:
  •   Data fragmentation provides data localization: data is kept close to
      where it is used, which reduces contention for CPU and I/O at each
      site and shortens network access delays.
  •   Interquery and intraquery parallelism improve performance.

4. Easier expansion.
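
The availability gain from distribution can be illustrated with a small calculation. This is only a sketch under stated assumptions that are not part of the slides: every site is up with the same probability, sites fail independently, and a read succeeds if at least one site holding a copy is up. The function name is hypothetical.

```python
def replicated_availability(site_availability: float, copies: int) -> float:
    """Probability that at least one of `copies` sites is up.

    Assumes every site is up with the same probability and that
    sites fail independently of each other (an illustrative assumption).
    """
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("site_availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The data is unreachable only when all copies are down at once.
    return 1.0 - (1.0 - site_availability) ** copies


if __name__ == "__main__":
    for copies in (1, 2, 3):
        print(copies, round(replicated_availability(0.9, copies), 4))
```

Under these assumptions each extra copy shrinks the chance that the data is unreachable, which is why distribution combined with replication tends to raise availability for reads.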
       Additional Functions of DDB
To achieve the advantages of a DDB, the DDBMS must provide
  these additional functions:
•   Keeping track of data distribution, fragmentation and replication.
•   Distributed query processing.
•   Distributed transaction management.
•   Replicated data management.
•   Distributed data recovery.
•   Security.
•   Distributed catalog management.
         Supporting these functions requires good solutions on many
    fronts, including the hardware of the computers and the networks.

               Data Fragmentation
Data fragmentation is the technique used to break up the
  DB into logical units (fragments), which may be assigned
  for storage at various sites.
• Horizontal Fragmentation:
   – A horizontal fragment is a subset of the tuples in a relation.
   – The tuples of the fragment are specified by a condition on
     one or more attributes.
   – The different fragments are stored at different sites.
   – Derived horizontal fragmentation applies the partitioning of a
     primary relation to secondary relations that are related to the
     primary via a foreign key.
   – A horizontal fragmentation is complete if each tuple of R
     appears in at least one of the fragments.
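
Horizontal fragmentation, derived fragmentation, and the completeness check can be sketched as follows. The EMPLOYEE and ASSIGNMENT relations, their attributes, and all function names are hypothetical examples, not taken from the slides; relations are modeled as lists of dictionaries.

```python
# Hypothetical primary relation: one dict per tuple.
EMPLOYEE = [
    {"ssn": 1, "name": "Ali", "city": "Jeddah", "salary": 5000},
    {"ssn": 2, "name": "Sara", "city": "Riyadh", "salary": 6000},
    {"ssn": 3, "name": "Omar", "city": "Dammam", "salary": 5500},
    {"ssn": 4, "name": "Huda", "city": "Jeddah", "salary": 7000},
]

# Hypothetical secondary relation, related to EMPLOYEE through ssn.
ASSIGNMENT = [
    {"ssn": 1, "project": "P1"},
    {"ssn": 2, "project": "P1"},
    {"ssn": 4, "project": "P2"},
]


def horizontal_fragment(relation, condition):
    """Keep the tuples of `relation` that satisfy `condition`."""
    return [t for t in relation if condition(t)]


def derived_fragment(secondary, primary_fragment, key):
    """Keep the secondary tuples whose key value occurs in the primary fragment."""
    keys = {t[key] for t in primary_fragment}
    return [t for t in secondary if t[key] in keys]


def is_complete(relation, fragments):
    """True if every tuple of the relation appears in at least one fragment."""
    return all(any(t in frag for frag in fragments.values()) for t in relation)


# One fragment per city, each defined by a condition on the city attribute.
fragments = {
    "EMP_" + city.upper(): horizontal_fragment(
        EMPLOYEE, lambda t, city=city: t["city"] == city
    )
    for city in ("Jeddah", "Riyadh", "Dammam")
}

if __name__ == "__main__":
    print(fragments["EMP_JEDDAH"])
    print(derived_fragment(ASSIGNMENT, fragments["EMP_JEDDAH"], "ssn"))
    print(is_complete(EMPLOYEE, fragments))
```

Because every employee lives in one of the three cities used as conditions, the fragmentation in this example is complete; dropping one of the conditions would leave some tuples in no fragment.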

     Data Fragmentation (continued)
• Vertical Fragmentation:
   – It divides the relation vertically by columns.
   – Each fragment keeps only certain attributes at a certain site.
   – The primary key is usually added to each fragment so that the
     relation can be reconstructed.
   – A vertical fragmentation is complete if the union of the attributes
     of all fragments equals the attributes of the original relation.

• Mixed (Hybrid) Fragmentation:
   – It mixes the two types of fragmentation.
   – Each fragment is obtained by select-project operations.
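
Vertical and mixed fragmentation can be sketched with two small helpers for projection and selection. The EMPLOYEE relation, its attributes, and the fragment names are hypothetical; the sketch assumes that `ssn` is the primary key and that it is repeated in every vertical fragment.

```python
# Hypothetical relation: one dict per tuple, with ssn as the primary key.
EMPLOYEE = [
    {"ssn": 1, "name": "Ali", "city": "Jeddah", "salary": 5000},
    {"ssn": 2, "name": "Sara", "city": "Riyadh", "salary": 6000},
    {"ssn": 4, "name": "Huda", "city": "Jeddah", "salary": 7000},
]


def project(relation, attrs):
    """Keep only the listed attributes of every tuple."""
    return [{a: t[a] for a in attrs} for t in relation]


def select(relation, condition):
    """Keep only the tuples that satisfy the condition."""
    return [t for t in relation if condition(t)]


def join_on_key(left, right, key):
    """Recombine two vertical fragments that share the same key attribute."""
    index = {t[key]: t for t in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


# Vertical fragments: each keeps the primary key plus some attributes.
EMP_PERSONAL = project(EMPLOYEE, ["ssn", "name", "city"])
EMP_PAY = project(EMPLOYEE, ["ssn", "salary"])

# Joining the fragments on the key rebuilds the original relation.
rebuilt = join_on_key(EMP_PERSONAL, EMP_PAY, "ssn")

# Mixed fragment: a selection followed by a projection.
JEDDAH_PAY = project(
    select(EMPLOYEE, lambda t: t["city"] == "Jeddah"), ["ssn", "salary"]
)

if __name__ == "__main__":
    print(rebuilt == EMPLOYEE)
    print(JEDDAH_PAY)
```

Keeping the key in both vertical fragments is what makes the join possible; without it there would be no way to match the halves of a tuple.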

  Data Fragmentation (continued)
Fragmentation Schema:
  is the definition of a set of fragments that
  includes all attributes and tuples of the DB.
• The schema must satisfy the condition that the whole
  DB can be reconstructed from the fragments.
• The whole DB can be reconstructed by applying a sequence of
  outer join (or outer union) and union operations.
Allocation Schema:
  is the description of the allocation of each
  fragment to the sites of the DDB.

       Fragments classification

• Each fragment of a relation R is defined by a selection condition C
  and an attribute list L:
       • Condition C = True means that all tuples are selected.
       • L = ATTRS(R) means that all attributes of R are included in the list.

                       Vertical             Horizontal          Mixed
                       fragment             fragment            fragment
Condition C            True                 False               False
                       (all tuples)         (not all tuples)    (not all tuples)
List L = ATTRS(R)      False                True                False
                       (not all columns)    (all columns)       (not all columns)
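
The classification in the table can be written as a small function. This is a sketch with hypothetical parameter names; the case where both flags are true is not in the table and is treated here as the unfragmented relation.

```python
def classify_fragment(selects_all_tuples: bool, keeps_all_attributes: bool) -> str:
    """Classify a fragment from its condition C and attribute list L.

    selects_all_tuples   -> C = True
    keeps_all_attributes -> L = ATTRS(R)
    """
    if selects_all_tuples and keeps_all_attributes:
        # Not a row of the table: nothing was removed from R.
        return "whole relation"
    if selects_all_tuples:
        return "vertical"
    if keeps_all_attributes:
        return "horizontal"
    return "mixed"


if __name__ == "__main__":
    for c in (True, False):
        for l in (True, False):
            print(c, l, classify_fragment(c, l))
```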

                   Data Replication
  Replication is useful for improving performance.
• One extreme is Full Replication.
   – All data are replicated at all sites.
   – Updates are difficult and slow, since every copy must be changed.
   – Queries are executed locally.
• The other extreme is No Replication.
   – The fragments are disjoint.
   – Updates are easy, but queries may span many sites.
• The third scheme is Partial Replication.
   – Some fragments are replicated and others are not.
   – The distribution of data is chosen to enhance performance.
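
The trade-off between the three schemes can be made concrete by describing each one as a mapping from fragments to the sites that store a copy. The site names follow the earlier design slide; the fragment names F1 to F3, the particular partial allocation, and the two helper functions are hypothetical illustrations.

```python
SITES = ["Jeddah", "Riyadh", "Dammam"]
FRAGMENTS = ["F1", "F2", "F3"]

# Each scheme maps a fragment to the set of sites holding a copy of it.
full_replication = {f: set(SITES) for f in FRAGMENTS}
no_replication = {"F1": {"Jeddah"}, "F2": {"Riyadh"}, "F3": {"Dammam"}}
partial_replication = {"F1": {"Jeddah", "Riyadh"}, "F2": {"Riyadh"}, "F3": {"Dammam"}}


def copies_to_update(allocation, fragment):
    """Number of copies that an update to `fragment` must change."""
    return len(allocation[fragment])


def is_local_query(allocation, site, needed_fragments):
    """True if `site` stores every fragment the query needs."""
    return all(site in allocation[f] for f in needed_fragments)


if __name__ == "__main__":
    for name, alloc in (
        ("full", full_replication),
        ("none", no_replication),
        ("partial", partial_replication),
    ):
        print(
            name,
            copies_to_update(alloc, "F1"),
            is_local_query(alloc, "Riyadh", ["F1", "F2"]),
        )
```

With full replication every query is local but each update touches every site; with no replication the opposite holds; the partial scheme places extra copies only where queries are expected to need them.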

    Data Replication (continued)

• The description of the replicas is called the
  Replication Schema.

• Data Allocation (Data Distribution) is the
  process of assigning each fragment, or each copy
  of a fragment, to a site.

                          Types of DDBS
DDB systems differ in:
   1. Degree of homogeneity of the DDBMS
      –    If all local DBMSs, servers, and clients use identical software, the DDBMS is homogeneous.
           Otherwise, the DDBMS is called heterogeneous.

   2. Degree of local autonomy of the DDBMS
      –    A local site may function as a stand-alone DBMS, with its own local users, direct access by local
           transactions, etc.

   Local autonomy extremes:
      1)   No local autonomy exists:
           –    There is a single conceptual schema in the DDBMS.
           –    All access to the system is obtained through a site that is part of the DDBMS.
           –    The system looks like a centralized database to the user.
      2)   High degree of local autonomy:
           –    Each DBMS is centralized, independent, and autonomous.
           –    Each DBMS has its own local users, transactions, and DBA.

                        Types of DDBS
•    Systems that are hybrid between centralized and
     distributed systems:
    –  Multidatabase system:
      • Local views are created interactively as needed by the application.
    – Federated Database System (FDBS):
      • A global view is shared among applications.
      • FDBS issues and sources of heterogeneity:
           1.   Differences in data models (e.g. relational, object, and file data models).
           2.   Differences in constraints: potential conflicts between constraints must be dealt with.
           3.   Differences in query languages: different languages and versions; a canonical language may be needed.
           4.   Semantic heterogeneity: differences in meaning and interpretation.

     Types of DDBS (continued)
Semantic Heterogeneity in FDBS
• It is the hardest kind of heterogeneity to resolve.
• It results from the freedom that design autonomy gives each component DBS.
The sources of the problem:
   1. The universe of discourse from which the data is drawn
   2. Data representation and naming
   3. Data meaning, understanding, and interpretation
   4. Transaction policies
   5. Data summarization

       Types of DDBS (continued)
FDBS design:
• The challenge of designing an FDBS is to let the
  component DBSs interoperate while still preserving
  the following types of autonomy:
    – Design autonomy: (Freedom to choose four design parameters)
        1. Universe of discourse from which data is drawn
        2. Representation and naming
        3. Understanding, meaning, and subjective interpretation of data
        4. Transaction and policy constraints
    – Communication autonomy: (Ability to communicate)
    – Execution autonomy: (Ability to decide execution order)
    – Association autonomy: (Ability to share functionality and resources)
• This is done through the “five-level schema architecture”.

    Five-level schema architecture

Level  Schema                                     Role
5      External schema   ....  External schema    User view (User)
4      Federated schema  ....  Federated schema   Global view (Dept)
3      Export schema                              Mask heterogeneity
2      Component schema  ....  Component schema   Common data model
1      Local schema      ....  Local schema       Conceptual schema
       Component DBS     ....  Component DBS

     Query Processing in DDB
• One factor that affects query processing
  in a DDB is the cost of data transfer.
• The strategy used to move data between sites
  until the query result is obtained determines
  the data transfer cost.
• Several techniques have been proposed to reduce
  the transfer cost, such as the “semijoin” and
  query and update decomposition.
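
The effect of a semijoin on data transfer can be sketched by comparing two ways of joining relations stored at different sites. Everything here is a hypothetical example: EMPLOYEE is assumed to be stored at site 1, DEPARTMENT at site 2, the result is needed at site 2, and the cost is simply the number of attribute values shipped between sites.

```python
# Hypothetical data: EMPLOYEE lives at site 1, DEPARTMENT at site 2.
EMPLOYEE = [
    {"ssn": i, "name": f"emp{i}", "dno": (i % 10) + 1} for i in range(1, 101)
]
DEPARTMENT = [
    {"dno": 1, "dname": "Research"},
    {"dno": 2, "dname": "Sales"},
]


def join(left, right, attr):
    """Equijoin of two relations on a shared attribute."""
    index = {}
    for r in right:
        index.setdefault(r[attr], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[attr], [])]


def values_in(relation):
    """Transfer cost of a relation, counted as attribute values shipped."""
    return sum(len(t) for t in relation)


# Strategy 1: ship the whole EMPLOYEE relation to site 2 and join there.
cost_full = values_in(EMPLOYEE)
result_full = join(EMPLOYEE, DEPARTMENT, "dno")

# Strategy 2 (semijoin): ship only the join column of DEPARTMENT to site 1,
# keep the EMPLOYEE tuples that will find a match, and ship those back.
join_values = {d["dno"] for d in DEPARTMENT}
reduced = [e for e in EMPLOYEE if e["dno"] in join_values]
cost_semijoin = len(join_values) + values_in(reduced)
result_semijoin = join(reduced, DEPARTMENT, "dno")

if __name__ == "__main__":
    print("same result:", result_full == result_semijoin)
    print("values shipped, full transfer:", cost_full)
    print("values shipped, semijoin:", cost_semijoin)
```

The semijoin pays off here because only a small share of EMPLOYEE tuples match a department at site 2. If nearly every tuple matched, the extra round trip would add cost instead of saving it.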

         Concurrency Control and
            Recovery in DDB
•    The techniques used to solve these
     problems must take the following into consideration:
    1.   Dealing with multiple copies of data
    2.   Failure of individual sites
    3.   Failure of communication links
    4.   Distributed commit
    5.   Distributed deadlock

     Concurrency Control and
    Recovery in DDB (continued)
         Distributed concurrency control based on a
                 Distinguished Copy of a Data Item
•       The idea is to designate a particular copy of each item
        as the distinguished copy; all lock and unlock requests
        are sent to the site holding that copy.
•       Several techniques are based on this idea:
    •     Primary Site Technique
    •     Primary Site with Backup site Technique
    •     Primary copy Technique
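
The routing of lock requests to the site of the distinguished copy can be sketched as follows. This is a simplified illustration with hypothetical class names: only exclusive locks are modeled, there is no queueing of waiting transactions, and failures and backup sites are ignored.

```python
class LockSite:
    """Lock table kept at one site for the items it coordinates."""

    def __init__(self, name):
        self.name = name
        self.holders = {}  # item -> id of the transaction holding the lock

    def lock(self, item, txn):
        """Grant the lock if the item is free or already held by txn."""
        holder = self.holders.get(item)
        if holder is None or holder == txn:
            self.holders[item] = txn
            return True
        return False

    def unlock(self, item, txn):
        """Release the lock if txn is the current holder."""
        if self.holders.get(item) == txn:
            del self.holders[item]


class DistinguishedCopyLocking:
    """Send every lock or unlock request to the site of the distinguished copy."""

    def __init__(self, coordinator_of):
        self.coordinator_of = coordinator_of  # item -> LockSite

    def lock(self, item, txn):
        return self.coordinator_of[item].lock(item, txn)

    def unlock(self, item, txn):
        self.coordinator_of[item].unlock(item, txn)


if __name__ == "__main__":
    jeddah, riyadh = LockSite("Jeddah"), LockSite("Riyadh")
    # Primary site: one site coordinates the locks for all items.
    primary_site = DistinguishedCopyLocking({"X": jeddah, "Y": jeddah})
    # Primary copy: distinguished copies of different items sit at different sites.
    primary_copy = DistinguishedCopyLocking({"X": jeddah, "Y": riyadh})
    print(primary_site.lock("X", "T1"), primary_site.lock("X", "T2"))
    print(primary_copy.lock("Y", "T2"))
```

In this sketch the only difference between the primary site and primary copy techniques is the mapping from items to coordinating sites: a single site for everything, or the load spread across several sites.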

    Concurrency Control and
   Recovery in DDB (continued)
• Techniques have also been proposed for choosing
  a new coordinator site in case of failure.
  – In the primary site technique: all executing transactions
    are aborted and restarted.
  – In primary site with backup: the backup site becomes
    the primary and a new site is selected as backup.
  – In primary site with backup, if both sites are down, an
    election algorithm can be used to select a new
    coordinator site.

     Concurrency Control and
    Recovery in DDB (continued)
Distributed concurrency control based on Voting

  – There is no distinguished copy; the lock request is sent to all
    sites that hold a copy of the item.
  – Each copy maintains its own lock and can grant or deny the request.
  – A transaction that is granted the lock by a majority of the
    copies holds the lock and informs all copies that its request
    has been granted.
  – If a transaction does not receive a majority of votes granting
    it the lock within a certain time-out period, it cancels its
    request and informs all sites of the cancellation.
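
The voting method can be sketched for a single data item as follows. The class and function names are hypothetical, and the sketch makes simplifying assumptions: locks are exclusive, a site that is down stands in for a vote that does not arrive before the time-out, and the message informing the copies of the outcome is reduced to releasing tentative grants on cancellation.

```python
class CopySite:
    """One site holding a copy of the data item, with its own lock."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.holder = None  # transaction currently granted this copy's lock

    def request(self, txn):
        """Vote on a lock request: True, False, or None if no answer arrives."""
        if not self.up:
            return None  # models a vote that misses the time-out
        if self.holder is None or self.holder == txn:
            self.holder = txn
            return True
        return False

    def release(self, txn):
        """Drop the lock if txn holds it (used on cancel or unlock)."""
        if self.holder == txn:
            self.holder = None


def acquire_by_voting(txn, sites):
    """Return True if txn is granted the lock by a majority of the copies."""
    grants = sum(1 for site in sites if site.request(txn))
    if grants > len(sites) // 2:
        return True
    # No majority: cancel the request at every site.
    for site in sites:
        site.release(txn)
    return False


if __name__ == "__main__":
    sites = [CopySite("Jeddah"), CopySite("Riyadh"), CopySite("Dammam", up=False)]
    print(acquire_by_voting("T1", sites))
    print(acquire_by_voting("T2", sites))
```

A strict majority is required, so two transactions can never both hold the lock on the same item, and a tie leaves neither of them with it.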

         Distributed Recovery

• Distributed recovery is a hard problem.

• Determining whether a site is down can
  require exchanging many messages with other sites.

• The distributed commit also needs a lot of