Distributed DB - PowerPoint by fjzhangweiqun


									Distributed Databases

        By:
  Dr. Yousry Taha


      Copyright 2010
                Introduction

Distributed Database Technology emerged from:

1. Database Technology.
2. Network Technology.

•   The field has seen many research efforts and proposed
    prototypes, but a full-scale DDBMS has not yet been
    implemented.
•   Products have instead been directed toward a client-server
    model or heterogeneous DBMSs.


                   IS 533 - Distributed DB              2
               Distributed DB Concepts
DDB:
   A collection of multiple logically interrelated databases
        distributed over a computer network.
DDBMS:
   Software that manages a DDB while making the distribution
        transparent to the user.
Distributed Computing System (DCS):
  A number of computing elements interconnected by a network
       and cooperating in performing certain tasks.
Advantages of DCS:
   1.   More processing power can be provided to solve complex tasks.
   2.   Autonomous elements can be managed independently.


                 Advantages of DDB
1. Management of distributed data with
   different levels of transparency.
  –   Distribution or Network transparency
      •   Location transparency
      •   Naming transparency
  –   Replication transparency
  –   Fragmentation transparency
      •   Vertical fragmentation
      •   Horizontal fragmentation


                         DDB design

•    Techniques to design DDB:
    1) Data fragmentation.
    2) Data replication.
    3) Data allocation.
[Figure: three sites (Jeddah, Riyadh, and Dammam), each storing its own data, connected by a communication network.]

      Advantages of DDB (continued)
2. Increased reliability and availability:
  • Reliability: the probability that a system is
    running at a certain point in time.
  • Availability: the probability that a system is
    continuously available during a time interval.
  • Distribution improves both: when one site fails,
    the other sites can continue to operate.
3. Improved performance:
  •   Data fragmentation provides data localization, which reduces
      contention for CPU and I/O and cuts network access delays.
  •   Interquery and intraquery parallelism improve performance.

4. Easier expansion.
       Additional Functions of DDB
To achieve the advantages of DDB, DDBMS must have
  these additional functions:
•   Keeping track of data distribution, fragmentation and replication.
•   Distributed query processing.
•   Distributed transaction management.
•   Replicated data management.
•   Distributed data recovery.
•   Security.
•   Distributed catalog management.
         To achieve these functions we must find optimal
    solutions in many areas, including the hardware of
    the computers and the networks.

               Data Fragmentation
Data Fragmentation is the technique used to break up the
  DB into logical units (fragments) which may be assigned
  for storage at various sites.
• Horizontal Fragmentation:
   – A horizontal fragment is a subset of the tuples in a relation.
   – The tuples of the fragment are specified by a condition on
     one or more attributes.
   – The different fragments are stored at different sites.
   – Derived horizontal fragmentation applies the partitioning of a
     primary relation to secondary relations that are related to
     the primary via a foreign key.
   – A horizontal fragmentation is complete if each tuple of R
     exists in at least one of the fragments.
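
The horizontal fragmentation rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the EMPLOYEE and PROJECT_ASSIGNMENT relations, their attributes, and the helper names are hypothetical and do not come from the slides. Only the three site names are taken from the DDB design figure.

```python
# Hypothetical primary relation: one dict per tuple.
EMPLOYEE = [
    {"emp_id": 1, "name": "Amal", "city": "Jeddah"},
    {"emp_id": 2, "name": "Badr", "city": "Riyadh"},
    {"emp_id": 3, "name": "Dana", "city": "Dammam"},
    {"emp_id": 4, "name": "Emad", "city": "Jeddah"},
]
# Hypothetical secondary relation; emp_id is a foreign key into EMPLOYEE.
PROJECT_ASSIGNMENT = [
    {"emp_id": 1, "project": "P1"},
    {"emp_id": 2, "project": "P2"},
    {"emp_id": 4, "project": "P1"},
]
SITES = ("Jeddah", "Riyadh", "Dammam")


def horizontal_fragment(relation, condition):
    """Return the tuples of `relation` that satisfy `condition`."""
    return [t for t in relation if condition(t)]


def derived_fragment(secondary, primary_fragment, key):
    """Keep the secondary tuples whose foreign key matches a primary-fragment tuple."""
    keys = {t[key] for t in primary_fragment}
    return [t for t in secondary if t[key] in keys]


def is_complete(relation, fragments):
    """True if every tuple of the relation appears in at least one fragment."""
    fragments = list(fragments)
    return all(any(t in f for f in fragments) for t in relation)


# One fragment per site, selected by a condition on the city attribute.
employee_fragments = {
    site: horizontal_fragment(EMPLOYEE, lambda t, s=site: t["city"] == s)
    for site in SITES
}
# Derived fragmentation: assignments follow their employee to the same site.
assignment_fragments = {
    site: derived_fragment(PROJECT_ASSIGNMENT, employee_fragments[site], "emp_id")
    for site in SITES
}
```

Calling `is_complete(EMPLOYEE, employee_fragments.values())` checks the completeness condition for this particular split.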

     Data Fragmentation (continued)
• Vertical Fragmentation:
   – It divides the relation vertically by columns.
   – It keeps only certain attributes at each site.
   – It usually adds the primary key to each fragment so that the
     relation can be reconstructed.
   – A vertical fragmentation is complete if the union of the
     attributes of all fragments equals the attributes of the main relation.


• Mixed (Hybrid) Fragmentation:
   – It mixes the two types of fragmentation.
   – Each fragment is obtained by select-project operations.
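
As a rough sketch of vertical and mixed fragmentation, the following Python treats a fragment as a projection (optionally preceded by a selection) and rebuilds the relation by joining on the primary key. The EMPLOYEE relation, its attributes, and all function names are assumptions made for illustration only.

```python
# Hypothetical relation; emp_id is assumed to be the primary key.
EMPLOYEE = [
    {"emp_id": 1, "name": "Amal", "salary": 9000, "city": "Jeddah"},
    {"emp_id": 2, "name": "Badr", "salary": 7000, "city": "Riyadh"},
]
ATTRS = {"emp_id", "name", "salary", "city"}


def project(relation, attributes):
    """Vertical fragment: keep only the listed attributes of every tuple."""
    return [{a: t[a] for a in attributes} for t in relation]


def select(relation, condition):
    """Horizontal step: keep only the tuples that satisfy the condition."""
    return [t for t in relation if condition(t)]


def is_complete_vertical(fragment_attribute_lists, all_attributes):
    """True if the union of the fragments' attributes covers the relation."""
    return set().union(*fragment_attribute_lists) == set(all_attributes)


def reconstruct(fragment_a, fragment_b, key):
    """Join two vertical fragments on the primary key they both carry."""
    b_by_key = {t[key]: t for t in fragment_b}
    return [{**t, **b_by_key[t[key]]} for t in fragment_a]


# Two vertical fragments; each one repeats the primary key.
personal_attrs = ["emp_id", "name", "city"]
payroll_attrs = ["emp_id", "salary"]
personal = project(EMPLOYEE, personal_attrs)
payroll = project(EMPLOYEE, payroll_attrs)

# Mixed fragment: a select followed by a project.
jeddah_payroll = project(
    select(EMPLOYEE, lambda t: t["city"] == "Jeddah"), payroll_attrs
)
```

Because both vertical fragments keep `emp_id`, `reconstruct(personal, payroll, "emp_id")` returns the original tuples.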


  Data Fragmentation (continued)
Fragmentation Schema:
  is the definition of a set of fragments that
  includes all attributes and tuples of the DB.
• The schema must satisfy the condition that the whole
  DB can be reconstructed from the fragments.
• The reconstruction applies a sequence of outer join
  (or outer union) and union operations.
Allocation Schema:
  is the description of the allocation of each
  fragment to the sites of the DDB.

       Fragments classification

• A fragment of a relation R is specified by a condition C (which tuples
  are selected) and an attribute list L (which columns are kept):
       • C = True means all tuples are selected.
       • L = ATTRS(R) means all attributes are included in the list.

                     Vertical             Horizontal           Mixed
                     fragment             fragment             fragment
Condition C          True                 False                False
                     (all tuples)         (not all tuples)     (not all tuples)
L = ATTRS(R)         False                True                 False
                     (not all columns)    (all columns)        (not all columns)
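
The classification table can be read as a two-way test on C and L. The short Python below is one possible encoding of it; the function name and its arguments are hypothetical, and the "whole relation" case (C = True and L = ATTRS(R)) is added only so that every combination has an answer.

```python
def classify_fragment(condition_is_true, attribute_list, relation_attributes):
    """Classify a fragment from its condition C and attribute list L.

    condition_is_true: True when C selects all tuples of R.
    attribute_list: the attributes L kept by the fragment.
    relation_attributes: ATTRS(R), all attributes of the relation.
    """
    all_columns = set(attribute_list) == set(relation_attributes)
    if condition_is_true and all_columns:
        return "whole relation"   # not a proper fragment; not in the table
    if condition_is_true:
        return "vertical"         # all tuples, not all columns
    if all_columns:
        return "horizontal"       # not all tuples, all columns
    return "mixed"                # not all tuples, not all columns
```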

                   Data Replication
  Replication is useful in improving performance.
• One extreme case is Full Replication.
   – All data are replicated at all sites.
   – Updates are difficult and slow, because every copy must be changed.
   – Queries are executed locally.
• The other extreme is No Replication.
   – The fragments are disjoint.
   – Updates are easy, but queries may span many sites.
• The third scheme is Partial Replication.
   – Some fragments are replicated and others are not.
   – The distribution of data is chosen to enhance performance.


    Data Replication (continued)

• The description of the replicas is called the
  Replication Schema.

• Data Allocation (Data Distribution) is the
  process of assigning each fragment, or each
  copy of a fragment, to a site.
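
One simple way to picture a replication schema is a mapping from each fragment to the set of sites that hold a copy. The sketch below assumes that representation; the fragment names and helper functions are hypothetical, and the site names are borrowed from the DDB design figure.

```python
SITES = {"Jeddah", "Riyadh", "Dammam"}

# Hypothetical replication schema: fragment name -> sites holding a copy.
REPLICATION_SCHEMA = {
    "EMP_JEDDAH": {"Jeddah", "Riyadh"},
    "EMP_RIYADH": {"Riyadh"},
    "DEPT": {"Jeddah", "Riyadh", "Dammam"},
}


def replication_scheme(schema, sites):
    """Name the scheme: full, none, or partial replication."""
    if all(copies == sites for copies in schema.values()):
        return "full replication"
    if all(len(copies) == 1 for copies in schema.values()):
        return "no replication"
    return "partial replication"


def can_read_locally(schema, fragment, site):
    """A query runs locally when the site holds a copy of the fragment."""
    return site in schema[fragment]


def sites_to_update(schema, fragment):
    """An update has to reach every copy of the fragment."""
    return sorted(schema[fragment])


def fragments_at(schema, site):
    """Allocation seen from one site: the fragments stored there."""
    return sorted(f for f, copies in schema.items() if site in copies)
```

With this mapping the trade-off on the previous slide is visible directly: more copies mean more sites where `can_read_locally` is true, and a longer list from `sites_to_update`.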



                          Types of DDBS
DDB systems differ in:
   1. Degree of homogeneity of DDBMS
      –    Identical software in all local DBMSs, servers, and clients makes the DDBMS homogeneous.
           Otherwise, the DDBMS is called heterogeneous.

   2. Degree of local autonomy of DDBMS
      –    A local site can function as a stand-alone DBMS, with its own local users, direct access by
           local transactions, etc.


   Local autonomy extremes:
      1)   No local autonomy:
           –    Single conceptual schema in the DDBMS.
           –    All access to the system is obtained through a particular site in the DDBMS.
           –    The system looks like a centralized database to the user.
      2)   High degree of local autonomy:
           –    Each DBMS is centralized, independent, and autonomous.
           –    Each DBMS has its own local users, transactions, and DBA.




                        Types of DDBS (continued)
•    Systems that are hybrids between centralized and
     distributed systems:
    – Multidatabase system:
      • Local views are created interactively as needed by
         applications.
    – Federated Database System (FDBS):
      • A global view is shared between applications.
      • FDBS issues and sources of heterogeneity:
           1.   Differences in data models (e.g. relational, object, and file data models).
           2.   Differences in constraints: potential conflicts between constraints must be resolved.
           3.   Differences in query languages: languages, versions, canonical language.
           4.   Semantic heterogeneity: differences in meaning and interpretation.

     Types of DDBS (continued)
Semantic Heterogeneity in FDBS
• It is the hardest type of heterogeneity to resolve.
• It results from the freedom that design autonomy
    gives to each component DBS.
The sources of the problem:
   1. The universe of discourse from which the data is
      drawn
   2. Data representation and naming
   3. Data meaning, understanding and interpretation
   4. Transaction policies
   5. Data summarization

       Types of DDBS (continued)
FDBS design:
• The challenge of designing FDBS is to facilitate
  component DBSs interoperation while still
  providing:
    – Design autonomy: (Freedom of choosing 4 design parameters)
        1. Universe of discourse from which data is drawn
        2. Representation and naming
        3. Understanding, meaning, and subjective interpretation of data
        4. Transaction and policy constraints
   – Communication autonomy: (Ability to communicate)
   – Execution autonomy: (Ability to decide execution order)
   – Association autonomy: (Ability to share functionality and resources)
• This is done through the “five-level schema architecture”.

    Five-level schema architecture

Level  Schemas                                                   Role
5      External schema   External schema   ..   External schema  User view (User)
4      Federated schema   ....   Federated schema                Global view (Dept)
3      Export schema                                             Mask heterogeneity
2      Component schema   ....   Component schema                Common data model
1      Local schema   ....   Local schema                        Conceptual schema
       Component DBS   ....   Component DBS

     Query Processing in DDB
• One factor that affects query processing
  in a DDB is the cost of data transfer.
• The strategy used to move data between sites
  until the query result is produced determines
  that cost.
• Many solutions have been proposed to reduce
  the transfer cost, such as the “Semijoin” and
  Query and Update Decomposition.
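
To make the transfer-cost argument concrete, here is a small Python comparison of two strategies for a join between relations at two sites: shipping one relation whole, and using a semijoin. The relations, their contents, and the cost measure (number of attribute values sent over the network) are all assumptions chosen for illustration.

```python
# Hypothetical setup: EMPLOYEE is stored at site 1, DEPARTMENT at site 2,
# and the join result is needed at site 1.
EMPLOYEE = [  # (emp_id, name, dept_no)
    (1, "Amal", 10),
    (2, "Badr", 20),
    (3, "Dana", 10),
]
DEPARTMENT = [  # (dept_no, dept_name, city)
    (10, "Sales", "Jeddah"),
    (20, "IT", "Riyadh"),
    (30, "HR", "Dammam"),
    (40, "Legal", "Riyadh"),
]


def join(r, s, r_col, s_col):
    """Equijoin of r and s on r[r_col] == s[s_col]."""
    return [rt + st for rt in r for st in s if rt[r_col] == st[s_col]]


def ship_whole_relation(r, s, r_col, s_col):
    """Send all of s to the site of r, then join there.

    Cost is counted as the number of attribute values transferred.
    """
    cost = sum(len(t) for t in s)
    return join(r, s, r_col, s_col), cost


def semijoin_strategy(r, s, r_col, s_col):
    """Send only r's join column to s's site, reduce s, send the rest back."""
    join_values = {t[r_col] for t in r}                     # site 1 -> site 2
    reduced_s = [t for t in s if t[s_col] in join_values]   # semijoin at site 2
    cost = len(join_values) + sum(len(t) for t in reduced_s)
    return join(r, reduced_s, r_col, s_col), cost           # joined at site 1


naive_result, naive_cost = ship_whole_relation(EMPLOYEE, DEPARTMENT, 2, 0)
semi_result, semi_cost = semijoin_strategy(EMPLOYEE, DEPARTMENT, 2, 0)
```

Both strategies return the same join result. The semijoin pays off here because only two of the four departments are referenced; when nearly every tuple of the remote relation participates in the join, the extra round trip can make it the more expensive choice.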


         Concurrency Control and
            Recovery in DDB
•    The techniques used to solve these
     problems must take the following into
     consideration:
    1.   Dealing with multiple copies of data
    2.   Failure of individual sites
    3.   Failure of communication links
    4.   Distributed commit
    5.   Distributed deadlock

     Concurrency Control and
    Recovery in DDB (continued)
         Distributed concurrency control based on a
                 Distinguished Copy of Data item
•       The idea is to select a particular copy of each item;
        all lock and unlock requests are sent to the
        site holding this copy.
•       Several techniques based on this idea have been proposed:
    •     Primary Site Technique
    •     Primary Site with Backup Site Technique
    •     Primary Copy Technique
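
A minimal sketch of the distinguished-copy idea, assuming exclusive locks only and ignoring failures and messaging: every lock request for an item is routed to the one site that coordinates that item. The class names, the item names, and the `PRIMARY_COPY` mapping are hypothetical. The only difference between the primary site and primary copy techniques in this sketch is the routing function.

```python
class LockSite:
    """Lock table kept at one coordinator site (exclusive locks only)."""

    def __init__(self, name):
        self.name = name
        self.holder = {}  # item -> transaction currently holding the lock

    def lock(self, txn, item):
        current = self.holder.get(item)
        if current is not None and current != txn:
            return False  # held by another transaction
        self.holder[item] = txn
        return True

    def unlock(self, txn, item):
        if self.holder.get(item) == txn:
            del self.holder[item]


class DistinguishedCopyLocking:
    """Route every lock and unlock request to the item's coordinator site."""

    def __init__(self, site_names, coordinator_for):
        self.sites = {name: LockSite(name) for name in site_names}
        self.coordinator_for = coordinator_for  # function: item -> site name

    def lock(self, txn, item):
        return self.sites[self.coordinator_for(item)].lock(txn, item)

    def unlock(self, txn, item):
        self.sites[self.coordinator_for(item)].unlock(txn, item)


SITES = ("Jeddah", "Riyadh", "Dammam")

# Primary site technique: one site coordinates the locks of all items.
primary_site = DistinguishedCopyLocking(SITES, lambda item: "Jeddah")

# Primary copy technique: the distinguished copies of different items
# may be stored at different sites (hypothetical assignment).
PRIMARY_COPY = {"X": "Jeddah", "Y": "Riyadh"}
primary_copy = DistinguishedCopyLocking(SITES, PRIMARY_COPY.__getitem__)
```

With `primary_copy`, a call such as `primary_copy.lock("T1", "Y")` is recorded only in the Riyadh lock table, which spreads the coordination load that the primary site technique concentrates on a single site.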

    Concurrency Control and
   Recovery in DDB (continued)
• Techniques have also been proposed for choosing
  a New Coordinator Site in case of failure.
  – In the primary site technique: all transactions are aborted
    and restarted.
  – In primary site with backup: the backup site becomes the
    primary and a new site is selected as backup.
  – In primary site with backup, if both sites are down, an
    Election algorithm can be used to select a new
    coordinator.
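
The three cases above can be written as a small decision function. This is a sketch under several assumptions: sites are identified by integers, the set of live sites is known, and the election is replaced by a stand-in rule (the highest-numbered live site wins). The slides do not specify an election algorithm, so that rule is a placeholder and not the method they refer to. Restarting transactions after an election is likewise an assumption carried over from the primary site case.

```python
def elect(live_sites):
    """Stand-in election rule (an assumption): the highest site id wins."""
    return max(live_sites)


def choose_coordinator(primary, backup, live_sites):
    """Return (coordinator, backup, restart_transactions).

    primary, backup: site ids; backup is None when no backup site exists.
    live_sites: ids of the sites that are currently up.
    """
    live = sorted(live_sites)
    if not live:
        raise RuntimeError("no live site can act as coordinator")
    if primary in live:
        return primary, backup, False          # nothing has failed
    if backup in live:
        # The backup becomes primary; another live site becomes the backup.
        candidates = [s for s in live if s != backup]
        new_backup = candidates[0] if candidates else None
        return backup, new_backup, False
    # No usable backup: elect a new coordinator, then abort and restart
    # the running transactions.
    return elect(live), None, True
```

For example, `choose_coordinator(1, 2, {2, 3, 4})` promotes site 2 and picks site 3 as the new backup, while `choose_coordinator(1, 2, {3, 4})` falls through to the election.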


     Concurrency Control and
    Recovery in DDB (continued)
Distributed concurrency control based on Voting

  – There is no distinguished copy; the lock request is sent to all
    sites containing a copy of the item.
  – If a majority of the sites grant the request, the requesting
    transaction holds the lock and notifies all sites that have a
    copy of the item that the lock has been granted.
  – If a transaction does not receive a majority of votes granting
    it the lock within a certain time-out period, it cancels its
    request and informs all sites of the cancellation.
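
The voting rule can be sketched as a function over the replies collected before the time-out. Real message passing and timers are left out; the reply encoding (True, False, or None for a site that did not answer in time), the function name, and the message strings are assumptions made for this example.

```python
def request_lock_by_voting(replies):
    """Decide a lock request from the votes of the sites holding a copy.

    replies: dict mapping each site with a copy of the item to
             True (grant), False (deny), or None (no reply before time-out).
    Returns (granted, notifications), where notifications maps every site
    to the message the requesting transaction sends back to it.
    """
    grants = sum(1 for vote in replies.values() if vote is True)
    granted = 2 * grants > len(replies)  # strict majority of all copies
    message = "lock granted" if granted else "request cancelled"
    notifications = {site: message for site in replies}
    return granted, notifications
```

For instance, with copies at three sites, two grants are enough: `request_lock_by_voting({"Jeddah": True, "Riyadh": True, "Dammam": None})` reports the lock as granted, while a single grant leads to cancellation at all three sites.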

         Distributed Recovery

• Distributed recovery is not an easy problem.

• The number of messages needed to
  determine the status of a site is not small.

• The distributed commit also needs a lot of
  messages.