Distributed DBMS - PowerPoint

Document Sample
Distributed DBMS - PowerPoint Powered By Docstoc
					DISTRIBUTED
 DBMS


      SUSHIL
      KULKARNI
 DDBMS Concepts
 Applications
 Characteristics, Properties of DDBMS
 Distributed Processing
 Advantages & Disadvantages DDBMS
 Types & Functions of DDBMS
 Main Issues of DDBMS
 Component Architecture for DDBMS
 Data Allocation & Fragmentation
 Transparencies
CONCEPTS
                 CONCEPTS
• So far, we assume a centralized database
   Data are stored in one location (e.g. a single
    hard disk)
   A centralized database management system to
    handle transaction
   To handle multiple requests, a client-server
    system is used
     - Client send requests for data to server
     - Server handle query, transaction management etc.


     SUSHIL KULKARNI
               CONCEPTS
• This is not the only possibility
• In many cases, it may be advantageous
  for data to be distributed
  – Branches of a bank
  – Different part of the government storing
    different kind of data about a person
  – Different organizations sharing part of their
    data
• Thus, distributed databases


   SUSHIL KULKARNI
                  CONCEPTS
• Data spread over multiple machines (also
  referred to as sites or nodes.
• Network interconnects the machines
• Data shared by users on multiple machines




      SUSHIL KULKARNI
                  CONCEPTS

Distributed database

Logical interrelated collection of shared data,
along with description of data, physically
distributed over a computer network.




      SUSHIL KULKARNI
                 CONCEPTS

Distributed DBMS

The software system that permits the
management of the distributed database and
makes the distribution transparent to users




     SUSHIL KULKARNI
                 CONCEPTS

Applications

• User access distributed database via
  applications




     SUSHIL KULKARNI
                  CONCEPTS

TWO types of Applications

• Local application : Application that do not
  required data from other sites.

• Global application : Application that required
  data from other sites.

      SUSHIL KULKARNI
          TYPES OF DDBMS
• In a homogeneous distributed database:

  – All sites have identical software.
  – Are aware of each other and agree to
    cooperate in processing user requests.
  – Each site surrenders part of its autonomy
    in terms of right to change schemas or
    software.
  – Appears to user as a single system.

     SUSHIL KULKARNI
           TYPES OF DDBMS
• In a heterogeneous distributed database:

  – Different sites may use different schemas and
    software.
     • Difference in schema is a major problem for
       query processing.
     • Difference in software is a major problem for
       transaction processing.
  – Sites may not be aware of each other and may
    provide only limited facilities for cooperation in
       transaction processing.
      SUSHIL KULKARNI
TYPE: HOMOGENEOUS DBMS




Identical DBMSs




  SUSHIL KULKARNI
TYPE: HETROGENEOUS DBMS




Non-identical DBMSs




    SUSHIL KULKARNI
         OBJECTIVES : DISTRIBUTED
             ARCHITECTURE
• Location Transparency
  – User does not have to know the location of the data.
  – Data requests automatically forwarded to appropriate
    sites


• Local Autonomy
  – Local site can operate with its database when
    network connections fail
  – Each site controls its own data, security,
     logging, recovery


     SUSHIL KULKARNI
    SIGNIFICANT TRADE -OFF

 Synchronous                     • Asynchronous
  Distributed Database              Distributed Database
  • All copies of the same            • Some data inconsistency
    data are always                     is tolerated
    identical                         • Data update propagation
                                        is delayed
  • Data updates are
    immediately applied to            • Lower data integrity
    all copies throughout             • Less overhead  faster
    network                             response time
  • Good for data integrity
  • High overhead  slow
    response times
   NOTE: all this assumes replicated data (to be discussed later)
Advantages & Disadvantages

 Advantages                 Disadvantages
  • Increased reliability     • Software cost &
    & availability              complexity
  • Local control             • Processing overhead
  • Modular growth            • Data integrity
  • Lower                     • Slow response
    communication
    costs
  • Faster response
  DISTRIBUTED PROCESSING


A centralized database that can be
accessed over a computer network.




    SUSHIL KULKARNI
DISTRIBUTED PROCESSING
 T    T      T                      T    T      T


     COM 1                              COM 2




                       Communication
                          Network
                                                    DB


                   T     T      T


 SUSHIL KULKARNI        COM 3
      FUNCTIONS OF DDBMS
Functions of a centralized DBMS plus:

extended communication to allow the transfer of
queries and data among sites

extended system catalog to store data distribution
details

distributed query processing , including query
optimization

      SUSHIL KULKARNI
     FUNCTIONS OF DDBMS

extended concurrency control to maintain
consistency of replicated data.

extended recovery services to take account
of failures of individual sites and common
links


     SUSHIL KULKARNI
 TWO MAIN ISSUES IN DDBMS

Making query from one site to the same or
remote site.

Logical database is partitioned in to different
data streams and located at different sites.




     SUSHIL KULKARNI
COMPONENT ARCHITECTURE FOR
          DDBMS
• Local DBMS

• Data Communication Component

• Global System Catalog

• Distributed DBMS component

     SUSHIL KULKARNI
   DATA
ALLOCATION
         DATA ALLOCATION
• Centralized

• Fragmented

• Complete replication

• Selective replication

     SUSHIL KULKARNI
       Distributed Data Storage
• Assume relational data model.
• Replication:
  – System maintains multiple copies of data, stored in
    different sites, for faster retrieval and fault tolerance.
• Fragmentation:
  – Relation is partitioned into several fragments stored in
    distinct sites
• Replication and fragmentation can be combined:
  – Relation is partitioned into several fragments: System
    maintains several identical replicas of each such
    fragment.
           Data Replication

• A relation or fragment of a relation is
  replicated if it is stored redundantly in
  two or more sites.
• Full replication of a relation is the case
  where the relation is stored at all sites.
• Fully redundant databases are those in
  which every site contains a copy of the
  entire database.

    SUSHIL KULKARNI
          Data Replication (Cont.)
            Data Replication
• Advantages of Replication:
  – Availability: failure of site containing relation r
    does not result in unavailability of r is replicas
    exist.
  – Parallelism: queries on r may be processed by
    several nodes in parallel.
  – Reduced data transfer: relation r is available
    locally at each site containing a replica of r.



      SUSHIL KULKARNI
        Data Replication (Cont.)
          Data Replication
• Disadvantages of Replication
  – Increased cost of updates: each replica of
    relation r must be updated.
  – Increased complexity of concurrency control:
    concurrent updates to distinct replicas may
    lead to inconsistent data unless special
    concurrency control mechanisms are
    implemented.
    • One solution: choose one copy as primary copy
      and apply concurrency control operations on
      primary copy.
               Data Fragmentation
• Division of relation r into fragments r1, r2, …, rn which
  contain sufficient information to reconstruct relation r.
• Horizontal fragmentation: each tuple of r is assigned to
  one or more fragments.
• Vertical fragmentation: the schema for relation r is split
  into several smaller schemas.
   – All schemas must contain a common candidate key (or
     superkey) to ensure lossless join property.
   – A special attribute, the tuple-id attribute may be added to each
     schema to serve as a candidate key.
• Example : relation account with following schema.
• Account-schema = (branch-name, account-number,
  balance).
       HORIZONTAL FRAGMENTATION

       Original relation                        A1      A2        ……….   An
                                            T1

        A1     A2   ……….           An       T2
T1                                          T3
                                   1
T2                                          .
                                   1
T3                                          .T60
                                   1
.
                                   2                         Site 1
.T60
T61
                                   2
                                                  A1        A2    ……….    An
                                   3
.                                           T61
                                   3
.                                           .
                                   3
Tn                                          .
                                            Tn


                                                             Site 2
    -Fragments contain subsets of complete tuples (all attributes at all
    sites)

    How to reconstruct       R= Rs1  Rs2         …….      Rsn
        VERTICAL FRAGMENTATION

                                      A1    A2         A3   A4
      Original              t1                                        How to Reconstruct:
      Relation        (R)
                            t2                                        R=Rs1   Rs2    Rsn

  TID –Tuple ID
Hidden Attribute to
 ensure account             tn
 and simple join
  reconstruction
                       A1        A2   TID        TID   A3   A4              RS2
                 t1                    1          1              t1
        RS1                            2          2
                 t2                                              t2
                                                                          RS1.TID=RS2.TID
                                       n          n
                 tn                                              tn      Join condition

                       SITE1                            SITE2
        VERTICAL FRAGMENTATION

                                      A1    A2         A3   A4
      Original              t1                                        How to Reconstruct:
      Relation        (R)
                            t2                                        R=Rs1   Rs2    Rsn

  TID –Tuple ID
Hidden Attribute to
 ensure account             tn
 and simple join
  reconstruction
                       A1        A2   TID        TID   A3   A4              RS2
                 t1                    1          1              t1
        RS1                            2          2
                 t2                                              t2
                                                                          RS1.TID=RS2.TID
                                       n          n
                 tn                                              tn      Join condition

                       SITE1                            SITE2
            MIXED FRAGMENTATION

Rs1   A1   A2   A3                                          A4   A5
                                              Rs3
                                                                      u
                          R                                           s
                                                                      a
                     A1   A2       A3   A4   A5



Rs2
      A1   A2   A3                                                    E
                                                            A4   A5   u
                      (Salary            (Benefit                     r
                     Attributes)        Attributes)
                                                                      o
                                                                      p
                                                      Rs4             e
              MIXED FRAGMENTATION

                                      A1    A2         A3   A4
      Original              t1                                        How to Reconstruct:
      Relation        (R)
                            t2                                        R=Rs1   Rs2    Rsn

  TID –Tuple ID
Hidden Attribute to
 ensure account             tn
 and simple join
  reconstruction
                       A1        A2   TID        TID   A3   A4              RS2
                 t1                    1          1              t1
        RS1                            2          2
                 t2                                              t2
                                                                          RS1.TID=RS2.TID
                                       n          n
                 tn                                              tn      Join condition

                       SITE1                            SITE2
Horizontal Fragmentation of account
              Relation
  branch-name          account-number        balance

 Hillside                A-305                  500
 Hillside                A-226                  336
 Hillside                A-155                  62

             account1=branch-name=“Hillside”(account)


  branch-name        account-number          balance

 Valleyview              A-177                   205
 Valleyview              A-402                  10000
 Valleyview              A-408                   1123
 Valleyview              A-639                   750

            account2=branch-name=“Valleyview”(account)
  SUSHIL KULKARNI
  Vertical Fragmentation of employee-info
                 Relation
        branch-name        customer-name            tuple-id

     Hillside                  Lowman                     1
     Hillside                  Camp                       2
     Valleyview                Camp                       3
     Valleyview                Kahn                       4
     Hillside                  Kahn                       5
     Valleyview                Kahn                       6
     Valleyview                Green                      7
deposit1=branch-name, customer-name, tuple-id(employee-info)
       account number          balance             tuple-id

         A-305                500                        1
         A-226                336                        2
         A-177                205                        3
         A-402                10000                      4
         A-155                62                         5
         A-408                1123                       6
         A-639                750                        7
 deposit2=account-number, balance, tuple-id(employee-info)
    Advantages of Fragmentation
• Horizontal:
   – allows parallel processing on fragments of a relation
   – allows a relation to be split so that tuples are located where
     they are most frequently accessed
• Vertical:
   – allows tuples to be split so that each part of the tuple is stored
     where it is most frequently accessed
   – tuple-id attribute allows efficient joining of vertical fragments
   – allows parallel processing on a relation
• Vertical and horizontal fragmentation can be mixed.
   – Fragments may be successively fragmented to an arbitrary
     depth.


       SUSHIL KULKARNI
  REPLICATION and FRAGMENTATION

Partition of Attributes/tuples need not be disjoint
                        A1 A2 A3 A4 A5




      A1 A2 A3 A4                             A2 A3 A4 A5




                             Overlap
                    (replication of attributes)
TRANSPARENCIES
 TRANSPARENCIES IN DDBMS
• Transparencies hide implementation
  details from the user
• Example in Centralized databases : Data
  independence
• Main types of transparencies in DDBMS:
      o Distributed Transparency
      o Transaction Transparency

     SUSHIL KULKARNI
DISTRIBUTED TRANSPARENCY
Allows the user to see the database as a
single, logical entity.

If this transparency is exhibited then the
user does not need to know that
             1. The data are partitioned.
              2. Data can be replicated at several
                 sites.
              3. Data location.
      SUSHIL KULKARNI
                   EXAMPLE
Staff (staffNo, position, sex, dob, salary,
       fName, lName, branchNo)

Vertical fragmentation:
      S  Π staffNo, position, sex , dob, salary (Staff)
       1

      S  Π staffNo, fName, lName , dbranchNo (Staff)
       2

     SUSHIL KULKARNI
                        EXAMPLE
Fragment S 2 according to branch number.
Assume that there are only three branches.
Horizontal fragmentation:
          S                                   (Staff)
              21         branchNo  ' B003 '

          S                                  (Staff)
              22         branchNo  ' B005 '

          S                                  (Staff)
              23         branchNo  ' B007 '
     SUSHIL KULKARNI
                  EXAMPLE
Assume that :

        S 1 and S 2 are at site 5,
        S 21 at site 3
        S 22 at site 5
        S 23 at site 7



     SUSHIL KULKARNI
FRAGMENTATION TRANSPARENCY

If it is provided then the user does not need
to know the data is fragmented.
Example:

SELECT fName, lName
FROM Staff
WHERE position = ‘ Manager ’

     SUSHIL KULKARNI
  LOCATION TRANSPARENCY



If it is provided then the user must know
how the data has been fragmented but still
does not have know the location of the data.




     SUSHIL KULKARNI
  LOCATION TRANSPARENCY
Example:
SELECT fName, lName
FROM S21
WHERE staffNo IN (SELECT staffNO FROM S1 where
                         position = ‘ Manager ’)
UNION
SELECT fName, lName
FROM S22
WHERE staffNo IN (SELECT staffNO FROM S1 where
                         position = ‘ Manager ’)

      SUSHIL KULKARNI
  LOCATION TRANSPARENCY
Example:

UNION

SELECT fName, lName
FROM S23
WHERE staffNo IN (SELECT staffNO FROM S1 where
                         position = ‘ Manager ’ )




        SUSHIL KULKARNI
LOCAL MAPPING TRANSPARENCY



If it is provided then the user must know
how the data has been fragmented as well
as the location of the data.




     SUSHIL KULKARNI
  LOCATION TRANSPARENCY
Example:
SELECT fName, lName
FROM S21 AT SITE 3
WHERE staffNo IN (SELECT staffNO FROM S1 AT SITE 5
                   where position = ‘ Manager ’)
UNION
SELECT fName, lName
FROM S22 AT SITE 5
WHERE staffNo IN (SELECT staffNO FROM S1 AT SITE 3
                  where position = ‘ Manager ’)

      SUSHIL KULKARNI
  LOCATION TRANSPARENCY
Example:

UNION

SELECT fName, lName
FROM S23 AT SITE 7
WHERE staffNo IN (SELECT staffNO FROM S1 AT SITE 3
                   where position = ‘ Manager ’ )




        SUSHIL KULKARNI
TRANSACTION TRANSPARENCY


It maintains distributed database’s integrity
and consistency.




     SUSHIL KULKARNI
     QUERY PROCESSING IN DDMS
Issues 1:
                                                      Horizontal
  Parallel Processing across Fragments             fragmentations

                                       =Emp1 U Emp2

LName(salary>40,000(Employee))          2 Fragments

 LName( salary>40,000(Emp1)) U LName( salary>40,000(Emp2))
                  Site 1                                Site 2

                   Execution in Parallel on fragments
                           and union results together
QUERY PROCESSING IN DDMS

 Site1   Site2   Site3   Joins- symmetric and
                              associative

    (A     B)    C

                          Parallel Processing

                          (xx(A))   (B   C)
    A     (B     C)
      QUERY PROCESSING IN DDMS
Join Strategies
R= Fnames, Cnames, Dnames (Employee                      Department)
         Site 3                      Site 1        Mg rssn        Site 2
 100 records, 2000 bytes         10,000 records,    to ssn   100 records, 3000
                                 1,000,000 bytes                   bytes


Strategies:
                                                              1,003,000
1)Ship both relations to the result site and join there         bytes
                                                             transfered
                                                              1,002,000
2)Ship employee to 2, join at 2, results to 3                   bytes
                                                             transfered
3)Ship Department to 1, join at 1, results to 3              5,000 bytes
                                                              transfered
        minimize total communication cost of data transfer
THANKS !

				
DOCUMENT INFO
Shared By:
Categories:
Stats:
views:37
posted:12/15/2011
language:English
pages:57