Database Systems: Design, Implementation, and Management THIRD EDITION by 03tUBH5a


									     Chapter 10
     Distributed Database
10   Management System

     Database Systems: Design, Implementation, and Management
     4th Edition
            Story of Papa John’s by Nation’s
              Restaurant News (Aug. 1997)
      Papa John’s selects Oracle Universal Server
       as its database platform

10      

            Papa John’s operates 1300 –plus units in 39 states.
            The Oracle platform is to support distributed databases
            for relational video, audio, text, messages, and other
            types of data.
           The Key component of Oracle’s Universal data server is
            Oracle 7, which has the following capabilities.
                Share data across its communication network, allowing
                 users to access data without knowing or specifying the
                Provide parallel server technology
                Replicates databases among multiple locations
      Distributed Databases and Distributed DBMS
         One    database, and many pieces

10       Many large user organizations are abandoning the
          traditional concept of a massive, single, centralized data
         To expedite the delivery of data into the hands of those who
          need it the most,
         businesses are employing complex new software systems
          that replace the central-site information warehouse with a
          more efficient network of data distribution outlets.
          What Is A Distributed Database and

     One database, and many pieces
10    A distributed database (DDB) is a collection of
       multiple, logically interrelated databases distributed
       over a computer network.
      A distributed database management system (DDBMS)
       permits the management of the DDB and makes the
       distribution transparent to the users.
      Transparent-- easily understood; manifest; obvious.
     The Adv/disadv of Distributed DBMS
      DDBMS Advantages                 DDBMS Disadvantages
          Data are located near the         Complexity of
           “greatest demand” site.            management and control

10     

           Faster data access
           Faster data processing

                                              Lack of standards
          Growth facilitation               Increased storage
          Improved
          Reduced operating costs
          User-friendly interface
          Less danger of a single-
           point failure
          Processor independence
      The Evolution of Distributed DBMS
      Social and Technical Changes in the 1980’s
           Business operations became more decentralized

10      

            Competition increased at the global level.
            Customer demands and market needs favored a
            decentralized management style.
           Rapid technological change created low-cost
            microcomputers. The LANs became the basis for
            computerized solutions.
           The large number of applications based on DBMSs and
            the need to protect investments in centralized DBMS
            software made the notion of data sharing attractive.
      The Evolution of Distributed DBMS
      Two Database Requirements in a Dynamic Business
           Quick ad hoc data access became crucial in the quick-

10      
            response decision making environment.
            The decentralization of management structure based on
            the decentralization of business units made
            decentralized multiple-access and multiple-location
            databases a necessity.

      Developments in the 1990’s affecting DBMS
         The growing acceptance of the Internet and the World
          Wide Web as the platform for data access and
         The increased focus on data analysis that led to data
          mining and data warehousing.
                Distributed Processing
               and Distributed Database
      Distributed processing shares the database’s logical
       processing among two or more physically

10     independent sites that are connected through a
       network. (See Figure 10.1)

      Distributed database stores a logically related
       database over two or more physically independent
       sites connected via a computer network. (See Figure
     Distributed Database Environment


     Figure 10.2
                Distributed Processing
               and Distributed Database
      Distributed processing does not require a distributed
       database, but a distributed database requires

10     distributed processing.
      Distributed processing may be based on a single
       database located on a single computer. In order to
       manage distributed data, copies or parts of the
       database processing functions must be distributed to
       all data storage sites.
      Both distributed processing and distributed
       databases require a network to connect all
          What Is A Distributed Database and
      A distributed database (DDB) is a collection of
       multiple, logically interrelated databases distributed
       over a computer network.

10    A distributed database management system
       (DDBMS) permits the management of the DDB and
       makes the distribution transparent to the users.
      Transparent-- easily understood; manifest; obvious.
                 Features of a DDBMS
      Distribution transparency-- If a DDBMS exhibits it,
       you, the user, are treating a DDB as a single logical
       DB and therefore you do not need to know

10       Whether the data is partitioned (Fragmentation
         the data location (location transparency)
         Whether the data is replicated at several sites
      Performance Transparency– A query can be
       submitted from any location and it will run with
       comparable performance. The system performs as if
       it were a centralized DBMS.
      Transaction Transparency– A user can run a
       transaction that updates data at a number of sites or
       that it will be aborted. The transaction behaves
       exactly like a local transaction.
      Heterogeneity transparency-- If a DDBMS exhibits it,
       it allows the integration of different database models
       (hierarchical, relational, network) under a

10     common/global schema.
            Distributed DB Transparency
      DDBMS transparency features have the
       common property of allowing the end users

10     to think that he is the database’s only user.
           Distribution transparency
           Transaction transparency
           Failure transparency
           Performance transparency
           Heterogeneity transparency
             Distribution Transparency
      Distribution transparency allows us to manage a
       physically dispersed database as though it were a
       centralized database.

10    Three Levels of Distribution Transparency
         Fragmentation transparency
         Location transparency
         Local mapping transparency

     Table 10.2
             Distribution Transparency
      Example (Figure 10.9):
       Employee data (EMPLOYEE) are distributed over three
       locations: New York, Atlanta, and Miami.

10     Depending on the level of distribution transparency
       support, three different cases of queries are possible:

      Figure 10.9 Fragment Locations
            Distribution Transparency
      Case 1: DB Supports Fragmentation Transparency

10      SELECT *
          WHERE EMP_DOB < ‘01-JAN-1940’;
            Distribution Transparency
      Case 2: DB Supports Location Transparency

10      SELECT *
          FROM E1
          WHERE EMP_DOB < ‘01-JAN-1940’;
        SELECT *
          FROM E2
          WHERE EMP_DOC < ‘01-JAN-1940’;
        SELECT *
          FROM E3
          WHERE EMP_DOC < ‘01-JAN-1940’;
            Distribution Transparency
      Case 3: DB Supports Local Mapping Transparency

10      SELECT *
          FROM E1 NODE NY
          WHERE EMP_DOB < ‘01-JAN-1940’;
        SELECT *
          FROM E2 NODE ATL
          WHERE EMP_DOC < ‘01-JAN-1940’;
        SELECT *
          FROM E3 NODE MIA
          WHERE EMP_DOC < ‘01-JAN-1940’;
            What Is A Distributed DBMS?
      Functions of a DDBMS
           Application interface
           Validation to analyze data requests

10      

            Transformation to determine request’s components
            Query-optimization to find the best access strategy
           Mapping to determine the data location
           I/O interface to read or write data
           Formatting to prepare the data for presentation
           Security to provide data privacy
           Backup and recovery
           Database administration
           Concurrency control
           Transaction management
     Fully Distributed Database Management System


     Figure 10.4
                 DDBMS Components
      Computer workstations that form the network
      Network hardware and software components that
10     reside in each workstation.
      Communications media that carry the data from one
       workstation to another.
      Transaction processor (TP) is the software
       component found in each computer that (1) request
       data and (2) receives and processes the application’s
       data requests.
      Data processor (DP) stores and retrieves data located
       at the site. Also known as data manager (DM).
     A Distributed Transaction


     Figure 10.12
                   DDBMS Components
      DDBMS protocol determines how the DDBMS will:
           Interface with the network to transport data and
            commands between DPs and TPs.

10         Synchronize all data received from DPs (TP side) and
            route retrieved data to the appropriate TPs (DP side).
           Ensure common database functions in a distributed
            system -- security, concurrency control, backup, and
     Levels of Data & Process Distribution
      Single-Site Processing, Single-Site Data (SPSD)
           All processing is done on a single CPU or host
           All data are stored on the host computer’s local disk.

10      
            The DBMS is located on the host computer.
            The DBMS is accessed by dumb terminals.
           Typical of most mainframe and minicomputer DBMSs.
           Typical of the 1st generation of single-user
            microcomputer database.

      Table 10.1
     Nondistributed (Centralized) DBMS


     Figure 10.6
     Levels of Data & Process Distribution
      Multiple-Site Processing, Single-Site Data (MPSD)
            All processing is done on different computers sharing a
             single database

10          Typically, MPSD requires a network file server on which
             conventional applications are accessed through a LAN.

     Figure 10.7
     Levels of Data & Process Distribution
      Multiple-Site Processing, Multiple-Site Data (MPMD)
           Fully distributed DBMS with support for multiple DPs
            and TPs at multiple sites.

10              Homogeneous DDMS integrate only one type of
                 centralized DBMS over the network.
                Heterogeneous DDBMS integrate different types of
                 centralized DBMSs over a network. (See Figure
             Distribution Transparency
      Distribution transparency is supported by distributed
       data dictionary (DDD) or a distributed data catalog

10    The DDC contains the description of the entire
       database as seen by the database administrator.

      The database description, known as the distributed
       global schema, is the common database schema
       used by local TPs to translate user requests into
             Transaction Transparency
      Transaction transparency ensures that database
       transactions will maintain the database’s integrity
       and consistency. The transaction will be completed

10     only if all database sites involved in the transaction
       complete their part of the transaction.

      Related Concepts:
         Remote Requests
         Remote Transactions
         Distributed Transactions
         Distributed Requests
              Transaction Transparency
      Distributed Requests and Distributed Transactions
           A remote request allows us to access data to be
            processed by a single remote database processor.

10      
            (Figure 10.10)
            A remote transaction, composed of several requests,
            may access data at only a single site. (Figure 10.11)
           A distributed transaction allows a transaction to
            reference several different (local or remote) DP sites.
            (Figure 10.12)
           A distributed request lets us reference data from
            several remote DP sites. (Figure 10.13) It also allows a
            single request to reference a physically partitioned
            table. (Figure 10.14)
     A Remote Request


     Figure 10.10
     A Remote Transaction


     Figure 10.11
     A Distributed Transaction


     Figure 10.12
     A Distributed Request


      Figure 10.13
     Another Distributed Request


     Figure 10.14
            Performance Transparency and
                  Query Optimization
      The objective of a query optimization routine is to
       minimize the total cost associated with the execution

10     of a request. The costs associated with a request are
       a function of the:
           Access time (I/O) cost involved in accessing the
            physical data stored on disk.
           Communication cost associated with the transmission
            of data among nodes in distributed database systems.
           CPU time cost associated with the processing overhead
            of managing distributed transactions.
            Performance Transparency and
                  Query Optimization
      Query optimization must provide distribution
       transparency as well as replica transparency.

10    Replica transparency refers to the DDBMSs ability to
       hide the existence of multiple copies of data from the
      Most of the query optimization algorithms are based
       on two principles:
         Selection of the optimum execution order
         Selection of sites to be accessed to minimize
          communication costs
           Performance Transparency and
                 Query Optimization

      Operation Modes of Query Optimization
        Automatic query optimization means that the DDBMS

10       finds the most cost-effective access path without user
        Manual query optimization requires that the
         optimization be selected and scheduled by the end user
         or programmer.

      Timing of Query Optimization
        Static query optimization takes place at compilation
        Dynamic query optimization takes place at execution
            Performance Transparency and
                  Query Optimization
      Optimization Techniques by Information Used
           Statistically based query optimization uses statistical

10          information about the database.
               In the dynamic statistical generation mode, the
                DDBMS automatically evaluates and updates the
                statistics after each access.
               In the manual statistical generation mode, the
                statistics must be updated periodically through a
                user-selected utility.
           Rule-based query optimization algorithm is based on a
            set of user-defined rules to determine the best query
            access strategy.
            Distributed Database Design
      The design of a distributed database
       introduces three new issues:

10      

            How to partition the database into fragments.
            Which fragments to replicate.
           Where to locate those fragments and replicas.
                  Data Fragmentation
      Data fragmentation allows us to break a single object
       into two or more segments or fragments.
      Each fragment can be stored at any site over a
10     computer network.
      Data fragmentation information is stored in the
       distributed data catalog (DDC), from which it is
       accessed by the transaction processor (TP) to
       process user requests.
      Three Types of Fragmentation Strategies:
         Horizontal fragmentation
         Vertical fragmentation
         Mixed fragmentation
     A Sample CUSTOMER Table


     Figure 10.16
                    Data Fragmentation
      Horizontal Fragmentation
        Division of a relation into subsets (fragments) of tuples
        (rows). Each fragment is stored at a different node, and

10      each fragment has unique rows. Each fragment represents
        the equivalent of a SELECT statement, with the WHERE
        clause on a single attribute.

     Table 10.3 Horizontal Fragmentation of the CUSTOMER Table By State
     Table Fragments In Three Locations


      Figure 10.17
                  Data Fragmentation
      Vertical Fragmentation
       Division of a relation into attribute (column) subsets.
       Each subset (fragment) is stored at a different node,

10     and each fragment has unique columns -- with the
       exception of the key column. This is the equivalent of
       the PROJECT statement.

     Table 10.4 Vertical Fragmentation of the CUSTOMER Table
     Vertically Fragmented Table Contents


      Figure 10.18
                 Data Fragmentation
      Mixed Fragmentation
       Combination of horizontal and vertical strategies. A
       table may be divided into several horizontal subsets

10     (rows), each one having a subset of the attributes

     Table 10.5 Mixed Fragmentation of the CUSTOMER Table
     Figure 10.19
                      Data Replication
      Data replication refers to the storage of data copies
       at multiple sites served by a computer network.
          Fragment copies can be stored at several sites to serve

10         specific information requirements.
          The existence of fragment copies can enhance data
           availability and response time, reducing
           communication and total query costs.

     Figure 10.20
                      Data Replication
      Mutual Consistency Rule
          Replicated data are subject to the mutual consistency

10     
           rule, which requires that all copies of data fragments be
           identical and that
           DDBMS must ensure that a database update is
           performed at all sites where replicas exist.
                     Data Replication
      Replication Conditions
         A fully replicated database stores multiple copies of all
          database fragments at multiple sites.

10       A partially replicated database stores multiple copies of
          some database fragments at multiple sites.

      Factors for Data Replication Decision
         Database Size
         Usage Frequency
                        Data Allocation
      Data allocation describes the processing of
       deciding where to locate data.

10    Data Allocation Strategies
           Centralized
            The entire database is stored at one site.

           Partitioned
            The database is divided into several disjoint parts
            (fragments) and stored at several sites.

           Replicated
            Copies of one or more database fragments are stored at
            several sites.
                       Data Allocation
      Data allocation algorithms take into
       consideration a variety of factors:

10      

            Performance and data availability goals
            Size, number of rows, the number of relations that an
            entity maintains with other entities.
           Types of transactions to be applied to the database, the
            attributes accessed by each of those transactions.
             Client/Server vs. DDBMS
      Client/server architecture refers to the way
       in which computers interact to form a
10    It features a user of resources or a client
       and a provider of resources or a server.
      The architecture can be used to implement a
       DBMS in which the client is the transaction
       processor (TP) and the server is the data
       processor (DP).

To top