Architectures of Distributed Database Systems


• What is an architecture?
• What is a reference model?
• Architecture dimensions: autonomy, distribution
  and heterogeneity
• Differences between client/server, peer-to-peer and
  multi-DBS architectures
• Directory management

Architecture
• Defines the structure of a system
  - identifies the components in the system
  - defines the functions of each component
  - defines the interrelationships and interactions among
    the components
• Reference model
  - an idealized architectural model of the system, used for
    reference
  - a conceptual framework (functional, not physical)
  - its purpose is to divide standardization work into
    manageable (smaller) pieces and to show, at a
    general level, how these pieces are related to one
    another

An architecture showing the main
components in a database system
[Figure: main components of a database system]
  User Application   ...   User Application
       | Begin_Transaction, Read, Write, Abort, EOT     ^ Results & User Notifications
       v                                                |
  Transaction Manager (TM)
       | Read, Write, Abort, EOT                        ^ Results
       v                                                |
  Scheduler (SC)
       | Scheduled Operations                           ^ Results
       v                                                |
  Recovery Manager (RM)

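The layered flow in the figure can be sketched as a toy Python pipeline. This is a minimal illustration under simplifying assumptions, not a prescribed design. The class names mirror the figure. The scheduler here simply forwards operations in arrival order, whereas a real scheduler would enforce a concurrency-control protocol. The recovery manager keeps an in-memory undo log so that Abort can restore the old values.

```python
from dataclasses import dataclass, field

_MISSING = object()  # marks "key did not exist before the write"


@dataclass
class RecoveryManager:
    """Applies operations to an in-memory 'database' and keeps a per-transaction undo log."""
    db: dict = field(default_factory=dict)
    undo_log: dict = field(default_factory=dict)  # txn id -> [(key, before-image)]

    def execute(self, txn, op, key=None, value=None):
        if op == "read":
            return self.db.get(key)
        if op == "write":
            # Record the before-image first so the write can be undone on Abort.
            self.undo_log.setdefault(txn, []).append((key, self.db.get(key, _MISSING)))
            self.db[key] = value
            return "ok"
        if op == "abort":
            # Restore before-images in reverse order of the writes.
            for k, old in reversed(self.undo_log.pop(txn, [])):
                if old is _MISSING:
                    self.db.pop(k, None)
                else:
                    self.db[k] = old
            return "aborted"
        if op == "eot":
            self.undo_log.pop(txn, None)  # commit: before-images are no longer needed
            return "committed"
        raise ValueError(f"unknown operation: {op}")


class Scheduler:
    """Forwards operations in arrival order (assumption: no concurrency control)."""

    def __init__(self, rm):
        self.rm = rm

    def schedule(self, txn, op, *args):
        return self.rm.execute(txn, op, *args)


class TransactionManager:
    """Application-facing interface: Begin_Transaction, Read, Write, Abort, EOT."""

    def __init__(self, scheduler):
        self.sc = scheduler
        self.next_id = 0

    def begin_transaction(self):
        self.next_id += 1
        return self.next_id

    def read(self, txn, key):
        return self.sc.schedule(txn, "read", key)

    def write(self, txn, key, value):
        return self.sc.schedule(txn, "write", key, value)

    def abort(self, txn):
        return self.sc.schedule(txn, "abort")

    def eot(self, txn):
        return self.sc.schedule(txn, "eot")


rm = RecoveryManager({"x": 1})
tm = TransactionManager(Scheduler(rm))
t1 = tm.begin_transaction()
tm.write(t1, "x", 5)
tm.write(t1, "y", 7)
tm.abort(t1)
print(rm.db)  # {'x': 1}
```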
Reference Model
• Based on components (internal)
  - What are the components (functional units) in the
    system?
  - What are the relationships among the components?
• Based on functions (for users)
  - What are the different classes of users?
  - What are the functions for each class of users?
• Based on data
  - What are the different types of data in the system?
  - How do the functional units use and access the data,
    and what is the output data of each functional unit?

ANSI/SPARC Architecture
[Figure: three-level ANSI/SPARC architecture]
  Users
  External schema:    External View | External View | External View
  Conceptual schema:  Conceptual View
  Internal schema:    Internal View

  Which model is this architecture based on?

Architecture based on Data Model
• Internal view
  - deals with the physical definition and organization of
    data
• External view
  - concerns how users view the database
  - each user may have his or her own view
  - a view may be shared by several users
• Conceptual view
  - is an abstract definition of the database
  - is used by the system (applications)

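As an illustration, here is an assumption-laden sketch that is not part of the ANSI/SPARC proposal itself. SQLite, used through Python's standard `sqlite3` module, can stand in for the three levels. The table definition plays the conceptual schema. Two views act as external schemas for two hypothetical user groups. An index stands in for the internal (physical) level, which users never reference directly. All table, view and index names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Conceptual level: one abstract definition of the data, shared by all applications.
cur.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)")

# Internal level (approximated): a physical access structure chosen by the DBA.
cur.execute("CREATE INDEX idx_employee_dept ON employee(dept)")

# External level: each user group sees only the data it needs.
cur.execute("CREATE VIEW phone_book AS SELECT name, dept FROM employee")
cur.execute("CREATE VIEW payroll AS SELECT id, salary FROM employee")

cur.executemany(
    "INSERT INTO employee (id, name, dept, salary) VALUES (?, ?, ?, ?)",
    [(1, "Ann", "Sales", 5200.0), (2, "Bob", "IT", 6100.0)],
)

print(cur.execute("SELECT * FROM phone_book ORDER BY name").fetchall())
# [('Ann', 'Sales'), ('Bob', 'IT')]
```

Adding or dropping the index changes only the internal level. Neither view has to change, which is the kind of data independence the three levels aim for.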
DDBS Implementation Alternatives
[Figure: implementation alternatives placed along three axes, Distribution, Autonomy
and Heterogeneity. Positions shown: Client/server, Peer-to-Peer Distributed DBS,
Federated DBS, Multi-DBS and Distributed Multi-DBS.]

Dimensions of implementation
• Autonomy
  - refers to the distribution of control (NOT data) in the
    system among the sites (i.e., managing data access
    and transaction commit)
  - indicates the degree to which an individual site can
    operate independently
    o i.e., the processing at a site is less affected by (and less
      dependent on) the processing at other sites
    o i.e., an individual site may make the processing decision for a
      transaction without asking permission from other sites
  - Tight integration (low autonomy)
    o a single image of the entire database is available to all users
    o the system looks like a single database system to the users
    o the sites work together closely to complete a transaction

Dimensions of implementation
  - Semi-autonomous (some degree of autonomy)
    o consists of DBSs that can operate independently but have
      decided to participate in a federation to make their local data
      sharable with other sites
  - Total isolation (high autonomy) (multi-DBSs)
    o the individual systems are stand-alone DBSs, but they are
      connected
    o mechanisms are provided for users to access other DBSs
      using their own language or a different language
    o mechanisms (e.g., for conversion) are provided to access
      other DBSs
  - How is remote data accessed in the different architectures?
  - Which one do we prefer?

Dimensions of implementation
• Distribution
  - refers to the physical distribution (or migration) of
    data and software components (for processing
    transactions) over multiple sites (from one server to
    another)
  - client/server (distributed when required)
    o data are primarily stored at the server
    o the server provides services and clients generate requests
    o clients may fetch the data when they need them
  - peer-to-peer (fully distributed)
    o each site has a similar structure
    o no distinction between client and server (all are at the same
      level and have the same structure)
    o all are servers (as well as clients), and each DBS at a site
      maintains a fragment (portion) of the database

System concept
• Local data item vs. remote copy
• What is a system?
  - A well-defined function (e.g., transaction processing)
  - A well-defined system boundary
  - Input and output
• How to define a system boundary?
  - Normally, the components within a system are closely related to each
    other (tightly coupled)
  - Normally, the components within a system are not related (or only loosely
    related) to the components outside the system boundary
  - System and sub-system
• Autonomy is the relationship among the sub-systems (which
  manage the data items and transaction processing) in a system
• Distribution is the division and distribution (even or uneven) of the
  software components (and data) of a system to sub-systems (at
  different sites)

Dimensions of implementation
• Heterogeneity
  - differences between the sub-systems at different
    sites
  - exists at various levels (hardware, communications, operating
    system)
  - related to the database:
    o data model, data format, query language,
      transaction management algorithms
  - when accessing other (remote) DBSs, conversions
    are required

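Here is a minimal Python sketch of the conversion step. The setup is hypothetical. Site A (local) stores a customer as a dict with a `datetime.date` and a balance in cents. Remote site B returns tuples with a day/month/year date string and a balance in euros. A small converter per remote site maps each record into the local format before the local site uses it. The site names, formats and the `CONVERTERS` registry are all made up for the example.

```python
from datetime import datetime

# Local (site A) format, assumed: {"id": int, "since": date, "balance_cents": int}


def from_site_b(row):
    """Convert a site-B row (id, 'DD/MM/YYYY', balance in euros) to the site-A format."""
    cust_id, since_text, balance_eur = row
    return {
        "id": cust_id,
        "since": datetime.strptime(since_text, "%d/%m/%Y").date(),
        "balance_cents": round(balance_eur * 100),
    }


CONVERTERS = {"B": from_site_b}  # one converter per remote site (hypothetical registry)


def fetch_remote(site, rows):
    """Apply the site's converter so callers only ever see the local format."""
    convert = CONVERTERS[site]
    return [convert(r) for r in rows]


print(fetch_remote("B", [(7, "03/02/2021", 12.5)]))
# [{'id': 7, 'since': datetime.date(2021, 2, 3), 'balance_cents': 1250}]
```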
Example architectures
• Examples:
  - A: autonomy; D: distribution; H: heterogeneity
• A0, D0, H0
  - no distribution or data migration (D0), same hardware and data
    model (H0)
  - a set of logically related multiple DBSs (of the same type) (H0)
  - users consider the whole system as a single DBS (A0)
• A0, D1, H1
  - no autonomy: the sites work together to process
    transactions, and users consider the whole system as a single DBS
    (A0)
  - some components are distributed at multiple sites, and data may
    migrate to other sites (D1)
  - the data formats and the processing methods at different sites
    may differ, so the system needs to provide data and
    access-method conversion to access remote DBs (H1)

Example architectures
• A0, D1, H0
  - no autonomy: users consider the whole system as a single DBS
    (A0), and the sites work together closely to process transactions
  - a certain degree of data distribution, i.e., the data may be
    distributed between client and server (D1)
• A1, D0, H0
  - each DBS has its own DB (D0)
  - each site partly contributes to the processing of (global)
    transactions in the whole system (A1)
  - each DBS may have its own transactions (local processing)
• A2, D0, H1
  - full autonomy
  - different database systems at different sites
  - multi-database systems

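The A/D/H notation can also be written down as data. This Python sketch only restates the classification. The triples come from the architecture slide titles in this deck (Client/Server DB is A0, D1, H0; Peer-to-Peer is A0, D2, H0; Multi-DBS is A2, D0, H1). The `Design` type and the `describe` helper are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Design(NamedTuple):
    autonomy: int       # 0 = tight integration, 1 = semi-autonomous, 2 = total isolation
    distribution: int   # 0 = none, 1 = client/server, 2 = peer-to-peer (fully distributed)
    heterogeneity: int  # 0 = homogeneous, 1 = heterogeneous


# Triples taken from the architecture slides in this deck.
EXAMPLES = {
    "Client/Server DB": Design(0, 1, 0),
    "Peer-to-Peer DDBS": Design(0, 2, 0),
    "Multi-DBS": Design(2, 0, 1),
}


def describe(d: Design) -> str:
    """Spell out a design point in the A/D/H notation."""
    a = ["tightly integrated", "semi-autonomous", "fully autonomous"][d.autonomy]
    dist = ["not distributed", "client/server", "peer-to-peer"][d.distribution]
    h = ["homogeneous", "heterogeneous"][d.heterogeneity]
    return f"A{d.autonomy}, D{d.distribution}, H{d.heterogeneity}: {a}, {dist}, {h}"


for name, d in EXAMPLES.items():
    print(f"{name:18} {describe(d)}")
```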
Client/Server DB (A0, D1, H0)
• A two-level architecture: clients and servers
• Server: performs most of the data and transaction management
  work
• Client: interface, application, a certain degree of local
  processing, and management of cached data
• Advantages of the client/server architecture
  - More efficient division of labor
  - Better price/performance on client machines
  - Ability to use familiar tools on client machines
  - Full DBS functionality provided to client workstations
• Disadvantages
  - Consistency between the server copy and the client copy (cached
    data) has to be ensured (cached data management). What happens on
    an update at the server?
  - How to divide the jobs between client and server?
    o processing power, network bandwidth and control problems

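The slides do not prescribe a method for the cached-data problem, so the following is only a hedged sketch of one common approach. Each server item carries a version number, and the client revalidates its cached copy before using it. The `Server` and `CachingClient` classes and this version-check protocol are assumptions for illustration. A real client/server DBMS might instead use server callbacks or locks.

```python
class Server:
    """Holds the primary copy of every item together with a version number."""

    def __init__(self):
        self.data = {}      # key -> value
        self.version = {}   # key -> int, bumped on every update

    def read(self, key):
        return self.data[key], self.version[key]

    def current_version(self, key):
        return self.version[key]

    def update(self, key, value):
        self.data[key] = value
        self.version[key] = self.version.get(key, 0) + 1


class CachingClient:
    """Caches (value, version) pairs and revalidates them against the server."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.fetches = 0  # how often the full item was shipped from the server

    def read(self, key):
        cached = self.cache.get(key)
        if cached is not None and cached[1] == self.server.current_version(key):
            return cached[0]                      # cached copy is still consistent
        self.fetches += 1
        self.cache[key] = self.server.read(key)   # refresh a stale or missing copy
        return self.cache[key][0]


server = Server()
server.update("stock:42", 10)
client = CachingClient(server)
client.read("stock:42")       # miss: fetched from the server
client.read("stock:42")       # hit: version unchanged
server.update("stock:42", 7)  # an update at the server makes the cached copy stale
print(client.read("stock:42"), client.fetches)  # 7 2
```

The version check still costs a round trip per read. It saves shipping the data, not the message, which is one face of the network-bandwidth trade-off listed above.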
Client/Server Database Architecture
[Figure: client/server reference architecture]
  Client:
    User Interface | Application Program | ...
    Client DBMS
    Communications Software
    Operating System
        | SQL query                  ^ result table
        v                            |
  Server:
    Communications Software
    Semantic Data Controller
    Query Optimizer
    Storage Manager
    Transaction Manager
    Recovery Manager
    Runtime Support Processor
    Operating System
    Database

Multiple Clients/Single Server
[Figure: several clients, each with Applications, Client Services and Communications,
connected over a LAN to one server with Communications, DBMS Services and the Database.
Clients send high-level requests; the server returns filtered data only.]

Multiple Clients/Single Server
• Single-server problems
  - The server forms a bottleneck
  - The server is a single point of failure (reliability)
  - Database scaling is difficult
• Solution
  - Multiple servers to increase scalability and
    reliability
  - The servers form a distributed database system
  - Distribution of workload
  - Each server maintains a partition of the database

Multiple Servers
• Heavy client
  - The division of jobs between server and client
  - The client is more powerful and can perform more
    functions
  - Each client knows the locations of the servers and
    communicates with other servers as required
  - Clients have to manage a directory of servers (heavy)
    and manage their cached data
• Light client
  - Each client manages its own connection to a server,
    which communicates with other servers on behalf of the client
  - Only a very limited amount of work is performed at the
    client

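A minimal Python sketch contrasts the two designs. It assumes the database is range-partitioned by key across two servers. The partitioning rule, the class names and the forwarding logic are hypothetical, and keys above the last range bound are not handled. The heavy client keeps the server directory itself and contacts the owning server directly. The light client always talks to its home server, which consults the directory and forwards the request.

```python
import bisect


class Directory:
    """Maps a key to the server that owns its range (bounds are inclusive upper limits)."""

    def __init__(self, bounds, servers):
        self.bounds = bounds    # e.g. [10, 20]: keys <= 10 -> servers[0], 11..20 -> servers[1]
        self.servers = servers

    def lookup(self, key):
        return self.servers[bisect.bisect_left(self.bounds, key)]


class PartitionServer:
    """Stores one partition; may forward requests for keys owned by another server."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows
        self.directory = None   # set when this server routes requests for light clients
        self.requests = 0

    def get(self, key):
        self.requests += 1
        owner = self.directory.lookup(key) if self.directory else self
        if owner is self:
            return self.rows.get(key)
        return owner.get(key)   # server-to-server forwarding


class HeavyClient:
    """Keeps its own directory of servers and contacts the owning server directly."""

    def __init__(self, directory):
        self.directory = directory

    def get(self, key):
        return self.directory.lookup(key).get(key)


class LightClient:
    """Knows only its home server and lets that server do the routing."""

    def __init__(self, home):
        self.home = home

    def get(self, key):
        return self.home.get(key)


s1 = PartitionServer("s1", {5: "e", 10: "j"})
s2 = PartitionServer("s2", {15: "o", 20: "t"})
directory = Directory([10, 20], [s1, s2])
s1.directory = s2.directory = directory

print(HeavyClient(directory).get(15))  # o  (one request, sent straight to s2)
print(LightClient(s1).get(15))         # o  (s1 forwards the request to s2)
print(s1.requests, s2.requests)        # 1 2
```

The heavy client saves a hop per request but must keep its directory current. The light client stays simple and pushes that burden onto the servers.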
Multiple Clients/Multiple Servers
[Figure: a client (Applications; Client Services providing directory, caching,
query decomposition and commit protocols; Communications) connected over a LAN
to several servers, each with Communications, DBMS Services and its own Database.]

• Is this a heavy or a light client? Why?

Server-to-Server
[Figure: a client (Applications; Client Services providing an SQL interface, a
programmatic interface and other application support environments; Communications)
and several servers, each with Communications, DBMS Services and its own Database.]

• Is this a heavy or a light client? Why?

Peer-to-Peer (A0, D2, H0)
• The sites work together to process transactions
• Data model
  - Users consider the whole system as a single DBS
  - An individual Local Internal Schema (LIS) at each DB site
  - A Global Conceptual Schema (GCS) for the whole system
  - Each Local Internal Schema connects to the GCS through a
    Local Conceptual Schema (LCS)
  - Location transparency is supported by the GCS and the LCSs
  - Each user has an External Schema (ES), which is connected to
    the GCS for his or her own purposes
  - Global queries are translated into local queries based on the
    GCS
  - Local queries are executed concurrently at different sites
• The components at each site are closely related for
  transaction processing

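A minimal Python sketch covers the last two data-model points. The GCS is assumed to describe relation EMP as horizontally fragmented across three sites. A global selection is translated into one local query per fragment, the local queries run concurrently, and the partial results are unioned. The fragment map, the site data and the thread-pool execution are assumptions made for illustration. A real DDBMS would consult its global directory and optimizer to choose fragments and a plan.

```python
from concurrent.futures import ThreadPoolExecutor

# Local databases at three sites, each holding a horizontal fragment of EMP (hypothetical).
SITES = {
    "site1": [{"eno": 1, "city": "Paris", "sal": 40}, {"eno": 2, "city": "Paris", "sal": 55}],
    "site2": [{"eno": 3, "city": "Boston", "sal": 60}],
    "site3": [{"eno": 4, "city": "Waterloo", "sal": 35}, {"eno": 5, "city": "Waterloo", "sal": 70}],
}

# GCS-level knowledge: which sites store fragments of each global relation.
FRAGMENTS = {"EMP": ["site1", "site2", "site3"]}


def local_query(site, predicate):
    """Executed at one site against its local fragment only."""
    return [row for row in SITES[site] if predicate(row)]


def global_query(relation, predicate):
    """Translate a global selection into local queries and run them concurrently."""
    sites = FRAGMENTS[relation]
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        partials = pool.map(lambda s: local_query(s, predicate), sites)
        # Union of the partial results; the user never names a site (location transparency).
        return [row for part in partials for row in part]


print([r["eno"] for r in global_query("EMP", lambda r: r["sal"] > 50)])  # [2, 3, 5]
```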
Peer-to-Peer Architecture
[Figure: schema architecture]
  ES1     ES2    ...    ESn
               GCS
  LCS1    LCS2   ...    LCSn
  LIS1    LIS2   ...    LISn

Peer-to-Peer Architecture
[Figure: components of a peer-to-peer distributed DBMS]
  USER PROCESSOR (checking and pre-processing)
    User Interface Handler (receives user requests, returns system responses)
    Semantic Data Controller
    Global Query Optimizer
    Global Execution Monitor
  DATA PROCESSOR (processing and execution)
    Local Query Processor
    Local Recovery Manager
    Runtime Support Processor
  Schemas and directory used: External Schema, Global Conceptual Schema,
  GD/D (global directory/dictionary), Local Conceptual Schema,
  Local Internal Schema; data stored in the Database

Peer-to-Peer Architecture
• User processor
  - handles the interactions with users
• Data processor
  - data management and query (transaction) processing
• User interface handler
  - accepts user commands
• Semantic data controller
  - checks whether a command can be executed
• Global query optimizer
  - optimization (determines a plan for execution)

Peer-to-Peer Architecture
• Execution monitor
  - coordinates the distributed execution of a query
• Local query optimizer
• Local recovery manager
  - keeps the database consistent (correct) even after failures
• Run-time support processor
  - buffer and data management
• Note that there are many different ways to classify the
  functional units in a DDBS

Multi-DBS Architecture (A2, D0, H1)
• Each site is a DBS, and the sites connect with each other to form a global
  database system (loosely coupled)
• The components at different sites are loosely connected by an upper
  layer
• There are two types of transactions in the system: local and global transactions
• The Global Conceptual Schema connects to some of the Local
  Conceptual Schemas
• The local database is a subset of the global database
• Users may define their local external views on their local database
• Uni-lingual multi-DBS
  - users access the global database using the same data model and
    language, which may be different from the local ones
• Multi-lingual multi-DBS
  - users access the global database using their local data model and
    language
  - different users may use different data models and languages
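Here is a small Python sketch of the uni-lingual case, built on assumptions of our own. Site 1 is an SQLite database accessed through the standard `sqlite3` module. Site 2 is a plain dictionary store with a different record layout and naming. The hypothetical `MultiDBSLayer` offers one global operation, `customers_in(city)`, and translates it into each site's native access method. Only customer data is exported to the global level. Each site keeps serving its own local transactions without going through the layer.

```python
import sqlite3

# Site 1: an autonomous relational DBS.
site1 = sqlite3.connect(":memory:")
site1.execute("CREATE TABLE cust (cid INTEGER, cname TEXT, town TEXT)")
site1.execute("CREATE TABLE payroll (eid INTEGER, amount REAL)")  # local-only data, not exported
site1.executemany("INSERT INTO cust VALUES (?, ?, ?)", [(1, "Ann", "Oslo"), (2, "Bob", "Rome")])

# Site 2: an autonomous key-value store with its own naming and layout.
site2 = {"C-77": {"name": "Cem", "location": "Oslo"}}


class MultiDBSLayer:
    """Upper layer that maps one global request onto each local system's own interface."""

    def customers_in(self, city):
        names = [n for (n,) in site1.execute(
            "SELECT cname FROM cust WHERE town = ? ORDER BY cname", (city,))]
        names += [rec["name"] for rec in site2.values() if rec["location"] == city]
        return sorted(names)


mdbs = MultiDBSLayer()
print(mdbs.customers_in("Oslo"))  # ['Ann', 'Cem']

# A local transaction at site 1 bypasses the multi-DBS layer entirely.
site1.execute("INSERT INTO payroll VALUES (10, 3000.0)")
site1.commit()
```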
Multi-DBS Architecture
[Figure: multi-DBS schema architecture]
               GES1      GES2    ...    GESn
  LES11 ... LES1n          GCS          LESn1 ... LESnm
        LCS1             LCS2    ...       LCSn
        LIS1             LIS2    ...       LISn

Components of a Multi-DBS
[Figure: users send requests to, and receive system responses from, a Multi-DBMS
layer, which sits on top of several DBMSs. Each DBMS has its own Query Processor,
Transaction Manager, Scheduler, Recovery Manager and Runtime Support Processor.]

Directory Issues
• A directory is required in a distributed database system to access the
  Global Conceptual Schema (and to access global data)
• A directory contains meta-data (data about data) about the database
• It includes information about the locations of the data
• A directory may be:
  - Global, local or hierarchical
    o a global directory, and each site has a local directory
    o the global directories are organized in a tree structure
  - Distributed (the global directory is partitioned) or centralized
  - Replicated or single copy
• The choice depends on performance, reliability, size of the
  directory and workload distribution

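Here is a minimal Python sketch of one point in this design space: a global directory replicated at every site, combined with a local directory per site. The fragment names, the site names, the `ReplicatedDirectory` helper and the "check the local directory first" lookup order are assumptions for illustration. Replication lets every lookup be answered locally and survives the loss of a site. The cost is that every copy must be updated when data moves, which is the performance/reliability trade-off mentioned above.

```python
class Site:
    """A site with a local directory (its own fragments) and a replica of the global directory."""

    def __init__(self, name):
        self.name = name
        self.local_dir = set()     # fragments stored at this site
        self.global_dir = {}       # replica: fragment -> site name

    def locate(self, fragment):
        if fragment in self.local_dir:
            return self.name                   # answered by the local directory
        return self.global_dir.get(fragment)   # answered by the local replica, no messages


class ReplicatedDirectory:
    """Keeps every site's replica of the global directory in sync (hypothetical helper)."""

    def __init__(self, sites):
        self.sites = {s.name: s for s in sites}
        self.update_messages = 0

    def place(self, fragment, site_name):
        for s in self.sites.values():
            s.local_dir.discard(fragment)
        self.sites[site_name].local_dir.add(fragment)
        for s in self.sites.values():          # every replica must be updated
            s.global_dir[fragment] = site_name
            self.update_messages += 1


sites = [Site("A"), Site("B"), Site("C")]
directory = ReplicatedDirectory(sites)
directory.place("EMP1", "A")
directory.place("EMP2", "B")
directory.place("EMP1", "C")        # the fragment migrates, so all replicas change
print(sites[1].locate("EMP1"), directory.update_messages)  # C 9
```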
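Directory Issues
[Figure: directory alternatives along three dimensions: Type (global or local),
Location (central or distributed) and Replication (replicated or non-replicated).
The eight combinations shown, some marked (?) in the figure:]
• Global & central & non-replicated
• Global & central & replicated (?)
• Global & distributed & non-replicated (?)
• Global & distributed & replicated
• Local & central & non-replicated (?)
• Local & central & replicated (?)
• Local & distributed & non-replicated
• Local & distributed & replicated
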
Reference

• Ozsu: Ch. 4