Reference Book: Principles of Distributed Database Systems



   4. Distributed DBMS Architecture
    5. Distributed Database Design
    7.5 Layers of Query Processing
                 Preethi Vishwanath

   Week 2: 5th September 2006 – 12th September 2006
ANSI/SPARC Architecture
– External view: that of the user; it is concerned with how users view the data.
– Conceptual view: that of the enterprise.
– Internal view: that of the system; it deals with the physical definition and organization of data.
[Figure: layered view from Users down through the External View, Conceptual View and Internal View]
Possible Ways to Put Together Multiple Databases
Autonomy of local systems
– Refers to the distribution of control.
– Indicates the degree of independence of the individual databases.
– Alternatives:
     Tight integration: a single image of the entire database is available to any user who wants to share the information, which may reside in multiple databases.
     Semiautonomous systems: consist of DBMSs that can operate independently but have decided to participate in a federation.
     Total isolation: stand-alone DBMSs.
Distribution
– Deals with the physical distribution of data over multiple sites.
– Three alternative architectures are available:
     Client-server: communication duties are shared between the client machines and the servers.
     Peer-to-peer: there is no distinction between client machines and servers.
     Non-distributed systems.
Heterogeneity
– Occurs in various forms.
– Data models: data are represented with different modeling tools.
– Query languages: heterogeneity involves not only the use of completely different data access paradigms in different data models, but also differences in languages even when the individual systems use the same data model.
Client-Server Architecture
– Distinguishes the functionality to be provided and divides these functions into two classes: server functions and client functions.
– The server does most of the data management work:
     query processing, data management, optimization, transaction management, etc.
– The client performs:
     the application, the user interface, and the DBMS client module.
Multiple client – single server
– A single server is accessed by multiple clients.
Multiple client – multiple server
– Multiple servers are accessed by multiple clients.
– Two alternative management strategies:
  1. Heavy client systems
     – Each client manages its own connection to the appropriate server.
     – Simplifies the server code.
     – Loads the client machines with additional responsibilities.
  2. Light client systems
     – Each client knows of only its "home server", which then communicates with other servers as required.
     – Concentrates the data management functionality at the servers.
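The difference between the two multiple-server strategies is mostly about who knows where the data lives. The sketch below is a hypothetical illustration, not taken from the book: `Server`, `HeavyClient`, `LightClient`, the table names, and the one-hop forwarding between peers are all assumptions. The heavy client carries a catalog mapping each table to its server; the light client talks only to its home server, which forwards the request.

```python
# Hypothetical sketch: heavy vs. light clients with multiple servers.
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    tables: dict  # table name -> rows stored at this server
    peers: list = field(default_factory=list)  # other servers it may contact

    def query_local(self, table):
        """Return the rows if this server stores the table, else None."""
        return self.tables.get(table)

    def query(self, table):
        """Home-server path: answer locally or forward to a peer server."""
        rows = self.query_local(table)
        if rows is not None:
            return rows
        for peer in self.peers:
            rows = peer.query_local(table)
            if rows is not None:
                return rows
        raise KeyError(table)


class HeavyClient:
    """Manages its own connections, so it must know where every table lives."""

    def __init__(self, catalog):
        self.catalog = catalog  # table name -> Server

    def query(self, table):
        return self.catalog[table].query_local(table)


class LightClient:
    """Knows only its home server, which contacts other servers as required."""

    def __init__(self, home):
        self.home = home

    def query(self, table):
        return self.home.query(table)


s1 = Server("S1", {"Emp": [(100, "A")]})
s2 = Server("S2", {"PAY": [("D1", "10K")]})
s1.peers.append(s2)

heavy = HeavyClient({"Emp": s1, "PAY": s2})
light = LightClient(home=s1)
print(heavy.query("PAY"))  # [('D1', '10K')]
print(light.query("PAY"))  # [('D1', '10K')]
```

Both clients get the same answer; the heavy client pays for it with extra client-side knowledge (the catalog), while the light client pushes that work onto the servers.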
Peer-to-Peer Distributed Systems
Schemas present
– The internal schema is defined individually at each site: the local internal schema.
– The enterprise view of the data is described by the global conceptual schema.
– The local organization of data at each site is described in the local conceptual schema.
– User applications and user access to the database are supported by external schemas.
Local conceptual schemas are mappings of the global schema onto each site.
Databases are typically designed in a top-down fashion, and therefore all external view definitions are made globally.
Major Components of a Peer-to-Peer System
– User processor
– Data processor
Peer-to-Peer Distributed Systems
User processor
– User-interface handler: responsible for interpreting user commands and formatting the result data.
– Semantic data controller: checks whether the user query can be processed.
– Global query optimizer and decomposer: determines an execution strategy and translates global queries into local ones.
– Distributed execution monitor: coordinates the distributed execution of the user request.
Data processor
– Local query optimizer: acts as the access path selector; it is responsible for choosing the best access path.
– Local recovery manager: makes sure the local database remains consistent.
– Run-time support processor: the interface to the operating system; it contains the database buffer manager, which is responsible for maintaining the main-memory buffers and managing data access.
MDBS Architecture
Models using a global conceptual schema
– The GCS is defined by integrating either the external schemas of the local autonomous databases or parts of their local conceptual schemas.
– Users of a local DBMS define their own views on the local database.
– If heterogeneity exists in the system, two implementation alternatives exist: unilingual and multilingual.
     Unilingual: requires the users to utilize possibly different data models and languages when accessing a local database and the global database.
     Multilingual: the basic philosophy is to permit each user to access the global database through an external schema defined in the language of the user's local DBMS.
Models without a global conceptual schema
– Consist of two layers: the local system layer and the multi-database layer.
– The local system layer presents to the multi-database layer the part of each local database that is to be shared with users of other databases.
– System views are constructed above this layer.
– The responsibility of providing access to multiple databases is delegated to the mapping between the external schemas and the local conceptual schemas.
– Full-fledged DBMSs exist, each of which manages a different database.
GCS in a multi-DBMS
– Mapping is from the local conceptual schemas to a global schema.
– Bottom-up design.
GCS in a logically integrated distributed DBMS
– Mapping is from the global schema to the local conceptual schemas.
– Top-down procedure.
Global Directory Issues
– The global directory is an extension of the normal directory. It is relevant for a distributed DBMS or a multi-DBMS that uses a global conceptual schema.
– It includes information about the location of the fragments as well as the makeup of the fragments.
– The directory is itself a database that contains meta-data about the actual data stored in the database.
– Three issues:
     A directory may be either global to the entire database or local to each site.
     The directory may be maintained centrally at one site, or in a distributed fashion by distributing it over a number of sites.
        – If the system is distributed, the directory is always distributed.
     Replication: there may be a single copy or multiple copies.
        – Multiple copies provide more reliability.
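Since the directory is itself a database of meta-data, it can be pictured as a small table of fragment entries. The sketch below is hypothetical: `FragmentEntry`, `GlobalDirectory`, the site names, and the predicate strings are assumptions chosen to match the Emp example used later in these notes. It records both the makeup of each fragment and where its copies live, and shows why multiple copies add reliability: a lookup can skip a failed site.

```python
# Hypothetical sketch of a global directory: meta-data about fragments
# (their makeup and the sites holding copies), stored as ordinary data.
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentEntry:
    relation: str
    fragment: str
    predicate: str  # makeup of the fragment, e.g. "Sal <= 20K"
    sites: tuple    # sites holding a copy; more than one means replicated


class GlobalDirectory:
    def __init__(self, entries):
        self.entries = list(entries)

    def fragments_of(self, relation):
        """All fragment entries for one global relation."""
        return [e for e in self.entries if e.relation == relation]

    def locate(self, fragment, down_sites=frozenset()):
        """Return a reachable site holding the fragment, skipping failed sites."""
        for e in self.entries:
            if e.fragment == fragment:
                for site in e.sites:
                    if site not in down_sites:
                        return site
                raise LookupError(f"no reachable copy of {fragment}")
        raise KeyError(fragment)


directory = GlobalDirectory([
    FragmentEntry("Emp", "Emp1", "Sal <= 20K", ("Site1", "Site3")),
    FragmentEntry("Emp", "Emp2", "Sal > 20K", ("Site2",)),
])
print(directory.locate("Emp1", down_sites={"Site1"}))  # Site3
```

With `Site1` down, the replicated fragment `Emp1` is still found at `Site3`, whereas the single-copy fragment `Emp2` becomes unreachable as soon as `Site2` fails.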
Organization of Distributed Systems
 Three orthogonal dimensions:
 – Level of sharing
      No sharing: each application and its data execute at one site.
      Data sharing: all the programs are replicated at the other sites, but the data are not.
      Data-plus-program sharing: both data and programs can be shared.
 – Behavior of access patterns
      Static: does not change over time; very easy to manage.
      Dynamic: most real-life applications are dynamic.
 – Level of knowledge about access pattern behavior
      No information.
      Complete information: access patterns can reasonably be predicted, with no deviations from the predictions.
      Partial information: there are deviations from the predictions.
Top-Down Design
– Suitable for applications where the database needs to be built from scratch.
– The activity begins with requirements analysis.
– The requirements document is the input to two parallel activities:
       View design: deals with defining the interfaces for end users.
       Conceptual design: the process by which the enterprise is examined.
         – Can be further divided into two related activity groups:
               Entity analysis: concerned with determining the entities, their attributes, and the relationships between them.
               Functional analysis: concerned with determining the fundamental functions with which the enterprise is involved.
– The distributed design activity consists of two steps:
         – Fragmentation
         – Allocation

Bottom-Up Approach
– Suitable for applications where the database already exists.
– The starting point is the individual conceptual schemas.
– Exists primarily in the context of heterogeneous databases.
Advantages of fragmentation
1. Permits a number of transactions to execute concurrently.
2. Results in the parallel execution of a single query.
3. Increases the level of concurrency, also referred to as intra-query concurrency.
4. Increases system throughput.

Disadvantages of fragmentation
1. If applications have conflicting requirements, those whose views are defined on more than one fragment may suffer performance degradation.
2. Simple tasks, such as checking for dependencies, would result in chasing after data in a number of sites.
Example relation Emp

    Id    Name   Sal   Dept
    100   A      10K   D1
    200   B      20K   D2
    300   C      30K   D3

Horizontal fragmentation (rows split on Sal > 20K)

    Emp1 (Sal <= 20K)
    Id    Name   Sal   Dept
    100   A      10K   D1
    200   B      20K   D2

    Emp2 (Sal > 20K)
    Id    Name   Sal   Dept
    300   C      30K   D3

Vertical fragmentation (columns split; primary key Id retained in both fragments)

    Id    Name          Id    Sal   Dept
    100   A             100   10K   D1
    200   B             200   20K   D2
    300   C             300   30K   D3
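The two splits of Emp above correspond to relational selection and projection, and each is undone by a different operator: union for the horizontal fragments, a join on the retained key Id for the vertical ones. The sketch below is a hypothetical illustration; the helper names `select`, `project`, and `join_on` are assumptions, and Sal is stored as an integer in thousands (10 means 10K).

```python
# Emp from the example above; Sal is stored in thousands (10 means 10K).
EMP = [
    {"Id": 100, "Name": "A", "Sal": 10, "Dept": "D1"},
    {"Id": 200, "Name": "B", "Sal": 20, "Dept": "D2"},
    {"Id": 300, "Name": "C", "Sal": 30, "Dept": "D3"},
]


def select(rel, pred):
    """Horizontal fragment: the rows satisfying pred (relational selection)."""
    return [row for row in rel if pred(row)]


def project(rel, attrs):
    """Vertical fragment: the listed columns of every row (relational projection)."""
    return [{a: row[a] for a in attrs} for row in rel]


def join_on(left, right, key):
    """Join two vertical fragments on their shared key column."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Horizontal fragmentation: rows split on Sal > 20K.
emp1 = select(EMP, lambda r: r["Sal"] <= 20)
emp2 = select(EMP, lambda r: r["Sal"] > 20)

# Vertical fragmentation: columns split, primary key Id kept in both.
v1 = project(EMP, ["Id", "Name"])
v2 = project(EMP, ["Id", "Sal", "Dept"])

# Reconstruction: union for horizontal fragments, join on Id for vertical ones.
assert sorted(emp1 + emp2, key=lambda r: r["Id"]) == EMP
assert join_on(v1, v2, "Id") == EMP
```

Keeping Id in both vertical fragments is what makes the join possible; without a shared key the original rows could not be reassembled.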
Correctness Rules of Fragmentation
  1. Completeness: if a relation instance R is decomposed into fragments R1, R2, …, Rn, each data item that can be found in R can also be found in one or more of the Ri's.

  2. Reconstruction: if a relation R is decomposed into fragments R1, R2, …, Rn, it should be possible to define a relational operator $\nabla$ such that
        $R = \nabla R_i, \quad \forall R_i \in F_R$
     where $F_R = \{R_1, R_2, \ldots, R_n\}$. Note that the operator differs for the different forms of fragmentation (for example, union for horizontal fragmentation and join for vertical fragmentation).

  3. Disjointness: if a relation R is horizontally decomposed into fragments R1, R2, …, Rn, and data item $d_i$ is in $R_j$, then it is not in any other fragment $R_k$ ($k \neq j$).
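The three rules can be checked mechanically for horizontal fragments, where rows are tuples and the reconstruction operator is set union. The sketch below is hypothetical: the function names and the three candidate fragmentations are assumptions, built from the Emp example with Sal in thousands. It shows a correct split, an overlapping one that breaks disjointness, and a lossy one that breaks completeness.

```python
# Hypothetical checker for horizontal fragments only:
# rows are (Id, Name, Sal, Dept) tuples, Sal in thousands; reconstruction is set union.
EMP = {
    (100, "A", 10, "D1"),
    (200, "B", 20, "D2"),
    (300, "C", 30, "D3"),
}


def is_complete(r, fragments):
    """Completeness: each row of R is found in one or more fragments."""
    return all(any(row in f for f in fragments) for row in r)


def is_reconstructible(r, fragments):
    """Reconstruction: R equals the union of its fragments."""
    return set().union(*fragments) == r


def is_disjoint(fragments):
    """Disjointness: no row appears in two different fragments."""
    seen = set()
    for f in fragments:
        if seen & f:
            return False
        seen |= f
    return True


good = [{row for row in EMP if row[2] <= 20}, {row for row in EMP if row[2] > 20}]
overlapping = [{row for row in EMP if row[2] <= 20}, {row for row in EMP if row[2] >= 20}]
lossy = [{row for row in EMP if row[2] < 20}, {row for row in EMP if row[2] > 20}]

print([is_complete(EMP, good), is_reconstructible(EMP, good), is_disjoint(good)])
# [True, True, True]
print(is_disjoint(overlapping))  # False: the Sal = 20K row is in both fragments
print(is_complete(EMP, lossy))   # False: the Sal = 20K row is lost
```

Boundary predicates are the usual source of errors: `<=` versus `<` decides whether the Sal = 20K row lands in exactly one, two, or zero fragments.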
       Comparison of Replication Alternatives
                         Full replication       Partial replication    Partitioning
  Query processing       Easy                   Same difficulty        Same difficulty
  Directory management   Easy or nonexistent    Same difficulty        Same difficulty
  Concurrency control    Moderate               Difficult              Easy
  Reliability            Very high              High                   Low
  Reality                Possible application   Realistic              Possible application
      Derived Horizontal Fragmentation
  – Defined on a member relation of a link according to a selection operation specified on its owner.
  – The link between the owner and the member relations is defined as an equi-join.
  – An equi-join can be implemented by means of semijoins.
  – Given a link L where owner(L) = S and member(L) = R, the derived horizontal fragments of R are defined as
        $R_i = R \ltimes S_i, \quad 1 \le i \le w$
    where w is the maximum number of fragments that will be defined on R, and
        $S_i = \sigma_{F_i}(S)$
    where $F_i$ is the formula according to which the primary horizontal fragment $S_i$ is defined.

  Example
  Consider two tables linked on Dept, with PAY as the owner and Emp as the member:

    Emp                      PAY
    Id    Name   Dept        Dept   Sal
    100   A      D1          D1     10K
    200   B      D2          D2     20K
    300   C      D3          D3     30K

    $PAY_1 = \sigma_{Sal \le 20K}(PAY), \quad PAY_2 = \sigma_{Sal > 20K}(PAY)$
    $Emp_1 = Emp \ltimes PAY_1, \quad Emp_2 = Emp \ltimes PAY_2$

    Emp1                     Emp2
    Id    Name   Dept        Id    Name   Dept
    100   A      D1          300   C      D3
    200   B      D2
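A semijoin keeps the member rows that have at least one matching row in the owner fragment, without bringing in any of the owner's columns. The sketch below is hypothetical: the `semijoin` helper, the tuple encoding, and the positional key arguments are assumptions, and PAY.Sal is stored in thousands. Here PAY is treated as the owner and Emp as the member of the link on Dept, so the salary predicate selects PAY rows and Emp is fragmented by semijoin.

```python
# Relations from the example; tuples are (Id, Name, Dept) and (Dept, Sal in thousands).
EMP = [(100, "A", "D1"), (200, "B", "D2"), (300, "C", "D3")]
PAY = [("D1", 10), ("D2", 20), ("D3", 30)]


def semijoin(member, owner_fragment, member_key, owner_key):
    """member semijoin owner_fragment: member rows whose join value appears in the owner fragment."""
    keys = {row[owner_key] for row in owner_fragment}
    return [row for row in member if row[member_key] in keys]


# Primary horizontal fragments of the owner PAY.
pay1 = [p for p in PAY if p[1] <= 20]
pay2 = [p for p in PAY if p[1] > 20]

# Derived horizontal fragments of the member Emp, via the link on Dept.
emp1 = semijoin(EMP, pay1, member_key=2, owner_key=0)
emp2 = semijoin(EMP, pay2, member_key=2, owner_key=0)
print(emp1)  # [(100, 'A', 'D1'), (200, 'B', 'D2')]
print(emp2)  # [(300, 'C', 'D3')]
```

Only the set of join values from the owner fragment is needed, which is why semijoins are attractive in a distributed setting: that set can be shipped to the member's site instead of the whole owner fragment.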
Primary Horizontal Fragmentation
 – Primary horizontal fragmentation is defined by a selection operation on the owner relations of a database schema.
 – Given a relation R, its horizontal fragments are given by
      $R_i = \sigma_{F_i}(R), \quad 1 \le i \le w$
   where $F_i$ is the selection formula used to obtain fragment $R_i$.
 – The example mentioned in slide 20 can be represented using the above formula as
      $Emp_1 = \sigma_{Sal \le 20K}(Emp)$
      $Emp_2 = \sigma_{Sal > 20K}(Emp)$

Vertical Fragmentation
 Two approaches:
 – Grouping
      Starts by assigning each attribute to one fragment.
      At each step, joins some of the fragments until some criterion is satisfied.
      Results in overlapping fragments.
 – Splitting
      Starts with a relation and decides on beneficial partitionings based on the access behavior of applications to the attributes.
      Fits more naturally within the top-down design methodology.
      Generates non-overlapping fragments.
                Hybrid Fragmentation
  – In most cases, simple horizontal or vertical fragmentation of a database schema will not be sufficient to satisfy the requirements of user applications.
  – In such cases, a vertical fragmentation may be followed by a horizontal one, or vice versa.
  – Since two types of partitioning strategies are applied one after the other, this alternative is called hybrid fragmentation.
  – The nesting is finite: in horizontal fragmentation, one has to stop when each fragment consists of only one tuple, whereas the termination point for vertical fragmentation is one attribute per fragment.
  – The examples discussed in slides 20 and 26 can be converted into a hybrid fragmentation.

[Figure: hybrid fragmentation tree; R is split into R1 and R2, R1 into R11 and R12, and R2 into R21, R22 and R23]
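Hybrid fragmentation applies the two strategies in sequence, and reconstruction walks the fragmentation tree bottom-up. The sketch below is a hypothetical illustration on the Emp example: the helper names are assumptions, Sal is in thousands, and each horizontal fragment is split into only two vertical pieces (so there is no R23 here, unlike the figure). It first splits Emp horizontally into R1 and R2, then splits each vertically while keeping Id, and finally rebuilds Emp with joins followed by a union.

```python
# Emp as (Id, Name, Sal, Dept) tuples; Sal in thousands.
EMP = [(100, "A", 10, "D1"), (200, "B", 20, "D2"), (300, "C", 30, "D3")]


def horizontal(rel, pred):
    """Keep the rows that satisfy pred."""
    return [row for row in rel if pred(row)]


def vertical(rel, cols):
    """Project onto column positions; column 0 (Id) is always kept as the key."""
    return [tuple(row[c] for c in (0, *cols)) for row in rel]


def rejoin(left, right):
    """Join two vertical fragments on Id (position 0) and restore the full row."""
    by_id = {row[0]: row[1:] for row in right}
    return [row + by_id[row[0]] for row in left if row[0] in by_id]


# Step 1: horizontal split (R -> R1, R2).
r1 = horizontal(EMP, lambda r: r[2] <= 20)
r2 = horizontal(EMP, lambda r: r[2] > 20)

# Step 2: vertical split of each horizontal fragment (R1 -> R11, R12; R2 -> R21, R22).
r11, r12 = vertical(r1, [1]), vertical(r1, [2, 3])
r21, r22 = vertical(r2, [1]), vertical(r2, [2, 3])

# Reconstruction runs the tree bottom-up: join the vertical pieces, then union.
rebuilt = rejoin(r11, r12) + rejoin(r21, r22)
print(sorted(rebuilt) == sorted(EMP))  # True
```

The order of reconstruction mirrors the order of fragmentation in reverse: the operation applied last (vertical) is undone first.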