             Distributed DBMSs: Concepts and Design
             Jing Luo
             CS 157B
             Dr. Lee
            Fall, 2003
Centralized DBMS             Distributed DBMS
• It allows users to         • It allows users to
  access only a single          access not only the
  logical database located      data at their own site
  at one site under its         but also data stored at
  control.                      remote sites.
• Distributed database: A logically interrelated collection of
  shared data (and a description of this data) physically
  distributed over a computer network.
• Distributed DBMS: The software system that permits the
  management of the distributed database and makes the
  distribution transparent to users.
Users access the distributed database
via applications
• Local applications
  Applications that do not require data from other sites.

• Global applications
  Applications that do require data from other sites.
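As a minimal sketch of this distinction, an application can be labelled local or global by comparing the site it runs at with the sites that hold the data it needs. The site names and the `classify` helper are hypothetical and not taken from any DDBMS product.

```python
def classify(home_site, data_sites):
    """Return 'local' if all needed data is at the home site, else 'global'."""
    return "local" if set(data_sites) <= {home_site} else "global"


print(classify("site1", ["site1"]))           # data only at the application's own site
print(classify("site1", ["site1", "site3"]))  # needs data stored at a remote site
```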
              Characteristics of DDBMS
•   A collection of logically related shared data;
•   The data is split into a number of fragments;
•   Fragments may be replicated;
•   Fragments/replicas are allocated to sites;
•   The sites are linked by a communications network;
•   The data at each site is under the control of a DBMS;
•   The DBMS at each site can handle local applications, autonomously;
•   Each DBMS participates in at least one global application.
A DDBMS is required to have at least one global application.
It is not necessary for every site in the system to have its own
local database.

                                           Site 1

                        Computer network            Site 2
          Site 4

                             Site 3
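The fragmentation, replication, and allocation characteristics listed above can be illustrated with a small sketch. The `staff` relation, the choice of fragmenting by city, and the site names are assumptions made only for this example.

```python
from collections import defaultdict

# Hypothetical relation: (staff_no, branch_city)
staff = [(1, "London"), (2, "Glasgow"), (3, "London"), (4, "Aberdeen")]


def fragment(rows, key):
    """Split a relation into horizontal fragments, one per value of key(row)."""
    frags = defaultdict(list)
    for row in rows:
        frags[key(row)].append(row)
    return dict(frags)


fragments = fragment(staff, key=lambda r: r[1])

# Allocation: fragment name -> sites holding a copy.
# A fragment listed at more than one site is replicated.
allocation = {
    "London": ["site1", "site2"],
    "Glasgow": ["site3"],
    "Aberdeen": ["site4"],
}


def contents_of(site):
    """All rows stored at a site, gathered from the fragments allocated to it."""
    return [row
            for name, sites in allocation.items() if site in sites
            for row in fragments[name]]


print(contents_of("site2"))
```

Here the London fragment is replicated at two sites, so either site can serve a request for London staff without contacting the other.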
Distributed processing

A centralized database that can be
accessed over a computer network.
                   Distributed Processing (cont’d)
Distributed Processing

                                        Site 1

                                                 Site 2
                          Computer network
     Site 4

                              Site 3
  Distributed DBMS vs. Distributed Processing

Distributed DBMS                    Distributed processing
• System consists of data           • Data is centralized, even
   that is physically distributed      though other users may be
   across a number of sites            accessing the data over the
   in the network.                     network.
               Parallel DBMSs

A DBMS running across multiple processors
and disks that is designed to execute
operations in parallel, whenever possible, in
order to improve performance
  Three Main Architectures for Parallel DBMSs
To provide multiple processors with
common access to a single database, a
parallel DBMS must provide for shared
resource management.
• Shared memory
• Shared disk
• Shared nothing
Shared memory is a tightly coupled architecture in which
multiple processors within a single system share system
memory.
• Symmetric multiprocessing (SMP)
  This approach has become popular on platforms ranging from personal
  workstations that support a few microprocessors in parallel, to RISC
  (Reduced Instruction Set Computer) based machines, all the way up
  to the largest mainframes.
• The architecture provides high-speed data access for a limited number
  of processors, but it is not scalable beyond about 64 processors, at
  which point the interconnection network becomes a bottleneck.
            Shared Memory (cont’d)
•   Shared Memory

        CPU            CPU           CPU       CPU

                    Interconnection network

       DB             DB             DB
Shared disk is a loosely-coupled architecture optimized
for applications that are inherently centralized and require
high availability and performance.
• Each processor can access all disks directly, but each has
  its own private memory.
• Shared disk architecture eliminates the shared memory
  performance bottleneck without introducing the overhead
  associated with physically partitioned data.
               Shared Disk (cont’d)
•   Shared Disk

      Memory         Memory        Memory   Memory

      CPU            CPU           CPU       CPU

                  Interconnection network

                DB            DB            DB
Shared nothing, also known as massively parallel processing (MPP),
is a multiple processor architecture in which each
processor is part of a complete system, with its own
memory and disk storage.
• The database is partitioned among all the disks on each
  system associated with the database, and data is
  transparently available to users on all systems.
• This architecture can easily support a large number of
  processors.
       Shared nothing (cont’d)
• SN
        Memory                             Memory

DB      CPU                                CPU      DB

                 Interconnection network
        CPU                                 CPU      DB
        Memory                             Memory
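A minimal sketch of the shared-nothing idea: rows are hash-partitioned across nodes, each node aggregates only its own partition, and the partial results are combined. The node count, the sample rows, and the helper names are assumptions, and threads stand in for what would be separate machines in a real system.

```python
from concurrent.futures import ThreadPoolExecutor

NODES = 4  # assumed number of shared-nothing nodes


def partition(rows, key, n=NODES):
    """Assign each row to exactly one node by hashing its partitioning key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(key(row)) % n].append(row)
    return parts


def parallel_sum(parts, value):
    """Each worker sums its own partition; the partial sums are then combined."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(lambda p: sum(value(r) for r in p), parts))
    return sum(partials)


rows = [(i, i * 10) for i in range(100)]   # hypothetical (id, amount) rows
parts = partition(rows, key=lambda r: r[0])
total = parallel_sum(parts, value=lambda r: r[1])
print(total)
```

No node reads another node's partition, which is why this design avoids contention for shared memory or shared disks.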
Homogeneous & Heterogeneous DDBMSs

Homogeneous system              Heterogeneous system
• All sites use the same DBMS   • Sites may run different DBMS
   product.                        products, which need not be
                                   based on the same underlying
                                   data model, and so the system
                                   may be composed of relational,
                                   network, hierarchical, and
                                   object-oriented DBMSs.
Heterogeneous system problems
 In a heterogeneous system, translations are required to
 allow communication between different DBMSs.
 The system has the task of locating the data and
 performing any necessary translation.
 Data required from another site may have:
 •      Different hardware
 •      Different DBMS products
 •      Different hardware and different DBMS products
 If the hardware is different but the DBMS products are the same,
 the translation involves changing codes and word lengths.
 If the DBMS products are different, the translation involves mapping
 data structures in one data model to the equivalent data structures in
 another data model.
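As a rough illustration of the "codes and word length" case, the sketch below converts a record between two hypothetical sites. It assumes that one site stores text in EBCDIC (code page 037) with 16-bit big-endian integers and the other uses ASCII with 32-bit little-endian integers; these formats are chosen only for the example.

```python
import struct

# Assumed site A format: EBCDIC text (code page 037), 16-bit big-endian integer.
raw_name = "SMITH".encode("cp037")
raw_qty = struct.pack(">h", 1200)


def translate(name_bytes, qty_bytes):
    """Convert a site A record to the assumed site B format:
    ASCII text and a 32-bit little-endian integer."""
    name = name_bytes.decode("cp037").encode("ascii")   # change of character code
    (qty,) = struct.unpack(">h", qty_bytes)             # read the 16-bit word
    return name, struct.pack("<i", qty)                 # write a 32-bit word


name_b, qty_b = translate(raw_name, raw_qty)
print(name_b, len(qty_b))
```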
     Heterogeneous system problems (cont’d)
An additional complexity is the provision of a common
conceptual schema. The integration of data models can be very difficult
owing to semantic heterogeneity.
For example, attributes with the same name in two
schemas may represent different things. Equally,
attributes with different names may model the same thing.
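A small sketch of resolving both kinds of mismatch when mapping local rows into a common schema. The two local schemas are hypothetical: site A stores `salary` per year, site B stores `salary` per month (same name, different meaning), and `name` at site A corresponds to `emp_name` at site B (different names, same thing).

```python
# Per-site mapping: local attribute -> (global attribute, conversion function).
MAPPINGS = {
    "siteA": {
        "name": ("name", str),
        "salary": ("annual_salary", float),                       # already annual
    },
    "siteB": {
        "emp_name": ("name", str),
        "salary": ("annual_salary", lambda m: float(m) * 12),    # monthly -> annual
    },
}


def to_global(site, row):
    """Rename and convert a local row into the common conceptual schema."""
    out = {}
    for attr, value in row.items():
        global_attr, convert = MAPPINGS[site][attr]
        out[global_attr] = convert(value)
    return out


print(to_global("siteA", {"name": "Ann", "salary": 36000}))
print(to_global("siteB", {"emp_name": "Ann", "salary": 3000}))
```

Writing such mappings requires knowing what each attribute means at each site, which is the part that cannot be derived from the attribute names alone.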
    Gateways convert the language and model of each
    different DBMS into the language and model of the relational
    system. The gateway approach has limitations:
•     It may not support transaction management. The gateway between two
      systems may be only a query translator. For example, a system may not
      coordinate concurrency control and recovery of transactions that involve
      updates to the pair of databases.
•     The gateway approach is concerned only with the problem of translating a
      query expressed in one language into an equivalent expression in another
      language. As such, generally it does not address the issues of
      homogenizing the structural and representational differences between different
      schemas.
A multidatabase system (MDBS) is a distributed DBMS
in which each site maintains complete autonomy. An
MDBS resides transparently on top of existing database
and file systems, and presents a single database to its
users. It maintains a global schema against which
users issue queries and updates; an MDBS maintains
only the global schema and the local DBMSs themselves
maintain all user data.
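The division of responsibility described here can be sketched with two in-memory SQLite databases standing in for autonomous local DBMSs. The table names, column names, and the `GLOBAL_SCHEMA` structure are assumptions for illustration; the global layer holds only mappings, and all user data stays in the local databases.

```python
import sqlite3


def make_local(ddl, insert, rows):
    """Create one autonomous local database holding its own user data."""
    db = sqlite3.connect(":memory:")
    db.execute(ddl)
    db.executemany(insert, rows)
    return db


local_a = make_local("CREATE TABLE staff (name TEXT, city TEXT)",
                     "INSERT INTO staff VALUES (?, ?)", [("Ann", "London")])
local_b = make_local("CREATE TABLE employee (emp_name TEXT, town TEXT)",
                     "INSERT INTO employee VALUES (?, ?)", [("Bob", "Glasgow")])

# Global schema: global relation -> per-site query yielding (name, city).
# No user data is stored at this level.
GLOBAL_SCHEMA = {
    "staff": [
        (local_a, "SELECT name, city FROM staff"),
        (local_b, "SELECT emp_name, town FROM employee"),
    ],
}


def query_global(relation):
    """Answer a global query by collecting rows from each local DBMS."""
    rows = []
    for db, sql in GLOBAL_SCHEMA[relation]:
        rows.extend(db.execute(sql).fetchall())
    return sorted(rows)


print(query_global("staff"))
```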
 Concepts of Networking
An interconnected collection of autonomous
computers that are capable of exchanging
information.
For our purposes, the DDBMS is built on top
of a network in such a way that the
network is hidden from the user.
Classification of networks
LAN: a local area network is intended for connecting
computers at the same site.
WAN: a wide area network is used when computers or
LANs need to be connected over long distances.

A special case of the WAN is a metropolitan area
network (MAN), which generally covers a city or
suburb.
       Summary of WAN and LAN
WAN                                       LAN
• Distances up to thousands of            • Distances up to a few kilometers
   kilometers
• Link autonomous computers               • Link computers that cooperate in
                                              distributed applications
• Network managed by independent          • Network managed by users (using
   organization (using telephone or           privately owned cables)
   satellite links)
• Data rate up to 33.6 kbit/s (dial-up    • Data rate up to 2500 Mbit/s (ATM)
   via modem), 45 Mbit/s (T3 circuit)
• Complex protocol                        • Simpler protocol
• Use point-to-point routing              • Use broadcast routing
• Use irregular topology                  • Use bus or ring topology
• Error rate about 1:10^5                 • Error rate about 1:10^9
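To put the data rates and error rates in the summary into perspective, the sketch below estimates how long a 1 MB transfer would take on each link and how many bit errors to expect. It assumes the quoted rates are sustained with no protocol overhead and that the error rates are per bit; both are simplifications made for this example.

```python
def transfer_seconds(size_bytes, rate_bits_per_s):
    """Idealized transfer time: payload bits divided by the link rate."""
    return size_bytes * 8 / rate_bits_per_s


def expected_bit_errors(size_bytes, error_rate):
    """Expected number of corrupted bits, assuming a per-bit error rate."""
    return size_bytes * 8 * error_rate


MB = 10**6  # 1 MB taken as 10^6 bytes
RATES = {
    "dial-up modem (WAN)": 33.6e3,
    "T3 circuit (WAN)": 45e6,
    "ATM (LAN)": 2500e6,
}

for name, rate in RATES.items():
    print(f"{name}: {transfer_seconds(MB, rate):.4f} s")

print("WAN expected bit errors:", expected_bit_errors(MB, 1e-5))
print("LAN expected bit errors:", expected_bit_errors(MB, 1e-9))
```

Under these assumptions the dial-up link needs roughly four minutes for 1 MB while the LAN needs a few milliseconds, which is why the cost of moving data between sites weighs so heavily in distributed database design over a WAN.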
Network protocols
A set of rules that determines how messages between
computers are sent, interpreted, and processed.
  • TCP/IP (Transmission Control Protocol/Internet Protocol)
  • SPX/IPX (Sequenced Packet Exchange/Internetwork Packet Exchange)
  • NetBIOS (Network Basic Input/Output System)
  • APPC (Advanced Program-to-Program Communications)
         Network protocol (cont’d)

•   DECnet
•   AppleTalk
•   WAP (Wireless Application Protocol)