Distributed Database Application & System
Subject Code: 171602
Prof. A.R. Vasant, V.V.P. Engineering College, Rajkot
11/30/2012
Text Books:
1. Principles of Distributed Database Systems, by M. Tamer Özsu and Patrick Valduriez, Pearson.
2. Distributed Database Systems, by Chhanda Ray, Pearson.
Reference Books:
1. Distributed Databases: Principles and Systems, by Stefano Ceri, McGraw-Hill.
Outline
File Systems
Database Management
Motivation
Distributed Computing and What is Distributed
What is a Distributed Database System?
What is not a DDBS?
Centralized DBMS on a Network
Distributed DBMS Environment
Implicit Assumptions
Shared-Memory, Shared-Disk and Shared-Nothing Architectures
Applications
Outline
Promises of DDBS
Transparent Management of Distributed and Replicated Data, Transparencies
Distributed Database – User View
Distributed DBMS – Reality
Potentially Improved Performance
Complicating Factors, Problem Areas
Parallelism Requirements, System Expansion
Distributed DBMS Issues, Relationship Between Issues
Slide reference
Most of the slides are taken from the following
link
http://softbase.uwaterloo.ca/~tozsu/ddbook/notes/
Introduction
Distributed Database System (DDBS)
technology is the union of what appear to be
two diametrically opposed approaches to
data processing:
Database System
Computer Network
Traditional File Processing System
Database System
The main aims are:
Data is defined and administered centrally
Data independence
Motivation
A major motivation behind the use of database systems is the desire to integrate the operational data of an enterprise and to provide centralized, and thus controlled, access to that data.
The technology of computer networks, on the other hand, promotes a mode of work that goes against all centralization efforts.
At first it seems difficult to understand how these two contrasting technologies can possibly be synthesized to produce a technology that is more powerful and more promising than either one alone.
Motivation
The most important objective of database technology is integration, not centralization.
It is possible to achieve integration without centralization, and that is exactly what distributed database technology attempts to achieve.
The following slides introduce the fundamental concepts and set the framework for discussing distributed databases.
Motivation
Distributed Data Processing
In recent years, the term distributed processing (or distributed computing) has been applied to many different kinds of systems.
It has sometimes been used to refer to:
Multiprocessor systems
Distributed data processing
Computer networks
It has been described as a "concept in search of a definition and a name".
Distributed Data Processing
Synonymous terms
➠distributed function
➠distributed data processing
➠multiprocessors/multicomputers
➠satellite processing
➠backend processing
➠dedicated/special purpose computers
➠timeshared systems
➠functionally modular systems
Cont…
Obviously, some degree of distributed processing goes on in any computer system, even on a single-processor computer.
For example, the CPU, ALU and I/O functions are separated, and this could also be considered distributed processing.
However, we are not concerned with this form of distribution of functions within a single-processor computer system.
Distributed Data Processing Definition
A number of autonomous processing
elements (not necessarily homogeneous)
that are interconnected by a computer
network and that cooperate in performing
their assigned tasks
Processing element refers to a computing
device that can execute a program on its own
What is being distributed?
Processing logic
The definition states that processing logic or processing elements are distributed
Function
Distribution is according to function.
Various functions of computer system could be delegated
to various pieces of hardware or software.
Data
Distribution is according to data
Data used by a number of applications may be distributed
to a number of processing sites
Control
Control can be distributed
Control of the execution of various tasks might be
distributed instead of being performed by one computer
system
What is a Distributed Database System?
A distributed database (DDB) is a collection of
multiple, logically interrelated databases
distributed over a computer network.
A distributed database management system (D–
DBMS) is the software that manages the DDB
and provides an access mechanism that makes
this distribution transparent to the users.
Distributed database system (DDBS) = DDB + D–DBMS
What is not a DDBS?
A timesharing computer system
A loosely (shared-disk) or tightly (shared-
memory) coupled multiprocessor system
A database system which resides at one of
the nodes of a network of computers - this is
a centralized database on a network node
Shared Memory Architecture
[Figure: several processor units sharing a single memory and a common I/O system]
Cont….
A multiprocessor system is generally
considered to be a system where two or more
processors share some form of memory,
either primary memory, in which case the
multiprocessor is called shared memory (also
called tightly coupled), or secondary memory,
when it is called shared disk (also called
loosely coupled).
Shared Memory Architecture
Shared Disk Architecture
Cont….
Shared everything and Shared nothing
Architecture
Shared everything permits each processor to
access everything (Primary and secondary
memories, and peripherals)
Sharing memory enables processors to communicate without exchanging messages.
Shared-Nothing Architecture
Cont….
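A shared-nothing architecture is one where each processor has its own primary and secondary memory and peripherals, and communicates with other processors over a very-high-speed interconnect (bus or switch).
In this respect it is similar to a distributed environment, but there are still differences: for example, the nodes of a shared-nothing multiprocessor are typically homogeneous, whereas heterogeneous hardware and operating systems are common in a distributed environment.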
Centralized DBMS on a Network
Distributed DBMS Environment
The existence of a computer network or a collection of "files" is not sufficient to form a distributed database system.
What we are interested in is an environment where data is distributed among a number of sites.
Distributed DBMS Environment
Implicit Assumptions
Data is stored at a number of sites; each site logically consists of a single processor.
Processors at different sites are interconnected by a computer network (no multiprocessors)
➠parallel database systems
A distributed database is a database, not a collection of files; the data is logically related, as exhibited in the users' access patterns
➠relational data model
The D-DBMS is a full-fledged DBMS
➠not a remote file system, not a TP system
Applications
Manufacturing - especially multi-plant
manufacturing
Military command and control
Corporate MIS
Airlines
Hotel chains
Any organization which has a decentralized
organization structure
Distributed DBMS Promises
❶Transparent management of distributed,
fragmented, and replicated data
❷Improved reliability/availability through
distributed transactions
❸Improved performance
❹Easier and more economical system
expansion
Distributed DBMS Promises
Higher reliability
Replication of components
No single points of failure
e.g., a broken communication link or
processing element does not bring down the
entire system
Distributed transaction processing guarantees the consistency of the database under concurrent access
Distributed DBMS Promises
Improved performance
Proximity of data to its points of use
Reduces remote access delays
Requires some support for fragmentation and replication
Parallelism in execution
Inter-query parallelism
Intra-query parallelism
Update and read-only queries influence the design
of DDBSs substantially
If mostly read-only access is required, as much as possible
of the data should be replicated
Writing becomes more complicated with replicated data
Distributed DBMS Promises
Easier system expansion
The issue is database scaling
Emergence of microprocessor and workstation technologies
A network of workstations is much cheaper than a single mainframe computer
Data communication cost versus
telecommunication cost
Increasing database size
Transparency
Transparency is the separation of the higher-level semantics of a system from the lower-level implementation issues
A transparent system hides the implementation details from users
Example: a firm that has offices at different locations: Boston, Paris, Montreal and New York
The firm runs projects at each of these sites and maintains a database of its employees, projects and other related data:
EMP(ENO, ENAME, TITLE)
PROJ(PNO, PNAME, BUDGET)
PAY(TITLE, SAL)
ASG(ENO, PNO, DUR, RESP)
Cont….
If this were a centralized database and we wanted to find the names and salaries of employees who worked on a project for more than 12 months, the SQL query would be:
SELECT ENAME, SAL
FROM EMP,ASG,PAY
WHERE ASG.DUR > 12
AND EMP.ENO=ASG.ENO
AND PAY.TITLE=EMP.TITLE
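The query above can be tried on a single machine with Python's built-in sqlite3 module. This is a minimal sketch: only the EMP, PAY and ASG schemas come from the example, and the rows are hypothetical sample data. The point of distribution transparency is that a user of the distributed database could pose exactly the same query.

```python
import sqlite3

# In-memory database with the EMP, PAY and ASG schemas from the example.
# The rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMP (ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT);
CREATE TABLE PAY (TITLE TEXT PRIMARY KEY, SAL INTEGER);
CREATE TABLE ASG (ENO TEXT, PNO TEXT, DUR INTEGER, RESP TEXT);
INSERT INTO EMP VALUES ('E1', 'J. Doe', 'Elect. Eng.'),
                       ('E2', 'M. Smith', 'Syst. Anal.');
INSERT INTO PAY VALUES ('Elect. Eng.', 40000), ('Syst. Anal.', 34000);
INSERT INTO ASG VALUES ('E1', 'P1', 12, 'Manager'),
                       ('E2', 'P1', 24, 'Analyst');
""")

# Names and salaries of employees assigned to a project for more than 12 months.
rows = conn.execute("""
    SELECT ENAME, SAL
    FROM EMP, ASG, PAY
    WHERE ASG.DUR > 12
      AND EMP.ENO = ASG.ENO
      AND PAY.TITLE = EMP.TITLE
""").fetchall()

print(rows)  # [('M. Smith', 34000)]
```

E1's assignment lasts exactly 12 months, so the strict `>` comparison excludes it.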
Example
Cont….
Cont….
We need to partition each of the relations and store each partition at a different site.
This is known as fragmentation.
It is also preferable to duplicate some of this data at other sites for performance and reliability reasons.
The result is a fragmented and replicated distributed database.
The user should be able to pose the query without paying attention to the fragmentation, location or replication of data, and let the system worry about resolving these issues.
Transparency
The fundamental issue is to provide data independence in the distributed environment
Network (distribution) transparency
Replication transparency
Fragmentation transparency
horizontal fragmentation: selection
vertical fragmentation: projection
hybrid
Data Independence
Data independence is the fundamental form of transparency that we look for within a DBMS, distributed or centralized
It refers to the immunity of user applications to changes in the definition and organization of data, and vice versa
As we will see later:
The logical structure of the data is specified by the schema definition
The physical structure of the data is specified by the physical data description
Cont…
Two types of data independence
Logical data independence
Physical data independence
Logical data independence
Refers to the immunity of user applications to changes in the logical structure of the database.
If a user application operates on a subset of the attributes of a relation, it should not be affected later when new attributes are added to the same relation
Example: if new attributes are added to the EMP relation, an application that uses only the existing attributes of EMP does not need to change
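A minimal sketch of logical data independence using sqlite3. It assumes a hypothetical PHONE attribute is later added to EMP; the application names the attributes it needs instead of using SELECT *, so the schema change does not affect its results.

```python
import sqlite3

def employee_names(conn):
    # The application uses only ENO and ENAME, a subset of EMP's attributes,
    # and names them explicitly rather than relying on SELECT *.
    return conn.execute("SELECT ENO, ENAME FROM EMP ORDER BY ENO").fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT)")
conn.execute("INSERT INTO EMP VALUES ('E1', 'J. Doe', 'Elect. Eng.')")
before = employee_names(conn)

# Logical schema change: a new (hypothetical) attribute PHONE is added to EMP.
conn.execute("ALTER TABLE EMP ADD COLUMN PHONE TEXT")
after = employee_names(conn)

print(before == after)  # True: the application code did not change
```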
Cont…
Physical data independence
Hiding details of the storage structure from user applications.
The data might be stored on different disk types, parts of it might be organized differently, or it might even be distributed; the application program should not be concerned with these physical details.
Application programs need not be changed when the physical organization of the data changes.
Network Transparency
The second resource that needs to be managed in a distributed database environment is the network
Users should be protected from the operational details of the network
It is desirable to hide even the existence of the network
Users can then view the DDBS as a centralized DBMS
This type of transparency is referred to as network transparency or distribution transparency
Users can access services or data without having to specify where the data is located
Cont….
Distribution transparency
Location transparency
The command used to perform a task is independent of
both the system on which the data is stored and the
system on which the command is executed
Naming transparency
A unique name is provided for each object in the database. The name does not have the object's location associated with it.
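Location and naming transparency can be sketched with a global directory that maps location-free names to the sites that store them. The sites, data and the `fetch` helper below are hypothetical, and sites are simulated as in-memory dictionaries rather than networked databases.

```python
# Hypothetical sites, each simulated as an in-memory dict of relations.
SITES = {
    "Boston":   {"PROJ": [("P1", "Instrumentation", 150000)]},
    "Montreal": {"PAY": [("Elect. Eng.", 40000), ("Syst. Anal.", 34000)]},
}

# Hypothetical global directory: location-free name -> site that stores it.
DIRECTORY = {"PROJ": "Boston", "PAY": "Montreal"}

def fetch(name):
    """Return the tuples of a relation, given only its location-free name."""
    site = DIRECTORY[name]  # the system, not the user, resolves the location
    return SITES[site][name]

print(fetch("PAY"))  # the caller never mentions Montreal

# Moving PAY to another site changes the directory, not the calling code.
SITES["Paris"] = {"PAY": SITES["Montreal"].pop("PAY")}
DIRECTORY["PAY"] = "Paris"
print(fetch("PAY"))  # same tuples, now served from Paris
```

The caller's code is identical before and after the relocation, which is what location transparency promises.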
Replication Transparency
(This will be discussed in detail later.)
For performance, reliability and availability reasons, it is usually desirable to be able to distribute data in a replicated fashion across the machines on a network
Example:
Data required by one user can be placed on that user's local machine and also on the machine of another user
This increases locality of reference
If one of the machines fails, a copy of the data is still available on another machine on the network.
Cont…
Should data be replicated or not?
How many copies should there be?
How will update operations work?
Should the user be aware of replication or not?
Fragmentation
It is commonly desirable to divide each database
relation into smaller fragments and treat each
fragment as a separate database object (i.e.
another relation)
This is done for performance, availability and
reliability
Fragmentation can reduce the negative effects of replication
Each replica is not a full relation but only a subset of it; thus less space is required and fewer data items need to be managed
Cont…
Horizontal fragmentation
The relation is partitioned into subsets of its tuples (rows)
Vertical fragmentation
The relation is partitioned into subsets of its attributes (columns)
When database objects are fragmented, user queries that were specified on entire relations must now be performed on sub-relations
This requires finding a query processing strategy based on the fragmentation
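The two kinds of fragmentation, and how the original relation is reconstructed from them, can be sketched on plain Python tuples. The EMP rows and the fragmentation predicate (TITLE ending in "Eng.") are hypothetical. Union rebuilds horizontal fragments, and a join on the key ENO rebuilds vertical ones.

```python
# Hypothetical EMP tuples: (ENO, ENAME, TITLE).
EMP = [
    ("E1", "J. Doe", "Elect. Eng."),
    ("E2", "M. Smith", "Syst. Anal."),
    ("E3", "A. Lee", "Mech. Eng."),
]

# Horizontal fragmentation (selection): a predicate and its negation give
# disjoint fragments that together contain every tuple.
def is_engineer(t):
    return t[2].endswith("Eng.")

EMP1 = [t for t in EMP if is_engineer(t)]
EMP2 = [t for t in EMP if not is_engineer(t)]

# Vertical fragmentation (projection): both fragments keep the key ENO.
EMP_A = [(eno, ename) for eno, ename, _ in EMP]
EMP_B = [(eno, title) for eno, _, title in EMP]

# Reconstruction: union of horizontal fragments, join on ENO for vertical ones.
rebuilt_h = sorted(EMP1 + EMP2)
title_of = dict(EMP_B)
rebuilt_v = sorted((eno, ename, title_of[eno]) for eno, ename in EMP_A)
print(rebuilt_h == sorted(EMP), rebuilt_v == sorted(EMP))  # True True

# A query whose predicate matches the fragmentation predicate needs only EMP1.
engineers = [ename for _, ename, _ in EMP1]
print(engineers)  # ['J. Doe', 'A. Lee']
```

The last query shows why the fragmentation matters to query processing: a query on engineers can ignore the site holding EMP2 entirely.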
Assignment 1
Explain file processing and DBMS.
What do you mean by distributed data processing?
Explain shared-memory architecture.
Explain shared-disk architecture.
Explain a centralized database on a network.
Explain the DDBS environment.
Give an example of a distributed application and explain it.
Define transparency and explain the different types of transparency in a DDBS.
Distributed Database - User View
User wants to see one database
Distributed DBMS - Reality
Programmer sees many databases
Who should provide transparency?
Application
The transparency features can be built into the user
language, which then translates the requested
services into required operations.
Applications or application modules are implemented
in a distributed fashion
Communication and data exchange via standard
protocols (RPC, CORBA, HTTP, . . . )
Who should provide transparency?
Operating system
Provides some level of transparency to system users.
For example, the device drivers within the operating system handle the details of getting each piece of peripheral equipment to do what is requested.
The typical computer user, or even an application programmer, does not normally write device drivers to interact with individual peripheral equipment; that operation is transparent to the user.
In a distributed environment, the management of the network resource can be taken over by the distributed operating system
This realizes network transparency, e.g., at the file-system level (NFS) or at the protocol level
Who should provide transparency?
Database system
The third layer at which transparency can be supported
is within the DBMS.
It is the responsibility of the DBMS to make all the
necessary translations from the operating system to the
higher-level user interface
Provides transparent access to data at remote database instances
This requires query splitting, transaction control and replica management
Layers of Transparency
Reliability Through Distributed Transactions
Distributed DBMSs are intended to improve reliability, since they have replicated components and thereby eliminate single points of failure.
Node failures are compensated for by data copies (replicas) on remote sites
Distributed transactions guarantee that
1. A sequence of database operations is executed as an
atomic action
2. A consistent database state is transformed to another
consistent database state, even if multiple transactions
are executed concurrently (concurrency transparency
& failure atomicity)
Reliability Through Distributed Transactions
Example
Assume that there is an application that updates the
salaries of all the employees by 10%.
It is desirable to encapsulate the query (or the program
code) that accomplishes this task within transaction
boundaries.
For example, if a system failure occurs half-way through
the execution of this program, we would like the DBMS to
be able to determine, upon recovery, where it left off and
continue with its operation (or start all over again). This is
the topic of failure atomicity.
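Failure atomicity for the 10% salary update can be illustrated on a single site with sqlite3, whose connection context manager commits on success and rolls back when an exception escapes. This is a sketch under simplifying assumptions: the PAY rows and the simulated failure are hypothetical, and a distributed DBMS must provide the same all-or-nothing outcome across several sites, which needs a distributed commit protocol not shown here. Rolling back corresponds to the "start all over again" option.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PAY (TITLE TEXT PRIMARY KEY, SAL REAL)")
conn.executemany("INSERT INTO PAY VALUES (?, ?)",
                 [("Elect. Eng.", 40000), ("Syst. Anal.", 34000), ("Programmer", 24000)])
conn.commit()

def raise_salaries(conn, fail_after=None):
    """Raise every salary by 10% as one transaction; optionally simulate a crash."""
    with conn:  # commits if the block succeeds, rolls back if an exception escapes
        titles = [r[0] for r in conn.execute("SELECT TITLE FROM PAY ORDER BY TITLE")]
        for i, title in enumerate(titles):
            if fail_after is not None and i == fail_after:
                raise RuntimeError("simulated failure half-way through")
            conn.execute("UPDATE PAY SET SAL = SAL * 1.1 WHERE TITLE = ?", (title,))

try:
    raise_salaries(conn, fail_after=1)  # one row is updated, then the "crash"
except RuntimeError:
    pass

# The partial update was rolled back: the total is unchanged.
print(conn.execute("SELECT SUM(SAL) FROM PAY").fetchone()[0])  # 98000.0
```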
Reliability Through Distributed Transactions
Alternatively, if some other user runs a query
calculating the average salaries of the employees in
this firm while the original update action is going on,
the calculated result will be in error.
Therefore we would like the system to be able to
synchronize the concurrent execution of these
two programs.
Improved Performance
Fragmenting the conceptual database in a
way that enables data to be stored in close
proximity to its points of use
reduction of transfer costs and delays
Inherent parallelism of distributed systems
Inter-query parallelism: execution of multiple
queries at the same time
Intra-query parallelism: parallel execution of sub
queries at different sites accessing a different part
of the distributed database
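Intra-query parallelism can be sketched as one sub-query per site, with the partial results combined at the end. The fragments and site names are hypothetical, and threads stand in for remote sites (in CPython they give concurrency for I/O-bound work such as waiting on remote sites, not CPU parallelism).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of a salary relation, one per site.
FRAGMENTS = {
    "Boston":   [40000, 34000],
    "Paris":    [24000],
    "Montreal": [27000, 40000, 34000],
}

def local_subquery(salaries):
    # Each site computes a partial aggregate over its own fragment only.
    return sum(salaries), len(salaries)

def parallel_average(fragments):
    # The sub-queries run concurrently, one per site.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_subquery, fragments.values()))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

print(parallel_average(FRAGMENTS))
```

Each site returns a (sum, count) pair rather than a local average, because averaging the local averages would give the wrong answer when fragments have different sizes.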
Cont….
Read-only vs. update access
Query database (for ad-hoc querying) and production
database (for updates by application programs)
Copying the production database to the query
database at regular time intervals
Read-only access during regular operating hours,
updates are batched and executed during off hours
Easier system expansion
Necessity of increasing database size and/or
decreasing query execution time
Expansion by adding additional storage and
processing power to the network
A system of smaller computers is often
cheaper than a single big machine with the
equivalent power
Complicating Factors
First, data may be replicated in a distributed
environment
The possible duplication of data items is mainly due to
reliability and efficiency considerations.
Consequently, the distributed database system is
responsible for
(1) choosing one of the stored copies of the requested
data for access in case of retrievals, and
(2) making sure that the effect of an update is reflected on
each and every copy of that data item.
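The two responsibilities listed above can be sketched as a read-one/write-all scheme over the copies of one data item. The class name, site names and the strict rule of refusing updates while any copy is unreachable are assumptions for illustration; real systems use more elaborate replica control protocols and, as the next point notes, must bring failed sites up to date once they recover.

```python
class ReplicatedItem:
    """One data item stored as copies at several sites (read-one/write-all sketch)."""

    def __init__(self, sites, value):
        self.copies = {site: value for site in sites}
        self.up = set(sites)  # sites that are currently reachable

    def read(self, preferred=None):
        # (1) Retrieval: choose one available copy, the preferred (local) one if possible.
        if preferred in self.up:
            return self.copies[preferred]
        if not self.up:
            raise RuntimeError("no copy is available")
        return self.copies[min(self.up)]  # any reachable copy will do

    def write(self, value):
        # (2) Update: the new value must be reflected in every copy.
        unreachable = set(self.copies) - self.up
        if unreachable:
            raise RuntimeError(f"copies at {sorted(unreachable)} cannot be updated")
        for site in self.copies:
            self.copies[site] = value

item = ReplicatedItem(["Boston", "Paris"], 100)
item.write(110)
item.up.discard("Paris")   # Paris becomes unreachable
print(item.read("Paris"))  # 110, served from the Boston copy
```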
Complicating Factors
Second, if some sites fail (e.g., by either hardware or
software malfunction), or
if some communication links fail (making some of the
sites unreachable) while an update is being executed,
the system must make sure that the effects will be
reflected on the data residing at the failing or
unreachable sites as soon as the system can recover
from the failure.
Third, synchronizing transactions on multiple sites is considerably harder than in a centralized system.
Complicating Factors
These difficulties point to a number of potential problems
with distributed DBMSs.
Complexity
A DDBS is more complex than a centralized DBMS
Cost
A DDBS requires additional hardware, communication mechanisms, etc., which increases the cost.
The software required for a DDBS is also more complicated
Maintaining a DDBS requires more personnel at the different sites
Distribution of Control
Distribution creates problems of synchronization and coordination that must be handled, and policies that must be defined
Complicating Factors
Security
In a centralized DBMS, security (through centralized control) was a major benefit
Maintaining security in a DDBS is a major issue, and network security must also be addressed.
Design Issues/ Problem Areas
There are a number of technical problems that need to be resolved to realize the full potential of a DDBMS
Distributed database design
How to distribute the database?
How to fragment the data?
Partitioned data vs. replicated data?
Design Issues/ Problem Areas
Distributed query processing
Design algorithms that analyze queries and
convert them into a series of data manipulation
operations
Executing a query over the network in the most
cost-effective way
Distribution of data, communication costs, etc. have to be considered
Find optimal query plans
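A deliberately simplified sketch of a cost-based choice: decide where to run a join of EMP and ASG by counting only the bytes shipped over the network. The sizes, locations and result size are hypothetical, and a real distributed query optimizer would also weigh local processing, I/O and alternative strategies.

```python
# Hypothetical relation sizes (bytes) and the site where each relation is stored.
SIZES = {"EMP": 400_000, "ASG": 1_000_000}
LOCATION = {"EMP": "Boston", "ASG": "Paris"}

def transfer_cost(join_site, relations, result_site, result_size):
    """Bytes shipped if the join runs at join_site and the result goes to result_site."""
    cost = sum(SIZES[r] for r in relations if LOCATION[r] != join_site)
    if join_site != result_site:
        cost += result_size
    return cost

def cheapest_join_site(relations, result_site, result_size):
    # Candidate sites: where the operands live, plus the site that wants the result.
    candidates = {LOCATION[r] for r in relations} | {result_site}
    return min(sorted(candidates),
               key=lambda s: transfer_cost(s, relations, result_site, result_size))

# A user in Montreal asks for the join; the (hypothetical) result is 50 KB.
for site in ("Boston", "Paris", "Montreal"):
    print(site, transfer_cost(site, ["EMP", "ASG"], "Montreal", 50_000))
print(cheapest_join_site(["EMP", "ASG"], "Montreal", 50_000))  # Paris
```

Shipping the smaller relation (EMP) to the site of the larger one and sending back only the result is cheapest here, which is the kind of trade-off a distributed query processor has to evaluate.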
Design Issues/ Problem Areas
Distributed directory management
A directory contains information (such as
descriptions and locations) about data items in the
database.
A directory may be
global to the entire DDBS or local to each site;
it can be centralized at one site or distributed over
several sites;
there can be a single copy or multiple copies.
Design Issues/ Problem Areas
Distributed concurrency control
Synchronization of concurrent accesses such that the integrity of the DB is maintained
The integrity of multiple copies of (parts of) the DB has to be considered (mutual consistency)
Design Issues/ Problem Areas
Distributed deadlock management
Deadlock management: prevention, avoidance,
detection/recovery
Reliability of distributed DBMS
Ensure consistency, detect failures, and recover
from failures
How to make the system resilient to failures
Atomicity and durability
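For the detection/recovery approach to distributed deadlocks, one common scheme (assumed here; the slides do not prescribe one) merges the local wait-for graphs of all sites and searches the result for a cycle. The transactions and graphs below are hypothetical. The example shows why a global view is needed: each site's local graph is acyclic, yet the merged graph contains a deadlock.

```python
def merge(*local_graphs):
    """Union of local wait-for graphs (transaction -> set of transactions it waits for)."""
    global_graph = {}
    for graph in local_graphs:
        for txn, waits_for in graph.items():
            global_graph.setdefault(txn, set()).update(waits_for)
    return global_graph

def find_cycle(wait_for):
    """Return the transactions on one cycle of the wait-for graph, or None."""
    state = {}  # missing = unvisited, "active" = on current path, "done" = finished
    path = []

    def visit(txn):
        state[txn] = "active"
        path.append(txn)
        for other in sorted(wait_for.get(txn, ())):
            if state.get(other) == "active":
                return path[path.index(other):]  # back edge closes a cycle
            if other not in state:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        state[txn] = "done"
        return None

    for txn in sorted(wait_for):
        if txn not in state:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None

# Hypothetical local graphs: at site 1, T1 waits for T2; at site 2, T2 waits for T1.
site1 = {"T1": {"T2"}}
site2 = {"T2": {"T1"}}
print(find_cycle(site1), find_cycle(site2))  # None None
print(find_cycle(merge(site1, site2)))       # ['T1', 'T2']
```

Recovery would then abort one transaction on the returned cycle (the victim) to break the deadlock.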
Design Issues/ Problem Areas
Operating System Support
An operating system with proper support for database operations
The dichotomy between general-purpose processing requirements and database processing requirements
Heterogeneous databases
If there is no homogeneity among the DBs at various
sites either in terms of the way data is logically
structured (data model) or in terms of the access
mechanisms (language), it becomes necessary to
provide translation mechanisms
Relationship Between Issues
Conclusion
A distributed database (DDB) is a collection of
multiple, logically interrelated databases distributed
over a computer network
Data is stored at a number of sites, which are connected by a network. A DDB supports the relational model. A DDB is not a remote file system
A transparent system hides the implementation details from the users
Distribution transparency
Network transparency
Transaction transparency
Performance transparency
Conclusion
Programming a distributed database involves:
Distributed database design
Distributed query processing
Distributed directory management
Distributed concurrency control
Distributed deadlock management
Reliability
Assignment 2
Explain Layers of Transparency
Discuss in detail the problem areas in a DDBS environment.
How do you explain the improved performance in a DDBS environment?
Discuss the complicating factors in a DDBS environment.
Give the advantages and disadvantages of
DDBS.