Distributed Databases


        Dr. Lee 
    By Alex Genadinik
  What is a Distributed Database?
• Distributed Database - a collection of
  multiple logically interrelated databases
  distributed over a computer network
• Because the database is distributed,
  different users can access it without
  interfering with one another.

• However, the DBMS must periodically
  synchronize the scattered databases to
  make sure that they all have consistent
  data.
Visual Representation
  More Detailed List of Benefits
• No centralized point of failure (data is not
  kept in one place)
• Local autonomy
• Ability to distribute data over multiple
  storage drives (no supercomputers needed)
• Replication of Data for Disaster Recovery
  and High Availability
  Closer look at the drawbacks
• Increased complexity of database design,
  hardware and other software
• Requires more complex security software
  and procedures
• Requires concurrency control and extra
  mechanisms to maintain data integrity
       System Transparency
• Location Transparency – A command works the
  same no matter where in the system it is issued
• Naming Transparency – We can refer to data by
  the same name, from anywhere in the system,
  with no further specification.
• Replication Transparency – Hides multiple
  copies of data from user
• Fragmentation Transparency – Hides the fact that
  data is fragmented (i.e., different sections of
  correlated data may be in different locations)
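
The transparencies above can be sketched in a few lines of Python. This is a toy illustration, not how any particular DBMS works: the `Cluster` class, the node names, and the placement rule in `_replicas` are all hypothetical. The caller uses only `put` and `get` and never sees which nodes hold the data or how many copies exist.

```python
class Cluster:
    """Toy catalog that hides where each item and its replicas live."""

    def __init__(self):
        # node name -> {key: value}; each dict stands in for a local database
        self.nodes = {"node_a": {}, "node_b": {}, "node_c": {}}
        self.names = sorted(self.nodes)

    def _replicas(self, key):
        # Hypothetical placement rule: two consecutive nodes picked from the key
        start = sum(key.encode()) % len(self.names)
        return [self.names[start], self.names[(start + 1) % len(self.names)]]

    def put(self, key, value):
        # Replication transparency: the caller writes once, both copies are updated
        for name in self._replicas(key):
            self.nodes[name][key] = value

    def get(self, key):
        # Location transparency: the caller never names a node
        for name in self._replicas(key):
            if key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)


cluster = Cluster()
cluster.put("acct-1", 100)
value = cluster.get("acct-1")
```

A real system would also have to keep the copies consistent when nodes fail or updates race, which this sketch ignores.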
Architecture, Visually
A More Conceptual View
          2 Basic Patterns
• Horizontal – Store whole tuples on
  different machines.
• Vertical – Store different fields of the
  same tuples on different machines.
          Horizontal pattern
• Entire tuples are on different machines
 This is convenient because each fragment can
  be defined with a standard relational algebra
  selection (restriction) on the relation:

σ “new york” (City)
σ “chicago” (City)
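
As a rough sketch of the horizontal pattern, the selection can be written as a filter over whole tuples. The sample `accounts` rows and the `select` helper are made up for illustration; each fragment would live on a different machine, and their union gives back the original relation.

```python
# Hypothetical sample relation; every row is one whole tuple
accounts = [
    {"acct": 1, "city": "new york", "balance": 500},
    {"acct": 2, "city": "chicago", "balance": 120},
    {"acct": 3, "city": "new york", "balance": 75},
]


def select(relation, predicate):
    """Relational selection: keep the whole tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# One fragment per city; each would be stored on its own machine
new_york_fragment = select(accounts, lambda r: r["city"] == "new york")
chicago_fragment = select(accounts, lambda r: r["city"] == "chicago")

# The union of the fragments reconstructs the original relation
rebuilt = sorted(new_york_fragment + chicago_fragment, key=lambda r: r["acct"])
```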
            Vertical pattern
• Store Different Fields of the same tuples
  on Different machines
Use the projection operator to declare these:
 Π (Acct #, Branch, Client Name) (Account)
 Π (Acct #, Balance) (Account)
(requires redundant storage of the primary key,
  here Acct #, in every fragment so that the
  tuples can be rejoined)
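
The vertical pattern can be sketched the same way: project the relation onto two field lists that both keep the key, then join on the key to rebuild the tuples. The sample rows and the `project` and `join_on` helpers are hypothetical, chosen only to mirror the two projections above.

```python
# Hypothetical sample relation with Acct # as the primary key
account = [
    {"acct": 1, "branch": "Downtown", "client": "Ann", "balance": 500},
    {"acct": 2, "branch": "Uptown", "client": "Bob", "balance": 120},
]


def project(relation, fields):
    """Relational projection: keep only the named fields of each tuple."""
    return [{f: row[f] for f in fields} for row in relation]


# Both fragments repeat the primary key so they can be matched up later
details = project(account, ["acct", "branch", "client"])
balances = project(account, ["acct", "balance"])


def join_on(left, right, key):
    """Rebuild full tuples by matching fragments on the shared key."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


rebuilt = join_on(details, balances, "acct")
```

Without the repeated key there would be no way to tell which balance belongs to which client, which is why the redundancy is required.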
A Few Comments Before Moving On
• Data is completely dispersed
• Data is replicated (helps in case of
  failure)
• There is no global directory
• Local-Master Directory
• Each node has its own catalog of data
• Each node has a directory to all of its data
  that is replicated elsewhere.
• Each database in a distributed database is
  distinct from all other databases in the
  system and has its own global database
  name.
          Name Resolution
• Every data object in every schema in
  every database has a unique identifying
  name.
• SELECT * FROM “Some Remote
  Database with a unique name” WHERE
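
To make the idea of qualified names concrete, here is a small sketch using Python's built-in `sqlite3` module. SQLite's `ATTACH DATABASE` is only a local stand-in here, not a real remote link, and the `sales` name, the `customers` tables, and their rows are made up. It shows two tables with the same name being told apart by the database name in front of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second, separate database under the (hypothetical) name "sales"
conn.execute("ATTACH DATABASE ':memory:' AS sales")

# The same table name exists in both databases
conn.execute("CREATE TABLE main.customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE sales.customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO main.customers VALUES (1, 'local Ann')")
conn.execute("INSERT INTO sales.customers VALUES (1, 'remote Ann')")

# The database-qualified name picks out exactly one table
rows = conn.execute(
    "SELECT name FROM sales.customers WHERE id = ?", (1,)
).fetchall()
```

A distributed DBMS applies the same principle across machines, with its own syntax for naming the remote database.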
    Remote and Distributed SQL
• Remote update – modification of data in
  one or more tables (all tables located on
  the same remote node).

• Distributed query – retrieves information from
  two or more nodes.
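
The difference between a statement confined to one remote node and one that spans several nodes can be sketched with two in-memory SQLite databases standing in for the nodes. Everything named here (`make_node`, `single_node_query`, `multi_node_query`, `remote_update`, the `account` table and its rows) is hypothetical. A real DBMS would do the combining itself rather than leave it to application code.

```python
import sqlite3


def make_node(rows):
    """Create an in-memory database that stands in for one node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (acct INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", rows)
    conn.commit()
    return conn


new_york = make_node([(1, 500), (3, 75)])
chicago = make_node([(2, 120)])


def single_node_query(node):
    """Read tables that all live on one node."""
    return node.execute("SELECT acct, balance FROM account ORDER BY acct").fetchall()


def multi_node_query(nodes):
    """Gather rows from two or more nodes and combine them."""
    rows = []
    for node in nodes:
        rows.extend(single_node_query(node))
    return sorted(rows)


def remote_update(node, acct, delta):
    """Modify data in tables that all live on the same node."""
    with node:  # commits on success, rolls back on error
        node.execute(
            "UPDATE account SET balance = balance + ? WHERE acct = ?", (delta, acct)
        )
```

An update that touched both nodes at once would need extra coordination so that either both changes are applied or neither is, which this sketch does not attempt.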
          Case Study
One may think distributed databases are
required only in large corporations that have
large databases. This is not true.

Sometimes even in a single office with
only two cubicles and two computers,
you may need to have your database on a
network, i.e., distributed.
       Case Study (cont.)
If the two users needed to use the
company’s database and make changes
to some data, they needed to have the
database centralized somewhere.

They could not each make changes to their
own copy of the database, because the other
person would not see those changes and would
be working with outdated data.
If you are not running a simple database
that is local to only your workstation, you
need to be using a database that is on
some server i.e., a distributed database.
         Conclusion (cont.)
Thank you everyone for your
~ Alex