					ICS 611                                        Spring Semester, 2008                                L Gottschalk

                                Distributed Data Base Systems [1]
4.1 DBMS Standardization, page 76
All reference models have to address DBMS
- components
- functions
- data.

The assumption about data is that there are levels of schemas, and that the data dictionary provides the
translation between the levels of schema:
- external view
- conceptual view
- internal view (mapping to files)

4.2 Architectural models for Distributed DBMSs, page 82
Distributed DBMSs can be categorized along three dimensions (see Figure 4.3, page 82):
1) distribution: how distributed the system is
2) autonomy: how autonomous the local databases are
3) heterogeneity

4.2.1 Autonomy, p. 82
0) Tight integration: there is a single-image of the entire database, and it is available to all users.

1) Semi-autonomous systems: each db can and does operate independently; but there is a federated
view also. The federated view may not have all the data from all the independent db’s.

2) Total-isolation: each operates fully independently. In this mode, transaction coordination has to be
done by the application, or by a transaction monitor (such as CICS or Microsoft’s transaction monitor).

4.2.2 Distribution, p. 84
0) no distribution
1) client/server distribution
2) peer-to-peer distribution, also called fully distributed. This is the main focus of this book.

4.2.3 Heterogeneity, p. 84
0) homogeneous
1) Multiple types of DBMS’s; multiple query tools; different transaction management protocols

[1] Principles of Distributed Database Systems, second edition, Ozsu and Valduriez

4.2.4 Architectural alternatives, p. 84
Not all the boxes in the chart are meaningful.

There are two boxes that are highlighted for our study:

A0, D2, H0 tight integration, peer-to-peer distributed, homogeneous
  “distributed DBMS”

A2, D2, H1 total-isolation, peer-to-peer distributed, heterogeneous.
 “distributed MDBS”

We will consider distributed DBMS A0,D2,H0.

As for A2,D2,H1, the problems of MDBS can be investigated without reference to the added problems of
distribution. Therefore, we will consider MDBS, and not distributed MDBS.
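
As a small illustration, the (A, D, H) codes can be written down as data. This is only a sketch: the class name Architecture and the label dictionaries are assumptions made for this example, not anything defined in the book. The numbering follows sections 4.2.1 to 4.2.3.

```python
from dataclasses import dataclass
from itertools import product

# Labels follow the numbering used in sections 4.2.1 to 4.2.3.
AUTONOMY = {0: "tight integration", 1: "semi-autonomous", 2: "total isolation"}
DISTRIBUTION = {0: "no distribution", 1: "client/server", 2: "peer-to-peer"}
HETEROGENEITY = {0: "homogeneous", 1: "heterogeneous"}


@dataclass(frozen=True)
class Architecture:
    """One box of the chart (hypothetical helper, not from the book)."""
    autonomy: int
    distribution: int
    heterogeneity: int

    def code(self) -> str:
        return f"A{self.autonomy}, D{self.distribution}, H{self.heterogeneity}"

    def describe(self) -> str:
        return ", ".join((
            AUTONOMY[self.autonomy],
            DISTRIBUTION[self.distribution],
            HETEROGENEITY[self.heterogeneity],
        ))


# Every combination of the three dimensions; not all of them are meaningful.
ALL_BOXES = [Architecture(a, d, h)
             for a, d, h in product(AUTONOMY, DISTRIBUTION, HETEROGENEITY)]

distributed_dbms = Architecture(0, 2, 0)
distributed_mdbs = Architecture(2, 2, 1)

print(distributed_dbms.code(), "->", distributed_dbms.describe())
print(distributed_mdbs.code(), "->", distributed_mdbs.describe())
```

The enumeration only lists the combinations; deciding which boxes are meaningful is still a judgment made from the chart.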

4.3 Distributed DBMS architecture, p. 87

4.3.1 Client/Server Systems, p. 88
(Ax, D1, Hy)
The functionality offered to clients is no different from what peer-to-peer offers.

The difference is the architectural paradigm.

The client DBMS passes SQL queries to the server without trying to understand or optimize them.

If there are multiple database servers, then the coordination between them can be done at the client, or
can be done by the “home server” for that client.

4.3.2 Peer-to-peer systems, p. 90
(A0, D2, H0)
Because physical data organization might be different on each machine, there needs to be a local
internal schema (LIS) on each machine.

The enterprise view is the global conceptual schema (GCS).

To handle fragmentation and replication, there also needs to be a local conceptual schema (LCS) on each
machine. The GCS is the union of the LCS’s.

User applications see and reference an external schema, which is above the GCS.

Figure 4.5, page 90 shows how these relate.

Figure 4.6, page 92 shows the various DBAs that participate in the management of the databases.

Figure 4.7, page 93 shows the various components that we will study. The user data processor at the
top has the semantic data controller, which will be discussed in chapter 6. The global query optimizer
and decomposer will be discussed in chapters 7 thru 9.

The data processor (at the bottom of the figure) has the local query optimizer which selects the access
path (touched on in chapter 9). It also has the local recovery manager (chap 12).

4.3.3 MDBS architecture, p. 94
(A2, Dx, Hy)

In distributed DBMSs, described above, the GCS is the union of the local databases.

In an MDBS, the GCS is the collection of only SOME of the local databases: the parts that each DBMS
wants to share.

Multi-DBMSs: mapping is from the local conceptual schemas to a global schema.
This is because in multi-DBMSs, design is bottom-up.

Logically integrated distributed DBMSs: mapping is from the global schema to the local conceptual
schemas. This is because in distributed DBMSs, design is top-down.

For MDBSs, there are two alternatives:
1) unilingual: users might have to use different tools at the local database versus global database.

2) multilingual: any tools at the local database also work on the global schema.

Also, it is worth noting that there is some controversy about whether a Multi-DBMS should have a global
schema at all. There are many implementations that do not. There is a large variety of techniques used
to connect applications to the various DBMSs in the multi-DBMS system. Most techniques are a variety
of middleware. See figure 4.10.

4.4 Global Directory Issues, p. 97
This section is only relevant for systems that have a global conceptual schema.

The global directory has to know about the location and makeup of fragments.

The global directory might be centralized, or itself distributed.
For multi-DBMSs that have a global directory, it is always the same as the MDBMS: centralized or
localized just as the individual DBMSs are.

Distributed global directories can not only be fragmented (distributed); the fragments can also be replicated.
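
To make the role of the global directory concrete, here is a minimal sketch of one as a lookup structure. The relation name EMP, the fragment names, the predicates, and the site names S1 to S3 are all made up for illustration. A real directory would hold far more (schemas, statistics, access rights).

```python
# Hypothetical directory contents: relation -> fragment -> definition and sites.
GLOBAL_DIRECTORY = {
    "EMP": {
        "EMP1": {"defined_by": "city = 'Paris'", "sites": ["S1", "S3"]},
        "EMP2": {"defined_by": "city = 'Tokyo'", "sites": ["S2"]},
    },
}


def locate(directory, relation):
    """Return {fragment name: list of sites holding a copy} for one relation."""
    return {name: list(info["sites"])
            for name, info in directory[relation].items()}


def replicated_fragments(directory, relation):
    """Names of the fragments that are stored at more than one site."""
    return sorted(name for name, sites in locate(directory, relation).items()
                  if len(sites) > 1)


print(locate(GLOBAL_DIRECTORY, "EMP"))
print(replicated_fragments(GLOBAL_DIRECTORY, "EMP"))
```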

4.5 Conclusion and Summary, p. 100
The rest of this book is mostly devoted to logically integrated distributed databases, not to
multidatabase systems. Chapter 15 deals with MDBMSs.

Chapter 5: Distributed Database Design, p. 102
We ignore distribution of DBMS software: the assumption is that there is a DBMS system co-located
with every fragment.
We also ignore distribution of applications.
We ignore the network design, because we assume that is in place and we have no control over it.

Therefore, we focus on the distribution of the data:
- Fragmentation, and
- Allocation.

There are three dimensions along which to analyze distributed data:
(see figure 5.1)
 - level of sharing
 - behavior of access patterns
 - level of knowledge on access pattern behavior.

In data sharing, a program can access data. In data+program sharing, a program can access another
program which can access the data.

On the access patterns, static systems no longer exist. All access patterns change over time.

On level of knowledge, designers usually have either partial or complete knowledge of (future) user
access patterns.

In section 5.1, we will introduce top-down and bottom-up design.
In section 5.2, we discuss issues.
In sections 5.3 and 5.4, we discuss top-down design.
Much later, in chapter 15, we discuss bottom-up.

5.1 Alternative Design Strategies, p. 104
In practice, top-down and bottom-up are used together.

5.1.1 Top-Down Design Process, p. 104
See figure 5.2
Rather than distributing whole relations, it is common to divide relations into fragments and then allocate.
So the process is fragmentation, then allocation.

Top-down works best when a database is being designed from scratch. However, most often individual
databases already exist.

5.1.2 Bottom-up design, page 106
Bottom-up works when there are existing databases. The starting point is the individual local conceptual
schemas.

When there are existing databases, they are usually in different types of DBMSs. Therefore this is the
heterogeneous MDBS problem, and discussion is deferred until chapter 15.

5.2 Distribution Design Issues, p. 107
- Why fragment at all? Section 5.2.1
- How to fragment? Section 5.2.2
- How much fragmentation? Section 5.2.3
- How to test correctness of our fragmentation plan? Section 5.2.4
- How to allocate the fragments? Section 5.2.5
- What do we need to know to fragment and allocate? Section 5.2.6

5.2.1 Reasons for fragmentation, p. 107
Fragmenting a database with relations (i.e., tables) as the unit makes no sense:
- distributed applications will want access to the same table, so you either
      replicate it everywhere needed (difficulty executing updates), or
      keep it central (high traffic for remote access).
Both are bad choices.

If a relation (table) is fragmented, this allows intraquery concurrency (see chapters 8 and 9).
      But applications that need to access more than one fragment of a table may suffer performance
      degradation.

Another problem with fragmenting a table is that dependency checking is more costly. This is covered
in Chapter 6 under Semantic Data Control.

5.2.2 Fragmentation Alternatives, page 108
Relation == table

You can divide horizontally, and vertically.

In real life, fragmentations can be nested, and may be of different types, resulting in hybrid
fragmentation.
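
The two ways of dividing a table, and a nested combination of them, can be sketched in a few lines. The EMP table, its columns, and the function names are assumptions made for this example only. Horizontal fragmentation is shown as a selection of rows, vertical fragmentation as a projection of columns that always keeps the key.

```python
# A tiny made-up relation: each row is a dict keyed by column name.
EMP = [
    {"eno": "E1", "name": "Lee", "title": "Engineer", "city": "Paris"},
    {"eno": "E2", "name": "Sato", "title": "Analyst", "city": "Tokyo"},
    {"eno": "E3", "name": "Diaz", "title": "Engineer", "city": "Paris"},
]


def horizontal_fragment(relation, predicate):
    """Keep whole rows that satisfy the predicate (a selection)."""
    return [dict(row) for row in relation if predicate(row)]


def vertical_fragment(relation, key, attributes):
    """Keep the listed columns of every row (a projection); the key is always kept."""
    columns = [key] + [a for a in attributes if a != key]
    return [{c: row[c] for c in columns} for row in relation]


# Horizontal: split the rows by city.
emp_paris = horizontal_fragment(EMP, lambda r: r["city"] == "Paris")
emp_other = horizontal_fragment(EMP, lambda r: r["city"] != "Paris")

# Vertical: split the columns, repeating the key eno in each fragment.
emp_names = vertical_fragment(EMP, "eno", ["name"])
emp_jobs = vertical_fragment(EMP, "eno", ["title", "city"])

# Nested: a vertical fragment of a horizontal fragment.
paris_names = vertical_fragment(emp_paris, "eno", ["name"])
print(paris_names)
```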

5.2.3 Degree of Fragmentation, page 110

What’s the right amount? Answered in section 5.3.

5.2.4 Correctness Rules of Fragmentation, page 110
These rules must not be broken by fragmentation:
1) Completeness: Each data item that can be found in relation R can also be found in one or more of
its fragments Ri. This is also called “lossless decomposition”.

2) Reconstruction: You must be able to reconstruct the original relation R from its fragments.

3) Disjointness: For horizontal decomposition, no data item is in more than one fragment.
For vertical decomposition, the same rule applies, except that primary key values are repeated in each
fragment.
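
For horizontal fragments, the three rules can be checked mechanically. The sketch below assumes rows are hashable tuples and that reconstruction is done by set union. The function names and sample data are made up for this example. Vertical fragments would need a join on the key instead of a union, which is not shown here.

```python
def union_of(fragments):
    """Set union of the rows of all fragments."""
    rows = set()
    for fragment in fragments:
        rows |= set(fragment)
    return rows


def is_complete(relation, fragments):
    """Rule 1: every row of the relation appears in at least one fragment."""
    return set(relation) <= union_of(fragments)


def reconstructs(relation, fragments):
    """Rule 2: the union of the fragments gives back exactly the relation."""
    return union_of(fragments) == set(relation)


def is_disjoint(fragments):
    """Rule 3: no row appears in more than one fragment."""
    seen = set()
    for fragment in fragments:
        rows = set(fragment)
        if seen & rows:
            return False
        seen |= rows
    return True


# Made-up relation and three candidate fragmentations of it.
R = [("E1", "Paris"), ("E2", "Tokyo"), ("E3", "Paris")]
good = [[("E1", "Paris"), ("E3", "Paris")], [("E2", "Tokyo")]]
lossy = [[("E1", "Paris")], [("E2", "Tokyo")]]  # E3 is lost
overlapping = [[("E1", "Paris"), ("E3", "Paris")],
               [("E2", "Tokyo"), ("E3", "Paris")]]  # E3 appears twice

for name, frags in [("good", good), ("lossy", lossy), ("overlapping", overlapping)]:
    print(name, is_complete(R, frags), reconstructs(R, frags), is_disjoint(frags))
```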

5.2.5 Allocation Alternatives, page 111
If there is no replication of fragments, then the database is called a “partitioned database”.

Reasons to replicate are:
- reliability, and
- efficiency of read-only queries.

The allocation algorithm must consider the ratio of updates (update/insert/delete) to read-only
queries.

A database may be:
Not replicated (simply partitioned), or
Fully replicated (every fragment is at every site), or
Partially replicated (every fragment may be at multiple sites, but not necessarily at every site).
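
The three cases can be told apart from the allocation alone. In this sketch an allocation is assumed to be a mapping from fragment name to the set of sites holding a copy. The function name, fragment names, and site names are made up for illustration.

```python
def classify_allocation(allocation, sites):
    """Classify {fragment: sites holding a copy} against the full list of sites."""
    all_sites = set(sites)
    copies = [set(where) for where in allocation.values()]
    if not copies or any(not c or not c <= all_sites for c in copies):
        raise ValueError("every fragment needs at least one known site")
    if all(len(c) == 1 for c in copies):
        return "partitioned"            # no fragment is replicated
    if all(c == all_sites for c in copies):
        return "fully replicated"       # every fragment is at every site
    return "partially replicated"       # anything in between


SITES = ["S1", "S2", "S3"]
print(classify_allocation({"EMP1": {"S1"}, "EMP2": {"S2"}}, SITES))
print(classify_allocation({"EMP1": set(SITES), "EMP2": set(SITES)}, SITES))
print(classify_allocation({"EMP1": {"S1", "S3"}, "EMP2": {"S2"}}, SITES))
```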

5.2.6 Information Requirements, page 111
For fragmentation:
- database information, and
- application information.

For allocation:
- communication network information, and
- computer system information.
