Document Sample
Database Powered By Docstoc
					IVE – Databaase                                                Chapter 8 – Physical Database Design

Chapter 8 – Physical Database Design

1. Topics

   Physical Design Considerations
   Performance Issues, Indexing
   File Organizations

2. Physical Design Considerations

    a) Physical Design
    It is a stage of database design that maps the data storage and access characteristics
    of the database. It produces a description of the implementation of the database on
    secondary storage. It also describes the storage structures and access methods used
    to achieve efficient access to the data.

    The logical phase of database design is DBMS-independent (although it is
    developed in according with the particular data model, such as relational,
    hierarchical or network). In developing physical database design, we must first
    identify the target database system. Therefore, physical design is tailored to a
    specific DBMS system. There is feedback between physical and logical design,
    because decisions are taken during physical design for improving performance that
    may affect the structure of the logical data model.

    b) Simplified Diagram on Physical and Logical Design

              External Schema

                                                            Logical /conceptual database

              Conceptual Schema

              Internal Schema                                   Physical database design

              Physical Storage

    For relational model, the main aim of physical database design involves:
        derive a set of relational tables and the constraints on these tables from the
         information presented in the global logical data model

File: DB:\CHAP08.DOC                                                                      Page: 1
IVE – Databaase                                                Chapter 8 – Physical Database Design

           identify the specific storage structures and access methods for the data to
            achieve an optimum performance for the database system
           designing security protection for the system.

    c) Considerations for the physical database design:
       Response time
         Elapsed time between submitting a database transaction for execution and
             receiving a response.
         A major influence on response time under the DBMS control is the
             database access time for data items referenced by the transaction.
         Also influenced by factors not under DBMS control: system load,
             operating system scheduling, or communication delays

           Space utilization
             Amount of storage space used by the database files and their access path
                structures on disk, including indexes and other access paths.

           Transaction throughput
             This is the average number of transactions that can be processed per
             It is a critical parameter of transaction systems such as those used for
                airline reservations or banking
             Transaction throughput must be measured under peak conditions on the

    d) Database Tuning
       After a database is deployed and is in operation, actual use of the applications,
       transactions, queries and views reveals factors and problem area that may not
       have been accounted for during the physical design. The inputs to physical
       design will need to be reviewed and revised as necessary. The goals of tuning
           To make application run faster
           To lower the response time of queries / transactions
           To improve the overall throughput of transactions

              Through the statistics of the actual database transactions, we can tune the
              database by dealing with the following:
             how to avoid excessive lock contention, thereby increasing concurrency
              among transactions
             how to minimize overhead of logging and unnecessary dumping of data
             how to optimize buffer size and scheduling of processes
             how to allocate resources such as disks, RAM, and processes for most
              efficient utilization

2. Performance Issues, Indexing

a) Performance Issues
      The goal is to come up an appropriate structuring of data in storage that
         guarantees good performance.
      For a given conceptual schema, there are many physical design alternatives

File: DB:\CHAP08.DOC                                                                      Page: 2
IVE – Databaase                                                Chapter 8 – Physical Database Design

            Not possible to make meaningful physical design decisions and
             performances analyses until we know the queries, transactions, and
             applications that are expected to run on the database

b) Factors considering database performance:

      I. Analyse the database queries and transactions
         a. For each query, we should specify :
               The files that will be accessed by the query
               The attributes on which any selection conditions for the query are
               The attributes on which any join conditions or conditions to link
                   multiple tables or objects for the query are specified
               The attributes whose values will be retrieved by the query
         b. for updated transaction or operation:
               the files that will be updated
               the type of operation on each file (insert / update / delete)
               the attributes on which selection conditions for a delete or update are
               the attributes whose values will be changed by an update operation

      II. Analyzing the expected frequency of invocation of queries and
         a. Consider the expected rates of invocations of the queries and transactions
         b. This frequency information together with the attribute information
             collected on each query and transaction is used to compile a cumulative list
             of expected frequency of use for all queries and transactions
         c. Use “80-20” rule which means that normally 80% of process is accounted
             for by only 20% of queries and transactions. (ie we need to determine
             those queries or transactions that have rates of 20% or above)

      III. Analyzing the Time Constraints of Queries and Transactions
          a. Some queries and transactions may have stringent performance
          b. For example, a transaction need to complete within 5 seconds on 95% of
              the occasions, and should never take more than 20 seconds, such as on line
              sales bookings etc.
          c. Such performance constraints become higher priority candidates for
              primary access structures

      IV. Analyse the expected frequencies of update operations
        a. A minimum number of access path should be specified for a file that is
            updated frequently, because updating the access path themselves slows
            down the update operation.

      V. Analyse the uniqueness constraints on attributes
        a. Access paths should be specified on all candidate key attributes – or sets of
            attributes – that are either primary key or constrained to be unique.
        b. The existence of an index makes it sufficient to search only the index when
            checking this constraint, since all values of the attributes will exist in the
            leaf nodes of the index.

File: DB:\CHAP08.DOC                                                                      Page: 3
IVE – Databaase                                                    Chapter 8 – Physical Database Design

c) Design Decisions on Indexing
   The performance of queries largely depends upon what indexes or hashing schemes
   exists to expedite the processing of selections and joins. On the other hand, during
   insert, delete or update operations, existence of indexes adds to the overhead. This
   overhead must be justified in terms of the gain in efficiency by expediting queries
   and transactions.

    I. Whether to index an attribute
       a. The attribute must be a key or there must be some query that uses attribute
          either in a selection condition or in a join. One factor in favor of setting up
          many indexes is that some queries can be processed by just scanning the
          indexes without retrieving any data

    II. What attribute or attributes to index on
        a. An index can be constructed on one or multiple attributes
        b. If multiple attributes from one relation are involved together in several
           queries, a multiattribute index is warranted.

    III. whether to set up a clustered index
         a. at most on indx per table can be a primary or clustering index, because this
            implies that the file is physically ordered on that attribute (if the attribute is a
            key, a primary index is created, whereas a clustering index is created if the
            attribute is not a key) If a table requires several indexes, the decision about
            which one should be clusted index depends upon whether keeping the table
            ordered on that attribute is needed. Range queries benefit a great deal fromj
            clustering. If several attribute requires range queries, relative benefits must
            be evaluated before deciding which attribute to cluster on.

    IV. Whether to use a hash index over a tree index
        a. In general RDBMSs use B+-trees for indexing.
        b. Some sytems also provide ISAM and hash indexes
        c. B+-trees support both equality and range queries on the attribute used as the
           search key
        d. Hash indexes work well with equality conditions, particularly during joins
           to find a matching record(s).

    V. whether to use dynamic hashing for the file
       a. for files that are very volatile (those that grow and shrink continuously).

    VI. Denormalisation to speed up queries
        a. The ultimate goal during normalization was to separate the logically related
           attributes into tables to minimize redundancy, and thereby avoid update
        b. The above ideals are sometimes sacrificed in favour of faster execution of
           frequently occurring queries and transactions. This process of storing the
           logical database design (BCNF form) in a weaker form, say 2NF or 1NF is
           called denormalization

File: DB:\CHAP08.DOC                                                                          Page: 4