; File Structures and Indexing
Learning Center
Plans & pricing Sign in
Sign Out
Your Federal Quarterly Tax Payments are due April 15th Get Help Now >>

File Structures and Indexing


  • pg 1
									Memory Hierarchy and File Structures

 Physical Storage
 Buffer Management
 Organization of Records in Pages
 Heap File
 Sequential File
 Clustered Files
 Data Dictionary

                          Storage Hierarchy

Typical storage hierarchy:
    Main memory (RAM) for currently used data.
    Disk for the main database (secondary storage).
    Tapes for archiving older versions of the data (tertiary storage).
Why Not Store Everything in Main Memory?
Costs too much.
Main memory is volatile. We want data to be saved between runs.

Secondary storage device of choice.

Main advantage over tapes: random access vs. sequential.

Data is stored and retrieved in units called disk blocks or pages (typically a
   block is 1K-16Kbytes)

This has major implications for DBMS design!
    READ: transfer data from disk to main memory (RAM).
    WRITE: transfer data from RAM to disk.
    Both are high-cost operations, relative to in-memory operations, so must be
       planned carefully!

Unlike RAM, time to retrieve a disk page varies depending upon location
   on disk.
    Therefore, relative placement of pages on disk has major impact on DBMS

                    Components of a Disk
                                   Disk head

v   The platters spin.
v The arm assembly is                                            Sector

moved in or out to position
a head on a desired track.
Tracks under heads make
a cylinder (imaginary!).                                    Platters
                                      Arm movement
v Only one head
reads/writes at any
one time.
                          Arm assembly
vBlock size is a multiple
of sector size (which is fixed).

                Accessing a Disk Page

Time to access (read/write) a disk block:
    seek time (moving arms to position disk head on track)
    rotational delay (waiting for block to rotate under head)
    transfer time (actually moving data to/from disk surface)
Seek time and rotational delay dominate.
    Seek time varies from about 1 to 20msec
    Rotational delay varies from 0 to 10msec
    Transfer rate is about 1msec per 4KB page
Sequential I/O is much faster than random.
Key to lower I/O cost: reduce seek/rotation delays! Hardware vs.
  software solutions?

                      Storage Access

A database file is partitioned into fixed-length storage units called
   blocks (or pages). Blocks are units of both storage allocation
   and data transfer.
Database system seeks to minimize the number of block transfers
  between the disk and memory. We can reduce the number of
  disk accesses by keeping as many blocks as possible in main
Buffer – portion of main memory available to store copies of disk
Buffer manager – subsystem responsible for allocating buffer
  space in main memory.

                         Buffer Manager
Programs call on the buffer manager when they need a block from
   If the block is already in the buffer, the requesting program is given the
        address of the block in main memory
   If the block is not in the buffer,
        the buffer manager allocates space in the buffer for the block,
           replacing (throwing out) some other block, if required, to make
           space for the new block.
       The block that is thrown out is written back to disk only if it was
          modified since the most recent time that it was written to/fetched
          from the disk.
       Once space is allocated in the buffer, the buffer manager reads the
         block from the disk to the buffer, and passes the address of the
         block in main memory to requester.

           Buffer-Replacement Policies
                                                    relation r   relation s
Most operating systems replace the block least
                                                       1            1
recently used (LRU strategy)
LRU can be a bad strategy for certain access           2            2
patterns involving repeated scans of data, e.g.
when computing the join of 2 relations r and s by
a nested loops
Best for this example: most recently used (MRU)        n
strategy – replace the most recently used block.                   10

The DBMS usually has its own buffer manager
that uses statistical information regarding the
probability that a request will reference a
particular relation

                       File Organization

The database is stored as a collection of files. Each file is a
  sequence of records. A record is a sequence of fields.
Most common approach:
    assume record size is fixed
    each file has records of one particular type only
    different files are used for different relations
    This case is easiest to implement; will consider variable length records

                       Fixed-Length Records
Simple approach:
    In each page/block, store record i starting from byte n  (i – 1), where n
       is the size of each record.
    Record access is simple but records may cross blocks. Normally, do
      not allow records to cross block boundaries (there is some empty
      space at the end of the page)
Deletion of record I:
    Shift up subsequent records
    i + 1, . . ., n
Moving records inside a page
    not good when records are
    pointed by:
    1] other records (foreign keys)
    2] index entries

          Fixed-Length Records - Free Lists
Do not move records in page. Store the address of the first deleted record in the file
Use this first record to store the address of the second deleted record, and so on
Can think of these stored addresses as pointers since they ―point‖ to the location of
  a record.
More space efficient representation: reuse space for normal attributes of free
  records to store pointers. (No pointers stored in in-use records.)

                   Variable-Length Records
                  Byte String Representation
Variable-length records arise in database systems in several ways:
    Storage of multiple record types in a file.
    Record types that allow variable lengths for one or more fields.
    Record types that allow repeating fields (used in some older data
Simple (but bad) solution: Byte string representation
    Attach an end-of-record () control character to the end of each record
    Difficulty with deletion (fragmentation of free space)
    Difficulty with growth (movement of records is difficult).

           Variable-Length Records
               Reserved space

Reserved space – can use fixed-length records of a known
  maximum length; unused space in shorter records filled
  with a null or end-of-record symbol.

             Variable-Length Records
                  Pointer Method
Useful for certain types of records with repeating
  attributes. Two kinds of block in file:
    Anchor block – contains the first records of chain
    Overflow block – contains records other than those that are
      the first records of chairs.

             Variable-Length Records
              Slotted Page Structure

Slotted page header contains:
    number of record entries
    end of free space in the block
    location and size of each record
Records can be moved around within a page to keep them
  contiguous with no empty space between them; entry in the
  header must be updated.
Pointers do not point directly to record — instead they point to the
   entry for the record in header.

       Organization of Records in Files

Heap – a record can be placed anywhere in the file where there is
Sequential – store records in sequential order, based on the value
  of the search key of each record
Hashing – a hash function computed on some attribute of each
  record; the result specifies in which block of the file the record
  should be placed

              Unordered (Heap) Files

Simplest file structure contains records in no particular order.
As file grows and shrinks, disk pages are allocated and de-
To support record level operations, we must:
    keep track of the pages in a file
    keep track of free space in pages
    keep track of the records in a page
There are many alternatives for keeping track of this.

      Heap File Using a Page Directory
              Header                           Page 1
                                               Page 2

                       DIRECTORY               Page N

The entry for a page can include the number of free bytes on the
The directory is a collection of pages; linked list implementation is
  just one alternative.
    Much smaller than linked list of all HF pages!

     Sequential File Organization
Suitable for applications that require sequential
   processing of the entire file
The records in the file are ordered by a search-key (not
   the same concept a key in the relational model)

     Sequential File Organization (Cont.)
Deletion – use pointer chains
Insertion –locate the position where the record is to be inserted
    if there is free space insert there
    if no free space, insert the record in an overflow block
    In either case, pointer chain must be updated
Need to reorganize the file
   from time to time to restore
   sequential order

         Hashing as a file organization

Assume 100,000 employee records – we can put 100 records per
  page. Therefore, we need 1,000 pages.
Lets say that we want to organize the file so that we can efficiently
   answer equality selections on salary e.g., "find all employee
   records whose salary is 15,000".
We allocate 1,200 buckets (pages) so there is some space for
  future insertions.
The hash function will have the form h(salary)=(a*salary+b) modulo

         Hashing as a file organization (cont)

         bucket 1

         bucket 2
                                                      hash function

     overflow page
                      bucket for salary 15,000

        bucket 1200

When we insert a new record we compute the hash function of the salary and insert
  the record in the appropriate bucket. If the bucket is full we create an overflow
  page/bucket and insert the new record there.

    Hashing as a file organization (cont)

This organization can efficiently answer queries of the form: "find all
    employees whose salary is 15,000".
In order to answer this query we compute the hash value of 15,000
    and then search only in the corresponding bucket.
A bucket may contain records of employees with different salaries
    that produce the same hash value (e.g., employees with salary
    30,000). This is not a problem because since we read the page,
    we can check all its records and select only the ones that satisfy
    the query condition.
If there are no overflow buckets, answering the query requires just a
    single read.
If there are overflow buckets, we have to read all of them.
Hashing is not good for range search ("find all employees with
    salaries between 15,000 and 16,000") because the records of
    these employees maybe distributed in totally different buckets.

                   Simplistic Analysis
We ignore CPU costs, for simplicity:
    B: Is the number of data pages in the file
    Measuring number of page I/O’s ignores gains of pre-fetching blocks of
      pages; thus, even I/O cost is only approximated.
    Average-case analysis; based on several simplistic assumptions:
Single record insert and delete.
Heap Files:
    Equality selection on key; exactly one match.
    Insert always at end of file.
Sorted Files:
    Files compacted after deletions.
    Selections on sort field(s).
Hashed Files:
    No overflow buckets, 80% page occupancy.

                Cost of Operations

                    Heap          Sorted         Hashed
                    File          File           File
Scan all recs       B             B              1.25 B
Equality Search     0.5 B         log2B          1
Range Search        B             log2B + # of 1.25 B
                                  pages with
Insert              2             Search + B 2
Delete             Search + 1 Search + B 2
 * Several assumptions underlie these (rough) estimates!

              Data Dictionary Storage
Data dictionary (also called system catalog) stores metadata:
that is, data about data, such as
  Information about relations
      names of relations
      names and types of attributes of each relation
      names and definitions of views
      integrity constraints
  User and accounting information, including passwords
  Statistical and descriptive data
      number of tuples in each relation
  Physical file organization information
      How relation is stored (sequential/hash/…)
      Physical location of relation
          operating system file name or
          disk addresses of blocks containing records of the relation
  Information about indices

        Data Dictionary Storage (Cont.)

Catalog structure: can use either
    specialized data structures designed for efficient access
    a set of relations, with existing system features used to ensure efficient
    The latter alternative is usually preferred
A possible catalog representation:
  Relation-metadata = (relation-name, number-of-attributes,
                        storage-organization, location)
  Attribute-metadata = (attribute-name, relation-name, domain-type,
                               position, length)
  User-metadata = (user-name, encrypted-password, group)
  Index-metadata = (index-name, relation-name, index-type,
  View-metadata = (view-name, definition)


To top