					                 External Hashing
 • We have seen how hashing algorithms can be applied in main
   memory. To extend these algorithms to disk storage for database
   indexes, we should note that:
     – Internal hashing - The hash function returns a memory
       address for a particular array location.
     – External hashing - The hash function returns a bucket
       address (a block on disk). The bucket stores the records, or
       for an index the block addresses of the records, that map to
       that bucket.
 • Hashing algorithms that work especially well for secondary
   storage are bucket-based algorithms, because each bucket can map
   to a disk block.
     – However, we still must be able to handle overflow within a
       block. This can be done by chaining overflow blocks to the
       bucket.
 • A hash file has relative bucket numbers 0 through N-1.
 • Logical bucket numbers are mapped to physical disk block addresses.
 • Disk blocks are buckets that hold several data records each.
   Example: External Hash Table
     – 5 buckets
     – 2 records per bucket
     – use overflow blocks
     – f(x) = x % 5
   Example: Insertion
 • Insert: 1,5,3,6,4,24
   Example: Insertion with Overflow
 • Insert: 11
   Example: Deletion
 • Delete: 4
   Example: Deletion with Overflow
   Delete: 6
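
The example above can be replayed with a small in-memory model. This is only a sketch under stated assumptions: the class name StaticHashFile and its methods are hypothetical, each bucket is modelled as a chain of blocks (the first block is the primary block, the rest are overflow blocks), and deletion repacks the chain so that an emptied overflow block is released. The original slides do not say which deletion policy they use, so the repacking step is one possible choice.

```python
class StaticHashFile:
    """In-memory model of a static external hash file (illustrative only)."""

    def __init__(self, num_buckets=5, records_per_block=2):
        self.n = num_buckets
        self.capacity = records_per_block
        # Each bucket is a chain of blocks: block 0 is the primary block,
        # any further blocks are overflow blocks.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _hash(self, key):
        return key % self.n  # f(x) = x % 5 in the example

    def insert(self, key):
        chain = self.buckets[self._hash(key)]
        for block in chain:
            if len(block) < self.capacity:
                block.append(key)
                return
        chain.append([key])  # every block is full: chain a new overflow block

    def delete(self, key):
        chain = self.buckets[self._hash(key)]
        records = [k for block in chain for k in block]
        if key not in records:
            return False
        records.remove(key)
        # Assumed policy: repack the remaining records so that empty
        # overflow blocks disappear from the chain.
        step = self.capacity
        chain[:] = [records[i:i + step] for i in range(0, len(records), step)] or [[]]
        return True


hash_file = StaticHashFile()
for key in [1, 5, 3, 6, 4, 24]:
    hash_file.insert(key)
hash_file.insert(11)   # bucket 1 already holds 1 and 6, so 11 overflows
hash_file.delete(4)    # plain deletion from bucket 4
hash_file.delete(6)    # deletion from a bucket that has an overflow block
```

With this policy, record 11 moves from the overflow block into the primary block of bucket 1 once 6 is deleted.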
                   Dynamic Hashing
 • In static hashing, the hash function maps keys to a fixed
   set of bucket addresses.
     – If the initial number of buckets is too small, performance
        will degrade due to too many overflows. If the file is
        made larger to accommodate future needs, a
        significant amount of space will be wasted initially.
 • These problems can be avoided by using techniques that
   allow the number of buckets to be modified dynamically.
 • In dynamic hashing, the hash file can grow and shrink
   “on the fly” in response to the size of the data.
 • We will look at two different techniques:
     – Extendible hashing
     – Linear hashing
                       Extendible Hashing
Basic Idea:
 • Maintain a directory of bucket addresses instead of hashing to
   buckets directly (indirection).
 • The directory can grow, but its size is always a power of 2.
 • At any time the directory has a depth d, and a directory of depth
   d has 2^d bucket pointers.
     – However, not every directory entry (bucket pointer) has to point to a
        unique bucket. More than one directory entry can point to the same
        bucket.
     – Each bucket has a local depth d’ (d’ <= d) that indicates how many of
        the d bits of the hash value are actually used to determine
        membership in the bucket.
 • The depth d of the directory is the # of bits we use from each
   hash value.
     – The hash function produces an output integer which can be treated
        as a sequence of k bits. We use the first d bits of the hash value
        to look up an entry in the directory. The directory entry
        points us to the block that contains records whose keys hash to that
        value.
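
The directory lookup described above amounts to keeping the d most significant bits of a k-bit hash value. A minimal sketch, assuming the hash value is already available as a non-negative integer smaller than 2^k (the function name directory_index is hypothetical):

```python
def directory_index(hash_value: int, k: int, d: int) -> int:
    """Return the directory slot selected by the first d bits of a k-bit hash value."""
    if not 0 <= d <= k:
        raise ValueError("d must be between 0 and k")
    # Shifting right by k - d discards the low-order bits and keeps the first d bits.
    return hash_value >> (k - d)


# With four-bit hash values and a directory of depth 2,
# the value 1001 is looked up in directory entry 10.
slot = directory_index(0b1001, k=4, d=2)
```

A directory of depth d then has exactly 2^d slots, numbered 0 through 2^d - 1.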
Example:
 • Assume each hashed key is a sequence of four binary digits.
 • Store values 0001, 1001, 1100.
    Extendible Hashing: Insertion
 • Insert 0100.
 • Insert 1111.
     – Directory grows one level.
 • Insert 1101.
     – Directory grows one level.
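
The insertion steps above can be sketched in code. The figures are missing from this copy, so two parameters are assumptions: buckets hold 2 records each, and the directory starts at depth 1. With those values the sketch reproduces the two directory doublings listed above. The names ExtendibleHashFile and Bucket are hypothetical, and deletion with merging is not covered.

```python
K = 4  # number of bits in each hashed key, as in the example


class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHashFile:
    def __init__(self, capacity=2):
        self.capacity = capacity   # assumed records per bucket
        self.depth = 1             # assumed initial directory depth d
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        return key >> (K - self.depth)  # first d bits of the key

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.local_depth == K:
            raise OverflowError("all bits used; overflow blocks would be needed")
        if bucket.local_depth == self.depth:
            # Double the directory: old entry i becomes entries 2i and 2i + 1.
            self.directory = [b for b in self.directory for _ in range(2)]
            self.depth += 1
        self._split(bucket)
        self.insert(key)  # retry now that the full bucket has been split

    def _split(self, bucket):
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        keys, bucket.keys = bucket.keys, []
        key_shift = K - bucket.local_depth
        for k in keys:
            # The newly used bit decides which of the two buckets gets the key.
            (sibling if (k >> key_shift) & 1 else bucket).keys.append(k)
        dir_shift = self.depth - bucket.local_depth
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> dir_shift) & 1:
                self.directory[i] = sibling


table = ExtendibleHashFile()
for key in [0b0001, 0b1001, 0b1100, 0b0100, 0b1111, 0b1101]:
    table.insert(key)
```

After these six inserts the directory has depth 3, and the keys beginning with 0 still share a single bucket of local depth 1, which is why several directory entries point to the same bucket.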
    Extendible Hashing: Deletion
 • Delete 1111.
     – Merge blocks and shrink directory.
                   Linear Hashing
 • Linear hashing allows a hash file to expand and shrink
   dynamically without the need for a directory.
     – Thus, the directory issues of extendible hashing are not
       present.
 • A linear hash table starts with 2^d buckets, where d is the # of bits
   used from the hash value to determine bucket membership.
     – The table grows gradually, one bucket at a time, rather than
       doubling in size.
 • The growth of the hash table can be triggered either:
     1) Every time there is a bucket overflow.
     2) When the load factor of the hash table reaches a given point.
 • We will examine the second growth method.
     – Since overflows do not always trigger growth, note that
       each bucket may use overflow blocks.
               Linear Hashing: Load Factor
 • The load factor lf of the hash table is the number of records stored
   divided by the number of possible storage locations.
     – The initial number of blocks n is a power of 2.
     – As the table grows, n may not always be a power of 2.
     – The number of storage locations is s = #blocks x #records/block.
     – The number of records in the table r is initially 0 and increases
       as records are added.
     – Load factor lf = r / s = r / (n x #records/block)
 • We will use the load factor to determine when to grow or shrink the
   hash table.
 • Common parameters are:
     – grow when lf >= 90% and shrink when lf <= 75%.
 • We will use the value of 85% as the threshold for growth.
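
The load factor computation can be written out directly. This is a minimal sketch: the function names load_factor and should_grow are hypothetical, and treating the 85% threshold as inclusive (lf >= 85%) is an assumption modelled on the ">=" in the common parameters above.

```python
def load_factor(r: int, n: int, records_per_block: int) -> float:
    """lf = r / s, where s = n * records_per_block storage locations."""
    return r / (n * records_per_block)


def should_grow(lf: float, threshold: float = 0.85) -> bool:
    """Growth is triggered once the load factor reaches the threshold."""
    return lf >= threshold


# 3 records in 2 blocks of 2 records each: 3 / 4 = 75%, so no growth yet.
lf = load_factor(r=3, n=2, records_per_block=2)
grow = should_grow(lf)
```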
      Linear Hashing: Example
 • Assume each hashed key is a sequence of four binary digits.
 • Store values 0000, 1010, 1111.
        Linear Hashing: Insertion
 • Insert 0101.
     – Added new bucket 10.
     – Divide the records of bucket 00 between buckets 00 and 10.
 • Insert 0001.
     – Use overflow block. (May sort records later.)
 • Insert 0111.
     – Create bucket 11.
     – Split records between 01 and 11.
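
The linear hashing example can be replayed with a short sketch. Several details are assumptions, because the figures are missing from this copy: blocks hold 2 records, the table starts with 2 buckets, bucket membership uses the low-order bits of the key (which matches the bucket numbers in the steps above), and growth is triggered when lf >= 85%. The class name LinearHashFile is hypothetical. A bucket list longer than the block capacity stands for a primary block plus overflow blocks.

```python
class LinearHashFile:
    """In-memory model of linear hashing with load-factor growth (illustrative)."""

    def __init__(self, records_per_block=2, threshold=0.85):
        self.capacity = records_per_block
        self.threshold = threshold
        self.bits = 1              # number of low-order bits currently used
        self.buckets = [[], []]    # starts with 2^1 buckets
        self.r = 0                 # number of records stored

    def _address(self, key):
        m = key & ((1 << self.bits) - 1)   # last `bits` bits of the key
        if m >= len(self.buckets):
            # That bucket does not exist yet: drop the leading bit.
            m -= 1 << (self.bits - 1)
        return m

    def load_factor(self):
        return self.r / (len(self.buckets) * self.capacity)

    def insert(self, key):
        # Records beyond `capacity` in a bucket model its overflow blocks.
        self.buckets[self._address(key)].append(key)
        self.r += 1
        if self.load_factor() >= self.threshold:
            self._grow()

    def _grow(self):
        new = len(self.buckets)            # number of the bucket being added
        if new >= (1 << self.bits):
            self.bits += 1                 # one more bit is needed to address it
        old = new - (1 << (self.bits - 1)) # bucket whose records are divided
        self.buckets.append([])
        records, self.buckets[old] = self.buckets[old], []
        for k in records:
            self.buckets[self._address(k)].append(k)


table = LinearHashFile()
for key in [0b0000, 0b1010, 0b1111]:
    table.insert(key)            # load factor reaches 3/4, below the threshold
for key in [0b0101, 0b0001, 0b0111]:
    table.insert(key)            # buckets 10 and 11 are added along the way
```

Under these assumptions the table ends with four buckets: 00 holds 0000, 01 holds 0101 and 0001, 10 holds 1010, and 11 holds 1111 and 0111.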

				