External Hashing

We have seen how hashing algorithms can be applied in main memory. To extend these algorithms to disk storage for database indexes, we should note that:
– Internal hashing: the hash function returns a memory address, that is, a particular array location.
– External hashing: the hash function returns a bucket address (a block on disk) that stores the block addresses of all records that map to that bucket.

Hashing algorithms that work especially well for secondary storage are bucket-based algorithms, because each bucket can map to a disk block.
– However, we still must be able to handle overflow within a block. This can be done by using chaining.

The hash file has relative bucket numbers 0 through N-1. Logical bucket numbers are mapped to physical disk block addresses. Disk blocks are buckets that hold several data records each.

Example: External Hash Table
– 5 buckets
– 2 records per bucket
– use overflow blocks
– f(x) = x % 5

Example: Insertion
Insert: 1, 5, 3, 6, 4, 24

Example: Insertion with Overflow
Insert: 11

Example: Deletion
Delete: 4

Example: Deletion with Overflow
Delete: 6

Dynamic Hashing

In static hashing, the hash function maps keys to a fixed set of bucket addresses.
– If the initial number of buckets is too small, performance will degrade due to too many overflows.
– If the file size is made larger to accommodate future needs, a significant amount of space will be wasted initially.

These problems can be avoided by using techniques that allow the number of buckets to be modified dynamically. In dynamic or extendible hashing, the hash file size can grow and shrink “on the fly” in response to the size of the data.

We will look at two different techniques:
– Extendible hashing
– Linear hashing

Extendible Hashing

Basic idea: maintain a directory of bucket addresses instead of just hashing to buckets directly.
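For contrast with the directory idea just introduced, the static scheme from the earlier example (5 buckets, 2 records per bucket, f(x) = x % 5, chained overflow blocks) can be sketched in Python. This is only an illustrative sketch: the class name `StaticHashFile`, the in-memory lists standing in for disk blocks, and the choice to repack a chain after a deletion are assumptions, not something the slides specify.

```python
class StaticHashFile:
    """Static external hash file: a fixed number of buckets, each a chain of blocks."""

    def __init__(self, num_buckets=5, block_capacity=2):
        self.num_buckets = num_buckets
        self.block_capacity = block_capacity
        # buckets[b] is a chain of blocks: block 0 is the primary block,
        # any further blocks are overflow blocks chained behind it.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _chain(self, key):
        return self.buckets[key % self.num_buckets]  # f(x) = x % 5 by default

    def insert(self, key):
        chain = self._chain(key)
        for block in chain:
            if len(block) < self.block_capacity:
                block.append(key)
                return
        chain.append([key])  # every block is full: chain a new overflow block

    def delete(self, key):
        chain = self._chain(key)
        records = [k for block in chain for k in block]
        if key not in records:
            return False
        records.remove(key)
        # Assumed policy: repack the chain so overflow records move forward
        # and empty overflow blocks are released.
        cap = self.block_capacity
        chain[:] = [records[i:i + cap] for i in range(0, len(records), cap)] or [[]]
        return True


table = StaticHashFile()
for key in (1, 5, 3, 6, 4, 24, 11):
    table.insert(key)
table.delete(4)
table.delete(6)
for number, chain in enumerate(table.buckets):
    print(number, chain)
```

Inserting 11 is the step that forces an overflow block, because bucket 1 already holds 1 and 6. The number of buckets never changes, which is the limitation that dynamic hashing addresses.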
(The directory adds a level of indirection.)

The directory can grow, but its size is always a power of 2. At any time, the directory consists of d levels, and a directory of depth d has 2^d bucket pointers.
– However, not every directory entry (bucket pointer) has to point to a unique bucket. More than one directory entry can point to the same bucket.
– Each bucket has a local depth d' that indicates how many of the d bits of the hash value are actually used to indicate membership in the bucket.

The depth d of the directory is based on the number of bits we use from each hash value.
– The hash function produces an output integer which can be treated as a sequence of k bits. We use the first d bits of the hash value to look up an entry in the directory. The directory entry points us to the block that contains records whose keys hash to that value.

Extendible Hashing: Example
Assume each hashed key is a sequence of four binary digits. Store values 0001, 1001, 1100.

Extendible Hashing: Insertion
– Insert 0100.
– Insert 1111. The directory grows one level.
– Insert 1101. The directory grows one level.

Extendible Hashing: Deletion
– Delete 1111. Merge blocks and shrink the directory.

Linear Hashing

Linear hashing allows a hash file to expand and shrink dynamically without the need for a directory.
– Thus, the directory issues of extendible hashing are not present.

A linear hash table starts with 2^d buckets, where d is the number of bits used from the hash value to determine bucket membership.
– The size of the table will grow gradually, but not double in size.

The growth of the hash table can be triggered either:
1) Every time there is a bucket overflow.
2) When the load factor of the hash table reaches a given point.
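Since linear hashing is defined by what it leaves out, it may help to see the directory scheme from the extendible hashing slides above in code before going further. The sketch below is written under several assumptions that the slides do not state: each block holds 2 records, the directory starts at depth 1, and two sibling buckets are merged whenever their records fit into one block. The names `Bucket` and `ExtendibleHash` are illustrative only.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # d': number of leading bits this bucket uses
        self.keys = []


class ExtendibleHash:
    def __init__(self, key_bits=4, capacity=2):
        self.key_bits = key_bits  # k: bits in each hashed key
        self.capacity = capacity  # records per block (assumed to be 2)
        self.depth = 1            # d: the directory has 2**d entries
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        return key >> (self.key_bits - self.depth)  # first d bits of the key

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        while len(bucket.keys) >= self.capacity:
            if bucket.local_depth == self.depth:
                if self.depth == self.key_bits:
                    raise OverflowError("no more bits left to split on")
                # Directory grows one level: entry i becomes entries 2i and 2i+1.
                self.directory = [b for b in self.directory for _ in (0, 1)]
                self.depth += 1
            self._split(bucket)
            bucket = self.directory[self._index(key)]
        bucket.keys.append(key)

    def _split(self, bucket):
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        shift = self.key_bits - bucket.local_depth  # position of the newly used bit
        old_keys = bucket.keys
        bucket.keys = [k for k in old_keys if not (k >> shift) & 1]
        sibling.keys = [k for k in old_keys if (k >> shift) & 1]
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> (self.depth - bucket.local_depth)) & 1:
                self.directory[i] = sibling

    def delete(self, key):
        bucket = self.directory[self._index(key)]
        bucket.keys.remove(key)
        # Assumed merge policy: merge with the sibling while both fit in one block.
        while bucket.local_depth > 1:
            i = self.directory.index(bucket)
            buddy = self.directory[i ^ (1 << (self.depth - bucket.local_depth))]
            if (buddy.local_depth != bucket.local_depth
                    or len(bucket.keys) + len(buddy.keys) > self.capacity):
                break
            bucket.keys += buddy.keys
            bucket.local_depth -= 1
            self.directory = [bucket if b is buddy else b for b in self.directory]
        # Shrink the directory while no bucket needs all d bits.
        while self.depth > 1 and all(b.local_depth < self.depth for b in self.directory):
            self.directory = self.directory[::2]
            self.depth -= 1


table = ExtendibleHash()
for key in (0b0001, 0b1001, 0b1100, 0b0100, 0b1111, 0b1101):
    table.insert(key)
    print(f"insert {key:04b}: directory depth {table.depth}")
table.delete(0b1111)
print(f"delete 1111: directory depth {table.depth}")
```

Under these assumptions the directory depth goes from 1 to 2 on inserting 1111, to 3 on inserting 1101, and back to 2 after deleting 1111, which matches the sequence of slides. Every lookup pays for one extra directory access, and a single insertion can double the directory.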
Of the two growth triggers (bucket overflow or load factor), we will examine the second.
– Since overflows may not always trigger growth, note that each bucket may use overflow blocks.

Linear Hashing: Load Factor
The load factor lf of the hash table is the number of records stored divided by the number of possible storage locations.
– The initial number of blocks n is a power of 2. As the table grows, n may not always be a power of 2.
– The number of storage locations is s = #blocks × #records/block.
– The initial number of records in the table, r, is 0 and increases as records are added.
– Load factor: lf = r / s = r / (n × #records/block)

We will use the load factor to determine when to grow or shrink the hash table. Common parameters are:
– grow when lf >= 90% and shrink when lf <= 75%.
We will use the value of 85% as the threshold for growth.

Linear Hashing: Example
Assume each hashed key is a sequence of four binary digits. Store values 0000, 1010, 1111.

Linear Hashing: Insertion
– Insert 0101. Add new bucket 10. Divide the records of bucket 00 between buckets 00 and 10.
– Insert 0001. Use an overflow block. (May sort records later.)
– Insert 0111. Create bucket 11. Split the records of bucket 01 between buckets 01 and 11.
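The linear hashing walkthrough above can be reproduced with a short Python sketch. It assumes 2 records per block and, as the bucket labels in the example suggest, that the low-order bits of the key select the bucket; a key whose bit pattern names a bucket that does not exist yet is sent to the bucket with the leading bit cleared. The class name `LinearHashFile` and the flat per-bucket lists (records beyond the block capacity stand for overflow blocks) are illustrative choices, and shrinking is left out.

```python
class LinearHashFile:
    def __init__(self, bits=1, block_capacity=2, grow_at=0.85):
        self.i = bits             # number of low-order bits currently in use
        self.n = 2 ** bits        # current number of buckets
        self.block_capacity = block_capacity
        self.grow_at = grow_at    # growth threshold for the load factor
        self.r = 0                # number of records stored
        # One list per bucket; entries past block_capacity would sit in overflow blocks.
        self.buckets = [[] for _ in range(self.n)]

    def _address(self, key):
        m = key & ((1 << self.i) - 1)  # last i bits of the key
        if m >= self.n:                # that bucket has not been created yet
            m -= 1 << (self.i - 1)     # clear the leading bit
        return m

    def load_factor(self):
        return self.r / (self.n * self.block_capacity)  # lf = r / (n * records per block)

    def insert(self, key):
        self.buckets[self._address(key)].append(key)
        self.r += 1
        if self.load_factor() >= self.grow_at:
            self._grow()

    def _grow(self):
        if self.n == 1 << self.i:  # all i-bit addresses are in use: use one more bit
            self.i += 1
        new = self.n                     # number of the bucket being added
        old = new - (1 << (self.i - 1))  # bucket whose records must be divided
        self.buckets.append([])
        self.n += 1
        records = self.buckets[old]
        self.buckets[old] = [k for k in records if self._address(k) == old]
        self.buckets[new] = [k for k in records if self._address(k) == new]


table = LinearHashFile()
for key in (0b0000, 0b1010, 0b1111, 0b0101, 0b0001, 0b0111):
    table.insert(key)
    contents = [[f"{k:04b}" for k in bucket] for bucket in table.buckets]
    print(f"insert {key:04b}: n={table.n} lf={table.load_factor():.2f} {contents}")
```

Inserting 0101 raises the load factor to 4/4, so bucket 10 is added and 1010 moves out of bucket 00. Inserting 0001 gives 5/6, which is below 85%, so the record simply overflows in bucket 01. Inserting 0111 gives 6/6 and creates bucket 11. The table grows one bucket at a time instead of doubling.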

DOCUMENT INFO

posted: 4/18/2011
language: English
pages: 25
