VIEWS: 3 PAGES: 19 POSTED ON: 8/28/2012
Hashing
by Rafael Jaffarove
CS157b

Motivation
  Fast data access
    Search
    Insertion
    Deletion
  Ideal seek time is O(1)

Types of Organization
  File organization: the search key points to the disk block that holds the desired record
  Index organization: the search key is stored together with a pointer in a hash table; the pointer points to the bucket where the record is stored

Types of Hashing
  Static hashing
    Fixed file size
  Dynamic hashing
    Extendable hashing

Problems with Static Hashing
  Databases tend to grow over time
  The number of buckets must be predefined
    If the number is too large, space is wasted
    If the number is too small, there are too many collisions
  Bucket overflow

Handling Bucket Overflow
  Providing overflow buckets
    If the initial bucket is full, a new bucket is allocated
    If the second bucket is full, a third bucket is allocated, and so on
    The additional buckets are linked together in a linked list
  Problems:
    Searches and insertions might take linear time
    Deletions are difficult to perform

Dynamic Hashing
  Extendable hashing
    Buckets are created as needed

Example of Extendable Hashing
  Insert the following countries into the database: England, France, China, Germany, Egypt, Australia
  Hash function: the sum of the ASCII codes of all characters in a name
  Assumption: a bucket can’t hold more than 2 records

Extendable Hashing Example (contd.)

Extendable Hashing
  Problem with dynamic hashing
    Additional level of indirection

Hash Function
  Importance of choosing the right hash function
  Uniform function = even distribution of data
  Table size is a prime number
  There is no perfect hash function, so collisions are possible

Handling Collisions
  Linear probing
  Quadratic probing
  Double hashing
  Chaining

Linear Probing
  If a slot is used, take the next available one
  If the next slot is used, continue until an empty slot is found
  If the end of the table is reached, wrap around to the beginning
  Problems:
    Clustering of data
    How far to go if there are no empty slots?
    Deletion: deleting a key in the middle of a cluster

Quadratic Probing
  To avoid clustering, take not the next slot but the slot 1^2, 2^2, 3^2, 4^2, etc. positions away
  Problem: secondary clustering, since keys that collide follow the same seek pattern

Double Hashing
  In case of a collision, apply a second hash function
  Overall better performance than linear and quadratic probing

Chaining
  Entries are linked lists
  In case of a collision, the entries are added to those linked lists
  Problem: with frequent collisions on the same slot, searching the linked list for a key becomes linear
  Alternative data structures are used to solve this problem (e.g. B+-trees)
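
The three open-addressing strategies above (linear probing, quadratic probing, double hashing) differ only in the order in which slots are tried. The following is a minimal sketch of that idea, not code from the slides. The class name ProbingTable, the table size of 11, the position-weighted second hash, and the use of a DELETED marker ("tombstone") for the deletion problem are all assumptions made for illustration. The first hash is the sum of ASCII codes used in the extendable hashing example.

```python
EMPTY = object()      # slot that has never been used
DELETED = object()    # tombstone (assumption): slot whose key was deleted


def ascii_sum(key):
    """First hash: sum of the character codes of the key."""
    return sum(ord(ch) for ch in key)


def second_hash(key, m):
    """Hypothetical second hash: position-weighted sum, always in 1..m-1."""
    return 1 + sum((i + 1) * ord(ch) for i, ch in enumerate(key)) % (m - 1)


def linear(key, i, m):
    return (ascii_sum(key) + i) % m            # next slot, wrapping around


def quadratic(key, i, m):
    return (ascii_sum(key) + i * i) % m        # offsets 1^2, 2^2, 3^2, ...


def double_hashing(key, i, m):
    return (ascii_sum(key) + i * second_hash(key, m)) % m


class ProbingTable:
    def __init__(self, probe=linear, size=11):  # 11 is a prime table size
        self.probe = probe
        self.size = size
        self.slots = [EMPTY] * size

    def _sequence(self, key):
        # At most `size` probes, so a full table cannot cause an endless search.
        for i in range(self.size):
            yield self.probe(key, i, self.size)

    def contains(self, key):
        for j in self._sequence(key):
            if self.slots[j] is EMPTY:          # a never-used slot ends the search
                return False
            if self.slots[j] == key:
                return True
        return False

    def insert(self, key):
        if self.contains(key):
            return True
        for j in self._sequence(key):
            if self.slots[j] is EMPTY or self.slots[j] is DELETED:
                self.slots[j] = key
                return True
        return False                            # no free slot was reachable

    def delete(self, key):
        for j in self._sequence(key):
            if self.slots[j] is EMPTY:
                return False
            if self.slots[j] == key:
                self.slots[j] = DELETED         # keep the probe chain intact
                return True
        return False


table = ProbingTable(linear)
for word in ("abc", "bca", "cab"):   # anagrams share an ASCII sum, so they collide
    table.insert(word)
table.delete("bca")                  # delete the key in the middle of the cluster
print(table.contains("cab"))         # True: the search passes over the tombstone
```

Passing quadratic or double_hashing as the probe argument changes only the probe order. With the second hash assumed here, the three anagrams take different step sizes under double hashing, whereas under quadratic probing they all follow the same pattern, which is the secondary clustering mentioned above.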