
Hashing
Dr. Ying Lu
ylu@cse.unl.edu

Giving credit where credit is due:
• Most of the slides are based on the lecture notes by Dr. David Matuszek, University of Pennsylvania
• I have modified many of the slides and added new slides.

Example Problem
• Assume that you are searching for a book in a library catalog
  – You know the book’s ISBN
  – You want to get the book’s record and find out where the book is located in the library
• Which searching algorithm will you use?

Searching
• Searching an array for a given key
  – If the array is not sorted, the search requires O(n) time
    • If the key isn’t there, we need to search all n elements
    • If the key is there, we search n/2 elements on average
  – If the array is sorted, we can do a binary search
    • A binary search requires O(log n) time
    • About equally fast whether the element is found or not
  – It doesn’t seem like we could do much better
    • How about an O(1), that is, constant-time search?
    • We can do it if the array is organized in a particular way

Why search by key?
• We put key/value pairs into the table
  – We use a key to find a place in the table
  – The value holds the information we are actually interested in
  – Book records database (ISBN/book-info pairs)
  – Another example: student records database (ID/record pairs)

    index   key       value
    ...
    141
    142     robin     robin info
    143     sparrow   sparrow info
    144     hawk      hawk info
    145     seagull   seagull info
    146
    147     bluejay   bluejay info
    148     owl       owl info
    ...

Hashing
• Suppose we were to come up with a “magic function” that, given a key to search for, would tell us exactly where in the array to look
  – If it’s in that location, it’s in the array
  – If it’s not in that location, it’s not in the array
• This function is called a hash function

Example (ideal) hash function
• Suppose our hash function gave us the following outputs:
    hashCode("apple") = 5
    hashCode("watermelon") = 3
    hashCode("grapes") = 8
    hashCode("cantaloupe") = 7
    hashCode("kiwi") = 0
    hashCode("strawberry") = 9
    hashCode("mango") = 6
    hashCode("banana") = 2
• The resulting table:

    0   kiwi
    1
    2   banana
    3   watermelon
    4
    5   apple
    6   mango
    7   cantaloupe
    8   grapes
    9   strawberry

Finding the hash function
• How can we come up with this magic function?
• In general, we cannot: there is no such magic function
  – In a few specific cases, where all the possible keys are known in advance, it has been possible to compute a perfect hash function
• What is the next best thing?
  – A perfect hash function would tell us exactly where to look
  – In general, the best we can do is a function that tells us where to start looking!

Example imperfect hash function
• Suppose our hash function gave us the following outputs:
    hash("apple") = 5
    hash("watermelon") = 3
    hash("grapes") = 8
    hash("cantaloupe") = 7
    hash("kiwi") = 0
    hash("strawberry") = 9
    hash("mango") = 6
    hash("banana") = 2
    hash("honeydew") = 6   (slot 6 already holds mango)
• The table before honeydew is added:

    0   kiwi
    1
    2   banana
    3   watermelon
    4
    5   apple
    6   mango
    7   cantaloupe
    8   grapes
    9   strawberry

• Now what?
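To make the "imperfect hash function" idea concrete, here is a minimal sketch in Python. The function `toy_hash` (sum of character codes modulo the table size) is a hypothetical stand-in chosen only for illustration; it is not the function behind the slide's numbers and it will not reproduce them. The point is only that a simple, deterministic function maps several different keys to the same slot.

```python
def toy_hash(key, table_size):
    """Hypothetical hash function: sum of character codes, reduced modulo the table size."""
    return sum(ord(ch) for ch in key) % table_size


def slots_by_key(keys, table_size):
    """Group keys by the slot they hash to, so collisions become visible."""
    slots = {}
    for key in keys:
        slots.setdefault(toy_hash(key, table_size), []).append(key)
    return slots


# The fruit names come from the slides; the table size of 10 matches the slide's table.
FRUITS = ["apple", "watermelon", "grapes", "cantaloupe", "kiwi",
          "strawberry", "mango", "banana", "honeydew"]

# Keep only the slots that received more than one key.
collisions = {slot: keys
              for slot, keys in slots_by_key(FRUITS, 10).items()
              if len(keys) > 1}

# "apple" and "mango" have the same character-code sum (530),
# so they collide under toy_hash for every table size.
```

Inspecting `collisions` shows which keys compete for a slot under this particular toy function. A different function would collide on different keys, but some collisions are unavoidable once the set of possible keys is larger than the table.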
Collisions
• When two keys hash to the same array location, this is called a collision
• Collisions are normally treated as “first come, first served”: the first key that hashes to the location gets it
• We have to find something to do with the second and subsequent keys that hash to this same location

Handling collisions
• What can we do when two different keys attempt to occupy the same place in an array?
  – Solution #1 (closed hashing, also called open addressing): Search from there for an empty location
    • Can stop searching when we find the key or an empty location
    • Search must be end-around
  – Solution #2 (open hashing, also called separate chaining): Use the array location as the header of a linked list of keys that hash to this location
• All these solutions work, provided:
  – We use the same technique to add things to the array as we use to search for things in the array

Insertion, I
• Suppose you want to add seagull to this hash table
• Also suppose:
  – hashCode(seagull) = 143
  – table[143] is not empty
  – table[143] != seagull
  – table[144] is not empty
  – table[144] != seagull
  – table[145] is empty
• Therefore, put seagull at location 145
• The table (shown after the insertion):

    ...
    141
    142   robin
    143   sparrow
    144   hawk
    145   seagull
    146
    147   bluejay
    148   owl
    ...

Searching, I
• Suppose you want to look up seagull in this hash table
• Also suppose:
  – hashCode(seagull) = 143
  – table[143] is not empty
  – table[143] != seagull
  – table[144] is not empty
  – table[144] != seagull
  – table[145] is not empty
  – table[145] == seagull !
• We found seagull at location 145

    ...
    141
    142   robin
    143   sparrow
    144   hawk
    145   seagull
    146
    147   bluejay
    148   owl
    ...

Searching, II
• Suppose you want to look up cow in this hash table
• Also suppose:
  – hashCode(cow) = 144
  – table[144] is not empty
  – table[144] != cow
  – table[145] is not empty
  – table[145] != cow
  – table[146] is empty
• If cow were in the table, we should have found it by now
• Therefore, it isn’t here

    ...
    141
    142   robin
    143   sparrow
    144   hawk
    145   seagull
    146
    147   bluejay
    148   owl
    ...

Insertion, II
• Suppose you want to add hawk to this hash table
• Also suppose:
  – hashCode(hawk) = 143
  – table[143] is not empty
  – table[143] != hawk
  – table[144] is not empty
  – table[144] == hawk
• hawk is already in the table, so do nothing

    ...
    141
    142   robin
    143   sparrow
    144   hawk
    145   seagull
    146
    147   bluejay
    148   owl
    ...

Insertion, III
• Suppose:
  – You want to add cardinal to this hash table
  – hashCode(cardinal) = 147
  – The last location is 148
  – 147 and 148 are occupied
• Solution:
  – Treat the table as circular; after 148 comes 0
  – Hence, cardinal goes in location 0 (or 1, or 2, or ...)

    ...
    141
    142   robin
    143   sparrow
    144   hawk
    145   seagull
    146
    147   bluejay
    148   owl

Clustering
• One problem with the closed hashing technique is the tendency to form “clusters”
• A cluster is a run of adjacent occupied locations with no open slots in between
• The bigger a cluster gets, the more likely it is that new keys will hash into the cluster, and make it even bigger
• Clusters cause efficiency to degrade
• One remedy is double hashing: use a second hash function to compute the probe increment

Efficiency of Closed Hashing
• Hash tables are actually surprisingly efficient
• Until the table is about 70% full, the number of probes (places looked at in the table) is typically only 2 or 3
• Sophisticated mathematical analysis is required to prove that the expected cost of inserting into a hash table, or looking something up in the hash table, is O(1)
• Even if the table is nearly full (leading to long searches), efficiency is usually still quite high compared with an O(n) scan, although the number of probes grows quickly as the table approaches 100% full

Solution #2: Open hashing (Bucket hashing)
• The previous solutions used closed hashing: all entries went into a “flat” (unstructured) array
• Another solution is to make each array location the header of a linked list of keys that hash to that location

    ...
    141
    142   robin
    143   sparrow -> seagull
    144   hawk
    145
    146
    147   bluejay
    148   owl
    ...

• Efficiency for searching?
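As an illustration of open hashing, the following Python sketch keeps one chain per array location. It is a minimal sketch under stated assumptions, not the slides' implementation: the class name `ChainedHashTable` and its methods are hypothetical, a Python list stands in for the linked list, the default table size of 11 is an arbitrary prime, and Python's built-in `hash` is used unless another function is supplied.

```python
class ChainedHashTable:
    """Open hashing (separate chaining): each array location holds a chain of entries."""

    def __init__(self, size=11, hash_fn=hash):
        self.size = size          # number of array locations (assumed prime by default)
        self.hash_fn = hash_fn    # hypothetical hook so a different hash function can be plugged in
        self.buckets = [[] for _ in range(size)]  # a Python list stands in for each linked list

    def _chain(self, key):
        # The hash function only tells us which chain to look in.
        return self.buckets[self.hash_fn(key) % self.size]

    def put(self, key, value):
        chain = self._chain(key)
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)   # key already present: overwrite its value
                return
        chain.append((key, value))        # new key: extend this location's chain

    def get(self, key, default=None):
        for existing_key, value in self._chain(key):
            if existing_key == key:
                return value
        return default

    def remove(self, key):
        chain = self._chain(key)
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                del chain[i]              # deleting from a chain disturbs no other key
                return True
        return False
```

On the question of search efficiency: a lookup scans only one chain, so its cost depends on that chain's length. If the hash function spreads n keys evenly over m locations, the average chain holds about n/m entries, which suggests an expected search cost of roughly O(1 + n/m). In the worst case every key lands in one chain and the search degrades to O(n).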
Other considerations
• The hash table might fill up; we need to be prepared for that
  – Not a problem for open hashing, of course
• You cannot simply empty a slot to delete an item from a closed hashing table
  – This would create empty slots that might prevent you from finding items that hash before the slot but end up after it
  – A common workaround is to mark the slot as deleted instead of empty
  – Again, not a problem for open hashing
• Generally speaking, hash tables work best when the table size is a prime number

In-class exercises
• In an array of 2k+1 integers, there are k integers that appear twice and 1 integer that appears once in the array. Design an efficient algorithm to identify the unique integer.

Design and Analysis of Algorithms – Chapter 6
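Returning to the closed hashing walkthrough (Insertion I to III, Searching I and II) and the deletion caveat under "Other considerations", the steps can be sketched in Python as below. This is an illustrative sketch, not the slides' code. The class `LinearProbingSet`, the `DELETED` marker, and `build_slide_table` are hypothetical names. The hash codes for sparrow, hawk, seagull, cow, and cardinal come from the slides; those for robin, bluejay, and owl are assumed to equal the locations where the slides show them. The table size of 149 is inferred from "the last location is 148".

```python
EMPTY = None
DELETED = object()  # hypothetical "tombstone": the slot was used once, so keep probing past it


class LinearProbingSet:
    """Closed hashing with end-around linear probing (keys only, no values)."""

    def __init__(self, capacity, hash_fn):
        self.slots = [EMPTY] * capacity
        self.hash_fn = hash_fn

    def _probe(self, key):
        # Visit every location once, starting at the hashed one and wrapping around.
        start = self.hash_fn(key) % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def find(self, key):
        for i in self._probe(key):
            if self.slots[i] is EMPTY:
                return None               # empty location: the key cannot be further along
            if self.slots[i] is not DELETED and self.slots[i] == key:
                return i
        return None

    def add(self, key):
        found = self.find(key)
        if found is not None:
            return found                  # already in the table, so do nothing
        for i in self._probe(key):
            if self.slots[i] is EMPTY or self.slots[i] is DELETED:
                self.slots[i] = key
                return i
        raise RuntimeError("table is full")

    def naive_remove(self, key):
        i = self.find(key)
        if i is not None:
            self.slots[i] = EMPTY         # breaks later searches that must pass this slot

    def remove(self, key):
        i = self.find(key)
        if i is not None:
            self.slots[i] = DELETED       # lazy deletion keeps probe sequences intact


# Hash codes: sparrow, hawk, seagull, cow, cardinal are from the slides; the rest are assumed.
SLIDE_HASH_CODES = {"robin": 142, "sparrow": 143, "hawk": 143, "seagull": 143,
                    "bluejay": 147, "owl": 148, "cow": 144, "cardinal": 147}


def build_slide_table():
    table = LinearProbingSet(149, SLIDE_HASH_CODES.__getitem__)
    for bird in ["robin", "sparrow", "hawk", "seagull", "bluejay", "owl"]:
        table.add(bird)
    return table
```

With a table from `build_slide_table()`, `find("seagull")` probes 143 and 144 before succeeding at 145, `find("cow")` stops at the empty location 146, and `add("cardinal")` wraps around to location 0. Calling `naive_remove("hawk")` empties location 144, after which `find("seagull")` stops early and reports the key missing. Using `remove("hawk")` instead leaves a marker that the search steps over.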
