
# Hashing

```
Hashing

Dr. Ying Lu
ylu@cse.unl.edu
Giving credit where credit is due:
• Most of the slides are based on the lecture
note by Dr. David Matuszek, University of
Pennsylvania

• I have modified many of the slides and

Example Problem
• Assume that you are searching for a book in a
library catalog
– You know the book’s ISBN
– You want to get the book’s record and find out where
the book is located in the library

• Which searching algorithm will you use?

Searching
• Searching an array for a given key
– If the array is not sorted, the search requires O(n) time
• If the key isn’t there, we need to search all n elements
• If the key is there, we search n/2 elements on average
– If the array is sorted, we can do a binary search
• A binary search requires O(log n) time
• About equally fast whether the element is found or not
– It doesn’t seem like we could do much better
• How about an O(1), that is, constant time search?
• We can do it if the array is organized in a particular way

Why searching by key?
• We put key/value pairs into the table
– We use a key to find a place in the table
– The value holds the information we are
actually interested in
– Book records database (ISBN/book-info pairs)
– Another example: student records database
(ID/record pairs)

...   key       value
141
142   robin     robin info
143   sparrow   sparrow info
144   hawk      hawk info
145   seagull   seagull info
146
147   bluejay   bluejay info
148   owl       owl info

Hashing
• Suppose we were to come up with a "magic
function" that, given a key to search for, would
tell us exactly where in the array to look
– If it’s in that location, it’s in the array
– If it’s not in that location, it’s not in the array

• This function is called a hash function

Example (ideal) hash function
• Suppose our hash function gave us the following outputs:
hashCode("apple") = 5
hashCode("watermelon") = 3
hashCode("grapes") = 8
hashCode("cantaloupe") = 7
hashCode("kiwi") = 0
hashCode("strawberry") = 9
hashCode("mango") = 6
hashCode("banana") = 2

0   kiwi
1
2   banana
3   watermelon
4
5   apple
6   mango
7   cantaloupe
8   grapes
9   strawberry

Finding the hash function
• How can we come up with this magic function?
• In general, we cannot; there is no such magic
function
– In a few specific cases, where all the possible keys are
known in advance, it has been possible to compute a
perfect hash function
• What is the next best thing?
– A perfect hash function would tell us exactly where to
look
– In general, the best we can do is a function that tells us
where to start looking!

Example imperfect hash function
• Suppose our hash function gave us the following outputs:
hash("apple") = 5
hash("watermelon") = 3
hash("grapes") = 8
hash("cantaloupe") = 7
hash("kiwi") = 0
hash("strawberry") = 9
hash("mango") = 6
hash("banana") = 2
hash("honeydew") = 6

0   kiwi
1
2   banana
3   watermelon
4
5   apple
6   mango
7   cantaloupe
8   grapes
9   strawberry

• Now what?

Collisions
• When two keys hash to the same array location,
this is called a collision
• Collisions are normally treated as "first come, first
served": the first key that hashes to the location
gets it
• We have to find something to do with the second
and subsequent keys that hash to this same
location

Handling collisions
• What can we do when two different keys attempt to
occupy the same place in an array?
– Solution #1 (closed hashing): Search from there for an
empty location
• Can stop searching when we find the key or an empty location
• Search must be end-around (after the last location, continue from the first)
– Solution #2 (open hashing): Use the array location as the
header of a linked list of keys that hash to this location
• All these solutions work, provided:
– We use the same technique to add things to the array as
we use to search for things in the array

Insertion, I
• Suppose you want to add seagull to this hash table
• Also suppose:
– hashCode(seagull) = 143
– table[143] is not empty
– table[143] != seagull
– table[144] is not empty
– table[144] != seagull
– table[145] is empty
• Therefore, put seagull at location 145

...
141
142   robin
143   sparrow
144   hawk
145   seagull
146
147   bluejay
148   owl
...

Searching, I
• Suppose you want to look up seagull in this hash table
• Also suppose:
– hashCode(seagull) = 143
– table[143] is not empty
– table[143] != seagull
– table[144] is not empty
– table[144] != seagull
– table[145] is not empty
– table[145] == seagull!
• We found seagull at location 145

...
141
142   robin
143   sparrow
144   hawk
145   seagull
146
147   bluejay
148   owl
...

Searching, II
• Suppose you want to look up cow in this hash table
• Also suppose:
– hashCode(cow) = 144
– table[144] is not empty
– table[144] != cow
– table[145] is not empty
– table[145] != cow
– table[146] is empty
• If cow were in the table, we should have found it by now
• Therefore, it isn’t here

...
141
142   robin
143   sparrow
144   hawk
145   seagull
146
147   bluejay
148   owl
...

Insertion, II
• Suppose you want to add hawk to this hash table
• Also suppose:
– hashCode(hawk) = 143
– table[143] is not empty
– table[143] != hawk
– table[144] is not empty
– table[144] == hawk
• hawk is already in the table, so do nothing

...
141
142   robin
143   sparrow
144   hawk
145   seagull
146
147   bluejay
148   owl
...

Insertion, III
• Suppose:
– You want to add cardinal to this hash table
– hashCode(cardinal) = 147
– The last location is 148
– 147 and 148 are occupied
• Solution:
– Treat the table as circular; after 148 comes 0
– Hence, cardinal goes in location 0 (or 1, or 2, or ...)

...
141
142   robin
143   sparrow
144   hawk
145   seagull
146
147   bluejay
148   owl

Clustering
• One problem with the closed hashing
technique is the tendency to form "clusters"
• A cluster is a group of items not containing
any open slots
• The bigger a cluster gets, the more likely it is
that new keys will hash into the cluster, and
make it even bigger
• Clusters cause efficiency to degrade
• One remedy is double hashing: use a second hash
function to compute the probe increment

Efficiency of Closed Hashing
• Hash tables are actually surprisingly efficient
• Until the table is about 70% full, the number of
probes (places looked at in the table) is typically
only 2 or 3
• Sophisticated mathematical analysis is required to
prove that the expected cost of inserting into a
hash table, or looking something up in the hash
table, is O(1)
• Even if the table is nearly full (leading to long
searches), efficiency is usually still quite high

Solution #2: Open hashing
(Bucket hashing)
• The previous solutions used closed hashing: all
entries went into a "flat" (unstructured) array
• Another solution is to make each array location
the header of a linked list of keys that hash to
that location

...
141
142   robin
143   sparrow -> seagull
144   hawk
145
146
147   bluejay
148   owl
...

Efficiency for searching?

Other considerations
• The hash table might fill up; we need to be
prepared for that
– Not a problem for open hashing, of course
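• You cannot simply delete items from a closed hashing
table
– This would create empty slots that might prevent you
from finding items that hash before the slot but end up
after it
– A common workaround is to mark the slot as "deleted"
rather than empty, so that searches keep going past it
– Again, not a problem for open hashing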
• Generally speaking, hash tables work best when
the table size is a prime number

In-class exercises
• In an array of 2k+1 integers, there are k integers that
appear twice and 1 integer that appears once in the
array. Design an efficient algorithm to identify the
unique integer.

Design and Analysis of Algorithms – Chapter 6

```
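The closed hashing walk-throughs on the slides (Insertion I to III, Searching I and II) can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the lecture: the class name `ClosedHashTable`, the `hash_fn` parameter, and the `codes` lookup table are hypothetical, and the hash codes for robin, sparrow, bluejay and owl are assumed so that the table matches the bird example. The table size of 149 is chosen so that the last location is 148, as on the slides.

```python
class ClosedHashTable:
    """Fixed-size table using closed hashing with a linear search for a free slot."""

    def __init__(self, size, hash_fn=hash):
        self.size = size
        self.slots = [None] * size   # None marks an empty location
        self.hash_fn = hash_fn       # hypothetical hook so the demo is repeatable

    def _probe(self, key):
        """Yield locations starting at the hashed one, wrapping end-around."""
        start = self.hash_fn(key) % self.size
        for step in range(self.size):
            yield (start + step) % self.size

    def insert(self, key, value):
        """Add key/value and return its location; do nothing if key is present."""
        for i in self._probe(key):
            if self.slots[i] is None:
                self.slots[i] = (key, value)
                return i
            if self.slots[i][0] == key:
                return i             # already in the table, so do nothing
        raise RuntimeError("hash table is full")

    def search(self, key):
        """Return the value stored for key, or None if the key is absent."""
        for i in self._probe(key):
            if self.slots[i] is None:
                return None          # empty location: the key cannot be further on
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None                  # looked at every location without a match


# Assumed hash codes that reproduce the bird table from the slides.
codes = {"robin": 142, "sparrow": 143, "hawk": 143, "seagull": 143,
         "bluejay": 147, "owl": 148, "cardinal": 147, "cow": 144}

table = ClosedHashTable(size=149, hash_fn=codes.get)
for bird in ["robin", "sparrow", "hawk", "seagull", "bluejay", "owl"]:
    table.insert(bird, bird + " info")

print(table.search("seagull"))                     # seagull info (found at 145)
print(table.search("cow"))                         # None (location 146 is empty)
print(table.insert("cardinal", "cardinal info"))   # 0 (wraps around after 148)
```

The same `_probe` sequence is used for both adding and searching, which is the condition the "Handling collisions" slide states for the technique to work.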