# Hashing slides

```
Hashing

Chapter 7 Section 3

What is hashing?

• Hashing is using a 1-D array to implement a dictionary
o This implementation is called a "hash table"
• Items are stored by key, and each key has an associated value
• A real dictionary example
o The defined words are the keys
o The definitions are the values
• A more pertinent example

Hash Table info

• Items are placed in a specific location
o This location is determined by the hash function
• The hash function can be any function that maps each key to a
slot (bucket); a good one spreads the keys evenly across the slots
• The function should satisfy two principles:
o Even distribution (helpful to have a prime-sized array)
o Easy computation
• We can use "mod" for example: h(K) = K mod m, where m is the array size
o Takes advantage of the prime feature

Collisions are imminent...

• What happens if two keys get mapped to the same slot?
o A collision occurs

Collision resolution

• Why was there a collision in the first place?
o Poor hash function?
o Coincidence?
• Two ways of resolving collisions
o Open hashing, also called separate chaining
o Closed hashing, also called open addressing

Separate Chaining

• Each slot in the array turns into a linked list
• When a collision occurs, the key/value is added to the
end of the list at that slot
• How do we search for items with separate chaining?
o Hash the key to its slot, then linearly search the list stored there
• Efficiency depends on:
o Length of the linked lists
o Size of the array in comparison to the number of keys
o Quality of the hash function

Open Addressing

• Collisions are resolved by moving to the next open slot
• How do we search in this case?
o Compute what the address "should" be
o Not there? Search linearly (wrapping around the end of the
array) until you find the key or reach an empty slot
▪ This is called linear probing

• What happens if we delete a key?

• If we delete a key, that leaves an open spot in the array
• If a key AFTER the one we deleted depended on linear
probing, then we will never find it
• We'll compute the location, and then find the open slot
before we reach it.
• We can use a "bookmark" (commonly called a tombstone)
o Place a special marker in that slot that matches no real key
o Searches probe past the marker, and it holds the location
until a real item is inserted
• Problem #1 SOLVED!
• Clustering: long runs of occupied slots built up by linear probing
• When groups of clusters get linked, we essentially have a
• How can we resolve this clustering problem?
o Double Hashing
▪ This means using ANOTHER hash function to determine a
fixed probing increment other than 1
o Rehashing
▪ This just means resizing the array and remapping all
the keys to new locations
• Problem #2 SOLVED!

Performance Analysis

Real Performance Analysis

• Worst case:
o All operations: O(n)
o Table is full and we are either linear probing or linear
searching to find the key
o Our table is acting like a linked list...
• Best case:
o All operations: O(1)
▪ Called "perfect hashing"
• Avg case (k/n is the load factor):
o O(1 + k/n) for chaining and unsuccessful lookup
o O(1/(1 - k/n)) for open addressing and unsuccessful
lookup

Problems

• Problems from 7.3:
o 1: Input 30, 20, 56, 75, 31, 19. Use h(K) = K mod 11
▪ Construct open hash table (separate chaining)
▪ Find largest # of comparisons for a successful search
▪ Find avg. # of comparisons for a successful search
o 2: Input 30, 20, 56, 75, 31, 19. Use h(K) = K mod 11
▪ Construct closed hash table (open addressing)
▪ Find largest # of comparisons for a successful search
▪ Find avg. # of comparisons for a successful search
o 5: How many people should be in a room so that there is a
> 50% chance that two share a birthday? What
implications does this have for hashing?

Examples

• http://en.wikipedia.org/wiki/Birthday_problem
• http://turing.cs.olemiss.edu/~apernell/Count.java
• http://www.codinghorror.com/blog/2007/12/hashtables-pigeonholes-and-birthdays.html

First person to raise your hand AND answer:

A) What are the two types of hashing called?
B) What is it called when two keys map to the same slot?
C) What is it called when we resize the table?

gets 5 points on your next homework...

```
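
The separate chaining slide can be made concrete with a small sketch. It uses the keys and the hash function h(K) = K mod 11 from Problem 1; the function names `build_chained_table` and `chained_comparisons` are made up for this illustration, and each chain is a plain Python list standing in for a linked list.

```python
def build_chained_table(keys, m=11):
    """Separate chaining: every slot holds a list, and a colliding
    key is appended to the end of the list at its slot."""
    table = [[] for _ in range(m)]
    for key in keys:
        table[key % m].append(key)  # h(K) = K mod m
    return table


def chained_comparisons(table, key):
    """Key comparisons needed to find a key that is in the table:
    its 1-based position in the chain at its slot."""
    chain = table[key % len(table)]
    return chain.index(key) + 1


keys = [30, 20, 56, 75, 31, 19]
table = build_chained_table(keys)
costs = [chained_comparisons(table, k) for k in keys]

print(table[8], table[9])            # the two slots where collisions occur
print(max(costs))                    # largest number of comparisons
print(round(sum(costs) / len(costs), 2))  # average number of comparisons
```

Only slots 1, 8 and 9 are used, so the cost of a search is driven by the length of the chains at slots 8 and 9.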
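
Linear probing and the "bookmark" trick for deletion can be sketched as follows. This is a minimal illustration, not a full implementation: the class name `LinearProbingTable` and the sentinels `EMPTY` and `TOMBSTONE` are invented here, keys are assumed to be distinct integers, and the table never resizes.

```python
EMPTY = object()      # slot has never held a key
TOMBSTONE = object()  # the "bookmark": a key was deleted from this slot


class LinearProbingTable:
    def __init__(self, m=11):
        self.slots = [EMPTY] * m

    def _probe(self, key):
        """Yield slot indexes starting at h(K) = K mod m, stepping by 1
        and wrapping around the end of the array."""
        m = len(self.slots)
        for step in range(m):
            yield (key % m + step) % m

    def insert(self, key):
        # Assumes the key is not already stored (no duplicate check).
        for i in self._probe(key):
            if self.slots[i] is EMPTY or self.slots[i] is TOMBSTONE:
                self.slots[i] = key
                return i
        raise RuntimeError("table is full")

    def find(self, key):
        """Return (index, slots_examined); index is None if absent."""
        examined = 0
        for i in self._probe(key):
            if self.slots[i] is EMPTY:
                break              # a truly empty slot ends the search
            examined += 1          # tombstones are examined and skipped
            if self.slots[i] == key:
                return i, examined
        return None, examined

    def delete(self, key):
        i, _ = self.find(key)
        if i is not None:
            self.slots[i] = TOMBSTONE  # keep the probe chain intact


t = LinearProbingTable()
for k in [30, 20, 56, 75, 31, 19]:
    t.insert(k)

print(t.find(19))   # 19 hashes to slot 8 but ends up far from it
t.delete(75)        # leaves a tombstone in the middle of a probe chain
print(t.find(31))   # still found, because the search skips the tombstone
```

If `delete` wrote `EMPTY` instead of `TOMBSTONE`, the search for 31 would stop at the vacated slot and report the key as missing, which is exactly the deletion problem described on the slides.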
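
Double hashing can be sketched on the same input. The slides do not say which second hash function to use, so the choice below, a step of 1 + (K mod (m - 1)), is an assumption made for illustration; it is a common textbook choice because the step is never 0 and, with a prime table size, every step eventually visits every slot. The function name `double_hash_insert` is also made up.

```python
def double_hash_insert(table, key):
    """Insert key using double hashing and return the number of
    slots probed. The second hash fixes the probing increment."""
    m = len(table)
    index = key % m              # first hash: h(K) = K mod m
    step = 1 + key % (m - 1)     # second hash (assumed): never 0
    for probes in range(1, m + 1):
        if table[index] is None:
            table[index] = key
            return probes
        index = (index + step) % m
    raise RuntimeError("table is full")


table = [None] * 11
probes = [double_hash_insert(table, k) for k in [30, 20, 56, 75, 31, 19]]

print(table)
print(probes)  # slots probed per insertion
```

With this second hash, the three colliding keys each jump to a different part of the table instead of piling up next to slots 8 and 9, so no insertion needs more than two probes.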
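
Rehashing, as described on the slides, means resizing the array and remapping every key. The sketch below does this for a chained table; the name `rehash` and the new size of 23 (another prime) are assumptions chosen for the example.

```python
def rehash(table, new_size):
    """Build a larger chained table and remap every key into it."""
    new_table = [[] for _ in range(new_size)]
    for chain in table:
        for key in chain:
            new_table[key % new_size].append(key)
    return new_table


# Chained table of size 11 built from the keys in Problem 1.
old_table = [[] for _ in range(11)]
for key in [30, 20, 56, 75, 31, 19]:
    old_table[key % 11].append(key)

new_table = rehash(old_table, 23)

print(max(len(chain) for chain in old_table))  # longest chain before
print(max(len(chain) for chain in new_table))  # longest chain after
```

Every key has to be remapped because its slot depends on the table size, so a key's old index says nothing about its new one.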
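
Problem 5 (the birthday question) can be explored numerically. The sketch below assumes every day, or every slot, is equally likely; the function names `collision_probability` and `smallest_group` are invented for this example. Treating the `days` parameter as a table size gives the hashing analogy.

```python
def collision_probability(people, days=365):
    """Probability that at least two of `people` share a day, assuming
    each of the `days` possibilities is equally likely."""
    p_all_different = 1.0
    for i in range(people):
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different


def smallest_group(threshold=0.5, days=365):
    """Smallest group size whose collision probability exceeds threshold."""
    n = 1
    while collision_probability(n, days) <= threshold:
        n += 1
    return n


print(smallest_group())         # birthdays: 365 possible days
print(smallest_group(days=11))  # hashing: a table with 11 slots
```

The implication for hashing is that collisions become likely long before a table is full, so a collision resolution strategy is needed even with a good hash function.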