Hashing slides

Hashing

Chapter 7 Section 3
What is hashing?

• Hashing uses a 1-D array to implement a dictionary
   o This implementation is called a "hash table"
• Items are stored by the key and have related values
• A real dictionary example
   o The defined words are the keys
   o The definitions are the values
• A more pertinent example
   o Your Ole Miss ID number would be your key
   o Your name, major, address, etc. would be your values
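A minimal sketch of the ID-number example, using Python's built-in dict (which is itself a hash table). The ID number and the field values are made up for illustration.

```python
# Hypothetical student record: the key is an ID number,
# the value bundles the related fields (name, major, address).
students = {
    10012345: {
        "name": "Jane Doe",
        "major": "Computer Science",
        "address": "123 Example St",
    },
}

# Lookup goes through the key, not through a scan of all records.
record = students[10012345]
print(record["name"], "-", record["major"])
```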
Hash Table info

• Items are placed in a specific location
   o This location is determined by the hash function
• The hash function can be any function that maps each key to a slot
  (bucket) and spreads the keys as evenly as possible across the array
• The function should satisfy two principles:
   o Even distribution (helpful to have a prime sized array)
   o Easy computation
• We can use "mod" for example: h(K) = K mod m, where m is the array size
   o Choosing a prime m helps spread the keys evenly
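As a rough sketch of the "mod" idea, the snippet below hashes the integer keys from the 7.3 problems into a table of size 11. The names `TABLE_SIZE` and `h` are chosen here for illustration.

```python
TABLE_SIZE = 11  # prime table size, as in the h(K) = K mod 11 problems


def h(key: int, size: int = TABLE_SIZE) -> int:
    """Map an integer key to a slot index in range(size)."""
    return key % size


keys = [30, 20, 56, 75, 31, 19]
for k in keys:
    print(f"h({k}) = {h(k)}")
```

Every result lands in 0..10, so it can be used directly as an array index.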
Collisions are imminent...

• What happens if two keys get mapped to the same slot?
  o A collision occurs
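To make a collision concrete, here is a small sketch that groups keys by their slot under h(K) = K mod 11 and reports the slots hit more than once. The helper name `find_collisions` is an assumption for this example.

```python
from collections import defaultdict


def find_collisions(keys, size):
    """Group keys by slot and return only the slots hit more than once."""
    by_slot = defaultdict(list)
    for k in keys:
        by_slot[k % size].append(k)
    return {slot: ks for slot, ks in by_slot.items() if len(ks) > 1}


collisions = find_collisions([30, 20, 56, 75, 31, 19], 11)
print(collisions)  # {8: [30, 19], 9: [20, 75, 31]}
```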
Collision resolution

• Why was there a collision in the first place?
   o Poor hash function?
   o Load factor?
   o Coincidence?
• Two ways of resolving collisions
   o Open hashing, also called separate chaining
   o Closed hashing, also called open addressing
Separate Chaining

• Each slot in the array turns into a linked list
• When a collision occurs, the key/value is added to the
  end of the list at that slot
• How do we search for items with separate chaining?
   o Calculate the address
   o Linear search the list
• Efficiency depends on:
   o Length of the linked lists
   o Size of the array in comparison to the number of keys
   o Quality of the hash function
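The search steps above (calculate the address, then linear search the list) can be sketched as follows. This is a minimal illustration, not a production table: Python lists stand in for the linked lists, keys are assumed to be integers, and the class and method names are made up for this example.

```python
class ChainedHashTable:
    def __init__(self, size=11):
        self.size = size
        # One chain per slot; a Python list stands in for a linked list.
        self.slots = [[] for _ in range(size)]

    def _address(self, key):
        return key % self.size

    def put(self, key, value):
        chain = self.slots[self._address(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:               # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))     # collision or empty slot: add to end

    def get(self, key):
        for k, v in self.slots[self._address(key)]:  # linear search the chain
            if k == key:
                return v
        raise KeyError(key)

    def comparisons(self, key):
        """Key comparisons made by a successful search for key."""
        for n, (k, _) in enumerate(self.slots[self._address(key)], start=1):
            if k == key:
                return n
        raise KeyError(key)


table = ChainedHashTable(11)
for k in [30, 20, 56, 75, 31, 19]:
    table.put(k, f"value for {k}")
print(table.get(31), "found after", table.comparisons(31), "comparisons")
```

The longer a chain gets, the more comparisons a search needs, which is why the list length matters for efficiency.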
Open Addressing

• Collisions are resolved by moving to the next open slot
• How do we search in this case?
   o Compute what the address "should" be
   o Not there? Search linearly, slot by slot, until you find the key or
     reach an empty slot
       This is called linear probing

• What happens if we delete a key?
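A minimal sketch of insertion and search with linear probing, assuming integer keys, h(K) = K mod size, and wrap-around at the end of the array. Deletion is deliberately left out here. The class name and methods are illustrative only.

```python
class LinearProbingTable:
    def __init__(self, size=11):
        self.size = size
        self.keys = [None] * size      # None marks an empty slot
        self.values = [None] * size

    def _probe(self, key):
        """Yield the home slot, then each following slot, wrapping around."""
        start = key % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def put(self, key, value):
        for slot in self._probe(key):
            if self.keys[slot] is None or self.keys[slot] == key:
                self.keys[slot] = key
                self.values[slot] = value
                return slot
        raise RuntimeError("table is full")

    def get(self, key):
        for slot in self._probe(key):
            if self.keys[slot] is None:    # empty slot: key cannot be further on
                break
            if self.keys[slot] == key:
                return self.values[slot]
        raise KeyError(key)


t = LinearProbingTable(11)
placed = [t.put(k, str(k)) for k in [30, 20, 56, 75, 31, 19]]
print(placed)  # [8, 9, 1, 10, 0, 2]
```

Key 19 hashes to slot 8 but ends up in slot 2, because slots 8, 9, 10, 0 and 1 are already taken when it arrives.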
Issues in Open Addressing

• If we delete a key, that leaves an open slot in the array
• If a key AFTER the deleted one was placed by linear probing past that
  slot, a search will never find it
• The search computes the location, probes forward, and stops at the
  open slot before it reaches the key
• We can use a "book mark" (often called a tombstone)
   o Place a special marker in that slot that matches no real key
   o The marker holds the location until a real item is inserted there
• Problem #1 SOLVED!
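The "book mark" idea can be sketched like this, assuming linear probing over a plain list. `EMPTY` and `DELETED` are sentinel objects invented for this example, and `insert` assumes the key is not already in the table.

```python
EMPTY = object()    # slot has never held a key
DELETED = object()  # the "book mark": a key was here and was removed


def probe(key, size):
    start = key % size
    return ((start + i) % size for i in range(size))


def insert(table, key):
    # Assumes key is not already stored; reuses the first free or marked slot.
    for slot in probe(key, len(table)):
        if table[slot] is EMPTY or table[slot] is DELETED:
            table[slot] = key
            return slot
    raise RuntimeError("table is full")


def find(table, key):
    for slot in probe(key, len(table)):
        if table[slot] is EMPTY:       # truly empty: stop searching
            return None
        if table[slot] is not DELETED and table[slot] == key:
            return slot                # markers are skipped, not matched
    return None


def delete(table, key):
    slot = find(table, key)
    if slot is not None:
        table[slot] = DELETED          # leave a marker, not an empty slot
    return slot


table = [EMPTY] * 11
for k in (20, 75, 31):                 # all three hash to slot 9
    insert(table, k)
delete(table, 75)                      # slot 10 now holds the marker
print(find(table, 31))                 # 0: the search walks past the marker
```

Without the marker, the search for 31 would stop at slot 10 and report the key as missing.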
More issues in Open Addressing!
• Clustering: long runs of occupied slots built up by linear probing
• When clusters merge, a search has to walk through the whole run, so
  the table essentially behaves like a linked list, which is bad news
• How can we resolve this clustering problem?
   o Double Hashing
       This means using ANOTHER hash function to determine a fixed
       probing increment instead of 1
   o Rehashing
       This just means resizing the array and remapping all the keys
       to new locations
• Problem #2 SOLVED!
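A rough sketch of both fixes, under stated assumptions: the second hash function `h2(K) = 1 + (K mod (size - 1))` is one common choice and is not taken from the slides, the table sizes are prime so every slot is reachable, and `None` marks an empty slot.

```python
def h1(key, size):
    return key % size


def h2(key, size):
    # Assumed secondary hash: a step between 1 and size - 1, never 0.
    return 1 + (key % (size - 1))


def probe_sequence(key, size):
    """Slots visited for key: home slot, then jumps of a key-dependent step."""
    start, step = h1(key, size), h2(key, size)
    return [(start + i * step) % size for i in range(size)]


def insert(table, key):
    for slot in probe_sequence(key, len(table)):
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("table is full")


def rehash(table, new_size):
    """Resize: build a bigger table and remap every key into it."""
    new_table = [None] * new_size
    for key in table:
        if key is not None:
            insert(new_table, key)
    return new_table


keys = [30, 20, 56, 75, 31, 19]
table = [None] * 11
placed = [insert(table, k) for k in keys]
print(placed)               # [8, 9, 1, 4, 0, 7]

bigger = rehash(table, 23)  # 23 is the assumed new prime size
print(bigger.index(31))     # 8
```

Keys 20, 75 and 31 all start at slot 9, but their steps differ (1, 6 and 2), so they scatter instead of piling into one cluster.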
Performance Analysis
Real Performance Analysis
• Worst case:
   o All operations: O(n)
   o Table is full and we are either linear probing or linear
     searching to find the key
   o Our table is acting like a linked list...
• Best case:
   o All operations: O(1)
       Called "perfect hashing"
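• Avg case (k keys stored in n slots, so k/n is the load factor):
   o O(1 + k/n) for an unsuccessful lookup with chaining
   o O(1/(1 - k/n)) for an unsuccessful lookup with open addressing,
     assuming probes are spread uniformly over the table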
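To get a feel for how the two average-case expressions grow, the sketch below evaluates them for a table of 11 slots as it fills up. It assumes k is the number of stored keys and n the number of slots; the function names are illustrative.

```python
def chaining_unsuccessful(k, n):
    """Growth estimate 1 + k/n for an unsuccessful lookup with chaining."""
    return 1 + k / n


def open_addressing_unsuccessful(k, n):
    """Growth estimate 1 / (1 - k/n); only defined while the table is not full."""
    if k >= n:
        raise ValueError("open addressing needs k < n")
    return 1 / (1 - k / n)


n = 11
for k in (3, 6, 9, 10):
    print(f"k={k:2d}  load={k / n:.2f}  "
          f"chaining={chaining_unsuccessful(k, n):.2f}  "
          f"open addressing={open_addressing_unsuccessful(k, n):.2f}")
```

The chaining estimate grows slowly with the load factor, while the open addressing estimate blows up as k approaches n, which is one reason rehashing is triggered before the table gets full.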
Problems
• Problems from 7.3:
   o 1: Input 30, 20, 56, 75, 31, 19. Use h(K) = K mod 11
       Construct open hash table (separate chaining)
       Find largest # of comparisons for a successful search
       Find avg. # of comparisons for a successful search
   o 2: Input 30, 20, 56, 75, 31, 19. Use h(K) = K mod 11
       Construct closed hash table (open addressing)
       Find largest # of comparisons for a successful search
       Find avg. # of comparisons for a successful search
   o 5: How many people should be in a room so that a >
     50% chance exists that two share a birthday? What
     implications does this have for hashing?
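For problem 5, the arithmetic can be checked with a short sketch. It assumes 365 equally likely birthdays and ignores leap years; the function names are made up for this example.

```python
def shared_birthday_probability(people, days=365):
    """Chance that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct


def smallest_group(threshold=0.5, days=365):
    """Smallest group size whose shared-birthday chance exceeds threshold."""
    n = 1
    while shared_birthday_probability(n, days) <= threshold:
        n += 1
    return n


print(smallest_group())                          # 23
print(round(shared_birthday_probability(23), 3)) # 0.507
```

Read as hashing: with 365 slots and a perfectly even hash function, only 23 keys are enough to make a collision more likely than not, so a hash table needs a collision strategy even when it is mostly empty.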
Examples

• http://en.wikipedia.org/wiki/Birthday_problem
• http://turing.cs.olemiss.edu/~apernell/Count.java
• http://www.codinghorror.com/blog/2007/12/hashtables-pigeonholes-and-birthdays.html
First person to raise your hand AND give me these three answers:

A) What are the two types of hashing called?
B) What is it called when two keys map to the same slot?
C) What is it called when we resize the table?

gets 5 points on your next homework...

				
Posted: 8/14/2011