
Chapter 9. The Map ADT & the hash table

map
• A map models a searchable collection of key-value entries
• The main operations of a map are searching for, inserting, and deleting items
• Multiple entries with the same key are not allowed
• Applications:
  – address book (key = name, value = address)
  – student-record database (key = student id, value = student record)

map
The map ADT requires that each key is unique, so the association of keys to values defines a mapping

map
    public interface Map<K,V> {

        public interface Entry<K,V> {
            // see the Entry interface below
        }

        /** Returns the number of entries in the map */
        public int size();

        /** Returns true if this map contains no key-value mappings */
        public boolean isEmpty();

        /** Returns the value to which the specified key is mapped,
         *  or null if there is no such mapping */
        public V get(K key);

        /** Associates value with the specified key in the map; returns the old
         *  value if there was already an entry with this key, or null otherwise */
        public V put(K key, V value);

        /** Removes the mapping for a key from this map if present */
        public V remove(K key);

        /** Returns a set containing all the keys stored in the map */
        public Set<K> keys();

        /** Returns a set containing all the values associated
         *  with the keys stored */
        public Set<V> values();

        /** Returns a set containing all the key-value entries */
        public Set<Map.Entry<K,V>> entries();
    }

map
    public interface Entry<K,V> {

        /** Compares the specified object with this entry for equality */
        public boolean equals(Object o);

        /** Returns the key corresponding to this entry */
        public K getKey();

        /** Returns the value corresponding to this entry */
        public V getValue();

        /** Returns the hash code value for this map entry */
        public int hashCode();

        /** Replaces the value corresponding to this entry with the
         *  specified value */
        public V setValue(V value);
    }

example map
    Operation    Output   Map
    isEmpty()    true     Ø
    put(5,A)     null     (5,A)
    put(7,B)     null     (5,A),(7,B)
    put(2,C)     null     (5,A),(7,B),(2,C)
    put(8,D)     null     (5,A),(7,B),(2,C),(8,D)
    put(2,E)     C        (5,A),(7,B),(2,E),(8,D)
    get(7)       B        (5,A),(7,B),(2,E),(8,D)
    get(4)       null     (5,A),(7,B),(2,E),(8,D)
    get(2)       E        (5,A),(7,B),(2,E),(8,D)
    size()       4        (5,A),(7,B),(2,E),(8,D)
    remove(5)    A        (7,B),(2,E),(8,D)
    remove(2)    E        (7,B),(8,D)
    get(2)       null     (7,B),(8,D)
    isEmpty()    false    (7,B),(8,D)

A Simple List-Based Map
• We can implement a map using an unsorted list
  – We store the items of the map in a list S (based on a doubly-linked list), in arbitrary order
[Figure: a doubly-linked list between head and tail, holding the entries (9,c), (6,g), (5,a), (8,r)]

Performance of a List-Based Map
• put, get and remove take O(n) time, since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key
• So all of the fundamental operations take O(n) time
• The unsorted list implementation is effective only for maps of small size
• Would like something faster…

hash table
A hash table consists of two major components:
• a bucket array
• a hash function
Performance is expected to be O(1)

bucket array
• A bucket array is an array A of size N
• A[i] is a bucket, i.e. a collection of <key,value> pairs
• N is the capacity of A
• <k,e> is inserted in A[k]
• if keys are unique integers, well distributed in the range 0 .. N-1, then each bucket holds at most one entry
• consequently O(1) for get, insert, delete
• downside: space is proportional to N
  – if N is much larger than n (number of entries) we waste space
• downside: keys must be in range 0 .. N-1
  – this may not be the case (think matric number)

bucket array
[Figure: bucket array of size 11 (indices 0 to 10) for the entries (1,D), (3,C), (3,F), (3,Z), (6,A), (6,C) and (7,Q). Bucket 1 holds (1,D); bucket 3 holds (3,C), (3,F), (3,Z); bucket 6 holds (6,A), (6,C); bucket 7 holds (7,Q); the other buckets are empty.]
If the hashed keys are unique integers in the range [0..10] then each bucket holds at most one entry. Otherwise we have a collision and need to deal with it.
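As a rough illustration of the bucket array described above, the following sketch stores each entry <k,e> directly in A[k] and reports how full each bucket is. The class name BucketArraySketch, the helper class Pair and the method names are made up for this example; they are not part of the slides.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a bucket array in which entry <k,e> is stored directly in A[k]. */
final class BucketArraySketch {

    /** A key-value pair; the name Pair is hypothetical, chosen for this sketch. */
    static final class Pair {
        final int key;
        final String value;
        Pair(int key, String value) { this.key = key; this.value = value; }
    }

    // A[i] is a bucket, i.e. a collection of <key,value> pairs.
    private final List<List<Pair>> buckets = new ArrayList<>();

    /** Creates a bucket array of capacity N with N empty buckets. */
    BucketArraySketch(int capacity) {
        for (int i = 0; i < capacity; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    /** Stores <k,e> in A[k]; keys outside 0 .. N-1 cannot be stored at all. */
    void insert(int key, String value) {
        if (key < 0 || key >= buckets.size()) {
            throw new IllegalArgumentException("key out of range: " + key);
        }
        buckets.get(key).add(new Pair(key, value));
    }

    /** Number of entries currently held in bucket i. */
    int bucketSize(int i) {
        return buckets.get(i).size();
    }

    /** True if some bucket holds more than one entry, i.e. there is a collision. */
    boolean hasCollision() {
        for (List<Pair> bucket : buckets) {
            if (bucket.size() > 1) return true;
        }
        return false;
    }
}
```

Inserting the seven entries from the figure into a BucketArraySketch of capacity 11 leaves three entries in bucket 3 and two in bucket 6, so hasCollision() is true. The range check in insert shows the second downside listed above: a key such as a matric number simply does not fit.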
collision
When two different entries map to the same bucket we have a collision
It's good to avoid collisions

hash function
A hash function maps each key to an integer in the range [0,N-1]
Given entry <k,e> …
• h(k) is the index into the bucket array
• store entry <k,e> in A[h(k)]
h is a good hash function if
• h maps keys so as to minimise collisions
• h is easy to compute/program
• h is fast to compute
h(k) has two actions
1. map k to a hash code
2. map the hash code into the range [0,N-1]

hash codes in java
But care should be taken as this might not be "good"

a bit of maths … that you know (af2)
Let A and B be sets
• A function is
  • a mapping from elements of A
  • to elements of B
  • and is a subset of A×B
  • i.e. can be defined by a set of tuples!
$f : A \to B$
$\forall x [x \in A \to \exists y [y \in B \wedge \langle x, y \rangle \in f]]$

af2
$f : A \to B$
• A is the domain
• B is the codomain
• f(x) = y
  • y is the image of x
  • x is a preimage of y
• There may be more than one preimage of y
• There is only one image of x
  • otherwise not a function
• There may be an element in the codomain with no preimage
• Range of f is the set of all images of A
  • the set of all results

Injection (aka one-to-one, 1-1)
$\forall x \forall y [(f(x) = f(y)) \to x = y]$
[Figure: two arrow diagrams between small sets, one labelled "not an injection" and one labelled "injection"]
If an injection then preimages are unique
Ideally we want our hash function to
• be injective (no collisions)
• have a small codomain and range
  • may need to compress range

back to ads2

hash code & hash function
Just to clear this up (but let's not make too big a deal about it) …
• We assume the hash code is an integer in the codomain
• The hash function brings hash codes into the range [0,N-1]
We will examine just a few hash functions, acting on strings

Polynomial hash codes
Assume we have a key s that is a character String
Here is a really dumb hash code

    public int dumbHash(String s) {
        int code = 0;
        // add up the character codes, ignoring their positions
        for (int i = 0; i < s.length(); i++)
            code = code + s.charAt(i);
        return code;
    }

What would we get for
• dumbHash("spot")
• dumbHash("pots")
• dumbHash("tops")
• dumbHash("post")
(All four are anagrams of each other, so dumbHash returns the same code, 454, for each.)

Polynomial hash codes
Take into consideration the "position" of elements of the key
$h = s_0 a^0 + s_1 a^1 + s_2 a^2 + \cdots + s_{n-1} a^{n-1}$
So, this doesn't look any different from an every-day number
It's to the base a and the coefficients are the components of the key

Polynomial hash codes
Good values for a appear to be 33, 37, 39, 41

Polynomial hash codes
Small scale experiments on unix dictionary
• a = 33
• 25104 words/strings
• minimum hash value -9165468936209580338
• maximum hash value 8952279818009261254
• collision count 7
Yikes! Look at that range!!!!

Cyclic shift hash codes
Start moving bits around
Thanks to Arash Partow

Compression Functions
So, you think you've found something that produces a good hash code …
How do we compress its range to fit into our machine?
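The two position-sensitive string hash codes discussed above can be sketched as follows. This is only an illustration under stated assumptions: the class name HashCodeSketch and the method names are invented here, the polynomial code is evaluated with Horner's rule in 64-bit arithmetic that silently wraps on overflow, and the 5-bit rotation in the cyclic shift code is a common textbook choice rather than anything the slides specify.

```java
/** Sketch of two string hash codes that take character positions into account. */
final class HashCodeSketch {

    /**
     * Polynomial hash code h = s0*a^0 + s1*a^1 + ... + s(n-1)*a^(n-1).
     * Horner's rule: start from the last character so that s0 ends up
     * with the lowest power of a. Overflow wraps around (64-bit long).
     */
    static long polyHash(String s, long a) {
        long h = 0;
        for (int i = s.length() - 1; i >= 0; i--) {
            h = h * a + s.charAt(i);
        }
        return h;
    }

    /**
     * Cyclic shift hash code: rotate the running code left by 5 bits,
     * then add the next character. The shift amount 5 is an assumption.
     */
    static int cyclicShiftHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h << 5) | (h >>> 27); // 5-bit cyclic (rotating) left shift
            h += s.charAt(i);
        }
        return h;
    }
}
```

Unlike a plain sum of character codes, both methods give different codes for the anagrams "spot" and "pots", because the position of each character affects the result. The long return type of polyHash also shows why a compression step is needed: the codes can be anywhere in the 64-bit range, including negative values.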
Compression Functions
Assume we want to limit storage to buckets in range [0,N-1]
The division method
$i = hash(key) \bmod N$

    // NOTE: keep N prime
    // floorMod keeps i in [0,N-1] even when hash(s) is negative (plain % would not)
    int i = (int) Math.floorMod(hash(s), (long) N);
    S[i] = s;   // … ideally, but there may be collisions

Compression Functions
Assume we want to limit storage to buckets in range [0,N-1]
The multiply add and divide (MAD) method
$i = (a \cdot hash(key) + b) \bmod N$
• N is prime
• a > 1 is a scaling factor
• b ≥ 0 is a shift
• a % N ≠ 0

hash tables
Collision handling schemes

Separate Chaining
bucket[i] is a small map
• implemented as a list
bucket[i] should be a short list
It may be sorted
It might be something other than a list

Separate Chaining
Let N be the number of buckets and n the amount of data stored
load factor is n/N
Upside:
• simple
Downside:
• requires auxiliary data structures (to resolve collisions)
• this may put additional burden on space

Open Addressing

Open Addressing: Linear Probing
i = hash(key);
bucket[i] != null; collision!
Try next bucket[(i+1) % N]
Try next bucket[(i+2) % N]
…
Try next bucket[(i+N-1) % N]

Open Addressing: Linear Probing
What happens with get(key)?
1. i = hash(key);
2. bucket[i] == key … found, return
3. bucket[i] == null … not found, return
4. bucket[i] != null and bucket[i] != key
   i = (i+1) % N
   goto 2
"Linear Probing" gets its name because accessing a bucket is viewed as a probe

Open Addressing: Linear Probing
What happens with remove(key)?
We have a special marker "removed"
1. i = hash(key);
2. bucket[i] == key … found
   bucket[i] = "removed"
   return
3. bucket[i] == null … not found
   return
4. bucket[i] != null and bucket[i] != key
   i = (i+1) % N
   goto 2

Open Addressing: Linear Probing
What happens with put(key)?
1. Free location j = -1;
2. i = hash(key);
3. bucket[i] == key … found
   update bucket[i]
   return
4. bucket[i] == "removed"
   j = i;
   i = (i+1) % N
   goto 3
5. bucket[i] != null && bucket[i] != key
   i = (i+1) % N
   goto 3
6. bucket[i] == null // search stops
   if (j > -1) bucket[j] = <key,e>
   if (j == -1) bucket[i] = <key,e>

Open Addressing: Linear Probing
So?
Advantages
• saves space as bucket[i] is only a bucket for a single entry
• that is, no additional data structures
Disadvantages
• removals are complicated
• put is complicated
• if there are collisions entries might clump together
• search can then degenerate from O(1) down to O(N)
We might use linear probing when memory is tight and we want FAST access

Open Addressing: Quadratic Probing
Quadratic probing iteratively tries
• bucket[(i + f(j)) % N]
where
• i = hash(key)
• j = 0,1,2,…
• f(j) = j*j

Open Addressing: Double Hashing
We have a secondary hash function (call it g)
i = hash(key) and collision at bucket[i]
Try bucket[(i + g(key)) % N]
If that bucket is also occupied, try bucket[(i + 2*g(key)) % N], and so on
Where g(key) = q - (key % q)
Where q is a prime number < N

Open Addressing
So?
Open addressing saves space, but is complicated, and may be slower
In experiments chaining is competitive or faster, depending on load factor
If memory is not an issue:
• recommend using chaining with a low load factor
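To make the linear probing steps for get, remove and put described earlier concrete, here is a minimal sketch of an open-addressing map with a "removed" marker. It rests on several assumptions that are not in the slides: the class name LinearProbingMap, the use of Java's built-in hashCode() compressed with the division method, a fixed capacity with no resizing, and reusing the first "removed" slot seen during put (the steps above remember the most recent one; either choice is valid).

```java
/** Sketch of a fixed-capacity hash map using open addressing with linear probing. */
@SuppressWarnings("unchecked")
final class LinearProbingMap<K, V> {

    private static final Object REMOVED = new Object(); // the special "removed" marker

    private final Object[] keys;   // keys[i] is null, REMOVED, or a stored key
    private final Object[] values; // values[i] belongs to keys[i]
    private int size = 0;

    LinearProbingMap(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    /** Division-method compression; floorMod copes with negative hash codes. */
    private int hash(Object key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    /** Returns the index holding key, or -1 if the key is absent. */
    private int find(K key) {
        int i = hash(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null) return -1;                        // empty: not found
            if (keys[i] != REMOVED && key.equals(keys[i])) return i; // found
            i = (i + 1) % keys.length;                             // probe next bucket
        }
        return -1; // probed every bucket without finding the key
    }

    V get(K key) {
        int i = find(key);
        return i < 0 ? null : (V) values[i];
    }

    V remove(K key) {
        int i = find(key);
        if (i < 0) return null;
        V old = (V) values[i];
        keys[i] = REMOVED; // leave a marker so later probes keep going
        values[i] = null;
        size--;
        return old;
    }

    V put(K key, V value) {
        int free = -1; // "free location j" from the steps above
        int i = hash(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null) {             // search stops at an empty bucket
                if (free < 0) free = i;
                break;
            }
            if (keys[i] == REMOVED) {          // remember the first reusable slot
                if (free < 0) free = i;
            } else if (key.equals(keys[i])) {  // key already present: update
                V old = (V) values[i];
                values[i] = value;
                return old;
            }
            i = (i + 1) % keys.length;
        }
        if (free < 0) throw new IllegalStateException("table is full");
        keys[free] = key;
        values[free] = value;
        size++;
        return null;
    }

    int size() { return size; }
}
```

With a capacity of 5 and Integer keys 5, 7, 2, 8, the keys 7 and 2 both hash to bucket 2, so 2 is pushed along to bucket 3 and 8 in turn to bucket 4. This is the clumping listed under the disadvantages. After remove(2), a get(8) still succeeds only because the "removed" marker in bucket 3 tells the probe to keep going.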
