Hashing
Victor Eijkhout
Notes for CS 594, Fall 2004

Outline: Introduction; Hash functions; Hash tables: collisions; Other


Introduction

The basic problem
  Storing names and information about them: associative storage.

Issues
  Insertion
  Retrieval
  Deletion

Simple strategies
  List in order of creation
    ⇒ cheap to create an entry, linear search time, linear deletion
  Sorted list
    ⇒ creating an entry in linear time, search logarithmic, deletion linear
  Linear list
    ⇒ all linear time

One more strategy
  A tree in which keys share their common prefixes:

                  •
               /     \
              B       E
            /   \   /   \
          ART    E LSE   ND
                / \
             GIN   LL

  (the paths spell BART, BEGIN, BELL, ELSE, END)
    ⇒ all linear in length of string


Hash functions

Mapping from space of words to space of indices
  Source: unbounded; in practice not extremely large
  Target: array (static/dynamic)
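For contrast with the word-to-index mapping just described, the tree from "One more strategy" can be made concrete. The sketch below is a minimal, uncompressed variant (one node per character, whereas the figure merges single-child runs such as ART). The names TrieNode, trie_new, trie_insert, trie_contains and trie_free are illustrative and not from the notes, and the code assumes keys consist only of the letters 'A' to 'Z' in an ASCII-like character set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define ALPHABET 26                 /* assumption: keys use only 'A'..'Z' */

typedef struct TrieNode {
    struct TrieNode *child[ALPHABET];
    bool is_key;                    /* true if a stored key ends at this node */
} TrieNode;

TrieNode *trie_new(void)
{
    TrieNode *node = malloc(sizeof *node);
    assert(node != NULL);           /* sketch: no graceful out-of-memory handling */
    for (int i = 0; i < ALPHABET; i++)
        node->child[i] = NULL;
    node->is_key = false;
    return node;
}

/* Insert a key: one node is visited per character,
 * so the cost is linear in the length of the string. */
void trie_insert(TrieNode *root, const char *key)
{
    TrieNode *node = root;
    for (; *key; key++) {
        int c = *key - 'A';
        assert(c >= 0 && c < ALPHABET);
        if (node->child[c] == NULL)
            node->child[c] = trie_new();
        node = node->child[c];
    }
    node->is_key = true;
}

/* Retrieval: again one step per character of the key. */
bool trie_contains(const TrieNode *root, const char *key)
{
    const TrieNode *node = root;
    for (; *key; key++) {
        int c = *key - 'A';
        if (c < 0 || c >= ALPHABET || node->child[c] == NULL)
            return false;
        node = node->child[c];
    }
    return node->is_key;
}

void trie_free(TrieNode *node)
{
    if (node == NULL)
        return;
    for (int i = 0; i < ALPHABET; i++)
        trie_free(node->child[i]);
    free(node);
}
```

After inserting BART, BEGIN, BELL, ELSE and END, trie_contains finds each of them, while a bare prefix such as BE is reported as absent because no key ends at that node.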
Requirements
  Function determined only by input data
  Determined by as much of the data as possible (key1, key2, ...)
  Uniform distribution (clustering bad, collisions really bad)
  Similar data mapped far apart

Good idea: prime numbers
  With M the size of the hash table:
    $h(K) = K \bmod M$,    (1)
  or:
    $h(K) = aK \bmod M$.   (2)

Bad examples:
  M is even, say $M = 2M'$, and $r = K \bmod M$, say $K = nM + r$. Then
    $K = 2K' \Rightarrow r = 2(K' - nM')$
    $K = 2K' + 1 \Rightarrow r = 2(K' - nM') + 1$
  so the hash value is even iff the key is even
    ⇒ dependence on the last digit
  M a multiple of three: anagrams map to the same key (sum of digits)
    ⇒ M prime, far away from powers of 2

Multiplication instead of division
  $r = K \bmod M = M\,((K/M) \bmod 1)$
  Take $A \approx w/M$, where w is maxint.
  Then $1/M \approx A/w$ (A with the decimal point to its left), and
    $h(K) = \lfloor M\,((A/w)\,K \bmod 1) \rfloor$.

Example: Bible
  42,829 unique words,
  into a hash table with 30,241 elements (prime): 76.6% used
  into a table of size 30,240 (divisible by 2–9): 60.7% used
  (collisions discussed later)

Two-step hashing
  Mix up the characters of the key,
  then take the result modulo the table size.

Character based hashing

    h = <some value>
    for (i=0; i<len(string); i++)
      h = h + <byte i of string>;

  To prevent the anagram problem:

    h = <some value>
    for (i=0; i<len(string); i++)
      h = Rand( h + <byte i of string> );

  with Rand a table of random numbers; a function is also possible.

ELF hash

    /* UNIX ELF hash
     * Published hash algorithm used in the UNIX ELF format
     * for object files
     */
    unsigned long hash(const unsigned char *name)
    {
      unsigned long h = 0, g;
      while ( *name ) {
        h = ( h << 4 ) + *name++;
        /* fold the top four bits back into the low bits */
        if ( (g = h & 0xF0000000) != 0 )
          h ^= g >> 24;
        h &= ~g;
      }
      return h;
    }

Another hash function

    /* djb2
     * This algorithm was first reported by Dan Bernstein
     * many years ago in comp.lang.c
     */
    unsigned long hash(const unsigned char *str)
    {
      unsigned long hash = 5381;
      int c;
      while ( (c = *str++) != 0 )
        hash = ((hash << 5) + hash) + c;   /* hash * 33 + c */
      return hash;
    }
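The two-step scheme above (mix the characters, then reduce modulo the table size) can be sketched as follows. This is only an illustration: the function names additive_hash, djb2_hash and table_index are made up for this example, and the table sizes used in the note after the code are simply the two sizes from the Bible example.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1, variant a: plain sum of the bytes.
 * The sum ignores character order, so anagrams get the same value. */
unsigned long additive_hash(const char *s)
{
    unsigned long h = 0;
    for (; *s; s++)
        h += (unsigned char)*s;
    return h;
}

/* Step 1, variant b: djb2 as given in the notes (h = h*33 + c).
 * The multiplication makes the value depend on character order. */
unsigned long djb2_hash(const char *s)
{
    unsigned long h = 5381;
    for (; *s; s++)
        h = ((h << 5) + h) + (unsigned char)*s;
    return h;
}

/* Step 2: reduce the mixed value to an index in a table of size m. */
size_t table_index(unsigned long h, size_t m)
{
    assert(m > 0);
    return (size_t)(h % m);
}
```

With these definitions additive_hash gives "ab" and "ba" the same value, while djb2_hash separates them. Reducing with the even size 30240 preserves the parity of the mixed value, which is the dependence on the last digit noted under "Bad examples"; the prime size 30241 has no such pattern.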
Hash tables: collisions

So far so good.

Collisions
  $k_1 \neq k_2$, $h(k_1) = h(k_2)$
  Several strategies; all analysis is statistical in nature.
  Open hash table: solve the conflict outside the table
  Closed hash table: solve it by moving around in the table

Separate chaining
  Pro: no need for searching through the hash table
  Con: dynamic storage
  Also: M large to prevent collisions ⇒ wasted space

Linear probing
  Location occupied: search linearly from the first hash

    addr = Hash(K);
    if (IsEmpty(addr)) Insert(K,addr);
    else {
      /* see if already stored: follow the links from the first hash;
         NONE is an assumed marker meaning "no link" */
      while (Table[addr].key != K && Table[addr].link != NONE)
        addr = Table[addr].link;
      if (Table[addr].key == K) return;
      /* find free cell, searching downward with wraparound */
      Free = addr;
      do { Free--; if (Free<0) Free=M-1; }
      while (!IsEmpty(Free) && Free!=addr);
      if (!IsEmpty(Free)) abort();   /* table is full */
      else { Insert(K,Free); Table[addr].link = Free; }
    }

Merging blocks in linear probing
  [Figure: table cells labelled I, J, J2, J3, K, L]

Linear probing analysis
  Clusters forming
  Particularly bad: merging clusters
  Ratio occupied/total: α = N/M
  Expected search time:
    $T \approx \frac{1}{2}\left(1 + \left(\frac{1}{1-\alpha}\right)^2\right)$   unsuccessful
    $T \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$   successful
    ⇒ increasing as the table fills up

Chaining
  If the location is occupied, search from the top of the table

    addr = Hash(K); Free = M;
    if (IsEmpty(addr)) Insert(K,addr);
    else {
      /* see if already stored; NONE is an assumed "no link" marker */
      while (Table[addr].key != K && Table[addr].link != NONE)
        addr = Table[addr].link;
      if (Table[addr].key == K) return;
      /* find free cell, searching down from the top of the table */
      do { Free--; } while (Free>=0 && !IsEmpty(Free));
      if (Free<0) abort();   /* table is full */
      else { Insert(K,Free); Table[addr].link = Free; }
    }

Chaining analysis
  No clusters merging
  Coalescing lists
  Search time (α the occupied fraction):
    $T \approx 1 + (e^{2\alpha} - 1 - 2\alpha)/4$   unsuccessful
    $T \approx 1 + (e^{2\alpha} - 1 - 2\alpha)/(8\alpha) + \alpha/4$   successful

Nonlinear rehashing
  'Random probing': try $(h(m) + p_i) \bmod s$,
    where $p_i$ is a (stored) sequence of random numbers;
    prevents secondary collisions
  'Add the hash': try $(i \times h(m)) \bmod s$ (s prime)
  Pro: scattered hash keys
  Con: more calculations, worse memory locality


Other

Deleting keys
  Simple in direct chaining
  Very hard in closed hash table methods: can only mark 'unused'

Search in chess programs
  Problem: evaluating board positions;
    a position reached in two different ways should not be evaluated twice
  Solution: hash the board, use it as key in a table of evaluations
  Collisions?
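One hedged way to approach the "Collisions?" question is to reuse the closed-table idea from the collisions section: keep the full key in every cell and compare it on each probe, so that two keys with the same hash are never confused. The sketch below is a plain linear-probing table without link fields, so it is simpler than the linked pseudocode in the notes and is not claimed to be what chess programs actually do. The names Table, table_init, table_insert and table_lookup, the size M = 11, and the encoding of a position as a short string are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define M 11              /* table size: a small prime, for illustration only */
#define KEYLEN 16         /* assumption: an encoded key fits in 15 characters */

typedef struct {
    bool used;
    char key[KEYLEN];     /* full key, kept so colliding keys can be told apart */
    int  value;           /* e.g. the stored evaluation */
} Cell;

typedef struct {
    Cell cell[M];
    int  count;
} Table;

static unsigned long djb2(const char *s)
{
    unsigned long h = 5381;
    for (; *s; s++)
        h = ((h << 5) + h) + (unsigned char)*s;
    return h;
}

void table_init(Table *t)
{
    memset(t, 0, sizeof *t);          /* marks every cell as unused */
}

/* Index of the cell holding key, or of the first empty cell on its probe
 * path; -1 if the table is full and the key is absent. */
static int probe(const Table *t, const char *key)
{
    int start = (int)(djb2(key) % M);
    int addr = start;
    do {
        if (!t->cell[addr].used || strcmp(t->cell[addr].key, key) == 0)
            return addr;
        addr--;                       /* step downward, as in the notes */
        if (addr < 0)
            addr = M - 1;             /* wrap around */
    } while (addr != start);
    return -1;
}

/* Insert or update; returns false if the table is full. */
bool table_insert(Table *t, const char *key, int value)
{
    assert(strlen(key) < KEYLEN);
    int addr = probe(t, key);
    if (addr < 0)
        return false;
    if (!t->cell[addr].used) {
        t->cell[addr].used = true;
        strcpy(t->cell[addr].key, key);
        t->count++;
    }
    t->cell[addr].value = value;
    return true;
}

bool table_lookup(const Table *t, const char *key, int *value)
{
    int addr = probe(t, key);
    if (addr < 0 || !t->cell[addr].used)
        return false;
    *value = t->cell[addr].value;
    return true;
}
```

There is deliberately no delete operation: emptying a cell would cut the probe path of keys stored beyond it, which is the difficulty noted under "Deleting keys".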

String searching
  Problem: does a string (length M) occur in a document (length N)?
  Naive: N string comparisons of up to M characters each, giving O(MN) complexity
  Solution: hash the strings, compare hash values
    (this hash function does not distinguish between anagrams,
    so a hash match still has to be confirmed by comparing the strings)
    $h(k) = \left(\sum_i k[i]\right) \bmod K$
  Cheap updating of the document hash key (window length n = M):
    $h(t[2 \ldots n+1]) = h(t[1 \ldots n]) + t[n+1] - t[1]$
    (with addition/subtraction modulo K)
  Hash comparison in O(1) ⇒ total cost O(M + N)


Discussion

Hash table vs trees
  Best case search time can be equal: harder to implement in trees
  Trees can become unbalanced: considerable time and effort to balance
  Trees have dynamic storage: harder to code optimally; worse memory locality

Open vs closed hash tables
  Approximately equal performance until the table fills up
  Open: much simpler storage management, especially deletion
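Returning to the string-searching example above, the additive hash and its cheap window update can be sketched as below. The function name hash_search and the modulus 8191 are assumptions chosen for illustration; the notes only call the modulus K. Because the additive hash cannot tell anagrams apart, every hash match is confirmed with memcmp before it is reported.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HASH_MOD 8191UL   /* K in the notes; assumption: any modest prime will do */

/* Return the index of the first occurrence of pat in text, or -1 if none. */
long hash_search(const char *text, const char *pat)
{
    assert(text != NULL && pat != NULL);
    size_t n = strlen(text), m = strlen(pat);
    if (m == 0)
        return 0;
    if (m > n)
        return -1;

    /* h(k) = (sum of bytes) mod K, for the pattern and the first window */
    unsigned long hp = 0, ht = 0;
    for (size_t i = 0; i < m; i++) {
        hp = (hp + (unsigned char)pat[i]) % HASH_MOD;
        ht = (ht + (unsigned char)text[i]) % HASH_MOD;
    }

    for (size_t s = 0; s + m <= n; s++) {
        /* O(1) hash comparison first; full comparison only on a hash match */
        if (hp == ht && memcmp(text + s, pat, m) == 0)
            return (long)s;
        if (s + m < n)
            /* slide the window: add the entering byte, subtract the
             * leaving byte, all modulo K (HASH_MOD keeps it non-negative) */
            ht = (ht + HASH_MOD + (unsigned char)text[s + m]
                     - (unsigned char)text[s]) % HASH_MOD;
    }
    return -1;
}
```

Searching "the quick brown fox" for "quick" returns 4, while searching it for the anagram "xof" returns -1: the hashes of "fox" and "xof" agree, but the confirming comparison rejects the match.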
