# Hash (PowerPoint) by ewghwehws

```
Hash

C and Data Structure
Baojian Hua
bjhua@ustc.edu.cn
Searching
   A dictionary-like data structure
   contains a collection of tuple data:
   <k1, v1>, <k2, v2>, …
   keys are comparable and distinct
   supports these operations:
   new ()
   insert (dict, k, v)
   lookup (dict, k)
   delete (dict, k)
Examples
Application      Purpose       Key         Value
Phone Book       phone         name        phone No.
Bank             transaction   visa        $$$
Dictionary       lookup        word        meaning
compiler         symbol        variable    type
www.google.com   search        key words   contents
…                …             …           …
Summary So Far
op \ rep    array    sorted array    linked list    sorted linked list    binary search tree
lookup()    O(n)     O(lg n)         O(n)           O(n)                  O(n)
insert()    O(n)     O(n)            O(n)           O(n)                  O(n)
delete()    O(n)     O(n)            O(n)           O(n)                  O(n)
What’s the Problem?
   For every mapping (k, v):
   after we insert it into the dictionary dict,
we don’t know its position, so lookup has to search for it!
   Ex: insert (d, “li”, 97), (d, “wang”, 99), (d, “zhang”, 100), …
   and then lookup (d, “zhang”);

[Figure: (“li”, 97), (“wang”, 99), (“zhang”, 100), … stored one after another]
Basic Plan
   Start from the array-based approach
    Use an array A to hold elements (k, v)s
    For every key k:
   if we know its position (array index) i from k
   then lookup, insert and delete are simple:
   A[i]
   done in constant time O(1)
[Figure: array A with (k, v) stored at index i]
Example
   Ex: insert (d, “li”, 97), (d, “wang”, 99), (d, “zhang”, 100), …;
   and then lookup (d, “zhang”);
   Problem #1: How to calculate the index from the key?
   Problem #2: How long should the array be?
Basic Plan
   Save (k, v)s in an array, index
calculated from k
   Hash function: a method for computing
index from given keys
[Figure: hash (“li”) gives the index of the array slot holding (“li”, 97)]
Hash Function
   Given any key, compute an index
   Efficiently computable
   Ideal goals: indexes are uniformly distributed over the keys, and
   different keys map to different indexes
   However, finding such a function is a hard research problem :-(
   Next, we assume that the array is of infinite
length, so the hash function has type:
   int hash (key k);
   Next is a “case analysis” on how different key
types affect “hash”
Hash Function On “int”
  If the key of hash is of “int” type, the
hash function is trivial:
int hash (int i)
{
    return i;
}
Hash Function On “char”
  If the key of hash is of “char” type, the
hash function comes with type
conversion:
int hash (char c)
{
    return c;
}
Hash Function On “float”
  Also type conversion:
int hash (float f)
{
    return (int)f;
}
// How should fractions such as 0.5 be handled? (int)0.5 is 0.
Hash Function On “string”
   “BillG”:
int hash (char *s)
{
int i=0, sum=0;
while (s[i])
{
sum += s[i];
i++;
}
return sum;
}
From “int” Hash to Index
   Problems with “int” Hash Type
   At any time, the array is finite
   no negative index (say -10)
   Our goal:
   int i ==> [0, N-1]
   Aha, that’s easy! It’s just:

abs(i) % N
Bug!
   Note that the range of “int” is -2^31 ~ 2^31-1
   So abs(-2^31) = 2^31 (overflow!)

   The key step is to wipe the sign bit off:

int t = i & 0x7fffffff;
int hc = t % N;
   In summary:

hc = (i & 0x7fffffff) % N;
Collision
   Given two keys k1 and k2, we compute
two hash values h1, h2 ∈ [0, N-1]
   If k1 != k2, but h1 == h2, then a collision
occurs

[Figure: (k1, v1) and (k2, v2) both hash to the same index i]
Collision Resolution
   Re-hash
   Chaining
Chaining
   For collision index i, we keep a separate
linear list (chain) at index i
[Figure: the slot at index i points to a chain k1 -> k2, holding (k1, v1) and (k2, v2)]
    defaultLoadFactor: default value of the

k8       k1       k5

k43      k2
#ifndef HASH_H
#define HASH_H

typedef struct hash *hash;

hash   newHash ();
hash   newHash2 (double lf);
void   insert (hash h, poly key, poly value);
poly   lookup (hash h, poly key);
void   delete (hash h, poly key);

#endif
Hash Implementation
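#include "hash.h"
#define extFactor 2
#define initBuckets 16

struct hash
{
    linkedList *buckets;   // array of chains; type name assumed from linkedListSearch ()
    double loadFactor;     // assumed field, set by newHash2 ()
    int numBuckets;
    int numItems;
};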
In Figure

h
buckets

k8    k1   k5

k43   k2
“newHash ()”
hash newHash ()
{
    hash h = checkedMalloc (sizeof (*h));
    h->buckets = checkedMalloc (initBuckets * sizeof (*(h->buckets)));
    // each bucket must start out as an empty chain
    h->numBuckets = initBuckets;
    h->numItems = 0;

    return h;
}
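“newHash2 ()”
hash newHash2 (double lf)
{
    hash h = checkedMalloc (sizeof (*h));
    h->buckets = checkedMalloc (initBuckets * sizeof (*(h->buckets)));
    // each bucket must start out as an empty chain
    h->loadFactor = lf;    // field name assumed; struct hash must declare it
    h->numBuckets = initBuckets;
    h->numItems = 0;

    return h;
}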
“lookup (hash, key)”
poly lookup (hash h, poly k)
{
    int i = k->hashCode (); // how to take this?
    int hc = (i & 0x7fffffff) % (h->numBuckets);

    poly t = linkedListSearch ((h->buckets)[hc], k);

    return t;
}
Ex: lookup (ha, k43)
hc = (hash (k43) & 0x7fffffff) % 8;   // hc = 1

ha->buckets (each column is one chain, read top to bottom):

   k8     k1     k5
   k43    k2

   Step 1: compare k43 with k8: no match, move down the chain
   Step 2: compare k43 with k43: found!
“insert (hash, key, value)”
void insert (hash h, poly k, poly v)
{
    // buckets extension & items re-hash;

    int i = k->hashCode (); // how to take this?
    int hc = (i & 0x7fffffff) % (h->numBuckets);

    tuple t = newTuple (k, v);
    // link t in at the head of chain (h->buckets)[hc], then h->numItems++

    return;
}
Ex: insert (ha, k13)
hc = (hash (k13) & 0x7fffffff) % 8;   // suppose hc == 4

Before (each column is one chain, read top to bottom):

   k8     k1     k5
   k43    k2

After: k13 is linked in at the head of the chain in bucket 4:

   k8     k13    k5
   k43    k1
          k2
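Complexity
op \ rep    array    sorted array    linked list    sorted linked list    hash
lookup()    O(n)     O(lg n)         O(n)           O(n)                  O(1)
insert()    O(n)     O(n)            O(n)           O(n)                  O(1)
delete()    O(n)     O(n)            O(n)           O(n)                  O(1)

   The O(1) bounds for hash hold on average, when the hash function
   spreads keys evenly and chains stay short; if all keys fall into
   one chain, lookup and delete degrade to O(n)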

```
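
The key-to-index step from the slides (a sum-of-characters string hash, followed by clearing the sign bit and taking the remainder) can be sketched in C as below. The names `hashString` and `toIndex` are hypothetical, chosen for this illustration, and the mask `0x7fffffff` assumes a 32-bit two's-complement `int`, as the slides do.

```c
#include <assert.h>

/* Sum-of-characters hash, as on the "string" slide.
   Hypothetical name; the slides call every variant just "hash". */
static int hashString(const char *s)
{
    int sum = 0;
    for (int i = 0; s[i] != '\0'; i++)
        sum += (unsigned char)s[i];   /* avoid negative char values */
    return sum;
}

/* Map any int hash value h to a bucket index in [0, n-1].
   Clearing the sign bit avoids the abs(INT_MIN) overflow.
   Assumes a 32-bit two's-complement int. */
static int toIndex(int h, int n)
{
    assert(n > 0);
    return (h & 0x7fffffff) % n;
}
```

With 16 buckets, `toIndex(hashString("li"), 16)` is 5, since 'l' + 'i' = 108 + 105 = 213 and 213 % 16 = 5. The keys "li" and "il" have the same character sum, so they collide, which is why a resolution scheme such as chaining is still needed.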
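
Chaining as described in the slides can be illustrated with a small self-contained table. This is a minimal sketch under stated assumptions, not the course's implementation: keys are C strings and values are `int` rather than `poly`, the bucket count is fixed (no bucket extension or re-hash), and all names (`struct table`, `tableNew`, `tableInsert`, `tableLookup`, `tableDelete`, `indexOf`) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NUM_BUCKETS 8              /* fixed size; no bucket extension here */

struct node {                      /* one (key, value) link in a chain */
    const char *key;
    int value;
    struct node *next;
};

struct table {
    struct node *buckets[NUM_BUCKETS];
    int numItems;
};

/* Sum-of-characters hash reduced to a bucket index in [0, NUM_BUCKETS-1]. */
static int indexOf(const char *s)
{
    int sum = 0;
    for (int i = 0; s[i] != '\0'; i++)
        sum += (unsigned char)s[i];
    return (sum & 0x7fffffff) % NUM_BUCKETS;
}

static struct table *tableNew(void)
{
    struct table *t = malloc(sizeof *t);
    assert(t != NULL);
    for (int i = 0; i < NUM_BUCKETS; i++)
        t->buckets[i] = NULL;      /* every chain starts empty */
    t->numItems = 0;
    return t;
}

/* Returns a pointer to the stored value, or NULL if the key is absent. */
static int *tableLookup(struct table *t, const char *key)
{
    for (struct node *p = t->buckets[indexOf(key)]; p != NULL; p = p->next)
        if (strcmp(p->key, key) == 0)
            return &p->value;
    return NULL;
}

/* Inserts (key, value); an existing key has its value overwritten. */
static void tableInsert(struct table *t, const char *key, int value)
{
    int *old = tableLookup(t, key);
    if (old != NULL) {
        *old = value;
        return;
    }
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    int hc = indexOf(key);
    n->key = key;                  /* the key is borrowed, not copied */
    n->value = value;
    n->next = t->buckets[hc];      /* link in at the head of the chain */
    t->buckets[hc] = n;
    t->numItems++;
}

/* Removes key if present; returns 1 if a node was removed, else 0. */
static int tableDelete(struct table *t, const char *key)
{
    struct node **pp = &t->buckets[indexOf(key)];
    for (; *pp != NULL; pp = &(*pp)->next) {
        if (strcmp((*pp)->key, key) == 0) {
            struct node *dead = *pp;
            *pp = dead->next;      /* unlink from the chain */
            free(dead);
            t->numItems--;
            return 1;
        }
    }
    return 0;
}
```

New nodes go at the head of their chain, matching the insert example in the slides. Because the keys "li" and "il" hash to the same bucket, inserting both exercises the collision path: each is still found by walking the chain and comparing keys.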