					Hash

       C and Data Structure
            Baojian Hua
          bjhua@ustc.edu.cn
Searching
   A dictionary-like data structure
       contains a collection of tuple data:
            <k1, v1>, <k2, v2>, …
            keys are comparable and distinct
       supports these operations:
            new ()
            insert (dict, k, v)
            lookup (dict, k)
            delete (dict, k)
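To make the four operations concrete, here is a minimal sketch on the simplest representation, an unsorted array. The names dict, dictNew, dictInsert, dictLookup and dictDelete, the int keys and values, and the fixed capacity are all assumptions made for illustration. Each operation scans the array, so each costs O(n).

```c
#include <stdlib.h>

/* Hypothetical names; int keys/values and a fixed capacity are assumptions. */
typedef struct {
  int key;
  int value;
} entry;

typedef struct {
  entry *items;
  int size;      /* number of stored pairs */
  int capacity;  /* maximum number of pairs */
} dict;

dict *dictNew (int capacity)
{
  dict *d = malloc (sizeof (*d));
  if (!d) return NULL;
  d->items = malloc (capacity * sizeof (entry));
  if (!d->items) { free (d); return NULL; }
  d->size = 0;
  d->capacity = capacity;
  return d;
}

/* Linear scan, O(n): returns the position of k, or -1 if absent. */
static int dictFind (dict *d, int k)
{
  for (int i = 0; i < d->size; i++)
    if (d->items[i].key == k)
      return i;
  return -1;
}

/* Keys are distinct, so insert searches first: O(n).
   Returns 0 on success, -1 if the array is full. */
int dictInsert (dict *d, int k, int v)
{
  int i = dictFind (d, k);
  if (i >= 0) { d->items[i].value = v; return 0; }  /* overwrite */
  if (d->size == d->capacity) return -1;
  d->items[d->size].key = k;
  d->items[d->size].value = v;
  d->size++;
  return 0;
}

/* Returns 1 and stores the value in *v if k is present, else returns 0. */
int dictLookup (dict *d, int k, int *v)
{
  int i = dictFind (d, k);
  if (i < 0) return 0;
  *v = d->items[i].value;
  return 1;
}

/* Fill the hole with the last pair; the search still makes it O(n). */
void dictDelete (dict *d, int k)
{
  int i = dictFind (d, k);
  if (i < 0) return;
  d->items[i] = d->items[d->size - 1];
  d->size--;
}
```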
    Examples
Application       Purpose       Key          Value
Phone Book        phone         name         phone No.
Bank              transaction   visa         $$$
Dictionary        lookup        word         meaning
Compiler          symbol        variable     type
www.google.com    search        key words    contents
…                 …             …            …
       Summary So Far
op \ rep   array   sorted array   linked list   sorted linked list   binary search tree
lookup()   O(n)    O(lg n)        O(n)          O(n)                 O(n)
insert()   O(n)    O(n)           O(n)          O(n)                 O(n)
delete()   O(n)    O(n)           O(n)          O(n)                 O(n)
(worst-case costs; an unbalanced binary search tree can degenerate into a list)
What’s the Problem?
   For every mapping (k, v):
       after we insert it into the dictionary dict,
        we don’t know its position!
       Ex: insert (d, “li”, 97), (d, “wang”, 99),
        (d, “zhang”, 100), …
       and then lookup (d, “zhang”);
   [Figure: an array holding (“li”, 97), (“wang”, 99), (“zhang”, 100), … at unknown positions]
Basic Plan
   Start from the array-based approach
        Use an array A to hold the (k, v) pairs
        For every key k:
             if we know its position (array index) i from k
             then lookup, insert and delete are simple:
                  A[i]
             done in constant time O(1)
   [Figure: the pair (k, v) stored in slot i of array A]
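When the key itself can serve as the index, the plan above is a direct-address table. A minimal sketch, assuming int keys that the caller guarantees lie in [0, TABLE_SIZE); the names slot, daInsert, daLookup and daDelete are hypothetical. Every operation is a single array access, so each is O(1).

```c
#include <stdbool.h>

/* Assumption: callers only use keys k with 0 <= k < TABLE_SIZE. */
#define TABLE_SIZE 100

typedef struct {
  bool used;   /* is this slot occupied? */
  int value;
} slot;

static slot A[TABLE_SIZE];  /* static storage: every slot starts unused */

void daInsert (int k, int v)   /* O(1): the key is the index */
{
  A[k].used = true;
  A[k].value = v;
}

bool daLookup (int k, int *v)  /* O(1) */
{
  if (!A[k].used) return false;
  *v = A[k].value;
  return true;
}

void daDelete (int k)          /* O(1) */
{
  A[k].used = false;
}
```

Both open problems on the next slides show up here: the key must already be a small integer (Problem #1), and the array must cover every possible key (Problem #2).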
 Example
    Ex: insert (d, “li”, 97), (d, “wang”, 99),
     (d, “zhang”, 100), …; and then lookup
     (d, “zhang”);
   [Figure: which slot should (“li”, 97) go into? The index is marked “?”]
 Problem #1: How do we calculate the index from the key?
 Example
    Ex: insert (d, “li”, 97), (d, “wang”, 99),
     (d, “zhang”, 100), …; and then lookup
     (d, “zhang”);
   [Figure: (“li”, 97) in an array whose length is marked “?”]
 Problem #2: How long should the array be?
Basic Plan
   Save the (k, v) pairs in an array, at an index
    calculated from k
   Hash function: a method for computing an
    index from a given key
   [Figure: hash (“li”) gives the slot where (“li”, 97) is stored]
Hash Function
   Given any key, compute an index
       Efficiently computable
       Ideal goals: indexes are spread uniformly over the array,
            and different keys map to different indexes
       However, designing such functions is a hard research problem :-(
   Next, we assume that the array is of infinite
    length, so the hash function has type:
       int hash (key k);
       Next is a “case analysis” on how different key
        types affect “hash”
Hash Function On “int”
  If the key of hash is of “int” type, the
   hash function is trivial:
int hash (int i)
{
  return i;
}
Hash Function On “char”
  If the key of hash is of “char” type, the
   hash function comes with type
   conversion:
int hash (char c)
{
  return c;
}
Hash Function On “float”
  Also type conversion:
int hash (float f)
{
  return (int)f;
}
// how to deal with 0.aaa, say 0.5? (int)0.5 is 0, as is every value in (-1, 1)
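One common way around the truncation problem is to hash the float's bit pattern rather than its integer part. A minimal sketch, assuming float is a 32-bit IEEE 754 value (true on mainstream platforms); the name hashFloat is hypothetical. memcpy is the portable way to reinterpret the bytes without breaking C's aliasing rules.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size (32-bit IEEE 754). */
_Static_assert (sizeof (float) == sizeof (uint32_t), "32-bit float assumed");

int hashFloat (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);   /* copy f's 4 bytes into an integer */
  /* Clearing the sign bit keeps the result non-negative; it also makes
     +0.0f and -0.0f, which compare equal, hash to the same value. */
  return (int)(bits & 0x7fffffffu);
}
```

With this, 0.5f and 0.25f get different hash values, whereas (int) truncation sends both to 0.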
Hash Function On “string”
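   Sum the character codes; for “BillG” this gives
    66 + 105 + 108 + 108 + 71 = 458:
int hash (const char *s)
{
  int i = 0, sum = 0;
  while (s[i])
  {
    // cast so bytes above 127 don't count as negative where char is signed
    sum += (unsigned char)s[i];
    i++;
  }
  return sum;
}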
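The plain sum ignores character order, so anagrams such as “abc” and “cba” always collide. A common alternative is a polynomial hash, h = 31*h + c, which is the recurrence Java's String.hashCode uses. A minimal sketch; the names hashString and hashSum are hypothetical, and unsigned arithmetic is used so that wrap-around is well defined instead of signed overflow.

```c
/* Polynomial hash: each character's contribution depends on its position. */
unsigned int hashString (const char *s)
{
  unsigned int h = 0;
  for (; *s; s++)
    h = 31 * h + (unsigned char)*s;   /* unsigned overflow wraps mod 2^n */
  return h;
}

/* The slide's sum hash, kept for comparison. */
unsigned int hashSum (const char *s)
{
  unsigned int sum = 0;
  for (; *s; s++)
    sum += (unsigned char)*s;
  return sum;
}
```

For example, hashString ("abc") is 96354 while hashString ("cba") is 98274, but both have the same sum, 294.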
From “int” Hash to Index
   Problems with “int” Hash Type
       At any time, the array is finite
       Array indexes cannot be negative (say -10)
   Our goal:
     int i ==> [0, N-1]
     Aha, that’s easy! It’s just:

    abs(i) % N
Bug!
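 Note that a 32-bit “int” ranges over -2^31 ~ 2^31-1
    So abs(-2^31) would be 2^31, which does not fit (overflow!).
    In C this is undefined behavior; in practice it often
    returns -2^31, a negative index.

 The key step is to clear the sign bit (assuming 32-bit int):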

int t = i & 0x7fffffff;
int hc = t % N;
 In summary:

hc = (i & 0x7fffffff) % N;
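The summary line can be packaged as a small helper. A minimal sketch, assuming a 32-bit two's-complement int as the slides do; the name bucketIndex is hypothetical. Note that clearing the sign bit is not the same as abs(): -10 becomes 2^31 - 10, not 10, but that does not matter for hashing, since all we need is a valid index that depends only on the key.

```c
#include <limits.h>

/* Map any int hash value into [0, n-1] without calling abs(). */
int bucketIndex (int i, int n)
{
  return (i & 0x7fffffff) % n;   /* non-negative, then reduce mod n */
}
```

For instance, with n = 8: bucketIndex (10, 8) is 2, bucketIndex (-10, 8) is 6, and bucketIndex (INT_MIN, 8) is 0.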
Collision
   Given two keys k1 and k2, we compute
    two hash values h1, h2 ∈ [0, N-1]
   If k1 != k2 but h1 == h2, then a collision
    occurs
   [Figure: (k1, v1) and (k2, v2) both map to slot i]
Collision Resolution
   Open Addressing
   Re-hash
   Chaining
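Of the three strategies listed, the slides go on to develop only chaining. For contrast, here is a minimal sketch of open addressing with linear probing: on a collision, try the next slot, wrapping around. Assumptions: int keys and values, a fixed table of CAP slots, and no deletion (deleting would need “tombstone” markers so that later probes don't stop too early). The names cell, lpInsert and lpLookup are hypothetical.

```c
#include <stdbool.h>

#define CAP 8

typedef struct {
  bool used;
  int key;
  int value;
} cell;

static cell table[CAP];  /* static storage: every cell starts unused */

static int home (int k) { return (k & 0x7fffffff) % CAP; }

/* Probe from the home slot until an empty cell or the same key is found.
   Returns false if the table is full. */
bool lpInsert (int k, int v)
{
  int i = home (k);
  for (int n = 0; n < CAP; n++, i = (i + 1) % CAP) {
    if (!table[i].used || table[i].key == k) {
      table[i].used = true;
      table[i].key = k;
      table[i].value = v;
      return true;
    }
  }
  return false;
}

/* Without deletions, an empty cell ends the probe: k cannot be further on. */
bool lpLookup (int k, int *v)
{
  int i = home (k);
  for (int n = 0; n < CAP && table[i].used; n++, i = (i + 1) % CAP) {
    if (table[i].key == k) {
      *v = table[i].value;
      return true;
    }
  }
  return false;
}
```

Here keys 1 and 9 both hash to slot 1, so 9 lands in slot 2; a lookup for 17 (also home slot 1) probes slots 1, 2 and 3, then stops at the empty slot 3.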
Chaining
   For collision index i, we keep a separate
    linear list (chain) at index i
   [Figure: (k1, v1) and (k2, v2) collide at index i; slot i points to the chain k1 → k2]
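A minimal, self-contained sketch of chaining, assuming int keys and values, a fixed number of buckets, and distinct keys (so insert does not check for duplicates). New pairs are pushed at the head of their chain, as the slides' insert does later. The names node, chainInsert and chainLookup are hypothetical.

```c
#include <stdlib.h>

#define NBUCKETS 8

typedef struct node {
  int key;
  int value;
  struct node *next;
} node;

static node *buckets[NBUCKETS];  /* static storage: every chain starts empty */

static int bucketOf (int k) { return (k & 0x7fffffff) % NBUCKETS; }

/* Push (k, v) at the head of its chain.
   Returns 0 on success, -1 if out of memory. */
int chainInsert (int k, int v)
{
  node *p = malloc (sizeof (*p));
  if (!p) return -1;
  p->key = k;
  p->value = v;
  p->next = buckets[bucketOf (k)];   /* link in front of the old chain */
  buckets[bucketOf (k)] = p;
  return 0;
}

/* Only the one chain at k's index is walked. */
node *chainLookup (int k)
{
  for (node *p = buckets[bucketOf (k)]; p; p = p->next)
    if (p->key == k)
      return p;
  return NULL;
}
```

Keys 1 and 9 collide at index 1, yet both remain retrievable because they share the chain 9 → 1.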
Load Factor
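   loadFactor = numItems / numBuckets
        defaultLoadFactor: the threshold; once loadFactor
        reaches it, the buckets are extended and all items re-hashed
   [Figure: a bucket array whose chains hold k8 → k43, k1 → k2, and k5]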
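A minimal sketch of keeping the load factor bounded: when numItems / numBuckets reaches the threshold, double the bucket array and move every node to its new chain, since each node's index depends on numBuckets. Assumptions: int keys and values, distinct keys, and a growth factor of 2. The names table, tableNew, grow, tableInsert and tableLookup are hypothetical.

```c
#include <stdlib.h>

typedef struct node {
  int key, value;
  struct node *next;
} node;

typedef struct {
  node **buckets;
  int numBuckets;
  int numItems;
  double maxLoad;   /* threshold for extension */
} table;

static int slotOf (int k, int n) { return (k & 0x7fffffff) % n; }

table *tableNew (int n, double maxLoad)
{
  table *t = malloc (sizeof (*t));
  if (!t) return NULL;
  t->buckets = calloc (n, sizeof (node *));   /* calloc: all chains empty */
  if (!t->buckets) { free (t); return NULL; }
  t->numBuckets = n;
  t->numItems = 0;
  t->maxLoad = maxLoad;
  return t;
}

/* Double the bucket array and re-hash every node into it. */
static int grow (table *t)
{
  int n = 2 * t->numBuckets;
  node **nb = calloc (n, sizeof (node *));
  if (!nb) return -1;
  for (int i = 0; i < t->numBuckets; i++) {
    node *p = t->buckets[i];
    while (p) {
      node *next = p->next;
      int j = slotOf (p->key, n);   /* index under the new size */
      p->next = nb[j];
      nb[j] = p;
      p = next;
    }
  }
  free (t->buckets);
  t->buckets = nb;
  t->numBuckets = n;
  return 0;
}

int tableInsert (table *t, int k, int v)
{
  if ((double)t->numItems / t->numBuckets >= t->maxLoad && grow (t) != 0)
    return -1;
  node *p = malloc (sizeof (*p));
  if (!p) return -1;
  p->key = k;
  p->value = v;
  int i = slotOf (k, t->numBuckets);
  p->next = t->buckets[i];
  t->buckets[i] = p;
  t->numItems++;
  return 0;
}

node *tableLookup (table *t, int k)
{
  for (node *p = t->buckets[slotOf (k, t->numBuckets)]; p; p = p->next)
    if (p->key == k)
      return p;
  return NULL;
}
```

Starting from 4 buckets with a threshold of 0.75, the fourth insert sees 3/4 = 0.75 and doubles the table to 8 buckets; all earlier keys remain reachable. The slides use 0.25 as their default threshold, which trades memory for shorter chains.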
“hash” ADT: interface
#ifndef HASH_H
#define HASH_H

typedef struct hash *hash;

hash   newHash (void);
hash   newHash2 (double lf);
void   insert (hash h, poly key, poly value);
poly   lookup (hash h, poly key);
void   delete (hash h, poly key);

#endif
Hash Implementation
#include "hash.h"
#define extFactor 2
#define initBuckets 16

struct hash
{
  linkedList *buckets;   // array of numBuckets chains; replaced on extension
  int numBuckets;
  int numItems;
  double defaultLoadFactor;
};
In Figure
   [Figure: h points to its buckets array; the chains hold k8 → k43, k1 → k2, and k5]
“newHash ()”
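hash newHash (void)
{
  hash h = checkedMalloc (sizeof (*h));
  h->buckets = checkedMalloc (initBuckets *
                              sizeof (linkedList));
  // every bucket must start as an empty chain; newLinkedList is a
  // hypothetical constructor name (the list API is not shown here)
  for (int i = 0; i < initBuckets; i++)
    h->buckets[i] = newLinkedList ();
  h->numBuckets = initBuckets;
  h->numItems = 0;
  h->defaultLoadFactor = 0.25;

  return h;
}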
“newHash2 ()”
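hash newHash2 (double lf)
{
  hash h = checkedMalloc (sizeof (*h));
  h->buckets = checkedMalloc (initBuckets *
                              sizeof (linkedList));
  // every bucket must start as an empty chain; newLinkedList is a
  // hypothetical constructor name (the list API is not shown here)
  for (int i = 0; i < initBuckets; i++)
    h->buckets[i] = newLinkedList ();
  h->numBuckets = initBuckets;
  h->numItems = 0;
  h->defaultLoadFactor = lf;   // caller-chosen threshold

  return h;
}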
“lookup (hash, key)”
poly lookup (hash h, poly k)
{
  int i = k->hashCode (); // pseudo-code: how do we get a hash code from a generic key?
  int hc = (i & 0x7fffffff) % (h->numBuckets);

  // walk only the chain at index hc
  poly t = linkedListSearch ((h->buckets)[hc], k);

  return t;
}
Ex: lookup (ha, k43)
               hc = (hash (k43) & 0x7fffffff) % 8;
               // hc = 1
   [Figure: table ha with 8 buckets; bucket 1 holds the chain k8 → k43,
    other chains hold k1 → k2 and k5]
   Step 1: compare k43 with k8: not equal, move on
   Step 2: compare k43 with k43: found!
“insert (hash, key, value)”
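void insert (hash h, poly k, poly v)
{
  if (1.0 * h->numItems / h->numBuckets >= h->defaultLoadFactor)
  {
    // buckets extension (by extFactor) & items re-hash; not shown
  }

  int i = k->hashCode (); // pseudo-code: how do we get a hash code from a generic key?
  int hc = (i & 0x7fffffff) % (h->numBuckets);

  tuple t = newTuple (k, v);

  // assumes k is not already present (keys are distinct)
  linkedListInsertHead ((h->buckets)[hc], t);
  h->numItems++;   // keep the load factor up to date

  return;
}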
Ex: insert (ha, k13)
               hc = (hash (k13) & 0x7fffffff) % 8;
               // suppose hc == 4
   [Figure, before: chains k8 → k43, k1 → k2, and k5; bucket 4 holds k1 → k2]
   [Figure, after: k13 is inserted at the head of bucket 4, giving k13 → k1 → k2]
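The interface declares delete, but the slides do not show its implementation. A minimal sketch for a chained table, assuming int keys and values, a fixed bucket count, and head insertion; the names put, get and removeKey are hypothetical. Walking the chain with a pointer-to-pointer lets one loop unlink either the head node or an interior node.

```c
#include <stdlib.h>

#define NB 8

typedef struct node {
  int key, value;
  struct node *next;
} node;

static node *bucket[NB];  /* static storage: every chain starts empty */

static int idx (int k) { return (k & 0x7fffffff) % NB; }

int put (int k, int v)   /* head insertion; returns -1 if out of memory */
{
  node *p = malloc (sizeof (*p));
  if (!p) return -1;
  p->key = k;
  p->value = v;
  p->next = bucket[idx (k)];
  bucket[idx (k)] = p;
  return 0;
}

node *get (int k)
{
  for (node *p = bucket[idx (k)]; p; p = p->next)
    if (p->key == k)
      return p;
  return NULL;
}

/* Returns 1 if k was found and removed, 0 otherwise. */
int removeKey (int k)
{
  for (node **pp = &bucket[idx (k)]; *pp; pp = &(*pp)->next) {
    if ((*pp)->key == k) {
      node *dead = *pp;
      *pp = dead->next;   /* bypass the node, whether head or interior */
      free (dead);
      return 1;
    }
  }
  return 0;
}
```

Like lookup, delete touches only one chain, so its cost is proportional to that chain's length.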
       Complexity
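op \ rep   array   sorted array   linked list   sorted linked list   hash
lookup()   O(n)    O(lg n)        O(n)          O(n)                 O(1)*
insert()   O(n)    O(n)           O(n)          O(n)                 O(1)*
delete()   O(n)    O(n)           O(n)          O(n)                 O(1)*
* expected time, assuming the hash function spreads keys evenly and the load
  factor stays bounded (amortized for insert, which occasionally re-hashes);
  in the worst case all keys land in one chain and each operation is O(n)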