					      Hash Tables


Briana B. Morrison
Adapted from William Collins
LET'S START WITH A REVIEW OF
EARLIER SEARCH TECHNIQUES:




              Hashing          2
Sequential Search

   Given a vector of integers:
    v = {12, 15, 18, 3, 76, 9, 14, 33, 51, 44}

   What is the best case for sequential search?
       O(1) when value is the first element
   What is the worst case?
       O(n) when value is last element, or value is not in the list
   What is the average case?
       About n/2 comparisons for a successful search, which is O(n)


       SEQUENTIAL SEARCH IN STL
// Postcondition: if there is an item in the range of iterators
//                 from first (inclusive) through last
//                 (exclusive) that is equal to value, the
//                 iterator returned is the first iterator i in that
//                 range such that *i == value. Otherwise,
//                 last is returned. The worstTime(n) is O(n).
template <typename InputIterator, typename T>
InputIterator find(InputIterator first, InputIterator last,
                    const T& value)
{
    while (first != last && *first != value)
       ++first;
    return first;
}
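A small check of the best and worst cases, using the standard std::find on the vector from the sequential-search slide. The helper name items_examined is hypothetical; it just turns the iterator that std::find returns into a count of compared items.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Number of items std::find compares before it stops: the 1-based
// position of the first match, or v.size() when target is absent
// (every item is compared and last is returned).
std::size_t items_examined(const std::vector<int>& v, int target)
{
    std::vector<int>::const_iterator it = std::find(v.begin(), v.end(), target);
    if (it == v.end())
        return v.size();                                   // unsuccessful search
    return static_cast<std::size_t>(std::distance(v.begin(), it)) + 1;
}
```

With that vector, searching for 12 examines 1 item, while searching for 44 or for a missing value such as 100 examines all 10.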
THE worstTimeU(n) IS LINEAR IN n.

DITTO FOR worstTimeS(n),
averageTimeU(n), AND averageTimeS(n).




Binary Search

   Given a vector of integers:
    v = {3, 9, 12, 14, 15, 18, 33, 44, 51, 76}

   What is the best case for binary search?
       O(1) when element is the middle element
   What is the worst case?
       O(log n) when element is first, last, or not in list
   What is the average case?
       O(log n)

Do you remember how binary search works?


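// lower_bound: returns the first position in [first, last) whose item is
// not less than value; the search succeeds if that position holds value.
template <typename RandomAccessIterator, typename T>
RandomAccessIterator lower_bound (RandomAccessIterator first,
                                  RandomAccessIterator last,
                                  const T& value)
{
    typedef typename iterator_traits<RandomAccessIterator>::difference_type
        Distance;
    Distance len = last - first;
    Distance half;
    RandomAccessIterator middle;
    while (len > 0) {
         half = len / 2;
         middle = first + half;
         if (*middle < value) {      // value can only be in the upper part
              first = middle + 1;
              len = len - half - 1;
         } else                      // value is at middle or below it
              len = half;
    }
    return first;
}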
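To see the logarithmic bound concretely, here is a sketch of the same halving loop specialized to a std::vector<int> that also counts its passes. Both function names are hypothetical. Note that this lower_bound-style loop never exits early, so it always makes about log2 n passes; the O(1) best case above refers to a version that stops as soon as *middle == value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same halving loop as lower_bound on a sorted vector of ints; passes
// reports how many times the loop body ran (one comparison per pass).
// Returns the index of the first item not less than value.
std::size_t lower_bound_counted(const std::vector<int>& v, int value,
                                int& passes)
{
    std::size_t first = 0;
    std::size_t len = v.size();
    passes = 0;
    while (len > 0) {
        ++passes;
        std::size_t half = len / 2;
        std::size_t middle = first + half;
        if (v[middle] < value) {        // value is in the upper part
            first = middle + 1;
            len = len - half - 1;
        } else {                        // value is at middle or below it
            len = half;
        }
    }
    return first;
}

// The search succeeds only if the returned index holds value.
bool binary_search_counted(const std::vector<int>& v, int value, int& passes)
{
    std::size_t i = lower_bound_counted(v, value, passes);
    return i < v.size() && v[i] == value;
}
```

On the 10-item sorted vector from the binary-search slide, every search, successful or not, finishes in at most 4 passes.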


THE worstTimeU(n) IS LOGARITHMIC IN
n.


DITTO FOR worstTimeS(n),
averageTimeU(n), AND averageTimeS(n).




Applications

   First let’s consider two applications where
    searching is of utmost importance…




Dictionary ADT (§8.1.1)
 - The dictionary ADT models a searchable collection of key-element items
 - The main operations of a dictionary are searching, inserting, and deleting items
 - Multiple items with the same key are allowed
 - Applications:
      address book
      credit card authorization
      mapping host names (e.g., cs16.net) to internet addresses (e.g., 128.148.34.101)

Dictionary ADT methods:
 - find(k): if the dictionary has an item with key k, returns the position of this element; else, returns a null position.
 - insertItem(k, o): inserts item (k, o) into the dictionary
 - removeElement(k): if the dictionary has an item with key k, removes it from the dictionary and returns its element. An error occurs if there is no such element.
 - size(), isEmpty()
 - keys(), elements()

Log File (§8.1.2)

 - A log file is a dictionary implemented by means of an unsorted sequence
      We store the items of the dictionary in a sequence (based on a doubly-linked list or a circular array), in arbitrary order
 - Performance:
      insertItem takes O(1) time since we can insert the new item at the beginning or at the end of the sequence
      find and removeElement take O(n) time since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key
 - The log file is effective only for dictionaries of small size or for dictionaries on which insertions are the most common operations, while searches and removals are rarely performed (e.g., historical record of logins to a workstation)
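A minimal sketch of a log-file dictionary, keeping items in a std::list in arrival order. The method names follow the dictionary ADT slide, but the class itself, its int/string item types, and the choice to throw from removeElement are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical log-file dictionary: an unsorted sequence of (key, element).
class LogFileDictionary
{
public:
    typedef std::pair<int, std::string> Item;   // (key k, element o)

    // O(1): append the new item at the end of the sequence.
    void insertItem(int k, const std::string& o) { items.push_back(Item(k, o)); }

    // O(n) worst case: scan until an item with key k is found. Returns a
    // pointer to its element, or nullptr as the "null position".
    const std::string* find(int k) const
    {
        for (std::list<Item>::const_iterator it = items.begin(); it != items.end(); ++it)
            if (it->first == k)
                return &it->second;
        return nullptr;
    }

    // O(n): scan for key k, then unlink that node in O(1). Throws if no
    // item has key k (the ADT's "an error occurs").
    std::string removeElement(int k)
    {
        for (std::list<Item>::iterator it = items.begin(); it != items.end(); ++it)
            if (it->first == k) {
                std::string o = it->second;
                items.erase(it);
                return o;
            }
        throw std::out_of_range("removeElement: no item with this key");
    }

    std::size_t size() const { return items.size(); }
    bool isEmpty() const { return items.empty(); }

private:
    std::list<Item> items;
};
```

Only insertItem avoids a scan, which is why this structure suits insert-heavy workloads such as a login history.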


NOW LET'S FOCUS ON AN UNUSUAL
BUT VERY EFFICIENT SEARCH
TECHNIQUE:



         HASHING



THE CLASS IN WHICH HASHING IS
IMPLEMENTED IS THE hash_map
CLASS. THIS CLASS WAS NOT PART OF
THE C++98 STANDARD LIBRARY;
C++11 LATER STANDARDIZED HASH
TABLES AS unordered_map.




HERE ARE THE METHOD
INTERFACES FOR THE hash_map
CLASS:




1. // Postcondition: this hash_map is empty.
   hash_map( );


2. // Postcondition: the number of items in this hash_map
   //               has been returned.
   int size( );




3. // Postcondition: If an item with x's key had already been
   //                 inserted into this hash_map, the pair
   //                 returned consists of an iterator positioned
   //                 at the previously inserted item, and false.
   //                 Otherwise, the pair returned consists of
   //                an iterator positioned at the newly inserted
   //                 item, and true. Timing estimates are
   //                 discussed later.
   pair<iterator, bool> insert (
                    const value_type<const key_type, T>& x);




4. // Postcondition: if this hash_map already contains a value
   //                whose key part is key, a reference to that
   //                value's second component has been
   //                returned. Otherwise, a new value, <key,
   //                T( )>, is inserted into this hash_map and a
   //                reference to its second component is returned.
   //                Timing estimates are discussed later.
   T& operator[ ] (const key_type& key);




5. // Postcondition: If this hash_map contains a value whose
   //                 first component equals key, an iterator
   //                 positioned at that value has been returned.
   //                 Otherwise, an iterator at the same
   //                  position as end() has been returned.
   //                Timing estimates are discussed later.
   iterator find (const key_type& key);



6. // Precondition: itr is positioned at value in this hash_map.
   // Postcondition: the value that itr is positioned at has been
   //                deleted from this hash_map. Timing
   //                estimates are discussed later in this chapter.
   void erase (iterator itr);




7. // Postcondition: an iterator positioned at the beginning
   //                 of this hash_map has been returned.
   //                 Timing estimates are discussed later.
   iterator begin( );


8. // Postcondition: an iterator has been returned that can be
   //                used in comparisons to terminate iterating
   //                through this hash_map.
   iterator end( );


9. // Postcondition: the space for this hash_map object has
   //                 been deallocated.
   ~hash_map( );
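For comparison, C++11's std::unordered_map (the standard library's hash table, not the hash_map class described on these slides) follows the same contracts for insert, operator[], find and erase. The function below is a hypothetical demo that checks each postcondition with assert and returns the resulting map.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

std::unordered_map<int, std::string> hash_map_contract_demo()
{
    std::unordered_map<int, std::string> m;

    // insert returns <iterator at the new item, true> for a new key ...
    std::pair<std::unordered_map<int, std::string>::iterator, bool> r =
        m.insert(std::make_pair(351, std::string("Prashant")));
    assert(r.second && r.first->second == "Prashant");

    // ... and <iterator at the existing item, false> for a duplicate key;
    // the stored value is left unchanged.
    r = m.insert(std::make_pair(351, std::string("Someone else")));
    assert(!r.second && r.first->second == "Prashant");

    // operator[] on a missing key inserts <key, T()> (an empty string here)
    // and returns a reference to the new second component.
    assert(m[108].empty());
    m[108] = "Barrett";

    // find returns end() when the key is absent; erase takes an iterator.
    assert(m.find(435) == m.end());
    m.erase(m.find(351));
    return m;
}
```

After the demo, only the item with key 108 remains.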



WE'LL STUDY THE TIME ESTIMATES
AFTER WE DEFINE THE METHODS.
BUT BASICALLY, FOR find, insert, AND
erase,



averageTime(n) IS CONSTANT!
How should we implement:
CONTIGUOUS
    array? vector? deque? heap?

LINKED
    Linked? list? map?

BUT NONE OF THESE WILL GIVE
CONSTANT AVERAGE TIME FOR
SEARCHES, INSERTIONS AND
REMOVALS.

HERE IS THE BASIC IDEA:



buckets   // an array of values

count     // the number of values in the hash_map




LET'S SEE WHERE THAT LEADS.
SUPPOSE persons IS A HASH MAP
THAT WILL HOLD UP TO 1000
VALUES. EACH VALUE CONSISTS
OF A UNIQUE 3-DIGIT INTEGER (THE
KEY), AND A NAME.




      buckets       count

0

1

2
       .
       .
       .


999



persons [351] = "Prashant";

persons [108] = "Barrett";

persons [435] = "Lin";


WHERE SHOULD WE STORE THE
VALUE WHOSE KEY IS 351?




          buckets             count
                                3
  0    ?     ?
       …
108    108   Barrett
       …
351    351   Prashant
       …
435    435   Lin
       …
999
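A minimal sketch of this idea, assuming every key is a 3-digit integer so the key itself can serve as the index into a 1000-element array. PersonsTable and its members are hypothetical names; a separate used array records which buckets hold a value, so count matches the slide's count field.

```cpp
#include <array>
#include <cassert>
#include <string>

class PersonsTable
{
public:
    PersonsTable() : count(0) {}

    // Like hash_map's operator[]: the first use of a key stores T()
    // (an empty name) and counts it; every use returns a reference.
    std::string& operator[](int key)
    {
        assert(key >= 0 && key < 1000);   // only 3-digit keys fit
        if (!used[key]) {                 // first time this key is seen
            used[key] = true;
            ++count;
        }
        return buckets[key];              // index = key: one array access
    }

    int size() const { return count; }

private:
    std::array<std::string, 1000> buckets;   // the value for key k is at index k
    std::array<bool, 1000> used{};           // all false initially
    int count;                               // number of values stored
};
```

This only works because the key range (000 to 999) is no larger than the array; the next slides drop that assumption.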

NOW FOR SOMETHING SLIGHTLY
DIFFERENT: SUPPOSE persons IS A
HASH MAP THAT HOLDS UP TO 1000
VALUES. EACH VALUE CONSISTS OF
A 10-DIGIT TELEPHONE NUMBER
(THE KEY), AND A NAME.



persons [9876543210] = "Prashant";
persons [6103301256] = "Barrett";
persons [6103309816] = "Lin";
persons [4153576256] = "Sutey";



WHERE SHOULD THESE VALUES
BE STORED?



To make these keys fit into the table, we take the remainder
modulo the table size, i.e., key % 1000.


9876543210                          210

6103301256                          256

6103309816                          816

4153576256                          256   OOPS! (same index as 6103301256)
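The arithmetic above as a sketch. Because 9876543210 exceeds 2^32 - 1, the keys are held in unsigned long long (at least 64 bits) rather than int; phone_index and has_collision are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index for a 10-digit telephone number: key % table_size.
std::size_t phone_index(unsigned long long key, std::size_t table_size)
{
    assert(table_size > 0);
    return static_cast<std::size_t>(key % table_size);
}

// Returns true if any two keys in the list map to the same index.
bool has_collision(const std::vector<unsigned long long>& keys,
                   std::size_t table_size)
{
    std::vector<bool> used(table_size, false);
    for (std::size_t i = 0; i < keys.size(); ++i) {
        std::size_t index = phone_index(keys[i], table_size);
        if (used[index])
            return true;      // two keys share an index: a collision
        used[index] = true;
    }
    return false;
}
```

The first three numbers land at 210, 256 and 816; adding 4153576256 produces the collision at 256.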




WHEN TWO DIFFERENT KEYS MAP TO
THE SAME INDEX, THAT IS CALLED A
COLLISION.



KEYS THAT MAP TO THE SAME INDEX
ARE CALLED SYNONYMS.



HASHING:

AN ALGORITHM THAT TRANSFORMS
A KEY INTO AN ARRAY INDEX.




THE ALGORITHM HAS TWO PARTS:

1. A HASH FUNCTION: AN EASILY
   COMPUTABLE OPERATION ON THE
   KEY THAT RETURNS AN unsigned
   long, WHICH IS THEN CONVERTED
   INTO AN INDEX IN THE ARRAY
   buckets;

2. A COLLISION HANDLER.
Hash Functions and Hash Tables (§8.2)
 - A hash function h maps keys of a given type to integers in a fixed interval [0, N - 1]
 - Example:
      h(x) = x mod N
   is a hash function for integer keys
 - The integer h(x) is called the hash value of key x

 - A hash table for a given key type consists of
      Hash function h
      Array (called table) of size N

 - When implementing a dictionary with a hash table, the goal is to store item (k, o) at index i = h(k)
Example
 - We design a hash table for a dictionary storing items (SSN, Name), where SSN (social security number) is a nine-digit positive integer
 - Our hash table uses an array of size N = 10,000 and the hash function
      h(x) = last four digits of x

      0   (empty)
      1   025-612-0001
      2   981-101-0002
      3   (empty)
      4   451-229-0004
      …
   9997   (empty)
   9998   200-751-9998
   9999   (empty)
Hash Functions

   A hash function should be quick and easy to
    compute.
   A hash function should achieve an even
    distribution of the keys that actually occur
    across the range of indices for both random
    and non-random data.
   Calculation should involve the entire search
    key.


Examples of Hash Functions
 - Usually involves taking the key, chopping it up, and mixing the pieces together in various ways
 - Examples:
      Truncation: ignore part of the key, and use the remaining part as the index
      Folding: partition the key into several parts and combine the parts in a convenient way (adding, etc.)
 - After calculating the index, use modular arithmetic: divide by the size of the index range, and take the remainder as the result
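A sketch contrasting truncation and folding on the telephone-number keys. The particular folding rule (split the key into 3-digit groups from the right and add them) is an assumption chosen for illustration, and both functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Truncation: keep only the last three digits of the key.
std::size_t truncation_index(unsigned long long key, std::size_t table_size)
{
    assert(table_size > 0);
    return static_cast<std::size_t>((key % 1000) % table_size);
}

// Folding: add the key's 3-digit groups, then take the remainder modulo
// the table size, so every digit of the key affects the index.
std::size_t folding_index(unsigned long long key, std::size_t table_size)
{
    assert(table_size > 0);
    unsigned long long sum = 0;
    while (key > 0) {
        sum += key % 1000;   // next 3-digit group, from the right
        key /= 1000;
    }
    return static_cast<std::size_t>(sum % table_size);
}
```

With 1000 buckets, truncation sends both 6103301256 and 4153576256 to index 256, while this folding rule sends them to 666 (6 + 103 + 301 + 256) and 989 (4 + 153 + 576 + 256).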

Example Hash Function
                                0
hf(22) = 22   22 % 7 = 1        1   tableEntry[1]

                                2
                                3
 hf(4) = 4    4%7=4             4   tableEntry[4]
                                5
                                6

Hash Functions (§8.2.2)

 - A hash function is usually specified as the composition of two functions:
      Hash code map:
         h1: keys → integers
      Compression map:
         h2: integers → [0, N - 1]
 - The hash code map is applied first, and the compression map is applied next on the result, i.e.,
         h(x) = h2(h1(x))
 - The goal of the hash function is to "disperse" the keys in an apparently random way
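A sketch of h(x) = h2(h1(x)) for string keys. The polynomial hash code with multiplier 31 is an assumed, commonly used choice, not one prescribed by these slides; h2 is the mod-N compression map. All three function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// h1 (hash code map): string key -> integer. Arithmetic on unsigned
// long wraps around on overflow, which is harmless for hashing.
unsigned long hash_code(const std::string& key)
{
    unsigned long code = 0;
    for (std::string::size_type i = 0; i < key.size(); ++i)
        code = 31 * code + static_cast<unsigned char>(key[i]);
    return code;
}

// h2 (compression map): any integer -> index in [0, N - 1].
std::size_t compress(unsigned long code, std::size_t N)
{
    assert(N > 0);
    return static_cast<std::size_t>(code % N);
}

// h(x) = h2(h1(x))
std::size_t h(const std::string& key, std::size_t N)
{
    return compress(hash_code(key), N);
}
```

Keeping the two maps separate lets the same hash code serve tables of any size N; only the compression step depends on N.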



HERE IS THE START OF THE
hash_map CLASS:
template<typename Key, typename T,
          typename HashFunc>
class hash_map
{
THE THIRD TEMPLATE PARAMETER
IS A FUNCTION CLASS: A CLASS IN
WHICH THE FUNCTION-CALL
OPERATOR, operator( ), IS
OVERLOADED. THIS IS THE HASH
FUNCTION CLASS.
THE HEADING FOR operator( ) IS
unsigned long operator( ) (const key_type& key)

FOR EXAMPLE, WE CAN DEFINE A
SIMPLE HASH FUNCTION CLASS IF
EACH KEY IS AN int:
class hash_func
{
   public:

        unsigned long operator( ) (const int& key)
        {
             return (unsigned long)key;
        } // overloaded operator( )
}; // class hash_func

HERE IS A PROGRAM WITH A
hash_map CLASS IN WHICH EACH
VALUE CONSISTS OF A TELEPHONE
EXTENSION AND THE PERSON AT
THAT EXTENSION. THE ABOVE
hash_func IS USED.




int main()
{
    typedef hash_map<int, string, hash_func> hash_class;

    hash_class extensions;
    hash_class::iterator itr;

    extensions [5520] = "Yvonne";
    extensions [5415] = "Jim";
    extensions [5416] = "Penny";
    extensions [5537] = "Chun Wai";
    extensions [5273] = "Jim";

    for (itr = extensions.begin(); itr != extensions.end(); itr++)
        cout << (*itr).first << " " << (*itr).second << endl;

    cout << "The number of items is " << extensions.size() << endl;

    if (extensions.find (5537) != extensions.end())
    {
        cout << endl << "At extension " << "5537"
             << " is " << extensions [5537] << endl;
        extensions.erase (extensions.find (5537));
    } // if
    for (itr = extensions.begin( ); itr != extensions.end( ); itr++)
        cout << (*itr).first << " " << (*itr).second << endl;

    return 0;
} // main




HERE IS THE OUTPUT:
5520 Yvonne
5537 Chun Wai
5415 Jim
5416 Penny
5273 Jim
The number of items is 5

At extension 5537 is Chun Wai
5520 Yvonne
5415 Jim
5416 Penny
5273 Jim



THERE IS NO OBVIOUS ORDER OF
THE KEYS. IF THE CONTAINER MUST
ALWAYS BE IN ORDER, USE A map
INSTEAD OF A hash_map.
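A quick way to see the difference, using the extensions from the program above with the standard containers: std::map is ordered by key, while std::unordered_map is the standard hash table. The helper keys_in_iteration_order is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Keys in the order a container's iterators visit them.
template <typename Map>
std::vector<int> keys_in_iteration_order(const Map& m)
{
    std::vector<int> keys;
    for (typename Map::const_iterator it = m.begin(); it != m.end(); ++it)
        keys.push_back(it->first);
    return keys;
}
```

A std::map holding these extensions always yields 5273, 5415, 5416, 5520, 5537; the unordered_map order depends on the library's hash function and bucket count, so no particular order should be relied on.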




AS YOU MIGHT HAVE GUESSED,
HASHING IS INEFFICIENT WHEN
THERE ARE A LOT OF COLLISIONS.




USERS OF THE hash_map CLASS
“HOPE” THAT THE KEYS ARE
SCATTERED RANDOMLY
THROUGHOUT THE TABLE. THIS
HOPE IS FORMALLY STATED AS
FOLLOWS:




THE UNIFORM HASHING ASSUMPTION

EACH KEY IS EQUALLY LIKELY TO
HASH TO ANY ONE OF THE TABLE
ADDRESSES, INDEPENDENTLY OF
WHERE THE OTHER KEYS HAVE
HASHED.



EVEN IF THE UNIFORM HASHING
ASSUMPTION HOLDS, THERE MAY
STILL BE COLLISIONS.
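How likely is a collision? Under the uniform hashing assumption, the i-th key inserted must avoid the i buckets already in use, which gives the familiar birthday-problem product. collision_probability is a hypothetical helper.

```cpp
#include <cassert>

// Probability that n keys hashed uniformly into m buckets produce at least
// one collision: 1 - (m/m) * ((m-1)/m) * ... * ((m-n+1)/m).
double collision_probability(int n, int m)
{
    assert(n >= 0 && m > 0);
    if (n > m)
        return 1.0;                // pigeonhole: a collision is certain
    double no_collision = 1.0;
    for (int i = 0; i < n; ++i)
        no_collision *= static_cast<double>(m - i) / m;
    return 1.0 - no_collision;
}
```

With m = 365 the probability first exceeds 1/2 at n = 23, so collisions are likely long before a table is anywhere near full.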




Collision Handlers

NOW WE'LL LOOK AT SPECIFIC COLLISION
  HANDLERS:
 - Chaining

 - Linear Probing (Open Addressing)

 - Double Hashing

 - Quadratic Hashing




Collision Handling (§8.2.5)
 - Collisions occur when different elements are mapped to the same cell
 - Chaining: let each cell in the table point to a linked list of elements that map there
 - Chaining is simple, but requires additional memory outside the table

     0   (empty)
     1   025-612-0001
     2   (empty)
     3   (empty)
     4   451-229-0004 → 981-101-0004


CHAINING (ALSO CALLED CHAINED
HASHING): AT INDEX i IN buckets,
STORE THE LIST OF ALL VALUES
WHOSE KEYS HASH TO i.

HERE ARE THE FIELDS FOR CHAINED
HASHING:


list <value_type< const key_type, T> >* buckets;
               // at each index in the array buckets,
               // we will store the list of all
               // items whose keys hashed to that index

int count,      // number of items in this hash_map
     length;    // number of buckets in this hash_map
// these two fields are used to calculate the load to
// know when to increase the size of the table

hash_func hash;      // hash is a function object




INSERT VALUES WITH THESE KEYS:

2155551612
7178626358
6103309358
6103309000
7178621359
7178627451
2155554358
6103300451

ASSUME length = 1000. IGNORE 2ND COMPONENT
IN VALUE, IGNORE prev FIELD, USE 'X' TO MARK THE END OF EACH LIST.



       buckets                                             count
                                                             8
  0    6103309000  X
  1    X
 ...
358    7178626358      6103309358      2155554358  X
359    7178621359  X
 ...
451    7178627451      6103300451  X
 ...
612    2155551612  X
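A minimal sketch of chained hashing on these keys, assuming they are stored as unsigned long long (10-digit numbers exceed 32 bits), hash(key) = key % length, and no resizing. ChainedMap is a hypothetical class, not the slides' hash_map; new items are appended to the end of their bucket's list.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

class ChainedMap
{
public:
    typedef std::pair<unsigned long long, std::string> value_type;

    explicit ChainedMap(std::size_t length) : buckets(length), count(0)
    {
        assert(length > 0);
    }

    // Appends a new item to its bucket's list; returns false (and leaves
    // the map unchanged) if the key is already present.
    bool insert(unsigned long long key, const std::string& name)
    {
        std::list<value_type>& chain = buckets[index(key)];
        for (std::list<value_type>::iterator it = chain.begin(); it != chain.end(); ++it)
            if (it->first == key)
                return false;
        chain.push_back(value_type(key, name));
        ++count;
        return true;
    }

    // Searches only the one list that the key hashes to.
    bool contains(unsigned long long key) const
    {
        const std::list<value_type>& chain = buckets[index(key)];
        for (std::list<value_type>::const_iterator it = chain.begin(); it != chain.end(); ++it)
            if (it->first == key)
                return true;
        return false;
    }

    // Removes the item with this key, if any; returns whether one was removed.
    bool erase(unsigned long long key)
    {
        std::list<value_type>& chain = buckets[index(key)];
        for (std::list<value_type>::iterator it = chain.begin(); it != chain.end(); ++it)
            if (it->first == key) {
                chain.erase(it);
                --count;
                return true;
            }
        return false;
    }

    std::size_t size() const { return count; }
    std::size_t chain_length(std::size_t i) const { return buckets[i].size(); }

private:
    std::size_t index(unsigned long long key) const
    {
        return static_cast<std::size_t>(key % buckets.size());
    }

    std::vector<std::list<value_type> > buckets;   // one list per index
    std::size_t count;                             // items in the map
};
```

Inserting the eight keys above into 1000 buckets gives lists of lengths 3, 2, 1, 1 and 1 at indices 358, 451, 0, 359 and 612 (note that 2155554358 % 1000 = 358).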

FOR THE find METHOD,

averageTimeS(n, m) IS CONSTANT,
PROVIDED THE LOAD FACTOR n / m
(n = count, m = length) STAYS BELOW
A FIXED BOUND.
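For reference, the standard textbook analysis of chaining under the uniform hashing assumption, with n items in m buckets and load factor α = n/m, gives the expected number of items examined by find:

```latex
\alpha = \frac{n}{m}, \qquad
E[\text{items examined, unsuccessful}] = \alpha, \qquad
E[\text{items examined, successful}] = 1 + \frac{\alpha}{2} - \frac{\alpha}{2n}
```

Both quantities stay bounded as long as α does, which is why count and length are used to decide when to enlarge the table.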




EVEN IF THE UNIFORM HASHING
ASSUMPTION HOLDS, IT IS POSSIBLE
FOR EACH KEY TO HASH TO THE
SAME INDEX. TO SEARCH THE LIST
AT THAT INDEX TAKES LINEAR-IN-n
TIME.

SO worstTimeS(n, m) IS LINEAR IN n.

THE SAME RESULTS,

CONSTANT AVERAGE TIME AND
LINEAR WORST TIME,

ALSO HOLD FOR insert AND erase.




The next collision handler is Linear Probing
(OPEN-ADDRESS HASHING).



AT MOST ONE VALUE IS STORED AT
EACH INDEX IN buckets.



HERE IS HOW THE unsigned long
RETURNED BY hash_func IS
CONVERTED INTO AN INDEX:

int index = hash (key) % length;

THIS IS DONE IN THE hash_map
CLASS, BECAUSE ONLY THE
hash_map CLASS KNOWS THE
LENGTH OF THE ARRAY.

WHEN A COLLISION OCCURS:

 SEARCH THE TABLE UNTIL AN
 “OPEN” SLOT IN buckets IS FOUND.
THIS IS ALSO KNOWN AS THE
“OFFSET-OF-1” COLLISION HANDLER.




OFFSET-OF-1 COLLISION HANDLER:

IF buckets [index] ALREADY HAS
ANOTHER ELEMENT, TRY

buckets [index + 1], buckets [index + 2], …,

buckets [length - 1], buckets [0],

buckets [1], …, buckets [index - 1].
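A minimal sketch of open addressing with the offset-of-1 handler. It assumes non-negative, distinct int keys hashed by key % length, and it omits erase, which needs the extra bookkeeping described later in these slides. LinearProbingSet is a hypothetical class; insert returns the number of buckets probed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class LinearProbingSet
{
public:
    explicit LinearProbingSet(std::size_t length)
        : keys(length, 0), occupied(length, false), count(0)
    {
        assert(length > 0);
    }

    // Inserts key and returns the number of buckets probed (1 if there was
    // no collision). Precondition: the table is not full.
    int insert(int key)
    {
        assert(count < keys.size());
        std::size_t index = static_cast<std::size_t>(key) % keys.size();
        int probes = 1;
        while (occupied[index]) {               // try index + 1, index + 2, ...
            index = (index + 1) % keys.size();  // wrapping from length - 1 to 0
            ++probes;
        }
        keys[index] = key;
        occupied[index] = true;
        ++count;
        return probes;
    }

    // Returns the index holding key, or -1. The search stops at the first
    // unoccupied bucket, or after every bucket has been probed.
    long find(int key) const
    {
        std::size_t index = static_cast<std::size_t>(key) % keys.size();
        for (std::size_t probes = 0; probes < keys.size() && occupied[index]; ++probes) {
            if (keys[index] == key)
                return static_cast<long>(index);
            index = (index + 1) % keys.size();
        }
        return -1;
    }

private:
    std::vector<int> keys;
    std::vector<bool> occupied;
    std::size_t count;
};
```

Inserting 54, 77, 94, 89, 14, 45, 35, 76 into 11 buckets reproduces the probe counts in the example table that follows: 45 needs 2 probes, 35 needs 3, and 76 needs 7.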
       Hash Table Using Open Probe Addressing Example
       (index = key % 11; the number beside each key is how many buckets were probed to insert it)

index   (a) Insert 54, 77,   (b) Insert 45   (c) Insert 35   (d) Insert 76
            94, 89, 14
  0         77  1              77  1           77  1           77  1
  1         89  1              89  1           89  1           89  1
  2                            45  2           45  2           45  2
  3         14  1              14  1           14  1           14  1
  4                                            35  3           35  3
  5                                                            76  7
  6         94  1              94  1           94  1           94  1
  7
  8
  9
 10         54  1              54  1           54  1           54  1
Linear Probing
 - Open addressing: the colliding item is placed in a different cell of the table
 - Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell
 - Each table cell inspected is referred to as a "probe"
 - Colliding items lump together, so future collisions require a longer sequence of probes

Example:
 - h(x) = x mod 13
 - Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

  index:  0  1  2  3  4  5  6  7  8  9 10 11 12
  key:           41       18 44 59 32 22 31 73
WE NEED TO KNOW WHETHER A SLOT IS EMPTY
OR OCCUPIED.

HOW?

INSTEAD OF JUST T() STORED IN THE
  BUCKETS (BECAUSE T() COULD BE A
  VALID VALUE), THE BUCKET WILL STORE
  AN INSTANCE OF THE VALUE_TYPE
  CLASS.


TO INDICATE WHETHER A LOCATION
IS OCCUPIED, THE value_type CLASS
WILL HAVE

bool occupied;

IN ADDITION TO

T key;


      key     occupied
  0      ?    false

       …      false

 54    1069   true         1069 % 203 = 54
 55     460   true          460 % 203 = 54
 56   1070    true         1070 % 203 = 55

109    312    true          312 % 203 = 109


201    607     true         607 % 203 = 201
202           false

Retrieve

   What about when we want to retrieve?

   Consider the previous example….




       Hash Table Using Open Probe Addressing Example
       (the table after step (d) of the insert example; index = key % 11)

index   key   probes
  0      77     1
  1      89     1
  2      45     2
  3      14     1
  4      35     3
  5      76     7
  6      94     1
  7
  8
  9
 10      54     1

Find the value 35. (% 11)
Now find the value 76.
Now find the value 33.
       Hash Table Using Open Probe Addressing Example (same table, % 11)

Now delete 35.
Now find the value 76.
Now find the value 33.
IN THE SECOND EXAMPLE, SUPPOSE
itr IS POSITIONED AT INDEX 54 AND
THE MESSAGE IS

my_map.erase (itr);




      key     occupied       Erase value 1069.
  0      ?    false

       …      false

 54    1069   true → false   1069 % 203 = 54
 55     460   true           460 % 203 = 54
 56   1070    true          1070 % 203 = 55

109    312    true           312 % 203 = 109


201    607     true          607 % 203 = 201
202           false


                            Now search for 460.
      key     occupied
  0      ?    false

       …      false

 54    1069   false        1069 % 203 = 54
 55     460   true          460 % 203 = 54
 56   1070    true         1070 % 203 = 55

109    312    true          312 % 203 = 109


201    607     true         607 % 203 = 201
202           false

NOW A SEARCH FOR 460 WOULD BE
UNSUCCESSFUL BECAUSE 460
INITIALLY HASHES TO 54, AN
UNOCCUPIED LOCATION, SO THE SEARCH
STOPS BEFORE REACHING INDEX 55.




SOLUTION:
bool marked_for_removal;

THE CONSTRUCTOR FOR
VALUE_TYPE SETS EACH bucket's
marked_for_removal FIELD TO false.
insert SETS marked_for_removal TO
false;
erase SETS marked_for_removal TO
true.
SO AFTER THE INSERTIONS:
      key     occupied  marked_for_removal
 0       ?     false false

       …       false        false

54     1069    true         false   1069 % 203 = 54
55      460    true         false   460 % 203 = 54
56    1070     true         false   1070 % 203 = 55

109    312     true         false   312 % 203 = 109


201    607      true        false   607 % 203 = 201
202            false        false

AFTER DELETING THE VALUE WITH
KEY 1069:




index   key    occupied   marked_for_removal
  0       ?    false      false
  …            false      false
 54     1069   true       true       1069 % 203 = 54
 55      460   true       false       460 % 203 = 54
 56     1070   true       false      1070 % 203 = 55

109      312   true       false       312 % 203 = 109

201      607   true       false       607 % 203 = 201
202            false      false

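FOR find, AN UNSUCCESSFUL SEARCH CANNOT
STOP AT A BUCKET WHOSE marked_for_removal
IS true; IT MUST KEEP PROBING UNTIL IT
REACHES AN UNOCCUPIED BUCKET, WHOSE
buckets[index].marked_for_removal IS false.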




CLUSTER: A SEQUENCE OF CONSECUTIVE
NON-EMPTY LOCATIONS

KEYS THAT HASH TO 54 FOLLOW THE
SAME COLLISION PATH AS KEYS THAT
HASH TO 55, …




index   key    occupied   marked_for_removal
  0       ?    false      false
  …            false      false
 54     1069   true       false      1069 % 203 = 54
 55      460   true       false       460 % 203 = 54
 56     1070   true       false      1070 % 203 = 55

109      312   true       false       312 % 203 = 109

201      607   true       false       607 % 203 = 201
202            false      false

PRIMARY CLUSTERING: THE
PHENOMENON THAT OCCURS WHEN
THE COLLISION HANDLER ALLOWS
CLUSTERS TO KEEP GROWING.


THIS WILL OCCUR WITH AN OFFSET OF 1
OR WITH ANY OTHER CONSTANT OFFSET.


SOLUTION 1: DOUBLE HASHING,
THAT IS, OBTAIN BOTH INDICES
AND OFFSETS BY HASHING:
unsigned long hash_int = hash (key);
int index = hash_int % length,
    offset = hash_int / length;
NOW THE OFFSET DEPENDS ON THE
KEY, SO DIFFERENT KEYS WILL
USUALLY HAVE DIFFERENT
OFFSETS. THAT ELIMINATES PRIMARY
CLUSTERING!
 TO GET A NEW INDEX:
       index = (index + offset) % length;


Notice that if a collision occurs, you
rehash from the NEW index value.



EXAMPLE: length = 11

key    index    offset
15       4        1
19       8        1
16       5        1
58       3        5
27       5        2
35       2        3
30       8        2
47       3        4

buckets (INDICES 0 THROUGH 10) START OUT EMPTY.
WHERE WOULD THESE KEYS GO IN buckets?
index   key
 0      47
 1
 2      35
 3      58
 4      15
 5      16
 6
 7      27
 8      19
 9
10      30

PROBLEM: WHAT IF offset IS A MULTIPLE OF length?
EXAMPLE: length = 11

key    index    offset          index   key
15       4        1               0     47
19       8        1               1
16       5        1               2     35
58       3        5               3     58
27       5        2               4     15
35       2        3               5     16
47       3        4               6
246      4       22               7     27
                                  8     19
                                  9
                                 10     30

KEY 246 HASHES TO INDEX 4, BUT 15 IS AT INDEX 4.
FOR KEY 246, NEW INDEX = (4 + 22) % 11 = 4. OOPS!
SOLUTION:

if (offset % length == 0)
    offset = 1;

ON AVERAGE, offset % length WILL
EQUAL 0 ONLY ABOUT ONCE IN EVERY
length KEYS.

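FINAL PROBLEM: WHAT IF length
HAS SEVERAL FACTORS?
EXAMPLE: length = 20
key    index    offset
20       0        1
25       5        1
30      10        1
35      15        1
110     10        5    // BUT 30 IS AT INDEX 10

FOR KEY 110, NEW INDEX = (10 + 5) % 20 = 15,
WHICH IS OCCUPIED, SO NEW INDEX = (15 + 5) % 20,
WHICH IS OCCUPIED, SO NEW INDEX = ...
THE PROBE SEQUENCE 10, 15, 0, 5, 10, ... VISITS ONLY
4 OF THE 20 BUCKETS, ALL OF THEM OCCUPIED, SO 110 IS
NEVER PLACED EVEN THOUGH 16 BUCKETS ARE EMPTY.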
SOLUTION: MAKE length A PRIME.




THIS VERSION OF OPEN-ADDRESS
HASHING IS FAST. IF THE UNIFORM
HASHING ASSUMPTION HOLDS,
averageTime(n, m) FOR SEARCHING,
INSERTING AND REMOVING IS
CONSTANT, THAT IS, O(1).




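ANOTHER SOLUTION:
QUADRATIC HASHING, THAT IS,
ONCE A COLLISION OCCURS AT h, GO
TO LOCATION h + 1; IF A
COLLISION OCCURS THERE, GO TO
LOCATION h + 4, THEN h + 9, THEN h + 16,
ETC. (ALL % length).
unsigned long hash_int = hash (key);
int h = hash_int % length;
// ON THE i-TH PROBE (i = 1, 2, 3, ...):
int offset = i * i,
    index = (h + offset) % length;
Notice that h stays at the same location: every probe is
measured from the original index h, not from the previous probe.
No primary clustering, although keys with the same h still
follow the same probe path.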
QUADRATIC REHASHING
EXAMPLE: length = 11

key    index    offset    final index
15       4                     4
19       8                     8
16       5                     5
58       3                     3
27       5        1            6
35       2                     2
30       8        1            9
47       3        4            7

  We know that hashing becomes inefficient as
the table fills up. What to do?

             EXPAND!


WHAT ABOUT THE SIZE OF buckets,
AND SHOULD THAT ARRAY EVER BE
RE-SIZED?

RE-SIZE WHENEVER THE LOAD
FACTOR, THE RATIO OF count TO
length, EXCEEDS 0.75.




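TO RE-SIZE, WE WILL DOUBLE THE
OLD CAPACITY, PLUS 1. WHY +1?
ANOTHER OPTION: FIND THE NEXT
PRIME NUMBER AFTER DOUBLING.

NOTE THAT WE RE-SIZE WHENEVER
THE LOAD FACTOR EXCEEDS 0.75. (FOR
CHAINED HASHING, THE LOAD FACTOR IS
THE AVERAGE LIST SIZE; FOR OPEN
ADDRESSING, IT IS THE FRACTION OF
BUCKETS THAT ARE OCCUPIED.)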


IN check_for_expansion, IF count >=
length * 0.75, CREATE A NEW ARRAY
OF DOUBLE THE OLD LENGTH, PLUS 1.
ITERATE THROUGH THE OLD ARRAY AND
RE-HASH EACH VALUE IN IT (SKIPPING
BUCKETS MARKED FOR REMOVAL) INTO THE
NEW ARRAY, USING THE NEW length.
FINALLY, DELETE THE OLD ARRAY.

Performance

   HOW DOES DOUBLE-HASHING
    COMPARE WITH CHAINED HASHING?




Performance of Hashing

 - In the worst case, searches, insertions and removals
   on a hash table take O(n) time
 - The worst case occurs when all the keys inserted
   into the dictionary collide
 - The load factor α = n/N (n keys in a table of N buckets)
   affects the performance of a hash table
 - The expected running time of all the dictionary ADT
   operations in a hash table is O(1)
 - In practice, hashing is very fast provided the load
   factor is not close to 100%
 - Applications of hash tables:
    - small databases
    - compilers
    - browser caches

GROUP EXERCISE: ASSUME THAT length = 13.
INSERT THE FOLLOWING KEYS INTO A HASH
TABLE USING 1) OPEN ADDRESSING WITH AN
OFFSET OF 1, 2) DOUBLE HASHING, AND
3) CHAINING.

20, 33, 49, 22, 26, 140, 38, 9, 7, 3, 0, 1




 Summary Slide 1
• Hash Table
 - simulates the fastest searching technique (knowing the index
   of the required value in a vector or array and using that index
   to access the value) by applying a hash function that converts
   the data to an integer
 - After obtaining an index by dividing the value from the hash
   function by the table size and taking the remainder, access the
   table. Normally, the table size is much smaller than the number
   of possible data values, so collisions occur.
 - To handle collisions, we must place a value that collides with
   an existing table element into the table in such a way that we
   can efficiently access it later.
 Summary Slide 2

• Hash Table (Cont…)
 - average running time for a search of a hash table is O(1)
 - the worst case is O(n)




 Summary Slide 3
• Collision Resolution
 - Types:
   1) linear open probe addressing
         - the table is a vector or array of static size
         - After using the hash function to compute a
           table index, look up the entry in the table.
            - If the values match, perform an update if
               necessary.
            - If the table entry is empty, insert the value in
               the table.

 Summary Slide 4
• Collision Resolution (Cont…)
 - Types:
   1) linear open probe addressing
            - Otherwise, probe forward circularly, looking
               for a match or an empty table slot.
            - If the probe returns to the original starting
               point, the table is full.
         - A search may have to examine table items that
            hashed to different table locations.
         - Deleting an item is difficult.

 Summary Slide 5
• Collision Resolution (Cont…)
  2) chaining with separate lists.
       - the hash table is a vector of list objects
          - Each list is a sequence of colliding items.
       - After applying the hash function to compute
          the table index, search the list for the data
          value.
       - If it is found, update its value; otherwise, insert
          the value at the back of the list.
       - you search only items that collided at the
          same table location
 Summary Slide 6

• Collision Resolution (Cont…)
     - there is no limitation on the number of values
        in the table, and deleting an item from the
        table involves only erasing it from its
        corresponding list





				