# Lists


```
Hash Tables

Briana B. Morrison
Adapted from William Collins
EARLIER SEARCH TECHNIQUES:

Hashing          2
Sequential Search

- Given a vector of integers:
  v = {12, 15, 18, 3, 76, 9, 14, 33, 51, 44}

- What is the best case for sequential search?
  O(1), when the value is the first element.
- What is the worst case?
  O(n), when the value is the last element or is not in the vector.
- What is the average case?
  About n/2 comparisons for a successful search, which is O(n).

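// Sketch (not part of the original slides): counting comparisons to check the
// average case above. Assumes v has no duplicate values. The helper names
// comparisons_to_find and average_successful_comparisons are hypothetical.
#include <cassert>
#include <cstddef>
#include <vector>

// Number of elements examined before value is found. Returns v.size() when
// value is absent, because an unsuccessful search examines every element.
std::size_t comparisons_to_find(const std::vector<int>& v, int value)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        ++count;                      // one comparison against v[i]
        if (v[i] == value)
            return count;
    }
    return count;
}

// Average comparison count over successful searches for every element of v.
// Assumes v is non-empty.
double average_successful_comparisons(const std::vector<int>& v)
{
    std::size_t total = 0;
    for (int value : v)
        total += comparisons_to_find(v, value);
    return static_cast<double>(total) / v.size();
}

// For v = {12, 15, 18, 3, 76, 9, 14, 33, 51, 44} the average is
// (1 + 2 + ... + 10) / 10 = 5.5, that is, (n + 1) / 2. This grows linearly in n.
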
SEQUENTIAL SEARCH IN STL
// Postcondition: if there is an item in the range of iterators
//                from first (inclusive) through last (exclusive)
//                that is equal to value, the iterator returned is
//                the first iterator i in that range such that
//                *i == value. Otherwise, last is returned.
//                The worstTime(n) is O(n).
template <typename InputIterator, typename T>
InputIterator find(InputIterator first, InputIterator last,
                   const T& value)
{
    // Advance until value is found or the range is exhausted.
    while (first != last && *first != value)
        ++first;
    return first;
}
THE worstTimeU(n) IS LINEAR IN n.

DITTO FOR worstTimeS(n),
averageTimeU(n), AND averageTimeS(n).
(U = UNSUCCESSFUL SEARCH, S = SUCCESSFUL SEARCH)

Binary Search

- Given a vector of integers, sorted in ascending order:
  v = {3, 9, 12, 14, 15, 18, 33, 44, 51, 76}

- What is the best case for binary search?
  O(1), when the value is the middle element.
- What is the worst case?
  O(log n), e.g., when the value is the first or last element, or is not in the vector.
- What is the average case?
  O(log n)

Do you remember how binary search works?

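#include <iterator>   // std::iterator_traits

// Essentially the loop used by the STL's lower_bound: returns the first
// position in [first, last) whose item is not less than value.
// Precondition: the range is sorted in ascending order.
template <typename RandomAccessIterator, typename T>
RandomAccessIterator lower_bound(RandomAccessIterator first,
                                 RandomAccessIterator last, const T& value)
{
    typedef typename std::iterator_traits<RandomAccessIterator>::difference_type
        Distance;
    Distance len = last - first;   // number of items still under consideration
    Distance half;
    RandomAccessIterator middle;
    while (len > 0) {
        half = len / 2;
        middle = first + half;
        if (*middle < value) {     // value can only be to the right of middle
            first = middle + 1;
            len = len - half - 1;
        } else                     // value is at middle or to its left
            len = half;
    }
    return first;
}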

THE worstTimeU(n) IS LOGARITHMIC IN
n.

DITTO FOR worstTimeS(n),
averageTimeU(n), AND averageTimeS(n).

Applications

   First let’s consider two applications where
searching is of utmost importance…

- The dictionary ADT models a searchable collection of key-element items.
- The main operations of a dictionary are searching, inserting, and
  deleting items.
- Multiple items with the same key are allowed.
- Applications:
  - address book
  - credit card authorization
  - mapping host names (e.g., cs16.net) to internet addresses
    (e.g., 128.148.34.101)

Dictionary ADT methods:
- find(k): if the dictionary has an item with key k, returns the position
  of this element; else, returns a null position.
- insertItem(k, o): inserts item (k, o) into the dictionary.
- removeElement(k): if the dictionary has an item with key k, removes it
  from the dictionary and returns its element. An error occurs if there
  is no such element.
- size(), isEmpty()
- keys(), elements()

Log File (§8.1.2)

- A log file is a dictionary implemented by means of an unsorted sequence.
- We store the items of the dictionary in a sequence (based on a
  doubly-linked list or a circular array), in arbitrary order.
- Performance:
  - insertItem takes O(1) time, since we can insert the new item at the
    beginning or at the end of the sequence.
  - find and removeElement take O(n) time, since in the worst case (the
    item is not found) we traverse the entire sequence to look for an
    item with the given key.
- The log file is effective only for dictionaries of small size, or for
  dictionaries on which insertions are the most common operations while
  searches and removals are rarely performed
  (e.g., a historical record of logins to a workstation).
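
// A minimal sketch of a log file (an unsorted-sequence dictionary). It assumes
// std::list as the doubly-linked sequence, int keys, and string elements.
// The class name LogFile and its methods mirror the ADT above, but they are
// illustrative and not a library API. Unlike the ADT, removeElement here
// returns a bool instead of the removed element.
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>

class LogFile {
public:
    // O(1): append the new item at the end of the sequence.
    void insertItem(int key, const std::string& element)
    {
        items.push_back(std::make_pair(key, element));
    }

    // O(n) worst case: an absent key forces a scan of the whole sequence.
    // Returns a pointer to the element, or nullptr as the "null position".
    const std::string* find(int key) const
    {
        for (const auto& item : items)
            if (item.first == key)
                return &item.second;
        return nullptr;
    }

    // O(n) to locate the item, then O(1) to unlink it from the list.
    bool removeElement(int key)
    {
        for (auto it = items.begin(); it != items.end(); ++it)
            if (it->first == key) {
                items.erase(it);
                return true;
            }
        return false;   // the ADT would report an error here
    }

    std::size_t size() const { return items.size(); }
    bool isEmpty() const { return items.empty(); }

private:
    std::list<std::pair<int, std::string>> items;   // insertion order only
};
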

NOW LET'S FOCUS ON AN UNUSUAL
BUT VERY EFFICIENT SEARCH
TECHNIQUE:

HASHING

THE CLASS IN WHICH HASHING IS
IMPLEMENTED IS THE hash_map
CLASS. THIS CLASS IS NOT PART OF THE
STANDARD TEMPLATE LIBRARY (C++11 LATER
ADDED A SIMILAR HASHED CONTAINER,
std::unordered_map).

HERE ARE THE METHOD
INTERFACES FOR THE hash_map
CLASS:

1. // Postcondition: this hash_map is empty.
hash_map( );

2. // Postcondition: the number of items in this hash_map
//               has been returned.
int size( );

3. // Postcondition: If an item with x's key had already been
//                 inserted into this hash_map, the pair
//                 returned consists of an iterator positioned
//                 at the previously inserted item, and false.
//                 Otherwise, the pair returned consists of
//                an iterator positioned at the newly inserted
//                 item, and true. Timing estimates are
//                 discussed later.
pair<iterator, bool> insert (
const value_type<const key_type, T>& x);

4. // Postcondition: if this hash_map already contains a value
//                whose key part is key, a reference to that
//                value's second component has been
//                returned. Otherwise, a new value, <key,
//                T( )>, is inserted into this hash_map. Timing
//                 estimates are discussed later.
T& operator[ ] (const key_type& key);

5. // Postcondition: If this hash_map contains a value whose
//                 first component equals key, an iterator
//                 positioned at that value has been returned.
//                 Otherwise, an iterator at the same
//                  position as end() has been returned.
//                Timing estimates are discussed later.
iterator find (const key_type& key);

6. // Precondition: itr is positioned at a value in this hash_map.
// Postcondition: the value that itr is positioned at has been
//                deleted from this hash_map. Timing
//                estimates are discussed later in this chapter.
void erase (iterator itr);

7. // Postcondition: an iterator positioned at the beginning
//                 of this hash_map has been returned.
//                 Timing estimates are discussed later.
iterator begin( );

8. // Postcondition: an iterator has been returned that can be
//                used in comparisons to terminate iterating
//                through this hash_map.
iterator end( );

9. // Postcondition: the space for this hash_map object has
//                 been deallocated.
~hash_map( );

WE'LL STUDY THE TIME ESTIMATES
AFTER WE DEFINE THE METHODS.
BUT BASICALLY, FOR find, insert, AND
erase,

averageTime(n) IS CONSTANT!
How should we implement:
CONTIGUOUS
array? vector? deque? heap?

BUT NONE OF THESE WILL GIVE
CONSTANT AVERAGE TIME FOR
SEARCHES, INSERTIONS AND
REMOVALS.

HERE IS THE BASIC IDEA:

buckets   // an array of values

count     // the number of values in the hash_map

LET'S SEE WHERE THAT LEADS.
SUPPOSE persons IS A HASH MAP
THAT WILL HOLD UP TO 1000
VALUES. EACH VALUE CONSISTS
OF A UNIQUE 3-DIGIT INTEGER (THE
KEY), AND A NAME.

buckets       count

0
1
2
...
999

persons [351] = "Prashant";

persons [108] = "Barrett";

persons [435] = "Lin";

WHERE SHOULD WE STORE THE
VALUE WHOSE KEY IS 351?

index   buckets             count = 3
0       ?     ?
...
108     108   Barrett
...
351     351   Prashant
...
435     435   Lin
...
999
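
// A sketch of the idea on this slide. It assumes every key is an integer in
// 0..999, so there is one bucket per possible key and the key itself is the
// index. No search is needed. DirectTable and TABLE_SIZE are hypothetical
// names, and an empty string is assumed to mean "slot not in use".
#include <array>
#include <cassert>
#include <string>

const int TABLE_SIZE = 1000;   // one slot per 3-digit key

struct DirectTable {
    std::array<std::string, TABLE_SIZE> buckets;   // "" marks an empty slot
    int count = 0;                                 // number of occupied slots

    // Store name at index key. This is O(1) because no probing is needed.
    void put(int key, const std::string& name)
    {
        if (buckets[key].empty())
            ++count;
        buckets[key] = name;
    }

    // Read the name at index key. Returns "" if nothing is stored there.
    const std::string& get(int key) const { return buckets[key]; }
};

// This works only because the key range (1000 values) matches the table
// size. The 10-digit telephone keys on the next slides break that assumption.
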

NOW FOR SOMETHING SLIGHTLY
DIFFERENT: SUPPOSE persons IS A
HASH MAP THAT HOLDS UP TO 1000
VALUES. EACH VALUE CONSISTS OF
A 10-DIGIT TELEPHONE NUMBER
(THE KEY), AND A NAME.

persons [9876543210] = "Prashant";
persons [6103301256] = "Barrett";
persons [6103309816] = "Lin";
persons [4153576256] = "Sutey";

WHERE SHOULD THESE VALUES
BE STORED?

To make these values fit into the table, we need to mod by
the table size; i.e., key % 1000.

9876543210                          210

6103301256                          256

6103309816                          816

4153576256                          256   OOPS! (SAME INDEX AS 6103301256)
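
// A quick check of the arithmetic above (illustrative only). Each 10-digit
// key is reduced modulo the table size, and the first collision is reported.
// Keys need a 64-bit type because 9876543210 does not fit in a 32-bit int.
// The helper name first_colliding_key is hypothetical.
#include <cassert>
#include <map>
#include <vector>

// Returns the first key whose index (key % table_size) is already taken,
// or -1 if every key lands in a different slot.
long long first_colliding_key(const std::vector<long long>& keys, long long table_size)
{
    std::map<long long, long long> used;   // index -> key stored there
    for (long long key : keys) {
        long long index = key % table_size;
        if (used.count(index))
            return key;                    // key is a synonym of used[index]
        used[index] = key;
    }
    return -1;
}
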

WHEN TWO DIFFERENT KEYS MAP TO
THE SAME INDEX, THAT IS CALLED A
COLLISION.

KEYS THAT MAP TO THE SAME INDEX
ARE CALLED SYNONYMS.

HASHING:

AN ALGORITHM THAT TRANSFORMS
A KEY INTO AN ARRAY INDEX.

THE ALGORITHM HAS TWO PARTS:

1. A HASH FUNCTION: AN EASILY
COMPUTABLE OPERATION ON THE
KEY THAT RETURNS AN unsigned
long, WHICH IS THEN CONVERTED
INTO AN INDEX IN THE ARRAY
buckets;

2. A COLLISION HANDLER.
Hash Functions and Hash Tables (§8.2)

- A hash function h maps keys of a given type to integers in a
  fixed interval [0, N - 1].
- Example:
      h(x) = x mod N
  is a hash function for integer keys.
- The integer h(x) is called the hash value of key x.
- A hash table for a given key type consists of
  - a hash function h
  - an array (called table) of size N
- When implementing a dictionary with a hash table, the goal is to
  store item (k, o) at index i = h(k).
Example

- We design a hash table for a dictionary storing items (SSN, Name),
  where SSN (social security number) is a nine-digit positive integer.
- Our hash table uses an array of size N = 10,000 and the hash function
  h(x) = last four digits of x

index   item
0       (empty)
1       025-612-0001
2       981-101-0002
3       (empty)
4       451-229-0004
...
9997    (empty)
9998    200-751-9998
9999    (empty)

Hash Functions

- A hash function should be quick and easy to compute.
- A hash function should achieve an even distribution of the keys that
  actually occur across the range of indices, for both random and
  non-random data.
- The calculation should involve the entire search key.

Examples of Hash Functions
- Usually involves taking the key, chopping it up, and mixing the pieces
  together in various ways.
- Examples:
  - Truncation: ignore part of the key and use the remaining part as
    the index.
  - Folding: partition the key into several parts and combine the parts
    in a convenient way (adding, etc.).
- After calculating the index, use modular arithmetic: divide by the size
  of the index range, and take the remainder as the result.
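
// A sketch of truncation and folding for 10-digit telephone keys. It assumes
// the layout 3-digit area code, 3-digit exchange, 4-digit line number.
// The function names truncate_hash and fold_hash are hypothetical, and the
// table size is passed in as a parameter.
#include <cassert>

// Truncation: ignore the leading digits and keep key % table_size.
unsigned long long truncate_hash(unsigned long long key, unsigned long long table_size)
{
    return key % table_size;
}

// Folding: split the key into its three parts, add them, then reduce the
// sum modulo the table size so that every digit affects the index.
unsigned long long fold_hash(unsigned long long key, unsigned long long table_size)
{
    unsigned long long line     = key % 10000;           // last 4 digits
    unsigned long long exchange = (key / 10000) % 1000;  // middle 3 digits
    unsigned long long area     = key / 10000000;        // first 3 digits
    return (area + exchange + line) % table_size;
}

// With table size 1000, truncation sends 6103301256 and 4153576256 to 256.
// Folding gives 610 + 330 + 1256 = 2196 -> 196 and 415 + 357 + 6256 = 7028 -> 28.
// Folding is not collision-free in general; it just uses more of the key.
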

Example Hash Function (table size 7, indices 0 through 6)

hf(22) = 22     22 % 7 = 1     stored in tableEntry[1]
hf(4)  = 4      4 % 7 = 4      stored in tableEntry[4]

Hash Functions (§8.2.2)

- A hash function is usually specified as the composition of two
  functions:
      Hash code map:    h1: keys -> integers
      Compression map:  h2: integers -> [0, N - 1]
- The hash code map is applied first, and the compression map is
  applied next on the result, i.e.,
      h(x) = h2(h1(x))
- The goal of the hash function is to "disperse" the keys in an
  apparently random way.
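
// A sketch of h(x) = h2(h1(x)). It assumes std::hash<std::string> as the
// hash code map h1 and division by a table size N as the compression map h2.
// The function names are hypothetical. std::hash values are
// implementation-specific, so only the range of the result is meaningful
// here, not particular indices.
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// h1: keys -> integers
std::size_t hash_code(const std::string& key)
{
    return std::hash<std::string>{}(key);
}

// h2: integers -> [0, N - 1]
std::size_t compress(std::size_t code, std::size_t N)
{
    return code % N;
}

// h(x) = h2(h1(x))
std::size_t table_index(const std::string& key, std::size_t N)
{
    return compress(hash_code(key), N);
}
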

HERE IS THE START OF THE
hash_map CLASS:
template<typename Key, typename T,
typename HashFunc>
class hash_map
{
THE THIRD TEMPLATE PARAMETER
IS A FUNCTION CLASS: A CLASS IN
WHICH THE FUNCTION-CALL
OPERATOR, operator( ), IS
OVERLOADED. THIS IS THE HASH
FUNCTION CLASS.
THE HEADING FOR operator( ) IS
unsigned long operator( ) (const key_type& key)

FOR EXAMPLE, WE CAN DEFINE A
SIMPLE HASH FUNCTION CLASS IF
EACH KEY IS AN int:
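class hash_func
{
public:
    // Uses the int key itself as the hash value; the hash_map
    // converts it into an index with % length.
    unsigned long operator( ) (const int& key) const
    {
        return (unsigned long)key;
    } // overloaded operator( )
}; // class hash_func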

HERE IS A PROGRAM WITH A
hash_map CLASS IN WHICH EACH
VALUE CONSISTS OF A TELEPHONE
EXTENSION AND THE PERSON AT
THAT EXTENSION. THE ABOVE
hash_func IS USED.

int main()
{
    // Assumes the hash_map and hash_func classes above, plus
    // <iostream> and <string> with using namespace std.
    typedef hash_map<int, string, hash_func> hash_class;

    hash_class extensions;
    hash_class::iterator itr;

    extensions [5520] = "Yvonne";
    extensions [5415] = "Jim";
    extensions [5416] = "Penny";
    extensions [5537] = "Chun Wai";
    extensions [5273] = "Jim";

    for (itr = extensions.begin(); itr != extensions.end(); itr++)
        cout << (*itr).first << " " << (*itr).second << endl;

    cout << "The number of items is " << extensions.size() << endl;

    if (extensions.find (5537) != extensions.end())
    {
        cout << endl << "At extension " << "5537"
             << " is " << extensions [5537] << endl;
        extensions.erase (extensions.find (5537));
    } // if
    for (itr = extensions.begin( ); itr != extensions.end( ); itr++)
        cout << (*itr).first << " " << (*itr).second << endl;

    return 0;
} // main

HERE IS THE OUTPUT:
5520 Yvonne
5537 Chun Wai
5415 Jim
5416 Penny
5273 Jim
The number of items is 5

At extension 5537 is Chun Wai
5520 Yvonne
5415 Jim
5416 Penny
5273 Jim
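
// For comparison, a sketch of the same extensions program written against
// std::unordered_map (C++11), the standard library's hashed container.
// This is an assumed modern equivalent, not the hash_map class from these
// slides. The names extension_map, make_extensions and print_all are
// hypothetical. As with hash_map, iteration order is unspecified, so the
// printed order may differ from the listing above.
#include <cassert>
#include <iostream>
#include <string>
#include <unordered_map>

typedef std::unordered_map<int, std::string> extension_map;

// Builds the five-entry table used in the program above.
extension_map make_extensions()
{
    extension_map extensions;
    extensions[5520] = "Yvonne";
    extensions[5415] = "Jim";
    extensions[5416] = "Penny";
    extensions[5537] = "Chun Wai";
    extensions[5273] = "Jim";
    return extensions;
}

// Prints every (extension, name) pair in whatever order the table holds them.
void print_all(const extension_map& extensions)
{
    for (const auto& entry : extensions)
        std::cout << entry.first << " " << entry.second << std::endl;
}
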

THERE IS NO OBVIOUS ORDER OF
THE KEYS. IF THE CONTAINER MUST
ALWAYS BE IN ORDER, USE A map
INSTEAD OF A hash_map.

AS YOU MIGHT HAVE GUESSED,
HASHING IS INEFFICIENT WHEN
THERE ARE A LOT OF COLLISIONS.

USERS OF THE hash_map CLASS
“HOPE” THAT THE KEYS ARE
SCATTERED RANDOMLY
THROUGHOUT THE TABLE. THIS
HOPE IS FORMALLY STATED AS
FOLLOWS:

THE UNIFORM HASHING ASSUMPTION

EACH KEY IS EQUALLY LIKELY TO
HASH TO ANY ONE OF THE TABLE
ADDRESSES, INDEPENDENTLY OF WHERE
THE OTHER KEYS HAVE HASHED.

EVEN IF THE UNIFORM HASHING
ASSUMPTION HOLDS, THERE MAY
STILL BE COLLISIONS.
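
// A worked check of this claim under the uniform hashing assumption. The
// chance that n keys all land in distinct slots of an m-slot table is
// (m - 1)/m * (m - 2)/m * ... * (m - n + 1)/m, and a collision is the
// complement. The function name collision_probability is hypothetical.
#include <cassert>

// Probability that at least two of n keys share a slot among m slots,
// assuming each key independently hashes to any slot with probability 1/m.
double collision_probability(int n, int m)
{
    if (n > m)
        return 1.0;   // pigeonhole: a collision is certain
    double all_distinct = 1.0;
    for (int i = 1; i < n; ++i)
        all_distinct *= static_cast<double>(m - i) / m;   // key i avoids i used slots
    return 1.0 - all_distinct;
}

// With 23 keys and 365 slots the probability is already about 0.507. Even a
// mostly empty table therefore needs a collision handler.
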

Collision Handlers

NOW WE'LL LOOK AT SPECIFIC COLLISION
HANDLERS:
- Chaining
- Linear Probing (Open Addressing)
- Double Hashing

Collision Handling (§8.2.5)

- Collisions occur when different elements are mapped to the same cell.
- Chaining: let each cell in the table point to a linked list of
  elements that map there.

index   chain
0       (empty)
1       025-612-0001
2       (empty)
3       (empty)
4       451-229-0004 -> 981-101-0004

- Chaining is simple, but requires additional memory outside the table.

CHAINING (ALSO CALLED CHAINED
HASHING): AT INDEX i IN buckets,
STORE THE LIST OF ALL VALUES
WHOSE KEYS HASH TO i.

HERE ARE THE FIELDS FOR CHAINED
HASHING:

list <value_type< const key_type, T> >* buckets;
// at each index in the array buckets,
// we will store the list of all
// items whose keys hashed to that index

int count,      // number of items in this hash_map
length;    // number of buckets in this hash_map
// these two fields are used to calculate the load to
// know when to increase the size of the table

hash_func hash;      // hash is a function object

INSERT VALUES WITH THESE KEYS:

2155551612
7178626358
6103309358
6103309000
7178621359
7178627451
2155554358
6103300451

ASSUME length = 1000. IGNORE 2ND COMPONENT
IN VALUE, IGNORE prev FIELD, USE 'X' TO MARK THE END OF EACH LIST.

buckets                                                count = 8

0      6103309000 X
1      X
...
358    7178626358 -> 6103309358 -> 2155554358 X
359    7178621359 X
...
451    7178627451 -> 6103300451 X
...
612    2155551612 X
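
// A minimal sketch of chained hashing that reproduces the example above. It
// assumes unsigned long long keys and the identity hash reduced modulo length
// (length = 1000 in the example). The class ChainedTable and its methods are
// hypothetical. Each bucket is a std::list, and a new key is appended at the
// back of its chain.
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

class ChainedTable {
public:
    explicit ChainedTable(std::size_t length) : buckets(length), count(0) {}

    // Appends key to its chain unless it is already present.
    bool insert(unsigned long long key)
    {
        std::list<unsigned long long>& chain = buckets[index_of(key)];
        for (unsigned long long k : chain)
            if (k == key)
                return false;
        chain.push_back(key);
        ++count;
        return true;
    }

    // Searches only the chain at the key's index.
    bool find(unsigned long long key) const
    {
        for (unsigned long long k : buckets[index_of(key)])
            if (k == key)
                return true;
        return false;
    }

    // Removes key from its chain. Returns true if it was present.
    bool erase(unsigned long long key)
    {
        std::list<unsigned long long>& chain = buckets[index_of(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it)
            if (*it == key) {
                chain.erase(it);
                --count;
                return true;
            }
        return false;
    }

    std::size_t size() const { return count; }
    const std::list<unsigned long long>& chain_at(std::size_t index) const
    {
        return buckets[index];
    }

private:
    std::size_t index_of(unsigned long long key) const { return key % buckets.size(); }

    std::vector<std::list<unsigned long long>> buckets;   // one chain per index
    std::size_t count;                                    // number of keys stored
};
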

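FOR THE find METHOD, IF THE UNIFORM
HASHING ASSUMPTION HOLDS AND THE LOAD
FACTOR count / length IS BOUNDED BY A CONSTANT,

averageTimeS(n, m) IS CONSTANT.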

EVEN IF THE UNIFORM HASHING
ASSUMPTION HOLDS, IT IS POSSIBLE
FOR EACH KEY TO HASH TO THE
SAME INDEX. TO SEARCH THE LIST
AT THAT INDEX TAKES LINEAR-IN-n
TIME.

SO worstTimeS(n, m) IS LINEAR IN n.

THE SAME RESULTS,

CONSTANT AVERAGE TIME AND
LINEAR WORST TIME,

ALSO HOLD FOR insert AND erase.

The next collision handler is Linear Probing

AT MOST ONE VALUE IS STORED AT
EACH INDEX IN buckets.

HERE IS HOW THE unsigned long
RETURNED BY hash_func IS
CONVERTED INTO AN INDEX:

int index = hash (key) % length;   // hash is the hash_func object

THIS IS DONE IN THE HASH_MAP
CLASS, BECAUSE ONLY THE
HASH_MAP CLASS KNOWS THE
LENGTH OF THE ARRAY.

WHEN COLLISION OCCURS:

SEARCH THE TABLE UNTIL AN
“OPEN” SLOT IN buckets IS FOUND.
THIS IS ALSO KNOWN AS “OFFSET-
OF-1” COLLISION HANDLER.

OFFSET-OF-1 COLLISION HANDLER:

IF buckets [index] ALREADY HAS
ANOTHER ELEMENT, TRY

buckets [index + 1], buckets [index + 2], …,

buckets [length - 1], buckets [0],

buckets [1], …, buckets [index - 1].
Hash Table Using Open Probe Addressing: Insert Example
(table size 11, index = key % 11; each cell shows the key and the
number of probes needed to insert it)

index   (a)       (b)       (c)       (d)
0       77  1     77  1     77  1     77  1
1       89  1     89  1     89  1     89  1
2                 45  2     45  2     45  2
3       14  1     14  1     14  1     14  1
4                           35  3     35  3
5                                     76  7
6       94  1     94  1     94  1     94  1
7
8
9
10      54  1     54  1     54  1     54  1

(a) Insert 54, 77, 94, 89, 14   (b) Insert 45   (c) Insert 35   (d) Insert 76
Linear Probing

- Open addressing: the colliding item is placed in a different cell
  of the table.
- Linear probing handles collisions by placing the colliding item in
  the next (circularly) available table cell.
- Each table cell inspected is referred to as a "probe".
- Colliding items lump together, so future collisions cause longer
  sequences of probes.

Example:
- h(x) = x mod 13
- Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

index   0   1   2   3   4   5   6   7   8   9   10  11  12
key             41          18  44  59  32  22  31  73
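
// A minimal sketch of open addressing with linear probing (offset of 1). It
// assumes non-negative int keys, the hash h(x) = x mod N, and a fixed-size
// table with no deletions. LinearProbeTable and EMPTY_SLOT are hypothetical
// names. insert reports how many probes it used, so both examples above can
// be checked.
#include <cassert>
#include <vector>

const int EMPTY_SLOT = -1;   // sentinel: keys are assumed non-negative

class LinearProbeTable {
public:
    explicit LinearProbeTable(int N) : slots(N, EMPTY_SLOT) {}

    // Inserts key and returns the number of probes used. Returns -1 if the
    // probe sequence wrapped all the way around (the table is full).
    int insert(int key)
    {
        int N = static_cast<int>(slots.size());
        int index = key % N;
        for (int probes = 1; probes <= N; ++probes) {
            if (slots[index] == EMPTY_SLOT) {
                slots[index] = key;
                return probes;
            }
            index = (index + 1) % N;   // try the next cell, circularly
        }
        return -1;
    }

    // Returns the index holding key, or -1. The search stops at an empty cell.
    int find(int key) const
    {
        int N = static_cast<int>(slots.size());
        int index = key % N;
        for (int probes = 1; probes <= N && slots[index] != EMPTY_SLOT; ++probes) {
            if (slots[index] == key)
                return index;
            index = (index + 1) % N;
        }
        return -1;
    }

    int at(int index) const { return slots[index]; }

private:
    std::vector<int> slots;
};
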

WE NEED TO KNOW WHETHER A SLOT IS
OCCUPIED OR EMPTY.

HOW?

INSTEAD OF STORING JUST T() IN EMPTY
BUCKETS (BECAUSE T() COULD BE A
VALID VALUE), EACH BUCKET WILL STORE
AN INSTANCE OF THE value_type
CLASS.

TO INDICATE WHETHER A LOCATION
IS OCCUPIED, THE value_type CLASS
WILL HAVE

bool occupied;

T key;

index   key    occupied
0       ?      false
...            false
54      1069   true        1069 % 203 = 54
55      460    true        460 % 203 = 54
56      1070   true        1070 % 203 = 55
...
109     312    true        312 % 203 = 109
...
201     607    true        607 % 203 = 201
202            false

Retrieve

- What about when we want to retrieve a value?

- Consider the previous example.

Hash Table Using Open Probe Addressing
(final table (d) from the insert example; index = key % 11;
each cell shows the key and the number of probes)

index   key   probes
0       77    1
1       89    1
2       45    2
3       14    1
4       35    3
5       76    7
6       94    1
7
8
9
10      54    1

Find the value 35. (% 11)
Now find the value 76.
Now find the value 33.

Hash Table Using Open Probe Addressing
(same table (d); index = key % 11; each cell shows the key and the
number of probes)

index   key   probes
0       77    1
1       89    1
2       45    2
3       14    1
4       35    3
5       76    7
6       94    1
7
8
9
10      54    1

Now delete 35. (% 11)
Now find the value 76.
Now find the value 33.

IN THE SECOND EXAMPLE, SUPPOSE
itr IS POSITIONED AT INDEX 54 AND
THE MESSAGE IS

my_map.erase (itr);

Erase value 1069.

index   key    occupied
0       ?      false
...            false
54      1069   true -> false    1069 % 203 = 54
55      460    true             460 % 203 = 54
56      1070   true             1070 % 203 = 55
...
109     312    true             312 % 203 = 109
...
201     607    true             607 % 203 = 201
202            false

Now search for 460.

index   key    occupied
0       ?      false
...            false
54      1069   false       1069 % 203 = 54
55      460    true        460 % 203 = 54
56      1070   true        1070 % 203 = 55
...
109     312    true        312 % 203 = 109
...
201     607    true        607 % 203 = 201
202            false

NOW A SEARCH FOR 460 WOULD BE
UNSUCCESSFUL BECAUSE 460
INITIALLY HASHES TO 54, AN
UNOCCUPIED LOCATION.

SOLUTION:
bool marked_for_removal;

THE CONSTRUCTOR FOR
VALUE_TYPE SETS EACH bucket's
marked_for_removal FIELD TO false.
insert SETS marked_for_removal TO
false;
erase SETS marked_for_removal TO
true.
SO AFTER THE INSERTIONS:
index   key    occupied   marked_for_removal
0       ?      false      false
...            false      false
54      1069   true       false        1069 % 203 = 54
55      460    true       false        460 % 203 = 54
56      1070   true       false        1070 % 203 = 55
...
109     312    true       false        312 % 203 = 109
...
201     607    true       false        607 % 203 = 201
202            false      false

AFTER DELETING THE VALUE WITH
KEY 1069:

index   key    occupied   marked_for_removal
0       ?      false      false
...            false      false
54      1069   true       true         1069 % 203 = 54
55      460    true       false        460 % 203 = 54
56      1070   true       false        1070 % 203 = 55
...
109     312    true       false        312 % 203 = 109
...
201     607    true       false        607 % 203 = 201
202            false      false

FOR find, AN UNSUCCESSFUL SEARCH CANNOT
STOP AT A BUCKET WHOSE marked_for_removal
IS true; IT STOPS ONLY AT A BUCKET THAT IS
NOT occupied (SO ITS marked_for_removal IS false).
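
// A sketch of lazy deletion for linear probing, following the occupied /
// marked_for_removal idea above. Assumptions: non-negative int keys,
// h(x) = x mod length, offset-of-1 probing, and a table that never becomes
// completely full. The struct Bucket and class ProbeTable are hypothetical
// names, not the slides' value_type class.
#include <cassert>
#include <vector>

struct Bucket {
    int key = 0;
    bool occupied = false;            // has this slot ever held a key?
    bool marked_for_removal = false;  // was that key erased?
};

class ProbeTable {
public:
    explicit ProbeTable(int length) : buckets(length) {}

    // Returns the index of key, or -1. Erased slots are stepped over, and
    // the search stops only at a slot that was never occupied.
    int find(int key) const
    {
        int length = static_cast<int>(buckets.size());
        int index = key % length;
        for (int probes = 0; probes < length && buckets[index].occupied; ++probes) {
            if (!buckets[index].marked_for_removal && buckets[index].key == key)
                return index;
            index = (index + 1) % length;
        }
        return -1;
    }

    // Inserts key into the first slot that is empty or marked for removal.
    // Assumes key is not already present (duplicate checks are omitted).
    void insert(int key)
    {
        int length = static_cast<int>(buckets.size());
        int index = key % length;
        while (buckets[index].occupied && !buckets[index].marked_for_removal)
            index = (index + 1) % length;
        buckets[index].key = key;
        buckets[index].occupied = true;
        buckets[index].marked_for_removal = false;
    }

    // Lazy deletion: the slot stays occupied but is marked for removal,
    // so later searches still probe past it.
    void erase(int index) { buckets[index].marked_for_removal = true; }

    const Bucket& at(int index) const { return buckets[index]; }

private:
    std::vector<Bucket> buckets;
};
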

CLUSTER: A SEQUENCE OF CONSECUTIVE NON-EMPTY
LOCATIONS

KEYS THAT HASH TO 54 FOLLOW THE
SAME COLLISION-PATH AS KEYS THAT
HASH TO 55, …

index   key    occupied   marked_for_removal
0       ?      false      false
...            false      false
54      1069   true       false        1069 % 203 = 54
55      460    true       false        460 % 203 = 54
56      1070   true       false        1070 % 203 = 55
...
109     312    true       false        312 % 203 = 109
...
201     607    true       false        607 % 203 = 201
202            false      false

PRIMARY CLUSTERING: THE
PHENOMENON THAT OCCURS WHEN
THE COLLISION HANDLER ALLOWS
THE GROWTH OF CLUSTERS TO
ACCUMULATE.

THIS WILL OCCUR WITH OFFSET-OF-
1 OR ANY CONSTANT OFFSET.

SOLUTION 1: DOUBLE HASHING,
THAT IS, OBTAIN BOTH INDICES
AND OFFSETS BY HASHING:
unsigned long hash_int = hash (key);
int index = hash_int % length,
offset = hash_int / length;
NOW THE OFFSET DEPENDS ON THE
KEY, SO DIFFERENT KEYS WILL
USUALLY HAVE DIFFERENT
OFFSETS, SO NO MORE PRIMARY
CLUSTERING!
TO GET A NEW INDEX:
index = (index + offset) % length;

Notice that if a collision occurs, you
rehash from the NEW index value.

EXAMPLE: length = 11

key      index     offset
15        4        1
19        8        1
16        5        1
58        3        5
27        5        2
35        2        3
30        8        2
47        3        4

WHERE WOULD THESE KEYS GO IN buckets (INDICES 0 THROUGH 10)?
index   key
0      47
1
2      35
3      58
4      15
5      16
6
7      27
8      19
9
10      30
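
// A sketch of double hashing as described on these slides. It computes
// index = key % length and offset = key / length, replaces an offset that is
// a multiple of length with 1, and moves each collision to
// (index + offset) % length. Assumptions: non-negative int keys used as their
// own hash values, and no deletions. DoubleHashTable and EMPTY_SLOT are
// hypothetical names.
#include <cassert>
#include <vector>

const int EMPTY_SLOT = -1;   // keys are assumed non-negative

class DoubleHashTable {
public:
    explicit DoubleHashTable(int length) : buckets(length, EMPTY_SLOT) {}

    // Returns the index where key was placed, or -1 if no free slot was
    // reached within length probes.
    int insert(int key)
    {
        int length = static_cast<int>(buckets.size());
        int index = key % length;
        int offset = key / length;
        if (offset % length == 0)
            offset = 1;   // an offset that is a multiple of length never moves
        for (int probes = 0; probes < length; ++probes) {
            if (buckets[index] == EMPTY_SLOT) {
                buckets[index] = key;
                return index;
            }
            index = (index + offset) % length;   // rehash from the new index
        }
        return -1;
    }

    int at(int index) const { return buckets[index]; }

private:
    std::vector<int> buckets;
};

// With length = 11 the keys land as in the table above. With length = 20
// (not prime), key 110 cycles through 10, 15, 0, 5 and insert returns -1
// once those four slots are full. This is why a prime length is recommended.
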

PROBLEM: WHAT IF OFFSET IS A MULTIPLE OF length?
EXAMPLE: length = 11

key       index     offset
15          4        1
19          8        1
16          5        1
58          3        5
27          5        2
35          2        3
30          8        2
47          3        4
246         4        22      // BUT 15 IS AT INDEX 4

index   key
0       47
1
2       35
3       58
4       15
5       16
6
7       27
8       19
9
10      30

FOR KEY 246, NEW INDEX = (4 + 22) % 11 = 4. OOPS!
SOLUTION:

if (offset % length == 0)
    offset = 1;

ON AVERAGE, offset % length WILL
EQUAL 0 ONLY ONCE IN EVERY
length TIMES.

FINAL PROBLEM: WHAT IF length
HAS SEVERAL FACTORS?
EXAMPLE: length = 20
key    index       offset
20      0            1
25      5            1
30    10             1
35    15             1
110   10             5 // BUT 30 IS AT INDEX 10

FOR KEY 110, NEW INDEX = (10 + 5) % 20 = 15,
WHICH IS OCCUPIED, SO NEW INDEX = (15 + 5)
% 20, WHICH IS OCCUPIED, SO NEW INDEX = ...
SOLUTION: MAKE length A PRIME.

THIS VERSION OF OPEN-ADDRESS
HASHING IS FAST. IF THE UNIFORM
HASHING ASSUMPTION HOLDS AND THE LOAD
FACTOR (count / length) STAYS BELOW A
FIXED BOUND LESS THAN 1,
averageTime(n, m) FOR SEARCHING,
INSERTING AND REMOVING IS
CONSTANT, O(1).

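ANOTHER SOLUTION:
QUADRATIC HASHING (QUADRATIC PROBING), THAT IS,
ONCE A COLLISION OCCURS AT h, GO
TO LOCATION h + 1; IF A
COLLISION OCCURS THERE, GO TO
LOCATION h + 4, THEN h + 9, THEN h + 16,
ETC. (ALL % length).
unsigned long hash_int = hash (key);
int h = hash_int % length;
// i-th probe after a collision (i = 1, 2, 3, ...):
int index = (h + i * i) % length;
Notice that h stays at the same location: each offset i * i is measured from h.
No primary clustering (keys with the same h still follow the same
probe sequence, so secondary clustering remains).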
EXAMPLE: length = 11 (INDICES 0 THROUGH 10)

key    index   offset   final index
15      4                 4
19      8                 8
16      5                 5
58      3                 3
27      5       1         6
35      2                 2
30      8       1         9
47      3       4         7
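
// A sketch of quadratic probing as described above. After a collision at
// home index h, it tries (h + 1) % length, (h + 4) % length,
// (h + 9) % length, and so on, always measured from h. Assumptions:
// non-negative int keys hashed by key % length, and no deletions.
// QuadraticTable and EMPTY_SLOT are hypothetical names.
#include <cassert>
#include <vector>

const int EMPTY_SLOT = -1;   // keys are assumed non-negative

class QuadraticTable {
public:
    explicit QuadraticTable(int length) : buckets(length, EMPTY_SLOT) {}

    // Returns the index where key was placed, or -1 if length probes fail.
    int insert(int key)
    {
        int length = static_cast<int>(buckets.size());
        int h = key % length;                   // home index
        for (int i = 0; i < length; ++i) {
            int index = (h + i * i) % length;   // offset i * i from h
            if (buckets[index] == EMPTY_SLOT) {
                buckets[index] = key;
                return index;
            }
        }
        return -1;   // quadratic probing can miss free slots in a nearly full table
    }

    int at(int index) const { return buckets[index]; }

private:
    std::vector<int> buckets;
};

// With a prime length and a table less than half full, quadratic probing is
// guaranteed to find a free slot. This is one reason to keep the load factor low.
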

We know that hashing becomes inefficient as
the table fills up. What to do?

EXPAND!

WHAT ABOUT THE SIZE OF buckets,
AND SHOULD THAT ARRAY EVER BE
RE-SIZED?

RE-SIZE WHENEVER THE LOAD
FACTOR, THE RATIO OF count TO
length, EXCEEDS 0.75.

TO RE-SIZE, WE WILL DOUBLE THE
OLD CAPACITY, PLUS 1. WHY +1?
ANOTHER OPTION…FIND NEXT
PRIME NUMBER AFTER DOUBLING.

NOTE THAT WE RE-SIZE WHENEVER
THE LOAD FACTOR, THAT IS, THE
AVERAGE LIST SIZE, EXCEEDS 0.75.

IN check_for_expansion, IF count >=
length * 0.75, CREATE A NEW ARRAY
OF DOUBLE THE OLD LENGTH (PLUS
1). FOR EACH INDEX IN THE OLD
ARRAY, ITERATE THROUGH ITS LIST
AND REHASH EACH VALUE INTO THE NEW
ARRAY (INDICES CHANGE BECAUSE length
CHANGED). FINALLY, DEALLOCATE THE OLD ARRAY.
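
// A sketch of check_for_expansion for a chained table. Assumptions:
// non-negative int keys hashed by key % length, a 0.75 load-factor
// threshold, and a new length of 2 * length + 1 (always odd, though not
// necessarily prime). GrowingTable is a hypothetical name.
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

class GrowingTable {
public:
    explicit GrowingTable(std::size_t length) : buckets(length), count(0) {}

    void insert(int key)
    {
        buckets[key % buckets.size()].push_back(key);
        ++count;
        check_for_expansion();
    }

    bool find(int key) const
    {
        for (int k : buckets[key % buckets.size()])
            if (k == key)
                return true;
        return false;
    }

    std::size_t length() const { return buckets.size(); }
    std::size_t size() const { return count; }

private:
    // If count >= 0.75 * length, rehash every key into 2 * length + 1 buckets.
    void check_for_expansion()
    {
        if (count < buckets.size() * 0.75)
            return;
        std::vector<std::list<int>> bigger(2 * buckets.size() + 1);
        for (const std::list<int>& chain : buckets)
            for (int k : chain)
                bigger[k % bigger.size()].push_back(k);   // index uses the new length
        buckets.swap(bigger);   // the old array is freed when bigger goes out of scope
    }

    std::vector<std::list<int>> buckets;
    std::size_t count;
};
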

Performance

   HOW DOES DOUBLE-HASHING
COMPARE WITH CHAINED HASHING?

Performance of Hashing

- In the worst case, searches, insertions and removals on a hash table
  take O(n) time.
- The worst case occurs when all the keys inserted into the dictionary
  collide.
- The load factor α = n/N affects the performance of a hash table.
- The expected running time of all the dictionary ADT operations in a
  hash table is O(1).
- In practice, hashing is very fast provided the load factor is not
  close to 100%.
- Applications of hash tables:
  - small databases
  - compilers
  - browser caches

GROUP EXERCISE: ASSUME THAT length = 13.
INSERT THE FOLLOWING KEYS INTO A HASH
TABLE USING 1) OPEN ADDRESS, 2) DOUBLE
HASHING, and 3) CHAINING

20, 33, 49, 22, 26, 140, 38, 9, 7, 3, 0, 1

Summary Slide 1
§- Hash Table
- simulates the fastest searching technique (knowing
the index of the required value in a vector or array
and applying the index to access the value) by applying
a hash function that converts the data to an integer
- After obtaining an index by dividing the value from
the hash function by the table size and taking the
remainder, access the table. Normally, the number
of slots in the table is much smaller than the
number of distinct data values, so collisions occur.
- To handle collisions, we must place a value that
collides with an existing table element into the table
in such a way that we can efficiently access it later.
Summary Slide 2

§- Hash Table (Cont…)
- average running time for a search of a hash table is
O(1)
- the worst case is O(n)

Summary Slide 3
§- Collision Resolution
- Types:
1) linear open probe addressing
- the table is a vector or array of static size
- After using the hash function to compute a
table index, look up the entry in the table.
- If the values match, perform an update if
necessary.
- If the table entry is empty, insert the value in
the table.

Summary Slide 4
§- Collision Resolution (Cont…)
- Types:
1) linear open probe addressing
- Otherwise, probe forward circularly, looking
for a match or an empty table slot.
- If the probe returns to the original starting
point, the table is full.
- a search may have to examine table items that
hashed to different table locations.
- Deleting an item is difficult.

Summary Slide 5
§- Collision Resolution (Cont…)
2) chaining with separate lists.
- the hash table is a vector of list objects
- Each list is a sequence of colliding items.
- After applying the hash function to compute
the table index, search the list for the data
value.
- If it is found, update its value; otherwise, insert
the value at the back of the list.
- you search only items that collided at the
same table location
Summary Slide 6

§- Collision Resolution (Cont…)
- there is no limitation on the number of values
in the table, and deleting an item from the
table involves only erasing it from its
corresponding list


```