
# Hash Table - PowerPoint by rt3463df

```
TOPIC 5 ASSIGNMENT
SORTING, HASH TABLES & LINKED LISTS

Yerusha Nuh & Ivan Yu
Arrays
 Efficient access of data.
 Access by index.
 Mapping between search keys and
indices allows each data to be stored in
the array element with the
corresponding index.
Example
There are 500 students in a school. Each
student has their own TDSB nine-digit
student number.
If we want to assign an ID to each student
name, we could use their student
number. However, if the greatest student
number is “351000005”, the array would
need 351,000,006 elements (indices 0 to
351000005). This is far more than what is
required to store the names of 500 students.
Solution:
 Mapping between the student numbers
and the numbers from 0 to 499.

By using arithmetic operations on keys, we
can map them onto table addresses.

 Direct referencing.
Mapping
Methods for mapping:
 Hash table

Hash table – a data structure that uses a hash function
to efficiently map certain identifiers or keys (e.g.
persons’ names) to associated values (e.g. their
telephone numbers).

A hash table is made up of two parts:
 An array (the actual table where the data to be
searched is stored)
 A mapping function, a.k.a. hash function.
Hash Function
Hash function – a function that transforms
the search key into a table address.

Different hash functions use different
arithmetic operations to do this. We will
focus on modulo arithmetic.
Hash Function
Modulo Arithmetic
Numbers as keys
 Address = search key % size of array
Pseudocode - Number
get key
address = key % size of array
Strings as keys
 Take the binary representation of a key
as a number and then apply the first
case.
In general the arithmetic operations in
such expressions will use 32-bit modular
arithmetic ignoring overflow.

For example:
Integer.MAX_VALUE + 1 = Integer.MIN_VALUE
where
Integer.MAX_VALUE = 2147483647
Integer.MIN_VALUE = -2147483648
Example
Char        h        e        l        l        o
Unicode    104      101      108      108       111

104*31^4 + 101*31^3 + 108*31^2 + 108*31^1 + 111*31^0
= 99162322

Horner’s method lets us evaluate this polynomial one character at a time:
a_n·x^n + a_{n-1}·x^{n-1} + a_{n-2}·x^{n-2} + … + a_1·x^1 + a_0·x^0
= x(x(…x(x(a_n·x + a_{n-1}) + a_{n-2}) + …) + a_1) + a_0
99162322 = (((104*31 + 101)*31 + 108)*31 + 108)*31 + 111

We compute the hash function by applying
the mod (%) operation at each step, thus
avoiding overflow.

   Compute h0 = (22*32 + 5) % size of array
   Compute h1 = (32*h0 + 18) % size of array
   Compute h2 = (32*h1 + 25) % size of array
   Etc.
Pseudocode - String
get string
address = 0
loop (once for each character of the string,
from first to last)
{
address = (address * base + Unicode value of the
character) % size of array
}
Hash Table
How do we choose the size of the array (hash table)?

Let N be the number of records to be stored.
Let M be the size of the hash table.

Ideally N records are stored in a hash table of size N.
However...
 We may not have prior knowledge of exact number of
records.
 It is possible to have two keys mapped to the same index
(although this can be prevented).

Hence, we assume that the size of the table (M) can be
different from the number of records (N).
Load factor – the ratio between N and M.
 Load factor L = N/M
 The default load factor of Java's HashMap is 0.75
(the table is enlarged once L exceeds it).

Note: M should be a prime number to
obtain more even distribution of keys
over the table.
Collision Resolution
Collision – when two or more keys hash to
the same index.

Methods to resolve collisions:
 Separate chaining
 Linear probing
 Double hashing
Linear Probing
Collision when inserting:
 Probe the next slot in the table.
 If unoccupied, store the key.
 If occupied, continue probing the next
slot.
Linear Probing - Collision
Searching:
 Start at the slot the key hashes to.
 If slot is empty, search is unsuccessful.
 If slot is occupied:
○ If it does not match, probe the next slot.
○ If it matches, search is successful.

   When reaching the end of table, resume
from the beginning.
 Primary clustering – occupied slots build up into large
clusters, which lengthens later probe sequences.
 Runs slowly for tables that are almost full.
   Speed
 Especially with large number of entries (thousands
or more).

   Efficient when maximum number of entries is

   If the set of key-value pairs is fixed and known
ahead of time (no insertions and deletions),
average lookup cost can be reduced by a
careful choice of the hash function, bucket
table size, and internal data structures.
   More difficult to implement than self-balancing binary
trees.

   Difficult to create a perfect hash function.

   Insertion or deletion may take time proportional to
number of entries.
 May not be suitable for real-time or interactive
applications.

   Computing the hash function can cost significantly more than
one lookup step in a sequential list or search tree, even
though hash table operations take constant time on average.
 Not suitable for small number of entries.

```
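
The string hashing and linear probing steps described in the slides can be sketched in Java as follows. This is a minimal illustration, not the presenters' code: the class name `LinearProbingSketch`, the methods `hash`, `insert`, and `find`, and the choice of 31 as the base (borrowed from the "hello" example) are all assumptions made for the example. Deletion and resizing are left out.

```java
/** Minimal sketch of a linear-probing hash table holding String keys. */
class LinearProbingSketch {
    // Assumed base, taken from the "hello" example in the slides.
    static final int BASE = 31;

    // The table itself; null marks an unoccupied slot.
    final String[] slots;

    LinearProbingSketch(int size) {
        slots = new String[size];
    }

    /** Horner's method with the modulo applied at every step. */
    static int hash(String key, int size) {
        long h = 0; // long keeps h * BASE + character from overflowing
        for (int i = 0; i < key.length(); i++) {
            h = (h * BASE + key.charAt(i)) % size;
        }
        return (int) h;
    }

    /** Stores the key; returns false only if every slot is occupied. */
    boolean insert(String key) {
        int start = hash(key, slots.length);
        for (int i = 0; i < slots.length; i++) {
            int idx = (start + i) % slots.length; // wrap to the beginning
            if (slots[idx] == null) {
                slots[idx] = key;
                return true;
            }
            if (slots[idx].equals(key)) {
                return true; // key is already in the table
            }
        }
        return false;
    }

    /** Returns the slot index holding the key, or -1 if it is absent. */
    int find(String key) {
        int start = hash(key, slots.length);
        for (int i = 0; i < slots.length; i++) {
            int idx = (start + i) % slots.length;
            if (slots[idx] == null) {
                return -1; // empty slot: search is unsuccessful
            }
            if (slots[idx].equals(key)) {
                return idx; // match: search is successful
            }
            // occupied by a different key: probe the next slot
        }
        return -1; // probed every slot without a match
    }
}
```

In a table of size 7, the keys "a" (Unicode 97) and "h" (Unicode 104) both hash to index 6. Inserting "a" and then "h" therefore puts "h" in slot 0 after the probe wraps around the end of the table.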