Hash Table - PowerPoint by rt3463df

TOPIC 5 ASSIGNMENT

Yerusha Nuh & Ivan Yu
 Efficient access of data.
 Access by index.
 Mapping between search keys and
  indices allows each data item to be stored in
  the array element with the
  corresponding index.
There are 500 students in a school. Each
   student has their own TDSB nine-digit
   student number.
If we want to assign an ID to each student
   name, we could use their student
   number. However, if the greatest student
   number is 351000005, the array would need
   351,000,006 elements (indices 0 to
   351000005). This is far more than what is
   required to store the names of 500 students.
 Mapping between the student numbers
  and the numbers from 0 to 499.

By using arithmetic operations on keys, we
  can map them onto table addresses.

 Direct referencing.
Methods for mapping:
 Direct address table
 Hash table

Hash table – a data structure that uses a hash function
  to efficiently map certain identifiers or keys (e.g.
  persons’ names) to associated values (e.g. their
  telephone numbers).

A hash table is made up of two parts:
 An array (the actual table where the data to be
   searched is stored)
 A mapping function, a.k.a. hash function.
Hash Function
Hash function – a function that transforms
  the search key into a table address.

Different hash functions use different
  arithmetic operations to do this. We will
  focus on the modulo arithmetic.
Modulo Arithmetic
Numbers as keys
 Address = search key % size of array
Pseudocode - Number
 get number
 address = key % size of array
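As a minimal sketch of the pseudocode above, assuming integer keys and the 500-student example from earlier (the class and method names are illustrative and do not come from the slides):

```java
// Illustrative sketch: map a numeric key to a table address with the modulo operation.
final class ModuloAddress {
    // Math.floorMod keeps the result in [0, tableSize) even if the key is negative.
    static int address(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        int tableSize = 500;           // one slot per student in the example
        int studentNumber = 351000005; // the student number used in the example
        System.out.println(address(studentNumber, tableSize)); // prints 5
    }
}
```

With this mapping the array needs only 500 elements, at the price that two different student numbers can land on the same address.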
Strings as keys
 Take the binary representation of a key
  as a number and then apply the first
  method (address = number % size of array).
In general, the arithmetic operations in
   such expressions use 32-bit modular
   arithmetic, ignoring overflow.

For example:
Integer.MAX_VALUE + 1 = Integer.MIN_VALUE
   Integer.MAX_VALUE = 2147483647
   Integer.MIN_VALUE = -2147483648
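The wrap-around can be checked directly. The following is a small sketch (the class name is illustrative) showing that Java `int` addition overflows silently instead of raising an error:

```java
// Illustrative sketch: Java int arithmetic wraps around on overflow.
final class OverflowDemo {
    // Adds one using ordinary 32-bit int arithmetic; no exception is thrown on overflow.
    static int next(int value) {
        return value + 1;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);       // prints 2147483647
        System.out.println(next(Integer.MAX_VALUE)); // prints -2147483648
        System.out.println(next(Integer.MAX_VALUE) == Integer.MIN_VALUE); // prints true
    }
}
```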
Char       h      e      l      l      o
Unicode    104    101    108    108    111

104*31^4 + 101*31^3 + 108*31^2 + 108*31^1 + 111*31^0
= 99162322

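To evaluate this polynomial with fewer multiplications, and so that
the result can be reduced at every step, we can apply Horner’s method:
   a_n*x^n + a_(n-1)*x^(n-1) + a_(n-2)*x^(n-2) + … + a_1*x^1 + a_0*x^0
   = x(x(…x(x(a_n*x + a_(n-1)) + a_(n-2)) + …) + a_1) + a_0
99162322 = (((104*31 + 101)*31 + 108)*31 +
  108)*31 + 111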
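The Horner evaluation for "hello" can be sketched in Java as follows. The class and method names are illustrative; the comparison with `String.hashCode()` relies on Java documenting that method as the same base-31 polynomial.

```java
// Illustrative sketch: base-31 polynomial hash of a string, evaluated with Horner's method.
final class HornerHash {
    // Computes s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using 32-bit int arithmetic.
    static int hornerHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // one multiplication and one addition per character
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hornerHash("hello"));                       // prints 99162322
        System.out.println(hornerHash("hello") == "hello".hashCode()); // prints true
    }
}
```

For longer strings the intermediate value exceeds the `int` range and wraps around, which is the 32-bit modular arithmetic mentioned earlier.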

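We compute the hash function by applying
 the mod (%) operation at each step, thus
 avoiding overflow.

For example, with multiplier 32, a table of size M, and a key whose
characters are encoded as 22, 5, 18, 25, …:
   Compute h0 = (22*32 + 5) % M
   Compute h1 = (32*h0 + 18) % M
   Compute h2 = (32*h1 + 25) % M
   Etc.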
Pseudocode - String
 get string
 set address to 0
 loop (once for each character of the
 string, in order)
     address = (31*address + Unicode of
     character) % size of array
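A minimal Java sketch of this loop follows. The names are illustrative, and the sketch assumes the table size is small enough (well under about 69 million) that `31 * address + character` cannot overflow an `int`.

```java
// Illustrative sketch: string hash reduced modulo the table size at every step.
final class StringAddress {
    // Returns a table address in [0, tableSize) for the given key.
    static int address(String key, int tableSize) {
        int address = 0;
        for (int i = 0; i < key.length(); i++) {
            // Because address < tableSize before each step, the intermediate value stays small.
            address = (31 * address + key.charAt(i)) % tableSize;
        }
        return address;
    }

    public static void main(String[] args) {
        // The intermediate addresses for "hello" with table size 101 are 3, 93, 62, 10, 17.
        System.out.println(address("hello", 101)); // prints 17
    }
}
```

The result equals 99162322 % 101, so reducing at each step gives the same address as reducing the full polynomial value once at the end.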
Hash Table
How do we choose the size of the array (hash table)?

Let N be the number of records to be stored.
Let M be the size of the hash table.

Ideally N records are stored in a hash table of size N.
 We may not have prior knowledge of the exact number of records.
 It is possible to have two keys mapped to the same index
   (although this can be prevented).

Hence, we assume that the size of the table (M) can be
  different from the number of records (N).
Load factor – the ratio between N and M.
 Load factor L = N/M
 Java's HashMap uses a default load factor of 0.75.

Note: M should be a prime number to
  obtain more even distribution of keys
  over the table.
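One way to combine the load factor and the prime-size note is sketched below. The rule used here (take the smallest prime M for which N/M does not exceed a chosen load factor) is an assumption made for illustration, not a rule stated in the slides, and all names are hypothetical.

```java
// Illustrative sketch: pick a prime table size M from the record count N and a target load factor.
final class TableSize {
    // Trial division; adequate for the small sizes used in this example.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Smallest prime M with records / M <= maxLoadFactor (assumed selection rule).
    static int chooseSize(int records, double maxLoadFactor) {
        int m = (int) Math.ceil(records / maxLoadFactor);
        while (!isPrime(m)) {
            m++;
        }
        return m;
    }

    // Load factor L = N / M.
    static double loadFactor(int records, int tableSize) {
        return (double) records / tableSize;
    }

    public static void main(String[] args) {
        int m = chooseSize(500, 0.75);
        System.out.println(m);                  // prints 673
        System.out.println(loadFactor(500, m)); // roughly 0.743
    }
}
```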
Collision Resolution
Collision – when two or more keys hash to
  the same index.

Methods to resolve collisions:
 Separate chaining
 Open addressing
   Linear probing
   Quadratic probing
   Double hashing
Linear Probing
Collision when inserting:
 Probe the next slot in the table.
   If unoccupied, store the key.
   If occupied, continue probing the next slot.
Linear Probing - Collision
Collision when searching:
 If the key hashes to an occupied slot but
  does not match the key occupying the
  slot, probe the next slot.
     If slot is empty, search is unsuccessful.
     If slot is occupied:
      ○ If it does not match, continue probing the next slot.
      ○ If it matches, search is successful.

   When reaching the end of table, resume
    from the beginning.
 Primary clustering – building up of large
  clusters of occupied slots.
 Runs slowly for tables that are almost full.
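The insert and search rules for linear probing can be sketched as a small Java class. This is an illustration under stated assumptions, not a complete hash table: keys are plain integers, the hash function is key % table size, `null` marks an empty slot, and deletion is left out. All names are hypothetical.

```java
// Illustrative sketch: open addressing with linear probing for integer keys.
final class LinearProbingTable {
    private final Integer[] slots; // null marks an unoccupied slot

    LinearProbingTable(int size) {
        slots = new Integer[size];
    }

    // Home address of a key: key % table size, kept non-negative.
    private int home(int key) {
        return Math.floorMod(key, slots.length);
    }

    // Stores the key; returns false only if every slot is occupied by other keys.
    boolean insert(int key) {
        int i = home(key);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[i] == null) { slots[i] = key; return true; } // unoccupied: store the key
            if (slots[i] == key) return true;                      // key is already present
            i = (i + 1) % slots.length;                            // occupied: probe next slot, wrapping around
        }
        return false;
    }

    // Returns the slot holding the key, or -1 if the search is unsuccessful.
    int indexOf(int key) {
        int i = home(key);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[i] == null) return -1; // empty slot: the key cannot be further along
            if (slots[i] == key) return i;   // match: search is successful
            i = (i + 1) % slots.length;      // no match: keep probing
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingTable table = new LinearProbingTable(7);
        table.insert(12); // 12 % 7 = 5, stored in slot 5
        table.insert(19); // 19 % 7 = 5, occupied, stored in slot 6
        table.insert(26); // 26 % 7 = 5, slots 5 and 6 occupied, wraps to slot 0
        System.out.println(table.indexOf(26)); // prints 0
        System.out.println(table.indexOf(33)); // prints -1
    }
}
```

The three keys in the example all share home address 5 and end up in adjacent slots, which is a small instance of primary clustering.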
Hash Table - Advantages
   Speed
     Especially with large number of entries (thousands
      or more).

   Efficient when maximum number of entries is
    predicted in advance.

   If the set of key-value pairs is fixed and known
    ahead of time (no insertions and deletions),
    average lookup cost can be reduced by a
    careful choice of the hash function, bucket
    table size, and internal data structures.
Hash Table - Disadvantages
   More difficult to implement than self-balancing binary
    search trees.

   Difficult to create a perfect hash function.

   Insertion or deletion may take time proportional to
    number of entries.
     May not be suitable for real-time or interactive
      applications.

   The cost of computing a good hash function can be
    significantly higher than one step of a lookup in a
    sequential list or search tree, even though operations
    take constant time on average.
     Not suitable for small number of entries.
