									  Hashing
by Rafael Jaffarove
     CS157b
                Motivation
 Fast data access
   Search
   Insertion
   Deletion


 Ideal seek time is O(1)
         Types of Organization

 File organization
   hashing the search-key gives the disk block with
    the desired record
 Index organization
   search-key is stored together with a pointer in
    a hash table. Pointer points to a particular
    bucket where the record is stored
           Types of Hashing
 Static hashing
   Fixed file size
 Dynamic hashing
   Extendable hashing
  Problems with Static Hashing
 Databases tend to grow over time
 The number of buckets must be
  predefined
 If number is too large then the space is
  wasted
 If number is too small then we have too
  many collisions
 Bucket overflow
    Handling Bucket Overflow
 Providing overflow buckets
   If an initial bucket is full a new bucket is given.
    If the second bucket is full then a 3rd bucket is
    given and so on.
   Additional buckets are linked together in a
    linked list
 Problems:
   searches and insertions might take linear time
   deletions are difficult to perform
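
The overflow scheme above can be sketched as follows. This is a minimal illustration, not a disk-based implementation: the names `Bucket`, `insert`, `search`, `NUM_BUCKETS`, and `BUCKET_CAPACITY` are hypothetical, the bucket count of 3 and capacity of 2 are assumed values, and the sum-of-ASCII-codes hash is borrowed from the extendable hashing example later in these slides.

```python
BUCKET_CAPACITY = 2   # assumed number of records per bucket
NUM_BUCKETS = 3       # fixed up front, as static hashing requires


class Bucket:
    def __init__(self):
        self.records = []      # holds at most BUCKET_CAPACITY keys
        self.overflow = None   # next bucket in the overflow chain, if any


def ascii_sum(key):
    """Hash function: sum of the character codes of the key."""
    return sum(ord(ch) for ch in key)


def insert(table, key):
    bucket = table[ascii_sum(key) % NUM_BUCKETS]
    # Walk the chain until a bucket with free space is found,
    # appending a new overflow bucket when the chain runs out.
    while len(bucket.records) >= BUCKET_CAPACITY:
        if bucket.overflow is None:
            bucket.overflow = Bucket()
        bucket = bucket.overflow
    bucket.records.append(key)


def search(table, key):
    """Return (found, number_of_buckets_visited)."""
    bucket = table[ascii_sum(key) % NUM_BUCKETS]
    visited = 0
    while bucket is not None:
        visited += 1
        if key in bucket.records:
            return True, visited
        bucket = bucket.overflow
    return False, visited


countries = ["England", "France", "China", "Germany", "Egypt", "Australia"]
table = [Bucket() for _ in range(NUM_BUCKETS)]
for name in countries:
    insert(table, name)
```

The number of buckets visited by `search` grows with the length of the overflow chain, which is why lookups degrade toward linear time as the file outgrows its fixed number of buckets.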
           Dynamic Hashing
 Extendable hashing
   buckets created as needed
 Example of extendable hashing
   Insert the following countries into database:
    England, France, China, Germany, Egypt,
    Australia
   We will use hash function of sum of ASCII
    codes of all characters in a name
   Assumption: bucket can’t hold more than 2
    records
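
A minimal in-memory sketch of this example follows. The slides do not say which bits of the hash value select a bucket, so the sketch assumes the low-order bits are used; the class and method names (`ExtendibleHashTable`, `Bucket`, `insert`, `contains`) and the `MAX_DEPTH` guard are hypothetical and are not taken from the original presentation.

```python
BUCKET_CAPACITY = 2   # from the example: at most 2 records per bucket
MAX_DEPTH = 32        # assumed guard against endless splitting


class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth   # number of hash bits this bucket uses
        self.keys = []


class ExtendibleHashTable:
    def __init__(self):
        self.global_depth = 0            # number of hash bits the directory uses
        self.directory = [Bucket(0)]     # 2**global_depth pointers to buckets

    @staticmethod
    def hash_key(key):
        # Hash function from the example: sum of character codes.
        return sum(ord(ch) for ch in key)

    def _index(self, key):
        # Assumption: the low-order global_depth bits pick the directory entry.
        return self.hash_key(key) & ((1 << self.global_depth) - 1)

    def contains(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        while True:
            bucket = self.directory[self._index(key)]
            if len(bucket.keys) < BUCKET_CAPACITY:
                bucket.keys.append(key)
                return
            self._split(bucket)          # make room, then retry

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            if self.global_depth >= MAX_DEPTH:
                raise OverflowError("too many keys share the same hash value")
            # Double the directory; each new entry mirrors an old one.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bit = 1 << bucket.local_depth    # the extra bit that now distinguishes keys
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            target = sibling if self.hash_key(k) & bit else bucket
            target.keys.append(k)
        # Redirect the directory entries whose extra bit is set.
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = sibling


table = ExtendibleHashTable()
countries = ["England", "France", "China", "Germany", "Egypt", "Australia"]
for name in countries:
    table.insert(name)
```

Every lookup goes through `directory` before reaching a bucket. A bucket is split only when it overflows, and the directory is doubled only when the overflowing bucket already uses all of the directory's bits.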
       Extendable Hashing
Example (contd.)
         Extendable Hashing
 Problem with dynamic hashing
   additional level of indirection
              Hash function
 Importance of choosing the right hash
 function
   Uniform function = even distribution of data
   Table size should be a prime number


 There is no perfect hash function so
 collisions are possible
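
As a small illustration, the sum-of-ASCII-codes hash from the earlier example can be reduced modulo a prime table size. The function name `ascii_sum_hash` and the table size of 7 are assumptions made for this sketch.

```python
TABLE_SIZE = 7   # assumed small prime, for illustration only


def ascii_sum_hash(key, table_size=TABLE_SIZE):
    """Sum the character codes of the key and reduce modulo the table size."""
    return sum(ord(ch) for ch in key) % table_size


countries = ["England", "France", "China", "Germany", "Egypt", "Australia"]
slots = {name: ascii_sum_hash(name) for name in countries}

# Group the keys by slot to see which ones collide.
by_slot = {}
for name, slot in slots.items():
    by_slot.setdefault(slot, []).append(name)
```

This hash ignores character order, so anagrams such as "listen" and "silent" always land in the same slot. That is one reason a plain character sum is a weak choice outside of small examples.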
        Handling Collisions
 Linear probing
 Quadratic probing
 Double hashing
 Chaining
                Linear Probing
 If a slot is used, take next available
 If next is used, continue until an empty slot is
  found
 If end of table is reached, wrap around from
  beginning.

 Problems:
    Clustering of data
    How far to go if there are no empty slots?
    Deletion: deleting key in the middle of a cluster
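
The sketch below shows one common way to handle these problems, under stated assumptions: the probe stops after one full pass over the table, and deleted slots are marked with a tombstone so that searches can continue past them. The class name `LinearProbingTable`, the `DELETED` marker, and the table size of 7 are hypothetical, and keys are assumed to be distinct.

```python
EMPTY = None
DELETED = object()   # tombstone: the slot was used, so keep probing past it


def ascii_sum(key):
    return sum(ord(ch) for ch in key)


class LinearProbingTable:
    def __init__(self, size=7):
        self.size = size
        self.slots = [EMPTY] * size

    def insert(self, key):
        """Store key in the first free slot at or after its home slot."""
        home = ascii_sum(key) % self.size
        for step in range(self.size):        # at most one full pass
            i = (home + step) % self.size    # wrap around at the end
            if self.slots[i] is EMPTY or self.slots[i] is DELETED:
                self.slots[i] = key
                return i
        raise OverflowError("table is full")

    def find(self, key):
        """Return the slot index of key, or -1 if it is absent."""
        home = ascii_sum(key) % self.size
        for step in range(self.size):
            i = (home + step) % self.size
            if self.slots[i] is EMPTY:       # a never-used slot ends the cluster
                return -1
            if self.slots[i] == key:
                return i
        return -1

    def delete(self, key):
        i = self.find(key)
        if i < 0:
            return False
        self.slots[i] = DELETED              # do not reset to EMPTY
        return True


table = LinearProbingTable()
positions = [table.insert(name) for name in ["France", "Egypt", "Australia"]]
table.delete("Egypt")
```

If `delete` reset the slot to `EMPTY`, a later search for a key stored further along the same cluster would stop too early and report it missing.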
          Quadratic probing
 To avoid clustering, do not take the next slot
 but probe at offsets 1², 2², 3², 4², etc. from the home slot

 Problem:
   Secondary clustering, since keys with the same
    home slot follow the same probe sequence
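
A minimal sketch of the probe sequence, assuming the i-th probe lands at (home + i²) mod table size. The function names `probe_sequence` and `insert` and the table size of 7 are hypothetical.

```python
def ascii_sum(key):
    return sum(ord(ch) for ch in key)


def probe_sequence(home, table_size):
    """Yield home, home + 1^2, home + 2^2, ... reduced modulo table_size."""
    for i in range(table_size):
        yield (home + i * i) % table_size


def insert(slots, key):
    home = ascii_sum(key) % len(slots)
    for i in probe_sequence(home, len(slots)):
        if slots[i] is None:
            slots[i] = key
            return i
    # Unlike linear probing, the sequence may miss free slots.
    raise OverflowError("no free slot found along the probe sequence")


slots = [None] * 7
positions = [insert(slots, name) for name in ["France", "Egypt", "Australia"]]
sequence = list(probe_sequence(3, 7))
```

All three keys share home slot 3 and therefore walk the same sequence, which is the secondary clustering described above. The sequence also repeats slots before visiting every position, so an insertion can fail even when the table still has free slots.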
           Double Hashing
 In case of collision, apply a second hash
  function to the key to get the probe step size.
 Overall better performance than linear and
  quadratic probing
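
A minimal sketch, assuming the second hash function supplies the step size between probes. The particular choice `h2(key) = 1 + (sum mod (m - 1))` is a common textbook option and is an assumption here, not something the slides specify; the names `h1`, `h2`, `probe_sequence`, and `insert` are hypothetical.

```python
def ascii_sum(key):
    return sum(ord(ch) for ch in key)


def h1(key, m):
    """Home slot."""
    return ascii_sum(key) % m


def h2(key, m):
    """Step size in 1..m-1; never zero, so the probe always moves."""
    return 1 + ascii_sum(key) % (m - 1)


def probe_sequence(key, m):
    home, step = h1(key, m), h2(key, m)
    for i in range(m):
        yield (home + i * step) % m


def insert(slots, key):
    for i in probe_sequence(key, len(slots)):
        if slots[i] is None:
            slots[i] = key
            return i
    raise OverflowError("table is full")


slots = [None] * 7   # prime size, so every step size visits all slots
positions = [insert(slots, name) for name in ["France", "Egypt", "Australia"]]
```

The three keys share the same home slot but get different step sizes, so they spread out instead of following one shared sequence. With a prime table size, every step size between 1 and m - 1 eventually reaches every slot.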
                  Chaining
 Entries are linked lists
 In case of a collision the entries are added
 to those linked lists.

 Problem:
   In case of frequent collisions in the same
    slot, search for a key in that linked list becomes
    linear. Alternative data structures are used to
    solve this problem (e.g. B+-trees).
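
A minimal sketch of chaining, using Python lists in place of linked lists. The class name `ChainedHashTable`, its methods, and the table size of 7 are hypothetical.

```python
def ascii_sum(key):
    return sum(ord(ch) for ch in key)


class ChainedHashTable:
    def __init__(self, size=7):
        # One chain per slot; a Python list stands in for a linked list.
        self.buckets = [[] for _ in range(size)]

    def _chain(self, key):
        return self.buckets[ascii_sum(key) % len(self.buckets)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)   # key already present: overwrite
                return
        chain.append((key, value))        # collision or empty slot: extend chain

    def get(self, key, default=None):
        for k, v in self._chain(key):     # linear scan of a single chain
            if k == key:
                return v
        return default

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False


table = ChainedHashTable()
for name, capital in [("France", "Paris"), ("Egypt", "Cairo"),
                      ("Australia", "Canberra")]:
    table.insert(name, capital)
```

Deletion only touches one chain, so it needs no tombstones. The cost of `get` is proportional to the length of the chain the key hashes to.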

								