Chapter 9

```
Chapter 9

Search Algorithms

Data Structures Using C++   1
Chapter Objectives
• Learn the various search algorithms
• Explore how to implement the sequential
and binary search algorithms
• Discover how the sequential and binary
search algorithms perform
• Become aware of the lower bound on
comparison-based search algorithms
class arrayListType: Basic Operations
• isEmpty
• isFull
• listSize
• maxListSize
• print
• isItemAtEqual
• insertAt
• insertEnd
• removeAt
• retrieveAt
• replaceAt
• clearList
• seqSearch
• insert
• remove
• arrayListType (constructor)
Sequential Search
template<class elemType>
int arrayListType<elemType>::seqSearch(const elemType& item)
{
    int loc;
    bool found = false;

    for (loc = 0; loc < length; loc++)
        if (list[loc] == item)
        {
            found = true;
            break;
        }

    if (found)
        return loc;
    else
        return -1;
}//end seqSearch

Search Algorithms
• Search item: target
• To determine the average number of
comparisons in the successful case of the
sequential search algorithm:
– Consider all possible cases
– Find the number of comparisons for each case
– Add the number of comparisons and divide by the
number of cases

Search Algorithms
Suppose that there are n elements in the list. The following
expression gives the average number of comparisons:

(1 + 2 + ... + n) / n

It is known that

1 + 2 + ... + n = n(n + 1) / 2

Therefore, the following expression gives the average number of
comparisons made by the sequential search in the successful case:

(n + 1) / 2 = O(n)

Ordered Lists as Arrays
template<class elemType>
class orderedArrayListType: public arrayListType<elemType>
{
public:
    orderedArrayListType(int size = 100);
      //constructor
    ...
    //We will add the necessary members as needed.
private:
    //We will add the necessary members as needed.
};

template<class elemType>
{
public:
...
}

Binary Search

Binary Search: middle element

mid = (first + last) / 2

Binary Search
template<class elemType>
int orderedArrayListType<elemType>::binarySearch(const elemType& item)
{
    //this-> is needed because length and list are inherited from
    //the template base class arrayListType<elemType>
    int first = 0;
    int last = this->length - 1;
    int mid;
    bool found = false;               //initialize

    while (first <= last && !found)
    {
        mid = (first + last) / 2;
        if (this->list[mid] == item)
            found = true;
        else if (this->list[mid] > item)
            last = mid - 1;           //left half
        else
            first = mid + 1;          //right half
    }

    if (found)
        return mid;
    else
        return -1;
} //end binarySearch

Binary Search: Example

• Unsuccessful search
• Total number of comparisons is 6

Performance of Binary Search

Binary search
• Each iteration of the while loop cuts the size
of search list in half
• For a list L of about 1000 elements: 1000 ≈ 1024 = 2^10, so at most
11 iterations determine whether x is in L
• Every iteration makes 2 key comparisons

Performance of Binary Search
• To determine whether or not an element is in the list
– at most 2*log2(n) + 2 key comparisons

• Unsuccessful search
– for a list of length n, a binary search makes
approximately 2*log2(n + 1) key comparisons

• Successful search
– for a list of length n, on average, a binary search
makes 2*log2(n) - 4 key comparisons
Binary Search and Insertion
template<class elemType>
class orderedArrayListType: public arrayListType<elemType>
{
public:
void insertOrd(const elemType&);
int binarySearch(const elemType& item);
orderedArrayListType(int size = 100); //constructor
};

template<class elemType>
void orderedArrayListType<elemType>::insertOrd(const elemType& item)
{
    //this-> is needed because the members used below are inherited
    //from the template base class arrayListType<elemType>
    int first = 0;
    int last = this->length - 1;
    int mid;
    bool found = false;

    if (this->length == 0)   //list is empty
    {
        this->list[0] = item;
        this->length++;
    }
    else if (this->length == this->maxSize)
        cerr << "Cannot insert into a full list." << endl;
    else   //list is not empty
    {
        while (first <= last && !found)
        {
            mid = (first + last) / 2;
            if (this->list[mid] == item)
                found = true;
            else if (this->list[mid] > item)
                last = mid - 1;
            else
                first = mid + 1;
        }//end while

        if (found)
            cerr << "The insert item is already in the list. "
                 << "Duplicates are not allowed." << endl;
        else   //first > last
        {
            if (this->list[mid] < item)
                mid++;
            this->insertAt(mid, item);
        }
    }
}//end insertOrd

Search Algorithm Analysis
Summary

Lower Bound on Comparison-Based Search
• Theorem: Let L be a list of size n > 1. Suppose that the
elements of L are sorted. If SRH(n) denotes the minimum
number of comparisons needed, in the worst case, by using
a comparison-based algorithm to recognize whether an
element x is in L, then SRH(n) ≥ log2(n + 1).

• Corollary: The binary search algorithm is the optimal
worst-case algorithm for solving search problems by the
comparison method.

Hashing
• To organize the data in a hash table, use a
hash function h(x), where x is the key of each element
– Hash table HT is stored in an array
– If the size of HT is m, then 0 ≤ h(x) < m

• Main objectives in choosing a hash function:
– Choose a hash function that is easy to compute
– Minimize the number of collisions

Commonly Used Hash Functions
• Mid-Square
– Hash function, h, is computed by squaring the identifier
– An appropriate number of bits from the middle of
the square is used to obtain the bucket address
– Because the middle bits of a square usually depend on all the
characters, different keys are expected to yield
different hash addresses with high probability, even
if some of the characters are the same

Midsquare
• ID: 0000 to 9999
• Array size: 1000
• Square: 9452 * 9452 = 89340304
• Midsquare: key = 9452, h(key) = 403 (middle digits of the square)

Commonly Used Hash Functions
• Folding
– Key X is partitioned into parts such that all the parts,
except possibly the last part, are of equal length
– Parts are then added, in some convenient way, to obtain the hash address
• Division (Modular arithmetic)
– Key X is converted into an integer i_X
– This integer is divided by the size of the hash table to get the
remainder, giving the address of X in HT

Folding
• Key = 123456789
• Parts: 123, 456, 789
• Sum: 123 + 456 + 789 = 1368

Commonly Used Hash Functions
Suppose that each key is a string. The following C++ function uses
the division method to compute the address of the key:

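int hashFunction(const char *key, int keyLength)
{
    int sum = 0;
    //visit indices 0 .. keyLength - 1 only;
    //j <= keyLength would read one element past the key
    for (int j = 0; j < keyLength; j++)
        sum = sum + static_cast<int>(key[j]);
    return (sum % HTSize);
} //end hashFunction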

Pseudorandom method
• x is the key
• H(x) = (a*x + c) % HTSize
• Example: x = 121267
• H(x) = (17 * 121267 + 7) % 307 = 41
• Note: a, c, and HTSize are all prime numbers

Collision Resolution
• Algorithms to handle collisions
• Two categories of collision resolution
techniques
– Open addressing (closed hashing)
– Chaining (open hashing)

Collision Resolution:
Pseudocode implementing linear probing:

hIndex = hashFunction(insertKey);
found = false;
while (HT[hIndex] != emptyKey && !found)
    if (HT[hIndex].key == insertKey)
        found = true;
    else
        hIndex = (hIndex + 1) % HTSize; //next slot
if (found)
    cerr << "Duplicate items are not allowed." << endl;
else
    HT[hIndex] = newItem;

Collision Resolution: Linear Probing
Let h be the hash function and h(X) = t, where X is the key of some item.
Then 0 ≤ t ≤ HTSize - 1.
The sequence
t, (t + 1) % HTSize, (t + 2) % HTSize, (t + 3) % HTSize, ...
is called the probe sequence.

Linear Probing


If the next key has hash address 6, 7, 8, or 9, it will be placed in slot 9,
which causes (primary) clustering.

Linear Probing

Similarly, slot 14 will be occupied next with probability 9/20, while slots 15
and 16 each have probability 1/20.

One way to improve linear probing:
(h(X) + i*c) % HTSize   //c is a constant
                        //skips some array positions

Random Probing
• Uses a random number generator to find the next
available slot
• The ith slot in the probe sequence is:
(h(X) + r_i) % HTSize
where r_i is the ith value in a random permutation of
the numbers 1 to HTSize - 1
• All insertions and searches use the same
sequence of random numbers

Example: if h(X1) = 26, h(X2) = 35, and r1 = 2, r2 = 5, r3 = 8, then
probe sequence of X1 is 26, 28, 31, 34
probe sequence of X2 is 35, 37, 40, 43

Rehashing
• If a collision occurs with the hash function h,
we use a series of hash functions
h1, h2, h3, ..., hs
• If the collision occurs at h(x), the array slots
hi(x), 1 ≤ i ≤ s, are examined.

Quadratic Probing
• Let h be the hash function and h(X) = t, where X is the key of some item.
Then 0 ≤ t ≤ HTSize - 1.
The probe sequence is
t, (t + 1) % HTSize, (t + 2^2) % HTSize, (t + 3^2) % HTSize, ...
• Reduces primary clustering
Example (HTSize = 101): if h(X1) = 25, h(X2) = 96, and h(X3) = 34, then
probe sequence of X1 is 25, 26, 29, 34, 41
probe sequence of X2 is 96, 97, 100, 4, 11
probe sequence of X3 is 34, 35, 38, 43, 50
Even though the first element (34) of the probe sequence of X3 is the same
as the 4th element of the probe sequence of X1, the elements that follow are different.

• We do not know in general whether quadratic probing probes all the
positions in the table
• When HTSize is prime, quadratic probing
probes about half the table before repeating
the probe sequence:
For 0 ≤ i < j ≤ HTSize, if
(t + i^2) % HTSize = (t + j^2) % HTSize, then j ≥ HTSize/2

• (t + i^2) % HTSize = (t + 1 + 3 + ... + (2i - 1)) % HTSize

Double hashing
• Both random and quadratic probing eliminate primary
clustering. However, they may cause secondary
clustering, because keys with the same home position
follow the same probe sequence.

• Solution: (h(X) + i * h1(X)) % HTSize, where h1 is a second hash function
• Example: h(X1) = 35, h(X2) = 83, h1(X1) = 3, h1(X2) = 6, and
HTSize = 101
probe sequence of X1 is 35, 38, 41, 44, 47
probe sequence of X2 is 83, 89, 95, 0, 6

• Suppose that after inserting R, another item R' was
inserted in HT, and that R and R' have the same home
position. The probe sequence of R is then contained
in the probe sequence of R'.
If we delete R simply by marking its slot as
empty, a later search for R' may stop at that empty
slot and wrongly report that R' is not in the table.

Probing
template <class elemType>
void hashT<elemType>::insert(int hashIndex, const elemType& rec)
{
    int pCount = 0;   //number of probes made so far
    int inc = 1;      //next odd increment: 1, 3, 5, ...

    //quadratic probing: adding 1, 3, 5, ... visits t + 1, t + 4, t + 9, ...
    while (indexStatusList[hashIndex] == 1 && HTable[hashIndex] != rec
           && pCount < HTSize / 2)
    {
        pCount++;
        hashIndex = (hashIndex + inc) % HTSize;
        inc = inc + 2;
    }

    if (indexStatusList[hashIndex] != 1)
    {
        HTable[hashIndex] = rec;
        indexStatusList[hashIndex] = 1;
        length++;
    }
    else if (HTable[hashIndex] == rec)
        cerr << "Error: No duplicates are allowed. " << endl;
    else
        cerr << "Error: The table is full. "
             << "Unable to resolve the collision." << endl;
}

Collision Resolution:
Chaining (Open Hashing)
HT: array of pointers
size ≤ number of items

Collision Resolution:
Chaining (Open Hashing)
Insertion and deletion are straightforward. If the
hash function is efficient, few keys hash to
the same home position, so each linked list is short.
If the items are small, a considerable amount of
space is wasted, because every node also stores a pointer.

Hashing Analysis
Let

α = (number of records in the table) / HTSize

Then α is called the load factor

Linear Probing:
Average Number of Comparisons
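1. Successful search: approximately (1/2) * (1 + 1/(1 - α))

2. Unsuccessful search: approximately (1/2) * (1 + 1/(1 - α)^2)

where α is the load factor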

Average Number of Comparisons
1. Successful search

2. Unsuccessful search

Chaining:
Average Number of Comparisons
1. Successful search: approximately 1 + α/2

2. Unsuccessful search: approximately α

where α is the load factor

Chapter Summary
• Search Algorithms
– Sequential
– Binary
• Algorithm Analysis
• Hashing
– Hash Table
– Hash function
– Collision Resolution

```
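The comparison counts quoted on the sequential-search and binary-search slides can be checked by instrumenting both algorithms. The following is a minimal sketch, assuming plain `std::vector<int>` lists instead of the chapter's `arrayListType`; the function names (`seqSearchCount`, `binarySearchCount`, `averageSeqComparisons`, `maxBinaryIterations`) are hypothetical helpers, not part of the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential search that also reports how many key comparisons it made.
// Returns the index of item, or -1 if item is not in the list.
int seqSearchCount(const std::vector<int>& list, int item, long& comparisons)
{
    comparisons = 0;
    for (std::size_t loc = 0; loc < list.size(); ++loc)
    {
        ++comparisons;                      // one comparison: list[loc] == item
        if (list[loc] == item)
            return static_cast<int>(loc);
    }
    return -1;
}

// Binary search on a sorted list; counts iterations of the while loop.
int binarySearchCount(const std::vector<int>& list, int item, int& iterations)
{
    int first = 0;
    int last = static_cast<int>(list.size()) - 1;
    iterations = 0;
    while (first <= last)
    {
        ++iterations;
        int mid = first + (last - first) / 2;   // same value as (first + last) / 2
        if (list[mid] == item)
            return mid;
        else if (list[mid] > item)
            last = mid - 1;                     // continue in the left half
        else
            first = mid + 1;                    // continue in the right half
    }
    return -1;
}

// Average number of comparisons over all n successful sequential searches.
double averageSeqComparisons(int n)
{
    std::vector<int> list(n);
    for (int i = 0; i < n; ++i)
        list[i] = i;
    long total = 0;
    for (int i = 0; i < n; ++i)
    {
        long c = 0;
        seqSearchCount(list, i, c);
        total += c;
    }
    return static_cast<double>(total) / n;
}

// Largest iteration count over every successful and unsuccessful search.
int maxBinaryIterations(int n)
{
    std::vector<int> list(n);
    for (int i = 0; i < n; ++i)
        list[i] = 2 * i;                        // even keys present, odd keys absent
    int worst = 0;
    for (int x = -1; x <= 2 * n; ++x)
    {
        int it = 0;
        binarySearchCount(list, x, it);
        if (it > worst)
            worst = it;
    }
    return worst;
}
```

Under these assumptions, `averageSeqComparisons(n)` should agree with (n + 1) / 2, and `maxBinaryIterations(n)` should stay within the "at most 11 iterations" bound the slides give for lists of roughly 1000 elements.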
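The worked hash-function examples in the slides (mid-square, folding, division, and the pseudorandom method) can be reproduced with a few small functions. This is only a sketch under stated assumptions: keys are decimal integers, the mid-square version takes the three middle decimal digits of an 8-digit square (matching the 9452 example), folding splits the key into 3-digit groups starting from the right, and all function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Mid-square for keys 0..9999 and a table of size 1000:
// square the key, treat the square as 8 digits, keep digits 4 to 6.
int midSquareHash(int key)
{
    long long square = static_cast<long long>(key) * key;
    return static_cast<int>((square / 100) % 1000);   // drop 2 low digits, keep 3
}

// Folding: add up the 3-digit groups of a non-negative decimal key.
// The result may exceed the table size and would still need reducing.
int foldSum(long long key)
{
    int sum = 0;
    while (key > 0)
    {
        sum += static_cast<int>(key % 1000);   // lowest 3-digit group
        key /= 1000;
    }
    return sum;
}

// Division method for string keys: sum of character codes modulo table size.
int divisionHash(const std::string& key, int tableSize)
{
    int sum = 0;
    for (char ch : key)
        sum += static_cast<int>(static_cast<unsigned char>(ch));
    return sum % tableSize;
}

// Pseudorandom method: H(x) = (a*x + c) % tableSize.
int pseudoRandomHash(long long x, long long a, long long c, int tableSize)
{
    return static_cast<int>((a * x + c) % tableSize);
}
```

With the values from the slides, `midSquareHash(9452)` gives 403, `foldSum(123456789)` gives 1368, and `pseudoRandomHash(121267, 17, 7, 307)` gives 41.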
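The probe sequences listed on the linear probing, quadratic probing, and double hashing slides can be generated directly from their formulas. The sketch below is illustrative only; the function names and the `count` parameter (how many probes to list) are assumptions. The quadratic version adds the odd increments 1, 3, 5, ... as the `insert` code in the slides does, relying on i^2 = 1 + 3 + ... + (2i - 1).

```cpp
#include <cassert>
#include <vector>

// Linear probing: t, (t + 1) % m, (t + 2) % m, ...
std::vector<int> linearProbe(int home, int count, int tableSize)
{
    std::vector<int> seq;
    for (int i = 0; i < count; ++i)
        seq.push_back((home + i) % tableSize);
    return seq;
}

// Quadratic probing: t, (t + 1) % m, (t + 4) % m, (t + 9) % m, ...
// built incrementally by adding 1, 3, 5, ... to the previous slot.
std::vector<int> quadraticProbe(int home, int count, int tableSize)
{
    std::vector<int> seq;
    int index = home % tableSize;
    int inc = 1;
    for (int i = 0; i < count; ++i)
    {
        seq.push_back(index);
        index = (index + inc) % tableSize;
        inc += 2;
    }
    return seq;
}

// Double hashing: (h(X) + i * h1(X)) % m, where step plays the role of h1(X).
std::vector<int> doubleHashProbe(int home, int step, int count, int tableSize)
{
    std::vector<int> seq;
    for (int i = 0; i < count; ++i)
        seq.push_back((home + i * step) % tableSize);
    return seq;
}
```

For HTSize = 101, `quadraticProbe(96, 5, 101)` reproduces 96, 97, 100, 4, 11 and `doubleHashProbe(83, 6, 5, 101)` reproduces 83, 89, 95, 0, 6, the sequences given in the slides.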
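The deletion slide points out that emptying a slot can break later searches in an open-addressing table. One common remedy is to mark a removed slot as "deleted" rather than "empty", so that searches keep probing past it. The sketch below assumes integer keys, linear probing, and status codes 0 (never used), 1 (occupied), and -1 (deleted); the `ProbeTable` type and these codes are assumptions for illustration, not the slides' `hashT` class.

```cpp
#include <cassert>
#include <vector>

// Hypothetical open-addressing table with linear probing and deletion marks.
struct ProbeTable
{
    std::vector<int> keys;
    std::vector<int> status;   // 0 = never used, 1 = occupied, -1 = deleted

    explicit ProbeTable(int size) : keys(size, 0), status(size, 0) {}

    int size() const { return static_cast<int>(keys.size()); }
    int home(int key) const { return key % size(); }   // assumes key >= 0

    // Returns the slot holding key, or -1. Stops only at a never-used slot.
    int search(int key) const
    {
        int idx = home(key);
        for (int probes = 0; probes < size() && status[idx] != 0; ++probes)
        {
            if (status[idx] == 1 && keys[idx] == key)
                return idx;
            idx = (idx + 1) % size();
        }
        return -1;
    }

    // Inserts key into the first reusable slot on its probe sequence.
    // Returns false for a duplicate key or a full table.
    bool insert(int key)
    {
        int idx = home(key);
        int firstFree = -1;
        for (int probes = 0; probes < size(); ++probes)
        {
            if (status[idx] == 1 && keys[idx] == key)
                return false;                 // duplicate
            if (status[idx] != 1 && firstFree == -1)
                firstFree = idx;              // remember first empty/deleted slot
            if (status[idx] == 0)
                break;                        // key cannot lie beyond this slot
            idx = (idx + 1) % size();
        }
        if (firstFree == -1)
            return false;                     // table is full
        keys[firstFree] = key;
        status[firstFree] = 1;
        return true;
    }

    // Marks the slot as deleted instead of empty so later searches continue.
    bool remove(int key)
    {
        int idx = search(key);
        if (idx == -1)
            return false;
        status[idx] = -1;
        return true;
    }
};
```

In a table of size 7, keys 3 and 10 share home position 3. After removing 3, a search for 10 still succeeds because slot 3 is marked deleted rather than empty, and a later insert can reuse that slot.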