
                         DEPARTMENT OF SOFTWARE ENGINEERING
                         COLLEGE OF INFORMATION TECHNOLOGY
                              UNIVERSITI TENAGA NASIONAL

                          CSEB324 DATA STRUCTURES & ALGORITHM


                                       SEARCHING


One of the most common and most time-consuming operations in computer science is
searching, the process used to find the location of a target among a list of objects. In this
chapter we will study two basic search algorithms, the sequential search and the binary
search. The sequential search can be used on any list and is the usual way to locate data
in a linked list, while the binary search requires the list to be sorted and stored in an array.


We will consider methods of searching a large amount of data to find one particular
piece of information.

Some terms to remember:
      An element in a file is called a record.
      A table or file is a group of elements (records).
      A key is used to differentiate among records. A key can be simple or complex, and it
      must be unique.


A searching algorithm accepts an argument a and tries to find a record whose key is a.
As the result, it can return either the whole record or, more commonly, just a pointer to
the record.
This searching process is also called data retrieval.
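To make these terms concrete, here is a small C sketch of retrieval that returns a pointer to the matching record rather than a copy. The record layout (struct record with a key and a name) and the function name retrieve are hypothetical, chosen only for illustration, and keys are assumed to be ints.

```c
#include <stddef.h>

/* A hypothetical record type: a key plus some other data. */
struct record {
    int key;
    char name[20];
};

/* Looks for the record whose key equals the argument a.
   Returns a pointer to that record, or NULL if no record has key a. */
struct record *retrieve(struct record r[], int n, int a)
{
    int i;
    for (i = 0; i < n; i++)
        if (r[i].key == a)
            return &r[i];       /* pointer to the record, not a copy */
    return NULL;
}
```

Returning &r[i] lets the caller read or update the record in place, and a NULL result signals that no record has key a.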




                                          Page 1 of 9
Sequential Searching

This is the simplest searching method. It can be used even when the list is not ordered.
Generally, you will use this technique only for small lists or lists that are not searched
often. In other cases you should first sort the list and then search it using the binary
search discussed later.

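Assume
     k is an array containing n integer keys, k[0] ... k[n-1];
     r is an array of n records;
     k[i] is the key of r[i].

The algorithm

       int seqsearch(int k[], int n, int key)
       {
           int j;
           for (j = 0; j < n; j++)
               if (key == k[j])     /* match found: return its index */
                   return j;
           return -1;               /* no key matched the argument */
       }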

The algorithm examines each key in turn; upon finding one that matches the search
argument, it returns that key's index. If no match is found, -1 is returned.



Searching an Ordered Table

If the table is stored in ascending or descending order of the record keys, several
techniques can be used to improve the efficiency of searching. This is especially true if
the table is of fixed size. One obvious advantage of searching a sorted file over searching
an unsorted file arises when the argument key is absent from the file. In an unsorted file,
n comparisons are needed to detect this. In a sorted file, only n/2 comparisons are needed
on average, provided the data is uniformly distributed. This is because, in a file sorted in
ascending order, we know the argument is absent as soon as we encounter a key that is
greater than the argument.

Because of the simplicity and efficiency of sequential processing on sorted files, it may be
worthwhile to sort a file before searching for keys in it.
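The early exit described above can be sketched in C as follows, assuming integer keys stored in ascending order in an array k (the function name sortedseqsearch is hypothetical). The loop stops at the first key that is not smaller than the argument, so an absent key is usually detected well before the end of the file.

```c
/* Sequential search of a table sorted in ascending order (int keys assumed).
   Returns the index of key, or -1 as soon as a larger key shows it is absent. */
int sortedseqsearch(const int k[], int n, int key)
{
    int j;
    for (j = 0; j < n && k[j] < key; j++)
        ;                       /* skip keys smaller than the argument */
    if (j < n && k[j] == key)
        return j;
    return -1;                  /* hit a larger key or the end: key is absent */
}
```

With the keys of the data file in the indexed sequential search figure below (8, 14, 38, ..., 600), a search for 322 stops at 400 after examining seven keys instead of all twelve.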




Indexed Sequential Search

Another technique improves search efficiency for sorted data at the cost of more space.
This method is called the indexed sequential search method. An auxiliary table, called
the index table, is set aside in addition to the sorted file itself. Each element in the index
table consists of a key kindex and a pointer pindex to the record in the file that
corresponds to kindex.
The elements in the index, as well as the elements in the file, must be sorted on the key.


   Index table                      Data file

   kindex    pindex                 key     record
      8         0                     8        0
    115         4                    14        1
    500         8                    38        2
                                     72        3
   indexsize=4                      115        4
                                    321        5
                                    400        6
                                    412        7
                                    500        8
                                    512        9
                                    555       10
                                    600       11

The algorithm used for searching an indexed sequential file is straightforward. Let r, k,
and key be defined as before, let kindex be the array of keys in the index table, and let
pindex be the array of pointers in the index table to the actual records in the file. We
assume that the file is stored as an array, that n is the size of the file, and that indexsize is
the size of the index.

The algorithm:

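        int indexseqsearch(int k[], int n, int kindex[], int pindex[],
                           int indexsize, int key)
        {
            int j, lowlim, hilim;

            /* find the first index entry whose key is greater than key */
            for (j = 0; j < indexsize && kindex[j] <= key; j++)
                ;
            /* the record, if present, lies between two indexed records */
            if (j == 0)
                lowlim = 0;
            else
                lowlim = pindex[j-1];
            if (j == indexsize)
                hilim = n - 1;
            else
                hilim = pindex[j] - 1;
            /* sequential search of that small portion of the file */
            for (j = lowlim; j <= hilim && k[j] != key; j++)
                ;
            if (j > hilim)
                return -1;
            else
                return j;
        }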



The real advantage of the indexed sequential method is that the items in the table can
still be examined sequentially if all the records in the file must be accessed, yet the search
time for a particular item is sharply reduced. A sequential search is performed on the
smaller index rather than on the larger table. Once the correct index entry is found, a
second sequential search is performed on a small portion of the record table itself.

If the table is so large that even the use of an index does not achieve sufficient
efficiency, a second index (an index to the index) can be used.




Binary Search

The most efficient method of searching a sorted sequential table without the use of auxiliary
indices or tables is the binary search. The argument is compared with the key of the middle
element in the table. If they are equal, the search ends successfully; otherwise, either the
upper or the lower half of the table must be searched in the same manner.

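The binary search algorithm:
                int binsearch(int k[], int n, int key)
                {
                    int low = 0, hi = n - 1, mid;
                    while (low <= hi) {
                        mid = low + (hi - low) / 2;  /* (low + hi) / 2 without overflow */
                        if (key == k[mid])
                            return mid;
                        if (key < k[mid])
                            hi = mid - 1;            /* continue in the lower half */
                        else
                            low = mid + 1;           /* continue in the upper half */
                    }
                    return -1;                       /* key is not in the table */
                }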

Each comparison in the binary search halves the number of remaining candidates. Thus, the
maximum number of key comparisons (probes of a middle element) is approximately log2 n;
more precisely, at most floor(log2 n) + 1 middle elements are examined.
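To check the log2 n bound, the sketch below counts probes, that is, loop iterations that each examine one middle element. The probes parameter and the function name binsearch_count are hypothetical additions for illustration; the search itself follows the binary search algorithm above, with int keys assumed sorted in ascending order.

```c
/* Binary search that also reports, through *probes, how many middle
   elements were examined. Keys are ints in ascending order. */
int binsearch_count(const int k[], int n, int key, int *probes)
{
    int low = 0, hi = n - 1, mid;
    *probes = 0;
    while (low <= hi) {
        mid = low + (hi - low) / 2;
        (*probes)++;                /* one more middle element examined */
        if (key == k[mid])
            return mid;
        if (key < k[mid])
            hi = mid - 1;
        else
            low = mid + 1;
    }
    return -1;
}
```

For a table holding the keys 1 to 1000, an unsuccessful search for a key larger than all of them takes 10 probes, which matches floor(log2 1000) + 1 = 9 + 1 = 10 (log2 1000 is about 9.97).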

Note that the binary search may be used in conjunction with the indexed sequential table.
Instead of searching the index sequentially, a binary search can be used. The binary
search can also be used in searching the main table once the two boundary records are
identified.
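As a sketch of this combination, the C function below replaces the sequential scan of the index with a binary search for the last index entry whose key does not exceed the argument, then scans only that block of the file. It assumes int keys and the same meanings of kindex, pindex, and indexsize as in the indexed sequential algorithm; the name indexbinsearch is hypothetical.

```c
/* Indexed sequential search in which the index is searched with a binary
   search instead of sequentially. Keys are ints, sorted in file and index. */
int indexbinsearch(const int k[], int n, const int kindex[],
                   const int pindex[], int indexsize, int key)
{
    int low = 0, hi = indexsize - 1, mid, pos = -1, j, lowlim, hilim;

    /* binary search for the last index entry with kindex[pos] <= key */
    while (low <= hi) {
        mid = low + (hi - low) / 2;
        if (kindex[mid] <= key) {
            pos = mid;          /* candidate; look for a later one */
            low = mid + 1;
        } else {
            hi = mid - 1;
        }
    }

    /* same block limits as the sequential version, with pos + 1 playing j */
    lowlim = (pos == -1) ? 0 : pindex[pos];
    hilim = (pos == indexsize - 1) ? n - 1 : pindex[pos + 1] - 1;

    /* sequential search of that block of the file */
    for (j = lowlim; j <= hilim; j++)
        if (k[j] == key)
            return j;
    return -1;
}
```

Using the data file from the indexed sequential figure together with the three index entries it shows (keys 8, 115, 500 pointing to records 0, 4, 8), a search for 321 narrows the scan to records 4 through 7 and returns 5.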

Unfortunately, binary search can be used only if the data is stored in an array. This is
because it relies on the indices of array elements being consecutive integers, so that the
middle element of any range can be reached directly.

To search for an element, perform a binary search on the element array. If the argument
key is not found, the element does not exist in the table.




Tree Searching

Continuing from the previous chapter, we derive the algorithm for searching a binary search
tree as follows:

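       #include <stddef.h>

       struct node { int key; struct node *left, *right; };

       struct node *treesearch(struct node *tree, int key)
       {
           struct node *p = tree;
           while (p != NULL && key != p->key)
               if (key < p->key)
                   p = p->left;     /* key can only be in the left subtree */
               else
                   p = p->right;    /* key can only be in the right subtree */
           return p;                /* NULL if key is not in the tree */
       }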

The advantage of using a binary search tree over a sorted array is that a tree enables search,
insertion, and deletion operations to be performed efficiently. If a sorted array is used, an
insertion or deletion requires, on average, approximately half of the array to be moved. With
a binary search tree, on the other hand, only a few pointer adjustments are needed for
insertion and deletion.
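To illustrate why insertion needs only a pointer adjustment, here is a minimal C sketch of insertion into a binary search tree. The node type struct tnode and the names bst_insert and bst_search are hypothetical, and ignoring duplicate keys is an assumption of this sketch. The new node is attached at the NULL link where the search for its key ends, so no existing node moves.

```c
#include <stdlib.h>

struct tnode {
    int key;
    struct tnode *left, *right;
};

/* Adds key to the tree whose root pointer is *tree (duplicates are ignored).
   Only the single NULL link where the key belongs is changed. */
void bst_insert(struct tnode **tree, int key)
{
    struct tnode **link = tree, *q;
    while (*link != NULL) {
        if (key == (*link)->key)
            return;
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    q = malloc(sizeof *q);
    if (q == NULL)
        return;                     /* out of memory: tree left unchanged */
    q->key = key;
    q->left = q->right = NULL;
    *link = q;                      /* the one pointer adjustment */
}

/* Same loop as the tree search algorithm: NULL if key is absent. */
struct tnode *bst_search(struct tnode *p, int key)
{
    while (p != NULL && key != p->key)
        p = (key < p->key) ? p->left : p->right;
    return p;
}
```

Inserting 50, 30, 70, 40 in that order gives a tree with 50 at the root, 30 as its left child, and 40 as the right child of 30.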




Hashing

In the searching methods discussed so far, we assumed that the record being sought is stored
in a table and that it is necessary to pass through some number of keys before finding the
desired one. The organization of the file and the order in which the keys are inserted affect
the number of keys that must be inspected before obtaining the desired one. Obviously,
efficient search techniques are those that minimize the number of comparisons. Optimally,
we would like a table organization and search technique in which there are no
unnecessary comparisons.
If each key is to be retrieved in a single access, the location of the record within the table
can depend only on the key; it may not depend on the locations of other keys, as it does in
a tree. The most efficient way to organize such a table is an array. If the record keys are
integers, the keys themselves can serve as indices to the array.

Assume you have the following declaration of an array that represents a collection of data:

       partype part[100];

where part[i] represents the record whose part number is i. In this situation, the part
numbers are keys that are used as indices to the array. Even if the total number of parts
is fewer than 100, the same structure can be used to maintain the data. Although many
locations in part may correspond to nonexistent keys, this waste is offset by the advantage
of direct access to each of the existing parts.
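A minimal C sketch of this direct organization follows. The text does not give the fields of partype, so the ones below (exists and quantity) are hypothetical, as are the function names storepart and findpart. Finding a part takes a single array access and no key comparisons.

```c
#include <stddef.h>

#define MAXPARTS 100

/* Hypothetical contents of a part record: the fields are illustrative only. */
typedef struct {
    int exists;          /* 1 if part number i is in use */
    int quantity;
} partype;

static partype part[MAXPARTS];   /* part[i] is the record for part number i */

/* Stores a record directly at index partno; returns 0, or -1 if out of range. */
int storepart(int partno, int quantity)
{
    if (partno < 0 || partno >= MAXPARTS)
        return -1;
    part[partno].exists = 1;
    part[partno].quantity = quantity;
    return 0;
}

/* One array access and no key comparisons; NULL for a nonexistent part. */
partype *findpart(int partno)
{
    if (partno < 0 || partno >= MAXPARTS || !part[partno].exists)
        return NULL;
    return &part[partno];
}
```

The unused slots are the "waste" mentioned above: they still occupy space even though no part has that number.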

Unfortunately, however, such a system is not always practical. For example, suppose that
a company has an inventory file of more than 100 items and the key to each record is a
seven-digit part number. To use direct indexing with the entire seven-digit key, an array
of 10 million elements is needed. This clearly wastes an unacceptably large amount of
space.

What is necessary is some method of converting a key into an integer within a limited
range. Ideally, no two keys should be converted into the same integer. Unfortunately,
such an ideal method usually does not exist. Let us develop a method to solve this
problem.

Let us reconsider the example of records keyed by a seven-digit part number. Suppose that
the company has fewer than 1000 parts and that there is only a single record for each part.
Then an array of 1000 elements is sufficient to store the entire file. The array is indexed
by an integer between 0 and 999 inclusive. The last three digits of the part number are used
as the index for the part’s record in the array.

A function that transforms a key into a table index is called a hash function. If h is a
hash function and key is a key, h(key) is called the hash of key and is the index at which
a record with the key key should be placed. If r is a record whose key hashes into hr,
hr is called the hash key of r. The hash function in the preceding example is
h(key) = key % 1000. The values that h produces should cover the entire set of indices in
the table. For example, the function x % 1000 can produce any integer between 0 and 999,
depending on the value of x. As we shall see shortly, it is a good idea for the table size to
be somewhat larger than the number of records that are to be inserted.
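The example hash function, and the clash it can cause, can be sketched in C as follows. The table holds only keys, and the EMPTY marker assumes that 0 is never a valid part number; the names slot, h, and place are hypothetical. Keys are stored as long because a seven-digit number may not fit in a 16-bit int.

```c
#define TABLESIZE 1000
#define EMPTY 0L      /* assumption: 0 is never used as a part number */

static long slot[TABLESIZE];   /* slot[i] is the key stored at index i */

/* h(key) = key % 1000: the last three digits of the part number */
int h(long key)
{
    return (int)(key % TABLESIZE);
}

/* Stores key at index h(key) and returns that index.
   Returns -1 if a different key already occupies the position. */
int place(long key)
{
    int i = h(key);
    if (slot[i] != EMPTY && slot[i] != key)
        return -1;                     /* hash clash */
    slot[i] = key;
    return i;
}
```

Here h(1234567) and h(7654567) are both 567, so placing the second key reports a clash, which is the situation the following paragraph discusses.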

The foregoing method has a flaw. Suppose two keys k1 and k2 are such that h(k1)
equals h(k2). When a record with key k1 is entered into the table, it is inserted at
position h(k1). But when k2 is hashed, because its hash key is the same as that of k1, an
attempt may be made to insert the record into the same position where the record with
key k1 is stored. Clearly, two records cannot occupy the same position. Such a
situation is called a hash collision or a hash clash.

There are two basic methods of dealing with a hash clash. The first technique, called
rehashing, involves applying a secondary hash function (the rehash function) to the hash
key of the item. The rehash function is applied successively until an empty position is
found where the item can be inserted.
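The sketch below shows rehashing in C with one simple choice of rehash function, rh(i) = (i + 1) % TABLESIZE, which tries the next position and wraps around (often called linear probing). This particular rh, the EMPTY marker of -1 (which assumes keys are never negative), and the function names are assumptions for illustration; other rehash functions are possible.

```c
#define TABLESIZE 1000
#define EMPTY (-1L)          /* assumption: keys are never negative */

static long table[TABLESIZE];

/* marks every position of the table as empty */
void rehash_init(void)
{
    int i;
    for (i = 0; i < TABLESIZE; i++)
        table[i] = EMPTY;
}

static int h(long key) { return (int)(key % TABLESIZE); }

/* one possible rehash function: the next position, wrapping around */
static int rh(int i) { return (i + 1) % TABLESIZE; }

/* inserts key; returns its position, or -1 if the table is full */
int rehash_insert(long key)
{
    int i = h(key), tries;
    for (tries = 0; tries < TABLESIZE; tries++) {
        if (table[i] == EMPTY || table[i] == key) {
            table[i] = key;
            return i;
        }
        i = rh(i);           /* clash: apply the rehash function */
    }
    return -1;
}

/* returns the position of key, or -1 if it is not in the table */
int rehash_search(long key)
{
    int i = h(key), tries;
    for (tries = 0; tries < TABLESIZE && table[i] != EMPTY; tries++) {
        if (table[i] == key)
            return i;
        i = rh(i);
    }
    return -1;
}
```

Inserting 1234567 and then 7654567 places the first at position 567 and, after one rehash, the second at position 568. A search follows the same sequence of positions and stops either at the key or at the first empty position.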

The second technique, called chaining, builds a linked list of all the items whose keys
hash to the same location. During a search, this short linked list is traversed sequentially
for the desired key.
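Chaining can be sketched in C with an array of list heads, one per hash value, reusing h(key) = key % 1000 from the earlier example. Adding new keys at the front of their list is a choice made for this sketch; the type and function names are hypothetical, and keys are assumed to be non-negative.

```c
#include <stdlib.h>

#define TABLESIZE 1000

struct chainnode {
    long key;
    struct chainnode *next;
};

/* bucket[i] heads the list of keys whose hash is i; static, so all NULL */
static struct chainnode *bucket[TABLESIZE];

static int hashkey(long key) { return (int)(key % TABLESIZE); }

/* adds key at the front of its list; returns 0, or -1 if out of memory */
int chain_insert(long key)
{
    int i = hashkey(key);
    struct chainnode *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    p->key = key;
    p->next = bucket[i];
    bucket[i] = p;
    return 0;
}

/* walks the short list for key's hash value; NULL if key is absent */
struct chainnode *chain_search(long key)
{
    struct chainnode *p;
    for (p = bucket[hashkey(key)]; p != NULL; p = p->next)
        if (p->key == key)
            return p;
    return NULL;
}
```

Because 1234567 and 7654567 both hash to 567, they end up in the same list, and a search for either one walks at most that list.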


Choosing a Hash Function

Let us now turn to the question of how to choose a good hash function. Clearly, the
function should produce as few clashes as possible; that is, it should spread the keys
uniformly over the possible array indices. Of course, unless the keys are known in
advance, it cannot be determined whether a particular hash function disperses them
properly.

1. Direct Hashing
   The key is the address without any algorithmic manipulation. The data structure must
   therefore contain an element for every possible key. While the situations where you
   can use direct hashing are limited, when it can be used it is very powerful because it
   guarantees that there are no synonyms (different keys that hash to the same address).

2. Subtraction method
   Sometimes we have keys that are consecutive but do not start from one. For example,
   a company may have only 100 employees, but the employee numbers start from 1001
   and go to 1100. In this case we use a very simple hashing function that subtracts
   1000 from the key to determine the address. The beauty of this method is that it is
   simple and guarantees no collisions. Its limitations are similar to those of direct
   hashing: it can be used only for small lists.
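For the employee example, the subtraction method reduces to one line of C. The name subhash is hypothetical, and the key is assumed to lie in the consecutive range described above, since keys outside that range would produce addresses outside the table.

```c
/* Subtraction hashing: address = employee number - 1000.
   Assumes the key lies in the consecutive range given in the text. */
int subhash(int key)
{
    return key - 1000;
}
```

Since consecutive keys map to consecutive addresses, two different employee numbers can never produce the same address.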

3. Division method
   This method divides the key by the array size and uses the remainder plus one for the
   address, so the addresses run from 1 to listSize. This gives the simple hashing
   algorithm shown below, where listSize is the number of elements in the array.
       address = key % listSize + 1
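The division method translates directly into C. The function name divhash is hypothetical; keys are taken as long and assumed non-negative, and the +1 gives addresses from 1 to listSize, matching a table whose positions are numbered from 1.

```c
/* Division method: address = key % listSize + 1, giving addresses
   1..listSize (a table numbered from 1). Keys are assumed non-negative. */
int divhash(long key, int listSize)
{
    return (int)(key % listSize) + 1;
}
```

With listSize = 307 (an illustrative choice; a prime list size is commonly recommended to spread keys more evenly), key 121267 gives 121267 % 307 + 1 = 2 + 1 = 3, and key 379452, which is exactly 307 × 1236, gives address 1.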

4. Digit Extraction method
   Selected digits are extracted from the key and used as the address. For example,
   using a six-digit employee number to hash to a three-digit address (000-999), we
   could select the first, third, and fourth digits (from the left) and use them as the
   address.
   Example:
       379452 → 394
       121267 → 112
       378845 → 388
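Digit extraction for the example above can be sketched in C. The helper digit6, which picks the pos-th digit counting from the left of a six-digit key, and the name extracthash are hypothetical; keys are assumed to have exactly six digits.

```c
/* Returns the pos-th digit (pos = 1 is the leftmost) of a six-digit key. */
static int digit6(long key, int pos)
{
    int i;
    for (i = pos; i < 6; i++)   /* drop the 6 - pos rightmost digits */
        key /= 10;
    return (int)(key % 10);
}

/* Digit extraction: the first, third and fourth digits form the address. */
int extracthash(long key)
{
    return digit6(key, 1) * 100 + digit6(key, 3) * 10 + digit6(key, 4);
}
```

For 379452 the first, third, and fourth digits are 3, 9, and 4, giving address 394, as in the example.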



