					       Algorithms and Complexity 2003
                Assignment 2

                              August 13, 2003

    Due date: Wednesday August 20, 2003.
    Total marks: 108.

1     Preliminaries [no marks for this, but you need
      it to test and demonstrate your implementations]
The algorithms you need to implement for this assignment operate on a randomly
ordered list of numbers. To test your implementations, you need to be able to
generate input for your programs. Write a program that outputs a list of N
random integers to a file, in a format that your programs can read as input.
The programs that follow should give the user a choice between entering input
via the keyboard and providing it in a file. You should first test your
programs using short lists (keyboard input) and then investigate their
performance on long lists (e.g. N > 1000) generated using this program.
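
As a starting point, the generator could look like the following minimal Python sketch. The function names, the one-integer-per-line file format, and the default value range are assumptions made for illustration; the assignment leaves these choices to you.

```python
import random


def write_random_integers(path, n, low=0, high=1_000_000, seed=None):
    """Write n random integers in [low, high] to path, one per line.

    The file format and the default range are illustrative assumptions.
    """
    rng = random.Random(seed)  # a fixed seed makes a test run repeatable
    with open(path, "w") as output_file:
        for _ in range(n):
            output_file.write(f"{rng.randint(low, high)}\n")


def read_integers(path):
    """Read whitespace-separated integers back from path."""
    with open(path) as input_file:
        return [int(token) for token in input_file.read().split()]
```

The same whitespace-splitting approach can be applied to a line typed at the keyboard, so both input routes can share one parser.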

2     Selection [63 marks]
The K Nearest Neighbours algorithm can be used in many applications of Pattern
Recognition, for instance image recognition, speech recognition, classification
of EEG signals, etc. The central problem in this algorithm is to find the K
largest (or smallest) numbers in a list of N numbers. This is related to the
selection problem.

2.1    [5 marks]
Figure 1 gives an algorithm for solving the selection problem, i.e. to find the
Kth largest element of a list. Show that the average-case complexity of this
algorithm is O(KN).

FindKthLargest( list, N, K )
  list: the values to look through
  N   : the size of the list
  K   : the element to select

  for i = 1:K
    largest = list[1]
    largestLocation = 1
    for j = 2:N-(i-1)
      if list[j] > largest
        largest = list[j]
        largestLocation = j
      end if
    end for
    Swap( list[N-(i-1)], list[largestLocation] )
  end for
  return largest

              Figure 1: Pseudocode for direct selection algorithm
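
One way to start the analysis is to count the comparisons `list[j] > largest` made by Figure 1. On pass i the inner loop runs for j = 2, ..., N-(i-1), which is N-i comparisons regardless of how the list is ordered. The sum below is a sketch of that count, not a complete answer:

```latex
\[
\sum_{i=1}^{K} (N - i) \;=\; KN - \frac{K(K+1)}{2} \;\le\; KN
\]
```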

2.2    [20 marks]
Implement the algorithm in Figure 1. Next, adapt it so that, rather than
outputting only the Kth largest element, it outputs the K largest elements,
thus solving the problem mentioned above. To obtain a mark you need to
demonstrate both versions of your program.
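
The following Python sketch shows one possible shape for both versions. It is an illustration rather than a model answer: the function names are hypothetical, indices are 0-based (Figure 1 is 1-based), the input is copied instead of modified in place, and the K largest values are returned in descending order by assumption.

```python
def _move_k_largest_to_end(values, k):
    """Run k passes of Figure 1 on a copy and return the rearranged copy."""
    items = list(values)  # assumption: leave the caller's list untouched
    n = len(items)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")
    for i in range(k):
        last = n - 1 - i  # last index of the part not yet settled
        largest_location = 0
        for j in range(1, last + 1):
            if items[j] > items[largest_location]:
                largest_location = j
        # Move the largest remaining value to the end of the unsettled part.
        items[last], items[largest_location] = items[largest_location], items[last]
    return items


def find_kth_largest(values, k):
    """Return the k-th largest value (k = 1 gives the maximum)."""
    items = _move_k_largest_to_end(values, k)
    return items[len(items) - k]


def find_k_largest(values, k):
    """Return the k largest values, largest first."""
    items = _move_k_largest_to_end(values, k)
    return items[len(items) - k:][::-1]
```

For example, `find_kth_largest([5, 3, 9, 1, 7], 2)` returns 7 and `find_k_largest([5, 3, 9, 1, 7], 2)` returns `[9, 7]`.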

2.3    [5 marks]
Figure 2 gives a recursive algorithm for solving the selection problem, using the
divide and conquer technique.
   Partition is a function that chooses a value from the list (e.g. the first
element) and then rearranges the list so that all values larger than the chosen
value are after it in the list and all values smaller than the chosen value are
before it in the list. After the rearrangement, the position of the chosen value
as counted from the end of the list is returned in the variable middle (i.e. if
middle = 3 the chosen value is the 3rd largest element in the list). You’ll need
to design your own pseudocode for this function.
   Show that the average-case complexity of this algorithm is only about 2N
comparisons, giving a substantial improvement over the previous algorithm if
K is large.

KthLargestRecursive( list, start, end, K )
  list : the list of values
  start: the index of the first value to consider
  end  : the index of the last value to consider
  K    : the element to select

    if start < end
      Partition( list, start, end, middle )
      middleLocation = end-middle+1
      if middle = K
        return list[middleLocation]
      else
        if K < middle
          return KthLargestRecursive( list, middleLocation+1, end, K)
        else
          return KthLargestRecursive( list, start, middleLocation-1, K-middle)
        end if
      end if
    else
      return list[start]
    end if

       Figure 2: Pseudocode for Divide and Conquer selection algorithm
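
A minimal Python sketch of Figure 2 follows, together with one possible Partition. Several details are assumptions: indices are 0-based, the first element of the range is used as the chosen value, `partition` returns `middle` instead of passing it back through a parameter, and a range of a single element returns that element directly.

```python
def partition(items, start, end):
    """Partition items[start..end] around the value items[start].

    Afterwards smaller values come before the chosen value and the
    remaining values come after it. Returns the chosen value's rank
    counted from the end of the range (1 means it is the largest).
    """
    pivot = items[start]
    boundary = start  # last index known to hold a value smaller than pivot
    for j in range(start + 1, end + 1):
        if items[j] < pivot:
            boundary += 1
            items[boundary], items[j] = items[j], items[boundary]
    items[start], items[boundary] = items[boundary], items[start]
    return end - boundary + 1


def kth_largest_recursive(items, start, end, k):
    """Return the k-th largest value of items[start..end] (rearranges items)."""
    if start >= end:
        return items[start]  # a single remaining element must be the answer
    middle = partition(items, start, end)
    middle_location = end - middle + 1
    if middle == k:
        return items[middle_location]
    if k < middle:
        # The answer lies among the larger values, after the chosen value.
        return kth_largest_recursive(items, middle_location + 1, end, k)
    # Skip the middle largest values and search the smaller ones.
    return kth_largest_recursive(items, start, middle_location - 1, k - middle)
```

Note that with a first-element pivot, an already sorted list makes the recursion as deep as the list is long, which can exceed Python's default recursion limit for long lists; randomly ordered input does not normally trigger this.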

2.4    [3 marks]
What would the complexity be if you were to adapt this algorithm to solve the
K nearest neighbours problem? Would this be a sensible thing to do?

2.5    [30 marks]
Implement the algorithm of Figure 2. Investigate how long the list needs to
be before you can notice a difference in running time for the two selection
algorithms using K = 100.

3     Sorting [45 marks]
Implement the merge sort and quicksort algorithms. Compare the running time
of your implementation with that of selection sort, bubble sort and insertion sort
(you can reuse your implementation of these algorithms from last term; if you
do not have an implementation available you should implement them again).
Investigate how long the list needs to be before you can notice a difference in
running time.
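
One way to set up the comparison is sketched below in Python. The merge sort and insertion sort shown are simple illustrative versions, and the helper `time_sort` (a hypothetical name) reports the best of a few runs using the standard `time.perf_counter` clock; your own implementations and measurement method may differ.

```python
import time


def merge_sort(values):
    """Return a new sorted list using top-down merge sort."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left = merge_sort(values[:mid])
    right = merge_sort(values[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two tails is non-empty
    merged.extend(right[j:])
    return merged


def insertion_sort(values):
    """Return a new sorted list using standard insertion sort."""
    items = list(values)
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = key
    return items


def time_sort(sort_function, values, repeats=3):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        started = time.perf_counter()
        sort_function(values)
        best = min(best, time.perf_counter() - started)
    return best
```

Calling `time_sort` for each algorithm on lists of increasing length (for instance doubling N each time) gives a table of timings from which the crossover point can be read off.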

4    Bonus question
Answers to this optional question should be demonstrated separately
to me, and will be worth up to 40 marks, depending on the amount
of initiative displayed.
(Exercise 3.1.3:4, p. 62 of McConnell)
    When you look closely at the insertion sort algorithm, you will notice that
the insertion of a new element basically does a sequential search for the new
location. We saw in Chapter 2 that binary searches are much faster. Consider a
variation on insertion sort that does a binary search to find the correct position
to insert this new element. You should notice that for unique keys the standard
binary search would always return a failure. So for this problem, assume a
revised binary search that returns the location at which this key belongs.
    Implement and analyse the complexity (worst-case and average-case) of this
new insertion sort.
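
The variation described above could be sketched in Python as follows. The function names are hypothetical, and placing a key after any equal keys is an assumption made so that the sort stays stable. Only the search for the position changes; the elements are still shifted one place to the right as in ordinary insertion sort, which is worth keeping in mind for the analysis.

```python
def find_insert_position(items, key, low, high):
    """Return the index in the sorted range items[low:high] where key belongs.

    Unlike a standard binary search this never reports failure: it always
    returns a position, placing key after any equal keys (an assumption).
    """
    while low < high:
        mid = (low + high) // 2
        if key < items[mid]:
            high = mid
        else:
            low = mid + 1
    return low


def binary_insertion_sort(values):
    """Return a new sorted list using insertion sort with binary search."""
    items = list(values)
    for i in range(1, len(items)):
        key = items[i]
        position = find_insert_position(items, key, 0, i)
        # Shift items[position..i-1] one place right to open a gap for key.
        items[position + 1:i + 1] = items[position:i]
        items[position] = key
    return items
```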

