					     Sorting – Part II

CS 367 – Introduction to Data Structures
             Better Sorting
• The problem with all of the previous sorting algorithms
  is their O(n^2) performance
  – this may be acceptable for small data sets,
    but not for large ones
• Theoretically, O(n log n) is achievable
  – see the proof in Section 9.2 of the book
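To get a feel for the gap between the two growth rates, here is a small sketch that simply evaluates n^2 and n log2 n for a couple of sizes. These are rough operation counts, not measured running times; the class and method names are made up for illustration.

```java
public class GrowthComparison {
    // n^2: roughly the work done by the simple O(n^2) sorts
    static long quadratic(long n) {
        return n * n;
    }

    // n log2 n: roughly the work done by an O(n log n) sort
    static double nLogN(long n) {
        return n * (Math.log(n) / Math.log(2));
    }

    public static void main(String[] args) {
        for (long n : new long[] {1_000L, 1_000_000L}) {
            System.out.printf("n = %,d: n^2 = %,d, n log2 n = %,.0f%n",
                    n, quadratic(n), nLogN(n));
        }
    }
}
```

For a million items that is about 10^12 steps for n^2 against about 20 million for n log2 n.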
                 Heap Sort
• Major problem with selection sort
  – it has to search the entire unsorted back end of the
    array every time it looks for the next smallest item
  – what if we could make this search faster?
• A heap always keeps the largest element
  at the top
  – it only takes O(log n) to remove the top and
    restore the heap
  – O(log n) is much better than the O(n) search time
    of selection sort
                   Heap Sort
• Basic procedure
  – build a heap
  – swap the root with the last element
  – rebuild the heap excluding the last element
     • the last element is now in its final position
  – repeat until only one item is left in the heap
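The procedure above can be tried out with the standard library's `java.util.PriorityQueue` standing in for the heap. This is only a conceptual sketch: it copies the items into a separate heap and a separate result array, so unlike the array-based version in the following slides it is not in place. Building the heap by repeated `add` also costs O(n log n) rather than the O(n) bottom-up build described later. The names `HeapSortConcept` and `sortWithHeap` are hypothetical.

```java
import java.util.Collections;
import java.util.PriorityQueue;

public class HeapSortConcept {
    // Sorts the characters of the input using PriorityQueue as a max-heap.
    // Builds a separate result array, so this is NOT an in-place sort.
    static char[] sortWithHeap(char[] input) {
        PriorityQueue<Character> heap = new PriorityQueue<>(Collections.reverseOrder());
        for (char c : input) {
            heap.add(c);                      // build the heap, one item at a time
        }
        char[] result = new char[input.length];
        // Repeatedly remove the largest remaining item (O(log n) each) and put it
        // in the last open slot, mirroring "swap the root with the last element".
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(new String(sortWithHeap("MJZTXLN".toCharArray()))); // JLMNTXZ
    }
}
```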
            Heap Sort - Conceptually
            Z
        X       M          queue (0-6):  -  -  -  -  -  -  Z
      T   N   J   L

            X
        T       M          queue (0-6):  -  -  -  -  -  X  Z
      L   N   J

            T
        N       M          queue (0-6):  -  -  -  -  T  X  Z
      L   J
    Heap Sort - Implementation
  0   1   2   3   4   5   6
  Z   X   M   T   N   J   L
          swap 0 and 6, rebuild
  X   T   M   L   N   J   Z
          swap 0 and 5, rebuild
  T   N   M   L   J   X   Z
          swap 0 and 4, rebuild
  N   L   M   J   T   X   Z
          swap 0 and 3, rebuild
  M   L   J   N   T   X   Z
          swap 0 and 2, rebuild
  L   J   M   N   T   X   Z
          swap 0 and 1, done
  J   L   M   N   T   X   Z
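The trace above can be reproduced with a short program that starts from the heap Z X M T N J L and prints the array after each "swap 0 and i, rebuild" step. This is a sketch on a `char[]` for readability; `HeapSortTrace` and `trace` are hypothetical names, and `moveDown` follows the same idea as the rebuild code later in these slides.

```java
import java.util.ArrayList;
import java.util.List;

public class HeapSortTrace {
    // Restore the max-heap property for the subtree rooted at first,
    // looking only at indexes first..last.
    static void moveDown(char[] data, int first, int last) {
        int child = 2 * first + 1;
        while (child <= last) {
            if (child < last && data[child] < data[child + 1]) {
                child++;                          // use the larger child
            }
            if (data[first] < data[child]) {
                char tmp = data[first]; data[first] = data[child]; data[child] = tmp;
                first = child;
                child = 2 * child + 1;
            } else {
                break;                            // heap property holds again
            }
        }
    }

    // Runs the sort phase on an array that is already a heap and records
    // the array contents after each swap-and-rebuild step.
    static List<String> trace(char[] heap) {
        List<String> states = new ArrayList<>();
        for (int i = heap.length - 1; i > 0; i--) {
            char tmp = heap[0]; heap[0] = heap[i]; heap[i] = tmp;
            moveDown(heap, 0, i - 1);
            states.add(new String(heap));
        }
        return states;
    }

    public static void main(String[] args) {
        for (String state : trace("ZXMTNJL".toCharArray())) {
            System.out.println(state);
        }
    }
}
```

It prints XTMLNJZ, TNMLJXZ, NLMJTXZ, MLJNTXZ, LJMNTXZ and JLMNTXZ, one per line, matching the rows of the trace.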
            Building the Heap
• The heap will be built within the array
  – no extra data structures will be needed
• Basic idea
  – start at the last non-terminal node
  – restore the heap for the tree rooted at this node
     • swap this node with its larger child if that child
       is larger, and keep moving it down until it is at
       least as large as both of its children
  – repeat this process for all non-terminal nodes,
    working back toward the root
Building the Heap
  0   1   2   3   4   5   6
  M   J   Z   T   X   L   N
      compare Z with its children
      (no move made)
  0   1   2   3   4   5   6
  M   J   Z   T   X   L   N
      compare J with its children
      (swap it with X)
  0   1   2   3   4   5   6
  M   X   Z   T   J   L   N
      compare M with its children
      (swap it with Z and then N)
  0   1   2   3   4   5   6
  Z   X   N   T   J   L   M        Valid Heap
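The bottom-up build in this example can be checked with a small sketch that calls a move-down step on every non-terminal node from index n/2 - 1 back to the root. It works on a `char[]` so the letters match the slide; `BuildHeap` and `buildHeap` are hypothetical names.

```java
public class BuildHeap {
    // Restore the max-heap property for the subtree rooted at first,
    // looking only at indexes first..last.
    static void moveDown(char[] data, int first, int last) {
        int child = 2 * first + 1;
        while (child <= last) {
            if (child < last && data[child] < data[child + 1]) {
                child++;                          // use the larger child
            }
            if (data[first] < data[child]) {
                char tmp = data[first]; data[first] = data[child]; data[child] = tmp;
                first = child;
                child = 2 * child + 1;
            } else {
                break;
            }
        }
    }

    // Start at the last non-terminal node (index n/2 - 1) and work back to the root.
    static char[] buildHeap(char[] data) {
        for (int i = data.length / 2 - 1; i >= 0; i--) {
            moveDown(data, i, data.length - 1);
        }
        return data;
    }

    public static void main(String[] args) {
        System.out.println(new String(buildHeap("MJZTXLN".toCharArray()))); // ZXNTJLM
    }
}
```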
                  Building the Heap
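• Code to re-build the heap (move an element down)
      // swap(data, i, j) exchanges data[i] and data[j]
      <T> void swap(T[] data, int i, int j) {
          T tmp = data[i]; data[i] = data[j]; data[j] = tmp;
      }

      <T extends Comparable<? super T>> void moveDown(T[] data, int first, int last) {
          int child = 2 * first + 1;                  // left child of first
          while (child <= last) {
              // if there is a right child, pick the larger of the two
              if (child < last && data[child].compareTo(data[child + 1]) < 0) { child++; }
              if (data[first].compareTo(data[child]) < 0) {   // parent is smaller
                  swap(data, first, child);
                  first = child;
                  child = 2 * child + 1;
              }
              else { break; }                         // heap property restored
          }
      }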
                           Heap Sort
• Code to build the heap and sort it
    <T extends Comparable<? super T>> void heapSort(T[] data) {
        // build the heap, starting at the last non-terminal node
        for (int i = data.length / 2 - 1; i >= 0; i--)
            moveDown(data, i, data.length - 1);

        // now sort it: move the largest item to the end, then rebuild
        for (int i = data.length - 1; i > 0; i--) {
            swap(data, 0, i);
            moveDown(data, 0, i - 1);
        }
    }
                 Heap Sort
• Time to build the heap in the worst case
  – O(n)
  – proof can be found in Section 6.9.2 of the book
• Number of swaps of the root with the last element
  – always (n - 1)
• Total time to rebuild the heap after those swaps
  – O(n log n): n - 1 rebuilds at O(log n) each
• Overall performance
  – O(n) + (n - 1) + O(n log n) = O(n log n)
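One way to see the O(n log n) rebuild bound: after each swap, moveDown follows at most one path from the root down to a leaf, and a heap holding k items has height ⌊log2 k⌋. Summing over the n - 1 rebuilds, whose heap sizes run from n - 1 down to 1:

```latex
\sum_{k=1}^{n-1} \lfloor \log_2 k \rfloor \;\le\; (n-1)\log_2 n \;=\; O(n \log n)
```

Each level moved costs at most two comparisons (pick the larger child, then compare it with the parent), so the comparison count is O(n log n) as well.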
                      Quicksort
• Basic procedure
  – divide the initial array into two parts
     • every element in the left side must be no larger
       than every element in the right side
  – sort the two arrays separately and put them back
    together
     • we now have a completely sorted array
  – however, before sorting the two arrays, divide them
    each into two more arrays
     • we now have a total of 4 arrays
     • smallest elements in far left and largest in far right
  – repeat this process until only 1-element arrays remain
     • put them all together and the overall array is sorted
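The divide-and-recombine idea can be sketched with lists before looking at the in-place version. This sketch builds new lists at every level (it is not the array implementation that follows), uses the middle element as the bound, and keeps elements equal to the bound in their own group; `QuickSortConcept` is a hypothetical name, and `List.of` in `main` needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class QuickSortConcept {
    // Split the items around a bound into a "smaller" part and a "larger" part,
    // sort each part recursively, then put the pieces back together.
    static List<Integer> quickSort(List<Integer> items) {
        if (items.size() < 2) {
            return new ArrayList<>(items);            // 0 or 1 element: already sorted
        }
        int bound = items.get(items.size() / 2);      // middle element as the bound
        List<Integer> smaller = new ArrayList<>();
        List<Integer> equal = new ArrayList<>();
        List<Integer> larger = new ArrayList<>();
        for (int x : items) {
            if (x < bound) smaller.add(x);
            else if (x > bound) larger.add(x);
            else equal.add(x);
        }
        List<Integer> result = quickSort(smaller);    // sorted left part
        result.addAll(equal);                         // the bound (and any duplicates)
        result.addAll(quickSort(larger));             // sorted right part
        return result;
    }

    public static void main(String[] args) {
        System.out.println(quickSort(List.of(5, 3, 2, 1, 4, 6, 8))); // [1, 2, 3, 4, 5, 6, 8]
    }
}
```

The recursion always terminates because the bound itself goes into `equal`, so both recursive calls get strictly smaller lists.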
                            Quicksort
  0   1   2   3   4   5   6
  M   J   Z   X   T   L   N
          break into two parts

  0   1   2   3          0   1   2
  M   J   L   N          Z   X   T

          break into four parts

  0   1      0   1      0   1      0
  J   L      M   N      X   T      Z

          break into 7 parts

  0      0      0      0      0      0      0
  J      L      M      N      T      X      Z
         Quicksort - Implementing
•    Steps
    1. move the largest value to the highest spot
         –   it acts as a sentinel: the scan for elements smaller
             than the bound always stops before running off the
             end of the array
    2. pick an upper bound for the left sub-array
         –   pick the value in the center of the array
         –   move it to the first element so it doesn’t get moved
    3. move all elements less than the bound to the left side
    4. move all elements greater than the bound to the right side
    5. the bound will now be in its final position
    6. repeat with the two new sub-arrays
         –   from the first index to index(bound) - 1
         –   from index(bound) + 1 to the last index
            Quicksort - Implementing
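<T extends Comparable<? super T>> void quickSort(T[] data) {
    if (data.length < 2) { return; }
    int max = 0;

    // find the highest value and put it in the top spot
    // (it then acts as a sentinel for the scans below)
    for (int i = 1; i < data.length; i++)
        if (data[i].compareTo(data[max]) > 0) { max = i; }
    swap(data, max, data.length - 1);   // swap(data, i, j) exchanges two elements

    // start the real algorithm on everything except the sentinel
    quickSort(data, 0, data.length - 2);
}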
           Quicksort - Implementing
<T extends Comparable<? super T>> void quickSort(T[] data, int first, int last) {
    int lower = first + 1, upper = last;
    swap(data, first, (first + last) / 2); // the middle element becomes the bound
    T bound = data[first];
    while (lower <= upper) { // divides the array in two
        while (data[lower].compareTo(bound) < 0) { lower++; } // lowers that are right
        while (data[upper].compareTo(bound) > 0) { upper--; } // uppers that are right
        if (lower < upper) { swap(data, lower++, upper--); }
        else { lower++; } // arrays are already split
    }
    swap(data, upper, first); // puts bound in its final location
    if (first < upper - 1) { quickSort(data, first, upper - 1); }
    if (upper + 1 < last) { quickSort(data, upper + 1, last); }
}
      Quicksort Performance
• Worst case
  – consider selecting the smallest (or largest)
    number as the bound
  – then all of the other numbers end up on one “side”
  – consider sorting the following array
     • [5 3 2 1 4 6 8]
     • 1 will be the first bound and end up in its proper
       location
     • however, there will still be n - 1 elements to sort
     • if a bound like this is chosen on every pass, each
       pass places only one element
  – the result is an O(n^2) algorithm
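Counting comparisons makes the O(n^2) concrete. In this worst case each pass compares the bound with roughly every other element of its sub-array, and the sub-array shrinks by only one element per pass, so (ignoring a constant number of extra comparisons per pass):

```latex
(n-1) + (n-2) + \cdots + 2 + 1 \;=\; \sum_{k=1}^{n-1} k \;=\; \frac{n(n-1)}{2} \;=\; O(n^2)
```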
      Quicksort Performance
• So what’s the average case?
  – the answer is O(n log n)
• In practice, quicksort is usually the fastest
  general-purpose sorting algorithm
  – the closer the bound is to the median, the
    better it performs
  – beware: for small arrays (roughly under 30 elements),
    insertion sort is more efficient
     • can you think how quicksort and insertion sort
       could be combined?
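One possible answer to the question above, as a sketch: recurse with quicksort while a range is large, and hand any range smaller than a cutoff to insertion sort. To keep the code short it uses a Lomuto-style partition around the middle element, which is a different partitioning scheme from the slides' code. `CUTOFF = 30` is a hypothetical value taken from the rule of thumb above; the best cutoff depends on the machine and the data.

```java
import java.util.Arrays;
import java.util.Random;

public class HybridQuickSort {
    // Hypothetical cutoff based on the "under 30 elements" rule of thumb.
    static final int CUTOFF = 30;

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    // Quicksort large ranges, hand small ranges to insertion sort.
    static void sort(int[] a, int first, int last) {
        if (last - first + 1 < CUTOFF) {
            insertionSort(a, first, last);
            return;
        }
        int p = partition(a, first, last);
        sort(a, first, p - 1);
        sort(a, p + 1, last);
    }

    // Lomuto partition (NOT the slides' scheme): park the middle element at the
    // end as the bound, gather smaller items on the left, then swap the bound
    // into its final position and return that position.
    static int partition(int[] a, int first, int last) {
        swap(a, (first + last) / 2, last);
        int bound = a[last];
        int i = first;
        for (int j = first; j < last; j++) {
            if (a[j] < bound) {
                swap(a, i++, j);
            }
        }
        swap(a, i, last);
        return i;
    }

    static void insertionSort(int[] a, int first, int last) {
        for (int i = first + 1; i <= last; i++) {
            int x = a[i];
            int j = i - 1;
            while (j >= first && a[j] > x) {
                a[j + 1] = a[j];          // shift larger items one slot right
                j--;
            }
            a[j + 1] = x;
        }
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] data = new Random(1).ints(100, 0, 1000).toArray();
        sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

A common variation stops recursing on small ranges and runs a single insertion sort over the whole, nearly sorted array at the end.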
                 Mergesort
• One of the first sorting algorithms ever
  used on a computer
• It works on a principle similar to quicksort
  – each array is broken into two parts and then
    sorted separately
  – this partition and sort method continues until
    only single element arrays exist
  – then all of the arrays are put back together to
    form a sorted array
                 Mergesort
• The big difference from quicksort is that the
  arrays are always broken into equal
  partitions
  – or, for an odd-sized array, as close to equal
    as possible
• There is no bound selected
• To put the arrays back together, repeatedly
  select the smallest remaining element from
  either array and make it the next element
                            Mergesort
  0   1   2   3   4   5   6
  Z   M   J   R   X   T   V
          break into 7 parts

  0      0      0      0      0      0      0
  Z      M      J      R      X      T      V

  0   1      0   1      0   1
  M   Z      J   R      T   X

  0   1   2   3          0   1   2
  J   M   R   Z          T   V   X

  0   1   2   3   4   5   6
  J   M   R   T   V   X   Z
                  Merging
• The most sophisticated part of mergesort
  is recombining (or merging) two separate
  sorted arrays
• Just step through both arrays, repeatedly
  selecting the smaller of the two smallest
  remaining elements (one from each array)
  – add it to the new array
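The merging step just described can be sketched for two separate sorted arrays. Using `<=` takes from the first array on ties, which keeps equal elements in their original order. `MergeTwo` is a hypothetical name; the letters match the last merge in the mergesort example.

```java
public class MergeTwo {
    // Merge two already-sorted arrays into a new sorted array by repeatedly
    // taking the smaller of the two front elements.
    static char[] merge(char[] a, char[] b) {
        char[] out = new char[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        while (i < a.length) out[k++] = a[i++];   // leftovers from a
        while (j < b.length) out[k++] = b[j++];   // leftovers from b
        return out;
    }

    public static void main(String[] args) {
        char[] merged = merge("JMRZ".toCharArray(), "TVX".toCharArray());
        System.out.println(new String(merged)); // JMRTVXZ
    }
}
```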
                        Merging
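• Code
    import java.util.Arrays;

    <T extends Comparable<? super T>> void merge(T[] array, int first, int last) {
      int mid = (first + last) / 2;
      // temp array of the same element type (its contents get overwritten)
      T[] tmp = Arrays.copyOf(array, last - first + 1);
      int i1 = 0;
      int i2 = first;      // next element of the left sub-array
      int i3 = mid + 1;    // next element of the right sub-array
      while (i2 <= mid && i3 <= last) { // both sub-arrays contain elements
          // "<=" keeps equal elements in their original order
          if (array[i2].compareTo(array[i3]) <= 0) { tmp[i1++] = array[i2++]; }
          else { tmp[i1++] = array[i3++]; }
      }
      // load into temp array remaining elements of array
      while (i2 <= mid)  { tmp[i1++] = array[i2++]; }
      while (i3 <= last) { tmp[i1++] = array[i3++]; }
      // copy elements in temp back into array
      System.arraycopy(tmp, 0, array, first, tmp.length);
    }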
                     Mergesort
• Once the merge code is done, the code for
  mergesort is easy
• Code
    <T extends Comparable<? super T>> void mergeSort(T[] data, int first, int last) {
      if (first < last) {
          int mid = (first + last) / 2;
          mergeSort(data, first, mid);
          mergeSort(data, mid + 1, last);
          merge(data, first, last);
      }
    }
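Putting the two pieces together, here is a self-contained sketch on a `char[]` that sorts the example array from the mergesort trace. It follows the same `merge(data, first, last)` / `mergeSort(data, first, last)` structure as the slides; `MergeSortDemo` is a hypothetical name.

```java
public class MergeSortDemo {
    // Merge the sorted halves data[first..mid] and data[mid+1..last],
    // where mid = (first + last) / 2, using a temporary array.
    static void merge(char[] data, int first, int last) {
        int mid = (first + last) / 2;
        char[] tmp = new char[last - first + 1];
        int i1 = 0, i2 = first, i3 = mid + 1;
        while (i2 <= mid && i3 <= last) {
            tmp[i1++] = (data[i2] <= data[i3]) ? data[i2++] : data[i3++];
        }
        while (i2 <= mid) tmp[i1++] = data[i2++];    // leftovers, left half
        while (i3 <= last) tmp[i1++] = data[i3++];   // leftovers, right half
        System.arraycopy(tmp, 0, data, first, tmp.length);
    }

    // Split in half, sort each half, then merge the halves.
    static void mergeSort(char[] data, int first, int last) {
        if (first < last) {
            int mid = (first + last) / 2;
            mergeSort(data, first, mid);
            mergeSort(data, mid + 1, last);
            merge(data, first, last);
        }
    }

    public static void main(String[] args) {
        char[] data = "ZMJRXTV".toCharArray();
        mergeSort(data, 0, data.length - 1);
        System.out.println(new String(data)); // JMRTVXZ
    }
}
```

The `tmp` array allocated in every `merge` call is the extra storage discussed in the performance notes.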
      Mergesort Performance
• Mergesort does a lot of copying in
  memory
• It also requires extra storage space for the
  temporary array
  – this can be prohibitive for very large data sets

				