Pages: 69 (posted 8/29/2012)
Sorting

• Three main types so far:
  – Bubble sort
  – Selection sort
  – Insertion sort
• Computing costs in terms of:
  – Number of comparisons required
  – Mildly concerned about number of swaps
• Best so far: insertion sort
  – On each iteration it has the possibility of stopping early
  – Leads to improved best and average cases over the worst case

Improving Sorts

• Better sorting algorithms rely on divide and conquer (recursion)
  – In essence, think of it as trying to reduce the number of comparisons each number must make against the other numbers
• The new algorithms consist of three major steps:
  – Find an efficient technique for splitting the data
  – Sort the splits separately
  – Find an efficient technique for merging the data
• The process is applied recursively: continually work with smaller and smaller groups
• We'll see two examples
  – One does most of its work splitting
  – One does most of its work merging

Quicksort

General quicksort algorithm:
  – Select an element from the array to be the pivot
  – Rearrange the elements of the array into a left and a right subarray
    • All values in the left subarray are < pivot
    • All values in the right subarray are > pivot
  – Independently sort the two subarrays
  – No merging required, as left and right are independent problems

Quicksort

void quicksort(int* arrayOfInts, int first, int last)
{
    int pivot;
    if (first < last)                             // while I still have data
    {
        pivot = partition(arrayOfInts,            // find a pivot
                          first, last);
        quicksort(arrayOfInts, first, pivot - 1); // quicksort the left
        quicksort(arrayOfInts, pivot + 1, last);  // quicksort the right
    }                                             // leave pivot point alone!
}

Deceptively simple? The work happens in splitting; let's look at the partition function!
Quicksort

int partition(int* arrayOfInts, int first, int last)
{
    int temp;
    int p = first;                                 // set pivot = first index
    for (int k = first+1; k <= last; k++)          // for every other index
    {
        if (arrayOfInts[k] <= arrayOfInts[first])  // if data is smaller than pivot
        {
            p = p + 1;                             // update final pivot location
            temp = arrayOfInts[k];                 // rearrange data
            arrayOfInts[k] = arrayOfInts[p];
            arrayOfInts[p] = temp;
        }
    }
    temp = arrayOfInts[p];                         // put pivot in the right spot
    arrayOfInts[p] = arrayOfInts[first];
    arrayOfInts[first] = temp;
    return p;
}

Partition Step-Through

partition(cards, 0, 4)

P=0, K=1:  cards[1] < cards[0]?  No
P=0, K=2:  cards[2] < cards[0]?  Yes → P=1
           temp = cards[2]; cards[2] = cards[1]; cards[1] = temp
P=1, K=3:  cards[3] < cards[0]?  Yes → P=2
           temp = cards[3]; cards[3] = cards[2]; cards[2] = temp
P=2, K=4:  cards[4] < cards[0]?  No
Finally:   temp = cards[2]; cards[2] = cards[first]; cards[first] = temp; return p = 2

After this partition call, repeat quicksort on subarrays (0–1) and (3–4).

Partition Algorithm

• The book (pg 495) has another version of partition which is probably a little faster in practice (though not asymptotically).

Complexity of Quicksort

• Note that for any recursive call, we essentially have to look at every entry in the current subarray.
• Assume for now that we always choose the perfect middle pivot:
  – Our one problem of size n turns into 2 subproblems of size n/2
  – Each of those turns into 2 smaller subproblems
  – Each of those turns into 2 smaller subproblems...
  – Once we get down to one element, we:
    • Can't decompose any further
    • Are sorted!
  – Let's draw out a tree of the problems being solved

Complexity of Quicksort

[Figure: recursion tree of calls]

Complexity of Binary Search

Actually, this is almost exactly the same process as binary search, except that we do more work in each recursive call. What is the tree for binary search?

Un-idealistic Quicksort

• Now, what if we weren't able to split in half each time?
  – What would the arrays look like after a bad pivot call?
  – What type of data leads us to this setup?
  – What would the recursion tree look like?

Complexity of Quicksort

[Figure: recursion tree of calls]

Complexity of Quicksort

• Worst case is O(n^2)
• On average it is O(n log2 n), and the average case holds a lot of the time

Improving Quicksort

• The worst cases are already-sorted and reverse-sorted input!
• For our other sorts that had distinct best/worst cases, how did already sorted and reverse sorted fare?
• How might we improve quicksort?
  – Could sweep through once first
  – Could improve the partitioning mechanism
    • How might you come up with a simple way of making it likely that you split the array in half?
    • If the data is sorted? If reverse sorted? If not sorted?
  – Answer: choose the median of the first, middle, and last elements.

Recursion Time Efficiency: Recurrence Relations

• We want to build a function T(n) which describes the cost of performing the recursion
• It needs to capture three components:
  – Amount of work needed for the current iteration
    • Cost of the basic function requirements
  – Number of subproblems that have to be solved
  – Size (in terms of input) of the subproblems to be solved
    • How much smaller is each of the subproblems?

Relating Recurrence Relations to Trees

  – Number of subproblems that have to be solved
    • Number of branches at each level
  – Amount of work needed for the current iteration
    • Cost of each sub-problem (work per node)
  – Size (in terms of input) of subproblems to be solved
    • How many levels appear in the tree

Recursive Binary Search

• Recursive binary search requirements:
  – Amount of work needed for the current iteration
    • 1 comparison (inputValue versus the middle index)
    • Essentially a constant amount of work, not dependent on the size of the array being searched
  – Number of subproblems that have to be solved
    • 1 subproblem (the left OR right side of the array)
  – Size (in terms of input) of the subproblems to be solved
    • The subproblem is half the size of the original problem

Quicksort

• Quicksort requirements:
  – Amount of work needed for the current iteration
    • Essentially, a comparison for every element in the current array (actually n-1, but close enough)
  – Number of sub-problems that have to be solved
    • 2 sub-problems (the left AND right sides of the array)
  – Size (in terms of input) of the sub-problems to be solved
    • Each sub-problem is half the size of the original problem

Recurrence Relation

• General recurrence relation:  T(n) = aT(n/b) + cn^k
  – a = number of subproblems
  – b = 1/(size of subproblems), i.e. subproblems have size n/b
  – f(n) = current iteration work = constant * n^k

Binary Search Recurrence Relation

• Binary search:  T(n) = 1*T(n/2) + cn^0
• Now, is this anything like we've seen before? Shouldn't we be able to get log n out of this?
• We need a closed-form version of this function. The way we'll do this is through 'expansion' and 'backwards substitution'.
• Actually, this is only going to work for a subset (but a large one) of the problems we are interested in!

Binary Search Recurrence Relation

• First we need a base case: T(1) = 1 (one operation to handle searching a one-element array)
• And our relation:  T(n) = 1*T(n/2) + cn^0 = T(n/2) + c
• Now, what is T(n/2)?  T(n/2) = T(n/4) + c
• Substitute that back into the original equation:  T(n) = (T(n/4) + c) + c

Binary Search Recurrence Relation

T(n) = (T(n/4) + c) + c
T(n/4) = T(n/8) + c    →  T(n) = ((T(n/8) + c) + c) + c
T(n/8) = T(n/16) + c   →  T(n) = (((T(n/16) + c) + c) + c) + c
...
Find a pattern:  T(n) = T(n/2^i) + ic
Assume our data sizes are of size 2^k:  T(2^k) = T(2^k / 2^i) + ic

Binary Search Recurrence Relation

T(2^k) = T(2^k / 2^i) + ic
What i do we need to get to our base case T(1) = 1?  Let i = k:
T(2^k) = T(2^k / 2^k) + kc
T(2^k) = T(1) + kc
T(2^k) = 1 + kc

Binary Search Recurrence Relation

Translate 2^k back to n:
T(2^k) = 1 + kc
T(n) = 1 + c*log2 n

Quicksort Recurrence Relation

Base case: T(1) = 0
Recurrence: T(n) = 2T(n/2) + n
Are these the right entries for the recurrence relation?
Quicksort Recurrence Relation

T(n) = 2T(n/2) + n
T(n/2) = 2T(n/4) + n/2  →  T(n) = 2(2T(n/4) + n/2) + n = 4T(n/4) + 2n
T(n/4) = 2T(n/8) + n/4  →  T(n) = 2(2(2T(n/8) + n/4) + n/2) + n = 8T(n/8) + 3n
Pattern:  T(n) = 2^i T(n/2^i) + in

Quicksort Recurrence Relation

T(n) = 2^i T(n/2^i) + in
T(2^k) = 2^i T(2^k / 2^i) + i*2^k
Let i = k to get to the base case T(1) = 0:
T(2^k) = 2^k T(2^k / 2^k) + k*2^k
T(2^k) = 2^k T(1) + k*2^k
T(2^k) = 0 + k*2^k
T(2^k) = k*2^k
Translate 2^k back to n:  T(n) = n*log2 n

Complexity of Quicksort

• Requires stack space to implement the recursion
• Worst case:
  – The pivot breaks the array into 1-element and (n-1)-element subarrays
  – O(n) stack space
• Average case:
  – The pivot splits evenly
  – O(log n) stack space

MergeSort

• General mergesort algorithm:
  – Recursively split subarrays in half
  – Merge the sorted subarrays
  – Splitting comes first in the recursive call, so it continues until we have one-item subarrays
    • One-item subarrays are by definition sorted
  – Merge recombines subarrays so the result is sorted
    • 1+1 item subarrays => 2-item subarrays
    • 2+2 item subarrays => 4-item subarrays
    • Use the fact that the subarrays are sorted to simplify the merge algorithm

Another T(n) Example

T(n) = 4T(n/2) + cn
T(1) = 1

Another T(n) Example

T(n) = 4T(n/2) + cn
T(n/2) = 4T(n/4) + cn/2  →  T(n) = 4(4T(n/4) + cn/2) + cn
T(n/4) = 4T(n/8) + cn/4  →  T(n) = 4(4(4T(n/8) + cn/4) + cn/2) + cn
The pattern is:
T(n) = 4^i T(n/2^i) + (... + 4cn + 2cn + cn)
T(n) = 4^i T(n/2^i) + (2^i - 1)cn

Another T(n) Example

T(n) = 4^i T(n/2^i) + (2^i - 1)cn
Let n = 2^k:  T(2^k) = 4^i T(2^k / 2^i) + (2^i - 1)c*2^k
Let i = k to get to T(1):
T(2^k) = 4^k T(2^k / 2^k) + (2^k - 1)c*2^k
T(2^k) = 4^k T(1) + (2^(2k) - 2^k)c
T(2^k) = 4^k + (2^(2k) - 2^k)c
Since 4^k = (2^2)^k = 2^(2k) = (2^k)^2:
T(2^k) = (2^k)^2 + ((2^k)^2 - 2^k)c
Change 2^k back to n:
T(n) = n^2 + (n^2 - n)*c
T(n) = (c+1)n^2 - cn  =>  O(n^2)

T(n) Verify

Let c = 1:
T(1) = 1
T(2) = 4T(1) + 2 = 4+2 = 6
T(4) = 4T(2) + 4 = 4*6+4 = 28
T(8) = 4T(4) + 8 = 4*28+8 = 120

Closed form T(n) = 2n^2 - n:
T(1) = 2*1-1 = 1
T(2) = 2*4-2 = 6
T(4) = 2*16-4 = 28
T(8) = 2*64-8 = 120

MergeSort

void mergesort(int* array, int* tempArray, int low, int high, int size)
{
    if (low < high)
    {
        int middle = (low + high) / 2;
        mergesort(array, tempArray, low, middle, size);
        mergesort(array, tempArray, middle+1, high, size);
        merge(array, tempArray, low, middle, high, size);
    }
}

MergeSort

void merge(int* array, int* tempArray, int low, int middle, int high, int size)
{
    int i, j, k;
    for (i = low; i <= high; i++)
        tempArray[i] = array[i];           // copy into temp array
    i = low;  j = middle+1;  k = low;
    while ((i <= middle) && (j <= high))   // merging starts here
    {
        if (tempArray[i] <= tempArray[j])  // if lhs item is smaller, put it in the
            array[k++] = tempArray[i++];   //   final array; advance final position
        else                               //   and lhs index
            array[k++] = tempArray[j++];   // else put rhs item in the final array;
    }                                      //   advance final position and rhs index
    while (i <= middle)                    // one of the two will run out
        array[k++] = tempArray[i++];       // copy the rest of the lhs data
}                                          // only need to copy leftovers from the lhs;
                                           // remaining rhs data is already in place

MergeSort Example

[Figure slides: recursively split, split again, then merge. With 2 cards the merge is not very interesting; think of it as a swap. The diagrams step through the temp array and the array with indices i, j, k: while Temp[i] < Temp[j], take from the lhs. Once j > high, the inner while loop ends and the remaining lhs items are copied until i > middle. Note: if it were the rhs we were copying at this stage, those items would really already be in place.]

Time Complexity

• Merge sort tree: cost per level? What is the largest number of comparisons we might have to do in merging?

Time Complexity

• Merge sort recurrence relation:  T(n) = 2T(n/2) + cn^1
• Have we seen this before?

Quicksort Recurrence Relation

Base case: T(1) = 0
Recurrence: T(n) = 2T(n/2) + cn^1
Are these the right entries for the recurrence relation?
Space Complexity of Mergesort

• Needs an additional O(n) temporary array
• Depth of the recursive calls (stack space): always O(log2 n)

Tradeoffs

• When is it more useful to:
  – Just search?
  – Quicksort or mergesort, then search?
• Assume Z searches
  – Linear search on random data: Z*n
  – Fast sort and binary search: n log2 n + Z*log2 n

Tradeoffs

Z*n <= n log2 n + Z*log2 n
Z(n - log2 n) <= n log2 n
Z <= (n log2 n) / (n - log2 n)
Z <= (n log2 n) / n    [approximation]
Z <= log2 n            [approximation]

Whereas before we had to do n searches to make up for the cost of sorting, now we only need log2 n: for 1,000,000 items, about 19 searches instead of 1,000,000.

How Fast?

• Without specific details of what we are sorting, O(n log2 n) is the fastest sort possible.
  – Only available operations: compare, swap
• Proof: a decision tree describes how the sort operates
  – Every vertex represents a comparison, every branch a result
  – Moving down the tree traces a possible run through the algorithm

How Fast?

[Figure: decision tree for sorting three keys K1, K2, K3. The root compares K1 <= K2; each internal vertex is a comparison (K2 <= K3, K1 <= K3), and each of the six leaves is one of the orderings [1,2,3], [1,3,2], [2,1,3], [2,3,1], [3,1,2], [3,2,1].]

How Fast for Sorting?

• Number of leaves = number of possible outcomes
• In sorting, how many permutations of an array, any of which could be the 'sorted' version?
  – n!
• Tree height?
  – log2(n!), and log2(n!) == O(n*log2 n)   [derivation omitted]

Master Theorem

Recurrence:  T(n) = aT(n/b) + f(n), where f(n) = cn^k
1. a < b^k:  T(n) ~ n^k
2. a = b^k:  T(n) ~ n^k logb n
3. a > b^k:  T(n) ~ n^(logb a)

Recursive Binary Search

T(n) = aT(n/b) + cn^k
a = number of subproblems = 1
b = 1/(size of subproblems) = 1/(1/2) => 2
f(n) = current iteration work = 2n^0, so k = 0
Compare a to b^k: 1 vs 2^0 = 1 vs 1
They are equal, so the computational cost is:  n^k logb n = 1 * log2 n => log2 n

Complexity of Quicksort

Recurrence relation [best case]:
  – 2 subproblems, each half size (if a good pivot)
  – Partition is O(n) = cn^1
  – a = 2, b = 2, k = 1
  – 2 == 2^1, so by the Master Theorem: O(n*log2 n)

Complexity of Quicksort

Recurrence relation [worst case]:
  – Partition separates the array into (n-1) elements and (1) element
  – Can't use the Master Theorem: b (the subproblem size ratio) changes:
    (n-1)/n, (n-2)/(n-1), (n-3)/(n-2), ...

Our Start-of-Class Example

T(n) = 4T(n/2) + cn^1
a = 4, b = 2, k = 1
4 vs 2^1:  4 > 2^1
The Master Theorem says:  T(n) ~ n^(logb a) = O(n^(log2 4)) = O(n^2)

Sorts Using Additional Information

• We can break the O(n log2 n) barrier if we can do more than element comparisons and swaps
  – Data-specific sorting instead of generalized sorting

Radix Sort

• Also called bin sort: repeatedly shuffle data into small bins
  – Collect data from the bins into a new deck
  – Repeat until sorted
• What is an appropriate method of shuffling and collecting?
For integers, the key is to shuffle data into bins on a per-digit basis, starting with the rightmost (ones) digit. Collect in order, from bin 0 to bin 9, and left to right within a bin.

Radix Sort: Ones Digit

Data: 459 254 472 534 649 239 432 654 477
  Bin 2: 472 432
  Bin 4: 254 534 654
  Bin 7: 477
  Bin 9: 459 649 239
  (bins 0, 1, 3, 5, 6, 8 empty)

Radix Sort: Tens Digit

Data: 472 432 254 534 654 477 459 649 239
  Bin 3: 432 534 239
  Bin 4: 649
  Bin 5: 254 654 459
  Bin 7: 472 477
  (bins 0, 1, 2, 6, 8, 9 empty)

Radix Sort: Hundreds Digit

Data: 432 534 239 649 254 654 459 472 477
  Bin 2: 239 254
  Bin 4: 432 459 472 477
  Bin 5: 534
  Bin 6: 649 654
  (bins 0, 1, 3, 7, 8, 9 empty)

Final sorted data: 239 254 432 459 472 477 534 649 654

Radix Sort Algorithm

Begin with the current digit as the ones digit
While there is still a digit on which to classify:
{
    For each number in the master list,
        add that number to the appropriate sublist, keyed on the current digit
    For each sublist from 0 to 9,
        for each number in the sublist,
            remove the number from the sublist and append it to a new master list
    Advance the current digit one place to the left
}

Radix Sort Costs

• For each digit in the numbers, you have to distribute all n values in the array into k bins, then collect from the k bins back into an array; usually n >> k
• #operations ~ (max digits) * (n + k) ~ (max digits * n)
  – To distribute, you have to be able to access an individual component of the value
    • Integers: %
    • Strings: charAt[i]
• Space ~ (one approach) the initial array, plus buckets for each possible component value (10 for integers), each bucket sized to possibly hold all values in the array
  – So either all arrays sized right, OR
  – Linked lists with fast appends at the back

Stable Sorts

Stable sort: a sort that preserves the input ordering of elements that are equal.
Why important?
• Maintenance of pre-specified ordering/properties
• An example: preservation of a previous sort on another key

Stable Sorts

• Full-deck card sorting
  – Sort cards by suit: Clubs < Diamonds < Hearts < Spades
  – Then sort on value: 2 < 3 < 4 < 5 < ...
  – This gives all 2s together, all 3s together, ..., with 2C followed by 2D followed by 2H followed by 2S
  – A stable value sort preserves the suit sort

Stable Sorts

• Which sorts that we have seen are stable?
  – Insertion sort
    • Picks up items in order from the "extras" pile
    • Only moves an item past another if it is strictly greater than the item in front of it
  – Mergesort
    • No swapping; fills in the result array from the temp array left to right
    • When merging two equal items, the one on the left stays on the left
• Unstable:
  – Selection sort (why?)
  – Quicksort (why?)

Selection Sort – Unstable

void selectionSort(int* a, int size)
{
    for (int k = 0; k < size-1; k++)
    {
        int index = minimumIndex(a, k, size);
        swap(a[k], a[index]);    // unstable because swapping may rearrange the
    }                            // order of the original items (the sort doesn't
}                                // care what it picks up and moves)

int minimumIndex(int* a, int first, int last)
{
    int minIndex = first;
    for (int j = first + 1; j < last; j++)
    {
        if (a[j] < a[minIndex])
            minIndex = j;
    }
    return minIndex;
}

Selection Sort – Unstable

[Figure: doing the swap to put the 2 of Spades in place for the value sort ruins the 3s' suit ordering.]

Quicksort – Unstable

int partition(int* arrayOfInts, int first, int last)
{
    int temp;
    int p = first;                                 // set pivot = first index
    for (int k = first+1; k <= last; k++)          // for every other index
    {
        if (arrayOfInts[k] <= arrayOfInts[first])  // if data is smaller than pivot
        {
            p = p + 1;                             // update final pivot location
            temp = arrayOfInts[k];                 // rearrange data
            arrayOfInts[k] = arrayOfInts[p];
            arrayOfInts[p] = temp;
        }
    }
    temp = arrayOfInts[p];                         // put pivot in the right spot
    arrayOfInts[p] = arrayOfInts[first];
    arrayOfInts[first] = temp;
    return p;                                      // can move items out of order
}

Stability is not guaranteed: with 5C 2C 3C 6C 9C 6D 3D, the swap of 6C and 3D in partition can affect the 6D/6C ordering.