Posted on: 9/4/2011. Public Domain.
Introduction to Divide and Conquer Algorithms

Many problems can be solved using a divide-and-conquer algorithm. A divide-and-conquer algorithm A follows these general guidelines.
• Divide. Algorithm A divides the original problem into one or more subproblems of a smaller size.
• Conquer. Each subproblem is solved by making a recursive call to A.
• Combine. Finally, A combines the subproblem solutions into a final solution for the original problem.

Some problems that can be solved using a divide-and-conquer algorithm:
• Binary Search: locating an element in a sorted array.
• Quicksort and Mergesort: sorting an array.
• Order Statistics: finding the k-th least or greatest element of an array.
• Geometric Algorithms: finding the convex hull of a set of points; finding two points that are closest.
• Matrix Operations: matrix inversion, Fast Fourier Transform, matrix multiplication, finding the largest submatrix of 1's in a Boolean matrix.
• Maximum Subsequence Sum: finding the maximum sum of any contiguous subsequence in a sequence of integers.

A Review of Quicksort

Quicksort is widely considered in practice to be one of the most efficient sorting algorithms for arrays of data stored in local memory. Quicksort is a divide-and-conquer algorithm which works in the following manner. Let a[] be the array to be sorted.
1. If a[] is an array with 3 or fewer elements, then sort the array using "brute force".
2. Otherwise, find an array element M (called the pivot) which is a good candidate for splitting a[] into two subarrays, a_left[] and a_right[], such that x ≤ M for every x ∈ a_left[] and x ≥ M for every x ∈ a_right[].
3. Swap the elements of a[] so that the elements x ≤ M move to the left side of a[] and the elements x ≥ M move to the right side of a[].
4. a[] is now of the form a_0, a_1, ..., a_{i−1}, a_i = M, a_{i+1}, a_{i+2}, ..., a_{n−1}, where a_j ≤ M for every j ≤ i − 1, and a_j ≥ M for every j ≥ i.
5. Let a_left = a[0 : (i − 1)] and a_right = a[(i + 1) : (n − 1)], so that the pivot a_i = M, which is already in its final position, belongs to neither subarray.
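The five steps above can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not a definitive implementation: the names `partition`, `quicksort`, `lo`, and `hi` are illustrative, the pivot is simply taken to be the middle element (an assumption made here for brevity; any other pivot rule could be substituted), "brute force" on 3 or fewer elements is done with a small insertion sort, and the pivot is left at index i and excluded from both recursive calls.

```python
def partition(a, lo, hi, m):
    """Rearrange a[lo..hi] around the pivot a[m]; return the pivot's final index."""
    pivot = a[m]
    a[m], a[hi] = a[hi], a[m]            # park the pivot at the right end
    left, right = lo, hi - 1
    while True:
        while a[left] < pivot:           # stops at hi at the latest, since a[hi] == pivot
            left += 1
        while right > left and a[right] > pivot:
            right -= 1
        if left >= right:                # the markers have met or crossed
            break
        a[left], a[right] = a[right], a[left]
        left += 1                        # advance both markers so that elements
        right -= 1                       # equal to the pivot cannot stall the loop
    a[left], a[hi] = a[hi], a[left]      # move the pivot between the two sides
    return left


def quicksort(a, lo=0, hi=None):
    """Sort a[lo..hi] in place, following steps 1-5."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= 3:                 # step 1: brute force on tiny subarrays
        for i in range(lo + 1, hi + 1):
            j = i
            while j > lo and a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                j -= 1
        return
    m = (lo + hi) // 2                   # step 2: assumed pivot rule (middle element)
    i = partition(a, lo, hi, m)          # steps 3-4: pivot ends up at index i
    quicksort(a, lo, i - 1)              # step 5: a_left
    quicksort(a, i + 1, hi)              #         a_right


data = [9, 4, 7, 4, 1, 8, 2, 4, 6]
quicksort(data)
print(data)
```

The middle-element pivot keeps the sketch short; a poorly chosen pivot does not affect correctness, only the sizes of the two subarrays and therefore the running time.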
After both a_left and a_right have been sorted using quicksort, the entire array a[] will be sorted.

Finding an element to split the array. A median for an array a[] is an element M ∈ a[] which splits the array into two equal pieces, where piece 1 (respectively, piece 2) consists of elements all of which are less than or equal to (respectively, greater than or equal to) M. Although finding the median M of a[] would satisfy step 2 of quicksort, in practice finding the median of the entire array seems too costly. A compromise between speed and accuracy that seems to work in practice is the median-of-three rule: take the median of the three elements a[0], a[n − 1], and a[⌊(n − 1)/2⌋].

Swapping elements of a[]. Once a pivot M has been selected, it can be swapped with the last element of the array (of course, under the median-of-three rule, it may already be the last element). The remaining elements a[0] through a[n − 2] can be swapped using two markers, left and right, which begin on the left and right sides of the array, respectively. Both markers move toward the center of the array. A marker stops when it encounters an element which should be on the other side of the array. For example, marker left will stop when it encounters an element x for which x ≥ M. When both markers have stopped, the elements at those positions are swapped, unless the markers have met or crossed one another, in which case the process terminates.

Quicksort:
• Name of Algorithm: Quicksort
• Input: an integer array a[0 : n − 1] of size n
• Output: void
• Effect: a[] is sorted in increasing order
• Begin Algorithm
• base case. if n ≤ 3, then sort a[] using a sequence of pairwise swaps
• recursive case.
• find a median a[m] = M of a[]
• swap a[m] with a[n − 1]
• initialize left = 0 and right = n − 2
• while (left < right)
  – let l ≥ left be the least integer for which a[l] ≥ M
  – let r ≤ right be the greatest integer for which a[r] ≤ M (take r = −1 if there is none)
  – set left = l
  – set right = r
  – if left < right then swap a[left] with a[right], and then increment left and decrement right (advancing the markers guarantees progress when a[] contains elements equal to M)
• set left to the least integer l ≥ left for which a[l] ≥ M (one exists, since a[n − 1] = M)
• swap a[left] with a[n − 1]
• if left > 0 then perform quicksort on a[0 : left − 1]
• if left < n − 1 then perform quicksort on a[left + 1 : n − 1]
• End Algorithm

Example 1. Demonstrate the quicksort algorithm using the array 5, 8, 6, 2, 7, 1, 0, 9, 3, 4, 6.

Complexity of Quicksort. The time complexity of quicksort (i.e., the number of steps T(n) for an array of n comparables) depends on how the pivot is chosen. Later in this lecture we demonstrate how to find an exact median in O(n) steps. Using this approach, quicksort has a worst-case complexity of O(n log n). On the other hand, if the pivot is chosen randomly, then it can be shown that quicksort has O(n log n) average-case complexity, but O(n^2) worst-case complexity. In practice, the median-of-three approach gives empirically faster running times than both the exact and random approaches.

Example 2. Verify that, if an exact median for an array of n comparables can be found in O(n) steps, then Quicksort has a worst-case complexity of O(n log n).

Example 3. Consider the following recursive algorithm for sorting, called Mergesort.
• Name of Algorithm: Mergesort
• Input:
  – integer array a[0 : n − 1]
  – integer start
  – integer end.
• Output: void.
• Effect: a[start : end] is sorted in increasing order.
• Begin Algorithm
• Base Case. If end − start ≤ 4, then sort a[start : end] using Insertion Sort.
• Recursive Case.
• Initialize mid = ⌊(start + end)/2⌋.
• Perform Mergesort on a[start : mid].
• Perform Mergesort on a[mid + 1 : end].
• Comment: now merge the two sorted arrays.
• Let tmp[0 : end − start] be the result of performing Merge on arrays a[start : mid] and a[mid + 1 : end].
• Copy tmp[0 : end − start] to a[start : end].
• End Algorithm

Exercise. Write a pseudocode algorithm for the function Merge, which takes as input two sorted integer arrays a and b of sizes m and n, respectively, and returns a sorted integer array of size m + n whose elements are the union (with repetitions) of the elements of a and b.

Example 3. Use the Master Theorem to analyze the complexity of Mergesort when start = 0 and end = n − 1.

Using Divide-and-Conquer Algorithms for Finding Order Statistics

The i-th order statistic of a set of n elements is the i-th smallest element of the set. A median of a set of elements is some element M for which half the elements are less than or equal to M while the other half are greater than or equal to M. Finding the i-th order statistic can be done easily by sorting the array (or set) and returning the i-th element of the sorted array. Using quicksort or mergesort, this will take O(n log n) steps. But we can do better by returning to the quicksort algorithm and slightly modifying it in the following ways.

• Name of Algorithm: Randomized Find Statistic
• Input:
  – integer array a[0 : n − 1]
  – integer start
  – integer end
  – integer k, where start ≤ k ≤ end.
• Output: the (k − start + 1)-th least element of a[start : end], i.e., the element that would be at index k if a[start : end] were sorted.
• Effect: the elements of a[] may be rearranged due to partitioning and sorting.
• Begin Algorithm
• Base Case. If end − start ≤ 4, then sort a[start : end] using Insertion Sort and return a[k].
• Recursive Case.
• Randomly select an index m with start ≤ m ≤ end.
• Initialize M = a[m].
• Swap a[m] with a[end].
• Initialize left = start.
• Initialize right = end − 1.
• While left < right
  – Let l, with left ≤ l ≤ end, be the least integer for which a[l] ≥ M.
  – Let r, with start ≤ r ≤ right, be the greatest integer for which a[r] ≤ M (take r = start − 1 if there is none).
  – Set left = l.
  – Set right = r.
  – If left < right then swap a[left] with a[right], and then increment left and decrement right (advancing the markers guarantees progress when a[] contains elements equal to M).
• Set left to the least integer l ≥ left for which a[l] ≥ M (one exists, since a[end] = M).
• Swap a[left] with a[end].
• If k = left then return a[k].
• Else if k > left return the output of Randomized Find Statistic on inputs a[0 : n − 1], left + 1, end, k.
• Else return the output of Randomized Find Statistic on inputs a[0 : n − 1], start, left − 1, k.
• End Algorithm

• Name of Algorithm: Median-of-Five Find Statistic
• Inputs:
  – integer array a[0 : n − 1]
  – integer start
  – integer end
  – integer k, where start ≤ k ≤ end.
• Output: the (k − start + 1)-th least element of a[start : end], i.e., the element that would be at index k if a[start : end] were sorted.
• Effect: the elements of a[] may be rearranged due to partitioning and sorting.
• Begin Algorithm
• Base Case. If end − start ≤ 4, then sort a[start : end] using Insertion Sort and return a[k].
• Recursive Case.
• Initialize N = end − start + 1.
• Divide the N elements into ⌊N/5⌋ groups of five elements each, and one group consisting of the remaining N mod 5 elements (if any).
• Find the median of each of the ⌈N/5⌉ groups using Insertion Sort.
• Initialize array medians[0 : ⌈N/5⌉ − 1] to hold the ⌈N/5⌉ medians.
• Initialize M to be the output of Median-of-Five Find Statistic on inputs
  – medians[0 : ⌈N/5⌉ − 1]
  – 0
  – ⌈N/5⌉ − 1
  – ⌊(1/2)⌈N/5⌉⌋.
• Quicksort-partition the input array a[start : end] around the median M.
• Let m be the index of M in the newly partitioned array a[start : end].
• If k = m, then return M.
• Else if k < m then return the output of Median-of-Five Find Statistic on inputs a[0 : n − 1], start, m − 1, k.
• Else return the output of Median-of-Five Find Statistic on inputs a[0 : n − 1], m + 1, end, k.
• End Algorithm

Some Probability Facts

Let X and Y be finite random variables, where X has real-valued range space {x_1, ..., x_m} and Y has real-valued range space {y_1, ..., y_n}.

Definition of Expectation.
$$E[X] = \sum_{i=1}^{m} x_i \Pr(X = x_i).$$

Definition of Conditional Probability.
$$\Pr(X = x_i \mid Y = y_j) = \frac{\Pr(X = x_i, Y = y_j)}{\Pr(Y = y_j)}.$$

Law of Total Probability.
$$\Pr(X = x_i) = \sum_{j=1}^{n} \Pr(X = x_i \mid Y = y_j)\Pr(Y = y_j).$$

Proof of the Law of Total Probability.
$$\begin{aligned}
\Pr(X = x_i) &= \Pr\big(X = x_i \cap (Y = y_1 \cup \cdots \cup Y = y_n)\big) \\
&= \Pr\big((X = x_i \cap Y = y_1) \cup \cdots \cup (X = x_i \cap Y = y_n)\big) \\
&= \sum_{j=1}^{n} \Pr(X = x_i \cap Y = y_j) = \sum_{j=1}^{n} \Pr(X = x_i, Y = y_j) \\
&= \sum_{j=1}^{n} \Pr(X = x_i \mid Y = y_j)\Pr(Y = y_j).
\end{aligned}$$
QED

Law of Expectation for Sums of Random Variables. Let X_1, ..., X_k be random variables. Then
$$E[X_1 + \cdots + X_k] = E[X_1] + \cdots + E[X_k].$$

Proof of the Law of Expectation for k = 2 and using finite random variables. Let X_1 = X and X_2 = Y, where X and Y are as stated above. Then
$$\begin{aligned}
E[X + Y] &= \sum_{i=1}^{m}\sum_{j=1}^{n} (x_i + y_j)\Pr(X = x_i, Y = y_j) \\
&= \sum_{i=1}^{m}\sum_{j=1}^{n} x_i \Pr(X = x_i, Y = y_j) + \sum_{i=1}^{m}\sum_{j=1}^{n} y_j \Pr(X = x_i, Y = y_j) \\
&= \sum_{i=1}^{m} x_i \Big(\sum_{j=1}^{n} \Pr(X = x_i, Y = y_j)\Big) + \sum_{j=1}^{n} y_j \Big(\sum_{i=1}^{m} \Pr(X = x_i, Y = y_j)\Big) \\
&= \sum_{i=1}^{m} x_i \Pr(X = x_i) + \sum_{j=1}^{n} y_j \Pr(Y = y_j) \\
&= E[X] + E[Y].
\end{aligned}$$
QED

Note: the case k > 2 for discrete random variables follows by induction.

Conditional Expectation. Given random variables X and Y, the conditional expectation of X given Y is a real-valued random variable, denoted E[X|Y]. Moreover, the value of this random variable depends upon the value of Y. For example, if Y takes on the value Y = y_j, then
$$E[X \mid Y](y_j) = \sum_{i=1}^{m} x_i \Pr(X = x_i \mid Y = y_j).$$

Law of Expectation of Conditional Expectation.
$$E[E[X \mid Y]] = E[X].$$

Proof of Law of Expectation of Conditional Expectation.
$$\begin{aligned}
E[E[X \mid Y]] &= \sum_{j=1}^{n} \Big(\sum_{i=1}^{m} x_i \Pr(X = x_i \mid Y = y_j)\Big)\Pr(Y = y_j) \\
&= \sum_{i=1}^{m} x_i \Big(\sum_{j=1}^{n} \Pr(X = x_i \mid Y = y_j)\Pr(Y = y_j)\Big) \\
&= \sum_{i=1}^{m} x_i \Pr(X = x_i) = E[X].
\end{aligned}$$
QED

Theorem 1. Randomized Find Statistic has O(n) average-case complexity.

Proof of Theorem 1. Randomized Find Statistic has at most n rounds (i.e., recursive calls to Randomized Find Statistic). Let L_i, 1 ≤ i ≤ n, be random variables that denote the length of the subarray of a[] that is considered during round i (with L_i = 0 if the algorithm has already terminated before round i). Note that L_1 = n with probability one, since this is the length of the original array a[].
Note also that the average-case complexity of Randomized Find Statistic will be O(E[L_1 + ··· + L_n]) = O(E[L_1] + ··· + E[L_n]). But notice that
1. E[L_1] = n
2. For 2 ≤ i ≤ n, E[L_i] = E[E[L_i | L_{i−1}]] ≤ (3/4) E[L_{i−1}]. Indeed, the subarray kept in round i is no longer than the longer side of the partition made in round i − 1, and for a uniformly random pivot the expected length of the longer side is at most (3/4) L_{i−1}.

Hence, by induction, we see that
$$\sum_{i=1}^{n} E[L_i] \le \sum_{i=1}^{n} \left(\frac{3}{4}\right)^{i-1} n \le 4n.$$
Thus, the expected number of computational steps is bounded by a linear function of n. QED

Theorem 2. Median-of-Five Find Statistic has O(n) worst-case complexity.

Proof of Theorem 2. Let T(n) denote the worst-case number of steps needed to run Median-of-Five Find Statistic on an array of size n. The computational steps needed for Median-of-Five Find Statistic can be broken up into the following pieces.
1. dividing a[] into groups of 5 and finding the medians of those groups; steps needed: O(n)
2. finding the median M of the medians-of-5 using a recursive call to Median-of-Five Find Statistic; steps needed: T(⌈n/5⌉)
3. swapping elements about the pivot M; number of steps: O(n)
4. making a recursive call to Median-of-Five Find Statistic on the reduced array; number of steps: at most T(7n/10 + 6). This can be seen by noting that M is at least as large (respectively, at least as small) as
$$3\left(\left\lceil \frac{1}{2}\left\lceil \frac{n}{5} \right\rceil \right\rceil - 2\right) \ge \frac{3n}{10} - 6$$
elements. Thus, each side of the pivot M is guaranteed to have at most 7n/10 + 6 elements.

Claim: if T(n) ≤ T(⌈n/5⌉) + T(7n/10 + 6) + O(n), then T(n) = O(n). Since the claim concerns the asymptotic growth of a function, we only need to prove that there exists C > 0 such that
$$T\left(\left\lceil \frac{n}{5} \right\rceil\right) + T\left(\frac{7n}{10} + 6\right) + an \le Cn,$$
where a > 0 is chosen large enough so that Pieces 1 and 3 take fewer than an steps combined, for sufficiently large n. The proof of the claim is by induction.

Basis Step. For any J, the bound T(n) ≤ Cn certainly holds for the first J values of n, since we simply have to choose C large enough so that C exceeds the maximum number of steps needed by the algorithm for any input of size J or less; then T(i) ≤ C ≤ C · i for all 1 ≤ i ≤ J. What should J equal? Stay tuned!

Inductive Step.
Assume that T(i) ≤ C · i for all 1 ≤ i ≤ n − 1. Then
$$\begin{aligned}
T(n) &\le T\left(\left\lceil \frac{n}{5} \right\rceil\right) + T\left(\frac{7n}{10} + 6\right) + an \\
&\le C\left\lceil \frac{n}{5} \right\rceil + C\left(\frac{7n}{10} + 6\right) + an \\
&\le \frac{Cn}{5} + C + \frac{7Cn}{10} + 6C + an \\
&= Cn + \left(-\frac{Cn}{10} + 7C + an\right),
\end{aligned}$$
which will be at most Cn provided
$$-\frac{Cn}{10} + 7C + an \le 0.$$
But this inequality holds so long as n > 70 and
$$C \ge 10a \cdot \frac{n}{n - 70}.$$
Thus, the basis step should choose J to be something like J = 140, so that the inductive step only has to handle values of n > 140. For such n we have n/(n − 70) ≤ 2, which allows C to be chosen as C ≥ 10a(2) = 20a. QED

Example 4. Demonstrate the Randomized Find Statistic algorithm using the array 5, 8, 16, 2, 7, 11, 0, 9, 3, 4, 6, 7, 3, 15, 5, 12, 4, 7, and k = 3.

Example 5. Demonstrate the Median-of-Five Find Statistic algorithm using the array 5, 8, 16, 2, 7, 11, 0, 9, 3, 4, 6, 7, 3, 15, 5, 12, 4, 7, and k = 3.
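To check hand-worked answers to Examples 4 and 5, the two selection algorithms can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than a definitive implementation: the function names are illustrative, k is treated as a 0-based index into the array (so with start = 0 the call returns the element that would sit at index k after sorting), the partition routine advances both markers after every swap so that repeated values cannot stall it, and a group of fewer than five elements uses its lower median.

```python
import random


def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] in place."""
    for i in range(lo + 1, hi + 1):
        j = i
        while j > lo and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1


def partition(a, lo, hi, m):
    """Rearrange a[lo..hi] around the pivot a[m]; return the pivot's final index."""
    pivot = a[m]
    a[m], a[hi] = a[hi], a[m]            # park the pivot at the right end
    left, right = lo, hi - 1
    while True:
        while a[left] < pivot:           # stops at hi at the latest
            left += 1
        while right > left and a[right] > pivot:
            right -= 1
        if left >= right:                # the markers have met or crossed
            break
        a[left], a[right] = a[right], a[left]
        left += 1
        right -= 1
    a[left], a[hi] = a[hi], a[left]
    return left


def randomized_find_statistic(a, start, end, k):
    """Return the element that would be at index k if a[start..end] were sorted."""
    if end - start <= 4:
        insertion_sort(a, start, end)
        return a[k]
    p = partition(a, start, end, random.randint(start, end))  # random pivot index
    if k == p:
        return a[k]
    if k > p:
        return randomized_find_statistic(a, p + 1, end, k)
    return randomized_find_statistic(a, start, p - 1, k)


def median_of_five_find_statistic(a, start, end, k):
    """Same contract as above, but the pivot is the median of the group medians."""
    if end - start <= 4:
        insertion_sort(a, start, end)
        return a[k]
    medians = []
    for g in range(start, end + 1, 5):   # groups of five, plus one shorter group
        g_end = min(g + 4, end)
        insertion_sort(a, g, g_end)
        medians.append(a[(g + g_end) // 2])
    M = median_of_five_find_statistic(medians, 0, len(medians) - 1, len(medians) // 2)
    m = partition(a, start, end, a.index(M, start, end + 1))
    if k == m:
        return a[m]
    if k < m:
        return median_of_five_find_statistic(a, start, m - 1, k)
    return median_of_five_find_statistic(a, m + 1, end, k)


data = [5, 8, 16, 2, 7, 11, 0, 9, 3, 4, 6, 7, 3, 15, 5, 12, 4, 7]
print(randomized_find_statistic(list(data), 0, len(data) - 1, 3))
print(median_of_five_find_statistic(list(data), 0, len(data) - 1, 3))
print(sorted(data)[3])                   # reference value obtained by sorting
```

Each call receives a copy of the array, since both routines rearrange their input. The intermediate steps of the randomized version depend on the pivots drawn, so only the returned value, not the trace, is expected to match a hand demonstration.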