Posted on: 11/23/2011. Public Domain.
Median/Order Statistics Algorithms
• Minimum and Maximum
• Selection in expected linear time
• Selection in worst-case linear time

Minimum and Maximum
• How many comparisons are sufficient to find the minimum (or the maximum)?
• How many comparisons are sufficient to find both the minimum AND the maximum?
• Show that n + ⌈lg n⌉ − 2 comparisons are sufficient to find the second minimum (and the minimum)

Selection (Median) Problem
• How quickly can we find the median (or, in general, the kth smallest element) of an unsorted list of numbers?
• Two approaches
  – Quicksort partition algorithm: expected Θ(n) time, but Ω(n²) time in the worst case
  – Deterministic: Θ(n) time in the worst case

Quicksort Approach
• int Select(int A[], int k, int low, int high)
  – Choose a pivot item
  – Determine the rank of the pivot element in the current partition
    • Compare all items to this pivot element
  – If the pivot is the kth item, return the pivot
  – Else update low and high and recurse on the partition that contains the kth item

Example: k = 5 (in each row the pivot is the last item of A[low..high])
  low  high  A[low..high] → after partitioning              pivot rank
  1    8     17 12 6 23 19 8 5 10 → 6 8 5 10 17 12 23 19    4 < k, recurse right
  5    8     17 12 23 19 → 17 12 19 23                      7 > k, recurse left
  5    6     17 12 → 12 17                                  5 = k
  found: rank 5, i.e., the item 12

Probabilistic Analysis
• Assume each of the n! permutations is equally likely
• Modify the earlier indicator-variable analysis of quicksort to handle this k-selection problem
• What is the probability that the ith smallest item is compared to the jth smallest item (i < j)? They are compared exactly when one of them is the first pivot chosen among the items with ranks min(i, k) through max(j, k).
  – If k is contained in (i..j)? 2/(j − i + 1)
  – If k ≤ i? 2/(j − k + 1)
  – If k ≥ j? 2/(k − i + 1)
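For the first two questions on the Minimum and Maximum slide: finding the minimum alone takes n − 1 comparisons. Processing the items in pairs finds both the minimum and the maximum with ⌈3n/2⌉ − 2 comparisons. The sketch below is in C and assumes an int array of length n ≥ 1. The names `MinMax` and `min_and_max` are hypothetical, and the comparison counter exists only to make the count visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: min, max, and element comparisons used. */
typedef struct {
    int min, max;
    long comparisons;
} MinMax;

/* Find both the minimum and the maximum of a[0..n-1] (n >= 1) by taking
   items in pairs: compare the pair, then the smaller item against the
   running minimum and the larger against the running maximum
   (3 comparisons per 2 items). */
MinMax min_and_max(const int a[], size_t n) {
    MinMax r;
    size_t i;
    assert(n >= 1);
    r.comparisons = 0;
    if (n % 2 == 1) {            /* odd n: the first item seeds both */
        r.min = r.max = a[0];
        i = 1;
    } else {                     /* even n: one comparison seeds both */
        r.comparisons++;
        if (a[0] < a[1]) { r.min = a[0]; r.max = a[1]; }
        else             { r.min = a[1]; r.max = a[0]; }
        i = 2;
    }
    for (; i + 1 < n; i += 2) {
        int lo = a[i], hi = a[i + 1];
        r.comparisons++;
        if (hi < lo) { lo = a[i + 1]; hi = a[i]; }
        r.comparisons++;
        if (lo < r.min) r.min = lo;
        r.comparisons++;
        if (hi > r.max) r.max = hi;
    }
    return r;
}
```

On the example array `{17, 12, 6, 23, 19, 8, 5, 10}` this reports min 5 and max 23 using 10 = ⌈3·8/2⌉ − 2 comparisons.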
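One standard way to meet the n + ⌈lg n⌉ − 2 bound for the second minimum is a knockout tournament. A tournament uses n − 1 comparisons to find the minimum. The second minimum lost only to the minimum, so it is the smallest of the at most ⌈lg n⌉ items the champion beat, and finding it costs at most ⌈lg n⌉ − 1 more comparisons. This is a C sketch under those assumptions. `TwoSmallest` and `two_smallest` are hypothetical names, and allocation failure is not handled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical result type: smallest, second smallest, comparisons used. */
typedef struct {
    int min, second;
    long comparisons;
} TwoSmallest;

/* Knockout tournament over a[0..n-1] (n >= 2), remembering which items
   each winner beat; ceil(lg n) rounds, so each item beats <= ceil(lg n). */
TwoSmallest two_smallest(const int a[], size_t n) {
    size_t rounds = 0;                                   /* ceil(lg n) */
    assert(n >= 2);
    while (((size_t)1 << rounds) < n) rounds++;

    size_t *alive = malloc(n * sizeof *alive);           /* indices still in play */
    size_t *nbeaten = calloc(n, sizeof *nbeaten);        /* wins per index */
    int *beaten = malloc(n * rounds * sizeof *beaten);   /* row i: items a[i] beat */
    TwoSmallest r = {0, 0, 0};
    size_t m = n;

    for (size_t i = 0; i < n; i++) alive[i] = i;
    while (m > 1) {                                      /* one tournament round */
        size_t next = 0;
        for (size_t i = 0; i + 1 < m; i += 2) {
            size_t w = alive[i], l = alive[i + 1];
            r.comparisons++;
            if (a[l] < a[w]) { w = alive[i + 1]; l = alive[i]; }
            beaten[w * rounds + nbeaten[w]++] = a[l];
            alive[next++] = w;
        }
        if (m % 2 == 1) alive[next++] = alive[m - 1];    /* bye for the odd one out */
        m = next;
    }

    size_t champ = alive[0];                             /* played the final, so >= 1 win */
    r.min = a[champ];
    r.second = beaten[champ * rounds];
    for (size_t j = 1; j < nbeaten[champ]; j++) {        /* at most ceil(lg n) - 1 more */
        r.comparisons++;
        if (beaten[champ * rounds + j] < r.second) r.second = beaten[champ * rounds + j];
    }
    free(alive);
    free(nbeaten);
    free(beaten);
    return r;
}
```

For n = 8 the champion plays all 3 rounds, so the count is exactly 7 + 2 = 9 = n + lg n − 2.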
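The Quicksort Approach slide can be sketched in C as follows. The pivot is the last item of the current range, which matches the ranks in the example above. The partition step uses Lomuto's scheme. That scheme is an assumption, so the intermediate arrangements differ from the slide's trace, but the answer for k = 5 is still 12. The names `quickselect`, `partition_last`, and `swap_int` are hypothetical, and indices are 0-based, unlike the 1-based trace.

```c
#include <assert.h>

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Lomuto partition of a[low..high] around the pivot a[high]. Returns the
   pivot's final index and adds its high - low comparisons to *comparisons. */
static int partition_last(int a[], int low, int high, long *comparisons) {
    int pivot = a[high], i = low;
    for (int j = low; j < high; j++) {
        (*comparisons)++;
        if (a[j] < pivot) { swap_int(&a[i], &a[j]); i++; }
    }
    swap_int(&a[i], &a[high]);
    return i;
}

/* Return the k-th smallest item (k = 1 is the minimum) of a[0..n-1],
   rearranging a along the way. */
int quickselect(int a[], int n, int k, long *comparisons) {
    int low = 0, high = n - 1, target = k - 1;
    assert(1 <= k && k <= n);
    while (low < high) {
        int p = partition_last(a, low, high, comparisons);
        if (p == target) return a[p];      /* pivot has rank k */
        if (target < p) high = p - 1;      /* kth item is left of the pivot */
        else low = p + 1;                  /* kth item is right of the pivot */
    }
    return a[low];                         /* one item left: it has rank k */
}
```

On an already sorted array with k = 1, the last item is always the largest in its range, so each call removes a single item. This is the Θ(n²) worst case. Swapping a randomly chosen item into a[high] before partitioning would make the expected Θ(n) bound hold for every input, not only on average over input orders.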
Cases where (i..j) does not contain k
• Case k ≥ j:
$$\sum_{i=1}^{k-1}\sum_{j=i+1}^{k}\frac{2}{k-i+1} = \sum_{i=1}^{k-1}\frac{2(k-i)}{k-i+1} = \sum_{i=1}^{k-1}\frac{2i}{i+1} = 2\sum_{i=1}^{k-1}\frac{i}{i+1} \le 2(k-1)$$
  (the third expression replaces k − i with i)
• Case k ≤ i:
$$\sum_{j=k+1}^{n}\sum_{i=k}^{j-1}\frac{2}{j-k+1} = \sum_{j=k+1}^{n}\frac{2(j-k)}{j-k+1} = \sum_{j=1}^{n-k}\frac{2j}{j+1} = 2\sum_{j=1}^{n-k}\frac{j}{j+1} \le 2(n-k)$$
  (the third expression replaces j − k with j and changes the bounds)
• Total for both cases is at most 2(k − 1) + 2(n − k) = 2n − 2

Case where (i..j) contains k
• At most 1 interval of size 3 contains k
  – i = k − 1, j = k + 1
• At most 2 intervals of size 4 contain k
  – i = k − 1, j = k + 2 and i = k − 2, j = k + 1
• In general, at most q − 2 intervals of size q = j − i + 1 contain k, and each such pair is compared with probability 2/q
• Thus we get
$$\sum_{q=3}^{n}(q-2)\frac{2}{q} \le \sum_{q=3}^{n}2 = 2(n-2)$$
• Summing all cases, the expected number of comparisons is at most (2n − 2) + 2(n − 2) = 4n − 6 < 4n

Best case, Worst case
• Best case running time?
• What happens in the worst case?
  – Pivot element chosen is always what?
  – This leads to comparing all possible pairs
  – This leads to Θ(n²) comparisons

Deterministic O(n) approach
• Need to guarantee a good pivot element while doing only O(n) work to find it
• int Select(int A[], int k, int low, int high)
  – Choosing the pivot element
    • Divide into groups of 5
    • For each group of 5, find that group's median
    • Use the median of the medians as the pivot element (found by a recursive call to Select on the medians)
  – Determine the rank of the pivot element
    • Compare the remaining items (those not already known to be smaller or larger) directly to the median of medians
  – Update low and high and recurse on the partition that contains the kth item (or return the kth item if it is the pivot)

Guarantees on the pivot element
• The median of medians is guaranteed to be smaller than all the red items (the group medians larger than it, plus the two larger items in each of those groups)
  – Why?
  – How many red items are there?
• Likewise, the median of medians is guaranteed to be larger than all the blue items
• Thus the rank of the median of medians lies roughly in the range [3n/10, 7n/10]
• What elements do we need to compare to the pivot to determine its rank?
  – How many of these are there?
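The case analysis bounds a sum over all pairs of ranks. That sum can be evaluated directly to check the 4n − 6 bound numerically. The sketch below adds up, for every pair i < j, the comparison probability 2/(max(j, k) − min(i, k) + 1) from the Probabilistic Analysis slide. That formula covers all three cases at once. It assumes n distinct items with all n! orders equally likely, and `expected_select_comparisons` is a hypothetical name.

```c
#include <assert.h>

/* Expected comparisons made by the partition-based Select when looking for
   the k-th smallest of n distinct items (1 <= k <= n), assuming all n!
   input orders are equally likely: sum over pairs i < j of ranks of the
   probability that the i-th and j-th smallest items are ever compared. */
double expected_select_comparisons(int n, int k) {
    double total = 0.0;
    assert(1 <= k && k <= n);
    for (int i = 1; i <= n; i++) {
        for (int j = i + 1; j <= n; j++) {
            int lo = i < k ? i : k;            /* min(i, k) */
            int hi = j > k ? j : k;            /* max(j, k) */
            total += 2.0 / (hi - lo + 1);
        }
    }
    return total;
}
```

Checking every k for a range of n confirms that the value never exceeds 4n − 6 (for n ≥ 2), the total obtained by adding the case bounds above.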
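The deterministic approach can be sketched in C as below, following the slide's steps. It finds the medians of groups of 5, recursively selects their median, ranks the pivot by comparing it with the items in the range, and continues on one side. Two details are assumptions, not taken from the slides. Group medians are found by insertion sort. Ranking uses a three-way partition, so duplicate keys equal to the pivot are handled. The helper names (`mom_select`, `partition3`, `deterministic_select`) are hypothetical.

```c
#include <assert.h>

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Insertion sort of a[low..high]; cheap for groups of at most 5. */
static void insertion_sort(int a[], int low, int high) {
    for (int i = low + 1; i <= high; i++) {
        int v = a[i], j = i - 1;
        while (j >= low && a[j] > v) { a[j + 1] = a[j]; j--; }
        a[j + 1] = v;
    }
}

/* Three-way partition of a[low..high] around the value pivot: afterwards
   a[low..*lt-1] < pivot, a[*lt..*gt] == pivot, a[*gt+1..high] > pivot. */
static void partition3(int a[], int low, int high, int pivot, int *lt, int *gt) {
    int i = low;
    *lt = low;
    *gt = high;
    while (i <= *gt) {
        if (a[i] < pivot)      { swap_int(&a[*lt], &a[i]); (*lt)++; i++; }
        else if (a[i] > pivot) { swap_int(&a[i], &a[*gt]); (*gt)--; }
        else                   i++;
    }
}

/* Return the item that would sit at index target (low <= target <= high)
   if a[low..high] were sorted. Rearranges a[low..high]. */
static int mom_select(int a[], int low, int high, int target) {
    while (high - low + 1 > 5) {
        /* Median of each group of 5 (the last group may be smaller),
           gathered at the front of the range in a[low..m-1]. */
        int m = low;
        for (int g = low; g <= high; g += 5) {
            int end = g + 4 <= high ? g + 4 : high;
            insertion_sort(a, g, end);
            swap_int(&a[m], &a[g + (end - g) / 2]);
            m++;
        }
        /* Median of the medians, found by a recursive call. */
        int pivot = mom_select(a, low, m - 1, low + (m - 1 - low) / 2);
        /* Rank the pivot and keep only the side that holds the target. */
        int lt, gt;
        partition3(a, low, high, pivot, &lt, &gt);
        if (target < lt)      high = lt - 1;
        else if (target > gt) low = gt + 1;
        else                  return pivot;
    }
    insertion_sort(a, low, high);          /* at most 5 items left */
    return a[target];
}

/* k-th smallest (k = 1 is the minimum) of a[0..n-1] in worst-case O(n). */
int deterministic_select(int a[], int n, int k) {
    assert(1 <= k && k <= n);
    return mom_select(a, 0, n - 1, k - 1);
}
```

On the example array `{17, 12, 6, 23, 19, 8, 5, 10}` with k = 5, the group medians are 17 and 8. The median of medians is 8, the search continues on the five items larger than 8, and the result is 12.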
Analysis of number of comparisons
• int Select(int A[], int k, int low, int high), with the cost of each step:
  – Choosing pivot element
    • For each group of 5, find that group's median: c1·n/5 comparisons (c1 for each median of 5)
    • Find the median of the medians: recurse on a problem of size n/5
  – Compare remaining items directly to median: c2·n comparisons
  – Recurse on correct partition: recurse on a problem of size at most 7n/10
• T(n) = T(7n/10) + T(n/5) + O(n)

Solving recurrence relation
• T(n) = T(7n/10) + T(n/5) + O(n)
  – Key observation: 7/10 + 1/5 = 9/10 < 1
• Prove T(n) ≤ cn for some constant c by induction on n, bounding the O(n) term by dn for a constant d
• By the induction hypothesis:
$$T(n) \le \frac{7cn}{10} + \frac{cn}{5} + dn = \frac{9cn}{10} + dn$$
• Need 9cn/10 + dn ≤ cn
• Thus c/10 ≥ d, i.e., c ≥ 10d
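A quick numerical check of the induction: tabulate an idealized version of the recurrence and confirm that T(n) ≤ cn with c = 10d. This version rounds the slide's subproblem sizes down, takes the O(n) term to be exactly n (so d = 1), and ignores the small additive terms a full analysis of the real algorithm carries. The induction still goes through with floors, because ⌊7n/10⌋ + ⌊n/5⌋ ≤ 9n/10. The function name `tabulate_select_recurrence` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Tabulate T(0) = 0 and T(n) = T(floor(7n/10)) + T(floor(n/5)) + n for
   1 <= n <= nmax. Returns a malloc'd array of nmax + 1 entries that the
   caller frees, or NULL if allocation fails. */
long *tabulate_select_recurrence(int nmax) {
    assert(nmax >= 0);
    long *t = malloc(((size_t)nmax + 1) * sizeof *t);
    if (t == NULL) return NULL;
    t[0] = 0;
    for (int n = 1; n <= nmax; n++)
        t[n] = t[7 * n / 10] + t[n / 5] + n;   /* both indices are < n */
    return t;
}
```

Every entry satisfies t[n] ≤ 10n, matching c = 10d with d = 1. Replacing 10 with a smaller constant such as 9 breaks the inductive step, because 9cn/10 + dn ≤ cn then fails.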