# Median/Order Statistics Algorithms by 2Xcm63

```
Median/Order Statistics Algorithms

• Minimum and Maximum
• Selection in expected linear time
• Selection in worst-case linear time
Minimum and Maximum
• How many comparisons are sufficient to
find minimum/maximum?
• How many comparisons are sufficient to
find both minimum AND maximum?
• Show n + ⌈log₂ n⌉ - 2 comparisons are
sufficient to find second minimum (and
minimum)
Selection (Median) Problem
• How quickly can we find the median (or in
general the kth smallest element) of an
unsorted list of numbers?
• Two approaches
– Quicksort partition algorithm: expected Θ(n)
time but Ω(n²) time in the worst case
– Deterministic Θ(n) time in the worst-case
Quicksort Approach
• int Select(int A[], k, low, high)
– Choose a pivot item
– Determine rank of pivot element in current
partition
• Compare all items to this pivot element
– If pivot is kth item, return pivot
– Else update low and high and recurse on
partition that contains kth item
Example
k=5                        low  high  rank
17 12  6 23 19  8  5 10      1     8
 6  8  5 10 17 12 23 19      5     8     4
            17 12 19 23      5     6     7
            12 17        found:          5
Probabilistic Analysis
• Assume each of n! permutations is equally likely
• Modify earlier indicator variable analysis of
quicksort to handle this k-selection problem
• What is probability ith smallest item is compared
to jth smallest item?
– If k is contained in (i..j)?
– If k ≤ i?
– If k ≥ j?
Cases where (i..j) do not contain k
• Case k ≥ j:
– Σ(i=1 to k-1) Σ(j=i+1 to k) 2/(k-i+1) = Σ(i=1 to k-1) (k-i) · 2/(k-i+1)
= Σ(i=1 to k-1) 2i/(i+1) [replace k-i with i]
= 2 Σ(i=1 to k-1) i/(i+1)
≤ 2(k-1)
• Case k ≤ i:
– Σ(j=k+1 to n) Σ(i=k to j-1) 2/(j-k+1) = Σ(j=k+1 to n) (j-k) · 2/(j-k+1)
= Σ(j=1 to n-k) 2j/(j+1)
[replace j-k with j and change bounds]
= 2 Σ(j=1 to n-k) j/(j+1)
≤ 2(n-k)
• Total for both cases is ≤ 2(k-1) + 2(n-k) = 2n-2
Case where (i..j) contains k
• At most 1 interval of size 3 contains k
– i=k-1, j=k+1
• At most 2 intervals of size 4 contain k
– i=k-1, j=k+2 and i=k-2, j= k+1
• In general, at most q-2 intervals of size q contain k
• Thus we get Σ(q=3 to n) (q-2) · 2/q ≤ Σ(q=3 to n) 2 = 2(n-2)
• Summing all cases gives at most (2n-2) + 2(n-2) = 4n-6, so the
expected number of comparisons is less than 4n
Best case, Worst-case
• Best case running time?
• What happens in the worst-case?
– Pivot element chosen is always what?
– This leads to comparing all possible pairs
– This leads to Θ(n²) comparisons
Deterministic O(n) approach
• Need to guarantee a good pivot element while
doing O(n) work to find the pivot element
• int Select(int A[], k, low, high)
– Choosing pivot element
• Divide into groups of 5
• For each group of 5, find that group’s median
• Use median of the medians as pivot element
– Determine rank of pivot element
• Compare some remaining items directly to median
– Update low and high and recurse on partition that
contains kth item (or return kth item if it is pivot)
Guarantees on the pivot element

• Median of medians is guaranteed to be smaller than all the
red colored items
– Why?
– How many red items are there?
• Likewise, median of medians is guaranteed to be larger
than the blue colored items
• Thus median of medians is in the range:
• What elements do we need to compare to pivot to
determine its rank?
– How many of these are there?
Analysis of number of comparisons
• int Select(int A[], k, low, high), with the cost of each step
– Choosing pivot element
• For each group of 5, find that group’s median: c1 n/5 comparisons
(c1 for each median of 5)
• Find the median of the medians: recurse on problem of size n/5
– Compare remaining items directly to median: c2 n comparisons
– Recurse on correct partition: recurse on problem of size
at most 7n/10
•   T(n) =
Solving recurrence relation
• T(n) = T(7n/10) + T(n/5) + O(n)
– Key observation: 7/10 + 1/5 = 9/10 < 1
• Prove T(n) ≤ cn for some constant c by
induction on n
• T(n) ≤ 7cn/10 + cn/5 + dn   [inductive hypothesis; dn bounds the O(n) term]
•      = 9cn/10 + dn
• Need 9cn/10 + dn ≤ cn
• Thus c/10 ≥ d ⇒ c ≥ 10d

```
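The two selection strategies outlined in the slides can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the slides' own implementation: the function names (`partition_rank`, `quickselect`, `median_of_medians_select`) are hypothetical, `k` is 1-indexed and counts from the smallest item, and the partition step builds new lists rather than updating `low` and `high` in place, which costs more comparisons per item than the analysis above counts.

```python
import random


def partition_rank(items, pivot):
    """Split items into (smaller, equal, larger) relative to pivot."""
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return smaller, equal, larger


def quickselect(items, k, rng=random):
    """Return the k-th smallest item (1-indexed) using a random pivot.

    Expected linear time; quadratic in the worst case.
    """
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    current = list(items)
    while True:
        pivot = rng.choice(current)
        smaller, equal, larger = partition_rank(current, pivot)
        if k <= len(smaller):
            current = smaller                  # k-th item is left of the pivot
        elif k <= len(smaller) + len(equal):
            return pivot                       # pivot has rank k
        else:
            k -= len(smaller) + len(equal)     # k-th item is right of the pivot
            current = larger


def median_of_medians_select(items, k):
    """Return the k-th smallest item (1-indexed) with a median-of-medians pivot."""
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    current = list(items)
    if len(current) <= 5:
        return sorted(current)[k - 1]
    # Median of each group of 5 (the last group may be smaller).
    medians = []
    for start in range(0, len(current), 5):
        group = sorted(current[start:start + 5])
        medians.append(group[(len(group) - 1) // 2])
    # Recurse on the roughly n/5 group medians to obtain the pivot.
    pivot = median_of_medians_select(medians, (len(medians) + 1) // 2)
    smaller, equal, larger = partition_rank(current, pivot)
    if k <= len(smaller):
        return median_of_medians_select(smaller, k)
    if k <= len(smaller) + len(equal):
        return pivot
    return median_of_medians_select(larger, k - len(smaller) - len(equal))


data = [17, 12, 6, 23, 19, 8, 5, 10]    # the list from the slide example
print(quickselect(data, 5))               # 12
print(median_of_medians_select(data, 5))  # 12
```

Both calls return 12, the item of rank 5 in the slide example. The three-way split also handles duplicate values, which the slides do not discuss.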