Median/Order Statistics Algorithms by 2Xcm63


• Minimum and Maximum
• Selection in expected linear time
• Selection in worst-case linear time
     Minimum and Maximum
• How many comparisons are sufficient to
  find minimum/maximum?
• How many comparisons are sufficient to
  find both minimum AND maximum?
• Show n + ⌈log2 n⌉ - 2 comparisons are
  sufficient to find the second minimum (and
  the minimum)
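One way to reach the n + ⌈log2 n⌉ - 2 bound is a knockout tournament: n - 1 comparisons find the minimum, and the second minimum must be one of the at most ⌈log2 n⌉ items that lost directly to it. The sketch below is only an illustration; the names play, min_and_second_min, tournament_comparisons and MAX_ROUNDS are hypothetical and do not come from the slides.

```c
#include <stddef.h>

#define MAX_ROUNDS 64  /* assumed upper bound on tournament depth */

/* Hypothetical counter so the number of comparisons can be inspected. */
size_t tournament_comparisons = 0;

/*
 * Plays a knockout tournament on a[lo..hi), hi > lo, and returns the minimum.
 * beaten[] receives the values that lost a direct comparison to the winner.
 */
int play(const int *a, size_t lo, size_t hi, int *beaten, size_t *nbeaten)
{
    if (hi - lo == 1) {
        *nbeaten = 0;
        return a[lo];
    }
    size_t mid = lo + (hi - lo) / 2;
    int left_beaten[MAX_ROUNDS], right_beaten[MAX_ROUNDS];
    size_t nl, nr;
    int left = play(a, lo, mid, left_beaten, &nl);
    int right = play(a, mid, hi, right_beaten, &nr);

    tournament_comparisons++;          /* one match between the two winners */
    int winner = left <= right ? left : right;
    int loser = left <= right ? right : left;
    const int *src = left <= right ? left_beaten : right_beaten;
    size_t n = left <= right ? nl : nr;

    /* The winner keeps its own list of beaten items and adds the loser. */
    for (size_t i = 0; i < n; i++)
        beaten[i] = src[i];
    beaten[n] = loser;
    *nbeaten = n + 1;
    return winner;
}

/* Stores the minimum and second minimum of a[0..n), assuming n >= 2. */
void min_and_second_min(const int *a, size_t n, int *min, int *second)
{
    int beaten[MAX_ROUNDS];
    size_t nbeaten;
    *min = play(a, 0, n, beaten, &nbeaten);

    /* The second minimum is the smallest item that lost to the minimum. */
    *second = beaten[0];
    for (size_t i = 1; i < nbeaten; i++) {
        tournament_comparisons++;
        if (beaten[i] < *second)
            *second = beaten[i];
    }
}
```

For n = 8 the tournament uses 7 comparisons and the minimum has beaten 3 items, so 2 more comparisons suffice, for a total of 9 = 8 + 3 - 2.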
   Selection (Median) Problem
• How quickly can we find the median (or, in
  general, the kth smallest element) of an
  unsorted list of numbers?
• Two approaches
  – Quicksort partition algorithm: expected Θ(n)
    time, but Ω(n²) time in the worst case
  – Deterministic Θ(n) time in the worst-case
         Quicksort Approach
• int Select(int A[], int k, int low, int high)
  – Choose a pivot item
  – Determine rank of pivot element in current
    partition
     • Compare all items to this pivot element
  – If pivot is kth item, return pivot
  – Else update low and high and recurse on
    partition that contains kth item
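The steps above can be sketched in C as follows. This is a minimal illustration, not the slides' own code: it assumes the pivot is the last item of the current partition, that k, low and high are 1-based positions, and that low <= k <= high. The helper swap_int is hypothetical.

```c
/* Swaps two ints. */
static void swap_int(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

/*
 * Returns the kth smallest item of A, searching positions low..high
 * (1-based, inclusive). A is reordered in the process.
 * Assumption: the pivot is the last item of the current partition.
 */
int Select(int A[], int k, int low, int high)
{
    int pivot = A[high - 1];
    int rank = low;                      /* final position of the pivot */

    /* Compare all other items to the pivot; move smaller ones left. */
    for (int i = low; i < high; i++) {
        if (A[i - 1] < pivot) {
            swap_int(&A[i - 1], &A[rank - 1]);
            rank++;
        }
    }
    swap_int(&A[rank - 1], &A[high - 1]);

    if (rank == k)
        return A[rank - 1];              /* pivot is the kth item */
    if (k < rank)
        return Select(A, k, low, rank - 1);
    return Select(A, k, rank + 1, high);
}
```

Called as Select(A, 5, 1, 8) on 17 12 6 23 19 8 5 10, this returns 12. The order of items inside each partition depends on the partitioning scheme, so intermediate arrays may differ from a hand-worked trace.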
              Example
k=5                        low high rank
17 12  6 23 19  8  5 10      1    8
 6  8  5 10 17 12 23 19      5    8    4
            17 12 19 23      5    6    7
            12 17            found:    5
          Probabilistic Analysis
• Assume each of n! permutations is equally likely
• Modify earlier indicator variable analysis of
  quicksort to handle this k-selection problem
• What is probability ith smallest item is compared
  to jth smallest item?
   – If k is contained in (i..j)?
   – If k ≤ i?
   – If k ≥ j?
  Cases where (i..j) does not contain k
• Case k ≥ j:
   \[
   \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \frac{2}{k-i+1}
     = \sum_{i=1}^{k-1} (k-i)\,\frac{2}{k-i+1}
     = \sum_{m=1}^{k-1} \frac{2m}{m+1} \quad [\text{replace } k-i \text{ with } m]
     = 2 \sum_{m=1}^{k-1} \frac{m}{m+1}
     \le 2(k-1)
   \]
• Case k ≤ i:
   \[
   \sum_{j=k+1}^{n} \sum_{i=k}^{j-1} \frac{2}{j-k+1}
     = \sum_{j=k+1}^{n} (j-k)\,\frac{2}{j-k+1}
     = \sum_{m=1}^{n-k} \frac{2m}{m+1} \quad [\text{replace } j-k \text{ with } m \text{ and change bounds}]
     = 2 \sum_{m=1}^{n-k} \frac{m}{m+1}
     \le 2(n-k)
   \]
• Total for both cases is ≤ 2(k-1) + 2(n-k) = 2n-2
      Case where (i..j) contains k
• At most 1 interval of size 3 contains k
   – i=k-1, j=k+1
• At most 2 intervals of size 4 contain k
   – i=k-1, j=k+2 and i=k-2, j= k+1
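• In general, at most q-2 intervals of size q contain k, and the
  two ends of such an interval are compared with probability 2/q
• Thus we get \( \sum_{q=3}^{n} (q-2)\,\frac{2}{q} \le \sum_{q=3}^{n} 2 = 2(n-2) \)
• Summing together all cases we see the expected number of
  comparisons is at most (2n-2) + 2(n-2) = 4n-6, which is less than 4n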
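The three cases can also be checked numerically by summing the comparison probability over every pair i < j. The function below is a sketch under the assumptions of this analysis (ranks are 1-based, and the ith and jth smallest items are compared with probability 2 divided by the size of the smallest rank range containing i, j and k); the name expected_comparisons is hypothetical.

```c
/*
 * Sums, over all pairs of ranks i < j, the probability that the ith and
 * jth smallest items are compared while selecting the kth smallest of n.
 */
double expected_comparisons(int n, int k)
{
    double total = 0.0;
    for (int i = 1; i <= n; i++) {
        for (int j = i + 1; j <= n; j++) {
            if (j <= k)
                total += 2.0 / (k - i + 1);   /* case k >= j: range i..k */
            else if (i >= k)
                total += 2.0 / (j - k + 1);   /* case k <= i: range k..j */
            else
                total += 2.0 / (j - i + 1);   /* (i..j) contains k       */
        }
    }
    return total;
}
```

For any n >= 2 and 1 <= k <= n the returned value should stay below 4n, in line with the bound derived above.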
        Best case, Worst-case
• Best case running time?
• What happens in the worst-case?
  – Pivot element chosen is always what?
  – This leads to comparing all possible pairs
  – This leads to Θ(n²) comparisons
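To make the two extremes concrete, the sketch below counts item-to-pivot comparisons. It assumes the pivot is always the last item of the current partition and that positions are 1-based; SelectCounting, comparison_count and swap_int are hypothetical names used only for this illustration.

```c
/* Hypothetical global counter of item-to-pivot comparisons. */
long comparison_count = 0;

static void swap_int(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

/* Partition-based selection (last item as pivot) that counts comparisons. */
int SelectCounting(int A[], int k, int low, int high)
{
    int pivot = A[high - 1];
    int rank = low;
    for (int i = low; i < high; i++) {
        comparison_count++;              /* one comparison against the pivot */
        if (A[i - 1] < pivot) {
            swap_int(&A[i - 1], &A[rank - 1]);
            rank++;
        }
    }
    swap_int(&A[rank - 1], &A[high - 1]);

    if (rank == k)
        return A[rank - 1];
    if (k < rank)
        return SelectCounting(A, k, low, rank - 1);
    return SelectCounting(A, k, rank + 1, high);
}
```

With this pivot rule, selecting k = 1 from the sorted input 1 2 ... 8 picks the maximum of the partition every time and uses 7 + 6 + ... + 1 = 28 = n(n-1)/2 comparisons, while selecting k = 8 from the same input finds the answer after the first partition with n - 1 = 7 comparisons.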
    Deterministic O(n) approach
• Need to guarantee a good pivot element while
  doing O(n) work to find the pivot element
• int Select(int A[], int k, int low, int high)
   – Choosing pivot element
      • Divide into groups of 5
      • For each group of 5, find that group’s median
      • Use median of the medians as pivot element
   – Determine rank of pivot element
      • Compare some remaining items directly to median
   – Update low and high and recurse on partition that
     contains kth item (or return kth item if it is pivot)
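A possible in-place C sketch of this outline follows. It is an illustration under stated assumptions, not the slides' implementation: items are assumed distinct, each group of 5 is handled by insertion sort, every remaining item is compared to the pivot (rather than only the undetermined ones), and the helpers swap_int, median_of_group, partition_around and select_index are hypothetical names. The public Select keeps the 1-based k, low, high of the slides.

```c
static void swap_int(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

/* Insertion-sorts A[lo..hi] (at most 5 items); returns its median's index. */
static int median_of_group(int A[], int lo, int hi)
{
    for (int i = lo + 1; i <= hi; i++) {
        int v = A[i], j = i - 1;
        while (j >= lo && A[j] > v) {
            A[j + 1] = A[j];
            j--;
        }
        A[j + 1] = v;
    }
    return lo + (hi - lo) / 2;
}

/* Partitions A[lo..hi] around A[p]; returns the pivot's final index. */
static int partition_around(int A[], int lo, int hi, int p)
{
    int pivot = A[p], store = lo;
    swap_int(&A[p], &A[hi]);
    for (int i = lo; i < hi; i++) {
        if (A[i] < pivot) {
            swap_int(&A[i], &A[store]);
            store++;
        }
    }
    swap_int(&A[store], &A[hi]);
    return store;
}

/*
 * 0-based helper: rearranges A[lo..hi] so that A[k] holds the item that
 * would sit at index k if A[lo..hi] were sorted; returns k.
 */
static int select_index(int A[], int k, int lo, int hi)
{
    if (lo == hi)
        return lo;

    /* Find each group's median and collect the medians at the front. */
    int nmed = 0;
    for (int g = lo; g <= hi; g += 5) {
        int ge = (g + 4 <= hi) ? g + 4 : hi;
        int m = median_of_group(A, g, ge);
        swap_int(&A[m], &A[lo + nmed]);
        nmed++;
    }

    /* Recursively place the median of the medians at index mid. */
    int mid = lo + (nmed - 1) / 2;
    select_index(A, mid, lo, lo + nmed - 1);

    /* Determine the pivot's rank, then recurse on the side holding k. */
    int p = partition_around(A, lo, hi, mid);
    if (k == p)
        return p;
    if (k < p)
        return select_index(A, k, lo, p - 1);
    return select_index(A, k, p + 1, hi);
}

/* Returns the kth smallest item among positions low..high (1-based). */
int Select(int A[], int k, int low, int high)
{
    return A[select_index(A, k - 1, low - 1, high - 1)];
}
```

Collecting the group medians at the front of the range lets the median of medians be found by a recursive call on a contiguous block of about n/5 items without extra memory.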
 Guarantees on the pivot element


• Median of medians is guaranteed to be smaller than all the
  red colored items
   – Why?
   – How many red items are there?
• Likewise, median of medians is guaranteed to be larger
  than the blue colored items
• Thus the rank of the median of medians is in the range: roughly 3n/10 to 7n/10
• What elements do we need to compare to pivot to
  determine its rank?
   – How many of these are there?
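A rough count behind these guarantees can be written out as below. It is a sketch that ignores rounding and assumes n is a multiple of 5 with distinct items; the symbol m for the median of medians is introduced here and is not from the slides.

```latex
\begin{align*}
\#\{\text{groups whose median is} \ge m\} &\approx \tfrac{1}{2}\cdot\tfrac{n}{5} = \tfrac{n}{10} \\
\#\{x : x \ge m\} &\ge 3\cdot\tfrac{n}{10} = \tfrac{3n}{10}
   && \text{(the median and the 2 larger items of each such group)} \\
\#\{x : x \le m\} &\ge \tfrac{3n}{10}
   && \text{(by symmetry)} \\
\tfrac{3n}{10} \;\le\; \operatorname{rank}(m) &\;\le\; \tfrac{7n}{10} \\
\#\{\text{items still to compare with } m\} &\approx 2\cdot\tfrac{n}{5} = \tfrac{2n}{5}
   && \text{(2 undetermined items per group)}
\end{align*}
```

Under these assumptions, whichever side of the pivot the recursion continues on holds at most about 7n/10 items.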
   Analysis of number of comparisons
• int Select(int A[], int k, int low, int high), with the cost of each step
   – Choosing pivot element
      • For each group of 5, find that group’s median: c1 n/5 comparisons (c1 for each median of 5)
      • Find the median of the medians: recurse on problem of size n/5
   – Compare remaining items directly to median: c2 n comparisons
   – Recurse on correct partition: recurse on problem of size at most 7n/10
• T(n) ≤ T(n/5) + T(7n/10) + c1 n/5 + c2 n
    Solving recurrence relation
• T(n) = T(7n/10) + T(n/5) + O(n)
  – Key observation: 7/10 + 1/5 = 9/10 < 1
• Prove T(n) ≤ cn for some constant c by
  induction on n, bounding the O(n) term by dn
• T(n) ≤ 7cn/10 + cn/5 + dn
•      = 9cn/10 + dn
• Need 9cn/10 + dn ≤ cn
• Thus cn/10 ≥ dn ⇒ c ≥ 10d
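The bound c ≥ 10d can be sanity-checked by evaluating the recurrence directly. The sketch below assumes floors on the subproblem sizes, a base case T(0) = 0, and an additive term of exactly d·n; the function name recurrence_T is hypothetical.

```c
#include <stdlib.h>

/*
 * Evaluates T(m) = T(floor(7m/10)) + T(floor(m/5)) + d*m, with T(0) = 0,
 * bottom-up for m = 1..n and returns T(n), or -1 if allocation fails.
 * Intended for moderate n, so that 7*m fits in an int.
 */
long long recurrence_T(int n, long long d)
{
    long long *T = malloc(((size_t)n + 1) * sizeof *T);
    if (T == NULL)
        return -1;

    T[0] = 0;
    for (int m = 1; m <= n; m++)
        T[m] = T[7 * m / 10] + T[m / 5] + d * m;

    long long result = T[n];
    free(T);
    return result;
}
```

With d = 1, the computed values should satisfy T(n) ≤ 10n, matching c = 10d.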

								