Posted on: 10/18/2010
Worst-Case Lower Bound for Comparison-Based Sorting Algorithms

This lecture is a little abstract. We prove the following theorem:

“Any comparison-based sorting algorithm takes Ω(n log n) time to sort a list of n distinct elements in the worst case.”

Understanding the theorem

A comparison-based sorting algorithm (comparison sort for short) produces a sorted order that is determined ONLY by comparisons (‘>’, ‘<’, ‘≥’, ‘≤’, etc.) between the input keys. It cannot inspect the values of the keys to gain information about the sorted order.

E.g., insertion sort, merge sort, quick sort, and heap sort are all comparison-based sorting algorithms.

The theorem can be restated in several equivalent ways:
↔ The worst-case running time of any comparison sort on n distinct elements is Ω(n log n), i.e., at least c · n log n for some constant c > 0 and all sufficiently large n.
↔ The worst-case lower bound for comparison sorting is Ω(n log n).
↔ There does not exist any comparison sort whose worst-case running time grows asymptotically slower than n log n.

Quick check of our knowledge

1. Insertion sort takes Θ(n²) in the worst case (so it takes at least Ω(n log n)). OK
2. Merge sort takes Θ(n log n) in the worst case. OK
3. Quick sort takes Θ(n²) in the worst case. OK
4. Heap sort takes Θ(n log n) in the worst case.
OK

So all the comparison-based sorting algorithms we have learnt so far are consistent with the theorem. But is that a formal proof? Definitely not: the theorem states that ANY (EVERY) comparison-based sorting algorithm, not just a particular one, takes Ω(n log n) in the worst case.

A new idea is needed to prove the theorem

Our old strategy of analyzing the running time of a particular algorithm cannot be used to prove the theorem. We need a new model in which we can argue about the worst-case running time of any comparison-based sorting algorithm. We use the decision tree model for the analysis.

Decision Tree Model

Suppose the input sequence is a permutation of a set of n distinct numbers: {a1, a2, a3, a4, …, an}. So there are n! possible input sequences, and a comparison sort should be able to sort all n! of them.

A comparison sort uses comparisons between the elements to gain order information about the input sequence, i.e., ai < aj, ai > aj, etc. We denote a key comparison of this kind by ai:aj.

ai:aj is a comparison made between elements ai and aj. Because the elements are distinct, it has two possible outcomes: (ai < aj) or (ai > aj).

Any comparison sort performs a sequence of key comparisons to sort the keys. The exact sequence of key comparisons depends on the input keys and on the comparison sort. We abstract the sequences of key comparisons performed by a comparison sort as a decision tree.
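To make the "comparisons only" restriction concrete, here is a minimal sketch. It assumes Python's built-in `sorted` as the comparison sort, and the helper names `make_logging_compare` and `sort_with_log` are hypothetical, introduced only for illustration. The sort never sees a key except through the compare function, so the log it leaves behind is exactly a sequence of ai:aj comparisons.

```python
from functools import cmp_to_key


def make_logging_compare():
    """Return a compare function and the log of every comparison it was asked."""
    log = []

    def compare(x, y):
        # Record the pair, then report only the outcome of the comparison.
        log.append((x, y))
        return -1 if x < y else (1 if x > y else 0)

    return compare, log


def sort_with_log(keys):
    """Sort keys with the built-in sort, which only calls the compare function."""
    compare, log = make_logging_compare()
    result = sorted(keys, key=cmp_to_key(compare))
    return result, log


result, log = sort_with_log([3, 1, 2])
print(result)    # the sorted keys
print(len(log))  # how many key comparisons were made on this input
```

Running the same function on a different permutation of the same keys generally produces a different log, which is the input dependence that the decision tree captures.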
Decision Tree Model: insertion sort

For example, we trace the sequence of key comparisons of insertion sort on <a1, a2, a3> with a decision tree.

Input: <a1, a2, a3>
Root: a1:a2
Outcome <: the array stays <a1, a2, a3>, and the position of a1 in the result will always be before a2. According to the operation of insertion sort, a1 will NOT swap with a2. So in the next iteration, a2 is compared with a3 (a2:a3).
Outcome >: the array becomes <a2, a1, a3>, and the position of a1 in the result will always be after a2. According to the operation of insertion sort, a1 will swap with a2. So in the next iteration, a1 is compared with a3 (a1:a3).

[Figure: the full decision tree for input <a1, a2, a3>. On the leftmost path, the output sorted order is <a1, a2, a3>. On the rightmost path, the array goes through <a2, a1, a3> and <a2, a3, a1>, and the output sorted order is <a3, a2, a1>.]

The path from the root to a leaf node shows the sequence of key comparisons made to reach one of the n! outcomes. Depending on the values of the input keys, insertion sort performs different sequences of key comparisons.

Decision Tree Model

In the insertion sort example, the decision tree reveals all possible key comparison sequences for 3 distinct numbers. There are exactly 3! = 6 possible output sequences.

Different comparison sorts generate different decision trees. It should be clear that, in theory, we can draw a decision tree for ANY comparison sort algorithm.

Given a particular input sequence, the path from the root to a leaf traces the particular key comparison sequence performed by that comparison sort. The length of that path is the number of key comparisons performed by the sorting algorithm. When we reach a leaf, the sorting algorithm has determined the sorted order.
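The correspondence between inputs and root-to-leaf paths can be checked mechanically for the three-element insertion sort example. The following is a sketch under assumptions: `insertion_sort_trace` is a hypothetical helper written for this illustration, and each comparison is recorded as (i, j, outcome), meaning ai:aj with the given outcome, where i and j are the original input positions.

```python
from itertools import permutations


def insertion_sort_trace(values):
    """Insertion-sort a copy of values; return (sorted list, comparison trace)."""
    # Tag each value with its original 1-based position so that a comparison
    # can be labelled a_i : a_j as in the decision tree.
    items = [(v, idx + 1) for idx, v in enumerate(values)]
    trace = []
    for pos in range(1, len(items)):
        k = pos
        while k > 0:
            (left_val, left_idx), (right_val, right_idx) = items[k - 1], items[k]
            outcome = "<" if left_val < right_val else ">"
            trace.append((left_idx, right_idx, outcome))
            if outcome == "<":
                break  # the element is already in place; no swap needed
            items[k - 1], items[k] = items[k], items[k - 1]
            k -= 1
    return [v for v, _ in items], tuple(trace)


paths = {}
for perm in permutations([1, 2, 3]):
    result, trace = insertion_sort_trace(perm)
    assert result == [1, 2, 3]
    paths[perm] = trace

leaves = set(paths.values())
height = max(len(trace) for trace in paths.values())
print(len(leaves))  # 6 distinct paths, one per input permutation
print(height)       # 3 comparisons on the longest path
```

For the input (3, 2, 1) the trace is a1:a2 with outcome >, then a1:a3 with outcome >, then a2:a3 with outcome >, which is the rightmost path of the tree described above.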
Notice that a correct sorting algorithm must be able to produce EVERY possible sorted order. Since there are n! possible sorted orders, there are at least n! leaves in the decision tree.

Decision Tree Model

Given a decision tree, the height of the tree is the length of the longest root-to-leaf path. It follows that the height of the decision tree is the largest number of key comparisons, which is a lower bound on the worst-case running time of the sorting algorithm.

Relating the theorem to the model:
“any comparison sort” is modelled by a decision tree
“worst-case running time” is the height of the decision tree

We are very close to what we want, but how do we show the lower bound Ω(n log n)?

We want to find a lower bound (Ω) on the height of a binary tree that has n! leaves. What is the minimum height of a binary tree that has n! leaves? A binary tree of height h has at most 2^h leaves, so n! ≤ 2^h and h ≥ log2(n!). The height is smallest when the binary tree is a complete tree (recall the definition of a complete tree). Hence the height is at least log2(n!).

Keeping only the n/2 largest terms of the sum, each of which is at least log2(n/2), gives:

log2(n!) = log2(n) + log2(n-1) + … + log2(n/2) + … + log2(1)
         ≥ (n/2) log2(n/2)
         = (n/2) log2(n) - n/2

So log2(n!) = Ω(n log n).

It follows that the height of a binary tree which has n! leaves is Ω(n log n), and therefore the worst-case running time is Ω(n log n).

Putting everything together, we have:

“Any comparison-based sorting algorithm takes Ω(n log n) time to sort a list of n distinct elements in the worst case.”

Sorting in linear time -- Radix sort

Recall our long-lost friend, radix sort. What is the running time of radix sort? It is Θ(kn), where k is the number of digits in the keys.
If we assume k is a constant, the running time of radix sort is Θ(n). Wow, it is a linear-time sorting algorithm! That seems to contradict our theorem. What is wrong?

Radix sort is NOT a comparison sort. Hence, the previous theorem does not apply to radix sort.
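To see why radix sort falls outside the theorem, consider the following minimal sketch of a least-significant-digit radix sort. It is an illustration under assumptions, not necessarily the variant presented earlier in the course: keys are non-negative integers, the digit count k is passed in as the hypothetical parameter `num_digits`, and the base defaults to 10. Notice that no two keys are ever compared with each other; each key is placed by inspecting its own digits.

```python
def radix_sort(keys, num_digits, base=10):
    """LSD radix sort for non-negative integers with at most num_digits digits."""
    result = list(keys)
    place = 1  # 1, base, base**2, ...: the weight of the current digit
    for _ in range(num_digits):
        # Distribute keys into buckets by the current digit. Appending keeps
        # the pass stable, which is what makes the digit-by-digit passes work.
        buckets = [[] for _ in range(base)]
        for key in result:
            buckets[(key // place) % base].append(key)
        # Collect the buckets in digit order.
        result = [key for bucket in buckets for key in bucket]
        place *= base
    return result


print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66], num_digits=3))
# prints [2, 24, 45, 66, 75, 90, 170, 802]
```

Each of the k passes touches every key once, which is where the Θ(kn) running time comes from when the base is treated as a constant.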