Chapter 12
Sorting
CS 260 Data Structures
Indiana University – Purdue University Fort Wayne
Mark Temte
Note
We temporarily skip ahead to Section 12.3
Pages 624 – 634
Heapsort
Heapsort is a superior O( n log(n) ) method
Assume the array to sort is int[ ] a = new int[ n ]
Overview
First convert an unsorted array to a heap
Then, iteratively, remove the root element, rebuilding
the heap each time
The root element is always the largest remaining element
The elements, as removed, are in descending order
Notation: Define a[ i .. j ] to consist of the elements
a[ i ], a[ i+1 ], a[ i+2 ], ..., a[ j ]
Heapsort method details
public static void heapsort( int[ ] a, int n )
1. Note that a[ 0..0 ] is already a heap (only one element)
2. Turn a[ 0..(n-1) ] into a heap by successively adding . . .
a[ 1 ] to a[ 0..0 ]
a[ 2 ] to a[ 0..1 ]
a[ 3 ] to a[ 0..2 ]
•••
a[ n-1 ] to a[ 0..(n-2) ]
These steps could be written as a private helper method
called makeHeap:
private static void makeHeap( int[ ] a, int n )
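The successive additions in step 2 can be sketched as follows. This is a minimal sketch, not the course's own code. It assumes the usual array layout in which the parent of a[ k ] is a[ (k-1)/2 ], and it moves each new element up by swapping it with its parent while it is larger ("reheapification upward"). The class name HeapBuild is made up for illustration.

```java
class HeapBuild {
    // Turn a[0..n-1] into a max-heap by adding a[1], a[2], ..., a[n-1] one at a time.
    static void makeHeap(int[] a, int n) {
        for (int k = 1; k < n; k++) {
            int child = k;
            // Move the new element up while it is larger than its parent.
            while (child > 0 && a[child] > a[(child - 1) / 2]) {
                int parent = (child - 1) / 2;
                int temp = a[child];
                a[child] = a[parent];
                a[parent] = temp;
                child = parent;
            }
        }
    }
}
```

For example, the array 5 2 8 7 1 becomes the heap 8 7 5 2 1.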
Heapsort method details
3. Iteratively remove the largest remaining element and
rebuild the heap
for( int i = n-1; i > 0; i-- ) {
Exchange a[ 0 ] with a[ i ]
Perform “reheapification downward” on a[ 0..(i-1) ]
}
Note: The largest remaining element is only removed
from the heap logically but remains physically in the array
(in the new position)
Note: Reheapification downward could be written as a
private helper method called reheapifyDown:
private static void reheapifyDown( int[ ] a, int n )
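Step 3 and reheapifyDown might look like the sketch below. It assumes the array is already a max-heap (for instance, after a makeHeap step) and that the children of a[ k ] are a[ 2k+1 ] and a[ 2k+2 ]. The names HeapDrain and sortHeap are hypothetical, chosen only for this illustration.

```java
class HeapDrain {
    // Restore the heap property in a[0..n-1] when only the root may be out of place.
    static void reheapifyDown(int[] a, int n) {
        int current = 0;
        while (2 * current + 1 < n) {
            int larger = 2 * current + 1;              // left child
            if (larger + 1 < n && a[larger + 1] > a[larger]) {
                larger = larger + 1;                   // right child is the larger one
            }
            if (a[current] >= a[larger]) {
                return;                                // heap property holds again
            }
            int temp = a[current];
            a[current] = a[larger];
            a[larger] = temp;
            current = larger;
        }
    }

    // Step 3 of heapsort: assumes a[0..n-1] is already a max-heap.
    static void sortHeap(int[] a, int n) {
        for (int i = n - 1; i > 0; i--) {
            int temp = a[0];                           // exchange a[0] with a[i]
            a[0] = a[i];
            a[i] = temp;
            reheapifyDown(a, i);                       // rebuild the heap a[0..i-1]
        }
    }
}
```

Calling sortHeap on the heap 8 7 5 2 1 leaves the array as 1 2 5 7 8.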
Heapsort example
Initial array a (indices 0 to 4): 5 2 8 7 1
[Figure: heap tree diagrams. makeHeap builds the heap through a series of swaps; each later step swaps the root with the last heap element and then reheapifies downward.]
Final array a (indices 0 to 4): 1 2 5 7 8
Analysis of heapsort
Since the heap is a complete binary tree, it is
automatically balanced
The depth of tree is O( log(n) )
Worst case analysis of heapsort is O( n log(n) )
Steps 1 and 2 have O( n log(n) ) performance
Step 3 has O( n log(n) ) performance
The steps form a sequence
The worst case performance is also the best case
performance and the average case performance
Quadratic sorting algorithms
Quadratic sorting algorithms
Have inefficient worst case O( n^2 ) performance
Are easy to implement
O( n^2 ) performance doesn’t matter for small
arrays
Selection sort
public static void selectionsort( int[ ] data, int first, int n ) {
Find the largest element. Swap it with the last.
Find the next largest. Swap it with the next to last.
Etc.
}
The sort range is data[ first .. (first + n - 1) ]
A typical call is selectionsort( a, 0, n )
Here a is an array of n cells
Analysis
Best case = worst case = average case = O( n^2 )
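The outline above can be filled in roughly as follows. This is one possible sketch of the "largest to the end" variant described on the slide; the class name SelectionSortSketch is invented for the example.

```java
class SelectionSortSketch {
    // Sort data[first .. first+n-1] into ascending order.
    static void selectionsort(int[] data, int first, int n) {
        for (int last = first + n - 1; last > first; last--) {
            int big = first;                       // index of the largest element seen so far
            for (int j = first + 1; j <= last; j++) {
                if (data[j] > data[big]) {
                    big = j;
                }
            }
            int temp = data[last];                 // swap the largest into the last open cell
            data[last] = data[big];
            data[big] = temp;
        }
    }
}
```

The inner loop scans the whole unsorted range no matter how the data is arranged, which is why the best, worst, and average cases are all quadratic.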
Insertion sort
public static void insertionsort( int[ ] data, int first, int n ) {
Consider data[ 0..0 ] already sorted
Insert data[1] into the proper position of data[ 0..1 ]
Insert data[2] into the proper position of data[ 0..2 ]
Etc.
}
Each insert operation places an additional
element into a portion of the array that has already
been sorted as follows:
[Figure: array a with indices 0 to 11. The sorted portion holds 3 7 11 19 31, and the next element, 10, is about to be inserted into it.]
Insertion sort
Analysis
Worst case = average case = O( n^2 )
Best case = O( n )
The algorithm takes advantage of the situation
when the array is already sorted
This is a good method when . . .
a few updates need to be added from time to time so
that the array remains sorted
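A minimal sketch of insertion sort, assuming the same ( data, first, n ) convention as the other methods. The class name InsertionSortSketch is hypothetical.

```java
class InsertionSortSketch {
    // Sort data[first .. first+n-1] into ascending order.
    static void insertionsort(int[] data, int first, int n) {
        for (int k = first + 1; k < first + n; k++) {
            int item = data[k];                    // next element to insert
            int j = k;
            // Shift larger elements of the sorted part one cell to the right.
            while (j > first && data[j - 1] > item) {
                data[j] = data[j - 1];
                j--;
            }
            data[j] = item;
        }
    }
}
```

When the array is already sorted, the inner while loop stops at its first test for every element, so the whole sort makes only about n comparisons. That is the O( n ) best case.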
Recursive O( n log(n) ) methods
We will consider
Mergesort
Quicksort
Mergesort
public static void mergesort( int[ ] data, int first, int n ) {
Divide the array in half.
Recursively apply mergesort to each half.
Merge the sorted halves into a sorted temporary array
Copy the temporary array back to the original array
}
Index: 0 1 2 3 4 5 6 7 8 9
Unsorted: 5 7 2 10 4 | 8 1 6 9 3
After mergesort on each half: 2 4 5 7 10 | 1 3 6 8 9
After merge: 1 2 3 4 5 6 7 8 9 10
Mergesort
Recursive stopping case
When a subarray to be sorted consists of only one
element
During the merge process, when one of the halves
becomes empty, simply copy the remainder of the
remaining half to the end of the temporary array
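private static void merge( int[ ] data, int first, int n1, int n2 ) {
    int[ ] temp = new int[ n1+n2 ]; // Allocate the temporary array
    int copied = 0; // Number of elements copied from data to temp
    int copied1 = 0; // Number copied from the first half of data
    int copied2 = 0; // Number copied from the second half of data
    int i; // Array index to copy from temp back into data
    // Merge elements from the two sorted halves into temp
    while ( ( copied1 < n1 ) && ( copied2 < n2 ) ) {
        if ( data[ first + copied1 ] <= data[ first + n1 + copied2 ] )
            temp[ copied++ ] = data[ first + (copied1++) ];
        else
            temp[ copied++ ] = data[ first + n1 + (copied2++) ];
    }
    // When one half becomes empty, copy the remainder of the other half
    while ( copied1 < n1 ) temp[ copied++ ] = data[ first + (copied1++) ];
    while ( copied2 < n2 ) temp[ copied++ ] = data[ first + n1 + (copied2++) ];
    // Copy from temp back into data
    for ( i = 0; i < copied; i++ ) data[ first + i ] = temp[ i ];
}
public static void mergesort( int[ ] data, int first, int n ) {
    int n1, n2; // Sizes of the first and second halves
    if ( n > 1 ) {
        // Compute sizes of the two halves
        n1 = n / 2;
        n2 = n - n1;
        mergesort( data, first, n1 ); // Sort data[ first ] through data[ first+n1-1 ]
        mergesort( data, first + n1, n2 ); // Sort data[ first+n1 ] to the end
        // Merge the two sorted halves.
        merge( data, first, n1, n2 );
    }
}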
Mergesort analysis
The usual technique of determining the big-O of a
recursive method does not work here
There are only half as many elements in the merge
phase within each successive recursive call
Instead look at the merge activity across an entire
level at a time
The big-O of merge across each level is O(n)
There are O( log(n) ) levels
Ignore the actual number of recursive calls
Mergesort analysis
[Figure: recursion tree with log(n) levels; each level covers all n elements]
Worst case = average case = best case = O( n log(n) )
Mergesort analysis
A disadvantage of mergesort when used with arrays is
that a second temporary array is needed
This effectively halves the size of the largest array that
can be sorted
Advantages
Works with linked lists without need for a temporary array !
Can be used to sort data in a huge disk file
A file much too large to fit in memory
Subdivide the file into pieces small enough to fit in memory
Sort the pieces
Merge the pieces together
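To illustrate the linked-list advantage, here is a hedged sketch of mergesort on a singly linked list. The Node class with its data and link fields is a hypothetical minimal node type defined only for this example, and the slow/fast cursor technique for finding the middle is one common choice, not something the slides prescribe. The merge step relinks existing nodes instead of copying into a temporary array.

```java
class ListMergesort {
    // Hypothetical minimal node type, used only for this sketch.
    static class Node {
        int data;
        Node link;
        Node(int data, Node link) { this.data = data; this.link = link; }
    }

    // Sort the list that starts at head and return the head of the sorted list.
    static Node mergesort(Node head) {
        if (head == null || head.link == null) {
            return head;                           // zero or one node: already sorted
        }
        // Find the middle with a slow and a fast cursor, then cut the list in two.
        Node slow = head;
        Node fast = head.link;
        while (fast != null && fast.link != null) {
            slow = slow.link;
            fast = fast.link.link;
        }
        Node second = slow.link;
        slow.link = null;
        return merge(mergesort(head), mergesort(second));
    }

    // Merge two sorted lists by relinking nodes; no temporary array is allocated.
    static Node merge(Node a, Node b) {
        Node dummy = new Node(0, null);            // placeholder in front of the result
        Node tail = dummy;
        while (a != null && b != null) {
            if (a.data <= b.data) {
                tail.link = a;
                a = a.link;
            } else {
                tail.link = b;
                b = b.link;
            }
            tail = tail.link;
        }
        tail.link = (a != null) ? a : b;           // append the remainder of the nonempty list
        return dummy.link;
    }
}
```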
Quicksort
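public static void quicksort( int[ ] data, int first, int n ) {
“Partition” the array into two parts around a pivot element such that
(each element of the left part) <= pivot <= (each element of the right part)
Recursively apply quicksort to each part
}
[Figure: partition example with pivot value 7]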
Quicksort
Once elements in both parts are sorted, the entire
array is sorted
public static void quicksort( int[ ] data, int first, int n ) {
    int pivotIndex; // Array index for the pivot element
    int n1; // Number of elements before the pivot element
    int n2; // Number of elements after the pivot element
    if ( n > 1 ) {
        // Partition the array, and set the pivot index.
        pivotIndex = partition( data, first, n );
        // Compute the sizes of the two pieces.
        n1 = pivotIndex - first;
        n2 = n - n1 - 1;
        // Recursive calls will now sort the two pieces.
        quicksort( data, first, n1 );
        quicksort( data, pivotIndex + 1, n2 );
    }
}
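The quicksort method above relies on a partition helper. The sketch below is one simple way to write it, assuming the pivot is data[ first ]; it is not necessarily the partitioning scheme used in the course. It sweeps once through the range, collects the elements that are <= pivot on the left, and then drops the pivot between the two parts. The class name PartitionSketch is invented for the example.

```java
class PartitionSketch {
    // Partition data[first .. first+n-1] around the pivot data[first].
    // Returns the pivot's final index; elements to its left are <= pivot,
    // elements to its right are > pivot. Assumes n >= 1.
    static int partition(int[] data, int first, int n) {
        int pivot = data[first];
        int boundary = first;                      // last index of the "<= pivot" region
        for (int j = first + 1; j < first + n; j++) {
            if (data[j] <= pivot) {
                boundary++;
                int temp = data[boundary];
                data[boundary] = data[j];
                data[j] = temp;
            }
        }
        data[first] = data[boundary];              // move the pivot between the two parts
        data[boundary] = pivot;
        return boundary;
    }
}
```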
Quicksort analysis
Best case = average case = O( n log(n) )
When the pivot lands near the center each time
Number of levels = O( log(n) )
Number of probes within each level = O(n)
Worst case = O( n^2 )
When the pivot lands near an end most of the time
For example, when the array is already sorted and the first element is used as the pivot
The number of levels can then grow as large as n
Quicksort
There is a better way to choose the pivot value
1. Choose the median of the three values . . .
data[ first ]
data[ first + n - 1 ]
data[ first + n/2 ]
2. Swap the chosen value with data[ first ]
3. Continue as before
This method is called the median of three
Statistically, this gives a much better pivot value
Performance is much more likely to be O(n log(n) )
Even when the data is already sorted
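Steps 1 and 2 of the median of three method could be sketched as below. The method name medianToFront and the class name MedianOfThree are hypothetical; the sketch only selects the pivot and swaps it into data[ first ], after which partitioning would continue as before.

```java
class MedianOfThree {
    // Swap the median of data[first], data[first + n/2], and data[first + n - 1]
    // into data[first]. Assumes n >= 1.
    static void medianToFront(int[] data, int first, int n) {
        int lo = first;
        int mid = first + n / 2;
        int hi = first + n - 1;
        int a = data[lo];
        int b = data[mid];
        int c = data[hi];
        int median;
        if ((a <= b && b <= c) || (c <= b && b <= a)) {
            median = mid;                          // middle value lies between the other two
        } else if ((b <= a && a <= c) || (c <= a && a <= b)) {
            median = lo;                           // first value lies between the other two
        } else {
            median = hi;                           // otherwise the last value is the median
        }
        int temp = data[first];
        data[first] = data[median];
        data[median] = temp;
    }
}
```

On the already sorted array 1 2 3 4 5 6 7, the three candidates are 1, 4, and 7, so 4 becomes the pivot and the partition splits the array near the center.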
Other improvements
Both mergesort and quicksort encounter more and
more overhead due to recursion when the
subarrays get small
Both can be improved as follows
When a subarray has fewer than some number M
of elements, use the insertion sort method on the
subarray instead of making a recursive call
A typical value for M might be around 100
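As an illustration of the cutoff idea, here is a hedged sketch of quicksort that hands small subarrays to insertion sort. M is set to 100 only because that is the value mentioned above; a good value would have to be found by measurement. The class name HybridQuicksort is made up, and the insertionsort and partition helpers are plain textbook-style versions included only so that the sketch compiles on its own.

```java
class HybridQuicksort {
    static final int M = 100;   // cutoff from the slide; the best value depends on the machine

    // Sort data[first .. first+n-1]; subarrays with fewer than M elements use insertion sort.
    static void quicksort(int[] data, int first, int n) {
        if (n < M) {
            insertionsort(data, first, n);         // small subarray: no further recursion
            return;
        }
        int pivotIndex = partition(data, first, n);
        int n1 = pivotIndex - first;               // number of elements before the pivot
        int n2 = n - n1 - 1;                       // number of elements after the pivot
        quicksort(data, first, n1);
        quicksort(data, pivotIndex + 1, n2);
    }

    static void insertionsort(int[] data, int first, int n) {
        for (int k = first + 1; k < first + n; k++) {
            int item = data[k];
            int j = k;
            while (j > first && data[j - 1] > item) {
                data[j] = data[j - 1];
                j--;
            }
            data[j] = item;
        }
    }

    // Simple partition around data[first]; returns the pivot's final index.
    static int partition(int[] data, int first, int n) {
        int pivot = data[first];
        int boundary = first;
        for (int j = first + 1; j < first + n; j++) {
            if (data[j] <= pivot) {
                boundary++;
                int temp = data[boundary];
                data[boundary] = data[j];
                data[j] = temp;
            }
        }
        data[first] = data[boundary];
        data[boundary] = pivot;
        return boundary;
    }
}
```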