Heap Sort
Motivation: Many of the sorting algorithms that you know
(bubble sort, insertion sort, even quicksort) can take O(N^2)
time in the worst case. In this section we examine an
algorithm that guarantees O( N log N ) sort time.
We are already familiar with a binary heap. In this section
we will use a minimum binary heap in order to sort an array
of data in descending order. Similarly, a maximum binary
heap can be used to sort data in ascending order.
Recall that a minimum binary heap is a complete binary tree
such that the data in each node is less than or equal to the
data in each of its child nodes (if these child nodes exist).
Also recall that because a binary heap is a complete binary
tree, it can conveniently be represented using an array.
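The remark about a maximum heap giving ascending order can be tried out with the C++ standard library, whose heap functions maintain a maximum heap by default. The following is only a sketch: the function name heapSortAscending is made up for this illustration and is not part of the course code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not from the course code): sorts a copy of the
// values into ascending order using the standard library's max-heap
// operations.
std::vector<int> heapSortAscending( std::vector<int> values )
{
    // arrange the values as a maximum heap stored in the vector
    std::make_heap( values.begin(), values.end() );
    assert( std::is_heap( values.begin(), values.end() ) );

    // repeatedly move the largest remaining value to the end of the
    // heap portion, leaving the whole range in ascending order
    std::sort_heap( values.begin(), values.end() );
    return values;
}
```

Calling heapSortAscending on the values 8 9 7 3 2 5 0 1 should return them in ascending order.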
CPSC 252 Heap Sort Page 1
Example:
Tree:
          3
      7       5
    9   8   12
Array: 3 7 5 9 8 12
Recall that given the index, j, of a node in the array, it is a
simple matter to determine the index of the left and right
child nodes and of the parent node:
left child = 2 * j + 1
right child = 2 * j + 2
parent = ( j - 1 ) / 2 (integer division, for j > 0)
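These index calculations can be written as small helper functions. This is a minimal sketch; the names leftChild, rightChild and parent are chosen here for illustration only.

```cpp
#include <cassert>

// Index of the left child of the node stored at index j.
int leftChild( int j )  { return 2 * j + 1; }

// Index of the right child of the node stored at index j.
int rightChild( int j ) { return 2 * j + 2; }

// Index of the parent of the node stored at index j.
// The root (index 0) has no parent, so j must be positive.
int parent( int j )
{
    assert( j > 0 );
    return ( j - 1 ) / 2;   // integer division rounds down here
}
```

In the example array 3 7 5 9 8 12, the node at index 1 (value 7) has its children at indices 3 and 4 (values 9 and 8) and its parent at index 0 (value 3).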
The heapsort algorithm consists of two phases:
- build a heap from an arbitrary array
- use the heap to sort the data
Building a heap from an arbitrary array
It is easier to picture this process if we represent the heap
using a binary tree rather than an array.
Algorithm:
let index be the index of the last parent node in the tree
while index is greater than or equal to zero
    perform a reheap-down operation starting with the node at index
    decrement index
end while
Example: Convert the following array to a heap
8 9 7 3 2 5 0 1
Picture the array as a complete binary tree:
            8
        9       7
      3   2   5   0
    1
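For an array with N elements, the index of the last parent
node in the tree is ( N - 2 ) / 2, using integer division.
For the 8-element array above this is index 3, the node
containing the value 3.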
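[Figure: the tree after each reheap-down operation, starting at index 3 and working back to index 0. The resulting heap is 0 1 5 3 2 8 7 9.]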
Having built the heap, we now sort the array
Tree:
            0
        1       5
      3   2   8   7
    9
Array: 0 1 5 3 2 8 7 9
Note: in this section we will represent the data in
both binary tree and array formats. It is important to
understand that in practice the data is stored only as
an array.
Algorithm:
let swapIndex = N - 1
while swapIndex is greater than 0
    swap data at position swapIndex with data at position 0
    reheap down between positions 0 and swapIndex - 1
    decrement swapIndex
end while
[Figures: the tree and array after each of the first two passes of the loop.]
After the first swap and reheap down: 1 2 5 3 9 8 7 0
After the second swap and reheap down: 2 3 5 7 9 8 1 0
And so the process continues until the entire array is sorted.
Implementation:
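#include <utility>   // std::swap

// Forward declarations: sort calls these functions before they
// are defined below.
template< typename type >
void buildHeap( type* data, int size );
template< typename type >
void reheapDown( type* data, int top, int size );

template< typename type >
void sort( type* data, int size )
// Pre: the capacity of the array pointed to by data
// is at least size
// Post: the first size elements of data have been
// sorted in descending order
{
    int swpIndx;
    buildHeap( data, size );   // phase 1: arrange the data as a minimum heap
    for( swpIndx = size - 1; swpIndx > 0; swpIndx-- )
    {
        // move the current minimum to the end, then restore the
        // heap among the first swpIndx elements
        std::swap( data[ 0 ], data[ swpIndx ] );
        reheapDown( data, 0, swpIndx );
    }
}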
template< typename type >
void buildHeap( type* data, int size )
// Pre: data points to an array of data of capacity at
// least size elements
// Post: the first size elements of data are a heap
{
    int index;
    // start at the last parent node and work back to the root
    for( index = ( size - 2 ) / 2; index >= 0; index-- )
        reheapDown( data, index, size );
}
template< typename type >
void reheapDown( type* data, int top, int size )
// Pre: data between index top + 1 and index size - 1
// is a heap
// Post: data between index top and index size - 1 is
// a heap
{
    int leftChild = 2 * top + 1;
    int rightChild = 2 * top + 2;
    int minChild;
    if( leftChild < size )
    {
        // find index of smallest child
        if( rightChild >= size ||
            data[ leftChild ] < data[ rightChild ] )
            minChild = leftChild;
        else
            minChild = rightChild;
        // if data at top is greater than smallest
        // child then swap and continue
        if( data[ top ] > data[ minChild ] )
        {
            std::swap( data[ top ], data[ minChild ] );   // std::swap is declared in <utility>
            reheapDown( data, minChild, size );
        }
    }
}
Note: this function is tail-recursive and so can easily be
replaced with an iterative version having O(1) space
requirements.
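One way the iterative version might look is sketched below, together with a sort that uses it. The names reheapDownIterative and heapSortDescending are made up for this sketch; the logic mirrors the recursive reheapDown, with the recursive call replaced by updating top and looping.

```cpp
#include <cassert>
#include <utility>

// Hypothetical iterative counterpart of reheapDown for a minimum heap.
// Uses O(1) extra space because no call stack is built up.
template< typename type >
void reheapDownIterative( type* data, int top, int size )
{
    assert( top >= 0 );
    int leftChild = 2 * top + 1;
    while( leftChild < size )
    {
        int rightChild = leftChild + 1;
        int minChild;
        // find index of smallest child
        if( rightChild >= size ||
            data[ leftChild ] < data[ rightChild ] )
            minChild = leftChild;
        else
            minChild = rightChild;

        // stop as soon as the heap property holds at top
        if( !( data[ top ] > data[ minChild ] ) )
            break;

        std::swap( data[ top ], data[ minChild ] );
        top = minChild;               // continue from the child we swapped with
        leftChild = 2 * top + 1;
    }
}

// Heap sort into descending order using the iterative reheap down.
template< typename type >
void heapSortDescending( type* data, int size )
{
    // phase 1: build a minimum heap, starting at the last parent
    for( int index = ( size - 2 ) / 2; index >= 0; index-- )
        reheapDownIterative( data, index, size );

    // phase 2: move the minimum to the end and shrink the heap
    for( int swpIndx = size - 1; swpIndx > 0; swpIndx-- )
    {
        std::swap( data[ 0 ], data[ swpIndx ] );
        reheapDownIterative( data, 0, swpIndx );
    }
}
```

Applied to the example array 8 9 7 3 2 5 0 1, heapSortDescending should leave the array as 9 8 7 5 3 2 1 0.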
Time complexity of Heapsort
We need to determine the time complexity of the build heap
step and the time complexity of the subsequent sorting step.
The time complexity of the sorting operation once the heap
has been built is fairly easy to determine. For each element
in the heap, we perform a single swap and a reheapDown. If
there are N elements in the heap, the reheapDown operation is
O( log N ) and hence the sorting operation is O( N log N ).
We must now consider the cost of building the heap.
Surprisingly it is an O(N) operation! If we examine the
buildHeap function we see that the reheapDown function is
called O(N) times but we have to realize that the reheapDown
operation does not always start at the top of the heap and
that it is not called at all on any of the leaf nodes (which
account for roughly half the nodes in the tree!)
We can determine the time complexity of the buildHeap
function by looking at the total number of times the
comparison and swap operations occur while building the
heap. We consider the worst case when the last level in the
heap is full.
For each node, starting with the lowest parent and working up
to the root, we will colour one path from that node down to a
leaf node, choosing the paths so that no two share an edge.
The number of edges on the path from each node to a leaf
node represents the maximum number of comparison and
swap operations that will occur while applying the reheapDown
operation to that node. By summing the total length of these
paths, we will determine the time complexity of the
buildHeap function.
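No edge is coloured more than once. Hence the work done
by buildHeap can be measured in terms of the number of
coloured edges. If H is the height of the tree, then there
are 2^(H - h) nodes at height h, each contributing a path of
h edges, so:
total number of coloured edges = sum for h = 1 to H of h * 2^(H - h) = 2^(H+1) - H - 2
A full tree of height H has N = 2^(H+1) - 1 nodes, so the
number of coloured edges is N - H - 1, which is O(N).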
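The linear bound on the number of coloured edges can be checked numerically. The sketch below assumes a full tree of height H, which has N = 2^(H+1) - 1 nodes, and adds up the height of every node; the result should equal N - H - 1. The function names colouredEdges and nodeCount are made up for this illustration.

```cpp
#include <cassert>

// Hypothetical helper: total length of the coloured paths in a full
// binary tree of height H, i.e. the sum of the heights of all nodes.
long colouredEdges( int H )
{
    assert( H >= 0 );
    long total = 0;
    long nodesAtLevel = 1;                        // the root level has one node
    for( int depth = 0; depth <= H; depth++ )
    {
        // every node at this depth has height H - depth
        total += nodesAtLevel * ( H - depth );
        nodesAtLevel *= 2;                        // each level doubles in size
    }
    return total;
}

// Number of nodes in a full binary tree of height H.
long nodeCount( int H )
{
    return ( 1L << ( H + 1 ) ) - 1;
}
```

For example, a full tree of height 2 has 7 nodes and 4 coloured edges: one edge for each of the two parents at height 1, and two edges for the root.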
Hence in the worst case the overall time complexity of the
heapsort algorithm is:
max[ O(N), O( N log N ) ] = O( N log N )