# Sorting


```
Sorting
• Three main types so far:
– Bubble sort
– Selection sort
– Insertion sort

• Computing costs in terms of:
– Number of comparisons required
– Mildly concerned about number of swaps

• Best so far: insertion sort
– On each iteration has the possibility of stopping early
– Leads to improved best and average case over worst
case
Improving Sorts
• Better sorting algorithms rely on divide and conquer
(recursion)
– In essence, we can think of it as trying to reduce the number of
comparisons each number must make against the other numbers

• New algorithms will consist of three major steps:
– Find an efficient technique for splitting data
– Sort the splits separately
– Find an efficient technique for merging the data

• Process will be done recursively: Continually work with
smaller and smaller groups

• We’ll see two examples
– One does most of its work splitting
– One does most of its work merging
Quicksort
General Quicksort Algorithm:
– Select an element from the array to be the pivot
– Rearrange the elements of the array into a left and
right subarray
• All values in the left subarray are < pivot
• All values in the right subarray are > pivot
– Independently sort the two subarrays
– No merging required, as left and right are
independent problems
Quicksort
void quicksort(int* arrayOfInts, int first, int last)
{
    int pivot;
    if (first < last)                                  // while I still have data
    {
        pivot = partition(arrayOfInts, first, last);   // find a pivot
        quicksort(arrayOfInts, first, pivot - 1);      // quicksort the left
        quicksort(arrayOfInts, pivot + 1, last);       // quicksort the right
    }                                                  // leave pivot point alone!
}

Deceptively simple?
The work happens in the splitting – let’s look at
the partition function!
Quicksort
int partition(int* arrayOfInts, int first, int last)
{
    int temp;
    int p = first;                                   // set pivot = first index
    for (int k = first+1; k <= last; k++) {          // for every other index
        if (arrayOfInts[k] <= arrayOfInts[first]) {  // if data is smaller than pivot
            p = p + 1;                               // update final pivot location
            temp = arrayOfInts[k];                   // rearrange data
            arrayOfInts[k] = arrayOfInts[p];
            arrayOfInts[p] = temp;
        }
    }
    temp = arrayOfInts[p];                           // put pivot in right spot
    arrayOfInts[p] = arrayOfInts[first];
    arrayOfInts[first] = temp;
    return p;
}
Partition Step Through

partition(cards, 0, 4)

P=0, K=1:  cards[1] < cards[0]? No
P=0, K=2:  cards[2] < cards[0]? Yes
           P=1
           temp = cards[2]
           cards[2] = cards[1]
           cards[1] = temp
P=1, K=3:  cards[3] < cards[0]? Yes
           P=2
           temp = cards[3]
           cards[3] = cards[2]
           cards[2] = temp
P=2, K=4:  cards[4] < cards[0]? No

Finally: temp = cards[2], cards[2] = cards[first],
cards[first] = temp, return p = 2

After this partition call, repeat quicksort
on subarrays (0-1) and (3-4)
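The same walk-through can be checked in runnable form. A minimal sketch, using the partition logic above on sample values (5 2 8 1 9 are chosen here for illustration; they are not the card values in the figure):

#include <cassert>

// Same logic as the slide's partition(): pivot = first element.
int partitionDemo(int* a, int first, int last)
{
    int p = first;
    for (int k = first + 1; k <= last; k++) {
        if (a[k] <= a[first]) {                  // smaller than pivot
            p = p + 1;
            int temp = a[k]; a[k] = a[p]; a[p] = temp;
        }
    }
    int temp = a[p]; a[p] = a[first]; a[first] = temp;  // place pivot
    return p;
}

int main()
{
    int a[5] = {5, 2, 8, 1, 9};
    int p = partitionDemo(a, 0, 4);
    assert(p == 2);                    // pivot 5 lands at index 2
    assert(a[0] == 1 && a[1] == 2);    // smaller values on the left
    assert(a[3] == 8 && a[4] == 9);    // larger values on the right
    return 0;
}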
Partition Algorithm
• Book (pg 495) has another version of
partition which is probably a little faster in
practice (not asymptotically)!
Complexity of Quicksort
• Note that for any recursive call, I essentially have
to look at every entry in each sub-array.
• Let’s assume for now that we always choose the
perfect middle pivot
–   Our 1 problem of size n turns into 2 subproblems of size n/2
–   Each of those turns into 2 smaller subproblems
–   Each of those turns into 2 smaller subproblems…
–   Once we get down to one element, we:
• Can’t decompose any further
• We are sorted!
– Let’s draw out a tree of the problems being solved
Complexity of Quicksort
Recursion Tree of Calls
Complexity of Binary Search
Actually, this is almost exactly the same
process as binary search, except we are
doing more work in each recursive call

What is the tree for binary search?
Un-idealistic Quicksort
• Now, what if we weren’t able to split in half
each time?

– What would the arrays look like after a bad
pivot call?
– What type of data leads us to this setup?
– What would the recursion tree look like?
Complexity of Quicksort
Recursion Tree of Calls
Complexity of Quicksort
• Worst case is O(n^2)
• On average it is O(n log2n), and it is the
average case a lot of the time.
Improving Quicksort
• Worst case is already sorted and reverse sorted!
• For our other sort, that had different cases, how did already sorted
and reverse sorted fare?

• How might we improve quicksort?

• Could sweep once through
• Could improve partitioning mechanism
– How might you come up with a simple way of making it likely
that you will halve the arrays?
– If data is sorted?
– If reverse sorted?
– If data is not sorted?

Choose the median of first, middle, last elements.
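One hedged sketch of that idea (the helper name medianOfThree is mine, not from the slides): move the median of the first, middle, and last elements into the first slot, so the first-element partition above sees a good pivot even on sorted or reverse-sorted data.

#include <algorithm>   // std::swap
#include <cassert>

// Swap the median of a[first], a[mid], a[last] into a[first].
void medianOfThree(int* a, int first, int last)
{
    int mid = (first + last) / 2;
    if (a[mid]  < a[first]) std::swap(a[mid],  a[first]);
    if (a[last] < a[first]) std::swap(a[last], a[first]);
    if (a[last] < a[mid])   std::swap(a[last], a[mid]);
    std::swap(a[first], a[mid]);   // median is now at mid; make it the pivot
}

int main()
{
    int sorted[5]   = {1, 2, 3, 4, 5};
    int reversed[5] = {5, 4, 3, 2, 1};
    medianOfThree(sorted, 0, 4);
    medianOfThree(reversed, 0, 4);
    assert(sorted[0] == 3);     // median chosen, not the smallest
    assert(reversed[0] == 3);   // median chosen, not the largest
    return 0;
}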
Recursion Time Efficiency:
Recurrence Relations
• Want to build a function f(n) which
describes the cost of performing the
recursion
• Needs to support three components:
– Amount of work needed for current iteration
• Cost of basic function requirements
– Number of subproblems that have to be
solved
– Size (in terms of input) of subproblems to be
solved
• How much smaller is each of the subproblems
Relating recurrence relations
to trees
– Number of subproblems that have to be
solved
• Number of branches on each level

– Amount of work needed for current iteration
• Cost of each node in the tree

– Size (in terms of input) of subproblems to be
solved
• How many levels appear in the tree
Recursive Binary Search
• Recursive Binary Search Requirements:
– Amount of work needed for current iteration
• 1 Comparison (inputValue versus middle index)
• Essentially constant amount of work, not
dependent on size of array being searched
– Number of subproblems that have to be
solved
• 1 Subproblem (the left OR right side of the array)
– Size (in terms of input) of subproblems to be
solved
• Subproblem is half the size of the original problem
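The requirements above can be read straight off an implementation. A minimal sketch of recursive binary search, assuming a sorted array and returning -1 on a miss (roughly one comparison and one half-sized subproblem per call):

#include <cassert>

int binarySearch(const int* a, int first, int last, int target)
{
    if (first > last) return -1;              // empty range: not found
    int middle = (first + last) / 2;
    if (a[middle] == target) return middle;
    if (target < a[middle])
        return binarySearch(a, first, middle - 1, target);   // left half
    return binarySearch(a, middle + 1, last, target);        // right half
}

int main()
{
    int a[5] = {1, 3, 5, 7, 9};
    assert(binarySearch(a, 0, 4, 7) == 3);
    assert(binarySearch(a, 0, 4, 4) == -1);
    return 0;
}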
Quicksort
• Quicksort Requirements:
– Amount of work needed for current iteration
• Essentially, a comparison for every element in the
current array (actually n-1 but close enough)
– Number of sub-problems that have to be
solved
• 2 sub-problems (the left AND right sides of the
array)
– Size (in terms of input) of sub-problems to be
solved
• Sub-problem is half the size of the original problem
Recurrence Relation
• General Recurrence Relation:

T(n) = aT(n/b) + c*n^k

a = number of subproblems
b = 1/(size of subproblems)
f(n) = current iteration work = constant * n^k
Binary Search
Recurrence Relation
• Binary Search:
T(n) = 1T(n/2) + c*n^0

Now, is this anything like we’ve seen before?
Shouldn’t we be able to get log n out of this?

We need a closed form version of this function.
The way we’ll see to do this is through
‘expansion’ and ‘backwards substitution’.
Actually, this is only going to work for a (large)
subset of the problems we are interested in!
Binary Search
Recurrence Relation
• First we need a base case:
T(1) = 1 (one operation to handle searching one
element array)
• And our relation:
T(n) = 1T(n/2) + c*n^0 = T(n/2) + c

Now, what is T(n/2)?
T(n/4) + c
Let’s substitute that back into our original equation
T(n) = (T(n/4) + c) + c
Binary Search
Recurrence Relation
T(n) = (T(n/4) + c) + c
T(n/4) = T(n/8) + c
T(n) = ((T(n/8) + c) + c) + c
T(n/8) = T(n/16) + c
T(n) = (((T(n/16) + c) + c) + c) + c
…
Find a pattern:
T(n) = T(n/2^i) + ic
Assume our data sizes are size 2^k
T(2^k) = T(2^k / 2^i) + ic
Binary Search
Recurrence Relation
T(2^k) = T(2^k / 2^i) + ic
What i do we need to get to our base case T(1) = 1?
Let i = k
T(2^k) = T(2^k / 2^k) + kc
T(2^k) = T(1) + kc
T(2^k) = 1 + kc
Binary Search
Recurrence Relation
Translate 2^k back to n (so k = log2n)
T(2^k) = 1 + kc
T(n) = 1 + c*log2n
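The closed form can be checked numerically. A quick sketch (c = 3 is an arbitrary per-call cost chosen here): the recurrence T(1) = 1, T(n) = T(n/2) + c should equal 1 + c*log2(n) for powers of two.

#include <cassert>

const int c = 3;   // arbitrary constant work per call

int T(int n)       // the recurrence itself
{
    if (n == 1) return 1;
    return T(n / 2) + c;
}

int main()
{
    int logn = 0;
    for (int n = 1; n <= 1024; n *= 2) {
        assert(T(n) == 1 + c * logn);   // closed form matches
        logn = logn + 1;
    }
    return 0;
}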
Quicksort Recurrence Relation
Base case: T(1) = 0
Recurrence: T(n) = 2T(n/2) + n

Are these the right entries for the recurrence
relation?
Quicksort Recurrence Relation
T(n) = 2T(n/2) + n
T(n/2) = 2T(n/4) + n/2
T(n) = 2(2T(n/4) + n/2) + n
T(n/4) = 2T(n/8) + n/4
T(n) = 2(2(2T(n/8) + n/4) + n/2) + n

= 8T(n/8) + 3n
= 2^i T(n/2^i) + in
Quicksort Recurrence Relation
T(n) = 2^i T(n/2^i) + in
T(2^k) = 2^i T(2^k / 2^i) + i*2^k
Let i = k to get to base case of T(1) = 0
T(2^k) = 2^k T(2^k / 2^k) + k*2^k
T(2^k) = 2^k T(1) + k*2^k
T(2^k) = 0 + k*2^k
T(2^k) = k*2^k
Translate 2^k back to n
T(n) = n*log2n
Complexity of Quicksort
• Requires stack space to implement
recursion
• Worst case:
– If pivot breaks into 1 element and n-1 element
subarrays
– O(n) stack space
• Average case
– Pivot splits evenly
– O(log n)
MergeSort
• General Mergesort Algorithm:
– Recursively split subarrays in half
– Merge sorted subarrays

– Splitting is first in recursive call, so continues until
have one item subarrays
• One item subarrays are by definition sorted

– Merge recombines subarrays so result is sorted
• 1+1 item subarrays => 2 item subarrays
• 2+2 item subarrays => 4 item subarrays
• Use fact that subarrays are sorted to simplify merge
algorithm
Another T(n) Example
T(n) = 4 T(n/2) + cn
T(1) = 1
Another T(n) Example
T(n) = 4 T(n/2) + cn
T(n/2) = 4T(n/4) + cn/2
T(n) = 4(4T(n/4) + cn/2) + cn
T(n/4) = 4T(n/8) + cn/4
T(n) = 4(4(4T(n/8) + cn/4) + cn/2) + cn

The pattern is:
T(n) = 4^3 T(n/2^3) + (4cn + 2cn + cn)
T(n) = 4^i T(n/2^i) + (2^i - 1)cn
Another T(n) Example
T(n) = 4^i T(n/2^i) + (2^i - 1)cn

Let n = 2^k
T(2^k) = 4^i T(2^k / 2^i) + (2^i - 1)c*2^k
Let i = k to get to T(1)
T(2^k) = 4^k T(2^k / 2^k) + (2^k - 1)c*2^k
T(2^k) = 4^k T(1) + (2^(2k) - 2^k)c
T(2^k) = 4^k + (2^(2k) - 2^k)c
4^k = (2^2)^k = 2^(2k) = (2^k)^2
T(2^k) = (2^k)^2 + ((2^k)^2 - 2^k)c
Change 2^k back to n
T(n) = n^2 + (n^2 - n)*c
T(n) = (c+1)n^2 - cn                   O(n^2)
T(n) Verify
Let c = 1
T(1) = 1
T(2) = 4T(1) + 2 = 4+2 = 6
T(4) = 4T(2) + 4 = 4*6+4 = 28
T(8) = 4T(4) + 8 = 4*28 + 8 = 120

T(n) = 2n^2 - n
T(1) = 2*1-1 = 2-1 = 1
T(2) = 2*4-2 = 8-2 = 6
T(4) = 2*16 – 4 = 32-4 = 28
T(8) = 2*64 – 8 = 128-8 = 120
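The same verification works in code. A minimal sketch with c = 1, as in the table above: the recurrence T(1) = 1, T(n) = 4T(n/2) + n should equal 2n^2 - n for powers of two.

#include <cassert>

long long T4(long long n)            // the recurrence with c = 1
{
    if (n == 1) return 1;
    return 4 * T4(n / 2) + n;
}

int main()
{
    for (long long n = 1; n <= 1024; n *= 2)
        assert(T4(n) == 2 * n * n - n);   // matches the closed form
    return 0;
}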
MergeSort
void mergesort(int* array, int* tempArray, int low, int high, int size)
{
    if (low < high)
    {
        int middle = (low + high) / 2;
        mergesort(array, tempArray, low, middle, size);
        mergesort(array, tempArray, middle+1, high, size);
        merge(array, tempArray, low, middle, high, size);
    }
}
MergeSort
void merge(int* array, int* tempArray, int low, int middle, int high, int size)
{
    int i, j, k;

    for (i = low; i <= high; i++) { tempArray[i] = array[i]; }  // copy into temp array
    i = low; j = middle+1; k = low;

    while ((i <= middle) && (j <= high)) {   // merging starts here
        if (tempArray[i] <= tempArray[j])    // if lhs item is smaller,
            array[k++] = tempArray[i++];     // put it in final array, advance lhs index
        else                                 // else put rhs item in final array,
            array[k++] = tempArray[j++];     // advance rhs index
    }
    while (i <= middle)                      // one of the two will run out
        array[k++] = tempArray[i++];         // copy the rest of the lhs data;
}                                            // any remaining rhs data is already in place
MergeSort Example

[Figures: the array is recursively split in half until only
one-item subarrays remain, then the pieces are merged back
together in sorted order.]

Merging 2 one-item subarrays is not very interesting – think of
it as a conditional swap. For larger merges, compare Temp[i]
against Temp[j]; copy the smaller into Array[k] and advance that
index along with k. Once j > high, the internal while loop stops,
and we copy from the lhs until i > middle. Note: if it were the
rhs we were copying at this stage, it would already be in place.

[Figure: the final array after merging the subarray sets above.]
Time Complexity
• Merge Sort Tree
– Cost per level? What is largest number of
comparisons I might have to do in merging?
Time Complexity
• Merge Sort Recurrence Relation:
T(n) = 2T(n/2) + c*n^1

Have we seen this before?
Quicksort Recurrence Relation
Base case: T(1) = 0
Recurrence: T(n) = 2T(n/2) + c*n^1

Are these the right entries for the recurrence
relation?
Space Complexity of Mergesort
• Need an additional O(n) temporary array
• Depth of recursion:
– Always O(log2n)
• When is it more useful to:
– Just search
– Quicksort or Mergesort and search

– Assume Z searches
Linear search on random data: Z * n
Fast sort and binary search: n*log2n + Z*log2n
Z * n <= n*log2n + Z*log2n
Z(n - log2n) <= n*log2n
Z <= (n*log2n) / (n - log2n)

Z <= (n*log2n) / n   [approximation]
Z <= log2n           [approximation]

Whereas before, we had to do n searches to make up
for the cost of sorting, now we only need log2n
1,000,000 items = 19 searches, instead of 1,000,000
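That break-even point is easy to check. A quick sketch: for n = 1,000,000, the exact bound Z = n*log2(n) / (n - log2(n)) is just under 20, consistent with the roughly 19 searches quoted above.

#include <cassert>
#include <cmath>

int main()
{
    double n  = 1000000.0;
    double lg = std::log2(n);            // about 19.93
    double z  = n * lg / (n - lg);       // exact break-even from the slide
    assert(z > 19.0 && z < 20.0);        // approximately log2(n)
    return 0;
}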
How Fast?
• Without exploiting specific details of the data being
sorted, O(n log2n) is the fastest sort possible.
– Only available operations: Compare, Swap

• Proof: Decision Tree – describes how the sort
operates
– Every vertex represents a comparison, every
branch a result
– Moving down the tree traces a possible run
through the algorithm
How Fast?

[Figure: decision tree for sorting three keys K1, K2, K3. The
root compares K1 <= K2; every internal vertex is another
comparison (K2 <= K3 or K1 <= K3), and every leaf is one of the
6 possible orderings: [1,2,3], [1,3,2], [2,1,3], [2,3,1],
[3,1,2], [3,2,1].]
How fast for sorting?
• Number of leaves – number of possible
outcomes
• In sorting, how many permutations of an
array, any of which could be the ‘sorted’
version?
– n!
• Tree height?
– log2(n!)
log2(n!) == O(n*log2n)
• Derivation: log2(n!) = log2(n) + log2(n-1) + … + log2(1),
which is at most n*log2(n) and at least (n/2)*log2(n/2)
Recurrence: Master Theorem
T(n) = aT(n/b) + f(n)   where f(n) = c*n^k

1. a < b^k         T(n) ~ n^k
2. a = b^k         T(n) ~ n^k * log_b(n)
3. a > b^k         T(n) ~ n^(log_b(a))
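The three cases can be encoded directly. A sketch (the function name masterTheorem is mine): given a, b, k for T(n) = aT(n/b) + c*n^k, report which growth class applies.

#include <cassert>
#include <cmath>
#include <string>

std::string masterTheorem(double a, double b, double k)
{
    double bk = std::pow(b, k);
    if (a < bk)  return "n^k";            // case 1
    if (a == bk) return "n^k log_b n";    // case 2
    return "n^(log_b a)";                 // case 3
}

int main()
{
    assert(masterTheorem(1, 2, 0) == "n^k log_b n");   // binary search: log n
    assert(masterTheorem(2, 2, 1) == "n^k log_b n");   // mergesort: n log n
    assert(masterTheorem(4, 2, 1) == "n^(log_b a)");   // 4T(n/2)+cn: n^2
    return 0;
}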
Recursive Binary Search
T(n) = aT(n/b) + c*n^k
a = number of subproblems = 1
b = 1/(size of subproblems) = 1/(1/2) => 2
f(n) = current iteration work = c*n^0, so k = 0

Compare a to b^k:   1 vs 2^0 = 1 vs 1
If they are equal, computational cost is:
n^k * log_b(n) = 1 * log2n   =>   log2n
Complexity of Quicksort
Recurrence Relation: [Best Case]
2 subproblems
1/2 size (if good pivot)
Partition is O(n) = c*n^1

a = 2,  b = 2,  k = 1
2 == 2^1, so Master Theorem case 2: O(n*log2n)
Complexity of Quicksort
Recurrence Relation: [Worst Case]
– Partition separates into (n-1) and (1)
– Can’t use master theorem:
b (the subproblem size fraction) changes each call:
(n-1)/n, (n-2)/(n-1), (n-3)/(n-2), …
Our Start of Class Example
T(n) = 4T(n/2) + c*n^1

a = 4,  b = 2,  k = 1

4 vs 2^1:   4 > 2^1

M.T. says: T(n) ~ n^(log_b(a))
T(n) ~ O(n^(log_2(4))) = O(n^2)
• We can break the O(n log2n) barrier if we
can do more than element comparisons
and swaps
– Data specific sorting instead of generalized
Radix Sort
Also called bin sort:
Repeatedly shuffle data into small bins
Collect data from bins into a new deck
Repeat until sorted

Appropriate method of shuffling and collecting?
For integers, key is to shuffle data into bins on a
per digit basis, starting with the rightmost (ones
digit)
Collect in order, from bin 0 to bin 9, and left to
right within a bin
Data: 459 254 472 534 649 239 432 654 477
Bin 0
Bin 1
Bin 2   472 432
Bin 3
Bin 4   254 534 654
Bin 5
Bin 6
Bin 7   477
Bin 8
Bin 9   459 649 239
Data: 472 432 254 534 654 477 459 649 239
Bin 0
Bin 1
Bin 2
Bin 3 432 534 239
Bin 4 649
Bin 5 254 654 459
Bin 6
Bin 7 472 477
Bin 8
Bin 9
Data: 432 534 239 649 254 654 459 472 477
Bin 0
Bin 1
Bin 2 239 254
Bin 3
Bin 4 432 459 472 477
Bin 5 534
Bin 6 649 654
Bin 7
Bin 8
Bin 9
Final Sorted Data: 239 254 432 459 472 477 534
649 654
Begin with current digit as one’s digit
While there is still a digit on which to classify
{
    For each number in the master list,
        Add that number to the appropriate sublist keyed on
        the current digit

    For each sublist from 0 to 9
        For each number in the sublist
            Remove the number from the sublist and
            append to a new master list

    Advance the current digit one place to the left.
}
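The pseudocode above can be sketched as runnable code (assuming non-negative integers and base-10 digits):

#include <cassert>
#include <vector>

void radixSort(std::vector<int>& data)
{
    int maxVal = 0;
    for (int v : data) if (v > maxVal) maxVal = v;

    for (long long place = 1; maxVal / place > 0; place *= 10) {
        std::vector<std::vector<int>> bins(10);
        for (int v : data)                        // distribute on current digit
            bins[(v / place) % 10].push_back(v);
        data.clear();
        for (const std::vector<int>& bin : bins)  // collect bin 0..9 in order
            for (int v : bin) data.push_back(v);
    }
}

int main()
{
    std::vector<int> d = {459, 254, 472, 534, 649, 239, 432, 654, 477};
    radixSort(d);
    assert(d[0] == 239 && d[1] == 254 && d[2] == 432);  // as in the example
    assert(d[8] == 654);
    return 0;
}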
• For each digit in the numbers, you have to distribute all n
values in array into k bins, then collect from the k bins
back into an array, usually n >> k
• #operations ~ max digits * (n + k) ~ (max digits * n)
– For distribute, have to be able to access an individual
component of a value
• Integers: % (mod)
• Strings: charAt(i)
• Space ~ (one approach) the initial array, plus a bucket for
each possible component value (10 for integers), each bucket
sized to possibly hold all values in the array
– So all arrays are sized for the worst case, OR the bins grow
dynamically (e.g., linked lists)
Stable Sorts
Stable sort:
A sort that preserves the input ordering of
elements that are equal

Why important?
Maintenance of pre-specified ordering /
properties
An example: Preservation of a previous
sort on another key
Stable Sorts
• Full deck card sorting
– Sort cards by type
– Clubs < Diamonds < Hearts < Spades
– Sort next on value
– 2 < 3 < 4 < 5 < …
Gives all 2s together, all 3s together, …
with 2C followed by 2D followed by 2H
followed by 2S
– A stable value sort preserves the type ordering
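The card example in code: a sketch using std::stable_sort (the Card struct and sample hand are mine). Cards start grouped by type; a stable sort on value keeps each value group in type order.

#include <algorithm>
#include <cassert>
#include <vector>

struct Card { int value; char type; };   // type: 'C' < 'D' < 'H' < 'S'

int main()
{
    // already grouped by type, as after the type sort
    std::vector<Card> cards = {
        {3, 'C'}, {2, 'C'}, {2, 'D'}, {3, 'D'}, {2, 'H'}, {3, 'H'}
    };
    std::stable_sort(cards.begin(), cards.end(),
        [](const Card& a, const Card& b) { return a.value < b.value; });

    // all 2s first, in type order C, D, H; then the 3s likewise
    assert(cards[0].type == 'C' && cards[1].type == 'D' && cards[2].type == 'H');
    assert(cards[3].type == 'C' && cards[4].type == 'D' && cards[5].type == 'H');
    return 0;
}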
Stable Sorts
• Which sorts have we seen that are stable?
– Insertion Sort
• Picking up in order from “extras” pile
• Only move past if you are greater than item in front of you
– Mergesort
• No swapping
• Fill in result array from temp array left to right
– Merging – Two equal things – one on the left stays on the left
• Unstable:
– Selection Sort                       - Why?
– Quick Sort                           - Why?
Selection Sort – Unstable
void selectionSort(int* a, int size)
{
    for (int k = 0; k < size-1; k++)
    {
        int index = minimumIndex(a, k, size);
        swap(a[k], a[index]);
    }
}

int minimumIndex(int* a, int first, int last)
{
    int minIndex = first;
    for (int j = first + 1; j < last; j++)
    { if (a[j] < a[minIndex]) minIndex = j; }
    return minIndex;
}

Unstable because swapping may rearrange the order of
original items (the sort doesn’t care what it picks up
and moves)
Selection Sort - Unstable

Do Swap to Put 2 of Spades in Place for Value Sort – Ruins 3’s Type Sort
Quicksort
int partition(int* arrayOfInts, int first, int last)
{
    int temp;
    int p = first;                                   // set pivot = first index
    for (int k = first+1; k <= last; k++) {          // for every other index
        if (arrayOfInts[k] <= arrayOfInts[first]) {  // if data is smaller than pivot
            p = p + 1;                               // update final pivot location
            temp = arrayOfInts[k];                   // rearrange data
            arrayOfInts[k] = arrayOfInts[p];
            arrayOfInts[p] = temp;
        }
    }
    temp = arrayOfInts[p];                           // put pivot in right spot
    arrayOfInts[p] = arrayOfInts[first];
    arrayOfInts[first] = temp;
    return p;                                        // possibility to move items out of order
}

Stability is not guaranteed:
5C 2C 3C 6C 9C 6D 3D
(swapping 6C and 3D in partition can affect the 6D, 6C ordering)

```