Sorting
• Three main types so far:
   – Bubble sort
   – Selection sort
   – Insertion sort

• Computing costs in terms of:
   – Number of comparisons required
   – Mildly concerned about number of swaps

• Best so far: insertion sort
   – On each iteration has the possibility of stopping early
   – Leads to improved best and average case over worst
     case
                   Improving Sorts
• Better sorting algorithms rely on divide and conquer
  (recursion)
   – In essence, the goal is to reduce the number of
     comparisons each element must make against the others

• New algorithms will consist of three major steps:
   – Find an efficient technique for splitting data
   – Sort the splits separately
   – Find an efficient technique for merging the data

• Process will be done recursively: Continually work with
  smaller and smaller groups

• We’ll see two examples
   – One does most of its work splitting
   – One does most of its work merging
                     Quicksort
General Quicksort Algorithm:
  – Select an element from the array to be the pivot
  – Rearrange the elements of the array into a left and
    right subarray
    • All values in the left subarray are <= pivot
    • All values in the right subarray are > pivot
  – Independently sort the two subarrays
  – No merging required, as left and right are
    independent problems
                              Quicksort
void quicksort(int* arrayOfInts, int first, int last)
{
  int pivot;
  if (first < last)                         // while I still have data
  {
    pivot = partition(arrayOfInts,          // find a pivot
                      first, last);
    quicksort(arrayOfInts, first, pivot-1); // quicksort the left
    quicksort(arrayOfInts, pivot+1, last);  // quicksort the right
  }                                         // leave pivot point alone!
}

          Deceptively simple?
     Does work splitting – let’s look at
     partition function!
                            Quicksort
int partition(int* arrayOfInts, int first, int last)
{
  int temp;
  int p = first;                                 // set pivot = first index
  for (int k = first+1; k <= last; k++) {        // for every other index
     if (arrayOfInts[k] <= arrayOfInts[first]) { // if data is smaller than pivot
      p = p + 1;                                 // update final pivot location
      temp = arrayOfInts[k];                     // rearrange data
      arrayOfInts[k] = arrayOfInts[p];
      arrayOfInts[p] = temp; } }
  temp = arrayOfInts[p];                         // put pivot in right spot
  arrayOfInts[p] = arrayOfInts[first];
  arrayOfInts[first] = temp;
  return p;
}
                  Partition Step Through




partition(cards, 0, 4)

P=0, K=1:  cards[1] < cards[0]? No
P=0, K=2:  cards[2] < cards[0]? Yes
           P=1
           temp = cards[2]
           cards[2] = cards[1]
           cards[1] = temp
P=1, K=3:  cards[3] < cards[0]? Yes
           P=2
           temp = cards[3]
           cards[3] = cards[2]
           cards[2] = temp
P=2, K=4:  cards[4] < cards[0]? No

Finally: temp = cards[2], cards[2] = cards[first],
cards[first] = temp, return p = 2
After this partition call, repeat quicksort on subarrays (0-1), (3-4)
          Partition Algorithm
• Book (pg 495) has another version of
  partition which is probably a little faster in
  practice (not asymptotically)!
         Complexity of Quicksort
• Note that for any recursive call, I essentially have
  to look at every entry in each sub-array.
• Let’s assume for now that we always choose the
  perfect middle pivot
   –   Our 1 problem of size n turns into 2 subproblems of size n/2
   –   Each of those turns into 2 smaller subproblems
   –   Each of those turns into 2 smaller subproblems…
   –   Once we get down to one element, we:
        • Can’t decompose any further
        • We are sorted!
   – Let’s draw out a tree of the problems being solved
Complexity of Quicksort
Recursion Tree of Calls
Complexity of Binary Search
Actually, this is almost exactly the same
process as binary search, except we are
doing more work in each recursive call

 What is the tree for binary search?
     Un-idealistic Quicksort
• Now, what if we weren’t able to split in half
  each time?

  – What would the arrays look like after a bad
    pivot call?
  – What type of data leads us to this setup?
  – What would the recursion tree look like?
Complexity of Quicksort
Recursion Tree of Calls
     Complexity of Quicksort
• Worst case is O(n^2)
• On average is O(n log2 n), and it achieves the
  average case most of the time.
              Improving Quicksort
• Worst cases are already-sorted and reverse-sorted input!
• For our other sorts that had different best and worst cases,
  how did already sorted and reverse sorted data fare?

• How might we improve quicksort?

• Could sweep once through first to check for sorted data
• Could improve the partitioning mechanism
    – How might you come up with a simple way of making it likely
      that you will halve the arrays?
    – If data is sorted?
    – If reverse sorted?
    – If data is not sorted?

    Choose the median of first, middle, last elements.
     Recursion Time Efficiency:
       Recurrence Relations
• Want to build a function f(n) which
  describes the cost of performing the
  recursion
• Needs to support three components:
  – Amount of work needed for current iteration
     • Cost of basic function requirements
  – Number of subproblems that have to be
    solved
  – Size (in terms of input) of subproblems to be
    solved
     • How much smaller is each of the subproblems
  Relating recurrence relations
             to trees
– Number of subproblems that have to be
  solved
  • Number of branches on each level


– Amount of work needed for current iteration
  • Cost of each node in the tree

– Size (in terms of input) of subproblems to be
  solved
  • How many levels appear in the tree
      Recursive Binary Search
• Recursive Binary Search Requirements:
  – Amount of work needed for current iteration
    • 1 Comparison (inputValue versus middle index)
    • Essentially constant amount of work, not
      dependent on size of array being searched
  – Number of subproblems that have to be
    solved
    • 1 Subproblem (the left OR right side of the array)
  – Size (in terms of input) of subproblems to be
    solved
    • Subproblem is half the size of the original problem
                   Quicksort
• Quicksort Requirements:
  – Amount of work needed for current iteration
    • Essentially, a comparison for every element in the
      current array (actually n-1 but close enough)
  – Number of sub-problems that have to be
    solved
    • 2 sub-problems (the left AND right sides of the
      array)
  – Size (in terms of input) of sub-problems to be
    solved
    • Sub-problem is half the size of the original problem
       Recurrence Relation
• General Recurrence Relation:

         T(n) = aT(n/b) + cn^k

         a = number of subproblems
         b = 1/size of subproblems
         f(n) = current iteration work =
               constant * n^k
             Binary Search
           Recurrence Relation
• Binary Search:
  T(n) = 1T(n/2) + n^0

 Now, is this anything like what we’ve seen before?
 Shouldn’t we be able to get log n out of this?

  We need a closed form version of this function.
  The way we’ll see to do this is through
  ‘expansion’ and ‘backwards substitution’.
Actually, this is only going to work for a (large)
  subset of the problems we are interested in!
             Binary Search
           Recurrence Relation
• First we need a base case:
  T(1) = 1 (one operation to handle searching one
  element array)
• And our relation:
  T(n) = 1T(n/2) + cn^0 = T(n/2) + c

Now, what is T(n/2)?
T(n/4) + c
Let’s substitute that back into our original equation
T(n) = (T(n/4) + c) + c
            Binary Search
          Recurrence Relation
T(n) = (T(n/4) + c) + c
  T(n/4) = T(n/8) + c
T(n) = ((T(n/8) + c) + c) + c
  T(n/8) = T(n/16) + c
T(n) = (((T(n/16) + c) + c) + c) + c
…
Find a pattern
T(n) = T(n/2^i) + ic
Assume our data sizes are size 2^k
T(2^k) = T(2^k/2^i) + ic
           Binary Search
         Recurrence Relation
T(2^k) = T(2^k/2^i) + ic
What i do we need to get to our base case
  T(1) = 1?
Let i = k
T(2^k) = T(2^k/2^k) + kc
T(2^k) = T(1) + kc
T(2^k) = 1 + kc
           Binary Search
         Recurrence Relation
Translate 2^k back to n
T(2^k) = 1 + kc
T(n) = 1 + c*log2 n
 Quicksort Recurrence Relation
Base case: T(1) = 0
Recurrence: T(n) = 2T(n/2) + n

Are these the right entries for the recurrence
  relation?
 Quicksort Recurrence Relation
T(n) = 2T(n/2) + n
  T(n/2) = 2T(n/4) + n/2
T(n) = 2(2T(n/4) + n/2) + n
  T(n/4) = 2T(n/8) + n/4
T(n) = 2(2(2T(n/8) + n/4) + n/2) + n

= 8T(n/8) + 3n
= 2^i T(n/2^i) + in
 Quicksort Recurrence Relation
T(n) = 2^i T(n/2^i) + in
T(2^k) = 2^i T(2^k/2^i) + i*2^k
Let i = k to get to base case of T(1) = 0
T(2^k) = 2^k T(2^k/2^k) + k*2^k
T(2^k) = 2^k T(1) + k*2^k
T(2^k) = 0 + k*2^k
T(2^k) = k*2^k
Translate 2^k back to n
T(n) = n*log2 n
      Complexity of Quicksort
• Requires stack space to implement
  recursion
• Worst case:
  – If pivot breaks into 1 element and n-1 element
    subarrays
  – O(n) stack space
• Average case
  – Pivot splits evenly
  – O(log n)
                     MergeSort
• General Mergesort Algorithm:
  – Recursively split subarrays in half
  – Merge sorted subarrays

  – Splitting is first in the recursive call, so it continues
    until we have one item subarrays
     • One item subarrays are by definition sorted

  – Merge recombines subarrays so result is sorted
     • 1+1 item subarrays => 2 item subarrays
     • 2+2 item subarrays => 4 item subarrays
     • Use fact that subarrays are sorted to simplify merge
       algorithm
       Another T(n) Example
T(n) = 4 T(n/2) + cn
T(1) = 1
        Another T(n) Example
T(n) = 4T(n/2) + cn
  T(n/2) = 4T(n/4) + cn/2
T(n) = 4(4T(n/4) + cn/2) + cn
  T(n/4) = 4T(n/8) + cn/4
T(n) = 4(4(4T(n/8) + cn/4) + cn/2) + cn

The pattern is:
T(n) = 4^i T(n/2^i) + (4cn + 2cn + cn)
T(n) = 4^i T(n/2^i) + (2^i - 1)cn
              Another T(n) Example
T(n) = 4^i T(n/2^i) + (2^i - 1)cn

Let n = 2^k
T(2^k) = 4^i T(2^k/2^i) + (2^i - 1)c*2^k
Let i = k to get to T(1)
T(2^k) = 4^k T(2^k/2^k) + (2^k - 1)c*2^k
T(2^k) = 4^k T(1) + (2^(2k) - 2^k)c
T(2^k) = 4^k + (2^(2k) - 2^k)c
              4^k = (2^2)^k = 2^(2k) = (2^k)^2
T(2^k) = (2^k)^2 + ((2^k)^2 - 2^k)c
Change 2^k back to n
T(n) = n^2 + (n^2 - n)*c
T(n) = (c+1)n^2 – cn                   O(n^2)
                   T(n) Verify
Let c = 1
T(1) = 1
T(2) = 4T(1) + 2 = 4+2 = 6
T(4) = 4T(2) + 4 = 4*6+4 = 28
T(8) = 4T(4) + 8 = 4*28 + 8 = 120

T(n) = 2n^2 - n
T(1) = 2*1-1 = 2-1 = 1
T(2) = 2*4-2 = 8-2 = 6
T(4) = 2*16 – 4 = 32-4 = 28
T(8) = 2*64 – 8 = 128-8 = 120
                     MergeSort
void mergesort(int* array, int* tempArray, int low, int high, int size)
{
  if (low < high)
  {
    int middle = (low + high) / 2;
    mergesort(array, tempArray, low, middle, size);
    mergesort(array, tempArray, middle+1, high, size);
    merge(array, tempArray, low, middle, high, size);
  }
}
                                MergeSort
void merge(int* array, int* tempArray, int low, int middle, int high, int size)
{
  int i, j, k;

  for (i = low; i <= high; i++) { tempArray[i] = array[i]; } // copy into temp array
  i = low; j = middle+1; k = low;

  while ((i <= middle) && (j <= high)) {          // merging starts here
     if (tempArray[i] <= tempArray[j])            // if lhs item is smaller,
       array[k++] = tempArray[i++];               // put it in the final array and
     else                                         // advance k and the lhs index;
       array[k++] = tempArray[j++];               // else put the rhs item in the final
  }                                               // array, advance k and the rhs index

  while (i <= middle)                             // one of the two will run out:
    array[k++] = tempArray[i++];                  // copy the rest of the lhs data
}                                                 // (only need to copy if it is the lhs;
                                                  // rhs data is already in the right place)
                                                    // rhs array already in right place
MergeSort Example



    Recursively Split
MergeSort Example



    Recursively Split
MergeSort Example



      Merge
        Merge Sort Example
• 2-card merges: not very interesting – think of them as a swap

• Larger merges (figure): the temp array holds the two sorted
  halves, with i at the start of the lhs, j at the start of the
  rhs, and k at the next open slot of the final array
   – Temp[i] < Temp[j]? Yes: copy Temp[i] into Array[k], advance i and k
   – Temp[i] < Temp[j]? No: copy Temp[j] into Array[k], advance j and k
   – Update j, k by 1 => hit the limit of the internal while loop,
     as j > high now; copy until i > middle
   – Note: if it were the rhs we were copying at this stage, it
     would really already be in place

        MergeSort Example
• 2-card swaps, then the final array after merging the above sets
           Time Complexity
• Merge Sort Tree
  – Cost per level? What is largest number of
    comparisons I might have to do in merging?
          Time Complexity
• Merge Sort Recurrence Relation:
  T(n) = 2T(n/2) + cn^1

Have we seen this before?
 Quicksort Recurrence Relation
Base case: T(1) = 0
Recurrence: T(n) = 2T(n/2) + cn^1

Are these the right entries for the recurrence
  relation?
Space Complexity of Mergesort
• Need an additional O(n) temporary array
• Depth of recursive calls:
  – Always O(log2 n)
                Tradeoffs
• When is it more useful to:
  – Just search
  – Quicksort or Mergesort and search


  – Assume Z searches
  Search on random data: Z*n
  Fast sort and binary search:
    n*log2 n + Z*log2 n
                  Tradeoffs
             Z * n <= n*log2 n + Z*log2 n
              Z(n - log2 n) <= n*log2 n
             Z <= (n*log2 n) / (n - log2 n)

         Z <= (n*log2 n) / n [Approximation]
            Z <= log2 n [Approximation]

 Whereas before, we had to do N searches to make up
    for the cost of sorting, now we only do log2 N
1,000,000 items = 19 searches, instead of 1,000,000
               How Fast?
• Without specific details of what we are sorting,
  O(n log2 n) is the fastest sort possible.
  – Only available operations: Compare, Swap

• Proof: Decision Tree – describes how sort
  operates
  – Every vertex represents a comparison, every
    branch a result
  – Moving down tree – Tracing a possible run
    through the algorithm
                                   How Fast?
K1 <= K2?
  Yes [so far 1,2]:  K2 <= K3?
    Yes: stop => [1,2,3]
    No:  K1 <= K3?
      Yes: stop => [1,3,2]
      No:  stop => [3,1,2]
  No  [so far 2,1]:  K1 <= K3?
    Yes: stop => [2,1,3]
    No:  K2 <= K3?
      Yes: stop => [2,3,1]
      No:  stop => [3,2,1]
         How fast for sorting?
• Number of leaves – number of possible
  outcomes
• In sorting, how many permutations of an
  array, any of which could be the ‘sorted’
  version?
  – n!
• Tree height?
  – log2(n!)
       log2(n!) == O(n*log2n)
• Derivation:
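A sketch of the standard argument (filling in a derivation the slide leaves blank):

  log2(n!) = log2(n) + log2(n-1) + … + log2(1)

  Upper bound: each of the n terms is at most log2 n, so
     log2(n!) <= n*log2 n

  Lower bound: the first n/2 terms are each at least log2(n/2), so
     log2(n!) >= (n/2)*log2(n/2) = (n/2)(log2 n - 1)

  Together these give log2(n!) == O(n*log2 n)
  (in fact Theta(n*log2 n))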
 Recurrence: Master Theorem
T(n) = aT(n/b) + f(n)   where f(n) = cn^k

1. a < b^k         T(n) ~ n^k
2. a = b^k         T(n) ~ n^k logb n
3. a > b^k         T(n) ~ n^(logb a)
      Recursive Binary Search
T(n) = aT(n/b) + cn^k
a = number of subproblems = 1
b = 1/size of subproblems = 1/(1/2) => 2
f(n) = current iteration work = cn^0, so k = 0

Compare a to b^k:        1 vs 2^0 = 1 vs 1
If they are equal, computational cost is:
   n^k logb n = 1 * log2 n       =>          log2 n
     Complexity of Quicksort
Recurrence Relation: [Best Case]
 2 subproblems
 ½ size (if good pivot)
 Partition is O(n) = cn^1

 a = 2    b = 2    k = 1
 2 == 2^1   Master Theorem: O(n*log2 n)
      Complexity of Quicksort
Recurrence Relation: [Worst Case]
  – Partition separates into (n-1) and (1)
  – Can’t use master theorem:
     b (subproblem size ratio) changes:
          (n-1)/n, (n-2)/(n-1), (n-3)/(n-2), …
   Our Start of Class Example
T(n) = 4T(n/2) + cn^1

a = 4           b = 2          k = 1

4 vs 2^1        4 > 2^1

M.T. says: T(n) ~ n^(logb a)
T(n) ~ O(n^(log2 4)) = O(n^2)
 Sorts Using Additional Information
• We can break the O(n log2n) barrier if we
  can do more than element comparisons
  and swaps
  – Data specific sorting instead of generalized
                  Radix Sort
Also called bin sort:
  Repeatedly shuffle data into small bins
  Collect data from bins into new deck
  Repeat until sorted

  Appropriate method of shuffling and collecting?
  For integers, key is to shuffle data into bins on a
  per digit basis, starting with the rightmost (ones
  digit)
  Collect in order, from bin 0 to bin 9, and left to
  right within a bin
        Radix Sort: Ones Digit
   Data: 459 254 472 534 649 239 432 654 477
Bin 0
Bin 1
Bin 2   472 432
Bin 3
Bin 4   254 534 654
Bin 5
Bin 6
Bin 7   477
Bin 8
Bin 9   459 649 239
     Radix Sort: Tens Digit
   Data: 472 432 254 534 654 477 459 649 239
Bin 0
Bin 1
Bin 2
Bin 3 432 534 239
Bin 4 649
Bin 5 254 654 459
Bin 6
Bin 7 472 477
Bin 8
Bin 9
   Radix Sort: Hundreds Digit
   Data: 432 534 239 649 254 654 459 472 477
Bin 0
Bin 1
Bin 2 239 254
Bin 3
Bin 4 432 459 472 477
Bin 5 534
Bin 6 649 654
Bin 7
Bin 8
Bin 9
Final Sorted Data: 239 254 432 459 472 477 534
  649 654
            Radix Sort Algorithm
Begin with current digit as the ones digit
While there is still a digit on which to classify
{
  For each number in the master list
     Add that number to the appropriate sublist
     keyed on the current digit

  For each sublist from 0 to 9
     For each number in the sublist
        Remove the number from the sublist and
        append it to a new master list

  Advance the current digit one place to the left.
}
                 Radix Sort Costs
• For each digit in the numbers, you have to distribute all n
  values in array into k bins, then collect from the k bins
  back into an array, usually n >> k
• #operations ~ max digits * (n + k) ~ (max digits * n)
   – For distribute, have to be able to access an individual
     component of value
       • Integers: %
       • Strings: charAt(i)
• space ~ (one approach) initial array, buckets for each
  possible component value (10 for integers), each bucket
  sized to possibly hold all values in array
   – So all arrays sized right OR
   – Linked lists with fast adds at back
              Stable Sorts
Stable sort:
  A sort that preserves the input ordering of
  elements that are equal

Why important?
 Maintenance of pre-specified ordering /
 properties
 An example: Preservation of a previous
 sort on another key
               Stable Sorts
• Full deck card sorting
  – Sort cards by type
  – Clubs < Diamonds < Hearts < Spades
  – Sort next on value
  – 2 < 3 < 4 < 5 < …
    Gives all 2s together, all 3s together, …
    with 2C followed by 2D followed by 2H
    followed by 2S
  – Stable value sort preserves the type ordering
                    Stable Sorts
• Which sorts have we seen that are stable?
  – Insertion Sort
     • Picking up in order from “extras” pile
     • Only move past if you are greater than item in front of you
  – Mergesort
     • No swapping
     • Fill in result array from temp array left to right
         – Merging – Two equal things – one on the left stays on the left
• Unstable:
  – Selection Sort                       - Why?
  – Quick Sort                           - Why?
        Selection Sort – Unstable
void selectionSort(int* a, int size)
{
    for (int k = 0; k < size-1; k++)
    {
           int index = minimumIndex(a, k, size);
           swap(a[k], a[index]);
    }
}

int minimumIndex(int* a, int first, int last)
{
    int minIndex = first;
    for (int j = first + 1; j < last; j++)
    { if (a[j] < a[minIndex]) minIndex = j; }
    return minIndex;
}

Unstable because swapping may rearrange the order of
equal items (the sort doesn’t care what it picks up
and moves)
         Selection Sort - Unstable




Do Swap to Put 2 of Spades in Place for Value Sort – Ruins 3’s Type Sort
                            Quicksort
int partition(int* arrayOfInts, int first, int last)
{
  int temp;
  int p = first;                                 // set pivot = first index
  for (int k = first+1; k <= last; k++) {        // for every other index
     if (arrayOfInts[k] <= arrayOfInts[first]) { // if data is smaller than pivot
      p = p + 1;                                 // update final pivot location
      temp = arrayOfInts[k];                     // rearrange data
      arrayOfInts[k] = arrayOfInts[p];
      arrayOfInts[p] = temp; } }
  temp = arrayOfInts[p];                         // put pivot in right spot
  arrayOfInts[p] = arrayOfInts[first];
  arrayOfInts[first] = temp;
  return p;                           // possibility to move items out of order
}

           Stability is not guaranteed with
           5C 2C 3C 6C 9C 6D 3D
           (swapping 6C, 3D in partition can affect 6D, 6C ordering)
