					INTRODUCTION TO ALGORITHMS
   AND DATA STRUCTURES
           A.E. Csallner
   Department of Applied Informatics
         University of Szeged
              Hungary
                   Algorithms
  Algorithm:
    Finite sequence of finite steps
    Provides the solution to a given problem

  Properties:                         Communication:
    Finiteness                          Input
    Definiteness                        Output
    Executability


About algorithms    Algorithms and Data Structures I     2
         Structured programming
  Design strategies:

        Bottom-up: synthesize smaller algorithmic
         parts into bigger ones

        Top-down: formulate the problem and
         repeatedly break it up into smaller and
         smaller parts


About algorithms     Algorithms and Data Structures I   3
 Example: Shoe a horse

   shoe a horse (a horse has four hooves)
     → shoe a hoof
         → need a horseshoe
             → hammer a horseshoe
         → need to fasten the horseshoe to the hoof
             → drive a cog into a hoof
                 → need cogs
                     → hammer a cog

Structured programming          Algorithms and Data Structures I                             4
 Basic elements of structured programming
  Sequence: series of actions

  Selection: branching on a decision

  Iteration: conditional repetition


 All structured algorithms can be defined using
   only these three elements (E.W. Dijkstra 1960s)



Structured programming   Algorithms and Data Structures I   5
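The three structured elements can be seen together in a minimal Python sketch (my own example, not from the slides): summing the even numbers of a list.

```python
def sum_of_evens(numbers):
    total = 0                  # sequence: one action after another
    for n in numbers:          # iteration: conditional repetition
        if n % 2 == 0:         # selection: branching on a decision
            total += n
    return total
```

Every structured algorithm, per Dijkstra's result quoted above, can be assembled from exactly these three building blocks.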
   Algorithm description methods

 An algorithm description method defines an
   algorithm so that the description code should
         be unambiguous;

         programming language independent;

         still easy to implement;

         state-of-the-art


Algorithm description        Algorithms and Data Structures I   6
 Some possible types of classification:

  Age (when the description method was
   invented)

  Purpose (e.g. structural or object-oriented)

  Formulation (graphical or text code, etc.)

  ...


Algorithm description   Algorithms and Data Structures I   7
 Most popular and useful description methods


  Flow diagram
         old

         not definitely structured(!)

         graphical

         very intuitive and easy to use




Algorithm description    Algorithms and Data Structures I   8
              A possible notation of flow diagrams


  Circle: marks the entry and the exit points of the
   algorithm, labeled START and STOP (an algorithm
   can have more than one STOP point)

Algorithm description     Algorithms and Data Structures I          9
              A possible notation of flow diagrams


  Rectangle: any action execution can be given here


Algorithm description   Algorithms and Data Structures I   10
              A possible notation of flow diagrams


  Diamond: contains a yes/no question; control leaves
   the diamond on the yes branch or the no branch,
   depending on the answer

Algorithm description   Algorithms and Data Structures I        11
              A possible notation of flow diagrams


 An example: after START the diamond asks “Need more
   horseshoes?” (a selection). On yes, the sequence
   “Hammer a horseshoe”, “Shoe a hoof” is executed and
   control returns to the question (an iteration);
   on no, the algorithm reaches STOP.

Algorithm description                Algorithms and Data Structures I                  12
 Most popular and useful description methods


  Pseudocode
         old

         definitely structured

         text based

         very easy to implement




Algorithm description    Algorithms and Data Structures I   13
                Properties of a possible pseudocode


  Assignment instruction: ←

  Looping constructs as in Pascal:

         for-do instruction (counting loop)
          for variable ← initial value to/downto final value
            do body of the loop




Algorithm description     Algorithms and Data Structures I     14
                Properties of a possible pseudocode


         while-do instruction (pre-test loop)
          while stay-in test
           do body of the loop

         repeat-until instruction (post-test loop)
          repeat body of the loop
            until exit test




Algorithm description     Algorithms and Data Structures I   15
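The three loop forms of the pseudocode map directly onto Python; the function names below are illustrative, not from the slides (Python has no repeat-until, so the post-test loop is simulated with `while True` and a `break` on the exit test).

```python
def counting_loop(n):        # for-do: counting loop
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def pre_test_loop(n):        # while-do: the body may run zero times
    i, total = 1, 0
    while i <= n:
        total += i
        i += 1
    return total

def post_test_loop(n):       # repeat-until: the body runs at least once
    i, total = 1, 0
    while True:
        total += i
        i += 1
        if i > n:            # exit test
            break
    return total
```

Note the asymmetry: for n = 0 the pre-test loop returns 0 but the post-test loop still executes its body once.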
                Properties of a possible pseudocode


  Conditional constructs as in Pascal:

         if-then-else instruction (else clause is optional)
          if test
             then test passed clause
             else test failed clause


  Blocks are denoted by indentation


Algorithm description     Algorithms and Data Structures I     16
                Properties of a possible pseudocode


  Object identifiers are references

  Field of an object separator is a dot:
        object.field
        object.method
        object.method(formal parameter list)


  Empty reference is NIL


Algorithm description     Algorithms and Data Structures I   17
                Properties of a possible pseudocode


  Arrays are objects

  Parameters are passed by value




Algorithm description     Algorithms and Data Structures I   18
                Properties of a possible pseudocode


 An example:

 ShoeAHorse(Hooves)
 hoof ← 1
 while hoof ≤ Hooves.Count                         Iteration
    do horseshoe ← HammerAHorseshoe                Sequence
       Hooves[hoof] ← horseshoe
       hoof ← hoof + 1


Algorithm description     Algorithms and Data Structures I              19
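A direct Python transcription of the ShoeAHorse pseudocode above; `hammer_a_horseshoe` is a hypothetical stand-in for the slide's HammerAHorseshoe, and the 1-based pseudocode index is shifted for Python's 0-based lists.

```python
def hammer_a_horseshoe():
    return "horseshoe"          # hypothetical helper

def shoe_a_horse(hooves):
    hoof = 1
    while hoof <= len(hooves):            # iteration
        horseshoe = hammer_a_horseshoe()  # sequence
        hooves[hoof - 1] = horseshoe      # 1-based pseudocode -> 0-based list
        hoof += 1
    return hooves
```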
                  Type algorithms
 Algorithm classification on the I/O structure

  Sequence → Value

  Sequence → Sequence

  More sequences → Sequence

  Sequence → More sequences


Type algorithms      Algorithms and Data Structures I   20
 Sequence → Value

  sequence calculations (e.g. summation, product of a

       series, linking elements together, etc.),

  decision (e.g. checking whether a sequence contains

       any element with a given property),

  selection (e.g. determining the first element in a

       sequence with a given property provided we know

       that there exists at least one),

Type algorithms           Algorithms and Data Structures I   21
 Sequence → Value (continued)

  search (e.g. finding a given element),

  counting (e.g. counting the elements having a

       given property),

  minimum or maximum search (e.g. finding the

       least or the largest element).


Type algorithms       Algorithms and Data Structures I   22
 Sequence → Sequence

  selection (e.g. collect the elements with a given

       property of a sequence),

  copying (e.g. copy the elements of a sequence to

       create a second sequence),

  sorting (e.g. arrange elements into an increasing

       order).

Type algorithms        Algorithms and Data Structures I   23
 More sequences → Sequence

  union (e.g. set union of sequences),

  intersection (e.g. set intersection of sequences),

  difference (e.g. set difference of sequences),

  uniting sorted sequences (merging / combing

       two ordered sequences).

Type algorithms      Algorithms and Data Structures I   24
 Sequence → More sequences

  filtering (e.g. filtering out elements of a

       sequence having given properties).




Type algorithms      Algorithms and Data Structures I   25
                     Special algorithms

  Iterative algorithm
        Consists of two parts:

         Initialization (usually initializing data)

         Iteration (repeated part)




Special algorithms        Algorithms and Data Structures I   26
  Recursive algorithms
        Basic types:

         direct (self-reference)

         indirect (mutual references)



        Two alternative parts depending on the base criterion:

         Base case (if the problem is small enough)

         Recurrences (direct or indirect self-reference)



Special algorithms        Algorithms and Data Structures I       27
 An example of recursive algorithms:

 Towers of Hanoi


 Aim:
 Move n disks from a rod to another, using a third one


 Rules:
  One disk moved at a time
  No disk on top of a smaller one

Special algorithms   Algorithms and Data Structures I   28
                     Recursive solution of the problem


                        1st step: move n–1 disks


                              2nd step: move
                              1 disk
                                                         3rd step:
                                                         move n–1 disks




Special algorithms              Algorithms and Data Structures I          29
 Pseudocode of the recursive solution

 TowersOfHanoi(n,FirstRod,SecondRod,ThirdRod)
   1 if n > 0
   2    then TowersOfHanoi(n – 1,FirstRod,ThirdRod,SecondRod)
   3          write “Move a disk from ” FirstRod “ to ” SecondRod
   4          TowersOfHanoi(n – 1, ThirdRod,SecondRod,FirstRod)

                     [Figure: line 2 moves the top n−1 disks to the
                      spare rod, line 3 moves the largest disk, and
                      line 4 moves the n−1 disks on top of it]




Special algorithms      Algorithms and Data Structures I            30
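The recursive pseudocode above can be transcribed to Python almost line by line; here `moves` collects the written messages instead of printing them, so the result can be inspected.

```python
def towers_of_hanoi(n, first, second, third, moves):
    # Move n disks from `first` to `second`, using `third` as a spare.
    if n > 0:
        towers_of_hanoi(n - 1, first, third, second, moves)    # line 2
        moves.append(f"Move a disk from {first} to {second}")  # line 3
        towers_of_hanoi(n - 1, third, second, first, moves)    # line 4
    return moves
```

For n = 3 this produces 2³ − 1 = 7 moves, with the largest disk moved exactly once, in the middle.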
  Backtracking algorithms


        Backtracking algorithm:

         Sequence of systematic trials

         Builds a tree of decision branches

         Steps back (backtracking) in the tree if no branch at a
            point is effective




Special algorithms          Algorithms and Data Structures I        31
 An example of the backtracking algorithms



 Eight Queens Puzzle:


eight chess queens to be
placed on a chessboard so
that no two queens attack
each other

Special algorithms   Algorithms and Data Structures I   32
 Pseudocode of the iterative solution


 EightQueens
    1 column ← 1
    2 RowInColumn[column] ← 0
    3 repeat
    4          repeat inc(RowInColumn[column])
    5          until IsSafe(column, RowInColumn)
    6          if RowInColumn[column] > 8
    7             then column ← column – 1
    8             else if column < 8
    9                     then column ← column + 1
    10                         RowInColumn[column] ← 0
    11                    else draw chessboard
    12 until column = 0


Special algorithms    Algorithms and Data Structures I   33
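A Python rendering of the backtracking pseudocode above, counting the solutions instead of drawing boards. The slides leave IsSafe undefined; the `is_safe` below is my assumption, and it deliberately accepts rows beyond 8 so the inner repeat-until can terminate and let the outer loop backtrack.

```python
def is_safe(col, row_in_col):
    row = row_in_col[col]
    if row > 8:
        return True                      # out of rows: let the outer loop step back
    for prev in range(1, col):
        if row_in_col[prev] == row:      # same row as an earlier queen
            return False
        if abs(row_in_col[prev] - row) == col - prev:  # same diagonal
            return False
    return True

def eight_queens():
    row_in_col = [0] * 9                 # 1-based, like the pseudocode
    column = 1
    row_in_col[column] = 0
    solutions = 0
    while True:
        while True:                      # repeat-until: try the next row
            row_in_col[column] += 1
            if is_safe(column, row_in_col):
                break
        if row_in_col[column] > 8:       # no safe row left: step back
            column -= 1
        elif column < 8:                 # safe row found: go one column deeper
            column += 1
            row_in_col[column] = 0
        else:
            solutions += 1               # full placement ("draw chessboard")
        if column == 0:
            break
    return solutions
```

The systematic trials build exactly the decision tree described on the previous slide; stepping `column` back down is the backtracking move.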
        Complexity of algorithms
 Questions regarding an algorithm:
  Does it solve the problem?
  How fast does it solve the problem?
  How much storage place does it occupy to
   solve the problem?


                         Complexity issues
                          of the algorithm

Analysis of algorithms       Algorithms and Data Structures I   34
 Elementary storage or time: independent of the
    size of the input.

 Example 1
 If an algorithm needs 500 kilobytes to store some
    internal data, this can be considered as
    elementary.

 Example 2
 If an algorithm contains a loop whose body is
    executed 1000 times, it counts as an elementary
    algorithmic step.
Analysis of algorithms   Algorithms and Data Structures I   35
 Hence a block of instructions counts as a single
   elementary step if none of the particular
   instructions depends on the size of the input.

 A looping construct counts as a single elementary
   step if the number of iterations it executes does
   not depend on the size of the input and its
   body is an elementary step.

 ⇒ shoeing a horse can be considered an elementary
   step ⇔ it takes constant time (one step) to shoe
   a horse

Analysis of algorithms   Algorithms and Data Structures I   36
 The time complexity of an algorithm is a function
   depending on the size of the input.

 Notation: T(n) where n is the size of the input

 Function T can depend on more than one variable,
   e.g. T(n,m) if the input of the algorithm is an
   n⨯m matrix.




Analysis of algorithms   Algorithms and Data Structures I   37
 Example: Find the minimum of an array.

 Minimum(A)
  1 min ← A[1]
  2 i ← 1                                        1 step
  3 repeat
  4       i ← i + 1
  5       if A[i] < min                          1 step,
  6           then min ← A[i]                    n − 1 times
  7 until i = A.Length
  8 return min

Analysis of algorithms   Algorithms and Data Structures I             38
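A Python sketch of Minimum(A). The repeat-until of the pseudocode would index past the array for a one-element input, so this version tests the bound before comparing; like the slide, it assumes the array is non-empty.

```python
def minimum(a):
    mn = a[0]                 # pseudocode's A[1]: 1-based vs 0-based
    i = 0
    while True:
        i += 1
        if i >= len(a):       # corresponds to "until i = A.Length"
            break
        if a[i] < mn:
            mn = a[i]
    return mn
```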
 Hence T(n) = n (where n = A.Length)

 Does this change if line 8 (return min) is
   considered as an extra step? In other words,
   does n ≈ n + 1 hold asymptotically?
   (return min counts as a single elementary step)

 It does not change!
 Proof:
 n + 1 = (n − 1) + 2 ≈ (n − 1) + 1 = n


Analysis of algorithms   Algorithms and Data Structures I         39
 This so-called asymptotic behavior can be
   formulated rigorously in the following way:

 We say that f (x) = O(g(x)) (big O notation) if

                (∃C, x0 > 0) (∀x ≥ x0) 0 ≤ f (x) ≤ C∙g(x)


 This means that g is an asymptotic upper bound of f.




Analysis of algorithms      Algorithms and Data Structures I   40
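The definition can be checked numerically. The function f and the witnesses C and x0 below are my own example: f(x) = 3x + 10 is O(x) with C = 4 and x0 = 10, because 0 ≤ 3x + 10 ≤ 4x for every x ≥ 10.

```python
def f(x):
    return 3 * x + 10

def big_o_holds(C, x0, upto):
    # Verify 0 <= f(x) <= C*g(x) with g(x) = x, for all x0 <= x < upto.
    return all(0 <= f(x) <= C * x for x in range(x0, upto))
```

With x0 = 1 the same C fails (f(1) = 13 > 4), which is exactly why the definition only demands the bound from some threshold x0 onward.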
 [Figure: from x0 onward the graph of f(x) stays between
  0 and C∙g(x), even where it runs above g(x) itself.]
Analysis of algorithms   Algorithms and Data Structures I                     41
 The O notation denotes an upper bound.

 If g is also a lower bound of f then we say that

                         f (x) = θ (g(x)) if

      (∃c, C, x0 > 0) (∀x ≥ x0) 0 ≤ c∙g(x) ≤ f (x) ≤ C∙g(x)


 This means that f asymptotically equals g.


Analysis of algorithms   Algorithms and Data Structures I     42
 [Figure: beyond x0 = max(x0c, x0C) the graph of f(x) runs
  between c∙g(x) and C∙g(x), where x0c and x0C are the
  thresholds belonging to the lower and upper bounds.]

Analysis of algorithms   Algorithms and Data Structures I               43
 What does the asymptotic notation show us?

 We have seen:
   T(n) = θ (n) for the procedure Minimum(A)
 where n = A.Length

 However, due to the definition of the θ notation
   T(n) = θ(n),
   T(2n) = θ(n),
   T(3n) = θ(n), ...

 Does this mean Minimum does not run slower on more data?

Analysis of algorithms   Algorithms and Data Structures I   44
 What does the asymptotic notation show us?

 Asymptotic notation shows us the tendency:

  T(n) = θ(n): linear tendency
        n data   → a certain amount of time t
        2n data  → time ≈ 2t
        3n data  → time ≈ 3t

  T(n) = θ(n²): quadratic tendency
        2n data  → time ≈ 2²t = 4t
        3n data  → time ≈ 3²t = 9t




Analysis of algorithms   Algorithms and Data Structures I   45
   Analyzing recursive algorithms

    Recursive algorithm – recursive function T
 Example: Towers of Hanoi

 TowersOfHanoi(n,FirstRod,SecondRod,ThirdRod)
    1 if n > 0
    2    then TowersOfHanoi(n – 1,FirstRod,ThirdRod,SecondRod)
    3          write “Move a disk from ” FirstRod “ to ” SecondRod
    4          TowersOfHanoi(n – 1, ThirdRod,SecondRod,FirstRod)


    T(n) =  T(n−1)  +  1  +  T(n−1)  =  2T(n−1) + 1


Analysis of algorithms            Algorithms and Data Structures I                46
           T(n) = 2T(n−1) + 1 is a recursive function

 In general it is very difficult (sometimes
    insoluble) to determine the explicit form of an
    implicit (recursive) formula

 If the algorithm is recursive, the solution can be
     achieved using recursion trees.



Analysis of algorithms   Algorithms and Data Structures I                47
                     Recursion tree of TowersOfHanoi:

   Level 0:  one call of size n              →  1 step
   Level 1:  two calls of size n−1           →  2 steps
   Level 2:  four calls of size n−2          →  4 steps
   ...
   Level n−1:  2ⁿ⁻¹ calls of size 1          →  2ⁿ⁻¹ steps

   Total: 1 + 2 + 4 + ... + 2ⁿ⁻¹ = 2ⁿ − 1 steps
Analysis of algorithms         Algorithms and Data Structures I                48
 Time complexity:
    T(n) = 2ⁿ − 1 = θ(2ⁿ) − exponential time (very slow)

 Example: n = 64 (from the original legend),
   assuming one disk move per second:
    T(n) = 2⁶⁴ − 1 ≈ 1.8∙10¹⁹ seconds
         ≈ 3∙10¹⁷ minutes
         ≈ 5.1∙10¹⁵ hours
         ≈ 2.1∙10¹⁴ days
         ≈ 5.8∙10¹¹ years > half a trillion years
Analysis of algorithms   Algorithms and Data Structures I             49
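The slide's arithmetic can be checked directly, converting 2⁶⁴ − 1 moves at one move per second step by step to years:

```python
moves = 2 ** 64 - 1          # ≈ 1.8e19 seconds at one move per second
minutes = moves / 60         # ≈ 3.1e17
hours = minutes / 60         # ≈ 5.1e15
days = hours / 24            # ≈ 2.1e14
years = days / 365.25        # ≈ 5.8e11: over half a trillion years
```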
                         Different cases
 Problem (example): search a given element in a
   sequence (array).

              LinearSearch(A,w)
               1 i ← 0
               2 repeat i ← i + 1
               3 until A[i] = w  or  i = A.Length
               4 if A[i] = w then return i
               5              else return NIL
Analysis of algorithms      Algorithms and Data Structures I   50
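A Python transcription of LinearSearch(A, w); it returns a 1-based index as in the pseudocode, or None (the pseudocode's NIL) when w is absent. Like the original, it assumes a non-empty array.

```python
def linear_search(a, w):
    i = 0
    while True:
        i += 1
        if a[i - 1] == w or i == len(a):   # until A[i] = w or i = A.Length
            break
    if a[i - 1] == w:
        return i
    return None
```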
 Array:                  8   1   3       9       5        6         2

 Best case
   Element wanted: 8
   Time complexity: T(n) = 1 = θ (1)


 Worst case
  Element wanted: 2
  Time complexity: T(n) = n = θ (n)


  Average case?

Analysis of algorithms           Algorithms and Data Structures I       51
 Array:                  8   1   3       9       5        6         2




 The mean value of the time complexities on all
   possible inputs:

 T(n) = ( 1 + 2 + 3 + 4 + ... + n ) / n =
      = n∙(n + 1) / (2n) = (n + 1) / 2 = θ(n)


   Average case: the same tendency as in the worst case

Analysis of algorithms           Algorithms and Data Structures I       52
                  Basic data structures
  To store a set of data of the same type in a linear
    structure, two basic solutions exist:
   Arrays: physical sequence in the memory
                                  18 29 22

   Linked lists: the particular elements are linked
    together using links (pointers or indices)

           head → [18 | •] → [29 | •] → [22 | ╳]
                   key link

Arrays and linked lists          Algorithms and Data Structures I        53
   Arrays vs. linked lists

   Time complexity of some operations on arrays
   and linked lists in the worst case

                Search  Insert  Delete  Minimum  Maximum  Successor  Predecessor

   Array         O(n)    O(n)    O(n)    O(n)     O(n)     O(n)        O(n)

   Linked list   O(n)    O(1)    O(1)    O(n)     O(n)     O(n)        O(n)




Arrays and linked lists               Algorithms and Data Structures I                             54
     Doubly linked lists: each cell also links to the
      previous cell

                head ⇄ [18] ⇄ [29] ⇄ [22]

     Dummy head lists: the list starts with a dummy cell
      that holds no key

         dummy head → [╳] → [18] → [29] → [22]


           Indirection (indirect reference): pointer.key
           Double indirection: pointer.link.key
                                                                         to be continued...

 Arrays and linked lists            Algorithms and Data Structures I                         55
    Array representation of linked lists

         dummy head → [╳] → [18] → [29] → [22]

                           1    2    3    4    5    6    7    8

                    key        22    ╳        18        29

                    link        0    5         7         2

                dummy head = 3              Problem: a lot of garbage

 Arrays and linked lists               Algorithms and Data Structures I                 56
  Garbage collection for array-represented lists

  The empty cells are linked to a separate garbage
    list using the link array:

                           1    2    3    4    5    6    7    8

                    key        22    ╳        18        29

                    link   8    0    5    0    7    1    2    4

                dummy head = 3              garbage = 6



Arrays and linked lists           Algorithms and Data Structures I            57
  To allocate place for a new key and use it:
   the first element of the garbage list is linked out
    from the garbage
   and linked into the proper list with a new key
    (33 here) if necessary.
                           1    2    3    4    5    6    7    8

                    key        22    ╳        18   33   29

                    link   8    0    5    0    6    7    2    4

                dummy head = 3      garbage = 1      new = 6


Arrays and linked lists           Algorithms and Data Structures I                     58
  Pseudocode for garbage management
  Allocate(link)
    1 if link.garbage = 0                      no free cell left
    2     then return 0
    3     else new ← link.garbage
    4            link.garbage ← link[link.garbage]
    5            return new

  Free(index,link)
    1 link[index] ← link.garbage
    2 link.garbage ← index

Arrays and linked lists   Algorithms and Data Structures I   59
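A Python sketch of the garbage (free-list) management above. As a representation choice of mine, cell 0 of the `link` list plays the role of the slides' link.garbage field.

```python
def allocate(link):
    # Returns the index of a free cell, or 0 if none is left.
    if link[0] == 0:
        return 0
    new = link[0]
    link[0] = link[new]       # unlink the first garbage cell
    return new

def free(index, link):
    link[index] = link[0]     # push the cell onto the garbage list
    link[0] = index
```

Both operations touch a constant number of cells, so allocation and deallocation run in O(1) time.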
  Dummy head linked lists (...continued)
   FindAndDelete for simple linked lists
  FindAndDelete(toFind,key,link)
     1 if key[link.head] = toFind                     extra case: the first
     2    then toDelete ← link.head                   element is to be deleted
     3           link.head ← link[link.head]
     4           Free(toDelete,link)
     5    else toDelete ← link[link.head]             an additional pointer is
     6           pointer ← link.head                  needed to step forward
     7           while toDelete ≠ 0 and key[toDelete] ≠ toFind
     8                  do pointer ← toDelete
     9                      toDelete ← link[toDelete]
     10          if toDelete ≠ 0
     11             then link[pointer] ← link[toDelete]
     12                    Free(toDelete,link)


Arrays and linked lists   Algorithms and Data Structures I              60
  Dummy head linked lists (...continued)
   FindAndDelete for dummy head linked lists

  FindAndDeleteDummy(toFind,key,link)
     1 pointer ← link.dummyhead
     2 while link[pointer] ≠ 0  and  key[link[pointer]] ≠ toFind
     3        do pointer ← link[pointer]
     4 if link[pointer] ≠ 0
     5    then toDelete ← link[pointer]
     6           link[pointer] ← link[toDelete]
     7           Free(toDelete,link)



Arrays and linked lists   Algorithms and Data Structures I         61
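The dummy-head version can be sketched in Python over the array representation. Here `key` and `link` are 1-based lists (index 0 unused), `dummy_head` is the index of the dummy cell, and `free_cell` is a hypothetical stand-in for the slides' Free.

```python
def find_and_delete_dummy(to_find, key, link, dummy_head, free_cell):
    pointer = dummy_head
    # Walk until the NEXT cell holds to_find (or the list ends).
    while link[pointer] != 0 and key[link[pointer]] != to_find:
        pointer = link[pointer]
    if link[pointer] != 0:
        to_delete = link[pointer]
        link[pointer] = link[to_delete]   # link the found cell out
        free_cell(to_delete, link)
        return True
    return False
```

Because the dummy cell guarantees every real cell has a predecessor, the "first element" special case of the plain version disappears.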
                    Stacks and queues
 Common properties:
  only two operations are defined:
         store a new key (called push and enqueue, resp.)
         extract a key (called pop and dequeue, resp.)
  all (both) operations work in constant time

 Different properties:
  stacks are LIFO structures
  queues are FIFO (or pipeline) structures

Stacks and queues       Algorithms and Data Structures I     62
 Two erroneous cases:

  an empty data structure is intended to be
   extracted from: underflow
  no more space but insertion attempted:
   overflow




Stacks and queues   Algorithms and Data Structures I   63
 Stack management using arrays

            Stack:   3   ← top                     push(8)
                     1                             push(1)
                     8                             push(3)
                                                   push(9) → stack overflow
                                                   pop → 3
                                                   pop → 1
                                                   pop → 8
                                                   pop → stack underflow


Stacks and queues   Algorithms and Data Structures I                    64
 Stack management using arrays

 Push(key,Stack)
   1 if Stack.top = Stack.Length                 stack overflow
   2     then return Overflow error
   3     else Stack.top ← Stack.top + 1
   4           Stack[Stack.top] ← key




Stacks and queues   Algorithms and Data Structures I   65
 Stack management using arrays



 Pop(Stack)
   1 if Stack.top = 0                            stack underflow
   2    then return Underflow error
   3    else Stack.top ← Stack.top − 1
   4          return Stack[Stack.top + 1]




Stacks and queues   Algorithms and Data Structures I   66
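The Push/Pop pseudocode above maps onto a small Python class; a fixed-size list plays the array, and the strings "overflow"/"underflow" stand in for the slides' error returns (my convention).

```python
class Stack:
    def __init__(self, length):
        self.cells = [None] * length
        self.top = 0                   # 0 means the stack is empty

    def push(self, key):
        if self.top == len(self.cells):
            return "overflow"          # the slides' Overflow error
        self.top += 1
        self.cells[self.top - 1] = key

    def pop(self):
        if self.top == 0:
            return "underflow"         # the slides' Underflow error
        self.top -= 1
        return self.cells[self.top]
```

Both operations move `top` by one, so push and pop run in constant time, as the slide requires.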
 Queue management using arrays

     Queue: stored in a circular array; beginning points
     just before the first element, end points at the last

                                 end ↓
                     8   3   1      6       4       7           2   9   5
                                                                  ↑ beginning

     Empty queue: beginning = Queue.Length and end = 0
Stacks and queues           Algorithms and Data Structures I                       67
 Queue management using arrays

 Enqueue(key,Queue)
   1 if Queue.beginning = Queue.end              queue overflow
   2    then return Overflow error
   3    else if Queue.end = Queue.Length
   4            then Queue.end ← 1
   5            else Queue.end ← Queue.end + 1
   6         Queue[Queue.end] ← key




Stacks and queues   Algorithms and Data Structures I   68
 Queue management using arrays

 Dequeue(Queue)
   1 if Queue.end = 0                            queue underflow
   2    then return Underflow error
   3    else if Queue.beginning = Queue.Length
   4            then Queue.beginning ← 1
   5            else inc(Queue.beginning)
   6         key ← Queue[Queue.beginning]
   7         if Queue.beginning = Queue.end
   8            then Queue.beginning ← Queue.Length
   9                 Queue.end ← 0
   10        return key

Stacks and queues   Algorithms and Data Structures I   69
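The Enqueue/Dequeue pseudocode above becomes the following circular-array class; as on the slides, `beginning` points just before the first element and the empty queue is marked by end = 0 together with beginning = length.

```python
class Queue:
    def __init__(self, length):
        self.cells = [None] * length
        self.beginning = length        # just before the first element
        self.end = 0                   # 0 marks the empty queue

    def enqueue(self, key):
        if self.beginning == self.end:
            return "overflow"          # the slides' Overflow error
        # Step `end` forward, wrapping from length back to 1.
        self.end = 1 if self.end == len(self.cells) else self.end + 1
        self.cells[self.end - 1] = key

    def dequeue(self):
        if self.end == 0:
            return "underflow"         # the slides' Underflow error
        self.beginning = (1 if self.beginning == len(self.cells)
                          else self.beginning + 1)
        key = self.cells[self.beginning - 1]
        if self.beginning == self.end: # queue became empty: reset markers
            self.beginning = len(self.cells)
            self.end = 0
        return key
```

The wraparound on both indices is what makes the array circular: after a dequeue, the freed slot at the front can be reused by a later enqueue.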
                      Binary search trees
  Linear data structures cannot provide better time
    complexity than θ(n) for some operations

  Idea: let us use another kind of structure

  Solution:
   rooted trees (especially binary trees)
   special order of keys (‘search trees’)


Binary search trees        Algorithms and Data Structures I   70
  A binary tree − notions:
    root: the topmost vertex
    vertex (node), edge
    parent − child: the two endpoints of an edge
    twins (siblings): children of the same parent
    leaf: a vertex without children
    levels: vertices at the same distance from the root
    depth (height): the length of the longest
     root-to-leaf path
Binary search trees          Algorithms and Data Structures I                    71
  A binary search tree: for all vertices, all keys in the
   left subtree are smaller and all keys in the right
   subtree are greater than the key of the vertex

                                         28

                            12                       30

                       7             21                            49

                           14                    26                     50
Binary search trees            Algorithms and Data Structures I                      72
  Implementation of binary search trees:

   every vertex of the tree stores
         the key and other data
         a link to the parent
         a link to the left child
         a link to the right child
Binary search trees            Algorithms and Data Structures I                            73
  Binary search tree operations:
   tree walk
      inorder: 1. left subtree, 2. root, 3. right subtree

   On the example tree this visits the keys in increasing
   order: 7, 12, 14, 21, 26, 28, 30, 49, 50

Binary search trees            Algorithms and Data Structures I                                     74
  InorderWalk(Tree)
    1 if Tree ≠ NIL
    2    then InorderWalk(Tree.Left)
    3          visit Tree, e.g. check it or list it
    4          InorderWalk(Tree.Right)

  The so-called preorder and postorder tree walks
    only differ by the order of lines 2-4:
         preorder: root → left → right
         postorder: left → right → root


Binary search trees      Algorithms and Data Structures I   75
  Binary search tree operations:
   tree search: go left on a smaller key, right on a
    greater one

   TreeSearch(14): 28 → 12 → 21 → 14
   TreeSearch(45): 28 → 30 → 49 → NIL (45 is not in the tree)

Binary search trees            Algorithms and Data Structures I                         76
  TreeSearch(toFind,Tree)
    1 while Tree ≠ NIL  and  Tree.key ≠ toFind
    2        do if toFind < Tree.key
    3                then Tree ← Tree.Left
    4                else Tree ← Tree.Right
    5 return Tree




Binary search trees   Algorithms and Data Structures I   77
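The linked representation and the two operations seen so far can be sketched in Python; the `Node` class below is my minimal version of the slide's vertex (the parent link is omitted, since neither search nor the inorder walk needs it).

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def tree_search(to_find, tree):
    # Iterative TreeSearch: follow left/right links by key order.
    while tree is not None and tree.key != to_find:
        tree = tree.left if to_find < tree.key else tree.right
    return tree                       # the vertex found, or None (NIL)

def inorder_walk(tree, out):
    # Left subtree, then root, then right subtree.
    if tree is not None:
        inorder_walk(tree.left, out)
        out.append(tree.key)
        inorder_walk(tree.right, out)
    return out
```

On the example tree of the slides the walk yields the keys in increasing order, and searching takes one comparison per level.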
  Binary search tree operations:
   insert
    TreeInsert(14)

    [figure: the insertion follows the search path
     28 → 12 → 21 and attaches 14 as the left child of 21]

    new vertices are always inserted as leaves

Binary search trees           Algorithms and Data Structures I                         78
  Binary search tree operations:
   tree minimum
   tree maximum

    [figure: TreeMinimum follows left children: 28 → 12 → 7;
     TreeMaximum follows right children: 28 → 30 → 49 → 50]

Binary search trees            Algorithms and Data Structures I             79
  TreeMinimum(Tree)
    1 while Tree.Left ≠ NIL
    2        do Tree ← Tree.Left
    3 return  Tree

  TreeMaximum(Tree)
    1 while Tree.Right ≠ NIL
    2        do Tree ← Tree.Right
    3 return  Tree


Binary search trees   Algorithms and Data Structures I   80
  Binary search tree operations:
   successor of an element: TreeSuccessor(12) returns 14,
    the tree minimum of 12's right subtree
   if the element has no right child, the parent-left child
    relation is used: TreeSuccessor(26) climbs up to the first
    parent-left child link and returns 28
Binary search trees         Algorithms and Data Structures I                         81
  TreeSuccessor(Element)
    1 if Element.Right ≠ NIL
    2 then return  TreeMinimum(Element.Right)
    3 else Above ← Element.Parent
    4         while Above ≠ NIL  and
                              Element = Above.Right
    5               do Element ← Above
    6                   Above ← Above.Parent
    7 return  Above

  Finding the predecessor is similar.

Binary search trees   Algorithms and Data Structures I   82
  Binary search tree operations:
   delete
     1. if the element has no children:
        TreeDelete(26)

     [figure: the leaf 26 is simply cut off from its parent 21]

Binary search trees            Algorithms and Data Structures I                       83
  Binary search tree operations:
   delete
     2. if the element has only one child:
        TreeDelete(30)

     [figure: 30's only child, 49, is linked into 30's place]

Binary search trees            Algorithms and Data Structures I                          84
   Binary search tree operations:
    delete
     3. if the element has two children:
        TreeDelete(12)

     12 is substituted for a close key, e.g. the successor, 14
     (the tree minimum of 12's right subtree)

     the successor, found in the right subtree, has at most one child

     [figure: 14 is linked out of its place and linked into 12's place]

 Binary search trees       Algorithms and Data Structures I                        85
The case if Element has no children:

TreeDelete(Element,Tree)
 1 if Element.Left = NIL  and  Element.Right = NIL
 2    then if Element.Parent = NIL
 3            then Tree ← NIL
 4            else if Element = (Element.Parent).Left
 5                    then (Element.Parent).Left ← NIL
 6                    else (Element.Parent).Right ← NIL
 7         Free(Element)
 8         return  Tree
 9- next page
 9- next page


Binary search trees   Algorithms and Data Structures I    86
The case if Element has only a right child:

   -8 previous page
   9 if Element.Left = NIL  and  Element.Right ≠ NIL
   10    then if Element.Parent = NIL
   11            then Tree ← Element.Right
   12                 (Element.Right).Parent ← NIL
   13            else (Element.Right).Parent ← Element.Parent
   14                 if Element = (Element.Parent).Left
   15                    then (Element.Parent).Left ← Element.Right
   16                    else (Element.Parent).Right ← Element.Right
   17         Free(Element)
   18         return  Tree
   19- next page


Binary search trees       Algorithms and Data Structures I             87
The case if Element has only a left child:
    Very similar to the previous case:

   -18 previous page
   19 if Element.Left ≠ NIL  and  Element.Right = NIL
   20    then if Element.Parent = NIL
   21            then Tree ← Element.Left
   22                 (Element.Left).Parent ← NIL
   23            else (Element.Left).Parent ← Element.Parent
   24                 if Element = (Element.Parent).Left
   25                    then (Element.Parent).Left ← Element.Left
   26                    else (Element.Parent).Right ← Element.Left
   27         Free(Element)
   28         return  Tree
   29- next page

Binary search trees       Algorithms and Data Structures I             88
The case if Element has two children:
   -28 previous page
   29 if Element.Left ≠ NIL  and  Element.Right ≠ NIL
   30 then Substitute ← TreeSuccessor(Element)
   31        if Substitute.Right ≠ NIL                     Substitute is linked out
   32           then (Substitute.Right).Parent ← Substitute.Parent   from its place
   33        if Substitute = (Substitute.Parent).Left
   34           then (Substitute.Parent).Left ← Substitute.Right
   35           else (Substitute.Parent).Right ← Substitute.Right
   36        Substitute.Parent ← Element.Parent
   37        if Element.Parent = NIL                    Substitute is linked into
   38           then Tree ← Substitute                  Element's place
   39           else if Element = (Element.Parent).Left
   40                   then (Element.Parent).Left ← Substitute
   41                   else (Element.Parent).Right ← Substitute
   42        Substitute.Left ← Element.Left
   43        (Substitute.Left).Parent ← Substitute
   44        Substitute.Right ← Element.Right
   45        (Substitute.Right).Parent ← Substitute
   46        Free(Element)
   47        return  Tree



Binary search trees             Algorithms and Data Structures I                      89
 Time complexity of
   binary search tree operations

  T(n) = O(d) for all operations (except for the
   walk), where d denotes the depth of the tree
  The expected depth of a randomly built binary
   search tree is d = O(log n)
  Hence the time complexity of the search tree
   operations in the average case is
                   T(n) = O(log n)



Binary search trees   Algorithms and Data Structures I   90
                    Binary search
 If insert and delete are used rarely then it is more
     convenient and faster to use an ordered array
     instead of a binary search tree.
 Faster: the following operations have T(n) = O(1)
     constant time complexity:
         minimum, maximum,
         successor, predecessor.

  Search has the same T(n) = O(log n) time
  complexity as on binary search trees:

Binary search            Algorithms and Data Structures I   91
  Search has the same T(n) = O(log n) time
  complexity as on binary search trees:

 Let us search key 29 in the ordered array below:


                central element 12  <  29
                 2   3   7  12  29  31  45
                        ⇒ search in the right half


Binary search            Algorithms and Data Structures I   92
  Search has the same T(n) = O(log n) time
  complexity as on binary search trees:

 Let us search key 29 in the ordered array below:


                29  <  central element 31
                 2   3   7  12  29  31  45
                        ⇒ search in the left part


Binary search           Algorithms and Data Structures I   93
  Search has the same T(n) = O(log n) time
  complexity as on binary search trees:

 Let us search key 29 in the ordered array below:


                central element 29  =  29
                 2   3   7  12  29  31  45
                        ⇒ found!


Binary search           Algorithms and Data Structures I            94
  Search has the same T(n) = O(log n) time
  complexity as on binary search trees:




                 2   3   7  12  29  31  45        O(log n) halving steps

 This result can also be derived from:
 if we halve n elements k times, we get 1 ⇔
                n / 2^k = 1     ⇔         k = log₂ n = O(log n)
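The halving procedure above maps to Python as follows (0-based indices; a sketch that returns the index of the key, or −1 if it is absent):

```python
def binary_search(a, key):
    # repeatedly halve the search interval of the sorted array a
    low, high = 0, len(a) - 1
    while low <= high:
        mid = (low + high) // 2      # the central element
        if a[mid] == key:
            return mid               # found
        if a[mid] < key:
            low = mid + 1            # search in the right half
        else:
            high = mid - 1           # search in the left part
    return -1                        # not found
```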
Binary search                Algorithms and Data Structures I          95
                     Sorting
 Problem
 There is a set of data from a base set with a given
 order over it (e.g. numbers, texts). Arrange them
 according to the order of the base set.

  Example

                  12  2  7  3     → sorting →     2  3  7  12



Sorting            Algorithms and Data Structures I             96
 Sorting sequences
 We sort sequences in a lexicographical order:
 from two sequences the sequence is ‘smaller’
 which has a smaller value at the first position
 where they differ.

  Example (texts)


          g   o   n   e        <        g   o   o   d

                       n < o in the alphabet

Sorting               Algorithms and Data Structures I               97
                            Insertion sort
  Principle
              [figure: the keys 14, 8, 69, 22, 75 are picked up
               one by one and each is inserted into its place
               among the already sorted ones]

Insertion sort                   Algorithms and Data Structures I   98
  Implementation of insertion sort with arrays

   insertion step:




                   22  69  75 │ 38  14

                 sorted part     unsorted part
                 (next, 38 is inserted into the sorted part)




Insertion sort        Algorithms and Data Structures I     99
  InsertionSort(A)
    1 for i ← 2 to A.Length
    2      do ins ← A[i]
    3         j ← i – 1
    4         while j > 0  and  ins < A[j]
    5                 do A[j + 1] ← A[j]
    6                     j ← j – 1
    7         A[j + 1] ← ins
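The array implementation above, translated to 0-based Python lists as a sketch:

```python
def insertion_sort(a):
    # grow a sorted prefix; insert each new element into its place
    for i in range(1, len(a)):
        ins = a[i]
        j = i - 1
        while j >= 0 and ins < a[j]:
            a[j + 1] = a[j]   # shift the larger keys one slot right
            j -= 1
        a[j + 1] = ins        # drop the new key into the gap
```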




Insertion sort     Algorithms and Data Structures I   100
  Time complexity of insertion sort

  Best case
    In each step the new element is inserted at the
    end of the sorted part:
    T(n) = 1 + 1 + 1 + ... + 1 = n − 1 = Θ(n)


  Worst case
   In each step the new element is inserted at the
   beginning of the sorted part:
   T(n) = 2 + 3 + 4 + ... + n = n(n + 1)/2 − 1 = Θ(n²)

Insertion sort      Algorithms and Data Structures I    101
  Time complexity of insertion sort

  Average case
    In each step the new element is inserted
    somewhere in the middle of the sorted part:

        T(n) = 2/2 + 3/2 + 4/2 + ... + n/2 =
             = (n(n + 1)/2 − 1) / 2 = Θ(n²)


                 The same as in the worst case


Insertion sort          Algorithms and Data Structures I   102
  Another implementation of insertion sort



   The input provides elements continually (e.g.
    a file or the network)
   The sorted part is a linked list where the
    elements are inserted one by one

  The time complexity is the same in every case.




Insertion sort     Algorithms and Data Structures I   103
  Another implementation of insertion sort

  The linked list implementation delivers an on-line
  algorithm:
   after each step the subproblem is completely
     solved
   the algorithm does not need the whole input to
     partially solve the problem

  Cf. off-line algorithm:
   the whole input has to be known prior to the
    substantive procedure
Insertion sort     Algorithms and Data Structures I   104
                       Merge sort
 Principle

       14   69    8   75                    2   22   25   36




                     sort the parts recursively


Merge sort                Algorithms and Data Structures I                       105
      8      14   69         75                           2   22   25     36




                          ready
                  merge (comb) the parts


Merge sort             Algorithms and Data Structures I                 106
 Time complexity of merge sort

 Merge sort is a recursive algorithm, and so is its
 time complexity function T(n)

 What it does:
  First it halves the actual (sub)array: O(1)
  Then calls itself for the two halves: 2T(n/2)
  Last it merges the two ordered parts: O(n)

             Hence T(n) = 2T(n/2) + O(n) = ?
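The three steps above can be sketched in Python; note that the version below returns a new list, since merging uses auxiliary storage:

```python
def merge_sort(a):
    if len(a) <= 1:
        return a                     # already sorted
    mid = len(a) // 2                # halve the (sub)array: O(1)
    left = merge_sort(a[:mid])       # sort the two halves
    right = merge_sort(a[mid:])      # recursively: 2T(n/2)
    # merge (comb) the two ordered parts: O(n)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```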

Merge sort           Algorithms and Data Structures I   107
                   Recursion tree of merge sort:
                           n
                                                                      n
             n/2                                n/2
                                                                     2(n/2)
     n/4            n/4              n/4                     n/4     4(n/4)




       1     1                        1        1                      n

                                                                   n∙log n
Merge sort                Algorithms and Data Structures I                108
 Time complexity of merge sort is

                   T(n) = Θ(n∙logn)


 This worst case time complexity is optimal among
 comparison sorts (using only pair comparisons)

 ⇒ fast
 but unfortunately merge sort does not sort
   in-place, i.e. it uses auxiliary storage of a size
   comparable with the input

Merge sort          Algorithms and Data Structures I    109
                   Heapsort
 An array A is called a heap if for all its elements
          A[i] ≥ A[2i] and A[i] ≥ A[2i + 1]
          (whenever these indices exist)




           45 27 34 20 23 31 18 19 3 14

 This property is called heap property
 It is easier to understand if a binary tree is built
     from the elements filling the levels row by row
Heapsort           Algorithms and Data Structures I     110
           45 27 34 20 23 31 18 19 3 14




Heapsort          Algorithms and Data Structures I   111
 [figure: the array drawn as a binary tree, index:key =
  1:45; 2:27, 3:34; 4:20, 5:23, 6:31, 7:18; 8:19, 9:3, 10:14]

 The heap property turns into a simple parent-child
 relation in the tree representation

     Heapsort                      Algorithms and Data Structures I                112
 An important application of heaps is realizing
 priority queues:

 A data structure supporting the operations

  insert
  maximum (or minimum)
  extract maximum (or extract minimum)




Heapsort          Algorithms and Data Structures I   113
 First we have to build a heap from an array.

 Let us suppose that only the kth element infringes
   the heap property.

 In this case it is sunk level by level to a place
    where it fits. In the example k = 1 (the root):




Heapsort            Algorithms and Data Structures I   114
 [figure: index:key = 1:15; 2:37, 3:34; 4:20, 5:23, 6:31, 7:18; 8:19, 9:3, 10:14]

 k = 1
 • The key and its children are compared
 • It is exchanged for the greater child
     Heapsort                      Algorithms and Data Structures I                115
 [figure: index:key = 1:37; 2:15, 3:34; 4:20, 5:23, 6:31, 7:18; 8:19, 9:3, 10:14]

 k = 2
 • The key and its children are compared
 • It is exchanged for the greater child
     Heapsort                      Algorithms and Data Structures I                116
 [figure: index:key = 1:37; 2:23, 3:34; 4:20, 5:15, 6:31, 7:18; 8:19, 9:3, 10:14]

 k = 5
 • The key and its children are compared
 • It is the greatest ⇒ ready
     Heapsort                      Algorithms and Data Structures I                117
 Sink(k,A)
   1 if 2*k ≤ A.HeapSize  and  A[2*k] > A[k]
   2    then greatest ← 2*k
   3    else greatest ← k
   4 if 2*k + 1 ≤ A.HeapSize  and
                             A[2*k + 1] > A[greatest]
   5    then greatest ← 2*k + 1
   6 if greatest ≠ k
   7    then Exchange(A[greatest],A[k])
   8          Sink(greatest,A)
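A Python sketch of Sink; since Python lists are 0-based, the children of index k are 2k+1 and 2k+2 rather than 2k and 2k+1:

```python
def sink(k, a, heap_size):
    # let the key at index k sink until both children are smaller
    greatest = k
    left, right = 2 * k + 1, 2 * k + 2   # 0-based child indices
    if left < heap_size and a[left] > a[greatest]:
        greatest = left
    if right < heap_size and a[right] > a[greatest]:
        greatest = right
    if greatest != k:
        a[k], a[greatest] = a[greatest], a[k]
        sink(greatest, a, heap_size)
```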


Heapsort           Algorithms and Data Structures I   118
 To build a heap from an arbitrary array, all
   elements are mended by sinking them:

                                  this is the array’s last element
                                  that has any children
 BuildHeap(A)
   1 A.HeapSize ← A.Length
   2 for k ← A.Length / 2  downto  1
   3     do Sink(k,A)
           we are stepping backwards; this way every
           visited element has only descendants which
           already fulfill the heap property


Heapsort             Algorithms and Data Structures I                119
 Time complexity of building a heap

  To sink an element costs O(logn) in the worst
   case

  Since n/2 elements have to be sunk, an upper
   bound for the BuildHeap procedure is
                 T(n) = O(n∙logn)

  It can be proven that the sharp bound is
                    T(n) = Θ(n)

Heapsort          Algorithms and Data Structures I   120
 Time complexity of the priority queue
 operations if the queue is realized using heaps

   insert
         append the new element to the array O(1)
         let it climb up: exchange it with its parent
          while it is greater than its parent O(logn)


 The time complexity is
                 T(n) = O(logn)



Heapsort               Algorithms and Data Structures I   121
 Time complexity of the priority queue
 operations if the queue is realized using heaps

  maximum
        read out the key of the root O(1)


 The time complexity is
                   T(n) = O(1)




Heapsort                Algorithms and Data Structures I   122
 Time complexity of the priority queue
 operations if the queue is realized using heaps

  extract maximum
        exchange the root for the array’s last element O(1)
        extract the last element O(1)
        sink the root O(logn)


 The time complexity is
                 T(n) = O(logn)



Heapsort                Algorithms and Data Structures I       123
 The heapsort algorithm

  build a heap Θ(n)
  iterate the following n−1 times; (n−1)∙O(logn) = O(n∙logn):
        exchange the root for the array’s last element O(1)
        exclude the heap’s last element from the heap O(1)
        sink the root O(logn)


 The time complexity is
                T(n) = O(n∙logn)



Heapsort                Algorithms and Data Structures I       124
 HeapSort(A)
   1 BuildHeap(A)
    2 for k ← A.Length  downto  2
    3     do Exchange(A[1],A[A.HeapSize])
    4        A.HeapSize ← A.HeapSize – 1
    5        Sink(1,A)
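HeapSort together with Sink and BuildHeap in Python, again translated to 0-based list indices (children of k are 2k+1 and 2k+2):

```python
def sink(k, a, heap_size):
    # let the key at index k sink until both children are smaller
    greatest = k
    left, right = 2 * k + 1, 2 * k + 2
    if left < heap_size and a[left] > a[greatest]:
        greatest = left
    if right < heap_size and a[right] > a[greatest]:
        greatest = right
    if greatest != k:
        a[k], a[greatest] = a[greatest], a[k]
        sink(greatest, a, heap_size)

def build_heap(a):
    # sink every element that has children, stepping backwards
    for k in range(len(a) // 2 - 1, -1, -1):
        sink(k, a, len(a))

def heap_sort(a):
    build_heap(a)                        # Θ(n)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]      # root (the maximum) goes last
        sink(0, a, end)                  # heap shrinks by one, then sink
```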




Heapsort         Algorithms and Data Structures I   125
                      Quicksort
 Principle

            22   69   8          75             25       12   14   36




   Rearrange and part the elements so that every
   key in the first part is smaller than any in the
   second part.
Quicksort             Algorithms and Data Structures I              126
                      Quicksort
 Principle

            14   12   8          75             25       69   22   36




   Rearrange and part the elements so that every
   key in the first part is smaller than any in the
   second part.
Quicksort             Algorithms and Data Structures I              127
                      Quicksort
 Principle

      8    12   14                     22   25   36   69   75




   Sort each part recursively,
   this will result in the whole array being sorted.

Quicksort             Algorithms and Data Structures I              128
 The partition algorithm

  choose any of the keys stored in the array; this
   will be the so-called pivot key
  exchange the large elements at the beginning of
   the array to the small ones at the end of it


                            pivot key
      not less than                                 not greater than
      the pivot key                                 the pivot key

      22     69      8          75             25           12   14   36



Quicksort               Algorithms and Data Structures I                  129
 Partition(A,first,last)
    1 left ← first – 1
    2 right ← last + 1
    3 pivotKey ← A[RandomInteger(first,last)]
    4 repeat
    5            repeat left ← left + 1
    6            until A[left] ≥ pivotKey
    7            repeat right ← right – 1
    8            until A[right] ≤ pivotKey
    9            if left < right
    10              then Exchange(A[left],A[right])
    11              else return  right
    12 until  false

Quicksort           Algorithms and Data Structures I   130
 The time complexity of the partition algorithm is
                    T(n) = Θ(n)
 because each element is visited exactly once.

 The sorting is then:

 QuickSort(A,first,last)
   1 if first < last
    2    then border ← Partition(A,first,last)
   3           QuickSort(A,first,border)
   4           QuickSort(A,border+1,last)
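A Python sketch of both routines. One deviation from the pseudocode: the randomly chosen pivot is first swapped to the front, a common guard which guarantees that the returned border stays strictly below last, so the recursion always makes progress even when the pivot happens to be the maximum:

```python
import random

def partition(a, first, last):
    # exchange large elements at the front for small ones at the back
    p = random.randint(first, last)
    a[first], a[p] = a[p], a[first]      # move the pivot to the front
    pivot_key = a[first]
    left, right = first - 1, last + 1
    while True:
        left += 1
        while a[left] < pivot_key:       # until A[left] >= pivotKey
            left += 1
        right -= 1
        while a[right] > pivot_key:      # until A[right] <= pivotKey
            right -= 1
        if left >= right:
            return right                 # border of the two parts
        a[left], a[right] = a[right], a[left]

def quicksort(a, first, last):
    if first < last:
        border = partition(a, first, last)
        quicksort(a, first, border)
        quicksort(a, border + 1, last)
```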

Quicksort          Algorithms and Data Structures I   131
  Quicksort is a divide and conquer algorithm
   like merge sort, however, the partition is
   unbalanced (merge sort always halves the
   subarray).

  The time complexity of a divide and conquer
   algorithm highly depends on the balance of the
   partition.

  In the best case the quicksort algorithm halves
   the subarrays at every step ⇒
                   T(n) = Θ(n∙logn)

Quicksort         Algorithms and Data Structures I   132
 Recursion tree of the worst case
            n
                                                            n
    1                       n−1
                                                           n−1
                 1                     n−2                 n−2




                 1          1                               0

                                                        n∙(n + 1) / 2
Quicksort            Algorithms and Data Structures I                   133
  Thus, the worst case time complexity of
   quicksort is
                    T(n) = Θ(n²)


  The average case time complexity is
                  T(n) = Θ(n∙logn)
               the same as in the best case!

       The proof is difficult but let’s see a special case
       to understand quicksort better.



Quicksort              Algorithms and Data Structures I      134
 Let λ be a positive number smaller than 1:
                       0<λ<1

 Assumption: the partition algorithm never
   provides a worse partition ratio than
                    (1− λ) : λ

 Example 1: Let λ := 0.99
   The assumption demands that the partition
   algorithm does not leave less than 1% as the
   smaller part.

Quicksort         Algorithms and Data Structures I   135
 Example 2: Let λ := 0.999 999 999
   Due to the assumption, if we have at most one
   billion(!) elements then the assumption is
   fulfilled for any functioning of the partition
   algorithm.

       (Even if it always cuts off only one element
       from the others).

 In the following it is assumed for the sake of
 simplicity that λ ≥ 0.5, i.e. always the λ part is
 bigger.

Quicksort             Algorithms and Data Structures I   136
 Recursion tree of the λ ratio case
             n
                                                            n
(1 − λ)n                       λn
                                                            n
                  (1 − λ)λn                λ²n
                                                           ≤n




                                          λᵈn
                                                           ≤n

                                                         ≤ n∙logn
Quicksort             Algorithms and Data Structures I              137
  In the special case if none of the parts arising at
   the partitions are bigger than a given λ ratio
   (0.5 ≤ λ < 1), the time complexity of quicksort is
                    T(n) = O(n∙logn)

  The time complexity of quicksort is practically
   optimal because the number of elements to be
   sorted is always bounded by a number N
   (finite storage). Using the value λ = 1 − 1/N it
   can be proven that quicksort finishes in
   O(n∙logn) time in every possible case.


Quicksort          Algorithms and Data Structures I   138
                    Greedy algorithms
 Problem

  Optimization problem: Let a function f(x) be
   given. Find an x where f is optimal (minimal or
   maximal) ‘under given circumstances’

  ‘Given circumstances’: An optimization
   problem is constrained if functional constraints
   have to be fulfilled such as g(x) ≤ 0

Greedy algorithms       Algorithms and Data Structures I   139
  Feasible set: the set of those x values where the
   given constraints are fulfilled

  Constrained optimization problem:

                      minimize f(x)
                    subject to g(x) ≤ 0




Greedy algorithms    Algorithms and Data Structures I   140
 Example

 Problem: There is a city A and other cities
   B1,B2,...,Bn which can be reached from A by bus
   directly. Find the farthest of these cities where
   you can travel so that your money suffices.
                                          A




                    B1   B2                   ...                Bn



Greedy algorithms             Algorithms and Data Structures I        141
 Model:
  Let x denote any of the cities: x ∊ {B1,B2,...,Bn},
  f(x) the distance between A and x,
  t(x) the price of the bus ticket from A to x,
  m the money you have, and
  g(x) = t(x) − m the constraint function.

 The constrained optimization problem to solve:

                         minimize (− f(x))
                    s.t.     g(x) ≤ 0

Greedy algorithms        Algorithms and Data Structures I   142
  In general, optimization problems are much more
                        difficult!

 However, there is a class of optimization
   problems which can be solved using a step-by-
   step simple straightforward principle:
                greedy algorithms:

  at each step the same kind of decision is made,
   striving for a local optimum, and
  decisions of the past are never revisited.


Greedy algorithms   Algorithms and Data Structures I   143
 Question: Which problems can be solved using
   greedy algorithms?

 Answer:
 Problems which obey the following two rules:
  Greedy choice property: If a greedy choice is
   made first, it can always be completed to
   achieve an optimal solution to the problem.
  Optimal substructure property: Any
   substructure of an optimal solution provides an
   optimal solution to the adequate subproblem.


Greedy algorithms   Algorithms and Data Structures I   144
 Counter example

 Find the shortest route from Szeged to Budapest.

 The greedy choice property is infringed:

 You cannot simply choose the closest town first.




Greedy algorithms   Algorithms and Data Structures I   145
                               Budapest




                                   Szeged

                                                  Deszk

                                   Deszk is the closest to Szeged but
                                   situated in the opposite direction

Greedy algorithms   Algorithms and Data Structures I                    146
 Proper example

 Activity-selection problem:
 Let’s spend a day watching TV.

 Aim: Watch as many programs (on the whole) as
   you can.

 Greedy strategy:
 Watch the program ending first, then the next you
   can watch on the whole ending first, etc.
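The strategy above can be sketched in Python, assuming each program is given as a (start, finish) pair; this representation is an assumption, not from the slides:

```python
def select_activities(programs):
    # greedy rule: always take the program that ends first among
    # those starting after the last chosen one has ended
    chosen = []
    last_finish = float('-inf')     # sentinel: nothing watched yet
    for start, finish in sorted(programs, key=lambda p: p[1]):
        if start >= last_finish:    # can be watched on the whole
            chosen.append((start, finish))
            last_finish = finish
    return chosen
```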

Activity-selection problem   Algorithms and Data Structures I   147
     Let’s sort the programs by their ending times
     Include the first one
     Exclude those which have already begun
     No more left: ready




                    The optimum is 4 (TV programs)




Activity-selection problem   Algorithms and Data Structures I   148
    Check the greedy choice property:
      The first choice of any optimal solution can
       be exchanged for the greedy one




Activity-selection problem   Algorithms and Data Structures I   149
    Check the optimal substructure property:
      The part of an optimal solution is optimal
       also for the subproblem
     If this part was not optimal for the subproblem,
     the whole solution could be improved by improving
     the subproblem's solution



Activity-selection problem   Algorithms and Data Structures I                     150
                Huffman codes
 Notions

  C is an alphabet if it is a set of symbols

  F is a file over C if it is a text built up of the
      characters of C




Huffman codes           Algorithms and Data Structures I   151
 Assume we have the following alphabet
                    C = {a, b, c, d, e}
 Code it with binary codewords of equal length
 How many bits per codeword do we need at least?
 2 are not enough (only four codewords: 00, 01, 10, 11)
 Build codewords using 3 bit coding


                          a = 000
                          b = 001
                          c = 010
                          d = 011
                          e = 100

Huffman codes       Algorithms and Data Structures I   152
a = 000          Build the T binary tree of the coding
b = 001
c = 010          [Figure: binary tree T in which every left edge is
d = 011          labelled 0 and every right edge 1; the leaves a, b, c,
e = 100          d, e all lie at depth 3, so each codeword is the 3-bit
                 path from the root down to its leaf]
 Huffman codes               Algorithms and Data Structures I       153
 Further notation

  For each character c ∈ C its frequency in the file
   is denoted by f(c)

  For each character c ∈ C its codeword length is
   defined by its depth in the coding tree T, dT(c)

  Hence the length of the file (in bits) equals
   B(T) = ∑c∈C f(c)∙dT(c)



Huffman codes       Algorithms and Data Structures I   154
 Problem




 Let an alphabet C and a file over it be given. Find a
 coding tree T of the alphabet with minimal B(T)




Huffman codes      Algorithms and Data Structures I    155
 Example

 Consider an F file of 20,000 characters over the
   alphabet C = {a, b, c, d, e}
 Assume the frequencies of the particular
   characters in the file are
                   f(a) = 5,000
                   f(b) = 2,000
                   f(c) = 6,000
                   f(d) = 3,000
                   f(e) = 4,000


Huffman codes      Algorithms and Data Structures I   156
 Using the 3 bit coding defined previously, the bit-
   length of the file equals

                  B(T) = ∑c∈C f(c)∙dT(c) =
        5,000∙3 + 2,000∙3 + 6,000∙3 + 3,000∙3 + 4,000∙3 =
           (5,000 + 2,000 + 6,000 + 3,000 + 4,000)∙3 =
                      20,000∙3 = 60,000

 This is a so-called fixed-length code since
   dT(x) = dT(y) holds for all x, y ∈ C

Huffman codes        Algorithms and Data Structures I   157
     The fixed-length code is not always optimal

 [Figure: the tree T with the leaf e moved one level up, so that e gets
  the 2-bit codeword 10]

                                       B(T’) = B(T) − f(e)∙1 =
                                       60,000 − 4,000∙1 = 56,000

Huffman codes               Algorithms and Data Structures I                     158
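Both bit-length computations above can be checked with a few lines of Python (a minimal sketch; `file_bit_length` is an illustrative helper, and the depth tables encode the trees T and T’ from the slides):

```python
# B(T) = sum over all characters c of f(c) * depth of c in the coding tree
def file_bit_length(frequencies, depths):
    return sum(f * depths[c] for c, f in frequencies.items())

frequencies = {"a": 5000, "b": 2000, "c": 6000, "d": 3000, "e": 4000}

fixed = {c: 3 for c in frequencies}            # fixed-length 3-bit code T
improved = dict(fixed, e=2)                    # T': the leaf e moved one level up

print(file_bit_length(frequencies, fixed))     # → 60000
print(file_bit_length(frequencies, improved))  # → 56000
```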
 Idea

  Construct a variable-length code, i.e., where the
   code-lengths for different characters can differ
   from each other

  We expect that if more frequent characters get
   shorter codewords then the resulting file will
   become shorter




Huffman codes     Algorithms and Data Structures I   159
 Problem: How do we recognize when a
   codeword ends and a new one begins?
   Using delimiters is too “expensive”

 Solution: Use prefix codes, i.e., codewords none
   of which is also a prefix of some other
   codeword

 Result: The codewords can be decoded without
   using delimiters


Huffman codes     Algorithms and Data Structures I   160
 For instance if
                              a = 10
                             b = 010
                              c = 00


 then the following code’s meaning is
 1000010000010010 = a c b c c a b

 However, what if a variable-length code was not
   prefix-free:


Huffman codes      Algorithms and Data Structures I   161
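Decoding a prefix code really needs no delimiters, as a short Python sketch shows (the code table is the one from the slide; the name `decode` is illustrative):

```python
def decode(bits, codewords):
    """Decode a bit string with a prefix-free code by scanning left to right."""
    inverse = {v: k for k, v in codewords.items()}  # codeword -> character
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:       # a complete codeword was read; no prefix
            out.append(inverse[buffer])  # of it can be another codeword
            buffer = ""
    if buffer:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

code = {"a": "10", "b": "010", "c": "00"}   # prefix-free code from the slide
print(decode("1000010000010010", code))     # → acbccab
```

Greedy matching is safe here precisely because no codeword is a prefix of another; with the non-prefix-free table (a = 10, b = 100, c = 0) the same scan would be ambiguous.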
 Then if
                              a = 10
                             b = 100
                               c=0


 then
                100= b       or        100= a c ?

 An extra delimiter would be needed



Huffman codes      Algorithms and Data Structures I   162
 Realize the original idea with prefix codes

         f(a) = 5,000
                                           rare
         f(b) = 2,000
         f(c) = 6,000                      frequent
         f(d) = 3,000
         f(e) = 4,000

 Frequent codewords should be shorter, e.g.,
              a = 00, c = 01, e = 10
 Rare codewords can be longer, e.g.,
                 b = 110, d = 111

Huffman codes           Algorithms and Data Structures I   163
 Question: How can such a coding be done
   algorithmically?

 Answer: The Huffman codes provide exactly this
   solution




Huffman codes    Algorithms and Data Structures I   164
 The bit-length of the file using this K prefix code is
                B(K) = ∑c∈C f(c)∙dK(c) =
     5,000∙2 + 2,000∙3 + 6,000∙2 + 3,000∙3 + 4,000∙2 =
      (5,000 + 6,000 + 4,000)∙2 + (2,000 + 3,000)∙3 =
                30,000 + 15,000 = 45,000

 (cf. the fixed-length code gave 60,000,
 the improved one 56,000)



Huffman codes      Algorithms and Data Structures I   165
 The greedy method producing Huffman codes

  1. Sort the characters of the C alphabet in
     increasing order according to their frequency
     in the file and link them into a list

  2. Delete the two leading characters, some x and
     y, from the list and connect them with a
     common parent node z. Let f(z) = f(x) + f(y),
     insert z into the list keeping it sorted, and
     repeat step 2 until only one node (the root)
     remains in the list.


Huffman codes      Algorithms and Data Structures I      166
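The greedy method can be sketched in Python, with a binary heap standing in for the sorted list (an implementation choice, not from the slides; ties between equal frequencies may be broken differently than in the worked example, but the resulting bit-length is the same):

```python
import heapq
from itertools import count

def huffman_codes(frequencies):
    """Greedy Huffman construction; a min-heap plays the role of the sorted list."""
    tiebreak = count()  # heap entries need a total order; tree nodes are never compared
    # A leaf is (char, None, None); an inner node is (None, left, right)
    heap = [(f, next(tiebreak), (c, None, None)) for c, f in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Step 2: remove the two rarest nodes x, y and join them under a parent z
        fx, _, x = heapq.heappop(heap)
        fy, _, y = heapq.heappop(heap)
        heapq.heappush(heap, (fx + fy, next(tiebreak), (None, x, y)))
    codes = {}
    def walk(node, prefix):
        char, left, right = node
        if char is not None:
            codes[char] = prefix or "0"  # guard for a one-character alphabet
        else:
            walk(left, prefix + "0")
            walk(right, prefix + "1")
    walk(heap[0][2], "")
    return codes

freqs = {"a": 5000, "b": 2000, "c": 6000, "d": 3000, "e": 4000}
codes = huffman_codes(freqs)
print(sum(freqs[c] * len(codes[c]) for c in freqs))  # bit-length of the file → 45000
```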
    Example


                   character                       frequency (thousands)


List:              a:5   b:2             c:6              d:3     e:4




   Huffman codes               Algorithms and Data Structures I            167
    Example
    1. Sort




List:              b:2   d:3         e:4              a:5     c:6




   Huffman codes           Algorithms and Data Structures I         168
    Example
    2. Merge and rearrange

                [b:2 and d:3 merge under a new node of frequency 5]

List:              b:2       d:3         e:4               a:5    c:6




   Huffman codes               Algorithms and Data Structures I         169
    Example
    2. Merge and rearrange

                [e:4 and the node 5(b:2, d:3) merge under a new node
                 of frequency 9]

List:              e:4      5(b:2, d:3)      a:5    c:6




   Huffman codes     Algorithms and Data Structures I         170
    Example
    2. Merge and rearrange

                [a:5 and c:6 merge under a new node of frequency 11]

List:              a:5      c:6      9(e:4, 5(b:2, d:3))

   Huffman codes     Algorithms and Data Structures I                 171
    Example
    2. Merge and rearrange

                [The nodes 9 and 11 merge under the root of frequency
                 20; every left edge is labelled 0, every right edge 1]

Tree:              20( 9( e:4, 5( b:2, d:3 ) ), 11( a:5, c:6 ) )

   Huffman codes         Algorithms and Data Structures I          172
 Example
 Ready

 [Final tree: root 20 with subtrees 9(e:4, 5(b:2, d:3)) and
  11(a:5, c:6); left edges labelled 0, right edges 1]

                                                   a = 10
                                                   b = 010
                                                   c = 11
                                                   d = 011
                                                   e = 00

Huffman codes         Algorithms and Data Structures I       173
                                           f(a) = 5,000
 Example                                   f(b) = 2,000
 Length of file in bits                    f(c) = 6,000
                                           f(d) = 3,000
                                           f(e) = 4,000

        (codewords: a = 10, b = 010, c = 11, d = 011, e = 00)

                  B(H) = ∑c∈C f(c)∙dH(c) =
        5,000∙2 + 2,000∙3 + 6,000∙2 + 3,000∙3 + 4,000∙2 =
        (5,000 + 6,000 + 4,000)∙2 + (2,000 + 3,000)∙3 =
                  30,000 + 15,000 = 45,000



Huffman codes        Algorithms and Data Structures I     174
 Optimality of the Huffman codes

 Assertion 1. There exists an optimal solution
    where the two rarest characters are deepest
    twins in the tree of the coding

 Assertion 2. Merging two (twin) characters leads
    to a problem similar to the original one

 Corollary. The Huffman codes provide an
   optimal character coding

Huffman codes     Algorithms and Data Structures I   175
 Proof of Assertion 1 (There exists an optimal solution where the
   two rarest characters are deepest twins in the tree of the coding).


 [Figure: the two rarest characters are exchanged with the deepest twin
  leaves; changing nodes this way, the total length does not increase]



Huffman codes            Algorithms and Data Structures I                176
 Proof of Assertion 2 (Merging two (twin) characters leads to a
   problem similar to the original one).


 [Figure: two twin characters are merged into their common parent; the
  new problem is smaller than the original one but similar to it]



Huffman codes          Algorithms and Data Structures I               177
         Graph representations
  Graphs can represent different structures,
   connections and relations
                                 1
                                                         7
  Weighted graphs can
   represent capacities or
                                                2            4
                                                     4
   actual flow rates
                                                                 5
                                               2
                                                                 3


Graphs            Algorithms and Data Structures I                   178
                                   Adjacency-matrix

 [Figure: weighted graph on vertices 1, 2, 3, 4 with edges 1-2
  (weight 2), 1-4 (weight 7), 2-4 (weight 4), 3-4 (weight 5)]

                                                1    2    3    4
                                          1     0    2    0    7
                                          2     2    0    0    4
                                          3     0    0    0    5
                                          4     7    4    5    0

  In the unweighted case a 1 means there is an edge leading from
   ‘row’ to ‘column’ and a 0 means there is no such edge
  Drawback 1: redundant elements (the matrix is symmetric)
  Drawback 2: superfluous elements (the zeros of missing edges)

Graphs                           Algorithms and Data Structures I               179
                          Adjacency-list
             1

                 4                             1            2   4
                                               2            4   1
         2
                                               3            4
                     3
                                               4            1   3   2


  Optimal storage usage
  Drawback: slow search operations

Graphs                   Algorithms and Data Structures I               180
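The two representations can be built side by side in Python (a sketch for the 4-vertex weighted graph of the slides; variable names are illustrative):

```python
# Undirected weighted graph: edges 1-2 (weight 2), 1-4 (7), 2-4 (4), 3-4 (5)
edges = [(1, 2, 2), (1, 4, 7), (2, 4, 4), (3, 4, 5)]
n = 4

# Adjacency matrix: O(n^2) storage, but O(1) edge lookup
matrix = [[0] * (n + 1) for _ in range(n + 1)]   # row/column 0 unused
for u, v, w in edges:
    matrix[u][v] = matrix[v][u] = w              # undirected: store both directions

# Adjacency list: storage proportional to the number of edges, slower lookup
adjacency = {v: [] for v in range(1, n + 1)}
for u, v, w in edges:
    adjacency[u].append((v, w))
    adjacency[v].append((u, w))

print(matrix[1][4])   # → 7
print(adjacency[3])   # → [(4, 5)]
```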
         Single-source shortest path
                  methods
  Problem: find the shortest path between two
   vertices in a graph

  Source: the starting point (vertex)

  Single-source shortest path method: algorithm
   to find the shortest paths running from the
   source to all vertices in a graph


Graphs            Algorithms and Data Structures I   181
 Walk a graph:

  choose an initial vertex as the source

  visit all vertices starting from the source



 Graph walk methods:

  depth-first search

  breadth-first search


Graph walk         Algorithms and Data Structures I   182
 Depth-first search

  Backtrack algorithm
  It goes as far as it can without revisiting any
   vertex, then backtracks




                   source


Graph walk         Algorithms and Data Structures I   183
 Breadth-first search

  Like an explosion in a mine
  The shockwave reaches the adjacent vertices
   first, and starts over from them




Graph walk        Algorithms and Data Structures I   184
 The breadth-first search is not only simpler to
 implement but it is also the basis for several
 important graph algorithms (e.g. Dijkstra)

 Notation in the following pseudocode:
  A is the adjacency-matrix of the graph
  s is the source
  D is an array containing the distances from the
   source
  P is an array containing the predecessor along
   a path
  Q is the queue containing the unprocessed
   vertices already reached

Graph walk        Algorithms and Data Structures I   185
 BreadthFirstSearch(A,s,D,P)
   1 for i ← 1 to A.CountRows
   2      do P[i] ← 0
   3          D[i] ← ∞
   4 D[s] ← 0
   5 Q.Enqueue(s)
   6 repeat
   7     v ← Q.Dequeue
   8     for j ← 1 to A.CountColumns
   9         do if A[v,j] > 0  and  D[j] = ∞
   10               then D[j] ← D[v] + 1
   11                    P[j] ← v
   12                    Q.Enqueue(j)
   13 until Q.IsEmpty

Graph walk          Algorithms and Data Structures I   186
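The pseudocode translates almost line by line into Python (0-based indices; the 5-vertex adjacency matrix is a hypothetical example, not the graph of the next slide):

```python
from collections import deque

def breadth_first_search(A, s):
    """BFS over an adjacency matrix A (A[v][j] > 0 means an edge) from source s."""
    n = len(A)
    P = [None] * n                  # predecessor along the shortest path
    D = [float("inf")] * n          # distance (number of edges) from the source
    D[s] = 0
    Q = deque([s])                  # queue of reached but unprocessed vertices
    while Q:
        v = Q.popleft()
        for j in range(n):
            if A[v][j] > 0 and D[j] == float("inf"):
                D[j] = D[v] + 1
                P[j] = v
                Q.append(j)
    return D, P

# Hypothetical example: a path 0-1-2-3 plus the extra edge 1-4
A = [[0, 1, 0, 0, 0],
     [1, 0, 1, 0, 1],
     [0, 1, 0, 1, 0],
     [0, 0, 1, 0, 0],
     [0, 1, 0, 0, 0]]
D, P = breadth_first_search(A, 0)
print(D)  # → [0, 1, 2, 3, 2]
```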
 The D,P pairs are displayed in the figure.

 [Figure: a 10-vertex graph walked breadth-first from source vertex 4;
  the source carries the pair 0,0, its neighbours the distance 1, and
  so on out to distance 3, each vertex labelled with its D and P]
  D is the shortest distance from the source
  The shortest paths can be reconstructed using P

Graph walk                       Algorithms and Data Structures I                              187
                       Dijkstra’s algorithm
  Problem: find the shortest path between two
    vertices in a weighted graph

  Idea: extend the breadth-first search for graphs
    having integer weights:
                [Figure: an edge of weight 3 is replaced by a chain of
                 three unweighted edges through two virtual vertices
                 (total weight = 3∙1 = 3)]
Dijkstra’s algorithm               Algorithms and Data Structures I   188
   Dijkstra(A,s,D,P)
      1 for i ← 1 to A.CountRows
      2      do P[i] ← 0
      3          D[i] ← ∞
      4 D[s] ← 0
      5 for i ← 1 to A.CountRows
      6      do M.Enqueue(i)
      7 repeat
      8      v ← M.ExtractMinimum
      9      for j ← 1 to A.CountColumns
      10         do if A[v,j] > 0
      11               then if D[j] > D[v] + A[v,j]
      12                        then D[j] ← D[v] + A[v,j]
      13                             P[j] ← v
      14                             M.DecreaseKey(j)
      15 until M.IsEmpty

   (M is a minimum priority queue keyed by D; line 14 restores the
   heap property after D[j] decreases)

Dijkstra’s algorithm    Algorithms and Data Structures I        189
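A runnable Python sketch of the algorithm; instead of a DecreaseKey operation it uses the common "lazy" variant that pushes duplicate heap entries and skips stale ones. The example matrix is the weighted graph from the adjacency-matrix slide, re-indexed from 0:

```python
import heapq

def dijkstra(A, s):
    """Single-source shortest paths; A[v][j] > 0 is the weight of edge v-j."""
    n = len(A)
    D = [float("inf")] * n
    P = [None] * n
    D[s] = 0
    M = [(0, s)]                        # minimum priority queue keyed by distance
    while M:
        dv, v = heapq.heappop(M)        # ExtractMinimum
        if dv > D[v]:
            continue                    # stale entry: D[v] was improved meanwhile
        for j in range(n):
            if A[v][j] > 0 and D[j] > D[v] + A[v][j]:
                D[j] = D[v] + A[v][j]   # relax the edge v-j
                P[j] = v
                heapq.heappush(M, (D[j], j))
    return D, P

# Weighted graph from the adjacency-matrix slide (vertices 1..4 become 0..3)
A = [[0, 2, 0, 7],
     [2, 0, 0, 4],
     [0, 0, 0, 5],
     [7, 4, 5, 0]]
D, P = dijkstra(A, 0)
print(D)  # → [0, 2, 11, 6]
```

Note how the direct edge 0-3 of weight 7 loses to the two-edge path 0-1-3 of total weight 6, which BFS (counting edges only) could not discover.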
   Time complexity of Dijkstra’s algorithm

    Initialization of D and P: O(n)
    Building a heap for the priority queue: O(n)
    Search: n∙O(logn + n) = O(n(logn + n)) = O(n2)
      (the loop runs n times; extracting the minimum costs O(logn)
      and checking all neighbors costs O(n))

                        Grand total: T(n) = O(n2)
Dijkstra’s algorithm          Algorithms and Data Structures I   190

				