What is an Algorithm

Document Sample
What is an Algorithm Powered By Docstoc
					           The Towers of Hanoi
             1            2            3

      The Initial Position in the Tower of Hanoi.
• Rules:
  -- Move all the disks from the first needle to the
     third, subject to the conditions that only one disk
     can be moved at a time, and that no disk is ever
     allowed to be placed on top of a smaller disk.
                       The Idea
•   Begin with n disks on needle 1. We can transfer the top
    n - 1 disks, following the rules of the puzzle, to needle 3
    using Hn-1 moves. We keep the largest disk fixed during
    these moves. Then, we use one move to transfer the
    largest disk to the second needle. We can transfer the
    n -1 disks on needle 3 to needle 2, using Hn-1 additional
    moves, placing them on top of the largest disk, which
    always stays fixed on the bottom of needle 2. Moreover,
    it is easy to see that the puzzle cannot be solved using
    fewer steps. This shows that Hn = 2Hn-1 + 1
•   The initial condition is H1 = 1, since one disk can be
    transferred from needle 1 to needle 2.
                The Algorithm

   move(63, 1, 2, 3);
   printf("Move a disk from needle 1 to needle 3.\n");
   Move (63, 2, 3, 1);

where Move(n, a, b, c); means move n desks from needle a
  to needle b using needle c as temporary storage.
The Implementation
  Can’t Finish the Assigned Task

“I can’t find an efficient algorithm, I guess I’m just
too dumb.”
           Mission Impossible

“I can’t find an efficient algorithm, because no such
algorithm is possible.”


        “I can’t find an efficient algorithm, but
        neither can all these famous people.”
          Easy and Hard Problems
• We argue that the class of problems that can be solved in
  polynomial time (denoted by P) corresponds well with what we
  can feasibly compute. But sometimes it is difficult to tell when a
  particular problem is in P or not.
• Theoreticians spend a good deal of time trying to determine
  whether particular problems are in P. To demonstrate how difficult
  it can be.
• To make this determination, we will survey a number of problems,
  some of which are known to be in P, and some of which we think
  are (probably) not in P. The difference between the two types of
  problem can be surprisingly small. Throughout the following, an
  ''easy'' problem is one that is solvable in polynomial time, while a
  ''hard'' problem is one that we think cannot be solved in polynomial
  Eulerian Tour vs. Hamiltonian Tour
• Eulerian Tours -- Easy
  – INPUT: A graph G = (V, E).
  – DECIDE: Is there a path that crosses every edge
    exactly once and returns to its starting point?

• Hamiltonian Tours -- Hard
  – INPUT: A graph G = (V, E).
  – DECIDE: Is there a path that visits every vertex
    exactly once and returns to its starting point?
                     Some Facts
• Eulerian Tours
  – A famous mathematical theorem comes to our rescue.
    If the graph is connected and every vertex has even
    degree, then the graph is guaranteed to have such a
    tour. The algorithm to find the tour is a little trickier,
    but still doable in polynomial time.
• Hamiltonian Tours
  – No one knows how to solve this problem in
    polynomial time. The subtle distinction between
    visiting edges and visiting vertices changes an easy
    problem into a hard one.
              Map Colorability
• Map 2-colorability -- Easy
  – INPUT: A graph G=(V, E).
  – DECIDE: Can this map be
    colored with 2 colors so that no
    two adjacent countries have the
    same color?
• Map 3-colorability -- Hard
  – INPUT: A graph G=(V, E).
  – DECIDE: Can this map be colored with 3 colors so
    that no two adjacent countries have the same color?
• Map 4-colorability -- Easy
                    Some Facts
• Map 2-colorability
  – To solve this problem, we simply color the first
    country arbitrarily. This forces the colors of
    neighboring countries to be the other color, which in
    turn forces the color of the countries neighboring those
    countries, and so on. If we reach a country which
    borders two countries of different color, we will know
    that the map cannot be two-colored; otherwise, we will
    produce a two coloring. So this problem is easily
    solvable in polynomial time.
                     Some Facts
• Map 3-colorability
  – This problem seems very similar to the problem above,
    however, it turns out to be much harder. No one
    knows how this problem can be solved in polynomial
    time. (In fact this problem is NP-complete.)
• Map 4-colorability.
  – Here we have an easy problem again. By a famous
    theorem, any map can be four-colored. It turns out
    that finding such a coloring is not that difficult either.
    Problem vs. Problem Instance
• When we say that a problem is hard, it means that some
  instances of the problem are hard. It does not mean that
  all problem instances are hard.

• For example, the following problem instance is trivially
    Longest Path vs. Shortest Path
• Longest Path -- Hard
   – INPUT: A graph G = (V, E), two vertices u, v of V,
     and a weighting function on E.
   – OUTPUT: The longest path between u and v.
      No one is able to come up with a polynomial time
      algorithm yet.
• Shortest Path -- Easy
   – INPUT: A graph G = (V, E), two vertices u, v of V,
     and a weighting function on E.
   – OUTPUT: The shortest path between u and v.
      A greedy method will solve this problem easily.
      Multiplication vs. Factoring
• Multiplication -- Easy
   – INPUT: Integers x,y.
   – OUTPUT: The product xy.
• Factoring (Un-multiplying) -- Hard
   – INPUT: An integer n.
   – OUTPUT: If n is not prime, output two integers x, y
     such that 1 < x, y < n and x  y = n.
 Again, the problem of factoring is not known to be in P.
 In this case, the hardness of a problem turns out to be
 useful. Some cryptographic algorithms depend on the
 assumption that factoring is hard to ensure that a code
 cannot be broken by a computer.
                Boolean Formulas
• Formula evaluation -- Easy
   – INPUT: A boolean formula (e.g. (x y)  (z x)) and a value
     for all variables in the formula (e.g. x = 0, y = 1, z = 0).
   – DECIDE: The value of the formula. (e.g., 1, or "true'' in this
• Satisfiability of boolean formula -- Hard
   – INPUT: A boolean formula.
   – DECIDE: Do there exist values for all variables that would
     make the formula true?
• Tautology -- Harder
   – INPUT: A boolean formula.
   – DECIDE: Do all possible assignments of values to variables
     make the formula true?
• Formula evaluation
   – It's not too hard to think of what the algorithm would be in this
     case. All we would have to do is to substitute the values in for
     the various variables, then simplify the formula to a single value
     in multiple passes (e.g. in a pass simplify 1  0 to 1). .
• Satisfiability of boolean formula
   – Given that there are n different variables in the formula, there
     are 2n possible assignments of 0/1 to the variables. This gives
     us an easy exponential time algorithm: simply try all possible
     assignments. No one knows if there is a way to be cleverer, and
     cut the running time down to polynomial
• Tautology
   – It turns out that this problem seems to be even harder than the
     Satisfiability problem.
           Generalized Geography
• Recall the game of Geography: a player starts with the name of
  some country (''Algeria''). The 2nd player has to come up with a
  different name that starts with the letter that ends the first name (''
  Afghanistan''). The 1st player now has to come up with a country
  that starts with the letter that ended the 2nd player's country (''
  Nepal''). And so on. The game ends when some player can't think
  of a country that would obey the rule. The game can be generalized
  to a directed graph in which the vertices stand for countries, and an
  edge is directed from vertex i to vertex j, if saying country j after
  country i is a legal move. So the objective is to force the other
  player onto a vertex that has no edges coming out of it.


•   The rules of NIM are as follows:

    1. On a player's move, the player can take any number
       of stones from some (only one) pile of rocks;

    2. The player who takes the last stone wins.
            Some Typical Games
• Generalized Geography -- Hard
  – INPUT: A directed graph, and a start vertex.
  – DECIDE:Can the first player force a win; e.g. can the
    first player win no matter what the second player does

• NIM -- Easy
  – INPUT: Some piles of rocks .
  – DECIDE: Is this a winning position for the first player
    in a game of NIM?
• Generalized Geography
  – This problem seems nastier than all the other problems
    that we have considered so far. The alternation of the
    players' moves makes it very difficult to solve

  – It turns out that in some configurations, the first player
    (if the player knows what he or she is doing) can
    always win, no matter what the second player does.
  – There is a nice algorithm that solves this problem in
    polynomial time.
• 2-dimensional Matching -- Easy
   – INPUT: A graph G=(V, E).
   – DECIDE: Does there exist a perfect pairing matching
     of vertices: a set of |V|/2 edges that touch each vertex
     exactly once?

• 3-dimensional Matching -- Hard
   – INPUT: A graph G=(V, E).
   – DECIDE: Does There exists a set of disjoint triples
     covering all vertices?
                    Some Facts
• We can think of the matching problem as follows: each
  vertex represents a person at a fair. There's an edge
  between two people if they are willing to sit next to each
  other on the Ferris wheel. Is there a way to pair up
  people so that everyone rides on the Ferris wheel exactly
  once, and every seat on the Ferris wheel contains two
  people? It turns out that this problem is solvable in
  polynomial time. We won't go into the algorithm here;
  the high level idea is that we start from some matching in
  which not every vertex is matched, and we successively
  improve it by matching two more vertices until we can
  improve it no longer. However, when the Ferris wheel
  seats allow three persons instead of two, the problem
  becomes hard.
How Do You Judge an Algorithm?
• Issues Related to the analysis of Algorithms:

   – How to measure the goodness of an algorithm?

   – How to measure the difficulty of a problem?

   – How do we know that an algorithm is optimal?
  The Complexity of an Algorithm
• The space complexity of a program is the amount of
  memory that it needs to run to completion.
   – Fixed space requirements: does not depend on the
     programs inputs and outputs -- usually ignored.
   – Variable space requirement: size depends on execution
     of program (recursion, dynamic allocated variables,

• The time complexity of a program is the amount of
  computer time that it needs to run a computation.
             Input (Problem) Size
• Input (problem) size and costs of operations: The size of
  an instance corresponds formally to the number of bits
  needed to represent the instance on a computer, using
  some precisely defined and reasonably compact coding
   – uniform cost function
   – logarithmic cost function T (n )  log n i Q(n 2  log n )

• Example: Compute x = n    n            i 1

  x :=1;                   uniform            logarithmic
  for i := 1 to n do     T(n) = Q(n) T(n) = Q(n2log n)
     x := x * n;         S(n) = Q(1) S(n) = Q(n  log n)
                                     S ( n )  log n n  Q(n  log n )
      Complexity of an Algorithm
• Best case analysis: too optimistic, not really useful.
• Worst case analysis: usually only yield a rough upper bound.
• Average case analysis: a probability distribution of input is
  assumed, and the average of the cost of all possible input
  patterns are calculated. However, it is usually difficult than
  worst case analysis and does not reflect the behavior of some
  specific data patterns.
• Amortized analysis: this is similar to average case analysis
  except that no probability distribution is assumed and it is
  applicable to any input pattern (worst case result).
• Competitive analysis: Used to measure the performance of
  an on-line algorithm w.r.t. an adversary or an optimal off-line
         Example: Binary Search
• Given a sorted array A[1..n] and an item x in A. What is
  the index of x in A?

• Usually, the best case analysis is the easiest, the worst
  case the second easiest, and the average analysis the
               Another Example
• Given a stack S with 2 operations: push(S, x), and
  multipop(S, k), the cost of the two operations are 1 and
  min(k, |S|) respectively. What is the cost of a sequence of
  n operations on an initially empty stack S?
   – Best case: n, 1 for each operation.
   – Worst case: O(n2), O(n) for each operation.
   – Average case: complicate and difficult to analyze.
   – Amortized analysis: 2n, 2 for each operation. (There
     are at most n push operations and hence at most n
     items popped out of the stack.)
       The Difficulty of a Problem
• Upper bound O(f(n)) means that for sufficiently large
  inputs, running time T(n) is bounded by a multiple of f(n)
• Existing algorithms (upper bounds).
• Lower bound W(f(n)) means that for sufficiently large n,
  there is at least one input of size n such that running time
  is at least a fraction of f(n) for any algorithm that solves
  the problem.
• The inherent difficulty  lower bound of algorithms
• The lower bound of a method to solve a problem is
  not necessary the lower bound of the problem.
• Sorting n elements into ascending order.
   – O(n2), O(nlog n), etc. -- Upper bounds.
   – O(n), O(nlog n), etc. -- Lower bounds.
   – Lower bound matches upper bound.

• Multiplication of 2 matrices of size n by n.
  – Straightforward algorithm: O(n3).
  – Strassen's algorithm: O(n2.81).
  – Best known sequential algorithm: O(n2.376) ?
  – Best known lower bound: W(n2)
  – The best algorithm for this problem is still open.
            Complexity Classes
• DSPACE(S(n)) [NSPACE(S(n))]: The classes of
  problems that can be solved by deterministic
  [nondeterministic] Turing machines using ≤ S(n) space.
• DTIME(T(n)) [NTIME(T(n))]: The classes of problems
  that can be solved by deterministic [nondeterministic]
  Turing machines using ≤ T(n) time.
   P  i0 DTIME(n ), NP  i0 NTIME(n )
                         i                         i

• Tractable problems: Problems in P.
• Intractable problems: Problem not known to be in P.
• Efficient algorithms: Algorithms in P.
          Complexity Classes
Assume P  NP
        co-NEXP                       NEXP

                  co-NP    P     NP
           NP-Complete Problems
• M. R. Garey, and D. S.

• Computers and Intractability:
  A Guide to the Theory of NP-

• W. H. Freeman and Company,
 Complexity of Algorithms and Problems
• Notations
 Symbol       Meaning
  P           a problem
  I           a problem instance
  In          the set of all problem instances of size
  A           an algorithm for P
  AP          the set of algorithms for problem P
  Pr(I)       probability of instance I
  CA(I)       cost of A with input I
  RA          the set of all possible versions of a randomized
              algorithm A
Formal Definitions
  Example: Complexity of the Sorting Problem
• Assume “comparison” is used to determine the order of keys.
  Comparison Tree Model
5, 12, 8          8, 5, 12
                 Average Case
• Can we do better for the average case? -- NO!
External Path Length of a Binary Tree

Shared By: