
									                     Discrete Mathematics

               University of Kentucky CS 275
                        Spring, 2007

                    Professor Craig C. Douglas

           Material Covered (Spring 2007)
Tuesday            Pages          Thursday          Pages
                                    1/11              1-9
 1/16               9-24            1/18            24-33
 1/23              34-45            1/25            46-52
 1/30              53-65             2/1           Exam 1
  2/6              66-73             2/8            74-83
 2/13              84-92            2/15            92-94
 2/20             95-106            2/22          106-115
 2/27             116-124            3/1           Exam 2
  3/6             125-132            3/8          No class
 3/13          Spring Break      3/15       Spring Break
 3/20             132-142           3/22          No class
 3/26             142-156           3/28           Exam 3
  4/3             157-169            4/5          170-177
 4/10             178-185           4/12          186-197
 4/17             198-210           4/19           Exam 4
 4/24             211-217           4/26        Rama: review
  5/1             No class           5/3       Final: 8-10 AM
          The final exam will cover Chapters 1-10.

                              Course Outline
 1.   Logic Principles
 2.   Sets, Functions, Sequences, and Sums
 3.   Algorithms, Integers, and Matrices
 4.   Induction and Recursion
 5.   Simple Counting Principles
 6.   Discrete Probability
 7.   Advanced Counting Principles
 8.   Relations
 9.   Graphs
10.   Trees
11.   Boolean Algebra
12.   Modeling Computation

                              Logic Principles
Basic values: T or F representing true or false, respectively. In a computer, T
and F may be represented by 1 or 0 bits.

Basic items:

   Propositions
      o Logic and Equivalences
   Truth tables
   Predicates
   Quantifiers
   Rules of Inference
   Proofs
      o Concrete, outlines, hand waving, and false

Definition: A proposition is a statement of a true or false fact (but not both).


   2+2 = 4 is a proposition because this is a fact.
   x+1 = 2 is not a proposition unless a specific value of x is stated.

Definition: The negation of a proposition p, denoted by ¬p and pronounced not
p, means that, “it is not the case that p.” The truth values for ¬p are the opposite
for p.


   p: Today is Thursday, ¬p: Today is not Thursday.
   p: At least a foot of snow falls in Boulder on Fridays. ¬p: Less than a foot
    of snow falls in Boulder on Fridays.

Definition: The conjunction of propositions p and q, denoted p∧q, is true if
both p and q are true, otherwise false.

Definition: The disjunction of propositions p and q, denoted p∨q, is true if
either p or q is true, otherwise false.

Definition: The exclusive or of propositions p and q, denoted p⊕q, is true if
only one of p and q is true, otherwise false.

Truth tables:

        p           ¬p           q           p∧q        p∨q    p⊕q
        T            F           T            T          T      F
        T            F           F            F          T      T
        F            T           T            F          T      T
        F            T           F            F          F      F
    The truth table for p and ¬p alone is really a 2×2 table.
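The table above can be checked mechanically; as a sketch, Python's `and`, `or`, and `!=` play the roles of conjunction, disjunction, and exclusive or:

```python
# A quick check of the truth table above using Python's bool operators.
for p in (True, False):
    for q in (True, False):
        conj = p and q   # p AND q: true only when both are true
        disj = p or q    # p OR q:  false only when both are false
        xor = p != q     # p XOR q: true when exactly one is true
        print(p, q, conj, disj, xor)
```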

Concepts so far can be extended to Boolean variables and Bit strings.

Definition: A bit is a binary digit. Hence, it has two possible values: 0 and 1.

Definition: A bit string is a sequence of zero or more bits. The length of a bit
string is the number of bits.

Definition: The bitwise operators OR, AND, and XOR are defined based on
∨, ∧, and ⊕, bit by bit in a bit string.


     010111 is a bit string of length 6
     010111 OR 110000 = 110111
     010111 AND 110000 = 010000
     010111 XOR 110000 = 100111
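The same bitwise examples can be reproduced with Python integers (a sketch; `int(s, 2)` parses a bit string and `format(x, "06b")` renders six bits):

```python
# The bitwise examples above, on 6-bit strings.
a = int("010111", 2)
b = int("110000", 2)
print(format(a | b, "06b"))   # OR  -> 110111
print(format(a & b, "06b"))   # AND -> 010000
print(format(a ^ b, "06b"))   # XOR -> 100111
```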

Definition: The conditional statement is an implication, denoted p→q, and is
false when p is true and q is false, otherwise it is true. In this case p is known
as the hypothesis (or antecedent or premise) and q is known as the conclusion
(or consequence).

Definition: The biconditional statement is a bi-implication, denoted p↔q, and
is true if and only if p and q have the same truth values.

Truth tables:

                    p           q          p→q         p↔q
                    T           T           T           T
                    T           F           F           F
                    F           T           T           F
                    F           F           T           T
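These two connectives are easy to model directly; a sketch (the helper name `implies` is chosen here, not from the text), using the equivalence p→q ≡ ¬p∨q and the fact that p↔q is just equality of truth values:

```python
# The conditional and biconditional, reproduced in Python.
def implies(p, q):
    return (not p) or q   # p -> q is false only when p=T and q=F

for p in (True, False):
    for q in (True, False):
        print(p, q, implies(p, q), p == q)   # p, q, p->q, p<->q
```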

 We can compound logical operators to make complicated propositions. In
 general, using parentheses makes the expressions clearer, even though more
 symbols are used. However, there is a well defined operator precedence
 accepted in the field. Lower numbered operators take precedence over higher
 numbered operators.

                           Operator Precedence
                              ¬          1
                              ∧          2
                              ∨          3
                              →          4
                              ↔          5


    ¬pq = (¬p) q
    pqr = (pq) r

Definition: A compound proposition that is always true is a tautology. One that
is always false is a contradiction. One that is neither is a contingency.


                    p         ¬p          p∧¬p        p∨¬p
                    T          F            F           T
                    F          T            F           T
                 contingencies        contradiction  tautology

Definition: Compound propositions p and q are logically equivalent if p↔q is a
tautology and is denoted p≡q (sometimes written as p⇔q instead).

Theorem: ¬(p∨q) ≡ ¬p∧¬q.
Proof: Construct a truth table.

          p           q        ¬(p∨q)         ¬p           ¬q     ¬p∧¬q
          T           T          F             F            F        F
          T           F          F             F            T        F
          F           T          F             T            F        F
          F           F          T             T            T        T

Theorem: ¬(p∧q) ≡ ¬p∨¬q.
Proof: Construct a truth table similar to the previous theorem.

These two theorems are known as DeMorgan’s laws and can be extended to any
number of propositions:
                   ¬(p1p2…pk)  ¬ p1  ¬ p2  …  ¬ pk
                   ¬(p1p2…pk)  ¬ p1  ¬ p2  …  ¬ pk
Theorem: pq  ¬pq.

Proof: Construct a truth table.

               p           q       p→q           ¬p          ¬p∨q
               T           T        T            F            T
               T           F        F            F            F
               F           T        T            T            T
               F           F        T            T            T

These proofs are examples of concrete proofs, carried out by an exhaustive
search of all possibilities. As the number of propositions grows, the number of
possibilities grows like 2^k for k propositions.
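The exhaustive-search strategy is easy to automate; as a sketch, the helper `equivalent` (a name chosen here, not from the text) tests all 2^k truth assignments:

```python
# Concrete proof by exhaustive search over all 2**k truth assignments.
from itertools import product

def equivalent(f, g, k):
    """True if compound propositions f and g agree on every assignment."""
    return all(f(*vals) == g(*vals) for vals in product((True, False), repeat=k))

# DeMorgan: not (p and q)  ==  (not p) or (not q)
assert equivalent(lambda p, q: not (p and q),
                  lambda p, q: (not p) or (not q), 2)
# Distributive: p or (q and r)  ==  (p or q) and (p or r)
assert equivalent(lambda p, q, r: p or (q and r),
                  lambda p, q, r: (p or q) and (p or r), 3)
```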

The distributive laws are an example when k=3.

Theorem: p∨(q∧r) ≡ (p∨q)∧(p∨r).
Proof: Construct a truth table.

    p          q           r       p∨(q∧r)      p∨q         p∨r   (p∨q)∧(p∨r)
    T          T           T          T           T           T         T
    T          T           F          T           T           T         T
    T          F           T          T           T           T         T
    T          F           F          T           T           T         T
    F          T           T          T           T           T         T
    F          T           F          F           T           F         F
    F          F           T          F           F           T         F
    F          F           F          F           F           F         F

Theorem: p∧(q∨r) ≡ (p∧q)∨(p∧r).
Proof: Construct a truth table similar to the previous theorem.

Some well known logical equivalences include the following laws:

                                                   Law
                         p∧T ≡ p                   Identity
                         p∨T ≡ T                   Domination
                         p∨p ≡ p                   Idempotent
                         ¬(¬p) ≡ p                 Double negation
                         p∨¬p ≡ T                  Negation
                         p∧¬p ≡ F
                         p∨q ≡ q∨p                 Commutative
                         (p∨q)∨r ≡ p∨(q∨r)         Associative
                         (p∧q)∧r ≡ p∧(q∧r)
                         p∨(q∧r) ≡ (p∨q)∧(p∨r)     Distributive
                         p∧(q∨r) ≡ (p∧q)∨(p∧r)
                         ¬(p∧q) ≡ ¬p∨¬q            DeMorgan
                         ¬(p∨q) ≡ ¬p∧¬q
                         p∨(p∧q) ≡ p               Absorption

All of these laws can be proven concretely using truth tables. It is a good
exercise to see if you can prove some.

Well known logical equivalences involving conditional statements:
                                pq  ¬pq
                               pq  ¬q¬p
                                pq  ¬pq
                              pq  ¬(p¬q)
                              ¬(pq)  p¬q
                          (pq)(pr)  p(qr)
                          (pr)(qr)  (pq)r
                          (pq)(pr)  p(qr)
                          (pr)(qr)  (pq)r

Well known logical equivalences involving biconditional statements:
                           pq  (pq)(qp)
                               pq  ¬p¬q
                          pq  (pq)  (¬p¬q)
                              ¬(pq)  p¬q

Propositional logic is pretty limited. Almost anything you really are interested in
requires a more sophisticated form of logic: predicate logic with quantifiers (or
predicate calculus).

Definition: P(x) is a propositional function when substituting a specific value
for x in the expression P(x) gives us a proposition. The part of the expression
referring to x is known as the predicate.


   P(x): x > 24. P(2) = F, P(102) = T.
   P(x): x = y + 1. P(x) = T for one value of x only (y is a free variable).
   P(x,y): x = y + 1. P(2,1) = T, P(102,-14) = F.

Definition: A statement of the form P(x1,x2,…,xn) is the value of the
propositional function P at the n-tuple (x1,x2,…,xn). P is also known as an
n-place (or n-ary) predicate.

Definition: The universal quantification of P(x) is the statement that P(x) is
true for all values of x in some domain, denoted by ∀x P(x).

Definition: The existential quantification of P(x) is the statement that P(x) is
true for at least one value of x in some domain, denoted by ∃x P(x).

Definition: The uniqueness quantification of P(x) is the statement that P(x) is
true for exactly one value of x in some domain, denoted by ∃!x P(x).

There is an infinite number of quantifiers that can be constructed, but the three
above are among the most important and common.

Examples: Assume x belongs to the real numbers.

   ∀x<0 (x^2 > 0). The negative real numbers form the domain.
   ∃!x (x^1223 = 0).

∀ and ∃ have higher precedence than the logical operators.

Example: x P(x)Q(x) means (x P(x))Q(x).

Definition: When a variable is used in a quantification, it is said to be bound.
Otherwise the variable is free.

Example: x (x = y + 1).

Definition: Statements involving predicates and quantifiers are logically
equivalent if and only if they have the same truth value independent of which
predicates are substituted and which domains are used. Notation: S ≡ T.

DeMorgan’s Laws for Negation:

   ¬∀x P(x) ≡ ∃x ¬P(x).
   ¬∃x P(x) ≡ ∀x ¬P(x).
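On a finite domain these laws can be checked directly; a sketch where `all` plays the role of ∀ and `any` the role of ∃:

```python
# DeMorgan's laws for quantifiers on a finite domain.
domain = range(-3, 4)
P = lambda x: x > 0

# not (forall x P(x))  ==  exists x (not P(x))
assert (not all(P(x) for x in domain)) == any(not P(x) for x in domain)
# not (exists x P(x))  ==  forall x (not P(x))
assert (not any(P(x) for x in domain)) == all(not P(x) for x in domain)
```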
Nested quantifiers just mean that more than one quantifier appears in a
statement. The order of quantifiers is important.

Examples: Assume x and y belong to the real numbers.
   xy (x + y = 0).
   xy (x < 0)  (y > 0)  xy < 0.

Quantification of two variables:

Statement        When True?                       When False?
∀x∀y P(x,y)      For all x and y, P(x,y)=T.       There is a pair x and y such that
                                                  P(x,y)=F.
∀x∃y P(x,y)      For all x there is a y such      There is an x such that for all y,
                 that P(x,y)=T.                   P(x,y)=F.
∃x∀y P(x,y)      There is an x such that for      For all x there is a y such that
                 all y, P(x,y)=T.                 P(x,y)=F.
∃x∃y P(x,y)      There is a pair x and y          For all x and y, P(x,y)=F.
                 such that P(x,y)=T.
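That the order of nested quantifiers matters can be seen on a small finite domain; a sketch using nested `all`/`any` for the predicate x + y = 0:

```python
# Order of nested quantifiers matters: forall x exists y vs exists x forall y.
domain = range(-2, 3)
forall_exists = all(any(x + y == 0 for y in domain) for x in domain)
exists_forall = any(all(x + y == 0 for y in domain) for x in domain)
print(forall_exists, exists_forall)   # True False
```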
Rules of Inference are used instead of truth tables in many instances. For n
variables, there are 2^n rows in a truth table, which gets out of hand quickly.

Definition: A propositional logic argument is a sequence of propositions. The
last proposition is the conclusion. The earlier ones are the premises. An
argument is valid if the truth of the premises implies the truth of the conclusion.

Definition: A propositional logic argument form is a sequence of compound
propositions involving propositional variables. An argument form is valid if no
matter what particular propositions are substituted for the proposition variables
in its premises, the conclusion remains true if the premises are all true.

Translation: An argument form with premises p1, p2, …, pn and conclusion q is
valid when (p1∧p2∧…∧pn) → q is a tautology.

There are eight basic rules of inference.

Rule of Inference          Tautology                     Name
p, p→q     ∴ q             [p∧(p→q)] → q                 Modus ponens
¬q, p→q    ∴ ¬p            [¬q∧(p→q)] → ¬p               Modus tollens
p→q, q→r   ∴ p→r           [(p→q)∧(q→r)] → (p→r)         Hypothetical syllogism
p∨q, ¬p    ∴ q             [(p∨q)∧¬p] → q                Disjunctive syllogism
p          ∴ p∨q           p → (p∨q)                     Addition
p∧q        ∴ p             (p∧q) → p                     Simplification
p, q       ∴ p∧q           [(p)∧(q)] → (p∧q)             Conjunction
p∨q, ¬p∨r  ∴ q∨r           [(p∨q)∧(¬p∨r)] → (q∨r)        Resolution

Rules of Inference for Quantified Statements:

Rule of Inference                                          Name
∀x P(x)  ∴  P(c)                                           Universal instantiation
P(c) for an arbitrary c  ∴  ∀x P(x)                        Universal generalization
∀x (P(x) → Q(x)) and P(a), where a is a
particular element in the domain  ∴  Q(a)                  Universal modus ponens
∀x (P(x) → Q(x)) and ¬Q(a), where a is a
particular element in the domain  ∴  ¬P(a)                 Universal modus tollens
∃x P(x)  ∴  P(c) for some c                                Existential instantiation
P(c) for some c  ∴  ∃x P(x)                                Existential generalization

                   Sets, Functions, Sequences, and Sums
Definition: A set is an unordered collection of distinct elements.


     Z = {…, -3, -2, -1, 0, 1, 2, 3, …}
     N = {1, 2, 3, …} and N0 = {0, 1, 2, 3, …} (slightly different than the text)
     Q = {p/q | p,q∈Z, q≠0}
     R = {reals}

Definition: The cardinality of a set S is denoted |S|. If |S| = n, where n∈Z, then
the set S is a finite set. Otherwise it is an infinite set (|S| = ∞).

Example: The cardinality of Z, N, N0, Q, and R is infinite.

Definition: If |S| = |N|, then S is a countable set. Otherwise it is an
uncountable set.

   Q is countable.
   R is uncountable.

Definition: Two sets S and T are equal, denoted S = T, if and only if
∀x(x∈S ↔ x∈T).

   Let S = {0, 1, 2} and T = {2, 0, 1}. Then S = T. Order does not count.
   Let S = {0, 1, 2} and T = {0, 1, 3}. Then S  T. Only the elements count.

Definition: The empty set is denoted by ∅. Note that ∀S(∅⊆S).

Definition: A set S is a subset of a set T if ∀x∈S(x∈T) and is denoted S⊆T. S is
a proper subset of T if S⊆T, but S≠T, and is denoted S⊂T.

Example: S = {1, 0} and T = {0, 1, 2}. Then S⊂T.

Theorem: S(SS).
Proof: By definition, xS(xS).

Definition: The Power Set of a set S, denoted P(S), is the set of all possible
subsets of S.

Theorem: If |S| = n, then |P(S)| = 2^n.

Example: S = {0, 1}. Then P(S) = {∅, {0}, {1}, {0,1}}.
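A power set can be enumerated directly; a sketch (the helper name `power_set` is chosen here) using `itertools.combinations`, confirming |P(S)| = 2^n:

```python
# Enumerate the power set of S by taking subsets of every size r = 0..n.
from itertools import combinations

def power_set(s):
    s = list(s)
    return [set(c) for r in range(len(s) + 1) for c in combinations(s, r)]

ps = power_set({0, 1})
print(ps)                 # [set(), {0}, {1}, {0, 1}]
assert len(ps) == 2 ** 2  # |P(S)| = 2**n
```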

Definition: The Cartesian product of n sets Ai is defined by ordered elements
from the Ai and is denoted A1×A2×…×An = {(a1,a2,…,an) | ai∈Ai}.

Example: Let S = {0, 1} and T = {a, b}. Then S×T = {(0,a), (0,b), (1,a), (1,b)}.

Definition: The union of n sets Ai is defined by
                     ⋃_{i=1}^n Ai = A1∪A2∪…∪An = {x | ∃i x∈Ai}.

Definition: The intersection of n sets Ai is defined by
                     ⋂_{i=1}^n Ai = A1∩A2∩…∩An = {x | ∀i x∈Ai}.

Definition: n sets Ai are disjoint if A1∩A2∩…∩An = ∅.

Definition: The complement of set S with respect to T, denoted T-S, is defined
by T-S = {x∈T | x∉S}. T-S is also called the difference of T and S.

Definitions: The universal set is denoted U. The universal complement of S is
S̄ = U-S.


  Let S = {1, 0} and T = {0, 1, 2}. Then
     o S⊂T.
     o S∩T = S.
     o S∪T = T.
     o T-S = {2}.
     o Let U = N0. Then S̄ = {2, 3, …}.
  Let S = {0, 1} and T = {2, 3}. Then
     o S⊄T.
     o S∩T = ∅.
     o S∪T = {0, 1, 2, 3}.
     o T-S = {2, 3}.
     o Let U = R. Then S̄ is the set of all reals except the integers 0 and 1, i.e.,
        S̄ = {x∈R | x≠0 ∧ x≠1}.

The textbook has a large number of set identities in a table.

Identity                                                        Law(s)
A∪∅ = A, A∩U = A                                                Identity
A∪U = U, A∩∅ = ∅                                                Domination
A∪A = A, A∩A = A                                                Idempotent
(Ā)‾ = A                                                        Complementation
A∪B = B∪A, A∩B = B∩A                                            Commutative
A∪(B∪C) = (A∪B)∪C, A∩(B∩C) = (A∩B)∩C                            Associative
A∪(B∩C) = (A∪B)∩(A∪C),                                          Distributive
A∩(B∪C) = (A∩B)∪(A∩C)
(A∩B)‾ = Ā∪B̄, (A∪B)‾ = Ā∩B̄                                      DeMorgan
A∪(A∩B) = A, A∩(A∪B) = A                                        Absorption
A∪Ā = U, A∩Ā = ∅                                                Complement

Many of these are simple to prove from very basic laws.

Definition: A function f:A→B maps a set A to a set B, denoted f(a) = b for a∈A
and b∈B, where the mapping (or transformation) is unique.

Definition: If f:A→B, then

   If ∀b∈B ∃a∈A (f(a) = b), then f is a surjective function or onto.
   If f(a) = f(b) implies a = b for all a,b∈A, then f is one-to-one (1-1) or injective.
   A function f is a bijection or a one-to-one correspondence if it is 1-1 and onto.
Definition: Let f:A→B. A is the domain of f. The minimal set B such that
f:A→B is onto is the image of f.

Definitions: Some compound functions include

    (Σ_{i=1}^n f_i)(a) = Σ_{i=1}^n f_i(a). We can substitute + if we expand the summation.
    (Π_{i=1}^n f_i)(a) = Π_{i=1}^n f_i(a). We can substitute * if we expand the product.

Definition: The composition of n functions fi: Ai→Ai+1 is defined by
                     (f1∘f2∘…∘fn)(a) = f1(f2(…(fn(a))…)),
where a∈A1.

Definition: If f: A→B, then the inverse of f, denoted f^{-1}: B→A, exists if and
only if ∀b∈B ∃a∈A (f(a) = b ↔ f^{-1}(b) = a).

   Let A = [0,1] ⊂ R, B = [0,2] ⊂ R.
      o f(a) = a^2 and g(a) = a+1. Then f+g: A→B and f*g: A→B.
      o f(a) = 2*a and g(a) = a-1. Then neither f+g: A→B nor f*g: A→B.
   Let B = A = [0,1] ⊂ R.
      o f(a) = a^2 and g(a) = 1-a. Then f+g: A→A and f*g: A→A. Both
        compound functions are bijections.
      o f(a) = a^3 and g(a) = a^{1/3}. Then g∘f(a): A→A is a bijection.
   Let A = [-1, 1] and B = [0, 1]. Then
      o f(a) = a^3 and g(a) = {x>0 | x = a^{1/3}}. Then g∘f(a): A→B is onto.

Definition: The graph of a function f is {(a,f(a)) | aA}.

Example: A = {0, 1, 2, 3, 4, 5} and f(a) = a^2. Then

           (a) graph(f,A)                (b) an approximation to graph(f,[0,5])

Definitions: The floor and ceiling functions are defined by

             x = largest integer smaller or equal to x.
             x = smallest integer larger or equal to x.


   2.99 = 2, 2.99 = 3
   -2.99 = -3, -2.99 = -2

Definition: A sequence is a function from either N or a subset of N to a set A
whose elements ai are the terms of the sequence.

Definitions: A geometric progression is a sequence of the form {ar^i, i=0, 1, …}.
An arithmetic progression is a sequence of the form {a+id, i=0, 1, …}.

Translation: f(a,r,i) = ar^i and f(a,d,i) = a + id are the corresponding functions.

There are a number of interesting summations that have closed form solutions.

Theorem: If a,r∈R, then

    Σ_{i=0}^n ar^i = (n+1)a                  if r = 1,
                   = (ar^{n+1} - a)/(r - 1)  otherwise.

Proof: If r = 1, then we are left summing a n+1 times. Hence, the r = 1 case is
trivial. Suppose r ≠ 1. Let S = Σ_{i=0}^n ar^i. Then

    rS = r Σ_{i=0}^n ar^i                    Substituting the formula for S.
       = Σ_{i=1}^{n+1} ar^i                  Simplifying.
       = Σ_{i=0}^n ar^i + (ar^{n+1} - a)     Removing the n+1 term and adding the 0 term.
       = S + (ar^{n+1} - a)                  Substituting S for the formula.

Solve for S in rS = S + (ar^{n+1} - a) to get the desired formula. qed
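As a numeric sanity check, the closed form can be compared against the direct sum (the helper name `geometric_sum` is chosen here, not from the text):

```python
# Check the geometric-sum closed form against a direct summation.
def geometric_sum(a, r, n):
    if r == 1:
        return (n + 1) * a
    return (a * r ** (n + 1) - a) / (r - 1)

a, r, n = 3, 2, 10
assert geometric_sum(a, r, n) == sum(a * r ** i for i in range(n + 1))
assert geometric_sum(5, 1, 4) == 25   # r = 1 case: (n+1)*a
```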
Some other common summations with closed form solutions are

                     Sum                          Closed Form Solution
                     Σ_{i=1}^n i                  n(n+1)/2
                     Σ_{i=1}^n i^2                n(n+1)(2n+1)/6
                     Σ_{i=1}^n i^3                n^2(n+1)^2/4
                     Σ_{i=0}^∞ x^i, |x|<1         (1-x)^{-1}
                     Σ_{i=1}^∞ ix^{i-1}, |x|<1    (1-x)^{-2}

Proving some of these requires knowledge about limits. There are close ties to
integral and differential calculus, which is no surprise since integration is
summation taken to a limit.

Example: lim_{i→∞} i·x^i = 0 when |x|<1. Using the Theorem on the previous
page, we get the result for Σ_{i=0}^∞ x^i, |x|<1.

Definition: Let f and g be functions from either Z or R to R. Then f(x) is O(g(x))
if there are constants C and k such that |f(x)| ≤ C|g(x)| whenever x > k.

Pronunciation: f(x) is Big Oh of g(x).


   f(x) = x^2+2x is O(x^2)
      o When 0 ≤ x ≤ 1, x^2 ≤ x, so 0 ≤ x^2+2x ≤ x+2x = 3x
      o When x ≥ 1, x ≤ x^2, so 0 ≤ x^2+2x ≤ x^2+2x^2 = 3x^2
   In general, f(x) = Σ_{i=0}^n a_i x^i with a_n ≠ 0 is O(x^n) when x ≥ 1.
   n! is O(n^n) when n ≥ 1.
      o n! = 1·2·…·n ≤ n·n·…·n = n^n.
   log(n!) is O(nlogn) when n ≥ 1.
      o log(n!) ≤ log(n^n) = nlog(n)
   log(n) is O(n) when n ≥ 1.

Theorem: If f_i(x) is O(g_i(x)), for 1 ≤ i ≤ n, then Σ_{i=1}^n f_i(x) is
O(max{|g_1(x)|, |g_2(x)|, …, |g_n(x)|}).

Proof: Let g(x) = max{|g_1(x)|, |g_2(x)|, …, |g_n(x)|} and C_i the constants
associated with O(g_i(x)). Then

    |Σ_{i=1}^n f_i(x)| ≤ Σ_{i=1}^n C_i|g_i(x)| ≤ Σ_{i=1}^n C_i|g(x)| = |g(x)| Σ_{i=1}^n C_i = C|g(x)|.

Theorem: If f_i(x) is O(g_i(x)), for 1 ≤ i ≤ n, then Π_{i=1}^n f_i(x) is
O(Π_{i=1}^n |g_i(x)|).

Proof: Let g(x) = |g_1(x)||g_2(x)|…|g_n(x)| and C_i the constants associated
with O(g_i(x)). Then

    |Π_{i=1}^n f_i(x)| ≤ Π_{i=1}^n C_i|g_i(x)| = C Π_{i=1}^n |g_i(x)|, where C = Π_{i=1}^n C_i.

Definition: Let f and g be functions from either Z or R to R. Then f(x) is
Ω(g(x)) if there are constants C and k such that |f(x)| ≥ C|g(x)| whenever x > k.

Definition: Let f and g be functions from either Z or R to R. Then f(x) is Θ(g(x))
if f(x) = O(g(x)) and f(x) = Ω(g(x)). In this case, we say that f(x) is of order g(x).

Comment: f(x) = O(g(x)) notation is great in the limit, but does not always
provide the right bounds for all values of x. Ω, pronounced Big Omega, is used to
provide lower bounds. Θ, pronounced Big Theta, is used to provide both lower and
upper bounds.
Example: f(x) = Σ_{i=0}^n a_i x^i with a_n ≠ 0 is of order x^n.

Notation: Timing, as a function of the number of elements, falls into the field of
computational complexity.
                     Complexity              Terminology
                     Θ(1)                    Constant
                     Θ(log(n))               Logarithmic
                     Θ(n)                    Linear
                     Θ(nlog(n))              nlog(n)
                     Θ(n^k)                  Polynomial
                     Θ(n^k log(n))           Polylog
                     Θ(k^n), where k>1       Exponential
                     Θ(n!)                   Factorial

Notation: Problems are tractable if they can be solved in polynomial time and
are intractable otherwise.

                  Algorithms, Integers, and Matrices
Definition: An algorithm is a finite set of precise instructions for solving a
problem.
Computational algorithms should have these properties:

   Input: Values from a specified set.
   Output: Results using the input from a specified set.
   Definiteness: The steps in the algorithm are precise.
   Correctness: The output produced from the input is the right solution.
   Finiteness: The results are produced using a finite number of steps.
   Effectiveness: Each step must be performable in a finite amount of time.
   Generality: The procedure should accept all input from the input set, not
    just special cases.

                                                      
Algorithm: Find the maximum value of {a_i}_{i=1}^n, where n is finite.

    procedure max({a_i}_{i=1}^n : integers)
       max := a1
       for i := 2 to n
             if max < ai then max := ai
    {max is the largest element}

Proof of correctness: We use induction.
1. Suppose n = 1, then max := a1, which is the correct result.
2. Suppose the result is true for k = 1, 2, …, i-1. Then at step i, we know that
    max is the largest element in a1, a2, …, ai-1. In the if statement, either max is
    already larger than ai or it is set to ai. Hence, max is the largest element in
    a1, a2, …, ai. Since i was arbitrary, we are done.        qed

This algorithm’s input and output are well defined and the overall algorithm can
be performed in O(n) time since n is finite. There are no restrictions on the input
set other than the elements are integers.
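As a sketch, the max procedure transcribes directly into Python (0-indexed lists; the name `find_max` is chosen here, not from the text):

```python
# The max procedure: scan once, keeping the largest element seen so far.
def find_max(a):
    m = a[0]                      # max := a1
    for i in range(1, len(a)):    # for i := 2 to n
        if m < a[i]:              # if max < ai then max := ai
            m = a[i]
    return m                      # max is the largest element

assert find_max([3, 1, 4, 1, 5, 9, 2, 6]) == 9
```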

                                                                     
Algorithm: Find a value in a sorted, distinct valued {a_i}_{i=1}^n, where n is
finite.

There are many, many search algorithms.

    procedure linear_search(x, {a_i}_{i=1}^n : integers)
         i := 1
         while (i ≤ n and x ≠ ai)
              i := i + 1
         if i ≤ n then location := i else location := 0
         {location is the subscript of {a_i}_{i=1}^n equal to x, or 0 if x is
          not in {a_i}_{i=1}^n}

We can prove that this algorithm is correct using an induction argument. This
algorithm relies on neither distinctness nor sortedness of the elements.

Linear search works, but it is very slow in comparison to many other searching
algorithms. It takes 2n+2 comparisons in the worst case, i.e., O(n) time.
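A Python sketch of linear_search (0-indexed, so -1 stands in for the pseudocode's "location := 0" when x is absent):

```python
# linear_search: scan left to right until x is found or the list is exhausted.
def linear_search(x, a):
    i = 0
    while i < len(a) and x != a[i]:
        i += 1
    return i if i < len(a) else -1

assert linear_search(7, [4, 7, 9]) == 1
assert linear_search(5, [4, 7, 9]) == -1
```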

                                         
    procedure binary_search(x, {a_i}_{i=1}^n : integers)
           i := 1
           j := n
           while ( i < j )
                m := ⌊(i+j)/2⌋
                if x > am then i := m+1 else j := m
           if x = ai then location := i else location := 0
           {location is the subscript of {a_i}_{i=1}^n equal to x, or 0 if x is
            not in {a_i}_{i=1}^n}

We can prove that this algorithm is correct using an induction argument.

This algorithm is much, much faster than linear_search on average. It is
O(log n) in time. The average time to find a member of {a_i}_{i=1}^n can be
proven to be of order log n.
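A Python sketch of binary_search (0-indexed; the input must be sorted in increasing order, and -1 stands in for "location := 0"):

```python
# binary_search: halve the candidate range [i, j] until one index remains.
def binary_search(x, a):
    i, j = 0, len(a) - 1
    while i < j:
        m = (i + j) // 2          # m := floor((i+j)/2)
        if x > a[m]:
            i = m + 1             # x, if present, lies in the upper half
        else:
            j = m                 # x, if present, lies in the lower half
    return i if a and a[i] == x else -1

assert binary_search(9, [1, 3, 5, 9, 11]) == 3
assert binary_search(4, [1, 3, 5, 9, 11]) == -1
```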

                                                           
Algorithm: Sort the distinct valued {a_i}_{i=1}^n into increasing order, where
n is finite.

There are many, many sorting algorithms.

     procedure bubble_sort({a_i}_{i=1}^n : reals, n ≥ 1)
          for i := 1 to n-1
               for j := 1 to n-i
                    if aj > aj+1 then swap aj and aj+1
          {{a_i}_{i=1}^n is in increasing order}

This is one of the simplest sorting algorithms. It is expensive, but quite easy
to understand and implement. Only one temporary is needed for the swapping and
two loop variables as extra storage. The worst case time is O(n^2).
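A Python sketch of bubble_sort, following the double loop above (in place, 0-indexed):

```python
# bubble_sort: repeatedly swap adjacent out-of-order pairs; after pass i,
# the i largest elements have bubbled to the end of the list.
def bubble_sort(a):
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]   # swap aj and aj+1
    return a

assert bubble_sort([5, 2, 4, 1, 3]) == [1, 2, 3, 4, 5]
```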

                                          
    procedure insertion_sort(a1,…,an: reals, n ≥ 1)
         for j := 2 to n
              i := 1
              while aj > ai
                   i := i + 1
              t := aj
              for k := 0 to j-i-1
                   aj-k := aj-k-1
              ai := t
         {a1,…,an is in increasing order}

This is not a very efficient sorting algorithm either. However, it is easy to see
that the jth step puts the jth element into the correct spot. The worst case
time is O(n^2). In fact, insertion_sort is trivially slower than bubble_sort.
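A Python sketch of the same insertion strategy (illustrative only): find the insertion point for aj among the already-sorted prefix, shift, and insert.

```python
def insertion_sort(a):
    """For each j, insert a[j] into its place among the sorted a[0..j-1]."""
    for j in range(1, len(a)):
        t = a[j]
        i = 0
        while a[i] < t:            # find the first position with a[i] >= t
            i += 1
        a[i + 1:j + 1] = a[i:j]    # shift a[i..j-1] right by one slot
        a[i] = t
    return a

print(insertion_sort([10, 4, 7, 1]))   # [1, 4, 7, 10]
```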

Number theory is a rich field of mathematics. We will study four aspects briefly:

    1.   Integers and division
    2.   Primes and greatest common divisors
    3.   Integers and algorithms
    4.   Applications of number theory

Most of the theorems quoted in this part of the textbook require knowledge of
mathematical induction to rigorously prove, a topic covered in detail in the next
chapter. 

Definition: If a,b∈Z and a≠0, we say that a divides b if ∃c∈Z(b=ac), denoted by
a | b. When a divides b, we say a is a factor of b and b is a multiple of a.
When a does not divide b, we write a ∤ b.

Theorem: Let a,b,c∈Z. Then
1. If a | b and a | c, then a | (b+c).
2. If a | b, then a | (bc).
3. If a | b and b | c, then a | c.

Proof: Since a | b, ∃s∈Z(b=as).
1. Since a | c it follows that ∃t∈Z(c=at). Hence, b+c = as + at = a(s+t).
    Therefore, a | (b+c).
2. bc = (as)c = a(sc). Therefore, a | (bc).
3. Since b | c it follows that ∃t∈Z(c=bt). c = bt = (as)t = a(st). Therefore, a | c.

Corollary: Let a,b,c∈Z. If a | b and a | c, then a | (mb+nc) for all m,n∈Z.

Theorem (Division Algorithm): Let a,d∈Z(d > 0). Then ∃!q,r∈Z(a = dq+r and 0 ≤ r < d).

Definition: In the division algorithm, a is the dividend, d is the divisor, q is the
quotient, and r is the remainder. We write q = a div d and r = a mod d.


•   Consider 101 divided by 9: 101 = 11·9 + 2.
•   Consider -11 divided by 3: -11 = 3·(-4) + 1.

Definition: Let a,b,m∈Z(m > 0). Then a is congruent to b modulo m if m | (a-b),
denoted a ≡ b (mod m). The set of integers congruent to an integer a modulo m
is called the congruence class of a modulo m.

Theorem: Let a,b,m∈Z(m > 0). Then a ≡ b (mod m) if and only if a mod m = b
mod m.


•   Does 17 ≡ 5 (mod 6)? Yes, since 17 - 5 = 12 and 6 | 12.
•   Does 24 ≡ 14 (mod 6)? No, since 24 - 14 = 10, which is not divisible by 6.

Theorem: Let a,b,m∈Z(m > 0). Then a ≡ b (mod m) if and only if there is a k∈Z
such that a = b + km.

Proof: If a ≡ b (mod m), then m | (a-b). So, there is a k such that a-b = km, or a =
b+km. Conversely, if there is a k such that a = b + km, then km = a-b. Hence, m
| (a-b), or a ≡ b (mod m).

Theorem: Let a,b,c,d,m∈Z(m > 0). If a ≡ b (mod m) and c ≡ d (mod m), then
a+c ≡ b+d (mod m) and ac ≡ bd (mod m).

Corollary: Let a,b,m∈Z(m > 0). Then (a+b) mod m = ((a mod m)+(b mod m))
mod m and (ab) mod m = ((a mod m)(b mod m)) mod m.
Some applications involving congruence include

•   Hashing functions h(k) = k mod m.
•   Pseudorandom numbers: xn+1 = (axn+c) mod m.
      o c = 0 is known as a pure multiplicative generator.
      o c ≠ 0 is known as a linear congruential generator.
•   Cryptography
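As a small illustration of the pseudorandom recurrence above, here is a linear congruential generator in Python with made-up toy parameters (real generators use carefully chosen large a, c, and m):

```python
def lcg(a, c, m, x0):
    """Generate the sequence x_{n+1} = (a*x_n + c) mod m forever."""
    x = x0
    while True:
        x = (a * x + c) % m
        yield x

gen = lcg(a=7, c=4, m=9, x0=3)
print([next(gen) for _ in range(5)])   # [7, 8, 6, 1, 2]
```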

Definition: An integer a > 1 is a prime if it is divisible only by 1 and a. It is a
composite otherwise.

Fundamental Theorem of Arithmetic: Every positive integer greater than 1 can
be written uniquely as a prime or the product of two or more primes where the
prime factors are written in nondecreasing order.

Theorem: If a is a composite number, then a has a prime divisor less than or
equal to √a.

Theorem: There are infinitely many primes.

Prime Number Theorem: The ratio of the number of primes not exceeding a to
a/ln(a) approaches 1 as a → ∞.

Example: The odds of a randomly chosen positive integer n being prime is given
by (n/ln(n))/n = 1/ln(n) asymptotically.

There are still a number of open questions regarding the distribution of primes.

Definition: Let a,b∈Z(a and b not both 0). The largest integer d such that d | a
and d | b is the greatest common divisor of a and b, denoted by gcd(a,b).

Example: gcd(24,36) = 12.

Definition: The integers a and b are relatively prime if gcd(a,b) = 1.

                               
Definition: The integers a1,…,an are pairwise relatively prime if gcd(ai,aj) = 1
whenever 1 ≤ i < j ≤ n.


•   {10, 17, 121} are pairwise relatively prime.
•   {10, 19, 124} are not pairwise relatively prime (gcd(10,124) = 2).

Definition: The least common multiple of positive integers a and b is the
smallest positive integer that is divisible by both a and b, denoted lcm(a,b).

Theorem: Let a and b be positive integers. Then ab = gcd(a,b)lcm(a,b).
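The identity ab = gcd(a,b)·lcm(a,b) gives a standard way to compute lcm; a quick Python check (math.gcd is in the standard library, and the lcm name is ours):

```python
import math

def lcm(a, b):
    """lcm via the identity a*b = gcd(a,b) * lcm(a,b)."""
    return a * b // math.gcd(a, b)

print(math.gcd(24, 36), lcm(24, 36))   # 12 72
```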

Integers can be expressed uniquely in any base.

Theorem: Let b∈Z(b>1). If n∈N, then there is a unique expression
n = akb^k + ak-1b^(k-1) + … + a1b + a0, where k∈N0, each ai∈N0, ak≠0, and
0 ≤ ai < b. n is written as n = (ak ak-1 … a1a0)b.


•   (123)5 = 1·5^2 + 2·5 + 3 = (38)10,
      o the base 5 digits are {0-4}.
•   (1011)2 = (11)10,
      o the binary digits are {0, 1}.
•   (F)16 = (15)10,
      o the hexadecimal digits are {0-9, A-F}.

Note: Common bases are 2 (binary), 8 (octal), 10 (decimal), and 16 (hexadecimal).

Algorithm: Constructing base b expansions.

    procedure base_b_expansion(n: positive integer, b: integer greater than 1)
        q := n
        k := 0
        while q ≠ 0
             ak := q mod b
             q := ⌊q/b⌋
             k := k+1
        {the base b expansion of n is (ak-1ak-2…a1a0)b}
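A Python version of the expansion loop (illustrative; q starts at n and the digits come out least significant first, so we reverse them at the end):

```python
def base_b_expansion(n, b):
    """Return the base-b digits of n > 0, most significant digit first."""
    q, digits = n, []
    while q != 0:
        digits.append(q % b)   # a_k := q mod b
        q //= b                # q := floor(q / b)
    return digits[::-1]

print(base_b_expansion(38, 5))   # [1, 2, 3], i.e., (123)_5
print(base_b_expansion(11, 2))   # [1, 0, 1, 1], i.e., (1011)_2
```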

Examples: Converting between some bases is easier than others.

•   Base 2 to any base 2^k, k>1, is really easy. Just group k bits together and
    convert to the base 2^k symbol.
•   Base 10 to any base 2^k is a pain.
•   Base 2^k to base 10 is also a pain.

Algorithm: Addition of integers

    procedure add(a, b: integers)
        (an-1an-2…a1a0)2 := base_2_expansion(a)
        (bn-1bn-2…b1b0)2 := base_2_expansion(b)
        c := 0
        for j := 0 to n-1
             d := ⌊(aj+bj+c)/2⌋
             sj := aj+bj+c - 2d
             c := d
        sn := c
        {the binary expansion of the sum is (snsn-1…s1s0)2}


•   What is the complexity of this algorithm?
•   Is this the fastest way to compute the sum?

Algorithm: Multiplication of integers

    procedure multiply(a, b: integers)
        (an-1an-2…a1a0)2 := base_2_expansion(a)
        (bn-1bn-2…b1b0)2 := base_2_expansion(b)
        for j := 0 to n-1
             if bj = 1 then cj := a shifted j places else cj := 0
        {c0,c1,…,cn-1 are the partial products}
        p := 0
        for j := 0 to n-1
             p := p + cj
        {p is the value of ab}


•   (10)2·(11)2 = (110)2. Note that there are more bits than the original integers.
•   (11)2·(11)2 = (1001)2. Twice as many binary digits!

Algorithm: Compute div and mod

    procedure division(a: integer, d: positive integer)
        q := 0
        r := |a|
        while r  d
              r := r – d
              q := q + 1
        if a < 0 and r > 0 then
              r := d – r
              q := -(q + 1)
        {q = a div d is the quotient and r = a mod d is the remainder}

•   The complexity of the multiplication algorithm is O(n^2). Much more
    efficient algorithms exist, including one that is O(n^1.585) using a divide and
    conquer technique we will see later in the course.
•   There are O(log(a)·log(d)) complexity algorithms for division.

Modular exponentiation, b^k mod m, where b, k, and m are large integers, is
important to compute efficiently in the field of cryptology.

Algorithm: Modular exponentiation

    procedure modular_exponentiation(b: integer, k,m: positive integers)
        (an-1an-2…a1a0)2 := base_2_expansion(k)
        y := 1
        power := b mod m
        for i := 0 to n-1
             if ai = 1 then y := (y·power) mod m
             power := (power·power) mod m
        {y = b^k mod m}

Note: The complexity is O((log m)^2 log k) bit operations, which is fast.
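A Python sketch of the same square-and-multiply idea; instead of precomputing the binary expansion of k, it peels off one bit per iteration (Python's built-in pow(b, k, m) does the same job):

```python
def modular_exponentiation(b, k, m):
    """Compute b**k mod m using the binary expansion of k."""
    y = 1
    power = b % m
    while k > 0:
        if k & 1:                      # current binary digit of k is 1
            y = (y * power) % m
        power = (power * power) % m    # successive squares of b mod m
        k >>= 1
    return y

print(modular_exponentiation(3, 644, 645) == pow(3, 644, 645))   # True
```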

Euclidean Algorithm: Compute gcd(a,b)

    procedure gcd(a,b: positive integers)
        x := a
        y := b
        while y ≠ 0
             r := x mod y
             x := y
             y := r
        {gcd(a,b) is x}

Correctness of this algorithm is based on

Lemma: Let a=bq+r, where a,b,q,r∈Z. Then gcd(a,b) = gcd(b,r).

The complexity will be studied after we master mathematical induction.
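The while loop above translates directly into Python (the remainders strictly decrease, which is what guarantees termination):

```python
def gcd(a, b):
    """Euclidean algorithm: repeatedly replace (x, y) by (y, x mod y)."""
    x, y = a, b
    while y != 0:
        x, y = y, x % y
    return x

print(gcd(24, 36))   # 12
print(gcd(101, 9))   # 1
```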

Number theory useful results

Theorem: If a,b∈N then ∃s,t∈Z(gcd(a,b) = sa+tb).

Lemma: If a,b,c∈N with gcd(a,b) = 1 and a | bc, then a | c.

Note: This lemma makes proving the prime factorization theorem doable.

Lemma: If p is a prime and p | a1a2…an, where each ai∈Z, then p | ai for some i.

Theorem: Let m∈N and let a,b,c∈Z. If ac ≡ bc (mod m) and gcd(c,m) = 1, then
a ≡ b (mod m).

Definition: A linear congruence is a congruence of the form ax ≡ b (mod m),
where m∈N, a,b∈Z, and x is a variable.

Definition: An inverse of a modulo m is an integer ā such that āa ≡ 1 (mod m).

Theorem: If a and m are relatively prime integers and m>1, then an inverse of a
modulo m exists and is unique modulo m.

Proof: Since gcd(a,m) = 1, ∃s,t∈Z(1 = sa+tm). Hence, sa+tm ≡ 1 (mod m). Since
tm ≡ 0 (mod m), it follows that sa ≡ 1 (mod m). Thus, s is an inverse of a
modulo m. The uniqueness argument is made by assuming there are two
distinct inverses and deriving a contradiction.
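The proof is constructive: the s in gcd(a,m) = sa+tm is the inverse. A Python sketch using the extended Euclidean algorithm (both function names are ours):

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) = s*a + t*b."""
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    return g, t, s - (a // b) * t

def inverse_mod(a, m):
    """Inverse of a modulo m; it exists only when gcd(a, m) = 1."""
    g, s, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError("a and m are not relatively prime")
    return s % m

print(inverse_mod(3, 7))   # 5, since 3*5 = 15 ≡ 1 (mod 7)
```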

Systems of linear congruences are used in large integer arithmetic. The basis for
the arithmetic goes back to China 1700 years ago.

Puzzle Sun Tzu (or Sun Zi): There are certain things whose number is unknown.

•   When divided by 3, the remainder is 2,
•   When divided by 5, the remainder is 3, and
•   When divided by 7, the remainder is 2.

What will be the number of things? (Answer: 23… stay tuned why).

Chinese Remainder Theorem: Let m1, m2,…,mn∈N be pairwise relatively prime.
Then the system x ≡ ai (mod mi), 1 ≤ i ≤ n, has a unique solution modulo
m = m1m2…mn.

Existence Proof: The proof is by construction. Let Mk = m / mk, 1 ≤ k ≤ n. Then
gcd(Mk, mk) = 1 (from the pairwise relatively prime condition). By the previous
theorem we know that there is a yk which is an inverse of Mk modulo mk, i.e.,
Mkyk ≡ 1 (mod mk). To construct the solution, form the sum

                     x = a1M1y1 + a2M2y2 + … + anMnyn.

Note that Mj ≡ 0 (mod mk) whenever j ≠ k. Hence,

                      x ≡ akMkyk ≡ ak (mod mk), 1 ≤ k ≤ n.

We have shown that x is a simultaneous solution to the n congruences. qed

Sun Tzu’s Puzzle: The ak∈{2, 3, 2} from 2 pages earlier. Next

          mk∈{3, 5, 7}, m = 3·5·7 = 105, and Mk = m/mk∈{35, 21, 15}.

The inverses yk are

1.   y1 = 2 (M1 = 35 ≡ 2 modulo 3, and 2·2 ≡ 1 (mod 3)).
2.   y2 = 1 (M2 = 21 ≡ 1 modulo 5).
3.   y3 = 1 (M3 = 15 ≡ 1 modulo 7).

The solutions to this system are those x such that

      x ≡ a1M1y1 + a2M2y2 + a3M3y3 = 2·35·2 + 3·21·1 + 2·15·1 = 233

Finally, 233 ≡ 23 (mod 105).
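The constructive proof translates directly into Python; this sketch assumes the moduli really are pairwise relatively prime (pow(M, -1, m) computes a modular inverse in Python 3.8+):

```python
from math import prod

def crt(residues, moduli):
    """Solve x ≡ a_k (mod m_k) via x = sum(a_k * M_k * y_k) mod m."""
    m = prod(moduli)
    x = 0
    for a_k, m_k in zip(residues, moduli):
        M_k = m // m_k
        y_k = pow(M_k, -1, m_k)   # inverse of M_k modulo m_k
        x += a_k * M_k * y_k
    return x % m

print(crt([2, 3, 2], [3, 5, 7]))   # 23, Sun Tzu's answer
```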

Definition: An m×n matrix is a rectangular array of numbers with m rows and n
columns. The elements of a matrix A are denoted by Aij or aij. A matrix with m=n
is a square matrix. If two matrices A and B have the same number of rows and
columns and all of the elements Aij = Bij, then A = B.

Definition: The transpose of an m×n matrix A = [Aij], denoted A^T, is A^T = [Aji]. A
matrix is symmetric if A = A^T and skew symmetric if A = -A^T.

Definition: The ith row of an m×n matrix A is [Ai1, Ai2, …, Ain]. The jth column
is [A1j, A2j, …, Amj]^T.

Definition: Matrix arithmetic is not exactly the same as scalar arithmetic:

•   C = A + B: cij = aij + bij, where A and B are m×n.
•   C = A - B: cij = aij - bij, where A and B are m×n.
•   C = AB: cij = Σp=1..k aipbpj, where A is m×k, B is k×n, and C is m×n.
Theorem: AB = BA, but ABBA in general.
Definition: The identity matrix In is n×n with Iii = 1 and Iij = 0 if i≠j.

Theorem: If A is n×n, then AIn = InA = A.

Definition: A^r = AA…A (r times).

Definition: Zero-one matrices are matrices A = [aij] such that all aij∈{0, 1}.
Boolean operations are defined on m×n zero-one matrices A = [aij] and B = [bij]:
•   Meet of A and B: A∧B = [aij∧bij], 1 ≤ i ≤ m and 1 ≤ j ≤ n.
•   Join of A and B: A∨B = [aij∨bij], 1 ≤ i ≤ m and 1 ≤ j ≤ n.
•   The Boolean product of A and B is C = A⊙B, where A is m×k, B is k×n,
    and C is m×n, defined by cij = (ai1∧b1j)∨(ai2∧b2j)∨…∨(aik∧bkj).

Definition: The Boolean power of an n×n matrix A is defined by A^[r] =
A⊙A⊙…⊙A (r times), where A^[0] = In.
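The Boolean product can be sketched in Python with lists of 0s and 1s (∧ becomes and, the ∨ chain becomes any; the function name is ours):

```python
def boolean_product(A, B):
    """c_ij = (a_i1 ∧ b_1j) ∨ ... ∨ (a_ik ∧ b_kj) for zero-one matrices."""
    m, k, n = len(A), len(B), len(B[0])
    return [[int(any(A[i][p] and B[p][j] for p in range(k)))
             for j in range(n)]
            for i in range(m)]

A = [[1, 0], [0, 1]]
B = [[1, 1], [0, 1]]
print(boolean_product(A, B))   # [[1, 1], [0, 1]]
```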

                          Induction and Recursion

Principle of Mathematical Induction: Given a propositional function P(n), n∈N,
we prove that P(n) is true for all n∈N by verifying
1. (Basis) P(1) is true
2. (Induction) P(k)→P(k+1), ∀k∈N.


•   Equivalent to [P(1) ∧ ∀k∈N (P(k)→P(k+1))] → ∀n∈N P(n).
•   We do not actually assume P(k) is true. It is shown that if it is assumed that
    P(k) is true, then P(k+1) is also true. This is a subtle grammatical point with
    mathematical implications.
•   Mathematical induction is a form of deductive reasoning, not inductive
    reasoning. The latter tries to make conclusions based on observations and
    rules that may lead to false conclusions.
•   Sometimes P(1) is not the basis, but some other P(k), k∈Z.
•   Sometimes P(k) is for a (possibly infinite) subset of N or Z.
•   Sometimes P(k-1)→P(k) is easier to prove than P(k)→P(k+1).
•   Being flexible, but staying within the guiding principle, usually works.
•   There are many ways of proving false results using subtly wrong induction
    arguments. Usually there is a disconnect between the basis and induction
    parts of the proof.
•   Examples 10, 11, and 12 in your textbook are worth studying until you
    really understand each.
Lemma: Σi=1..n (2i-1) = n^2 (sum of odd numbers).

Proof: (Basis) Take k = 1, so 1 = 1^2.
(Induction) Assume 1+3+5+…+(2k-1) = k^2 for an arbitrary k ≥ 1. Add 2k+1 to
both sides. Then (1+3+5+…+(2k-1))+(2k+1) = k^2+(2k+1) = (k+1)^2.

Lemma: Σi=0..n 2^i = 2^(n+1)-1.

Proof: (Basis) Take k=0, so 2^0 = 1 = 2^1 - 1.
(Induction) Assume Σi=0..k 2^i = 2^(k+1)-1 for an arbitrary k ≥ 0. Add 2^(k+1) to both
sides. Then

                   Σi=0..k 2^i + 2^(k+1) = 2^(k+1)-1 + 2^(k+1),

which simplifies to Σi=0..(k+1) 2^i = 2^(k+2)-1.

Principle of Strong Induction: Given a propositional function P(n), n∈N, we
prove that P(n) is true for all n∈N by verifying
1. (Basis) P(1) is true
2. (Induction) [P(1)∧P(2)∧…∧P(k)]→P(k+1) is true ∀k∈N.

Example: Infinite ladder with reachable rungs. For mathematical or strong
induction, we need to verify the following:

       Step           Mathematical                    Strong
       Basis                 We can reach the first rung.
     Induction If we can reach an arbitrary  If we can reach all k
               rung k, then we can reach     rungs, then we can reach
               rung k+1.                     rung k+1.

We cannot prove that you can climb an infinite ladder using mathematical
induction. Using strong induction, however, you can prove this result using a
trick: since you can prove that you can climb to rungs 1, 2, …, k, it follows that
you can climb 2 rungs arbitrarily, which gets you from rung k-1 to rung k+1.

Rule of thumb: Always use mathematical induction if P(k)→P(k+1) ∀k∈N.
Only resort to strong induction when that fails.

Fundamental Theorem of Arithmetic: Every n∈N (n>1) is the product of primes.

Proof: Let P(n) be the proposition that n can be written as the product of primes.
(Basis) P(2) is true: 2 = 2, the product of 1 prime.
(Induction) Assume P(j) is true ∀j ≤ k. We must verify that P(k+1) is true.
    Case 1: k+1 is a prime. Hence, P(k+1) is true.
    Case 2: k+1 is a composite. Hence k+1 = a·b, where 2 ≤ a ≤ b < k+1. By the
    inductive hypothesis, P(a) and P(b) are both true. Hence, a = p1p2…ps and
    b = q1q2…qt, where the p’s and q’s are primes. It follows then that
    k+1 = p1p2…ps·q1q2…qt, so P(k+1) is true.

Principle of Modified Strong Induction: Given a propositional function P(n),
n∈N, we prove that P(n) is true for all n ≥ b by verifying
1. (Basis) P(b), P(b+1), …, P(b+j) are all true.
2. (Induction) [P(b)∧P(b+1)∧…∧P(k)]→P(k+1) is true ∀k ≥ b+j.

Example: Every postage amount ≥ $.12 can be formed using $.04 and $.05
stamp combinations only. We can prove this using modified strong induction.
(Basis) Consider 4 specific cases:

                 Postage Number of $.04’s Number of $.05’s
                  $.12         3                0
                  $.13         2                1
                  $.14         1                2
                  $.15         0                3

Hence, P(j) is true for 12 ≤ j ≤ 15.
(Induction) Assume P(j) is true for 12 ≤ j ≤ k, where k ≥ 15. By the inductive
hypothesis, P(k-3) is true since k-3 ≥ 12. Hence, we can just add another $.04
stamp to form postage of k+1 cents.
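The induction argument can be spot-checked by brute force in Python (illustrative; amounts are in cents, and the function name is ours):

```python
def postage(n):
    """Return (# of 4-cent stamps, # of 5-cent stamps) totaling n cents."""
    for fives in range(n // 5 + 1):
        rest = n - 5 * fives
        if rest % 4 == 0:
            return rest // 4, fives
    return None   # n cannot be formed from 4s and 5s

print(postage(12), postage(13))                            # (3, 0) (2, 1)
print(all(postage(n) is not None for n in range(12, 200)))  # True
```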

Well Ordering Property: Every nonempty subset of N has a least element.

The validity of mathematical and strong induction is based on the well ordering property.

Definition: A recursive function is defined from
1. (Basis) Initial value f(0).
2. (Recursion) f(k), k>0, in terms of {f(j) | 0 ≤ j < k} and other known quantities.


•   f(0) = 1, f(n) = 2f(n-1)+4, n>0.
•   g(0) = 12, g(1) = 1, g(n) = 2g(n-1) - g(n-2), n>1.
•   h(0) = 1, h(n) = n·h(n-1) = n!
•   Fibonacci numbers: f0 = 0, f1 = 1, fn = fn-1 + fn-2, n>1.

n                0             1              2              3     4
f(n)             1             6              16             36    76
g(n)             12            1              -10            -21   -32
h(n)             1             1              2              6     24
fn               0             1              1              2     3

Theorem: Whenever n ≥ 3, fn > α^(n-2), where α = (1+√5)/2.
The proof is by modified strong induction.

Lamé’s Theorem: Let a,b∈N (a ≥ b). Then the number of divisions used by the
Euclidean algorithm to find gcd(a,b) is at most 5 times the number of decimal
digits in b.

We can recursively define sets, too, not just functions. There is a basis step and
a recursion step with the possibility of an exclusion step.

Definition: The set Σ* of strings over an alphabet Σ is defined by
(Basis) λ∈Σ*, where λ is the empty string.
(Recursion) If w∈Σ* and x∈Σ, then wx∈Σ*.

Example: Σ = {0,1}. Then Σ* is the set of all bit strings, e.g., the binary
representations of N0.

Principle of Structural Induction:
1. (Basis) Show the result holds for all elements specified in the basis step of
     the recursive definition of the set.
2. (Induction) Show that if the statement is true for each element used to
     construct new elements in the recursive step of the definition, then the
     result holds for these new elements.

The validity of this approach comes from mathematical induction over N. First
state that P(n) is true whenever n or fewer elements are used to generate an
element. We must show that P(0) is true (i.e., the basis element). Now assume
that P(k) is true for an arbitrary k. Hence, P(k+1) must be true, too, due to the
recursion involving k or fewer elements.

Definition: A recursive algorithm solves a problem by reducing it to an instance
of the same problem with smaller input(s).

Note: Recursive algorithms can be proven correct using mathematical induction
or modified strong induction.


•   n! = n·(n-1)!
•   a^n = a·a^(n-1)
•   gcd(a,b) with a,b∈N (a<b).

        procedure gcd(a,b: integers and a<b)
            if a = 0 then gcd(a,b) := b
            else gcd(a,b) := gcd(b mod a, a)

•  linear search

      procedure search(i,j,x: integers and 1 ≤ i ≤ n, 1 ≤ j ≤ n)
          if ai = x then location := i
          else if i = j then location := 0
          else search(i+1,j,x)

•  binary search

      procedure binary_search(i,j,x: integers and 1 ≤ i ≤ n, 1 ≤ j ≤ n)
          m := ⌊(i+j)/2⌋
          if x = am then location := m
          else if x < am and i < m then binary_search(i,m-1,x)
          else if x > am and j > m then binary_search(m+1,j,x)
          else location := 0
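A Python version of the recursive binary search (0-based indices, returning -1 instead of location 0 for "not found"):

```python
def binary_search(a, x, i, j):
    """Search for x in the sorted slice a[i..j]; return its index or -1."""
    if i > j:
        return -1
    m = (i + j) // 2
    if a[m] == x:
        return m
    if x < a[m]:
        return binary_search(a, x, i, m - 1)
    return binary_search(a, x, m + 1, j)

a = [1, 3, 5, 7, 9]
print(binary_search(a, 7, 0, len(a) - 1))   # 3
print(binary_search(a, 4, 0, len(a) - 1))   # -1
```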

•  Fibonacci numbers

      procedure fib(n: n∈N0)
          if n = 0 then fib(0) := 0
          else if n = 1 then fib(1) := 1
          else fib(n) := fib(n-1) + fib(n-2)

  or it can be defined iteratively:

      procedure fib(n: n∈N0)
          if n = 0 then y := 0
          else
               x := 0, y := 1
               for i := 1 to n-1
                    z := x+y
                    x := y
                    y := z
          {y is fn}
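In Python, with the missing else branch made explicit (the iterative version keeps only the last two values instead of recursing):

```python
def fib(n):
    """Iterative Fibonacci: x and y track f_{i-1} and f_i."""
    if n == 0:
        return 0
    x, y = 0, 1
    for _ in range(n - 1):
        x, y = y, x + y
    return y

print([fib(n) for n in range(8)])   # [0, 1, 1, 2, 3, 5, 8, 13]
```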

Graphs and trees are important concepts that we will spend a lot of time
considering later in the course.

•   A graph is made up of vertices and edges that connect some of the vertices.
•   A tree is a special form of a graph, namely it is a connected undirected
    graph with no simple circuits.
•   A rooted tree is a tree with one vertex designated as the root and every edge
    directed away from the root.
•   An m-ary tree is a rooted tree such that every internal vertex has no more
    than m children. If m = 2, it is a binary tree.
•   The height of a rooted tree T, denoted h(T), is the maximum number of
    levels.
•   A balanced rooted tree T has all of its leaves at h(T) or h(T)-1.

Let T1, T2, …, Tm be rooted trees with roots r1, r2, …, rm. Let r be another root.
Connecting r to the roots r1, r2, …, rm constructs another rooted tree T. We can
reformulate this concept using the recursive set methodology.

Merge sort is a balanced binary tree method that first breaks a list up recursively
into two lists until each sublist has only one element. Then the sublists are
recombined, two at a time and in sorted order, until only one sorted list remains.

Note: The height of the tree formed in merge sort is O(log2n) for n elements.

                                    10, 4, 7, 1
              10, 4                                            7, 1
    10                      4                       7                        1
              4, 10                                            1, 7
                                    1, 4, 7, 10


•   First three rows do the sublist splitting.
•   Last two rows do the merging.
•   There are two distinct algorithms at work.

                                           
procedure merge_sort(L = a1,…,an)
  if n > 1 then
       m := ⌊n/2⌋
       L1 := a1,…,am
       L2 := am+1,…,an
       L := merge(merge_sort(L1), merge_sort(L2))
  {L is now the sorted a1,…,an}

procedure merge(L1, L2: sorted lists)
  L := empty list
  while L1 and L2 are both nonempty
      remove the smaller of the first elements of L1 and L2 and append it to the
      end of L
  if either L1 or L2 is nonempty, append it to the end of L
  {L is the merged, sorted list}

Theorem: If ni = |Li|, i=1,2, then merge requires at most n1+n2-1 comparisons. If
n = |L|, then merge_sort requires O(nlog2n) comparisons.
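Both procedures can be sketched in Python (illustrative; list slicing does the sublist splitting):

```python
def merge(L1, L2):
    """Merge two sorted lists with at most len(L1)+len(L2)-1 comparisons."""
    L, i, j = [], 0, 0
    while i < len(L1) and j < len(L2):
        if L1[i] <= L2[j]:
            L.append(L1[i]); i += 1
        else:
            L.append(L2[j]); j += 1
    return L + L1[i:] + L2[j:]   # one of these tails is empty

def merge_sort(L):
    """Split in half, sort each half recursively, then merge."""
    if len(L) <= 1:
        return L
    m = len(L) // 2
    return merge(merge_sort(L[:m]), merge_sort(L[m:]))

print(merge_sort([10, 4, 7, 1]))   # [1, 4, 7, 10]
```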

Quick sort is another sorting algorithm that breaks an initial list into many
sublists, but using a different heuristic than merge sort. If L = a1,…,an has
distinct elements, then quick sort recursively constructs two lists: L1 for all ai <
a1 and L2 for all ai > a1 with a1 appended to the end of L1. This continues
recursively until each sublist has only one element. Then the sublists are
recombined in order to get a sorted list.

Note: On average, the number of comparisons is O(nlog2n) for n elements, but
can be O(n2) in the worst case. Quick sort is one of the most popular sorting
algorithms used in academia.

Exercise: Google “quick sort, C++” to see many implementations or look in
many of the 200+ C++ primers. Defining quick sort is in Rosen’s exercises.

               Counting, Permutations, and Combinations
Product Rule Principle: Suppose a procedure can be broken down into a
sequence of k tasks. If there are ni, 1 ≤ i ≤ k, ways to do the ith task, then there are
∏i=1..k ni ways to do the procedure.

Sum Rule Principle: Suppose a procedure can be broken down into a sequence
of k tasks. If there are ni, 1 ≤ i ≤ k, ways to do the ith task, with each way unique,
then there are Σi=1..k ni ways to do the procedure.

Exclusion (Inclusion) Principle: If the sum rule cannot be applied because the
ways are not unique, we use the sum rule and then subtract the number of
duplicated ways.
Note: Mapping the individual ways onto a rooted tree and counting the leaves is
another method for summing. The trees are not unique, however.


•   Consider 3 students in a classroom with 10 seats. There are 10·9·8 = 720
    ways to assign the students to the seats.
•   We want to appoint 1 person to fill out many, many forms that the
    administration wants filled in by today. There are 3 students and 2 faculty
    members who can fill out the forms. There are 3+2 = 5 ways to choose 1
    person. (Duck fast.)
•   How many variables are legal in the original Dartmouth BASIC computer
    language? Variables are 1 or 2 alphanumeric characters long, begin with A-
    Z, case independent, and are not one of the 5 two character reserved words
    in BASIC. We use a combination of the three counting principles:
      o 1 character variables: V1 = 26
      o 2 character variables: V2 = 26·36 - 5 = 931
      o Total: V = V1 + V2 = 957

Pigeonhole Principle: If there are k∈N boxes and at least k+1 objects placed in
the boxes, then there is at least one box with more than one object in it.

Theorem: If f: D→E is a function with |D| > k and |E| = k, then f is not 1-1.
The proof is by the pigeonhole principle.

Theorem (Generalized Pigeonhole Principle): If N objects are placed in k boxes,
then at least one box contains at least ⌈N/k⌉ objects.

Proof: First recall that ⌈N/k⌉ < (N/k)+1. Now suppose that none of the boxes
contains more than ⌈N/k⌉ - 1 objects. Hence, the total number of objects is at most
                      k(⌈N/k⌉ - 1) < k(((N/k)+1)-1) = N.
Hence, the theorem must be true (proof by contradiction).

Theorem: Every sequence of n^2+1 distinct real numbers contains a subsequence
of length n+1 that is either strictly increasing or strictly decreasing.
Examples: From a standard 52 card playing deck.

•   How many cards must be dealt to guarantee that 4 cards from the same
    suit are dealt?
      o The GPP Theorem requires ⌈N/4⌉ ≥ 4, so N = 13 suffices.
      o This is also the minimum, since 12 cards could contain exactly 3 of
         each suit.
•   How many cards must be dealt to guarantee that 4 clubs are dealt?
      o The GPP Theorem does not apply.
      o The product rule and inclusion principles apply: 3·13+4 = 43 since all
         of the hearts, spades, and diamonds could be dealt before any clubs.

Definition: A permutation of a set of distinct objects is an ordered arrangement of
these objects. An r-permutation is an ordered arrangement of r of these objects.

Example: Given S = {0,1,2}, then {2,1,0} is a permutation and {0,2} is a 2-
permutation of S.

Theorem: If n,r∈N, then there are P(n,r) = n(n-1)(n-2)…(n-r+1) = n!/(n-r)!
r-permutations of a set of n distinct elements. Further, P(n,0) = 1.

The proof is by the product rule for r ≥ 1. For r=0, there is only one way to
order 0 elements.

Example: You want to visit 10 cities in China on a vacation. You will arrive in
Hong Kong as your first city and you want to maximize the number of frequent
flier miles you will accumulate by flying to 9 more cities. You have 9! different
paths to check. Good luck since 9! = 362,880.

Definition: An r-combination is an unordered subset with r elements from the
original set.

Definition: The binomial coefficient is defined by C(n,r) = n!/(r!(n-r)!).

Theorem: The number of r-combinations of a set with n elements, where n,r∈N0
and r ≤ n, is C(n,r).

Proof: The r-permutations can be formed by choosing one of the C(n,r)
r-combinations and then ordering it, which can be done in P(r,r) ways. So,
                             P(n,r) = C(n,r)·P(r,r)
and
              C(n,r) = P(n,r)/P(r,r) = (n!/(n-r)!)·((r-r)!/r!) = n!/(r!(n-r)!).

Theorem: C(n,r) = C(n,n-r) for 0rn.

Definition: A combinatorial proof of an identity is a proof that uses counting
arguments to prove that both sides of the identity count the same objects, but in
different ways.

Binomial Theorem: Let x and y be variables. Then for n∈N,

                          (x+y)^n = Σj=0..n C(n,j) x^(n-j) y^j.

Proof: Expanding the product, all terms are of the form x^(n-j)y^j for
j=0,1,…,n. To count the number of terms of the form x^(n-j)y^j, note that we have
to choose n-j x’s from the n sums so that the other j terms in the product are y’s.
Hence, the coefficient of x^(n-j)y^j is C(n,n-j) = C(n,j).
Example: What is the coefficient of x^12 y^13 in (x+y)^25? C(25,13) = 5,200,300.

Corollary: Let n∈N0. Then Σk=0..n C(n,k) = 2^n.

Proof: 2^n = (1+1)^n = Σk=0..n C(n,k) 1^k 1^(n-k) = Σk=0..n C(n,k).

Corollary: Let n∈N. Then Σk=0..n (-1)^k C(n,k) = 0.

Proof: 0 = 0^n = ((-1)+1)^n = Σk=0..n C(n,k) (-1)^k 1^(n-k) = Σk=0..n (-1)^k C(n,k).

Corollary: C(n,0) + C(n,2) + C(n,4) + … = C(n,1) + C(n,3) + C(n,5) + …

Corollary: Let n∈N0. Then Σk=0..n 2^k C(n,k) = 3^n.
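The corollaries above are easy to spot-check numerically with Python's math.comb (a check for one n, not a proof):

```python
from math import comb

n = 10
assert sum(comb(n, k) for k in range(n + 1)) == 2 ** n
assert sum((-1) ** k * comb(n, k) for k in range(n + 1)) == 0
evens = sum(comb(n, k) for k in range(0, n + 1, 2))
odds = sum(comb(n, k) for k in range(1, n + 1, 2))
assert evens == odds
assert sum(2 ** k * comb(n, k) for k in range(n + 1)) == 3 ** n
print("all identities hold for n =", n)
```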

Theorem (Pascal’s Identity): Let n,k∈N with n ≥ k. Then C(n+1,k) = C(n,k-1) + C(n,k).

Note: Using C(n,0) = C(n,n) = 1 as a basis, we can define C(n,k) recursively using
Pascal’s Identity. It is normally written as a triangular table, called Pascal’s
triangle.
Theorem (Vandermonde’s Identity): Let m,n,r∈N with r≤m and r≤n. Then
                          C(m+n,r) = Σ_{k=0}^{r} C(m,r−k) C(n,k).

                                             n  n
Corollary: If nN0, then 2n =                                  .
                         n     
                                  
                                               k=0  k 
                                                    

                 n  n   n            n  n
Proof: 2n = 
                            =
                                                        .
          
                   k=0  n-k   k 
                             
                                           k=0  k 
                                                

                                        n+1 = n  j .
Theorem: Let n,rN0 such that rn. Then    
                                              j=r  r        

If we allow repetitions in the permutations, then the previous theorems and
corollaries no longer apply. We have to start over.

Theorem: The number of r-permutations of a set with n objects, with repetition
allowed, is n^r.

Proof: There are n ways to select an element of the set for each of the r
positions in the r-permutation. Using the product principle completes the proof.

Theorem: There are C(n+r-1,r) = C(n+r-1,n-1) r-combinations from a set with n
elements when repetition is allowed.

Example: How many solutions are there to x1+x2+x3 = 9 for xi∈N0? C(3+9−1,9) =
C(11,9) = C(11,2) = 55. Only when further constraints are placed on the xi can
we possibly find a unique solution.
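
The count can be verified by brute force; a short Python sketch (not from the
notes) enumerating all nonnegative solutions:

```python
from itertools import product
from math import comb

# Count solutions to x1 + x2 + x3 = 9 with each xi >= 0.
# Each xi is at most 9, so range(10) suffices.
count = sum(1 for x in product(range(10), repeat=3) if sum(x) == 9)
assert count == comb(11, 2) == 55
print(count)  # 55
```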

Definition: The multinomial coefficient is C(n; n1, n2, …, nk) = n! / (n1! n2! ⋯ nk!).

Theorem: The number of different permutations of n objects, where there are ni,
1ik, indistinguishable objects of type i, is C(n; n1, n2, …, nk).

Theorem: The number of ways to distribute n distinguishable objects in k
distinguishable boxes so that ni objects are placed into box i, 1ik, is C(n; n1,
n2, …, nk).

Theorem: The number of ways to distribute n distinguishable objects into k
indistinguishable boxes (empty boxes allowed) is
                             Σ_{j=1}^{k} (1/j!) Σ_{i=0}^{j−1} (−1)^i C(j,i) (j−i)^n.

Multinomial Theorem: If nN, then
                     k                                                       n   n     n
                        x
                      i=1 i 
                                    = n +n
                                        1   2
                                                +...nk   =k
                                                            C(n;n1,n2,...,nk )x11x22 ...xkk .

Generating permutations and combinations is useful and sometimes important.

Note: We can place any n-set into a 1-1 correspondence with the first n natural
numbers. All permutations can be listed using {1, 2, …, n} instead of the actual
set elements. There are n! possible permutations.

Definition: In the lexicographic (or dictionary) ordering, the permutation
a1a2…an of {1,2,…,n} precedes b1b2…bn if and only if, at the first position i
where the two differ, ai < bi.


Examples:
•  The permutation 21435 of a set with 5 elements precedes 21543.
•  Given 362541, then 364125 is the next permutation lexicographically.

Algorithm: Generate the next permutation in lexicographic order.

    procedure next_perm(a1a2…an: ai{1,2,…,n} and distinct)
        j := n – 1
        while aj > aj+1
             j := j – 1
        {j is the largest subscript with aj < aj+1}
        k := n
        while aj > ak
             k := k – 1
        {ak is the smallest integer greater than aj to the right of aj}
        Swap aj and ak
        r := n, s := j+1
        while r > s
             Swap ar and as
             r := r – 1, s:= s + 1
        {This puts the tail end of the permutation after the jth position in
        increasing order}
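
The pseudocode above translates directly to Python; a sketch (assuming the
input is not already the last permutation in lexicographic order):

```python
def next_perm(a):
    """Return the next permutation of a (a list of distinct values) in
    lexicographic order. Assumes a is not the last permutation."""
    a = list(a)
    j = len(a) - 2                    # largest index with a[j] < a[j+1]
    while a[j] > a[j + 1]:
        j -= 1
    k = len(a) - 1                    # smallest value > a[j] to its right
    while a[j] > a[k]:
        k -= 1
    a[j], a[k] = a[k], a[j]
    a[j + 1:] = reversed(a[j + 1:])   # tail is descending; reverse it
    return a

print(next_perm([3, 6, 2, 5, 4, 1]))  # [3, 6, 4, 1, 2, 5]
```

This reproduces the 362541 → 364125 example above.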

Algorithm: Generate the next r-combination in lexicographic order.

    procedure next_r_combination({a1,a2,…,ar}: proper subset of {1,2,…,n}
            with a1 < a2 < … < ar)
        i := r
        while ai = n – r + i
              i := i – 1
        {ai is the rightmost element not at its maximum possible value}
        ai := ai + 1
        for j := i+1 to r
              aj := ai + j – i

Example: Let S = {1, 2, …, 6}. Given the 4-combination {1, 2, 5, 6}, the next 4-
combination is {1, 3, 4, 5}.
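
A Python sketch of this algorithm (0-based indexing, so the 1-based condition
ai = n − r + i becomes a[i] == n − r + i + 1):

```python
def next_r_combination(a, n):
    """Given an r-combination a1 < a2 < ... < ar from {1,...,n}, return the
    next r-combination in lexicographic order. Assumes a is not the last one."""
    a = list(a)
    r = len(a)
    i = r - 1                         # rightmost element below its maximum
    while a[i] == n - r + i + 1:
        i -= 1
    a[i] += 1
    for j in range(i + 1, r):         # reset the tail to its smallest values
        a[j] = a[i] + j - i
    return a

print(next_r_combination([1, 2, 5, 6], 6))  # [1, 3, 4, 5]
```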

                            Discrete Probability
Definition: An experiment is a procedure that yields one of a given set of
possible outcomes.

Definition: The sample space of the experiment is the set of (all) possible outcomes.

Definition: An event is a subset of the sample space.

First Assumption: We begin by only considering finitely many possible outcomes.

Definition: If S is a finite sample space of equally likely outcomes and ES is
an event, then the probability of E is p(E) = |E| / |S|.


Examples:
•  I randomly chose an exam 1 to grade. What is the probability that it is one of
   the Davids? Thirty-one students took exam 1, of which five were Davids. So,
   p(David) = 5 / 31 ≈ 0.16.
•  Suppose you are allowed to choose 6 numbers from the first 50 natural
   numbers. The probability of picking the correct 6 numbers in a lottery
   drawing is 1/C(50,6) = (44! 6!) / 50! ≈ 6.29×10^−8. This lottery is just a
   regressive tax designed for suckers and starry-eyed dreamers.

Definition: When sampling, there are two possible methods: with and without
replacement. In the former, the full sample space is always available. In the
latter, the sample space shrinks with each sampling.

Example: Let S = {1, 2, …, 50}. What is the probability of sampling {1, 14, 23,
32, 49}?

•  Without replacement: p({1,14,23,32,49}) = 1 / (50·49·48·47·46) ≈ 3.93×10^−9.
•  With replacement: p({1,14,23,32,49}) = 1 / (50·50·50·50·50) = 3.20×10^−9.

Definition: If E is an event, then Ē = S − E is the complementary event.

Theorem: p(Ē) = 1 – p(E) for a sample space S.

Proof: p(Ē) = (|S| – |E|) / |S| = 1 – |E| / |S| = 1 – p(E).

Example: Suppose we generate n random bits. What is the probability that at
least one of the bits is 0? Let E be the event that a bit string has at least one 0
bit. Then Ē is the event that all n bits are 1. p(E) = 1 – p(Ē) = 1 – 2^−n = (2^n – 1) / 2^n.

Note: Computing p(E) in the example directly, without the complement, is much harder.

Theorem: Let E and F be events in a sample space S. Then
                      p(E∪F) = p(E) + p(F) – p(E∩F).

Proof: Recall that |E∪F| = |E| + |F| – |E∩F|. Hence,
     p(E∪F) = |E∪F| / |S| = (|E| + |F| – |E∩F|) / |S| = p(E) + p(F) – p(E∩F).

Example: What is the probability that an element of the set {1, 2, …, 100} is
divisible by 2 or 3? Let E and F represent the events that an element is divisible
by 2 and 3, respectively. Then |E| = 50, |F| = 33, and |E∩F| = 16 (the multiples
of 6). Hence, p(E∪F) = (50 + 33 – 16) / 100 = 0.67.
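
A quick enumeration confirms the counts; a Python sketch (not from the notes):

```python
# Inclusion-exclusion check: elements of {1,...,100} divisible by 2 or 3.
S = range(1, 101)
E = {x for x in S if x % 2 == 0}
F = {x for x in S if x % 3 == 0}
assert (len(E), len(F), len(E & F)) == (50, 33, 16)
p = len(E | F) / 100
assert p == (50 + 33 - 16) / 100 == 0.67
print(p)  # 0.67
```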

Second Assumption: Now suppose that the probability of each outcome is not
1 / |S|. In this case we must assign a probability to each possible outcome,
either by setting a specific value or defining a function.

Definition: For a sample space S with a finite or countable number of outcomes,
we assign a probability p(s) to each outcome s∈S such that
     (1) 0 ≤ p(s) ≤ 1 for all s∈S, and
     (2) Σ_{s∈S} p(s) = 1.


Notes:
          1. When |S| = n, the formulas (1) and (2) can be rewritten using n.
          2. When S is uncountable, integral calculus is required for (2).
          3. When S is countably infinite, the sum in (2) is interpreted as a limit.

Example: Coin flipping with outcomes H and T.
•  S = {H, T} for a fair coin. Hence, p(H) = p(T) = 0.5.
•  S = {H, H, T} models a weighted coin. Then p(H) = 2/3 ≈ 0.67 and p(T) = 1/3 ≈ 0.33.

Definition: Suppose that S is a set with n elements. The uniform distribution
assigns the probability 1/n to each element in S.

Definition: The probability of the event E is the sum of the probabilities of the
outcomes in E, i.e., p(E) = Σ_{s∈E} p(s).

Note: When |E| is infinite, the sum Σ_{s∈E} p(s) must be convergent.

Definition: The experiment of selecting an element from a sample space S with
a uniform distribution is known as selecting an element from S at random.

We can prove that (1) p(E) = 1 – p( E ) and (2) p(EF) = p(E) + p(F) – p(EF)
using the more general probability definitions.

Definition: Let E and F be events with p(F) > 0. The conditional probability of E
given F is defined by p(E|F) = p(E∩F) / p(F).

Example: A bit string of length 3 is generated at random. What is the probability
that there are two 0 bits in a row, given that the first bit is 0? Let F be the event
that the first bit is 0. Let E be the event that there are two 0 bits in a row. Note
that E∩F = {000, 001}, so p(E∩F) = 0.25, and p(F) = 0.5. Hence, p(E|F) = 0.25 / 0.5 = 0.5.
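
The conditional probability can be checked by enumerating all 8 strings; a
Python sketch (not from the notes):

```python
from itertools import product

strings = [''.join(bits) for bits in product('01', repeat=3)]
F = [s for s in strings if s[0] == '0']     # first bit is 0
E = [s for s in strings if '00' in s]       # two 0 bits in a row
EF = [s for s in strings if s in E and s in F]
assert sorted(EF) == ['000', '001']
p_E_given_F = (len(EF) / 8) / (len(F) / 8)  # p(E|F) = p(E and F) / p(F)
print(p_E_given_F)  # 0.5
```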

Definition: The events E and F are independent if p(E∩F) = p(E)p(F).

Note: When p(F) > 0, independence is equivalent to having p(E|F) = p(E).

Example: Suppose E is the event that a bit string begins with a 1 and F is the
event that there is an even number of 1’s (counting zero 1’s as even). Suppose
the bit strings are of length 3. There are 4 bit strings beginning with 1: {100,
101, 110, 111}. There are 4 strings with an even number of 1’s: {000, 011, 101,
110}. Hence, p(E) = 0.5 and p(F) = 0.5. E∩F = {101, 110}, so p(E∩F) = 0.25.
Thus, p(E∩F) = p(E)p(F). Hence, E and F are independent.

Note: For bit strings of length 4, 0.25 = p(E∩F) = (0.5)(0.5) = p(E)p(F), so the
events are again independent. In fact, this holds for bit strings of any length
n ≥ 2, since p(E) = p(F) = 1/2 and p(E∩F) = 1/4 regardless of n.

Definition: Each performance of an experiment with exactly two outcomes,
denoted success (S) and failure (F), is a Bernoulli trial.

Definition: The binomial distribution is denoted b(k; n,p) = C(n,k) p^k q^(n−k),
where q = 1 – p.

Theorem: The probability of exactly k successes in n independent Bernoulli
trials, with probability of success p and of failure q = 1 – p, is b(k; n,p).

Proof: When n Bernoulli trials are carried out, the outcome is an n-tuple
(t1, t2, …, tn), where each ti∈{S, F}. Due to the trials’ independence, the
probability of each outcome having k successes and n–k failures is p^k q^(n−k).
There are C(n,k) possible tuples that contain exactly k successes and n–k failures.

Example: Suppose we generate bit strings of length 10 such that p(0) = 0.7 and
p(1) = 0.3 and the bits are generated independently. Then

•  b(8; 10,0.7) = C(10,8)(0.7)^8(0.3)^2 = 45 × 0.05764801 × 0.09 ≈ 0.2335
•  b(7; 10,0.7) = C(10,7)(0.7)^7(0.3)^3 = 120 × 0.0823543 × 0.027 ≈ 0.2668
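
A small Python helper for b(k; n,p) makes such computations easy to check
(this is a sketch, not code from the notes):

```python
from math import comb

def b(k, n, p):
    """Binomial probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Bit strings of length 10 with p(0) = 0.7, bits generated independently.
print(round(b(8, 10, 0.7), 4))  # 0.2335
print(round(b(7, 10, 0.7), 4))  # 0.2668
# The probabilities over all k must sum to 1.
assert abs(sum(b(k, 10, 0.7) for k in range(11)) - 1) < 1e-12
```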

Theorem: Σ_{k=0}^{n} b(k; n,p) = 1.

Proof: Σ_{k=0}^{n} b(k; n,p) = Σ_{k=0}^{n} C(n,k) p^k q^(n−k) = (p+q)^n = 1.

Definition: A random variable is a function from the sample space of an
experiment to the set of reals.


Notes:
•  A random variable assigns a real number to each possible outcome.
•  Despite the name, a random variable is neither random nor a variable; it is a function.

Example: Flip a fair coin twice. Let X(t) be the random variable that equals the
number of tails that appear when t is the outcome. Then

               X(HH) = 0, X(HT) = X(TH) = 1, and X(TT) = 2.

Definition: The distribution of a random variable X on a sample space is the set
of pairs (r, p(X=r)) for r∈X(S), where p(X=r) is the probability that X takes the
value r.

Note: A distribution is usually described by specifying p(X=r) for all r∈X(S).

Example: For our coin flip example above, each outcome has probability 0.25.

               p(X=0) = 0.25, p(X=1) = 0.5, and p(X=2) = 0.25.

Definition: The expected value (or expectation) of the random variable X(s) in
the sample space S is E(X) = Σ_{s∈S} p(s)X(s).

Note: If S = {x_i}_{i=1}^{n}, then E(X) = Σ_{i=1}^{n} p(x_i)X(x_i).

Example: Roll a die. Let the random variable X take the values 1, 2, …, 6 with
probability 1/6 each. Then E(X) = Σ_{i=1}^{6} i·(1/6) = 3.5. This is not really
what you would like to see, since the die does not have a 3.5 face.

Theorem: If X is a random variable and p(X=r) is the probability that X=r, so
that p(X=r) = Σ_{s∈S, X(s)=r} p(s), then E(X) = Σ_{r∈X(S)} p(X=r)·r.

Proof: Suppose X is a random variable with range X(S). Let p(X=r) be the
probability that X takes the value r. Hence, p(X=r) is the sum of the probabilities
of the outcomes s such that X(s)=r. Finally, E(X) = Σ_{r∈X(S)} p(X=r)·r.

Theorem: If Xi, 1≤i≤n, are random variables on S and if a,b∈R, then

      1. E(X1+X2+…+Xn) = E(X1)+E(X2)+…+E(Xn)
      2. E(aXi+b) = aE(Xi) + b

Proof: Use mathematical induction (base case n=2) for 1, and use the
definitions for 2.

Note: The linearity of E is extremely convenient and useful.

Theorem: The expected number of successes when n Bernoulli trials are
performed, where p is the probability of success on each trial, is np.

Proof: Apply 1 from the previous theorem.


Notes:
•  The average case complexity of an algorithm can be interpreted as the
   expected value of a random variable. Let S = {ai}, where each possible input
   is an ai. Let X be the random variable such that X(ai) = bi, the number of
   operations for the algorithm with input ai. We assign a probability p(ai) to
   each input. Then the average case complexity is E(X) = Σ_{ai∈S} p(ai)X(ai).
•  Estimating the average complexity of an algorithm tends to be quite
   difficult to do directly. Even if the best and worst cases can be estimated
   easily, there is no guarantee that the average case can be estimated without a
   great deal of work. Frankly, the average case is sometimes too difficult to
   estimate. Using the expected value of a random variable sometimes
   simplifies the process enough to make it doable.

Example of linear search average complexity: See page 44 in the class notes for
the algorithm and worst case complexity bound. We want to find x in a distinct
set {a_i}_{i=1}^{n}. If x = a_i, then there are 2i+1 comparisons. If x ∉
{a_i}_{i=1}^{n}, then there are 2n+2 comparisons. There are n+1 input types:
the n elements a_i and x ∉ {a_i}_{i=1}^{n}. Clearly, p(a_i) = p/n, where p is
the probability that x ∈ {a_i}_{i=1}^{n}. Let q = 1−p. So,

                               E = (p/n) Σ_{i=1}^{n} (2i+1) + (2n+2)q
                                  = (p/n) n(n+2) + (2n+2)q
                                  = p(n+2) + (2n+2)q.

There are three cases of interest, namely,
•  p = 1, q = 0: E = n + 2
•  p = q = 0.5: E = (3n + 4) / 2
•  p = 0, q = 1: E = 2n + 2
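
Using the 2i+1 / 2n+2 comparison counts above, the closed form can be
verified numerically; a Python sketch (not from the notes):

```python
# Average-case linear search: 2i+1 comparisons if x = a_i, 2n+2 if x is absent.
def expected_comparisons(n, p):
    q = 1 - p
    return (p / n) * sum(2 * i + 1 for i in range(1, n + 1)) + (2 * n + 2) * q

n = 10
for p, closed_form in [(1.0, n + 2), (0.5, (3 * n + 4) / 2), (0.0, 2 * n + 2)]:
    assert abs(expected_comparisons(n, p) - closed_form) < 1e-12
print("E matches p*(n+2) + (2n+2)*q")
```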

Definition: A random variable X has a geometric distribution with parameter p if
p(X=k) = (1−p)^(k−1) p for k = 1, 2, …

Note: Geometric distributions occur in studies about the time required before an
event happens (e.g., time to finding a particular item or a defective item, etc.).

Theorem: If the random variable X has a geometric distribution with parameter
p, then E(X) = 1/p.

Proof:
                            E(X) = Σ_{k=1}^{∞} k·p(X=k)
                                 = Σ_{k=1}^{∞} k(1−p)^(k−1) p
                                 = p Σ_{k=1}^{∞} k(1−p)^(k−1)
                                 = p·p^(−2)
                                 = 1/p,
using Σ_{k=1}^{∞} k x^(k−1) = 1/(1−x)^2 with x = 1−p.
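
The series converges quickly, so the result is easy to check numerically; a
Python sketch with p = 0.3 (not from the notes):

```python
# Numeric check that a geometric random variable has mean 1/p.
p = 0.3
# Truncate the infinite series; the tail beyond k = 2000 is negligible.
mean = sum(k * (1 - p) ** (k - 1) * p for k in range(1, 2000))
assert abs(mean - 1 / p) < 1e-9
print(round(mean, 4))  # 3.3333
```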

Definition: The random variables X and Y on a sample space S are independent
if p(X(s)=r1 and Y(s)=r2) = p(X(s)=r1)·p(Y(s)=r2) for all r1, r2.

Theorem: If X and Y are independent random variables on a space S, then
E(XY) = E(X)E(Y).

Proof: From the definition of expected value and since X and Y are independent
random variables,

              E(XY) = Σ_{s∈S} X(s)Y(s)p(s)
                    = Σ_{r∈X(S), t∈Y(S)} r·t·p(X(s)=r and Y(s)=t)
                    = Σ_{r∈X(S), t∈Y(S)} r·t·p(X(s)=r)·p(Y(s)=t)
                    = (Σ_{r∈X(S)} r·p(X(s)=r)) (Σ_{t∈Y(S)} t·p(Y(s)=t))
                    = E(X)E(Y).

Third Assumption: Not all problems can be solved using deterministic
algorithms. We want to assess the probability of an event based on partial
information.

Note: Some algorithms need to make random choices and produce an answer
that might be wrong with a probability associated with its likelihood of
correctness or an error estimate. Monte Carlo algorithms are examples of
probabilistic algorithms.

Example: Consider a city with a lattice of streets. A drunk walks home from a
bar. At each intersection, the drunk must choose between continuing or turning
left or right. Hopefully, the drunk gets home eventually. However, there is no
absolute guarantee.

Example: You receive shipments of n items. Some shipments have been
checked, and in those all n items are guaranteed to be good. In an unchecked
shipment, the probability that any given item is bad is 0.1. We want to
determine whether or not a shipment has been checked, but are not willing to
test every item. So we test items at random until we find a bad item or until the
probability that an unchecked shipment would have shown no bad items drops
to 0.001. How many items do we need to test? The probability that an item is
good in an unchecked shipment is 1 − 0.1 = 0.9. Hence, after k tests without
finding a bad item, the probability of this happening for an unchecked shipment
is (0.9)^k. Since (0.9)^66 ≈ 0.001, we need to test at most 66 items per shipment.
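
The threshold 66 can be found directly; a Python sketch (not from the notes):

```python
# Smallest k with 0.9^k <= 0.001: number of consecutive good items after
# which an unchecked shipment becomes very unlikely.
k = 0
prob_unchecked = 1.0
while prob_unchecked > 0.001:
    k += 1
    prob_unchecked *= 0.9
print(k)  # 66
```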

Theorem: If the probability that a randomly selected element of a set S does not
have a particular property is less than 1, then there exists an element in S with
this property.

Bayes Theorem: Suppose that E and F are events from a sample space S such
that p(E) ≠ 0 and p(F) ≠ 0. Then

               p(F|E) = p(E|F)p(F) / (p(E|F)p(F) + p(E|F̄)p(F̄)).

Generalized Bayes Theorem: Suppose that E is an event from a sample space S
and that F1, F2, …, Fn are mutually exclusive events such that ⋃_{i=1}^{n} Fi = S.
Assume that p(E) ≠ 0 and p(Fi) ≠ 0, 1≤i≤n. Then
                   p(Fj|E) = p(E|Fj)p(Fj) / Σ_{i=1}^{n} p(E|Fi)p(Fi).

Example: We have 2 boxes. The first box contains 2 green and 7 red balls. The
second box contains 4 green and 3 red balls. We select a box at random, then a
ball at random. If we picked a red ball, what is the probability that it came from
the first box?

•  Let E be the event that we chose a red ball. Thus, Ē is the event that we
   chose a green ball. Let F be the event that we chose a ball from the first box.
   Thus, F̄ is the event that we chose a ball from the second box. p(F) = p(F̄)
   = 0.5 since we pick a box at random.
•  We want to calculate p(F|E) = p(E∩F) / p(E), which we will do in stages.
•  p(E|F) = 7/9 since there are 7 red balls out of 9 total in box 1. p(E|F̄) = 3/7
   since there are 3 red balls out of a total of 7 in box 2.
•  p(E∩F) = p(E|F)p(F) = 7/18 ≈ 0.389 and p(E∩F̄) = p(E|F̄)p(F̄) = 3/14.
•  We need to find p(E). We do this by observing that E = (E∩F)∪(E∩F̄),
   where E∩F and E∩F̄ are disjoint sets. So, p(E) = p(E∩F)+p(E∩F̄) ≈ 0.603.
•  p(F|E) = p(E∩F) / p(E) = 0.389 / 0.603 = 0.645, which is greater than the
   prior p(F) = 0.5 from the first bullet above. We have improved our estimate!
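
The staged computation above is short in Python (a sketch, with the values
from the bullets):

```python
# Two boxes: box 1 has 2 green, 7 red; box 2 has 4 green, 3 red.
p_F = 0.5                # chose box 1
p_E_given_F = 7 / 9      # red, given box 1
p_E_given_notF = 3 / 7   # red, given box 2

# Total probability of red, then Bayes theorem.
p_E = p_E_given_F * p_F + p_E_given_notF * (1 - p_F)
p_F_given_E = p_E_given_F * p_F / p_E
print(round(p_F_given_E, 3))  # 0.645
```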

Example: Suppose one person in 100,000 has a particular rare disease and that
there is an accurate diagnostic test for this disease. The test is 99% accurate
when given to someone with the disease and is 99.5% accurate when given to
someone who does not have the disease. We can calculate
(a) the probability that someone who tests positive has the disease, and
(b) the probability that someone who tests negative does not have the disease.
Let F be the event that a person has the disease and let E be the event that this
person tests positive. We will use Bayes theorem to calculate (a) and (b), so we
have to calculate p(F), p(F̄), p(E|F), and p(E|F̄).

•  p(F) = 1 / 100000 = 10^−5 and p(F̄) = 1 − p(F) = 0.99999.
•  p(E|F) = 0.99 since someone who has the disease tests positive 99% of the
   time. Similarly, the false negative rate is p(Ē|F) = 0.01. Further,
   p(Ē|F̄) = 0.995 since the test is 99.5% accurate for someone who does not
   have the disease.
•  p(E|F̄) = 0.005, which is the probability of a false positive (100% − 99.5%).

Now we calculate (a):

    p(F|E) = p(E|F)p(F) / (p(E|F)p(F) + p(E|F̄)p(F̄)) =
             (0.99×10^−5) / (0.99×10^−5 + 0.005×0.99999) ≈ 0.002.

Roughly 0.2% of people who test positive actually have the disease. Getting a
positive should not be an immediate cause for alarm (famous last words).

Now we calculate (b):

    p(F̄|Ē) = p(Ē|F̄)p(F̄) / (p(Ē|F̄)p(F̄) + p(Ē|F)p(F))
            = (0.995×0.99999) / (0.995×0.99999 + 0.01×10^−5) ≈ 0.9999999.

Thus, 99.99999% of people who test negative really do not have the disease.

Bayesian Spam Filters used to be the first line of defense for email programs.
Like many good things, the spammers ran right over the process in about two
years. However, it is an interesting example of useful discrete mathematics.

The filtering involves a training period. Email messages need to be marked as
Good or Bad messages, which we will denote as being the G or B sets.
Eventually the filter will mark messages for you, hopefully accurately.

The filter finds all of the words in both sets and keeps a running total of each
word per set. We construct two functions nG(w) and nB(w) that return the
number of messages containing the word w in the G and B sets, respectively.

We use a uniform distribution. The empirical probability that a spam message
contains the word w is p(w) = nB(w) / |B|. The empirical probability that a non-
spam message contains the word w is q(w) = nG(w) / |G|.

We can use p and q to estimate if an incoming message is or is not spam based
on a set of words that we build dynamically over time.

Let E be the event that an incoming message contains the word w. Let S be the
event that an incoming message is spam and contains the word w. Bayes
theorem tells us that the probability that an incoming message containing the
word w is spam is

                p(S|E) = p(E|S)p(S) / (p(E|S)p(S) + p(E|S̄)p(S̄)).

If we assume that p(S) = p(S̄) = 0.5, i.e., that any incoming message is equally
likely to be spam or not, then we get the simplified formula

                       p(S|E) = p(E|S) / (p(E|S) + p(E|S̄)).

We estimate p(E|S) = p(w) and p(E|S̄) = q(w). So, we estimate p(S|E) by

                          r(w) = p(w) / (p(w) + q(w)).

If r(w) is greater than some preset threshold, then we classify the incoming
message as spam. We can consider a threshold of 0.9 to begin with.

Example: Let w = Rolex. Suppose it occurs in 250 / 2000 spam messages and in
5 / 1000 good messages. We will estimate the probability that an incoming
message with Rolex in it is spam assuming that it is equally likely that the
incoming message is spam or not. We know that p(Rolex) = 250 / 2000 = 0.125
and q(Rolex) = 5 / 1000 = 0.005. So,

                r(Rolex) = 0.125 / (0.125 + 0.005) = 0.962 > 0.9.

Hence, we would reject the message as spam. (Note that some of us would reject
all messages with the word Rolex in them as spam, but that is another case entirely.)

Using just one word to determine if a message is spam or not leads to excessive
numbers of false positives and negatives. We actually have to use the
generalized Bayes theorem with a large set of words.

          p(S | E1∩E2∩…∩Ek) = Π_{i=1}^{k} p(Ei|S) / (Π_{i=1}^{k} p(Ei|S) + Π_{i=1}^{k} p(Ei|S̄)),

which we estimate assuming equal probability that an incoming message is
spam or not by

          r(w1,w2,…,wk) = Π_{i=1}^{k} p(wi) / (Π_{i=1}^{k} p(wi) + Π_{i=1}^{k} q(wi)).

Example: The word w1 = stock appears in 400 / 2000 spam messages and in just
60 / 1000 good messages. The word w2 = undervalued appears in 200 / 2000
spam messages and in just 25 / 1000 good messages. Estimate the likelihood that
an incoming message with both words in it is spam. We know p(stock) = 0.2 and
q(stock) = 0.06. Similarly, p(undervalued) = 0.1 and q(undervalued) = 0.025. So,

   r(stock,undervalued) = p(stock)p(undervalued) / (p(stock)p(undervalued) + q(stock)q(undervalued))
                        = (0.2 × 0.1) / (0.2 × 0.1 + 0.06 × 0.025)
                        = 0.930 > 0.9.
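
Both spam estimates can be computed with one small helper; a Python sketch
(the function r here is an illustration, not code from the notes):

```python
from math import prod

def r(ps, qs):
    """Combined spam estimate from per-word spam (p) and non-spam (q) rates."""
    return prod(ps) / (prod(ps) + prod(qs))

print(round(r([0.125], [0.005]), 3))           # Rolex example
print(round(r([0.2, 0.1], [0.06, 0.025]), 3))  # stock, undervalued
```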

Note: Looking for particular pairs or triplets of words and treating each as a
single entity is another method for filtering. For example, enhance performance
probably indicates spam to almost anyone, but high performance computing
probably does not indicate spam to someone in computational sciences (but
probably will for someone working in, say, Maytag repair).

                      Advanced Counting Principles
Definition: A recurrence relation for the sequence {an} is the equation that
expresses an in terms of one or more of the previous terms in the sequence. A
sequence is called a solution to a recurrence relation if its terms satisfy the
recurrence relation. The initial conditions specify the values of the sequence
before the first term where the recurrence relation takes effect.

Note: Recursion and recurrence relations have a connection. A recursive
algorithm provides a solution to a problem of size n in terms of one or more
instances of the same problem of smaller size. Complexity analysis of the
recursive algorithm yields a recurrence relation on the number of operations.

Example: Suppose we have {an} with an = 3n, n∈N. Is this a solution of
an = 2an−1 − an−2 for n≥2? Yes, since for n≥2,

                    2an−1 − an−2 = 2(3(n−1)) – 3(n−2) = 3n = an.

Example: Suppose in 1977 you invested $100,000 into a tax free, 30 year
municipal bond that paid 15% per year. What is it worth at maturity? Did it beat
inflation and if so, by how much?
•  P0 = 100000
•  P1 = 1.15P0
•  P2 = 1.15P1 = (1.15)^2 P0
•  Pi = (1.15)^i P0, which can be rigorously proven using mathematical induction
•  P30 = (1.15)^30 P0 ≈ $6,621,180
This is a big number. What about inflation? We can find the consumer price
increase (CPI) monthly and yearly on the Internet, e.g., http://inflationdata.com.
Consider just the yearly CPI to make the comparison fairer.
•  Let {Ij} be the CPI increase per year.
•  B30 = P0 Π_{j=1}^{30} (1+Ij) = $354,580.
Investing your money in a bank that just beat inflation would have been a huge
investing error. 15% seems high, but that existed back then due to high inflation.

Fibonacci Example: A young pair of rabbits (1 male, 1 female) arrives on a
deserted island. They can breed after they are two months old and produce
another pair. Thereafter each pair at least two months old breeds once a
month. How many pairs fn of rabbits are there after n months?
•  n = 1: f1 = 1              initial
•  n = 2: f2 = 1                conditions
•  n > 2: fn = fn−1 + fn−2  recurrence relation
The n > 2 formula is true since each new pair comes from a pair at least 2
months old.

Example: For bit strings of length n ≥ 3, find the recurrence relation and initial
conditions for the number of bit strings that do not have two consecutive 0’s.
•  n = 1: a1 = 2              initial                    {0,1}
•  n = 2: a2 = 3                conditions               {01,10,11}
•  n > 2: an = an−1 + an−2  recurrence relation
For n > 2, there are two cases: strings ending in 1 (thus, examine the n−1 case)
and strings ending in 10 (thus, examine the n−2 case).

Definition: A linear homogeneous recurrence relation of degree k with constant
coefficients is a recurrence relation of the form

                          an = c1an−1 + c2an−2 + … + ckan−k,

where ci∈R and ck ≠ 0.

Motivation for study: This type of recurrence relation occurs often and can be
systematically solved. Slightly more general ones can be, too. The solution
methods are related to solving certain classes of ordinary differential equations.


•  Linear because the right hand side is a sum of multiples of previous terms.
•  Homogeneous because no terms occur that are not multiples of the aj’s.
•  Constant coefficients because no coefficient is a function of n.
•  Degree k because an is defined in terms of the previous k sequential terms.

Examples: Typical ones include

•  Pn = 1.15Pn−1 has degree 1.
•  fn = fn−1 + fn−2 has degree 2.
•  an = an−5 has degree 5.

Examples: Ones that fail the definition include

•  an = an−1 + (an−2)^2 is nonlinear.
•  Hn = 2Hn−1 + 1 is nonhomogeneous.
•  Bn = nBn−1 has a variable coefficient.

We will get to nonhomogeneous recurrence relations shortly.

Solving a recurrence relation of this form usually starts by assuming that the
solution has the form

                                       an = r^n,

where r∈C. This is a solution if and only if

                         r^n = c1r^(n−1) + c2r^(n−2) + … + ckr^(n−k).

Dividing both sides by r^(n−k) to simplify things, we get

Definition: The characteristic equation is

                         r^k − c1r^(k−1) − c2r^(k−2) − … − ck = 0.

Then {an} with an = r^n is a solution if and only if r is a root of the
characteristic equation. The proof is quite involved.

The degree k = 2 case is much easier to understand, yet still has multiple cases.

Theorem: Assume c1,c2,α1,α2∈R and r1,r2∈C. Suppose that r^2−c1r−c2 = 0 has
two distinct roots r1 and r2. Then the sequence {an} is a solution to the
recurrence relation an = c1an−1 + c2an−2 if and only if an = α1r1^n + α2r2^n for n∈N0.

Example: a0 = 2, a1 = 7, and an = an−1 + 2an−2 for n≥2. Then

•  Characteristic equation: r^2 – r – 2 = 0 or (r−2)(r+1) = 0.
•  Roots: r1 = 2 and r2 = −1.
•  Constants: a0 = 2 = α1 + α2 and a1 = 7 = 2α1 − α2.
•  Solve
       [ 1  1 ] [α1]   [2]        [α1]   [ 3]
       [ 2 −1 ] [α2] = [7]   or   [α2] = [−1].
•  Solution: an = 3·2^n − (−1)^n.
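
The closed-form solution can be checked against the recurrence and initial
conditions; a Python sketch (not from the notes):

```python
# Verify a_n = 3*2^n - (-1)^n solves a_n = a_{n-1} + 2 a_{n-2}, a_0=2, a_1=7.
def a(n):
    return 3 * 2 ** n - (-1) ** n

assert a(0) == 2 and a(1) == 7
assert all(a(n) == a(n - 1) + 2 * a(n - 2) for n in range(2, 20))
print([a(n) for n in range(5)])  # [2, 7, 11, 25, 47]
```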

Matlab or Maple is essential to solving recurrence relations quickly and accurately.

Fibonacci Example: f0 = 0, f1 = 1, and fn = fn−1 + fn−2, n≥2.

•  Characteristic equation: r^2 – r – 1 = 0.
•  Roots: r1 = (1+√5)/2 and r2 = (1−√5)/2.
•  Set up a 2×2 matrix problem to solve for α1 and α2, which are α1 = 1/√5
   and α2 = −1/√5.
•  Solution: fn = (1/√5)((1+√5)/2)^n − (1/√5)((1−√5)/2)^n.

Now comes the second case for degree k = 2.

Theorem: Assume c1,c2,α1,α2∈R and r0∈C. Suppose that r^2−c1r−c2 = 0 has one
root r0 with multiplicity 2. Then the sequence {an} is a solution to the recurrence
relation an = c1an−1 + c2an−2 if and only if an = α1r0^n + α2nr0^n for n∈N0.

Example: a0 = 1, a1 = 6, and an = 6an-1 - 9an-2 for n ≥ 2. Then

   Characteristic equation: r^2 - 6r + 9 = 0 or (r-3)^2 = 0.
   Double root: r0 = 3.
   Constants: a0 = 1 = α1 and a1 = 6 = 3α1 + 3α2.
   Solve
            [1  0] [α1]   [1]       [α1]   [1]
            [3  3] [α2] = [6]   or  [α2] = [1].
   Solution: an = (n+1)3^n.

Theorem: Let {ci}i=1..k, {αi}i=1..k ⊂ R and {ri}i=1..k ⊂ C. Suppose the characteristic
equation r^k - c1r^(k-1) - … - ck = 0 has k distinct roots ri, 1 ≤ i ≤ k. Then the sequence
{an} is a solution of the recurrence relation an = c1an-1 + c2an-2 + … + ckan-k if
and only if an = α1r1^n + α2r2^n + ... + αkrk^n for n ∈ N0.

Example: a0 = 2, a1 = 5, a2 = 15, and an = 6an-1 - 11an-2 + 6an-3, n ≥ 3.

   Characteristic equation: r^3 - 6r^2 + 11r - 6 = 0 or (r-1)(r-2)(r-3) = 0.
   Roots: r1 = 1, r2 = 2, and r3 = 3.
   Constants: a0 = 2 = α1 + α2 + α3, a1 = 5 = α1 + 2α2 + 3α3, and
     a2 = 15 = α1 + 4α2 + 9α3.
   Solve
            [1  1  1] [α1]   [ 2]       [α1]   [ 1]
            [1  2  3] [α2] = [ 5]   or  [α2] = [-1].
            [1  4  9] [α3]   [15]       [α3]   [ 2]
   Solution: an = 1 - 2^n + 2·3^n.
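Note: The constants αi always come from a small linear system like the one above. A Python sketch that solves it exactly (the `solve` helper is illustrative, using rational arithmetic):

```python
# Sketch: recover alpha_1, alpha_2, alpha_3 for a_n = alpha_1*1^n +
# alpha_2*2^n + alpha_3*3^n from a_0 = 2, a_1 = 5, a_2 = 15 by solving
# the 3x3 linear system with exact rational arithmetic.
from fractions import Fraction

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting over the rationals."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

roots = [1, 2, 3]
A = [[r**n for r in roots] for n in range(3)]   # row n encodes a_n = sum alpha_i r_i^n
alphas = solve(A, [2, 5, 15])
assert alphas == [1, -1, 2]                     # a_n = 1 - 2^n + 2*3^n
```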

Theorem: Let {ci}i=1..k ⊂ R, {αi,j} ⊂ R, and {ri}i=1..t ⊂ C. Suppose the characteristic
equation r^k - c1r^(k-1) - … - ck = 0 has t distinct roots ri, 1 ≤ i ≤ t, with multiplicities
mi ∈ N such that m1 + m2 + … + mt = k. Then the sequence {an} is a solution of the
recurrence relation an = c1an-1 + c2an-2 + … + ckan-k if and only if

   an = (α1,0 + α1,1n + ... + α1,m1-1n^(m1-1))r1^n + ... + (αt,0 + αt,1n + ... + αt,mt-1n^(mt-1))rt^n

for n ∈ N0 and all αi,j ∈ R, 1 ≤ i ≤ t and 0 ≤ j ≤ mi-1.

Example: Suppose the roots of the characteristic equation are 2, 2, 3, 3, 3, 5.
Then the general solution form is

                   (α1,0 + α1,1n)2^n + (α2,0 + α2,1n + α2,2n^2)3^n + α3,0·5^n.

With given initial conditions, we can even compute the α’s.

Definition: A linear nonhomogeneous recurrence relation of degree k with
constant coefficients is a recurrence relation of the form

                     an = c1an-1 + c2an-2 + … + ckan-k + F(n),

where {ci} ⊂ R.

Theorem: If {an^(p)} is a particular solution of the recurrence relation with
constant coefficients an = c1an-1 + c2an-2 + … + ckan-k + F(n), then every solution
is of the form {an^(p) + an^(h)}, where {an^(h)} is a solution of the associated
homogeneous recurrence relation (i.e., F(n) = 0).

Note: Finding particular solutions for given F(n)’s is loads of fun unless F(n) is
rather simple. Usually you solve the homogeneous form first, then try to find a
particular solution from that.

Theorem: Assume {bi},{ci} ⊂ R. Suppose that {an} satisfies the nonhomogeneous
recurrence relation

                     an = c1an-1 + c2an-2 + … + ckan-k + F(n), where
                     F(n) = (btn^t + bt-1n^(t-1) + … + b1n + b0)s^n.

When s is not a root of the characteristic equation of the associated
homogeneous recurrence relation, there is a particular solution of the form

                        (ptn^t + pt-1n^(t-1) + … + p1n + p0)s^n.

When s is a root of multiplicity m of the characteristic equation, there is a
particular solution of the form

                        n^m(ptn^t + pt-1n^(t-1) + … + p1n + p0)s^n.

Note: If s = 1, then things get even more complicated.

Example: Let an = 6an-1 - 9an-2 + F(n). When F(n) = 0, the characteristic equation
is (r-3)^2 = 0. Thus, r0 = 3 with multiplicity 2.

     F(n) = 3^n:      particular solution is n^2·p0·3^n.
     F(n) = n3^n:     particular solution is n^2(p1n + p0)3^n.
     F(n) = n^2·2^n:  particular solution is (p2n^2 + p1n + p0)2^n.
     F(n) = (n+1)3^n: particular solution is n^2(p1n + p0)3^n.

Definition: Suppose a recursive algorithm divides a problem of size n into a
subproblems of size n/b each. Also suppose that g(n) extra operations are
required to combine the a subproblems into a solution of the problem of size n.
If f(n) is the cost of solving a problem of size n, then the divide and conquer
recurrence relation is f(n) = af(n/b) + g(n).

We can easily work out a general cost for the divide and conquer recurrence
relation using Big-Oh notation.

Divide and Conquer Theorem: Let a,b,c,d ∈ R and be nonnegative. The solution to
the recurrence relation

                        f(n) = c                 for n = 1,
                        f(n) = af(n/b) + cn^d    for n > 1,

for n a power of b is

                        f(n) = O(n^d)            for a < b^d,
                        f(n) = O(n^d·logn)       for a = b^d,
                        f(n) = O(n^(logb a))     for a > b^d.

Proof: If n is a power of b, then for r = a/b^d, f(n) = cn^d Σi=0..logb(n) r^i. There
are 3 cases:

   a < b^d: Then Σi=0..∞ r^i converges, so f(n) = O(n^d).
   a = b^d: Then each term in the sum is 1, so f(n) = O(n^d·logn).
   a > b^d: Then cn^d Σi=0..logb(n) r^i = cn^d·(r^(logb(n)+1) - 1)/(r - 1), which is
     O(a^(logb n)) or O(n^(logb a)).

Example: Recall binary search (see page 45 in the class notes). Searching for an
element in a set requires 2 comparisons to determine which half of the set to
search further. The search keeps halving the size of the set until at most 1
element is left. Hence, f(n) = f(n/2) + 2. Using the Divide and Conquer theorem,
we see that the cost is O(logn) comparisons.
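Note: The comparison count can be observed directly. A Python sketch (an instrumented binary search; the names are illustrative):

```python
# Sketch: count comparisons in binary search to see the O(log n)
# behavior predicted by f(n) = f(n/2) + 2.
import math

def binary_search(xs, target):
    """Return (index or None, number of comparisons made)."""
    lo, hi, comps = 0, len(xs) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comps += 1                      # first comparison: equality test
        if xs[mid] == target:
            return mid, comps
        comps += 1                      # second comparison: pick a half
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, comps

xs = list(range(1024))
_, comps = binary_search(xs, 1023)      # near-worst case
assert comps <= 2 * math.log2(len(xs)) + 2
```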

Example: Recall merge sort (see pages 81-83 in the class notes). This sorts
halves of sets of elements and requires less than n comparisons to put the two
sorted sublists into a sorted list of size n. Hence, f(n) = 2f(n/2) + n. Using the
Divide and Conquer theorem, we see that the cost is O(nlogn) comparisons.

Multiplying integers can be done recursively based on a binary decomposition
of the two numbers to get a fast algorithm. The patent on this technique,
implemented in hardware, made a computer company several billion dollars
back when a billion dollars was real money (cf. a trillion dollars today).

Why stop with integers? The technique extends to multiplying matrices, too,
with real, complex, or integer entries.

Example (funny integer multiplication): Suppose a and b have 2n length binary
representations a = (a2n-1a2n-2… a1a0)2 and b = (b2n-1b2n-2… b1b0)2. We will
divide a and b into left and right halves:

                    a = 2^n·A1 + A0 and b = 2^n·B1 + B0, where
                A1 = (a2n-1a2n-2…an+1an)2 and A0 = (an-1an-2…a1a0)2,
                B1 = (b2n-1b2n-2…bn+1bn)2 and B0 = (bn-1bn-2…b1b0)2.

The trick is to notice that

               ab = (2^(2n) + 2^n)A1B1 + 2^n(A1 - A0)(B0 - B1) + (2^n + 1)A0B0.

Only 3 multiplies plus adds, subtracts, and shifts are required. So, f(2n) = 3f(n) +
Cn, where C is the cost of the adds, subtracts, and shifts. The Divide and
Conquer theorem tells us this is O(n^(log 3)), which is about O(n^1.6). The standard
algorithm is O(n^2). It might not seem like much of an improvement, but it
actually is when lots of integers are multiplied together. The trick can be applied
recursively on the three multiplies in the ab line (halving 2n in the recursion).
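Note: A Python sketch of the 3-multiply trick above (the function name and base-case cutoff are illustrative choices, not from the notes):

```python
# Sketch: the identity ab = (2^(2n)+2^n)A1B1 + 2^n(A1-A0)(B0-B1)
# + (2^n+1)A0B0 applied recursively, so each level needs only three
# half-size multiplies.

def fastmul(a, b, n):
    """Multiply nonnegative integers a, b of at most 2n bits each."""
    if n <= 8:                               # small operands: just multiply
        return a * b
    mask = (1 << n) - 1
    A1, A0 = a >> n, a & mask                # left and right n-bit halves
    B1, B0 = b >> n, b & mask
    p1 = fastmul(A1, B1, n // 2)             # A1*B1
    p3 = fastmul(A0, B0, n // 2)             # A0*B0
    d1, d2 = A1 - A0, B0 - B1                # middle factor may be negative,
    sign = -1 if (d1 < 0) != (d2 < 0) else 1 # so track the sign separately
    p2 = sign * fastmul(abs(d1), abs(d2), n // 2)
    return ((1 << (2 * n)) + (1 << n)) * p1 + (1 << n) * p2 + ((1 << n) + 1) * p3

a, b = 123456789123456789, 987654321987654321
assert fastmul(a, b, 32) == a * b            # both operands fit in 64 bits
```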

Example (Strassen-Winograd Matrix-Matrix multiplication): We want to
multiply A: mk by B: kn to get C: mn. The matrix elements can be reals,
complex numbers, or integers. When m = k = n, this takes O(n3) operations
using the standard matrix-matrix multiplication algorithm. However, Strassen
first proposed a divide and conquer algorithm that reduced the exponent. The
belief is that someday, someone will devise an O(n2) algorithm. Some hope it
will even be plausible to use such an algorithm. The variation of Strassen’s
algorithm that is most commonly implemented by computer vendors in high
performance math libraries is the Winograd variant. It computes the product as

                         [A11 A12] [B11 B12]   [C11 C12]
                         [A21 A22] [B21 B22] = [C21 C22].

C is computed in 22 steps involving the submatrices of A, B, and intermediate
temporary submatrices. An interesting question for many years was how little
extra memory was needed to implement the Strassen-Winograd algorithm (see
C. C. Douglas, M. Heroux, G. Slishman, and R. M. Smith, GEMMW: A
portable Level 3 BLAS Winograd variant of Strassen's matrix-matrix multiply
algorithm, Journal of Computational Physics, 110 (1994), pp. 1-10 for an
answer).

The 22 steps are the following:

              Step  Wmk  C11  C12  C21  C22  Wkn  Operation
                1                            S7   B22 - B12
                2   S3                            A11 - A21
                3                  M4             S3·S7
                4   S1                            A21 + A22
                5                            S5   B12 - B11
                6                       M5        S1·S5
                7                            S6   B22 - S5
                8   S2                            S1 - A11
                9        M1                       S2·S6
               10   S4                            A12 - S2
               11             M6                  S4·B22
               12             T3                  M5 + M6
               13   M2                            A11·B11
               14        T1                       M1 + M2
               15             C12                 T1 + T3
               16        T2                       T1 + M4
               17                            S8   S6 - B21
               18                  M7             A22·S8
               19                  C21            T2 - M7
               20                       C22       T2 + M5
               21        M3                       A12·B21
               22        C11                      M2 + M3

There are four tricky steps in the table above, depending on whether k is
even or odd. Each step makes certain that we do not use more memory than is
allocated for a submatrix or temporary. For example,

   In step 4, we have to take care with S1. (a) If k is odd, then copy the
    first column of A21 into Wmk. (b) Complete S1.
   In step 10, we have to take care with S4. (a) If k is odd, then pretend the
    first column of A21 = 0 in Wmk. (b) Complete S4.
   In step 11, we have to take care with M6. (a) If m is odd, then save the
    first row of M5. (b) Calculate most of M6. (c) Complete M6 using (a) based
    on whether or not m is odd.
   In step 21, we have to take care with M3. (a) Calculate M3 using an
    index shift.

This all sounds very complicated. However, the code GEMMW that is readily
available on the Web is effectively implemented in 27 calls to subroutines that
do the matrix operations and actually implements

                             C = α·op(A)·op(B) + β·C,

where op(X) is either X, X transpose, X conjugate, or X conjugate transpose.

What is the total cost?

   There are 7 submatrix-submatrix multiplies and 15 submatrix-submatrix
    adds or subtracts. So the cost is f(n) = 7f(n/2) + 15n^2/4 when m=k=n. This is
    actually an O(n^2.807) algorithm, where log2 7 ≈ 2.807.
   The work area Wmk needs ((m+1)·max(k,n)+m+4)/4 space.
   The work area Wkn needs ((k+1)·n+n+4)/4 space.
   If C overlaps A or B in memory, an additional mn space is needed to save C
    before calculating C when β ≠ 0.
   The maximum amount of extra memory is bounded by
    (m·max(k,n)+kn)/3 + (m+max(k,n)+k+3n)/2 + 32 + mn. Hence, the overall
    extra storage is cN^2/3, where c ∈ {2,5}.
   Typical memory usage when m=k=n is
       o β ≠ 0 or A or B overlap with C: 1.67N^2.
       o β = 0 and A and B do not overlap with C: 0.67N^2.

Definition: The (ordinary) generating function for a sequence a0, a1, …, ak, … of
real numbers is the infinite series G(x) = Σk=0..∞ akx^k. For a finite sequence
{ak}k=0..n, the generating function is G(x) = Σk=0..n akx^k.

Examples:

  1. ak = 3:         G(x) = 3·Σk=0..∞ x^k.
  2. ak = k+1:       G(x) = Σk=0..∞ (k+1)x^k.
  3. ak = 2^k:       G(x) = Σk=0..∞ (2x)^k.
  4. ak = 1, 0≤k≤2:  G(x) = Σk=0..2 x^k = (x^3 - 1)/(x - 1).


   x is a placeholder, so the fact that G(1) in example 4 above is undefined does
    not matter. We do not have to worry about convergence of the series, either.
   When solving a series using calculus, knowing the ball of convergence for
    the x’s is required.

Lemma: f(x) = (1 - ax)^-1 is the generating function for the sequence 1, ax, (ax)^2,
…, (ax)^k, … since for a ≠ 0 and |ax| < 1,

                         (1 - ax)^-1 = Σk=0..∞ (ax)^k.

                                              
Theorem: If f(x) = Σk=0..∞ akx^k and g(x) = Σk=0..∞ bkx^k and f and g share the same
ball of convergence, then

     f(x) + g(x) = Σk=0..∞ (ak + bk)x^k and f(x)g(x) = Σk=0..∞ (Σj=0..k ajbk-j)x^k.

Example: Let f(x) = (1 - x)^-2 be the generating function. What is the sequence?
Consider the sequence 1, 1, …, 1, …, which has a generating function of g(x) =
(1 - x)^-1. We can use the previous theorem to answer our question:

          (1 - x)^-2 = Σk=0..∞ (Σj=0..k 1)x^k = Σk=0..∞ (k+1)x^k or ak = k+1.

Definition: The extended binomial coefficient C(u,k) for u ∈ R and k ∈ N0 is
defined by

          C(u,k) = u(u-1)…(u-k+1)/k!   if k > 0,
          C(u,k) = 1                   if k = 0.

Extended Binomial Theorem: If u,x ∈ R such that |x| < 1, then

                         (1 + x)^u = Σk=0..∞ C(u,k)x^k.

Examples:

  1. C(1/2, 2) = (1/2)(-1/2)/2! = -1/8.
  2. C(-n, r) = (-1)^r·C(n+r-1, r) for n ∈ N.
  3. If u ∈ N, then the extended binomial theorem is equivalent to the binomial
     theorem since C(u,k) = 0 when k > u.
  4. (1 - x)^-n = Σk=0..∞ C(n+k-1, k)x^k (uses examples 2 and 3).

Other Useful Generating Functions:

    (1 - x^(n+1))/(1 - x) = Σk=0..n x^k.
    (1 - (ax)^r)^-1 = Σk=0..∞ (ax)^rk.
    (1 - (ax)^r)^-n = Σk=0..∞ C(n+k-1, k)(ax)^rk.
    (1 + (ax)^r)^n = Σk=0..n C(n, k)(ax)^rk.
    (1 + (ax)^r)^-n = Σk=0..∞ (-1)^k·C(n+k-1, k)(ax)^rk.
    e^x = Σk=0..∞ x^k/k!.
    ln(1 + x) = Σk=1..∞ (-1)^(k+1)·x^k/k.

Note: Generating functions can be used to solve many counting problems.


   How many solutions are there to the constrained problem a+b = 9 for 3 ≤ a ≤ 5
    and 4 ≤ b ≤ 6? There are 3 total. The number of solutions with the constraints
    is the coefficient of x^9 in (x^3+x^4+x^5)(x^4+x^5+x^6). We choose x^a and x^b from
    the two factors, respectively, so that a+b = 9. By inspection, there are only 3
    choices for a and b.
   How many ways can 8 CPUs be distributed in 3 servers if each server gets
    2-4 CPUs each? The generating function is f(x) = (x^2+x^3+x^4)^3. We need the
    coefficient of x^8 in f(x). Expansion of f(x) gives us 6 ways.

Note: Maple or Mathematica is really useful in the examples above.
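Note: The coefficient extraction in both examples can also be done by brute-force polynomial multiplication. A Python sketch (the `polymul` helper is illustrative):

```python
# Sketch: extract the generating-function coefficients from the two
# counting examples by explicit polynomial multiplication.

def polymul(p, q):
    """Multiply polynomials given as dicts {exponent: coefficient}."""
    r = {}
    for i, a in p.items():
        for j, b in q.items():
            r[i + j] = r.get(i + j, 0) + a * b
    return r

# a + b = 9 with 3 <= a <= 5 and 4 <= b <= 6: coefficient of x^9.
f = polymul({3: 1, 4: 1, 5: 1}, {4: 1, 5: 1, 6: 1})
assert f[9] == 3

# 8 CPUs in 3 servers, 2-4 each: coefficient of x^8 in (x^2+x^3+x^4)^3.
g = {2: 1, 3: 1, 4: 1}
g3 = polymul(polymul(g, g), g)
assert g3[8] == 6
```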

Note: Generating functions are useful in solving recurrence relations, too.

Example: ak = 3ak-1, k > 0 with a0 = 2. Let f(x) = Σk=0..∞ akx^k be the generating
function for {ak}. Then xf(x) = Σk=1..∞ ak-1x^k. Using the recurrence relation
directly, we have

                   f(x) - 3xf(x) = Σk=0..∞ akx^k - 3·Σk=1..∞ ak-1x^k
                                 = a0 + Σk=1..∞ (ak - 3ak-1)x^k
                                 = a0.

Hence, f(x) - 3xf(x) = (1 - 3x)f(x) = 2 or f(x) = 2/(1 - 3x). Using the identity for
(1 - ax)^-1, we see that

                        f(x) = Σk=0..∞ 2·3^k·x^k or ak = 2·3^k.

Example: an = 8an-1 + 10^(n-1) with a0 = 1, which gives us a1 = 9. Find an in closed
form. First multiply the recurrence relation by x^n to give us
anx^n = 8an-1x^n + 10^(n-1)x^n. If f(x) = Σk=0..∞ akx^k, then

                        f(x) - 1 = Σk=1..∞ akx^k
                                 = Σk=1..∞ (8ak-1x^k + 10^(k-1)x^k)
                                 = 8xf(x) + x/(1 - 10x).

Hence,

                   f(x) = (1 - 9x) / ((1 - 8x)(1 - 10x))
                        = (1/2)·(1/(1 - 8x) + 1/(1 - 10x))
                        = Σk=0..∞ (1/2)(8^k + 10^k)x^k,

                   or   an = (8^n + 10^n)/2.
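Note: Both closed forms are easy to verify against their recurrences. A Python sketch (the `seq` helper is illustrative):

```python
# Sketch: check the two closed forms obtained by generating functions
# against their defining recurrences.

def seq(rec, a0, n_max):
    """Build a_0..a_{n_max} from a recurrence rec(a, n) -> a_n."""
    a = [a0]
    for n in range(1, n_max + 1):
        a.append(rec(a, n))
    return a

# a_k = 3a_{k-1}, a_0 = 2  ->  a_k = 2*3^k
s1 = seq(lambda a, n: 3 * a[n - 1], 2, 8)
assert all(s1[k] == 2 * 3**k for k in range(9))

# a_n = 8a_{n-1} + 10^{n-1}, a_0 = 1  ->  a_n = (8^n + 10^n)/2
s2 = seq(lambda a, n: 8 * a[n - 1] + 10**(n - 1), 1, 8)
assert all(2 * s2[n] == 8**n + 10**n for n in range(9))
```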

Note: It is possible to prove many identities using generating functions.

Inclusion-Exclusion Theorem: Given sets Ai, 1 ≤ i ≤ n, the number of elements in
the union is

        |A1 ∪ A2 ∪ … ∪ An| = Σi |Ai| - Σi<j |Ai ∩ Aj|
                             + Σi<j<k |Ai ∩ Aj ∩ Ak| - …
                             + (-1)^(n+1)·|A1 ∩ A2 ∩ … ∩ An|

and there are 2^n - 1 terms in the formula.

Note: Venn diagrams motivate the above theorem.

Example: A factory produces vehicles that are car or truck based: 2000 could be
cars, 4000 could be trucks, and 3200 are SUV’s, which can be car or truck based
(depending on the frames). How many vehicles were produced? Let A1 be the
set of cars and A2 be the set of trucks. There are

        |A1 ∪ A2| = |A1| + |A2| - |A1 ∩ A2| = 2000 + 4000 - 3200 = 2800.
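Note: The formula can be checked directly on explicit sets. A Python sketch (the `union_size` helper is illustrative):

```python
# Sketch: |A1 u ... u An| by inclusion-exclusion, checked against the
# directly computed union.
from itertools import combinations

def union_size(sets):
    """Sum over nonempty subcollections S of (-1)^(|S|+1)|intersection of S|."""
    total = 0
    for r in range(1, len(sets) + 1):
        for combo in combinations(sets, r):
            inter = set.intersection(*map(set, combo))
            total += (-1) ** (r + 1) * len(inter)
    return total

A1 = set(range(10, 25))                 # 15 elements
A2 = set(range(20, 40))                 # 20 elements, overlap of 5 with A1
A3 = set(range(0, 12))
assert union_size([A1, A2, A3]) == len(A1 | A2 | A3)
```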

Theorem: The number of onto functions from a set of m elements to a set of n
elements with m,n ∈ N is

        n^m - C(n,1)(n-1)^m + C(n,2)(n-2)^m - … + (-1)^(n-1)·C(n,n-1)·1^m.

Definition: A derangement is a permutation of objects such that no object is in
its original position.

Theorem: The number of derangements of a set of n elements is

                 Dn = n!·(1 - 1/1! + 1/2! - … + (-1)^n/n!) = n!·Σk=0..n (-1)^k/k!.

Example: I hand back graded exams randomly. What is the probability that no
student gets his or her own exam? It is Pn = Dn/n! since there are n! possible
permutations. As n→∞, Pn→e^-1.
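Note: A Python sketch of Dn and the limit Pn → e^-1 (the `derangements` helper is illustrative):

```python
# Sketch: D_n = n! * sum_{k=0}^{n} (-1)^k / k!, and P_n = D_n/n! -> 1/e.
from math import factorial, exp, isclose

def derangements(n):
    """Count permutations with no fixed point via the alternating sum."""
    total = 0
    for k in range(n + 1):
        total += (-1) ** k * factorial(n) // factorial(k)   # exact integers
    return total

assert [derangements(n) for n in range(6)] == [1, 0, 1, 2, 9, 44]
p10 = derangements(10) / factorial(10)
assert isclose(p10, exp(-1), rel_tol=1e-6)   # already very close to 1/e
```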


Definition: A relation on a set A is a subset of A×A.

Definition: A binary relation between two sets A and B is a subset of A×B. It is
a set R of ordered pairs; we write aRb when (a,b) ∈ R, and aRb fails when (a,b) ∉ R.

Definition: An n-ary relation on n sets A1, …, An is a subset of A1×…×An. Each
Ai is a domain of the relation and n is the degree of the relation.


   Let f: AB be a function. Then the ordered pairs (a,f(a)), aA, forms a
    binary relation.
   Let A = {Springfield} and B = {U.S. state | Springfield in the state}. Then
    (Springfield,U.S. states) is a relation with about 44 elements (the so-called
    Simpsons relation).
Theorem: Let A be a set with n elements. There are 2^(n^2) unique relations on A.

Proof: We know there are n^2 elements in A×A and that there are 2^m possible
subsets of a set with m elements. Hence, the result.

Definitions: Consider a relation R on a set A. Then
  R is reflexive if (a,a) ∈ R, ∀a ∈ A.
  R is symmetric if (a,b) ∈ R implies (b,a) ∈ R, ∀a,b ∈ A.
  R is antisymmetric if (a,b) ∈ R and (b,a) ∈ R imply a = b, ∀a,b ∈ A.
  R is transitive if (a,b) ∈ R and (b,c) ∈ R imply (a,c) ∈ R, ∀a,b,c ∈ A.

Theorem: Let A be a set with n elements. There are 2^(n(n-1)) unique reflexive
relations on A.

Proof: Each of the n pairs (a,a) ∈ R. The remaining n^2 - n = n(n-1) pairs may or
may not be in R. The product rule and previous theorem give the result.
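Note: Both counting theorems can be brute-forced for n = 3. A Python sketch:

```python
# Sketch: enumerate every relation on A = {1,2,3} (every subset of A x A)
# and count them all, and count the reflexive ones.
from itertools import product

A = [1, 2, 3]
pairs = [(a, b) for a in A for b in A]           # the n^2 candidate pairs

count_all = 0
count_reflexive = 0
for bits in product([0, 1], repeat=len(pairs)):  # each subset of A x A
    R = {p for p, bit in zip(pairs, bits) if bit}
    count_all += 1
    if all((a, a) in R for a in A):
        count_reflexive += 1

assert count_all == 2 ** 9         # 2^(n^2) with n = 3
assert count_reflexive == 2 ** 6   # 2^(n(n-1)) with n = 3
```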

Examples: Let A = {1, 2, 3, 4}.

   R1 = {(1,1), (1,2), (2,1), (2,2), (3,4), (4,1), (4,4)} is
      o just a relation
   R2 = {(1,1), (1,2), (2,1)} is
      o symmetric
   R3 = {(1,1), (1,2), (1,4), (2,1), (2,2), (3,3), (4,1), (4,4)} is
      o reflexive and symmetric
   R4 = {(2,1), (3,1), (3,2), (4,1), (4,2), (4,3)} is
      o antisymmetric and transitive
   R5 = {(1,1), (1,2), (1,3), (1,4), (2,1), (2,2), (2,3), (2,4), (3,3), (3,4), (4,1),
    (4,4)} is
      o reflexive
   R6 = {(3,4)} is
      o antisymmetric
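Note: These properties can be tested mechanically. A Python sketch checking a few of the claims above (the predicate names are mine):

```python
# Sketch: the four property definitions as predicates, applied to some
# of the example relations on A = {1,2,3,4}.

def reflexive(R, A):  return all((a, a) in R for a in A)
def symmetric(R):     return all((b, a) in R for (a, b) in R)
def antisymmetric(R): return all(a == b for (a, b) in R if (b, a) in R)
def transitive(R):    return all((a, d) in R
                                 for (a, b) in R for (c, d) in R if b == c)

A = {1, 2, 3, 4}
R2 = {(1, 1), (1, 2), (2, 1)}
R4 = {(2, 1), (3, 1), (3, 2), (4, 1), (4, 2), (4, 3)}
R6 = {(3, 4)}

assert symmetric(R2) and not transitive(R2)   # (2,1),(1,2) but no (2,2)
assert antisymmetric(R4) and transitive(R4)
assert antisymmetric(R6)
```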

Note: We will come back to these examples when we get around to
representations of relations that work in a computer.

Note: We can combine two or more relations to get another relation. We use
standard set operations (e.g., union, intersection, difference, symmetric difference, …).

Definition: Let R be a relation from a set A to B and S a relation from B to a set C.
Then the composite of R and S is the relation S∘R such that if (a,b) ∈ R and
(b,c) ∈ S, then (a,c) ∈ S∘R, where a ∈ A, b ∈ B, and c ∈ C.

Definition: Let R be a relation on a set A. Then R^n is defined recursively: R^1 = R
and R^n = R^(n-1)∘R, n > 1.

Theorem: The relation R is transitive if and only if R^n ⊆ R, ∀n ≥ 1.

Representation: The relation R from a set A to a set B can be represented by a
zero-one matrix MR = [mij], where

                        mij = 1 if (ai,bj) ∈ R,
                        mij = 0 if (ai,bj) ∉ R.

   This is particularly useful on computers, particularly ones with hardware bit
    operations for packed words.
   MR has all 1’s on its diagonal (it contains I) for reflexive relations.
   MR = MR^T for symmetric relations.
   mij = 0 or mji = 0 when i ≠ j for antisymmetric relations.


        [1 1 0]
   MR = [1 1 1]  is reflexive and symmetric.
        [0 1 1]

        [0 1 0]
   MR = [0 0 0]  is antisymmetric.
        [0 1 0]

Representation: A relation can be represented as a directed graph (or digraph).
For (a,b)R, a and b are vertices (or nodes) in the graph and a directional edge
runs from a to b.

Example: The digraph with vertices a, b, and c and directed edges a→b, b→c,
c→a, and c→b represents {(a,b), (b,c), (c,a), (c,b)}.


What about all of those examples on page 159 of the class notes? We can do all
of them over in either representation.

Examples (from page 159):

         [1 1 0 0]
   MR1 = [1 1 0 0]
         [0 0 0 1]
         [1 0 0 1]

         [1 1 0 0]
   MR2 = [1 0 0 0]  or a digraph with edges a1→a1, a1→a2, and a2→a1
         [0 0 0 0]
         [0 0 0 0]

         [1 1 0 1]
   MR3 = [1 1 0 0]
         [0 0 1 0]
         [1 0 0 1]

         [0 0 0 0]
   MR4 = [1 0 0 0]
         [1 1 0 0]
         [1 1 1 0]

         [1 1 1 1]
   MR5 = [1 1 1 1]
         [0 0 1 1]
         [1 0 0 1]

         [0 0 0 0]
   MR6 = [0 0 0 0]  or a digraph with the single edge a3→a4
         [0 0 0 1]
         [0 0 0 0]

Definition: A relation on a set A is an equivalence relation if it is reflexive,
symmetric, and transitive. Two elements a and b that are related by an
equivalence relation are called equivalent and denoted a~b.


   Let A = Z. Define aRb if and only if either a = b or a = -b.
      o reflexive: aRa since a = a.
      o symmetric: aRb ⇒ bRa since b = ±a.
      o transitive: aRb and bRc ⇒ aRc since a = ±b and b = ±c imply a = ±c.
   Let A = R. Define aRb if and only if a - b ∈ Z.
      o reflexive: aRa since a - a = 0 ∈ Z.
      o symmetric: aRb ⇒ bRa since a - b ∈ Z ⇒ -(a - b) = b - a ∈ Z.
      o transitive: aRb and bRc ⇒ aRc since (a - b) + (b - c) = a - c ∈ Z.

Definition: Let R be an equivalence relation on a set A. The set of all elements
that are related to an element a ∈ A is called the equivalence class of a and is
denoted by [a]R. When R is obvious, it is just [a]. If b ∈ [a]R, b is called a
representative of this equivalence class.

Example: Let A = Z. Define aRb if and only if either a = b or a = -b. There are
two cases for the equivalence class:
   [0] = {0}
   [a] = {a, -a} if a ≠ 0.

Theorem: Let R be an equivalence relation on a set A. For a,b ∈ A, the following
are equivalent:
       1. aRb
       2. [a] = [b]
       3. [a] ∩ [b] ≠ ∅.

Proof: 1 ⇒ 2 ⇒ 3 ⇒ 1.
           1 ⇒ 2: Assume aRb. Suppose c ∈ [a]. Then aRc. Due to symmetry,
             we know that bRa. Knowing that bRa and aRc, by transitivity,
             bRc. Hence, c ∈ [b]. A similar argument shows that if c ∈ [b], then
             c ∈ [a]. Hence, [a] = [b].
           2 ⇒ 3: Assume that [a] = [b]. Since a ∈ [a] (R is reflexive), [a] ∩ [b] ≠ ∅.
           3 ⇒ 1: Assume [a] ∩ [b] ≠ ∅. So there is a c ∈ [a] and c ∈ [b], too. So, aRc
             and bRc. By symmetry, cRb. By transitivity, aRc and cRb, so aRb.

Lemma: For any equivalence relation R on a set A, ∪a∈A [a]R = A.

Proof: For all a ∈ A, a ∈ [a]R.

Definition: A partition of a set S is a collection of disjoint nonempty sets whose
union is S.

Theorem: Let R be an equivalence relation on a set S. Then the equivalence
classes of R form a partition of S. Conversely, given a partition {Ai | i ∈ I} of the
set S, there is an equivalence relation R that has the sets Ai, i ∈ I, as its
equivalence classes.

Definition: A graph G = (V,E) consists of a nonempty set of vertices V and a set
of edges E. Each edge has either one or two vertices as endpoints. An edge
connects its endpoints.

Note: We will only study finite graphs (|V| < ∞).


   A simple graph has edges that each connect two different vertices, and no two
    edges connect the same pair of vertices.
   A multigraph may have multiple edges connecting the same pair of vertices.
   A loop is an edge from a vertex back to itself.
   A pseudograph is a multigraph that may also contain loops.
   An undirected graph is a graph in which the edges do not have direction.
   A mixed graph has both directed and undirected edges.

Definition: Two vertices u and v in an undirected graph G are adjacent (or
neighbors) in G if u and v are endpoints of an edge e in G. Edge e is incident to
{u,v} and e connects u and v.

Definition: The degree of a vertex v, denoted deg(v), in an undirected graph is
the number of edges incident with it except that loops contribute twice to the
degree of that vertex. If deg(v) = 0, then v is isolated. If deg(v) = 1, then v is a
pendant vertex.

Handshaking Theorem: If G = (V,E) is an undirected graph with e edges, then
e = (Σv∈V deg(v))/2.

Proof: Each edge contributes 2 to the sum since it is incident with 2 vertices.

Example: Let G = (V,E). Suppose |V| = 100,000 and deg(v) = 4 for all v ∈ V.
Then there are (4·100,000)/2 = 200,000 edges.

Theorem: An undirected graph has an even number of vertices of odd degree.

Definition: Let (u,v)E in a directed graph G(V,E). Then u and v are the initial
and terminal vertices of (u,v), respectively. The initial and terminal vertices of a
loop (u,u) are both u.

Definition: The in-degree of a vertex v, denoted deg-(v), is the number of edges
with v as their terminal vertex. The out-degree of a vertex, denoted deg+(v), is
the number of edges with v as their initial vertex.

Theorem: For a directed graph G(V,E), Σv∈V deg-(v) = Σv∈V deg+(v) = |E|.

Examples of Simple Graphs:

   A complete graph has an edge between every pair of distinct vertices.
   A cycle Cn is a graph with |V| = n ≥ 3 whose n edges are {v1,v2},
    {v2,v3}, …, {vn,v1}.
   A wheel Wn is a cycle Cn with an extra vertex with an edge connecting to
    each vertex in Cn.

Definition: A simple graph G = (V,E) is bipartite if V = V1 ∪ V2 with V1 ∩ V2 = ∅
and every edge in the graph connects a vertex in V1 to a vertex in V2. The pair
(V1,V2) is a bipartition of V in G.

Theorem: A simple graph is bipartite if and only if it is possible to assign one of
two colors to each vertex of the graph so that no two adjacent vertices are
assigned the same color.

Definition: The union of two simple graphs G = (V,E) and H = (W,F) is the
simple graph G ∪ H = (V ∪ W, E ∪ F).

Representation: For graphs without multiple edges we can use adjacency lists or
matrices. For general graphs we can use incidence matrices.

Definition: Let G(V,E) have no multiple edges. The adjacency list LG = {av}v∈V,
where av = adj(v) = {w ∈ V | w is adjacent to v}.

Definition: Let G(V,E) have no multiple edges. The adjacency matrix AG = [aij],
where
                        aij = 1 if {vi,vj} is an edge of G,
                        aij = 0 otherwise.


    v1      v2                 
                                  1   1    0
                                                              v1: v2,v3
                                            1  and L =      v2: v1,v4
               results in AG = 1
                                  0   0      
                                                              v3: v1,v4 .
                                  0   0    1           
    v4      v3                 0
                                  1   1    0
                                                              v4 : v2,v3
                                             
                                                         

Note: For an undirected graph, AG = AG . However, this is not necessarily true
for a directed graph.
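Note: A Python sketch that builds both representations for the example above (vertices renumbered 0-3; the `adjacency` helper is mine):

```python
# Sketch: adjacency matrix and adjacency list for the 4-cycle example
# v1-v2, v1-v3, v2-v4, v3-v4, with vertices renumbered 0..3.

def adjacency(n, edges):
    """Return (matrix, list) representations of an undirected graph."""
    A = [[0] * n for _ in range(n)]
    L = {v: [] for v in range(n)}
    for u, v in edges:
        A[u][v] = A[v][u] = 1          # undirected: symmetric entries
        L[u].append(v)
        L[v].append(u)
    return A, L

A, L = adjacency(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
assert A == [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]
assert A == [list(row) for row in zip(*A)]   # A equals its transpose
```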

Definition: The incidence matrix M = [mij] for G(V,E) is

        mij = 1 when edge ei is incident with vj,
        mij = 0 otherwise.

Definition: The simple graphs G(V,E) and H = (W,F) are isomorphic if there is
an isomorphism f: V→W, a one-to-one, onto function, such that a and b are
adjacent in G if and only if f(a) and f(b) are adjacent in H for all a,b ∈ V.

Examples (diagrams omitted): one pair of graphs on vertices v1, …, v4 is not
isomorphic; a second pair, identical except that v3 and v4 are relabeled, is
isomorphic.

Note: Isomorphic simple graphs have the same number of vertices and edges.

Definition: A property preserved by graph isomorphism is called a graph
invariant.

Note: Determining whether or not two graphs are isomorphic has exponential
worst case complexity, but linear average case complexity using the best
algorithms known.
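Note: The exponential worst case is visible in the obvious algorithm: try all |V|! vertex bijections. A Python sketch for tiny graphs (the names are illustrative):

```python
# Sketch: brute-force isomorphism testing by trying every vertex
# bijection; fine for tiny graphs, exponential in general.
from itertools import permutations

def isomorphic(V, E1, E2):
    """E1, E2: edge lists of simple undirected graphs on vertex set V."""
    E1 = {frozenset(e) for e in E1}
    E2 = {frozenset(e) for e in E2}
    if len(E1) != len(E2):
        return False
    for perm in permutations(V):
        f = dict(zip(V, perm))               # candidate bijection V -> V
        if {frozenset((f[u], f[v])) for u, v in E1} == E2:
            return True
    return False

V = [1, 2, 3, 4]
path = [(1, 2), (2, 3), (3, 4)]              # a path on 4 vertices
star = [(1, 2), (1, 3), (1, 4)]              # a star on 4 vertices
relabeled_path = [(2, 1), (1, 4), (4, 3)]    # the path with vertices renamed
assert not isomorphic(V, path, star)         # degree sequences differ
assert isomorphic(V, path, relabeled_path)
```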

Definition: Let G = (V,E) be an undirected graph and n ∈ N. A path of length n
from u to v, u,v ∈ V, is a sequence of edges e1, e2, …, en ∈ E with associated
vertices in V of u = x0, x1, …, xn = v. A circuit is a path with u = v. A path or
circuit is simple if all of the edges are distinct.


   • We already defined these terms for directed graphs.
   • The terminal vertex of the first edge in a path is the initial vertex of
     the second edge. We can define a path using a recursive definition.

Definition: An undirected graph is connected if there is a path between every
pair of distinct vertices in the graph.

Theorem: There is a simple path between every distinct pair of vertices of a
connected undirected graph G = (V,E).

Proof: Let u,v∈V such that u ≠ v. Since G is connected, there is a path from u
to v; choose one of minimum length n. Suppose this path is not simple. Then in
this minimum length path there is some repeated vertex xi = xj∈V for some
0 ≤ i < j ≤ n. Deleting the circuit from xi to xj gives a shorter path from u
to v, which is a contradiction.

Definition: A connected component of a graph is a connected subgraph of G that
is not a proper subgraph of another connected subgraph of G.

Note: A connected component is a maximally connected subgraph.

Example: Telecoms analyze call graphs routinely in order to provide better, less
expensive services. The old AT&T used to publish information routinely
(typically by Bell Labs researchers). One of their recent published graphs G =
(V,E) had |V| ~ 54,000,000 with |E| ~ 170,000,000. G had approximately
3,700,000 connected components. Most of the components were of size 2 or just
slightly larger. However, one was of size approximately 45,000,000, with all of
its vertices connected by paths of 20 or fewer calls.

Note: Sometimes removing a vertex v and all of the edges incident to v produces
a subgraph with more connected components than the original graph. The vertex v
is called a cut vertex or an articulation point.

Definition: A directed graph G = (V,E) is strongly connected if there are paths
from both u to v and v to u for all distinct u,v∈V. G is weakly connected if
there is a path between any two distinct vertices in the underlying undirected
graph. The maximal strongly connected subgraphs of G are called its strongly
connected components.
Theorem: Let G = (V,E) be a graph with adjacency matrix A. The number of
different paths of length n from vi to vj, where vi,vj∈V and n∈N, is the (i,j)
entry in A^n.


Example: For the 4-cycle from before, with v1, v2 on top and v4, v3 on the
bottom,

         0 1 1 0                  8 0 0 8
    A =  1 0 0 1    and   A^4 =   0 8 8 0 .
         1 0 0 1                  0 8 8 0
         0 1 1 0                  8 0 0 8

Note: The theorem can be used to find the shortest path between any two
vertices and also to determine if a graph is connected.
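The path-counting theorem can be checked on the example with a few lines of Python (the `matmul` helper is just for illustration):

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Adjacency matrix of the 4-cycle example; (A^n)[i][j] counts the
# paths of length n from v_{i+1} to v_{j+1}.
A = [[0, 1, 1, 0],
     [1, 0, 0, 1],
     [1, 0, 0, 1],
     [0, 1, 1, 0]]

A2 = matmul(A, A)
A4 = matmul(A2, A2)
print(A4[0])   # [8, 0, 0, 8]: eight length-4 paths from v1 back to v1
```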

Definition: Let G = (V,E) have an associated weighting function w: V×V→R. G is
called a weighted graph. The weighted length of a path in G is the sum of the
weights of the edges in the path.

Example: Let G = (V,E) be a weighted graph where V represents airports. Then
some interesting weighting functions include the following between pairs of
distinct airports:

   • Distance
   • Flight times
   • Airfares
   • Frequent flier miles
   • Frequent flier qualification miles

Note: Weighted graphs are extremely important in analyzing transportation of
goods and people and trying to minimize time and expenses.

Dijkstra’s Algorithm (Shortest Path) – [published in 1959]

Procedure Dijkstra( G = (V,E): weighted connected simple graph with
                      w: V×V→R+ and n = |V|,
                      a,z∈V: initial and terminal vertices )
    for i := 1 to n
         L(i) := ∞
    L(a) := 0
    S := ∅
    while z∉S
         u := a vertex not in S with L(u) minimal
         S := S ∪ {u}
         for all v∈V such that v∉S
              if L(u) + w(u,v) < L(v) then L(v) := L(u) + w(u,v)
    { L(z) = length of the shortest path from a to z. }

Theorem: Dijkstra’s algorithm finds the length of the shortest path between two
vertices in a connected simple undirected weighted graph. The algorithm uses
O(n^2) comparison and addition operations.
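A runnable sketch of the algorithm in Python. The heap-based bookkeeping below is a common refinement of the O(n^2) array scan in the pseudocode, and the dictionary graph format and vertex names are illustrative choices:

```python
import heapq

def dijkstra(graph, a, z):
    """Length of the shortest path from a to z.
    graph maps each vertex u to a dict {v: w(u, v)} of weighted neighbors."""
    dist = {a: 0}
    heap = [(0, a)]          # (tentative length, vertex); plays the role of L
    done = set()             # plays the role of S
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        if u == z:
            return d
        done.add(u)
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):   # the relaxation step
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

g = {"a": {"b": 4, "c": 2},
     "b": {"a": 4, "c": 1, "z": 5},
     "c": {"a": 2, "b": 1, "z": 8},
     "z": {"b": 5, "c": 8}}
print(dijkstra(g, "a", "z"))   # 8  (a -> c -> b -> z: 2 + 1 + 5)
```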

Traveling Salesman Problem: Find the circuit of minimum total weight in a
weighted complete undirected graph that visits every vertex exactly once and
returns to its starting vertex.

Note: There are n! possible circuits to consider, which is intractable when n is
sufficiently large. A tremendous amount of research has been devoted to finding
fast approximate solution algorithms. The best ones can produce, in a few
seconds, a circuit through 1,000 vertices that is within 2% of the optimum
circuit.

Definition: A coloring of a simple graph is the assignment of a color to each
vertex of the graph so that no adjacent vertices are assigned the same color.

Definition: The chromatic number χ(G) is the least number of colors needed
for a coloring of the graph G = (V,E).

Definition: A planar graph is a graph that can be drawn in a plane with no edges
crossing in the picture.

Four Color Theorem: If G is a planar graph, then χ(G) ≤ 4.

Note: The Four Color Conjecture was made in the 1850’s and not proven until
1976. Like Fermat’s last theorem, this theorem became famous partly for how
many wrong proofs (some quite ingenious) were either published or submitted
for publication.

Definition: A tree is a connected undirected graph with no simple circuits. A
weighted tree is a tree with weights associated with the edges.


Uses of trees:

   • An efficient data structure for searching a list.
      o Useful in encoding data for transmission.
      o Computational complexity easily determined for algorithms using trees.
   • Weighted trees have edges with weights.
      o Useful in decision making.
      o Used by telecoms to dynamically connect calls cheaply.

Historical Note: Trees were first developed in the context of this course to
describe molecules in chemistry, where atoms were the vertices and bonds were
the edges.

Theorem: An undirected graph T = (V,E) is a tree if and only if there is a unique
simple path between any two of its distinct vertices.

Proof:
  1. Assume T is a tree, so it has no simple circuits. Since T is connected,
     for all distinct u,v∈V there is at least one simple path between u and v.
     Suppose there were two distinct simple paths. Combining the two simple
     paths produces a circuit, which contradicts that T is a tree. Hence the
     simple path is unique.
  2. Assume that there is a unique simple path between any two distinct
     vertices u,v∈V. Then T is connected. T has no simple circuits, since a
     circuit would give two distinct simple paths between some pair of
     vertices u and v, which is a contradiction.

Definition: A rooted tree is a tree with one vertex designated as the root and
every edge is directed away from the root.

Note: Any tree can become a rooted tree by picking the right vertex as the root.

Terminology/Definitions: Let T = (V,E) be a rooted tree. Then

   • If v∈V is not the root, the parent u∈V of v is the vertex with an edge
     directed at v, and v is a child of u.
   • If vi∈V are children of the same u∈V, they are siblings.
   • The ancestors of u∈V are all vertices in V, other than u itself, on the
     path from the root to u.
   • The descendants vi∈V of u∈V are all vertices with u as an ancestor.
   • A leaf v∈V is a vertex with no children.
   • An internal vertex v∈V has children.
   • A subtree is the subgraph formed from a vertex a∈V, all of its
     descendants, and the edges incident to these descendants.
   • The height of a rooted tree T, denoted h(T), is the maximum level of any
     vertex, where the level of a vertex is the length of the path from the
     root to it.
   • A balanced rooted tree T has all of its leaves at levels h(T) or h(T)−1.

Definition: An m-ary tree is a rooted tree such that every internal vertex has no
more than m children. A full m-ary tree is a rooted tree such that every internal
vertex has exactly m children. If m = 2, it is a (full) binary tree.

Definition: An ordered rooted tree is a rooted tree in which the children of
the root and of every internal vertex are ordered (say, from left to right).


Examples:

   • Management charts
   • Directory based file or memory systems

Theorem: A tree with n vertices has n−1 edges.
The proof is by mathematical induction.

Theorem: A full m-ary tree with i internal vertices contains n = mi+1 vertices.

Proof: There are mi children plus the root.

Theorem: A full m-ary tree with
   • n vertices has i = (n−1)/m internal vertices and q = [(m−1)n+1]/m leaves.
   • i internal vertices has n = mi+1 vertices and q = (m−1)i + 1 leaves.
   • q leaves has n = (mq−1)/(m−1) vertices and i = (q−1)/(m−1) internal
     vertices.
Theorem: There are at most m^h leaves in an m-ary tree of height h.

The proof uses mathematical induction.

Corollary: If an m-ary tree of height h has q leaves, then h ≥ ⌈log_m q⌉. For a
full and balanced m-ary tree, h = ⌈log_m q⌉.

Definition: A binary search tree T = (V,E) is a binary tree with a key for each
vertex. The keys are ordered such that a key for a vertex is greater in value than
all keys associated with its left subtree and less in value than all keys associated
with its right subtree. The key for vertex vV is denoted by label(v).

Note: Recursive algorithms search a binary search tree for a key in O(h)
operations for a tree of height h; for a balanced tree with n vertices this is
O(log n).

Notation: Let T = (V,E) be a binary tree.

   • Let root(T) be the root vertex in T.
   • Let left_child(v) and right_child(v) refer to the left or right child of a
     root or internal vertex v in a binary tree.
   • Let add_new_vertex(parent, value) add a new left or right vertex to the
     parent vertex with a key of value. The details are left intentionally fuzzy.

Note: One of the most common operations with a binary tree is to search it.
Another is to search a binary tree for a key and add the key if it is missing.

procedure insertion( T = (V,E): binary search tree, x: item )
     v := root(T)
     while v ≠ ∅ and label(v) ≠ x
           if x < label(v) then
                 if left_child(v) ≠ ∅ then
                       v := left_child(v)
                 else
                       add_new_vertex(v, x) as the left child and v := ∅
           else
                 if right_child(v) ≠ ∅ then
                       v := right_child(v)
                 else
                       add_new_vertex(v, x) as the right child and v := ∅
     if root(T) = ∅ then
           add_new_vertex(T, x) and label the new root x
     else if v = ∅ then
           label the new vertex x and set v := the new vertex
     { v = location of x. }
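A minimal Python sketch of the insertion procedure; the `Node` class and the return convention (root plus the located vertex) are illustrative assumptions:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, x):
    """Insert x into the BST rooted at root; return (root, node holding x)."""
    if root is None:                     # empty tree: x becomes the root
        node = Node(x)
        return node, node
    v = root
    while True:
        if x == v.key:                   # already present
            return root, v
        if x < v.key:
            if v.left is None:           # add as left child
                v.left = Node(x)
                return root, v.left
            v = v.left
        else:
            if v.right is None:          # add as right child
                v.right = Node(x)
                return root, v.right
            v = v.right

root = None
for key in [10, 5, 15, 7]:
    root, _ = insert(root, key)
print(root.key, root.left.key, root.right.key, root.left.right.key)
# 10 5 15 7
```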

Definition: A decision tree is a rooted tree in which each internal vertex
corresponds to a decision and the subtrees at that vertex correspond to the
possible outcomes of the decision.

Note: There is usually a weighting associated with a decision tree. The keys may
not be unique.

Definition: A prefix code is an encoding based on bit strings representing
symbols such that a symbol, as a bit string, never occurs as the first part of
another symbol’s bit string.

Example: We can normally represent a-z in 5 bits and a-zA-Z in 6 bits. Suppose
we only have 3 letters: a = 0, c = 10, and t = 11. Then cat = 10 0 11 = 10011.
Wowee! We saved one whole bit!!!

Representation: Prefix codes form a binary tree.

Example: The prefix code for a = 0, c = 10, and t = 11 is stored as

               •
           0 /   \ 1
            a     •
              0 /   \ 1
               c     t

Definition: A Huffman coding takes the frequencies of the symbols into account
and is the prefix code that encodes the data with the smallest number of bits.

Note: Huffman coding was a course project by a graduate student at MIT in the
1950’s. Needless to say, his professor was stunned.

procedure Huffman( ai: symbols, wi: frequencies, 1 ≤ i ≤ n )
    F := forest of n rooted trees, each with a single vertex ai with weight wi
    while F is not a single tree
         Replace the rooted trees T and T’ of least weights from F, with
         w(T) ≥ w(T’), by a tree T’’ having a new root that has T as its left
         child and T’ as its right child. Label the edge to T as 0 and the
         edge to T’ as 1.
         Assign w(T) + w(T’) as the weight of T’’.
    { The Huffman encoding tree is complete. }

Example: Given {(a,1), (c,2), (t,3)} as (symbol, frequency) pairs, what is the
Huffman code?

Initial forest:     (a,1)    (c,2)    (t,3)

Step 1 (merge the two lightest trees; c is heavier, so with the convention
w(T) ≥ w(T’) its edge is labeled 0):

               3
           0 /   \ 1
            c     a          (t,3)

Step 2 (merge the weight-3 tree with (t,3)):

               6
           0 /   \ 1
            •     t
        0 /   \ 1
         c     a

This gives c = 00, a = 01, and t = 1, so the most frequent symbol gets the
shortest code.
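A runnable sketch of the procedure, using a heap in place of the forest F; the dict-based tree representation is an illustrative choice. The edge labels here follow pop order (lightest tree gets 0), so the exact bits may differ from the example's, but the code lengths are the same:

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman prefix code from {symbol: frequency};
    returns {symbol: bitstring}."""
    tie = count()   # tiebreaker so the heap never compares dicts
    heap = [(w, next(tie), {s: ""}) for s, w in sorted(freqs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)   # lightest tree in the forest
        w2, _, codes2 = heapq.heappop(heap)   # next lightest
        # Merging two trees prepends one more bit to every code inside them.
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]

codes = huffman_codes({"a": 1, "c": 2, "t": 3})
# t is most frequent, so it gets the 1-bit code; a and c get 2 bits each.
print(sorted((s, len(c)) for s, c in codes.items()))
# [('a', 2), ('c', 2), ('t', 1)]
```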

Note: Game trees are another highly studied tree.

Definition (Minimax Strategy): The value of a vertex in a game tree is defined
recursively as:
  1. The value of a leaf is the payoff to the first player when the game terminates
     in the position represented by this leaf.
  2. The value of an internal vertex at an even level is the maximum of the
     values of its children. The value of an internal vertex at an odd level is
     the minimum of the values of its children.

Theorem: The value of a vertex v of a game tree tells us the payoff to the first
player if both players follow the Minimax strategy and play starts from the
position represented by vertex v.

Notes: Game trees are
  • Enormous (not just slightly, but really, really enormous)
  • Lead to optimal solutions (if you can compute them)
  • Basically intractable using standard computers

Note: Tree traversal is extremely important to accessing data. There are many
algorithms, each with pluses and minuses. We will study three traversal
methods:

   • Preorder
   • Inorder
   • Postorder

These traversal methods not only are used for data storage, but for representing
arithmetic that is useful for compilers.

Definition: The universal addressing system is defined recursively for an
ordered rooted tree T = (V,E). The root rV is labeled 0 and its k children are
labeled 1, …, k. For each vertex vV, labeled Av, its n children are labeled Av.1,
Av.2, …, Av.n.

Example: Given a tree T = (V,E) with keys ordered 0 < 1 < 1.1 < 2 < 2.1 < 2.2 <
2.2.1 < 2.3, we represent it as

                    0
                  /   \
                1       2
                |     / | \
              1.1  2.1 2.2 2.3
                        |
                      2.2.1

We will use this example for quite some time.

Definition (Preorder Traversal): Let T be an ordered rooted tree with root r. If T
consists only of r, then r is the preorder traversal of T. Otherwise, suppose T1,
T2, …, Tn are subtrees at r from left to right in T. Then the preorder traversal
begins at r and continues by traversing T1 in preorder, T2 in preorder, …, and Tn
in preorder.

Example: In the tree example at the top of page 199, the preorder traversal order
is 0, 1, 1.1, 2, 2.1, 2.2, 2.2.1, and 2.3.

Definition (Inorder Traversal): Let T be an ordered rooted tree with root r. If T
consists only of r, then r is the inorder traversal of T. Otherwise, suppose T1, T2,
…, Tn are subtrees at r from left to right in T. Then the inorder traversal begins
by traversing T1 in inorder, then r, and continues with T2 in inorder, …, and Tn
in inorder.

Example: In the tree example at the top of page 199, the inorder traversal order
is 1.1, 1, 0, 2.1, 2, 2.2.1, 2.2, and 2.3.

Definition (Postorder Traversal): Let T be an ordered rooted tree with root r. If T
consists only of r, then r is the postorder traversal of T. Otherwise, suppose T1,
T2, …, Tn are subtrees at r from left to right in T. Then the postorder traversal
begins by traversing T1 in postorder, T2 in postorder, …, Tn in postorder, and r.

Example: In the tree example at the top of page 199, the postorder traversal
order is 1.1, 1, 2.1, 2.2.1, 2.2, 2.3, 2, and 0.

Notation: Let add_to_list(v) be a global function to append a vertex v to a list.
The list must be initialized to ∅ at some point before use.

Note: The tree traversal algorithms are all easily defined recursively using a
global list that must be initialized first.

procedure preorder_traversal( T: ordered rooted tree )
    r := root(T)
    add_to_list(r)
    for each child c of r from left to right
         T(c) := subtree with c as its root
         preorder_traversal( T(c) )

procedure inorder_traversal( T: ordered rooted tree )
    r := root(T)
    if r is a leaf then add_to_list(r)
    else
          q := first child of r from left to right
          T(q) := subtree with q as its root
          inorder_traversal( T(q) )
          add_to_list(r)
          for each remaining child c of r from left to right
               T(c) := subtree with c as its root
               inorder_traversal( T(c) )

procedure postorder_traversal( T: ordered rooted tree )
    r := root(T)
    for each child c of r from left to right
         T(c) := subtree with c as its root
         postorder_traversal( T(c) )
    add_to_list(r)
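The three traversals can be sketched in Python on the universal-addressing example tree; the nested (label, children) tuple representation is an assumption for illustration:

```python
# The ordered rooted tree from the universal addressing example:
# 0 has children 1, 2; 1 has child 1.1; 2 has children 2.1, 2.2, 2.3;
# 2.2 has child 2.2.1. Each tree is (label, [subtrees]).
tree = ("0", [("1", [("1.1", [])]),
              ("2", [("2.1", []),
                     ("2.2", [("2.2.1", [])]),
                     ("2.3", [])])])

def preorder(t):
    label, children = t           # root first, then each subtree
    return [label] + [x for c in children for x in preorder(c)]

def inorder(t):
    label, children = t
    if not children:
        return [label]
    first, rest = children[0], children[1:]   # first subtree, root, the rest
    return inorder(first) + [label] + [x for c in rest for x in inorder(c)]

def postorder(t):
    label, children = t           # all subtrees first, root last
    return [x for c in children for x in postorder(c)] + [label]

print(preorder(tree))   # ['0', '1', '1.1', '2', '2.1', '2.2', '2.2.1', '2.3']
print(inorder(tree))    # ['1.1', '1', '0', '2.1', '2', '2.2.1', '2.2', '2.3']
print(postorder(tree))  # ['1.1', '1', '2.1', '2.2.1', '2.2', '2.3', '2', '0']
```

The three printed orders match the three traversal examples in the notes.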

Definition: Logic and arithmetic expressions can be represented using binary
trees. Using inorder, preorder, or postorder traversal of the binary tree is
known as infix, prefix, or postfix notation, respectively.

Note: The best known is postfix notation, otherwise known as reverse Polish
notation (RPN). This was used in the first pocket sized electronic scientific
calculator, the HP-35 (1972). This notation is valuable in writing compilers,
too. See

     http://glow.sourceforge.net/tutorial/lesson7/side_rpn.html
     http://www.hpmuseum.org/rpn.htm

Examples: Parentheses disappear completely. It is best to think of a RPN
calculator as a stack machine where data is in the stack and arithmetic operates
on the top elements of the stack.

   • The expression 2+3 is written as 2 3 + in RPN.
   • The expression [(9+3) * (4/2)] − [(3x) + (2−y)] is written as
     9 3 + 4 2 / * 3 x * 2 y − + − in RPN, where x and y are numbers.
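The stack-machine view described above can be sketched as a small evaluator; the whitespace-separated token format is an assumption for illustration:

```python
def eval_rpn(tokens):
    """Evaluate a postfix (RPN) expression with a stack: operands are
    pushed, and each operator pops its two arguments off the top."""
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    stack = []
    for tok in tokens:
        if tok in ops:
            b = stack.pop()      # top of stack is the right operand
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))
    return stack.pop()

print(eval_rpn("2 3 +".split()))           # 5.0
print(eval_rpn("9 3 + 4 2 / *".split()))   # 24.0, i.e. (9+3) * (4/2)
```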

Tree representation: Labels are the operations on internal vertices or the root and
values (constants or simple variables) on the leaves.

Example: 4 * 3 + 2 in RPN is 4 3 * 2 +, or

               +
             /   \
            *     2
          /   \
         4     3

Definition: Let G = (V,E) be a simple graph. A spanning tree of G is a subgraph
of G that is a tree containing every vertex in G.

Example: Your instructor wants his town, the states of Connecticut and New
York, and New York City to keep the roads and highways connecting his house and
LaGuardia airport cleared of ice and snow. A graph connecting each of the
relevant endpoints and connecting points can be made. The relevant agencies
can use this graph when deciding how to keep roads open after a storm.

[Figure: the road graph on vertices G, PC, RB, S, WB, and LGA, followed by two
of its spanning trees]

Theorem: A simple graph G is connected if and only if it has a spanning tree T.

Example: Multicasting over networks.

Note: Constructing a spanning tree can be done in many different ways,
including some very inefficient ones. Two common ways are depth first and
breadth first searches.

Notation: Let visit(v) mean that we keep track of when we first go to vertex v
until we return to v using a backtrack.

    procedure visit( v: vertex of a connected graph G = (V,E), T: tree )
        for each w∈V adjacent to v and not yet in T
             add w and edge {v,w} to T
             visit( w, T )

procedure depth_first( G = (V,E): connected graph )
    T := tree with only some single v∈V
    visit( v, T )
    { T is a spanning tree. }

procedure breadth_first( G = (V,E): connected graph )
    T := tree with only some single v∈V
    L := list containing only v
    while L ≠ ∅
         Remove the first vertex v from L
         for each neighbor w∈V of v
              if w∉L and w∉T then
                  Add w to the end of L
                  Add w and edge {v,w} to T
    { T is a spanning tree. }
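The breadth-first procedure can be sketched in Python; the adjacency-list dictionary and vertex names are illustrative choices:

```python
from collections import deque

def bfs_spanning_tree(graph, start):
    """Breadth-first spanning tree of a connected graph given as
    adjacency lists; returns the set of tree edges."""
    in_tree = {start}
    tree_edges = set()
    queue = deque([start])          # plays the role of the list L
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in in_tree:    # skip vertices already in T
                in_tree.add(w)
                tree_edges.add(frozenset((v, w)))
                queue.append(w)
    return tree_edges

g = {"a": ["b", "c"], "b": ["a", "c", "d"],
     "c": ["a", "b", "d"], "d": ["b", "c"]}
edges = bfs_spanning_tree(g, "a")
print(len(edges))   # 3, since a spanning tree on 4 vertices has 3 edges
```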

Theorem: Let G = (V,E) be a connected graph with |V| = n. Then either depth
first or breadth first search takes O(|E|), or O(n^2), steps to construct a
spanning tree.

Proof: For a simple graph, |E| ≤ n(n−1)/2.

Backtracking applications:

   • Graph coloring: can a graph be colored with n colors?
   • n-Queens problem: find positions on an n×n board so that n queens are
     toothless (none can attack another)
   • Sums of subsets: given {xi} for i = 1, …, n, where xi∈N, find a subset
     whose sum is M
   • Web crawlers: search all hyperlinks on a network efficiently

Definition: A minimum spanning tree in a connected weighted graph is a
spanning tree that has the smallest possible sum of weights on its edges.

    procedure Prim( G = (V,E): weighted connected undirected graph )
        T := a minimum weight edge
        for i := 1 to |V|−2
             e := an edge of minimum weight incident to a vertex in T and not
                  forming a simple circuit in T if it is added to T
             T := T with e added
        { T is a minimum spanning tree. }

    procedure Kruskal( G = (V,E): weighted connected undirected graph )
        T := empty graph
        for i := 1 to |V|−1
             e := an edge in G of minimum weight that does not form a simple
                  circuit in T if it is added to T
             T := T with e added
        { T is a minimum spanning tree. }

Theorem: The cost of Prim’s algorithm is O(|E|log|V|). The cost of Kruskal’s
algorithm is O(|E|log|E|).
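A runnable sketch of Kruskal's algorithm; the union-find structure used to detect circuits is a standard implementation choice not spelled out in the pseudocode, and the edge list below is a made-up example:

```python
def kruskal(n, edges):
    """Kruskal's algorithm: edges is a list of (weight, u, v) with
    vertices 0..n-1; returns the minimum spanning tree's edges."""
    parent = list(range(n))

    def find(x):                      # union-find root, with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):     # consider edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                  # skip edges that would form a circuit
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

example_edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
mst = kruskal(4, example_edges)
print(sum(w for w, _, _ in mst))   # 6, using the edges of weight 1, 2, and 3
```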

Definition: A graph G = (V,E) is sparse if |E| is very small with respect to |V|^2.

Comment: Sparse is ill defined intentionally. There are different degrees of
sparseness, too (highly sparse, very sparse, somewhat sparse, hardly sparse, not
sparse, and the Scottish favorite, a wee bit sparse). Matrices can also be
categorized as (fill in the blank type) sparse based on their graphs.

Note: When G is sparse, Kruskal’s algorithm is much less expensive than Prim’s
algorithm.

                               Boolean Algebra

Definition: Let B = { 0, 1 } and B^n = B×B×…×B (n times). A Boolean
variable x∈B. A Boolean function of degree n is a function f: B^n→B.

Notation: For x,y∈B, define

   • x + y = x ∨ y
   • x · y = x ∧ y
   • x̄ = ¬x

using the logic predicate notation from the class notes (circa pages 5-6).

Definition: A Boolean algebra is a set B with binary operators ∨ and ∧, the
unary operator ¬, elements 0 and 1, and the following laws holding for all
elements of B: identity, complement, associative, commutative, and distributive.

Logic gates: Boolean algebra is used to model electronic logic gates, such as
AND, OR, NOT, NAND, XOR, … We design functions with Boolean algebras
and operators. Then we build them using the right gates and wiring patterns.
Typical symbols for AND, OR, and NOT are the following:

    AND: [gate symbol]          OR: [gate symbol]          NOT: [gate symbol]

These are two input AND and OR gates. Versions of these gates exist for more
than two inputs and perform the expected operation on all of the inputs to get
one output.

Definition: A simple output circuit takes the input(s) and has one output. A
multiple output circuit takes input(s) and has multiple outputs.

Example: The gates above are simple output circuits.

Examples: Most circuits are of the multiple output variety.
   • A half adder adds two bits, producing a single bit sum plus a single bit
     carry: S := (x∨y) ∧ (¬(x∧y)) = x⊕y and Cout := x∧y. A half adder has two
     AND gates, one OR gate, and one NOT gate.
   • A full adder computes the complete two bit sum and carry out:
     S := (x⊕y)⊕Cin, where Cin is the incoming carry. The carry is quite
     complicated: Cout := (x∧y) ∨ (y∧Cin) ∨ (Cin∧x). A full adder has two half
     adders and an OR gate.
   • Ripple adders, lookahead adders, and lookahead carry circuits use many
     bits as input to implement integer adders.

             [Figure: circuit diagrams of a half adder and a full adder]
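A minimal sketch of the half and full adders as Boolean functions; Python ints 0/1 stand in for bits, and the function names are illustrative:

```python
def half_adder(x, y):
    """Sum bit S = (x OR y) AND NOT (x AND y) = x XOR y; carry = x AND y."""
    s = (x or y) and not (x and y)
    return int(s), int(x and y)

def full_adder(x, y, c_in):
    """Two half adders plus an OR gate combine into a full adder."""
    s1, c1 = half_adder(x, y)       # add the two input bits
    s, c2 = half_adder(s1, c_in)    # fold in the incoming carry
    return s, int(c1 or c2)         # carry out if either stage carried

# 1 + 1 + carry-in 1 = binary 11: sum bit 1, carry out 1.
print(full_adder(1, 1, 1))   # (1, 1)
```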

Note: Minimizing the Boolean algebra function means a less complicated
circuit. Simpler circuits are cheaper to make, take up less space, and are usually
faster. Add in how many devices are made and there is potentially a lot of
money involved in saving even a small amount of circuitry.

There are two basic methods for simplifying Boolean algebra functions:

   • Karnaugh maps (or K-maps) provide a graphical or table driven technique
     that works up to about 6 variables before it becomes too complicated.
   • The Quine-McCluskey algorithm works with any number of variables.

Going to Google and searching on Karnaugh map software leads to a number of
programs to do some of the work for you.

Definition: A literal of a Boolean variable is the variable or its complement.
A minterm of Boolean variables x1, x2, …, xn is a Boolean product y1 y2 … yn
where each yi is either xi or x̄i.

Note: A minterm is just the product of n literals.

Karnaugh maps: The area of a K-map covering rectangle is determined by the
number of variables (n) and how many (k) are fixed in the corresponding Boolean
product term: 2^(n−k) boxes. Common map arrangements are
    • 2 variables: 2×2,
    • 3 variables: 4×2, and
    • 4 variables: 4×4.

Each variable contributes two possibilities to each possibility of every other
variable in the system. K-maps are organized so that all the possibilities of the
system are arranged in a grid form and between two adjacent boxes only one
variable can change value. Each square in a K-map corresponds to a minterm.

Cover the ones on the map by rectangles that contain a number of boxes equal
to a power of 2 (e.g., 4 boxes in a line, 4 boxes in a square, 8 boxes in a
rectangle, etc.). Once the ones are covered, a term of a sum of products is
produced by finding the variables that do not change throughout the entire
covering, and taking a 1 to mean that variable and a 0 as the complement of that
variable. Doing this for every covering produces a matching function.

Given a Boolean function f with inputs x1, …, xn, make a table with all possible
inputs and outputs. Then create a K-map with the variables on the left and top
sides of the rectangle. Look for 1’s. The rectangle is a torus, so look for wrap
arounds, too.

Example: f: B^4→B with a corresponding K-map of

                              x1, x2
                     00     01     11     10
              00      0      0      1      1
    x3,       01      0      0      1      1
    x4        11      0      0      0      1
              10      0      1      1      1

The K-map is colored to try to find patterns in the Boolean expression that can
be simplified. It is quite common to eliminate some of the Boolean variables
using this approach. Use high quality software if you use the K-map approach.

Definition: An implicant is sum term or product term of one or more minterms
in a sum of products. A prime implicant of a function is an implicant that cannot
be covered by a more reduced (i.e., one with fewer literals) implicant.

Note: Suppose f is a Boolean function and P is a product term. Then P is an
implicant of f if f takes the value 1 whenever P takes the value 1. This is
sometimes written as P  f in the natural ordering of the Boolean algebra.

Quine-McCluskey: This algorithm has two steps:

  1. Find all prime implicants of the function.
  2. Use those prime implicants in a prime implicant chart to find the essential
     prime implicants of the function as well as other prime implicants that are
     necessary to cover the function.

The algorithm constructs a table and then simplifies the table. The method leads
to computer implementations for large numbers of variables. Use high quality
software if you use the Quine-McCluskey approach.

