Discrete Mathematics
University of Kentucky CS 275
Spring, 2007
Professor Craig C. Douglas
http://www.mgnet.org/~douglas/Classes/discrete-math/notes/2007s.pdf

Material Covered (Spring 2007)

Tuesday   Pages      Thursday   Pages
                     1/11       1-9
1/16      9-24       1/18       24-33
1/23      34-45      1/25       46-52
1/30      53-65      2/1        Exam 1
2/6       66-73      2/8        74-83
2/13      84-92      2/15       92-94
2/20      95-106     2/22       106-115
2/27      116-124    3/1        Exam 2
3/6       125-132    3/8        No class
3/13      Spring     3/15       Break
3/20      132-142    3/22       No class
3/26      142-156    3/28       Exam 3
4/3       157-169    4/5        170-177
4/10      178-185    4/12       186-197
4/17      198-210    4/19       Exam 4
4/24      211-217    4/26       Rama: review
5/1       No class   5/3        Final: 8-10 AM

The final exam will cover Chapters 1-10.

Course Outline
1. Logic Principles
2. Sets, Functions, Sequences, and Sums
3. Algorithms, Integers, and Matrices
4. Induction and Recursion
5. Simple Counting Principles
6. Discrete Probability
7. Advanced Counting Principles
8. Relations
9. Graphs
10. Trees
11. Boolean Algebra
12. Modeling Computation

Logic Principles

Basic values: T or F representing true or false, respectively. In a computer T and F may be represented by 1 or 0 bits.

Basic items:
• Propositions
  o Logic and Equivalences
• Truth tables
• Predicates
• Quantifiers
• Rules of Inference
• Proofs
  o Concrete, outlines, hand waving, and false

Definition: A proposition is a statement that is either true or false (but not both).

Examples:
• 2+2 = 4 is a proposition because this is a fact.
• x+1 = 2 is not a proposition unless a specific value of x is stated.

Definition: The negation of a proposition p, denoted by ¬p and pronounced not p, means that, “it is not the case that p.” The truth values for ¬p are the opposite of those for p.

Examples:
• p: Today is Thursday. ¬p: Today is not Thursday.
• p: At least a foot of snow falls in Boulder on Fridays. ¬p: Less than a foot of snow falls in Boulder on Fridays.

Definition: The conjunction of propositions p and q, denoted p∧q, is true if both p and q are true, otherwise false.
Definition: The disjunction of propositions p and q, denoted p∨q, is true if at least one of p or q is true, otherwise false.

Definition: The exclusive or of propositions p and q, denoted p⊕q, is true if exactly one of p and q is true, otherwise false.

Truth tables:

p    ¬p    q    p∧q   p∨q   p⊕q
T    F     T    T     T     F
T*   F*    F    F     T     T
F*   T*    T    F     T     T
F    T     F    F     F     F

* The truth table for p and ¬p is really a 2×2 table.

Concepts so far can be extended to Boolean variables and bit strings.

Definition: A bit is a binary digit. Hence, it has two possible values: 0 and 1.

Definition: A bit string is a sequence of zero or more bits. The length of a bit string is the number of bits.

Definition: The bitwise operators OR, AND, and XOR are defined based on ∨, ∧, and ⊕, bit by bit in a bit string.

Examples:
• 010111 is a bit string of length 6
• 010111 OR 110000 = 110111
• 010111 AND 110000 = 010000
• 010111 XOR 110000 = 100111

Definition: The conditional statement is an implication, denoted p→q, and is false when p is true and q is false, otherwise it is true. In this case p is known as the hypothesis (or antecedent or premise) and q is known as the conclusion (or consequence).

Definition: The biconditional statement is a bi-implication, denoted p↔q, and is true if and only if p and q have the same truth values.

Truth tables:

p    q    p→q   p↔q
T    T    T     T
T    F    F     F
F    T    T     F
F    F    T     T

We can compound logical operators to make complicated propositions. In general, using parentheses makes the expressions clearer, even though more symbols are used. However, there is a well defined operator precedence accepted in the field. Lower numbered operators take precedence over higher numbered operators.

Operator   Precedence
¬          1
∧          2
∨          3
→          4
↔          5

Examples:
• ¬p∧q = (¬p)∧q
• p∧q∨r = (p∧q)∨r

Definition: A compound proposition that is always true is a tautology. One that is always false is a contradiction. One that is neither is a contingency.
Example:

p    ¬p    p∧¬p    p∨¬p
T    F     F       T
F    T     F       T

Here p and ¬p are contingencies, p∧¬p is a contradiction, and p∨¬p is a tautology.

Definition: Compound propositions p and q are logically equivalent if p↔q is a tautology. This is denoted p≡q (sometimes written as p⇔q instead).

Theorem: ¬(p∨q) ≡ ¬p∧¬q.
Proof: Construct a truth table.

p    q    ¬(p∨q)   ¬p   ¬q   ¬p∧¬q
T    T    F        F    F    F
T    F    F        F    T    F
F    T    F        T    F    F
F    F    T        T    T    T

qed

Theorem: ¬(p∧q) ≡ ¬p∨¬q.
Proof: Construct a truth table similar to the previous theorem.

These two theorems are known as DeMorgan’s laws and can be extended to any number of propositions:
¬(p_1∨p_2∨…∨p_k) ≡ ¬p_1∧¬p_2∧…∧¬p_k
¬(p_1∧p_2∧…∧p_k) ≡ ¬p_1∨¬p_2∨…∨¬p_k

Theorem: p→q ≡ ¬p∨q.
Proof: Construct a truth table.

p    q    p→q   ¬p   ¬p∨q
T    T    T     F    T
T    F    F     F    F
F    T    T     T    T
F    F    T     T    T

qed

These proofs are concrete ones: each is proven by an exhaustive search of all possibilities. As the number of propositions grows, the number of possibilities grows like 2^k for k propositions. The distributive laws are an example when k=3.

Theorem: p∨(q∧r) ≡ (p∨q)∧(p∨r).
Proof: Construct a truth table.

p    q    r    p∨(q∧r)   p∨q   p∨r   (p∨q)∧(p∨r)
T    T    T    T         T     T     T
T    T    F    T         T     T     T
T    F    T    T         T     T     T
T    F    F    T         T     T     T
F    T    T    T         T     T     T
F    T    F    F         T     F     F
F    F    T    F         F     T     F
F    F    F    F         F     F     F

qed

Theorem: p∧(q∨r) ≡ (p∧q)∨(p∧r).
Proof: Construct a truth table similar to the previous theorem.

Some well known logical equivalences include the following laws:

Law                                                   Name
p∧T ≡ p,  p∨F ≡ p                                     Identity
p∨T ≡ T,  p∧F ≡ F                                     Domination
p∨p ≡ p,  p∧p ≡ p                                     Idempotent
¬(¬p) ≡ p                                             Double negation
p∨¬p ≡ T,  p∧¬p ≡ F                                   Negation
p∨q ≡ q∨p,  p∧q ≡ q∧p                                 Commutative
(p∨q)∨r ≡ p∨(q∨r),  (p∧q)∧r ≡ p∧(q∧r)                 Associative
p∨(q∧r) ≡ (p∨q)∧(p∨r),  p∧(q∨r) ≡ (p∧q)∨(p∧r)         Distributive
¬(p∧q) ≡ ¬p∨¬q,  ¬(p∨q) ≡ ¬p∧¬q                       DeMorgan
p∨(p∧q) ≡ p,  p∧(p∨q) ≡ p                             Absorption

All of these laws can be proven concretely using truth tables. It is a good exercise to see if you can prove some.
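
The exhaustive truth-table proofs above can be mechanized. The following is a minimal Python sketch, not part of the original notes; the helper names `equivalent` and `implies` are made up for illustration. It checks two compound propositions on all 2^k truth assignments.

```python
from itertools import product

def equivalent(f, g, k):
    """Return True if f and g agree on all 2**k truth assignments."""
    return all(f(*v) == g(*v) for v in product([True, False], repeat=k))

def implies(p, q):
    """The conditional p -> q: false only when p is true and q is false."""
    return not (p and not q)

# DeMorgan: not (p or q) is equivalent to (not p) and (not q)
de_morgan = equivalent(lambda p, q: not (p or q),
                       lambda p, q: (not p) and (not q), 2)

# Distributive: p or (q and r) is equivalent to (p or q) and (p or r)
distributive = equivalent(lambda p, q, r: p or (q and r),
                          lambda p, q, r: (p or q) and (p or r), 3)

# Conditional: p -> q is equivalent to (not p) or q
conditional = equivalent(implies, lambda p, q: (not p) or q, 2)

# For contrast, p or q is not equivalent to p and q
not_equiv = equivalent(lambda p, q: p or q, lambda p, q: p and q, 2)

print(de_morgan, distributive, conditional, not_equiv)
```

The check costs 2^k evaluations, which matches the growth rate noted above and is why truth tables stop being practical for large k.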
Well known logical equivalences involving conditional statements:

p→q ≡ ¬p∨q
p→q ≡ ¬q→¬p
p∨q ≡ ¬p→q
p∧q ≡ ¬(p→¬q)
¬(p→q) ≡ p∧¬q
(p→q)∧(p→r) ≡ p→(q∧r)
(p→r)∧(q→r) ≡ (p∨q)→r
(p→q)∨(p→r) ≡ p→(q∨r)
(p→r)∨(q→r) ≡ (p∧q)→r

Well known logical equivalences involving biconditional statements:

p↔q ≡ (p→q)∧(q→p)
p↔q ≡ ¬p↔¬q
p↔q ≡ (p∧q)∨(¬p∧¬q)
¬(p↔q) ≡ p↔¬q

Propositional logic is pretty limited. Almost anything you really are interested in requires a more sophisticated form of logic: predicate logic with quantifiers (or predicate calculus).

Definition: P(x) is a propositional function if substituting a specific value for x in P(x) gives us a proposition. The part of the expression referring to x is known as the predicate.

Examples:
• P(x): x > 24. P(2) = F, P(102) = T.
• P(x): x = y + 1. P(x) = T for one value only (y is an unbound variable).
• P(x,y): x = y + 1. P(2,1) = T, P(102,-14) = F.

Definition: A statement of the form P(x_1,x_2,…,x_n) is the value of the propositional function P at the n-tuple (x_1,x_2,…,x_n). P is also known as an n-place (or n-ary) predicate.

Definition: The universal quantification of P(x) is the statement P(x) is true for all values of x in some domain, denoted by ∀x P(x).

Definition: The existential quantification of P(x) is the statement P(x) is true for at least one value of x in some domain, denoted by ∃x P(x).

Definition: The uniqueness quantification of P(x) is the statement P(x) is true for exactly one value of x in some domain, denoted by ∃!x P(x).

There is an infinite number of quantifiers that can be constructed, but the three above are among the most important and common.

Examples: Assume x belongs to the real numbers.
• ∀x<0 (x^2 > 0). The negative real numbers form the domain.
• ∃!x (x^1223 = 0).

∀ and ∃ have higher precedence than the logical operators.

Example: ∀x P(x)∨Q(x) means (∀x P(x))∨Q(x).

Definition: When a variable is used in a quantification, it is said to be bound. Otherwise the variable is free.

Example: ∃x (x = y + 1). Here x is bound and y is free.
Definition: Statements involving predicates and quantifiers are logically equivalent if and only if they have the same truth value independent of which predicates are substituted and which domains are used. Notation: S ≡ T.

DeMorgan’s Laws for Negation:
• ¬∀x P(x) ≡ ∃x ¬P(x).
• ¬∃x P(x) ≡ ∀x ¬P(x).

Nested quantifiers just means that more than one quantifier is in a statement. The order of quantifiers is important.

Examples: Assume x and y belong to the real numbers.
• ∀x∃y (x + y = 0).
• ∀x∀y ((x < 0) ∧ (y > 0) → xy < 0).

Quantification of two variables:

Statement       When True?                                        When False?
∀x∀y P(x,y)     For all x and y, P(x,y)=T.                        There is a pair x and y such that P(x,y)=F.
∀x∃y P(x,y)     For all x there is a y such that P(x,y)=T.        There is an x such that for all y, P(x,y)=F.
∃x∀y P(x,y)     There is an x such that for all y, P(x,y)=T.      For all x there is a y such that P(x,y)=F.
∃x∃y P(x,y)     There is a pair x and y such that P(x,y)=T.       For all x and y, P(x,y)=F.

Rules of Inference are used instead of truth tables in many instances. For n variables, there are 2^n rows in a truth table, which gets out of hand quickly.

Definition: A propositional logic argument is a sequence of propositions. The last proposition is the conclusion. The earlier ones are the premises. An argument is valid if the truth of the premises implies the truth of the conclusion.

Definition: A propositional logic argument form is a sequence of compound propositions involving propositional variables. An argument form is valid if no matter what particular propositions are substituted for the propositional variables in its premises, the conclusion remains true if the premises are all true.

Translation: An argument form with premises p_1, p_2, …, p_n and conclusion q is valid when (p_1∧p_2∧…∧p_n) → q is a tautology.

There are eight basic rules of inference.
Rule (premises ∴ conclusion)   Tautology                      Name
p, p→q ∴ q                     [p∧(p→q)] → q                  Modus ponens
¬q, p→q ∴ ¬p                   [¬q∧(p→q)] → ¬p                Modus tollens
p→q, q→r ∴ p→r                 [(p→q)∧(q→r)] → (p→r)          Hypothetical syllogism
p∨q, ¬p ∴ q                    [(p∨q)∧¬p] → q                 Disjunctive syllogism
p ∴ p∨q                        p → (p∨q)                      Addition
p∧q ∴ p                        (p∧q) → p                      Simplification
p, q ∴ p∧q                     [(p)∧(q)] → (p∧q)              Conjunction
p∨q, ¬p∨r ∴ q∨r                [(p∨q)∧(¬p∨r)] → (q∨r)         Resolution

Rules of Inference for Quantified Statements:

Rule of Inference (premises ∴ conclusion)                                          Name
∀x P(x) ∴ P(c)                                                                     Universal instantiation
P(c) for an arbitrary c ∴ ∀x P(x)                                                  Universal generalization
∀x (P(x)→Q(x)), P(a), where a is a particular element in the domain ∴ Q(a)         Universal modus ponens
∀x (P(x)→Q(x)), ¬Q(a), where a is a particular element in the domain ∴ ¬P(a)       Universal modus tollens
∃x P(x) ∴ P(c) for some c                                                          Existential instantiation
P(c) for some c ∴ ∃x P(x)                                                          Existential generalization

Sets, Functions, Sequences, and Sums

Definition: A set is an unordered collection of elements.

Examples:
• Z = {…, -3, -2, -1, 0, 1, 2, 3, …}
• N = {1, 2, 3, …} and N0 = {0, 1, 2, 3, …} (Slightly different than text)
• Q = {p/q | p,q∈Z, q≠0}
• R = {reals}

Definition: The cardinality of a set S is denoted |S|. If |S| = n, where n∈Z, then the set S is a finite set. Otherwise it is an infinite set (|S| = ∞).

Example: The cardinality of Z, N, N0, Q, and R is infinite.

Definition: If S is finite or |S| = |N|, then S is a countable set. Otherwise it is an uncountable set.

Examples: Q is countable. R is uncountable.

Definition: Two sets S and T are equal, denoted S = T, if and only if ∀x(x∈S ↔ x∈T).

Examples:
• Let S = {0, 1, 2} and T = {2, 0, 1}. Then S = T. Order does not count.
• Let S = {0, 1, 2} and T = {0, 1, 3}. Then S ≠ T. Only the elements count.

Definition: The empty set is denoted by ∅. Note that ∀S(∅⊆S).

Definition: A set S is a subset of a set T if ∀x∈S(x∈T) and is denoted S⊆T. S is a proper subset of T if S⊆T, but S≠T and is denoted S⊂T.

Example: S = {1, 0} and T = {0, 1, 2}. Then S⊂T.

Theorem: ∀S(S⊆S).
Proof: By definition, ∀x∈S(x∈S).

Definition: The Power Set of a set S, denoted P(S), is the set of all possible subsets of S.
Theorem: If |S| = n, then |P(S)| = 2^n.

Example: S = {0, 1}. Then P(S) = {∅, {0}, {1}, {0,1}}.

Definition: The Cartesian product of n sets A_i is defined by ordered elements from the A_i and is denoted A_1×A_2×…×A_n = {(a_1,a_2,…,a_n) | a_i∈A_i}.

Example: Let S = {0, 1} and T = {a, b}. Then S×T = {(0,a), (0,b), (1,a), (1,b)}.

Definition: The union of n sets A_i is defined by $\bigcup_{i=1}^{n} A_i$ = A_1∪A_2∪…∪A_n = {x | ∃i x∈A_i}.

Definition: The intersection of n sets A_i is defined by $\bigcap_{i=1}^{n} A_i$ = A_1∩A_2∩…∩A_n = {x | ∀i x∈A_i}.

Definition: n sets A_i are disjoint if A_1∩A_2∩…∩A_n = ∅.

Definition: The complement of set S with respect to T, denoted T−S, is defined by T−S = {x∈T | x∉S}. T−S is also called the difference of T and S.

Definitions: The universal set is denoted U. The universal complement of S is $\overline{S}$ = U−S.

Examples:
• Let S = {1, 0} and T = {0, 1, 2}. Then
  o S⊂T.
  o S∩T = S.
  o S∪T = T.
  o T−S = {2}.
  o Let U = N0. Then $\overline{S}$ = {2, 3, …}.
• Let S = {0, 1} and T = {2, 3}. Then
  o S⊄T.
  o S∩T = ∅.
  o S∪T = {0, 1, 2, 3}.
  o T−S = {2, 3}.
  o Let U = R. Then $\overline{S}$ is the set of all reals except the integers 0 and 1, i.e., $\overline{S}$ = {x∈R | x≠0 ∧ x≠1}.

The textbook has a large number of set identities in a table.

Identity                                                              Law(s)
A∪∅ = A,  A∩U = A                                                     Identity
A∪U = U,  A∩∅ = ∅                                                     Domination
A∪A = A,  A∩A = A                                                     Idempotent
$\overline{\overline{A}}$ = A                                         Complementation
A∪B = B∪A,  A∩B = B∩A                                                 Commutative
A∪(B∪C) = (A∪B)∪C,  A∩(B∩C) = (A∩B)∩C                                 Associative
A∩(B∪C) = (A∩B)∪(A∩C),  A∪(B∩C) = (A∪B)∩(A∪C)                         Distributive
$\overline{A∪B}$ = $\overline{A}$∩$\overline{B}$,  $\overline{A∩B}$ = $\overline{A}$∪$\overline{B}$     DeMorgan
A∪(A∩B) = A,  A∩(A∪B) = A                                             Absorption
A∪$\overline{A}$ = U,  A∩$\overline{A}$ = ∅                           Complement

Many of these are simple to prove from very basic laws.

Definition: A function f:A→B maps a set A to a set B, denoted f(a) = b for a∈A and b∈B, where the mapping (or transformation) is unique.

Definition: If f:A→B, then
• If ∀b∈B ∃a∈A (f(a) = b), then f is a surjective function or onto.
• If f(a) = f(b) implies a = b for all a,b∈A, then f is one-to-one (1-1) or injective.
• A function f is a bijection or a one-to-one correspondence if it is 1-1 and onto.

Definition: Let f:A→B. A is the domain of f. The minimal set B such that f:A→B is onto is the image of f.
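
For finite sets, the constructions in this section can be tried out directly. The sketch below is an illustration only, not part of the original notes; the helper names `power_set`, `is_injective`, and `is_surjective`, and the two sample functions on {0, 1, 2, 3}, are assumptions made up for the example. It reuses the sets S = {0, 1} and T = {a, b} from the examples.

```python
from itertools import chain, combinations, product

def power_set(s):
    """All subsets of s, as frozensets (there are 2**len(s) of them)."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(c) for c in subsets]

def is_injective(f, domain):
    """1-1: distinct elements of the domain have distinct images."""
    images = [f(a) for a in domain]
    return len(set(images)) == len(images)

def is_surjective(f, domain, codomain):
    """Onto: every element of the codomain is the image of some element."""
    return {f(a) for a in domain} == set(codomain)

S = {0, 1}
T = {"a", "b"}
P = power_set(S)               # the 4 subsets of S
cart = set(product(S, T))      # Cartesian product S x T as a set of pairs

A = {0, 1, 2, 3}
square_mod4 = lambda a: (a * a) % 4   # only reaches 0 and 1
shift_mod4 = lambda a: (a + 1) % 4    # a bijection on A

print(len(P), sorted(cart))
print(is_injective(square_mod4, A), is_surjective(shift_mod4, A, A))
```

These checks enumerate the whole domain, so they only apply to finite sets; for infinite domains such as [0,1] the properties must be proven.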
Definitions: Some compound functions include
• $\left(\sum_{i=1}^{n} f_i\right)(a) = \sum_{i=1}^{n} f_i(a)$. We can substitute + if we expand the summation.
• $\left(\prod_{i=1}^{n} f_i\right)(a) = \prod_{i=1}^{n} f_i(a)$. We can substitute * if we expand the product.

Definition: The composition of n functions f_i: A_i→A_{i+1} is defined by (f_n∘…∘f_2∘f_1)(a) = f_n(…(f_2(f_1(a)))…), where a∈A_1.

Definition: If f: A→B, then the inverse of f, denoted f^(-1): B→A, exists if and only if ∀b∈B ∃!a∈A (f(a) = b ↔ f^(-1)(b) = a).

Examples:
• Let A = [0,1] ⊂ R, B = [0,2] ⊂ R.
  o f(a) = a^2 and g(a) = a+1. Then f*g: A→B, but f+g takes values in [1,3], so f+g does not map A into B.
  o f(a) = 2*a and g(a) = a-1. Then neither f+g: A→B nor f*g: A→B.
• Let B = A = [0,1] ⊂ R.
  o f(a) = a^2 and g(a) = 1-a. Then f+g: A→A and f*g: A→A. Neither compound function is a bijection, since each takes the same value at a = 0 and a = 1.
  o f(a) = a^3 and g(a) = a^(1/3). Then g∘f: A→A is a bijection.
• Let A = [-1, 1] and B = [0, 1]. Then
  o f(a) = a^3 and g(a) = {x>0 | x = a^(1/3)}. Then g∘f: A→B is onto.

Definition: The graph of a function f is {(a,f(a)) | a∈A}.

Example: A = {0, 1, 2, 3, 4, 5} and f(a) = a^2.
[Figure: (a) graph(f,A); (b) an approximation to graph(f,[0,5])]

Definitions: The floor and ceiling functions are defined by
• ⌊x⌋ = largest integer smaller than or equal to x.
• ⌈x⌉ = smallest integer larger than or equal to x.

Examples:
• ⌊2.99⌋ = 2, ⌈2.99⌉ = 3
• ⌊-2.99⌋ = -3, ⌈-2.99⌉ = -2

Definition: A sequence is a function from either N or a subset of N to a set A whose elements a_i are the terms of the sequence.

Definitions: A geometric progression is a sequence of the form {ar^i, i=0, 1, …}. An arithmetic progression is a sequence of the form {a+id, i=0, 1, …}.

Translation: f(a,r,i) = ar^i and f(a,d,i) = a + id are the corresponding functions.

There are a number of interesting summations that have closed form solutions.

Theorem: If a,r∈R, then
$\sum_{i=0}^{n} ar^i = (n+1)a$ if r = 1, and
$\sum_{i=0}^{n} ar^i = \frac{ar^{n+1}-a}{r-1}$ otherwise.

Proof: If r = 1, then we are left summing a n+1 times. Hence, the r = 1 case is trivial. Suppose r ≠ 1. Let $S = \sum_{i=0}^{n} ar^i$. Then

$rS = r\sum_{i=0}^{n} ar^i$                         (substituting the formula for S)
$\;\;\;\; = \sum_{i=1}^{n+1} ar^i$                  (simplifying)
$\;\;\;\; = \sum_{i=0}^{n} ar^i + (ar^{n+1}-a)$     (removing the n+1 term and adding the 0 term)
$\;\;\;\; = S + (ar^{n+1}-a)$                       (substituting S for the formula)

Solve for S in rS = S + (ar^(n+1) - a) to get the desired formula. qed

Some other common summations with closed form solutions are

Sum                                          Closed Form Solution
$\sum_{i=1}^{n} i$                           n(n+1)/2
$\sum_{i=1}^{n} i^2$                         n(n+1)(2n+1)/6
$\sum_{i=1}^{n} i^3$                         n^2(n+1)^2/4
$\sum_{i=0}^{\infty} x^i$, |x|<1             (1-x)^(-1)
$\sum_{i=1}^{\infty} i x^{i-1}$, |x|<1       (1-x)^(-2)

Proving some of these requires knowledge about limits. There are close ties to integral and differential calculus, which is no surprise since integration is summation taken to a limit.

Example: $\lim_{i\to\infty} x^i = 0$ when |x|<1. Using the theorem above, we get the result for $\sum_{i=0}^{\infty} x^i$, |x|<1.

Definition: Let f and g be functions from either Z or R to R. Then f(x) is O(g(x)) if there are constants C and k such that |f(x)| ≤ C|g(x)| whenever x>k.

Pronunciation: f(x) is Big Oh of g(x).

Examples:
• f(x) = x^2+2x is O(x^2)
  o When 0≤x≤1, x^2≤x, so 0 ≤ x^2+2x ≤ x+2x = 3x
  o When x≥1, x≤x^2, so 0 ≤ x^2+2x ≤ x^2+2x^2 = 3x^2
• In general, $f(x) = \sum_{i=0}^{n} a_i x^i$ with a_n≠0 is O(x^n) when x≥1.
• n! is O(n^n) when n≥1.
  o n! = 1·2·…·n ≤ n·n·…·n = n^n.
• log(n!) is O(n log n) when n≥1.
  o log(n!) ≤ log(n^n) = n log(n)
• log(n) is O(n) when n≥1.

Theorem: If f_i(x) is O(g_i(x)), for 1≤i≤n, then $\sum_{i=1}^{n} f_i(x)$ is O(max{|g_1(x)|, |g_2(x)|, …, |g_n(x)|}).

Proof: Let g(x) = max{|g_1(x)|, |g_2(x)|, …, |g_n(x)|} and C_i the constants associated with O(g_i(x)). Then
$\left|\sum_{i=1}^{n} f_i(x)\right| \le \sum_{i=1}^{n} C_i|g_i(x)| \le \sum_{i=1}^{n} C_i|g(x)| = |g(x)|\sum_{i=1}^{n} C_i = C|g(x)|$,
where $C = \sum_{i=1}^{n} C_i$.

Theorem: If f_i(x) is O(g_i(x)), for 1≤i≤n, then $\prod_{i=1}^{n} f_i(x)$ is O($\prod_{i=1}^{n} g_i(x)$).

Proof: Let g(x) = |g_1(x)||g_2(x)|…|g_n(x)| and C_i the constants associated with O(g_i(x)). Then
$\left|\prod_{i=1}^{n} f_i(x)\right| \le \prod_{i=1}^{n} C_i|g_i(x)| \le C\prod_{i=1}^{n} |g_i(x)|$,
where $C = \prod_{i=1}^{n} C_i$.

Definition: Let f and g be functions from either Z or R to R. Then f(x) is Ω(g(x)) if there are constants C and k such that |f(x)| ≥ C|g(x)| whenever x>k.

Definition: Let f and g be functions from either Z or R to R. Then f(x) is Θ(g(x)) if f(x) = O(g(x)) and f(x) = Ω(g(x)). In this case, we say that f(x) is of order g(x).
Comment: f(x) = O(g(x)) notation is great in the limit, but does not always provide the right bounds for all values of x. Ω, pronounced Big Omega, is used to provide lower bounds. Θ, pronounced Big Theta, is used to provide both lower and upper bounds.

Example: $f(x) = \sum_{i=0}^{n} a_i x^i$ with a_n≠0 is of order x^n.

Notation: Timing, as a function of the number of elements, falls into the field of Complexity.

Complexity              Terminology
Θ(1)                    Constant
Θ(log(n))               Logarithmic
Θ(n)                    Linear
Θ(n log(n))             n log(n)
Θ(n^k)                  Polynomial
Θ(n^k log(n))           Polylog
Θ(k^n), where k>1       Exponential
Θ(n!)                   Factorial

Notation: Problems are tractable if they can be solved in polynomial time and are intractable otherwise.

Algorithms, Integers, and Matrices

Definition: An algorithm is a finite set of precise instructions for solving a problem.

Computational algorithms should have these properties:
• Input: Values from a specified set.
• Output: Results using the input from a specified set.
• Definiteness: The steps in the algorithm are precise.
• Correctness: The output produced from the input is the right solution.
• Finiteness: The results are produced using a finite number of steps.
• Effectiveness: Each step must be performable and in a finite amount of time.
• Generality: The procedure should accept all input from the input set, not just special cases.

Algorithm: Find the maximum value of a_1, a_2, …, a_n, where n is finite.

procedure max(a_1, a_2, …, a_n: integers)
    max := a_1
    for i := 2 to n
        if max < a_i then max := a_i
    {max is the largest element}

Proof of correctness: We use induction.
1. Suppose n = 1, then max := a_1, which is the correct result.
2. Suppose the result is true for k = 1, 2, …, i-1. Then at step i, we know that max is the largest element in a_1, a_2, …, a_{i-1}. In the if statement, either max is already at least as large as a_i or it is set to a_i. Hence, max is the largest element in a_1, a_2, …, a_i. Since i was arbitrary, we are done.
qed

This algorithm’s input and output are well defined and the overall algorithm can be performed in O(n) time, since the loop makes n-1 comparisons of elements. There are no restrictions on the input set other than that the elements are integers.

Algorithm: Find a value in a sorted, distinct valued a_1, a_2, …, a_n, where n is finite. There are many, many search algorithms.

procedure linear_search(x, a_1, a_2, …, a_n: integers)
    i := 1
    while (i ≤ n and x ≠ a_i)
        i := i + 1
    if i ≤ n then location := i
    else location := 0
    {location is the subscript of the a_i equal to x, or 0 if x is not in a_1, …, a_n}

We can prove that this algorithm is correct using an induction argument. This algorithm relies on neither distinct nor sorted elements. Linear search works, but it is very slow in comparison to many other searching algorithms. It takes 2n+2 comparisons in the worst case, i.e., O(n) time.

procedure binary_search(x, a_1, a_2, …, a_n: integers)
    i := 1
    j := n
    while ( i < j )
        m := ⌊(i+j)/2⌋
        if x > a_m then i := m+1
        else j := m
    if x = a_i then location := i
    else location := 0
    {location is the subscript of the a_i equal to x, or 0 if x is not in a_1, …, a_n}

We can prove that this algorithm is correct using an induction argument. This algorithm is much, much faster than linear_search on average. It is O(log n) in time. The average time to find a member of a_1, …, a_n can be proven to be of order log n.

Algorithm: Sort the distinct valued a_1, a_2, …, a_n into increasing order, where n is finite. There are many, many sorting algorithms.

procedure bubble_sort(a_1, a_2, …, a_n: reals, n≥1)
    for i := 1 to n-1
        for j := 1 to n-i
            if a_j > a_{j+1} then swap a_j and a_{j+1}
    {a_1, …, a_n is in increasing order}

This is one of the simplest sorting algorithms. It is expensive, but quite easy to understand and implement. Only one temporary is needed for the swapping and two loop variables as extra storage. The worst case time is O(n^2).
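
The search and sort procedures above translate almost line for line into Python. The following is a minimal sketch, not the author's code: it uses 0-based lists internally but keeps the pseudocode's convention of returning a 1-based location, with 0 meaning "not found". The sample list `data` is made up for the example.

```python
def linear_search(x, a):
    """Return the 1-based position of x in list a, or 0 if x is absent."""
    i = 0
    while i < len(a) and a[i] != x:
        i += 1
    return i + 1 if i < len(a) else 0

def binary_search(x, a):
    """Return the 1-based position of x in the sorted list a, or 0."""
    if not a:
        return 0
    i, j = 0, len(a) - 1
    while i < j:
        m = (i + j) // 2          # floor of the midpoint
        if x > a[m]:
            i = m + 1             # x can only be in the upper half
        else:
            j = m                 # x, if present, is in the lower half
    return i + 1 if a[i] == x else 0

def bubble_sort(a):
    """Return a sorted copy of a using O(n^2) adjacent swaps."""
    a = list(a)
    n = len(a)
    for i in range(1, n):             # passes 1 .. n-1
        for j in range(0, n - i):     # compare a[j] with a[j+1]
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

data = [7, 3, 9, 1, 5]
ordered = bubble_sort(data)
print(ordered, linear_search(9, data), binary_search(7, ordered))
```

Note that `binary_search` is only meaningful on sorted input, which is why it is applied to `ordered` rather than `data`.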
procedure insertion_sort(a_1, a_2, …, a_n: reals, n≥1)
    for j := 2 to n
        i := 1
        while a_j > a_i
            i := i + 1
        t := a_j
        for k := 0 to j-i-1
            a_{j-k} := a_{j-k-1}
        a_i := t
    {a_1, …, a_n is in increasing order}

This is not a very efficient sorting algorithm either. However, it is easy to see that at the jth step the jth element is put into the correct spot. The worst case time is O(n^2). In fact, insertion_sort is trivially slower than bubble_sort.

Number theory is a rich field of mathematics. We will study four aspects briefly:
1. Integers and division
2. Primes and greatest common divisors
3. Integers and algorithms
4. Applications of number theory

Most of the theorems quoted in this part of the textbook require knowledge of mathematical induction to rigorously prove, a topic covered in detail in the next chapter.

Definition: If a,b∈Z and a≠0, we say that a divides b if ∃c∈Z(b=ac), denoted by a | b. When a divides b, we call a a factor of b and b a multiple of a. When a does not divide b, we denote this as a ∤ b.

Theorem: Let a,b,c∈Z. Then
1. If a | b and a | c, then a | (b+c).
2. If a | b, then a | (bc).
3. If a | b and b | c, then a | c.

Proof: Since a | b, ∃s∈Z(b=as).
1. Since a | c it follows that ∃t∈Z(c=at). Hence, b+c = as + at = a(s+t). Therefore, a | (b+c).
2. bc = (as)c = a(sc). Therefore, a | (bc).
3. Since b | c it follows that ∃t∈Z(c=bt). Hence, c = bt = (as)t = a(st). Therefore, a | c.

Corollary: Let a,b,c∈Z. If a | b and b | c, then a | (mb+nc) for all m,n∈Z.

Theorem (Division Algorithm): Let a,d∈Z (d > 0). Then ∃!q,r∈Z (a = dq+r and 0 ≤ r < d).

Definition: In the division algorithm, a is the dividend, d is the divisor, q is the quotient, and r is the remainder. We write q = a div d and r = a mod d.

Examples:
• Consider 101 divided by 9: 101 = 11·9 + 2.
• Consider -11 divided by 3: -11 = 3(-4) + 1.

Definition: Let a,b,m∈Z (m > 0). Then a is congruent to b modulo m if m | (a-b), denoted a ≡ b (mod m).
The set of integers congruent to an integer a modulo m is called the congruence class of a modulo m.

Theorem: Let a,b,m∈Z (m > 0). Then a ≡ b (mod m) if and only if a mod m = b mod m.

Examples:
• Does 17 ≡ 5 (mod 6)? Yes, since 17 - 5 = 12 and 6 | 12.
• Does 24 ≡ 14 (mod 6)? No, since 24 - 14 = 10, which is not divisible by 6.

Theorem: Let a,b,m∈Z (m > 0). Then a ≡ b (mod m) if and only if ∃k∈Z(a=b+km).

Proof: If a ≡ b (mod m), then m | (a-b). So, there is a k such that a-b = km, or a = b+km. Conversely, if there is a k such that a = b + km, then km = a-b. Hence, m | (a-b), or a ≡ b (mod m).

Theorem: Let a,b,c,d,m∈Z (m > 0). If a ≡ b (mod m) and c ≡ d (mod m), then a+c ≡ b+d (mod m) and ac ≡ bd (mod m).

Corollary: Let a,b,m∈Z (m > 0). Then (a+b) mod m = ((a mod m)+(b mod m)) mod m and (ab) mod m = ((a mod m)(b mod m)) mod m.

Some applications involving congruence include
• Hashing functions: h(k) = k mod m.
• Pseudorandom numbers: x_{n+1} = (a·x_n + c) mod m.
  o c = 0 is known as a pure multiplicative generator.
  o c ≠ 0 is known as a linear congruential generator.
• Cryptography

Definition: An integer a > 1 is a prime if it is divisible only by 1 and a. It is a composite otherwise.

Fundamental Theorem of Arithmetic: Every positive integer greater than 1 can be written uniquely as a prime or the product of two or more primes where the prime factors are written in nondecreasing order.

Theorem: If a is a composite number, then a has a prime divisor less than or equal to a^(1/2).

Theorem: There are infinitely many primes.

Prime Number Theorem: The ratio of the number of primes not exceeding a to a/ln(a) approaches 1 as a→∞.

Example: The probability that a randomly chosen positive integer not exceeding n is prime is given by (n/ln(n))/n = 1/ln(n) asymptotically.

There are still a number of open questions regarding the distribution of primes.

Definition: Let a,b∈Z (a and b not both 0). The largest integer d such that d | a and d | b is the greatest common divisor of a and b, denoted by gcd(a,b).

Example: gcd(24,36) = 12.
Definition: The integers a and b are relatively prime if gcd(a,b) = 1.

Definition: The integers a_1, a_2, …, a_n are pairwise relatively prime if gcd(a_i,a_j) = 1 whenever 1≤i<j≤n.

Examples:
• {10, 17, 121} are pairwise relatively prime.
• {10, 19, 124} are not pairwise relatively prime, since gcd(10,124) = 2.

Definition: The least common multiple of positive integers a and b is the smallest positive integer that is divisible by both a and b, denoted lcm(a,b).

Theorem: Let a and b be positive integers. Then ab = gcd(a,b)·lcm(a,b).

Integers can be expressed uniquely in any base.

Theorem: Let b∈Z (b>1). Then if n∈N, there is a unique expression such that n = a_k·b^k + a_{k-1}·b^(k-1) + … + a_1·b + a_0, where k∈N0, each a_i∈N0, a_k≠0, and 0≤a_i<b. n is written as n = (a_k a_{k-1} … a_1 a_0)_b.

Examples:
• (123)_5 = 1·5^2 + 2·5 + 3 = (38)_10,
  o the base 5 digits are {0-4}.
• (1011)_2 = (11)_10,
  o the binary digits are {0, 1}.
• (F)_16 = (15)_10,
  o the hexadecimal digits are {0-9, A-F}.

Note: Common bases are 2 (binary), 8 (octal), 10 (decimal), and 16 (hexadecimal).

Algorithm: Constructing base b expansions.

procedure base_b_expansion(n: positive integer, b: integer greater than 1)
    q := n
    k := 0
    while q ≠ 0
        a_k := q mod b
        q := ⌊q/b⌋
        k := k+1
    {the base b expansion of n is (a_{k-1} a_{k-2} … a_1 a_0)_b}

Examples: Converting between some bases is easier than others.
• Base 2 to any base 2^k, k>1, is really easy. Just group k bits together and convert to the base 2^k symbol.
• Base 10 to any base 2^k is a pain.
• Base 2^k to base 10 is also a pain.

Algorithm: Addition of integers

procedure add(a, b: integers)
    (a_{n-1} a_{n-2} … a_1 a_0)_2 := base_2_expansion(a)
    (b_{n-1} b_{n-2} … b_1 b_0)_2 := base_2_expansion(b)
    c := 0
    for j := 0 to n-1
        d := ⌊(a_j+b_j+c)/2⌋
        s_j := a_j+b_j+c - 2d
        c := d
    s_n := c
    {the binary expansion of the sum is (s_n s_{n-1} … s_1 s_0)_2}

Questions: What is the complexity of this algorithm? Is this the fastest way to compute the sum?
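
The base expansion and binary addition procedures above can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's code: digits are stored least significant first (so index j holds a_j), the expansion starts from q = n, and the helper `from_digits` is a made-up name used only to check results.

```python
def base_b_expansion(n, b):
    """Digits of the nonnegative integer n in base b, least significant first."""
    if n == 0:
        return [0]
    digits = []
    q = n
    while q != 0:
        digits.append(q % b)   # a_k := q mod b
        q //= b                # q := floor(q / b)
    return digits

def add_binary(a, b):
    """Add nonnegative integers bit by bit with a carry, as in procedure add."""
    da, db = base_b_expansion(a, 2), base_b_expansion(b, 2)
    n = max(len(da), len(db))
    da += [0] * (n - len(da))  # pad both expansions to n bits
    db += [0] * (n - len(db))
    s, c = [], 0
    for j in range(n):
        d = (da[j] + db[j] + c) // 2
        s.append(da[j] + db[j] + c - 2 * d)
        c = d
    s.append(c)                # s_n := final carry
    return s

def from_digits(digits, b):
    """Value of a least-significant-first digit list in base b."""
    return sum(d * b**i for i, d in enumerate(digits))

print(base_b_expansion(38, 5))                 # digits of (123)_5, reversed
print(from_digits(add_binary(11, 6), 2))       # 11 + 6
```

The addition loop runs once per bit, so the sketch does O(n) bit operations for n-bit inputs.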
Algorithm: Multiplication of integers

procedure multiply(a, b: integers)
    (a_{n-1} a_{n-2} … a_1 a_0)_2 := base_2_expansion(a)
    (b_{n-1} b_{n-2} … b_1 b_0)_2 := base_2_expansion(b)
    for j := 0 to n-1
        if b_j = 1 then c_j := a shifted j places
        else c_j := 0
    {c_0, c_1, …, c_{n-1} are the partial products}
    p := 0
    for j := 0 to n-1
        p := p + c_j
    {p is the value of ab}

Examples:
• (10)_2·(11)_2 = (110)_2. Note that there are more bits than in the original integers.
• (11)_2·(11)_2 = (1001)_2. Twice as many binary digits!

Algorithm: Compute div and mod

procedure division(a: integer, d: positive integer)
    q := 0
    r := |a|
    while r ≥ d
        r := r - d
        q := q + 1
    if a < 0 and r > 0 then
        r := d - r
        q := -(q + 1)
    else if a < 0 then
        q := -q
    {q = a div d is the quotient and r = a mod d is the remainder}

Notes:
• The complexity of the multiplication algorithm is O(n^2). Much more efficient algorithms exist, including one that is O(n^1.585) using a divide and conquer technique we will see later in the course.
• There are O(log(a)·log(d)) complexity algorithms for division.

Modular exponentiation, b^k mod m, where b, k, and m are large integers, is important in the field of cryptology and must be computed efficiently.

Algorithm: Modular exponentiation

procedure modular_exponentiation(b: integer, k,m: positive integers)
    (a_{n-1} a_{n-2} … a_1 a_0)_2 := base_2_expansion(k)
    y := 1
    power := b mod m
    for i := 0 to n-1
        if a_i = 1 then y := (y·power) mod m
        power := (power·power) mod m
    {y = b^k mod m}

Note: The complexity is O((log(m))^2·log(k)) bit operations, which is fast.

Euclidean Algorithm: Compute gcd(a,b)

procedure gcd(a,b: positive integers)
    x := a
    y := b
    while y ≠ 0
        r := x mod y
        x := y
        y := r
    {gcd(a,b) is x}

Correctness of this algorithm is based on

Lemma: Let a=bq+r, where a,b,q,r∈Z. Then gcd(a,b) = gcd(b,r).

The complexity will be studied after we master mathematical induction.

Number theory useful results

Theorem: If a,b∈N then ∃s,t∈Z(gcd(a,b) = sa+tb).

Lemma: If a,b,c∈N, gcd(a,b) = 1, and a | bc, then a | c.

Note: This lemma makes proving the prime factorization theorem doable.
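
The modular exponentiation and Euclidean procedures above can be sketched in Python as shown below. The function `bezout` is an addition for illustration: it uses the standard extended Euclidean algorithm, which these notes do not spell out, to produce one pair s, t with gcd(a,b) = sa+tb as promised by the theorem. All function names are assumptions made for the example.

```python
def modular_exponentiation(b, k, m):
    """Compute b**k mod m by scanning the bits of k from least significant."""
    y = 1 % m
    power = b % m
    while k > 0:
        if k & 1:                      # current bit a_i of k is 1
            y = (y * power) % m
        power = (power * power) % m    # b^(2^i) -> b^(2^(i+1))
        k >>= 1
    return y

def gcd(a, b):
    """Euclidean algorithm: repeatedly replace (x, y) by (y, x mod y)."""
    x, y = a, b
    while y != 0:
        x, y = y, x % y
    return x

def bezout(a, b):
    """Extended Euclid: return (d, s, t) with d = gcd(a, b) = s*a + t*b."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

d, s, t = bezout(24, 36)
print(gcd(24, 36), d == s * 24 + t * 36)
print(modular_exponentiation(3, 644, 645) == pow(3, 644, 645))
```

Python's built-in three-argument `pow` computes the same quantity as `modular_exponentiation` and is used here only as a cross-check.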
Lemma: If p is a prime and p | a_1·a_2·…·a_n where each a_i∈Z, then p | a_i for some i.

Theorem: Let m∈N and let a,b,c∈Z. If ac ≡ bc (mod m) and gcd(c,m) = 1, then a ≡ b (mod m).

Definition: A linear congruence is a congruence of the form ax ≡ b (mod m), where m∈N, a,b∈Z, and x is a variable.

Definition: An inverse of a modulo m is an ā such that āa ≡ 1 (mod m).

Theorem: If a and m are relatively prime integers and m>1, then an inverse of a modulo m exists and is unique modulo m.

Proof: Since gcd(a,m) = 1, ∃s,t∈Z(1 = sa+tm). Hence, sa+tm ≡ 1 (mod m). Since tm ≡ 0 (mod m), it follows that sa ≡ 1 (mod m). Thus, s is an inverse of a modulo m. The uniqueness argument is made by assuming there are two inverses and proving this is a contradiction.

Systems of linear congruences are used in large integer arithmetic. The basis for the arithmetic goes back to China 1700 years ago.

Puzzle (Sun Tzu, or Sun Zi): There are certain things whose number is unknown. When divided by 3, the remainder is 2. When divided by 5, the remainder is 3. When divided by 7, the remainder is 2. What will be the number of things? (Answer: 23… stay tuned why).

Chinese Remainder Theorem: Let m_1, m_2, …, m_n∈N be pairwise relatively prime. Then the system x ≡ a_i (mod m_i), 1≤i≤n, has a unique solution modulo $m = \prod_{i=1}^{n} m_i$.

Existence Proof: The proof is by construction. Let M_k = m / m_k, 1≤k≤n. Then gcd(M_k, m_k) = 1 (from the pairwise relatively prime condition). By the previous theorem we know that there is a y_k which is an inverse of M_k modulo m_k, i.e., M_k·y_k ≡ 1 (mod m_k). To construct the solution, form the sum

x = a_1·M_1·y_1 + a_2·M_2·y_2 + … + a_n·M_n·y_n.

Note that M_j ≡ 0 (mod m_k) whenever j≠k. Hence, x ≡ a_k·M_k·y_k ≡ a_k (mod m_k), 1≤k≤n. We have shown that x is a simultaneous solution to the n congruences. qed

Sun Tzu’s Puzzle: The a_k∈{2, 3, 2} come from the puzzle above. Next m_k∈{3, 5, 7}, m = 3·5·7 = 105, and M_k = m/m_k∈{35, 21, 15}. The inverses y_k are
1. y_1 = 2 (M_1 = 35 modulo 3).
2. y_2 = 1 (M_2 = 21 modulo 5).
3. y_3 = 1 (M_3 = 15 modulo 7).
The solutions to this system are those x such that

x ≡ a_1·M_1·y_1 + a_2·M_2·y_2 + a_3·M_3·y_3 = 2·35·2 + 3·21·1 + 2·15·1 = 233.

Finally, 233 ≡ 23 (mod 105).

Definition: An m×n matrix is a rectangular array of numbers with m rows and n columns. The elements of a matrix A are denoted by A_ij or a_ij. A matrix with m=n is a square matrix. If two matrices A and B have the same number of rows and columns and all of the elements A_ij = B_ij, then A = B.

Definition: The transpose of an m×n matrix A = [A_ij], denoted A^T, is A^T = [A_ji]. A matrix is symmetric if A = A^T and skew symmetric if A = -A^T.

Definition: The ith row of an m×n matrix A is [A_i1, A_i2, …, A_in]. The jth column is [A_1j, A_2j, …, A_mj]^T.

Definition: Matrix arithmetic is not exactly the same as scalar arithmetic:
• C = A + B: c_ij = a_ij + b_ij, where A and B are m×n.
• C = A - B: c_ij = a_ij - b_ij, where A and B are m×n.
• C = AB: $c_{ij} = \sum_{p=1}^{k} a_{ip} b_{pj}$, where A is m×k, B is k×n, and C is m×n.

Theorem: A+B = B+A, but AB≠BA in general.

Definition: The identity matrix I_n is n×n with I_ii = 1 and I_ij = 0 if i≠j.

Theorem: If A is n×n, then A·I_n = I_n·A = A.

Definition: A^r = AA…A (r times).

Definition: Zero-One matrices are matrices A = [a_ij] such that all a_ij∈{0, 1}. Boolean operations are defined on m×n zero-one matrices A = [a_ij] and B = [b_ij] by
• Meet of A and B: A∧B = [a_ij∧b_ij], 1≤i≤m and 1≤j≤n.
• Join of A and B: A∨B = [a_ij∨b_ij], 1≤i≤m and 1≤j≤n.
• The Boolean product of A and B is C = A⊙B, where A is m×k, B is k×n, and C is m×n, defined by c_ij = (a_i1∧b_1j)∨(a_i2∧b_2j)∨…∨(a_ik∧b_kj).

Definition: The Boolean power of an n×n matrix A is defined by A^[r] = A⊙A⊙…⊙A (r times), where A^[0] = I_n.

Induction and Recursion

Principle of Mathematical Induction: Given a propositional function P(n), n∈N, we prove that P(n) is true for all n∈N by verifying
1. (Basis) P(1) is true
2. (Induction) P(k)→P(k+1), ∀k∈N.

Notes:
• Equivalent to [P(1) ∧ ∀k∈N (P(k)→P(k+1))] → ∀n∈N P(n).
• We do not actually assume P(k) is true. It is shown that if it is assumed that P(k) is true, then P(k+1) is also true.
This is a subtle grammatical point with mathematical implications. Mathematical induction is a form of deductive reasoning, not inductive reasoning. The latter tries to make conclusions based on observations and rules that may lead to false conclusions. Sometimes $P(1)$ is not the basis, but some other $P(k)$, $k \in \mathbb{Z}$. Sometimes $P(k)$ is for a (possibly infinite) subset of $\mathbb{N}$ or $\mathbb{Z}$. Sometimes $P(k-1) \rightarrow P(k)$ is easier to prove than $P(k) \rightarrow P(k+1)$. Being flexible, but staying within the guiding principle, usually works. There are many ways of proving false results using subtly wrong induction arguments. Usually there is a disconnect between the basis and induction parts of the proof. Examples 10, 11, and 12 in your textbook are worth studying until you really understand each.

Lemma: $\sum_{i=1}^{n} (2i-1) = n^2$ (sum of odd numbers).

Proof: (Basis) Take $k = 1$, so $1 = 1^2$. (Induction) Assume $1 + 3 + 5 + \cdots + (2k-1) = k^2$ for an arbitrary $k \ge 1$. Add $2k+1$ to both sides. Then

$$(1 + 3 + 5 + \cdots + (2k-1)) + (2k+1) = k^2 + (2k+1) = (k+1)^2.$$

Lemma: $\sum_{i=0}^{n} 2^i = 2^{n+1} - 1$.

Proof: (Basis) Take $k = 0$, so $2^0 = 1 = 2^1 - 1$. (Induction) Assume $\sum_{i=0}^{k} 2^i = 2^{k+1} - 1$ for an arbitrary $k \ge 0$. Add $2^{k+1}$ to both sides. Then

$$\sum_{i=0}^{k} 2^i + 2^{k+1} = 2^{k+1} - 1 + 2^{k+1},$$

which simplifies to $\sum_{i=0}^{k+1} 2^i = 2^{k+2} - 1$.

Principle of Strong Induction: Given a propositional function $P(n)$, $n \in \mathbb{N}$, we prove that $P(n)$ is true for all $n \in \mathbb{N}$ by verifying
1. (Basis) $P(1)$ is true.
2. (Induction) $[P(1) \wedge P(2) \wedge \cdots \wedge P(k)] \rightarrow P(k+1)$ is true $\forall k \in \mathbb{N}$.

Example: Infinite ladder with reachable rungs. For mathematical or strong induction, we need to verify the following:

| Step | Mathematical | Strong |
|---|---|---|
| Basis | We can reach the first rung. | We can reach the first rung. |
| Induction | If we can reach an arbitrary rung $k$, then we can reach rung $k+1$. | $\forall k \in \mathbb{N}$, if we can reach all $k$ rungs, then we can reach rung $k+1$. |

We cannot prove that you can climb an infinite ladder using mathematical induction.
Using strong induction, however, you can prove this result using a trick: since you can prove that you can climb to rungs $1, 2, \ldots, k$, it follows that you can climb 2 rungs arbitrarily, which gets you from rung $k-1$ to rung $k+1$.

Rule of thumb: Always use mathematical induction if $P(k) \rightarrow P(k+1)$ can be shown $\forall k \in \mathbb{N}$. Only resort to strong induction when that fails.

Fundamental Theorem of Arithmetic: Every $n \in \mathbb{N}$ ($n > 1$) is the product of primes.

Proof: Let $P(n)$ be the proposition that $n$ can be written as the product of primes.
(Basis) $P(2)$ is true: $2 = 2$, the product of 1 prime.
(Induction) Assume $P(j)$ is true $\forall j$ with $2 \le j \le k$. We must verify that $P(k+1)$ is true.
Case 1: $k+1$ is a prime. Hence, $P(k+1)$ is true.
Case 2: $k+1$ is a composite. Hence $k+1 = a \cdot b$, where $2 \le a \le b < k+1$. By the inductive hypothesis, $P(a)$ and $P(b)$ are both true. Hence, $a = \prod_i p_i$ and $b = \prod_j p'_j$, where the $p$'s are primes. It follows then that $k+1 = \prod_i p_i \prod_j p'_j$, so $P(k+1)$ is true.

Principle of Modified Strong Induction: Given a propositional function $P(n)$, $n \in \mathbb{N}$, we prove that $P(n)$ is true for all $n \ge b$ by verifying
1. (Basis) $P(b), P(b+1), \ldots, P(b+j)$ are all true.
2. (Induction) $[P(b) \wedge P(b+1) \wedge \cdots \wedge P(k)] \rightarrow P(k+1)$ is true $\forall k \ge b+j$, $k \in \mathbb{N}$.

Example: Every postage amount $\ge$ \$0.12 can be formed using \$0.04 and \$0.05 stamp combinations only. We can prove this using modified strong induction.

(Basis) Consider 4 specific cases:

| Postage | Number of \$0.04's | Number of \$0.05's |
|---|---|---|
| \$0.12 | 3 | 0 |
| \$0.13 | 2 | 1 |
| \$0.14 | 1 | 2 |
| \$0.15 | 0 | 3 |

Hence, $P(j)$ is true for $12 \le j \le 15$.

(Induction) Assume $P(j)$ is true for $12 \le j \le k$ and $k \ge 15$. By the inductive hypothesis, $P(k-3)$ is true since $k-3 \ge 12$. Hence, we can form postage $k+1$ by adding another \$0.04 stamp to the stamps used for $k-3$.

Well Ordering Property: Every nonempty subset of $\mathbb{N}$ has a least element. The validity of mathematical and strong induction is based on the well ordering property.

Definition: A recursive function is defined from
1. (Basis) Initial value $f(0)$.
2. (Recursion) $f(k)$, $k > 0$, in terms of $\{f(j) \mid 0 \le j < k\}$ and other terms.

Examples:
$f(0) = 1$, $f(n) = 2f(n-1) + 4$, $n > 0$.
$g(0) = 12$, $g(1) = 1$, $g(n) = 2g(n-1) - g(n-2)$, $n > 1$.
$h(0) = 1$, $h(n) = n\,h(n-1) = n!$, $n > 0$.
Fibonacci numbers: $f_0 = 0$, $f_1 = 1$, $f_n = f_{n-1} + f_{n-2}$, $n > 1$.

| $n$ | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| $f(n)$ | 1 | 6 | 16 | 36 | 76 |
| $g(n)$ | 12 | 1 | -10 | -21 | -32 |
| $h(n)$ | 1 | 1 | 2 | 6 | 24 |
| $f_n$ | 0 | 1 | 1 | 2 | 3 |

Theorem: Whenever $n \ge 3$, $f_n > \alpha^{n-2}$, where $\alpha = (1 + \sqrt{5})/2$. The proof is by modified strong induction.

Lamé's Theorem: Let $a, b \in \mathbb{N}$ ($a \ge b$). Then the number of divisions used by the Euclidean algorithm to find $\gcd(a,b)$ is at most 5 times the number of decimal digits in $b$.

We can recursively define sets, too, not just functions. There is a basis step and a recursion step with the possibility of an exclusion step.

Definition: The set $\Sigma^*$ of strings over an alphabet $\Sigma$ is defined by
(Basis) $\lambda \in \Sigma^*$, where $\lambda$ is the empty string.
(Recursion) If $w \in \Sigma^*$ and $x \in \Sigma$, then $wx \in \Sigma^*$.

Example: $\Sigma = \{0, 1\}$. Then $\Sigma^*$ is the set of all bit strings, which contains the binary representation of every element of $\mathbb{N}_0$.

Principle of Structural Induction:
1. (Basis) Show the result holds for all elements specified in the basis step of the recursive definition of the set.
2. (Induction) Show that if the statement is true for each element used to construct new elements in the recursive step of the definition, then the result holds for these new elements.

The validity of this approach comes from mathematical induction over $\mathbb{N}_0$. First state that $P(n)$ is true whenever $n$ or fewer applications of the recursive step are used to generate an element. We must show that $P(0)$ is true (i.e., the basis elements). Now assume that $P(k)$ is true for an arbitrary $k$. Hence, $P(k+1)$ must be true, too, due to the recursion involving elements generated in $k$ or fewer steps.

Definition: A recursive algorithm solves a problem by reducing it to an instance of the same problem with smaller input(s).

Note: Recursive algorithms can be proven correct using mathematical induction or modified strong induction.

Examples:
$n! = n \cdot (n-1)!$
$a^n = a \cdot a^{n-1}$
$\gcd(a,b)$ with $a, b \in \mathbb{N}_0$ ($a < b$).
procedure gcd(a, b: nonnegative integers with a < b)
  if a = 0 then gcd(a, b) := b
  else gcd(a, b) := gcd(b mod a, a)

Linear search

procedure search(i, j, x: integers with 1 ≤ i ≤ n, 1 ≤ j ≤ n)
  if a_i = x then location := i
  else if i = j then location := 0
  else search(i+1, j, x)

Binary search

procedure binary_search(i, j, x: integers with 1 ≤ i ≤ n, 1 ≤ j ≤ n)
  m := ⌊(i+j)/2⌋
  if x = a_m then location := m
  else if x < a_m and i < m then binary_search(i, m-1, x)
  else if x > a_m and j > m then binary_search(m+1, j, x)
  else location := 0

Fibonacci numbers

procedure fib(n: n ∈ N_0)
  if n = 0 then fib(0) := 0
  else if n = 1 then fib(1) := 1
  else fib(n) := fib(n-1) + fib(n-2)

or it can be defined iteratively:

procedure fib(n: n ∈ N_0)
  if n = 0 then y := 0
  else
    x := 0, y := 1
    for i := 1 to n-1
      z := x + y
      x := y
      y := z
  {y is f_n}

Graphs and trees are important concepts that we will spend a lot of time considering later in the course.

A graph is made up of vertices and edges that connect some of the vertices.

A tree is a special form of a graph, namely a connected undirected graph with no simple circuits.

A rooted tree is a tree with one vertex that is the root and every edge is directed away from the root.

An $m$-ary tree is a rooted tree such that every internal vertex has no more than $m$ children. If $m = 2$, it is a binary tree.

The height of a rooted tree $T$, denoted $h(T)$, is the maximum number of levels (or vertices) along any path from the root.

A balanced rooted tree $T$ has all of its leaves at level $h(T)$ or $h(T)-1$.

Let $T_1, T_2, \ldots, T_m$ be rooted trees with roots $r_1, r_2, \ldots, r_m$. Let $r$ be another root. Connecting $r$ to the roots $r_1, r_2, \ldots, r_m$ constructs another rooted tree $T$. We can reformulate this concept using the recursive set methodology.

Merge sort is a balanced binary tree method that first breaks a list up recursively into two lists until each sublist has only one element. Then the sublists are recombined, two at a time and in sorted order, until only one sorted list remains.

Note: The height of the tree formed in merge sort is $O(\log_2 n)$ for $n$ elements.
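The splitting and merging just described can be sketched in Python. This is a minimal illustration rather than the course's reference implementation; the function names mirror the idea only, and the list `[10, 4, 7, 1]` is simply a small sample input.

```python
def merge(left, right):
    """Merge two already sorted lists into one sorted list."""
    merged = []
    i = j = 0
    # Repeatedly take the smaller front element of the two lists.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One list is exhausted; append whatever remains of the other.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Split recursively until sublists have one element, then merge."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))


print(merge_sort([10, 4, 7, 1]))  # [1, 4, 7, 10]
```

Each level of the recursion halves the sublist length, which is where the logarithmic tree height comes from.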
Example: sorting the list 10, 4, 7, 1.

    10, 4, 7, 1
    10, 4  |  7, 1
    10  |  4  |  7  |  1
    4, 10  |  1, 7
    1, 4, 7, 10

Notes: The first three rows do the sublist splitting. The last two rows do the merging. There are two distinct algorithms at work.

procedure merge_sort(L = a_1, a_2, …, a_n)
  if n > 1 then
    m := ⌊n/2⌋
    L1 := a_1, a_2, …, a_m
    L2 := a_{m+1}, a_{m+2}, …, a_n
    L := merge(merge_sort(L1), merge_sort(L2))
  {L is now the sorted a_1, a_2, …, a_n}

procedure merge(L1, L2: sorted lists)
  L := empty list
  while L1 and L2 are both nonempty
    remove the smaller of the first elements of L1 and L2 and append it to the end of L
    if either L1 or L2 is empty, append the other list to the end of L
  {L is the merged, sorted list}

Theorem: If $n_i = |L_i|$, $i = 1, 2$, then merge requires at most $n_1 + n_2 - 1$ comparisons. If $n = |L|$, then merge_sort requires $O(n \log_2 n)$ comparisons.

Quick sort is another sorting algorithm that breaks an initial list into many sublists, but using a different heuristic than merge sort. If $L = a_1, a_2, \ldots, a_n$ with distinct elements, then quick sort recursively constructs two lists: $L_1$ for all $a_i < a_1$ and $L_2$ for all $a_i > a_1$, with $a_1$ appended to the end of $L_1$. This continues recursively until each sublist has only one element. Then the sublists are recombined in order to get a sorted list.

Note: On average, the number of comparisons is $O(n \log_2 n)$ for $n$ elements, but can be $O(n^2)$ in the worst case. Quick sort is one of the most popular sorting algorithms used in academia.

Exercise: Google "quick sort, C++" to see many implementations or look in many of the 200+ C++ primers. Defining quick sort is in Rosen's exercises.

Counting, Permutations, and Combinations

Product Rule Principle: Suppose a procedure can be broken down into a sequence of $k$ tasks. If there are $n_i$, $1 \le i \le k$, ways to do the $i$th task, then there are $\prod_{i=1}^{k} n_i$ ways to do the procedure.

Sum Rule Principle: Suppose a procedure can be carried out by doing exactly one of $k$ tasks. If there are $n_i$, $1 \le i \le k$, ways to do the $i$th task, with each way unique (no way belongs to two tasks), then there are $\sum_{i=1}^{k} n_i$ ways to do the procedure.
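The two rules can be checked by brute-force enumeration. The sketch below uses made-up data (three shirts and two pairs of trousers are purely hypothetical stand-ins for two tasks) and only standard-library calls.

```python
from itertools import product

# Hypothetical tasks: choose a shirt (3 ways), choose trousers (2 ways).
shirts = ["red", "blue", "green"]
trousers = ["jeans", "chinos"]

# Product rule: do task 1 AND THEN task 2, so the counts multiply.
outfits = list(product(shirts, trousers))
assert len(outfits) == len(shirts) * len(trousers)

# Sum rule: do exactly ONE of the tasks; no way is shared, so the counts add.
single_items = shirts + trousers
assert len(set(single_items)) == len(shirts) + len(trousers)

print(len(outfits), len(single_items))  # 6 5
```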
Exclusion (Inclusion) Principle: If the sum rule cannot be applied because the ways are not unique, we use the sum rule and subtract the number of duplicate ways.

Note: Mapping the individual ways onto a rooted tree and counting the leaves is another method for summing. The trees are not unique, however.

Examples:

Consider 3 students in a classroom with 10 seats. There are $10 \cdot 9 \cdot 8 = 720$ ways to assign the students to the seats.

We want to appoint 1 person to fill out many, many forms that the administration wants filled in by today. There are 3 students and 2 faculty members who can fill out the forms. There are $3 + 2 = 5$ ways to choose 1 person. (Duck fast.)

How many variables are legal in the original Dartmouth BASIC computer language? Variables are 1 or 2 alphanumeric characters long, begin with A-Z, are case independent, and are not one of the 5 two-character reserved words in BASIC. We use a combination of the three counting principles:
o 1 character variables: $V_1 = 26$
o 2 character variables: $V_2 = 26 \cdot 36 - 5 = 931$
o Total: $V = V_1 + V_2 = 957$

Pigeonhole Principle: If there are $k \in \mathbb{N}$ boxes and at least $k+1$ objects placed in the boxes, then there is at least one box with more than one object in it.

Theorem: If $f: D \rightarrow E$ is a function such that $|D| > k$ and $|E| = k$, then $f$ is not 1-1. The proof is by the pigeonhole principle.

Theorem (Generalized Pigeonhole Principle): If $N$ objects are placed in $k$ boxes, then at least one box contains at least $\lceil N/k \rceil$ objects.

Proof: First recall that $\lceil N/k \rceil < (N/k) + 1$. Now suppose that none of the boxes contains more than $\lceil N/k \rceil - 1$ objects. Hence, the total number of objects is at most $k(\lceil N/k \rceil - 1) < k(((N/k) + 1) - 1) = N$, which contradicts having $N$ objects. Hence, the theorem must be true (proof by contradiction).

Theorem: Every sequence of $n^2 + 1$ distinct real numbers contains a subsequence of length $n+1$ that is either strictly increasing or strictly decreasing.

Examples: From a standard 52 card playing deck.

How many cards must be dealt to guarantee that 4 cards from the same suit are dealt (the $k = 4$ suits are the boxes)?
o The GPP Theorem requires $\lceil N/k \rceil \ge 4$ with $k = 4$.
o The minimum turns out to be $N = 3 \cdot 4 + 1 = 13$, since 12 cards could consist of exactly 3 cards from each suit.

How many cards must be dealt to guarantee that 4 clubs are dealt?
o The GPP Theorem does not apply.
o The product rule and inclusion principles apply: $3 \cdot 13 + 4 = 43$, since all of the hearts, spades, and diamonds could be dealt before any clubs.

Definition: A permutation of a set of distinct objects is an ordered arrangement of these objects. An $r$-permutation is an ordered arrangement of $r$ of these objects.

Example: Given $S = \{0, 1, 2\}$, then $(2, 1, 0)$ is a permutation and $(0, 2)$ is a 2-permutation of $S$.

Theorem: If $n, r \in \mathbb{N}$ with $r \le n$, then there are $P(n,r) = n(n-1)(n-2) \cdots (n-r+1) = n!/(n-r)!$ $r$-permutations of a set of $n$ distinct elements. Further, $P(n,0) = 1$.

The proof is by the product rule for $r \ge 1$. For $r = 0$, there is only one way to order 0 objects.

Example: You want to visit 10 cities in China on a vacation. You will arrive in Hong Kong as your first city and you want to maximize the number of frequent flier miles you will accumulate by flying to 9 more cities. You have $9!$ different paths to check. Good luck, since $9! = 362{,}880$.

Definition: An $r$-combination is an unordered subset with $r$ elements from the original set.

Definition: The binomial coefficient is defined by

$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}.$$

Theorem: The number of $r$-combinations of a set with $n$ elements, with $n, r \in \mathbb{N}_0$ and $r \le n$, is $C(n,r) = \binom{n}{r}$.

Proof: The $r$-permutations can be formed using the $C(n,r)$ $r$-combinations and then ordering each $r$-combination, which can be done in $P(r,r)$ ways. So, $P(n,r) = C(n,r)P(r,r)$, or

$$C(n,r) = \frac{P(n,r)}{P(r,r)} = \frac{n!}{(n-r)!} \cdot \frac{(r-r)!}{r!} = \frac{n!}{r!\,(n-r)!}.$$

Theorem: $C(n,r) = C(n,n-r)$ for $0 \le r \le n$.

Definition: A combinatorial proof of an identity is a proof that uses counting arguments to prove that both sides of the identity count the same objects, but in different ways.

Binomial Theorem: Let $x$ and $y$ be variables. Then for $n \in \mathbb{N}$,

$$(x+y)^n = \sum_{j=0}^{n} \binom{n}{j} x^{n-j} y^{j}.$$

Proof: When the product is expanded, all of the terms are of the form $x^{n-j} y^{j}$ for $j = 0, 1, \ldots, n$.
To count the number of terms for $x^{n-j} y^{j}$, note that we have to choose $n-j$ $x$'s from the $n$ sums so that the other $j$ terms in the product are $y$'s. Hence, the coefficient for $x^{n-j} y^{j}$ is $\binom{n}{n-j} = \binom{n}{j}$.

Example: What is the coefficient of $x^{12} y^{13}$ in $(x+y)^{25}$? $\binom{25}{13} = 5{,}200{,}300$.

Corollary: Let $n \in \mathbb{N}_0$. Then $\sum_{k=0}^{n} \binom{n}{k} = 2^n$.

Proof: $2^n = (1+1)^n = \sum_{k=0}^{n} \binom{n}{k} 1^k 1^{n-k} = \sum_{k=0}^{n} \binom{n}{k}$.

Corollary: Let $n \in \mathbb{N}$. Then $\sum_{k=0}^{n} (-1)^k \binom{n}{k} = 0$.

Proof: $0 = 0^n = ((-1)+1)^n = \sum_{k=0}^{n} \binom{n}{k} (-1)^k 1^{n-k} = \sum_{k=0}^{n} (-1)^k \binom{n}{k}$.

Corollary: $\binom{n}{0} + \binom{n}{2} + \binom{n}{4} + \cdots = \binom{n}{1} + \binom{n}{3} + \binom{n}{5} + \cdots$

Corollary: Let $n \in \mathbb{N}_0$. Then $\sum_{k=0}^{n} 2^k \binom{n}{k} = 3^n$.

Theorem (Pascal's Identity): Let $n, k \in \mathbb{N}$ with $n \ge k$. Then

$$\binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k}.$$

Note: Using $\binom{n}{0} = \binom{n}{n} = 1$ as a basis, we can define $\binom{n}{k}$ recursively using Pascal's Identity. It is normally written as a triangular table, denoted Pascal's Triangle.

Theorem (Vandermonde's Identity): Let $m, n, r \in \mathbb{N}_0$ with $r \le m$ and $r \le n$. Then

$$\binom{m+n}{r} = \sum_{k=0}^{r} \binom{m}{r-k} \binom{n}{k}.$$

Corollary: If $n \in \mathbb{N}_0$, then $\binom{2n}{n} = \sum_{k=0}^{n} \binom{n}{k}^2$.

Proof: $\binom{2n}{n} = \sum_{k=0}^{n} \binom{n}{n-k} \binom{n}{k} = \sum_{k=0}^{n} \binom{n}{k}^2$.

Theorem: Let $n, r \in \mathbb{N}_0$ such that $r \le n$. Then $\binom{n+1}{r+1} = \sum_{j=r}^{n} \binom{j}{r}$.

If we allow repetitions in the permutations, then all of the previous theorems and corollaries no longer apply. We have to start over.

Theorem: The number of $r$-permutations of a set with $n$ objects and repetition is $n^r$.

Proof: There are $n$ ways to select an element of the set for each of the $r$ positions in the $r$-permutation. Using the product principle completes the proof.

Theorem: There are $C(n+r-1, r) = C(n+r-1, n-1)$ $r$-combinations from a set with $n$ elements when repetition is allowed.

Example: How many solutions are there to $x_1 + x_2 + x_3 = 9$ for $x_i \in \mathbb{N}_0$? $C(3+9-1, 9) = C(11, 9) = C(11, 2) = 55$. Only when more constraints are placed on the $x_i$ can we possibly find a unique solution.

Definition: The multinomial coefficient is

$$C(n; n_1, n_2, \ldots, n_k) = \frac{n!}{\prod_{i=1}^{k} n_i!}.$$

Theorem: The number of different permutations of $n$ objects, where there are $n_i$, $1 \le i \le k$, indistinguishable objects of type $i$, is $C(n; n_1, n_2, \ldots, n_k)$.
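A short Python sketch can make the multinomial coefficient and the repetition count concrete. The helper names `multinomial` and `distinct_arrangements` and the sample word "SUCCESS" are illustrative choices, not taken from the notes; `math.comb` requires Python 3.8 or later.

```python
from collections import Counter
from math import comb, factorial


def multinomial(counts):
    """C(n; n_1, ..., n_k) = n! / (n_1! * ... * n_k!) with n = sum of counts."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)  # each intermediate quotient is an integer
    return result


def distinct_arrangements(word):
    """Number of distinct orderings of the letters of word."""
    return multinomial(list(Counter(word).values()))


# 7 letters with S x3, C x2, U x1, E x1: 7! / (3! 2! 1! 1!)
print(distinct_arrangements("SUCCESS"))  # 420

# r-combinations with repetition: solutions of x1 + x2 + x3 = 9 over N_0.
print(comb(3 + 9 - 1, 9))  # 55
```

The second line reproduces the count of 55 from the equation example by evaluating $C(n+r-1, r)$ with $n = 3$ and $r = 9$.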
Theorem: The number of ways to distribute $n$ distinguishable objects in $k$ distinguishable boxes so that $n_i$ objects are placed into box $i$, $1 \le i \le k$, is $C(n; n_1, n_2, \ldots, n_k)$.

Theorem: The number of ways to distribute $n$ distinguishable objects in $k$ indistinguishable boxes is

$$\sum_{j=1}^{k} \frac{1}{j!} \sum_{i=0}^{j-1} (-1)^i \binom{j}{i} (j-i)^n.$$

Multinomial Theorem: If $n \in \mathbb{N}$, then

$$\Big(\sum_{i=1}^{k} x_i\Big)^n = \sum_{n_1 + n_2 + \cdots + n_k = n} C(n; n_1, n_2, \ldots, n_k)\, x_1^{n_1} x_2^{n_2} \cdots x_k^{n_k}.$$

Generating permutations and combinations is useful and sometimes important.

Note: We can place any $n$-set into a 1-1 correspondence with the first $n$ natural numbers. All permutations can be listed using $\{1, 2, \ldots, n\}$ instead of the actual set elements. There are $n!$ possible permutations.

Definition: In the lexicographic (or dictionary) ordering, the permutation $a_1 a_2 \ldots a_n$ of $\{1, 2, \ldots, n\}$ precedes $b_1 b_2 \ldots b_n$ if and only if, for some $k$ with $1 \le k \le n$, $a_i = b_i$ for all $1 \le i < k$ and $a_k < b_k$.

Examples: 5 elements. The permutation 21435 precedes 21543. Given 362541, then 364125 is the next permutation lexicographically.

Algorithm: Generate the next permutation in lexicographic order.

procedure next_perm(a_1 a_2 … a_n: a_i ∈ {1, 2, …, n} and distinct, not n n-1 … 2 1)
  j := n - 1
  while a_j > a_{j+1}
    j := j - 1
  {j is the largest subscript with a_j < a_{j+1}}
  k := n
  while a_j > a_k
    k := k - 1
  {a_k is the smallest integer greater than a_j to the right of a_j}
  swap a_j and a_k
  r := n, s := j + 1
  while r > s
    swap a_r and a_s
    r := r - 1, s := s + 1
  {This puts the tail end of the permutation after the jth position in increasing order}

Algorithm: Generate the next $r$-combination in lexicographic order.

procedure next_r_combination(a_1 a_2 … a_r: a_i ∈ {1, 2, …, n} with a_1 < a_2 < … < a_r, not the last combination)
  i := r
  while a_i = n - r + i
    i := i - 1
  a_i := a_i + 1
  for j := i + 1 to r
    a_j := a_i + j - i

Example: Let $S = \{1, 2, \ldots, 6\}$. Given the 4-combination $\{1, 2, 5, 6\}$, the next 4-combination is $\{1, 3, 4, 5\}$.

Discrete Probability

Definition: An experiment is a procedure that yields one of a given set of possible outcomes.
Definition: The sample space of the experiment is the set of (all) possible outcomes.

Definition: An event is a subset of the sample space.

First Assumption: We begin by only considering finitely many possible outcomes.

Definition: If $S$ is a finite sample space of equally likely outcomes and $E \subseteq S$ is an event, then the probability of $E$ is $p(E) = |E| / |S|$.

Examples:

I randomly choose an Exam 1 paper to grade. What is the probability that it is one of the Davids? Thirty-one students took Exam 1, of which five were Davids. So, $p(\text{David}) = 5/31 \approx 0.16$.

Suppose you are allowed to choose 6 numbers from the first 50 natural numbers. The probability of picking the correct 6 numbers in a lottery drawing is $1/C(50,6) = (44!\,6!)/50! = 1/15{,}890{,}700 \approx 6.29 \times 10^{-8}$. This lottery is just a regressive tax designed for suckers and starry eyed dreamers.

Definition: When sampling, there are two possible methods: with and without replacement. In the former, the full sample space is always available. In the latter, the sample space shrinks with each sampling.

Example: Let $S = \{1, 2, \ldots, 50\}$. What is the probability of sampling 1, 14, 23, 32, 49 in that order?
Without replacement: $p = 1/(50 \cdot 49 \cdot 48 \cdot 47 \cdot 46) \approx 3.93 \times 10^{-9}$.
With replacement: $p = 1/(50 \cdot 50 \cdot 50 \cdot 50 \cdot 50) = 3.20 \times 10^{-9}$.

Definition: If $E$ is an event, then $\bar{E}$ is the complementary event.

Theorem: $p(\bar{E}) = 1 - p(E)$ for a sample space $S$.

Proof: $p(\bar{E}) = (|S| - |E|)/|S| = 1 - |E|/|S| = 1 - p(E)$.

Example: Suppose we generate $n$ random bits. What is the probability that at least one of the bits is 0? Let $E$ be the event that a bit string has at least one 0 bit. Then $\bar{E}$ is the event that all $n$ bits are 1, and

$$p(E) = 1 - p(\bar{E}) = 1 - 2^{-n} = (2^n - 1)/2^n.$$

Note: Proving the example directly for $p(E)$ is extremely difficult.

Theorem: Let $E$ and $F$ be events in a sample space $S$. Then $p(E \cup F) = p(E) + p(F) - p(E \cap F)$.

Proof: Recall that $|E \cup F| = |E| + |F| - |E \cap F|$. Hence,

$$p(E \cup F) = |E \cup F|/|S| = (|E| + |F| - |E \cap F|)/|S| = p(E) + p(F) - p(E \cap F).$$
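For equally likely outcomes, the complement rule and the union formula can be verified by direct enumeration. The sketch below is illustrative only: the sample space $\{1, \ldots, 20\}$ and the events "divisible by 4" and "divisible by 6" are assumptions chosen to keep the sets small, and `prob` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb


def prob(event, sample_space):
    """p(E) = |E| / |S| for a finite space of equally likely outcomes."""
    return Fraction(len(event), len(sample_space))


S = set(range(1, 21))                 # assumed sample space {1, ..., 20}
E = {s for s in S if s % 4 == 0}      # divisible by 4
F = {s for s in S if s % 6 == 0}      # divisible by 6

# p(E u F) = p(E) + p(F) - p(E n F)
union_direct = prob(E | F, S)
union_formula = prob(E, S) + prob(F, S) - prob(E & F, S)
assert union_direct == union_formula

# Complement rule: p(not E) = 1 - p(E)
assert prob(S - E, S) == 1 - prob(E, S)

print(union_direct)        # 7/20
print(1 / comb(50, 6))     # one 6-of-50 lottery ticket, about 6.29e-08
```

Using `Fraction` keeps the comparison exact, so the two sides of each identity are equal rather than merely close.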
Example: What is the probability in the set $\{1, 2, \ldots, 100\}$ of an element being divisible by 2 or 3? Let $E$ and $F$ represent elements divisible by 2 and 3, respectively. Then $|E| = 50$, $|F| = 33$, and $|E \cap F| = 16$. Hence, $p(E \cup F) = (50 + 33 - 16)/100 = 0.67$.

Second Assumption: Now suppose that the probability of an outcome is not $1/|S|$. In this case we must assign probabilities for each possible outcome, either by setting a specific value or defining a function.

Definition: For a sample space $S$ with a finite or countable number of outcomes, we assign probabilities $p(s)$ to each outcome $s \in S$ such that
(1) $0 \le p(s) \le 1$ $\forall s \in S$, and
(2) $\sum_{s \in S} p(s) = 1$.

Notes:
1. When $|S| = n$, the formulas (1) and (2) can be rewritten using $n$.
2. When $|S| = \infty$ and is uncountable, integral calculus is required for (2).
3. When $|S| = \infty$ and is countable, the sum in (2) is true in the limit.

Example: Coin flipping with outcomes H and T.
$S = \{H, T\}$ for a fair coin. Hence, $p(H) = p(T) = 0.5$.
$S = \{H, H, T\}$ for a weighted coin. Then $p(H) = 2/3 \approx 0.67$ and $p(T) = 1/3 \approx 0.33$.

Definition: Suppose that $S$ is a set with $n$ elements. The uniform distribution assigns the probability $1/n$ to each element in $S$.

Definition: The probability of the event $E$ is the sum of the probabilities of the outcomes in $E$, i.e., $p(E) = \sum_{s \in E} p(s)$.

Note: When $|E| = \infty$, the sum $\sum_{s \in E} p(s)$ must be convergent in the limit.

Definition: The experiment of selecting an element from a sample space $S$ with a uniform distribution is known as selecting an element from $S$ at random.

We can prove that (1) $p(E) = 1 - p(\bar{E})$ and (2) $p(E \cup F) = p(E) + p(F) - p(E \cap F)$ using the more general probability definitions.

Definition: Let $E$ and $F$ be events with $p(F) > 0$. The conditional probability of $E$ given $F$ is defined by $p(E|F) = p(E \cap F)/p(F)$.

Example: A bit string of length 3 is generated at random. What is the probability that there are two 0 bits in a row given that the first bit is 0? Let $F$ be the event that the first bit is 0. Let $E$ be the event that there are two 0 bits in a row.
Note that $E \cap F = \{000, 001\}$, so $p(E \cap F) = 0.25$, and $p(F) = 0.5$. Hence, $p(E|F) = 0.25/0.5 = 0.5$.

Definition: The events $E$ and $F$ are independent if $p(E \cap F) = p(E)p(F)$.

Note: When $p(F) > 0$, independence is equivalent to having $p(E|F) = p(E)$.

Example: Suppose $E$ is the event that a bit string begins with a 1 and $F$ is the event that there is an even number of 1's. Suppose the bit strings are of length 3. There are 4 bit strings beginning with 1: $\{100, 101, 110, 111\}$. There are 4 strings with an even number of 1's (zero is even): $\{000, 011, 101, 110\}$. Hence, $p(E) = 0.5$ and $p(F) = 0.5$. $E \cap F = \{101, 110\}$, so $p(E \cap F) = 0.25$. Thus, $p(E \cap F) = p(E)p(F)$. Hence, $E$ and $F$ are independent.

Note: For bit strings of length 4, $0.25 = p(E \cap F) = (0.5)(0.5) = p(E)p(F)$, so the events are again independent. The same count works for every length $n \ge 2$, so the even/odd length of the bit strings plays no part in the independence characteristic.

Definition: Each performance of an experiment with exactly two outcomes, denoted success (S) and failure (F), is a Bernoulli trial.

Definition: The binomial distribution is denoted $b(k; n, p) = C(n,k)\,p^k q^{n-k}$, where $q = 1 - p$.

Theorem: The probability of exactly $k$ successes in $n$ independent Bernoulli trials, with probability of success $p$ and failure $q = 1 - p$, is $b(k; n, p)$.

Proof: When $n$ Bernoulli trials are carried out, the outcome is an $n$-tuple $(t_1, t_2, \ldots, t_n)$, with all $t_i \in \{S, F\}$. Due to the independence of the trials, the probability of each outcome having $k$ successes and $n-k$ failures is $p^k q^{n-k}$. There are $C(n,k)$ possible tuples that contain exactly $k$ successes and $n-k$ failures.

Example: Suppose we generate bit strings of length 10 such that $p(0) = 0.7$ and $p(1) = 0.3$ and the bits are generated independently. Then
$b(8; 10, 0.7) = C(10,8)(0.7)^8(0.3)^2 = 45 \cdot 0.05764801 \cdot 0.09 \approx 0.2335$
$b(7; 10, 0.7) = C(10,7)(0.7)^7(0.3)^3 = 120 \cdot 0.0823543 \cdot 0.027 \approx 0.2668$

Theorem: $\sum_{k=0}^{n} b(k; n, p) = 1$.

Proof: $\sum_{k=0}^{n} b(k; n, p) = \sum_{k=0}^{n} C(n,k)\,p^k q^{n-k} = (p+q)^n = 1$.

Definition: A random variable is a function from the sample space of an experiment to the set of reals.
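The binomial probabilities $b(k; n, p)$ from the theorem above are easy to evaluate numerically. The following is a minimal sketch using floating point arithmetic; the function name `b` simply mirrors the notation and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def b(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli trials."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)


# Bit strings of length 10 with p(0) = 0.7: exactly 8 or exactly 7 zero bits.
print(round(b(8, 10, 0.7), 4))  # 0.2335
print(round(b(7, 10, 0.7), 4))  # 0.2668

# The probabilities over k = 0..n sum to 1 (up to rounding error).
total = sum(b(k, 10, 0.7) for k in range(11))
assert abs(total - 1.0) < 1e-12
```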
Notes: A random variable assigns a real number to each possible outcome. A random variable is a function; it is neither a variable nor random.

Example: Flip a fair coin twice. Let $X(t)$ be the random variable that equals the number of tails that appear when $t$ is the outcome. Then $X(HH) = 0$, $X(HT) = X(TH) = 1$, and $X(TT) = 2$.

Definition: The distribution of a random variable $X$ on a sample space $S$ is the set of pairs $(r, p(X=r))$ $\forall r \in X(S)$, where $p(X=r)$ is the probability that $X$ takes the value $r$.

Note: A distribution is usually described by specifying $p(X=r)$ $\forall r \in X(S)$.

Example: For our coin flip example above, each outcome has probability 0.25. Hence, $p(X=0) = 0.25$, $p(X=1) = 0.5$, and $p(X=2) = 0.25$.

Definition: The expected value (or expectation) of the random variable $X(s)$ in the sample space $S$ is $E(X) = \sum_{s \in S} p(s)X(s)$.

Note: If $S = \{x_i\}_{i=1}^{n}$, then $E(X) = \sum_{i=1}^{n} p(x_i)X(x_i)$.

Example: Roll a die. Let the random variable $X$ take the values $1, 2, \ldots, 6$ with probability $1/6$ each. Then $E(X) = \sum_{i=1}^{6} i \cdot \frac{1}{6} = 3.5$. This is not really what you would like to see since the die does not have a 3.5 face.

Theorem: If $X$ is a random variable and $p(X=r)$ is the probability that $X = r$, so that $p(X=r) = \sum_{s \in S,\, X(s)=r} p(s)$, then $E(X) = \sum_{r \in X(S)} p(X=r)\,r$.

Proof: Suppose $X$ is a random variable with range $X(S)$. Let $p(X=r)$ be the probability that $X$ takes the value $r$. Hence, $p(X=r)$ is the sum of the probabilities of the outcomes $s$ such that $X(s) = r$. Grouping the terms of $\sum_{s \in S} p(s)X(s)$ by the value $r = X(s)$ gives, finally, $E(X) = \sum_{r \in X(S)} p(X=r)\,r$.

Theorem: If $X_i$, $1 \le i \le n$, are random variables on $S$ and if $a, b \in \mathbb{R}$, then
1. $E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n)$
2. $E(aX_i + b) = aE(X_i) + b$

Proof: Use mathematical induction (base case is $n = 2$) for 1 and use the definitions for 2.

Note: The linearity of $E$ is extremely convenient and useful.

Theorem: The expected number of successes when $n$ Bernoulli trials are performed, where $p$ is the probability of success on each trial, is $np$.

Proof: Apply 1 from the previous theorem.

Notes: The average case complexity of an algorithm can be interpreted as the expected value of a random variable.
Let $S = \{a_i\}$, where each possible input is an $a_i$. Let $X$ be the random variable such that $X(a_i) = b_i$, the number of operations for the algorithm with input $a_i$. We assign a probability $p(a_i)$ to each input $a_i$. Then the average case complexity is $E(X) = \sum_{a_i \in S} p(a_i)X(a_i)$.

Estimating the average complexity of an algorithm tends to be quite difficult to do directly. Even if the best and worst cases can be estimated easily, there is no guarantee that the average case can be estimated without a great deal of work. Frankly, the average case is sometimes too difficult to estimate. Using the expected value of a random variable sometimes simplifies the process enough to make it doable.

Example of linear search average complexity: See page 44 in the class notes for the algorithm and worst case complexity bound. We want to find $x$ in a distinct set $\{a_i\}_{i=1}^{n}$. If $x = a_i$, then there are $2i+1$ comparisons. If $x \notin \{a_i\}_{i=1}^{n}$, then there are $2n+2$ comparisons. There are $n+1$ input types: $x = a_i$ for $1 \le i \le n$, and $x \notin \{a_i\}_{i=1}^{n}$. Clearly, $p(a_i) = p/n$, where $p$ is the probability that $x \in \{a_i\}_{i=1}^{n}$. Let $q = 1 - p$. So,

$$E = \frac{p}{n} \sum_{i=1}^{n} (2i+1) + (2n+2)q = \frac{p}{n}\big((n+1)^2 - 1\big) + (2n+2)q = p(n+2) + (2n+2)q.$$

There are three cases of interest, namely,
$p = 1$, $q = 0$: $E = n + 2$
$p = q = 0.5$: $E = (3n + 4)/2$
$p = 0$, $q = 1$: $E = 2n + 2$

Definition: A random variable $X$ has a geometric distribution with parameter $p$ if $p(X=k) = (1-p)^{k-1}p$ for $k = 1, 2, \ldots$

Note: Geometric distributions occur in studies about the time required before an event happens (e.g., time to finding a particular item or a defective item, etc.).

Theorem: If the random variable $X$ has a geometric distribution with parameter $p$, then $E(X) = 1/p$.

Proof: Using $\sum_{i=1}^{\infty} i x^{i-1} = 1/(1-x)^2$ for $|x| < 1$ with $x = 1-p$,

$$E(X) = \sum_{i=1}^{\infty} i\,p(X=i) = \sum_{i=1}^{\infty} i(1-p)^{i-1}p = p \sum_{i=1}^{\infty} i(1-p)^{i-1} = p \cdot p^{-2} = 1/p.$$

Definition: The random variables $X$ and $Y$ on a sample space $S$ are independent if $p(X(s)=r_1 \text{ and } Y(s)=r_2) = p(X(s)=r_1)\,p(Y(s)=r_2)$ for all $r_1$ and $r_2$.

Theorem: If $X$ and $Y$ are independent random variables on a space $S$, then $E(XY) = E(X)E(Y)$.
Proof: From the definition of expected value and since $X$ and $Y$ are independent random variables,

$$\begin{aligned}
E(XY) &= \sum_{s \in S} X(s)Y(s)p(s) \\
&= \sum_{r \in X(S),\, t \in Y(S)} rt\,p(X(s)=r \text{ and } Y(s)=t) \\
&= \sum_{r \in X(S),\, t \in Y(S)} rt\,p(X(s)=r)\,p(Y(s)=t) \\
&= \Big(\sum_{r \in X(S)} r\,p(X(s)=r)\Big)\Big(\sum_{t \in Y(S)} t\,p(Y(s)=t)\Big) \\
&= E(X)E(Y).
\end{aligned}$$

Third Assumption: Not all problems can be solved using deterministic algorithms. We want to assess the probability of an event based on partial evidence.

Note: Some algorithms need to make random choices and produce an answer that might be wrong, with a probability associated with its likelihood of correctness or an error estimate. Monte Carlo algorithms are examples of probabilistic algorithms.

Example: Consider a city with a lattice of streets. A drunk walks home from a bar. At each intersection, the drunk must choose between continuing or turning left or right. Hopefully, the drunk gets home eventually. However, there is no absolute guarantee.

Example: You receive $n$ items. Sometimes all $n$ items are guaranteed to be good. However, not all shipments have been checked. The probability that an item is bad in an unchecked batch is 0.1. We want to determine whether or not a shipment has been checked, but are not willing to check all items. So we test items at random until we find a bad item or until the probability that an unchecked shipment would have passed every test so far falls to 0.001. How many items do we need to check? The probability that an item is good, but comes from an unchecked batch, is $1 - 0.1 = 0.9$. Hence, after the $k$th check without finding a bad item, the probability that the items come from an unchecked shipment is $(0.9)^k$. Since $(0.9)^{66} \approx 0.001$, we must check only 66 items per shipment.

Theorem: If the probability that an element of a set $S$, chosen at random, does have a particular property is in $(0, 1]$, then there exists an element in $S$ with this property.

Bayes Theorem: Suppose that $E$ and $F$ are events from a sample space $S$ such that $p(E) \ne 0$ and $p(F) \ne 0$. Then

$$p(F|E) = \frac{p(E|F)\,p(F)}{p(E|F)\,p(F) + p(E|\bar{F})\,p(\bar{F})}.$$
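As a minimal sketch, Bayes theorem for an event $F$ and its complement translates directly into a small Python function. The function name `bayes` and the input values 0.8, 0.3, and 0.1 are hypothetical, chosen only to exercise the formula; they do not come from the notes.

```python
def bayes(p_e_given_f, p_f, p_e_given_not_f):
    """Return p(F|E) from p(E|F), p(F), and p(E|not F)."""
    p_not_f = 1 - p_f
    numerator = p_e_given_f * p_f
    # Denominator is p(E), split over F and its complement.
    p_e = numerator + p_e_given_not_f * p_not_f
    return numerator / p_e


# Hypothetical inputs: p(E|F) = 0.8, p(F) = 0.3, p(E|not F) = 0.1.
posterior = bayes(p_e_given_f=0.8, p_f=0.3, p_e_given_not_f=0.1)
print(round(posterior, 3))  # 0.774
```

Here observing $E$ raises the estimate for $F$ from the prior 0.3 to roughly 0.774, because $E$ is much more likely under $F$ than under its complement.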
Generalized Bayes Theorem: Suppose that $E$ is an event from a sample space $S$ and that $F_1, F_2, \ldots, F_n$ are mutually exclusive events such that $\bigcup_{i=1}^{n} F_i = S$. Assume that $p(E) \ne 0$ and $p(F_i) \ne 0$, $1 \le i \le n$. Then

$$p(F_j|E) = \frac{p(E|F_j)\,p(F_j)}{\sum_{i=1}^{n} p(E|F_i)\,p(F_i)}.$$

Example: We have 2 boxes. The first box contains 2 green and 7 red balls. The second box contains 4 green and 3 red balls. We select a box at random, then a ball at random. If we picked a red ball, what is the probability that it came from the first box?

Let $E$ be the event that we chose a red ball. Thus, $\bar{E}$ is the event that we chose a green ball.

Let $F$ be the event that we chose a ball from the first box. Thus, $\bar{F}$ is the event that we chose a ball from the second box.

$p(F) = p(\bar{F}) = 0.5$ since we pick a box at random.

We want to calculate $p(F|E) = p(E \cap F)/p(E)$, which we will do in stages.

$p(E|F) = 7/9$ since there are 7 red balls out of 9 total in box 1.

$p(E|\bar{F}) = 3/7$ since there are 3 red balls out of a total of 7 in box 2.

$p(E \cap F) = p(E|F)\,p(F) = 7/18 \approx 0.389$ and $p(E \cap \bar{F}) = p(E|\bar{F})\,p(\bar{F}) = 3/14 \approx 0.214$.

We need to find $p(E)$. We do this by observing that $E = (E \cap F) \cup (E \cap \bar{F})$, where $E \cap F$ and $E \cap \bar{F}$ are disjoint sets. So, $p(E) = p(E \cap F) + p(E \cap \bar{F}) \approx 0.603$.

$p(F|E) = p(E \cap F)/p(E) = 0.389/0.603 = 0.645$, which is greater than the prior value $p(F) = 0.5$ above. We have improved our estimate!

Example: Suppose one person in 100,000 has a particular rare disease and that there is an accurate diagnostic test for this disease. The test is 99% accurate when given to someone with the disease and is 99.5% accurate when given to someone who does not have the disease. We can calculate
(a) the probability that someone who tests positive has the disease, and
(b) the probability that someone who tests negative does not have the disease.

Let $F$ be the event that a person has the disease and let $E$ be the event that this person tests positive. We will use Bayes theorem to calculate (a) and (b), so we have to calculate $p(F)$, $p(\bar{F})$, $p(E|F)$, and $p(E|\bar{F})$.
$p(F) = 1/100000 = 10^{-5}$ and $p(\bar{F}) = 1 - p(F) = 0.99999$.

$p(E|F) = 0.99$ since someone who has the disease tests positive 99% of the time. Similarly, we know that the probability of a false negative is $p(\bar{E}|F) = 0.01$.

Further, $p(\bar{E}|\bar{F}) = 0.995$ since the test is 99.5% accurate for someone who does not have the disease.

$p(E|\bar{F}) = 0.005$, which is the probability of a false positive ($100\% - 99.5\%$).
Let S be the event that an incoming message is spam. Bayes theorem tells us that the probability that an incoming message containing the word w is spam is

$p(S|E) = \dfrac{p(E|S)\,p(S)}{p(E|S)\,p(S) + p(E|\bar{S})\,p(\bar{S})}$.

If we assume that $p(S) = p(\bar{S}) = 0.5$, i.e., that any incoming message is equally likely to be spam or not, then we get the simplified formula

$p(S|E) = \dfrac{p(E|S)}{p(E|S) + p(E|\bar{S})}$.

We estimate $p(E|S) = p(w)$ and $p(E|\bar{S}) = q(w)$. So, we estimate $p(S|E)$ by

$r(w) = \dfrac{p(w)}{p(w) + q(w)}$.

If r(w) is greater than some preset threshold, then we classify the incoming message as spam. We can consider a threshold of 0.9 to begin with.

121

Example: Let w = Rolex. Suppose it occurs in 250 / 2000 spam messages and in 5 / 1000 good messages. We will estimate the probability that an incoming message with Rolex in it is spam, assuming that it is equally likely that the incoming message is spam or not. We know that p(Rolex) = 250 / 2000 = 0.125 and q(Rolex) = 5 / 1000 = 0.005. So,

r(Rolex) = 0.125 / (0.125 + 0.005) = 0.962 > 0.9.

Hence, we would reject the message as spam. (Note that some of us would reject all messages with the word Rolex in them as spam, but that is another case entirely.)

122

Using just one word to determine if a message is spam or not leads to excessive numbers of false positives and negatives. We actually have to use the generalized Bayes theorem with a large set of words. With $E_i$ the event that the message contains the word $w_i$,

$p\left(S \,\Big|\, \bigcap_{i=1}^{k} E_i\right) = \dfrac{\prod_{i=1}^{k} p(E_i|S)}{\prod_{i=1}^{k} p(E_i|S) + \prod_{i=1}^{k} p(E_i|\bar{S})}$,

which we estimate, assuming equal probability that an incoming message is spam or not, by

$r(w_1, w_2, \ldots, w_k) = \dfrac{\prod_{i=1}^{k} p(w_i)}{\prod_{i=1}^{k} p(w_i) + \prod_{i=1}^{k} q(w_i)}$.

123

Example: The word w1 = stock appears in 400 / 2000 spam messages and in just 60 / 1000 good messages. The word w2 = undervalued appears in 200 / 2000 spam messages and in just 25 / 1000 good messages. Estimate the likelihood that an incoming message with both words in it is spam. We know p(stock) = 0.2 and q(stock) = 0.06.
Similarly, p(undervalued) = 0.1 and q(undervalued) = 0.025. So,

$r(\text{stock}, \text{undervalued}) = \dfrac{p(\text{stock})\,p(\text{undervalued})}{p(\text{stock})\,p(\text{undervalued}) + q(\text{stock})\,q(\text{undervalued})} = \dfrac{0.2 \cdot 0.1}{0.2 \cdot 0.1 + 0.06 \cdot 0.025} = 0.930 > 0.9$.

Note: Looking for particular pairs or triplets of words and treating each as a single entity is another method for filtering. For example, "enhance performance" probably indicates spam to almost anyone, but "high performance computing" probably does not indicate spam to someone in computational sciences (but probably will for someone working in, say, Maytag repair).

124

Advanced Counting Principles

Definition: A recurrence relation for the sequence $\{a_n\}$ is an equation that expresses $a_n$ in terms of one or more of the previous terms in the sequence. A sequence is called a solution to a recurrence relation if its terms satisfy the recurrence relation. The initial conditions specify the values of the sequence before the first term where the recurrence relation takes effect.

Note: Recursion and recurrence relations have a connection. A recursive algorithm provides a solution to a problem of size n in terms of one or more instances of the same problem, but of smaller size. Complexity analysis of the recursive algorithm leads to a recurrence relation on the number of operations.

Example: Suppose we have $\{a_n\}$ with $a_n = 3n$, $n \in \mathbb{N}$. Is this a solution for $a_n = 2a_{n-1} - a_{n-2}$ for $n \ge 2$? Yes, since for $n \ge 2$,

$2a_{n-1} - a_{n-2} = 2(3(n-1)) - 3(n-2) = 3n = a_n$.

125

Example: Suppose in 1977 you invested $100,000 into a tax free, 30 year municipal bond that paid 15% per year. What is it worth at maturity? Did it beat inflation and if so, by how much?

$P_0 = 100000$
$P_1 = 1.15\,P_0$
$P_2 = 1.15\,P_1 = (1.15)^2 P_0$
$P_i = (1.15)^i P_0$, which can be rigorously proven using mathematical induction.
$P_{30} = (1.15)^{30} P_0 = \$6{,}621{,}180$

This is a big number. What about inflation? We can find the consumer price index (CPI) monthly and yearly on the Internet, e.g., http://inflationdata.com.
Consider just the yearly CPI to make the comparison fairer. With $\{I_j\}$ the CPI per year,

$B_{30} = \prod_{j=1}^{30} I_j = \$354{,}580$.

Investing your money in a bank that just beat inflation would have been a huge investing error. 15% seems high, but that rate existed back then due to high inflation.

126

Fibonacci Example: A young pair of rabbits (1 male, 1 female) arrives on a deserted island. They can breed after they are two months old and produce another pair. Thereafter each pair at least two months old can breed once a month. How many pairs $f_n$ of rabbits are there after n months?

n = 1: $f_1 = 1$ (initial condition)
n = 2: $f_2 = 1$ (initial condition)
n > 2: $f_n = f_{n-1} + f_{n-2}$ (recurrence relation)

The n > 2 formula is true since each new pair comes from a pair at least 2 months old.

Example: For bit strings of length $n \ge 3$, find the recurrence relation and initial conditions for the number of bit strings that do not have two consecutive 0's.

n = 1: $a_1 = 2$ (initial condition), from {0, 1}
n = 2: $a_2 = 3$ (initial condition), from {01, 10, 11}
n > 2: $a_n = a_{n-1} + a_{n-2}$ (recurrence relation)

For n > 2, there are two cases: strings ending in 1 (thus, examine the n−1 case) and strings ending in 10 (thus, examine the n−2 case).

127

Definition: A linear homogeneous recurrence relation of degree k with constant coefficients is a recurrence relation of the form

$a_n = c_1 a_{n-1} + c_2 a_{n-2} + \cdots + c_k a_{n-k}$,

where $\{c_i\} \subset \mathbb{R}$ and $c_k \neq 0$.

Motivation for study: This type of recurrence relation occurs often and can be systematically solved. Slightly more general ones can be, too. The solution methods are related to solving certain classes of ordinary differential equations.

Notes:
• Linear because the right hand side is a sum of multiples of previous terms.
• Homogeneous because no terms occur that are not multiples of $a_j$'s.
• Constant because no coefficient is a function of n.
• Degree k because $a_n$ is defined in terms of the previous k sequential terms.

128

Examples: Typical ones include
• $P_n = 1.15\,P_{n-1}$ is degree 1.
• $f_n = f_{n-1} + f_{n-2}$ is degree 2.
• $a_n = a_{n-5}$ is degree 5.
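The two degree 2 recurrences above (the rabbit pairs and the bit strings without two consecutive 0's) can be checked by brute force for small n. The following is a minimal sketch, not part of the original notes; the function names `count_no_double_zero` and `recurrence_terms` are made up for this illustration.

```python
from itertools import product

def count_no_double_zero(n):
    """Count bit strings of length n with no two consecutive 0's by enumeration."""
    return sum(1 for bits in product("01", repeat=n) if "00" not in "".join(bits))

def recurrence_terms(first, second, n_terms):
    """Terms of x_n = x_{n-1} + x_{n-2} starting from the two initial conditions."""
    terms = [first, second]
    while len(terms) < n_terms:
        terms.append(terms[-1] + terms[-2])
    return terms[:n_terms]

fib = recurrence_terms(1, 1, 10)          # f_1, ..., f_10 (rabbit pairs)
bit_counts = recurrence_terms(2, 3, 10)   # a_1, ..., a_10 (bit strings)
brute = [count_no_double_zero(n) for n in range(1, 11)]
```

Here `brute` enumerates all $2^n$ strings, so it is only practical for small n, while the recurrence needs one addition per term.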
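Examples: Ones that fail the definition include
• $a_n = a_{n-1} + a_{n-2}^2$ is nonlinear.
• $H_n = 2H_{n-1} + 1$ is nonhomogeneous.
• $B_n = nB_{n-1}$ has a variable coefficient.

We will get to nonhomogeneous recurrence relations shortly.

129

Solving a recurrence relation usually assumes that the solution has the form $a_n = r^n$, where $r \in \mathbb{C}$, which holds if and only if

$r^n = c_1 r^{n-1} + c_2 r^{n-2} + \cdots + c_k r^{n-k}$.

Dividing both sides by $r^{n-k}$ to simplify things, we get

Definition: The characteristic equation is

$r^k - c_1 r^{k-1} - c_2 r^{k-2} - \cdots - c_k = 0$.

Then $\{a_n\}$ with $a_n = r^n$ is a solution if and only if r is a solution to the characteristic equation. The proof is quite involved. The k = 2 case is much easier to understand, yet still has multiple cases.

130

Theorem: Assume $c_1, c_2, \alpha_1, \alpha_2 \in \mathbb{R}$ and $r_1, r_2 \in \mathbb{C}$. Suppose that $r^2 - c_1 r - c_2 = 0$ has two distinct roots $r_1$ and $r_2$. Then the sequence $\{a_n\}$ is a solution to the recurrence relation $a_n = c_1 a_{n-1} + c_2 a_{n-2}$ if and only if $a_n = \alpha_1 r_1^n + \alpha_2 r_2^n$ for $n \in \mathbb{N}_0$.

Example: $a_0 = 2$, $a_1 = 7$, and $a_n = a_{n-1} + 2a_{n-2}$ for $n \ge 2$. Then
• Characteristic equation: $r^2 - r - 2 = 0$ or $(r-2)(r+1) = 0$.
• Roots: $r_1 = 2$ and $r_2 = -1$.
• Constants: $a_0 = 2 = \alpha_1 + \alpha_2$ and $a_1 = 7 = 2\alpha_1 - \alpha_2$. Solve
  $\begin{pmatrix} 1 & 1 \\ 2 & -1 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} 2 \\ 7 \end{pmatrix}$ or $\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} 3 \\ -1 \end{pmatrix}$.
• Solution: $a_n = 3 \cdot 2^n - (-1)^n$.

Matlab or Maple is essential to solving recurrence relations quickly and accurately.

131

Fibonacci Example: $f_0 = 0$, $f_1 = 1$, and $f_n = f_{n-1} + f_{n-2}$, $n \ge 2$.
• Characteristic equation: $r^2 - r - 1 = 0$.
• Roots: $r_1 = \dfrac{1+\sqrt{5}}{2}$ and $r_2 = \dfrac{1-\sqrt{5}}{2}$.
• Set up a 2×2 matrix problem to solve for $\alpha_1$ and $\alpha_2$, which are $\alpha_1 = \dfrac{1}{\sqrt{5}}$ and $\alpha_2 = -\dfrac{1}{\sqrt{5}}$.
• Solution: $f_n = \dfrac{1}{\sqrt{5}} \left(\dfrac{1+\sqrt{5}}{2}\right)^n - \dfrac{1}{\sqrt{5}} \left(\dfrac{1-\sqrt{5}}{2}\right)^n$.

132

Now comes the second case for k = 2.

Theorem: Assume $c_1, c_2, \alpha_1, \alpha_2 \in \mathbb{R}$ and $r_0 \in \mathbb{C}$. Suppose that $r^2 - c_1 r - c_2 = 0$ has one root $r_0$ with multiplicity 2. Then the sequence $\{a_n\}$ is a solution to the recurrence relation $a_n = c_1 a_{n-1} + c_2 a_{n-2}$ if and only if $a_n = \alpha_1 r_0^n + \alpha_2 n r_0^n$ for $n \in \mathbb{N}_0$.

Example: $a_0 = 1$, $a_1 = 6$, and $a_n = 6a_{n-1} - 9a_{n-2}$ for $n \ge 2$. Then
• Characteristic equation: $r^2 - 6r + 9 = 0$ or $(r-3)^2 = 0$.
• Double root: $r_0 = 3$.
• Constants: $a_0 = 1 = \alpha_1$ and $a_1 = 6 = 3\alpha_1 + 3\alpha_2$. Solve
  $\begin{pmatrix} 1 & 0 \\ 3 & 3 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 6 \end{pmatrix}$ or $\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$.
• Solution: $a_n = (n+1)3^n$.

133

Theorem: Let $\{c_i\}_{i=1}^{k}, \{\alpha_i\}_{i=1}^{k} \subset \mathbb{R}$ and $\{r_i\}_{i=1}^{k} \subset \mathbb{C}$. Suppose the characteristic equation $r^k - c_1 r^{k-1} - \cdots - c_k = 0$ has k distinct roots $r_i$, $1 \le i \le k$. Then the sequence $\{a_n\}$ is a solution of the recurrence relation $a_n = c_1 a_{n-1} + c_2 a_{n-2} + \cdots + c_k a_{n-k}$ if and only if $a_n = \alpha_1 r_1^n + \alpha_2 r_2^n + \cdots + \alpha_k r_k^n$ for $n \in \mathbb{N}_0$.

Example: $a_0 = 2$, $a_1 = 5$, $a_2 = 15$, and $a_n = 6a_{n-1} - 11a_{n-2} + 6a_{n-3}$, $n \ge 3$.
• Characteristic equation: $r^3 - 6r^2 + 11r - 6 = 0$ or $(r-1)(r-2)(r-3) = 0$.
• Roots: $r_1 = 1$, $r_2 = 2$, and $r_3 = 3$.
• Constants: $a_0 = 2 = \alpha_1 + \alpha_2 + \alpha_3$, $a_1 = 5 = \alpha_1 + 2\alpha_2 + 3\alpha_3$, and $a_2 = 15 = \alpha_1 + 4\alpha_2 + 9\alpha_3$. Solve
  $\begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 4 & 9 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} = \begin{pmatrix} 2 \\ 5 \\ 15 \end{pmatrix}$ or $\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}$.
• Solution: $a_n = 1 - 2^n + 2 \cdot 3^n$.

134

Theorem: Let $\{c_i\}_{i=1}^{k} \subset \mathbb{R}$ and $\{r_i\}_{i=1}^{t} \subset \mathbb{C}$. Suppose the characteristic equation $r^k - c_1 r^{k-1} - \cdots - c_k = 0$ has t distinct roots $r_i$, $1 \le i \le t$, with multiplicities $m_i \in \mathbb{N}$ such that $\sum_{i=1}^{t} m_i = k$. Then the sequence $\{a_n\}$ is a solution of the recurrence relation $a_n = c_1 a_{n-1} + c_2 a_{n-2} + \cdots + c_k a_{n-k}$ if and only if

$a_n = (\alpha_{1,0} + \alpha_{1,1} n + \cdots + \alpha_{1,m_1-1} n^{m_1-1})\, r_1^n + \cdots + (\alpha_{t,0} + \alpha_{t,1} n + \cdots + \alpha_{t,m_t-1} n^{m_t-1})\, r_t^n$

for $n \in \mathbb{N}_0$ and all $\alpha_{i,j} \in \mathbb{R}$, $1 \le i \le t$ and $0 \le j \le m_i - 1$.

Example: Suppose the roots of the characteristic equation are 2, 2, 3, 3, 3, 5. Then the general solution form is

$(\alpha_{1,0} + \alpha_{1,1} n)\, 2^n + (\alpha_{2,0} + \alpha_{2,1} n + \alpha_{2,2} n^2)\, 3^n + \alpha_{3,0}\, 5^n$.

With given initial conditions, we can even compute the α's.

135

Definition: A linear nonhomogeneous recurrence relation of degree k with constant coefficients is a recurrence relation of the form

$a_n = c_1 a_{n-1} + c_2 a_{n-2} + \cdots + c_k a_{n-k} + F(n)$,

where $\{c_i\} \subset \mathbb{R}$.

Theorem: If $\{a_n^{(p)}\}$ is a particular solution of the recurrence relation with constant coefficients $a_n = c_1 a_{n-1} + c_2 a_{n-2} + \cdots + c_k a_{n-k} + F(n)$, then every solution is of the form $\{a_n^{(p)} + a_n^{(h)}\}$, where $\{a_n^{(h)}\}$ is a solution of the associated homogeneous recurrence relation (i.e., F(n) = 0).

Note: Finding particular solutions for given F(n)'s is loads of fun unless F(n) is rather simple. Usually you solve the homogeneous form first, then try to find a particular solution from that.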
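The closed forms obtained in the homogeneous examples above (distinct roots, a double root, and three distinct roots) can be sanity checked by iterating each recurrence from its initial conditions. This is a minimal sketch, not part of the original notes; the helper `iterate` is a name chosen only for this illustration.

```python
def iterate(coeffs, initial, n_terms):
    """Terms a_0, a_1, ... of a_n = c_1*a_{n-1} + ... + c_k*a_{n-k}.

    coeffs  = [c_1, ..., c_k]
    initial = [a_0, ..., a_{k-1}]
    """
    terms = list(initial)
    while len(terms) < n_terms:
        # terms[-1] is a_{n-1}, terms[-2] is a_{n-2}, and so on.
        terms.append(sum(c * terms[-i - 1] for i, c in enumerate(coeffs)))
    return terms[:n_terms]

# a_n = a_{n-1} + 2a_{n-2}, a_0 = 2, a_1 = 7; closed form 3*2^n - (-1)^n
distinct = iterate([1, 2], [2, 7], 12)

# a_n = 6a_{n-1} - 9a_{n-2}, a_0 = 1, a_1 = 6; closed form (n+1)*3^n
double = iterate([6, -9], [1, 6], 12)

# a_n = 6a_{n-1} - 11a_{n-2} + 6a_{n-3}, a_0 = 2, a_1 = 5, a_2 = 15;
# closed form 1 - 2^n + 2*3^n
cubic = iterate([6, -11, 6], [2, 5, 15], 12)
```

Comparing a dozen terms does not prove a closed form, but it catches sign and index slips when solving for the constants by hand.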
136

Theorem: Assume $\{b_i\}, \{c_i\} \subset \mathbb{R}$. Suppose that $\{a_n\}$ satisfies the nonhomogeneous recurrence relation

$a_n = c_1 a_{n-1} + c_2 a_{n-2} + \cdots + c_k a_{n-k} + F(n)$ and
$F(n) = (b_t n^t + b_{t-1} n^{t-1} + \cdots + b_1 n + b_0)\, s^n$.

When s is not a root of the characteristic equation of the associated homogeneous recurrence relation, there is a particular solution of the form

$(p_t n^t + p_{t-1} n^{t-1} + \cdots + p_1 n + p_0)\, s^n$.

When s is a root of multiplicity m of the characteristic equation, there is a particular solution of the form

$n^m (p_t n^t + p_{t-1} n^{t-1} + \cdots + p_1 n + p_0)\, s^n$.

Note: If s = 1, then things get even more complicated.

137

Example: Let $a_n = 6a_{n-1} - 9a_{n-2} + F(n)$. When F(n) = 0, the characteristic equation is $(r-3)^2 = 0$. Thus, $r_0 = 3$ with multiplicity 2.
• $F(n) = 3^n$: particular solution is $n^2 p_0 3^n$.
• $F(n) = n3^n$: particular solution is $n^2 (p_1 n + p_0) 3^n$.
• $F(n) = n^2 2^n$: particular solution is $(p_2 n^2 + p_1 n + p_0) 2^n$.
• $F(n) = (n^2+1)3^n$: particular solution is $n^2 (p_2 n^2 + p_1 n + p_0) 3^n$.

Definition: Suppose a recursive algorithm divides a problem of size n into a subproblems of size n/b each. Also suppose that g(n) extra operations are required to combine the a subproblems into a solution of the problem of size n. If f(n) is the cost of solving a problem of size n, then the divide and conquer recurrence relation is f(n) = af(n/b) + g(n).

We can easily work out a general cost for the divide and conquer recurrence relation using Big-Oh notation.

138

Divide and Conquer Theorem: Let $a, b, c, d \in \mathbb{R}$ be nonnegative (with b > 1). The solution to the recurrence relation

$f(n) = \begin{cases} c, & \text{for } n = 1, \\ a f(n/b) + c n^d, & \text{for } n > 1, \end{cases}$

for n a power of b is

$f(n) = \begin{cases} O(n^d), & \text{for } a < b^d, \\ O(n^d \log n), & \text{for } a = b^d, \\ O(n^{\log_b a}), & \text{for } a > b^d. \end{cases}$

Proof: If n is a power of b, then for $r = a/b^d$, $f(n) = c n^d \sum_{i=0}^{\log_b n} r^i$. There are 3 cases:
• $a < b^d$: Then $\sum_{i=0}^{\infty} r^i$ converges, so $f(n) = O(n^d)$.
• $a = b^d$: Then each term in the sum is 1, so $f(n) = O(n^d \log n)$.
• $a > b^d$: Then $c n^d \sum_{i=0}^{\log_b n} r^i = c n^d \dfrac{r^{1+\log_b n} - 1}{r - 1}$, which is $O(a^{\log_b n})$ or $O(n^{\log_b a})$.

139

Example: Recall binary search (see page 45 in the class notes).
Searching for an element in a set requires 2 comparisons to determine which half of the set to search further. The search keeps halving the size of the set until at most 1 element is left. Hence, f(n) = f(n/2) + 2. Using the Divide and Conquer theorem, we see that the cost is O(log n) comparisons.

Example: Recall merge sort (see pages 81-83 in the class notes). This sorts halves of sets of elements and requires less than n comparisons to put the two sorted sublists into a sorted list of size n. Hence, f(n) = 2f(n/2) + n. Using the Divide and Conquer theorem, we see that the cost is O(n log n) comparisons.

Multiplying integers can be done recursively based on a binary decomposition of the two numbers to get a fast algorithm. The patent on this technique, implemented in hardware, made a computer company several billion dollars back when a billion dollars was real money (cf. a trillion dollars today).

Why stop with integers? The technique extends to multiplying matrices, too, with real, complex, or integer entries.

140

Example (funny integer multiplication): Suppose a and b have 2n length binary representations

$a = (a_{2n-1} a_{2n-2} \ldots a_1 a_0)_2$ and $b = (b_{2n-1} b_{2n-2} \ldots b_1 b_0)_2$.

We will divide a and b into left and right halves:

$a = 2^n A_1 + A_0$ and $b = 2^n B_1 + B_0$, where
$A_1 = (a_{2n-1} a_{2n-2} \ldots a_{n+1} a_n)_2$ and $A_0 = (a_{n-1} a_{n-2} \ldots a_1 a_0)_2$,
$B_1 = (b_{2n-1} b_{2n-2} \ldots b_{n+1} b_n)_2$ and $B_0 = (b_{n-1} b_{n-2} \ldots b_1 b_0)_2$.

The trick is to notice that

$ab = (2^{2n} + 2^n) A_1 B_1 + 2^n (A_1 - A_0)(B_0 - B_1) + (2^n + 1) A_0 B_0$.

Only 3 multiplies plus adds, subtracts, and shifts are required. So, f(2n) = 3f(n) + Cn, where Cn is the cost of the adds, subtracts, and shifts. The Divide and Conquer theorem tells us this is $O(n^{\log_2 3})$, which is about $O(n^{1.6})$. The standard algorithm is $O(n^2)$. It might not seem like much of an improvement, but it actually is when lots of integers are multiplied together. The trick can be applied recursively on the three multiplies in the ab line (halving 2n in the recursion).
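The three-multiply identity above translates directly into a recursive routine. The sketch below is an illustration only, not the patented hardware method mentioned earlier; the name `fast_multiply` and the cutoff of 16 for the base case are arbitrary choices made here.

```python
def fast_multiply(a, b):
    """Multiply two integers using three recursive half-size products."""
    # Handle signs so that the splitting below only sees nonnegative operands.
    if a < 0:
        return -fast_multiply(-a, b)
    if b < 0:
        return -fast_multiply(a, -b)
    if a < 16 or b < 16:                    # small base case: built-in product
        return a * b
    n = max(a.bit_length(), b.bit_length()) // 2
    mask = (1 << n) - 1
    a1, a0 = a >> n, a & mask               # a = 2^n * A1 + A0
    b1, b0 = b >> n, b & mask               # b = 2^n * B1 + B0
    high = fast_multiply(a1, b1)            # A1 * B1
    low = fast_multiply(a0, b0)             # A0 * B0
    mid = fast_multiply(a1 - a0, b0 - b1)   # (A1 - A0) * (B0 - B1)
    # ab = (2^{2n} + 2^n) A1B1 + 2^n (A1 - A0)(B0 - B1) + (2^n + 1) A0B0
    return (high << (2 * n)) + ((high + mid + low) << n) + low

product_check = fast_multiply(123456789, 987654321)
```

The differences A1 − A0 and B0 − B1 can be negative, which is why the sign handling comes first. Python's built-in integer product is already fast, so this only demonstrates the recurrence f(2n) = 3f(n) + Cn.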
141

Example (Strassen-Winograd Matrix-Matrix multiplication): We want to multiply A: m×k by B: k×n to get C: m×n. The matrix elements can be reals, complex numbers, or integers. When m = k = n, this takes $O(n^3)$ operations using the standard matrix-matrix multiplication algorithm. However, Strassen first proposed a divide and conquer algorithm that reduced the exponent. The belief is that someday, someone will devise an $O(n^2)$ algorithm. Some hope it will even be plausible to use such an algorithm.

The variation of Strassen's algorithm that is most commonly implemented by computer vendors in high performance math libraries is the Winograd variant. It computes the product as

$\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$.

C is computed in 22 steps involving the submatrices of A, B, and intermediate temporary submatrices. An interesting question for many years was how little extra memory was needed to implement the Strassen-Winograd algorithm (see C. C. Douglas, M. Heroux, G. Slishman, and R. M. Smith, GEMMW: A portable Level 3 BLAS Winograd variant of Strassen's matrix-matrix multiply algorithm, Journal of Computational Physics, 110 (1994), pp. 1-10 for an answer).

142

The 22 steps are the following. Each result is held in one of the work areas $W_{mk}$ and $W_{kn}$ or in one of the blocks $C_{11}$, $C_{12}$, $C_{21}$, $C_{22}$.

Step   Result   Operation
1      S7       B22 − B12
2      S3       A11 − A21
3      M4       S3 S7
4      S1       A21 + A22
5      S5       B12 − B11
6      M5       S1 S5
7      S6       B22 − S5
8      S2       S1 − A11
9      M1       S2 S6
10     S4       A12 − S2
11     M6       S4 B22

143

Step   Result   Operation
12     T3       M5 + M6
13     M2       A11 B11
14     T1       M1 + M2
15     C12      T1 + T3
16     T2       T1 + M4
17     S8       S6 − B21
18     M7       A22 S8
19     C21      T2 − M7
20     C22      T2 + M5
21     M3       A12 B21
22     C11      M2 + M3

There are four tricky steps in the table above, depending on whether k (or m) is even or odd. Each step makes certain that we do not use more memory than is allocated for a submatrix or temporary. For example,

144

In step 4, we have to take care with S1.
(a) If k is odd, then copy the first column of A21 into Wmk.
(b) Complete S1.

In step 10, we have to take care with S4.
(a) If k is odd, then pretend the first column of A21 = 0 in Wmk.
(b) Complete S4.

In step 11, we have to take care with M6.
(a) If m is odd, then save the first row of M5.
(b) Calculate most of M6.
(c) Complete M6 using (a) based on whether or not m is odd.

In step 21, we have to take care with M3.
(a) Calculate M3 using an index shift.

This all sounds very complicated. However, the code GEMMW, which is readily available on the Web, is effectively implemented in 27 calls to subroutines that do the matrix operations, and it actually implements $C = \alpha\,\mathrm{op}(A)\,\mathrm{op}(B) + \beta C$, where op(X) is either X, X transpose, X conjugate, or X conjugate transpose.

145

What is the total cost? There are 7 submatrix-submatrix multiplies and 15 submatrix-submatrix adds or subtracts. So the cost is $f(n) = 7f(n/2) + 15n^2/4$ when m = k = n. By the Divide and Conquer theorem, this is an $O(n^{\log_2 7})$ algorithm, where $\log_2 7 = 2.807$.

• The work area Wmk needs ((m+1)max(k,n)+m+4)/4 space.
• The work area Wkn needs ((k+1)n+n+4)/4 space.
• If C overlaps A or B in memory, an additional mn space is needed to save C before calculating C when β ≠ 0.
• The maximum amount of extra memory is bounded by (m·max(k,n)+kn)/3 + (m+max(k,n)+k+3n)/2 + 32 + mn.
• Hence, the overall extra storage is $cN^2/3$, where $c \in \{2, 5\}$.
• Typical memory usage when m = k = n is
  o β ≠ 0 or A or B overlap with C: $1.67N^2$.
  o β = 0 and A and B do not overlap with C: $0.67N^2$.

146

Definition: The (ordinary) generating function for a sequence $a_0, a_1, \ldots, a_k, \ldots$ of real numbers is the infinite series $G(x) = \sum_{k=0}^{\infty} a_k x^k$. For a finite sequence $\{a_k\}_{k=0}^{n}$, the generating function is $G(x) = \sum_{k=0}^{n} a_k x^k$.

Examples:
1. $a_k = 3$, $G(x) = 3\sum_{k=0}^{\infty} x^k$.
2. $a_k = k+1$, $G(x) = \sum_{k=0}^{\infty} (k+1) x^k$.
3. $a_k = 2^k$, $G(x) = \sum_{k=0}^{\infty} (2x)^k$.
4. $a_k = 1$, $0 \le k \le 2$, $G(x) = \sum_{k=0}^{2} x^k = \dfrac{x^3 - 1}{x - 1}$.

Notes:
• x is a placeholder, so it does not matter that the closed form in example 4 above is undefined at x = 1.
• We do not have to worry about convergence of the series, either.

147

• When solving a series using calculus, knowing the ball of convergence for the x's is required.
Lemma: $f(x) = (1-ax)^{-1}$ is the generating function for the sequence $1, a, a^2, \ldots, a^k, \ldots$ since for $a \neq 0$ and $|ax| < 1$,

$(1-ax)^{-1} = \sum_{k=0}^{\infty} (ax)^k$.

Theorem: If $f(x) = \sum_{k=0}^{\infty} a_k x^k$ and $g(x) = \sum_{k=0}^{\infty} b_k x^k$ and f and g share the same ball of convergence, then

$f(x) + g(x) = \sum_{k=0}^{\infty} (a_k + b_k) x^k$ and $f(x)g(x) = \sum_{k=0}^{\infty} \left(\sum_{j=0}^{k} a_j b_{k-j}\right) x^k$.

Example: Let $f(x) = (1-x)^{-2}$ be the generating function. What is the sequence? Consider the sequence 1, 1, …, 1, …, which has a generating function of $g(x) = (1-x)^{-1}$. We can use the previous theorem to answer our question:

$(1-x)^{-2} = \sum_{k=0}^{\infty} \left(\sum_{j=0}^{k} 1\right) x^k = \sum_{k=0}^{\infty} (k+1) x^k$ or $a_k = k+1$.

148

Definition: The extended binomial coefficient $\binom{u}{k}$ for $u \in \mathbb{R}$ and $k \in \mathbb{N}_0$ is defined by

$\binom{u}{k} = \begin{cases} u(u-1)\cdots(u-k+1)/k! & \text{if } k > 0, \\ 1 & \text{if } k = 0. \end{cases}$

Extended Binomial Theorem: If $u, x \in \mathbb{R}$ such that $|x| < 1$, then

$(1+x)^u = \sum_{k=0}^{\infty} \binom{u}{k} x^k$.

Examples:
1. $\binom{0.5}{2} = (0.5)(-0.5)/2! = -0.125$.
2. $\binom{-n}{r} = (-1)^r \binom{n+r-1}{r} = (-1)^r C(n+r-1, r)$ for $n \in \mathbb{N}$.

149

3. If $u \in \mathbb{N}$, then the extended binomial theorem is equivalent to the binomial theorem since $\binom{u}{k} = 0$ when k > u.
4. $(1-x)^{-n} = \sum_{k=0}^{\infty} C(n+k-1, k) x^k$ (uses examples 2 and 3).

Other Useful Generating Functions:
• $\dfrac{1 - x^{n+1}}{1 - x} = \sum_{k=0}^{n} x^k$.
• $(1-(ax)^r)^{-1} = \sum_{k=0}^{\infty} (ax)^{rk}$.
• $(1-(ax)^r)^{-n} = \sum_{k=0}^{\infty} C(n+k-1, k) (ax)^{rk}$.
• $(1+(ax)^r)^{n} = \sum_{k=0}^{n} C(n, k) (ax)^{rk}$.
• $(1+(ax)^r)^{-n} = \sum_{k=0}^{\infty} (-1)^k C(n+k-1, k) (ax)^{rk}$.
• $e^x = \sum_{k=0}^{\infty} \dfrac{x^k}{k!}$.
• $\ln(x+1) = \sum_{k=1}^{\infty} \dfrac{(-1)^{k+1} x^k}{k}$.

150

Note: Generating functions can be used to solve many counting problems.

Examples:
• How many solutions are there to the constrained problem a + b = 9 for $3 \le a \le 5$ and $4 \le b \le 6$? There are 3 total. The number of solutions with the constraints is the coefficient of $x^9$ in $(x^3 + x^4 + x^5)(x^4 + x^5 + x^6)$. We choose $x^a$ and $x^b$ from the two factors, respectively, so that a + b = 9. By inspection, there are only 3 choices for a and b.
• How many ways can 8 CPUs be distributed in 3 servers if each server gets 2-4 CPUs? The generating function is $f(x) = (x^2 + x^3 + x^4)^3$. We need the coefficient of $x^8$ in f(x). Expansion of f(x) gives us 6 ways.

Note: Maple or Mathematica is really useful in the examples above.
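Without a computer algebra system, the two coefficients in the counting examples above can be extracted by multiplying polynomials stored as coefficient lists. This is a minimal sketch under that representation; `poly_mul` and `range_poly` are names invented for this illustration.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (index = exponent)."""
    result = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            result[i + j] += pi * qj
    return result

def range_poly(lo, hi):
    """Coefficient list of x^lo + x^(lo+1) + ... + x^hi."""
    return [0] * lo + [1] * (hi - lo + 1)

# a + b = 9 with 3 <= a <= 5 and 4 <= b <= 6: coefficient of x^9
pair_count = poly_mul(range_poly(3, 5), range_poly(4, 6))[9]

# 8 CPUs over 3 servers, 2 to 4 CPUs each: coefficient of x^8
server = range_poly(2, 4)
cpu_count = poly_mul(poly_mul(server, server), server)[8]
```

The inner double loop in `poly_mul` is exactly the product formula $\sum_{j=0}^{k} a_j b_{k-j}$ from the theorem on sums and products of generating functions.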
151

Note: Generating functions are useful in solving recurrence relations, too.

Example: $a_k = 3a_{k-1}$, k > 0, with $a_0 = 2$. Let $f(x) = \sum_{k=0}^{\infty} a_k x^k$ be the generating function for $\{a_k\}$. Then $x f(x) = \sum_{k=1}^{\infty} a_{k-1} x^k$. Using the recurrence relation directly, we have

$f(x) - 3x f(x) = \sum_{k=0}^{\infty} a_k x^k - 3\sum_{k=1}^{\infty} a_{k-1} x^k = a_0 + \sum_{k=1}^{\infty} (a_k - 3a_{k-1}) x^k = a_0 = 2$.

Hence, $f(x) - 3x f(x) = (1-3x) f(x) = 2$ or $f(x) = 2 / (1-3x)$. Using the identity for $(1-ax)^{-1}$, we see that

$f(x) = \sum_{k=0}^{\infty} 2 \cdot 3^k x^k$ or $a_k = 2 \cdot 3^k$.

152

Example: $a_n = 8a_{n-1} + 10^{n-1}$ with $a_0 = 1$, which gives us $a_1 = 9$. Find $a_n$ in closed form.

First multiply the recurrence relation by $x^n$ to give us $a_n x^n = 8a_{n-1} x^n + 10^{n-1} x^n$. If $f(x) = \sum_{k=0}^{\infty} a_k x^k$, then

$f(x) - 1 = \sum_{k=1}^{\infty} a_k x^k = \sum_{k=1}^{\infty} (8a_{k-1} x^k + 10^{k-1} x^k) = 8x f(x) + \dfrac{x}{1-10x}$.

Hence,

$f(x) = \dfrac{1-9x}{(1-8x)(1-10x)} = \dfrac{1}{2}\left(\dfrac{1}{1-8x} + \dfrac{1}{1-10x}\right) = \sum_{k=0}^{\infty} \dfrac{1}{2}(8^k + 10^k) x^k$

or $a_n = 0.5(8^n + 10^n)$.

153

Note: It is possible to prove many identities using generating functions.

Inclusion-Exclusion Theorem: Given sets $A_i$, $1 \le i \le n$, the number of elements in the union is

$\left|\bigcup_{i=1}^{n} A_i\right| = \sum_{i=1}^{n} |A_i| - \sum_{1 \le i < j \le n} |A_i \cap A_j| + \sum_{1 \le i < j < k \le n} |A_i \cap A_j \cap A_k| - \cdots + (-1)^{n+1} \left|\bigcap_{i=1}^{n} A_i\right|$

and there are $2^n - 1$ terms in the formula.

Note: Venn diagrams motivate the above theorem.

154

Example: A factory produces vehicles that are car or truck based: 2000 could be cars, 4000 could be trucks, and 3200 are SUV's, which can be car or truck based (depending on the frames). How many vehicles were produced?

Let $A_1$ be the set of vehicles that could be cars and $A_2$ be the set of vehicles that could be trucks. There are

$|A_1 \cup A_2| = |A_1| + |A_2| - |A_1 \cap A_2| = 2000 + 4000 - 3200 = 2800$.

Theorem: The number of onto functions from a set of m elements to a set of n elements with $m, n \in \mathbb{N}$ is

$n^m - C(n,1)(n-1)^m + C(n,2)(n-2)^m - \cdots + (-1)^{n-1} C(n,n-1) \cdot 1^m$.

155

Definition: A derangement is a permutation of objects such that no object is in its original position.

Theorem: The number of derangements of a set of n elements is

$D_n = n! \left[1 + \sum_{k=1}^{n} (-1)^k \dfrac{1}{k!}\right]$.

Example: I hand back graded exams randomly. What is the probability that no student gets his or her own exam? It is $P_n = D_n / n!$
since there are n! possible permutations. As $n \to \infty$, $P_n \to e^{-1}$.

156

Relations

Definition: A relation on a set A is a subset of A×A.

Definition: A binary relation between two sets A and B is a subset of A×B. It is a set R of ordered pairs, denoted aRb when $(a,b) \in R$ and $a \not R\, b$ when $(a,b) \notin R$.

Definition: An n-ary relation on n sets $A_1, \ldots, A_n$ is a subset of $A_1 \times \cdots \times A_n$. Each $A_i$ is a domain of the relation and n is the degree of the relation.

Examples:
• Let f: A→B be a function. Then the ordered pairs (a, f(a)), $a \in A$, form a binary relation.
• Let A = {Springfield} and B = {U.S. state | Springfield in the state}. Then the pairs (Springfield, s), $s \in B$, form a relation with about 44 elements (the so-called Simpsons relation).

Theorem: Let A be a set with n elements. There are $2^{n^2}$ unique relations on A.

157

Proof: We know there are $n^2$ elements in A×A and that there are $2^m$ possible subsets of a set with m elements. Hence, the result.

Definitions: Consider a relation R on a set A. Then
• R is reflexive if $(a,a) \in R$, $\forall a \in A$.
• R is symmetric if $(a,b) \in R$ implies $(b,a) \in R$, $\forall a,b \in A$.
• R is antisymmetric if $(a,b) \in R$ and $(b,a) \in R$ imply a = b, $\forall a,b \in A$.
• R is transitive if $(a,b) \in R$ and $(b,c) \in R$ imply $(a,c) \in R$, $\forall a,b,c \in A$.

Theorem: Let A be a set with n elements. There are $2^{n(n-1)}$ unique reflexive relations on A.

Proof: Each of the n pairs (a,a) must be in R. The remaining n(n−1) pairs may or may not be in R. The product rule and previous theorem give the result.

158

Examples: Let A = {1, 2, 3, 4}.
• R1 = {(1,1), (1,2), (2,1), (2,2), (3,4), (4,1), (4,4)} is
  o just a relation
• R2 = {(1,1), (1,2), (2,1)} is
  o symmetric
• R3 = {(1,1), (1,2), (1,4), (2,1), (2,2), (3,3), (4,1), (4,4)} is
  o reflexive and symmetric
• R4 = {(2,1), (3,1), (3,2), (4,1), (4,2), (4,3)} is
  o antisymmetric and transitive
• R5 = {(1,1), (1,2), (1,3), (1,4), (2,1), (2,2), (2,3), (2,4), (3,3), (3,4), (4,1), (4,4)} is
  o reflexive (it is not antisymmetric since (1,2) and (2,1) are both present, and not transitive since (4,1) and (1,2) are present but (4,2) is not)
• R6 = {(3,4)} is
  o antisymmetric

Note: We will come back to these examples when we get around to representations of relations that work in a computer.
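The four properties defined above are easy to test mechanically when a relation is stored as a set of ordered pairs. The sketch below is only an illustration (the function names are chosen here); it checks some of the example relations on A = {1, 2, 3, 4} and verifies the two counting theorems by enumeration for a 2-element set.

```python
def is_reflexive(R, A):
    return all((a, a) in R for a in A)

def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)

def is_antisymmetric(R):
    return all(a == b or (b, a) not in R for (a, b) in R)

def is_transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

A = {1, 2, 3, 4}
R2 = {(1, 1), (1, 2), (2, 1)}
R3 = {(1, 1), (1, 2), (1, 4), (2, 1), (2, 2), (3, 3), (4, 1), (4, 4)}
R4 = {(2, 1), (3, 1), (3, 2), (4, 1), (4, 2), (4, 3)}
R6 = {(3, 4)}

# Enumerate every relation on a 2-element set as a subset of B x B.
B = [0, 1]
pairs = [(a, b) for a in B for b in B]
relations = [{p for i, p in enumerate(pairs) if mask >> i & 1}
             for mask in range(2 ** len(pairs))]
reflexive_count = sum(is_reflexive(R, B) for R in relations)
```

For n = 2 the enumeration gives $2^{n^2} = 16$ relations, of which $2^{n(n-1)} = 4$ are reflexive.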
159

Note: We can combine two or more relations to get another relation. We use standard set operations (e.g., ∪, ∩, −, ⊕, …).

Definition: Let R be a relation from a set A to a set B and S a relation from B to a set C. Then the composite of R and S is the relation $S \circ R$ such that if $(a,b) \in R$ and $(b,c) \in S$, then $(a,c) \in S \circ R$, where $a \in A$, $b \in B$, and $c \in C$.

Definition: Let R be a relation on a set A. Then $R^n$ is defined recursively: $R^1 = R$ and $R^n = R^{n-1} \circ R$, n > 1.

Theorem: The relation R is transitive if and only if $R^n \subseteq R$, $n \ge 1$.

160

Representation: The relation R from a set A to a set B can be represented by a zero-one matrix $M_R = [m_{ij}]$, where

$m_{ij} = \begin{cases} 1 & \text{if } (a_i, b_j) \in R, \\ 0 & \text{if } (a_i, b_j) \notin R. \end{cases}$

Notes:
• This is particularly useful on computers, particularly ones with hardware bit operations for packed words.
• $M_R$ contains I for reflexive relations.
• $M_R = M_R^T$ for symmetric relations.
• $m_{ij} = 0$ or $m_{ji} = 0$ when $i \neq j$ for antisymmetric relations.

161

Examples:

$M_R = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}$ is reflexive and symmetric (it is not transitive since $m_{12} = m_{23} = 1$ but $m_{13} = 0$).

$M_R = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$ is antisymmetric.

162

Representation: A relation can be represented as a directed graph (or digraph). For $(a,b) \in R$, a and b are vertices (or nodes) in the graph and a directional edge runs from a to b.

Example: The following digraph represents {(a,b), (b,c), (c,a), (c,b)}.

[Figure: digraph on vertices a, b, c with edges a→b, b→c, c→a, and c→b]

What about all of those examples on page 159 of the class notes? We can do all of them over in either representation.

163

Examples (from page 159):

$M_{R_1} = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{pmatrix}$

$M_{R_2} = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$ or a digraph on the vertices $a_1$ and $a_2$ [Figure: loop at $a_1$ and edges $a_1$→$a_2$, $a_2$→$a_1$]

$M_{R_3} = \begin{pmatrix} 1 & 1 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}$

164

$M_{R_4} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \end{pmatrix}$

$M_{R_5} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 \end{pmatrix}$

$M_{R_6} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$ or the digraph [Figure: single edge $a_3$→$a_4$]

165

Definition: A relation on a set A is an equivalence relation if it is reflexive, symmetric, and transitive. Two elements a and b that are related by an equivalence relation are called equivalent, denoted a~b.

Examples:
• Let A = Z. Define aRb if and only if either a = b or a = −b.
  o reflexive: aRa since a = a.
  o symmetric: aRb ⇒ bRa since a = ±b implies b = ±a.
  o transitive: aRb and bRc ⇒ aRc since a = ±b = ±c.
• Let A = R. Define aRb if and only if $a - b \in \mathbb{Z}$.
  o reflexive: aRa since $a - a = 0 \in \mathbb{Z}$.
  o symmetric: aRb ⇒ bRa since $a - b \in \mathbb{Z}$ ⇒ $-(a-b) = b - a \in \mathbb{Z}$.
  o transitive: aRb and bRc ⇒ aRc since $(a-b) + (b-c) \in \mathbb{Z}$ ⇒ $a - c \in \mathbb{Z}$.

166

Definition: Let R be an equivalence relation on a set A. The set of all elements that are related to an element $a \in A$ is called the equivalence class of a and is denoted by $[a]_R$. When R is obvious, it is just [a]. If $b \in [a]_R$, b is called a representative of this equivalence class.

Example: Let A = Z. Define aRb if and only if either a = b or a = −b. There are two cases for the equivalence class:
• [0] = {0}
• [a] = {a, −a} if a ≠ 0.

167

Theorem: Let R be an equivalence relation on a set A. For $a, b \in A$, the following are equivalent:
1. aRb
2. [a] = [b]
3. $[a] \cap [b] \neq \emptyset$.

Proof: 1 ⇒ 2 ⇒ 3 ⇒ 1.
• 1 ⇒ 2: Assume aRb. Suppose $c \in [a]$. Then aRc. Due to symmetry, we know that bRa. Knowing that bRa and aRc, by transitivity, bRc. Hence, $c \in [b]$. A similar argument shows that if $c \in [b]$, then $c \in [a]$. Hence, [a] = [b].
• 2 ⇒ 3: Assume that [a] = [b]. Since $a \in A$ and R is reflexive, $a \in [a]$, so $[a] \cap [b] = [a] \neq \emptyset$.
• 3 ⇒ 1: Assume $[a] \cap [b] \neq \emptyset$. So there is a $c \in [a]$ with $c \in [b]$, too. So, aRc and bRc. By symmetry, cRb. By transitivity, aRc and cRb, so aRb.

Lemma: For any equivalence relation R on a set A, $\bigcup_{a \in A} [a]_R = A$.

Proof: For all $a \in A$, $a \in [a]_R$.

168

Definition: A partition of a set S is a collection of disjoint nonempty sets whose union is S.

Theorem: Let R be an equivalence relation on a set S. Then the equivalence classes of R form a partition of S. Conversely, given a partition $\{A_i \mid i \in I\}$ of the set S, there is an equivalence relation R that has the sets $A_i$, $i \in I$, as its equivalence classes.

169

Graphs

Definition: A graph G = (V,E) consists of a nonempty set of vertices V and a set of edges E. Each edge has either one or two vertices as endpoints. An edge connects its endpoints.

Note: We will only study finite graphs (|V| < ∞).
Categorizations:
• A simple graph has edges that each connect two different vertices, and no two edges connect the same pair of vertices.
• A multigraph has multiple edges connecting the same vertices.
• A loop is an edge from a vertex back to itself.
• A pseudograph is a graph that may include loops and multiple edges.
• An undirected graph is a graph in which the edges do not have direction.
• A mixed graph has both directed and undirected edges.

170

Definition: Two vertices u and v in an undirected graph G are adjacent (or neighbors) in G if u and v are endpoints of an edge e in G. Edge e is incident to {u,v} and e connects u and v.

Definition: The degree of a vertex v, denoted deg(v), in an undirected graph is the number of edges incident with it, except that loops contribute twice to the degree of that vertex. If deg(v) = 0, then v is isolated. If deg(v) = 1, then v is a pendant.

Handshaking Theorem: If G = (V,E) is an undirected graph with e edges, then

$e = \dfrac{1}{2} \sum_{v \in V} \deg(v)$.

Proof: Each edge contributes 2 to the sum since it is incident to 2 vertices (or twice to one vertex if it is a loop).

Example: Let G = (V,E). Suppose |V| = 100,000 and deg(v) = 4 for all $v \in V$. Then there are (4·100,000)/2 = 200,000 edges.

171

Theorem: An undirected graph has an even number of vertices of odd degree.

Definition: Let $(u,v) \in E$ in a directed graph G(V,E). Then u and v are the initial and terminal vertices of (u,v), respectively. The initial and terminal vertices of a loop (u,u) are both u.

Definition: The in-degree of a vertex, denoted $\deg^-(v)$, is the number of edges with v as their terminal vertex. The out-degree of a vertex, denoted $\deg^+(v)$, is the number of edges with v as their initial vertex.

Theorem: For a directed graph G(V,E), $\sum_{v \in V} \deg^-(v) = \sum_{v \in V} \deg^+(v) = |E|$.

172

Examples of Simple Graphs:
• A complete graph $K_n$ has exactly one edge between each pair of distinct vertices.
• A cycle $C_n$ is a graph with $|V| = n \ge 3$ such that the n edges are {v1,v2}, {v2,v3}, …, {vn,v1}.
• A wheel $W_n$ is a cycle $C_n$ with an extra vertex with an edge connecting to each vertex in $C_n$.

173

Definition: A simple graph G = (V,E) is bipartite if $V = V_1 \cup V_2$ with $V_1 \cap V_2 = \emptyset$ and every edge in the graph connects a vertex in $V_1$ to a vertex in $V_2$. The pair $(V_1, V_2)$ is a bipartition of V in G.

Theorem: A simple graph is bipartite if and only if it is possible to assign one of two colors to each vertex of the graph so that no two adjacent vertices are assigned the same color.

Definition: The union of two simple graphs G = (V,E) and H = (W,F) is the simple graph $G \cup H = (V \cup W, E \cup F)$.

174

Representation: For graphs without multiple edges we can use adjacency lists or matrices. For general graphs we can use incidence matrices.

Definition: Let G(V,E) have no multiple edges. The adjacency list is $L_G = \{a_v\}_{v \in V}$, where $a_v = \mathrm{adj}(v) = \{w \in V \mid w \text{ is adjacent to } v\}$.

Definition: Let G(V,E) have no multiple edges. The adjacency matrix $A_G = [a_{ij}]$ is

$a_{ij} = \begin{cases} 1 & \text{if } \{v_i, v_j\} \text{ is an edge of } G, \\ 0 & \text{otherwise.} \end{cases}$

Example: The graph with vertices v1, v2, v3, v4 and edges {v1,v2}, {v1,v3}, {v2,v4}, {v3,v4} results in

$A_G = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}$ and $L_G$ =
v1: v2, v3
v2: v1, v4
v3: v1, v4
v4: v2, v3.

175

Note: For an undirected graph, $A_G = A_G^T$. However, this is not necessarily true for a directed graph.

Definition: The incidence matrix $M = [m_{ij}]$ for G(V,E) is

$m_{ij} = \begin{cases} 1 & \text{when edge } e_i \text{ is incident with } v_j, \\ 0 & \text{otherwise.} \end{cases}$

Definition: The simple graphs G(V,E) and H = (W,F) are isomorphic if there is an isomorphism f: V→W, a one to one, onto function, such that a and b are adjacent in G if and only if f(a) and f(b) are adjacent in H for all $a, b \in V$.

176

Examples:
[Figure: two graphs on vertices v1, v2, v3, v4 that are not isomorphic]
[Figure: two graphs on vertices v1, v2, v3, v4 that are isomorphic]

Note: Isomorphic simple graphs have the same number of vertices and edges.

Definition: A property preserved by graph isomorphism is called a graph invariant.

Note: Determining whether or not two graphs are isomorphic has exponential worst case complexity, but linear average case complexity using the best algorithms known.
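The adjacency matrix and the definition of isomorphism above can be turned into a brute-force test that tries every one to one, onto map between the vertex sets. This is a minimal sketch with exponential cost, not one of the best known algorithms referred to in the note; the function names and the 0-based vertex numbering are choices made here.

```python
from itertools import permutations

def adjacency_matrix(n, edges):
    """Adjacency matrix of an undirected simple graph on vertices 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    return A

def isomorphic(n, edges_g, edges_h):
    """Try all n! bijections f and test adjacency preservation."""
    G = adjacency_matrix(n, edges_g)
    H = adjacency_matrix(n, edges_h)
    # Degree sequences are a graph invariant: a cheap early rejection.
    if sorted(map(sum, G)) != sorted(map(sum, H)):
        return False
    for f in permutations(range(n)):
        if all(G[i][j] == H[f[i]][f[j]] for i in range(n) for j in range(n)):
            return True
    return False

# The 4-vertex example from the notes (v1..v4 renumbered 0..3).
square = [(0, 1), (0, 2), (1, 3), (2, 3)]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]              # same shape, relabeled
triangle_tail = [(0, 1), (1, 2), (0, 2), (2, 3)]      # also 4 edges
```

Here `square` and `cycle` are isomorphic, while `triangle_tail` has the same number of vertices and edges but a different degree sequence, so equal counts alone do not imply isomorphism.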
177

Definition: Let G = (V,E) be an undirected graph and $n \in \mathbb{N}$. A path of length n for $u, v \in V$ is a sequence of edges $e_1, e_2, \ldots, e_n \in E$ with associated vertices in V of $u = x_0, x_1, \ldots, x_n = v$. A circuit is a path with u = v. A path or circuit is simple if all of the edges are distinct.

Notes:
• We already defined these terms for directed graphs.
• The terminal vertex of the first edge in a path is the initial vertex of the second edge.
• We can define a path using a recursive definition.

Definition: An undirected graph is connected if there is a path between every pair of distinct vertices in the graph.

178

Theorem: There is a simple path between every distinct pair of vertices of a connected undirected graph G = (V,E).

Proof: Let $u, v \in V$ such that $u \neq v$. Since G is connected, there is a path from u to v that has minimum length n. Suppose this path is not simple. Then in this minimum length path, there is some pair of vertices $x_i = x_j \in V$ for some $0 \le i < j \le n$. Deleting the edges between $x_i$ and $x_j$ gives a shorter path from u to v, which is a contradiction.

Definition: A connected component of a graph G is a connected subgraph of G that is not a proper subgraph of another connected subgraph of G.

Note: A connected component is a maximally connected subgraph.

179

Example: Telecoms analyze call graphs routinely in order to provide better, less expensive services. The old AT&T used to publish information routinely (typically by Bell Labs researchers). One of their recent published graphs G = (V,E) had |V| ~ 54,000,000 with |E| ~ 170,000,000. G had approximately 3,700,000 connected components. Most of the components were of size 2 or just slightly larger. However, one was of size approximately 45,000,000, with all of its vertices being connected by 20 or fewer calls.

Note: Sometimes removing a vertex v and all of the edges incident to v produces a subgraph with more connected components than the original graph. The vertex v is called a cut vertex or an articulation point.
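Connected components and cut vertices, as defined above, can be found on small graphs with a breadth-first search. The sketch below is an illustration under the assumption of an undirected graph given as a vertex list and an edge list; `components` and `cut_vertices` are names chosen here, and the cut vertex test simply removes each vertex in turn rather than using a faster specialized algorithm.

```python
from collections import deque

def components(vertices, edges):
    """Return the connected components as a list of vertex sets."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, result = set(), []
    for start in vertices:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:                       # breadth-first search from start
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        result.append(comp)
    return result

def cut_vertices(vertices, edges):
    """Vertices whose removal increases the number of connected components."""
    base = len(components(vertices, edges))
    cuts = []
    for v in vertices:
        rest = [u for u in vertices if u != v]
        kept = [(a, b) for a, b in edges if v not in (a, b)]
        if len(components(rest, kept)) > base:
            cuts.append(v)
    return cuts

V = [1, 2, 3, 4, 5, 6]
E = [(1, 2), (2, 3), (3, 1), (3, 4), (5, 6)]
parts = components(V, E)
cuts = cut_vertices(V, E)
```

In this small example the graph has two components, and only vertex 3 is a cut vertex: removing it separates vertex 4 from the triangle 1, 2, 3.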
Definition: A directed graph G = (V,E) is strongly connected if there are paths from both u to v and v to u for all distinct u,v ∈ V. G is weakly connected if there is a path between any two distinct vertices in the underlying undirected graph. The maximal strongly connected subgraphs of G are its strongly connected components.

Theorem: Let G = (V,E) be a graph with adjacency matrix A. The number of different paths of length n from vi to vj, where vi,vj ∈ V and n ∈ N, is the (i,j) entry in A^n.

Example: For the graph on v1, v2, v3, v4 with edges {v1,v2}, {v1,v3}, {v2,v4}, and {v3,v4},

       [ 0 1 1 0 ]              [ 8 0 0 8 ]
  A =  [ 1 0 0 1 ]  and  A^4 =  [ 0 8 8 0 ]
       [ 1 0 0 1 ]              [ 0 8 8 0 ]
       [ 0 1 1 0 ]              [ 8 0 0 8 ]

Note: The theorem can be used to find the length of the shortest path between any two vertices (the smallest n for which the (i,j) entry of A^n is nonzero) and also to determine if a graph is connected.

Definition: Let G = (V,E) have an associated weighting function w(u,v): V×V → R. G is called a weighted graph. The weighted length of a path in G is the sum of the weights for the edges in the path.

Example: Let G = (V,E) be a weighted graph where V represents airports. Then some interesting weighting functions between pairs of distinct airports include the following:
• Distance
• Flight times
• Airfares
• Frequent flier miles
• Frequent flier qualification miles

Note: Weighted graphs are extremely important in analyzing transportation of goods and people and trying to minimize time and expenses.

Dijkstra's Algorithm (Shortest Path) [published in 1959]

procedure Dijkstra( G = (V,E) with w: V×V → R+: weighted connected simple graph,
                    a,z ∈ V: initial and terminal vertices )
  for i := 1 to n
    L(vi) := ∞
  L(a) := 0
  S := ∅
  while z ∉ S
    u := a vertex not in S with L(u) minimal
    S := S ∪ {u}
    for all v ∈ V such that v ∉ S
      if L(u) + w(u,v) < L(v) then L(v) := L(u) + w(u,v)
  { L(z) = length of shortest path from a to z. }

Theorem: Dijkstra's algorithm finds the length of the shortest path between two vertices in a connected simple undirected weighted graph. The algorithm uses O(n²) comparison and addition operations.
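The Dijkstra procedure above translates almost line for line into Python. This is a minimal sketch under a few assumptions: the graph is a dictionary mapping each vertex to a dictionary of neighbor weights, the sample weights are made up for illustration, and the "vertex not in S with L(u) minimal" step is done with a priority queue (heapq) rather than a linear scan, so the operation count differs from the O(n²) bound stated in the theorem.

```python
import heapq
import math

def dijkstra(graph, a, z):
    """Length of a shortest path from a to z; weights must be positive."""
    L = {v: math.inf for v in graph}  # tentative labels L(v)
    L[a] = 0
    S = set()                         # vertices whose label is final
    heap = [(0, a)]
    while heap:
        dist, u = heapq.heappop(heap)
        if u in S:
            continue                  # stale heap entry, label already final
        S.add(u)
        if u == z:
            return dist
        for v, w in graph[u].items():
            if v not in S and dist + w < L[v]:
                L[v] = dist + w       # L(v) := L(u) + w(u,v)
                heapq.heappush(heap, (L[v], v))
    return L[z]                       # math.inf if z is unreachable

# Hypothetical weighted graph, symmetric because it is undirected.
graph = {
    "a": {"b": 4, "c": 2},
    "b": {"a": 4, "c": 1, "d": 5},
    "c": {"a": 2, "b": 1, "d": 8},
    "d": {"b": 5, "c": 8},
}
print(dijkstra(graph, "a", "d"))  # 8, via a - c - b - d
```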
Traveling Salesman Problem: Find the circuit of minimum total weight in a weighted complete undirected graph that visits every vertex exactly once and returns to its starting vertex.

Note: There are (n−1)!/2 distinct circuits to consider, which is intractable when n is sufficiently large. A tremendous amount of research has been devoted to finding fast approximate solution algorithms. The best ones can produce a circuit through 1,000 vertices in a few seconds and still be within 2% of the optimum circuit.

Definition: A coloring of a simple graph is the assignment of a color to each vertex of the graph so that no adjacent vertices are assigned the same color.

Definition: The chromatic number χ(G) is the least number of colors needed for a coloring of the graph G = (V,E).

Definition: A planar graph is a graph that can be drawn in a plane with no edges crossing in the picture.

Four Color Theorem: If G is a planar graph, then χ(G) ≤ 4.

Note: The Four Color Conjecture was made in the 1850's and not proven until 1976. Like Fermat's last theorem, this theorem became famous partly for how many wrong proofs (some quite ingenious) were either published or submitted for publication.

Trees

Definition: A tree is a connected undirected graph with no simple circuits. A weighted tree is a tree with weights associated with the edges.

Uses:
• An efficient data structure for searching a list.
  o Useful in encoding data for transmission.
  o Computational complexity easily determined for algorithms using trees.
• Weighted trees have edges with weights.
  o Useful in decision making.
  o Used by telecoms to dynamically connect calls cheaply.

Historical Note: Trees, in the sense used in this course, were first developed to describe molecules in chemistry, where atoms were the vertices and bonds were the edges.

Theorem: An undirected graph T = (V,E) is a tree if and only if there is a unique simple path between any two of its distinct vertices.

Proof: 1. Assume T is a tree, so it has no simple circuits.
Since T is connected, for all distinct u,v ∈ V there is a simple path between u and v. Suppose there were a second simple path between them. Combining the two simple paths would produce a circuit, contradicting the assumption that T is a tree. Hence the simple path is unique.
2. Assume that there is a unique simple path between any two distinct vertices u,v ∈ V. Then T is connected. T has no simple circuits, since a simple circuit would give two simple paths between some u and v on the circuit, which is a contradiction.

Definition: A rooted tree is a tree with one vertex designated as the root and every edge directed away from the root.

Note: Any tree can become a rooted tree by choosing any one of its vertices as the root.

Terminology/Definitions: Let T = (V,E) be a rooted tree. Then
• If v ∈ V is not the root, the parent w ∈ V of v is the vertex with an edge directed to v, and v is a child of w.
• If vi ∈ V are children of the same u ∈ V, they are siblings.
• The ancestors of u ∈ V, where u is not the root, are the vertices on the path from the root to u, excluding u itself and including the root.
• The descendants vi ∈ V of u ∈ V are all vertices with u as an ancestor.
• A leaf v ∈ V is a vertex with no children.
• An internal vertex v ∈ V has children.
• A subtree is the subgraph formed from a ∈ V and all of its descendants and the edges incident to these descendants.
• The level of a vertex is the length of the path from the root to it. The height of a rooted tree T, denoted h(T), is the maximum level of its vertices.
• A balanced rooted tree T has all of its leaves at levels h(T) or h(T)−1.

Definition: An m-ary tree is a rooted tree such that every internal vertex has no more than m children. A full m-ary tree is a rooted tree such that every internal vertex has exactly m children. If m = 2, it is a (full) binary tree.

Definition: An ordered rooted tree is a rooted tree with an ordering applied to the children of the root and of every internal vertex.

Examples:
• Management charts
• Directory based file or memory systems

Theorem: A tree with n vertices has n−1 edges. The proof is by mathematical induction.
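The rooted tree terminology above (levels, height, leaves, internal vertices, full and balanced trees) can be checked mechanically on a small example. The sketch below assumes the tree is stored as a dictionary from each vertex to its ordered list of children; the vertex names and the helper levels are hypothetical and chosen only for illustration.

```python
def levels(children, root):
    """Level of each vertex: length of the path from the root to it."""
    level = {root: 0}
    stack = [root]
    while stack:
        v = stack.pop()
        for c in children[v]:
            level[c] = level[v] + 1
            stack.append(c)
    return level

# Hypothetical rooted tree: r has children a and b, a has c and d, d has e.
children = {"r": ["a", "b"], "a": ["c", "d"], "b": [],
            "c": [], "d": ["e"], "e": []}

level = levels(children, "r")
height = max(level.values())                               # h(T)
leaves = sorted(v for v in children if not children[v])    # no children
internal = sorted(v for v in children if children[v])      # has children
n = len(children)                                          # number of vertices
edges = sum(len(c) for c in children.values())             # parent-child edges
m = max(len(c) for c in children.values())                 # smallest m making T m-ary
is_full = all(len(children[v]) == m for v in internal)
is_balanced = all(level[v] >= height - 1 for v in leaves)

print(height, leaves, internal, edges == n - 1, m, is_full, is_balanced)
```

In this example the tree is binary but neither full (d has only one child) nor balanced (leaf b is at level 1 while the height is 3), and the edge count agrees with the n−1 theorem.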
Theorem: A full m-ary tree with i internal vertices contains n = mi+1 vertices.

Proof: Each of the i internal vertices has m children, so there are mi children plus the root.

Theorem: A full m-ary tree with
• n vertices has i = (n−1)/m internal vertices and q = [(m−1)n+1]/m leaves.
• i internal vertices has n = mi+1 vertices and q = (m−1)i+1 leaves.
• q leaves has n = (mq−1)/(m−1) vertices and i = (q−1)/(m−1) internal vertices.

Theorem: There are at most m^h leaves in an m-ary tree of height h. The proof uses mathematical induction.

Corollary: If an m-ary tree of height h has q leaves, then h ≥ ⌈log_m q⌉. For a full and balanced m-ary tree, h = ⌈log_m q⌉.

Definition: A binary search tree T = (V,E) is a binary tree with a key for each vertex. The keys are ordered such that the key for a vertex is greater in value than all keys associated with its left subtree and less in value than all keys associated with its right subtree. The key for vertex v ∈ V is denoted by label(v).

Note: Recursive algorithms search a binary search tree of height h for a key in O(h) operations. For a balanced tree with n vertices this is O(log n).

Notation: Let T = (V,E) be a binary tree. Let root(T) be the root vertex in T. Let left_child(v) and right_child(v) refer to the left or right child of a root or internal vertex v in a binary tree. Let add_new_vertex(parent, value) add a new left or right vertex to the parent vertex with a key of value. The details are left intentionally fuzzy.

Note: One of the most common operations with a binary tree is to search it. Another is to search a binary tree for a key and add it if it is missing.

procedure insertion( T = (V,E): binary search tree, x: item )
  v := root(T)
  while v ≠ ∅ and label(v) ≠ x
    if x < label(v) then
      if left_child(v) ≠ ∅ then v := left_child(v)
      else add_new_vertex(left_child(v), x) and v := ∅
    else
      if right_child(v) ≠ ∅ then v := right_child(v)
      else add_new_vertex(right_child(v), x) and v := ∅
  if root(T) = ∅ then add_new_vertex(T, x)
  else if v = ∅ or label(v) ≠ x then label the new vertex x and set v := the new vertex
  { v = location of x. }

Definition: A decision tree is a rooted tree in which each internal vertex represents a decision and its children are the possible outcomes of that decision.

Note: There is usually a weighting associated with a decision tree. The keys may not be unique.

Definition: A prefix code is an encoding based on bit strings representing symbols such that a symbol, as a bit string, never occurs as the first part of another symbol's bit string.

Example: We can normally represent a-z in 5 bits and a-zA-Z in 6 bits. Suppose we only have 3 letters: a = 0, c = 10, and t = 11. Then cat = 10011. Wowee! We saved one whole bit!!!

Representation: Prefix codes form a binary tree.

Example: The prefix code for a = 0, c = 10, and t = 11 is stored as

        •
     0 / \ 1
      a   •
       0 / \ 1
        c   t

Definition: A Huffman coding takes the frequencies of the symbols and produces a prefix code that encodes the symbols, at those frequencies, in the smallest total number of bits.

Note: Huffman coding was a course project by a graduate student at MIT in the 1950's. Needless to say, his professor was stunned.

procedure Huffman( ai: symbols, wi: frequencies, 1 ≤ i ≤ n )
  F := forest of n rooted trees, each with a single vertex ai with weight wi
  while F is not a tree
    Replace the rooted trees T and T' of least weights from F, with w(T) ≥ w(T'),
      by a tree T'' having a new root that has T and T' as its left and right children.
    Label the edge to T as 0 and the edge to T' as 1.
    Assign w(T) + w(T') to the new tree T''
  { Huffman encoding tree is complete. }

Example: Given {(a,1), (c,2), (t,3)} as (symbol,frequency). What is the Huffman coding?
• Initial forest: (a,1) (c,2) (t,3)
• Step 1: (a,1) and (c,2) are merged into a tree of weight 3 whose edges are labeled 0 and 1; (t,3) remains.
• Step 2: The two remaining trees, each of weight 3, are merged into a single tree of weight 6 whose new root has edges labeled 0 and 1.

Note: Game trees are another highly studied tree.

Definition (Minimax Strategy): The value of a vertex in a game tree is defined recursively as:
1. The value of a leaf is the payoff to the first player when the game terminates in the position represented by this leaf.
2. The value of an internal vertex at an even level is the maximum of the values of its children.
3. The value of an internal vertex at an odd level is the minimum of the values of its children.

Theorem: The value of a vertex v of a game tree tells us the payoff to the first player if both players follow the Minimax strategy and play starts from the position represented by vertex v.

Notes: Game trees
• Are enormous (not just slightly, but really, really enormous)
• Lead to optimal solutions (if you can compute them)
• Are basically intractable using standard computers

Note: Tree traversal is extremely important to accessing data. There are many algorithms, each with a plus and a minus. We will study three traversal algorithms:
• Preorder
• Inorder
• Postorder
These traversal methods are used not only for data storage, but also for representing arithmetic in a way that is useful for compilers.

Definition: The universal addressing system is defined recursively for an ordered rooted tree T = (V,E). The root r ∈ V is labeled 0 and its k children are labeled 1, …, k. For each vertex v ∈ V, labeled Av, its n children are labeled Av.1, Av.2, …, Av.n.

Example: Given a tree T = (V,E) with keys ordered 0 < 1 < 1.1 < 2 < 2.1 < 2.2 < 2.2.1 < 2.3, we represent it as

              0
            /   \
           1     2
           |   / | \
         1.1 2.1 2.2 2.3
                  |
                2.2.1

We will use this example for quite some time.

Definition (Preorder Traversal): Let T be an ordered rooted tree with root r. If T consists only of r, then r is the preorder traversal of T. Otherwise, suppose T1, T2, …, Tn are the subtrees at r from left to right in T. Then the preorder traversal begins at r and continues by traversing T1 in preorder, T2 in preorder, …, and Tn in preorder.

Example: In the tree example above, the preorder traversal order is 0, 1, 1.1, 2, 2.1, 2.2, 2.2.1, and 2.3.

Definition (Inorder Traversal): Let T be an ordered rooted tree with root r. If T consists only of r, then r is the inorder traversal of T. Otherwise, suppose T1, T2, …, Tn are the subtrees at r from left to right in T.
Then the inorder traversal begins by traversing T1 in inorder, then visits r, and continues with T2 in inorder, …, and Tn in inorder.

Example: In the tree example above, the inorder traversal order is 1.1, 1, 0, 2.1, 2, 2.2.1, 2.2, and 2.3.

Definition (Postorder Traversal): Let T be an ordered rooted tree with root r. If T consists only of r, then r is the postorder traversal of T. Otherwise, suppose T1, T2, …, Tn are the subtrees at r from left to right in T. Then the postorder traversal begins by traversing T1 in postorder, T2 in postorder, …, Tn in postorder, and ends by visiting r.

Example: In the tree example above, the postorder traversal order is 1.1, 1, 2.1, 2.2.1, 2.2, 2.3, 2, and 0.

Notation: Let add_to_list(v) be a global function to append a vertex v to a list. The list must be initialized to ∅ at some point before use.

Note: The tree traversal algorithms are all easily defined recursively using a global list that must be initialized first.

procedure preorder_traversal( T: ordered rooted tree )
  r := root(T)
  add_to_list(r)
  for each child c of r from left to right
    T(c) := subtree with c as its root
    preorder_traversal( T(c) )

procedure inorder_traversal( T: ordered rooted tree )
  r := root(T)
  if r = leaf then add_to_list(r)
  else
    q := first child of r from left to right
    T(q) := subtree with q as its root
    inorder_traversal( T(q) )
    add_to_list(r)
    for each remaining child c of r from left to right
      T(c) := subtree with c as its root
      inorder_traversal( T(c) )

procedure postorder_traversal( T: ordered rooted tree )
  r := root(T)
  for each child c of r from left to right
    T(c) := subtree with c as its root
    postorder_traversal( T(c) )
  add_to_list(r)

Definition: Logic and arithmetic expressions can be rewritten using binary trees. Reading the binary tree in inorder, preorder, or postorder gives infix, prefix, or postfix notation, respectively.

Note: The best known is postfix notation, otherwise known as reverse Polish notation (RPN).
This was used in the first pocket sized scientific calculator, the HP-35 (1972). This notation is valuable in writing compilers, too. See
  http://glow.sourceforge.net/tutorial/lesson7/side_rpn.html
  http://www.hpmuseum.org/rpn.htm

Examples: Parentheses disappear completely. It is best to think of an RPN calculator as a stack machine where data is in the stack and arithmetic operates on the top elements of the stack.
• The expression 2+3 is written as 2 3 + in RPN.
• The expression [(9+3) * (4/2)] - [(3*x) + (2-y)] is written as 9 3 + 4 2 / * 3 x * 2 y - + - in RPN, where x and y are numbers.

Tree representation: Labels are the operations on internal vertices or the root, and values (constants or simple variables) on the leaves.

Example: 4 * 3 + 2 in RPN is 4 3 * 2 +, or as a tree

        +
       / \
      *   2
     / \
    4   3

Definition: Let G = (V,E) be a simple graph. A spanning tree of G is a subgraph of G that is a tree containing every vertex in G.

Example: Your instructor wants his town, the states of Connecticut and New York, and New York City to keep the roads and highways connecting his house and LaGuardia airport cleared of ice and snow. A graph connecting each of the relevant endpoints and connecting points can be made. The relevant agencies can use this graph when deciding how to keep roads open after a storm.

[Figure: three drawings of a graph on the vertices G, PC, RB, S, WB, and LGA.]

Theorem: A simple graph G is connected if and only if it has a spanning tree T.

Example: Multicasting over networks.

Note: Constructing a spanning tree can be done in many different ways, including some very inefficient ones. Two common ways are depth first and breadth first searches.

Notation: Let visit(v) mean that we keep track of when we first go to vertex v until we return to v using a backtrack.
procedure visit( v: vertex of the connected graph G = (V,E), T: tree )
  for each w ∈ V adjacent to v and not yet in T
    add w and edge {v,w} to T
    visit( w, T )

procedure depth_first( G = (V,E): connected graph )
  T := tree with only some single v ∈ V
  visit( v, T )
  { T is a spanning tree. }

procedure breadth_first( G = (V,E): connected graph )
  T := tree with only some single v ∈ V
  L := v
  while L ≠ ∅
    Remove first vertex v ∈ L
    for each neighbor w ∈ V of v
      if w ∉ L and w ∉ T then
        Add w to the end of L
        Add w and edge {v,w} to T
  { T is a spanning tree. }

Theorem: Let G = (V,E) be a connected graph with |V| = n and |E| = e. Then either depth first or breadth first takes O(e), or O(n²), steps to construct a spanning tree.

Proof: For a simple graph, |E| ≤ n(n−1)/2.

Backtracking applications:
• Graph coloring: can a graph be colored in n colors?
• n-Queens problem: find places on an n×n board so that the n queens are toothless (no queen can attack another)
• Sums of subsets: given {xi}, i = 1, …, n, where xi ∈ N, find a subset whose sum is M
• Web crawlers: search all hyperlinks on a network efficiently

Definition: A minimum spanning tree in a connected weighted graph is a spanning tree that has the smallest possible sum of weights on its edges.

procedure Prim( G = (V,E): weighted connected undirected graph )
  T := minimum weighted edge
  for i := 1 to |V|−2
    e := an edge of minimum weight incident to a vertex in T
         and not forming a simple circuit in T if it is added to T
    T := T with e added
  { T is a minimum spanning tree. }

procedure Kruskal( G = (V,E): weighted connected undirected graph )
  T := empty graph
  for i := 1 to |V|−1
    e := an edge in G of minimum weight that does not form
         a simple circuit in T if it is added to T
    T := T with e added
  { T is a minimum spanning tree. }

Theorem: The cost of Prim's algorithm is O(|E| log|V|). The cost of Kruskal's algorithm is O(|E| log|E|).

Definition: A graph G = (V,E) is sparse if |E| is very small with respect to |V|².

Comment: Sparse is ill defined intentionally.
There are different degrees of sparseness, too (highly sparse, very sparse, somewhat sparse, hardly sparse, not sparse, and the Scottish favorite, a wee bit sparse). Matrices can also be categorized as (fill in the blank type) sparse based on their graphs.

Note: When G is sparse, Kruskal's algorithm is much less expensive than Prim's algorithm.

Boolean Algebra

Definition: Let B = { 0, 1 } and B^n = B×B×…×B (n times). A Boolean variable is an x ∈ B. A Boolean function of degree n is a function f: B^n → B.

Notation: For x,y ∈ B, define
  x + y = x∨y
  xy = x∧y
  x̄ = ¬x
using the logic predicate notation from the class notes (circa pages 5-6).

Definition: A Boolean algebra is a set B with binary operators ∨ and ∧, the unary operator ¬, elements 0 and 1, and the following laws holding for all elements of B: identity, complement, associative, commutative, and distributive.

Logic gates: Boolean algebra is used to model electronic logic gates, such as AND, OR, NOT, XAND, XOR, … We design functions with Boolean algebras and operators. Then we build them using the right gates and wiring patterns. Typical symbols for AND, OR, and NOT are the following:

[Figure: gate symbols for AND, OR, and NOT.]

These are two input AND and OR gates. Versions of these gates exist for more than two inputs and perform the expected operation on all of the inputs to get one output.

Definition: A simple output circuit takes the input(s) and has one output. A multiple output circuit takes input(s) and has multiple outputs.

Example: The gates above are simple output circuits.

Examples: Most circuits are of the multiple output variety.
• A half adder adds two bits, producing a single bit sum plus a single bit carry: S := (x∨y) ∧ (¬(x∧y)) = x⊕y and Cout := x∧y. A half adder has two AND, one OR, and one NOT gates.
• A full adder computes the complete two bit sum and carry out: S := (x⊕y)⊕Cin, where Cin is the incoming carry. The carry is quite complicated: Cout := (xy) + (yCin) + (Cin x). A full adder has two half adders and an OR gate.
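The half adder and full adder formulas above can be checked exhaustively, since there are only eight input combinations. The sketch below models bits as the Python integers 0 and 1 and builds the full adder from two half adders and an OR, as described; the function names are assumptions made for illustration.

```python
def half_adder(x, y):
    """Return (S, Cout) for two input bits."""
    s = (x | y) & (1 - (x & y))  # (x OR y) AND NOT (x AND y), i.e. x XOR y
    c_out = x & y
    return s, c_out

def full_adder(x, y, c_in):
    """Two half adders and one OR gate."""
    s1, c1 = half_adder(x, y)
    s, c2 = half_adder(s1, c_in)
    return s, c1 | c2

# Check every input against ordinary integer addition and the carry formula.
for x in (0, 1):
    for y in (0, 1):
        for c_in in (0, 1):
            s, c_out = full_adder(x, y, c_in)
            assert 2 * c_out + s == x + y + c_in
            assert c_out == (x & y) | (y & c_in) | (c_in & x)
print("full adder agrees with integer addition on all 8 inputs")
```

The second assertion confirms that the carry produced by the two half adders equals the sum of products formula for Cout given above.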
• Ripple adders, lookahead adders, and lookahead carry circuits use many bits as input to implement integer adders.

[Figure: circuit diagrams of a half adder and a full adder.]

Note: Minimizing the Boolean algebra function means a less complicated circuit. Simpler circuits are cheaper to make, take up less space, and are usually faster. Add in how many devices are made and there is potentially a lot of money involved in saving even a small amount of circuitry.

There are two basic methods for simplifying Boolean algebra functions:
• Karnaugh maps (or K-maps) provide a graphical or table driven technique that works up to about 6 variables before it becomes too complicated.
• The Quine-McCluskey algorithm works with any number of variables.
Going to Google and searching on Karnaugh map software leads to a number of programs to do some of the work for you.

Definition: A literal of a Boolean variable is its value or its complement. A minterm of Boolean variables x1, x2, …, xn is a Boolean product y1 y2 … yn in which each yi is either xi or x̄i.

Note: A minterm is just the product of n literals.

Karnaugh maps: The area of a K-map rectangle is determined by the number of variables (n) and how many (k) are used in a Boolean expression: 2^(n−k). Common arrangements are 2 variables: 2×2, 3 variables: 4×2, and 4 variables: 4×4. Each variable contributes two possibilities to each possibility of every other variable in the system. K-maps are organized so that all the possibilities of the system are arranged in a grid form and between two adjacent boxes only one variable can change value. Each square in a K-map corresponds to a minterm.

Cover the ones on the map by rectangles that contain a number of boxes equal to a power of 2 (e.g., 4 boxes in a line, 4 boxes in a square, 8 boxes in a rectangle, etc.). Once the ones are covered, a term of a sum of products is produced by finding the variables that do not change throughout the entire covering, and taking a 1 to mean that variable and a 0 as the complement of that variable.
Doing this for every covering produces a matching function.

Given a Boolean function f with inputs x1, …, xn, make a table with all possible inputs and outputs. Then create a K-map with the variables on the left and top sides of the rectangle. Look for 1's. The rectangle is a torus, so look for wrap arounds, too.

Example: f: B^4 → B with a corresponding K-map of

                    x1, x2
                00  01  11  10
           00    0   0   1   1
  x3, x4   01    0   0   1   1
           11    0   0   0   1
           10    0   1   1   1

The K-map is colored to try to find patterns in the Boolean expression that can be simplified. It is quite common to eliminate some of the Boolean variables using this approach.

Use high quality software if you use the K-map approach.

Definition: An implicant is a sum term or product term of one or more minterms in a sum of products. A prime implicant of a function is an implicant that cannot be covered by a more reduced (i.e., one with fewer literals) implicant.

Note: Suppose f is a Boolean function and P is a product term. Then P is an implicant of f if f takes the value 1 whenever P takes the value 1. This is sometimes written as P ≤ f in the natural ordering of the Boolean algebra.

Quine-McCluskey: This algorithm has two steps:
1. Find all prime implicants of the function.
2. Use those prime implicants in a prime implicant chart to find the essential prime implicants of the function as well as other prime implicants that are necessary to cover the function.
The algorithm constructs a table and then simplifies the table. The method leads to computer implementations for large numbers of variables.

Use high quality software if you use the Quine-McCluskey approach.
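Step 1 of Quine-McCluskey (finding the prime implicants) is short enough to sketch. The code below is a teaching sketch under stated assumptions, not production quality software: implicants are strings over 0, 1, and "-" (a dash marks an eliminated variable), the bit order is x1 x2 x3 x4, and the minterm list is read off the K-map example above with x1, x2 across the top and x3, x4 down the side. The function names are hypothetical, and step 2 (the prime implicant chart) is not implemented.

```python
from itertools import combinations

def combine(a, b):
    """Merge two implicants that differ in exactly one fixed bit, else None."""
    diff = [i for i, (p, q) in enumerate(zip(a, b)) if p != q]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None

def prime_implicants(minterms, n):
    """Repeatedly merge implicants; whatever never merges is prime."""
    current = {format(m, f"0{n}b") for m in minterms}
    primes = set()
    while current:
        merged, used = set(), set()
        for a, b in combinations(sorted(current), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= current - used
        current = merged
    return primes

def covers(implicant, minterm, n):
    """True when the implicant takes the value 1 on the given minterm."""
    bits = format(minterm, f"0{n}b")
    return all(p in ("-", b) for p, b in zip(implicant, bits))

# Ones of the K-map example, as integers with bit order x1 x2 x3 x4.
ones = [6, 8, 9, 10, 11, 12, 13, 14]
primes = prime_implicants(ones, 4)
print(sorted(primes))  # ['-110', '1--0', '1-0-', '10--']
```

Reading each string back as a product term gives x2 x3 x̄4, x1 x̄4, x1 x̄3, and x1 x̄2. In this particular example every prime implicant is the only one covering some 1 (for instance 0110 is covered only by -110), so the sum of all four is a minimal cover; in general step 2 is still needed to choose among the prime implicants.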