CS C341 / IS C361 Data Structures & Algorithms
ALGORITHM DESIGN TECHNIQUES
Sundar B., CSIS, BITS, Pilani
9/28/2010

Dynamic Programming
  - Examples
    - Matrix-Chain Multiplication
    - Sequence Alignment

EXAMPLE – MATRIX-CHAIN MULTIPLICATION
Consider the following expression:
  M_1 * M_2 * M_3
where M_j is a matrix of dimensions p_{j-1} * p_j for j = 1 to 3.
Matrix multiplication is associative, i.e.
  (M_1 * M_2) * M_3 = M_1 * (M_2 * M_3)
Exercise: Prove this. Hint: Use induction.
Then the above expression can be evaluated either as
  (M_1 * M_2) * M_3 using (p_0 * p_1 * p_2) + (p_0 * p_2 * p_3) scalar multiplications
or as
  M_1 * (M_2 * M_3) using (p_1 * p_2 * p_3) + (p_0 * p_1 * p_3) scalar multiplications

EXAMPLE – MATRIX-CHAIN MULTIPLICATION
Consider the following generalized expression:
  M_1 * M_2 * … * M_n
where M_j is a matrix of dimensions p_{j-1} * p_j for j = 1 to n.
Problem: Given the above expression, how do we minimize the number of scalar multiplications?
This depends on the way the expression is parenthesized (which determines the order of evaluation).
Definition: Given a chain (M_1 * M_2 * … * M_n) of n matrices, where for j = 1 to n, M_j is a matrix of dimensions p_{j-1} * p_j, find the optimal parenthesization, i.e. the parenthesization resulting in the minimal number of scalar multiplications.

EXAMPLE – MCM - BRUTE FORCE SOLUTION
Algorithm BF_MCM:
  Find all possible parenthesizations.
  For each possible parenthesization, count the scalar multiplications required.
  Find the minimum among these counts.
Time Complexity: O(Par(n)) where Par(n) is the number of possible parenthesizations.
Each parenthesization is a parse tree:

  M_1 * (M_2 * M_3)        (M_1 * M_2) * M_3

        *                         *
       / \                       / \
     M_1  *                     *   M_3
         / \                   / \
       M_2  M_3              M_1  M_2

EXAMPLE – MCM - BRUTE FORCE SOLUTION
What is Par(n)?
A chain of n matrices can be split between the kth and (k+1)st
matrices for any k = 1, 2, …, n-1; then the sub-chains can be parenthesized independently.
Thus
  Par(n) = 1                                     if n = 1
  Par(n) = ∑_{k=1}^{n-1} Par(k) * Par(n-k)       if n >= 2
Par(n) grows at the same rate as B(n), where B(n) is the number of different binary trees with n nodes, and
  B(n) = Ω(4^n / n^{3/2})   // see Problem 12-4 in Cormen et al.
Conclusion: Time taken by BF_MCM is exponential in n.

EXAMPLE – MCM – OPTIMAL SUB-STRUCTURE
Let M_{i..j} denote the result of the product M_i * M_{i+1} * … * M_j.
An optimal parenthesization splits the chain between M_k and M_{k+1} for some k, where 1 <= k < n.
The resulting parenthesizations for the subchains must be optimal for the respective subchains. Why?
(If a subchain were parenthesized sub-optimally, swapping in a cheaper parenthesization for it would lower the total cost, contradicting optimality of the whole.)
i.e. the optimal substructure property holds for MCM.
Hence MCM is a candidate for Dynamic Programming.

EXAMPLE – MCM - RECURRENCE
Let m[i,j] be the minimum number of scalar multiplications required for computing M_{i..j}.
Then m[1,n] is the required value (to be computed).
m[i,j] can be defined recursively as follows:
  m[i,j] = 0                                                               if i = j
  m[i,j] = min_{i <= k < j} { m[i,k] + m[k+1,j] + p_{i-1} * p_k * p_j }    if i < j

EXAMPLE – MCM – DP SOLUTION
DP_MCM(p, n)   // p[i-1] * p[i] is the size of matrix M_i, 0 < i <= n
{
  for (i=1; i<=n; i++) m[i,i] = 0;
  for (l=2; l<=n; l++)   // l is the length of the sequence i..j
    for (i=1; i<=n-l+1; i++) {
      j = i+l-1;
      m[i,j] = MAX_INT;
      for (k=i; k<j; k++) {   // try every split point i <= k < j
        q = m[i,k] + m[k+1,j] + p[i-1]*p[k]*p[j];
        if (q < m[i,j]) then m[i,j] = q;
      }
    }
  return m;
}
This procedure computes the minimal number of scalar multiplications required.
• How do we get the parenthesization that results in the minimal number of scalar multiplications?

EXAMPLE – MCM – DP SOLUTION
DP_MCM(p, n)   // p[i-1] * p[i] is the size of matrix M_i, 0 < i <= n
{
  for (i=1; i<=n; i++) m[i,i] = 0;
  for (l=2; l<=n; l++)   // l is the length of the sequence i..j
    for (i=1; i<=n-l+1; i++) {
      j = i+l-1;
      m[i,j] = MAX_INT;
      for (k=i; k<j; k++) {   // try every split point i <= k < j
        q = m[i,k] + m[k+1,j] + p[i-1]*p[k]*p[j];
        if (q < m[i,j]) then {
          m[i,j] = q;
          s[i,j] = k;   // the point of split for i..j
        }
      }
    }
  return (m, s);
}

EXAMPLE – SEQUENCE ALIGNMENT
Consider the following DNA sequences:
  GACGGATTAG and GATCGGAATAG
They are very similar:
  GA?CGGATTAG
  GATCGGAATAG
? denotes a missing element.
In this example, the two sequences differ in 2 positions.
Problem: Given two sequences, determine the best alignment (i.e. the alignment with minimal difference).
Sub-problem: Compute a similarity score.

EXAMPLE – SEQUENCE ALIGNMENT
What is an alignment?
An alignment is defined as the insertion of spaces in arbitrary
locations in either sequence, so that they end up with the same size, but no space in one sequence should align with a space in the other.
Given an alignment, a similarity score can be assigned:
Each column receives a certain value:
  Identical characters: +1 (match)
  Different characters: -1 (mismatch)
  (One) space in the column: -2
The similarity score is the sum of the values of all columns.

EXAMPLE – SEQUENCE ALIGNMENT
Given sequences s and t, we determine alignment scores between arbitrary prefixes, i.e. between s[1..i] and t[1..j].
Aligning s[1..i] with t[1..j] is one of:
  Align s[1..i] with t[1..j-1] and match a space with t[j]
  Align s[1..i-1] with t[1..j-1] and match s[i] with t[j]
  Align s[1..i-1] with t[1..j] and match s[i] with a space
So, the recurrence for similarity scores is:
  sim(s[1..i], t[1..j]) = max { sim(s[1..i], t[1..j-1]) - 2,
                                sim(s[1..i-1], t[1..j-1]) + ((s[i]==t[j]) ? +1 : -1),
                                sim(s[1..i-1], t[1..j]) - 2 }
Exercise: Write the DP algorithm for computing the similarity score.
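
One possible way to approach the exercise is sketched below in Python. It is a sketch under stated assumptions, not the course's reference solution. The function name `similarity` and its keyword parameters are hypothetical. The slides do not state base cases, so the sketch assumes that aligning a prefix of length i with an empty prefix costs i spaces, i.e. a score of -2 * i. Strings are 0-indexed in Python, so s[i] in the recurrence corresponds to `s[i - 1]` here.

```python
def similarity(s: str, t: str, match: int = 1, mismatch: int = -1, space: int = -2) -> int:
    """Best alignment score of s and t under the column values from the slides."""
    n, m = len(s), len(t)
    # sim[i][j] holds the best score for aligning s[1..i] with t[1..j].
    sim = [[0] * (m + 1) for _ in range(n + 1)]
    # Assumed base cases: a prefix aligned with an empty prefix is all spaces.
    for i in range(1, n + 1):
        sim[i][0] = i * space
    for j in range(1, m + 1):
        sim[0][j] = j * space
    # Fill the table row by row; each entry depends on its left, upper-left
    # and upper neighbours, exactly the three cases of the recurrence.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            sim[i][j] = max(
                sim[i][j - 1] + space,     # a space matched with t[j]
                sim[i - 1][j - 1] + pair,  # s[i] matched with t[j]
                sim[i - 1][j] + space,     # s[i] matched with a space
            )
    return sim[n][m]
```

For the two DNA sequences above, `similarity("GACGGATTAG", "GATCGGAATAG")` evaluates to 6: the alignment shown earlier has 9 matching columns, 1 mismatch and 1 space, giving 9 - 1 - 2. The table has (n+1) * (m+1) entries and each is filled in constant time.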