Posted on: 12/4/2011
Dynamic programming

Some divide and conquer algorithms are awkward or inefficient to implement recursively. For example, computing Fibonacci numbers and binomial coefficients directly from their recursive definitions can take exponential time (adding 0's and 1's to reach exponentially large function values). The inefficiency in these cases arises from shared subproblems that are solved repeatedly.

Another typical class of examples consists of problems that require constructing an optimal binary tree with a given (inorder) traversal. These include:

  - optimal multiplication of chains of matrices
  - optimal BSTs (both in Ch. 15, CLRS)
  - parsing (in HMU, Sec. 7.4.4)

In this class of problems, one can identify in advance a small set of subproblems as the only subproblems that can occur, and the cost and form of an optimal solution can be determined quickly from the optimal cost and form of solutions to those subproblems. This property (cf. (ii) of K&T, p. 260) is called the principle of optimality, or the optimal substructure property.

In our three sample problems, the "form" of a solution means the shape of the binary tree. Other problems have solutions of different forms. For example, in the problem of finding the shortest string accepted by each of a given set of n DFAs, the solution has the form of a string. This problem fails to satisfy the principle of optimality.

For problems that do satisfy this principle, the dynamic programming strategy can often give an efficient solution algorithm. Here the subproblems are identified in advance, solved bottom-up (i.e., smallest to largest), and their solutions stored in a table. Working bottom-up means that when each subproblem is solved, the solutions to its own subproblems are already present in the table. In the Fibonacci case, the table is just a 1-dimensional array. In the binomial case, it is the 2-dimensional Pascal's triangle. In the binary tree examples, the table is also 2-dimensional.
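The two simple tables just mentioned can be sketched in a few lines of Python. This is only an illustration, not part of the notes: `fib` fills a 1-dimensional array bottom-up, and `binomial` fills (a rectangular slice of) Pascal's triangle, so each entry is computed exactly once.

```python
def fib(n):
    # Bottom-up table: entry i is computed once from entries i-1 and i-2.
    table = [0, 1]
    for i in range(2, n + 1):
        table.append(table[i - 1] + table[i - 2])
    return table[n]

def binomial(n, k):
    # 2-D table (Pascal's triangle): C(i,j) = C(i-1,j-1) + C(i-1,j).
    C = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        C[i][0] = 1                      # C(i,0) = 1
        for j in range(1, min(i, k) + 1):
            C[i][j] = C[i - 1][j - 1] + C[i - 1][j]
    return C[n][k]
```

Either function runs in time proportional to the size of its table, in contrast to the exponential time of the naive recursions.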
Here, entry (j,k) corresponds to the subproblem extending from position j to position k in the traversed tree.

It is possible to retain the top-down approach of divide and conquer algorithms and construct the table on the fly, as needed. This strategy is called memoization.

Table entries are typically numeric; for optimization problems, these entries are costs. A parallel table is typically used for solution structures. Its entries give links to the substructures needed to build the solution (or record that the solution is atomic and has no substructures).

One dynamic programming algorithm that is useful in text editing and computational biology is described in Section 6.6 of Kleinberg & Tardos. It relates to the problem of finding an optimal matching of two strings, such as "CS 49C" and "CS47". By a matching we mean a mapping between indices of the first string and indices of the second string, such that if i maps to j and i' maps to j' and i < i', then j < j'. In order to allow for strings of different lengths, we allow indices to remain unmapped. So for example, one matching of our two sample strings is given below (here ␣ denotes the space in "CS 49C", and _ marks an unmapped index):

  C S ␣ 4 9 C
  C S _ 4 7 _

This matching has two unmapped indices and one mismatch, and it is in fact optimal if we assign each unmapped index and each mismatch a penalty of 1.

The algebraic condition on matchings is intended to prevent "crossing". That is, we don't want "CS 49" and "CS 94" to have a zero-cost matching that maps the 9's to each other and the 4's to each other.

Construction of a dynamic programming algorithm begins with the observation that in an optimal matching of strings of lengths m and n, either symbol m of the first string is mapped to symbol n of the second, or symbol m of the first string is unmapped, or symbol n of the second string is unmapped. This observation induces a recurrence relation on OPT(i,j), where OPT(i,j) is defined as the minimum cost of an alignment between the prefix of length i of the first string and the prefix of length j of the second string.
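Memoization, as described above, keeps the recursive shape of the definition but stores each answer the first time it is computed. A minimal sketch in Python (the function name and the dictionary-as-table are illustrative choices, not from the notes):

```python
def binomial_memo(n, k, memo=None):
    # Top-down with memoization: recurse as in the definition, but each
    # (n, k) subproblem is solved once; the table is built only as needed.
    if memo is None:
        memo = {}
    if k == 0 or k == n:
        return 1
    if (n, k) not in memo:
        memo[(n, k)] = (binomial_memo(n - 1, k - 1, memo)
                        + binomial_memo(n - 1, k, memo))
    return memo[(n, k)]
```

Without the memo dictionary this is the exponential-time recursion; with it, each of the O(nk) table entries is filled once, matching the bottom-up cost.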
With the penalty function used above, the recurrence is just

  OPT(i,j) = min { d(i,j) + OPT(i-1,j-1), 1 + OPT(i-1,j), 1 + OPT(i,j-1) },

where d(i,j) = 0 if characters i and j of the appropriate strings agree, and d(i,j) = 1 otherwise. Here the 2nd and 3rd terms in braces correspond to unmapped indices; the 1st corresponds to a match or to a mismatch. A more general penalty function can be used, and is used in Section 6.6.

Computing each OPT(i,j) requires O(1) time, since a bottom-up approach guarantees that function values are available as needed for smaller indices. So finding the minimum matching cost OPT(m,n) takes time O(mn).

But what if the optimal matching itself is wanted, and not just its cost? Recall that here we use a parallel table. This table may also be computed in time O(mn). Entries in this table record which of the three terms inside the braces gave the minimum value. The actual matching can then be constructed top-down from these table entries, beginning with entry (m,n).

Sometimes dynamic programming won't give an efficient algorithm. Consider the zero-one version of the knapsack problem -- where we have to take all or none of each item, so that the amount xi taken of item i is either 0 or 1. If there are n items, item j has value vj and weight wj, and the capacity is W, then the recurrence below will give the solution V[n][W]:

  V[j][k] = max { V[j-1][k], vj + V[j-1][k-wj] }   for j > 0 and k >= wj
  V[j][k] = V[j-1][k]                              for j > 0 and 0 < k < wj
  V[0][k] = 0
  V[j][0] = 0

Unfortunately the time complexity Θ(nW) needn't be polynomial in the size of the input, since W can be exponential in the number of bits used to write it.
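The alignment recurrence and its parallel table can be sketched as follows. This is an illustrative Python implementation, not from Kleinberg & Tardos; the names `OPT` and `choice` mirror the tables described above, and the traceback rebuilds the matching top-down from entry (m, n).

```python
def align(x, y):
    # OPT[i][j] = minimum cost of matching x[:i] with y[:j],
    # penalty 1 per unmapped index and per mismatch.
    # choice[i][j] records which term of the recurrence won.
    m, n = len(x), len(y)
    OPT = [[0] * (n + 1) for _ in range(m + 1)]
    choice = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        OPT[i][0], choice[i][0] = i, 'up'       # x[i-1] unmapped
    for j in range(1, n + 1):
        OPT[0][j], choice[0][j] = j, 'left'     # y[j-1] unmapped
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = 0 if x[i - 1] == y[j - 1] else 1
            best = min(d + OPT[i - 1][j - 1],
                       1 + OPT[i - 1][j],
                       1 + OPT[i][j - 1])
            OPT[i][j] = best
            if best == d + OPT[i - 1][j - 1]:
                choice[i][j] = 'diag'           # match or mismatch
            elif best == 1 + OPT[i - 1][j]:
                choice[i][j] = 'up'             # x[i-1] unmapped
            else:
                choice[i][j] = 'left'           # y[j-1] unmapped
    # Rebuild the matching top-down from entry (m, n).
    top, bot, i, j = [], [], m, n
    while i > 0 or j > 0:
        if choice[i][j] == 'diag':
            top.append(x[i - 1]); bot.append(y[j - 1]); i, j = i - 1, j - 1
        elif choice[i][j] == 'up':
            top.append(x[i - 1]); bot.append('_'); i -= 1
        else:
            top.append('_'); bot.append(y[j - 1]); j -= 1
    return OPT[m][n], ''.join(reversed(top)), ''.join(reversed(bot))
```

On the sample strings, `align("CS 49C", "CS47")` reports cost 3, matching the two unmapped indices plus one mismatch found above.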
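The knapsack recurrence translates directly into a table-filling loop. A minimal sketch (the function name and argument layout are illustrative):

```python
def knapsack(values, weights, W):
    # V[j][k] = best total value using only the first j items
    # with capacity k; the answer is V[n][W].
    n = len(values)
    V = [[0] * (W + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        for k in range(W + 1):
            V[j][k] = V[j - 1][k]                    # skip item j
            if weights[j - 1] <= k:                  # or take item j
                V[j][k] = max(V[j][k],
                              values[j - 1] + V[j - 1][k - weights[j - 1]])
    return V[n][W]
```

The two nested loops make the Theta(nW) running time visible: the work grows with the numeric value of W, not with the number of bits needed to write it, which is why the algorithm is only pseudo-polynomial.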