Constraint Propagation

1. Motivating examples (P. Winston)

1.1 Numerical constraint nets.

Any set of equations defines a numeric constraint net. In such a net, variables are represented as variable boxes, while numerical operations on those variables are represented as operator boxes (e.g. multiply boxes, adder boxes, etc.). The operator boxes are linked to the variable boxes on which the operation needs to be performed. In a numerical constraint net, values for variables can propagate (or flow) in various directions. Just as mathematical equations do not impose a direction on how the equation is used procedurally, the data-flow in a numeric constraint net is not fixed. In fact, many different procedural behaviors can be attached to the same net. This is essentially one of the main characteristics of constraint solving in general: various procedural behaviors can operate on the same set of constraints to allow fast propagation of values to determine problem solutions. Consider the constraint net associated with the three equations in the slide. If we assign the value 3000 to the variable box A, this may activate a data-flow through the first multiply box, instantiating the box for C to 3300. In turn, this value may be propagated further through the second multiply box, instantiating the box for D to 3630. If instead we had started by giving the box for D the value 3630, then this information could have propagated in the inverse direction, instantiating C and A to 3300 and 3000 respectively, by using the multiply boxes as divisors instead of multipliers. Even more complicated propagation can occur, as we will illustrate further on.

1.2 Spreadsheets.

Spreadsheets are a (very weak) application of numerical constraint nets. Typically, a spreadsheet can be used in 2 modes. In the first mode, the user identifies relevant constants, variables and equations between the variables occurring in his problem.
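The bidirectional data-flow through a single multiply box can be sketched in Python. This is a minimal sketch, not the slide's actual net; the factor 1.1 is an assumption chosen so that the numbers match the example (3000 → 3300 → 3630):

```python
def multiply_box(factor, a=None, b=None):
    """Enforce b = factor * a, propagating in whichever direction a value
    is available; returns the completed pair (a, b)."""
    if a is not None:
        return a, factor * a       # forward: the box acts as a multiplier
    if b is not None:
        return b / factor, b       # backward: the box acts as a divisor
    return None, None              # nothing known yet, nothing to propagate

# Forward flow: assigning A = 3000 instantiates C, then D.
a, c = multiply_box(1.1, a=3000)
c, d = multiply_box(1.1, a=c)

# Backward flow: assigning D = 3630 instantiates C, then A.
c2, d2 = multiply_box(1.1, b=3630)
a2, c2 = multiply_box(1.1, b=c2)
```

The same box serves both data-flow directions; which direction fires depends only on which value happens to be available, exactly as in the net.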
This can be seen as defining an equation theory, or equivalently, defining a constraint net. In the second mode, often referred to as the 'what if'-mode, the spreadsheet propagates the available values through the equations in order to compute values for the dependent variables. This is similar to, but less powerful than, constraint propagation. In particular, it is weaker because most spreadsheets do not allow propagation to occur in different directions. It is usually impossible in a spreadsheet to assign a value to a field which is already occupied by an equation. If that were possible, then, similar to what happened in the constraint nets, the equations might be usefully activated in the inverse direction (e.g. changing multipliers into divisors, as in the previous example) to produce a richer functionality.

1.3 More advanced numerical propagation.

Reconsider the constraint net of Section 1.1. Assume that we are given the information that the value for variable A can be either 2000 OR 3000, while the value for variable D can be either 3630 OR 4840. Another type of propagation can now take place. We can select the possible value 4840 for variable D and try to check whether this value allows us to construct a consistent solution for the entire net. Consistency here means that, given the possible values we have put forward for the different variables in the net, there is at least 1 such value for each variable such that all equations are satisfied in combination with the selected value 4840 for D. Again, we propagate the value 4840 for D through the net and we observe that the corresponding value obtained for A would have to be 4000, which is not one of the possibilities we had for A. Thus, the value 4840 for D cannot be part of a solution of all equations, given the restriction on the values for A. It may therefore be eliminated as a possibility for D.
Similarly, the possible value 2000 for A can be propagated through the multiply boxes, allowing us to eliminate this value as a possibility for A, due to inconsistency with the remaining value (3630) for D. At this point, no further value elimination can occur, since both A and D have only 1 possible value left. Propagation of either of these two values will show that they are indeed consistent, and we have reached a solution.

1.4 Initial conclusions.

Detecting, representing, exploiting and propagating the natural constraints of a problem you wish to solve is the most crucial part of any form of problem solving, be it in software engineering or in knowledge engineering. In the remainder of this chapter, we will first study constraint propagation problems in more technical detail. We will formally introduce a fairly generic class of constraint problems. Then we will study a collection of methods that can be used to propagate information through such constraints. A first class of methods will be based on backtracking techniques. It will turn out that there is a very rich class of algorithms and techniques, all variants of standard backtracking schemes, which can provide efficient solutions to these problems. Then we study relaxation techniques, also called arc-consistency techniques. These are usually not complete, in the sense that they are not always sufficient to generate a solution, but they help very much in reducing the possibilities. Then we study techniques that combine backtracking with relaxation. Such hybrid techniques turn out to be extremely useful and efficient in practice, especially in the context of scheduling, rostering and planning problems. Next we look at some applications. We investigate a problem in computer vision: how to provide a 3-dimensional interpretation of a (2-dimensional) line drawing. It turns out that this is very naturally described as a constraint problem.
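The elimination argument above can be captured in a few lines. As a sketch, we collapse the two multiply boxes into the single composed relation D = 1.21 × A (an assumption consistent with the pairs 3000/3630 and 4000/4840 in the example):

```python
def consistent(a, d, factor=1.21):
    """Does the pair (a, d) satisfy the composed constraint D = factor * A?"""
    return abs(factor * a - d) < 1e-6

candidates_a = {2000, 3000}
candidates_d = {3630, 4840}

# Keep only values of D supported by some candidate for A: 4840 is
# eliminated, since it would force A = 4000, which is not an option for A.
candidates_d = {d for d in candidates_d
                if any(consistent(a, d) for a in candidates_a)}

# Symmetrically, 2000 is eliminated for A: it would force D = 2420.
candidates_a = {a for a in candidates_a
                if any(consistent(a, d) for d in candidates_d)}
```

After both passes, only the mutually consistent pair A = 3000, D = 3630 remains.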
In particular, the solution that Waltz proposed for this problem formed the basis for the first constraint propagation algorithm ever. It was later generalized to other application domains. We also study a problem in understanding the meaning of natural language sentences. Again, constraint propagation forms the key to success (although we will only cover this application in high-level terms, not identifying its formal representation as a constraint problem). As a final conclusive statement, notice that even artificial neural networks can be considered as a form of (numerical) constraint propagation networks. Numerical values in the inputs to the neural network are propagated, using some multiply and adder boxes, over intermediate nodes in the net to output values for the net. However, in the context of neural nets, issues other than the methods of propagation are the important ones.

2. Constraint problem solving: introducing the notions (Nadel)

2.1 Definition of a constraint problem.

On the slide, we give a formal definition of a constraint problem (also called a consistent labeling problem). It consists of: a finite set of variables; for each variable, a finite set of possible values for that variable (referred to as the domain of the variable); and for each pair of variables, a constraint (or, otherwise stated, a relation) which should hold between the values for those 2 variables. A solution to a constraint problem is a selection of 1 value from the domain of each variable, such that all the constraints are fulfilled with these selected values. The problem we address here is to find efficient methods that produce solutions to constraint problems. In some cases, we may be interested in techniques that compute just 1 solution to such a problem. In other cases, we may want the techniques to generate all solutions.
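This definition lends itself to a direct encoding. The dictionary layout and names below are our own illustrative choices, not a standard representation:

```python
# A binary constraint problem: variables 0..n-1, a finite domain for each,
# and for each pair of variables a relation that must hold between them.
problem = {
    "domains": {0: [1, 2, 3], 1: [1, 2, 3]},
    "constraints": {(0, 1): lambda a, b: a < b},   # the constraint c(z0, z1)
}

def is_solution(problem, assignment):
    """A solution picks one domain value per variable and satisfies
    every constraint."""
    return (all(assignment[v] in dom
                for v, dom in problem["domains"].items())
            and all(c(assignment[i], assignment[j])
                    for (i, j), c in problem["constraints"].items()))
```

For instance, is_solution(problem, {0: 1, 1: 3}) holds, while the assignment {0: 3, 1: 1} violates the constraint.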
In yet other cases, we may be interested in techniques that compute an 'optimal' solution, where optimal means that we are given some additional function, defined on the variables, which should reach its maximum (or minimum) in the computed solution. The stated definition of a constraint problem is not as general as it could be. For instance, we have restricted attention to 'binary' constraint problems. This means: we only consider constraints (relations) between pairs of variables. In principle, there can be many practical problems in which there are natural constraints expressible only in terms of 3 or more variables at once. The reason why we restrict to binary constraints only is that this considerably reduces the notational complexity in our discussions, examples and algorithms. However, most of the techniques we will discuss can be extended to more general constraint problems (sometimes straightforwardly, sometimes with more technical difficulty). Another restriction is the one on the finiteness of the domains. This restriction delimits the applicability of the methods that we will study to a specific class of problems: finite domain constraint problems. There are other types. For instance, we could allow the domains of the variables to range over the natural numbers, or the integers, or the rational or real numbers. The techniques that we will discuss in this chapter are often not (easily) extendible to such more general constraint problems. For instance, the backtracking variants that we will study in the next section have no obvious counterpart when moving to infinite domains. One way in which a lot of the techniques we study here can be ported to infinite domains is to reason about finite sets of (possibly infinitely large) intervals. Just to illustrate the concept: consider the 3-equation example again and assume that we know that the value for A must be in the range [2000,3000], while the value for D must be in the range [3630,4840].
Completely analogously to what we did in the example before, the bounds on these intervals can be propagated through the constraint net (either from D to A, or from A to D). Propagating the bounds for D backward in the net gives us the resulting interval bounds 3000 and 4000 for A. Comparing these with the given interval [2000,3000] for A, we notice that only the value 3000 remains consistent. Thus, the interval [3630,4840] contains only 1 point (3630) which is consistent with the possible values for A. We can therefore reduce the possible values for D to just that point: [3630,3630].

2.2 Examples: q-queens and confused q-queens.

We will illustrate most points and techniques on 2 famous toy examples: the q-queens puzzle and the confused q-queens puzzle. Assume you are given a chess board of dimensions q x q, where q is some integer number. The problem is to place a total number of q different queen pieces on this chess board, in such a way that no 2 of these q queens attack each other. Note that 2 queens on some board are said to attack each other if the 2 queens are either in the same row, or in the same column, or on the same diagonal of the board. The slide shows an example of a solution for the q-queens puzzle, for q = 4 (abbreviated from here on as the 4-queens puzzle). The confused q-queens puzzle is completely similar to the previous one, except that we are now looking for placements of q queens on the board such that every 2 queens do attack each other. Again, some examples of solutions for dimension 4 are given on the slide. This being said, we haven't actually defined any constraint problems yet. In particular, we didn't specify what the variables are, what the domains are, and what the constraints are that link them. There are in fact a number of different constraint problems associated with q-queens or confused q-queens, depending on the choice of the variables and the domains.
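The interval reasoning above can be sketched in Python, again collapsing the two multiply boxes into the assumed composed relation D = 1.21 × A:

```python
def bounds_for_a(d_lo, d_hi, factor=1.21):
    """Map an interval for D back through D = factor * A to bounds on A."""
    return d_lo / factor, d_hi / factor

def intersect(lo1, hi1, lo2, hi2):
    """Intersection of two intervals, or None if they are disjoint."""
    lo, hi = max(lo1, lo2), min(hi1, hi2)
    return (lo, hi) if lo <= hi else None

a_lo, a_hi = bounds_for_a(3630, 4840)            # induced bounds: 3000, 4000
a_lo, a_hi = intersect(a_lo, a_hi, 2000, 3000)   # only the point 3000 survives
d_lo, d_hi = 1.21 * a_lo, 1.21 * a_hi            # D collapses to [3630, 3630]
```

Note that the interval propagation only ever touches the two endpoints, which is what makes the idea workable even when the intervals contain infinitely many values.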
One possibility is to introduce 1 variable for each queen that needs to be placed, and to associate with each of these variables the same domain: namely, the set of all pairs of integers (n,m), where n denotes the row-number and m the column-number of a possible position on the q x q chess board. Each constraint between two variables zi and zj would then take the form described on the slide. Note that the last line in this constraint expresses that the 2 queens are not on the same diagonal. Although this is a perfectly correct representation, we will use a different, slightly better representation of the problem. To introduce it, note that in the q-queens problem it makes no sense to try combinations of queen placements in which 2 queens are in the same row. We exploit this in our representation by assigning each queen (and its corresponding variable) to a specific row from the start. Say that zi denotes the variable that is associated with the queen on row i. Then, for each i, we define the domain of zi to be the set {1,...,q}, that is to say: all the possible column-positions for the queen. This representation (and the new form that the constraints take under it) is shown on the slide. The drawing shows a solution to the problem, which - in this representation - corresponds to z1 = 2, z2 = 4, z3 = 1 and z4 = 3. A main reason for moving to this second representation is that both the domains and the constraints are more easily expressed. A (possibly even more important) second motivation is that the total number of possible queen-placements has been seriously reduced, making the problem easier to solve. Note that if we also move to this second representation for the confused q-queens problem, then we change the conceptual problem of that puzzle. Indeed, in this second representation, a number of solutions to the original formulation of the puzzle are no longer solutions.
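Under this row-based representation the binary constraint is easy to state, and small instances can be checked by exhaustive enumeration (a sketch; the helper names are our own):

```python
from itertools import product

def attack(i, zi, j, zj):
    """Queens in rows i and j (columns zi, zj) attack each other iff they
    share a column or a diagonal (rows differ by construction)."""
    return zi == zj or abs(zi - zj) == abs(i - j)

def solutions(q, confused=False):
    """Enumerate all assignments with domains {1,...,q}. For q-queens no
    pair may attack; for confused q-queens every pair must attack."""
    sols = []
    for z in product(range(1, q + 1), repeat=q):
        if all(attack(i, z[i], j, z[j]) == confused
               for i in range(q) for j in range(i + 1, q)):
            sols.append(z)
    return sols
```

Running this, solutions(4) yields the 2 classic 4-queens solutions, among them (2, 4, 1, 3); solutions(4, confused=True) yields 6 placements, and solutions(3, confused=True) yields 9.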
Specifically, placing all the queens in the same row is no longer possible in this representation. Still, we will consider this second representation as our formal definition of the confused q-queens problem from now on. As a final comment, the confused q-queens puzzle may seem like a trivial and dumb puzzle for humans. Our graphical interpretation of the problem allows humans to come up with the solutions very quickly. The reasons why we want to study the puzzle anyway are: 1) for a general constraint solving algorithm, there is no big difference between the inequality constraints of the q-queens problem and the corresponding equality constraints in the confused q-queens one. In other words, the amount of search or computation involved (for general-purpose algorithms) should be roughly the same; 2) the q-queens puzzle has a very irregular behavior. Suppose that for q = 11 there are 7 solutions; it may then very well be that for q = 10 there are some 296 solutions. The number of solutions does not behave proportionally to the dimensions of the problem. This is a highly undesirable feature if we aim to study the efficiency of algorithms working on this problem, because the efficiency will behave irregularly on scale-ups of the problem dimensions. Luckily, confused q-queens does not suffer from this problem. The confused q-queens puzzle always has q + 2 solutions: one for each column, plus the positionings on the 2 main diagonals (see the slide). There is only 1 exception to this: the case q = 3, in which we get 4 additional solutions (see slide).

2.3 Representing the search.

A last item of discussion, before we can address some problem solving methods, is how we will represent the search for solutions. There are 3 different ways of representing the search space for constraint problems: the OR-tree representation, the network representation and the domain-array representation.

2.3.1 The OR-tree representation.
Let z1,…,zn be the variables in our problem, while aij denotes the j-th value in the domain of zi and c(zi,zj) is the binary constraint between zi and zj. We assume that an order on the variables z1,…,zn has been fixed (in particular, assume that it corresponds to the order of their indices). We also fix an order on the values in each domain, say ai1,…,aim. The search space is represented by a tree. The root of the tree has one branch for each value in the domain of z1. The leaves of these initial branches are labeled by the different values in the domain of z1. Then, each of these initial leaves branches once again: again, one branch for each value in the domain of z2. At this point, at every leaf of the current tree, we verify the value of the constraint c(z1,z2). The values resulting from the evaluation of this constraint are added as additional labels to these leaves. In particular, the values can be either 'true', represented as v (for victory?), or 'false', represented as x. Next, all the leaves for which the constraint value is v are further extended. Note that there is no point in extending the leaves with label x, because the values for z1 and z2 on the corresponding branch already violate the first constraint c(z1,z2). Thus, further extending these assignments can never lead to a solution of the entire constraint problem. So, the next step will create branchings for all nodes labeled by v, constructing one branch for each value in the domain of z3. Again, at the resulting leaves, we test the constraints c(z1,z3) and c(z2,z3), which are all the constraints that relate the previously encountered variables with the new variable z3. Again, the next layer will only be built for those leaves for which both results of these constraint tests are v. This process continues up to the variable zn, including constraint tests for all the constraints c(z1,zn) up to c(zn-1,zn) in the final leaves.
Those final leaves in which all the final constraint tests result in v represent solutions to the problem. It suffices to combine all the values for the variables on the branch leading to such a leaf to get the solution. See the corresponding slide for the general layout. Note that this representation does not fix a search strategy. It is only a representation of the search space that needs to be searched. We can still use various strategies to construct this tree (depth-first, breadth-first, etc.) in actually searching for a solution. But the techniques using this representation will all be of the 'backtracking' type.

2.3.2 The network representation.

In a second representation, we construct a network. The nodes in the network correspond to the variables in the constraint problem and are labeled by these variables. An additional label placed on each of these nodes is the domain of the variable. Every two nodes are connected by an arc. These arcs are labeled by the constraint c(zi,zj) which is imposed on the variables zi and zj labeling the two nodes. The network representation is intended to be used with a 'relaxation' or 'arc consistency' method to solve the constraint problem. Roughly, relaxation proceeds as follows. Select a value, say aij, in the domain of some variable, say zi. Select also a constraint involving that same variable, say c(zi,zk). Now check whether there exists any value in the domain of zk, say akl, such that c(aij, akl) is true. If there does not exist any value in zk's domain for which this is the case, then the value aij is inconsistent with all the remaining values for zk. Thus, aij cannot occur in a solution of the problem. Then: remove aij from the domain of zi. We can now select a new value in some domain, and a constraint, and continue in the same way as above. The process is continued until no more inconsistent values can be removed from any domain. We return to relaxation or arc consistency later on.

2.3.3 The domain-array representation.
This third representation is a syntactic variant of the previous one. For each variable there is an array including all the values of the domain of that variable as elements. For any two arrays, there is an arc connecting the arrays, labeled by the constraint between the corresponding variables. It should be clear that this is essentially the same representation as the previous one. Again, relaxation is used as the problem solving technique.

3. Backtracking, backjumping and backmarking (Nadel)

3.1 The basic backtracking scheme.

Basic backtracking is the depth-first, left-to-right traversal of the OR-tree representation of the problem. The initial part of the traversal of the tree for the 4-queens problem is drawn on the slide. This initial part is traversed in the usual way: depth-first and backtracking from left to right. The algorithm is shown on the next slide. It is described in a Pascal-like syntax. Backtr(<input>) is a recursive procedure. <input> is a number that corresponds to the depth in the OR-tree that needs to be dealt with next. Initially, Backtr is called with <input> = 1: we need to construct the backtrack search, starting with the first level in the tree. To that end, we need to construct 1 branch for every possible value in the domain of z1. Thus, we get a 'For'-loop in which we assign the values a11 to a1n1 to z1 (in that order). The next part of the algorithm (including the 'While'-loop) checks whether all constraints involving the current value for z1 are consistent with the values already assigned to previous variables (in other words, whether all constraints c(z1,..) hold). Because z1 is the first variable, no checking needs to be done: there are no constraints c(z1,zj) with j < 1. For z1, Consistent will therefore be true after the 'While'-loop. If 1 is not the last variable, then we increase the depth-variable to 2 and move on to Backtr(2).
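The recursive scheme can be sketched in Python (the slide's Pascal-like Backtr is not reproduced here; we use 0-based depths, and `check(i, j, vi, vj)` plays the role of the constraint c(zj,zi)):

```python
def backtr(depth, assignment, domains, check):
    """Depth-first, left-to-right traversal of the OR-tree: try each value
    for the variable at `depth`, keep it only if all constraints against
    previously assigned variables hold, and recurse."""
    for value in domains[depth]:
        assignment[depth] = value
        consistent = all(check(depth, j, value, assignment[j])
                         for j in range(depth))
        if consistent:
            if depth == len(domains) - 1:
                return list(assignment)        # all variables assigned
            solution = backtr(depth + 1, assignment, domains, check)
            if solution is not None:
                return solution
    return None

# 4-queens in the row-based representation: columns must differ and
# no two queens may share a diagonal.
queens_check = lambda i, j, vi, vj: vi != vj and abs(vi - vj) != abs(i - j)
domains = [list(range(1, 5)) for _ in range(4)]
first = backtr(0, [None] * 4, domains, queens_check)   # [2, 4, 1, 3]
```

The first solution found is exactly the one shown on the slide, z1 = 2, z2 = 4, z3 = 1, z4 = 3, since the traversal order matches the depth-first, left-to-right convention.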
At later stages, the 'While'-loop will check all constraints c(zi,zj), where zi is the variable of the current depth and zj is any previously encountered variable. Only if all checks evaluate to true will Consistent remain true, thus allowing the algorithm to go to the next level in the tree. If the maximal depth is reached, the values for all the variables are returned. There are a number of problems with the standard backtrack algorithm. All of these problems have to do with a notion called 'thrashing'. We explain the notion in the next slides and show how it can be dealt with by changing the algorithm.

3.2 Backjumping.

Consider the part of the OR-tree traversed by the standard backtrack algorithm for the confused 4-queens puzzle, shown on the next slide. The value for z1 is fixed to 2 for this entire segment of the search space, while we backtrack over the values for z2, z3 and z4. Consider the point in the tree where the algorithm assigns the value 3 to z3. At this point, it has just tried the value 2 for z3 and got some surprising (and extremely informative) results for the consistency tests. So look back at the values it obtained for the constraint-checks for the assignment 2 to z3. It appears that all descending assignments to z4 made the constraints fail. But the constraints didn't fail in just any kind of way: there is something special about the failure. Notice that all the tests failed already on the first two constraints, c(z1,z4) and c(z2,z4). The constraint c(z3,z4) wasn't even tried! This has one very important implication: the reason why all assignments to z4 caused failure of the constraints has nothing to do with the current value of z3! The values of z1 and z2 were already completely incompatible with any possible value for z4. This means that backtracking to try the next value for z3 makes no sense at all.
If we backtrack to the next value for z3, without changing the values for z1 or z2, then the same constraint checks for z4 will fail at exactly the same locations as for the previous value of z3. You can see this in the drawing: the tests for c(z1,z4) and c(z2,z4) fail again under the assignment z3 = 4, and at exactly the same locations. Note that we don't get the same duplicated tests for z3 = 3, because this assignment simply fails even earlier: we don't even get the chance to see how it would fail for z4-assignments. In conclusion, we know in advance that the indicated part of the tree represents a completely useless computation. Another, completely similar redundant part of the tree is more to the left of the drawing, again related to assignments to z3. The behavior we notice here is a form of the notion 'thrashing'. Thrashing means that, after detecting that a certain assignment fails to produce a solution, you spend time resetting values for a variable that had nothing to do with the reason for the failure. Evidently, if computation proceeds, the failure will occur again, (at the latest) at the point of occurrence of the previous failure. What we really would want to do in the examples is not to backtrack to the next value of z3, but to backtrack to a higher level: namely, the last (deepest) level that was involved in the reason for the failure. In both examples shown in the drawing, this last (deepest) level was that of z2. So, the algorithm would need to 'jump' back to a higher level. In order to achieve this in an improved version of the backtrack algorithm, we need to keep track of some extra information: the 'deepest fail-level' (in this text also referred to as the 'backjump-depth'). This notion is explained on the next slide. Assume we have completely checked all the different possibilities for the assignment of one particular variable, say zi, at one particular location in the tree.
Also assume that all the assignments caused failure of one of the constraint checks (so they all end in an 'x'). Now, let c(zk,zi) be the deepest of these constraint checks that failed. This means: there is no zj, with j > k, such that c(zj,zi) gave 'v'. Then we define the backjump-depth at that point in the tree to be k. Obviously, backtracking will now need to occur, because all alternatives for zi led to failure. When returning to the level of zi-1, we check whether i-1 is equal to the backjump-depth. If it is, we take the next value for zi-1 and proceed. If i-1 > backjump-depth, then we backtrack further to i-2 and check whether this is the backjump-depth. We continue going up the tree, until we reach the backjump-depth, before we restart assigning values. Note that the backjump-depth is trivially smaller than i. On the next slide we show the backjump algorithm. The structure of the algorithm is completely the same as that of Backtr. There are 2 new variables that are used to compute and pass on the backjump-depth. The variable checkdepthk stores the depth at which the constraint checks failed for 1 particular branch (= one particular value) for the current variable. We compute the maximum over all the different checkdepthk values: this is the deepest level at which the constraints failed, considering all the different branches (= all possible values) for the current variable. The variable jumpback records this maximum. It is another name for backjump-depth. Note that if there is a value for which none of the constraint checks fails, then the corresponding checkdepthk is set to the depth of the variable. Also, jumpback is set to the depth of the variable (i-1) in this case. A final change with respect to Backtr is that jumpback is included as an extra argument to the procedure. Note that the declaration 'VAR' in front of the argument 'jumpback' for BackJ indicates that this argument is (also) an output variable.
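A Python sketch of this scheme follows. It is our own reconstruction from the description above, not the slide's BackJ; the Pascal VAR output parameter jumpback is modeled as a second return value:

```python
def backj(depth, assignment, domains, check):
    """Backjumping sketch. Returns (solution, jumpback), where jumpback is
    the deepest level whose value was implicated in the failures below."""
    n = len(domains)
    jumpback = -1
    for value in domains[depth]:
        assignment[depth] = value
        checkdepth = depth - 1            # deepest check performed so far
        failed = False
        for j in range(depth):            # test c(zj, z_depth) for j < depth
            checkdepth = j
            if not check(depth, j, value, assignment[j]):
                failed = True
                break
        if failed:
            jumpback = max(jumpback, checkdepth)
            continue
        if depth == n - 1:
            return list(assignment), depth
        solution, jb = backj(depth + 1, assignment, domains, check)
        if solution is not None:
            return solution, jb
        if jb < depth:                    # this level played no role in the
            return None, jb               # failure: jump straight past it
        jumpback = depth - 1              # deeper failure involved this value
    return None, jumpback

queens_check = lambda i, j, vi, vj: vi != vj and abs(vi - vj) != abs(i - j)
domains = [list(range(1, 5)) for _ in range(4)]
first, _ = backj(0, [None] * 4, domains, queens_check)   # [2, 4, 1, 3]
```

The pruning step is the `if jb < depth: return None, jb` line: when every failure below occurred at a level above this one, the remaining values at this level are skipped entirely, which is exactly the jump described above.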
Any assignment made to that variable in 1 particular invocation of the procedure will be returned to the invocation that called it. This is used in the statement 'If jumpback < depth then RETURN'. It means: if we just backtracked to a previous level and jumpback is still smaller than that level, backtrack even further to the next previous level.

3.3 Backmarking.

Backjumping solves the thrashing problem of backtracking only in part. There is still considerable redundancy left. Consider the example on the confused 4-queens puzzle on the slide. Look at the results of the constraint checks for the assignments to z3 rooted at the assignment z2 = 1 and compare them with the results of the constraint checks for the same assignments to z3 rooted at the assignment z2 = 2. All the outcomes of these checks are identical in the two cases. Even for the results of those same checks, but now rooted at the assignment z2 = 3, we again get the same results. The reason is basically the same as in the problem we discussed for backjumping. At z2 = 1, we get the results x, v, x, v for the constraint c(z1,z3). Then we backtrack over z2 and give it the value z2 = 2. Note though that z2 does not occur in the constraint c(z1,z3). Therefore, assigning the values 1, 2, 3 and 4 again to z3 MUST result in the same outcomes x, v, x, v for the constraint c(z1,z3), because the value for z1 has not changed! The same problem occurs in the assignments to z4 (even at different levels). All these tests are redundant, because we could predict the outcome of the checks without computing them again. There are two possible approaches to avoid computing these redundant checks. The first solution is to use tabulation. Tabulation is a general technique that performs 'lemma generation' and 'lemma application'. Lemma generation means that a table is constructed in which all the different constraint checks performed so far are kept as entries, together with the result of the checks.
Lemma application means that, when a new constraint check needs to be computed, the algorithm first checks whether this check is already present in the table. If it is, the result from the table is used (without computing the check again). Otherwise, the check is computed and added to the table with its result. Tabulation only partly solves our problem. It may improve the search to some extent, but it causes the overhead of checking and building tables. In some cases it may impose high storage requirements. Most importantly, it doesn't really avoid looking at redundant checks completely; it just avoids recomputing them. Backmarking was introduced as a variant of backtracking to avoid these redundant checks completely. Redundant checks of the type illustrated in the drawing will simply be jumped over by the algorithm. It saves practically all of the redundant time (with virtually no overhead), while it only requires a modest space overhead. The space overhead consists of 2 new variables, both arrays, that need to be maintained. We explain their meaning below. Checkdepth(k,l) is a 2-dimensional array of sizes k: 1->n, where n is the number of variables in the problem, and l: 1->M, where M is the maximum of the sizes of the domains of the variables. So, for each variable zk, and for each possible assignment to that variable zk, say akl, Checkdepth will contain a value at its position (k,l). What is the value at this position? It is precisely the value of the variable checkdepthl that we computed in the Backjump algorithm for that particular variable and value. In other words, it is the depth of the deepest constraint check that was performed when we last assigned the value akl to the variable zk. See the slide for a (symbolic) illustration. The depth of checking ak1 reaches only 1, that for ak2 reaches 2; if all checks are successful (the case of the next-to-last branch in the picture), we get k-1; for the last branch, we again get 2.
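The tabulation alternative described above is, in modern terms, memoization of the check function. A minimal sketch using Python's standard functools cache (the counter is only there to make the table's effect visible):

```python
from functools import lru_cache

calls = 0   # counts how often the check is actually computed

@lru_cache(maxsize=None)
def check(i, j, vi, vj):
    """Confused q-queens check: the queens in rows i and j must attack.
    lru_cache keeps the table (lemma generation); a repeated check is
    answered from the table (lemma application)."""
    global calls
    calls += 1
    return vi == vj or abs(vi - vj) == abs(i - j)

check(0, 2, 1, 3)   # computed and entered into the table
check(0, 2, 1, 3)   # answered from the table; `calls` stays at 1
```

As the text notes, this still performs a table lookup for every redundant check; backmarking goes further and skips such checks without even consulting a table.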
Backup(k) is a 1-dimensional array of size k: 1->n. It has one entry for each variable. Its value at position k contains the shallowest depth in the tree that we backtracked to since we last visited the values for the variable zk. Now let us see how we can use these two new variables. They have 2 very interesting and practical properties. For the first of these, assume that for some k and l we have that Checkdepth(k,l) < Backup(k). See the slide for an illustration of this. In this situation, first note that the constraint checking for the value akl must have ended in an 'x', not in a 'v'. The reason is that Backup(k) is at most k-1, since we backtracked at least one level. If all constraint checks for akl had been successful (= true or 'v'), then Checkdepth(k,l) would have been equal to k-1, which contradicts our assumption that Checkdepth(k,l) < Backup(k). Second, because of the inequality, we know that we have not backtracked as far back as any of the variables that were involved in those previous checks. So: all the results of all the constraint checks we did for akl MUST be the same as they were when we previously checked them. Moreover, there MUST be one final check that still fails. What can we conclude if we assume the inverse inequality, Checkdepth(k,l) >= Backup(k)? Look again at the drawing. In this case, the checks for akl reached at least as deep as Backup(k). This means that all the checks up to Backup(k)-1 must have been successful (= true or 'v'), and also that they must all still be true on our next visit to akl, because we haven't backtracked over the variables involved in those checks. These two properties are exploited in the BackM algorithm on the next slide. It is again a slight variant of the Backtr algorithm, but includes keeping track of the two new variables and using the two properties we discussed. We will not go into the details of this algorithm, but only show where and how these properties are used.
The first property is used at the very start of the ‘For’-loop. If the first property holds, then we know that assigning this value to this variable will lead to failure (as it did before). Thus, it makes no sense to do anything for this value. In the algorithm, this is expressed by only doing the remainder of the work if the first property does NOT hold. Otherwise, we just go to the end of the ‘For’-loop and pick the next value for this variable. The second property is used to determine the scope of the constraint checking. In the standard Backtr algorithm, we check constraints for all variables with smaller index, starting from z1. Here, we just start from the variable with index Backup(depth) instead. This means that we skip checking the constraints for variables with index lower than Backup(depth). This makes sense, because property 2 tells us that these constraint checks were true and are still true at this point. The remainder of the changes to the Backtr algorithm all have to do with how to update and how to pass on the two new variables. We will skip these issues here. Finally, note that the algorithm needs to be called with the variables Checkdepth and Backup both completely initialized with 1’s. This makes sure that we are not cutting away any checks at the start of the algorithm. 3.4 Results and discussion. It is hard to compare different optimizations of Backtr because it is always possible to find some application on which one performs better than the other. Nadel compared them on a number of applications that were more or less randomly chosen. Let us here just consider the confused 4-queens puzzle. A table with results is included on the slides. The first 2 rows are of little relevance: they give results for a fragment of the search space. Let us consider the last 2 rows instead. The first of these gives the number of nodes visited by the Backtr, BackJ and BackM algorithms respectively. Note that only the BackJ algorithm saves on visiting nodes.
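The use of the two properties can be sketched as follows. This is a minimal Python reconstruction, not the exact BackM of the slides: it assumes 0-based variable indices (so the "initialize with 1’s" of the text becomes initializing with 0’s), domains given as lists, and binary constraints as a predicate check(k, ak, j, aj); the names checkdepth and backup follow the text, the recursive control structure is our own.

```python
# Sketch of backmarking (the BackM idea). Assumptions: variables indexed
# 0..n-1, domains[k] lists the candidate values for zk, and
# check(k, ak, j, aj) tests the binary constraint between zk and zj.
def backmark(domains, check):
    n = len(domains)
    # checkdepth[k][l]: depth of the deepest check last done for value l of zk.
    checkdepth = [[0] * len(domains[k]) for k in range(n)]
    # backup[k]: shallowest level backtracked to since zk was last visited.
    backup = [0] * n
    assignment = [None] * n

    def solve(k):
        if k == n:
            return list(assignment)
        for l, value in enumerate(domains[k]):
            # Property 1: this value failed before against a variable that
            # has not changed since -> it must still fail, so skip it.
            if checkdepth[k][l] < backup[k]:
                continue
            # Property 2: checks below backup[k] are still valid, so we
            # only re-check from depth backup[k] upward.
            ok = True
            for j in range(backup[k], k):
                checkdepth[k][l] = j
                if not check(k, value, j, assignment[j]):
                    ok = False
                    break
            if ok:
                assignment[k] = value
                result = solve(k + 1)
                if result is not None:
                    return result
        # Backtracking over zk: variable z(k-1) will change next, so record
        # that level for zk and for every deeper variable.
        backup[k] = max(k - 1, 0)
        for j in range(k + 1, n):
            backup[j] = min(backup[j], max(k - 1, 0))
        return None

    return solve(0)
```

Because the skipped checks are exactly those guaranteed to give the same result as before, the sketch visits values in the same order as plain backtracking and finds the same first solution, only with fewer constraint checks.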
The last row gives the number of constraint checks computed by each algorithm. BackJ doesn’t explicitly save checks with respect to Backtr, but, because it saves visiting 2 nodes, it saves checking the 21 constraints that corresponded to those nodes in Backtr. The savings by BackM are entirely due to saving checks: 70 in total for this example. On the whole, BackM tends to be the best algorithm for most applications: more checks are saved, at relatively low overhead. An interesting question is whether the optimizations of BackJ and BackM could be integrated into 1 single optimized algorithm. Work has been devoted to this and solutions were found. However, the matter is never clear-cut. Because both these algorithms avoid computing some parts of the search tree, they also avoid computing some of the information that the other algorithm needs to do its optimization. As such, in integrated BackJ-BackM algorithms, you never get the full optimization of either of them, only part of it. In the remainder of this chapter we study some further alternatives in backtracking methods. In particular, we will study Intelligent backtracking and Dynamic search rearrangement. A further alternative, which has been used intensively in some knowledge-based systems, is ‘Dependency-directed backtracking’ (Doyle). This has mostly been used for search in logical reasoning systems that deal with incompletely represented knowledge (part of the information about the world is unknown). In those cases, it is necessary to perform some form of hypothetical reasoning, and the propagation of the effects of forming a hypothesis needs to be traced. Although introduced in a very different context than Intelligent backtracking, Dependency-directed backtracking has essentially the same technical characteristics. Therefore, we restrict our further discussion to Intelligent backtracking only. 4.
Intelligent backtracking (Bruynooghe) Intelligent backtracking is a general framework for defining more clever backtracking schemes. It is not one specific algorithm, as BackJ or BackM are. Instead, it gives a general strategy that may give rise to many different algorithms, depending on the choices that are made. The key idea in intelligent backtracking is the notion of the ‘no-good’. A no-good is a set of assignments of values to variables that cannot co-exist in any solution. We will illustrate this with examples below. The use of these no-goods in the approach is roughly as follows: during the construction of the OR-tree, collect, infer and store some no-goods; later, upon backtracking, use these no-goods to improve the backtrack behavior of the algorithm. Consider the first example in the slides. It represents the state of affairs at a particular time-point in the search for the 8-queens problem. Note that, compared to our earlier representation, the horizontal and vertical axes have been swapped here (without particular reason). The drawing represents a situation in which z1, z2, z3, z4 and z5 have obtained a value. We are in the process of assigning a value to z6. Each line in the drawing represents a no-good. In this case, these no-goods are trivial, because each corresponds to 1 specific constraint in the problem, which would be violated if the 2 assignments occurred together. These 8 different ‘basic’ no-goods are written out explicitly on the slide in the form of sets of non-allowed simultaneous assignments. Next, we infer another no-good. Note that z6 needs to obtain a value, and that there are only 8 possible values. Thus, we know that z6 = 1 or z6 = 2 or ... or z6 = 8. Now consider the set of assignments {z1 = 1, z2 = 3, z3 = 5, z4 = 2}. These are all the assignments for the variables z1, z2, z3 and z4 that occur in the basic no-goods listed before.
This set is clearly a no-good because, if all these assignments existed together, then all the left-hand side assignments in all the basic no-goods would hold. Thus, none of the right-hand side assignments in these basic no-goods is possible. Therefore, z6 cannot obtain a value. How can this no-good be used? It essentially provides the backtracking improvement discussed in backjumping. The no-good states that there is no point in backtracking over the value of z5, because the assignments of z1 to z4 themselves cannot co-exist in a solution. Thus: backtrack to the next value of z4 instead. A second example is shown on the next slide. Again, we have a state of the search in which z1 to z5 have obtained a value and we are at the point of assigning a value to z6. Again, we can derive 8 basic no-goods, due to individual constraints of the problem that would be violated if the 2 assignments were made together. As before, z6 needs to get one of the 8 possible values, and, as before, the set {z2 = 1, z3 = 4, z4 = 6, z5 = 3} is a no-good. If these assignments co-existed, then the left-hand sides of all basic no-goods would already be assigned, so there would be no possibility left to assign a value to z6. In this case, this no-good can be used later on, when we backtrack over the value of z1. If for a later assignment to z1 we again come to a situation where z2 to z5 obtain these same values, then we can backtrack immediately (disregarding z6). In a way, this is a simulation of the effect of backmarking. A bad partial assignment to the variables is remembered and stored, so that constraints do not have to be checked again when this partial assignment re-occurs. Note that it is not exactly like BackM, because it works through a form of tabulation. Of course, this only illustrates the concept of intelligent backtracking. The main problems that are left are: Which no-goods should we store? How should we systematically try to infer new no-goods from these?
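The storing and consulting of no-goods can be sketched as below: a minimal Python sketch, assuming no-goods are represented as sets of (variable, value) pairs. The inference of new no-goods (the hard part discussed in the text) is left out, and all names are our own.

```python
# Sketch: no-goods stored as frozensets of (variable, value) pairs. A
# partial assignment is doomed as soon as it contains some stored no-good.
nogoods = set()

def add_nogood(assignments):
    nogoods.add(frozenset(assignments))

def contains_nogood(partial):
    """True if the partial assignment (a dict) includes a stored no-good."""
    items = set(partial.items())
    return any(ng <= items for ng in nogoods)

# The inferred no-good from the first 8-queens example in the text:
add_nogood({("z1", 1), ("z2", 3), ("z3", 5), ("z4", 2)})
```

With this no-good stored, any later partial assignment that extends these four assignments can be rejected immediately, without re-checking any constraints on z5 or z6.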
When should we consult and use them? These problems are hard and may be solved differently in different types of applications. We will not discuss them here. 5. Dynamic search rearrangement. Yet another enhancement in backtracking techniques is dynamic search rearrangement. Similar to intelligent backtracking, dynamic search rearrangement is not a specific algorithm, but a general strategy. It can be applied to most other backtracking variants. The idea is that we drop the assumption that the variables are ordered in a fixed way. Instead, we select the order of the variables dynamically during the construction of the tree. What we hope to gain by this is a reduction of the size of the tree that needs to be explored. In particular, we aim to reduce the branching factor and/or cut off failing branches as soon as possible. The underlying principle that will be used to find a good ordering of the variables is the first-fail principle. This principle states: if the assignment of a value from domain Di to zi is more likely to cause failure than the assignment of a value from domain Dj to zj, then assign to zi first. The intuition behind this heuristic principle is illustrated in the next slides. The first slide shows a situation in which our guess that assignment to zi would cause failure was correct. Assume that assignment to zj would not lead to (immediate) failure – in the case of the example, Dj has 2 values for zj for which the constraints do not fail. In this case the gain of selecting zi first is obvious from the drawing. On the left, only the values for zi need to be enumerated to detect the failure. On the right, we need to construct a tree with approximately three times more nodes to reach the same conclusion. The next slide shows a situation in which our guess was only partly correct.
Namely, assume that assignment to zi doesn’t result in immediate failure (as we hoped), but that zi allows fewer values for which the constraints hold than zj. Even in this case we gain. In the drawing, zi allows 1 successful value, while zj allows 3 successful values. Selecting zi first clearly leads to a smaller tree. In the selection on the right, the subtree for zi is repeated three times. In the one on the left, we only do these constraint checks once. Of course, the problem is to decide how to apply the first-fail principle: what are good guesses as to whether one variable will lead to failure sooner than another? There are general heuristics to help us with this, as well as application-specific ones. On the level of the general heuristics, there are two basic ones. The first: select the zi with the smallest domain Di. Because zi has fewer possible values, it is likely to have fewer successful values too (the successful branches are a subset of the branches). A second heuristic is: select the zi which occurs in the highest number of non-trivial constraints. This may seem strange at first sight. We had agreed that a constraint problem would have 1 constraint for each 2 different variables. So, all variables would seem to occur in equally many constraints. However, for most problems, a large number of constraints c(zi,zj) will just be the constraint ‘true’, which always holds. In other words: in most problems there are only constraints between some of the pairs zi, zj. Introducing the constraints c(zk,zl) = ‘true’ for all others is just a matter of getting uniformity in our presentation of the methods. By selecting the variable occurring in the largest number of non-trivial constraints, we reduce the chances of that variable getting successful solutions (at least, given that the size of its domain is the same as for the other variables). Apart from these general heuristics, the problem at hand may give you additional information on the crucial (more constrained) variables.
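The two general heuristics can be combined into one selection function. A minimal sketch in Python, assuming the remaining domains are kept in a dict and that a count of non-trivial constraints per variable is available; both names are our own assumptions:

```python
# Sketch of first-fail variable selection: smallest domain first, ties
# broken by the number of non-trivial constraints the variable occurs in
# (most constrained first). Tuples compare left to right, so domain size
# dominates and the negated count only decides ties.
def select_variable(domains, nontrivial_count):
    return min(domains,
               key=lambda v: (len(domains[v]), -nontrivial_count[v]))
```

For example, with domains {"A": {1,2,3}, "B": {1,2}, "C": {1,2}} and counts {"A": 3, "B": 1, "C": 2}, the function picks "C": B and C tie on domain size, and C occurs in more non-trivial constraints.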
It is important to extract such information from your specification and apply it to control your problem solving method. An interesting question to think about is whether or not a backtracking algorithm augmented with dynamic search rearrangement is actually still complete. What is meant here is: considering that after backtracking to a previous variable you are free to choose a completely different variable than before, are we still considering all the possibilities for assignments? The answer is yes. It is a good exercise to convince yourself of this. As a final comment, apart from reordering the variables to decrease the size of the search space, we can also change the order in which values from the domains are assigned to the variables. Given the domain {1,2,3,4}, we could take the standard order of assigning them in the order in which they occur, we could reverse this order, we could assign them in a random order, etc. In some cases, this again affects the efficiency of the search very much. In the case of n-queens for instance, a random order most often improves the speed at which a first solution is found. However, finding good heuristics for this order is usually very difficult. Experimentation with different orders is often the only way to optimize. 6. Arc-consistency or relaxation techniques. The principle of arc-consistency or relaxation was already explained in the subsection on the network representation. We aim to eliminate some values from the domains of certain variables. We do this by verifying that such a value is inconsistent with all the values in the domain of another variable, under the basic constraint imposed between them. However, there are many ways in which one can use this principle. In fact, there are some 10 (or more?) arc-consistency algorithms around. They are usually referred to as AC1, AC2, AC3, …. Two very simple ones will be illustrated further on. Let us start with an example to illustrate the ideas and methods.
The example is the 4-houses puzzle. It is completely described on the slide. In the next slide, we represent the problem as a constraint problem, defining the variables, domains and constraints. Note that this is slightly different from the definition of a constraint problem that we provided earlier: there are constraints here that are defined on only 1 variable, in particular the constraints C =/= 4 and D =/= 2. We will deal with these in a preprocessing phase that eliminates them, mapping the problem fully into our earlier definition. On the same slide we see the network representation of the problem. Again, there is a slight change, in that constraints defined on 1 single variable are added as extra labels on the nodes. Moving to the next slide, we see how these 1-variable constraints are dealt with. This is done in a phase ensuring ‘node-consistency’ or 1-consistency. For each variable in the problem, we eliminate all the values in its domain that do not satisfy the constraint on that variable. Concretely, for this example, the value 4 is removed from the domain of C and the value 2 is removed from the domain of D. This gives us a new network representation, which is now completely within the scope of our formal definition of a constraint problem. On the next slide, we show the most trivial arc-consistency (or 2-consistency) algorithm. It is appropriately referred to as AC1 (the numbers added to AC tend to increase with the level of refinement of these algorithms). The method was developed by Mackworth, just as the next one that we will see later on. We omit comments on the forward check, the look ahead check and AC1. These should be clear from the slides. There are some comments explaining the forward check and the look ahead check below. AC1 is illustrated on the 4-houses puzzle in the next slides. We need three traversals of the queue. At the third traversal, nothing changes anymore: we have reached a consistent set of domains.
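The AC1 scheme (repeat a full pass over all the constraints until nothing changes) can be sketched in Python as follows; the constraint predicate c(i, a, j, b) over variable indices and all other names are our own assumptions, not the slides’ notation.

```python
# Sketch of AC1: revise every constraint on every pass, and repeat the
# whole pass as long as some domain changed.
def ac1(domains, c, constraints):
    changed = True
    while changed:                      # the 'Repeat'-loop
        changed = False
        for (i, j) in constraints:
            # Keep only the values of zi supported by some value of zj...
            new_i = [a for a in domains[i]
                     if any(c(i, a, j, b) for b in domains[j])]
            if len(new_i) < len(domains[i]):
                domains[i] = new_i
                changed = True
            # ...and symmetrically for zj.
            new_j = [b for b in domains[j]
                     if any(c(i, a, j, b) for a in domains[i])]
            if len(new_j) < len(domains[j]):
                domains[j] = new_j
                changed = True
    return all(domains)    # False if some domain was wiped out
```

A tiny usage example: for the single constraint z1 < z2 with both domains {1,2,3}, one pass shrinks the domains to {1,2} and {2,3}, and the second pass confirms the fixpoint.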
Note that the result of AC1 is NOT a solution to the constraint problem. Each domain still contains 2 elements. Although we know that for each of these elements there is a value in the other domains making the linking constraints true, we do not know which combination of these values makes ALL the constraints true. In fact, it could very well be that there is no solution to the constraint problem, in spite of the fact that AC1 returns non-empty domains. AC1 ensures local consistency, but not global consistency. Of course, if we now activate a backtracking search on the reduced domains, the search space is much smaller than before, making the problem easier to solve. The AC1 algorithm is immensely inefficient. At each pass through the ‘Repeat’-loop, all constraints are checked again. There may be no need to do this. If we removed values from the domain of zi due to checking the consistency of c(zi,zj), there is no immediate reason to reconsider a constraint c(zk,zl). The second algorithm, AC3, is more economical in the way it adds previously visited constraints back to the queue. Initially, the queue contains all the constraints again. In the ‘While’-loop, we again take 1 constraint out of the queue, say c(zi,zj). Again, we remove all values from the domains of zi and zj which are inconsistent with the other domain. Finally, if the domain of either zi or zj has changed, we add all the constraints in which that variable (the one for which deletions occurred) occurs back to the queue. Then, if the queue is not empty, we continue with the current queue. The next slides revisit the 4-houses puzzle. Note that ‘add constraints to the queue’ actually means: check whether the constraint is already in the queue and, if it isn’t, add it. In the concrete version illustrated here, constraints are added to the back of the queue. Note that the algorithm performs far fewer constraint checks. In particular: AC3 visits 9 constraints in total, while AC1 visits 18 (in this example).
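AC3 as just described can be sketched as below. We use the classic directed-arc formulation, in which each constraint appears as two arcs (i, j) and (j, i); this is equivalent to the undirected queue of the slides, and all names are our own.

```python
from collections import deque

# Sketch of AC3: a worklist of arcs (i, j). After removing values from the
# domain of zi, only the arcs pointing at zi are put back on the queue.
def ac3(domains, c, arcs):
    queue = deque(arcs)
    while queue:
        (i, j) = queue.popleft()
        # Remove every value of zi that has no support in zj's domain.
        supported = [a for a in domains[i]
                     if any(c(i, a, j, b) for b in domains[j])]
        if len(supported) < len(domains[i]):
            domains[i] = supported
            if not domains[i]:
                return False            # domain wipe-out: inconsistent
            for (k, m) in arcs:         # re-examine only arcs into zi
                if m == i and k != j and (k, m) not in queue:
                    queue.append((k, m))
    return True
```

On the z1 < z2 example, the two arcs each get revised once and nothing is re-queued, yielding the same reduced domains as AC1 with fewer checks.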
In principle, the problem of ‘locality’ of arc-consistency can be reduced. Node-consistency is a very local step: it only considers 1 individual variable and makes sure that the domain of that variable contains only values consistent with the constraint on that variable. Arc-consistency (also called 2-consistency) is still very local: only the relations imposed by individual constraints between 2 variables are enforced on the domains. We could go further. We could pick out 3 variables, say zi, zk and zl, from the problem and consider all the binary constraints that relate these 3 variables. Then, we could define a value aij for zi to be consistent if there exist values akn and alm for zk and zl such that all the binary constraints connecting the 3 variables hold. If aij is not consistent in this sense, then aij is removed from the domain. This can of course be further generalized to k-consistency, with k any natural number larger than 0. Note that 4-consistency trivially solves the 4-houses puzzle, in the sense that, if all domains remain non-empty after checking 4-consistency, then the problem has a solution. This does not necessarily mean that the domains returned by 4-consistency would be singletons, though. One thing should be observed: computing k-consistency, for k > 2, is very complex. There are no known efficient techniques for doing this. Thus, in practice, people restrict themselves to arc-consistency. 7. Hybrid Backtrack-Consistency techniques. It should be clear by now that neither the backtrack techniques nor the consistency techniques by themselves are optimal for dealing with these problems. Backtracking usually needs to search very large OR-trees, and – in the absence of consistency methods – if the domains become large, we get exponential behavior rooted in very large branching factors. Remember that with n variables and b values for each variable, the number of nodes in the OR-tree is b^n, where b is also the branching factor of the tree.
Consistency techniques alone aren’t that powerful either, since, in general, they do not result in a solution. What we need is a combination of backtracking and consistency techniques. We already mentioned that we could solve a problem by first applying arc-consistency and then applying backtracking on the resulting domains. The alternative, which is more commonly applied, is to combine them the other way around: do a backtrack search, but after each assignment, interrupt the backtrack search to perform a consistency check. In such hybrid backtrack-consistency techniques, the consistency checking is usually reduced to a simpler and less powerful check than AC1 or AC3. The reason for this is that methods like AC1 and AC3 are computationally rather expensive. You do not want to activate such an expensive consistency check after each assignment of a value to a variable. Most likely, the number of values removed from the domains would not be proportional to the computation cost of the AC activation. We will illustrate 2 hybrid BT-consistency algorithms: forward checking and lookahead checking. There are many more around, but the ones we will study here are very well known and tend to be useful in many different applications. To introduce the forward checking algorithm, let us first discuss the ‘simplified’ consistency technique it relies on. The check is forward-check(zi). This check activates every constraint in which zi occurs exactly once. More specifically, for every constraint c(zi,zj) or c(zj,zi), it removes all the values from the domain Dj of zj which are not consistent with the value ai for zi. Note that this is a very weak and very inexpensive check (compared to AC1 and AC3). Forward checking now works as follows: apply standard backtracking, but, after each assignment of a value ai to a variable zi, apply forward-check(zi). The algorithm is applied to the 4-houses puzzle in the slides.
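Forward checking as just described can be sketched as follows: standard backtracking where each assignment prunes the domains of the future variables instead of being checked against the past ones. A minimal Python reconstruction, assuming a binary constraint predicate c(i, a, j, b); names are ours.

```python
# Sketch of forward checking. Domains are copied at each level, so that
# backtracking automatically restores the pruned values.
def forward_checking(domains, c):
    n = len(domains)

    def solve(k, doms):
        if k == n:
            return [d[0] for d in doms]       # every domain is a singleton
        for value in doms[k]:
            new = [list(d) for d in doms]
            new[k] = [value]
            ok = True
            for j in range(k + 1, n):         # forward-check(zk)
                new[j] = [b for b in new[j] if c(k, value, j, b)]
                if not new[j]:                # empty domain: backtrack
                    ok = False
                    break
            if ok:
                result = solve(k + 1, new)
                if result is not None:
                    return result
        return None

    return solve(0, domains)
```

Note how the observation from the text shows up in the code: there is no check against earlier variables at all, because any value inconsistent with them has already been pruned from the current domain.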
Note how the domains of B, C and D are already very strongly reduced in size after the first assignment to A. Eventually, we end up with an OR-tree with only 9 nodes before the first success is obtained (and only a few more in the entire tree). Observe also that in this type of algorithm, backtracking occurs as soon as some domain becomes empty. As a final observation, note that the checking of the constraints c(zi,zj) that we did at each level in the definition of the OR-tree has disappeared here. There is no more need for it. Once we assign a value to a variable, it is already consistent with the values assigned to variables at earlier stages, because the forward check removes the inconsistent values from all domains of variables that remain to be assigned. As a second hybrid BT-consistency technique, we discuss looking ahead. The consistency check here is the look-ahead check. Look-ahead is more expensive than the forward check, but, on the other hand, it does more work. As a result, more values tend to be removed by the check and the branching factor of the backtrack search is further reduced. The look-ahead check activates every constraint c(zi,zj) of the problem exactly once, and removes all the inconsistent values from Di and Dj for that constraint. The best way to understand this is to look at the AC1 algorithm, but to imagine that it would stop after having traversed the queue just once. Lookahead checking then proceeds as follows: first do a look-ahead check; then apply standard backtracking, but after each assignment, apply the look-ahead check again. The algorithm is illustrated on the 4-houses puzzle in the next slide. Note that we now have only 6 nodes left in the OR-tree. There is a trade-off here. Forward checking spends little effort at each node, imposing only a very weak form of consistency. This comes at the cost of a slightly larger search tree for the backtrack part.
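The look-ahead check itself, a single traversal of AC1’s queue, can be sketched as below, using the same assumed constraint predicate c(i, a, j, b); since the slides’ exact formulation is not reproduced in the text, this is our reconstruction from the description "AC1 stopped after one pass".

```python
# Sketch of the look-ahead check: every constraint is revised exactly once,
# in both directions, with no second pass and no re-queueing.
def look_ahead_check(domains, c, constraints):
    for (i, j) in constraints:
        domains[i] = [a for a in domains[i]
                      if any(c(i, a, j, b) for b in domains[j])]
        domains[j] = [b for b in domains[j]
                      if any(c(i, a, j, b) for a in domains[i])]
        if not domains[i] or not domains[j]:
            return False       # some domain became empty: backtrack
    return True
```

Plugging this check in where forward-check(zi) was called gives the lookahead checking algorithm: more pruning per node, at a higher per-node cost.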
Lookahead checking does more work to get a stronger consistency at each node, with a smaller resulting tree. Which of these is best depends on the specific problem and is very hard to predict. One general heuristic is that, if the constraints are such that the value of one variable constrains the possible values of the other variable very heavily (for instance, the constraint B = A + 1 in the example), then it might be better to apply a stronger consistency check. This may pay off in much more removal of domain values. If the constraints are relatively weak, it may be better to only propagate the effect of the last assigned variable to its neighboring variables (= forward check). The above techniques were both defined in terms of standard backtracking. In principle it is also possible to combine more advanced backtrack schemes with consistency checking. In particular, the use of dynamic search rearrangement in forward checking or looking ahead is strongly recommended. Because of the dynamic elimination of values from domains, some domains may become much smaller than others at some points in the computation. Of course, it pays off greatly to select such variables first. Also other optimizations, such as picking a good strategy for which values to assign to the variable first, are frequently used in combination with the above methods. The hybrid BT-consistency techniques discussed here are very competitive with other methods for solving complex combinatorial problems. There are several alternative approaches that we did not discuss here. If the dimensions of the problem are not excessively high and assuming that all constraints are linear equations, a valid alternative is to apply linear programming techniques. Specifically in the context where the solution needs to be optimized in terms of some maximization or minimization function, linear programming may be a very good alternative.
In the context of non-linear constraints or excessive problem dimensions, the above techniques tend to be the only feasible option. For typical problems in scheduling or rostering (think of the scheduling of trains, flights or exams, or of building time-tables for the personnel of a large company), these techniques are increasingly applied in practice. As a final concluding comment, the selection of the appropriate algorithm (which combination of techniques to apply for a specific application at hand: Should we use forward checking or looking ahead? Should we use dynamic search rearrangement and, if so, with which strategy? Should we order the values in domains in a particular way? Etc.) may seem problematic. The developer cannot be expected to first write his program with a number of these choices in mind and then, if the choices turn out to be bad, re-develop the system again for completely different choices. The answer to this is given by the programming languages that support constraint problem solving. In Constraint Logic Programming languages, the language for defining the constraints in the problem is kept completely separate from the language for selecting the constraint solving method. The constraints themselves are defined in logic formulae, in particular, in Horn clause logic (see other parts of this course). The constraint solving technique is selected with a separate declaration language. As such, it is easy to test one particular problem solving strategy on your problem and, if it is not satisfactory, adapt only some declarations to experiment with another. This separation of the logic and the control is essential for the success of these techniques in the considered application domains. 8. Non-numerical constraint processing. Constraint processing methods are in no way restricted to numerical applications. In some applications, the possible values that variables can take are described as just sets of (possibly symbolic) data.
As long as there is some way of accurately describing the constraints that relate the variables, most techniques studied here are extendible. In particular, we will briefly study some applications of symbolic constraint processing for 3-dimensional interpretation of line drawings and for disambiguation of the semantics of natural language sentences at the end of this chapter. One (tiny) example that moves in the direction of non-numerical constraint processing is in theorem proving or logical reasoning systems. Consider the slides on truth-propagation nets. In these nets, variable boxes are related to propositional logic formulas. For each propositional formula, we have an associated variable box. The value of the variable box can be either: unknown (this variable did not receive any value as yet), true or false. Instead of adder boxes or multiply boxes, we now have truth-propagation boxes, which represent the relations between propositional formulas and their sub-formulas. In the example, the truth-propagation boxes are both related to the implication symbol. They connect an implication to its antecedent and consequent. Again, these boxes can be activated in various directions. We can even compute a complete set of propagation rules that give us all cases in which truth-values of some of the connecting boxes allow propagation to others. These are presented in the slide’s overlay. Note that constraint propagation through truth-propagation nets has been the basis of a widely influential technique for building logical reasoning systems, called ‘truth-maintenance’ (or ‘assumption-based truth-maintenance’) systems. Other examples of non-numerical constraint propagation are illustrated in some additional case studies later on. 9. Bayesian networks and probability nets. In some applications, the variables in your problem representation may describe probabilities with which certain properties hold.
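For the implication boxes of the example, such a propagation rule set can be sketched as below, assuming truth values True, False and None (= unknown). These are the standard rules for material implication; the slide’s overlay may present them differently.

```python
# Sketch of one truth-propagation box for A -> B. Each call performs one
# propagation step and returns the (possibly updated) three truth values.
def propagate_implication(a, imp, b):
    if imp is True:
        if a is True and b is None:
            b = True             # modus ponens: A and A->B give B
        elif b is False and a is None:
            a = False            # modus tollens: not-B and A->B give not-A
    if a is True and b is False and imp is None:
        imp = False              # a true antecedent with a false consequent
    if (a is False or b is True) and imp is None:
        imp = True               # a false antecedent or a true consequent
    return a, imp, b
```

Like the multiply boxes of the numeric nets, the same box propagates in several directions: from antecedent to consequent, from consequent back to antecedent, or toward the truth value of the implication itself.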
In such cases, variable boxes for interdependent properties may be connected by yet another type of propagation box: ‘probability propagation boxes’. In these boxes we express the probability laws that connect the dependent concepts. Depending on the direction in which the propagation is performed, again, different probability equations may be used to enforce the given probability law. Constraint propagation nets of this type are referred to as Bayesian nets or probability nets. On the Geninfer slide, we show one particular application of probability nets. Geninfer is a system that provides advice on the probability that individuals have hemophilia. Hemophilia is a genetically carried disease. It is carried by X chromosomes only. Women have 2 X-chromosomes. If one of these is a hemophilia-defective X chromosome, then the woman is a carrier of the disease, but she does not show any signs of the disease herself. Men have 1 X and 1 Y-chromosome. If they have a hemophilia-defective X chromosome, then they are hemophiliacs. Because every child inherits 1 of its mother’s X-chromosomes, if the mother is a carrier, then there is a 0.5 probability that the diseased X-chromosome is carried over to the child. So, for female children: 0.5 probability of becoming a carrier; for male children: 0.5 probability of having the disease. The problems given to Geninfer are of the following type. Suppose that for at least one ancestor in a family tree it is known that he/she was/is a hemophiliac or carrier. What is the chance that a newly born child in the family will have/carry the disease? The propagation is clearly over probabilities. For at least one person in the family tree, it is known that he/she has/carries the disease. Thus, the variable representing the probability for this person has the value 1. Most often, for a number of other people in the family tree it will be known that they do not have/carry the disease.
For instance: a grandfather who did not show the signs of the disease. In such cases the corresponding variable has value 0. For yet other people in the tree, it is unknown whether they had/carried the disease or not. The variables corresponding to these people are uninstantiated at first. It is the job of Geninfer to propagate the values from the known variables to probabilities for the unknown variables. As a very simple example, considered in the slide: if the known information is that the great-uncle was diseased and the grandfather and father are ok, then the probability of the grandmother being a carrier is 0.5, of the mother being a carrier is 0.25, and of the child being a carrier/having the disease is 0.125. Much more complex is the propagation when it is also known that there are healthy uncles and brothers of the child. In that case, complex probability rules allow us to diminish the probabilities for the grandmother, mother and child (for the child: 0.028 if the uncles are ok and nothing is known about brothers; 0.007 if both uncles and brothers are ok). 10. An illustration: interpretation of line-drawings (Winston). See the enclosed extract from Winston’s book. 11. An illustration: disambiguation of natural language (Winston). See the enclosed extract from Winston’s book.
