Integration of Artificial Intelligence and Operations Research for (PowerPoint download)

					     Solving Large-Scale Computational
       Problems Using Insights from
             Statistical Physics

                               Bart Selman
                   Dept. of Computer Science
                       Cornell University



Joint work with Carla Gomes.
    Computational Challenges

Many core computational tasks have been
shown to be computationally intractable,
i.e., solution time scales exponentially with
problem size.

We have results in e.g.
       Reasoning (logical and probabilistic)
       Planning and Scheduling
       Machine Learning
       Hardware and Software Design
           Exponential Complexity Growth
     Planning (single-agent):
  find the right sequence of actions


HARD: 10 actions, 10! ≈ 3.6 x 10^6 possible plans


   Contingency planning (multi-agent):
          actions may or may not
         produce the desired effect!

   [Figure: branching plan tree; 1 out of 10 actions at the first step, 2 out of 9 at the second, 4 out of 8 at the third, …]

   REALLY HARD: 10 x 9^2 x 8^4 x 7^8 x … x 2^256 ≈
             10^224 possible contingency plans!

   [Figure labels: exponential, polynomial]
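The two counts above can be checked with exact integer arithmetic. This is a minimal sketch; the variable names are illustrative, and the loop simply follows the product as written on the slide (the exponent doubles at each step).

```python
import math

# Single-agent planning: orderings of 10 distinct actions.
single_agent_plans = math.factorial(10)

# Contingency planning, following the slide's product
# 10^1 x 9^2 x 8^4 x ... x 2^256: the exponent doubles at each step.
contingency_plans = 1
for step, choices in enumerate(range(10, 1, -1)):
    contingency_plans *= choices ** (2 ** step)

print(single_agent_plans)                 # 3628800
print(len(str(contingency_plans)) - 1)    # power of 10 of the product: 224
```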
          Computational Complexity Hierarchy

  (Hard)
  EXP:     EXP-complete: games like Go, “chess”, …
  PSPACE:  PSPACE-complete: QBF, adversarial planning, …
  PH
  NP:      NP-complete: SAT, scheduling, graph coloring, …
  P:       P-complete: circuit-value, …
           In P: sorting, shortest path, compilers, databases, …
  (Easy)

Note: widely believed hierarchy; we know P ≠ EXP for sure.
An abundance of negative complexity results in comp. sci.
 (literally 10,000+).
Results often apply to very restricted formalisms, and also to
 finding approximate solutions.

However, results are based on a worst-case analysis and there
 continues to be a debate on their practical relevance.

Contradictory experiences with practical algorithms.

Question: When and where do computationally
 hard instances show up?
           New Developments

A --- A better understanding of the nature of
      computationally hard problems.

B --- New solution methods.
                     Overview
PART A. Computationally Hard Instances
   worst-case vs. average-case
   critically-constrained problems
   phase transitions
       (starts connection with statistical physics)

PART B. New Solution Methods
   Survey Propagation (derived from cavity equations)
   Solution clustering
   More structured problems

Summary
PART A. Computationally Hard Instances

I’ll use the propositional satisfiability problem (SAT)
  to illustrate ideas and concepts throughout this talk.

SAT: prototypical hard combinatorial search and
 reasoning problem.

 More general concept: constraint satisfaction problems.
    Boolean Satisfiability Problem (SAT)
SAT: Set of Boolean variables with domains {0,1} or {true, false}
    with logical constraints between the variables.

k-SAT: All constraints are logical “ORs” with exactly
     k variables each.

Example 3-SAT formula/instance:

    F = (¬x ∨ y ∨ z) ∧ (x ∨ ¬y ∨ z) ∧ (x ∨ y ∨ ¬z)
    (clauses labeled α, β, γ from left to right)
Read as: ((NOT x) OR y OR z) AND (x OR (NOT y) OR z) AND … etc.

Example satisfying assignment: x = False, y = False, z = False.

But, the assignment x = False, y = False, z = True does not satisfy
the formula.
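The two assignments above can be checked mechanically. A minimal sketch, assuming a clause is represented as a list of (variable, is_positive) pairs; the representation and the name `satisfies` are illustrative choices, not part of the talk.

```python
# Clauses as lists of (variable, is_positive) literals; F is the slide's formula.
F = [
    [("x", False), ("y", True), ("z", True)],   # (NOT x) OR y OR z
    [("x", True), ("y", False), ("z", True)],   # x OR (NOT y) OR z
    [("x", True), ("y", True), ("z", False)],   # x OR y OR (NOT z)
]

def satisfies(formula, assignment):
    """True if every clause has at least one literal made true by assignment."""
    return all(
        any(assignment[var] == positive for var, positive in clause)
        for clause in formula
    )

print(satisfies(F, {"x": False, "y": False, "z": False}))  # True
print(satisfies(F, {"x": False, "y": False, "z": True}))   # False
```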
Computational task: Given a k-SAT instance (formula) find an
 assignment that satisfies all constraints or show that no such
 assignment exists.

With N Boolean variables, we have a search space of 2^N.

Complexity class: NP-Complete. (Cook 1971)

$1 million prize for providing a polytime algorithm or showing
  none exists. (P vs. NP. Clay Millennium Prize problem.)

SAT provides a general problem encoding language.
E.g., exam scheduling:
  x_1 for “CS2800 exam @ Mon 7pm”
  x_2 for “CS2800 exam @ Mon 8pm”
  x_3 for “CS2800 exam @ Tue 7pm”

  (x_1 ∨ x_2 ∨ x_3) ∧ (¬x_1 ∨ ¬x_2) ∧ (¬x_1 ∨ ¬x_3) ∧ (¬x_2 ∨ ¬x_3)
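An "exactly one time slot" constraint of this kind can be generated for any number of slots. The sketch below assumes signed integers for literals (3 means x_3, -3 means NOT x_3); the helper names are hypothetical.

```python
from itertools import combinations, product

def exactly_one(variables):
    """CNF clauses (lists of signed ints) forcing exactly one variable true."""
    at_least_one = [list(variables)]
    at_most_one = [[-a, -b] for a, b in combinations(variables, 2)]
    return at_least_one + at_most_one

def models(clauses, variables):
    """Brute-force all satisfying assignments (fine for a handful of variables)."""
    found = []
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses):
            found.append(assignment)
    return found

clauses = exactly_one([1, 2, 3])        # 1, 2, 3 stand for x_1, x_2, x_3
print(clauses)                          # [[1, 2, 3], [-1, -2], [-1, -3], [-2, -3]]
print(len(models(clauses, [1, 2, 3])))  # 3
```

Each of the three models sets exactly one of the variables to True, i.e. the exam gets exactly one slot.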
5,000+ NP-complete problems identified so far (including
  scheduling and planning problems, hardware and software
  verification, protein folding, graph coloring, etc.)

All NP-complete problems are fully equivalent from a
 computational perspective.

Key concept: Polynomial time reductions.

Aside: Quantum computer most likely won’t help.
                  Exponential Complexity Growth:
                       The Challenge of Complex Domains

Note: rough estimates, for propositional reasoning

[Figure: case complexity (y-axis; ticks at 10^30, 10^47, 10^3010, 10^6020, 10^150,500, 10^301,020) vs. number of variables (x-axis; 100, 10K, 20K, 100K, 1M), with the number of rules (constraints) given per domain. Reference marks on the y-axis: protein folding calculation (petaflop-year), seconds until heat death of sun, no. of atoms on the earth.]

  Domain                        Variables   Rules (Constraints)
  Car repair diagnosis          100         200
  Deep space mission control    10K         50K
  Chess (20 steps deep)         20K         100K
  Military Logistics            100K        450K
  VLSI Verification             0.5M        1M
  Multi agent systems           1M          5M

 [Credit: Kumar, DARPA; Computer World magazine]
How well can SAT be solved in practice?
         Generating Hard Random
                Formulas

Generate M clauses uniformly at random.

Critical parameter: ratio of the number of clauses
                    to the number of variables (M/N).

Hardest 3SAT problems at ratio = 4.3
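A minimal sketch of this generator, together with a brute-force check of how often tiny formulas are satisfiable at different ratios. Assumptions not stated on the slide: each clause uses k distinct variables, each literal is negated with probability 1/2, and M is rounded from ratio x N. With only 10 variables the threshold is very blurred, so the output is only indicative.

```python
import random
from itertools import product

def random_ksat(n_vars, ratio, k=3, rng=random):
    """M = round(ratio * N) clauses, each over k distinct variables with random signs."""
    clauses = []
    for _ in range(round(ratio * n_vars)):
        chosen = rng.sample(range(1, n_vars + 1), k)
        clauses.append([v if rng.random() < 0.5 else -v for v in chosen])
    return clauses

def is_satisfiable(clauses, n_vars):
    """Exhaustive check over all 2^N assignments; only sensible for tiny N."""
    for values in product([False, True], repeat=n_vars):
        if all(any(values[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def sat_fraction(n_vars, ratio, samples, seed=0):
    """Fraction of sampled random formulas that are satisfiable."""
    rng = random.Random(seed)
    hits = sum(is_satisfiable(random_ksat(n_vars, ratio, rng=rng), n_vars)
               for _ in range(samples))
    return hits / samples

for ratio in (2.0, 4.3, 8.0):
    print(ratio, sat_fraction(10, ratio, samples=20))
```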
                      Hardness of 3SAT

[Figure: DP calls (y-axis, 0 to 4000) vs. ratio of clauses-to-variables (x-axis, 2 to 8), one curve each for 20, 40, and 50 variables.]
             Intuition

At low ratios:
      few clauses (constraints)
      many assignments
      easily found

At high ratios:
      many clauses
      inconsistencies easily detected

In between:
      critically constrained
                         The 4.3 Point

[Figure, two panels sharing the x-axis, ratio of clauses-to-variables (2 to 8). Top: DP calls (0 to 4000) for 20, 40, and 50 variables. Bottom: probability (0.0 to 1.0), with the 50% sat point marked.]
                         Mitchell, Selman, and Levesque 1991

[Figure: 200 var 3-SAT]
          Exact Location of Threshold
Surprisingly challenging problem ...
Current rigorously proved results:
     3SAT threshold lies between 3.42 and 4.506.
     Motwani et al. 1994; Broder et al. 1992; Frieze and Suen 1996;
     Dubois 1990, 1997; Kirousis et al. 1995; Friedgut 1997;
     Beame, Karp, Pitassi, and Saks 1998;
     Bollobas, Borgs, Chayes, Han Kim, and Wilson 1999, 2001;
     Achlioptas, Beame and Molloy 2003; Frieze 2001;
     Kirousis et al. 2006; Achlioptas et al. ’05, ’07; and ongoing…

Using techniques from statistical physics: 4.26
    (disordered systems; replica / cavity method;
    energy = num. of violated constraints)
     (Monasson and Zecchina ’97; Biroli, Monasson, and
     Weigt ’00; Zecchina et al. ’05)

Empirical: 4.25
   Mitchell, Selman, and Levesque ’92, Crawford ’93.
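  Finite-Size Scaling For 3SAT

[Figure, top panel: fraction of formulae unsatisfied (y-axis, 0 to 1.0) vs. M/N (x-axis, 3 to 7), one curve each for N = 12, 20, 24, 40, 50, 100. SAT phase at low M/N, UNSAT phase at high M/N. Annotation: “Slow Down” Transition for High N.]

[Figure, bottom panel: the same data against the rescaled ratio (x-axis, -10 to 20; y-axis, 0.0 to 1.0).]

Phase Transition for 3-SAT, N = 12 to 100
Data Rescaled Using α_c = 4.17, ν = 1.5
(Kirkpatrick and Selman, Science, May 1994)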
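The rescaling itself is not spelled out on the slide. A minimal sketch, assuming the usual finite-size scaling form y = N^(1/ν) (α - α_c) / α_c with α = M/N; the function name and default parameters are illustrative, with the defaults taken from the 3-SAT caption.

```python
def rescale(ratio, n_vars, alpha_c=4.17, nu=1.5):
    """Rescaled control parameter y = N^(1/nu) * (ratio - alpha_c) / alpha_c."""
    return n_vars ** (1.0 / nu) * (ratio - alpha_c) / alpha_c

# Larger N stretches the axis around alpha_c, which is why the raw curves
# sharpen with N but can collapse onto one curve after rescaling.
for n in (12, 24, 50, 100):
    print(n, [round(rescale(r, n), 2) for r in (3.0, 4.17, 5.0, 6.0)])
```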
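               Finite-Size Scaling For 4SAT

[Figure, top panel: fraction of formulae unsatisfied (y-axis, 0 to 1.0) vs. M/N (x-axis, 6 to 16), one curve each for N = 12, 24, 50, 65. SAT phase at low M/N, UNSAT phase at high M/N. Annotation: “Slow Down” Transition for High N.]

[Figure, bottom panel: the same data against the rescaled ratio y (x-axis, -20 to 40; y-axis, 0.0 to 1.0). Universal Form: e^(-2^(-y)).]

Phase Transition for 4-SAT, N = 12 to 65
Data Rescaled Using α_c = 9.7, ν = 1.25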
                       Recap

Computationally hard problem instances

Hardest ones are critically-constrained.

Under- and over-constrained ones can be
 surprisingly easy.

Critically-constrained instances at phase-
  transition boundaries.
      Properties of transition can be analyzed with
      tools from statistical physics.
Critically-constrained --- Practical relevance

    Airline fleet scheduling (example, Nemhauser ’96)
         Delta Airlines aircraft scheduling
             heuristic solution: 395 planes
             optimal solution (five months of computation):
                                  394.5 planes :-)
    Why? Economic factors had driven problem into
      criticality --- at the edge of infeasibility!

    Many real-world computational problems live at the
       phase transition boundaries.
PART B. Algorithmic techniques




[Figure: Random 3-SAT as of 2010. Ratio ranges reached by the linear time algs., ordered toward the phase transition: Random Walk, DP, DP’, GSAT, Walksat, SP.]
Mitchell, Selman, and Levesque ’92
   Linear time results --- Random 3-SAT

Random walk up to ratio 1.36 (Alekhnovich and Ben-Sasson ’03).
       empirically up to 2.5
Davis Putnam (DP) up to 3.42 (Kaporis et al. ’02) empirically up to 3.6
       exponential, ratio 4.0 and up (Achlioptas and Beame ’02)
       approx. 400 vars at phase transition
GSAT up to ratio 3.92 (Selman et al. ’92, Zecchina et al. ’02)
       approx. 1,000 vars at phase transition
Walksat up to ratio 4.1 (empirical, Selman et al. ’93)
       approx. 100,000 vars at phase transition
Survey propagation (SP) up to 4.2
       (empirical, Mezard, Parisi, Zecchina ’02)
       approx. 1,000,000 vars near phase transition

Unsat phase: little algorithmic progress.
      Exponential resolution lower-bound (Chvatal and Szemeredi 1988)
                Survey Propagation (SP)
Mezard et al. 2002. New reasoning / combinatorial search paradigm.

Applies probabilistic reasoning technique for solving combinatorial
  search problems.

Basic idea: Let N be the total number of satisfying assignments,
 N_x+ the number of satisfying assignments with x set to True, and
 N_x- the number with x set to False.


Define: P_x+ = N_x+ / N and P_x- = N_x- / N.


I.e., P_x+ is the probability of seeing x assigned True when
      randomly sampling satisfying assignments.
                               SP, cont.
Consider the following “decimation” strategy:

If P_x+ >= P_x- then set x to True
                      else set x to False.
I.e. set variable to its most likely value.


Simplify instance and repeat, until a satisfying assignment is
  reached.
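A minimal sketch of this decimation loop, using exact counts obtained by brute force in place of the probabilities (comparing N_x+ with N_x- is the same test as comparing P_x+ with P_x-). Assumptions: variables are fixed in the given order, and "simplify" is modeled by conditioning the count on the values fixed so far. The exhaustive counting is exponential and stands in for the approximations SP would compute.

```python
from itertools import product

def count_models(clauses, variables, fixed):
    """Number of satisfying assignments that extend the partial assignment `fixed`."""
    free = [v for v in variables if v not in fixed]
    total = 0
    for values in product([False, True], repeat=len(free)):
        a = {**fixed, **dict(zip(free, values))}
        total += all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses)
    return total

def decimate(clauses, variables):
    """Fix each variable to its more likely value; returns None if unsatisfiable."""
    fixed = {}
    for x in variables:
        n_plus = count_models(clauses, variables, {**fixed, x: True})
        n_minus = count_models(clauses, variables, {**fixed, x: False})
        if n_plus == 0 and n_minus == 0:
            return None
        fixed[x] = n_plus >= n_minus   # same test as P_x+ >= P_x-
    return fixed

# (NOT x OR y OR z) AND (x OR NOT y OR z) AND (x OR y OR NOT z), with x, y, z = 1, 2, 3
print(decimate([[-1, 2, 3], [1, -2, 3], [1, 2, -3]], [1, 2, 3]))
# {1: True, 2: True, 3: True}
```

With exact counts the chosen value always keeps at least one satisfying assignment alive, so no backtracking is needed; with approximate probabilities that guarantee is lost.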

Sure, but only a physicist would think of such a strategy! :-)

        [Almost took this out for today’s talk…]



  Since computing the probabilities is believed to be much
  harder (#P-complete) than finding a satisfying assignment
  (NP-complete)…
But, perhaps one can efficiently compute good
 approximations of P_x+ and P_x-

Strategy is to iteratively solve a set of recursive equations. Linear time.

The so-called SP equations are rather involved. They are a form of
 probabilistic reasoning called Belief Propagation (reaching back to
 Bethe ’35) .

Intuitively, the idea is to consider the effect of adding a clause (constraint)
    to a set of clauses.

Example: start with the empty set of clauses over two variables p and q. So,
  P_p+ = P_p- = ½ and P_q+ = P_q- = ½.        [all 4 solns equally likely]


Now add a clause (p OR (NOT q)). What happens to P_p+ and P_q+? The first
 should go up a bit and the second down a bit…
(p OR (NOT q)) is satisfied by (T, F), (T, T), and (F,F).

So, P_p+ = 2/3 and P_p- = 1/3 and
P_q+ = 1/3 and P_q- = 2/3.

Now consider adding ((NOT p) OR q OR r).
P_p+ should go down a bit. P_q+ and P_r+ up a bit. Etc.

Brute force enumeration quickly infeasible but SP equations model the
 changes in these probabilities directly to capture the addition of
 clauses/constraints.

Clauses and variables interact, so we will have to look for a fixed point
  of a set of coupled recursive equations.
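The small example above can be reproduced by brute-force enumeration, which is exactly what becomes infeasible at scale. A minimal sketch; the clause representation and the name `marginals` are illustrative, and the function assumes the clause set is satisfiable.

```python
from fractions import Fraction
from itertools import product

def marginals(clauses, variables):
    """Exact P_x+ for every variable, by enumerating all satisfying assignments."""
    counts = {v: 0 for v in variables}
    n_models = 0
    for values in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, values))
        if all(any(a[v] == positive for v, positive in c) for c in clauses):
            n_models += 1
            for v in variables:
                counts[v] += a[v]
    return {v: Fraction(counts[v], n_models) for v in variables}

c1 = [("p", True), ("q", False)]                 # p OR (NOT q)
c2 = [("p", False), ("q", True), ("r", True)]    # (NOT p) OR q OR r

print(marginals([], ["p", "q"]))             # P_p+ = P_q+ = 1/2
print(marginals([c1], ["p", "q"]))           # P_p+ = 2/3, P_q+ = 1/3
print(marginals([c1, c2], ["p", "q", "r"]))  # P_p+ = 3/5, P_q+ = 2/5, P_r+ = 3/5
```

After the second clause P_p+ drops from 2/3 to 3/5, P_q+ rises from 1/3 to 2/5, and P_r+ rises from 1/2 to 3/5, matching the directions stated above.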
The CNF:




The “Factor” Graph:
 (Graphical Model.
   Bayesian Net)




The equations:
SP is surprisingly effective on hard random k-SAT and graph coloring.

10M var instances with 42M clauses can be solved in linear time (around
  one hour of cpu time; sets batches of variables, never backtracks,
  finds satisfying assignment!)

Walksat, a biased random walk strategy, is the next best but would
 require 100+ hrs of cpu time.

A formal understanding of SP has emerged only relatively
     recently.

Zecchina et al. 2004; Wainwright et al. 2006; Kroc et al. 2007, 2008.
SP computes marginal probabilities of so-called “covers”. Each
cover represents a cluster of satisfying assignments.

Two satisfying assignments are in the same cluster if you can “flip
variables” to go from one assignment to the other visiting only
satisfying assignments.
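For tiny formulas the clusters can be computed directly from this definition. A minimal sketch, assuming that "flipping variables" means changing one variable at a time; the function names are illustrative.

```python
from itertools import product

def solutions(clauses, n_vars):
    """All satisfying assignments as tuples of booleans (brute force)."""
    return [a for a in product([False, True], repeat=n_vars)
            if all(any(a[abs(l) - 1] == (l > 0) for l in c) for c in clauses)]

def clusters(clauses, n_vars):
    """Group solutions connected by single-variable flips through solutions."""
    unseen = set(solutions(clauses, n_vars))
    found = []
    while unseen:
        stack = [unseen.pop()]
        cluster = set(stack)
        while stack:
            a = stack.pop()
            for i in range(n_vars):
                b = a[:i] + (not a[i],) + a[i + 1:]   # flip variable i
                if b in unseen:
                    unseen.remove(b)
                    cluster.add(b)
                    stack.append(b)
        found.append(cluster)
    return found

print(len(clusters([[1, -2]], 2)))            # p OR (NOT q): 1 cluster
print(len(clusters([[1, 2], [-1, -2]], 2)))   # exactly one of x, y: 2 clusters
```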


Covers are even harder to find than satisfying assignments but SP is
remarkably accurate and fast at computing the marginal probabilities
of variable settings in covers. Even much faster than finding a single
cover!

(Kroc, Sabharwal, and Selman 2007)
Hard random 3-SAT. 5,000 var; 21,000 clauses

[Figure: scatter plot of SP marginal probabilities (y-axis, 0.0 to 1.0) against true cover marginal probabilities (x-axis, 0.0 to 1.0).]

SP marginals in seconds.
Cover marginals 100+ hrs (direct computation)
                                Solution Clusters

1. High density regions (statistical physics notion)
     BP for BP
     The original SP derivation from stat. mechanics
     [Mezard et al. ’02] [Mezard et al. ’09]

2. Enclosing hypercubes (combinatorial notion)
     BP for “covers”
     First rigorous derivation of SP for SAT
     [Braunstein et al. ’04] [Maneva et al. ’05] [Kroc et al. ’07]

3. Filling hypercubes (combinatorial notion)
     BP for Z(-1)
     More direct (variational) approach to clusters
     [Kroc et al. ’08] [Kroc et al. ’09]
             Representing Soln. Clusters
Clusters are subsets of solutions, possibly exponential in size
     Impractical to work with in explicit form
To compactly represent clusters, we trade off expressive power for
  shorter representation
     Will lose some details about the cluster, but will be able to
       work with it.

We will approximate clusters by hypercubes “from outside” and
  “from inside”.
       • E.g. with * = {0,1}, y = (1**) is a 2-dimensional hypercube
         in 3-dim space.

       [Figure: cube with vertices 000, 001, 010, 011, 100, 101, 110, 111; the face y = (1**) is marked.]

    From outside: The (unique) minimal hypercube enclosing the whole
     cluster.
    From inside: A (non-unique) maximal hypercube fitting inside the cluster.
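The "from outside" approximation is easy to compute once a cluster is given explicitly. A minimal sketch, assuming assignments are written as strings over '0' and '1' and that '*' marks a coordinate taking both values; the function name is illustrative.

```python
def enclosing_hypercube(cluster):
    """Minimal hypercube containing every assignment in `cluster`.

    Assignments are strings over '0'/'1'; '*' marks a coordinate that
    takes both values somewhere in the cluster.
    """
    columns = zip(*cluster)
    return "".join(col[0] if len(set(col)) == 1 else "*" for col in columns)

print(enclosing_hypercube(["100", "101", "110", "111"]))  # 1**
print(enclosing_hypercube(["000", "011"]))                # 0**
```

The second call shows the loss of detail: 0** also covers 001 and 010, which are not in the cluster.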
                   Factor Graph for Clusters
To reason about clusters, we seek a factor graph representation
     We can do approximate inference on factor graphs
     Need to count clusters with an expression similar to Z for
       solutions:


                                                      = 1 iff x is a solution
   We derived an approximating expression for # of clusters:




       #*(y) counts the number of * elements of y
       Checks whether all points in y are good

      Exactly counts clusters under certain conditions
      Inclusion / exclusion style expression.
      [Kroc, Sabharwal, Selman ’08]
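The expression itself did not survive in this copy of the slides. The sketch below assumes, for Boolean problems, the inclusion/exclusion form Z(-1) = sum over y in {0,1,*}^n of (-1)^(number of * in y) times an indicator that all points in y are solutions, which is what the two annotations describe. It enumerates all 3^n hypercubes, so it is only usable for tiny n; the function name is illustrative.

```python
from itertools import product

def z_minus_one(clauses, n_vars):
    """Sum of (-1)^(number of *) over hypercubes y in {0,1,*}^n made only of solutions."""
    def is_solution(a):
        return all(any(a[abs(l) - 1] == (l > 0) for l in c) for c in clauses)

    total = 0
    # Each coordinate of y is 0, 1, or * (written here as the set of allowed values).
    for y in product([(False,), (True,), (False, True)], repeat=n_vars):
        if all(is_solution(x) for x in product(*y)):
            stars = sum(len(values) == 2 for values in y)
            total += (-1) ** stars
    return total

print(z_minus_one([[1, -2]], 2))            # 1
print(z_minus_one([[1, 2], [-1, -2]], 2))   # 2
```

Both inputs are 2-SAT formulas: the first has three solutions forming one cluster, the second has two isolated solutions, and the printed values agree with those cluster counts.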
            Formal Results for Z(-1)
On what kind of solution spaces does Z(-1) count clusters
 exactly?

Theorem: Z(-1) is exact for any 2-SAT problem.
Theorem: Z(-1) is exact for a 3-COL problem on G, if every
  connected component of G has at least one triangle.




               Any connected
                   graph




Theorem: Z(-1) is exact if the solution space decomposes into
  “recursively-monotone subspaces”.
   Empirical Results: Z(-1) for SAT
[Figure, left panel: Random 3-SAT, n=90, α=4.0. One point per instance.]
[Figure, right panel: Random 3-SAT, n=200, α=4.0. One instance; one point per variable.]

                             Remarkable fit for Z(-1)
    Empirical Results: Z(-1) for SAT
Z(-1) is accurate even for many structured formulas
  (encoding real-world problems):

   [Kroc, Sabharwal, Selman ’09]
                      SP = BP for Z(-1)
Need to efficiently evaluate Z(-1) to count clusters:



     This expression is in a form that is very similar to the standard
      partition function of the original problem, which we can
      approximate with BP.
Z(-1) approximated with BP style eqs. (variational method derivation):

  The BP(-1) iterative equations:                             The black part is BP




    For SAT: BP(-1) is equivalent to SP
    For COL: BP(-1) is different from SP (possibly better)
          SP, final observation


The use of probabilistic techniques for solving
 SAT problems provides an intriguing
 alternative to the existing two main search
 paradigms:

 (1) complete, backtrack search, and
 (2) local search.
                                        Random 3-SAT

[Figure: the linear time algs. diagram (Random Walk, DP, DP’, GSAT, Walksat, SP) extended with upper bounds on the threshold obtained by combinatorial arguments: 5.19, 5.081, 4.762, 4.643, 4.601, 4.596, 4.506.]
  Physics contributing to computation
80’s --- Simulated annealing
   General combinatorial search technique, inspired by physics.
   (Kirkpatrick et al., Science ’83)


90’s --- Phase transitions in computational systems
   Discovery of “physical phenomena” (e.g. 1st and 2nd
   order transitions) in computational systems.
   (Cheeseman et al. ’91; Selman et al. ’92)
   Explicit connection to physics:
   Kirkpatrick and Selman, Science ’94 (finite-size scaling);
   Monasson et al., Nature ’99 (order of phase transition).


’02 --- Survey Propagation
   Analytical tool from statistical physics leads to powerful
   algorithmic method. (Mezard et al., Science ’02).

More expected!
    Capturing Problem Structure

Results and algorithms for hard random k-SAT
problems have had significant impact on
development of practical SAT solvers. However…

Next challenge: Dealing with SAT problems with
   more inherent structure.




    I) Mixtures: The 2+p-SAT problem
Motivation: Most real-world computational
   problems involve some mix of tractable
   and intractable sub-problems.


Study: mixture of binary (2-SAT) and ternary
      clauses (3-SAT)
      p = fraction ternary
      p = 0.0 --- 2-SAT / p = 1.0 --- 3-SAT
Note: 2-SAT can be solved in linear time; 3-SAT NP-complete.


What happens in between?
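A minimal sketch of an instance generator for this mixed model. Assumptions not stated on the slide: the number of ternary clauses is round(p x M), variables within a clause are distinct, and signs are chosen uniformly; the function name is hypothetical.

```python
import random

def random_2p_sat(n_vars, n_clauses, p, seed=None):
    """Mix of binary and ternary clauses; a fraction p of them is ternary."""
    rng = random.Random(seed)
    n_ternary = round(p * n_clauses)
    sizes = [3] * n_ternary + [2] * (n_clauses - n_ternary)
    rng.shuffle(sizes)
    clauses = []
    for k in sizes:
        chosen = rng.sample(range(1, n_vars + 1), k)
        clauses.append([v if rng.random() < 0.5 else -v for v in chosen])
    return clauses

formula = random_2p_sat(n_vars=20, n_clauses=40, p=0.4, seed=0)
print(sum(len(c) == 3 for c in formula), "ternary of", len(formula))  # 16 ternary of 40
```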


            Phase Transition for 2+p-SAT




We have good approximations for location of thresholds.
  (Monasson, Zecchina, Kirkpatrick, Selman, Troyansky, Nature 1999.)
                           Computational Cost: 2+p-SAT
                       Tractable substructure can dominate!

[Figure: medium cost (y-axis) vs. num variables (x-axis), mixing 2-SAT (tractable) & 3-SAT (intractable) clauses. > 40% 3-SAT --- exponential scaling. <= 40% 3-SAT --- linear scaling.]

(Monasson et al. ’99; Achlioptas ’02)
                Results for 2+p-SAT


p <= 0.4 --- model behaves as 2-SAT
               search algorithm “sees” only binary constraints
               smooth, continuous phase transition (2nd order)

p > 0.4 --- behaves as 3-SAT (exponential scaling)
             abrupt, discontinuous transition (1st order)


Note: problem is NP-complete for any p > 0.




                  Observation

In a worst-case intractable problem --- such as 2+p-SAT ---
having a sufficient amount of tractable problem substructure
(possibly hidden) can lead to provably poly-time --- in fact
linear --- average case behavior.

Conjecture: Our world may be “friendly enough” to make
many typical computational tasks poly-time --- challenging the
value of the conventional worst-case complexity view in CS.
II) Backdoors to the real-world
                                   Backtrack search




  Observation: Complete backtrack search SAT
  solvers (e.g. DPLL) display a remarkably wide range of run
  times.

  Even when repeatedly solving the same problem instance, with the variable
  branching choice randomized.

  Run time distributions are often “heavy-tailed”.

  Orders of magnitude difference in run time on different runs.



 (Gomes et al. ’00; ’04)
                         Heavy-tails on structured problems

[Figure: unsolved fraction (y-axis) vs. number of backtracks (x-axis, log scale, 1 to 100,000). 50% of runs: solved with 1 backtrack. 10% of runs: > 100,000 backtracks.]
           Eliminating Heavy Tails:
            Randomized Restarts
Solution: randomize the backtrack strategy
      Add noise to the heuristic branching (variable
       choice) function
      Cutoff and restart search after a fixed number of
       backtracks

Eliminates heavy tailed behavior.

In practice: rapid restarts with low cutoff can dramatically
  improve performance


Exploited in many current SAT solvers combined
with clause learning and non-chronological backtracking.
(Chaff etc.)

Sample Results Random Restarts

                         Deterministic    R^3
  Logistics Planning     108 mins.        95 sec.
  Scheduling 14          411 sec.         250 sec.
  Scheduling 16          ---(*)           1.4 hours
  Scheduling 18          ---(*)           ~18 hrs.
  Circuit Synthesis 1    ---(*)           165 sec.
  Circuit Synthesis 2    ---(*)           17 min.

   (*) not found after 2 days
               Formal Model Yielding
               Heavy-Tailed Behavior
T - the number of leaf nodes visited up to and including
  the successful node; b - branching factor



                                P[T bi ](1 p) pi i 0
                                          (heavy-tailed distribution)



                                            p = probability wrong
                                              branching choice.



     With b = 2: 2^k time to recover from k wrong choices.

Intuitively: Exponential penalties hidden in backtrack search,
consisting of large inconsistent subtrees in the search space.
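
A short derivation may help show why this model is heavy-tailed. It uses only the definitions above, with T taking the value b^i with probability (1 - p) p^i; the exponent name \alpha is introduced here for illustration.

```latex
P[T \ge b^i] = \sum_{j \ge i} (1-p)\,p^j = p^i
\quad\Longrightarrow\quad
P[T \ge L] = L^{-\alpha}, \qquad \alpha = \log_b(1/p), \quad L = b^i .

E[T] = \sum_{i \ge 0} (1-p)\,(pb)^i =
\begin{cases}
\dfrac{1-p}{1-pb} & \text{if } p < 1/b, \\[1ex]
\infty & \text{if } p \ge 1/b .
\end{cases}
```

So the tail decays only polynomially in L, and for example with b = 2 and p = 1/2 the expected run time is already infinite, even though half of all runs finish at the first leaf.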

But, for restarts to be effective, you also need
short runs.




   Where do short runs come from?
               Explaining short runs:
              Backdoors to tractability

Informally:

A backdoor to a given problem is a subset of the variables such
that once they are assigned values, the polynomial propagation
mechanism of the SAT solver solves the remaining formula.

Formal definition includes the notion of a “subsolver”:
  a polynomial simplification procedure with certain general
  characteristics found in current DPLL SAT solvers.
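
As a rough illustration of the definition, the sketch below tests whether a given variable set is a backdoor when plain unit propagation plays the role of the subsolver. That choice of subsolver is an assumption made here for brevity, and the function names are hypothetical. "Weak" and "strong" follow the usual distinction: some assignment to the set lets the subsolver find a solution, versus every assignment being decided (satisfiable or not) by the subsolver.

```python
from itertools import product

def assign(clauses, literals):
    """Simplify a CNF (lists of nonzero ints) under the given true literals."""
    true = set(literals)
    return [[x for x in c if -x not in true]
            for c in clauses if not true.intersection(c)]

def unit_propagate(clauses):
    """Polynomial-time subsolver: returns 'sat', 'unsat' or 'unknown'."""
    while True:
        if not clauses:
            return "sat"                     # every clause satisfied
        if any(len(c) == 0 for c in clauses):
            return "unsat"                   # empty clause: conflict
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            return "unknown"                 # subsolver gives up
        clauses = assign(clauses, [unit])

def outcomes(clauses, variables):
    """Subsolver verdict for each of the 2^|variables| assignments."""
    for signs in product((1, -1), repeat=len(variables)):
        literals = [s * v for s, v in zip(signs, variables)]
        yield unit_propagate(assign(clauses, literals))

def is_weak_backdoor(clauses, variables):
    return any(o == "sat" for o in outcomes(clauses, variables))

def is_strong_backdoor(clauses, variables):
    return all(o != "unknown" for o in outcomes(clauses, variables))

clauses = [[1, 2], [-1, 3], [-3, 4], [-2, 4]]
small = is_strong_backdoor(clauses, [1])     # variable 1 alone decides the formula
```

The check is exponential only in the size of the candidate set, which is why small backdoors matter.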


Backdoors correspond to “clever reasoning shortcuts” in the
search space. (Gomes et al. ’04, ’08)
    Backdoors can be surprisingly small:




Most recent: other combinatorial domains. E.g. in Graphplan planning,
near-constant-size backdoors (2 or 3 variables), and log(n)-size
backdoors in certain domains.

Backdoors capture critical problem resources (bottlenecks).
            Backdoors --- “seeing is believing”
                                                Constraint graph of
                                                reasoning problem.
                                                One node per variable:
                                                edge between two variables
                                                if they share a constraint.




                    Logistics.b.cnf planning formula.
             843 vars, 7,301 clauses, approx. min. backdoor 16
                   (backdoor set = reasoning shortcut)
Visualization by Anand Kapur.
Logistics.b.cnf after setting 5 backdoor vars.
After setting just 12 (out of 800+) backdoor vars – problem almost solved.
Another example




 MAP-6-7.cnf infeasible planning instance. Strong backdoor of size 3.
                        392 vars, 2,578 clauses.
  After setting 2 (out of 392) backdoor vars. ---
reducing problem complexity in just a few steps!
Last example.




 Inductive inference problem --- ii16a1.cnf. 1650 vars, 19,368 clauses.
                          Backdoor size 40.
After setting 6 backdoor vars.
 Some other intermediate stages:




After setting 38 (out of 1600+)
        backdoor vars:

So: Real-world structure
 hidden in the network.
  Can be exploited by
  automated reasoning
        engines.
But… we also need to take into account the
cost of finding the backdoor!

We considered:
   Generalized Iterative Deepening
   Randomized Generalized Iterative Deepening
   Variable and value selection heuristics
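
To get a feel for the cost, consider a naive baseline that simply enumerates every candidate set of size at most B and every assignment to it. This is a rough count for exhaustive enumeration only, not an analysis of the procedures listed above; B (the backdoor size) and s(n) (the cost of one subsolver call) are symbols introduced here.

```latex
\sum_{i=1}^{B} \binom{n}{i}\, 2^{i}\, s(n)
\;\le\; \sum_{i=1}^{B} (2n)^{i}\, s(n)
\;\le\; B\,(2n)^{B}\, s(n)
```

For constant B this is polynomial in n; for B = O(log n) the bound is quasi-polynomial, and for B proportional to n it is exponential.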
[Figure: size of backdoor; n = num. vars., k is a constant;
 region marked "Current solvers".]
Dynamic view: Running SAT solver
    (no backdoor detection)
SAT solver detects backdoor set
                           Summary
Considered complexity of the Boolean Satisfiability (SAT) problem
           The prototypical NP-complete problem

Hardest instances occur at phase transition boundaries
            Instances go from satisfiable to unsatisfiable

Tools from statistical physics (disordered systems) provide new
            insights into these (computational) phase transitions.
            (e.g. 1st vs. 2nd order transitions)

Work led to the Survey Propagation method, the fastest current
             algorithm (1M+ vars).
             A computational interpretation of the cavity method

Insights into highly structured problems: backdoor variable sets

“Self-referential” thought of the day:
The design of this laptop and presentation software was done using a

                              SAT solver
               using techniques discussed in this talk! 
The end

				