Learning as search.docx - by liwenting


									Learning as search
How does the Pac-Man baddy find its way to you effectively?
How do you decide on your next move in chess?
How do you solve a Rubik’s cube?
How do you quickly find the quickest/shortest route between two places?
How do you navigate a maze?
How do you solve the logic puzzles we gave you?
How do you solve a sudoku puzzle?
How do you write code to animate a character’s actions in a game or film?

The answer for a human is probably to use a mixture of rules-of-thumb, logic, all sorts.
We’re amazing, and do all sorts of things without really thinking about how we do them.
However more and more we want to be able to automate this kind of problem-solving, so that
people don’t have to do things that are Dull (repetitive, the same kind of thing day after day,
or even second by second), Dirty or Dangerous. But before we can get a machine to do it,
we have to be able to tell it what to do, and we don’t want to have to write custom-made
programs from scratch every time. So that’s where the AI bit comes in.

This raises a whole bunch of questions about how humans solve problems, and whether some
of these ways are the result of our physical limits (we only have one body, can only be in one
place at a time, can’t hold lots of things in memory at once). This line of thinking suggests that
there might be a range of alternative strategies we could apply, but the important thing, if we
want to be able to apply AI quickly and easily to a range of problems, is that we have a general
framework. Luckily, such a framework exists, and is called “state space search”.

Crudely put, the idea is that what you know about a given problem at any point in time can be
thought of as the “state” of your system. We can place some conditions on what we want to
achieve, and when we have met those conditions we are in our “goal” state. We can usually
place some restrictions on the things that we can do to move from one state to another. We
navigate between states, from the starting position to the goal state, by applying what is
called a search algorithm.
 Problem                Starting state          Example states               Valid moves                        Goal state
 Chess                  Initial board           State of board after move    Dictated by pieces                 Checkmate
 Sudoku                 Few squares prefilled   Current state of square      Apply rules to fix value of        No empty squares; no rules
                                                                             blank square                       (constraints) violated
 Creating assembly      Blank program           Sequence of machine          Add instruction on end;            Code passes series of
 code to do task X                              instructions                 add instruction in middle          user-supplied tests
 Route finding          Position                Positions, possibly with     Add path from current position     Reach target; could be extra
                                                history                      to next junction                   constraints e.g. < 3 hours,
                                                                                                                “fastest”
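
To make the framework a little more concrete, here is a minimal sketch of how a problem might be handed to a search algorithm: just a starting state, a function giving the valid moves, and a goal test. The class name SearchProblem and the little road map are my own inventions for illustration, not something from the lectures.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable


@dataclass(frozen=True)
class SearchProblem:
    """A problem described only by its start state, valid moves and goal test."""
    start: Hashable
    moves: Callable[[Hashable], Iterable[Hashable]]  # state -> states reachable in one move
    is_goal: Callable[[Hashable], bool]


# Hypothetical road map for the route-finding row: junction -> neighbouring junctions.
ROADS = {"home": ["A", "B"], "A": ["C"], "B": ["C", "work"], "C": ["work"], "work": []}

route_finding = SearchProblem(
    start="home",
    moves=lambda junction: ROADS[junction],
    is_goal=lambda junction: junction == "work",
)

print(list(route_finding.moves(route_finding.start)))
```

Any of the other rows in the table could be described in the same three-part way; only the contents of the state and the move function change.
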
Let’s go back to our maze problem. Imagine that you are put at the entrance to the maze
below and have to find your way to the middle. How are you going to do this?

      You could of course do it by making a series of random guesses at each junction. If
       you had some way of marking which paths you had been down, you could avoid
       repeating paths you had just been down, but you could easily get stuck in dead ends.
               No animation here- too much randomness…
      So to avoid this you decide to be a bit more systematic. Let’s assume that:
           o At every junction you put an arrow on the path you came from, and then take
               the leftmost path leading out.
           o If you get to the middle you stop,
           o Otherwise if you get to a dead end you just go back to the last junction, and
               take the next untravelled entrance to the left.
           o If you end up back at a junction and find you have taken all of the paths
               leading out of it, then you go “backwards” up the path you arrived on
               (remember that you marked it with an arrow).
       In most cases this strategy will find the middle. It might take you a long time, and a
       lot of exploring and backtracking but you will get there.
       The only problem you have apart from time, is that you might get stuck if your maze
       has loops in it – for example a wall which is not attached at either end, and you might
       just keep following it around.

             Animation of maze-following. Red lines indicate current state of seeker.
         Green lines show where back-tracking has happened. Depending on context (and
                     available storage) we may remember these paths or not.
      So maybe you decide on a different approach. Perhaps someone has told you there is
       a quick way in?
           o You start out by taking the left hand path until you get to the next junction,
              where you leave a mark with the value “1”.
           o You then go back to the entrance and take the next path, again marking the
              first junction with a “1”.
           o    When you’ve done all the first set of paths, you now take the first path again,
              and this time repeat the process leaving “2”s at the next level of junctions.
           o Of course in an ideal world (x-men 3 anyone?) you create clones of yourself at
              each junction to avoid all that running to and fro. The main point is that you
              explore all the first level junctions, then all the second, and so on.
       Problems with this approach? Well, the point about cloning, really: especially if you
       have lots of ways out of a junction, you could rapidly need lots of clones. This could be a
       big problem if the middle is a long way in (lots of junctions). On the plus side, you
       are guaranteed to find the quickest way to the middle (the one that passes through the
       fewest junctions), even if it takes you ages.

                 Animation of second maze-following strategy assuming “cloning”.
           Items in red lines indicate current state of seeker and must be held in memory.
        Items in green lines show information discovered and show where back-tracking has
         happened. Depending on context (and available storage) these may be kept or not

      As an aside, what if you are doing the left-first approach and someone tells you that
       the middle is only four junctions in? Easy – you just exploit that knowledge to tell
       you when to give up on a path even if you haven’t got to the middle/a dead end.

      All sounds a bit tricky, so what if you have a beacon that tells you how many metres
       away the middle is? You could use that to guide your choice. But then the maze
       designer may have put in some blind alleys that get you near to, but not quite at, the
       middle.
      Or maybe the maze is built on a hill, and you are told that the middle is somewhere
       near the top, so you have an inexact measure of how close you are? Both of these
       would help provided you learned from our “blind” experiences and used some way of

Why have I chosen a maze? Well, it’s easy to visualise, we all know them, and lots of games
are built on them – even if they are dressed up in other ways. Best of all, mazes have this
nice property that you can unfold them into a tree, and trees are really important in
Maze discovered by second strategy. This time the walls have been removed and the nodes
are renumbered in order of discovery/exploration. For completeness the final two “dead-
ends” are added in blue

The image above shows what happens if you take the set of things discovered by the second
strategy and then simply remove the walls. For completeness I have added the last two dead
ends, shown in blue. As you can see, it is a simple matter of “unkinking” the odd line, and
putting in some extra bends, to get a nice simple tree shape, a bit like a “family tree”.
     Because I wanted to keep the example fairly simple, I only had two choices at each
        junction, so this is what is called a “binary” tree.
     As you saw, some of the points where the algorithm halts are junctions, and some are
        dead ends. When we unfold the maze to a tree, these correspond to what we would
        call “interior nodes” and “terminal nodes”.
     The latter are also often called “leaves” for obvious reasons, and the starting point is
        usually known as the “root” of the tree.

Trees are what’s known as a data structure, and it’s easy to store a tree as a collection of nodes
– each one is like a little container which has inside it the id of the parent node above it in the
tree, the ids of any “children” below it, and maybe some other information.

   Computer scientists like trees because they are really simple to implement, manipulate,
    and extend. The more mathematically minded people like them because they are a form
    of what is known as “acyclic graphs”, which are handy for doing reasoning about (acyclic
    means without cycles or loops).
   For example, we might store two of the nodes like this:
        Node(id = 3, parent id = 1, child ids = 6, 7, data = “”)
        Node(id = 19, parent id = 16, child ids = (none), data = “X marks the spot”)
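
In a real language the same idea might look something like the sketch below. The field names (parent_id, child_ids) and the is_leaf helper are my own choices for illustration; only the two example nodes come from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    id: int
    parent_id: Optional[int] = None                      # None would mean "this is the root"
    child_ids: List[int] = field(default_factory=list)   # empty for a leaf (dead end)
    data: str = ""

    def is_leaf(self) -> bool:
        return not self.child_ids


# The two example nodes from the text, stored in a dictionary keyed by id.
tree: Dict[int, Node] = {
    3: Node(id=3, parent_id=1, child_ids=[6, 7]),
    19: Node(id=19, parent_id=16, data="X marks the spot"),
}

print(tree[3].is_leaf(), tree[19].is_leaf())
```

Keeping the nodes in a dictionary keyed by id means that following a parent or child link is just a lookup.
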

In Steve’s knowledge representation lectures you saw one way in which facts and
relationships could be stored, so hopefully the analogies are obvious: If classes of things are
nodes, then “isa” is a line from one node to another ...

Most of you will either be familiar with, or about to learn, an object-oriented programming
language. There are lots of links between the reasoning behind OO languages and the ideas of
trees and graphs, especially once people start talking about classes and inheritance.

Hopefully by now you are starting to see that many of the examples I gave above easily fit
into this format. More generally, we recognised above that mazes could have “free standing”
walls in them which could create loops. In the practicals there are several examples of this:
for example, in the missionaries and cannibals problem you can get back to an earlier state.

So where does this get us? Well, we have seen how many problems can be represented in a
fairly simple way as a search in what is called state space. We have seen that we need not
“know” everything about our state space in advance – as long as we can say what possible
ways there are out of a given state, then we can apply one of the strategies above, storing the
information we need in the form of a tree/graph via a collection of nodes. All we need to do
now is formalise the fairly folksy definitions above so that they can be written down in
a form that a programmer can reproduce.

In the lectures we cover the formal definitions of this kind of search in a lot more detail. The
first method I described is called depth-first search, and the second is breadth-first search.
These are known as “blind search” methods since they have no other data to guide them to
the middle of the maze. They are implemented via two classic data structures: the stack
and the queue.

A stack is just like a stack of cards. You can add cards to the top, or take the top one off, but
you can never access the interior cards without first removing the ones above them. Lots of
computer programming uses this metaphor to manage memory and variables, as you’ll find
out when you start debugging programs, although thankfully nowadays compilers mostly
hide the details from the user. 
Depth-first search is implemented using a stack.
        When you reach a junction (interior node) you:
                 push that node (let’s call it “A”) on the top of the stack,
                 and go down the first child.
        If that child is a dead end, then you simply:
                 “pop” node “A” off the top of the stack,
                 remove the id of the first child node,
                 retrieve the second child id as your next destination,
                 and push “A” back on the stack.
        If the child node “B” is itself a junction, then you:
                 push node “B” on the stack,
                 and move on to its first child.
        If you eventually back-track to “A” and it has no children left, then you:
                 discard it,
                 “pop” the next node off the top of the stack
                 (which must have been “A”’s parent),
                 go to the next unexplored node, i.e. one of “A”’s siblings,
                 or discard that node and get the next off the stack if there are no siblings left.

Hopefully you can see that in this way the size of the stack is restricted to the current depth
of the search.
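
The stack procedure above might be sketched as follows. This is only an illustration under some assumptions of mine: the little MAZE tree (with node 9 as the middle) is made up, each stack entry remembers which children are still untried rather than literally popping and re-pushing the node, and the optional max_depth argument is a way of using a tip like “the middle is only four junctions in” to give up on a path early. It also assumes the maze really is a tree, i.e. has no loops.

```python
# Hypothetical maze tree: node id -> child ids, leftmost first. Node 9 is the middle.
MAZE = {1: [2, 3], 2: [4, 5], 3: [6, 7], 4: [], 5: [8, 9], 6: [], 7: [], 8: [], 9: []}


def depth_first_search(tree, root, goal, max_depth=None):
    """Return the list of nodes from root to goal, or None if the goal is not found.

    The stack holds (node, iterator over untried children) pairs, so its
    length never exceeds the current depth of the search.
    """
    stack = [(root, iter(tree[root]))]
    while stack:
        node, children = stack[-1]
        if node == goal:
            return [n for n, _ in stack]       # the stack contents are the route
        child = None
        if max_depth is None or len(stack) <= max_depth:
            child = next(children, None)       # next untried child, if any
        if child is None:
            stack.pop()                        # dead end or nothing left: backtrack
        else:
            stack.append((child, iter(tree[child])))
    return None


print(depth_first_search(MAZE, 1, 9))
print(depth_first_search(MAZE, 1, 9, max_depth=2))
```

With no depth limit the search tries node 4, backs up, tries 5, then 8, and finally reaches 9. With max_depth=2 it never goes deep enough to see node 9, so it reports failure.
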

A queue is like a bus queue – sometimes called first in, first out.
Breadth-first search is implemented using this type of queue.
In the example above:
       Start with a queue with just node 1 in.
       Remove node 1 from the front of the queue.
       Put its child nodes (2 and 3) on the back.
       Then take the next node (2) off the front and add its children (4 and 5) to the back.
       Repeat until the goal is found.

Each step reduces the size of the queue by one, then increases it by zero or more:
       Obviously terminal nodes do not have any children, so they shorten the queue.
       Interior nodes lengthen the queue by adding two or more children on the back.

Especially if nodes may have several children, this means that the queue may well carry lots
more nodes in memory than the stack – in other words breadth-first generally requires more
memory. If this isn’t obvious, try looking at the two animations again, where the nodes in red
have to be held in memory, and compare the two approaches. If there are more than two
children per node the situation gets even worse.
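
Here is a small sketch of the queue-based procedure, again on a made-up MAZE tree with node 9 as the middle. Two things are my own additions for illustration: a parent dictionary, so that the route can be rebuilt once the goal is found, and a count of the longest the queue ever got, so you can see how much had to be held in memory.

```python
from collections import deque

# Hypothetical maze tree: node id -> child ids, leftmost first. Node 9 is the middle.
MAZE = {1: [2, 3], 2: [4, 5], 3: [6, 7], 4: [], 5: [8, 9], 6: [], 7: [], 8: [], 9: []}


def breadth_first_search(tree, root, goal):
    """Return (path, peak queue length); path is None if the goal is not found."""
    queue = deque([root])
    parent = {root: None}            # kept so the route can be rebuilt at the end
    peak = 1
    while queue:
        node = queue.popleft()       # take the next node off the front
        if node == goal:
            path = []
            while node is not None:  # walk back up the parent links to the root
                path.append(node)
                node = parent[node]
            return path[::-1], peak
        for child in tree[node]:
            parent[child] = node
            queue.append(child)      # children join the back of the queue
        peak = max(peak, len(queue))
    return None, peak


print(breadth_first_search(MAZE, 1, 9))
```

On a tree this small the queue never gets long, but every extra level roughly doubles the number of nodes waiting in it.
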

The lectures will cover the specifics of depth/breadth-first search, and informed methods such
as best-first, hill-climbing and A*. Hopefully you can see that those methods known as
“heuristic” or “informed” search, which make use of information about the quality of a state
(e.g. how far it is from the goal) to guide the search process, are really just variants on these
two methods where each node also carries some sort of “value” or “cost”.
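
As a rough illustration of that last point, the sketch below swaps the plain queue for a priority queue ordered by an estimate of how far each node is from the goal, which gives a simple “greedy” best-first search. The MAZE tree and the DISTANCE readings (think of the beacon in the maze) are invented for the example, and node 7 is deliberately a blind alley that looks close to the middle.

```python
import heapq

# Hypothetical maze tree: node id -> child ids. Node 9 is the middle.
MAZE = {1: [2, 3], 2: [4, 5], 3: [6, 7], 4: [], 5: [8, 9], 6: [], 7: [], 8: [], 9: []}
# Hypothetical "beacon" readings: estimated distance from each node to the middle.
DISTANCE = {1: 3, 2: 2, 3: 1, 4: 3, 5: 1, 6: 2, 7: 1, 8: 2, 9: 0}


def best_first_search(tree, root, goal, estimate):
    """Greedy best-first: always expand the waiting node with the lowest estimate."""
    frontier = [(estimate[root], root)]   # priority queue of (estimate, node)
    parent = {root: None}
    expanded = []                         # order in which nodes were examined
    while frontier:
        _, node = heapq.heappop(frontier)
        expanded.append(node)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], expanded
        for child in tree[node]:
            parent[child] = node
            heapq.heappush(frontier, (estimate[child], child))
    return None, expanded


path, expanded = best_first_search(MAZE, 1, 9, DISTANCE)
print(path, expanded)
```

The search is lured into nodes 3 and 7 first because they look promising, then falls back to node 2 and finds the middle. The only change from breadth-first search is the order in which waiting nodes are taken.
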

So what’s all this other stuff about optimisation and modelling? Time for some more
In the lectures we draw a distinction between optimisation and modelling. Where do these
examples fit in?

Well, in a maze we have a goal state: “X marks the spot”. We also have a model of the world:
our maze. We may not have a full model in our chosen algorithm to start off with, but we
build it up in the form of a tree as we examine sequences of moves and see what nodes (dead
ends or junctions) they take us to. What we are looking for is the right sequence of moves to
get us to our goal. This corresponds to a list of the nodes as we go from the root of our tree
through to the goal. In fact for depth-first search this is even held neatly for us, since it is the
contents of the stack! Breadth-first search holds rather more detail, and we will need to
retain the discarded nodes if we want to be able to print the exact route to the goal, but the
principle is the same. In order to solve the problem we build up a partial representation of the
bit of the world we are interested in, but what we provide at the end is a sequence of moves
(inputs) that takes us from the start to the goal state.

Pac-Man, chess (with allowance for the fact that it is too hard to look more than a few moves
ahead), and various flavours of route/path finding are all essentially the same.

So what about modelling problems like classification, how do they fit in?
Well, let’s imagine we are trying to build up a rule base that can later be used in a help-desk
to diagnose printer problems. The sorts of information that an expert might use are: is the
power reaching both computer and printer?, is there any paper?, is the printer driver installed
and working?, is there a printer jam? ... Each of these is something that could be measured
or found out by asking the user. There is also a set of actions that could be performed, e.g.
turn on, reboot, reconnect leads, check paper jam, not all of which will help in every situation.
Let’s imagine that we are given lots of examples, with answers to the questions above and a
record of which actions solved which problems.
There are many ways of tackling this problem; here is one simple one:
     Let a state correspond to a list of rules and the number of our examples solved/not
        solved by applying those rules.
     When we move to a child state we add a rule to the set inherited from the parent, and
        re-test the unsolved examples.
     We can create rules from any combination of situations and outcomes. For example
        “IF paper_out=TRUE THEN Reboot” is a valid, but fairly daft rule,
     We continue until:
            o we have found a path (sequence of rules) that solves all our examples,
            o Or the number of unsolved examples has not decreased for some (e.g. 5)
     We then take the rule-set from the path leading to the “winning” node to be the
        “model” for our help desk
Now this is a rather simplistic idea I’ve chosen just to illustrate how we could view model-
building as a search problem. Real-world methods are generally more sophisticated but
nevertheless many “rule-induction” and “decision tree” methods essentially do something
similar to this. By the way, note that since we have a “number of unsolved cases” that we
want to reduce to zero, we automatically have a way to apply informed search.
Partial tree resulting from one possible approach to creating a model of the printer diagnosis
problem. Each node in tree corresponds to a possible model to be used in the help desk.
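
To show how little machinery the rule-building search above needs, here is a toy sketch. Everything in it is hypothetical: the four training examples, the symptom and action names, and the decision to count an example as “solved” when at least one rule fires with the action that actually worked. For brevity it always moves to the child with the fewest unsolved examples and gives up as soon as no child helps, rather than waiting for several moves without improvement.

```python
from itertools import product

# Hypothetical training examples: (observed symptoms, action that fixed the problem).
EXAMPLES = [
    ({"paper_out"}, "add_paper"),
    ({"power_off"}, "turn_on"),
    ({"paper_jam"}, "clear_jam"),
    ({"paper_out", "paper_jam"}, "add_paper"),
]
SYMPTOMS = ["paper_out", "power_off", "paper_jam"]
ACTIONS = ["add_paper", "turn_on", "clear_jam", "reboot"]
ALL_RULES = list(product(SYMPTOMS, ACTIONS))   # every "IF symptom THEN action"


def unsolved(rules, examples):
    """Examples for which no rule fires with the action that actually worked."""
    return [(symptoms, action) for symptoms, action in examples
            if not any(cond in symptoms and act == action for cond, act in rules)]


def grow_rule_set(examples):
    rules = []                                  # root state: no rules yet
    remaining = unsolved(rules, examples)
    while remaining:
        # Each child state adds one rule; move to the child leaving fewest unsolved.
        best = min(ALL_RULES, key=lambda r: len(unsolved(rules + [r], remaining)))
        still = unsolved(rules + [best], remaining)
        if len(still) == len(remaining):
            break                               # no child improves things: give up
        rules, remaining = rules + [best], still
    return rules, remaining


rules, remaining = grow_rule_set(EXAMPLES)
print(rules, remaining)
```

Daft rules such as (paper_out, reboot) are in the list of candidates but never get picked here, because they do not reduce the number of unsolved examples.
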

Exact Search Methods and the Need for “Heuristics”
Sadly most interesting problems in real life suffer from the problem that so-called exact
methods like breadth-first, A* etc. just can’t run fast enough (see box on NP below). This is
basically because the number of possible states just grows too quickly. For example even a
simple maze like the one above, with only two-way junctions, has 2^n paths leading out of
level n, i.e. 3 nodes in the first 2 levels, 7 in the first 3, 15 in the first 4 ... (2^n - 1 nodes in
the first n levels), which rapidly means that even the fastest machines can’t compute all the
possibilities within years by the time n gets above the mid twenties (I’ll leave it as an exercise
for you to work this out ..). For other types of problems, like finding optimal permutations of
events/cities etc., the number of possible solutions grows even faster. It’s a matter of
conjecture, but almost universally accepted within computer science, that there is no algorithm
that can definitely solve these problems and that runs in a time which is polynomial in n
(e.g. n^2, n^3, or even just n^y for some fixed value y).
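
If you want to try the exercise, a couple of lines of code will do the arithmetic. The function names are mine, and the processing rate is purely an assumption that you have to supply yourself; the answer depends heavily on how long your program spends on each node.

```python
def nodes_up_to_level(n: int) -> int:
    """Nodes in a full binary tree with n levels: 1 + 2 + 4 + ... + 2**(n - 1)."""
    return 2 ** n - 1


def years_to_visit_all(n: int, nodes_per_second: float) -> float:
    """Time to visit every node once, at an assumed fixed processing rate."""
    seconds = nodes_up_to_level(n) / nodes_per_second
    return seconds / (365 * 24 * 3600)


for n in (2, 3, 4, 10, 20, 30):
    print(n, nodes_up_to_level(n))
```

The first three values printed are the 3, 7 and 15 quoted in the text; after that each extra level roughly doubles the total.
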

The net result of all this is that we are forced to look for methods that usually come up
with high quality solutions, even if they cannot be guaranteed to be the best.

One approach is to apply rules of thumb to build up solutions – like “keep adding the nearest
unvisited drop” if we are planning a delivery schedule. However these may not yield good
results without adding lots of tricks, such as the sophisticated back-tracking and loop
avoidance we needed for our maze example.

The other approach is to find a way of specifying our problem so that we can work with
whole “candidate” solutions, and apply rules of thumb to tell us how to generate the next
candidate solution based on what we have seen so far, and some kind of quality measure.
These methods tend to be called “metaheuristics” – we’ll be covering one family of them in
some detail. To give you an idea though, below are some different approaches for
formulating the problems we have discussed above.

Different ways of asking the same question

One way that can be useful to think of applying these methods is just as we did for our maze:
in other words to construct a solution, extending it as necessary until it reaches the goal state.
        But it is worth bearing in mind how many nodes we examined just to do our simple
        maze. As I suggested, if the maze has loops in it then depth-first search can get trapped
        in endless loops, while on the other hand breadth-first search needs lots of memory.

However, as with so much else in life, there are different ways that we could pose this
problem. For example how about this as an approach to solving the maze:
    Assume that our maze can be solved in 10 or fewer turns.
    Let our “state” be the sequence of turns we take.
    Let the starting state be all “left”, i.e. [L,L,L,L,L,L,L,L,L,L].
      Let the children of a state be all those that can be reached by changing one turn, i.e.
       from the first state we have:
       [R,L,L,L,L,L,L,L,L,L], [L,R,L,L,L,L,L,L,L,L], [L,L,R,L,L,L,L,L,L,L],
       [L,L,L,R,L,L,L,L,L,L], [L,L,L,L,R,L,L,L,L,L], [L,L,L,L,L,R,L,L,L,L],
       [L,L,L,L,L,L,R,L,L,L], [L,L,L,L,L,L,L,R,L,L], [L,L,L,L,L,L,L,L,R,L],
       [L,L,L,L,L,L,L,L,L,R].
      When we “examine” a state, we start at the beginning of the maze, follow the turns in
       order, and stop when we reach a dead end.
      If the dead end is our goal state then the search is over.
      This time our state space obviously has loops in it, but we could still apply depth or
       breadth first search, or for larger mazes we might use a genetic algorithm to evolve
       the solution.
      Clearly in this case we are not holding any specific knowledge about the maze.
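
A sketch of this alternative formulation is below. The children function follows the “change one turn” idea directly; the tiny MAZE used by examine is invented just so there is something to walk through, with node 9 standing in for the middle.

```python
from typing import List, Tuple

State = Tuple[str, ...]


def children(state: State) -> List[State]:
    """All states reached by changing exactly one turn (L becomes R or R becomes L)."""
    flip = {"L": "R", "R": "L"}
    return [state[:i] + (flip[state[i]],) + state[i + 1:] for i in range(len(state))]


# Hypothetical maze: junction -> (left child, right child); any other node is a dead end.
MAZE = {1: (2, 3), 2: (4, 5), 5: (8, 9)}
GOAL = 9


def examine(state: State, root: int = 1) -> int:
    """Follow the turns from the entrance and return the dead end that is reached."""
    node = root
    for turn in state:
        if node not in MAZE:      # already at a dead end: ignore the remaining turns
            break
        node = MAZE[node][0 if turn == "L" else 1]
    return node


start = ("L",) * 10
print(len(children(start)), examine(start))
print(examine(("L", "R", "R") + ("L",) * 7) == GOAL)
```

Notice that flipping a turn twice gets you back where you started, which is exactly the kind of loop in the state space mentioned above.
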

Here’s another example: sorting out a delivery schedule for a lorry. We are given a list of
drops they must make, we are told not to revisit, and we know that for safety reasons the
driver must not be driving for more than 8 hours. In this case the obvious “constructive”
approach is as follows:
     Start state = depot, time = zero
     Possible transitions from any state = move to unvisited drop, set time =time
     Goal state = no unvisited drops and time + time_from_drop_to_depot < 8 hours
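
As a sketch, the constructive formulation above could be searched depth-first like this. The travel times are made up, and I am assuming that each move adds the travel time to the chosen drop onto the running total. The function stops at the first complete schedule that fits inside the limit; it does not look for the best one.

```python
# Hypothetical travel times in hours between the depot and three drops (symmetric).
HOURS = {
    ("depot", "A"): 1.0, ("depot", "B"): 2.0, ("depot", "C"): 3.0,
    ("A", "B"): 1.5, ("A", "C"): 2.5, ("B", "C"): 1.0,
}


def travel(a, b):
    return HOURS[(a, b)] if (a, b) in HOURS else HOURS[(b, a)]


def find_schedule(drops, limit=8.0, here="depot", elapsed=0.0, route=()):
    """Depth-first constructive search; returns the first route within the limit."""
    unvisited = [d for d in drops if d not in route]
    if not unvisited:                              # goal test: can we get home in time?
        return list(route) if elapsed + travel(here, "depot") < limit else None
    for nxt in unvisited:                          # one child state per unvisited drop
        new_elapsed = elapsed + travel(here, nxt)
        if new_elapsed >= limit:                   # already too long: treat as a dead end
            continue
        found = find_schedule(drops, limit, nxt, new_elapsed, route + (nxt,))
        if found is not None:
            return found
    return None


print(find_schedule(["A", "B", "C"]))
print(find_schedule(["A", "B", "C"], limit=6.0))
```

With an 8 hour limit the first route tried (A, B, C, taking 6.5 hours) is accepted. With a 6 hour limit every route is too long, so the search reports that there is no valid schedule.
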

Clearly we could apply depth or breadth first to this problem (if we had time). Where we had
“dead-ends” to stop lines of travel in the maze, here we can use our time constraint to stop
exploring lengthy routes. We could also make use of some knowledge to give us “informed
search”. For example, we could apply a rule of thumb that initially we always took the route
to the nearest unvisited drop. However, as you can probably imagine, even for pretty small
problems the size of the queue needed to hold all the nodes at a given level for breadth-first
search becomes huge.

In this case the alternative way of asking the question might be to start with a random sequence
of drops, so that a state consists of:
         a valid (i.e. complete) sequence of drops,
         the time taken to complete the run for that state,
         ids of all the states reached by swapping the positions of two drops in the sequence.
This has some benefits – for example it gives us a natural source of information about
sequences (their total lengths) that we can use to guide our search.
Again genetic algorithms and other metaheuristics are a natural way to tackle large instances.
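
To give a flavour of working with whole candidate solutions, here is a sketch using the swap idea above. It is a plain hill-climber, not a genetic algorithm: it keeps moving to the best neighbouring sequence for as long as that shortens the run. The travel times are invented for the example.

```python
from itertools import combinations

# Hypothetical travel times in hours between the depot and three drops (symmetric).
HOURS = {
    ("depot", "A"): 1.0, ("depot", "B"): 2.0, ("depot", "C"): 3.0,
    ("A", "B"): 1.5, ("A", "C"): 2.5, ("B", "C"): 1.0,
}


def travel(a, b):
    return HOURS[(a, b)] if (a, b) in HOURS else HOURS[(b, a)]


def run_time(sequence):
    """Total hours for depot -> drops in the given order -> depot."""
    stops = ["depot", *sequence, "depot"]
    return sum(travel(a, b) for a, b in zip(stops, stops[1:]))


def neighbours(sequence):
    """Every sequence reached by swapping the positions of two drops."""
    result = []
    for i, j in combinations(range(len(sequence)), 2):
        swapped = list(sequence)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        result.append(tuple(swapped))
    return result


def hill_climb(sequence):
    """Keep moving to the best neighbour while that shortens the run."""
    current = tuple(sequence)
    while True:
        candidates = neighbours(current)
        if not candidates:                 # fewer than two drops: nothing to swap
            return current
        best = min(candidates, key=run_time)
        if run_time(best) >= run_time(current):
            return current
        current = best


result = hill_climb(("C", "A", "B"))
print(result, run_time(result))
```

Starting from C, A, B (9 hours) a single swap gets the run down to 6.5 hours, after which no swap improves it. On bigger problems a hill-climber like this can get stuck on a sequence that is good but not the best.
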
