```
CISC 886: MultiAgent Systems
Fall 2004

Search Algorithms for
Agents

Sachin Kamboj
Outline
   Introduction
   Path-Finding Problems
   Formal Definition
   Asynchronous Dynamic Programming
   Learning Real Time A*
   Moving Target Search
   Real-Time Bidirectional Search
   Constraint Satisfaction Problems
   Formal Definition
   Filtering Algorithm
   Hyper-Resolution Based Consistency Algorithm
   Asynchronous Backtracking
   Distributed Constraint Optimization Problems
   OptAPO (OPTimal Asynchronous Partial Overlay)
Introduction
   Search:
   an umbrella term for various problem solving techniques
in AI
   used when the sequence of actions required for solving a
problem is not known a priori
   hence trial and error exploration of the alternatives is
required
   Search algorithms are designed to solve three classes
of problems:
   Path-finding problems
   Constraint satisfaction problems
   Competitive games
Introduction
   A whole set of search algorithms exist for single
agents
   have known properties (like time and space complexity).
   have been used effectively to solve a large number of AI
problems.
   Examples: BFS, DFS, Branch and Bound, A*
   So, why use multiple agents?
   Agents have limited rationality
   search is often intractable
   may not have a complete picture of the problem
   may not have the required computational capability
   Agents may be self interested
Introduction
   Approach
   If we represent the search problem as a graph, we
can solve it by accumulating local computations
for each node in the graph.
   Local computations can be executed asynchronously
and concurrently

[Figure: a graph whose nodes are divided among Agent 1, Agent 2, and Agent 3]
Introduction
   Properties of these asynchronous search algorithms:
   Local computations needed will fit within the
limited rationality of the agents
   Execution order of these algorithms can be highly
flexible and arbitrary
Path Finding Problems
Example 1: Finding a path through a Maze
[Figure: a maze with the Start and Goal positions marked]
Example 2: Solving the 8-puzzle problem
[Figure: a sequence of 8-puzzle board configurations leading from the Initial State to the Goal State]
Formal Definition
   A path finding problem consists of the
following components:
   A set of nodes, N, each representing a state
   A set of directed links, L, each representing an
operator available to a problem solving agent
   A unique start state, S
   A set of goal states, G
   A set of weights, W, associated with each link
   represent the cost of applying the operator
   called the “distance” between the nodes
   Neighbors are nodes that have directed links
between them
Principle of Optimality
   States that a path is optimal if and only if
every segment of it is optimal
Asynchronous Dynamic Programming
   Let:
   h*(i) = shortest distance from node i to the goal
   k(i,j) = cost of link between i and j
   f*(j) = shortest distance from node i to goal via a
neighboring node j
f*(j) = k(i,j) + h*(j)
   By the principle of optimality:
h*(i) = min_j f*(j)
   Asynchronous dynamic programming computes
h* by repeating the local computations of each
node
Asynchronous Dynamic Programming
   Assumes the following situation:
   For each node, i, there exists a process
corresponding to i
   Each process records h(i), which is the estimated
value of h*(i).
   The initial value of h(i) is arbitrary (e.g., ∞, 0) except for
the goal nodes
   For each goal node g, h(g) is 0.
   Each process can refer to h values of neighboring
nodes (via shared memory or message passing)
Asynchronous Dynamic Programming
   Each process updates h(i) by the following
procedure:
   For each neighboring node j:
   Compute f(j) = k(i,j) + h(j) where
   h(j) is the current estimated distance from j to a goal node
   k(i,j) is the cost of the link from i to j
   Update h(i) as follows:
   h(i) ← min_j f(j)
Asynchronous Dynamic Programming
   Example:
[Figure: an example graph with link costs and h values on the nodes between the initial state and the goal state]
Asynchronous Dynamic Programming
   Is the algorithm complete?
   Yes
   Is the algorithm optimal?
   Yes
   Are there any problems?
   cannot be used for reasonably large path-finding
problems
   we cannot afford to have processes for all the nodes
Learning Real-Time A*
   Used when:
   only one agent is present
   it is not possible to perform local computations for all nodes
   planning and execution need to be
interleaved
   In this algorithm:
   the agent selectively executes the computations
for the current node
   the agent repeats the following procedure:
   Lookahead: calculate f(j) = k(i,j) + h(j) for each neighbor j of the current node i
   Update: the estimate of node i as h(i) ← min_j f(j)
   Action Selection: Move to the neighbor j that has the
minimum f(j) value. Ties are broken randomly
Learning Real-Time A*
   Requirement:
   the initial value of h must be optimistic, i.e.
h(i) ≤ h*(i)
   Is the algorithm complete?
   Yes: in a finite graph with positive link costs, in
which there exists a path from every node to a goal node,
and starting with non-negative initial estimates, LRTA* will
eventually reach a goal node
   Is the algorithm optimal?
   Requires repeated trials for optimality
   If the initial estimates are admissible, then over repeated
problem solving trials, the values learned by LRTA* will
eventually converge to their actual distances along every
optimal path to the goal node
Moving Target Search
   Allows the goal state to change during the
course of the search
   For example, a robot’s task is to reach
another robot which is in fact moving as well
   The target robot may
   cooperatively try to reach the problem solving robot
   actively avoid the problem solving robot
   move independent of the problem solving robot
   In order to guarantee success, the problem
solver must be able to move faster than the
target
Moving Target Search
   Is a generalization of LRTA*
   The algorithm:
   does NOT maintain a single heuristic of the
distance to the target goal
   instead tries to acquire heuristic information for
each potential target location.
   Thus, MTS maintains a matrix of heuristic values,
representing the function h(x,y) for all pairs of states x
and y
   The matrix is updated on each move of the problem
solver and the target.
Moving Target Search
   Let xi and xj be the current and neighboring
positions of the problem solver and yi and yj
be the current and neighboring positions of
the target.
   Assume all edges in the graph have unit
cost
   When the problem solver moves:
1.   Calculate h(xj,yi) for each neighbor xj of xi.
2.   Update the value of h(xi,yi) as follows:
h(xi,yi) ← max ( h(xi,yi) , min_{xj} { h(xj,yi) + 1 } )
3.   Move to the neighbor xj with the minimum h(xj,yi), i.e.
assign the value of xj to xi. Ties are broken randomly.
Moving Target Search
   When the target moves:
1.   Calculate h(xi,yj) for the target’s new position yj.
2.   Update the value of h(xi,yi) as follows:
h(xi,yi) ← max ( h(xi,yi) , h(xi,yj) – 1 )
3.   Reflect the target’s new position as the new goal of the
problem solver, i.e. assign the value of yj to yi.
   Is the algorithm complete?
   Yes: a problem solver executing MTS is
guaranteed to eventually reach the target
   Is the algorithm optimal?
   No
Real-Time Bidirectional Search
   Two problem solvers starting from the initial and
goal states physically move towards each other.
   Planning and execution are interleaved
   The following steps are repeatedly executed until
the two problem solvers meet in the problem space:
1.   Control Strategy: Select a forward move (step 2) or a backward
move (step 3)
2.   Forward Move: The problem solver starting from the
initial state (i.e. the forward problem solver) moves
towards the problem solver starting from the goal state.
3.   Backward Move: The problem solver starting from the
goal state (i.e. the backward problem solver) moves
towards the problem solver starting from the initial state.
Real-Time Bidirectional Search
   Can be classified into two categories:
   Centralized RTBS
   The best action is selected among all possible moves of
the two problem solvers
   The control strategy selects which of the two problem
solvers to run depending on what the best action is
   Two centralized RTBS algorithms (based on LRTA* and
RTA*) can be implemented
   Decoupled RTBS
   The two problem solvers independently make their own
decisions.
   The control strategy alternately runs the forward and
backward problem solvers
   MTS can be used for implementing decoupled RTBS.
Constraint Satisfaction
Problems
Example 1: Scheduling a set of tasks
   A set of exams needs to be scheduled during
the last week of December. No more than 5
exams can be scheduled on a Tuesday and no
more than 7 exams on any other day, …
Example 2: Graph-Coloring Problem
[Figure: a graph with nodes x1, x2, x3, and x4, each with the domain { red, blue, yellow }]

   Objective:
   To paint the nodes of a graph so that any two nodes
connected by a link do not have the same color.
    Each node has a finite number of possible colors
Formal Definition
   A constraint satisfaction problem consists of:
   A set of n variables V = {x1, x2, …, xn }
   Discrete, finite domains for each of the variables D = {
D1, D2, …, Dn }
   A set of constraints on the value of the variables.
   The constraints are defined by predicates
pk(xk1, xk2, …, xkj), where each pk is a function
pk : Dk1 × Dk2 × … × Dkj → {0, 1}.
   The problem is to find an assignment of values to the
variables such that all the constraints are satisfied.
   Constraint satisfaction is NP-complete in general
   A trial and error exploration of alternatives is inevitable
Relation to DAI
   We assume that the variables of the CSP are
distributed amongst multiple agents.
   Many application problems in DAI can be
formalized as distributed constraint satisfaction
problems.
   For example:
   interpretation problems
   assignment problems, and
   multiagent truth maintenance problems
   For simplicity, we assume an agent for each variable
in all the algorithms
Filtering Algorithm
   Each agent communicates its domain to its neighbors and then
removes from its domain the values that cannot satisfy the
constraints.
   More specifically, a process (agent), xi performs the
following procedure revise(xi,xj) for each neighbor xj.
procedure revise (xi, xj)
  for all vi ∈ Di do
    if there is no value vj ∈ Dj such that vj is
       consistent with vi
    then delete vi from Di;
    end if;
  end do;
   If some value of the domain is removed by performing the
procedure revise, process xi sends the new domain to its
neighboring processes.
   If a new domain is received from a neighbor, call procedure
revise again.
Filtering Algorithm
   For example,
[Figure: a graph with x1 { red, blue, yellow }, x2 { red }, x3 { blue }, and x4 { red, blue, yellow }]

   As a result of the filtering algorithm, x1 will
remove red and blue from its domain and x4
will remove blue from its domain.
Filtering Algorithm
   If the domain of some variable becomes the empty
set:
   the problem is over-constrained and has no solution
   If each domain has a unique value:
   the assignment of the unique values to the variables is a
solution.
   If there exist multiple values for some variable:
   we cannot tell whether the problem has a solution or not
   further trial and error search is required to find a solution
   Filtering algorithms cannot solve CSPs in
general
   This algorithm is used as a preprocessing procedure
before the application of some other method.
Hyper-Resolution Based Consistency Algorithm
   All constraints are represented as a “nogood”
   a prohibited combination of variable values.
   For example, in the figure below:
[Figure: a graph with nodes x1, x2, and x3, each with the domain { red, blue }]
   A constraint between x1 and x2 can be represented using two
nogoods:
   {x1 = red, x2 = red}
   {x1 = blue, x2 = blue}
   The algorithm uses several existing nogoods and the domain
of a variable to generate a new nogood.
Hyper-Resolution Based Consistency Algorithm
   For example, using the nogoods:
   {x1 = red, x2 = red}
   {x1 = blue, x3 = blue}
and the domain of x1 {red, blue}, a new nogood:
   {x2 = red, x3 = blue}
is generated
   The hyper-resolution rule is described as follows:
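\begin{array}{l}
A_1 \lor A_2 \lor \dots \lor A_m \\
\lnot (A_1 \land A_{11} \land \dots) \\
\lnot (A_2 \land A_{21} \land \dots) \\
\quad \vdots \\
\lnot (A_m \land A_{m1} \land \dots) \\
\hline
\lnot (A_{11} \land \dots \land A_{21} \land \dots \land A_{m1} \land \dots)
\end{array}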
Asynchronous Backtracking
   Asynchronous version of a backtracking algorithm
   standard method for solving CSPs
   Each variable/process is assigned a priority
   usually based on the alphabetical order of the variable identifiers
   Each process selects a random value from its domain
   Each process communicates its tentative variable assignments
to its neighboring processes.
   If the current value of a process is not consistent with the
assignment of higher priority processes, the process changes
its value
   If no consistent value exists, generate a new nogood and send it to the
higher priority process
   On receiving a nogood, the higher priority process changes its value.
   Each process maintains the current variable assignments of
other processes in its local_view.
   May contain obsolete information.
Asynchronous Backtracking
   Two main types of messages are
communicated:
   ok? messages to communicate the current value
   nogood messages to communicate a new nogood
   Example:
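[Figure: processes x1 with domain { 1, 2 }, x2 with domain { 2 }, and x3 with domain { 1, 2 }.
x1 and x2 send (ok? (x1, 1)) and (ok? (x2, 2)) to x3, so x3 has local_view {(x1, 1), (x2, 2) }.
x3 sends (nogood {(x1, 1), (x2, 2) }) to x2, which has local_view {(x1, 1) }.
x2 then sends (nogood {(x1, 1) }) to x1.]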
Distributed Constraint Optimization Problems
   Are a generalization of constraint satisfaction problems
   Like DCSP, DCOP includes a set of variables:
   each variable is assigned to an agent that has control over its value
   In DCSP
   the agents assign values to variables so as to satisfy the constraints on
them
   In DCOP
   the agents must coordinate their choice of values so that a global
objective function is optimized.
   Applications of DCOP:
   Multiagent Teamwork
   Distributed Scheduling
   Distributed Sensor Networks
Distributed Constraint Optimization Problems

   Formal Definition
   A distributed constraint optimization problem consists of:
   A set of n variables V = {x1, x2, …, xn }
   Discrete, finite domains for each of the variables D = { D1, D2,
…, Dn }
   A set of cost functions f = {f1, …, fm},
   where each fi is a function
fi : Di1 × Di2 × … × Dij → N ∪ {∞}.
   The problem is to find an assignment A* = {d1, …, dn | di ∈ Di}
such that the global cost, F, is minimized.
   F is defined as follows:
F(A) = \sum_{i=1}^{m} f_i(A)
Distributed Constraint Optimization Problems
   Design Criteria for DCOP algorithms:
   Agents should be able to optimize a global
function in a distributed fashion using only local
communication
   The agents should operate asynchronously
   agents should not sit idle waiting for a particular
message from a particular agent
   The algorithm should provide provable quality
guarantees on system performance
ADOPT (Asynchronous Distributed OPTimization)
   Generalization of Asynchronous Backtracking
   with a bunch of performance tweaks.
   Starts by assigning a priority to the agents based on a
depth-first search tree
   each node has a single parent and multiple children
   parents have higher priority than the children
   hence, does not require a linear priority ordering on the
agents
   Constraints are only allowed between a node and any
of its ancestors and descendants
   there can be no constraints between different subtrees of
the DFS tree
   this is not a restriction on the constraint network itself
   Example:

[Figure: a constraint graph over x1, x2, x3, and x4 (left) and the corresponding DFS tree (right)]
   The algorithm begins with all agents choosing their values
concurrently
   The algorithm uses three types of messages:
   VALUE Messages:
   used to send the current selected value of the variable to the
descendants below the node in the DFS tree
   similar to ok? messages in ABT
   THRESHOLD Messages:
   are only sent by a parent to its immediate children
   contain a single number which represents the backtrack threshold
   COST Messages:
   are a generalization of nogood messages in ABT
   contain the current context (same as in ABT) and the lb and the
ub.
   The algorithm calculates the local cost using the
formula:
\delta(d_i) = \sum_{(x_j, d_j) \in CurrentContext} f_{ij}(d_i, d_j)
where δ(di) is the local cost at xi when xi chooses di.
   This formula is used to calculate the cost of a node only
on the basis of the constraints that the node shares with its
ancestors (NOT its children)
    This is because the current context is built from the VALUE
messages received from its ancestors
   The node (xi) also calculates LB and UB
    The idea is that LB and UB are the lower and upper bounds on
the cost seen so far for the subtree rooted at xi.
   For a leaf node,
   lb(di) = ub(di) = δ(di)
   For any other node,
\forall d \in D_i: \; lb(d) = \delta(d) + \sum_{x_l \in Children} lb(d, x_l)

   For all nodes:
LB = \min_{d \in D_i} lb(d)
   Similar for UB
   By keeping a track of LB and UB, the agent knows
the current lower bound and upper bound on cost in
the subtrees
   The algorithm uses a threshold value to decide
when to backtrack
OptAPO
   OPTimal Asynchronous Partial Overlay
   used to increase the efficiency of previous DCOP algorithms
   previous DCOP algorithms were based on a total
separation of the agents' knowledge during the
problem solving process
   is based on a partial centralization technique
called cooperative mediation
   allows the agents to extend and overlap the context
that they use for making their local decisions
OptAPO
   When an agent acts as a mediator, it
   computes a solution to a portion of the overall problem
   recommends value changes to the agents involved
in the mediation session
Questions?

```
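The LRTA* procedure from the Learning Real-Time A* slides (lookahead, update, action selection) can be sketched in Python as follows. This is only an illustration under stated assumptions: the four-node graph, the function name `lrta_star_trial`, and the `max_steps` guard are hypothetical and not part of the original slides. All initial estimates are 0, which satisfies the optimistic requirement h(i) ≤ h*(i).

```python
import random


def lrta_star_trial(graph, h, start, goals, rng, max_steps=1000):
    """Run one LRTA* trial from start; updates h in place and returns the path."""
    i = start
    path = [i]
    for _ in range(max_steps):
        if i in goals:
            return path
        # Lookahead: f(j) = k(i, j) + h(j) for each neighbor j of i.
        # Assumes every non-goal node has at least one neighbor.
        f = {j: k_ij + h[j] for j, k_ij in graph[i].items()}
        best = min(f.values())
        # Update: h(i) <- min_j f(j).
        h[i] = best
        # Action selection: move to a neighbor with minimum f(j),
        # breaking ties randomly.
        i = rng.choice([j for j, f_j in f.items() if f_j == best])
        path.append(i)
    raise RuntimeError("goal not reached within max_steps")


# Hypothetical example graph: GRAPH[i][j] is the link cost k(i, j).
GRAPH = {
    "a": {"b": 1, "c": 2},
    "b": {"a": 1, "d": 3},
    "c": {"a": 2, "d": 1},
    "d": {},
}
h = {node: 0 for node in GRAPH}  # optimistic initial estimates
rng = random.Random(0)
for _ in range(20):  # repeated trials refine the estimates
    path = lrta_star_trial(GRAPH, h, "a", {"d"}, rng)
print(path, h)
```

In this small graph the estimates for "a" and "c" rise to 3 and 1, their true distances along the optimal path a, c, d, while the estimate for "b" may remain below its true distance because "b" is off the optimal path.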
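The `revise` procedure from the Filtering Algorithm slides can be sketched in Python as follows. This is a sequential simulation of the message exchange, not a distributed implementation: a queue of (xi, xj) pairs stands in for the "send the new domain to neighboring processes" step. The domains come from the graph-coloring example in the slides, but the edge set (x1-x2, x1-x3, x3-x4) is an assumption, since the original figure did not survive; it was chosen to be consistent with the result stated on the slide.

```python
from collections import deque


def revise(domains, xi, xj, consistent):
    """Delete values of xi that no value of xj supports; return True if Di changed."""
    removed = {
        vi for vi in domains[xi]
        if not any(consistent(vi, vj) for vj in domains[xj])
    }
    domains[xi] -= removed
    return bool(removed)


def filtering(domains, neighbors, consistent):
    """Repeat revise until no domain changes."""
    queue = deque((xi, xj) for xi in neighbors for xj in neighbors[xi])
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, xi, xj, consistent):
            # xi's domain shrank: each neighbor must run revise against xi again.
            queue.extend((xk, xi) for xk in neighbors[xi])
    return domains


domains = {
    "x1": {"red", "blue", "yellow"},
    "x2": {"red"},
    "x3": {"blue"},
    "x4": {"red", "blue", "yellow"},
}
# Assumed edges (hypothetical): x1-x2, x1-x3, x3-x4.
neighbors = {
    "x1": ["x2", "x3"],
    "x2": ["x1"],
    "x3": ["x1", "x4"],
    "x4": ["x3"],
}
# Graph coloring: linked nodes must take different colors.
result = filtering(domains, neighbors, lambda a, b: a != b)
print(result)
```

Under the assumed edges, x1 loses red and blue and x4 loses blue, as described in the slides. Because x4 still has two values afterwards, filtering alone does not decide the problem and further search would be needed.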