# Graphical Models, Distributed Fusion, and Sensor Networks


```
Distributed Fusion in Sensor Networks:
A Graphical Models Perspective

Alan S. Willsky

May 2006
Why do graphical models have anything to do
with distributed fusion and sensor networks?
Sets of variables defined on nodes of a
“network” (graph)
And inference algorithms for “fusing”
information throughout the graph have
evocative structure (e.g., message-passing)
But there are special issues in sensor
networks that add some twists and require
some thought
And that also lead to new results for graphical
models more generally
Graphical Models 101
G = (V, E) = a graph
V = set of vertices
E ⊂ V × V = set of edges
𝒞 = set of cliques
Markovianity on G (Hammersley-Clifford):
P({ x_s : s ∈ V }) ∝ ∏_{C ∈ 𝒞} ψ_C(x_C)

• Objectives
Estimation:   compute the marginals P_s(x_s)
Optimization: compute arg max over x of P({ x_s : s ∈ V })
Algorithms that do this on trees
Message-passing algorithms for “estimation”
(marginal computation)
Two-sweep algorithms (leaves-root-leaves)
For linear/Gaussian models, these are the
generalizations of Kalman filters and smoothers
Belief propagation, sum-product algorithm
Non-directional (no root; all nodes are equal)
Lots of freedom in message scheduling
Message-passing algorithms for
“optimization” (MAP estimation)
Two sweep: Generalization of Viterbi/dynamic
programming
Max-product algorithm

I.  Message Product: Multiply the incoming messages (from all neighbors
    of t other than s) with the local observation to form a distribution
    over x_t
II. Message Propagation: Transform the distribution from node t to node s
    using the pairwise interaction potential ψ(x_t, x_s)
    Integrate over x_t to form a distribution summarizing x_s
What do people do when
there are loops?
One well-oiled approach
Belief propagation (and max-product) are algorithms
whose local form is well defined for any graph
So why not just use these algorithms?
Well-recognized limitations
The algorithm fuses information based on invalid
assumptions of conditional independence
Think Chicken Little, rumor propagation,…
Do these algorithms converge?
If so, what do they converge to?
There are also other, newer algorithms with
implications for sensor networks
A first class of applications
Mapping of spatial fields (e.g., in
environmental networks)
An irregular set of locations is sensed
Extensions to multiresolution data are "easy"
using multiresolution graphs
MRF model for the underlying phenomenon
Approaches
One based on “walk-sums” as messages
take hikes through the network
Recursive cavity modeling
Alternate approach to approximate
inference: Recursive Cavity Models
Recursive Cavity Modeling:
Remote Sensing Application
Another example: Sensor
Localization and Calibration
Variables at each node can include
Node location, orientation, time offset
Sources of information
Priors on variables (single-node potentials)
Time of arrival (1-way or 2-way), bearing, and absence of signal
These enter as edge potentials
Modeling absence of signals may be needed for well-posedness
Nonparametric Inference for General Graphs
Belief Propagation                           Particle Filters
• General graphs                             • Markov chains
• Discrete or Gaussian                       • General potentials

Nonparametric BP
• General graphs
• General potentials

Problem: What is the product of two
collections of particles?
Nonparametric BP

Stochastic update of kernel-based messages:

I.  Message Product: Draw samples of x_t from the product of all incoming
    messages and the local observation potential
II. Message Propagation: Draw samples of x_s from the compatibility
    function ψ(x_t, x_s), fixing x_t to the values sampled in step I

Samples form a new kernel density estimate of the outgoing message
(determine new kernel bandwidths)
NBP particle generation

Dealing with the explosion of terms in
products
How do we sample from the product
without explicitly constructing it?
The key issue is solving the label
sampling problem (which kernel)
Solutions that have been developed
involve
Multiresolution Gibbs sampling using KD-trees
Importance sampling
Example

[Figure: sensor localization on a "1-step" graph and a "2-step" graph,
comparing nonlinear least-squares with NBP on each]
Yet another example: Data
association
Setting up graphical models
Different cases
Cases in which we know which targets are seen by
which sets of sensors
Cases in which we aren’t sure how many or which
targets fall into regions covered by specific subsets
of sensors
Constructing graphical models that are as
sensor-centric as possible
Very different from centralized processing
Each sensor is a node in the graph (variable =
assigning measurements to targets or regions)
Introduce region and target nodes only as needed in
order to simplify message passing (pairwise cliques)
Communications-sensitive
message-passing
Objective:
Provide each node with computationally simple (and
completely local) mechanism to decide if sending a
message is worth it
Need to adapt the algorithm in a simple way so that
each node has a mechanism for updating its beliefs
when it doesn’t receive a full set of messages
Simple rule:
Don’t send a message if the K-L divergence from the
previous message falls below a threshold
If a node doesn't receive a message, it reuses the last one it
received (which requires a bit of memory: each node stores the most
recent message on each edge)
Illustrating comms-sensitive
message-passing dynamics

[Figure: organized-network data association vs. self-organization
with a region-based representation]
Incorporating time, uncertain
organization, and beating the dealer
Add nodes that allow us to separate target
dynamics from discrete data associations

Perform explicit data association within each frame
(using evidence from other frames)
Stitch across time through temporal dynamics
Trading off bits for fusion accuracy
What we need is an audit trail
Message accuracy to fusion accuracy
Bits versus message accuracy
The first link (message accuracy to fusion accuracy)
Exploit a notion of “dynamic range” of message “errors”
The key is defining a metric that is subadditive (for the “product”)
and contractive or mixing (for the “sum”)
Leads to precise bounds (and BP convergence results) and to
approximations (when we view errors as quantization noise)
The second link (bits versus message accuracy)
Exploit multiresolution representation of messages and
quantify bits versus message error as a function of resolution
Experiments

Relatively weak potential functions:
Loopy BP guaranteed to converge
Bound and estimate behave similarly

Stronger potentials:
Loopy BP not guaranteed to converge
Estimate may still be useful
Multiresolution communication
of particle-based messages
KD-trees
Tree-structure successively divides point sets
Typically along some cardinal dimension
Cache statistics of subsets for fast
computation
Example: cache means and covariances
Can also be used for approximation…
Any cut through the tree is a density estimate
Easy to optimize over possible cuts
Communications cost
Upper bound on error (KL, max-log, etc.)
For localization and tracking this leads to
Send multimodal distributions early
Send coarser, single modes once a node or
track is localized
How can we take objectives of
other nodes into account?
Rapprochement of two lines of inquiry
Decentralized detection
Message passing algorithms for graphical models
Lots to do, but what we know:
When there are comms constraints and both local and global
objectives, optimal design requires the sensing nodes to organize
Doing this corresponds to optimizing the non-fixed parts of a
graphical model for the phenomenon, sensors, and fusion network
This organization specifies a protocol for generating and
interpreting messages
Organization uses comms (there’s a cost to being ad hoc)
Person-by-person optimal organization requires message passing
Avoiding the traps of optimality for decentralized detection for
complex networks requires careful thought
A tractable and instructive case:
Directed acyclic networks
Each node receives one or more bits of information from its “parents”
and sends one or more bits to its “children”
Underlying phenomenon and “cost” have compatible structure
Each node senses a “local” part of the overall phenomenon
Overall cost is sum of local decision costs and costs for communication
Person-by-person optimality via message-passing
At each optimization stage, each node needs:
A pdf for the bits it will receive from its parent (this is the information push
part of the protocol: What does it mean if I receive a “1”?)
A downstream “cost-to-go” (this is the information pull part of the protocol: I’ll
tell you what’s important to me)
Generalizing this requires avoiding the same traps as those
encountered in specifying graphical models
Avoiding the NP-hard trap of exact evaluation (and optimization) of
expectations
Dealing with Limited Power: Sensor
So where are we going?
Information science in the large
These problems are not problems in signal
processing, computing, or information theory alone
They are problems in all of these fields
And we’ve just scratched the surface
Why should the graph of the phenomenon be the
same as the sensing/communication network?
What if we send more complex messages with
protocol bits (e.g., to overcome BP over-counting)?
What if nodes develop protocols to request
messages?
In this case “no news” IS news…

```