# Defense: Change Detection in Stochastic Shape Dynamical Models with Application to Activity Recognition

Namrata Vaswani
Problem Formulation
• The Problem:
– Model activities performed by a group of moving and
interacting objects (which can be people or vehicles
or robots or diff. parts of human body). Use the
models for abnormal activity detection and tracking

• Our Approach:
– Treat objects as point objects: “landmarks”.
– Changing configuration of objects: deforming shape
– 'Abnormality': change from learnt shape dynamics

• Related Approaches for Group Activity:
– Co-occurrence statistics, Dynamic Bayes Nets
The Framework
• Define a Stochastic State-Space Model (a continuous
state HMM) for shape deformations in a given activity,
with shape & scaled Euclidean motion forming the
hidden state vector and configuration of objects forming
the observation.

• Use a particle filter to track a given observation
sequence, i.e. estimate the hidden state given
observations.

• Define Abnormality as a slow or drastic change in the
shape dynamics with unknown change parameters. We
propose statistics for slow & drastic change detection.
Overview
• The Group Activity Recognition Problem

• Slow and Drastic Change Detection

• Landmark Shape Dynamical Models

• Applications, Experiments and Results

• Principal Components Null Space Analysis

• Future Directions & Summary of Contributions
A Group of People: Abnormal Activity Detection
[Video frames: Normal Activity (left) vs. Abnormal Activity (right)]

Human Action Tracking
[Video frames. Cyan: Observed, Green: Ground Truth, Red: SSA, Blue: NSSA]
The Problem
• General Hidden Markov Model (HMM): Markov state
sequence {Xt}, Observation sequence {Yt}
Yt  h( X t )  wt , X t  f ( X t 1 )  nt , {wt }, {nt } indep.

• Finite duration change in system model which causes a
permanent change in probability distribution of state
• Change is slow or drastic: Tracking Error & Observation
Likelihood do not detect slow changes. Use dist. of Xt
• Change parameters unknown: use Log-Likelihood of the state, LL(Xt)
• State is partially observed: use MMSE estimate of LL(Xt)
given observations, E[LL(Xt)|Y1:t] = ELL
• Nonlinear dynamics: Particle filter to estimate ELL
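As a toy illustration of this setup, one can simulate a scalar nonlinear HMM of the form Yt = h(Xt) + wt, Xt = f(Xt-1) + nt; the particular f, h, and noise levels below are illustrative choices, not the models used later in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_hmm(T, f, h, sig_n=0.1, sig_w=0.3, x0=0.0):
    """Simulate X_t = f(X_{t-1}) + n_t, Y_t = h(X_t) + w_t."""
    X = np.empty(T)
    Y = np.empty(T)
    x = x0
    for t in range(T):
        x = f(x) + sig_n * rng.standard_normal()     # state transition
        X[t] = x
        Y[t] = h(x) + sig_w * rng.standard_normal()  # noisy observation
    return X, Y

# Illustrative nonlinear dynamics and observation maps
X, Y = simulate_hmm(100, f=lambda x: 0.9 * x + 0.1 * np.sin(x), h=np.tanh)
print(X.shape, Y.shape)
```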
The General HMM

State:        Xt --- Qt(Xt+1|Xt) ---> Xt+1 ---> Xt+2 ---> ...
              |                       |         |
              | ψt(Yt|Xt)             |         |
              v                       v         v
Observation:  Yt                      Yt+1      Yt+2
Related Work
• Change Detection using Particle filters, unknown change
parameters
–   CUSUM on Generalized LRT (Y1:t) : assumes finite parameter set
–   Modified CUSUM statistic on Generalized LRT (Y1:t)
–   Testing if {uj =Pr(Yt<yj|Y1:t-1) } are uniformly distributed
–   Tracking Error (TE): error b/w Yt & its prediction based on past
–   Threshold negative log likelihood of observations (OL)

• All of these approaches use observation statistics: Do not
detect slow changes
– PF is stable and hence is able to track a slow change

• Average Log Likelihood of i.i.d. observations used often
– But ELL = E[-LL(Xt)|Y1:t] (MMSE of LL given observations) in
context of general HMMs is new
Particle Filtering

Aim: Evaluate the filtering distribution, π_{t|t}^N(dx) ≈ π_{t|t}(dx) = Pr(Xt ∈ dx | Y1:t), ∀t
Also get the prediction distribution, π_{t|t-1}^N(dx) ≈ Pr(Xt ∈ dx | Y1:t-1), ∀t

1. Initialization: Generate N Monte Carlo samples from the initial prior, π_{0|0}:
   π_{0|0}^N(dx) = (1/N) Σ_{i=1}^N δ_{x0^(i)}(dx),   x0^(i) ~ π_{0|0}(dx)

2. Prediction: Generate samples from the state transition kernel at t:
   π_{t|t-1}^N(dx) = (1/N) Σ_{i=1}^N δ_{x̃t^(i)}(dx),   x̃t^(i) ~ Qt(· | x_{t-1}^(i))

3. Update: Weight each sample by the probability of the observation given the sample:
   (a) π̃_{t|t}^N(dx) = Σ_{i=1}^N wt^(i) δ_{x̃t^(i)}(dx),   wt^(i) = ψt(Yt | x̃t^(i)) / Σ_{j=1}^N ψt(Yt | x̃t^(j))
   (b) π_{t|t}^N(dx) = (1/N) Σ_{i=1}^N δ_{xt^(i)}(dx),   xt^(i) ~ Multinomial({x̃t^(i), wt^(i)}_{i=1}^N)   (resample step)

4. Set t ← t + 1, go to step 2
Change Detection Statistics
• Slow Change: Propose Expected Log Likelihood (ELL)
– ELL = Kerridge Inaccuracy b/w πt (posterior) and pt0 (prior)
ELL(Y1:t) = E[-log pt0(Xt)|Y1:t] = E_{πt}[-log pt0(Xt)] = K(πt : pt0)

• A sufficient condition for “detectable changes” using ELL
– E[ELL(Y1:t0)] = K(pt0:pt0)=H(pt0), E[ELL(Y1:tc)]= K(ptc:pt0)
– Chebyshev Inequality: With false alarm & miss probabilities of 1/9,
ELL detects all changes s.t.
K(ptc:pt0) -H(pt0)>3 [√Var{ELL(Y1:tc)} +√Var{ELL(Y1:t0)}]

• Drastic Change: ELL does not work, use OL or TE
– OL: Neg. log of current observation likelihood given past
OL = -log [Pr(Yt|Y1:t-1, H0)] = -log [<p_{t|t-1}, ψt>]
– TE: Tracking Error. If white Gaussian observation noise, TE ≈ OL
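Both statistics can be estimated directly from the particle approximation: ELL as the weighted average of -log pt0 over the posterior particles, and OL as the negative log of the average observation likelihood over the predicted particles. The standard-normal prior pt0 and tanh observation model below are illustrative assumptions.

```python
import numpy as np

def ell_stat(parts, w, log_p0):
    """ELL = E_pi[-log p_t^0(X_t)], from weighted posterior particles."""
    return float(np.sum(w * (-log_p0(parts))))

def ol_stat(y, pred_parts, sig_w=0.3, h=np.tanh):
    """OL = -log <p_{t|t-1}, psi_t>, from the predicted particle cloud."""
    lik = np.exp(-0.5 * ((y - h(pred_parts)) / sig_w) ** 2) \
          / (np.sqrt(2 * np.pi) * sig_w)
    return float(-np.log(np.mean(lik)))

rng = np.random.default_rng(2)
parts = rng.standard_normal(1000)             # stand-in posterior particles
w = np.full(1000, 1.0 / 1000)                 # uniform weights after resampling
log_p0 = lambda x: -0.5 * x**2 - 0.5 * np.log(2 * np.pi)  # N(0,1) prior, illustrative
print(ell_stat(parts, w, log_p0), ol_stat(0.1, parts))
```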
ELL & OL: Slow & Drastic Change
• ELL fails to detect drastic changes
– Approximating posterior for changed system observations using a
PF optimal for unchanged system: error large for drastic changes
– OL relies on the error introduced due to the change to detect it

• OL fails to detect slow changes
– Particle Filter tracks slow changes “correctly”
– Assuming change till t-1 was tracked “correctly” (error in posterior
small), OL only uses change introduced at t, which is also small
– ELL uses total change in posterior till time t & the posterior is
approximated “correctly” for a slow change: so ELL detects a slow
change when its total magnitude becomes “detectable”

• ELL detects change before loss of track, OL detects after
A Simulated Example
• Change introduced in system model from t=5 to t=15

[Plots: ELL statistic (left), OL statistic (right)]
Practical Issues
• Defining pt0(x):
– Use part of state vector which has linear Gaussian dynamics: can
define pt0(x) in closed form
OR
– Assume a parametric family for pt0(x), learn parameters using
training data

• Declare a change when either ELL or OL exceed their
respective thresholds.
– Set ELL threshold to a little above H(pt0)
– Set OL threshold to a little above E[OL0,0]=H(Yt|Y1:t-1)

• Single frame estimates of ELL or OL may be noisy
– Average the statistic or average no. of detects or modify CUSUM
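The averaging idea can be sketched as a windowed threshold test; the window length and threshold below are arbitrary illustrative values.

```python
import numpy as np

def detect(stat, threshold, win=5):
    """First time index at which the windowed average of the statistic
    exceeds the threshold (None if never); averaging smooths noisy
    single-frame values."""
    avg = np.convolve(stat, np.ones(win) / win, mode="valid")
    hits = np.nonzero(avg > threshold)[0]
    return int(hits[0] + win - 1) if hits.size else None

stat = np.r_[np.ones(20), 1 + 0.5 * np.arange(30)]  # slowly rising statistic
print(detect(stat, threshold=3.0))
```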
Change Detection

Yt → PF → prediction p_{t|t-1}^N and posterior p_{t|t}^N = πt^N

ELL = E_π[-log pt0(Xt)] > Threshold?   → Yes: Change (Slow)
OL = -log[<p_{t|t-1}, ψt>] > Threshold?   → Yes: Change (Drastic)
Approximation Errors
• Total error < Bounding error + Exact filtering error + PF error

– Bounding error: Stability results hold only for bounded functions but
LL is unbounded. So approximate LL by min{-log pt0(Xt), M}

– Exact filtering error: Error b/w exact filtering with changed
system model & with original model. Evaluating πtc,0 (using
Qt0 ) instead of πtc,c (using Qtc)

– PF Error: Error b/w exact filtering with original model & particle
filtering with original model. Evaluating πtc,0,N which is a Monte
Carlo estimate of πtc,0
Stability / Asymptotic Stability
• The ELL approximation error averaged over observation
sequences & PF realizations is eventually monotonically
decreasing (& hence stable), for large enough N if
– Change lasts for a finite time
– “Unnormalized filter kernels” are mixing
– Certain boundedness (or uniform convergence of bounded
approximation) assumptions hold

• Asymptotically stable if the kernels are uniformly “mixing”

• Use stability results of [LeGland & Oudjane]

• Analysis generalizes to errors in MMSE estimate of any
fn of state evaluated using a PF with system model error
“Unnormalized filter kernel” “mixing”
• “Unnormalized filter kernel”, Rt, is state transition kernel,Qt,
weighted by likelihood of observation given state
Rt(x, dx') = ψ_{t,Yt}(x') Qt(x, dx'),   ψ_{t,Yt}(x') = g(Yt | x')

• “Mixing”: measures the rate at which the transition kernel
“forgets” its initial condition or eqvtly. how quickly the state
sequence becomes ergodic. Mathematically,
Kernel K is mixing if ∃ ε > 0 and a nonneg. measure λ, s.t.
ε λ(A) ≤ K(x, A) ≤ (1/ε) λ(A),   ∀x ∈ E,   ∀ Borel subsets A ⊆ E

• Example [LeGland et al] : State transition, Xt =Xt-1+nt is not
mixing. But if Yt=h(Xt)+wt, wt is truncated noise, then Rt is
mixing
Complementary Behavior of ELL & OL
• ELL approx. error, e_t^{c,0}, is upper bounded by an increasing
function of OL_k^{c,0}, tc < k ≤ t:

e_t^{c,0} ≤ Σ_{k=tc}^{t} exp(OL_k^{c,0}) ζ_k(DQ,k)

• Implication: Assume a "detectable" change, i.e. ELL^{c,c} large
• OL fails => OL_k^{c,0}, tc < k ≤ t, small => ELL error e_t^{c,0}
small => ELL^{c,0} large => ELL detects
• ELL fails => ELL^{c,0} small => ELL error e_t^{c,0} large =>
at least one of OL_k^{c,0}, tc < k ≤ t, large => OL detects
“Rate of Change” Bound
• The total error in ELL estimation is upper bounded by
increasing functions of the “rate of change” (or “system
model error per time step”) with all increasing derivatives.

• OLc,0 is upper bounded by increasing function of “rate of
change”.

• Metric for “rate of change” (or equivalently “system model
error per time step”) for a given observation Yt : DQ,t is

DQ(Qt^c, Qt^0) = sup_x ∫_{x'∈E} ψ_{t,Yt}(x') |qt^c(x, x') − qt^0(x, x')| dx'
The Bound
Assume: change lasts a finite time, unnormalized filter kernels are mixing,
posterior state space is bounded. Then:

1. e_t^{c,0} ≤ M_t ζ_t^{c,0} / √N + M_t ε_t^{c,0} ≤ Φ(DQ,k, tc ≤ k ≤ t),
   Φ: increasing fn with increasing derivatives of all orders.
   Here ζ_t^{c,0} and ε_t^{c,0} satisfy recursions [3] driven by the mixing
   rates ε_k of the filter kernels and the per-step model error DQ,k.

2. OL_t^{c,0} ≤ log(C + D̃Q,t) ≤ Φ̃(DQ,k, tc ≤ k ≤ t),   Φ̃: increasing fn
Implications
• If change slow, ELL works and OL does not work
• ELL error can blow up very quickly as rate of change
increases (its upper bound blows up)
– A small error in both normal & changed system
models introduces less total error than a perfect
transition kernel for normal system & large error in
changed system
– A sequence of small changes will introduce less total
error than one drastic change of same magnitude
Possible Applications

• Abnormal activity detection, Detecting motion
disorders in human actions, Activity Segmentation

• Neural signal processing: detecting changes in stimuli

• Congestion Detection

• Video Shot change or Background model change
detection

• System model change detection in target tracking
problems without the tracker losing track
What is Shape?
• Shape is the geometric information that remains
when location, scale and rotation effects are filtered
out [Kendall]

• Shape of k landmarks in 2D
– Represent the X & Y coordinates of the k points as
a k-dimensional complex vector: Configuration
– Translation Normalization: Centered Configuration
– Scale Normalization: Pre-shape
– Rotation Normalization: Shape
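These three normalization steps can be sketched with complex-valued landmarks; the reference pre-shape used to fix rotation is an arbitrary illustrative choice.

```python
import numpy as np

def to_shape(config, ref):
    """Centered configuration -> pre-shape -> shape (rotation fixed
    by phase-aligning against a reference pre-shape)."""
    z = config - config.mean()           # translation normalization
    w = z / np.linalg.norm(z)            # scale normalization: pre-shape
    theta = np.angle(np.vdot(w, ref))    # vdot conjugates its first argument
    return w * np.exp(1j * theta)        # rotation normalization: shape

square = np.array([0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j])  # unit-square landmarks
ref = square - square.mean()
ref /= np.linalg.norm(ref)
s1 = to_shape(square, ref)
s2 = to_shape(3 * square * np.exp(1j * 0.7) + (2 - 1j), ref)  # moved copy
print(np.allclose(s1, s2))   # True: shape is invariant to the transform
```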
Related Work
• Related Approaches for Group Activity
– Co-occurrence Statistics
– Dynamic Bayesian Networks
– Shape for robot formation control

• Shape Analysis/Deformation:
–   Pairs of Thin plate splines, Principal warps
–   Active Shape Models: affine deformation in configuration space
–   'Deformotion': scaled Euclidean motion of shape + deformation
–   Piecewise geodesic models for tracking on Grassmann manifolds

• Particle Filters for Multiple Moving Objects:
– JPDAF (Joint Probability Data Association Filter): for tracking
multiple independently moving objects
Motivation
• A generic and sensor invariant approach for “activity”
– Only need to change observation model depending on the
“landmark”, the landmark extraction method and the sensor used
– Easy to fuse sensors in a Particle filtering framework

• “Shape”: invariant to translation, zoom, in-plane rotation

• Single global framework for modeling and tracking
independent motion + interactions of groups of objects
– Co-occurrence statistics: Req. individual & joint histograms
– JPDAF: Cannot model object interactions for tracking
– Active Shape Models: good for only approx. rigid objects

• Particle Filter is better than the Extended Kalman Filter
– Able to get back in track after a loss of track due to outliers
– Handles a multimodal system or observation process
Hidden Markov Shape Model
State: Xt = [Shape(zt), Shape Velocity(ct), Scale(st), Rotation(θt)]
State Dynamics: Xt → Xt+1

Observation Model (complex notation):
Yt = h(Xt) + wt = zt st e^{jθt} + wt,   wt: i.i.d. observation noise
Observation, Yt = Centered Configuration

Shape × Rotation (SO(2)) × Scale (R+) = Centered Configuration (C^{k-1})
State Dynamics
Shape Dynamics: Linear Markov model on shape velocity
• Shape “velocity” at t in tangent space w.r.t. shape at t-1, zt-1

• Orthogonal basis of the tangent space, U(zt-1)

• Linear Gauss-Markov model for shape velocity
ct  Ac,t ct 1  nt , nt ~ N (0,  n ), vt  U ( zt 1 )ct

• “Move” zt-1 by an amount vt on shape manifold to get zt
zt  (1  vt * vt )1 / 2 zt 1  vt
Motion (Scale, Rotation):
• Linear Gauss-Markov dynamics for log st, unwrapped θt
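One step of these shape dynamics can be sketched as follows. The SVD-based construction of the tangent basis U(z), the AR coefficient, and the noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def tangent_basis(z):
    """Orthonormal basis U(z) of the tangent space at pre-shape z:
    the column space of the projector (I - z z^*), via SVD."""
    P = np.eye(len(z)) - np.outer(z, z.conj())
    U, s, _ = np.linalg.svd(P)
    return U[:, s > 1e-10]          # drop the null direction along z

def step(z_prev, c_prev, A=0.8, sig_n=0.05):
    """c_t = A c_{t-1} + n_t;  v_t = U(z_{t-1}) c_t;
    z_t = (1 - v_t^* v_t)^{1/2} z_{t-1} + v_t."""
    U = tangent_basis(z_prev)
    m = U.shape[1]
    c = A * c_prev + sig_n * (rng.standard_normal(m) + 1j * rng.standard_normal(m))
    v = U @ c                                          # move in tangent space
    z = np.sqrt(1.0 - np.vdot(v, v).real) * z_prev + v # move on shape manifold
    return z / np.linalg.norm(z), c                    # renormalize vs round-off

z0 = np.array([1, 1j, -1, -1j]) / 2.0    # unit-norm pre-shape
z1, c1 = step(z0, np.zeros(3, dtype=complex))
print(np.linalg.norm(z1))                # stays on the pre-shape sphere (norm 1)
```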
The HMM
Observation Model: [Shape, Motion] → Centered Configuration
Yt = h(Xt) + wt,   wt ~ N(0, Σ_obs) + outliers
h(Xt) = zt st e^{jθt}

System Model: Shape and Motion Dynamics
Shape Dynamics:
ct = Ac c_{t-1} + nt,   nt ~ N(0, Σn)
vt = U(z_{t-1}) ct,   U(z_{t-1}) = basis([I − z_{t-1} z_{t-1}*] C)
zt = (1 − vt* vt)^{1/2} z_{t-1} + vt
Motion Dynamics:
Linear Gauss-Markov models for log st and θt (can be stationary or non-stationary)
Three Cases
• Non-Stationary Shape Activity (NSSA)
– Tangent space, U(z_{t-1}), changes at every t
– Most flexible: detect abnormality and also track it

• Stationary Shape Activity (SSA)
– Tangent space, U(μ), is constant (μ is a mean shape)
– Track normal behavior, detect abnormality

• Piecewise Stationary Shape Activity (PSSA)
– Tangent space is piecewise constant, U(μk)
– Change time: fixed or decided on the fly using ELL
– PSSA + ELL: Activity Segmentation
Stationary, Non-Stationary
[Illustrations: a stationary shape sequence vs. a non-stationary shape sequence]
Learning Procrustes' Mean
• Procrustes mean of a set of preshapes wi [Dryden, Mardia]:
μ̂ = arg min_μ Σ_i d_F^2(wi, μ)
  = arg min_μ Σ_i min_{θ,β,a,b} || wi − β e^{jθ} μ − a − jb ||^2
  = arg min_μ Σ_i [1 − μ* (wi wi*) μ]
  = arg max_{μ: ||μ||=1} μ* S μ,   S = Σ_i wi wi*
  = largest eigenvector of S

Shape: zi = wi e^{jθi},   θi = arg(wi* μ̂)
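The derivation above reduces to one eigen-decomposition, which can be sketched as follows (the square pre-shape and the random rotations are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

def procrustes_mean(W):
    """Largest eigenvector of S = sum_i w_i w_i^*, preshapes as rows of W."""
    S = W.T @ W.conj()               # sum of outer products w_i w_i^*
    vals, vecs = np.linalg.eigh(S)
    return vecs[:, -1]               # eigenvector of the largest eigenvalue

def align(w, mu):
    """z_i = w_i e^{j theta_i}, theta_i = arg(w_i^* mu)."""
    return w * np.exp(1j * np.angle(np.vdot(w, mu)))

base = np.array([1, 1j, -1, -1j]) / 2.0                 # a unit-norm pre-shape
W = np.stack([base * np.exp(1j * rng.uniform(0, 2 * np.pi)) for _ in range(20)])
mu = procrustes_mean(W)
print(np.allclose(align(W[0], mu), align(W[1], mu)))    # True: same shape
```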
Abnormal Activity Detection
• Define abnormal activity as
– Slow or drastic change in shape statistics with
change parameters unknown.
– System is a nonlinear HMM, tracked using a PF
• This motivated research on slow & drastic change
detection in general HMMs
– Tracking Error detects drastic changes. We
proposed a statistic called ELL for slow change.
– Use a combination of ELL & Tracking Error and
declare change if either exceeds its threshold.
Tracking to obtain observations
• CONDENSATION tracker framework
• State: Shape, shape velocity, scale, rotation,
translation, Observation: Configuration vector
• Measurement model: Motion detection locally around
predicted object locations to obtain observation
• Predicted object configuration obtained by prediction
step of Particle filter
• Predicted motion information can be used to move
the camera (or any other sensor)

• Combine with abnormality detection: for drastic
abnormalities, observations are lost for a run of
frames; for an outlier, only for 1-2 frames
Activity Segmentation
• Use PSSA model for tracking

• At time t, let current mean shape = μk

• Use ELL w.r.t. μk to detect change time, tk+1
(segmentation boundary)

• At tk+1, set current mean shape to the posterior Procrustes
mean of the current shape, i.e.
μk+1 = largest eigenvector of E_π[zt zt*] ≈ (1/N) Σ_{i=1}^N zt^(i) zt^(i)*

• Setting the current mean as above is valid only if
tracking error (or OL) has not exceeded the threshold
(PF still in track)
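The mean-shape update can be sketched from a (resampled, hence uniformly weighted) particle cloud; the base shape and noise level below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def posterior_mean_shape(Z, w):
    """mu = largest eigenvector of E_pi[z z^*] ~ sum_i w_i z_i z_i^*."""
    S = (Z.T * w) @ Z.conj()        # weighted sum of outer products
    vals, vecs = np.linalg.eigh(S)
    return vecs[:, -1]

base = np.array([1, 1j, -1, -1j]) / 2.0     # true underlying shape
N = 200
Z = base + 0.02 * (rng.standard_normal((N, 4)) + 1j * rng.standard_normal((N, 4)))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)   # particles are preshapes
mu = posterior_mean_shape(Z, np.full(N, 1.0 / N))
print(np.abs(np.vdot(mu, base)))            # close to 1: mu recovers the shape
```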
A Common Framework for…
• Tracking
– Groups of people or vehicles
– Articulated human body tracking
• Abnormal Activity Detection / Activity Id
– Suspicious behavior, Lane change detection
– Abnormal action detection, e.g. motion disorders
– Human Action Classification, Gait recognition
• Activity Sequence Segmentation
• Fusing different sensors
Experiments
• Group Activity:
– Normal activity: Group of people deplaning &
walking towards airport terminal: used SSA model
– Abnormality: A person walks away in an un-
allowed direction: distorts the normal shape
• Simulated walking speeds of 1,2,4,16,32 pixels per time
step (slow to drastic distortion in shape)
• Compared detection delays using TE and ELL
• Plotted ROC curves to compare performance
• Human actions:
– Defined NSSA model for tracking a figure skater
– Abnormality: abnormal motion of one body part
– Able to detect as well as track slow abnormality
Abnormality
• Abnormality introduced at t=5
• Observation noise variance = 9
• OL plot very similar to TE plot (both same to first order)

[Plots: ELL (left), Tracking Error (right)]
ROC: ELL
• Plot of Detection delay against Mean time b/w False Alarms
(MTBFA) for varying detection thresholds
• Plots for increasing observation noise

Slow Change: ELL Works               Drastic Change: ELL Fails
ROC: Tracking Error(TE)
• ELL: Detection delay = 7 for slow change , Detection delay = 60
for drastic
• TE: Detection delay = 29 for slow change, Detection delay =
4 for drastic

Slow Change: TE Fails               Drastic Change: TE Works
ROC: Combined ELL-TE
• Plots for observation noise variance = 81 (maximum)
• Detection Delay < 8 achieved for all rates of change

Slow Change: Works!            Drastic Change: Works!
Human Action Tracking
Normal Action                    Abnormality
SSA better than NSSA             NSSA works, SSA fails

Green: Observed,
Magenta: SSA,
Blue: NSSA
NSSA Tracks and Detects Abnormality
Abnormality introduced at t=20
ELL                         Tracking Error

Red: SSA,
Blue: NSSA
Typical Data Distributions

'Apples from Apples' problem: All algorithms work well
'Apples from Oranges' problem: Worst case for SLDA, PCA
PCNSA Algorithm
• Subtract common mean μ, Obtain PCA space
• Project all training data into PCA space, evaluate class
mean, covariance in PCA space: μi, Σi
• Obtain class Approx. Null Space (ANS) for each class:
Mi trailing eigenvectors of Σi
• Valid classification directions in ANS: if distance
between class means is “significant”: WiNSA
• Classification: Project query Y into PCA space,
X = W_PCA^T (Y − μ), choose the most likely class c as
c = arg min_i d_i(X),   d_i(X) = || W_i^{NSA,T} (X − μi) ||
Classification Error Probability

P(E1^{PCNSA}) = ∫_{Δ_N}^∞ φ(z; 0, 1) dz,   P(E1^{SLDA}) = ∫_{Δ_L}^∞ φ(z; 0, 1) dz

Δ_N = |(μ1 − μ2)^T N2| / (k_{ANS,1} √(N2^T Σ1 N2)),
Δ_L = |(μ1 − μ2)^T W_LDA| / √(W_LDA^T Σ1 W_LDA)

• Two-class problem. Assumes 1-dim ANS, 1 LDA direction
• Generalizes to M-dim ANS and to non-Gaussian but unimodal
& symmetric distributions
Applications

• Image & Video retrieval
– Applied to human action retrieval
– Hierarchical image/video retrieval: PCNSA
followed by LDA

• Activity Classification & Abnormal
Activity Detection
Applications
[Image examples]
• Face recognition, large pose variation
• Object recognition
• Face recognition, large expression variation
• Facial feature matching
Discussion & Ideas
• PCNSA test approximates the LRT (optimal Bayes
solution) as condition no. of Σi tends to infinity
1
LRT : Class  arg min( X   i )T  i ( X   i )
i

• Fuse PCNSA and LDA: get an algorithm very similar to
Multispace KL

• For multiclass problems, use error probability
expressions to decide which of PCNSA or SLDA is
better for a given set of 2 classes

• Perform facial feature matching using PCNSA, use this
for face registration followed by warping to standard
geometry
Ongoing and Future Work
• Change Detection
–   Implications of the bound: errors are increasing functions of the rate of change
–   CUSUM on ELL & OL
–   Quantitative performance analysis of ELL & OL
–   Find examples of mixing “unnormalized filter kernels”

• Non-Stationary & Piecewise Stationary Shape Activities
– Application to sequences of different kinds of actions
– PSSA + ELL for activity segmentation
– Joint tracking and abnormality detection

• Time varying number of Landmarks?
– What is the “best” strategy to get a fixed number 'k' of landmarks?
– Can we deal with changing dimension of shape space?

• Multiple Simultaneous Activities, Multi-sensor fusion
• 3D Shape, General shape spaces
Contributions
• ELL for slow change detection, Stability of ELL approximation error

• Complementary behavior of ELL & OL, ELL error proportional to “rate
of change” with all increasing derivatives

• Stochastic dynamical models for landmark shapes: NSSA, SSA,
PSSA

• Modeling the changing configuration of a group of moving point
objects as a deforming shape: “shape activity”.

• Using ELL + PSSA for activity segmentation

• PCNSA & its error probability analysis, application to action retrieval,
abnormal activity detection
