

Automatic Video-Based Human Motion Analyzer for Consumer Surveillance System

   Weilun Lao, Jungong Han, and Peter H.N. de With, Fellow, IEEE

IEEE Transactions on Consumer Electronics, Vol. 55, No. 2, MAY 2009
 Introduction
   Literature on surveillance video analysis
   Requirements of surveillance analysis systems
 Overview of proposed visual motion analysis system
 Techniques for human motion analysis
 Experimental results

 Video surveillance can contribute to the safety of people in the home
  and ease control of home-entrance and equipment-usage functions.
Literature on surveillance video analysis
 Most surveillance systems have focused on understanding the
  events through the study of trajectories and positions of persons
  using a-priori knowledge about the scene.
    The Pfinder [2] system was developed to describe a moving person in
     an indoor environment.
    The VSAM [3] system can monitor activities over various scenarios,
     using multiple cameras which are connected as a network.
    The real-time visual surveillance system W4 [4] employs the
     combined techniques of shape analysis and body tracking, and models
     different appearances of a person.
 [2] C.R. Wren, A. Azarbayejani, T. Darrell and A.P. Pentland, “Pfinder: real-time tracking of the human body,”
 [3] R.T. Collins, A.J. Lipton, T. Kanade, H. Fujiyoshi, D. Duggins, Y. Tsin, D. Tolliver, N. Enomoto and O. Hasegawa,
 “A system for video surveillance and monitoring,”
 [4] I. Haritaoglu, D. Harwood and L. Davis, “W4: real-time surveillance of people and their activities,”
 These systems rely on the detected trajectories of the objects of interest.
 Because the local properties of the detected persons are missing, the
  developed systems lack semantic recognition of dynamic human
  activities.
 In this paper, we combine trajectory estimation and posture
  recognition to improve the semantic analysis of human behavior.
Requirements of surveillance analysis systems
 The specific challenges for consumer applications are as follows:
   The posture and motion analysis results should have sufficient accuracy
    for consumer acceptance.
   High processing efficiency, achieving (near) real-time operation on
    low-cost consumer hardware.
   A conversion of 2-D results to a 3-D space can facilitate the analysis of
    special events such as burglary.
In this paper
The total framework consists of four processing levels:

1. A pre-processing level including background modeling and multiple-person
   detection.
2. An object-based level performing trajectory estimation and posture
   recognition.
3. An event-based level for semantic analysis.
4. A visualization level including camera calibration and 3-D scene
   reconstruction.

The system achieves near real-time performance (6-8 frames/second).
 The location and posture of persons are visualized in a 3-D space
  after performing camera calibration and integrating context
 An accurate and realistic reconstruction in a virtual space can
  contribute significantly to scene understanding, for example in
  crime-evidence collection and healthcare behavior analysis.
Overview of proposed visual motion analysis system
 Pre-processing level: background modeling and object detection.
 Object-based level: trajectory estimation and posture recognition.
 Event-based level: interaction relationships are modeled to infer a
  multiple-person event.
 Visualization level: camera calibration provides the 2D-3D mapping.
Techniques for human motion analysis
 Pre-processing level :
   Multi-person detection
 Object-based level :
   Trajectory estimation
   Individual action recognition with CHMM
 Event-based level :
   Interaction modeling
 Visualization level :
   3-D scene reconstruction
Multi-person detection
 Background subtraction:
   We perform a pixel-based background subtraction.
   The scene model has a probability density function for each pixel
   A pixel from a new frame is considered to be a background pixel if its
    new value is well described by its density function.
   The Gaussian Mixture Model (GMM) is employed for background modeling.
 Recognizing persons:
   We use the k-Nearest Neighbor (k-NN) classifier
   The classifier uses two features: the area and the ratio of the bounding
    box attached to each detected object.
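
The two detection steps above can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the function names, the grey-level (rather than color) pixel model, the 2.5-sigma matching threshold, the minimum component weight, the interpretation of the ratio as width/height, and the value of k are all assumptions, and the mixture parameters are taken as already trained.

```python
import numpy as np

def is_background(pixel, weights, means, variances, k_sigma=2.5, min_weight=0.1):
    """Return True if `pixel` is well described by this pixel's mixture model.

    weights, means, variances hold one entry per Gaussian component
    (hypothetical, already-trained values for a single grey-level pixel).
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    std = np.sqrt(np.asarray(variances, dtype=float))
    # A component matches if the pixel lies within k_sigma standard deviations.
    matches = np.abs(pixel - means) <= k_sigma * std
    # Only sufficiently heavy components are treated as background (assumption).
    return bool(np.any(matches & (weights >= min_weight)))

def knn_is_person(feature, train_features, train_labels, k=3):
    """Classify a blob from (area, box ratio) by majority vote of k neighbours.

    train_labels uses 1 for "person" and 0 for "other" (assumed encoding).
    """
    train_features = np.asarray(train_features, dtype=float)
    train_labels = np.asarray(train_labels)
    # Scale both features so that the area does not dominate the distance.
    scale = train_features.std(axis=0)
    scale[scale == 0] = 1.0
    diffs = (train_features - np.asarray(feature, dtype=float)) / scale
    dists = np.linalg.norm(diffs, axis=1)
    nearest = train_labels[np.argsort(dists)[:k]]
    return bool(np.sum(nearest == 1) > k / 2)
```

Pixels rejected by `is_background` would form the foreground blobs, and each blob's two features would then be passed to `knn_is_person`.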
Trajectory estimation
 Using the mean-shift algorithm:
   For tracking persons
   Based on their individual appearance model
   Represented as a color histogram
1. Extracting every new person entering the scene.
2. Calculating the corresponding histogram model in the image
3. In subsequent frames, to track that person, we shift the person
   object to the location whose histogram is closest to that of the
   previous frame.

    After the trajectory is located, we can conduct the body-based
     analysis at the location of the person in every frame.
    When the trajectory is obtained, we can also estimate the position
     of the persons involved in the video scene.
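
The tracking steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' tracker: it uses a grey-level histogram instead of a color histogram, weights pixels by histogram back-projection, and moves an integer window to the weighted centroid. The function names, the number of bins and the iteration limit are hypothetical.

```python
import numpy as np

def hist_model(patch, bins=16):
    """Normalised intensity histogram of a patch (values 0..255)."""
    h, _ = np.histogram(patch, bins=bins, range=(0, 256))
    return h / max(h.sum(), 1)

def mean_shift(frame, model, window, bins=16, max_iter=20):
    """Shift window = (row, col, height, width) towards pixels matching model."""
    r, c, h, w = window
    # Back-projection: weight each pixel by the model probability of its bin.
    idx = np.minimum(frame.astype(int) * bins // 256, bins - 1)
    weights = model[idx]
    for _ in range(max_iter):
        win = weights[r:r + h, c:c + w]
        total = win.sum()
        if total == 0:
            break  # nothing in the window resembles the model
        rows, cols = np.mgrid[0:win.shape[0], 0:win.shape[1]]
        # Offset of the weighted centroid from the window centre.
        dr = int(np.floor((rows * win).sum() / total - (h - 1) / 2 + 0.5))
        dc = int(np.floor((cols * win).sum() / total - (w - 1) / 2 + 0.5))
        if dr == 0 and dc == 0:
            break  # converged
        r = int(np.clip(r + dr, 0, frame.shape[0] - h))
        c = int(np.clip(c + dc, 0, frame.shape[1] - w))
    return r, c, h, w
```

The sequence of window positions returned frame by frame forms the trajectory on which the later body-based analysis is performed.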
Individual action recognition with CHMM
 Posture representation
 HV-PCA: a new, simple and effective shape descriptor that represents
   the silhouette in each frame.
1. Every detected person silhouette is adapted to an M×N pixel template
   in a normalization phase (M=180 and N=80).
2. We apply the horizontal and vertical projections


 HV-PCA:
 In the vertical projection:
 the 180-D shape vector is split into three 60-D parts (60×3), PCA reduces
 each part to 2 components (2×3), and the result is reshaped into a 6×1 vector.
 Similarly, an 8×1 vector is obtained from the horizontal projection.

 P(·) indicates our part-based PCA implementation.
 Principal component analysis (PCA) is a mathematical procedure
  that converts a set of observations of possibly correlated variables
  into a set of values of uncorrelated variables called principal
  components.
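
The descriptor described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the PCA is computed with an SVD on a set of training vectors, and splitting the 80-D horizontal projection into four 20-D parts to obtain the 8×1 vector is an assumption, since the slides only state the output size.

```python
import numpy as np

def hv_projections(silhouette):
    """Project a binary M x N silhouette (here 180 x 80) onto both axes."""
    vertical = silhouette.sum(axis=1)    # one value per row    -> M values
    horizontal = silhouette.sum(axis=0)  # one value per column -> N values
    return vertical, horizontal

def fit_part_pca(train_vectors, n_parts, n_components=2):
    """Fit one PCA basis per part of the projection vectors (rows = samples)."""
    bases = []
    for part in np.split(train_vectors, n_parts, axis=1):
        mean = part.mean(axis=0)
        # Right singular vectors of the centred data are the principal axes.
        _, _, vt = np.linalg.svd(part - mean, full_matrices=False)
        bases.append((mean, vt[:n_components]))
    return bases

def part_pca(vector, bases):
    """Reduce each part to n_components values and stack them in one vector."""
    parts = np.split(vector, len(bases))
    return np.concatenate([vt @ (p - mean) for p, (mean, vt) in zip(parts, bases)])
```

With three parts, `part_pca` turns a 180-D vertical projection into 6 values; with the assumed four parts, it turns the 80-D horizontal projection into 8 values.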
 Temporal modeling with CHMM
 Single-frame recognition is not sufficiently accurate for general
  motion classification; temporal consistency is required.
 We use the Continuous Hidden Markov Model (CHMM) with left-right
  topology [12].

 [12] L.R. Rabiner, “A tutorial on hidden Markov models and selected
 applications in speech recognition,”
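 Suppose a CHMM has E states and F output symbols.

 It is fully specified by the triplet $\lambda = (A, B, \pi)$, written here
  with the standard definitions of [12], where $q_t$ is the state and $o_t$
  the observation at time step $t$, and $v_k$ is the k-th output symbol.

 The E×E state transition matrix A is
   $A = \{a_{ij}\}, \quad a_{ij} = P(q_{t+1} = j \mid q_t = i)$

 The E×F state output probability matrix B is defined as
   $B = \{b_j(k)\}, \quad b_j(k) = P(o_t = v_k \mid q_t = j)$

 The initial state distribution vector $\pi$ is specified as
   $\pi = \{\pi_i\}, \quad \pi_i = P(q_1 = i)$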
 A CHMM is assigned to each of the K predefined posture types of
    the observed human body.
   Each CHMM is trained with the Baum-Welch algorithm,
   so the triplet $\lambda_i$ is obtained for each model ($i = 1, \ldots, K$).
   Given an observation sequence $O = (o_1, o_2, \ldots, o_T)$,
   calculate $P(O \mid \lambda_i)$ for every model.
   Recognize the posture class as the one represented by the most
    probable model:
     $c = \arg\max_{1 \le i \le K} P(O \mid \lambda_i)$

 (K = 5, for the types: left-pointing, right-pointing, squatting, raising
    hands overhead, and lying)
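
The recognition step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it evaluates each model with the scaled forward algorithm described in [12], treats observations as discrete symbol indices to match the E×F output matrix, and assumes the models have already been trained. The function names are hypothetical.

```python
import numpy as np

def log_likelihood(obs, A, B, pi):
    """log P(O | lambda) computed with the scaled forward algorithm.

    obs: sequence of symbol indices; A: E x E; B: E x F; pi: length E.
    """
    A, B, pi = (np.asarray(x, dtype=float) for x in (A, B, pi))
    alpha = pi * B[:, obs[0]]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = (alpha @ A) * B[:, obs[t]]
        scale = alpha.sum()
        if scale == 0:
            return -np.inf  # the model cannot produce this sequence
        log_p += np.log(scale)
        alpha = alpha / scale  # rescale to avoid numerical underflow
    return log_p

def classify(obs, models):
    """Index of the model (A, B, pi) with the highest likelihood for obs."""
    scores = [log_likelihood(obs, *m) for m in models]
    return int(np.argmax(scores))
```

In the left-right topology, the transition matrix passed as `A` is upper triangular, so a state can only be kept or advanced.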
Interaction Modeling
 In multi-person events, the event analysis is achieved by
  understanding the interactions between people.

 The events rely on the temporal order and relationship of their
  sub-events (the individual postures).
 To represent temporal relationships of sub-events:
   Temporal relationships TR={after, meets, during, finishes, overlaps, equal,

 We can apply heuristic rules to understand the scene.

 For example, in robbery detection, the “pointing” posture is a key
  reference posture.
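
The temporal relationships listed above can be sketched as follows. This is only an illustration: the slides do not define the relations, so the definitions below follow the usual interval-algebra conventions, which is an assumption, and the function name and the frame intervals in the usage example are hypothetical.

```python
def temporal_relation(a, b):
    """Relation of interval a = (start, end) with respect to interval b.

    Only the relations named in the slides are covered; any other
    configuration returns None.
    """
    a_start, a_end = a
    b_start, b_end = b
    if a_start == b_start and a_end == b_end:
        return "equal"
    if a_start > b_end:
        return "after"      # a begins once b has ended
    if a_end == b_start:
        return "meets"      # a ends exactly where b begins
    if a_start > b_start and a_end < b_end:
        return "during"     # a lies strictly inside b
    if a_start > b_start and a_end == b_end:
        return "finishes"   # a starts later but ends together with b
    if a_start < b_start < a_end < b_end:
        return "overlaps"   # a starts first and ends inside b
    return None

# Hypothetical sub-events: one person points during frames 10-40 while
# another raises both hands during frames 25-60.
relation = temporal_relation((10, 40), (25, 60))
```

A heuristic rule would then test such a relation between the key posture and the postures of the other persons in the scene.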
3-D Scene Reconstruction
 We want to implement the 2D-3D mapping.
 It is useful for scene understanding.

 Camera calibration:
 Since both the ground and the displayed image are planar, the
  mapping between them is a homography.
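
The homography between the image and the ground plane can be estimated from four point correspondences. The sketch below uses the direct linear transform solved with an SVD, which is one common way to do this and not necessarily the method of the authors; the function names are hypothetical.

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 matrix H with dst ~ H @ src from >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in the 9 entries.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def map_point(H, point):
    """Map an image point to ground-plane coordinates."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return x / w, y / w
```

Given the image positions of four reference points and their measured ground coordinates, `map_point` converts the image position of a tracked person into a position on the ground plane.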

 In our previous work [11], we have developed an automatic
  algorithm to establish the homography mapping for analyzing a tennis
 We manually put four white lines forming a rectangle on the ground.
 We measured the length of each line in the real world, thereby
  defining their coordinates in the real-world domain.

 Once the mapping is performed, the reconstruction plays a useful role
  in crime-scene analysis, data retrieval and evidence collection.

 [11] J. Han, D. Farin, P.H.N. de With and W. Lao, “Real-time video content
 analysis tool for consumer media storage system,”
Experimental results
 Training: 10 video sequences containing various single- and multi-person
  motions (15 frames/s).
 Testing: 15 similar sequences.

 Result:
   Person detection: 98% accuracy rate
   Person tracking: 95% detection rate

 The robbery detection rate is 90% on our captured simulated-robbery
  video sequences (10 sequences in total).

 Our system is efficient, achieving near real-time performance
  (6-8 frames/second at 640×480 (VGA) resolution on a 3-GHz
  Pentium-IV PC).