					Human Pose
1. Introduction

2. Article [1]
   Real-Time Motion Capture Using a Single
   Time-of-Flight (TOF) Camera (2010)

3. Article [2]
   Real-Time Human Pose Recognition in
   Parts from Single Depth Images (2011)
1.1 What Is Pose Recognition?

  Input Image


                            Fig From [2]
1.2 Motivation
Why do we need this?

  Smart surveillance

  Virtual reality

  Motion analysis

  Gaming (Kinect)
Kinect – Project Natal
 Microsoft Xbox 360 console

  “You are the controller”

  Launched 04/11/2010

  Sold over 8M units in its first 60 days on the
market (Guinness World Record)
       1.3 Challenges

  What is the problem?

  Clothes?
  Occlusions?
  Real time?
  Full solution?
1.4 Previous Technology

Mocap using markers –

Multi-view camera systems –
   limited applicability.

Monocular –
   simplified problems.
1.4 New Technology
Time-of-Flight (TOF) camera:

  ✓ Dense depth
  ✓ High frame rate (100 Hz)
  ✓ Robust to other problems.
     2. Article [1]
   Real-Time Motion
 Capture Using a Single
 Time-of-Flight Camera
(V. Ganapathi et al., CVPR 2010)
Article Contents
2.1 Previous work

2.2 What’s new?

2.3 Overview

2.4 Results

2.5 Limitations & future work

2.6 Evaluation
2.1 Previous work
Many, many articles…
(Moeslund et al. 2006 – a survey covering 350 articles…)

  [Thumbnails of prior work: (2006), (2006), (1998)]
 2.2 What’s new?
  TOF technology

  Propagating information up the kinematic chain.

  Probabilistic model using the unscented transform.

  Multiple GPUs.
2.3 Overview

1. Probabilistic Model

2. Algorithm Overview:

   Model-based hill-climbing search

   Evidence propagation

   Full algorithm
          1. Probabilistic Model

  15 body parts.

  DAG – Directed Acyclic Graph;  DBN – Dynamic Bayesian Network.

  Variables per frame: pose, speed, and the range scan.
              1. Probabilistic Model

Dynamic Bayesian network (DBN).

  Use ray casting to evaluate the
  distance from the measurement.

  Goal: find the most likely state given the previous frame's MAP estimate, i.e.:

    x̂_t = argmax_{x_t} P(x_t | z_t, x̂_{t-1})

                                                               Fig From [1]
             2. Algorithm Overview

1. Hill-climbing search (HC)

2. Evidence propagation (EP)
         2.1 Hill Climbing Search (HC)

  Place a grid around the current estimate,
  evaluate the likelihood at each grid point,
  and choose the best point.

  Coarse-to-fine grids.

                                                  Fig From [1]
        2.1 Hill Climbing Search (HC)

The good:
  Runs in parallel on GPUs.
The bad:
  Local optima.
  Ridges, plateaus, alleys.
  Can lose track when motion is fast or occlusions occur.
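The coarse-to-fine search can be sketched as follows; this is a minimal sketch in which a toy quadratic log-likelihood stands in for the paper's ray-cast measurement model, and the grid and step parameters are illustrative, not the paper's:

```python
import numpy as np

def hill_climb(log_lik, x0, init_step=0.5, levels=4, iters_per_level=20):
    """Coarse-to-fine grid hill climbing over a pose vector.

    log_lik : callable mapping a pose vector to a log-likelihood
    x0      : initial pose estimate (e.g. the previous frame's MAP)
    """
    x = np.asarray(x0, dtype=float)
    step = init_step
    for _ in range(levels):                      # coarse -> fine grids
        for _ in range(iters_per_level):
            best, best_ll = x, log_lik(x)
            # evaluate a local grid: +/- step along each dimension
            for d in range(len(x)):
                for delta in (-step, step):
                    cand = x.copy()
                    cand[d] += delta
                    ll = log_lik(cand)
                    if ll > best_ll:
                        best, best_ll = cand, ll
            if np.array_equal(best, x):          # local optimum at this scale
                break
            x = best
        step *= 0.5                              # refine the grid
    return x

# Toy likelihood with a peak at (1, -2); stands in for the model score.
peak = np.array([1.0, -2.0])
est = hill_climb(lambda p: -np.sum((p - peak) ** 2), np.zeros(2))
```

All candidate evaluations inside one level are independent, which is what makes the GPU parallelization mentioned above natural; halving the step each level is the coarse-to-fine refinement.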
          2.2 Evidence Propagation

Also has 3 stages:
1. Body part detection (C. Plagemann et al 2010)

2. Probabilistic Inverse Kinematics

3. Data association and inference
         2.2.1 Body Part Detection

Bottom-up approach:
1. Locate interest points with AGEX –
   Accumulative Geodesic Extrema.
2. Find their orientation.
3. Classify the head, feet, and hands using local shape descriptors.

                                                          Fig From [3]
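The AGEX idea can be sketched as repeated farthest-point selection under geodesic (surface) distance: starting from the body centroid, repeatedly pick the point whose shortest-path distance to everything chosen so far is largest. A minimal sketch on a toy graph (building the graph from neighboring foreground depth pixels is omitted, and the node labels are illustrative):

```python
import heapq

def dijkstra(adj, sources):
    """Multi-source shortest paths; adj[u] = [(v, weight), ...]."""
    dist = {u: float("inf") for u in adj}
    pq = [(0.0, s) for s in sources]
    for _, s in pq:
        dist[s] = 0.0
    heapq.heapify(pq)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def agex(adj, centroid, k):
    """Return k accumulative geodesic extrema, seeded at the body centroid."""
    chosen = [centroid]
    for _ in range(k):
        dist = dijkstra(adj, chosen)       # geodesic distance to the chosen set
        nxt = max(dist, key=dist.get)      # farthest point = next extremum
        chosen.append(nxt)
    return chosen[1:]                      # drop the centroid itself

# Toy "body": a path graph 0-1-2-3-4 with the centroid at node 2.
adj = {0: [(1, 1)], 1: [(0, 1), (2, 1)], 2: [(1, 1), (3, 1)],
       3: [(2, 1), (4, 1)], 4: [(3, 1)]}
extrema = agex(adj, centroid=2, k=2)
```

On a real body graph the first extrema found this way tend to be the head, hands, and feet, which is why they make good candidates for the classification step.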
           2.2.1 Body Part Detection


                                       Fig From [3]
      2.2.2 Probabilistic Inverse
            Kinematics (EP)

  Assume correspondence.
  Need a new MAP estimate conditioned on the detections.
  Problem – the measurement model isn't linear!
  Solution: linearize with the unscented Kalman filter.
  Easy to determine        .
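The unscented transform the paper leans on can be sketched as: rather than linearizing a nonlinear function analytically, push a small set of sigma points through it and re-estimate the mean and covariance from the results. This is a generic minimal sketch, not the paper's measurement model; `kappa` is a standard scaling parameter:

```python
import numpy as np

def unscented_transform(f, mean, cov, kappa=1.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear f via sigma points."""
    n = len(mean)
    L = np.linalg.cholesky((n + kappa) * cov)    # scaled matrix square root
    # 2n + 1 sigma points: the mean plus symmetric offsets along L's columns
    sigma = [mean] + [mean + L[:, i] for i in range(n)] \
                   + [mean - L[:, i] for i in range(n)]
    w0 = kappa / (n + kappa)
    w = np.array([w0] + [1.0 / (2 * (n + kappa))] * (2 * n))
    y = np.array([f(s) for s in sigma])          # transformed sigma points
    mean_y = w @ y
    diff = y - mean_y
    cov_y = (w[:, None] * diff).T @ diff
    return mean_y, cov_y

# Sanity check with a linear map, where the transform is exact.
A = np.array([[2.0, 0.0], [0.0, 3.0]])
m, c = unscented_transform(lambda x: A @ x, np.array([1.0, 2.0]), np.eye(2) * 0.5)
```

For a linear function the transform reproduces the mean and covariance exactly; for the nonlinear kinematics in the paper it gives a derivative-free Gaussian approximation.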
                    2.3 Full Algorithm

  [Flow diagram: previous-frame MAP → HC; in parallel, Image →
  body-part detection → EP on the detected parts → HC]
2.4 Results
 28 real depth-image sequences.
 Ground truth – tracking markers:
   real marker positions vs. estimated positions.
   Error below a threshold → perfect track;
   otherwise → fault track.
 Compared 3 algorithms: EP, HC, HC+EP.
             2.4 Results

  [Graph: error per sequence – bigger = harder]

  Best – HC+EP; worst – EP.
  Runs close to real time:
    HC: 6 frames per second.
    HC+EP: 4–6 frames per second.

                                                    Fig From [1]
 2.4 Results

  Extreme case – sequence 27: the tracker loses track.

                                    Fig From [1]
2.5 Limitations & Future work
  Manual initialization.
  Cannot track more than one person at a time.
  Using temporal data – consumes more time;
 reinitialization problem.
Future work:
  Improving the speed.
  Combining with color cameras.
  Fully automatic model initialization.
  Tracking more than one person.
2.6 Evaluation
 Well written
 Self-contained
 Novel combination of existing parts
 New technology
 Achieves its goals (real time)
 Missing examples for the probabilistic model.
 Not clear how        is defined.
 Extensively validated:
     Dataset and code available
     Not enough visual examples in the article
     No comparison to different algorithms
      3. Article [2]
 Real-Time Human Pose
  Recognition in Parts
 from Single Depth Images
(Shotton et al. & Xbox Incubation,
    Microsoft Research, 2011)
Article Contents
2.1 Previous work

2.2 What’s new?

2.3 Overview

2.4 Results

2.5 Limitations & future work

2.6 Evaluation
 2.1 Previous work
  Same as Article [1].
   2.2 What’s new?
  Uses no temporal information – robust and
  fast (200 frames per second).

  Object-recognition approach.

  Per-pixel classification.

  Large and highly varied
  training dataset.

                                       Fig From [2]
2.3 Overview

1. Database construction
2. Body part inference and joint proposals

Key goals: computational efficiency and robustness.
                    1. Database

Pose estimation research often has to overcome a lack of training data…

  Huge color and texture variability.

  Computer simulation doesn't produce the range of volitional
  motions of a human subject.
             1. Database

100k mocap frames → synthetic rendering pipeline

                                          Fig From [2]
   1. Database

                 Real data

Which is real???
            Synthetic data

                         Fig From [2]
        2. Body part inference

1. Body part labeling

2. Depth image features

3. Randomized decision forests

4. Joint position proposals
          2.1 Body part labeling

        Head Up Left              Head Up Right

  31 body parts are labeled.

  The problem can now be solved by efficient
classification algorithms.

                                                 Fig From [2]
    2.2 Depth comparison features

Simple depth comparison features (1):

  f_θ(I, x) = d_I(x + u/d_I(x)) − d_I(x + v/d_I(x)),   θ = (u, v)

  d_I(x) – depth at pixel x in image I; u, v – offsets.

Normalizing the offsets by d_I(x) makes the feature depth invariant.

Computationally efficient:
no preprocessing.

                                              Fig From [2]
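Feature (1) can be sketched directly; the toy depth image, the offsets, and the large-depth background value are illustrative assumptions:

```python
import numpy as np

BACKGROUND = 1e6  # large depth for probes that fall off the image / body

def depth_feature(depth, x, u, v):
    """f_theta(I, x) = d(x + u/d(x)) - d(x + v/d(x)).

    depth : 2D array of depths (row, col)
    x     : (row, col) probe pixel
    u, v  : 2D pixel offsets, scaled by 1/d(x) for depth invariance
    """
    d_x = depth[x]

    def probe(offset):
        r = int(round(x[0] + offset[0] / d_x))
        c = int(round(x[1] + offset[1] / d_x))
        if 0 <= r < depth.shape[0] and 0 <= c < depth.shape[1]:
            return depth[r, c]
        return BACKGROUND          # off-image probes read as background

    return probe(u) - probe(v)

# Toy depth image: left half at depth 2, right half at depth 4.
img = np.full((10, 10), 4.0)
img[:, :5] = 2.0
f = depth_feature(img, (5, 4), u=(0.0, -4.0), v=(0.0, 4.0))
```

Because a probe is two array lookups and a subtraction, no preprocessing of the depth image is needed, which is the efficiency point made above.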
    2.3 Randomized Decision forests

How does it work?

  Each internal node evaluates a feature on pixel x and branches
  by comparing it to a threshold; the leaf reached stores a
  distribution over body parts.

  Classify pixel x by walking it down each tree and averaging the
  leaf distributions over the forest.

                                      Fig From [2]
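Classifying a pixel with one tree can be sketched like this (toy tree and toy feature; in the article each split evaluates the depth comparison feature against a learned threshold, and the forest averages the leaf distributions of several trees):

```python
# A tree node is either a leaf {"dist": {...}} or a split
# {"feature": f, "threshold": t, "left": ..., "right": ...}.

def classify_pixel(tree, image, x):
    """Walk the tree: at each split, evaluate the feature and branch."""
    node = tree
    while "dist" not in node:
        if node["feature"](image, x) < node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["dist"]            # P(body part | pixel)

# Toy tree whose "feature" simply reads the depth at x.
toy_tree = {
    "feature": lambda img, x: img[x],
    "threshold": 3.0,
    "left":  {"dist": {"hand": 0.9, "arm": 0.1}},
    "right": {"dist": {"torso": 0.8, "arm": 0.2}},
}
depth_img = {(5, 5): 2.0, (7, 7): 4.0}
p_near = classify_pixel(toy_tree, depth_img, (5, 5))
p_far  = classify_pixel(toy_tree, depth_img, (7, 7))
```

Every pixel is classified independently, so the whole image can be processed in parallel, which is what makes the per-pixel approach fast.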
    2.3 Randomized Decision forests

Training algorithm:
  1M images, 2000 sampled pixels per image.

  Training 3 trees of depth 20 on 1M images takes ~1 day
  on a 1000-core cluster:
  1M images × 2000 pixels × 2000 candidate features × 50 thresholds
  = 2·10^14 feature evaluations.
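One node of that training can be sketched as a brute-force search over candidate (feature, threshold) pairs, keeping the pair with the largest information gain (Shannon entropy over the part labels); the samples and candidates here are toy stand-ins:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(samples, candidates):
    """Pick the (feature, threshold) pair with the largest information gain.

    samples    : list of (pixel, label) pairs
    candidates : list of (feature_fn, threshold) pairs to try
    """
    base = entropy([lab for _, lab in samples])
    best, best_gain = None, -1.0
    for feat, thr in candidates:
        left = [lab for px, lab in samples if feat(px) < thr]
        right = [lab for px, lab in samples if feat(px) >= thr]
        if not left or not right:
            continue
        n = len(samples)
        remainder = (len(left) / n) * entropy(left) \
                  + (len(right) / n) * entropy(right)
        gain = base - remainder
        if gain > best_gain:
            best, best_gain = (feat, thr), gain
    return best, best_gain

# Toy pixels: a scalar "depth" value, perfectly separable at 3.0.
samples = [(1.0, "hand"), (2.0, "hand"), (4.0, "torso"), (5.0, "torso")]
feat = lambda px: px
split, gain = best_split(samples, [(feat, 1.5), (feat, 3.0), (feat, 4.5)])
```

The 2·10^14 count above is exactly this loop scaled up: every candidate pair is scored on every sampled pixel of every image at every node.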
2.3 Randomized Decision forests

Trained tree:

                            Fig From [2]
       2.4 Joint Position Proposal

Local mode-finding approach based on mean shift with a
weighted Gaussian kernel.
Density estimator (per body part c):

  f_c(x̂) ∝ Σ_i w_ic · exp(−‖(x̂ − x̂_i) / b_c‖²)

  Modes of f_c serve as joint proposals; a plain center of
  mass would be skewed by outliers.

                                                         Fig From [4]
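Mean shift with a weighted Gaussian kernel can be sketched in one dimension (toy points; the uniform weights stand in for the article's per-pixel weights). Note how the mode ignores the far outlier while the plain center of mass is dragged toward it:

```python
import numpy as np

def mean_shift(points, weights, start, bandwidth, iters=50, tol=1e-6):
    """Climb to a mode of the weighted Gaussian kernel density estimate."""
    x = float(start)
    for _ in range(iters):
        k = weights * np.exp(-((points - x) / bandwidth) ** 2)  # kernel weights
        new_x = np.sum(k * points) / np.sum(k)                  # weighted mean
        if abs(new_x - x) < tol:
            break
        x = new_x
    return x

# A dense cluster near 1.0 plus a far outlier at 10.0.
pts = np.array([0.9, 1.0, 1.1, 10.0])
w = np.ones_like(pts)
mode = mean_shift(pts, w, start=1.5, bandwidth=0.5)
com = np.average(pts, weights=w)    # plain center of mass, for contrast
```

The outlier's kernel weight is effectively zero at the cluster, so the mode stays on the cluster; this robustness is why mean shift is preferred over a center of mass here.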
2.4 Results
 8800 frames of real depth images.
 5000 synthetic depth images.
 Also evaluated on the Article [1] dataset.

 Measures:
1. Classification accuracy – confusion matrix.
2. Joint accuracy – mean average precision (mAP);
   a proposal within D = 0.1 m of the ground-truth joint is a true positive (TP).

   Fig From [2]
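The true-positive rule can be sketched as a simplified per-frame precision (the article's mAP additionally averages precision over joints and ranked proposals); the joint names and positions below are toy values:

```python
import math

D = 0.1  # meters: max distance for a proposal to count as a true positive

def dist(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def precision(proposals, ground_truth):
    """proposals / ground_truth: {joint_name: (x, y, z)} for one frame."""
    tp = fp = 0
    for joint, pos in proposals.items():
        gt = ground_truth.get(joint)
        if gt is not None and dist(pos, gt) <= D:
            tp += 1       # within 0.1 m of the true joint
        else:
            fp += 1
    return tp / (tp + fp)

gt = {"head": (0.0, 1.7, 2.0), "l_hand": (0.4, 1.0, 2.0)}
pred = {"head": (0.02, 1.72, 2.0),    # ~3 cm off -> TP
        "l_hand": (0.4, 0.7, 2.0)}    # 30 cm off -> FP
p = precision(pred, gt)
```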
2.4 Results- Classification accuracy
     High correlation between real and synthetic test accuracy.
     Depth of the trees is the most effective parameter.

                                                    Fig From [2]
2.4 Results - Joint Prediction
  Comparing the algorithm on:
real test set (red) – mAP 0.731
ground-truth body parts (blue) – mAP 0.914
                          mAP 0.984 – upper body only

                                          Fig From [2]
2.4 Results- Joint Prediction
Comparing the algorithm to an idealized nearest-neighbor (NN)
matching and a realistic NN – chamfer NN.

                                            Fig From [2]
2.4 Results- Joint Prediction
Comparison to Article[1]:
 Run on the same dataset
 Better results (even without temporal data)
 Runs 10x faster.

                                       Fig From [2]
2.4 Results- Joint Prediction
Full rotations and multiple people
 Right-left ambiguity
  mAP of 0.655 ( good for our uses)

   Result Video
                                      Fig From [2]
2.4 Results
Faster proposals
Using simple bottom-up clustering instead of mean shift:
  Mean shift:        50 fps, 0.731 mAP.
  Simple clustering: 200 fps, 0.677 mAP.
2.5 Limitations & Future work
Future work:
  Better synthesis pipeline.
  Is there an efficient approach that directly
regresses joint positions? (Already done in follow-up
work – efficient offset regression of body joint positions.)
2.6 Evaluation
 Well written
 Self-contained
 Novel combination of existing parts
 New technology
 Achieves its goals (real time)
 Extensively validated:
    Used in a real console
    Many result graphs and examples
    (plus a supplementary-material PDF)
    Broad comparison to other algorithms
    Dataset and code not available
[1] Real-Time Motion Capture Using a Single Time-of-Flight Camera
    (V. Ganapathi et al., CVPR 2010)

[2] Real-Time Human Pose Recognition in Parts from Single Depth Images
    (J. Shotton et al. & Xbox Incubation, Microsoft Research, CVPR 2011)

[3] Real-Time Identification and Localization of Body Parts from Depth
    Images (C. Plagemann et al., ICRA 2010)

[4] Computer Graphics course (046746), Technion.
