The AIT 3D Multimodal Person Tracker for CLEAR 2007

Nikos Katsarakis, Fotios Talantzis, Aristodemos Pnevmatikakis & Lazaros Polymenakos
{nkat,fota,apne,lcp}@ait.edu.gr

May 8, 2007

Visual 3D PT: Input and Association
• Use the AIT 2D body tracker to find bodies and the face tracker to find the faces in them, in 4 camera views. Use the face bounding boxes (BBs) as input for 3D association
• Span the 3D space using a cube of 5 cm edge
   – Map the cube to all camera views
   – Collect faces if the cube center falls in a face BB
• Consider cubes with at least 2 cameras contributing a face

[Figure: room layout with axes x, y, z, cameras (C), the presentation area, points labelled A, B and D, the MK3 label, a 125 cm³ cube spanning the 3D space, and the coordinates (250,360,170), (110,110,135), (270,80,130) and (200,40,135) cm.]
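
The cube-based association above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the representation of a camera as a callable that projects a 3D point to image coordinates, and the BB layout `(x_min, y_min, x_max, y_max)` are all assumptions made for the example.

```python
from itertools import product

def cube_centers(lo, hi, edge=5.0):
    """Yield the centers of cubes with the given edge (cm) tiling the box [lo, hi)."""
    counts = [int((h - l) // edge) for l, h in zip(lo, hi)]
    for idx in product(*(range(n) for n in counts)):
        yield tuple(l + (i + 0.5) * edge for l, i in zip(lo, idx))

def inside(bb, point):
    """True if a 2D point lies in a bounding box (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = bb
    u, v = point
    return x_min <= u <= x_max and y_min <= v <= y_max

def associate(centers, cameras, face_bbs, min_cameras=2):
    """Collect, for each cube center, the faces whose BB contains its projection.

    cameras:  dict camera_id -> function (x, y, z) -> (u, v)  (assumed interface)
    face_bbs: dict camera_id -> list of face BBs found in that view
    Returns a dict center -> {camera_id: face_index}, keeping only cubes to
    which at least `min_cameras` cameras contribute a face.
    """
    result = {}
    for center in centers:
        hits = {}
        for cam_id, project in cameras.items():
            uv = project(*center)
            for face_idx, bb in enumerate(face_bbs.get(cam_id, [])):
                if inside(bb, uv):
                    hits[cam_id] = face_idx  # keep the first matching face per camera
                    break
        if len(hits) >= min_cameras:
            result[center] = hits
    return result
```

In a real setup each `cameras` entry would wrap the calibrated projection of that camera; the sketch leaves that part out on purpose.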
Visual 3D PT: Validate Associations
 • For multiple people, find sets of mutually exclusive associations
 • Select one set of associations based on track consistency
   – as indicated by the 3D Kalman trackers that maintain the tracks
 • Eliminate tracks that are too short, which removes wrong associations
 • Optionally, run the AIT body tracker on the panoramic camera and require the found 3D positions to map inside the tracked bodies
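
One way to picture the selection step is sketched below. The slides do not say how track consistency is scored, so the score used here (total distance between candidate 3D positions and the positions predicted by the trackers, under the best one-to-one pairing) is an assumption, as are all function names and the `min_length` parameter.

```python
import math
from itertools import permutations

def consistency_cost(positions, predictions):
    """Smallest total distance over one-to-one pairings of the two point lists.

    Brute force over permutations, which is only reasonable for a handful
    of people. Returns infinity if either list is empty.
    """
    short, long_ = sorted((positions, predictions), key=len)
    if not short:
        return math.inf
    return min(
        sum(math.dist(p, q) for p, q in zip(short, perm))
        for perm in permutations(long_, len(short))
    )

def select_association_set(candidate_sets, predictions):
    """Pick the candidate set that agrees best with the predicted track positions."""
    return min(candidate_sets, key=lambda s: consistency_cost(s, predictions))

def drop_short_tracks(tracks, min_length):
    """Keep only tracks with at least `min_length` positions (threshold is hypothetical)."""
    return {tid: pts for tid, pts in tracks.items() if len(pts) >= min_length}
```

The `predictions` argument stands in for whatever the 3D Kalman trackers predict for the current frame; the Kalman filters themselves are not shown.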
Audio 3D PT
• State-space approach based on particle filters (PFs)
   – The PF assumes that the source moves according to a model that is consistent across time frames
   – The PF uses time delay estimates from pairs of microphones as input
• A Voice Activity Detector is integrated to deal with short pauses in speech
• An external PF is initialized at every frame to deal with the cases where the active speaker changes or particles get trapped in a spurious location
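
To make the role of the time delay estimates concrete, here is a rough sketch of one cycle of a generic bootstrap particle filter driven by them. It is not the authors' filter: the random-walk motion model, the Gaussian likelihood, the noise parameters, the use of metres, and every name in the code are assumptions, and the Voice Activity Detector and the external PF are left out.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, roughly at room temperature

def expected_tdoa(source, mic_a, mic_b):
    """Time delay (s) between two microphones for a source at `source` (metres)."""
    return (math.dist(source, mic_a) - math.dist(source, mic_b)) / SPEED_OF_SOUND

def pf_step(particles, mic_pairs, measured_tdoas, rng,
            motion_std=0.05, tdoa_std=1e-4):
    """One predict / update / resample cycle of a bootstrap particle filter."""
    # Predict: random-walk motion model (assumed, the simplest possible choice).
    moved = [tuple(c + rng.gauss(0.0, motion_std) for c in p) for p in particles]
    # Update: Gaussian likelihood of the measured delays given each particle.
    weights = []
    for p in moved:
        err2 = sum((expected_tdoa(p, a, b) - tau) ** 2
                   for (a, b), tau in zip(mic_pairs, measured_tdoas))
        weights.append(math.exp(-0.5 * err2 / tdoa_std ** 2))
    if sum(weights) == 0.0:
        weights = [1.0] * len(moved)  # degenerate case: fall back to uniform weights
    # Resample: draw particles in proportion to their weights.
    return rng.choices(moved, weights=weights, k=len(moved))

def estimate(particles):
    """Source position estimate: the mean of the particle cloud."""
    n = len(particles)
    return tuple(sum(p[i] for p in particles) / n for i in range(len(particles[0])))
```

Passing a seeded `random.Random` instance as `rng` keeps runs reproducible, which helps when tuning the two noise parameters.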
Audiovisual 3D PT
• Use video and synchronized audio positions
• No audio: Track last target from video
• Noise or not single speaker: no output
• Audio position close to video: output video position
• Audio position far from video: output audio position

[Flowchart: inputs are the 3D positions from the video tracker ("Track video positions") and the 3D position from the audio tracker ("Synchronization"). Decision nodes: "Audio data available?", "Multiple speakers or noise?", "Previously tracked target?" and "Closer than D?". When there is a single clean speaker, the closest video match is found; if it is closer than D the video position is output, otherwise the audio position is output. The remaining outcome is "No output".]
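
The decision rules listed on this slide can be written down as a small function. This is a sketch that follows the bullet list; the function name, its arguments and its return convention are assumptions, and the threshold D is left as a parameter because the slides do not give its value.

```python
import math

def fuse(video_positions, audio_position, audio_is_clean, last_target, max_dist):
    """Pick the audiovisual output for one synchronized frame.

    video_positions: dict target_id -> (x, y, z) from the video tracker
    audio_position:  (x, y, z) from the audio tracker, or None if no audio data
    audio_is_clean:  False when noise or more than one speaker is detected
    last_target:     id of the previously tracked target, or None
    max_dist:        the threshold D
    Returns (position or None, target_id or None).
    """
    if audio_position is None:
        # No audio: keep tracking the last target using video only.
        if last_target in video_positions:
            return video_positions[last_target], last_target
        return None, None
    if not audio_is_clean:
        # Noise or not a single speaker: no output.
        return None, last_target
    if not video_positions:
        return audio_position, None
    # Find the video target closest to the audio position.
    closest = min(video_positions,
                  key=lambda t: math.dist(video_positions[t], audio_position))
    if math.dist(video_positions[closest], audio_position) < max_dist:
        return video_positions[closest], closest  # audio close to video
    return audio_position, None                   # audio far from video
```

Returning the matched target id alongside the position lets the caller feed it back as `last_target` on the next frame.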
3D PT: Results

[Figure: bar chart of MOTA (%) for the Visual, Audio and A/V trackers, grouped as All, AIT, IBM, ITC, UKA and UPC; the vertical axis runs from -30 to 70.]
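
For readers unfamiliar with the metric, MOTA (multiple object tracking accuracy) is, in its usual CLEAR MOT form, one minus the ratio of tracking errors to ground-truth objects, summed over all frames. The helper below is a sketch of that standard definition, not code from the evaluation; the argument names are made up for the example.

```python
def mota(misses, false_positives, mismatches, ground_truth_objects):
    """MOTA in percent, from error counts summed over all frames."""
    if ground_truth_objects <= 0:
        raise ValueError("need at least one ground-truth object")
    errors = misses + false_positives + mismatches
    return 100.0 * (1.0 - errors / ground_truth_objects)
```

Because the errors can outnumber the ground-truth objects, MOTA can be negative, which is why the axis of the chart extends below zero.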
Visual 3D PT: Relation to 2D FT

[Figure, three panels: (1) PT MOTA - FT MOTA (%) per site (AIT, IBM, ITC, UKA, UPC); (2) 3D person tracking MOTA (%) against face tracking false positive rate (%); (3) 3D person tracking MOTA (%) against face tracking miss rate (%).]

• MOTA improvement over 2D FTs
• Most important role in 3D MOTA is the 2D miss rate
   – large slope in linear fit
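
The slope mentioned here is that of an ordinary least-squares line through the (rate, MOTA) points. A minimal sketch of such a fit is given below; the function name is an assumption, and the numbers in the usage note are placeholders chosen to lie on an exact line, not values from the evaluation.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more points and equally long inputs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

For the placeholder points `linear_fit([0, 10, 20], [90, 80, 70])` the slope is -1 and the intercept is 90. A slope of large magnitude means that MOTA changes quickly as the rate on the horizontal axis changes.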

				