Interleaved Object Categorization and Segmentation

					                                              Integrating Recognition and Reconstruction
                                              for Cognitive Scene Interpretation
Perceptual and Sensory Augmented Computing
Integrating Recognition and Reconstruction




                                              Bastian Leibe, Nico Cornelis, Kurt Cornelis, Luc Van Gool
                                              Computer Vision Laboratory & VISICS
                                              ETH Zurich                         KU Leuven

                                              Sicily Workshop, Syracusa, 22.09.2006

                                                                                              CVPR’06 Video Proceedings
                                                                                                      DAGM’06
                                              Motivation
                                              • Urban traffic scene analysis from a moving vehicle
                                                    Detect objects in the image
                                                    Localize them in 3D
                                                    Build up a metric scene model
                                              • Applications, e.g., in driver assistance systems
                                                                   B. Leibe, N. Cornelis, K. Cornelis, L. Van Gool
                                              Challenges
                                              • Motion blur, brightly lit areas, lens flare, dark shadows
                                              • Plus intra-category variability, multiple viewpoints, partial occlusion, ...

                                              Cognitive Loop with 3D Geometry
                                              • Connect recognition and reconstruction
                                              • Reconstruction pathway delivers scene geometry
                                                  Greatly improves recognition performance
                                              • Recognition detects objects that disturb reconstruction
                                                  More accurate geometry estimate


                                              Outline
                                              • Hardware setup
                                              • Reconstruction pathway
                                                    Real-time Structure-from-Motion
                                                    Real-time dense reconstruction

                                              • Recognition pathway
                                                    Local-feature based object detection
                                                    Incorporation of scene geometry
                                                    Temporal integration in world coordinate frame
                                                    Feedback to reconstruction

                                              • Results and Conclusion


                                              Hardware Setup
                                              • Stereo camera rig mounted on top of the vehicle
                                              • Calibrated w.r.t. wheel base points
                                              • Video streams captured at 25 fps, 360×288 resolution

                                              Real-Time Structure-from-Motion
                                              • Basis: very fast feature matching
                                                    Simple features
                                                    Optimized for urban environment
                                                    Only computed on green channel of a single camera
                                              • Rest: standard SfM pipeline
                                                                                                                     [Cornelis et al., CVPR’06]
                                              Real-Time Dense Reconstruction
                                              • Dense reconstruction on rectified images
                                                    Ruled surface assumption to speed up dense reconstruction
                                                    Correlation measure: Sum of per-pixel SSDs along vertical lines
                                                    Line-sweep algorithm with ordering constraints (DP)
                                                    Fast computation on GPU
                                              • Errors introduced by pixels not belonging to facades!
                                                                                                                      [Cornelis et al., CVPR’06]
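
The column-wise matching cost and the dynamic-programming sweep listed on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the GPU implementation of [Cornelis et al., CVPR'06]: images are plain nested lists of grey values, `column_costs` and `sweep` are hypothetical names, and a simple penalty on disparity jumps between neighbouring columns stands in for the ordering constraints.

```python
def column_costs(left, right, max_disp):
    """cost[c][d] = sum over rows y of (left[y][c] - right[y][c - d])**2.

    One cost per image column and disparity: under the ruled surface
    assumption every pixel on a vertical line shares the same disparity.
    """
    rows, cols = len(left), len(left[0])
    cost = [[float("inf")] * (max_disp + 1) for _ in range(cols)]
    for c in range(cols):
        for d in range(min(max_disp, c) + 1):
            cost[c][d] = sum((left[y][c] - right[y][c - d]) ** 2
                             for y in range(rows))
    return cost


def sweep(cost, smooth=1.0):
    """Pick one disparity per column by dynamic programming over columns.

    Assumption: `smooth * |d - d_prev|` penalises disparity jumps between
    neighbouring columns (a stand-in for the ordering constraints).
    """
    cols, ndisp = len(cost), len(cost[0])
    total = [list(cost[0])]   # accumulated cost per column and disparity
    back = []                 # back-pointers for columns 1..cols-1
    for c in range(1, cols):
        row, ptr = [], []
        for d in range(ndisp):
            prev = min(range(ndisp),
                       key=lambda p: total[-1][p] + smooth * abs(d - p))
            row.append(cost[c][d] + total[-1][prev] + smooth * abs(d - prev))
            ptr.append(prev)
        total.append(row)
        back.append(ptr)
    # Backtrack from the cheapest disparity in the last column.
    d = min(range(ndisp), key=lambda k: total[-1][k])
    path = [d]
    for ptr in reversed(back):
        d = ptr[d]
        path.append(d)
    return path[::-1]
```

Calling `sweep(column_costs(left, right, max_disp))` returns one disparity per image column; pixels that do not lie on a facade violate the per-column assumption and distort the cost, which is the error source named in the last bullet.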
                                              Real-Time Dense Reconstruction (2)
                                              • Merge dense reconstructions using known camera poses.
                                              • “Voted polygon carving” on 2D projection
                                              • Surfaces registered on world map using GPS

                                              Textured 3D Model
                                              [Figure panels: Original | 3D Reconstruction]
                                              • Run-times
                                                   SfM + Bundle adjustment: 26-30 fps on CPU
                                                   Dense reconstruction:       26 fps on GPU

                                              Information Flow into Recognition
                                              • For each frame, 3D reconstruction delivers
                                                  External camera calibration
                                                  Ground plane estimate

                                                  Used to improve recognition in the next frame.




                                              Appearance-Based Car Detection
                                              • Bank of 5 single-view ISM detectors
                                              • Each based on 3 local cues
                                                    Harris-Laplace, Hessian-Laplace, and DoG interest regions
                                                    Local Shape Context descriptors
                                              • Semi-profile detectors additionally mirrored
                                              • Not real-time yet…
                                                                                                                [Leibe, Mikolajczyk, Schiele, 06]
                                              Implicit Shape Model - Representation
                                              [Figure: training images (+reference segmentation), appearance codebook, and spatial occurrence distributions over (x, y, s)]
                                              • Learn appearance codebook
                                                    Extract patches at interest points
                                                    Agglomerative clustering → codebook
                                              • Learn spatial distributions
                                                    Match codebook to training images
                                                    Record matching positions on object
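
The codebook-learning step can be illustrated with a small agglomerative clustering sketch. It is only an illustration: the average-link criterion, the Euclidean distance, the `max_dist` stopping threshold, and the function names are assumptions, since the slide does not say which similarity measure or cut-off is used.

```python
import math


def average_link(a, b):
    """Mean pairwise Euclidean distance between two clusters."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def agglomerate(descriptors, max_dist):
    """Merge the two closest clusters until none is closer than max_dist."""
    clusters = [[d] for d in descriptors]
    while len(clusters) > 1:
        pairs = [(average_link(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        dist, i, j = min(pairs)
        if dist > max_dist:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return clusters


def codebook(clusters):
    """One codebook entry per cluster: the mean descriptor."""
    return [tuple(sum(v) / len(c) for v in zip(*c)) for c in clusters]
```

Each resulting entry would then be matched back to the training images to record where on the object it occurs, which is the second bullet of the slide.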
                                              Implicit Shape Model - Recognition
                                              [Figure: ISM recognition pipeline: interest points → matched codebook entries → probabilistic voting in a continuous 3D voting space (x, y, s) → backprojection of maxima → backprojected hypotheses → segmentation with p(figure) probabilities]
                                                                                                                        [Leibe & Schiele, 04]
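
The voting stage of this pipeline can be sketched in a few lines. This is a simplified illustration, not the method of [Leibe & Schiele, 04]: the occurrence format `(dx, dy, s_occ)`, the function names, and the kernel bandwidths are assumptions, and the maximum is searched only at the vote positions with a Gaussian kernel sum instead of a full mode search in the continuous voting space.

```python
import math


def cast_votes(matches, occurrences):
    """Turn matched codebook entries into votes for (x, y, s) object centres.

    matches: list of (x, y, scale, entry_id, match_weight) per interest point.
    occurrences: entry_id -> list of (dx, dy, s_occ), the offset from feature
    to object centre recorded at training feature scale s_occ (assumed format).
    """
    votes = []
    for x, y, s, entry, weight in matches:
        stored = occurrences.get(entry, [])
        for dx, dy, s_occ in stored:
            ratio = s / s_occ  # rescale the stored offset to the test scale
            votes.append((x + dx * ratio, y + dy * ratio, ratio,
                          weight / len(stored)))
    return votes


def strongest_hypothesis(votes, pos_bw=5.0, scale_bw=0.2):
    """Return the vote position with the highest kernel-weighted support."""
    def score(center):
        cx, cy, cs = center
        total = 0.0
        for vx, vy, vs, w in votes:
            d2 = (((vx - cx) / pos_bw) ** 2 + ((vy - cy) / pos_bw) ** 2
                  + ((vs - cs) / scale_bw) ** 2)
            total += w * math.exp(-0.5 * d2)
        return total

    best = max(votes, key=lambda v: score(v[:3]))
    return best[:3], score(best[:3])
```

The votes that support the winning position are the ones that would be backprojected to the image to form the hypothesis and its segmentation.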
                                              2D/3D Interactions
                                              • Likelihood of 3D hypothesis H given image I and 2D
                                                detections h
                                                  [Equation not preserved; its annotated terms are the 2D recognition score, the 3D prior, and the 2D/3D transfer]
                                              • 2D recognition score
                                                    Expressed in terms of per-pixel p(figure) probabilities
                                              • 3D prior
                                                    Distance prior (uniform range)
                                                    Size prior (Gaussian)
                                                    Significantly reduced search space (search corridor in the x, y, s voting space)
                                              • 2D/3D transfer
                                                    Two image-plane detections are consistent if they correspond
                                                    to the same 3D object
                                                    Multi-viewpoint integration
                                                    Multi-camera integration
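
A rough sketch of how a 3D prior with a uniform distance range and a Gaussian size term could be evaluated for one 2D detection. Everything specific here is an assumption made for illustration: a pinhole camera whose optical axis is parallel to a flat ground plane, the function name, and all numeric defaults (distance range, mean car height, standard deviation). The value returned is unnormalised.

```python
import math


def ground_plane_prior(v_top, v_bottom, focal_px, horizon_row, cam_height_m,
                       dist_range_m=(3.0, 50.0),
                       size_mean_m=1.5, size_sigma_m=0.2):
    """Unnormalised prior for a bounding box spanning rows v_top..v_bottom."""
    if v_bottom <= horizon_row:
        return 0.0  # a box ending above the horizon cannot rest on the ground
    # Distance at which the ground plane projects to the box's bottom row.
    distance = focal_px * cam_height_m / (v_bottom - horizon_row)
    # Distance prior: uniform inside the allowed range, zero outside.
    if not dist_range_m[0] <= distance <= dist_range_m[1]:
        return 0.0
    # Real-world height implied by the pixel height at that distance.
    height_m = (v_bottom - v_top) * distance / focal_px
    # Size prior: Gaussian around the expected object height.
    z = (height_m - size_mean_m) / size_sigma_m
    return math.exp(-0.5 * z * z)
```

For a given image row the distance is fixed by the ground plane, so only a narrow band of scales receives a non-negligible prior; this is one way to read the search corridor mentioned on the slide.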
                                              Detections Using Ground Plane Constraints
                                              [Figure: left camera, 1175 frames]

                                              Quantitative Results
                                              • Detection performance on first 600 frames
                                                    All cars annotated that were >50% visible
                                                    Ground plane constraint significantly improves precision
                                                    Performance: 0.2 fp/image at 50% recall
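
For reference, the two quantities in the performance figure (recall and false positives per image) can be computed from per-image counts as in the sketch below. The input format and the function name are hypothetical, and the counts in the usage note are made up for illustration; they are not the evaluation data.

```python
def detection_metrics(per_image):
    """per_image: list of (true_positives, false_positives, annotated_cars)."""
    tp = sum(t for t, _, _ in per_image)
    fp = sum(f for _, f, _ in per_image)
    annotated = sum(a for _, _, a in per_image)
    recall = tp / annotated if annotated else 0.0
    fp_per_image = fp / len(per_image)
    return recall, fp_per_image
```

With five images containing ten annotated cars, five correct detections and one false positive give a recall of 50% at 0.2 fp/image.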

                                              Temporal Integration
                                              • Temporal integration in world coordinate frame
                                                    Using external camera calibration from SfM.
                                                    Each detection transfers to a 3D observation H.
                                                    Find superset of 3D hypotheses.
                                                    Estimate orientation using cluster shape & detected viewpoints.
                                                    Select set of 3D hypotheses that best explain the observations.
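
The transfer of a detection into the world coordinate frame can be sketched as follows. The pose convention `X_cam = R · X_world + t` is an assumption (the slides do not state it), and the function names and distance tolerance are hypothetical.

```python
import math


def camera_to_world(point_cam, rotation, translation):
    """Invert X_cam = R X_world + t, i.e. X_world = R^T (X_cam - t)."""
    diff = [point_cam[i] - translation[i] for i in range(3)]
    # Multiply by the transpose of R: column i of R dotted with diff.
    return tuple(sum(rotation[k][i] * diff[k] for k in range(3))
                 for i in range(3))


def same_object(obs_a, obs_b, tol_m=1.0):
    """Treat two world-frame observations as one object if they are close."""
    return math.dist(obs_a, obs_b) <= tol_m
```

Observations of the same parked car made from different frames land on nearly the same world point, so they can be grouped into a single 3D hypothesis before the selection step.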

                                              Hypothesis Selection for 3D Detections
                                              • Quadratic Boolean Optimization Problem (from MDL)
                                                                                                                  [Leonardis et al., 95]

                                              • Individual scores (diagonal terms)
                                              • Interaction costs (off-diagonal terms)
                                                  [Equations not preserved; annotated terms: temporal decay, likelihood of membership to hypothesis, penalty for physical overlap]
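
A quadratic Boolean problem of this kind, maximising n^T Q n over a binary indicator vector n, is often approximated greedily. The sketch below shows that general idea only; whether the authors solve it this way is not stated on the slide. It assumes a symmetric Q with individual scores on the diagonal and interaction costs entered as negative off-diagonal values, and the matrix in the usage note is made up.

```python
def select_hypotheses(Q):
    """Greedily maximise n^T Q n over boolean n for a symmetric matrix Q."""
    chosen = []
    while True:
        best_gain, best_i = 0.0, None
        for i in range(len(Q)):
            if i in chosen:
                continue
            # Change of n^T Q n when hypothesis i is switched on.
            gain = Q[i][i] + 2.0 * sum(Q[i][j] for j in chosen)
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:
            return sorted(chosen)  # no remaining hypothesis improves the score
        chosen.append(best_i)
```

For `Q = [[3, -2, 0], [-2, 2.5, 0], [0, 0, 1]]` the result is `[0, 2]`: hypothesis 1 has a good individual score but overlaps too strongly with hypothesis 0 to be worth adding.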
                                              Result of Temporal Integration

                                              Online 3D Car Location Estimates

                                              3D Estimates After Convergence

                                              Feedback into 3D Reconstruction
                                              • Feedback of detections & segmentation maps
                                                 Used to discard features on cars for SfM
                                                 Used to mask out cars in dense reconstruction

                                                 More accurate 3D estimates in the next frame.
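
The first feedback step, discarding features that fall on detected cars, amounts to a mask lookup. A minimal sketch, assuming the segmentation is available as a binary per-pixel mask indexed as `car_mask[row][col]` and features are `(x, y)` pixel positions (both hypothetical formats):

```python
def discard_masked_features(features, car_mask):
    """Keep only features whose nearest pixel is not labelled as car."""
    kept = []
    for x, y in features:
        col, row = int(round(x)), int(round(y))
        inside = 0 <= row < len(car_mask) and 0 <= col < len(car_mask[0])
        if inside and car_mask[row][col]:
            continue  # feature lies on a car: unreliable for SfM
        kept.append((x, y))
    return kept
```

The same mask could be used to skip car pixels when accumulating matching costs in the dense reconstruction.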

                                              Another Application: 3D City Modeling
                                              Enhancing your driving experience…
                                              [Figure panels: Original | 3D Reconstruction]



                                              Conclusion
                                              • System for traffic scene analysis integrating
                                                    Structure-from-Motion
                                                    Dense 3D Reconstruction
                                                    Object detection and localization in 2D and 3D
                                                    Temporal integration in world coordinate frame

                                              • Cognitive Loop between 2D and 3D processing
                                                    Reconstruction delivers camera calibration, ground plane
                                                    3D context tremendously improves recognition performance
                                                    Car detection and segmentation make 3D estimation more accurate

                                              • System applied to challenging real-world task
                                                    Real-time 3D reconstruction (26-30 fps)
                                                    Accurate object detection & 3D pose estimation results

                                              Thank you very much for your attention!