Computer Vision: Multiview Stereo

Camera Calibration & Stereo Reconstruction

        Jinxiang Chai
3D Computer Vision
The main goal here is to reconstruct the geometry of 3D worlds.
How can we estimate the camera parameters?




      - Where is the camera located?
      - Which direction is the camera pointing?
      - What are the focal length, projection center, and aspect ratio?
Stereo reconstruction
Given two or more images of the same scene or object,
  compute a representation of its shape



[Figure: the same scene or object seen from known camera viewpoints]




How can we estimate camera parameters?
Camera calibration
Augmented pin-hole camera
 - focal point, orientation
 - focal length, aspect ratio, center, lens distortion






 Classical calibration
   - 3D → 2D correspondences
 Camera calibration online resources
Camera and calibration target
Classical camera calibration
Known 3D coordinates and 2D coordinates
   - known 3D points on calibration targets




   - find the corresponding 2D points in the image using a feature detection
     algorithm
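
A minimal sketch of this workflow with OpenCV's chessboard detector; the 9×6 pattern size, 25 mm squares, and file names are illustrative assumptions, not values from the slides.

```python
# Sketch: classical calibration from a chessboard target (OpenCV).
# The 9x6 inner-corner pattern, 25 mm squares, and file list are
# illustrative choices only.
import glob
import cv2
import numpy as np

pattern = (9, 6)          # inner corners per row, per column
square = 25.0             # mm, known from the physical target

# Known 3D points on the calibration target (Z = 0 plane of the board).
obj = np.zeros((pattern[0] * pattern[1], 3), np.float32)
obj[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points = [], []
for fname in glob.glob("calib_*.png"):
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:             # corresponding 2D points found by the detector
        obj_points.append(obj)
        img_points.append(corners)

# Intrinsics K, distortion coefficients, and one (R, t) per image.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("reprojection RMS:", rms)
```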
Camera parameters

Known 3D coords and 2D coords

$$
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\underbrace{\begin{bmatrix} s_x & a & u_0 \\ 0 & -s_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}}_{\text{Viewport proj.}}
\cdot \bigl(\text{Perspective proj.}\bigr) \cdot \bigl(\text{View trans.}\bigr)
$$

Viewport and perspective projection make up the intrinsic camera
parameters (5 parameters); the view transformation makes up the
extrinsic camera parameters (6 parameters).
Camera matrix
Fold intrinsic calibration matrix K and extrinsic pose
  parameters (R,t) together into a
  camera matrix
                        M = K [R | t ]




(fix the lower right-hand entry to 1, leaving 11 degrees of freedom)
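
As a quick illustration (the numeric values below are placeholders, not from the slides), folding K and (R, t) into M and projecting a homogeneous 3D point looks like this:

```python
# Sketch: fold K and (R, t) into the 3x4 camera matrix M = K [R | t].
# All numbers below are placeholders.
import numpy as np

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                          # rotation (world -> camera)
t = np.array([[0.0], [0.0], [5.0]])    # translation

M = K @ np.hstack([R, t])              # 3x4, 11 d.o.f. up to scale

X = np.array([0.1, -0.2, 3.0, 1.0])    # homogeneous 3D point
x = M @ X
u, v = x[0] / x[2], x[1] / x[2]        # perspective divide -> pixel coords
```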
Camera matrix calibration
Directly estimate 11 unknowns in the M matrix using
  known 3D points (Xi,Yi,Zi) and measured feature
  positions (ui,vi)
Camera matrix calibration
Linear regression:
   • Bring denominator over, solve set of (over-determined) linear
     equations. How?




   • Least squares (pseudo-inverse)
     - 11 unknowns (up to scale)
     - 2 equations per point (homogeneous coordinates)
     - 6 points are sufficient
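
A sketch of this linear step: each correspondence contributes two rows to a homogeneous system A m = 0 in the 12 entries of M, solved up to scale with the SVD (equivalent to a least-squares/pseudo-inverse solution). The function name is illustrative.

```python
# Sketch: direct linear estimation of the 3x4 camera matrix M from
# 3D-2D correspondences (>= 6 points).  Names are illustrative.
import numpy as np

def calibrate_dlt(X, uv):
    """X: (N,3) known 3D points, uv: (N,2) measured pixels -> M (3x4)."""
    A = []
    for (x, y, z), (u, v) in zip(X, uv):
        P = [x, y, z, 1.0]
        # Two equations per point after bringing the denominator over.
        A.append(P + [0, 0, 0, 0] + [-u * p for p in P])
        A.append([0, 0, 0, 0] + P + [-v * p for p in P])
    A = np.asarray(A)
    # Solution up to scale: right singular vector of the smallest
    # singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)
```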
Nonlinear camera calibration
Perspective projection:

$$
\begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix}
\;\cong\;
\underbrace{\begin{bmatrix} f_x & \alpha & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}}_{K}
\underbrace{\begin{bmatrix} r_1^T & t_1 \\ r_2^T & t_2 \\ r_3^T & t_3 \end{bmatrix}}_{[R \,|\, T]}
\underbrace{\begin{bmatrix} x_i \\ y_i \\ z_i \\ 1 \end{bmatrix}}_{P}
$$

2D coordinates are just a nonlinear function of the 3D coordinates and
the camera parameters:

$$
u_i \;=\; \frac{(f_x r_1^T + \alpha\, r_2^T + u_0\, r_3^T)\, P + f_x t_1 + \alpha\, t_2 + u_0 t_3}{r_3^T P + t_3} \;=\; f(K, R, T; P_i)
$$

$$
v_i \;=\; \frac{(f_y r_2^T + v_0\, r_3^T)\, P + f_y t_2 + v_0 t_3}{r_3^T P + t_3} \;=\; g(K, R, T; P_i)
$$
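
These two functions f and g can be written directly in code; the sketch below follows the formulas above (α is the skew entry of K), and the helper name is mine.

```python
# Sketch: the nonlinear projection (f, g) of a 3D point P for intrinsics
# (fx, fy, alpha, u0, v0) and pose (R, T).  Helper name is illustrative.
import numpy as np

def project(K, R, T, P):
    """Return (u, v) = (f(K,R,T;P), g(K,R,T;P)) for a 3D point P."""
    fx, alpha, u0 = K[0, 0], K[0, 1], K[0, 2]
    fy, v0 = K[1, 1], K[1, 2]
    r1, r2, r3 = R                 # rows of the rotation matrix
    t1, t2, t3 = T
    w = r3 @ P + t3                # common denominator r3^T P + t3
    u = ((fx * r1 + alpha * r2 + u0 * r3) @ P
         + fx * t1 + alpha * t2 + u0 * t3) / w
    v = ((fy * r2 + v0 * r3) @ P + fy * t2 + v0 * t3) / w
    return u, v
```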
Multiple calibration images
Find camera parameters which satisfy the constraints from
M images and N points:
  for j = 1, …, M
       for i = 1, …, N
                        u_ij = f(K, R_j, T_j; P_i)
                        v_ij = g(K, R_j, T_j; P_i)

This can be formulated as a nonlinear optimization problem:

$$
\min_{K,\,R_j,\,T_j}\;\sum_{j=1}^{M} \sum_{i=1}^{N} \bigl(u_{ij} - f(K, R_j, T_j; P_i)\bigr)^2 + \bigl(v_{ij} - g(K, R_j, T_j; P_i)\bigr)^2
$$

      Solve the optimization using nonlinear optimization techniques:
       - Gauss-Newton
       - Levenberg-Marquardt
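
A hedged sketch of this optimization with scipy's `least_squares` (a Levenberg-Marquardt-style solver); the parameter packing and the Rodrigues rotation parameterization are my choices, and `project` is the helper sketched earlier.

```python
# Sketch: minimize the reprojection error over K and the M poses.
# Parameter packing and rotation parameterization are illustrative.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(params, P, uv):           # P: (N,3) points, uv: (M,N,2) pixels
    fx, fy, alpha, u0, v0 = params[:5]
    K = np.array([[fx, alpha, u0], [0, fy, v0], [0, 0, 1.0]])
    M = uv.shape[0]
    pose = params[5:].reshape(M, 6)      # one (rvec, tvec) per image
    res = []
    for j in range(M):
        R = Rotation.from_rotvec(pose[j, :3]).as_matrix()
        T = pose[j, 3:]
        for i, Pi in enumerate(P):
            u, v = project(K, R, T, Pi)  # f and g from the previous sketch
            res += [uv[j, i, 0] - u, uv[j, i, 1] - v]
    return np.array(res)

# x0 = initial guess (e.g. from the linear method); P, uv as above.
# sol = least_squares(residuals, x0, args=(P, uv), method="lm")
```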
Nonlinear approach
Advantages:
  • can solve for more than one camera pose at a time
  • fewer degrees of freedom than linear approach
  • Standard technique in photogrammetry, computer vision,
    computer graphics
     - [Tsai 87] also estimates lens distortions (freeware @ CMU)
    http://www.cs.cmu.edu/afs/cs/project/cil/ftp/html/v-source.html

Disadvantages:
  • more complex update rules
  • need a good initialization (recover K [R | t] from M)
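
For that initialization, K and (R, t) can be recovered from a linear estimate of M with an RQ decomposition of its left 3×3 block; a minimal sketch, with the sign fixing kept simple:

```python
# Sketch: recover K, R, t from a 3x4 camera matrix M = K [R | t]
# via RQ decomposition of the left 3x3 block.
import numpy as np
from scipy.linalg import rq

def decompose_camera_matrix(M):
    K, R = rq(M[:, :3])                  # upper-triangular K, orthogonal R
    # Force a positive diagonal of K (absorb the signs into R).
    S = np.diag(np.sign(np.diag(K)))
    K, R = K @ S, S @ R
    # (if det(R) < 0, negate R and t to obtain a proper rotation)
    t = np.linalg.solve(K, M[:, 3])      # M[:, 3] = K t
    return K / K[2, 2], R, t             # normalize so K[2,2] = 1
```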
How can we estimate the camera parameters?
Application: camera calibration for sports video




[Figure: input images and the court model, from Farin et al.]
Stereo matching
Given two or more images of the same scene or object, as well as their
  camera parameters, how do we compute a representation of its shape?


What are some possible representations for shapes?
  • depth maps
  • volumetric models
  • 3D surface models
  • planar (or offset) layers
Outline
Stereo matching
  - Traditional stereo
  - Active stereo


Volumetric stereo
  - Visual hull
  - Voxel coloring
  - Space carving
Readings

Stereo matching
  • Sections 11.1, 11.2, 11.3, and 11.5 in the Szeliski book


  • D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense
    two-frame stereo correspondence algorithms. International Journal of
    Computer Vision, 47(1/2/3):7-42, April-June 2002.
Stereo


[Figure: a scene point projecting through each camera's optical center onto its image plane]
Stereo




Basic Principle: Triangulation
   • Gives reconstruction as intersection of two rays
   • Requires
       > calibration
       > point correspondence
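
A minimal sketch of the triangulation step itself, as a linear (DLT) intersection of the two rays given the two calibrated camera matrices; the function name is illustrative.

```python
# Sketch: linear triangulation of one correspondence (u1,v1) <-> (u2,v2)
# given the two 3x4 camera matrices M1, M2.
import numpy as np

def triangulate(M1, M2, uv1, uv2):
    u1, v1 = uv1
    u2, v2 = uv2
    # Each view contributes two linear constraints on the homogeneous X.
    A = np.array([u1 * M1[2] - M1[0],
                  v1 * M1[2] - M1[1],
                  u2 * M2[2] - M2[0],
                  v2 * M2[2] - M2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # back to Euclidean 3D coordinates
```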
Stereo correspondence
Determine Pixel Correspondence
   • Pairs of points that correspond to the same scene point



[Figure: epipolar geometry: the epipolar plane meets each image plane in an epipolar line]




Epipolar Constraint
   • Reduces correspondence problem to 1D search along conjugate
     epipolar lines
   • Java demo: http://www.ai.sri.com/~luong/research/Meta3DViewer/EpipolarGeo.html
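
To make the 1D search concrete: if the fundamental matrix F relating the two views is known (it is not derived in these slides, so treat it as given), the epipolar line in the right image for a left-image pixel is l' = F x.

```python
# Sketch: the epipolar line in the right image for a left-image pixel,
# assuming the 3x3 fundamental matrix F is already known.
import numpy as np

def epipolar_line(F, uv_left):
    x = np.array([uv_left[0], uv_left[1], 1.0])   # homogeneous pixel
    a, b, c = F @ x                               # line a*u + b*v + c = 0
    return a, b, c

# The 1D search walks along this line, e.g. v = -(a*u + c) / b for each u.
```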
Stereo image rectification




•   reproject image planes onto a common
    plane parallel to the line between optical centers
•   pixel motion is horizontal after this transformation
•   two homographies (3x3 transform), one for each
    input image reprojection
•   C. Loop and Z. Zhang. Computing Rectifying Homographies for Stereo
    Vision. IEEE Conf. Computer Vision and Pattern Recognition, 1999.
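
In practice the two rectifying transforms are usually computed from the calibration data; a hedged OpenCV sketch, where the intrinsics, distortion, relative pose, and image size are assumed to come from a prior stereo calibration.

```python
# Sketch: rectify a calibrated stereo pair with OpenCV.  K1, d1, K2, d2
# (intrinsics/distortion), R, T (pose of camera 2 w.r.t. camera 1) and
# the image size are assumed inputs from calibration.
import cv2

def rectify_pair(left, right, K1, d1, K2, d2, R, T, size):
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, d1, K2, d2,
                                                      size, R, T)
    m1x, m1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, size, cv2.CV_32FC1)
    m2x, m2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, size, cv2.CV_32FC1)
    # After remapping, corresponding points lie on the same image row.
    return (cv2.remap(left, m1x, m1y, cv2.INTER_LINEAR),
            cv2.remap(right, m2x, m2y, cv2.INTER_LINEAR))
```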
Rectification




                Original image pairs




                Rectified image pairs
Stereo matching algorithms
Match Pixels in Conjugate Epipolar Lines
   • Assume brightness constancy
   • This is a tough problem
   • Numerous approaches
      > A good survey and evaluation:   http://www.middlebury.edu/stereo/
Your basic stereo algorithm




 For each epipolar line
     For each pixel in the left image
          •   compare it with every pixel on the same epipolar line in the right image
          •   pick the pixel with the minimum matching cost

 Improvement: match windows
     •   This should look familiar... (cross-correlation or SSD)
     •   Can use Lucas-Kanade or a discrete search (the latter is more common)
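
A minimal sketch of this window-based search on a rectified pair, using an SSD cost and a discrete disparity search; the window size and disparity range are arbitrary choices.

```python
# Sketch: window-based SSD stereo on a rectified pair (grayscale float
# arrays of equal size).  Window size and disparity range are arbitrary.
import numpy as np

def ssd_disparity(left, right, max_disp=64, half_win=4):
    h, w = left.shape
    disp = np.zeros((h, w), np.float32)
    for y in range(half_win, h - half_win):
        for x in range(half_win + max_disp, w - half_win):
            patch = left[y - half_win:y + half_win + 1,
                         x - half_win:x + half_win + 1]
            # Search along the same row of the right image.
            costs = [np.sum((patch -
                             right[y - half_win:y + half_win + 1,
                                   x - d - half_win:x - d + half_win + 1]) ** 2)
                     for d in range(max_disp)]
            disp[y, x] = np.argmin(costs)      # pick the minimum matching cost
    return disp
```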
Window size




[Figure: matching results with window sizes W = 3 and W = 20]
Effect of window size
  • Smaller window
     + more detail
     - more noise
  • Larger window
     + smoother disparity maps
     - less detail
More constraints?
We can enforce more constraints to reduce matching
ambiguity
 - smoothness constraints: computed disparity at a pixel
   should be consistent with neighbors in a surrounding window.


 - uniqueness constraints: the matching needs to be bijective

 - ordering constraints: e.g., computed disparity at a pixel
   should not be larger than the disparity of its right neighbor pixel by
   more than one pixel.
Stereo results
  • Data from University of Tsukuba
  • Similar results on other images without ground truth




[Figure: Tsukuba scene (left) and ground-truth disparity map (right)]
Results with window search




[Figure: window-based matching with the best window size vs. ground truth]
    Better methods exist...




[Figure: result of a better method vs. ground truth]
Boykov et al., Fast Approximate Energy Minimization via Graph Cuts,
   International Conference on Computer Vision, September 1999.
More recent development
High-Quality Single-Shot Capture of Facial
Geometry [SIGGRAPH 2010, project website]
 - capture high-fidelity facial geometry from multiple cameras
 - pairwise stereo reconstruction between neighboring cameras
 - hallucinate facial details
More recent development
High Resolution Passive Facial Performance
Capture [SIGGRAPH 2010, project website]
 - capture dynamic facial geometry from multiple video cameras
 - spatial stereo reconstruction for every frame
 - build temporal correspondences across the entire sequence
Stereo reconstruction pipeline
Steps
   •   Calibrate cameras
   •   Rectify images
   •   Compute disparity
   •   Estimate depth
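
For the last step, on a rectified pair the depth follows from the disparity d, focal length f, and baseline B as Z = f·B / d; a small sketch (the zero-disparity handling is my choice):

```python
# Sketch: convert a disparity map (pixels) to depth, given the focal
# length f (pixels) and baseline B (metres) of the rectified rig.
import numpy as np

def depth_from_disparity(disp, f, B):
    depth = np.full_like(disp, np.inf, dtype=np.float64)
    valid = disp > 0
    depth[valid] = f * B / disp[valid]     # Z = f * B / d
    return depth
```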

What will cause errors?
   •   Camera calibration errors
   •   Poor image resolution
   •   Occlusions
   •   Violations of brightness constancy (specular reflections)
   •   Large motions
   •   Low-contrast image regions
Outline
Stereo matching
  - Traditional stereo
  - Active stereo


Volumetric stereo
  - Visual hull
  - Voxel coloring
  - Space carving
    Active stereo with structured light




[Figure: active stereo setups: a projector paired with camera 1 (Li Zhang's one-shot stereo) and a projector paired with cameras 1 and 2]
      Project “structured” light patterns onto the object
            • simplifies the correspondence problem
Active stereo with structured light
Laser scanning




                                              Digital Michelangelo Project
                                           http://graphics.stanford.edu/projects/mich/




 Optical triangulation
    • Project a single stripe of laser light
    • Scan it across the surface of the object
    • This is a very precise version of structured light scanning
Laser scanned models




           The Digital Michelangelo Project, Levoy et al.
Recent development
Capturing dynamic facial movement using active
stereo [project website]
   - use synchronized video cameras and structured light projectors to
     capture dynamic facial geometry
   - use a generic 3D model to build temporal correspondences across
     the entire sequence

				