# Computer Vision: Multiview Stereo


```
Camera Calibration & Stereo Reconstruction

Jinxiang Chai
3D Computer Vision
The main goal here is to reconstruct the geometry of the 3D
world.
How can we estimate the camera parameters?

- Where is the camera located?
- Which direction is the camera looking at?
- Focal length, projection center, aspect ratio?
Stereo reconstruction
Given two or more images of the same scene or object,
compute a representation of its shape

(figure label: known camera viewpoints)

How can we estimate camera parameters?
Camera calibration
Augmented pin-hole camera
- focal point, orientation
- focal length, aspect ratio, center, lens distortion

Known 3D

Classical calibration
- 3D-2D correspondence
Camera calibration online resources
Camera and calibration target
Classical camera calibration
Known 3D coordinates and 2D coordinates
- known 3D points on calibration targets

- find corresponding 2D points in image using feature detection
algorithm
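Camera parameters

Known 3D coords and 2D coords

\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} \sim
\underbrace{\begin{pmatrix} s_x & a & u_0 \\ 0 & -s_y & v_0 \\ 0 & 0 & 1 \end{pmatrix}}_{\text{viewport proj.}}
\cdot (\text{perspective proj.}) \cdot (\text{view trans.}) \cdot
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}

Intrinsic camera parameters (5 parameters): viewport proj. and perspective proj.
Extrinsic camera parameters (6 parameters): view trans.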
Camera matrix
Fold the intrinsic calibration matrix K and the extrinsic pose
parameters (R, t) together into a single
camera matrix
M = K [R | t]

(M is defined only up to scale: fix the lower right-hand entry to 1, leaving 11 d.o.f.)
Camera matrix calibration
Directly estimate 11 unknowns in the M matrix using
known 3D points (Xi,Yi,Zi) and measured feature
positions (ui,vi)
Camera matrix calibration
Linear regression:
• Bring denominator over, solve set of (over-determined) linear
equations. How?

• Least squares (pseudo-inverse)
- 11 unknowns (up to scale)
- 2 equations per point (homogeneous coordinates)
- 6 points are sufficient
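Nonlinear camera calibration
Perspective projection:

\begin{pmatrix} u_i \\ v_i \\ 1 \end{pmatrix} \sim
\underbrace{\begin{pmatrix} f_x & a & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{pmatrix}}_{K}
\underbrace{\begin{pmatrix} r_1^T & t_1 \\ r_2^T & t_2 \\ r_3^T & t_3 \end{pmatrix}}_{[R \mid T]}
\begin{pmatrix} P_i \\ 1 \end{pmatrix},
\qquad P_i = (x_i, y_i, z_i)^T

2D coordinates are just a nonlinear function of the 3D
coordinates and the camera parameters (a is the skew entry of K):

u_i = \frac{(f_x r_1^T + a\, r_2^T + u_0 r_3^T) \cdot P_i + f_x t_1 + a\, t_2 + u_0 t_3}{r_3^T \cdot P_i + t_3} = f(K, R, T; P_i)

v_i = \frac{(f_y r_2^T + v_0 r_3^T) \cdot P_i + f_y t_2 + v_0 t_3}{r_3^T \cdot P_i + t_3} = g(K, R, T; P_i)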
Multiple calibration images
Find camera parameters which satisfy the constraints from
M images, N points:
for j = 1, …, M
  for i = 1, …, N
    u_{ij} = f(K, R_j, T_j; P_i)
    v_{ij} = g(K, R_j, T_j; P_i)

This can be formulated as a nonlinear optimization problem:

\min_{K, \{R_j, T_j\}} \sum_{j=1}^{M} \sum_{i=1}^{N} \left[ (u_{ij} - f(K, R_j, T_j; P_i))^2 + (v_{ij} - g(K, R_j, T_j; P_i))^2 \right]

Solve the optimization using nonlinear optimization techniques:
- Gauss-Newton
- Levenberg-Marquardt
Nonlinear approach
Advantages:
• can solve for more than one camera pose at a time
• fewer degrees of freedom than linear approach
• Standard technique in photogrammetry, computer vision,
computer graphics
- [Tsai 87] also estimates lens distortions (freeware @ CMU)
http://www.cs.cmu.edu/afs/cs/project/cil/ftp/html/v-source.html

Disadvantages:
• more complex update rules
• need a good initialization (recover K [R | t] from M)
How can we estimate the camera parameters?
Application: camera calibration for sports video

(figure labels: images, court model) [Farin et al.]
Stereo matching
Given two or more images of the same scene or object
as well as their camera parameters, how to compute
a representation of its shape?

What are some possible representations for shapes?
• depth maps
• volumetric models
• 3D surface models
• planar (or offset) layers
Outline
Stereo matching
- Traditional stereo
- Active stereo

Volumetric stereo
- Visual hull
- Voxel coloring
- Space carving
Readings

Stereo matching
• Sections 11.1, 11.2, 11.3, and 11.5 in the Szeliski book

• D. Scharstein and R. Szeliski. A taxonomy and
evaluation of dense two-frame stereo correspondence
algorithms.
International Journal of Computer Vision, 47(1/2/3):7-42,
April-June 2002.
Stereo

(figure labels: scene point, image plane, optical center)
Stereo

Basic Principle: Triangulation
• Gives reconstruction as intersection of two rays
• Requires
> calibration
> point correspondence
Stereo correspondence
Determine Pixel Correspondence
• Pairs of points that correspond to same scene point

(figure labels: epipolar plane, with one epipolar line in each image)

Epipolar Constraint
• Reduces correspondence problem to 1D search along conjugate
epipolar lines
• Java demo: http://www.ai.sri.com/~luong/research/Meta3DViewer/EpipolarGeo.html
Stereo image rectification

•   reproject image planes onto a common
    plane parallel to the line between optical centers
•   pixel motion is horizontal after this transformation
•   two homographies (3x3 transforms), one for each
    input image reprojection
•   C. Loop and Z. Zhang. Computing Rectifying Homographies for
    Stereo Vision. IEEE Conf. Computer Vision and Pattern
    Recognition, 1999.
Rectification

Original image pairs

Rectified image pairs
Stereo matching algorithms
Match Pixels in Conjugate Epipolar Lines
• Assume brightness constancy
• This is a tough problem
• Numerous approaches
> A good survey and evaluation:   http://www.middlebury.edu/stereo/
Your basic stereo algorithm

For each epipolar line
For each pixel in the left image
•   compare with every pixel on same epipolar line in right image
•   pick pixel with minimum matching cost

Improvement: match windows
•   This should look familiar (cross-correlation or SSD)
•   Can use Lucas-Kanade or discrete search (the latter is more common)
Window size

(figure labels: W = 3, W = 20)
Effect of window size
• Smaller window
+ more detail
- more noise
• Larger window
+ smoother disparity maps
- less detail
More constraints?
We can enforce more constraints to reduce matching
ambiguity
- smoothness constraints: computed disparity at a pixel
should be consistent with neighbors in a surrounding window.

- uniqueness constraints: the matching needs to be bijective

- ordering constraints: e.g., computed disparity at a pixel
should not be larger than the disparity of its right neighbor pixel by
more than one pixel.
Stereo results
• Data from University of Tsukuba
• Similar results on other images without ground truth

Scene                                    Ground truth
Results with window search

Window-based matching       Ground truth
(best window size)
Better methods exist...

A better method                                      Ground truth
Boykov et al., Fast Approximate Energy Minimization via Graph Cuts,
International Conference on Computer Vision, September 1999.
More recent development
High-Quality Single-Shot Capture of Facial
Geometry [SIGGRAPH 2010, project website]
- capture high-fidelity facial geometry from multiple cameras
- pairwise stereo reconstruction between neighboring cameras
- hallucinate facial details
More recent development
High Resolution Passive Facial Performance
Capture [SIGGRAPH 2010, project website]
- capture dynamic facial geometry from multiple video cameras
- spatial stereo reconstruction for every frame
- building temporal correspondences across the entire sequence
Stereo reconstruction pipeline
Steps
•   Calibrate cameras
•   Rectify images
•   Compute disparity
•   Estimate depth

What will cause errors?
•   Camera calibration errors
•   Poor image resolution
•   Occlusions
•   Violations of brightness constancy (specular reflections)
•   Large motions
•   Low-contrast image regions
Active stereo with structured light

Li Zhang's one-shot stereo

(figure labels: camera 1, projector, camera 2)

Project “structured” light patterns onto the object
• simplifies the correspondence problem
Active stereo with structured light
Laser scanning

Digital Michelangelo Project
http://graphics.stanford.edu/projects/mich/

Optical triangulation
• Project a single stripe of laser light
• Scan it across the surface of the object
• This is a very precise version of structured light scanning
Laser scanned models

The Digital Michelangelo Project, Levoy et al.
Recent development
Capturing dynamic facial movement using active
stereo [project website]
- use synchronized video cameras and structured light projectors to
capture dynamic facial geometry
- use a generic 3D model to build temporal correspondences across
the entire sequence

```
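The "Camera matrix calibration" slides describe bringing the denominator over and solving an over-determined linear system for the 11 unknowns of M, with the lower right-hand entry fixed to 1. The following is a minimal sketch of that idea, assuming NumPy is available. The function names `calibrate_camera_matrix` and `project`, and all numeric values (K, R, t, and the random 3D points), are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
import numpy as np

def calibrate_camera_matrix(points_3d, points_2d):
    """Least-squares estimate of the 3x4 camera matrix M with M[2, 3] = 1."""
    rows, rhs = [], []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        # u * (m31 X + m32 Y + m33 Z + 1) = m11 X + m12 Y + m13 Z + m14
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        rhs.append(u)
        # v * (m31 X + m32 Y + m33 Z + 1) = m21 X + m22 Y + m23 Z + m24
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        rhs.append(v)
    A = np.asarray(rows, dtype=float)
    b = np.asarray(rhs, dtype=float)
    m, *_ = np.linalg.lstsq(A, b, rcond=None)  # pseudo-inverse solution
    return np.append(m, 1.0).reshape(3, 4)

def project(M, point_3d):
    """Project a 3D point with camera matrix M and divide by depth."""
    x = M @ np.append(np.asarray(point_3d, dtype=float), 1.0)
    return x[:2] / x[2]

# Synthetic ground truth (assumed values, for illustration only).
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([[0.1], [-0.2], [5.0]])
M_true = K @ np.hstack([R, t])

# Ten non-coplanar points give 20 equations for the 11 unknowns.
rng = np.random.default_rng(0)
points_3d = rng.uniform(-1.0, 1.0, size=(10, 3))
points_2d = np.array([project(M_true, P) for P in points_3d])

M_est = calibrate_camera_matrix(points_3d, points_2d)
```

With noise-free correspondences, `M_est` should agree with `M_true` rescaled so that its lower right-hand entry is 1. With real, noisy feature detections the estimate is only approximate, and it would typically serve as the initialization for the nonlinear approach.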
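The "Multiple calibration images" slides formulate calibration as minimizing the sum of squared differences between measured (u, v) positions and the projections f and g. The sketch below only evaluates that objective, assuming NumPy; it does not implement Gauss-Newton or Levenberg-Marquardt. The names `project_point` and `reprojection_cost` and all numeric values are hypothetical.

```python
import numpy as np

def project_point(K, R, T, P):
    """Return (u, v) = (f, g)(K, R, T; P) for a 3D point P."""
    x = K @ (R @ P + T)
    return x[:2] / x[2]

def reprojection_cost(K, poses, points_3d, observations):
    """Sum over images j and points i of the squared (u, v) residuals."""
    cost = 0.0
    for (R, T), obs_j in zip(poses, observations):
        for P, uv in zip(points_3d, obs_j):
            r = uv - project_point(K, R, T, P)
            cost += float(r @ r)
    return cost

# Assumed setup: one shared K, two camera poses, five known 3D points.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
poses = [
    (np.eye(3), np.array([0.0, 0.0, 5.0])),
    (np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]),
     np.array([0.5, 0.0, 6.0])),
]
points_3d = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]])

# Noise-free observations generated from the true parameters.
observations = [np.array([project_point(K, R, T, P) for P in points_3d])
                for R, T in poses]

cost_true = reprojection_cost(K, poses, points_3d, observations)

# Perturbing the focal length f_x should increase the cost.
K_bad = K.copy()
K_bad[0, 0] += 10.0
cost_perturbed = reprojection_cost(K_bad, poses, points_3d, observations)
```

A nonlinear least-squares solver would adjust K and every (R_j, T_j) to drive this cost down, starting from a good initial guess such as the one recovered from the linear estimate of M.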
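The "basic stereo algorithm" with the window-matching improvement can be sketched as a brute-force SSD search along each row of a rectified pair. This is a minimal sketch assuming NumPy, a winner-take-all choice of disparity, and the convention that a left-image pixel at column x matches a right-image pixel at column x - d. The function name `block_match`, its parameters, and the synthetic test images are hypothetical.

```python
import numpy as np

def block_match(left, right, window=3, max_disparity=8):
    """Winner-take-all SSD window matching along rows of a rectified pair."""
    h, w = left.shape
    half = window // 2
    disparity = np.zeros((h, w), dtype=int)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch_l = left[y - half:y + half + 1, x - half:x + half + 1]
            best_cost, best_d = np.inf, 0
            # Only try disparities whose right-image window stays in bounds.
            for d in range(0, min(max_disparity, x - half) + 1):
                patch_r = right[y - half:y + half + 1,
                                x - d - half:x - d + half + 1]
                cost = np.sum((patch_l - patch_r) ** 2)  # SSD matching cost
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y, x] = best_d
    return disparity

# Synthetic pair: the left image is the right image shifted by 3 pixels.
rng = np.random.default_rng(1)
right_img = rng.uniform(0.0, 1.0, size=(8, 20))
left_img = np.roll(right_img, 3, axis=1)

disp = block_match(left_img, right_img, window=3, max_disparity=8)
```

Away from the image border, the recovered disparity for this synthetic pair should be 3 everywhere. Real images violate the assumptions of this toy case (textureless regions, occlusions, brightness changes), which is why the window size and the additional constraints discussed in the slides matter.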
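The last pipeline step, "Estimate depth", is not spelled out in the slides. For a rectified pair of parallel cameras, triangulation reduces to the standard relation Z = f B / d, where f is the focal length in pixels, B is the baseline between the optical centers, and d is the disparity in pixels. The sketch below assumes exactly that setup; the function name `depth_from_disparity` and the example numbers are hypothetical.

```python
import math

def depth_from_disparity(disparity_px, focal_length_px, baseline):
    """Depth Z = f * B / d for a rectified, parallel stereo pair.

    Returns infinity for zero disparity (a point at infinity) and
    raises ValueError for negative disparity.
    """
    if disparity_px < 0:
        raise ValueError("disparity must be non-negative")
    if disparity_px == 0:
        return math.inf
    return focal_length_px * baseline / disparity_px

# Assumed example values: f = 800 px, B = 0.1 m, d = 16 px.
depth_m = depth_from_disparity(16.0, 800.0, 0.1)

# Halving the disparity doubles the depth.
depth_far_m = depth_from_disparity(8.0, 800.0, 0.1)
```

Because depth is inversely proportional to disparity, a fixed disparity error of one pixel produces a much larger depth error for distant points than for nearby ones, which is one reason poor image resolution appears in the list of error sources.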