

Lecture 1.2    COMP 646    Jan. 10, 2008

Stereo disparity
Let’s now reinterpret the model we developed in lecture 1.1. Rather than moving the camera over
time, suppose we have two cameras that are in a fixed position relative to each other. Assume
the optical axes of the two cameras are parallel, i.e. the cameras have the same Z direction, but
that one camera is displaced to the right (which we’ll call the positive x direction) of the other by
a distance Tx . This distance is sometimes called the baseline of the camera pair. In terms of the
previous model, the two cameras correspond to t = 0 and t = 1, and Ty = Tz = 0.
   A 3D point with coordinates (X0 , Y0, Z0 ) in the left camera’s coordinate system would have
coordinates (X0 − Tx , Y0 , Z0 ) in the right camera’s coordinate system. As such, this 3D point would
project to a different x value but not to a different y value in the left and right images. The
difference in x position is called the binocular disparity,
                          d ≡ xl − xr = (X0 /Z0 ) f − ((X0 − Tx )/Z0 ) f = (Tx /Z0 ) f.
From this equation, we can make two key observations:

  1. A 3D point in the world projects to the same row (y value) in the left and right image planes.
     This implies that to match corresponding points in the two images, the vision system only
     needs to search within corresponding rows in the two images.

  2. Once the visual system has found corresponding points in the two images, it can use the
     disparity to calculate the depth of the scene point, assuming Tx and f are known.

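To make the second observation concrete, here is a minimal sketch of recovering depth by inverting d = f Tx /Z0; the baseline, focal length, and disparity values below are invented for illustration:

```python
# Depth from disparity, by inverting d = f * Tx / Z.  The numeric values
# below (baseline, focal length, disparity) are made-up example values.

def depth_from_disparity(d, Tx, f):
    """Recover depth Z from disparity d, baseline Tx, and focal length f.
    With d and f in pixels and Tx in meters, Z comes out in meters."""
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return f * Tx / d

# Example: baseline 0.1 m, focal length 500 pixels, disparity 10 pixels.
print(depth_from_disparity(10.0, 0.1, 500.0))  # 5.0 (meters)
```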
    The first problem is often called the correspondence problem. This is the problem we discuss today.
    To develop an intuition, hold up your finger in front of your face and look at something far away
behind the finger so that the two eyes are pointing in roughly the same direction. You should notice
that you see two copies of your finger (double vision). If this is not obvious, then close the left and
right eyes, alternately back and forth. You’ll notice that the image of your finger jumps back and
forth, relative to the background scene.
    The above discussion highlights the basic features of stereovision.

   • disparity: The two eyes see the world from two different 3D positions. As a result, the
     images seen by the two eyes are slightly different.

   • correspondence: To avoid “double vision,” the brain must match points in the left and right
     eyes’ images.

   • depth perception: The position difference of points in the left and right eye images depends
     on the 3D geometry of the scene. A vision system can use the position differences to infer
     depth.

    We could generalize the geometry by allowing for arbitrary camera translation and rotation
between the two cameras. This would follow a derivation similar to the one we saw for motion in
the last lecture. However, since we are mainly concerned with correspondence today, we keep the
geometry simple and only consider translation in X.


History: random dot stereograms and the correspondence problem
How does the eye/brain match corresponding points in the left and right images? Until the early
1960’s, it was commonly believed that this correspondence problem was “easy”. The eye/brain
found some familiar pattern in the left image and matched it to the same familiar pattern in the
right image, or vice-versa. This makes sense intuitively – you can see quite well with just one eye.
    In the 1960’s, engineers and psychologists became interested in exactly how this process worked
and started doing experiments with digital stereo image pairs. One important type of image that
was used was the random dot stereogram (RDS) which was invented by Bela Julesz at Bell Labs.
The RDS is a pair of images (a “stereo pair”), each of which is merely a random collection of white
and black dots. As such, it does not contain any familiar features. Although each image on its own
is a set of random dots, there is a relation between the random dots in the two images. The random
dots in the left eye’s image are related to the random dots in the right eye’s image by shifting a
patch of the left eye’s image relative to the right eye’s image.
    Julesz carried out many experiments with RDSs. These are described in detail in his classic
book from 1971 and in a paper¹. His results are very important in understanding how stereo vision
works. They strongly suggest that the human visual system (HVS) does not rely on matching familiar
monocular features to solve the correspondence problem. (Each image of a random dot stereogram
is maximally random; there are no familiar patterns in it, except with extremely small probability.)
    The construction of the random dot stereograms is illustrated in the figure above. First, one
image (say the left) is created by setting each pixel (“picture element”) randomly to either black
or white. Then, a copy of this image is made. Call this copy the right image. The right image is
then altered by taking a square patch² and shifting that patch horizontally by d pixels to the left,
writing over any pixel values. The pixels vacated by shifting the patch are filled in with random
values. This procedure yields four types of regions in the two images.

   • the shifted pixels (visible in both left and right images)

   • the pixels in the left image that were erased from the right image, because of the shift and
     write; (left only)

   • the pixels in the right image that were vacated by the shift (right only)

   • any other pixels in the two images (both left and right)
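The construction procedure above can be sketched in a few lines; the image size, patch location, and shift d below are illustrative choices:

```python
# Sketch of random dot stereogram construction, following the steps above.
# Image size, patch location, and shift d are illustrative choices.
import random

random.seed(0)
N = 64                         # each image is N x N pixels
d = 4                          # horizontal shift (disparity) in pixels
top, left, size = 20, 24, 16   # the square patch to shift

# 1. Left image: each pixel randomly black (0) or white (1).
left_img = [[random.randint(0, 1) for _ in range(N)] for _ in range(N)]

# 2. Right image starts as a copy of the left image.
right_img = [row[:] for row in left_img]

# 3. Shift the patch d pixels to the left in the right image, overwriting.
for y in range(top, top + size):
    for x in range(left, left + size):
        right_img[y][x - d] = left_img[y][x]

# 4. Fill the vacated pixels (right edge of the patch) with fresh random dots.
for y in range(top, top + size):
    for x in range(left + size - d, left + size):
        right_img[y][x] = random.randint(0, 1)
```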

To view a stereogram such as above, your left eye should look at the left image and your right eye
should look at the right image. (This is difficult to do without training.) If you do it correctly, then
you will see a square floating in front of a background.

Disparity space
Let’s now relate the above example to a 3D scene geometry that could give rise to it. Suppose
the optical axes of two cameras are parallel; the right camera is displaced from the left in the X
direction only. This allows us to assume the same projection plane Z = f for the two cameras.
    ¹ B. Julesz, “Binocular depth perception without familiarity cues”, Science, 145:356-362 (1964).
    ² The patch doesn’t have to be a square.


[Figure: the left eye image and right eye image of the stereogram, and the resulting perception — a square floating in front of the background, flanked by a left-eye-only region and a right-eye-only region.]

     The 3D scene is a small square in front of a background. This scene yields a disparity d between
the small square in the left and right images.
     Consider a single horizontal line y = y0 in the image projection plane such that this line cuts
across the displaced square. We wish to understand the disparity along this line.
     The figure above on the right represents this line in the two images using a coordinate system
(xl , xr ). For each 3D scene point that projects to this line, there is a unique xl and xr value.
Moreover, each depth value Z corresponds to a unique disparity value, since d = xl − xr = f Tx /Z.
     Notice that the set of lines that arrive at the left eye are vertical lines in the figure on the
right, and the set of lines that arrive at the right eye are horizontal lines in the figure on the right.
Similarly, each horizontal line in the figure on the left represents a line of constant depth (constant
disparity). Each diagonal line in the figure on the right represents a line of constant disparity
(constant depth).
     Make sure you understand where each point on the foreground square and each point on the
background appear in the left and right figures. (I talked through it in class.)
     Note that xl > xr for all points Z > 0. Thus, disparity d is positive.
     Because of the geometry of the projection, certain points on the background surface are visible
to one eye only; others are visible to both eyes; still others are visible to neither eye. Points that
are visible to one eye only are called monocular points.


[Figure: top view of the stereo geometry — left eye, right eye, projection plane, and the foreground square — marking the background regions visible to both eyes, to the left or right eye only, and to neither eye.]

Formulating the stereo correspondence problem
Now that we better understand the geometry of stereo, let’s try to formulate the stereo
correspondence problem as a computational problem. How could a vision system solve the
correspondence problem, say for a random dot stereogram?
    The correspondence problem was first addressed in a formal way by David Marr and Tomaso
Poggio³ in the mid 1970s. They proposed a specific algorithm that matches white pixels in the left
image to white pixels in the right image, and similarly black to black.
    Marr and Poggio try to derive an algorithm from first principles. They introduce two “con-
straints” in the 3D world that make the correspondence problem easier to solve.

   • C1 (uniqueness:) A given pixel in each image is typically the projection of one 3D point in
     the world, namely a point on an opaque surface.

   • C2 (continuity:) Surfaces in the world are typically continuous.

The two constraints often hold, but not always. C1 fails when we have a transparent surface such as
a window, since we have objects visible at different depths and at the same image position. C2 fails
when the scene contains small surfaces at different depths, for example, a bush or tree. In this case,
there will be lots of discontinuities in the depth function, and hence lots of discontinuities in
disparity. Anyhow, the two constraints suggest two rules that can be used to solve the correspondence
problem:

   • R1 (uniqueness:) For any position in either the left or right eye’s image, there exists exactly
     one disparity value;

   • R2 (continuity:) Disparity varies continuously across the left and right image.
    ³ D. Marr and T. Poggio, “Cooperative Computation of Stereo Disparity”, Science, 194:282-287 (1976).


Note that the second constraint does not hold everywhere in the above example, since there is a
depth discontinuity.
     Applying these constraints/rules, Marr and Poggio propose an iterative algorithm to solve the
correspondence problem. Assuming N pixels in each image row, they discretize the space (xl , xr )
using an N × N square grid. Each row/column in the grid corresponds to a single
position in the left/right image respectively. The goal is then to compute a binary labeling of the
nodes in this N × N grid. A 1-label means that the node corresponds to a visible surface at that
(xl , xr ) value, a 0-label otherwise.
     The rules R1 and R2 constrain the labeling solution:
   • The uniqueness rule R1 implies that at most one node can have the label 1 in any row and in
     any column.

   • The rule R2 implies that the disparity d should vary slowly as a function of xl and as a
     function of xr . Any fixed value of disparity d defines a line xl − xr = d in (xl , xr ) space. This
     is a diagonal line with slope 1. Thus, the continuity constraint implies that nodes with label
     1 should cluster along such diagonal lines.
Marr and Poggio proposed an iterative algorithm for computing such a labeling, given binary images
Il and Ir from one row in a random dot stereogram. Their paper is significant as it is one of the
first papers to propose a computational model of a problem solved in visual perception.
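A simplified sketch of a cooperative update in this spirit is below, for one row of a binary stereo pair. The neighborhood sizes, weights, and threshold are hand-picked for illustration and are not the values from the 1976 paper; only the overall structure (excitation along constant-disparity diagonals for R2, inhibition along rows and columns for R1) follows the description above.

```python
# Simplified Marr-Poggio style cooperative update on the (xl, xr) grid for
# one row of a binary stereo pair.  Weights, neighborhood sizes, and the
# threshold are illustrative choices, not the values from the 1976 paper.
import random
from collections import Counter

random.seed(1)
N = 30
true_d = 3  # the disparity hidden in the data

# One image row and its shifted counterpart (a one-row random dot stereogram).
Il = [random.randint(0, 1) for _ in range(N)]
Ir = [Il[(x + true_d) % N] for x in range(N)]

# Initial labeling: node (xl, xr) is 1 wherever the pixel values match.
C = [[1 if Il[xl] == Ir[xr] else 0 for xr in range(N)] for xl in range(N)]

def step(C):
    """One cooperative update: support from constant-disparity diagonal
    neighbors (rule R2), inhibition along the row and column (rule R1)."""
    new = [[0] * N for _ in range(N)]
    for xl in range(N):
        for xr in range(N):
            # Excitation: active neighbors along the diagonal xl - xr = d.
            exc = sum(C[xl + k][xr + k] for k in (-2, -1, 1, 2)
                      if 0 <= xl + k < N and 0 <= xr + k < N)
            # Inhibition: active nearby nodes in the same row or column.
            inh = sum(C[xl][xr + j] for j in (-2, -1, 1, 2) if 0 <= xr + j < N) \
                + sum(C[xl + i][xr] for i in (-2, -1, 1, 2) if 0 <= xl + i < N)
            # Threshold the net input (hand-picked weights).
            new[xl][xr] = 1 if exc - 0.5 * inh + 2.5 * C[xl][xr] >= 2.5 else 0
    return new

for _ in range(10):
    C = step(C)

# Surviving nodes should cluster on the diagonal xl - xr = true_d.
disparity_votes = Counter(xl - xr for xl in range(N)
                          for xr in range(N) if C[xl][xr])
```

The update is deliberately crude; the point is only that uniqueness (inhibition) and continuity (diagonal support) together pull the labeling toward the correct constant-disparity diagonal.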

How binocular stereo depends on depth
It is often claimed that stereovision is good for near objects, but poor for distant objects. What
does this mean? For the visual system to use stereovision to perceive the relative depth of two
points, it needs to detect a change in disparity ∆d between these two points. Similarly, for the
visual system to distinguish a surface that has constant depth from one that does not have constant
depth, it needs to detect a change in disparity ∆d along the surface.
    Consider two points, one at depth Z and the other at depth Z + ∆Z. From d = f Tx /Z, we
approximate⁴:
                              ∆d = Tx f ∆(1/Z) ≈ Tx f (−1/Z²)(∆Z)

                              ∆d ≈ −(Tx f /Z)(∆Z/Z).
    How do we interpret this? If we consider two nearby points in the scene, we have a difference
in depth and a difference in disparity. The difference in disparity is proportional to two terms: the
ratio Tx f /Z, and the relative change ∆Z/Z in depth. The first term, f Tx /Z, is the disparity itself.
The second term, ∆Z/Z, is the relative depth of the two points.
    For example, suppose we are looking at a ball and comparing the nearest point on the ball with
a point near the edge of the ball (as seen by the left eye; see figure below). The term ∆Z/Z is equal to
    ⁴ If g(Z) = 1/Z, its derivative is g′(Z) = −1/Z² and so ∆g(Z) ≈ g′(Z)∆Z.


the (half-)angle subtended by the ball on the image plane. (Actually, it is the tangent of this angle,
but for small angles, tan θ ≈ θ.) Notice that this relative depth is the same for the ball as it is for
any sphere that subtends the same angle, for example the moon.
    Thus, two spheres that subtend the same angle but that are at different depths have different
disparity ranges ∆d. If one of the spheres is very far away (the moon), the absolute depth Z
is large and so the disparity range is near zero. Such a sphere will look flat to a human observer,
since ∆d ≈ 0. By contrast, a nearby sphere that subtends the same visual angle as the moon will
not look flat, since the change in disparity along the surface is large enough that it can be detected
by the visual system.
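To put rough numbers on this claim, the following sketch compares the disparity range of a nearby ball with that of the moon; the baseline, focal length, depths, and relative depth ∆Z/Z are assumed values, not from the notes:

```python
# Compare the disparity range of two spheres that subtend the same angle.
# All physical values are assumed for illustration: a 0.065 m interocular
# baseline, a 0.017 m focal length, and a relative depth dZ/Z of 0.01.

def disparity_range(Tx, f, Z, rel_depth):
    """Magnitude of the disparity change: |dd| = (Tx * f / Z) * (dZ / Z)."""
    return (Tx * f / Z) * rel_depth

ball = disparity_range(0.065, 0.017, 0.5, 0.01)    # a ball half a meter away
moon = disparity_range(0.065, 0.017, 3.8e8, 0.01)  # the moon

# The nearby ball's disparity range is many orders of magnitude larger, so
# only the ball's surface curvature is detectable by stereo.
print(ball, moon)
```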




Vergence and disparity
I have set up the mathematics of stereo such that the two eyes are pointing in the same direction,
namely perpendicular to the vector that separates the two eyes. This is not the most general
situation. In particular, in human vision, the eyes rotate and the optical axes (the depth direction)
of the two eyes can converge on an arbitrary point in the image. The geometry of stereo gets a
bit more complicated in the case that the eyes converge. I will delay presenting the details of this
geometry until much later. For now, let’s keep the discussion qualitative.
    Consider the 2D plane that is defined by three points: the two eyes, and the point in space
where the optical axes of the eyes converge. This 3D point is called the vergence point. It is the
point the eyes are looking at. The depth direction Z is not the same in the left and right eyes when
the eyes are converging on a point that is a finite distance away. Thus, if we keep the definition of
disparity as before, d = xl − xr , then the disparity value will be different from the case when the
optical axes of the eyes were parallel. Roughly speaking, as the eyes converge, the xl value of any
scene point will decrease and the xr value will increase, and so the disparity will decrease.
    Here are a few common facts/definitions you should know about:
   • The disparity of the vergence point is 0. The vergence point is not the only point with zero
     disparity. Define the horopter to be the locus of points with zero disparity.
   • The horopter is roughly identical to the Vieth-Müller circle, which is the unique circle
     containing the two eyes and the vergence point.
   • Any point P has crossed disparity if dP > 0. This happens if P is closer than the
     horopter. Crossed disparity means that you would have to “cross your eyes” in order to fixate
     on the object.


   • P has uncrossed disparity if dP < 0. This happens if P is farther than the horopter.

When the eyes are parallel (verged at infinity), disparity is always crossed or zero. But when the
eyes converge, disparity can be crossed or uncrossed.
[Figure: the two eyes verging on a point; the region nearer than the horopter has disparity > 0 (crossed), and the region beyond it has disparity < 0 (uncrossed).]

As an aside, you may be interested to know that many people are “stereoblind” or partly
“stereoblind”. This doesn’t mean that they can see with one eye only. Rather, it means they cannot
get any depth information from stereovision. Surprisingly, some people are stereoblind for crossed
disparities but have stereo vision for uncrossed disparities, and vice-versa.
    Stereoblindness is analogous to color blindness. As I will briefly discuss later in the course, color
blindness arises when a person is missing certain photoabsorbing pigments in their eyes. There are
different types of color blindness, depending on which pigments you are missing. Ten percent of
human males are missing one of the three pigments, and hence are partly color blind. Color blindness
doesn’t mean that you cannot see color. Rather, it means that you see only a range of colors.

