# Stereo disparity
Lecture 1.2    COMP 646                                                                 Jan. 10, 2008

Stereo disparity
Let’s now reinterpret the model we developed in lecture 1.1. Rather than moving the camera over
time, suppose we have two cameras that are in a ﬁxed position relative to each other. Assume
the optical axes of the two cameras are parallel, i.e. the cameras have the same Z direction, but
that one camera is displaced to the right (which we’ll call the positive x direction) of the other by
a distance Tx . This distance is sometimes called the baseline of the camera pair. In terms of the
previous model, the two cameras correspond to t = 0 and t = 1, and Ty = Tz = 0.
A 3D point with coordinates (X0 , Y0, Z0 ) in the left camera’s coordinate system would have
coordinates (X0 − Tx , Y0 , Z0 ) in the right camera’s coordinate system. As such, this 3D point would
project to a diﬀerent x value but not to a diﬀerent y value in the left and right images. The
diﬀerence in x position is called the binocular disparity,
d ≡ xl − xr = (X0/Z0) f − ((X0 − Tx)/Z0) f = (Tx/Z0) f.
From this equation, we can make two key observations:

1. A 3D point in the world projects to the same row (y value) in the left and right image plane.
This implies that to match corresponding points in the two images, the vision system only
needs to search within corresponding rows in the two images.

2. Once the visual system has found corresponding points in the two images, it can use the
disparity to calculate the depth of the scene point, assuming Tx and f are known.
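As a quick sanity check on observation 2, depth can be recovered by inverting d = Tx f /Z. Here is a minimal sketch; the baseline, focal length, and disparity values below are made up for illustration:

```python
# Depth from binocular disparity: d = Tx * f / Z  =>  Z = Tx * f / d.
# All numeric values here are illustrative, not from the lecture.

def depth_from_disparity(Tx, f, d):
    """Recover depth Z from disparity d, baseline Tx, and focal length f."""
    return Tx * f / d

Tx = 0.1   # baseline in meters
f = 500.0  # focal length in pixels
d = 10.0   # measured disparity in pixels

print(depth_from_disparity(Tx, f, d))  # 5.0 meters
```

Note that halving the depth doubles the disparity, which is one reason stereo is more informative for near objects than for far ones.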

The ﬁrst problem is often called the correspondence problem. This is the problem we discuss
today.
To develop an intuition, hold up your finger in front of your face and look at something far away behind the finger, so that the two eyes are pointing in roughly the same direction. You should notice that you see two copies of your finger (double vision). If this is not obvious, then alternately close your left and right eyes, back and forth. You'll notice that the image of your finger jumps back and forth relative to the background scene.
The above discussion highlights the basic features of stereovision.

• disparity: The two eyes see the world from two diﬀerent 3D positions. As a result, the
images seen by the two eyes are slightly diﬀerent.

• correspondence: To avoid "double vision," the brain must match points in the left and right eyes' images.

• depth perception: The position diﬀerence of points in the left and right eye images depends
on the 3D geometry of the scene. A vision system can use the position diﬀerences to infer
depth.

We could generalize the geometry by allowing for arbitrary camera translation and rotation between the two cameras. This would follow a derivation similar to what we saw with motion in the last lecture. However, since we are mainly concerned with correspondence today, we keep the geometry simple and only consider translation in X.


History: random dot stereograms and the correspondence problem
How does the eye/brain match corresponding points in the left and right images? Until the early 1960s, it was commonly believed that this correspondence problem was "easy": the eye/brain found some familiar pattern in the left image and matched it to the same familiar pattern in the right image, or vice-versa. This makes sense intuitively – you can see quite well with just one eye.
In the 1960’s, engineers and psychologists became interested in exactly how this process worked
and started doing experiments with digital stereo image pairs. One important type of image that
was used was the random dot stereogram (RDS) which was invented by Bela Julesz at Bell Labs.
The RDS is a pair of images (a “stereo pair”), each of which is merely a random collection of white
and dark dots. As such, it does not contain any familiar features. Although each image on its own
is a set of random dots, there is a relation between the random dots in the two images. The random
dots in the left eye’s image are related to the random dots in the right eye’s image by shifting a
patch of the left eye’s image relative to the right eye’s image.
Julesz carried out many experiments with RDSs. These are described in detail in his classic book from 1971 and in a paper [1]. His results are very important for understanding how stereo vision works. They strongly suggest that the human visual system (HVS) does not rely on matching familiar monocular features to solve the correspondence problem. (Each image of a random dot stereogram is maximally random; there are no familiar patterns in there, except with extremely small probability.)
The construction of the random dot stereograms is illustrated in the figure above. First, one image (say the left) is created by setting each pixel ("picture element") randomly to either black or white. Then, a copy of this image is made. Call this copy the right image. The right image is then altered by taking a square patch [2] and shifting that patch horizontally by d pixels to the left, writing over any pixel values. The pixels vacated by shifting the patch are filled in with random values. This procedure yields four types of regions in the two images.

• the shifted pixels (visible in both left and right images)

• the pixels in the left image that were erased from the right image, because of the shift and
write; (left only)

• the pixels in the right image that were vacated by the shift (right only)

• any other pixels in the two images (both left and right)
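The shift-and-refill construction above can be sketched in a few lines. This is a minimal illustration, not Julesz's original code; the image size, patch size, and shift d are arbitrary choices:

```python
import random

def make_rds(n=64, patch=20, d=4, seed=0):
    """Build a random dot stereogram as two n-by-n binary pixel grids.

    The right image starts as a copy of the left; a central square
    patch is then shifted d pixels to the left (overwriting what was
    there), and the vacated pixels are refilled with fresh random dots.
    """
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
    right = [row[:] for row in left]      # exact copy of the left image
    top = (n - patch) // 2                # top-left corner of the patch
    for y in range(top, top + patch):
        # shift the patch d pixels to the left in the right image
        for x in range(top, top + patch):
            right[y][x - d] = left[y][x]
        # refill the d vacated columns with new random values
        for x in range(top + patch - d, top + patch):
            right[y][x] = rng.randint(0, 1)
    return left, right

left, right = make_rds()
```

Viewed with the left eye on the left image and the right eye on the right image, the patch has positive disparity d and so appears to float in front of the background.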

To view a stereogram such as above, your left eye should look at the left image and your right eye
should look at the right image. (This is diﬃcult to do without training.) If you do it correctly, then
you will see a square ﬂoating in front of a background.

Disparity space
Let’s now relate the above example to a 3D scene geometry that could give rise to it. Suppose
the optical axes of two cameras are parallel; the right camera is displaced from the left in the X
direction only. This allows us to assume the same projection plane Z = f for the two cameras.
[1] B. Julesz, "Binocular depth perception without familiarity cues", Science, 145:356–362 (1964).
[2] It doesn't have to be a square.


[Figure: the left and right eye images of the random dot stereogram, each showing a square against a background, and the resulting perception of a square floating in front of the background.]

The 3D scene is a small square in front of a background. This scene yields a disparity d between
the small square in the left and right images.
Consider a single horizontal line y = y0 in the image projection plane such that this line cuts
across the displaced square. We wish to understand the disparity along this line.
The ﬁgure above on the right represents this line in the two images using a coordinate system
(xl , xr ). For each 3D scene point that projects to this line, there is a unique xl and xr value.
Moreover, each depth value Z corresponds to a unique disparity value, since d = xl − xr = f Tx /Z.
Notice that the set of lines that arrive at the left eye are vertical lines in the ﬁgure on the
right, and the set of lines that arrive at the right eye are horizontal lines in the ﬁgure on the right.
Similarly, each horizontal line in the ﬁgure on the left represents a line of constant depth (constant
disparity). Each diagonal line in the ﬁgure on the right represents a line of constant disparity
(constant depth).
Make sure you understand where each point on the foreground square and each point on the
background appear in the left and right ﬁgures. (I talked through it in class.)
Note that xl > xr for all points Z > 0. Thus, disparity d is positive.
Because of the geometry of the projection, certain points on the background surface are visible
to one eye only; others are visible to both eyes; still others are visible to neither eye. Points that
are visible to one eye only are called monocular points.


[Figure: the disparity space (xl, xr) for this scene, showing the projection plane, the left and right eyes, the foreground square and background, and the regions visible to both eyes, to the left eye only, to the right eye only, and to neither eye.]

Formulating the stereo correspondence problem
Now that we better understand the geometry of stereo, let's try to formulate the stereo correspondence problem as a computational problem. How could a vision system solve the correspondence problem, say for a random dot stereogram?
The correspondence problem was first addressed in a formal way by David Marr and Tomaso Poggio [3] in the mid 1970s. They proposed a specific algorithm that matches white pixels in the left image to white pixels in the right image, and similarly black to black.
Marr and Poggio tried to derive an algorithm from first principles. They introduced two "constraints" on the 3D world that make the correspondence problem easier to solve.

• C1 (uniqueness:) A given pixel in each image is typically the projection of one 3D point in
the world, namely a point on an opaque surface.

• C2 (continuity:) Surfaces in the world are typically continuous.

The two constraints often hold, but not always. C1 fails when we have a transparent surface such as a window, since we then have objects visible at different depths at the same image position. C2 fails when the scene contains many small surfaces at different depths, for example, a bush or tree. In this case, there will be lots of discontinuities in the depth function, and hence lots of discontinuities in disparity. Anyhow, the two constraints suggest two rules that can be used to solve the correspondence problem:

• R1 (uniqueness:) For any position in either the left or right eye’s image, there exists exactly
one disparity value;

• R2 (continuity:) Disparity varies continuously across the left and right image.
[3] D. Marr and T. Poggio, "Cooperative Computation of Stereo Disparity", Science, 194:282–287 (1976).


Note that the second constraint does not hold everywhere in the above example, since there is a
depth discontinuity.
Using these constraints/rules, they proposed an iterative algorithm to solve the correspondence problem. Assuming N pixels in each image, they discretize
the space (xl , xr ) using an N × N square grid. Each row/column in the grid corresponds to a single
position in the left/right image respectively. The goal is then to compute a binary labeling of the
nodes in this N × N grid. A 1-label means that the node corresponds to a visible surface at that
(xl , xr ) value, a 0-label otherwise.
The rules R1 and R2 constrain the labeling solution:
• The uniqueness rule R1 implies that at most one node can have the label 1 in any row and in
any column.

• The rule R2 implies that the disparity d should vary slowly as a function of xl and as a
function of xr . Any ﬁxed value of disparity d deﬁnes a line xl − xr = d in (xl , xr ) space. This
is a diagonal line with slope 1. Thus, the continuity constraint implies that nodes with label
1 should cluster along such diagonal lines.
Marr and Poggio proposed an iterative algorithm for computing such a labeling, given binary images
Il and Ir from one row in a random dot stereogram. Their paper is signiﬁcant as it is one of the
ﬁrst papers to propose a computational model of a problem solved in visual perception.
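To make the idea concrete, here is a heavily simplified sketch of one cooperative update on the N × N grid. The neighborhood radius, inhibition weight, and threshold below are invented for illustration; Marr and Poggio's actual update rule differs in its details:

```python
def cooperative_step(C, match, n, radius=2, inhib=2.0, theta=1.5):
    """One simplified cooperative iteration on the (xl, xr) grid.

    C[l][r] = 1 means 'a visible surface point projects to xl = l in
    the left image and xr = r in the right image'.  Active nodes at
    the same disparity l - r nearby lend support (continuity, R2);
    other active nodes in the same row or column inhibit (uniqueness,
    R1).  match[l][r] = 1 where the pixel values actually agree.
    """
    new = [[0] * n for _ in range(n)]
    for l in range(n):
        for r in range(n):
            # excitatory support along the constant-disparity diagonal
            excite = sum(
                C[l + k][r + k]
                for k in range(-radius, radius + 1)
                if k != 0 and 0 <= l + k < n and 0 <= r + k < n
            )
            # inhibition from competing matches in the same row/column
            inhibit = (sum(C[l]) - C[l][r]) + \
                      (sum(C[i][r] for i in range(n)) - C[l][r])
            score = excite - inhib * inhibit + match[l][r]
            new[l][r] = 1 if score >= theta else 0
    return new
```

Iterating this update lets mutually consistent matches along a constant-disparity diagonal reinforce each other, while isolated spurious matches are suppressed by the uniqueness inhibition.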

How binocular stereo depends on depth
It is often claimed that stereovision is good for near objects, but poor for distant objects. What
does this mean? For the visual system to use stereovision to perceive the relative depth of two
points, it needs to detect a change in disparity ∆d between these two points. Similarly, for the
visual system to distinguish a surface that has constant depth from one that does not have constant
depth, it needs to detect a change in disparity ∆d along the surface.
Consider two points, one at depth Z and the other at depth Z + ∆Z. From d = f Tx /Z, we approximate [4]:

∆d = Tx f ∆(1/Z) ≈ Tx f (−1/Z²) ∆Z

or

∆d ≈ −(Tx f /Z)(∆Z/Z).
How do we interpret this? If we consider two nearby points in the scene, we have a difference in depth and a difference in disparity. The difference in disparity is proportional to two terms: the ratio Tx f /Z, and the relative change ∆Z/Z in depth. The first term Tx f /Z is the disparity itself. The second term ∆Z/Z is the relative depth of the two points.
For example, suppose we are looking at a ball and comparing the nearest point on the ball with a point near the edge of the ball (as seen by the left eye; see figure below). The term ∆Z/Z is equal to the (half-)angle subtended by the ball on the image plane. (Actually, it is the tangent of this angle, but for small angles, tan θ ≈ θ.) Notice that this relative depth is the same for the ball as it is for any sphere that subtends the same angle, for example the moon.

[4] If g(Z) = 1/Z, its derivative is g′(Z) = −1/Z², and so ∆g(Z) ≈ g′(Z)∆Z.
Thus, two spheres that subtend the same angle but that are at different depths have different disparity ranges ∆d. If one of the spheres is very far away (the moon), the absolute depth Z is large and so the disparity range is near zero. Such a sphere will look flat to a human observer, since Tx f /Z ≈ 0. By contrast, a nearby sphere that subtends the same visual angle as the moon will not look flat, since the change in disparity along the surface is large enough that it can be detected by the visual system.

[Figure: a ball at depth Z viewed by two eyes separated by baseline Tx; ∆Z is the depth difference between the nearest point on the ball and a point near its edge.]
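To put rough numbers on ∆d ≈ −(Tx f /Z)(∆Z/Z), the sketch below compares a nearby ball with a distant sphere subtending the same visual angle. All numeric values are invented for illustration:

```python
def disparity_range(Tx, f, Z, dZ):
    """Approximate disparity change between points at depths Z and
    Z + dZ, using dd ~ -(Tx * f / Z) * (dZ / Z)."""
    return -(Tx * f / Z) * (dZ / Z)

Tx, f = 0.065, 1000.0  # eye separation (m) and focal length (pixels)

# A ball 1 m away whose edge lies 0.05 m behind its nearest point:
ball = disparity_range(Tx, f, Z=1.0, dZ=0.05)

# A sphere at 1000 m subtending the same angle (so dZ/Z is unchanged):
far = disparity_range(Tx, f, Z=1000.0, dZ=50.0)

print(ball, far)  # the far sphere's disparity range is 1000x smaller
```

Since ∆Z/Z is fixed by the subtended angle, the disparity range scales as 1/Z: the faraway sphere looks flat while the nearby one does not.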

Vergence and disparity
I have set up the mathematics of stereo such that the two eyes are pointing in the same direction,
namely perpendicular to the vector that separates the two eyes. This is not the most general
situation. In particular, in human vision, the eyes rotate and the optical axes (the depth direction)
of the two eyes can converge on an arbitrary point in the image. The geometry of stereo gets a
bit more complicated in the case that the eyes converge. I will delay presenting the details of this
geometry until much later. For now, let’s keep the discussion qualitative.
Consider the 2D plane that is defined by three points: the two eyes, and the point in space where the optical axes of the eyes converge. This 3D point is called the vergence point: it is the point the eyes are looking at. The depth direction Z is not the same in the left and right eyes when
the eyes are converging on a point that is a ﬁnite distance away. Thus, if we keep the deﬁnition of
disparity as before, d = xl − xr , then the disparity value will be diﬀerent from the case when the
optical axes of the eyes were parallel. Roughly speaking, as the eyes converge, the xl value of any
scene point will decrease and the xr value will increase, and so the disparity will decrease.
Here are a few common facts/deﬁnitions you should know about:
• The disparity of the vergence point is 0. The vergence point is not the only point with zero disparity. Define the horopter to be the locus of points with zero disparity.

• The horopter is roughly identical to the Vieth-Müller circle, which is the unique circle containing the two eyes and the vergence point.
• Any point P has crossed disparity if dP > 0. This happens if P is closer than the horopter. Crossed disparity means that you would have to "cross your eyes" in order to fixate on the object.


• P has uncrossed disparity if dP < 0. This happens if P is farther than the horopter.

When the eyes are parallel (verged at inﬁnity), disparity is always crossed or zero. But when the
eyes converge, disparity can be crossed or uncrossed.
[Figure: the two eyes converging on the vergence point; points nearer than the horopter have disparity > 0 (crossed), and points farther have disparity < 0 (uncrossed).]

Stereoblindness
As an aside, you may be interested to know that many people are "stereoblind" or partly "stereoblind". This doesn't mean that they can see with one eye only. Rather, it means they cannot get any depth information from stereovision. Surprisingly, some people are stereoblind for crossed disparities but have stereo vision for uncrossed disparities, and vice-versa.
Stereoblindness is analogous to color blindness. As I will briefly discuss later in the course, color blindness arises when a person is missing certain photoabsorbing pigments in their eyes. There are different types of color blindness, depending on which pigments are missing. Ten percent of human males are missing one of the three pigments, and hence are partly color blind. Color blindness doesn't mean that you cannot see color. Rather, it means that you see only a reduced range of colors.
