Stereo Vision for Mobile Robotics
Marti Gaëtan, Micro-Engineering Laboratory, Swiss Federal Institute of Technology of Lausanne.
The Virtual Reality and Advanced Interfaces (VRAI) group¹ is currently investigating stereo vision for mobile robots. This paper provides an overview of both computational and biological approaches to stereo vision, stereo image processing and robot navigation.

¹ VRAI Group, IMT-DMT / EPFL, Dr. C. Baur.

1 Introduction

Two eyes or cameras looking at the same scene from different perspectives provide a means for determining three-dimensional shape and position. Scientific investigation of this effect (called variously stereo vision, stereopsis or binocular single vision) has a rich history in psychology, biology and, more recently, in the computational modeling of perception. Stereo is an important method for machine perception because it leads to direct depth measurements. Additionally, unlike monocular techniques, stereo does not infer depth from weak or unverifiable photometric and statistical assumptions, nor does it require specific detailed object models. Once stereo images have been brought into point-to-point correspondence, recovering depth by triangulation is straightforward.

Current range imagers have achieved real-time or near real-time performance on images of modest size. For example, stereo algorithms on standard hardware are capable of returning dense 128 x 128 range images at 10 Hz, while scanning laser range-finders can operate at 2 Hz on 256 x 256 images. To take advantage of these devices, researchers have proposed numerous methods for extracting 3-D information from range images. These methods operate either in 3-D Cartesian space (volumetric representations) or in a 2.5-D range image space (contour map methods). Contour map methods are particularly attractive for computation-bound applications such as mobile robots.

We begin with a discussion of the geometric basis and a computational model for stereo vision. Next, we briefly describe biological aspects of depth perception and a contour map method for depth processing. Finally, we present an obstacle avoidance technique for mobile robots using real-time stereo vision.

2 Stereo vision

The key problem in stereo vision is to find corresponding points in stereo images. Corresponding points are the projections of a single 3D point in the different image spaces. The difference in the position of corresponding points in their respective images is called disparity (see Figure 1). Disparity is a function of both the position of the 3D scene point and of the position, orientation, and physical characteristics of the stereo devices (e.g. cameras).

Figure 1 (a) A system with two cameras. The focal points are Fl and Fr; the image planes are Il and Ir. A point P in the 3D scene is projected onto Pl in the left image and onto Pr in the right image. (b) Cyclopean view: the disparity is the difference in the position of the projections of the point P onto the two stereo image planes.

In addition to providing the function that maps pairs of corresponding image points onto scene points, a camera model can be used to constrain the search for a corresponding image point to one dimension. Any point in the 3D world space, together with the centers of projection of the two camera systems, defines an epipolar plane. The intersection of such a plane with an image plane is called an epipolar line (see Figure 2). Every point of a given epipolar line must correspond to a single point on the corresponding epipolar line. The search for a match of a point in the first image may therefore be reduced to a one-dimensional neighborhood in the second image plane (as opposed to a 2D neighborhood).
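This one-dimensional search, followed by triangulation, can be sketched as follows. The scanline values, window size, focal length and baseline below are hypothetical, and a simple sum-of-absolute-differences score stands in for the correlation criteria used in practice; the convention is that a left-image feature at x appears at x - d in the right image:

```python
# Sketch of the 1-D correspondence search along an epipolar line, plus
# depth recovery by triangulation. All numeric values are hypothetical;
# SAD is a stand-in for a correlation-based matching criterion.

def match_disparity(left, right, x, half=1, max_disp=8):
    """Integer disparity d minimizing the SAD between a window centered
    at x in the left scanline and at x - d in the right scanline."""
    best_d, best_score = 0, float("inf")
    for d in range(max_disp + 1):
        xr = x - d
        if xr - half < 0 or x + half >= len(left) or xr + half >= len(right):
            continue  # window would fall outside one of the scanlines
        sad = sum(abs(left[x + i] - right[xr + i])
                  for i in range(-half, half + 1))
        if sad < best_score:
            best_d, best_score = d, sad
    return best_d

def depth(f, b, d):
    """Triangulated depth Z = f * b / d for a rectified pair
    (f: focal length in pixels, b: baseline in meters)."""
    return f * b / d if d > 0 else float("inf")

# Hypothetical scanlines: the bright pattern at x = 6 in the left line
# appears at x = 4 in the right line, i.e. a disparity of 2 pixels.
left  = [10, 10, 10, 10, 10, 40, 90, 40, 10, 10]
right = [10, 10, 10, 40, 90, 40, 10, 10, 10, 10]
d = match_disparity(left, right, x=6)   # -> 2
z = depth(f=500.0, b=0.1, d=d)          # -> 25.0 m with these values
```

Note the hyperbolic relation between depth and disparity: halving the disparity doubles the recovered depth, so range resolution degrades with distance.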
Figure 2 Epipolar lines and epipolar planes. The plane through a scene point and the two focal points Fl and Fr (joined by the stereo baseline) cuts the image planes Il and Ir in a pair of epipolar lines.

When the stereo cameras are oriented such that there is a known horizontal displacement between them, disparity can only occur in the horizontal direction and the stereo images are said to be in correspondence. When a stereo pair is in correspondence, the epipolar lines are coincident with the horizontal scan lines of the digitized images.

Ideally, one would like to find the correspondence of every individual pixel in both images of a stereo pair. However, the information content in the intensity value of a single pixel is too low for unambiguous matching. In practice, continuous areas of image intensity are the basic units that are matched. This approach (called area matching) usually involves some form of cross-correlation to establish correspondences. The main problem in matching is to find an effective definition of what we call a valid correlation.

Correlation scores are computed by comparing a fixed window in the first image against a shifting window in the second. The second window is moved in the second image by integer increments along the corresponding epipolar line, and a correlation score curve is generated for integer disparity values. The measured disparity can then be taken to be the one that provides the largest peak. To quantify the similarity between two correlation windows, we must choose, among many different criteria, one that produces reliable results in a minimum computation time. We denote by I1(x,y) and I2(x,y) the intensity values at pixel (x,y). The correlation window has dimensions (2n+1) x (2m+1). Therefore, the indexes which appear in the formula below vary between -n and +n for the i-index and between -m and +m for the j-index:

                   Σi,j [ I1(x+i, y+j) · I2(x+δ+i, y+j) ]
C1(x, y, δ) = --------------------------------------------------------
              sqrt( Σi,j I1²(x+i, y+j) · Σi,j I2²(x+δ+i, y+j) )

where δ denotes the candidate disparity.

It is important to know if a match is reliable or not. The form of the correlation curve (for example that of C1) can be used to decide whether the probability that the match is an error is high or not. Indeed, errors occur when a wrong peak slightly higher than the right one is chosen. Thus, if in the correlation curve we find several peaks with approximately the same height, the risk of choosing the wrong one increases, especially if the image is noisy. However, a confidence coefficient, proportional to the difference in height between the most important peaks, may be defined. Other important information may also be extracted from the correlation curve.

3 Human depth perception

For human beings, correlation (as described in the previous section) is only a local mechanism of stereoscopic vision [Bruce and Green 1990]. (See [Bruce and Green 1990] for more details.) However, imagine the following experiment.

Figure 3 The stereoscopic fusion false target problem: the right eye is presented with the points D1, D2, D3 and the left eye with G1, G2, G3; the grey points mark the possible interlaces.

A stereoscopic system displays the set of points D1, D2, D3 for the right eye (see Figure 3) and G1, G2, G3 for the left one. An observer should be able to see any interlace of points (grey points in the light grey area) but, instead, only the dark ones are perceived. This experiment shows that a global mechanism, based on criteria other than local correlation, is used. Among these criteria, the following are taken into account:

- A principle of correlation based on the contours of the image (see Figure 4a);
- A mechanism of cognitive interpretation which has, in some cases, more priority than the local mechanism of stereo vision [Marr 1982];
- A mechanism of pictorial clues of depth: relative size, relative height, perspective, shade, "fog" effect and interposition (see Figure 4b);
- A principle of dynamic clues, such as motion parallax;
- Other mechanisms, such as correlation of frequency-filtered images [Poggio and Poggio 1984].

Figure 4 (a) "Illusory" contours defining a square, giving the impression that this shape is placed in front of 4 circles; (b) the interposition principle.

This list is not exhaustive but presents the most significant criteria belonging to the global system of human stereoscopic vision³.

³ This is quite difficult to obtain in practice.

In summary, the human stereo system uses a number of interesting methods which work together to recover depth. On one hand, this system is very powerful because, even using one eye, it is possible to perceive depth. On the other hand, it is also very subjective, because a trompe-l'œil can fool our perceptive system.

In the next section, we will focus on an example of a computational model for range (disparity) image parsing.

4 Contour map method

For many applications, working in 3D Euclidean space turns out to be unnecessary and difficult to manage. To reduce the amount of data which has to be processed, we introduce a method of quantizing volumes that allows us to manipulate range images (see Figure 5a) directly, without having to first transform to 3-D space. The method is similar to the use of contour maps to represent elevations; hence, we call it the contour method [Chauvin, Marti and Konolige 1997]. A contour represents the elevation at a particular height (see Figure 5b); all terrain between one contour line and the next is at an elevation between those represented by the contour lines. Contour creation can be visualized as a set of planes parallel to the ground at specified heights, intersecting the terrain. These cutting planes induce a quantization of the 3-D space based on elevation. Our approach uses this basic idea, and consists of the following steps:

1. Constructing a set of volumes in 3-D space using a set of cutting surfaces (not necessarily planar);
2. Projecting the cutting surfaces back to the range image to induce a quantization of the range data (see Figures 5c and 5d);
3. Using the quantized range image to construct terrain models or other abstractions.

In cases where the desired segmentation is relative to the sensor viewpoint, the first two steps can be achieved off-line, leading to significant computational savings, especially when the cutting surfaces are complicated. In addition, step 3 can often be performed in the range image space, which is much more efficient than working in the volumetric space. Also, in contrast to grid-based approaches, the cutting surfaces need not be regular, and can be sized to take into account the precision and error characteristics of the range data.

Figure 5 (a) Range image where light (green) pixels represent points closer than dark ones; (b) elevation map composed of the superposition of 8 contours (a single contour is represented with a special pattern); (c) and (d) projection of the cutting surfaces back into the range image for the special contour of image (b).

5 Mobile robot obstacle avoidance

The contour method is well suited for use with the vector field histogram (VFH) algorithm for mobile robot obstacle avoidance [Borenstein and Koren 1991]. Originally developed with sonar sensors, the method used:
1. A regular 2-D histogram grid in plan view, holding the results of sonar sensor readings around the robot; the value of each grid point represents the number of sonar readings that indicated an object within the point (see Figure 7a);
2. A polar histogram computed from the histogram grid, with k regular angular sectors instead of a rectilinear grid; the value hk of each sector in the polar histogram represents the obstacle density in that direction (see Figure 8);
3. Steering and velocity values extracted from the polar histogram.

The key step is calculating the histogram value hk for each sector. Roughly speaking, this value represents the probability of finding an obstacle close to the robot in the direction of sector k. The simplest idea is to use a single cutting surface at an elevation over the ground plane sufficient to constitute an obstacle for the robot. Any points in the resulting contour (see Figure 6b) are obstacles, and we can use the number of such points in a column and their distance to determine a histogram value.

Figure 6 Representation of an obstacle (a) in Cartesian space, (b) in the contour image space.

The details of the weighting scheme we use are not critical; we expect almost any reasonable method that combines distance and number of points to work reasonably well. In our implementation we used a stereo system with disparity as the range metric, and let each contour cell m contribute ln[r(m)] to its sector value, where r(m) is the range reading (here, the disparity) of the cell. This measure compensates for the fact that the disparity increases hyperbolically as an object gets closer.

In range images, each column of the image represents a polar sector whose angular width is determined by the camera parameters (see Figure 7b). We let the k sectors correspond to the columns of the range image. Thus, we can construct the polar histogram directly from the contour representation, without having to convert to Cartesian space.

Figure 7 Two obstacles (a) in the polar grid, (b) in the contour image grid.

Figure 8 Polar histogram corresponding to the obstacles of Figure 7.

Figure 9 Experimental results. (a) Image of the scene; (b) corresponding disparity image; (c) from left to right: contour, object detection and polar histogram (the vertical line corresponds to the direction followed by the robot and the horizontal line, its speed).
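The single-contour histogram construction described above can be sketched as follows, assuming (hypothetically) a tiny disparity image stored as a list of columns, an obstacle threshold playing the role of the cutting surface, and r(m) taken as the disparity of contour cell m:

```python
import math

# Sketch of the single-contour VFH histogram step. The image, threshold
# and disparity values are hypothetical, not taken from the paper's system.

def polar_histogram(disparity_columns, threshold):
    """One sector per image column: h_k = sum of ln[r(m)] over the cells
    of the column whose disparity exceeds the cutting threshold."""
    histogram = []
    for column in disparity_columns:
        # Cells above the cutting threshold belong to the obstacle contour.
        contour_cells = [d for d in column if d > threshold]
        histogram.append(sum(math.log(d) for d in contour_cells))
    return histogram

# Hypothetical 4-column disparity image (larger disparity = closer point).
columns = [
    [1.0, 1.2, 1.1],   # free space
    [1.0, 8.0, 9.0],   # near obstacle
    [1.0, 3.0, 1.2],   # distant obstacle
    [1.1, 1.0, 1.3],   # free space
]
h = polar_histogram(columns, threshold=2.0)
# The robot would steer toward a sector with a low obstacle density.
best_sector = min(range(len(h)), key=h.__getitem__)
```

The ln[r(m)] weighting keeps a single very close cell from dominating a sector, compensating for the hyperbolic growth of disparity at close range.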
The VFH method was implemented using a small stereo system⁴ for range images and a PC for processing the VFH algorithm. The stereo system returned images at a 5 Hz rate, and the VFH processing took less than 10 ms per image to form the polar histogram and extract the desired direction and speed of travel. Data were then sent to a robot navigation program⁵ in order to steer a Koala⁶ or Pioneer⁷ robot.

⁴ The Small Vision Module™, developed by SRI International.
⁵ Saphira™, developed by SRI International and EPFL.
⁶ The Koala™ robot is developed by the K-team (EPFL).
⁷ The Pioneer™ robot is developed by SRI International.

Figure 9 shows a calculation of the polar histogram from a typical range image and a single-contour segmentation. In this case, the sensor covers about a 70 degree angle, and each sector is about 0.5 degrees. Image (a) is an intensity image of the scene, and (b) shows the disparity map computed by the stereo system. Brighter green values are higher disparities, hence closer to the camera. The final set of images (c) shows the contour (left side), a segmentation of some obstacles (middle), and finally the polar histogram. The middle of the histogram is straight ahead, along the camera optical axis, and the vertical line indicates the direction of travel that the VFH algorithm has found. From the picture, this is the direction through the open door.

Because of the small sector size, there is considerable variation in adjacent histogram values, and the result could benefit from low-pass filtering.

Additional enhancements in the construction of the polar histogram from contours are under study, among them:

- Ground plane detection. A contour representing the ground plane would give an indication that there actually was a reasonable path in front of the robot. Here, the value ln[r(m)] for each cell in a sector would be subtracted from some initial constant for the sector;
- Holes. A contour underneath the ground plane could be used to check for holes near the robot. The addition to the histogram would be the same as for positive-elevation obstacles;
- Small-height obstacles. Instead of a single contour at an appropriate height for obstacles, several could be positioned starting from just over the ground plane. The contribution of elements from the lower contours would be weighted by a fraction q depending on their height. Thus, the robot would prefer smooth terrain to bumpy, even though it could negotiate the latter.

6 Conclusion

Stereoscopic systems for robot navigation are currently possible using a new technology of low-resolution real-time devices. Although these devices do not match the performance of the human depth perception system, they seem efficient for simple applications such as obstacle avoidance using vector field histograms. Further research in stereo vision should investigate the implementation of biological models, for example multiple frequency filtering, in order to increase the quality of the range image.

7 Acknowledgments

I would first like to thank the Experimental Psychology Department of the University of Geneva, directed by Prof. M. Flueckiger, which gave me the opportunity to write this article.

I would also like to thank the VRAI group of the Swiss Federal Institute of Technology of Lausanne, especially Mr Terry Fong, Mr Didier Guzonni, Dr Charles Baur and Mr Nicolas Chauvin, for their help in the process of writing this paper.

On the American side, I would like to thank Mr Kurt Konolige and SRI International.

8 References

[Borenstein and Koren 1991] J. Borenstein and Y. Koren. The Vector Field Histogram - Fast Obstacle Avoidance for Mobile Robots. IEEE Journal of Robotics and Automation 7(3), June 1991.

[Bruce and Green 1990] V. Bruce and P. Green. Visual Perception: Physiology, Psychology and Ecology. Lawrence Erlbaum Associates Ltd. Publishers, 1990.

[Chauvin, Marti and Konolige 1997] N. Chauvin, G. Marti and K. Konolige. Contour Maps for Real-Time Range Image Parsing. Not yet published, January 1997.

[Marr 1982] D. Marr. Vision: A computational investigation into the human representation and processing of visual information. W.H. Freeman and Co., 1982.

[Marti 1997] G. Marti. Diploma Work: Stereoscopic camera real time processing and robot navigation, March 1997.

[Poggio and Poggio 1984] G. Poggio and T. Poggio. The analysis of stereopsis. Annual Review of Neuroscience, 7, 379-412, 1984.