Stereo Vision for Mobile Robotics

     Marti Gaëtan, Micro-Engineering Laboratory, Swiss Federal Institute of Technology of Lausanne.

   The Virtual Reality and Advanced Interfaces (VRAI) group¹ is currently investigating stereo vision for mobile robots. This paper provides an overview of both computational and biological approaches to stereo vision, stereo image processing and robot navigation.

1 Introduction
   Two eyes or cameras looking at the same scene from different perspectives provide a means for determining three-dimensional shape and position. Scientific investigation of this effect (called variously stereo vision, stereopsis or single vision) has a rich history in psychology, biology and, more recently, in computational models of perception. Stereo is an important method for machine perception because it leads to direct depth measurements. Additionally, unlike monocular techniques, stereo does not infer depth from weak or unverifiable photometric and statistical assumptions, nor does it require specific detailed object models. Once stereo images have been brought into point-to-point correspondence, recovering depth by triangulation is straightforward.
   Current range imagers have achieved real-time or near real-time performance on images of modest size. For example, stereo algorithms on standard hardware are capable of returning dense 128 x 128 range images at 10 Hz, while scanning laser range-finders can operate at 2 Hz on 256 x 256 images. To take advantage of these devices, researchers have proposed numerous methods for extracting 3-D information from range images. These methods operate either in 3-D Cartesian space (volumetric representations) or in a 2.5-D range image space (contour map methods). Contour map methods are particularly attractive for computation-bound applications such as mobile robots.
   We begin with a discussion of the geometric basis and a computational model for stereo vision. Next, we briefly describe biological aspects of depth perception and a contour map method for depth processing. Finally, we present an obstacle avoidance technique for mobile robots using real-time stereo vision.

¹ VRAI Group, IMT-DMT / EPFL, Dr. C. Baur

2 Stereo vision
   The key problem in stereo vision is to find corresponding points in stereo images. Corresponding points are the projections of a single 3D point in the different image spaces. The difference in the position of corresponding points in their respective images is called disparity (see Figure 1). Disparity is a function of both the position of the 3D scene point and of the position, orientation, and physical characteristics of the stereo devices (e.g. cameras).

Figure 1 (a) A system with two cameras is shown. The focal points are Fl and Fr, the image planes are Il and Ir. A point P in the 3D scene is projected onto Pl in the left image and onto Pr in the right image. (b) Cyclopean view: the disparity δ is the difference in the position of the projections of the point P onto the two stereo image planes.

   In addition to providing the function that maps pairs of corresponding image points onto scene points, a camera model can be used to constrain the search for corresponding image points to one dimension. Any point in the 3D world space, together with the centers of projection of the two camera systems, defines an epipolar plane. The intersection of such a plane with an image plane is called an epipolar line (see Figure 2). Every point of a given epipolar line must correspond to a point on the corresponding epipolar line. The search for a match of a point in the first image may therefore be reduced to a one-dimensional neighborhood in the second image plane (as opposed to a 2D neighborhood).
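Once a correspondence is found, triangulation for a rectified pair reduces to Z = f·b/d, with f the focal length in pixels, b the baseline and d the disparity. The following sketch illustrates this relation with hypothetical camera parameters (not those of any system described in this paper):

```python
def depth_from_disparity(d_pixels, focal_px, baseline_m):
    """Triangulate depth for a rectified stereo pair.

    Z = f * b / d, where f is the focal length in pixels,
    b the baseline in meters and d the disparity in pixels.
    """
    if d_pixels <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / d_pixels

# Hypothetical rig: 500 px focal length, 10 cm baseline.
# A disparity of 25 px then corresponds to a depth of 2 m.
print(depth_from_disparity(25, 500.0, 0.10))  # -> 2.0
```

Note the hyperbolic relation: halving the disparity doubles the depth, which is why disparity resolution limits accuracy for distant objects.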

Figure 2 Epipolar lines and epipolar planes.

   When the stereo cameras are oriented such that there is a known horizontal displacement between them, disparity can only occur in the horizontal direction and the stereo images are said to be in correspondence. When a stereo pair is in correspondence², the epipolar lines are coincident with the horizontal scan lines of the digitized images.
   Ideally, one would like to find the correspondence of every individual pixel in both images of a stereo pair. However, it is obvious that the information content in the intensity value of a single pixel is too low for unambiguous matching. In practice, continuous areas of image intensity are the basic units that are matched. This approach (called area matching) usually involves some form of cross-correlation to establish correspondences.

2.1 Matching
   The main problem in matching is to find an effective definition of what we call a valid correlation. Correlation scores are computed by comparing a fixed window in the first image against a shifting window in the second. The second window is moved in the second image by integer increments along the corresponding epipolar line, and a correlation score curve is generated for integer disparity values. The measured disparity can then be taken to be the one that provides the largest peak.
   To quantify the similarity between two correlation windows, we must choose among many different criteria that produce reliable results in a minimum computation time. We denote by $I_1(x,y)$ and $I_2(x,y)$ the intensity values at pixel $(x,y)$. The correlation window has dimensions $(2n+1) \times (2m+1)$. Therefore, the indexes which appear in the formula below vary between $-n$ and $+n$ for the $i$-index and between $-m$ and $+m$ for the $j$-index:

$$ C_1(x, y, \delta) = \frac{\sum_{i,j} \bigl[ I_1(x+i,\, y+j) \cdot I_2(x+\delta+i,\, y+j) \bigr]}{\sqrt{\sum_{i,j} I_1^2(x+i,\, y+j) \cdot \sum_{i,j} I_2^2(x+\delta+i,\, y+j)}} $$

   It is important to know if a match is reliable or not. The form of the correlation curve (for example C1) can be used to decide whether the probability that the match is an error is high or low. Indeed, errors occur when a wrong peak slightly higher than the right one is chosen. Thus, if in the correlation curve we find several peaks with approximately the same height, the risk of choosing the wrong one increases, especially if the image is noisy. However, a confidence coefficient, proportional to the difference in height between the most important peaks, may be defined. Other important information may also be extracted from the correlation curve, for instance bland (low-texture) areas.

² This is quite difficult to obtain in practice.

3 Human depth perception
   For human beings, correlation (as described in the previous section) is only a local mechanism of stereoscopic vision [Bruce and Green 1990]. However, imagine the following experiment:

Figure 3 Stereoscopic fusion: the false target problem.

   A stereoscopic system displays the set of points D1, D2, D3 for the right eye (see Figure 3) and G1, G2, G3 for the left one. An observer should be able to see any interlace of points (grey points in the light grey area) but, instead, observers all settle on the dark ones. This experiment shows that a global mechanism, based on criteria other than local correlation, is used. Among these criteria, the following are taken into account:
   • A principle of correlation based on the contours of the image (see Figure 4a);
   • A mechanism of cognitive interpretation which has, in some cases, higher priority than the local mechanism of stereo vision [Marr 1982];
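The matching scheme described above can be sketched as follows: a window from the first image is correlated against shifted windows along the scan line, the disparity with the largest normalized-correlation peak is kept, and a confidence value is derived from the gap between the two best peaks. This is an illustrative simplification on 1-D scan lines; the function names and toy data are invented, not taken from the paper:

```python
import math

def ncc(w1, w2):
    """Normalized cross-correlation of two equal-length windows."""
    num = sum(a * b for a, b in zip(w1, w2))
    den = math.sqrt(sum(a * a for a in w1) * sum(b * b for b in w2))
    return num / den if den else 0.0

def match_disparity(left, right, x, half, d_max):
    """Correlate a window centred at x in the left scan line against
    windows shifted by d in the right one; return the disparity with
    the highest peak and a confidence score proportional to the gap
    between the two best peaks."""
    w1 = left[x - half:x + half + 1]
    scores = []
    for d in range(d_max + 1):
        if x - d - half < 0:          # window would leave the image
            break
        scores.append(ncc(w1, right[x - d - half:x - d + half + 1]))
    ranked = sorted(scores, reverse=True)
    best_d = scores.index(ranked[0])
    confidence = ranked[0] - ranked[1] if len(ranked) > 1 else ranked[0]
    return best_d, confidence

# Toy scan lines: the right line is the left one shifted 2 pixels left,
# so the feature centred at x = 5 should match with disparity 2.
left  = [0, 0, 0, 0, 9, 5, 1, 0, 0, 0]
right = [0, 0, 9, 5, 1, 0, 0, 0, 0, 0]
print(match_disparity(left, right, 5, 1, 4)[0])  # -> 2
```

A low confidence value (two near-equal peaks) flags the ambiguous matches that the text warns about, especially in noisy or low-texture regions.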

   • A mechanism of pictorial clues of depth (relative size, relative height, perspective, shade, "fog" effect and interposition (see Figure 4b));
   • A principle of dynamic clues, such as motion parallax;
   • Other mechanisms such as correlation of frequency-filtered images [Poggio and Poggio 1984].

Figure 4 (a) "Illusory" contours defining a square, giving the impression that this shape is placed in front of 4 circles, (b) interposition principle (a cognitive clue).

   This list is not exhaustive but presents the most significant criteria belonging to the global system of human stereoscopic vision³.
   In summary, the human stereo system uses a number of interesting methods, which work together to recover depth. On one hand, this system is very powerful: even using one eye, it is possible to perceive depth. On the other hand, it is also very subjective, because a trompe-l'œil can fool our perceptive system.
   In the next section, we will focus on an example of a computational model for range (disparity) image parsing.

³ See [Bruce and Green 1990] for more details.

4 Contour map method
   For many applications, working in 3D Euclidean space turns out to be unnecessary and difficult to manage. To reduce the amount of data which has to be processed, we introduce a method of quantizing volumes that allows us to manipulate range images (see Figure 5a) directly, without having to first transform to 3-D space. The method is similar to the use of contour maps to represent elevations; hence, we call it the contour method [Chauvin, Marti and Konolige 1997]. A contour represents the elevation at a particular height (see Figure 5b); all terrain between one contour line and the next is at an elevation between those represented by the contour lines. Contour creation can be visualized as a set of planes parallel to the ground at specified heights, intersecting the terrain. These cutting planes induce a quantization of the 3-D space based on elevation. Our approach uses this basic idea, and consists of the following steps:
1. Constructing a set of volumes in 3-D space using a set of cutting surfaces (not necessarily planar);
2. Projecting the cutting surfaces back to the range image to induce a quantization of the range data (see Figures 5c and 5d);
3. Using the quantized range image to construct terrain models or other abstractions.
   In cases where the desired segmentation is relative to the sensor viewpoint, the first two steps can be achieved off-line, leading to significant computational savings, especially when the cutting surfaces are complicated. In addition, step 3 can often be performed in the range image space, which is much more efficient than working in the volumetric space. Also, in contrast to grid-based approaches, the cutting surfaces need not be regular, and can be sized to take into account the precision and error characteristics of the range data.

Figure 5 (a) Range image where light (green) pixels represent points closer than dark ones, (b) elevation map composed of the superposition of 8 contours (a single contour is represented with a special pattern), (c) and (d) projection of the cutting surfaces back into the range image for the special contour of image (b).

5 Mobile robot obstacle avoidance
   The contour method is well suited for use with the vector field histogram (VFH) algorithm for mobile robot obstacle avoidance [Borenstein and Koren 1991]. Originally developed with sonar sensors, the method used three steps:
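The three steps above can be sketched for the simplest case of horizontal cutting planes (the method itself allows arbitrary cutting surfaces): quantizing each range pixel then reduces to finding which slab, between two successive planes, its elevation falls in. All names and data below are illustrative assumptions:

```python
import bisect

def quantize_elevations(elev_image, cut_heights):
    """Label each range pixel with the index of the slab (the volume
    between two successive cutting planes) containing its elevation.
    cut_heights must be sorted ascending; label 0 means the pixel
    lies below the first cutting plane."""
    return [[bisect.bisect_right(cut_heights, z) for z in row]
            for row in elev_image]

# Toy 3x4 elevation map (meters) and three cutting planes.
elev = [[0.0, 0.1, 0.6, 1.2],
        [0.0, 0.4, 0.7, 1.4],
        [0.0, 0.0, 0.2, 0.3]]
cuts = [0.25, 0.5, 1.0]

contours = quantize_elevations(elev, cuts)
print(contours[0])  # -> [0, 0, 2, 3]
```

Because the cut heights are fixed, this labeling could equally be precomputed as a lookup table (the off-line step mentioned above), leaving only a per-pixel table lookup at run time.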

1. A regular 2-D histogram grid in plan view, holding the results of sonar sensor readings around the robot. The value of each grid point represents the number of sonar readings that indicated an object within the point (see Figure 7a);
2. A polar histogram is computed from the histogram grid, with k regular angular sectors instead of a rectilinear grid. The value h_k of each sector in the polar histogram represents the obstacle density in that direction (see Figure 8);
3. Steering and velocity values are extracted from the polar histogram.

Figure 7 Two obstacles (a) in the polar grid, (b) in the contour image grid.

   In range images, each column of the image represents a polar sector whose angular width is determined by the camera parameters (see Figure 7b). We let the k sectors correspond to the columns of the range image. Thus, we can construct the polar histogram directly from the contour representation, without having to convert to Cartesian space.
   The key step is calculating the histogram value h_k for each sector. Roughly speaking, this value represents the probability of finding an obstacle close to the robot in the direction of sector k. The simplest idea is to use a single cutting surface at an elevation over the ground plane sufficient to constitute an obstacle for the robot. Any points in the resulting contour (see Figure 6b) are obstacles, and we can use the number of such points in a column and their distance to determine a histogram value.

Figure 6 Representation of an obstacle (a) in the Cartesian space, (b) in the contour image space.

Figure 8 Polar histogram corresponding to the obstacles of Figure 7.

   The details of the weighting scheme we use are not critical; we expect almost any reasonable method that combines distance and number of points will work reasonably well. In our implementation we used a stereo system with disparity as the range metric, and let each contour cell m contribute ln[r(m)] to its sector value. This measure compensates for the fact that the disparity increases hyperbolically as an object gets closer.
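The column-wise polar histogram described above can be sketched as follows, using the ln[r(m)] weighting with disparity as the range metric. The binary obstacle mask stands in for a single-contour segmentation, and steering simply toward the lowest-density sector is a simplification of the full VFH candidate-valley selection; all names and data are illustrative:

```python
import math

def polar_histogram(obstacle_mask, disparity):
    """Build one histogram value per image column (sector).
    Each obstacle cell contributes log(disparity); large disparities
    (close obstacles) therefore dominate the sector value."""
    cols = len(obstacle_mask[0])
    h = [0.0] * cols
    for mask_row, disp_row in zip(obstacle_mask, disparity):
        for k in range(cols):
            if mask_row[k] and disp_row[k] > 1:
                h[k] += math.log(disp_row[k])
    return h

def best_sector(h):
    """Steer toward the sector with the lowest obstacle density."""
    return min(range(len(h)), key=h.__getitem__)

# Toy 2x4 example: obstacles in columns 0, 2 and 3; column 1 is free.
mask = [[1, 0, 1, 1],
        [1, 0, 0, 1]]
disp = [[20, 0, 5, 40],
        [30, 0, 0, 10]]

print(best_sector(polar_histogram(mask, disp)))  # -> 1 (the free column)
```

The logarithmic weighting keeps a very close obstacle (large disparity) from being swamped by many cells of a distant one, matching the compensation argument made in the text.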
   The VFH method was implemented using a small stereo system⁴ for range images and a PC for processing the VFH algorithm. The stereo system returned images at a 5 Hz rate, and the VFH processing took less than 10 ms per image to form the polar histogram and extract the desired direction and speed of travel. Data were then sent to a robot navigation program⁵ in order to steer a Koala⁶ or Pioneer⁷ robot.

Figure 9 Experimental results. (a) Image of the scene, (b) corresponding disparity image, (c) from left to right: contour, object detection and polar histogram (the vertical line corresponds to the direction followed by the robot and the horizontal line to its speed).

   Figure 9 shows a calculation of the polar histogram from a typical range image and a single-contour segmentation. In this case, the sensor covers about a 70 degree angle, and each sector is about 0.5 degrees. Image (a) is an intensity image of the scene, and (b) shows the disparity map computed by a stereo system. Brighter green values are higher disparities, hence closer to the camera. The final set of images (c) shows the contour (left side), a segmentation of some obstacles (middle), and finally the polar histogram. The middle of the histogram is straight along the camera optical axis, and the vertical line indicates the direction of travel that the VFH algorithm has found. From the picture, this is the direction through the open door.
   Because of the small sector size, there is considerable variation in adjacent histogram values, and the result could benefit from low-pass filtering.
   Additional enhancements in the construction of the polar histogram from contours are under study, among them:
   • Ground plane detection. A contour representing the ground plane would give an indication that there actually was a reasonable path in front of the robot. Here, the value ln[r(m)] for each cell in a sector would be subtracted from some initial constant for the sector;
   • Holes. A contour underneath the ground plane could be used to check for holes near the robot. The addition to the histogram would be the same as for positive-elevation obstacles;
   • Small-height obstacles. Instead of a single contour at an appropriate height for obstacles, several could be positioned starting from just over the ground plane. The contribution of elements from the lower contours would be weighted by a fraction q depending on their height. Thus, the robot would prefer smooth terrain to bumpy, even though it could negotiate the latter.

6 Conclusion
   Stereoscopic systems for robot navigation are now possible using a new generation of low-resolution real-time devices. Although these devices do not match the performance of the human depth perception system, they seem efficient for simple applications such as obstacle avoidance using vector field histograms. Further research in stereo vision should investigate the implementation of biological models, for example multiple frequency filtering, in order to increase the quality of the range image.

7 Acknowledgments
   I would first like to thank the Experimental Psychology Department of the University of Geneva, directed by Prof. M. Flueckiger, which gave me the opportunity to write this article.
   I would also like to thank the VRAI group of the Swiss Federal Institute of Technology of Lausanne, especially Mr Terry Fong, Mr Didier Guzzoni, Dr Charles Baur and Mr Nicolas Chauvin, for their help in the process of writing this paper.
   On the American side, I would like to thank Mr Kurt Konolige and SRI International.

8 References
[Borenstein and Koren 1991] J. Borenstein and Y. Koren. The Vector Field Histogram - Fast Obstacle Avoidance for Mobile Robots. IEEE Journal of Robotics and Automation 7(3), June 1991.
[Bruce and Green 1990] V. Bruce and P. Green. Visual Perception: Physiology and Ecology. Lawrence Erlbaum Associates Ltd. Publishers, 1990.
[Chauvin, Marti and Konolige 1997] N. Chauvin, G. Marti and K. Konolige. Contour Maps for Real-Time Range Image Parsing. Not yet published, January 1997.
[Marr 1982] D. Marr. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. W.H. Freeman and Co., 1982.
[Marti 1997] G. Marti. Diploma Work: Stereoscopic Camera Real-Time Processing and Robot Navigation, March 1997.
[Poggio and Poggio 1984] G. Poggio and T. Poggio. The Analysis of Stereopsis. Annual Review of Neuroscience, 7, 379-412, 1984.

⁴ The Small Vision Module™ developed by SRI International.
⁵ Saphira™ developed by SRI International and EPFL.
⁶ The Koala™ robot is developed by the K-team (EPFL).
⁷ The Pioneer™ robot is developed by SRI International.