

High resolution terrain mapping using low altitude aerial stereo imagery

Il-Kyun Jung and Simon Lacroix
7, Ave du Colonel Roche
31077 Toulouse Cedex 4, FRANCE
{Il-Kyun.Jung, Simon.Lacroix}@laas.fr

Abstract

This paper presents an approach to build high resolution digital elevation maps from a sequence of unregistered low altitude stereovision image pairs. The approach first uses a visual motion estimation algorithm that determines the 3D motions of the cameras between consecutive acquisitions, on the basis of visually detected and matched environment features. An extended Kalman filter then estimates both the 6 position parameters and the 3D positions of the memorized features as images are acquired. Details are given on the filter implementation and on the estimation of the uncertainties on the feature observations and motion estimations. Experimental results show that the precision of the method makes it possible to build spatially consistent, very large maps.

1. Introduction

The main difficulty in building digital terrain maps is to precisely determine the sensor position and orientation as it moves. Dead reckoning techniques that integrate over time the data provided by motion estimation sensors, such as inertial sensors, are not sufficient, because they are intrinsically prone to generate position estimates with unbounded error growth. Precise visual motion estimation techniques that use stereovision and visual feature tracking or matching have been proposed for ground rovers [1], but their errors also accumulate over time, since they do not memorize any environment feature. The only way to reduce the errors on the position estimates is to rely on stable environment features that are detected and memorized as the sensor moves. In the robotic community, it was understood early on that the problems of mapping such features and estimating the robot location are intimately tied together, and that they must therefore be solved in a unified manner [2]. This problem, known as the "SLAM problem"¹, has now been widely studied (a historical presentation of the main contributions can be found in the introduction of [3]).

   Contributions on "structure from motion" address the same problem. Successful approaches have been reported, but they require batch processing (i.e. with the whole sequence of the acquired images) and a global optimization (e.g. bundle adjustment) to refine the camera positions and the 3D coordinates of matched features. In contrast, SLAM is an incremental approach: the 3D feature map and the sensor position are simultaneously built and updated.

   Among the different approaches to tackle the SLAM problem, the Kalman filter based approach is the most popular. It is theoretically well grounded, and it has been proved that its application to the SLAM problem converges [3]. Some contributions cope with its main practical drawback, i.e. its complexity, which is cubic in the dimension of the considered state [4]: such developments are necessary when mapping very large areas. Other approaches to the SLAM problem have been proposed, mainly to overcome the assumption that the various error probability distributions are Gaussian, which is required by the Kalman filter. Set membership approaches only need the knowledge of bounds on the errors [5], but they are practically difficult to implement when the number of position parameters exceeds 3, and are somewhat sub-optimal. Expectation maximization (EM) algorithms have also been successfully adapted to the SLAM problem [6]. In terms of sensor modality, solutions to the SLAM problem have mainly been experimented with range sensors in indoor environments, in the case of robots moving on planes, i.e. in 2 dimensions [7, 6, 3]. To our knowledge, there are very few contributions to the SLAM problem based on vision (e.g. [8]).

   This paper presents an approach to the SLAM problem in 3 dimensions, using only a set of non-registered low altitude stereovision image pairs. The approach is presented in the following section, and section 3 summarizes the basic algorithms on which it relies: stereovision, interest point detection and matching, and visual motion estimation. Section 4 details our implementation of the Extended Kalman Filter, with a focus on the identification of the various errors. Localization results and the building of digital elevation maps with a stereovision bench mounted on a blimp flying at a few tens of meters altitude are then presented.

¹SLAM stands for "Simultaneous Localization And Mapping".

2. Overview of the approach

Landmarks are interest points, i.e. visual features that can be matched when perceived from various positions, and whose 3D coordinates are provided by stereovision. We use
an extended Kalman filter (EKF) as the recursive filter: the state vector of the EKF is the concatenation of the stereo bench position (6 parameters) and the landmarks' positions (3 parameters for each landmark). The key algorithm that allows both motion estimation between consecutive stereovision frames (prediction) and the observation and matching of landmarks (data association) is a robust interest point matching algorithm.

   The various algorithmic stages achieved every time a stereovision image pair is acquired are the following:

   1. Stereovision: a dense 3D image is provided by stereovision (section 3.1).

   2. Interest point detection and matching between consecutive frames, and with past frames in which old landmarks are visible (section 3.2).

   3. Landmark selection: a set of selection criteria is applied to the matched interest points, in order to partition them into three sets: a non-landmark set, a candidate-landmark set and an observed-landmark set (section 4.2).

   4. Visual motion estimation (VME): the interest points retained as "non-landmarks" are used to estimate the 6 motion parameters between the previous and current frames (section 3.3).

   5. Update of the Kalman filter state (section 4).

   Finally, after every SLAM cycle defined by these steps, a digital elevation map is updated with the acquired images (section 5.2).

   Step 3 is necessary for two reasons: first, only non-landmark points should be used to estimate the local motion, in order to de-correlate the prediction and update steps of the Kalman filter; second, new landmarks should be cautiously added to the filter state, in order to avoid a rapid growth of its dimension and to obtain a regular landmark coverage of the perceived scenes.

3. Basic algorithms

3.1. Stereovision

We use a classical pixel-based stereovision algorithm that relies on a calibrated binocular stereovision bench (figure 1). A dense disparity image is produced thanks to a correlation-based pixel matching algorithm, false matches being filtered out thanks to a reverse correlation. The 3D coordinates of the matched pixels are determined, with an associated uncertainty whose computation is depicted in section 4.1.

Figure 1: A result of the stereovision algorithm, with an image pair taken at about [?] altitude. From left to right: one of the original images, the disparity map (shown here in a blue/close, red/far color scale), and the 3D image, rendered as a mesh for readability purposes. Pixels are properly matched in all the perceived areas, even the low textured ones.

3.2. Interest points detection and matching

Visual landmarks must be invariant to image translation, rotation, scaling, partial illumination changes and viewpoint changes. Interest points, such as those detected by the popular Harris detector, have proven to have good stability properties [9]. When there is prior knowledge on the scale change, even approximate, a scale adaptive version of the Harris detector yields a repeatability high enough to allow robust matches [10].

   To match interest points, we use a matching algorithm that relies on the matching of local interest point groups, imposing a combination of geometric and signal similarity constraints, and is thus more robust than approaches solely based on local point signal characteristics (details can be found in [11]). Figure 2 shows that this algorithm generates a lot of good matches, even when the viewpoint change between the considered images is quite high.

Figure 2: A result of our interest point matching between two non-registered aerial images.

3.3. VME (Visual Motion Estimation)

The interest points matched between consecutive images and the corresponding 3D coordinates provided by stereovision are used to estimate the 6 displacement parameters between the images, using the least squares minimization technique presented in [12].

   Conventional techniques to get rid of outliers using fundamental matrix estimation would not cope with stereovision errors, such as the ones that occur along depth discontinuities. Therefore, matches that imply a 3D point whose coordinate uncertainties are over a threshold are first discarded (the threshold is empirically determined by statistical analysis of stereovision errors). Then, a 3D transformation is estimated with the remaining matches, and the 3D matches whose error is over α times the mean of the residual errors are eliminated; α should initially be at least greater than 3. α is then set to α − 1, and the procedure is re-iterated until α = 3.

   This outlier rejection algorithm, considering both matching and stereovision errors, yields a precise 3D motion estimation between consecutive stereovision frames (see results in sections 4.1 and 5.1), which is used for the prediction stage of the Kalman filter.
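The outlier rejection scheme of section 3.3 can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the least squares motion estimation of [12] is replaced here by the standard SVD-based rigid alignment, all names (`estimate_motion`, `vme_outlier_rejection`, `alpha0`) are hypothetical, and the stereovision-uncertainty pre-filtering is assumed to have already discarded overly uncertain 3D points.

```python
import numpy as np

def estimate_motion(P, Q):
    # Least squares rigid transform (R, t) minimizing ||R p + t - q|| over
    # matched 3D points; P and Q are (N, 3) arrays (SVD-based alignment,
    # standing in for the minimization technique of [12]).
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T            # reflection-corrected rotation
    t = cq - R @ cp
    return R, t

def vme_outlier_rejection(P, Q, alpha0=5.0):
    # Iterative trimming: matches whose residual exceeds alpha times the
    # mean residual are eliminated, alpha is decremented, and the motion
    # is re-estimated until alpha reaches 3 (alpha0 must be > 3).
    alpha = alpha0
    while alpha >= 3.0:
        R, t = estimate_motion(P, Q)
        res = np.linalg.norm(P @ R.T + t - Q, axis=1)
        keep = res <= alpha * res.mean()
        P, Q = P[keep], Q[keep]
        alpha -= 1.0
    return estimate_motion(P, Q)  # final estimate on the retained matches
```

With a few grossly wrong matches (e.g. from depth discontinuities), the first passes with a large α remove only the gross outliers, so that the later, tighter passes refine the transform on consistent matches only.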
4. Kalman filter setup

The EKF is an extension of the standard linear Kalman filter that linearizes the nonlinear prediction and observation models around the predicted state. The discrete nonlinear system and the observations are modeled as:

   x(k+1) = f(x(k), u(k)) + v(k)
   z(k) = h(x(k)) + w(k)

where u(k) is a control input and v(k), w(k) are vectors of temporally uncorrelated errors with zero mean and covariances Q(k) and R(k).

   In our approach, the state of the filter x = [x_r^T, X_1^T, ..., X_N^T]^T is composed of the 6 position parameters x_r = (x, y, z, φ, θ, ψ) of the stereovision bench and of a set of N landmark 3D coordinates X_i = (x_i, y_i, z_i). The associated state covariance P is composed of the stereo bench pose covariance P_rr, the landmarks covariance P_LL and the cross-covariance P_rL between the bench pose and the landmarks [3]. In the Kalman filter framework, the state estimation encompasses three stages: prediction, observation and update of the state and covariance estimates.

Prediction. Under the assumption that landmarks are stationary, the state prediction is:

   x̂(k+1|k) = f(x̂(k|k), u(k))

where u(k) = (δx, δy, δz, δφ, δθ, δψ) is the visual motion estimation result between positions k and k+1. The associated state covariance prediction is written as:

   P_rr(k+1|k) = ∇f_x P_rr(k|k) ∇f_x^T + ∇f_u U(k) ∇f_u^T        (1)
   P_rL(k+1|k) = ∇f_x P_rL(k|k)

where U(k) represents the error covariance of the visual motion estimation result. Note that the covariance of the landmarks is not changed in the prediction stage.

Observation. When observing the i-th landmark, the observation model and the Jacobian of the observation function are written as:

   ẑ_i(k+1|k) = h_i(x̂(k+1|k))

h_i is a function of the predicted robot state and the i-th landmark in the state vector of the filter, which maps the state space into the observation space. The innovation and the associated covariance are written as:

   ν_i(k+1) = z_i(k+1) − ẑ_i(k+1|k)        (2)
   S_i(k+1) = ∇h_i P(k+1|k) ∇h_i^T + R_i(k+1)        (3)

where ∇h_i = [∂h_i/∂x_r  0 ... 0  ∂h_i/∂X_i  0 ... 0] and R_i is the error covariance of the i-th landmark observation.

Update. The update stage fuses the prediction and the observation to produce the state estimate and its associated covariance:

   x̂(k+1|k+1) = x̂(k+1|k) + K(k+1) ν_i(k+1)
   P(k+1|k+1) = P(k+1|k) − K(k+1) S_i(k+1) K(k+1)^T

in which K(k+1) = P(k+1|k) ∇h_i^T S_i(k+1)^{−1} is the Kalman filter gain matrix.

   When detecting a new landmark, it is added to the state vector of the filter, which becomes x = [x_r^T, X_1^T, ..., X_N^T, X_{N+1}^T]^T. The landmark initialization model is:

   X_{N+1} = g(x̂_r(k|k), z(k))        (4)

   P(k) <- [ P(k)            (∇g_x P_r(k))^T                               ]
           [ ∇g_x P_r(k)     ∇g_x P_rr(k) ∇g_x^T + ∇g_z R_new ∇g_z^T ]        (5)

where X_{N+1} denotes the new landmark, g the initialization function using the current robot pose estimate x̂_r (P_r(k) being the block row of P(k) relative to the pose), and R_new the error covariance of the new landmark.

4.1. Errors identification

Error identification is crucial to set up a Kalman filter, as a precise determination of these errors will avoid the empirical "filter tuning" step. In our context, the following errors must be estimated:

- the landmark initialization error (R_new),
- the landmark observation error (R_i for the observed landmark i),
- and the error of the input control u, which is the visual motion estimation result (U).

Note that in our approach, the lumped process noise v is set to 0, the landmarks being stationary and the robot pose prediction being directly computed with the current pose and the result of the visual motion estimation.

Landmark initialization errors. When new landmarks are detected, their 3D coordinates being computed by stereovision, the covariance matrix R_new on new landmarks is totally defined by the stereovision error.

   Statistics on image pairs acquired from the same position show that the distribution of the disparity computed on any given pixel can be well approximated by a Gaussian [13], and that there is a correlation between the shape of the similarity score curve around its peak and the standard deviation on the disparity: the sharper the peak, the more precise the disparity (figure 3). A standard deviation σ_d associated to
[Figure 3 plots: left, disparity histograms with fitted Gaussians (m = -38.71, s = 0.048; m = -46.16, s = 0.145; m = -46.039066, s = 0.096855; m = -31.684595, s = 0.128873); right, disparity standard deviation (about 0.1 to 0.3) versus peak curvature (about 0.2 to 0.6).]

Figure 3: Left: examples of some probability density functions of disparities computed on a set of 100 image pairs, with the corresponding Gaussian fits. Right: standard deviation of the disparities as a function of the curvature of the similarity score curve at its peak.

Figure 4: Principle of the combination of the matching and stereovision errors. The points located in the square box are the projection of [?] on the image plane. Small ellipses indicate stereovision errors, the large dotted
each computed disparity is estimated using the curvature                                                             Ž                                                                                                                 ellipsoid is the resulting observation error.
of the curve at its peak, approximated by fitting a parabola.
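The parabola fit described above can be sketched as follows. This code is an illustration of ours, not the paper's implementation; in particular, the gain `k` in `disparity_sigma` stands in for the empirical curvature-to-σ mapping of figure 3 (right) and is a placeholder value.

```python
import numpy as np

def subpixel_disparity(scores, d_best):
    """Fit a parabola to the similarity score curve around its peak.

    scores: 1D array of similarity scores indexed by integer disparity.
    d_best: integer disparity of the peak.
    Returns (refined sub-pixel disparity, curvature at the peak).
    """
    s_m, s_0, s_p = scores[d_best - 1], scores[d_best], scores[d_best + 1]
    # Parabola s(d) = a*d^2 + b*d + c through the three samples
    # (d = -1, 0, +1 relative to the peak):
    a = 0.5 * (s_m + s_p) - s_0        # half the curvature
    b = 0.5 * (s_p - s_m)
    offset = -b / (2.0 * a)            # sub-pixel correction in [-0.5, 0.5]
    curvature = 2.0 * abs(a)
    return d_best + offset, curvature

def disparity_sigma(curvature, k=0.05):
    """Hypothetical model of figure 3 (right): the sharper the peak
    (larger curvature), the smaller the disparity standard deviation.
    The gain k would be calibrated empirically on ground-truth pairs."""
    return k / np.sqrt(curvature)
```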
Once matches are established, the coordinates of the 3D points are computed with the usual triangulation formula:

    x = α·z,    y = β·z,    z = b·f / d

where z is the depth, d is the disparity, b is the stereo baseline, and f, α and β are calibration parameters (that depend on (u, v), the considered pixel image coordinates). Using a first order approximation, it comes:

    σ_z = (z² / (b·f)) · σ_d

The covariance matrix of the point coordinates is then derived from the triangulation equations. When a new landmark is observed, its coordinates are added to the filter state, and the state covariance is updated according to equations (4) and (5).

Observation error. In our case, landmark observation is based on interest point matching. Outliers being rejected (section 3.3), only interest point location errors are considered to determine the matching error on the image plane. With the precise Harris detector, the location estimate of an interest point in two consecutive images taken from very close points of view is precise enough to use … pixel as the maximum error tolerated to assess good matches [9], projective distortion and occlusion being negligible. The expected matching error is therefore of the order of … pixel. In contrast, under a different viewpoint (i.e. when re-perceiving a landmark after a long loop), a more flexible tolerance limit is applied (i.e. around … pixel) [11]. In such a case, the expected matching error value is then set to … pixel.

The observation error is defined by the reprojection of the matching error on the 3D plane estimated by stereovision (with an associated error - figure 4).

When the 2D matching error is set to … pixel, the errors provided by stereovision for the 3D matching point and its 8 closest neighbors are used to compute the expected 3D coordinate and associated variance of the matching point as follows:

    x̂ = ( Σ_i x_i / σ_i² ) / ( Σ_i 1 / σ_i² )    (2)

    σ̂² = 1 / ( Σ_i 1 / σ_i² )    (3)

where the x_i are the 3D coordinates of the matched point and of its neighbors, and the σ_i² are the corresponding variances. When the matching error is … pixel, the 3D coordinates being only computed at integer pixels by stereovision, we assume the 3D surface variation is locally linear between the matched point and its neighbors, and interpolate the corresponding coordinates and variances. These coordinates and the associated variances are used in equations (2) and (3).

Motion estimation errors. Given a set of N 3D matched points {(x¹_j, x²_j)}, the function which is minimized to determine the corresponding motion is [12]:

    f(Θ) = Σ_j ‖ x²_j − (R(Θ)·x¹_j + t(Θ)) ‖²

where Θ = (tx, ty, tz, φ, θ, ψ) gathers the six motion parameters. The estimate Θ̂ can be written with random perturbations (Θ̂ = Θ + δΘ), and the true motion and point coordinates are not observed. In order to measure the uncertainty of Θ̂, the uncertainties of the landmarks and of their observations are propagated as shown in [14]: considering that the x¹_j and x²_j are not correlated, the covariance estimate Σ_Θ can also be written as:

    Σ_Θ = (∂g/∂Θ)⁻¹ · (∂g/∂X) · Σ_X · (∂g/∂X)ᵀ · (∂g/∂Θ)⁻ᵀ

where g = ∂f/∂Θ is the Jacobian of the cost function and Σ_X is the input covariance matrix, which is used in equation (1) to estimate the state variances during the filter prediction stage.
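Assuming the simplified triangulation model z = b·f/d, the first-order depth error and the inverse-variance combination of a point with its neighbors (equations (2) and (3)) can be sketched as follows; the function names are ours:

```python
import numpy as np

def triangulate_depth(d, b, f, sigma_d):
    """Depth from disparity with first-order error propagation:
    z = b*f/d, and sigma_z = z**2 / (b*f) * sigma_d."""
    z = b * f / d
    sigma_z = z ** 2 / (b * f) * sigma_d
    return z, sigma_z

def fuse_with_neighbors(coords, variances):
    """Inverse-variance weighted combination of the matched 3D point and
    its neighbors, as in equations (2) and (3): returns the expected
    coordinate and the associated variance."""
    coords = np.asarray(coords, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)   # weights 1 / sigma_i^2
    x_hat = (w * coords).sum() / w.sum()           # equation (2)
    var_hat = 1.0 / w.sum()                        # equation (3)
    return x_hat, var_hat
```

Note that the fused variance is always smaller than the smallest input variance, which is the expected behavior of combining independent measurements.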

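For the simplified case of a translation-only motion, the covariance propagation scheme above reduces to a closed form; the sketch below is a simplification of ours (not the paper's 6-parameter implementation) meant to illustrate the mechanics of the Jacobian-based propagation.

```python
import numpy as np

def estimate_translation(x, y):
    """Least-squares translation between matched 3D point sets:
    minimizing f(t) = sum_j ||y_j - x_j - t||^2 gives t = mean(y - x)."""
    return (np.asarray(y) - np.asarray(x)).mean(axis=0)

def translation_covariance(point_covs):
    """First-order covariance of the translation estimate, following the
    propagation scheme of the text: with g = df/dt,
        Sigma_t = (dg/dt)^-1 (dg/dv) Sigma_v (dg/dv)^T (dg/dt)^-T,
    where v stacks the per-point inputs v_j = y_j - x_j.
    Here dg/dt = 2N*I and dg/dv_j = -2*I, so the expression reduces to
    Sigma_t = sum_j Sigma_j / N^2."""
    n = len(point_covs)
    A_inv = np.eye(3) / (2.0 * n)              # (dg/dt)^-1
    B = np.hstack([-2.0 * np.eye(3)] * n)      # dg/dv, one block per point
    Sigma_v = np.zeros((3 * n, 3 * n))         # block-diagonal input covariance
    for j, C in enumerate(point_covs):
        Sigma_v[3 * j:3 * j + 3, 3 * j:3 * j + 3] = C
    return A_inv @ B @ Sigma_v @ B.T @ A_inv.T
```

The full 6-parameter case follows the same pattern, with the rotation parameters entering through the Jacobian of the cost function rather than a closed form.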
4.2. Landmark selection                                                                                                                                                                                                                                             25

As explained in section 2, the 3D matches established after the interest point matching step are split into three sets. The observed-landmarks set is simply the set of points that correspond to landmarks already in the state vector of the EKF. The rest of the matches are then studied to select the set of candidate-landmarks, according to the following three criteria:

Observability. Good landmarks should be observable in several consecutive frames.

Stability. The 3D coordinates of good landmarks must be precisely estimated by stereovision.

Representability. Good landmarks must efficiently represent the 3D scene. The stereovision bench state estimation will be more stable if the landmarks are regularly distributed over the perceived scene, and this regularity avoids a rapid growth of the size of the EKF state vector.

The number of candidate landmarks that are checked is determined on the basis of the number of new interest point matches (i.e. the ones that do not match an already mapped landmark). We use a fixed fraction of the new interest points, as the visual motion estimation technique requires a lot of matches to yield a precise result. The landmark selection is then made according to a heuristic procedure, so that the selected landmarks satisfy the above three criteria.

Figure 5: A result of the SLAM implementation with a sequence of 40 stereovision pairs. The left image shows the reconstructed trajectory in 3D, the right one shows the 120 landmarks mapped, with their uncertainty ellipses magnified by a factor of 40.

Figure 6: Evolution of the standard deviations of the camera position parameters during the flight shown in figure 5.
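The three selection criteria can be turned into a single filtering pass over the new interest points. The sketch below is illustrative only — the thresholds, the point fields and the grid-bucketing scheme are assumptions, not the paper's actual heuristic: a candidate must have been tracked over several frames (observability), have a bounded stereo depth uncertainty (stability), and at most one candidate is kept per image bucket so that the selection is regularly spread over the scene (representability).

```python
def select_candidate_landmarks(points, min_track_len=3, max_depth_sigma=0.2,
                               bucket_size=64):
    """Filter new interest points into candidate landmarks.

    Each point is a dict with (illustrative) fields:
      'track_len'   -- number of consecutive frames the point was matched in
      'depth_sigma' -- std. dev. of the stereo depth estimate (metres)
      'u', 'v'      -- image coordinates (pixels)
    """
    buckets = {}  # one candidate per image bucket -> regular spread
    # Visit the most precisely triangulated points first, so that each
    # bucket keeps its best candidate.
    for p in sorted(points, key=lambda p: p['depth_sigma']):
        if p['track_len'] < min_track_len:      # observability
            continue
        if p['depth_sigma'] > max_depth_sigma:  # stability
            continue
        key = (p['u'] // bucket_size, p['v'] // bucket_size)
        if key not in buckets:                  # representability
            buckets[key] = p
    return list(buckets.values())
```

The bucketing makes the state vector grow with the number of distinct scene regions rather than with the raw number of matches, which is the regularity argument made above.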
5. Results

Our developments have been tested with hundreds of images taken on-board a blimp flying at low altitude. The cameras of the stereo bench are CCD sensors.

5.1. Positioning errors

We do not have any localization means that could be used as a reference on-board the blimp (such as a centimeter accuracy GPS). However, when the blimp flies over an already perceived area, the VME can provide a precise estimate of the relative pose between the first and last frames of the overlapping sequence.

Figure 5 presents a comparison of the reconstructed loop trajectory, while figure 6 shows the evolution of the standard deviations of the 6 position parameters of the stereo bench when applying the EKF. Until image 25, the standard deviations grow, however much more slowly than when only propagating the errors of the VME. A few old landmarks are re-perceived in the following images: the standard deviations decrease, and stabilize over the subsequent images in which some old landmarks are still observed.

The quantitative figures summarized in table 1 compare the final position estimates with respect to the reference: the precision enhancement brought by the EKF is noticeable, and the absolute estimated errors are all bounded by twice the estimated standard deviations. The translation errors remain small in the three axes over the whole loop trajectory, and the angular errors are all below half a degree. Figure 7 shows another trajectory reconstructed with a set of 100 images.

Table 1: Comparison of the errors made by the propagation of the visual motion estimation alone and with the SLAM EKF approach, using as reference the VME applied between images 1 and 40. For each of the six position parameters, the columns give the reference standard deviation, the VME absolute error, the SLAM standard deviation and the SLAM absolute error.

5.2. Digital elevation maps

Thanks to the precise positioning estimation, the processed stereovision images can be fused after every update of the EKF into a digital elevation map (DEM), that describes the environment as an elevation function z = f(x, y), determined on every cell (i, j) of a regular Cartesian grid.

Our algorithm to build a DEM simply computes the elevation of each cell by averaging the elevations of the 3D points that are vertically projected onto the cell surface.
Figure 7: A longer trajectory reconstructed with a sequence of 100 images. 320 landmarks have been mapped (the magnification factor of the uncertainty ellipses in the right image is here 20).

References

[1] C. Olson, L. Matthies, M. Schoppers, and M. Maimone. Stereo ego-motion improvements for robust rover navigation. In IEEE International Conference on Robotics and Automation, pages 1099–1104, May 2001.

[2] R. Smith, M. Self, and P. Cheeseman. A stochastic map for uncertain spatial relationships. In Robotics Research: The Fourth International Symposium, Santa Cruz (USA), pages 468–474, 1987.

[3] G. Dissanayake, P. M. Newman, H. F. Durrant-Whyte, S. Clark, and M. Csorba. A solution to the simultaneous localization and map building (SLAM) problem. IEEE Transactions on Robotics and Automation, 17(3):229–241, May 2001.
Since a luminance value is associated to each 3D point produced by stereovision, it is also possible to compute a mean luminance value for each map cell. Figure 8 shows a digital elevation map built from the 100 images acquired during the trajectory of figure 7: no map discrepancies can be detected in the corresponding orthoimage.

Figure 8: The DEM computed with 100 images, positioned according to the trajectory of figure 7: orthoimage and 3D view of the bottom-left area.

[4] J. Guivant and E. Nebot. Optimization of the simultaneous localization and map building algorithm for real time implementation. IEEE Transactions on Robotics and Automation, 17(3):242–257, June 2001.

[5] M. Kieffer, L. Jaulin, E. Walter, and D. Meizel. Robust autonomous robot localization using interval analysis. Reliable Computing, 6(3):337–362, Aug. 2000.

[6] S. Thrun, W. Burgard, and D. Fox. A real-time algorithm for mobile robot mapping with applications to multi-robot and 3D mapping. In IEEE International Conference on Robotics and Automation, San Francisco, CA (USA), 2000.

[7] O. Wijk and H. I. Christensen. Triangulation based fusion of sonar data for robust robot pose tracking. IEEE Transactions on Robotics and Automation, 16(6):740–752, 2000.

[8] S. Se, D. Lowe, and J. Little. Local and global localization for mobile robots using visual landmarks. In IEEE/RSJ International Conference on Intelligent Robots and Systems, Maui, Hawaii (USA), pages 414–420, October 2001.

[9] C. Schmid, R. Mohr, and C. Bauckhage. Comparing and evaluating interest points. In Proceedings of the 6th International Conference on Computer Vision, Bombay (India), pages 230–235, January 1998.

[10] Y. Dufournaud, C. Schmid, and R. Horaud. Matching images with different resolutions. In International Conference on Computer Vision and Pattern Recognition, Hilton Head Island, SC (USA), pages 612–618, June 2000.
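The DEM construction of section 5.2 reduces to accumulating, for every grid cell, the elevations and luminances of the 3D points that project vertically onto it, then averaging. A minimal sketch, in which the cell size and the point layout are assumptions for illustration:

```python
def build_dem(points, cell_size=0.1):
    """Build a digital elevation map by cell-wise averaging.

    points: iterable of (x, y, z, luminance) tuples in a common frame.
    Returns a dict mapping cell index (i, j) to (mean z, mean luminance).
    """
    acc = {}  # (i, j) -> [sum_z, sum_luminance, count]
    for x, y, z, lum in points:
        # Vertical projection: only (x, y) selects the cell.
        key = (int(x // cell_size), int(y // cell_size))
        s = acc.setdefault(key, [0.0, 0.0, 0])
        s[0] += z
        s[1] += lum
        s[2] += 1
    return {k: (sz / n, sl / n) for k, (sz, sl, n) in acc.items()}
```

Because the accumulator is keyed by cell index, new stereo pairs can be fused incrementally after every EKF update, as described above, without revisiting previous points.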

6. Summary

We presented a vision-based SLAM approach that allows the building of large high resolution terrain maps. To our knowledge, it is the first attempt to tackle a SLAM problem in 3D space using exclusively information provided by vision. The use of interest points as landmarks allows an active selection of the landmarks to properly map the environment without any prior knowledge. Our interest point matching algorithm provides robust data associations, which makes possible both the matching of already mapped landmarks and the precise visual motion estimation between consecutive frames. A rigorous study and identification of the various error estimates involved in the filter allows it to be set up properly, without any empirical tuning stage.

[11] I-K. Jung and S. Lacroix. A robust interest point matching algorithm. In 8th International Conference on Computer Vision, Vancouver (Canada), July 2001.

[12] R. Haralick, H. Joo, C.-N. Lee, X. Zhuang, V. G. Vaidya, and M. B. Kim. Pose estimation from corresponding point data. IEEE Transactions on Systems, Man, and Cybernetics, 19(6):1426–1446, Nov/Dec 1989.

[13] L. Matthies. Toward stochastic modeling of obstacle detectability in passive stereo range imagery. In IEEE International Conference on Computer Vision and Pattern Recognition, Champaign, Illinois (USA), pages 765–768, 1992.

[14] R. M. Haralick. Propagating covariances in computer vision. In International Conference on Pattern Recognition, pages 493–498, 1994.