High resolution terrain mapping using low altitude aerial stereo imagery

Il-Kyun Jung and Simon Lacroix
LAAS/CNRS
7, Ave du Colonel Roche
31077 Toulouse Cedex 4, FRANCE
Il-Kyun.Jung, Simon.Lacroix@laas.fr

Abstract

This paper presents an approach to build high resolution digital elevation maps from a sequence of unregistered low altitude stereovision image pairs. The approach first uses a visual motion estimation algorithm that determines the 3D motions of the cameras between consecutive acquisitions, on the basis of visually detected and matched environment features. An extended Kalman filter then estimates both the 6 position parameters and the 3D positions of the memorized features as images are acquired. Details are given on the filter implementation and on the estimation of the uncertainties on the feature observations and motion estimations. Experimental results show that the precision of the method makes it possible to build spatially consistent, very large maps.

1. Introduction

The main difficulty in building digital terrain maps is to precisely determine the sensor position and orientation as it moves. Dead reckoning techniques that integrate over time the data provided by motion estimation sensors, such as inertial sensors, are not sufficient, because they are intrinsically prone to generate position estimates with unbounded error growth. Precise visual motion estimation techniques that use stereovision and visual feature tracking or matching have been proposed for ground rovers [1], but their errors also accumulate over time, since they do not memorize any environment feature. The only way to bound the errors on the position estimates is to rely on stable environment features that are detected and memorized as the sensor moves. In the robotics community, it was understood early on that the problem of mapping such features and the problem of estimating the robot location are intimately tied together, and that they must therefore be solved in a unified manner [2]. This problem, known as the "SLAM problem" ("Simultaneous Localization And Mapping"), has now been widely studied (a historical presentation of the main contributions can be found in the introduction of [3]).

Contributions on "structure from motion" address the same problem. Successful approaches have been reported, but they require batch processing (i.e. over the whole sequence of acquired images) and a global optimization (e.g. bundle adjustment) to refine the camera positions and the 3D coordinates of the matched features. In contrast, SLAM is an incremental approach: the 3D feature map and the sensor position are simultaneously built and updated.

Among the different approaches to tackle the SLAM problem, the Kalman filter based approach is the most popular. It is theoretically well grounded, and it has been proved that its application to the SLAM problem converges [3]. Some contributions cope with its main practical drawback, i.e. its complexity, which is cubic in the dimension of the considered state [4]: such developments are necessary when mapping very large areas. Other approaches to the SLAM problem have been proposed, mainly to overcome the assumption that the various error probability distributions are Gaussian, which is required by the Kalman filter. Set membership approaches only need the knowledge of bounds on the errors [5], but they are practically difficult to implement when the number of position parameters exceeds 3, and are somewhat sub-optimal. Expectation maximization (EM) algorithms have also been successfully adapted to the SLAM problem [6]. In terms of sensor modality, solutions to the SLAM problem have mainly been experimented with range sensors in indoor environments, in the case of robots moving on planes, i.e. in 2 dimensions [7, 6, 3]. To our knowledge, there are very few contributions to the SLAM problem based on vision (e.g. [8]).

This paper presents an approach to the SLAM problem in 3 dimensions, using only a set of non-registered low altitude stereovision image pairs. The approach is presented in the following section, and section 3 summarizes the basic algorithms on which it relies: stereovision, interest point detection and matching, and visual motion estimation. Section 4 details our implementation of the extended Kalman filter, with a focus on the identification of the various errors. Localization results and the building of digital elevation maps with a stereovision bench mounted on a blimp flying at a few tens of meters altitude are then presented.

2. Overview of the approach

Landmarks are interest points, i.e. visual features that can be matched when perceived from various positions, and whose 3D coordinates are provided by stereovision. We use an extended Kalman filter (EKF) as the recursive filter: the state vector of the EKF is the concatenation of the stereo bench position (6 parameters) and the landmark positions (3 parameters for each landmark). The key algorithm that allows both motion estimation between consecutive stereovision frames (prediction) and the observation and matching of landmarks (data association) is a robust interest point matching algorithm.

The various algorithmic stages achieved every time a stereovision image pair is acquired are the following:

1. Stereovision: a dense 3D image is provided by stereovision (section 3.1).
2. Interest point detection and matching between consecutive frames, and with past frames in which old landmarks are visible (section 3.2).
3. Landmark selection: a set of selection criteria are applied to the matched interest points, in order to partition them into three sets: a non-landmark set, a candidate-landmark set and an observed-landmark set (section 4.2).
4. Visual motion estimation (VME): the interest points retained as "non-landmarks" are used to estimate the 6 motion parameters between the previous and current frames (section 3.3).
5. Update of the Kalman filter state (section 4).

Finally, after every SLAM cycle defined by these steps, a digital elevation map is updated with the acquired images (section 5.2).

Step 3 is necessary for two reasons: first, only non-landmark points should be used to estimate the local motion, in order to de-correlate the prediction and update steps of the Kalman filter; second, new landmarks should be cautiously added to the filter state, in order to avoid a rapid growth of its dimension and to obtain a regular landmark coverage of the perceived scenes.

3. Basic algorithms

3.1. Stereovision

We use a classical pixel-based stereovision algorithm that relies on a calibrated binocular stereovision bench (figure 1). A dense disparity image is produced by a correlation-based pixel matching algorithm, false matches being filtered out thanks to a reverse correlation. The 3D coordinates of the matched pixels are then determined, with an associated uncertainty whose computation is depicted in section 4.1.

Figure 1: A result of the stereovision algorithm, with an image pair taken at low altitude. From left to right: one of the original images, the disparity map (shown here in a blue/close, red/far color scale), and the 3D image, rendered as a mesh for readability purposes. Pixels are properly matched in all the perceived areas, even the low textured ones.

3.2. Interest point detection and matching

Visual landmarks must be invariant to image translation, rotation, scaling, partial illumination changes and viewpoint changes. Interest points, such as those detected by the popular Harris detector, have proven to have good stability properties [9]. When there is prior knowledge on the scale change, even approximate, a scale adaptive version of the Harris detector yields a repeatability high enough to allow robust matches [10].

To match interest points, we use a matching algorithm that relies on the matching of local interest point groups, imposing a combination of geometric and signal similarity constraints, and which is thus more robust than approaches solely based on local point signal characteristics (details can be found in [11]). Figure 2 shows that this algorithm generates a lot of good matches, even when the viewpoint change between the considered images is quite large.

Figure 2: A result of our interest point matching between two non-registered aerial images.

3.3. VME (Visual Motion Estimation)

The interest points matched between consecutive images and the corresponding 3D coordinates provided by stereovision are used to estimate the 6 displacement parameters between the images, using the least square minimization technique presented in [12].

Conventional techniques to get rid of outliers using fundamental matrix estimation would not cope with stereovision errors, such as the ones that occur along depth discontinuities. Therefore, matches that imply a 3D point whose coordinate uncertainties are over a threshold are first discarded (the threshold is empirically determined by a statistical analysis of stereovision errors). Then, a 3D transformation is estimated with the remaining matches, and the 3D matches whose error is over k times the mean of the residual errors are eliminated: k should initially be at least 3; it is then decreased and the procedure is re-iterated until it converges.

This outlier rejection algorithm, which considers both matching and stereovision errors, yields a precise 3D motion estimation between consecutive stereovision frames (see results in sections 4.1 and 5.1), which is used for the prediction stage of the Kalman filter.
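As an illustration of this stage, the following sketch estimates the rigid motion from matched 3D point sets with a closed-form SVD-based least squares solution (a standard alternative to the minimization of [12]), and applies the iterative outlier rejection described above; the threshold schedule (`k0`, `k_min`, `step`) is an assumption for illustration, the paper only stating that k starts at 3 and is then decreased.

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares R, t such that R @ P[i] + t ~ Q[i] (SVD/Kabsch method)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                      # 3x3 correlation of centered points
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

def estimate_motion(P, Q, k0=3.0, k_min=1.5, step=0.5):
    """Iteratively re-estimate the motion, discarding matches whose residual
    exceeds k times the mean residual of the current inliers."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    keep = np.ones(len(P), bool)
    k = k0
    while True:
        R, t = rigid_transform(P[keep], Q[keep])
        res = np.linalg.norm((P @ R.T + t) - Q, axis=1)
        # small absolute floor avoids rejecting points on numerically exact fits
        new_keep = keep & (res <= k * res[keep].mean() + 1e-9)
        if new_keep.sum() < 3 or (k <= k_min and new_keep.sum() == keep.sum()):
            return R, t, keep
        keep = new_keep
        k = max(k - step, k_min)
```

A single gross outlier is rejected at the first iteration, after which the motion is re-estimated on the inliers only.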
4. Kalman filter setup

The EKF is an extension of the standard linear Kalman filter that linearizes the nonlinear prediction and observation models around the predicted state. The discrete nonlinear system and the observations are modeled as:

$x_{k+1} = f(x_k, u_k) + v_k$
$z_k = h(x_k) + w_k$

where $u_k$ is a control input and $v_k$, $w_k$ are vectors of temporally uncorrelated errors with zero mean and covariances $Q_k$, $R_k$.

In our approach, the state of the filter is composed of the 6 position parameters $(x, y, z, \phi, \theta, \psi)$ of the stereovision bench and of a set of landmark 3D coordinates $m_1, \dots, m_n$. The associated state covariance $P$ is composed of the stereo bench pose covariance, the landmark covariances, and the cross-covariances between the bench pose and the landmarks [3]. In the Kalman filter framework, the state estimation encompasses three stages: prediction, observation and update of the state and covariance estimates.

Prediction. Under the assumption that the landmarks are stationary, the state prediction is:

$\hat{x}_{k+1|k} = f(\hat{x}_{k|k}, u_k)$

where $u_k$ is the visual motion estimation result between positions $k$ and $k+1$. The associated state covariance prediction is written as:

$P_{k+1|k} = F_k P_{k|k} F_k^T + G_k Q_k G_k^T \quad (1)$

where $F_k$ and $G_k$ are the Jacobians of $f$ with respect to the state and to the control input, and $Q_k$ represents the error covariance of the visual motion estimation result. Note that the covariance of the landmarks is not changed in the prediction stage.

Observation. When observing the $i$-th landmark, the observation model and the Jacobian of the observation function are written as:

$\hat{z}_i = h_i(\hat{x}_{k+1|k}), \quad H_i = \partial h_i / \partial x$

where $h_i$ is a function of the predicted robot state and of the $i$-th landmark in the state vector of the filter, which maps the state space into the observation space. The innovation and its associated covariance are written as:

$\nu_i = z_i - h_i(\hat{x}_{k+1|k}) \quad (2)$
$S_i = H_i P_{k+1|k} H_i^T + R_i \quad (3)$

where $z_i$ is the actual observation and $R_i$ the error covariance of the $i$-th landmark observation.

Update. The update stage fuses the prediction and the observation to produce the state estimate and its associated covariance:

$\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + K_i \nu_i$
$P_{k+1|k+1} = (I - K_i H_i) P_{k+1|k}$

where $K_i = P_{k+1|k} H_i^T S_i^{-1}$ is the Kalman filter gain matrix.

When a new landmark is detected, it is added to the state vector of the filter, which becomes $(x, m_{n+1})$. The landmark initialization model is:

$m_{n+1} = g(\hat{x}_r, z) \quad (4)$

where $g$ denotes the initialization function, using the current robot pose estimate $\hat{x}_r$ and the observation $z$ of the new landmark. The state covariance is augmented accordingly:

$P \leftarrow \begin{pmatrix} P & (G_r P_{r\cdot})^T \\ G_r P_{r\cdot} & G_r P_{rr} G_r^T + G_z R G_z^T \end{pmatrix} \quad (5)$

where $G_r$ and $G_z$ are the Jacobians of $g$ with respect to the robot pose and to the observation, $P_{rr}$ is the robot pose covariance, $P_{r\cdot}$ gathers the corresponding rows of $P$, and $R$ is the error covariance of the new landmark.

4.1. Errors identification

Error identification is crucial to set up a Kalman filter, as a precise determination of these errors avoids the empirical "filter tuning" step. In our context, the following errors must be estimated:

- the landmark initialization error ($R$ in equation (5)),
- the landmark observation error ($R_i$ for the observed landmark $i$),
- and the error on the input control $u$, which is the visual motion estimation result ($Q_k$).

Note that in our approach the lumped process noise is set to $0$, the landmarks being stationary and the robot pose prediction being directly computed from the current pose and the result of the visual motion estimation.

Landmark initialization errors. When new landmarks are detected, their 3D coordinates being computed by stereovision, the covariance matrix of the new landmarks is totally defined by the stereovision error.

Statistics on image pairs acquired from the same position show that the distribution of the disparity computed on any given pixel can be well approximated by a Gaussian [13], and that there is a correlation between the shape of the similarity score curve around its peak and the standard deviation of the disparity: the sharper the peak, the more precise the disparity (figure 3). The standard deviation associated to each computed disparity is estimated using the curvature of the score curve at its peak, approximated by fitting a parabola.

Figure 3: Left: examples of probability density functions of the disparities computed on a set of 100 image pairs, with the corresponding Gaussian fits. Right: standard deviation of the disparities as a function of the curvature of the similarity score curve at its peak.

Once matches are established, the coordinates of the 3D points are computed with the usual triangulation formulas, the depth being inversely proportional to the disparity, with a factor involving the stereo baseline and calibration parameters that depend on the considered pixel image coordinates. The covariance matrix of the point coordinates is then derived from the triangulation equations, using a first order approximation. When a new landmark is observed, its coordinates are added to the filter state, and the state covariance is updated according to equations (4) and (5).

Observation error. In our case, landmark observation is based on interest point matching. Outliers being rejected (section 3.3), only the interest point location errors are considered to determine the matching error in the image plane. With the precise Harris detector, the location estimates of an interest point in two consecutive images taken from very close viewpoints are precise enough, projective distortion and occlusions being negligible, to assess good matches with a tolerance of the order of a pixel [9]: the expected matching error is then of that order. In contrast, under a different viewpoint (i.e. when re-perceiving a landmark after a long loop), a more flexible tolerance limit is applied [11], and a correspondingly larger expected matching error value is used.

The observation error is defined by the reprojection of the matching error onto the 3D surface estimated by stereovision, with its associated error (figure 4). Given the 2D matching error, the errors provided by stereovision for the matched 3D point and its closest neighbors are used to compute the expected 3D coordinates and associated variances of the matched point: the 3D coordinates being only computed on integer pixels by stereovision, we assume that the 3D surface variation is locally linear, and take weighted means of the neighboring points' coordinates and variances. These coordinates and variances are used in equations (2) and (3).

Figure 4: Principle of the combination of the matching and stereovision errors. The points located in the square box are the projections on the image plane; small ellipses indicate the stereovision errors, and the large dotted ellipsoid is the resulting observation error.

Motion estimation errors. Given a set of matched 3D points $\{(p_i, q_i)\}$, the function which is minimized to determine the corresponding motion $\Theta = (t_x, t_y, t_z, \phi, \theta, \psi)$ is [12]:

$F(\Theta) = \sum_i \| q_i - (R(\phi, \theta, \psi)\, p_i + t) \|^2$

The measured points are corrupted by random perturbations, and the true points are not observed. In order to measure the uncertainty of the estimate $\hat{\Theta}$, the uncertainties of the landmarks and of their observations are propagated as shown in [14]: considering that the perturbations on the points are not correlated with each other, the covariance estimate can be written as:

$\Sigma_{\hat{\Theta}} = \left(\frac{\partial^2 F}{\partial \Theta^2}\right)^{-1} \frac{\partial^2 F}{\partial \Theta \partial X} \, \Sigma_X \, \frac{\partial^2 F}{\partial X \partial \Theta} \left(\frac{\partial^2 F}{\partial \Theta^2}\right)^{-1}$

where $X$ gathers the point coordinates, $\Sigma_X$ is their covariance, and the derivatives are evaluated at the minimum. $\Sigma_{\hat{\Theta}}$ is the input covariance matrix $Q_k$ used in equation (1) to estimate the state variances during the filter prediction stage.
4.2. Landmark selection

As explained in section 2, the 3D matches established after the interest point matching step are split into three sets. The observed-landmarks set simply contains the points that correspond to landmarks already present in the state vector of the EKF. The rest of the matches are then studied, to select the set of candidate landmarks according to the following three criteria:

Observability. Good landmarks should be observable in several consecutive frames.

Stability. The 3D coordinates of good landmarks must be precisely estimated by stereovision.

Representability. Good landmarks must efficiently represent the 3D scene. The stereovision bench state estimation is more stable if the landmarks are regularly dispatched in the perceived scene, and this regularity also avoids a rapid growth of the EKF state vector size.

The number of candidate landmarks that are checked is determined on the basis of the number of new interest point matches (i.e. the ones that do not correspond to an already mapped landmark): only a fraction of the new interest points is retained, as the visual motion estimation technique requires a lot of matches to yield a precise result. The landmark selection is made according to a heuristic procedure, so that the selected points satisfy the above three criteria.

5. Results

Our developments have been tested with hundreds of images taken on-board a blimp flying at altitudes ranging from a few meters to a few tens of meters, with a calibrated wide-baseline stereovision bench of CCD cameras.

5.1. Positioning errors

We do not have any localization means that could be used as a reference on-board the blimp (such as a centimeter accuracy GPS). However, when the blimp flies over an already perceived area, the VME can provide a precise estimate of the relative pose between the first and last frames of the overlapping sequence.

Figure 5 presents a reconstructed loop trajectory, while figure 6 shows the evolution of the standard deviations of the 6 position parameters of the stereo bench when applying the EKF. Until image 25, the standard deviations grow, though much more slowly than when propagating only the errors of the VME. A few old landmarks are re-perceived in the following images: the standard deviations decrease, and stabilize for the subsequent images, where some old landmarks are still observed.

Figure 5: A result of the SLAM implementation with a sequence of 40 stereovision pairs. The left image shows the reconstructed trajectory in 3D; the right one shows the 120 landmarks mapped, with uncertainty ellipses magnified by a factor of 40.

Figure 6: Evolution of the standard deviations of the camera position parameters during the flight shown in figure 5.

The quantitative figures summarized in table 1 compare the final position estimates with respect to the reference: the precision enhancement brought by the EKF is noticeable, and the absolute estimated errors are all bounded by twice the estimated standard deviations. The translation errors remain small on the three axes at the end of the trajectory, and the angular errors are all below half a degree. Figure 7 shows another trajectory, reconstructed with a set of 100 images.

Table 1: Comparison of the errors made by the propagation of the visual motion estimation alone and with the SLAM EKF approach, using as a reference the VME applied between images 1 and 40.

5.2. Digital elevation maps

Thanks to the precise position estimation, the processed stereovision images can be fused, after every update of the EKF, into a digital elevation map (DEM) that describes the environment as a function z = f(x, y), defined on every cell of a regular Cartesian grid. Our algorithm to build a DEM simply computes the elevation of each cell by averaging the elevations of the 3D points that are vertically projected onto the cell surface.
Since a luminance value is associated to each 3D point produced by stereovision, it is also possible to compute a mean luminance value for each map cell. Figure 8 shows a digital elevation map built from the 100 images of the trajectory of figure 7: no map discrepancies can be detected in the corresponding orthoimage.

Figure 7: A longer trajectory reconstructed with a sequence of 100 images. 320 landmarks have been mapped (the magnification factor of the uncertainty ellipses in the right image is here 20).

Figure 8: The DEM computed with 100 images, positioned according to the trajectory of figure 7: orthoimage and 3D view of the bottom-left area.

6. Summary

We presented a vision-based SLAM approach that allows the building of large, high resolution terrain maps. To our knowledge, it is the first attempt to tackle a SLAM problem in 3D space using exclusively information provided by vision. The use of interest points as landmarks allows an active selection of the landmarks, so as to properly map the environment without any prior knowledge. Our interest point matching algorithm provides robust data associations, which make possible both the matching of already mapped landmarks and the precise visual motion estimation between consecutive frames. A rigorous study and identification of the various error estimates involved in the filter allows it to be set up properly, without any empirical tuning stage.

References

[1] C. Olson, L. Matthies, M. Schoppers, and M. Maimone. Stereo ego-motion improvements for robust rover navigation. In IEEE International Conference on Robotics and Automation, pages 1099-1104, May 2001.

[2] R. Smith, M. Self, and P. Cheeseman. A stochastic map for uncertain spatial relationships. In Robotics Research: The Fourth International Symposium, Santa Cruz (USA), pages 468-474, 1987.

[3] G. Dissanayake, P. M. Newman, H-F. Durrant-Whyte, S. Clark, and M. Csorba. A solution to the simultaneous localization and map building (SLAM) problem. IEEE Transactions on Robotics and Automation, 17(3):229-241, May 2001.

[4] J. Guivant and E. Nebot. Optimization of the simultaneous localization and map building algorithm for real time implementation. IEEE Transactions on Robotics and Automation, 17(3):242-257, June 2001.

[5] M. Kieffer, L. Jaulin, E. Walter, and D. Meizel. Robust autonomous robot localization using interval analysis. Reliable Computing, 6(3):337-362, Aug. 2000.

[6] S. Thrun, W. Burgard, and D. Fox. A real-time algorithm for mobile robot mapping with applications to multi-robot and 3D mapping. In IEEE International Conference on Robotics and Automation, San Francisco, CA (USA), 2000.

[7] O. Wijk and H.I. Christensen. Triangulation based fusion of sonar data for robust robot pose tracking. IEEE Transactions on Robotics and Automation, 16(6):740-752, 2000.

[8] S. Se, D. Lowe, and J. Little. Local and global localization for mobile robots using visual landmarks. In IEEE/RSJ International Conference on Intelligent Robots and Systems, Maui, Hawaii (USA), pages 414-420, October 2001.

[9] C. Schmid, R. Mohr, and C. Bauckhage. Comparing and evaluating interest points. In Proceedings of the 6th International Conference on Computer Vision, Bombay (India), pages 230-235, January 1998.

[10] Y. Dufournaud, C. Schmid, and R. Horaud. Matching images with different resolutions. In International Conference on Computer Vision and Pattern Recognition, Hilton Head Island, SC (USA), pages 612-618, June 2000.

[11] I-K. Jung and S. Lacroix. A robust interest point matching algorithm. In 8th International Conference on Computer Vision, Vancouver (Canada), July 2001.

[12] R. Haralick, H. Joo, C.-N. Lee, X. Zhuang, V.G. Vaidya, and M.B. Kim. Pose estimation from corresponding point data. IEEE Transactions on Systems, Man, and Cybernetics, 19(6):1426-1446, Nov/Dec 1989.

[13] L. Matthies. Toward stochastic modeling of obstacle detectability in passive stereo range imagery. In IEEE International Conference on Computer Vision and Pattern Recognition, Champaign, Illinois (USA), pages 765-768, 1992.

[14] R.M. Haralick. Propagating covariances in computer vision. In International Conference on Pattern Recognition, pages 493-498, 1994.