VIEWS: 4 PAGES: 8 POSTED ON: 5/23/2012
MULTISCALE UNSUPERVISED CHANGE DETECTION ON OPTICAL IMAGES BASED ON MARKOV RANDOM FIELDS AND WAVELETS Gabriele Moser, Elena Angiati, and Sebastiano B. Serpico Dept. of Biophysical and Electronic Engineering (DIBE), University of Genoa Via Opera Pia 11a, I-16145, Genova, Italy; e-mail: gabriele.moser@unige.it ABSTRACT be used to distinguish changed and unchanged areas in these difference and (log-)ratio images, either by manual trial-and- In the context of environmental monitoring and natural or error procedures [16] or by more advanced automatic tech- industrial disaster management, change-detection methods niques based on parametric Bayesian approaches [2, 11, 12]. represent powerful tools for studying and mapping the evolu- tion of the Earth’s surface. In order to optimize the accuracy In order to improve the accuracy of the change maps, a of the change maps, a multiscale approach can be adopted, multiscale strategy can be adopted, in which transformed im- that jointly exploits observations at coarser scales (to glob- ages at different scales are jointly used [1, 8, 9]. The im- ally identify changed areas and gain robustness to noise) and ages at the ﬁnest scales are likely to highlight many geomet- ﬁner scales (to improve the detection of details). In this pa- rical details, but also to be more affected by noise. Data at per, a multiscale contextual unsupervised change-detection coarser scales exhibit less precise details, but a stronger im- method is proposed for optical images. It is based on discrete munity to noise. A multiscale approach, exploiting coarser wavelet transforms and Markov random ﬁelds. The image- scales to globally identify changed areas and ﬁner scales to differencing approach to change-detection with optical data improve the detection of details, may represent an effective is adopted, wavelets are applied to the difference image to ex- choice. This approach is expected to be particularly promis- tract multiscale features, and Markovian data fusion is used ing when dealing with very high resolution images acquired to integrate both these features and the spatial contextual by either optical (e.g., QuickBird or IKONOS) or SAR (e.g., information in the change-detection process. Expectation- COSMO/SkyMed, TerraSAR-X) sensors. Multiscale change maximization and Besag’s algorithms are used to estimate detection methods for SAR images have been proposed in [9] the model parameters. Experiments on real optical images and [5], by using a log-cumulant-based information-theoretic point out the effectiveness of the method, also as compared change measure and the combination of undecimated station- with state-of-the-art techniques. ary wavelet transforms (SWTs) with adaptive optimal scale- selection schemes, respectively. In [1] an unsupervised mul- Index Terms— Multiscale change detection, unsuper- tiscale change-detection method was proposed by applying vised change detection, discrete wavelet transforms, Markov a previous unsupervised multiresolution classiﬁer to wavelet random ﬁelds, expectation-maximization, Besag’s algorithm. transforms of the difference image. Object-oriented analy- sis and watershed segmentation were used in [8] for semi- 1. INTRODUCTION interactive multiscale change detection on optical data. A semi-interactive approach was also proposed in [13] by com- Multitemporal remote sensing represents a powerful source bining wavelets, anisotropic diffusion ﬁltering, and an empir- of information for monitoring the evolution of the Earth’s sur- ical procedure for optimal-scale selection. In [3] a heuristic face, for instance, in applications such as environmental mon- multiscale change measure for optical data, based on a nor- itoring or disaster management. A relevant task is the identi- malized product of several low-pass ﬁltered versions of the ﬁcation of the changes that occurred in a given area between difference image, was introduced and tested by MonteCarlo two observation dates [16]. When adopting an unsupervised simulations. approach, i.e., when assuming no training data to be available In this paper, a multiscale contextual unsupervised change- at any acquisition date, image differencing and image (log- detection method is proposed for optical images, based on )ratioing are usually applied to address this task with optical discrete wavelet transform (DWT) [6] and Markov random and synthetic aperture radar (SAR) images, respectively [16]. ﬁelds (MRFs) [4]. DWT is applied to the difference image They consist in generating a difference and (log-)ratio image, to extract multiscale features. It provides a two-dimensional respectively, by computing pixel by pixel the difference or multiresolution decomposition of the input (difference) image (logarithmic) ratio of the pixel intensities in the images ac- as the superposition of several components, each highlight- quired at the two observation dates. Thresholding can then ing the image content at a given scale [6]. MRFs jointly allows spatial context to be introduced into pixel labeling change-detection technique proposed in [11] and based on the problems and different information sources to be fused by Kittler and Illingworth (K&I) unsupervised thresholding al- suitably deﬁning “energy functions” [17]. Here, an MRF- gorithm (see Sec. 2.4) [10, 11]. based approach is proposed by modifying the method intro- duced in [18] for multiresolution image classiﬁcation and 2.2. Wavelet transform stage by combining it with DWTs and image differencing in or- der to extend it to multiscale change detection. The related An S-scale multiresolution decomposition of D (S being a MRF model is characterized by internal parameters that are predeﬁned number of scales) is obtained by applying a dyadic estimated by the expectation-maximization (EM) [15] and DWT. D is decomposed in terms of a low-pass transformed Besag’s [4] algorithms [18]. The main novelties of the paper image and of three transformed images conveying high-pass lie in the extension of the method in [18] from multiresolution information about ﬁne-scale details along the horizontal im- classiﬁcation to multiscale change-detection and in its combi- age axis, the vertical axis, or both axes. Then, the procedure nation with wavelets. A similar approach was proposed in our is recursively applied S times to the low-pass component. previous work [1] by using a more heuristic combination of W1 , W2 , . . . , WS are obtained by applying S times the in- DWT and of the multiresolution classiﬁer in [18]. Unlike [1], verse transform (IDWT) while neglecting all high-pass com- here a modiﬁed formulation of this classiﬁer is developed ponents and keeping only the low-pass terms [6]. Thus, as in order to optimally integrate it with DWT. The proposed s increases in [1, S], coarser-scale approximations Ws of D method is experimentally validated on optical satellite im- are obtained, while the ﬁnest scale is D itself (which will also ages and compared with previous single-scale and multiscale be denoted by W0 hereafter). More precisely, Ws allows ap- approaches. preciating spatial details that are (2s )-times coarser than W0 The paper is organized as follows. Sec. 2 provides a (s = 1, 2, . . . , S). methodological description of the proposed technique and of the related parameter-estimation issues. Sec. 3 presents the 2.3. MRF-based classiﬁcation stage results of the experimental validation and of the comparisons Ws is modeled as a ﬁnite set Ws = {usk }N of random k=1 with state-of-the-art techniques. Conclusions are drawn in variables, where usk denotes the intensity of the k-th pixel Sec. 4. at the s-th scale (s = 0, 1, . . . , S; k = 1, 2, . . . , N ). Similar to the multiresolution approach in [18], the statistical rela- 2. METHODOLOGY tionships between images at different scales are modeled in terms of “linear mixtures.” In [18] coregistered multispec- 2.1. Overview of the proposed method tral images with different resolutions are used to generate a classiﬁcation map at the ﬁnest resolution. For each coarser- Let I0 and I1 be two coregistered optical images, composed resolution image, a set of “virtual” pixel intensities is assumed of N pixels each and acquired over the same area at times t0 to exist at the ﬁnest resolution such that the pixel intensities and t1 , respectively (t1 > t0 ). We assume I0 and I1 to be at the coarser resolution are modeled by mosaic averaging single-channel images; the reformulation in the multichan- these virtual intensities [18]. Here, this approach is gener- nel case is straightforward. Change detection is formalized alized to multiscale analysis by postulating that, for each Ws as a binary hypothesis testing problem [7], by marking the (s = 1, 2, . . . , S), a set {˜sk }N of virtual pixel intensities u k=1 “change” and “no-change” hypotheses as H1 and H0 , respec- exists at the 0-th scale, such that: tively. Image differencing, which generates a difference im- age D by subtracting pixel-by-pixel the pixel intensities in I0 1 usk = ˜ usr , (1) by the pixel intensities in I1 , is adopted [16]. 4s r∈Qsk The key idea of the proposed method lies in generating a ﬁnite set {W1 , W2 , . . . , WS } of multiscale features by ap- where Qsk is a 2s × 2s window centered on the k-th pixel plying DWT to D (see Sec. 2.2), and to fuse this multiscale ˜ (k = 1, 2, . . . , N ). If s = 0, the identity u0k = u0k holds. information by an MRF approach. MRFs also allow the spa- Since Ws is obtained from W0 through DWT/IDWT, the tial contextual information to be exploited, thus gaining ro- pixel values in Ws could be deterministically expressed as bustness against noise. The MRF classiﬁer proposed in [18] linear combinations of the pixel values in W0 , provided a for images acquired at different spatial resolutions is gener- given DWT operator (e.g., Daubechies, biorthogonal) is cho- alized here to multiscale change detection (see Sec. 2.3). It ˜ sen. In this perspective, the virtual intensities usk may not adopts the “iterated conditional mode” (ICM) approach to be needed. Indeed we propose to adopt the formalization MRF-based classiﬁcation, which usually represents a good based on them, ﬁrst, because this allows extending the Marko- tradeoff between accuracy and computational burden [17]. vian formulation and the parameter-optimization procedures The proposed method is iterative and is initialized by using already developed in [18] for the multiresolution case. Fur- the change map generated by applying to D the (single-scale) thermore, this choice also prevents the need to predeﬁne a speciﬁc DWT at each scale and to plug the analytical ex- regularization term. The parameter λ tunes the relative im- pressions of the impulse responses of the related ﬁlters in the portance between the spatial energy term and the multiscale parameter-estimation process. This allows any DWT to be contributions. Coherently with the Markovianity property, the used in the proposed MRF framework and ensures a higher distribution in Eq. (4) does not depend on all image labels ℓr ﬂexibility. with r = k, but only on the labels of the neighbors of the k-th For each k-th pixel in W0 , let ℓk ∈ {H0 , H1 } be the pixel (through mik ) and on the labels ℓr such that r ∈ Qsk related hypothesis label and xk be the multiscale vector of for some s ∈ {1, 2, . . . , S} (through µsik (θ) and σsik (θ)). all corresponding pixel intensities usk (s = 0, 1, . . . , S; k = We recall that the MRF-based method in [18] assumes the 1, 2, . . . , N ). Given the label conﬁguration {ℓk }N , the vari- k=1 single-resolution images to be independent of one another, ˜ ables usk (k = 1, 2, . . . , N ) are assumed to be condition- when conditioned to the label conﬁguration. Here, this cor- ally independent and identically distributed. The probabil- responds to the fact that single-scale energy terms are ad- ˜ ity density function (PDF) of usk , given ℓk = Hi , is mod- ditively combined in Eq. (5) without modeling the corre- 2 eled as a Gaussian with mean µsi and variance σsi (s = lations among features at different scales. Indeed, since all 0, 1, . . . , S; i = 0, 1). Let the vector θ collect all µsi and σsi Ws features (s = 1, 2, . . . , S) are obtained from W0 through parameters. Thanks to the linearity of DWT/IDWT and of the DWT/IDWT, the conditional independence assumption can- model in Eq. (1), also the PDF of usk , given ℓk = Hi and the not be rigorously fulﬁlled in the present case. We will accept labels ℓr of the pixels r such that r ∈ Qsk , is easily proved to it as a simplifying analytical tractability assumption. be Gaussian (which is a usually accepted model for the statis- As in [18], the EM and the Besag’s methods are itera- tics of D in the case of input optical images [11]), with mean tively used to estimate θ and λ, respectively. EM addresses and variance parameterized by θ and given by [18]: estimation problems characterized by data incompleteness and converges, under mild assumptions, to a local maximum µsik (θ) = E{usk |ℓk = Hi ; ℓr , r ∈ Qsk } = of the log-likelihood function (even though convergence to a = 4−s [nsik µsi + (4s − nsik )µs,1−i ] (2) global maximum is not ensured, usually a good solution is obtained) [15]. The Besag’s method estimates the spatial reg- and ularization parameters of an MRF (here, λ) by maximizing a 2 pseudolikelihood function, obtained by an approximation of σsik (θ) = Var{usk |ℓk = Hi ; ℓr , r ∈ Qsk } = the joint likelihood function of all image labels [4]. = 16−s [nsik σsi + (4s − nsik )σs,1−i ], 2 2 (3) The proposed method is iterative and is initialized with the label map given by K&I (see Sec. 2.4). At the t-th iter- where nsik is the number of pixels r such that r ∈ Qsk and ation, denoting by the superscript “t” the current parameter ℓr = Hi (k = 1, 2, . . . , N ; s = 0, 1, . . . , S; i = 0, 1). estimates and pixel labels, the following operations are per- The label conﬁguration {ℓk }N is assumed to be an k=1 formed (t = 0, 1, 2, . . .): MRF [4]. An isotropic 2nd-order Potts MRF model is adopted. Therefore, the distribution of ℓk , given xk and 1. compute the updated estimate λt , given the current la- all the other image labels is expressed by [18]: bels {ℓt }N , by numerically maximizing the follow- k k=1 ing pseudolikelihood function (Besag’s method) [4]: exp[−Uk (Hi |λ, θ)] P {ℓk = Hi |xk ; ℓr , r = k} = 1 , 1 j=0 exp[−Uk (Hj |λ, θ)] exp(λmt ) (4) Φt (λ) = ik ; (6) i=0 k:ℓt =Hi exp(λmt ) + exp(λmt ) 0k 1k where: k S [usk − µsik (θ)]2 2 2. compute the updated parameter estimate θt by running Uk (Hi |λ, θ) = 2 + ln σsik (θ) −λmik σsik (θ) EM until convergence (see below); s=0 (5) 3. update the label of each k-th pixel according to the is the energy function of the MRF model, mik is the num- MRF minimum-energy rule, by setting ℓt+1 as the label k ber of labels equal to Hi in the 2nd-order neighborhood of t Hi that corresponds to the lowest value of Uk (Hi |λt , θt ) the k-th pixel (k = 1, 2, . . . , N ; i = 0, 1), and λ is a positive (k = 1, 2, . . . , N ; i = 0, 1) [17]. parameter. The energy is a linear combination of (S + 2) con- tributions: the last term is related to the spatial contextual in- Steps 1-3 are iterated until the percentage of pixels with formation, represented by mik according to the Potts model, different labels in successive iterations goes below a prede- and the other terms are related to the information conveyed ﬁned threshold (here, equal to 0.1%). At each t-th itera- by the Hi -conditional distributions of the pixel values at all tion (t = 0, 1, . . .), the numerical maximization in step 1 is considered scales. The Potts energy contribution encourages solved by the Newton-Raphson algorithm, which can be ap- the generation of connected regions whose pixels are assigned plied to this end because Φt (·) is a twice continuously dif- to the same hypothesis, thus essentially representing a spatial ferentiable function [14]. Given the resulting estimate λt of λ and the current label conﬁguration {ℓt }N , EM computes k k=1 initialization purposes, the initial map is used to deﬁne the ¯ in step 2 a sequence {θqt }q=0,1,... of estimates of θ. Thanks initial context of each k-th pixel for the computation of mik ¯ to the convergence properties of EM, θqt converges for q → (i = 0, 1; k = 1, 2, . . . , N ) and the initial estimate of the pa- t +∞ to a limit θ which is the updated estimate of θ resulting rameter vector θ. Speciﬁcally, a desired choice would be to 2 from step 2. Operatively, EM is iterated until the distance compute initial estimates of µsi and σsi as the sample mean ¯ ¯ between θq+1,t and θqt goes below a predeﬁned threshold ˜ and sample variance of usk on the set of samples assigned to ¯ (here, equal to 0.001) and is initialized by setting θ0t = θt−1 Hi in the initial map (i = 0, 1; s = 0, 1, . . . , S). However, (t = 1, 2, . . .). ˜ the values of the virtual intensities usk are available only for Let us denote by ηsk and ξsk the mean and variance of the s = 0. So, as a feasible simple approach, initial estimates 2 ˜ virtual intensity usk , given all multiscale observations and all of µsi and σsi for s = 1, 2, . . . , S are calculated as the sam- image labels, i.e. (s = 0, 1, . . . , S; k = 1, 2, . . . , N ): ple mean and sample variance of usk on the set of samples assigned to Hi in the initial map (i = 0, 1). ηsk = E{˜sk |ℓr , xr , r = 1, 2, . . . , N } u ξsk = Var{˜sk |ℓr , xr , r = 1, 2, . . . , N } u (7) 3. EXPERIMENTAL RESULTS ˜ According to the identity u0k = u0k and to the fact that u0k is among the components of xk , one has η0k = u0k 3.1. Data set and experimental set-up and ξ0k = 0. Thanks to the close similarity between the linear-mixture model of Eq. (1) and the similar model used A multitemporal data set consisting of a pair of coregistered in [18], one can prove through the same analytical calcula- 256 × 256 multispectral Landsat-5 TM images acquired over tions as in [18] that, at each q-th EM iteration, the components the western part of the Island of Elba (Italy) in August and ¯ qt ¯ qt qt µqt and σsi of θqt as well as estimates ηsk and ξsk of ηsk ¯si September 1994, was used for experiments (Figs. 1(a)-1(b)). and ξsk , respectively, are updated as follows (i = 0, 1; s = During the period between the acquisition dates, a ﬁre oc- 0, 1, . . . , S; k = 1, 2, . . . , N ): curred in a forested area in the image. TM bands 4 and 7 were available. Experiments were separately performed with qt qt (¯sj )2 usk − µsjk (θqt ) µ + σ ¯ each band in order to focus on a more difﬁcult classiﬁcation qt ¯sj s · ¯ 2 (θ qt ) for s ≥ 1 ηsk = 4 σsjk problem, as compared with the two-band case, and to conse- u0k for s = 0 quently better highlight the possible advantages of multiscale processing as compared to single-scale approaches. N The proposed method is based on the fusion of multiscale qt δ(ℓt , Hi )ηsr r and contextual information. Its performances were ﬁrst eval- r=1 µq+1,t = ¯si N (8) uated as a function of the choice of the DWT. Then exper- δ(ℓt , Hi ) imental comparisons were conducted with: (i) the technique r r=1 developed in [11] (which is based on the application of K&I to σ qt (¯sj )4 the moving-average ﬁltered difference image and which will qt 2 (¯ ) − σ for s ≥ 1 be denoted hereafter by K&I-Filt) to assess the advantages of qt ξsk = sj 2 ¯ 16s · σsjk (θqt ) the proposed method as compared to a single-scale one; (ii) 0 for s = 0 a variant of the proposed technique based on an MRF model N with no spatial energy (i.e., λ = 0) to focus on the relative δ(ℓt , Hi )[ξsr + (ηsr − µq+1,t )2 ] r qt qt ¯si role of multiscale and contextual information; (iii) a version r=1 adapted to optical data of the “fusion at feature level – all re- σ q+1,t (¯si )2 = , N liable scales” (FFL-ARS) method in [5]. The behavior of the δ(ℓt , Hi ) r proposed technique as a function of the number of scales was r=1 also analyzed. where δ(·, ·) is the Kronecker symbol (i.e., δ(a, b) = 1 if Given the exhaustive test map in Fig. 2(a), all results were a = b and δ(a, b) = 0 otherwise) and j is the index of the quantitatively assessed by computing the detection accuracy hypothesis Hj such that ℓt = Hj . (i.e., the percentage of changed test pixels that were correctly k labeled as changed), the false-alarm rate (i.e., the percent- age of unchanged test pixels that were erroneously labeled 2.4. Initialization by the K&I method as changed) and the overall error rate (i.e., the percentage of K&I is used to generate the initial change map. It automat- erroneously labeled pixels). For each of the two bands, up to ically computes the optimal threshold to be applied to D (in four lower-scale images were generated. As a preliminary ex- order to distinguish between changed and unchanged areas) periment, aimed at assessing the behavior of the method with by minimizing a “criterion function” related to the proba- a very simple multiscale feature extraction, a “neighbor aver- bility of error of a Bayesian binary classiﬁer [10, 11]. For aging” decomposition was used that generates Ws+1 by triv- (a) (b) (c) (d) (e) (f) (g) Fig. 1. Band 4 of the considered data set: images acquired in August (a) and September 1994 (b), difference image (c), and multiscale features extracted by a (2, 8)-order biorthogonal DWT at scales 1 ÷ 4 (d)-(g) (after histogram stretching). ially averaging the pixel intensities of four neighboring pixels in all cases. A similar plot was obtained for band 7 and is in Ws (s = 0, 1, 2, 3). Then, several DWTs were consid- not presented for brevity. The performance was affected by ered, namely, Haar, Daubechies of order ranging in the inter- the choice of the DWT operator. However, for all considered val [2, 30], symlet of order ranging in [2, 20], biorthogonal and DWTs, higher detection accuracies and lower error rates were reverse biorthogonal with the two order parameters ranging in always obtained, as compared with K&I-Filt. The results sug- [1, 6] and [1, 8], respectively, discrete Meyer, and coiﬂet (de- gested that the biorthogonal and reverse biorthogonal families tails about these DWTs can be found in [6]). As an example, provided the highest detection accuracies. Table 1 presents, the multiscale features obtained by applying a (2, 8)-order for each band, the results of three out of the DWTs yielding biorthogonal DWT to band 4 are shown in Figs. 1(d)-1(g), the most accurate change maps. A high 92.71% detection ac- together with the related difference image (Fig. 1(c)). curacy and a 0.41% error rate were obtained on band 4 by us- ing a (2, 8)-order biorthogonal DWT, thus gaining a 20.22% detection accuracy as compared with K&I-Filt. A visual anal- 3.2. Change-detection results ysis of the resulting change maps (see, for instance, Fig. 2(d)) conﬁrms the accurate identiﬁcation of the burnt area with a As shown in Table 1, a high detection accuracy and a low low number of false alarms, and points out that the result is error rate, equal to 91.84% and 0.47%, respectively, were ob- not affected by blocky artifacts. Less accurate results were tained by the proposed method when applied with neighbor obtained with band 7, reaching an 82.23% detection accuracy averaging to band 4. A large 19.35% increase in the detec- and a 1.15% error rate with a (3, 7)-order reverse biorthogo- tion accuracy and a 0.63% reduction in the error rate were nal transform. This conﬁrms a more limited effectiveness of achieved as compared with K&I-Filt. This improvement is band 7 than of band 4 in the considered data set, with respect interpreted as a consequence of the use of multiscale and con- to the problem of burnt forest-area detection. textual information. The same comments are suggested by a visual comparison of the related change maps (Figs. 2(b)- Focusing, for each band, on the DWT providing the high- 2(c)). However, Fig. 2(c) also points out the slight presence of est detection accuracy, the proposed method was applied with undesired blocky artifacts, due to neighbor averaging. Lower all ﬁve scales, while removing the Potts spatial energy com- accuracies, even though better than the ones of K&I-Filt, were ponent (i.e., while setting λ = 0). As compared with the obtained by the proposed technique when applied to band 7 case with the spatial energy, increases in the detection accu- with neighbor-averaging. racy and especially in the false-alarm rate are remarked (Ta- Fig. 4 summarizes the detection accuracies obtained by ble 1). This yields a much worse overall error rate than in the the proposed method, when applied to band 4 with all above- case with the spatial energy. A visual analysis of the related mentioned DWTs. The false-alarm rates were below 0.5% change map (Fig. 2(e)) conﬁrms the presence of many false Table 1. Change-detection performances of the proposed method applied with several DWTs (the values of possible order parameters are in parentheses), its variant with no spatial energy (λ = 0), K&I-Filt, and FFL-ARS. TM Method False-alarm Detection Overall band rate accuracy error rate 4 K&I-Filt 0.08% 72.49% 1.09% neighbor averaging 0.17% 91.84% 0.47% biorthogonal (2, 8) 0.15% 92.71% 0.41% reverse biorthogonal (6, 8) 0.15% 92.29% 0.43% symlet (5) 0.19% 91.30% 0.50% biorthogonal (2, 8) (λ = 0) 1.79% 95.53% 1.89% FFL-ARS 0.70% 86.87% 1.15% 7 K&I-Filt 0.92% 71.29% 1.94% neighbor averaging 0.66% 77.92% 1.45% reverse biorthogonal (3, 7) 0.51% 82.23% 1.15% symlet (4) 0.54% 78.75% 1.31% Daubechies (12) 0.54% 78.62% 1.30% reverse biorthogonal (3, 7) (λ = 0) 1.59% 81.11% 2.23% FFL-ARS 0.84% 63.79% 2.14% alarms when no spatial energy term is used, and points out of the technique proposed here). The results characterized by that spatially more regular maps (especially at the borders be- the lowest overall error rate, as the size of the moving win- tween changed and unchanged areas) are obtained when in- dow was varied between 5 × 5 and 51 × 51, are shown in Ta- cluding this term. These results can be explained by noting ble 1. FFL-ARS provided quite an accurate change map for that the proposed method assumes a Gaussian monomodal band 4, with a higher detection accuracy but also a slightly PDF for each hypothesis, whereas the difference image statis- higher overall error rate than K&I-Filt. However, the method tics is actually the mixture of three populations related to sea, proposed here yielded a lower overall error rate and a higher non-burnt vegetation, and burnt vegetation, respectively. EM detection accuracy than FFL-ARS. A visual analysis of the converges to a solution that correctly detects burnt vegetation related change maps points out an improved spatial regularity as “change” and sea as “no-change” but erroneously splits the of the result given by the proposed technique (see Fig. 2(d)) as non-burnt vegetation mode, thus generating many sparse false compared to the map generated by FFL-ARS (see Fig. 2(f)). alarms. On the other hand, the spatial regularization yielded FFL-ARS gave a slightly lower error rate than the variant of by the Potts energy term penalizes such false alarms. This the proposed method with no spatial energy. This suggests conﬁrms the importance of the MRF approach as a contextual a higher effectiveness of the MRF-based multiscale approach data-fusion tool. as compared to the adaptive multiscale fusion strategy used by FFL-ARS and further conﬁrms the role of the MRF model FFL-ARS was proposed in [5] for unsupervised multi- (similar comments hold for band 7). Note that the extension scale change detection on SAR data. It extracts coarser-scale of FFL-ARS to optical data may be partially heuristic: the features by applying a 4-th order Daubechies SWT to the log- local coefﬁcient of variation is related to speckle statistics in ratio image. Then, for each pixel, the “reliable scales,” i.e., SAR data [5]; when computed on the difference of optical the scales at which possible edges or spatial details can still images, it does not have this data-speciﬁc interpretation, even be appreciated, are adaptively identiﬁed based on a moving- though it still represents a local homogeneity index. window computation of a local coefﬁcient of variation on the ratio image. The features at the reliable scales are av- Finally, focusing again on the DWT with the highest de- eraged together and the resulting image is (manually or auto- tection accuracy, the number of scales was varied in the range matically) thresholded to distinguish changed and unchanged [1, 5] (see Fig. 4). When just one scale is used, the method de- areas. Here, FFL-ARS was adapted to optical data by ap- generates to a fairly standard ICM-based unsupervised classi- plying both SWT (with four lower-scale transforms) and the ﬁcation of the difference image. Higher detection accuracies reliable-scale selection to the difference image, and by using were obtained, as expected, when using all ﬁve scales than in K&I to identify the optimal threshold. This approach was this degenerate case. The differences between the detection used for comparison purposes because no unsupervised mul- accuracies obtained in the two cases were 1.37% and 9.49% tiscale change-detection techniques have been proposed so far with bands 4 and 7, respectively (in both cases the false-alarm for optical data (an exception could be our previous method rates did not exceed 0.8% and 1.5% for bands 4 and 7, respec- in [1], but it may be considered as a more heuristic variant tively). Even slightly higher accuracies were obtained with (a) (b) (c) (d) (e) (f) Fig. 2. Band 4 of the considered data set: test map (a) and change maps obtained by K&I-Filt (b), by the proposed method applied with neighbor averaging (c) and with the biorthogonal transform of order (2, 8) (with (d) or without (e) the spatial energy contribution), and by FFL-ARS (f). Legend: black = “change,” white = “no-change.” two scales on this data set. This conﬁrms the importance of multiscale method developed for SAR change detection [5] multiscale information (with, at least, two scales) in the pro- and adapted here to optical data. The method is based on posed method and also suggests quite a limited sensitivity to stationary wavelet transforms and on an adaptive selection of the number of scales. the optimal scale according to the presence of details and bor- ders. The proposed approach provided more accurate results than this technique, essentially thanks to the improved spatial 4. CONCLUSIONS regularization granted by the adopted Markovian multiscale fusion approach. An unsupervised multiscale contextual change-detection method has been proposed in this paper for optical multi- With all considered wavelets, more accurate change maps temporal images by extracting multiscale wavelet features, were obtained, as compared with the single-scale approach and by combining them through Markovian data fusion. The in [11]. However, the performances were sensitive to the approach proposed in [18] for multiresolution classiﬁcation choice of the DWT and of its possible order parameters (sym- has been modiﬁed and extended here to multiscale change lets, biorthogonal, and reverse biorthogonal wavelets of suit- detection. Experiments on real optical images suggested the able order provided the most accurate change maps on the effectiveness of the proposed technique, which generated ac- considered data set). In this perspective, an interesting future curate change maps, outperforming a single-scale approach extension of this work is the automatic optimization of the based on the application of unsupervised thresholding to the number of scales and of the choice of the DWT. noise-ﬁltered difference image [11]. This improvement was The method has been tested on single-channel images, obtained even when using a trivial neighbor-averaging to ex- but it can also be generalized to multispectral images. From tract the multiscale features; however, blocky artifacts were an analytical viewpoint, the extension is straightforward [18]. generated in this case. On the contrary, no artifacts were Operatively, if several multiscale features are extracted from remarked when using smoother wavelets, which also allowed each band in a multispectral image, the resulting overall num- a further signiﬁcant change-detection accuracy increase to ber of features may be quite large, which could involve di- be obtained as compared with neighbor averaging. These mensionality issues [7]. This would be an interesting aspect results suggest the usefulness of multiscale information for worth being investigated. A further extension of the method change-detection purposes, in order to jointly exploit the could consist in the extraction of textures as spatial-contextual higher robustness to noise at coarser scales and the pres- input features and in the corresponding integration of a non- ence of more precise geometrical details at ﬁner scales. The parametric estimation algorithm for the nongaussian texture proposed technique has also been compared to a previous statistics. Fig. 3. Detection accuracy of the proposed method, when applied to band 4, as a function of the DWT operator (the values of possible order parameters of each DWT are also reported). [9] I NGLADA , J., AND M ERCIER , G. A new statistical similarity measure for change detection in multitemporal SAR images and its extension to multiscale change analysis. IEEE Trans. Geosci. Remote Sensing 45, 5 (2007). [10] K ITTLER , J., AND I LLINGWORTH , J. Minimum error thresh- olding. Pattern Recogn. 19 (1986), 41–47. [11] M ELGANI , F., M OSER , G., AND S ERPICO , S. B. Unsuper- vised change detection methods for remote sensing images. Opt. Eng. 41, 12 (2002), 3288–3297. [12] M OSER , G., AND S ERPICO , S. B. Generalized minimum- Fig. 4. Behavior of the detection accuracy as a function of the error thresholding for unsupervised change detection from number of considered scales, for bands 4 (grey) and 7 (black). SAR amplitude imagery. IEEE Trans. Geosci. Remote Sens- ing 40, 10 (2006), 2972–2982. 5. REFERENCES [13] O UMA , Y. O., J OSAPHAT, S. S., AND TATEISHI , R. Multi- scale remote sensing data segmentation and post-segmentation [1] A NGIATI , E., M OSER , G., AND S ERPICO , S. B. Multiscale change detection based on logical modeling: Theoretical ex- unsupervised change detection by markov random ﬁelds and position and experimental results for forestland cover change wavelet transforms. In Proc. of SPIE-ISPRS XIII, Florence, analysis. Computers and Geosciences 34 (2008), 715–737. Italy, 17-20 September 2007 (2007). [14] P RESS , W. H., T EUKOLSKY, S. A., W ETTERLING , W. T., [2] BAZI , Y., B RUZZONE , L., AND M ELGANI , F. An unsuper- AND F LANNERY, B. P. Numerical recipes in C. Cambridge vised approach based on the generalized Gaussian model to au- University Press, Cambridge (UK), 2002. tomatic change detection in multitemporal SAR images. IEEE [15] R EDNER , R. A., AND WALKER , H. F. Mixture densities, Trans. Geosci. Remote Sensing 43, 4 (2005), 874–887. maximum likelihood, and the EM algorithm. SIAM Review [3] B EAUCHEMIN , M., AND F UNG , K. B. Investigation of mul- 26, 2 (1984), 195–239. tiscale product for change detection in difference images. In [16] S INGH , A. Digital change detection techniques using Proc. of IGARSS-2004, Anchorage, USA (2004), pp. 3853– remotely-sensed data. Int. J. Remote Sens. 10 (1989), 989– 3856. 1003. [4] B ESAG , J. On the statistical analysis of dirty pictures. J. R. [17] S OLBERG , A. H. S., TAXT, T., AND JAIN , A. K. A Markov Statist. Soc. 68 (1986), 259–302. random ﬁeld model for classiﬁcation of multisource satellite [5] B OVOLO , F., AND B RUZZONE , L. A detail-preserving scale- imagery. IEEE Trans. Geosci. Remote Sensing 34, 1 (1996), driven approach to change detection in multitemporal SAR im- 100–113. ages. IEEE Trans. Geosci. Remote Sensing 43, 12 (2005). [18] S TORVIK , G., F JORTOFT, R., AND S OLBERG , A. H. S. A [6] DAUBECHIES , I. Ten lectures on wavelets. SIAM, 1992. Bayesian approach to classiﬁcation of multiresolution remote [7] F UKUNAGA , K. Introduction to statistical pattern recognition. sensing data. IEEE Trans. Geosci. Remote Sensing 43, 3 Academic Press, 1990. (2005), 539–547. [8] H ALL , O., AND H AY, G. J. A multiscale object-speciﬁc ap- proach to digital change detection. Int. J. Appl. Earth Obs. 4, 4 (2003), 311–327.