Gabriele Moser, Elena Angiati, and Sebastiano B. Serpico

                Dept. of Biophysical and Electronic Engineering (DIBE), University of Genoa
                 Via Opera Pia 11a, I-16145, Genova, Italy; e-mail:

                        ABSTRACT                                  be used to distinguish changed and unchanged areas in these
                                                                  difference and (log-)ratio images, either by manual trial-and-
In the context of environmental monitoring and natural or
                                                                  error procedures [16] or by more advanced automatic tech-
industrial disaster management, change-detection methods
                                                                  niques based on parametric Bayesian approaches [2, 11, 12].
represent powerful tools for studying and mapping the evolu-
tion of the Earth’s surface. In order to optimize the accuracy        In order to improve the accuracy of the change maps, a
of the change maps, a multiscale approach can be adopted,         multiscale strategy can be adopted, in which transformed im-
that jointly exploits observations at coarser scales (to glob-    ages at different scales are jointly used [1, 8, 9]. The im-
ally identify changed areas and gain robustness to noise) and     ages at the finest scales are likely to highlight many geomet-
finer scales (to improve the detection of details). In this pa-    rical details, but also to be more affected by noise. Data at
per, a multiscale contextual unsupervised change-detection        coarser scales exhibit less precise details, but a stronger im-
method is proposed for optical images. It is based on discrete    munity to noise. A multiscale approach, exploiting coarser
wavelet transforms and Markov random fields. The image-            scales to globally identify changed areas and finer scales to
differencing approach to change-detection with optical data       improve the detection of details, may represent an effective
is adopted, wavelets are applied to the difference image to ex-   choice. This approach is expected to be particularly promis-
tract multiscale features, and Markovian data fusion is used      ing when dealing with very high resolution images acquired
to integrate both these features and the spatial contextual       by either optical (e.g., QuickBird or IKONOS) or SAR (e.g.,
information in the change-detection process. Expectation-         COSMO/SkyMed, TerraSAR-X) sensors. Multiscale change
maximization and Besag’s algorithms are used to estimate          detection methods for SAR images have been proposed in [9]
the model parameters. Experiments on real optical images          and [5], by using a log-cumulant-based information-theoretic
point out the effectiveness of the method, also as compared       change measure and the combination of undecimated station-
with state-of-the-art techniques.                                 ary wavelet transforms (SWTs) with adaptive optimal scale-
                                                                  selection schemes, respectively. In [1] an unsupervised mul-
    Index Terms— Multiscale change detection, unsuper-            tiscale change-detection method was proposed by applying
vised change detection, discrete wavelet transforms, Markov       a previous unsupervised multiresolution classifier to wavelet
random fields, expectation-maximization, Besag’s algorithm.        transforms of the difference image. Object-oriented analy-
                                                                  sis and watershed segmentation were used in [8] for semi-
                   1. INTRODUCTION                                interactive multiscale change detection on optical data. A
                                                                  semi-interactive approach was also proposed in [13] by com-
Multitemporal remote sensing represents a powerful source         bining wavelets, anisotropic diffusion filtering, and an empir-
of information for monitoring the evolution of the Earth’s sur-   ical procedure for optimal-scale selection. In [3] a heuristic
face, for instance, in applications such as environmental mon-    multiscale change measure for optical data, based on a nor-
itoring or disaster management. A relevant task is the identi-    malized product of several low-pass filtered versions of the
fication of the changes that occurred in a given area between      difference image, was introduced and tested by MonteCarlo
two observation dates [16]. When adopting an unsupervised         simulations.
approach, i.e., when assuming no training data to be available        In this paper, a multiscale contextual unsupervised change-
at any acquisition date, image differencing and image (log-       detection method is proposed for optical images, based on
)ratioing are usually applied to address this task with optical   discrete wavelet transform (DWT) [6] and Markov random
and synthetic aperture radar (SAR) images, respectively [16].     fields (MRFs) [4]. DWT is applied to the difference image
They consist in generating a difference and (log-)ratio image,    to extract multiscale features. It provides a two-dimensional
respectively, by computing pixel by pixel the difference or       multiresolution decomposition of the input (difference) image
(logarithmic) ratio of the pixel intensities in the images ac-    as the superposition of several components, each highlight-
quired at the two observation dates. Thresholding can then        ing the image content at a given scale [6]. MRFs jointly
allows spatial context to be introduced into pixel labeling         change-detection technique proposed in [11] and based on the
problems and different information sources to be fused by           Kittler and Illingworth (K&I) unsupervised thresholding al-
suitably defining “energy functions” [17]. Here, an MRF-             gorithm (see Sec. 2.4) [10, 11].
based approach is proposed by modifying the method intro-
duced in [18] for multiresolution image classification and           2.2. Wavelet transform stage
by combining it with DWTs and image differencing in or-
der to extend it to multiscale change detection. The related        An S-scale multiresolution decomposition of D (S being a
MRF model is characterized by internal parameters that are          predefined number of scales) is obtained by applying a dyadic
estimated by the expectation-maximization (EM) [15] and             DWT. D is decomposed in terms of a low-pass transformed
Besag’s [4] algorithms [18]. The main novelties of the paper        image and of three transformed images conveying high-pass
lie in the extension of the method in [18] from multiresolution     information about fine-scale details along the horizontal im-
classification to multiscale change-detection and in its combi-      age axis, the vertical axis, or both axes. Then, the procedure
nation with wavelets. A similar approach was proposed in our        is recursively applied S times to the low-pass component.
previous work [1] by using a more heuristic combination of          W1 , W2 , . . . , WS are obtained by applying S times the in-
DWT and of the multiresolution classifier in [18]. Unlike [1],       verse transform (IDWT) while neglecting all high-pass com-
here a modified formulation of this classifier is developed           ponents and keeping only the low-pass terms [6]. Thus, as
in order to optimally integrate it with DWT. The proposed           s increases in [1, S], coarser-scale approximations Ws of D
method is experimentally validated on optical satellite im-         are obtained, while the finest scale is D itself (which will also
ages and compared with previous single-scale and multiscale         be denoted by W0 hereafter). More precisely, Ws allows ap-
approaches.                                                         preciating spatial details that are (2s )-times coarser than W0
     The paper is organized as follows. Sec. 2 provides a           (s = 1, 2, . . . , S).
methodological description of the proposed technique and of
the related parameter-estimation issues. Sec. 3 presents the        2.3. MRF-based classification stage
results of the experimental validation and of the comparisons
                                                                    Ws is modeled as a finite set Ws = {usk }N of random k=1
with state-of-the-art techniques. Conclusions are drawn in
                                                                    variables, where usk denotes the intensity of the k-th pixel
Sec. 4.
                                                                    at the s-th scale (s = 0, 1, . . . , S; k = 1, 2, . . . , N ). Similar
                                                                    to the multiresolution approach in [18], the statistical rela-
                   2. METHODOLOGY                                   tionships between images at different scales are modeled in
                                                                    terms of “linear mixtures.” In [18] coregistered multispec-
2.1. Overview of the proposed method                                tral images with different resolutions are used to generate a
                                                                    classification map at the finest resolution. For each coarser-
Let I0 and I1 be two coregistered optical images, composed          resolution image, a set of “virtual” pixel intensities is assumed
of N pixels each and acquired over the same area at times t0        to exist at the finest resolution such that the pixel intensities
and t1 , respectively (t1 > t0 ). We assume I0 and I1 to be         at the coarser resolution are modeled by mosaic averaging
single-channel images; the reformulation in the multichan-          these virtual intensities [18]. Here, this approach is gener-
nel case is straightforward. Change detection is formalized         alized to multiscale analysis by postulating that, for each Ws
as a binary hypothesis testing problem [7], by marking the          (s = 1, 2, . . . , S), a set {˜sk }N of virtual pixel intensities
                                                                                                  u k=1
“change” and “no-change” hypotheses as H1 and H0 , respec-          exists at the 0-th scale, such that:
tively. Image differencing, which generates a difference im-
age D by subtracting pixel-by-pixel the pixel intensities in I0                                    1
                                                                                           usk =                ˜
                                                                                                                usr ,                 (1)
by the pixel intensities in I1 , is adopted [16].                                                  4s
     The key idea of the proposed method lies in generating
a finite set {W1 , W2 , . . . , WS } of multiscale features by ap-   where Qsk is a 2s × 2s window centered on the k-th pixel
plying DWT to D (see Sec. 2.2), and to fuse this multiscale                                                        ˜
                                                                    (k = 1, 2, . . . , N ). If s = 0, the identity u0k = u0k holds.
information by an MRF approach. MRFs also allow the spa-                Since Ws is obtained from W0 through DWT/IDWT, the
tial contextual information to be exploited, thus gaining ro-       pixel values in Ws could be deterministically expressed as
bustness against noise. The MRF classifier proposed in [18]          linear combinations of the pixel values in W0 , provided a
for images acquired at different spatial resolutions is gener-      given DWT operator (e.g., Daubechies, biorthogonal) is cho-
alized here to multiscale change detection (see Sec. 2.3). It                                                             ˜
                                                                    sen. In this perspective, the virtual intensities usk may not
adopts the “iterated conditional mode” (ICM) approach to            be needed. Indeed we propose to adopt the formalization
MRF-based classification, which usually represents a good            based on them, first, because this allows extending the Marko-
tradeoff between accuracy and computational burden [17].            vian formulation and the parameter-optimization procedures
The proposed method is iterative and is initialized by using        already developed in [18] for the multiresolution case. Fur-
the change map generated by applying to D the (single-scale)        thermore, this choice also prevents the need to predefine a
specific DWT at each scale and to plug the analytical ex-               regularization term. The parameter λ tunes the relative im-
pressions of the impulse responses of the related filters in the        portance between the spatial energy term and the multiscale
parameter-estimation process. This allows any DWT to be                contributions. Coherently with the Markovianity property, the
used in the proposed MRF framework and ensures a higher                distribution in Eq. (4) does not depend on all image labels ℓr
flexibility.                                                            with r = k, but only on the labels of the neighbors of the k-th
    For each k-th pixel in W0 , let ℓk ∈ {H0 , H1 } be the             pixel (through mik ) and on the labels ℓr such that r ∈ Qsk
related hypothesis label and xk be the multiscale vector of            for some s ∈ {1, 2, . . . , S} (through µsik (θ) and σsik (θ)).
all corresponding pixel intensities usk (s = 0, 1, . . . , S; k =           We recall that the MRF-based method in [18] assumes the
1, 2, . . . , N ). Given the label configuration {ℓk }N , the vari-
                                                       k=1             single-resolution images to be independent of one another,
ables usk (k = 1, 2, . . . , N ) are assumed to be condition-          when conditioned to the label configuration. Here, this cor-
ally independent and identically distributed. The probabil-            responds to the fact that single-scale energy terms are ad-
ity density function (PDF) of usk , given ℓk = Hi , is mod-            ditively combined in Eq. (5) without modeling the corre-
eled as a Gaussian with mean µsi and variance σsi (s =                 lations among features at different scales. Indeed, since all
0, 1, . . . , S; i = 0, 1). Let the vector θ collect all µsi and σsi   Ws features (s = 1, 2, . . . , S) are obtained from W0 through
parameters. Thanks to the linearity of DWT/IDWT and of the             DWT/IDWT, the conditional independence assumption can-
model in Eq. (1), also the PDF of usk , given ℓk = Hi and the          not be rigorously fulfilled in the present case. We will accept
labels ℓr of the pixels r such that r ∈ Qsk , is easily proved to      it as a simplifying analytical tractability assumption.
be Gaussian (which is a usually accepted model for the statis-              As in [18], the EM and the Besag’s methods are itera-
tics of D in the case of input optical images [11]), with mean         tively used to estimate θ and λ, respectively. EM addresses
and variance parameterized by θ and given by [18]:                     estimation problems characterized by data incompleteness
                                                                       and converges, under mild assumptions, to a local maximum
      µsik (θ) = E{usk |ℓk = Hi ; ℓr , r ∈ Qsk } =                     of the log-likelihood function (even though convergence to a
              = 4−s [nsik µsi + (4s − nsik )µs,1−i ]            (2)    global maximum is not ensured, usually a good solution is
                                                                       obtained) [15]. The Besag’s method estimates the spatial reg-
and                                                                    ularization parameters of an MRF (here, λ) by maximizing a
       2                                                               pseudolikelihood function, obtained by an approximation of
      σsik (θ) = Var{usk |ℓk = Hi ; ℓr , r ∈ Qsk } =
                                                                       the joint likelihood function of all image labels [4].
              = 16−s [nsik σsi + (4s − nsik )σs,1−i ],
                            2                  2
                                                                (3)         The proposed method is iterative and is initialized with
                                                                       the label map given by K&I (see Sec. 2.4). At the t-th iter-
where nsik is the number of pixels r such that r ∈ Qsk and
                                                                       ation, denoting by the superscript “t” the current parameter
ℓr = Hi (k = 1, 2, . . . , N ; s = 0, 1, . . . , S; i = 0, 1).
                                                                       estimates and pixel labels, the following operations are per-
     The label configuration {ℓk }N is assumed to be an
                                      k=1                              formed (t = 0, 1, 2, . . .):
MRF [4]. An isotropic 2nd-order Potts MRF model is
adopted. Therefore, the distribution of ℓk , given xk and                 1. compute the updated estimate λt , given the current la-
all the other image labels is expressed by [18]:                             bels {ℓt }N , by numerically maximizing the follow-
                                                                                    k k=1
                                                                             ing pseudolikelihood function (Besag’s method) [4]:
                                       exp[−Uk (Hi |λ, θ)]
  P {ℓk = Hi |xk ; ℓr , r = k} =      1                         ,                          1
                                      j=0   exp[−Uk (Hj |λ, θ)]                                                exp(λmt )
                                                               (4)             Φt (λ) =                               ik
                                                                                                                               ; (6)
                                                                                          i=0 k:ℓt =Hi
                                                                                                         exp(λmt ) + exp(λmt )
                                                                                                                0k         1k
where:                                                                                           k

                        [usk − µsik (θ)]2           2                     2. compute the updated parameter estimate θt by running
Uk (Hi |λ, θ) =                 2            + ln σsik (θ) −λmik
                             σsik (θ)                                        EM until convergence (see below);
                                                                 (5)      3. update the label of each k-th pixel according to the
is the energy function of the MRF model, mik is the num-                     MRF minimum-energy rule, by setting ℓt+1 as the label
ber of labels equal to Hi in the 2nd-order neighborhood of                                                               t
                                                                             Hi that corresponds to the lowest value of Uk (Hi |λt , θt )
the k-th pixel (k = 1, 2, . . . , N ; i = 0, 1), and λ is a positive         (k = 1, 2, . . . , N ; i = 0, 1) [17].
parameter. The energy is a linear combination of (S + 2) con-
tributions: the last term is related to the spatial contextual in-         Steps 1-3 are iterated until the percentage of pixels with
formation, represented by mik according to the Potts model,            different labels in successive iterations goes below a prede-
and the other terms are related to the information conveyed            fined threshold (here, equal to 0.1%). At each t-th itera-
by the Hi -conditional distributions of the pixel values at all        tion (t = 0, 1, . . .), the numerical maximization in step 1 is
considered scales. The Potts energy contribution encourages            solved by the Newton-Raphson algorithm, which can be ap-
the generation of connected regions whose pixels are assigned          plied to this end because Φt (·) is a twice continuously dif-
to the same hypothesis, thus essentially representing a spatial        ferentiable function [14]. Given the resulting estimate λt of
λ and the current label configuration {ℓt }N , EM computes
                                             k k=1                                initialization purposes, the initial map is used to define the
in step 2 a sequence {θqt }q=0,1,... of estimates of θ. Thanks                    initial context of each k-th pixel for the computation of mik
to the convergence properties of EM, θqt converges for q →                        (i = 0, 1; k = 1, 2, . . . , N ) and the initial estimate of the pa-
+∞ to a limit θ which is the updated estimate of θ resulting                      rameter vector θ. Specifically, a desired choice would be to
from step 2. Operatively, EM is iterated until the distance                       compute initial estimates of µsi and σsi as the sample mean
           ¯            ¯
between θq+1,t and θqt goes below a predefined threshold                                                        ˜
                                                                                  and sample variance of usk on the set of samples assigned to
(here, equal to 0.001) and is initialized by setting θ0t = θt−1                   Hi in the initial map (i = 0, 1; s = 0, 1, . . . , S). However,
(t = 1, 2, . . .).                                                                                                        ˜
                                                                                  the values of the virtual intensities usk are available only for
    Let us denote by ηsk and ξsk the mean and variance of the                     s = 0. So, as a feasible simple approach, initial estimates
virtual intensity usk , given all multiscale observations and all                 of µsi and σsi for s = 1, 2, . . . , S are calculated as the sam-
image labels, i.e. (s = 0, 1, . . . , S; k = 1, 2, . . . , N ):                   ple mean and sample variance of usk on the set of samples
                                                                                  assigned to Hi in the initial map (i = 0, 1).
            ηsk = E{˜sk |ℓr , xr , r = 1, 2, . . . , N }
           ξsk = Var{˜sk |ℓr , xr , r = 1, 2, . . . , N }
                     u                                                      (7)
                                                                                                3. EXPERIMENTAL RESULTS
According to the identity u0k = u0k and to the fact that
u0k is among the components of xk , one has η0k = u0k                             3.1. Data set and experimental set-up
and ξ0k = 0. Thanks to the close similarity between the
linear-mixture model of Eq. (1) and the similar model used                        A multitemporal data set consisting of a pair of coregistered
in [18], one can prove through the same analytical calcula-                       256 × 256 multispectral Landsat-5 TM images acquired over
tions as in [18] that, at each q-th EM iteration, the components                  the western part of the Island of Elba (Italy) in August and
             ¯ qt    ¯                            qt     qt
µqt and σsi of θqt as well as estimates ηsk and ξsk of ηsk
¯si                                                                               September 1994, was used for experiments (Figs. 1(a)-1(b)).
and ξsk , respectively, are updated as follows (i = 0, 1; s =                     During the period between the acquisition dates, a fire oc-
0, 1, . . . , S; k = 1, 2, . . . , N ):                                           curred in a forested area in the image. TM bands 4 and 7
                                                                                  were available. Experiments were separately performed with
                   qt (¯sj )2 usk − µsjk (θqt )
                  µ + σ                          ¯                               each band in order to focus on a more difficult classification
           qt       ¯sj           s
                                       ·      ¯
                                          2 (θ qt )
                                                       for s ≥ 1
        ηsk =                   4        σsjk                                     problem, as compared with the two-band case, and to conse-
                    u0k                                for s = 0                  quently better highlight the possible advantages of multiscale
                                                                                  processing as compared to single-scale approaches.
                                                                                       The proposed method is based on the fusion of multiscale
                           δ(ℓt , Hi )ηsr
                                                                                  and contextual information. Its performances were first eval-
      µq+1,t =
      ¯si               N
                                                                            (8)   uated as a function of the choice of the DWT. Then exper-
                               δ(ℓt , Hi )                                        imental comparisons were conducted with: (i) the technique
                       r=1                                                        developed in [11] (which is based on the application of K&I to
                                        σ qt
                                       (¯sj )4                                    the moving-average filtered difference image and which will
               qt 2
              (¯ ) −
                σ                                        for s ≥ 1                be denoted hereafter by K&I-Filt) to assess the advantages of
      ξsk =           sj                 2    ¯
                                  16s · σsjk (θqt )
                                                                                 the proposed method as compared to a single-scale one; (ii)
                  0                                      for s = 0
                                                                                  a variant of the proposed technique based on an MRF model
                           N                                                      with no spatial energy (i.e., λ = 0) to focus on the relative
                                 δ(ℓt , Hi )[ξsr + (ηsr − µq+1,t )2 ]
                                              qt     qt
                                                          ¯si                     role of multiscale and contextual information; (iii) a version
                           r=1                                                    adapted to optical data of the “fusion at feature level – all re-
       σ q+1,t
      (¯si )2 =                                                         ,
                                             N                                    liable scales” (FFL-ARS) method in [5]. The behavior of the
                                                 δ(ℓt , Hi )
                                                    r                             proposed technique as a function of the number of scales was
                                         r=1                                      also analyzed.
where δ(·, ·) is the Kronecker symbol (i.e., δ(a, b) = 1 if                            Given the exhaustive test map in Fig. 2(a), all results were
a = b and δ(a, b) = 0 otherwise) and j is the index of the                        quantitatively assessed by computing the detection accuracy
hypothesis Hj such that ℓt = Hj .                                                 (i.e., the percentage of changed test pixels that were correctly
                                                                                  labeled as changed), the false-alarm rate (i.e., the percent-
                                                                                  age of unchanged test pixels that were erroneously labeled
2.4. Initialization by the K&I method
                                                                                  as changed) and the overall error rate (i.e., the percentage of
K&I is used to generate the initial change map. It automat-                       erroneously labeled pixels). For each of the two bands, up to
ically computes the optimal threshold to be applied to D (in                      four lower-scale images were generated. As a preliminary ex-
order to distinguish between changed and unchanged areas)                         periment, aimed at assessing the behavior of the method with
by minimizing a “criterion function” related to the proba-                        a very simple multiscale feature extraction, a “neighbor aver-
bility of error of a Bayesian binary classifier [10, 11]. For                      aging” decomposition was used that generates Ws+1 by triv-
                       (a)                         (b)                             (c)                         (d)

                                     (e)                            (f)                          (g)

Fig. 1. Band 4 of the considered data set: images acquired in August (a) and September 1994 (b), difference image (c), and
multiscale features extracted by a (2, 8)-order biorthogonal DWT at scales 1 ÷ 4 (d)-(g) (after histogram stretching).

ially averaging the pixel intensities of four neighboring pixels          in all cases. A similar plot was obtained for band 7 and is
in Ws (s = 0, 1, 2, 3). Then, several DWTs were consid-                   not presented for brevity. The performance was affected by
ered, namely, Haar, Daubechies of order ranging in the inter-             the choice of the DWT operator. However, for all considered
val [2, 30], symlet of order ranging in [2, 20], biorthogonal and         DWTs, higher detection accuracies and lower error rates were
reverse biorthogonal with the two order parameters ranging in             always obtained, as compared with K&I-Filt. The results sug-
[1, 6] and [1, 8], respectively, discrete Meyer, and coiflet (de-          gested that the biorthogonal and reverse biorthogonal families
tails about these DWTs can be found in [6]). As an example,               provided the highest detection accuracies. Table 1 presents,
the multiscale features obtained by applying a (2, 8)-order               for each band, the results of three out of the DWTs yielding
biorthogonal DWT to band 4 are shown in Figs. 1(d)-1(g),                  the most accurate change maps. A high 92.71% detection ac-
together with the related difference image (Fig. 1(c)).                   curacy and a 0.41% error rate were obtained on band 4 by us-
                                                                          ing a (2, 8)-order biorthogonal DWT, thus gaining a 20.22%
                                                                          detection accuracy as compared with K&I-Filt. A visual anal-
3.2. Change-detection results                                             ysis of the resulting change maps (see, for instance, Fig. 2(d))
                                                                          confirms the accurate identification of the burnt area with a
As shown in Table 1, a high detection accuracy and a low
                                                                          low number of false alarms, and points out that the result is
error rate, equal to 91.84% and 0.47%, respectively, were ob-
                                                                          not affected by blocky artifacts. Less accurate results were
tained by the proposed method when applied with neighbor
                                                                          obtained with band 7, reaching an 82.23% detection accuracy
averaging to band 4. A large 19.35% increase in the detec-
                                                                          and a 1.15% error rate with a (3, 7)-order reverse biorthogo-
tion accuracy and a 0.63% reduction in the error rate were
                                                                          nal transform. This confirms a more limited effectiveness of
achieved as compared with K&I-Filt. This improvement is
                                                                          band 7 than of band 4 in the considered data set, with respect
interpreted as a consequence of the use of multiscale and con-
                                                                          to the problem of burnt forest-area detection.
textual information. The same comments are suggested by
a visual comparison of the related change maps (Figs. 2(b)-                   Focusing, for each band, on the DWT providing the high-
2(c)). However, Fig. 2(c) also points out the slight presence of          est detection accuracy, the proposed method was applied with
undesired blocky artifacts, due to neighbor averaging. Lower              all five scales, while removing the Potts spatial energy com-
accuracies, even though better than the ones of K&I-Filt, were            ponent (i.e., while setting λ = 0). As compared with the
obtained by the proposed technique when applied to band 7                 case with the spatial energy, increases in the detection accu-
with neighbor-averaging.                                                  racy and especially in the false-alarm rate are remarked (Ta-
    Fig. 4 summarizes the detection accuracies obtained by                ble 1). This yields a much worse overall error rate than in the
the proposed method, when applied to band 4 with all above-               case with the spatial energy. A visual analysis of the related
mentioned DWTs. The false-alarm rates were below 0.5%                     change map (Fig. 2(e)) confirms the presence of many false
Table 1. Change-detection performances of the proposed method applied with several DWTs (the values of possible order
parameters are in parentheses), its variant with no spatial energy (λ = 0), K&I-Filt, and FFL-ARS.
                      TM                     Method                   False-alarm Detection     Overall
                      band                                                rate       accuracy error rate
                       4                     K&I-Filt                   0.08%         72.49%    1.09%
                                       neighbor averaging               0.17%         91.84%    0.47%
                                       biorthogonal (2, 8)              0.15%         92.71%    0.41%
                                   reverse biorthogonal (6, 8)          0.15%         92.29%    0.43%
                                            symlet (5)                  0.19%         91.30%    0.50%
                                   biorthogonal (2, 8) (λ = 0)          1.79%         95.53%    1.89%
                                            FFL-ARS                     0.70%         86.87%    1.15%
                       7                     K&I-Filt                   0.92%         71.29%    1.94%
                                       neighbor averaging               0.66%         77.92%    1.45%
                                   reverse biorthogonal (3, 7)          0.51%         82.23%    1.15%
                                            symlet (4)                  0.54%         78.75%    1.31%
                                        Daubechies (12)                 0.54%         78.62%    1.30%
                             reverse biorthogonal (3, 7) (λ = 0)        1.59%         81.11%    2.23%
                                            FFL-ARS                     0.84%         63.79%    2.14%

alarms when no spatial energy term is used, and points out          of the technique proposed here). The results characterized by
that spatially more regular maps (especially at the borders be-     the lowest overall error rate, as the size of the moving win-
tween changed and unchanged areas) are obtained when in-            dow was varied between 5 × 5 and 51 × 51, are shown in Ta-
cluding this term. These results can be explained by noting         ble 1. FFL-ARS provided quite an accurate change map for
that the proposed method assumes a Gaussian monomodal               band 4, with a higher detection accuracy but also a slightly
PDF for each hypothesis, whereas the difference image statis-       higher overall error rate than K&I-Filt. However, the method
tics is actually the mixture of three populations related to sea,   proposed here yielded a lower overall error rate and a higher
non-burnt vegetation, and burnt vegetation, respectively. EM        detection accuracy than FFL-ARS. A visual analysis of the
converges to a solution that correctly detects burnt vegetation     related change maps points out an improved spatial regularity
as “change” and sea as “no-change” but erroneously splits the       of the result given by the proposed technique (see Fig. 2(d)) as
non-burnt vegetation mode, thus generating many sparse false        compared to the map generated by FFL-ARS (see Fig. 2(f)).
alarms. On the other hand, the spatial regularization yielded       FFL-ARS gave a slightly lower error rate than the variant of
by the Potts energy term penalizes such false alarms. This          the proposed method with no spatial energy. This suggests
confirms the importance of the MRF approach as a contextual          a higher effectiveness of the MRF-based multiscale approach
data-fusion tool.                                                   as compared to the adaptive multiscale fusion strategy used
                                                                    by FFL-ARS and further confirms the role of the MRF model
    FFL-ARS was proposed in [5] for unsupervised multi-             (similar comments hold for band 7). Note that the extension
scale change detection on SAR data. It extracts coarser-scale       of FFL-ARS to optical data may be partially heuristic: the
features by applying a 4-th order Daubechies SWT to the log-        local coefficient of variation is related to speckle statistics in
ratio image. Then, for each pixel, the “reliable scales,” i.e.,     SAR data [5]; when computed on the difference of optical
the scales at which possible edges or spatial details can still     images, it does not have this data-specific interpretation, even
be appreciated, are adaptively identified based on a moving-         though it still represents a local homogeneity index.
window computation of a local coefficient of variation on
the ratio image. The features at the reliable scales are av-             Finally, focusing again on the DWT with the highest de-
eraged together and the resulting image is (manually or auto-       tection accuracy, the number of scales was varied in the range
matically) thresholded to distinguish changed and unchanged         [1, 5] (see Fig. 4). When just one scale is used, the method de-
areas. Here, FFL-ARS was adapted to optical data by ap-             generates to a fairly standard ICM-based unsupervised classi-
plying both SWT (with four lower-scale transforms) and the          fication of the difference image. Higher detection accuracies
reliable-scale selection to the difference image, and by using      were obtained, as expected, when using all five scales than in
K&I to identify the optimal threshold. This approach was            this degenerate case. The differences between the detection
used for comparison purposes because no unsupervised mul-           accuracies obtained in the two cases were 1.37% and 9.49%
tiscale change-detection techniques have been proposed so far       with bands 4 and 7, respectively (in both cases the false-alarm
for optical data (an exception could be our previous method         rates did not exceed 0.8% and 1.5% for bands 4 and 7, respec-
in [1], but it may be considered as a more heuristic variant        tively). Even slightly higher accuracies were obtained with
                      (a)                        (b)                        (c)                         (d)

                                                 (e)                        (f)

Fig. 2. Band 4 of the considered data set: test map (a) and change maps obtained by K&I-Filt (b), by the proposed method
applied with neighbor averaging (c) and with the biorthogonal transform of order (2, 8) (with (d) or without (e) the spatial
energy contribution), and by FFL-ARS (f). Legend: black = “change,” white = “no-change.”

two scales on this data set. This confirms the importance of       multiscale method developed for SAR change detection [5]
multiscale information (with, at least, two scales) in the pro-   and adapted here to optical data. The method is based on
posed method and also suggests quite a limited sensitivity to     stationary wavelet transforms and on an adaptive selection of
the number of scales.                                             the optimal scale according to the presence of details and bor-
                                                                  ders. The proposed approach provided more accurate results
                                                                  than this technique, essentially thanks to the improved spatial
                    4. CONCLUSIONS                                regularization granted by the adopted Markovian multiscale
                                                                  fusion approach.
An unsupervised multiscale contextual change-detection
method has been proposed in this paper for optical multi-             With all considered wavelets, more accurate change maps
temporal images by extracting multiscale wavelet features,        were obtained, as compared with the single-scale approach
and by combining them through Markovian data fusion. The          in [11]. However, the performances were sensitive to the
approach proposed in [18] for multiresolution classification       choice of the DWT and of its possible order parameters (sym-
has been modified and extended here to multiscale change           lets, biorthogonal, and reverse biorthogonal wavelets of suit-
detection. Experiments on real optical images suggested the       able order provided the most accurate change maps on the
effectiveness of the proposed technique, which generated ac-      considered data set). In this perspective, an interesting future
curate change maps, outperforming a single-scale approach         extension of this work is the automatic optimization of the
based on the application of unsupervised thresholding to the      number of scales and of the choice of the DWT.
noise-filtered difference image [11]. This improvement was             The method has been tested on single-channel images,
obtained even when using a trivial neighbor-averaging to ex-      but it can also be generalized to multispectral images. From
tract the multiscale features; however, blocky artifacts were     an analytical viewpoint, the extension is straightforward [18].
generated in this case. On the contrary, no artifacts were        Operatively, if several multiscale features are extracted from
remarked when using smoother wavelets, which also allowed         each band in a multispectral image, the resulting overall num-
a further significant change-detection accuracy increase to        ber of features may be quite large, which could involve di-
be obtained as compared with neighbor averaging. These            mensionality issues [7]. This would be an interesting aspect
results suggest the usefulness of multiscale information for      worth being investigated. A further extension of the method
change-detection purposes, in order to jointly exploit the        could consist in the extraction of textures as spatial-contextual
higher robustness to noise at coarser scales and the pres-        input features and in the corresponding integration of a non-
ence of more precise geometrical details at finer scales. The      parametric estimation algorithm for the nongaussian texture
proposed technique has also been compared to a previous           statistics.
Fig. 3. Detection accuracy of the proposed method, when applied to band 4, as a function of the DWT operator (the values of
possible order parameters of each DWT are also reported).

                                                                         [9] I NGLADA , J., AND M ERCIER , G. A new statistical similarity
                                                                             measure for change detection in multitemporal SAR images
                                                                             and its extension to multiscale change analysis. IEEE Trans.
                                                                             Geosci. Remote Sensing 45, 5 (2007).
                                                                        [10] K ITTLER , J., AND I LLINGWORTH , J. Minimum error thresh-
                                                                             olding. Pattern Recogn. 19 (1986), 41–47.
                                                                        [11] M ELGANI , F., M OSER , G., AND S ERPICO , S. B. Unsuper-
                                                                             vised change detection methods for remote sensing images.
                                                                             Opt. Eng. 41, 12 (2002), 3288–3297.
                                                                        [12] M OSER , G., AND S ERPICO , S. B. Generalized minimum-
Fig. 4. Behavior of the detection accuracy as a function of the              error thresholding for unsupervised change detection from
number of considered scales, for bands 4 (grey) and 7 (black).               SAR amplitude imagery. IEEE Trans. Geosci. Remote Sens-
                                                                             ing 40, 10 (2006), 2972–2982.
                      5. REFERENCES                                     [13] O UMA , Y. O., J OSAPHAT, S. S., AND TATEISHI , R. Multi-
                                                                             scale remote sensing data segmentation and post-segmentation
 [1] A NGIATI , E., M OSER , G., AND S ERPICO , S. B. Multiscale             change detection based on logical modeling: Theoretical ex-
     unsupervised change detection by markov random fields and                position and experimental results for forestland cover change
     wavelet transforms. In Proc. of SPIE-ISPRS XIII, Florence,              analysis. Computers and Geosciences 34 (2008), 715–737.
     Italy, 17-20 September 2007 (2007).                                [14] P RESS , W. H., T EUKOLSKY, S. A., W ETTERLING , W. T.,
 [2] BAZI , Y., B RUZZONE , L., AND M ELGANI , F. An unsuper-                AND F LANNERY, B. P. Numerical recipes in C. Cambridge
     vised approach based on the generalized Gaussian model to au-           University Press, Cambridge (UK), 2002.
     tomatic change detection in multitemporal SAR images. IEEE         [15] R EDNER , R. A., AND WALKER , H. F. Mixture densities,
     Trans. Geosci. Remote Sensing 43, 4 (2005), 874–887.                    maximum likelihood, and the EM algorithm. SIAM Review
 [3] B EAUCHEMIN , M., AND F UNG , K. B. Investigation of mul-               26, 2 (1984), 195–239.
     tiscale product for change detection in difference images. In
                                                                        [16] S INGH , A.    Digital change detection techniques using
     Proc. of IGARSS-2004, Anchorage, USA (2004), pp. 3853–
                                                                             remotely-sensed data. Int. J. Remote Sens. 10 (1989), 989–
 [4] B ESAG , J. On the statistical analysis of dirty pictures. J. R.
                                                                        [17] S OLBERG , A. H. S., TAXT, T., AND JAIN , A. K. A Markov
     Statist. Soc. 68 (1986), 259–302.
                                                                             random field model for classification of multisource satellite
 [5] B OVOLO , F., AND B RUZZONE , L. A detail-preserving scale-             imagery. IEEE Trans. Geosci. Remote Sensing 34, 1 (1996),
     driven approach to change detection in multitemporal SAR im-            100–113.
     ages. IEEE Trans. Geosci. Remote Sensing 43, 12 (2005).
                                                                        [18] S TORVIK , G., F JORTOFT, R., AND S OLBERG , A. H. S. A
 [6] DAUBECHIES , I. Ten lectures on wavelets. SIAM, 1992.                   Bayesian approach to classification of multiresolution remote
 [7] F UKUNAGA , K. Introduction to statistical pattern recognition.         sensing data. IEEE Trans. Geosci. Remote Sensing 43, 3
     Academic Press, 1990.                                                   (2005), 539–547.
 [8] H ALL , O., AND H AY, G. J. A multiscale object-specific ap-
     proach to digital change detection. Int. J. Appl. Earth Obs. 4,
     4 (2003), 311–327.

To top