Component Analysis Approach to Estimation of Tissue Intensity Distributions of 3D Images by n.rajbharath


									838                                                                                IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 30, NO. 3, MARCH 2011

       Component Analysis Approach to Estimation of
         Tissue Intensity Distributions of 3D Images
                      Vitali Zagorodnov*, Member, IEEE, and Arridhana Ciptadi, Student Member, IEEE

   Abstract—Many segmentation algorithms in medical imaging                      ally in the context of brain tissue segmentation (FreeSurfer [20],
rely on accurate modeling and estimation of tissue intensity proba-              SIENAX [21] part of FSL [22], SPM [16]).
bility density functions. Gaussian mixture modeling, currently the                  The main drawback of FGM models is that the tissue in-
most common approach, has several drawbacks, such as reliance
on a Gaussian model and iterative local optimization used to                     tensity distributions do not always have a Gaussian form. The
estimate the model parameters. It also does not take advantage of                noise in magnetic resonance (MR) images is known to be Ri-
substantially larger amount of data provided by 3D acquisitions,                 cian rather than Gaussian [23]. But the largest deviation from
which are becoming standard in clinical environment. We propose                  Gaussianity is due to presence of partial volume (PV) voxels [6],
a novel and completely non-parametric algorithm to estimate                      [8], [24]–[27]. One way to model such voxels is by representing
the tissue intensity probabilities in 3D images. Instead of relying
on traditional framework of iterating between classification and                  their distribution using a uniform mixture of “pure” distributions
estimation, we pose the problem as an instance of a blind source                 [8], [25], [28]. However, this complicates the optimization as the
separation problem, where the unknown distributions are treated                  functional involved becomes considerably more nonlinear. Uni-
as sources and histograms of image subvolumes as mixtures. The                   form mixture assumption has also been questioned [27]. To sim-
new approach performed well on synthetic data and real magnetic                  plify the problem several researchers suggested modeling par-
resonance imaging (MRI) scans of the brain, robustly capturing
intensity distributions of even small image structures and partial               tial volume voxels intensity distributions as Gaussian, which ap-
volume voxels.                                                                   pears to be a good approximation for sufficiently noisy images
                                                                                 [6]. This approximation has been incorporated into the most re-
   Index Terms—Blind source separation, Gaussian mixtures,
image segmentation, magnetic resonance imaging (MRI), tissue
                                                                                 cent versions of SPM [16]. However, a more recent study [29]
intensity distributions.                                                         revealed that model [28] can still achieve better fit.
                                                                                    Another issue associated with              framework is poten-
                                                                                 tial convergence to a poor local optimum, which means a suf-
                           I. INTRODUCTION                                       ficiently close parameter initialization is usually required [4],
                                                                                 [13], especially for distribution means [29]. The convergence of
                                                                                 the EM algorithm to a more meaningful optimum can be im-
         ANY segmentation algorithms in medical imaging rely
M        on accurate modeling and estimation of tissue intensity
probability density functions (pdfs) [1]–[12], usually in the con-
                                                                                 proved by including prior information in the classification step,
                                                                                 such as pixel correlations [30], MRF priors [7], [25], [31], [32]
                                                                                 or probabilistic atlas [5], [9], [31]–[33]. Even though this ap-
text of statistical region-based segmentation. Commonly, tissue                  proach is helpful, especially for very noisy images [7], it can
intensity probabilities are modeled using the finite mixture (FM)                 also introduce bias in estimation [30]. And while probabilistic
model [2], [13], [14], and its special case the finite Gaussian                   atlas is useful when segmenting brain tissues [5] or abdominal
mixture (FGM) model [15], [16]. In these models the intensity                    organs [9], its construction is not always possible, as is the case
pdf of each tissue class is represented by a parametric (e.g.,                   in the segmentation of brain lesions [34], tumor detection [35],
Gaussian in the case of FGM) function called the component                       or localization of fMRI activations.
density, while the intensity pdf of the whole image is modeled by                   Finally, the               approach often fails to take advan-
a weighted sum of the tissue component densities. The fitting is                  tage of substantially larger amount of data present in 3D images,
usually done using Expectation Maximization (EM) algorithm                       such as those obtained by magnetic resonance (MR) and X-ray
[1], [3], [6], [9]–[11], [17]–[19], which iterates between classi-               computed tomography (CT) scanning techniques. We propose a
fication and parameter estimation until a stable state is reached.                novel nonparametric algorithm to estimate tissue intensity prob-
The FM models combined with EM algorithm have been incor-                        abilities in 3D images that completely departs from traditional
porated into many existing image segmentation pipelines, usu-                    classification-estimation framework. To illustrate the main idea
                                                                                 behind our approach, consider the following introductory ex-
   Manuscript received October 19, 2010; accepted November 29, 2010. Date        ample.
of publication December 17, 2010; date of current version March 02, 2011. This      Shown in Fig. 1 are the histograms of a 3D T1-weighted MR
work was supported by SBIC C-012/2006 grant provided by A*STAR, Singa-
pore (Agency for Science and Technology and Research). Asterisk indicates
                                                                                 image and two of its 2D slices. The observed variability in the
corresponding author.                                                            shape of 2D histograms is due to varying tissue proportions
   *V. Zagorodnov is with the School of Computer Engineering, Nanyang Tech-      across the slices. For example, the slice in Fig. 1(b) contains
nological University, 639798 Singapore.                                          relatively small amount of cerebro-spinal fluid (CSF) and hence
   A. Ciptadi is with the School of Computer Engineering, Nanyang Technolog-
ical University, 639798 Singapore.                                               the lowest intensity peak (see Fig. 1(c) for reference) is practi-
   Digital Object Identifier 10.1109/TMI.2010.2098417                             cally missing from its histogram. The proportion of CSF is in-
                                                               0278-0062/$26.00 © 2010 IEEE

                                                                                                    II. PROBLEM STATEMENT
                                                                                    Let    be a 3D image volume partitioned into a set of
                                                                                 subvolumes                    . We assume the voxel intensities of
                                                                                    can take distinct values and are drawn from          unknown
                                                                                 probability mass functions (pmf)                         . For ex-
                                                                                 ample, a brain volume can be assumed to have three main tis-
                                                                                 sues, white matter (WM), gray matter (GM), and cerebro-spinal
                                                                                 fluid (CSF), so          . For an 8-bit acquisition,         . Sub-
                                                                                 volumes can be chosen arbitrary, for example as coronal, sagittal
                                                                                 or transverse slices of the 3D volume.
                                                                                    Let            be the -bin histogram of , normalized to
                                                                                 sum to 1. Then


                                                                                 where      is the th tissue proportion in the th subvolume,
                                                                                           , and is the noise term that reflects the difference
                                                                                 between the actual probability distribution and its finite approx-
                                                                                 imation by a histogram.
                                                                                   Let                       and              . Rewriting (1) in a
                                                                                 matrix form yields

Fig. 1. Histograms of a 3D brain image and several of its slices. (a) 3D Image                                                                 (2)
and its histogram. (b) Transverse slice 128 and its histogram. (c) Transverse
slice 152 and its histogram.

                                                                                 This is identical to blind source separation (BSS) formulation
creased in the slice shown in Fig. 1(c) due to the presence of                   with subvolume histograms as mixtures and unknown tissue
ventricles, leading to reappearance of the lowest intensity peak.                pmf’s as sources. Our goal is to estimate               as well
This slice, however, contains only a small amount of gray matter                 as their mixing weights given .
(GM), thus lowering the middle peak of the histogram. While                         Our solution requires several assumptions, most of which
this variability can potentially provide useful information for                  are general to a BSS problem:                       , and suffi-
mixture estimation, it is traditionally discarded by performing                  cient variability of mixing proportions . These can be easily
estimation directly on the histogram of the whole volume.                        satisfied with proper choice of partitioning and histogram di-
   The proposed approach treats the histograms of 2D slices (or                  mensionality. We also assume that distributions
any other subvolumes) as mixture realizations of the compo-                      have different means and are sufficiently separated from each
nent densities. This allows stating the unmixing problem in a                    other, where the meaning of sufficient separation is detailed in
blind source separation (BSS) framework [37], [38]. To solve                     Section III. These assumptions are not very restrictive and are
the problem we use a framework that is similar to that of in-                    generally satisfied for medical images [15], [40].
dependent component analysis (ICA) [39], but without relying
on the independence assumption. Instead we use the fact that
                                                                                                    III. PROPOSED SOLUTION
underlying components must be valid probability distributions
with different means, which results in a simple convex linear                       BSS problem has been studied extensively in recent years,
optimization problem that guarantees convergence to a global                     with several solutions proposed for selected special cases, e.g.,
optimum (ICA’s iterative procedure, in comparison, converges                     factor analysis (FA) [41] for Gaussian sources and independent
only to a local optimum). Our approach provides a promising al-                  factor analysis (IFA) [42] or independent component analysis
ternative for estimating component densities in 3D images that                   (ICA) [39] for independent non-Gaussian sources. The general
is more accurate compared to the state-of-the-art approaches.                    framework of performing decomposition (2), as exemplified by
   We note that a preliminary version of this algorithm was                      ICA, can be briefly summarized as follows.
previously reported in a conference paper [36]. The present                        1) Center matrix so that all columns sum to zero, by sub-
paper provides a complete description of the algorithm, intro-                        tracting row vector                            from all rows
duces covariance matrix correction to improve its performance,                        of . Let designate the centered version of .
and presents a more complete set of validation experiments,                        2) Apply principal component analysis (PCA) to
including evaluation of effects of intensity nonuniformity and                             . Practically, this can be done by performing SVD de-
individual algorithm components/settings (subvolume size,                             composition,                     and setting             and
relaxation of the number principal components, correction of                                    . The rows of matrix contain a set of orthogonal
the covariance matrix).                                                               principal components and matrix         contains the weights.
840                                                                              IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 30, NO. 3, MARCH 2011

Fig. 2. Explanation of definitions used in Lemma 1. (a) Two original distributions f and f with means  and  , respectively, (b) Estimated PCA components
p and p , (c) Various linear combinations of PCA components.

  3) Decomposition of the original (noncentered) matrix                       A. Estimating Components With Largest and Smallest Means
           can be done by changing        to                      ,
                                                                                 Let and        be the components with the smallest and largest
     where is a column vector where all elements are equal to
                                                                              mean respectively. Then these can be estimated by minimizing
                                                                              (for ) or maximizing (for          ) the mean of a linear combi-
  4) Truncate and to preserve a smaller (specified) number
                                                                              nation of ’s, where coefficients of linear combination satisfy
     of PCA components corresponding to the largest eigen-
                                                                              constraints (4) and (5). This follows from the following lemma,
     values of         . The square roots of these eigenvalues are
                                                                              which proves this statement for          , and its corollary, which
     located on the diagonal of . The resultant decomposition
                                                                              extends the result to arbitrary .
     has the form                  , where the “hat” symbol refers
                                                                                 Lemma 1: Let            and             be the means of under-
     to truncation, and appearance of error vector is due to
                                                                              lying probability distributions. Let                          be an
     discarding of the components.
                                                                              arbitrary linear combination of ’s that satisfies
  5) Further decompose as                    , so that            .
                                                                              and            for            . Then,
     In practice, this step is performed in the opposite direction,

                                                                       (3)                                                                           (6)

      where rows of are estimated one-by-one by maximizing                    where
      some measure of non-Gaussianity of the resultant compo-
      nents (rows of matrix ).
   Note that this approach cannot be applied directly to our
problem, because the rows of (step 5) are not constrained to
represent valid probability distribution functions (non-negative              The equalities are achieved when                            and
and sum to 1). Furthermore, maximizing non-Gaussianity                                                .
might not result in a meaningful solution as our source com-                      Proof: Explanations of the terms used in the lemma state-
ponents are neither Gaussian nor independent. However, the                    ment are given in Fig. 2. From               and (3) it follows
ICA framework can be adapted by preserving steps 1–4, and                     that is also a linear combination of   ’s
modifying step 5 as follows.
   Let                be a set of preserved PCA components,
then                          . Let           be a column vector
where all elements are equal to 1. Since the component densities
   represent valid pmf’s and hence are non-negative and sum to                Then
1, this imposes equality and         inequality constraints on the
elements of

                                                                       (4)    and


   These constraints can be combined with the assumptions that                                                                                       (7)
component densities have different means and are sufficiently
separated from each other to estimate . This is done in the                   Let            and                . Then,                       and
following sections.                                                             is minimized by making          as large as possible. However, the

largest possible   is controlled by non-negativity of                B. Estimating Remaining Components
                                                                        In this section we show that all remaining components can
                                                                     be estimated iteratively, one-by-one, by minimizing their inner
                                                                     product (effectively overlap) with the components that have al-
                                                                     ready been estimated, where optimization is reduced to solving
                                                                     another Linear Programming problem. The following lemma
                                                                     shows this for         , while its corollary extends the result to
This leads to the left side of inequality (6). The right side of             . Here throughout       designates the inner product and
inequality (6) can be obtained in similar fashion, by setting              is the    norm.
     and              .                                                 Lemma 2: Let            and
   The intuition behind Lemma 1 can be derived from Fig. 2(c),                , where        have been estimated using (9) and (10).
which shows several linear combinations of principal compo-          Then the      coefficients of                   are the solution of
nents estimated on the basis of a mixture of two Gaussian dis-       the following linear programming problem:
tributions with means               and             . Some linear
combinations, such as                    in this example, will be                                                                   (11)
rejected because they contain negative values. The combination
               appears very similar to , but will also be rejected
because of the negative bump aligned with the peak of . Fur-              Proof: The sum of the inner products between , an arbi-
ther decreasing the coefficient in front will lead to disappear-      trary linear combination of ’s, and        is
ance of this bump and convergence to , the component with
minimum mean. Combination                     is non-negative, but
its mean is slightly less than            because of the presence
of a small positive bump aligned with the peak of . Reducing
the coefficient in front of will remove the bump and increase
the mean of this combination, converging to , the component
with maximum mean.
                                                                     Let                          . The partial derivatives of this func-
   In practice, ’s are usually small and can be discarded.
                                                                     tion are
For Gaussian components on unbounded domain, it can
be straightforwardly shown that
                             and hence                .
   If we ignore and , the equalities in (6) are achieved when
          or         . Similar statement can be made in case of
more than two components, see the following corollary.
   Corollary 1: Let             and                             be   Since            is a linear function of    ’s,             and
the means of underlying probability distributions. Let                                   , its minimum occurs at                    ,
               satisfy               and           for           .   where      is the coordinate along which has the smallest slope,
Then                                                                 i.e.,

                                                                     According to the lemma conditions,
The equalities holds when             or        .                                        , hence                         . Therefore,
   In practice, the coefficients of linear combination of ’s, cor-    the minimum value of           is achieved when
responding to components with smallest and largest mean, can         and          and                     .
be estimated by solving the following linear programming opti-          The necessary condition in Lemma 2,
mization problems, respectively:                                                         , can be interpreted as sufficient separation
                                                                     between the underlying components, since these are non-nega-
                                                                     tive. More specifically, it requires that the sum of overlaps be-
                                                                     tween and and between and is smaller than the norm
                                                              (9)    of or . This translates into a minimum signal-to-noise ratio
                                                                     (SNR) of 1.66 in the case of Gaussian components. This require-
                                                                     ment can be easily satisfied for most medical images, where typ-
                                                                     ical SNRs usually exceed 3.
                                                             (10)       In the case of more than three components, starting from the
                                                                     first two estimated components, all other components can be es-
                                                                     timated iteratively one-by-one by minimizing their overlap with
The next section discusses estimation of the remaining compo-        all previously estimated components, as shown in the following
nents, assuming their number is larger than two.                     lemma.
842                                                                                  IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 30, NO. 3, MARCH 2011

                                                                                   The value of the histogram at the th point can be thought of as a
                                                                                   sum of     binary random variables, each representing a random
                                                                                   draw that has       chance of success, and hence has a binomial
                                                                                   distribution with mean          and variance                    .
                                                                                   The normalized histogram          (the histogram scaled to sum
                                                                                   to 1) then has mean       and variance                          ,
                                                                                   which means the approximation noise
                                                                                   has zero mean and variance

Fig. 3. (a) Piecewise constant probability density function and (b) its approxi-                                                                (14)
mation by histogram.

                                                                                   Note that the variance increases almost linearly with       , con-
                                                                                   sistent with our observation in Fig. 3.
  Lemma 3: Let             be the underlying components, of
                                                                                      The influence of the histogram noise can be reduced by es-
which the first are already known. Let
                                                                                   timating its contribution to the covariance matrix and subse-
                                                                          (12)     quently removing it. Let                        , then the covari-
                                                                                   ance matrix can be written as follows:

Then                , where   are solutions of the following
linear programming problem, will coincide with one of the re-
maining unknown components:


                                                                                   The first term corresponds to the covariance matrix that has not
     Proof: Let function                                   . Then
                                                                                   been affected by noise. Since and            are uncorrelated,
                       . As mentioned in Lemma 2, the minimum
                                                                                   the second and third terms can be neglected
of occurs when                             , where is the coor-
dinate along which has the smallest slope. According to (12),
the minimum must correspond to one of the unknown compo-
nents, i.e.,     .                                                                                                                              (15)
   Note that one of the terms on the right side of inequality (12)
is equal to the norm of a component, while all other terms on                      The    last term     is the      covariance matrix           of
both sides correspond to overlaps between components. Hence                        the     histogram     noise.      Its   nondiagonal     entries
condition (12) can be again interpreted as sufficient separation                                                                           because
between the underlying components.                                                        and        are zero mean and uncorrelated. The diagonal
                                                                                   entries can be approximated using (14) as

   Histogram is an approximation of the underlying probability
distribution (or probability mass function, for images). The dif-
ference between the two is due to finite set of data used for
histogram evaluation and can be treated as noise. When iden-                       When all subvolumes have equal size
tically distributed, this approximation noise is unlikely to alter
the directions of principal components, a property widely ex-                                                                                   (16)
ploited by PCA-based denoising. However, the histogram noise
may not identically distributed, as shown in the comparison be-                      Subtracting estimated        from (15) yields the corrected co-
tween a piecewise uniform probability mass function and its ap-                    variance matrix.
proximation by a histogram, evaluated using 2000 data points
(Fig. 3). Note that the approximation noise has larger variance                                         V. IMPLEMENTATION
in the range [0.5, 1], where the original probability mass func-                      Our algorithm was implemented fully in Matlab, using
tion is larger, and smaller variance in the range [0, 0.5], where                  built-in functions pcacov and linprog to estimate principal com-
the probability mass function is smaller.                                          ponents and perform linear programming optimization. Note
   The noise statistics can be derived analytically by noticing                    that pcacov was used instead of SVD decomposition because it
that histogram amplitudes have binomial distributions. Let                         allows manipulation of the covariance matrix (Section IV). The
be the underlying probability mass function of the th image                        implementation of our framework for component estimation
subvolume and        be the number of data points (image voxels).                  is described in Fig. 4. The algorithm to estimate the 1-st and

                                                                         Note that despite increased dimensionality of the new PCA
                                                                      subspace, the number of principal components will still be equal
                                                                      to the number of underlying tissue distributions, a required con-
                                                                      dition for the proofs of the lemmas. In other words, if intensity
                                                                      variation of a particular tissue gives raise to an additional prin-
                                                                      cipal component, this variation will be treated as a separate class
                                                                      of interest, albeit “inferior” to other classes. The large overlap
                                                                      between these inferior distributions and the main classes will
                                                                      guarantee that the former will not appear in the estimation loop
                                                                      output until all the latter ones are estimated, at which point the
                                                                      estimation process can be stopped.

                                                                                       VI. EXPERIMENTAL RESULTS

                                                                      A. Estimating Tissue Intensity Distributions From Structural
                                                                      MRI Data
                                                                         For performance evaluation we used T1-weighted
Fig. 4. Algorithm implementation.                                     brain images from two publicly available data sets,
                                                                      BrainWeb (            and
                                                                      IBSR (, and one data
  th component (line 7) is implemented as described in (9) and        set acquired at our site (CNL). The BrainWeb data set contains
(10) of Lemma 1. The algorithm to estimate each additional            realistic synthesized brain volumes with varying degrees of
component (line 10) is implemented as described in (13) of            noise and intensity nonuniformity, and 1 1 1 mm resolu-
Lemma 3. The total running time of the algorithm is on the            tion. The IBSR data set contains 18 real MR acquisitions made
order of a few seconds, for a typical 3D MR image.                    on a 1.5 T scanner with 1 1 1.5 mm resolution. The CNL
   During the initial testing on simulated data we have discov-       data set contained times 1 1 mm brain scans of 14 healthy
ered that the non-negative constraint imposed on estimated com-       aging adults (age 55–82), acquired on Siemens Tim Trio 3 T
ponents      was too strict. The histogram noise and errors in        scanner.
estimating of the principal subspace can lead to an infeasible           BrainWeb and IBSR data sets contained the ground truth la-
optimization problem or a very narrow search space. To over-          beling for GM and WM, by construction for BrainWeb and man-
come this we relaxed the non-negativity constraint               to   ually segmented by an expert for IBSR. To obtain ground truth
                   , where is the number of the histogram bins.       labeling for CNL data set, these images were processed using
This negative bound was small enough not to cause any visible         FreeSurfer 3.04 [43] and the resultant pial and WM surfaces
estimation problems in our experiments.                               were then edited by an expert and converted to volume masks. In
   We have also found that intensity nonuniformity (smoothly          addition, BrainWeb data set contained the ground truth labeling
varying intensity across the same tissue) and the presence of sev-    for CSF, which was not available for IBSR and CNL data sets.
eral tissues with slightly different intensities, e.g., subcortical   To compensate for this we removed all non-brain tissues using
structures in brain images, can increase the dimensionality of the    GCUT, an in-house developed skull stripping technique [44].
original subspace. In this case some of the estimated principal       Following this we defined the CSF label as the set of voxels pre-
components may represent slight variations in tissue appearance       served by the skull stripping but not belonging to GM or WM
rather than a new tissue, resulting in a large overlap between the    labels. Note that so defined CSF region may contain traces of
estimated components. This poses two problems. First, the ap-         non-brain tissues, left behind by the skull stripping procedure.
pearance of two overlapping component representing the same              We further augmented the ground truth to include mix classes
tissue means that some other tissue will not be represented.          of partial volume voxels (GM-WM and CSF-GM). The par-
Second, it stalls the optimization procedure, which requires suf-     tial volume voxels were defined as the voxels located near the
ficient separation between the components.                             boundary between two tissues. Practically, these were identified
   To overcome these issues we implemented a simple failure           by performing a one voxel erosion of each tissue with a standard
detection in line 11, which measures the overlap between              six-neighbor connectivity and selecting the eroded voxels as be-
the current component and all previously estimated com-               longing to the mix classes.
ponents. The overlap here was defined as the intersection                 All images were first skull stripped using GCUT [44] and then
area between components, i.e., area under the curve of                processed by our algorithm set to estimate either three (only
                              . If the overlap is greater than 2/3,   the main tissues) or five tissue intensity distributions. Shown
an empirically set threshold, the counter is reset to zero and        in Fig. 5 are the results of applying our algorithm on images
the number of principal components is increased by 1 to better        from BrainWeb data set. Our algorithm exhibited excellent per-
capture the original subspace. The new PCA subspace with              formance when only the main tissue classes were considered
increased dimensionality will contain a mixture of components         [Fig. 5(left)] and performed slightly worse when the mix classed
corresponding to distinct tissue distributions as well as some        were included, especially for large noise levels [Fig. 5(right)].
components corresponding to their intensity variations.               This decrease in the estimation quality was primarily due to the
844                                                                              IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 30, NO. 3, MARCH 2011

                                                                              Fig. 6. Estimating three classes [CSF GM WM] on an IBSR volume 5 (left)
                                                                              and a CNL volume (right). Ground truth distributions are shown using dotted
                                                                              lines. (a) Our algorithm. (b) SPM 8.

                                                                                                             TABLE I
                                                                                  CORRELATION AND PROPORTION ERROR BETWEEN ESTIMATED AND
                                                                                   TRUE DISTRIBUTIONS, AVERAGED OVER BRAINWEB VOLUMES
                                                                                       (NOISE LEVEL 1, 3, 5, AND 7). * INDICATES p < :  0 005

Fig. 5. Estimating 3 classes [CSF GM WM] (left) and 3 pure classes [CSF GM
WM]   +2    mix classes [CSF-GM GM-WM] (right) on BrainWeb data. Ground
truth distributions are shown using dotted lines. (a) Noise           =5
                                                            . (b) Noise   .
(c) Noise  =7   .

increased error in estimating tissue proportions rather than the
distribution shapes.
   It is remarkable that our algorithm was able to capture the
two-peak shape of the CSF-GM mix class distribution, which                                                TABLE II
appears at the lower noise levels [Fig. 5(a) and (b)]. This would              CORRELATION AND PROPORTION ERROR BETWEEN ESTIMATED AND TRUE
not be possible with a Gaussian mixture model, which repre-                      DISTRIBUTIONS, AVERAGED OVER 18 IBSR VOLUMES AND 14 CNL
                                                                               VOLUMES. (A)   = PCRELAXATION, (B) HISTOGRAM ERROR CORRECTION.
sents each distribution by a single peak Gaussian. Our algo-                                       ** INDICATES p < :     0 001
rithm also compares favorably with several other approaches,
which were evaluated on BrainWeb volume with noise               in
[29]—compare Fig. 5(c) here with Fig. 6 in [29]. Our algorithm
also performed well on real MRI data from IBSR and CNL data
sets [Fig. 6(a)].
   For quantitative performance evaluation we used two mea-
sures. One was the average Pearson correlation between esti-
mated and true distributions of each class; its goal was to capture
the quality of distribution shape estimation. The other was the
absolute difference between the estimated and the true propor-
tions for each class. The summary of the performance results for
BrainWeb and IBSR/CNL data sets is provided in Tables I and II
respectively. Overall, our approach achieved excellent average
correlations of 0.88–0.97 and average proportion errors of less
than 0.06.
   For reference, we compared these results with those obtained
by well-known SPM 8 segmentation software package, which                      SPM with the default settings using 10 Gaussian distributions
models the image histogram using a mixture of Gaussians and                       for GM, 2 for WM, 6 for CSF + other structures, which were
iterates between estimating mixture parameters, tissue classi-                then linearly combined to obtain the distributions of the three
fication, intensity nonuniformity, and registration [16]. We ran               main brain tissues. Overall, SPM performed worse compared to

                                                                             distributions and the proportion of the activated (smaller) class
                                                                             to obtain different samples for our experiments. The signal-to-
                                                                             noise ratio (SNR), defined as the ratio of the difference between
                                                                             the means and the standard deviation, was varied from 2 to 6.
                                                                             The proportion of the activated class was varied from 0.1% to
                                                                                Instead of providing correlations and proportion errors as in
                                                                             Section VI-A, we used the estimated distributions to directly de-
Fig. 7. Estimating three classes [CSF GM WM] on BrainWeb data with 5%
                                                                             termine the threshold that minimizes the misclassification error
noise and (a) 20% or (b) 40% nonuniformity. Ground truth distributions are   (the sum of Type I and Type II errors). We then recorded the
shown using dotted lines.                                                    percentage increase in misclassification error when using the
                                                                             estimated threshold versus the optimal one, derived using the
                                                                             knowledge of the true distributions. For reference, we have com-
our method, achieving correlations of 0.81–0.85 and proportion               pared the results with those obtained using our own implementa-
errors of 0.05–0.13 (Table II). Considering that the CSF class               tion of the standard EM algorithm. To achieve the best possible
was not as clearly defined as the GM and WM classes, we have                  performance, the EM algorithm was initialized with the true pa-
also included results restricted to the GM and WM classes in the             rameter values, e.g., means, variances, and proportions derived
last two columns of the table. A typical SPM output is shown in              from the true distributions.
Fig. 6(b).                                                                      The results are summarized in Fig. 8. The performance of our
   We have also evaluated the effects of intensity nonuniformity             algorithm was the same throughout the chosen range of SNRs,
and individual algorithm components/settings (subvolume size,                and practically coincided with the optimal performance for pro-
relaxation of the number principal components, correction of the             portions 0.68%–1% and larger. There was a significant drop
covariance matrix) on the estimation performance. Changing                   in performance below 0.68%–1% critical failure point. Perfor-
subvolume size from 3 3 3 cuboids to 20 20 20 cuboids                        mance of the EM-based estimation was significantly worse than
to sagittal slices had little effect on performance (Table I). How-          that of our approach at              – , and was comparable to
ever, the average correlations decreased in the presence of in-              ours at           . However, considering that EM algorithm was
tensity nonuniformity, and the change was statistically signif-              initialized with the true parameter values, and real life perfor-
icant                . As shown in Fig. 7, nonuniformity causes              mance with random initialization is expected to be worse, our
the estimated distributions with minimum and maximum means                   approach clearly offers a superior alternative to the EM algo-
to become narrower than the true distributions, which explains               rithm in this application.
the decreased correlations. The relaxation step led to significant               Unlike in the case of structural MRI data, the relaxation
improvement in estimation performance; correlation increased                 of the number of principal components did not have any dis-
from 0.52 to 0.88 for IBSR data set and from 0.71 to 0.94 for                cernable effect on the performance in this experiment. On the
CNL data set                  , and proportion error decreased from          other hand, the histogram error correction had a significant im-
more than 0.2 to 0.06 or less (Table II). The correction of the              pact—its omission made our algorithm perform worse than the
covariance matrix for histogram noise had no discernable effect              EM algorithm (Fig. 8). Using larger subvolumes also decreased
on the estimation performance.                                               the performance, with critical failure proportion moving from
                                                                             0.68%–1% to 1.35% for 10 10 10 subvolumes and to 6%
B. Estimating Distribution of Activated Voxels From Simulated                proportion for 20 20 20 subvolumes.
Functional MR Data
   Activated regions in functional MRI experiments are typi-                                        VII. DISCUSSION
cally detected using significance threshold testing, where voxels                We presented a novel nonparametric, noniterative approach
are declared activated on the basis of their unlikely intensity              to estimating underlying tissue probability distributions from
under the null hypothesis of no activation [45]. This allows con-            3D image volumes. The main idea is to treat estimation as an
trolling for Type I error but not for Type II error, since only the          instance of a blind source separation problem, constraining un-
null distribution is modeled. Here by Type I error we mean false             derlying components to be non-negative, sum to one, and be
positives—declaring a nonactivated pixel activated, and by Type              sufficiently separated from each other.
II error we mean false negatives—declaring an activated pixel                   It is important to highlight the differences between the pro-
nonactivated. A more appropriate threshold could be defined if                posed approach and related techniques, such as ICA and non-
the distribution of activated voxels is known [46], but small size           negative matrix factorization (NMF) [47]. All three approaches
of the activated class makes the estimation difficult.                        seek a linear decomposition of the data matrix as
   To simulate functional MRI data we created a set of syn-                   , where rows of matrices and contain basis vectors and
thetic 200 200 200 resolution images, where activated re-                    mixing weights respectively. The proposed approach is similar
gions were modeled as uniform intensity cubes of size 3 3 3                  to ICA in using a two-stage decomposition,
voxels on a uniform background. The images were corrupted by                 (PCA) followed by                 , so that           . Another
Gaussian noise, thus creating two Gaussian distributions, one                similarity between ICA and the proposed approach is that the
for the nonactivated class and the other for the activated class.            rows of are estimated one-by-one, by optimizing a certain ob-
We then varied the difference between the means of the two                   jective function. The main novelty of our approach, compared
846                                                                               IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 30, NO. 3, MARCH 2011

Fig. 8. Percentage increase in misclassification error (performance drop) as a function of smaller class proportion for (a) SNR = 2, (b) SNR = 4, (c) SNR = 6
for our approach (with and without error correction) versus EM algorithm.

to ICA, is the choice of this objective function, designed to min-             null distribution. This problem does not affect our method, be-
imize or maximize the means of the components and minimize                     cause of the use of subvolume histograms, rather than the whole
their overlap with previously estimated components, rather that                image histogram. In subvolumes containing activated voxels,
maximizing some measure of non-Gaussianity. Our estimated                      the number of activated and nonactivated voxels will be similar
components are also constrained to be non-negative and sum                     to each other, preventing the tail of the null distribution from
to 1. This results in completely different optimization proce-                 overpowering the peak of activated voxels. For example, when
dure, iterative locally optimal for ICA and noniterative globally              a 5 5 5 subvolume fully covers a 3 3 3 activated region,
optimal for our approach.                                                      the subvolume will contain 27 activated voxels and
   NMF differs from ICA and the proposed approach in that                          nonactivated voxels.
matrices      and     are estimated simultaneously, rather than                   The chosen experiments also highlighted the advantages of
row by row. Our approach has similar constraints to those of                   histogram noise correction (Section IV) and relaxation of the
NMF—matrices and in NMF are required to be non-neg-                            number of PCA components (Section V). The latter was essen-
ative and basis sparsity can be encouraged [48]. Nevertheless,                 tial for structural MRI data, but not for functional MRI data,
the optimization procedure used in NMF is iterative and only                   possibly due to absence of intensity nonuniformity artifact. Note
locally optimal, and testing its suitability for the current appli-            that the raw fMRI volumes do exhibit intensity nonuniformity,
cation is left as future work.                                                 but the classification is applied to statistical parametric maps,
   Note that our approach does not use any regularization or                   where voxel intensities correspond to likelihood of activation.
smoothness constraints on the shape of the estimated tissue dis-               Such maps are generated based on linear regression of the task
tributions. The resultant smoothness is due to enforcing the dis-              time course from the voxel time courses and hence do not con-
tributions to be linear combinations of the main principal com-                tain intensity nonuniformity. Histogram noise correction was
ponents. These components are expected to be smooth due to                     beneficial for fMRI data only. This is likely due to small size
denoising effect of PCA—the noise spreads across many minor                    of activated regions, which makes the proportion of their dis-
PCA components and is removed when these components are                        tribution so small that it becomes comparable to the histogram
discarded.                                                                     noise.
   The chosen experiments aimed to highlight the capabilities                     Increasing subvolume size did not affect the performance for
of our algorithm and its advantages over standard approaches                   structural MRI data and reduced it for simulated functional MRI
based on the EM algorithm. In the first experiment (estimating                  data. This is expected, considering that estimated principal com-
brain tissue intensity probability distributions from structural               ponents can be thought of as weighted averages of subvolume
MRI data) our algorithm outperformed SPM even though the                       histograms. Increasing subvolume size reduces the noise vari-
latter relied on prior tissue probability maps and intermediate                ance of each measurement (subvolume histogram) but also re-
tissue classification. The underlying distributions were not                    duces the number of measurements (the number of subvolumes)
Gaussian due to the presence of partial volume voxels, intensity               by exactly the same factor, and the two changes cancel each
nonuniformity, as well as natural tissue variability, and these                other. The decrease in performance for functional MRI data was
non-Gaussian shapes were better captured by our method.                        due to increased difficulty of detecting a small peak inside the
Since our method does not require any prior information,                       tail of the null distribution. For example, a 10 10 10 sub-
this suggests its possible deployment in such applications as                  volume fully covering a 3 3 3 activated region will contain
developmental studies on humans, segmentation of abdominal                     only 27 activated voxels and                         nonactivated
adipose tissue, tumor segmentation, functional MRI, etc.                       voxels, approximately 50 of which will be located in the tail of
   In the second experiment (simulated functional MRI data)                    the null distribution (beyond the two standard deviations).
the goal was to test robustness of our approach to a scenario                     While the presence of intensity nonuniformity does not lead
where one class (activated voxels) is substantially smaller than               to failure of our algorithm, due to proposed relaxation of the
the other. Even when initialized with the true parameter values,               number of PCA components, it has adverse effect on perfor-
the EM algorithm performed worse than our method for low                       mance by causing estimated distributions with minimum and
SNRs. This can be explained by the small peak of activated                     maximum means to become narrower than otherwise expected.
voxels distribution being completely merged with the tail of the               This can be explained by nonuniformity causing two (or pos-

sibly more) principal components to represent a single tissue                   [16] J. Ashburner and K. J. Friston, “Unified segmentation,” Neuroimage,
distribution. These components can be combined to yield dis-                         vol. 26, no. 3, pp. 839–851, 2005.
                                                                                [17] Z. Liang, J. R. Macfall, and D. P. Harrington, “Parameter estimation
tributions that are narrower than the true distribution and are                      and tissue segmentation from multispectral MR images,” IEEE Trans.
aligned with its left or right border. Hence, maximizing the                         Med. Imag., vol. 13, no. 3, pp. 441–449, Sep. 1994.
mean will yield a narrower distribution aligned with the right                  [18] C. A. Bouman and M. Shapiro, “A multiscale random field model for
border of the true distribution, while minimization of the mean                      bayesian image segmentation,” IEEE Trans. Image Process., vol. 3, no.
                                                                                     2, pp. 162–177, Mar. 1994.
will lead to alignment with the left border, as shown in Fig. 7.                [19] H. Greenspan, A. Ruf, and J. Goldberger, “Constrained Gaussian mix-
Real life applications, such as brain morphometry analysis, em-                      ture model framework for automatic segmentation of MR brain im-
ploy intensity nonuniformity correction as a mandatory first step                     ages,” IEEE Trans. Med. Imag., vol. 25, no. 9, pp. 1233–1245, Sep.
in data processing pipeline. The remaining (after correction)                   [20] B. Fischl, D. H. Salat, E. Busa, M. Albert, M. Dieterich, C. Haselgrove,
nonuniformity is likely to be much smaller than 20%–40% used                         A. van der Kouwe, R. Killiany, D. Kennedy, S. Klaveness, A. Montillo,
in our testing and hence expected to have minimal effect on per-                     N. Makris, B. Rosen, and A. M. Dale, “Whole brain segmentation:
formance.                                                                            Automated labeling of neuroanatomical structures in the human brain,”
                                                                                     Neuron, vol. 33, no. 3, pp. 341–355, 2002.
                                                                                [21] Y. Zhang, M. Brady, and S. Smith, “Segmentation of brain MR im-
                             REFERENCES                                              ages through a hidden markov random field model and the expecta-
                                                                                     tion-maximization algorithm,” IEEE Trans. Med. Imag., vol. 20, no. 1,
   [1] T. Lei and W. Sewchand, “Statistical approach to X-ray CT imaging             pp. 45–57, Jan. 2001.
       and its applications in image analysis,” IEEE Trans. Med. Imag., vol.    [22] S. M. Smith, M. Jenkinson, M. W. Woolrich, C. F. Beckmann, T. E.
       11, no. 1, pp. 62–69, Mar. 1992.                                              Behrens, H. Johansen-Berg, P. R. Bannister, M. D. Luca, I. Drobnjak,
   [2] F. O’Sullivan, “Imaging radiotracer model parameters in PET: A mix-           and D. E. Flitney, “Advances in functional and structural MR image
       ture analysis approach,” IEEE Trans. Med. Imag., vol. 12, no. 3, pp.          analysis and implementation as FSL,” NeuroImage Math. Brain Imag.,
       399–412, Sep. 1993.                                                           vol. 23, pp. S208–S219, 2004.
   [3] A. Lundervold and G. Storvik, “Segmentation of brain parenchyma and      [23] H. Gudbjartsson and S. Patz, “The Rician distribution of noisy MRI
       cerebrospinal fluid in multispectral magnetic resonance images,” IEEE          data,” Magn. Reson. Med., vol. 34, no. 6, pp. 910–914, 1995.
       Trans. Med. Imag., vol. 14, no. 2, pp. 339–349, Jun. 1995.               [24] P. Santago and H. D. Gage, “Quantification of MR brain images by
   [4] W. M. Wells, W. L. Grimson, R. Kikinis, and F. A. Jolesz, “Adaptive           mixture density and partial volume modeling,” IEEE Trans. Med.
       segmentation of MRI data,” IEEE Trans. Med. Imag., vol. 15, no. 4, pp.        Imag., vol. 12, no. 3, pp. 566–574, Sep. 1993.
       429–442, Aug. 1996.                                                      [25] D. W. Shattuck, S. R. Sandor-Leahy, K. A. Schaper, D. A. Rotten-
   [5] J. Ashburner and K. Friston, “Multimodal image coregistration and             berg, and R. M. Leahy, “Magnetic resonance image tissue classification
       partitioning—A unified framework,” Neuroimage, vol. 6, no. 3, pp.              using a partial volume model,” Neuroimage, vol. 13, no. 5, pp. 856–876,
       209–217, 1997.                                                                2001.
   [6] S. Ruan, C. Jaggi, J. Xue, J. Fadili, and D. Bloyet, “Brain              [26] D. H. Laidlaw, K. W. Fleischer, and A. H. Barr, “Partial-volume
       tissue classification of magnetic resonance images using partial               Bayesian classification of material mixtures in MR volume data using
       volume modeling,” IEEE Trans. Med. Imag., vol. 19, no. 12,                    voxel histograms,” IEEE Trans. Med. Imag., vol. 17, no. 1, pp. 74–86,
       pp. 1179–1187, Dec. 2000.                                                     Feb. 1998.
   [7] Y. Zhang, M. Brady, and S. Smith, “Segmentation of brain MR im-          [27] J. Ashburner and K. J. Friston, “Voxel-based morphometry—The
       ages through a hidden markov random field model and the expecta-               methods,” Neuroimage, vol. 11, no. 6, pp. 805–821, 2000, Pt 1.
       tion-maximization algorithm,” IEEE Trans. Med. Imag., vol. 20, no. 1,    [28] P. Santago and H. D. Gage, “Statistical models of partial volume ef-
       pp. 45–57, Jan. 2001.                                                         fect,” IEEE Trans. Image Process., vol. 4, no. 11, pp. 1531–1540, Nov.
   [8] K. Van Leemput, F. Maes, D. Vandermeulen, and P. Suetens, “A uni-             1995.
       fying framework for partial volume segmentation of brain MR images,”     [29] M. B. Cuadra, L. Cammoun, T. Butz, O. Cuisenaire, and J.-P. Thiran,
       IEEE Trans. Med. Imag., vol. 22, no. 1, pp. 105–119, Jan. 2003.               “Comparison and validation of tissue modelization and statistical clas-
   [9] H. Park, P. H. Bland, and C. R. Meyer, “Construction of an abdominal          sification methods in T1-weighted MR brain images,” IEEE Trans.
       probabilistic atlas and its application in segmentation,” IEEE Trans.         Med. Imag., vol. 24, no. 12, pp. 1548–1565, Dec. 2005.
       Med. Imag., vol. 22, no. 4, pp. 483–492, Apr. 2003.                      [30] S. Sanjay-Gopal and T. J. Hebert, “Bayesian pixel classification using
  [10] G. Dugas-Phocion, M. A. G. Ballester, G. Malandain, C. Lebrun,                spatially variant finite mixtures and the generalized EM algorithm,”
       and N. Ayache, “Improved em-based tissue segmentation and partial             IEEE Trans. Image Process., vol. 7, no. 7, pp. 1014–1028, Jul. 1998.
       volume effect quantification in multi-sequence brain MRI,” in Int.        [31] K. Van Leemput, F. Maes, D. Vandermeulen, and P. Suetens, “Auto-
       Conf. Med. Image Computing Computer Assisted Intervent. (MICCAI),             mated model-based tissue classification of MR images of the brain,”
       Saint-Malo, France, 2004, vol. 3216, pp. 26–33.                               IEEE Trans. Med. Imag., vol. 18, no. 10, pp. 897–908, Oct. 1999.
  [11] M. H. Cardinal, J. Meunier, G. Soulez, R. L. Maurice, E. Therasse,       [32] J. L. Marroquin, B. C. Vemuri, S. Botello, F. Calderon, and A. Fer-
       and G. Cloutier, “Intravascular ultrasound image segmentation:                nandez-Bouzas, “An accurate and efficient bayesian method for auto-
       A three-dimensional fast-marching method based on gray level                  matic Segmentation of brain MRI,” IEEE Trans. Med. Imag., vol. 21,
       distributions,” IEEE Trans. Med. Imag., vol. 25, no. 5, pp. 590–601,          no. 8, pp. 934–945, Aug. 2002.
       May 2006.                                                                [33] K. M. Pohl, W. M. Wells, A. Guimond, K. Kasai, M. E. Shenton, R.
  [12] F. G. Meyer and X. Shen, “Classification of fMRI time series in a low-         Kikinis, W. E. L. Grimson, and S. K. Warfield, “Incorporating non-
       dimensional subspace with a spatial prior,” IEEE Trans. Med. Imag.,           rigid registration into expectation maximization algorithm to segment
       vol. 27, no. 1, pp. 87–98, Jan. 2008.                                         MR images,” in Int. Conf. Med. Image Computing Computer Assist.
  [13] M. Figueiredo and A. Jain, “Unsupervised learning of finite mixture            Intervent. (MICCAI), 2002.
       models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 3, pp.    [34] M. Prastawa, “Automatic brain tumor segmentation by subject spe-
       381–396, Mar. 2002.                                                           cific modification of atlas priors,” Acad. Radiol., vol. 10, no. 12, pp.
  [14] J. Tohka, E. Krestyannikov, I. D. Dinov, A. M. Graham, D. W. Shattuck,        1341–1348, 2003.
       U. Ruotsalainen, and A. W. Toga, “Genetic algorithms for finite mix-      [35] J. Nuyts, P. Dupont, S. Stroobants, A. Maes, L. Mortelmans, and P.
       ture model based voxel classification in neuroimaging,” IEEE Trans.            Suetens, “Evaluation of maximum-likelihood based attenupion correc-
       Med. Imag., vol. 26, no. 5, pp. 696–711, May 2007.                            tion in positron emission tomography,” IEEE Trans. Nucl. Sci., vol. 46,
  [15] P. Schroeter, J. M. Vesin, T. Langenberger, and R. Meuli, “Robust pa-         no. 4, p. 1136, Aug. 1999.
       rameter estimation of intensity distributions for brain magnetic reso-   [36] A. Ciptadi, C. Chen, and V. Zagorodnov, “Component analysis ap-
       nance images,” IEEE Trans. Med. Imag., vol. 17, no. 2, pp. 172–186,           proach to estimation of tissue intensity distributions of 3D images,”
       Apr. 1998.                                                                    in IEEE Int. Conf. Comput. Vis. (ICCV), Kyoto, Japan, 2009, p. 1765.
848                                                                                 IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 30, NO. 3, MARCH 2011

      [37] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and E. Moulines, “A       [43] A. M. Dale, B. Fischl, and M. I. Sereno, “Cortical surface-based anal-
           blind source separation technique using second-order statistics,” IEEE        ysis: I. segmentation and surface reconstruction,” NeuroImage, vol. 9,
           Trans. Signal Process., vol. 45, no. 2, pp. 434–444, Feb. 1997.               no. 2, pp. 179–194, 1999.
      [38] J.-F. Cardoso, “Infomax and maximum likelihood for blind source sep-     [44] S. Sadananthan, W. Zheng, M. Chee, and V. Zagorodnov, “Skull strip-
           aration,” IEEE Signal Process. Lett., vol. 4, no. 4, pp. 112–114, Apr.        ping using graph cuts,” NeuroImage, vol. 49, no. 1, pp. 225–239, 2010.
           1997.                                                                    [45] B. R. Logan and D. B. Rowe, “An evaluation of thresholding tech-
      [39] A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Anal-            niques in fMRI analysis,” Neuroimage, vol. 22, no. 1, pp. 95–108, 2004.
           ysis. New York: Wiley-Interscience, 2001.                                [46] N. V. Hartvig and J. L. Jensen, “Spatial mixture modeling of fMRI
      [40] R. Nagarajan and C. A. Peterson, “Identifying spots in microarray im-         data,” Hum. Brain Mapp., vol. 11, no. 4, pp. 233–248, 2000.
           ages,” IEEE Trans. Nanobiosci., vol. 1, no. 2, pp. 78–84, Jun. 2002.     [47] D. D. Lee and H. S. Seung, “Learning the parts of objects by non-
      [41] D. Lawley and A. Maxwell, “Factor analysis as a statistical method,”          negative matrix factorization,” Nature, vol. 401, no. 6755, pp. 788–791,
           Statistician, vol. 12, no. 3, pp. 209–229, 1962.                              1999.
      [42] H. Attias, “Independent factor analysis,” Neural Computation, vol. 11,   [48] P. O. Hoyer, “Non-negative matrix factorization with sparseness con-
           no. 4, 1999.                                                                  straints,” J. Mach. Learn. Res., vol. 5, pp. 1457–1469, 2004.

To top