838 IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 30, NO. 3, MARCH 2011

Component Analysis Approach to Estimation of Tissue Intensity Distributions of 3D Images

Vitali Zagorodnov*, Member, IEEE, and Arridhana Ciptadi, Student Member, IEEE

Abstract—Many segmentation algorithms in medical imaging rely on accurate modeling and estimation of tissue intensity probability density functions. Gaussian mixture modeling, currently the most common approach, has several drawbacks, such as reliance on a Gaussian model and the iterative local optimization used to estimate the model parameters. It also does not take advantage of the substantially larger amount of data provided by 3D acquisitions, which are becoming standard in the clinical environment. We propose a novel and completely nonparametric algorithm to estimate the tissue intensity probabilities in 3D images. Instead of relying on the traditional framework of iterating between classification and estimation, we pose the problem as an instance of a blind source separation problem, where the unknown distributions are treated as sources and histograms of image subvolumes as mixtures. The new approach performed well on synthetic data and real magnetic resonance imaging (MRI) scans of the brain, robustly capturing intensity distributions of even small image structures and partial volume voxels.

Index Terms—Blind source separation, Gaussian mixtures, image segmentation, magnetic resonance imaging (MRI), tissue intensity distributions.

Manuscript received October 19, 2010; accepted November 29, 2010. Date of publication December 17, 2010; date of current version March 02, 2011. This work was supported by SBIC C-012/2006 grant provided by A*STAR, Singapore (Agency for Science, Technology and Research). Asterisk indicates corresponding author.
*V. Zagorodnov is with the School of Computer Engineering, Nanyang Technological University, 639798 Singapore.

I. INTRODUCTION

Many segmentation algorithms in medical imaging rely on accurate modeling and estimation of tissue intensity probability density functions (pdfs) [1]–[12], usually in the context of statistical region-based segmentation. Commonly, tissue intensity probabilities are modeled using the finite mixture (FM) model [2], [13], [14], and its special case the finite Gaussian mixture (FGM) model [15], [16]. In these models the intensity pdf of each tissue class is represented by a parametric (e.g., Gaussian in the case of FGM) function called the component density, while the intensity pdf of the whole image is modeled by a weighted sum of the tissue component densities. The fitting is usually done using the Expectation Maximization (EM) algorithm [1], [3], [6], [9]–[11], [17]–[19], which iterates between classification and parameter estimation until a stable state is reached. The FM models combined with the EM algorithm have been incorporated into many existing image segmentation pipelines, usually in the context of brain tissue segmentation (FreeSurfer [20], SIENAX [21], part of FSL [22], SPM [16]).

The main drawback of FGM models is that the tissue intensity distributions do not always have a Gaussian form. The noise in magnetic resonance (MR) images is known to be Rician rather than Gaussian [23]. But the largest deviation from Gaussianity is due to the presence of partial volume (PV) voxels [6], [8], [24]–[27]. One way to model such voxels is by representing their distribution using a uniform mixture of "pure" distributions [8], [25], [28]. However, this complicates the optimization as the functional involved becomes considerably more nonlinear. The uniform mixture assumption has also been questioned [27]. To simplify the problem, several researchers suggested modeling the intensity distribution of partial volume voxels as Gaussian, which appears to be a good approximation for sufficiently noisy images [6]. This approximation has been incorporated into the most recent versions of SPM [16]. However, a more recent study [29] revealed that the model of [28] can still achieve a better fit.

Another issue associated with the EM framework is potential convergence to a poor local optimum, which means a sufficiently close parameter initialization is usually required [4], [13], especially for distribution means [29]. The convergence of the EM algorithm to a more meaningful optimum can be improved by including prior information in the classification step, such as pixel correlations [30], MRF priors [7], [25], [31], [32] or a probabilistic atlas [5], [9], [31]–[33]. Even though this approach is helpful, especially for very noisy images [7], it can also introduce bias in estimation [30]. And while a probabilistic atlas is useful when segmenting brain tissues [5] or abdominal organs [9], its construction is not always possible, as is the case in the segmentation of brain lesions [34], tumor detection [35], or localization of fMRI activations.

Finally, the EM approach often fails to take advantage of the substantially larger amount of data present in 3D images, such as those obtained by magnetic resonance (MR) and X-ray computed tomography (CT) scanning techniques. We propose a novel nonparametric algorithm to estimate tissue intensity probabilities in 3D images that completely departs from the traditional classification-estimation framework. To illustrate the main idea behind our approach, consider the following introductory example.

Shown in Fig. 1 are the histograms of a 3D T1-weighted MR image and two of its 2D slices. The observed variability in the shape of the 2D histograms is due to varying tissue proportions across the slices.
A. Ciptadi is with the School of Computer Engineering, Nanyang Technological University, 639798 Singapore.
Digital Object Identifier 10.1109/TMI.2010.2098417
0278-0062/$26.00 © 2010 IEEE

For example, the slice in Fig. 1(b) contains a relatively small amount of cerebrospinal fluid (CSF) and hence the lowest intensity peak (see Fig. 1(c) for reference) is practically missing from its histogram. The proportion of CSF is increased in the slice shown in Fig. 1(c) due to the presence of ventricles, leading to the reappearance of the lowest intensity peak. This slice, however, contains only a small amount of gray matter (GM), thus lowering the middle peak of the histogram. While this variability can potentially provide useful information for mixture estimation, it is traditionally discarded by performing estimation directly on the histogram of the whole volume.

The proposed approach treats the histograms of 2D slices (or any other subvolumes) as mixture realizations of the component densities. This allows stating the unmixing problem in a blind source separation (BSS) framework [37], [38]. To solve the problem we use a framework that is similar to that of independent component analysis (ICA) [39], but without relying on the independence assumption. Instead we use the fact that the underlying components must be valid probability distributions with different means, which results in a simple convex linear optimization problem that guarantees convergence to a global optimum (ICA's iterative procedure, in comparison, converges only to a local optimum). Our approach provides a promising alternative for estimating component densities in 3D images that is more accurate compared to state-of-the-art approaches.

Fig. 1. Histograms of a 3D brain image and several of its slices. (a) 3D image and its histogram. (b) Transverse slice 128 and its histogram. (c) Transverse slice 152 and its histogram.

II. PROBLEM STATEMENT

Let $V$ be a 3D image volume partitioned into a set of subvolumes $\{V_1, \dots, V_n\}$. We assume the voxel intensities of $V$ can take $K$ distinct values and are drawn from $m$ unknown probability mass functions (pmfs) $f_1, \dots, f_m$. For example, a brain volume can be assumed to have three main tissues, white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF), so $m = 3$. For an 8-bit acquisition, $K = 256$. Subvolumes can be chosen arbitrarily, for example as coronal, sagittal, or transverse slices of the 3D volume.

Let $x_j$ be the $K$-bin histogram of $V_j$, normalized to sum to 1. Then

$$x_j = \sum_{i=1}^{m} w_{ij} f_i + n_j \qquad (1)$$

where $w_{ij}$ is the $i$th tissue proportion in the $j$th subvolume, $\sum_{i=1}^{m} w_{ij} = 1$, and $n_j$ is the noise term that reflects the difference between the actual probability distribution and its finite approximation by a histogram. Let $X = [x_1, \dots, x_n]^T$ and $F = [f_1, \dots, f_m]^T$. Rewriting (1) in matrix form yields

$$X = WF + N. \qquad (2)$$

This is identical to the blind source separation (BSS) formulation, with subvolume histograms as mixtures and unknown tissue pmfs as sources. Our goal is to estimate $f_1, \dots, f_m$ as well as their mixing weights $W$ given $X$.

Our solution requires several assumptions, most of which are general to a BSS problem: a sufficient number of mixtures ($n \ge m$, $K \ge m$) and sufficient variability of the mixing proportions $w_{ij}$. These can be easily satisfied with a proper choice of partitioning and histogram dimensionality. We also assume that the distributions $f_1, \dots, f_m$ have different means and are sufficiently separated from each other, where the meaning of sufficient separation is detailed in Section III. These assumptions are not very restrictive and are generally satisfied for medical images [15], [40].

III. PROPOSED SOLUTION

The BSS problem has been studied extensively in recent years, with several solutions proposed for selected special cases, e.g., factor analysis (FA) [41] for Gaussian sources and independent factor analysis (IFA) [42] or independent component analysis (ICA) [39] for independent non-Gaussian sources.
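The mixtures in (2) are simply stacked subvolume histograms. As a minimal sketch of building $X$ (in numpy rather than the authors' Matlab; function and variable names are ours), using slices along one axis as subvolumes:

```python
import numpy as np

def subvolume_histograms(volume, n_bins=256, axis=0):
    """Stack the normalized K-bin histograms of the slices of a 3D
    volume into a mixtures matrix X, one subvolume per row, as in
    the BSS formulation X = W F + N of (2)."""
    slices = np.moveaxis(np.asarray(volume), axis, 0)
    X = np.empty((slices.shape[0], n_bins))
    for j, s in enumerate(slices):
        counts, _ = np.histogram(s, bins=n_bins, range=(0, n_bins))
        X[j] = counts / counts.sum()   # normalize to sum to 1
    return X
```

Rows of the result play the role of the mixtures $x_j$; any other partitioning (e.g., cuboids) yields the matrix the same way.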
We note that a preliminary version of this algorithm was previously reported in a conference paper [36]. The present paper provides a complete description of the algorithm, introduces a covariance matrix correction to improve its performance, and presents a more complete set of validation experiments, including evaluation of the effects of intensity nonuniformity and of individual algorithm components/settings (subvolume size, relaxation of the number of principal components, correction of the covariance matrix).

The general framework of performing the decomposition (2), as exemplified by ICA, can be briefly summarized as follows.

1) Center the matrix $X$ so that all columns sum to zero, by subtracting the row vector $\mu = \frac{1}{n}\sum_j x_j$ from all rows of $X$. Let $X_c$ designate the centered version of $X$.

2) Apply principal component analysis (PCA) to $X_c$. Practically, this can be done by performing the SVD decomposition $X_c = U\Sigma V^T$ and setting $P = V^T$ and $C = U\Sigma$. The rows of matrix $P$ contain a set of orthogonal principal components and matrix $C$ contains the weights.

3) A decomposition of the original (noncentered) matrix $X$ can be obtained by appending $\mu$ to $P$ as an extra row and changing $C$ to $[C\;\;\mathbf{1}]$, where $\mathbf{1}$ is a column vector where all elements are equal to 1.

4) Truncate $C$ and $P$ to preserve a smaller (specified) number of PCA components corresponding to the largest eigenvalues. The square roots of these eigenvalues are located on the diagonal of $\Sigma$. The resultant decomposition has the form $X = \hat C \hat P + E$, where the "hat" symbol refers to truncation and the appearance of the error term $E$ is due to the discarding of components.

5) Further decompose $\hat P$ as $\hat P = AF$, so that $X \approx \hat C A F$. In practice, this step is performed in the opposite direction, as

$$F = B\hat P \qquad (3)$$

where the rows of $B$ are estimated one-by-one by maximizing some measure of non-Gaussianity of the resultant components (rows of matrix $F$).

Note that this approach cannot be applied directly to our problem, because the rows of $F$ (step 5) are not constrained to represent valid probability distribution functions (non-negative and sum to 1). Furthermore, maximizing non-Gaussianity might not result in a meaningful solution as our source components are neither Gaussian nor independent. However, the ICA framework can be adapted by preserving steps 1–4 and modifying step 5 as follows. Let $\{\bar p_1, \dots, \bar p_m\}$ be the set of preserved PCA components (rows of $\hat P$) and let $f = \sum_i a_i \bar p_i$ be a candidate component. Since the component densities represent valid pmfs and hence are non-negative and sum to 1, this imposes equality and inequality constraints on the elements of $a$:

$$\sum_{k=1}^{K}\sum_{i=1}^{m} a_i \bar p_i(k) = 1 \qquad (4)$$

$$\sum_{i=1}^{m} a_i \bar p_i(k) \ge 0, \quad k = 1, \dots, K. \qquad (5)$$

These constraints can be combined with the assumptions that the component densities have different means and are sufficiently separated from each other to estimate $F$. This is done in the following sections.

Fig. 2. Explanation of definitions used in Lemma 1. (a) Two original distributions $f_1$ and $f_2$ with means $\mu_1$ and $\mu_2$, respectively. (b) Estimated PCA components $p_1$ and $p_2$. (c) Various linear combinations of the PCA components.

A. Estimating Components With Largest and Smallest Means

Let $f_1$ and $f_m$ be the components with the smallest and largest mean, respectively. These can be estimated by minimizing (for $f_1$) or maximizing (for $f_m$) the mean of a linear combination of the $\bar p_i$'s, where the coefficients of the linear combination satisfy constraints (4) and (5). This follows from the following lemma, which proves this statement for $m = 2$, and its corollary, which extends the result to arbitrary $m$.

Lemma 1: Let $\mu_1 < \mu_2$ be the means of the underlying probability distributions $f_1$ and $f_2$. Let $g = a_1\bar p_1 + a_2\bar p_2$ be an arbitrary linear combination of the $\bar p_i$'s that satisfies $\sum_k g(k) = 1$ and $g(k) \ge 0$ for all $k$. Then,

$$\mu_1 - \varepsilon_1 \le \operatorname{mean}(g) \le \mu_2 + \varepsilon_2 \qquad (6)$$

where $\operatorname{mean}(g) = \sum_k k\,g(k)$ and $\varepsilon_1, \varepsilon_2$ are small nonnegative terms determined by the overlap between $f_1$ and $f_2$. The equalities are achieved, up to the $\varepsilon_i$'s, when $g = f_1$ and $g = f_2$.

Proof: Explanations of the terms used in the lemma statement are given in Fig. 2. From (3) it follows that $g$ is also a linear combination of the $f_i$'s, $g = b_1 f_1 + b_2 f_2$. Since $f_1$, $f_2$, and $g$ are non-negative and sum to 1, $b_1 + b_2 = 1$. Then

$$\operatorname{mean}(g) = b_1\mu_1 + b_2\mu_2 = \mu_2 - b_1(\mu_2 - \mu_1) \qquad (7)$$

and $\operatorname{mean}(g)$ is minimized by making $b_1$ as large as possible. However, the largest possible $b_1$ is controlled by the non-negativity of $g$: taking $b_1 > 1$ forces $g$ negative near the peak of $f_2$, except for a small margin that depends on the overlap between the components. This leads to the left side of inequality (6). The right side of inequality (6) can be obtained in a similar fashion, by making $b_2$ as large as possible.

The intuition behind Lemma 1 can be derived from Fig. 2(c), which shows several linear combinations of principal components estimated on the basis of a mixture of two Gaussian distributions with means $\mu_1$ and $\mu_2$. Some linear combinations will be rejected because they contain negative values. Another combination may appear very similar to $f_1$, but will also be rejected because of a negative bump aligned with the peak of $f_2$. Further decreasing the corresponding coefficient will lead to the disappearance of this bump and convergence to $f_1$, the component with minimum mean. Yet another combination may be non-negative, but with a mean slightly less than $\mu_2$ because of the presence of a small positive bump aligned with the peak of $f_1$. Reducing the corresponding coefficient will remove the bump and increase the mean of this combination, converging to $f_2$, the component with maximum mean.

In practice, the $\varepsilon_i$'s are usually small and can be discarded. For Gaussian components on an unbounded domain, it can be straightforwardly shown that $\varepsilon_1 = \varepsilon_2 = 0$ and hence $\mu_1 \le \operatorname{mean}(g) \le \mu_2$. If we ignore $\varepsilon_1$ and $\varepsilon_2$, the equalities in (6) are achieved when $g = f_1$ or $g = f_2$. A similar statement can be made in the case of more than two components, see the following corollary.

Corollary 1: Let $\mu_1 < \mu_2 < \dots < \mu_m$ be the means of the underlying probability distributions $f_1, \dots, f_m$. Let $g = \sum_i a_i \bar p_i$ satisfy $\sum_k g(k) = 1$ and $g(k) \ge 0$ for all $k$. Then

$$\mu_1 \le \operatorname{mean}(g) \le \mu_m \qquad (8)$$

up to the small terms discussed above. The equalities hold when $g = f_1$ or $g = f_m$.

In practice, the coefficients of the linear combinations of the $\bar p_i$'s corresponding to the components with smallest and largest mean can be estimated by solving the following linear programming optimization problems, respectively:

$$\min_a \operatorname{mean}\Big(\sum_i a_i \bar p_i\Big) \;\text{ subject to (4) and (5)} \qquad (9)$$

$$\max_a \operatorname{mean}\Big(\sum_i a_i \bar p_i\Big) \;\text{ subject to (4) and (5)}. \qquad (10)$$

The next section discusses estimation of the remaining components, assuming their number is larger than two.

B. Estimating Remaining Components

In this section we show that all remaining components can be estimated iteratively, one-by-one, by minimizing their inner product (effectively, overlap) with the components that have already been estimated, where the optimization is reduced to solving another linear programming problem. The following lemma shows this for $m = 3$, while its corollary extends the result to arbitrary $m$. Here and throughout, $\langle f, g \rangle = \sum_k f(k)g(k)$ designates the inner product and $\|f\| = \sqrt{\langle f, f \rangle}$ is the norm.

Lemma 2: Let $f_1$ and $f_3$ be the components with the smallest and largest mean, estimated using (9) and (10), and let $\langle f_2, f_1 \rangle + \langle f_2, f_3 \rangle < \min(\|f_1\|^2, \|f_3\|^2)$. Then the coefficients of $f_2 = \sum_i a_i \bar p_i$ are the solution of the following linear programming problem:

$$\min_a \; \langle g, f_1 \rangle + \langle g, f_3 \rangle, \quad g = \sum_i a_i \bar p_i, \;\text{ subject to (4) and (5)}. \qquad (11)$$

Proof: The sum of the inner products between $g$, an arbitrary linear combination of the $\bar p_i$'s written as $g = \sum_j b_j f_j$, and the estimated components is

$$S(b_1, b_2, b_3) = \langle g, f_1 \rangle + \langle g, f_3 \rangle = \sum_{j=1}^{3} b_j \left( \langle f_j, f_1 \rangle + \langle f_j, f_3 \rangle \right).$$

The partial derivatives of this function are

$$\frac{\partial S}{\partial b_j} = \langle f_j, f_1 \rangle + \langle f_j, f_3 \rangle.$$

Since $S$ is a linear function of the $b_j$'s and $\sum_j b_j = 1$, its minimum occurs at $b_{j^*} = 1$, where $j^*$ is the coordinate along which $S$ has the smallest slope, i.e.,

$$j^* = \arg\min_j \left( \langle f_j, f_1 \rangle + \langle f_j, f_3 \rangle \right).$$

According to the lemma conditions, $\langle f_2, f_1 \rangle + \langle f_2, f_3 \rangle < \min(\|f_1\|^2, \|f_3\|^2)$, hence $j^* = 2$. Therefore, the minimum value of $S$ is achieved when $b_2 = 1$ and $b_1 = b_3 = 0$, i.e., when $g = f_2$.

The necessary condition in Lemma 2 can be interpreted as sufficient separation between the underlying components, since these are non-negative. More specifically, it requires that the sum of the overlaps between $f_2$ and $f_1$ and between $f_2$ and $f_3$ is smaller than the norm of $f_1$ or $f_3$. This translates into a minimum signal-to-noise ratio (SNR) of 1.66 in the case of Gaussian components. This requirement can be easily satisfied for most medical images, where typical SNRs usually exceed 3.

In the case of more than three components, starting from the first two estimated components, all other components can be estimated iteratively one-by-one by minimizing their overlap with all previously estimated components, as shown in the following lemma.
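Problems (9) and (10) are ordinary linear programs in the coefficients $a_i$ and can be handed to any LP solver. A sketch using scipy.optimize.linprog (the paper's implementation used Matlab's linprog; function and variable names are ours), with the rows of P playing the role of the preserved components:

```python
import numpy as np
from scipy.optimize import linprog

def extreme_component(P, minimize_mean=True):
    """Estimate the component with the smallest (or largest) mean as a
    linear combination f = a @ P of basis components P (rows), subject
    to f >= 0 and sum(f) = 1, as in problems (9)-(10)."""
    m, K = P.shape
    bins = np.arange(K)
    c = P @ bins                     # mean(f) = c @ a is linear in a
    if not minimize_mean:
        c = -c                       # maximize by minimizing the negative
    res = linprog(c,
                  A_ub=-P.T, b_ub=np.zeros(K),              # f(k) >= 0
                  A_eq=P.sum(axis=1)[None, :], b_eq=[1.0],  # sum_k f(k) = 1
                  bounds=[(None, None)] * m)
    return res.x @ P
```

With two well-separated sources, feeding in any two mixtures of them as rows of P recovers the extreme sources exactly, since they sit at vertices of the feasible set.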
Lemma 3: Let $f_1, \dots, f_m$ be the underlying components, of which the first $r$ are already known. Let, for every unknown $i > r$ and every known $l \le r$,

$$\sum_{j=1}^{r} \langle f_i, f_j \rangle < \sum_{j=1}^{r} \langle f_l, f_j \rangle. \qquad (12)$$

Then $g = \sum_i a_i \bar p_i$, where the $a_i$ are solutions of the following linear programming problem, will coincide with one of the remaining unknown components:

$$\min_a \; \sum_{j=1}^{r} \langle g, f_j \rangle \;\text{ subject to (4) and (5)}. \qquad (13)$$

Proof: Let the function $S(b_1, \dots, b_m) = \sum_{j=1}^{r} \langle g, f_j \rangle$ with $g = \sum_l b_l f_l$. Then $\partial S / \partial b_l = \sum_{j=1}^{r} \langle f_l, f_j \rangle$. As mentioned in Lemma 2, the minimum of $S$ occurs when $b_{l^*} = 1$, where $l^*$ is the coordinate along which $S$ has the smallest slope. According to (12), the minimum must correspond to one of the unknown components, i.e., $l^* > r$.

Note that one of the terms on the right side of inequality (12) is equal to the norm of a component, while all other terms on both sides correspond to overlaps between components. Hence condition (12) can again be interpreted as sufficient separation between the underlying components.

IV. REDUCING THE EFFECT OF NOISE ON ESTIMATION OF PRINCIPAL COMPONENTS

A histogram is an approximation of the underlying probability distribution (or probability mass function, for images). The difference between the two is due to the finite set of data used for histogram evaluation and can be treated as noise. When identically distributed, this approximation noise is unlikely to alter the directions of the principal components, a property widely exploited by PCA-based denoising. However, the histogram noise may not be identically distributed, as shown in the comparison between a piecewise uniform probability mass function and its approximation by a histogram, evaluated using 2000 data points (Fig. 3). Note that the approximation noise has larger variance in the range [0.5, 1], where the original probability mass function is larger, and smaller variance in the range [0, 0.5], where the probability mass function is smaller.

Fig. 3. (a) Piecewise constant probability density function and (b) its approximation by a histogram.

The noise statistics can be derived analytically by noticing that histogram amplitudes have binomial distributions. Let $f_j$ be the underlying probability mass function of the $j$th image subvolume and $T$ be the number of data points (image voxels). The value of the histogram at the $k$th point can be thought of as a sum of $T$ binary random variables, each representing a random draw that has $f_j(k)$ chance of success, and hence has a binomial distribution with mean $T f_j(k)$ and variance $T f_j(k)(1 - f_j(k))$. The normalized histogram (the histogram scaled to sum to 1) then has mean $f_j(k)$ and variance $f_j(k)(1 - f_j(k))/T$, which means the approximation noise $n_j$ has zero mean and variance

$$\sigma_j^2(k) = \frac{f_j(k)\left(1 - f_j(k)\right)}{T}. \qquad (14)$$

Note that the variance increases almost linearly with $f_j(k)$, consistent with our observation in Fig. 3.

The influence of the histogram noise can be reduced by estimating its contribution to the covariance matrix and subsequently removing it. Let $X = X_0 + N$, where $X_0 = WF$. Then the covariance matrix can be written as follows:

$$\operatorname{cov}(X) = \operatorname{cov}(X_0) + \operatorname{cov}(X_0, N) + \operatorname{cov}(N, X_0) + \operatorname{cov}(N). \qquad (15)$$

The first term corresponds to the covariance matrix that has not been affected by noise. Since $X_0$ and $N$ are uncorrelated, the second and third terms can be neglected. The last term is the covariance matrix of the histogram noise. Its nondiagonal entries are negligible because the noise values at different bins are zero mean and uncorrelated. The diagonal entries can be approximated using (14). When all subvolumes have equal size $T$,

$$\operatorname{cov}(N)(k,k) \approx \frac{1}{n} \sum_{j=1}^{n} \frac{x_j(k)\left(1 - x_j(k)\right)}{T}. \qquad (16)$$

Subtracting the estimated $\operatorname{cov}(N)$ from (15) yields the corrected covariance matrix.

V. IMPLEMENTATION

Our algorithm was implemented fully in Matlab, using the built-in functions pcacov and linprog to estimate principal components and perform linear programming optimization. Note that pcacov was used instead of the SVD decomposition because it allows manipulation of the covariance matrix (Section IV). The implementation of our framework for component estimation is described in Fig. 4.
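The correction of Section IV amounts to subtracting the estimated noise variances (16) from the diagonal of the histogram covariance matrix before PCA. A numpy sketch under the equal-subvolume-size assumption (function and variable names are ours; the paper passes the corrected matrix to Matlab's pcacov):

```python
import numpy as np

def corrected_covariance(X, T):
    """Covariance of subvolume histograms with the binomial
    histogram-noise contribution removed, following (14)-(16).

    X : (n, K) matrix of normalized subvolume histograms.
    T : number of voxels per subvolume (assumed equal for all).

    The per-bin noise variance f(k)(1 - f(k))/T of (14) is
    approximated by plugging in the observed histograms."""
    C = np.cov(X, rowvar=False)                      # raw K x K covariance
    noise_var = (X * (1.0 - X)).mean(axis=0) / T     # estimate of (16)
    C[np.diag_indices_from(C)] -= noise_var          # subtract from diagonal
    return C
```

The corrected matrix can then be fed to an eigendecomposition-based PCA in place of the raw covariance.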
Fig. 4. Algorithm implementation.

The algorithm to estimate the first and $m$th components (line 7) is implemented as described in (9) and (10) of Lemma 1. The algorithm to estimate each additional component (line 10) is implemented as described in (13) of Lemma 3. The total running time of the algorithm is on the order of a few seconds for a typical 3D MR image.

During the initial testing on simulated data we discovered that the non-negative constraint imposed on the estimated components was too strict. The histogram noise and errors in estimating the principal subspace can lead to an infeasible optimization problem or a very narrow search space. To overcome this we relaxed the non-negativity constraint (5) to allow small negative values, with a bound expressed in terms of the number of histogram bins $K$. This negative bound was small enough not to cause any visible estimation problems in our experiments.

We have also found that intensity nonuniformity (smoothly varying intensity across the same tissue) and the presence of several tissues with slightly different intensities, e.g., subcortical structures in brain images, can increase the dimensionality of the original subspace. In this case some of the estimated principal components may represent slight variations in tissue appearance rather than a new tissue, resulting in a large overlap between the estimated components. This poses two problems. First, the appearance of two overlapping components representing the same tissue means that some other tissue will not be represented. Second, it stalls the optimization procedure, which requires sufficient separation between the components.

To overcome these issues we implemented a simple failure detection in line 11, which measures the overlap between the current component and all previously estimated components. The overlap here was defined as the intersection area between components, i.e., the area under the curve of the pointwise minimum of the two. If the overlap is greater than 2/3, an empirically set threshold, the counter is reset to zero and the number of principal components is increased by 1 to better capture the original subspace. The new PCA subspace with increased dimensionality will contain a mixture of components corresponding to distinct tissue distributions as well as some components corresponding to their intensity variations.

Note that despite the increased dimensionality of the new PCA subspace, the number of principal components will still be equal to the number of underlying tissue distributions, a required condition for the proofs of the lemmas. In other words, if intensity variation of a particular tissue gives rise to an additional principal component, this variation will be treated as a separate class of interest, albeit "inferior" to other classes. The large overlap between these inferior distributions and the main classes will guarantee that the former will not appear in the estimation loop output until all the latter are estimated, at which point the estimation process can be stopped.

VI. EXPERIMENTAL RESULTS

A. Estimating Tissue Intensity Distributions From Structural MRI Data

For performance evaluation we used T1-weighted brain images from two publicly available data sets, BrainWeb (http://www.bic.mni.mcgill.ca/brainweb/) and IBSR (http://www.cma.mgh.harvard.edu/ibsr/), and one data set acquired at our site (CNL). The BrainWeb data set contains realistic synthesized brain volumes with varying degrees of noise and intensity nonuniformity, and 1 x 1 x 1 mm resolution. The IBSR data set contains 18 real MR acquisitions made on a 1.5 T scanner with 1 x 1 x 1.5 mm resolution. The CNL data set contained 1 x 1 x 1 mm brain scans of 14 healthy aging adults (age 55–82), acquired on a Siemens Tim Trio 3 T scanner.

The BrainWeb and IBSR data sets contained the ground truth labeling for GM and WM, by construction for BrainWeb and manually segmented by an expert for IBSR. To obtain ground truth labeling for the CNL data set, these images were processed using FreeSurfer 3.04 [43] and the resultant pial and WM surfaces were then edited by an expert and converted to volume masks. In addition, the BrainWeb data set contained the ground truth labeling for CSF, which was not available for the IBSR and CNL data sets. To compensate for this we removed all non-brain tissues using GCUT, an in-house developed skull stripping technique [44]. Following this we defined the CSF label as the set of voxels preserved by the skull stripping but not belonging to the GM or WM labels. Note that the CSF region so defined may contain traces of non-brain tissues, left behind by the skull stripping procedure.

We further augmented the ground truth to include mix classes of partial volume voxels (GM-WM and CSF-GM). The partial volume voxels were defined as the voxels located near the boundary between two tissues. Practically, these were identified by performing a one voxel erosion of each tissue with a standard six-neighbor connectivity and selecting the eroded voxels as belonging to the mix classes.

All images were first skull stripped using GCUT [44] and then processed by our algorithm, set to estimate either three (only the main tissues) or five tissue intensity distributions. Shown in Fig. 5 are the results of applying our algorithm on images from the BrainWeb data set. Our algorithm exhibited excellent performance when only the main tissue classes were considered [Fig. 5(left)] and performed slightly worse when the mix classes were included, especially for large noise levels [Fig. 5(right)]. This decrease in the estimation quality was primarily due to the increased error in estimating tissue proportions rather than the distribution shapes.

Fig. 5. Estimating 3 classes [CSF GM WM] (left) and 3 pure classes [CSF GM WM] + 2 mix classes [CSF-GM GM-WM] (right) on BrainWeb data. Ground truth distributions are shown using dotted lines. (a) Noise = 3. (b) Noise = 5. (c) Noise = 7.

It is remarkable that our algorithm was able to capture the two-peak shape of the CSF-GM mix class distribution, which appears at the lower noise levels [Fig. 5(a) and (b)]. This would not be possible with a Gaussian mixture model, which represents each distribution by a single-peak Gaussian. Our algorithm also compares favorably with several other approaches, which were evaluated on a noisy BrainWeb volume in [29]—compare Fig. 5(c) here with Fig. 6 in [29]. Our algorithm also performed well on real MRI data from the IBSR and CNL data sets [Fig. 6(a)].

Fig. 6. Estimating three classes [CSF GM WM] on IBSR volume 5 (left) and a CNL volume (right). Ground truth distributions are shown using dotted lines. (a) Our algorithm. (b) SPM 8.

For quantitative performance evaluation we used two measures. One was the average Pearson correlation between estimated and true distributions of each class; its goal was to capture the quality of distribution shape estimation. The other was the absolute difference between the estimated and the true proportions for each class. The summary of the performance results for the BrainWeb and IBSR/CNL data sets is provided in Tables I and II, respectively. Overall, our approach achieved excellent average correlations of 0.88–0.97 and average proportion errors of less than 0.06.

TABLE I. CORRELATION AND PROPORTION ERROR BETWEEN ESTIMATED AND TRUE DISTRIBUTIONS, AVERAGED OVER BRAINWEB VOLUMES (NOISE LEVELS 1, 3, 5, AND 7). * INDICATES p < 0.005.

TABLE II. CORRELATION AND PROPORTION ERROR BETWEEN ESTIMATED AND TRUE DISTRIBUTIONS, AVERAGED OVER 18 IBSR VOLUMES AND 14 CNL VOLUMES. (A) = PC RELAXATION, (B) = HISTOGRAM ERROR CORRECTION. ** INDICATES p < 0.001.
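The two evaluation measures can be computed directly from estimated and ground-truth distributions; a small sketch (function and variable names are ours):

```python
import numpy as np

def distribution_metrics(f_est, f_true, w_est, w_true):
    """The two measures of Section VI-A: average per-class Pearson
    correlation between estimated and true distributions (shape
    quality) and average absolute error of the class proportions."""
    corr = np.array([np.corrcoef(e, t)[0, 1]
                     for e, t in zip(f_est, f_true)])
    prop_err = np.abs(np.asarray(w_est) - np.asarray(w_true))
    return corr.mean(), prop_err.mean()
```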
For reference, we compared these results with those obtained by well-known SPM 8 segmentation software package, which SPM with the default settings using 10 Gaussian distributions models the image histogram using a mixture of Gaussians and for GM, 2 for WM, 6 for CSF + other structures, which were iterates between estimating mixture parameters, tissue classi- then linearly combined to obtain the distributions of the three ﬁcation, intensity nonuniformity, and registration [16]. We ran main brain tissues. Overall, SPM performed worse compared to ZAGORODNOV AND CIPTADI: COMPONENT ANALYSIS APPROACH TO ESTIMATION OF TISSUE INTENSITY DISTRIBUTIONS OF 3D IMAGES 845 distributions and the proportion of the activated (smaller) class to obtain different samples for our experiments. The signal-to- noise ratio (SNR), deﬁned as the ratio of the difference between the means and the standard deviation, was varied from 2 to 6. The proportion of the activated class was varied from 0.1% to 16%. Instead of providing correlations and proportion errors as in Section VI-A, we used the estimated distributions to directly de- Fig. 7. Estimating three classes [CSF GM WM] on BrainWeb data with 5% termine the threshold that minimizes the misclassiﬁcation error noise and (a) 20% or (b) 40% nonuniformity. Ground truth distributions are (the sum of Type I and Type II errors). We then recorded the shown using dotted lines. percentage increase in misclassiﬁcation error when using the estimated threshold versus the optimal one, derived using the knowledge of the true distributions. For reference, we have com- our method, achieving correlations of 0.81–0.85 and proportion pared the results with those obtained using our own implementa- errors of 0.05–0.13 (Table II). Considering that the CSF class tion of the standard EM algorithm. 
To achieve the best possible was not as clearly deﬁned as the GM and WM classes, we have performance, the EM algorithm was initialized with the true pa- also included results restricted to the GM and WM classes in the rameter values, e.g., means, variances, and proportions derived last two columns of the table. A typical SPM output is shown in from the true distributions. Fig. 6(b). The results are summarized in Fig. 8. The performance of our We have also evaluated the effects of intensity nonuniformity algorithm was the same throughout the chosen range of SNRs, and individual algorithm components/settings (subvolume size, and practically coincided with the optimal performance for pro- relaxation of the number principal components, correction of the portions 0.68%–1% and larger. There was a signiﬁcant drop covariance matrix) on the estimation performance. Changing in performance below 0.68%–1% critical failure point. Perfor- subvolume size from 3 3 3 cuboids to 20 20 20 cuboids mance of the EM-based estimation was signiﬁcantly worse than to sagittal slices had little effect on performance (Table I). How- that of our approach at – , and was comparable to ever, the average correlations decreased in the presence of in- ours at . However, considering that EM algorithm was tensity nonuniformity, and the change was statistically signif- initialized with the true parameter values, and real life perfor- icant . As shown in Fig. 7, nonuniformity causes mance with random initialization is expected to be worse, our the estimated distributions with minimum and maximum means approach clearly offers a superior alternative to the EM algo- to become narrower than the true distributions, which explains rithm in this application. the decreased correlations. 
Unlike in the case of structural MRI data, the relaxation of the number of principal components did not have any discernible effect on the performance in this experiment. On the other hand, the histogram error correction had a significant impact: its omission made our algorithm perform worse than the EM algorithm (Fig. 8). Using larger subvolumes also decreased the performance, with the critical failure proportion moving from 0.68%–1% to 1.35% for 10×10×10 subvolumes and to 6% for 20×20×20 subvolumes.

VII. DISCUSSION

We presented a novel nonparametric, noniterative approach to estimating the underlying tissue probability distributions from 3D image volumes. The main idea is to treat estimation as an instance of a blind source separation problem, constraining the underlying components to be non-negative, to sum to one, and to be sufficiently separated from each other.
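The blind source separation view can be illustrated with a toy construction: each subvolume histogram is a convex mixture of the unknown tissue distributions, so the matrix of subvolume histograms has low rank and PCA recovers the subspace spanned by the sources. A self-contained sketch (the bin grid, source parameters, and variable names are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
bins = np.linspace(-5.0, 5.0, 101)
centers = 0.5 * (bins[:-1] + bins[1:])

def gaussian_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Two "tissue" source distributions (unknown in practice), discretized
# on the histogram bins and normalized to sum to one.
sources = np.stack([gaussian_pdf(centers, -1.5, 0.7),
                    gaussian_pdf(centers, 1.5, 0.7)])
sources /= sources.sum(axis=1, keepdims=True)

# Each subvolume histogram is a mixture of the sources, with weights
# given by the (unknown) tissue proportions inside that subvolume.
n_subvolumes = 500
weights = rng.dirichlet([1.0, 1.0], size=n_subvolumes)  # rows sum to 1
histograms = weights @ sources                           # mixture matrix

# Every centered mixture row lies in a one-dimensional subspace, so all
# singular values after the first are (numerically) zero.
u, s, vt = np.linalg.svd(histograms - histograms.mean(axis=0),
                         full_matrices=False)
```

In the noiseless two-source case the centered histogram matrix has rank one; with real data the tissue subspace corresponds to the leading principal components, and histogram noise spreads across the minor components that are discarded.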
It is important to highlight the differences between the proposed approach and related techniques, such as ICA and non-negative matrix factorization (NMF) [47]. All three approaches seek a linear decomposition of the data matrix X as X = AS, where the rows of S contain the basis vectors and the rows of A contain the mixing weights. The proposed approach is similar to ICA in using a two-stage decomposition, a PCA step followed by estimation of the remaining unmixing transform, so that the overall mixing matrix is the product of the two factors. Another similarity between ICA and the proposed approach is that the rows of S are estimated one by one, by optimizing a certain objective function. The main novelty of our approach, compared to ICA, is the choice of this objective function, which is designed to minimize or maximize the means of the components and to minimize their overlap with previously estimated components, rather than to maximize some measure of non-Gaussianity. Our estimated components are also constrained to be non-negative and to sum to 1. This results in a completely different optimization procedure: iterative and locally optimal for ICA, noniterative and globally optimal for our approach.

NMF differs from ICA and the proposed approach in that the matrices A and S are estimated simultaneously, rather than row by row. Our approach has constraints similar to those of NMF: the matrices A and S in NMF are required to be non-negative, and basis sparsity can be encouraged [48]. Nevertheless, the optimization procedure used in NMF is iterative and only locally optimal, and testing its suitability for the current application is left as future work.

Note that our approach does not use any regularization or smoothness constraints on the shape of the estimated tissue distributions. The resultant smoothness is due to enforcing the distributions to be linear combinations of the main principal components. These components are expected to be smooth due to the denoising effect of PCA: the noise spreads across many minor PCA components and is removed when these components are discarded.

Fig. 8. Percentage increase in misclassification error (performance drop) as a function of the smaller class proportion for (a) SNR = 2, (b) SNR = 4, (c) SNR = 6, for our approach (with and without error correction) versus the EM algorithm.

The chosen experiments aimed to highlight the capabilities of our algorithm and its advantages over standard approaches based on the EM algorithm. In the first experiment (estimating brain tissue intensity probability distributions from structural MRI data) our algorithm outperformed SPM even though the latter relied on prior tissue probability maps and intermediate tissue classification. The underlying distributions were not Gaussian due to the presence of partial volume voxels, intensity nonuniformity, and natural tissue variability, and these non-Gaussian shapes were better captured by our method. Since our method does not require any prior information, this suggests its possible deployment in such applications as developmental studies on humans, segmentation of abdominal adipose tissue, tumor segmentation, functional MRI, etc.

In the second experiment (simulated functional MRI data) the goal was to test the robustness of our approach in a scenario where one class (activated voxels) is substantially smaller than the other. Even when initialized with the true parameter values, the EM algorithm performed worse than our method for low SNRs. This can be explained by the small peak of the activated voxels' distribution being completely merged with the tail of the null distribution. This problem does not affect our method, because of the use of subvolume histograms rather than the whole image histogram. In subvolumes containing activated voxels, the numbers of activated and nonactivated voxels will be similar to each other, preventing the tail of the null distribution from overpowering the peak of activated voxels. For example, when a 5×5×5 subvolume fully covers a 3×3×3 activated region, the subvolume will contain 27 activated voxels and 98 nonactivated voxels.

The chosen experiments also highlighted the advantages of the histogram noise correction (Section IV) and the relaxation of the number of PCA components (Section V). The latter was essential for structural MRI data, but not for functional MRI data, possibly due to the absence of the intensity nonuniformity artifact. Note that the raw fMRI volumes do exhibit intensity nonuniformity, but the classification is applied to statistical parametric maps, where voxel intensities correspond to the likelihood of activation. Such maps are generated based on linear regression of the task time course from the voxel time courses and hence do not contain intensity nonuniformity. The histogram noise correction was beneficial for fMRI data only. This is likely due to the small size of the activated regions, which makes the proportion of their distribution so small that it becomes comparable to the histogram noise.

Increasing the subvolume size did not affect the performance for structural MRI data and reduced it for simulated functional MRI data. This is expected, considering that the estimated principal components can be thought of as weighted averages of subvolume histograms. Increasing the subvolume size reduces the noise variance of each measurement (subvolume histogram) but also reduces the number of measurements (the number of subvolumes) by exactly the same factor, and the two changes cancel each other. The decrease in performance for functional MRI data was due to the increased difficulty of detecting a small peak inside the tail of the null distribution. For example, a 10×10×10 subvolume fully covering a 3×3×3 activated region will contain only 27 activated voxels and 973 nonactivated voxels, approximately 50 of which will be located in the tail of the null distribution (beyond two standard deviations).

While the presence of intensity nonuniformity does not lead to failure of our algorithm, thanks to the proposed relaxation of the number of PCA components, it has an adverse effect on performance by causing the estimated distributions with minimum and maximum means to become narrower than otherwise expected. This can be explained by nonuniformity causing two (or possibly more) principal components to represent a single tissue distribution. These components can be combined to yield distributions that are narrower than the true distribution and are aligned with its left or right border. Hence, maximizing the mean will yield a narrower distribution aligned with the right border of the true distribution, while minimizing the mean will lead to alignment with the left border, as shown in Fig. 7. Real-life applications, such as brain morphometry analysis, employ intensity nonuniformity correction as a mandatory first step in the data processing pipeline. The remaining (after correction) nonuniformity is likely to be much smaller than the 20%–40% used in our testing and hence is expected to have a minimal effect on performance.
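The voxel-counting argument about subvolume size can be checked directly. A quick sketch of the arithmetic (reading "beyond the two standard deviations" as the two-sided 2-sigma tail, a fraction of roughly 4.6%, is my assumption):

```python
import math

def subvolume_counts(subvolume_side, region_side=3):
    """Activated vs. nonactivated voxel counts when a cubic subvolume
    fully covers a cubic activated region."""
    activated = region_side ** 3
    nonactivated = subvolume_side ** 3 - activated
    return activated, nonactivated

def tail_count(n, k_sigma=2.0):
    """Expected number out of n Gaussian null samples falling beyond
    k_sigma standard deviations (two-sided)."""
    frac = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(k_sigma / math.sqrt(2.0))))
    return n * frac
```

Here `subvolume_counts(5)` gives 27 activated versus 98 nonactivated voxels and `subvolume_counts(10)` gives 27 versus 973; `tail_count(973)` is roughly 44, of the same order as the "approximately 50" quoted in the text.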
REFERENCES

[1] T. Lei and W. Sewchand, “Statistical approach to X-ray CT imaging and its applications in image analysis,” IEEE Trans. Med. Imag., vol. 11, no. 1, pp. 62–69, Mar. 1992.
[2] F. O’Sullivan, “Imaging radiotracer model parameters in PET: A mixture analysis approach,” IEEE Trans. Med. Imag., vol. 12, no. 3, pp. 399–412, Sep. 1993.
[3] A. Lundervold and G. Storvik, “Segmentation of brain parenchyma and cerebrospinal fluid in multispectral magnetic resonance images,” IEEE Trans. Med. Imag., vol. 14, no. 2, pp. 339–349, Jun. 1995.
[4] W. M. Wells, W. L. Grimson, R. Kikinis, and F. A. Jolesz, “Adaptive segmentation of MRI data,” IEEE Trans. Med. Imag., vol. 15, no. 4, pp. 429–442, Aug. 1996.
[5] J. Ashburner and K. Friston, “Multimodal image coregistration and partitioning—A unified framework,” NeuroImage, vol. 6, no. 3, pp. 209–217, 1997.
[6] S. Ruan, C. Jaggi, J. Xue, J. Fadili, and D. Bloyet, “Brain tissue classification of magnetic resonance images using partial volume modeling,” IEEE Trans. Med. Imag., vol. 19, no. 12, pp. 1179–1187, Dec. 2000.
[7] Y. Zhang, M. Brady, and S. Smith, “Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm,” IEEE Trans. Med. Imag., vol. 20, no. 1, pp. 45–57, Jan. 2001.
[8] K. Van Leemput, F. Maes, D. Vandermeulen, and P. Suetens, “A unifying framework for partial volume segmentation of brain MR images,” IEEE Trans. Med. Imag., vol. 22, no. 1, pp. 105–119, Jan. 2003.
[9] H. Park, P. H. Bland, and C. R. Meyer, “Construction of an abdominal probabilistic atlas and its application in segmentation,” IEEE Trans. Med. Imag., vol. 22, no. 4, pp. 483–492, Apr. 2003.
[10] G. Dugas-Phocion, M. A. G. Ballester, G. Malandain, C. Lebrun, and N. Ayache, “Improved EM-based tissue segmentation and partial volume effect quantification in multi-sequence brain MRI,” in Int. Conf. Med. Image Computing Computer Assisted Intervent. (MICCAI), Saint-Malo, France, 2004, vol. 3216, pp. 26–33.
[11] M. H. Cardinal, J. Meunier, G. Soulez, R. L. Maurice, E. Therasse, and G. Cloutier, “Intravascular ultrasound image segmentation: A three-dimensional fast-marching method based on gray level distributions,” IEEE Trans. Med. Imag., vol. 25, no. 5, pp. 590–601, May 2006.
[12] F. G. Meyer and X. Shen, “Classification of fMRI time series in a low-dimensional subspace with a spatial prior,” IEEE Trans. Med. Imag., vol. 27, no. 1, pp. 87–98, Jan. 2008.
[13] M. Figueiredo and A. Jain, “Unsupervised learning of finite mixture models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 3, pp. 381–396, Mar. 2002.
[14] J. Tohka, E. Krestyannikov, I. D. Dinov, A. M. Graham, D. W. Shattuck, U. Ruotsalainen, and A. W. Toga, “Genetic algorithms for finite mixture model based voxel classification in neuroimaging,” IEEE Trans. Med. Imag., vol. 26, no. 5, pp. 696–711, May 2007.
[15] P. Schroeter, J. M. Vesin, T. Langenberger, and R. Meuli, “Robust parameter estimation of intensity distributions for brain magnetic resonance images,” IEEE Trans. Med. Imag., vol. 17, no. 2, pp. 172–186, Apr. 1998.
[16] J. Ashburner and K. J. Friston, “Unified segmentation,” NeuroImage, vol. 26, no. 3, pp. 839–851, 2005.
[17] Z. Liang, J. R. Macfall, and D. P. Harrington, “Parameter estimation and tissue segmentation from multispectral MR images,” IEEE Trans. Med. Imag., vol. 13, no. 3, pp. 441–449, Sep. 1994.
[18] C. A. Bouman and M. Shapiro, “A multiscale random field model for Bayesian image segmentation,” IEEE Trans. Image Process., vol. 3, no. 2, pp. 162–177, Mar. 1994.
[19] H. Greenspan, A. Ruf, and J. Goldberger, “Constrained Gaussian mixture model framework for automatic segmentation of MR brain images,” IEEE Trans. Med. Imag., vol. 25, no. 9, pp. 1233–1245, Sep. 2006.
[20] B. Fischl, D. H. Salat, E. Busa, M. Albert, M. Dieterich, C. Haselgrove, A. van der Kouwe, R. Killiany, D. Kennedy, S. Klaveness, A. Montillo, N. Makris, B. Rosen, and A. M. Dale, “Whole brain segmentation: Automated labeling of neuroanatomical structures in the human brain,” Neuron, vol. 33, no. 3, pp. 341–355, 2002.
[21] Y. Zhang, M. Brady, and S. Smith, “Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm,” IEEE Trans. Med. Imag., vol. 20, no. 1, pp. 45–57, Jan. 2001.
[22] S. M. Smith, M. Jenkinson, M. W. Woolrich, C. F. Beckmann, T. E. Behrens, H. Johansen-Berg, P. R. Bannister, M. D. Luca, I. Drobnjak, and D. E. Flitney, “Advances in functional and structural MR image analysis and implementation as FSL,” NeuroImage Math. Brain Imag., vol. 23, pp. S208–S219, 2004.
[23] H. Gudbjartsson and S. Patz, “The Rician distribution of noisy MRI data,” Magn. Reson. Med., vol. 34, no. 6, pp. 910–914, 1995.
[24] P. Santago and H. D. Gage, “Quantification of MR brain images by mixture density and partial volume modeling,” IEEE Trans. Med. Imag., vol. 12, no. 3, pp. 566–574, Sep. 1993.
[25] D. W. Shattuck, S. R. Sandor-Leahy, K. A. Schaper, D. A. Rottenberg, and R. M. Leahy, “Magnetic resonance image tissue classification using a partial volume model,” NeuroImage, vol. 13, no. 5, pp. 856–876, 2001.
[26] D. H. Laidlaw, K. W. Fleischer, and A. H. Barr, “Partial-volume Bayesian classification of material mixtures in MR volume data using voxel histograms,” IEEE Trans. Med. Imag., vol. 17, no. 1, pp. 74–86, Feb. 1998.
[27] J. Ashburner and K. J. Friston, “Voxel-based morphometry—The methods,” NeuroImage, vol. 11, no. 6, pt. 1, pp. 805–821, 2000.
[28] P. Santago and H. D. Gage, “Statistical models of partial volume effect,” IEEE Trans. Image Process., vol. 4, no. 11, pp. 1531–1540, Nov. 1995.
[29] M. B. Cuadra, L. Cammoun, T. Butz, O. Cuisenaire, and J.-P. Thiran, “Comparison and validation of tissue modelization and statistical classification methods in T1-weighted MR brain images,” IEEE Trans. Med. Imag., vol. 24, no. 12, pp. 1548–1565, Dec. 2005.
[30] S. Sanjay-Gopal and T. J. Hebert, “Bayesian pixel classification using spatially variant finite mixtures and the generalized EM algorithm,” IEEE Trans. Image Process., vol. 7, no. 7, pp. 1014–1028, Jul. 1998.
[31] K. Van Leemput, F. Maes, D. Vandermeulen, and P. Suetens, “Automated model-based tissue classification of MR images of the brain,” IEEE Trans. Med. Imag., vol. 18, no. 10, pp. 897–908, Oct. 1999.
[32] J. L. Marroquin, B. C. Vemuri, S. Botello, F. Calderon, and A. Fernandez-Bouzas, “An accurate and efficient Bayesian method for automatic segmentation of brain MRI,” IEEE Trans. Med. Imag., vol. 21, no. 8, pp. 934–945, Aug. 2002.
[33] K. M. Pohl, W. M. Wells, A. Guimond, K. Kasai, M. E. Shenton, R. Kikinis, W. E. L. Grimson, and S. K. Warfield, “Incorporating nonrigid registration into expectation maximization algorithm to segment MR images,” in Int. Conf. Med. Image Computing Computer Assist. Intervent. (MICCAI), 2002.
[34] M. Prastawa, “Automatic brain tumor segmentation by subject specific modification of atlas priors,” Acad. Radiol., vol. 10, no. 12, pp. 1341–1348, 2003.
[35] J. Nuyts, P. Dupont, S. Stroobants, A. Maes, L. Mortelmans, and P. Suetens, “Evaluation of maximum-likelihood based attenuation correction in positron emission tomography,” IEEE Trans. Nucl. Sci., vol. 46, no. 4, p. 1136, Aug. 1999.
[36] A. Ciptadi, C. Chen, and V. Zagorodnov, “Component analysis approach to estimation of tissue intensity distributions of 3D images,” in IEEE Int. Conf. Comput. Vis. (ICCV), Kyoto, Japan, 2009, p. 1765.
[37] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and E. Moulines, “A blind source separation technique using second-order statistics,” IEEE Trans. Signal Process., vol. 45, no. 2, pp. 434–444, Feb. 1997.
[38] J.-F. Cardoso, “Infomax and maximum likelihood for blind source separation,” IEEE Signal Process. Lett., vol. 4, no. 4, pp. 112–114, Apr. 1997.
[39] A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: Wiley-Interscience, 2001.
[40] R. Nagarajan and C. A. Peterson, “Identifying spots in microarray images,” IEEE Trans. Nanobiosci., vol. 1, no. 2, pp. 78–84, Jun. 2002.
[41] D. Lawley and A. Maxwell, “Factor analysis as a statistical method,” Statistician, vol. 12, no. 3, pp. 209–229, 1962.
[42] H. Attias, “Independent factor analysis,” Neural Computation, vol. 11, no. 4, 1999.
[43] A. M. Dale, B. Fischl, and M. I. Sereno, “Cortical surface-based analysis: I. Segmentation and surface reconstruction,” NeuroImage, vol. 9, no. 2, pp. 179–194, 1999.
[44] S. Sadananthan, W. Zheng, M. Chee, and V. Zagorodnov, “Skull stripping using graph cuts,” NeuroImage, vol. 49, no. 1, pp. 225–239, 2010.
[45] B. R. Logan and D. B. Rowe, “An evaluation of thresholding techniques in fMRI analysis,” NeuroImage, vol. 22, no. 1, pp. 95–108, 2004.
[46] N. V. Hartvig and J. L. Jensen, “Spatial mixture modeling of fMRI data,” Hum. Brain Mapp., vol. 11, no. 4, pp. 233–248, 2000.
[47] D. D. Lee and H. S. Seung, “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, no. 6755, pp. 788–791, 1999.
[48] P. O. Hoyer, “Non-negative matrix factorization with sparseness constraints,” J. Mach. Learn. Res., vol. 5, pp. 1457–1469, 2004.