Finite Mixture Distributions

Morgane Bergot
Supervisor: Shibdas Bandyopadhyay
9 June – 16 September 2006

Abstract

Finite mixture distributions are a popular tool for modeling unobserved heterogeneity, and the number of fields in which mixture models have been successfully applied is still growing. In this paper, we give some concrete applications of finite mixtures, along with basic definitions and concepts. We then present the classical method of moments and the most commonly used Maximum Likelihood Estimation (MLE) via the EM algorithm to fit the mixture, and study the particular case of two-component univariate normal mixtures using artificial data. The goodness-of-fit is assessed in these cases using the χ² test. For well-fitted mixtures, we propose two rules to classify historical data, when available, into the two marginal distributions. An application of classification and its use is given through a concrete example from genetics.
Contents

Introduction
1 General Introduction
  1.1 Finite Mixture Models
    1.1.1 Initial Approach to Mixture Analysis
    1.1.2 Applications in Genetics
  1.2 Basic Definitions
  1.3 Number of Components
  1.4 Identifiability of Mixture Distributions
2 Estimation of Mixture Distributions
  2.1 Introduction
  2.2 Method of Moments
  2.3 Maximum Likelihood Fitting
    2.3.1 Maximum Likelihood Estimation
    2.3.2 EM Algorithm
    2.3.3 Starting Values for the EM Algorithm
  2.4 Goodness-of-Fit
  2.5 Fitting Normal Mixtures
    2.5.1 Introduction
    2.5.2 Mixture of Two Well-Separated Normals
    2.5.3 Mixture of Two Close Normals
3 Classification
  3.1 Introductory Example
  3.2 Rules of Classification
  3.3 Study of the Rules
    3.3.1 Mixture of Two Well-Separated Normals
    3.3.2 Mixture of Two Close Normals
  3.4 Application to Hemochromatosis
    3.4.1 Case Study
    3.4.2 Results
  3.5 Perspective for a Future Work
Conclusion
Appendices

Introduction

This report is concerned with statistical distributions which can be expressed as sums of usually simple component distributions. Such superpositions are termed mixture distributions. In this text we are concerned with mixtures involving a known finite number of components, which are termed finite mixtures.

Finite mixture distributions have provided a mathematically based approach to the statistical modeling of a wide variety of random phenomena. Because of their usefulness as an extremely flexible method of modeling, finite mixture models have continued to receive increasing attention over the years, from both practical and theoretical points of view. Indeed, in the past decade the extent and the potential of applications of finite mixture models have widened considerably. Fields in which mixture models have been successfully applied include astronomy, biology, genetics, medicine, psychiatry, economics, engineering, and marketing, among many others in the biological, physical and social sciences.
A mixture model is able to represent quite complex distributions through an appropriate choice of its components to model accurately the local areas of support of the true distribution. The problem of central interest arises when data are not available for each distribution separately, but only for the overall mixture distribution. Often such situations arise because it is impossible to observe some underlying variable which splits the observations into groups – only the combined distribution can be studied. In these circumstances, interest often focuses on estimating the mixing proportions and the parameters of the conditional distributions.

A remarkable variety of estimation methods has been applied to finite mixture problems, such as graphical methods, the method of moments, maximum likelihood, minimum chi-square, least squares approaches and Bayesian approaches. In this report, the emphasis is on the fitting of mixtures by the method of moments and by maximum likelihood with the EM algorithm.

We are concerned here with the issue of classifying data into categories given by the fitting of a univariate mixture of normals from a set of previously available data. Our goal is to determine into which category newly given data ought to be assigned, and to do so on the basis of several rules we shall study. This type of classification could be used in many applications in genetics.

In the first part, we present some concrete applications of finite mixtures, and give basic definitions and concepts. In the second part, we focus on the numerical methods that will be used to fit mixtures of normals, and study the goodness of fit with a specific criterion. In the third part, we attempt to classify data according to different rules, and apply the study to a concrete case taken from genetics. Finally, we conclude and present some restrictions on the use of the present work.
1 General Introduction

1.1 Finite Mixture Models

1.1.1 Initial Approach to Mixture Analysis

The first major analysis involving the use of finite mixture models was undertaken by Pearson (1894), who fitted a mixture of two normal probability density functions to data provided by Weldon (1892, 1893). The data set analyzed by Pearson consisted of measurements of the ratio of forehead to body length of n = 1000 crabs sampled from the Bay of Naples. These measurements, which were recorded in the form of v = 29 intervals, are displayed in Figure 1, along with the plot of the density of the normal mixture distribution fitted to them and its two components. Weldon had speculated that the asymmetry in the histogram of these data might be a signal that this population was evolving toward two new subspecies. Pearson used the method of moments he developed to obtain an excellent fit and interpreted the presence of two components as evidence that there were two species of crabs.

1.1.2 Applications in Genetics

One area in which mixture modeling has played a very important role is genetics, following Pearson's pioneering work in the area in the late 1800s. The reader is referred to Schork, Allison and Thiel (1996) for a comprehensive review of the theory and application of mixture models in human genetics. A basic mixture model can be used to assess the impact of possible underlying genotypes associated with a particular locus on phenotypes that display continuous or quantitative variation in the population (for example, blood pressure, height, weight).

[Figure 1: Plot of forehead to body length data on 1000 crabs, of the fitted two-component normal mixture model (red line) and its two components (green lines).]

Consider a locus with two alleles, denoted A and a. There are three possible genotypes an individual can possess at this locus: AA, Aa (equivalent to aA) and aa.
Suppose the phenotype or trait values associated with individuals possessing the genotype AA, the heterozygote genotype Aa and the homozygote genotype aa are distributed N(µ1, σ1²), N(µ2, σ2²), and N(µ3, σ3²), respectively. The proportions of these genotypes in the population at large are denoted pAA, pAa, and paa. If yi denotes the phenotype of interest, then it has the normal mixture distribution given by

    g(yi) = π1 N(µ1, σ1²) + π2 N(µ2, σ2²) + π3 N(µ3, σ3²)        (1.1)

where π1 = pAA, π2 = pAa, and π3 = paa. The genotype frequencies π1, π2, and π3 are usually written in terms of the allele frequencies pA and pa, using the Hardy-Weinberg equilibrium (HWE) structure,

    π1 = pA²,   π2 = 2 pA pa,   π3 = pa²        (1.2)

Under the additive model, we have µ3 − µ2 = µ2 − µ1.

An alternative genetic model to the one above is the simple dominance model, in which the genotypes AA and Aa have the same phenotype, with frequency π1 = pA² + 2 pA pa. This leads to a two-component normal mixture model with components corresponding to the phenotypes AA and Aa, and to aa, in proportions π1 = pA² + 2 pA pa and π2 = pa², respectively.

A recent focus of mixture models in genetics is the construction of gene marker maps for parentage testing, disease resistance diagnosis, and quantitative trait loci (QTL) detection. See Doerge, Zeng and Weir (1997), Kao and Zeng (1997), and the references therein.

1.2 Basic Definitions

Suppose that a random variable X takes values in a sample space X, and that its distribution can be represented by a probability density function (or mass function in the case of discrete X) of the form

    g(x) = p1 f1(x) + p2 f2(x) + . . . + pc fc(x),   x ∈ X        (1.3)

where

    pj ∈ [0, 1],   Σ_{j=1}^{c} pj = 1

and

    fj(·) ≥ 0,   ∫_X fj(x) dx = 1,   j = 1, . . . , c

We shall say that X has a finite mixture distribution and that g(x) is a finite mixture density function. The parameters p1, . . . , pc are called the mixing weights, and f1(x), . . .
, fc(x) the component densities of the mixture. There is no requirement that the component densities should all belong to the same parametric family, but since this is the case in many applications, we will restrict our attention throughout this project to the simplest case, where f1(x), . . . , fc(x) have a common functional form but different parameters. We can then write fj(x) = f(x|θj), where θj denotes the parameters occurring in fj(x). The finite mixture density function will then have the form

    g(x|θ) = Σ_{j=1}^{c} pj f(x|θj),   x ∈ X        (1.4)

where θ = (p1, . . . , pc, θ1, . . . , θc) is the complete collection of all distinct parameters occurring in the mixture model.

1.3 Number of Components

The most fundamental "parameter" in the definition of a finite mixture is c, the number of component subpopulations. Its importance results partly from technical considerations, in that without knowing c we cannot go further in the process of estimating the mixture density; but in many contexts, uncertainty about the number of components leads to statistical problems closely related to cluster analysis, possibly with strong assumptions about parametric structure (see McLachlan and Peel (2000), Section 1.15). In many direct applications, however, there are situations where we believe in the existence of c underlying categories, or groups, such that the experimental unit on which the observation X is made belongs to one of these categories. In the following, we will therefore restrict our attention to the simple case where c is known.

1.4 Identifiability of Mixture Distributions

The assumption of identifiability, i.e. the existence of a unique characterization for each member of the class of models being considered, lies at the core of most statistical theory and practice. It is important to consider identifiability in practice because, without it, estimation procedures are not likely to be well defined. Mixture models can present particular difficulties with identifiability.
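Before turning to identifiability, definition (1.4) can be made concrete: a mixture density is simply a weighted sum of component densities. A minimal sketch in Python with NumPy, taking the common functional form f to be the normal density purely for illustration (the parameter values are made up):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    # common component form f(x | theta), with theta = (mu, sigma)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def mixture_pdf(x, weights, thetas, f=normal_pdf):
    # g(x | theta) = sum_j p_j f(x | theta_j), as in (1.4)
    return sum(p * f(x, *theta) for p, theta in zip(weights, thetas))

# Sanity check: two identical components with equal weights collapse to a
# single normal density, exactly as (1.4) predicts.
value = mixture_pdf(0.0, [0.5, 0.5], [(0.0, 1.0), (0.0, 1.0)])
```

Any other component family with a common functional form could be substituted for `normal_pdf` without changing `mixture_pdf`.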
Let F = { f(x; θ) | θ ∈ Ω, x ∈ R^d } be the class of d-dimensional distribution functions from which mixtures are to be formed. We identify the class of finite mixtures of F with the appropriate class of distribution functions H, defined by

    H = { H(x) = Σ_{j=1}^{c} pj fj(x) | pj > 0, Σ_{j=1}^{c} pj = 1, fj(·) ∈ F, c ∈ N, x ∈ R^d }

so that H is the convex hull of F. Suppose (H, H′) ∈ H², given by

    H = Σ_{j=1}^{c} pj fj,   H′ = Σ_{j=1}^{c′} p′j f′j

and that H ≡ H′ if and only if c = c′ and we can order the summations such that pj = p′j and fj = f′j for j = 1, . . . , c. Then H is identifiable. Univariate normal, exponential and Poisson mixtures are identifiable, provided the θj's are all different, and a sufficient condition of identifiability for univariate binomial mixtures is n ≥ 2c − 1. See Everitt and Hand (1981), Titterington, Smith and Makov (1985), Yakowitz and Spragins (1968) and references therein for details. We will place our discussion under these hypotheses.

2 Estimation of Mixture Distributions

2.1 Introduction

Over the years, a variety of approaches have been used to estimate mixture distributions. They include graphical methods, the method of moments, minimum-distance methods, maximum likelihood, and Bayesian approaches. As surmised by Titterington (1996), perhaps the main reason for the huge literature on estimation methodology for mixtures is the fact that explicit formulas for parameter estimates are typically not available. As discussed in Section 1.1.1, it can be argued that the problem of fitting mixture models started in earnest with the work of Pearson (1894) on the estimation of the parameters of a mixture of two univariate normal components by the method of moments. This method will be our first approach to the estimation of mixture distributions. Yet, since the advent of the EM algorithm, maximum likelihood (ML) has been by far the most commonly used approach to the fitting of mixture distributions.
Before proceeding to consider the EM algorithm, we will first briefly define Maximum Likelihood Estimation (MLE) in general and introduce some associated notation.

2.2 Method of Moments

Suppose

    g(x|θ) = Σ_{j=1}^{c} pj f(x|θj),   x ∈ X

and ts(x) is such that t0(x) = 1 and

    Eθ [ts(X)] = θ^s,   s = 0, . . . , 2c − 1

Suppose also that

    Vs = (1/n) Σ_{i=1}^{n} ts(xi)

Then we have a set of moment equations in the θj, given by

    Σ_{j=1}^{c} pj θj^s = Vs,   s = 0, . . . , 2c − 1

which can be expressed as

    Θ P = V        (2.1)

where Θsj = θj^s, s = 0, . . . , 2c − 1, j = 1, . . . , c. Provided the θj's are all different, P can be solved in terms of the {θj} and therefore eliminated. Furthermore, the required {θj} are the roots of

    θ^c + D_{c−1} θ^{c−1} + . . . + D1 θ + D0 = 0

where D = (D0, . . . , D_{c−1})^T satisfies

    | V0       V1    . . .  V_{c−1}  |          | V_c      |
    | V1       V2    . . .  V_c      |   D = −  | V_{c+1}  |        (2.2)
    | . . .                          |          | . . .    |
    | V_{c−1}  V_c   . . .  V_{2c−2} |          | V_{2c−1} |

Specific cases of c-component univariate mixtures of exponential, binomial and Poisson distributions to which the above analysis applies are given in Algorithm 1 (see Appendices), where θ gives the parameters and P the mixing weights of each distribution.

This analysis cannot, however, be applied to normal mixtures. Consider first a two-component univariate normal mixture: by equating the observed central moments Vr to the theoretical moments, we obtain a system which may, by some fairly tedious algebra (see Cohen, 1967), be reduced to the 'fundamental nonic', taking the form

    Σ_{i=0}^{9} ai u^i = 0        (2.3)

where

    a0 = −24 V3^6                             a5 = 90 k4^2 + 72 V3 k5
    a1 = −96 V3^4 k4                          a6 = 36 V3^2
    a2 = −(63 V3^2 k4^2 + 72 V3^3 k5)         a7 = 84 k4
    a3 = 288 V3^4 − 108 V3 k4 k5 + 27 k4^3    a8 = 0
    a4 = 444 V3^2 k4 − 18 k5^2                a9 = 24

and the fourth and fifth sample cumulants are

    k4 = V4 − 3 V2²,   k5 = V5 − 10 V2 V3

Let u be a real negative root of (2.3).
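Before resuming the normal-mixture derivation, the general analysis above — solve the linear system (2.2) for D, take the θj as the roots of the moment polynomial, then recover the weights P from (2.1) — can be sketched numerically. The code below (Python with NumPy) is an illustration, not the report's Algorithm 1; it is exercised with the exact moments of a two-component exponential mixture, for which ts(x) = x^s / s! gives Eθ[ts(X)] = θ^s.

```python
import numpy as np

def moment_estimates(V, c):
    """Method-of-moments sketch for a c-component mixture, following (2.1)-(2.2).
    V = [V_0, ..., V_{2c-1}] are the moment statistics; returns the component
    parameters theta_j (sorted) and the corresponding mixing weights p_j."""
    V = np.asarray(V, dtype=float)
    # Hankel system (2.2) for the polynomial coefficients D_0, ..., D_{c-1}
    A = np.array([[V[s + j] for j in range(c)] for s in range(c)])
    D = np.linalg.solve(A, -V[c:2 * c])
    # theta_j are the roots of theta^c + D_{c-1} theta^{c-1} + ... + D_0 = 0
    theta = np.sort(np.roots(np.concatenate(([1.0], D[::-1]))).real)
    # recover the weights from the first c moment equations, Theta P = V
    Theta = np.vander(theta, c, increasing=True).T
    P = np.linalg.solve(Theta, V[:c])
    return theta, P

# Exact moments of a mixture of exponentials with means 1 and 5, weights 0.3/0.7:
# V_s = 0.3 * 1**s + 0.7 * 5**s, so the estimates should be recovered exactly.
V = [0.3 * 1.0 ** s + 0.7 * 5.0 ** s for s in range(4)]
theta, P = moment_estimates(V, 2)
```

With sample moments in place of exact ones, the same pipeline yields the moment estimates, subject to the feasibility issues discussed in the text.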
Then estimates of δ1 = (µ1 − x̄) and δ2 = (µ2 − x̄) are obtained as the roots of the equation

    δ² − (ŵ/û) δ + û = 0        (2.4)

with

    ŵ = ( −8 V3 û³ + 3 k5 û² + 6 V3 k4 û + 2 V3³ ) / ( 2 û³ + 3 k4 û + 4 V3² )

The estimates of the five parameters in the mixture may now be obtained as

    µ̂1 = x̄ + δ̂1
    µ̂2 = x̄ + δ̂2
    σ̂1² = (1/3) δ̂1 (2ŵ/û − V3/û) + V2 − δ̂1²
    σ̂2² = (1/3) δ̂2 (2ŵ/û − V3/û) + V2 − δ̂2²
    p̂ = δ̂2 / (δ̂2 − δ̂1)

The parameter estimates may, however, be non-feasible, as the quadratic equation (2.4) may have no real roots. The nonic may also have more than a single negative root, leading to multiple solutions. In general, a feasible set of estimates is obtained if the nonic yields a negative solution such that (2.4) has real roots, σ̂1² > 0, σ̂2² > 0 and 0 < p̂ < 1. Algorithm 2 (see Appendices) summarizes this method.

For more than two components, the system of equations is very difficult to solve, and the method gives very poor results in the multivariate case (see Day, 1969). Consequently it is unlikely that the method of moments will be of any general use in estimating the parameters for c > 2 or in a multivariate case, and so we are led to consider other possible estimation procedures such as maximum likelihood. As a general method of estimation, the method of moments lacks the optimal properties of MLE, but it is very often used to provide a good initialization for the EM algorithm.

2.3 Maximum Likelihood Fitting

2.3.1 Maximum Likelihood Estimation

For estimating the unknown parameters, we can also apply standard Maximum Likelihood Estimation (MLE), which possesses desirable statistical properties: under very general conditions, the estimates obtained by the method are consistent (they converge with probability 1 to the true parameter values). We first establish a general likelihood function and give the likelihood equation. Suppose X is a random variable or vector with probability density function f(x|Ψ), where Ψ = (p1, . . . , p_{c−1}, θ1, . . .
, θc)^T is the parameter vector we wish to estimate. The parameter space is denoted by Ω. Although we are taking X to be a continuous random variable, we can still view f(x|Ψ) as a probability density function in the case where X is discrete, by the adoption of counting measure. Let x = (x1, . . . , xn) denote an observed independent random sample of size n on the random variable X. The likelihood function for Ψ formed from the observed data x is given by

    L(Ψ; x) = Π_{j=1}^{n} f(xj|Ψ)        (2.5)

We attempt to find the particular Ψ that maximizes the likelihood function. This maximization can be dealt with in the traditional way by differentiating L(Ψ; x) with respect to the components of Ψ and equating the derivatives to zero, giving the likelihood equation

    ∂L(Ψ)/∂Ψ = 0        (2.6)

or equivalently,

    ∂ log L(Ψ)/∂Ψ = 0        (2.7)

We let

    I(Ψ; x) = − ∂² log L(Ψ) / ∂Ψ ∂Ψ^T        (2.8)

be the matrix of the negatives of the second-order partial derivatives of the log likelihood function with respect to the elements of Ψ. Under regularity conditions, the expected information matrix I(Ψ) is given by

    I(Ψ) = E[ S(X|Ψ) S^T(X|Ψ) ]        (2.9)
         = −E[ I(Ψ; x) ]        (2.10)

where

    S(x|Ψ) = ∂ log L(Ψ)/∂Ψ        (2.11)

is the gradient vector of the log likelihood function, that is, the score statistic. Here the operator E denotes expectation using the parameter vector Ψ.

The asymptotic covariance matrix of the MLE Ψ̂ is equal to the inverse of the expected information matrix I(Ψ), which can be approximated by I(Ψ̂); that is, the standard error of Ψ̂i = (Ψ̂)i is given by

    SE(Ψ̂i) ≈ ( I^{−1}(Ψ̂) )_{ii}^{1/2},   i = 1, . . . , d        (2.12)

The observed information matrix is usually more convenient to use than the expected information matrix, as it does not require an expectation to be taken. Thus, it is common in practice to estimate the inverse of the covariance matrix of a maximum-likelihood solution by the observed information matrix I(Ψ̂; x), rather than the expected information matrix I(Ψ) evaluated at Ψ = Ψ̂.
This approach gives the approximation

    SE(Ψ̂i) ≈ ( I^{−1}(Ψ̂; x) )_{ii}^{1/2},   i = 1, . . . , d        (2.13)

Often the log likelihood function cannot be maximized analytically, that is, the likelihood equation has no explicit solution, particularly in incomplete-data problems. In such cases, it may be possible to compute the MLE iteratively. Next we introduce an algorithm for calculating maximum likelihood estimates.

2.3.2 EM Algorithm

Incomplete data often result in complicated likelihood functions, where MLEs usually have to be computed iteratively. The Expectation-Maximization algorithm for the finite mixture problem, proposed by Dempster, Laird, and Rubin in 1977 and popularly known as the EM algorithm, is a broadly applicable approach to the iterative computation of MLEs, useful in a variety of incomplete-data problems where algorithms such as Newton-type methods may turn out to be more complicated.

Suppose we have to find Ψ = Ψ̂ to maximize the likelihood L(Ψ) = f(x|Ψ), where x is a set of 'incomplete' data. Let y denote a typical 'complete' version of x and let Y(x) denote the set of all possible such y. Let the likelihood from y be denoted by g(y|Ψ). The EM algorithm generates, from some initial approximation Ψ^(0), a sequence {Ψ^(m)} of estimates. Each iteration consists of the following double step:

• E step: evaluate Q(Ψ, Ψ^(m)) = E[ log g(y|Ψ) | x, Ψ^(m) ]
• M step: find Ψ = Ψ^(m+1) to maximize Q(Ψ, Ψ^(m))

Algorithm 3 (see Appendices) summarizes the basic EM algorithm for the most widely used normal, exponential, binomial and Poisson distributions in the c-component univariate case, where p is the vector of mixing weights and θ the vector of parameters (θ and µ in the case of a mixture of normals). The EM algorithm converges for the four families studied (see Titterington, Smith and Makov (1985), Section 4.3.3), but the results do depend upon the starting point, and the possibility of multiple solutions is always present.
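For the two-component univariate normal mixture studied in Section 2.5, both steps take a simple closed form: the E step computes each observation's posterior probability of belonging to the first component, and the M step re-estimates the weight, means and variances from those posteriors. A minimal sketch in Python with NumPy — an illustration under simplifying assumptions (a crude stopping rule on the mixing weight), not the report's Algorithm 3:

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def em_two_normals(x, p, mu1, s1, mu2, s2, max_iter=1000, tol=1e-8):
    for _ in range(max_iter):
        # E step: posterior probability that each x_i comes from component 1
        d1 = p * normal_pdf(x, mu1, s1)
        d2 = (1.0 - p) * normal_pdf(x, mu2, s2)
        w = d1 / (d1 + d2)
        # M step: weighted re-estimation of the five parameters
        p_new = w.mean()
        mu1 = np.average(x, weights=w)
        mu2 = np.average(x, weights=1.0 - w)
        s1 = np.sqrt(np.average((x - mu1) ** 2, weights=w))
        s2 = np.sqrt(np.average((x - mu2) ** 2, weights=1.0 - w))
        converged = abs(p_new - p) < tol
        p = p_new
        if converged:
            break
    return p, mu1, s1, mu2, s2

# Illustrative run on artificial data: p1 = 0.3, components N(0, 1) and N(10, 2);
# starting values come from the sample quartiles and the overall spread.
rng = np.random.default_rng(0)
n = 2000
z = rng.random(n) < 0.3
x = np.where(z, rng.normal(0.0, 1.0, n), rng.normal(10.0, 2.0, n))
fit = em_two_normals(x, 0.5, np.quantile(x, 0.25), x.std(),
                     np.quantile(x, 0.75), x.std())
```

With such well-separated components the quartile-based start is enough; for close components, multiple random starts are needed, as discussed next.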
2.3.3 Starting Values for the EM Algorithm

As explained in Section 2.3.2, the EM algorithm is started from some initial value Ψ^(0) that we have to specify. Recently, Seidel, Mosler and Alker (2000) have demonstrated how different starting strategies and stopping rules can lead to quite different estimates in the context of fitting mixtures of exponential components via the EM algorithm. Furthermore, as convergence of the EM algorithm is slow, the situation will be exacerbated by a poor choice of Ψ^(0). Another problem with mixture models is that the likelihood equation will usually have multiple roots corresponding to local maxima, so the EM algorithm should be applied from a wide choice of starting values in any search for all local maxima. Yet, in many cases involving univariate data, the choice of starting values is not critical, and usually the EM algorithm is applied from a number of random starts or from estimates coming from the method of moments when possible. If the mixture is poorly fitted with the method of moments, however, we may consider using other initial values to search for other local maxima giving better results.

We shall now study the fitting of mixture distributions with the method of moments and MLE. We first define goodness-of-fit and give a criterion to characterize it.

2.4 Goodness-of-Fit

Goodness-of-fit means how well a statistical model fits a set of observations. Measures of goodness-of-fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals, to test whether two samples are drawn from identical distributions (Kolmogorov-Smirnov test), or whether outcome frequencies follow a specified distribution (Pearson's chi-square test).
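As a concrete illustration of the last of these, Pearson's statistic χ² = Σi (Oi − Ei)²/Ei can be computed directly for the fair-die hypothesis. A minimal sketch (the roll counts below are made up for illustration):

```python
def chi_square_statistic(observed, expected):
    # Pearson chi-square statistic: sum over categories of (O_i - E_i)^2 / E_i
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts from 204 rolls of a die; a fair die predicts 34 per face.
observed = [22, 24, 38, 30, 46, 44]
expected = [sum(observed) / 6.0] * 6
stat = chi_square_statistic(observed, expected)

# Six categories and r = 0 estimated parameters give df = 6 - 0 - 1 = 5;
# the 5% critical value of the chi-square distribution for df = 5 is 11.070.
reject_at_5_percent = stat > 11.070
```

Here χ² = 520/34 ≈ 15.29 > 11.070, so the hypothesis that the die is fair is rejected at the 5% level.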
Pearson's chi-square test is one of a variety of chi-square tests – statistical procedures whose results are evaluated by reference to the chi-square distribution. It tests a null hypothesis, usually called H0, that the relative frequencies of occurrence of observed events follow a specified frequency distribution. The events must be mutually exclusive. One of the simplest examples is the hypothesis that an ordinary six-sided die is "fair", i.e. all six outcomes occur equally often. The chi-square statistic is defined as

    χ² = Σ_{i=1}^{v} (Oi − Ei)² / Ei        (2.14)

where v is the number of intervals, the Oi are the observed frequencies, and the Ei the frequencies expected under the model. If the value found exceeds the critical value χ²_{0.95} given in the chi-square probabilities table (see Table 6 in Appendices), we may conclude that the observed frequencies differ significantly from the theoretical frequencies at the 5% level. This means that we have 5 chances out of 100 of rejecting the null hypothesis H0 = "the population follows the distribution studied" when this hypothesis is in fact true. If we use the table with χ²_{0.99}, we have a 1% chance of rejecting H0.

Use of the chi-square probabilities table: the number df used in the table is the number of degrees of freedom. It is defined as

    df = v − r − 1        (2.15)

where v is the number of intervals and r is the number of parameters of the statistical distribution studied.

We now apply the numerical algorithms to the simplest case of normal mixtures, and study the goodness of fit with the chi-square test.

2.5 Fitting Normal Mixtures

2.5.1 Introduction

The most widely used finite mixture distributions are those involving normal components. In the univariate case, such finite normal mixture distributions have a long history, and the problem of estimating their parameters is one of the oldest estimation problems in the statistical literature.
To compare the method of moments and MLE, we specialize our study to the case of two-component univariate normal mixtures,

    g(x; p, µ1, σ1, µ2, σ2) = p · (1/(√(2π) σ1)) exp(−(x − µ1)²/(2σ1²)) + (1 − p) · (1/(√(2π) σ2)) exp(−(x − µ2)²/(2σ2²))        (2.16)

Once the correctness of the algorithms used has been checked with data given by Everitt and Hand (1981), we study the influence of p on the goodness of fit for two types of normal mixture. For this purpose, we generate n random numbers coming from the two distributions, mix them according to several mixing weights, and use the two algorithms to fit the resulting mixtures.

2.5.2 Mixture of Two Well-Separated Normals

We first study a mixture of two normals that are well separated in their means and variances. The generated data, which were recorded in the form of v = 26 intervals, are displayed in Figure 2 for p1 = 0.4, along with the plot of the density of the normal mixture distribution fitted to them and its two components. The results of the fitting for several proportions are presented in Table 1, with the chi-square error χ² defined in Section 2.4 and the number of iterations k for the EM algorithm.

In the case of a single root, the difference between MLE and the method of moments is small. The MLE, however, gives better results for small p1, which justifies the use of the method of moments to initialize the EM algorithm. In the case of two acceptable roots for the nonic equation (2.3), the solution giving the smallest chi-square is used to initialize the EM algorithm: without this selection, the EM algorithm gives the same estimates but takes a little more time. When the method of moments fails, as is the case for p1 = 0.01, the initialization for the EM algorithm is made at random and gives acceptable results in a reasonable time.
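The artificial-data setup used in these experiments — draw each observation from one of the two components according to the mixing weight, then record the sample as interval frequencies — can be sketched as follows (Python with NumPy; the seed is an arbitrary illustrative choice, the parameters are those of Figure 2):

```python
import numpy as np

# Parameters of the well-separated mixture of Section 2.5.2 (Figure 2).
mu1, s1, mu2, s2, p1, n = 10.0, 20.0, 50.0, 10.0, 0.4, 1000

rng = np.random.default_rng(42)            # arbitrary seed
# Each observation comes from component 1 with probability p1.
from_first = rng.random(n) < p1
data = np.where(from_first,
                rng.normal(mu1, s1, n),
                rng.normal(mu2, s2, n))
# Record the sample as frequencies over v = 26 intervals, as used for the
# chi-square goodness-of-fit criterion.
counts, edges = np.histogram(data, bins=26)
```

Repeating this with several values of p1 produces the data sets whose fits are reported in Tables 1 and 2.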
As the data are generated from known distributions, we can compare the estimates with the theoretical parameters, and a simple glance at the superposition of the theoretical and fitted mixtures can give information on the goodness of the fitting. These observations allow us to say that the results are quite good in general. Moreover, the use of the chi-square probabilities table (see Table 6 in Appendices) with χ²_{0.95} confirms the goodness-of-fit of most of the mixtures studied.

[Figure 2: Plot of 1000 random data generated from a two-component normal mixture with µ1 = 10, σ1 = 20, µ2 = 50, σ2 = 10 and p1 = 0.4, of the fitted two-component normal mixture model (red line) and its two components (green lines).]

Conclusion: the combined use of the method of moments and the EM algorithm gives a good fitting of the mixtures in this example, even for small p1.

Table 1: Estimates for the parameters of a two-component normal mixture fitted to data generated from a mixture with µ1 = 10, σ1 = 20, µ2 = 50, σ2 = 10 and different p1, with the two methods, for n = 1000.

  p1 = 0.5   Method of Moments:  µ1 = 6.7142, µ2 = 49.10, σ1 = 19.260, σ2 = 11.558, p1 = 0.46149, χ² = 17.430 (a)
             MLE (k = 134):      µ1 = 5.8086, µ2 = 49.107, σ1 = 18.590, σ2 = 11.2377, p1 = 0.45186, χ² = 17.394 (a)
  p1 = 0.3   Method of Moments:  µ1 = 12.314, µ2 = 50.463, σ1 = 21.422, σ2 = 10.518, p1 = 0.081794, χ² = 20.013 (a)
             MLE (k = 165):      µ1 = 9.1812, µ2 = 50.158, σ1 = 19.763, σ2 = 10.554, p1 = 0.29496, χ² = 18.6357 (a)
  p1 = 0.1   Method of Moments:  µ1 = 5.3631, µ2 = 49.955, σ1 = 18.494, σ2 = 10.669, p1 = 0.081794, χ² = 25.3298 (a)
             MLE (k = 200):      µ1 = 9.0786, µ2 = 50.134, σ1 = 20.312, σ2 = 10.551, p1 = 0.093193, χ² = 20.461 (a)
  p1 = 0.05  Method of Moments:  µ1 = 8.419, µ2 = 50.360, σ1 = 18.346, σ2 = 10.433, p1 = 0.043145, χ² = 19.956 (a)
             MLE (k = 228):      µ1 = 8.4789, µ2 = 50.304, σ1 = 18.975, σ2 = 10.515, p1 = 0.041930, χ² = 19.351 (a)
             (second moment solution): µ1 = 26.682, µ2 = 50.809, σ1 = 24.172, σ2 = 9.810, p1 = 0.093604, χ² = 25.2585 (a)
  p1 = 0.01  Method of Moments:  no negative root
             MLE (k = 289):      µ1 = 7.4302, µ2 = 50.415, σ1 = 15.015, σ2 = 10.457, p1 = 0.0091530, χ² = 24.125 (a)

  (a) significant at 5%

2.5.3 Mixture of Two Close Normals

We now study the fitting of a mixture of two normal distributions that are very close in their means and variances. The results of the fitting for several proportions are presented in Table 2, with the chi-square error χ² defined in Section 2.4 and the number of iterations k for the EM algorithm. In this case, the method of moments systematically fails. As the EM algorithm can give several solutions depending on the random initialization, the choice of the estimates can be made on the basis of chi-square error minimization. Yet, several different estimates can give the same χ², and the fitting giving the smallest χ² is not always the nearest to the theoretical one, as displayed in Figure 3. Depending on the initialization, the EM algorithm can also fail.

Table 2: Estimates for the parameters of a two-component normal mixture fitted to data generated from a mixture with µ1 = 10, σ1 = 50, µ2 = 12, σ2 = 55 and different p1, with the two methods, for n = 1000. The method of moments yields no real negative root in every case; for each p1, two EM solutions obtained from different random starts are shown.

  p1 = 0.5   MLE (k = 6947):  µ1 = 3.0705, µ2 = 56.353, σ1 = 51.262, σ2 = 40.802, p1 = 0.84724, χ² = 33.721 (b)
             MLE (k = 1372):  µ1 = −153.75, µ2 = 11.862, σ1 = 9.0906, σ2 = 52.451, p1 = 0.0039345, χ² = 25.988 (b)
  p1 = 0.3   MLE (k = 9627):  µ1 = 8.54062, µ2 = 107.446, σ1 = 51.4289, σ2 = 24.655, p1 = 0.967979, χ² = 31.3147 (b)
             MLE (k = 424):   µ1 = −42.4161, µ2 = 14.972, σ1 = 23.050, σ2 = 53.267, p1 = 0.056887, χ² = 31.398 (b)
  p1 = 0.1   MLE (k = 2884):  µ1 = 9.0280, µ2 = 139.23, σ1 = 52.395, σ2 = 5.4878, p1 = 0.98817, χ² = 26.560 (b)
             MLE (k = 4652):  µ1 = −18.157, µ2 = 12.949, σ1 = 30.509, σ2 = 54.785, p1 = 0.07653, χ² = 32.712 (b)
  p1 = 0.05  MLE (k = 395):   µ1 = 9.6651, µ2 = 139.353, σ1 = 53.124, σ2 = 5.0793, p1 = 0.98924, χ² = 25.284 (b)
             MLE (k = 870):   µ1 = −13.350, µ2 = 15.260, σ1 = 37.235, σ2 = 55.888, p1 = 0.14681, χ² = 31.979 (b)

  (b) significant at 10%

[Figure 3: Plot of 1000 random data generated from a two-component normal mixture with µ1 = 10, σ1 = 50, µ2 = 12, σ2 = 55 and p1 = 0.1, of two results of the fitting (red lines) and their two components (green lines), giving respectively χ² = 32.712 (b) and χ² = 26.560 (b).]

Conclusion: the method of moments fails for this example, and the EM algorithm gives several solutions. The estimates are in general not very good, as confirmed by the use of the chi-square probabilities table (see Table 6 in Appendices) with χ²_{0.95}, and several statistically equivalent fittings are obtained. Without any other information on the mixture components, it is impossible to conclude which solution is the closest to the "real" one.

We now suppose that historical data are available. We shall try to classify them according to the distribution they come from, and give the estimates closest to reality. We will then be able to determine into which category newly given data ought to be assigned.

3 Classification

3.1 Introductory Example

There is incontrovertible evidence that the early diagnosis of hereditary hemochromatosis prevents virtually all manifestations of the disease and results in normal life expectancy.
In contrast, when unrecognized and untreated, the disease leads to cirrhosis, hepatocellular carcinoma, and other lethal complications (McLaren et al., 1998). Although heterozygotes for hemochromatosis do not usually develop iron overload sufficient to cause overt organ damage, some heterozygotes do develop marked iron overload, and the heterozygous state has been implicated in some typical diseases. Early identification of heterozygotes and homozygotes for hemochromatosis is therefore an important clinical challenge. An elevated level of transferrin saturation remains the most useful noninvasive screening test for affected individuals, but there is some debate as to the appropriate screening level. Because transferrin saturation has been found to be normally distributed in normal homozygotes, the physiologic model we consider is a mixture of two normal distributions: one for heterozygotes and homozygotes, the other for unaffected individuals. Once we have fitted the mixture, if the fit is good, we would like to be able to tell new patients who have high transferrin saturation whether they are unaffected or not.

3.2 Rules of Classification

Once we have statistically good estimates from the fitted mixture, we can try to classify the historical data available. We first have to determine whether the fitting corresponds to the available data. To classify the given data x = (x1, ..., xn) according to the distribution they come from, we use the following criterion:

    xi ∈ f1  ⟺  f1(xi) / f2(xi) > 1.        (3.1)

This rule separates the data into two groups at the crossing point of the two distributions. The number of well-classified data in each distribution will indicate the best fitting. We can also adapt the rule to obtain, for example, the same proportion of data in the two distributions. We then use the following rule:

    xi ∈ f1  ⟺  f1(xi) / f2(xi) > k,        (3.2)

and have to find the proper k such that the number of well-classified data is the same in each distribution.
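Rules (3.1) and (3.2) amount to thresholding the density ratio f1/f2. The sketch below is a minimal Python illustration; the function names, the use of the true rather than the fitted parameters, and the simulation set-up are our own illustrative choices, not the report's:

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def classify(x, comp1, comp2, k=1.0):
    """Rule (3.2): assign x to component 1 iff f1(x)/f2(x) > k.
    With k = 1 this is rule (3.1), i.e. the crossing point of the two
    densities.  comp1 and comp2 are (mu, sigma) pairs."""
    return 1 if normal_pdf(x, *comp1) > k * normal_pdf(x, *comp2) else 2

def well_classified_rates(p1, comp1, comp2, k=1.0, n=1000, n_sets=10, seed=0):
    """Average per-component fraction of correctly classified points over
    n_sets simulated data sets, in the spirit of the study of section 3.3."""
    rng = random.Random(seed)
    avg1 = avg2 = 0.0
    for _ in range(n_sets):
        ok1 = n1 = ok2 = n2 = 0
        for _ in range(n):
            if rng.random() < p1:          # latent label: component 1
                x = rng.gauss(*comp1)
                n1 += 1
                ok1 += classify(x, comp1, comp2, k) == 1
            else:                          # latent label: component 2
                x = rng.gauss(*comp2)
                n2 += 1
                ok2 += classify(x, comp1, comp2, k) == 2
        avg1 += ok1 / max(n1, 1)
        avg2 += ok2 / max(n2, 1)
    return avg1 / n_sets, avg2 / n_sets
```

For the well-separated pair of Table 1 (µ1 = 10, σ1 = 20, µ2 = 50, σ2 = 10) this simulation gives per-component rates of roughly the same order as those reported in Table 3.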
3.3 Study of the Rules

3.3.1 Mixture of Two Well-Separated Normals

We generate 10 sets of data, to be used as historical data, from the fitted mixtures obtained in 2.5.2 for different p, and classify them according to the two rules given in 3.2. The averages obtained over the 10 sets are given in Table 3. There are, logically, more well-classified data in the larger distribution, and if one of the two distributions is too small, classification gives very bad results for it and equi-repartition is impossible.

Conclusion: The two rules of classification give results which seem to be quite bad. Yet, as the fitting is good, these are the best results we can expect to have.

Table 3: Averages of well-classified data for a two-component normal mixture fitted to data generated from a mixture with µ1 = 10, σ1 = 20, µ2 = 50, σ2 = 10 and different p, over 10 sets of n = 1000 data.

p      fitting           f1 (k = 1)   f2 (k = 1)   equi-repartition
0.5    χ² = 17.394 (a)   85.06%       97.36%       k = 0.29262, 91.24%
0.3    χ² = 18.636 (a)   88.5%        95.74%       k = 0.49181, 91.33%
0.1    χ² = 20.461 (a)   89.2%        95.81%       k = 0.54695, 92%
0.05   χ² = 19.351 (a)   88.8%        95.96%       k = 0.47957, 92.2%
0.01   χ² = 24.125 (a)   87.37%       16.36%       equi-repartition impossible
(a) significant at 5%

3.3.2 Mixture of Two Close Normals

We now generate 10 sets of data, to be used as historical data, from the fitted mixtures obtained in 2.5.3 for different p, and classify them according to the two rules given in 3.2. The averages obtained over the 10 sets are given in Table 4.
Table 4: Averages of well-classified data for a two-component normal mixture fitted to data generated from a mixture with µ1 = 10, σ1 = 50, µ2 = 12, σ2 = 55 and different p, over 10 sets of n = 1000 data.

p      fitting           f1 (k = 1)   f2 (k = 1)   equi-repartition
0.5    χ² = 33.721 (b)   58.8%        42.4%        k = 1.3667, 50.2%
       χ² = 25.988 (b)   0.8%         99.2%        equi-repartition impossible
0.3    χ² = 31.315 (b)   85.7%        16.143%      equi-repartition impossible
       χ² = 31.398 (b)   35%          67.286%      equi-repartition impossible
0.1    χ² = 26.560 (b)   98%          2.6667%      equi-repartition impossible
       χ² = 32.712 (b)   49%          50.778%      k = 1.0006, 49.98%
0.05   χ² = 25.284 (b)   96%          2.3158%      equi-repartition impossible
       χ² = 31.979 (b)   48%          44.737%      k = 1.0856, 50.65%
(b) significant at 10%

The results are uneven, depending on the fitting. It is clear that we can reject the fittings giving a very small proportion of well-classified data for one of the distributions, even if they give the smallest χ². For the remaining fittings, the data are split almost equally, and only half of them are well-classified.

Conclusion: We can reject some fittings on the basis of the classification of historical data. For the best fittings, the first rule gives some bad results, which can be partly explained by a poor fitting and by the shape of the two distributions.

3.4 Application to Hemochromatosis

3.4.1 Case Study

On the basis of blood test results, we wish to know whether new patients are unaffected by hemochromatosis or must be submitted to other specific tests to determine more precisely whether they are heterozygous or homozygous. Distributions of transferrin saturation values for 1325 white men and 1547 nonpregnant white women 20 to 74 years of age have been selected from the unweighted second National Health and Nutrition Examination Survey (NHANES II) sample data. We generated data with these distributions from estimates found by McLaren et al. (1995); the method of moments and MLE will be applied to the two data sets for parameter estimation, and the χ² test will check the goodness-of-fit of each fitting.
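The MLE fits used throughout are produced by the EM iteration of Algorithm 3 in the appendix. A minimal two-component normal sketch in Python follows; the percentile-based initialization and the fixed iteration count are our own simplifying assumptions, not the report's:

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def em_two_normals(xs, n_iter=100):
    """EM for a two-component univariate normal mixture.
    Initialized from the 10th/90th percentiles and the overall
    standard deviation (an arbitrary but serviceable choice)."""
    n = len(xs)
    srt = sorted(xs)
    mu = [srt[n // 10], srt[(9 * n) // 10]]
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    s = [sd, sd]
    p = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability h(i, j) that x_i came from component j
        resp = []
        for x in xs:
            f = [p[j] * normal_pdf(x, mu[j], s[j]) for j in (0, 1)]
            tot = f[0] + f[1]
            resp.append([fj / tot for fj in f])
        # M-step: re-estimate proportions, means and standard deviations
        for j in (0, 1):
            sj = sum(r[j] for r in resp)
            p[j] = sj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / sj
            s[j] = math.sqrt(sum(r[j] * (x - mu[j]) ** 2
                                 for r, x in zip(resp, xs)) / sj)
    return p, mu, s
```

As noted in section 2.5, the result depends on the initialization; different starting points can reach different statistically equivalent solutions.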
If the fit is good, we will apply the rules studied in section 3.2 to generated data used as historical data. If the classification gives good results, we will then give a theoretical threshold to determine whether an individual is affected or not.

Figure 4: Plots of the distribution of transferrin saturation values for white men (first plot) and white women (second plot), with the results of the fitting (red line) and their two components (green lines).

3.4.2 Results

The data, which were recorded in the form of v = 26 intervals, are displayed in Figure 4, along with the plot of the densities of the normal mixture distributions fitted to them and their two components. The given parameters and the estimates obtained with the method of moments and MLE are presented in Table 5, with the chi-square error χ² defined in section 2.4 and the number of iterations k for the EM algorithm.

Table 5: Analysis of the distribution of transferrin saturation for white men and women compared to theoretical results.

         Given parameters             Estimated parameters
men      µ1 = 29.5, µ2 = 46.5         µ1 = 29.585, µ2 = 44.440      k = 2834
         σ1 = 7, σ2 = 7               σ1 = 7.1048, σ2 = 7.9719      χ² = 9.446 (a)
         p1 = 0.85                    p1 = 0.81829
women    µ1 = 26.7, µ2 = 43.4         µ1 = 26.8558, µ2 = 45.863     k = 224
         σ1 = 7, σ2 = 7               σ1 = 7.10285, σ2 = 6.5023     χ² = 15.943 (a)
         p1 = 0.87                    p1 = 0.892888
(a) significant at 5%

Both the comparison to the original estimates and the χ² test give excellent results, especially for men. As the fitting is good, we generate n = 1000 data for each category and use them as historical data. Classification with the first rule gives 84.05% and 89.63% for men, and 92.50% and 88.01% for women, for the first and second distribution respectively. The classification being acceptable, we can define a threshold xth to separate unaffected and affected patients. We consider the following definition for xth:

    f1(xth) = f2(xth),
(3.3)

which gives the crossing point of the two marginal distributions. The results are xth = 42.49% of transferrin saturation for men and xth = 41.84% for women, whereas the thresholds used in practice are 47% and 44.7% respectively.

Conclusion: Our results are a little more pessimistic (or at least more cautious), but they confirm that this method can be used to give a practical diagnostic in real-life cases.

3.5 Perspective for a Future Work

What could be studied in a future work is, instead of maximizing the proportion of well-classified data in each marginal distribution, trying to maximize the sum of the two proportions. This new classification could have given a different threshold in the previous study. Another possible study, which could find many applications especially in genetics, is the maximization of the proportion of well-classified data for a subpopulation which is of specific interest.

Conclusion

This study has considered finite mixture distributions and some of the problems one might encounter when applying them, using real-life data. Various techniques have been used to fit univariate mixtures, but they may not all work for all data sets; as previously seen, the method of moments often fails and the MLE may not converge, depending on the initialization of the EM algorithm. Other techniques may be used if both methods fail. We presented a method of classifying historical data using two rules, and a concrete application of this classification. The classification was performed in two steps: we first fit the mixture, and then, if the fitting was good, we classified according to a particular rule. It should be noted, however, that we studied particular cases, and no study of the rules in general has been made. The techniques and mechanisms studied in this paper could be applied to a wide variety of fields, depending on the subject matter of interest, but the data and the results of their treatment should be considered within the context of their domain.
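The threshold definition (3.3), with the fitted weighted components f_j(x) = p_j N(µ_j, σ_j)(x), reduces after taking logarithms to a quadratic equation in xth. A small Python sketch (the function name and the degenerate-case handling are our own choices):

```python
import math

def crossing_points(p1, mu1, s1, mu2, s2):
    """Solve p1*N(mu1,s1)(x) = (1-p1)*N(mu2,s2)(x) for x.

    Taking logs gives a*x^2 + b*x + c = 0; when s1 == s2 the quadratic
    degenerates to a single linear root."""
    p2 = 1.0 - p1
    a = 0.5 * (1.0 / s1 ** 2 - 1.0 / s2 ** 2)
    b = mu2 / s2 ** 2 - mu1 / s1 ** 2
    c = (0.5 * (mu1 ** 2 / s1 ** 2 - mu2 ** 2 / s2 ** 2)
         - math.log(p1 / s1) + math.log(p2 / s2))
    if abs(a) < 1e-15:                  # equal variances: linear case
        return [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0:                        # densities never cross
        return []
    r = math.sqrt(disc)
    return sorted([(-b - r) / (2 * a), (-b + r) / (2 * a)])
```

With the men's estimates from Table 5 (p1 = 0.81829, µ1 = 29.585, σ1 = 7.1048, µ2 = 44.440, σ2 = 7.9719), the root lying between the two means reproduces the reported threshold xth ≈ 42.49.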
Appendices

Table 6: Chi-Square Probabilities Table

df    0.995   0.99    0.975   0.95    0.90    0.10    0.05    0.025   0.01    0.005
1     —       —       0.001   0.004   0.016   2.706   3.841   5.024   6.635   7.879
2     0.010   0.020   0.051   0.103   0.211   4.605   5.991   7.378   9.210   10.597
3     0.072   0.115   0.216   0.352   0.584   6.251   7.815   9.348   11.345  12.838
4     0.207   0.297   0.484   0.711   1.064   7.779   9.488   11.143  13.277  14.860
5     0.412   0.554   0.831   1.145   1.610   9.236   11.070  12.833  15.086  16.750
6     0.676   0.872   1.237   1.635   2.204   10.645  12.592  14.449  16.812  18.548
7     0.989   1.239   1.690   2.167   2.833   12.017  14.067  16.013  18.475  20.278
8     1.344   1.646   2.180   2.733   3.490   13.362  15.507  17.535  20.090  21.955
9     1.735   2.088   2.700   3.325   4.168   14.684  16.919  19.023  21.666  23.589
10    2.156   2.558   3.247   3.940   4.865   15.987  18.307  20.483  23.209  25.188
11    2.603   3.053   3.816   4.575   5.578   17.275  19.675  21.920  24.725  26.757
12    3.074   3.571   4.404   5.226   6.304   18.549  21.026  23.337  26.217  28.300
13    3.565   4.107   5.009   5.892   7.042   19.812  22.362  24.736  27.688  29.819
14    4.075   4.660   5.629   6.571   7.790   21.064  23.685  26.119  29.141  31.319
15    4.601   5.229   6.262   7.261   8.547   22.307  24.996  27.488  30.578  32.801
16    5.142   5.812   6.908   7.962   9.312   23.542  26.296  28.845  32.000  34.267
17    5.697   6.408   7.564   8.672   10.085  24.769  27.587  30.191  33.409  35.718
18    6.265   7.015   8.231   9.390   10.865  25.989  28.869  31.526  34.805  37.156
19    6.844   7.633   8.907   10.117  11.651  27.204  30.144  32.852  36.191  38.582
20    7.434   8.260   9.591   10.851  12.443  28.412  31.410  34.170  37.566  39.997
21    8.034   8.897   10.283  11.591  13.240  29.615  32.671  35.479  38.932  41.401
22    8.643   9.542   10.982  12.338  14.041  30.813  33.924  36.781  40.289  42.796
23    9.260   10.196  11.689  13.091  14.848  32.007  35.172  38.076  41.638  44.181
24    9.886   10.856  12.401  13.848  15.659  33.196  36.415  39.364  42.980  45.559
25    10.520  11.524  13.120  14.611  16.473  34.382  37.652  40.646  44.314  46.928
26    11.160  12.198  13.844  15.379  17.292  35.563  38.885  41.923  45.642  48.290
27    11.808  12.879  14.573  16.151  18.114  36.741  40.113  43.195  46.963  49.645
28    12.461  13.565  15.308  16.928  18.939  37.916  41.337  44.461  48.278  50.993
29    13.121  14.256  16.047  17.708  19.768  39.087  42.557  45.722  49.588  52.336
30    13.787  14.953  16.791  18.493  20.599  40.256  43.773  46.979  50.892  53.672
40    20.707  22.164  24.433  26.509  29.051  51.805  55.758  59.342  63.691  66.766
50    27.991  29.707  32.357  34.764  37.689  63.167  67.505  71.420  76.154  79.490
60    35.534  37.485  40.482  43.188  46.459  74.397  79.082  83.298  88.379  91.952
70    43.275  45.442  48.758  51.739  55.329  85.527  90.531  95.023  100.425 104.215
80    51.172  53.540  57.153  60.391  64.278  96.578  101.879 106.629 112.329 116.321
90    59.196  61.754  65.647  69.126  73.291  107.565 113.145 118.136 124.116 128.299
100   67.328  70.065  74.222  77.929  82.358  118.498 124.342 129.561 135.807 140.169

Algorithm 1: Method of moments for c-component univariate mixtures of exponential, binomial and Poisson distributions

V0 = x̄
for s = 1 : 2c − 1 do
  if mixture of exponentials then
    Vs(x) = x^s / Γ(s + 1)
  end if
  if mixture of binomials then
    Vs(x) = x(x − 1)...(x − s + 1) / [N(N − 1)...(N − s + 1)]
  end if
  if mixture of Poissons then
    Vs(x) = x(x − 1)...(x − s + 1)
  end if
end for
for i = 0 : c − 1 do
  Wi = V(c+i)
  for j = 1 : c do
    Mij = V(i+j)
  end for
end for
Solve M D = W
θ = roots(x^c + D(c−1) x^(c−1) + ... + D1 x + D0)
for i = 0 : 2c − 1 do
  for j = 0 : c do
    Θij = θj^i
  end for
end for
Solve Θ P = V

Algorithm 2: Method of moments for two-component univariate normal mixtures

V0 = x̄
for s = 1 : 5 do
  Vs = (1/n) Σ(1 ≤ i ≤ n) (xi − x̄)^s
end for
k4 = V4 − 3 V2²
k5 = V5 − 10 V2 V3
a0 = −24 V3^6
a1 = −96 V3^4 k4
a2 = −(63 V3² k4² + 72 V3³ k5)
a3 = 288 V3^4 − 108 V3 k4 k5 + 27 k4³
a4 = 444 V3² k4 − 18 k5²
a5 = 90 k4² + 72 V3 k5
a6 = 36 V3²
a7 = 84 k4
a8 = 0
a9 = 24
U = roots(Σ(0 ≤ i ≤ 9) ai u^i)
for i = 1 : 9 do
  if Ui < 0 then
    w = [−8 V3 Ui³ + 3 k5 Ui² + 6 V3 k4 Ui + 2 V3³] / [2 Ui³ + 3 k4 Ui + 4 V3²]
    δ = roots(x² − (w/Ui) x + Ui)
    if δ1 and δ2 real then
      µ1 = x̄ + δ1
      µ2 = x̄ + δ2
      σ1² = (1/3) δ1 (2w/Ui − V3/Ui) + V2 − δ1²
      σ2² = (1/3) δ2 (2w/Ui − V3/Ui) + V2 − δ2²
      p̃ = δ2 / (δ2 − δ1)
      if σ1² < 0 or σ2² < 0 then
        no real negative root giving acceptable results
      end if
    else
      no real root for the quadratic equation
    end if
  else
    no negative root for the nonic equation
  end if
end for

Algorithm 3: Maximum Likelihood Estimation for c-component univariate mixtures

Initialize p(0), θ(0), σ(0), ρ
while ρ > ε do
  for i = 1 : n do
    for j = 1 : c do
      h(i, j)(k) = fj(k)(xi) / g(k)(xi)
    end for
  end for
  for j = 1 : c do
    sj(k) = Σ(1 ≤ i ≤ n) h(i, j)(k)
    pj(k) = sj(k) / n
  end for
  ρp = ‖p(k) − p(k−1)‖∞
  for j = 1 : c do
    if mixture of binomials then
      θj(k) = (1 / (m sj(k))) Σ(1 ≤ i ≤ n) h(i, j)(k) xi
    else
      θj(k) = (1 / sj(k)) Σ(1 ≤ i ≤ n) h(i, j)(k) xi
    end if
  end for
  ρθ = ‖θ(k) − θ(k−1)‖∞
  if mixture of normals then
    for j = 1 : c do
      (σj(k))² = (1 / sj(k)) Σ(1 ≤ i ≤ n) h(i, j)(k) (xi − θj(k))²
    end for
    ρσ = ‖σ(k) − σ(k−1)‖∞
  end if
  ρ = max(ρp, ρθ, ρσ)
end while

References

[1] A.C. Cohen (1967). Estimation in mixtures of two normal distributions. Technometrics 9, 15-28.
[2] N.E. Day (1969). Estimating the components of a mixture of normal distributions. Biometrika 56, 463-474.
[3] A.P. Dempster, N.M. Laird and D.B. Rubin (1977). Maximum likelihood estimation from incomplete data via the EM algorithm (with discussions).
Journal of the Royal Statistical Society B 39, 1-38.
[4] R.W. Doerge, Z.B. Zeng and B.S. Weir (1997). Statistical issues in the search for genes affecting quantitative traits in experimental populations. Statistical Science 12, 195-219.
[5] B.S. Everitt and D.J. Hand (1981). Finite Mixture Distributions.
[6] C.H. Kao and Z.B. Zeng (1997). General formulas for obtaining the MLEs and the asymptotic variance-covariance matrix in mapping quantitative trait loci when using the EM algorithm. Biometrics 53, 653-665.
[7] G.J. McLachlan and D. Peel (2000). Finite Mixture Models.
[8] C.E. McLaren, V.R. Gordeuk, A.C. Looker, V. Hasselblad, C.Q. Edwards, L.M. Griffen, J.P. Kushner, and G.M. Brittenham (1995). Prevalence of heterozygotes for hemochromatosis in the white population of the United States. Blood 86, 2021-2027.
[9] C.E. McLaren, G.J. McLachlan, J.W. Halliday, S.I. Webbs, B.A. Leggett, E.C. Janzwinska, D.H.G. Crawford, V.R. Gordeuk, G.D. McLaren, and L.W. Powell (1998). The distribution of the transferrin saturation and hereditary haemochromatosis in Australians. Gastroenterology 114, 543-549.
[10] K. Pearson (1894). Contributions to the mathematical theory of evolution. Philosophical Transactions of the Royal Society of London A 185, 71-110.
[11] N.J. Schork, D.B. Allison and B. Thiel (1996). Mixture distributions in human genetics. Statistical Methods in Medical Research 5, 155-178.
[12] W. Seidel, K. Mosler and M. Akel (2000). A cautionary note on likelihood ratio tests in mixture models. Annals of the Institute of Statistical Mathematics.
[13] W.Y. Tan and W.C. Chang (1972). Some comparisons of the method of moments and the method of maximum likelihood in estimating parameters of a mixture of two normal densities. Journal of the American Statistical Association 67, 702-708.
[14] D.M. Titterington (1996). Mixture distributions (update). Encyclopedia of Statistical Sciences, Vol. 5, pp. 399-407.
[15] D.M. Titterington, A.F.M. Smith and U.E. Makov (1985). Statistical Analysis of Finite Mixture Distributions.
[16] W.F.R. Weldon (1892). Certain correlated variations in Crangon vulgaris. Proceedings of the Royal Society of London 51, 2-21.
[17] W.F.R. Weldon (1893). On certain correlated variations in Carcinus moenas. Proceedings of the Royal Society of London 54, 318-329.
[18] S.J. Yakowitz and J.D. Spragins (1968). On the identifiability of finite mixtures. Annals of Mathematical Statistics 39, 209-214.