
Finite Mixture Distributions

Supervisor: Shibdas Bandyopadhyay

Morgane Bergot

9 June – 16 September 2006
   Abstract
    Finite mixture distributions are a popular tool for modeling unobserved heterogeneity, and the number
of fields in which mixture models have been successfully applied is still growing. In this paper, we give
some concrete applications of finite mixtures, along with basic definitions and concepts. We then present the
classical method of moments and the most commonly used Maximum Likelihood Estimation (MLE) via the
EM algorithm to fit the mixture, and study the particular case of two-component univariate normal mixtures
using artificial data. Goodness-of-fit is assessed in these cases using the χ² test. For well-fitted mixtures, we
propose two rules to classify historical data, if available, into the two marginal distributions. An application
of classification and its use is given using a concrete example in genetics.




   Résumé
    Finite mixtures of probability distributions are widely used to model heterogeneity that is not directly
observed within a given population, and they find more and more applications in very varied fields. In this
report, we first give concrete applications of finite mixtures, as well as the basic definitions and concepts.
The classical method of moments and the very popular maximum likelihood method via the EM algorithm
for separating mixtures are presented and applied to mixtures of two univariate normals, using artificially
generated data. The χ² test is used to check the quality of the fitted mixture. For correctly fitted mixtures,
we propose two rules for classifying any available historical data into the two distributions found. An
application to a concrete case from genetics illustrates this classification method and its usefulness.




Contents

Introduction

1 General Introduction
  1.1 Finite Mixture Models
      1.1.1 Initial Approach to Mixture Analysis
      1.1.2 Applications in genetics
  1.2 Basic Definitions
  1.3 Number of Components
  1.4 Identifiability of Mixture Distributions

2 Estimation of Mixture Distributions
  2.1 Introduction
  2.2 Method of Moments
  2.3 Maximum Likelihood Fitting
      2.3.1 Maximum Likelihood Estimation
      2.3.2 EM Algorithm
      2.3.3 Starting Values for EM Algorithm
  2.4 Goodness-of-Fit
  2.5 Fitting Normal Mixtures
      2.5.1 Introduction
      2.5.2 Mixture of Two Well-Separated Normals
      2.5.3 Mixture of Two Close Normals

3 Classification
  3.1 Introductory Example
  3.2 Rules of Classification
  3.3 Study of the Rules
      3.3.1 Mixture of Two Well-Separated Normals
      3.3.2 Mixture of Two Close Normals
  3.4 Application to Hemochromatosis
      3.4.1 Case Study
      3.4.2 Results
  3.5 Perspectives for Future Work

Conclusion

Appendices




Introduction
This report is concerned with statistical distributions which can be expressed as sums of usually simple
component distributions. Such superpositions are termed mixture distributions. In this text we are concerned
with mixtures involving a known finite number of components which are termed finite mixtures.
    Finite mixture distributions have provided a mathematically based approach to the statistical modeling of
a wide variety of random phenomena. Because of their usefulness as an extremely flexible method of modeling,
finite mixture models have continued to receive increasing attention over the years, from both practical and
theoretical points of view. Indeed, in the past decade the extent and the potential of applications of finite
mixture models have widened considerably. Fields in which mixture models have been successfully applied
include astronomy, biology, genetics, medicine, psychiatry, economics, engineering, and marketing, among
many other fields in the biological, physical and social sciences.
    A mixture model is able to model quite complex distributions through an appropriate choice of its com-
ponents to represent accurately the local areas of support of the true distribution. The problem of central
interest arises when data are not available for each distribution separately, but only for the overall mix-
ture distribution. Often such situations arise because it is impossible to observe some underlying variable
which splits the observations into groups – only the combined distribution can be studied. In these circum-
stances, interest often focuses on estimating the mixing proportions and on estimating the parameters of the
conditional distributions.
    There is a remarkable variety of estimation methods that have been applied to finite mixture problems
such as graphical methods, the method of moments, maximum likelihood, minimum chi-square, least squares
approaches, and Bayesian approaches. In this report, the emphasis is on the fitting of mixtures by the method
of moments and by maximum likelihood with the EM algorithm.
    We are concerned here with the issue of classifying data into categories given by the fitting of univariate
mixture of normals from a set of previously available data. Our goal is to determine into which categories
newly given data ought to be assigned, and to do so on the basis of several rules we shall study. This type
of classification could be used in many applications in genetics.
   In the first part, we present some concrete applications of finite mixtures, and give basic definitions and
concepts. In the second part, we focus on the numerical methods that will be used to fit mixtures of normals,
and assess the goodness of fit with a specific criterion. In the third part, we attempt to classify data
according to different rules, and apply the study to a concrete case taken from genetics. Finally, we conclude
and point out some limitations of the present work.


1     General Introduction
1.1     Finite Mixture Models
1.1.1   Initial Approach to Mixture Analysis
The first major analysis involving the use of finite mixture models was undertaken by Pearson (1894), who
fitted a mixture of two normal probability density functions to some data provided by Weldon (1892, 1893).
    The data set analyzed by Pearson consisted of measurements of the ratio of forehead to body length
of n = 1000 crabs sampled from the Bay of Naples. These measurements, which were recorded in the form
of v = 29 intervals, are displayed in Figure 1, along with the plot of the density of the normal mixture
distribution fitted to them and its two components. Weldon had speculated that the asymmetry in the
histogram of these data might be a signal that this population was evolving toward two new subspecies.
Pearson used the method of moments he developed to obtain an excellent fit and interpreted the presence
of two components as evidence that there were two species of crabs.

1.1.2   Applications in genetics
One area in which mixture modeling has played a very important role is genetics, following Pearson's
pioneering work in the area in the late 1800s. The reader is referred to Schork, Allison and Thiel (1996)
for a comprehensive review of the theory and application of mixture models in human genetics.
    A basic mixture model can be used to assess the impact of possible underlying genotypes associated with
a particular locus on phenotypes that display continuous or quantitative variations in the population (for



Figure 1: Plot of forehead to body length data on 1000 crabs and of the fitted two-component (red line)
normal mixture models and its two components (green lines).


example, blood pressure, height, weight). Consider a locus with two alleles, denoted A and a. There are three
possible genotypes an individual can possess at this locus: AA, Aa (equivalent to aA) and aa. Suppose the
phenotype or trait values associated with individuals possessing the AA genotype, the heterozygote genotype
Aa, and the homozygote genotype aa are distributed N(µ1, σ1²), N(µ2, σ2²), and N(µ3, σ3²), respectively. The
proportions of these genotypes in the population at large are denoted pAA, pAa, and paa.
   If yi denotes the phenotype of interest, then it has the normal mixture distribution given by

                              g(yi) = π1 N(µ1, σ1²) + π2 N(µ2, σ2²) + π3 N(µ3, σ3²)                   (1.1)

where π1 = pAA, π2 = pAa, and π3 = paa. The genotype frequencies π1, π2, and π3 are usually written in
terms of the allele frequencies pA and pa, using the Hardy-Weinberg equilibrium (HWE) structure,

                              π1 = pA²,    π2 = 2 pA pa,    π3 = pa²                                  (1.2)

Under the additive model, we have µ3 − µ2 = µ2 − µ1.
   An alternative genetic model to the one above is the simple dominance model, in which genotypes AA
and Aa have the same phenotype, with frequency π1 = pA² + 2 pA pa. This leads to a two-component normal
mixture model with components corresponding to the phenotypes AA and Aa, and to aa, in proportions
π1 = pA² + 2 pA pa and π2 = pa², respectively.
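   As a concrete illustration of the HWE mixture structure above, the following minimal sketch (Python; the allele frequency and the component means and standard deviations are illustrative assumptions, not values taken from any data set) simulates phenotypes from the three-component model (1.1)–(1.2).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed values, for illustration only
p_A = 0.7                      # allele frequency of A
mu = [95.0, 100.0, 105.0]      # component means for AA, Aa, aa
sigma = [5.0, 5.0, 5.0]        # component standard deviations

# Genotype frequencies under Hardy-Weinberg equilibrium (1.2)
p_a = 1.0 - p_A
pi = np.array([p_A**2, 2 * p_A * p_a, p_a**2])

# Simulate n phenotypes from the mixture (1.1)
n = 1000
genotype = rng.choice(3, size=n, p=pi)          # latent component label
y = rng.normal(np.take(mu, genotype), np.take(sigma, genotype))

print("genotype frequencies:", pi)
print("sample mean phenotype:", y.mean())
```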

    A recent focus of mixture models in genetics is the construction of gene marker maps for parentage
testing, disease resistance diagnosis, and quantitative trait loci (QTL) detection. See Doerge, Zeng, Weir
(1997), Kao and Zeng (1997), and the references therein.

1.2     Basic Definitions
Suppose that a random variable X takes values in a sample space X, and that its distribution can be
represented by a probability density function (or mass function in the case of discrete X) of the form

                              g(x) = p1 f1 (x) + p2 f2 (x) + . . . + pc fc (x)          x∈X               (1.3)

where
                                     pj ∈ [0, 1],   j ∈ [1, c],        Σ_{1≤j≤c} pj = 1

and
                                     fj(·) ≥ 0,     ∫_X fj(x) dx = 1,        j ∈ [1, c]

   We shall say that X has a finite mixture distribution and that g(x) is a finite mixture density function.
The parameters p1 , . . . , pc are called the mixing weights, and f1 (x), . . . , fc (x) the component densities of the
mixture.
    There is no requirement that the component densities should all belong to the same parametric family,
but as this is the case in many applications, we will restrict our attention throughout this project to the
simplest case, where f1(x), . . . , fc(x) have a common functional form but different parameters. We can then
write fj(x) = f(x|θj), where θj denotes the parameters occurring in fj(x). The finite mixture density
function will then have the form

                                     g(x|θ) = Σ_{1≤j≤c} pj f(x|θj),    x ∈ X                          (1.4)

where θ = (p1 , . . . , pc , θ1 , . . . , θc ) is the complete collection of all distinct parameters occurring in the mixture
model.
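   To make the notation concrete, here is a minimal sketch (Python with SciPy; the two normal components and their weights are illustrative assumptions) of evaluating a finite mixture density of the form (1.4).

```python
import numpy as np
from scipy.stats import norm

def mixture_pdf(x, weights, params):
    """Evaluate g(x|theta) = sum_j p_j f(x|theta_j) for normal components."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for p_j, (mu_j, sigma_j) in zip(weights, params):
        g += p_j * norm.pdf(x, loc=mu_j, scale=sigma_j)
    return g

# Illustrative two-component example: p = (0.4, 0.6), N(10, 20^2) and N(50, 10^2)
weights = [0.4, 0.6]
params = [(10.0, 20.0), (50.0, 10.0)]
print(mixture_pdf([0.0, 30.0, 50.0], weights, params))
```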

1.3      Number of Components
The most fundamental "parameter" in the definition of a finite mixture is c, the number of component
subpopulations. Its importance results partly from technical considerations, in that without knowing c we
cannot go further in the process of estimating the mixture density, but in many contexts, uncertainty about
the number of components leads to statistical problems closely related to cluster analysis, possibly with strong
assumptions about parametric structure (See McLachlan and Peel (2000), section 1.15).
   In many direct applications, however, there are situations where we believe in the existence of c underlying
categories, or groups, such that the experimental unit on which the observation X is made belongs to one of
these categories. In the following, we will then restrict our attention to the simple case where c is known.

1.4      Identifiability of Mixture Distributions
The assumption of identifiability, i.e. the existence of a unique characterization for every member of the class
of models being considered, lies at the core of most statistical theory and practice. It is important to consider
identifiability in practice because, without it, estimation procedures are not likely to be well defined. Mixture
models can present particular difficulties with identifiability.
   Let
                                     F = { f(x; θ) | θ ∈ Ω, x ∈ R^d }

be the class of d-dimensional distribution functions from which mixtures are to be formed. We identify the
class of finite mixtures of F with the appropriate class of distribution functions, H, defined by

        H = { H(x) | H(x) = Σ_{1≤j≤c} pj fj(x),  pj > 0,  Σ_{1≤j≤c} pj = 1,  fj(·) ∈ F,  c ∈ N,  x ∈ R^d }

so that H is the convex hull of F.
   Suppose H, H′ ∈ H are given by

                                     H = Σ_{1≤j≤c} pj fj,        H′ = Σ_{1≤j≤c′} p′j f′j

and that H ≡ H′ if and only if c = c′ and the summations can be ordered so that pj = p′j and fj = f′j, for
j = 1, . . . , c. Then H is identifiable.
   Univariate normal, exponential and Poisson mixtures are identifiable, provided the θj's are all different,
and a sufficient condition of identifiability for univariate binomial mixtures is n ≥ 2c − 1. See Everitt and
Hand (1981), Titterington, Smith and Makov (1985), Yakowitz and Spragins (1968) and references
therein for details. We will place our discussion under these hypotheses.




2     Estimation of Mixture Distributions
2.1    Introduction
Over the years, a variety of approaches have been used to estimate distributions. They include graphical
methods, method of moments, minimum-distance methods, maximum likelihood, and Bayesian approaches.
As surmised by Titterington (1996), perhaps the main reason for the huge literature on estimation methodology
for mixtures is the fact that explicit formulas for parameter estimates are typically not available.
    As discussed in section 1.1.1, it can be argued that the problem of fitting mixture models started in
earnest with the work of Pearson (1894) on the estimation of the parameters of a mixture of two univariate
normal components by the method of moments. This method will be our first approach to the estimation of
mixture distributions.
    Yet, since the advent of the EM algorithm, maximum likelihood (ML) has been by far the most commonly
used approach to the fitting of mixture distributions. Before proceeding to consider the EM algorithm, we
will first briefly define Maximum Likelihood Estimation (MLE) in general and introduce some associated
notation.

2.2    Method of Moments
Suppose
                                     g(x|θ) = Σ_{1≤j≤c} pj f(x|θj),    x ∈ X

and that ts(x) is such that t0(x) = 1 and

                                     E_{θj} ts(X) = θj^s,    s = 0, . . . , 2c − 1

Suppose also that
                                     Vs = (1/n) Σ_{1≤i≤n} ts(xi)

Then we have a set of moment equations in the θj, given by

                                     Σ_{1≤j≤c} pj θj^s = Vs,    s = 0, . . . , 2c − 1

which can be expressed as
                                     Θ P = V                                                          (2.1)

where Θsj = θj^s, s = 0, . . . , 2c − 1, j = 1, . . . , c.
   Provided the θj's are all different, P can be solved in terms of the {θj} and therefore eliminated.
Furthermore, the required {θj} are the roots of

                                     θ^c + D_{c−1} θ^{c−1} + . . . + D_1 θ + D_0 = 0

where D = (D_0, . . . , D_{c−1})^T satisfies

                        | V_0      V_1    . . .  V_{c−1}  |        | V_c      |
                        | V_1      V_2    . . .  V_c      |  D = − | V_{c+1}  |                       (2.2)
                        |  ...                            |        |  ...     |
                        | V_{c−1}  V_c    . . .  V_{2c−2} |        | V_{2c−1} |

   Specific cases of c-component univariate mixtures of exponential, binomial and Poisson distributions to which
the above analysis applies are given in Algorithm 1 (see Appendices), where θ gives the parameters and P the
mixing weights of each component.
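   As an illustration of the scheme (2.1)–(2.2), the sketch below applies the method of moments to a two-component Poisson mixture, for which the choice ts(x) = x(x − 1) · · · (x − s + 1) gives E_{θj} ts(X) = θj^s (the factorial moments of a Poisson variable). It is only a sketch of the idea on simulated data, not the Algorithm 1 referred to in the Appendices.

```python
import numpy as np

def poisson_mixture_moments(x, c=2):
    """Method of moments for a c-component Poisson mixture (here c = 2)."""
    x = np.asarray(x, dtype=float)
    # V_s = (1/n) sum_i t_s(x_i) with t_s(x) = x(x-1)...(x-s+1)  (factorial moments)
    V = np.empty(2 * c)
    t = np.ones_like(x)
    V[0] = 1.0
    for s in range(1, 2 * c):
        t = t * (x - (s - 1))
        V[s] = t.mean()
    # Hankel system (2.2): rows (V_s, ..., V_{s+c-1}), right-hand side -(V_c, ..., V_{2c-1})
    A = np.array([[V[s + j] for j in range(c)] for s in range(c)])
    D = np.linalg.solve(A, -V[c:2 * c])
    # Component parameters are the roots of theta^c + D_{c-1} theta^{c-1} + ... + D_0 = 0
    theta = np.real_if_close(np.roots(np.concatenate(([1.0], D[::-1]))))
    # Mixing weights from Theta P = V (first c moment equations)
    Theta = np.vander(theta, c, increasing=True).T   # Theta_{sj} = theta_j^s
    P = np.linalg.solve(Theta, V[:c])
    return theta, P

# Illustrative use on data simulated from 0.3*Poisson(2) + 0.7*Poisson(9)
rng = np.random.default_rng(1)
lam = np.where(rng.random(5000) < 0.3, 2.0, 9.0)
x = rng.poisson(lam)
print(poisson_mixture_moments(x))
```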

   This analysis cannot, however, be applied to normal mixtures. Consider first a two-component univariate
normal mixture: by equating the observed central moments Vr to the theoretical moments, we obtain a
system which may, by some fairly tedious algebra (see Cohen, 1967), be reduced to the 'fundamental nonic'
taking the form

                                     Σ_{0≤i≤9} ai u^i = 0                                             (2.3)

where
                           a0 = −24 V3⁶                                 a5 = 90 k4² + 72 V3 k5
                           a1 = −96 V3⁴ k4                              a6 = 36 V3²
                           a2 = −(63 V3² k4² + 72 V3³ k5)               a7 = 84 k4
                           a3 = 288 V3⁴ − 108 V3 k4 k5 + 27 k4³         a8 = 0
                           a4 = 444 V3² k4 − 18 k5²                     a9 = 24

and the fourth and fifth sample cumulants are
                                                  k4 = V4 − 3 V2²
                                                  k5 = V5 − 10 V2 V3

   Let û be a real negative root of (2.3). Then estimates δ̂1 and δ̂2 of δ1 = (µ1 − µ) and δ2 = (µ2 − µ) are
obtained as the roots of the quadratic equation

                                     δ² − (ŵ/û) δ + û = 0                                             (2.4)

with
                     ŵ = (−8 V3 û³ + 3 k5 û² + 6 V3 k4 û + 2 V3³) / (2 û³ + 3 k4 û + 4 V3²)

   The estimates of the five parameters of the mixture may now be written

                                       µ̂1 = x̄ + δ̂1
                                       µ̂2 = x̄ + δ̂2
                                       σ̂1² = (1/3) δ̂1 (2ŵ/û − V3/û) + V2 − δ̂1²
                                       σ̂2² = (1/3) δ̂2 (2ŵ/û − V3/û) + V2 − δ̂2²
                                       p̂ = δ̂2 / (δ̂2 − δ̂1)

   The parameter estimates may, however, be non-feasible, as the quadratic equation (2.4) may have no real
roots. The nonic may also have more than a single negative root, leading to multiple solutions. In general, a
feasible set of estimates is obtained if the nonic yields a negative root such that (2.4) has real roots,
σ̂1² > 0, σ̂2² > 0, and 0 < p̂ < 1. Algorithm 2 (see Appendices) summarizes this method.
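   A sketch of this moment procedure for two normal components, written directly from the formulas as reconstructed above (the coefficient expressions should therefore be checked against Cohen (1967) before any serious use), might look as follows.

```python
import numpy as np

def normal_mixture_moments(x):
    """Method-of-moments estimates for a two-component univariate normal mixture
    (Cohen's 'fundamental nonic' approach, as sketched above)."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    d = x - xbar
    V2, V3, V4, V5 = (np.mean(d**r) for r in (2, 3, 4, 5))   # central moments
    k4 = V4 - 3 * V2**2                                      # sample cumulants
    k5 = V5 - 10 * V2 * V3

    # Coefficients a_0, ..., a_9 of the nonic (2.3)
    a = [
        -24 * V3**6,
        -96 * V3**4 * k4,
        -(63 * V3**2 * k4**2 + 72 * V3**3 * k5),
        288 * V3**4 - 108 * V3 * k4 * k5 + 27 * k4**3,
        444 * V3**2 * k4 - 18 * k5**2,
        90 * k4**2 + 72 * V3 * k5,
        36 * V3**2,
        84 * k4,
        0.0,
        24.0,
    ]
    roots = np.roots(a[::-1])                  # np.roots expects highest degree first
    candidates = [r.real for r in roots if abs(r.imag) < 1e-8 and r.real < 0]

    solutions = []
    for u in candidates:
        w = (-8 * V3 * u**3 + 3 * k5 * u**2 + 6 * V3 * k4 * u + 2 * V3**3) / \
            (2 * u**3 + 3 * k4 * u + 4 * V3**2)
        # delta_1, delta_2 are the roots of delta^2 - (w/u) delta + u = 0   (2.4)
        disc = (w / u)**2 - 4 * u
        if disc < 0:
            continue
        d1 = (w / u - np.sqrt(disc)) / 2
        d2 = (w / u + np.sqrt(disc)) / 2
        s1 = d1 * (2 * w / u - V3 / u) / 3 + V2 - d1**2
        s2 = d2 * (2 * w / u - V3 / u) / 3 + V2 - d2**2
        p = d2 / (d2 - d1)
        if s1 > 0 and s2 > 0 and 0 < p < 1:    # feasibility check
            solutions.append(dict(p=p, mu1=xbar + d1, mu2=xbar + d2,
                                  sigma1=np.sqrt(s1), sigma2=np.sqrt(s2)))
    return solutions
```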
   For more than two components, the system of equations is very difficult to solve, and the method gives
very poor results in the multivariate case (see Day, 1969). Consequently it is unlikely that the method of
moments will be of any general use in estimating the parameters for c > 2 or in a multivariate case, and so
we are led to consider other possible estimation procedures such as maximum likelihood.
    As a general method of estimation, the method of moments lacks the optimal properties of MLE, but it is
very often used to provide a good initialization for the EM algorithm.

2.3     Maximum Likelihood Fitting
2.3.1   Maximum Likelihood Estimation
For estimating the unknown parameters, we can also apply standard Maximum Likelihood Estimation
(MLE), which possesses desirable statistical properties: under very general conditions, the estimates
obtained by the method are consistent (they converge with probability 1 to the true parameter values). We
first establish a general likelihood function and give the likelihood equation.
   Suppose X is a random variable or vector with probability density function f (x|Ψ), where

                                          Ψ = (p1, . . . , pc−1, θ1, . . . , θc)^T




is the parameter vector we wish to estimate. The parameter space is denoted by Ω. Although we are taking
X to be a continuous random variable, we can still view f (x|Ψ) as a probability density function in the case
where X is discrete by the adoption of counting measure.
    Let x = (x1, . . . , xn) denote an observed independent random sample of size n on the random variable
X. The likelihood function for Ψ formed from the observed data x is given by

                                            L(Ψ; x) = Π_{1≤j≤n} f(xj|Ψ)                               (2.5)

   We attempt to find the particular Ψ that maximizes the likelihood function. This maximization can be
carried out in the traditional way by differentiating L(Ψ; x) with respect to the components of Ψ and equating
the derivatives to zero to give the likelihood equation

                                                  ∂L(Ψ)
                                                        =0                                                  (2.6)
                                                   ∂Ψ
or equivalently,
                                                ∂ log L(Ψ)
                                                           =0                                               (2.7)
                                                    ∂Ψ
   We let
                                     I(Ψ; x) = − ∂² log L(Ψ) / ∂Ψ ∂Ψ^T                                (2.8)
be the matrix of the negative of the second-order partial derivatives of the log likelihood function with respect
to the elements of Ψ. Under regularity conditions, the expected information matrix I(Ψ) is given by

                                     I(Ψ) = E[ S(x|Ψ) S^T(x|Ψ) ]                                      (2.9)
                                          = −E[ I(Ψ; x) ]                                            (2.10)

where
                                     S(x|Ψ) = ∂ log L(Ψ) / ∂Ψ                                        (2.11)
is the gradient vector of the log likelihood function; that is, the score statistic. Here the operator E denotes
expectation using the parameter vector Ψ.
    The asymptotic covariance matrix of the MLE Ψ̂ is equal to the inverse of the expected information matrix
I(Ψ), which can be approximated by I(Ψ̂); that is, the standard error of Ψ̂i = (Ψ̂)i is given by

                                     SE(Ψ̂i) ≈ (I^{−1}(Ψ̂))_{ii}^{1/2},    i = 1, . . . , d           (2.12)

    The observed information matrix is usually more convenient to use than the expected information matrix,
as it does not require an expectation to be taken. Thus, it is common in practice to estimate the inverse of
the covariance matrix of a maximum-likelihood solution by the observed information matrix I(Ψ̂; x), rather
than the expected information matrix I(Ψ) evaluated at Ψ = Ψ̂. This approach gives the approximation

                                     SE(Ψ̂i) ≈ (I^{−1}(Ψ̂; x))_{ii}^{1/2},    i = 1, . . . , d        (2.13)
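   As an illustration of (2.13), the following sketch (assuming only a user-supplied log-likelihood function; the finite-difference step size is an arbitrary choice) approximates the observed information matrix numerically and reads the standard errors off the diagonal of its inverse.

```python
import numpy as np

def observed_information(loglik, psi_hat, eps=1e-5):
    """Finite-difference approximation of I(Psi; x) = -d^2 log L / dPsi dPsi^T at psi_hat."""
    psi_hat = np.asarray(psi_hat, dtype=float)
    d = psi_hat.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i = np.eye(d)[i] * eps
            e_j = np.eye(d)[j] * eps
            H[i, j] = (loglik(psi_hat + e_i + e_j) - loglik(psi_hat + e_i - e_j)
                       - loglik(psi_hat - e_i + e_j) + loglik(psi_hat - e_i - e_j)) / (4 * eps**2)
    return -H

def standard_errors(loglik, psi_hat):
    """SE(psi_i) ~ square root of the diagonal of the inverse observed information (2.13)."""
    I_obs = observed_information(loglik, psi_hat)
    return np.sqrt(np.diag(np.linalg.inv(I_obs)))
```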

   Often the log likelihood function cannot be maximized analytically, that is, the likelihood equation has
no explicit solutions, particularly in incomplete-data problems. In such cases, it may be possible to compute
the MLE of Ψ iteratively.
   Next we will introduce an algorithm for calculating maximum likelihood estimates.

2.3.2   EM Algorithm
Incomplete data often result in complicated likelihood functions, where MLE’s usually have to be computed
iteratively. The Expectation-Maximization algorithm for the finite mixture problem proposed by Dempster,
Laird, and Rubin in 1977, popularly known as the EM algorithm, is a broadly applicable approach to the
iterative computation of MLE’s, useful in a variety of incomplete-data problems, where algorithms such as
the Newton-type methods may turn out to be more complicated.




    Suppose we have to find Ψ = Ψ̂ to maximize the likelihood L(Ψ) = f(x|Ψ), where x is a set of 'incomplete'
data. Let y denote a typical 'complete' version of x and let Y(x) denote the set of all possible such y. Let the
likelihood from y be denoted by g(y|Ψ).

   The EM algorithm generates, from some initial approximation Ψ^(0), a sequence {Ψ^(m)} of estimates.
Each iteration consists of the following double step:
• E step: evaluate Q(Ψ, Ψ^(m)) = E[ log g(y|Ψ) | x, Ψ^(m) ]
• M step: find Ψ = Ψ^(m+1) to maximize Q(Ψ, Ψ^(m))
   Algorithm 3 (see Appendices) summarizes the basic EM algorithm for the most widely used normal,
exponential, binomial and Poisson distributions in the c-component univariate case, where p is the vector of
mixing weights and θ the vector of parameters (µ and σ in the case of a normal mixture).
   The EM algorithm converges for the four families studied (see Titterington, Smith and Makov (1985),
Section 4.3.3) but the results do depend upon the starting point and the possibility of multiple solutions is
always present.
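   For concreteness, here is a minimal EM sketch for a two-component univariate normal mixture (the stopping rule and its tolerance are arbitrary assumptions; Algorithm 3 in the Appendices covers the general c-component case for the four families).

```python
import numpy as np
from scipy.stats import norm

def em_two_normals(x, p, mu1, mu2, s1, s2, tol=1e-8, max_iter=10000):
    """EM algorithm for the mixture p*N(mu1, s1^2) + (1-p)*N(mu2, s2^2)."""
    x = np.asarray(x, dtype=float)
    old = -np.inf
    for k in range(max_iter):
        # E step: posterior probability that each observation comes from component 1
        d1 = p * norm.pdf(x, mu1, s1)
        d2 = (1 - p) * norm.pdf(x, mu2, s2)
        tau = d1 / (d1 + d2)
        # M step: weighted updates of the five parameters
        p = tau.mean()
        mu1 = np.sum(tau * x) / np.sum(tau)
        mu2 = np.sum((1 - tau) * x) / np.sum(1 - tau)
        s1 = np.sqrt(np.sum(tau * (x - mu1) ** 2) / np.sum(tau))
        s2 = np.sqrt(np.sum((1 - tau) * (x - mu2) ** 2) / np.sum(1 - tau))
        loglik = np.sum(np.log(d1 + d2))
        if abs(loglik - old) < tol:           # stop when the log likelihood stabilizes
            break
        old = loglik
    return p, mu1, mu2, s1, s2, k + 1
```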

2.3.3   Starting Values for EM Algorithm
As explained in section 2.3.2, the EM algorithm is started from some initial value Ψ^(0) of Ψ, which we have to specify.
Recently, Seidel, Mosler and Akel (2000) have demonstrated how different starting strategies and stopping
rules can lead to quite different estimates in the context of fitting mixture of exponential components via the
EM algorithm. Furthermore, as convergence with the EM algorithm is slow, the situation will be exacerbated
by a poor choice of Ψ(0) . Another problem with mixture models is that the likelihood equation will usually
have multiple roots corresponding to local maxima, and so the EM algorithm should be applied from a wide
choice of starting values in any search for all local maxima.
    Yet, in many cases involving univariate data, the choice of starting values will not be critical; usually,
the EM algorithm would be applied from a number of random starts or from estimates coming from the
method of moments when possible. However, if the mixture is poorly fitted with the method of moments, we
may consider using other initial values to search for other local maxima giving better results.
   We shall now study the fitting of mixture distributions with the method of moments and the MLE. We
first define the goodness-of-fit and give a criterion to characterize it.

2.4     Goodness-of-Fit
Goodness-of-fit means how well a statistical model fits a set of observations. Measures of goodness-of-fit
typically summarize the discrepancy between observed values and the values expected under the model in
question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals, to
test whether two samples are drawn from identical distributions (Kolmogorov-Smirnov test), or whether
outcome frequencies follow a specified distribution (Pearson’s chi-square test).
    Pearson’s chi-square test is one of a variety of chi-square tests – statistical procedures whose results are
evaluated by reference to the chi-square distribution. It tests a null hypothesis usually called H0 that the
relative frequencies of occurrence of observed events follow a specified frequency distribution. The events
must be mutually exclusive. One of the simplest examples is the hypothesis that an ordinary six-sided die is
"fair", i.e. all six outcomes occur equally often.

    The chi-square statistic is defined as

                                     χ² = Σ_{1≤i≤v} (Oi − Ei)² / Ei                                   (2.14)

where v is the number of intervals, Oi are the observed frequencies, and Ei the frequencies expected under
the model.
   If the value found exceeds the critical value χ²_{0.95} given in the chi-square probabilities table (see Table 6
in the Appendices), we may conclude that the observed frequencies differ significantly from the expected
frequencies at the 5% level. This means there is a 5% probability of rejecting the null hypothesis H0 = "the
population follows the distribution studied" when this hypothesis is in fact true. If we use the table with
χ²_{0.99}, we have a 1% chance of wrongly rejecting H0.
    Use of the chi-square probabilities table: The number df used in the table is the number of degrees
of freedom. It is defined as
                                          df = v − r − 1                                              (2.15)
where v is the number of intervals and r is the number of parameters of the distribution being fitted.
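    A sketch of this chi-square check (Python with SciPy; the observed and expected frequencies per interval are assumed to have been produced at the fitting stage) could be:

```python
import numpy as np
from scipy.stats import chi2

def chi_square_gof(observed, expected, n_params, alpha=0.05):
    """Pearson chi-square statistic (2.14) and its critical value for df = v - r - 1 (2.15)."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    stat = np.sum((observed - expected) ** 2 / expected)
    df = observed.size - n_params - 1
    critical = chi2.ppf(1 - alpha, df)
    return stat, critical, stat < critical    # True if the fit is not rejected at level alpha
```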
    We now apply the numerical algorithms to the simplest case of normal mixtures, and study the goodness
of fitting with the chi-square test.

2.5     Fitting Normal Mixtures
2.5.1   Introduction
The most widely used finite mixture distributions are those involving normal components. In the univariate
case, such finite normal mixture distributions have a long history, and the problem of estimating their
parameters is one of the oldest estimation problems in the statistical literature.
   To compare the method of moments and the MLE, we specialize our study to the case of two-component
univariate normal mixtures,

   g(x, p, σ1, σ2, µ1, µ2) = p · (1/(√(2π) σ1)) exp(−(x − µ1)²/(2σ1²)) + (1 − p) · (1/(√(2π) σ2)) exp(−(x − µ2)²/(2σ2²))   (2.16)
   Once the correctness of the algorithms used has been checked with data given by Everitt and Hand
(1981), we study the influence of p on the goodness of fit for two types of normal mixture. For this purpose,
we generate n random numbers coming from the two distributions, mix them according to several mixing
weights, and use the two algorithms to fit the resulting mixtures.
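   The artificial data can be generated with a short sketch such as the following (component labels are drawn first, then the observations; the parameter values shown are those of the well-separated example of the next subsection).

```python
import numpy as np

def generate_mixture(n, p1, mu1, sigma1, mu2, sigma2, seed=0):
    """Draw n observations from p1*N(mu1, sigma1^2) + (1-p1)*N(mu2, sigma2^2)."""
    rng = np.random.default_rng(seed)
    labels = rng.random(n) < p1                       # latent component indicators
    mu = np.where(labels, mu1, mu2)
    sigma = np.where(labels, sigma1, sigma2)
    return rng.normal(mu, sigma)

x = generate_mixture(1000, p1=0.4, mu1=10, sigma1=20, mu2=50, sigma2=10)
```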

2.5.2   Mixture of Two Well-Separated Normals
We first study a mixture of two normals that are well separated in terms of their means and variances. The generated data,
which were recorded in the form of v = 26 intervals, are displayed in Figure 2 for p1 = 0.4, along with the
plot of the density of the normal mixture distribution fitted to them and its two components. The results of
the fitting for several proportions are presented in Table 1 with the chi-square error χ2 defined in section 2.4
and the number of iterations k for the EM algorithm.
    In the case of a single root, the difference between the MLE and the method of moments is small. The MLE,
however, gives better results for small p1, which justifies the use of the method of moments to initialize the
EM algorithm. In the case of two acceptable roots of the nonic equation (2.3), the solution giving the smallest
chi-square is used to initialize the EM algorithm: without this selection, the EM algorithm gives the same
estimates but takes a little more time. When the method of moments fails, as is the case for p1 = 0.01,
the initialization for the EM algorithm is made at random and gives acceptable results in a reasonable time.
     As the data are generated from known distributions, we can compare the estimates with the theoretical
parameters, and a simple glance at the superposition of the theoretical and the fitted mixtures gives an
indication of the goodness of the fitting. These observations allow us to say that the results are quite
good in general. Moreover, the use of the chi-square probabilities table (see Table 6 in the Appendices) with
χ²_{0.95} confirms the goodness-of-fit of most of the mixtures studied.
     Conclusion: The conjugate use of the method of moments and the EM algorithm gives a good fitting of
the mixtures in this example, even for small p1.

2.5.3   Mixture of Two Close Normals
We now study the fitting of a mixture of two normal distributions with very close means and variances.
The results of the fitting for several proportions are presented in Table 2 with their chi-square error χ2 defined
in section 2.4, and the number of iterations k for the EM algorithm.
    In this case, the method of moments systematically fails. As the EM algorithm can give several solutions
depending on the random initialization, the choice of the estimates can be made on the basis of the chi-square




Figure 2: Plot of 1000 random data generated from a two-component normal mixture with µ1 = 10, σ1 = 20,
µ2 = 50, σ2 = 10 and p1 = 0.4, of the fitted two-component normal mixture models (red line) and its two
components (green lines).




Table 1: Estimates for the parameters of a two-component normal mixture fitted to data generated from a
mixture with µ1 = 10, σ1 = 20, µ2 = 50, σ2 = 10 and different p1 with the two methods for n = 1000


  p                  Method of Moments                          Maximum Likelihood Estimation
 0.5    µ1 = 6.7142      µ2 = 49.10                      µ1 = 5.8086    µ2 = 49.107    k = 134
        σ1 = 19.260      σ2 = 11.558 χ2 = 17.430a        σ1 = 18.590    σ2 = 11.2377 χ2 = 17.394a
        p1 = 0.46149                                     p1 = 0.45186
 0.3    µ1 = 12.314      µ2 = 50.463                     µ1 = 9.1812    µ2 = 50.158    k = 165
        σ1 = 21.422      σ2 = 10.518 χ2 = 20.013a        σ1 = 19.763    σ2 = 10.554    χ2 = 18.6357a
        p1 = 0.081794                                    p1 = 0.29496
 0.1    µ1 = 5.3631      µ2 = 49.955                     µ1 = 9.0786    µ2 = 50.134    k = 200
        σ1 = 18.494      σ2 = 10.669 χ2 = 25.3298a       σ1 = 20.312    σ2 = 10.551    χ2 = 20.461a
        p1 = 0.081794                                    p1 = 0.093193
 0.05   µ1 = 8.419       µ2 = 50.360                     µ1 = 8.4789    µ2 = 50.304    k = 228
        σ1 = 18.346      σ2 = 10.433 χ2 = 19.956a        σ1 = 18.975    σ2 = 10.515    χ2 = 19.351a
        p1 = 0.043145                                    p1 = 0.041930
        µ1 = 26.682      µ2 = 50.809
        σ1 = 24.172      σ2 = 9.810   χ2 = 25.2585a
        p1 = 0.093604
 0.01                                                    µ1 = 7.4302      µ2 = 50.415    k = 289
        no negative root                                 σ1 = 15.015      σ2 = 10.457    χ2 = 24.125a
                                                         p1 = 0.0091530
                   a: significant at 5%




Table 2: Estimates for the parameters of a two-component normal mixture fitted to data generated from a
mixture with µ1 = 10, σ1 = 50, µ2 = 12, σ2 = 55 and different p1 with the two methods for n = 1000

                    p          Method of Moments                    Maximum Likelihood Estimation
                   0.5         no real negative root         µ1 = 3.0705    µ2 = 56.353    k = 6947
                                                             σ1 = 51.262    σ2 = 40.802    χ2 = 33.721b
                                                             p1 = 0.84724
                               no real negative root         µ1 = −153.75   µ2 = 11.862    k = 1372
                                                             σ1 = 9.0906    σ2 = 52.451    χ2 = 25.988b
                                                             p1 = 0.0039345
                   0.3         no real negative root         µ1 = 8.54062   µ2 = 107.446 k = 9627
                              gives acceptable results       σ1 = 51.4289   σ2 = 24.655    χ2 = 31.3147b
                                                             p1 = 0.967979
                               no real negative root         µ1 = −42.4161 µ2 = 14.972     k = 424
                              gives acceptable results       σ1 = 23.050    σ2 = 53.267    χ2 = 31.398b
                                                             p1 = 0.056887
                   0.1         no real negative root         µ1 = 9.0280    µ2 = 139.23    k = 2884
                              gives acceptable results       σ1 = 52.395    σ2 = 5.4878    χ2 = 26.560b
                                                             p1 = 0.98817
                               no real negative root         µ1 = −18.157   µ2 = 12.949    k = 4652
                              gives acceptable results       σ1 = 30.509    σ2 = 54.785    χ2 = 32.712b
                                                             p1 = 0.07653
                   0.05        no real negative root         µ1 = 9.6651    µ2 = 139.353 k = 395
                              gives acceptable results       σ1 = 53.124    σ2 = 5.0793    χ2 = 25.284b
                                                             p1 = 0.98924
                               no real negative root         µ1 = −13.350   µ2 = 15.260    k = 870
                              gives acceptable results       σ1 = 37.235    σ2 = 55.888    χ2 = 31.979b
                                                             p1 = 0.14681
                           b: significant at 10%




Figure 3: Plot of 1000 random data generated from a two-component normal mixture with µ1 = 10, σ1 = 50,
µ2 = 12, σ2 = 55 and p1 = 0.1, together with two results of the fitting (red lines) and their two components
(green lines), giving respectively χ² = 32.712b and χ² = 26.560b.
b: significant at 10%




error minimization. Yet, several different estimates can give the same χ², and the fitting giving the smallest χ²
is not always the nearest to the theoretical results, as displayed in Figure 3. Depending on the initialization,
the EM algorithm can also fail.
    Conclusion: The method of moments failed for this example and the EM algorithm gives several solutions.
The estimates are in general not very good, as confirmed by the use of the chi-square probabilities table
(see Table 6 in the Appendices) with χ²_{0.95}, and several statistically equivalent fittings are obtained. Without
any other information on the mixture components, it is impossible to conclude which solution is the closest to
the "real" one.
   We now suppose that historical data are available. We shall try to classify them according to the distri-
bution they come from and identify the estimates closest to reality. We will then be able to determine in
which category newly given data ought to be assigned.


3     Classification
3.1     Introductory Example
There is incontrovertible evidence that the early diagnosis of hereditary hemochromatosis prevents virtually
all manifestations of the disease and results in normal life expectancy. In contrast, when unrecognized and
untreated, the disease leads to cirrhosis, hepatocellular carcinoma, and other lethal complications (McLaren
et al., 1998). Although heterozygotes for hemochromatosis do not usually develop iron overload sufficient
to cause overt organ damage, heterozygotes may develop marked iron overload, and the heterozygous state
has been implicated in some typical diseases. Thus early identification of heterozygotes and homozygotes for
hemochromatosis is an important clinical challenge. An elevated level of transferrin saturation remains the
most useful noninvasive screening test for affected individuals, but there is some debate as to the appropriate
screening level.
    Because it has been found that transferrin saturation is normally distributed in normal homozygotes, the
physiological model we consider is a mixture of two normal distributions, one for heterozygotes and homozygotes
and one for unaffected individuals. Once we have fitted the mixture, if the fit is good, we would like to be able
to tell new patients with high transferrin saturation whether they are affected or not.

3.2     Rules of Classification
Once we have statistically good estimates for the fitted mixture, we can try to classify the available
historical data. We first have to determine whether the fitting corresponds to the available data. To classify
the given data x = (x1, . . . , xn) according to the distribution they come from, we use the following criterion:

                                           xi ∈ f1  ⟺  f1(xi)/f2(xi) > 1.                            (3.1)

This rule separates the data into two groups at the crossing point of the two distributions. The number of
well-classified data in each distribution indicates the best fitting.
   We can also adapt the rule to obtain, for example, the same proportion of well-classified data in the two
distributions. We then use the rule

                                           xi ∈ f1  ⟺  f1(xi)/f2(xi) > k,                            (3.2)

and have to find the value of k such that the proportion of well-classified data is the same in each distribution.
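   A sketch of the two rules (3.1) and (3.2), using fitted component densities and a simple grid search on k for the equi-repartition variant (the grid bounds are an arbitrary assumption), is given below; the labels of the historical data are needed to count well-classified observations.

```python
import numpy as np
from scipy.stats import norm

def classify(x, f1, f2, k=1.0):
    """Rule (3.2): assign x_i to component 1 when f1(x_i)/f2(x_i) > k (rule (3.1) is k = 1)."""
    return f1(x) / f2(x) > k

def balanced_k(x, labels, f1, f2, k_grid=np.logspace(-3, 3, 2001)):
    """Search for the k that equalizes the proportions of well-classified data
    in the two components (labels give the known origin of each historical datum)."""
    best_k, best_gap = 1.0, np.inf
    for k in k_grid:
        assigned1 = classify(x, f1, f2, k)
        rate1 = np.mean(assigned1[labels == 1])        # well-classified in component 1
        rate2 = np.mean(~assigned1[labels == 2])       # well-classified in component 2
        if abs(rate1 - rate2) < best_gap:
            best_k, best_gap = k, abs(rate1 - rate2)
    return best_k

# Illustrative component densities from a fitted mixture (parameter values are assumptions)
f1 = lambda x: norm.pdf(x, loc=10, scale=20)
f2 = lambda x: norm.pdf(x, loc=50, scale=10)
```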

3.3     Study of the Rules
3.3.1   Mixture of Two Well-Separated Normals
We generate 10 sets of data, to be used as historical data, from the fitted mixtures obtained in 2.5.2
for different p, and we classify them according to the two rules given in 3.2. The averages obtained over the
10 sets are given in Table 3.
    There are logically more well-classified data in the component with the larger weight, and if one of the two
distributions is too small, classification gives very bad results for it and equi-repartition is impossible.
    Conclusion: The two rules of classification give results which may seem quite bad. Yet, as the fitting
is good, these are the best results we can expect.



Table 3: Averages of well-classified data for a two-component normal mixture fitted to data generated from
a mixture with µ1 = 10, σ1 = 20, µ2 = 50, σ2 = 10 and different p over 10 set of n = 1000 data

               p         χ²                k = 1: f1    f2                  Equi-repartition
               0.5       χ² = 17.394a     85.06%       97.36%              k = 0.29262    91.24%
               0.3       χ² = 18.636a     88.5%        95.74%              k = 0.49181    91.33%
               0.1       χ² = 20.461a     89.2%        95.81%              k = 0.54695    92%
               0.05      χ² = 19.351a     88.8%        95.96%              k = 0.47957    92.2%
               0.01      χ² = 24.125a     87.37%       16.36%              equi-repartition impossible
                      a: significant at 5%


3.3.2   Mixture of Two Close Normals
We now generate 10 sets of data, to be used as historical data, from the fitted mixtures obtained in
2.5.3 for different p, and we classify them according to the two rules given in 3.2. The averages obtained over
the 10 sets are given in Table 4.


Table 4: Averages of well-classified data for a two-component normal mixture fitted to data generated from
a mixture with µ1 = 10, σ1 = 50, µ2 = 12, σ2 = 55 and different p over 10 set of n = 1000 data

               p         χ²                k = 1: f1    f2                  Equi-repartition
               0.5       χ² = 33.721b     58.8%        42.4%               k = 1.3667    50.2%
                         χ² = 25.988b      0.8%        99.2%               equi-repartition impossible
               0.3       χ² = 31.315b     85.7%        16.143%             equi-repartition impossible
                         χ² = 31.398b     35%          67.286%             equi-repartition impossible
               0.1       χ² = 26.560b     98%          2.6667%             equi-repartition impossible
                         χ² = 32.712b     49%          50.778%             k = 1.0006    49.98%
               0.05      χ² = 25.284b     96%          2.3158%             equi-repartition impossible
                         χ² = 31.979b     48%          44.737%             k = 1.0856    50.65%
                      b: significant at 10%

    The results are uneven, depending on the fitting. It is clear that we can reject the fittings giving a very
small proportion of well-classified data for one of the distributions, even if they give the smallest χ². For the
remaining fittings, the classification is almost evenly split, and only half of the data are well-classified.
    Conclusion: We can reject some fittings on the basis of classification of historical data. For the best
fittings, the first rule gives some bad results, which can be partly explained by a poor fitting and the shape
of the two distributions.

3.4     Application to Hemochromatosis
3.4.1   Case Study
On the basis of blood test results, we wish to know whether new patients are unaffected by hemochromatosis or
must be submitted to other specific tests to determine more precisely whether they are heterozygous or homozygous.
    Distributions of transferrin saturation values for 1325 white men and 1547 nonpregnant white women 20
to 74 years of age have been selected from the unweighted second National Health and Nutrition Examination
Survey (NHANES II) sample data. We generated data from the estimates found by McLaren et al. (1995) for
these distributions; the conjugate method of moments and MLE will be applied to the two data sets for
parameter estimation, and the χ² test will check the goodness-of-fit of each fitting. If the fit is good, we will
apply the rules studied in section 3.2 to generated data used as historical data. If the classification
gives good results, we will then give a theoretical threshold to determine whether an individual is affected or not.




Figure 4: Plots of the distribution of transferrin saturation values for white men (first plot) and white women
(second plot), together with the results of the fitting (red lines) and their two components (green lines).


3.4.2           Results
The data, which were recorded in the form of v = 26 intervals, are displayed in Figure 4, along with the
plots of the densities of the normal mixture distributions fitted to them and their two components. The given
parameters and the estimates obtained with the method of moments and MLE used conjointly are presented
in Table 5, with the chi-square error χ2 defined in section 2.4 and the number of iterations k for the EM algorithm.


Table 5: Analysis of the distribution of transferrin saturation for white men and women compared to theo-
retical results
                                        Given parameters                  Estimated parameters
                        men         µ1 = 29.5    µ2 = 46.5      µ1 = 29.585     µ2 = 44.440    k = 2834
                                    σ1 = 7       σ2 = 7         σ1 = 7.1048     σ2 = 7.9719    χ2 = 9.446a
                                    p1 = 0.85                   p1 = 0.81829
                        women       µ1 = 26.7    µ2 = 43.4      µ1 = 26.8558    µ2 = 45.863    k = 224
                                    σ1 = 7       σ2 = 7         σ1 = 7.10285    σ2 = 6.5023    χ2 = 15.943a
                                    p1 = 0.87                   p1 = 0.892888

                                    a significant at 5%

   Both the comparison with the original estimates and the χ2 test give excellent results, especially for men.
   As the fitting is good, we generate n = 1000 data for each category and use them as historical data.
Classification with the first rule gives 84.05% and 89.63% of well-classified data for men, and 92.50% and
88.01% for women, for the first and second distribution respectively.
   Since the classification is acceptable, we can define a threshold xth to separate unaffected from affected
patients. We consider the following definition for xth:

                                                 f1 (xth ) = f2 (xth ),                                      (3.3)

which gives the crossing point of the two marginal distributions.
   The results are xth = 42.49% of transferrin saturation for men and xth = 41.84% for women, whereas the
thresholds used in practice are 47% and 44.7% respectively.
   Conclusion: our results are slightly more pessimistic (or at least more cautious), but they confirm that this
method can be used to provide a practical diagnostic rule in real-life cases.
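
    For reference, the threshold can be computed numerically from the fitted parameters. The short Python
sketch below (numpy and scipy are assumed to be available, and the helper diff is introduced only for this
illustration) uses the men's estimates of Table 5 and takes f1 and f2 in (3.3) to be the components weighted
by their estimated proportions, which is the convention that reproduces the reported value xth ≈ 42.5.

    # Sketch: crossing point (3.3) of the two fitted components, men's estimates of Table 5.
    # Assumption: f1 and f2 denote the proportion-weighted component densities.
    from scipy.stats import norm
    from scipy.optimize import brentq

    p1, mu1, s1 = 0.81829, 29.585, 7.1048    # first component (unaffected patients)
    p2, mu2, s2 = 1 - p1, 44.440, 7.9719     # second component

    def diff(x):
        # Weighted density of component 1 minus that of component 2.
        return p1 * norm.pdf(x, mu1, s1) - p2 * norm.pdf(x, mu2, s2)

    # The weighted densities cross between the two component means.
    x_th = brentq(diff, mu1, mu2)
    print(round(x_th, 2))    # about 42.5% transferrin saturation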

3.5         Perspective for a Future Work
What could be studied in future work is, instead of maximizing the proportion of well-classified data in each
marginal distribution, to maximize the sum of the two proportions. This new classification could have given a
different threshold in the previous study.
   Another possible study, which could find many applications especially in genetics, is the maximization of
the proportion of well-classified data for a subpopulation that is of specific interest.


Conclusion
This study has considered finite mixture distributions and some of the problems one might encounter when
applying them, using real-life data.
    Various techniques have been used to fit univariate mixtures, but they may not all work for all data sets;
as previously seen, the method of moments often fails, and the MLE may not converge, depending on the
initialization of the EM algorithm. Other techniques may be used if both methods fail.
    We presented a method of classifying historical data using two rules, and a concrete application of this
classification. The classification was performed in two steps: we first fit the mixture, and then, if the fitting
was good, we classified according to a particular rule. It may be noticed, however, that we only studied
particular cases, and no general study of the rules has been made.
    The techniques and mechanisms studied in this paper could be applied to a wide variety of fields, depending
on the subject matter of interest, but the data and the results of their treatment should be considered within
the context of their domain.




Appendices

                              Table 6: Chi-Square Probabilities Table
            (each entry is the value of χ2 exceeded with the probability given at the head
                            of its column, for df degrees of freedom)

 df    0.995    0.99     0.975    0.95     0.90     0.10      0.05      0.025     0.01      0.005
 1     —        —        0.001    0.004    0.016    2.706     3.841     5.024     6.635     7.87
 2     0.010    0.020    0.051    0.103    0.211    4.605     5.991     7.378     9.210     10.597
 3     0.072    0.115    0.216    0.352    0.584    6.251     7.815     9.348     11.345    12.838
 4     0.207    0.297    0.484    0.711    1.064    7.779     9.488     11.143    13.277    14.860
 5     0.412    0.554    0.831    1.145    1.610    9.236     11.070    12.833    15.086    16.750
 6     0.676    0.872    1.237    1.635    2.204    10.645    12.592    14.449    16.812    18.548
 7     0.989    1.239    1.690    2.167    2.833    12.017    14.067    16.013    18.475    20.278
 8     1.344    1.646    2.180    2.733    3.490    13.362    15.507    17.535    20.090    21.955
 9     1.735    2.088    2.700    3.325    4.168    14.684    16.919    19.023    21.666    23.589
 10    2.156    2.558    3.247    3.940    4.865    15.987    18.307    20.483    23.209    25.188
 11    2.603    3.053    3.816    4.575    5.578    17.275    19.675    21.920    24.725    26.757
 12    3.074    3.571    4.404    5.226    6.304    18.549    21.026    23.337    26.217    28.300
 13    3.565    4.107    5.009    5.892    7.042    19.812    22.362    24.736    27.688    29.819
 14    4.075    4.660    5.629    6.571    7.790    21.064    23.685    26.119    29.141    31.319
 15    4.601    5.229    6.262    7.261    8.547    22.307    24.996    27.488    30.578    32.801
 16    5.142    5.812    6.908    7.962    9.312    23.542    26.296    28.845    32.000    34.267
 17    5.697    6.408    7.564    8.672    10.085   24.769    27.587    30.191    33.409    35.718
 18    6.265    7.015    8.231    9.390    10.865   25.989    28.869    31.526    34.805    37.156
 19    6.844    7.633    8.907    10.117   11.651   27.204    30.144    32.852    36.191    38.582
 20    7.434    8.260    9.591    10.851   12.443   28.412    31.410    34.170    37.566    39.997
 21    8.034    8.897    10.283   11.591   13.240   29.615    32.671    35.479    38.932    41.401
 22    8.643    9.542    10.982   12.338   14.041   30.813    33.924    36.781    40.289    42.796
 23    9.260    10.196   11.689   13.091   14.848   32.007    35.172    38.076    41.638    44.181
 24    9.886    10.856   12.401   13.848   15.659   33.196    36.415    39.364    42.980    45.559
 25    10.520   11.524   13.120   14.611   16.473   34.382    37.652    40.646    44.314    46.928
 26    11.160   12.198   13.844   15.379   17.292   35.563    38.885    41.923    45.642    48.290
 27    11.808   12.879   14.573   16.151   18.114   36.741    40.113    43.195    46.963    49.645
 28    12.461   13.565   15.308   16.928   18.939   37.916    41.337    44.461    48.278    50.993
 29    13.121   14.256   16.047   17.708   19.768   39.087    42.557    45.722    49.588    52.336
 30    13.787   14.953   16.791   18.493   20.599   40.256    43.773    46.979    50.892    53.672
 40    20.707   22.164   24.433   26.509   29.051   51.805    55.758    59.342    63.691    66.766
 50    27.991   29.707   32.357   34.764   37.689   63.167    67.505    71.420    76.154    79.490
 60    35.534   37.485   40.482   43.188   46.459   74.397    79.082    83.298    88.379    91.952
 70    43.275   45.442   48.758   51.739   55.329   85.527    90.531    95.023    100.425   104.215
 80    51.172   53.540   57.153   60.391   64.278   96.578    101.879   106.629   112.329   116.321
 90    59.196   61.754   65.647   69.126   73.291   107.565   113.145   118.136   124.116   128.299
 100   67.328   70.065   74.222   77.929   82.358   118.498   124.342   129.561   135.807   140.169
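
    Any entry of Table 6 can be recovered numerically, since the column heading is the probability of exceeding
the tabulated value. A minimal Python sketch, assuming scipy is available:

    # Sketch: reproduce one row of the chi-square table (upper-tail critical values).
    from scipy.stats import chi2

    df = 25
    for p in (0.995, 0.99, 0.975, 0.95, 0.90, 0.10, 0.05, 0.025, 0.01, 0.005):
        # Column p of Table 6 is the value exceeded with probability p for df degrees of freedom.
        print(p, round(chi2.ppf(1 - p, df), 3))
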
Algorithm 1 : Method of moments for c-component univariate mixtures of exponential, binomial and
Poisson distributions
  V0 = x̄

  for s = 1 : 2c − 1 do

    if Mixture of exponential then
      Vs(x) = x^s / Γ(s + 1)
    end if

    if Mixture of binomial then
      Vs(x) = x(x − 1) · · · (x − s + 1) / [N(N − 1) · · · (N − s + 1)]
    end if

    if Mixture of Poisson then
      Vs(x) = x(x − 1) · · · (x − s + 1)
    end if

  end for

  for i = 0 : c − 1 do
    Wi = Vc+i

    for j = 1 : c do
      Mij = Vi+j
    end for

  end for

  Solve M D = W
  θ = roots(x^c + Dc−1 x^{c−1} + · · · + D1 x + D0)

  for i = 0 : 2c − 1 do

    for j = 1 : c do
      Θij = θj^i
    end for

  end for

  Solve Θ P = V




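    The following Python sketch illustrates Algorithm 1 for a two-component Poisson mixture (c = 2) on
artificial data. numpy is assumed; the data and true parameter values are invented for the example; and the
linear system is written out explicitly for c = 2 with the convention V0 = 1 (zeroth factorial moment), so the
indexing is slightly more compact than in the pseudocode above.

    # Sketch: method of moments for a two-component Poisson mixture (c = 2),
    # following the structure of Algorithm 1, on artificial data.
    import numpy as np

    rng = np.random.default_rng(0)
    n, p_true, th1_true, th2_true = 5000, 0.3, 2.0, 9.0
    z = rng.random(n) < p_true
    x = np.where(z, rng.poisson(th1_true, n), rng.poisson(th2_true, n)).astype(float)

    def fact_moment(x, s):
        # Sample factorial moment of order s: estimates sum_j p_j * theta_j^s for a Poisson mixture.
        v = np.ones_like(x)
        for r in range(s):
            v *= (x - r)
        return v.mean()

    V = [1.0] + [fact_moment(x, s) for s in range(1, 4)]     # V_0, ..., V_3 (here V_0 = 1)

    # Solve M D = W for the coefficients of the monic polynomial whose roots are theta_j:
    #   V_{s+2} + D1 * V_{s+1} + D0 * V_s = 0   for s = 0, 1.
    M = np.array([[V[1], V[0]],
                  [V[2], V[1]]])
    W = -np.array([V[2], V[3]])
    D1, D0 = np.linalg.solve(M, W)
    theta = np.sort(np.roots([1.0, D1, D0]).real)            # component rates theta_1 < theta_2

    # Mixing proportion from the first moment: p * theta_1 + (1 - p) * theta_2 = V_1.
    p_hat = (V[1] - theta[1]) / (theta[0] - theta[1])
    print(theta, p_hat)                                      # close to (2, 9) and 0.3
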
Algorithm 2 : Method of moments for two-component univariate normal mixtures
  V0 = x̄

  for s = 2 : 5 do
    Vs = (1/n) Σ_{1≤i≤n} (xi − x̄)^s
  end for

  k4 = V4 − 3V2²
  k5 = V5 − 10V2V3

  a0 = −24V3⁶
  a1 = −96V3⁴k4
  a2 = −(63V3²k4² + 72V3³k5)
  a3 = 288V3⁴ − 108V3k4k5 + 27k4³
  a4 = 444V3²k4 − 18k5²
  a5 = 90k4² + 72V3k5
  a6 = 36V3²
  a7 = 84k4
  a8 = 0
  a9 = 24

  U = roots(Σ_{0≤i≤9} ai u^i)

  for i = 1 : 9 do

    if Ui < 0 then
      ŵ = (−8V3Ui³ + 3k5Ui² + 6V3k4Ui + 2V3³) / (2Ui³ + 3k4Ui + 4V3²)
      δ = roots(x² − (ŵ/Ui) x + Ui)

      if δ1 and δ2 are real then
        µ1 = x̄ + δ1
        µ2 = x̄ + δ2
        σ1² = (1/3)δ1(2ŵ/Ui − V3/Ui) + V2 − δ1²
        σ2² = (1/3)δ2(2ŵ/Ui − V3/Ui) + V2 − δ2²
        p̃ = δ2 / (δ2 − δ1)

        if σ1² < 0 or σ2² < 0 then
          this negative root gives no acceptable results
        end if

      else
        no real root for the quadratic equation
      end if

    else
      Ui is not negative: this root of the nonic equation is discarded
    end if

  end for



Algorithm 3 : Maximum Likelihood Estimation for c-component univariate mixtures
  initialise p(0), θ(0), σ(0) and ρ

  while ρ > ε do

    for i = 1 : n do
      for j = 1 : c do
        h(i, j)(k) = fj(k)(xi) / g(k)(xi)
      end for
    end for

    for j = 1 : c do
      sj(k) = Σ_{1≤i≤n} h(i, j)(k)
      pj(k) = sj(k) / n
    end for
    ρp = ‖p(k) − p(k−1)‖∞

    for j = 1 : c do
      if Mixture of binomial then
        θj(k) = (1 / (m sj(k))) Σ_{1≤i≤n} h(i, j)(k) xi
      else
        θj(k) = (1 / sj(k)) Σ_{1≤i≤n} h(i, j)(k) xi
      end if
    end for
    ρθ = ‖θ(k) − θ(k−1)‖∞

    if Mixture of normal then
      for j = 1 : c do
        σj(k) = (1 / sj(k)) Σ_{1≤i≤n} h(i, j)(k) (xi − θj(k))²
      end for
      ρσ = ‖σ(k) − σ(k−1)‖∞
    end if

    ρ = max(ρp, ρθ, ρσ)

  end while
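
    A compact numerical version of Algorithm 3 for normal mixtures is sketched below (numpy and scipy are
assumed, and the function name is ours). The responsibilities h(i, j) are taken as the usual EM weights
pj fj(xi)/g(xi) with g = Σj pj fj, and the stopping rule mirrors ρ = max(ρp, ρθ, ρσ); starting values would
typically come from the method of moments, as in the report.

    # Sketch: EM iterations for a c-component univariate normal mixture (Algorithm 3).
    import numpy as np
    from scipy.stats import norm

    def em_normal_mixture(x, p, mu, sigma, eps=1e-6, max_iter=5000):
        x = np.asarray(x, dtype=float)
        p, mu, sigma = (np.asarray(a, dtype=float) for a in (p, mu, sigma))
        for k in range(max_iter):
            # E-step: responsibilities h[i, j] = p_j f_j(x_i) / g(x_i).
            f = p * norm.pdf(x[:, None], mu, sigma)          # shape (n, c)
            h = f / f.sum(axis=1, keepdims=True)
            # M-step: update proportions, means and standard deviations.
            s = h.sum(axis=0)
            p_new = s / len(x)
            mu_new = (h * x[:, None]).sum(axis=0) / s
            sigma_new = np.sqrt((h * (x[:, None] - mu_new) ** 2).sum(axis=0) / s)
            # Stopping rule: sup-norm change over all parameters.
            rho = max(np.abs(p_new - p).max(),
                      np.abs(mu_new - mu).max(),
                      np.abs(sigma_new - sigma).max())
            p, mu, sigma = p_new, mu_new, sigma_new
            if rho < eps:
                break
        return p, mu, sigma, k + 1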




References
 [1] A.C. Cohen (1967). Estimation in mixtures of two normal distributions. Technometrics 9, 15-28.
 [2] N.E. Day (1969). Estimating the components of a mixture of normal distributions. Biometrika 56,
     463-474.
 [3] A.P. Dempster, N.M. Laird and D.B. Rubin (1977). Maximum likelihood estimation from incomplete
     data via the EM algorithm (with discussions). Journal of the Royal Statistical Society B 39, 1-38.
 [4] R.W. Doerge, Z.B. Zeng and B.S. Weir (1997). Statistical issues in the search for genes affecting
     quantitative traits in experimental populations. Statistical Science 12, 195-219.
 [5] B.S. Everitt and D.J. Hand (1981). Finite Mixture Distributions. Chapman and Hall, London.
 [6] C.H. Kao and Z.B Zeng (1997). General formulas for obtaining the MLEs and the asymptotic variance-
     covariance matrix in mapping quantitative trait loci when using the EM algorithm. Biometrics 53,
     653-665.
 [7] G.J. McLachlan and D. Peel (2000). Finite Mixture Models. Wiley, New York.
 [8] C.E. McLaren, V.R. Gordeuk, A.C. Looker, V. Hasselblad, C.Q. Edwards, L.M. Griffen, J.P.
     Kushner, and G.M. Brittenham (1995). Prevalence of heterozygotes for hemochromatosis in the white
     population of the United States. Blood 86, 2021-2027.
 [9] C.E. McLaren, G.J. McLachlan, J. W. Halliday, S.I. Webbs, B.A. Leggett, E.C. Janzwinska,
     D.H.G. Crawford, V.R. Gordeuk, G.D. McLaren, and L.W. Powell (1998). The distribution of the
     transferrin saturation and hereditary haemochromatosis in Australians. Gastroenterology 114, 543-549.
[10] K. Pearson (1894). Contributions to the mathematical theory of evolution. Philosophical
     Transactions of the Royal Society of London A 185, 71-110.

[11] N.J. Schork, D.B. Allison and B. Thiel (1996). Mixture distributions in human genetics. Statistical
     Methods in Medical Research 5, 155-178.
[12] W. Seidel, K. Mosler and M. Akel (2000). A cautionary note on likelihood ratio tests in mixture
     models. Annals of the Institute of Statistical Mathematics.
[13] W.Y. Tan and W.C. Chang (1972). Some comparisons of the method of moments and the method of
     maximum likelihood in estimating parameters of a mixture of two normal densities. Journal of the
     American Statistical Association 67, 702-708.
[14] D.M. Titterington (1996). Mixture distributions (update). Encyclopedia of Statistical Sciences,
     Vol. 5, pp. 399-407.
[15] D.M. Titterington, A.F.M. Smith and U.E. Makov (1985). Statistical Analysis of Finite Mixture
     Distributions. Wiley, Chichester.

[16] W.F.R. Weldon (1892). Certain correlated variations in Crangon vulgaris. Proceedings of the Royal
     Society of London 51, 2-21.
[17] W.F.R. Weldon (1893). On certain correlated variations in Carcinus moenas. Proceedings of the Royal
     Society of London 54, 318-329.
[18] S.J. Yakowitz and J.D. Spragins (1968). On the identifiability of finite mixtures. Annals of Mathe-
     matical Statistics, 39, 209-214.



