No Slide Title by s60fx45Y

VIEWS: 0 PAGES: 62

									1
   Dirichlet process mixtures are active research areas
   Dirichlet mixtures are it!

   The flexibility of DPM models supported its huge popularity in wide variety
    of areas of application.

   DPM models are general and can be argued to have less structure.

   Double Dirichlet Process Mixtures add a degree of structure, possibly at
    the expense of some degree of flexibility, but possibly with better
    interpretability in some cases

   We discuss applications (and limitations) of these semiparametric double
    mixtures

   We compare fit-prediction duality with competing models



                                                                               2
       Other DP extensions


   Double Dirichlet process mixtures are a subclass of
    dependent Dirichlet Process mixtures (MacEachern 1999,……)

   Double DP mixture are different from Hierarchical Dirichlet
    Processes (The et al. 2006 )

   Double DPM is simply independent DPMS




                                                                  3
       Motivating Example 1
   Luminex measurements on two biomarker proteins from
    n=156 Patients
      IL-1β protein

      C-reactive protein



   The biological effects of these two proteins are thought to
    be not (totally) overlapping.


      y1i      1i  
       ~ N 2   ,  i , i  1,...,n
     y          
      2i       2i  

                                                                  4
          Two Biomarkers (y1 and y2)
         y1i      1i  
          ~ N 2   ,  i , i  1,...,n
        y          
         2i       2i  

         Usual DP mixture of normals (Ferguson 1983,…..)
        ( i , i ) i.i.d. ~ G, i  1,...,n
                     G ~ DP( , G0 )
                    G0   ~ N 2 ( 0 ,  0 )   ( ~ Inv.Wishart)

Questions
 Should we model the two biomarkers jointly?


   Should we cluster the patients based on both biomarkers jointly?

   The biomarkers may operate somewhat independently.
                                                                       5
       Double DP mixtures

        y1i      1i      12i                     1i 2i  
         ~ N 2   ,  i  
       y                                                         , i  1,...,n
         2i 
                    2i                                2i  
                                                             2
                                                                

       ( 1i ,  12i ) i.i.d. ~ G1 , i  1,..., n
                       G1 ~ DP (1 , G01 )
                                                   
                     G01  1 ~ N (  01 ,  01 )  ( 12 ~ Inv. Gamma )
                                             2




       (  2i ,  2i ) i.i.d. ~ G2 , i  1,..., n
                  2


                      G2 ~ DP ( 2 , G0 2 )
                                                   
                     G02   2 ~ N (  02 ,  02 )  ( 2 ~ Inv. Gamma )
                                              2         2




   Equicorrelation – corr(y1i, y2i) are assumed to be the same for all i=1,…,n

   Clustering based on biomarker 1 and based on biomarker 2 can be
    different
                                                                                     6
       Motivating Example 2: Interrater Agreement

   Agreement between 2 Raters (Melia and Diener-West 1994)

   Each rater provides an ordinal rating on a scale of 1-5 (lowest to
    highest invasion)of the extent to which tumor has invaded the
    eye,n=885



                                   Rater 1
                      1      2      3        4    5
         R     1      291    74     1        1    1
         a     2      186    256    7        7    3
         t     3      2      4      0        2    0
         e     4      3      10     1        14   2
         r     5      1      7      1        8    3
         2
                                                                         7
       Interrater agreement
   Kottas, Muller, Quintana (2005) analyzed these data using a
    flexible DP mixture of Bivariate probit ordinal model which
    modeled the unstructured joint probabilities

        prob(Rater 1=i and Rater2 = j), i=1,…,5, j=1,…,5

   One way to quantify interrrater agrrement is to measure
    departure from the structured model of independence

   We consider a (mixture of) Double DP mixtures model here
    which provides separate DP structures for the two raters. We
    then measure ``agreement’’ from this model.


                                                                   8
           Motivating Example 3
       Mixed model for longitudinal data
         yij  X ij   Z ij bi  ij , or
         yi  ( yi1 ,..., yip ) ~ N ( X i   Z i bi , i )

        It is common to assume (Bush and MacEachern 1996)
           bi ~ G, G ~ DP

        Modeling the error covariance i or the error variance
         (if i =diag(2i)) extends the normal distribution
         assumption to normal scale mixtures (t, Logistic,…)

             i2 ~ G , G ~ DP
                                                                  9
       Putting the two together
   One way to combine these two structures is

      (bi ,  i2 ) ~ G , G ~ DP

   Do we expect the random effects bi appearing in the
    modeling the mean and the error variances to cluster
    similarly?
   The error variance model often is used to extend the
    distributional assumption.

    bi ~ G1 , G1 ~ DP(1 , G01 )
     i2 ~ G2 , G2 ~ DP( 2 , G02 )
                                                           10
        Double DPM

   I will discuss
       Fitting
       Applicability
       Flexibility
       Limitations
    of such double semiparametric mixtures

   I will also compare these models with usual DP
    models via predictive model comparison criteria



                                                      11
       Dirichlet process
   Dirichlet Process is a probability measure on the space of
    distributions (probability measures) G.

   G ~ Dirichlet Process (G0), where G0 is a probability

   Dirichlet Process assigns positive mass to every open set
    of probabilities on support(G0)

   Conjugacy: Y1,…., Yn ~ i.i.d. G, (G) = DP( G0)
    Then Posterior (G|Y) ~ DP( G0 + nFn) where Fn is the
    empirical distn.
   Polya Urn Scheme



                                                                 12
         Stick breaking and discreteness
    G~ DP( G0) implies G is almost surely discrete
           
    G   q j{ j } with prob.1 (Tiwariand Sethuraman1982)
           j 1



    1 , 2 ,....~ i..i.d . G0

    Stick - breaking represntation
    q1  w1 ,               w1 ~ Beta(1,  
    q2  w2 (1  w1 ),           w2 ~ Beta(1,  
    q3  w3 (1  w2 )(1  w1 ), w3 ~ Beta(1,  
                                                              13
       Bayes estimate from DP

   The discrete nature of a random G from a DP leads to some
    disturbing features, such as this result from Diaconis and
    Freedman (1986)

   Location model
     yi = + i, i=1,…n
       has prior (), such as a normal prior

      1,…, n ~ i.i.d. G

             G ~ DP(G0) - symmetrized
             G0 = Cauchy or t-distn

   Then the posterior mean is an inconsistent estimate of 

                                                                 14
       Dirichlet process mixtures (DPM)
      Yi ~ f ( yi | i , , xi ), such as N (i , ), i  1,...n
               f is a known parametric distn.
               xi are possible covariates.  other parms


      1 ,...,n ~ i.id . G
              G ~ DP(G0 )



   If we marginalize over i, we obtain a semiparametric mixture

        Yi ~  f ( yi |  , , xi )dG( )
    where the mixing distribution G is random and follows DP(G0)

                                                                    15
        DPM - clusters

     Yi ~ f ( yi | i , , xi )
     1 ,...,n ~ i.id. G, G ~ DP(G0 )

   Since G is almost surely discrete, 1,…,n form clusters

        1= 5 = 8               1unique
        2= 3 = 4= 6= 7       2unique

        etc.

   The number of clusters, and the clusters themselves, are
    random.
                                                               16
       DPM – MCMC
   The Polya urn/marginalized sampler (Escobar 1994, Escobar
    & West 1995) samples i one-at-a-time from

        (i | -i, data)

   Improvements, known as collapsed samplers, are proposed in
    MacEachern (1994, 1998) where, instead of sampling i , only
    the cluster membership of i are sampled.


   For non-conjugate DPM (sampling density f(yi |i ) and base
    measure G0 are not conjugate), various algorithms have been
    proposed.
                                                                17
       Finite truncation and Blocked Gibbs
    Yi ~ f ( yi | i , , xi )
                         M                          
    1 ,...,n ~ G   q j{ } instead of DP - G   q j{
                                 j                             j}
                         j 1                       j 1



   With this finite truncation, it is now a finite
    mixture model with stick-breaking structure on qj

   (1,....,n) and (q1,....,qM) can be updated in
    blocks (instead of one-at-time as in Polya Urn
    sampler) which may provide better mixing


                                                                    18
        Comments
   In each iteration, the Polya urn/marginal sampler cycles thru
    each observation, and for each, assigns its membership
    among a new and existing clusters.


   The Poly urn sampler is also not straightforward to implement
    in non-linear (non-conjugate) problems or when the sample
    size n may not be fixed.


   For the blocked sampler, on the other hand, the choice of the
    truncation M is not well understood.


                                                                    19
      Model comparison in DPM models
   Basu and Chib (2003)
    developed Bayes factor/
    marginal likelihood           Log-Marginal and log-Bayes
                                  factor for Longitudinal Aids trial
    computation method for        (n=467) with random coeffs
    DPM.                          having distn G

                                              DPM     Student-t   Normal
   This provided a framework     DPM         -3477   (76)        (62)
    for quantitative comparison   Student-t
                                  Normal
                                                      -3553       (-14)
                                                                  -3539
    of DPM with competing
    parametric and
    semi/nonparametric models.



                                                                       20
       Marginal likelihood of DPM
   Based on the Basic marginal identity (Chib 1995)
    log-posterior()=log-likelihood() + log-prior() - log-marginal


    log-marginal = log-likelihood(*) + log-prior(*) – log-posterior(*)

   The posterior ordinate of DPM is evaluated via prequential
    conditioning as in Chib (1995)
                    Log-Marginal and log-Bayes factor
   The likelihood ordinate of DPM is evaluated from a
                         DPM Student-t Normal
                                       sampler.
    (collapsed) sequential importance (62)
                  DPM    -3477 (76)
                    Student-t         -3553     (-14)
                    Normal                      -3539


                                                                            21
22
23
24
     Double Dirichlet process mixtures (DDPM)
     Yi ~ f ( yi | i , i , , xi ), such as N (i , i ), i  1,...n
              f is a known parametric distn.
              xi are possible covariates.   other parms


     1 ,...,n ~ i.id . G             1 ,..., n ~ i.id . G
            G ~ DP(  G 0 )                   G ~ DP( G 0 )

   Marginalization obtains a double semiparametric mixture

       Yi ~   f ( yi |  , , , xi )dG ( ) dG ( )

    where the mixing distributions G and G are random
                                                                         25
Two Biomarkers case: y1 and y2

 y1i      1i      12i                           1i 2i  
  ~ N 2   ,  i  
y                                                               , i  1,...,n
 2i        2i                                      2i  
                                                            2
                                                               

 ( 1i ,  12i ) i.i.d. ~ G1 , i  1,..., n
                 G1 ~ DP (1 , G01 )
                                              
               G01  1 ~ N (  01 ,  01 )  ( 12 ~ Inv. Gamma )
                                       2




  (  2i ,  2i ) i.i.d. ~ G2 , i  1,..., n
             2


                  G2 ~ DP ( 2 , G0 2 )
                                                  
                G02   2 ~ N (  02 ,  02 )  ( 2 ~ Inv. Gamma )
                                         2         2




                                                                                    26
        A simpler model: normal means only

     yi1      i  
      ~ N 2   ,  , i  1,..n
    y          
     i2      i  



         
1 ,....., n ~ i.i.d . G ,             
                               1 ,....., n ~ i.i.d . G ,  has a
    G ~ DP( G 0 )         G ~ DP( G 0 )          Wishart prior


    We generate n=50 (i,i) means and then (yi1,yi2)
     observations from this Double-DPM model
                                                                         27
                 Double DPM

                           mu                                   Observations y




                                                     2
      2




                                                     0
      0




                                                     -2
      -2




                                             y[,2]
psi




                                                     -4
      -4




                                                     -6
      -6




                                                     -8
      -8




                                                     -10
      -10




            -4    -2   0         2   4   6                 -5    0               5

                           phi                                       y[,1]




                                                                                     28
           Plot of y and mu: Clusters in different vector
          Single DPM in the bivariate meansymbols                           Double DPM Clusters components
                                                                          Plot of y and mu: in mean in different symbols
        2




                                                       y                                                          y




                                                                    2
                                                    mu                                                           mu
        0




                                                                    0
        -2




                                                                    -2
y[,2]




                                                            y[,2]
        -4




                                                                    -4
        -6




                                                                    -6
        -8




                                                                    -8
                                                                    -10
        -10




               -5          0            5         10                         -5            0             5
                                y[,1]                                                          y[,1]




                                                                                                                           29
       Model fitting
   We fitted the Double DPM and the Bivariate DPM models to
    these data.

   The Double DPM model can be fit by a two-stage Polya urn
    sampler or a two-stage blocked Gibbs sampler.

   “Collapsing” can become more difficult.

                        1 n
         Average MSEd   E post ( id  id ) 2 , d  1,2
                                            true

                        n i 1
                        1 n
         Average MADd   E post | id  id |, d  1,2
                                            true

                        n i 1
                                                               30
                    MSE           MAD
                   y1     y2     y1     y2
                     Data generated from
                         Double DP
Double DP         0.99 1.32 0.72 0.92
Bivariate DP in
                  1.02   4.29   0.83   1.73
(y1,y2)
                     Data generated from
                        Bivariate DP
Double DP         0.98 1.29 0.81 0.94
Bivariate DP in
                  0.98   1.08   0.75   0.81
(y1,y2)



                                              31
      Wallace (asymmetric) criterion for comparing
      two clusters/partitions
    Let S be the number of mean pairs which are in the same cluster
     in a MCMC posterior draw and also in the true clustering.


    Let nk, k=1,..K be the number of means in cluster Ck in the MCMC
     draw.


    Then the Wallace asymmetric criterion for comparing these two
     clusters is

                                                          Average Wallace
           S             Data generated from       Double DP       Bivariate DP
    k nk (nk  1) / 2   Double DP
                                                    y1     y2      y1         y2
                                                   0.89   0.42    0.66       0.48
                         Bivariate DP in (y1,y2)   0.72   0.24    0.62      0.62




                                                                                    32
 Measurements on two biomarker proteins by Luminex
 panels

• Frozen parafin embedded tissues, pre and post surgery
         22000

• Luminex panel
         20000




• Nodal involvement
         18000
   CRP

         16000
         14000




                 10   20   30   40   50   60   70   80
                                                          33
           Two biomarker proteins

     The bivariate DPM
                                                 ( i , i ) i.i.d. ~ G, i  1,...,n
       y1i 
        ~ N 2   1i ,  i , i  1,...,n
      y                                                     G ~ DP( , G0 )
       2i       2i                                         G0   ~ N 2 ( 0 ,  0 )   ( ~ Inv.Wishart)


     vs the Double DPM
      y1i      1i      12i              1i 2i  
       ~ N 2   ,  i  
     y                                                  , i  1,...,n
      2i        2i                         2i  
                                                    2
                                                       

( 1i ,  12i ) i.i.d. ~ G1 , i  1,..., n                   (  2i ,  2i ) i.i.d. ~ G2 , i  1,..., n
                                                                        2


    G1 ~ DP (1 , G01 )                                          G2 ~ DP ( 2 , G0 2 )
           
    G01  1 ~ N (  01 ,  01 )  ( 12 ~ Inv.Gamma )
                            2
                                                                                        2
                                                                                                 
                                                                 G02   2 ~ N (  02 ,  02 )  ( 2 ~ Inv. Gamm
                                                                                                    2




                                                                                                                    34
                                                µpred
                          Double DP                                          Double DP




                                                              22000
             100
mu.pred[1]




                                                 mu.pred[2]
             50




                                                              18000
             0




                                                              14000
             -50




                   0   2000    6000     10000                         0   2000    6000     10000




                         Bivariate DP                                       Bivariate DP




                                                              22000
             100
mu.pred[1]




                                                 mu.pred[2]
             50




                                                              18000
             0




                                                              14000
             -50




                   0   2000    6000     10000                         0   2000    6000     10000

                                                                                                   35
                       y.pred[1]                                              y.pred[1]

              0.00      0.04        0.08                             0.00 0.04 0.08       0.12




     0
                                                            0




     10
                                                            10




     20
                                                            20
                                                                                                    Double DP




                                             Bivariate DP




     30
                                                            30




     40
                                                            40




                       y.pred[1]                                              y.pred[1]
                                                                                                                ypred




             0.00000   0.00015     0.00030                          0.00000   0.00015     0.00030




     14000
                                                            14000




     18000
                                                            18000
                                                                                                    Double DP




                                             Bivariate DP




     22000
                                                            22000




36
                                                                      ypred

                                          Double DP                                                                Bivariate DP


0.030



 0.025
                                                                              0.010

    0.020
densit y




       0.015
                           25000                                                                        25000




                                                                              densit y
                                                                                     0.005
           0.010       20000                                                                        20000
                         [ 2]




                                                                                                      [ 2]
                     r ed




                                                                                                  r ed
                   15000                                                                     15000
                   y.p




                                                                                              y.p
           0.005


            0.000
             10000                                                                       0.000
                                                                                          10000
                 0              10   20       30       40   50   60                           0              10   20      30       40   50   60
                                           y.pred[1]                                                                   y.pred[1]




                                                                                                                                                  37
                    log CPO = log f(yi| y-i)
          -8
          -10




                         LPML = log f(yi| y-i)
Log CPO

          -12




                          Double DP = -1498.67
                         Bivariate DP= -1533.01
          -14




                                                    Double DP
                                                    Bivariate DP


                0          50                 100                  150

                                Observation                              38
       Model comparison

   I prefer to use marginal likelihood/ Bayes factor for model
    comparison.

   The DIC (Deviance Information Criterion) , as proposed in
    Spiegelhalter et al. (2002) can be problematic for missing
    data/random-effects/mixture models.


   Celeux et al. (2006) proposed many different DICs for missing
    data models




                                                                  39
          DIC3
        I have earlier considered DIC3 (Celeux et al. 2006,
         Richardson 2002) in missing data and random effects
         models which is based on the observed likelihood

            DIC3  4 E log p( yobs |  ) | yobs   2 log E  p( yobs |  ) | yobs ,


            where p( yobs |  )   p( yobs ,  ,  |  ) d (   )


       The integration over the latent parameters often has to be
        obtained numerically.

       This is difficult in the present problem

                                                                                            40
        DIC9

   I am proposing to use DIC9 which is similar to DIC3
    but is based on the conditional likelihood

          p( yobs |  ,  , )

DIC 9  4 E ,  , log p ( yobs |  ,  , ) | yobs   2 log E ,  ,  p( yobs |  ,  , ) | yobs ,




                           DIC9
              Double DP    3018.3
              Bivariate DP 3050.3

                                                                                                              41
      Convergence rate results: Ghosal and Van Der Vaart (2001)

   Normal location mixtures

    Model: Yi ~ i.i.d. p(y) = (y-)dG(), i=1,…,n
                         G ~ DP(G0), G0 is Normal

    Truth:         p0(y) = (y-)dF()

   Ghosal and Van Der Vaart (2001): Under some regularity
    conditions,

        Hellinger distance (p, p0)  0 “almost surely”

                     at the rate of (log n)3/2/n

                                                             42
         Ghosal and Van Der Vaart (2001): results contd.

   Bivariate DP location-scale mixture of normals
    Yi ~ i.i.d. p(y) = (y-)dH(,),   i=1,…,n              H ~DP(H0)

   Ghosal and Van Der Vaart (2001): If H0 is Normal {a compactly supported
    distn}, then the convergence rate is
                                    (log n)7/2/n


   Double DP location-scale mixture of normals
    Yi ~ i.i.d. p(y) = (y-)dG() dG(),   i=1,…,n
    G ~DP(G0),            G ~DP(G0)


   Ghosal and Van Der Vaart (2001): If G0 is Normal, G0 is compactly supported
    and the true density
                  p0(y) = (y-)dF1() dF2() is also a double mixture, then
         Hellinger distance (p, p0)  0 at the rate of (log n)3/2/n


                                                                                    43
       Interrater data
   Agreement between 2 Raters (Melia and Diener-West 1994)

   Each rater provides an ordinal rating on a scale of 1-5 (lowest to
    highest invasion)of the extent to which tumor has invaded the
    eye,n=885



                                   Rater 1
                      1      2      3        4    5
         R     1      291    74     1        1    1
         a     2      186    256    7        7    3
         t     3      2      4      0        2    0
         e     4      3      10     1        14   2
         r     5      1      7      1        8    3
         2
                                                                         44
       DPM multivariate ordinal model
   Kottas, Muller and Quintana (2005)

     P (Rater1  j and Rater2  k for subject i)
      P( 1 j-1  Z i1   1 j ,  2 k -1  Z i 2   2 k )


            Z i1 
     Z i    i.i.d. ~  N 2 ( z |  , ) dG(  , )
           Z 
            i2 

     G ~ DP( G0 )

                                                               45
       Interrater agreement
   The objective is to measure agreement                       Rater 1

    between raters beyond what is                   1     2      3        4    5
    possible by chance.                     R   1   291   74     1        1    1
                                            a
                                            t   2   186   256    7        7    3
   This is often measured by departure
                                            e
    from independence, often specifically   r   3   2     4      0        2    0
    in the diagonals                        2   4   3     10     1        14   2

                                                5   1     7      1        8    3
   Polychoric correlation of the latent
    bivariate normal Z has been used as a
    measure of association.

   ………………… of the latent bivariate
    normal mixtures???


                                                                                   46
       Latent class model (Agresti & Lang 1993)

     C latent classes
      pijk  P(Rater1  j and Rater2  k for subject i)
           pcjk if subject i belongs to class c, c  1,...,C

    Ratings of the two raters within a class are independent
       pcjk  pc1 j pc 2 k                                             Rater 1

                                                           1     2      3        4    5

                                                  R   1    291   74     1        1    1
                                                  a
             C
    pijk   I (  i  c) pc1 j pc 2 k
                                                  t   2    186   256    7        7    3
                                                  e
                                                      3    2     4      0        2    0
            c 1                                  r
                                                  2   4    3     10     1        14   2

                                                      5    1     7      1        8    3

                                                                                          47
     Mixtures of Double DPMs
               C
      pijk   I (  i  c) pc1 j pc 2 k
              c 1


   For each latent class, we model pc1j and pc2k
    by two separate univariate ordinal probit
    DPM models

    p clj  P ( cl, j 1  Z cl   cl, j ), j  1,..5, l  1,2, c  1,..,C


    Z cli i.i.d. ~  N1 ( z |  ,  2 ) dGcl (  ,  2 )


    Gcl ~ DP( cl G0 cl )
                                                                               48
       Computational issue
               C
       pijk   I (  i  c) pc1 j pc 2 k
              c 1




   The ``sample size’’ nc in latent group c is not fixed. This
    causes problem for the polya-urn/marginal sampler which
    works with fixed sample size

   Do, Muller, Tang (2005) suggested a solution to this
    problem by jointly sampling the latent il =(il,il2) and the
    latent rating class membership i.



                                                                     49
    1             2              3             4              5
Estimated cell probabilities
 1
     .3288
     .3261
           .0836
           .0869
                   .0011
                   .0013
                           .0011
                           .0020
                                                                  .0011
                                                                  .0007
   (.2945,.3590) (.0696,.1078) (.0001,.0044) (.0003,.0056)      (0,.0027)
       .3286           .0821           .003         .0022         .0009
   (.3060,,.3524) (.0701,.09510) (.0008,.007) (.0007,.0047) (.0001,.0023)
       .2102           .2893          .0079         .0079         .0034
 2     .2135           .2826          .0082         .0069         .0031
   (.1858,.2428) (.2521,.3141) (.0031,.0152) (.0022,.0143) (.0007,.0075)
       .2098           .2853          .0076         0103          .0033
   (.1856,.2334) (.2700,.2997) (.0036,.0129) (..0074,.0138) (.0017,.0053)
       .0023           .0045             0          .0023           0
 3     .0023           .0055          .0016         .0022         .0008
    (.004,.0065)   (.0017,.0107) (.0003,.0038) (.004,.0062)    (.0,.0032)
       .0029            .006          .0003         .0015         .0004
   (.0009,.0068) (.0025,.0114)      (0,.0009)  (.0002,.0039)    (0,.0011)
       .0034           .0113          .0011         .0158         .0023
 4     .0042           .0102          .0023        ..0143         .0028
   (.0012,.0094) (.0042,.0187) (.0004,.006)     (.0066,.024) (.0006,.0069)
       .0031           .0124          .0016         .0127         .0033
    (.001,.0059)   (.0092,.0157) (.0002,.0044) (.0089,.0168) (.0014,.0057)
       .0011           .0079          .0011          .009         .0034
 5     .0012           .0071          .0019         .0083         .0039
   (.0001,.0041) (.0026,.0140) (.0003,.0054) (.0034,.0153) (.0009,.009)
       .0019           .0086          .0011         .0086         .0026
    (.0006,.004)   (.0055,.0129) (.0002,.0032) (.0053,.0118) (.0011,.0045)
                                                                             50
      Marginal probability estimates
                                                Rater A
                                       1     2     3     4     5
   Latent Group 1                  0.6691 .3246 .0036 .0014 .0012


                                                Rater B
                                       1     2     3     4     5
                                    0.9374 .0559 .0049 .0011 .0008




                         1        2        3        4             5
              1     0.6261 0.03859 0.003273 0.00068        0.000473
              2     0.3058 0.01675 0.001511 0.000326       0.000225
              3   0.003405 0.000194 2.38E-05 6.81E-06      5.06E-06
              4   0.001237 0.000121 4.37E-05 1.86E-05      1.79E-05
              5   0.000882 0.000212 7.31E-05 3.12E-05      3.10E-05

                                                                      51
     Marginal probability estimates

                                                 Rater A
   Latent Group 2                      1     2     3     4     5
                                     0.1562 .7139 .0188 .0661 .0451


                                                 Rater B
                                        1     2     3     4     5
                                     0.1432 .7432 .0226 .0706 .0205



                    1         2          3        4        5
         1    0.02196    0.1264   0.002716 0.003709 0.001382
         2     0.1107    0.5625     0.0138 0.02051 0.006414
         3   0.002445   0.01194   0.000555 0.003084 0.000764
         4   0.005113   0.02517   0.003294 0.02575 0.006747
         5   0.002968   0.01718   0.002219 0.01753 0.005212


                                                                      52
       Joint mean covariance modeling
   Trial with n=200 patients who had acute MI within 28 days of
    baseline and are depressed/low social support

   Underwent 6 months of usual care (control) or individual
    and/or group-based cognitive behavioral counseling
    (treatment).

   Response y = depression (Beck Depression Inventory)
    measured at 0,182,365,548, 913, 1278 days (but actually at
    irregular intervals)

   Covariate: Treatment, Family history, Age, Sex, BMI,……

   Intermittent missing response, missing covariate….


                                                                 53
                              Beck depression Inventory

                     0   10         20        30          40   50




              0
              200
              400
              600

 Visit Days
              800
              1000
              1200

54
         Model
       Model for the mean
                                         
          E ( yij )  ij  X ij    Z ij bi

                      
           Z ij bi  b01  b1 tij  b2i (tij  ti 2 ) 
                              i


        Model for the covariance

            yi  ( yi1 ,.., yi 6 ) ~ N 6 ( i , i )
       Pourahmadi (1999), Pourahmadi and Daniels (2002) use
        a Cholesky decomposition of the covariance which allows
        one to use log-linear model for the variances and “linear
        regression” for the off-diagonal terms
                                                                    55
        Modeling the covariance

   We assume
     i  diag ( i2 ,..,  i26 )




                                                 80
                   1




                                                 75
                                                 70
                       
log  ij  X ij    Z ij bi
       2




                                      Variance

                                                 65
                            2
Z ij bi  boi  b1i tij  b2i tij
                                                 60
                                                 55


                                                      1   2   3            4   5   6

                                                                  visits




                                                                                       56
         Mean and variance level random effects
                                                        
         bi  (boi , b1 , b2i ) and bi  (boi , b1i , b2i ), i  1,..n
                        i


       ? A joint DPM for the two random effects together which allows
        clustering at the patient level

           (bi , bi ) i.i.d ~ G  , , G  , ~ DP

       ? Or Double DPM, that is, independent DPM separately for the each
        of the two random effects which allows separate clustering at the
        mean and variance level

         bi i.i.d ~ G  , G  ~ DP 
         bi i.i.d ~ G , G ~ DP
   Most frequentist and parametric Bayesian analyses use the latter
    independence among the mean and variance level random effects.
                                                                             57
                                        Double DP                                                Bivariate DP


                -2200




                                                                                 -2200
log-likeliood




                                                                 log-likeliood
                -2300




                                                                                 -2300
                -2400




                                                                                 -2400
                              -7000          -5000    -3000                              -7000      -5000             -3000

                                         iterations                                                iterations
                                                              Log CPO
                                  -5
                                  -10
                        Log-CPO

                                  -15
                                  -20




                                                                                                   Double DP
                                  -25




                                                                                                   Bivariate DP

                                         0            50            100                      150                200

                                                              Observations
                                                                                                                              58
    Fixed effects estimates


                 Double DP           Bivariate DP          Normal
.slope    -1.12 (-2.39,-0.16)   -0.89 (-1.97,0.24) -0.26    (-.65,.28)
.change   -3.29 (-4.36,-1.77)   -2.98 (-4.4,-1.11) -4.057 (-4.72,-3.43)

.slope     0.18    (-.006,.38)  0.19 (.014,.382)     -0.16    (-0.32,0.05)
.quad     -0.005 (-0.027,0.021) -.006 (-.027,.013)   0.033   (0.009,0.051)




                                                                               59
Pseudo marginal likelihood


               log  f(yi| y-i)
  Double DP    -2495.290
  Bivariate DP -2503.916
  Normal       -2530.92




                                  60
       Summary

   Double DP mixtures may add a level of structure to mixture
    modeling with DP.

   They produce interesting “product-clustering”


   They are applicable to specific problems that may benefit
    from this structure




                                                                 61
basu@niu.edu


               62

								
To top