On the equivalence of prospective and retrospective likelihood
Document Sample


On the equivalence of prospective and retrospective
likelihood methods in case-control studies
By Ana-Maria Staicu
Department of Mathematics, University of Bristol, Bristol, BS8 1TW,
UK
A.Staicu@bristol.ac.uk
SUMMARY
Case-control studies are particulary relevant in the study of the odds ratio of disease.
Their analysis is based on the retrospective likelihood which involves many nuisance
parameters. In this work we address the problem of analyzing these studies by a
prospective model, which assumes one nuisance parameter in an unstratified design,
in both frequentist and Bayesian framework. We present new results on the priors
that can be assumed for the retrospective model parameters in order to give equivalent
Bayesian analysis to the one obtained from the prospective model with appropriate
prior. From a frequentist perspective we show the approximate equivalence of the Cox-
Reid modified profile likelihood for the log odds ratio parameter in the two models.
Some key words: Equivalence Bayesian analysis, Case-control studies; Retrospective
likelihood, Prospective likelihood, Cox-Reid modified profile likelihood.
1
1 Introduction
Case-control studies are a primary tool for the study of factors related to disease inci-
dence. They have the advantage that are cost effective and bring significant economies
in study duration, especially with rare disease settings. The most frequently utilized
design described in Miettinen (1976) uses a “retrospective” likelihood, based on the
probability of exposure given the disease status. If a logistic model is assumed for the
log odds ratio of disease, as is often the case, the likelihood involves a large number
of nuisance parameters, typically one for each exposure level. Analyzing the data by a
“prospective” likelihood based on the probability of disease given the exposure, which
involves only one nuisance parameter in an unstratified design, has been considered by
many other researchers. From a classical point of view, the prospective and retrospec-
tive likelihoods yield identical maximum likelihood estimate (Prentice & Pike, 1979)
and identical profile likelihood (Roeder et al., 1996; Murphy & van der Vaart, 2000;
Seaman & Richardson, 2004) for the log odds ratio parameter; thus same first order
inference for the parameter of interest. Nevertheless this inference may be misleading
if the sample size is not large. From a Bayesian point of view, the analysis equivalence
for the log odds ratio parameter has been shown for the first time in a recent work of
Seaman & Richardson (2004) by assuming specific priors for the nuisance parameters
involved.
In this work we extend the result on the Bayesian analysis equivalence of the
prospective and retrospective models to more general priors. We show that the pri-
ors proposed by Seaman & Richardson represent the unique choice of priors for the
nuisance parameters of the prospective and retrospective models correspondingly, only
when the model parameters are required to be independent. If parameter independence
is not requisite, we determine the conditions that a prior assumed for the retrospective
model parameters need to satisfy in order to produce same marginal posterior density
2
for the log odds ratio parameter as the prospective model. In general, priors meeting
these conditions might depend on the covariates levels; in fact for the examples given,
the depend on these covariates. Therefore these findings cannot be used when the
exposure distribution needs to be modeled.
From frequentist perspective we extend the equivalence of the profile likelihood to
that of the Cox-Reid modified profile likelihood to O(n−1 ), for the prospective and
retrospective models. The Cox-Reid likelihood was developed by Cox & Reid (1987)
as a standard method for correcting the profile likelihood in complex models; it has
been noticed to provide more reliable inference in settings both of small sample sizes
and of large number of nuisance parameters. Davison (1988) shows that for generalized
linear models, like case-control and cohort studies, accurate approximate conditional
inference for the parameter of interest uses the Cox-Reid modified profile likelihood
to second order. The present work establishes the equivalence of the approximate
conditional inference for the parameter of interest in the prospective and retrospective
models. This result agrees with the findings by Wang & Carroll (1999) concerning the
equivalence to O(n−1 ) of the saddlepoint approximations to the distributions of the
maximum likelihood estimator of the parameter of interest, in the two models.
The paper is organized as follows. In section 2 we introduce the case-control and co-
hort studies. Section 3 presents the results on the Bayesian equivalence of the prospec-
tive and retrospective analysis. In section 4 we outline the properties of a prior assumed
for the retrospective model parameters to produce equivalent Bayesian analysis of the
two models. As well we discuss the relation with Seaman & Richardson’s work. Section
5 gives the second order equivalence of the Cox-Reid modified profile likelihood needed
for approximate conditional inference. Finally, in section 6 we discuss the limitations
of the new approach and address problems for future research directions.
3
2 Framework
We consider a non-experimental study involving diseased and diseased-free individuals.
Denote by D the disease status of an individual with D = 0 indicating a disease-free
while D = 1 indicating a diseased individual. Let X be a vector that gives the values
of the exposure covariates for each subject in the study. We limit this work to discrete
exposure variables; otherwise as it is common in the literature (see Gustafson et al.,
2002) we treat each continuous exposure variable by assuming it arises from a discrete
one having the support equal to the unique observed values. Consider the support of X
be supp(X) = {z1 , . . . , zJ }, and let Y0j and Y1j designate the number of diseased-free
and diseased subjects respectively, having the exposure X = zj for all j = 1, . . . , J.
Case-control studies presume the diseased (cases) and non-diseased (controls) sub-
jects are identified first and then ”followed back” to ascertain their exposure levels. As
a result the model uses a retrospective likelihood which can be written as the product
of two independent multinomial distributions (Yd1 , . . . , YdJ ) ∼ Mu(Yd+ ; pd1 , . . . , pdJ ),
J
where Yd+ = j=1 Ydj , is the total number of diseased subjects, and the multinomial
probabilities are pdj = Pr(X = zj |D = d) for d = 0, 1. A common assumption for
both case-control and cohort studies is to take the log odds ratio of disease associated
with exposure X = x of form δ T (x − z1 ), where z1 is the baseline exposure, for sim-
plicity considered equal to 0. This assumption gives the following expression for the
probabilities pdj :
βj exp(dδ T zj )
pdj = J
,
l=1 βl exp(dδ T zl )
where βj is a nuisance parameter associated with exposure value zj , d = 0, 1 and j =
1, . . . , J which further results in the retrospective likelihood denoted by LMR (β, δ; y) in
Seaman & Richardson (2004), β = (β1 , . . . , βJ ). Note that in parameterization β the
4
model is nonidentifiable; that is, the same model can be obtained by different parame-
ter values (rβ, δ) for r > 0. For identifiability one may consider the reparameterization
J J
(µ, θ) instead, where µ = j=1 βj and θ = (θ1 . . . , θJ ) with θj = βj / j=1 βj ; alterna-
tively may assume βJ = 1. One main goal in analyzing case-control studies is to draw
inference on the log odds ratio parameter δ. The drawback with using the retrospective
likelihood is that it involves in addition to the parameter of interest, a J-dimensional
nuisance parameter, which becomes troubling with larger J. For this reason researchers
as well as practitioners are interested in alternative ways to deal with these studies.
Cohort studies presume the exposed subjects are identified first and then followed
for disease development. The natural likelihood is a prospective likelihood correspond-
ing to a product of J independent binomial distributions Y1j ∼ Bi(Y+j , pj ), where
Y+j = Y0j + Y1j is the total number of subjects with exposure level X = zj and the
binomial probability is pj = P r(D = 1|X = zj ) for j = 1, . . . , J. The same assumption
for the log odds ratio of disease associated with different exposure levels results in
binomial probabilities of the form
α exp(δ T zj )
pj = ,
1 + α exp(δ T zj )
where j = 1, . . . , J and α is a stratum parameter that may vary from stratum to
stratum in a stratified design. This gives further the prospective likelihood denoted by
LMP (α, δ; y) in the referenced paper.
For an untratified design, the prospective likelihood depends on the parameter of
interest δ and a scalar nuisance parameter α. Is is transparent to see why using a
prospective model to analyze case-control data is a very attractive alternative from the
inferential perspective for the log odds ratio parameter. In what follows we discuss the
consistency of such approach in both Bayesian and frequentist analysis.
5
3 Equivalence for the Bayesian analysis
We examine first the equivalence of the marginal posterior density for the log odds ratio
parameter, under the prospective and retrospective models. Seaman & Richardson
(2004) showed that the Bayesian analysis of the retrospective model which assumes
a Dirichlet prior distribution for the exposure probabilities in the control group is
equivalent to the analysis of the prospective model which assumes a uniform prior
distribution for the log odds that a subject with baseline exposure is diseased. Here
we extend these findings on the equivalence of the Bayesian analysis to a larger class
of prior distributions, that naturally include the uniform and Dirichlet distributions.
Like Seaman and Richardson, our approach uses the so-called “multinomial -Poisson
transformation”, discussed in Baker (1994).
Lemma 1 Suppose that random variables Ydj are independently distributed as Ydj ∼
Po(γdj ) for d = 0, 1; j = 1, . . . J, where:
γdj = d log α + log βj + dδ T zj .
Write λj = βj {1+α exp(δ T zj )} and assume independent, possibly improper priors π λ (λ)
J y
for λ = (λ1 , . . . , λJ ) and π α,δ (α, δ) which satisfy the conditions π λ (λ) j=1 λj +j exp(−λj ) dλ <
∞ and the marginal posterior π(α, δ|y) is proper. Then this marginal posterior density
is
J
α,δ {α exp(δ T zj )}y1j
π(α, δ|y) ∝ π (α, δ) , (1)
j=1
{1 + α exp(δ T zj )}y+j
equal to the marginal posterior of a prospective model which assumes the prior π α,δ (α, δ)
for the model parameters.
6
Proof. Since Y0j and Y1j are independent Poisson variables we have
α exp(δ T zj )
Y1j Y+j = y+j ∼ Bi y+j ;
1 + α exp(δ T zj )
Y+j ∼ Po βj + αβj exp(δ T zj ) ,
where j = 1, . . . , J. As a result, we can write the full likelihood as
J
{α exp(δ T zj )}y1j y+j
L(α, β, δ; y) ∝ βj {1 + α exp(δ T zj )} exp[−βj {1 + α exp(δ T zj )}]
j=1
{1 + α exp(δ T zj )}y+j
J J
{α exp(δ T zj )}y1j y
= λj +j exp(−λj ), (2)
j=1
{1 + α exp(δ T zj )}y+j j=1
Tz
where λj = βj (1 + αeδ j
), for j = 1 . . . J. Let λ = (λ1 . . . , λJ ), then (α, λ, δ) is a one-
to-one transformation of (α, β, δ), since |∂λ/∂β|+ = 0. In the new reparameterization
the model has the likelihood orthogonality property
L(α, λ, δ; y) ∝ LMP (α, δ; y) L1 (λ; y), (3)
where LMP (α, δ; y) is the first factor in (2), the prospective likelihood and L1 (λ; y) ∝
J y
j=1 λj +j exp(−λj ). Because of the likelihood orthogonality (3) it makes sense to
assume independent priors for (α, δ) and λ; so the joint prior density π(α, λ, δ) has the
form:
π(α, λ, δ) ∝ π α,δ (α, δ) π λ (λ).
Then the marginal posterior density π(α, δ|y) is obtained by integrating the joint dis-
tribution with respect to λ and is proportional to:
π(α, δ|y) ∝ π α,δ (α, δ)LMP (α, δ; y),
7
by assuming π λ (λ) is such that
π λ (λ) L1 (λ; y) dλ < ∞. (4)
Lemma 2 Consider random variables Ydj , independently distributed as Ydj ∼ Po(γdj )
for d = 0, 1; j = 1, . . . J, where:
γdj = d log α + log µ + log θj + dδ T zj ,
J
with θJ = 1. Write η = αµ j=1 θj exp(δ T zj ) and assume independent, possibly im-
proper priors π η,µ (η, µ), and π θ,δ (θ, δ) which satisfy the conditions π η,µ (η, µ) η y1+ µy0+ exp(−η−
µ) dη dµ < ∞ and the marginal posterior π(θ, δ|y) is proper. Then the marginal poste-
rior density of (θ, δ) is:
J y0j y1j
θj θj exp(δ T zj )
π(θ, δ|y) ∝ π θ,δ (θ, δ) J J
, (5)
j=1 k=1 θk k=1 θk exp(δ T zj )
equal to the marginal posterior of a retrospective model which assumes the prior π θ,δ (θ, δ)
for the model parameters.
Proof. For this part we take the same approach, except we begin slightly differently:
θ1 exp(dδ T z1 ) θJ exp(dδ T zJ )
(Yd1 , . . . , YdJ ) Yd+ = yd+ ∼ Mu yd+ ; J
,..., J
k=1 θk exp(dδ T zk ) k=1 θk exp(dδ T zk )
J
d
Yd+ ∼ Po α µ θj exp(dδ T zj ) ,
j=1
J
where d = 0, 1. Taking the reparameterization (η, µ, θ, δ), where η = αµ j=1 θj exp(δ T zj )
we find that the model admits likelihood orthogonality with respect to components
(θ, δ) and (η, µ) since L(η, µ, θ, δ; y) ∝ LMR (θ, δ; y) L2 (η, µ; y) for L2 (η, µ; y) ∝ η y1+ µy0+ exp(−η−
8
µ). Similar reasoning can then be used to argue in favor of a joint prior density of the
form
π(η, µ, θ, δ) ∝ π θ,δ (θ, δ)π η,µ (η, µ).
Note that the expression of L2 (η, µ; y) suggests further that to η and µ should be as-
signed also independent prior distributions. For simplicity of notation we use π η,µ (η, µ)
for their joint prior density. We find the marginal posterior density π(θ, δ|y) equals
π(θ, δ|y) ∝ π θ,δ (θ, δ)LMR (θ, δ; y),
provided π η,µ (η, µ) satisfies
π η,µ (η, µ) L2 (η, µ; y) dη dµ < ∞. (6)
Theorem 1 Assume the notation above, and consider the prior π(α, β, δ), possibly
improper for which the joint posterior is a proper distribution. Assume the following
conditions are satisfied:
(i) The prior density of (α, λ, δ), with λ defined earlier, can be factored into a
product of a density for (α, δ) and a density for λ, which satisfies (4). In addition the
joint density of (α, δ) must coincide with the joint marginal density of (α, δ) from the
prior π(α, β, δ);
J
(ii) The prior density for (η, µ, θ, δ), with η defined earlier, µ = j=1 βj , θ =
J
(θ1 , . . . , θJ ) and θj = βj / k=1 βk can be factored into a product of a density for (θ, δ)
and a density for (η, µ), which satisfies (6). Similarly, the joint density of (θ, δ) must
coincide with the corresponding marginal density from π(α, µ, θ, δ).
Then the marginal posterior densities of δ obtained from joint densities (1) and (5)
9
respectively, are the same.
Proof. The marginal posterior density of δ, is obtained by integrating out the joint
posterior of the Poisson model π(α, β, δ|y) with respect to (α, β). Thus one may find
posterior (1) first and then integrate with respect to α, or may calculate the posterior
(5) first and then integrate with respect to θ. The marginal posterior of δ is the same.
One main implication of the theorem is that using a prospective analysis and assum-
ing an arbitrary prior for the parameters does not necessarily yield a correct analysis of
case-control studies. The prior assumed for the prospective model parameters needs to
meet certain conditions to give equivalent Bayesian analysis for prospective and retro-
spective models. Otherwise said, Bayesian inference for the log odds ratio parameter in
case-control settings, can be carried out by using a prospective analysis and assuming
a suitable prior density. Next section discusses these issues to more detail.
4 On the priors for equivalent Bayesian analysis
4.1 Relation to Seaman & Richardson (2004)
The prior assumed by Seaman & Richardson (2004) is:
J
−1 a −1
π(α, β, δ) ∝ p(δ)α βj j , (7)
j=1
where aj ≥ 0, p(δ) is a density with the property that δ T zq p(δ) dδ and δ T zr p(δ) dδ
exist and are finite for some q and r such that y0q ≥ 1 and y1r ≥ 1.
Using the reparameterization λ = (λ1 , . . . , λJ ), λj = βj {1 + α exp(δ T zj )} we obtain
J a −1
a joint density of (α, λ, δ) of form π(α, λ, δ) ∝ p(δ)α−1 j=1 λj j {1 + α exp(δ T zj )}−aj .
J a −1
This density can be written as the product of the density π λ (λ) ∝ j=1 λj j , which
J
meets constraint (4) and π(α, δ) ∝ p(δ)α−1 j=1 {1 + α exp(δ T zj )}−aj . Note π(α, δ) is
10
exactly the factor multiplied by LMP (α, δ; y) in expression (1) of Seaman & Richardson.
However, the marginal density of (α, δ) from the joint prior (7) is π α,δ (α, δ) ∝ p(δ)α−1
and not π(α, δ) when aj = 0 for j = 1, . . . J. And taking aj = 0 for all j creates
problems regarding the posterior density being proper, whenever y+j = 0 for some j.
For this reason the authors examine the limiting posterior of the prospective model
π(α, δ|y) when aj → 0.
J
Now, using the initial prior density (7), and the reparameterization η = α j=1 βj exp(δ T zj ),
J J
µ = j=1 βj , θj = βj / k=1 βk , for j = 1, . . . , J gives a prior density for (η, µ, θ, δ)
J aj −1 J
of the form π(η, µ, θ, δ) ∝ p(δ)η −1 µ−1+a+ j=1 θj , where a+ = j=1 aj . Similarly,
this prior can be factored into a product of two independent densities: π η,µ (η, µ) ∝
J aj −1
η −1 µ−1+a+ which meets condition (6) and π θ,δ (θ, δ) ∝ p(δ) j=1 θj ; this last density
being exactly the other factor multiplied by LMR (θ, δ; y) in expression (2) of Seaman
& Richardson.
4.2 Prior properties for equivalent Bayesian analysis
Next we examine the properties that a prior density for the retrospective model pa-
rameters needs to satisfy in order to produce same Bayesian analysis for the parameter
of interest to that from a prospective model with a corresponding prior for the model
parameters. Evidently, one can derive similar conditions for a prior density of the
prospective model parameters to give Bayesian analysis equivalence.
Lemma 3 Consider a retrospective model and assume a joint prior density π β,δ (β, δ)
for the model parameters which yields a proper posterior distribution. Assume further
the prior density satisfies the following conditions:
11
(i) There is a positive function π η (η), where η > 0 for which the product
λ1 λJ
π β,δ { Tz )
,..., , δ}
1 + α exp(δ 1 1 + α exp(δ T zJ )
J J
η λj exp(δ T zj ) λj exp(δ T zj )
π {α }
j=1
1 + α exp(δ T zj ) j=1
1 + α exp(δ T zj )
factorizes into a factor depending on (α, δ), say p1 (α, δ) and another factor depending
on λ = (λ1 , . . . , λJ ), say π λ (λ) ;
J y
(ii) For π λ (λ) it holds true π λ (λ) j=1 λj +j exp(−λj ) dλ < ∞;
J
(iii) For any µ > 0 and θ = (θ1 , . . . , θJ ) such that j=1 θj = 1 the function
π β,δ (µθ, δ) can be factored into two terms, one of which depends only on µ, say p2 (µ),
for which π η (η)p2 (µ) η y1+ µy0+ +J−1 exp(−η − µ) dη dµ < ∞.
Then there is a prior π α,δ (α, δ) such that using it in a Bayesian analysis of the
prospective model gives a marginal posterior for δ identical to the one obtained from
the retrospective model by assuming the prior π β,δ (β, δ) for the model parameters.
Proof. Let π η,µ (η, µ) = µJ−1 p2 (µ)π η (η), with p2 and π η having the properties required
J
above. If α = η/{ j=1 βj exp(δ T zJ )} then the joint density for (α, β, δ) is
J J
η T β,δ
π(α, β, δ) ∝ π {α βj exp(δ zj )} π (β, δ) βj exp(δ T zj ) .
j=1 j=1
Consider next the transformations λj = βj {1 + α exp(δ T zj )} for j = 1, . . . J. Using the
assumption (i), the joint density for (α, λ, δ) can be factored into
J
λ
π(α, λ, δ) ∝ π (λ) p (α, δ) 1
{1 + α exp(δ T zj )}−1 ,
j=1
with π λ (λ) meeting condition (ii). According to Theorem 1, this joint prior density
gives identical marginal posterior density for δ for retrospective and prospective models.
12
The joint prior density for the prospective model parameters is simply π α,δ (α, δ) ∝
J
p1 (α, δ) j=1 {1 + α exp(δ T zj )}−1 .
At first glance, the three conditions of Lemma 3 regarding the prior π β,δ (β, δ) seem
quite restrictive. Naturally, the joint prior involving the Dir(0, . . . , 0) density for β,
proposed by Seaman & Richardson (2004) meets these requirements. So does the class
of priors that we present next. In general, however, it is not trivial to give further
insights on other possible solutions.
Consider the following joint prior density for (β, δ):
J −a J J
β,δ T a −1
π (β, δ) ∝ p(δ) βj exp(δ zj ) exp(− βj ) βj j , (8)
j=1 j=1 j=1
where p(δ) is a density satisfying the conditions that the integrals exp(−aδ T zs )δ T zq p(δ) dδ,
exp(−aδ T zs )δ T zr p(δ) dδ and exp(−aδ T zs ) p(δ) dδ exist and are finite for some q, r
and s such that as ≥ a, y0q ≥ 1 and y1r ≥ 1. By letting π η (η) ∝ η a−1 exp(−η), routine
algebra verifies all the conditions needed by Lemma 3. It follows a joint density prior
for the parameters (α, β, δ) of the form:
J J
T a−1 a −1
π(α, β, δ) ∝ p(δ) exp{−α βj exp(δ zj )} α βj j exp(−βj ). (9)
j=1 j=1
The conditions required for the density p(δ) are such that the posterior is proper. For
a = 0 they simplify to those stated in Seaman & Richardson (2004). This is confirmed
by the fact that this particular value results in both posteriors from the prospective
and retrospective models equal to the corresponding ones from the cited paper.
The results appear to indicate that it is not necessary to have independence between
parameters β and δ of a retrospective model, or parameters α and δ of a prospective
model. If independence is requisite, the class of priors which give Bayesian analysis
13
equivalence reduces significantly. Here we investigate the limitations of a joint prior
density π(α, β, δ) that depends independently upon the parameters α, β and δ.
Corollary 1 Suppose that all the model parameters α, β, δ are assumed independent
and assume further that the components β1 , . . . , βJ are also independent. Then the only
joint prior density for (α, β, δ) which gives equivalence between the prospective and the
retrospective models is
J
−1
π(α, β, δ) ∝ p(δ)α−1 βj . (10)
j=1
It is not obvious that under independence, the conditions required by the above Lemma
give the prior (10), but this is proved in Appendix. However, as Seaman & Richardson
(2004) remark, this joint density prior yields improper posterior distribution if for any
of the support points zj of exposure, no subject is observed.
Note that if independence is needed only between β and δ as well as between α and
δ, then condition (i) of Lemma 3 changes to:
(i∗ ) There is a positive function π η (η), where η > 0 for which the product
J
β λ1 λJ λj exp(δ T zj )
π ( ,..., , δ) π η (α )
1 + α exp(δ T z1 ) 1 + α exp(δ T zJ ) j=1
1 + α exp(δ T zj )
J J
λj exp(δ T zj )
{1 + α exp(δ T zj )}−1
j=1
1 + α exp(δ T zj ) j=1
factorizes into a factor depending on α and another factor depending on λ = (λ1 , . . . , λJ ).
14
5 Classical equivalence of the approximate condi-
tional inference
From a classical point of view, current literature provides theoretical justification to
carry out inference on the log odds ratio parameter of the retrospective model by using
large sample theory with a prospective analysis. Nevertheless the resulted inference
may be misleading for studies of smaller sample size. In this section we analyze the
agreement of the approximate conditional inference for the log odds ratio parameter
in prospective and retrospective models, addressing the question raised by Wang &
Carroll (1999) with respect to the conditional approach (see Davison 1988). The con-
ditional inference for the parameter of interest was introduced by Bartlett (1937), as
a device that considerably reduces the inferential procedure in settings with nuisance
parameters. In exponential family models, the conditional inference for a canonical
component of interest, uses the conditional likelihood of the sufficient statistic corre-
sponding to that parameter of interest given the sufficient statistic corresponding to
the nuisance parameter. For generalized linear models, Davison (1988) proves that
accurate approximation to the exact conditional likelihood is based on the Cox-Reid
modified profile likelihood (Cox & Reid, 1987). This modified profile likelihood has
been noticed to provide more reliable inference than the profile likelihood, especially
in settings of smaller sample size or involving a larger number of nuisance parameters.
In this section we obtain that the prospective and retrospective models yield the
“same” approximate conditional inference for the log odds ratio parameter to O(n−1 ),
by showing that both models admit the same Cox-Reid modified profile likelihood to
this order. Since both prospective and retrospective models have same profile likelihood
for the odds ratio parameter, this reduces to proving that the Cox-Reid adjustments
to the profile likelihood are the same to second order. A direct approach involves
15
calculation of the determinant of the block sub-matrix corresponding to the nuisance
parameter, of the observed Fisher information and evaluation at the constrained max-
imum likelihood estimate for each of the two models; for the prospective model the
ˆ ˆ
“sub-matrix” is jλλ (λδ , δ) = −∂ 2 log LMP (λδ , δ; y)/∂λ2 . To avoid this messy algebra we
use a simple approach based on Laplace approximation to the marginal posterior for
the odds ratio parameter, along with a suitable class of priors for equivalent Bayesian
analysis of the prospective and retrospective models.
Let π(α, β, δ) be a joint prior density for the parameters (α, β, δ)
J J
−1 T a −1
π(α, β, δ) ∝ p(δ)α exp{−α βj exp(δ zj )} βj j exp(−βj ), (11)
j=1 j=1
where aj are some arbitrary positive constants, j = 1, . . . , J and p(δ) is the marginal
density for δ with the constraints that δ T zq p(δ) dδ and δ T zr p(δ) dδ exist and are
finite for some q and r such that y0q ≥ 1 and y1r ≥ 1. Denote by ω = log α, ϑj = log θj
for j = 1, . . . J and let ϑ = (ϑ1 , . . . , ϑJ ). Corresponding to the joint prior density (11)
we have the following marginal prior densities for (ω, δ) and (ϑ, δ):
J
π ω,δ
(ω, δ) ∝ p(δ) {1 + exp(ω + δ T zj )}−aj (12)
j=1
J J
ϑ,δ θ,δ a −1
π (ϑ, δ) ∝ p(δ) exp(aj ϑj ), since π (θ, δ) ∝ p(δ) θj j .
j=1 j=1
Simple calculations for obtaining the posterior density point out that prior (11) gives
identical posterior densities of the prospective and retrospective model respectively
to the ones obtained by the Seaman & Richardson’s prior (7). In particular, for the
prospective model, the posterior density for (ω, δ) is
π(ω, δ|y) ∝ π(ω, δ) LMP (ω, δ; y).
16
Using Laplace approximation to integrals Sweeting (1987) points out that log of the
marginal posterior pMP (δ|y), when finite, is accurately approximated by an adjusted
profile log-likelihood plus the log of the joint prior calculated at the constrained max-
ˆ
imum likelihood estimate for the nuisance parameter, ωδ for fixed δ. More precisely,
Tierney et al. (1989) confirm that the approximation given by Tierney & Kadane
(1986):
MP 1
pMP (δ|y) ∝ exp
˙ p (δ) − MP
log |jωω (ˆ δ , δ)| + log π ω,δ (ˆ δ , δ)
ω ω
2
J
MP
∝ exp
˙ a (δ) + log p(δ) − aj log{1 + exp(ˆ δ + δ T zj )}
ω
j=1
has relative error O(n−1 ) in a fixed neighborhood of the posterior mode, where MP
p (δ) =
log LMP (ˆ δ , δ; y) is the profile log-likelihood and jωω (ω, δ) = −∂ 2 log LMP (ω, δ; y)/∂ω 2
ω MP
the (ω, ω)-component of the observed information matrix in the prospective setting.
The second line of the above expression uses the adjusted profile log-likelihood for
MP MP MP
the prospective setting a (δ) = p (δ) − 0.5 log |jωω (ˆ δ , δ)| plus the substitution of
ω
π ω,δ (ˆ δ , δ) by its corresponding expression (12).
ω
Under the retrospective model we obtain a similar second order approximation to
the marginal posterior density for δ, namely:
J
pMR (δ|y) ∝ exp
˙ MR ˆ
a (δ) + log p(δ) + aj ϑj,δ ,
j=1
MR MR MR ˆ
where a (δ) = p (δ) − 0.5 log |jϑϑ (ϑδ , δ)| is the adjusted profile log-likelihood in
MR MR ˆ
the retrospective setting and p (δ), jϑϑ (ϑδ , δ) are the profile log-likelihood and the
(ϑ, ϑ)-component of the information matrix defined accordingly.
Note that prior (11) is of from (9) with a = 0, thus for all aj > 0 it guarantees
the same posterior density of δ under the prospective and retrospective models. As a
17
consequence, the exponents from the corresponding approximations of this marginal
posterior density must agree to O(n−1 ), i.e.
J J
MP
− aj log{1 + exp(ˆ δ + δ T zj )} = MR ˆ
aj ϑj,δ + O(n−1 ),
a (δ) ω a (δ) +
j=1 j=1
for all aj > 0 and j = 1, . . . , J. It follows, as aj → 0, that
MP
a (δ) = MR
a (δ) + O(n−1 ),
MR ˆ
or equivalently that log |jωω (ˆ δ , δ)| = log |jϑϑ (ϑδ , δ)|+O(n−1 ), since
MP
ω MP
p (δ) = MR
p (δ).
That is the Cox-Reid adjustments to the profile likelihood are the same to second order.
As a result we obtain
MP
CR (δ) = MR
CR (δ) + O(n−1 ), (13)
MP MP MP
where CR (δ) = p (δ) + 0.5 log |jωω (ˆ δ , δ)| is the Cox-Reid modified profile log-
ω
MR
likelihood in the prospective setting , while CR (δ) , defined accordingly, is the Cox-Reid
modified profile log-likelihood in the retrospective setting.
To prove the asymptotic agreement of the Cox-Reid modified profile likelihood we
used the existence of prior the π(α, β, δ) meeting the aforementioned constraints. How-
ever equality (13) holds independently of the conditions assumed. The approximate
equivalence of the Cox-Reid modified profile likelihood to O(n−1 ) implies further that
the prospective and retrospective models have the same approximate conditional infer-
ence for the log odds ratio parameter.
18
6 Discussion and further work
In this paper we discussed the equivalence of the prospective and retrospective like-
lihood methods for the analysis of case control studies, from both Bayesian and fre-
quentist perspective. The Bayesian analysis equivalence between the prospective and
retrospective models has been hitherto established only for the Dir(0, . . . , 0) and uni-
form priors for the nuisance parameters (Seaman & Richardson, 2004). Present work
shows this prior is unique when parameters independence is required. In addition, it
reveals a general class of priors which yield equivalent Bayesian analysis from the two
models. However because we are using reparameterization to obtain orthogonal likeli-
hood, our joint priors may depend on the support {z1 , . . . zJ } of the exposure vector.
One limitation of this approach is that it cannot be used when one wishes to model
the exposure distribution as well. Also the marginal posterior for the log odds ratio
parameter may depend on this support; take for example the prior (9), for a = 0.
However when a = 0 the prior yields same posterior of (ω, δ) as Seaman & Richardson
(2004), hence naturally gives a marginal posterior for δ, free of the support of exposure
vector, which is particularly desirable for continuous exposure variables.
From a frequentist point of view, we showed that the prospective and retrospective
models generate equivalent Cox-Reid modified profile likelihoods with relative error
O(n−1 ). This result is consistent with the findings of Wang & Carroll (1999) regarding
ˆ
the equivalence to O(n−1 ) of the saddlepoint distribution of δ for the prospective and
retrospective models. The Cox-Reid modified profile log-likelihood CR (δ), is a version
of adjusted profile log-likelihood which reduces the profile score bias. A benefit from
using CR (δ) is that the accuracy of the chi-square approximations to wCR = 2{ ˆ
CR (δ)−
CR (δ)}
ˆ
is higher than those for w = 2{ p (δ) − p (δ)}; see DiCiccio & Stern (1994).
This work involved unstratified case-control studies. Similar conditions to (i), (ii)
and (iii) of Lemma 3 are needed in order to show the equivalence of prospective and
19
retrospective models for stratified studies; we leave this as well as accommodating
missingness and measurement errors in the data for future research. Our future work
also includes studying the approximate equivalence of the modified Cox-Reid profile
likelihood ratio for stratified designs. Showing such an equivalence would provide
more reliable inference for small stratum sizes or large number of nuisance stratum
parameters (see Sartori, 2003).
7 Acknowledgement
The author wishes to thank Malay Ghosh, Peter Green, Vanessa Didelez for useful
discussions, and especially Nancy Reid for constructive comments and for reading pre-
vious drafts which have led to an improved paper. This work was supported by Natural
Sciences and Engineering Research Council of Canada.
A Appendix
A.1 Posterior distribution of the prior (8)
Here we show the posterior density of the retrospective model and prior (8) is proper.
This is equivalent to prove that the posterior for the prospective model with the
J
corresponding prior π α,δ from Lemma 3, namely π α,δ (α, δ) ∝ p(δ) αa−1 j=1 {1 +
α exp(δ T zj )}−aj is proper. And for this we proceed in a similar way to Seaman &
Richardson’s (2004). Let ω = log α and assume s such that as > a. Let further
I(δ) = I1 (δ) exp(aδ T zs ), wehere
∞ J y0j +aj y1j
1 exp(ω + δ T zj )
I1 (δ) = exp(aω) dω.
−∞ j=1
1 + exp(ω + δ T zj ) 1 + exp(ω + δ T zj )
20
It follows that I(δ) can be written as
∞ a J y0j +aj y1j
exp(ω + δ T zs ) 1 exp(ω + δ T zj )
I(δ) = dω,
−∞ 1 + exp(ω + δ T zs ) j=1
1 + exp(ω + δ T zj ) 1 + exp(ω + δ T zj )
where as stands for as − a > 0. We have
∞ ∞
I1 (δ)p(δ) dδ = p
I(δ)˜(δ) dδ,
−∞ −∞
where p(δ) = p(δ) exp(−aδ T zs ). Now we use the results from Seaman & Richardson
˜
∞ ∞
and conclude that if p(δ) is a density,
˜ −∞
δ T zq p(δ) dδ < ∞ and
˜ −∞
δ T zr p(δ) dδ < ∞
˜
∞
then I (δ)p(δ) dδ
−∞ 1
< ∞ and therefore π(ω, δ|y) is a proper posterior density.
A.2 Proof of corollary 1
Next we prove that if we assume the retrospective model parameters, βj for j = 1, . . . , J
and δ are independent, and furthermore independent of the stratum parameter α of
a prospective model, then the only priors for equivalent Bayesian analysis of the two
models are the Dir(0, . . . , 0) and uniform proposed by Seaman & Richardson (2004).
Denote by π(α, β, δ) the joint density of all these parameters, and assume next that
this is the prior density used in the Poisson model of Lemma 1. In this precise Lemma,
we argued for λ and (α, δ) to be assigned independent priors, where λ = (λ1 , . . . , λJ ),
λj = βj {1 + α exp(δ T zj )}. Thus if π λ (λ) designated the prior density assumed for λ, it
follows that
J
β λ T T
π (β) ∝ π [β1 {1 + α exp(δ z1 )}, . . . , βJ {1 + α exp(δ zJ )}] 1 + α exp(δ T zj ) ,
j=1
21
for all α > 0, βj > 0 and δ. Taking α−1 for α, −δ for δ and rewriting 1+α−1 exp(−δ T zj ) =
{α exp(δ T zj )}−1 {1 + α exp(δ T zj )} we obtain:
J
−1
π β (β) ∝ π β [β1 {α exp(δ T z1 )}−1 , . . . , βJ {α exp(δ T zJ )}−1 ] α exp(δ T zj ) , (A-1)
j=1
for all α > 0, βj > 0, and δ. In particular for βj = 1 for j = 1, . . . , J, the left hand
side equals a constant for all α > 0, and δ, and thus the right hand side term must be
proportional to 1. This gives:
J
β T T −1
π {α exp(δ z1 ), . . . , α exp(δ zJ )} ∝ α exp(δ T zj ) ,
j=1
J
for all α > 0, δ. Without loss of generality (take π β (β) to stand for π β (β) j=1 βj )
equivalence (A-1) can be also written as
π β (β1 , . . . , βJ ) ∝ π β {β1 {α exp(δ T z1 )}−1 , . . . , βJ {α exp(δ T zJ )}−1 } (A-2)
for α > 0, βj > 0 and δ. Since we assumed independent components for β, we can
J β
write π β (β) = j=1 πj (βj ). Now for some arbitrary index j, let βi = α exp(δ T zi ) for
J β
all i = j in (A-2) and using j=1 πj {α exp(δ T zj )} ∝ 1 it implies further, take δ = 0
β β β
πj (βj ) ∝ πj (βj /α) πj (α) α, βj > 0. (A-3)
The only multiplicative functions over R, proof omitted, are power functions; and thus
the solution of (A-3) has the form
β l
πj (βj ) ∝ βjj βj > 0 (A-4)
22
β
for some lj . Substituting this expression for πj for each j = 1, . . . J into (A-2) and
J J
forcing the proportionality to hold for all α > 0, δ, gives j=1 lj = 0 and j=1 lj zj =
β l −1
(0, . . . , 0)T . For such lj , the solution of (A-1) is πj (βj ) ∝ βjj , which yields π λ (λ) of
the form
J
l −1
λ
π (λ1 , . . . , λJ ) = λjj {1 + α exp(δ T zj )}−lj . (A-5)
j=1
J T −lj
Due to independence between the variables λ and (α, δ), the product j=1 {1+α exp(δ zj )}
has to be a constant for all α > 0 and δ. This additional constraint implies fur-
ther, proof omitted, that lj = 0, 1 ≤ j ≤ J. As a result we find the density prior
J −1
π β (β) ∝ j=1 βj to be the unique solution in this case. The prior proposed by
J a −1
Seaman & Richardson (2004) is π β (β) ∝ j=1 βj j with aj → 0 for all j.
To determine the density for α we use a similar approach with the difference that
we work with the transformed variables η, µ, θ, δ instead, with η, µ, θ as previously
defined. The independence constraints assumed for α, β, δ imply further independent
densities π η (η), π µ (µ), π θ (θ) and π δ (δ) for these new variables. We establish first the
relationship between the densities for α and η, namely:
J J
α η T
π (α) ∝ π {α βj exp(δ zJ )} βj exp(δ T zJ )
j=1 j=1
for all α, βj > 0 and δ, where j = 1, . . . , J. In particular, setting α = 1 gives
J J −1
η T T
π { βj exp(δ zJ )} ∝ βj exp(δ zJ )
j=1 j=1
for all βj > 0 and δ. It follows π α (α) ∝ α−1 , concluding our proof.
23
References
[1] Baker, S. G. (1994). The multinomial-Poisson transformation. Statistician 43, 495-
504.
[2] Bartlett, M. S. (1937) Properties of sufficiency and statistical tests. Proc. R. Soc.
A, 168-282.
[3] Cox, D.R. & N. Reid (1987). Parameter orthogonality and approximate conditional
inference (with Discussion). J. R. Statist. Soc. B 49, 1-39.
[4] Davison, A.C. (1988). Approximate conditional inference in generalized linear-
models. J. R. Statist. Soc. B 50. 445-461.
[5] DiCiccio, M.J. & Stern, S.E. (1994). Frequentist and Bayesian Bartlett correction of
test statistics based on adjusted profile likelihoods. J. R. Statist. Soc. B 56, 397-408.
[6] Gustafson, P., Le, N. D. & Vallee, M. (2002). A Bayesian approach to case-control
studies with errors in covariables. Biostatistics 3, 229-243.
[7] Miettinen, O. S. (1976). Estimability and estimation in case-referent studies. Am.
J. Epidem. 103, 226-235.
[8] Murphy, S.A. & van der Vaart, A.W. (2000). On profile likelihood. J. Am. Statist.
Assoc. 95, 449-65.
[9] Prentice, R. and Pyke, R. (1979). Logistic disease incidence models and case control
studies, Biometrika, 66, 403-411.
[10] Roeder, K., Carroll, R.J. & Lindsay, B.G. (1996). A semiparametric mixture ap-
proach to case-control studies with errors in covariables. J. Am. Statist. Assoc. 91,
722-732.
24
[11] Sartori (2003). Modified profile likelihoods in models with stratum nuisance pa-
rameters. Biometrika, 90, 533549.
[12] Sweeting, T.J. (1987). Discussion of paper by D. R. Cox and N. Reid. J. R. Statist.
Soc. B 49, 20-1.
[13] Seaman, S.R. & Richardson, S. (2004). Equivalence of prospective and retrospec-
tive models in the Bayesian analysis of case-control studies. Biometrika, 91, 15-25.
[14] Tierney, L & Kadane J.B. (1986). Accurate approximations for posterior moments
and marginal densities. J. Am. Statist. Assoc. 81, 82-6.
[15] Tierney, L., Kass, R.E. & Kadane, J.B. (1989). Approximate marginal densities
of nonlinear functions. Biometrika, 76, 425-433.
[16] Wang, S. & Carroll, R.J. (1999). Higher-order accurate methods for retrospective
sampling problems. Biometrika, 86, 881-97.
25
Related docs
Get documents about "