Next generation sequencing reading club
Modeling tag abundance with Negative
Binomial models
Reader: Sean Wang
May 24, 2010
Outline
"Small-sample estimation of negative binomial dispersion,
with applications to SAGE data", Robinson and Smyth,
2008
"Moderated statistical tests for assessing differences in tag
abundance", Robinson and Smyth, 2007
Zero-inflated negative binomial (NB) models
First two papers: NB models for small sample size n
First paper: common overdispersion; second paper: tag-wise
Zero-inflated NB models: a different problem
Introduction
NB models
Estimation
Hypothesis Testing
Simulation studies
Summary and discussion
Small-sample estimation of negative binomial
dispersion, with applications to SAGE data
MARK D. ROBINSON
GORDON K. SMYTH
May 24, 2010
Outline
1 Introduction
2 NB models
3 Estimation
4 Hypothesis Testing
5 Simulation studies
Estimation
Testing
6 Summary and discussion
Introduction
Problem
Let $Y_1, \dots, Y_{n_1}$ be independent tag counts from treatment 1, with $Y_i \sim \mathrm{NB}(\mu_i = m_i \lambda_1, \phi)$.
Let $X_1, \dots, X_{n_2}$ be independent tag counts from treatment 2, with $X_j \sim \mathrm{NB}(\mu_j = m_j \lambda_2, \phi)$.
$n_1, n_2$ are small; $m_i, m_j$ are the library sizes; $\lambda_1, \lambda_2$ are the proportions of the library that are a particular tag.
Goal
Test $\lambda_1 = \lambda_2$
Need to estimate the $\lambda$s and $\phi$
Why NB
No replicates: 2 × 2 table and χ² test
Pooling small libraries: overestimates significance because interlibrary variation is ignored
Poisson: limited modeling power
GLM: possible, but replication is rare due to expense
NB models
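Let $Y$ be an NB random variable with mean $\mu$ and dispersion $\phi$, denoted $Y \sim \mathrm{NB}(\mu, \phi)$, so that
$$f(y; \mu, \phi) = P(Y = y) = \frac{\Gamma(y + \phi^{-1})}{\Gamma(\phi^{-1})\,\Gamma(y + 1)} \left(\frac{1}{1 + \mu\phi}\right)^{\phi^{-1}} \left(\frac{\mu}{\phi^{-1} + \mu}\right)^{y} \tag{1}$$
$E(Y) = \mu$, $\mathrm{Var}(Y) = \mu + \phi\mu^2$
NB $\Rightarrow$ Poisson as $\phi \to 0$
Robust alternative to overdispersed logistic regression (beta-binomial)
$\phi$ accounts for the interlibrary variability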
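As a quick numerical check of equation (1), the following sketch evaluates the pmf in the mean/dispersion parameterization and confirms the stated mean, variance, and Poisson limit. The function names and the truncation point `upper` are illustrative choices, not part of the slides.

```python
import math

def nb_logpmf(y, mu, phi):
    """Log of the pmf in equation (1): mean mu > 0, dispersion phi > 0."""
    r = 1.0 / phi  # phi^{-1}
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            - r * math.log1p(mu * phi)        # (1 / (1 + mu*phi))^(1/phi)
            + y * math.log(mu / (r + mu)))    # (mu / (1/phi + mu))^y

def nb_pmf(y, mu, phi):
    return math.exp(nb_logpmf(y, mu, phi))

def nb_moments(mu, phi, upper=5000):
    """Mean and variance by summing the pmf over 0..upper-1 (truncated sum)."""
    probs = [nb_pmf(y, mu, phi) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

# Mean should be mu = 10, variance mu + phi * mu^2 = 10 + 0.5 * 100 = 60.
mean, var = nb_moments(mu=10.0, phi=0.5)

# With a very small dispersion the NB pmf approaches the Poisson pmf.
poisson_3 = math.exp(-10.0) * 10.0 ** 3 / math.factorial(3)
nb_3 = nb_pmf(3, mu=10.0, phi=1e-6)
```

Working on the log scale with `math.lgamma` avoids overflow in the gamma functions for large counts.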
Estimation approaches
MLE
Pseudo-likelihood (PL; Smyth, 2003)
Quasi-likelihood (QL; Nelder, 2000)
Cox-Reid (CR) approximate conditional inference (Cox and Reid, 1987)
Conditional maximum likelihood (CML)
Quantile-adjusted CML
MLE
Equal library sizes: estimate λ and φ separately
Unequal library sizes: estimate jointly
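PL
$$\sum_{i=1}^{n} \frac{(y_i - \hat\mu_i)^2}{\hat\mu_i (1 + \hat\phi_{PL}\,\hat\mu_i)} = n - 1 \tag{2}$$
QL
$$2 \sum_{i=1}^{n} \left[ y_i \log\frac{y_i}{\hat\mu_i} - (y_i + \hat\phi_{QL}^{-1}) \log\frac{y_i + \hat\phi_{QL}^{-1}}{\hat\mu_i + \hat\phi_{QL}^{-1}} \right] = n - 1 \tag{3}$$
CR
$$l_{CR}(\phi) = l(\hat\lambda, \phi) - \frac{1}{2} \log\left| j_{\lambda\lambda}(\phi, \hat\lambda) \right| \tag{4}$$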
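To make the PL estimating equation (2) concrete, here is a minimal sketch that solves it for φ by bisection. It assumes a single treatment group and, as a simplification not stated on the slide, plugs in the pooled estimate λ = Σy / Σm for the means; `pl_dispersion` is a hypothetical helper name.

```python
def pl_dispersion(y, m, tol=1e-10):
    """Solve sum (y_i - mu_i)^2 / (mu_i (1 + phi mu_i)) = n - 1 for phi >= 0."""
    lam = sum(y) / sum(m)              # assumption: simple pooled estimate of lambda
    mu = [mi * lam for mi in m]        # fitted means mu_i = m_i * lambda
    n = len(y)

    def g(phi):
        # Left-hand side minus right-hand side of equation (2); decreasing in phi.
        return sum((yi - mui) ** 2 / (mui * (1.0 + phi * mui))
                   for yi, mui in zip(y, mu)) - (n - 1)

    if g(0.0) <= 0.0:
        return 0.0                     # no evidence of overdispersion
    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:                 # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

phi_pl = pl_dispersion([2, 10, 6, 14], [1, 1, 1, 1])
phi_none = pl_dispersion([5, 5, 5, 5], [1, 1, 1, 1])
```

With equal library sizes the equation has the closed-form solution (s² − ȳ)/ȳ², where s² is the sample variance, which gives a simple check on the root finder.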
CML
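When $m_i = m$, $Z = Y_1 + \cdots + Y_n \sim \mathrm{NB}(nm\lambda, \phi n^{-1})$ is sufficient for $\lambda$.
$$l_{Y|Z=z}(\phi) = \sum_{i=1}^{n} \log\Gamma(y_i + \phi^{-1}) + \log\Gamma(n\phi^{-1}) - \log\Gamma(z + n\phi^{-1}) - n\log\Gamma(\phi^{-1}) \tag{5}$$
When the $m_i$ are not equal, there is no closed form. Handling that case is the paper's idea.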
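The conditional log-likelihood (5) is easy to maximize numerically. The sketch below uses a golden-section search over log φ on a fixed interval; the search interval, the example counts, and the function names are assumptions made for illustration, and the search presumes the likelihood is unimodal on that interval.

```python
import math

def cml_loglik(y, phi):
    """Equation (5): conditional log-likelihood of phi given z = sum(y)."""
    n, r, z = len(y), 1.0 / phi, sum(y)
    return (sum(math.lgamma(yi + r) for yi in y) + math.lgamma(n * r)
            - math.lgamma(z + n * r) - n * math.lgamma(r))

def golden_max(f, a, b, iters=100):
    """Golden-section search for the maximizer of f on [a, b]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if f(c) > f(d):          # maximum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                    # maximum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)

def cml_dispersion(y, lo=1e-4, hi=1e2):
    """Maximize equation (5) over phi in [lo, hi], searching on the log scale."""
    t = golden_max(lambda t: cml_loglik(y, math.exp(t)), math.log(lo), math.log(hi))
    return math.exp(t)

y = [0, 3, 12, 25, 1, 40]        # hypothetical counts, equal library sizes
phi_hat = cml_dispersion(y)
```

Note that λ does not appear anywhere: conditioning on the sufficient statistic z removes it, which is the point of the CML approach.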
Quantile-Adjusted CML
Let $m^* = \left( \prod_{i=1}^{n} m_i \right)^{1/n}$. Adjust the observed data as if $y_i \sim \mathrm{NB}(m^* \lambda, \phi)$, as follows:
1 Initialize $\phi$
2 Given $\phi$, estimate $\lambda$
3 Assuming $y_i \sim \mathrm{NB}(m_i \lambda, \phi)$, calculate
$$p_i = P(Y < y_i; m_i\lambda, \phi) + \frac{1}{2} P(Y = y_i; m_i\lambda, \phi), \quad i = 1, \dots, n. \tag{6}$$
4 Using a linear interpolation of the quantile function, calculate pseudodata from $\mathrm{NB}(m^*\lambda, \phi)$ having probability $p_i$
5 Calculate $\phi$ using CML on the pseudodata
6 Repeat steps 2-5 until $\phi$ converges
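The six steps above can be sketched as follows for a single tag in one treatment group. This is an illustrative reading of the algorithm, not the authors' implementation: the bisection solver for λ, the golden-section search bounds, the clamping of small probabilities to a pseudocount of 0, and all function names are assumptions.

```python
import itertools
import math

def nb_pmf(y, mu, phi):
    r = 1.0 / phi
    return math.exp(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                    - r * math.log1p(mu * phi) + y * math.log(mu / (r + mu)))

def mid_p_values(mu, phi):
    """Yield P(Y < k) + 0.5 * P(Y = k) for k = 0, 1, 2, ..."""
    below = 0.0
    for k in itertools.count():
        pk = nb_pmf(k, mu, phi)
        yield below + 0.5 * pk
        below += pk

def mid_p(y, mu, phi):
    """Equation (6) for an integer count y."""
    return next(itertools.islice(mid_p_values(mu, phi), y, None))

def interp_quantile(p, mu, phi, kmax=10000):
    """Invert the mid-p function, interpolating linearly between integers."""
    gen = mid_p_values(mu, phi)
    lo = next(gen)
    if p <= lo:
        return 0.0                       # assumption: clamp at zero
    for k in range(kmax):
        hi = next(gen)                   # mid-p value at k + 1
        if p < hi:
            return k + (p - lo) / (hi - lo)
        lo = hi
    return float(kmax)

def fit_lambda(y, m, phi):
    """Step 2: solve the NB score equation for lambda by bisection."""
    def score(lam):
        return sum((yi - mi * lam) / (1.0 + phi * mi * lam) for yi, mi in zip(y, m))
    lo, hi = 0.0, max(yi / mi for yi, mi in zip(y, m)) + 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cml_loglik(y, phi):
    n, r, z = len(y), 1.0 / phi, sum(y)
    return (sum(math.lgamma(v + r) for v in y) + math.lgamma(n * r)
            - math.lgamma(z + n * r) - n * math.lgamma(r))

def cml_dispersion(y, lo=1e-4, hi=1e2, iters=100):
    """Step 5: golden-section search for the CML maximizer over log(phi)."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    f = lambda t: cml_loglik(y, math.exp(t))
    a, b = math.log(lo), math.log(hi)
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return math.exp(0.5 * (a + b))

def qcml(y, m, phi=0.1, max_iter=50, tol=1e-8):
    m_star = math.exp(sum(math.log(mi) for mi in m) / len(m))  # geometric mean
    pseudo = list(y)
    for _ in range(max_iter):
        lam = fit_lambda(y, m, phi)                                      # step 2
        p = [mid_p(yi, mi * lam, phi) for yi, mi in zip(y, m)]           # step 3
        pseudo = [interp_quantile(pi, m_star * lam, phi) for pi in p]    # step 4
        new_phi = cml_dispersion(pseudo)                                 # step 5
        converged = abs(new_phi - phi) < tol
        phi = new_phi
        if converged:                                                    # step 6
            break
    return phi, pseudo

y_eq = [0, 3, 12, 25, 1, 40]
phi_eq, pseudo_eq = qcml(y_eq, [1.0] * 6)
phi_uneq, pseudo_uneq = qcml([5, 12, 30, 8], [10000.0, 20000.0, 40000.0, 15000.0])
```

When all library sizes are equal the adjustment leaves the data unchanged, so qCML reduces to plain CML; that property is a convenient sanity check on the quantile interpolation.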
Hypothesis Testing
Let $Z_{t1}$ and $Z_{t2}$ be the sums of pseudocounts for treatment 1 and treatment 2, respectively, over the $n_1$ and $n_2$ libraries. Under the null hypothesis,
$$Z_{tk} \sim \mathrm{NB}(n_k m^* \lambda_t, \phi n_k^{-1}), \quad k = 1, 2.$$
Construct an exact test similar to Fisher's exact test.
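A minimal sketch of such an exact test follows. It conditions on the total z = z1 + z2 and sums the conditional probabilities of all splits that are no more likely than the observed one, in the spirit of Fisher's exact test. It assumes integer totals (pseudocount sums would need rounding) and a known φ; once the total is conditioned on, the factors involving λ cancel, so only the sizes n_k/φ enter. The function name and tie tolerance are illustrative.

```python
import math

def exact_test(z1, z2, n1, n2, phi):
    """Two-sided exact p-value for H0: lambda_1 = lambda_2, given z1 + z2."""
    z = z1 + z2
    r1, r2 = n1 / phi, n2 / phi          # NB size parameters of the two sums

    def logw(a):
        # Log of P(Z1 = a) * P(Z2 = z - a), dropping factors constant in a.
        return (math.lgamma(a + r1) - math.lgamma(a + 1)
                + math.lgamma(z - a + r2) - math.lgamma(z - a + 1))

    logs = [logw(a) for a in range(z + 1)]
    top = max(logs)
    w = [math.exp(v - top) for v in logs]   # unnormalized conditional probabilities
    total = sum(w)
    obs = w[z1]
    # Sum over all outcomes at most as probable as the observed one.
    return sum(x for x in w if x <= obs * (1.0 + 1e-7)) / total

p_balanced = exact_test(10, 10, 2, 2, 0.5)
p_extreme = exact_test(0, 50, 2, 2, 0.1)
```

The small relative tolerance on `obs` guards against floating-point ties between outcomes that are equally probable in exact arithmetic.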
Simulation studies
Single tag with unequal library sizes
Multiple tags with unequal library sizes
Size of test: φ known
Size of test: φ estimated
Power consideration
Summary and discussion
Most reliable in terms of bias
Single tag with small n: all estimators are mediocre
As the number of tags increases with n held small, qCML is the best
For large n, CR performs about as well as qCML
Test performance
Only qCML achieves the nominal test size, compared with the Wald, LR, and score tests
Bias affects FDR more than power
qCML is generalizable to GLMs; more research is needed
Background
Moderated test
Simulation studies
Summary and discussion
Moderated statistical tests for assessing
differences in tag abundance
MARK D. ROBINSON
GORDON K. SMYTH
May 24, 2010
Outline
1 Background
2 Moderated test
3 Simulation studies
4 Summary and discussion
Background
Notation
$Y_{ij}$: count for class $i$ and library $j$, $j = 1, \dots, n_i$, $i = 1, 2$
Assume $Y_{ij} \sim \mathrm{NB}(\mu_{ij}, \phi)$
Let $\mu_{ij} = m_{ij} \lambda_i$, $H_0: \lambda_1 = \lambda_2$
Method review: tag-wise
t-test (Ryu et al., 2002): not appropriate for small n and non-normal data
Beta-binomial or overdispersed logistic regression (Baggerly et al., 2003)
Gamma-Poisson or overdispersed log-linear model (Lu et al., 2005)
Tag-wise
Difficult when n is small
Inefficient with common φ
Moderated test
Common dispersion
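$m_{ij}$ equal for each $i$
$$l_g(\phi) = \sum_{i=1}^{2} \left[ \sum_{j=1}^{n_i} \log\Gamma(y_{ij} + \phi^{-1}) + \log\Gamma(n_i\phi^{-1}) - \log\Gamma(z_i + n_i\phi^{-1}) - n_i\log\Gamma(\phi^{-1}) \right] \tag{1}$$
where $z_i = \sum_{j=1}^{n_i} y_{ij}$ is the total count for class $i$.
Common dispersion estimator: maximize $l_C(\phi) = \sum_{g=1}^{G} l_g(\phi)$
Unequal $m_{ij}$: qCML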
Weighted conditional log-likelihood (WL)
$$WL(\phi_g) = l_g(\phi_g) + \alpha\, l_C(\phi_g) \tag{2}$$
WL mimics a Bayesian hierarchical model: $l_C$ acts as the prior and $\alpha$ as its precision
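To illustrate how α moves the tag-wise estimate, the sketch below maximizes the weighted likelihood for one tag at two extremes: α = 0 (pure tag-wise CML) and a very large α (essentially the common dispersion). The three tags, the search interval, and the helper names are hypothetical, and equal library sizes are assumed so that the conditional log-likelihood applies directly.

```python
import math

def tag_loglik(groups, phi):
    """l_g(phi): conditional log-likelihood summed over the classes of one tag."""
    r, total = 1.0 / phi, 0.0
    for y in groups:
        n, z = len(y), sum(y)
        total += (sum(math.lgamma(v + r) for v in y) + math.lgamma(n * r)
                  - math.lgamma(z + n * r) - n * math.lgamma(r))
    return total

def maximize(f, lo=1e-4, hi=1e2, iters=100):
    """Golden-section search for the maximizer of f(phi), on the log(phi) scale."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(lo), math.log(hi)
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if f(math.exp(c)) > f(math.exp(d)):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return math.exp(0.5 * (a + b))

# Hypothetical data: each tag has counts for class 1 and class 2.
tags = [([3, 9], [20, 4]), ([0, 15], [2, 30]), ([7, 8], [10, 12])]

def common_loglik(phi):
    """l_C(phi): sum of the tag-wise conditional log-likelihoods."""
    return sum(tag_loglik(g, phi) for g in tags)

def wl_estimate(groups, alpha):
    """Maximize WL(phi_g) = l_g(phi_g) + alpha * l_C(phi_g)."""
    return maximize(lambda phi: tag_loglik(groups, phi) + alpha * common_loglik(phi))

phi_common = maximize(common_loglik)
phi_tagwise = wl_estimate(tags[0], 0.0)     # alpha = 0: no shrinkage
phi_shrunk = wl_estimate(tags[0], 1e6)      # huge alpha: common dispersion
```

Intermediate values of α give estimates that trade off the information in one tag against the information pooled across all tags.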
Example
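If $\hat\phi_g \mid \phi_g \sim N(\phi_g, \tau_g^2)$ and $\phi_g \sim N(\phi_0, \tau_0^2)$, $g = 1, \dots, G$, where the $\tau_g^2$ are known, then
$$\hat\phi_g^{B} = E(\phi_g \mid \hat\phi_g) = \frac{\hat\phi_g/\tau_g^2 + \phi_0/\tau_0^2}{1/\tau_g^2 + 1/\tau_0^2}.$$
WL estimator:
$$\hat\phi_g^{WL} = \frac{\hat\phi_g/\tau_g^2 + \alpha \sum_{i=1}^{G} \hat\phi_i/\tau_i^2}{1/\tau_g^2 + \alpha \sum_{i=1}^{G} 1/\tau_i^2}$$
$\hat\phi_g^{B} = \hat\phi_g^{WL}$ if
$$\phi_0 = \hat\phi_0 = \frac{\sum_{g=1}^{G} \hat\phi_g/\tau_g^2}{\sum_{g=1}^{G} 1/\tau_g^2}$$
and
$$1/\alpha = \sum_{g=1}^{G} \tau_0^2/\tau_g^2$$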
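The equivalence in this normal example can be verified numerically. In the sketch below the estimates, variances, and prior variance are made-up numbers; with φ0 set to the precision-weighted mean and α chosen as above, the Bayes posterior mean and the WL estimator agree for every tag.

```python
phi_hat = [0.8, 1.5, 0.3, 2.1]     # hypothetical tag-wise estimates
tau_sq = [0.5, 1.0, 0.25, 2.0]     # hypothetical known variances tau_g^2
tau0_sq = 0.4                      # hypothetical prior variance tau_0^2

prec = [1.0 / t for t in tau_sq]                                # 1 / tau_g^2
phi0 = sum(p * w for p, w in zip(phi_hat, prec)) / sum(prec)    # weighted mean
alpha = 1.0 / (tau0_sq * sum(prec))                             # 1/alpha = sum tau_0^2 / tau_g^2

def bayes_estimate(g):
    """Posterior mean of phi_g under the normal-normal model."""
    return (phi_hat[g] * prec[g] + phi0 / tau0_sq) / (prec[g] + 1.0 / tau0_sq)

def wl_estimate(g):
    """Weighted likelihood estimator for the same tag."""
    num = phi_hat[g] * prec[g] + alpha * sum(p * w for p, w in zip(phi_hat, prec))
    den = prec[g] + alpha * sum(prec)
    return num / den

pairs = [(bayes_estimate(g), wl_estimate(g)) for g in range(len(phi_hat))]
```

Each pair in `pairs` matches up to rounding error, because α Σ 1/τ_i² equals 1/τ_0² and α Σ φ̂_i/τ_i² equals φ0/τ_0² under these two conditions.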
Estimating procedure
1 Find the common dispersion estimator $\hat\phi_0$ which maximizes $l_C$.
2 Evaluate $S_g(\hat\phi_0)$ and $I_g(\hat\phi_0)$ for each tag
3 Estimate $\tau_0$ by solving
$$\sum_{g=1}^{G} \left[ \frac{S_g^2}{I_g (1 + I_g \tau_0^2)} - 1 \right] = 0.$$
If $\sum_{g=1}^{G} S_g^2 / I_g < G$ then $\tau_0 = 0$
4 Set $1/\alpha = \tau_0^2 \sum_{g=1}^{G} I_g$
5 Obtain weighted likelihood estimators $\tilde\phi_g$ by maximizing $WL(\phi_g)$
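Steps 3 and 4 amount to a one-dimensional root-finding problem. The sketch below takes the per-tag values S_g and I_g as given lists (how they are computed from the likelihood is not shown) and solves for τ0² by bisection; the function names and the convention of returning an infinite α when τ0² = 0 are assumptions for illustration.

```python
import math

def estimate_tau0_sq(S, I):
    """Step 3: solve sum_g [S_g^2 / (I_g (1 + I_g t)) - 1] = 0 for t = tau_0^2."""
    G = len(S)

    def h(t):
        # Left-hand side of the estimating equation; decreasing in t.
        return sum(s * s / (i * (1.0 + i * t)) for s, i in zip(S, I)) - G

    if h(0.0) <= 0.0:
        return 0.0                 # sum S_g^2 / I_g < G: set tau_0 = 0
    lo, hi = 0.0, 1.0
    while h(hi) > 0.0:             # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def prior_weight(S, I):
    """Step 4: 1/alpha = tau_0^2 * sum_g I_g (alpha is infinite if tau_0 = 0)."""
    t = estimate_tau0_sq(S, I)
    return math.inf if t == 0.0 else 1.0 / (t * sum(I))

# Hypothetical inputs with I_g = 1, so the root is mean(S_g^2) - 1 = 3.
S_demo = [1.0, 2.0, math.sqrt(7.0)]
I_demo = [1.0, 1.0, 1.0]
tau0_sq = estimate_tau0_sq(S_demo, I_demo)
alpha = prior_weight(S_demo, I_demo)
```

An infinite α corresponds to shrinking every tag all the way to the common dispersion, which matches the case where the data show no evidence that dispersions differ.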
Interpretation
If the dispersions are equal (all $\phi_g = \phi_0$), then $E(S_g^2) = I_g$, so the estimate of $\tau_0$ is approximately 0 and $\alpha$ is large.
If they differ, then $E(S_g) \neq 0$ and $S_g^2 > I_g$ on average, so $\tau_0^2 > 0$ and $\alpha$ is small.
Simulation studies
Model fitting: MSE
Hypothesis testing
Summary and discussion
Weighted conditional likelihood estimator
Squeezes the individual tag-wise dispersions towards the common dispersion
Uses a data-dependent prior and finds the MAP estimate: an approximate EB rule
Shrinkage algorithm of general application: uses S and I evaluated at the common dispersion.
Zero-inflated models
May 24, 2010
0-inflated models
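$$y_i \sim \begin{cases} 0 & \text{with prob. } \varphi_i \\ g(y_i \mid x_i) & \text{with prob. } 1 - \varphi_i \end{cases} \tag{1}$$
The prob. of $\{Y_i = y_i \mid x_i\}$:
$$P(Y_i = y_i \mid x_i, z_i) = \begin{cases} \varphi(\gamma' z_i) + \{1 - \varphi(\gamma' z_i)\}\, g(0 \mid x_i) & \text{if } y_i = 0 \\ \{1 - \varphi(\gamma' z_i)\}\, g(y_i \mid x_i) & \text{if } y_i > 0 \end{cases} \tag{2}$$
Here $\varphi_i$ is the zero-inflation probability and $\phi$ is the NB dispersion.
Poisson:
$$E(y_i \mid x_i, z_i) = \mu_i (1 - \varphi_i) \tag{3}$$
$$V(y_i \mid x_i, z_i) = \mu_i (1 - \varphi_i)(1 + \mu_i \varphi_i) \tag{4}$$
NB:
$$E(y_i \mid x_i, z_i) = \mu_i (1 - \varphi_i) \tag{5}$$
$$V(y_i \mid x_i, z_i) = \mu_i (1 - \varphi_i)(1 + \mu_i (\varphi_i + \phi)) \tag{6}$$
$V(y_i \mid x_i, z_i) > E(y_i \mid x_i, z_i)$, allowing overdispersion to be estimated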
R: package(pscl)
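The moment formulas (3) to (6) can be checked by direct summation of the zero-inflated pmf. The sketch below is in Python rather than R and does not use pscl; the parameter values, the truncation point, and the function names are illustrative assumptions, with `zero_prob` playing the role of the zero-inflation probability.

```python
import math

def poisson_pmf(y, mu):
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def nb_pmf(y, mu, phi):
    r = 1.0 / phi
    return math.exp(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                    - r * math.log1p(mu * phi) + y * math.log(mu / (r + mu)))

def zero_inflated(pmf, zero_prob):
    """Mix a point mass at 0 (weight zero_prob) with the count pmf g."""
    def f(y):
        base = (1.0 - zero_prob) * pmf(y)
        return zero_prob + base if y == 0 else base
    return f

def moments(f, upper=3000):
    """Mean and variance by summing the pmf over 0..upper-1 (truncated sum)."""
    probs = [f(y) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

mu, zero_prob, phi = 4.0, 0.3, 0.5
zip_mean, zip_var = moments(zero_inflated(lambda y: poisson_pmf(y, mu), zero_prob))
zinb_mean, zinb_var = moments(zero_inflated(lambda y: nb_pmf(y, mu, phi), zero_prob))

# Closed forms from equations (3)-(6).
expected_mean = mu * (1.0 - zero_prob)
expected_zip_var = expected_mean * (1.0 + mu * zero_prob)
expected_zinb_var = expected_mean * (1.0 + mu * (zero_prob + phi))
```

Both variances exceed the common mean, which is the overdispersion that the zero-inflation component contributes even in the Poisson case.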