# Next generation sequencing reading club


Modeling tag abundance with Negative Binomial models

May 24, 2010
Outline

"Small-sample estimation of negative binomial dispersion, with applications to SAGE data", Robinson and Smyth, 2008
"Moderated statistical tests for assessing differences in tag abundance", Robinson and Smyth, 2007
Zero-inflated negative binomial (NB) models

The first two papers develop NB models for small sample sizes n: the first estimates a common overdispersion, the second tag-wise dispersions. Zero-inflated NB models address a different set of problems.
Introduction
NB models
Estimation
Hypothesis Testing
Simulation studies
Summary and discussion

Small-sample estimation of negative binomial dispersion, with applications to SAGE data

MARK D. ROBINSON
GORDON K. SMYTH

May 24, 2010

Outline

1   Introduction
2   NB models
3   Estimation
4   Hypothesis Testing
5   Simulation studies
    Estimation
    Testing
6   Summary and discussion

Problem

Let Y1, ..., Yn1 be independent tag counts from treatment 1, following NB(µi = mi λ1, φ).
Let X1, ..., Xn2 be independent tag counts from treatment 2, following NB(µj = mj λ2, φ).
Here n1, n2 are small, mi, mj are the library sizes, and λ1, λ2 are the proportions of each library made up by a particular tag.

Goal
Test λ1 = λ2
Need to estimate the λs and φ

Why NB

With no replicates: a 2 × 2 table and a χ² test
Pooling small libraries overestimates significance without accounting for interlibrary variation
Poisson: limited modeling power (cannot model overdispersion)
GLMs are an option, but replication is rare because sequencing is expensive


NB models

Let Y be an NB random variable with mean µ and dispersion φ, denoted Y ∼ NB(µ, φ), such that

f(y; µ, φ) = P(Y = y) = [Γ(y + φ⁻¹) / (Γ(φ⁻¹) Γ(y + 1))] · [1/(1 + µφ)]^(φ⁻¹) · [µ/(φ⁻¹ + µ)]^y    (1)

E(Y) = µ, Var(Y) = µ + φµ²
NB ⇒ Poisson as φ → 0
Robust alternative to overdispersed logistic regression (beta-binomial)
φ accounts for the interlibrary variability
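As a concrete check of Eq. (1) and the moment formulas, here is a minimal sketch in pure Python (standard library only; the parameterization follows the slide, with `phi` the dispersion and size r = 1/φ — the example values of `mu` and `phi` are my own):

```python
import math

def nb_pmf(y, mu, phi):
    """P(Y = y) for Y ~ NB(mu, phi), Eq. (1); size parameter r = 1/phi."""
    r = 1.0 / phi
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(1.0 / (1.0 + mu * phi))  # [1/(1+mu*phi)]^(1/phi)
             + y * math.log(mu / (r + mu)))          # [mu/(1/phi+mu)]^y
    return math.exp(log_p)

# numerically confirm E(Y) = mu and Var(Y) = mu + phi*mu^2
mu, phi = 5.0, 0.2
probs = [nb_pmf(y, mu, phi) for y in range(2000)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
```

For these values the moments come out as µ = 5 and µ + φµ² = 10, matching the formulas above.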


Estimation approaches

MLE
Pseudo-likelihood (PL; Smyth, 2003)
Quasi-likelihood (QL; Nelder, 2000)
CR approximate conditional inference (Cox and Reid, 1987)
Conditional maximum likelihood (CML)

MLE

Equal library sizes: estimate λ and φ separately
Unequal library sizes: estimate jointly

PL

The PL estimator φ̂_PL solves

Σ_{i=1}^n (yᵢ − µ̂ᵢ)² / [µ̂ᵢ (1 + φ̂_PL µ̂ᵢ)] = n − 1    (2)

QL

The QL estimator φ̂_QL solves

2 Σ_{i=1}^n [ yᵢ log(yᵢ/µ̂ᵢ) − (yᵢ + φ̂_QL⁻¹) log((yᵢ + φ̂_QL⁻¹)/(µ̂ᵢ + φ̂_QL⁻¹)) ] = n − 1    (3)

CR

l_CR(φ) = l(λ̂, φ) − (1/2) log |j_λλ(φ, λ̂)|    (4)

CML

When mᵢ = m, Z = Y₁ + ··· + Yₙ ∼ NB(nmλ, φn⁻¹), and Z is sufficient for λ. The conditional log-likelihood is

l_{Y|Z=z}(φ) = Σ_{i=1}^n log Γ(yᵢ + φ⁻¹) + log Γ(nφ⁻¹) − log Γ(z + nφ⁻¹) − n log Γ(φ⁻¹)    (5)

When the mᵢ are unequal, there is no closed form; this motivates the paper's idea.

Let m* = (Π_{i=1}^n mᵢ)^(1/n), the geometric mean of the library sizes. Adjust the observed data as if yᵢ ∼ NB(m*λ, φ), as follows:

1   Initialize φ
2   Given φ, estimate λ
3   Assuming yᵢ ∼ NB(mᵢλ, φ), calculate
        pᵢ = P(Y < yᵢ; mᵢλ, φ) + (1/2) P(Y = yᵢ; mᵢλ, φ),  i = 1, ..., n    (6)
4   Using a linear interpolation of the quantile function, calculate pseudodata from NB(m*λ, φ) having probability pᵢ
5   Calculate φ using CML on the pseudodata
6   Repeat steps 2–5 until φ converges
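A toy sketch of steps 3–4 (the quantile adjustment) in pure Python. The mid-p inversion convention and all the example numbers (`m`, `y`, `lam`, `phi`) are illustrative assumptions of mine, not the paper's code:

```python
import math

def nb_pmf(y, mu, phi):
    """P(Y = y) for Y ~ NB(mu, phi)."""
    r = 1.0 / phi
    return math.exp(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                    + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def mid_p(y, mu, phi):
    """Step 3, Eq. (6): p = P(Y < y) + (1/2) P(Y = y) under NB(mu, phi)."""
    return sum(nb_pmf(k, mu, phi) for k in range(y)) + 0.5 * nb_pmf(y, mu, phi)

def pseudo_count(p, mu, phi):
    """Step 4 (one possible convention): invert the mid-p transform under
    NB(mu, phi), interpolating linearly within the mass at each integer."""
    cum, y = 0.0, 0
    while True:
        f = nb_pmf(y, mu, phi)
        if cum + f >= p:
            return y + (p - cum) / f - 0.5
        cum += f
        y += 1

# illustrative data: counts y_i from libraries of unequal size m_i
m = [40_000, 80_000, 120_000]
y = [4, 9, 15]
lam, phi = 1e-4, 0.1
m_star = math.prod(m) ** (1.0 / len(m))   # geometric-mean library size
pseudo = [pseudo_count(mid_p(yi, mi * lam, phi), m_star * lam, phi)
          for yi, mi in zip(y, m)]
```

With this convention the transform is the identity when a library size already equals m*: each count maps back to itself, so the adjustment only acts on unequal libraries.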


Hypothesis Testing

Let Z_t1 and Z_t2 be the sums of pseudocounts for treatment 1 and treatment 2, over their n₁ and n₂ libraries respectively. Under the null hypothesis,

Z_tk ∼ NB(n_k m* λ_t, φ n_k⁻¹),  k = 1, 2.

Construct an exact test similar to Fisher's exact test: condition on the total Z_t1 + Z_t2 and sum the probabilities of splits at least as extreme as the one observed.
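One way to realize such an exact test is sketched below. This is a hedged sketch: the function name, tolerance handling, and example numbers are mine, and in practice λ would be estimated from the pooled pseudocounts rather than supplied:

```python
import math

def nb_pmf(y, mu, phi):
    """P(Y = y) for Y ~ NB(mu, phi)."""
    r = 1.0 / phi
    return math.exp(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                    + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def exact_nb_test(z1, z2, n1, n2, mu_lib, phi):
    """Two-sided exact test: condition on z1 + z2 and sum the probabilities
    of all splits no more likely than the observed one (Fisher-style).
    mu_lib is the per-library null mean m* * lambda."""
    total = z1 + z2
    def pr(a):  # P(Z1 = a) * P(Z2 = total - a) under H0
        return (nb_pmf(a, n1 * mu_lib, phi / n1)
                * nb_pmf(total - a, n2 * mu_lib, phi / n2))
    obs = pr(z1)
    all_p = [pr(a) for a in range(total + 1)]
    return sum(p for p in all_p if p <= obs * (1 + 1e-9)) / sum(all_p)

# balanced split: every partition is at most as likely as the observed one
p_null = exact_nb_test(10, 10, 2, 2, 5.0, 0.1)
# same total, very uneven split: much smaller p-value
p_alt = exact_nb_test(2, 18, 2, 2, 5.0, 0.1)
```

Normalizing by the total probability of the conditioning event makes the two-sided p-value exactly 1 for a perfectly balanced split.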
Simulation studies

Single tag with unequal library sizes (simulation figure)

Multiple tags with unequal library sizes (simulation figure)


Size of test: φ known (figure)

Size of test: φ estimated (figure)

Power considerations (figure)


Summary and discussion

qCML is the most reliable estimator in terms of bias
For a single tag with small n, all estimators are mediocre
As the number of tags increases with n held small, qCML is the best
For big n, CR does about as well as qCML
Test performance: only the qCML-based exact test achieves the nominal test size, compared with the Wald, LR, and score tests
Estimator bias affects FDR more than power
qCML may generalize to GLMs; more research is needed
Background
Moderated test
Simulation studies
Summary and discussion

Moderated statistical tests for assessing differences in tag abundance

MARK D. ROBINSON
GORDON K. SMYTH

May 24, 2010

Outline

1   Background

2   Moderated test

3   Simulation studies

4   Summary and discussion

Notation

Y_ij: count for class i and library j, j = 1, ..., n_i, i = 1, 2
Assume Y_ij ∼ NB(µ_ij, φ)
Let µ_ij = m_ij λ_i; H₀: λ₁ = λ₂

Method review: tag-wise

t-test (Ryu et al., 2002): not appropriate for small n or non-normal data
beta-binomial, i.e. overdispersed logistic regression (Baggerly et al., 2003)
gamma-Poisson, i.e. overdispersed log-linear model (Lu et al., 2005)

Tag-wise estimation is difficult with small n, and inefficient compared with a common φ


Common dispersion

Assume the m_ij are equal within each class i. For tag g, conditioning on the class totals z_i = Σ_{j=1}^{n_i} y_ij gives

l_g(φ) = Σ_{i=1}^2 [ Σ_{j=1}^{n_i} log Γ(y_ij + φ⁻¹) + log Γ(n_i φ⁻¹) − log Γ(z_i + n_i φ⁻¹) − n_i log Γ(φ⁻¹) ]    (1)

Common dispersion estimator: maximize l_C(φ) = Σ_{g=1}^G l_g(φ)
Unequal m_ij: use qCML
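The common-dispersion likelihood above can be evaluated directly; a minimal sketch in pure Python, with a grid search standing in for a proper one-dimensional optimizer, and toy counts of my own invention:

```python
import math

def cond_ll_class(counts, phi):
    """Conditional log-likelihood of one class's counts given their sum
    (the bracketed term in Eq. (1)), for NB dispersion phi."""
    r = 1.0 / phi
    n, z = len(counts), sum(counts)
    return (sum(math.lgamma(y + r) for y in counts) + math.lgamma(n * r)
            - math.lgamma(z + n * r) - n * math.lgamma(r))

def l_g(tag_counts, phi):
    """Eq. (1): sum the conditional log-likelihoods over the two classes."""
    return sum(cond_ll_class(c, phi) for c in tag_counts)

# toy data: G = 3 tags, each a (class-1 counts, class-2 counts) pair
tags = [([3, 7, 5], [12, 9]),
        ([0, 1, 0], [2, 0]),
        ([20, 25, 18], [30, 41])]
grid = [0.01 * k for k in range(1, 301)]            # candidate phi values
lC = {phi: sum(l_g(t, phi) for t in tags) for phi in grid}
phi_common = max(lC, key=lC.get)                    # common dispersion estimate
```

The same `l_g` pieces are what the weighted likelihood on the next slide combines.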

Weighted conditional log-likelihood (WL)

WL(φ_g) = l_g(φ_g) + α l_C(φ_g)    (2)

WL mimics a Bayesian hierarchical model: l_C plays the role of the prior, and α its precision.

Example

Suppose φ̂_g | φ_g ∼ N(φ_g, τ_g²) and φ_g ∼ N(φ₀, τ₀²), g = 1, ..., G, with the τ_g² known. Then the posterior mean is

φ̂_g^B = E(φ_g | φ̂_g) = (φ̂_g/τ_g² + φ₀/τ₀²) / (1/τ_g² + 1/τ₀²)

WL estimator:

φ̂_g^WL = (φ̂_g/τ_g² + α Σ_{i=1}^G φ̂_i/τ_i²) / (1/τ_g² + α Σ_{i=1}^G 1/τ_i²)

φ̂_g^B = φ̂_g^WL if

φ₀ = φ̂₀ = (Σ_{g=1}^G φ̂_g/τ_g²) / (Σ_{g=1}^G 1/τ_g²)

and

1/α = Σ_{g=1}^G τ₀²/τ_g²
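The identity can be checked numerically; a small sketch with invented numbers (the `phi_hat`, `tau2`, and `tau02` values are purely illustrative):

```python
# with phi0 and alpha chosen by the two identities above, the WL
# estimator equals the Bayes posterior mean, tag by tag
phi_hat = [0.12, 0.30, 0.05, 0.22]   # tag-wise estimates phi_hat_g (made up)
tau2 = [0.04, 0.09, 0.01, 0.06]      # known tau_g^2 (made up)
tau02 = 0.02                          # prior variance tau_0^2 (made up)

phi0 = sum(p / t for p, t in zip(phi_hat, tau2)) / sum(1 / t for t in tau2)
alpha = 1.0 / sum(tau02 / t for t in tau2)

def phi_bayes(g):
    """Posterior mean E(phi_g | phi_hat_g) under the normal model."""
    return ((phi_hat[g] / tau2[g] + phi0 / tau02)
            / (1 / tau2[g] + 1 / tau02))

def phi_wl(g):
    """Weighted-likelihood estimator with weight alpha."""
    num = phi_hat[g] / tau2[g] + alpha * sum(p / t for p, t in zip(phi_hat, tau2))
    den = 1 / tau2[g] + alpha * sum(1 / t for t in tau2)
    return num / den

diffs = [abs(phi_bayes(g) - phi_wl(g)) for g in range(len(phi_hat))]
```

The two agree to floating-point precision, because α Σ φ̂_i/τ_i² collapses to φ₀/τ₀² and α Σ 1/τ_i² to 1/τ₀² under the stated choices.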

Estimating procedure

1   Find the common dispersion estimator φ̂₀ which maximizes l_C
2   Evaluate the score S_g(φ̂₀) and information I_g(φ̂₀) for each tag
3   Estimate τ₀² by solving
        Σ_{g=1}^G [ S_g² / (I_g (1 + I_g τ₀²)) − 1 ] = 0;
    if Σ_{g=1}^G S_g²/I_g < G, set τ̂₀² = 0
4   Set 1/α = τ₀² Σ_{g=1}^G I_g
5   Obtain the weighted likelihood estimators φ̃_g by maximizing WL(φ_g)
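Step 3 is a one-dimensional root-finding problem, since the left-hand side decreases in τ₀². A sketch solving it by bisection (the helper name and sample numbers are mine):

```python
def solve_tau0_sq(S, I):
    """Solve sum_g [S_g^2 / (I_g (1 + I_g t)) - 1] = 0 for t = tau_0^2.
    The left side is decreasing in t, so bisection applies."""
    G = len(S)
    if sum(s * s / i for s, i in zip(S, I)) < G:
        return 0.0  # no positive root: step 3's side condition
    def lhs(t):
        return sum(s * s / (i * (1 + i * t)) for s, i in zip(S, I)) - G
    lo, hi = 0.0, 1.0
    while lhs(hi) > 0:       # expand until the root is bracketed
        hi *= 2
    for _ in range(200):     # bisect to high precision
        mid = 0.5 * (lo + hi)
        if lhs(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# single-tag sanity check: S^2/(I(1 + I t)) = 1  =>  t = (S^2 - I)/I^2
t = solve_tau0_sq([3.0], [2.0])   # expected (9 - 2)/4 = 1.75
```

With G = 1 the root has the closed form above, which makes a convenient correctness check; the side condition returns 0 when the scores are too small to support a positive τ₀².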

Interpretation

If the dispersions are equal (all φ_g = φ₀), then E(S_g²) = I_g, so the estimate of τ₀² is near 0 and α is large (strong shrinkage toward the common dispersion).
If they differ, then E(S_g) ≠ 0, so S_g² > I_g on average, τ̂₀² > 0, and α is small (little shrinkage).

Simulation studies

Model fitting: MSE (figure)

Hypothesis testing (figure)

Summary and discussion

The weighted conditional likelihood estimator squeezes the individual tag-wise dispersions toward the common dispersion
It uses a data-dependent prior and finds the MAP estimate, approximating an empirical Bayes rule
The shrinkage algorithm is generally applicable: it only requires the score S and information I at the common dispersion
Zero-inflated models

May 24, 2010

0-inflated models

yᵢ ∼ { 0             with prob. ϕᵢ
     { g(yᵢ | xᵢ)    with prob. 1 − ϕᵢ    (1)

The probability of {Yᵢ = yᵢ | xᵢ} is

P(Yᵢ = yᵢ | xᵢ, zᵢ) = { ϕ(γ'zᵢ) + [1 − ϕ(γ'zᵢ)] g(0 | xᵢ)   if yᵢ = 0
                      { [1 − ϕ(γ'zᵢ)] g(yᵢ | xᵢ)            if yᵢ > 0    (2)

Poisson:

E(yᵢ | xᵢ, zᵢ) = µᵢ(1 − ϕᵢ)    (3)
V(yᵢ | xᵢ, zᵢ) = µᵢ(1 − ϕᵢ)(1 + µᵢϕᵢ)    (4)

NB:

E(yᵢ | xᵢ, zᵢ) = µᵢ(1 − ϕᵢ)    (5)
V(yᵢ | xᵢ, zᵢ) = µᵢ(1 − ϕᵢ)(1 + µᵢ(ϕᵢ + φ))    (6)

V(yᵢ | xᵢ, zᵢ) > E(yᵢ | xᵢ, zᵢ), so the models accommodate overdispersion
R: the pscl package (zeroinfl())
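A quick numeric check of Eqs. (5)–(6) for the zero-inflated NB, in pure Python (a constant mixing probability `pi0` stands in for ϕᵢ; the example values are my own):

```python
import math

def nb_pmf(y, mu, phi):
    """P(Y = y) for Y ~ NB(mu, phi)."""
    r = 1.0 / phi
    return math.exp(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                    + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def zinb_pmf(y, mu, phi, pi0):
    """Eq. (2) with a constant zero-inflation probability pi0."""
    p = (1.0 - pi0) * nb_pmf(y, mu, phi)
    return p + pi0 if y == 0 else p

mu, phi, pi0 = 5.0, 0.2, 0.3
probs = [zinb_pmf(y, mu, phi, pi0) for y in range(2000)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
```

For these values, Eq. (5) gives µ(1 − ϕ) = 3.5 and Eq. (6) gives µ(1 − ϕ)(1 + µ(ϕ + φ)) = 12.25, and the numerical moments match: the variance exceeds the mean, as claimed above.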
