Next generation sequencing reading club

Modeling tag abundance with Negative Binomial models

          Reader: Sean Wang



             May 24, 2010
Outline



      "Small-sample estimation of negative binomial dispersion,
      with applications to SAGE data", Robinson and Smyth,
      2008
      "Moderated statistical tests for assessing differences in tag
      abundance", Robinson and Smyth, 2007
      Zero-inflated negative binomial(NB) models

  First two papers: NB models for a small number of libraries, n
  The first assumes a common overdispersion; the second estimates it tag-wise
  Zero-inflated NB models address a different problem




Small-sample estimation of negative binomial
 dispersion, with applications to SAGE data

                 MARK D. ROBINSON
                 GORDON K. SMYTH




Outline

  1   Introduction

  2   NB models
  3   Estimation

  4   Hypothesis Testing
  5   Simulation studies
        Estimation
        Testing
  6   Summary and discussion


Problem


  Let Y_1, ..., Y_n1 be independent tag counts from treatment 1,
  with Y_i ∼ NB(µ_i = m_i λ_1, φ).
  Let X_1, ..., X_n2 be independent tag counts from treatment 2,
  with X_j ∼ NB(µ_j = m_j λ_2, φ).
  n1 and n2 are small; m_i and m_j are the library sizes; λ_1 and λ_2 are
  the proportions of the library that are a particular tag.

  Goal
  Test H0: λ_1 = λ_2
  Need to estimate λ_1, λ_2 and φ


Why NB



    No replicates: 2 × 2 table and χ² test
    Pooling small libraries: overstates significance because
    interlibrary variation is ignored
    Poisson: variance is tied to the mean, so limited modeling power
    GLMs: need replication, which is rare due to expense


NB models

  Let Y be an NB random variable with mean µ and dispersion φ,
  denoted Y ∼ NB(µ, φ), with probability mass function

    f(y; \mu, \phi) = P(Y = y)
      = \frac{\Gamma(y + \phi^{-1})}{\Gamma(\phi^{-1})\,\Gamma(y + 1)}
        \left(\frac{1}{1 + \mu\phi}\right)^{\phi^{-1}}
        \left(\frac{\mu}{\phi^{-1} + \mu}\right)^{y}        (1)

      E(Y) = µ, Var(Y) = µ + φµ²
      NB → Poisson as φ → 0
      Robust alternative to overdispersed logistic
      regression (beta-binomial)
      φ accounts for the interlibrary variability
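
A minimal numerical check of equation (1), assuming only the Python standard library. The function names and the values µ = 5, φ = 0.5 are illustrative choices, not taken from the papers. The sketch evaluates the pmf on the log scale, confirms that the mean and variance agree with µ and µ + φµ², and compares a very small φ with the Poisson pmf.

```python
from math import exp, lgamma, log, log1p

def nb_logpmf(y, mu, phi):
    """Log of equation (1): NB with mean mu and dispersion phi."""
    r = 1.0 / phi  # phi^{-1}
    return (lgamma(y + r) - lgamma(r) - lgamma(y + 1)
            - r * log1p(mu * phi)        # log of (1 / (1 + mu*phi))^(1/phi)
            + y * log(mu / (r + mu)))    # log of (mu / (phi^{-1} + mu))^y

def poisson_logpmf(y, mu):
    """Log pmf of a Poisson variable with mean mu."""
    return y * log(mu) - mu - lgamma(y + 1)

def nb_moments(mu, phi, y_max=5000):
    """Total probability, mean and variance by direct summation up to y_max."""
    probs = [exp(nb_logpmf(y, mu, phi)) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var

# Illustrative values: mu = 5, phi = 0.5, so mu + phi * mu^2 = 17.5
total, mean, var = nb_moments(mu=5.0, phi=0.5)
print(total, mean, var)

# With phi close to 0 the NB pmf should be close to the Poisson pmf
print(exp(nb_logpmf(3, 5.0, 1e-6)), exp(poisson_logpmf(3, 5.0)))
```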


Estimation approaches



     Maximum likelihood (MLE)
     Pseudo-likelihood (PL; Smyth, 2003)
     Quasi-likelihood (QL; Nelder, 2000)
     Cox-Reid (CR) approximate conditional inference (Cox and Reid, 1987)
     Conditional maximum likelihood (CML)
     Quantile-adjusted CML (qCML)


MLE




      Equal library sizes: estimate λ and φ separately
      Unequal library sizes: estimate λ and φ jointly




PL
    \sum_{i=1}^{n} \frac{(y_i - \hat\mu_i)^2}{\hat\mu_i \,(1 + \hat\phi_{PL}\,\hat\mu_i)} = n - 1        (2)
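
As a worked special case of the PL equation (an illustration, not from the slides): if all library sizes are equal, every fitted mean is the sample mean ȳ, and the equation can be solved directly, giving φ = (s² − ȳ)/ȳ², where s² is the sample variance with divisor n − 1. The sketch below assumes equal library sizes, uses hypothetical counts, and truncates the estimate at 0, which is an added assumption.

```python
def pl_dispersion_equal_libs(counts):
    """Solve the PL equation when every fitted mean equals the sample mean."""
    n = len(counts)
    ybar = sum(counts) / n
    s2 = sum((y - ybar) ** 2 for y in counts) / (n - 1)  # sample variance
    return max((s2 - ybar) / ybar ** 2, 0.0)             # truncation at 0 is an assumption

def pl_equation_lhs(counts, phi):
    """Left-hand side of the PL equation with mu_i = ybar for all i."""
    ybar = sum(counts) / len(counts)
    return sum((y - ybar) ** 2 / (ybar * (1 + phi * ybar)) for y in counts)

counts = [3, 10, 6, 17, 4]          # hypothetical tag counts, n = 5
phi_pl = pl_dispersion_equal_libs(counts)
print(phi_pl)                        # (32.5 - 8) / 64 = 0.3828125
print(pl_equation_lhs(counts, phi_pl), len(counts) - 1)  # both sides of the equation
```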


QL

    2 \sum_{i=1}^{n} \left[ y_i \log\frac{y_i}{\hat\mu_i}
        - (y_i + \hat\phi_{QL}^{-1}) \log\frac{y_i + \hat\phi_{QL}^{-1}}{\hat\mu_i + \hat\phi_{QL}^{-1}} \right] = n - 1        (3)


CR
    l_{CR}(\phi) = l(\hat\lambda, \phi) - \frac{1}{2} \log\left| j_{\lambda\lambda}(\phi, \hat\lambda) \right|        (4)




CML
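When m_i = m for all i, Z = Y_1 + ... + Y_n ∼ NB(nmλ, φ/n) is sufficient
for λ, and the conditional log-likelihood for φ given Z = z is

    l_{Y|Z=z}(\phi) = \left[ \sum_{i=1}^{n} \log\Gamma(y_i + \phi^{-1}) \right] + \log\Gamma(n\phi^{-1})
                      - \log\Gamma(z + n\phi^{-1}) - n \log\Gamma(\phi^{-1})        (5)

When the m_i are not equal, there is no closed form. Handling that case is the paper's idea.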
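
A minimal sketch of CML for the equal-library-size case: evaluate the conditional log-likelihood above and maximize it over φ. The golden-section search on log φ, the search bounds and the counts are illustrative assumptions, not the paper's implementation, and the search presumes a single interior maximum.

```python
from math import exp, lgamma, log, sqrt

def cml_loglik(counts, phi):
    """Conditional log-likelihood of phi given the total count (equal library sizes)."""
    n, z, r = len(counts), sum(counts), 1.0 / phi
    return (sum(lgamma(y + r) for y in counts) + lgamma(n * r)
            - lgamma(z + n * r) - n * lgamma(r))

def maximize_cml(counts, lo=1e-4, hi=10.0, iters=100):
    """Golden-section search on log(phi); bounds are illustrative assumptions."""
    g = (sqrt(5) - 1) / 2
    a, b = log(lo), log(hi)
    for _ in range(iters):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if cml_loglik(counts, exp(c)) < cml_loglik(counts, exp(d)):
            a = c   # maximum lies to the right of c
        else:
            b = d   # maximum lies to the left of d
    return exp((a + b) / 2)

counts = [3, 10, 6, 17, 4]   # hypothetical counts from libraries of equal size
phi_cml = maximize_cml(counts)
print(phi_cml, cml_loglik(counts, phi_cml))
```

For counts that look underdispersed relative to the Poisson, the maximum would sit at the lower bound, so the bound itself should be read as "no detectable overdispersion" rather than as an estimate.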


Quantile-Adjusted CML
  Let m^* = \left(\prod_{i=1}^{n} m_i\right)^{1/n}, the geometric mean of the library sizes.
  Adjust the observed data as if y_i ∼ NB(m* λ, φ), as follows:
    1  Initialize φ
    2  Given φ, estimate λ
    3  Assuming y_i ∼ NB(m_i λ, φ), calculate

         p_i = P(Y < y_i; m_i λ, φ) + \frac{1}{2} P(Y = y_i; m_i λ, φ),   i = 1, ..., n.        (6)

    4  Using a linear interpolation of the quantile function,
       calculate pseudodata from NB(m* λ, φ) having probability
       p_i
    5  Calculate φ using CML on the pseudodata
    6  Repeat steps 2-5 until φ converges
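
Steps 3 and 4 can be sketched as follows, for fixed λ and φ. This is one possible reading of the linear interpolation, not necessarily the authors' scheme: the pseudo-count is found by inverting the mid-probability curve of NB(m* λ, φ), interpolating linearly between consecutive integers, and clamping at 0 when p_i falls below the curve's value at 0. All function names, the clamping rule and the example numbers are assumptions.

```python
from math import exp, lgamma, log, log1p

def nb_pmf(y, mu, phi):
    """NB pmf with mean mu and dispersion phi."""
    r = 1.0 / phi
    return exp(lgamma(y + r) - lgamma(r) - lgamma(y + 1)
               - r * log1p(mu * phi) + y * log(mu / (r + mu)))

def mid_p(y, mu, phi):
    """Step 3: P(Y < y) + 0.5 * P(Y = y) for an integer count y."""
    return sum(nb_pmf(k, mu, phi) for k in range(y)) + 0.5 * nb_pmf(y, mu, phi)

def pseudo_count(p, mu, phi, k_max=100000):
    """Step 4: invert the mid-p curve of NB(mu, phi) at p by linear interpolation."""
    below = 0.0       # running value of P(Y < k)
    prev_mid = None   # mid-p value at k - 1
    for k in range(k_max):
        pk = nb_pmf(k, mu, phi)
        mid = below + 0.5 * pk
        if p <= mid:
            if prev_mid is None:   # p below the curve at 0: clamp (assumption)
                return 0.0
            return (k - 1) + (p - prev_mid) / (mid - prev_mid)
        prev_mid = mid
        below += pk
    raise ValueError("p is too close to 1 for the chosen k_max")

def quantile_adjust(counts, lib_sizes, lam, phi):
    """Map each count to a pseudo-count at the geometric-mean library size."""
    m_star = exp(sum(log(m) for m in lib_sizes) / len(lib_sizes))
    return [pseudo_count(mid_p(y, m * lam, phi), m_star * lam, phi)
            for y, m in zip(counts, lib_sizes)]

# Hypothetical example: two libraries of different size, lam and phi held fixed
pseudo = quantile_adjust([10, 40], [10000, 40000], lam=0.001, phi=0.2)
print(pseudo)
```

When all library sizes are equal, the pseudo-counts reproduce the observed counts, which is a useful sanity check. Step 5 would then pass the pseudo-counts to a CML maximizer and steps 2-5 would be iterated.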


Hypothesis Testing




  Let Z_t1 and Z_t2 be the sums of the pseudocounts for tag t over the
  n1 libraries of treatment 1 and the n2 libraries of treatment 2,
  respectively. Under the null hypothesis,
  Z_tk ∼ NB(n_k m* λ_t, φ/n_k), k = 1, 2. Conditioning on the total
  Z_t1 + Z_t2, construct an exact test similar to Fisher's exact test.
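
A hedged sketch of such an exact test. Under the null the two sums share the same mean per library, so the distribution of Z_t1 given the total does not depend on λ_t. The two-sided p-value here sums the conditional probabilities of all splits of the total that are no more likely than the observed one, by analogy with Fisher's exact test. Integer (rounded) sums, a known φ, and the function name are assumptions of this illustration.

```python
from math import exp, lgamma

def exact_nb_test(z1, z2, n1, n2, phi):
    """Two-sided exact p-value for equal abundance, conditioning on z1 + z2."""
    r1, r2 = n1 / phi, n2 / phi   # sum of n_k libraries has dispersion phi / n_k
    z = z1 + z2

    def log_weight(a):
        # Unnormalized log P(Z1 = a, Z2 = z - a); terms involving lambda cancel
        b = z - a
        return (lgamma(a + r1) - lgamma(r1) - lgamma(a + 1)
                + lgamma(b + r2) - lgamma(r2) - lgamma(b + 1))

    logs = [log_weight(a) for a in range(z + 1)]
    top = max(logs)
    weights = [exp(v - top) for v in logs]
    total = sum(weights)
    probs = [w / total for w in weights]
    p_obs = probs[z1]
    # Small relative tolerance so that exact ties count as "no more likely"
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))

# Hypothetical sums of pseudo-counts over 3 libraries per treatment
print(exact_nb_test(20, 20, 3, 3, phi=0.2))   # balanced split
print(exact_nb_test(5, 60, 3, 3, phi=0.2))    # strongly unbalanced split
```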


Single tag with unequal library sizes


Multiple tags with unequal library sizes


Size of test: φ known


Size of test: φ estimated


Power consideration
Summary and discussion

Most reliable in terms of bias
    Single tag with small n: all estimators are mediocre
    As the number of tags increases with n held small, qCML is the best
    For large n, CR performs about as well as qCML
Test performance
    Only the qCML-based test achieves the nominal test size, compared
    with the Wald, LR, and score tests
    Bias in the dispersion estimate affects the FDR more than power
qCML may generalize to GLMs; more research is needed




Moderated statistical tests for assessing
    differences in tag abundance

               MARK D. ROBINSON
               GORDON K. SMYTH




Outline



  1   Background


  2   Moderated test


  3   Simulation studies


  4   Summary and discussion


Notation




     Y_ij: count for class i and library j, j = 1, ..., n_i, i = 1, 2
     Assume Y_ij ∼ NB(µ_ij, φ)
     Let µ_ij = m_ij λ_i; H0: λ_1 = λ_2


Method review: tag-wise



     t-test (Ryu et al., 2002): not appropriate for small-n,
     non-normal data
     Beta-binomial or overdispersed logistic (Baggerly et al.,
     2003)
     Gamma-Poisson or overdispersed log-linear (Lu et al.,
     2005)
         Dispersion is estimated tag-wise
         Difficult when n is small
         Inefficient if φ is common to all tags


Common dispersion


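     m_ij equal for each i; for tag g, with z_i = \sum_{j=1}^{n_i} y_{ij},

         l_g(\phi) = \sum_{i=1}^{2} \left[ \sum_{j=1}^{n_i} \log\Gamma(y_{ij} + \phi^{-1}) + \log\Gamma(n_i\phi^{-1})
                     - \log\Gamma(z_i + n_i\phi^{-1}) - n_i \log\Gamma(\phi^{-1}) \right]        (1)

     Common dispersion estimator: maximize l_C(\phi) = \sum_{g=1}^{G} l_g(\phi)
     Unequal m_ij: use qCML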


Weighted conditional log-likelihood (WL)




                     WL(φ_g) = l_g(φ_g) + α l_C(φ_g)           (2)
  WL mimics a Bayesian hierarchical model: l_C plays the role of the prior and α of the prior precision


Example


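 If \hat\phi_g | \phi_g ∼ N(\phi_g, \tau_g^2) and \phi_g ∼ N(\phi_0, \tau_0^2), g = 1, ..., G, where the \tau_g^2 are
 known, then

     \hat\phi_g^{B} = E(\phi_g | \hat\phi_g) = \frac{\hat\phi_g/\tau_g^2 + \phi_0/\tau_0^2}{1/\tau_g^2 + 1/\tau_0^2}.

 WL estimator:

     \hat\phi_g^{WL} = \frac{\hat\phi_g/\tau_g^2 + \alpha \sum_{i=1}^{G} \hat\phi_i/\tau_i^2}{1/\tau_g^2 + \alpha \sum_{i=1}^{G} 1/\tau_i^2}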




\hat\phi_g^{B} = \hat\phi_g^{WL} if

     \phi_0 = \hat\phi_0 = \frac{\sum_{g=1}^{G} \hat\phi_g/\tau_g^2}{\sum_{g=1}^{G} 1/\tau_g^2}

and

     1/\alpha = \sum_{g=1}^{G} \tau_0^2/\tau_g^2
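
A small numerical check of this equivalence in the normal example. The tag-wise estimates and variances below are made-up numbers used only for illustration. With φ_0 set to the precision-weighted mean and α chosen as above, the Bayes posterior mean and the WL estimator agree for every tag.

```python
def bayes_posterior_mean(phi_hat_g, tau2_g, phi0, tau2_0):
    """Posterior mean of phi_g in the normal-normal example."""
    return (phi_hat_g / tau2_g + phi0 / tau2_0) / (1 / tau2_g + 1 / tau2_0)

def wl_estimate(g, phi_hat, tau2, alpha):
    """WL estimator for tag g in the normal example."""
    num = phi_hat[g] / tau2[g] + alpha * sum(p / t for p, t in zip(phi_hat, tau2))
    den = 1 / tau2[g] + alpha * sum(1 / t for t in tau2)
    return num / den

# Hypothetical tag-wise estimates, their known variances, and a prior variance
phi_hat = [0.10, 0.35, 0.22, 0.60]
tau2 = [0.04, 0.09, 0.02, 0.16]
tau2_0 = 0.05

# Conditions under which the two estimators coincide
phi0 = sum(p / t for p, t in zip(phi_hat, tau2)) / sum(1 / t for t in tau2)
alpha = 1 / sum(tau2_0 / t for t in tau2)

for g in range(len(phi_hat)):
    print(bayes_posterior_mean(phi_hat[g], tau2[g], phi0, tau2_0),
          wl_estimate(g, phi_hat, tau2, alpha))
```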


Estimating procedure

   1   Find the common dispersion estimator \hat\phi_0, which maximizes
       l_C.
   2   Evaluate S_g(\hat\phi_0) and I_g(\hat\phi_0) for each tag
   3   Estimate \tau_0 by solving

           \sum_{g=1}^{G} \left[ \frac{S_g^2}{I_g (1 + I_g \tau_0^2)} - 1 \right] = 0.

       If \sum_{g=1}^{G} S_g^2 / I_g < G, then \tau_0 = 0
   4   Set 1/\alpha = \tau_0^2 \sum_{g=1}^{G} I_g
   5   Obtain weighted likelihood estimators \tilde\phi_g by maximizing
       WL(\phi_g)
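
Steps 3 and 4 can be sketched as below, taking the S_g and I_g values as given. The left-hand side of the estimating equation decreases in τ_0², so a simple bisection is enough. The function names and the input numbers are hypothetical.

```python
def estimate_tau0_sq(scores, infos, iters=200):
    """Solve sum_g [S_g^2 / (I_g (1 + I_g t)) - 1] = 0 for t = tau_0^2 by bisection."""
    def f(t):
        return sum(s * s / (i * (1 + i * t)) - 1.0 for s, i in zip(scores, infos))

    if f(0.0) <= 0:          # sum S_g^2 / I_g <= G: no positive root, so tau_0 = 0
        return 0.0
    lo, hi = 0.0, 1.0
    while f(hi) > 0:         # f is decreasing in t; expand until the root is bracketed
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def prior_weight(tau0_sq, infos):
    """Step 4: 1/alpha = tau_0^2 * sum_g I_g (alpha is infinite when tau_0 = 0)."""
    return float("inf") if tau0_sq == 0 else 1.0 / (tau0_sq * sum(infos))

# Hypothetical scores and informations for G = 3 tags
tau0_sq = estimate_tau0_sq([3.0, -4.0, 5.0], [2.0, 2.0, 2.0])
print(tau0_sq, prior_weight(tau0_sq, [2.0, 2.0, 2.0]))

# Scores that are small relative to the informations give tau_0 = 0
print(estimate_tau0_sq([1.0, -1.0, 0.5], [2.0, 2.0, 2.0]))
```

In the first example the equation reduces to 25 / (1 + 2 τ_0²) = 3, so τ_0² = 11/3 and α = 1/22.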


Interpretation




      If the dispersions are equal (all φ_g = φ_0), then E(S_g²) = I_g, so
      the estimate of τ_0 is close to 0 and α is large.
      If they differ, then E(S_g) ≠ 0 and S_g² > I_g on average, so
      τ_0² > 0 and α is small.


Model fitting: MSE


Hypothesis testing
Summary and discussion

Weighted conditional likelihood estimator
Squeezes the individual tag-wise dispersions towards the common
dispersion
Uses a data-dependent prior and finds the MAP estimate: an approximate empirical Bayes (EB) rule
The shrinkage algorithm is of general application: it needs only S and I evaluated at the
common dispersion.
Zero-inflated models



0-inflated models




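    y_i \sim \begin{cases} 0 & \text{with prob. } \varphi_i \\ g(y_i | x_i) & \text{with prob. } 1 - \varphi_i \end{cases}        (1)

  The prob. of {Y_i = y_i | x_i}, with ϕ_i = ϕ(γ′z_i):

    P(Y_i = y_i | x_i, z_i) = \begin{cases}
        \varphi(\gamma' z_i) + \{1 - \varphi(\gamma' z_i)\}\, g(0 | x_i) & \text{if } y_i = 0 \\
        \{1 - \varphi(\gamma' z_i)\}\, g(y_i | x_i) & \text{if } y_i > 0
    \end{cases}        (2)

Poisson:

    E(y_i | x_i, z_i) = \mu_i (1 - \varphi_i)        (3)
    V(y_i | x_i, z_i) = \mu_i (1 - \varphi_i)(1 + \mu_i \varphi_i)        (4)

NB:

    E(y_i | x_i, z_i) = \mu_i (1 - \varphi_i)        (5)
    V(y_i | x_i, z_i) = \mu_i (1 - \varphi_i)(1 + \mu_i (\varphi_i + \phi))        (6)

V(y_i | x_i, z_i) > E(y_i | x_i, z_i), so the models allow for overdispersion
R: pscl package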
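
The mean and variance formulas (3) to (6) can be checked by direct summation of the zero-inflated pmf (2). The sketch below uses only the Python standard library, with a constant zero-inflation probability and illustrative values µ = 4, ϕ = 0.3, φ = 0.5; none of these numbers or function names come from the slides.

```python
from math import exp, lgamma, log, log1p

def poisson_pmf(y, mu):
    """Poisson pmf with mean mu."""
    return exp(y * log(mu) - mu - lgamma(y + 1))

def nb_pmf(y, mu, phi):
    """NB pmf with mean mu and dispersion phi."""
    r = 1.0 / phi
    return exp(lgamma(y + r) - lgamma(r) - lgamma(y + 1)
               - r * log1p(mu * phi) + y * log(mu / (r + mu)))

def zero_inflated_moments(base_pmf, zero_prob, y_max=2000):
    """Mean and variance of the zero-inflated pmf, summed up to y_max."""
    probs = [(1 - zero_prob) * base_pmf(y) for y in range(y_max + 1)]
    probs[0] += zero_prob            # extra point mass at zero
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

mu, zero_prob, phi = 4.0, 0.3, 0.5   # illustrative values
zip_mean, zip_var = zero_inflated_moments(lambda y: poisson_pmf(y, mu), zero_prob)
zinb_mean, zinb_var = zero_inflated_moments(lambda y: nb_pmf(y, mu, phi), zero_prob)

# Closed forms: mean mu(1 - zp); variances mu(1 - zp)(1 + mu zp) and mu(1 - zp)(1 + mu(zp + phi))
print(zip_mean, zip_var, mu * (1 - zero_prob) * (1 + mu * zero_prob))
print(zinb_mean, zinb_var, mu * (1 - zero_prob) * (1 + mu * (zero_prob + phi)))
```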

						