Identification of differential gene expression from MPSS data

Identification of differential gene expression from Massively Parallel
Signature Sequencing (MPSS) data based on bootstrap percentile
confidence intervals

Toni Reverter
CRC for Innovative Dairy Products
Bioinformatics Group
CSIRO Livestock Industries
Queensland Bioscience Precinct
306 Carmody Rd., St. Lucia
QLD 4067, Australia


                                                InCoB 2004 – Auckland NZ
  Differential gene expression from MPSS data based on bootstrap percentile confidence intervals




                                MPSS Technology

 • Another (very good) sequencing method
 • Identifies (nearly) all the DNA molecules in a given sample
 • Each analysis involves > $10^6$ transcripts (big range!)
 • High sensitivity (identification of very low-abundance transcripts)
 • More information:
            Brenner et al. (2000)
            www.lynxgen.com (Lynx Therapeutics, Inc.)

 …Statistical analysis of MPSS data?






                              MPSS Tag Data
           SIGNATURE                           CANCER (tpm)           NORMAL (tpm)

           GATCGTCCTCTCCCCCG                        15                     17
           GATCCCTGCCCCACCCC                         6                     19
           GATCCCAACCTTTTGTA                         0                      6
           GATCGCCCTCGTGCTGA                        13                     19
           GATCTATGGCATCCAAG                         6                      1
           GATCTTGGCCTTCACAT                        10                     19
           GATCCCAGGCTGCTTCT                         9                      0
           GATCTTGGCTTCTCAAC                        24                      1
           GATCTGCACAGATGCCT                        17                     18
           GATCAACGATATCCACA                         3                     10
           GATCGAGGACTGTGTGG                       290                    156
           GATCAAGCGGGAGCAGA                        78                     91
           GATCCCAACAGGCTCAA                         4                      0
                    …………

                              SUM:        1,260,230                1,280,977
                              MIN:                0                        0
                              MAX:           31,243                  101,215








                                  MPSS Tag Distribution

                  Jongeneel et al. (MPSS paper)    MPSS Test Data         Tu et al. (cDNA noise paper)
                  PNAS 2003, 100:4702              No. Tags = 25,503      PNAS 2002, 99:14031
                                                                          $f(x) = \exp\left(-\frac{2x^2}{1+x}\right)$

                  (x = log10(tpm), shown in parentheses)

   tpm   (log10)    N Tags        %                S1 (%)    S2 (%)       f(x) (%)

>     1   (0.0)     27,965    100.00               100.00    100.00       100.00
      5   (0.7)     15,145     54.16                57.14     49.87        56.19
     10   (1.0)     10,519     37.61                36.11     33.66        36.79
     50   (1.7)      3,261     11.66                10.89     10.74        11.76
    100   (2.0)      1,719      6.15                 5.73      5.67         6.95
    500   (2.7)        298      1.07                 1.21      1.13         1.94
  1,000   (3.0)        154      0.55                 0.57      0.55         1.11
  5,000   (3.7)         26      0.09                 0.15      0.11         0.29
 10,000   (4.0)          7      0.02                 0.05      0.05         0.16
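
The last column of the table can be checked numerically. The sketch below assumes the noise-paper curve is f(x) = exp(-2x^2 / (1 + x)) with x = log10(tpm), which is a reading of the formula on the slide; the function name `noise_model_percent` is made up for illustration.

```python
import math


def noise_model_percent(log10_tpm):
    """Percentage of tags expected above a threshold, assuming
    f(x) = exp(-2 x^2 / (1 + x)) with x = log10(tpm)."""
    x = log10_tpm
    return 100.0 * math.exp(-2.0 * x * x / (1.0 + x))


# Thresholds as listed in parentheses in the table (log10 of 1, 5, 10, ... tpm).
for x in (0.0, 0.7, 1.0, 1.7, 2.0, 2.7, 3.0, 3.7, 4.0):
    print(f"log10(tpm) = {x:3.1f}  ->  {noise_model_percent(x):6.2f} %")
```

The printed values should land close to the f(x) column, which is the evidence for the assumed form of the curve.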






                                                   MPSS Statistics Model
                 Sample 1   Sample 2
   Gene 1        n11        n12        N1.
   Others        n21        n22        N2.
                 N.1        N.2        N..

   $p_1 = n_{11}/N_{.1}$,  $p_2 = n_{12}/N_{.2}$,  $p_0 = 0.5\,(p_1 + p_2)$

 • Categorical data:
   $\chi^2 = \dfrac{N_{..}\,(n_{11}n_{22} - n_{12}n_{21})^2}{N_{1.}\,N_{2.}\,N_{.1}\,N_{.2}}$

 • Normal approximation for binomial proportions:
   $Z = \dfrac{p_1 - p_2}{\sqrt{p_0\,(1 - p_0)\,(1/N_{.1} + 1/N_{.2})}}$

 • Fisher's exact (?) test:
   $P = \dfrac{N_{1.}!\,N_{2.}!\,N_{.1}!\,N_{.2}!}{N_{..}!\,n_{11}!\,n_{12}!\,n_{21}!\,n_{22}!}$

 • Audic & Claverie's test:
   $p(n_{12} \mid n_{11}) = \left(\dfrac{N_{.2}}{N_{.1}}\right)^{n_{12}} \dfrac{(n_{11} + n_{12})!}{n_{11}!\,n_{12}!\,(1 + N_{.2}/N_{.1})^{(n_{11} + n_{12} + 1)}}$

   $\min\left\{ \sum_{k=0}^{k \le n_{12}} p(k \mid n_{11}),\; \sum_{k \ge n_{12}}^{\infty} p(k \mid n_{11}) \right\}$

a la SAGE data:
      • Man et al., 2000
      • Vencio et al., 2003: no hypothesis testing
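
A minimal sketch of three of the tests on this slide, for one signature at a time. It assumes n11 and n12 are the signature's counts in samples 1 and 2, and N.1 and N.2 are the sample totals; the function names are hypothetical. The example values are the GATCGAGGACTGTGTGG row and the SUM line of the tag table.

```python
import math


def chi_square_and_z(n11, n12, N1, N2):
    """Chi-square for the 2x2 table and the normal-approximation Z.
    N1, N2 are the column (sample) totals N.1 and N.2."""
    n21, n22 = N1 - n11, N2 - n12
    row1, row2, total = n11 + n12, n21 + n22, N1 + N2
    chi2 = total * (n11 * n22 - n12 * n21) ** 2 / (row1 * row2 * N1 * N2)
    p1, p2 = n11 / N1, n12 / N2
    p0 = 0.5 * (p1 + p2)  # as defined on the slide
    z = (p1 - p2) / math.sqrt(p0 * (1 - p0) * (1 / N1 + 1 / N2))
    return chi2, z


def audic_claverie_log_p(k, n11, N1, N2):
    """log p(k | n11), using log-gamma to avoid overflowing factorials."""
    ratio = N2 / N1
    return (k * math.log(ratio)
            + math.lgamma(n11 + k + 1)
            - math.lgamma(n11 + 1)
            - math.lgamma(k + 1)
            - (n11 + k + 1) * math.log1p(ratio))


def audic_claverie_tail(n12, n11, N1, N2):
    """Smaller of the lower tail (k <= n12) and the upper tail (k >= n12).
    The upper tail is obtained as 1 minus the sum over k < n12."""
    lower = sum(math.exp(audic_claverie_log_p(k, n11, N1, N2))
                for k in range(n12 + 1))
    below = sum(math.exp(audic_claverie_log_p(k, n11, N1, N2))
                for k in range(n12))
    upper = max(0.0, 1.0 - below)
    return min(lower, upper)


# Example row from the tag table: 290 (cancer) vs 156 (normal).
chi2, z = chi_square_and_z(290, 156, 1_260_230, 1_280_977)
tail = audic_claverie_tail(156, 290, 1_260_230, 1_280_977)
print(f"chi-square = {chi2:.2f}, Z = {z:.2f}, Audic-Claverie tail = {tail:.3g}")
```

When the two sample totals are equal, p0 coincides with the pooled proportion and the chi-square statistic equals Z squared, which is a convenient sanity check.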




                             My Concern

                            Microarray                         vs                MPSS

N Genes                          ~10,000                                     ~25,000
% DE Genes                         2 – 10                                     15 – 25
N DE Genes                       200 – 1,000                               3,750 – 6,250
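
The last row follows from the first two (number of genes times the percentage called DE). A short check, with a hypothetical helper name:

```python
def de_gene_range(n_genes, pct_low, pct_high):
    """Expected number of DE genes for a given percentage range."""
    return round(n_genes * pct_low / 100), round(n_genes * pct_high / 100)


print("Microarray:", de_gene_range(10_000, 2, 10))
print("MPSS:      ", de_gene_range(25_000, 15, 25))
```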



Biochemist
                                                                                 




                              My Concern: Sensitivity

[Figure: sensitivity, adapted from Reverter et al., 2004; plot annotations: 5 tpm; 50 - 300 tpm]








                              MPSS Test Data
 2 Issues:
   1. Equivalence with M-A plots
   2. Geometry

 [Figure annotation: $\tan^{-1} 2$]








                              MPSS Test Data

                                                                   Binomial
                                                                5,137 DE Genes








                                 Algorithm for Bootstrap
1. Read transcripts for the i-th signature:
   $t_i = (x_i, y_i)$, $i = 1, 2, \ldots, n$;  $MA_i = (m_i, a_i)$, with $m_i = x_i - y_i$ and $a_i = 0.5\,(x_i + y_i)$

2. Sort $MA_i$ by $a_i$ (x-axis)
3. Define b bins: $B_j$, $j = 1, 2, \ldots, b$ (same width or same size)
4. Define r BR (bootstrap replicates), enough for the CI (e.g. r = 200)
   Define $\alpha$ (significance)
5. For each $B_j$ collect $BR_k$, $k = 1, 2, \ldots, r$
          5.1. Compute: $CI_{j,1-\alpha} = (LB_{j,\alpha/2},\; UB_{j,1-\alpha/2})$
          5.2. Identify: $MA_i \in B_j$ with $MA_i \notin CI_{j,1-\alpha}$
6. Stop
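
The steps above can be sketched in a few lines of Python. This is one possible reading, not the author's implementation: the slide does not say which statistic is resampled, so the sketch assumes that each replicate resamples the m values of a bin with replacement, records their alpha/2 and 1 - alpha/2 percentiles, and averages these over the r replicates to obtain LB and UB. Equal-size bins are used, the counts are taken as given (no transformation), and all function names are hypothetical.

```python
import random


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of a sorted list, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bootstrap_flag(pairs, b=10, r=200, alpha=0.05, seed=0):
    """Return indices of signatures whose m falls outside the bootstrap
    percentile interval of their bin (assumed reading of steps 1 to 5)."""
    rng = random.Random(seed)
    # Steps 1-2: m = x - y, a = 0.5 (x + y), sorted by a.
    ma = sorted(((x - y, 0.5 * (x + y), i) for i, (x, y) in enumerate(pairs)),
                key=lambda t: t[1])
    n = len(ma)
    # Step 3: b contiguous bins of (nearly) equal size.
    edges = [round(j * n / b) for j in range(b + 1)]
    flagged = []
    for j in range(b):
        members = ma[edges[j]:edges[j + 1]]
        if len(members) < 2:
            continue  # too few points to resample
        m_bin = [m for m, _, _ in members]
        lows, highs = [], []
        # Steps 4-5: r bootstrap replicates per bin.
        for _ in range(r):
            rep = sorted(rng.choices(m_bin, k=len(m_bin)))
            lows.append(percentile(rep, alpha / 2))
            highs.append(percentile(rep, 1 - alpha / 2))
        lb, ub = sum(lows) / r, sum(highs) / r  # step 5.1
        flagged.extend(i for m, _, i in members if not lb <= m <= ub)  # 5.2
    return sorted(flagged)


# The 13 (cancer, normal) pairs shown in the tag table, as a toy input.
table_pairs = [(15, 17), (6, 19), (0, 6), (13, 19), (6, 1), (10, 19), (9, 0),
               (24, 1), (17, 18), (3, 10), (290, 156), (78, 91), (4, 0)]
print(bootstrap_flag(table_pairs, b=2, r=200, alpha=0.05))
```

Because each bin gets its own interval, the threshold for calling a signature DE widens or narrows with the spread of m at that abundance level, which is how a scheme like this would accommodate heteroskedasticity.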





                              Bins of equal width








                              Bins of equal size
                                                                         Merits:
                                                              1.    Accuracy stabilisation
                                                              2.    Variance stabilisation
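
A small sketch of the two binning options, assuming the a values are already sorted. With a skewed abundance distribution, equal-width bins can leave the upper bins sparsely populated, whereas equal-size bins give every interval the same number of points; this is presumably what the merits listed above refer to. The function names and the toy data are illustrative only.

```python
def equal_width_bins(a_sorted, b):
    """Split the range [min, max] of a into b intervals of equal width."""
    lo, hi = a_sorted[0], a_sorted[-1]
    width = (hi - lo) / b
    bins = [[] for _ in range(b)]
    for a in a_sorted:
        # The maximum value is placed in the last bin.
        j = min(int((a - lo) / width), b - 1) if width > 0 else 0
        bins[j].append(a)
    return bins


def equal_size_bins(a_sorted, b):
    """Split the sorted values into b contiguous groups of equal count."""
    n = len(a_sorted)
    edges = [round(j * n / b) for j in range(b + 1)]
    return [a_sorted[edges[j]:edges[j + 1]] for j in range(b)]


# Toy right-skewed abundances: 0, 1, 4, 9, ..., 9801.
a_values = [float(k * k) for k in range(100)]
print("equal width:", [len(g) for g in equal_width_bins(a_values, 4)])
print("equal size: ", [len(g) for g in equal_size_bins(a_values, 4)])
```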








                              MPSS Test Data

                                                                  Bootstrap CI
                                                                 497 DE Genes








                               Conclusions
 • Compared to microarray, the analysis of MPSS data should be trivial
 • Standard parametric (binomial) methods are likely to generate a large
   number of differentially expressed elements.
        • Trade-off: biological vs statistical significance
 • The proposed method possesses a number of advantages:
        • Very easy to implement
        • Very fast to generate
        • Operates on total transcripts as opposed to proportions
        • Accommodates the inherent heteroskedasticity
 • More research ($) is needed to assess:
        • The impact of MPSS in expression studies
        • The (possible) annotation gap (non-sequenced species)


                                       References

Brenner, S., M. Johnson, J. Bridgham, et al. (2000) Gene expression analysis by
massively parallel signature sequencing (MPSS) on microbead arrays. Nature
Biotechnology, 18:630-634.

Jongeneel, C.V., C. Iseli, B.J. Stevenson, et al. (2003) Comprehensive sampling of
gene expression in human cell lines with massively parallel signature sequencing.
PNAS, USA, 100:4702-4705.

Man, M.Z., X. Wang, and Y. Wang (2000) POWER_SAGE: comparing statistical
tests for SAGE experiments. Bioinformatics, 16:953-959.

Reverter, A., S. McWilliam, W. Barris, and B. Dalrymple (2004) A rapid method for
computationally inferring transcriptome coverage and microarray sensitivity.
Bioinformatics (in press).

Tu, Y., G. Stolovitzky, and U. Klein (2002) Quantitative noise analysis for gene
expression microarray experiments. PNAS, USA, 99:14031-14036.

Vencio, R.Z.N., H. Brentani, and C.A.B. Pereira (2003) Using credibility intervals
instead of hypothesis tests in SAGE analysis.





                              Acknowledgements

                                                  Lynx Therapeutics, Inc.
                                                  Christian Haudenschild

Peter Thompson (USYD)
Frank Nicholas (USYD)
Ross Tellam (CSIRO)
Brian Dalrymple (CSIRO)



