A Comparison of Microarray Platforms - SIB


            NUS – IMS Workshop
              7 January 2004



  Darlene Goldstein
               Talk Outline

• Bioinformatics Core Facility at ISREC
• Purpose of study
• Platform technologies and study design
• Comparisons between platforms
• Conclusions and study completion
                   BCF: What is it?
[Diagram: the BCF in relation to the DAF / DAFL, NCCR biomedical and microarray research, BIOINF, bioinformatics research and EPFL biostatistics]
• ISREC-based, supported by the NCCR for molecular
  oncology; member group of the SIB
• Created by the NCCR molecular oncology to assist
  its DAF (which is now absorbed into the DAFL) and
  its microarray users in their biomedical research
• A group devoted to the bioinformatics and statistical
  aspects of gene expression research, in particular to the
  analysis of data generated with microarray technologies
          BCF: Main Components
• Technical Support
   – advice in experimental design and data analysis
   – production, control, development of spotted arrays
   – processing of microarray data, quality assessment
• Education
   – practical training through classes / workshops
• Collaboration
   – statistical data analysis of research projects
• Research & Development
   – development / testing tools & methods
       Platform Comparison Study
• Purpose
   – to assess accuracy and reproducibility of
     different gene expression platforms
   – to compare features of different measurement
     types
   – to understand the system (important for
     normalization and downstream analysis)
• Impact
   – practical advice to DAF(L) and to NCCR
     microarray users
   – benefit to the wider scientific community, especially
     if results can somehow be combined across
     array types
       Platforms and Study Design
• Platforms
   – Affymetrix GeneChips, high-density short oligo
     arrays
   – Agilent long oligo arrays
   – in-house spotted cDNA arrays
   – MPSS (massively parallel signature sequencing, a
     digital gene expression technology patented by Lynx);
     in collaboration with the Ludwig Institute for Cancer
     Research; originally intended as ‘gold standard’
• Basic Design
   – 3 replicate measurements for each of two mRNA samples
     (human placenta and testis)
   – dye swap for two-color systems (Agilent, cDNA)
   – 2 to 3 million tags sequenced for MPSS
                       Methods
• Experimental Method (as recommended by ‘specialists’):
   – Affymetrix: Biozentrum Basel
   – Agilent: Institut Gustave Roussy, Paris
   – Spotted cDNA arrays: Otto Hagenbuechle's group
     (DAF, now DAFL)
   – MPSS: Lynx (California), Victor Jongeneel's group
     (LICR)
   – qRT-PCR follow-up (~250 genes): Robert Lyle, Patrick
     Descombes (UniGE)
• Expression Quantification
   as recommended by the ‘specialists’ above,
   except that RMA was used for Affymetrix
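RMA (robust multi-array average) background-corrects and quantile-normalizes the probe intensities, then summarizes each probe set on the log2 scale with Tukey's median polish. The sketch below covers only that last summarization step, in plain Python, and assumes the input rows are already background-corrected, normalized log2 intensities (one row per probe, one column per array); the function name and stopping rule are illustrative choices. In practice RMA is normally run with the rma() function of the Bioconductor affy package.

```python
import statistics


def median_polish_summary(log2_probes, max_iter=10, eps=0.01):
    """Summarize one probe set with Tukey's median polish.

    log2_probes: list of rows, one row per probe, one column per array,
    holding log2 intensities. Returns one expression value per array,
    computed as overall effect + array (column) effect.
    """
    n_rows, n_cols = len(log2_probes), len(log2_probes[0])
    z = [list(map(float, row)) for row in log2_probes]  # residuals
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep probe (row) medians out of the residuals.
        for i in range(n_rows):
            med = statistics.median(z[i])
            z[i] = [v - med for v in z[i]]
            row_eff[i] += med
        delta = statistics.median(col_eff)
        col_eff = [c - delta for c in col_eff]
        overall += delta
        # Sweep array (column) medians out of the residuals.
        for j in range(n_cols):
            med = statistics.median(z[i][j] for i in range(n_rows))
            for i in range(n_rows):
                z[i][j] -= med
            col_eff[j] += med
        delta = statistics.median(row_eff)
        row_eff = [r - delta for r in row_eff]
        overall += delta
        # Stop once the sum of absolute residuals stops changing.
        new_sum = sum(abs(v) for row in z for v in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return [overall + c for c in col_eff]


# Toy probe set: 3 probes x 3 arrays, purely additive on the log2 scale.
print(median_polish_summary([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))  # [2.0, 3.0, 4.0]
```

For a purely additive probe-plus-array table the polish recovers the array effects exactly; with real data, using medians rather than means keeps the per-array values resistant to a few outlying probes.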
Spotted cDNA arrays
   Human 10k array, 8x4 subarrays

Affymetrix GeneChips
   [Figure: image of hybridized array]
                    MPSS
(information from Lynx web site)
• Uses microbeads with ~100k identical DNA molecules
  attached
• Captures and identifies transcript sequences of
  expressed genes by counting the number of individual
  mRNA molecules representing each gene
• Individual mRNAs are identified through a generated
  17- to 20-base signature sequence
• Can be used without organism sequence information
• ‘MPSS can accurately quantify transcripts as low as 5
  transcripts per million (tpm) to above 50,000 tpm’
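To put the quoted range in perspective, tpm can be related to the number of tags actually sequenced, assuming tpm is simply the tag count rescaled to one million tags. The helper names below are hypothetical.

```python
def tags_to_tpm(tag_count, total_tags):
    """Convert a raw MPSS tag count to transcripts per million (tpm)."""
    return tag_count * 1_000_000 / total_tags


def tags_for_tpm(tpm, total_tags):
    """Expected number of tags for a transcript at the given abundance."""
    return tpm * total_tags / 1_000_000


# The study sequenced 2 to 3 million tags per sample.
for total in (2_000_000, 3_000_000):
    low = tags_for_tpm(5, total)        # lower end of the quoted range
    high = tags_for_tpm(50_000, total)  # upper end of the quoted range
    print(f"{total:,} tags: 5 tpm ~ {low:.0f} tags, 50,000 tpm ~ {high:,.0f} tags")
```

With 2 to 3 million tags per sample, 5 tpm therefore corresponds to only about 10 to 15 tags, so counting noise is substantial for the rarest transcripts.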
       Other comparison studies (I)
• Yuen et al. 2002; Nuc. Acids Res. 30(10):e48
   – Affy MGU-74A, cDNA; cell lines; qRT-PCR 47 genes
   – both arrays sensitive (TP) and specific (TN) at identifying
     regulated transcripts
   – found comparable rank-order of gene regulation, but only
     modest correlation in fold-change
   – both array types biased downwards (FC underestimated
     compared to qRT-PCR)

• Evans et al. 2002; Eur. J. Neuroscience 16:409-413
   – Affy RG-U34A, SAGE to detect brain transcripts; 43 rat
     hippocampi; evaluation based on 1000 transcripts
   – ~55% of low-abundance and ~90% of high-abundance
     transcripts detected
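The distinction drawn by Yuen et al. between agreement in rank order and agreement in fold change can be seen in a toy example: if an array reports fold changes through a monotone but compressing response, its rank correlation with qRT-PCR stays perfect while the Pearson correlation drops below 1 and the regression slope falls well below 1. The data and the tanh-shaped compression are hypothetical, chosen only to illustrate the point.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n (assumes no ties, which holds for the example below)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def slope(x, y):
    """Least-squares slope of y on x; below 1 means y is compressed."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


# Hypothetical log2 fold changes: qRT-PCR vs a compressed array readout.
qpcr = [-4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0]
array = [2 * math.tanh(v / 2) for v in qpcr]  # monotone but compressed

print(round(spearman(qpcr, array), 3))  # identical rank order
print(round(pearson(qpcr, array), 3))   # below 1
print(round(slope(qpcr, array), 3))     # well below 1: FC underestimated
```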
     Other comparison studies (II)
• Li et al. 2002; Toxicological Sciences 69:383-390
   – Affy HuGene FL, HGU-95Av2, Incyte Genomics UniGemV 2.0
     (‘long cDNA’); drug-treated cell lines at 8h and 24h; qRT-PCR
     9 genes
   – cross-hybridization contributed to platform discrepancies
   – found Affy ‘more reliable’ (more sensitive)

• Kuo et al. 2002; Bioinformatics 18:405-412
   – Affy HU6800, cDNA, publicly available data on NCI 60;
     2895 genes
   – found low correlation between measurements (but no control
     over lab procedures – different groups had performed the
     original studies)
    Other comparison studies (III)
• Barczak et al. 2003; Genome Res. 13:1775-1785
   – 2 versions of spotted long oligo (Operon), Affy HGU-95Av2;
     cell lines; 7344 genes
   – this large-scale analysis found strong correlations between
     relative expression measurements
   – similar results for amplified and unamplified targets

• Tan et al. 2003; Nuc. Acids Res. 31:5676-5684
   – Agilent Human 1, Affy HGU-95Av2, Amersham Codelink
     UniSet Human I (30-mers); cell lines in serum-rich medium
     and 24h after serum removal; 2009 genes
   – modest correlations
   – little overlap in genes called DE
   – best agreement on DE calls (varying criteria) only 21%
• Comparison studies by other groups worldwide are
  also in progress
           Comparison Principle
• Cross-platform gene matching is done through the
  trome database of transcripts (constructed with
  the Transcriptome Analyzer program tromer)
• We use only those genes we classify as ‘reliably
  mapped’ between platforms (~2500 genes); we
  have not (yet) looked at probe(set)s that could not
  be well mapped to known transcripts
• ‘Peak technical performance’: this is a case study,
  not a systematic study; it does not take into account
  normal user variation, other mRNAs, etc.
• Comparison based on M (log ratio) and A (average
  log intensity)
• Unfortunately, accuracy cannot be properly
  assessed, as true M values are not known
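Taking the ratio as placenta over testis (an orientation assumed here), M = log2(placenta) - log2(testis) and A = (log2(placenta) + log2(testis)) / 2, so positive M means higher expression in placenta. The sketch assumes positive, already normalized values. The dye-swap helper only illustrates the general principle that exchanging the dyes flips the sign of M; it is not the study's own processing, and both function names are hypothetical.

```python
import math


def ma_values(placenta, testis):
    """M (log ratio) and A (average log intensity) for one gene.

    placenta, testis: positive intensities (or normalized expression
    values) for the two samples.
    """
    log_p, log_t = math.log2(placenta), math.log2(testis)
    m = log_p - log_t        # M > 0: higher in placenta than in testis
    a = (log_p + log_t) / 2  # A: overall intensity level
    return m, a


def dye_swap_m(m_forward, m_swapped):
    """Average the forward and dye-swapped M of a two-color pair.

    In the swapped hybridization the samples change channels, so its M
    has the opposite sign; subtracting before halving aligns the two.
    """
    return (m_forward - m_swapped) / 2


print(ma_values(1024, 64))    # (4.0, 8.0)
print(dye_swap_m(2.5, -1.5))  # 2.0
```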
cDNA array performance

      MA plots (examples)
[Figure: MA plots for Affy U133A, NCCR h10kd and Agilent; annotations mark the intensity range and the background]
|M| (putative effect) densities
 Difference in M vs. A: reproducibility
[Figure: difference in M (y-axis) vs. average A (x-axis) for Affy U133A, NCCR h10k and Agilent h1A]
D (error) densities
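The reproducibility plots show the difference in M between replicate measurements against A, and the error densities summarize that difference D. One simple numerical summary (an illustrative choice, not necessarily the one behind the plots) is the standard deviation of D within equal-size bins of A; intensity-dependent noise, such as the weak-signal behaviour noted for Agilent, then shows up as larger values in the low-A bins. The function name and the data are hypothetical.

```python
import statistics


def d_spread_by_a(m_rep1, m_rep2, a_values, n_bins=4):
    """Spread of the replicate difference D = M1 - M2 within A bins.

    Genes are sorted by A and split into n_bins groups of (nearly) equal
    size; returns a list of (mean A, standard deviation of D) per group.
    """
    d = [m1 - m2 for m1, m2 in zip(m_rep1, m_rep2)]
    order = sorted(range(len(d)), key=lambda i: a_values[i])
    size = len(order) // n_bins
    summary = []
    for b in range(n_bins):
        start = b * size
        end = len(order) if b == n_bins - 1 else start + size
        idx = order[start:end]
        mean_a = statistics.fmean(a_values[i] for i in idx)
        sd_d = statistics.pstdev([d[i] for i in idx])
        summary.append((mean_a, sd_d))
    return summary


# Hypothetical replicates: noisy at low A, identical at high A.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
m1 = [1.0, -1.0, 1.0, -1.0, 0.5, 0.5, 0.5, 0.5]
m2 = [0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 0.5]
print(d_spread_by_a(m1, m2, a, n_bins=2))  # [(2.5, 1.0), (6.5, 0.0)]
```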
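              Gene Matching
Probe(set)s / genes per platform:
   Agilent h1A   18325 / 15688
   Affy U133A    24808 / 14876
   NCCR h10k      7812 / 6853
Genes shared between platforms (Venn diagram):
   Affy only 5099, Agilent only 5797, NCCR only 2514
   Affy & Agilent only 5977, Affy & NCCR only 435, Agilent & NCCR only 549
   all three 3365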
Gene matching also with MPSS
   2494   Tromer clusters
   4060   Affy probesets
   2869   Agilent probes
   2685   NCCR clones
[Venn diagram of Affy, Agilent and NCCR: all 2494 clusters are common to the three platforms; the other regions are empty ('-')]
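Matching is done at the level of transcript clusters rather than probes, so several probes (for example several Affy probesets) can map to the same gene. The sketch below assumes each platform's annotation has already been reduced to a dictionary from probe ID to cluster ID, with None for probes that are not reliably mapped; the IDs are hypothetical, and this is not the tromer software itself.

```python
def shared_clusters(platform_maps):
    """Clusters reliably mapped on every platform.

    platform_maps: {platform: {probe_id: cluster_id or None}}
    """
    per_platform = [
        {cluster for cluster in mapping.values() if cluster is not None}
        for mapping in platform_maps.values()
    ]
    return set.intersection(*per_platform)


def probes_in(mapping, clusters):
    """Probe IDs on one platform that map to the given clusters."""
    return sorted(p for p, c in mapping.items() if c in clusters)


# Hypothetical toy annotation (IDs invented for illustration).
maps = {
    "Affy": {"a1": "HTR1", "a2": "HTR1", "a3": "HTR2", "a4": None},
    "Agilent": {"g1": "HTR1", "g2": "HTR2", "g3": "HTR3"},
    "NCCR": {"n1": "HTR1", "n2": "HTR3"},
}
common = shared_clusters(maps)
print(sorted(common))                   # ['HTR1']
print(probes_in(maps["Affy"], common))  # ['a1', 'a2']
```

Counting probes as well as clusters is what gives different per-platform totals for the same set of genes (2494 clusters, but 4060 Affy probesets).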
Concordance in M density plots (I)
[Figure: densities of M for Agilent, Affy and NCCR]
Concordance in M density plots (II)
[Figure: densities of M for Agilent, Affy and NCCR]
    Difficulty in comparing to MPSS ratios
MPSS: tag counts in testis and placenta, and MPSS M; AFFY, AGIL, NCCR: M ave

Affy up
ID                                     Testis  Placenta  MPSS M  AFFY M  AGIL M  NCCR M  Gene description
HTR005199_MPSS_1_AGIL_1_AFFY_1_NCCR_1       0         9    3.32    6.38    3.91    7.12  Fc fragment of IgE, high affinity I, receptor for; alpha poly
HTR000581_MPSS_1_AGIL_1_AFFY_1_NCCR_1      14      5117    8.41    6.45    4.07    5.91  cytochrome P450, family 19, subfamily A, polypeptide 1
HTR000581_MPSS_1_AGIL_2_AFFY_1_NCCR_1      14      5117    8.41    6.45    1.34    5.91  cytochrome P450, family 19, subfamily A, polypeptide 1
HTR004581_MPSS_1_AGIL_1_AFFY_1_NCCR_1       1      3635   10.83    6.89    4.12    4.84  pregnancy-associated plasma protein A
HTR006015_MPSS_1_AGIL_1_AFFY_1_NCCR_1       9      9632    9.91    6.99    4.20    6.60  glycoprotein hormones, alpha polypeptide
HTR000790_MPSS_1_AGIL_1_AFFY_2_NCCR_1       0       280    8.13    7.94    2.20    4.38  hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid
HTR000790_MPSS_1_AGIL_1_AFFY_2_NCCR_2       0       280    8.13    7.94    2.20    5.69  hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid

Agil up
ID                                     Testis  Placenta  MPSS M  AFFY M  AGIL M  NCCR M  Gene description
HTR010250_MPSS_1_AGIL_1_AFFY_1_NCCR_1       4       211    5.41    5.87    4.56    6.15  Homo sapiens adrenomedullin (ADM), mRNA.
HTR004842_MPSS_1_AGIL_1_AFFY_1_NCCR_1       6      1328    7.57    5.91    4.98    5.36  glypican 3
HTR004414_MPSS_1_AGIL_1_AFFY_1_NCCR_1       0         6    2.81    1.58    5.15    4.56  estrogen-related receptor gamma
HTR004414_MPSS_1_AGIL_1_AFFY_2_NCCR_1       0         6    2.81    4.02    5.15    4.56  estrogen-related receptor gamma
HTR002717_MPSS_1_AGIL_1_AFFY_1_NCCR_1       0       828    9.70    6.06    6.17    7.04  insulin-like growth factor binding protein 1

NCCR up
ID                                     Testis  Placenta  MPSS M  AFFY M  AGIL M  NCCR M  Gene description
HTR010250_MPSS_1_AGIL_1_AFFY_1_NCCR_1       4       211    5.41    5.87    4.56    6.15  Homo sapiens adrenomedullin (ADM), mRNA.
HTR003344_MPSS_1_AGIL_1_AFFY_1_NCCR_1       0       726    9.51    0.02    1.75    6.23  placental growth factor, vascular endothelial growth factor
HTR003344_MPSS_1_AGIL_1_AFFY_2_NCCR_1       0       726    9.51    5.01    1.75    6.23  placental growth factor, vascular endothelial growth factor
HTR006015_MPSS_1_AGIL_1_AFFY_1_NCCR_1       9      9632    9.91    6.99    4.20    6.60  glycoprotein hormones, alpha polypeptide
HTR002717_MPSS_1_AGIL_1_AFFY_1_NCCR_1       0       828    9.70    6.06    6.17    7.04  insulin-like growth factor binding protein 1
HTR005199_MPSS_1_AGIL_1_AFFY_1_NCCR_1       0         9    3.32    6.38    3.91    7.12  Fc fragment of IgE, high affinity I, receptor for; alpha poly
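The MPSS M column in this table is consistent with a log2 ratio of placenta to testis tag counts with one pseudocount added to each, M = log2((placenta + 1) / (testis + 1)); for example, 0 and 9 tags give log2(10/1) ≈ 3.32. The pseudocount keeps M finite when a gene has no tags in one tissue, but for rarely sequenced genes it also makes M depend heavily on that choice, which is one reason these ratios are hard to compare with array M values. The function name below is illustrative.

```python
import math


def mpss_m(testis_tags, placenta_tags, pseudocount=1):
    """log2 ratio placenta/testis of MPSS tag counts with a pseudocount."""
    return math.log2((placenta_tags + pseudocount) / (testis_tags + pseudocount))


# (testis, placenta) counts taken from the table above; the last printed
# column reproduces its MPSS M values: 3.32, 8.41, 10.83, 8.13, 5.41.
for testis, placenta in [(0, 9), (14, 5117), (1, 3635), (0, 280), (4, 211)]:
    print(testis, placenta, round(mpss_m(testis, placenta), 2))
```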
MPSS difficulties, another illustration
Correlations

First quartile (25% least frequent RNAs)
             MPSS  AGILENT 1  AGILENT 2   AFFY  NCCR 1  NCCR 2
MPSS         1.00       0.43      -0.45   0.44    0.47   -0.47
AGILENT 1    0.43       1.00      -0.97   0.65    0.72   -0.73
AGILENT 2   -0.45      -0.97       1.00  -0.66   -0.73    0.73
AFFY         0.44       0.65      -0.66   1.00    0.72   -0.73
NCCR 1       0.47       0.72      -0.73   0.72    1.00   -0.98
NCCR 2      -0.47      -0.73       0.73  -0.73   -0.98    1.00

Fourth quartile (25% most frequent RNAs)
             MPSS  AGILENT 1  AGILENT 2   AFFY  NCCR 1  NCCR 2
MPSS         1.00       0.72      -0.73   0.73    0.76   -0.76
AGILENT 1    0.72       1.00      -0.98   0.77    0.79   -0.79
AGILENT 2   -0.73      -0.98       1.00  -0.77   -0.79    0.80
AFFY         0.73       0.77      -0.77   1.00    0.81   -0.81
NCCR 1       0.76       0.79      -0.79   0.81    1.00   -0.98
NCCR 2      -0.76      -0.79       0.80  -0.81   -0.98    1.00
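The tables above compute correlations separately for the least and the most frequent RNAs. A minimal sketch of that stratification follows, assuming genes are ranked by some abundance measure (for instance the MPSS count) and split into four equal groups; the toy data are hypothetical and far smaller than a real quartile. The strongly negative correlations involving AGILENT 2 and NCCR 2 are what one would expect if these are the dye-swapped hybridizations, whose M values carry the opposite sign; that reading is an inference from the dye-swap design, not something the slide states.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_by_quartile(abundance, m_a, m_b):
    """Correlation of two platforms' M values within abundance quartiles.

    Returns four correlations, from the least to the most abundant 25%.
    """
    order = sorted(range(len(abundance)), key=lambda i: abundance[i])
    n = len(order)
    result = []
    for q in range(4):
        idx = order[q * n // 4:(q + 1) * n // 4]
        result.append(pearson([m_a[i] for i in idx], [m_b[i] for i in idx]))
    return result


# Toy data: 8 genes, so each quartile holds only 2 genes.
abundance = [1, 2, 3, 4, 5, 6, 7, 8]
m_a = [0.5, -1.0, 2.0, 1.0, -2.0, 3.0, 0.0, 1.5]
swapped = [-v for v in m_a]  # a dye swap flips the sign of M
print(correlation_by_quartile(abundance, m_a, swapped))  # each close to -1
```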
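Agreement: top up 200 (placenta)
Venn diagram regions (number of genes):
   Affy only 40, Agilent only 48, NCCR only 44
   Affy & Agilent only 30, Affy & NCCR only 34, Agilent & NCCR only 26
   all three 96
M range
   Affy: 1.66 to 7.94
   Agil: 1.48 to 6.17
   NCCR: 1.83 to 7.12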
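Agreement: top down 200 (testis)
Venn diagram regions (number of genes):
   Affy only 38, Agilent only 46, NCCR only 53
   Affy & Agilent only 41, Affy & NCCR only 34, Agilent & NCCR only 26
   all three 87
M range
   Affy: -8.27 to -1.65
   Agil: -6.07 to -1.47
   NCCR: -6.18 to -1.79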
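Each diagram compares the 200 genes with the largest (up) or smallest (down) M on each platform. A minimal sketch of the counting, assuming M values are held in per-platform dictionaries keyed by matched gene; because every platform contributes exactly 200 genes, the four regions inside each circle must add up to 200, which is a useful check when reading such diagrams. Gene IDs and values below are toy data, and the function names are hypothetical.

```python
def top_n(m_by_gene, n, up=True):
    """The n genes with the largest (up=True) or smallest (up=False) M."""
    ranked = sorted(m_by_gene, key=m_by_gene.get, reverse=up)
    return set(ranked[:n])


def venn3(a, b, c):
    """Sizes of the seven regions of a three-set Venn diagram."""
    return {
        "a only": len(a - b - c),
        "b only": len(b - a - c),
        "c only": len(c - a - b),
        "a&b only": len((a & b) - c),
        "a&c only": len((a & c) - b),
        "b&c only": len((b & c) - a),
        "all three": len(a & b & c),
    }


# Toy M values per platform (hypothetical), top 2 instead of top 200.
affy = {"g1": 5.0, "g2": 4.0, "g3": 3.0, "g4": -1.0}
agil = {"g1": 3.5, "g2": 1.0, "g3": 3.0, "g4": 0.5}
nccr = {"g1": 6.0, "g2": 5.5, "g3": 0.1, "g4": 2.0}
regions = venn3(top_n(affy, 2), top_n(agil, 2), top_n(nccr, 2))
print(regions)
```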
Comparison with MPSS, 99% CI (up)
Comparison with MPSS, 99% CI (down)
        MPSS CI Overlap
Overlap with the 99% CI for MPSS

Affy                      31.23%
Agilent                   28.62%
NCCR                      30.52%

Overlap with the 99.9% CI for MPSS

Affy                      41.53%
Agilent                   38.90%
NCCR                      40.52%
        Overlap with MPSS
[Venn diagram of NCCR vs MPSS; region counts 38, 74, 88 and 112; one region corresponds to genes missing or classified as unreliably mapped (tag to gene not unique)]
Similar numbers also for Affy and Agilent; 56 of the 88 are
in common to all 4 platforms.
                 Conclusions (I)
• The three microarray platforms compared performed
  very similarly in terms of which genes are detected as
  differentially expressed, distributions of M values and
  variability between replicate measurements ...
  ... so similarly that it seems hard to find real
  differences
• Most disagreement is for low-expressed genes
• RMA M values (Affy) are better variance-stabilized,
  but reproducibility is good for all platforms except for
  weak signals in Agilent (likely due to background treatment)
• RMA M values are more strongly compressed towards
  zero at low intensity; this reduces false positive calls but
  might make DE at low intensity undetectable (but is it
  detectable at all?)
               Conclusions (II)
Microarrays vs MPSS

M values, quantitative comparison:
   the disagreement is large ...
   ... so large that it is hard to reconcile the values,
   making it impossible to use MPSS as the ‘gold
   standard’

M values, qualitative comparison:
   there is a good degree of agreement,
   approximately the same for all three microarray
   platforms
              Conclusions (III)
• MPSS predicts many more low-abundance genes to be
  (strongly) differentially expressed
• The hybridization methods lose the signal of low-abundance
  genes (due to the background fluorescence estimation?)
• Microarrays miss most of the differential expression of
  low-abundance transcripts, but it is also possible that
  MPSS is biased for many genes or less precise than this
  approach suggests
  (this approach: confidence intervals for MPSS, currently
  approximate CIs that take into account the sampling error
  on the counts, since we have no replicate measurements
  for MPSS)
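The MPSS intervals are described only as approximate CIs based on the sampling error of the counts. One way to build such an interval, assumed here rather than taken from the study, is to treat each tag count n as Poisson and approximate the variance of ln(n + 1) by 1/(n + 1), a conservative variant of the usual delta-method approximation that stays positive for zero counts; applied to the pseudocount log ratio used for MPSS M, this gives Var(M) ≈ (1/(placenta + 1) + 1/(testis + 1)) / (ln 2)^2. The overlap percentages can then be read as the share of genes whose array M falls inside the MPSS interval; that reading and the function names are also assumptions.

```python
import math
from statistics import NormalDist


def mpss_m_ci(testis_tags, placenta_tags, level=0.99):
    """Approximate CI for MPSS M = log2((placenta + 1) / (testis + 1)).

    Assumes Poisson tag counts and approximates Var(ln(n + 1)) by
    1 / (n + 1), so genes with few tags get wide intervals.
    """
    m = math.log2((placenta_tags + 1) / (testis_tags + 1))
    se = math.sqrt(1 / (placenta_tags + 1) + 1 / (testis_tags + 1)) / math.log(2)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 2.58 for 99%
    return m - z * se, m + z * se


def fraction_inside(array_m, mpss_counts, level=0.99):
    """Share of genes whose array M lies inside the MPSS interval."""
    inside = 0
    for m, (testis, placenta) in zip(array_m, mpss_counts):
        low, high = mpss_m_ci(testis, placenta, level)
        if low <= m <= high:
            inside += 1
    return inside / len(array_m)


# Affy M and MPSS (testis, placenta) counts for two genes from the
# MPSS ratio table.
affy_m = [6.38, 6.45]
counts = [(0, 9), (14, 5117)]
print(fraction_inside(affy_m, counts))         # 0.5
print(fraction_inside(affy_m, counts, 0.999))  # 0.5
```

Under these assumptions the rarely sequenced gene (0 and 9 tags) gets an interval wide enough to contain the Affy value, while the well-sequenced one (14 and 5117 tags) does not, which mirrors the difficulty of using MPSS as a reference at both ends of the abundance range.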
           Completion of Study

Choose genes for qRT-PCR for which the platforms and
  MPSS disagree and (attempt to) address the
  questions:
• which platform is more accurate?
• how does accuracy depend on the signal intensity?
• do the microarrays frequently miss DE ...
• ... and especially at weak signal intensity?
• which platform best detects low-abundance RNAs?
• does MPSS agree with qRT-PCR?
• Suggestions are welcome!
          Acknowledgements
• Ludwig Institute for Cancer Research
   Victor Jongeneel, Christian Iseli, Brian
     Stephenson
• DAF/DAFL
   Otto Hagenbuechle, Josiane Wyniger
• UniGE
   Robert Lyle, Patrick Descombes
• BCF
   Mauro Delorenzi, Eugenia Migliavacca
• and everyone I inadvertently left out!