Microarray Pitfalls


   Stem Cell Network
Microarray Course, Unit 3
     October 2006
• To provide some guidelines on Affymetrix microarrays
  – How to use them
  – How not to use them
  – Things to keep in mind when designing
    experiments and analyzing data
• This is a general discussion of issues and
  is by no means exhaustive
     Inconsistent Annotations
• Affymetrix-provided probeset annotations
  change over time
• The gene symbol associated with a given
  probeset is not necessarily stable
• This is due to changes in gene prediction
  as new information becomes available.
        Inconsistent Annotations (2)
An inconsistently annotated probeset

   • Perez-Iratxeta, C. and M.A. Andrade.
     2005. Inconsistencies over time in 5% of
     NetAffx probe-to-gene annotations. BMC
     Bioinformatics. 6, 183.
       – 5% of probesets have gene identifiers that
         changed over the two-year time span covered
         by this analysis
  Inconsistent Annotations (3)
• How do we deal with this?
  – Always note the annotation version used in an
    analysis, especially when it is for publication
  – Report the probeset ID as well as the gene symbol
  – Remember that re-analysis with later
    annotations may yield different results
  – Keep your annotation files up to date
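
One way to see how much an annotation update affects an analysis is to compare two annotation versions probeset by probeset. The sketch below assumes each version has already been loaded into a dictionary mapping probeset ID to gene symbol. The probeset IDs, gene symbols and table names are invented for illustration and are not real NetAffx entries.

```python
def changed_annotations(old, new):
    """Return {probeset_id: (old_symbol, new_symbol)} for every probeset
    whose gene symbol differs between the two annotation versions."""
    changed = {}
    for probeset_id in sorted(set(old) | set(new)):
        old_symbol = old.get(probeset_id)
        new_symbol = new.get(probeset_id)
        if old_symbol != new_symbol:
            changed[probeset_id] = (old_symbol, new_symbol)
    return changed


# Hypothetical annotation tables (None means "no gene symbol assigned").
annotation_v1 = {"0001_at": "GENE_A", "0002_at": "GENE_B", "0003_s_at": "GENE_C"}
annotation_v2 = {"0001_at": "GENE_A", "0002_at": "GENE_X", "0003_s_at": None}

diff = changed_annotations(annotation_v1, annotation_v2)
for probeset_id, (before, after) in diff.items():
    print(f"{probeset_id}: {before} -> {after}")
```

Because the probeset ID is the stable key here, reporting it alongside the gene symbol lets a reader redo this comparison against whatever annotation version is current.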
       Old chips, new data
• Expression microarrays are designed
  based on the best available model of the
  genome of interest
• The model for the HG-U133 microarrays
  was a human genome assembly that was
  only 25% complete!
• The human assembly is now >99% complete
      Old chips, new data (2)
• How do we deal with this?
  – A number of groups provide re-mappings of
    probes to probesets based upon the latest
    data available, for example:
    • Dai M, et al. Evolving gene/transcript definitions
      significantly alter the interpretation of GeneChip
      data. Nucleic Acids Res. 2005;33:e175
   Multiple Testing Corrections
• A single expression microarray experiment
  actually consists of hundreds of thousands
  of simultaneous parallel experiments
• This means you can test many hypotheses
• This is not free: the significance of any
  given result decreases as a function of
  the number of hypotheses tested
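
To make the cost concrete: if each test is run at significance level alpha and the tests are treated as independent (a simplifying assumption, since genes on an array are correlated), the chance of at least one false positive among m tests is 1 - (1 - alpha)^m, and the expected number of false positives when no gene is truly changed is m * alpha. The test counts below are illustrative only.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one false positive among m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


def expected_false_positives(alpha, m):
    """Expected number of false positives if all m null hypotheses are true."""
    return alpha * m


alpha = 0.05
for m in (1, 20, 1000, 50000):
    fwer = familywise_error_rate(alpha, m)
    expected = expected_false_positives(alpha, m)
    print(f"m={m}: P(at least one false positive)={fwer:.4f}, "
          f"expected false positives={expected:.1f}")
```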
Multiple Testing Corrections (2)
• How do we deal with this?
  – Limit the number of hypotheses you are testing
    instead of just 'fishing' in the whole data set.
  – Do this by selecting a set of candidate genes
    ahead of time based on your knowledge of
    the biology of the system.
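
The benefit of a small, pre-selected candidate set can be illustrated with two standard corrections, Bonferroni and Benjamini-Hochberg. These are common choices rather than the only ones, and the slides do not prescribe a particular method. The p-values and the whole-array test count below are invented for illustration.

```python
def bonferroni_threshold(alpha, m):
    """Per-test significance threshold that controls the family-wise error rate."""
    return alpha / m


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    # Find the largest rank k such that p_(k) <= k * alpha / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            largest_k = rank
    # Reject every hypothesis ranked at or below that k.
    return sorted(order[:largest_k])


# Hypothetical p-values for five candidate genes chosen before the experiment.
candidate_p = [0.001, 0.008, 0.039, 0.041, 0.60]

# Same p-values judged against 5 tests versus a whole array of 50,000 tests.
pass_candidates = [p for p in candidate_p if p <= bonferroni_threshold(0.05, 5)]
pass_whole_array = [p for p in candidate_p if p <= bonferroni_threshold(0.05, 50000)]
bh_rejected = benjamini_hochberg(candidate_p, alpha=0.05)

print("Bonferroni, 5 tests:", pass_candidates)
print("Bonferroni, 50000 tests:", pass_whole_array)
print("Benjamini-Hochberg rejected indices:", bh_rejected)
```

The same raw p-value can survive correction in a five-gene candidate set and fail it when treated as one of tens of thousands of tests, which is the practical argument for choosing candidates ahead of time.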
            Multiple Testing (3)
• Sandrine Dudoit, Juliet Popper Shaffer and
  Jennifer C. Boldrick. Multiple Hypothesis Testing
  in Microarray Experiments. Statistical Science
  2003, Vol. 18, No. 1, 71–103
   – “The biological question of differential expression can
     be restated as a problem in multiple hypothesis
     testing: the simultaneous test for each gene of the
     null hypothesis of no association between the
     expression levels and the responses”
• Talk to a statistician if you have doubts
  Not everything is in the array
• Probesets are designed with a bias
  towards the 3' end of the gene.
• They won't distinguish splice variants
• They won't pick up alternative 3' ends
 Not everything is in the array (2)
• What can we do about this?
  – You should be aware of this, but not much
    can be done.
  – Use other technologies to complement your
    microarray results (PCR, sequencing)
    What are you measuring?
• Remember that you are detecting the
  average mRNA over a population of cells.
• Is your sample homogeneous?
• If it's not homogeneous then what are you
  measuring? How many types of cells in
  what state?
• Time series of differentiating cells are
  particularly problematic.
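
A small numerical example shows why heterogeneity matters. It assumes the measured signal is roughly the proportion-weighted mean of per-cell-type expression, which is a simplification that ignores differences in RNA content per cell. All fractions and expression values are invented.

```python
def bulk_expression(fractions, expression):
    """Proportion-weighted mean expression across the cell types in a sample."""
    if len(fractions) != len(expression):
        raise ValueError("need one expression value per cell type")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("cell-type fractions must sum to 1")
    return sum(f * e for f, e in zip(fractions, expression))


# Sample A: every cell expresses the gene at a moderate level.
sample_a = bulk_expression([1.0], [100.0])

# Sample B: 10% of cells express the gene strongly, 90% not at all.
sample_b = bulk_expression([0.1, 0.9], [1000.0, 0.0])

print("Sample A bulk signal:", sample_a)
print("Sample B bulk signal:", sample_b)
```

Under this simplified model the two samples give essentially the same bulk signal even though their biology is very different, so the array reading alone cannot tell them apart.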
    Inhomogeneous Samples?
• Many sources of inhomogeneity
  – Source organism gender
  – Cell cycle
  – Tissue source
  – Diet
• Some can be eliminated
• All should be documented where possible
    Chips don't detect protein
• Central assumption of microarray analysis:
  The level of mRNA is positively correlated
  with protein expression levels.
  – Higher mRNA levels mean higher protein
    expression, lower mRNA means lower protein
• Other factors:
  – Protein degradation, mRNA degradation,
    polyadenylation, codon preference, translation
• This is a general discussion of issues and
  doesn't cover all pitfalls.
• Please contact if you
  have any comments, corrections or
• See associated bibliography for references
  from this presentation and further reading.
• Thanks for your attention!