Comparing Gene Expression Between Affymetrix Arrays
Biostat 278, January 22nd, 2008
Identification of “Interesting” Genes
• Many different methods exist for finding differentially expressed genes. The appropriate choice depends on the experimental design and the outcome measures.
Observational Methods
• Present/Absent (MAS4)
• Increase/Decrease Call
• Absolute Difference
• Fold Change
Statistical Methods
• Test for Significant Signal (MAS5)
• Statistical Test for Difference
• Confidence Interval for Fold-Change
Present/Absent
• The percentage of arrays in which the gene is called present (Affy or statistical test) can be used as a filtering criterion.
• Genes with a low percent present likely have expression in the noise range.
• Can remove “Absent” genes prior to the statistical comparison between arrays.
• Referred to as an m/n filter (Pounds and Cheng, 2005): keep a gene if it is called present on at least m of the n arrays.
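The m/n filter described above can be sketched in a few lines. This is a minimal illustration, not the Pounds and Cheng implementation: the gene names, the call table, and the choice m = 2 are all hypothetical, and detection calls are assumed to be coded as "P" (present), "M" (marginal), and "A" (absent).

```python
def percent_present(calls):
    """Fraction of arrays on which a gene is called present ('P')."""
    return sum(1 for c in calls if c == "P") / len(calls)


def mn_filter(call_table, m):
    """Keep genes called present on at least m of their n arrays."""
    return [gene for gene, calls in call_table.items()
            if sum(1 for c in calls if c == "P") >= m]


# Hypothetical detection calls on n = 4 arrays (assumed coding: P, M, A).
calls = {
    "gene_a": ["P", "P", "P", "A"],
    "gene_b": ["A", "A", "M", "A"],
    "gene_c": ["P", "A", "P", "M"],
}
kept = mn_filter(calls, m=2)  # gene_b is dropped: it is never called present
```

Marginal calls are counted as not present here; that is a design choice of this sketch, and counting them as present would be equally defensible.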
Increase/Decrease Call
• MAS 5.0 also allows users to set a baseline chip, which produces increase/decrease calls and log ratio outputs.
  – Global scaling of both chips is applied first
  – Looks at differences in each specific probe pair
  – Uses the Wilcoxon signed rank test to identify differences (problem: the probe pairs are not independent)
  – Users have to set a “perturbation” parameter, which defines a range between the chips that will not be called a change. Changing it changes the test above (and the reported p value)
  – 4 additional parameters set the p value cutoffs for the increase/marginal/decrease calls
Fold Change
• Arbitrary threshold is used for filtering (e.g. 2- or 3- fold)
  – Fold Change computation. Let: E = mean(group1), B = mean(group2)
  – FC = (E - B) / max(min(E, B), 2*Q)
  – At low expression levels, a 3-fold change can be noise
  – At high expression levels, a 1.1-fold change can be biologically important
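A small sketch of the fold change formula exactly as written above. The slides do not define Q; it is assumed here to be a noise estimate that puts a floor under the denominator, and all numeric values are made up for illustration.

```python
def fold_change(e, b, q):
    """FC = (E - B) / max(min(E, B), 2*Q), as written on the slide.

    e, b: mean expression in group 1 and group 2.
    q: assumed noise estimate (not defined in the slides).
    """
    return (e - b) / max(min(e, b), 2 * q)


# Low expression: min(E, B) = 20 is below the floor 2*Q = 50, so the floor is used.
low = fold_change(e=60.0, b=20.0, q=25.0)

# High expression: the denominator is min(E, B) = 5000.
high = fold_change(e=5500.0, b=5000.0, q=25.0)
```

With the formula in this form, equal group means give 0, and the floor 2*Q keeps small, noisy denominators from inflating the result.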
Confidence Interval for Fold Change
• Confidence intervals provide an interval estimate for a parameter of interest.
• Practical interpretation: across repeated experiments, 95% of the 95% confidence intervals constructed this way will include the true value of the parameter.
• If we use Li-Wong or RMA, we can estimate the standard error of the expression signal for a pair of arrays.
• With many arrays, we can estimate the standard error from the weighted standard deviation.
• Typically we want the CI for the fold change to exclude 1 (no change).
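One way to see the "CI excludes 1" criterion in practice is the sketch below. It is not the Li-Wong or RMA procedure. It assumes replicate arrays per group, works on log2 expression values, takes the fold change as a ratio (2 raised to the difference in mean log2 signal), and uses a normal approximation for the interval, which is rough for small numbers of arrays. The data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def fold_change_ci(log2_group1, log2_group2, level=0.95):
    """Approximate CI for the fold change (ratio) from log2 expression values.

    Returns (lower, estimate, upper) on the ratio scale.
    """
    diff = mean(log2_group1) - mean(log2_group2)
    se = sqrt(variance(log2_group1) / len(log2_group1)
              + variance(log2_group2) / len(log2_group2))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal quantile (assumption)
    return 2 ** (diff - z * se), 2 ** diff, 2 ** (diff + z * se)


# Hypothetical log2 signals for one gene on four arrays per group.
g1 = [9.1, 9.3, 8.9, 9.2]
g2 = [8.0, 8.2, 7.9, 8.1]
lower, estimate, upper = fold_change_ci(g1, g2)
excludes_one = lower > 1 or upper < 1  # True when the CI rules out "no change"
```

With only a handful of arrays, a t quantile would give a wider and more honest interval than the normal quantile used here.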
Statistical Tests
• Statistical tests seek to draw conclusions about parameters (e.g. mean gene expression in groups 1 and 2) on the basis of data in a sample.
• A statistical test involves two hypotheses: the null hypothesis H0 and the alternative hypothesis HA.
• The null hypothesis is usually a hypothesis of no difference.
Statistical Errors
• We can never know if we have made a statistical error, but we can quantify the chance of making such errors. What are the consequences of errors?
• Type I error (false positive): rejecting H0 when it is true. Probability = α
• Type II error (false negative): failing to reject H0 when it is false. Probability = β
Basic Statistical Comparisons between groups
• t-Tests & ANOVA
  – Much better than fold change thresholds
  – Need to characterize (or stabilize) variances
  – Assume a normal distribution of expression values. Transformation (e.g. log) helps.
• Non-Parametric Tests: Wilcoxon or Kruskal-Wallis
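As an illustration of the t-test bullet above, the sketch below computes a per-gene t statistic on log2-transformed signals. The slides only say "t-tests"; the Welch (unequal variance) form is an assumption of this sketch, and the signal values are hypothetical. Turning the statistic into a p value needs the t distribution, which is left to a statistics package.

```python
from math import log2, sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical raw signals for one gene; log2 transform before testing.
group1 = [560.0, 610.0, 480.0, 590.0]
group2 = [260.0, 290.0, 240.0, 275.0]
t, df = welch_t([log2(v) for v in group1], [log2(v) for v in group2])
```

Unlike a fold change threshold, the statistic scales the difference in means by its estimated variability, so a noisy gene needs a larger difference to stand out.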
Multiple testing problem
• We are testing 10,000+ genes. With a cutoff of p < 0.05, about 5% of the genes with no true difference will be called significant: up to roughly 500 false positives among 10,000 genes!
• Multiple testing corrections adjust the p value
• Reporting only unadjusted p values is very deceptive.
• Bonferroni multiplies each individual p value by the number of tests
  – Controls the family-wise error rate in the worst case (strong control), but is too conservative
• False Discovery Rate (FDR) is a better approach for exploratory microarray studies
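The arithmetic behind the multiple testing problem and the Bonferroni adjustment can be sketched as follows. The p values are invented for illustration, and capping adjusted values at 1 is a common convention rather than something stated in the slides.

```python
def bonferroni(p_values):
    """Multiply each p value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Expected number of false positives if no gene is truly changed.
n_genes = 10_000
alpha = 0.05
expected_false_positives = n_genes * alpha

# Hypothetical raw p values from four tests.
adjusted = bonferroni([0.000001, 0.0004, 0.03, 0.6])
```

With 10,000 tests, a raw p value has to be below 0.05 / 10,000 = 0.000005 to survive the adjustment, which is why the method is described as too conservative for exploratory work.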
False Discovery Rate
• Proposed by Benjamini & Hochberg
• Simple procedure that works on the p values of the individual tests
  – Allows combinations of different kinds of tests
• Controls the expected proportion of false positives among the rejected hypotheses (those declared non-null)
  – FDR * number of rejections = expected upper bound on the number of those rejections that are false.
• Weak control: the true number of false rejections could be larger than the expected number.
• Very effective when the number of comparisons is large.
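A minimal sketch of the Benjamini-Hochberg step-up rule: sort the m p values, find the largest rank k with p(k) <= k * q / m, and reject the k smallest. The function name, the FDR level q = 0.05, and the p values are illustrative choices, not taken from the slides.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of hypotheses rejected at FDR level q (step-up rule)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff_rank = rank  # remember the largest rank meeting the criterion
    return sorted(order[:cutoff_rank])


# Hypothetical p values from ten tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p, q=0.05)
```

Because the procedure only needs p values, they can come from different kinds of tests, as noted above. The step-up rule also means a p value can be rejected even if it misses its own threshold, as long as a larger one meets its threshold.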
Permutation multiple testing correction procedure
• Permutation testing
  – Scramble (randomly permute) the data to randomize the relationship of interest, leaving other structure unchanged
  – Apply (any) testing procedure to the permuted data
  – Repeat K times
  – The proportion of times the randomly permuted data scores as well as or better than the actual data is the p value
• As K increases, this becomes a very accurate measure of the p value. It makes no distributional assumptions about the test statistic, so it works with any test.
• Very time consuming (typical K = 2000)
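The permutation steps above can be sketched for a single gene as follows. The test statistic (absolute difference in group means), the fixed random seed, and the data are assumptions made for illustration. Counting the observed labelling as one of the permutations (the +1 terms) is a common refinement that is not in the slides; it keeps the estimated p value from being exactly zero.

```python
import random
from statistics import mean


def permutation_p_value(group1, group2, k=2000, seed=0):
    """Two-sided permutation p value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group1) - mean(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    hits = 0
    for _ in range(k):
        rng.shuffle(pooled)  # scramble the group labels
        stat = abs(mean(pooled[:n1]) - mean(pooled[n1:]))
        if stat >= observed:  # permuted data scores at least as well
            hits += 1
    return (hits + 1) / (k + 1)  # count the observed labelling too


# Hypothetical log2 signals for one gene, five arrays per group.
g1 = [9.1, 9.3, 8.9, 9.2, 9.0]
g2 = [8.0, 8.2, 7.9, 8.1, 8.3]
p = permutation_p_value(g1, g2)
```

Any statistic can replace the difference in means without changing the rest of the loop. Running K = 2000 permutations for each of 10,000+ genes is what makes the approach time consuming.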
Consequences of multiple testing corrections
• Multiple testing corrections cost statistical power (probability of correctly rejecting the null when there really is a difference)
– The more genes tested, the greater the loss
• Therefore, eliminate as many genes as possible from consideration before testing for significant differences.
– E.g. by eliminating genes that show essentially no change across the chips, without regard to experimental condition.
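A sketch of this kind of filtering, under the assumption that "didn't change" is measured by the variance of a gene's log2 signal across all chips. The threshold and the data are hypothetical. The key point is that group labels are never consulted, so the filter does not peek at the comparison of interest.

```python
from statistics import variance


def filter_by_variance(expression, min_variance):
    """Keep genes whose variance across all chips is at least min_variance.

    Group labels are never used, so the filter ignores experimental condition.
    """
    return {gene: values for gene, values in expression.items()
            if variance(values) >= min_variance}


# Hypothetical log2 signals across six chips.
expression = {
    "flat_gene":    [7.0, 7.0, 7.1, 7.0, 6.9, 7.0],
    "varying_gene": [6.0, 6.2, 5.9, 8.1, 8.3, 7.9],
}
kept = filter_by_variance(expression, min_variance=0.1)
```

Fewer genes entering the tests means a smaller multiplier in Bonferroni and more favorable thresholds in FDR procedures, which recovers some of the lost power.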