Estimating Gene Expression Signal For Affymetrix Arrays

Comparing Gene Expression Between Affymetrix Arrays
Biostat 278, January 22nd, 2008

Identification of “Interesting” Genes
• Many different methods exist for finding differentially expressed genes. The choice depends on the experimental design and the outcome measures.
• Observational methods:
– Present/Absent (MAS4)
– Increase/Decrease Call
– Absolute Difference
– Fold Change
• Statistical methods:
– Test for Significant Signal (MAS5)
– Statistical Test for Difference
– Confidence Interval for Fold-Change

Present/Absent
• The percentage of arrays in which the gene is called present (Affy or Statistical Test) can be used as a filtering criterion.
• Genes with a low percent present likely have expression in the noise range.
• “Absent” genes can be removed prior to the statistical comparison between arrays.
• Referred to as an m/n filter (Pounds/Cheng 2005)

Increase/Decrease Call
• MAS 5.0 also allows users to set a baseline chip to allow increase/decrease and log ratio outputs.
– Global scaling of both chips first
– Looks at differences in each specific probe pair
– Uses the Wilcoxon signed rank test to identify differences (a problem, because the probe pairs are not independent)
– Users have to set a “perturbation”, which defines a range between the chips that will not be called a change. This setting changes the test above (and the reported p value)
– 4 additional parameters set the p value cutoffs for increase/marginal/decrease calls

Fold Change
• An arbitrary threshold is used for filtering (e.g. 2- or 3-fold)
– Fold change computation: let E = mean(group1), B = mean(group2)
– FC = (E-B)/max(min(E,B),2*Q)
– At low expression levels, a 3-fold change can be noise
– At high expression levels, a 1.1-fold change can be biologically important

Confidence Interval for Fold Change
• Confidence intervals provide an interval estimate for a parameter of interest.
• Practical interpretation: over repeated experiments, a 95% confidence interval will include the true value of the parameter 95% of the time.
• If we use Li-Wong or RMA, we can estimate the standard error of the expression signal for a pair of arrays.
• For many arrays, we can estimate the standard error based on the weighted standard deviation.
• Typically we want the CI for the fold change to exclude 1.

Statistical Tests
• Statistical tests seek to make conclusions about parameters (e.g. mean gene expression in groups 1 and 2) on the basis of data in a sample.
• The structure of a statistical test involves two hypotheses: the null hypothesis H0 and the alternative hypothesis HA.
• The null hypothesis is usually a hypothesis of no difference.

Statistical Errors
• We can never know if we have made a statistical error, but we can quantify the chance of making such errors. What are the consequences of errors?
• The probability of a Type I error = α : False Positive
• The probability of a Type II error = β : False Negative

Basic Statistical Comparisons between groups
• t-Tests & ANOVA
– Much better than fold change thresholds
– Need to characterize (or stabilize) variances
– Assume a normal distribution of expression values. Transformation helps.
• Non-Parametric Tests: Wilcoxon or Kruskal-Wallis

Multiple testing problem
• We are testing 10,000+ genes. With a cutoff of p<0.05, about 5% of the genes with no true difference are expected to be false positives, which can mean hundreds of genes!
• Multiple testing corrections adjust the p value
• It is very deceptive to report unadjusted p values.
• Bonferroni multiplies each individual p value by the number of tests
– Worst case family error rate, too conservative, provides strong control
• False Discovery Rate (FDR) is a better approach for exploratory microarray studies

False Discovery Rate
• Proposed by Benjamini & Hochberg
• Simple procedure that works on the p values of individual tests
– Allows combinations of different kinds of tests
• Controls the expected proportion of false positives among the rejected hypotheses (the genes called significant)
– FDR * number of rejections = expected maximum number of those rejections that were false.
• Weak control.
The true number of false rejections can be larger than the expected number.
• Very effective when the number of comparisons is large.

Permutation multiple testing correction procedure
• Permutation testing
– Scramble (randomly permute) the data to randomize the relationship of interest, leaving other structure unchanged
– Run (any) testing procedure on the permuted data
– Repeat K times
– The p value is the proportion of times the randomly permuted data scores as well as or better than the actual data
• As K increases, this becomes a very accurate estimate of the p value. It makes no distributional assumptions, so it works with any test.
• Very time consuming (typical K = 2000)

Consequences of multiple testing corrections
• Multiple testing corrections cost statistical power (the probability of correctly rejecting the null when there really is a difference)
– The more genes tested, the greater the loss
• Therefore, eliminate as many genes as possible from consideration before testing for significant differences.
– E.g. by eliminating genes that did not change at all on any chip, without regard to experimental condition.
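
The filtering advice above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names, expression values, the variance cutoff and the helper name filter_low_variance are all hypothetical, and variance across all chips is just one possible way to measure "did not change". The group labels are never consulted, which is the point of the slide.

```python
from statistics import pvariance


def filter_low_variance(expression, min_variance):
    """Keep genes whose variance across ALL chips is at least min_variance.

    Group labels are deliberately not used, so the filter does not
    depend on the experimental condition.
    """
    return {
        gene: values
        for gene, values in expression.items()
        if pvariance(values) >= min_variance
    }


# Hypothetical log-scale expression values on four chips.
expression = {
    "gene_a": [7.0, 7.0, 7.0, 7.0],  # flat on every chip
    "gene_b": [5.0, 5.2, 8.9, 9.1],
    "gene_c": [6.0, 6.0, 6.1, 6.0],  # nearly flat
    "gene_d": [4.0, 9.0, 4.2, 8.8],
}

kept = filter_low_variance(expression, min_variance=0.5)

# With fewer tests, a Bonferroni-style per-gene cutoff is less strict.
alpha = 0.05
threshold_before = alpha / len(expression)
threshold_after = alpha / len(kept)
print(sorted(kept), threshold_before, threshold_after)
```

Here two of the four genes are dropped, so the per-gene Bonferroni cutoff relaxes from 0.05/4 to 0.05/2. On a real array the same effect applies to thousands of genes.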
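
The permutation steps listed on the permutation testing slide can be sketched for a single gene as follows. This is a minimal sketch, not the procedure used in the course: the test statistic (absolute difference in group means), the function name permutation_p_value, the seed and the example values are assumptions made for illustration. Any other statistic could be substituted.

```python
import random
from statistics import mean


def permutation_p_value(group1, group2, k=2000, seed=0):
    """Two-sided permutation p value for a difference in group means.

    The group labels are scrambled k times. The p value is the
    proportion of permutations whose statistic is at least as large
    as the one observed on the real data.
    """
    rng = random.Random(seed)
    observed = abs(mean(group1) - mean(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    hits = 0
    for _ in range(k):
        rng.shuffle(pooled)  # randomize which chip belongs to which group
        stat = abs(mean(pooled[:n1]) - mean(pooled[n1:]))
        if stat >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / k


# Hypothetical expression values for one gene in two groups.
control = [7.1, 6.9, 7.3, 7.0, 7.2]
treated = [9.0, 9.4, 8.8, 9.1, 9.3]
p = permutation_p_value(control, treated)
print(p)
```

With K = 2000 the loop runs 2000 times per gene, which is why the slide calls the approach time consuming when it is repeated over 10,000+ genes.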
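
The difference between the Bonferroni and Benjamini & Hochberg corrections described on the multiple testing slides can be seen on a small example. The sketch below computes adjusted p values with both methods; the function names and the four p values are hypothetical, and the Benjamini & Hochberg adjustment is written in its usual step-up form.

```python
def bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini & Hochberg adjusted p values (step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that the
    # adjusted values never increase as the raw p values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical unadjusted p values for four genes.
p_values = [0.01, 0.04, 0.03, 0.005]

bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)

n_bonf = sum(p <= 0.05 for p in bonf)  # 2 genes pass
n_bh = sum(p <= 0.05 for p in bh)      # 4 genes pass
print(bonf, bh, n_bonf, n_bh)
```

At a 0.05 cutoff Bonferroni keeps two of the four genes while the FDR adjustment keeps all four, which illustrates why the slides call Bonferroni too conservative for exploratory studies.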
