# Introduction to Technology and Analysis

```
Introduction to Technology and Analysis
In depth class: SPH course 140.688

Bioinformatics MHS
– http://www.bioinformatics.jhsph.edu/mhs.html

Genome Café: SPH 3607

Software: http://www.bioconductor.org

http://www.biostat.jhsph.edu/~ririzarr/
rafa@jhu.edu
Top 10
1.    Plots, transformations
2.    Preprocessing
3.    Differential expression
4.    Annotation
5.    Gene Set Analysis
6.    Clustering
7.    Classification
8.    Experimental Design
9.    Quality Control
10.   SNP and Tiling Arrays
The Technologies

Various platforms:
• Probes can be sequenced or cloned

• Features can be high-density or circles in a grid

• One or two samples hybridized to array
Sequenced (High density)
Before Labeling
Sample 1         Sample 2

Array 1           Array 2
Before Hybridization: One Channel

Sample 1            Sample 2

Array 1              Array 2
After Hybridization

Array 1          Array 2
Scanner Image

Array 1          Array 2
Quantification

4   2         0   3   0   4        0    3

Array 1                   Array 2
Microarray Image
Two color

Courtesy of Broad Inst
Before Labeling: Two Channel
Sample 1             Sample 2

Array 1
Before Hybridization
Sample 1             Sample 2

Array 1
After Hybridization

Array 1
Scanner Image

Array 1
Quantification

4,0   2,4       0,0   3,3

Array 1
Microarray Image
Differential Expression

This will be topic 3
Microarray Data
Why log?
Why logs?
• For better or worse, fold changes are the
preferred quantification of differential
expression. Fold changes are basically ratios
• Biologists sometimes use the following weird
notation: -2 means 1/2, -3 means 1/3, etc.
Note there are no values between -1 and 1!
• Ratios are not symmetric around 1. This
makes it problematic to perform statistical
operations with ratios. We prefer logs
Why logs
• The intensity distribution has a fat right tail
• Logs of ratios are symmetric around 0:
– Average of 1/10 and 10 is about 5
– Average of log(1/10) and log(10) is 0
– Averaging ratios is almost always a bad idea!

Facts you must remember:
log(1) = 0
log(XY) = log(X) + log(Y)
log(Y/X) = log(Y) - log(X)
log(√ X) = 1/2 log(X)
The MA plots
Scatter Plot
A 45° rotation highlights a problem

This is referred to as MAplot
Quantifying differential
expression
MA plot of average log ratios
Scatter Smooth
Should we consider
gene-specific variance?
How do we summarize?
• It seems we should consider variance even
if we are not interested in inference

• The t-statistic is the most widely used summary
combining effect size and within-population variation
Another useful plot
• The volcano plot shows, for a particular
test, negative log p-value against the
effect size (M)
Remember these?
MA and volcano
Thank you!

```
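The points the slides make about logs, MA plots, and the t-statistic can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the course's own code. The function names are hypothetical, base-2 logs are assumed, A is taken to be the average log intensity (the usual convention, which the slides do not spell out), and the Welch form of the t-statistic is one possible choice among several.

```python
import math
from statistics import mean, variance


def fold_change_notation(ratio):
    """Signed fold-change notation from the slides: 1/2 -> -2, 1/3 -> -3.

    Ratios >= 1 are returned unchanged, so no value falls between -1 and 1.
    """
    return ratio if ratio >= 1 else -1.0 / ratio


def ma_values(x, y):
    """Return (M, A) for paired intensities x and y.

    M is the log ratio log2(y/x); A is the average log intensity
    (assumed definition, base-2 logs assumed).
    """
    m = [math.log2(b) - math.log2(a) for a, b in zip(x, y)]
    a_vals = [(math.log2(a) + math.log2(b)) / 2 for a, b in zip(x, y)]
    return m, a_vals


def t_statistic(group1, group2):
    """Welch two-sample t-statistic for one gene's log-scale values.

    Numerator: effect size (difference in means).
    Denominator: standard error built from the within-group variances.
    """
    n1, n2 = len(group1), len(group2)
    se = math.sqrt(variance(group1) / n1 + variance(group2) / n2)
    return (mean(group2) - mean(group1)) / se


# Averaging raw ratios versus averaging log ratios
ratios = [1 / 10, 10]
print(mean(ratios))                        # dominated by the large ratio
print(mean(math.log2(r) for r in ratios))  # essentially zero
```

Plotting M against A gives the rotated scatter plot the slides call an MA plot. A volcano plot would additionally need a p-value for each gene, which this sketch leaves out because the Python standard library has no t-distribution.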