Normalization - CBS

					Normalization


    Getting the numbers comparable




                           DNA Microarray Bioinformatics - #27612
The DNA Array Analysis Pipeline

 • Question
 • Experimental Design
 • Array design and Probe design, or Buy Chip/Array
 • Sample Preparation
 • Hybridization
 • Image analysis
 • Normalization and Expression Index Calculation
 • Comparable Gene Expression Data
 • Statistical Analysis, Fit to Model (time series)
 • Advanced Data Analysis
    – Clustering, PCA, Classification, Promoter Analysis
    – Meta analysis, Survival analysis, Regulatory Network

Expression intensities are not just target
concentrations
 • Sample contamination
 • RNA quality
 • Sample preparation
 • Dye effect (Cy3/Cy5)
 • Probe affinity
 • Hybridization
 • Unspecific signal (background)
 • Saturation
 • Spotting
 • Other issues related to array manufacturing
 • Image segmentation
 • Array spatial effects


Two kinds of variation in the signal

    Global variation (systematic)
     • RNA quality
     • Sample preparation
     • Dye
     • Hybridization
     • Photodetection

    Gene-specific variation (stochastic)
     • Spotting (size and shape)
     • Cross-hybridization
     • Dye
     • Biological variation
        – Effect
        – Noise

Sources of variation

 Global variation: systematic
  • Similar effect on many measurements
  • Corrections can be estimated from data
  • Handled by normalization

 Gene-specific variation: stochastic
  • Too random to be explicitly accounted for
  • “Noise”
  • Handled by statistical testing
Calibration = Normalization = Scaling




Nonlinear normalization




Lowess Normalization



 [Figure: M-A plot, M plotted against A]



     One of the most commonly utilized normalization
     techniques is the LOcally Weighted Scatterplot
     Smoothing (LOWESS) algorithm.
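
     A minimal sketch of LOWESS normalization on an M-A plot, assuming the
     usual two-channel definitions M = log2(R/G) and A = 0.5 * log2(R*G).
     The function names and the `frac` parameter are illustrative choices,
     and the robustness iterations of the full LOWESS algorithm are left
     out to keep the example short.

```python
import math

def ma_transform(red, green):
    """Convert two-channel intensities to (M, A) values on the log2 scale."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a

def lowess_fit(x, y, frac=0.5):
    """Locally weighted linear fit of y on x, evaluated at every x.

    Uses tricube weights over the nearest frac * n points. This is a
    simplified LOWESS without the robustness iterations.
    """
    n = len(x)
    k = max(2, int(math.ceil(frac * n)))
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def lowess_normalize(red, green, frac=0.5):
    """Subtract the intensity-dependent trend from M; returns (M_normalized, A)."""
    m, a = ma_transform(red, green)
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)], a
```

     After normalization the M values should scatter around zero over the
     whole range of A, so a dye bias that depends on intensity is removed.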


The Qspline method




 From the empirical distribution, a number of quantiles are calculated for
 each of the channels to be normalized (one channel shown in red) and for
 the reference distribution (shown in black).
 A QQ-plot is made, and a normalization curve is constructed by fitting a
 cubic spline function.
 As a reference, one can use an artificial “median array” for a set of arrays
 or a log-normal distribution, which is a good approximation.
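
 The steps above can be sketched as follows. This is only an illustration
 under simplifying assumptions: it interpolates the quantile pairs with a
 natural cubic spline instead of fitting smoothing splines, it requires the
 channel quantiles to be distinct, and the names `qspline_like_normalize`
 and `n_quantiles` are made up for the example.

```python
import bisect

def quantiles(values, probs):
    """Empirical quantiles, interpolating linearly between order statistics."""
    s = sorted(values)
    out = []
    for p in probs:
        pos = p * (len(s) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        out.append(s[lo] + (pos - lo) * (s[hi] - s[lo]))
    return out

def natural_cubic_spline(xs, ys):
    """Return f(x) interpolating the knots (xs, ys); xs must be strictly increasing."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    # Tridiagonal system for the second derivatives m, with m[0] = m[-1] = 0.
    sub, diag, sup, rhs = [0.0] * n, [1.0] * n, [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        sub[i], diag[i], sup[i] = h[i - 1], 2 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    for i in range(1, n):  # forward elimination
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * n
    m[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        m[i] = (rhs[i] - sup[i] * m[i + 1]) / diag[i]

    def f(x):
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 2)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6) * b)
    return f

def qspline_like_normalize(channel, reference, n_quantiles=10):
    """Map channel values onto the reference distribution via a QQ spline."""
    probs = [i / (n_quantiles - 1) for i in range(n_quantiles)]
    curve = natural_cubic_spline(quantiles(channel, probs),
                                 quantiles(reference, probs))
    return [curve(v) for v in channel]
```

 The quantiles run from the minimum to the maximum of the channel, so every
 value to be normalized falls inside the range covered by the spline.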


Once again…qspline




 [Figure: accumulating quantiles]

 When many microarrays are to be normalized to each other, an average
 array can be used as the target.




Invariant set normalization (Li and Wong)








 An invariant set of probes is used
 - Probes that do not change intensity rank between arrays
 - A piecewise linear median line is calculated
 - This curve is used for normalization
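
 A rough sketch of these three steps, not the dChip implementation. It
 assumes a single selection pass with a fixed rank-difference threshold
 (`max_rank_diff`, an illustrative parameter), whereas the published method
 selects the invariant set iteratively. The curve is built from per-bin
 medians, and the bin medians of the array are assumed to be distinct.

```python
from statistics import median

def ranks(values):
    """Rank of each value (0 = smallest); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result

def invariant_set(array, baseline, max_rank_diff=0.05):
    """Indices of probes whose rank differs by at most max_rank_diff * n."""
    n = len(array)
    ra, rb = ranks(array), ranks(baseline)
    return [i for i in range(n) if abs(ra[i] - rb[i]) <= max_rank_diff * n]

def median_curve(x, y, n_bins=5):
    """Knots of a piecewise linear curve through per-bin medians of (x, y)."""
    pairs = sorted(zip(x, y))
    size = max(1, len(pairs) // n_bins)
    knots = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        knots.append((median(p[0] for p in chunk), median(p[1] for p in chunk)))
    return knots

def interpolate(knots, x0):
    """Linear interpolation between knots; end segments are extrapolated."""
    j = 1
    while j < len(knots) - 1 and knots[j][0] < x0:
        j += 1
    (x1, y1), (x2, y2) = knots[j - 1], knots[j]
    return y1 + (y2 - y1) * (x0 - x1) / (x2 - x1)

def invariant_set_normalize(array, baseline):
    """Normalize one array to the baseline using its invariant probes only."""
    inv = invariant_set(array, baseline)
    knots = median_curve([array[i] for i in inv], [baseline[i] for i in inv])
    return [interpolate(knots, v) for v in array]
```

 Probes that change rank strongly, such as differentially expressed genes,
 do not influence the curve, but they are still transformed by it.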




Spatial normalization




 [Figure panels: raw data; after intensity normalization; spatial bias
 estimate; after spatial normalization]




Expression index value

 Some microarrays have multiple probes addressing
   the expression of the same target

   – Affymetrix GeneChips have 11-20 probe pairs per gene
      - Perfect Match (PM)
      - MisMatch (MM)

   PM: CGATCAATTGCACTATGTCATTTCT
   MM: CGATCAATTGCAGTATGTCATTTCT

   However, for downstream analysis we often want to deal
   with only one value per gene.
   Therefore we want to collapse the intensities from many
   probes into one value: a gene expression index value.
Expression index calculation

 Simplest method? Median
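
 As a small illustration of the median approach, the sketch below collapses
 each probe set to the median of its probe intensities. The gene names and
 intensity values are hypothetical.

```python
from statistics import median

def median_expression_index(probe_sets):
    """Collapse each probe set to the median of its probe intensities."""
    return {gene: median(values) for gene, values in probe_sets.items()}

# Hypothetical probe-level intensities for two genes on one array.
example = {
    "geneA": [120.0, 135.0, 128.0, 900.0, 131.0],  # one probe looks like an outlier
    "geneB": [40.0, 42.0, 39.0, 41.0],
}
index = median_expression_index(example)
```

 The median is hardly affected by the single extreme probe in "geneA", which
 is the main reason to prefer it over a plain mean.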




 But more sophisticated methods exist:
 dChip, RMA and MAS 5




dChip (Li & Wong)

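  Model:       $PM_{ij} = \theta_i \phi_j + \varepsilon_{ij}$
     ($\theta_i$: expression index in array $i$; $\phi_j$: sensitivity of probe $j$)
     Outlier removal:
     – Identify extreme residuals
     – Remove
     – Re-fit
     – Iterate
  Distribution of errors $\varepsilon_{ij}$ assumed
    independent of signal strength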
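
  A simplified sketch of fitting the multiplicative model with outlier
  removal, not the dChip code itself. It assumes plain alternating least
  squares, a fixed residual cutoff (`cutoff`, in units of the residual
  standard deviation) and a floor `min_resid` below which residuals are
  treated as rounding noise. All names and default values are illustrative.

```python
import math

def fit_rank_one(pm, keep, n_iter=50):
    """Least-squares fit pm[i][j] ~ theta[i] * phi[j] over cells with keep[i][j].

    Assumes every array and every probe retains at least one kept cell.
    """
    n_arrays, n_probes = len(pm), len(pm[0])
    theta, phi = [0.0] * n_arrays, [1.0] * n_probes
    for _ in range(n_iter):
        for i in range(n_arrays):
            cols = [j for j in range(n_probes) if keep[i][j]]
            theta[i] = (sum(pm[i][j] * phi[j] for j in cols)
                        / sum(phi[j] ** 2 for j in cols))
        for j in range(n_probes):
            rows = [i for i in range(n_arrays) if keep[i][j]]
            phi[j] = (sum(pm[i][j] * theta[i] for i in rows)
                      / sum(theta[i] ** 2 for i in rows))
        # Fix the scale so that sum(phi^2) equals the number of probes.
        scale = math.sqrt(sum(p * p for p in phi) / n_probes)
        phi = [p / scale for p in phi]
        theta = [t * scale for t in theta]
    return theta, phi

def dchip_like_fit(pm, cutoff=3.0, max_rounds=5, min_resid=1e-6):
    """Fit, flag extreme residuals, remove them, re-fit, and iterate."""
    n_arrays, n_probes = len(pm), len(pm[0])
    keep = [[True] * n_probes for _ in range(n_arrays)]
    theta, phi = fit_rank_one(pm, keep)
    for _ in range(max_rounds):
        resid = {(i, j): pm[i][j] - theta[i] * phi[j]
                 for i in range(n_arrays) for j in range(n_probes) if keep[i][j]}
        sd = math.sqrt(sum(r * r for r in resid.values()) / len(resid))
        extreme = [cell for cell, r in resid.items()
                   if abs(r) > max(cutoff * sd, min_resid)]
        if not extreme:
            break
        for i, j in extreme:
            keep[i][j] = False
        theta, phi = fit_rank_one(pm, keep)
    return theta, phi, keep
```

  The returned `theta` holds one expression index per array, and `keep`
  records which probe measurements were excluded as outliers.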


                                       (Li and Wong, 2001)
RMA

 Robust Multi-array Average (RMA) expression
 measure (Irizarry et al., Biostatistics, 2003)

 For each probe set, re-write $PM_{ij} = \theta_i \phi_j$ as:
    $\log(PM_{ij}) = \log(\theta_i) + \log(\phi_j)$

 Fit this additive model by iteratively re-weighted
 least-squares or median polish
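
 A minimal sketch of the median polish fit for one probe set. It assumes the
 PM values have already been background-corrected and normalized, with
 arrays as rows and probes as columns. The function names are illustrative
 and this is not the `affy` implementation.

```python
import math
from statistics import median

def median_polish(table, n_iter=10):
    """Tukey's median polish: table[i][j] ~ overall + row[i] + col[j]."""
    n_rows, n_cols = len(table), len(table[0])
    resid = [row[:] for row in table]
    overall = 0.0
    row_eff, col_eff = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep out row medians, then re-centre the column effects.
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep out column medians, then re-centre the row effects.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff, resid

def rma_like_summary(pm):
    """log2 expression value per array for one probe set (pm[i][j] > 0)."""
    logs = [[math.log2(v) for v in row] for row in pm]
    overall, row_eff, _, _ = median_polish(logs)
    return [overall + r for r in row_eff]
```

 Because medians are used instead of means, a single aberrant probe
 measurement ends up in the residuals and leaves the array estimates
 essentially unchanged.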



MAS 5
 MicroArray Suite version 5 uses


 $\mathrm{Signal} = \mathrm{TukeyBiweight}\{\log(PM_j - MM^*_j)\}$

 $MM^*$ is an adjusted MM that is never bigger than PM.
 Tukey biweight is a robust averaging procedure with weights
 and outlier rejection.
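
 A sketch of a one-step Tukey biweight average applied to the formula above.
 The tuning constants `c=5` and `eps=1e-4` are assumptions for this example,
 and the adjusted mismatch values `mm_star` are taken as given input because
 their calculation is not covered here.

```python
import math
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a weighted mean that rejects outliers.

    Points further than c * MAD (plus eps) from the median get weight zero.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    total_w = total_wx = 0.0
    for v in values:
        u = (v - center) / (c * mad + eps)
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        total_w += w
        total_wx += w * v
    return total_wx / total_w

def mas5_like_signal(pm, mm_star):
    """TukeyBiweight{log2(PM_j - MM*_j)} for one probe set.

    Requires mm_star[j] < pm[j] for every probe pair j.
    """
    logs = [math.log2(p - m) for p, m in zip(pm, mm_star)]
    return tukey_biweight(logs)
```

 Values close to the median get weights near one, and the weights fall
 smoothly to zero for values far from it.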




 Methods compared on expression variance

  [Figure: standard deviation of gene measures from 20 replicate arrays,
  plotted against expression level]

RMA: blue and red
MAS5: green
dChip: black
                                                       From Terry Speed

Robustness
MAS 5.0
 [Figure: log fold change estimate from 20 µg cRNA plotted against
 log fold change estimate from 1.25 µg cRNA]
 (Irizarry et al., Biostatistics, 2003)


Robustness
dChip
 [Figure: log fold change estimate from 20 µg cRNA plotted against
 log fold change estimate from 1.25 µg cRNA]
 (Irizarry et al., Biostatistics, 2003)




Robustness
RMA
 [Figure: log fold change estimate from 20 µg cRNA plotted against
 log fold change estimate from 1.25 µg cRNA]
 (Irizarry et al., Biostatistics, 2003)




All of this is implemented in…




                         R

       In the Bioconductor package ‘affy’ (Gautier et al., 2003).
References
 Li and Wong (2001). Model-based analysis of oligonucleotide arrays: Model
 validation, design issues and standard error application.
 Genome Biology 2:1–11.

 Irizarry, Bolstad, Collin, Cope, Hobbs and Speed (2003). Summaries of Affymetrix
 GeneChip probe level data.
 Nucleic Acids Research 31(4):e15.

 Affymetrix (2001). Affymetrix Microarray Suite User Guide, version 5 edition.
 Affymetrix, Santa Clara, CA.

 Gautier, Cope, Bolstad and Irizarry (2003). affy - an R package for the analysis of
 Affymetrix GeneChip data at the probe level. Bioinformatics.





				