PowerPoint Presentation - Ohms - Microarray talk by wuyunqing


									RNA sample and array quality checking
         and normalization

                        Stephen Ohms BRF/CBiS

Topics for this talk:

 Some terminology

 RNA sample quality checking

 Selection of Affymetrix QC measurements

 Chip quality checking – checking probe array images for defects

 Normalization – how can we make different arrays comparable
Overview of Affymetrix GeneChip technology

An Affymetrix GeneChip microarray consists of a 1.28 cm2 glass square which is
divided into a grid of about 400,000 probe cells (each about 20 – 24 mm x 20 –
24 mm).

Each probe cell contains about 106 copies of one specific 25-mer oligonucleotide.

There is a set of 32 - 40 oligonucleotide probe cells for each gene or mRNA
The set of 32 - 40 probe cells consists of 16 - 20 perfect match (PM) probes
and a paired set of 16 - 20 mismatch (MM) probes.

Perfect match (PM): A 25-mer complementary to a reference sequence of
interest (e.g., part of a gene).

Mismatch(MM): same sequence as PM but with a single base change for the
middle (13th) base to its Watson-Crick complement .

The purpose of the MM probe is to detect non-specific binding and
background noise.
             MAS Data Architecture

      Assay Steps             Output file
Prepare          Define
Targets           Exp.

 Process probe array

      Scan Chip                  *.DAT

Compute cell intensities          *.CEL     *.RPT

Analyze cell intensities         *.CHP
Data file architecture in Affymetrix Microarray Suite

The main software from Affymetrix is MAS 5.0 (Microarray Suite Version 5.0).
Important files include:

.DAT file: image file produced by the scanner, contains 10 x 10 pixels for each
PM and MM probe cell on the chip (~107 pixels), ~50MB size

.CEL file: cell intensity file, contains PM and MM probe values. The .CEL file
contains one value for every PM and MM cell on the chip.

.CHP file: combines (or summarizes) the 20 PM and 20 MM values for each
probe set to give a single expression level for each probe set. The .CHP file
contains one expression value for every probe set on the chip.
           RNA sample quality checking
RNA quality and quantity is of great importance to the success and
reproducibility of gene expression profiling experiments. There are three
complementary methods to check RNA quality:

(1) run the initial total RNA on an agarose gel and examine the ribosomal RNA
    bands. Non-distinct ribosomal RNA bands indicate degradation.

(2) RNA purity and yield can be determined by optical density (OD) absorbance
    measurements at wavelengths of 260 and 280 nm. 260/280 absorbance
    readings should be carried out for both total RNA and biotinylated cRNA.

     Acceptable 260/280 ratios fall in the range of 1.8 to 2.1.

     Ratios below 1.8 indicate possible protein contamination.

     Ratios above 2.1 indicate the presence of degraded RNA, truncated
      cRNA transcripts, and/or excess free nucleotides.
(3) Assessing RNA sample quality with the Agilent Bioanalyzer:

Evaluation of RNA sample quality can also be carried out with the Agilent
Bioanalyzer which is similar to a gel in concept.

This has the advantage of requiring less RNA than a gel (e.g. 200 pg of total
RNA, 1 mL volume).

Electropherograms (fr: electrophoresis) detect and measure the ribosomal 5S,
18S, and 28S bands.
Assessing RNA quality with the Bioanalyzer
                     When viewed as a virtual gel, the 28S,
                     18S, and 5S rRNA bands are easily
                     The sample RNA in Lane 7 [outlined in
                     red] demonstrates how degraded RNA
                     Conversely lane 8 contains good quality
                     RNA [outlined in blue] and the
                     corresponding electropherogram can be
                     viewed in Figure 3 in the next slide.

                     RNA samples with visible degradation
                     should not be further processed.
Assessing RNA quality with the Bioanalyzer: electropherograms
 Ideally, the ratio of 28S/18S bands should be close to 2, but samples that show
 clear 18S and 28S peaks are acceptable

Poor quality sample: The 18S and 28S        Good quality sample: The 5S, 18S, and
rRNA peaks are jagged and there are         28S rRNA peaks are clean without
many small peaks (fragmented RNA)           additional erroneous spikes.
throughout the run demonstrating
                   Array quality checking

Inspect the probe array image (.DAT file) for the presence of image artifacts
(i.e., high/low intensity spots, scratches, high regional, or overall
background,etc.) on the array.

There should be no white speckling, holes, smudges, areas of saturation or
uneven hybridization on the chip.

Defects in probe array images may also be seen in:

(1) Histograms of probe intensities

(2) q-q plots of probe intensities
                              Raw PM intensity values for array #1                                    Log2 PM intensity values for array #1






                   0   2000       4000       6000        8000     10000   12000                   7   8       9           10        11      12   13

                                         Raw PM intensity value                                                   Log2 PM intensity value

     Histograms of raw PM or MM intensities are usually skewed to the right.

     Thus they can be transformed to a log2 scale for plotting. The logarithm
     function tends to squeeze together the larger values in a data set (values larger
     than 1) and stretches out the smaller values (values less than 1). The same
     data has been plotted using the two scales above.
       Why we do log transformation of probe intensities

   The logarithm function tends to squeeze together the larger values in a data set
   (values larger than 1) and stretches out the smaller values (values less than 1).
   This is illustrated in the plot of a log function below e.g. y = log(x)

                       2.0 2.2 2.6 2.8
                                           The first two values on the x axis are
1.03                                       2.0 and 2.2. Their logarithms, 0.69
0.96                                       and 0.79 on the y axis are much
0.79                                       closer.
                                           The second two values, 2.6 and 2.8,
                                           are squeezed even more. Their
                                           logarithms are 0.96 and 1.03.
Log transformation of probe intensities
                      0.2 0.24 0.4 0.45


 The values of 0.4 and 0.45 on the x axis have logarithms (-0.92 and

 -0.80) that are further apart. The values of 0.20 and 0.25 on the x axis
 are stretched even further. Their logarithms are -1.61 and -1.39,
Checking array quality with images and histograms

The following set of arrays show one normal array and arrays with
different types of defect.

The probe array images and histograms of the log2 intensities clearly
show the defects.
q-q plots provide a visual
comparison of two

A q-q plot is a plot of the
quantiles of one data set
against the quantiles of the
second data set.

The elements of each
population are sorted (or
“ranked”), and the quantiles
of the first data set are
plotted against the
quantiles of the second data

By a quantile, we mean the fraction (or percent) of points below the given value.
That is, the 0.3 (or 30%) quantile is the point at which 30% percent of the data
fall below and 70% fall above that value.
          Example of a q-q plot
Suppose our data sets are:
Data set 1: 6.0, 3, 1, 4.1, 5.2, 2

Data set 2: 4.1, 1.2, 3.2, 2.1, 6.1, 5.0

We sort (or “rank”) the data:
Data set 1 -> 1, 2, 3, 4.1, 5.2, 6.0

Data set 2 -> 1.2 2.1, 3.2, 4.1, 5.0, 6.1

Then we plot the corresponding quantiles against each other:
e.g 1 in data set 1 vs 1.2 in data set 2,
2 in data set 1 vs 2.1 in data set 2, etc
                   q-q plot example
A 45-degree reference line is also                             q-q plot example

plotted. If the two sets come from

a population with the same
distribution, the points should fall

approximately along this
reference line.

                                       dataset 2

The greater the departure from
this reference line, the greater the

evidence for the conclusion that
the two data sets have come from                       1   2    3               4   5   6

populations with different                                          dataset 1

Many aspects of the
distributions can be
simultaneously tested.

For example, shifts in
location, shifts in scale,
changes in symmetry,
and the presence of
outliers can all be
detected from this plot.
If the two data sets come
from populations whose
distributions differ only by
a shift in location, the
points should lie along a
straight line that is
displaced either up or
down from the 45-degree
reference line.
This q-q plot shows that:

These 2 batches do not
appear to have come
from populations with a
common distribution.
The batch 1 values are
significantly higher than
the corresponding batch
2 values.
The differences are
increasing from values
525 to 625. Then the
values for the 2 batches
get closer again.
Checking array quality with histograms and q-q plots

The histograms of the log2 PM intensities and the q-q plots of the log2
PM intensities and log2 MM intensities clearly show the defects in the
following arrays.

The corresponding histograms and q-q plots of the log2 MM intensities
show the defects in a similar manner.
Histograms of .CEL intensities are a good visualization tool for
identifying saturation, which is seen as an additional peak at the
highest log intensity in the plot.

Saturated probes should be excluded from further analysis.
                           Box plots

Box plots are another good visualization tool for analyzing the overall
intensities of all probes across the array.
The box is drawn from the 25th and 75th percentiles in the distribution
of intensities.
The median, or 50th percentile, is drawn inside the box. The whiskers
(lines extending from the box) describe the spread of the data.
                            Affymetrix QC
3'/5' ratio of housekeeping genes:

Reverse transcriptase synthesizes cDNA starting from the 3'-end of an mRNA
and finishing at the 5'-end.

Thus most GeneChip expression arrays contain probes for the regions
corresponding to the 3', middle and 5'-end of the house keeping genes such as
GAPDH and Actin.

The ratio of signal intensity for 3' probes to that from 5' probes is a measure of
the number of cDNA synthesis reactions that went to completion (full length
cDNA is synthesized).

An ideal ratio is 1 while a higher value indicates that many cDNAs were started
but did not go to completion.

The 3'/5' ratio for the housekeeping genes should be at most 3 or < 4 for
GAPDH (oligonucleotide array – Best Practices Workshops 2003).

                          Percent Present (%P) values

The number of probe sets called “Present” relative to the total number of probe
sets on the array is considered to be an important quality control criterion by
the oligonucleotide array Best Practices Workshops (BPW) 2003.


Percent Present (%P) values depend on multiple factors including cell/tissue
type, biological or environmental stimuli, probe array type, and overall quality
of RNA.

BPW 2003: for PMT settings of 1,800 Volts, with SAPE amplification:

% present calls should be > 25% (“A” mouse, rat, human arrays)

Extremely low %P values are a possible indication of poor sample quality.

Replicate samples should have similar %P values (Affymetrix).

% present calls should be consistent within a range of 10% (BPW 2003)

The overall “brightness” (or image intensity) of a scanned array usually
varies from array to array.

Sometimes it is possible to see differences in the overall intensities of the
scanned images of arrays:

The upper probe array image in the handout is abnormally bright compared
to the lower (normal) image.

Differences in the overall intensities
of arrays can also be seen by
plotting histograms of the probe                   0.6

intensities on a single plot (or the
log2 of the probe intensities):

The histogram of the abnormal bright

array is shifted to the right

Another array is also shifted to the
right, but to a lesser degree

                                                         6   8      10           12   14   16

                                                                 log intensity
We can also see the need for
normalization from boxplots:

Boxplots show the minimum, lower

quartile, upper quartile, median and
maximum value of the log 2 intensity

                                          Log 2 intensities
levels for each array:

The central box is drawn from the 25th
to the 75th percentiles with a line for
the median.
                                                                   1   2   3   4   5   6   7   8

Lines(“whiskers”) join the box to the
extreme observations, apart from
suspected outliers which are plotted as                       The abnormal bright array is on the left
individual points.

There are many sources of experimental variation that cause differences in the
overall intensities of arrays:

    During RNA preparation: e.g. differences in number of cell between
     samples, mRNA extraction, labeling

    During manufacture of arrays: e.g. amount of oligos on cells

    During hybridization: e.g. amount of sample applied, amount of target

    After hybridization: e.g. optical measurements, label intensity, differences
     in signal measurement sensitivity/differences in scanning between
In order to compare gene expression levels between two or more arrays, it is
necessary that the overall intensity of each array be brought to the same

There are many different ways to achieve this, but the simplest method is to
multiply every expression value on one array by a constant scaling factor to
make the overall level equivalent to that of another array.

NB: a special case occurs when there are large overall changes in gene
expression between arrays e.g. diauxic shift experiment.

The method of normalization described above should not be used in this
What we trying to do by normalizing
      – note this is quantile normalization
                 Normalization to mRNA:

The most widely used normalization method is to normalize to mRNA.

This method assumes that the total amount of mRNA in a cell remains
constant under various experimental conditions.

It is assumed that the large majority of genes remain relatively constant and
those genes whose expression levels are increased are countered by genes
whose expression levels decrease.

Thus the sum (or mean) of the expression values of each array is simply
scaled to a common number. This gives the scaling factor. Then every
expression value on the array is multiplied by this scaling factor.

This is known as the global method of scaling/normalization in MAS 5.0 and
is used in the majority of experiments.
 Normalization to housekeeping genes

This method is based on the assumption that certain “housekeeping” genes
such as GAPDH or actin are constitutively expressed at a constant level.

Then we check that these genes are expressed at the same level in the
control and in the sample.

If they are not, we multiply all expression levels in the sample array by a
scaling factor until the maintenance genes match the expression levels in the
      Normalization to housekeeping genes

It is rare, however, that maintenance genes are completely unaffected by an
experiment and there have been numerous reports that under certain
circumstances this is not the case [Suzuki et al, 2000, Biotechniques 29: 332 -

For example, if one is comparing the expression levels of a small number of
genes where other more global normalization methods are inappropriate this
method is acceptable.

However, even in these cases, it should be kept in mind that small differences
may be due to differences in the standard.

In general, this is not considered an acceptable normalization method.
         Normalization to a reference RNA
A special situation arises when a large proportion of genes are affected in the
experiment due to global effects such as starvation or heat shock.

The global method of scaling does not make sense in this case because the
overall intensities among arrays are no longer comparable. Differences in
overall intensity are due to biological and/or environmental conditions.

In this case, spike (hybridization) controls can be used for normalization,
addition of mRNA from a foreign organism for which there are probes on the
chip. If spike controls are added in equal amounts to mRNA preparations, it is
possible to use the ratio of their intensities to scale.

For example, Affymetrix adds bacterial and bacteriophage probes to their
eukaryotic GeneChips. This allows the data from each microarray to be scaled
to the levels of these known amounts of RNA standards.

This is known as the Selected Probe Sets method of scaling/normalization in
MAS 5.0.
          Normalization to a reference RNA

The scaling factor is calculated from the ratio of the intensity levels of the
reference probes in the arrays. Then every expression level in the array is
multiplied by this scaling factor.

The disadvantages of this method of normalization arise from the compounded
errors associated with the amounts of the standard applied to each array and
errors in the measurements of these standards.

But how we know that the mRNA preparations are comparable at the time that
we add the spike controls to them?

- use spike control labeling only if there is no other choice.
                 Normalization in MAS 5.0
MAS calculates the mean intensity
of the baseline array by averaging
the intensity values of every probe
set on the array with the
exception of the top and bottom
2% of the probe set intensities
(this is known as the trimmed

The trimmed mean is used to
make the measurement robust
(i.e. not sensitive to outliers).

The mean intensity of the
comparison array is then
multiplied by a Normalization
Factor (NF) to bring it to mean
intensity level of the baseline
                      Scaling in MAS 5.0

As before, MAS calculates
the trimmed mean intensity
of each array.

The mean intensity of the
array is then multiplied by a
Scaling Factor (SF) to bring
it to an arbitrary Target
Intensity value (usually
150) set by the user.
Thus, scaling allows a
number of experiments to
become normalized to one
Target Intensity, allowing
comparison between any
two experiments.

In a particular set of experiments, the Scaling Factor value for all the arrays
should be close to each other (within three-fold of each other).
              Scaling/Normalization in MAS

The scaling/normalization factors (calculated by the global method) should be
comparable among arrays.

Best Practices Workshop 2003: for PMT settings of 1,800 Volts and using SAPE
amplification – the scaling factor should be < 5

Scaling/normalization factors calculated by the “Selected Probe Sets” method
should also be equivalent for arrays being compared.

Large discrepancies between scaling/normalization factors (e.g. three-fold or
greater) may indicate significant assay or biological variability or degradation of
the transcripts used for scaling/normalization, which leads to noisier data.
                                                        Non-linear scaling

                              The linear scaling method described above is suitable when the same degree of
                              scaling can be applied to all expression levels. However, there are instances
                              where the plot of intensities appears like the middle plot below
Gene intensities for chip 2

                               After scaling                         After scaling                       After scaling

                                                Before scaling                        Before scaling                      Before scaling

                                 Gene intensities for chip 1           Gene intensities for chip 1         Gene intensities for chip 1

                    One scale factor, linear data                One scale factor, nonlinear data      Nonlinear scaling, nonlinear data
                          Non-linear scaling
For non-linear, non-zero y-intercept data (centre graph), the use of a single
scaling factor fails:

After scaling chip 1 with a single factor, all genes are off the diagonal, implying
different expression on the two chips when it should be the same for most

Such nonlinear data are better scaled by scaling the weakly and highly
expressed genes separately.

Within a small range there is linearity and a simple scaling factor can be used.

(see Schadt et al, 2000, J. Cell. BioChem 80: 192-202,
and dChip: Li and Wong, 2001 Genome Biology 2: 1-11).

Another example of a piece-wise linear scaling technique for non-linear data is
LOWESS (LOcally WEighted Scatterplot Smoothing) normalization (regression).
                     Non-linear scaling
How many scaling factors should be used?

If we were to use as many scale factors as there are genes we will scale out
the signal, i.e. those genes that are significantly different between the two

Thus, for example, Schadt et al, J Cell Biochem (2000) 80: 192-202, use two
scale factors.

As for the case of linear scaling, robustness is an issue. One way to make
these methods more robust is to perform the normalization in multiple steps:
first fit a curve, remove a certain fraction of points which are furthest away
from the curve and which therefore are likely to be outliers, and finally fit a
curve again on the remaining points. Multiple runs of these can be done to
increase robustness.
Quantile normalization is another method of normalization that is used in RMA and
other packages from Bioconductor (www.bioconductor.org).

Each array contains a certain distribution of expression values and this method aims
at making the distributions across various arrays not just similar but identical.

This is done as follows:
 Imagine that the expression values from the various arrays have been loaded into
a spreadsheet with genes along rows and arrays along columns

 Each column is sorted first. Next the value in each row is replaced with the
average of the values in this row

 Finally, the columns are unsorted (i.e. the effect of the sorting step is reversed so
items in a column go back wherever they came from)

 The distributions in all arrays become identical in this process

                        Array1       Array2        Array3       Array4
          Gene 1        10           1000          10           10
          Gene 2        10           10            10           10
          Gene 3        10           10            10           10
          Gene 4        1000         1000          1000         1000
Original expression levels

                Array1          Array2          Array3   Array4
Gene     1      10              1000            10       10
Gene     2      10              10              10       10
Gene     3      10              10              10       10
Gene     4      1000            1000            1000     1000
Quantile normalization - Step 1: Sort each column

                Array 1         Array2          Array3   Array4
Gene     1      10              10              10       10
Gene     2      10              10              10       10
Gene     3      10              1000            10       10
Gene     4      1000            1000            1000     1000
Quantile normalization - Step 2: “unsort” the columns back to their original

                Array 1          Array2          Array3           Array4
Gene     1      10               257.5           10               10
Gene     2      10               10              10               10
Gene     3      257.5            10              257.5            257.5
Gene     4      1000             1000            1000             1000
                  Normalization in dChip
Note that MAS applies normalization after probe set summarization, but it is
also possible to do normalization before the summarization step.

e.g. in dChip:
By default, in a set of samples, the chip with median overall brightness is
chosen, and all other arrays are normalized to it using “invariant set

In invariant set normalization, a subset of PM probes are obtained as follows.
The probes on each of the arrays are ranked by expression value and probes
with similar ranks comprise the invariant set. The trimmed mean or curve
fitting is carried out on the invariant set only and then the actual shift or shifts
are performed on all the probes.
Normalization is important:

Hoffmann et al [Genome Biol. (2002) 3: Research 0033.1 – 0033.11] applied
12 algorithms by combining 4 normalization methods (invariant feature
normalization, model-based expression values, invariant probe set and global
scaling) and 3 methods of statistical analysis (parametric ANOVA/non-
parametric ANOVA and SAM) to two sample groups.

(1) The normalization method had more impact on the resulting gene list than
the method of statistical analysis.

(2) The underlying assumptions of current normalization methods generally fail
for tissues of different origins or sample subjected to extreme treatment
The user needs to apply existing normalization methods with caution in such
 Which method of normalization is best?
There are no easy answers:

(1) MAS 5.0 is not the final answer.

(2) Bolstad et al (2003) Bioinformatics 19: 185-193, have compared
normalization methods.

Their main conclusion is that non-linear methods and their derivatives that do
not require a baseline and the quantile method typically perform better than
plain unnormalized data or normalizing through mean shifting.

(3) Other references:

    Irizarry et al (2003) Biostat (in press)

    Zhou and Abagyan, Current Opinion in Drug Discovery and Research
    (2003), 6(3): 339-345

The Association of Biomolecular Resource Facilities (ABRF) conducted a
multicenter study in 2002 to identify factors that contribute to variability in
oligonucleotide microarray results.

This retrospective study used data from 835 MG-U74A and HG-U95A Affymetrix
arrays that were previously generated in the microarray core facilities of the
members of the Microarray Research Group (MARG) (Knudtson et. al; Factors
contributing to variability in DNA microarray results: the ABRF Microarray
Research Group 2002 study. J. Biomol. Tech. 2002; 108).
The results of this study indicate that:

Lab-to-lab variation accounted for the greatest source of error.

.CHP data generated by different institutions may not be easily compared
without further normalization in comparative analyses.

Generating high quality microarray data requires vigorous quality control
measures at each individual step of the process, starting with the experimental
design of the study, the generation of samples, extraction of RNA, labeling of
the probe, and microarray hybridization.

In addition:

To minimize experimental variability, the same personnel should perform the
hybridization, washing, and scanning steps.

The same Affymetrix protocols should be used for hybridization, washing, and
scanning, and microarrays from the same production lot should be used in
comparative studies.

CBiS/CMA                   JCSMR
                           Frances Shannon
Sue Wilson
Simon Easteal
Yvonne Pittelkow           BRF/JCSMR
John Maindonald            Karen Edwards
                           Kaiman Peng

To top