Analysis of microarray data by sammyc2007

          Analysis of microarray data
• Microarrays are chips which measure
  whether genes are switched on or off in
  cells.
• They can be used to detect sets of
  genes responsible for genetic diseases
  such as cancer.
• This lecture:
  – introduce microarray technology
  – discuss a few applications
  – introduce statistical and computational
    techniques for analysing microarray data
          Gene expression
• All cells in an organism have the same
  genomic DNA.
• Distinct cellular identities are due to
  differences in gene expression (= transcription
  & translation of genes).
• Whether a gene is transcribed is often
  determined by the presence/ absence of other
  genes' products (esp. proteins) …
• … so genes interact in complex networks:
  gene A switches on B, which turns off C which
  upregulates (increases) A, …
• Hence perturbations to a single gene can lead to
  changes in the expression of many genes.
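The loop above (A switches on B, B turns off C, C upregulates A) can be sketched as a tiny Boolean network. This is a simplifying assumption made for illustration: each gene is treated as fully on (1) or off (0) and all genes update at once, whereas real regulation is graded. The function names `step` and `simulate` are hypothetical.

```python
# Hypothetical three-gene network from the example above:
# A switches on B, B turns off C, C upregulates A.
# Genes are modelled as ON (1) / OFF (0), an assumption for illustration only.

def step(state, knockout=None):
    """Synchronously update every gene from the current state."""
    new = {
        "A": state["C"],        # C upregulates A
        "B": state["A"],        # A switches on B
        "C": 1 - state["B"],    # B turns off C
    }
    if knockout is not None:
        new[knockout] = 0       # a knocked-out gene is never expressed
    return new

def simulate(start, steps, knockout=None):
    """Return the list of states visited, starting from `start`."""
    states = [dict(start)]
    if knockout is not None:
        states[0][knockout] = 0
    for _ in range(steps):
        states.append(step(states[-1], knockout))
    return states

if __name__ == "__main__":
    start = {"A": 1, "B": 1, "C": 0}
    print("normal:       ", simulate(start, 6)[-1])
    print("A knocked out:", simulate(start, 6, knockout="A")[-1])
```

Left alone, this loop cycles through six states and returns to where it started; knocking out A alone leaves B switched off and C permanently on, a small instance of one perturbation changing the expression of several genes.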
      Functional genomics
• Next step after sequencing of the human genome:
  understand the connection between DNA sequence
  & phenotypic (actual) characteristics of the
  organism.
• This is complex, because proteins and genes
  act in highly connected networks and
  signalling pathways in an orchestrated way.
• Traditionally molecular biology has worked on
  a “one gene one function” basis & experiments
  tend to study the effects of a single gene/ few
  genes at a time, but...
         Microarray chips
• …microarrays can measure many genes at once.
• Microarray chips are commonly glass slides
  with a matrix of spots printed (using eg. dot
  matrix technology) on to them.
• A spot contains millions of identical molecules
  of DNA or oligonucleotides (the probes),
  which will bind a specific DNA sequence, such
  as the cDNA of a gene.
• The glass slides can contain 1000s of spots,
  each recognising a different sequence, eg. one
  spot for every gene in the human genome.
   Microarray experiments
• Since almost all mRNA is translated into protein,
  the total mRNA of a cell reflects the genes expressed.
• Mash up cells and extract mRNA.
• Reverse transcribe RNA → cDNA (can be
  heated to make it single-stranded).
• Label cDNA from reference cells green (Cy3)
  and cDNA from target cells red (Cy5).
• Hybridise both samples, reference and target,
  to a single microarray chip: wash on equal
  amounts of target & reference sample & allow
  them to bind to the probes which have
  complementary bases.
     Results of microarray
• The spot for gene 1 =
  – red if more mRNA 1 in target cells
  – green if more mRNA 1 in reference cells
  – yellow if same in both
• Actually, images of red & green
  fluorescence are taken separately using
  laser & scanner & their intensities are
  measured using image software.
• Data often expressed as a matrix of
  relative expression levels $= \dfrac{\text{intensity red}}{\text{intensity green}}$,
  indexed by genes and target samples.
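A minimal sketch of building this matrix with NumPy, assuming the intensities have already been read out by the image software into two arrays of shape genes × target samples. All numbers are hypothetical, and the ±10% tolerance used to call a spot "yellow" is an arbitrary illustrative choice, not a standard.

```python
import numpy as np

# Hypothetical intensities: rows = genes, columns = target samples
# (one chip per target sample, each hybridised with the same reference).
red = np.array([[1200.0, 300.0],
                [ 500.0, 520.0],
                [ 150.0, 900.0]])
green = np.array([[ 400.0, 300.0],
                  [ 500.0, 260.0],
                  [ 600.0, 450.0]])

# Relative expression level = intensity red / intensity green, element-wise.
ratios = red / green
print(ratios)

def spot_colour(ratio, tol=0.1):
    """Colour a spot appears; `tol` is an assumed tolerance around ratio 1."""
    if ratio > 1 + tol:
        return "red"      # more mRNA in the target cells
    if ratio < 1 - tol:
        return "green"    # more mRNA in the reference cells
    return "yellow"       # roughly the same in both

print([[spot_colour(r) for r in row] for row in ratios])
```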
Microarray data
[Figure: red (Cy5) and green images of a microarray; each spot corresponds to a gene.]
• Reason for using relative intensities: printing
  spots onto chips does not deposit a reliable,
  fixed number of probe molecules, so the
  intensity measurements (which correspond to
  the amount of bound sample cDNA) reflect
  not only the level of expression of the gene,
  but also the peculiarities of the chip.
• There are some disadvantages to not having
  absolute gene expression values, eg.
  confidence limits on the microarray
  measurement depend heavily on the actual
  intensities.
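The argument for ratios can be checked with a toy simulation. It assumes, purely for illustration, that measured intensity is proportional to (amount of probe printed on the spot) × (mRNA abundance), and that red and green cDNA compete for the same spot. Under that assumption the spot factor cancels in red/green, so the ratio is the same on every chip even though the raw intensities are not. The abundances are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (an assumption, not a physical model of hybridisation):
# intensity = spot_efficiency * mRNA abundance, where spot_efficiency
# stands for how many probe molecules happened to be printed on the spot.
true_target = np.array([100.0, 50.0, 10.0])     # hypothetical target abundances
true_reference = np.array([50.0, 50.0, 20.0])   # hypothetical reference abundances

chip_ratios = []
for chip in range(3):
    # Printing varies from chip to chip, but both dyes bind the same spot.
    spot_efficiency = rng.uniform(0.5, 2.0, size=3)
    red = spot_efficiency * true_target
    green = spot_efficiency * true_reference
    chip_ratios.append(red / green)
    print(f"chip {chip}: red = {np.round(red, 1)}, "
          f"red/green = {np.round(red / green, 3)}")
```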
   Principal uses of chips
• Genome-scale expression analysis
  – Differentiation
  – Response to environmental factors
  – Disease states
  – Effect of drugs
• Detection of sequence variation
        Applications of
      microarrays - yeast
• The fact that we can only reliably measure
  relative gene expression means that
  microarrays tend to be used for comparative
  experiments.
• Eg. “what changes in gene expression arise
  when yeast is in anaerobic v. aerobic
  conditions?” (DeRisi et al., Science v. 278).
• Spot arrays with complementary DNAs to all
  genes from the yeast genome (the probes).
• Approx. 6400 probes
• Reverse transcribe mRNA from yeast cells
  harvested at various time points as conditions
  are varied from anaerobic to aerobic (start
  fermentation in sugary solution and allow
  yeast to deplete sugar).
• 7 time points at 2-hr intervals, the first taken
  9 hrs after the cells were placed in sugary medium.
• Let the sample from the first time point be the
  “reference” (totally anaerobic, lots of sugar).
• Label reference cDNA with green dye (Cy3)
  and other sample cDNA (later time points)
  with red dye (Cy5).
• Hybridise a mixture of equal quantities of the
  reference sample and one of the later-time
  samples (also do timepoint 1 against itself as a
  control test) to a microarray chip: one
  experiment/ chip per timepoint.
• Take images of red and green fluorescence,
  measure intensities, process (details of this
  later in lecture) and create a matrix, $M$, with
  entries $M_{ij} = \dfrac{\text{intensity red}}{\text{intensity green}}$ at the spot
  representing gene $i$ on the chip containing sample $j$
  ($j$th timepoint).
• Look for genes that are differentially
  expressed in aerobic and anaerobic conditions.
• When the sample at the initial timepoint is
  compared with itself, the intensity values show
  99% correlation.
• Timepoint 1 v. timepoint 2: 95% of genes
  have < 1.5-fold difference in expression;
  correlation of 98% between the data at the 2
  timepoints.
• Timepoint 1 v. timepoint 7: c. 1700 genes out
  of 6400 had > 2-fold difference in expression-
  some genes had much higher ratio.
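The two summaries used above, the number of genes beyond a fold-change cut-off and the correlation between the two channels' intensities, can be sketched as follows. The numbers quoted for the yeast experiment come from the study; the five intensities below are hypothetical, and `summarise_chip` is an illustrative name. Pearson correlation is assumed as the correlation measure.

```python
import numpy as np

def summarise_chip(red, green, fold=2.0):
    """Count genes whose red/green ratio exceeds `fold` in either direction,
    and compute the Pearson correlation between the two channels."""
    ratio = red / green
    n_changed = int(np.sum((ratio > fold) | (ratio < 1.0 / fold)))
    r = np.corrcoef(red, green)[0, 1]
    return n_changed, r

# Hypothetical intensities for five genes on one chip.
red = np.array([800.0, 120.0, 300.0, 1000.0, 40.0])
green = np.array([780.0, 400.0, 310.0, 450.0, 42.0])

n, r = summarise_chip(red, green)
print(f"{n} genes with > 2-fold change; channel correlation r = {r:.3f}")
```

Comparing a sample with itself (passing the same array twice) gives no changed genes and a correlation of 1, the ideal that the 99% self-comparison above approaches.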
• Authors could infer properties of signalling
  pathways involved in the shift in metabolism.
        Applications of
      microarrays- cancer
• Take a set of patients with a certain type of
  cancer and a set of control patients with no
  cancer. Take cells from the tumour (or the region
  where the tumour is) in the cancer patients. Extract
  mRNA, make cDNA and dye one of the samples from
  a control patient green; dye all other samples red.
• Make/ buy a chip with human genes: as many
  as possible, or those thought to be relevant for
  the cancer.
• Hybridise a mixture of the reference sample (green)
  and one of the other target samples to each
  chip.
• Process data and statistically analyse to find
  genes which have significantly higher/ lower
  expression in cancer cells than in normal cells.
  These genes are likely to be important in
  causing cancer/ effects of cancer.
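One common way to score each gene is a two-sample t-statistic on the log2 ratios of cancer v. control samples. The sketch below uses Welch's version, which does not assume equal variances; this is an assumed choice for illustration, not necessarily the method of any study mentioned here, and turning the scores into significance calls is left to the statistical methods of the next session. The data and the name `welch_t` are hypothetical.

```python
import numpy as np

def welch_t(log_ratios, is_cancer):
    """Per-gene Welch t-statistic comparing cancer and control samples.
    log_ratios: genes x samples array of log2(red/green).
    is_cancer: boolean array, one entry per sample (column)."""
    x = log_ratios[:, is_cancer]
    y = log_ratios[:, ~is_cancer]
    mean_diff = x.mean(axis=1) - y.mean(axis=1)
    se = np.sqrt(x.var(axis=1, ddof=1) / x.shape[1]
                 + y.var(axis=1, ddof=1) / y.shape[1])
    return mean_diff / se

# Hypothetical data: 3 genes, 3 cancer samples then 3 control samples.
log_ratios = np.array([
    [ 2.1,  1.8,  2.4, 0.10, -0.2,  0.00],   # higher in cancer
    [ 0.1, -0.1,  0.0, 0.05,  0.0, -0.05],   # no difference
    [-1.5, -1.9, -1.7, 0.20,  0.0,  0.10],   # lower in cancer
])
is_cancer = np.array([True, True, True, False, False, False])

print(np.round(welch_t(log_ratios, is_cancer), 2))
```

Large positive scores suggest higher expression in the cancer cells, large negative scores lower expression.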
• Can also cluster data to discover different
  subclasses of cancer, eg.
       Alizadeh et al. (2000) Nature, v. 403.
• A cancer of the immune cells (lymphoma) is
  clinically diverse: 40% of patients respond well
  to therapy and have good survival. The authors
  used hierarchical clustering (see later) to
  discover two new subclasses of the cancer,
  classified based on gene expression profiles.
• Thinking of the relative gene expression
  values (in fact intensities) of each sample
  (patient) as a vector, the authors were able
  to cluster the data.
• Microarray profiling of tumours can be used to
  classify tumours into subclasses (with eg.
  survival implications) of already known
  tumour types.
        Different kinds of microarray
• cDNA versus oligonucleotide arrays
• So far we have discussed gene expression
  microarrays, but there are also:
  – Sequencing chips: these contain as probes all possible
    sequences of a given length k (typ. k = 8-10 bases).
    Label the target sample with a fluorescent dye and
    hybridise. The spots that fluoresce are where the
    target bound; the corresponding sequences are part of
    the target spectrum (= set of k-base sequences in the
    target). Then use computers to assemble the whole
    sequence. The target cannot be too long (eg. 150-200
    bp if k = 8).
  – Chips can also be used to look for gene mutations.
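The spectrum idea can be sketched in a few lines. `spectrum` lists the k-base subsequences a sequencing chip would report; `reconstruct` is a deliberately naive, hypothetical assembler that chains k-mers through their (k−1)-base overlaps and works only if no (k−1)-base overlap occurs twice in the target. Real assembly from a spectrum must cope with repeats. k = 4 is used for readability, not the 8 to 10 bases of a real chip.

```python
def spectrum(seq, k):
    """Set of all k-base subsequences of seq (what a sequencing chip reports)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def reconstruct(spec, k):
    """Naive greedy assembly; assumes every (k-1)-base overlap is unique."""
    prefixes = {}
    for kmer in spec:
        prefixes.setdefault(kmer[:-1], []).append(kmer)
    suffixes = {kmer[1:] for kmer in spec}
    # The first k-mer is the only one whose prefix is no other k-mer's suffix.
    starts = [kmer for kmer in spec if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("cannot find a unique starting k-mer")
    seq = starts[0]
    used = {seq}
    while True:
        nxt = [m for m in prefixes.get(seq[-(k - 1):], []) if m not in used]
        if not nxt:
            break
        if len(nxt) > 1:
            raise ValueError("ambiguous extension (repeat in target)")
        used.add(nxt[0])
        seq += nxt[0][-1]          # each new k-mer adds one base
    return seq

target = "GATTACAGCT"
spec = spectrum(target, 4)
print(sorted(spec))
print(reconstruct(spec, 4))
```

A target such as "ATGGCGTGCAATG", which contains "ATG" twice, makes this naive assembler fail. One reason for the length limit above is that the longer the target, the more likely such repeated overlaps become.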
    Analysis of microarray data
• Data is a matrix, $M_{ij}$, of (usually relative,
  sometimes absolute) expression values of gene $i$ in
  condition $j$. Often presented as $\log_2$ values,
  since then downregulation of a gene
  (eg. ratio ½) is not squashed into the interval
  $(0,1)$, but takes values (eg. $-1$) in $(-\infty,0)$,
  symmetric with upregulation (eg. ratio 2 gives $+1$).
• Pre-processing: There are several
  sources of variation in intensity in microarray
  experiments other than differences in gene
  expression between samples. These are
  thought of as noise and we want to remove
  them by pre-processing. First subtract the
  background intensity, which is due to binding
  to the wrong spot, etc. (this is usually done by the
  image processing software).
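A minimal sketch of background subtraction, assuming the image software reports a foreground (spot) intensity and a local background intensity for each spot in each channel. Clipping at a small positive floor, so that later ratios and logarithms stay defined, is an assumed convention; spots whose signal falls below background could instead be flagged and excluded. The function name and numbers are hypothetical.

```python
import numpy as np

def subtract_background(foreground, background, floor=1.0):
    """Spot intensity minus local background, clipped at `floor`
    (an assumed convention) so ratios and logs remain defined."""
    return np.maximum(foreground - background, floor)

# Hypothetical red-channel values for three spots.
red_fg = np.array([1250.0, 330.0, 90.0])
red_bg = np.array([50.0, 30.0, 100.0])    # third spot is below background

print(subtract_background(red_fg, red_bg))
```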
• Normalization: Another source of noise is
  due to differences in labelling and detection
  efficiencies for the fluorescent labels and in
  the amount of RNA between the 2 samples
  (red/green). Normalization tries to get rid of
  this by dividing all the ratios by an appropriate
  constant to make the mean or the median of
  the ratios = 1 (mean/ median centring
  respectively). If the data are in log form, simply
  subtract a constant.
• Assumption is that on average across all (or a
  chosen subset of) genes the levels of mRNA
  produced will be the same in the two samples.
• Alternatively, use a scatter plot of green v. red
  intensity & normalize to make the slope = 1.
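Both normalization routes can be sketched briefly. `median_centre` subtracts the median log2 ratio (median centring in log form); `slope_factor` fits red against green with a least-squares line through the origin, which is an assumed fitting choice since the slide does not specify one. The chip below is hypothetical: every red intensity is exactly twice the green one, mimicking a difference in labelling efficiency with no real change in expression.

```python
import numpy as np

def median_centre(log_ratios):
    """Subtract the median log2 ratio so the median ratio becomes 1.
    Returns the centred values and the constant that was subtracted."""
    c = np.median(log_ratios)
    return log_ratios - c, c

def slope_factor(red, green):
    """Least-squares slope of red v. green through the origin (assumed
    fit); dividing red by it makes the fitted slope equal to 1."""
    return np.sum(red * green) / np.sum(green * green)

# Hypothetical chip with a uniform 2x dye bias in the red channel.
green = np.array([100.0, 200.0, 400.0, 800.0])
red = 2.0 * green

centred, c = median_centre(np.log2(red / green))
k = slope_factor(red, green)
print("constant subtracted:", c, "centred log ratios:", centred)
print("fitted slope:", k, "rescaled red:", red / k)
```

Both routes remove the same bias here: the subtracted constant equals log2 of the fitted slope.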
• Normalized data: $\log_2(R/G) - c = \log_2\big(R/(Gk)\big)$,
  where $R$ & $G$ are the red & green intensities minus
  their respective backgrounds and $c = \log_2 k$ is the
  normalization constant.
• Filtering: This is the process of working out
  which genes are differentially expressed
  across the different conditions (eg. timepoints
  of the yeast experiment or cancer v. non-
  cancer) and removing from the dataset those
  genes which don’t vary. We will discuss this in
  detail later.
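As a placeholder until the statistical methods are covered, the crudest form of filtering keeps only genes whose log2 ratio reaches a fixed fold-change threshold in at least one condition. The threshold (|log2 ratio| ≥ 1, i.e. 2-fold), the function name and the data are assumptions for illustration; a fixed cut-off ignores measurement noise, which is why statistical filtering is preferred.

```python
import numpy as np

def filter_unchanged(log_ratios, gene_names, min_abs_log2=1.0):
    """Keep genes whose |log2 ratio| reaches `min_abs_log2`
    (1.0 = 2-fold) in at least one condition; drop the rest."""
    keep = np.any(np.abs(log_ratios) >= min_abs_log2, axis=1)
    kept_names = [g for g, k in zip(gene_names, keep) if k]
    return log_ratios[keep], kept_names

# Hypothetical normalized log2 ratios: rows = genes, columns = conditions.
names = ["g1", "g2", "g3"]
data = np.array([[0.1, -0.2,  0.3],
                 [1.5,  0.2,  0.0],
                 [0.0, -0.4, -1.2]])

filtered, kept = filter_unchanged(data, names)
print(kept)
```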
• Clustering: If you view the expression values
  of a single gene across different samples
  (rows of the expression matrix) as a vector
  then the genes can be clustered based on the
  similarity of the vectors. Likewise, using the
  columns of the matrix, the samples can be
  clustered. This helps eg. to classify cancers/
  find genes which are in same network as each
  other or have similar functions.
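A minimal sketch of hierarchical clustering with SciPy, applied to the rows (genes) and then, by transposing, to the columns (samples) of a hypothetical log2 expression matrix. The distance (1 − Pearson correlation, SciPy's `"correlation"` metric) and average linkage are assumed choices; other distances and linkage rules are also used, and this is not presented as the method of any particular study.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical normalized log2 ratios: rows = genes, columns = samples.
data = np.array([
    [ 2.0,  1.8,  2.2, -0.1],
    [ 1.9,  1.7,  2.1,  0.0],
    [-1.5, -1.4, -1.6,  0.2],
    [-1.4, -1.6, -1.5,  0.1],
])

# Cluster genes (rows): distance = 1 - Pearson correlation, average linkage.
gene_tree = linkage(pdist(data, metric="correlation"), method="average")
gene_clusters = fcluster(gene_tree, t=2, criterion="maxclust")

# Cluster samples (columns) the same way by transposing the matrix.
sample_tree = linkage(pdist(data.T, metric="correlation"), method="average")
sample_clusters = fcluster(sample_tree, t=2, criterion="maxclust")

print("gene clusters:  ", gene_clusters)
print("sample clusters:", sample_clusters)
```

The linkage matrices can be drawn as trees with `scipy.cluster.hierarchy.dendrogram`; cutting a tree into a chosen number of groups, as `fcluster` does here, is how candidate gene groups or sample subclasses would be read off.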
          Summary
• We have described microarray chips for
  analysing gene expression.
• We have mentioned three key areas of
  microarray data analysis:
  – Normalization
  – Filtering
  – Clustering
• In the next session we will cover
  statistical methods necessary for
  filtering microarray data.
