

									           Statistics for Microarrays
      Biological background: Gene Expression
        and Molecular Laboratory Techniques

Class web site: http://statwww.epfl.ch/davison/teaching/Microarrays/
Basic principles in physics, chemistry and biology

                   Principles Known?

       Physics        Chemistry        Biology
        Matter        Compound         Organism

      Elementary      Elements          Genes

         Yes             Yes             No
Central Paradigm
Protein Synthesis

• Transcription is a complex process involving
  several steps and many proteins (enzymes)
• RNA polymerase synthesizes a single
  strand of RNA against the DNA template
  strand (anti-sense strand), adding
  nucleotides to the 3’ end of the RNA chain
• Initiation is regulated by transcription
  factors that bind promoter elements, usually
  an initiator element and TATA box, lying
  just upstream (at the 5’ end) of the
  coding region
• 3’ end cleaved downstream of the AAUAAA
  signal, and a poly-A tail is added
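
The base pairing between the DNA template strand and the new RNA can be sketched in a few lines of Python. The helper name `transcribe` is hypothetical, and the sketch assumes the template is written 3'->5' so that the RNA reads 5'->3'. It ignores promoters, cleavage and the poly-A tail.

```python
# Base-pairing rules for transcription: DNA template base -> RNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (5'->3') complementary to a DNA template given 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())

print(transcribe("TACGGT"))  # AUGCCA
```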
           Exons and Introns

• Most of the genome consists of non-coding
  regions
• Some non-coding regions (centromeres and
  telomeres) may have specific chromosomal
  functions
• Other non-coding regions have regulatory
  functions
• Non-coding, non-functional DNA often
  called junk DNA, but may have some effect
  on biological functions
• The terms exon and intron refer to coding
  and non-coding DNA, respectively
Intron Splicing
Transcription Overview
Transcription Illustration

• The AUG start codon is recognized by
• Once the start codon has been identified,
  the ribosome incorporates amino acids into
  a polypeptide chain
• mRNA is decoded by tRNA (transfer RNA)
  molecules, which each transport specific
  amino acids to the growing chain
• Translation ends when a stop codon (UAA,
  UAG, UGA) is reached
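
A minimal sketch of the reading process described above: find the first AUG, read codons in steps of three, stop at UAA, UAG or UGA. The function name `translate` and the small codon table are illustrative assumptions. The table is only a subset of the 64-codon genetic code, and codons outside it are reported as "Xaa".

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Partial codon table, for illustration only (the full code has 64 codons).
CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCU": "Ala", "AAA": "Lys"}

def translate(mrna: str) -> list[str]:
    """Translate from the first AUG until a stop codon or the end of the sequence."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        # "Xaa" marks a codon that is missing from this partial table.
        peptide.append(CODON_TABLE.get(codon, "Xaa"))
    return peptide

print(translate("GGAUGUUUGGCAAAUAAGCU"))  # ['Met', 'Phe', 'Gly', 'Lys']
```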
Translation Illustrated
From Primary Transcript
      to Protein
     Alternative Splicing (of Exons)

• How is it possible that there are over
  1,000,000 human antibodies when there
  are only about 30,000 genes?
• Alternative splicing refers to the different
  ways the exons of a gene may be combined,
  producing different forms of proteins
  within the same gene-coding region
• Alternative pre-mRNA splicing is an
  important mechanism for regulating gene
  expression in higher eukaryotes
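
To see how exon combinations multiply, here is a toy enumeration. It assumes a simplified model that is not taken from the slides: the first and last exons are always kept, every middle exon may be kept or skipped, and exon order is preserved. The name `isoforms` is hypothetical, and real splicing is more constrained than this.

```python
from itertools import combinations

def isoforms(exons):
    """List exon combinations that keep the first and last exon, in order."""
    first, *middle, last = exons  # needs at least two exons
    result = []
    for k in range(len(middle) + 1):
        for kept in combinations(middle, k):
            result.append([first, *kept, last])
    return result

variants = isoforms(["E1", "E2", "E3", "E4"])
print(len(variants))  # 4
```

Under this toy model, k optional exons give 2^k possible transcripts.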
   Molecular Laboratory Techniques

• Hybridizing DNA

• Copying DNA

• Cutting DNA

• Probing DNA

• Hybridization exploits a potent feature of
  the DNA duplex – the sequence
  complementarity of the two strands
• Remarkably, DNA can reassemble with
  perfect fidelity from separated strands
• Strands can be separated (denatured) by
  heating
       Polymerase Chain Reaction (PCR)
• PCR is used to amplify (copy) specific DNA
  sequences in a complex mixture when the ends of
  the sequence are known
• Source DNA is denatured into single strands
• Two synthetic oligonucleotides complementary to
  the 3’ ends of the segment of interest are added
  in great excess to the denatured DNA, then the
  temperature is lowered
• The genomic DNA remains denatured, because
  the complementary strands are at too low a
  concentration to encounter each other during the
  period of incubation, but the specific
  oligonucleotides hybridize with their
  complementary sequences in the genomic DNA
                   PCR, ctd
• The hybridized oligos then serve as primers
  for DNA synthesis, which begins upon
  addition of a supply of nucleotides and a
  temperature resistant polymerase such as
  Taq polymerase, from Thermus aquaticus (a
  bacterium that lives in hot springs)
• Taq polymerase extends the primers at
  temperatures up to 72°C
• When synthesis is complete, the whole
  mixture is heated further (to 95°C) to melt
  the newly formed duplexes
• Repeated cycles (25-30) of synthesis
  (cooling) and melting (heating) quickly provide
  many DNA copies
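
Why a few dozen cycles are enough: in an idealized reaction each cycle doubles the number of target copies. The sketch below assumes that simple model. The function name `pcr_copies` and the `efficiency` parameter (1.0 means perfect doubling) are illustrative, and real reactions plateau as reagents run out.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after a given number of PCR cycles."""
    return initial_copies * (1 + efficiency) ** cycles

print(pcr_copies(1, 30))  # 1073741824.0
```

Under this model, one template molecule gives about a billion copies after 30 cycles.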
              Types of Viruses

A virus is a nucleic acid in a protein coat.
Reverse transcriptase makes a complementary
DNA copy from RNA.
            Reverse transcription

Clone cDNA strands, complementary to the mRNA

     mRNA     G U A A U C C U C

     cDNA     C A T T A G G A G
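
The pairing in the figure can be reproduced with a short sketch. The helper `reverse_transcribe` is a hypothetical name. It returns the DNA base that pairs with each mRNA base, position by position, so the result reads in the opposite 5'/3' orientation to the mRNA.

```python
# Base-pairing rules for reverse transcription: RNA base -> DNA base.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(mrna: str) -> str:
    """Return the cDNA base paired with each mRNA base, in the same positions."""
    return "".join(RNA_TO_DNA[base] for base in mrna.upper())

print(reverse_transcribe("GUAAUCCUC"))  # CATTAGGAG
```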
Restriction Enzymes Cut DNA
            Restriction Enzymes
• When a bacterium is invaded by a DNA-
  containing organism (e.g. virus), it can defend
  itself with restriction enzymes (REs; also
  called restriction endonucleases)
• REs recognize a specific short sequence of
  DNA and cut both strands
• The recognition sequence is typically a
  palindrome – i.e. the sequence in one strand is
  the same as in the other, read in the other
  direction (e.g. GAATTC)
• REs named after the bacteria in which they
  occur, plus sequence number (e.g. Eco RI)
        RE Example (Eco RI)

 5’ – GAATTC – 3’
 3’ – CTTAAG – 5’
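
A small sketch of the two ideas above: checking that a recognition site is a palindrome, and cutting a sequence at each site. The function names are hypothetical, and only one strand is modelled. The default `cut_offset=1` reflects that Eco RI cuts between G and A (G^AATTC).

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def is_palindromic(site: str) -> bool:
    """True if both strands read the same in the 5'->3' direction."""
    return site == reverse_complement(site)

def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut one strand `cut_offset` bases into every occurrence of `site`."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments

print(is_palindromic("GAATTC"))      # True
print(digest("AAGAATTCGGGAATTCTT"))  # ['AAG', 'AATTCGGG', 'AATTCTT']
```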
                 Probing DNA
• One way to study a specific DNA fragment
  within a genome is to probe for the sequence of
  the fragment
• A probe is a labeled (usually radioactive or
  fluorescent) single-stranded oligonucleotide,
  synthesized to be complementary to the
  sequence of interest – probe sequence is known
• Attach single-stranded DNA to a membrane (or
  other solid support) and incubate with the
  probe so that it hybridizes
• Visualize the probe (e.g. by exposure to X-ray
  film for radioactive probes)
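
In sequence terms, a probe hybridizes wherever the target contains the probe's reverse complement. The sketch below finds those positions. It assumes perfect matches only, and the name `binding_sites` is hypothetical. Real hybridization tolerates some mismatches, depending on conditions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def binding_sites(target: str, probe: str) -> list[int]:
    """0-based start positions on `target` (5'->3') where `probe` (5'->3') pairs perfectly."""
    site = "".join(COMPLEMENT[base] for base in reversed(probe))
    width = len(site)
    return [i for i in range(len(target) - width + 1) if target[i:i + width] == site]

print(binding_sites("TTGACCGTAAGACCGT", "ACGGTC"))  # [2, 10]
```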
The Southern blotting technique
Sample Autoradiograph (Gel)
              Types of Blots

• Southern Blot – use DNA to probe DNA

• Northern Blot – use DNA to probe RNA

• Western Blot – use antibodies to probe proteins

          Measuring Gene Expression

Idea: measure the amount of mRNA to see which
genes are being expressed in (used by) the cell.
Measuring protein would be more direct, but is
currently harder.
Microarrays provide a means
to measure gene expression
    Areas Being Studied with Microarrays

• Differential gene expression between two (or
  more) sample types
• Similar gene expression across treatments
• Tumor sub-class identification using gene
  expression profiles
• Classification of malignancies into known classes
• Identification of "marker" genes that
  characterize different tumor classes
• Identification of genes associated with clinical
  outcomes (e.g. survival)
       cDNA microarray experiments
mRNA levels compared in many different contexts

• Different tissues, same organism (brain v. liver)

• Same tissue, same organism (ttt v. ctl, tumor v.

• Same tissue, different organisms (wt v. ko, tg, or

• Time course experiments (effect of ttt,
     Web animation of a cDNA microarray

Yeast genome on a chip
 Brief outline of steps for producing a microarray

• cDNA probes attached to, or synthesized on,
  a solid support
• Hybridize targets

• Scan array
cDNA microarrays

               cDNA clones
              cDNA microarrays
Compare the gene expression in two samples of cells

PRINT: cDNA from one gene on each spot

SAMPLES: cDNA labelled red/green,
         e.g. treatment / control, normal / tumor tissue

HYBRIDIZE: add equal amounts of labelled cDNA samples to the microarray

SCAN: laser, detector
         Quantification of expression

For each spot on the slide we calculate

    Red intensity = Rfg - Rbg
(fg = foreground, bg = background) and

    Green intensity = Gfg - Gbg
and combine them in the log (base 2) ratio

 Log2( Red intensity / Green intensity)
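
The calculation for one spot can be sketched as follows. The function name `log_ratio` is hypothetical. Returning `None` when a background-corrected intensity is not positive is an assumption made here, since the slide does not say how such spots are handled.

```python
import math

def log_ratio(r_fg: float, r_bg: float, g_fg: float, g_bg: float):
    """Log2 ratio of background-corrected red and green intensities for one spot."""
    red = r_fg - r_bg      # Red intensity = Rfg - Rbg
    green = g_fg - g_bg    # Green intensity = Gfg - Gbg
    if red <= 0 or green <= 0:
        return None        # assumption: the log ratio is undefined for such spots
    return math.log2(red / green)

print(log_ratio(1200, 200, 600, 100))  # 1.0
```

A value of 1.0 means the red channel is twice as intense as the green, and -1.0 means half.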
               Gene Expression Data
Data on p genes for n slides: p is O(10,000), n is
 O(10-100), but growing.
                  slide 1   slide 2   slide 3   slide 4   slide 5   …
           1       0.46      0.30      0.80      1.51      0.90     ...
           2      -0.10      0.49      0.24      0.06      0.46     ...
 Genes     3       0.15      0.74      0.04      0.10      0.20     ...
           4      -0.45     -1.03     -0.79     -0.56     -0.32     ...
           5      -0.06      1.06      1.35      1.09     -1.09     ...

         Gene expression level of gene 5 in slide 4
          =     Log2( Red intensity / Green intensity)

     These values are conventionally displayed
     on a red (>0) yellow (0) green (<0) scale.
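
A small sketch of how this matrix might be held and read in code. It uses only the five genes and five slides shown in the table. The names `expression`, `level` and `colour` are illustrative, and the colour rule follows the red/yellow/green convention stated above.

```python
# Rows are genes 1-5, columns are slides 1-5 (values copied from the table).
expression = [
    [0.46, 0.30, 0.80, 1.51, 0.90],
    [-0.10, 0.49, 0.24, 0.06, 0.46],
    [0.15, 0.74, 0.04, 0.10, 0.20],
    [-0.45, -1.03, -0.79, -0.56, -0.32],
    [-0.06, 1.06, 1.35, 1.09, -1.09],
]

def level(gene: int, slide: int) -> float:
    """Log2 ratio for a gene and slide, both numbered from 1 as in the table."""
    return expression[gene - 1][slide - 1]

def colour(log_ratio: float) -> str:
    """Display colour: red above 0, green below 0, yellow at exactly 0."""
    if log_ratio > 0:
        return "red"
    if log_ratio < 0:
        return "green"
    return "yellow"

print(level(5, 4), colour(level(5, 4)))  # 1.09 red
```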
                   Biological question
             Differentially expressed genes
              Sample class prediction etc.

                 Experimental design

                Microarray experiment
                                  16-bit TIFF files
                   Image analysis
                                  (Rfg, Rbg), (Gfg, Gbg)
                                R, G
Estimation   Testing               Clustering   Discrimination

                 Biological verification
                   and interpretation
