									   Alternative Splicing

As an introduction to microarrays
           Human Genome
• About 90,000 human proteins; the number of genes
  was initially assumed to be near that figure
• The 1000 cell roundworm Caenorhabditis
  elegans has 19,500 genes, corn has 40,000
• Current estimates are 25,000 or fewer genes
• Alternative splicing allows different tissue
  types to perform different functions with the same
  gene assortment
• 75% of human genes are subject to
  alternative editing
• Faulty gene splicing leads to cancer and
  congenital diseases
• Gene therapy can use splicing
• We talked before about apoptosis, which occurs
  when the cell determines it cannot be repaired
• Bcl-x, a regulator of apoptosis, is
  alternatively spliced to produce either
  Bcl-x(L), which suppresses apoptosis, or
  Bcl-x(S), which promotes it
• Five snRNA molecules (U1, U2, U4, U5, U6)
  combine with as many as 150
  proteins to form the spliceosome
• It recognizes sites where introns begin
  and end
  – Cuts introns out of pre-mRNA
  – joins exons
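
The cut-and-join step can be sketched in a few lines. This is only a toy illustration, not a model of the spliceosome: the sequence is made up, and the intron coordinates are handed in directly rather than recognized from splice-site signals. The function name `splice` and the half-open interval convention are assumptions made for this example.

```python
def splice(pre_mrna, introns):
    """Remove introns, given as half-open (start, end) intervals, and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # exon that precedes this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # final exon after the last intron
    return "".join(exons)

# Made-up pre-mRNA: exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "GAUUAC" + "guaugucccag" + "UGA"
introns = [(6, 18), (24, 35)]  # positions of the two lower-case stretches

mature_mrna = splice(pre_mrna, introns)
print(mature_mrna)
```

Only the upper-case exon stretches remain in `mature_mrna`, joined in their original order.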
• The 5’ splice site is at the beginning of
  the intron, the 3’ site is at the end
• The average human protein-coding
  gene is 28,000 nucleotides long, with 8.8
  exons separated by 7.8 introns
• Exons average about 120 nucleotides, while
  introns range from 100 to 100,000 nucleotides
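
A rough back-of-the-envelope check on these averages shows how little of a typical gene is exon. The sketch below assumes, for simplicity, that the whole 28,000-nucleotide gene is made up of exons and introns only, and it treats 120 nucleotides as the mean exon length; the variable names are illustrative.

```python
gene_length = 28_000       # nucleotides, average protein-coding gene
exons_per_gene = 8.8
introns_per_gene = 7.8
mean_exon_length = 120     # nucleotides, treated here as an average

exonic = exons_per_gene * mean_exon_length       # total exon sequence per gene
exon_fraction = exonic / gene_length             # share of the gene that is exon
intronic = gene_length - exonic                  # assumption: the rest is intron
mean_intron_length = intronic / introns_per_gene

print(f"exonic nucleotides per gene: {exonic:.0f}")
print(f"exon share of the gene:      {exon_fraction:.1%}")
print(f"implied mean intron length:  {mean_intron_length:.0f}")
```

Under these assumptions only a few percent of an average gene ends up in the mature mRNA, and the implied mean intron is a few thousand nucleotides long.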
            Splicing errors
• Familial dysautonomia results from a single-nucleotide
  mutation that causes a gene to be alternatively
  spliced in nervous system tissue
• The resulting decrease in the IKBKAP protein leads to
  abnormal nervous system development (half
  of patients die before age 30)
• More than 15% of the gene mutations that cause genetic
  diseases and cancers do so by affecting splicing
                   Why splicing
• On average, each gene generates 3 alternatively spliced mRNAs
• Why so much intron (1-2% of genome is exons)?
• Mouse and human differences are almost all splicing
• Half of the human genome is made up of transposable
  elements, Alus being the most abundant (1.4 million copies)
    – They continue to multiply and insert themselves into the
      genome at the rate of one insertion per 100 human births
• Mutations in an Alu can create a 5’ or 3’ splice site in an intron,
  causing part of the intron to be treated as an exon
• Such a mutation doesn’t impact existing exons
• It only has an effect when the new exon is alternatively spliced in
 Microarrays For Alt. Splicing
• Use short oligonucleotides
• Get a guess at the rate of expression of
  the oligo

   Gene:        Exon 1, Exon 2, Exon 3 (drawn below the line, between Exon 2 and Exon 4), Exon 4, Exon 5
   Isoform 1:   Exon 1, Exon 2, Exon 4, Exon 5
   Isoform 2:   Exon 1, Exon 3, Exon 5
   Probe types: Exon, Unique (“Cassette”)
      Ideal Microarray Readings

   Probes:      a, b, c, d, e
   Isoform 1 (Exon 1, Exon 2, Exon 4, Exon 5): readings at a, c
   Isoform 2 (Exon 1, Exon 3, Exon 5):         readings at a, d
   Probe types: Junction, Unique (“Cassette”)
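
To see which probes could tell the two isoforms apart, the sketch below takes Isoform 1 (Exons 1, 2, 4, 5) and Isoform 2 (Exons 1, 3, 5) and lists the exons and exon-exon junctions found in only one of them. The helper names `junctions` and `distinguishing` are made up for this example, and real probe design involves more than this set arithmetic.

```python
isoforms = {
    "Isoform 1": [1, 2, 4, 5],
    "Isoform 2": [1, 3, 5],
}

def junctions(exons):
    """Exon-exon junctions of one isoform, as (upstream, downstream) pairs."""
    return set(zip(exons, exons[1:]))

def distinguishing(features_by_isoform):
    """Keep, for each isoform, the features that no other isoform has."""
    result = {}
    for name, features in features_by_isoform.items():
        others = set().union(
            *(f for n, f in features_by_isoform.items() if n != name)
        )
        result[name] = features - others
    return result

exon_sets = {name: set(exons) for name, exons in isoforms.items()}
junction_sets = {name: junctions(exons) for name, exons in isoforms.items()}

unique_exons = distinguishing(exon_sets)          # targets for unique (cassette) probes
unique_junctions = distinguishing(junction_sets)  # targets for junction probes
shared_exons = set.intersection(*exon_sets.values())

print("shared exons:    ", sorted(shared_exons))
print("unique exons:    ", {k: sorted(v) for k, v in unique_exons.items()})
print("unique junctions:", {k: sorted(v) for k, v in unique_junctions.items()})
```

Probes on the shared exons report total expression of the gene, while probes on unique exons or junctions separate the isoforms.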

• Why alternatively splice?
• How does it affect the resulting protein?
• Look at domains:
  – High level summary of protein
  – ~80% of eukaryotic proteins are multi-domain
  – Domains are big relative to an exon
      Some Previous Work
• Signatures of domain shuffling in the
  human genome. Kaessmann, 2002.
  Intron phase symmetry around domain
• The Effects of Alternative Splicing On
  Transmembrane Proteins in the Mouse
  Genome. Cline, 2004.
  Half of TM proteins studied affected by alt-splicing
• Predict Alternative Splicing
• Predict Protein Domains
• Look for effects of Alt-Splicing on
  predicted domains
  – “Swapping”
  – “Knockout”
  – “Clipping”
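
The slides do not define these three effects, so the sketch below rests on assumed working definitions: a domain that lies entirely inside a spliced-out region is a "knockout", a domain that is only partly removed is "clipping", and anything else is intact. "Swapping" would need domain predictions for the alternative sequence as well and is left out. The domain names and coordinates are invented.

```python
def classify_domain(domain, removed):
    """Classify one predicted domain against a spliced-out region.

    Both arguments are half-open (start, end) intervals in amino-acid
    coordinates. The category definitions are assumptions for this sketch.
    """
    d_start, d_end = domain
    r_start, r_end = removed
    overlap = min(d_end, r_end) - max(d_start, r_start)
    if overlap <= 0:
        return "intact"
    if overlap == d_end - d_start:
        return "knockout"   # whole domain lies in the removed region
    return "clipping"       # only part of the domain is removed

# Hypothetical predicted domains and a hypothetical removed region.
domains = {"domain A": (10, 60), "domain B": (80, 150), "domain C": (200, 260)}
removed_region = (70, 220)

effects = {name: classify_domain(span, removed_region)
           for name, span in domains.items()}
print(effects)
```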
       Microarray Design
• Genes based on mRNA and EST data
  in mouse
• Mapped to Feb. 2002 mouse genome
• ~500,000 probes (~66,000 sets)
• ~100,000 transcripts
• ~13,000 gene models
                              Technical work

[Figure: Genome Space diagram with labels Provided data, Generated Data, and gene models]

                                         Probe to transcript mapping
                                         E@NM_021320         cc-chr10-000017.82.0
                                         G6836022@J911445    cc-chr10-000017.91.1
                                         G6807921@J911524_RC cc-chr10-000018.4.0
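
A mapping in this two-column layout is easy to load into a lookup table. The sketch below assumes each line holds exactly two whitespace-separated identifiers; which column is the probe side and which is the transcript side is not stated on the slide, so the code keeps the names neutral. `parse_mapping` is a made-up helper name.

```python
MAPPING_TEXT = """\
E@NM_021320         cc-chr10-000017.82.0
G6836022@J911445    cc-chr10-000017.91.1
G6807921@J911524_RC cc-chr10-000018.4.0
"""

def parse_mapping(text):
    """Return {first-column ID: second-column ID} for every two-field line."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:      # skip blank or malformed lines
            continue
        left_id, right_id = fields
        mapping[left_id] = right_id
    return mapping

mapping = parse_mapping(MAPPING_TEXT)
for left_id, right_id in mapping.items():
    print(f"{left_id:22s} -> {right_id}")
```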

Predicting Alternative Splicing
• Using mouse alt-splicing microarrays
• Data from Manny Ares
  – 8 tissues
  – 3 replicates of each tissue
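
With 8 tissues and 3 replicates each, every probe set has 24 readings. A common first step, shown here only as a sketch, is to average the replicates so that each probe set gets one value per tissue. The ordering of the readings (tissue by tissue) and the numbers themselves are assumptions for the example.

```python
from statistics import mean

TISSUES = 8
REPLICATES = 3

def average_replicates(readings, replicates=REPLICATES):
    """Collapse readings ordered tissue by tissue into one mean per tissue."""
    if len(readings) % replicates != 0:
        raise ValueError("number of readings is not a multiple of replicates")
    return [
        mean(readings[i:i + replicates])
        for i in range(0, len(readings), replicates)
    ]

# Made-up readings for one probe set: 8 tissues x 3 replicates = 24 arrays.
readings = list(range(TISSUES * REPLICATES))
tissue_profile = average_replicates(readings)
print(len(readings), "arrays ->", len(tissue_profile), "tissue means")
```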
• General Approach: Clustering, then

  107 Clusters                          Detail View
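
The slides do not say which clustering algorithm was used. As a stand-in, the sketch below groups tissue profiles with a simple "leader" scheme: each profile joins the first cluster whose first member it correlates with strongly (Pearson correlation at or above a threshold), otherwise it starts a new cluster. The profiles, names, and the 0.9 threshold are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)

def leader_cluster(profiles, threshold=0.9):
    """Assign each profile to the first cluster whose leader it correlates with."""
    clusters = []  # list of (leader profile, member names)
    for name, profile in profiles.items():
        for leader, members in clusters:
            if pearson(leader, profile) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((profile, [name]))  # no match: start a new cluster
    return [members for _, members in clusters]

# Hypothetical per-tissue means (8 tissues) for four probe sets.
profiles = {
    "set_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "set_b": [2, 4, 6, 8, 10, 12, 14, 16],
    "set_c": [8, 7, 6, 5, 4, 3, 2, 1],
    "set_d": [16, 14, 12, 10, 8, 6, 4, 2],
}
clusters = leader_cluster(profiles)
print(clusters)
```

The result depends on input order and on the threshold, which is one reason hierarchical or k-means clustering is often preferred in practice.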

          Gene Expression
• mRNA expression represents dynamic
  aspects of the cell
• mRNA expression can be measured with
  the latest technology
• mRNA is isolated and labeled with
  fluorescent protein
• mRNA is hybridized to the target; the level of
  hybridization corresponds to light emission,
  which is measured with a laser
Gene Expression Microarrays
The main types of gene expression microarrays:
• Short oligonucleotide arrays (Affymetrix)
• cDNA or spotted arrays (Brown/Botstein)
• Long oligonucleotide arrays (Agilent Inkjet)
• Fiber-optic arrays
• ...
         Affymetrix Microarrays
Raw image



                   ~10^7 oligonucleotides,
                   half Perfectly Match mRNA (PM),
                   half have one Mismatch (MM)
                   Raw gene expression is intensity
                   difference: PM - MM
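
The PM - MM difference is computed per probe pair. How the pairs for one gene are combined is not stated here; the sketch below assumes a plain average of the differences, and the intensity values are invented.

```python
from statistics import mean

def raw_expression(pm, mm):
    """Average PM - MM intensity difference over the probe pairs of one gene."""
    if len(pm) != len(mm):
        raise ValueError("PM and MM lists must have the same length")
    differences = [p - m for p, m in zip(pm, mm)]
    return mean(differences)

# Made-up intensities for three probe pairs of one gene.
pm_intensities = [1200, 900, 1100]   # perfect-match probes
mm_intensities = [300, 300, 500]     # mismatch probes

expression = raw_expression(pm_intensities, mm_intensities)
print(expression)
```

Subtracting MM is meant to remove non-specific hybridization; note that a difference can come out negative when a mismatch probe is brighter than its perfect-match partner.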
        Microarray Potential
• Biological discovery
  – new and better molecular diagnostics
  – new molecular targets for therapy
  – finding and refining biological pathways
• Recent examples
  – molecular diagnosis of leukemia, breast cancer, ...
  – appropriate treatment for genetic signature
  – potential new drug targets
     Microarray Data Analysis
• Gene Selection
  – find genes for therapeutic targets
  – avoid false positives (FDA approval?)
• Classification (Supervised)
  – identify disease
  – predict outcome / select best treatment
• Clustering (Unsupervised)
  – find new biological classes / refine existing ones
  – exploration
      Microarray Data Mining
• too few records (samples), usually < 100
• too many columns (genes), usually > 1,000
• Too many columns likely to lead to false positives
• for exploration, a large set of all relevant
  genes is desired
• for diagnostics or identification of therapeutic
  targets, the smallest set of genes is needed
• model needs to be explainable to biologists
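
A quick calculation shows why so many columns invite false positives. The sketch assumes each gene is tested separately at a significance level of 0.05 and that no gene is truly different between the groups. The Bonferroni correction shown is one standard remedy; it is not taken from these slides.

```python
n_genes = 1_000    # columns (genes); the slides say usually > 1,000
alpha = 0.05       # assumed per-gene significance level

# Genes expected to pass the test by chance alone.
expected_false_positives = n_genes * alpha

# Bonferroni: shrink the per-gene threshold so that the chance of
# even one false positive across all genes stays at or below alpha.
bonferroni_threshold = alpha / n_genes

print(f"expected false positives: {expected_false_positives:.0f}")
print(f"Bonferroni per-gene threshold: {bonferroni_threshold:.0e}")
```

With fewer than 100 samples, very few genes can meet such a strict threshold, which is part of why gene selection matters so much here.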
