									RNAs in the human genome

            Sam Griffiths-Jones

        The Wellcome Trust Sanger Institute
                  Outline

• I. Non-coding RNA
    • The genome’s dark matter
    • Family classification
    • Genome annotation
• II. ncRNA genes in the human genome
    • Rogue’s gallery
    • miRNAs
    • Regulatory elements
T. thermophilus - Ramakrishnan et al., Cell, 2002
Protein/RNA genes

      DNA → RNA → protein
      [Diagram: the RNA → protein step is crossed out]
              ncRNA genes

• …. code for functional RNAs
• Many cellular machines contain RNA
    •   Ribosome      rRNA
    •   Spliceosome   snRNAs (U1,U2,U4,U5,U6)
    •   Telomerase    Telomerase RNA
    •   SRP           SRP RNA
How many genes in the human genome?
                        Gene sweep
• CSHL 2000-2003
• Rules
       • $1 in 2000, $5 in 2001 and $20 in 2002
       • A gene is a set of connected transcripts. A transcript is a set of exons connected
         via transcription. At least one transcript must be expressed outside of the nucleus
         and one transcript must encode a protein.
       • One bet per person, per year

• Results
       • 165 bets
       • Mean 61710
       • Lowest 25947
       • Highest 153478

• Answer: 21000          Winner: Lee Rowen
• http://www.ensembl.org/Genesweep/
           ncRNA genes

• Genomic dark matter
    • Ignored by gene prediction methods
    • Not in EnsEMBL
    • Computational complexity
• ~10% of human gene count?
           The RNA World

• Origin of life / central dogma paradox
     • DNA needs proteins to replicate
     • Proteins coded for by DNA
• RNA can be code and machinery
     • Selex, aptamers
• RNAs are remnants
     • Ancient
     • Essential
Biological sequence analysis


    Protein easy
     RNA hard
              Gene finding
• Rules
      • ATG
      • TAA, TGA, TAG
      • GT…..AG
• Compositional features
      •   Exon lengths                 ?
      •   Intron lengths
      •   Codon bias
      •   General genomic properties   ?
• Homology
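
The signal rules above (ATG start, TAA/TGA/TAG stops) can be illustrated with a toy open reading frame scan. This is only a minimal sketch under stated assumptions: the function name `find_orfs`, the `min_codons` parameter and the example sequence are made up for illustration, only the forward strand is scanned, and splice signals (GT…..AG), compositional features and homology are ignored.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}

def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each ATG...stop run on the forward strand.

    `start` is 0-based, `end` is exclusive and includes the stop codon.
    `min_codons` counts the codons before the stop (hypothetical threshold).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return orfs

# Toy sequence invented for this example.
print(find_orfs("CCATGAAATTTTAGCC"))
```

Nothing comparable exists for ncRNA genes, which have no start codon, stop codon or codon bias to look for.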
   Protein sequence analysis


Query:   1 MKFYTIKLPKFLGGIVRAMLGSFRKD 26
           M+  TIKLPKFL  IVR   G+ + D
Sbjct: 390 MRIMTIKLPKFLAKIVRMFKGNKKSD 467
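
As a small worked example of reading the alignment above, the identities can be counted directly from the two aligned sequences. The function name `percent_identity` is an assumption for illustration. This only counts exact matches and does not reproduce the substitution-matrix scoring that a search program uses.

```python
def percent_identity(a, b):
    """Count identical positions in two equal-length, ungapped aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, 100.0 * matches / len(a)

# Query and subject rows copied from the alignment on the slide.
query = "MKFYTIKLPKFLGGIVRAMLGSFRKD"
sbjct = "MRIMTIKLPKFLAKIVRMFKGNKKSD"

matches, pct = percent_identity(query, sbjct)
print(f"{matches}/{len(query)} identical ({pct:.1f}%)")
```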
RNA sequence analysis
     Why are families useful?
• Alignments of related sequences
• Phylogenetic trees
• Homologue detection
• Genome annotation
• Secondary structure prediction
        S.   cerevisiae       UCCUCGUGAGAGGG
        P.   canadensis       GUCUC.UGAGAGAU
        P.   strasburgensis   CUCUC.UGAGAGAG
        K.   thermotolerans   UUCUCGUGAGAGAA
        SS                    <<<<<....>>>>>
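
The structure-annotated alignment above can be checked for compensatory changes with a short script. This is a sketch, not the method any particular tool uses: the names `paired_columns` and `pair_report` are assumptions, and G-U wobble pairs are treated as valid alongside Watson-Crick pairs.

```python
# Watson-Crick pairs plus G-U wobble pairs.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

# Sequences and SS line copied from the alignment on the slide.
ALIGNMENT = {
    "S. cerevisiae":     "UCCUCGUGAGAGGG",
    "P. canadensis":     "GUCUC.UGAGAGAU",
    "P. strasburgensis": "CUCUC.UGAGAGAG",
    "K. thermotolerans": "UUCUCGUGAGAGAA",
}
SS = "<<<<<....>>>>>"

def paired_columns(ss):
    """Convert a '<' / '>' structure line into a sorted list of paired column indices."""
    stack, pairs = [], []
    for i, ch in enumerate(ss):
        if ch == "<":
            stack.append(i)
        elif ch == ">":
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure line")
    return sorted(pairs)

def pair_report(alignment, ss):
    """For each paired column, report whether every sequence can pair and how many distinct pairs occur."""
    report = {}
    for i, j in paired_columns(ss):
        observed = [(seq[i], seq[j]) for seq in alignment.values()]
        report[(i, j)] = {
            "all_canonical": all(pair in CANONICAL for pair in observed),
            "distinct_pairs": len(set(observed)),
        }
    return report

for cols, info in pair_report(ALIGNMENT, SS).items():
    print(cols, info)
```

The outermost pair of columns changes in sequence between species while always remaining able to pair. That covariation is the signal a structure-aware model can exploit and a sequence-only model cannot.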
             RNA models

• Covariance models (profile-SCFGs)
    • Analogue to profile-HMMs
    • Statistical representation of the alignment
      with structure
    • Homologue detection
    • Multiple sequence alignment
    • (Sean Eddy)
Protein sequence analysis - HMMs

ERELKKQKKLSNR
ERELKK..KQSNR
ERELKRQRKQSNR
KAAAQRQKMIKNR
                                        EREKKKRKQSNR


[Diagram: profile-HMM with begin (B), match (M), insert (I), delete (D) and end (E) states]
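
To give a feel for the "statistical representation of the alignment", the per-column residue frequencies of the four aligned sequences can be tabulated as below. This is a deliberately simplified sketch: a real profile-HMM also has insert and delete states, transition probabilities, pseudocounts and log-odds scores, none of which are modelled here. The function name `column_frequencies` is an assumption.

```python
from collections import Counter

# The four aligned sequences from the slide ('.' marks a gap).
ALIGNMENT = [
    "ERELKKQKKLSNR",
    "ERELKK..KQSNR",
    "ERELKRQRKQSNR",
    "KAAAQRQKMIKNR",
]

def column_frequencies(alignment, gap_chars=".-"):
    """Return one {residue: frequency} dict per alignment column, ignoring gaps."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res not in gap_chars)
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()})
    return profile

profile = column_frequencies(ALIGNMENT)
for position, freqs in enumerate(profile, start=1):
    print(position, freqs)
```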
RNA sequence analysis - SCFGs
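[Diagram: a hairpin with stem pairs G–C, G–C and A–U closed by a three-base loop (A, G, A); the three base pairs are labelled MP and the three loop bases ML]

 G G A A G A U C C
 < < < . . . > > >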
RNA models - problems

• Problems
    • Speed
    • Memory
    • Sensitivity
• Speed
    •   30 billion bases in DBs
    •   O(N^3) wrt model length
    •   small model         300 b/s
    •   28S rRNA            200 b/day
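
The speed figures above translate into the following back-of-the-envelope search times. This sketch assumes a single processor scanning the whole 30 billion base database once with one model; the variable and function names are made up for illustration.

```python
DB_BASES = 30_000_000_000   # "30 billion bases in DBs"
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25

def search_days(db_bases, bases_per_day):
    """Days needed to scan the database once at the given throughput."""
    return db_bases / bases_per_day

# Small model: 300 bases per second.
small_model_days = search_days(DB_BASES, 300 * SECONDS_PER_DAY)
# 28S rRNA model: 200 bases per day.
rrna_days = search_days(DB_BASES, 200)

print(f"small model: {small_model_days:.0f} days "
      f"({small_model_days / DAYS_PER_YEAR:.1f} years)")
print(f"28S rRNA:    {rrna_days:.0f} days "
      f"({rrna_days / DAYS_PER_YEAR:.0f} years)")
```

Under these assumptions even a small model needs about three CPU-years per database scan, and a 28S rRNA model is out of reach without heuristics.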
Sanger supercomputers
                  Rfam 5.0

• http://www.sanger.ac.uk/Software/Rfam/
• http://rfam.wustl.edu/
• 176 ncRNA families
     •   Structure annotated alignments
     •   Species distributions
     •   Keyword searches
     •   Sequence searches
• >235000 regions in EMBL 76
              ncRNA families

What we have:
•   tRNA
•   5S, 5.8S rRNAs
•   Spliceosomal RNAs
•   SRP, RNaseP
•   Telomerase, tmRNA, vault
•   E. coli screens
•   Some snoRNAs
•   Some miRNAs
•   Some UTR elements
•   Self-splicing introns
•   …… more

What we don’t:
•   18S, 23S rRNAs
•   Other large things (Xist etc)
•   Lots of snoRNAs
•   Lots of miRNAs
•   Many small families
•   Unknowns
         Genome annotation
• General
     One tool fits all         Compute drain
     Automatic                 Eukaryotic complications
     Comprehensive
     Great for prokaryotes
• Specific
     Heuristics                One family, one gene finder
     Increased speed
     Increased sensitivity

     tRNAscan-SE, BRUCE, SRPscan, snoscan
International Human Genome Sequencing Consortium, Nature, 2001
X chromosome inactivation in mammals

[Diagram: an XX pair with one X crossed out, beside an XY pair]

   Dosage compensation
    Xist – X inactive-specific transcript




Avner and Heard, Nat. Rev. Genetics 2001 2(1):59-67
              microRNAs

•   A novel class of ncRNA gene
•   Products are ~22 nt RNAs
•   Precursors are 70-100 nt hairpins
•   Gene regulation by pairing to mRNA
•   Unknown before 2001
                             Timeline
•   Late 70’s – lin-4 and let-7 regulate developmental timing in worm
•   1993 – lin-4 codes for a ~22 nt RNA, complementary to 3’ UTR of lin-14
•   2000 – …. so does let-7 (stRNAs)
•   2000 – let-7 is conserved in bilaterally symmetric animals
•   2001 – ~100 miRNAs discovered by cloning in worm, fly and human
•   2002 – miRNAs conserved in plants
•   2002 – Science magazine’s breakthrough of the year
•   2002 – miRNA Registry established
•   2003 – miRNAs may account for 1% of total gene count in animals
•   2003 – a few targets of miRNAs identified
•   2004 – miRNA Registry has 719 miRNAs
                          “miRNA” in PubMed

[Figure: number of publications per year matching “miRNA” in PubMed, 1999 to 2004; y-axis 0 to 140]
                       miRNA biogenesis




Adapted from DP Bartel, Cell 116:281-297(2004)
miRNA targets




          DP Bartel, Cell 116:281-297 (2004)
PNAS 99:15524-15529(2002)
         miRNA Registry 3.0

• Searchable database of published miRNAs
    • http://www.sanger.ac.uk/Software/Rfam/mirna/
    • 719 entries from human, mouse, rat, worm, fly, and
      plants
• Naming service
    • Pre-publication
    • Unique names for distinct miRNAs
    • Confidentiality for unpublished data
              Genomic context

• 180 known miRNAs in human
    • 130 intergenic
        • 60 polycistronic
        • 70 monocistronic
    • 50 intronic
ncRNA gene contexts

[Diagram: four gene contexts, labelled as follows]
• tRNA, snRNAs, SRP, RNase P …..
• Xist (polyadenylated: AAAAAAA)
• miRNAs
• miRNAs, snoRNAs
Inside-out genes




              protein




snoRNA                degradation
             Gas5, UHG, U17HG, U19H
Cis-regulatory RNA elements
               PrfA in Listeria
      25°C


                     37°C



      PrfA



                Virulence gene
                  expression
      UTR elements in human

•   IRE               regulation of iron metabolism

•   SECIS             UGA -> SeC

•   Histone 3’ UTR    3’ end formation

•   Vimentin 3’ UTR   mRNA localisation

•   CAESAR            CTGF repression

•   …. many more
    ncRNAs in human genome
•   tRNA        600   •   SRP RNA               1
•   18S rRNA    200   •   RNase P RNA           1
•   5.8S rRNA   200   •   Telomerase RNA        1
•   28S rRNA    200   •   RNase MRP             1
•   5S rRNA     200   •   Y RNA                 5
•   snoRNA      300   •   Vault                 4
•   miRNA       250   •   7SK RNA               1
•   U1           40   •   Xist                  1
•   U2           30   •   H19                   1
•   U4           30   •   BIC                   1
•   U5           30   •   Antisense RNAs    1000s?
•   U6           20   •   Cis reg regions    100s?
•   U4atac        5   •   Others                 ?
•   U6atac        5
•   U11           5
•   U12           5
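
As a rough check on the "~10% of human gene count?" estimate, the numeric entries in the table above can be summed and compared with the 21000 figure from the Gene sweep slide. Treating that figure as the denominator is an assumption made here. The counts are the approximate values from the slide, and the open-ended entries (antisense RNAs, cis-regulatory regions, others) are left out.

```python
# Approximate gene counts copied from the slide; open-ended entries are excluded.
NCRNA_COUNTS = {
    "tRNA": 600, "18S rRNA": 200, "5.8S rRNA": 200, "28S rRNA": 200,
    "5S rRNA": 200, "snoRNA": 300, "miRNA": 250,
    "U1": 40, "U2": 30, "U4": 30, "U5": 30, "U6": 20,
    "U4atac": 5, "U6atac": 5, "U11": 5, "U12": 5,
    "SRP RNA": 1, "RNase P RNA": 1, "Telomerase RNA": 1, "RNase MRP": 1,
    "Y RNA": 5, "Vault": 4, "7SK RNA": 1, "Xist": 1, "H19": 1, "BIC": 1,
}

# Assumption: compare against the Gene sweep answer of 21000 genes.
GENE_SWEEP_ANSWER = 21000

total_ncrna = sum(NCRNA_COUNTS.values())
percent_of_gene_count = 100 * total_ncrna / GENE_SWEEP_ANSWER

print(f"{total_ncrna} ncRNA genes, about {percent_of_gene_count:.1f}% of {GENE_SWEEP_ANSWER}")
```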
                     Summary
• ncRNA genes ….
     •   have diverse and essential roles
     •   may be relics of ancient RNA-based life
     •   provide major computational challenges
     •   are often ignored!
     •   >10% of human gene count?
• Family classifications are useful for ….
     • finding homologues
     • predicting structure
     • allowing automatic genome annotation
 Just plain weird
• Vault is huge
      • 13 MDa
      • 30 x 55 nm
• Described in 1986
• 3 proteins
      • MVP
      • TEP1
      • vPARP
• vRNA
• Conserved in higher eukaryotes

                             http://vaults.arc.ucla.edu/sci/sci_home.htm
                      Thanks
•   Alex Bateman              • Ian Holmes
•   Mhairi Marshall           • Bjarne Knudsen
•   Simon Moxon               • Robbie Klein
•   Ajay Khanna
•   Sean Eddy                 • David Bartel
                              • Tom Tuschl
• Informatics support group   • Victor Ambros
                   Bibliography
• Computational genomics of non-coding RNA genes. Sean R.
  Eddy, Cell 109:137-140 (2002)
• Non-coding RNAs: the architects of eukaryotic complexity. John
  S. Mattick, EMBO Reports 2:986-991 (2001)
• MicroRNAs: Genomics, biogenesis, mechanism and function.
  David P. Bartel, Cell 116:281-297 (2004)
• Rfam: An RNA family database. Sam Griffiths-Jones et al.,
  Nucl. Acids Res. 31:439-441 (2003)
             sgj@sanger.ac.uk

   http://www.sanger.ac.uk/Software/Rfam/
             rfam@sanger.ac.uk

http://www.stats.ox.ac.uk/~hein/HumanGenome/

								