									Introduction Regulatory motifs Structural motifs Databases




                                  Introduction to Motifs

                                              Scott Hazelhurst


                                                        2009




                                         Scott Hazelhurst    Introduction to Motifs


 Introduction
      Once genomes are sequenced, we need to make sense of
      the data: functional genomics
          gene identification
          pattern finding
      Basic idea: a recurring pattern implies conservation,
      which implies functionality
               Motif: common nucleic acid or amino acid pattern
               that may have a function
               Regulatory: help detect where exons are
               Structural: help determine function


 Regulatory motifs


      Main use is gene prediction
           coding regions
           regulatory regions
      Gene prediction based on motifs:
           Content based
           Signal based
      Also: homology-based and comparative gene prediction








      Signal based
      Exon defining
           Translation start sites
           Donor and acceptor splice sites
           Stop codons
      Also: presence of regulatory motifs
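
As a rough illustration of signal-based detection, the sketch below scans the three forward reading frames for a start codon (ATG) followed by an in-frame stop codon (TAA, TAG or TGA). It is a toy under stated assumptions: it ignores splice sites, the reverse strand and regulatory motifs, and the names find_orfs and min_codons are illustrative, not taken from the slides.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return sorted (start, end) index pairs of forward-strand open reading frames.

    'end' is exclusive and includes the stop codon. min_codons is the minimum
    number of codons before the stop (hypothetical threshold for illustration).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    toy = "CCATGAAATTTTAGCC"  # made-up sequence
    for begin, end in find_orfs(toy):
        print(begin, end, toy[begin:end])
```

Real gene finders combine such signals with content measures, because start and stop codons alone occur by chance very often.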

      Content based
      Exons are more conserved and tend to show codon bias.
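
Codon bias can be measured by counting how often each codon is used in a coding sequence. The sketch below computes relative codon frequencies; the function name codon_usage and the example sequence are assumptions made for illustration only.

```python
from collections import Counter


def codon_usage(cds):
    """Return the relative frequency of each codon in a coding sequence.

    The sequence is read in frame from position 0; trailing bases that do
    not fill a whole codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


if __name__ == "__main__":
    usage = codon_usage("ATGGCTGCTGCCTAA")  # made-up coding sequence
    for codon, freq in sorted(usage.items()):
        print(codon, round(freq, 2))
```

A content-based predictor would compare such frequencies in a candidate region against those seen in known coding regions of the same organism.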






 How to find regulatory motifs



      Pattern driven
          Use existing database of known motifs
          e.g. TRANSFAC, a database of known transcription factors
          and their binding sites
          but TF binding sites are variable
          may be many false positives (esp. if the motif is short)
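
A minimal sketch of the pattern-driven approach: a known consensus written with IUPAC nucleotide codes is turned into a regular expression and every (possibly overlapping) hit is reported. The consensus TATAWA and the sequence are made up for illustration and are not taken from TRANSFAC; the function names are likewise assumptions.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def scan(seq, consensus):
    """Return the start positions of all hits, including overlapping ones."""
    # A lookahead lets the regex engine report overlapping matches.
    lookahead = "(?=" + consensus_to_regex(consensus) + ")"
    return [m.start() for m in re.finditer(lookahead, seq.upper())]


if __name__ == "__main__":
    print(scan("GGTATAAAGGCTATATAT", "TATAWA"))
```

The shorter or more degenerate the consensus, the more hits arise by chance, which is the false-positive problem noted above.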








      Sequence driven:
          Premise: common sequence implies common functionality
          e.g. take upstream regions of co-regulated genes.
          Common patterns may be regulatory motifs
          e.g. take orthologous genes from different species and
          look for patterns that may be motifs
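
The sequence-driven idea can be sketched as counting which k-mers occur in every one of a set of upstream regions. This is a deliberately simplified stand-in: it requires exact matches, whereas real motif finders tolerate mismatches. The sequences, the choice k = 7 and the function names are illustrative assumptions.

```python
from collections import Counter


def kmer_support(seqs, k):
    """For each k-mer, count the number of sequences that contain it."""
    support = Counter()
    for seq in seqs:
        seq = seq.upper()
        # Use a set so a k-mer is counted once per sequence.
        support.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return support


def candidate_motifs(seqs, k, min_support=None):
    """Return k-mers present in at least min_support sequences (default: all)."""
    if min_support is None:
        min_support = len(seqs)
    support = kmer_support(seqs, k)
    return sorted(kmer for kmer, n in support.items() if n >= min_support)


if __name__ == "__main__":
    upstream = ["AAGTGACTCATT", "CCTGACTCAGG", "TTTTGACTCAC"]  # made-up regions
    print(candidate_motifs(upstream, 7))
```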






 Structural motifs

      Need to automate protein function prediction
          Volume of genomic data makes it difficult to do manually.
      Again: conservation implies high biological significance.
          Structural motifs: building blocks of the proteins
      Given novel sequence:
          find motifs
          assign function to motifs
          build up overall function
      Different ways to represent motifs





 Position weight matrices (PSSMs)



         Position        1       2       3       4       5
         Sequence 1      A       B       C       D       E
         Sequence 2      A       B       C       F       E
         Sequence 3      A       B       C       G       G
         Frequency       A=1/1   B=1/1   C=1/1   D=1/3   E=2/3
                                                 F=1/3   G=1/3
                                                 G=1/3
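
The table above can be reproduced with a few lines of code. The sketch below builds per-position frequencies from the three aligned sequences and, as one possible use, scores a candidate by multiplying the frequencies of its symbols. The function names are assumptions, and practical PSSMs usually add pseudocounts and log-odds scoring, which are omitted here.

```python
from collections import Counter
from fractions import Fraction


def position_frequencies(aligned):
    """Return one {symbol: frequency} dict per alignment column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")
    n = len(aligned)
    # zip(*aligned) iterates over the alignment column by column.
    return [
        {sym: Fraction(count, n) for sym, count in Counter(column).items()}
        for column in zip(*aligned)
    ]


def score(matrix, candidate):
    """Multiply the per-position frequencies of the candidate's symbols."""
    result = Fraction(1)
    for column, sym in zip(matrix, candidate):
        result *= column.get(sym, Fraction(0))
    return result


if __name__ == "__main__":
    matrix = position_frequencies(["ABCDE", "ABCFE", "ABCGG"])
    for pos, column in enumerate(matrix, start=1):
        print(pos, {sym: str(freq) for sym, freq in column.items()})
    print(score(matrix, "ABCDE"))
```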







         CBGs: C-x(2,5)-C-x-[GP]-x-P-x(2,5)-C
         Regular expressions: CG+(T..T)(CG+)\1
      Tools exist that use these notations to represent
      and search for patterns
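
Both notations can be tried out directly. The sketch below converts a PROSITE-style pattern such as the first example above into a regular expression, and also runs the second example, whose \1 back-reference requires the text matched by (T..T) to repeat. The converter handles only a subset of the PROSITE syntax (single residues, x, [..], {..} and repeat counts), not the < and > anchors; the function name and test sequences are assumptions.

```python
import re

# One pattern element: a residue, x, [ABC] or {ABC}, optionally followed
# by a repeat count such as (3) or (2,5).
ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern to a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported pattern element: " + element)
        residue, low, high = m.groups()
        if residue == "x":                 # x means any residue
            residue = "."
        elif residue.startswith("{"):      # {ABC} means anything except A, B, C
            residue = "[^" + residue[1:-1] + "]"
        if low is not None:
            residue += "{" + low + ("," + high if high else "") + "}"
        parts.append(residue)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("C-x(2,5)-C-x-[GP]-x-P-x(2,5)-C")
    print(regex)
    print(re.search(regex, "AACAACAGAPAAC"))             # made-up protein
    print(re.search(r"CG+(T..T)(CG+)\1", "CGGTAATCGTAAT"))  # made-up DNA
```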






 PROSITE


      Catalogue of biologically significant motifs, described
      by profiles and patterns.
               http://www.expasy.ch/prosite/
               Database available
               Search tools available
      PROSITE tools available elsewhere: e.g. in
      EMBOSS




 PRINTS



      Database of protein fingerprints
           Single motifs are useful but ...
           a collection of motifs is more useful.
      If two proteins share the same set of motifs in the same order:
      very significant.
               But take 3D structure into account!
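
The "same set of motifs in the same order" test can be sketched as follows. Regular expressions stand in for the motifs purely for illustration; this is not how PRINTS itself stores or matches fingerprints, and the motif names, patterns and sequences are all made up.

```python
import re


def motif_order(seq, motifs):
    """Return motif names sorted by the position of their first match.

    Motifs that do not occur in the sequence are left out.
    """
    hits = []
    for name, pattern in motifs.items():
        m = re.search(pattern, seq)
        if m:
            hits.append((m.start(), name))
    return [name for _, name in sorted(hits)]


def same_fingerprint(seq_a, seq_b, motifs):
    """True if both sequences contain every motif, in the same order."""
    order_a = motif_order(seq_a, motifs)
    order_b = motif_order(seq_b, motifs)
    return len(order_a) == len(motifs) and order_a == order_b


if __name__ == "__main__":
    motifs = {"m1": "C..C", "m2": "G[ST]K", "m3": "HE..H"}  # hypothetical
    print(same_fingerprint("AACAACLLGSKPPHEAAH", "CLLCGTKWWWHEQQHAA", motifs))
    print(same_fingerprint("AACAACLLGSKPPHEAAH", "HEAAHCAACGSK", motifs))
```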






 BLOCKS


      Now obsolete: Use InterPro
               ungapped multiple alignments of segments of related
               protein sequences that correspond to the most conserved
               regions of proteins.
               purpose: searching, exploring relationships with trees,
               making blocks and designing PCR primers with blocks for
               isolating homologous sequences.






 PFAM
      Database of protein domains:
         http://pfam.sanger.ac.uk/
      PFAM terminology
         Family: A collection of related proteins
         Domain: A structural unit which can be found
         in multiple protein contexts
         Repeat: A short unit which is unstable in
         isolation but forms a stable structure when
         multiple copies are present
         Motif: A short unit found outside globular
         domains


 InterPro

      InterPro is a database of
           protein families,
           domains,
           regions,
           repeats and sites
      so identifiable features found in known proteins can be applied
      to new protein sequences
               http://www.ebi.ac.uk/interpro/
               http://www.ebi.ac.uk/interpro/databases.html





      Databases that InterPro draws on:
               UniProt
               PROSITE
               Pfam
               PRINTS
               ProDom
               SMART (a Simple Modular Architecture Research Tool)
               TIGRFAM: curated collection of protein families (HMMs)
               PIRSF: protein classification system
               SUPERFAMILY: library of profile hidden Markov
               models that represent all proteins of known structure
               Gene3D: protein families and domain
               architectures in complete genomes
               PANTHER: large collection of protein families
               subdivided into functionally related subfamilies,
               using human expertise