					                                            Post-genomics and bio-informatics




     GENOME DATABASE
       REPOSITORIES
                -
1) HOW DO I FIND MY SEQUENCE?
2) CONTROLLED VOCABULARIES?
      Talk produced by Rebecca Roberts

            (rebecca@arabidopsis.info)


      Nottingham Arabidopsis Stock Centre




        Likely starting places…
• Obtained candidate genes or sequence
    •   QTL analysis
    •   Screens
    •   Microarray mining
    •   My lab's pet gene

• Gene locus known: GeneID (Entrez Gene)
    •   Arabidopsis: AUX1 / At2g38120
    •   Mouse: SLC3A2 / MGI:96955
    •   Human: SLC3A2 / HGNC:11026
    •   E. coli: ECP_3909

• Sequence data -> BLAST search -> reference DBs

• Where do I go now?
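
The example identifiers on this slide follow recognisable patterns, so a rough first step is to work out which kind of identifier you hold before choosing a database. The following is a minimal Python sketch, assuming patterns inferred only from the examples shown here. `ID_PATTERNS` and `classify_identifier` are hypothetical names, and the regular expressions are illustrative rather than an official specification of any identifier scheme.

```python
import re

# Assumption: these patterns are inferred from the slide's examples only.
ID_PATTERNS = [
    ("AGI locus code (Arabidopsis)", re.compile(r"^At[1-5CM]g\d{5}$", re.IGNORECASE)),
    ("MGI accession (mouse)", re.compile(r"^MGI:\d+$")),
    ("HGNC accession (human)", re.compile(r"^HGNC:\d+$")),
    ("locus tag (e.g. bacterial genome)", re.compile(r"^[A-Za-z][A-Za-z0-9]{2,11}_\d+$")),
]


def classify_identifier(identifier):
    """Return a label for the first pattern that matches the identifier."""
    cleaned = identifier.strip()
    for label, pattern in ID_PATTERNS:
        if pattern.match(cleaned):
            return label
    # Gene symbols such as AUX1 have no fixed shape, so they fall through.
    return "unrecognised (possibly a gene symbol)"


if __name__ == "__main__":
    for example in ["At2g38120", "MGI:96955", "HGNC:11026", "ECP_3909", "AUX1"]:
        print(example, "->", classify_identifier(example))
```

A gene symbol alone (AUX1, SLC3A2) is ambiguous across species, which is why the slide pairs each symbol with a database-specific identifier.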




                  Types of Database
•   Broad sequence DBs – GenBank/EMBL/DDBJ
     –   Query a huge static repository (100 gigabases)
     –   Many species (~1,200)
     –   Many sequence types - BACs, gDNA, cDNA, ESTs, SNPs…
     –   Well annotated
     –   Many useful links and tools

•   The species or group specialist
     – Species specific or groups of species
     – Less total raw data, but specific aggregated information
     – Ensembl, TIGR, Archaeal

•   It's my data, I will manage it – Home grown
     – Specific purpose
     – Restricted interest

•   It's my baby – I will display my own
     – Access, Excel sheets etc.
     – GBrowse, Ensembl




       Monoliths holding DNA sequences

  – GenBank at NCBI
      (NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION)
  – EMBL (EUROPEAN MOLECULAR BIOLOGY LABORATORY)
  – DDBJ (DNA DATA BANK OF JAPAN)

• THE THREE DBs EXCHANGE DATA DAILY

• EACH SUBMISSION TO EACH DB HAS:
  –   A unique identifier
  –   Latin name, taxonomic status
  –   Sequence data for the entry
  –   Biologically relevant annotation
  –   References and Medline/PubMed link
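
The fields listed above map onto keywords in a GenBank-style flat-file record (ACCESSION, ORGANISM, PUBMED, ORIGIN). The following is a minimal Python sketch of pulling them out, assuming the usual layout in which the keyword occupies the first 12 columns. The record `TOY_RECORD` is invented for illustration and is not a real database entry, `Entry` and `parse_flat_file` are hypothetical names, and the FEATURES table (where most annotation lives) is left out for brevity.

```python
from dataclasses import dataclass, field

# Invented toy record for illustration only; not a real accession.
TOY_RECORD = "\n".join([
    "LOCUS       TOY00001                  12 bp    DNA     linear   PLN 01-JAN-2000",
    "DEFINITION  Toy record for illustration only.",
    "ACCESSION   TOY00001",
    "SOURCE      Arabidopsis thaliana (thale cress)",
    "  ORGANISM  Arabidopsis thaliana",
    "REFERENCE   1",
    "   PUBMED   0000000",
    "ORIGIN",
    "        1 atggcgtacg ta",
    "//",
])


@dataclass
class Entry:
    accession: str = ""
    organism: str = ""
    definition: str = ""
    pubmed_ids: list = field(default_factory=list)
    sequence: str = ""


def parse_flat_file(text):
    """Extract a few header fields and the sequence from one record."""
    entry = Entry()
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_sequence:
            # Sequence lines mix position numbers, spaces and bases.
            entry.sequence += "".join(ch for ch in line if ch.isalpha())
            continue
        keyword, value = line[:12].strip(), line[12:].strip()
        if keyword == "ACCESSION":
            entry.accession = value.split()[0]
        elif keyword == "ORGANISM":
            entry.organism = value
        elif keyword == "DEFINITION":
            entry.definition = value
        elif keyword == "PUBMED":
            entry.pubmed_ids.append(value)
        elif keyword == "ORIGIN":
            in_sequence = True
    return entry


if __name__ == "__main__":
    print(parse_flat_file(TOY_RECORD))
```

For real work an established parser is a better choice than hand-rolled string slicing; the sketch only shows where each item on the slide sits in a record.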

Finding a GenBank accession
  Gene and species specific





                    Kingdoms, Species and closely related species
• Animal/fungal - Ensembl
    http://www.ensembl.org/index.html
• Plant
    – Grasses
    – Arabidopsis (also Brassica and Tomato)
    – Solanaceae – in development
    – Medicago
• Bacterial

AtEnsembl example




     I will make/customise my own

• Ensembl
• GBrowse



      Gene product classification and controlled
             vocabularies - Ontologies

• Ontology: in philosophy, ontology (from the Greek ὄν, genitive ὄντος: "of being"
  (participle of εἶναι: "to be") and -λογία: "science, study, theory") is the study of
  being or existence. It seeks to describe or posit the basic categories and
  relationships of being or existence, and so to define entities and types of entities
  within its framework. Ontology can be said to study conceptions of reality.

• Categorical classification within a framework of
  controlled vocabulary
• Consistent standards – communication
• Parallel transmission of information and consistent
  comparisons




                        Vocabulary structures
• Ubiquitous
  – GO, GENE ONTOLOGY
      Gene product cellular component, biological process and molecular function
  – SO, SEQUENCE ONTOLOGY
      Ontology for biological sequences
  – GMOD, GENERIC MODEL ORGANISM DATABASE
      Tools shared by model organism DBs for building biology databases
  – MGED, MICROARRAY GENE EXPRESSION DATA
      Sharing of microarray and proteomics data
  – OLS, ONTOLOGY LOOKUP SERVICE
      Web-based controlled vocabulary lookup to query multiple ontologies

• Kingdom specific
  – PO, PLANT ONTOLOGY
      Plant structure and development

• Species specific
  – NCBO/OBO, NATIONAL CENTER FOR BIOMEDICAL ONTOLOGY and OPEN BIOMEDICAL ONTOLOGIES
      Biomedical information




                    GO – GENE ONTOLOGY

•   The Gene Ontology project provides a controlled vocabulary to describe gene and gene product
    attributes in any organism.
•   The three organizing principles of GO are cellular component, biological process and molecular
    function.




    – Molecular function
    – Biological process
    – Cellular component
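
The three GO aspects can be treated as three separate bins for a gene's annotations. The following is a minimal Python sketch, assuming annotations arrive as (gene, GO ID, aspect code) tuples using the single-letter aspect codes P, F and C found in GO annotation files. `geneX` is a hypothetical gene, `group_by_aspect` is a hypothetical helper, and the GO IDs are simply the three root terms, not a curated annotation set.

```python
from collections import defaultdict

# Single-letter aspect codes mapped to the three GO namespaces.
ASPECTS = {
    "P": "biological_process",
    "F": "molecular_function",
    "C": "cellular_component",
}


def group_by_aspect(annotations):
    """Group (gene, go_id, aspect_code) tuples by GO namespace."""
    grouped = defaultdict(list)
    for gene, go_id, aspect_code in annotations:
        grouped[ASPECTS[aspect_code]].append((gene, go_id))
    return dict(grouped)


# Illustrative input: a hypothetical gene annotated to the three root terms.
annotations = [
    ("geneX", "GO:0003674", "F"),  # molecular_function root
    ("geneX", "GO:0008150", "P"),  # biological_process root
    ("geneX", "GO:0005575", "C"),  # cellular_component root
]

if __name__ == "__main__":
    for namespace, items in group_by_aspect(annotations).items():
        print(namespace, items)
```

Keeping the aspects apart matters when comparing genes: two products can share a molecular function while acting in different processes or compartments.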




Plant ontologies at NASC

"What's in a name? That which we call a rose
By any other word would smell as sweet."

-- Romeo and Juliet (II, ii, 1-2)

    What's Montague? it is nor hand, nor foot,
    Nor arm, nor face, nor any other part
    Belonging to a man. O, be some other name!

• Hairy / Hirsute / Pubescent
• Bald / Smooth / Glabrous
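
The word groups on this slide (hairy, hirsute, pubescent; bald, smooth, glabrous) show why a controlled vocabulary needs a synonym table. The following is a minimal Python sketch, assuming one preferred term per group. Choosing "pubescent" and "glabrous" as the preferred terms is an assumption made for illustration, and `SYNONYMS` and `normalise` are hypothetical names.

```python
# Assumption: the preferred term for each group is chosen for illustration.
SYNONYMS = {
    "pubescent": {"hairy", "hirsute", "pubescent"},
    "glabrous": {"bald", "smooth", "glabrous"},
}

# Invert the table so every synonym points at its preferred term.
LOOKUP = {
    synonym: preferred
    for preferred, synonyms in SYNONYMS.items()
    for synonym in synonyms
}


def normalise(term):
    """Return the preferred term, or None if the word is not yet known."""
    return LOOKUP.get(term.strip().lower())


if __name__ == "__main__":
    for word in ["Hairy", "bald", "Glabrous", "fuzzy"]:
        print(word, "->", normalise(word))
```

A `None` result marks a word that curators would have to resolve, either as a new synonym of an existing term or as a genuinely new term.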




Germplasm curation - standards
• We use a combination of the plant structure Plant Ontology
   (PO) with the Phenotype, Attribute and Trait Ontology
   (PATO).
• Each plant phenotype description comprises three ontology
   terms linked together to form an EAV
   (Entity, Attribute, Value) description:
  - entity (noun) from PO.
  - attribute and value (description) from PATO
    (since 2006, collapsed into a single quality term: EQ, Entity-Quality).




           EAV example: N319
Phenotype description in free text: Green dwarf. Broader leaves, glabra. Yellow seed.

Entity                     Attribute                          Value
PO:0000003 (whole plant)   PATO:0000131 (relative_size)       PATO:0000969 (dwarf)
PO:0000003 (whole plant)   PATO:0000015 (color_hue)           PATO:0000320 (green)
PO:0009025 (leaf)          PATO:0000923 (relative_width)      PATO:0000600 (wide)
PO:0009025 (leaf)          PATO:0000067 (relative_pilosity)   PATO:0000453 (glabrous)
PO:0009010 (seed)          PATO:0000015 (color_hue)           PATO:0000324 (yellow)
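
The N319 description can be held as a list of Entity, Attribute, Value triples, which makes it searchable by ontology ID rather than by free text. The following is a minimal Python sketch using the term IDs and labels from this slide. The class and function names (`Term`, `EAV`, `describe`, `find_by_entity`) are hypothetical and do not reflect how NASC stores its data.

```python
from typing import NamedTuple


class Term(NamedTuple):
    term_id: str
    label: str


class EAV(NamedTuple):
    entity: Term     # from PO
    attribute: Term  # from PATO
    value: Term      # from PATO


# The five triples shown for stock N319.
N319 = [
    EAV(Term("PO:0000003", "whole plant"), Term("PATO:0000131", "relative_size"),
        Term("PATO:0000969", "dwarf")),
    EAV(Term("PO:0000003", "whole plant"), Term("PATO:0000015", "color_hue"),
        Term("PATO:0000320", "green")),
    EAV(Term("PO:0009025", "leaf"), Term("PATO:0000923", "relative_width"),
        Term("PATO:0000600", "wide")),
    EAV(Term("PO:0009025", "leaf"), Term("PATO:0000067", "relative_pilosity"),
        Term("PATO:0000453", "glabrous")),
    EAV(Term("PO:0009010", "seed"), Term("PATO:0000015", "color_hue"),
        Term("PATO:0000324", "yellow")),
]


def describe(eav):
    """Render one triple as readable text."""
    return f"{eav.entity.label}: {eav.attribute.label} = {eav.value.label}"


def find_by_entity(records, po_id):
    """Return all triples whose entity has the given PO identifier."""
    return [eav for eav in records if eav.entity.term_id == po_id]


if __name__ == "__main__":
    for eav in find_by_entity(N319, "PO:0009025"):
        print(describe(eav))
```

Querying on `PO:0009025` finds every leaf phenotype regardless of whether the free text said "leaf", "leaves" or something else.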

Ontologies are emerging entities and are consensual.

• leaf margin serrated
• rosette yellow/small/many leaves/bushy
• plant bushy

Negotiate additions vs synonyms

Ontology browser
Focus on 'midvein'



Stock detail: N624




                                            Useful links
ONTOLOGIES
•   GO                         : http://www.geneontology.org/GO.doc.shtml
•   OBO/other ontology links   : http://obo.sourceforge.net/
•   PO paper                   : Ilic et al. 2007 (http://www.plantphysiol.org/cgi/content/abstract/pp.106.092825v1)
•   POC website                : http://www.plantontology.org/
UBIQUITOUS DBs
•   GENBANK                    : http://www.ncbi.nlm.nih.gov/Genbank/
•   NCBI homepage              : http://www.ncbi.nlm.nih.gov/
ANIMAL/FUNGAL SPECIFIC
•   Ensembl                    : http://www.ensembl.org/index.htm
PLANT SPECIFIC
•   Gramene                    : http://www.gramene.org/about/
•   Medicago                   : http://www.medicago.org/genome/index.php
•   Arabidopsis Ensembl        : http://atensembl.arabidopsis.info/index.html
•   SOL                        : http://www.sgn.cornell.edu/index.pl
BACTERIAL
•   TIGR CMR                   : http://cmr.tigr.org/tigr-scripts/CMR/CmrHomePage.cgi
•   UCSC ARCHAEAL              : http://archaea.ucsc.edu/
MAKE YOUR OWN
•   GBROWSE                    : http://www.gmod.org/?q=node/71
•   ENSEMBL                    : http://www.ensembl.org/info/data/index.html#import

				