NCBI Molecular Biology Resources

					                                              NCBI FieldGuide
NCBI Molecular Biology Resources: An Update

January 2007
Peter Cooper
                      NCBI Update




• Changes in the sequence databases
• Changes to Entrez
  – Umbrella - split Entrez nucleotide system
  – New displays
       • Related Structures
       • CDD Results CDTree
• New Entrez databases
  – PubChem
  – Probe
  – GENSAT
  – Genome Project
  – Genotype and Phenotype
• New BLAST options and features
  – Compositional adjustments
  – New formatting options
• New tools
  – OMSSA - mass spectrometry search
  – Splign - mRNA to genomic alignment tool
  – Genome Workbench - genome annotation client
• Keeping up with NCBI
The Growth of GenBank

[Figure: GenBank growth through Release 157. Bases (billions, axis 0 to 160) plotted from Aug-97 to Aug-06. WGS: 81.6 billion bases. Non-WGS: 69.0 billion bases. Doubling time 12-14 months.]
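
The figures on this slide (81.6 billion WGS bases, 69.0 billion non-WGS bases, a doubling time of 12-14 months) imply a simple exponential projection. The following is a minimal sketch of that arithmetic, assuming the doubling time stays constant; the function names are illustrative and not part of any NCBI tool.

```python
import math

# Figures read from the slide (Release 157).
WGS_BASES = 81.6e9
NON_WGS_BASES = 69.0e9


def projected_bases(current, months_ahead, doubling_months):
    """Exponential growth: the total doubles every `doubling_months` months."""
    return current * 2 ** (months_ahead / doubling_months)


def annual_growth_factor(doubling_months):
    """Factor by which the total grows in 12 months."""
    return 2 ** (12 / doubling_months)


total = WGS_BASES + NON_WGS_BASES

# Show the range implied by the 12-14 month doubling time.
for doubling in (12, 14):
    in_two_years = projected_bases(total, 24, doubling)
    print(
        f"doubling every {doubling} months: "
        f"x{annual_growth_factor(doubling):.2f} per year, "
        f"{in_two_years / 1e9:.0f} billion bases after 24 months"
    )
```

The projection is only as good as the constant-doubling assumption; it is meant to show the scale of growth, not to forecast a release size.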
Whole Genome Shotgun Projects




ftp://ftp.ncbi.nih.gov/genbank/wgs/

                          • >450 Projects
                          • >400 Taxa
                              – 302 bacteria
                              – 128 eukaryotes
                                 • 47 fungi
                                 • 53 animals
                                 • 3 flowering plants
Mammalian WGS




•   Duck-billed platypus
•   Nine-banded armadillo
•   Northern tree shrew
•   Domestic rabbit
•   Guinea pig
•   Mouse
•   Rat
•   Thirteen-lined ground squirrel
•   Small-eared galago
•   Human
•   Chimpanzee
•   Rhesus macaque
•   Tenrec
•   African elephant
•   Cat
•   Dog
•   European hedgehog
•   Eurasian shrew
•   Cow
•   Little brown bat
•   Gray short-tailed opossum
Environmental Sequence WGS




Accessing WGS Records: Entrez

Master records: Little brown bat[Organism] AND wgs_master[Properties]
Contig records: Little brown bat[Organism] AND wgs_contig[Properties]
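
The two Entrez queries above can also be written as search terms for the E-utilities ESearch service. The sketch below only builds the request URLs and does not contact NCBI. It assumes the current E-utilities database name `nuccore` for the nucleotide records; the slide itself shows the web interface, so the database name and the helper functions are assumptions made for illustration.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def wgs_query(organism, record_type):
    """Build the Entrez term shown on the slide for one organism.

    record_type is "wgs_master" (project master record) or
    "wgs_contig" (individual contig records).
    """
    if record_type not in ("wgs_master", "wgs_contig"):
        raise ValueError("record_type must be 'wgs_master' or 'wgs_contig'")
    return f"{organism}[Organism] AND {record_type}[Properties]"


def esearch_url(term, db="nuccore"):
    """Return an ESearch URL for the term (db name is an assumption)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


for kind in ("wgs_master", "wgs_contig"):
    print(esearch_url(wgs_query("Little brown bat", kind)))
```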
Accessing WGS Records: BLAST

BLAST databases:
  wgs: organism whole genome shotgun
  env_nt: environmental sequence
Entrez query limit: Little brown bat[Organism]
WGS Traces

Trace Archive BLAST Searches

Trace Results

NCBI’s Trace Archive

Selected Changes to the Entrez System

• Nucleotide Split
• Structure Shortcuts
• CD Displays and CDTree
Umbrella Results




• Now three separate databases
    • EST – Expressed Sequence Tags
    • GSS – Genome Survey Sequences
    • Core Nucleotide – everything else
Separate Search Fields, Terms, and Histories




New EST Search Fields




Related Structures Shortcut




               Related Structures




• Link to alignment
• Direct link to structure
Conserved Domain Displays




CD-Tree
(Selected) New Entrez Databases




PubChem: Small Molecule Database




• Three Databases
  – PubChem Substance
     • deposited / acquired records
  – PubChem Compound
     • Curated, standardized, non-redundant
  – PubChem Bioassay
     • Substances used in biological activity assays
• Fully Integrated in Entrez
• Structure Similarity Searches available
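
Structure similarity searches are commonly scored with the Tanimoto coefficient over structural fingerprints. The sketch below illustrates that coefficient on toy data. The fingerprints are invented sets of feature numbers, the compound names are placeholders, and nothing here reproduces PubChem's actual fingerprint definition or search thresholds, which this slide does not give.

```python
def tanimoto(a, b):
    """Tanimoto coefficient: shared features / all features present in either."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty fingerprints: defined as 0.0 here (a choice made for this sketch).
        return 0.0
    return len(a & b) / len(union)


# Hypothetical fingerprints: each set lists the structural features present.
QUERY_FP = {1, 2, 3, 5, 8}
LIBRARY = {
    "compound_A": {1, 2, 3, 5, 8},
    "compound_B": {1, 2, 3, 13},
    "compound_C": {21, 34},
}

# Rank library entries from most to least similar to the query.
ranked = sorted(LIBRARY, key=lambda name: tanimoto(QUERY_FP, LIBRARY[name]), reverse=True)
for name in ranked:
    print(name, round(tanimoto(QUERY_FP, LIBRARY[name]), 2))
```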
          Substance and Compound




• Substance (15.5 million records): like GenBank
    – ZINC
    – ChemDB
    – DiscoveryGate
    – Thomson Pharma
    – NCBI Structure database (MMDB)
    – NLM’s ChemIDplus
    – KEGG pathways
    – NIST Chemistry WebBook
    – NCI’s Developmental Therapeutics (PubChem Bioassay)
    – BioCyc Database Collection
    – Commercial Vendors
• Compound (10.2 million records): like RefSeq
    – Compiled from substance records
    – Less redundant
    – Standardized structural formulas
Indomethacin: Compound




Indomethacin mixture




         “Compounds” can be mixtures.
          Copies of substance records.
Indomethacin Component




Links and Neighbors




Activity Links: PubChem Bioassay




Structure Links




PubChem Neighbors




                    Unordered!
Structure Search




           The Probe Database




• Short nucleic acid reagents
  – Gene silencing (shRNA and siRNA)
  – Variation analysis (resequencing amplicons)
  – Genotyping (HapMap project and others)
  – Gene Expression
     • Links to Gene Expression Nervous System Atlas
• Search directly or linked through other
  databases
                  Entrez Probe




• Nucleic acid reagents
• Sequence targeted oligos
• Over 7 million entries
• Applications
    • Genotyping
    • Gene Silencing
    • SNP Discovery
    • Genome Mapping
    • Gene Expression
         • Real time PCR
         • In situ hybridization
Ataxia Telangiectasia Mutated Gene Probes

Resequencing Amplicon – SNP Discovery

Gene Silencing Probes




Gene Expression: Probe to GENSAT

  Gene Expression Nervous System Atlas




• NINDS and Rockefeller University
• BAC Transgenic Mouse Strains
  – Enhanced GFP expression
  – Driven by endogenous gene regulatory sequences
• Histologic sections mouse brain
  – Confocal Microscopy GFP reporter
  – Immunohistochemical staining transgenics
  – In Situ Hybridization – native expression
• Results Important for understanding brain
  microanatomy and development
GENSAT: Dopamine Receptor 2 (Drd2)




Bright Field Immunostaining BAC Transgenic




Entrez Genome Project




Project Summaries




Environmental Sequences




Genotype and Phenotype: dbGaP

• Genotype / Phenotype Data
  –   genome-wide association studies
  –   medical sequencing
  –   molecular diagnostic assays
  –   genotype and non-clinical trait association
• Levels of Access – Privacy concerns
  – Open-access
  – Controlled-access
Two Studies Available




NEI Age-Related Eye Disease

Genome Wide Analysis




Phenotype: Age-related Macular Degeneration Status

• Results summary
• Methods
Results Across Genome




              Best candidate regions
Candidate Region

• rs7529589
• rs203674
• Regulator of Complement Activation
• Complement Factor H
Functional SNP (T1277C, Tyr402His)

• Closest HapMap SNP
• His associated with risk
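
The Tyr402His change follows directly from the genetic code: tyrosine is encoded by TAT or TAC and histidine by CAT or CAC, so a T to C substitution in the first codon position converts Tyr to His. The sketch below shows this for both tyrosine codons. Which of the two codons occurs at position 402 is not stated on the slide, so both are tried; the helper name is illustrative.

```python
# Only the four codons relevant to this substitution are listed.
CODON_TABLE = {"TAT": "Tyr", "TAC": "Tyr", "CAT": "His", "CAC": "His"}


def substitute(codon, position, base):
    """Return the codon with `base` placed at the 0-based `position`."""
    return codon[:position] + base + codon[position + 1:]


# Apply T -> C at the first codon position to each tyrosine codon.
for codon in ("TAT", "TAC"):
    mutant = substitute(codon, 0, "C")
    print(f"{codon} ({CODON_TABLE[codon]}) -> {mutant} ({CODON_TABLE[mutant]})")
```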
HapMap Genotype Data

Genotypes and pedigrees available
Gene to Genotype




         Gene genotype
Linkage Disequilibrium




Recent Changes to BLAST

• New Composition based stats
• CDS Feature View
• Distance Tree (TreeView)
• New Database and Views
Advanced Options: Composition based stats

Amino acid composition (Histone H1):

Residue    Count   Percent
Ala (A)       42     19.6%
Arg (R)        4      1.9%
Asn (N)        4      1.9%
Asp (D)        1      0.5%
Cys (C)        0      0.0%
Gln (Q)        2      0.9%
Glu (E)        6      2.8%
Gly (G)       13      6.1%
His (H)        0      0.0%
Ile (I)        3      1.4%
Leu (L)       10      4.7%
Lys (K)       57     26.6%
Met (M)        0      0.0%
Phe (F)        1      0.5%
Pro (P)       19      8.9%
Ser (S)       23     10.7%
Thr (T)       14      6.5%
Trp (W)        0      0.0%
Tyr (Y)        1      0.5%
Val (V)       14      6.5%

Negatively charged residues (Asp + Glu): 7
Positively charged residues (Arg + Lys): 61
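
The composition table above can be checked with a few lines of arithmetic. The sketch below recomputes the percentages and the charged-residue totals from the residue counts on the slide and shows how strongly the protein is biased toward lysine and alanine, which is the situation composition-based statistics are meant to handle. Variable and function names are illustrative.

```python
# Residue counts copied from the Histone H1 composition table on the slide.
COUNTS = {
    "Ala": 42, "Arg": 4, "Asn": 4, "Asp": 1, "Cys": 0,
    "Gln": 2, "Glu": 6, "Gly": 13, "His": 0, "Ile": 3,
    "Leu": 10, "Lys": 57, "Met": 0, "Phe": 1, "Pro": 19,
    "Ser": 23, "Thr": 14, "Trp": 0, "Tyr": 1, "Val": 14,
}

total = sum(COUNTS.values())


def percent(count, total_residues):
    """Share of the sequence, rounded to one decimal as in the table."""
    return round(100 * count / total_residues, 1)


negative = COUNTS["Asp"] + COUNTS["Glu"]
positive = COUNTS["Arg"] + COUNTS["Lys"]

print("length:", total)
print("negatively charged:", negative, "positively charged:", positive)

# List the residues from most to least abundant.
for residue, count in sorted(COUNTS.items(), key=lambda item: item[1], reverse=True):
    print(f"{residue}  {count:3d}  {percent(count, total):5.1f}%")
```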
Composition based vs. Matrix Adjustment

Score = 36.6 bits (83), Expect = 9e-07, Method: Composition-based stats.
Identities = 36/123 (29%), Positives = 60/123 (48%), Gaps = 13/123 (10%)

Query   78    ISGAILFEETLFQKNEAGVPMVNLLHNENIIPGIKVDKGLVNIPCTDEE--KSTQGLDGL    135
              I GAILFE+T+  K +       L   + ++P +K+DKGL ++     +  K    L  L
Sbjct   67    ILGAILFEQTMDSKIDGKYTADFLWEEKKVLPFLKIDKGLNDLDADGVQTMKPNPTLADL    126

Query   136   AERCKEYYKAGARFAKWRTVLVIDTAKGKPTDLS-IHETAWGLARYASICQQNRLVPIVE    194
               +R  E +  G    K R+V+     K  P  ++ + E  + +A  A +     L+PI+E
Sbjct   127   LKRANERHIFG---TKMRSVI----KKASPAGIARVVEQQFEVA--AQVVAAG-LIPIIE    176

Query   195   PEI    197
              PE+
Sbjct   177   PEV    179

Score = 39.7 bits (91), Expect = 1e-07, Method: Compositional matrix adjust.
Identities = 56/214 (26%), Positives = 93/214 (43%), Gaps = 36/214 (16%)

Query   78    ISGAILFEETLFQKNEAGVPMVNLLHNENIIPGIKVDKGLVNIPCTDEE--KSTQGLDGL    135
              I GAILFE+T+  K +       L   + ++P +K+DKGL ++     +  K    L  L
Sbjct   67    ILGAILFEQTMDSKIDGKYTADFLWEEKKVLPFLKIDKGLNDLDADGVQTMKPNPTLADL    126

Query   136   AERCKEYYKAGARFAKWRTVLVIDTAKGKPTDLS-IHETAWGLARYASICQQNRLVPIVE    194
               +R  E +  G    K R+V+     K  P  ++ + E  + +A  A +     L+PI+E
Sbjct   127   LKRANERHIFG---TKMRSVI----KKASPAGIARVVEQQFEVA--AQVVAAG-LIPIIE    176

Query   195   PEILADGPHSIEVCAVVTQKVLSCVFKALQE-NGVLLEGALLKPNMVTAGYECTAKTTTQ    253
              PE+  +    ++ C  + +  +     AL E + V+L+  L  P +     E T
Sbjct   177   PEVDINNVDKVQ-CEEILRDEIRKHLNALPETSNVMLKLTL--PTVENLYEEFTKH----    229

Query   254   DVGFLTVRTLRRTVPPALPGVVFLSGGQSEEEAS    287
                             P +  VV LSGG S E+A+
Sbjct   230   ---------------PRVVRVVALSGGYSREKAN    248

Better Scores and Alignments
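
The Identities and Gaps figures in a BLAST header can be recomputed from the aligned rows. The sketch below does this for the composition-based-stats alignment shown on this slide, with the Query and Sbjct rows concatenated across the three blocks. Positives are not recomputed because that would need a full substitution matrix. Integer truncation is used for the percentages because it reproduces the values printed on the slide; the function names are illustrative.

```python
# Aligned rows copied from the composition-based-stats alignment (three blocks).
QUERY = (
    "ISGAILFEETLFQKNEAGVPMVNLLHNENIIPGIKVDKGLVNIPCTDEE--KSTQGLDGL"
    "AERCKEYYKAGARFAKWRTVLVIDTAKGKPTDLS-IHETAWGLARYASICQQNRLVPIVE"
    "PEI"
)
SBJCT = (
    "ILGAILFEQTMDSKIDGKYTADFLWEEKKVLPFLKIDKGLNDLDADGVQTMKPNPTLADL"
    "LKRANERHIFG---TKMRSVI----KKASPAGIARVVEQQFEVA--AQVVAAG-LIPIIE"
    "PEV"
)


def alignment_stats(query, sbjct):
    """Return (identities, gaps, alignment length) for two aligned rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    columns = list(zip(query, sbjct))
    identities = sum(q == s and q != "-" for q, s in columns)
    gaps = sum(q == "-" or s == "-" for q, s in columns)
    return identities, gaps, len(columns)


def percent(count, length):
    """Whole-number percentage, truncated rather than rounded."""
    return count * 100 // length


identities, gaps, length = alignment_stats(QUERY, SBJCT)
print(f"Identities = {identities}/{length} ({percent(identities, length)}%)")
print(f"Gaps = {gaps}/{length} ({percent(gaps, length)}%)")
```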
Formatting Options: CDS Feature




CDS Feature on Results




Distance Tree of Results




      Nucleotide BLAST: New Output




• Default human database
• Crab-eating macaque CDC20 mRNA
• New output display
           Sortable Results




• Separate sections for transcript and genome
• Pseudogene on Chromosome 9
• Functional gene on Chromosome 1
Total Score: All Segments




Functional Gene Now First
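
Ranking by total score instead of the best single segment changes which hit comes first when one subject matches in several pieces. The sketch below illustrates this with invented bit scores. It assumes the functional gene matches the mRNA as several exon-sized segments while the pseudogene matches as one long segment; neither the scores nor that assumption comes from the slide.

```python
# Hypothetical segment (HSP) bit scores, invented for illustration only.
HITS = {
    "functional gene (chr 1)": [210.0, 180.0, 150.0, 120.0],  # one segment per exon
    "pseudogene (chr 9)": [540.0],                            # one long segment
}


def rank(hits, score_of):
    """Order subject names from highest to lowest by score_of(segment scores)."""
    return sorted(hits, key=lambda name: score_of(hits[name]), reverse=True)


by_max_score = rank(HITS, max)    # best single segment
by_total_score = rank(HITS, sum)  # all segments added together

print("by max score:  ", by_max_score)
print("by total score:", by_total_score)
```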
                 Sorting in Exon Order




• Query start position: exon order
• Default sorting order: score (longest exon usually first)
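
The difference between the two sort orders can be shown with a small list of segments from an mRNA-to-genome hit. In the sketch below the coordinates and scores are invented and the `HSP` type is a stand-in for whatever a results table holds. Sorting by score puts the longest exon first; sorting by query start lists the exons in transcript order.

```python
from typing import NamedTuple


class HSP(NamedTuple):
    """One aligned segment: position on the mRNA query and its bit score."""
    query_start: int
    query_end: int
    score: float


# Hypothetical segments for a three-exon gene (values invented for illustration).
HSPS = [
    HSP(query_start=301, query_end=520, score=400.0),
    HSP(query_start=1, query_end=120, score=210.0),
    HSP(query_start=121, query_end=300, score=330.0),
]

by_score = sorted(HSPS, key=lambda h: h.score, reverse=True)  # default order
by_query_start = sorted(HSPS, key=lambda h: h.query_start)    # exon order

for label, ordering in (("score", by_score), ("query start", by_query_start)):
    print(label, [(h.query_start, h.query_end) for h in ordering])
```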
Links to Map Viewer




Chromosome 1          Chromosome 9
           Distance Tree Arg Kinases




• Creatine Kinases
• Arginine Kinases
Horizontal Transfer?




• Molluscan hosts
• Liver fluke
• Arthropod hosts
• Trypanosome

New Tools

• Splign
• Open Source Mass Spectrometry Search Algorithm (OMSSA)
• Genome Workbench

Splign: mRNA to Genomic DNA Alignment Tool




Splign Output
OMSSA Interface




OMSSA Search Results




Genome Workbench




Keeping Up with NCBI
NCBI Courses




          •   Field Guide
          •   Field Guide Plus
          •   Minicourses (10)
          •   PowerTools
               – PowerScripting
               – NCBI 4-pack
                  Announce Lists
http://www.ncbi.nlm.nih.gov/Sitemap/Summary/email_lists.htm






•   ncbi-announce@ncbi.nlm.nih.gov
•   blast-announce@...
•   books-announce@...
•   dbsnp-announce@...
•   gene-announce@...
•   genomes-announce@...
•   mapview-announce@...
•   refseq-announce@...
•   sequin_users@...
•   utilities-announce@...
•   library-linkout@...
•   linkout-news@...
•   tax-linkout@...
•   genbankb@net.bio.net
                     NCBI News




• Published Quarterly
• Updates and New Features
• Online and Hardcopy

www.ncbi.nlm.nih.gov/About/newsletter.html
  Service Addresses

   info@ncbi.nlm.nih.gov
blast-help@ncbi.nlm.nih.gov
 cooper@ncbi.nlm.nih.gov
Web Resources

NCBI Homepage: www.ncbi.nlm.nih.gov
Education: www.ncbi.nlm.nih.gov/Education/
NCBI News: www.ncbi.nlm.nih.gov/About/newsletter.html
Entrez Query: www.ncbi.nlm.nih.gov/gquery/gquery.fcgi
Structure Search: pubchem.ncbi.nlm.nih.gov/search/
Spidey: www.ncbi.nlm.nih.gov/spidey
Splign: www.ncbi.nlm.nih.gov/sutils/splign/splign.cgi
OMSSA: pubchem.ncbi.nlm.nih.gov/omssa/
GBench: www.ncbi.nlm.nih.gov/projects/gbench/

				