									NCBI FieldGuide

NCBI Molecular Biology Resources: An Update

June 19, 2007

NCBI Update
• Changes in the sequence databases
• Changes to Entrez
– Umbrella - split Entrez nucleotide system
– New displays
  • Related Structures
  • CDD Results CDTree
• New BLAST options and features
– Compositional adjustments
– New formatting options
• New tools
– OMSSA - mass spectrometry search
– Splign - mRNA to genomic alignment tool
– Genome Workbench genome annotation client
• New Entrez databases
– PubChem
– Probe
– GenSat
– Genome Project
– Genotype and Phenotype
• Keeping up with NCBI

The Growth of GenBank
Release 157

[Chart: GenBank size in bases (billions), Aug-97 through Aug-06]

WGS: 81.6 billion bases
Non-WGS: 69.0 billion bases
Doubling time 12-14 months
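The figures on this slide can be combined with a little arithmetic. The sketch below is only an illustration: it assumes simple exponential growth (which is what a "doubling time" implies) and extrapolates from the Release 157 numbers quoted above. The function name and the 24-month horizon are choices made for this example, not NCBI projections.

```python
# Figures quoted on the slide (GenBank Release 157), in billions of bases.
WGS_BASES_BILLIONS = 81.6
NON_WGS_BASES_BILLIONS = 69.0


def projected_size(current: float, months: float, doubling_months: float) -> float:
    """Size after `months`, assuming steady exponential growth.

    Assumption: growth follows N(t) = N0 * 2 ** (t / T), where T is the
    doubling time. Real growth need not stay on this curve.
    """
    return current * 2 ** (months / doubling_months)


total = WGS_BASES_BILLIONS + NON_WGS_BASES_BILLIONS

if __name__ == "__main__":
    print(f"Total at Release 157: {total:.1f} billion bases")
    # The slide gives a range of 12-14 months for the doubling time.
    for doubling in (12, 14):
        size = projected_size(total, months=24, doubling_months=doubling)
        print(f"Doubling every {doubling} months: {size:.0f} billion bases after 24 months")
```

With a 12-month doubling time the total doubles twice in 24 months; with 14 months it grows somewhat less, which gives a feel for how wide the quoted range is.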

Mammalian WGS
• Duck-billed platypus
• Nine-banded armadillo
• Northern tree shrew
• Domestic rabbit
• Guinea pig
• Mouse
• Rat
• Thirteen-lined ground squirrel
• Small-eared galago
• Orangutan
• Human
• Chimpanzee
• Gorilla
• Rhesus macaque
• Tenrec
• African elephant
• Dog
• Cat
• Horse
• European hedgehog
• Eurasian shrew
• Little brown bat
• Cow
• Gray short-tailed opossum

Environmental Sequence WGS

New Taxon: Metagenomes

Environmental sequences will move to the Metagenomes taxon

Accessing WGS Records: Entrez

Little brown bat[Organism] AND wgs_master[Properties]

Little brown bat[Organism] AND wgs_contig[Properties]
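The two queries above follow one pattern: an organism name tagged with [Organism], combined with a WGS record type tagged with [Properties]. As a rough sketch, the same query strings can be assembled in code and wrapped in a search URL. The E-utilities ESearch address and the `db=nucleotide` parameter used here are assumptions added for this example; the slide itself only shows the interactive Entrez search box.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: the public E-utilities ESearch endpoint (not shown on the slide).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The two property values that appear on the slide.
WGS_RECORD_TYPES = ("wgs_master", "wgs_contig")


def wgs_query(organism: str, record_type: str) -> str:
    """Build an Entrez query like the ones on the slide."""
    if record_type not in WGS_RECORD_TYPES:
        raise ValueError(f"unknown WGS record type: {record_type!r}")
    return f"{organism}[Organism] AND {record_type}[Properties]"


def esearch_url(term: str, db: str = "nucleotide") -> str:
    """URL-encode the query; this only builds the URL and sends nothing."""
    return f"{EUTILS_ESEARCH}?{urlencode({'db': db, 'term': term})}"


def term_from_url(url: str) -> str:
    """Decode the query term back out of a URL, as a round-trip check."""
    return parse_qs(urlparse(url).query)["term"][0]


if __name__ == "__main__":
    for record_type in WGS_RECORD_TYPES:
        print(esearch_url(wgs_query("Little brown bat", record_type)))
```

The master query returns the project-level record, while the contig query returns the individual assembled pieces, so the only thing that changes between the two is the [Properties] value.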

Accessing WGS Records: BLAST

wgs: organism whole genome shotgun
env_nt: environmental sequence

Little brown bat[Organism]

WGS Traces

Trace Archive BLAST Searches

Trace Results

Selected Changes to the Entrez System
• Nucleotide Split
• Structure Shortcuts

Umbrella Results

• Now three separate databases
• EST – Expressed Sequence Tags
• GSS – Genome Survey Sequences
• Core Nucleotide – everything else
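The split amounts to a simple routing rule: EST records go to one database, GSS records to another, and everything else to Core Nucleotide. The sketch below expresses that rule. The identifiers `nucest`, `nucgss`, and `nuccore` are assumptions based on the names these databases have used in E-utilities; they do not appear on the slide and may not be current.

```python
# Assumed programmatic database names (not given on the slide).
ENTREZ_DB_BY_DIVISION = {
    "EST": "nucest",  # Expressed Sequence Tags
    "GSS": "nucgss",  # Genome Survey Sequences
}
CORE_NUCLEOTIDE_DB = "nuccore"  # "everything else"


def entrez_db_for(division: str) -> str:
    """Pick the database that would hold a record from a GenBank division."""
    return ENTREZ_DB_BY_DIVISION.get(division.strip().upper(), CORE_NUCLEOTIDE_DB)


if __name__ == "__main__":
    # PRI and BCT are ordinary GenBank division codes, used here as examples.
    for division in ("EST", "GSS", "PRI", "BCT"):
        print(division, "->", entrez_db_for(division))
```

Because each database keeps its own search fields and history, a search meant to cover all nucleotide records has to be run against each of the three.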

Separate Search Fields, Terms, and Histories

New EST Search Fields

Related Structures Shortcut

Related Structures

Link to alignment
Direct link to structure

(Selected) New Entrez Databases

PubChem: Small Molecule Database
• Three Databases
– PubChem Substance
  • Deposited / acquired records
– PubChem Compound
  • Curated, standardized, non-redundant
– PubChem Bioassay
  • Substances used in biological activity assays
• Fully Integrated in Entrez
• Structure Similarity Searches available

Substance and Compound
• Substance (15.5 million records): like GenBank
– ZINC
– ChemDB
– DiscoveryGate
– Thomson Pharma
– NCBI Structure database (MMDB)
– NLM's ChemIDplus
– KEGG pathways
– NIST Chemistry WebBook
– NCI's Developmental Therapeutics (PubChem Bioassay)
– BioCyc Database Collection
– Commercial Vendors
• Compound (10.2 million records): like RefSeq
– Compiled from Substance records
– Less redundant
– Standardized structural formulas
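The relationship between the two databases can be pictured as a grouping step: many deposited Substance records that describe the same standardized structure collapse into one Compound record. The sketch below shows only that idea. The record tuples, depositor names, and structure keys are invented placeholders, and real PubChem standardization is far more involved than comparing a key.

```python
from collections import defaultdict

# Hypothetical substance records: (substance id, depositor, standardized structure key).
# None of these values are real PubChem data.
SUBSTANCES = [
    (1, "Depositor A", "STRUCT-X"),
    (2, "Depositor B", "STRUCT-X"),  # same structure as record 1, different source
    (3, "Depositor A", "STRUCT-Y"),
]


def compile_compounds(substances):
    """Group deposited records by structure key: one group per 'compound'."""
    compounds = defaultdict(list)
    for substance_id, depositor, structure_key in substances:
        compounds[structure_key].append((substance_id, depositor))
    return dict(compounds)


if __name__ == "__main__":
    compounds = compile_compounds(SUBSTANCES)
    print(f"{len(SUBSTANCES)} substances -> {len(compounds)} compounds")
    for key, members in compounds.items():
        print(key, members)
```

This mirrors the GenBank/RefSeq analogy on the slide: the archival layer keeps every submission, and the derived layer is smaller because duplicates are merged.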

Indomethacin: Compound

Indomethacin mixture

“Compounds” can be mixtures. Copies of substance records.

Indomethacin Component

Links and Neighbors

Activity Links: PubChem Bioassay

Structure Links

PubChem Neighbors

Unordered!

Structure Search

The Probe Database
• Short nucleic acid reagents
– Gene silencing (shRNA and siRNA)
– Variation analysis (resequencing amplicons)
– Genotyping (HapMap project and others)
– Gene Expression
  • Links to Gene Expression Nervous System Atlas
  • Real time RT-PCR reagents
• Search directly or linked through other databases

Entrez Probe

• Nucleic acid reagents
• Sequence targeted oligos
• Over 7 million entries
• Applications
– Genotyping
– Gene Silencing
– SNP Discovery
– Genome Mapping
– Gene Expression
– Real time PCR
– In Situ hybridization

Ataxia Telangiectasia Mutated Gene Probes

Resequencing Amplicon –SNP Discovery

Gene Silencing Probes

Gene Expression: Probe to GenSat

Entrez Genome Project

Project Summaries

Environmental Sequences and Metagenomes

type_environmental

Genotype and Phenotype: dbGaP
• Genotype / Phenotype Data
– Genome-wide association studies
– Medical sequencing
– Molecular diagnostic assays
– Genotype and non-clinical trait association
• Levels of Access – Privacy concerns
– Open-access
– Controlled-access

Two Studies Available

NEI Age Related Eye Disease

Genome Wide Analysis

Results Across Genome

Best candidate regions

Candidate Region

rs7529589 rs203674

Regulator of Complement Activation

Complement Factor H

Functional SNP (T1277C, Tyr402His)

Closest HapMap SNP

His associated with risk
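The label "T1277C, Tyr402His" describes a single-base change that swaps one amino acid for another. A small check against the standard genetic code shows why a T-to-C change can do this: both tyrosine codons (TAT, TAC) become histidine codons (CAT, CAC) when the first base changes from T to C. The sketch below covers only that general point; which of the two codons the gene actually uses is not stated on the slide and is not assumed here.

```python
# Relevant subset of the standard genetic code (DNA codons).
CODON_TO_AA = {
    "TAT": "Tyr",
    "TAC": "Tyr",
    "CAT": "His",
    "CAC": "His",
}


def substitute(codon: str, position: int, new_base: str) -> str:
    """Return the codon with one base replaced (position is 0-based)."""
    return codon[:position] + new_base + codon[position + 1:]


if __name__ == "__main__":
    for codon in ("TAT", "TAC"):
        mutated = substitute(codon, 0, "C")
        print(f"{codon} ({CODON_TO_AA[codon]}) -> {mutated} ({CODON_TO_AA[mutated]})")
```

A change of this kind alters the protein sequence, which is why the slide calls it a functional SNP rather than a silent one.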

HapMap Genotype Data

Genotypes and Pedigrees available

Gene to Genotype

Gene genotype

Recent Changes to BLAST
• New Composition based stats
• CDS Feature View
• Distance Tree (TreeView)
• New Database and Views

Advanced Options: Composition based stats

Histone H1

Amino acid composition:
Residue   Count   Percent
Ala (A)   42      19.6%
Arg (R)   4       1.9%
Asn (N)   4       1.9%
Asp (D)   1       0.5%
Cys (C)   0       0.0%
Gln (Q)   2       0.9%
Glu (E)   6       2.8%
Gly (G)   13      6.1%
His (H)   0       0.0%
Ile (I)   3       1.4%
Leu (L)   10      4.7%
Lys (K)   57      26.6%
Met (M)   0       0.0%
Phe (F)   1       0.5%
Pro (P)   19      8.9%
Ser (S)   23      10.7%
Thr (T)   14      6.5%
Trp (W)   0       0.0%
Tyr (Y)   1       0.5%
Val (V)   14      6.5%

Negatively charged residues (Asp + Glu): 7
Positively charged residues (Arg + Lys): 61
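The percentages and charge totals on this slide follow directly from the residue counts. The sketch below recomputes them from the counts as listed, which makes the compositional bias easy to see: lysine and alanine together account for almost half of the protein. The variable and function names are choices made for this example.

```python
# Residue counts for histone H1 as listed on the slide (one-letter codes).
COUNTS = {
    "A": 42, "R": 4, "N": 4, "D": 1, "C": 0, "Q": 2, "E": 6, "G": 13,
    "H": 0, "I": 3, "L": 10, "K": 57, "M": 0, "F": 1, "P": 19, "S": 23,
    "T": 14, "W": 0, "Y": 1, "V": 14,
}


def composition_percent(counts):
    """Percentage of each residue, rounded to one decimal as on the slide."""
    total = sum(counts.values())
    return {aa: round(100 * n / total, 1) for aa, n in counts.items()}


def charged_residues(counts):
    """Return (negative, positive) totals: Asp + Glu and Arg + Lys."""
    return counts["D"] + counts["E"], counts["R"] + counts["K"]


if __name__ == "__main__":
    percent = composition_percent(COUNTS)
    negative, positive = charged_residues(COUNTS)
    print("Length:", sum(COUNTS.values()))
    print("Lys:", percent["K"], "%  Ala:", percent["A"], "%")
    print("Negative:", negative, " Positive:", positive)
```

A query this skewed is the kind of case where scoring with a standard matrix can give misleading similarity scores, which is the motivation for the composition-based options.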

Composition based vs. Matrix Adjustment

Score = 36.6 bits (83), Expect = 9e-07, Method: Composition-based stats.
Identities = 36/123 (29%), Positives = 60/123 (48%), Gaps = 13/123 (10%)

Query  78   ISGAILFEETLFQKNEAGVPMVNLLHNENIIPGIKVDKGLVNIPCTDEE--KSTQGLDGL  135
            I GAILFE+T+  K +       L   + ++P +K+DKGL ++     +  K    L  L
Sbjct  67   ILGAILFEQTMDSKIDGKYTADFLWEEKKVLPFLKIDKGLNDLDADGVQTMKPNPTLADL  126

Query  136  AERCKEYYKAGARFAKWRTVLVIDTAKGKPTDLS-IHETAWGLARYASICQQNRLVPIVE  194
             +R  E +  G    K R+V+     K  P  ++ + E  + +A  A +     L+PI+E
Sbjct  127  LKRANERHIFG---TKMRSVI----KKASPAGIARVVEQQFEVA--AQVVAAG-LIPIIE  176

Query  195  PEI  197
            PE+
Sbjct  177  PEV  179

Score = 39.7 bits (91), Expect = 1e-07, Method: Compositional matrix adjust.
Identities = 56/214 (26%), Positives = 93/214 (43%), Gaps = 36/214 (16%)

Query  78   ISGAILFEETLFQKNEAGVPMVNLLHNENIIPGIKVDKGLVNIPCTDEE--KSTQGLDGL  135
            I GAILFE+T+  K +       L   + ++P +K+DKGL ++     +  K    L  L
Sbjct  67   ILGAILFEQTMDSKIDGKYTADFLWEEKKVLPFLKIDKGLNDLDADGVQTMKPNPTLADL  126

Query  136  AERCKEYYKAGARFAKWRTVLVIDTAKGKPTDLS-IHETAWGLARYASICQQNRLVPIVE  194
             +R  E +  G    K R+V+     K  P  ++ + E  + +A  A +     L+PI+E
Sbjct  127  LKRANERHIFG---TKMRSVI----KKASPAGIARVVEQQFEVA--AQVVAAG-LIPIIE  176

Query  195  PEILADGPHSIEVCAVVTQKVLSCVFKALQE-NGVLLEGALLKPNMVTAGYECTAKTTTQ  253
            PE+  +    ++ C  + +  +     AL E + V+L+  L  P +     E T
Sbjct  177  PEVDINNVDKVQ-CEEILRDEIRKHLNALPETSNVMLKLTL--PTVENLYEEFTKH----  229

Query  254  DVGFLTVRTLRRTVPPALPGVVFLSGGQSEEEAS  287
                           P +  VV LSGG S E+A+
Sbjct  230  ---------------PRVVRVVALSGGYSREKAN  248

Better Scores and Alignments
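Two details in these statistics are worth checking by hand. First, the percentages in the BLAST header are whole numbers that appear to be truncated rather than rounded (60/123 is about 48.8%, shown as 48%). Second, for a fixed search space the expected number of chance hits falls by half for each additional bit of score, following E = m * n * 2^(-S'), so the gain from 36.6 to 39.7 bits should lower the Expect value by roughly a factor of 8 to 9. The sketch below works through both. It assumes the same search space for both methods, which is an approximation, and the helper names are choices made for this example.

```python
def blast_percent(numerator: int, denominator: int) -> int:
    """Whole-number percentage, truncated toward zero (matches the slide's figures)."""
    return numerator * 100 // denominator


def expect_fold_change(bits_before: float, bits_after: float) -> float:
    """Factor by which the Expect value shrinks when the bit score rises.

    Assumption: identical search space, so E is proportional to 2 ** (-bits).
    """
    return 2 ** (bits_after - bits_before)


# Statistics as reported on the slide for the two scoring methods.
COMPOSITION_BASED_STATS = {"identities": (36, 123), "positives": (60, 123), "gaps": (13, 123)}
MATRIX_ADJUST = {"identities": (56, 214), "positives": (93, 214), "gaps": (36, 214)}

if __name__ == "__main__":
    for name, stats in (("Composition-based stats", COMPOSITION_BASED_STATS),
                        ("Compositional matrix adjust", MATRIX_ADJUST)):
        shown = {key: f"{blast_percent(*pair)}%" for key, pair in stats.items()}
        print(name, shown)
    print(f"Expected fold change in E: {expect_fold_change(36.6, 39.7):.1f}")
```

The reported Expect values, 9e-07 and 1e-07, differ by about a factor of 9, in line with this estimate given that both are printed to one significant digit. The matrix-adjusted alignment is also longer (214 columns versus 123), which is why its percent identity is lower even though its score is higher.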

Formatting Options: CDS Feature

CDS Feature on Results

Distance Tree of Results

Distance Tree Arg Kinases

Creatine Kinases

Arginine Kinases

Horizontal Transfer?
Molluscan Hosts

Liver Fluke

Arthropod Hosts

Trypanosome

Nucleotide BLAST: New Output

Default human database

Crab-eating macaque CDC20 mRNA

New output display

Sortable Results
Separate Sections for Transcript and Genome

Pseudogene on Chromosome 9 Functional Gene on Chromosome 1

Total Score: All Segments

Functional Gene Now First

Sorting in Exon Order

• Query start position: exon order
• Default sorting order: score (longest exon usually first)
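The sorting behaviour described on these slides can be sketched with a small example. An mRNA query aligned to a genome hits a multi-exon gene as several separate segments, while a processed pseudogene can match as one long segment. Ranking by the best single segment can put the pseudogene first; ranking by total score across all segments puts the functional gene first. All scores and coordinates below are invented for illustration, and this reading of the slides is an interpretation rather than a description of BLAST's internals.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    query_start: int  # where on the mRNA query this aligned segment begins
    score: float


@dataclass
class Hit:
    subject: str
    segments: list

    @property
    def max_score(self) -> float:
        return max(segment.score for segment in self.segments)

    @property
    def total_score(self) -> float:
        return sum(segment.score for segment in self.segments)


# Hypothetical hits: invented numbers, not taken from the slides.
HITS = [
    Hit("functional gene (chromosome 1)",
        [Segment(601, 250.0), Segment(1, 120.0), Segment(301, 90.0)]),
    Hit("pseudogene (chromosome 9)", [Segment(1, 400.0)]),
]

by_max_score = sorted(HITS, key=lambda hit: hit.max_score, reverse=True)
by_total_score = sorted(HITS, key=lambda hit: hit.total_score, reverse=True)

gene = by_total_score[0]
# Default order within a hit: highest score first (often the longest exon).
segments_by_score = sorted(gene.segments, key=lambda s: s.score, reverse=True)
# Sorting by query start position lists the segments in exon order instead.
segments_in_exon_order = sorted(gene.segments, key=lambda s: s.query_start)

if __name__ == "__main__":
    print("Top hit by max score:  ", by_max_score[0].subject)
    print("Top hit by total score:", by_total_score[0].subject)
    print("Exon order:", [s.query_start for s in segments_in_exon_order])
```

Sorting the segments of one hit by query start position reads them off in the order they occur along the transcript, which is the natural way to inspect exon boundaries.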

Links to Map Viewer

Chromosome 1

Chromosome 9

New Tools
• Splign
• Open Source Mass Spectrometry Search Algorithm (OMSSA)
• Genome Workbench

Splign: mRNA to Genomic DNA Alignment Tool

Splign Output

OMSSA Interface

OMSSA Search Results

Genome Workbench

http://www.ncbi.nlm.nih.gov/projects/gbench/

Keeping Up with NCBI

NCBI Courses
• Field Guide
• Field Guide Plus
• Structures
• PubChem
• Expression Resources
• Minicourses (11)
• PowerTools
– PowerScripting
– NCBI 4-pack

Announce Lists
http://www.ncbi.nlm.nih.gov/Sitemap/Summary/email_lists.html

• ncbi-announce@ncbi.nlm.nih.gov
• blast-announce@...
• books-announce@...
• dbsnp-announce@...
• gene-announce@...
• genomes-announce@...
• mapview-announce@...
• refseq-announce@...
• sequin_users@...
• utilities-announce@...
• library-linkout@...
• linkout-news@...
• tax-linkout@...
• genbankb@net.bio.net

NCBI News

• Published Quarterly
• Updates and New Features
• Online and Hardcopy
www.ncbi.nlm.nih.gov/About/newsletter.html

Service Addresses
info@ncbi.nlm.nih.gov
blast-help@ncbi.nlm.nih.gov
huynh@ncbi.nlm.nih.gov

Web Resources
NCBI Homepage: www.ncbi.nlm.nih.gov
Education: www.ncbi.nlm.nih.gov/Education/
NCBI News: www.ncbi.nlm.nih.gov/About/newsletter.html
Entrez Query: www.ncbi.nlm.nih.gov/gquery/gquery.fcgi
Structure Search: pubchem.ncbi.nlm.nih.gov/search/
Spidey: www.ncbi.nlm.nih.gov/spidey
Splign: www.ncbi.nlm.nih.gov/sutils/splign/splign.cgi
OMSSA: pubchem.ncbi.nlm.nih.gov/omssa/
GBench: www.ncbi.nlm.nih.gov/projects/gbench/


								