
NCBI Field Guide

NCBI Molecular Biology Resources

NCBI Databases

November 2008

The National Center for Biotechnology Information
Bethesda, MD



        Created in 1988 as a part of the
       National Library of Medicine at NIH
–   Establish public databases
–   Research in computational biology
–   Develop software tools for sequence analysis
–   Disseminate biomedical information
Web Access: www.ncbi.nlm.nih.gov
     NCBI Databases and Services
• GenBank: primary sequence database
• Free public access to biomedical literature
   – PubMed: free MEDLINE access (3 million searches per day)
   – PubMed Central: online full-text access
• Entrez: integrated molecular and literature databases
• BLAST: highest-volume sequence search service
    (100,000 to 200,000 searches per day)
• VAST: structure similarity searches
• Software and Databases
             Types of Databases
• Primary Databases
   – Original submissions by experimentalists
   – Content controlled by the submitter
      • Examples: GenBank, SNP, GEO
• Derivative Databases
   – Built from primary data
   – Content controlled by third party (NCBI)
      • Examples: RefSeq, TPA, RefSNP, UniGene, NCBI
        Protein, Structure, Conserved Domain
     NCBI Nucleotide Sequences
                    Primary
•   GenBank / EMBL / DDBJ 149,949,987
                   Derivative
•   RefSeq                    3,457,825
•   Third Party Annotation        6,378
•   PDB                           9,021
Total                     153,423,040
                  What is GenBank?
            NCBI’s Primary Sequence Database
• Nucleotide-only sequence database
• Archival in nature
  – Historical
  – Reflective of the submitter’s point of view (subjective)
  – Redundant
• GenBank Data
   – Direct submissions (traditional records)
   – Batch submissions (EST, GSS, STS)
   – FTP accounts (genome data)
• Three collaborating databases
  – GenBank
  – DNA Database of Japan (DDBJ)
  – European Molecular Biology Laboratory (EMBL)
    Database
                 International Sequence
                 Database Collaboration
[Diagram: NCBI (NIH) with GenBank and Entrez, EBI with EMBL and SRS, and NIG (CIB) with DDBJ and getentry each receive submissions and updates and exchange data with one another.]
    GenBank:       NCBI’s Primary Sequence Database
Release 168 (October 2008)
                          Records              Bases
GenBank                96,400,790     97,381,682,336
Whole Genome Shotgun   46,108,952    136,085,973,423
Total                 142,509,742    233,467,655,759

ftp.ncbi.nih.gov/genbank/
• full release every two months
• incremental updates daily
• available only via FTP
                     The Growth of GenBank
[Chart: GenBank size in billions of bases, as of November 2008; x-axis August 2001 to August 2012. WGS: 136 billion bases. GenBank release: 97 billion bases. Doubling time: 12-14 months.]
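The doubling time on the chart can be turned into growth projections. If the database doubles every T months, it grows by a factor of 2^(12/T) per year, so a 12-14 month doubling time means roughly 1.8x to 2x per year. A minimal Python sketch of that arithmetic, assuming growth stays exponential (the chart does not guarantee this); the function names are illustrative only:

```python
import math


def annual_growth_factor(doubling_months: float) -> float:
    """Size multiplier over 12 months for exponential growth with this doubling time."""
    return 2 ** (12 / doubling_months)


def project_size(current: float, doubling_months: float, months_ahead: float) -> float:
    """Projected size after months_ahead, assuming the doubling time stays constant."""
    return current * 2 ** (months_ahead / doubling_months)


def doubling_time(size_before: float, size_after: float, months_elapsed: float) -> float:
    """Doubling time (months) implied by two measurements taken months_elapsed apart."""
    return months_elapsed * math.log(2) / math.log(size_after / size_before)


if __name__ == "__main__":
    for months in (12, 14):
        print(f"doubling every {months} months -> x{annual_growth_factor(months):.2f} per year")
    # Hypothetical projection: the 97 billion-base release, two years on, at a 12-month doubling time.
    print(f"{project_size(97e9, 12, 24):.3g} bases")
```

The projection is only as good as the exponential assumption; `doubling_time` runs the same formula backwards from two observed release sizes.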
        Organization of GenBank:
          Traditional Divisions
Records are divided into 18 divisions: 12 traditional and 6 bulk.

Traditional Divisions:
• Direct submissions (Sequin and BankIt)
• Accurate
• Well characterized

    PRI  Primate
    PLN  Plant and Fungal
    BCT  Bacterial and Archaeal
    INV  Invertebrate
    ROD  Rodent
    VRL  Viral
    VRT  Other Vertebrate
    MAM  Mammalian
    PHG  Phage
    SYN  Synthetic (cloning vectors)
    ENV  Environmental Samples
    UNA  Unannotated

        Entrez query: gbdiv_xxx[Properties]
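The `gbdiv_xxx[Properties]` term can also be submitted programmatically. The following is a minimal Python sketch that assumes NCBI's E-utilities ESearch endpoint and the `nuccore` database name, neither of which appears on the slide. It only builds the URL and does not contact NCBI. Writing the division code in lowercase is a choice made here, mirroring the `xxx` shown above.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (an assumption; not given on the slide).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def division_term(code: str) -> str:
    """Entrez term for one GenBank division, e.g. 'PLN' -> 'gbdiv_pln[Properties]'."""
    return f"gbdiv_{code.lower()}[Properties]"


def esearch_url(term: str, db: str = "nuccore", retmax: int = 20) -> str:
    """Build (but do not fetch) an ESearch URL; urlencode escapes brackets and spaces."""
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


if __name__ == "__main__":
    print(esearch_url(division_term("PLN")))
```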
         Organization of GenBank:
              Bulk Divisions
 Records are divided into 18 divisions: 12 traditional and 6 bulk.

Bulk Divisions:
• Batch submission (email and FTP)
• Inaccurate
• Poorly characterized

    EST  Expressed Sequence Tag
    GSS  Genome Survey Sequence
    HTG  High Throughput Genomic
    STS  Sequence Tagged Site
    HTC  High Throughput cDNA
    PAT  Patent

         Entrez query: gbdiv_xxx[Properties]
A Traditional GenBank Record: The Flatfile Format
(Header, Feature Table, Sequence)

LOCUS       AF124527                2540 bp    mRNA    linear   PLN 29-JAN-2004
DEFINITION  Prunus persica ethylene receptor (ETR1) mRNA, complete cds.
ACCESSION   AF124527
VERSION     AF124527.1  GI:6841074
KEYWORDS    .
SOURCE      Prunus persica (peach)
  ORGANISM  Prunus persica
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons;
            rosids; eurosids I; Rosales; Rosaceae; Amygdaloideae; Prunus.
REFERENCE   1  (bases 1 to 2540)
  AUTHORS   Bassett,C.L., Artlip,T.S. and Callahan,A.M.
  TITLE     Characterization of the peach homologue of the ethylene receptor,
            PpETR1, reveals some unusual features regarding transcript
            processing
  JOURNAL   Planta 215 (4), 679-688 (2002)
   PUBMED   12172852
REFERENCE   2  (bases 1 to 2540)
  AUTHORS   Bassett,C.B., Artlip,T.S. and Nickerson,M.L.
  TITLE     Direct Submission
  JOURNAL   Submitted (29-JAN-1999) Appalachian Fruit Research Station,
            USDA-ARS, 45 Wiltshire Road, Kearneysville, WV 25430, USA
FEATURES             Location/Qualifiers
     source          1..2540
                     /organism="Prunus persica"
                     /mol_type="mRNA"
                     /cultivar="Loring"
                     /db_xref="taxon:3760"
                     /dev_stage="III B/C fruit"
     gene            1..2540
                     /gene="ETR1"
     CDS             269..2485
                     /gene="ETR1"
                     /codon_start=1
                     /product="ethylene receptor"
                     /protein_id="AAF28893.1"
                     /db_xref="GI:6841075"
                     /translation="MEACNCIEPQWPADELLMKYQYISDFFIALAYFSIPLELIYFVK
                     KSAVFPYRWVLVQFGAFIVLCGATHLINLWTFSMHSRTVAIVMTTAKVLTAVVSCATA
                     LMLVHIIPDLLSVKTRELFLKNKAAELDREMGLIRTQEETGRHVRMLTHEIRSTLDRH
                     TILKTTLVELGRTLALEECALWMPTRTGLELQLSYTLRQQNPVGYTVPIHLPVINQVF
                     SSNRALKISPNSPVARMRPLAGKHMPGEVVAVRVPLLHLSNFQINDWPELSTKRYALM
                     VLMLPSDSARQWHVHELELVEVVADQVAVALSHAAILEESMRARDLLMEQNIALDLAR
                     REAETAIRARNDFLAVMNHEMRTPMHAIIALSSLLQETELTPEQRLMVETILKSSHLL
                     ATLINDVLDLSRLEDGSLQLEIATFNLHSVFREVHNLIKPVASVKKLSVSLNLAADLP
                     VQAVGDEKRLMQIVLNVVGNAVKFSKEGSISITAFVAKSESLRDFRAPEFFPAQSDNH
                     FYLRVQVKDSGSGINPQDIPKLFTKFAQTQSLATRNSGGSGLGLAICKRFVNLMEGHI
                     WIESEGPGKGCTAIFIVKLGFAERSNESKLPFLTKVQANHVQTNFPGLKVLVMDDNGS
                     VTKGLLVHLGCDVTTVSSIDEFLHVISQEHKVVFMDVCMPGIDGYELAVRIHEKFTKR
                     HERPVLVALTGNIDKMTKENCMRVGMDGVILKPVSVDKMRSVLSELLEHRVLFEAM"
ORIGIN
        1 gcacgagggc tcaccgagcg agctagctct tcaggagtca aggcttctgg gtgaggggaa
       61 gaagaagaag cttctttgat gtgttggggt gccaatctaa agaggaagaa gaaggcctct
      121 aatgtattga ggtcggctgt ctgggctgcc gatctgtgtt gaatggatag tttggtagag
      181 atgcttcaac gacatagggt ggctgaaaag ggtttgaaga aagtgaagga ggaaaccaag
                                           ...
     2401 tatactgaaa cctgtctcag ttgataaaat gaggagtgtt ttatcagaac tgttggagca
     2461 tcgagtttta tttgaggcta tgtaagatat aggaaaattg ttctagtgaa ggaaagattt
     2521 aaatggaaaa aaaaaaaaaa
//
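The ORIGIN section stores the sequence as numbered lines of 10-base blocks, and the record ends with `//`. The Python sketch below recovers the raw sequence from such lines and checks that each line's start position continues from the previous one. `parse_origin` is a hypothetical helper. The check would reject an elided line such as the `...` on this slide.

```python
def parse_origin(lines):
    """Return the bases from GenBank ORIGIN lines as one string.

    Each data line holds a 1-based start position followed by blocks of up to
    10 bases; positions must continue without gaps from line to line.
    """
    parts = []
    length = 0
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("ORIGIN") or line == "//":
            continue  # keyword line, blank line or end-of-record marker
        start, *blocks = line.split()
        if int(start) != length + 1:  # int() also rejects elisions such as '...'
            raise ValueError(f"line starting at {start} does not follow position {length}")
        chunk = "".join(blocks)
        parts.append(chunk)
        length += len(chunk)
    return "".join(parts)


# First two data lines of the AF124527 record above.
origin = """ORIGIN
        1 gcacgagggc tcaccgagcg agctagctct tcaggagtca aggcttctgg gtgaggggaa
       61 gaagaagaag cttctttgat gtgttggggt gccaatctaa agaggaagaa gaaggcctct
//"""
seq = parse_origin(origin.splitlines())
print(len(seq), seq[:10])  # 120 gcacgagggc
```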
         Traditional GenBank Record
 ACCESSION   U07418
 VERSION     U07418.1  GI:466461

 Accession: stable, reportable, universal
 Version: tracks changes in sequence
 GI number: NCBI internal use




well annotated

the sequence is the data
           Bulk Divisions
• Batch submission and HTG (email and FTP)
• Inaccurate
• Poorly characterized

 • Expressed Sequence Tag
     – first-pass, single-read cDNA
 • Genome Survey Sequence
     – first-pass, single-read genomic DNA
 • High Throughput Genomic
     – incomplete sequences of genomic clones
 • Sequence Tagged Site
     – PCR-based mapping reagents
        GenBank Bulk Sequence: EST
poorly characterized
    Expressed Sequence Tags in Entrez
Total                59 million records
Human                8.1 million
Mouse                4.9 million
Pig                  2.2 million
Maize                2.0 million
Arabidopsis          1.5 million
Cow                  1.5 million
Zebrafish            1.4 million
Soybean              1.4 million
Xenopus tropicalis   1.3 million
Rice                 1.2 million
Ciona intestinalis   1.2 million
Wheat                1.0 million
Rat                  1.0 million
Whole Genome Shotgun Projects
ftp.ncbi.nih.gov/genbank/wgs/

                         • >900 Projects
                         • >800 Taxa
                            –   585 Bacteria
                            –   8 Archaea
                            –   17 metagenomes
                            –   255 eukaryotes
                                 • 86 fungi
                                 • 89 animals
                                 • 7 flowering plants
Mammalian WGS

Now 50 species, including…
•   Duck-billed platypus
•   Nine-banded armadillo
•   Northern tree shrew
•   Domestic rabbit
•   Pika
•   Guinea pig
•   Mouse
•   Rat
•   Thirteen-lined ground squirrel
•   Small-eared galago
•   Mouse lemur
•   Orangutan
•   Human
•   Chimpanzee
•   Gorilla
•   Rhesus macaque
•   Tenrec
•   African elephant
•   Dog
•   Cat
•   Horse
•   European hedgehog
•   Eurasian shrew
•   Little brown bat
•   Cow
•   Gray short-tailed opossum
Plant WGS
Derivative Databases
Entrez Protein: Derivative Database
   Data Source                                     Sequences
   GenPept                                            16,076,221
   RefSeq                                              6,035,597

   Third Party Annotation                                  6,034

   Swiss-Prot                                           399,8106

   PIR                                                    21,703

   PRF                                                    12,079

   PDB                                                   123,996



   Total                                               18,971,426

   BLAST nr total                                       7,269,299
   (no patents, 1 million; no env_nr, 6 million)
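Besides excluding patent and environmental sequences, BLAST nr is smaller than the total (7,269,299 vs. 18,971,426) because identical sequences from different sources are merged into one entry. The Python sketch below illustrates that merging step. It assumes exact sequence identity is the criterion, and it shows the idea only, not NCBI's nr build procedure. The toy records use truncated sequence prefixes taken from these slides.

```python
def collapse_identical(records):
    """Merge records whose sequences are identical, keeping every defline.

    records: iterable of (defline, sequence) pairs.
    Returns a list of (deflines, sequence), one item per distinct sequence,
    in order of first appearance.
    """
    merged = {}
    for defline, sequence in records:
        key = sequence.upper()  # assumption: identity means exact match, ignoring case
        merged.setdefault(key, []).append(defline)
    return [(deflines, seq) for seq, deflines in merged.items()]


# Toy records with truncated sequences; not real database entries.
toy = [
    ("gb|AAC50285.1", "MSFVAGVIRRLDETVVNRIAAG"),
    ("ref|NP_000240.1", "MSFVAGVIRRLDETVVNRIAAG"),
    ("pdb|1B63|A", "SHMPIQVLPPQLANQIAAGEVV"),
]
nr = collapse_identical(toy)
print(len(toy), "records ->", len(nr), "non-redundant entries")
```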
  GenPept: GenBank CDS translations

FEATURES             Location/Qualifiers
     source          1..2484
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /chromosome="3"
                     /map="3p22-p23"
     gene            1..2484
                     /gene="MLH1"
     CDS             22..2292
                     /gene="MLH1"
                     /note="homolog of S. cerevisiae PMS1 (Swiss-Prot Accession
                     Number P14242), S. cerevisiae MLH1 (GenBank Accession
                     Number U07187), E. coli MUTL (Swiss-Prot Accession Number
                     P23367), Salmonella typhimurium MUTL (Swiss-Prot Accession
                     Number P14161) and Streptococcus pneumoniae (Swiss-Prot
                     Accession Number P14160)"
                     /codon_start=1
                     /product="DNA mismatch repair protein homolog"
                     /protein_id="AAC50285.1"
                     /db_xref="GI:463989"
                     /translation="MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKS
                     TSIQVIVKEGGLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGE
                     ALASISHVAHVTITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIA
                     TRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRS
                     ...

GenPept entry for this CDS translation:
>gi|463989|gb|AAC50285.1| DNA mismatch repair prote...
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIV...
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTAD...
                 Redundant Proteins

20 Proteins

GenPept
>gi|463989|gb|AAC50285.1| DNA mismatch repair prote...
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIV...
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTAD...

>gi|13905126|gb|AAH06850.1| MutL protein homolog 1 ...
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIV...
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTAD...

>gi|1079787|gb|AAA82079.1| DNA mismatch repair prot...
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIV...
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTAD...

NCBI RefSeq
>gi|4557757|ref|NP_000240.1| MutL protein homolog 1...
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIV...
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTAD...

Swiss-Prot
>gi|730028|sp|P40692|MLH1_HUMAN DNA mismatch repair...
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIV...
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTAD...

PRF
>gi|741682|prf||2007430A DNA mismatch repair protei...
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIV...
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTAD...

                          Etc.
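The deflines above share one layout: `gi|<GI number>|<source tag>|<identifier(s)>` followed by a description. The tag (gb, ref, sp, prf, pdb) names the source database. The following is a minimal Python sketch that splits such a defline. It assumes only the layouts shown on these slides and is not a complete parser for every NCBI identifier style. `parse_gi_defline` is a hypothetical helper.

```python
def parse_gi_defline(defline: str):
    """Split a gi-style FASTA defline into (gi, source tag, identifiers, description).

    Example input: '>gi|4557757|ref|NP_000240.1| MutL protein homolog 1'
    Empty fields (as in 'prf||2007430A') are dropped from the identifiers.
    """
    if not defline.startswith(">gi|"):
        raise ValueError(f"not a gi-style defline: {defline!r}")
    id_part, _, description = defline[1:].partition(" ")
    fields = id_part.split("|")
    gi, source = int(fields[1]), fields[2]
    identifiers = [f for f in fields[3:] if f]
    return gi, source, identifiers, description


if __name__ == "__main__":
    for line in (
        ">gi|4557757|ref|NP_000240.1| MutL protein homolog 1",
        ">gi|730028|sp|P40692|MLH1_HUMAN DNA mismatch repair",
        ">gi|741682|prf||2007430A DNA mismatch repair",
    ):
        print(parse_gi_defline(line))
```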
   Protein Sequences from Structures
>gi|5542073|pdb|1B63|A Chain A, Mutl Complexed With Adpnp
SHMPIQVLPPQLANQIAAGEVVERPASVVKELVENSLDAGATRIDIDIERGGAKLIRIRDNGCGIKKDEL
ALALARHATSKIASLDDLEAIISLGFRGEALASISSVSRLTLTSRTAEQQEAWQAYAEGRDMNVTVKPAA
HPVGTTLEVLDLFYNTPARRKFLRTEKTEFNHIDEIIRRIALARFDVTINLSHNGKIVRQYRAVPEGGQK
ERRLGAICGTAFLEQALAIEWQHGDLTLRGWVADPNHTTPALAEIQYCYVNGRMMRDRLINHAIRQACED
KLGADQQPAFVLYLEIDPHQVDVNVHPAKHEVRFHQSRLVHDFIYQGVLSVLQ
RefSeq:         NCBI’s Derivative Sequence Database
• Curated transcripts and proteins
   – reviewed
   – human, mouse, rat, fruit fly, zebrafish, Arabidopsis,
     microbial genomes (proteins), and more
• Model transcripts and proteins
• Assembled Genomic Regions (contigs)
   – human genome       – chicken
   – mouse genome       – honeybee
   – rat genome         – sea urchin
• Chromosome records
   – Human genome
   – microbial
   – organelle

 Entrez query: srcdb_refseq[Properties]
 ftp://ftp.ncbi.nih.gov/refseq/release/
             Genomes: Two Paths

• NCBI Eukaryotic Genomes
   – Since 1999
   – Map Viewer
   – UniGene
   – HomoloGene
   – Contigs, Transcripts and Proteins
• Microbial Genomes
• Outside Eukaryotic Genomes (Plants, Fungi)
   – Since 1993
   – Comparative Proteomics
      • Clusters of Orthologous Groups (COGs)
      • Protein Clusters
   – Chromosomes and Proteins
 Selected RefSeq Accession Numbers
mRNAs and Proteins
NM_123456            Curated mRNA
NP_123456            Curated Protein
NR_123456            Curated non-coding RNA
XM_123456            Predicted mRNA
XP_123456            Predicted Protein
XR_123456            Predicted non-coding RNA
Gene Records
NG_123456            Reference Genomic Sequence
Chromosome
NC_123456            Microbial replicons, organelle
                     genomes, human chromosomes
Assemblies
NT_123456            Contig
NW_123456            WGS Supercontig
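Because the two-letter prefix encodes the record type, software can tell a curated mRNA from a predicted one without fetching the record. Below is a small Python sketch built directly from the table above. `classify_refseq` is a hypothetical helper. It also splits off the `.version` suffix, which, as with GenBank accessions, tracks changes in the sequence.

```python
# Prefix meanings copied from the table above.
REFSEQ_PREFIXES = {
    "NM": "Curated mRNA",
    "NP": "Curated Protein",
    "NR": "Curated non-coding RNA",
    "XM": "Predicted mRNA",
    "XP": "Predicted Protein",
    "XR": "Predicted non-coding RNA",
    "NG": "Reference Genomic Sequence",
    "NC": "Chromosome (microbial replicons, organelle genomes, human chromosomes)",
    "NT": "Contig",
    "NW": "WGS Supercontig",
}


def classify_refseq(accession: str):
    """Return (record type, accession without version, version number or None)."""
    base, _, version = accession.partition(".")
    prefix, sep, number = base.partition("_")
    if sep != "_" or prefix not in REFSEQ_PREFIXES or not number.isdigit():
        raise ValueError(f"not a RefSeq accession from the table: {accession}")
    return REFSEQ_PREFIXES[prefix], base, int(version) if version else None


if __name__ == "__main__":
    for acc in ("NM_000249", "NP_000240.1", "NT_022517", "NC_000003"):
        print(acc, "->", classify_refseq(acc))
```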
              Two Paths to RefSeq

Human MLH1 sequences (NCBI annotated genomes and selected model organisms)
   GenBank mRNA:        U07343, AU127758, BC006850, ...
   GenBank genomic:     AC006583, AC011816, ...
   RefSeq mRNA:         NM_000249
   RefSeq contig:       NT_022517 (36974983..37032341)
   RefSeq chromosome:   NC_000003 (37009983..37067341)

Arabidopsis MLH1 sequences (submitted genomes and annotation)
   GenBank genomic:     AJ270058 ... AJ270060; AL161471, AL161472 ... AL161595, AL161596
   Submitted annotation: CAB78038 ...
   RefSeq chromosome:   NC_003075
   RefSeq products:     NM_116983 (transcript); protein
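The two human MLH1 intervals span the same length (end minus start = 57,358). This suggests that contig NT_022517 sits on chromosome 3 (NC_000003) at a fixed offset. The Python sketch below works through that coordinate arithmetic. It assumes both intervals cover the same region in the same orientation, which the slide does not state. `interval_offset` and `contig_to_chromosome` are hypothetical helpers.

```python
def interval_offset(contig, chromosome):
    """Offset mapping contig positions to chromosome positions.

    contig, chromosome: (start, end) pairs, 1-based and inclusive, assumed to
    describe the same region in the same orientation (orientation is not checked).
    """
    (c_start, c_end), (g_start, g_end) = contig, chromosome
    if c_end - c_start != g_end - g_start:
        raise ValueError("intervals differ in length; a single offset cannot map them")
    return g_start - c_start


def contig_to_chromosome(position, offset):
    """Translate one contig coordinate to chromosome coordinates."""
    return position + offset


# MLH1 intervals from the slide: NT_022517 and NC_000003.
offset = interval_offset((36974983, 37032341), (37009983, 37067341))
print(offset)  # 35000
```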
GenBank to RefSeq: NCBI Organisms
      RefSeqs: Annotation Reagents
[Diagram: RefSeq and GenBank sequences are used in scanning genomic DNA (NC, NT, NW) to produce model mRNA (XM), model non-coding RNA (XR) and model protein (XP), which are compared (=?) with curated mRNA (NM), curated non-coding RNA (NR) and curated protein (NP).]
                  RefSeq Benefits
•   Non-redundancy
•   Explicitly linked nucleotide and protein sequences
•   Updates to reflect current sequence data and biology
•   Data validation
•   Format consistency
•   Distinct accession series
•   Stewardship by NCBI staff and collaborators
Mouse Assembly
[Figure: mouse genome assembly view showing a RefSeq contig, BAC, RefSeq transcript, UniGene transcript and other GenBank sequences.]
Expressed Sequences

      UniGene
       GEO
NCBI Expressed Sequences
62,282,583 mRNA sequences
  60,705,055 GenBank
 (58,955,534 EST Division)
 1,575,789 Reference Sequences
         What is UniGene?
A gene-oriented view of sequence entries
• MegaBLAST-based automated sequence clustering
• Now informed by genome hits
• Nonredundant set of gene-oriented clusters
• Each cluster represents a unique gene
• Information on tissue types and map locations
• Includes known genes and uncharacterized ESTs
• Useful for gene discovery and selection of mapping reagents
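Clustering of this kind groups sequences transitively. If A matches B and B matches C, all three land in one cluster even when A and C do not overlap (single-linkage clustering). Below is a minimal Python sketch using a union-find structure. It assumes the pairwise matches have already been computed, for example by MegaBLAST. It leaves out quality screening and the genome-hit information mentioned above. The identifiers are made up.

```python
def single_linkage_clusters(ids, hits):
    """Group sequence IDs so that any chain of pairwise hits ends up in one cluster.

    ids:  sequence identifiers to cluster
    hits: (id_a, id_b) pairs judged similar by an aligner (assumed precomputed)
    """
    parent = {i: i for i in ids}

    def find(x):
        # Walk to the cluster representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical identifiers: one mRNA and four ESTs.
ids = ["mrna_1", "est_a", "est_b", "est_c", "est_d"]
hits = [("mrna_1", "est_a"), ("est_a", "est_b"), ("est_c", "est_d")]
clusters = single_linkage_clusters(ids, hits)
print(clusters)
```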
      EST hits: Human mRNA
[Figure: 5’ EST hits and 3’ EST hits aligned to a human thrombin mRNA.]
                  UniGene
[Figure: UniGene collections for chordates, plants, invertebrates, and fungi et al.]
Gene Catalog: Fathead Minnow MLH1 Cluster
[Screenshot callout: uncharacterized ESTs]
Associating Sequences: Human Thrombin
Expression Data
      Other NCBI Databases
•Structure:      imported structures (PDB)
                 Cn3D viewer, NCBI curation

•CDD:            conserved domain database
                 Protein families (COGs and KOGs)
                 Single domains (PFAM, SMART, CD)

•dbSNP:          nucleotide polymorphisms
•Gene:           gene records
                 Unifies LocusLink and Microbial Genomes

•HomoloGene: neighboring function for Gene
 MMDB:       Molecular Modeling Data Base
• Derived from experimentally determined PDB records
• Value added to PDB records including:
   – Addition of explicit chemical graph information
   – Validation (secondary structure elements)
   – Inclusion of taxonomy and citation information
   – Conversion to ASN.1 data description language
• Structure neighbors determined by
       Vector Alignment Search Tool (VAST)
Cn3D 4.1: Bacillus thuringiensis Toxin
          VAST: Structure Neighbors

                    Vector Alignment Search Tool

For each protein chain, locate SSEs (secondary structure elements),
represent them as individual vectors, and align the vectors.

[Figure: SSEs of human IL-4 numbered 1 to 6 and drawn as vectors; IL-4 and leptin aligned.]
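Reducing each SSE to a vector makes structures comparable with simple geometry, such as the angle between two SSEs. The Python sketch below shows that first step, assuming each SSE is given by hypothetical start and end coordinates. VAST itself goes on to compare the relative geometry of many SSE pairs between two structures and refine the alignment. None of that is reproduced here.

```python
import math


def sse_vector(start, end):
    """Represent an SSE by the 3-D vector from its start point to its end point."""
    return tuple(e - s for s, e in zip(start, end))


def angle_between(u, v):
    """Angle in degrees between two SSE vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norms))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cosine))


# Hypothetical SSE end points (in angstroms), not taken from a real structure.
helix_1 = sse_vector((0.0, 0.0, 0.0), (10.0, 0.0, 0.0))
helix_2 = sse_vector((0.0, 5.0, 0.0), (0.0, 5.0, 12.0))
strand = sse_vector((1.0, 1.0, 1.0), (4.0, 1.0, 1.0))
print(angle_between(helix_1, helix_2), angle_between(helix_1, strand))
```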
               Protein Domains
• Structural Domain
  – Discrete independently folding unit of a protein
• Conserved Domain (sequence-based)
  – Protein region with recognizable position-specific
    pattern of sequence conservation
• Sequence-based domains often roughly
  correspond to structural domains
• Domains often have distinct, identifiable
  functions
NCBI’s Conserved Domain Database
• PSI-BLAST-based position-specific score matrices
• Searchable with RPS-BLAST
• Sources
  – SMART
  – PFAM
  – COGs
  – NCBI curated domains
    • structure informed alignments
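A position-specific score matrix gives every column of a domain alignment its own residue scores. A sequence window is then scored by adding up the scores at each position. The Python sketch below illustrates this with invented scores for a three-column matrix and an ungapped scan. RPS-BLAST's real searches use full 20-residue matrices, gapped alignment and statistical significance, none of which is modeled here.

```python
# Toy matrix: one dict of residue scores per alignment column (scores invented).
TOY_PSSM = [
    {"G": 4, "A": 1},
    {"K": 5, "R": 2},
    {"S": 3, "T": 3},
]
UNLISTED_SCORE = -2  # assumed penalty for residues a column does not list


def score_window(window, pssm=TOY_PSSM):
    """Sum the column-specific score of each residue in an ungapped window."""
    if len(window) != len(pssm):
        raise ValueError("window length must equal the number of matrix columns")
    return sum(column.get(residue, UNLISTED_SCORE) for residue, column in zip(window, pssm))


def best_hit(sequence, pssm=TOY_PSSM):
    """Slide the matrix along the sequence and return (best score, 0-based start)."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the matrix")
    return max((score_window(sequence[i:i + width], pssm), i)
               for i in range(len(sequence) - width + 1))


print(best_hit("MAGKTL"))
```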
Src Domains
          Four 3D (structural) domains
          Three conserved domains
      Structure vs Conserved Domain
[Figure: Src structure in Cn3D compared with its conserved domains (SH3, SH2, TyrKC); conserved phosphotyrosine-binding residues are marked in SH2.]
             NCBI’s SNP Database
•   Primary Database and Derivative (RefSNP)
•   Single Nucleotide Polymorphism
•   Repeat polymorphisms
•   Insertion-Deletion Polymorphisms
•   29 Species
•   Over 46 million submissions (submitted SNPs)
•   Over 26 million reference SNPs
              The Gene Database
• Gene Centered Information
• Unifies NCBI-annotated and Submitted Genomes
• 4.6 million records for 5,588 taxa
  Human          40,286     Sea Urchin       30,412
  Chimpanzee     31,570     Mosquito         12,936
  Mouse          61,928     Fruit Fly        22,722
  Rat            37,087     C. elegans       21,185
  Dog            20,190     Fungi           355,726
  Cow            26,600     Green Plants    145,845
  Chicken        19,936     Archaea         120,103
  Zebrafish      37,460     Bacteria      2,685,548
           NCBI Molecular Biology
                 Resources

                Using Entrez




November 2008
WWW Access: Entrez & BLAST
     Entrez: Database Integration
[Diagram: Entrez database integration. PubMed abstracts (word-weight neighbors), Gene, HomoloGene, nucleotide sequences, protein sequences (BLAST neighbors: related sequences; BLink; Domains) and 3-D structures (VAST neighbors: related structures) are connected by hard links.]
The Links Menu:   Access to Neighbors and Links
[Screenshot: Links menu with links to SNP, GEO, Gene, PubMed and Protein.]
The Links Menu:   Access to Neighbors and Links
[Screenshot callouts: Neighbors: BLAST Link (pre-computed BLAST); Neighbors: pre-computed CDD search.]
The Links Menu:    Access to Neighbors and Links
[Screenshot callouts: Neighbors; Hard Links.]
Database Searching with Entrez
Using limits and field restriction to find human MutL homolog
Linking and neighboring with MutL
Mapping SNPs onto structure
        Global NCBI (Entrez) Search

Query: colon cancer
Global Entrez Search Results
 Nucleotide Sequences
 The Nucleotide database now has three parts:

•EST: expressed sequence tags
•GSS: genome survey sequences
•Nucleotide: everything else
Core Nucleotide Results with Gene Preview
[Screenshot callouts: Gene Preview; more relevant results; taxonomy filters.]
Advanced Search Options
[Screenshot callouts: tabs; taxonomy filter.]
More Precise Nucleotides Search
colon cancer[Title] AND nonpolyposis[Title] AND human[Organism]
AND biomol_mrna[Properties] AND srcdb_refseq[Properties]
                Useful Field Restrictions
[Title]: Definition line in GenBank / GenPept format shown in Summary format

    glyceraldehyde 3 phosphate dehydrogenase[Title]

[Organism]: NCBI’s taxonomy, the organizing system for the molecular databases

    mouse[organism]; green plants[organism]; Streptomyces coelicolor[organism]

[Properties]: molecule type, location, database source

    biomol_mrna[properties]; biomol_genomic[properties];
    gene_in_mitochondrion[properties]; srcdb_pdb[properties]


[Filter]: subsets of data, Entrez links

    all[filter]; nucleotide mapview[filter]; nucleotide omim[filter]
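Field-restricted terms follow a fixed `term[Field]` pattern and are joined by uppercase Boolean operators. Queries like the precise nucleotide search shown earlier can therefore be assembled from parts. The following is a minimal Python sketch that rebuilds that query. The helper names are hypothetical.

```python
def field_term(value: str, field: str) -> str:
    """Restrict one search term to an Entrez field, e.g. ('human', 'Organism') -> 'human[Organism]'."""
    return f"{value}[{field}]"


def all_of(*terms: str) -> str:
    """Join terms with AND, written in uppercase as in the examples above."""
    return " AND ".join(terms)


query = all_of(
    field_term("colon cancer", "Title"),
    field_term("nonpolyposis", "Title"),
    field_term("human", "Organism"),
    field_term("biomol_mrna", "Properties"),
    field_term("srcdb_refseq", "Properties"),
)
print(query)
```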
        Entrez Tip: Start Searches in Gene
[Screenshot callouts: BLink; other Entrez databases; HomoloGene: Gene neighbors.]
             Gene Results
nonpolyposis colon cancer AND human[Organism]
   Precise Results
MLH1[Gene Name] AND Human[Organism]



                        NCBI Taxonomy
Organism Field: NCBI’s Taxonomy
          All molecular databases
MLH1 Gene Record
MLH1 Gene Record: Interactions and GO
MLH1 Gene Record: Sequences
MLH1: Links to Sequence
Finding Protein Homologs
       BLink: BLAST Link
[Screenshot callouts: Gene; Protein.]
BLink: BLAST Link (Best Hits)
[Screenshot callouts: redundant proteins; tomato homolog; BLAST.]
Finding Polymorphisms
[Screenshot callouts: Protein links; Gene links.]
GeneView: Variations in Human MLH1
[Screenshot callout: ATPase domain]
MLH1 Structure Model and Mapping Polymorphisms
Related Structures: Structure Model
Sequence Similar Structures
[Screenshot callouts: Conserved Domain; link to structure; link to alignment.]
E. coli MutL Structure
[Screenshot callouts: Cn3D viewer; Conserved Domain.]
Alignment-Based Model: Mapping Polymorphisms
[Figure callouts: Mg2+ binding site; Ile-Val.]
 Better Model: Conserved Domain
[Screenshot callouts: Gene; Protein; Related Structures.]
Better Model: Conserved Domain
       Ile – Val     Mg2+ binding site
       Position 32
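Mapping a polymorphism starts with checking that the reference residue really sits at the stated position. The following is a minimal Python sketch. It assumes "position 32" is counted from the first residue of the human MLH1 protein shown in the GenPept record (AAC50285.1). In that sequence, residue 32 is indeed Ile. The slide may instead use a different numbering, such as that of the structure model. `apply_substitution` is a hypothetical helper.

```python
# First 44 residues of human MLH1 (AAC50285.1), copied from the /translation above.
MLH1_PREFIX = "MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKS"


def apply_substitution(protein: str, position: int, ref: str, alt: str) -> str:
    """Return protein with the residue at 1-based position changed from ref to alt."""
    index = position - 1  # convert 1-based biological numbering to 0-based indexing
    if not 0 <= index < len(protein):
        raise IndexError(f"position {position} is outside the sequence")
    if protein[index] != ref:
        raise ValueError(f"expected {ref} at position {position}, found {protein[index]}")
    return protein[:index] + alt + protein[index + 1:]


variant = apply_substitution(MLH1_PREFIX, 32, "I", "V")  # Ile -> Val at position 32
print(MLH1_PREFIX[28:35], "->", variant[28:35])
```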