NCBI_Entrez_14Januar by liamei12345

									NCBI Resources
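Entrez System

Entrez Databases: Tight Integration

[Figure: map of the Entrez databases and the links between them]
• Literature: PubMed (abstracts), PMC, Books, OMIM; PubMed records are linked by word weight (Related Articles)
• Sequences: Nucleotide Sequences and Protein Sequences; each is linked by BLAST to its neighbors (Related Sequences); Protein Sequences also link to BLink and Domains
• Structure: 3-D Structure, 3D domain, CDD; structures are linked by VAST to their neighbors (Related Structures)
• Genes and genomes: Gene, Genome, Genome Project, HomoloGene, UniGene, GEO, SNP, Cancer Chromosome
• Taxonomy (Phylogeny)
• PubChem
• Hard Link (between Nucleotide Sequences and Protein Sequences)
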
GenBank: NCBI’s Primary Sequence Database

• Nucleotide-only sequence database
• Archival in nature
   – Historical
   – Subjective: submitter’s view
   – Redundant
• GenBank data sources
   – Direct submissions
   – Batch: EST, GSS, STS
   – ftp upload (genome data)
• Three collaborating databases
   – GenBank (77.6%)
   – DDBJ (14.1%)
   – EMBL (8.3%)

GenBank divisions:
   PRI   primate
   PLN   plant and fungal
   BCT   bacterial and archaeal
   INV   invertebrate
   ROD   rodent
   VRL   viral
   VRT   other vertebrates
   MAM   mammalian
   PHG   phage
   SYN   synthetic (cloning vectors)
   UNA   un-annotated
   EST   Expressed Sequence Tag
   GSS   Genome Survey Sequence
   HTG   High Throughput Genomic
   STS   Sequence Tagged Site
   HTC   High Throughput cDNA
   PAT   Patent

RefSeq: NCBI’s Derivative Sequence Database

• Curated transcripts and proteins (NM_, NP_)
     – Reviewed
     – Most model organisms and more
• Model transcripts and proteins (XM_, XP_)
• Assembled genomic regions (contigs) (NT_, NW_)
     – Higher genomes
• Chromosome records (NC_)
     – Higher genomes
     – Microbial genomes
     – Organelle genomes

Entrez filter for RefSeq records: srcdb_refseq[Properties]
Accession key: http://www.ncbi.nlm.nih.gov/RefSeq/key.html

Benefits:
•   Non-redundant
•   Explicitly linked nucleotide and protein sequences
•   Regular updates
•   Validated data
•   Consistent format
•   Distinctive accession series
•   Stewardship by NCBI staff and collaborators

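The accession prefixes listed on this slide can be turned into a small lookup. The following is a minimal sketch, not an NCBI tool: the function name `classify_refseq` is made up for illustration, the table covers only the prefixes named above (RefSeq uses others as well), and the accessions in the demo loop are placeholders rather than real records.

```python
# Prefix table copied from the RefSeq slide; RefSeq has further prefixes
# that are deliberately not covered in this sketch.
REFSEQ_PREFIXES = {
    "NM_": "curated transcript",
    "NP_": "curated protein",
    "XM_": "model transcript",
    "XP_": "model protein",
    "NT_": "assembled genomic region (contig)",
    "NW_": "assembled genomic region (contig)",
    "NC_": "chromosome record",
}


def classify_refseq(accession: str) -> str:
    """Return the record type implied by the first three characters of an accession."""
    prefix = accession.strip().upper()[:3]
    return REFSEQ_PREFIXES.get(prefix, "unknown (prefix not in the table)")


if __name__ == "__main__":
    # Placeholder accessions, used only to exercise the lookup.
    for acc in ("NM_000000", "XP_000000", "NC_000000", "AB000000"):
        print(acc, "->", classify_refseq(acc))
```

An accession without an underscore-style prefix, such as a plain GenBank accession, falls through to the "unknown" branch, which reflects the "distinctive accession series" benefit listed above.
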
Sequence Record: different formats

Sequence records are stored in ASN.1 format.
They are translated “on the fly” to other formats
upon request.

  GenBank/GenPept    useful for scientists
  FASTA              the simplest format
  ASN.1              useful for programmers
  XML                useful for programmers

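To show why FASTA counts as the simplest of these formats, here is a minimal reader sketch. It assumes the usual convention of a `>` title line followed by one or more sequence lines; the function name `parse_fasta` and the sample record are made up for illustration, and a production parser would need more error handling.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect FASTA records into a {title: sequence} dictionary."""
    chunks: Dict[str, List[str]] = {}
    title = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            title = line[1:].strip()  # a new record starts at each '>' line
            chunks[title] = []
        elif title is not None:
            chunks[title].append(line)  # sequence may wrap over several lines
    return {name: "".join(parts) for name, parts in chunks.items()}


if __name__ == "__main__":
    # Made-up record, only to demonstrate the line-joining behaviour.
    sample = ">example_record demo\nACGT\nTTGA\n"
    print(parse_fasta(sample.splitlines()))
```
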
Other Entrez Databases

• Gene:         Gene-centric records

• PubMed:       Biomedical literature

• OMIM:         Genetic disorders/phenotypes

• Structure:    Structures (PDB)
                Cn3D viewer, NCBI curation

• CDD:          Conserved Domain Database
                Protein families (COGs)
                Single domains (PFAM, SMART, CD)

• SNP:          Single Nucleotide Polymorphism

Using the Entrez System

Global Entrez: Overview of the database landscape

[Figure: Global Entrez results for the queries All [filter] and Human [organism], as of April 13, 2007]

Entrez Databases: Search with Boolean Operator Connected Text Terms

[Figure: Venn diagrams of two result sets, A and B, illustrating OR, AND, and NOT]

Query terms are searched against “All Databases” by default

Terms are combined with Boolean AND by default

Other operators (OR, NOT) must be specified explicitly

A forced phrase search must be quoted

A field limiter makes the search more specific
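
The rules above (explicit operators, quoted phrases, field limiters) can be sketched as a few string helpers. This is an illustrative sketch only: the helper names `phrase`, `limit`, and `combine` are invented here, and the code just assembles query text; it does not contact Entrez.

```python
_OPERATORS = {"AND", "OR", "NOT"}


def phrase(text: str) -> str:
    """Quote a multi-word term to force a phrase search."""
    return f'"{text}"'


def limit(term: str, field: str) -> str:
    """Attach a field limiter, e.g. limit('human', 'orgn') gives human[orgn]."""
    return f"{term}[{field}]"


def combine(operator: str, *terms: str) -> str:
    """Join terms with an explicit Boolean operator, wrapped in parentheses."""
    op = operator.upper()
    if op not in _OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    if len(terms) < 2:
        raise ValueError("need at least two terms to combine")
    return "(" + f" {op} ".join(terms) + ")"


if __name__ == "__main__":
    query = combine("AND", phrase("phenylalanine hydroxylase"), limit("human", "orgn"))
    print(query)  # ("phenylalanine hydroxylase" AND human[orgn])
```

Wrapping each combination in parentheses keeps nested expressions unambiguous when AND, OR, and NOT are mixed.
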
Entrez Databases: Literature

   PubMed: mostly abstracts, from over 50K journals
   PMC: full-text articles, from close to 300 journals
   Books: indexed book chapters from over 40 titles, with
     over 150K entries
   OMIM: over 17K human genetic disorders/phenotypes

   Suggestions:
     - Get more specific records using field limiters such as
         [au], [ptyp], [mesh], [pdat], and [title]
     - Explore the field limiters using the “Preview/Index” tab
     - Combine different sets of results using the “History” tab
     - See the term mapping using the “Details” tab

Example: search PubMed for nematode microarray, open the first hit, follow the Books link, then RNAi, the first book, and its first section
Entrez Literature Databases: Sample search

PubMed to PMC with Free PDF

•   Search with restriction-modification system

•   Refine it by adding AND regulation

•   Find specific articles by adding: AND Blumenthal RM[au]

•   Click “Free in PMC”, what do we get there?

•   Click “Cited in PMC” to see the other papers
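
The same search can be expressed as a request URL for NCBI's E-utilities ESearch service, which the slides themselves do not cover. The sketch below only builds the URL and performs no network call; the helper name `esearch_url` and the `retmax` value are choices made for this example.

```python
from urllib.parse import urlencode

# ESearch endpoint of the NCBI E-utilities (not mentioned on the slide;
# included here as an assumption about how one might script the search).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for the given database and query term."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


if __name__ == "__main__":
    # The three refinement steps from the slide, AND-ed together.
    steps = ["restriction-modification system", "regulation", "Blumenthal RM[au]"]
    term = " AND ".join(steps)
    print(esearch_url("pubmed", term))
```

`urlencode` takes care of escaping the spaces and the square brackets of the `[au]` limiter, which is easy to get wrong when building such URLs by hand.
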
Entrez Databases: sequence, gene, disease

Search for sequences:

•   Search with phenylalanine hydroxylase

•   Click the RefSeq tab and the tack to get only RefSeq entries

•   Click the mRNAs tab and the tack to get only mRNA entries

•   Use a field limiter to get only human entries (AND human[orgn])

•   Get the Gene record through Links

•   Get the genetic disorder information in OMIM through Links
NCBI Tools

•   BLAST
•   Cn3D
•   Splign, Spidey
•   MapViewer
•   AceView
•   etc.

BLAST: a sequence comparison tool used to:
   – Identify a sequence
   – Cluster sequences from the same gene
   – Find related or similar sequences
   – Map an mRNA to its genomic counterpart
   – Verify a primer annealing site
   – Identify a functional domain
BLAST: identify unknown sequence

PCR fragment amplified from a clinical sample

Search:
Copy the sequence
Go to www.ncbi.nih.gov/BLAST/
Select “blastn”
Paste the sequence in the first large input box
Change database to nt by clicking “others”
Click “BLAST!” button to search

Get Result:
The page will automatically check for the result

Analyze:
Examine the matching sequences returned to
determine the identification/source

Preserved search result under:
1158860095-5488-172146079962.BLASTQ1

>Unknown RT-PCR Products
AGGTCGGCCACGCCACTCGCGGGTGGGCTCGTGTTACAGCACACCAGCCCGTTCTTTTCCCCCCCTCCCA
CCCTTAGTCAGACTCTGTTACTTACCCGTCCGACCACCAACTGCCCCCTTATCTAAGGGCCGGCTGGAAG
ACCGCCAGGGGGTCGGCCGGTGTCGCTGTAACCCCCCACGCCAATGACCCACGTACTCCAAGAAGGCATG
TGTCCCACCCCGCCTGTGTTTTTGTGCCTGGCTCTCTATGCTTGGGTCTTACTGCCTGGGGGGGGGGAGT
GCGGGGGAGGGGGGGTGTGGAAGGAAATGCACGGCGCGTGTGTACCCCCCCTAAAGTTGTTCCTAAAGCG
AGGATACGGAGGAGTGGCGGGTGCCGGGGGACCGGGGTGATCTCTGGCACGCGGGGGTGGGAAGGGTCGG
GGGAGGGGGGGATGGAGTACCGGCCCACCTGGCCGCGCGGGTGCGCGTGCCTTTGCACACCAACCCCACG
TCCCCCGGCGGTCTCTAAGAAGCACCGCCCCCCCTCCTTCATACCACCGAGCATGCCTGGGTGTGGGTTG
GTAACCAACACGCCCATCCCCTCGTCTCCTGTGATTCTCTGGCTGCACCGCATTCTTGTTTTCTAACTAT
GTTCCTGTTTCTGTCTCCCCCCCCCCCACCCCTCCGCCCCACCCCCCAACACCCACGTCTGTGGTGTGGC
CGACCCCCTTTTGGGCGCCCCGTCCCGCCCCGCCACCCCTCCCATCCTTTGTTGCCCTATAGTGTAGTTA
ACCCCCCCCGCCCTTTGTGGCGGCCAGAGGCCAGGTCAGTCCGGGCGGGCAGGCGCTCGCGGAAACTTAA
CACCCACACCCAACCCACTGTGGTTCTGGCTCCATGCCAGTGGCAGGATGCTTTCGGGGATCGGTGGTCA
GGCAGCCCGGGCCGCGGCTCTGTGGTTAACACCAGAGCCTGCCCAACATGGCACCCCCACTCCCACGCAC
CCCCACTCCCACGCACCCCCACTCCCACGCACCCCCACTCCCACGCACCCCCACTCCCACGCACCCCCAC
TCCCACGCACCCCCACTCCCACGCACCCCCACTCCCACGCACCCCCACTCCCACGCATCCCCGCGATACA
TCCAACACAGACAGGGAAAAGATACAAAAGTAAACCTTTATTTCCCAACAGACAGCAAAAATCCCCTGAG
TTTTTTTTTATTAGGGCCAACACAAAAGACCCGCTGGTGTGTGGTGCCCGTGTCTTTCACTTTTCCCCTC
CCCGACACGGATTGGCTGGTGTAGTGGGCGCGGCCAGAGACCACCCAGCGCCCGACCCCCCCCTCCCCAC
AAACACGGGGGGCGTCCCTTATTGTTTTCCCTCGTCCCGGGTCGACGCCCCCTGCTCCCCGGACCACGGG
TGCCGAGACCGCAGGCTGCGGAAGTCCAGGGCGCCCACTAGGGTGCCCTGGTCGAACAGCATGTTCCCCA
CGGGGGTCATCCAGAGGCTGTTCCACTCCGACGCGGGGGCCGTCGGGTACTCGGGGGGCATCACGTGGTT
ACCCGCGGTCTCGGGGAGCAGGGTGCGGCGGCTCCAGCCGGGGACCGCGGCCCGCAGCCGGGTCGCCATG
TTTCCCGTCTGGTCCACCAGGACCACGTACGCCCCGATGTTCCCCGTCTCCATGTCCAGGATGGGCAGGC
AGTCCCCCGTGATAGTCTTGTTCACGTAAGGCGACAGGGCGACCACGCTAGAGACCCCCGAGATGGGCAG
GTAGCGCGTGAGGCCGCCCGCGGGGACGGCCCCGGAAGTCTCCGCGTGGCGCGTCTTCCGGGCACACTTC

Source sequence: >gi|9629378:c6907-4953
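
The source locator on this slide packs a GI number and a coordinate range into one string. As a hedged illustration, the sketch below splits it into parts, assuming (as is the usual NCBI convention) that the `c` before the coordinates marks the complementary strand. The class and function names are invented for this example.

```python
import re
from typing import NamedTuple


class Locator(NamedTuple):
    gi: int
    complement: bool  # True when the range is prefixed with 'c'
    start: int
    stop: int

    @property
    def length(self) -> int:
        """Number of bases spanned by the range, inclusive of both ends."""
        return abs(self.stop - self.start) + 1


# Matches strings of the form gi|<number>:[c]<start>-<stop>, with optional '>'.
_PATTERN = re.compile(r"^>?gi\|(\d+):(c?)(\d+)-(\d+)$")


def parse_locator(text: str) -> Locator:
    """Parse a 'gi|N:cSTART-STOP' style locator into its components."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized locator: {text!r}")
    gi, comp, start, stop = match.groups()
    return Locator(int(gi), comp == "c", int(start), int(stop))


if __name__ == "__main__":
    loc = parse_locator(">gi|9629378:c6907-4953")
    print(loc, "length:", loc.length)
```
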
BLAST: interpreting the result

The matches are all from Herpes Simplex Virus,
  strongly indicating the query’s viral origin.

The patient will be diagnosed as HSV-infected, even
  though he/she shows no sign of infection.

Biology:
  Once a host is infected, the virus stays with it for life and
  periodically reemerges (cold sores). During the latent
  stage, the viral genome is kept as an episome. The
  LAT region of the genome is transcribed during this
  stage. The presence of this transcript indicates HSV
  infection.
BLAST: mapping primer to the human genome

Upstream primer      5´-CCATGGCGACCCTGGAAAAGC-3´
Downstream primer    5´-CAGCAGCGGCTGTGCCTGCGG-3´

Convert the input primers to a suitable format:
>Primer Pair
CCATGGCGACCCTGGAAAAGC
NNNNNNNNNNNNNNNNNNNNN
CAGCAGCGGCTGTGCCTGCGG

 Go to this page:
 http://www.ncbi.nlm.nih.gov/genome/seq/BlastGen/BlastGen.cgi?taxid=9606

 Adjustments needed:
 - Paste in the converted primers
 - Change database to: genome (reference only)
 - Change program to blastn
 - Change filter to “None”
 - Increase Expect to 10
 - Hit submit to search

 See the adjusted page with input and the saved result for this search.

 Information from such a search:
  - Find the genomic location
  - Check for secondary annealing sites
  - Identify the target gene
  - See the amplicon sequence and features
  - Link to other resources with additional info
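
The primer conversion step can be sketched in code: both primers are stripped of their 5´/3´ end markers and joined, one per line, around a run of N characters under a single FASTA title. The helper names are invented here, and the spacer length of 21 is simply read off the slide; treat both as assumptions of this sketch.

```python
import re


def clean_primer(text: str) -> str:
    """Keep only nucleotide letters, dropping end markers, dashes, and spaces."""
    seq = re.sub(r"[^ACGTNacgtn]", "", text).upper()
    if not seq:
        raise ValueError(f"no nucleotide letters found in {text!r}")
    return seq


def format_primer_pair(upstream: str, downstream: str,
                       spacer_len: int = 21, title: str = "Primer Pair") -> str:
    """Return the primers as one FASTA record separated by a line of Ns."""
    lines = [
        f">{title}",
        clean_primer(upstream),
        "N" * spacer_len,  # spacer line between the two primers
        clean_primer(downstream),
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_primer_pair("5´-CCATGGCGACCCTGGAAAAGC-3´",
                             "5´-CAGCAGCGGCTGTGCCTGCGG-3´"))
```

Note that `clean_primer` silently discards any character that is not A, C, G, T, or N, which is convenient for the slide's notation but would hide typos in real input.
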
BLAST: sample domain search

Search:
Copy the sequence
Go to http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi
Paste the sequence in “Query” box
Click “Submit Query” button to get the result

Analysis:
See the cartoon to get the summary
Click on cartoon of interest to see details

>Transmembrane Protein
MLKIITRQLFARLNRHLPYRLVHRDPLPGAQTAVNATIPPSLSERCLKVAAMEQETLWRVFDTHPEGLNA
AEVTRAREKHGENRLPAQKPSPWWVHLWVCYRNPFNILLTILGGISYATEDLFAAGVIALMVGISTLLNF
VQEARSTKAADALKAMVSNTATVLRVINENGENAWLELPIDQLVPGDIIKLAAGDMIPADLRIIQARDLF
VAQASLTGESLPVEKVAATREPRQNNPLECDTLCFMGTNVVSGTAQAVVMATGAGTWFGQLAGRVSEQDN
EQNAFQKGISRVSMLLIRFMLVMAPVVLIINGYTKGDWWEAALFALSVAVGLTPEMLPMIVTSTLARGAV
KLSKQKVIVKHLDAIQNFGAMDILCTDKTGTLTQDKIVLENHTDISGKPSEHVLHCAWLNSHYQTGLKNL
LDTAVLEGVDETAARQLSGRWQKIDEIPFDFERRRMSVVVAEDSNVHQLVCKGALQEILNVCTQVRHNGD
IVPLDDNMLRRVKRVTDTLNRQGLRVVAVATKYLPAREGDYQRIDESDLILEGYIAFLDPPKETTAPALK
ALKASGITVKILTGDSELVAAKVCHEVGLDAGDVIIGSDIEGLSDDALAALAARTTLFARLTPMHKERIV
TLLKREGHVVGFMGDGINDAPALRAADIGISVDGAVDIAREAADIILLEKSLMVLEEGVIEGRRTFSNML
KYIKMTASSNFGNVFSVLVASAFLPFLPMLPLHLLIQNLLYDVSQVAIPFDNVDEEQIQKPQRWNPADLG
RFMVFFGPISSIFDILTFCLMWWVFHANTPETQTLFQSGWFVVGLLSQTLIVHMIRTRRLPFIQSRAAWP
LMAMTLLVMVVGVSLPFSPLASYLQLQALPLSYFPWLIAILVGYMTLTQLVKGFYSRRYGWQ

Source sequence: gi|543864|sp|P36640|ATMA_SALTY
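
The identifier line for this protein is a pipe-delimited record. A minimal sketch of splitting it into named parts follows; the function name and dictionary keys are invented for illustration, and the layout assumed is `gi|<number>|<database>|<accession>|<entry name>`, where `sp` stands for Swiss-Prot.

```python
from typing import Dict, Union


def parse_defline(defline: str) -> Dict[str, Union[int, str]]:
    """Split a 'gi|number|db|accession|name' identifier into labelled fields."""
    fields = defline.strip().lstrip(">").split("|")
    if len(fields) < 5 or fields[0] != "gi":
        raise ValueError(f"unexpected identifier layout: {defline!r}")
    return {
        "gi": int(fields[1]),
        "database": fields[2],   # 'sp' here, i.e. Swiss-Prot
        "accession": fields[3],
        "entry_name": fields[4],
    }


if __name__ == "__main__":
    print(parse_defline("gi|543864|sp|P36640|ATMA_SALTY"))
```
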
BLAST: CDD search result analysis

See the actual result.

• Hydrolase domain (partial match)
     – Click the cartoon to go to the domain record
     – Click on “Show Structure”
     – This launches Cn3D to display the structure with the sequence alignment

• MgtA domain (first match to the “whole” query)
     – Covers the whole domain
     – Highly conserved, broken into segments (with 10 membrane domains)


It is important to master the power of this interlinked collection of resources,
which may allow you to see that hidden pot of virtual gold ...
Other Databases & Tools

Databases:
• Small molecules: PubChem Compound, Substance, BioAssay
• Expression: GEO dataset, GEO profile
• Large-scale genetic studies: dbGaP
• SNP: single nucleotide polymorphism, haplotype, genotype
• Probe: probes, RNAi, primers

Tools:
• Splign (exon/intron boundary mapping)
• Genome Workbench
Outside Tools: complementing NCBI Resources

•   Tools for membrane domain, signal peptide prediction:
     – http://www.ch.embnet.org/software/TMPRED_form.html
     – http://www.cbs.dtu.dk/services/SignalP/

•   Tools for multiple sequence alignment:
     – http://bioinfo.genopole-toulouse.prd.fr/multalin/multalin.html
     – http://www.drive5.com/muscle/
     – http://www.ebi.ac.uk/clustalw/

•   Tools for transcription factor binding site identification:
     – http://www.cbil.upenn.edu/tess/
     – http://motif.genome.jp/

•   Tools for general sequence manipulation (Sequence Manipulation Suite):
     – http://bioinformatics.org/sms2/

•   Tools for restriction mapping:
     – http://rna.lundberg.gu.se/cutter2/
     – http://tools.neb.com/NEBcutter2/index.php
Sources of Help When Needed

NCBI User Service
 info@ncbi.nlm.nih.gov
 blast-help@ncbi.nlm.nih.gov
 (301)496-2475

Help Session for NIH staff
 http://www.ncbi.nlm.nih.gov/staff/tao/

								