					                                  NCBI Discovery Workshops
            NCBI BLAST Services




September 30, 2009
               Using BLAST




• Basics of using NCBI BLAST
• Using the new Interface
  – Improved organism and filter options
• New Services
  – Primer BLAST
  – Align 2 Sequences Integration
  – COBALT – protein multiple alignment
  – Global Alignment tool
Basic Local Alignment Search Tool




•   Widely used similarity search tool
•   Heuristic approach based on the Smith-Waterman algorithm
•   Finds best local alignments
•   Provides statistical significance
•   Supports all combinations of DNA and protein query and database:
    –   DNA vs DNA
    –   DNA translation vs Protein
    –   Protein vs Protein
    –   Protein vs DNA translation
    –   DNA translation vs DNA translation
• Available as web (www), standalone, and network client versions
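
As a point of reference for "best local alignments", the following is a minimal sketch of the Smith-Waterman scoring recurrence that BLAST approximates heuristically. It is not BLAST's own code; the function name and the match, mismatch, and gap values are illustrative assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a and b (linear gap penalty).

    The scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 is what makes the alignment local: a poor
            # prefix is dropped instead of dragging the score down.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTTTACGTTTTT", "GGGACGTGGG"))
```

The full dynamic program visits every cell of the query-by-subject matrix, which is why a heuristic is used for database-scale searches.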
                      Local Alignment Statistics




  High scores of local alignments between two random sequences
  follow the Extreme Value Distribution

  [Figure: Alignments (y-axis) versus Score (x-axis), with labels "size of database", "your score", and "expected number of random hits"]

  Expect Value
  E = number of database hits you expect to find by chance

  $E = K m n e^{-\lambda S}$ or $E = m n 2^{-S'}$

  K = scale for search space
  λ = scale for scoring system
  S' = bitscore = $(\lambda S - \ln K)/\ln 2$
  (applies to ungapped alignments)
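
The two forms of the Expect value can be checked numerically. The sketch below assumes the usual reading of the symbols (m as query length, n as database length, λ as the scoring-system scale) and uses illustrative values for λ and K, roughly those quoted for ungapped BLOSUM62 scoring. All function names and numbers are assumptions for illustration; a real search reports its own parameters.

```python
import math

# Assumed, illustrative Karlin-Altschul parameters.
LAMBDA = 0.3176
K = 0.134


def bit_score(raw_score, lam=LAMBDA, k=K):
    """S' = (lambda * S - ln K) / ln 2"""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_from_raw(raw_score, m, n, lam=LAMBDA, k=K):
    """E = K * m * n * exp(-lambda * S)"""
    return k * m * n * math.exp(-lam * raw_score)


def expect_from_bits(bits, m, n):
    """E = m * n * 2 ** (-S')"""
    return m * n * 2.0 ** (-bits)


if __name__ == "__main__":
    m, n = 250, 5_000_000  # hypothetical query and database lengths
    for s in (40, 60, 80):
        bits = bit_score(s)
        print(s, round(bits, 1), expect_from_raw(s, m, n), expect_from_bits(bits, m, n))
```

Both routes give the same E because the bit score folds λ and K into the score itself, which makes bit scores comparable across scoring systems.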
BLAST and BLAST-like programs




•   Traditional BLAST (formerly blastall) nucleotide, protein, translations
     – blastn nucleotide query vs. nucleotide database
     – blastp protein query vs. protein database
     – blastx nucleotide query vs. protein database
     – tblastn protein query vs. translated nucleotide database
     – tblastx translated query vs. translated database
•   Megablast nucleotide only
     – Contiguous megablast
         • Nearly identical sequences
     – Discontiguous megablast
         • Cross-species comparison
•   Position Specific BLAST Programs protein only
     – Position Specific Iterative BLAST (PSI-BLAST)
         • Automatically generates a position specific score matrix (PSSM)
     – Reverse PSI-BLAST (RPS-BLAST)
         • Searches a database of PSI-BLAST PSSMs
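
The query and database combinations listed for traditional BLAST can be written as a small lookup. This is only a sketch of the mapping on this slide; the function name and the `compare_as_protein` flag are hypothetical and not part of any NCBI tool.

```python
def choose_program(query_type, db_type, compare_as_protein=False):
    """Pick a traditional BLAST program from the query and database types.

    query_type and db_type are "nucleotide" or "protein". For a nucleotide
    query against a nucleotide database, compare_as_protein selects the
    translated comparison (tblastx) instead of blastn.
    """
    table = {
        ("nucleotide", "nucleotide"): "tblastx" if compare_as_protein else "blastn",
        ("protein", "protein"): "blastp",
        ("nucleotide", "protein"): "blastx",
        ("protein", "nucleotide"): "tblastn",
    }
    key = (query_type.lower(), db_type.lower())
    if key not in table:
        raise ValueError(f"unsupported combination: {key}")
    return table[key]


if __name__ == "__main__":
    print(choose_program("nucleotide", "protein"))
```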
Web Access: www.ncbi.nlm.nih.gov




                New Homepage
The BLAST homepage: http://blast.ncbi.nlm.nih.gov/
                   Basic BLAST: Databases
     Non-redundant protein




nr (non-redundant protein sequences)
   – GenBank CDS translations
   – NP_, XP_ refseq_protein
   – Outside Protein
      • PIR, Swiss-Prot, PRF
      • PDB (sequences from structures)
pat protein patents
env_nr environmental samples

Services
   blastp
   blastx
Nucleotide Databases: Human and Mouse




                    Megablast, blastn service




 • Human and mouse genomic and transcript databases are now the default
 • Separate sections in output for mRNA and genomic
 • Direct links to Map Viewer for genomic sequences
Nucleotide Databases: Traditional




                               Services
                               blastn
                               tblastn
                               tblastx
Nucleotide Databases: Traditional




              Databases are mostly non-overlapping

 • nr (nt)
    – Traditional GenBank
    – NM_ and XM_ RefSeqs
       • refseq_rna
 • NCBI Genomes
    – NC_ RefSeqs
    – GenBank Chromosomes
 • dbest
    – EST Division
       • non-human, non-mouse ests
 • htgs
    – HTG division
 • gss
    – GSS division
 • wgs
    – whole genome shotgun contigs
 • env_nt
    – environmental samples
Universal Form: Protein
       Universal Form: Nucleotide




[Figure: opposing scales; Speed runs from Less to More as Sensitivity runs from More to Less]
Discovery option: Entrez protein record




Discovery Column
 • Analysis Tools
 • PubMed Citations
 • Identical Proteins
 • Reference Sequences
 • Gene Record
 • HomoloGene Cluster
BLAST extensions and improvements




 • PrimerBlast – primer designer / specificity
   checker
 • COBALT – Protein Multiple Alignment tool
 • Integration / expansion of BLAST 2 Sequences
 • Global Alignment (Needleman-Wunsch)
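
For contrast with local alignment, here is a minimal sketch of Needleman-Wunsch global alignment scoring. It is not the NCBI tool's implementation; the function name and the scoring values are illustrative assumptions.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of a and b (linear gaps).

    Scoring values are illustrative assumptions.
    """
    # First row: aligning a prefix of b against nothing costs one gap per base.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # No floor at 0: every base of both sequences must be accounted for.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

Because the score is read from the final cell and the borders carry gap penalties, the alignment is forced to run end to end, unlike a local alignment.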
Specialized BLAST Pages
        Hands-on Practice: Goals




• Select the appropriate BLAST database and program to
  get the most relevant results
• Use taxonomic / organism limit on the BLAST database to
  obtain specific results
• Map a sequence onto an assembled genome using
  BLAST
• Tune BLAST parameters for specific kinds of searches
• Design PCR primers for a specific template and check
  specificity
• Directly compare two sets of protein sequences and
  generate a multiple-alignment using link to COBALT
                           Protein BLAST




Guided
• Query: human brain-type creatine kinase, NP_001814
• Program: blastp
• Database: refseq_protein
• Goals:
    – Identify members of this protein family in mammals.
    – Use taxonomy report, formatting options, TreeView, and links to explore results.
Independent
• Query: human tyrosine hydroxylase, NP_954986
• Program: blastp
• Database: refseq_protein
• Goals:
    – Identify members of the aromatic amino acid hydroxylase family in mammals and other
      groups
    – Use taxonomy report, formatting options, TreeView, and links to explore results.
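
The guided search can also be described as a set of request parameters. The sketch below only builds a submission URL in the style of the NCBI BLAST URL API and does not send it. The helper name is hypothetical, and the organism limit `Mammalia[Organism]` is an assumed Entrez-style filter chosen for illustration.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_request(query, program="blastp", database="refseq_protein",
                      entrez_query=None):
    """Build (but do not send) a search-submission URL.

    The helper itself is hypothetical; the parameter names follow the
    BLAST URL API's CMD=Put request.
    """
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query,
    }
    if entrez_query:
        # Optional organism or taxonomic limit on the database.
        params["ENTREZ_QUERY"] = entrez_query
    return BLAST_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Guided exercise: human brain-type creatine kinase vs. refseq_protein.
    print(build_put_request("NP_001814", entrez_query="Mammalia[Organism]"))
```

Swapping the query for NP_954986 gives the corresponding request for the independent exercise.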
                      Nucleotide BLAST




Guided
Query: Macaque CDC20 mRNA, AB168636
Program: nucleotide BLAST page with various algorithms
Database: human G+T, refseq_genomic (limit to "marmosets and tamarins")
Goals:
    Identify / map mRNA onto genomes
    Compare speed and sensitivity of blastn and megablast
    Demonstrate sorting options in BLAST results
    Show formatting options (CDS Features)
Independent
Query: Macaque CDC20 mRNA, AB168636
Services: Basic BLAST, mouse G+T database; chimpanzee genome BLAST, Macaque
   Genome BLAST
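
One way to build intuition for the speed and sensitivity difference between blastn and megablast is to count exact word matches at different word sizes. The sketch below assumes word sizes of 11 for blastn and 28 for megablast (commonly cited defaults, so check the settings on the search page). The function name and the sequences are made up for illustration, and real BLAST seeding is more involved than this.

```python
def shared_words(query, subject, word_size):
    """Count query positions whose exact word of length word_size occurs in subject."""
    subject_words = {
        subject[i:i + word_size]
        for i in range(len(subject) - word_size + 1)
    }
    return sum(
        1
        for i in range(len(query) - word_size + 1)
        if query[i:i + word_size] in subject_words
    )


if __name__ == "__main__":
    # A 20-base identical core flanked by unrelated sequence (made-up data).
    core = "GATTACAGGCTCAGTCCATG"
    query = "A" * 10 + core + "A" * 10
    subject = "C" * 10 + core + "C" * 10
    for w in (11, 28):  # assumed blastn and megablast word sizes
        print(w, shared_words(query, subject, w))
```

A short conserved stretch yields seeds at the smaller word size but none at the larger one. A larger word size therefore leaves fewer seeds to extend, which is faster, while missing shorter or more divergent matches.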
                           Genome BLAST




Guided
Query: Macaque CDC20 mRNA, AB168636
Program: Chimpanzee genome BLAST page with megablast algorithm
Database: genome (reference only)
Goals:
    to map the mRNA sequence onto chimpanzee genome
    to visualize matches in MapView in a genomic context
    to see the additional links/resources available from NCBI
Independent
Query: Homo sapiens Meis homeobox 3 (MEIS3), transcript variant 2, mRNA,
   BC069251.1
Program: Chimpanzee genome BLAST page with megablast algorithm
Database: genome (reference only)
 Align Two (or more) Sequences: Guided




Align 2 sequences
Query 1: Human Albumin, NP_000468
Query 2: Human GC, NP_000574
Program: blastp     Result link: [9R7XBJAF113]

Needleman Wunsch Global Sequence Alignment
Query 1: Human Albumin, NP_000468
Query 2: Human GC, NP_000574
Program: Protein    Result link: [9RDB2URT112]

Align more than two sequences (BLAST)
Query 1: Human Albumin, NP_000468
Query 2: Human AFP, Human AFM, Human GC proteins given below
    NP_001125
    NP_001124
    NP_000574         Result link: [9RDEW0SE112]
Multiple Alignment: Extend using COBALT Result Link: [9RDHHR5J212]
Align Two (or more) Sequences: Independent




Align 2 sequences
Query 1: Human spectrin alpha chain, brain isoform 3, NP_001182461
Query 2: Drosophila beta spectrin, NP_523388.1
Program: blastp    Results: [9RBVSDS3113]

Needleman Wunsch Global Sequence Alignment
Query 1: Human spectrin alpha chain, brain isoform 3, NP_001182461
Query 2: Drosophila beta spectrin, NP_523388.1
Program: Protein Results: [9RBYB5E4113]

Align more than two sequences (BLAST)
Query 1: Human spectrin alpha chain, brain isoform 3, NP_001182461
Query 2: Spectrin proteins from other organisms
NP_057726.3, NP_057726.3, NP_079489.2, NP_066022.2, NP_003119.2,
    NP_001020029.1, NP_000338.3, NP_001095.1, NP_001094.1, NP_001167055.1,
    NP_004915.2 Result Link: [9RC700S4112]

Multiple Alignment: Extend using COBALT        Result link: [9RC7G89T212]
                            PrimerBLAST




Guided
Query: Human FOXP2 mRNA splice variant 2, NM_148898
Organism limit: human
Database: RefSeq RNA
Allow splice variants: off at first then on
Explanatory Notes:
The FOXP2 gene has multiple splice variants. It is useful to design primers that will
    amplify only one variant. Primer BLAST can use information on splice variants to design
    specific primers with an NCBI mRNA Reference sequence template.
The stringency can be relaxed by selecting “Allow primer to amplify mRNA splice
    variants”. In this case primer pairs can be found that amplify all variants.

Independent
Query: Human glutathione S-transferase mu 1 transcript variant 2, NM_146421.2
Organism limit: human
Database: RefSeq RNA
Allow splice variants: off at first then on
Try using the "Primer must span an exon-exon junction" option
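
The idea behind requiring a primer to span an exon-exon junction can be sketched as a coordinate check. This is not how Primer BLAST is implemented; the function name, the coordinate convention, and the junction positions are hypothetical.

```python
def spans_junction(primer_start, primer_end, junctions, min_overlap=1):
    """Return True if the primer covers an exon-exon junction.

    Coordinates are 0-based and half-open on the mRNA: the primer occupies
    bases primer_start .. primer_end - 1, and a junction at position j lies
    between bases j - 1 and j. The primer must have at least min_overlap
    bases on each side of the junction.
    """
    return any(
        primer_start + min_overlap <= j <= primer_end - min_overlap
        for j in junctions
    )


if __name__ == "__main__":
    junctions = [100, 250]  # hypothetical junction positions on the mRNA
    print(spans_junction(90, 110, junctions, min_overlap=5))
    print(spans_junction(100, 120, junctions, min_overlap=5))
```

A primer that straddles a junction matches the spliced mRNA but not the uninterrupted genomic sequence, which is the motivation for the option.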

				