A Field Guide part 2 by guy21


									                  National Center for Biotechnology Information



                           A Field Guide
                                      part 2




   January 30, 2007                        Washington University, St. Louis
NCBI FieldGuide
                  GenBank Records

                            The Flatfile Format


                          Header




                          Feature Table



                          Sequence

                   A Typical GenBank Record


  LOCUS           NM_019570 4279 bp mRNA linear ROD 28-OCT-2004
  DEFINITION      Mus musculus REV1-like (S. cerevisiae) (Rev1l), mRNA
  ACCESSION       NM_019570
  VERSION         NM_019570.3 GI:50811869
  KEYWORDS        .

                  DEFINITION = Title
                  Version and GI change only when the sequence changes
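
As a small illustration of the VERSION line shown on this slide, the sketch below splits it into accession, version number, and GI. The function name `parse_version_line` is hypothetical, and the parsing assumes the simple whitespace-separated layout seen in this one record; it is not a full flatfile parser.

```python
def parse_version_line(line):
    """Split a GenBank VERSION line into accession, version number, and GI."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "VERSION":
        raise ValueError("not a VERSION line: %r" % line)
    # "NM_019570.3" -> accession "NM_019570", version "3"
    accession, _, version = fields[1].partition(".")
    gi = None
    for token in fields[2:]:
        if token.startswith("GI:"):
            gi = int(token[3:])
    return {
        "accession": accession,
        "version": int(version) if version else None,
        "gi": gi,
    }


record = parse_version_line("VERSION         NM_019570.3 GI:50811869")
print(record)
```

The accession stays fixed for the life of the record, while the number after the dot and the GI are the parts that change with the sequence.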




                  GenBank Record: Feature Table




          GenBank Record: Feature Table, cont.




                  GenBank Record: sequence




        skip



           Indexing for Nucleotide UID 59958365

       Field                 Indexed Terms              Range Query Example

[primary accession]   NM_001012399 [accn]
[title]               Bos taurus hemochromatosis (hfe), mRNA.
[organism]            Bos taurus [orgn]
[sequence length]     1168 [slen]                3000:6000[slen]
[modification date]   2005/02/19 [mdat]          2006/01:2006/08[mdat]
[properties]          biomol mrna [prop]
                      gbdiv mam [prop]
                      srcdb refseq [prop]
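
The field tags and range syntax on this slide can be combined mechanically. The sketch below is only an illustration of how such query strings are put together; the helper names (`field_term`, `range_term`, `and_query`) are hypothetical and not part of any NCBI tool.

```python
def field_term(value, field):
    """Build a fielded term such as 'Bos taurus[orgn]'."""
    return "%s[%s]" % (value, field)


def range_term(low, high, field):
    """Build a range term such as '3000:6000[slen]'."""
    return "%s:%s[%s]" % (low, high, field)


def and_query(*terms):
    """Join terms with the Boolean AND operator."""
    return " AND ".join(terms)


query = and_query(
    field_term("Bos taurus", "orgn"),
    range_term(3000, 6000, "slen"),
    range_term("2006/01", "2006/08", "mdat"),
)
print(query)
```

The resulting string can be pasted into the Entrez search box as typed.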

                  Global Entrez Search: HFE

                         HFE




                  Entrez Nucleotide: HFE
                                           137 records




                                               Not HFE
                                               hfe[title]




                     Smarter Query
                  hfe[title] AND human[orgn]

                                               42 records


                                               Curated HFE
                                               splice variants
                                               (11 total)




            hfe[title] AND human[orgn]     (cont.)

                                         Primary data




                        Preview/Index
                  Gateway to Advanced Searches




                  Preview/Index




                  Preview/Index: Properties, srcdb




                       srcdb = source database



 Properties         srcdb




                                                 click to see Index
                  Preview/Index: Properties, srcdb




                    …AND srcdb refseq[Properties]

                  Preview/Index: Properties, srcdb




          …AND srcdb ddbj/embl/genbank[Properties]

                      'Properties' Search Field

 #1 hfe                                          137
 #2 hfe[title] AND human[orgn]                    42

 #3 #2 AND srcdb refseq[prop]                     11
 #4 #2 AND srcdb ddbj/embl/genbank[prop]          31

 #5 #4 AND gbdiv pri[prop]                        29
 #6 #4 AND gbdiv est[prop]                         2

                  Primate division   gbdiv pri[prop]
                  EST division       gbdiv est[prop]

                  'Properties' Search Field: biomol

 #1 hfe                                               116
 #2 hfe[title] AND human[orgn]                         42

 #3 #2 AND biomol mrna[prop]                    29
 #4 #2 AND biomol genomic[prop]                 13



           Genomic DNA        biomol genomic[prop]
           cDNA               biomol mrna[prop]
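
The numbered searches on these two slides refer back to earlier results with `#N`. The sketch below shows, under simple assumptions, how such references expand into one full query string. The `history` dictionary mirrors the srcdb/gbdiv search list, with the final EST search keyed as `#6` here; the `expand` helper is hypothetical and is not how Entrez itself stores history.

```python
import re


def expand(history, key):
    """Recursively replace #N references with the parenthesized query they name."""
    def substitute(match):
        return "(" + expand(history, match.group(0)) + ")"
    return re.sub(r"#\d+", substitute, history[key])


history = {
    "#1": "hfe",
    "#2": "hfe[title] AND human[orgn]",
    "#3": "#2 AND srcdb refseq[prop]",
    "#4": "#2 AND srcdb ddbj/embl/genbank[prop]",
    "#5": "#4 AND gbdiv pri[prop]",
    "#6": "#4 AND gbdiv est[prop]",
}

print(expand(history, "#5"))
```

Written out this way, it is easier to see why the counts nest: each step only narrows the set returned by the step it refers to.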



                              More Queries…
                         Fields are database-specific
 Entrez Nucleotide
  Reviewed RefSeqs with transcript variants:
        srcdb refseq reviewed[prop] AND transcript[title] AND variant[title]

 Entrez Gene

    Topoisomerase genes from Archaea:
                  topoisomerase[gene name] AND archaea[organism]

    Genes on human chromosome 2 with OMIM links:
             2[chromosome] AND human[organism] AND "gene omim"[filter]

    Membrane proteins linked to cancer:
            "integral to plasma membrane"[gene ontology] AND cancer[dis]

                    Other Entrez Databases
  UniGene: rat clusters that have at least one mRNA
                      rat[organism] NOT 0[mrna count]

   SNP: uniquely mapped microsatellites on human chr2
   (microsat[SNP Class] AND 1[Map Weight] AND 2[Chromosome]) AND
   human[orgn]


  UniSTS: markers on the Genethon map of human chromosome 12
       Genethon[Map Name] AND human[organism] AND 12[chromosome]


  Structure: structures of bacterial kinases with resolutions below 2 Å
          bacteria[organism] AND kinase AND 000.00:002.00[resolution]
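
Fielded queries like these can also be sent programmatically. The sketch below only builds a search URL and checks that parentheses balance; it makes no network request. It assumes NCBI's E-utilities `esearch.fcgi` endpoint with its `db`, `term`, and `retmax` parameters, which these slides do not cover, and field names from 2007 may no longer be indexed. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not mentioned in the slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def balanced(term):
    """Return True if every parenthesis in the query term is matched."""
    depth = 0
    for ch in term:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def esearch_url(db, term, retmax=20):
    """Build (but do not fetch) an ESearch URL for a fielded query."""
    if not balanced(term):
        raise ValueError("unbalanced parentheses in query: %r" % term)
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


url = esearch_url("gene", "topoisomerase[gene name] AND archaea[organism]")
print(url)
```

Checking the parentheses first catches a stray closing bracket before the query is ever submitted.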

                  Genome Resources

Genomic Biology



    Homologene



    Map Viewer



    Entrez Gene


                  Genomic Biology




                  Gen Biol: Gen Resources




                  Map Viewer – Genome Annotation Updates




                  Gen Biol: Gen Resources




                  Genome Projects: microb




                              Genome Projects: microb

   13 Eukaryotic Genome Sequencing Projects Selected: Complete - 0, Assembly - 2, In Progress - 11




                  Gen Biol: Gen Resources




                  Gen Biol: Gen Resources




                  Gen Biol: Gen Resources




                  Genome Resources

    Genomic Biology



Homologene


    Map Viewer



    Entrez Gene


                         Homologene
•   No longer UniGene based
•   Protein similarities first
•   Guided by taxonomic tree
•   Includes orthologs and paralogs

    [Figure: an early globin gene undergoes gene duplication, giving an A-chain gene
    and a B-chain gene. Frog A, chick A, and mouse A are orthologs of one another, as
    are mouse B, chick B, and frog B; the A and B genes are paralogs.]

                  Homologene Cluster –   MLH1 Cluster




                  Rice Homolog




                  Genome Resources

    Genomic Biology



    Homologene



 Map Viewer


    Entrez Gene


                  List View




                    Mouse

             adar




                  MapViewer: Mouse ADAR




                  MapViewer: Mouse ADAR, 28 Hits




                  Mouse MapViewer: Gene Filter




                           MV Hs ADAR




                   exon




                  3' UTR



                           Maps & Options
   --Sequence maps--
   Ab initio
   Assembly
   BES_Clone
   Component
   Contig
   CpG island
   Ensembl Genes
   Ensembl Transcripts
   GenBank_DNA
   Gene
   Gene Traps
   MICER
   Phenotype
   RefSeq Transcripts
   Repeats
   rnaHS
   rnaMm
   rnaRn
   STS
   ugHs
   ugMm
   ugRn
   Variation (= SNP)
   --Cytogenetic maps--
   Ideogram
   --Genetic Maps--
   MGI
   WI_GEN
   --RH maps--
   WI/MRC-RH
   WI-YAC

                                   MapViewer




                                Gene annotations



 RefSeq RNA




                                        Variations
                  Tiling path




                  Maps & Options




                                  Synteny

           Rat ADAR     Human ADAR     Mouse ADAR




                  Genome Resources

    Genomic Biology



    Homologene



    Map Viewer



Entrez Gene

                  Human ADAR




                               Human ADAR –
                  Genomic regions, transcripts, and products




                  Human ADAR –
                   Interactions




                  Human ADAR




                  Links




                  Basic Local Alignment Search Tool




                                   Outline
                             Web BLAST

          • pre-computed results
          • how BLAST works
                  – words; scoring matrices; statistics

          • specialized BLAST algorithms
          • what's new, or important
          • example oligo search


                  BLAST Web Searches, 2006




    200,000/day




                  BLAST Web Searches, 2005




   200,000/day




                  Precomputed BLAST Services


       Nucleotide or protein:   Related Sequences

       BLAST link:              Blink

     Transcript clusters        UniGene

     Protein homologs           Homologene



                  Link to Related Sequences




                  Related Sequences

                                      Most similar




                                      Least similar
                  BLink (BLAST Link)




                         BLink Output
             Best hits        3D structures   CDD-Search




               Why Do We Need
          Sequence Similarity Searching?
        • To evaluate evolutionary relationships
        • To identify and annotate sequences
        • Other:
             – model genomic structure (e.g., Splign)
             – check primer specificity in silico



                                      : NCBI's tool

                  Basic Local Alignment
                       Search Tool

            • local, isolated, “surprising” regions of
                  similarity
            • breaks the query sequence into “words”
            • word hits to database sequences
                  extended in both directions


                  Global vs Local Alignment
       Seq 1

       Seq 2

                                Global alignment



       Seq 1

       Seq 2

                                 Local alignment



                  Global vs Local Alignment
            Seq1:   WHEREISWALTERNOW      (16aa)
            Seq2:   HEWASHEREBUTNOWISHERE (21aa)


                                    Global
            Seq1:  1   W--HEREISWALTERNOW    16
                       W  HERE
            Seq2:  1 HEWASHEREBUTNOWISHERE   21



                                    Local
            Seq1: 1     W--HERE 5       Seq1:  1 W--HERE 5
                        W  HERE                  W  HERE
            Seq2: 3     WASHERE 9       Seq2: 15 WISHERE 21
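
To make the global/local distinction concrete, the sketch below scores the same pair of sequences both ways with textbook dynamic programming (Needleman-Wunsch for global, Smith-Waterman for local). The scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for simplicity, not the scores behind the slide, and BLAST itself uses heuristics rather than this exhaustive approach.

```python
def alignment_scores(a, b, match=1, mismatch=-1, gap=-1):
    """Return (global_score, local_score) for two sequences, linear gap cost."""
    # Row 0: a global alignment pays for leading gaps; a local one does not.
    prev_g = [j * gap for j in range(len(b) + 1)]
    prev_l = [0] * (len(b) + 1)
    best_local = 0
    for i in range(1, len(a) + 1):
        cur_g = [i * gap] + [0] * len(b)
        cur_l = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur_g[j] = max(prev_g[j - 1] + s, prev_g[j] + gap, cur_g[j - 1] + gap)
            # Local alignment may restart anywhere, so scores never go below 0.
            cur_l[j] = max(0, prev_l[j - 1] + s, prev_l[j] + gap, cur_l[j - 1] + gap)
            best_local = max(best_local, cur_l[j])
        prev_g, prev_l = cur_g, cur_l
    return prev_g[len(b)], best_local


global_score, local_score = alignment_scores(
    "WHEREISWALTERNOW", "HEWASHEREBUTNOWISHERE"
)
print("global:", global_score, "local:", local_score)
```

The local score can never be lower than the global one, because the best local alignment is free to drop the poorly matching ends that a global alignment must include.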



                             How BLAST Works

    1. Make lookup table of “words” for query

    2. Scan database for hits

    3. Extend alignment both directions

         –        Ungapped extensions of hits (initial HSPs)

         –        Gapped extensions (no traceback)

         –        Gapped extensions (traceback: alignment details)

                    Nucleotide Words


         Make a lookup table based on the word size.
                  11-mer
           ATGCTGCTAGTCGATGACGTAGCTA
           ATGCTGCTAGT
            TGCTGCTAGTC
             GCTGCTAGTCG
              ...
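
The word list on this slide can be generated, and then used to find hits, with a few lines of code. This is a minimal sketch of the lookup-and-scan idea only; the function names and the short subject sequence are made up for illustration, and real BLAST uses far more compact data structures.

```python
from collections import defaultdict


def build_lookup(query, word_size=11):
    """Map every word of length word_size in the query to its start offsets."""
    table = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        table[query[i:i + word_size]].append(i)
    return table


def scan(subject, table, word_size=11):
    """Return (query_offset, subject_offset) pairs for exact word hits."""
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in table.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits


query = "ATGCTGCTAGTCGATGACGTAGCTA"
table = build_lookup(query)
# Hypothetical subject that contains the third query word, GCTGCTAGTCG.
hits = scan("CCGCTGCTAGTCGCC", table)
print(len(table), "words;", "hits:", hits)
```

Each hit is a seed; the extension steps described earlier start from these positions.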
                        Protein Words

                  AIEKCYTGCTLAQEADDTA
                  AIE
                   IEK    Neighborhood words: LEK, IDK, IQK, IER, IDR, etc.
                    EKC
                     KCY
                      CYT
                       …
        Lookup table, including neighborhood words, is
        based on word size, score matrix, and threshold.

          Scoring Systems - Proteins (BLOSUM62)
     A    4
     R   -1    5
     N   -2    0    6
     D   -2   -2    1    6
     C    0   -3   -3   -3    9
     Q   -1    1    0    0   -3    5
     E   -1    0    0    2   -4    2    5
     G    0   -2    0   -1   -3   -2   -2    6
     H   -2    0    1   -1   -3    0    0   -2    8
     I   -1   -3   -3   -3   -1   -3   -3   -4   -3    4
     L   -1   -2   -3   -4   -1   -2   -3   -4   -3    2    4
     K   -1    2    0   -1   -3    1    1   -2   -1   -3   -2    5
     M   -1   -1   -2   -3   -1    0   -2   -3   -2    1    2   -1    5
     F   -2   -3   -3   -3   -2   -3   -3   -3   -1    0    0   -3    0    6
     P   -1   -2   -2   -1   -3   -1   -1   -2   -2   -3   -3   -1   -2   -4    7
     S    1   -1    1    0   -1    0    0    0   -1   -2   -2    0   -1   -2   -1    4
     T    0   -1    0   -1   -1   -1   -1   -2   -2   -1   -1   -1   -1   -2   -1    1    5
     W   -3   -3   -4   -4   -2   -2   -3   -2   -2   -3   -2   -3   -1    1   -4   -3   -2   11
     Y   -2   -2   -2   -3   -2   -1   -2   -3    2   -1   -1   -2   -1    3   -3   -2   -2    2    7
     V    0   -3   -3   -3   -1   -2   -2   -3   -3    3    1   -2    1   -1   -2   -2    0   -3   -1    4
     X    0   -1   -1   -1   -2   -1   -1   -1   -1   -1   -1   -1   -1   -1   -2    0    0   -2   -1   -1   -1
          A    R    N    D    C    Q    E    G    H    I    L    K    M    F    P    S    T    W    Y    V    X

          Example: query word IEK; keep neighborhood word LEK? (threshold = 11)
               IEK vs IEK:  I/I (4) + E/E (5) + K/K (5) = 14
               LEK vs IEK:  L/I (2) + E/E (5) + K/K (5) = 12, so LEK is kept
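
The IEK/LEK example on this slide is a sum of three matrix lookups. The sketch below reproduces it using only the handful of BLOSUM62 entries needed, copied from the matrix on this slide; the dictionary and function names are hypothetical.

```python
# Only the BLOSUM62 entries this example needs (values from the slide's matrix).
BLOSUM62_SUBSET = {
    ("I", "I"): 4,
    ("E", "E"): 5,
    ("K", "K"): 5,
    ("L", "I"): 2,
    ("D", "E"): 2,
}


def pair_score(a, b):
    """Look up a residue pair; the matrix is symmetric, so try both orders."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def word_score(candidate, query_word):
    """Sum the pairwise scores of a candidate word against a query word."""
    return sum(pair_score(c, q) for c, q in zip(candidate, query_word))


def is_neighbor(candidate, query_word, threshold=11):
    """A candidate is a neighborhood word if it scores at or above the threshold."""
    return word_score(candidate, query_word) >= threshold


for word in ("IEK", "LEK", "IDK"):
    print(word, word_score(word, "IEK"), is_neighbor(word, "IEK"))
```

Raising the threshold shrinks the neighborhood, which speeds up the scan at the cost of sensitivity.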

                  Word Hits & Extensions

                Nucleotide: one exact match
             ATGCTGCTAGTCGATGACGTAGCTA
               GCTGCTAGTCG


        Protein: two matches within 40 residues
             AIEKCYTGCTLAQEADDTA
              IDK         EAD


                           BLASTP Summary

             Query:   IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILEV…

             Example query words and their neighborhood words (score):

                  Query word            YLS 15        HFL 18
                  Neighborhood words    YLT 12        HFV 15
                                        YVS 12        HFS 14
                                        YIT 10        HWL 13
                                        etc …         NFL 13
                                                      DFL 12
                                                      HWV 10
                                                      etc …

             Neighborhood score threshold: T (-f) = 11

           Query   1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI 47
                     +E YA YL K     F+YLSL +SP+ +DVNVHP+K VHFL+++ I
           Sbjct 287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEI 333

             Drop-off score = highest score - current score

                      -X  X dropoff value for gapped alignment (in bits)
                          blastn 30, megablast 20, tblastx 0, all others 15
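
The drop-off rule (highest score minus current score) decides when an extension stops. The sketch below applies that rule to a simple ungapped rightward extension, which is a deliberate simplification: the -X option quoted on the slide governs gapped alignment and is expressed in bits, whereas this example works in raw score units. The function names and the +1/-3 scoring are assumptions for illustration.

```python
def nucleotide_score(a, b):
    """Assumed scoring for the example: +1 for a match, -3 for a mismatch."""
    return 1 if a == b else -3


def extend_right(query, subject, q_start, s_start, score_fn, x_drop):
    """Extend a hit to the right; stop once the score falls x_drop below the best.

    Returns (best_score, length_of_best_extension).
    """
    best = current = 0
    best_len = 0
    i = 0
    while q_start + i < len(query) and s_start + i < len(subject):
        current += score_fn(query[q_start + i], subject[s_start + i])
        if current > best:
            best, best_len = current, i + 1
        elif best - current > x_drop:
            break  # drop-off exceeded: give up on this extension
        i += 1
    return best, best_len


result = extend_right("AAAAATTTTT", "AAAAAGGGGG", 0, 0, nucleotide_score, x_drop=5)
print(result)
```

A larger X lets the extension ride through a longer stretch of mismatches in the hope of recovering, at the cost of more computation.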



                           BLASTP Summary (cont.)

           High-scoring pair (HSP):

           Query   1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI 47
                     +E YA YL K     F+YLSL +SP+ +DVNVHP+K VHFL+++ I
           Sbjct 287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEI 333

           Gapped extension with traceback gives the final HSP:

          Query   1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI-LEV… 50
                    +E YA YL K     F+YLSL +SP+ +DVNVHP+K VHFL+++ I + +
          Sbjct 287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEIATSI… 337

                  Scoring Systems - Nucleotides

                          Identity matrix

                               A    G    C    T
                          A   +1   -3   -3   -3
                          G   -3   +1   -3   -3      [ -r 1 -q -3 ]
                          C   -3   -3   +1   -3
                          T   -3   -3   -3   +1

         CAGGTAGCAAGCTTGCATGTCA
         || ||||||||||||  |||||           raw score = 19-9* = 10*
         CACGTAGCAAGCTTG-GTGTCA

                                         * ignores gap costs
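
The raw score on this slide can be checked by counting columns. The sketch below assumes, as the slide's 19 - 9 arithmetic implies, that the single gap column is charged like a mismatch (-3) rather than with separate gap costs; the function name is hypothetical.

```python
def raw_score(top, bottom, reward=1, penalty=-3):
    """Score an alignment column by column.

    Gap columns ('-') are charged the mismatch penalty, which is the
    simplification the slide flags with its asterisk.
    Returns (score, matches, non_matching_columns).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    matches = sum(1 for a, b in zip(top, bottom) if a == b and a != "-")
    others = len(top) - matches
    return matches * reward + others * penalty, matches, others


score, matches, others = raw_score(
    "CAGGTAGCAAGCTTGCATGTCA",
    "CACGTAGCAAGCTTG-GTGTCA",
)
print(score, matches, others)
```

The 19 matching columns contribute +19 and the 3 remaining columns (two mismatches and one gap) contribute -9.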

          Scoring Systems - Proteins (BLOSUM62)
          Note: J (leucine or isoleucine) and O (pyrrolysine)
     A    4
     R   -1    5
     N   -2    0    6
     D   -2   -2    1    6
     C    0   -3   -3   -3    9
     Q   -1    1    0    0   -3    5
     E   -1    0    0    2   -4    2    5
     G    0   -2    0   -1   -3   -2   -2    6
     H   -2    0    1   -1   -3    0    0   -2    8
     I   -1   -3   -3   -3   -1   -3   -3   -4   -3    4
     L   -1   -2   -3   -4   -1   -2   -3   -4   -3    2    4
     K   -1    2    0   -1   -3    1    1   -2   -1   -3   -2    5
     M   -1   -1   -2   -3   -1    0   -2   -3   -2    1    2   -1    5
     F   -2   -3   -3   -3   -2   -3   -3   -3   -1    0    0   -3    0    6
     P   -1   -2   -2   -1   -3   -1   -1   -2   -2   -3   -3   -1   -2   -4    7
     S    1   -1    1    0   -1    0    0    0   -1   -2   -2    0   -1   -2   -1    4
     T    0   -1    0   -1   -1   -1   -1   -2   -2   -1   -1   -1   -1   -2   -1    1    5
     W   -3   -3   -4   -4   -2   -2   -3   -2   -2   -3   -2   -3   -1    1   -4   -3   -2   11
     Y   -2   -2   -2   -3   -2   -1   -2   -3    2   -1   -1   -2   -1    3   -3   -2   -2    2    7
     V    0   -3   -3   -3   -1   -2   -2   -3   -3    3    1   -2    1   -1   -2   -2    0   -3   -1    4
     X    0   -1   -1   -1   -2   -1   -1   -1   -1   -1   -1   -1   -1   -1   -2    0    0   -2   -1   -1   -1
          A    R    N    D    C    Q    E    G    H    I    L    K    M    F    P    S    T    W    Y    V    X

                     Local Alignment Statistics
                                   Expect Value
  E = number of database hits with score ≥ S that you expect to find by chance

                         E = K m n e^{-λS}   or   E = m n 2^{-S'}

                         K  = scale for search space
                         λ  = scale for scoring system
                         S' = bit score = (λS - ln K)/ln 2
                         m  = query length
                         n  = database length


                  E is dependent on m x n (search space)


                  More info: The Statistics of Sequence Similarity Scores

                     E is dependent on m x n (search space)


                     Short query = low score, high Expect


         CAGGTAGCAAGCTTGCATGTCA
         || ||||||||||||  |||||                   raw score = 19-9 = 10
         CACGTAGCAAGCTTG-GTGTCA

                  More info: The Statistics of Sequence Similarity Scores
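
The two forms of the Expect formula, E = K m n e^{-λS} and E = m n 2^{-S'} with S' = (λS - ln K)/ln 2, are algebraically the same, and a short calculation makes that visible. In the sketch below the values λ = 0.267 and K = 0.041 are illustrative assumptions (figures commonly quoted for gapped BLOSUM62 searches), and the query and database lengths are made up.

```python
import math


def bit_score(raw, lam, K):
    """Convert a raw score S to a bit score S' = (lam*S - ln K) / ln 2."""
    return (lam * raw - math.log(K)) / math.log(2)


def expect_from_raw(raw, m, n, lam, K):
    """E = K * m * n * exp(-lam * S)."""
    return K * m * n * math.exp(-lam * raw)


def expect_from_bits(bits, m, n):
    """E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


LAM, K = 0.267, 0.041        # assumed, illustrative parameters
m, n = 250, 1_000_000_000    # hypothetical query and database lengths

bits = bit_score(100, LAM, K)
e_raw = expect_from_raw(100, m, n, LAM, K)
e_bits = expect_from_bits(bits, m, n)
print(round(bits, 1), e_raw, e_bits)
```

Because E scales with m x n, the same alignment scored against a database twice as large has twice the Expect value, and a short query cannot reach a high enough score to push E down.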
                  Scoring Systems - Proteins

        Position Independent Matrices
             PAM Matrices (Percent Accepted Mutation)
                  • Derived from observation; small dataset of
                    alignments
                  • Implicit model of evolution
                  • All calculated from PAM1
                  • PAM250 widely used

             BLOSUM Matrices (BLOck SUbstitution Matrices)
                  • Derived from observation; large dataset of highly
                    conserved blocks
                  • Each matrix derived separately from blocks with a
                    defined percent identity cutoff
                  • BLOSUM62 - default matrix for BLAST


                  Position-Specific Score Matrix

                             Serine/Threonine protein kinases
                                       catalytic loop


          PSSM scores    1    5            7   4      4


            DAF-1




                    Position-Specific Score Matrix
                         A    R    N    D    C    Q    E    G    H    I    L    K    M    F    P    S    T    W    Y    V
              435   K   -1    0    0   -1   -2    3    0    3    0   -2   -2    1   -1   -1   -1   -1   -1   -1   -1   -2
              436   E    0    1    0    2   -1    0    2   -1    0   -1   -1    0    0    0   -1    0    0   -1   -1   -1
              437   S    0    0   -1    0    1    1    0    1    1    0   -1    0    0    0    2    0   -1   -1    0   -1
              438   N   -1    0   -1   -1    1    0   -1    3    3   -1   -1    1   -1    0    0   -1   -1    1    1   -1
              439   K   -2    1    1   -1   -2    0   -1   -2   -2   -1   -2    5    1   -2   -2   -1   -1   -2   -2   -1
              440   P   -2   -2   -2   -2   -3   -2   -2   -2   -2   -1   -2   -1    0   -3    7   -1   -2   -3   -1   -1
              441   A    3   -2    1   -2    0   -1    0    1   -2   -2   -2    0   -1   -2    3    1    0   -3   -3    0
              442   M   -3   -4   -4   -4   -3   -4   -4   -5   -4    7    0   -4    1    0   -4   -4   -2   -4   -1    2
              443   A    4   -4   -4   -4    0   -4   -4   -3   -4    4   -1   -4   -2   -3   -4   -1   -2   -4   -3    4
              444   H   -4   -2   -1   -3   -5   -2   -2   -4   10   -6   -5   -3   -4   -3   -2   -3   -4   -5    0   -5
              445   R   -4    8   -3   -4    0   -1   -2   -3   -2   -5   -4    0   -3   -2   -4   -3   -3    0   -4   -5
              446   D   -4   -4   -1    8   -6   -2    0   -3   -3   -5   -6   -3   -5   -6   -4   -2   -3   -7   -5   -5
 catalytic    447   I   -4   -5   -6   -6   -3   -4   -5   -6   -5    3    5   -5    1    1   -5   -5   -3   -4   -3    1
              448   K    0    0    1   -3   -5   -1   -1   -3   -3   -5   -5    7   -4   -5   -3   -1   -2   -5   -4   -4
   loop       449   S    0   -3   -2   -3    0   -2   -2   -3   -3   -4   -4   -2   -4   -5    2    6    2   -5   -4   -4
              450   K    0    3    0    1   -5    0    0   -4   -1   -4   -3    4   -3   -2    2    1   -1   -5   -4   -4
              451   N   -4   -3    8   -1   -5   -2   -2   -3   -1   -6   -6   -2   -4   -5   -4   -1   -2   -6   -4   -5
              452   I   -3   -5   -5   -6    0   -5   -5   -6   -5    6    2   -5    2   -2   -5   -4   -3   -5   -3    3
              453   M   -4   -4   -6   -6   -3   -4   -5   -6   -5    0    6   -5    1    0   -5   -4   -3   -4   -3    0
              454   V   -3   -3   -5   -6   -3   -4   -5   -6   -5    3    3   -4    2   -2   -5   -4   -3   -5   -3    5
              455   K   -2    1    1    4   -5    0   -1   -2    1   -4   -2    4   -3   -2   -3    0   -1   -5   -2   -3
              456   N    1    1    3    0   -4   -1    1    0   -3   -4   -4    3   -2   -5   -2    2   -2   -5   -4   -4
              457   D   -3   -2    5    5   -1   -1    1   -1    0   -5   -4    0   -2   -5   -1    0   -2   -6   -4   -5
              458   L   -3   -1    0   -3    0   -3   -2    3   -4   -2    3    0    1    1   -2   -2   -3    5   -1   -3
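
Scoring against a PSSM differs from scoring against BLOSUM62 in that the row is chosen by position, not by the query residue. The sketch below copies rows 444 to 446 from the table on this slide and scores two short peptides against them; the names `PSSM_ROWS` and `pssm_score` are hypothetical.

```python
# Column order used by the PSSM table on the slide.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Rows 444-446 (H, R, D of the catalytic loop), copied from the slide.
PSSM_ROWS = {
    444: [-4, -2, -1, -3, -5, -2, -2, -4, 10, -6, -5, -3, -4, -3, -2, -3, -4, -5, 0, -5],
    445: [-4, 8, -3, -4, 0, -1, -2, -3, -2, -5, -4, 0, -3, -2, -4, -3, -3, 0, -4, -5],
    446: [-4, -4, -1, 8, -6, -2, 0, -3, -3, -5, -6, -3, -5, -6, -4, -2, -3, -7, -5, -5],
}


def pssm_score(peptide, start):
    """Sum position-specific scores for a peptide aligned at position `start`."""
    return sum(
        PSSM_ROWS[start + offset][AMINO_ACIDS.index(residue)]
        for offset, residue in enumerate(peptide)
    )


print("HRD:", pssm_score("HRD", 444))
print("HKD:", pssm_score("HKD", 444))
```

In BLOSUM62 an R-to-K substitution scores +2 wherever it occurs; at position 445 of this PSSM it scores 0 against 8 for the conserved arginine, so the conserved catalytic residues weigh much more heavily.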




                    BLAST is a shortcut . . .

                  An alignment BLAST cannot make:
        1 GAATATATGAAGACCAAGATTGCAGTCCTGCTGGCCTGAACCACGCTATTCTTGCTGTTG
          || | || || || | || || ||    || | ||| |||||| | | || | ||| |
        1 GAGTGTACGATGAGCCCGAGTGTAGCAGTGAAGATCTGGACCACGGTGTACTCGTTGTCG

       61 GTTACGGAACCGAGAATGGTAAAGACTACTGGATCATTAAGAACTCCTGGGGAGCCAGTT
          | || ||     || ||| || | |||||| || | |||||| ||||| |         |
       61 GCTATGGTGTTAAGGGTGGGAAGAAGTACTGGCTCGTCAAGAACAGCTGGGCTGAATCCT

     121 GGGGTGAACAAGGTTATTTCAGGCTTGCTCGTGGTAAAAAC
         |||| || ||||| || ||     | | |||| || |||
     121 GGGGAGACCAAGGCTACATCCTTATGTCCCGTGACAACAAC


     Reason:
     no contiguous exact match of 7 bp.


              An Alignment BLAST Can Make
   Solution: compare protein sequences; BLASTX

    BLAST 2 Sequences (blastx) output:

Score = 290 bits (741), Expect = 7e-77
Identities = 147/331 (44%), Positives = 206/331 (61%), Gaps = 8/331 (2%)
Frame = +3




                  Other BLAST Algorithms


                   • Megablast

                   • Discontiguous Megablast

                   • PSI-BLAST

                   • PHI-BLAST

      Megablast: NCBI’s Genome Annotator


     • Long alignments of similar DNA sequences

     • Greedy algorithm

     • Concatenation of query sequences

     • Faster than blastn; less sensitive


                  MegaBLAST & Word Size



                   WORD SIZE   default   minimum


                     blastn      11         7


                   megablast     28         8


                     blastp      3          2




                            Word Size
                  Trade-off: sensitivity vs speed


                        Too fast for
                        you?




                   Discontiguous Megablast


                  • Uses discontiguous word matches

                  • Better for cross-species comparisons




          Templates for Discontiguous Words
      W   =   11,   t   =   16,   coding:          1101101101101101
      W   =   11,   t   =   16,   non-coding:      1110010110110111
      W   =   12,   t   =   16,   coding:          1111101101101101
      W   =   12,   t   =   16,   non-coding:      1110110110110111
      W   =   11,   t   =   18,   coding:          101101100101101101
      W   =   11,   t   =   18,   non-coding:      111010010110010111
      W   =   12,   t   =   18,   coding:          101101101101101101
      W   =   12,   t   =   18,   non-coding:      111010110010110111
      W   =   11,   t   =   21,   coding:          100101100101100101101
      W   =   11,   t   =   21,   non-coding:      111010010100010010111
      W   =   12,   t   =   21,   coding:          100101101101100101101
      W   =   12,   t   =   21,   non-coding:      111010010110010010111
      W = word size; # matches in template
      t = template length
   Reference: Ma, B, Tromp, J, Li, M. PatternHunter: faster and more sensitive homology
   search. Bioinformatics March, 2002; 18(3):440-5
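
A discontiguous word match only requires agreement at the template positions marked 1. The sketch below applies the first coding template from the table (W = 11, t = 16) to two made-up 16-mers that differ at every third position, as synonymous codons often do; the function name and both sequences are hypothetical.

```python
CODING_W11_T16 = "1101101101101101"  # first template in the table above


def template_match(a, b, template):
    """Return True if a and b agree at every position where the template has '1'."""
    if not (len(a) == len(b) == len(template)):
        raise ValueError("sequences and template must have the same length")
    return all(x == y for x, y, t in zip(a, b, template) if t == "1")


# Two hypothetical 16-mers that differ only at positions the template ignores.
seq_a = "ATGGCTGCAGATCTGA"
seq_b = "ATAGCCGCGGACCTCA"

print(template_match(seq_a, seq_b, CODING_W11_T16))
```

These two sequences share no contiguous exact word longer than two bases, so a contiguous 11-mer seed would miss them, yet all 11 template positions match.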
       Discontiguous (Cross-species) MegaBLAST




                  Discontiguous Word Options




                  Disco. Megablast Example . . .
           Query: NM_078651
           Drosophila melanogaster CG18582-PA (mbt) mRNA, (3244 bp)
           /note= mushroom bodies tiny; synonyms: Pak2, STE20, dPAK2




           Database: nr (nt),   Mammalia[orgn]




         MegaBLAST = “No significant similarity found.”


         Discontiguous megaBLAST = numerous hits . . .


                  Ex: Discontiguous MegaBLAST




                  Ex: BLASTN




                       Basic Local Alignment Search Tool

           Save your searches               What’s New?




NCBI FieldGuide
                  Nucleotide BLAST Databases
     • nr (nt)
          – Traditional GenBank divisions
          – NM_ and XM_ RefSeqs
     • refseq_rna
          – NM_, XM_, NR_
     • refseq_genomic
          – NC_, NT_, NG_
     • est
          – EST division
     • htgs
          – HTG division
     • dbsts
          – STS division
     • chromosome
          – NC_ genomic records
     • gss
          – GSS division
     • pat
          – PAT division
     • wgs
          – wgs entries from traditional divisions
     • pdb
          – Nucleotide sequences from structures
     • env_nt
          – environmental samples

                  Protein BLAST Databases


       Protein
       • nr
           traditional GenBank records   nr = nr
       • refseq = NP_, XP_
       • swissprot
       • pdb
       • pat
       • env_nr
                  New Nucleotide Databases




                             New Formatter




                  Select lower case

                                        Select red




              BLAST Output: Alignments & Filter




                        low complexity sequence filtered




                  New Output View




                         New Output View

             Results can be sorted

                                           Transcript & genomic hits separated




              Pseudogene, chr 9




    Functional gene, chr 1
                  Sorting Results

                              Resorted by Total score




                          Functional gene now first
                  Sorting Hits: by Score




                                   Sort by Score: longest exon usually first




                  Sorting Hits: by Query Start




                                         Sort by Query start: proper exon order
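
   The two sort orders can be sketched in a few lines of Python. The alignment
   segments below are invented for illustration (they are not taken from a real
   BLAST result), and the record layout is an assumption made for this sketch.

```python
from collections import namedtuple

# Hypothetical alignment segments (one per exon) for a single genomic hit.
# The scores and coordinates are invented for illustration.
Hsp = namedtuple("Hsp", ["score", "query_start", "query_end"])

hsps = [
    Hsp(score=310, query_start=401, query_end=720),
    Hsp(score=120, query_start=1, query_end=130),
    Hsp(score=250, query_start=131, query_end=400),
]

# Sort by score: the highest-scoring (usually longest) exon comes first.
by_score = sorted(hsps, key=lambda h: h.score, reverse=True)

# Sort by query start: segments appear in the order of the transcript,
# which puts the exons in their proper order.
by_query_start = sorted(hsps, key=lambda h: h.query_start)

for h in by_query_start:
    print(h.query_start, h.query_end, h.score)
```

   Sorting by query start is the more convenient view when the query is an mRNA
   and the goal is to read off the exon structure of the genomic hit.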




                                 Advanced Options
                                                                  Limit to Organism
                  all[filter] NOT ma
                            Example Entrez Queries
                                   all[Filter] NOT mammalia[organism]
                                   chimpanzee[organism]
                                   srcdb refseq reviewed[properties]

                               Nucleotide only:
                                     biomol mrna[properties]
                                     biomol genomic[properties]

                            Other Advanced
                                   -e 10000         expect value
                                   -v 2000          descriptions
                                   -b 2000          alignments
                  -e 10000 -v 2000



       Example: Mapping Oligos Onto a Genome


                                   ?
           >forward
           CCATGGCGACCCTGGAAAAGC
                                   ?
           >reverse
           CAGCAGCGGCTGTGCCTGCGG

                                   ?




                  Map Oligos Onto Genome




                   >CCATGGCGACCCTGGAAAAGCNNNNNNNNNNCAGCAGCGGCTGTGCCTGCGG


                      forward primer                 reverse primer




                                     -W 7 -e 1000
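
   The combined query on this slide can be built with a short Python sketch. The
   function name, the defline and the default spacer length of ten N's are
   assumptions made for this example; the primer sequences are the ones shown in
   the slides.

```python
def join_primers(forward, reverse, spacer_len=10):
    """Join two primers with a run of N's so they can be searched as one query."""
    return forward + "N" * spacer_len + reverse


forward = "CCATGGCGACCCTGGAAAAGC"
reverse = "CAGCAGCGGCTGTGCCTGCGG"

query = join_primers(forward, reverse)

# Print the query in FASTA format; the defline is a hypothetical label.
print(">primer_pair")
print(query)
```

   Because each primer is only 21 bases long, the search is run with a small word
   size and a high expect threshold (the -W 7 -e 1000 setting shown on the slide)
   so that short matches are reported.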



                  Genome BLAST Results




                  Primer Alignments




                                      reverse primer




                                      forward primer



                  MapViewer




                  MapViewer




                  Sequence View (sv)
                          forward →




                   ← reverse




                       Service Addresses


 • BLAST              blast-help@ncbi.nlm.nih.gov
 • General Help       info@ncbi.nlm.nih.gov
 • Wayne Matten       matten@ncbi.nlm.nih.gov



