Lecture 11 - Edwards Lab

Using Local Tools: BLAST
BCHB524 2008
Lecture 11



11/12/2008          BCHB524 - 2008 - Edwards
Outline
    Install and run blast from NCBI
            Download
            Format sequence databases
            Run by hand
    Running blast and interpreting results
            Directly and using BioPython


    Exercises

    Lecture 9 exercises
Local Tools
    Sometimes web-based services are not enough.
    For blast:
            Too many query sequences
            Need to search a novel sequence database
            Need to change rarely used parameters
            Web-service is too slow
    For other tools:
            No web-service?
            No interactive web-site?
            Insufficient back-end computational resources?
Download standalone blast
    In Windows, make a folder "BLAST" in your "My
     Documents" folder
    Google "NCBI Blast"
            …or go to http://www.ncbi.nlm.nih.gov/BLAST
    Click on "Help" tab
    Under "Other BLAST Information",
            Click on "Download BLAST Software and Databases"
    From the table under "Executables", find the
     download link at row "win32-ia32" and column
     "blast"
            Right-click on the download link and Save As…
            Put the file in your new "BLAST" folder
    In Windows, double-click on the downloaded file.
Download standalone blast
    Folders:
            bin, data, doc
    Create folder:
            db




Download standalone blast
    Look in doc:
            Double-click to open web-page documentation
    Download gunzip.py from course homepage into db




Download BLAST databases
    Follow the link (above Executables) for the
     NCBI BLAST database FTP site:
            ftp://ftp.ncbi.nlm.nih.gov/blast/db/
    The .tar.gz files contain databases already
     formatted for BLAST
    The FASTA directory contains compressed
     (.gz) FASTA format sequence databases.
            We'll download yeast.aa.gz and yeast.nt.gz to the
             db folder
Uncompress FASTA databases
    Select "Run…" from the "Start" menu
    In the "Open" dialog box, type "cmd" and
     click OK




Uncompress FASTA databases
    cd My Documents
    cd BLAST
    cd db
    dir




Uncompress FASTA databases
    gunzip.py yeast.*.gz
    dir
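The course's gunzip.py is not shown in these slides. As a rough sketch of what such a helper might do, the following uses Python's standard gzip module to uncompress every file matching a pattern. The function name gunzip_files is hypothetical, and the code is written for Python 3.

```python
import glob
import gzip
import shutil


def gunzip_files(pattern):
    """Uncompress each .gz file matching pattern; return the new file names."""
    written = []
    for gz_name in sorted(glob.glob(pattern)):
        if not gz_name.endswith(".gz"):
            continue  # skip anything that is not named like a gzip file
        out_name = gz_name[:-3]  # e.g. yeast.aa.gz -> yeast.aa
        with gzip.open(gz_name, "rb") as src, open(out_name, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream, so large files are fine
        written.append(out_name)
    return written
```

Calling gunzip_files("yeast.*.gz") from inside the db folder would leave yeast.aa and yeast.nt next to the compressed files.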




Format FASTA databases
    cd ..
    bin\formatdb.exe -i db\yeast.aa -p T -o T
    bin\formatdb.exe -i db\yeast.nt -p F -o T
    dir db
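The two formatdb commands differ only in the input file and the protein flag, so they can be generated from one helper. This is a minimal sketch for Python 3; the function names are hypothetical, and the flags (-i, -p, -o) are the ones used on the slide.

```python
import subprocess


def formatdb_command(fasta_path, is_protein, exe=r"bin\formatdb.exe"):
    """Build the formatdb argument list used on the slide."""
    return [
        exe,
        "-i", fasta_path,                   # input FASTA file
        "-p", "T" if is_protein else "F",   # T = protein, F = nucleotide
        "-o", "T",                          # parse sequence IDs and index them
    ]


def format_database(fasta_path, is_protein):
    """Run formatdb; raises CalledProcessError if it exits with an error."""
    subprocess.run(formatdb_command(fasta_path, is_protein), check=True)
```

From the BLAST folder, format_database(r"db\yeast.aa", True) and format_database(r"db\yeast.nt", False) would correspond to the two commands above.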




Download formatdb databases
    The .tar.gz files contain databases already
     formatted for BLAST
    Download to BLAST\db and use the
     gunzip.py program to uncompress and
     unpack
            For example, download
                refseq_protein.00.tar.gz and refseq_protein.01.tar.gz
            Uncompress and unpack
                gunzip.py refseq_protein.*.tar.gz
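For the .tar.gz archives, uncompressing and unpacking can be done in one step with Python's standard tarfile module. The sketch below is not the course's gunzip.py (which is not shown here); the function name unpack_archives is hypothetical, and it should only be pointed at archives from a trusted source such as the NCBI FTP site.

```python
import glob
import tarfile


def unpack_archives(pattern, dest="."):
    """Extract every .tar.gz archive matching pattern into dest.

    Returns the names of all archive members that were extracted.
    """
    extracted = []
    for archive in sorted(glob.glob(pattern)):
        with tarfile.open(archive, "r:gz") as tar:
            extracted.extend(tar.getnames())
            tar.extractall(dest)
    return extracted
```

Run from inside the db folder, unpack_archives("refseq_protein.*.tar.gz") would unpack both downloaded volumes.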
Running BLAST from the command line
    We need a query sequence to search:
             >gi|6319267|ref|NP_009350.1| Yal049cp
             MASNQPGKCCFEGVCHDGTPKGRREEIFGLDTYAAGSTSPKEKVIVILTDVYGNKFNNVLLTADKFASAGYMVFVPDILF
             GDAISSDKPIDRDAWFQRHSPEVTKKIVDGFMKLLKLEYDPKFIGVVGYCFGAKFAVQHISGDGGLANAAAIAHPSFVSI
             EEIEAIDSKKPILISAAEEDHIFPANLRHLTEEKLKDNHATYQLDLFSGVAHGFAARGDISIPAVKYAKEKVLLDQIYWF
             NHFSNV
             >gi|6319268|ref|NP_009351.1| Yal048cp
             MTKETIRVVICGDEGVGKSSLIVSLTKAEFIPTIQDVLPPISIPRDFSSSPTYSPKNTVLIDTSDSDLIALDHELKSADV
             IWLVYCDHESYDHVSLFWLPHFRSLGLNIPVILCKNKCDSISNVNANAMVVSENSDDDIDTKVEDEEFIPILMEFKEIDT
             CIKTSAKTQFDLNQAFYLCQRAITHPISPLFDAMVGELKPLAVMALKRIFLLSDLNQDSYLDDNEILGLQKKCFNKSIDV
             NELNFIKDLLLDISKHDQEYINRKLYVPGKGITKDGFLVLNKIYAERGRHETTWAILRTFHYTDSLCINDKILHPRLVVP
             DTSSVELSPKGYRFLVDIFLKFDIDNDGGLNNQELHRLFKCTPGLPKLWTSTNFPFSTVVNNKGCITLQGWLAQWSMTTF
             LNYSTTTAYLVYFGFQEDARLALQVTKPRKMRRRSGKLYRSNINDRKVFNCFVIGKPCCGKSSLLEAFLGRSFSEEYSPT
             IKPRIAVNSLELKGGKQYYLILQELGEQEYAILENKDKLKECDVICLTYDSSDPESFSYLVSLLDKFTHLQDLPLVFVAS
             KADLDKQQQRCQIQPDELADELFVNHPLHISSRWLSSLNELFIKITEAALDPGKNTPGLPEETAAKDVDYRQTALIFGST
             VGFVALCSFTLMKLFKSSKFSK



    Copy and paste this FASTA file into notepad
     and save as "query.fasta" in the BLAST
     folder
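A quick way to confirm that the file was saved correctly is to read it back and list each sequence with its length. The sketch below uses only the Python 3 standard library; read_fasta is a hypothetical helper, not part of BLAST or BioPython.

```python
def read_fasta(lines):
    """Yield (description, sequence) pairs from an iterable of FASTA lines."""
    description, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if description is not None:
                yield description, "".join(chunks)
            description, chunks = line[1:], []
        elif description is not None:
            chunks.append(line)  # sequence may span several lines
    if description is not None:
        yield description, "".join(chunks)


def summarize(path):
    """Print the description and length of each sequence in a FASTA file."""
    with open(path) as handle:
        for description, sequence in read_fasta(handle):
            print(description, len(sequence))
```

Calling summarize("query.fasta") from the BLAST folder should list two entries, one for each of the sequences pasted above.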
Running BLAST from the command line
    Run the BLAST command, for example:
            bin\blastall.exe -p blastp -i query.fasta -d db\yeast.aa -o query.txt
    …and check out the result in query.txt.
Interpreting blast results
    Parsing text-format BLAST results is hard:
            Use XML format output where possible (-m 7)
    Use BioPython's BLAST parser
    from Bio.Blast import NCBIXML

    result_handle = open("query.xml")
    for blast_result in NCBIXML.parse(result_handle):
        for alignment in blast_result.alignments:
            for hsp in alignment.hsps:
                if hsp.expect < 1e-5:
                    print('****Alignment****')
                    print('sequence:', alignment.title)
                    print('length:', alignment.length)
                    print('e value:', hsp.expect)
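If BioPython is not available, similar fields can be pulled from the XML with Python's standard library. This is an alternative sketch, not the lecture's method. It assumes the file follows the NCBI BLAST XML layout (Iteration, Hit, and Hsp elements), and the function name significant_hits is hypothetical.

```python
import xml.etree.ElementTree as ET


def significant_hits(xml_source, max_expect=1e-5):
    """Yield (query, hit description, hit length, e-value) per significant HSP.

    xml_source may be a file name or an open file object.
    """
    root = ET.parse(xml_source).getroot()
    for iteration in root.iter("Iteration"):        # one per query sequence
        query = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):           # one per database sequence
            title = hit.findtext("Hit_def")
            length = int(hit.findtext("Hit_len"))
            for hsp in hit.iter("Hsp"):             # one per aligned chunk
                expect = float(hsp.findtext("Hsp_evalue"))
                if expect < max_expect:
                    yield query, title, length, expect
```

For example, looping over significant_hits("query.xml") and printing each tuple gives a report along the same lines as the BioPython version.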



Running BLAST from Python
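    Python can run other programs, including
     blast, and capture the output
    import os

    command = r'bin\blastall.exe -p blastp -i query.fasta -d db\yeast.aa'

    # run blastall and read its text output line by line
    result_handle = os.popen(command)

    for line in result_handle:
        # the header line naming the query sequence
        if line.startswith('Query='):
            print('\n' + line.rstrip() + '\n')
        # one-line hit summaries for RefSeq entries
        if line.startswith('ref|'):
            print(line.rstrip())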
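In current Python, the subprocess module is the more commonly recommended way to run an external program and capture its output. The sketch below does the same thing as the slide's example, with the line filtering split into its own function so it can be tried on saved output. The names summary_lines, run_blast, and COMMAND are hypothetical; the command and the 'Query=' and 'ref|' prefixes are taken from the slide.

```python
import subprocess


def summary_lines(lines):
    """Keep the query header and the one-line hit summaries."""
    kept = []
    for line in lines:
        if line.startswith("Query="):
            kept.append("\n" + line.rstrip() + "\n")
        elif line.startswith("ref|"):
            kept.append(line.rstrip())
    return kept


def run_blast(command):
    """Run a command given as an argument list; return its output lines."""
    done = subprocess.run(command, capture_output=True, text=True, check=True)
    return done.stdout.splitlines()


# The slide's command, split into an argument list.
COMMAND = [r"bin\blastall.exe", "-p", "blastp",
           "-i", "query.fasta", "-d", r"db\yeast.aa"]
```

With BLAST installed as described, printing each item of summary_lines(run_blast(COMMAND)) from the BLAST folder would reproduce the filtered report.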




Running BLAST from BioPython
    Will automatically format results as XML
    from Bio.Blast import NCBIStandalone

    blast_db    = r'db\yeast.aa'
    blast_query = r'query.fasta'
    blast_exe   = r'bin\blastall.exe'

    result_handle, error_handle = NCBIStandalone.blastall(blast_exe, "blastp",
                                                          blast_db, blast_query)




NCBI Blast Parsing
    Results need to be parsed in order to be useful…
    from Bio.Blast import NCBIXML

    for blast_result in NCBIXML.parse(result_handle):
        for alignment in blast_result.alignments:
            for hsp in alignment.hsps:
                if hsp.expect < 1e-5:
                    print('****Alignment****')
                    print('sequence:', alignment.title)
                    print('length:', alignment.length)
                    print('e value:', hsp.expect)
                    print(hsp.query[0:75] + '...')
                    print(hsp.match[0:75] + '...')
                    print(hsp.sbjct[0:75] + '...')




NCBI Blast Parsing
    Each blast result contains multiple alignments,
     each of the query sequence to one database
     sequence
    Each alignment consists of one or more
     high-scoring pairs (HSPs)
    Each HSP has stats like expect, score, gaps,
     and aligned sequence chunks
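The nesting described above can be sketched with a few small stand-in classes. These are not BioPython's classes; they only mirror the attribute names used in these slides (query, alignments, title, hsps, expect, score). The helper best_hit is hypothetical and picks the database sequence with the lowest e-value among significant HSPs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hsp:
    expect: float   # e-value of this high-scoring pair
    score: float


@dataclass
class Alignment:
    title: str      # description of the database sequence
    hsps: List[Hsp] = field(default_factory=list)


@dataclass
class BlastResult:
    query: str      # description of the query sequence
    alignments: List[Alignment] = field(default_factory=list)


def best_hit(result, max_expect=1e-5):
    """Return (title, expect) for the lowest e-value HSP, or None."""
    best = None
    for alignment in result.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < max_expect and (best is None or hsp.expect < best[1]):
                best = (alignment.title, hsp.expect)
    return best
```

Because best_hit only reads the alignments, hsps, title, and expect attributes, it should also work on the records that NCBIXML.parse yields.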



NCBI Blast Parsing
    Blast parsing skeleton
    from Bio.Blast import NCBIXML

    for blast_result in NCBIXML.parse(result_handle):
        # each blast_result corresponds to one query sequence
        # blast_result.query is query description, etc.
        # blast_result.descriptions contains one-line summary of alignments
        for alignment in blast_result.alignments:
            # each alignment corresponds to one database sequence
            # alignment.title is database description
            for hsp in alignment.hsps:
                # each query/database alignment consists of multiple
                # high-scoring pair alignment "chunks"
                # HSP statistics are here
                # hsp.expect, hsp.score, hsp.positives, hsp.gaps
                pass  # placeholder so the skeleton runs as written




Lab exercises
    Try each of the examples shown in these
     slides.

    Read through NCBI's documentation for the
     standalone tools.

    Experiment with the different BLAST tools
     (blastn, tblastx, etc.) and the other included
     programs (blastclust, megablast).
Lab exercises
    Find putative fruit fly / yeast orthologs
            Download FASTA file drosph.aa.gz from NCBI
            Download FASTA file yeast.aa.gz from NCBI
            Uncompress and format each FASTA file for BLAST
            Search fruit fly proteins against yeast proteins
            For each fruit fly query, output the best yeast protein with a
             significant HSP
            For each yeast query, output the best fruit fly protein with a
             significant HSP
            Find fruit fly / yeast protein pairs which are mutual best
             hits.
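As a hint for the last step only, mutual best hits can be found by checking each best hit in one direction against the best hit in the other direction. The sketch below assumes the two searches have already been reduced to dictionaries mapping each query ID to its best hit ID; the function name and the IDs in the example are made up for illustration.

```python
def mutual_best_hits(fly_to_yeast, yeast_to_fly):
    """Return sorted (fly, yeast) pairs where each is the other's best hit."""
    pairs = []
    for fly_id, yeast_id in fly_to_yeast.items():
        # keep the pair only if the yeast protein points back at this fly protein
        if yeast_to_fly.get(yeast_id) == fly_id:
            pairs.append((fly_id, yeast_id))
    return sorted(pairs)


# Made-up IDs: flyB's best hit is yeastX, but yeastX's best hit is flyA.
example_fly = {"flyA": "yeastX", "flyB": "yeastX", "flyC": "yeastY"}
example_yeast = {"yeastX": "flyA", "yeastY": "flyC"}
print(mutual_best_hits(example_fly, example_yeast))
```

Building the two dictionaries from the BLAST output is the part left to the exercise.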