Bio-Perl

 Han, Mi Ryung
  Aug,14,2003
• Why Bioperl?

• Many problems in computational biology are text-processing
  problems (string matching, file parsing, etc.).

• Perl has no standard modules for computational biology and
  genomic analysis, so different people developed their own programs
  and their own file formats.

• Bioperl is a set of well-documented and freely available Perl
  modules.
• Bioperl provides Perl modules that facilitate writing Perl scripts for
  sequence manipulation, accessing databases in a range of data
  formats, and running and parsing the results of various molecular
  biology programs, including Blast, clustalw, genscan and ESTscan.

• Each module represents one or more objects. The core objects of
  Bioperl are:
• 1) Sequence -- an object that represents a single DNA, RNA or
  protein sequence;
• 2) Sequence Alignment;
• 3) BLAST -- can execute commands to generate BLAST reports
  and helps analyze them;
• 4) Alignment Factories -- implement one of the alignment
  algorithms;
• 5) 3D Structure -- encapsulates coordinate data for 3D structures.
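
• As a minimal sketch of the Sequence object listed above, a Bio::Seq can be built directly in memory and queried. The 12-base sequence and the id example_1 are hypothetical illustration data, not part of the original slides, and the script assumes Bioperl is installed.

```perl
use strict;
use warnings;
use Bio::Seq;

# Hypothetical example data: a 12-base DNA fragment, not from the slides.
my $seq = Bio::Seq->new(
    -seq        => 'ATGGCCAAGTAA',
    -display_id => 'example_1',
    -alphabet   => 'dna',
);

print "ID:      ", $seq->display_id,     "\n";
print "Length:  ", $seq->length,         "\n";
print "First 3: ", $seq->subseq(1, 3),   "\n";   # positions are 1-based, inclusive
print "Revcom:  ", $seq->revcom->seq,    "\n";   # revcom returns a new sequence object
print "Protein: ", $seq->translate->seq, "\n";   # translate also returns a new object
```

• revcom and translate return new sequence objects instead of modifying the original, so calls can be chained as shown.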
• [ Contents ]

•   <Sequences>
•   -Seq Obj. ; created when a sequence data file is read;
•               stores annotation and sequence features
•   -PrimarySeq Obj. ; a basic version of the Seq Obj. without annotation
•   -LocatableSeq Obj. ; stores position information for a Seq Obj.;
•                        used by the Alignment Obj.
•   -LargeSeq Obj. ; used to handle very long sequences (over 100 Mb)
•   -LiveSeq Obj. ; used to handle the problem of feature locations
•                    in sequence information that can change
•                    over time
•   <Alignments>
•   -SimpleAlign Obj. ; used to read or write multiple alignment formats
•   -UnivAln Obj. ; functions such as removing gaps and excluding sequences by consensus score
•   <Analysis>
•   -StandAloneBlast Obj. ; runs Blast locally
•   -RemoteBlast Obj. ; runs Blast remotely
•   -Genscan Obj. ; parses Genscan result files
•   <Databases>
•   -Index Obj. ; builds an index for use as a local database
•   -DB Obj. ; accesses and searches external databases
• BioPerl API
• http://doc.bioperl.org/releases/bioperl-1.2.2/
• Tutorial
• http://www.bioperl.org/Core/Latest/bptutorial.html
• Bioperl course
• http://www.pasteur.fr/recherche/unites/sis/formation/bioperl/
• Bioperl open source code
• http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi
• The Bioperl release 1.2.2 is available at
  http://www.bioperl.org/DIST/bioperl-1.2.2.tar.gz
• $> tar xvfz bioperl-1.2.2.tar.gz
• $> cd bioperl-1.2.2
• $> perl Makefile.PL
• $> make
• $> make install
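
• After installing, it can help to confirm that Perl finds the modules used in these slides. The following is a small sketch in plain Perl; the helper name module_available is made up for this example, and the module list is simply the set of modules named in this document.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, else 0.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::SeqIO -> Bio/SeqIO.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Modules referred to in these slides.
for my $module (qw(Bio::Seq Bio::SeqIO Bio::Tools::SeqStats
                   Bio::Tools::Run::RemoteBlast)) {
    printf "%-30s %s\n", $module,
        module_available($module) ? 'found' : 'MISSING';
}
```

• A MISSING line usually means the install step did not complete or that the modules went into a directory that is not on Perl's @INC path.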

• Bioperl-run is a collection of modules that wrap bioinformatics
  applications to allow running them from bioperl.
• http://www.bioperl.org/DIST/bioperl-run-1.2.2.tar.gz
• <Sequences>

• Transforming sequence files (SeqIO)

• SeqIO can read a stream of sequences - located in a single or in
  multiple files - in a number of formats: Fasta, EMBL, GenBank,
  Swissprot, or raw (plain sequence), etc.

• Once the sequence data has been read in with SeqIO, it is
  available to bioperl in the form of Seq objects. The Seq objects
  can then be written to another file (again using SeqIO) in any of
  the supported data formats, which makes data converters simple
  to implement.

• $perldoc Bio::SeqIO
• <Change file format / Get the sequence data information>
  use strict;
  use warnings;
  use Bio::SeqIO;
  use Bio::Tools::SeqStats;    # provides count_monomers, used below

  # Read Swiss-Prot entries from ex1.in and write them to ex1.out as FASTA.
  my $in  = Bio::SeqIO->new(-file => 'ex1.in',   -format => 'swiss');
  my $out = Bio::SeqIO->new(-file => '>ex1.out', -format => 'fasta');
  while (my $seq = $in->next_seq()) {
      $out->write_seq($seq);
      print "ACC: ", $seq->accession_number, "ID: ", $seq->display_id, "\n";
      print "DESC: ", $seq->desc, "\n";
      print "SUB_Seq: ", $seq->subseq(1, 50), "\n";    # residues 1 to 50
      # Count how often each residue occurs in the sequence.
      my $seq_stats = Bio::Tools::SeqStats->new(-seq => $seq);
      my $hash = $seq_stats->count_monomers();
      foreach my $hash_key (keys %$hash) {
          print "$hash_key = ", $hash->{$hash_key}, "\n";
      }
  }
• < Input >

•   SQ SEQUENCE 924 AA; 103625 MW; 289B4B554A61870E CRC64;
•      MKFLILLFNI LCLFPVLAAD NHGVGPQGAS GVDPITFDIN SNQTGPAFLT AVEMAGVKYL
•      QVQHGSNVNI HRLVEGNVVI WENASTPLYT GAIVTNNDGP YMAYVEVLGD PNLQFFIKSG
•      DAWVTLSEHE YLAKLQEIRQ AVHIESVFSL NMAFQLENNK YEVETHAKNG ANMVTFIPRN
•      GHICKMVYHK NVRIYKATGN DTVTSVVGFF RGLRLLLINV FSIDDNGMMS NRYFQHVDDK
•      YVPISQKNYE TGIVKLKDYK HAYHPVDLDI KDIDYTMFHL ADATYHEPCF KIIPNTGFCI
•      TKLFDGDQVL YESFNPLIHC INEVHIYDRN NGSIICLHLN YSPPSYKAYL VLKDTGWEAT
•      THPLLEEKIE ELQDQRACEL DVNFISDKDL YVAALTNADL NYTMVTPRPH RDVIRVSDGS
•      EVLWYYEGLD NFLVCAWIYV SDGVASLVHL RIKDRIPANN DIYVLKGDLY WTRITKIQFT
•      QEIKRLVKKS KKKLAPITEE DSDKHDEPPE GPGASGLPPK APGDKEGSEG HKGPSKGSDS
•      SKEGKKPGSG KKPGPAREHK PSKIPTLSKK PSGPKDPKHP RDPKEPRKSK SPRTASPTRR
•      PSPKLPQLSK LPKSTSPRSP PPPTRPSSPE RPEGTKIIKT SKPPSPKPPF DPSFKEKFYD
•      DYSKAASRSK ETKTTVVLDE SFESILKETL PETPGTPFTT PRPVPPKRPR TPESPFEPPK
•      DPDSPSTSPS EFFTPPESKR TRFHETPADT PLPDVTAELF KEPDVTAETK SPDEAMKRPR
•      SPSEYEDTSP GDYPSLPMKR HRLERLRLTT TEMETDPGRM AKDASGKPVK LKRSKSFDDL
•      TTVELAPEPK ASRIVVDDEG TEADDEETHP PEERQKTEVR RRRPPKKPSK SPRPSKPKKP
•      KKPDSAYIPS ILAILVVSLI VGIL
•   //
• < Output >
•   ACC: P15711ID: 104K_THEPA
•   DESC: 104 kDa microneme-rhoptry antigen.
•   SUB_Seq:MKFLILLFNILCLFPVLAADNHGVGPQGASGVDPITFDINSNQTGPAFLT
•   N = 35
•   P = 100
•   Q = 17
•   A = 46
•   R = 46
•   S = 72
•   C = 8
•   T = 61
•   D = 61
•   E = 64
•   V = 56
•   F = 33
•   W = 6
•   G = 47
•   H = 26
•   Y = 32
•   I = 48
•   K = 84
•   L = 68
•   M = 14
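
• The counts above appear in Perl's hash order, which is not alphabetical and can differ between runs. The sketch below shows the same kind of tally in plain Perl with sorted keys. It only illustrates what is being counted and is not the Bio::Tools::SeqStats implementation; count_residues is a name made up for this example.

```perl
use strict;
use warnings;

# Tally each residue letter in a sequence string (plain Perl, no Bioperl).
sub count_residues {
    my ($sequence) = @_;
    my %count;
    $count{ uc $_ }++ for grep { /[A-Za-z]/ } split //, $sequence;
    return \%count;
}

# First 20 residues of the example entry (P15711) shown in the input above.
my $counts = count_residues('MKFLILLFNILCLFPVLAAD');

# Sort the keys so the report order is the same on every run.
for my $residue (sort keys %$counts) {
    print "$residue = $counts->{$residue}\n";
}
```

• The same sort keys idiom can be applied to the hash reference returned by count_monomers in the script above.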
• <Analysis>

• Running BLAST (using RemoteBlast.pm)

• Bioperl supports remote execution of blasts at NCBI by
  means of the RemoteBlast object.

• $perldoc Bio::Tools::Run::RemoteBlast
•   <Run a BLAST search, then report the results that meet a condition>
    use strict;
    use warnings;
    use Bio::SeqIO;
    use Bio::Tools::Run::RemoteBlast;

    my $factory = Bio::Tools::Run::RemoteBlast->new(
        -prog => 'blastp', -data => 'nr', -expect => '1e-10');
    my $seq_in = Bio::SeqIO->new(-file => 'ex3.in', -format => 'fasta');
    while (my $query = $seq_in->next_seq()) {
        $factory->submit_blast($query);
        # Poll NCBI until every submitted request ID (RID) has been handled.
        while (my @rids = $factory->each_rid) {
            foreach my $rid (@rids) {
                my $rc = $factory->retrieve_blast($rid);
                if (!ref($rc)) {
                    # Not a result object: negative means failure, otherwise not ready yet.
                    if ($rc < 0) { $factory->remove_rid($rid); }
                    sleep 5;    # pause before polling again
                }
                else {
                    my $result = $rc->next_result();
                    $factory->remove_rid($rid);
                    while (my $hit = $result->next_hit()) {
                        # Report only hits at least 200 residues long.
                        if ($hit->length() >= 200) {
                            # The value printed under this label is the hit length;
                            # $hit->name would give the hit's name instead.
                            print "Hit Name : ", $hit->length(), "\n";
                            while (my $hsp = $hit->next_hsp()) {
                                print " E_value : ", $hsp->evalue();
                                print " Frac_identical : ", $hsp->frac_identical(), "\n";
                            }
                        }
                    }
                }
            }
        }
    }
• < Input >
•   >gi|112670|sp|P15711|104K_THEPA 104 KD MICRONEME-RHOPTRY ANTIGEN gi|320972|pir||A44945 104K microneme-rhoptry protein - Theileria parva gi|161866|gb|AAA18217.1| (M29954) microneme-rhoptry antigen [Theileria parva]
•   MKFLILLFNILCLFPVLAADNHGVGPQGASGVDPITFDINSNQTGPAFLTAVEMAGVKYL
•   QVQHGSNVNIHRLVEGNVVIWENASTPLYTGAIVTNNDGPYMAYVEVLGDPNLQFFIKSG
•   DAWVTLSEHEYLAKLQEIRQAVHIESVFSLNMAFQLENNKYEVETHAKNGANMVTFIPRN
•   GHICKMVYHKNVRIYKATGNDTVTSVVGFFRGLRLLLINVFSIDDNGMMSNRYFQHVDDK
•   YVPISQKNYETGIVKLKDYKHAYHPVDLDIKDIDYTMFHLADATYHEPCFKIIPNTGFCI
•   TKLFDGDQVLYESFNPLIHCINEVHIYDRNNGSIICLHLNYSPPSYKAYLVLKDTGWEAT
•   THPLLEEKIEELQDQRACELDVNFISDKDLYVAALTNADLNYTMVTPRPHRDVIRVSDGS
•   EVLWYYEGLDNFLVCAWIYVSDGVASLVHLRIKDRIPANNDIYVLKGDLYWTRITKIQFT
•   QEIKRLVKKSKKKLAPITEEDSDKHDEPPEGPGASGLPPKAPGDKEGSEGHKGPSKGSDS
•   SKEGKKPGSGKKPGPAREHKPSKIPTLSKKPSGPKDPKHPRDPKEPRKSKSPRTASPTRR
•   PSPKLPQLSKLPKSTSPRSPPPPTRPSSPERPEGTKIIKTSKPPSPKPPFDPSFKEKFYD
•   DYSKAASRSKETKTTVVLDESFESILKETLPETPGTPFTTPRPVPPKRPRTPESPFEPPK
•   DPDSPSTSPSEFFTPPESKRTRFHETPADTPLPDVTAELFKEPDVTAETKSPDEAMKRPR
•   SPSEYEDTSPGDYPSLPMKRHRLERLRLTTTEMETDPGRMAKDASGKPVKLKRSKSFDDL
•   TTVELAPEPKASRIVVDDEGTEADDEETHPPEERQKTEVRRRRPPKKPSKSPRPSKPKKP
•   KKPDSAYIPSILAILVVSLIVGIL
• < Output >

•   Hit Name: 924
•   E_value: 0.0Frac_identical: 0.779120879120879
•   Hit Name: 255
•   E_value: 2e-66Frac_identical: 0.568627450980392
•   Hit Name: 255
•   E_value: 5e-66Frac_identical: 0.564705882352941
•   Hit Name: 255
•   E_value: 9e-66Frac_identical: 0.568627450980392
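
• Frac_identical in the output above is the fraction of positions in an HSP where query and hit carry the same residue. A plain-Perl sketch of that ratio follows, assuming two aligned rows of equal length with '-' marking gaps and the alignment length as the denominator. The function name and the toy alignment are hypothetical; Bioperl's own frac_identical can also be computed relative to the query or hit length, which this sketch does not cover.

```perl
use strict;
use warnings;

# Fraction of alignment columns where both rows carry the same residue.
# Gap columns ('-') count toward the length but never as identical.
sub fraction_identical {
    my ($query_row, $hit_row) = @_;
    die "aligned rows must have equal length\n"
        unless length($query_row) == length($hit_row);
    return 0 unless length $query_row;
    my $identical = 0;
    for my $i (0 .. length($query_row) - 1) {
        my $q = substr($query_row, $i, 1);
        my $h = substr($hit_row,   $i, 1);
        $identical++ if $q ne '-' && $q eq $h;
    }
    return $identical / length($query_row);
}

# Hypothetical toy alignment, not taken from the BLAST run above.
printf "%.3f\n", fraction_identical('MKFLIL-FNI', 'MKYLILLFNV');
```

• Seven of the ten columns in the toy alignment match, so the sketch reports 0.700.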

				