Bioperl modules by ccp13003

 Object Oriented Programming in Perl (1)

• Defining a class
   – A class is simply a package with subroutines that function
     as methods.


                 #!/usr/local/bin/perl
                 package Cat;
                 sub new {
                 …
                 }
                 sub meow {
                 …
                 }
Object Oriented Programming in Perl (2)


    Perl Object
         To instantiate an object from a class, call the class's “new” method.



                   $new_object = new ClassName;


 Using Method
 
      To use the methods of an object, use the “->” operator.

                   $cat->meow();
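
The three slides so far can be put together in one small script. This is only a minimal sketch: the "name" attribute and the text returned by meow are made up for illustration and are not part of the original slides. Note that Cat->new(...) is the arrow form of the "new ClassName" call shown above.

```perl
use strict;
use warnings;

package Cat;

# Constructor: bless a hash reference into the class.
# The "name" attribute is a hypothetical example field.
sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name} // 'unnamed' };
    return bless $self, $class;
}

# An ordinary method: the object arrives as the first argument.
sub meow {
    my ($self) = @_;
    return "$self->{name} says meow";
}

package main;

my $cat = Cat->new(name => 'Tom');   # create the object
print $cat->meow(), "\n";            # call a method with "->"
```

The object is just a blessed hash reference, so any data the methods need can be stored in that hash.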
 Object Oriented Programming in Perl (3)

• Inheritance
   – Declare a class array called @ISA.
      • This array stores the name(s) of the parent class(es) of the new class.



               package NorthAmericanCat;
               @NorthAmericanCat::ISA = ("Cat");
               sub new {
                 …
               }
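
A fuller version of the inheritance slide might look like the sketch below. The "name" and "region" fields and the habitat method are assumptions added for illustration; only the package names and the @ISA declaration come from the slide.

```perl
use strict;
use warnings;

package Cat;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name} }, $class;
}

sub meow {
    my ($self) = @_;
    return "$self->{name} says meow";
}

sub habitat { return 'anywhere' }

package NorthAmericanCat;

# Declare the parent class; method lookup falls back to Cat.
our @ISA = ('Cat');

# Extend the parent constructor, then add a field of our own.
sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(%args);
    $self->{region} = 'North America';   # hypothetical extra field
    return $self;
}

# Override one method; meow is inherited unchanged.
sub habitat {
    my ($self) = @_;
    return $self->{region};
}

package main;

my $lynx = NorthAmericanCat->new(name => 'Lynx');
print $lynx->meow(), "\n";      # inherited from Cat
print $lynx->habitat(), "\n";   # overridden in NorthAmericanCat
```

In newer code the same relationship is often declared with "use parent", but the @ISA array is what Perl consults in either case.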
                 Perl Modules



A Perl module is a reusable package defined in a
library file whose name is the same as the name of the
package.
                 Names of perl modules

• Each Perl module has a unique name.
• To minimize name space collision, Perl provides a
  hierarchical name space for modules.
   – Components of a module name are separated by double
     colons (::).
   – For example,
      •   Math::Complex
      •   Math::Approx
      •   String::BitCount
      •   String::Approx
                       Module files

• Each module is contained in a single file.
• Module files are stored in a subdirectory hierarchy that
  parallels the module name hierarchy.
• All module files have an extension of .pm.


             Module             Is stored in
             Config             Config.pm
             Math::Complex      Math/Complex.pm
             String::Approx     String/Approx.pm
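
The mapping in the table is mechanical, so it can be sketched in a few lines. The function name module_to_path is a hypothetical helper, not something Perl provides.

```perl
use strict;
use warnings;

# Turn a module name into the relative file path Perl looks for:
# every "::" becomes a directory separator and ".pm" is appended.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

for my $module (qw(Config Math::Complex String::Approx)) {
    printf "%-16s %s\n", $module, module_to_path($module);
}
```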
                     Module libraries

• The Perl interpreter has a list of directories in which it searches
  for modules.
• This list is kept in the global array @INC.


            >perl -V
            @INC:
            /usr/local/lib/perl5/5.00503/sun4-solaris
            /usr/local/lib/perl5/5.00503
            /usr/local/lib/perl5/site_perl/5.005/sun4-solaris
            /usr/local/lib/perl5/site_perl/5.005
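
The same list can be inspected from inside a script. The sketch below prints @INC and then searches it for a module file, which is roughly what Perl does when a module is loaded; find_module is a hypothetical helper written for this example.

```perl
use strict;
use warnings;

# Print the search path. @INC can also hold code references (hooks),
# so only plain directory names are shown.
print "$_\n" for grep { !ref } @INC;

# Return the first file in @INC that matches the module name,
# or nothing if the module is not installed.
sub find_module {
    my ($module) = @_;
    (my $relative = "$module.pm") =~ s{::}{/}g;
    for my $dir (grep { !ref } @INC) {
        my $file = "$dir/$relative";
        return $file if -f $file;
    }
    return;
}

for my $module (qw(strict Bio::SeqIO)) {
    my $file = find_module($module);
    print "$module: ", (defined $file ? $file : 'not found in @INC'), "\n";
}
```

A directory can be added to the front of the list with "use lib", for example when modules are installed in a personal folder.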
               Using Modules

• A module can be loaded by calling the use
  function.
     use Foo;
     bar( "a" ); # calls bar, imported from Foo
     blat( "b" ); # calls blat, imported from Foo
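
Foo is only a placeholder name, so the snippet above cannot be run as it stands. A runnable sketch of the same idea, using the core module List::Util in place of Foo (the numbers are made-up sequence lengths):

```perl
use strict;
use warnings;

# Load the module and import two of its functions by name.
use List::Util qw(sum max);

my @lengths = (120, 450, 87);

# Imported functions are called like built-ins.
print "total:   ", sum(@lengths), "\n";
print "longest: ", max(@lengths), "\n";

# A function that was not imported is still reachable by its full name.
print "shortest: ", List::Util::min(@lengths), "\n";
```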
                    Bioperl toolkit
• Core package (bioperl-live)
   – The basic package; it is required by all the other packages
• Run package (bioperl-run)
   – Provides wrappers for executing some 60 common
     bioinformatics applications
• DB package (bioperl-db)
   – Subproject to store sequence and annotation data in a BioSQL
     relational database
• Network package (bioperl-network)
   – Parses and analyzes protein-protein interaction data
• Dev package (bioperl-dev)
   – New and exploratory bioperl development
      Bioperl Object-Oriented

• Bioperl takes advantage of object-oriented
  design to create a consistent, well-documented
  object model for interacting with biological
  data in the life sciences.
• Bioperl Name space
    The Bioperl package installs everything in the
    Bio:: namespace.
    (Where are the packages stored?)
              Bioperl Objects
• Sequence handling objects
  – Sequence objects
  – Alignment objects
  – Location objects
• Other Objects:
    3D structure objects, tree objects and phylogenetic
    trees, map objects, bibliographic objects and
    graphics objects
            Sequence handling
• Typical sequence handling tasks:
  – Access the sequence
  – Format the sequence
  – Sequence alignment and comparison
     • Search for similar sequences
     • Pairwise comparisons
     • Multiple alignment
         Sequence Annotation
• Bio::SeqFeature: a sequence object can have
  multiple sequence feature (SeqFeature) objects
  (e.g. Gene, Exon, or Promoter objects)
  associated with it.
• Bio::Annotation: a Seq object can also have an
  Annotation object (used to store database
  links, literature references and comments)
  associated with it.
      Sequence Input/Output
The Bio::SeqIO system was designed to make
getting and storing sequences to and from the
myriad of formats as easy as possible.
     Accessing sequence data
– Bioperl supports accessing remote databases as
  well as local databases.
– Bioperl currently supports sequence data retrieval
  from the GenBank, GenPept, RefSeq, SwissProt,
  and EMBL databases.
          Format the sequences

• A SeqIO object can read a stream of sequences in one
  format (Fasta, EMBL, GenBank, Swissprot, PIR,
  GCG, SCF, phd/phred, Ace, or raw (plain sequence))
  and then write them to another file in another format.
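
The read-then-write pattern can be sketched as follows. This example assumes BioPerl is installed (it is checked at run time) and works on in-memory text instead of files so that it is self-contained; the function convert_format and the sample record are made up for illustration.

```perl
use strict;
use warnings;

# BioPerl is not part of core Perl, so check for it at run time.
my $have_bioperl = eval { require Bio::SeqIO; 1 };

# Read every sequence from $text in one format and write it in another.
# In-memory filehandles stand in for the input and output files.
sub convert_format {
    my ($text, $in_format, $out_format) = @_;
    my $converted = '';
    open my $in_fh,  '<', \$text      or die "Cannot read input: $!";
    open my $out_fh, '>', \$converted or die "Cannot write output: $!";

    my $reader = Bio::SeqIO->new(-fh => $in_fh,  -format => $in_format);
    my $writer = Bio::SeqIO->new(-fh => $out_fh, -format => $out_format);

    while (my $seq = $reader->next_seq) {
        $writer->write_seq($seq);
    }
    close $out_fh;
    return $converted;
}

my $fasta = ">seq1 made-up example record\nATGGCCAAGTAA\n";

if ($have_bioperl) {
    print convert_format($fasta, 'fasta', 'genbank');
}
else {
    print "BioPerl is not installed; skipping the demonstration.\n";
}
```

With real files, the same two objects would be created with -file => 'name' for reading and -file => '>name' for writing.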
      Manipulating sequence data

$seqobj->display_id()   # the human readable id of the sequence
$seqobj->subseq(5,10)   # part of the sequence as a string
$seqobj->desc()         # a description of the sequence
$seqobj->trunc(5,10)    # truncation from 5 to 10 as new object
$seqobj->revcom         # reverse complements sequence
$seqobj->translate      # translation of the sequence
…
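
A small worked example of these calls, assuming BioPerl is installed (checked at run time). The sequence, its identifier and its description are made up, and summarize_sequence is a hypothetical helper.

```perl
use strict;
use warnings;

# BioPerl is not part of core Perl, so check for it at run time.
my $have_bioperl = eval { require Bio::Seq; 1 };

# Build a sequence object and collect the results of the methods above.
sub summarize_sequence {
    my ($dna) = @_;
    my $seqobj = Bio::Seq->new(
        -display_id => 'demo_seq',                  # made-up identifier
        -desc       => 'made-up example sequence',
        -alphabet   => 'dna',
        -seq        => $dna,
    );
    return {
        id      => $seqobj->display_id(),
        desc    => $seqobj->desc(),
        subseq  => $seqobj->subseq(5, 10),          # returns a plain string
        trunc   => $seqobj->trunc(5, 10)->seq(),    # returns an object; seq() gives its string
        revcom  => $seqobj->revcom->seq(),
        protein => $seqobj->translate->seq(),
    };
}

if ($have_bioperl) {
    my $summary = summarize_sequence('ATGGCCAAGTAA');
    print "$_: $summary->{$_}\n" for sort keys %$summary;
}
else {
    print "BioPerl is not installed; skipping the demonstration.\n";
}
```

Positions are counted from 1 and both ends are included, so subseq(5,10) covers six bases.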
       Search result parsing
The Bio::SearchIO system was designed for
parsing sequence database searches (BLAST,
sim4, waba, FASTA, HMMER, exonerate,
etc.)
      Manipulating alignment
The Bio::AlignIO system was designed for
manipulating the alignment objects in different
formats including aln, phylip, fasta, etc.
Example: Format the sequences


Use “seq_formating.pl” to convert “sequences.gb”
to another format.
                                Copy the files to the current directory




             Check whether the files are executable




Now, let’s look at the GenBank file.
The home directory on a Windows system.




                  If you have Notepad++ installed, click
                  “Edit with Notepad++”.

                  If not, try to open “sequences.gb” with
                  Notepad.
uncheck
The format of the input sequences.
The perl script file
If no arguments are supplied, usage
information appears with instructions.
                                   <enter>

The arguments, in order:
  program name, input file, format of the input sequences,
  output file, format of the output sequences

Program succeeded!

Now it’s time to look at the file generated.
Use ‘command prompt’ to run the script
Type:
cd<space>c:\BioDownload

To enter the BioDownload folder
Type:
dir

To display the files in the current folder (NOT ls)

You should have the following files in the folder
(you may have other files, but that’s fine):
(1) seq_formating.pl
(2) sequences.gb.txt
Type:
perl<space>seq_formating.pl<space>sequences.gb.txt<space>genbank<space>sequences.fasta<space>fasta
Output file
The format of the output sequences.
What’s next:


      Parsing the BLAST output

								