Perl and BioPerl


      Craig A. Struble, Ph.D.
       Marquette University
Overview
   Perl
       Literals
       Variables
       Control Structures
       Miscellaneous
   BioPerl
       Sample Programs
   References

                 Perl and BioPerl - Craig A. Struble   2
Perl
   Practical Extraction and Report
    Language
       Created by Larry Wall
   Runs on just about every platform
       Most popular on Unix/Linux systems
   Excellent language for file and data
    processing

  Simple Program
#!/usr/local/bin/perl
# On Unix, the line above gives the location of the Perl interpreter.
# This is a comment line. Comments start with # and end with the end
# of the line. This program prints "Hello world." to the screen.

# \n is the newline character; statements are terminated with semicolons.
print "Hello world.\n";




Literal Data
   Strings
    "Hello world\n"   # double quotes: \n is printed as a newline
    'Hello world\n'   # single quotes: \n is printed as \n
   Numbers
    123       #   integer
    456.789   #   real or floating point
    23.45e8   #   scientific notation
    0xABC12   #   hexadecimal
    0377      #   octal
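
The difference between the two quoting styles, and the fact that hexadecimal and octal literals are ordinary integers, can be checked with a short sketch. The variable names are illustrative and not part of the original slides.

```perl
use strict;
use warnings;

my $double = "Hello world\n";   # \n becomes a single newline character
my $single = 'Hello world\n';   # \n stays as a backslash followed by n

print length($double), "\n";    # 11 letters and spaces plus 1 newline
print length($single), "\n";    # 11 letters and spaces plus 2 characters

# Hexadecimal and octal literals are other ways to write integers.
my $hex   = 0xFF;               # 255 in decimal
my $octal = 0377;               # also 255 in decimal
print "Same value\n" if $hex == $octal;
```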

Variables
   Scalars
       Store a single value
       Variables start with a $
         # Declare variables (optional unless "use strict" is in effect)
         my ($a, $hello, $ou812, $hi_there);   # parentheses are needed to declare a list

         $hello = "Hello World\n"; # assign
         print $hello; # print value
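
A scalar holds a single value, but that value can be used as a number or as a string depending on the operator applied to it. The following is a minimal sketch with illustrative variable names.

```perl
use strict;
use warnings;

my $count = "42";         # assigned as a string
my $sum   = $count + 8;   # + treats its operands as numbers
my $text  = $count . 8;   # . treats its operands as strings

print "$sum $text\n";     # prints "50 428"
```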


Variables
   Arrays
       Store multiple values, indexed by integers
        starting at 0
       A whole array variable starts with @
       Single elements are referred to with $ and
        [] for the index
         my @anarray;
         my @x = (1, 2, 3);
         $x[0] #contains 1
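
A few common array operations are sketched below. The built-in functions push, shift, and scalar are standard Perl; the variable names are illustrative.

```perl
use strict;
use warnings;

my @x = (1, 2, 3);
push @x, 4;                 # append a value; @x is now (1, 2, 3, 4)
my $count = scalar(@x);     # number of elements
my $last  = $x[-1];         # negative indexes count from the end
my $first = shift @x;       # remove and return the first element

print "count=$count last=$last first=$first\n";
print "remaining: @x\n";    # an array in double quotes is joined with spaces
```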

Variables
   Hashes
       Stores multiple values, indexed by strings
       A whole hash variable starts with %
       Single elements are referred to with $ and {}
         my %date;
         %date = ( "day" => "Monday",
                   "mon" => "September" );
         # print the day followed by a newline
         print $date{"day"} . "\n";   # . is the concatenation operator
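
Hashes are usually processed by key. The sketch below adds, tests, deletes, and iterates over entries; the "mday" key and its value are made up for illustration.

```perl
use strict;
use warnings;

my %date = ( "day" => "Monday", "mon" => "September" );

$date{"mday"} = 6;                          # add a key (illustrative value)
print "day is set\n" if exists $date{"day"};
delete $date{"mday"};                       # remove the key again

# keys returns the keys in no particular order, so sort them first.
my @names = sort keys %date;
foreach my $key (@names) {
    print "$key => $date{$key}\n";
}
```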

Control Structures
   Perl supports the standard control structures
        Syntax is generally similar to C/C++/Java
        while, for, if, foreach
#!/usr/local/bin/perl
# Print out direction from Washington D.C.
# Usage: checkcity city

my $city = $ARGV[0];   # @ARGV holds command line arguments
if ($city eq "New York") {
    print "New York is northeast of Washington D.C.\n";
} elsif ($city eq "Chicago") {
    print "Chicago is northwest of Washington D.C.\n";
} else {
    print "I'm not sure where $city is, sorry.\n";
}


   Control Structures
#!/usr/local/bin/perl
# Print out 0,1,2,3,4,5,6,7,8,9
# in this case, $x is local only to the loop because my is used
for (my $x = 0; $x < 10; $x++) {
    print "$x";
    if ($x < 9) {
        print ",";
    }
}
print "\n";
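
The while loop is listed among the control structures but has no example of its own. A minimal sketch that produces the same output as the for loop, with join used to place the commas, might look like this. The @seen array is an illustrative name.

```perl
use strict;
use warnings;

my $x = 0;
my @seen;
while ($x < 10) {       # the condition is tested before each pass
    push @seen, $x;
    $x++;
}
print join(",", @seen), "\n";   # prints 0,1,2,3,4,5,6,7,8,9
```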




   Control Structures

#!/usr/local/bin/perl
# Demonstrate the foreach loop, which goes through elements
# in an array.

my @users = ("bonzo", "gorgon", "pluto", "sting");

foreach my $user (@users) {
    print "$user is alright.\n";
}




     Functions
        Use sub to create a function.
           No named formal parameters; assign @_ to local subroutine
            variables.

#!/usr/local/bin/perl
# Subroutine for calculating the maximum
sub max {
    my $max = shift(@_);    # shift removes the first value from @_
    foreach my $val (@_) {
        $max = $val if $max < $val; # Notice Perl allows postfix ifs
    }
    return $max;
}

my $high = max(1,5,6,7,8,2,4,9,3,4);
print "High value is $high\n";
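
When a function takes a fixed number of arguments, a common way to assign @_ to local variables is a single list assignment. The clamp function below is a hypothetical example, not part of the original slides.

```perl
use strict;
use warnings;

# Restrict a value to the range from $low to $high.
sub clamp {
    my ($value, $low, $high) = @_;   # copy the arguments into named variables
    return $low  if $value < $low;
    return $high if $value > $high;
    return $value;
}

print clamp(15, 0, 10), "\n";   # prints 10
print clamp(-3, 0, 10), "\n";   # prints 0
print clamp(7, 0, 10), "\n";    # prints 7
```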

      Files
          File handles are used to access files
              open and close functions

#!/usr/local/bin/perl
# Open a file and copy its contents to copy.txt
my $filename = $ARGV[0];
# < indicates read, > indicates write; die reports a failed open
open(MYFILE, "<", $filename) or die "Cannot open $filename: $!";
open(OUTPUT, ">", "copy.txt") or die "Cannot open copy.txt: $!";
while (my $line = <MYFILE>) {   # The <> operator reads a line
    print OUTPUT $line;      # no newline is needed, read from file
}
close MYFILE;                # Parentheses are optional
close OUTPUT;
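
File handles can also be stored in ordinary scalar variables, which keeps them local to a subroutine. The sketch below wraps the copy loop in a hypothetical helper named copy_file; the helper name and its arguments are assumptions made for illustration.

```perl
use strict;
use warnings;

# Copy the file named $source to the file named $target, line by line.
sub copy_file {
    my ($source, $target) = @_;
    open(my $in,  "<", $source) or die "Cannot read $source: $!";
    open(my $out, ">", $target) or die "Cannot write $target: $!";
    while (my $line = <$in>) {
        print $out $line;        # no comma between the handle and the data
    }
    close $in;
    close $out or die "Cannot close $target: $!";
    return;
}

# Copy the file given on the command line, if there is one.
copy_file($ARGV[0], "copy.txt") if @ARGV;
```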



Regular Expressions
   One of Perl’s strengths is pattern
    matching
   Perl’s regular expression language is
    extremely powerful, but can be
    challenging to learn
   Some examples follow…


         Regular Expressions
#!/usr/local/bin/perl
my $filename = $ARGV[0];
open(INPUT, "<", $filename) or die "Cannot open $filename: $!";
while (<INPUT>) {          # Note that the line read is stored in $_
    print "Found Fred.\n" if /Fred/;
    print "Found a Flintstone.\n" if m/(Fred|Wilma|Pebbles) Flintstone/;
    if (/(..):(..):(..)/) {    # match a time, dots match anything except \n
        my $seconds = $3;       # parentheses store matches in $1, $2, $3, ...
        print "There are $seconds seconds.\n";
    }
}
close INPUT;
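
Patterns can also be applied to a named variable with the =~ operator, and a match in list context returns the captured groups directly. The sketch below uses a made-up input string; it also shows substitution with s///.

```perl
use strict;
use warnings;

my $stamp = "Logged at 12:34:56 by Fred";   # illustrative input

# In list context a successful match returns the captured groups.
my ($hours, $minutes, $seconds) = $stamp =~ /(\d\d):(\d\d):(\d\d)/;
print "There are $seconds seconds.\n" if defined $seconds;

# Copy the string, then replace the first occurrence of Fred in the copy.
(my $renamed = $stamp) =~ s/Fred/Wilma/;
print "$renamed\n";
```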


   Comma Separated Value Files

#!/usr/local/bin/perl
# Some simple code demonstrating how to use split and regular
# expressions. This code extracts the values from a CSV file.
my $filename = $ARGV[0];
open(INPUT, "<", $filename) or die "Cannot open $filename: $!";
while (<INPUT>) {
    chomp;                  # Remove terminating newline
    my @values = split /,/; # Split string in $_ where , exists
    print "The first value is: " . $values[0] . "\n";
}
close INPUT;
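
By default, split drops trailing empty fields, which matters when the last columns of a CSV line are blank. The sketch below uses a hypothetical parse_line helper and made-up data. Like the program above, it assumes that fields never contain quoted commas.

```perl
use strict;
use warnings;

# Split one CSV line into its fields, keeping empty trailing fields.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    return split /,/, $line, -1;   # a limit of -1 keeps trailing empty fields
}

my @values = parse_line("alpha,beta,,\n");
print scalar(@values), " fields\n";     # prints "4 fields"
print join("\t", @values), "\n";        # join is the inverse of split
```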




Objects
   Perl supports object-oriented
    programming
   By convention, the constructor is named new
   A class is really a special kind of
    package
   Objects are created with bless


      Example Class Definition
package Critter;

# constructor
sub new {
    my    $objref = {};      # reference to an empty hash
    bless $objref;           # make it an object in Critter class
    return $objref;          # return the reference
}

# Instance method, first parameter is object reference
sub display {
    my $self = shift;    # just to demonstrate
    print "I'm a critter.\n";
}

1; # a module file must end with a true value
# Store this code in Critter.pm

    Example Object Usage

#!/usr/local/bin/perl
use Critter;

my $critter = new Critter;           # create an object (same as Critter->new)
$critter->display;                   # display the object
display $critter;                    # alternative (indirect object) notation
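
Objects usually carry data, which is stored in the blessed hash. The sketch below defines a hypothetical NamedCritter class in the same file as the code that uses it; the class name, the name attribute, and the value "Rex" are illustrative assumptions.

```perl
use strict;
use warnings;

package NamedCritter;

# Constructor: the class name arrives as the first argument.
sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name} || "unnamed" };
    return bless $self, $class;   # the two-argument form also works for subclasses
}

# Accessor: return the name stored in the object's hash.
sub name {
    my $self = shift;
    return $self->{name};
}

package main;

my $critter = NamedCritter->new(name => "Rex");
print "I'm a critter called ", $critter->name, ".\n";
```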




BioPerl
   BioPerl is a collection of Perl classes
    useful for developing bioinformatics
    tools.
   http://www.bioperl.org
   Installed on the student platform




       Example 1
#!/usr/local/bin/perl
# Collect documents from PubMed containing the term "breast cancer"
# and print them.
use Bio::Biblio;

my $biblio = new Bio::Biblio;
my $collection = $biblio->find("breast cancer");

while ($collection->has_next) {       # note the underscores in has_next and get_next
  print $collection->get_next;
}




     Example 2
#!/usr/local/bin/perl
# Get a sequence from RefSeq by accession number
use Bio::DB::RefSeq;


my $gb = new Bio::DB::RefSeq;
my $seq = $gb->get_Seq_by_acc("NM_007304");
print $seq->seq();




     Example 3
#!/usr/local/bin/perl
# Perform various calculations on a sequence
use Bio::Seq;

my $seq = Bio::Seq->new( -seq => 'ATGGGGGTGGTGGTACCCT',
                          -id => 'human_id',
                          -accession_number => 'AL000012',
                       );

print $seq->seq() . "\n";     # print the sequence
print $seq->revcom->seq() . "\n";    # print the reverse complement
print $seq->translate->seq() . "\n"; # print a translation
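
To see what a reverse complement is, the operation can be sketched in plain Perl without BioPerl. This is an illustration of the concept only, not BioPerl's implementation; the function name revcom is reused here for a standalone subroutine and handles only the letters A, C, G, and T.

```perl
use strict;
use warnings;

# Return the reverse complement of a DNA string.
sub revcom {
    my ($dna) = @_;
    my $rc = reverse $dna;           # scalar context reverses the characters
    $rc =~ tr/ACGTacgt/TGCAtgca/;    # swap each base for its complement
    return $rc;
}

print revcom('ATGC'), "\n";          # prints GCAT
print revcom('ATGGGGGTGGTGGTACCCT'), "\n";
```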




References
   Programming Perl by Wall, Christiansen,
    and Schwartz (O’Reilly)
   Learning Perl by Schwartz and Phoenix
    (O’Reilly)
   Beginning Perl for Bioinformatics by
    Tisdall (O’Reilly)
   http://www.perl.com
   http://www.bioperl.org
