Perl What is a Script Script Example Scripts in Unix

What is a Script?

- The term “script” has origins in Unix
    - shell: a program that navigates the file system and launches other programs
    - shell script: a sequence of shell commands saved in a file for later use
- Scripts can launch other programs
    - control inputs (e.g. emulate user keystrokes)
    - parse outputs

References (both from O’Reilly):
    Tisdall, Beginning Perl for Bioinformatics
    Schwartz, Learning Perl

Script Example

- What slides are in a class directory?
    - Keynote slides are in files with names ending in .key
- Shell command:
        % ls ~/classes/410/lectures/*.key
- Shell script (for when your wrist is sprained): put the following in a file named “slides”
        #!/bin/tcsh
        ls $HOME/classes/410/lectures/*.key
- Then to get a list of slides:
        % slides

Scripts in Unix

- For a text file like slides to work as a script in Unix:
    - the file must be marked as executable
        % chmod +x slides
    - the file must be in a directory on your execution path
        % setenv PATH ~/bin:$PATH
    - the first two characters in the file are #!
    - the remainder of the first line is the name of the program that will interpret the script, e.g.
        #! /bin/tcsh
        #! /local/bin/perl
        #! /usr/bin/env perl

Script Parameters

- To be more useful, a script can use parameters:
        #!/bin/tcsh
        ls $HOME/classes/$1/lectures/*.key
- Unix command line arguments are $0, $1, etc
- To get a list of slides with the new script:
        % slides 410

Scripting Languages

- It’s easy to think of useful extensions to this script
    - list only slides that have not been saved as PDFs
    - find Keynote files newer than the corresponding PDF and upload them to a web site
- A scripting language is the right tool for the job
    - conditional execution
    - loops
    - subroutines
    - data structures (arrays, pointers, objects, …)
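
The first extension mentioned above (list only slides that have not been saved as PDFs) needs a loop and a conditional, which is where a scripting language starts to pay off. A minimal Perl sketch follows. It assumes each PDF sits next to its Keynote file with the same base name, and the subroutine name slides_without_pdf is made up for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the .key files in $dir that have no matching .pdf next to them.
# Assumption: "lecture1.key" is considered saved if "lecture1.pdf" exists.
sub slides_without_pdf {
    my ($dir) = @_;
    my @missing;
    foreach my $key (sort glob("$dir/*.key")) {
        (my $pdf = $key) =~ s/\.key$/.pdf/;   # lecture1.key -> lecture1.pdf
        push @missing, $key unless -e $pdf;   # -e tests whether a file exists
    }
    return @missing;
}

# Usage: slides 410   (the course number is the first command line argument)
if (@ARGV) {
    my $course = $ARGV[0];
    print "$_\n" for slides_without_pdf("$ENV{HOME}/classes/$course/lectures");
}
```

Note that glob splits its argument on whitespace, so this sketch would need adjusting for directory names that contain spaces.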

Scripting Languages

- Some popular languages for writing scripts
    - Unix shell command languages (sh, csh, tcsh, ksh, ...)
    - Perl
    - Python
    - Tcl/Tk
    - Ruby
- Perl is very widely used in bioinformatics
    - BioPerl consortium has defined libraries for many common operations
        - [nee “bio widgets”; see also BioPython, BioRuby, etc]

Why Use Scripting Languages?

- Why write complex operations in a scripting language instead of a “real” language like C++ or Java?
- Pro:
    - extensive support for launching, communicating with existing programs
    - portable; widely available on MacOS, Windows, Linux, others
- Con:
    - little or no support for large programs (type checking, modularity, …)
    - cryptic syntax
      (written when I was using Perl a lot -- doesn’t apply to Ruby...)

Quick Intro to Perl

- The next few slides are a very quick introduction to Perl
- Goals:
    - basic Perl literacy
    - cover concepts used in examples at the end of the slides
- To learn more:
    - Learning Perl by Schwartz and Christiansen
    - Perl tutorials and docs at

History of Perl

- Perl (“practical extraction and report language”) was originally developed by a Unix system administrator
- More flexible, powerful than other tools available at the time (sh, awk, grep, …)
- Freely distributed from the beginning
- Major contributions from users, other developers

CPAN

- A big reason for Perl’s popularity: a huge library
    - Comprehensive Perl Archive Network (CPAN)
    - see links at
- Examples of available modules:
    - internet connections (LWP)
    - database connectivity (DBI)
    - CGI support (CGI)
    - many others

Variables

- Simple variables (scalars) have names beginning with a dollar sign
- Variables are not declared*
        $i = 3;
        $i += 4;
        $i = $i * $j;
- The last example sets $i to 0: $j is created and given a default initial value (undefined, which counts as 0 in arithmetic)
        * but learn about “use strict”

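
The “use strict” footnote on the Variables slide is worth a short illustration. The sketch below repeats the three assignments with declarations; under “use strict” an undeclared $j is reported when the script is compiled instead of silently becoming 0. The value given to $j here is made up for the example.

```perl
use strict;
use warnings;

my $i = 3;      # "my" declares a lexical variable
$i += 4;        # $i is now 7
my $j = 2;      # under "use strict", $j must be declared before it is used
$i = $i * $j;   # 14

print "i = $i\n";
```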
Strings

- One of the hallmarks of a scripting language* is that all variables are strings
        $s = "Hello";
        $s .= ", world";                    # dot (.) is the concatenation operator
        $i = 3;
        $i .= " little pigs";
        $s = "23 skidoo";
        $n = (substr($s,0,2) + 3) * 2;      # substr is the substring function
        * Not true in Ruby -- numbers must be converted to String objects

Operations on Strings

- The dot operator concatenates strings
- “Variable interpolation” inserts a value into a string
        $n = length($s);
        print "The phrase has $n letters\n";
- Strings can be compared with eq, ne, lt, etc
        if ($s eq $t) {…}
- Note that “$s lt $t” means “$s comes before $t lexicographically”.
        "11" < "9"      false
        "11" lt "9"     true
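
The difference between numeric and string operators on the two slides above can be checked directly. This is a small sketch using the same values as the slides; the variable names $prefix, $numeric and $lexical are introduced only for the example.

```perl
use strict;
use warnings;

my $s = "23 skidoo";
my $prefix = substr($s, 0, 2);    # "23": two characters starting at offset 0
my $n = ($prefix + 3) * 2;        # the string "23" is used as the number 23

my $numeric = (11 < 9)      ? "true" : "false";   # numeric comparison
my $lexical = ("11" lt "9") ? "true" : "false";   # string comparison: "1" sorts before "9"

print "n = $n\n";
print "11 <  9 is $numeric\n";
print "11 lt 9 is $lexical\n";
```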

Arrays

- Array variables have names beginning with @
        @A = (1, 2, 3);
- Scalar elements of arrays are referred to with $
        for ($i = 0; $i < 3; $i++) {
          $A[$i] += 1;
        }
- An array name in a scalar context refers to the number of elements in the array:
        for ($i = 0; $i < @A; $i++) {
          $A[$i] += 1;
        }

Array Assignment

- An array can be copied to another array
        @A = (1, 2, 3);
        @B = @A;
- Array assignment can implement multiple assignment:
        ($x,$y) = (1,2);

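
The array rules above (scalar context, copying, multiple assignment) can be seen together in a few lines. This is a sketch only; it uses foreach in place of the C-style loop on the slide, and the swap at the end is an extra illustration of multiple assignment.

```perl
use strict;
use warnings;

my @A = (1, 2, 3);
my $count = @A;              # scalar context: number of elements (3)
$_ += 1 foreach @A;          # foreach aliases each element, so @A becomes (2, 3, 4)
my @B = @A;                  # copies the elements; @B is independent of @A
$B[0] = 99;                  # does not change @A
my ($x, $y) = (1, 2);
($x, $y) = ($y, $x);         # multiple assignment also swaps without a temporary

print "count=$count A=(@A) B=(@B) x=$x y=$y\n";
```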
Input

- To read a line from the standard input stream:
        $s = <>;
- To read from a file, open a “file handle”
        open(IN,"worm.fa");
        $def = <IN>;
- Common idiom:
        open(IN,$filename)
          or die "Can't open $filename\n";

Associative Arrays

- An associative array (“hash”) is an array that can be indexed by string
- Associative array names start with %
- Array members are key/value pairs
        %code = ("TTT"=>"F", "TTC"=>"F", "TTA"=>"L", ...);
- To access a single (scalar) element:
        print $code{"TTA"}, "\n";
        $code{"ATT"} = "I";
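
A hash like %code is typically used as a lookup table. The following sketch translates a DNA string one codon at a time. The table is deliberately tiny (only the four codons shown on the slide), and both the subroutine name translate and the choice to print “X” for codons missing from the table are assumptions made for this example.

```perl
use strict;
use warnings;

# A small part of the genetic code: only the codons from the slide.
my %code = ("TTT" => "F", "TTC" => "F", "TTA" => "L", "ATT" => "I");

# Translate a DNA string codon by codon; codons missing from the table become "X".
sub translate {
    my ($dna) = @_;
    my $protein = "";
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        my $codon = substr($dna, $i, 3);
        $protein .= exists $code{$codon} ? $code{$codon} : "X";
    }
    return $protein;
}

print translate("TTTTTAATT"), "\n";
```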

Boolean Expressions

- Perl is similar to C/C++ in the way it handles Boolean expressions
    - 0, (), and “” represent false (as do the string “0” and the undefined value)
    - anything else -- non-zero integers, non-empty arrays, other non-empty strings, etc -- is interpreted as true
- Idiom for reading all the lines from an input stream:
        while ($line = <>) {
        }
    Note: a blank line is not empty: it has one char (the newline)

Regular Expressions

- A common way to extract information from an input stream is to find lines that match a regular expression
- Basic RE syntax:
        $s =~ /pat/
    is true if the pattern occurs somewhere in $s
- Example: print deflines in a FASTA file:
        while ($line = <>) {
          if ($line =~ /^>/) {
            print $line;
          }
        }

Regular Expressions (cont’d)

- When a pattern contains parenthesized groups, Perl saves the text characters that match that part of the pattern:
        $s = "/codon_start=2,";
        if ($s =~ /(\w+)=(\d+)/) {
          print $1, ": ", $2, "\n";
        }
    prints
        codon_start: 2

Regular Expressions (cont’d)

- An alternative is to collect the text values in an array:
        $s = "/codon_start=2,";
        ($a,$b) = ($s =~ /(\w+)=(\d+)/);
        print "$a: $b\n";
- Example: extract Genbank identifiers (GI numbers):
        while ($line = <>) {
          if (($gi) = ($line =~ /gi\|(\d+)/)) {
            print $gi, "\n";
          }
        }

Regular Expressions (cont’d)

- Continuing the example: build a list of sequences indexed by their GI numbers:
        while ($defline = <>) {
          ($gi) = ($defline =~ /gi\|(\d+)/);
          $seqline = <>;
          chomp($seqline);
          $ntseq{$gi} = $seqline;
        }
- Note: chomp($s) deletes the newline at the end of $s

Script Example

- This script
    - sets up parameter files for an application
    - runs the application multiple times
    - parses the output file to extract values
- We’ll look at Perl constructs for interacting with other processes and the outline of this new script

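
The GI-number loop from the regular expression slides can be tried without a real FASTA file by reading from a string. In the sketch below the two records are made up, and the two guard lines (next/last) are additions that skip input which does not follow the defline-then-sequence layout the slide assumes.

```perl
use strict;
use warnings;

# Made-up two-record input; each defline is followed by one sequence line.
my $fasta = <<'END';
>gi|12345|ref|hypothetical_1
ATGGCC
>gi|67890|ref|hypothetical_2
ATGTTT
END

open(my $in, "<", \$fasta) or die "Can't open in-memory file: $!\n";
my %ntseq;
while (my $defline = <$in>) {
    my ($gi) = ($defline =~ /gi\|(\d+)/);   # list context returns the captured group
    next unless defined $gi;                # skip lines that are not deflines
    my $seqline = <$in>;
    last unless defined $seqline;           # defline with no sequence after it
    chomp($seqline);
    $ntseq{$gi} = $seqline;
}
close($in);

print "$_ => $ntseq{$_}\n" for sort keys %ntseq;
```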
Application

- The application controlled by this script is part of a suite of programs named PAML
    - Phylogenetic Analysis by Maximum Likelihood
- The program we want to run is codeml
    - input: pair of sequences
        - in general a set of n > 2 sequences
    - output: estimates of evolutionary parameters
- Note: these slides were made for v3.14
    - options/outputs may have changed since then...

Control File

- The shell command to launch codeml is
        % codeml ctl.txt
    where ctl.txt is the name of a “control file”
- The control file specifies
    - the name of the file containing the sequences to be analyzed
    - the name of the output file
    - parameters

Control File (cont’d)

- Part of a control file (the complete file has 25 lines):
        seqfile = pair01.txt
        outfile = pair01diffs.txt
        verbose = 1             * 1: detailed output, 0: concise
        seqtype = 1             * 1:codons; 2:AAs; 3:codons-->AAs
        icode = 0               * 0:standard 1:mammalian mt [etc]
        fix_kappa = 1           * 1: kappa fixed 0: estimate kappa
        kappa = 1.118           * initial or fixed kappa
- The asterisk is a start-comment character

Output

- The main output values are written to a text file
    - The main output file is pair01diffs.txt
- codeml also writes several additional small text files with a few numbers each
    - 2ML.dS         dS (synonymous diffs per synonymous site)
    - 2ML.dN         dN (nonsynonymous diffs per nonsyn site)
    - rst            miscellaneous results

Wrapper

- A wrapper is a program that provides a new I/O interface
- The functionality of the original program is hidden behind this interface

[Figure: diagram with boxes labeled blastall, brfilter, mldiffs, and codeml]

Specifications for mldiffs

- The wrapper for codeml must:
    - create a control file
    - read sequences from its input stream
    - put sequences to be analyzed in a file (one pair at a time)
    - run codeml
    - parse the output files
    - write the data values on its own output stream

Specifications (cont’d)

- In this project, the same control file can be used for every input pair:
    - each invocation of codeml uses the same parameters (kappa, etc)
    - each new pair of sequences overwrites the previous pair, so we can use the same sequence input file name
    - get the same set of data points from the output files

Specifications (cont’d)

        initialize
        write control file
        for each pair of sequences:
            write pair to sequence file
            launch codeml
            extract values from data files
            write data to output stream

Temporary Files

- It is easy enough to open a file before launching codeml and to delete it later:
        open(ML,"> temp");
        ...
        unlink("temp");
- Use the file handle in print statements:
        print ML "...";

Temporary Files (cont’d)

- Potential hazard:
    - what if temp exists?
    - not unlikely if info from previous run is being saved for some reason
- Trick: the Perl variable $$ is the script’s process ID
        $tempfile = "ml$$";
        open(ML, "> $tempfile");
        ...
        unlink($tempfile);
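
The $$ trick above makes a name collision unlikely but not impossible. An alternative, not used in these slides, is the standard File::Temp module, which creates and opens a file that is guaranteed to be new. A minimal sketch, with a made-up template name and scratch content:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() creates and opens a new file with a unique name
# (each X in the template is replaced by a random character).
my ($fh, $tempfile) = tempfile("mlXXXXXX", TMPDIR => 1);
print $fh "scratch data\n";
close($fh);

my $size = -s $tempfile;    # size in bytes, to show the data was written

unlink($tempfile) or warn "Could not delete $tempfile: $!\n";
```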

Generating the Control File

- A convenient construct in Perl for generating large pieces of text is a “here document”

        $s = <<ENDOFS;
        hello, this is a long
        bit of text with
        \$a interpolated as $a
        ENDOFS

    - the value of $s starts on the next line and continues to the matching end token
    - don’t forget the semicolon
    - string includes newlines
    - variable interpolation allowed
    - quote special characters
    - end token only on this line

Generating the Control File (cont’d)

- Paste a control file template into the Perl script
- Replace constants with script variables
        $control = <<ENDCTL;
        seqfile = $seqfile
        outfile = $diffs
        verbose = 1          * 1: detailed output, 0: concise
        fix_kappa = 1        * 1: kappa fixed 0: estimate kappa
        kappa = $kappa         * initial or fixed kappa
        ...
        ENDCTL

Generating the Control File (cont’d)

- Write the control file to yet another temp file:
        $ctlFileName = "ctl$$.txt";
        open(CTLFILE,"> $ctlFileName");
        print CTLFILE $control;
        close(CTLFILE);
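
The two steps above (fill in a template, then write it to a file) fit together as in the sketch below. It includes only the five template lines shown on the slide, so the result is not a complete codeml control file, and the subroutine name make_control is an assumption for this example. In real Perl source the end token must start in the first column, as it does here.

```perl
use strict;
use warnings;

# Build control file text from a template; the values are passed in by the caller.
sub make_control {
    my ($seqfile, $outfile, $kappa) = @_;
    my $control = <<ENDCTL;
seqfile = $seqfile
outfile = $outfile
verbose = 1          * 1: detailed output, 0: concise
fix_kappa = 1        * 1: kappa fixed 0: estimate kappa
kappa = $kappa       * initial or fixed kappa
ENDCTL
    return $control;
}

my $ctlFileName = "ctl$$.txt";                  # $$ is this script's process ID
open(my $ctl, ">", $ctlFileName) or die "Can't write $ctlFileName: $!\n";
print $ctl make_control("pair01.txt", "pair01diffs.txt", 1.118);
close($ctl);
```

The sketch leaves the control file in place, since a wrapper would normally delete it with unlink only after codeml has run.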

Launching Other Programs (I)

- One way to run another program from a Perl script:
        system($s);
    - the command is executed as if it were run from a shell
    - the program must be in the script user’s path
    - stdin, stdout inherited from the Perl script
    - the Perl script is blocked until the command terminates
- Examples:
        system("date > starttime");

Launching Other Programs (II)

- If you want to capture the output of the other program and import it into your script, use “backticks”:
        $s = `date`;
    - the program must be in the script user’s path
    - the value returned to Perl is whatever was printed to stdout by the command
    - the Perl script’s stdin is inherited by the command
    - the Perl script is blocked until the command terminates

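
To compare system and backticks side by side, the sketch below runs a trivial child process both ways. It uses $^X (the path of the running perl) as a stand-in for “some other program” so that it does not depend on what is installed. The list form of system shown here bypasses the shell, unlike the single-string form on the slide.

```perl
use strict;
use warnings;

# $^X is the path of the running perl; it stands in for another program.
my $status = system($^X, "-e", "exit 0");      # list form: no shell involved
my $output = `$^X -e "print 6 * 7"`;           # backticks capture the child's stdout

print "exit status: $status\n";
print "captured: $output\n";
```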
Launching Other Programs (III)

- A third method for running another program is to open an input filehandle to another process
        open(FH, "blastall -d worm.fa -i worm.fa |");
        while($line = <FH>) {
            # process a line from a BLAST report
        }
- The other process runs concurrently with the Perl script
- Perl’s stdin is passed to the other process

Launching Other Programs (III)

- You can also open an output file handle to a process that will read data from the Perl script
        open(FH, "| mlfilter");
        while(...) {
            print FH $result;
            ...
        }
- The other process runs concurrently with the Perl script
- Perl’s stdout is passed to the other process
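
Reading from another process through a filehandle can be tried without BLAST installed. In this sketch a perl one-liner stands in for blastall and prints three made-up lines. The three-argument-plus-list form of open with "-|" is used in place of the trailing “|” on the slide; both read from the child’s stdout.

```perl
use strict;
use warnings;

# Read from a child process; a perl one-liner stands in for blastall here.
open(my $fh, "-|", $^X, "-e", 'print "hit $_\n" for 1 .. 3')
    or die "couldn't start child: $!\n";
my @lines;
while (my $line = <$fh>) {
    chomp($line);
    push @lines, $line;
}
close($fh) or warn "child exited with status $?\n";

print scalar(@lines), " lines read\n";
```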

Launching Other Programs (III)

- It is not possible to open a file handle for input and output to the same process
        open(FH, "| p |");          # wrong
        print FH "command\n";
        $res = <FH>;
- There is a library module that will do this -- see Learning Perl or another reference

Checking for Errors

- Always check to make sure the other process opened correctly and/or terminated properly
- system($s) returns the exit status of the other process
- NOTE: 0 means “success”
- A common construction:
        if (system("foo") == 0) {
          # foo succeeded
        }
        else {
          print "foo failed\n";
        }

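
One standard module that provides two-way communication with a single process is IPC::Open2; whether it is the module the slide has in mind is an assumption. In the sketch below the child is a perl one-liner that upper-cases its input. The writer is closed before reading so that the child sees end of input and flushes its output, which avoids the deadlock that two-way pipes are prone to.

```perl
use strict;
use warnings;
use IPC::Open2;

# open2 returns the child's process ID and sets up two handles:
# one for reading the child's stdout, one for writing to its stdin.
my $pid = open2(my $from_child, my $to_child, $^X, "-ne", "print uc");
print $to_child "command\n";
close($to_child);                 # end of input lets the child finish and flush
my $res = <$from_child>;
waitpid($pid, 0);                 # collect the child's exit status

print "child replied: $res";
```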
Checking for Errors (cont’d)

- open returns false if the connection fails
        open(FH, "blastall ... |")
          or die "couldn't start BLASTs\n";
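
The value returned by system is a wait status, not the bare exit code: the exit code sits in the high byte, so 0 still means success but a child that exits with code 3 produces 768. A small sketch of unpacking it follows; the helper name run_checked is made up, and termination by a signal (the low bits of the status) is ignored for brevity.

```perl
use strict;
use warnings;

# Run a command and return its exit code, or -1 if it could not be started.
sub run_checked {
    my @cmd = @_;
    my $rc = system(@cmd);
    return -1 if $rc == -1;        # the program could not be launched at all
    return $rc >> 8;               # the exit code is in the high byte of the status
}

my $ok  = run_checked($^X, "-e", "exit 0");
my $bad = run_checked($^X, "-e", "exit 3");
print "ok=$ok bad=$bad\n";
```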
