
Perl Programming for Biologists
A bold experiment into the unknown…


PART 1: Tue Aug 21st 2007
update 8/22/2007


                   Yannick Pouliot, PhD
                   Bioresearch Informationist
                   Lane Medical Library & Knowledge Management Center




                   Lane Medical Library & Knowledge Management Center
                   http://lane.stanford.edu
Class Requirements
   You must
       be registered for this workshop
       have a PC (sort of)
       have a power supply
       have wireless access
       have the admin password to your machine


   Please put your cell phone/pager on vibrate
       No cell calls in class, please 

To Dos
   Close all programs other than IE on your laptop
   Log into virtual room
   YP: log into Safari




To Do - 2

   Please download all class materials from
http://lane.stanford.edu/howto/index.html?id=_2593




Class Focus
   Creating, writing and reading Excel files
   Reformatting data files for input to an
    analysis program
   Writing and reading from a database such as
    MS Access or other locally installed relational
    database, as well as from databases
    available on the Internet.

And remember: Ask LOTS OF QUESTIONS

Cautions

   All examples pertain to MS Office 2003
        Unclear what is to be expected for MS Office 2007
   All contents pertain to Perl 5.x, not 6.x
        V.5 and 6 are NOT compatible
        V.5 is far more common, so this is rarely an issue




So Why Perl?
   Perl = Practical Extraction and Report Language
   Free
   Very widely used
       Especially in biological community
   Very flexible and portable
   Not the only language of this type
       E.g., Python
   Not the absolute easiest
       … but pretty easy
   Not suited for everything
       E.g., for ultra-fast mathematically-oriented code, C is still
        best


Today’s session:

- Installing and understanding what is
required to run Perl

- Understanding the basics of a Perl
program

                    Part 1: Installation




Components to Install & Configure
1.    Perl itself
           More accurately, the Perl interpreter
           We’ll use ActiveState Perl 5.8x (ActivePerl)
                 www.activestate.com/store/freedownload.aspx?prdGuid=81fbce82-6bd5-49bc-a915-08d58c2648ca
2.    Additional Perl modules
           Module = extra functions not part of the interpreter
           Described at Comprehensive Perl Archive Network (CPAN)
3.    Open Perl IDE
           IDE = integrated development environment:
                 Editor  to write/edit your program
                 Debugger  to find bugs
                 A compiler/interpreter  to run your program from within the IDE
           sourceforge.net/project/showfiles.php?group_id=23334&release_id=91440
4.    Configuring the ODBC manager (next week)
           Part of Windows
           Allows different programs to interact with databases on your machine or
            anywhere on the Web via single “doorway”


What is an Interpreter?

   = A program that translates an instruction into
    the computer’s language and executes it
    before proceeding to the next instruction
        = compiled and executed one instruction at a time
   Perl is usually used in interpreted mode
        Can also be compiled once (= faster)



Installing Perl from ActiveState

1.       Go to
         www.activestate.com/store/freedownload.aspx?prdGuid=81fbce82-6bd5-49bc-a915-08d58c2648ca
2.       Select Windows MSI package for Perl 5.8x
3.       Run the installer
            Install under c:\Perl



Installing Additional Perl Modules
The fountain of all things Perl: CPAN
         = Comprehensive Perl Archive Network
         http://www.cpan.org/

   What does a module look like?

   Why modules?

   PPM for downloading & installing modules

   What modules are in MY Perl?




Perl Modules We’ll Be Using

When to install   Name                      Function
8/21/07           File::Copy                manipulating files
8/21/07           File::Find                manipulating files
8/21/07           File::Path                manipulating files
8/21/07           IO::File                  accessing the insides of files
8/21/07           Spreadsheet::WriteExcel   writing into an MS Excel spreadsheet
8/21/07           Spreadsheet::ParseExcel   parsing an MS Excel spreadsheet
8/21/07           Spreadsheet::BasicRead    reading the contents of an MS Excel spreadsheet
8/21/07           Win32::OLE                provides easy access to Windows (e.g., launching Excel)
you do it         DBI                       provides access to relational databases
you do it         DBD::ODBC                 provides access to relational databases
                  URI                       accessing URLs
you do it         LWP::Simple               interacting with a Web site via http
you do it         Array::Unique             returns unique elements of an array
you do it         List::Unique              returns unique elements of a list
you do it         Data::Dumper              dumping data out of a data structure
you do it         Switch                    switch function ("multiple if-else-then")
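
To answer "What modules are in MY Perl?" for the modules in this table, one option is to ask Perl to load each one and report which attempts fail. The following is a minimal sketch, not part of the class materials; `is_installed` is a hypothetical helper name, and the list simply repeats the module names from the table. Anything reported as missing can then be installed with PPM.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Modules named in the table above; edit this list to suit your own needs.
my @wanted = qw(
    File::Copy File::Find File::Path IO::File
    Spreadsheet::WriteExcel Spreadsheet::ParseExcel Spreadsheet::BasicRead
    Win32::OLE DBI DBD::ODBC URI LWP::Simple
    Array::Unique List::Unique Data::Dumper Switch
);

# Return 1 if the named module can be loaded by this Perl, 0 otherwise.
# (is_installed is a hypothetical helper name, not part of Perl itself.)
sub is_installed {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # File::Copy -> File/Copy.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (@wanted) {
    printf "%-26s %s\n", $module, is_installed($module) ? 'installed' : 'MISSING';
}
```

Note that Win32::OLE is Windows-only, so it is expected to show as missing on other systems.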



Why an IDE?
      IDE = integrated development environment:
                Editor  to write/edit your program
                Debugger  to find bugs
                A “runner” (compiler/interpreter)  to run your program from
                 within the IDE
      IDEs provide facilities to facilitate writing &
       debugging
                E.g., automatic code highlighting
     We’ll use Open Perl IDE
      Free, open source, portable
                sourceforge.net/project/showfiles.php?group_id=23334&relea
                 se_id=91440

       IDE: Definition, description


Installing Open Perl IDE
1.  Go to
    sourceforge.net/project/showfiles.php?group_id=23334&release_id=91440
    and download the code
2. Create folder Program Files/OpenPerlIDE
3. Unzip into Program Files/OpenPerlIDE
4. Update Path (under System Properties,
    Advanced, Environment Variables, System
    Variables)
    → this makes it possible to run Open Perl IDE
    from anywhere on your machine…


                        BREAK


Part 2: What does it all do?




Example Short Program

1.      Start Open Perl IDE
2.      Load Simple1.pl
3.      Run Simple1.pl
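
Simple1.pl itself is part of the class download and is not reproduced here. For readers without the file, the following is a stand-in sketch of what a first short Perl program can look like; the sequence and the `gc_fraction` helper are made up for illustration and are not taken from Simple1.pl.

```perl
#!/usr/bin/perl
use strict;     # catch misspelled variable names
use warnings;   # report suspicious constructs

# A made-up DNA sequence, used only for illustration.
my $dna = 'ATGCGCTAAGGC';

# Return the fraction of G and C bases in a sequence (0 for an empty string).
sub gc_fraction {
    my ($seq) = @_;
    return 0 if length($seq) == 0;
    my $gc = ($seq =~ tr/GCgc//);   # tr/// with no replacement just counts
    return $gc / length($seq);
}

print "Sequence:   $dna\n";
print "Length:     ", length($dna), "\n";
printf "GC content: %.1f%%\n", 100 * gc_fraction($dna);
```

Saved as a .pl file, a program like this can be loaded and run from Open Perl IDE in the same way as the steps above.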




Learning by Looking

   Simple2.pl




Exploring Perl’s Major Language Elements
   http://en.wikipedia.org/wiki/Perl#Data_types
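
The three data types that come up most often are scalars, arrays and hashes. The sketch below shows one of each; the gene symbol, the variable names and the `reverse_complement` helper are illustrative only, and the helper assumes its input contains nothing but A, C, G and T.

```perl
use strict;
use warnings;

# Scalar: a single value (number or string); names start with $.
my $gene = 'TP53';

# Array: an ordered list; names start with @, one element is $name[index].
my @bases = ('A', 'C', 'G', 'T');

# Hash: key => value pairs; names start with %, one value is $name{key}.
my %complement = (A => 'T', C => 'G', G => 'C', T => 'A');

# Return the reverse complement of a DNA string using the hash above.
# Assumes the input contains only A, C, G and T (upper or lower case).
sub reverse_complement {
    my ($seq) = @_;
    return join '', map { $complement{$_} } reverse split //, uc $seq;
}

print "Gene symbol: $gene\n";
print "First base:  $bases[0]; number of bases: ", scalar(@bases), "\n";
print "Complement of G: $complement{G}\n";
print "Reverse complement of ATGC: ", reverse_complement('ATGC'), "\n";
```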




Going Further: Programming Tips
   Plan your program
        Write down how you intend to process the data in more-or-less plain
         language
               Goal: making sure that it really does make sense
        Hacking doesn’t really pay…

   Have documentation handy
        eBooks
        ActivePerl documentation (searchable)
        Perl language reference
    → eBooks: help served on a silver platter
        Lane FAQ

   When you’re stuck: Search the Web
        Google can answer almost any programming question
               … though quality documentation is still best




Excel3.pl: Introducing Object Programming
   Purpose: From an Excel worksheet that lists public
    identifiers for DNA sequences associated with
    genes, the program retrieves:
        UniGene cluster ID
        Gene symbol
        NCBI Gene ID
        … and writes the result into another Excel worksheet
   Mix of procedural and object programming
   Relevant links:
        http://www.ncbi.nlm.nih.gov/sites/entrez?db=unigene&orig_db=unigene
        Entrez Utilities
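
To make "mix of procedural and object programming" concrete, the sketch below writes a file in the object style (a constructor followed by method calls on the object) and reads it back in the procedural style (built-in functions). It is not taken from Excel3.pl; the function names and gene symbols are illustrative, and IO::File and File::Temp are used because both ship with Perl.

```perl
use strict;
use warnings;
use IO::File;                 # object-style file handles (core module)
use File::Temp qw(tempfile);  # safe temporary file names (core module)

# Object style: write gene symbols to a file, one per line.
sub write_symbols {
    my ($path, @symbols) = @_;
    my $out = IO::File->new($path, 'w')      # constructor: Class->new(...)
        or die "Cannot write $path: $!";
    $out->print("$_\n") for @symbols;        # method call: $object->method(...)
    $out->close;
    return scalar @symbols;                  # number of symbols written
}

# Procedural style: read the symbols back with built-in functions.
sub read_symbols {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Cannot read $path: $!";
    chomp(my @symbols = <$in>);
    close($in);
    return @symbols;
}

my (undef, $path) = tempfile(UNLINK => 1);       # removed when the program exits
write_symbols($path, 'TP53', 'BRCA1', 'EGFR');   # example symbols only
print join(', ', read_symbols($path)), "\n";
```

Modules such as Spreadsheet::WriteExcel are used in the same object style: create an object with `new`, then call methods on it with `->`.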

What Excel3.pl Does
   [Flow diagram: Gene symbols & descriptions]
   1. Sequence identifier → search UniGene for cluster ID (UniGene ESearch) → result ID
   2. Result ID → retrieve UniGene description for that cluster (UniGene ESummary) → write to Excel report
   3. Cluster ID → search Gene with Gene (Gene ESearch) → result ID
   4. Result ID → retrieve Gene description for that gene (Gene ESummary) → write to Excel report
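
Each ESearch step in this flow comes down to building a URL for NCBI's Entrez Utilities and pulling the record IDs out of the XML reply. The sketch below shows only those two pieces and makes no network request. It is not the code in Excel3.pl: `build_esearch_url` and `extract_ids` are hypothetical helpers, the search term is an illustrative example, the sample reply is hand-written with a made-up ID, and the Gene database is used as the example.

```perl
use strict;
use warnings;

# Base address of NCBI's Entrez Utilities (E-utilities).
my $EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Build an ESearch URL for a database and a search term.
# (build_esearch_url is a hypothetical helper, not code from Excel3.pl.)
sub build_esearch_url {
    my ($db, $term) = @_;
    # Percent-encode anything that is not a letter, digit, '-', '_', '.' or '~'.
    (my $encoded = $term) =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return "$EUTILS/esearch.fcgi?db=$db&term=$encoded";
}

# Pull the record IDs out of an ESearch XML reply.
# A regular expression is enough for this sketch; a real program
# would be safer with an XML parser.
sub extract_ids {
    my ($xml) = @_;
    return $xml =~ m{<Id>(\d+)</Id>}g;
}

# A hand-written reply in the shape ESearch returns; the ID is made up.
my $sample_reply = '<eSearchResult><Count>1</Count>'
                 . '<IdList><Id>12345</Id></IdList></eSearchResult>';

print build_esearch_url('gene', 'TP53[sym] AND human[orgn]'), "\n";
print "IDs found: ", join(', ', extract_ids($sample_reply)), "\n";
```

In a full program, the URL would be fetched with a module such as LWP::Simple (listed in the module table), and each returned ID would then be passed to ESummary in the same way.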




Toying with Excel3.pl




Some Key Books/Resources

   Perl Programming for Biologists
   Perl Cookbook
   Perl Quick Reference Guide
   My favorite: Perl Quick Reference




Assignments

   Install remainder of Perl modules from list
   Look at code for Example3.pl
        Modify it, break it
        Write down at least one question → so we can talk
         about it next week




eBooks Rule




What Does A Module Look Like?
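
As a rough answer, a module is a file of reusable functions that live under a package name. The sketch below puts a tiny made-up module and the program that uses it into one file so that it runs as-is; in practice the first part would sit in its own file (here it would be SeqTools.pm) and be loaded with `use SeqTools;`. The names `SeqTools` and `base_counts` are invented for illustration.

```perl
use strict;
use warnings;

# --- Normally this part lives in its own file, e.g. SeqTools.pm ---
# (SeqTools and base_counts are made-up names for illustration.)
package SeqTools;

# Count how many times each base occurs in a sequence.
sub base_counts {
    my ($seq) = @_;
    my %count;
    $count{$_}++ for split //, uc $seq;
    return %count;
}

1;   # a module file must end with a true value

# --- The program that uses the module ---
package main;

my %count = SeqTools::base_counts('ATGCGA');
print "$_: $count{$_}\n" for sort keys %count;
```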





								