Introduction to PERL

Instructor: Jon Frederick, M.S.
Division of Information Infrastructure, UNIX/NT Systems Group
smiile@utk.edu, http://web.utk.edu/~smiile

What is PERL?

"Practical Extraction and Report Language"
Released by Larry Wall in 1987
Originally for UNIX system administration
Based on C, sed, awk, and "English"

Why is PERL Popular?

Easy to use
    Default behavior: e.g. print "hello world";
Free (GNU General Public License or Artistic License)
Available for nearly every operating system; programs port with little or no change
Modern hardware makes it run fast
"No built-in limitations"
Well-documented and supported
    man perl
    O'Reilly books, esp. The Perl CD-ROM
    comp.lang.perl
    the open-source code movement
    www.cpan.org, thousands of free scripts and modules

    #!/usr/perl-5.6/bin/perl5.6.0
    print "which file would you like to search?\n";
    $file = <STDIN>;
    chomp $file;
    open (FH, "<$file") or die "cannot open $file: $!\n";
    print "what pattern do you want to find?\n";
    $pattern = <STDIN>;
    chomp $pattern;
    while ($line = <FH>) {
        chomp $line;
        push(@lines, $line);
    }
    foreach $line2 (@lines) {
        if ($line2 =~ /$pattern/) {
            print "$line2\n";
        }
    }

Running a Perl Program

In UNIX, first set file permissions:
    % chmod u+x filename.pl
then run the program:
    % filename.pl
or
    % perl filename.pl
In Windows/DOS:
    C:\> filename.pl
or
    C:\> perl filename.pl
or click on the file from Windows Explorer

Perl Programs

First line states path to the perl interpreter. At UTK UNIX:
    PATH                              VERSION
    #!/usr/misc/bin/perl              4.0
    #!/soft/script/bin/perl           5.003
    #!/usr/perl-5.6/bin/perl5.6.0     5.6.0
Unsure?
Type "which perl" and/or "perl -v".

Perl Programs

    # comments begin with a pound sign
    # and go until end-of-line
Commands are terminated by a semicolon and can span multiple lines.
Blocks of commands are surrounded by { curly braces }.
The standard file suffix for a perl program is ".pl".
White space is "polite" but optional.

Scalar Variables

Contain a single element: a string of characters or a number (can be any length)
Begin with a dollar sign "$"
Names are case-sensitive ($var ne $VAR)
Values are assigned with an equals sign "="
Variables do not need to be pre-declared (automatically null or zero until you assign values).
Single quotes mean literal; double quotes mean interpolate.
Example:
    $name = "$userid\n";    # gives the userid and a line return
    # '$userid\n' is the same as "\$userid\\n"
Concatenation is achieved with a dot ".":
    $var1 = "Hello";
    $var2 = "$var1" . " world!\n";
    print $var2;            # prints "Hello world!"
The default variable, "$_":
    foreach $var (@list) { print $var; }    # works
    print $_ foreach (@list);               # also works
    print foreach (@list);                  # also works

Array Variables

Arrays are ordered lists of scalars (strings or numbers).
Names begin with the at sign "@".
Individual elements of an array are specified by their index number.
For an array named @array, the first element is $array[0].
The last element is $array[-1], or $array[99] if the array has 100 elements.
In a scalar context, @array is the number of elements in the array:
    $y = @array;
When quoted, "@array" returns all of the elements in the array separated by a space.
$#array is the index number of the last element in the array, i.e., $#array == @array - 1.
The default array is @_ (not often used directly); it holds the arguments passed to a subroutine.
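The scalar and array rules above can be tried out in a few lines. This is only a minimal sketch: the user id "jdoe" is a made-up value, and the "use strict", "use warnings", and "my" declarations are additions that the slides themselves do not use.

```perl
use strict;
use warnings;

# Hypothetical sample value, not taken from the slides.
my $userid = 'jdoe';
my $single = '$userid\n';    # single quotes: literal text, nothing interpolated
my $double = "$userid\n";    # double quotes: $userid and \n are interpolated

my $greeting = 'Hello' . ' ' . 'world!';    # the dot concatenates strings

my @foods  = qw(pizza salad beer);
my $count  = @foods;         # scalar context: the number of elements
my $last_i = $#foods;        # index of the last element
my $first  = $foods[0];      # first element
my $last   = $foods[-1];     # negative index counts from the end
my $joined = "@foods";       # quoted array: elements separated by a space

print "$greeting\n";
print "$count elements, last index $last_i\n";
print "first=$first last=$last all=$joined\n";
```

Run as written, the three print statements show the concatenated greeting, the element count next to the last index, and the first, last, and space-joined elements.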
Array Input

Like scalar assignment:
    @foods = ("pizza", "salad", "beer");
or
    @foods = qw(pizza salad beer);
Push adds an element to the end of the list:
    while ($x = <STDIN>) { push (@lines, $x); }
Unshift adds an element to the beginning:
    while ($x = <STDIN>) { unshift (@lines, $x); }
Individual array elements can be assigned like scalar variables:
    $foods[0] = "pizza";
    $foods[1] = "salad";
Can be read from the standard input:
    @lines = <STDIN>;
or
    push (@lines, $_) while (<>);
The split command divides up the elements of a scalar into an array based on a delimiter:
    @line = split / /, $lines[0];

Array Input: Split

Syntax: split /delimiter/, string;
Example:
    $line = "Time flies like an arrow; fruit flies like a banana.";
    @time = split /\s+/, $line;
    print "$time[5] $time[6]\n";    # prints "fruit flies"
A neat trick, grab only elements 5 and 6:
    ($word5, $word6) = (split /\s+/, $line)[5,6];

Array Output

By specifying the index:
    while ($x < @lines) {
        print "$lines[$x]\n";
        $x++;
    }
By using foreach:
    foreach (@lines) { print "$_\n"; }
By using join:
    $file = join ':', @foods;    # $file is now "pizza:salad:beer"
By using pop and shift (each removes the element it returns):
    $first = shift @foods;
    $last = pop @foods;
Order can be sorted or reversed (shown for the original three elements):
    @sorted = sort @foods;          # "beer pizza salad"
    @reversed = reverse @sorted;    # "salad pizza beer"

File Input/Output

Prompt the user for input:
    print "which file would you like to read?\n";
    $filename = <STDIN>;
    chomp $filename;    # get rid of that pesky newline.
Use the default @ARGV array
    @ARGV is the list of arguments supplied by the user on the command line.
    $pattern = $ARGV[0];
    $filename = $ARGV[1];
for our script that executes:
    % program.pl pattern filename

Input/Output

Since @ARGV is a default variable in PERL, you can open files explicitly:
    open(FH, "<$ARGV[1]");
    while ($var = <FH>) { print "$var"; }
    close FH;
Or, let Perl assume you know what you're doing:
    while (<>) { print; }
    # opens each file named in @ARGV in turn (or the standard input
    # if @ARGV is empty), assigns each line to the default
    # variable $_, and prints it.
The default or standard output is the terminal screen:
    $pattern = shift @ARGV;    # remove the pattern so <> reads only the file
    while (<>) { print "$_" if ($_ =~ /$pattern/); }
Which can be redirected from the command line:
    % program.pl pattern filename > newfile
Or, you can explicitly open an output filehandle:
    open(OUT, ">output.txt");
    while ($_ = <>) { print OUT if (/$pattern/); }

Exercise

Suppose you have the following SAS output for 100 variables. Write a PERL program that extracts and prints just the variable name and the p-value of each signed rank test, one variable name and p-value per line.

    The SAS System        23:11 Sunday, February 4, 2001    1
    The UNIVARIATE Procedure
    Variable: C3C4CAL1ALPHA

    Test           -Statistic-      -----p Value-----
    Student's t    t   0.095791     Pr > |t|     0.9246
    Sign           M        1.5     Pr >= |M|    0.6776
    Signed Rank    S        5.5     Pr >= |S|    0.8715

Control Structures

    if (some_statement) {
        do something;
        do another something;
    } elsif (other_statement) {
        do something else;
    } else {
        do this only if both statements false;
    }
NOTE: unless (!statement) is equivalent to if (statement)

    while (some_statement) {
        do something;    # until statement becomes false.
    }
Equivalent to
    until (!some_statement) { do something; }

The Nature of Truth*

"The strings "0" and "" are false; everything else is true."
    0          # converts to "0", so false
    1-1        # computes to 0, then converts to "0", so false
    1          # converts to "1", so true
    ""         # empty string, so false
    "1"        # not "" or "0", so true
    "00"       # not "" or "0", so true (this one is weird, watch out)
    "0.000"    # also true for the same reason and warning
    undef      # evaluates to "", so false
* Schwartz, Christiansen and Wall, 1997. Learning Perl.

Boolean Operators

    &&    And
    ||    Or
    !     Not

    while (defined($_ = <>) && ($x != 12)) {
        print if (/Signed/ || /Student/);
        $x++;
    }
    # print lines containing Signed or Student from
    # the first twelve lines of the standard input.
    # The assignment needs its own parentheses because
    # "=" binds less tightly than "&&".

Comparison Operators

                                    Numeric    String
    Equal                           ==         eq
    Greater than                    >          gt
    Less than                       <          lt
    Greater than or equal           >=         ge
    Less than or equal              <=         le
    Not equal                       !=         ne
    Not equal with signed return    <=>        cmp

What's wrong with this Picture?

    If (($x = 25) && ($y < 25) ) print "$y\n"; } {

Arithmetic Operations

    Plus            +
    Minus           -
    Divide          /     (floating point mode default)
    Multiply        *
    Exponentiate    **
    Modulus         %

Example

Suppose I have a data file that has K variables and N observations, and I want the average for each K across all N:
    Obs'n, Varname(1), varname(2), … varname(k)
    1, data(1), data(2), …. data (k)
    2, data(1), data(2), …. data (k)
    …
    N, data(1), data(2), … data(k)

    while (<>) {    # first read in each line and find the sum
        chomp;
        @eachline = split /\,\s+/, $_;
        if ($x < 1) {    # don't include the variable names
            @varnames = @eachline;
            @eachline = ();
            ++$x;
        } else {
            $k = 0;
            shift @eachline;    # get rid of obs'n number.
            while ($k < @eachline) {
                $sum[$k] = $sum[$k] + $eachline[$k];
                $k++;
            }
            ++$n;    # counts $n, the number of obs'ns.
        }
    }
    while ($z < $k) {
        $average[$z] = (int(1000 * $sum[$z] / $n)) / 1000;
        $z++;
    }
    shift @varnames;
    print "@varnames\n";
    print "@average\n";

Hashes

Also known as "associative arrays"
Consist of pairs of keys and values.
Useful for database implementations
Hash names begin with the percent sign "%"
Unlike arrays, which are ordered lists indexed by integers, hashes are unordered lists indexed by keys.
Example:
    %emails = (Jon => 'smiile@utk.edu', AJ => 'ajw@utk.edu');
    print "$emails{AJ}\n";    # prints "ajw@utk.edu"
Note: hashes are indexed by curly, not square brackets!

Hash Input

Three ways to get data into a hash:
Assigning, with commas:
    %grades = ('Jon', 'A', 'Harley', 'C', 'Marco', 'B');
"=>" is a more readable synonym for comma (and quotes the key for you):
    %grades = (Jon => 'A', Harley => 'C', Marco => 'B');
Assign each element individually, as a scalar:
    $grades{Jon} = 'A';
    $grades{Harley} = 'C';
    $grades{Marco} = 'B';
If a key already exists, adding it to the hash will clobber the previous value of that key. To prevent this:
    unless (exists ($emails{$name})) { $emails{$name} = $email; }
Or:
    if (!$emails{$name}) { $emails{$name} = $email; }

Hash Output

Refer to the value with the key:
    print "$grades{Marco}";
Grab all the keys and sort alphanumerically:
    print sort keys %grades;
Just the values:
    print sort values %grades;
Values and keys:
    foreach $key (keys %grades) {
        print "$key got a $grades{$key}\n";
    }
Or use each:
    while (($name, $grade) = each(%grades)) {
        print "$name got a $grade\n";
    }

Hash Functions

Delete a key-value pair from a hash:
    delete $hashname{$key};
Make all the keys values and values keys:
    %hashname = reverse %hashname;

A Sample CGI Form

(The HTML form is not preserved in this text version; the script below expects the fields lastname, firstname, email, and phone.)

A Sample CGI Script

    #!/usr/perl-5.6/bin/perl5.6.0
    # invoke the perl interpreter
    read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    # slurp in the data from the CGI form
    # the buffer comes in the form
    # "lastname=Frederick&firstname=Jon&email=smiile@utk.edu&phone=555-1212"
    # so it must be parsed into the separate data fields.
    @pairs = split(/&/, $buffer);
    foreach $pair (@pairs) {
        ($name, $value) = split(/=/, $pair);
        $FORM{$name} = $value;
    }
    open(MEM, "<memberemails.txt");
    while (<MEM>) {
        chomp;
        $seen{$_} = 1;    # the value of one is arbitrary for keys of %seen
    }
    close MEM;
    $address = $FORM{email};
    if ($seen{$address}) {
        print "Content-type: text/html\n\n";
        print "You're already a member!\n\n";
    } else {
        print "Content-type: text/html\n\n";
        foreach $key (sort keys %FORM) {
            print "$key is $FORM{$key}\n\n";
        }
        open(MEM, ">>memberemails.txt");
        print MEM "$address\n";    # one address per line, to match the chomp above
        close MEM;
    }

Regular Expressions

Regular expressions are patterns to be matched against a string.
Perl regular expressions are a superset of those used by the UNIX utilities grep, sed, vi and awk.
We've already seen:
    print if (/$pattern/);
Which is shorthand for:
    print $_ if ($_ =~ m/$pattern/);

Pattern Matching Operators/Functions

    $var =~ m/$pattern/;    # the match operator
    $var =~ s/$pattern/$replacementpattern/g;
    # the substitution operator
    # "g" modifier means all occurrences on each line
    @list = split /$pattern/, $var;
    # splits $var into list with $pattern as delimiter
    $var = join $string, @list;
    # joins list into a single variable, with $string (a plain string,
    # not a pattern) between the elements
    /$pattern/i;    # "i" means ignore case

Regular Expressions

Metacharacters: \|()[{^$*+?.
Backslash means "escape" or literal interpretation of metacharacters:
    $var =~ s/\|\$/pipe-dollar/;    # means replace '|$' with 'pipe-dollar'
Escaping normal alphanumeric characters turns some of them into metacharacters:
    \s means white space (space, tab, or newline)
    \n means line return
"|" means "or"; parentheses allow grouping:
    print if (/Dept of (Psychology|Biology)/);
    # prints lines containing
    # "Dept of Psychology" or "Dept of Biology"
"." means "any character"
"*" means zero or more of the previous character:
    /Psych.*/    # matches Psychology or Psychiatry
"+" means "one or more of the previous character":
    $line =~ s/\s+/\t/g;    # replace one-or-more spaces with a tab
"^" means beginning of the line
"$" means end of the line
    s/^\s+//;    # gets rid of spaces at beginning of line
"[ ]" identifies a "character class":
    s/[A-Ex2]/R/g    # replaces A, B, C, D, E, 2, or x with R.
"[^… ]" identifies a negative character class
    \w    # any word character [a-zA-Z0-9_]
    while (<>) {
        /\@/ && print "$_\n" foreach (split /[^\w\@\.\-]/);
    }
    # extracts email addresses from an html file

Command Line Options

    perl -w filename.pl
        Warning mode: provides extra detail about potential flaws in code
    perl -c filename.pl
        Tests whether the file compiles successfully without actually running it
    perl -e 'command1; command2; …'
        Command line switch; runs perl code typed directly on the command line
    perl -e 'sleep(120); while (1) { print "\a" }'
        # a cheap alarm clock

Subroutines

Defining a subroutine:
    sub name { …. }
Invoking a subroutine:
    &name;
Example:
    print "What's your name?";
    chomp ($name = <STDIN>);
    &hello;
    sub hello { print "Hello, $name!\n"; }

System Calls

Backticks execute an expression "from the command line" and return the standard output:
    $files = `ls`;
    @files = split /\n/, $files;
system( … ) just executes the expression and returns the command's exit status: 0 if successful, nonzero if not:
    system ("mailx -s \"test mailing\" smiile\@utk.edu < file");
    # the @ must be escaped inside double quotes

Additional Resources

CGI Course, March 28 and April 6. See http://web.utk.edu/~training
Another PERL tutorial: http://www.netcat.co.uk/rob/perl/win32perltut.html
A Directory of PERL tutorials: http://www.astentech.com/tutorials/Perl.html
Schwartz, R., Christiansen, T., & Wall, L. (1997). Learning Perl. Sebastopol, CA: O'Reilly & Associates.
The PERL Bookshelf (CD-ROM with 6 books). O'Reilly & Associates. Includes Learning Perl.
Christiansen, T., & Torkington, N. (1998). Perl Cookbook. Sebastopol, CA: O'Reilly & Associates.
UNIX for Windows: http://www.research.att.com/~dgk/uwin/
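As a closing sketch, the regular-expression and hash slides above can be combined: split each line on a negative character class, keep the pieces containing an @ sign, and count them in a hash the way %seen is used in the CGI script. The input lines and the example.edu addresses are invented for illustration, and the "use strict", "use warnings", and "my" declarations are additions not found in the slides.

```perl
use strict;
use warnings;

# Hypothetical input lines standing in for an HTML file.
my @lines = (
    'Contact <a href="mailto:jon@example.edu">Jon</a> or aj@example.edu',
    'Dept of Biology: write to jon@example.edu',
);

my %seen;    # keys are addresses; each value counts how often one was seen
for my $line (@lines) {
    # Split on runs of anything that is NOT a word character, @, dot,
    # or hyphen, then keep only the pieces that contain an @ sign.
    for my $piece (split /[^\w\@.\-]+/, $line) {
        $seen{$piece}++ if $piece =~ /\@/;
    }
}

# Alternation with grouping: count the lines naming either department.
my $dept_lines = grep { /Dept of (Psychology|Biology)/ } @lines;

# Substitution with the g modifier: turn each run of white space into a tab.
(my $tabbed = $lines[1]) =~ s/\s+/\t/g;

for my $address (sort keys %seen) {
    print "$address seen $seen{$address} time(s)\n";
}
print "$dept_lines line(s) mention a department\n";
```

Because dots are allowed inside the pieces, an address followed directly by a sentence-ending period would keep that period; a real script would probably need a stricter pattern.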

