Introduction to PERL by TaylorRandle



      Instructor: Jon Frederick, M.S.
  Division of Information Infrastructure,
         UNIX/NT Systems Group,
      What is PERL?
“Practical Extraction and Report Language”
Created by Larry Wall; first released in 1987
Originally written for UNIX systems
Based on C, sed, awk, and “English”
Why is PERL Popular?
Easy to use
   Sensible defaults: e.g., a one-line program, print "hello world";
Free (GNU General Public License or Artistic License)
Available for nearly every O.S.: programs
port with little or no change
Modern hardware makes it run fast
“No built-in limitations”
Why is PERL popular?
Well-documented and supported
 man perl
 O'Reilly books, esp. The PERL Bookshelf CD-ROM

 comp.lang.perl

 the open-source movement: thousands of free scripts
and modules (e.g., on CPAN, the Comprehensive Perl Archive Network)
print "which file would you like to search?\n";
$file = <STDIN>; chomp $file;
open (FH, "<$file") or die "Can't open $file: $!\n";
print "what pattern do you want to find?\n";
$pattern = <STDIN>; chomp $pattern;
while ($line = <FH>) {
   chomp $line;        push(@lines,$line);
}
close FH;
foreach $line2 (@lines) {
   if ($line2 =~ /$pattern/) {
         print "$line2\n";
   }
}
 Running a Perl Program
In UNIX, first set file permissions:
chmod u+x filename
% ./filename           or
% perl filename
In Windows/DOS,
C:\> filename or C:\> perl filename
or double-click the file in Windows Explorer
           Perl Programs
The first line (the “#!” line) states the path to the perl interpreter. On the local systems:
PATH                                  VERSION
#!/usr/misc/bin/perl                  4.0
#!/soft/script/bin/perl               5.003
#!/usr/perl-5.6/bin/perl5.6.0         5.6.0

Unsure? Type “which perl” and/or “perl -v”
           Perl Programs
# comments begin with a pound # sign
# and go until end-of-line
Statements are terminated by a semicolon
and can span multiple lines;
Blocks of statements are surrounded by
{ curly braces }
The conventional file suffix for a perl program is “.pl”
White space is “polite” but optional
       Scalar Variables
Contain a single value: a number or a string of
characters (strings can be any length)
Begin with a dollar sign “$”
Names are case-sensitive ($var and $VAR are different variables)
Values are assigned with an equals sign “=”
Variables do not need to be declared in advance
(until you assign a value, they are undef, which acts as "" or 0).
Single quotes mean literal; double mean interpolate:
Example: $name = "$userid\n";
# gives the userid and a line return;
'$userid\n' is the same as "\$userid\\n"
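A minimal sketch of the two kinds of quoting. It adds use strict and use warnings, which the slides don't use, and assumes a made-up user id, jfred; printing each string shows the difference.

```perl
use strict;
use warnings;

my $userid = "jfred";                 # hypothetical user id, for illustration only

my $interpolated = "$userid\n";       # double quotes: value of $userid plus a newline
my $literal      = '$userid\n';       # single quotes: the nine characters $userid\n
my $escaped      = "\$userid\\n";     # double quotes with backslashes: same as $literal

print $interpolated;                  # prints: jfred
print $literal, "\n";                 # prints: $userid\n
print "literal and escaped match\n" if $literal eq $escaped;
```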
        Scalar Variables
  Concatenation is achieved with a dot “.”
$var1 = "Hello";
$var2 = $var1 . " world!\n";
print $var2; # prints “Hello world!”
  The default variable, “$_”
foreach $var (@list) { print $var; }   # works
foreach (@list) { print $_; }          # also works: the loop variable defaults to $_
print foreach (@list);                 # also works: print defaults to $_
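A short runnable sketch of concatenation and the default variable, assuming a small sample @list. Instead of printing inside each loop, it appends to $out with .= so the three loop forms can be compared.

```perl
use strict;
use warnings;

my $var1 = "Hello";
my $var2 = $var1 . " world!\n";              # the dot joins two strings
print $var2;                                 # prints: Hello world!

my @list = ("a", "b", "c");                  # sample data
my $out = "";
foreach my $item (@list) { $out .= $item; }  # named loop variable
foreach (@list) { $out .= $_; }              # loop variable defaults to $_
$out .= $_ foreach @list;                    # statement-modifier form also uses $_
print "$out\n";                              # prints: abcabcabc
```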
      Array Variables
Arrays are ordered lists of scalars (strings or numbers)
Names begin with the at sign “@”
Individual elements of an array are scalars,
written with “$” and their index number. For an
array named @array, the first element
is $array[0]; the last element is
$array[-1], or $array[99] if the array has
100 elements.
         Array Variables
  In a scalar context @array is the number of
  elements in the array:
$y = @array;
  Inside double quotes, “@array” gives all of the
  elements in the array separated by spaces.
  $#array is the index number of the last
  element in the array, i.e., $#array == @array - 1
  @_ holds the arguments passed to a subroutine.
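A sketch of the array facts above, using sample data; the subroutine show_args is hypothetical and is only there to show @_ holding a subroutine's arguments.

```perl
use strict;
use warnings;

my @array = ("pizza", "salad", "beer");   # sample data

my $count = @array;          # scalar context: number of elements (3)
my $last  = $#array;         # index of the last element (2)
my $text  = "@array";        # interpolation joins elements with spaces

print "$count elements, last index $last\n";   # 3 elements, last index 2
print "first: $array[0], last: $array[-1]\n";  # first: pizza, last: beer
print "$text\n";                               # pizza salad beer

show_args("one", "two");     # inside a subroutine, @_ holds the arguments
sub show_args {
    my $n = @_;
    print "got $n arguments: @_\n";            # got 2 arguments: one two
}
```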
           Array Input
 By assigning a list, like scalar assignment:
@foods = ("pizza", "salad", "beer"); or
@foods = qw(pizza salad beer);
 push adds an element to the end of the list:
  while($x=<FH>)    { push (@lines,$x); }
 unshift adds an element to the beginning:
  while($x=<FH>)    { unshift (@lines, $x); }
               Array Input
   Individual array elements can be assigned like scalars:
$foods[0] = "pizza"; $foods[1] = "salad";
   Can be read from the standard input:
@lines = <STDIN>; # or push (@lines,$_) while (<>);
   The split function divides a string into a list
   of fields, based on a delimiter pattern:
@line = split / /, $lines[0];
        Array Input: Split
   Syntax: split /delimiter/, string ;
$line = "Time flies like an arrow; fruit flies like a banana";
@time = split /\s+/, $line;
print "$time[5] $time[6]\n"; # prints “fruit flies”
   A neat trick: grab only elements 5 and 6 with a list slice:
($word5,$word6) = (split /\s+/, $line)[5,6];
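A sketch of split and list slices on the slide's sentence; the colon-separated string at the end is extra sample data showing that any pattern can serve as the delimiter.

```perl
use strict;
use warnings;

my $line = "Time flies like an arrow; fruit flies like a banana";

my @time = split /\s+/, $line;         # split on runs of whitespace
print "$time[5] $time[6]\n";           # prints: fruit flies

# A list slice takes elements 5 and 6 straight from split's result.
my ($word5, $word6) = (split /\s+/, $line)[5, 6];
print "$word5 $word6\n";               # prints: fruit flies

my @fields = split /:/, "pizza:salad:beer";   # any pattern can be the delimiter
print scalar(@fields), "\n";           # prints: 3
```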
              Array Output
   By specifying the index:
$x = 0;
while ($x < @lines) {
       print "$lines[$x]\n"; $x++;
}
   By using foreach:
foreach (@lines)      {
       print "$_\n";
}
            Array Output
   By using join:
$file = join ':', @foods;
# $file is now “pizza:salad:beer”
   By using pop and shift:
$first = shift @foods; $last = pop @foods;
   Order can be sorted or reversed:
@sorted = sort @foods; # “beer pizza salad”
@reversed = reverse @sorted; # “salad pizza beer”
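A sketch that exercises join, sort, reverse, push, unshift, shift, and pop on the @foods list. It works on a copy named @queue (a hypothetical name) so the original list is not changed by shift and pop.

```perl
use strict;
use warnings;

my @foods = qw(pizza salad beer);

my $joined   = join ':', @foods;        # "pizza:salad:beer"
my @sorted   = sort @foods;             # ("beer", "pizza", "salad")
my @reversed = reverse @sorted;         # ("salad", "pizza", "beer")

my @queue = @foods;                     # copy, so @foods is left alone
push @queue, "pie";                     # add to the end
unshift @queue, "soup";                 # add to the beginning
my $first = shift @queue;               # removes and returns "soup"
my $last  = pop @queue;                 # removes and returns "pie"

print "$joined\n@sorted\n@reversed\n";
print "$first $last; left: @queue\n";   # soup pie; left: pizza salad beer
```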
        File Input/Output
   Prompt the user for input
print "which file would you like to read?\n";
$filename = <STDIN>;
chomp $filename; # get rid of that pesky newline.
   Or use the built-in @ARGV array:
     @ARGV is the list of arguments supplied by the user
      on the command line.
$pattern = $ARGV[0];
$filename = $ARGV[1]; # for a script named pattern, run as:
% pattern filename
   With the file name in hand, you can
   open the file explicitly:
open (FH, "<$filename") or die "Can't open $filename: $!\n";
while($var=<FH>) { print "$var"; }
close FH;
   Or, let Perl assume you know what you're doing:
while(<>)      { print; }
# <> opens each file named in @ARGV in turn (or reads
# STDIN if @ARGV is empty), assigns each line to the
# default variable $_, and print prints $_.
  The default or standard output is the terminal
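$pattern = shift @ARGV;  # remove the pattern so <> reads only files
while(<>) {
  print "$_" if ($_ =~ /$pattern/);
}
Standard output can be redirected from the command line:
% pattern filename > newfile
  Or, you can explicitly open an output file:
open (OUT, ">newfile") or die "Can't write newfile: $!\n";
while($_=<>) {
  print OUT if (/$pattern/);
}
close OUT;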
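A sketch of the read, filter, and write cycle using lexical filehandles and the three-argument form of open, which many Perl programmers now prefer over barewords like FH and OUT. It assumes the core module File::Temp to create scratch files that stand in for filename and newfile, and a hypothetical pattern, Signed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);            # core module: makes scratch files

# Write a few sample lines to a temporary file (stands in for "filename").
my ($tmp, $filename) = tempfile(UNLINK => 1);
print $tmp "alpha\nSigned Rank 1\nbeta\nSigned Rank 2\n";
close $tmp;

my $pattern = "Signed";                          # hypothetical pattern
my ($outfh, $outname) = tempfile(UNLINK => 1);   # stands in for "newfile"

open(my $in, '<', $filename) or die "Can't read $filename: $!";
while (my $line = <$in>) {
    print $outfh $line if $line =~ /$pattern/;   # copy matching lines only
}
close $in;
close $outfh;

open(my $check, '<', $outname) or die "Can't read $outname: $!";
my @kept = <$check>;
close $check;
print "kept ", scalar(@kept), " lines\n";        # kept 2 lines
```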
Exercise: Suppose you have the following SAS output for 100
variables. Write a PERL program that extracts and prints just
the variable name and the p-value of each signed rank test, one
variable name and p-value per line.
The SAS System                23:11 Sunday, February 4, 2001 1
                 The UNIVARIATE Procedure
                Variable: C3C4CAL1ALPHA
     Test       -Statistic-         -----p Value------
     Student's t t 0.095791          Pr > |t| 0.9246
     Sign       M       1.5     Pr >= |M| 0.6776
     Signed Rank    S         5.5    Pr >= |S| 0.8715
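One possible solution sketch, not the only one. It assumes the layout shown above: a "Variable:" line names each variable, and the p-value is the last field on the "Signed Rank" line. To keep it self-contained it reads the sample listing from a string; against a real file you would loop over <> instead.

```perl
use strict;
use warnings;

# The sample SAS listing from above, held in a string so the sketch runs alone.
my $sas = <<'END';
                 Variable: C3C4CAL1ALPHA
     Test       -Statistic-         -----p Value------
     Student's t t 0.095791          Pr > |t| 0.9246
     Sign       M       1.5     Pr >= |M| 0.6776
     Signed Rank    S         5.5    Pr >= |S| 0.8715
END

open my $fh, '<', \$sas or die "Can't open in-memory file: $!";
my ($varname, @results);
while (my $line = <$fh>) {
    # Remember the most recent variable name.
    $varname = $1 if $line =~ /Variable:\s+(\S+)/;
    # On the signed rank line, the p-value is the last field.
    if ($line =~ /Signed Rank.*\s(\S+)\s*$/) {
        push @results, "$varname $1";
    }
}
close $fh;
print "$_\n" foreach @results;   # prints: C3C4CAL1ALPHA 0.8715
```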
       Control Structures
if (some_condition) {
         do something;
         do another something;
} elsif (other_condition) {
         do something else;
} else {
         do this only if both conditions are false;
}
NOTE:           unless (!condition) is equivalent to if (condition)
       Control Structures
while (some_condition) {
        do something; # repeats until the condition becomes false.
}
Equivalent to
until (!some_condition) {
        do something;
}
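A runnable sketch of the control structures, using sample numbers; the @log array is a made-up name, there only so the results can be checked.

```perl
use strict;
use warnings;

my @log;                                  # collects what happened, for checking
foreach my $n (5, 0, -3) {                # sample values
    if ($n > 0) {
        push @log, "$n positive";
    } elsif ($n == 0) {
        push @log, "$n zero";
    } else {
        push @log, "$n negative";         # only when both tests are false
    }
    push @log, "$n nonzero" unless $n == 0;   # unless (X) acts like if (!X)
}

my $i = 0;
while ($i < 3) { $i++; }                  # runs while the condition is true
my $j = 0;
until ($j >= 3) { $j++; }                 # runs until the condition is true
print join("\n", @log), "\n";
print "i=$i j=$j\n";                      # i=3 j=3
```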
     The Nature of Truth*
0 and "" are false; everything else is true.

0     # converts to "0", so false
1-1   # computes to 0, then converts to "0", so false
1     # converts to "1", so true
""    # empty string, so false
"1"   # not "" or "0", so true
"00"  # not "" or "0", so true (this one is weird, watch out)
"0.000" # also true, for the same reason (same warning)
undef # evaluates to "", so false
* Schwartz, Christiansen and Wall, 1997. Learning Perl.
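A sketch that tests each value from the list above in boolean context, so the rule can be checked directly; the labels are just the values written as text.

```perl
use strict;
use warnings;

# Each value from the list, paired with a label, tested in boolean context.
my @cases = ([0, '0'], [1 - 1, '1-1'], [1, '1'], ['', '""'],
             ['1', '"1"'], ['00', '"00"'], ['0.000', '"0.000"'], [undef, 'undef']);

my %truth;
foreach my $case (@cases) {
    my ($value, $label) = @$case;
    $truth{$label} = $value ? 'true' : 'false';
    print "$label is $truth{$label}\n";
}
```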
       Boolean Operators
&&    And
||    Or
!     Not

$x = 0;
while ($x < 12 && defined($_ = <>)) {
        print if (/Signed/ || /Student/);
        $x++;
} # print lines containing Signed or Student from
  # the first twelve lines of the standard input.
  Comparison Operators
                               Numeric   String

Equal                          ==        eq
Greater than                   >         gt
Less than                      <         lt
Greater than or equal          >=        ge
Less than or equal             <=        le
Not equal                      !=        ne
Three-way compare (-1/0/1)     <=>       cmp
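A sketch of why the numeric and string operators differ, using sample values; the sort blocks show <=> and cmp doing the comparing.

```perl
use strict;
use warnings;

my $num_equal = (10 == 10.0) ? 1 : 0;      # numeric: same number, so 1
my $str_equal = ("10" eq "10.0") ? 1 : 0;  # string: different characters, so 0
my $str_less  = ("abc" lt "abd") ? 1 : 0;  # character by character, so 1

my @nums   = (10, 9, 100);
my @as_str = sort { $a cmp $b } @nums;     # string order: 10, 100, 9
my @as_num = sort { $a <=> $b } @nums;     # numeric order: 9, 10, 100

print "$num_equal $str_equal $str_less\n"; # 1 0 1
print "@as_str | @as_num\n";               # 10 100 9 | 9 10 100
print 3 <=> 5, " ", 5 <=> 5, " ", "b" cmp "a", "\n";   # -1 0 1
```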
 What’s wrong with this?
if (($x = 25) && ($y < 25) )   {
      print "$y\n";
}
  Arithmetic Operations
Plus           +
Minus          -
Divide         / (floating-point division by default)
Multiply       *
Exponentiate   **
Modulus        %
  Suppose I have a data file that has K
  variables and N observations, and I want the
  average of each of the K variables across all N:
Obs'n, varname(1), varname(2), … varname(k)
1, data(1), data(2), … data(k)
2, data(1), data(2), … data(k)
N, data(1), data(2), … data(k)
while (<>) { # first read in each line and find the sum
  chomp; @eachline = split /,\s*/, $_ ;
  if ($x < 1) { # the first line holds the variable names
       @varnames = @eachline;
       ++$x;
  } else {
       shift @eachline; # get rid of obs'n number.
       $k = 0;
       while ($k < @eachline) {
               $sum[$k] = $sum[$k] + $eachline[$k];
               ++$k;
       }
       ++$n; # counts $n, the number of obs'ns.
  }
}
$z = 0;
while ($z < $k) { # $k now equals K, the number of variables
  $average[$z] =
       (int(1000 * $sum[$z] / $n))/1000; # truncate to 3 decimals
  ++$z;
}
shift @varnames; # drop the "Obs'n" label
print "@varnames\n";
print "@average\n";
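A self-contained variant of the averaging script, assuming a small made-up data set (height and weight for three observations). It uses strict, lexical variables, and a foreach in place of the while counters, and reads the data from a string rather than a file.

```perl
use strict;
use warnings;

# Made-up sample data in the slide's layout: a header line, then one line
# per observation. A real script would read the data file with <> instead.
my $data = <<'END';
Obs'n, height, weight
1, 60, 120
2, 70, 150
3, 65, 140
END

open my $fh, '<', \$data or die "Can't open in-memory data: $!";
my $header = <$fh>;
chomp $header;
my @varnames = split /,\s*/, $header;
shift @varnames;                          # drop the "Obs'n" label

my @sum;
my $n = 0;                                # number of observations
while (my $line = <$fh>) {
    chomp $line;
    my @values = split /,\s*/, $line;
    shift @values;                        # drop the observation number
    $sum[$_] += $values[$_] foreach 0 .. $#values;
    $n++;
}
close $fh;

# Truncate each mean to three decimal places, as the slide's version does.
my @average = map { int(1000 * $_ / $n) / 1000 } @sum;
print "@varnames\n@average\n";            # "height weight", then "65 136.666"
```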
   Hashes, also known as “associative arrays”:
   Consist of pairs of keys and values.
   Useful for database implementations
   Hash names begin with the percent sign “%”
   Unlike arrays, which are ordered lists indexed by
   integers, hashes are unordered collections indexed by keys.
Example: %emails = (Jon => '…',
                        AJ => '…');
print "$emails{AJ}\n"; # prints AJ's address
Note: hashes are indexed with curly, not square, brackets!
           Hash Input
Three ways to get data into a hash:
 Assigning a list, with commas:
%grades = ('Jon', 'A', 'Harley', 'C', 'Marco', 'B');
 “=>” is a more readable synonym for comma (it also quotes the word on its left):

%grades = (Jon => 'A', Harley => 'C', Marco => 'B');
 Assign each element in scalar context:

$grades{Jon} = 'A'; $grades{Harley} = 'C';
                Hash Input
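    If a key already exists, adding it to the hash will
    clobber the previous value of that key. To prevent
    this, check first ($address here holds the new value):
unless (exists ($emails{$name})) { $emails{$name} = $address; }
if (!$emails{$name}) { $emails{$name} = $address; } # also replaces a false value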
         Hash Output
Refer to the value with the key:
print "$grades{Marco}";
Grab all the keys and sort them:
print sort keys %grades;
Just the values:
print sort values %grades;
         Hash Output
 Values and keys:
foreach $key (keys %grades) {
   print "$key got a $grades{$key}\n";
}

Or use each:
while (($name,$grade) = each(%grades)) {
   print "$name got a $grade\n";
}
        Hash Functions
  Delete a key-value pair from a hash:
delete $hashname{$key};
  Swap keys and values (duplicate values collapse into one key):
%inverted = reverse %hashname;
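A sketch combining hash input, output, and the hash functions above; the extra AJ grade, the $added flag, and the name %by_grade are made up for the example.

```perl
use strict;
use warnings;

my %grades = (Jon => 'A', Harley => 'C', Marco => 'B');
$grades{AJ} = 'B';                       # add one more key-value pair

# exists checks for the key itself; here Jon is already present.
my $added = 0;
unless (exists $grades{Jon}) { $grades{Jon} = 'F'; $added = 1; }

foreach my $key (sort keys %grades) {    # keys come back unordered, so sort them
    print "$key got a $grades{$key}\n";
}

delete $grades{Harley};                  # remove one pair
my %by_grade = reverse %grades;          # values become keys; duplicates collide
print join(", ", sort keys %by_grade), "\n";   # A, B
```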
A Sample CGI Form
    A Sample CGI Script
# invoke the perl compiler
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
# slurp in the data from the CGI form
# the buffer comes in the form name1=value1&name2=value2,
# so it must be parsed into the separate data fields.
@pairs = split(/&/, $buffer);
     A Sample CGI Script
foreach $pair (@pairs) {
       ($name, $value) = split(/=/, $pair);
       $FORM{$name} = $value;
}
open(MEM, "<memberemails.txt") or die "Can't read memberemails.txt: $!";
while (<MEM>) {
       chomp;
       $seen{$_} = 1;
}
close MEM;
# the value of one is arbitrary for keys of %seen
     A Sample CGI Script
$address = $FORM{email};
if ($seen{$address}) {
      print "Content-type: text/html\n\n";
      print "You're already a member!<p>";
} else {
      print "Content-type: text/html\n\n";
       foreach $key (sort keys %FORM) {
            print "$key is $FORM{$key}<p>"; }
       open(MEM, ">>memberemails.txt"); # reopen for appending
       print MEM "$address\n"; close MEM;
}
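Browsers URL-encode form data: a space arrives as "+" and other special characters as %XX, so the "@" in an e-mail address arrives as "%40". A minimal sketch of decoding each value before storing it in %FORM, assuming a made-up buffer; in practice the CGI module's param() function does this parsing for you.

```perl
use strict;
use warnings;

# Hypothetical form data, in the same name=value&name=value layout that the
# script reads from STDIN (values are URL-encoded by the browser).
my $buffer = 'name=Jon+Frederick&email=jon%40example.com';

my %FORM;
foreach my $pair (split /&/, $buffer) {
    my ($name, $value) = split /=/, $pair, 2;
    $value =~ tr/+/ /;                              # "+" encodes a space
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;  # %XX encodes one byte
    $FORM{$name} = $value;
}
print "$_ is $FORM{$_}\n" foreach sort keys %FORM;
```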
  Regular Expressions
Regular expressions are patterns to be matched
against a string
Perl regular expressions are a superset of those used
by the UNIX utilities grep, sed, vi and awk

We've already seen:
print if (/$pattern/);
Which is shorthand for:
print $_ if ($_ =~ m/$pattern/);
   Pattern Matching
$var =~ m/$pattern/;            # the match operator
$var =~ s/$pattern/new text/g;  # the substitution operator
# “g” modifier means all occurrences in the string, not just the first
@list = split /$pattern/, $var;
# splits $var into list with $pattern as delimiter
$var = join ':', @list;
# joins list into a single string; join takes a string, not a pattern
/$pattern/i ; # “i” means ignore case
  Regular Expressions
Backslash means “escape”, i.e., literal interpretation of a metacharacter:
$var =~ s/\|\$/pipe-dollar/;
# means: replace the first '|$' with 'pipe-dollar'
Escaping some ordinary alphanumeric characters turns them
into metacharacters:
\s means white space (space, tab, newline, and similar)
\n means newline
    Regular Expressions
   “|” means “or”; Parentheses allow grouping:
   print if (/Dept of (Psychology|Biology)/);
# prints lines containing
# “Dept of Psychology” or “Dept of Biology”
   “.” means “any character except newline”
   “*” means zero or more of the previous character:
   /Psych.*/ # matches Psychology or Psychiatry
   “+” means “one or more of the previous character”
   s/ +/\t/g; # replace each run of one-or-more spaces with a tab
     Regular Expressions
   “^” means beginning of the line
   “$” means end of the line
s/^\s+//;        # gets rid of spaces at beginning of line
   “[ ]” identify a “character class”
s/[A-Ex2]/R/g; # replaces A, B, C, D, E, 2, or x with R.
   “[^… ]” identifies a negative character class
   \w            # any word character [a-zA-Z0-9_]
while(<>)        {
      /\@/ && print "$_\n" foreach (split /[^\w\@\.\-]/);
 } # extracts email addresses from an html file
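A sketch that runs several of the patterns above against made-up text. The "@" in the double-quoted string is escaped so Perl does not try to interpolate an array; grep keeps only the pieces that contain an "@".

```perl
use strict;
use warnings;

my $line = "   Dept of Psychology, phone x2A; mail jon\@example.edu or aj\@example.edu";

(my $trimmed = $line) =~ s/^\s+//;          # strip leading whitespace
my $dept = ($line =~ /Dept of (Psychology|Biology)/) ? $1 : "none";
(my $tabbed = "a   b c") =~ s/ +/\t/g;     # runs of spaces become tabs
(my $swapped = "A2x9z") =~ s/[A-Ex2]/R/g;  # A, 2 and x are in the class

# Split on anything that can't appear in an address, keep pieces with "@".
my @emails = grep { /\@/ } split /[^\w\@.\-]+/, $line;
print "$dept | $swapped | @emails\n";
```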
 Command Line Options
  perl -w
      Warnings mode: reports extra detail about potential
       flaws in code
  perl -c
      Test if file compiles successfully without actually
      running it
  perl -e 'command1; command2; …'
     Command line switch; runs perl code typed
      directly on the command line.
perl -e 'sleep(120); while (1) { print "\a" }'
# a cheap alarm clock
  Defining a subroutine
     sub name { … }
  Invoking a subroutine
     &name;   or   name();
print "What's your name?";
chomp ($name = <STDIN>);
&hello;
sub hello {
  print "Hello, $name!\n";
}
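A sketch of the same greeting written the way subroutines usually receive data: as arguments in @_, with a return value instead of a global variable. The name "Jon" is sample input.

```perl
use strict;
use warnings;

# The name arrives in @_ instead of a global variable, and the sub returns
# its result rather than printing it.
sub hello {
    my ($name) = @_;                  # copy the first argument
    return "Hello, $name!\n";
}

my $greeting = hello("Jon");          # &hello("Jon") also works
print $greeting;                      # prints: Hello, Jon!
```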
             System Calls
   Backticks execute a shell command and return
   its standard output:
$files = `ls`;
@files = split /\n/,$files;

  system( … ) just executes the command and returns
  its exit status: 0 if successful, nonzero if not
system ("mailx -s \"test mailing\" < file");
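A sketch of both kinds of system call. It assumes a UNIX-like shell where echo is available; for system it uses $^X (the path of the running perl) so the commands exist anywhere. The list form of system shown here bypasses the shell, and the exit code is the returned status shifted right by 8 bits.

```perl
use strict;
use warnings;

# Backticks capture standard output (assumes a shell with echo).
my $out = `echo hello`;
print "captured: $out";                          # captured: hello

# system returns the exit status: 0 means success.
my $ok  = system($^X, '-e', 'exit 0');
my $bad = system($^X, '-e', 'exit 3');
print "ok=$ok exit code=", $bad >> 8, "\n";      # ok=0 exit code=3
print "command failed\n" if $bad != 0;
```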
 Additional Resources
CGI Course, March 28 and April 6. See
Another PERL tutorial:
A Directory of PERL tutorials:
Schwartz, R., Christiansen, T., & Wall, L. (1997).
Learning Perl. Sebastopol, CA: O'Reilly & Associates.
 Additional Resources
The PERL Bookshelf (CD-ROM with 6
books). O'Reilly & Associates. Includes
Learning Perl.
Christiansen, T., & Torkington, N. (1998). Perl
Cookbook. Sebastopol, CA: O'Reilly & Associates.
UNIX for Windows
