  A Perl Tutorial

  NLP Course - 2006
What is Perl?
 Practical Extraction and Report Language
 Interpreted Language
   Optimized for String Manipulation and File I/O
   Full support for Regular Expressions
Running Perl Scripts
 Windows
   Download ActivePerl from ActiveState
   Just run the script from a 'Command Prompt'
 UNIX – Cygwin
   Put the following in the first line of your script
    #!/usr/bin/perl
   Run the script
    % perl script_name
Basic Syntax
 Statements end with semicolon ‘;’
 Comments start with ‘#’
   Only single line comments
 Variables
   You don’t have to declare a variable before you
    access it
   You don't have to declare a variable's type
Scalars and Identifiers
 Identifiers
   A variable name
   Case sensitive
 Scalar
   A single value (string or numerical)
   Accessed by prefixing an identifier with '$'
   Assignment with '='
    $scalar = expression
 Quoting Strings
   With ' (apostrophe)
     Everything is interpreted literally
   With " (double quotes)
     Variables get expanded
   With ` (backtick)
     The text is executed as a separate process, and
      the output of the command is returned as the
      value of the string

                                   Check 01_printDate.pl
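
The three quoting styles can be compared side by side. This is a minimal sketch in the same undeclared-variable style as the slides; the variable names and the echo command are assumptions chosen for illustration, not taken from 01_printDate.pl.

```perl
# Hypothetical value, used only to show expansion
$name = "Perl";

$single = 'Hello, $name\n';   # literal: $name is not expanded, \n stays as two characters
$double = "Hello, $name\n";   # $name is expanded and \n becomes a newline
$output = `echo hello`;       # runs the external echo command and captures what it prints

print $single, "\n";
print $double;
print $output;
```

Only the double-quoted string contains the value of $name; the backtick string contains whatever the external command wrote to its standard output.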
Comparison Operators
String   Operation                  Arithmetic
lt       less than                  <
gt       greater than               >
eq       equal to                   ==
le       less than or equal to      <=
ge       greater than or equal to   >=
ne       not equal to               !=
cmp      compare, return 1, 0, -1   <=>
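
Choosing the string or the arithmetic operator changes the result for the same operands. A small sketch, with hypothetical variable names, showing the difference:

```perl
# The same two values compared as strings and as numbers
$first  = "10";
$second = "9";

$as_strings = ($first lt $second) ? "true" : "false";   # "1" sorts before "9"
$as_numbers = ($first <  $second) ? "true" : "false";   # 10 is not less than 9

print "10 lt 9: $as_strings\n";   # true
print "10 <  9: $as_numbers\n";   # false

# cmp and <=> return -1, 0 or 1 depending on the ordering
$order_str = ("apple" cmp "banana");   # -1
$order_num = (10 <=> 9);               # 1
print "cmp: $order_str, <=>: $order_num\n";
```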
Logical Operators

    Operator   Operation
    ||, or     logical or
    &&, and    logical and
    !, not     logical not
    xor        logical xor
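
A short sketch of the logical operators, using hypothetical values. It also shows one difference between the symbolic and the word forms: or and and bind less tightly than assignment.

```perl
$x = 0;
$y = 5;

$either   = ($x || $y);               # 5: || returns the first true operand
$both     = ($x && $y);               # 0: && stops at the first false operand
$neither  = (!$x && !$y) ? 1 : 0;     # 0, because $y is true
$one_only = ($x xor $y)  ? 1 : 0;     # 1: exactly one operand is true

$low = $x or $y;                      # parsed as ($low = $x) or $y, so $low is 0

print "either=$either both=$both neither=$neither one_only=$one_only low=$low\n";
```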
String Operators
     Operator          Operation
     .                 string concatenation
     x                 string repetition
     .=                concatenation and assignment

$string1 = "potato";
$string2 = "head";

$newstring = $string1 . $string2; #"potatohead"
$newerstring = $string1 x 2; #"potatopotato"
$string1 .= $string2; #"potatohead"

                                           Check concat_input.pl
Perl Functions
 Perl functions are identified by their unique names
  (print, chop, close, etc)
 Function arguments are supplied as a comma
  separated list in parentheses.
    The commas are necessary
    The parentheses are often not
    Be careful! You can write some nasty and unreadable
     code this way!

                                    Check 02_unreadable.pl
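
The effect of dropping parentheses can be seen in a small sketch. The data and variable names are hypothetical and are not taken from 02_unreadable.pl.

```perl
@words = ("pear", "apple", "fig");   # hypothetical data

# With parentheses: it is clear which arguments belong to which function
$with = join(", ", sort(@words));

# Without parentheses: same result, but the reader has to work out the grouping
$without = join ", ", sort @words;

print "$with\n";
print "$without\n";

# A classic trap: print (1 + 2) * 3; treats (1 + 2) as the complete
# argument list of print, so it prints 3 rather than 9.
```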
Lists
 Ordered collection of scalars
    Zero indexed (first item in position '0')
    Elements addressed by their positions
 List Operators
    (): list constructor
    , : element separator
    []: take slices (single or multiple element chunks)
List Operations
 sort(LIST)
  a new list, the sorted version of LIST
 reverse(LIST)
  a new list, the reverse of LIST
 join(EXPR, LIST)
  a string version of LIST, delimited by EXPR
 split(PATTERN, EXPR)
   create a list from the portions of EXPR that lie between
   matches of PATTERN (PATTERN is the delimiter)

                                           Check 03_listOps.pl
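
The four list operations can be tried on a small list. This is a sketch with hypothetical data, not the contents of 03_listOps.pl; the numeric sort at the end uses a comparison block, which the slides do not cover.

```perl
@sorted   = sort("pear", "apple", "fig");   # ("apple", "fig", "pear")
@reversed = reverse(@sorted);               # ("pear", "fig", "apple")
$joined   = join("-", @sorted);             # "apple-fig-pear"
@pieces   = split(/-/, $joined);            # ("apple", "fig", "pear") again

# A slice picks several positions at once
@middle = ("a", "b", "c", "d")[1, 2];       # ("b", "c")

# sort compares as strings unless told otherwise
@as_strings = sort(10, 9, 100);                 # (10, 100, 9)
@as_numbers = sort { $a <=> $b } (10, 9, 100);  # (9, 10, 100)

print "$joined\n";
```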
Arrays
 A named list
   Dynamically allocated, can be saved
   Zero-indexed
   Shares list operations, and adds to them
 Array Operators
   @: refers to the whole array (or a slice of it, with [])
   $: refers to a single element (used with [])
Array Operations
 push(@ARRAY, LIST)
  add the LIST to the end of the @ARRAY
 pop(@ARRAY)
  remove and return the last element of @ARRAY
 unshift(@ARRAY, LIST)
  add the LIST to the front of @ARRAY
 shift(@ARRAY)
  remove and return the first element of @ARRAY
 scalar(@ARRAY)
   return the number of elements in the @ARRAY

                                        Check 04_arrayOps.pl
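
A compact sketch of the array operations, showing the array contents after each step. The array name and values are hypothetical and are not taken from 04_arrayOps.pl.

```perl
@queue = ("b", "c");

push(@queue, "d", "e");      # ("b", "c", "d", "e")
unshift(@queue, "a");        # ("a", "b", "c", "d", "e")
$last  = pop(@queue);        # "e"; @queue is now ("a", "b", "c", "d")
$first = shift(@queue);      # "a"; @queue is now ("b", "c", "d")

$count  = scalar(@queue);    # 3
$second = $queue[1];         # "c": a single element is written with $
@pair   = @queue[0, 1];      # ("b", "c"): a slice is written with @

print "first=$first last=$last count=$count\n";
```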
Associative Arrays - Hashes
 Arrays indexed on arbitrary string values
    Key-Value pairs
    Use the "Key" to find the element that has the "Value"
 Hash Operators
    % : refers to the hash
    {}: denotes the key
    $ : the value of the element indexed by the key
     (used with {})
Hash Operations
 keys(%HASH)
  return a list of all the keys in the %HASH
 values(%HASH)
  return a list of all the values in the %HASH
 each(%HASH)
  iterates through the key-value pairs of the %HASH,
  returning one (key, value) pair per call
 delete($HASH{KEY})
   removes the key-value pair associated with {KEY} from
   the %HASH
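
The hash operations can be sketched on a small hash. The names and numbers are hypothetical; keys are sorted before printing because a hash does not keep its entries in any particular order.

```perl
%age = ("alice" => 31, "bob" => 27, "carol" => 45);   # hypothetical data

$age{"dave"} = 52;           # add a key-value pair
delete($age{"bob"});         # remove the pair stored under "bob"

@names = sort(keys(%age));   # ("alice", "carol", "dave")

# each returns one (key, value) pair per call until the hash is exhausted
$total = 0;
while (($name, $years) = each(%age)) {
    $total += $years;
}

print "names: @names\n";
print "total: $total\n";     # 31 + 45 + 52
```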
Arrays Example
#!/usr/bin/perl
# Simple List operations

# Address an element in the list
@stringInstruments =
      ("violin","viola","cello","bass");
@brass =
      (..., "tuba");
$biggestInstrument = $stringInstruments[3];
print("The biggest instrument: ",
       $biggestInstrument,
       "\n");

# Join elements at positions 0, 1, 2 and 4 into a
# white-space delimited string
print("orchestral brass: ",
    join(" ",@brass[0,1,2,4]),
    "\n");

# Sort the list
@unsorted_num = ('3','5','2','1','4');
@sorted_num = sort( @unsorted_num );
print("Numbers (Sorted, 1-5): ",
    @sorted_num,
    "\n");

# Add a few more numbers
@numbers_10 = @sorted_num;
push(@numbers_10, ('6','7','8','9','10'));
print("Numbers (1-10): ",
    @numbers_10,
    "\n");

# Remove the last (pop returns the removed element, so 10 is printed
# and 1-9 remain in the array)
print("Numbers (1-9): ",
    pop(@numbers_10),
    "\n");

# Remove the first (shift returns the removed element, so 1 is printed
# and 2-9 remain in the array)
print("Numbers (2-9): ",
    shift(@numbers_10),
    "\n");

# Combine two ops: count the remaining elements
# ($#numbers_10 would give the last index, one less than the count)
print("Count elements (2-9): ",
    scalar( @numbers_10 ),
    "\n");
print("What's left (numbers 2-9): ",
    @numbers_10,
    "\n");
Hashes Example
# Simple hash operations

$player{"clarinet"} = "Susan Bartlett";
$player{"bassoon"} = "Andrew Vandesteeg";
$player{"flute"} = "Heidi Lawson";
$player{"oboe"} = "Jeanine Hassel";
@woodwinds = keys(%player);
@woodwindPlayers = values(%player);

# Who plays the oboe?
print("Oboe: ", $player{'oboe'}, "\n");

$playerCount = scalar(@woodwindPlayers);

# List every player with their instrument
while (($instrument, $name) = each(%player)) {
     print( "$name plays the $instrument\n" );
}
Pattern Matching
 A pattern is a sequence of characters to be
  searched for in a character string
   /pattern/
 Match operators
   =~: tests whether a pattern is matched
   !~: tests whether a pattern is not matched
  Pattern     Matches                      Pattern      Matches
  /def/       "define"                     /d.f/        dif
  /\bdef\b/   a def word                   /d.+f/       dabcf
  /^def/      def at the start of a line   /d.*f/       df, daffff
  /^def$/     a line containing only def   /de{1,3}f/   deef, deeef
  /de?f/      df, def                      /de{3}f/     deeef
  /d[eE]f/    def, dEf                     /de{3,}f/    deeeeef
  /d[^eE]f/   daf, dzf                     /de{0,3}f/   up to deeef
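
Some rows of the table can be checked directly. In this sketch the candidate words are hypothetical, and the quantifier patterns are anchored with ^ and $ so that each word has to match as a whole.

```perl
@candidates = ("df", "def", "deef", "deeef", "deeeef");

foreach $word (@candidates) {
    push(@optional,     $word) if $word =~ /^de?f$/;       # df, def
    push(@one_to_three, $word) if $word =~ /^de{1,3}f$/;   # def, deef, deeef
    push(@three_plus,   $word) if $word =~ /^de{3,}f$/;    # deeef, deeeef
}

# \b marks a word boundary, so "define" does not contain the word def
$is_word  = ("a def word" =~ /\bdef\b/) ? 1 : 0;   # 1
$not_word = ("define"     !~ /\bdef\b/) ? 1 : 0;   # 1

print "e?:     @optional\n";
print "e{1,3}: @one_to_three\n";
print "e{3,}:  @three_plus\n";
```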
Character Ranges
 Escape   Pattern          Description
 \d       [0-9]            Any digit
 \D       [^0-9]           Anything but a digit
 \w       [_0-9A-Za-z]     Any word character
 \W       [^_0-9A-Za-z]    Anything but a word char
 \s       [ \r\t\n\f]      White-space
 \S       [^ \r\t\n\f]     Anything but white-space
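
The escapes are easiest to understand on a sample string. The text below is hypothetical; the g option used here (match all occurrences) is described under Pattern Matching Options.

```perl
$text = "Room 42, floor 7";   # hypothetical input

@numbers = ($text =~ /\d+/g);     # ("42", "7")
@words   = ($text =~ /\w+/g);     # ("Room", "42", "floor", "7")
@chunks  = split(/\s+/, $text);   # ("Room", "42,", "floor", "7")

$digits_only = $text;
$digits_only =~ s/\D//g;          # remove every non-digit: "427"

print "numbers: @numbers\n";
print "digits only: $digits_only\n";
```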
Backreferences
 Memorize the matched portion of input
    Use parentheses to capture it
     /[a-z]+(.)[a-z]+\1[a-z]+/
     Matches: asd-eeed-sdsa, sd-sss-ws
     Does NOT match: as_eee-dfg
 Inside the pattern, \1 refers to what was matched by (.)
 The captured text can also be accessed immediately after the
  pattern is matched, as $1
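
The backreference pattern can be run against the three strings from the slide. A small sketch; the hash name %separator is hypothetical and simply records what the parentheses captured.

```perl
foreach $s ("asd-eeed-sdsa", "sd-sss-ws", "as_eee-dfg") {
    if ($s =~ /[a-z]+(.)[a-z]+\1[a-z]+/) {
        # $1 holds whatever (.) matched; \1 required the same text again
        $separator{$s} = $1;
        print "$s matches, separator is '$1'\n";
    } else {
        print "$s does not match\n";
    }
}
```

The third string fails because its two separators differ, so \1 cannot repeat what (.) captured.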
Pattern Matching Options
      Option           Description
         g      Match all possible patterns
         i      Ignore case
         x      Ignore white-space in pattern
 Substitution operator
   s/pattern/substitution/options
 If $string = "abc123def";
   $string =~ s/123/456/
  Result: "abc456def"
   $string =~ s/123//
  Result: "abcdef"
   $string =~ s/(\d+)/[$1]/
  Result: "abc[123]def"
  Use of backreference!
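
The options change what a substitution does. A minimal sketch with hypothetical strings, showing the effect of g and i and the use of two captured groups in the replacement:

```perl
$s1 = "abc123def456";
$s1 =~ s/\d+/N/;        # without g only the first match changes: "abcNdef456"

$s2 = "abc123def456";
$s2 =~ s/\d+/N/g;       # with g every match changes: "abcNdefN"

$s3 = "Please, PLEASE, please";
$count = ($s3 =~ s/please/thanks/gi);   # i ignores case; returns the number of substitutions

$s4 = "2006-03";
$s4 =~ s/(\d+)-(\d+)/$2\/$1/;           # swap the two captured groups: "03/2006"

print "$s1\n$s2\n$s3 ($count changes)\n$s4\n";
```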
Predefined Read-only Variables
$&         is the part of the string that matched the regular expression

$`         is the part of the string before the part that matched

$'         is the part of the string after the part that matched

$_ = "this is a sample string";
/sa.*le/; # matches "sample" within the string
# $` is now "this is a "
# $& is now "sample"
# $' is now " string"
Because these variables are set on each successful match, you should save the
values elsewhere if you need them later in the program.
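
Saving the values might look like the following sketch. The names $before, $matched, $after and $second are hypothetical; the second match is there only to show that the predefined variables get overwritten.

```perl
$_ = "this is a sample string";

if (/sa.*le/) {
    # copy the values before another match replaces them
    ($before, $matched, $after) = ($`, $&, $');
}

/is/;             # a second successful match, on the "is" inside "this"
$second = $&;     # $& is now "is", no longer "sample"

print "before='$before' matched='$matched' after='$after'\n";
print "second match: '$second'\n";
```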
The split and join Functions
The split function takes a regular expression and a string, and looks for all
occurrences of the regular expression within that string. The parts of the string
that don't match the regular expression are returned in sequence as a list of
values.

The join function takes a list of values and glues them together with a glue string
between each list element.
Split Example
$line = "...:/usr/bin/perl";
@fields = split(/:/,$line); # split $line, using : as delimiter
# now @fields is
# (..., "/home/merlyn","/usr/bin/perl")

Join Example
$bigstring = join($glue,@list);
For example, to rebuild the password file try something like:
$outline = join(":", @fields);
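
A complete round trip through split and join can be sketched as follows. The record is hypothetical and only resembles a password-file line; it is not the line used on the slide.

```perl
$line = "alice:x:1001:/home/alice:/bin/sh";   # hypothetical record

@fields = split(/:/, $line);     # ("alice", "x", "1001", "/home/alice", "/bin/sh")
$home   = $fields[3];            # "/home/alice"

$fields[4] = "/usr/bin/perl";    # change one field
$outline   = join(":", @fields); # glue the fields back together with ":"

# An empty field between two delimiters is kept as an empty string
@with_gap = split(/:/, "a::b");  # ("a", "", "b")

print "$outline\n";
```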
String - Pattern Examples
A simple Example

print ("Ask me a question politely:\n");

$question = <STDIN>;

# what about capital P in "please"?
if ($question =~ /please/) {
     print ("Thank you for being polite!\n");
} else {
     print ("That was not very polite!\n");
}
String – Pattern Example
print ("Enter a variable name:\n");
$varname = <STDIN>;
chop ($varname);
# Try asd$asdas... It gets accepted!
if ($varname =~ /\$[A-Za-z][_0-9a-zA-Z]*/) {
      print ("$varname is a legal scalar variable\n");
} elsif ($varname =~ /@[A-Za-z][_0-9a-zA-Z]*/) {
      print ("$varname is a legal array variable\n");
} elsif ($varname =~ /[A-Za-z][_0-9a-zA-Z]*/) {
      print ("$varname is a legal file variable\n");
} else {
      print ("I don't understand what $varname is.\n");
}
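
The reason asd$asdas is accepted is that the patterns may match anywhere inside the input. One possible fix, sketched here as an assumption rather than as the course's own solution, is to anchor each pattern with ^ and $ so that the whole input has to match. The test inputs and the %kind hash are hypothetical.

```perl
foreach $varname ('$total', '@items', 'count', 'asd$asdas') {
    if ($varname =~ /^\$[A-Za-z][_0-9a-zA-Z]*$/) {
        $kind{$varname} = "scalar";
    } elsif ($varname =~ /^\@[A-Za-z][_0-9a-zA-Z]*$/) {
        $kind{$varname} = "array";
    } elsif ($varname =~ /^[A-Za-z][_0-9a-zA-Z]*$/) {
        $kind{$varname} = "file";
    } else {
        $kind{$varname} = "unknown";   # asd$asdas ends up here
    }
    print "$varname: $kind{$varname}\n";
}
```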

