Perl Tutorial
Practical Extraction and Report Language

http://www.comp.leeds.ac.uk/Perl/start.html

Why Perl?

- Perl is built around regular expressions
  - REs are good for string processing
  - Therefore Perl is a good scripting language
  - Perl is especially popular for CGI scripts
- Perl makes full use of the power of UNIX
- Short Perl programs can be very short
  - “Perl is designed to make the easy jobs easy, without making the difficult jobs impossible.” -- Larry Wall, Programming Perl

Why not Perl?

- Perl is very UNIX-oriented
  - Perl is available on other platforms...
  - ...but isn’t always fully implemented there
  - However, Perl is often the best way to get some UNIX capabilities on less capable platforms
- Perl does not scale well to large programs
  - Weak subroutines, heavy use of global variables
- Perl’s syntax is not particularly appealing

What is a scripting language?

- Operating systems can do many things
  - copy, move, create, delete, compare files
  - execute programs, including compilers
  - schedule activities, monitor processes, etc.
- A command-line interface gives you access to these functions, but only one at a time
- A scripting language is a “wrapper” language that integrates OS functions

Major scripting languages

- UNIX has sh, Perl
- Macintosh has AppleScript, Frontier
- Windows has no major scripting languages
  - probably due to the weaknesses of DOS
- Generic scripting languages include:
  - Perl (most popular)
  - Tcl (easiest for beginners)
  - Python (new, Java-like, best for large programs)

Perl Example 1
 #!/usr/local/bin/perl
 #
 # Program to do the obvious
 #
 print 'Hello world.'; # Print a message




Comments on “Hello, World”

- Comments run from # to the end of the line
  - But the first line, #!/usr/local/bin/perl, tells the system where to find the Perl interpreter
- Perl statements end with semicolons
- Perl is case-sensitive
- Perl is compiled and run in a single operation

Variables

- A variable is the name of a place where some information is stored. For example:

    $yearOfBirth = 1976;
    $currentYear = 2000;
    $age = $currentYear - $yearOfBirth;
    print $age;

- The same variable can also store a string:

    $yearOfBirth = 'None of your business';

- The variables in the example program can be identified as such because their names start with a dollar sign ($). Perl uses a different prefix character for each kind of name in a program. Here is an overview:
  - $: variable containing a scalar value such as a number or a string
  - @: variable containing a list with numeric keys (an array)
  - %: variable containing a list with strings as keys (a hash)
  - &: subroutine

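The four prefixes can be seen side by side in a minimal sketch. The variable names and values are made up for illustration, and use strict, use warnings and my are additions that the slides themselves do not use.

```perl
use strict;
use warnings;

# One name of each kind; all names here are hypothetical.
my $name  = "Larry";                      # $ scalar: one string or number
my @years = (1987, 1994, 2000);           # @ array: list indexed 0, 1, 2, ...
my %born  = (An => 1975, Bert => 1953);   # % hash: list indexed by strings

sub greet {                               # & subroutine (the & is optional in a call)
    my ($who) = @_;
    return "Hello, $who";
}

# A single element is a scalar, so it is written with $
# even though it lives in an array or a hash.
print greet($name), "\n";
print "First year: $years[0]\n";
print "Bert was born in $born{Bert}\n";
```

Note that $years[0] and $born{Bert} use $, not @ or %, because each names one scalar value.
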
Operations on numbers

- Perl contains the following arithmetic operators:
  - +: sum
  - -: subtraction
  - *: product
  - /: division
  - %: modulo division
  - **: exponent
- Apart from these operators, Perl contains some built-in arithmetic functions. Some of these are mentioned in the following list:
  - abs($x): absolute value
  - int($x): integer part
  - rand(): random number from 0 up to, but not including, 1
  - sqrt($x): square root

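A short sketch that combines these operators and functions. The points and the number of seconds are arbitrary sample values; rand(6), with an argument, returns a number from 0 up to but not including 6.

```perl
use strict;
use warnings;

# Distance between two hypothetical points (1,2) and (4,6)
my ($x1, $y1) = (1, 2);
my ($x2, $y2) = (4, 6);
my $dx = $x2 - $x1;
my $dy = $y2 - $y1;
my $distance = sqrt($dx**2 + $dy**2);

# Split a number of seconds into hours, minutes and seconds
my $seconds = 3725;
my $hours   = int($seconds / 3600);          # integer part of the division
my $minutes = int(($seconds % 3600) / 60);   # remainder, then integer part
my $rest    = $seconds % 60;

# A random whole number from 1 to 6
my $die_roll = 1 + int(rand(6));

print "distance = $distance\n";
printf "%d:%02d:%02d\n", $hours, $minutes, $rest;
print "abs(-7.5) = ", abs(-7.5), "\n";
```
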
Test your understanding

What does each of these statements do?

    $text =~ s/bug/feature/;

    $text =~ s/bug/feature/g;

    $text =~ tr/A-Z/a-z/;

    $text =~ tr/AEIOUaeiou//d;

    $text =~ tr/0-9/x/cs;

    $text =~ s/[A-Z]/CAPS/g;

Examples

    # replace first occurrence of "bug"
    $text =~ s/bug/feature/;

    # replace all occurrences of "bug"
    $text =~ s/bug/feature/g;

    # convert to lower case
    # (tr takes plain character lists; square brackets would be literal characters)
    $text =~ tr/A-Z/a-z/;

    # delete vowels
    $text =~ tr/AEIOUaeiou//d;

    # replace each sequence of non-digit characters with a single x
    $text =~ tr/0-9/x/cs;

    # replace each capital character by CAPS
    $text =~ s/[A-Z]/CAPS/g;

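To try the substitution and translation operators without destroying the original string, each one can work on a copy. This is a sketch with an invented sample sentence; the (my $copy = $text) =~ ... idiom and the counting use of tr are additions, not taken from the slides.

```perl
use strict;
use warnings;

my $text = "A bug is a bug, not a Feature 42.";

# Copy first, then modify the copy
(my $first = $text) =~ s/bug/feature/;     # only the first "bug"
(my $all   = $text) =~ s/bug/feature/g;    # every "bug"
(my $lower = $text) =~ tr/A-Z/a-z/;        # upper case to lower case
(my $novow = $text) =~ tr/AEIOUaeiou//d;   # delete vowels

# tr returns the number of characters it matched; with an empty
# replacement list and no d option the string is left unchanged.
my $digits = ($text =~ tr/0-9//);

print "$first\n$all\n$lower\n$novow\n";
print "digits: $digits\n";
```
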
Regular expressions

- Characters, classes and anchors:
  - \b: word boundaries
  - \d: digits
  - \n: newline
  - \r: carriage return
  - \s: white space characters
  - \t: tab
  - \w: alphanumeric characters
  - ^: beginning of string
  - $: end of string
  - .: any character
  - [bdkp]: characters b, d, k and p
  - [a-f]: characters a to f
  - [^a-f]: all characters except a to f
  - abc|def: string abc or string def
- Repetition:
  - *: zero or more times
  - +: one or more times
  - ?: zero or one time
  - {p,q}: at least p times and at most q times
  - {p,}: at least p times
  - {p}: exactly p times
- Examples:
  1. Clean an HTML formatted text
  2. Grab URLs from a Web page
  3. Transform all lines from a file into lower case

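The three example tasks could be sketched as follows. The HTML snippet is invented, and the patterns are deliberately simple: they assume double-quoted href attributes and tags without a > inside them, which real Web pages do not guarantee.

```perl
use strict;
use warnings;

# Hypothetical input; a real page would be read from a file.
my $html = '<p>See <a href="http://www.perl.org/">Perl</a> and '
         . '<A HREF="http://www.cpan.org/">CPAN</A>.</p>';

# 2. Grab URLs: capture what sits between the quotes of each href.
#    In list context a /g match returns all captured parts.
my @urls = $html =~ /href="([^"]+)"/gi;

# 1. Clean HTML: delete everything from a < up to the next >.
(my $plain = $html) =~ s/<[^>]*>//g;

# 3. Lower case: lc is a Perl built-in (tr/A-Z/a-z/ also works for ASCII).
my $lower = lc $plain;

print "$_\n" for @urls;
print "$plain\n$lower\n";
```
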
Lists and arrays

    @a = ();                              # empty list
    @b = (1,2,3);                         # three numbers
    @c = ("Jan","Piet","Marie");          # three strings
    @d = ("Dirk",1.92,46,"20-03-1977");   # a mixed list

- Variables and sublists are interpolated in a list

    @b = ($a,$a+1,$a+2);                     # variable interpolation
    @c = ("Jan",("Piet","Marie"));           # list interpolation
    @d = ("Dirk",1.92,46,(),"20-03-1977");   # empty list
    # nested lists are flattened: you never get a list containing lists, just one simple list

    @e = ( @b, @c );   # with @b = (1,2,3), same as (1,2,3,"Jan","Piet","Marie")

Lists and arrays

- Practical construction operators
  - The range operator ($x..$y)

    @x = (1..6);            # same as (1, 2, 3, 4, 5, 6)
    @z = (2..5,8,11..13);   # same as (2,3,4,5,8,11,12,13)

  - The qw() "quote word" function

    qw(Jan Piet Marie) is a shorter notation for ("Jan","Piet","Marie").

Split

- split takes a regular expression and a string, and splits the string into a list, breaking it into pieces at places where the regular expression matches.

    $string = "Jan Piet\nMarie \tDirk";
    @list = split /\s+/, $string; # yields ( "Jan","Piet","Marie","Dirk" )
        # remember \s is a white space

    $string = " Jan Piet\nMarie \tDirk\n"; # white space at begin and end
    @list = split /\s+/, $string; # yields ( "", "Jan","Piet","Marie","Dirk" )
        # the leading empty string is kept, but trailing empty strings are dropped

    $string = "Jan:Piet;Marie---Dirk"; # use any regular expression...
    @list = split /[:;]|---/, $string; # yields ( "Jan","Piet","Marie","Dirk" )

    $string = "Jan Piet"; # use an empty regular expression to split into characters
    @letters = split //, $string; # yields ( "J","a","n"," ","P","i","e","t")

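split also accepts an optional third argument, a limit on the number of fields, which controls what happens to empty fields at the end. A minimal sketch with an invented colon-separated line:

```perl
use strict;
use warnings;

my $line = "an:bert::dirk::";

my @default = split /:/, $line;       # trailing empty fields are dropped
my @all     = split /:/, $line, -1;   # a negative limit keeps them
my @two     = split /:/, $line, 2;    # at most two fields; the rest stays together

# A single-space string as the pattern is a special case:
# it splits on white space and skips leading white space.
my @words = split ' ', "  Jan Piet\n";

print scalar(@default), " fields by default\n";
print scalar(@all), " fields with limit -1\n";
print "second of two: $two[1]\n";
print "words: @words\n";
```

The empty field in the middle ("bert::dirk") is always kept; only empty fields at the very end are affected by the limit.
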
More about arrays

    @array = ("an","bert","cindy","dirk");

    $length = @array;        # $length now has the value 4

    print $length;           # prints 4

    print $#array;           # prints 3, last valid subscript

    print $array[$#array];   # prints "dirk"

    print scalar(@array);    # prints 4

Working with lists
Array elements interpolate into strings
@array = ("an","bert","cindy","dirk");
print "The array contains $array[0] $array[1] $array[2] $array[3]";

# a whole array interpolates too, with its elements separated by spaces
print "The array contains @array";

The function join STRING, LIST glues the list items together with STRING between them
$string = join ":", @array;
# $string now has the value "an:bert:cindy:dirk"

Iteration over lists
for( $i=0 ; $i<=$#array; $i++){
   $item = $array[$i];
   $item =~ tr/a-z/A-Z/;
   print "$item ";
}

foreach $item (@array){
   $item =~ tr/a-z/A-Z/;
   print "$item "; # prints an upper-case version of each item
}
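
The two loops differ in one respect worth seeing in action: the for loop changes a copy of each element, while foreach makes its loop variable an alias for the element itself. A minimal sketch (use strict and my are additions to the style of the slides):

```perl
use strict;
use warnings;

my @array = ("an", "bert", "cindy", "dirk");

# Index loop working on a copy: @array is left untouched.
my @upper;
for (my $i = 0; $i <= $#array; $i++) {
    my $item = $array[$i];
    $item =~ tr/a-z/A-Z/;
    push @upper, $item;
}
print "@array\n";   # still lower case
print "@upper\n";

# foreach aliases the loop variable to each element,
# so the same tr now changes @array itself.
foreach my $item (@array) {
    $item =~ tr/a-z/A-Z/;
}
print "@array\n";   # now upper case
```
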
More about arrays – multiple value assignments

    ($a, $b) = ("one","two");
    ($onething, @manythings) = (1,2,3,4,5,6);
      # now $onething equals 1
      # and @manythings = (2,3,4,5,6)
    ($array[0],$array[1]) = ($array[1],$array[0]);
      # swap the first two

- Pay attention to the fact that assignment to a variable first evaluates the right-hand side of the expression, and then makes a copy of the result

    @array = ("an","bert","cindy","dirk");
    @copyarray = @array;       # copies every element
    $copyarray[2] = "XXXXX";   # @array still contains "cindy"

Manipulating lists and their elements: push

- push ARRAY, LIST
  - appends the list to the end of the array.
  - if the second argument is a scalar rather than a list, it appends it as the last item of the array.

    @array = ("an","bert","cindy","dirk");
    @brray = ("eve","frank");

    push @array, @brray;
    # @array is ("an","bert","cindy","dirk","eve","frank")

    push @brray, "gerben";
    # @brray is ("eve","frank","gerben")

Manipulating lists and their elements: pop

- pop ARRAY does the opposite of push. It removes the last item of its argument list and returns it.
- If the list is empty it returns undef.

    @array = ("an","bert","cindy","dirk");
    $item = pop @array;
    # $item is "dirk" and @array is ( "an","bert","cindy")

- shift @array removes the first element: it works on the left end of the list, but is otherwise the same as pop.
- unshift (@array, @newStuff) puts stuff on the left side of the list, just as push does for the right side.

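The four functions together, as a minimal sketch. The queue at the end is an illustration of one common use (push on the right, shift on the left); it is not from the slides.

```perl
use strict;
use warnings;

my @list = ("bert", "cindy");

push    @list, "dirk";     # add on the right
unshift @list, "an";       # add on the left
my $full = "@list";        # "an bert cindy dirk"

my $last  = pop   @list;   # remove from the right
my $first = shift @list;   # remove from the left

# push + shift gives a first-in, first-out queue
my @queue;
push @queue, $_ for 1 .. 3;
my @served;
while (@queue) {
    push @served, shift @queue;
}

# pop on an empty array returns undef
my @empty;
my $nothing = pop @empty;

print "$full\n";
print "last: $last, first: $first, left over: @list\n";
print "served in order: @served\n";
```
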
Grep

- grep CONDITION, LIST
  - returns a list of all items from the list that satisfy some condition.
- For example, with @array = ("an","bert","cindy","dirk"):

    @large = grep $_ > 10, (1,2,4,8,16,25); # returns (16,25)

    @i_names = grep /i/, @array; # returns ("cindy","dirk")

map

- map OPERATION, LIST
  - is related to grep: it performs an arbitrary operation on each element of a list and returns the list of results.
- For example:

    @array = ("an","bert","cindy","dirk");

    @more = map $_ + 3, (1,2,4,8,16,25);
    # returns (4,5,7,11,19,28)

    @initials = map substr($_,0,1), @array;
    # returns ("a","b","c","d")

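grep and map also have a block form, written with braces and no comma before the list, and the two combine naturally. A minimal sketch using the same @array; the other variable names are made up.

```perl
use strict;
use warnings;

my @array = ("an", "bert", "cindy", "dirk");

# Keep the names longer than two characters
my @long = grep { length($_) > 2 } @array;

# Capitalize the first letter of each remaining name
my @caps = map { ucfirst($_) } @long;

# map can return more than one value per item, here a key and a value,
# which builds a hash from name to length
my %length = map { ($_ => length($_)) } @array;

# In scalar context grep returns the number of matching items
my $i_count = grep { /i/ } @array;

print "@long\n@caps\n";
print "cindy has $length{cindy} letters\n";
print "$i_count names contain an i\n";
```
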
Hashes (Associative Arrays)

-   associate keys with values – named with %
-   allows for almost instantaneous lookup of a value
    that is associated with some particular key


Examples
if %wordfrequency is the hash table,
$wordfrequency{"the"} = 12731; # creates key "the", value 12731
$phonenumber{"An De Wilde"} = "+31-20-6777871";
$index{$word} = $nwords;
$occurrences{$a}++; # if this is the first reference,
                    # the value associated with $a will
                    # be increased from 0 to 1

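The ++ idiom above is all that is needed to count word frequencies. A minimal sketch with an invented sentence; %wordfrequency is the hash named on the slide, and sort is used because keys returns the keys in no particular order.

```perl
use strict;
use warnings;

my $text = "the cat saw the dog and the dog saw the cat";

# A missing key starts out as undef, which ++ treats as 0
my %wordfrequency;
$wordfrequency{$_}++ for split ' ', $text;

for my $word (sort keys %wordfrequency) {
    print "$word: $wordfrequency{$word}\n";
}
```
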
    Hash Operations

    # fill the hash
    %birthdays = ("An","25-02-1975","Bert","12-10-1953",
                  "Cindy","23-05-1969","Dirk","01-04-1961");

    # fill the hash; the same as above, but more explicit
    %birthdays = (An => "25-02-1975", Bert => "12-10-1953",
                  Cindy => "23-05-1969", Dirk => "01-04-1961");

    @list = %birthdays;            # make a list of the key/value pairs

    %copy_of_bdays = %birthdays;   # copy a hash

Hashes (What if not there?)
-   Existing, defined and true are three different tests.

-   If a key does not exist in the hash, accessing it returns the undef value.

-   The special test function exists(HASHENTRY) returns true if the key exists in the hash, whatever its value.

-   if(defined($hash{$key})){...} is false if the key $key is missing or its value is undef.
-   if($hash{$key}){...} is also false if the value is 0, "0" or the empty string.
-   print "Exists\n" if exists $hash{$key};

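The three tests give different answers for different keys. A minimal sketch; the %stock hash and the describe subroutine are hypothetical names used only for this illustration.

```perl
use strict;
use warnings;

# pears has a false value, plums has no value, kiwis is not a key at all
my %stock = (apples => 12, pears => 0, plums => undef);

sub describe {
    my ($key) = @_;
    return "no such key"       unless exists  $stock{$key};
    return "key without value" unless defined $stock{$key};
    return "value is false"    unless $stock{$key};
    return "value is true";
}

print "$_: ", describe($_), "\n" for qw(apples pears plums kiwis);
```

Testing from the most permissive check (exists) to the strictest (truth) is what lets each case be told apart.
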
Perl Example 2
#!/usr/bin/perl
# Remove blank lines from a file
# Usage: singlespace < oldfile > newfile

while ($line = <STDIN>) {
  if ($line eq "\n") { next; }
  print $line;
}

More Perl notes

- On the UNIX command line:
  - < filename means to get input from this file
  - > filename means to send output to this file
- In Perl, STDIN is the input filehandle and STDOUT is the output filehandle; <STDIN> reads one line of input, and print writes to STDOUT by default
- Scalar variables start with $
- Scalar variables hold strings or numbers, and they are interchangeable
- Examples:
    $priority = 9;
    $priority = '9';
- Array variables start with @

 Perl Example 3
#!/usr/local/bin/perl
# Usage: fixm <filenames>
# Replace \r with \n -- replaces input files

foreach $file (@ARGV) {
   print "Processing $file\n";
   if (-e "fixm_temp") { die "*** File fixm_temp already exists!\n"; }
   if (! -e $file) { die "*** No such file: $file!\n"; }
   open DOIT, "| tr \'\\015' \'\\012' < $file > fixm_temp"
       or die "*** Can't: tr '\\015' '\\012' < $file > fixm_temp\n";
   close DOIT;
   open DOIT, "| mv -f fixm_temp $file"
       or die "*** Can't: mv -f fixm_temp $file\n";
   close DOIT;
}

Comments on example 3

- In # Usage: fixm <filenames>, the angle brackets just mean to supply a list of file names here
- In UNIX text editors, the \r (carriage return) character usually shows up as ^M (hence the name fixm_temp)
- The UNIX command tr '\015' '\012' replaces all \015 characters (\r) with \012 (\n) characters
- The format of the open and close commands is:
    open fileHandle, fileName
    close fileHandle
- A fileName that starts with | is run as a command, with the filehandle connected to the command's input
- "| tr \'\\015' \'\\012' < $file > fixm_temp" says: run the tr command, taking input from $file and putting the output in fixm_temp

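The same conversion can be done without the external tr and mv commands, using Perl's own tr operator. This is an alternative sketch, not the method of example 3: the subroutine names fix_cr and fix_file are hypothetical, and the file is overwritten directly, without the fixm_temp safety step.

```perl
use strict;
use warnings;

# Replace every carriage return (\015) with a newline (\012) in a string.
sub fix_cr {
    my ($content) = @_;
    $content =~ tr/\015/\012/;
    return $content;
}

# Read a whole file, convert it, and write it back.
sub fix_file {
    my ($file) = @_;
    open my $in, '<', $file or die "*** Can't read $file: $!\n";
    binmode $in;
    local $/;                  # no line separator: read the file in one piece
    my $content = <$in>;
    close $in;

    open my $out, '>', $file or die "*** Can't write $file: $!\n";
    binmode $out;
    print {$out} fix_cr($content);
    close $out or die "*** Can't close $file: $!\n";
}

foreach my $file (@ARGV) {
    print "Processing $file\n";
    fix_file($file);
}
```

The three-argument form of open with a lexical filehandle (my $in) keeps file names containing | or > from being interpreted as commands or redirections.
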
Arithmetic in Perl
$a = 1 + 2;       # Add 1 and 2 and store in $a
$a = 3 - 4;       # Subtract 4 from 3 and store in $a
$a = 5 * 6;       # Multiply 5 and 6
$a = 7 / 8;       # Divide 7 by 8 to give 0.875
$a = 9 ** 10;     # Nine to the power of 10, that is, 9^10 (3486784401)
$a = 5 % 2;       # Remainder of 5 divided by 2
++$a;             # Increment $a and then return it
$a++;             # Return $a and then increment it
--$a;             # Decrement $a and then return it
$a--;             # Return $a and then decrement it

String and assignment operators

 $a = $b . $c; # Concatenate $b and $c
 $a = $b x $c; # $b repeated $c times

 $a = $b;    # Assign $b to $a
 $a += $b;   # Add $b to $a
 $a -= $b;   # Subtract $b from $a
 $a .= $b;   # Append $b onto $a

Single and double quotes

- Single quotes take the text literally; double quotes interpolate variables

    $a = 'apples';
    $b = 'bananas';

- print $a . ' and ' . $b;
  - prints: apples and bananas
- print '$a and $b';
  - prints: $a and $b
- print "$a and $b";
  - prints: apples and bananas

Arrays

- @food = ("apples", "bananas", "cherries");
- But a single element is a scalar, so it is written with $:
    print $food[1];
  - prints bananas
- @morefood = ("meat", @food);
  - @morefood is now ("meat", "apples", "bananas", "cherries")
- ($a, $b, $c) = (5, 10, 20);

push and pop

- push adds one or more things to the end of a list
    push (@food, "eggs", "bread");
  - push returns the new length of the list
- pop removes and returns the last element
    $sandwich = pop(@food);
- $len = @food; # $len gets length of @food
- $#food # returns index of last element

foreach

 # Visit each item in turn and call it $morsel

 foreach $morsel (@food)
 {
      print "$morsel\n";
      print "Yum yum\n";
 }


Tests

- “Zero” is false. This includes:
    0, '0', "0", '', "", and the undefined value undef
- Anything not false is true
- Use == and != for numbers, eq and ne for strings
- &&, ||, and ! are and, or, and not, respectively.

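A few values near the edge of these rules, as a minimal sketch. The truth subroutine is a hypothetical helper that reports how Perl treats its argument in a test.

```perl
use strict;
use warnings;

sub truth { return $_[0] ? "true" : "false" }

# These are false
my @false = map { truth($_) } (0, '0', '', undef);

# These look like zero or nothing, but are true:
# only the exact string "0" and the empty string are false strings
my @true = map { truth($_) } ('0.0', '00', ' ', "0\n");

# == compares as numbers, eq compares as strings
my $same_number = ('1.0' == 1)   ? "yes" : "no";
my $same_string = ('1.0' eq '1') ? "yes" : "no";

print "@false\n@true\n";
print "same number: $same_number, same string: $same_string\n";
```

The "0\n" case is why input read with <STDIN> should have its newline removed before it is tested.
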
for loops

- for loops are just as in C or Java

    for ($i = 0; $i < 10; ++$i)
    {
        print "$i\n";
    }

while loops
#!/usr/local/bin/perl
print "Password? ";
$a = <STDIN>;
chomp $a;          # Remove the newline at end (chop would remove any last character)
while ($a ne "fred")
{
   print "sorry. Again? ";
   $a = <STDIN>;
   chomp $a;
}
do..while and do..until loops
#!/usr/local/bin/perl
do
{
    print "Password? ";
    $a = <STDIN>;
    chomp $a;
}
while ($a ne "fred");

if statements
if ($a)
{
      print "The string is not empty\n";
}
else
{
      print "The string is empty\n";
}

if - elsif statements
 if (!$a)
   { print "The string is empty\n"; }
 elsif (length($a) == 1)
   { print "The string has one character\n"; }
 elsif (length($a) == 2)
   { print "The string has two characters\n"; }
 else
   { print "The string has many characters\n"; }


Why Perl?

- Two factors make Perl important:
  - Pattern matching/string manipulation
    - Based on regular expressions (REs)
    - REs are similar in power to those in Formal Languages…
    - …but have many convenience features
  - Ability to execute UNIX commands
    - Less useful outside a UNIX environment

Basic pattern matching

- $sentence =~ /the/
  - True if $sentence contains "the"
- $sentence = "The dog bites.";
  if ($sentence =~ /the/) # is false
  - …because Perl is case-sensitive
- !~ is "does not contain"

RE special characters
 .   # Any single character except a newline
 ^   # The beginning of the line or string
 $   # The end of the line or string
 *   # Zero or more of the last character
 +   # One or more of the last character
 ?   # Zero or one of the last character



RE examples
^.*$      # matches the entire string
hi.*bye   # matches from "hi" to "bye" inclusive
x +y      # matches x, one or more blanks, and y
^Dear     # matches "Dear" only at beginning
bags?     # matches "bag" or "bags"
hiss+     # matches "hiss", "hisss", "hissss", etc.



Square brackets
[qjk]     # Either q or j or k
[^qjk]    # Neither q nor j nor k
[a-z]     # Anything from a to z inclusive
[^a-z]    # No lower case letters
[a-zA-Z] # Any letter
[a-z]+    # Any non-zero sequence of
          # lower case letters
More examples
[aeiou]+   # matches one or more vowels
[^aeiou]+ # matches one or more nonvowels
[0-9]+     # matches an unsigned integer
[0-9A-F]   # matches a single hex digit
[a-zA-Z]   # matches any letter
[a-zA-Z0-9_]+ # matches identifiers

More special characters
\n    # A newline
\t   # A tab
\w   # Any alphanumeric; same as [a-zA-Z0-9_]
\W   # Any non-word char; same as [^a-zA-Z0-9_]
\d    # Any digit. The same as [0-9]
\D   # Any non-digit. The same as [^0-9]
\s   # Any whitespace character
\S   # Any non-whitespace character
\b   # A word boundary, outside [] only
\B   # No word boundary
Quoting special characters
  \|   # Vertical bar
  \[   # An open square bracket
  \)   # A closing parenthesis
  \*   # An asterisk
  \^   # A caret symbol
  \/   # A slash
  \\   # A backslash

Alternatives and parentheses
 jelly|cream # Either jelly or cream

 (eg|le)gs    # Either eggs or legs

 (da)+        # Either da or dada or
              # dadada or...



The $_ variable

- Often we want to process one string repeatedly
- The $_ variable holds the current string
- If a subject is omitted, $_ is assumed
- Hence, the following are equivalent:
    if ($sentence =~ /under/) …
    $_ = $sentence; if (/under/) ...

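Several built-ins besides pattern matching fall back on $_ as well. A minimal sketch with invented input lines; the comments show the explicit form each short form stands for.

```perl
use strict;
use warnings;

my @lines = ("under the bridge\n", "over the hill\n", "Undercover\n");
my @hits;

for (@lines) {              # no loop variable: each element becomes $_
    chomp;                  # same as chomp($_)
    next unless /under/i;   # same as $_ =~ /under/i
    push @hits, $_;
}

print "$_\n" for @hits;
```
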
Case-insensitive substitutions

- s/london/London/i
  - case-insensitive substitution; will replace london, LONDON, London, LoNDoN, etc.
- You can combine global substitution with case-insensitive substitution
  - s/london/London/gi

Remembering patterns

- Any part of the pattern enclosed in parentheses is assigned to the special variables $1, $2, $3, …, $9
- Numbers are assigned according to the left (opening) parentheses
- "The moon is high" =~ /The (.*) is (.*)/
  - Afterwards, $1 = "moon" and $2 = "high"

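Remembered parts are typically used to pull fields out of a string or to rearrange them. A minimal sketch; the date format is taken from the birthday examples, and the variable names are made up.

```perl
use strict;
use warnings;

my $date = "20-03-1977";

# $1, $2, $3 are only meaningful after a successful match, so test it first
my ($day, $month, $year);
if ($date =~ /^(\d\d)-(\d\d)-(\d{4})$/) {
    ($day, $month, $year) = ($1, $2, $3);
}

# In list context the match returns the remembered parts directly
my ($subject, $state) = "The moon is high" =~ /The (.*) is (.*)/;

# The remembered parts can be reused in the replacement part of s///
(my $iso = $date) =~ s/^(\d\d)-(\d\d)-(\d{4})$/$3-$2-$1/;

print "day $day, month $month, year $year\n";
print "$subject / $state\n";
print "$iso\n";
```
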
Dynamic matching

- During the match, an early part of the match that is tentatively assigned to $1, $2, etc. can be referred to by \1, \2, etc.
- Example:
  - \b\w+\b matches a single word
  - /(\b\w+\b) \1/ matches repeated words
  - "Now is the the time" =~ /(\b\w+\b) \1/
  - Afterwards, $1 = "the"

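With the g option the same idea finds every repeated word in a text. A minimal sketch; repeated_words is a hypothetical helper, and the pattern uses \s+ so that the two copies may be separated by any white space.

```perl
use strict;
use warnings;

# Return each word that is immediately followed by a copy of itself.
sub repeated_words {
    my ($text) = @_;
    my @found;
    while ($text =~ /\b(\w+)\s+\1\b/g) {
        push @found, $1;
    }
    return @found;
}

my @dups = repeated_words("Now is the the time to to act");
print "repeated: @dups\n";
```

The \b at both ends matters: without it, "this is" would count as a repeat of "is".
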

				