                Perl
• Perl - Practical Extraction and Report
  Language
   –   for text files
   –   system management
   –   combines C, SED, AWK, SH
   –   interpreted
   –   dynamic




                                  Perl notes 2
    Data Structures

• scalars         $num
• arrays          @num
• associative arrays %num

• $num[50]
  – element at index 50 of the array num
    (the 51st element; indexing starts at 0)
• $#num
  – last index of num
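
A minimal sketch of the three data types and the index notation above. The variable names and values are made up for illustration.

```perl
use strict;
use warnings;

my $num = 42;                      # scalar
my @num = (10, 20, 30, 40);        # array (a separate variable from $num)
my %num = (one => 1, two => 2);    # associative array (hash)

my $third = $num[2];               # element at index 2, i.e. the third element
my $last  = $#num;                 # last index: one less than the element count
my $count = scalar @num;           # number of elements

print "third=$third last_index=$last count=$count two=$num{two}\n";
```

Note that $num, @num and %num are three unrelated variables; the sigil in front of an element access ($num[2], $num{two}) describes the value returned, not the container.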




                Examples
#! /usr/local/bin/perl -w
# find the sum of a list of numbers from STDIN
# one number per line

$sum = 0;
while( <STDIN> ) {
      $sum += int $_;
}
print "the sum is $sum\n";




              Examples
#!/usr/bin/perl -w
# find the sum of a list of numbers from STDIN
# several numbers per line

$sum = 0;
while( <STDIN> ) {
      @nums = split;
      foreach (@nums) {
           $sum += int $_;
      }
}
print "the sum is $sum\n";




              Average
#!/usr/bin/perl -w
# find the average of a list of
# numbers from STDIN
# several numbers per line

$sum = 0;
$count = 0;
while( <STDIN> ) {
      @nums = split;
      foreach (@nums) {
           $sum += int $_;
           $count++;
      }
}
print "the average is ", $sum/$count, "\n";

                   median
#!/usr/bin/perl -w
# find the median of a list of numbers
# from STDIN
# several numbers per line

@nums = ();
while( <STDIN> ) {
     @nums = (@nums, split );
}
@nums = sort { $a <=> $b } @nums;   # numeric sort; plain sort compares as strings
if($#nums % 2) {
     $median = ($nums[($#nums - 1)/2] +
                $nums[($#nums + 1)/2])/2;
}
else {
     $median = $nums[$#nums/2];
}

print "the median is $median\n";



                 Output?
#!/usr/bin/perl -w
@stuff = ("one", "two", "three");
print @stuff, "\n";
$stuff = ("one", "two", "three");
print $stuff, "\n";
$stuff = @stuff;
print $stuff, "\n";



onetwothree
three
3
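
The difference between the three results above comes from context. A small sketch, with illustrative variable names, of how the same array behaves in scalar and list context:

```perl
use strict;
use warnings;

my @stuff = ("one", "two", "three");

my $count   = @stuff;       # scalar context: the number of elements
my ($first) = @stuff;       # list context: the first element is assigned
my $last    = $stuff[-1];   # a negative index counts from the end

print "$count $first $last\n";
```

The parentheses around $first are what switch the assignment to list context.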




       Pattern Matching
m//
s///

Modifiers
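• i case-insensitive
• m multiple lines (^ and $ match at line boundaries)
• s single line (. also matches newline)
• x extended (whitespace and comments allowed in the pattern)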
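
A short sketch of what each modifier changes. The sample text and variable names are assumptions chosen for the demonstration.

```perl
use strict;
use warnings;

my $text = "First line\nsecond LINE\n";

my $i_match = ($text =~ /first/i)       ? 1 : 0;   # 1: case is ignored
my $no_m    = ($text =~ /^second/)      ? 1 : 0;   # 0: ^ matches only at the start of the string
my $m_match = ($text =~ /^second/m)     ? 1 : 0;   # 1: ^ also matches after a newline
my $no_s    = ($text =~ /First.*LINE/)  ? 1 : 0;   # 0: . does not match the newline
my $s_match = ($text =~ /First.*LINE/s) ? 1 : 0;   # 1: . matches the newline too

# With x, whitespace and comments inside the pattern are ignored.
my ($word) = $text =~ /
    (\w+)    # a word
    \s+      # then whitespace
    line     # then the literal text line
/x;

print "$i_match $no_m $m_match $no_s $s_match $word\n";
```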




 Regular Expressions
Code   Meaning
\w     Alphanumeric Characters (and underscore)
\W     Non-Alphanumeric Characters
\s     White Space
\S     Non-White Space
\d     Digits
\D     Non-Digits
\b     Word Boundary
\B     Non-Word Boundary
\A ^   At the Beginning of a String
\Z $   At the End of a String
.      Match Any Single Character (except newline)




  Regular Expressions
*                   Zero or More Occurrences
?                   Zero or One Occurrence
+                   One or More Occurrences
{N}                 Exactly N Occurrences
{N,M}               Between N and M Occurrences
.*<thingy>          Greedy Match, up to the last thingy
.*?<thingy>         Non-Greedy Match, up to the first thingy
[set_of_things]     Match Any Item in the Set
[^set_of_things]    Does Not Match Anything in the Set
(some_expression)   Tag an Expression
$1..$N              Tagged Expressions used in Substitutions
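
A small sketch contrasting the greedy, non-greedy and negated-set forms from the table. The sample string is an assumption made up for the demonstration.

```perl
use strict;
use warnings;

my $s = "<b>bold</b> and <i>italic</i>";

my ($greedy) = $s =~ /<(.*)>/;       # runs up to the last ">"
my ($lazy)   = $s =~ /<(.*?)>/;      # stops at the first ">"
my ($set)    = $s =~ /<([^>]*)>/;    # same result, using a negated set
my ($run)    = "xaaaay" =~ /(a{2,3})/;   # at most three of the four a's

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "set:    $set\n";
print "run:    $run\n";
```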




               Rules
• Rule 1
  – The engine tries to match as far left
    as it can
• Rule 2
  – The regular expression is regarded
    as a set of alternatives. Tries them left
    to right. (see page 61)
• Rule 3
  – Items that have choices match from
    left to right
      /x*y*/
• Rule 4
  – Assertions
  – ^ $ \b \B \A \Z \G (?=…) (?!…)



              Rules
• Rule 5
  – A quantified atom matches only if
    the atom itself matches some
    number of times allowed by the
    quantifier

  Maximal   Minimal
  {n,m}     {n,m}?       At least n, at most m
  {n,}      {n,}?        At least n
  {n}       {n}?         Exactly n
  *         *?           0 or more
  +         +?           1 or more
  ?         ??           0 or 1




             Rules
• Rule 6
  – Each atom matches according to its
    type
  – (…) ==> grouping + storage $1, $2
  – . matches any char except \n
  – […] character classes
  – Special characters \a \n \r …
  – \1 \2 ... backreference to (…)
  – \033 octal char
  – \xf7 hex char
  – \cD control char
  – any other backslashed char matches the char itself




            precedence
•   () (?: )
•   Repetition
•   Sequence
•   | alternation

Pattern          strings
/ab*c/           abc, ac, ababd, abbbc
/abc*/           a, ab, abc, abccc, abcabc
/(abc)*/         abc, abcc, empty string, abcabc
/ed|jo/          ed, jo, edo, ejo
/(ed)|(jo)/      ed, jo, edo, ejo
/ed|jo{1,3}/     ed, jo, edo, ejo, joo, jooooo
/ed|jo{1,3}?/    ed, jo, edo, ejo, joo, jooooo
/^ed|jo$/        fred and joe, ed jo, fred jo, jo
/^(ed|jo)$/      fred and joe, ed jo, fred jo, jo
$pat = 'bob';    pat, bob, bobbobbob, bobbb, patt
/$pat{3}/
$pat = 'bob';    pat, bob, bobbobbob, bobbb, patt
/($pat){3}/
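
One way to check a row of the table is to run the candidate strings through grep. This sketch does so for the two anchored patterns; the array names are illustrative.

```perl
use strict;
use warnings;

my @strings = ("fred and joe", "ed jo", "fred jo", "jo");

# Alternation binds loosest: this reads as (^ed) or (jo$).
my @loose = grep { /^ed|jo$/ } @strings;

# Parentheses make the anchors apply to both alternatives.
my @tight = grep { /^(ed|jo)$/ } @strings;

print "loose: ", join(", ", @loose), "\n";
print "tight: ", join(", ", @tight), "\n";
```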




Pattern    strings
/\w+/      Greetings, planet   earth!
/\w*/      Greetings, planet   earth!
/n[et]*/   Greetings, planet   earth!
/n[et]+/   Greetings, planet   earth!
/G.*t/     Greetings, planet   earth!
/('.*')/   this 'test' isn't   good



• How do you fix it?
  /('[^']*')/




             Examples
s/^([^ ]+) +([^ ]+)/$2 $1/

/(\w+)\s*=\s*\1/

/.{40,}/

/^(\d+\.?\d*|\.\d+)$/

if (/Time: (..):(..):(..)/){
    $hours = $1;
    $minutes = $2;
    $seconds = $3;
}
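
A sketch that exercises the patterns on this slide against sample input. The input strings are assumptions; the word-swap pattern is written with a + after the first group so that it captures a whole word.

```perl
use strict;
use warnings;

# Swap the first two space-separated words.
my $line = "hello world again";
(my $swapped = $line) =~ s/^([^ ]+) +([^ ]+)/$2 $1/;

# \1 must repeat whatever (\w+) captured.
my $self_assign  = ("x = x" =~ /(\w+)\s*=\s*\1/) ? 1 : 0;
my $other_assign = ("x = y" =~ /(\w+)\s*=\s*\1/) ? 1 : 0;

# Decimal numbers: digits with an optional fraction, or a bare fraction.
my @numbers = grep { /^(\d+\.?\d*|\.\d+)$/ } ("42", "3.14", ".5", ".", "abc");

# Copy the captures out of $1..$3 right after a successful match.
my ($hours, $minutes, $seconds);
if ("Time: 12:34:56" =~ /Time: (..):(..):(..)/) {
    ($hours, $minutes, $seconds) = ($1, $2, $3);
}

print "$swapped | $self_assign $other_assign | @numbers | $hours $minutes $seconds\n";
```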

    Default arguments
• $_, @_, @ARGV, STDIN

sub foo{
  my $x = shift; # @_ default
}


• in the main program @ARGV
while($_ = shift) {
  if(/^-(.*)/){
        process_option($1);
  } else {
        process_file($_);
  }
}
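
A runnable sketch of the defaults listed above. The sub name, the sample arguments and the preset @ARGV are assumptions made so the example can run without a command line.

```perl
use strict;
use warnings;

sub first_arg {
    my $x = shift;               # inside a sub, shift works on @_
    return $x;
}
my $got = first_arg("a", "b");

my @lengths;
for ("one", "three") {           # the loop variable defaults to $_
    push @lengths, length;       # length defaults to $_
}

# Outside any sub, shift works on @ARGV (preset here for the demo).
@ARGV = ("-v", "notes.txt");
my (@options, @files);
while (defined(my $arg = shift)) {
    if ($arg =~ /^-(.*)/) { push @options, $1 }
    else                  { push @files, $arg }
}

print "$got | @lengths | @options | @files\n";
```

Testing with defined keeps the loop from stopping early on an argument such as "0", which is false in Perl.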



    Reading a stream
open FIN, "myfile" or die;
while (<FIN>){
  # do something with $_
}

foreach (<FIN>){
   # do something with $_
}

print sort <FIN>;




        Reading a stream
# print a window
@f = <FIN>;
foreach ( 0..$#f ) {
   if($f[$_] =~ /\bShazam\b/){
         $lo = ($_ > 0)? $_ -1 : $_;
         $hi = ($_ < $#f)? $_ +1 : $_;
         print map{"$_: $f[$_]"} $lo .. $hi;
   }
}




                 Sorting
• sort numerically

sub numerically { $a <=> $b }
@list = sort numerically
   (16, 1, 8, 2, 4, 32);

or
@list = sort { $a <=> $b }
   (16, 1, 8, 2, 4, 32);

@list = sort{uc($a) cmp uc($b)}
  qw(this is a test);

#reverse
@list = sort { $b <=> $a }
   (16, 1, 8, 2, 4, 32);
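
A sketch of why the comparison block matters, plus a two-key sort. The %age hash and its contents are invented for the example.

```perl
use strict;
use warnings;

my @n = (16, 1, 8, 2, 4, 32);

my @as_strings = sort @n;                 # default sort compares as strings
my @as_numbers = sort { $a <=> $b } @n;   # numeric comparison

# Sort hash keys by value, breaking ties by name.
my %age = (tom => 30, ann => 25, bob => 30);
my @by_age = sort { $age{$a} <=> $age{$b} or $a cmp $b } keys %age;

print "@as_strings\n@as_numbers\n@by_age\n";
```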



                           example
#! /usr/bin/perl -w
# This script will count the frequency of distinct words
# in the file that is given as an argument.
# Warning: Error checking is minimal!

die "usage: $0 file\n" unless @ARGV;
while(<>){
      tr/A-Z/a-z/;           # translate to lowercase
      @w = split(/[\W]+/,$_); # split into words
      foreach (@w){
            $list{$_}++; # increment the counter
      }
}
foreach $key (sort {$list{$b} <=> $list{$a}} keys %list) {
      print $key, ' = ', $list{$key}, "\n";
}




              Tokenizing
# tokenize an arithmetic expression
while($_){
   if(/^(\d+)/) {
          push @tok, 'num', $1;
   } elsif(/^([+\-\/*()])/) {
          push @tok, 'punct', $1;
   } elsif (/^([\d\D])/) {
          die "invalid char $1 in input";
   }
   $_ = substr($_, length $1);
}



• substr slows things down
    – cut start of string




           Tokenizing 2
while(/
  (\d+) |
  ([+\-\/*()]) |
  ([\d\D])/gx) {
  if(defined $1){
         push @tok, 'num', $1;
  }elsif (defined $2) {
         push @tok, 'punct', $2;
  }else {
         die "invalid char $3 in input";
  }
}




             Tokenizing 3

{
    if(/\G(\d+)/gc) {
           push @tok, 'num', $1;
    } elsif(/\G([+\-\/*()])/gc) {
           push @tok, 'punct', $1;
    } elsif (/\G([\d\D])/gc) {
           die "invalid char $1 in input";
    }else{
           last;
    }
    redo;
}
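
The \G version can be wrapped in a sub so it can be tried on a sample expression. This is a sketch under assumptions: the sub name tokenize and the input string are not from the slides.

```perl
use strict;
use warnings;

sub tokenize {
    my ($input) = @_;
    my @tok;
    {
        # \G anchors each match where the previous one ended;
        # with c, a failed match leaves that position unchanged.
        if    ($input =~ /\G(\d+)/gc)        { push @tok, 'num',   $1 }
        elsif ($input =~ /\G([+\-\/*()])/gc) { push @tok, 'punct', $1 }
        elsif ($input =~ /\G([\d\D])/gc)     { die "invalid char $1 in input\n" }
        else                                 { last }   # end of input
        redo;
    }
    return @tok;
}

my @tok = tokenize("12+(3*4)");
print "@tok\n";
```

Because the string is never copied or shortened, this avoids the substr cost of the first version.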




      Use split for clarity
($a, $b, $c) =
   /^(\S+)\s+(\S+)\s+(\S+)/;

($a, $b, $c) = split /\s+/, $_;
($a, $b, $c) = split;

Get the fifth field:

($a) =
/[^:]*:[^:]*:[^:]*:[^:]*:([^:]*)/;

or
($a) = /(?:[^:]*:){4}([^:]*)/;

or
($a) = (split /:/)[4];
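
A sketch comparing the regex and split approaches on one sample line. The passwd-style record and the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my $line = "root:x:0:0:Super User:/root:/bin/sh";

my ($by_regex) = $line =~ /(?:[^:]*:){4}([^:]*)/;   # skip four fields, capture the fifth
my $by_split   = (split /:/, $line)[4];             # same field via split

# split ' ' skips leading whitespace; split /\s+/ yields an empty first field.
my @awk_style = split ' ',   "  alpha beta";
my @regex_ws  = split /\s+/, "  alpha beta";

print "$by_regex | $by_split | ", scalar(@awk_style), " vs ", scalar(@regex_ws), " fields\n";
```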




                             unpack
ps l
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY            TIME COMMAND
100 1216 30562 30561 7 0 2804 1768 rt_sig S pts/2  0:00 -tcsh
000 1216 30658 30562 10 0 2780 1080 -    R pts/2  0:00 ps l




chomp (@ps = `ps l`);

shift @ps;
for(@ps){
  ($uid, $pid, $sz, $tt) =
    unpack '@3 A6 @9 A7 @30 A5 @52 A7', $_;
  print "$uid, $pid, $sz, $tt\n";
}
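
A smaller sketch of the same unpack idea on a made-up fixed-width record, since the column offsets above depend on the exact ps layout. The record and field widths are assumptions.

```perl
use strict;
use warnings;

# Columns: 6 chars uid, 6 chars pid, then the command.
my $record = "1216  30562 tcsh   ";

# A<n> takes n characters and strips trailing spaces; A* takes the rest.
my ($uid, $pid, $cmd) = unpack 'A6 A6 A*', $record;

# @<n> jumps to an absolute offset before the next field is read.
my ($uid2, $pid2) = unpack '@0 A4 @6 A5', $record;

print "$uid, $pid, $cmd\n";
```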




 Avoid regex for simple strings

do_it() if $answer eq 'yes';

do_it() if $answer =~ /^yes$/;

do_it() if $answer =~ /yes/;

do_it() if lc($answer) eq 'yes';

do_it() if $answer =~ /^yes$/i;
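
The tests above are not equivalent. This sketch counts which of a few sample answers (chosen for illustration) each one accepts.

```perl
use strict;
use warnings;

my @answers = ("yes", "Yes", "yesterday", "yes\n");

my @exact    = grep { $_ eq 'yes' }     @answers;   # "yes" only
my @anchored = grep { /^yes$/ }         @answers;   # also "yes\n": $ matches before a final newline
my @anywhere = grep { /yes/ }           @answers;   # also "yesterday"
my @folded   = grep { lc($_) eq 'yes' } @answers;   # "yes" and "Yes"

print scalar(@exact), " ", scalar(@anchored), " ",
      scalar(@anywhere), " ", scalar(@folded), "\n";
```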




#!/usr/bin/perl
# remove the comments from a C program
$filename = shift or die "usage $0 filename\n";
open FIN, $filename or die "can't open file";
while (<FIN>){
  for(split m!("(?:\\\W|.)*?"|/\*|\*/)!){
     if($in_comment){
        $in_comment = 0 if $_ eq "*/";
     } else {
        if ($_ eq "/*") {
           $in_comment = 1;
           print " ";
        } else {
           print;
        }
     }
  }
  print "\n";
}
            References
$a = 3.1416;
$scalar_ref = \$a;
$array_ref = \@a;
$hash_ref = \%a;
$array_el_ref = \$a[3];
$hash_el_ref = \$a{'John'};
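
Taking a reference is half of the picture; the other half is following it back. A sketch with illustrative variable names:

```perl
use strict;
use warnings;

my $pi   = 3.1416;
my @list = (1, 2, 3);
my %name = (John => 'Smith');

my $scalar_ref = \$pi;
my $array_ref  = \@list;
my $hash_ref   = \%name;

my $value  = $$scalar_ref;          # follow a scalar reference
my $second = $array_ref->[1];       # one element through an array reference
my $size   = scalar @$array_ref;    # the whole array through the reference
my $family = $hash_ref->{John};     # one value through a hash reference
my $kind   = ref $array_ref;        # the kind of thing referred to

$array_ref->[0] = 10;               # changes @list itself, not a copy

print "$value $second $size $family $kind $list[0]\n";
```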




            Lists of Lists
@LoL = (
   ["fred", "barney" ],
   ["george", "jane", "elroy" ],
   ["homer", "marge", "bart" ],
);

print $LoL[2][2]; # prints "bart"

$ref_to_LoL = [
   ["fred", "barney" ],
   ["george", "jane", "elroy" ],
   ["homer", "marge", "bart" ],
];

print $ref_to_LoL ->[2][2];


• Note:
$LoL[2][2] implies $LoL[2]->[2]

       Grow your own
while(<>){
  @tmp = split;
  push @LoL, [ @tmp ];
}
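
The [ @tmp ] in the loop above copies the row into a new anonymous array. A sketch of what can go wrong if a reference to the reused array is pushed instead; the input lines and array names are assumptions.

```perl
use strict;
use warnings;

my (@copied, @shared);
my @tmp;                          # one array, reused on every pass

for my $line ("a b", "c d") {
    @tmp = split ' ', $line;
    push @copied, [ @tmp ];       # a fresh anonymous array each time
    push @shared, \@tmp;          # the same array each time
}

print "copied: $copied[0][0] $copied[1][0]\n";
print "shared: $shared[0][0] $shared[1][0]\n";
```

Every entry of @shared points at the one @tmp, so all rows show the last line read.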




       Hashes of Arrays
%HoL = (
   flinstones => ["fred", "barney" ],
   jetsons => ["george", "jane", "elroy" ],
   simpsons => ["homer", "marge", "bart" ],
);


• generation
# reading from a file with format:
# flinstones: fred barney ..
while(<>){
     next unless s/^(.*?):\s*//;
     $HoL{$1} = [ split ];
}


• or
while($line = <>){
   ($who, $rest) = split /:\s*/, $line, 2;
   @fields = split ' ', $rest;
   $HoL{$who} = [ @fields ];
}
       Hashes of Arrays
# calling a function
for $group ("flinstones", "jetsons", "simpsons") {
    $HoL{$group} = [ get_family($group) ];
}




# append member to existing family
push @{ $HoL{flinstones} }, "wilma", "betty";


• access
$HoL{flinstones}[0] = "fred";
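
A sketch that builds the hash of arrays, appends to one family, and walks the whole structure. It reuses the key spelling from the slides; the @report array is an assumption added so the output can be inspected.

```perl
use strict;
use warnings;

my %HoL = (
    flinstones => [ "fred", "barney" ],
    jetsons    => [ "george", "jane", "elroy" ],
    simpsons   => [ "homer", "marge", "bart" ],
);

push @{ $HoL{flinstones} }, "wilma";       # append to an existing family

my @report;
for my $family (sort keys %HoL) {
    my @members = @{ $HoL{$family} };      # dereference the inner array
    push @report, "$family: @members";
}
print "$_\n" for @report;

my $count = scalar @{ $HoL{jetsons} };     # members in one family
```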




Packages, Modules, and Object
          Classes



