Introduction to Perl Part II

By: Bridget Thomson McInnes

      22 January 2004
                   File Handles
 Very simple compared to C/C++!
 Are not prefixed with a symbol ($, @, %, etc.)


   Opening a file:
    open (SRC, "my_file.txt");


   Reading from a file:
    $line = <SRC>; # reads up to and including the next newline


   Closing a file:
    close (SRC);
           File Handles cont...
   Opening a file for output:
      open (DST, ">my_file.txt");
 Opening a file for appending:
    open (DST, ">>my_file.txt");
 Writing to a file:
    print DST "Printing my first line.\n";

   Safeguarding against opening a nonexistent
    file:
    open (SRC, "file.txt") || die "Could not open file: $!\n";
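The same open, print and close steps can also be written with lexical file handles and the three-argument form of open, which current Perl documentation recommends over bareword handles. This is only a sketch: the helper names write_lines, append_line and read_lines are made up for illustration, and the file lives in a scratch directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: write the given lines to $path, replacing old content.
sub write_lines {
    my ($path, @lines) = @_;
    open(my $dst, '>', $path) or die "Could not open $path for writing: $!\n";
    print $dst "$_\n" for @lines;
    close($dst) or die "Could not close $path: $!\n";
}

# Hypothetical helper: append one line to the end of $path.
sub append_line {
    my ($path, $line) = @_;
    open(my $dst, '>>', $path) or die "Could not open $path for appending: $!\n";
    print $dst "$line\n";
    close($dst) or die "Could not close $path: $!\n";
}

# Hypothetical helper: read every line of $path, with newlines removed.
sub read_lines {
    my ($path) = @_;
    open(my $src, '<', $path) or die "Could not open $path: $!\n";
    my @lines = <$src>;
    close($src);
    chomp @lines;
    return @lines;
}

my $dir  = tempdir(CLEANUP => 1);    # scratch directory, removed at exit
my $file = "$dir/my_file.txt";
write_lines($file, "Printing my first line.");
append_line($file, "Printing my second line.");
print "$_\n" for read_lines($file);
```

Because the mode ('<', '>' or '>>') is a separate argument, a file name that happens to begin with '>' cannot change how the file is opened.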
            File Test Operators
   Check to see if a file exists:

    if ( -e "file.txt") {
          # The file exists!
    }

   Other file test operators:
    -r readable
    -x executable
    -d is a directory
    -T is a text file
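As a rough sketch of how these operators combine, the hypothetical helper describe_path below classifies a path with -e, -d and -T. The scratch directory and file names are assumptions made only for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: describe a path using the file test operators.
sub describe_path {
    my ($path) = @_;
    return "missing"   unless -e $path;   # does it exist at all?
    return "directory" if -d $path;       # is it a directory?
    return "text file" if -T $path;       # does it look like text?
    return "other file";
}

my $dir  = tempdir(CLEANUP => 1);         # scratch directory, removed at exit
my $file = "$dir/file.txt";
open(my $out, '>', $file) or die "Could not create $file: $!\n";
print $out "plain text\n";
close($out);

print "$_: ", describe_path($_), "\n" for ($dir, $file, "$dir/no_such_file");
```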
       Quick Program with File
              Handles
   Program to copy a file to a destination file

    #!/usr/local/bin/perl -w
    open(SRC, "file.txt") || die "Could not open source file.\n";
    open(DST, ">newfile.txt") || die "Could not open destination file.\n";
    while ( $line = <SRC> ) {
           print DST $line;
    }
    close SRC;
    close DST;
     Some Default File Handles
   STDIN : Standard Input
    $line = <STDIN>; # takes input from stdin


   STDOUT : Standard output
    print STDOUT "File handling in Perl is sweet!\n";


   STDERR : Standard Error
    print STDERR "Error!!\n";
          The <> File Handle
   The "empty" file handle reads from the files named on
    the command line, or from STDIN if none are given:
    – $line = <>;


 If the program is run as ./prog.pl file.txt, this will
  automatically open file.txt and read the first
  line.
 If the program is run as ./prog.pl file1.txt file2.txt,
  this will first read file1.txt and then file2.txt
  as one continuous stream; nothing marks where one
  ends and the other begins (the name of the file
  currently being read is in $ARGV).
     The <> File Handle cont...
   If the program is run as ./prog.pl, the program will
    wait for you to enter text at the prompt, and
    will continue until you enter the EOF character

    – CTRL-D in UNIX
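When several files are read through <>, the special variable $ARGV holds the name of the file currently being read, which is one way to tell the files apart. The sketch below assumes two small scratch files and sets @ARGV locally to stand in for real command-line arguments; count_lines_per_file and make_file are hypothetical helper names.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: count lines per file while reading through <>.
# @ARGV is set locally to stand in for the command-line file names.
sub count_lines_per_file {
    local @ARGV = @_;
    my %count;
    while (my $line = <>) {
        $count{$ARGV}++;    # $ARGV names the file <> is reading right now
    }
    return %count;
}

# Hypothetical helper: create a small file for the demonstration.
sub make_file {
    my ($path, @lines) = @_;
    open(my $out, '>', $path) or die "Could not create $path: $!\n";
    print $out "$_\n" for @lines;
    close($out);
    return $path;
}

my $dir = tempdir(CLEANUP => 1);    # scratch directory, removed at exit
my $f1  = make_file("$dir/file1.txt", "one", "two");
my $f2  = make_file("$dir/file2.txt", "three");

my %count = count_lines_per_file($f1, $f2);
print "$_: $count{$_} line(s)\n" for sort keys %count;
```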
Example Program with STDIN
   Suppose you want to determine if you are one
    of the three stooges

    #!/usr/local/bin/perl
    %stooges = (larry => 1, moe => 1, curly => 1 );
    print "Enter your name: ";
    $name = <STDIN>; chomp $name;
    if($stooges{lc($name)}) {
          print "You are one of the Three Stooges!!\n";
    } else {
           print "Sorry, you are not a Stooge!!\n";
    }
               Chomp and Chop
   Chomp : function that deletes a trailing newline
    from the end of a string.
         $line = "this is the first line of text\n";
         chomp $line;    # removes the newline character
         print $line;    # prints "this is the first line of
                         # text" with no trailing newline
   Chop : function that chops off the last character
    of a string.
         $line = "this is the first line of text";
         chop $line;
         print $line;    # prints "this is the first line of
                         # tex"
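The difference matters when the string has no trailing newline: chomp then leaves it alone, while chop still removes a character. A small sketch, with chomped and chopped as hypothetical wrappers that return the modified copy:

```perl
use strict;
use warnings;

# chomp removes a trailing newline only if one is present.
sub chomped { my ($s) = @_; chomp $s; return $s; }

# chop always removes the last character, whatever it is.
sub chopped { my ($s) = @_; chop $s; return $s; }

print chomped("first line\n"), "|\n";   # first line|
print chomped("first line"),   "|\n";   # first line|  (nothing to remove)
print chopped("first line"),   "|\n";   # first lin|
```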
           Regular Expressions
   What are regular expressions? A few
    definitions:
    – A regular expression specifies a set of strings:
      the formal (regular) language that it defines
    – In other words, a formula for matching strings that
      follow a specified pattern.
   Some things you can do with regular
    expressions
    – Parse the text
    – Add and/or replace subsections of text
    – Remove pieces of the text
       Regular Expressions cont..
   A regular expression characterizes a regular
    language

   A familiar analogy in UNIX (shell wildcards, a
    simpler pattern language than regular expressions):
    – ls *.c
         Lists all the files in the current directory whose
          names end in '.c'
    – ls *.txt
         Lists all the files in the current directory whose
          names end in '.txt'
    Simple Example for Clarity
   In the simplest form, a regular expression is a
    string of characters that you are looking for

   We want to find all the words that contain the
    string 'ing' in our text.

   The regular expression we would use :
                      /ing/
        Simple Example cont...
   What would our program then look like?

    #!/usr/local/bin/perl
    while(<>) {
        chomp;
        @words = split / /;
         foreach $word (@words) {
             if($word =~ m/ing/) { print "$word\n"; }
         }
    }
     Regular Expressions Types
   Regular expressions are composed of two
    types of characters:
    – Literals
         Normal text characters
         Like what we saw in the previous program
          ( /ing/ )

    – Metacharacters
        special characters
        Add a great deal of flexibility to your search
              Metacharacters
 Match more than just characters
 Match line position
    – ^ start of a line      ( caret )
    – $ end of a line        ( dollar sign )


 Match any characters in a list : [ ... ]
 Example :
    – /[Bb]ridget/        matches Bridget or bridget
    – /Mc[Ii]nnes/        matches McInnes or Mcinnes
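A short sketch of the position anchors and a bracketed character list in use. The three helper names and the sample lines are invented for the demonstration; each helper returns 1 on a match and 0 otherwise.

```perl
use strict;
use warnings;

# [Bb] accepts either case of the first letter.
sub names_bridget   { my ($text) = @_; return $text =~ /[Bb]ridget/ ? 1 : 0; }

# ^ anchors the pattern to the start of the line.
sub starts_with_the { my ($text) = @_; return $text =~ /^The/ ? 1 : 0; }

# $ anchors the pattern to the end of the line.
sub ends_with_ing   { my ($text) = @_; return $text =~ /ing$/ ? 1 : 0; }

for my $line ("The cat is sleeping", "bridget likes The Stooges") {
    printf "%-26s bridget=%d starts_with_the=%d ends_with_ing=%d\n",
        $line, names_bridget($line), starts_with_the($line), ends_with_ing($line);
}
```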
    Our Simple Example Revisited
 Now suppose we only want to match words
  that end in 'ing' rather than just contain 'ing'.
 How would we change our regular expression
  to accomplish this?

     – Previous Regular Expression:
              $word =~ m/ing/

     – New Regular Expression:
             $word =~ m/ing$/
Ranges of Regular Expressions
 Ranges can be specified in Regular
  Expressions
 Valid Ranges
    –   [A-Z]      Upper Case Roman Alphabet
    –   [a-z]      Lower Case Roman Alphabet
    –   [A-Za-z]   Upper or Lower Case Roman Alphabet
    –   [A-F]      Upper Case Roman Letters A through F
    – [A-z]        Valid, but be careful: it also matches the
                   characters [ \ ] ^ _ ` that lie between Z and a

   Invalid Ranges
    – [a-Z]        Not Valid
    – [F-A]        Not Valid
                 Ranges cont ...
   Ranges of Digits can also be specified
    – [0-9]        Valid
    – [9-0]        Invalid
   Negating Ranges
    – /[^0-9]/
         Match any character except a digit
    – /[^a]/
         Match any character except an a
    – /^[^A-Z]/
         Match any line that starts with a character
          other than an upper case letter
         First ^   : start of line
         Second ^ : negation
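The two roles of ^ are easy to mix up, so here is a small sketch that puts them side by side. The helper names are hypothetical; each returns 1 on a match and 0 otherwise.

```perl
use strict;
use warnings;

# Inside [ ], a leading ^ negates the list: any character that is not a digit.
sub has_non_digit        { my ($t) = @_; return $t =~ /[^0-9]/  ? 1 : 0; }

# Outside [ ], ^ is the start-of-line anchor: an 'a' in first position.
sub starts_with_a        { my ($t) = @_; return $t =~ /^a/      ? 1 : 0; }

# Both together: the first character is something other than A-Z.
sub starts_without_upper { my ($t) = @_; return $t =~ /^[^A-Z]/ ? 1 : 0; }

for my $text ("12345", "12a45", "apple", "banana", "Hello") {
    printf "%-7s non_digit=%d starts_with_a=%d starts_without_upper=%d\n",
        $text, has_non_digit($text), starts_with_a($text),
        starts_without_upper($text);
}
```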
     Our Simple Example Again
 Now suppose we want to create a list of all the
  words in our text that do not end in 'ing'
 How would we change our regular expression
  to accomplish this?

    – Previous Regular Expression:
             $word =~ m/ing$/

    – New Regular Expression:
            $word !~ m/ing$/

    – Note: a negated class such as m/[^ing]$/ only tests
      the last character, so it would also reject 'win'
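A negated character class tests a single character, so it cannot express "does not end in the three letters ing". The sketch below compares a class-based pattern with the negated match operator !~ on an invented word list; both helper names are hypothetical.

```perl
use strict;
use warnings;

# Keeps words for which the pattern /ing$/ does NOT match.
sub not_ending_in_ing  { return grep { $_ !~ /ing$/ } @_; }

# Keeps words whose LAST CHARACTER is not a space, i, n, or g.
sub last_char_not_in_set { return grep { $_ =~ /[^ ing]$/ } @_; }

my @words = qw(running sing singer win bed);

print "negated match: @{[ not_ending_in_ing(@words) ]}\n";     # singer win bed
print "negated class: @{[ last_char_not_in_set(@words) ]}\n";  # singer bed
```

The class version drops 'win' even though it does not end in 'ing', because its last letter happens to be in the list.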
         Literal Metacharacters
   Suppose that you actually want to look for every
    '^' character in your text
    – Use the \ symbol
    – /\^/     Regular expression to search for
   What does the following Regular Expression
    match?
               /[A-Z^]\^/

    – Matches any line that contains (A-Z or ^) followed
      by ^
    – Inside [ ], a ^ that is not the first character is
      literal; outside [ ], it must be escaped as \^
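In Perl, an unescaped ^ outside a character class is always the start-of-line anchor, so a literal caret there has to be written as \^. A minimal sketch with invented sample strings and hypothetical helper names:

```perl
use strict;
use warnings;

# Count the literal '^' characters in a string.
sub count_carets {
    my ($text) = @_;
    my $n = () = $text =~ /\^/g;    # list assignment counts the matches
    return $n;
}

# (A-Z or ^) followed by a literal ^.
sub class_then_caret     { my ($t) = @_; return $t =~ /[A-Z^]\^/ ? 1 : 0; }

# Same pattern with the second ^ left unescaped: it is an anchor there,
# and an anchor cannot succeed after a character has been consumed.
sub class_then_unescaped { my ($t) = @_; return $t =~ /[A-Z^]^/  ? 1 : 0; }

print count_carets("2^10 ^ 3"), "\n";        # 2
print class_then_caret("X^"), "\n";          # 1
print class_then_unescaped("X^"), "\n";      # 0
```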
        Patterns provided in Perl
   Some Patterns
    –   \d   [0-9]
    –   \w   [a-zA-Z0-9_]
    –   \s   [ \r\t\n\f]   (white space pattern)
    –   \D   [^0-9]
    –   \W   [^a-zA-Z0-9_]
    –   \S   [^ \r\t\n\f]


   Example :       /19\d\d/
    – Looks for any year in the 1900's
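As an illustration of the shorthand patterns, the sketch below pulls 1900s years out of a sentence with \d and splits a line on runs of white space with \s. The sample text and the helper names are assumptions for the example.

```perl
use strict;
use warnings;

# Return every four-character run of the form 19dd found in the text.
sub years_in_1900s {
    my ($text) = @_;
    my @years = $text =~ /19\d\d/g;    # /g in list context returns all matches
    return @years;
}

# Split a line into words on any run of white space.
sub words_of {
    my ($line) = @_;
    my @words = split /\s+/, $line;
    return @words;
}

print join(", ", years_in_1900s("Invoices from 1969, 2004 and 1999.")), "\n";
# 1969, 1999
print join("|", words_of("tabs\tand  double spaces")), "\n";
# tabs|and|double|spaces
```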
    Using Patterns in our Example
 Commonly words are not separated by just a
  single space but by tabs, returns, etc.
 Let's modify our split function to handle
  multiple white space characters

      #!/usr/local/bin/perl
      while(<>) {
         chomp;
         @words = split /\s+/, $_;
         foreach $word (@words) {
             if($word =~ m/ing/) { print "$word\n"; }
         }
      }
 Word Boundary Metacharacter
 Regular Expression to match the start or the
  end of a 'word' : \b
 Examples:


    –   /Jeff\b/       Match Jeff but not Jefferson
    –   /Carol\b/      Match Carol but not Caroline
    –   /Rollin\b/     Match Rollin but not Rolling
    –   /\bform/       Match form or formation but not
                       information
    –   /\bform\b/     Match form but neither information
                       nor formation
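The effect of \b can be checked by filtering a small word list with each pattern. This is a sketch; the helper name matching and the word list are made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the candidates that match the given pattern.
sub matching {
    my ($pattern, @candidates) = @_;
    return grep { $_ =~ $pattern } @candidates;
}

my @words = qw(Jeff Jefferson form formation information);

print join(" ", matching(qr/Jeff\b/,   @words)), "\n";  # Jeff
print join(" ", matching(qr/\bform/,   @words)), "\n";  # form formation
print join(" ", matching(qr/\bform\b/, @words)), "\n";  # form
print join(" ", matching(qr/form/,     @words)), "\n";  # form formation information
```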
               DOT Metacharacter
 The DOT Metacharacter, '.', matches any single
  character except a newline
 /b.bble/
    – Could match : bobble, babble, bubble
   /.oat/
    – Could match : boat, coat, goat


   Note: '.*' matches any run of characters, as long as
    possible. This is handy, but it can silently match
    more text than you intended.
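One such ramification is greediness: '.*' takes as many characters as it can while still letting the rest of the pattern match. The sketch below, with an invented sample string and hypothetical helper names, contrasts it with a negated character class that stops at the first closing quote.

```perl
use strict;
use warnings;

# Greedy: .* runs on to the LAST double quote in the text.
sub quoted_greedy  { my ($t) = @_; return $t =~ /"(.*)"/    ? $1 : undef; }

# Careful: [^"]* cannot cross a double quote, so it stops at the first one.
sub quoted_careful { my ($t) = @_; return $t =~ /"([^"]*)"/ ? $1 : undef; }

my $text = 'say "one" and "two"';
print quoted_greedy($text),  "\n";   # one" and "two
print quoted_careful($text), "\n";   # one
```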
           PIPE Metacharacter
   The PIPE Metacharacter is used for alternation

   /Bridget (Thomson|McInnes)/
    – Match Bridget Thomson or Bridget McInnes; in
      Bridget Thomson McInnes only the Bridget Thomson
      part is matched


   /B|bridget/
    – Match B or bridget


   /^(B|b)ridget/
    – Match Bridget or bridget at the beginning of a line
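Alternation has low precedence, so parentheses decide how much of the pattern each alternative covers. A small sketch with hypothetical helper names; the third helper adds ^ and $ so that the whole string must be one of the two names.

```perl
use strict;
use warnings;

# Without parentheses the alternatives are "B" and "bridget".
sub loose          { my ($t) = @_; return $t =~ /B|bridget/ ? 1 : 0; }

# With parentheses only the first letter alternates.
sub grouped        { my ($t) = @_; return $t =~ /^(B|b)ridget/ ? 1 : 0; }

# Anchored at both ends: exactly one surname after "Bridget ".
sub either_surname { my ($t) = @_; return $t =~ /^Bridget (Thomson|McInnes)$/ ? 1 : 0; }

print loose("Bob"), grouped("Bob"), "\n";                   # 10
print either_surname("Bridget McInnes"), "\n";              # 1
print either_surname("Bridget Thomson McInnes"), "\n";      # 0
```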
          Our Simple Example
 Now with our example, suppose that we want
  to not only get all words that end in 'ing' but
  also 'ed'.
 How would we change our regular expression
  to accomplish this?

    – Previous Regular Expression:
             $word =~ m/ing$/

    – New Regular Expression:
            $word =~ m/(ing|ed)$/
          The ? Metacharacter
 The metacharacter, ?, indicates that the
  character immediately preceding it occurs zero
  or one time
 Examples:


    – /worl?ds/
         Match either 'worlds' or 'words'

    – /m?ethane/
        Match either 'methane' or 'ethane'
           The * Metacharacter
 The metacharacter, *, indicates that the
  character immediately preceding it occurs zero
  or more times
 Example :


    – /ab*c/          Match 'ac', 'abc', 'abbc', 'abbbc', etc.

    – Matches any string that contains an a, optionally
      followed by a sequence of b's, and then a c.


   Sometimes called the Kleene star
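To see what ? and * accept, the sketch below filters an invented list of strings, requiring the whole string to match. whole_matches is a hypothetical helper that wraps the given pattern in ^ and $.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the strings that match the pattern from start to end.
sub whole_matches {
    my ($pattern, @strings) = @_;
    return grep { $_ =~ /^$pattern$/ } @strings;
}

my @candidates = qw(ac abc abbbc abd words worlds worllds);

print join(" ", whole_matches(qr/ab*c/,   @candidates)), "\n";  # ac abc abbbc
print join(" ", whole_matches(qr/worl?ds/, @candidates)), "\n"; # words worlds
```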
        Our Simple Example again
   Now suppose we want to create a list of all
    the words in our text that end in 'ing' or 'ings'
   How would we change our regular expression
    to accomplish this?

    –    Previous Regular Expression:
              $word =~ m/ing$/

    –    New Regular Expression:
              $word =~ m/ings?$/
               Modifying Text
   Match
    – Up to this point, we have only attempted to match a
      given regular expression
    – Example : $variable =~ m/regex/


   Substitution
    – Takes match one step further : if there is a match,
      then replace it with the given string
    – Example : $variable =~ s/regex/replacement/

             $var =~ s/Thomson/McInnes/;
             $var =~ s/Bridgette/Bridget/;
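The substitution operator changes the variable in place and returns the number of replacements it made, which is false when nothing matched. A minimal sketch, with fix_name and past_tense as hypothetical helpers; the /g modifier used in the second one replaces every match rather than only the first.

```perl
use strict;
use warnings;

# Apply the two substitutions to a copy of the name.
sub fix_name {
    my ($name) = @_;
    $name =~ s/Bridgette/Bridget/;
    $name =~ s/Thomson/McInnes/;
    return $name;
}

# Replace every word-final 'ing' with 'ed'; also return how many were changed.
sub past_tense {
    my ($text) = @_;
    my $count = ($text =~ s/ing\b/ed/g) || 0;   # s///g returns the count
    return ($text, $count);
}

print fix_name("Bridgette Thomson"), "\n";       # Bridget McInnes

my ($text, $count) = past_tense("walking and singing");
print "$text ($count changed)\n";                # walked and singed (2 changed)
```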
          Substitution Example
   Suppose when we find all our words that end
    in 'ing' we want to replace the 'ing' with 'ed'.

    #!/usr/local/bin/perl -w
    while(<>) {
        chomp $_;
        @words = split /\s+/, $_;
         foreach $word (@words) {
             if($word =~ s/ing$/ed/) { print "$word\n"; }
         }
    }
    Special Variables Modified by a
                Match
   $&
     – A copy of the text matched by the regex
   $`
     – A copy of the target text in front of the match
   $'
     – A copy of the target text after the match
   $1, $2, $3, etc.
     – The text matched by the 1st, 2nd, etc., set of
       parentheses. Note : $0 is not one of these; it
       holds the name of the program
   $+
     – A copy of the highest numbered of $1, $2, $3, etc.
       that took part in the match
     Our Simple Example once
              again
   Now let's revise our program to find all the
    words that end in 'ing' without splitting our line
    of text into an array of words

    #!/usr/local/bin/perl -w
    while(<>) {
        chomp $_;
        # /g in a while loop visits every match on the line
        while($_ =~ /([A-Za-z]*ing\b)/g) { print "$&\n"; }
    }
                     Example
#!/usr/local/bin/perl
$exp = <STDIN>; chomp $exp;
# the inner (?: ) groups repeat without capturing
if($exp =~ /^((?:[A-Za-z]+\s)*)\bcrave\b((?:\s[A-Za-z]+)*)/) {
   print "$1\n";
   print "$2\n";
}
– Run Program with string : I crave to rule the world!
– Results:
   I
    to rule the world
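A point worth checking with this kind of pattern: a capture group that is itself repeated with * keeps only the text of its last repetition. The sketch below compares that with an outer group wrapped around a non-capturing (?: ) group, which keeps the whole run. The helper names are hypothetical.

```perl
use strict;
use warnings;

# Repeated capture groups: $1 and $2 hold only the last repetition.
sub last_repetition {
    my ($exp) = @_;
    if ($exp =~ /^([A-Za-z]+\s)*\bcrave\b(\s[A-Za-z]+)*/) {
        my ($before, $after) = ($1, $2);
        return ($before, $after);
    }
    return;
}

# Outer groups around non-capturing repeats: $1 and $2 hold the whole run.
sub whole_run {
    my ($exp) = @_;
    if ($exp =~ /^((?:[A-Za-z]+\s)*)\bcrave\b((?:\s[A-Za-z]+)*)/) {
        my ($before, $after) = ($1, $2);
        return ($before, $after);
    }
    return;
}

my $exp = "I crave to rule the world!";
print join("|", last_repetition($exp)), "\n";   # I | world
print join("|", whole_run($exp)), "\n";         # I | to rule the world
```

Neither version captures the final '!', since it is not in [A-Za-z].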
                     Example
#!/usr/local/bin/perl
$exp = <STDIN>; chomp $exp;
if($exp =~ /\bcrave\b/) {
   print "$`\n"; print "$&\n"; print "$'\n";
}
– Run Program with string : I crave to rule the world!
– Results:
   I
    crave
    to rule the world!
Thank you 

				