					Introduction to Perl

          Pawel Sirotkin
         28.11-01.12.2008, Riga
Overview
       About programming
           Why Perl?
       How to write, how to run
       Variables
       Operations
       Basic input and output
       Conditionals and loops
       Regular expressions




    2                   Introduction to Perl, NLL Riga 2008, by Pawel
                                                              Sirotkin
About programming
       Working with algorithms
       Program needs to contain exact commands
           (Mostly) not: Go buy some bread
           But: Put on your coat and shoes, open the door, go through it, close
            the door, go down the stairs…
       Has a certain input
       Processes it
       Produces a certain output




Why Perl?
       Easy to learn
       Simple syntax
       Good at manipulating text
           Good at dealing with regular expressions




How to write a Perl program
       Perl programs can be written in any text editor
           Notepad, vim, even Word…
           Recommended: A simple text editor with syntax highlighting




       Write the program code
       Save the file as xxx.pl
           .pl extension not necessary, but useful


What is a Perl program like?

# This *very* simple program prints "Hello World!"

print "Hello World!";





       Everything on a line after the # is a comment. Perl ignores it
        when running the program
       What are comments for, then?
           They are for you, and for others who will have to read the code
           Imagine looking at a complex program in a few months and
            trying to figure out what it does
       Write as many comments as you can



       This is a Perl command
           In this case, for printing text on the screen
       Every command should start on a new line
           Not a Perl requirement, but crucial for readability
       Every command should end with a semicolon;
       Many commands take arguments
           Here: “Hello World!”


What to do with the program?
       Perl works from the command line
       Windows: "Start" → "Run…", then enter cmd to open a command prompt
       Go to the directory where you saved the program
           E.g.: cd C:\Perl\MyPrograms
       Run the program:
           perl myprogram.pl
       See the results of your labours!




Exercise (1)
    Create a folder for your Perl programs
    Open the editor of your choice and write the "Hello
     World" program
        The command is print "Hello World!";
        Don't forget the comment!
    Save the program
    Run it!
    What happens if you mistype the print command?




Variables
    The "Hello World" program always has the same output
        Not a very useful program, as such
    We need to be able to change the output
    Variables are named containers that can hold different values




Defining variables
# We define a variable "a" and assign it the value 42

$a = 42;


    To define a variable, write a dollar sign followed by the
     variable's name
        Names should consist of letters, digits and the underscore
        They should start with a letter
        Variable names are case-sensitive!
            $a and $A are different variables!
        Generally, a variable's name should tell you what the variable
         is for
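A minimal sketch of the naming rules above. The variable names are made up for illustration, and `use strict`, `use warnings` and `my` are common Perl habits that these slides do not cover.

```perl
use strict;
use warnings;

# Hypothetical names, chosen only to illustrate the rules.
my $word_count = 42;    # a descriptive name: letters and an underscore
my $Word_Count = 7;     # a different variable, because names are case-sensitive

print "word_count = $word_count\n";
print "Word_Count = $Word_Count\n";
```

Both variables keep their own values, which is why consistent spelling and capitalisation matter.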




    Variables can be assigned values
        Strings: text (a character sequence) in single or double quotes
        Numbers
    $a = 42;
    $a = "some text";




Changing variables
    Arithmetic operations
        $a = 42 / 2;       # division
        $a = 42 + 5;       # addition
        $a = $b * 2;       # multiplication
        $a = $a - $b;      # subtraction
    Also useful:
        $a += 42;           # the same as $a = $a + 42;
        The same works for -=, *= and /=
    String operations
        $a = "some" . " text";          # concatenation
        $a = $a . " more text";
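The operations above can be tried out in one short program. This is only a sketch: the variable names are invented, `use strict`, `use warnings` and `my` go beyond the slides, and `.=` is the string counterpart of `+=`.

```perl
use strict;
use warnings;

my $amount = 42 / 2;             # division: 21
$amount += 5;                    # same as $amount = $amount + 5: 26
$amount *= 2;                    # same as $amount = $amount * 2: 52

my $label = "apples";
$label = $label . " and pears";  # concatenation
$label .= "!";                   # same as $label = $label . "!"

print "$amount $label\n";
```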

Basic output
    We have already seen an output command
        print "text";
        print $a;
        print "text $a";
        print "text " . ($a + $b) . " more text.";   # parentheses: . and + have the same precedence
        Special characters:
            \n – new line
            \t – tab
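A small sketch of combining text, a calculation and special characters in one output line. The names `$price` and `$tax` are hypothetical, and `use strict`, `use warnings` and `my` are additions not covered in the slides.

```perl
use strict;
use warnings;

my $price = 3;
my $tax   = 2;

# The parentheses make sure the sum is computed before it is
# joined to the surrounding text.
my $line = "Total:\t" . ($price + $tax) . " Euro\n";
print $line;
```

Without the parentheses, Perl would first join "Total:\t" and `$price` into one string and then try to add `$tax` to that string.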




Exercise (2)
    Define a variable
    Assign it a value of 15
    Print it
    Double the value
    Print it again
    Define another variable with the string "apples"
    Print both variables
    Change the first variable to its square and the second to
     "pears"
    Print both variables

Basic input

    The <> operator reads one line from standard input
     (usually, the keyboard)
    Syntax:
        $a = <>;
    Don't forget to tell the user what he's supposed to enter!
    Try the following program:

# This program asks the user for his name and greets him

print "What is your name? ";
$name = <>;
print "Hello $name!";

Input, output and new lines

    As the user finishes the input with the [Enter] key, the string
     in $name ends with a newline
    The chomp function removes the newline at the end of a
     string
    Try the following, modified program:


# This program asks the user for his name and greets him

print "What is your name? ";
$name = <>;
chomp($name);
print "Hello $name!";
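To see what chomp does without typing anything, the sketch below uses a fixed string in place of keyboard input. The name "Ada" is just sample data; `length`, `use strict`, `use warnings` and `my` are not covered in the slides.

```perl
use strict;
use warnings;

# Simulates what <> returns after the user types Ada and presses [Enter].
my $name = "Ada\n";

my $length_before = length($name);   # counts the newline as well
chomp($name);                        # removes the trailing newline
my $length_after = length($name);

print "Hello $name! ($length_before characters before chomp, $length_after after)\n";
```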
Exercise (3)
    Let the user enter the radius of a circle
        Tell him the diameter (2r), circumference (2πr) and area (πr²)
         of the circle
        Try doing this using one variable for each measure
        Try doing this using only one variable




 If, else

     So far, our programs always run the same commands in the same order
     The if clause allows us to take different actions in
      different circumstances

# Let's try out a conditional clause

print "Please enter password: ";
$password = <>;
if ($password == 42) {
           print "Correct password! Welcome.";
} else {
           print "Wrong password! Access denied.";
}


     Note: = is the assignment operator, == is the numeric comparison
      operator
     else is an optional branch that runs when the if condition
      is false
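Perl compares numbers with `==` and strings with `eq`. The sketch below shows both in an if/else, using made-up values; `eq`, `use strict`, `use warnings` and `my` are not introduced in the slides.

```perl
use strict;
use warnings;

my $answer = 42;
my $fruit  = "apples";
my $message;

if ($answer == 42) {              # numeric comparison
    $message = "number matches";
} else {
    $message = "number differs";
}
print "$message\n";

if ($fruit eq "pears") {          # string comparison
    $message = "fruit matches";
} else {
    $message = "fruit differs";   # runs, because "apples" is not "pears"
}
print "$message\n";
```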
Exercise (4)
    Try out the password program.
        Why doesn't it work correctly? Fix it.
        Tell the user if the number he entered is too large or too small
            Hint: The comparison operators you'll need are < and >
    Ask the user for a geometrical shape (circle or square),
     and then for a radius or side length. Return the area and
     perimeter.




 While

     What if we want to repeat a check until something happens?
     The while loop repeats its commands as long as its condition is true
         Note: in the example below, $password has no value at first, so it
          is not equal to 42 and the loop body runs at least once



# Now on to a "while" loop
while ($password != 42) {
          print "Access denied.\n";
          print "Please enter password: ";
          $password = <>;
          chomp($password);
}
print "Correct password! Welcome.";
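A while loop does not need user input. As a sketch, the countdown below repeats its body as long as the condition holds; the variable names are invented, and `use strict`, `use warnings` and `my` go beyond the slides.

```perl
use strict;
use warnings;

my $countdown = 3;
my $steps     = 0;

while ($countdown > 0) {             # checked before every round
    print "$countdown...\n";
    $countdown = $countdown - 1;     # without this line the loop would never end
    $steps += 1;
}
print "Done after $steps steps.\n";
```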
Exercise (5)
    Write a small game: take a number, and make the user
     guess it. Tell him if the guess is too high or too low. If the user gets
     it right, the program terminates.
        If you like, you can take a random number (here, a whole number from 0 to 9):
         $random = int (rand(10) );
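The sketch below draws many random numbers the same way and checks that each one lies between 0 and 9. The loop count of 1000 and the variable names are arbitrary choices; `use strict`, `use warnings` and `my` are not covered in the slides.

```perl
use strict;
use warnings;

my $tries        = 0;
my $all_in_range = 1;

while ($tries < 1000) {
    my $random = int(rand(10));      # rand(10) is below 10, int cuts off the fraction
    if ($random < 0 || $random > 9) {
        $all_in_range = 0;
    }
    $tries += 1;
}
print "All numbers between 0 and 9: $all_in_range\n";
```

For a number from 1 to 10 instead, add 1 to the result.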




 Perl regular expressions

     Regular expressions are very useful for text processing
     Perl matching operator: =~
         Perl non-matching operator: !~
     The regular expression must be enclosed in slashes: /regex/
     The program below accepts any password that contains the
      characters "42" anywhere
# A "while" loop with regular expressions
while ($password !~ /42/) {            # While the entered line doesn't contain "42"
          print "Access denied.\n";
          print "Please enter password: ";
          $password = <>;
          chomp($password);
}
print "Correct password! Welcome.";
Perl regular expressions
    Simple string: some text
    One of a number of symbols: [aA]
        Matches a or A
        Also possible: [tT]he, matching the or The
    One of a range of symbols: [a-h][1-8]
        Matches any two-character string from a1 to h8
    Special characters
        ^ matches the beginning of a line
        $ matches the end of a line
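The patterns from this slide can be tested against a sample line. This is a sketch with an invented sentence and invented variable names; `use strict`, `use warnings` and `my` are additions to what the slides cover.

```perl
use strict;
use warnings;

# Hypothetical sample line, chosen for illustration.
my $line = "The knight moves to f3";

my $has_the       = 0;
my $has_square    = 0;
my $starts_square = 0;
my $ends_square   = 0;

if ($line =~ /[tT]he/)      { $has_the       = 1; }   # the or The, anywhere
if ($line =~ /[a-h][1-8]/)  { $has_square    = 1; }   # a1 to h8, anywhere
if ($line =~ /^[a-h][1-8]/) { $starts_square = 1; }   # only at the beginning
if ($line =~ /[a-h][1-8]$/) { $ends_square   = 1; }   # only at the end

print "the: $has_the, square: $has_square, at start: $starts_square, at end: $ends_square\n";
```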




Perl regular expressions
    More special characters
        Wildcard: the dot . matches any single character (except a newline)
            b.d matches bad, bed, bid, bud…
            Don't forget: it also matches forbid, badly…
        + matches one or more occurrences of the previous character
            re+d matches red and reed (and also reeed and so on!)
        * matches zero or more occurrences of the previous
         character
            bel* matches be, bel and bell (and belll…)
        ? matches zero or one occurrence of the previous character
            soo?n matches son or soon
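A quick way to check these quantifiers is to count how many test strings match. In the sketch below, ^ and $ are added to most patterns so that the whole string has to match; the test words are arbitrary, and `use strict`, `use warnings` and `my` are not covered in the slides.

```perl
use strict;
use warnings;

my $count = 0;

if ("reeed"  =~ /^re+d$/)  { $count += 1; }   # matches: one or more e
if ("rd"     =~ /^re+d$/)  { $count += 1; }   # no match: + needs at least one e
if ("be"     =~ /^bel*$/)  { $count += 1; }   # matches: * allows zero l
if ("sooon"  =~ /^soo?n$/) { $count += 1; }   # no match: ? allows at most one extra o
if ("forbid" =~ /b.d/)     { $count += 1; }   # matches: no anchors, so bid inside forbid counts

print "$count of 5 tests matched\n";
```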



Perl regular expressions
    Character classes
        \d: digits
            Rule \d+ matches Rule 1, Rule 2, ..., Rule 334...
        \w: "word characters" – letters, digits, _
            \w+ \w+ – any two "words" separated by a blank
        \s: any whitespace (blanks, tabs, newlines)
            ^\s*\d – any line whose first non-whitespace character is a digit
        Capitalize the symbols to get the opposite
            \S is anything but whitespace, \D are non-digits…
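The character classes can be tried on a sample line as sketched below. Note that a single \w or \d matches exactly one character, so the patterns use + and * to cover whole words and optional leading blanks. The sample text and variable names are invented; `use strict`, `use warnings` and `my` go beyond the slides.

```perl
use strict;
use warnings;

# Hypothetical sample line with leading blanks.
my $line = "   42 apples";

my $starts_with_digit = 0;
my $two_words         = 0;
my $has_non_digit     = 0;

if ($line =~ /^\s*\d/)  { $starts_with_digit = 1; }   # optional whitespace, then a digit
if ($line =~ /\w+ \w+/) { $two_words         = 1; }   # "42 apples"
if ($line =~ /\D/)      { $has_non_digit     = 1; }   # blanks and letters are non-digits

print "digit first: $starts_with_digit, two words: $two_words, non-digit: $has_non_digit\n";
```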




Exercise (6)
    Write a program which asks the user for his e-mail
     address.
    Check if the address is syntactically correct.
    Possible rules:
        Must contain an @ character
        At least one symbol before it
        Must contain a dot
        At least two symbols between @ and .
        At least two symbols after .
        No fancy symbols like {§*
    Do you accept addresses with more than one dot?

 Perl regular expressions

     Switches
         Tell Perl how to deal with the regular expression
          /regex/i: ignore lower/upper case
             /wiebke/i matches Wiebke and wiebke
         s/regex/replacement/: substitute the text matched by regex with replacement
             $text =~ s/Mark/Euro/
         /regex/g: global, i.e. apply to every match in the string, not only the first
# What the //g switch does

$text     = "The meat costs 10 Mark, the fish costs 15 Mark.";
$text2    = $text;
$text     =~ s/Mark/Euro/;    # "The meat costs 10 Euro, the fish costs 15 Mark."
$text2    =~ s/Mark/Euro/g;   # "The meat costs 10 Euro, the fish costs 15 Euro."
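Switches can be combined. The sketch below uses a variant of the example sentence in which the second "mark" is lower-case, to show the effect of adding /i to /g. Variable names are invented; `use strict`, `use warnings` and `my` are not covered in the slides.

```perl
use strict;
use warnings;

my $text = "The meat costs 10 Mark, the fish costs 15 mark.";

my $once = $text;
my $all  = $text;

$once =~ s/Mark/Euro/;       # first match only, case-sensitive
$all  =~ s/mark/Euro/gi;     # every match, ignoring upper/lower case

print "$once\n";
print "$all\n";
```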
 Perl regular expressions

     Grouping
         Allows us to reuse the matched string
         /(text)/ matches text and stores it in a variable
             The first group is stored in $1, the second in $2...


# Substitution and grouping

$sum = 0;                                         # initializing the variable with zero
$text = "The meat costs 10 Mark, the fish costs 15 Mark.";
while ($text =~ s/(\d+) Mark/$1 Euro/) {           # digits, a space, "Mark"
          $sum = $sum + $1;              # adding amount to $sum value
}
print "Substituted $sum Mark for Euro!";
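Groups are also useful without substitution, for pulling several pieces out of one line. A sketch with a made-up price entry follows; the entry format and variable names are assumptions, and `use strict`, `use warnings` and `my` go beyond the slides.

```perl
use strict;
use warnings;

# Hypothetical input line in the form "item: price Mark".
my $entry = "fish: 15 Mark";

my $item  = "";
my $price = 0;

if ($entry =~ /(\w+): (\d+) Mark/) {
    $item  = $1;     # first group: the word before the colon
    $price = $2;     # second group: the digits before "Mark"
}
print "$item costs $price\n";
```

Copy $1 and $2 into your own variables right after the match, because the next successful match overwrites them.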

Reading files
    What if we want to have input from a file, not from the
     user?
    Open file for reading:
        open(INPUT, "<file.ext");
    Read a line:
        $line = <INPUT>;
        $line = <>;                       # special case: reads from standard input




Writing files
    What if we want to print to a file, not to the screen?
    Open file for writing:
        open(OUTPUT, ">file.ext");
    Write:
        print OUTPUT "Some text...";
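Writing and reading can be combined into one round trip, sketched below. The file name is hypothetical; the three-argument form of open, `or die`, `close`, `unlink`, `use strict`, `use warnings` and `my` are additions not covered in the slides.

```perl
use strict;
use warnings;

my $file = "perl_intro_demo.txt";    # hypothetical file name

# Write two lines to the file.
open(OUTPUT, ">", $file) or die "Cannot write $file: $!";
print OUTPUT "first line\n";
print OUTPUT "second line\n";
close(OUTPUT);

# Read the file back, one line at a time.
my $line_count = 0;
open(INPUT, "<", $file) or die "Cannot read $file: $!";
while (my $line = <INPUT>) {
    chomp($line);
    $line_count += 1;
    print "$line_count: $line\n";
}
close(INPUT);

unlink($file);                       # remove the demo file again
```

Checking the result of open with `or die` stops the program with a message if the file cannot be opened, instead of silently reading nothing.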




 Reading files

     A program for testing e-mail addresses
     Note: If we want to use a special character literally, we need to
      escape it with a backslash
         In strings: "
         In regular expressions: . + * ? ^ $ and the backslash \ itself

open(INPUT, "<test.txt");
while ($line = <INPUT>) {
           chomp($line);
           if ($line =~ /^.+@..+\...+$/) {          # testing for e-mail: x@xx.xx
                      print "\"$line\" is a valid e-mail address.\n";
           } else {
                      print "E-mail address \"$line\" not valid.\n";
           }
           }
}

Exercise (7)
    Make a text file and fill it with a Wikipedia article
        Count the number of definite and indefinite articles
        Count the number of numbers and digits
        Insert a <number!> tag before every number




Arrays
    Arrays contain ordered lists of values
    Syntax:
        @days = ("Monday", "Tuesday", "Friday");
        $days[0] = "Saturday";
        $day = $days[2];
    Useful for storing linear sequences of values
    Note: @ for whole lists, $ for single elements
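The array syntax in action, as a minimal sketch. Counting starts at 0, so index 2 is the third element. `scalar`, `use strict`, `use warnings` and `my` are not covered in the slides.

```perl
use strict;
use warnings;

my @days = ("Monday", "Tuesday", "Friday");

$days[0] = "Saturday";           # replaces "Monday"
my $day   = $days[2];            # third element: "Friday"
my $count = scalar(@days);       # number of elements

print "Element 2 of $count is $day; element 0 is now $days[0]\n";
```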




 Arrays

     Useful array commands
         push(@array, "element");
             Adds a new element to the end of the array
             Creates the array if necessary
         $element = pop(@array);
             Removes the last value of @array and stores it in $element


# Trying out arrays

@tags = ("N", "V", "Adj");
$tag1 = pop(@tags);                        # $tag1 is now "Adj", @tags is ("N", "V")
$tag2 = pop(@tags);                        # $tag2 is now "V", @tags is ("N")
push(@tags, $tag2, $tag1);                 # @tags is now again ("N", "V", "Adj")
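push and pop always work on the end of the array, so the element added last is removed first. A short sketch with invented tag names follows; `scalar`, `use strict`, `use warnings` and `my` go beyond the slides.

```perl
use strict;
use warnings;

my @stack = ("N");

push(@stack, "V");               # ("N", "V")
push(@stack, "Adj", "Adv");      # push accepts several elements: ("N", "V", "Adj", "Adv")

my $last = pop(@stack);          # removes and returns "Adv"
my $size = scalar(@stack);       # three elements are left

print "Popped $last, $size elements left, last one is $stack[2]\n";
```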
Hashes
    Hashes are associative arrays
    They are lists where the elements are not ordered, but
     identified by a "name" (the key)
    Syntax:
        %probability = ("verb", 0.32,
                           "adjective", 0.02,
                           "adverb", 0);
        $probability{"noun"} = 0.52;
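A sketch of looking values up by key, using the same hash. The functions `keys`, `exists` and `scalar`, as well as `use strict`, `use warnings` and `my`, are not covered in the slides.

```perl
use strict;
use warnings;

my %probability = ("verb",      0.32,
                   "adjective", 0.02,
                   "adverb",    0);
$probability{"noun"} = 0.52;                   # adds a new key

my $entries  = scalar(keys %probability);      # number of keys
my $has_noun = 0;
if (exists $probability{"noun"}) {             # is there an entry for this key?
    $has_noun = 1;
}

print "P(noun) = " . $probability{"noun"} . ", $entries entries\n";
```

Note the pattern: % for the whole hash, $ with curly braces for a single value.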




Exercise (8)
    What happens if you try to print an array?
        What about a hash?
    What happens if you convert an array into a hash, or the
     other way round?




Practical: Tokenizer
    Take a Wikipedia article and put it into a text file
        Clean it up if necessary
    Tokenize it!
        We only want one word per line
        Insert a "sentence boundary" symbol where appropriate
    The output should be another file
    Think about what choices you make and why!




Practical: Tagger
    Take the POS-annotated corpus from treebank.txt
    Clean and tokenize it
    Compute the tag-token probabilities from the counts
    Compute the transition probabilities from the counts
        For a first attempt, I strongly recommend bigrams
    Apply the Viterbi algorithm and tag an input file of your
     choice!




Practical: Tagger++
    If it's still too easy, or if you want a long-term aim:
        Implement smoothing: words can have tags you haven't seen
         them with, or appear in contexts you have never seen them in
        Try to figure out a way to guess the tags for unknown words
         better
        Write a program to train on 9/10 of the corpus, and test it on
         the rest.
            Compare your results to the actual annotations
            Do this 10 times, each time holding out a different 1/10
    Still too easy? Implement trigrams and compare the
     results.

