Lecture 7 More basics of Perl

							    Lecture 7
More basics of Perl
     LING 5200
      Fall 2003
     Kevin Cohen
            Operators
$one_line = <IN>;

$line_count++;

$total = $price * $amount;

• a symbol
• has syntax and semantics
               Functions
chomp($one_line);        # $one_line is the argument

print "$total\n";

@compounds = findCompoundNouns($input);

• name is alphabetic
• open class
• arguments, which can be variable in number
             Functions
• split: given a string, convert it into
  an array
• delimiters
  – sentential input: whitespace
  – epw.cd: slashes
  – "CSV" (comma-separated values)
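A minimal sketch of split with the three kinds of delimiter listed above. The sample strings are made up for illustration, and the epw.cd-style record assumes the fields are separated by backslashes; change the pattern if your copy of the file uses a different delimiter.

```perl
use strict;
use warnings;

# Sentential input: split on runs of whitespace.
my $sentence = "the quick brown fox";
my @words = split(/\s+/, $sentence);

# Hypothetical epw.cd-style record; the delimiter is assumed to be a backslash.
my $record = '1\abandon\42';
my @fields = split(/\\/, $record);

# "CSV": split on commas. A plain split does not handle commas
# that appear inside quoted fields.
my $csv_line = "cat,NN,3";
my @cells = split(/,/, $csv_line);

print scalar(@words), " words, ", scalar(@fields), " fields, ",
      scalar(@cells), " cells\n";
```

In each case the first argument is a regular expression that describes the delimiter, and the result is an array of the pieces between delimiters.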
                Functions
Some functions take variable numbers of arguments

split(/ /);              # one argument

split(/ /, $input);      # two arguments

split(/ /, $input, 5);   # three arguments
                 Functions
Some functions take variable numbers of arguments

split(/ /);              # / / is a regular expression specifying the delimiter

split(/ /, $input);      # delimiter, string to split

split(/ /, $input, 5);   # delimiter, string to split, limit
             Functions
• Using the limit parameter
• epw.cd: get just the ID and
  orthographic form
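One way the limit parameter might be used for this task, as a sketch. The input line is invented, and it assumes that the ID and the orthographic form are the first two backslash-delimited fields; check both assumptions against your own copy of epw.cd.

```perl
use strict;
use warnings;

# Hypothetical line; the values are made up.
my $line = '1\abandon\42\7\extra';

# With a limit of 3, split returns at most three pieces:
# the first two fields, plus everything else left unsplit.
my ($id, $orthography, $rest) = split(/\\/, $line, 3);

print "ID: $id\n";
print "orthographic form: $orthography\n";
```

Because the remainder of the line stays in one piece, Perl does not spend time splitting fields that the script never looks at.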
                  Functions
# input: a sentence
# output: an array of compound nouns
# "compound noun" here means any sequence of two or more
# tokens tagged as nouns.
# assumption: noun tags are NN, NP, and NNS
sub findCompoundNouns {
    my ($sentence) = @_;
    my $compound;
    # TODO: as written, this only finds one compound
    # per sentence--make it catch multiple compounds
    # in the same sentence
    # each token is word/TAG; \S keeps a tag from swallowing the next space
    if ($sentence =~ /\b[^\/\s]+\/N\S\S?( [^\/\s]+\/N\S\S?\b)+/) {
        $compound = $&;
    }
    return($compound);

} # close subroutine definition findCompoundNouns()
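One possible way to address the TODO, offered as a sketch rather than as the intended solution. The name findAllCompoundNouns is hypothetical. The code assumes word/TAG tokens separated by single spaces and spells out the noun tags NN, NP, and NNS instead of using the looser N pattern.

```perl
use strict;
use warnings;

# Hypothetical variant that returns every compound in the sentence.
sub findAllCompoundNouns {
    my ($sentence) = @_;
    # One noun token, e.g. dog/NN (longest tag is tried first).
    my $noun = qr{[^/\s]+/(?:NNS|NN|NP)};
    my @compounds;
    # The /g modifier resumes the search after each match, so every run
    # of two or more adjacent noun tokens is collected.
    while ($sentence =~ /(?<!\S)($noun(?: $noun)+)(?!\S)/g) {
        push @compounds, $1;
    }
    return @compounds;
}

my @found = findAllCompoundNouns(
    "the/DT noun/NN phrase/NN has/VBZ two/CD head/NN words/NNS");
print "$_\n" for @found;
```

The lookarounds (?<!\S) and (?!\S) require each compound to start and end at a token boundary, so a tag such as NNP, which is outside the assumed tag set, is not partially matched.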
                Blocks
• looping
• decisions
• (scope of variables)

while ($term = <DICTIONARY>) {
   if ($term =~ /^[A-Z]/) {
   }
} # close while-loop
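A small sketch of how blocks combine looping, decisions, and variable scope. To stay self-contained it reads from an in-memory string rather than a real dictionary file; the terms and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Stand-in for a dictionary file: a string opened as a filehandle.
my $text = "Boulder\napple\nColorado\nzebra\n";
open(my $dictionary, '<', \$text) || die "couldn't open in-memory data: $!";

my $capitalized = 0;                         # visible in the rest of the file
while (my $term = <$dictionary>) {           # $term exists only inside the loop
    chomp($term);
    if ($term =~ /^[A-Z]/) {                 # decision: starts with a capital?
        my $message = "capitalized: $term";  # exists only inside the if-block
        print "$message\n";
        $capitalized++;
    }
} # close while-loop
close($dictionary);

print "$capitalized capitalized terms\n";
```

A variable declared with my inside a block disappears when the block closes, which is why the counter is declared before the loop.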
         Special variables
• Variables that Perl creates itself
• Either:
  – assigned default values by Perl ab initio
  – assigned values by Perl during execution
             Special variables
• Variables that Perl creates itself
• Either assigned default values by Perl ab
  initio...
• …or assigned values by Perl during
  execution
  – $! a scalar variable that contains the last
    operating system error message
  open(DICTIONARY, $celex)
    || die "couldn't open the file named $celex: $!";
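A sketch of what $! holds after a failed open. It uses if/else instead of die so that the script keeps running, and the file name is a hypothetical one that is assumed not to exist. The exact wording of the message depends on the operating system.

```perl
use strict;
use warnings;

# Hypothetical file name, assumed not to exist.
my $celex = "no_such_directory/epw.cd";

my $error_message = '';
if (open(my $dictionary, '<', $celex)) {
    close($dictionary);
} else {
    # The failed open() set $!. Copy it right away, because
    # later system calls can overwrite it.
    $error_message = "$!";
    print "couldn't open the file named $celex: $error_message\n";
}
```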
             Special variables
• Variables that Perl creates itself
• Either assigned default values by Perl ab
  initio...
• …or assigned values by Perl during
  execution
  – @ARGV an array that contains the command-line
    arguments
  Your command line:
    findNounCompounds.pl /home/kev/GENIA.txt
  Your script contains this statement:
    print "file name: $ARGV[0]\n";
  This output is produced:
    file name: /home/kev/GENIA.txt
               Special variables
#!/usr/local/bin/perl

$filename = $ARGV[0];
print "file name: $filename\n";

babel>myScript.pl GENIA.txt
file name: GENIA.txt
babel>
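A sketch of guarding against a missing command-line argument before using $ARGV[0]. The helper name describe_first_argument and the usage text are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: report the first argument,
# or a usage message if no argument was given.
sub describe_first_argument {
    my @arguments = @_;
    if (@arguments == 0) {
        return "usage: myScript.pl FILENAME";
    }
    return "file name: $arguments[0]";
}

print describe_first_argument(@ARGV), "\n";
```

An array used in a numeric comparison gives its number of elements, so @arguments == 0 is true exactly when nothing was passed.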
       Point/counterpoint
• Bad way

while (<IN>) {
  $line = $_;
  # do whatever...
}

• Good way

while ($line = <IN>) {
  # do whatever...
}
Interlude
          Point of HW6
• Practical practice with corpus search
  tools
               HW6:1
• Wildcard
• Tokenization: how is it helpful?
           Homework 6: 2
• Point: tokenization
  – How the data is split into "chunks"
  – Punctuation
  – Clitics
    • Haven't -> Have n't
    • I'll -> I 'll
    • Kevin's -> Kevin 's
  – Genitives
    • Kevin's -> Kevin 's
          Homework 6: 4
• Point: POS attribute
• Point: practice searching on
  "structural," vs. "positional,"
  attributes
  – <p>
  – <s>
               HW6:5
• Sorting
• What are implications of "word" vs.
  "lemma" search?
                HW6:6
•   Dealing with unlemmatized data
•   thinking_about (>180)
•   .* vs. |
•   [Tt]
    Homework 6: 6-8,9; 10
• Point: try a couple of actual linguistic
  questions
  – Corpus searches usually need some post-
    processing (filtering)
  – Results often probabilistic vs.
    deterministic
  – You can get out (relatively) easily only
    what's been annotated
              HW6:7
• Hairy regular expression
• Build it up gradually
            HW6:6-8,9
• Gradually building a query
• "Show me all forms of think, guess,
  and believe, and separate data in
  which it is followed by that from data
  in which it is not followed by that"
• Negation

• !=

• ! word =
PP attachment
           HW6:6(tgrep)
• VP < NP < PP
• "sentences in which a VP immediately
  dominates both an NP and a PP"
• = VP < PP < NP

• VP < (NP < PP)
• "sentences in which a VP immediately
  dominates an NP that immediately
  dominates a PP"
• Immediate precedence in tgrep is
  complicated
• "A immediately precedes B if A does
  not dominate B and B is the next node
  in the sentence as enumerated by a
  top-down depth-first search." (tgrep
  man page)
          A

      B       C

      D       E

•   A . nothing
•   B . C
•   B .. C
•   B .. E
•   D . C
•   D .. E
Naming variables
       Useful array stuff
• Finding out their size
• Looping through/iterating over them
  – for-loops
  – foreach-loops
  – pop/shift
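A short sketch of the array operations listed above. The array contents and variable names are made up for illustration.

```perl
use strict;
use warnings;

my @tokens = ("corpus", "search", "tools");

# Size: an array in scalar context gives its number of elements.
my $size = scalar(@tokens);
my $last_index = $#tokens;      # index of the last element

# for-loop over indices
my @indexed;
for (my $i = 0; $i < @tokens; $i++) {
    push @indexed, "$i:$tokens[$i]";
}

# foreach-loop over elements
my $joined = '';
foreach my $token (@tokens) {
    $joined .= uc($token);
}

# Consuming a copy: shift removes from the front, pop from the end.
my @queue = @tokens;
my $first = shift(@queue);
my $last  = pop(@queue);
unshift(@queue, $first);        # unshift puts an element on the front

print "$size tokens; remaining queue: @queue\n";
```

The foreach-loop is usually the simplest choice when the index itself is not needed; shift and pop are useful when the array should shrink as it is processed.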
  I'm SOOO frustrated...
#!/usr/local/bin/perl

# this script is for
# printing "hello, world"

# print it
prnt "hello, world\n";
      I'm SOOO frustrated...
babel>helloWorld.pl
String found where operator expected at helloWorld.pl
  line 7, near "prnt "hello, world\n""
        (Do you need to predeclare prnt?)
syntax error at helloWorld.pl line 7, near
  "prnt "hello, world\n""
Execution of helloWorld.pl aborted due to
  compilation errors.
babel>
  I'm still SO frustrated...
#!/usr/local/bin/perl

# for printing "hello, world"

# print it
print "helo, world\n";
  I'm still SO frustrated...
babel>helloWorld.pl
helo, world
babel>
  I'm still SO frustrated...
#!/usr/local/bin/perl

# for printing "hello, world"

# print it
print "hello, world\n";
          And still...
#!/usr/local/bin/perl

# for printing "hello, world"

# print it
print "hello, world\n;
           And still...
babel>helloWorld.pl
Can't find string terminator '"'
 anywhere before EOF at
 helloWorld.pl line 6.
babel>
          And still...
#!/usr/local/bin/perl

# for printing "hello, world"

# print it
print "hello, world\n";