Lecture 7 More basics of Perl
Lecture 7
More basics of Perl
LING 5200
Fall 2003
Kevin Cohen
Operators
$one_line = <IN>;
$line_count++;
$total = $price * $amount;
• a symbol
• has syntax and semantics
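A minimal sketch of the two bullets above: each operator is a symbol with its own syntax (where it goes, how many operands it takes) and its own semantics (what it does with them). The values are made up for illustration, and the sketch uses `use strict` and `my`, which the slides do not.

```perl
use strict;
use warnings;

# Hypothetical values, chosen only for illustration.
my $price      = 3;
my $amount     = 4;
my $line_count = 0;

my $total = $price * $amount;   # binary infix operator: multiplication
$line_count++;                  # unary postfix operator: add one

# Same operands, different operators, different semantics:
my $sum    = "3" + "4";         # + treats its operands as numbers
my $joined = "3" . "4";         # . treats its operands as strings

print "total: $total\n";
print "line count: $line_count\n";
print "sum: $sum, joined: $joined\n";
```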
Functions
chomp($one_line);
print "$total\n";
@compounds = findCompoundNouns($input);
• name is alphabetic
• open class
• arguments, which can be variable in number
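Functions are an open class because you can define new ones with `sub`, and a function can accept however many arguments the caller passes. A small sketch, assuming a made-up function `countTokens()` that is not part of the lecture:

```perl
use strict;
use warnings;

# countTokens() is a hypothetical function, invented for this sketch.
# input: any number of strings
# output: the total number of whitespace-separated tokens in them
sub countTokens {
    my @sentences = @_;          # all arguments arrive in @_
    my $count = 0;
    foreach my $sentence (@sentences) {
        my @tokens = split(' ', $sentence);
        $count += scalar(@tokens);
    }
    return $count;
}

my $one = countTokens("the dog barked");                # one argument
my $two = countTokens("the dog barked", "it rained");   # two arguments
print "$one $two\n";
```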
Functions
chomp($one_line); arguments
print "$total\n";
@compounds = findCompoundNouns($input);
• name is alphabetic
• arguments, which can be variable in number
• open class
Functions
• split: given a string, convert it into an array
• delimiters
– sentential input: whitespace
– epw.cd: slashes
– "CSV" (comma-separated values)
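The three delimiter cases above can be sketched as follows. All three input strings are invented, and the slash-delimited record is only a stand-in for an epw.cd line; if your copy of the file uses backslashes, the pattern would be `/\\/` instead.

```perl
use strict;
use warnings;

# All three input strings are invented for illustration.
my $sentence = "the dog barked";
my $record   = "101/dog/N";     # slash-delimited, hypothetical
my $csv      = "dog,N,101";

my @words  = split(/ /, $sentence);   # delimiter: a space
my @fields = split(/\//, $record);    # delimiter: a slash (escaped)
my @cells  = split(/,/, $csv);        # delimiter: a comma

print scalar(@words), " words: @words\n";
print "second field: $fields[1]\n";
print "first CSV cell: $cells[0]\n";
```

Splitting on commas is only safe for simple CSV; fields that contain quoted commas need more than split.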
Functions
Some functions take variable numbers of arguments
split(/ /);              # one argument
split(/ /, $input);      # two arguments
split(/ /, $input, 5);   # three arguments
Functions
Some functions take variable numbers of arguments
split(/ /);              # regular expression specifying the delimiter ($_ is split)
split(/ /, $input);      # delimiter, string to split
split(/ /, $input, 5);   # delimiter, string to split, limit
Functions
• Using the limit parameter
• epw.cd: get just the ID and
orthographic form
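A sketch of the limit parameter, assuming a made-up five-field record in which the first field is the ID and the second is the orthographic form. Real epw.cd lines have a different layout and possibly a different delimiter.

```perl
use strict;
use warnings;

# A made-up record with five fields; real epw.cd lines differ.
my $line = "101/dog/12/N/extra";

# With a limit of 3, split produces at most three elements:
# the third holds the rest of the line, unsplit.
my ($id, $orthography, $rest) = split(/\//, $line, 3);

print "ID: $id\n";
print "orthographic form: $orthography\n";
print "everything else: $rest\n";
```

The limit saves work when only the leading fields matter, because split stops looking for delimiters once it has enough pieces.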
Functions
# input: a sentence
# output: an array of compound nouns (currently at most one; see TODO)
# "compound noun" here means any sequence of two or more
# tokens tagged as nouns.
# assumption: tokens look like word/TAG, and noun tags are NN, NP, and NNS
sub findCompoundNouns {
    my ($sentence) = @_;
    my $compound;
    # TODO: as written, this only finds one compound
    # per sentence--make it catch multiple compounds
    # in the same sentence
    if ($sentence =~ /\b([^\/ ]+\/N\w\w?(?: [^\/ ]+\/N\w\w?)+)\b/) {
        $compound = $1;
    }
    return($compound);
} # close subroutine definition findCompoundNouns()
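One possible way to address the TODO is to match repeatedly with the `/g` modifier and collect every hit. This is a sketch, not the lecture's solution: the name `findAllCompoundNouns()` and the tagged sentence are invented, and it assumes word/TAG tokens separated by single spaces.

```perl
use strict;
use warnings;

# findAllCompoundNouns() is a hypothetical name, invented for this sketch.
# input: a sentence of word/TAG tokens separated by single spaces
# output: an array of all sequences of two or more noun-tagged tokens
sub findAllCompoundNouns {
    my ($sentence) = @_;
    my @compounds = ();
    # /g makes each pass through the loop resume where the last match ended
    while ($sentence =~ /(?<!\S)([^\/\s]+\/N\w\w?(?: [^\/\s]+\/N\w\w?)+)(?!\S)/g) {
        push(@compounds, $1);
    }
    return @compounds;
}

# An invented tagged sentence.
my $input = "the/DT dog/NN house/NN fell/VBD on/IN tree/NN branches/NNS";
my @compounds = findAllCompoundNouns($input);
print "$_\n" foreach @compounds;
```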
Blocks
• looping
• decisions
• (scope of variables)
while ($term = <DICTIONARY>) {
    if ($term =~ /^[A-Z]/) {
        # do something with capitalized terms...
    } # close if-block
} # close while-loop
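A sketch of how blocks interact with variable scope. It assumes variables are declared with `my`, which the slides do not use, and an in-memory list stands in for the DICTIONARY filehandle so the example runs without a file.

```perl
use strict;
use warnings;

# A small invented word list stands in for the DICTIONARY filehandle.
my @dictionary = ("Boulder", "corpus", "Perl", "token");

my $capitalized = 0;               # visible in the rest of the file
foreach my $term (@dictionary) {   # $term exists only inside the loop block
    if ($term =~ /^[A-Z]/) {
        # $message exists only inside this if-block
        my $message = "$term starts with a capital letter";
        print "$message\n";
        $capitalized++;
    }
    # $message is not accessible here
}
print "capitalized terms: $capitalized\n";
```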
Special variables
• Variables that Perl creates itself
• Either:
– assigned default values by Perl ab initio
– assigned values by Perl during execution
Special variables
$!: a scalar variable that contains the last operating system error message
open(DICTIONARY, $celex) || die "couldn't open the file named $celex: $!";
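To see what `$!` contains without stopping the script, you can capture it instead of calling die. A sketch, assuming a deliberately nonexistent path so that open fails; the exact wording of the message depends on your operating system.

```perl
use strict;
use warnings;

# Hypothetical path, chosen so that open() fails.
my $celex = "/no/such/directory/epw.cd";
my $error = "";

if (open(my $dictionary, '<', $celex)) {
    print "opened $celex\n";
    close($dictionary);
} else {
    # $! holds the operating system's explanation of the most recent failure;
    # copy it right away, because later system calls can overwrite it
    $error = "$!";
    print "couldn't open the file named $celex: $error\n";
}
```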
Special variables
• @ARGV: an array that contains the command-line arguments
Your command line:
findNounCompounds.pl /home/kev/GENIA.txt
Your script contains this statement:
print "file name: $ARGV[0]\n";
This output is produced:
file name: /home/kev/GENIA.txt
Special variables
The script myScript.pl:
#!/usr/local/bin/perl
$filename = $ARGV[0];
print "file name: $filename\n";
Running it:
babel>myScript.pl GENIA.txt
file name: GENIA.txt
babel>
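A slightly fuller sketch of working with `@ARGV`: counting the arguments and copying the first one into a named variable. The fallback file name is an assumption added so the example also runs when no argument is given.

```perl
use strict;
use warnings;

# Perl fills @ARGV before the first statement runs.
# For this sketch only, fall back to a made-up file name if none was supplied.
@ARGV = ("GENIA.txt") unless @ARGV;

my $argument_count = scalar(@ARGV);   # an array in scalar context gives its size
my $filename       = $ARGV[0];        # first argument, not the script name

print "number of arguments: $argument_count\n";
print "file name: $filename\n";
```

Note that `$ARGV[0]` is the first argument; the name of the script itself is in the separate special variable `$0`.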
Point/counterpoint
• Bad way
while (<IN>) {
    $line = $_;
    # do whatever...
}
• Good way
while ($line = <IN>) {
    # do whatever...
}
Interlude
Point of HW6
• Practical practice with corpus search
tools
HW6:1
• Wildcard
• Tokenization: how is it helpful?
Homework 5: 2
• Point: tokenization
– How the data is split into "chunks"
– Punctuation
– Clitics
• Haven't -> Have n't
• I'll -> I 'll
• Kevin's -> Kevin 's
– Genitives
• Kevin's -> Kevin 's
Homework 5: 4
• Point: POS attribute
• Point: practice searching on "structural" vs. "positional" attributes
– <p>
– <s>
HW6:5
• Sorting
• What are implications of "word" vs.
"lemma" search?
HW6:6
• Dealing with unlemmatized data
• thinking_about (>180)
• .* vs. |
• [Tt]
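The trade-off between `.*` and `|` can be tried out in Perl as well as in the corpus tool. A sketch with invented tokens: the wildcard is short but lets in unwanted forms and misses irregular ones, while the alternation lists exactly the forms you want. `[Tt]` covers both capitalizations.

```perl
use strict;
use warnings;

# Invented tokens, standing in for unlemmatized corpus data.
my @tokens = ("think", "Thinking", "thought", "thinker", "thinks", "thin");

# Wildcard: short to write, but it overgenerates (thinker)
# and misses the irregular form (thought).
my @wildcard = grep { /^[Tt]hink.*$/ } @tokens;

# Alternation: every form is listed explicitly.
my @alternation = grep { /^(?:[Tt]hink(?:s|ing)?|[Tt]hought)$/ } @tokens;

print "wildcard:    @wildcard\n";
print "alternation: @alternation\n";
```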
Homework 6: 6-8,9; 10
• Point: try a couple of actual linguistic
questions
– Corpus searches usually need some post-processing (filtering)
– Results often probabilistic vs.
deterministic
– You can get out (relatively) easily only
what's been annotated
HW6:7
• Hairy regular expression
• Build it up gradually
HW6:6-8,9
• Gradually building a query
• "Show me all forms of think, guess,
and believe, and separate data in
which it is followed by that from data
in which it is not followed by that"
• Negation
• !=
• ! word =
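The same build-it-up approach works for Perl regular expressions. This sketch is an analogy, not the homework's query syntax: it assembles the verb pattern from small pieces, then separates sentences in which the verb is followed by "that" from those in which it is not. The sentences are invented, and `!~` is Perl's negated match.

```perl
use strict;
use warnings;

# Invented sentences for illustration.
my @sentences = (
    "I think that it rained",
    "She believes him",
    "They guessed that he left",
    "We thought so",
    "The dog barked",
);

# Build the pattern in small pieces and test each piece as you go.
my $think   = "(?:think|thinks|thinking|thought)";
my $guess   = "(?:guess|guesses|guessing|guessed)";
my $believe = "(?:believe|believes|believing|believed)";
my $verb    = "\\b(?:$think|$guess|$believe)\\b";

my (@with_that, @without_that);
foreach my $sentence (@sentences) {
    next if $sentence !~ /$verb/;      # negation: skip sentences without the verbs
    if ($sentence =~ /$verb that\b/) {
        push(@with_that, $sentence);
    } else {
        push(@without_that, $sentence);
    }
}
print "followed by that:     $_\n" foreach @with_that;
print "not followed by that: $_\n" foreach @without_that;
```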
PP attachment
HW6:6(tgrep)
• VP < NP < PP
  "sentences in which a VP immediately dominates both an NP and a PP"
  = VP < PP < NP
• VP < (NP < PP)
  "sentences in which a VP immediately dominates an NP that immediately dominates a PP"
• Immediate precedence in tgrep is
complicated
• "A immediately precedes B if A does
not dominate B and B is the next node
in the sentence as enumerated by a
top-down depth-first search." (tgrep
man page)
(tree diagram: A on the first level; B and C on the second; D and E on the third)
• A . nothing
• B . C
• B .. C
• B .. E
• D . C
• D .. E
Naming variables
Useful array stuff
• Finding out their size
• Looping through/iterating over them
– for-loops
– foreach-loops
– pop/unshift
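The array operations listed above can be sketched as follows, using invented data. The sketch declares variables with `my`, which the slides do not.

```perl
use strict;
use warnings;

# Invented data.
my @compounds = ("dog house", "tree branch", "corpus search");

# Finding out the size: an array in scalar context gives its length.
my $size       = scalar(@compounds);
my $last_index = $#compounds;        # index of the last element
print "size: $size, last index: $last_index\n";

# for-loop: iterate over the indices
for (my $i = 0; $i < $size; $i++) {
    print "$i: $compounds[$i]\n";
}

# foreach-loop: iterate over the elements themselves
foreach my $compound (@compounds) {
    print "$compound\n";
}

# pop removes an element from the end; unshift adds one to the front
my $last = pop(@compounds);
unshift(@compounds, "noun compound");
print "popped: $last\n";
print "now: @compounds\n";
```

A for-loop is the natural choice when you need the index; foreach is simpler when you only need the elements.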
I'm SOOO frustrated...
#!/usr/local/bin/perl
# this script is for
# printing "hello, world"
# print it
prnt "hello, world\n";
I'm SOOO frustrated...
babel>helloWorld.pl
String found where operator expected at helloWorld.pl line 7, near "prnt "hello, world\n""
(Do you need to predeclare prnt?)
syntax error at helloWorld.pl line 7, near "prnt "hello, world\n""
Execution of helloWorld.pl aborted due to compilation errors.
babel>
I'm still SO frustrated...
#!/usr/local/bin/perl
# for printing "hello, world"
# print it
print "helo, world\n";
I'm still SO frustrated...
babel>helloWorld.pl
helo, world
babel>
I'm still SO frustrated...
#!/usr/bin/perl
# for printing "hello, world"
# print it
print "hello, world\n";
And still...
#!/usr/local/bin/perl
# for printing "hello, world"
# print it
print "hello, world\n;
And still...
babel>helloWorld.pl
Can't find string terminator '"' anywhere before EOF at helloWorld.pl line 6.
babel>
And still...
#!/usr/bin/perl
# for printing "hello, world"
# print it
print "hello, world\n";