					LING/C SC/PSYC 438/538

        Lecture 8
      Sandiway Fong
• Homework 6
  – is due
  – please bring them to the front
• Next Tuesday (September 23rd)
• Guest lecture
  – Dr. Ray Tillman
  – US Air Force Research Labs, Mesa AZ

  – Linguistics at the lab
     • Synthetic speech in simulation
     • Natural language processing and dynamic
       human-computer interaction during training
                 Today’s Topics
• 4 things:
  – Homework 6 review
  – Ungraded home exercise from last time
  – Section 2.3 from JM: Regular Languages and FSAs
  – Homework for today
     • Install SWI-Prolog
                                Homework Review
Exercise 2.1 from JM
•      By “word”, we mean an alphabetic string separated from other words by whitespace, any relevant punctuation,
       line breaks, and so forth.

1.     the set of all alphabetic strings

•    Assumption:
      –    the string must be non-empty, i.e. the empty string does not count as an alphabetic string

•    Example Perl program template:

      $s = "this is a string 123456789";
      if ($s =~ /([A-Za-z]+)/) {
         print "<$1> Match!\n";
      } else {
         print "Fail!\n";
      }

•    Note: \w doesn't work here
      –    \w = alphanumeric characters plus "_", so it would also accept digits and underscores

•    Example output:
      perl test.prl
      <this> Match!
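A quick way to see why \w is the wrong tool here: the sketch below collects every match of \w+ and of [A-Za-z]+ from the same string. The test string and the helper `words_matching` are made up for illustration; they are not part of the homework.

```perl
use strict;
use warnings;

# Hypothetical helper: return every substring of $s matched by $re.
# With no capture groups, //g in list context returns the whole matches.
sub words_matching {
    my ($s, $re) = @_;
    return $s =~ /$re/g;
}

my $s = "this is a string 123456789 snake_case";

# \w+ also accepts digits and "_", so non-alphabetic tokens slip through.
print join(" ", map { "<$_>" } words_matching($s, qr/\w+/)), "\n";

# [A-Za-z]+ accepts letters only; "snake_case" is split at the underscore.
print join(" ", map { "<$_>" } words_matching($s, qr/[A-Za-z]+/)), "\n";
```

The first line lists <123456789> and <snake_case> among the matches; the second keeps letters only and reports <snake> and <case> separately.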
                  Homework Review
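• Note:
   – We can't say [A-z]: in the ASCII character set, A to Z (codes 65 to 90) and a to z (codes 97 to 122) are not contiguous
   – Each ASCII character has a 7-bit code (stored in one 8-bit byte), and a range like A-z runs over those codes, so it also matches the six characters [ \ ] ^ _ ` that lie between Z and a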
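To make the [A-z] problem concrete, this small sketch lists the ASCII characters that sit between Z and a and tests each one against [A-z] and against [A-Za-z]. [A-z] accepts all six; [A-Za-z] accepts none.

```perl
use strict;
use warnings;

# The ASCII characters strictly between 'Z' (code 90) and 'a' (code 97).
my @between = map { chr($_) } (ord('Z') + 1) .. (ord('a') - 1);

for my $c (@between) {
    my $in_A_z     = $c =~ /^[A-z]$/    ? "yes" : "no";
    my $in_letters = $c =~ /^[A-Za-z]$/ ? "yes" : "no";
    printf "%-2s code %3d   [A-z]: %-3s   [A-Za-z]: %s\n", $c, ord($c), $in_A_z, $in_letters;
}
```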

                                Homework Review
•   Let’s test all the words in
     $s = "this is a string 123456789";
    How do we do that?

•   Answer:
     –    use the global (g) modifier in a while loop

•   Example Perl program:

     $s = "this is a string 123456789";

     while ($s =~ /([A-Za-z]+)/g) {
       print "<$1> Match!\n";
     }

•   Output:

     perl test.prl
     <this> Match!
     <is> Match!
     <a> Match!
     <string> Match!
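As an alternative to the while loop, a match with the g modifier in list context returns all the captured words at once. A sketch using the same string:

```perl
use strict;
use warnings;

my $s = "this is a string 123456789";

# In list context, //g returns the captured group from every match.
my @words = $s =~ /([A-Za-z]+)/g;

print "<$_> Match!\n" for @words;
print scalar(@words), " words found\n";
```

This prints the same four lines as the loop version, followed by "4 words found". The while loop is still the better choice when each match needs its own processing (e.g. using pos()).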
                    Homework Review
2. the set of all lower case alphabetic strings ending in a b

• Example Perl program:

   $s = "Looking for low-carb diet recipes?";

   # \b on both sides: the whole word must end in b (so "ab" inside "abc" is not matched)
   while ($s =~ /\b([a-z]*b)\b/g) {
     print "<$1> Match!\n";
   }

• Output:
   perl test.prl
   <carb> Match!
                            Homework Review
3.    the set of all strings with two consecutive repeated words (e.g., “Humbert
      Humbert” and “the the” but not “the bug” or “the big bug”)

•    Assumption:
      – There could be more than one space character between the consecutive words
      – \s+ (where \s = whitespace)

•    Example Perl program:

      $s = "Humbert Humbert saw the the saw";

      # \b at both ends: "the theory" must not count as "the the"
      while ($s =~ /\b([A-Za-z]+)\s+\1\b/g) {
        print "<$1> repeated match!\n";
      }

•    Output:
      perl test.prl
      <Humbert> repeated match!
      <the> repeated match!
                   Homework Review
4.   the set of all strings from the alphabet a, b such that each a is
     immediately preceded by and immediately followed by a b;

• Examples (* denotes string not in the language):
     –   *ab *ba
     –   bab
     –   λ (empty string)
     –   bb
     –   *baba
     –   babab

     – It can be tricky to think of a regular expression directly; there is a better way:
                    Homework Review
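• Draw an FSA and convert it to a RE:

  [Figure: FSA with start state 1 and states 2, 3, 4, arcs labeled a and b (animated in the original slide)]

           b+((ab+)+)?  =  b+(ab+)*
           adding the empty string:  b+(ab+)* | ε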
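The FSA can also be run directly as a transition table. This is a sketch under an assumed reading of the diagram: state 1 is the start, 1 -b-> 2, b-loops on 2 and 4, 2 -a-> 3, 3 -b-> 4, and an a-arc from 4 back to 3 (our assumption, needed to accept babab); the accepting states 1, 2 and 4 are inferred from b+(ab+)* | ε.

```perl
use strict;
use warnings;

# Assumed transitions (our reading of the figure; the 4 -a-> 3 arc is an assumption).
my %delta = (
    1 => { b => 2 },
    2 => { b => 2, a => 3 },
    3 => { b => 4 },
    4 => { b => 4, a => 3 },
);
# Assumed accepting states: 1 (empty string), 2 and 4 (last symbol read was a b).
my %final = (1 => 1, 2 => 1, 4 => 1);

# Run the deterministic FSA over $input; a missing transition means reject.
sub accepts {
    my ($input) = @_;
    my $state = 1;
    for my $sym (split //, $input) {
        $state = $delta{$state}{$sym};
        return 0 unless defined $state;
    }
    return $final{$state} ? 1 : 0;
}

for my $w ("ab", "ba", "bab", "", "bb", "baba", "babab") {
    printf "%-7s %s\n", ($w eq "" ? "(empty)" : $w), accepts($w) ? "accept" : "reject";
}
```

The verdicts agree with the examples on the previous slide, including acceptance of the empty string.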
                 Homework Review
• Example Perl program:

   $s = "ab ba bab bb baba babab";
   # note: this pattern does not cover the empty-string case
   while ($s =~ /\b(b+(ab+)*)\b/g) {
     print "<$1> match!\n";
   }

• Output:
   perl test.prl
   <bab> match!
   <bb> match!
   <babab> match!
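One way to cover the empty-string case as well is to test whole tokens instead of scanning with \b, and make the entire RE optional with (?: ... )?. In this sketch the double space in the input, which `split / /` turns into an empty token, is our own device for standing in for the empty string.

```perl
use strict;
use warnings;

# b+(ab+)* | ε, anchored to the whole token; the outer (?: ... )? admits "".
my $re = qr/^(?:b+(?:ab+)*)?$/;

# Splitting on a single space keeps the empty field between two spaces.
my @tokens = split / /, "ab ba bab  bb baba babab";

for my $t (@tokens) {
    printf "%-9s %s\n", ($t eq "" ? "(empty)" : "<$t>"), ($t =~ $re ? "match" : "no match");
}
```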
                                  Homework Review
5.     all strings that start at the beginning of the line with an integer and that end at the end of the line
       with a word;

•    Build it up piece by piece:
      –    ^            match: beginning of line, then
      –    \d+          match: integer (\d = digit), then
      –    \b.*\b       match: other stuff after the integer and before the end word, then
      –    [A-Za-z]+    match: word
      –    $            match: end of line

•    Example Perl program:

      $s = "12 or 24 hour job";

      if ($s =~ /^(\d+)\b.*\b([A-Za-z]+)$/) {
         print "integer is <$1> and end word is <$2>!\n";
      }

•    Output:
      perl test.prl
      integer is <12> and end word is <job>!
              Homework Review
6. all strings that have both the word grotto and the
   word raven in them (but not, e.g., words like grottos
   that merely contain the word grotto);

• Assumption:
   – upper/lower case distinction important here

• Case by case:
   – grotto … raven           \bWORD1\b.*\bWORD2\b
   – raven … grotto           \bWORD2\b.*\bWORD1\b
   – combine the two cases with disjunction: the vertical bar (|)
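Putting the two cases together with | gives a single pattern. A sketch, assuming case-sensitive matching as stated above; the test sentences are invented for illustration.

```perl
use strict;
use warnings;

# Either order, joined by |; \b keeps words like "grottos" from counting.
my $re = qr/\bgrotto\b.*\braven\b|\braven\b.*\bgrotto\b/;

my @lines = (
    "the raven flew into the grotto",
    "a grotto, then a raven",
    "two grottos and a raven",
    "the Grotto and the raven",    # fails: matching is case-sensitive
);

for my $line (@lines) {
    print(($line =~ $re ? "match:    " : "no match: "), $line, "\n");
}
```

The first two lines match; the third fails because grottos only contains grotto, and the fourth fails on case.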
                   Homework Review
• By “word”, we mean an alphabetic string separated from other
  words by whitespace, any relevant punctuation, line breaks, and so
  forth.

7.   write a pattern that places the first word of an English sentence in
     a register. Deal with punctuation.

• Assemble it piece by piece:
     –   ^               beginning of line
     –   [^A-Za-z]*      any leading non-letters (quotes, brackets, other punctuation)
     –   ([A-Za-z]+)     first word (placed in $1)
     –   \b              word boundary
                         (not needed: Perl RE matching is greedy, so [A-Za-z]+ already takes the whole word)
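Assembled, the pieces give ^[^A-Za-z]*([A-Za-z]+). A sketch wrapping it in a function; the name `first_word` and the sample sentences are ours, for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first word of $sentence, or undef if it has no letters.
sub first_word {
    my ($sentence) = @_;
    return $sentence =~ /^[^A-Za-z]*([A-Za-z]+)/ ? $1 : undef;
}

for my $s ('"Hello," she said.', '  (Well) maybe.', "...and then?", "42 is the answer") {
    my $w = first_word($s);
    print(defined($w) ? "<$w>  from  $s\n" : "no word in $s\n");
}
```

Note that for "42 is the answer" the result is <is>: the leading integer is skipped as a non-letter, in line with the definition of "word" as an alphabetic string.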
         Set-of-states construction
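• Converting an NDFSA into a DFSA

  [Figure: NDFSA with start state 1, states 2 and 3, and arcs labeled a and b; the resulting DFSA states include {1,2}, {2} and {3}, with a b-arc from {2} to {3}]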

      Set-of-states construction
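• Converting an NDFSA into a DFSA

  [Figure: NDFSA with start state 1, states 2 and 3, and arcs labeled a and b; the resulting DFSA states include {1,3}, {2} and {3}, with a b-arc from {2} to {3}]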
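The set-of-states construction can be sketched in a few lines: each DFSA state is a set of NDFSA states, and its move on a symbol is the union of the NDFSA moves from every member. The NDFSA below is hypothetical (the one in the figure is not recoverable), and the names `canon` and `name_of` are ours; empty transitions are not handled here.

```perl
use strict;
use warnings;

# Hypothetical NDFSA (not the one in the figure): on 'a', state 1 may stay in 1
# or move to 2; on 'b', state 2 moves to 3. State 1 is the start, 3 is final.
my %nfa = (
    1 => { a => [1, 2] },
    2 => { b => [3] },
    3 => {},
);
my %nfa_final = (3 => 1);
my @alphabet  = ('a', 'b');

# Sort and de-duplicate NDFSA states, so each set has one canonical form.
sub canon { my %seen; return sort { $a <=> $b } grep { !$seen{$_}++ } @_ }

# Name a DFSA state after its set of NDFSA states, e.g. "{1,2}".
sub name_of { return '{' . join(',', @_) . '}' }

my %dfa;              # DFSA transitions: name => { symbol => name }
my %states_of;        # name => [ NDFSA states in that set ]
my @queue = ([1]);    # begin with {1}, the set holding the NDFSA start state

while (@queue) {
    my $set_ref = shift @queue;
    my @set  = @$set_ref;
    my $name = name_of(@set);
    next if exists $dfa{$name};    # this set has already been processed
    $states_of{$name} = [@set];
    $dfa{$name} = {};
    for my $sym (@alphabet) {
        # Gather every state reachable on $sym from any state in the set.
        my @next;
        for my $q (@set) {
            push @next, @{ $nfa{$q}{$sym} } if exists $nfa{$q}{$sym};
        }
        next unless @next;         # no move on $sym: the empty (dead) set is left out
        @next = canon(@next);
        $dfa{$name}{$sym} = name_of(@next);
        push @queue, [@next];
    }
}

# A DFSA state is final if it contains at least one final NDFSA state.
for my $name (sort keys %dfa) {
    for my $sym (sort keys %{ $dfa{$name} }) {
        print "$name --$sym--> $dfa{$name}{$sym}\n";
    }
    my $is_final = grep { $nfa_final{$_} } @{ $states_of{$name} };
    print "$name is final\n" if $is_final;
}
```

For this input the construction produces three DFSA states, {1}, {1,2} and {3}, with {3} final. Only sets actually reachable from {1} are built, rather than all subsets of {1,2,3}.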
       Regular Languages and FSA
• Formal (constructive) definition of a regular language

 • Correspondence between REs and Regular Languages
    • concatenation (juxtaposition)
    • union            (| also [ ])
    • Kleene closure (*); note that x+ = xx*
 • Note:
      • Perl backreferences go beyond regular languages: e.g. /^(.*)\1$/ recognizes L = {ww}, which is not regular
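A small demonstration of that note: the backreference \1 demands an exact second copy of whatever (.*) captured, which no FSA can do for unbounded w. The test strings are our own.

```perl
use strict;
use warnings;

# ^(.*)\1$ : capture some string w, then require an exact copy of it to end the line.
# (.*) also allows w = "", so the empty string counts as ww.
my $ww = qr/^(.*)\1$/;

for my $s ("abab", "abcabc", "aa", "aba", "abba", "") {
    if ($s =~ $ww) {
        print "<$s> is ww with w = <$1>\n";
    } else {
        print "<$s> is not of the form ww\n";
    }
}
```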
      Regular Languages and FSA
• Other closure properties:

• Not all of these hold higher up the hierarchy, e.g. for context-free grammars, as we’ll see later
           Equivalence: FSA and REs
JM gives one direction only: from REs to FSAs
• Case by case:
    – Empty set
    – Any character from the alphabet
    – Empty string
       Equivalence: FSA and REs
• Concatenation:
  – Link the final state of FSA1 to the start state of FSA2 using an empty (ε) transition
        Equivalence: FSA and REs
• Kleene closure:
   – we’ve already seen an example of this in the homework review
       Equivalence: FSA and REs
• Union:
  – disjunction: add a new start state with empty (ε) transitions to the start states of FSA1 and FSA2
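The cases above can be sketched as a set of machine-building functions, in the style of the usual Thompson construction, plus a simulator that follows empty transitions. The representation (integer states, the label '' for an ε-transition, one start and one final state per machine) and all function names are our own assumptions, not JM's notation. The final lines rebuild b+(ab+)* | ε from the homework.

```perl
use strict;
use warnings;

# Assumed representation: states are integers, and $trans{$from}{$label} holds a
# list of target states; the label '' stands for an empty (epsilon) transition.
# Every machine built here has exactly one start state and one final state.
my %trans;
my $next_state = 0;

sub new_state { return $next_state++ }
sub arc { my ($from, $label, $to) = @_; push @{ $trans{$from}{$label} }, $to }

# Base cases: empty set (no arcs), empty string, a single character
sub empty_set { return { start => new_state(), final => new_state() } }
sub empty_str { my $m = empty_set(); arc($m->{start}, '', $m->{final}); return $m }
sub symbol    { my ($c) = @_; my $m = empty_set(); arc($m->{start}, $c, $m->{final}); return $m }

# Concatenation: empty transition from the final state of x to the start of y
sub concat {
    my ($x, $y) = @_;
    arc($x->{final}, '', $y->{start});
    return { start => $x->{start}, final => $y->{final} };
}

# Union: a new start state branches to both machines; both rejoin a new final state
sub union {
    my ($x, $y) = @_;
    my $m = empty_set();
    arc($m->{start}, '', $_->{start}) for $x, $y;
    arc($_->{final}, '', $m->{final}) for $x, $y;
    return $m;
}

# Kleene closure: skip x entirely, or loop from its final state back to its start
sub star {
    my ($x) = @_;
    my $m = empty_set();
    arc($m->{start}, '', $x->{start});
    arc($m->{start}, '', $m->{final});
    arc($x->{final}, '', $x->{start});
    arc($x->{final}, '', $m->{final});
    return $m;
}

# All states reachable from the given ones by empty transitions alone
sub closure {
    my %in = map { ($_ => 1) } @_;
    my @todo = @_;
    while (@todo) {
        my $q = pop @todo;
        for my $r (@{ $trans{$q}{''} || [] }) {
            push @todo, $r unless $in{$r}++;
        }
    }
    return keys %in;
}

# Simulate the nondeterministic machine by tracking the set of current states
sub accepts {
    my ($m, $input) = @_;
    my @current = closure($m->{start});
    for my $c (split //, $input) {
        my @moved;
        for my $q (@current) {
            push @moved, @{ $trans{$q}{$c} || [] };
        }
        @current = closure(@moved);
    }
    my $hit = grep { $_ == $m->{final} } @current;
    return $hit ? 1 : 0;
}

# b+(ab+)* | ε, with b+ written as b b*; each call builds a fresh copy
sub bplus { return concat(symbol('b'), star(symbol('b'))) }
my $m = union(concat(bplus(), star(concat(symbol('a'), bplus()))), empty_str());

for my $w ("", "bab", "bb", "babab", "ab", "ba", "baba") {
    printf "%-7s %s\n", ($w eq "" ? "(empty)" : $w), accepts($m, $w) ? "accept" : "reject";
}
```

Each sub-machine is used once; reusing a fragment in two places would wire extra arcs into it, which is why bplus builds a fresh copy on every call.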
• We’re going to take a detour from the textbook
   – Regular Grammars
   – The programming language Prolog has built-in support for
     phrase structure grammar rules (Definite Clause Grammar notation)
• Install SWI-Prolog on your machines
   – Freely available from

• Read about Prolog over the weekend
   – Lots of tutorials etc. available online
