Parsing Algorithms

CSA3050: NLP Algorithms

                     Sentence Parsing 2
         • Top Down
         • Bottom-Up
         • Left Corner
         • BUP
         • Implementation in Prolog


November 2004          csa3050: Sentence Parsing II   1
                 Sources
• Jurafsky & Martin Chapter 10
• Covington Chapter 6




Derivation: top down, left-to-right, depth first



                Bottom Up Filtering
• We know the current input word must
  serve as the first word in the derivation of
  the unexpanded node the parser is
  currently processing.
• Therefore the parser should not consider
  any grammar rule for which the current word
  cannot serve as the "left corner".
• The left corner is the first preterminal
  node along the left edge of a derivation.
                 Left Corner




The node marked Verb is a left corner of VP.
                Left Corner
• B is a left corner of A iff
  A ⇒* Bα
  for non-terminal A, pre-terminal B and
  symbol string α.
• Possible left corners of all non-terminal
  categories can be determined in advance
  and placed in a table.


Left Corner (Operational Definition)
• A category B is a left corner of a
  nonterminal A if:
    – A = B (reflexive case);
    – there exists a rule A → Bα for non-terminal A,
      pre-terminal B and symbol string α
      (immediate case);
    – there exists a category C and a rule A → Cα such
      that B is a left corner of C (transitive case).
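
The three cases above can be sketched in Prolog. This is only an illustrative sketch, not code from the lecture: it assumes the lecture's DCG-style grammar has been stored as rule/2 facts, and the predicate names left_corner/2, lc/3, preterminal/1 and left_corner_table/2 are hypothetical. The list of already-visited categories is an added assumption that keeps the search from looping on a left-recursive rule such as nom → nom pp.

```prolog
% Grammar as rule(Mother, Daughters) facts (assumed encoding).
rule(s,   [np, vp]).
rule(s,   [aux, np, vp]).
rule(s,   [vp]).
rule(np,  [det, nom]).
rule(np,  [pn]).
rule(nom, [noun]).
rule(nom, [noun, nom]).
rule(nom, [nom, pp]).
rule(pp,  [prep, np]).
rule(vp,  [v]).
rule(vp,  [v, np]).

% left_corner(?B, +A): B is a left corner of A.
left_corner(B, A) :- lc(A, [A], B).

% lc(+A, +Seen, ?B): Seen holds the categories already on this path.
lc(A, _, A).                      % reflexive case
lc(A, Seen, B) :-                 % immediate and transitive cases
    rule(A, [C|_]),
    \+ memberchk(C, Seen),        % skip categories already visited
    lc(C, [C|Seen], B).

% A category with no rule of its own is treated as a pre-terminal.
preterminal(C) :- \+ rule(C, _).

% left_corner_table(+A, -Bs): sorted pre-terminal left corners of A.
left_corner_table(A, Bs) :-
    setof(B, (left_corner(B, A), preterminal(B)), Bs).
```

Under these assumptions the query ?- left_corner_table(s, Bs). gives Bs = [aux, det, pn, v].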


    DCG-style Grammar/Lexicon
s        -->    np, vp.
s        -->    aux, np, vp.
s        -->    vp.
np       -->    det, nom.
nom      -->    noun.
nom      -->    noun, nom.
nom      -->    nom, pp.
pp       -->    prep, np.
np       -->    pn.
vp       -->    v.
vp       -->    v, np.

What are the left corners of S?


   Example of Left Corner Table

 Category                   Left Corners
S               Det, Proper-Noun, Aux, Verb
NP              Det, Proper-Noun
Nominal         Noun
VP              Verb




  How to use the Left Corner Table
• If attempting to parse category A,
  only consider rules A → Bα
  for which
  category(current input) ∈ LeftCorners(B)

   S → NP VP
   S → Aux NP VP
   S → VP
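
A minimal sketch of this filter, assuming the table is stored as lc_table/2 facts and the three S rules as rule/2 facts. The names lc_table/2 and candidate_rule/3 are hypothetical, and the row for aux is an added assumption: a pre-terminal is taken to be its own left corner, which the table on the slide leaves implicit.

```prolog
% The three S rules from the slide.
rule(s, [np, vp]).
rule(s, [aux, np, vp]).
rule(s, [vp]).

% Left corner table rows (lowercase category names are an assumption).
lc_table(s,   [det, pn, aux, v]).
lc_table(np,  [det, pn]).
lc_table(nom, [noun]).
lc_table(vp,  [v]).
lc_table(aux, [aux]).    % assumed: a pre-terminal is its own left corner

% candidate_rule(+A, +WordCat, -Rhs): the rule A --> Rhs survives the
% filter when WordCat is a left corner of the first daughter of Rhs.
candidate_rule(A, WordCat, [B|Rest]) :-
    rule(A, [B|Rest]),
    lc_table(B, LeftCorners),
    memberchk(WordCat, LeftCorners).
```

With a determiner as the current input word, ?- candidate_rule(s, det, Rhs). leaves only Rhs = [np, vp], so the other two S rules are never expanded.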

           Prolog Implementations
•   Top Down: depth first recursive descent
•   Bottom Up: shift/reduce
•   Left Corner
•   BUP




          Top Down Implementation
                 in Prolog
• Parser takes the form of a predicate parse(C,S1,S):
  parse a constituent C starting with input string
  S1 and ending with input string S.
  ?- parse(s,[the,dog,barked],[]).
• If C is a pre-terminal category, use the
  lexicon to check that the current input word has
  that category.
• Otherwise expand C using grammar rules and
  parse the rhs constituents.

Recoding the Grammar/Lexicon
% Grammar                       % Lexicon
rule(s,[np,vp]).                word(d,the).
rule(np,[d,n]).                 word(n,dog).
rule(vp,[v]).                   word(n,cat).
rule(vp,[v,np]).                word(n,dogs).
                                word(n,cats).
                                word(v,chase).
                                word(v,chases).

                Top Down Parser
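% parse(+C,+S1,-S): C is a pre-terminal matching the next word.
parse(C,[Word|S],S) :-
  word(C,Word).

% Otherwise expand C with a grammar rule and parse its daughters.
parse(C,S1,S) :-
  rule(C,Cs),
  parse_list(Cs,S1,S).

% parse_list(+Cs,+S1,-S): parse the categories Cs in sequence.
parse_list([],S,S).
parse_list([C|Cs],S1,S) :-
  parse(C,S1,S2),
  parse_list(Cs,S2,S).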
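
The recognizer above only answers yes or no. As a hedged illustration (not part of the lecture code), one extra argument lets the same top-down strategy return the tree it found. The four-argument parse/4 and parse_list/4 are hypothetical names, and only part of the lexicon is repeated to keep the sketch self-contained.

```prolog
% Grammar and part of the lexicon, in the recoded rule/2, word/2 form.
rule(s,  [np, vp]).
rule(np, [d, n]).
rule(vp, [v]).
rule(vp, [v, np]).

word(d, the).
word(n, dog).
word(n, cat).
word(v, chases).

% parse(+C, +S1, -S, -Tree): like parse/3, plus the tree built for C.
parse(C, [Word|S], S, Tree) :-
    word(C, Word),
    Tree =.. [C, Word].            % leaf, e.g. n(dog)
parse(C, S1, S, Tree) :-
    rule(C, Cs),
    parse_list(Cs, S1, S, Trees),
    Tree =.. [C|Trees].            % mother node over its daughters

% parse_list(+Cs, +S1, -S, -Trees): one tree per category in Cs.
parse_list([], S, S, []).
parse_list([C|Cs], S1, S, [T|Ts]) :-
    parse(C, S1, S2, T),
    parse_list(Cs, S2, S, Ts).
```

For example, ?- parse(s, [the,dog,chases,the,cat], [], T). gives T = s(np(d(the),n(dog)),vp(v(chases),np(d(the),n(cat)))). The rule vp → v is tried first and rejected on backtracking because it leaves input unconsumed.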
           Shift/Reduce Algorithm
• Two data structures
    – input string
    – stack
• Repeat until input is exhausted
    – Shift word to stack
    – Reduce stack using grammar and lexicon until no
      further reductions
• Unlike top down, algorithm does not require
  category to be specified in advance. It simply
  finds all possible trees.
           Shift/Reduce Operation
Step            Action             Stack                 Input
0               (start)                                  the dog barked
1               shift              the                   dog barked
2               reduce             d                     dog barked
3               shift              dog d                 barked
4               reduce             n d                   barked
5               reduce             np                    barked
6               shift              barked np
7               reduce             v np
8               reduce             vp np
9               reduce             s
       Shift/Reduce Implementation
                 in Prolog
parse(S,Res) :-
  sr(S,[],Res).

sr(S,Stk,Res) :-
  shift(Stk,S,NewStk,S1),
  reduce(NewStk,RedStk),
  sr(S1,RedStk,Res).
sr([],Res,Res).

shift(X,[H|Y],[H|X],Y).

reduce(Stk,RedStk) :-
  brule(Stk,Stk2),
  reduce(Stk2,RedStk).
reduce(Stk,Stk).

% grammar
brule([vp,np|X],[s|X]).
brule([n,d|X],[np|X]).
brule([np,v|X],[vp|X]).
brule([v|X],[vp|X]).

% interface to lexicon
brule([Word|X],[C|X]) :-
  word(C,Word).

           Shift/Reduce Operation
• Words are shifted to the beginning of the stack, which
  ends up in reverse order.
• The reduce step is simplified if we also store the rules
  backward, so that the rule s → np vp is stored as the fact
   brule([vp,np|X],[s|X]).

• The term [a,b|X] matches any list whose first and second
  elements are a and b respectively.
• The first argument directly matches the stack to which
  this rule applies.
• The second argument is what the stack becomes after
  reduction.
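
To see the backward rules at work, the following sketch records each shift and reduce together with the stack it produces. It is a greedy simplification made for illustration: it always reduces when some brule applies and never backtracks over that choice, unlike a full shift/reduce parser. The predicate steps/3 is a hypothetical name, and the entry word(v,barked) is an assumed lexicon entry added so that the sentence "the dog barked" can be processed.

```prolog
% Lexicon (barked is an assumed entry, added for this example).
word(d, the).
word(n, dog).
word(v, barked).

% Backward rules: first argument matches the stack, second is the result.
brule([vp,np|X], [s|X]).
brule([n,d|X],   [np|X]).
brule([np,v|X],  [vp|X]).
brule([v|X],     [vp|X]).
brule([Word|X],  [C|X]) :- word(C, Word).   % lexical reduction

% steps(+Input, +Stack, -Steps): list of Action-StackAfter pairs.
steps([], Stk, []) :-
    \+ brule(Stk, _).                 % nothing left to shift or reduce
steps(In, Stk, [reduce-Stk1|T]) :-
    brule(Stk, Stk1), !,              % greedy: commit to first reduction
    steps(In, Stk1, T).
steps([W|In], Stk, [shift-[W|Stk]|T]) :-
    steps(In, [W|Stk], T).
```

The query ?- steps([the,dog,barked], [], Steps). yields nine entries, ending in reduce-[s], with the top of the stack written first in each list.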

                Left Corner Parsing
• Key Idea: accept a word, identify the constituent
  it marks the beginning of, and parse the rest of
  the constituent top down.
• Main Advantages:
    – Like a bottom-up parser, can handle left recursion
      without looping, since it starts each constituent by
      accepting a word from the input string.
    – Like a top-down parser, is always expecting a
      particular category for which only a few of the
      grammar rules are relevant. It is therefore more
      efficient than a plain shift-reduce algorithm.

                Left Corner Algorithm
To parse a constituent of type C:
1. Accept a word W from input and determine K,
   its category.
2. Complete C:
    – If K=C, exit with success; otherwise
    – Find a constituent whose expansion begins with K.
      Call that CC. For instance, if K=d (determiner), CC
      could be np, since we have rule(np,[d,n]).
    – Recursively left-corner parse all the remaining
      elements of the expansion of CC (in this case, [n]).
    – Put CC in place of K, and return to step 2.
      Left Corner Implementation
parse(C,[W|Rest],P) :-
 word(K,W),
 complete(K,C,Rest,P).

parse_list([],P,P).
parse_list([C|Cs],P1,P) :-
 parse(C,P1,P2),
 parse_list(Cs,P2,P).

complete(C,C,P,P).                % if K=C, do nothing
complete(K,C,P1,P) :-
 rule(CC,[K|Rest]),
 parse_list(Rest,P1,P2),
 complete(CC,C,P2,P).
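
The claim that a left corner parser copes with left recursion can be tried out with a small self-contained sketch. The rule np → np conj np and the lexical entry for "and" are illustrative assumptions borrowed from the kind of grammar used for BUP; the parser clauses repeat the left corner implementation so that the block runs on its own.

```prolog
% Grammar with one left-recursive rule (assumed for illustration).
rule(s,  [np, vp]).
rule(np, [d, n]).
rule(np, [np, conj, np]).
rule(vp, [v]).
rule(vp, [v, np]).

word(d, the).
word(conj, and).
word(n, dog).
word(n, cat).
word(v, chases).

% Left corner parser: every call to parse/3 first consumes a word.
parse(C, [W|Rest], P) :-
    word(K, W),
    complete(K, C, Rest, P).

parse_list([], P, P).
parse_list([C|Cs], P1, P) :-
    parse(C, P1, P2),
    parse_list(Cs, P2, P).

complete(C, C, P, P).              % K is already the category wanted
complete(K, C, P1, P) :-
    rule(CC, [K|Rest]),            % K is the left corner of some CC
    parse_list(Rest, P1, P2),
    complete(CC, C, P2, P).
```

Both ?- parse(np, [the,dog,and,the,cat], []). and ?- parse(s, [the,dog,and,the,cat,chases,the,dog], []). succeed. The left-recursive rule is only selected once an np has actually been found, so it is never expanded without consuming input.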




                   Trace of Left Corner
Call:   (    7)   parse(np, [the, cat], []) ? creep
Call:   (    8)   word(_L128, the) ? creep
Exit:   (    8)   word(d, the) ? creep
Call:   (    8)   complete(d, np, [cat], []) ? creep
Call:   (    9)   rule(_L153, [d|_G306]) ? creep
Exit:   (    9)   rule(np, [d, n]) ? creep
Call:   (    9)   parse_list([n], [cat], _L155) ? creep
Call:   (   10)   parse(n, [cat], _L181) ? creep
Call:   (   11)   word(_L196, cat) ? creep
Exit:   (   11)   word(n, cat) ? creep
Call:   (   11)   complete(n, n, [], _L181) ? creep
Exit:   (   11)   complete(n, n, [], []) ? creep
Exit:   (   10)   parse(n, [cat], []) ? creep
Call:   (   10)   parse_list([], [], _L155) ? creep
Exit:   (   10)   parse_list([], [], []) ? creep
Exit:   (    9)   parse_list([n], [cat], []) ? creep
Call:   (    9)   complete(np, np, [], []) ? creep
Exit:   (    9)   complete(np, np, [], []) ? creep
Exit:   (    8)   complete(d, np, [cat], []) ? creep
Exit:   (    7)   parse(np, [the, cat], []) ? creep


                BUP: Bottom Up Parser
                (Matsumoto et al. 1983)
• Each PS rule goes into Prolog as a clause whose head
  is not the mother node but the leftmost daughter. The
  rule np → d n pp is translated as:
   d(C,S1,S) :-
     parse(n,S1,S2),
     parse(pp,S2,S3),
     np(C,S3,S).
• i.e. if you have just completed a d, parse an n, then a pp,
  then call the procedure for a completed np.
• In addition to a clause for each PS rule, BUP needs a
  terminating clause for every kind of constituent, e.g.
   np(np,S,S).
• i.e. if you have just accepted an np and np is what you
  are looking for, you are done.
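
The translation scheme is mechanical, so it can itself be sketched in Prolog. The following is an assumption-laden illustration, not the BUP system's own translator: bup_clause/2, bup_body/6 and bup_terminator/2 are hypothetical names, and rules are assumed to be given as rule(Mother, Daughters) terms.

```prolog
% bup_clause(+Rule, -Clause): turn rule(Mother, [Left|Rest]) into
%   Left(C,S1,S) :- parse(D1,S1,S2), ..., Mother(C,Sn,S).
bup_clause(rule(Mother, [Left|Rest]), (Head :- Body)) :-
    Head =.. [Left, C, S1, S],
    bup_body(Rest, Mother, C, S1, S, Body).

% bup_body(+Daughters, +Mother, C, S1, S, -Body): one parse/3 goal
% per remaining daughter, then the call for the completed mother.
bup_body([], Mother, C, S1, S, Goal) :-
    Goal =.. [Mother, C, S1, S].
bup_body([D|Ds], Mother, C, S1, S, (parse(D, S1, S2), Body)) :-
    bup_body(Ds, Mother, C, S2, S, Body).

% bup_terminator(+Cat, -Fact): the terminating clause Cat(Cat,S,S).
bup_terminator(Cat, Fact) :-
    Fact =.. [Cat, Cat, S, S].
```

Calling ?- bup_clause(rule(np,[d,n,pp]), Clause). builds a clause of the same shape as the hand translation of np → d n pp, and ?- bup_terminator(np, Fact). builds a fact of the form np(np,S,S).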

                BUP - Remarks
• BUP is efficient because the hard part of
  the search – what to do with a newly
  completed leftmost daughter – is handled
  by Prolog’s fastest search mechanism –
  finding a clause given the predicate.




   BUP Implementation - Parser
% parse(+C,+S1,-S)
% Parse a constituent of category C
% starting with input string S1 and
% ending up with input string S.

parse(C,S1,S) :-
  word(W,S1,S2),
  P =.. [W,C,S2,S],
  call(P).




     BUP Implementation - Rules
% PS-rules and terminating clauses

np(C,S1,S) :- parse(vp,S1,S2), s(C,S2,S). % S --> NP VP
np(C,S1,S) :- parse(conj,S1,S2),
              parse(np,S2,S3), np(C,S3,S). % NP --> NP Conj NP
np(np,X,X).

d(C,S1,S) :- parse(n,S1,S2), np(C,S2,S).      % NP --> D N
d(d,X,X).

v(C,S1,S) :- parse(np,S1,S2), vp(C,S2,S). % VP --> V NP
v(C,S1,S) :- parse(np,S1,S2), parse(pp,S2,S3),
             vp(C,S3,S).                   % VP --> V NP PP
v(v,X,X).

p(C,S1,S) :- parse(np,S1,S2), pp(C,S2,S).     % PP --> P NP
p(p,X,X).

% Terminating clauses for all other categories
s(s,X,X). vp(vp,X,X). pp(pp,X,X). n(n,X,X). conj(conj,X,X).


  BUP Implementation - Lexicon
% Lexicon

word(conj,[and|X],X).
word(p,[near|X],X).
word(d,[the|X],X).

word(n,[dog|X],X).           word(n,[dogs|X],X).
word(n,[cat|X],X).           word(n,[cats|X],X).
word(n,[elephant|X],X).      word(n,[elephants|X],X).

word(v,[chase|X],X).         word(v,[chases|X],X).
word(v,[see|X],X).           word(v,[sees|X],X).
word(v,[amuse|X],X).         word(v,[amuses|X],X).


                Principles for success
• Left recursive structures must be found, not
  predicted
• Empty categories must be predicted, not found
• An alternative way to fix things is to transform the
  grammar into an equivalent grammar.
    – Grammar transformations can fix both left-recursion
      and epsilon productions
    – But then you parse the same language but with
      different trees
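
As an illustration of such a transformation, the left-recursive pair nom → nom pp and nom → noun can be rewritten in right-recursive form. The sketch below uses Prolog's built-in DCG notation; the helper category nom_rest is a hypothetical name introduced by the transformation, and the small lexicon is assumed for the example.

```prolog
% Left-recursive original (would loop under top-down parsing):
%   nom --> nom, pp.
%   nom --> noun.
% Right-recursive equivalent using a new category nom_rest:
nom      --> noun, nom_rest.
nom_rest --> [].                 % epsilon production, predicted top down
nom_rest --> pp, nom_rest.

pp --> prep, np.
np --> det, nom.

% Assumed lexicon for the example.
noun --> [dog].
noun --> [cat].
prep --> [near].
det  --> [the].
```

The query ?- phrase(nom, [dog,near,the,cat]). succeeds without looping. The strings accepted are the same as for the original rules, but the pp now hangs under nom_rest rather than under a nom that dominates the head noun, which is the change in tree shape noted above.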

