# Parsing Algorithms


```
CSA3050: NLP Algorithms

Sentence Parsing 2
• Top Down
• Bottom-Up
• Left Corner
• BUP
• Implementation in Prolog

November 2004          csa3050: Sentence Parsing II   1
Sources
• Jurafsky & Martin Chapter 10
• Covington Chapter 6

Derivation
[Figure: a derivation built top down, left-to-right, depth first]

Bottom Up Filtering
• We know the current input word must
serve as the first word in the derivation of
the unexpanded node the parser is
currently processing.
• Therefore the parser should not consider
any grammar rule for which the current word
cannot serve as the "left corner".
• The left corner is the first preterminal
node along the left edge of a derivation.

Left Corner
[Figure: parse trees; the node marked Verb is a left corner of VP]

Left Corner
• B is a left corner of A iff
A ⇒* Bα
for non-terminal A, pre-terminal B and
symbol string α.
• Possible left corners of all non-terminal
categories can be determined in advance
and placed in a table.

Left Corner (Operational Definition)
• A nonterminal B is a left corner of a
nonterminal A if:
– A = B (reflexive case);
– there exists a rule A → Bα for non-terminal A,
pre-terminal B and symbol string α
(immediate case);
– there exists a category C and a rule A → Cα such
that B is a left corner of C (transitive case).

DCG-style Grammar/Lexicon
s        -->    np, vp.
s        -->    aux, np, vp.
s        -->    vp.
np       -->    det, nom.
nom      -->    noun.
nom      -->    noun, nom.
nom      -->    nom, pp.
pp       -->    prep, np.
np       -->    pn.
vp       -->    v.
vp       -->    v, np.

What are the left corners of S?

Example of Left Corner Table

Category                   Left Corners
S               Det, Proper-Noun, Aux, Verb
NP              Det, Proper-Noun
Nominal         Noun
VP              Verb

How to use the Left Corner Table
• If attempting to parse category A,
only consider rules A → Bα
for which
category(current input) ∈ LeftCorners(B)

S → NP VP
S → Aux NP VP
S → VP

Prolog Implementations
•   Top Down: depth first recursive descent
•   Bottom Up: shift/reduce
•   Left Corner
•   BUP

Top Down Implementation
in Prolog
• The parser takes the form of a predicate parse(C,S1,S):
parse a constituent C starting with input string
S1 and ending with input string S.
?- parse(s,[the,dog,barked],[]).
• If C is a pre-terminal category, use the
lexicon to check that the current input word has
that category.
• Otherwise expand C using the grammar rules and
parse the rhs constituents.

Recoding the Grammar/Lexicon
% Grammar
rule(s,[np,vp]).
rule(np,[d,n]).
rule(vp,[v]).
rule(vp,[v,np]).

% Lexicon
word(d,the).
word(n,dog).
word(n,cat).
word(n,dogs).
word(n,cats).
word(v,chase).
word(v,chases).

Top Down Parser
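% A pre-terminal C is parsed by consuming one word of category C.
parse(C,[Word|S],S) :-
    word(C,Word).

% Otherwise expand C with a grammar rule and parse the daughters.
parse(C,S1,S) :-
    rule(C,Cs),
    parse_list(Cs,S1,S).

% Parse a list of constituents in sequence.
parse_list([],S,S).
parse_list([C|Cs],S1,S) :-
    parse(C,S1,S2),
    parse_list(Cs,S2,S).

Shift/Reduce Algorithm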
• Two data structures
– input string
– stack
• Repeat until input is exhausted
– Shift word to stack
– Reduce stack using grammar and lexicon until no
further reductions
• Unlike top down, the algorithm does not require the
category to be specified in advance. It simply
finds all possible trees.

Shift/Reduce Operation
Step   Action    Stack        Input
0      (start)                the dog barked
1      shift     the          dog barked
2      reduce    d            dog barked
3      shift     dog d        barked
4      reduce    n d          barked
5      reduce    np           barked
6      shift     barked np
7      reduce    v np
8      reduce    vp np
9      reduce    s

Shift/Reduce Implementation in Prolog
parse(S,Res) :-
    sr(S,[],Res).

sr(S,Stk,Res) :-
    shift(Stk,S,NewStk,S1),
    reduce(NewStk,RedStk),
    sr(S1,RedStk,Res).
sr([],Res,Res).

shift(X,[H|Y],[H|X],Y).

reduce(Stk,RedStk) :-
    brule(Stk,Stk2),
    reduce(Stk2,RedStk).
reduce(Stk,Stk).

%grammar
brule([vp,np|X],[s|X]).
brule([n,d|X],[np|X]).
brule([np,v|X],[vp|X]).
brule([v|X],[vp|X]).      % vp --> v, as used in step 8 of the trace

%interface to lexicon
brule([Word|X],[C|X]) :-
    word(C,Word).

Shift/Reduce Operation
• Words are shifted to the beginning of the stack, which
ends up in reverse order.
• The reduce step is simplified if we also store the rules
backward, so that the rule s → np vp is stored as the fact
brule([vp,np|X],[s|X]).

• The term [a,b|X] matches any list whose first and second
elements are a and b respectively.
• The first argument directly matches the stack to which
this rule applies.
• The second argument is what the stack becomes after
reduction.

Left Corner Parsing
• Key Idea: accept a word, identify the constituent
it marks the beginning of, and parse the rest of
the constituent top down.
– Like a bottom-up parser, can handle left recursion
without looping, since it starts each constituent by
accepting a word from the input string.
– Like a top-down parser, is always expecting a
particular category for which only a few of the
grammar rules are relevant. It is therefore more
efficient than a plain shift-reduce algorithm.

Left Corner Algorithm
To parse a constituent of type C:
1. Accept a word W from input and determine K,
its category.
2. Complete C:
– If K=C, exit with success; otherwise:
– Find a constituent whose expansion begins with K.
Call that CC. For instance, if K=d (determiner), CC
could be np, since we have rule(np,[d,n]).
– Recursively left-corner parse all the remaining
elements of the expansion of CC (in this case, [n]).
– Put CC in place of K, and return to step 2.

Left Corner Implementation
parse(C,[W|Rest],P) :-
    word(K,W),
    complete(K,C,Rest,P).

parse_list([],P,P).
parse_list([C|Cs],P1,P) :-
    parse(C,P1,P2),
    parse_list(Cs,P2,P).

complete(C,C,P,P).                % if K=C, do nothing
complete(K,C,P1,P) :-
    rule(CC,[K|Rest]),
    parse_list(Rest,P1,P2),
    complete(CC,C,P2,P).

Trace of Left Corner
Call:   (    7)   parse(np, [the, cat], []) ? creep
Call:   (    8)   word(_L128, the) ? creep
Exit:   (    8)   word(d, the) ? creep
Call:   (    8)   complete(d, np, [cat], []) ? creep
Call:   (    9)   rule(_L153, [d|_G306]) ? creep
Exit:   (    9)   rule(np, [d, n]) ? creep
Call:   (    9)   parse_list([n], [cat], _L155) ? creep
Call:   (   10)   parse(n, [cat], _L181) ? creep
Call:   (   11)   word(_L196, cat) ? creep
Exit:   (   11)   word(n, cat) ? creep
Call:   (   11)   complete(n, n, [], _L181) ? creep
Exit:   (   11)   complete(n, n, [], []) ? creep
Exit:   (   10)   parse(n, [cat], []) ? creep
Call:   (   10)   parse_list([], [], _L155) ? creep
Exit:   (   10)   parse_list([], [], []) ? creep
Exit:   (    9)   parse_list([n], [cat], []) ? creep
Call:   (    9)   complete(np, np, [], []) ? creep
Exit:   (    9)   complete(np, np, [], []) ? creep
Exit:   (    8)   complete(d, np, [cat], []) ? creep
Exit:   (    7)   parse(np, [the, cat], []) ? creep

BUP: Bottom Up Parser
(Matsumoto et al. 1983)
• Each PS rule goes into Prolog as a clause whose head
is not the mother node but the leftmost daughter. The
rule np → d n pp is translated as:
d(C,S1,S) :-
    parse(n,S1,S2),
    parse(pp,S2,S3),
    np(C,S3,S).
• i.e. if you have just completed a d, parse an n, then a pp,
then call the procedure for a completed np.
• In addition to a clause for each PS rule, BUP needs a
terminating clause for every kind of constituent, e.g.
np(np,S,S).
• i.e. if you have just accepted an np and np is what you
are looking for, you are done.

BUP - Remarks
• BUP is efficient because the hard part of
the search – what to do with a newly
completed leftmost daughter – is handled
by Prolog’s fastest search mechanism –
finding a clause given the predicate.

BUP Implementation - Parser
% parse(+C,+S1,-S)
% Parse a constituent of category C
% starting with input string S1 and
% ending up with input string S.

parse(C,S1,S) :-
    word(W,S1,S2),      % accept a word of category W
    P =.. [W,C,S2,S],   % build the goal W(C,S2,S)
    call(P).            % continue from the completed W

BUP Implementation - Rules
% PS-rules and terminating clauses

np(C,S1,S) :- parse(vp,S1,S2), s(C,S2,S).        % S --> NP VP
np(C,S1,S) :- parse(conj,S1,S2), parse(np,S2,S3),
              np(C,S3,S).                        % NP --> NP Conj NP
np(np,X,X).

d(C,S1,S) :- parse(n,S1,S2), np(C,S2,S).         % NP --> D N
d(d,X,X).

v(C,S1,S) :- parse(np,S1,S2), vp(C,S2,S).        % VP --> V NP
v(C,S1,S) :- parse(np,S1,S2), parse(pp,S2,S3),
             vp(C,S3,S).                         % VP --> V NP PP
v(v,X,X).

p(C,S1,S) :- parse(np,S1,S2), pp(C,S2,S).        % PP --> P NP
p(p,X,X).

% Terminating clauses for all other categories
s(s,X,X). vp(vp,X,X). pp(pp,X,X). n(n,X,X). conj(conj,X,X).

BUP Implementation - Lexicon
% Lexicon

word(conj,[and|X],X).
word(p,[near|X],X).
word(d,[the|X],X).

word(n,[dog|X],X).           word(n,[dogs|X],X).
word(n,[cat|X],X).           word(n,[cats|X],X).
word(n,[elephant|X],X).      word(n,[elephants|X],X).

word(v,[chase|X],X).         word(v,[chases|X],X).
word(v,[see|X],X).           word(v,[sees|X],X).
word(v,[amuse|X],X).         word(v,[amuses|X],X).

Principles for success
• Left recursive structures must be found, not
predicted
• An alternative way to fix things is to transform the
grammar into an equivalent grammar.
– Grammar transformations can fix both left-recursion
and epsilon productions
– But then you parse the same language but with
different trees


```
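
The slides note that the left corners of every category can be worked out in advance and stored in a table. The following is a minimal sketch of that computation, not code from the course. It assumes the DCG-style grammar is recoded as `rule/2` facts, that a category with no rule is a pre-terminal, and that only pre-terminal left corners are wanted (as in the example table). The predicate names `first_daughter/2`, `preterminal/1`, `lc/3`, `left_corner/2` and `left_corners/2` are hypothetical.

```prolog
:- use_module(library(lists)).

% Grammar from the DCG-style slide, recoded as rule/2 facts (assumed encoding).
rule(s,   [np, vp]).
rule(s,   [aux, np, vp]).
rule(s,   [vp]).
rule(np,  [det, nom]).
rule(np,  [pn]).
rule(nom, [noun]).
rule(nom, [noun, nom]).
rule(nom, [nom, pp]).        % left-recursive rule
rule(pp,  [prep, np]).
rule(vp,  [v]).
rule(vp,  [v, np]).

% Assumption: a category that no rule expands is a pre-terminal.
preterminal(C) :- \+ rule(C, _).

% first_daughter(A, B): some rule for A has B as its leftmost daughter.
first_daughter(A, B) :- rule(A, [B|_]).

% left_corner(+A, -B): pre-terminal B is a left corner of category A.
left_corner(A, B) :- lc(A, B, [A]).

% Immediate case: B is the leftmost daughter of a rule for A.
lc(A, B, _) :-
    first_daughter(A, B),
    preterminal(B).
% Transitive case: descend into a non-terminal leftmost daughter C.
% Seen holds the categories on the current path, so the left-recursive
% rule nom --> nom, pp cannot cause a loop.
lc(A, B, Seen) :-
    first_daughter(A, C),
    \+ preterminal(C),
    \+ memberchk(C, Seen),
    lc(C, B, [C|Seen]).

% left_corners(+A, -Bs): sorted, duplicate-free list of left corners of A.
left_corners(A, Bs) :- setof(B, left_corner(A, B), Bs).
```

With this grammar, `?- left_corners(s, Cs).` gives `Cs = [aux, det, pn, v]`, which agrees with the S row of the example table. The reflexive case of the operational definition is left out here because the table lists pre-terminals only.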
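
Bottom-up filtering can be added to the top-down parser from the slides with one extra test before a rule is expanded. The sketch below is an illustration under stated assumptions rather than the course's own implementation: the table is written out by hand as `lc_table/2` facts (a hypothetical name), it includes the reflexive entries for pre-terminals, and the small grammar and lexicon are chosen only for the example. `viable/2` is likewise a hypothetical helper.

```prolog
% Grammar and lexicon in the rule/2 and word/2 format of the slides.
rule(s,  [np, vp]).
rule(s,  [aux, np, vp]).
rule(s,  [vp]).
rule(np, [d, n]).
rule(np, [pn]).
rule(vp, [v]).
rule(vp, [v, np]).

word(d, the).      word(n, dog).     word(n, cat).
word(pn, fido).    word(aux, does).
word(v, chase).    word(v, chases).

% lc_table(Category, LeftCorner): hand-written left-corner table (assumed).
lc_table(s, d).    lc_table(s, pn).   lc_table(s, aux).  lc_table(s, v).
lc_table(np, d).   lc_table(np, pn).
lc_table(vp, v).
% Reflexive entries: a pre-terminal is a left corner of itself.
lc_table(d, d).    lc_table(n, n).    lc_table(pn, pn).
lc_table(aux, aux). lc_table(v, v).

% viable(+B, +Word): Word has some category listed as a left corner of B.
viable(B, Word) :-
    word(K, Word),
    lc_table(B, K),
    !.

% Pre-terminal case, unchanged from the plain top-down parser.
parse(C, [Word|S], S) :-
    word(C, Word).
% Expansion case: only try a rule C --> B ... if the current word
% can begin a B, i.e. its category is in LeftCorners(B).
parse(C, [Word|S1], S) :-
    rule(C, [B|Bs]),
    viable(B, Word),
    parse_list([B|Bs], [Word|S1], S).

parse_list([], S, S).
parse_list([C|Cs], S1, S) :-
    parse(C, S1, S2),
    parse_list(Cs, S2, S).
```

For example, `?- parse(s, [does,fido,chase,the,cat], []).` succeeds, and the rules `s --> np vp` and `s --> vp` are rejected at once because `aux` is not a left corner of `np` or `vp`. The filter only prunes rules; it does not by itself make a top-down parser safe for left-recursive grammars.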