

PARSING
Analyzing Linguistic Units

Task                              Formal Mechanism                   Resulting Representation

Morphology: analyze words         Context-dependency rules;          Morphological structure
into morphemes                    FST composition

Phonology: analyze words          Context-dependency rules;          Phonemic structure
into phonemes                     FST composition

Syntax: analyze sentences         Grammars (CFGs); PDA;              Parse tree, derivation tree
for syntactic relations           top-down, bottom-up,
between words                     Earley, and CKY parsing


 • Why should we parse a sentence?
      – To detect relations among words
      – To normalize surface syntactic variations
      – Invaluable for a number of NLP applications

Some Concepts

Grammar: A generative device that prescribes a set of valid strings.
Parser: A device that uncovers the sequence of grammar rules that
might have generated the input sentence.
•   Input: Grammar, Sentence
•   Output: parse tree, derivation tree
Recognizer: A device that returns a “yes” if the input string could be
generated by the grammar.
•   Input: Grammar, Sentence
•   Output: boolean
Searching for a Parse

Grammar + rewrite procedure encodes
•   all strings generated by the grammar L(G)
•   all parse trees for every string s it generates: T(G) = ∪s Ts(G)
Given an input sentence I, its set of parse trees is TI(G).
Parsing is searching for TI(G) ⊆ T(G)
Ideally, parser finds the appropriate parse for the sentence.
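To make the grammar/parser/recognizer distinction concrete, here is a minimal sketch of a CFG as a Python data structure; it encodes the toy English fragment shown on the next slide, and all names are illustrative.

    # A CFG as a mapping: non-terminal -> list of possible right-hand sides.
    # Symbols that appear as keys are non-terminals; everything else (the
    # words) is terminal.
    GRAMMAR = {
        "S":     [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
        "NP":    [["Det", "Nom"], ["PropN"]],
        "Nom":   [["N"], ["N", "Nom"], ["Nom", "PP"]],
        "VP":    [["V"], ["V", "NP"]],
        "PP":    [["Prep", "NP"]],
        "Det":   [["that"], ["this"], ["a"]],
        "N":     [["book"], ["flight"], ["meal"], ["money"]],
        "V":     [["book"], ["include"], ["prefer"]],
        "Aux":   [["does"]],
        "Prep":  [["from"], ["to"], ["on"]],
        "PropN": [["Houston"], ["TWA"]],
    }

    def is_nonterminal(symbol):
        return symbol in GRAMMAR

A recognizer over this structure returns only a boolean; a parser additionally records which rules fired. The recognizer sketches later in these notes assume this GRAMMAR dict.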
CFG for Fragment of English

 S → NP VP          Det → that | this | a
 S → Aux NP VP      N → book | flight | meal | money
 S → VP             V → book | include | prefer
 NP → Det Nom       Aux → does
 NP → PropN         Prep → from | to | on
 Nom → N            PropN → Houston | TWA
 Nom → N Nom        PP → Prep NP
 Nom → Nom PP
 VP → V
 VP → V NP

 Parse tree for "Book that flight" (bracketed form of the tree shown on the slide):
 (S (VP (V Book) (NP (Det that) (Nom (N flight)))))
Top-down/Bottom-up Parsing

              Top-down (recursive descent parser)          Bottom-up (shift-reduce parser)

Starts from   S (the goal)                                 Words (the input)

Algorithm     a. Pick non-terminals                        a. Match a sequence of input symbols
(parallel)    b. Pick rules from the grammar to               against the RHS of some rule
                 expand the non-terminals                  b. Replace the sequence by the LHS of
                                                              the matching rule

Termination   Success: the leaves of a tree match          Success: "S" is reached
              the input                                    Failure: no more rewrites possible
              Failure: no more non-terminals to
              expand in any of the trees

Pros/Cons     Pro: goal-driven, starts with "S"            Pro: constrained by the input string
              Con: constructs trees that may not           Con: constructs constituents that may
              match the input                              not lead to the goal "S"


 • Control strategy -- how to explore search space?
     • Pursuing all parses in parallel or backtrack or …?
     • Which rule to apply next?
     • Which node to expand next?
     • Work through how top-down and bottom-up parsing proceed on the
     board for "Book that flight"
Top-down, Depth-First, Left-to-Right Parser


Systematic, incremental expansion of the search space.
•   In contrast to a parallel parser
Start State: (•S, 0)
End State: (•, n), where n is the length of the input to be parsed
Next State Rules
•   (• wj+1 β, j) → (• β, j+1)
•   (• B β, j) → (• γ β, j) if B → γ (note B is the left-most non-terminal)
Agenda: a data structure that keeps track of the states to be expanded.
Expansion is depth-first if the Agenda is a stack (a sketch follows below).
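A minimal sketch of this parser as a recognizer, assuming the GRAMMAR dict from earlier; a state (•α, j) is a tuple of the symbols still to be derived plus the count of words consumed, and popping from a Python list gives the stack (depth-first) behavior. The length-based prune is an addition of mine, valid only because this grammar has no ε-rules.

    def topdown_recognize(words, grammar, start="S"):
        """Top-down, depth-first, left-to-right recognizer."""
        agenda = [((start,), 0)]               # start state (•S, 0)
        while agenda:
            symbols, j = agenda.pop()          # stack pop => depth-first
            if len(symbols) > len(words) - j:  # prune: with no ε-rules, each
                continue                       # remaining symbol covers >= 1 word
            if not symbols:
                if j == len(words):            # end state (•, n): success
                    return True
                continue
            first, rest = symbols[0], symbols[1:]
            if first in grammar:               # expand the left-most non-terminal
                for rhs in reversed(grammar[first]):   # try rules in listed order
                    agenda.append((tuple(rhs) + rest, j))
            elif j < len(words) and first == words[j]:
                agenda.append((rest, j + 1))   # terminal matches the next word
        return False

    topdown_recognize("book that flight".split(), GRAMMAR)   # True

Without the prune, the left-recursive rule Nom → Nom PP would make this search loop forever on some inputs; exactly the limitation discussed a few slides below.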
[Fig 10.7: the top-down search space for the example CFG]
Left Corners
• Can we help top-down parsers with some bottom-up information?
   – Unnecessary states are created if there are many B → γ rules.
   – If after successive expansions B ⇒* w δ, and w does not match the
     input, then the whole series of expansions is wasted.
• The leftmost symbol derivable from B needs to match the current input word.
   – Look ahead to the left corner of the tree:
       • B is a left corner of A if A ⇒* B γ
       • Build a table of the left corners of all non-terminals in the grammar
         and consult it before applying a rule (a sketch follows below)

                                                 Category   Left Corners
                                                 S          Det, PropN, Aux, V
                                                 NP         Det, PropN
                                                 Nom        N
                                                 VP         V

• At a given point in state expansion, (• B β, j):
   – Pick the rule B → C γ only if a left corner of C matches the input wj+1
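One possible way to build the left-corner table is a fixed-point iteration over the rules; a sketch, again assuming the GRAMMAR dict from earlier:

    def left_corner_table(grammar):
        """For each non-terminal A, collect every symbol B with A =>* B gamma."""
        table = {nt: set() for nt in grammar}
        changed = True
        while changed:                          # iterate to a fixed point
            changed = False
            for lhs, rules in grammar.items():
                for rhs in rules:
                    first = rhs[0]              # the left-most RHS symbol
                    new = {first} | table.get(first, set())
                    if not new <= table[lhs]:   # anything genuinely new?
                        table[lhs] |= new
                        changed = True
        return table

Restricting each entry to the preterminal categories (Det, PropN, Aux, V, N) reproduces the table above; the parser then applies B → C γ at (• B β, j) only when the category of wj+1 is among the left corners of C.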
Limitation of Top-down Parsing: Left Recursion


Depth-first search will never terminate if the grammar is left-recursive
(e.g. NP → NP PP): the left-most non-terminal keeps being re-expanded
without consuming any input,

        (•NP β, j) → (•NP PP β, j) → (•NP PP PP β, j) → …
Solutions:
• Rewrite the grammar to a weakly equivalent one which is not left-recursive
  (a mechanical transformation; see the sketch after this list):

    NP → NP PP                                  NP  → Nom NP'
    NP → Nom PP                                 NP' → PP NP'
    NP → Nom                                    NP' → ε

    – This may make the rules unnatural

•   Fix the depth of the search explicitly
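The rewrite in the first bullet is mechanical for direct left recursion; a sketch (the function name and the empty-list-as-ε convention are mine):

    def remove_direct_left_recursion(nt, rules):
        """Turn A -> A a1 | ... | b1 | ...  into the weakly equivalent
           A -> b1 A' | ... ;  A' -> a1 A' | ... | epsilon."""
        rec  = [rhs[1:] for rhs in rules if rhs[0] == nt]   # the "A alpha" rules
        base = [rhs for rhs in rules if rhs[0] != nt]       # the "beta" rules
        if not rec:
            return {nt: rules}                              # nothing to rewrite
        new = nt + "'"
        return {
            nt:  [rhs + [new] for rhs in base],
            new: [alpha + [new] for alpha in rec] + [[]],   # [] stands for epsilon
        }

    # remove_direct_left_recursion("NP", [["NP", "PP"], ["Nom", "PP"], ["Nom"]])
    # => {"NP": [["Nom", "PP", "NP'"], ["Nom", "NP'"]],
    #     "NP'": [["PP", "NP'"], []]}

The slide's version further merges the two resulting NP rules, since NP' already generates any number of trailing PPs.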
Other book-keeping needed in top-down parsing
•   Memoization for reusing previously parsed substrings
•   Packed representation for parse ambiguity
    Dynamic Programming for Parsing

Memoization:
•   Create table of solutions to sub-problems (e.g. subtrees) as parse proceeds
•   Look up subtrees for each constituent rather than re-parsing
•   Since all parses implicitly stored, all available for later disambiguation
Examples: Cocke-Younger-Kasami (CYK) (1960), Graham-Harrison-Ruzzo (GHR)
(1980) and Earley (1970) algorithms
Earley parser: an O(n³) parser
•   Top-down parser with bottom-up information
•   State: [i, A → α • β, j]
    –   j is the position in the string that has been parsed so far
    –   i is the position in the string where A begins
•   Top-down prediction: S ⇒* w1 … wi A γ
•   Bottom-up completion: α ⇒* wi+1 … wj
Earley Parser
Data Structure: an array of n+1 cells, called the Chart
•   For each word position, chart contains set of states representing all partial
    parse trees generated to date.
    –   E.g. chart[0] contains all partial parse trees generated at the beginning of the
        sentence

Chart entries represent three types of constituents:
•   predicted constituents (top-down predictions)
•   in-progress constituents (we’re in the midst of finding them)
•   completed constituents (we’ve found them)
Progress in the parse is represented by Dotted Rules
•   The position of • indicates the type of constituent
•   0 Book 1 that 2 flight 3
    (0, S → • VP, 0)          (predicting a VP)
    (1, NP → Det • Nom, 2)    (in the middle of finding an NP)
    (0, VP → V NP •, 3)       (found a VP)
Earley Parser: Parse Success

The final answer is found by looking at the last entry in the chart
•   If an entry resembles (0, S → α •, n), then the input was parsed
    successfully
But … note that chart will also contain a record of all possible
parses of input string, given the grammar -- not just the
successful one(s)
•   Why is this useful?
Earley Parsing Steps

Start State: (0, S' → • S, 0)
End State: (0, S' → S •, n), where n is the input size
Next State Rules
•   Scanner: read input
       (i, A → α • wj+1 β, j)  ⇒  (i, A → α wj+1 • β, j+1)
•   Predictor: add top-down predictions
       (i, A → α • B β, j)  ⇒  (j, B → • γ, j) if B → γ (B is the left-most non-terminal)
•   Completer: move the dot to the right when a new constituent is found
       (i, B → α • A β, k) + (k, A → γ •, j)  ⇒  (i, B → α A • β, j)
No backtracking and no states removed: keep the complete history of the parse
•   Why is this useful?
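A compact sketch of the recognizer version, assuming the GRAMMAR dict from earlier and a grammar with no ε-rules; a state (i, A → α • β, j) in chart[j] is stored as the tuple (A, rhs, dot, i). Turning it into a parser means adding the back-pointers described two slides below.

    def earley_recognize(words, grammar, start="S"):
        n = len(words)
        chart = [set() for _ in range(n + 1)]
        chart[0].add(("S'", (start,), 0, 0))        # dummy start state (0, S' -> •S, 0)
        for j in range(n + 1):
            agenda = list(chart[j])
            while agenda:
                lhs, rhs, dot, i = agenda.pop()
                if dot < len(rhs) and rhs[dot] in grammar:      # PREDICTOR
                    for prod in grammar[rhs[dot]]:
                        new = (rhs[dot], tuple(prod), 0, j)
                        if new not in chart[j]:
                            chart[j].add(new); agenda.append(new)
                elif dot < len(rhs):                            # SCANNER
                    if j < n and rhs[dot] == words[j]:
                        chart[j + 1].add((lhs, rhs, dot + 1, i))
                else:                                           # COMPLETER
                    for l2, r2, d2, i2 in list(chart[i]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            new = (l2, r2, d2 + 1, i2)
                            if new not in chart[j]:
                                chart[j].add(new); agenda.append(new)
        return ("S'", (start,), 1, 0) in chart[n]   # end state (0, S' -> S•, n)

    earley_recognize("book that flight".split(), GRAMMAR)   # True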
Earley Parser Steps

                   Scanner                   Predictor                  Completer

When does it       Applied when a terminal   Applied when a non-        Applied when the dot
apply?             is to the right of the    terminal is to the right   reaches the end of a
                   dot                       of the dot                 rule
                   (0, VP → • V NP, 0)       (0, S → • VP, 0)           (1, NP → Det Nom •, 3)

Which chart cell   New states are added      New states are added       New states are added
is affected?       to the next cell          to the current cell        to the current cell

What goes into     Move the dot over the     One new state for each     One state for each rule
the chart cell     terminal:                 expansion of the non-      “waiting” for the
                   (0, VP → V • NP, 1)       terminal in the grammar:   constituent, such as
                                             (0, VP → • V, 0)           (0, VP → V • NP, 1)  ⇒
                                             (0, VP → • V NP, 0)        (0, VP → V NP •, 3)
Book that flight (Chart[0])

Seed the chart with top-down predictions for S from the grammar:

 γ  → • S            [0,0]   Dummy start state
 S  → • NP VP        [0,0]   Predictor
 S  → • Aux NP VP    [0,0]   Predictor
 S  → • VP           [0,0]   Predictor
 NP → • Det Nom      [0,0]   Predictor
 NP → • PropN        [0,0]   Predictor
 VP → • V            [0,0]   Predictor
 VP → • V NP         [0,0]   Predictor
CFG for Fragment of English


S → NP VP          Det → that | this | a
S → Aux NP VP      N → book | flight | meal | money
S → VP             V → book | include | prefer
NP → Det Nom       Aux → does
NP → PropN         Prep → from | to | on
Nom → N            PropN → Houston | TWA
Nom → N Nom        Nom → Nom PP
VP → V             PP → Prep NP
VP → V NP
Chart[1]

  V  → book •        [0,1]   Scanner
  VP → V •           [0,1]   Completer
  VP → V • NP        [0,1]   Completer
  S  → VP •          [0,1]   Completer
  NP → • Det Nom     [1,1]   Predictor
  NP → • PropN       [1,1]   Predictor

 V → book • is passed to the Completer, which finds the 2
 states in Chart[0] whose left corner is V (i.e. whose dot is just
 before V), adds them to Chart[1], and moves their dots to the right
Retrieving the parses

Augment the Completer to add pointers to the prior states it advances,
stored as a field in the current state
•   i.e., which states combined to arrive here?
•   Read the pointers back from the final state
What if the final cell does not have the final state? – Error handling.
•   Is it a total loss? No...
•   Chart contains every constituent and combination of constituents
    possible for the input given the grammar
•   Useful for partial parsing or shallow parsing used in information
    extraction
Alternative Control Strategies


Change Earley top-down strategy to bottom-up or ...
Change to best-first strategy based on the probabilities of
constituents
•   Compute and store probabilities of constituents in the chart
    as you parse
•   Then instead of expanding states in fixed order, allow
    probabilities to control order of expansion
Probabilistic and Lexicalized Parsing
    Probabilistic CFGs

Weighted CFGs
•   Attach weights to rules of CFG
•   Compute weights of derivations
•   Use weights to pick preferred parses
    –   Utility: pruning and ordering the search space, disambiguation,
        language models for ASR
Parsing with weighted grammars (like weighted FAs)
•   T* = arg maxT W(T, S)
Probabilistic CFGs are one form of weighted CFGs.
Probability Model

• Rule Probability:
   – Attach probabilities to grammar rules
   – Expansions for a given non-terminal sum to 1
        R1: VP → V            .55
        R2: VP → V NP         .40
        R3: VP → V NP NP      .05
   – Estimate the probabilities from annotated corpora, e.g.
     P(R1) = count(R1) / count(VP)
• Derivation Probability:
   –   A derivation T = {R1 … Rn}
   –   Probability of a derivation:  P(T) = ∏i=1..n P(Ri)
   –   Most likely parse:  T* = arg maxT P(T)
   –   Probability of a sentence:  P(S) = ΣT P(T, S)
        • Sum over all possible derivations for the sentence
• Note the independence assumption: Parse probability does
  not change based on where the rule is expanded.
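A sketch of the relative-frequency estimate and the (log-space) derivation probability; the toy counts below are chosen to reproduce the .55/.40/.05 figures above and are illustrative only.

    import math
    from collections import Counter

    def estimate_rule_probs(rule_instances):
        """P(A -> rhs) = count(A -> rhs) / count(A), read off annotated trees."""
        rule_counts = Counter(rule_instances)
        lhs_counts = Counter(lhs for lhs, _ in rule_instances)
        return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

    def derivation_logprob(derivation, probs):
        """log P(T) = sum_i log P(R_i); logs avoid underflow on long derivations."""
        return sum(math.log(probs[rule]) for rule in derivation)

    rules = ([("VP", ("V",))] * 11 + [("VP", ("V", "NP"))] * 8
             + [("VP", ("V", "NP", "NP"))] * 1)
    probs = estimate_rule_probs(rules)
    # probs[("VP", ("V",))] == 0.55, ("VP", ("V","NP")) == 0.40, R3 == 0.05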
Structural ambiguity

•   S → NP VP                               • NP → John | Mary | Denver
•   VP → V NP                               • V → called
•   NP → NP PP                              • P → from
•   VP → VP PP
•   PP → P NP
                                       John called Mary from Denver

VP attachment: (S (NP John) (VP (VP (V called) (NP Mary)) (PP (P from) (NP Denver))))
NP attachment: (S (NP John) (VP (V called) (NP (NP Mary) (PP (P from) (NP Denver)))))
Cocke-Younger-Kasami (CKY) Parser

Bottom-up parser with top-down filtering
Start State(s): (A, i, i+1) for each A → wi+1
End State: (S, 0, n), where n is the input size
Next State Rule
•   (B, i, k), (C, k, j) → (A, i, j) if A → B C
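A sketch of the recognizer, with the grammar in Chomsky normal form split into binary and lexical rules; the rule lists are the structural-ambiguity grammar from the slide above.

    def cky_recognize(words, binary_rules, lexical_rules, start="S"):
        """chart[i][j] holds every category A such that A =>* words[i:j]."""
        n = len(words)
        chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):             # start states: (A, i, i+1) if A -> w
            chart[i][i + 1] = {a for a, word in lexical_rules if word == w}
        for span in range(2, n + 1):              # widen spans bottom-up
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):         # every split point
                    for a, b, c in binary_rules:  # (B,i,k) (C,k,j) => (A,i,j) if A -> B C
                        if b in chart[i][k] and c in chart[k][j]:
                            chart[i][j].add(a)
        return start in chart[0][n]               # end state (S, 0, n)

    BINARY  = [("S", "NP", "VP"), ("VP", "V", "NP"), ("VP", "VP", "PP"),
               ("NP", "NP", "PP"), ("PP", "P", "NP")]
    LEXICAL = [("NP", "John"), ("NP", "Mary"), ("NP", "Denver"),
               ("V", "called"), ("P", "from")]
    cky_recognize("John called Mary from Denver".split(), BINARY, LEXICAL)  # True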
Example

 0 John 1 called 2 Mary 3 from 4 Denver 5

Base Case: A → w

Fill the diagonal of the chart from the lexical rules:

 (NP, 0, 1) John   (V, 1, 2) called   (NP, 2, 3) Mary   (P, 3, 4) from   (NP, 4, 5) Denver
Recursive Case: A → B C

Working up from the diagonal, each cell (i, j) is filled by combining adjacent
completed constituents; cells marked X admit no analysis. The completed chart
(rows give the start position i, columns the end position j):

        j=1      j=2      j=3      j=4      j=5
 i=0    NP       X        S        X        S
 i=1             V        VP       X        VP1, VP2
 i=2                      NP       X        NP
 i=3                               P        PP
 i=4                                        NP
        John     called   Mary     from     Denver

Cell (1, 5) holds two VPs: VP1 = VP(1,3) + PP(3,5) via VP → VP PP, and
VP2 = V(1,2) + NP(2,5) via VP → V NP. Each combines with NP(0,1) by
S → NP VP, so cell (0, 5) contains an S for each of the two analyses:
exactly the structural ambiguity shown on the previous slide.
Probabilistic CKY

• Assign probabilities to constituents as they are completed and
  placed in the table
• Computing the probability:

      P(A, i, j) = Σ_{A → B C} P(A → B C, i, j)

      P(A → B C, i, j) = P(B, i, k) · P(C, k, j) · P(A → B C)

   – Since we are interested in the maximum P(S, 0, n):
       • Use the max probability for each constituent
• Maintain back-pointers to recover the parse (a sketch follows).
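A sketch extending the CKY recognizer above: each rule tuple now carries P(A → B C) (or P(A → w)), and only the max-probability analysis per (A, i, j) is kept, with back-pointers. The rule probabilities themselves are assumed given.

    def pcky(words, binary_rules, lexical_rules, start="S"):
        n = len(words)
        best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]  # (i,j) -> {A: prob}
        back = [[{} for _ in range(n + 1)] for _ in range(n + 1)]  # (i,j) -> {A: (k,B,C)}
        for i, w in enumerate(words):
            for a, word, p in lexical_rules:                 # (A, w, P(A -> w))
                if word == w:
                    best[i][i + 1][a] = p
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):
                    for a, b, c, p in binary_rules:          # (A, B, C, P(A -> B C))
                        if b in best[i][k] and c in best[k][j]:
                            q = best[i][k][b] * best[k][j][c] * p  # P(B,i,k)·P(C,k,j)·P(A→BC)
                            if q > best[i][j].get(a, 0.0):         # keep only the max
                                best[i][j][a] = q
                                back[i][j][a] = (k, b, c)
        return best[0][n].get(start, 0.0), back              # max P(S,0,n) + pointers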
    Problems with PCFGs

The probability model we’re using is just based on the rules in the derivation.
Lexical insensitivity:
•   Doesn’t use the words in any real way
•   Structural disambiguation is lexically driven
    –   PP attachment often depends on the verb, its object, and the preposition
    –   I ate pickles with a fork.
    –   I ate pickles with relish.

Context insensitivity of the derivation
•   Doesn’t take into account where in the derivation a rule is used
    –   Pronouns more often subjects than objects
    –   She hates Mary.
    –   Mary hates her.

Solution: Lexicalization
•   Add lexical information to each rule
An example of lexical information: Heads
Make use of notion of the head of a phrase
•   Head of an NP is a noun
•   Head of a VP is the main verb
•   Head of a PP is its preposition
Each LHS of a rule in the lexicalized PCFG carries a lexical item
Each RHS non-terminal carries a lexical item
•   One of the RHS lexical items (the head) is shared with the LHS
If R is the number of binary branching rules in the CFG and Σ is the
vocabulary, the lexicalized CFG has O(2·|Σ|·|R|) binary rules
Unary rules: O(|Σ|·|R|)
Example (correct parse): [lexicalized parse tree omitted]

Attribute grammar: [figure omitted]

Example (less preferred): [lexicalized parse tree omitted]
Computing Lexicalized Rule Probabilities



We started with rule probabilities
•   VP → V NP PP            P(rule | VP)
    –   E.g., the count of this rule divided by the number of VPs in a treebank
Now we want lexicalized probabilities
•   VP(dumped) → V(dumped) NP(sacks) PP(in)
•   P(rule | VP ∧ dumped is the verb ∧ sacks is the head of the NP ∧
    in is the head of the PP)
•   Not likely to have significant counts in any treebank
Another Example

Consider the VPs
•   ate spaghetti with gusto
•   ate spaghetti with marinara
The dependency that decides the attachment is not between a mother and its
child in a single rule:

VP attachment: (VP (VP (V ate) (NP spaghetti)) (PP with gusto))
NP attachment: (VP (V ate) (NP (NP spaghetti) (PP with marinara)))
 Log-linear models for Parsing

• Why restrict the conditioning to the elements of a rule?
   – Use an even larger context
   – Word sequence, word types, sub-tree context, etc.
• In general, compute P(y|x), where fi(x, y) tests a property of the
  context and λi is the weight of that feature:

      P(y | x) = exp(Σi λi · fi(x, y)) / Σ_{y'∈Y} exp(Σi λi · fi(x, y'))

• Use these as scores in the CKY algorithm to find the best-scoring
  parse (a sketch follows).
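A sketch of the conditional model; the features are arbitrary functions of the context x and candidate y, and all names here are illustrative.

    import math

    def loglinear_prob(x, y, candidates, features, weights):
        """P(y|x) = exp(sum_i w_i*f_i(x,y)) / sum_{y' in Y} exp(sum_i w_i*f_i(x,y'))."""
        def score(cand):
            return sum(w * f(x, cand) for f, w in zip(features, weights))
        z = sum(math.exp(score(c)) for c in candidates)   # normalize over all of Y
        return math.exp(score(y)) / z

Inside CKY, the per-constituent probabilities of the previous sketch would be replaced by these scores.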
Supertagging: Almost Parsing

[Figure omitted: for the sentence "Poachers now control the underground
trade", each word is shown with the set of elementary trees (supertags)
it can anchor; choosing one supertag per word leaves almost no parsing
work to be done.]
Summary
Parsing context-free grammars
•   Top-down and Bottom-up parsers
•   Mixed approaches (CKY, Earley parsers)
Preferences over parses using probabilities
•   Parsing with PCFGs and the probabilistic CKY algorithm
Enriching the probability model
•   Lexicalization
•   Log-linear models for parsing
