PAGES: 51 POSTED ON: 9/6/2011
PARSING: Analyzing Linguistic Units

Task: Morphology (analyze words into morphemes). Formal mechanism: context dependency rules, FST composition. Resulting representation: morphological structure.
Task: Phonology (analyze words into phonemes). Formal mechanism: context dependency rules, FST composition. Resulting representation: phonemic structure.
Task: Syntax (analyze sentences for syntactic relations between words). Formal mechanism: grammars (PDA for CFGs); top-down, bottom-up, Earley, CKY parsing. Resulting representation: parse tree, derivation tree.

Why should we parse a sentence?
• To detect relations among words
• Used to normalize surface syntactic variations
• Invaluable for a number of NLP applications

Some Concepts
Grammar: a generative device that prescribes a set of valid strings.
Parser: a device that uncovers the sequence of grammar rules that might have generated the input sentence.
• Input: grammar, sentence
• Output: parse tree, derivation tree
Recognizer: a device that returns "yes" if the input string could be generated by the grammar.
• Input: grammar, sentence
• Output: boolean

Searching for a Parse
Grammar + rewrite procedure encodes
• all strings generated by the grammar: L(G)
• all parse trees for each generated string s: T(G) = ∪{Ts(G)}
Given an input sentence I, the set of its parse trees is TI(G). Parsing is searching for TI(G) ⊆ T(G). Ideally, the parser finds the appropriate parse for the sentence.

CFG for a Fragment of English
S → NP VP        Det → that | this | a
S → Aux NP VP    N → book | flight | meal | money
S → VP           V → book | include | prefer
NP → Det Nom     Aux → does
NP → PropN       Prep → from | to | on
Nom → N          PropN → Houston | TWA
Nom → N Nom
Nom → Nom PP
VP → V
VP → V NP
PP → Prep NP
(The slide shows a parse tree for "Book that flight": S → VP; VP → V NP; NP → Det Nom; Det = that, Nom → N = flight.)

Top-down/Bottom-up Parsing
Top-down (recursive descent parser) starts from S (the goal); bottom-up (shift-reduce parser) starts from the words (the input).
Top-down algorithm:
a. Pick non-terminals
b. Pick rules from the grammar to expand the non-terminals (in parallel)
Bottom-up algorithm:
a. Match a sequence of input symbols with the RHS of some rule
b.
Replace the sequence by the LHS of the matching rule
Termination:
• Top-down. Success: when the leaves of a tree match the input. Failure: no more non-terminals to expand in any of the trees.
• Bottom-up. Success: when S is reached. Failure: no more rewrites possible.
Pros/Cons:
• Top-down. Pro: goal-driven, starts with S. Con: constructs trees that may not match the input.
• Bottom-up. Pro: constrained by the input string. Con: constructs constituents that may not lead to the goal S.

• Control strategy: how to explore the search space?
• Pursue all parses in parallel, or backtrack, or ...?
• Which rule to apply next?
• Which node to expand next?
• Work through how top-down and bottom-up parsing handle "Book that flight" on the board.

Top-down, Depth-First, Left-to-Right Parser
Systematic, incremental expansion of the search space, in contrast to a parallel parser.
Start state: (• S, 0)
End state: (•, n), where n is the length of the input to be parsed
Next-state rules:
• (• wj+1 β, j) → (• β, j+1)
• (• B β, j) → (• γ β, j) if B → γ (note: B is the left-most non-terminal)
Agenda: a data structure that keeps track of the states to be expanded. Expansion is depth-first if the agenda is a stack. (Fig 10.7)

CFG Left Corners
Can we help top-down parsers with some bottom-up information?
• Unnecessary states are created if there are many B → γ rules.
• If, after successive expansions, B ⇒* w δ and w does not match the input, then the whole series of expansions is wasted.
The leftmost symbol derivable from B needs to match the input: look ahead to the left corner of the tree.
• B is a left corner of A if A ⇒* B γ
• Build a table with the left corners of all non-terminals in the grammar, and consult it before applying a rule.
Category: left corners
• S: Det, PropN, Aux, V
• NP: Det, PropN
• Nom: N
• VP: V
At a given point in state expansion (• B β, j):
• Pick the rule B → C γ only if a left corner of C matches the input wj+1.

Limitation of Top-down Parsing: Left Recursion
Depth-first search will never terminate if the grammar is left recursive (e.g.
NP → NP PP, i.e. some A ⇒* A β).
Solutions:
• Rewrite the grammar to a weakly equivalent one that is not left-recursive:
  NP → NP PP        becomes    NP → Nom NP′
  NP → Nom                     NP′ → PP NP′
                               NP′ → ε
  – This may make the rules unnatural.
• Fix the depth of search explicitly.
Other book-keeping needed in top-down parsing:
• Memoization for reusing previously parsed substrings
• A packed representation for parse ambiguity

Dynamic Programming for Parsing
Memoization:
• Create a table of solutions to sub-problems (e.g. subtrees) as the parse proceeds
• Look up subtrees for each constituent rather than re-parsing
• Since all parses are implicitly stored, all are available for later disambiguation
Examples: Cocke-Younger-Kasami (CKY) (1960), Graham-Harrison-Ruzzo (GHR) (1980) and Earley (1970) algorithms

Earley Parser: an O(n³) Parser
• A top-down parser with bottom-up information
• State: (i, A → α • β, j)
  – i is the position in the string where A begins
  – j is the position in the string that has been parsed so far
• Top-down prediction: S ⇒* w1 ... wi A γ
• Bottom-up completion: α ⇒* wi+1 ... wj

Earley Parser Data Structure
An (n+1)-cell array called the chart.
• For each word position, the chart contains the set of states representing all partial parse trees generated to date.
  – E.g.
chart[0] contains all partial parse trees generated at the beginning of the sentence.
Chart entries represent three types of constituents:
• predicted constituents (top-down predictions)
• in-progress constituents (we're in the midst of ...)
• completed constituents (we've found ...)
Progress in the parse is represented by dotted rules; the position of • indicates the type of constituent.
0 Book 1 that 2 flight 3
• (0, S → • VP, 0)        (predicting VP)
• (1, NP → Det • Nom, 2)  (finding NP)
• (0, VP → V NP •, 3)     (found VP)

Earley Parser: Parse Success
The final answer is found by looking at the last entry in the chart.
• If an entry resembles (0, S → ... •, n), then the input was parsed successfully.
But note that the chart will also contain a record of all possible parses of the input string, given the grammar, not just the successful one(s).
• Why is this useful?

Earley Parsing Steps
Start state: (0, S′ → • S, 0)
End state: (0, S′ → S •, n), where n is the input size
Next-state rules:
• Scanner (read input): (i, A → α • wj+1 β, j) → (i, A → α wj+1 • β, j+1)
• Predictor (add top-down predictions): (i, A → α • B β, j) → (j, B → • γ, j) if B → γ (note: B is the left-most non-terminal)
• Completer (move the dot right when a new constituent is found): (i, B → α • A β, k), (k, A → γ •, j) → (i, B → α A • β, j)
No backtracking and no states removed: keep the complete history of the parse.
• Why is this useful?
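The scanner/predictor/completer steps above can be sketched as a minimal Earley recognizer in Python. This is a sketch, not the slides' own code: the grammar is the toy English fragment from these slides, the `State` representation and the `earley` function name are illustrative, and the back-pointers needed to retrieve parses are omitted.

```python
from collections import namedtuple

# A chart state: rule lhs -> rhs, dot position within rhs, start index.
State = namedtuple("State", "lhs rhs dot start")

GRAMMAR = {  # toy grammar from the slides; terminals are lowercase words
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nom"], ["PropN"]],
    "Nom": [["N"], ["N", "Nom"]],
    "VP": [["V"], ["V", "NP"]],
    "Det": [["that"], ["this"], ["a"]],
    "N": [["book"], ["flight"], ["meal"], ["money"]],
    "V": [["book"], ["include"], ["prefer"]],
    "Aux": [["does"]],
    "PropN": [["Houston"], ["TWA"]],
}

def earley(words):
    chart = [set() for _ in range(len(words) + 1)]
    chart[0].add(State("S'", ("S",), 0, 0))              # dummy start state
    for j in range(len(words) + 1):
        agenda = list(chart[j])
        while agenda:
            st = agenda.pop()
            if st.dot < len(st.rhs):
                nxt = st.rhs[st.dot]
                if nxt in GRAMMAR:                       # Predictor: current cell
                    for rhs in GRAMMAR[nxt]:
                        new = State(nxt, tuple(rhs), 0, j)
                        if new not in chart[j]:
                            chart[j].add(new); agenda.append(new)
                elif j < len(words) and words[j] == nxt:  # Scanner: next cell
                    chart[j + 1].add(State(st.lhs, st.rhs, st.dot + 1, st.start))
            else:                                        # Completer: current cell
                for waiting in list(chart[st.start]):
                    if (waiting.dot < len(waiting.rhs)
                            and waiting.rhs[waiting.dot] == st.lhs):
                        new = State(waiting.lhs, waiting.rhs,
                                    waiting.dot + 1, waiting.start)
                        if new not in chart[j]:
                            chart[j].add(new); agenda.append(new)
    # success iff the dummy state is complete over the whole input
    return State("S'", ("S",), 1, 0) in chart[len(words)]

print(earley("book that flight".split()))   # True
print(earley("that book flight".split()))   # False
```

Because no states are ever removed, the filled chart retains every partial analysis, which is what makes the parse-retrieval and partial-parsing uses described on these slides possible.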
Earley Parser Steps (summary)
Scanner:
• When does it apply: when a terminal is to the right of a dot, e.g. (0, VP → • V NP, 0)
• Which chart cell is affected: new states are added to the next cell
• What is added: the dot is moved over the terminal, e.g. (0, VP → V • NP, 1)
Predictor:
• When does it apply: when a non-terminal is to the right of a dot, e.g. (0, S → • VP, 0)
• Which chart cell is affected: new states are added to the current cell
• What is added: one new state for each expansion of the non-terminal in the grammar, e.g. (0, VP → • V, 0) and (0, VP → • V NP, 0)
Completer:
• When does it apply: when the dot reaches the end of a rule, e.g. (1, NP → Det Nom •, 3)
• Which chart cell is affected: new states are added to the current cell
• What is added: one state for each rule "waiting" for the completed constituent, e.g. (0, VP → V • NP, 1) becomes (0, VP → V NP •, 3)

"Book that flight": Chart[0]
Seed the chart with top-down predictions for S from the grammar:
γ → • S           [0,0]  dummy start state
S → • NP VP       [0,0]  Predictor
S → • Aux NP VP   [0,0]  Predictor
S → • VP          [0,0]  Predictor
NP → • Det Nom    [0,0]  Predictor
NP → • PropN      [0,0]  Predictor
VP → • V          [0,0]  Predictor
VP → • V NP       [0,0]  Predictor

"Book that flight": Chart[1]
V → book •        [0,1]  Scanner
VP → V •          [0,1]  Completer
VP → V • NP       [0,1]  Completer
S → VP •          [0,1]  Completer
NP → • Det Nom    [1,1]  Predictor
NP → • PropN      [1,1]  Predictor
V → book is passed to the Completer, which finds 2 states in Chart[0] whose left corner is V and adds them to Chart[1], moving their dots to the right.

Retrieving the Parses
Augment the Completer to add, as a field in the current state, pointers to the prior states it advances.
• i.e., which states combined to arrive here?
• Read the pointers back from the final state.
What if the final cell does not have the final state?
• Error handling.
• Is it a total loss? No...
• The chart contains every constituent and combination of constituents possible for the input, given the grammar.
• Useful for partial (shallow) parsing, as used in information extraction.

Alternative Control Strategies
Change Earley's top-down strategy to bottom-up, or ...
Change to a best-first strategy based on the probabilities of constituents:
• Compute and store probabilities of constituents in the chart as you parse
• Then, instead of expanding states in a fixed order, allow probabilities to control the order of expansion

Probabilistic and Lexicalized Parsing

Weighted CFGs
• Attach weights to the rules of a CFG
• Compute the weights of derivations
• Use the weights to pick preferred parses
  – Utility: pruning and ordering the search space, disambiguation, language models for ASR
Parsing with weighted grammars (like weighted FAs):
• T* = argmax_T W(T, S)
Probabilistic CFGs are one form of weighted CFG.

Probability Model
• Rule probability:
  – Attach probabilities to grammar rules; the expansions for a given non-terminal sum to 1
    R1: VP → V         .55
    R2: VP → V NP      .40
    R3: VP → V NP NP   .05
  – Estimate the probabilities from annotated corpora: P(R1) = count(R1) / count(VP)
• Derivation probability:
  – Derivation T = {R1 ... Rn}
  – Probability of a derivation: P(T) = ∏_{i=1}^{n} P(Ri)
  – Most probable parse: T* = argmax_T P(T)
  – Probability of a sentence: P(S) = Σ_T P(T, S)
    • Sum over all possible derivations for the sentence
• Note the independence assumption: the parse probability does not change based on where a rule is expanded.
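The rule and derivation probabilities can be made concrete with a short sketch. The R1–R3 probabilities are the toy values from the slide; the `derivation_prob` name and the count figures used for estimation are illustrative assumptions, not from the slides.

```python
from math import prod

# Toy PCFG rule probabilities from the slide: expansions of VP sum to 1.
RULE_PROB = {
    ("VP", ("V",)): 0.55,             # R1: VP -> V
    ("VP", ("V", "NP")): 0.40,        # R2: VP -> V NP
    ("VP", ("V", "NP", "NP")): 0.05,  # R3: VP -> V NP NP
}

def derivation_prob(rules):
    """P(T) = product of P(Ri) over the rules Ri used in derivation T."""
    return prod(RULE_PROB[r] for r in rules)

# Estimating a rule probability from treebank counts (illustrative counts):
# P(VP -> V NP) = count(VP -> V NP) / count(VP)
counts = {("VP", ("V",)): 11, ("VP", ("V", "NP")): 8, ("VP", ("V", "NP", "NP")): 1}
est = counts[("VP", ("V", "NP"))] / sum(counts.values())   # 8 / 20 = 0.40
```

Note that `derivation_prob` embodies exactly the independence assumption stated above: each rule contributes the same factor no matter where in the derivation it is expanded.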
Structural Ambiguity
• S → NP VP
• NP → John | Mary | Denver
• VP → V NP
• V → called
• NP → NP PP
• P → from
• VP → VP PP
• PP → P NP
"John called Mary from Denver" has two parses:
• VP attachment: [S [NP John] [VP [VP [V called] [NP Mary]] [PP [P from] [NP Denver]]]]
• NP attachment: [S [NP John] [VP [V called] [NP [NP Mary] [PP [P from] [NP Denver]]]]]

Cocke-Younger-Kasami Parser
A bottom-up parser with top-down filtering.
Start state(s): (A, i, i+1) for each A → wi+1
End state: (S, 0, n), where n is the input size
Next-state rule:
• (B, i, k), (C, k, j) → (A, i, j) if A → B C

Example: "John called Mary from Denver"
Base case (A → w): NP (John), V (called), NP (Mary), P (from), NP (Denver).
Recursive cases (A → B C), filling the table span by span: VP = V NP over "called Mary"; PP = P NP over "from Denver"; S = NP VP over "John called Mary"; NP = NP PP over "Mary from Denver"; then two VP analyses over "called Mary from Denver" (VP1 = VP PP and VP2 = V NP), and finally an S over the whole string for each VP analysis.

Probabilistic CKY
• Assign probabilities to constituents as they are completed and placed in the table
• Computing the probability:
  P(A, i, j) = Σ_{A → B C} P(A → B C, i, j)
  P(A → B C, i, j) = Σ_k P(B, i, k) · P(C, k, j) · P(A → B C)
• Since we are interested in the max P(S, 0, n):
  – Use the max probability (instead of the sum) for each constituent
  – Maintain back-pointers to recover the parse

Problems with PCFGs
The probability model we're using is based only on the rules in the derivation.
Lexical insensitivity:
• Doesn't use the words in any real way
• Structural disambiguation is lexically driven
  – PP attachment often depends on the verb, its object, and the preposition
  – I ate pickles with a fork.
  – I ate pickles with relish.
Context insensitivity of the derivation:
• Doesn't take into account where in the derivation a rule is used
  – Pronouns are more often subjects than objects
  – She hates Mary.
  – Mary hates her.
Solution: Lexicalization
• Add lexical information to each rule

An Example of Lexical Information: Heads
Make use of the notion of the head of a phrase:
• The head of an NP is its noun
• The head of a VP is its main verb
• The head of a PP is its preposition
Each LHS of a rule in the PCFG has a lexical item; each RHS non-terminal has a lexical item, and one of them is shared with the LHS.
If R is the number of binary branching rules in the CFG, the lexicalized CFG has O(2·|Σ|·|R|) rules; unary rules: O(|Σ|·|R|).
(Slides show an example of the correct, preferred parse, the attribute grammar, and a less preferred parse.)

Computing Lexicalized Rule Probabilities
We started with rule probabilities:
• VP → V NP PP with P(rule | VP)
  – E.g., the count of this rule divided by the number of VPs in a treebank
Now we want lexicalized probabilities:
• VP(dumped) → V(dumped) NP(sacks) PP(in)
• P(rule | VP ∧ dumped is the verb ∧ sacks is the head of the NP ∧ in is the head of the PP)
• Not likely to have significant counts in any treebank

Another Example
Consider the VPs:
• Ate spaghetti with gusto
• Ate spaghetti with marinara
The dependency is not between mother and child: in "ate spaghetti with gusto" the PP(with) attaches to VP(ate), while in "ate spaghetti with marinara" the PP(with) attaches to NP(spaghetti).

Log-linear Models for Parsing
• Why restrict the conditioning to the elements of a rule?
  – Use an even larger context: word sequence, word types, sub-tree context, etc.
• In general, compute P(y|x), where fi(x, y) tests properties of the context and λi is the weight of that feature:
P(y | x) = exp(Σi λi fi(x, y)) / Σ_{y′ ∈ Y} exp(Σi λi fi(x, y′))
• Use these as scores in the CKY algorithm to find the best-scoring parse.

Supertagging: Almost Parsing
Example sentence: "Poachers now control the underground trade."
(The slide shows, for each word, the set of candidate elementary trees, or supertags, it can anchor: several each for "poachers" (N, NP), "now" (Adv), "control" (V), "the" (Det), "underground" (Adj, N), and "trade" (N, NP). Selecting the correct supertag per word resolves most of the ambiguity, hence "almost parsing".)

Summary
Parsing context-free grammars:
• Top-down and bottom-up parsers
• Mixed approaches (CKY, Earley parsers)
Preferences over parses using probabilities:
• Parsing with PCFG and probabilistic CKY algorithms
Enriching the probability model:
• Lexicalization
• Log-linear models for parsing
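As a concrete companion to the probabilistic CKY section above (maximize over rules A → B C and split points k, with the table indexed by spans), here is a minimal Viterbi-CKY sketch. The grammar, the probabilities, and the `cky` function name are illustrative assumptions, loosely based on the "John called Mary from Denver" ambiguity example; the grammar is in Chomsky normal form as CKY requires.

```python
# Minimal Viterbi CKY over a toy PCFG in Chomsky normal form.
# Grammar and probabilities are illustrative, not from the slides.
LEX = {  # lexical rules A -> w with probabilities
    "John": [("NP", 0.3)], "Mary": [("NP", 0.3)], "Denver": [("NP", 0.2)],
    "called": [("V", 1.0)], "from": [("P", 1.0)],
}
BIN = [  # binary rules A -> B C with probabilities
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 0.6), ("VP", "VP", "PP", 0.4),
    ("NP", "NP", "PP", 0.2),
    ("PP", "P", "NP", 1.0),
]

def cky(words):
    n = len(words)
    # table[i][j]: non-terminal -> best probability over span (i, j)
    table = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                 # base case: A -> w_{i+1}
        for a, p in LEX.get(w, []):
            table[i][i + 1][a] = max(p, table[i][i + 1].get(a, 0.0))
    for span in range(2, n + 1):                  # recursive case: A -> B C
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):             # max over split points k
                for a, b, c, p in BIN:
                    if b in table[i][k] and c in table[k][j]:
                        cand = p * table[i][k][b] * table[k][j][c]
                        if cand > table[i][j].get(a, 0.0):
                            table[i][j][a] = cand
    return table[0][n].get("S", 0.0)              # max P(S, 0, n)

# VP attachment wins under these toy probabilities: ~0.00432
print(cky("John called Mary from Denver".split()))
```

A full parser would also store back-pointers alongside each table entry so the highest-probability tree can be read back, exactly as described in the probabilistic CKY slide.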