Morphology

Parsing context-free grammars
   Context-free grammars specify structure, not
    process.
   There are many different ways to parse input
    in accordance with a given context-free
    grammar.
   We will review
    –   a top-down parsing algorithm
    –   a bottom-up parsing algorithm
   We will present the Earley algorithm
A simple grammar

Figure 10.2

S → NP VP                             Verb → book | include | prefer
S → Aux NP VP                         Aux → does
S → VP                                Prep → from | to | on
NP → Det Nominal                      ProperNoun → Houston | TWA
NP → ProperNoun                       PP → Prep NP
VP → Verb                             Nominal → Nominal PP
VP → Verb NP                          Nominal → Noun
Det → that | this | a                 Nominal → Noun Nominal
Noun → book | flight | meal | money
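
The grammar above can be written down as plain data, which makes the later parsing steps easier to try out. The following is a minimal sketch, not part of the original slides: the names GRAMMAR, LEXICON and pos_tags are illustrative, and the PP rule is written with Prep, the category the lexicon actually defines.

```python
# Phrase-structure rules: left-hand side -> list of possible right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"], ["Nominal", "PP"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "PP": [["Prep", "NP"]],  # assumption: "P" in the slide means Prep
}

# Lexical rules: part-of-speech category -> words it can rewrite to.
LEXICON = {
    "Det": {"that", "this", "a"},
    "Noun": {"book", "flight", "meal", "money"},
    "Verb": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "ProperNoun": {"Houston", "TWA"},
}


def pos_tags(word):
    """Return every part-of-speech category the lexicon lists for `word`."""
    return sorted(pos for pos, words in LEXICON.items() if word in words)


print(pos_tags("book"))  # the word is listed under two categories
```

Looking up "book" returns both Noun and Verb, which is the lexical ambiguity the parsing examples have to resolve.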
Bottom-up parsing
   Yngve (1955) presented a bottom-up algorithm

   Example (figure 10.4): Book that flight.
Look up words in lexicon
Book is ambiguous – there are two possible
POS tags for the word “Book”.



Noun Det  Noun        Verb Det  Noun
Book that flight      Book that flight
Build structure from bottom up

[NOM Noun] Det [NOM Noun]      Verb Det [NOM Noun]
Book that flight               Book that flight
Build structure from bottom up
Now we have three possible structures:

[NOM Noun] [NP Det [NOM Noun]]
[VP Verb] Det [NOM Noun]
Verb [NP Det [NOM Noun]]

Book that flight
Build structure from bottom up
The Noun interpretation of Book leads to a dead end, so only two
parse trees survive:

[VP Verb] [NP Det [NOM Noun]]
[VP Verb [NP Det [NOM Noun]]]

Book that flight
Build structure from bottom up
There is no way to combine a VP followed by an NP to form an S (the
grammar has S → NP VP but not S → VP NP), so only one parse tree survives:

[S [VP Verb [NP Det [NOM Noun]]]]

Book that flight
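
The bottom-up steps above can be sketched as a search over sequences of categories. This is an illustrative recognizer only, not Yngve's algorithm and not code from the slides: it starts from every possible tag sequence, repeatedly replaces a matching right-hand side by its left-hand side, and accepts if a lone S is reached. RULES, LEXICON and the function names are assumptions made for the example, and the lexicon is cut down to the words used here.

```python
from collections import deque

RULES = [  # (left-hand side, right-hand side)
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nominal")), ("NP", ("ProperNoun",)),
    ("Nominal", ("Noun",)), ("Nominal", ("Noun", "Nominal")),
    ("Nominal", ("Nominal", "PP")),
    ("VP", ("Verb",)), ("VP", ("Verb", "NP")),
    ("PP", ("Prep", "NP")),
]
LEXICON = {"book": ["Noun", "Verb"], "that": ["Det"], "flight": ["Noun"]}


def tag_sequences(words):
    """Yield every way of assigning one POS tag to each word."""
    if not words:
        yield ()
        return
    for tag in LEXICON[words[0]]:
        for rest in tag_sequences(words[1:]):
            yield (tag,) + rest


def bottom_up_recognize(words):
    """Breadth-first search that reduces right-hand sides to left-hand sides."""
    queue = deque(tag_sequences(tuple(words)))
    seen = set(queue)
    while queue:
        form = queue.popleft()
        if form == ("S",):
            return True
        for lhs, rhs in RULES:
            n = len(rhs)
            for i in range(len(form) - n + 1):
                if form[i:i + n] == rhs:
                    reduced = form[:i] + (lhs,) + form[i + n:]
                    if reduced not in seen:  # avoid revisiting a form
                        seen.add(reduced)
                        queue.append(reduced)
    return False


print(bottom_up_recognize(["book", "that", "flight"]))
```

The search explores the Noun reading of "book" as well, which is the dead end shown in the figures; it only reports whether some sequence of reductions reaches S and does not build the tree.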
Build structure from top down
When parsing top-down, we start with the grammar’s start symbol
and apply productions to try to match input:

S

Book that flight
Build structure from top down
Here we show only the successful choices:

[S VP]

Book that flight
Build structure from top down
Here we show only the successful choices:

[S [VP Verb NP]]

Book that flight
Build structure from top down
Here we show only the successful choices:

[S [VP Verb [NP Det NOM]]]

Book that flight
Build structure from top down
Here we show only the successful choices:

[S [VP Verb [NP Det [NOM Noun]]]]

Book that flight
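
The top-down walk shown above can be sketched as a recursive-descent parser that tries every production and keeps the expansions that match the input. This is an illustration under stated assumptions, not the slides' own code: the names are hypothetical, the lexicon is cut down to a few words, and the rule Nominal → Nominal PP is deliberately left out because a naive top-down parser would expand it forever.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    # Nominal -> Nominal PP is omitted: it is left-recursive.
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"},
           "flight": {"Noun"}, "does": {"Aux"}}


def expand(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` can match from `pos`."""
    if symbol not in GRAMMAR:  # POS category: must match the next word
        if pos < len(words) and symbol in LEXICON.get(words[pos], ()):
            yield (symbol, words[pos]), pos + 1
        return
    for rhs in GRAMMAR[symbol]:  # try each production in turn
        yield from expand_sequence(symbol, rhs, words, pos)


def expand_sequence(lhs, rhs, words, pos):
    """Yield (tree, next_pos) for every way the symbols of `rhs` match in order."""
    partial = [((), pos)]  # (children built so far, position reached)
    for sym in rhs:
        partial = [(kids + (tree,), nxt)
                   for kids, p in partial
                   for tree, nxt in expand(sym, words, p)]
    for kids, nxt in partial:
        yield (lhs,) + kids, nxt


def top_down_parse(words):
    """Return all trees rooted in S that cover the whole input."""
    return [tree for tree, end in expand("S", words, 0) if end == len(words)]


for tree in top_down_parse(["book", "that", "flight"]):
    print(tree)
```

The failed attempts (S → NP VP and S → Aux NP VP, for instance) are explored and silently dropped, which is the fruitless work the slides attribute to top-down parsing.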
Top-down versus bottom-up approaches
Top-down advantages
  –   Doesn’t explore trees which cannot be S
  –   Subtrees fit under S
Top-down disadvantages
  –   Many fruitless trees are explored: trees explored
      may have no hope of matching input
Bottom-up advantages
  –   All trees explored are consistent with input
Bottom-up disadvantages
  –   Builds structure even if S cannot be formed
  –   Builds neighboring structures which can never combine
Approaches to dealing with ambiguity
   parallel exploration
   depth-first strategy with backtracking
Improving top-down parsing
   Make top-down parser pay attention to input
    with bottom-up filtering (left-corner parsing)
   “The parser should not consider any grammar
    rule if the current input cannot serve as the
    first word along the left edge of some
    derivation from this rule.” [pg. 369]
   Left corners are pre-compiled.
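
Pre-compiling left corners can be sketched as a fixed-point computation over the grammar. The code below is an assumption-laden illustration rather than the textbook's procedure: it works at the level of POS categories instead of individual words, and the names left_corner_table and rule_is_viable are made up for the example.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"], ["Nominal", "PP"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "PP": [["Prep", "NP"]],
}


def left_corner_table(grammar):
    """Map each nonterminal to every category that can begin one of its derivations."""
    table = {lhs: {rhs[0] for rhs in rules} for lhs, rules in grammar.items()}
    changed = True
    while changed:  # repeat until no set grows any further
        changed = False
        for corners in table.values():
            for sym in list(corners):
                new = table.get(sym, set()) - corners
                if new:
                    corners |= new
                    changed = True
    return table


def rule_is_viable(rhs, word_tags, table):
    """True if the next word (given its POS tags) can start a derivation of `rhs`."""
    first = rhs[0]
    return first in word_tags or bool(table.get(first, set()) & word_tags)


table = left_corner_table(GRAMMAR)
book_tags = {"Noun", "Verb"}
print(rule_is_viable(["NP", "VP"], book_tags, table))  # S -> NP VP
print(rule_is_viable(["VP"], book_tags, table))        # S -> VP
```

With "book" as the first word, the filter rejects S → NP VP before any expansion is attempted, because neither Noun nor Verb is a left corner of NP, while S → VP remains a candidate.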
Problems with top-down parsers
   left-recursion
       X ⇒* α X β
        α ⇒* ε
       Infinite loop in derivation!
   ambiguity
       not efficiently handled
   recomputation
       subtrees can be built multiple times (built, then thrown away
       during backtracking)
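
The left-recursion problem listed above can be detected mechanically before parsing starts. The following sketch assumes a grammar without empty (ε) productions, so a nonterminal is left-recursive exactly when following first symbols of right-hand sides leads back to it; is_left_recursive is a hypothetical helper name, not something from the slides.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"], ["Nominal", "PP"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "PP": [["Prep", "NP"]],
}


def is_left_recursive(grammar, start):
    """True if `start` can derive a string that begins with `start` again.

    Assumes no epsilon rules, so only the first symbol of each
    right-hand side needs to be followed.
    """
    stack = [rhs[0] for rhs in grammar[start]]
    seen = set()
    while stack:
        sym = stack.pop()
        if sym == start:
            return True
        if sym in grammar and sym not in seen:
            seen.add(sym)
            stack.extend(rhs[0] for rhs in grammar[sym])
    return False


print([nt for nt in GRAMMAR if is_left_recursive(GRAMMAR, nt)])
```

In this grammar only Nominal is flagged, because of the rule Nominal → Nominal PP; a naive top-down parser expanding that rule would keep predicting Nominal at the same input position without consuming a word.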
Earley’s algorithm
   Earley’s algorithm employs the dynamic
    programming technique to address the
    weaknesses of general top-down parsing.
   Dynamic programming stores intermediate
    results so that they never need to be
    recomputed.
   Dynamic programming reduces the exponential
    time requirement to a polynomial time
    requirement: O(N³), where N is the length of
    the input in words.
Data structure
   Earley’s algorithm uses a data structure called a chart
    to store information about the progress of the parse.
   A chart contains an entry for each position in the input
   A position occurs before the first word, between
    words, and after the last word.

         0  word1  1  word2  2  …  N-1  wordN  N


   A position is represented by a number; positions in
    the input are numbered from 0 (at the left) to N (at the
    right).
Chart details
   A chart entry consists of a sequence of states.
   A state represents
    –   a subtree corresponding to a single grammar rule
    –   information about how much of a rule has been processed
    –   information about the span of the subtree w.r.t. the input
   A state is represented by an annotated grammar rule
    –   a dot (•) is used to show how much of the rule has been
        processed
    –   a pair of positions, [x,y], indicates the span of the subtree
        w.r.t. the input; x is the position of the left edge of the subtree,
        and y is the position of the dot.
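
A chart state as described above can be sketched as a small immutable record. This is one possible representation, not the one used in the textbook: the class name State and its field names are assumptions, with start and end standing for x and y in the span [x,y].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    lhs: str     # left-hand side of the grammar rule
    rhs: tuple   # right-hand side symbols
    dot: int     # number of right-hand side symbols already processed
    start: int   # x: position of the left edge of the subtree
    end: int     # y: position of the dot in the input

    def next_symbol(self):
        """Symbol immediately to the right of the dot, or None at the end."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def is_complete(self):
        """True when the dot has reached the end of the rule."""
        return self.dot == len(self.rhs)

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)}  [{self.start},{self.end}]"


state = State("VP", ("Verb", "NP"), dot=1, start=0, end=1)
print(state)
print(state.next_symbol(), state.is_complete())
```

Making the record frozen means two states with the same rule, dot and span compare equal, which makes it easy to avoid adding the same state to a chart entry twice.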
Three operators on a chart
   Predictor
    –   applies when the NonTerminal to the right of the • in a state is not a
        POS category (i.e. is not a pre-terminal)
    –   adds states to the current chart entry
   Scanner
    –   applies when the NonTerminal to the right of the • in a state is a POS
        category (i.e. is a pre-terminal)
    –   adds states to the next chart entry
   Completer
    –   applies when there is no NonTerminal (and hence no
        Terminal) to the right of the • in a state (i.e. the • is at the end)
    –   adds states to the current chart entry
Predictor
   Suppose the state to which the Predictor applies is:
       X → α • NT β  [x,y]
   Predictor adds, to the current chart entry, a
    new state for each possible expansion of NT
   For each expansion EX of NT, the state added is
       NT → • EX  [y,y]
Scanner
   Suppose the state to which the Scanner applies is:
        X → α • POS β  [x,y]
   Scanner adds, to the next chart entry, a new
    state if the next input word (the word between
    positions y and y+1) can be tagged as POS
   The new state added is
        X → α POS • β  [x,y+1]
Completer
   Suppose the state to which the Completer applies is:
        X → γ •  [x,y]
   Completer adds, to the current chart entry, a
    new state for each possible reduction using
    the (now completed) state
   For each state in chart entry x (where the
    completed subtree begins) of the form
        Y → α • X β  [w,x]
     a new state of the following form is added
        Y → α X • β  [w,y]
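
The three operators can be put together in a short recognizer. This is a minimal sketch that follows the variant described on these slides, in which the Scanner advances the dot over a POS category directly; it is not Earley's original formulation or the textbook's pseudocode. States are stored as tuples (lhs, rhs, dot, start), with the end position given by the chart entry that holds the state, and the start rule name GAMMA, the reduced lexicon and the function names are assumptions made for the example.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"], ["Nominal", "PP"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "PP": [["Prep", "NP"]],
}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}


def earley_chart(words, grammar=GRAMMAR, lexicon=LEXICON):
    """Fill chart[0..N]; each state is (lhs, rhs, dot, start)."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def add(state, k):
        if state not in chart[k]:  # never store the same state twice
            chart[k].append(state)

    add(("GAMMA", ("S",), 0, 0), 0)  # dummy start state
    for k in range(n + 1):
        i = 0
        while i < len(chart[k]):  # chart[k] may grow while it is processed
            lhs, rhs, dot, start = chart[k][i]
            i += 1
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:  # Predictor: sym is not a POS category
                    for expansion in grammar[sym]:
                        add((sym, tuple(expansion), 0, k), k)
                elif k < n and sym in lexicon.get(words[k], ()):  # Scanner
                    add((lhs, rhs, dot + 1, start), k + 1)
            else:  # Completer: the dot is at the end of the rule
                for l2, r2, d2, s2 in list(chart[start]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, s2), k)
    return chart


def earley_recognize(words):
    """True if the completed start state spans the whole input."""
    chart = earley_chart(words)
    return ("GAMMA", ("S",), 1, 0) in chart[len(words)]


print(earley_recognize(["book", "that", "flight"]))
```

Because duplicate states are never added, the left-recursive rule Nominal → Nominal PP is predicted once per chart entry and causes no infinite loop, unlike in a naive top-down parser. The sketch only recognizes; recovering trees needs the back-pointers described under the completer modification.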
Completer (modification)
   In order to recover parse tree information from the
    chart once parsing is complete, we need to modify the
    completer slightly.
   Each state in the chart must be given a unique
    identifier (N for state N)
   Each time the completer adds a state, it also adds the
    unique identifier of the state completed to the list of
    previous states for that new state (which is a copy of
    an already existing state, waiting for the category
    which the current state just completed).
Initial state of chart
   chart[0]    chart[1]   chart[2]   chart[3]

 0: γ → • S
Example (from text)
   (work through on board)

				