Parsing


SLP Chapter 13
                        Outline
   § Parsing with CFGs
        § Bottom-up, top-down
        § CKY parsing
        § Mention of Earley and chart parsing




                   Parsing
   § Parsing with CFGs refers to the task of
     assigning trees to input strings
   § Trees that cover all and only the
     elements of the input and have an S at
     the top
   § This chapter: find all possible trees
   § Next chapter (14): choose the most
     probable one


                    Parsing
      § Parsing involves a search




             Top-Down Search
   § We’re trying to find trees rooted with an
     S; start with the rules that give us an S.
   § Then we can work our way down from
     there to the words.




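As an added illustration (not part of the original slides), here is a minimal top-down search sketch in Python for a hypothetical toy grammar: it starts from S and repeatedly expands the leftmost non-terminal, working down toward the words, with a depth limit to keep the search finite.

    # Minimal top-down search sketch (toy grammar and names are hypothetical).
    # Start from S and expand the leftmost non-terminal until only words remain.
    TOY_GRAMMAR = {
        "S":   [["NP", "VP"]],
        "NP":  [["Det", "N"]],
        "VP":  [["V", "NP"], ["V"]],
        "Det": [["the"]],
        "N":   [["dog"], ["ball"]],
        "V":   [["saw"]],
    }

    def top_down(symbols, words, depth=0, max_depth=10):
        """Return True if `symbols` can derive exactly `words`."""
        if depth > max_depth:                      # crude guard against runaway expansion
            return False
        for i, sym in enumerate(symbols):
            if sym in TOY_GRAMMAR:                 # leftmost non-terminal
                return any(top_down(symbols[:i] + rhs + symbols[i + 1:],
                                    words, depth + 1, max_depth)
                           for rhs in TOY_GRAMMAR[sym])
        return symbols == words                    # all terminals: compare to input

    print(top_down(["S"], "the dog saw the ball".split()))   # True

Notice how this search proposes expansions (e.g. VP → V) that can never match the input; that is exactly the weakness the later slides contrast with bottom-up parsing.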
             Top-Down Space

   [figure: the top-down search space]

             Bottom-Up Parsing
   § We also want trees that cover the input
     words.
   § Start with trees that link up with the
     words
   § Then work your way up from there to
     larger and larger trees.




             Bottom-Up Space

   [figure: the bottom-up search space]

       Top-Down and Bottom-Up
   § Top-down
        § Only searches for trees that can be S’s
        § But also suggests trees that are not consistent
          with any of the words
   § Bottom-up
        § Only forms trees consistent with the words
        § But suggests trees that make no sense
          globally



                        Control
   § Which node to try to expand next?
   § Which grammar rule to use to expand a
     node?
   § One approach: exhaustive search of the
     space of possibilities
   § Not feasible
        § Time is exponential in the number of non-
          terminals
        § LOTS of repeated work, as the same
          constituent is created over and over (shared
          sub-problems)
             Dynamic Programming
   § DP search methods fill tables with partial results
     and thereby
        § Avoid doing avoidable repeated work
        § Solve exponential problems in polynomial time (well,
          not really – we’ll return to this point)
        § Efficiently store ambiguous structures with shared
          sub-parts.
   § We’ll cover two approaches that roughly
     correspond to bottom-up and top-down
     approaches.
        § CKY
        § Earley


                        CKY Parsing

      § Consider the rule A → B C
             § If there is an A somewhere in the input,
               then there must be a B followed by a C in
               the input.
             § If the A spans from i to j in the input, then
               there must be some k s.t. i < k < j
                § I.e., the B splits from the C someplace.




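A concrete (added) illustration: if the grammar has NP → Det Nominal and an NP spans positions 2 through 5 of the input, then a Det must span [2,k] and a Nominal must span [k,5] for some split point k in {3, 4}.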
         Convert Grammar to CNF
       § What if your grammar isn’t binary?
              § As in the case of the Treebank grammar?
       § Convert it to binary… any arbitrary CFG
         can be rewritten into Chomsky Normal
         Form automatically.
             § The resulting grammar accepts (and rejects) the
               same set of strings as the original grammar.
             § But the resulting derivations (trees) are different.
             § We saw this in the last set of lecture notes




        Convert Grammar to CNF
   § More specifically, we want our rules to be
     of the form
        A → B C
        or
        A → w

        That is, rules can expand to either two
         non-terminals or to a single terminal.


              Binarization Intuition
      § Introduce new intermediate non-terminals
        into the grammar that distribute rules
        with length > 2 over several rules.
             § So… S → A B C turns into
             S → X C and
             X → A B
             where X is a symbol that doesn’t occur
              anywhere else in the grammar.


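The same trick handles longer rules by chaining fresh symbols: a (hypothetical) rule S → A B C D becomes S → X1 D, X1 → X2 C, and X2 → A B, where X1 and X2 are new symbols.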
    Converting grammar to CNF
   1. Copy all conforming rules to the new
      grammar unchanged
   2. Convert terminals within rules to dummy
      non-terminals
   3. Convert unit productions
   4. Make all rules with NTs on the right
      binary
   In lecture: what these mean; apply to the
      example on the next two slides (a rough
      code sketch of steps 2 and 4 follows below)

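As an added sketch (not the book's code), steps 2 and 4 above might look like this in Python, assuming rules are (lhs, rhs) pairs and terminals are written in lowercase:

    # Rough sketch of CNF steps 2 and 4: replace terminals inside long rules
    # with dummy non-terminals, then binarize.  Step 3 (unit productions) is
    # omitted.  Rule format and naming are assumptions, not the book's code.
    def to_cnf(rules):
        new_rules = []
        counter = 0

        def fresh():
            nonlocal counter
            counter += 1
            return "X%d" % counter            # a symbol used nowhere else

        for lhs, rhs in rules:
            rhs = list(rhs)
            if len(rhs) > 1:                  # step 2: terminals in long rules
                for i, sym in enumerate(rhs):
                    if sym.islower():         # terminal, by convention here
                        dummy = fresh()
                        new_rules.append((dummy, [sym]))
                        rhs[i] = dummy
            while len(rhs) > 2:               # step 4: binarize long rules
                dummy = fresh()
                new_rules.append((dummy, rhs[:2]))
                rhs = [dummy] + rhs[2:]
            new_rules.append((lhs, rhs))
        return new_rules

    # S -> Aux NP VP becomes X1 -> Aux NP and S -> X1 VP;
    # VP -> book NP becomes X2 -> book and VP -> X2 NP.
    print(to_cnf([("S", ["Aux", "NP", "VP"]), ("VP", ["book", "NP"])]))

Step 3 (unit productions such as S → VP) is not handled here; eliminating them means copying the rules that expand VP up to S before binarizing.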
             Sample L1 Grammar

   [figure: the sample L1 grammar]

             CNF Conversion

   [figure: the L1 grammar after conversion to CNF]

                              CKY
   § Build a table so that an A spanning from i
     to j in the input is placed in cell [i,j] in the
     table.
   § E.g., a non-terminal spanning the entire
     string will sit in cell [0, n]
        § Hopefully an S




                             CKY
   § If
        § there is an A spanning [i,j] in the input
        § A → B C is a rule in the grammar
   § Then
        § There must be a B in [i,k] and a C in [k,j] for
          some i < k < j




                                 CKY
   § The loops fill the table a column at a
     time, from left to right, bottom to top.
        § When we’re filling a cell, the parts needed to
          fill it are already in the table
             § to the left and below




             CKY Algorithm

   [figure: CKY algorithm pseudocode]

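Since the algorithm figure is not reproduced here, the following is a rough Python rendering of the CKY recognizer; the data-structure names and rule format are my assumptions, not the book's.

    # CKY recognizer sketch for a grammar already in CNF.
    #   lexical: dict mapping a word -> set of non-terminals A with A -> word
    #   binary:  dict mapping (B, C) -> set of non-terminals A with A -> B C
    # table[i][j] holds the non-terminals spanning words i..j-1 (fenceposts 0..n).
    def cky_recognize(words, lexical, binary, start="S"):
        n = len(words)
        table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for j in range(1, n + 1):                  # columns left to right
            table[j - 1][j] |= lexical.get(words[j - 1], set())
            for i in range(j - 2, -1, -1):         # rows bottom to top
                for k in range(i + 1, j):          # every split point
                    for B in table[i][k]:
                        for C in table[k][j]:
                            table[i][j] |= binary.get((B, C), set())
        return start in table[0][n], table

The loop order matches the slides above: when cell [i,j] is filled, everything to its left ([i,k]) and below it ([k,j]) is already complete.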
                             Example

   [worked CKY example figure omitted]

       Go through full example in lecture
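In place of the worked example (left to lecture), here is a toy run of the recognizer sketch above on a made-up mini-grammar (not the book's L1 grammar):

    # Made-up CNF fragment for illustration only.
    lexical = {
        "book":   {"V", "N"},
        "that":   {"Det"},
        "flight": {"N"},
    }
    binary = {
        ("Det", "N"): {"NP"},
        ("V", "NP"):  {"VP", "S"},    # imperative S folded in for CNF
    }

    ok, table = cky_recognize("book that flight".split(), lexical, binary)
    print(ok)             # True: an S spans the whole input
    print(table[0][3])    # {'S', 'VP'} (set order may vary)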
                    CKY Parsing
   § Is that really a parser?
   § So far it is only a recognizer
        § Success? An S in cell [0,N]
   § To turn it into a parser … see lecture




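One standard way to do this, shown here as a sketch rather than the lecture's exact solution, is to keep a backpointer for each table entry and read a tree off the finished table:

    # CKY with backpointers: same fill order as the recognizer, but each entry
    # remembers how it was built, so one parse tree can be recovered at the end.
    def cky_parse(words, lexical, binary, start="S"):
        n = len(words)
        back = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]
        for j in range(1, n + 1):
            for A in lexical.get(words[j - 1], set()):
                back[j - 1][j][A] = ("word", words[j - 1])
            for i in range(j - 2, -1, -1):
                for k in range(i + 1, j):
                    for B in back[i][k]:
                        for C in back[k][j]:
                            for A in binary.get((B, C), set()):
                                back[i][j].setdefault(A, (k, B, C))

        def build(i, j, A):                       # follow backpointers into a tree
            entry = back[i][j][A]
            if entry[0] == "word":
                return (A, entry[1])
            k, B, C = entry
            return (A, build(i, k, B), build(k, j, C))

        return build(0, n, start) if start in back[0][n] else None

Keeping only the first backpointer per entry (setdefault) yields a single tree; storing all of them would let us enumerate every parse, at the exponential cost mentioned in the conclusions.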
                    CKY Notes
  § Since it’s bottom-up, CKY populates the
    table with a lot of worthless constituents.
      § To avoid this we can switch to a top-down
        control strategy
      § Or we can add some kind of filtering that
        blocks constituents where they cannot
        appear in a final analysis.




Dynamic Programming Parsing
          Methods
§ The CKY (Cocke-Kasami-Younger) algorithm is
  based on bottom-up parsing and requires
  first normalizing the grammar.
§ The Earley parser is based on top-down
  parsing and does not require normalizing
  the grammar, but is more complex.
§ More generally, chart parsers retain
  completed phrases in a chart and can
  combine top-down and bottom-up search.

                Conclusions
§ Syntax parse trees specify the syntactic
  structure of a sentence, which helps determine
  its meaning.
  § John ate the spaghetti with meatballs with chopsticks.
  § How did John eat the spaghetti?
  § What did John eat?
§ CFGs can be used to define the grammar of a
  natural language.
§ Dynamic programming algorithms allow
  computing a single parse tree in cubic time or all
  parse trees in exponential time.
