Carnegie Mellon University
11-682: Introduction to Human Language Technologies


Natural Language Generation:
          Overview




  11-682: Intro to IR, NLP,MT,Speech   NLG: Overview   1
            Today’s Topics
• An overview of practical issues in
  building natural language
  generation (NLG) systems
• Based on [Jurafsky & Martin,
  Chapter 20] [Reiter & Dale, 1997]
• Goal: “produce understandable
  texts in a human language from
  some underlying representation”

         NLG Ingredients
• A representation of the input
  (probably not human-friendly)
• Knowledge of the domain
• Knowledge of the target language
• A human-friendly output format:
  – documents, reports, explanations,
    help messages, technical instructions,
    etc.

Example NLG Applications
• Forecasts from weather maps
• Summarize results of DB queries
• Explain complex (e.g. medical)
  information
• Describe a chain of reasoning in an
  expert system
• Answering questions about an
  object in a knowledge base
            Authoring Aids
• Template-based generation of
  routine documents
• Examples:
  – discharge summaries, referral letters
  – letters to customers
  – management summaries
  – job descriptions
  – technical manuals

When Is NLG Appropriate?
• Are graphics more useful?
• Is human-quality output required?
• How much stylistic variation?
• Any legal liabilities / requirements?
• Constraints posed by the problem
  domain? (e.g. bandwidth)



Templates (“mail-merge”)
• Insert input data into pre-defined
  slots in a template document
• More complex systems vary
  structure based on input
• More limited than NLG
  – NLG can achieve higher quality
  – NLG is easier to adapt to changes
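The basic mail-merge step can be sketched with Python's standard-library `string.Template`. The letter wording, the slot names (`name`, `flight`, `time`, `delay_minutes`) and both helper functions are hypothetical; the second function illustrates the "more complex" case, where the structure of the output varies with the input.

```python
from string import Template

# Hypothetical template document with three pre-defined slots.
LETTER = Template("Dear $name, your flight $flight departs at $time.")


def mail_merge(record: dict) -> str:
    """Fill the slots; substitute() raises KeyError if a slot has no value."""
    return LETTER.substitute(record)


def delay_notice(record: dict) -> str:
    """A slightly more complex template: an extra sentence appears only
    when the input contains a positive delay."""
    text = mail_merge(record)
    delay = record.get("delay_minutes", 0)
    if delay > 0:
        text += f" It is delayed by {delay} minutes."
    return text


print(delay_notice({"name": "Ms. Smith", "flight": "US 123",
                    "time": "6:30pm", "delay_minutes": 20}))
```

The template has no grammatical knowledge: a one-minute delay would still produce "1 minutes", the kind of detail an NLG realizer handles.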


     Human vs. Machine
• Is NLG a cost-effective solution?
• Economics of NLG development
  – Systems are expensive
  – A large volume of output is necessary
    to justify the expenditure
• The cost / quality threshold
  – Can NLG provide the necessary
    quality at an acceptable price?
    (or at all?)

   Requirements Analysis
• NLG is an evolving technology...
• ...so iterative prototyping is the
  most appropriate SE technique
• Corpus-Based Methods:
  – Identify a sample of target texts
  – Associate with internal
    representations (input to NLG)
  – Specify required NLG algorithms and
    data
     Gathering A Corpus
• Archived examples of human texts
• Cover a full range of texts
• If no corpus, ask experts to create
  one (associated costs & conflicts)
• Document Table:
  – rows = domain categories (e.g.,
    product lines, business areas)
  – columns = document types
    (installation, user, maintenance, etc.)

Example Document Table
  Product vs.      Installation   User Guide   ReadMe       Internet Upgrade
  Doctype          Guide                                    Info Sheet

  Q36 Space        Q36SPM001E     Q36SPM002E   Q36SPM003E   Q36SPM004E
  Modulator        Q36SPM001S     Q36SPM002S   Q36SPM003S   Q36SPM004S

  VOX A30          VC3001E        VC3002E      VC3003E      VC3004E
                   VC3001S        VC3002S      VC3003S      VC3004S

  X100B            X1B001E        X1B002E      X1B003E      X1B004E
                   X1B001S        X1B002S      X1B003S      X1B004S

  Mothership 1.0   MTH001E        MTH002E      MTH003E      MTH004E
                   MTH001S        MTH002S      MTH003S      MTH004S

  (IDs ending in E = English documents; IDs ending in S = Spanish output)

        Analyzing the
     Information Content
• Which parts convey information
  that isn’t available to the NLG
  system? E.g.:
 When is the next train to Glasgow?
 (requires external DB)
• Analysis: classifying sentences
  according to information required
  – unchanging text, direct data,
    computed data, unavailable data

                             Sentence Types
   (ordered from easy to hard or impossible)
• Unchanging Text
    Thank you for flying US Airways
• Directly-Available Data
    Scheduled departure is 6:30pm
• Computable Data
    There are 20 flights to Boston
• Unavailable Data
    Due to ground delay in Pittsburgh
    (Rely on humans for unavailable data)
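The first three sentence types differ in how much the system must do with its input. A minimal sketch, assuming a hypothetical `Flight` record as the only input data; the function names are illustrative, not part of any real system:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flight:
    # Hypothetical record in the system's own database.
    destination: str
    departure: str


def unchanging_text() -> str:
    # Same words every time; no input data needed.
    return "Thank you for flying US Airways"


def direct_data(f: Flight) -> str:
    # A field copied straight from the input record.
    return f"Scheduled departure is {f.departure}"


def computed_data(flights: List[Flight], city: str) -> str:
    # A value derived from the input (here, a count), not stored in it.
    n = sum(1 for f in flights if f.destination == city)
    return f"There are {n} flights to {city}"


# Unavailable data ("Due to ground delay in Pittsburgh") gets no function:
# nothing in the input lets the system derive it.

flights = [Flight("Boston", "6:30pm"), Flight("Boston", "8:15pm"),
           Flight("Denver", "7:00pm")]
print(unchanging_text())
print(direct_data(flights[0]))
print(computed_data(flights, "Boston"))
```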

    6 Basic NLG Tasks
1. Content Determination
  what information should be conveyed?
2. Discourse Planning
  order & structure of message set
3. Sentence Aggregation
  grouping messages into sentences
4. Lexicalization
  words & phrases for concepts, relations
5. Referring Expression Generation
  words & phrases for entities
6. Linguistic Realisation
  syntax, morphology, orthography

  Typical 3-Module Architecture
  goal
    |
    v
  Text Planner           1. Content Determination
                         2. Discourse Planning
    |
    v  text plan
  Sentence Planner       3. Sentence Aggregation
                         4. Lexicalization
                         5. Referring Expressions
    |
    v  sentence plans
  Linguistic Realizer    6. Syntax, Morphology, Orthography
    |
    v
  surface text

  Q: How should the text plans and sentence plans be represented?
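The three-module pipeline can be sketched as three functions whose outputs feed one another. This is a toy under stated assumptions: messages, text plans and sentence plans are plain Python dicts and lists with hypothetical keys (`topic`, `kind`, `subject`, `verb`, `rest`), and the sentence planner does no real aggregation or lexical choice.

```python
from typing import Dict, List

# Hypothetical toy representations: a message is a dict of strings,
# a text plan is an ordered list of messages, and a sentence plan is
# a dict of syntactic slots.
Message = Dict[str, str]


def text_planner(goal: str, db: List[Message]) -> List[Message]:
    # Content determination: keep only messages about the goal topic.
    relevant = [m for m in db if m["topic"] == goal]
    # Discourse planning: summary first, then details (sorted is stable).
    return sorted(relevant, key=lambda m: m["kind"] != "summary")


def sentence_planner(text_plan: List[Message]) -> List[Dict[str, str]]:
    # Stand-in for aggregation, lexicalization and referring expressions:
    # one sentence plan per message, words copied from the message.
    return [{"subject": m["subject"], "verb": m["verb"], "rest": m["rest"]}
            for m in text_plan]


def realizer(plans: List[Dict[str, str]]) -> str:
    # Orthography only: capitalize each sentence and add a full stop.
    out = []
    for p in plans:
        s = f"{p['subject']} {p['verb']} {p['rest']}"
        out.append(s[0].upper() + s[1:] + ".")
    return " ".join(out)


db = [
    {"topic": "glasgow", "kind": "detail", "subject": "the first train",
     "verb": "leaves", "rest": "at 6am"},
    {"topic": "glasgow", "kind": "summary", "subject": "there",
     "verb": "are", "rest": "20 trains a day to Glasgow"},
    {"topic": "leeds", "kind": "summary", "subject": "there",
     "verb": "are", "rest": "5 trains a day to Leeds"},
]
print(realizer(sentence_planner(text_planner("glasgow", db))))
```

Keeping the interfaces between modules explicit is what lets each module be replaced independently, for example a template-based sentence planner by a richer one.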

                   Text Plans
• Common representation: a tree
  – Leaf nodes = messages
  – Internal nodes = message groupings
• Simple text plans: templates OK
• Complex text plans: require full
  representation language
 (e.g., TAMERLAN, DIOGENES)
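A tree-shaped text plan, with messages at the leaves and groupings at the internal nodes, can be sketched with two small dataclasses. The class names, the relation labels and the example messages are hypothetical; reading the leaves left to right recovers the order of the text.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Message:
    # Leaf node: one unit of content.
    text: str


@dataclass
class Group:
    # Internal node: a grouping of messages, labelled with a relation name.
    relation: str
    children: List[Union["Group", Message]] = field(default_factory=list)


def leaves(node: Union[Group, Message]) -> List[str]:
    # Depth-first, left-to-right traversal yields the messages in text order.
    if isinstance(node, Message):
        return [node.text]
    out: List[str] = []
    for child in node.children:
        out.extend(leaves(child))
    return out


plan = Group("sequence", [
    Message("There are 20 trains a day from Aberdeen to Glasgow"),
    Group("elaboration", [
        Message("The first train leaves at 6am"),
        Message("The last train leaves at 10pm"),
    ]),
])
print(leaves(plan))
```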


           Sentence Plans
• Simple: templates (select & fill)
• Complex: abstract representation
  (SPL: Sentence Planning Language)




     Example SPL Expression
(S1/exist
 :object (O1/train
          :cardinality 20
          :relations ((R1/period
                       :value daily)
                      (R2/source
                       :value Aberdeen)
                      (R3/destination
                       :value Glasgow))))
There are 20 trains a day from Aberdeen to Glasgow
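To show how a realizer might turn such an expression into the sentence above, here is a toy sketch. It assumes a hypothetical nested-dict transcription of the SPL expression (this is not an SPL parser) and one hand-written phrase rule per relation type; real realizers use full grammars.

```python
# The SPL expression above transcribed into a nested Python dict
# (a hypothetical encoding; this is not an SPL parser).
spl = {
    "type": "exist",
    "object": {
        "type": "train",
        "cardinality": 20,
        "relations": [
            {"type": "period", "value": "daily"},
            {"type": "source", "value": "Aberdeen"},
            {"type": "destination", "value": "Glasgow"},
        ],
    },
}

# Assumed toy realization rules, one per relation type.
RELATION_PHRASES = {
    "period": lambda v: "a day" if v == "daily" else v,
    "source": lambda v: f"from {v}",
    "destination": lambda v: f"to {v}",
}


def realize_exist(expr: dict) -> str:
    """Realize an 'exist' plan as 'There is/are N <noun>(s) <modifiers>'."""
    obj = expr["object"]
    n = obj["cardinality"]
    verb = "is" if n == 1 else "are"
    noun = obj["type"] if n == 1 else obj["type"] + "s"  # naive plural
    mods = [RELATION_PHRASES[r["type"]](r["value"]) for r in obj["relations"]]
    return " ".join(["There", verb, str(n), noun] + mods)


print(realize_exist(spl))
```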

   Content Determination
• Messages (raw content)
• User Model (influences content)
• Is Reasoning Required?
 Find a train from Aberdeen to Leeds
 (It requires two trains to get there)
• Deep Reasoning Systems
  – represent the user’s goals as well as
    any immediate query
  – utilize plan recognition & reasoning
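The Aberdeen-to-Leeds query shows why content determination may need reasoning: the messages to convey depend on a route the system has to find. A minimal sketch, assuming a hypothetical table of direct services (not a real timetable) and breadth-first search as the reasoning step:

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical direct services (not a real timetable).
DIRECT: Dict[str, List[str]] = {
    "Aberdeen": ["Glasgow", "Edinburgh"],
    "Edinburgh": ["Leeds"],
    "Glasgow": [],
    "Leeds": [],
}


def find_route(start: str, goal: str) -> Optional[List[str]]:
    # Breadth-first search returns a route with the fewest trains.
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in DIRECT.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def content_for_query(start: str, goal: str) -> List[str]:
    # Content determination: which messages to send depends on the route found.
    route = find_route(start, goal)
    if route is None:
        return [f"There is no rail connection from {start} to {goal}"]
    msgs = [f"Take a train from {a} to {b}" for a, b in zip(route, route[1:])]
    if len(msgs) > 1:
        msgs.insert(0, f"There is no direct train; change at {', '.join(route[1:-1])}")
    return msgs


print(content_for_query("Aberdeen", "Leeds"))
```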

     Discourse Planning
• Structure messages into a
  coherent text
• Example: start with a summary,
  then give details
• Discourse relations, e.g.:
  – elaboration: More specifically, X
  – exemplification: For example, X
  – contrast / exception: However, X
• Rhetorical Structure Theory (RST)
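The "summary first, then details" strategy with cue phrases for each discourse relation can be sketched as follows. This is a flat sequence, far simpler than a full RST tree; the `CUES` table reuses the cue phrases listed above, and the example messages are hypothetical.

```python
from typing import List, Tuple

# Cue phrases for the discourse relations listed above.
CUES = {
    "elaboration": "More specifically, ",
    "exemplification": "For example, ",
    "contrast": "However, ",
}


def plan_discourse(summary: str, details: List[Tuple[str, str]]) -> str:
    """Summary first, then each (relation, clause) detail, introduced by
    the cue phrase for its relation. Clauses are given in lower case."""
    parts = [summary[0].upper() + summary[1:] + "."]
    for relation, clause in details:
        parts.append(CUES[relation] + clause + ".")
    return " ".join(parts)


print(plan_discourse(
    "there are 20 trains a day from Aberdeen to Glasgow",
    [("elaboration", "the first train leaves at 6am"),
     ("contrast", "there is no service on Christmas Day")],
))
```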

   Sentence Aggregation
• No aggregation (1 sentence / message)
• Relative Clause
 ...which leaves at 10am
• Conjunction
 ...and the next train is the express
• Combinations
 ...and the next train is the express
 which leaves at 10am
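The four aggregation options can be reproduced from three messages. A sketch under simplifying assumptions: the first message ("the first train is the local") is hypothetical, and the third message is passed as the noun phrase it describes plus a verb phrase, so it can either stand alone or become a relative clause.

```python
from typing import List


def cap(clause: str) -> str:
    # Orthography: capital letter and full stop.
    return clause[0].upper() + clause[1:] + "."


def aggregate(a: str, b: str, b_object: str, c_pred: str,
              relative: bool, conjoin: bool) -> List[str]:
    """Messages: a, b, and c = "<b_object> <c_pred>" (a fact about b's object)."""
    if relative:
        # Relative clause: message c is folded into b.
        b = f"{b} which {c_pred}"
        rest: List[str] = []
    else:
        rest = [f"{b_object} {c_pred}"]
    # Conjunction: messages a and b share one sentence.
    head = [f"{a} and {b}"] if conjoin else [a, b]
    return [cap(s) for s in head + rest]


A = "the first train is the local"
B = "the next train is the express"
for rel, conj in [(False, False), (True, False), (False, True), (True, True)]:
    print(" ".join(aggregate(A, B, "the express", "leaves at 10am", rel, conj)))
```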

             Lexicalization
• Choosing words to realize concepts
  or relations
• Example:
  (action/change
   (measure outside_temperature)
   (delta (quantity/deg_F -10)))

 The temperature dropped 10 degrees
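Lexicalizing the change concept above means picking a verb from the content (the sign of the delta) and a noun for the measured quantity. A minimal sketch; the `NOUNS` lexicon and the function name are hypothetical:

```python
# Hypothetical lexicon for measured quantities.
NOUNS = {"outside_temperature": "the temperature"}


def lexicalize_change(measure: str, delta: float) -> str:
    """Realize (action/change (measure M) (delta (quantity/deg_F D))).

    The verb is chosen from the sign of the change, so the same concept
    yields "dropped" or "rose"; the noun comes from the measure.
    """
    subject = NOUNS[measure]
    subject = subject[0].upper() + subject[1:]
    if delta == 0:
        return f"{subject} did not change"
    verb = "dropped" if delta < 0 else "rose"
    amount = abs(delta)
    unit = "degree" if amount == 1 else "degrees"
    return f"{subject} {verb} {amount:g} {unit}"


print(lexicalize_change("outside_temperature", -10))
```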

 Lexical Selection Rules
(*A-INGEST
  (AGENT *O-BOB)
  (PATIENT *O-MILK)) => "drink"

(*A-INGEST
  (AGENT *O-BOB)
  (PATIENT *O-CHOCOLATE)) => "eat"
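The two rules above differ only in the patient, so one way to generalize them is to key the verb choice on a semantic class of the patient. The classes `liquid` and `solid` and both tables are assumptions made for this sketch:

```python
# Hypothetical semantic classes for the patients in the rules above.
PATIENT_CLASS = {"*O-MILK": "liquid", "*O-CHOCOLATE": "solid"}
# One rule per class: the verb for *A-INGEST depends on what is ingested.
INGEST_VERB = {"liquid": "drink", "solid": "eat"}


def select_verb(frame: dict) -> str:
    if frame["head"] != "*A-INGEST":
        raise ValueError(f"no lexical selection rule for {frame['head']}")
    return INGEST_VERB[PATIENT_CLASS[frame["PATIENT"]]]


print(select_verb({"head": "*A-INGEST", "AGENT": "*O-BOB", "PATIENT": "*O-MILK"}))
```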




              Case Creation
• Additional structure is required to
  realize the meaning of the
  semantic representation
(*A-KICK
  (AGENT *O-JOHN)
  (PATIENT *O-BALL))

"John propelled the ball with his foot"

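Case creation can be sketched as a lexicon entry that pairs a general verb with a case role the input frame does not contain. The lexicon, the name table and the function are hypothetical; the created phrase "with his foot" is fixed text, so agreement with the agent is ignored (it assumes a male agent).

```python
# Hypothetical lexicon: no single verb for *A-KICK is available, so the
# entry pairs a more general verb with a case that must be created to
# carry the rest of the meaning.
CREATION_LEXICON = {
    "*A-KICK": ("propelled", {"INSTRUMENT": "with his foot"}),
}
NAMES = {"*O-JOHN": "John", "*O-BALL": "the ball"}


def realize_with_created_cases(frame: dict) -> str:
    verb, created_cases = CREATION_LEXICON[frame["head"]]
    words = [NAMES[frame["AGENT"]], verb, NAMES[frame["PATIENT"]]]
    # The created INSTRUMENT case has no counterpart in the input frame.
    words.extend(created_cases.values())
    return " ".join(words)


kick = {"head": "*A-KICK", "AGENT": "*O-JOHN", "PATIENT": "*O-BALL"}
print(realize_with_created_cases(kick))
```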


         Case Absorption
• Word chosen to realize a semantic
  head also implies the meaning
  conveyed by a semantic role
  (*A-FILE-LEGAL-ACTION
    (AGENT *O-BOB)
    (PATIENT *O-SUIT)
    (RECIPIENT *O-ACME))

  "Bob sued Acme"

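Case absorption is the mirror image: the chosen verb already expresses one of the roles, so that role is not realized separately. A sketch with a hypothetical lexicon entry that marks which roles "sued" absorbs:

```python
# Hypothetical lexicon entry: "sued" realizes the head and also absorbs
# the PATIENT role, so *O-SUIT gets no words of its own.
ABSORB_LEXICON = {
    "*A-FILE-LEGAL-ACTION": {"verb": "sued", "absorbs": {"PATIENT"}},
}
NAMES = {"*O-BOB": "Bob", "*O-ACME": "Acme", "*O-SUIT": "a suit"}
OBJECT_ROLES = ("PATIENT", "RECIPIENT")  # realized after the verb, in order


def realize_absorbed(frame: dict) -> str:
    entry = ABSORB_LEXICON[frame["head"]]
    words = [NAMES[frame["AGENT"]], entry["verb"]]
    for role in OBJECT_ROLES:
        # Roles absorbed by the verb contribute no words.
        if role in frame and role not in entry["absorbs"]:
            words.append(NAMES[frame[role]])
    return " ".join(words)


suit = {"head": "*A-FILE-LEGAL-ACTION", "AGENT": "*O-BOB",
        "PATIENT": "*O-SUIT", "RECIPIENT": "*O-ACME"}
print(realize_absorbed(suit))
```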
Referring Expression Generation
 • Initial introduction
  A man in the park looked up
 • Pronouns
  He saw a bird fly over
 • Definite Descriptions
  The man covered his head with a
  newspaper
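One simple way to choose among the three forms above is sketched below. The rules are an assumption for illustration, not the slide's algorithm: first mention gets an indefinite description, a later mention gets a pronoun if the entity is the most recent one that pronoun could refer to, and otherwise a definite description. The second entity (`boy`) is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Entity:
    id: str
    noun: str       # head noun, e.g. "man"
    modifier: str   # detail used only on first mention
    pronoun: str    # e.g. "he"


def refer(e: Entity, mentioned: Set[str], recent: List[Entity]) -> str:
    """Hypothetical rules: first mention -> indefinite description;
    later mention that is the most recent entity taking the same pronoun
    -> pronoun; otherwise -> definite description."""
    if e.id not in mentioned:
        mentioned.add(e.id)
        return f"a {e.noun} {e.modifier}"
    same = [x for x in recent if x.pronoun == e.pronoun]
    if same and same[-1].id == e.id:
        return e.pronoun
    return f"the {e.noun}"


man = Entity("m1", "man", "in the park", "he")
boy = Entity("b1", "boy", "with a kite", "he")
seen: Set[str] = set()
print(refer(man, seen, []))          # first mention
print(refer(man, seen, [man]))       # unambiguous: pronoun
print(refer(man, seen, [man, boy]))  # "he" could mean the boy: definite NP
```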


       Fixing Robot Text
• Start [the engine]i and run [the
  engine]i until [the engine]i reaches
  normal operating temperature
• Start []i and run [the engine]i until
  [it]i reaches normal operating
  temperature
• The second version introduces ellipsis
  ([]i) and anaphora ([it]i)
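The fix can be sketched for this one sentence pattern: conjoined verbs sharing an object, followed by an "until" clause about the same entity. The function and its parameters are hypothetical, and the pronoun is fixed to "it", which assumes a singular inanimate object.

```python
from typing import List


def instruct(verbs: List[str], obj: str, until_pred: str, fix: bool) -> str:
    """Realize 'V1 ... Vn OBJ until OBJ PRED' as one sentence.

    fix=False gives the "robot text" with every mention spelled out;
    fix=True elides the object of all but the last verb (ellipsis) and
    refers back to it with a pronoun (anaphora).
    """
    if fix:
        text = f"{' and '.join(verbs)} {obj} until it {until_pred}"
    else:
        clauses = " and ".join(f"{v} {obj}" for v in verbs)
        text = f"{clauses} until {obj} {until_pred}"
    return text[0].upper() + text[1:]


for fix in (False, True):
    print(instruct(["start", "run"], "the engine",
                   "reaches normal operating temperature", fix))
```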

             Journalistic Style
“A dissident Spanish priest was charged here today
with attempting to murder the Pope. Juan Fernandez
Krohn, aged 32, was arrested after a man armed with
a bayonet approached the Pope while he was saying
prayers at Fatima on Wednesday night. According to
the police, Fernandez told the investigating
magistrates today, he trained for the past six months
for the assault. If found guilty, the Spaniard faces a
prison sentence of 15-20 years.”
(Brown and Yule, 1983)



                    Summary
• 6 Basic Steps in NLG
• Architectures group these steps
  into different modules
• Input / output / approach depend
  on the domain
• Design of internal data structures
  depends on complexity of task

