What is Natural Language Processing (NLP)

Natural Language Processing

What is Natural Language Processing (NLP)?

  • The process of building computational models for understanding
    natural language.

  • INPUT: natural language text.

  • OUTPUT: representation of the meaning of the text.

  • Sometimes called natural language understanding (NLU) or
    computational linguistics.

Course Goals

  • To study algorithms and methods for building computational models
    of natural language understanding, such as: parsing techniques,
    semantic representations, discourse analysis, and statistical and
    corpus-based methods for text analysis and knowledge acquisition

  • To study issues involved in understanding natural languages, such as
    cognitive and linguistic phenomena

  • To learn about applications that can benefit from NLP

By the end of the course, everyone will have the skills necessary to build
NLP tools and an understanding of (and appreciation for!) the issues
involved in processing natural language.




Related Disciplines

Computer Science
   artificial intelligence

Linguistics
   computational linguistics, psycholinguistics

Psychology
   cognitive psychology

Statistics
   probabilistic methods, information theory

Applications for Natural Language Processing

  • Database Query Interfaces

  • Intelligent Tutoring Systems

  • Machine Translation

  • Information Retrieval / Search Engines:
    Retrieval, Categorization, Routing, Filtering, Summarization

  • Information Extraction

  • Question Answering Systems

  • Speech Recognition & Spoken Language Understanding




Levels of Analysis and Knowledge Used in NLP

Morphology: how words are constructed; prefixes & suffixes

Syntax: structural relationships between words

Semantics: meanings of words, phrases, and expressions

Discourse: relationships across different sentences or thoughts;
    contextual effects

Pragmatics: the purpose of a statement; how we use language to
    communicate

World Knowledge: facts about the world at large; common sense

Morphology

  • kick, kicks, kicked, kicking

  • sit, sits, sat, sitting

  • murder, murders

But it’s not just as simple as adding and deleting endings...

  • gorge, gorgeous

  • glass, glasses

  • arm, army
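
The morphology examples can be made concrete with a minimal sketch of naive suffix stripping. The suffix list `SUFFIXES` and the function `naive_stem` are hypothetical names invented for this illustration; this is not a real stemmer and not a method taken from the course. It handles the regular forms of "kick" but wrongly relates "gorgeous" to "gorge" and "army" to "arm", and it cannot connect "sat" to "sit".

```python
# Hypothetical suffix rules, tried in this order; purely illustrative.
SUFFIXES = ["ing", "ous", "ed", "es", "s", "y"]


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    words = ["kicks", "kicked", "kicking", "sitting", "sat",
             "murders", "gorgeous", "glasses", "army"]
    for w in words:
        print(f"{w:10s} -> {naive_stem(w)}")
```

Running it shows the regular cases collapsing to "kick" and "murder", while "sitting" becomes "sitt" and "sat" is left unchanged: irregular forms and spelling changes need more than a list of endings.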






Morphology Humor

Syntax: part-of-speech tagging

     The boy threw a ball to the brown dog.








Language Understanding Humor

Syntax: structural ambiguity (part of speech)

     Time flies like an arrow.
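
The part-of-speech ambiguity in "Time flies like an arrow" can be sketched by enumerating every tag sequence a small lexicon allows. The lexicon `LEXICON`, the tag names, and the whole-sentence patterns in `SENTENCE_PATTERNS` are assumptions made up for this illustration; the patterns stand in for a real grammar, which this sketch does not implement.

```python
from itertools import product

# Hypothetical toy lexicon: each word lists the tags it can take.
LEXICON = {
    "time": ["N", "V"],
    "flies": ["N", "V"],
    "like": ["V", "P"],
    "an": ["Det"],
    "arrow": ["N"],
}

# Hypothetical whole-sentence tag patterns standing in for a grammar.
SENTENCE_PATTERNS = {
    ("N", "V", "P", "Det", "N"),  # time passes the way an arrow does
    ("N", "N", "V", "Det", "N"),  # "time flies" (insects) are fond of an arrow
    ("V", "N", "P", "Det", "N"),  # command: time the flies as an arrow would
}


def tag_sequences(sentence):
    """Return every combination of tags the lexicon permits for the sentence."""
    words = sentence.lower().rstrip(".").split()
    return list(product(*(LEXICON[w] for w in words)))


def plausible_readings(sentence):
    """Keep only the tag sequences that match one of the toy patterns."""
    return [tags for tags in tag_sequences(sentence) if tags in SENTENCE_PATTERNS]


if __name__ == "__main__":
    sentence = "Time flies like an arrow."
    print(len(tag_sequences(sentence)), "candidate tag sequences")
    for tags in plausible_readings(sentence):
        print(" ".join(tags))
```

With this lexicon there are eight candidate tag sequences and three survive the patterns, one per reading of the sentence.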








Syntax: structural ambiguity (attachment)

     I saw the Grand Canyon flying to New York.

     I saw the man on the hill with the telescope.

But syntax doesn’t tell us much about meaning

  • Colorless green ideas sleep furiously. [Chomsky]

  • fire match arson hotel

  • “dog collar” vs. “flea collar”

  • plastic cat food can cover








Word Sense Humor

Semantics: lexical ambiguity

I walked to the bank ...
     of the river.
     to get money.

The bug in the room ...
     was planted by spies.
     flew out the window.

I work for John Hancock ...
     and he is a good boss.
     which is a good company.
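
One simple way to see how the continuation picks the sense of "bank" or "bug" is a cue-word overlap heuristic: count how many words in the sentence appear in a hand-written cue list for each sense. The sense inventory `SENSES`, its cue words, and the function `disambiguate` are all hypothetical and chosen only to fit the examples above; this is a sketch of the idea, not a method presented in the course.

```python
# Hypothetical sense inventory: sense label -> cue words that suggest it.
SENSES = {
    "bank": {
        "river bank": {"river", "water", "shore", "fishing"},
        "financial institution": {"money", "loan", "deposit", "account"},
    },
    "bug": {
        "listening device": {"planted", "spies", "microphone", "wiretap"},
        "insect": {"flew", "crawled", "window", "wings"},
    },
}


def disambiguate(word, sentence):
    """Return the sense whose cue words overlap most with the sentence.

    Returns None when no cue word from any sense appears in the sentence.
    """
    context = set(sentence.lower().replace(".", "").split())
    scores = {sense: len(cues & context) for sense, cues in SENSES[word].items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(disambiguate("bank", "I walked to the bank of the river."))
    print(disambiguate("bank", "I walked to the bank to get money."))
    print(disambiguate("bug", "The bug in the room was planted by spies."))
    print(disambiguate("bug", "The bug in the room flew out the window."))
```

The heuristic has nothing to say about "I walked to the bank" on its own, which is the point of the slide: the sense is fixed only by what follows.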








Discourse: coreference

     President John F. Kennedy was assassinated.
     The president was shot yesterday.

     Relatives said that John was a good father.
     JFK was the youngest president in history.

     His family will bury him tomorrow.
     Friends of the Massachusetts native will hold a candlelight
          service in Mr. Kennedy’s home town.

Pragmatics

Rules of Conversation

  • Can you tell me what time it is?

  • Could I please have the salt?

Speech Acts

  • I bet you $50 that the Jazz will win.

  • Will you marry me?







Pragmatics Humor

World Knowledge

     John went to the diner. He ordered a steak. He left a tip and went home.

     John wanted to commit suicide. He got a rope.








History of NLP: 1950-1960

  • The Dartmouth Conference (1956)
    The birth of artificial intelligence (AI)

  • Heuristic search (Newell & Simon, 1956)

  • LISP (McCarthy, 1960)

  • Machine Translation

Early Research In Machine Translation

     I must go home
     → Ich muss nach Hause gehen.

     hydraulic ram
     → water sheep

     out of sight, out of mind
     → blind, crazy

     The spirit is willing but the flesh is weak.
     → The vodka is good but the meat is rotten.







History of NLP: 1960-1970

THEMES: limited domains & applications; engineering.

BASEBALL (Green et al., 1963): database query generation

     Ex: “Who did the Yankees play on July 7?”
     → (OR (July 7 Yankees ?ANSWER) (July 7 ?ANSWER Yankees))

STUDENT (Bobrow, 1968): translated algebra word problems

     Ex: “John’s age is two times Mary’s age”
     → John Age = 2 * Mary Age

ELIZA (Weizenbaum, 1966)
    engaged in dialogue as psychotherapist & patient

A Sample Dialogue with ELIZA

USER: Men are all alike.
ELIZA: In what way?
USER: They’re always bugging us about something or other.
ELIZA: Can you think of a specific example?
USER: Well, my boyfriend made me come here.
ELIZA: Your boyfriend made you come here.
USER: He says I’m depressed much of the time.
ELIZA: I am sorry to hear you are depressed.
USER: It’s true, I am unhappy.
ELIZA: Do you think coming here will help you not to be unhappy?
USER: I need some help; that much seems certain.
ELIZA: What would it mean to you if you got some help?
USER: Perhaps I could learn to get along with my mother.
ELIZA: Tell me more about your family.




USER: My mother takes care of me.
ELIZA: Who else in your family takes care of you?
USER: My father.
ELIZA: Your father.
USER: You are like my father in some ways.
ELIZA: What resemblance do you see?
USER: You are not very aggressive but I think you don’t want me to
notice that.
ELIZA: What makes you think I am not very aggressive?
USER: You don’t argue with me.
ELIZA: Why do you think I don’t argue with you?
USER: You are afraid of me.
ELIZA: Does it please you to think I am afraid of you?
USER: My father is afraid of everybody.
ELIZA: What else comes to mind when you think of your father?

The Truth Behind ELIZA: Keywords and Pattern Matching!

Each keyword had associated transformations, such as:

     <X> you <Y> me
     → “What makes you think I <Y> you?”

     <X> are you <Y>
     → “Would you prefer it if I weren’t <Y>?”

     alike
     → “In what way?”
     → “What resemblance do you see?”

If no patterns matched, then default strings were printed.
Ex: “Please go on” or “Very interesting”
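
The keyword-and-pattern idea can be sketched in a few lines. This is a minimal illustration, not Weizenbaum's program: the rule table `RULES`, the function `respond`, and the `turn` argument are hypothetical, and the first pattern is written as `<X> you <Y> me`, the form that yields replies such as "What makes you think I ... you?" in the sample dialogue. Real ELIZA scripts ranked keywords and offered several responses per pattern; this sketch tries three rules in a fixed order.

```python
import re

# Hypothetical rule table: (pattern, response template), tried in order.
RULES = [
    (re.compile(r".*\byou (?P<y>.+) me\b", re.I), "What makes you think I {y} you?"),
    (re.compile(r".*\bare you (?P<y>.+)", re.I), "Would you prefer it if I weren't {y}?"),
    (re.compile(r".*\balike\b", re.I), "In what way?"),
]

# Default strings printed when no pattern matches.
DEFAULTS = ["Please go on.", "Very interesting."]


def respond(utterance, turn=0):
    """Return the response of the first matching rule, else a default string."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            # Fill the template with the text captured by the pattern.
            return template.format(**match.groupdict())
    return DEFAULTS[turn % len(DEFAULTS)]


if __name__ == "__main__":
    for line in ["Men are all alike.", "You don't argue with me.",
                 "Are you happy?", "My father."]:
        print("USER: ", line)
        print("ELIZA:", respond(line))
```

Feeding it "You are afraid of me." gives "What makes you think I are afraid of you?", which shows how blindly echoing captured text produces ungrammatical replies.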



Limitations of ELIZA

  • conversations quickly got repetitive.

  • typos often got echoed back to the user.

  • responses sometimes were completely ungrammatical.

  • most people couldn’t be fooled for very long!

History of NLP: 1970-1980

THEMES: semantic information processing; strong methods.

LUNAR (Woods, 1970) – augmented transition networks (ATNs)

SHRDLU (Winograd, 1972)
   procedural semantics: word “definitions” were actions executed via
   program fragments.

Conceptual Knowledge Structures
  • conceptual dependency theory (Schank, 1975)
  • scripts, plans, goals (Schank & Abelson, 1977)
  • MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM
    (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert,
    1977), Politics (Carbonell, 1979), Plot Units (Lehnert, 1981)...



History of NLP: 1980-1990

THEMES: general processing techniques; weak methods.

KODIAK (Wilensky, 1986)
   knowledge representation language

Massively parallel parsing (Waltz & Pollack, 1985)

Marker Passing
   WIMP (Charniak, 1986), FAUSTUS (Norvig, 1987)

History of NLP: 1991 - present

  • statistical and machine learning methods are dominant!

  • emphasis on empirical methods with very large corpora

  • unconstrained, real texts

  • automated knowledge acquisition

  • but deep semantics only in limited domains

  • large-scale evaluations

  • some areas starting to break through commercially!

