        Artificial Intelligence
Natural Language Processing

      Dr Alexiei Dingli

                              1
        Aims of NLP?
• Trying to make computers talk
• Give computers the linguistic
  abilities of humans




                                  2
           1940’s - 1950’s

• Turing (1936)
   – model of algorithmic computation

• McCulloch-Pitts neuron (McCulloch and Pitts, 1943)
   – a simplified model of the neuron as a kind of
     computing element (propositional logic)

•   Kleene (1951) and (1956)
    – finite automata and regular expressions.

• Shannon (1948)
   – probabilistic models of discrete Markov
     processes to automata for language.

• Chomsky (1956)
   – finite state machines as a way to characterize a
     grammar                                         3
              1940’s - 1950’s

Speech and language processing
• Shannon
   – metaphor of the noisy channel
   – entropy as a way of measuring the information
     capacity of a channel
• Foundational research in phonetics
• First machine speech recognizers (early
  1950s).
   – 1952, Bell Labs, statistical system that could
     recognize any of the 10 digits from a single
     speaker (Davis et al., 1952)



                                                     4
             1940’s - 1950’s

Machine translation (MT): one of the
  earliest applications of computers
• Major attempts in US and USSR
   – Russian to English and reverse
• Georgetown University, Washington
  system:
   – Translated sample texts in 1954
• The ALPAC report (1966)
   – Assessed research results of groups working
     on MT
       •   Concluded: MT not possible in near future
        •   Funding should cease for MT!
       •   Basic research should be supported
       •   Word to word translation does not work
             – Linguistic Knowledge is needed
                                                       5
       1950’s - 1970’s
     Symbolic paradigm
Formal language theory and
  generative syntax
• 1957 Noam Chomsky's Syntactic
  Structures
  – A formal definition of grammars and
    languages
  – Provides the basis for an automatic
    syntactic processing of NL expressions

• 1967 : Woods procedural
  semantics
  – A procedural approach to the
    meaning of a sentence
  – Provides the basis for an automatic
    semantic processing of NL expressions
                                             6
        1950’s - 1970’s
      Symbolic paradigm

Parsing algorithms
  – top-down and bottom-up
  – dynamic programming
  – Transformations and Discourse
    Analysis Project (TDAP)
     • Harris, 1962
     • Joshi and Hopely (1999) and
       Karttunen (1999),
     • cascade of finite-state transducers
                                             7
         1950’s - 1970’s
       Symbolic paradigm

AI
• Summer of 1956: John McCarthy, Marvin
   Minsky, Claude Shannon, and Nathaniel
   Rochester
    – work on reasoning and logic
• Newell and Simon - the Logic Theorist and
  the General Problem Solver
• Early natural language understanding systems
    – Limited domains
    – Combination of pattern matching and keyword
      search
    – Simple heuristics for reasoning and question-
      answering

•   Late 1960s - more formal logical systems

                                                      8
           1950’s - 1970’s
         Statistical paradigm
• Bayesian methods applied to the problem of
  optical character recognition.
    – Bledsoe and Browning (1959) : Bayesian text-
      recognition
        • a large dictionary
        • compute the likelihood of each observed letter
          sequence given each word in the dictionary
        • likelihood computed by multiplying the
          likelihoods for each letter.

• Bayesian methods to the problem of authorship
  attribution on The Federalist papers
    – Mosteller and Wallace (1964)
• Testable psychological models of human language
  processing based on transformational grammar
• Resources
    – First online corpora: the Brown corpus of American
      English
    – DOC (Dictionary on Computer)
    – an on-line Chinese dialect dictionary.                              9
      Symbolic vs statistical
          approaches
Symbolic
• Based on hand written rules
• Requires linguistic expertise
• No frequency information
• More brittle and slower than statistical approaches
• Often more precise than statistical approaches
• Error analysis is usually easier than for statistical
   approaches

Statistical
• Supervised or unsupervised
• Rules acquired from large size corpora
• Not much linguistic expertise required
• Robust and quick
• Requires large size (annotated) corpora
• Error analysis is often difficult                   10
              1970-1983
         Statistical paradigm

Speech recognition algorithms
• Hidden Markov model (HMM) and the
  metaphors of the noisy channel and decoding
   – Jelinek, Bahl, Mercer, and colleagues at IBM’s
     Thomas J. Watson Research Center,
   – Baker at Carnegie Mellon University




                                                      11
           1970-1983
     Logic-based paradigm

• Q-systems and metamorphosis
  grammars (Colmerauer, 1970,
  1975)
• Definite Clause Grammars
  (Pereira and Warren, 1980)
• Functional grammar (Kay,1979)
• Lexical Functional Grammar (LFG)
  (Bresnan and Kaplan’s,1982)
                                  12
           1970-1983
Natural Language Understanding

• SHRDLU system : simulated a robot
  embedded in a world of toy blocks
  (Winograd, 1972a).
   – natural-language text commands
      • Move the red block on top of the smaller
        green one
      • complexity and sophistication
   – first to attempt to build an extensive (for
     the time) grammar of English (based on
     Halliday’s systemic grammar)

                                                   13
          1970-1983
       Natural Language
        Understanding
• Yale School : series of language
  understanding programs
  – conceptual knowledge (scripts,
    plans, goals..)
  – human memory organization
  – network-based semantics (Quillian,
    1968)


                                     14
                 1983-1993
• Return of finite-state models
    – Finite-state phonology and morphology (Kaplan and
      Kay, 1981)
    – Finite-state models of syntax by Church (1980).

• Return of empiricism
    – Probabilistic models throughout speech and language
      processing,
        • IBM Thomas J. Watson Research Center: probabilistic
          models of speech recognition.
        • Data-driven approaches
    – Spread to part-of-speech tagging, parsing, attachment
      ambiguities, semantics.

• New focus on model evaluation

• Considerable work on natural language generation
                                                                15
                   1994-1999
Major changes
• Probabilistic and data-driven models had become quite
  standard
• Parsing, part-of-speech tagging, reference resolution,
  and discourse processing
    – Algorithms incorporate probabilities
    – Evaluation methodologies from speech recognition and
      information retrieval.
• Increases in the speed and memory of computers
    – commercial exploitation (speech recognition, spelling and
      grammar correction)

• Rise of the Web
    – need for language-based information retrieval and
      information extraction.


                                                             16
           1994-1999
      Resources and corpora
• Disk space becomes cheap
• Machine readable text become common
• US funding emphasises large scale
  evaluation on “real data”
• 1994 : The British National Corpus is
  made available
   – A balanced corpus of British English

• Mid 1990s : WordNet (Fellbaum & Miller)
   – A computational thesaurus developed by
     psycholinguists

• The World Wide Web used as a corpus
                                              17
               2000-2008
            Empiricist trends 1
• Spoken and written material widely available
    – Linguistic Data Consortium (LDC) ...
    – Annotated collections (standard text sources with
      various forms of syntactic, semantic, and pragmatic
      annotations)
        •   Penn Treebank (Marcus et al., 1993),
        •   PropBank (Palmer et al., 2005),
        •   TimeBank (Pustejovsky et al., 2003b)
        •   ....

    – More complex traditional problems castable in
      supervised machine learning
        • Parsing and semantic analysis

    – Competitive evaluations
        • Parsing (Dejean and Tjong Kim Sang, 2001),
        • Information extraction (NIST, 2007a; Tjong Kim Sang, 2002;
          Tjong Kim Sang and De Meulder, 2003)
        • Word sense disambiguation (Palmer et al., 2001; Kilgarriff
          and Palmer, 2000)
        • Question answering (Voorhees and Tice, 1999), and
          summarization (Dang, 2006).                              18
         2000-2008
      Empiricist trends 2

• More serious interplay with the
  statistical machine learning
  community
  – Support vector machines (Boser
    et al., 1992; Vapnik, 1995)
  – Maximum entropy techniques
    (multinomial logistic regression)
    (Berger et al., 1996)
  – Graphical Bayesian models (Pearl,
    1988)
                                        20
           2000-2008
        Empiricist trends 2

Largely unsupervised statistical
  approaches
   – Statistical approaches to machine
     translation (Brown et al., 1990; Och and
      Ney, 2003)
   – Topic modeling (Blei et al., 2003)

• Effective applications could be
  constructed from systems trained on
  unannotated data alone

• Use of unsupervised techniques

                                                21
    Elements of a Language
•   Phonemes
•   Morphemes
•   Syntax
•   Semantics




                             22
   From sounds to language
• Linked with language understanding
• Carried out by the auditory cortex

• Basic sounds of language are Phonemes (sound)
    – Smallest phonetic unit in a language
    – Capable of conveying a distinction in meaning.
        • Eg: "M", in "man," and "c", in "can," are phonemes.
    – Every language has a discrete set of phonemes
      describing all its possible sounds

• Basic units of words are Morphemes (to change form)
    – A meaningful linguistic unit
    – Consisting of a root word or a word element that
      cannot be divided into smaller meaningful parts.
        • Eg: "Pick" and "s", in the word "picks," are morphemes

                                                                   23
              NATO Phonetic Alphabet
A - Alpha     K - Kilo       U - Uniform           0 - Zero

B - Bravo     L - Lima       V - Victor            1 - Wun (One)

C - Charlie   M - Mike       W - Whiskey           2 - Two


D - Delta     N - November   X - X-ray             3 - Tree (Three)

E - Echo      O - Oscar      Y - Yankee            4 - Fower (Four)

F - Foxtrot   P - Papa       Z - Zulu              5 - Fife (Five)

G - Golf      Q - Quebec                           6 - Six

H - Hotel     R - Romeo      . - decimal (point)   7 - Seven

I - India     S - Sierra     . - (full) stop       8 - Ait (Eight)

J - Juliet    T - Tango                            9 - Niner (Nine)



                                                                     24
                            Exercise
    Word     Morpheme                Phoneme

Bay        Bay (1)           B + ay (2)

Pots       ?                 ?

A          A (1)             A (1)

Teacher    ?                 ?

                                                   25
                            Exercise
    Word     Morpheme                Phoneme

Bay        Bay (1)           B + ay (2)

Pots       Pot + s (2)       P + o + t + s (4)

A          A (1)             A (1)

Teacher    Teach + er (2)    T + ea + ch + e + r (5)



                                                   26
             Syntax
      structure of language

• Languages have structure:
  – not all sequences of words over the
    given alphabet are valid
  – when a sequence of words is valid
    (grammatical), a natural structure can
    be induced on it



                                        27
                 Syntax
• Describes the constituent structure
  of NL expressions
   – (I (am sorry)), Dave, ( I ((can’t do)
     that))
• Grammars are used to describe the
  syntax of a language
• Syntactic analysers and surface
  realisers assign a syntactic structure
  to a string/semantic representation
  on the basis of a grammar

                                             28
               Syntax
• It is useful to think of this
  structure as a tree:
   – represents the syntactic structure of
     a string according to some formal
     grammar.
   – the interior nodes are labeled by
     non-terminals of the grammar,
     while the leaf nodes are labeled by
     terminals of the grammar

                                         29
                Syntax
             tree example

[Tree diagram for “John often gives a book to Mary”, in bracketed form:]

(S (NP John)
   (VP (V (Adv often) (V gives))
       (NP (Det a) (N book))
       (PP (Prep to) (NP Mary))))

                                                          30
     Methods in syntax
Words → syntactic tree
  – Algorithm: parser
     • A parser checks for correct syntax and builds
       a data structure.
  – Resources used: Lexicon + Grammar
  – Symbolic : hand-written grammar and
    lexicon
  – Statistical : grammar acquired from
    treebank
     • Treebank : text corpus in which each
       sentence has been annotated with syntactic
       structure.
     • Syntactic structure is commonly represented
       as a tree structure, hence the name
       treebank.
  – Difficulty: coverage and ambiguity
                                                 31
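
A minimal sketch of the symbolic route above: a hand-written toy
grammar and lexicon fed to an off-the-shelf chart parser (NLTK's CFG
and ChartParser; the grammar below is invented for this example):

  import nltk

  # Toy hand-written grammar + lexicon (illustrative only)
  grammar = nltk.CFG.fromstring("""
  S -> NP VP
  NP -> 'John' | 'Mary' | Det N
  VP -> V NP PP
  PP -> P NP
  Det -> 'a'
  N -> 'book'
  V -> 'gives'
  P -> 'to'
  """)

  # The parser checks that the string is grammatical and builds the tree
  parser = nltk.ChartParser(grammar)
  for tree in parser.parse("John gives a book to Mary".split()):
      print(tree)
  # (S (NP John) (VP (V gives) (NP (Det a) (N book)) (PP (P to) (NP Mary))))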
              Syntax
            applications
• For spell checking
   – *its a fair exchange  No syntactic
     tree
   – It’s a fair exchange  ok syntactic tree

• To construct the meaning of a
  sentence

• To generate a grammatical sentence

                                                32
Syntax to meaning

  John loves Mary

      love(j,m)




                    33
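
A toy sketch of this syntax-to-meaning step: each word denotes a value
or a function, and the tree dictates the order of application (the
Python encoding below is an invented illustration):

  # Word meanings: constants for the names, a curried function for the verb
  j, m = "j", "m"
  loves = lambda obj: lambda subj: f"love({subj},{obj})"

  # Combine as the tree dictates: VP = V + object NP, then S = subject NP + VP
  vp = loves(m)
  print(vp(j))  # -> love(j,m)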
         Semantics
– Where the hell’d you get that idea,
  HAL?

– Dave, although you took thorough
  precautions in the pod against my
  hearing you, I could see your lips
  move




                                        34
         Lexical semantics
         Meaning of words

To get
1. come to have or hold; receive.
2. succeed in attaining, achieving, or experiencing; obtain.
3. experience, suffer, or be afflicted with.
4. move in order to pick up, deal with, or bring.
5. bring or come into a specified state or condition.
6. catch, apprehend, or thwart.
7. come or go eventually or with some difficulty.
8. move or come into a specified position or state
...

An idea
1. a thought or suggestion about a possible course of action.
2. a mental impression.
3. a belief.
4. (the idea) the aim or purpose.

The hell
1. a place regarded in various religions as a spiritual realm of evil
   and suffering, often depicted as a place of perpetual fire beneath
   the earth to which the wicked are sent after death.
2. a state or place of great suffering.
3. a swear word that some people use when they are annoyed or surprised
                                                                                      35
Lexical semantics


        Who is the master?


                - Context?


                - Semantic relations?




                                   36
   Compositional semantics


• Where the hell did you get that idea?

   – “the hell” → a swear word that some people
     use when they are annoyed or surprised or to
     emphasize something
   – “get that idea” → have this belief



                                                    37
Semantic issues in NLP

• Definition and representation of
  meaning
• Meaning construction
• Semantic relations
• Interaction between semantics and
  syntax


                                     38
           Pragmatics
• Knowledge about the kind of
  actions that speakers intend by their
  use of sentences
   – REQUEST: HAL, open the pod bay door.
   – STATEMENT: HAL, the pod bay door is
     open.
   – INFORMATION QUESTION: HAL, is the
     pod bay door open?

• Speech act analysis (politeness,
  irony, greeting, apologizing...)

                                        39
            Discourse
Where the hell'd you get that idea, HAL?




Dave and Frank were planning to
  disconnect me

→ Much of language interpretation is
 dependent on the preceding
 discourse/dialogue
                                       40
 Linguistic knowledge in NLP
           summary

• Phonetics and Phonology —knowledge
  about linguistic sounds
• Morphology —knowledge of the
  meaningful components of words
• Syntax —knowledge of the structural
  relationships between words
• Semantics —knowledge of meaning
• Pragmatics — knowledge of the
  relationship of meaning to the goals and
  intentions of the speaker
• Discourse —knowledge about linguistic
  units larger than a single utterance
                                             41
           Ambiguity
          I made her duck

• I cooked duck for her.
• I cooked duck belonging to her.
• I caused her to quickly lower her
  head or body.



                                      42
            Ambiguity

• Sound-to-text issues:
  – Recognise speech (vs. wreck a nice beach).

• Speech act interpretation
  – Can you switch on the computer?
     • Question or request?




                                       43
 Ambiguity vs paraphrase
• Ambiguity : the same sentence can
  mean different things

• Paraphrase: There are many ways of
  saying the same thing.
  –   Beer, please.
  –   Can I have a beer?
  –   Give me a beer, please.
  –   I would like beer.
  –   I’d like a beer, please.



                                       44
      Applications of NLP
•   IE
•   IR
•   QA
•   Dialogue Systems




                            45
          What is Question
           Answering?

• The main aim of QA is to present the user with a
  short answer to a question rather than a list of
  possibly relevant documents.

• As it becomes more and more difficult to find
  answers on the WWW using standard search
  engines, question answering technology will
  become increasingly important.




                                                     46
      Question Types (1)
• Clearly there are many different types of
  questions:

    – When was Mozart born?
       • Question requires a single fact as an
         answer.
       • Answer may be found verbatim in
         text, e.g. “Mozart was born in 1756”.

    – How did Socrates die?
        • Finding an answer may require
          reasoning.
        • In this example die has to be linked
          with drinking poisoned wine.


                                                 47
   Question Types (2)
– How do I assemble a bike?
    • The full answer may require fusing information
      from many different sources.
    • The complexity can range from simple lists to
      script-based answers.


– Is the Earth flat?
    • Requires a simple yes/no answer.




                                                  48
  Evaluating QA Systems
• The biggest independent evaluations of question
  answering systems have been carried out at
  TREC (Text Retrieval Conference)

   – Five hundred factoid questions are provided
     and the groups taking part have a week in
     which to process the questions and return
     one answer per question.

   – No changes are allowed to your system
     between the time you receive the questions
     and the time you submit the answers.


                                                   49
             A Generic QA Framework

[Diagram: Document Collection → Search Engine → Top n documents →
 Document Processing → Answers; the questions feed both the search
 engine and the document processing stage]

•   A search engine is used to find the n most relevant documents in
    the document collection

•   These documents are then processed with respect to the
    question to produce a set of answers which are passed back to
    the user

•   Most of the differences between question answering systems
    are centred around the document processing stage
    A Simplified Approach
• The answers to the majority of factoid questions are
  easily recognised named entities, such as countries,
  cities, dates, people’s names, etc.

• The relatively simple techniques of gazetteer lists and
  named entity recognisers allow us to locate these
  entities within the relevant documents – the most
  frequent of which can be returned as the answer

• This leaves just one issue that needs solving – how do
  we know, for a specific question, what the type of the
  answer should be?



                                                         51
A Simplified Approach (1)
• The simplest way to determine
  the expected type of an answer is
  to look at the words which make
  up the question:

     • who – suggests a person
     • when – suggests a date
     • where – suggests a location


                                     52
 A Simplified Approach (2)
• Clearly this division does not account for
  every question but it is easy to add more
  complex rules:

       • country – suggests a location
       • how much – suggests an amount of
         money
       • author – suggests a person
       • birthday – suggests a date
       • college – suggests an organization

• These rules can be easily extended as we
  think of more questions to ask
                                               53
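
A minimal sketch of such a rule table in code (the cues and type
labels are illustrative; the first matching cue wins):

  RULES = [
      ("how much", "MONEY"),
      ("who", "PERSON"), ("author", "PERSON"),
      ("when", "DATE"), ("birthday", "DATE"),
      ("where", "LOCATION"), ("country", "LOCATION"),
      ("college", "ORGANIZATION"),
  ]

  def expected_answer_type(question: str) -> str:
      q = question.lower()
      for cue, answer_type in RULES:
          if cue in q:
              return answer_type
      return "UNKNOWN"

  print(expected_answer_type("When was Mozart born?"))        # DATE
  print(expected_answer_type("Which college did he attend?")) # ORGANIZATION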
             Problems (1)
• The most frequently occurring instance of the
  right type might not be the correct answer.

   – For example if you are asking when someone was
     born, it may be that their death was more notable
     and hence will appear more often (e.g. John F.
     Kennedy’s assassination).


• There are many questions for which correct
  answers are not named entities:

   – How did Ayrton Senna die? – in a car crash

                                                    54
           Problems (2)
• The gazetteer lists and named
  entity recognisers are unlikely to
  cover every type of named entity
  that may be asked about:

  – Even those types that are covered may well
    not be complete.

  – It is of course relatively easy to build new lists,
    e.g. Birthstones.

                                                    55
    Does a gazetteer of people
       names contain all the
             names?
•   Amber
•   Precious
•   Diamond
•   Asia
•   Summer
•   Holly

• Are these people’s names?

                                 56
             Dialogue (1)
• A sequence of utterances
• Exchange of information among multiple
  dialogue participants
• Stays coherent over time
• Driven by a certain goal
   – finding the most suitable restaurant in a
     foreign city,
   – booking the cheapest flight to a given city,
   – controlling the state of the devices in a home,
   – or the goal might also be the interaction itself
     (chatting)

                                                    57
               Dialogue (2)
• Most natural means of communication for
  humans, perceived as very expressive, efficient
  and robust

• However, dialogue is a very complex protocol
   – it follows certain conventions or protocols that are
     adopted by the participants
   – humans usually use their extensive
     knowledge and reasoning capabilities to
     understand the conversational partner
   – the dialogue utterances are often imperfect –
     ungrammatical or elliptical

                                                           58
              Ellipsis
• People often utter partial
  phrases to avoid repetition
  – A: At what time is “Titanic”
    playing?
  – B: 8pm
  – A: And “The 5th element”?

• It is necessary to keep track of
  the conversation to complete
  such phrases
                                     59
                     Deixis
• Some words can only be interpreted in
  context:
  – Previous context (anaphora)
     • “The monkey took the banana and ate it”
  – Future context (cataphora)
     • “Give me that. The book by the lamp.”
  – Temporal/spatial
     • “The man behind me will be dead tomorrow.”
      • (Who is the man? When does/did he die?)

                                                 60
     Indirect Meaning
• The meaning of a discourse may
  be far from literal.
  – B: I can’t reach him.
  – A: There is the telephone.
  – B: I am not in my office.
  – A: Okay.

• Undertones & implications are
  often employed for effect or
  efficiency
                                   61
              Turn Taking
• People seem to know very well when they
  can take their turn
   – There is little overlap (5%)
   – Gaps are often a few 1/10ths of a second
   – Appears fluid, but not obvious why


• A computational model of overlap does not
  exist
   – causes problems for dialogue systems




                                                62
   Conversational fillers
• Phrases like “a-ha”, “yes”, “hmm”
  or “eh” are often produced in
  order to fill the pauses of the
  conversation, to indicate
  attention or reflection

• The challenge here is to recognize
  when they should be understood
  as a request for turn taking and
  when they should be ignored
                                       63
    Most common dialogue
           domain
• Flight and train timetable information and
  reservation

• Smart homes

• Automated directory enquiries
   – Yellow pages enquiries
   – Weather information



                                               64
Components of a Dialogue
       System




                           65
      Automatic Speech
         Recognition
• Transforms speech to text
• Two basic types
  – Grammar-based ASR
     • The set of accepted phrases defined by
       regular/context-free grammars (i.e.
       language model in the form of a
       grammar)
     • Usually speaker independent
  – Dictation machine
     • Recognizes “any utterance”
     • N-gram language model
     • Often speaker dependent
                                           66
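
A minimal sketch of the N-gram idea behind the dictation-style
recognizer: estimate how likely each word is given the previous one
(toy corpus, maximum-likelihood counts, no smoothing):

  from collections import Counter

  # Toy corpus; real dictation systems train on huge text collections
  corpus = "open the pod bay doors please open the doors".split()
  bigrams = Counter(zip(corpus, corpus[1:]))
  unigrams = Counter(corpus)

  def p(w2, w1):
      # Maximum-likelihood estimate of P(w2 | w1)
      return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

  print(p("the", "open"))   # 1.0: "open" is always followed by "the" here
  print(p("doors", "the"))  # 0.5: "the" precedes "pod" once and "doors" once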
      Natural Language
       Understanding

• Analyzes textual utterance and
  returns its formal semantic
  representation
  – Logical formula
  – Named entities
  – etc



                                   67
      Dialogue Manager
• Coordinates activity of all components

• Maintains representation of the current
  state of the dialogue

• Communicates with external
  applications

• Decides about the next dialogue step


                                           68
       Three types of DM
• Finite-state
   – dialogue flow determined by a finite-state
     automaton
• Frame-based
   – form filling
• Plan (task) based
   – a dynamic plan is constructed to reach the
     dialogue goal


• … in practice, you often find extended
  versions or combinations of the above-
  mentioned approaches!
                                                  69
Finite State Automata




                        70
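
The slide above is a diagram; as a textual stand-in, here is a minimal
finite-state dialogue manager sketch (the flight-booking states, user
acts and transitions are invented for illustration):

  # State transition table: (current state, user act) -> next state
  TRANSITIONS = {
      ("ask_origin", "city"): "ask_destination",
      ("ask_destination", "city"): "ask_date",
      ("ask_date", "date"): "confirm",
      ("confirm", "yes"): "done",
      ("confirm", "no"): "ask_origin",
  }

  state = "ask_origin"
  for user_act in ["city", "city", "date", "yes"]:       # simulated user input
      state = TRANSITIONS.get((state, user_act), state)  # stay on unexpected input
  print(state)  # -> done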
Frame Based




              71
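
The frame-based slide is likewise a diagram; a minimal form-filling
sketch (slot names and prompts are invented), where the manager simply
asks for whichever slot is still empty:

  frame = {"origin": None, "destination": None, "date": None}
  PROMPTS = {"origin": "Where from?", "destination": "Where to?", "date": "When?"}

  def next_prompt(frame):
      for slot, value in frame.items():
          if value is None:
              return PROMPTS[slot]       # ask for the first unfilled slot
      return "Booking complete."

  frame["origin"] = "Malta"
  print(next_prompt(frame))  # -> Where to?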
              Plan Based
• Take a problem solving approach
   – There are goals to be reached
   – Plans are made to reach those goals
   – The goals and plans of the other participants
     must be iteratively inferred or predicted
• Potential for handling complicated
  dialogues
   – suffers from today’s technological limitations
   – in more complex cases the planning problem
     can become computationally intractable


• Examples: Bathroom consultant

                                                     72
       Natural Language
         Generation
• Produces a textual utterance (the
  so-called surface realization) from
  an internal (formal) representation
  of the answer

• The surface realization can
  include formatting information
  – Speaking style, pauses
  – Background sounds

                                    73
       Text-To-Speech

• Transforms the surface realization
  into an acoustic representation
  (sound signal)




                                   74
    Typical parameters
• Commercial systems:
  – small vocabulary (~100 words)
  – closed domain
  – system initiative
• Research systems:
  – larger (but still small) vocabulary
    (~10000 words)
  – closed domain
  – (limited) mixed initiative
                                          75
      Different Initiatives
• System-initiative
   – system always has control, user only
     responds to system questions
• User-initiative:
   – user always has control, system passively
     answers user questions
• Mixed-initiative:
   – control switches between system and user
     using fixed rules
• Variable-initiative:
   – control switches between system and user
     dynamically based on participant roles,
     dialogue history, etc.

                                                 76
        Multi Modal Dialogue
              Systems
• Several possible input/output modalities to
  communicate with dialogue systems
    – speech, text, pointing, graphics, gestures, face configurations,
      body positions, emotions, etc.
• No single “most convenient” modality (different
  modalities have different advantages)
    –   entering day of week: click on a calendar
    –   entering Zip code: use keyboard
    –   performing commands: speech
    –   complex query: express them as typed natural language
• Several modalities useful
    – when one modality is not applicable - e.g. eyes or hands are
      busy, silent environment
    – or when difficult to use - e.g. small devices with limited
      keyboard and small screen

                                                                     77
        Case Study
• Eliza
• Comic
• Companions




                     78
                Eliza
• Eliza, the first chatterbot : a
  computer program that mimics
  human conversation.
   – Joseph Weizenbaum, Massachusetts
     Institute of Technology
• User types in natural language
• ELIZA then analyzes the user's
  statement and generates some
  response which it types out.
• http://www-ai.ijs.si/eliza/eliza.html
                                          79
                 ELIZA
          Sample conversation
•   Men are all alike.
•   IN WHAT WAY?
•   They're always bugging us about something or other.
•   CAN YOU THINK OF A SPECIFIC EXAMPLE?
•   Well, my boyfriend made me come here.
•   YOUR BOYFRIEND MADE YOU COME HERE
•   He says I'm depressed much of the time.
•   I AM SORRY TO HEAR YOU ARE DEPRESSED
•   It's true. I am unhappy
•   DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPPY
•   I need some help, that much seems certain.
•   WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP
•   Perhaps I could learn to get along with my mother.
•   TELL ME MORE ABOUT YOUR FAMILY



                                                          80
             Eliza's strategy
                Keywords
• Input read and inspected for the presence of
  a keyword

• When keyword is found, the sentence is
  transformed according to a rule associated
  with the keyword

• Else a content-free remark or, under certain
  conditions, an earlier transformation is
  retrieved.

• The text so computed or retrieved is then
  printed out

                                                 81
            ELIZA
   Example transformations
"I am X"  "How long have you been X?«

      I am very unhappy.
      HOW LONG HAVE YOU BEEN VERY
        UNHAPPY?

"X YOU Y ME"  "What makes you think I Y
   you?«

      It seems that you hate me.
      WHAT MAKES YOU THINK THAT I
                                           82
         HATE YOU?
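
A minimal regex sketch of these keyword transformations (the two
patterns mirror the rules above but are simplified; this is not
Weizenbaum's original script):

  import re

  RULES = [
      (r"\bi am (.+)", "How long have you been {0}?"),
      (r"\b(.+) you (.+) me\b", "What makes you think I {1} you?"),
  ]

  def respond(utterance: str) -> str:
      text = utterance.lower().rstrip(".!")
      for pattern, template in RULES:
          match = re.search(pattern, text)
          if match:
              return template.format(*match.groups()).upper()
      return "PLEASE GO ON."  # content-free fallback remark

  print(respond("I am very unhappy."))          # HOW LONG HAVE YOU BEEN VERY UNHAPPY?
  print(respond("It seems that you hate me."))  # WHAT MAKES YOU THINK I HATE YOU?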
               ELIZA
          Why does it work?

• Rogerian psychotherapist  Can pretend
  to know almost nothing of the real world
      "I went for a long boat ride"
      "Tell me about boats«

• Not a real dialog (no real communication)
   – Eliza generates a reaction (often a rather non-
     committal one) to the user's statement
   – The user cannot ask for information.
   – The user cannot issue an order

                                                  83
                 ELIZA
          Does it really work?
•   Eliza: Hello. I am ELIZA. How can I help you?
•   You: I went to Paris yesterday
•   Eliza: Can you elaborate on that?
•   You: I took the train
•   Eliza: Can you elaborate on that?
•   You: The train was late
•   Eliza: Please go on.
•   You: All the seats were full
•   Eliza: Please go on.
•   You: A little boy was having a tantrum
•   Eliza: Please go on.

                                                    84
              ELIZA
       Does it really work?
Indexicals (I, you, here, now)
     User: I am having fun right now.
     Eliza: Does it please you to believe that I
       am having fun right now.


World Knowledge
     User: I am a human.
     Eliza: How long have you been a human?



                                               85
               When Eliza fails
•   Yes/No Questions
          User: Are you a good therapist?
          Eliza: Would you prefer it if I were a good therapist?
          User: I'm wondering about your skills as a therapist;
          Eliza: Does it please you to believe I am wondering about
    my skills of therapist?

•   Wh-Questions
          User: Who are you?
          Eliza: Would you prefer it if I were not?
          User: What do you want me to tell?
          Eliza: Does that question interest you?

•   Complex Assertions
            User: I don't have a problem. I'm quite happy, I have a
    girlfriend, I study. Everything is OK!
            Eliza: Why do you not have any problem, I'm quite happy,
    you have a girlfriend, you study. Everything is OK!

                                                                      86
  Eliza is not intelligent but...
• Regardless of what people believe or know about the
  inner workings of computers, they talk about them and
  interact with them as social entities.
• Weizenbaum (1976) notes that many people continued to
  believe in ELIZA’s abilities even after the program’s
  operation was explained to them.
• People act toward computers as if they were people and
  expect that computers should be able to understand their
  needs and be capable of interacting with them naturally.
• Given these predispositions, speech- and language-based
  systems do not need to be intelligent
• But they may provide users with the most
  natural interface for many applications

                                                      87
The Comic Avatar




                   88
Wizard of Oz




               89
Putting it together




                      90
The Companions
  Architecture




                 91
The Companions Robot




                       92
The Companions Interface 1




                             93
The Companions Interface 2




                             94
  What is Named Entity
     Recognition?
• Identification of proper names in texts, and
  their classification into a set of predefined
  categories of interest

• Persons

• Organisations (companies, government
  organisations, committees, etc)

• Locations (cities, countries, rivers, etc)

• Date and time expressions

• Various other types as appropriate

                                               95
   Why is NE important?
• NE provides a foundation from which to build
  more complex IE systems

• Relations between NEs can provide tracking,
  ontological information and scenario building

• Tracking (co-reference) “Dr Head, John, he”




                                                  96
                 Two kinds of approaches

Knowledge Engineering
• rule based
• developed by experienced language engineers
• make use of human intuition
• require only small amount of training data
• development can be very time consuming
• some changes may be hard to accommodate

Learning Systems
• use statistics or other machine learning
• developers do not need expertise
• require large amounts of annotated training data
• some changes may require re-annotation of the
  entire training corpus
                                                             97
       Typical NE pipeline

• Pre-processing (tokenisation,
  sentence splitting, morphological
  analysis, POS tagging)
• Entity finding (gazetteer lookup,
  NE grammars)
• Coreference (alias finding,
  orthographic coreference etc.)
• Export to database / XML

                                      98
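
A minimal sketch of the gazetteer-lookup step (the lists are toy
examples; real gazetteers hold many thousands of entries and are
combined with NE grammars):

  GAZETTEER = {
      "PERSON": {"John", "Mary"},
      "LOCATION": {"Paris", "Malta"},
  }

  def find_entities(tokens):
      # Label any token that appears in one of the gazetteer lists
      for token in tokens:
          for label, names in GAZETTEER.items():
              if token in names:
                  yield token, label

  print(list(find_entities("John moved to Paris".split())))
  # [('John', 'PERSON'), ('Paris', 'LOCATION')]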
            GATE and ANNIE
• GATE (General Architecture for Text
  Engineering) is a framework for language
  processing

• ANNIE (A Nearly New Information Extraction
  system) is a suite of language processing tools,
  which provides NE recognition

GATE also includes:
• plugins for language processing, e.g. parsers,
  machine learning tools, stemmers, IR tools, IE
  components for various languages etc.
• tools for visualising and manipulating ontologies
• ontology-based information extraction tools
• evaluation and benchmarking tools

                                                     99
GATE




       100
      Information
 Extraction vs. Retrieval

 [Diagram contrasting IR and IE]
                           101
A couple of approaches …

• Active learning to reduce annotation
  burden
   – Supervised learning
   – Adaptive IE
   – The Melita methodology

• Automatic annotation of large
  repositories
   – Largely unsupervised
   – Armadillo
 The Seminar Announcements
            Task
• Created by Carnegie Mellon School of
  Computer Science

• How to retrieve
   –   Speaker
   –   Location
   –   Start Time
   –   End Time

• From seminar announcements
  received by email

                                     103
    Seminar Announcements
           Example
Dr. Steals presents in Dean Hall at
  one am.

              becomes

<speaker>Dr. Steals</speaker>
  presents in <location>Dean
  Hall</location> at <stime>one
  am</stime>.
                                      104
         Information Extraction
               Measures
• How many documents out of the retrieved documents
  are relevant? (Precision)

     Precision = |relevant ∩ retrieved| / |retrieved|

• How many retrieved documents are relevant out of all
  the relevant documents? (Recall)

     Recall = |relevant ∩ retrieved| / |relevant|

• Weighted harmonic mean of precision and recall
  (F-measure)

     F = 2 · Precision · Recall / (Precision + Recall)



                                                    105
   IE Measures Examples
• If I ask the librarian to search for
  books on cars, there are 10
  relevant books in the library and
  out of the 8 he found, only 4
  seem to be relevant books. What
  is his precision, recall and f-
  measure?



                                     106
   IE Measures Answers
• If I ask the librarian to search for books
  on cars, there are 10 relevant books in
  the library and out of the 8 he found,
  only 4 seem to be relevant books.
  What is his precision, recall and f-
  measure?

• Precision = 4/8 = 50%
• Recall    = 4/10 = 40%
• F         =(2*50*40)/(50+40) = 44.4%

                                          107
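
The same arithmetic as a quick check in code (numbers taken from the
worked example above):

  retrieved, relevant_total, relevant_retrieved = 8, 10, 4

  precision = relevant_retrieved / retrieved         # 4/8  = 0.5
  recall = relevant_retrieved / relevant_total       # 4/10 = 0.4
  f_measure = 2 * precision * recall / (precision + recall)

  print(f"P={precision:.0%} R={recall:.0%} F={f_measure:.1%}")
  # P=50% R=40% F=44.4%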
             Adaptive IE
• What is IE?
   – Automated ways of extracting information
     from unstructured or partially structured
     machine readable files

• What is AIE?
   – Performs tasks of traditional IE
   – Exploits the power of Machine Learning
     in order to adapt to
      • complex domains having large amounts of
        domain dependent data
      • different sub-language features
      • different text genres
   – Considers the Usability and Accessibility
     of the system to be important               108
      What is adaptable?
• New domain information
   – Based upon an ontology which can change

• Different sub-language features
   – POS, Noun chunks, etc

• Different text genres
   – Free text, structured, semi-structured, etc

• Different types
   – Text, String, Date, Name, etc
                                               109
       Amilcare
• Tool for adaptive IE from Web-related
  texts
   – Specifically designed for document
     annotation
   – Based on the (LP)2 algorithm
     (Linguistic Patterns by Learning Patterns)

      • Covering algorithm based on Lazy NLP
      • Trains with a limited amount of examples
      • Effective on different text types
            – free texts
            – semi-structured texts
            – structured texts

   – Uses Gate and Annie for preprocessing
                                                   110
              CMU: detailed results
            (LP)2   BWI    HMM     SRV Rapier Whisk
 speaker     77.6   67.7    76.6   56.3 53.0   18.3
 location    75.0   76.7    78.6   72.3 72.7   66.4
   stime     99.0   99.6    98.5   98.5 93.4   92.6
   etime     95.5   93.9    62.1   77.9 96.2   86.0
All Slots    86.0   83.9    82.0   77.1 77.3   64.9

 1. Best overall accuracy
 2. Best result on speaker field
 3. No results below 75%
                 RULIE:
     Rule Unification for learning IE

            (LP)2 RULIE
 speaker     77.6  82.0
 location    75.0  80.0
   stime     99.0  99.0
   etime     95.5  98.0
All Slots    86.0  89.7
     IE by example (1)
   the seminar at 4 pm will ...

How can we learn a rule to extract
        the seminar time?




                                  113
IE by example (2)




                    114
IE by example (3)




                    115
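
The two slides above show this graphically; as a textual stand-in, a
toy sketch of the bottom-up idea: start from a rule covering only the
training example, then relax it (the regex steps are invented for
illustration):

  import re

  seed = r"at 4 pm"                    # covers only the training example
  relaxed = r"at \d{1,2} (?:am|pm)"    # generalise the digit and the meridiem

  text = "the seminar at 4 pm will ... the talk at 11 am starts"
  print(re.findall(seed, text))     # ['at 4 pm']
  print(re.findall(relaxed, text))  # ['at 4 pm', 'at 11 am']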
      Shallow vs Deep
        Approaches
• Shallow approach
  – Uses syntax primarily
     • Tokenisation, POS, etc.


• Deep approach
  – Uses syntactic information
  – Uses semantics (Named entity, etc)
  – Heuristics (world rules, e.g. a
    brother is male)
  – Additional knowledge
                                      116
     Single vs Multi Slot
• Single
  – Extract one element at a time
     • The seminar is at 4pm.


• Multi Slot
  – Extract several concepts
    simultaneously
     • Tom is the brother of Mary.
           – Brother(Tom, Mary)

                                     117
Top-Down vs Bottom-Up
• Top-Down
  – Starts from a generic rule and
    specialise it


• Bottom Up
  – Starts from a specific rule and
    relax it



                                      118
Top Down




           119
Bottom Up




            120
 Overfitting vs Underfitting
• Underfitting
  – When the learner does not manage
    to detect the full underlying model
  – Produces excessive bias


• Overfitting
  – When the learner fits the model
    and the noise

                                      121
 Stages of document processing
• Document selection involves identification and
  retrieval of potentially relevant documents from
  a large set (e.g. the web) in order to reduce the
  search space. Standard or semantically-enhanced
  IR techniques can be used for this.

• Document pre-processing involves cleaning and
  preparing the documents, e.g. removal of
  extraneous information, error correction, spelling
  normalisation, tokenisation, POS tagging, etc.

• Document processing consists mainly of
  information extraction

                                                  122
       Metadata extraction
• Metadata extraction consists of two
  types:

   – Explicit metadata extraction involves
     information describing the document, such
     as that contained in the header information
     of HTML documents (titles, abstracts,
     authors, creation date, etc.)

   – Implicit metadata extraction involves
     semantic information deduced from the
     material itself, i.e. endogenous information
     such as names of entities and relations
     contained in the text. This essentially
     involves Information Extraction techniques,
     often with the help of an ontology.
                                                    123
IE for Document Access
 • With traditional query engines, getting the
   facts can be hard and slow

        • Where has the President visited in the
          last year?
        • Which places in Europe have had cases
          of Bird Flu?

 • Which search terms would you use to get this
   kind of information?
 • How can you specify you want someone’s
   home page?

 • IE returns information in a structured way
 • IR returns documents containing the relevant
   information somewhere (if you’re lucky)

                                                   124
  IE as an alternative to IR
• IE returns knowledge at a much deeper
  level than traditional IR

• Constructing a database through IE and
  linking it back to the documents can
  provide a valuable alternative search tool.

• Even if results are not always accurate,
  they can be valuable if linked back to the
  original text

                                                125
   Try IE yourself ... (1)
• Given a particular text ...
• Find all the successions ...
   – Hint: there are 6, including the one
     below
   – Hint: we do not have complete
     information

E.g.
<SUCCESSION-1>
   –   ORGANIZATION : “New York Times”
   –   POST : "president"
   –   WHO_IS_IN : “Russell T. Lewis”
   –   WHO_IS_OUT : “Lance R. Primis”     126
<DOC>
<DOCID> wsj93_050.0203 </DOCID>
<DOCNO> 930219-0013. </DOCNO>
<HL> Marketing Brief: @ Noted.... </HL>
<DD> 02/19/93 </DD>
<SO> WALL STREET JOURNAL (J), PAGE B5 </SO>
<CO> NYTA </CO>
<IN> MEDIA (MED), PUBLISHING (PUB) </IN>
<TXT>
<p> New York Times Co. named Russell T. Lewis, 45, president and general manager
   of its flagship New York Times newspaper, responsible for all business-side
   activities. He was executive vice president and deputy general manager. He
   succeeds Lance R. Primis, who in September was named president and chief
   operating officer of the parent.
</p>
</TXT>
</DOC>


                                                                            127
           Answer (1)
<SUCCESSION-2>
   ORGANIZATION : "New York Times"
   POST : "general manager"
   WHO_IS_IN : "Russell T. Lewis"
   WHO_IS_OUT : "Lance R. Primis"

<SUCCESSION-3>
   ORGANIZATION : "New York Times"
   POST : "executive vice president"
   WHO_IS_IN :
   WHO_IS_OUT : "Russell T. Lewis"



                                       128
            Answer (2)
<SUCCESSION-4>
   ORGANIZATION : "New York Times"
   POST : "deputy general manager"
   WHO_IS_IN :
   WHO_IS_OUT : "Russell T. Lewis"

<SUCCESSION-5>
   ORGANIZATION : "New York Times Co."
   POST : "president"
   WHO_IS_IN : "Lance R. Primis"
   WHO_IS_OUT :



                                         129
               Answer (3)
<SUCCESSION-6>
   ORGANIZATION : "New York Times Co."
   POST : "chief operating officer"
   WHO_IS_IN : "Lance R. Primis"
   WHO_IS_OUT :




                                         130
Questions?




             131

				