									  It is the best of times
(and the worst of times)
      Kenneth Church
         Microsoft
   church@microsoft.com
                Wow!
                (What a difference a decade makes)
• Empiricism has come of age
         – Radical Fringe → Mainstream
• 1993: Workshop on Very Large Corpora (WVLC)
         – Intended to be a 1-time event
         – But so successful that it evolved into a series of EMNLP conferences
• EMNLP-2004 received so many submissions that the program committee had to be expanded at the last minute
         – Success/Catastrophe
[Callouts: Responsibility; Attribute Dangerous Positions to Others. Interesting & Controversial. Lonely. Preaching to Choir.]
[Figure: % Statistical Papers (0% to 100%) by ACL Meeting (1985 to 2005), with Bob Moore and Fred Jelinek marked]

July 25, 2004                       EMNLP-2004 & Senseval-2004                                                 2
         The Structure of Scientific
      Revolutions (1962) – Kuhn (p.10)
•               Paradigms
         –        Examples from Physics
                  •   Aristotle’s Physica
                  •   Ptolemy’s Almagest
                  •   Newton’s Principia and Opticks
                  •   Franklin’s Electricity
                  •   Lavoisier’s Chemistry
                  •   Lyell’s Geology
•               Two characteristics:
         1.       Sufficiently unprecedented to attract an enduring group of
                  adherents from competing modes of scientific activity
         2.       Simultaneously, sufficiently open-ended to leave all sorts of
                  problems for the redefined group of practitioners to resolve


                Organizational Innovations
                        (Radical → Mainstream)
• Late Submission Deadline
         – Immediately after ACL notifications
                • ACL was rejecting good papers for bad reasons
         – Short review cycles → Freshness
• Invest in the Future: Encourage Innovation
         – Chair (Energetic, Promising, Source of new ideas)
         – Co-chair (Established, Knows how it has been done)
• Avoid incremental papers
         – Reviewers prefer boring papers over radical ones
         – Reviewers do what reviewers do; chairs → correction
• Inclusiveness: Diversity → Growth (Sales)
         – Thankless chores → Marketing carrots
         – 1/3 promising, 1/3 stability, 1/3 outreach
         – Hold conferences in Europe, Asia & America
[Callouts: Innovation. Checks & Balances. Short term ≠ Long term.]

  What Worked and What Didn’t?
• Stay on msg: It is data, stupid!
         – WVLC (Very Large) >> EMNLP (Empirical Methods)
         – If you have a lot of data,
                • Then you don’t need a lot of methodology
• Empiricism means diff things to diff people
         1. Machine Learning (Self-organizing Methods)
         2. Exploratory Data Analysis (EDA)
         3. Corpus-Based Lexicography
• Lots of papers on 1
         – EMNLP-2004 theme (error analysis) → 2
         – Senseval grew out of 3
[Callouts: Data. Methodology.]
[Callout: Kucera & Francis gave great invited talk (but they couldn’t follow submitted talks)]

  Word Sense Disambiguation (WSD) History

• Bar-Hillel (1960):
         – Abandoned Machine Translation (MT)
         – Couldn’t see how to make progress on WSD (pen)
         – Can’t translate without disambiguating
                • bank (money) → banque
                • bank (river) → banc
• 1990s
         – Parallel text ≈ Labeled corpus for supervised training and testing
         – Isn’t it great the translators have WSD labeled all this data for us!
• Yarowsky:
         – Parallel corpus → encyclopedia + thesaurus
         – Bilingual ≠ Monolingual
                • interest
                • wear
         – ML: Co-training
                • Supervised → Unsupervised
• Lexicography: Hector
         – Joint collaboration: Oxford University Press & DEC
         – flagging → flogging
• Senseval

                      A Road Rarely Taken:
       Tukey’s Exploratory Data Analysis (EDA)
• Linear Regression
         – Standard practice:
                • Plug data into off-the-shelf package
                • Publish (if “significant”)
         – Better:
                • Check for outliers
                • Bowed residuals
                    – Evidence of a positive or negative derivative
                • Deviations from assumptions (normality)
                    – Fanout
• Slocum’s Thesis (1981)
         – “Proof” that CKY takes linear time
         – Standard texts (e.g., Aho)… consider … worst case… This assumption clearly fails to apply to natural language… Our experiments have shown that average-case time performance… is approximately linear (p. 102)
[Figure: two plots of Time (0 to 50000) against Sentence Length (0 to 30). Callout: No Result.]

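A minimal sketch of the "Better" checklist on this slide, under loud assumptions: the data below is made up (time growing with the square of sentence length) and is not Slocum's data, and the helper names `fit_line`, `residuals` and `bow_pattern` are hypothetical. It fits a straight line by least squares and then looks at the residuals by thirds of the range; residuals that are positive at both ends and negative in the middle are the "bowed" pattern that a plug-and-publish fit would miss.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def residuals(xs, ys, a, b):
    """Observed value minus fitted value at each point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def bow_pattern(res):
    """Mean residual in the first, middle and last third of the range."""
    k = len(res) // 3
    thirds = (res[:k], res[k:len(res) - k], res[len(res) - k:])
    return [sum(t) / len(t) for t in thirds]


# Synthetic, made-up data (an assumption for illustration only):
# time grows with the square of sentence length.
lengths = list(range(1, 31))
times = [50.0 * n * n for n in lengths]

a, b = fit_line(lengths, times)
res = residuals(lengths, times, a, b)
low, middle, high = bow_pattern(res)

# A straight line through convex data under-predicts at both ends
# and over-predicts in the middle.
bowed = low > 0 and middle < 0 and high > 0
print(f"slope={b:.1f}  mean residual by third: {low:.0f}, {middle:.0f}, {high:.0f}  bowed={bowed}")
```

On this toy data the line fits with a large positive slope, so a significance test alone would pass it; only the residual pattern shows that the relationship is not linear.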
            Many Machine Learning (ML) Techniques (SVMs,
             Perceptrons) are Similar to (Logistic) Regression;
            Rarely see EDA (Robust Statistical) Methods in ML
[Figure: from The Elements of Statistical Learning – Hastie, Tibshirani, Friedman (2001), p 380]

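One way to see the similarity claimed on this slide is to put a perceptron update next to a stochastic gradient step for logistic regression. This is a sketch under assumptions, not anything taken from the talk: the four data points are made up, labels are coded as +1 and -1, and the function names are hypothetical. SVMs are not shown. The two updates move the weights in the same direction, `y * x`; they differ only in the multiplier (all-or-nothing for the perceptron, a smooth value between 0 and 1 for logistic regression).

```python
import math


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def perceptron_step(w, x, y, lr=1.0):
    """Perceptron update; y is +1 or -1. Weights change only on a mistake."""
    if y * dot(w, x) <= 0:
        return [wi + lr * y * xi for wi, xi in zip(w, x)]
    return list(w)


def logistic_step(w, x, y, lr=1.0):
    """One stochastic gradient step on the logistic loss log(1 + exp(-y * w.x))."""
    # Multiplier in (0, 1): close to 1 for badly wrong points, close to 0 for safe ones.
    scale = 1.0 / (1.0 + math.exp(y * dot(w, x)))
    return [wi + lr * scale * y * xi for wi, xi in zip(w, x)]


def train(step, data, epochs=20):
    """Run the given update rule over the data for a fixed number of passes."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            w = step(w, x, y)
    return w


# Toy, made-up data: features are (bias, x1, x2); labels are +1 / -1.
data = [([1.0, 2.0, 1.0], 1), ([1.0, 1.5, 2.0], 1),
        ([1.0, -1.0, -1.5], -1), ([1.0, -2.0, -0.5], -1)]

w_perceptron = train(perceptron_step, data)
w_logistic = train(logistic_step, data)
print("perceptron weights:", w_perceptron)
print("logistic weights:  ", [round(wi, 3) for wi in w_logistic])
```

Both rules are linear models fitted to the same data, so the residual checks that EDA recommends for regression would apply to either.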
                    Historical Context
• 1950s:
         – Rigorous methodology
                • Information theory
                • Behaviorism
• Unfulfilled unrealistic expectations [video]
         – ALPAC report
         – Whither Speech Recognition?   [Callout: Kuhn Crisis]
• 1970s:
         – Let it all hang out
                • Artificial Intelligence
                • Cognitive Psychology
• 1990s:   [Callout: Kuhn Crisis]
         – Revival of empiricism
[Callouts: Empiricists feel lonely. Rationalists feel lonely.]
[Figure: % Statistical Papers (0% to 100%) by ACL Meeting (1985 to 2005), with Bob Moore and Fred Jelinek marked]

Borrowed Slide: Jelinek (LREC)

      “Whither Speech Recognition?”
      Pierce, JASA 1969 (also ALPAC chair & Bell Labs exec)
         …ASR is attractive to money. The attraction is perhaps
           similar to the attraction of schemes for turning water
           into gasoline, extracting gold from the sea, or going
           to the moon.
         Most recognizers behave not like scientists, but like
           mad inventors or untrustworthy engineers.
         …performance will continue to be very limited unless
           the recognizing device understands what is being
           said with something of the facility of a native speaker
           (that is, better than a foreigner fluent in the language)
         Any application of the foregoing discussion to work in
           the general area of pattern recognition is left as an
           exercise for the reader.
      ALPAC (1966): the (in)famous report
                             John Hutchins
• The best known event in the history of MT is …
         – Automatic Language Processing Advisory Committee (ALPAC)
• Its effect was to bring to an end the substantial funding
  of MT research in US for some twenty years.
         – More significantly was the clear message to the general public
           and the rest of the scientific community that MT was hopeless.
         – For years afterwards, an interest in MT was something to keep
           quiet about; it was almost shameful.
         – To this day, the 'failure' of MT is still repeated by many as an
           indisputable fact.
• The impact of ALPAC is undeniable
         – While the fame or notoriety of ALPAC is familiar,
         – What the report actually said is now becoming less familiar and
           often forgotten or misunderstood…


                  ALPAC Recommendations
              The committee recommends expenditures in two distinct areas

  • Computational linguistics as part of linguistics
           – Studies of parsing, generation… including experiments in translation…
           – Linguistics should be supported as science,
                  •   and should not be judged by any immediate or foreseeable contribution to practical translation
  • Improvement of translation:
           1. practical methods for evaluation of translations;
           2. means for speeding up the human translation process;
           3. evaluation of quality and cost of various sources of translations;
           4. investigation of the utilization of translations, to guard against production of translations that are never read;
           5. study of delays in the over-all translation process, and means for eliminating them, both in journals and in individual items;
           6. evaluation of the relative speed and cost of various sorts of machine-aided translation;
           7. adaptation of existing mechanized editing and production processes in translation;
           8. the over-all translation process; and
           9. production of adequate reference works for the translator, including the adaptation of glossaries that now exist primarily for automatic dictionary look-up in machine translation
[Callouts: Theory. Practice.]

                         Outline
• We’re making consistent progress, or
• We’re running around in circles, or
         – Don’t worry; be happy
• We’re going off a cliff…
[Callout: Best of Times]

            Where have we been and where are we going?
                 Moore’s Law: Ideal Answer

                Moores: Bob ≠ Gordon ≠ Roger

 Borrowed Slide: Audrey Le (NIST)
[Figure: Error Rate against Date (15 years)]
                                Moore’s Law Time Constant:
                                • 10x improvement per decade

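The "10x improvement per decade" time constant can be unpacked with a little arithmetic. The sketch below assumes the improvement compounds at a constant yearly rate, which is an assumption added here and not something stated on the slide; the variable names are illustrative.

```python
import math

FACTOR_PER_DECADE = 10.0  # from the slide: 10x improvement per decade

# Constant yearly factor that compounds to 10x over ten years (assumed steady rate).
yearly_factor = FACTOR_PER_DECADE ** (1 / 10)

# Read as a falling error rate: fraction of errors removed each year.
yearly_error_reduction = 1 - 1 / yearly_factor

# Years needed to halve the error rate at this pace.
halving_time_years = 10 * math.log(2) / math.log(FACTOR_PER_DECADE)

# Total improvement over the 15 years shown on the date axis.
improvement_15_years = FACTOR_PER_DECADE ** (15 / 10)

print(f"yearly factor       {yearly_factor:.3f}")
print(f"yearly reduction    {yearly_error_reduction:.1%}")
print(f"halving time        {halving_time_years:.2f} years")
print(f"15-year improvement {improvement_15_years:.1f}x")
```

Under that assumption the error rate falls by roughly a fifth each year, halves about every three years, and improves by a factor of about 32 over the 15 years on the chart.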
                Charles Wayne’s Challenge:
          Demonstrate Consistent Progress Over Time
• Controversial in 1980s
         – But not in 1990s
         – Though, grumbling
• Benefits
         1. Agreement on what to do
         2. Limits endless discussion
         3. Helps sell the field
                • Manage expectations
                • Fund raising
• Risks (similar to benefits)
         1. All our eggs are in one basket (lack of diversity)
         2. Not enough discussion
                • Hard to change course
         3. Methodology → Burden
[Callout: Managing Expectations]

                              Hockey Stick
                             Business Case
[Figure: $ against t, with 2003 (Last Year), 2004 (This Year) and 2005 (Next Year) marked]

     Where have we been and where are we going?
            Consistent Progress over Time
[Callout: Manage Expectations]
[Figure, left panel: Extrapolation/Prediction is Applicable]
[Figure, right panel: Extrapolation/Prediction is Not Applicable; $ against t (2003, 2004, 2005)]

                When will we see the last non-
                  statistical paper? 2010?
[Figure: % Statistical Papers (0% to 100%) by ACL Meeting (1985 to 2005), with Bob Moore and Fred Jelinek marked]

            Top Ten Metrics of Success
1.         Value Creation (Reality)
2.         Stock Prices (Belief)
3.         Startup Companies Raise Venture Capital (Excitement)
4.         Prototype Applications (Plausibility)
5.         Grand-Students (Survive the Test of Time)
6.         Students Get Good Jobs
7.         Students Finish PhD Theses
8.         Citations
9.         Conference Registrations
10.        Publications (Quantity)
[Callouts beside the list, top to bottom: Search. Speech. Senseval wants to be here. We are here.]

                         Outline
• We’re making consistent progress, or
• We’re running around in circles, or
         – Don’t worry; be happy
• We’re going off a cliff…
[Callouts: “Best of Times (Not!)” and “Been there; Done that”]

                                      It has been claimed that
     Recent progress made possible by Empiricism
            Progress (or Oscillating Fads)?
•      1950s: Empiricism was at its peak
         – Dominating a broad set of fields
                • Ranging from psychology (Behaviorism)
                • To electrical engineering (Information Theory)
         – Psycholinguistics: Word frequency norms (correlated with reaction time, errors)
                • Word association norms (priming): bread and butter, doctor / nurse
         – Linguistics/psycholinguistics: focus on distribution (correlate of meaning)
                • Firth: “You shall know a word by the company it keeps”
                • Collocations: Strong tea v. powerful computers
•      1970s: Rationalism was at its peak
         – with Chomsky’s criticism of ngrams in Syntactic Structures (1957)
         – and Minsky and Papert’s criticism of neural networks in Perceptrons (1969).
•      1990s: Revival of Empiricism
         – Availability of massive amounts of data (popular arg, even before the web)
                • “More data is better data”
                • Quantity >> Quality (balance)
         – Pragmatic focus:
                • What can we do with all this data?
                • Better to do something than nothing at all
         – Empirical methods (and focus on evaluation): Speech → Language
•      2010s: Revival of Rationalism (?)

                                      It has been claimed that
     Recent progress made possible by Empiricism
            Progress (or Oscillating Fads)?
• Periodic signals are continuous
• Support extrapolation/prediction
• Progress? Consistent progress?
[Callouts on the 1950s to 2010s timeline: “Consistent progress?” and “Extrapolation/Prediction: Applicable?”]

                                Speech → Language
                                 Has the pendulum
                                  swung too far?
   • What happened between TMI-1992 and TMI-2002 (if anything)?
   • Have empirical methods become too popular?
                – Has too much happened since TMI-1992?
   • I worry that the pendulum has swung so far that
                – We are no longer training students for the possibility
                    •   that the pendulum might swing the other way
   • We ought to be preparing students with a broad education including:
           – Statistics and Machine Learning
           – as well as Linguistic Theory
   •      History repeats itself: Mark Twain; bad idea then and still a bad idea now
           – 1950s: empiricism
           – 1970s: rationalism (empiricist methodology became too burdensome)
           – 1990s: empiricism
           – 2010s: rationalism (empiricist methodology is burdensome, again)


                           Speech → Language
                            Has the pendulum
                             swung too far?
[Callout: Plays well at Machine Translation conferences]

                                Speech → Language
                                 Has the pendulum
                                  swung too far?
   • History repeats itself:
                –   1950s: empiricism
                –   1970s: rationalism (empiricist methodology became too burdensome)
                –   1990s: empiricism
                –   2010s: rationalism (empiricist methodology is burdensome, again)
[Callout: Grandparents and grandchildren have a natural alliance…]

                              Rationalism                      Empiricism
Well-known advocates          Chomsky, Minsky                  Shannon, Skinner, Firth, Harris
Model                         Competence Model                 Noisy Channel Model
Contexts of Interest          Phrase-Structure                 N-Grams
Goals                         All and Only                     Minimize Prediction Error (Entropy)
                              Explanatory                      Descriptive
                              Theoretical                      Applied
Linguistic Generalizations    Agreement & Wh-movement          Collocations & Word Associations
Parsing Strategies            Principle-Based, CKY (Chart),    Forward-Backward (HMMs),
                              ATNs, Unification                Inside-outside (PCFGs)
Applications                  Understanding:                   Recognition:
                              Who did what to whom             Noisy Channel Applications
                Covering all the Bases
   It is hard to make predictions (especially about the future)

• When will we see the last non-statistical paper?
         – 2010?
• Revival of rationalism:
         – 2010?

                                                        The answer to any
                                                        question: 6 years!
                         Outline
• We’re making consistent progress, or
• We’re running around in circles, or
         – Don’t worry; be happy
• We’re going off a cliff…

                                                        Rising tide of data lifts all boats
                                                        No matter what happens, it’s goin’ be great!
     Rising Tide of Data Lifts All Boats
       If you have a lot of data, then you don’t need a lot of methodology

• 1985: “There is no data like more data”
         – Fighting words uttered by radical fringe elements (Mercer at
           Arden House)
• 1993 Workshop on Very Large Corpora
         – Perfect timing: Just before the web
         – Couldn’t help but succeed
         – Fate
• 1995: The Web changes everything
• All you need is data (magic sauce)
         –      No linguistics
         –      No artificial intelligence (representation)
         –      No machine learning
         –      No statistics
         –      No error analysis

      “It never pays to think until you’ve
           run out of data” – Eric Brill
                                                     Quoted out of context

                 Moore’s Law Constant: Data Collection Rates → Improvement Rates
                 Banko & Brill: Mitigating the Paucity-of-Data Problem (HLT 2001)

   [Figure omitted; callouts: “No consistently best learner”, “More data is better data!”]

                        Fire everybody and
                      spend the money on data
Borrowed Slide: Jelinek (LREC)

                                    Benefit of Data
                             LIMSI: Lamel (2002) – Broadcast News

   [Figure: WER (y-axis) vs. hours (x-axis)]

              Supervised:         transcripts
              Lightly supervised: closed captions
                The rising tide of data will lift all boats!
                TREC Question Answering & Google:
                    What is the highest point on Earth?




                 The rising tide of data will lift all boats!
                Acquiring Lexical Resources from Data:
                Dictionaries, Ontologies, WordNets, Language Models, etc.
                                http://labs1.google.com/sets
         England      Japan         Cat          cat
         France       China         Dog          more
         Germany      India         Horse        ls
         Italy        Indonesia     Fish         rm
         Ireland      Malaysia      Bird         mv
         Spain        Korea         Rabbit       cd
         Scotland     Taiwan        Cattle       cp
         Belgium      Thailand      Rat          mkdir
         Canada       Singapore     Livestock    man
         Austria      Australia     Mouse        tail
         Australia    Bangladesh    Human        pwd
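The slide does not say how Google Sets works, so the following is only a minimal sketch of one plausible co-occurrence approach: given seed items, rank other items by how often they share a list with the seeds. The function name `expand_set` and the toy lists in `TOY_LISTS` are hypothetical, made up for illustration, and are not taken from Google Sets or from the talk.

```python
from collections import Counter


def expand_set(seeds, lists, top_n=5):
    """Rank non-seed items by how strongly they co-occur with the seeds.

    Assumption (not from the slides): each list that contains k seeds
    gives k votes to every other item on that list.
    """
    seeds = {s.lower() for s in seeds}
    scores = Counter()
    for items in lists:
        items = {i.lower() for i in items}
        overlap = len(seeds & items)
        if overlap == 0:
            continue  # this list says nothing about the seeds
        for item in items - seeds:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical toy "corpus" of lists, standing in for lists found on the web.
TOY_LISTS = [
    ["cat", "dog", "horse", "fish"],
    ["cat", "ls", "rm", "mv"],
    ["dog", "horse", "rabbit"],
    ["ls", "rm", "mv", "cd", "cp"],
    ["england", "france", "germany"],
]

animals = expand_set(["cat", "dog"], TOY_LISTS, top_n=2)
commands = expand_set(["cat", "ls"], TOY_LISTS, top_n=2)
```

With these toy lists, the seed "cat" expands toward animals when paired with "dog" and toward Unix commands when paired with "ls", which mirrors the two "cat" columns on the slide. With tiny corpora the counts are too sparse to rank anything reliably.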
      Rising Tide of Data Lifts All Boats
        If you have a lot of data, then you don’t need a lot of methodology

• More data → better results
    – TREC Question Answering
         • Remarkable performance: Google and not much else
              – Norvig (ACL-02)
              – AskMSR (SIGIR-02)
    – Lexical Acquisition
         • Google Sets
              – We tried similar things
                   » but with tiny corpora
                   » which we called large




                                          Applications                           Don’t worry; Be happy
      • What good is word sense disambiguation (WSD)?
            – Information Retrieval (IR)                                         5 Ian Andersons
                  • Salton: Tried hard to find ways to use NLP to help IR
                        – but failed to find much (if anything)
                  • Croft: WSD doesn’t help because IR is already using those methods
                  • Sanderson (next two slides)
            – Machine Translation (MT)
                  • Original motivation for much of the work on WSD
                  • But IR arguments may apply just as well to MT
      • What good is POS tagging? Parsing? NLP? Speech?
      • Commercial Applications of Natural Language Processing, CACM 1995
            – $100M opportunity (worthy of government/industry’s attention)
                  1. Search (Lexis-Nexis)
                  2. Word Processing (Microsoft)                                 ALPAC
      • Warning: premature commercialization is risky
                  Sanderson (SIGIR-94)
       http://dis.shef.ac.uk/mark/cv/publications/papers/my_papers/SIGIR94.pdf

                 • Could WSD help IR?                                   Not much?
                 • Answer: no
                     – Introducing ambiguity by pseudo-words doesn’t hurt (much)

   [Figure: F (y-axis) vs. Query Length (Words) (x-axis); callout: “5 Ian Andersons”]

                  Short queries matter most, but hardest for WSD
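The slide only says that ambiguity was introduced "by pseudo-words". As a rough sketch of the general idea, and not a reproduction of the experimental setup in the paper, two unrelated words can be conflated into a single artificial token in both documents and queries, so that the token is ambiguous by construction. The helper names and the word pairing below are hypothetical.

```python
def make_pseudoword_map(groups):
    """Map every word in a group to one shared, artificially ambiguous token."""
    mapping = {}
    for group in groups:
        token = "/".join(group)  # e.g. ("banana", "door") -> "banana/door"
        for word in group:
            mapping[word] = token
    return mapping


def add_ambiguity(tokens, mapping):
    """Replace each covered word with its pseudo-word; leave the rest alone."""
    return [mapping.get(t, t) for t in tokens]


# Hypothetical pairing of two unrelated words, chosen only for illustration.
mapping = make_pseudoword_map([("banana", "door")])

doc = "open the door and eat a banana".split()
query = ["banana"]

ambiguous_doc = add_ambiguity(doc, mapping)
ambiguous_query = add_ambiguity(query, mapping)
```

After conflation the query token matches both the original "banana" and the unrelated "door" occurrences, so comparing retrieval quality before and after gives a controlled measure of how much the added ambiguity costs.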
                  Sanderson (SIGIR-94)
       http://dis.shef.ac.uk/mark/cv/publications/papers/my_papers/SIGIR94.pdf

                • Resolving ambiguity badly is worse than not resolving at all        Soft WSD?
                    – 75% accurate WSD degrades performance
                    – 90% accurate WSD: breakeven point

   [Figure: F (y-axis) vs. Query Length (Words) (x-axis)]
                            An example of Error Analysis/Representation
                 Some Promising Suggestions
       (Generate lots of conference papers, but may not support the field)

 • Two Languages are Better than One
          – For many classic hard NLP problems
                 • Word Sense Disambiguation (WSD)
                 • PP-attachment
                 • Conjunction
                 • Predicate-argument relationships
                 • Japanese and Chinese Word breaking
          – Parallel corpora → plenty of annotated (labeled) testing and training data
          – Don’t need unsupervised magic (data >> magic)

 • Demonstrate that NLP is good for something
          – Statistical methods (IR & WSD) focus on bags of nouns,
                 • Ignoring verbs, adjectives, predicates, intensifiers, etc.
          – Hypothesis: Ignored because perceptrons can’t model XOR
          – Task: classify “comments” into “good,” “bad” and “neutral”
                 • Lots of terms associated with just one category
                 • Some associated with two
                        – Depending on argument
                 • Good & Bad, but not neutral: Mickey Mouse, Rinky Dink
                        – Bad: Mickey Mouse(us)
                        – Good: Mickey Mouse(them)
          – Current IR/WSD methods don’t capture predicate-argument relationships

Senseval++
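The XOR point above can be made concrete with a small sketch. It assumes two binary features per comment: whether the term is negative (like "Mickey Mouse") and whether its argument is "them" rather than "us". The positive term "great" and the exact feature encoding are hypothetical additions for illustration; the slide gives only the Mickey Mouse examples. With bag-style features alone, a plain perceptron cannot fit the four cases, while a single conjunction feature (term paired with argument) makes them separable.

```python
def train_perceptron(examples, epochs=1000):
    """Plain perceptron. Returns (weights, bias, converged)."""
    n = len(examples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified (or on the boundary): update
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return w, b, True
    return w, b, False


# (term_is_negative, argument_is_them) -> +1 good, -1 bad
# "great" is a hypothetical positive term; only Mickey Mouse is from the slide.
cases = [
    ((1, 0), -1),  # Mickey Mouse(us):   bad
    ((1, 1), +1),  # Mickey Mouse(them): good
    ((0, 0), +1),  # great(us):          good (assumed)
    ((0, 1), -1),  # great(them):        bad  (assumed)
]

# Bag-style features: the two indicators on their own.
bag = [(list(x), y) for x, y in cases]
# Same features plus one conjunction: negative term AND argument is "them".
conj = [([x[0], x[1], x[0] * x[1]], y) for x, y in cases]

_, _, bag_converged = train_perceptron(bag)
w, b, conj_converged = train_perceptron(conj)
```

Here `bag_converged` stays false because the labels are an XOR-style function of the two indicators, while `conj_converged` becomes true once the predicate-argument pairing is represented as its own feature.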
                             Bragging Rights
                    Supervision >> Magic > Baseline

   [Figure: English Lexical Sample (fine-grained scoring); bar chart of Precision and Recall (0% to 90%) per system; groups: Unsupervised, Supervised, Baseline; callouts: Magic, Supervision, Baseline]
   http://www.sle.sharp.co.uk/senseval2/Results/all_graphs.xls

  Breakdown by
 Systems & Words
• Spelling correction task
         – Golding & Schabes (1996)
• Some methods work
  better on some words
         – and other methods work
           better on other words
• Should breakdown
  Senseval results by both
  systems and words
• Discover opportunities for
  hybrids across systems
• Error analysis
         – POS distinctions (easy)
         – Local context (trigrams)
         – Larger contexts (IR)
• Marketing & Sales
         – Scores going up and up → Funding goes up and up
         – Rising tide lifts all boats
• Shared learnings
         – What works and what doesn’t?
         – Compare and contrast
         – Error analysis
• Benchmarking:
         – How hard are various problems?
         – What makes problems easier or harder?
         – Rate of progress?
• Not bragging rights:
         – Mirror, mirror on the wall, who’s the smartest of them all…


English Lexical Sample (fine-grained scoring)
[Figure: bar chart of scores by system, grouped as Unsupervised, Supervised, and Baseline; vertical axis from 0% to 90%.]
                                                                                                                                                                                                                                                                                                      Recall




                                                                                                                                                                        s e Le om
                                                                                                                                                                              l in sk                                                 0.1
                                                                                                                                                                                  e           D                                          83
                                                                                                                                                                                      R ef
                                                                                                                                                                                                                                                                                                      Precision




                                                                                                                                                                                        an                                          0.1
                                                                                                                                                                                              do                                       63
                                                                                                                                                                                                  m
                                                                                                                                                                                                                                                                                                                                                                          Goals of Shared Evaluations




                                                                                                                                                                                                                                 0.1
                                                                                                                                                                                                                                    41
45
                           Outline
• We’re making consistent progress, or
• We’re running around in circles, or
         – Don’t worry; be happy
• We’re going off a cliff…

                 According to unnamed sources:
                Speech Winter → Language Winter
                   Dot Boom → Dot Bust
July 25, 2004               EMNLP-2004 & Senseval-2004   46
    Early Warning Signs for Future
• Senseval feels the need to demonstrate applications of their stuff
  (and maybe there aren’t any)
• Complacency (don’t worry; be happy)
         – Too little dissent: students aren’t rebelling against their teachers
         – I get uncomfortable when
                • There is so much agreement on what to do and so much optimism
                • And so few worries and so little dissent/controversy.
• Mindless Metrics
         – Whatever you measure, you get…
         – Scores go up and up and up, but are we really doing better?
                • According to the scores, parsing is doing well without words,
                • But you can’t solve classic problems (PPs) without words!
• Burdensome Methodology → Exclusiveness
         – Can’t play (in speech) unless you work in a big lab
• Following Speech off a Cliff
         – Empirical methods: Speech → Language. Been great, but…
         – Speech Winter → Language Winter (Dot Boom → Dot Bust)
         – What goes up, (usually) comes down…
(Callouts: Kuhn Crisis; Campbell (ACL-04): Rules >> ML)
         Sample of 20 Survey Questions
                (Strong Emphasis on Applications)
• When will
         – More than 50% of new PCs have dictation on them, either at
           purchase or shortly after.
         – Most telephone Interactive Voice Response (IVR) systems
           accept speech input.
         – Automatic airline reservation by voice over the telephone is the
           norm.
         – TV closed-captioning (subtitling) is automatic and pervasive.
         – Telephones are answered by an intelligent answering machine
           that converses with the calling party to determine the nature and
           priority of the call.
         – Public proceedings (e.g., courts, public inquiries, parliament,
           etc.) are transcribed automatically.
• Two surveys of ASRU attendees: 1997 & 2003

2003 Responses ≈ 1997 Responses + 6 Years
    (6 years of hard work → No progress)




            Top Ten Metrics of Success
                (Risky to Promise Apps and Fail to Deliver)
1.         Value Creation (Reality)
2.         Stock Prices (Belief)
3.         Startup Companies Raise Venture Capital (Excitement)
4.         Prototype Applications (Plausibility)
5.         Grand-Students (Survive the Test of Time)
6.         Students Get Jobs
7.         Students Finish PhD Theses
8.         Citations
9.         Conference Registrations
10.        Publications (Quantity)
(Callouts beside the list: Search near 1; Speech near 2; “Senseval wants to be here” near 4-6; “We are here” near 6-8)


                           Wrong Apps?
• New Priorities
         – Increase demand for space >> Data entry
• New Killer Apps
         – Search >> Dictation
                • Speech Google!
         – Data mining
• Old Priorities
         – Dictation app dates back to days of dictation machines
         – Speech recognition has not displaced typing
                • Speech recognition has improved
                • But typing skills have improved even more
                    – My son will learn typing in 1st grade
                    – Secretaries rarely take dictation
         – Dictation machines are history
                • My son may never see one
                • Museums have slide rules and steam trains
                    – But dictation machines?



                                   Speech Data Mining
                                     & Call Centers:
                                             An Intelligence Bonanza
                 • Some companies are collecting
                   information with technology
                   designed to monitor incoming calls
                   for service quality.
                 • Last summer, Continental Airlines
                   Inc. installed software from
                   Witness Systems Inc. to monitor
                   the 5,200 agents in its four
                   reservation centers.
                 • But the Houston airline quickly
                   realized that the system, which
                   records customer phone calls and
                   information on the responding
                   agent's computer screen, also was
                   an intelligence bonanza, says
                   André Harris, reservations training
                   and quality-assurance director.
                  Speech Data Mining
  • Label calls as success or failure based on
    some subsequent outcome (sale/no sale)
  • Extract features from speech
  • Find patterns of features that can be used
    to predict outcomes
  • Hypotheses:
            – Customer: “I’m not interested” → no sale
            – Agent: “I just want to tell you…” → no sale

        Inter-ocular effect (hits you between the eyes);
Don’t need a statistician to know which way the wind is blowing
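The label / extract / predict recipe on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the call records are invented toy data, the "features" are simply phrases assumed to have been spotted by a recognizer, and `no_sale_rate_by_feature` is a hypothetical helper name, not anything from the talk.

```python
from collections import Counter

# Toy, invented call records: (phrases spotted in the call, outcome).
# Real input would come from a speech recognizer plus sales records.
CALLS = [
    ({"customer: i'm not interested"}, "no sale"),
    ({"agent: i just want to tell you"}, "no sale"),
    ({"customer: i'm not interested", "agent: i just want to tell you"}, "no sale"),
    ({"customer: how much is it"}, "sale"),
    ({"customer: how much is it", "agent: i just want to tell you"}, "sale"),
    (set(), "sale"),
]

def no_sale_rate_by_feature(calls):
    """Map each feature to (calls containing it, fraction that ended in no sale)."""
    seen = Counter()
    failed = Counter()
    for features, outcome in calls:
        for feature in features:
            seen[feature] += 1
            if outcome == "no sale":
                failed[feature] += 1
    return {f: (seen[f], failed[f] / seen[f]) for f in seen}

rates = no_sale_rate_by_feature(CALLS)
# List features from most to least associated with a lost sale.
for feature, (count, rate) in sorted(rates.items(), key=lambda kv: -kv[1][1]):
    print(f"{feature!r}: {count} calls, {rate:.0%} no sale")
```

With enough calls, a table like this is where an inter-ocular effect would show up: a phrase whose no-sale rate sits far from the overall rate needs no further statistics to be worth a look.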
         Ways for Conferences to Fail
 • Incrementalism/Burdensome Methodology (Lesson from 1950s)
           – We do research for fun and profit – Arno Penzias
           – Fun and/or Profit >> By-the-Book Correctness
 • Arrogance, Mindless Metrics, etc.
 • Control
           – Too much control
                • Excessive Exclusiveness (mutual admiration society/old-boy network)
                • Change (serendipity) is essential: New and Different → Fun and Excitement
                • Growth and prosperity depend on new talent (students) & new topics
                • Can’t afford to keep doing what we already know how to do
           – Too little control
                • Stay on msg: It’s data, stupid! (Our msg ≠ ACL’s)
 • Set Inappropriate Expectations
           – Promise too little (rarely a problem, especially with thesis proposals)
                • Senseval feels the need to become more applied
           – Promise too much: Promise Applications and Fail to Deliver
           – Success/Catastrophe (rarely a problem, except for March of Dimes)
                • What if we actually achieved all our goals?
    Ways for Conferences to Succeed
•               I wish I knew…
•               Fate (can’t fail)
         –        Rising Tide of Data Lifts All Boats
•               Luck/timing: WVLC-93 was just before Web
•               Sales & Marketing
         –        Evaluation, Evaluation, Evaluation
•               Strategic Vision
         –        In retrospect, 1993 WVLC worked wonderfully
         –        Distinguished us from mainstream
         –        Offered excitement and hope for future
                  •   Especially appealing to students (growth opportunity)


Borrowed Slide: Jelinek (LREC)

       Great Challenge: Annotating Data
   (Callouts: Great Strategy → Success; Self-organizing “Magic” ≠ Error Analysis)
   • Produce annotated data with minimal
     supervision
   • Active learning
            – Identify reliable labels
            – Identify best candidates for annotation
   • Co-training
   • Bootstrap (project) resources from one
     application to another
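One common reading of the two active-learning sub-bullets is uncertainty sampling: trust the labels a model is confident about, and send the least confident items to a human. The sketch below assumes that reading; it is not taken from Jelinek's slide. The function name, the 0.9 threshold, the budget and the scores are all hypothetical.

```python
def triage_for_annotation(predictions, reliable_at=0.9, budget=2):
    """Split items into auto-labeled ones and candidates for human annotation.

    predictions maps each item to P(positive) from some existing classifier
    (hypothetical input; any probabilistic model would do).
    """
    # Confidence in the more likely of the two labels.
    confidence = {item: max(p, 1.0 - p) for item, p in predictions.items()}

    # "Identify reliable labels": accept the model's label when it is confident.
    reliable = {
        item: predictions[item] >= 0.5
        for item, conf in confidence.items()
        if conf >= reliable_at
    }

    # "Identify best candidates for annotation": least confident items first.
    uncertain = sorted(
        (item for item in predictions if item not in reliable),
        key=lambda item: confidence[item],
    )
    return reliable, uncertain[:budget]

# Invented scores, for illustration only.
scores = {"a": 0.97, "b": 0.52, "c": 0.05, "d": 0.40, "e": 0.70}
reliable, to_annotate = triage_for_annotation(scores)
print(reliable)      # items labeled automatically
print(to_annotate)   # items to hand to an annotator
```

The threshold trades annotation cost against label noise: raising `reliable_at` sends more items to humans and admits fewer wrong automatic labels.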


                  Grand Challenges
  ftp://ftp.cordis.lu/pub/ist/docs/istag040319-draftnotesofthemeeting.pdf




  Roadmaps: Structure of a Strategy
 (not the union of what we are all doing)
• Goals
         – Example: Replace keyboard with microphone
         – Exciting (memorable) sound bite
         – Broad grand challenge that we can work toward but never solve
• Metrics
         – Examples:
                • WER: word error rate
                • Time to perform task
         – Easy to measure
• Milestones
         – Should be no question if it has been accomplished
         – Example: reduce WER on task x by y% by time t
• Accomplishments v. Activities
         – Accomplishments are good
         – Activity is not a substitute for accomplishments
         – Milestones look forward whereas accomplishments look backward
                • Serendipity is good!
• Small is beautiful
         – Quantity is not a good thing
         – Awareness
         – 1-slide version
                • if successful, you get maybe 3 more slides
• Size of container
         – Goal: 1-3
         – Metrics: 3
         – Milestones: a dozen
                • Mostly for next year: Q1-4
                • Plus some for years 2, 5, 10 & 20
         – Accomplishments: a dozen
• Broad applicability & illustrative
         – Don’t cover everything
         – Highlight stuff that
                • Applies to multiple groups
                • Forward-Looking / Exciting
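The WER metric and the milestone "reduce WER on task x by y% by time t" are easy to make concrete. The sketch below uses the standard definition of WER (word-level edit distance divided by the number of reference words). It assumes the y% reduction is relative to a baseline WER, which the slide does not specify; `milestone_met` is a hypothetical helper.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or free match
            ))
        prev = cur
    return prev[-1] / len(ref)

def milestone_met(baseline_wer, current_wer, y_percent):
    """True if WER fell by at least y_percent relative to the baseline (assumed reading)."""
    return current_wer <= baseline_wer * (1 - y_percent / 100)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
print(milestone_met(baseline_wer=0.20, current_wer=0.14, y_percent=25))
```

A check like `milestone_met` is what makes a milestone satisfy "should be no question if it has been accomplished": the answer is a boolean, not a discussion.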
                    Grand Challenges
Goals:
         1. The multilingual companion
         2. Life log
[Diagram labels: Resources; Apps & Techniques; Evaluation; €€€]
• Resources
         – Goal: Reduce barriers to entry
• Apps & Techniques
         – Goal: Produce NLP apps that improve the way people communicate with one another
• Evaluation
                         Summary: What Worked
                           and What Didn’t?
• Data
         – Stay on msg: It is the data, stupid!
                • WVLC (Very Large) >> EMNLP (Empirical Methods)
                • If you have a lot of data,
                    – Then you don’t need a lot of methodology
                • Rising Tide of Data Lifts All Boats
• Methodology
         – Empiricism means different things to different people
                1. Machine Learning (Self-organizing Methods)
                2. Exploratory Data Analysis (EDA)
                3. Corpus-Based Lexicography
         – Lots of papers on 1
                • EMNLP-2004 theme (error analysis) → 2
                • Senseval grew out of 3
(Callouts: What’s the right answer? There’ll be a quiz at the end of the decade…; Substance: Recommended if…; Magic: Recommended if…; Promise: Recommended if…; Short term ≠ Long term; Lonely)
Backup
                Speech → Language
• Been great so far,
         – But too much of a good thing…
• Take the good




                                          Fire
• Fuel
         – Infrastructure: Shared datasets and lexical resources
                • WordNet, LDC, the Web
         – Organizers
                • Walker & Zampolli
         – Funding
                • DARPA (Charles Wayne), EU…
• Sparks
         – Exciting Applications (The Web)
         – Grand Challenges
         – Leaders: Jelinek, Mercer, Miller, Kucera & Francis,
           Leech, Sinclair, Tukey, Liberman…


• Hi Ken,

• Rada probably has more to add, but obviously we would
  like to hear something about WSD or word senses. We
  are currently trying to move Senseval to include
  application-specific evaluations (eg within MT or IR, or in
  specialized domains) and to more general semantic
  analysis of text (eg frames or subcats). Something to
  inspire people in this direction would be great.

• Phil.


                 Organizational Innovations
                          (Radical → Mainstream)
• Late Submission Deadline
         – Immediately after ACL notifications
                • ACL was rejecting good papers for bad reasons
         – Short review cycles → Freshness
• Invest in the Future: Encourage Innovation
         – Chair (Energetic, Promising, Source of new ideas)
         – Co-chair (Established, Knows how it has been done)
         (Callouts: Innovation; Checks & Balances)
• Inclusiveness:
         – Thankless Chores → Marketing Carrots (Maximize # of reviewers)
         – Balance program committee, reviewers (and hopefully submissions,
           acceptances and registrations):
                • 1/3 stability, 1/3 promising, 1/3 outreach
                • Diversity: experience, gender, geography, topic
         – Hold conferences in Europe, Asia & America
                • Huge potential market in Asia: 4 out of 5 jumbo jets
         – Maintain 20-25% acceptance rate → Parallel Sessions & Posters
• Avoid incremental papers
         – Average grades (low grade dominates) → Advocate + Second

								