Bootstrapping Bootstrapping

Document Sample
Bootstrapping Bootstrapping Powered By Docstoc
					Bootstrapping

  April 3 2007
 William Cohen
Prehistory
   Karl Friedrich Hieronymus, Freiherr von
   Münchhausen (11 May 1720 – 22 February
   1797) was a German baron who in his youth
   was sent to serve as page to Anthony Ulrich II,
   Duke of Brunswick-Lüneburg and later joined
   the Russian military. He served until 1750, in
   particular taking part in two campaigns against
   the Turks. Returning home, Münchhausen
   supposedly told a number of outrageous tall
   tales about his adventures. The Baron was
   born in Bodenwerder and died there as well.

   According to the stories, as retold by others, the
   Baron's astounding feats included riding
   cannonballs, travelling to the Moon, and
   escaping from a swamp by pulling himself up
   by his own hair. … In later versions he was
   using his own boot straps to pull himself
   out of the sea. [Wikipedia]
Prehistory
  “Bob Wilson is desperately trying to finish his
  doctoral thesis and has locked himself in his room
  in a marathon attempt to do so. His typewriter jams,
  and as he unjams it he hears someone say "Don't
  bother, it's hogwash anyway." The thesis, in fact,
  deals with time travel. The interloper is a man who
  seems strangely familiar, and might be recognizable
  without the two-day growth of beard and the black
  eye. …”

  “In computing, bootstrapping refers to a process
  where a simple system activates another more
  complicated system that serves the same purpose.
  It is a solution to the Chicken-and-egg problem of
  starting a certain system without the system already
  functioning. The term is most often applied to the
  process of starting up a computer, in which a
  mechanism is needed to execute the software
  program that is responsible for executing software
  programs …” [Wikipedia]
    Some more recent history - 1
                                     Idea: write some specific patterns that
                                        indicate A is a kind of B:
                                     1. … such NP as NP (“at such schools
                                        as CMU, students rarely need
                                        extensions”)
                                     2. NP, NP, or other NP (“William,
                                        Carlos or other machine learning
                     [Coling 1992]      professors”)
                                     3. NP including NP (“struggling teams
                                        including the Pirates”)
Results: 8.6M words of Grolier‟s
encyclopedia  7067 pattern          4. NP, especially NP (prestigious
instances  152 relations               conferences, especially NIPS)
Many were not in WordNet.
                    Some history – 2a
                                            Idea: exploit “pattern/relation duality”:
                                            1. Start with some seed instances of
                                               (author,title) pairs (“Isaac Asimov”,
                                               “The Robots of Dawn”)
                                            2. Look for occurrences of these pairs
                                               on the web.
                    [some workshop, 1998]   3. Generate patterns that match the
                                               seeds.
Unlike Hearst, Brin learned the patterns;
and learned very high-precision, easy-          - URLprefix, prefix, middle, suffix
to-match patterns.                          4. Extract new (author, title) pairs that
                                               match the patterns.
Result: 24M web pages + 5 books 
199 occurrences  3 patterns  4047         5. Go to 2.
occurrences + 5M pages  3947
occurrences  105 patterns  …
15,257 books *with some manual tweaks
                 Some history – 2b
Instances     Occurrences    Patterns
                                        Idea: exploit “pattern/relation duality”:
                                        1. Start with some seed instances of
                                           (author,title) pairs (“Isaac Asimov”,
                                           “The Robots of Dawn”)
                                        2. Look for occurrences of these pairs
                                           on the web.
                                        3. Generate patterns that match the
                                           seeds.
                                            - URLprefix, prefix, middle, suffix
                                        4. Extract new (author, title) pairs that
                                           match the patterns.
Result: 24M web pages + 5 books 
199 occurrences  3 patterns  4047     5. Go to 2.
occurrences + 5M pages  3947
occurrences  105 patterns  …
15,257 books *with some manual tweaks
Some history – 3



     [COLT 98]
                 Some history – 3b
Instances     Occurrences     Patterns   Instances/Occurrences   Patterns




How to filter out “bad” instances,
occurrences, patterns?
                     Bootstrapping
Hearst „92                                Deeper linguistic features, free text…




             BM‟98
                      Learning, semi-supervised learning, dual feature spaces…




         Brin‟98
                             Scalability, surface patterns, use of web crawlers…
                       Bootstrapping
Hearst „92                                      Deeper linguistic features, free text…

                                 Boosting-based co-train method using content &
         Collins & Singer „99    context features; context based on Collins‟ parser;
                                 learn to classify three types of NE
             BM‟98
                            Learning, semi-supervised learning, dual feature spaces…




         Brin‟98
                                   Scalability, surface patterns, use of web crawlers…
                        Bootstrapping
Hearst „92                                       Deeper linguistic features, free text…

                     Riloff & Jones „99   Hearst-like patterns, Brin-like
         Collins & Singer „99             bootstrapping (+ “meta-level”
                                          bootstrapping) on MUC data
             BM‟98
                             Learning, semi-supervised learning, dual feature spaces…




         Brin‟98
                                    Scalability, surface patterns, use of web crawlers…
                         Bootstrapping
Hearst „92                                        Deeper linguistic features, free text…

                     Riloff & Jones „99
              Collins & Singer „99

             BM‟98
                              Learning, semi-supervised learning, dual feature spaces…

                                                     EM like co-train method with
               Cucerzan & Yarowsky „99             context & content both defined
                                                           by character-level tries
         Brin‟98
                                     Scalability, surface patterns, use of web crawlers…
                          Bootstrapping
Hearst „92                                         Deeper linguistic features, free text…

                      Riloff & Jones „99                   Stevenson &
                                                            Greenwood
               Collins & Singer „99              …                2005
                                                                           Rosenfeld
                                                                       and Feldman
             BM‟98
                            Learning, semi-supervised learning, dual feature spaces…
                                                                                2006
                                                                       Etzioni et
                                                   …                     al 2005
               Cucerzan & Yarowsky „99
         Brin‟98
                                      Scalability, surface patterns, use of web crawlers…

                 De-emphasize duality, focus on
                     distance between patterns.
         Stevenson & Greenwood
Instances/Occurrences     Patterns
                                        Patterns


                                                   Pattern-
                                                   pattern-
                                                   from is
                                                   semantic
                                                   similarity
                                                   (Wordnet)
                  Flow from pattern-
                   pattern depends
                     on empirical
                     similarity (i.e.
                      overlapping
                    occurrences in
                        corpus)
                          Bootstrapping
Hearst „92                                         Deeper linguistic features, free text…

                      Riloff & Jones „99                   Stevenson &
                                                            Greenwood
               Collins & Singer „99              …                2005
                                                                           Rosenfeld
                                                                       and Feldman
             BM‟98
                            Learning, semi-supervised learning, dual feature spaces…
                                                                                2006
                                                                       Etzioni et
                                                   …                     al 2005
               Cucerzan & Yarowsky „99
         Brin‟98
                                      Scalability, surface patterns, use of web crawlers…


                                                           Clever idea for learning
                                                         relation patterns & strong
                                                               experimental results
           Rosenfeld & Feldman
• Instances  Occurrences as
  before.
• Vary “positive” occurrences to
  get near-miss “negative”
  occurrences, using asymmetry,
  disjointness, etc.
• Learn patterns in a
  (moderately) expressive but
  easy-to-match language (NPs
  from OpenNLP).
Know It All
                       Architecture

Set of predicates to consider + two names for each


  ~= [H92]
Architecture
                Bootstrapping - 1




1. Submit the queries & apply the rules  initial seeds.
2. Evaluate each seed with each discriminator U: e.g., compute
   PMI stats like: |hits(“city Boston”)| / |hits(“Boston”)|
3. Take the top seeds from each class and call them POSITIVE
   and use disjointness, etc to find NEGATIVE seeds.
4. Train a NaiveBayes classifier using thresholded U‟s as features.
Bootstrapping - 2

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:4
posted:11/27/2011
language:English
pages:21