Bootstrapping
April 3 2007
William Cohen
Prehistory
Karl Friedrich Hieronymus, Freiherr von
Münchhausen (11 May 1720 – 22 February
1797) was a German baron who in his youth
was sent to serve as page to Anthony Ulrich II,
Duke of Brunswick-Lüneburg and later joined
the Russian military. He served until 1750, in
particular taking part in two campaigns against
the Turks. Returning home, Münchhausen
supposedly told a number of outrageous tall
tales about his adventures. The Baron was
born in Bodenwerder and died there as well.
According to the stories, as retold by others, the
Baron's astounding feats included riding
cannonballs, travelling to the Moon, and
escaping from a swamp by pulling himself up
by his own hair. … In later versions he was
using his own boot straps to pull himself
out of the sea. [Wikipedia]
Prehistory
“Bob Wilson is desperately trying to finish his
doctoral thesis and has locked himself in his room
in a marathon attempt to do so. His typewriter jams,
and as he unjams it he hears someone say "Don't
bother, it's hogwash anyway." The thesis, in fact,
deals with time travel. The interloper is a man who
seems strangely familiar, and might be recognizable
without the two-day growth of beard and the black
eye. …”
“In computing, bootstrapping refers to a process
where a simple system activates another more
complicated system that serves the same purpose.
It is a solution to the Chicken-and-egg problem of
starting a certain system without the system already
functioning. The term is most often applied to the
process of starting up a computer, in which a
mechanism is needed to execute the software
program that is responsible for executing software
programs …” [Wikipedia]
Some more recent history - 1
Idea: write some specific patterns that
indicate A is a kind of B:
1. … such NP as NP (“at such schools
as CMU, students rarely need
extensions”)
2. NP, NP, or other NP (“William,
Carlos or other machine learning
[Coling 1992] professors”)
3. NP including NP (“struggling teams
including the Pirates”)
Results: 8.6M words of Grolier‟s
encyclopedia 7067 pattern 4. NP, especially NP (prestigious
instances 152 relations conferences, especially NIPS)
Many were not in WordNet.
Some history – 2a
Idea: exploit “pattern/relation duality”:
1. Start with some seed instances of
(author,title) pairs (“Isaac Asimov”,
“The Robots of Dawn”)
2. Look for occurrences of these pairs
on the web.
[some workshop, 1998] 3. Generate patterns that match the
seeds.
Unlike Hearst, Brin learned the patterns;
and learned very high-precision, easy- - URLprefix, prefix, middle, suffix
to-match patterns. 4. Extract new (author, title) pairs that
match the patterns.
Result: 24M web pages + 5 books
199 occurrences 3 patterns 4047 5. Go to 2.
occurrences + 5M pages 3947
occurrences 105 patterns …
15,257 books *with some manual tweaks
Some history – 2b
Instances Occurrences Patterns
Idea: exploit “pattern/relation duality”:
1. Start with some seed instances of
(author,title) pairs (“Isaac Asimov”,
“The Robots of Dawn”)
2. Look for occurrences of these pairs
on the web.
3. Generate patterns that match the
seeds.
- URLprefix, prefix, middle, suffix
4. Extract new (author, title) pairs that
match the patterns.
Result: 24M web pages + 5 books
199 occurrences 3 patterns 4047 5. Go to 2.
occurrences + 5M pages 3947
occurrences 105 patterns …
15,257 books *with some manual tweaks
Some history – 3
[COLT 98]
Some history – 3b
Instances Occurrences Patterns Instances/Occurrences Patterns
How to filter out “bad” instances,
occurrences, patterns?
Bootstrapping
Hearst „92 Deeper linguistic features, free text…
BM‟98
Learning, semi-supervised learning, dual feature spaces…
Brin‟98
Scalability, surface patterns, use of web crawlers…
Bootstrapping
Hearst „92 Deeper linguistic features, free text…
Boosting-based co-train method using content &
Collins & Singer „99 context features; context based on Collins‟ parser;
learn to classify three types of NE
BM‟98
Learning, semi-supervised learning, dual feature spaces…
Brin‟98
Scalability, surface patterns, use of web crawlers…
Bootstrapping
Hearst „92 Deeper linguistic features, free text…
Riloff & Jones „99 Hearst-like patterns, Brin-like
Collins & Singer „99 bootstrapping (+ “meta-level”
bootstrapping) on MUC data
BM‟98
Learning, semi-supervised learning, dual feature spaces…
Brin‟98
Scalability, surface patterns, use of web crawlers…
Bootstrapping
Hearst „92 Deeper linguistic features, free text…
Riloff & Jones „99
Collins & Singer „99
BM‟98
Learning, semi-supervised learning, dual feature spaces…
EM like co-train method with
Cucerzan & Yarowsky „99 context & content both defined
by character-level tries
Brin‟98
Scalability, surface patterns, use of web crawlers…
Bootstrapping
Hearst „92 Deeper linguistic features, free text…
Riloff & Jones „99 Stevenson &
Greenwood
Collins & Singer „99 … 2005
Rosenfeld
and Feldman
BM‟98
Learning, semi-supervised learning, dual feature spaces…
2006
Etzioni et
… al 2005
Cucerzan & Yarowsky „99
Brin‟98
Scalability, surface patterns, use of web crawlers…
De-emphasize duality, focus on
distance between patterns.
Stevenson & Greenwood
Instances/Occurrences Patterns
Patterns
Pattern-
pattern-
from is
semantic
similarity
(Wordnet)
Flow from pattern-
pattern depends
on empirical
similarity (i.e.
overlapping
occurrences in
corpus)
Bootstrapping
Hearst „92 Deeper linguistic features, free text…
Riloff & Jones „99 Stevenson &
Greenwood
Collins & Singer „99 … 2005
Rosenfeld
and Feldman
BM‟98
Learning, semi-supervised learning, dual feature spaces…
2006
Etzioni et
… al 2005
Cucerzan & Yarowsky „99
Brin‟98
Scalability, surface patterns, use of web crawlers…
Clever idea for learning
relation patterns & strong
experimental results
Rosenfeld & Feldman
• Instances Occurrences as
before.
• Vary “positive” occurrences to
get near-miss “negative”
occurrences, using asymmetry,
disjointness, etc.
• Learn patterns in a
(moderately) expressive but
easy-to-match language (NPs
from OpenNLP).
Know It All
Architecture
Set of predicates to consider + two names for each
~= [H92]
Architecture
Bootstrapping - 1
1. Submit the queries & apply the rules initial seeds.
2. Evaluate each seed with each discriminator U: e.g., compute
PMI stats like: |hits(“city Boston”)| / |hits(“Boston”)|
3. Take the top seeds from each class and call them POSITIVE
and use disjointness, etc to find NEGATIVE seeds.
4. Train a NaiveBayes classifier using thresholded U‟s as features.
Bootstrapping - 2