Unsupervised Information Extraction Approach Using Graph Mutual
Reinforcement
Hany Hassan Ahmed Hassan Ossama Emam
IBM Cairo Technology Development Center
Giza, Egypt
P.O. Box 166 Al-Ahram
hanyh@eg.ibm.com hasanah@eg.ibm.com emam@eg.ibm.com

Abstract

Information Extraction (IE) is the task of extracting knowledge from unstructured text. We present a novel unsupervised approach for information extraction based on graph mutual reinforcement. The proposed approach does not require any seed patterns or examples. Instead, it depends on redundancy in large data sets and graph-based mutual reinforcement to induce generalized “extraction patterns”. The proposed approach has been used to acquire extraction patterns for the ACE (Automatic Content Extraction) Relation Detection and Characterization (RDC) task. ACE RDC is considered a hard task in information extraction due to the absence of large amounts of training data and inconsistencies in the available data. The proposed approach achieves performance comparable to that of supervised techniques trained on a reasonable amount of data.

1 Introduction

In this paper we propose a novel and completely unsupervised approach for information extraction. We present a general technique; however, we focus on relation extraction as an important task of information extraction. The approach depends on constructing generalized extraction patterns, which could match many instances, and deploys graph-based mutual reinforcement to weight the importance of these patterns. The mutual reinforcement is used to automatically identify the most informative patterns, where patterns that match many instances tend to be correct. Similarly, instances matched by many patterns tend to be correct. The intuition is that large unsupervised data is redundant, i.e. different instances of information could be found many times in different contexts and in different representations. The problem can therefore be seen as a hubs (instances) and authorities (patterns) problem, which can be solved using the Hypertext Induced Topic Selection (HITS) algorithm (Kleinberg, 1998).

HITS is an algorithmic formulation of the notion of authority in web page link analysis, based on a relationship between a set of relevant “authoritative pages” and a set of “hub pages”. The HITS algorithm benefits from the following observation: when a page (hub) links to another page (authority), the former confers authority on the latter.

By analogy to the authoritative web pages problem, we could represent the patterns as authorities and the instances as hubs, and use mutual reinforcement between patterns and instances to weight the most authoritative patterns. Highly weighted patterns are then used in extracting information.

The proposed approach does not need any seeds or examples. Human involvement is only needed in determining the entities of interest, i.e. the entities among which we are seeking relations.

The paper proceeds as follows: in Section 2 we discuss previous work, followed by a brief definition of our general notation in Section 3. A detailed description of the proposed approach then follows in Section 4. Section 5 discusses the application of the proposed approach to the problem of detecting semantic relations from text. Section 6 discusses experimental results, while the conclusion is presented in Section 7.

2 Previous Work

Most of the previous work on Information Extraction (IE) focused on supervised learning. Relation Detection and Characterization (RDC) was introduced in the Automatic Content Extraction Program (ACE) (ACE, 2004). The approaches proposed for the ACE RDC task, such as kernel methods (Zelenko et al., 2002) and Maximum Entropy methods (Kambhatla, 2004), required the availability of a large set of human-annotated corpora tagged with relation instances. However, human-annotated instances are limited, expensive, and time-consuming to obtain, due to the lack of experienced human annotators and the low inter-annotator agreement.

Some previous work adopted weakly supervised or unsupervised learning approaches. These approaches have the advantage of not needing large tagged corpora, but they need seed examples or seed extraction patterns. The major drawback of these approaches is their dependency on seed examples or seed patterns, which may lead to limited generalization due to the dependency on handcrafted examples. Some of these approaches are summarized here:

(Brin, 1998) presented an approach for extracting authorship information as found in book descriptions on the World Wide Web. This technique is based on dual iterative pattern relation extraction, wherein a relation and pattern set is iteratively constructed. This approach has two major drawbacks: the use of handcrafted seed examples to extract more examples similar to these handcrafted seed examples, and the use of a lexicon as the main source for extracting information.

(Blum and Mitchell, 1998) proposed an approach based on co-training that uses unlabeled data in a particular setting. They exploit the fact that, for some problems, each example can be described by multiple representations.

(Riloff & Jones, 1999) presented the Meta-Bootstrapping algorithm, which uses an unannotated training data set and a set of seeds to learn a dictionary of extraction patterns and a domain-specific semantic lexicon. Other works, such as (Thelen & Riloff, 2002) and (Lin et al., 2003), tried to exploit the duality of patterns and their extractions for the purpose of inferring the semantic class of words.

(Muslea et al., 1999) introduced an inductive algorithm to generate extraction rules based on user-labeled training examples. This approach suffers from the labeled data bottleneck.

(Agichtein and Gravano, 2000) presented an approach using seed examples to generate initial patterns and to iteratively obtain further patterns. Ad-hoc measures were then deployed to estimate the relevancy of the newly obtained patterns. The major drawbacks of this approach are that its dependency on seed examples leads to limited capability of generalization, and that the estimation of pattern relevancy requires the deployment of ad-hoc measures.

(Hasegawa et al., 2004) introduced an unsupervised approach for relation extraction that depends on clustering context words between named entities; this approach depends on ad-hoc context similarity between phrases in the context and focused on certain types of relations.

(Etzioni et al., 2005) proposed a system for building lists of named entities found on the web. Their system uses a set of eight domain-independent extraction patterns to generate candidate facts.

All approaches proposed so far suffer from either requiring a large amount of labeled data or depending on seed patterns (or examples), which results in limited generalization.

3 General Notation

In graph theory, a graph is a set of objects called vertices joined by links called edges. A bipartite graph, also called a bigraph, is a special graph where the set of vertices can be divided into two disjoint sets with no two vertices of the same set sharing an edge.

The Hypertext Induced Topic Selection (HITS) algorithm is an algorithm for rating, and therefore ranking, web pages. The HITS algorithm makes use of the following observation: when a page (hub) links to another page (authority), the former confers authority on the latter. HITS uses two values for each page, the "authority value" and the "hub value". "Authority value" and "hub value" are defined in terms of one another in a mutual recursion. An authority value is computed as the sum of the scaled hub values of the hubs that point to that authority. A hub value is the sum of the scaled authority values of the authorities it points to.

A template, as we define it for this work, is a sequence of generic forms that could generalize
over the given instances. An example template is:

    GPE POS (PERSON)+

    GPE: Geographical Political Entity
    POS: possessive ending
    PERSON: PERSON Entity

This template could match the sentence: “France’s President Jacques Chirac...”. This template is derived from the representation of the Named Entity tags, Part-of-Speech (POS) tags and semantic tags. The choice of the template representation here is for illustration purposes only; any combination of tags, representations and tagging styles might be used.

A pattern is more specific than a template. A pattern specifies the role played by the tags (first entity, second entity, or relation). An example of a pattern is:

    GPE(E2) POS (PERSON)+(E1)

This pattern indicates that the word(s) with the tag GPE in the sentence represent the second entity (Entity 2) in the relation, while the word(s) tagged PERSON represent the first entity (Entity 1) in this relation. The “+” symbol means that the (PERSON) entity is repetitive (i.e. may consist of several tokens).

A tuple, in our notation throughout this paper, is the result of applying a pattern to unstructured text. In the above example, one result of applying the pattern to some raw text is the following tuple:

    Entity 1: Jacques Chirac
    Entity 2: France
    Relation: EMP-Executive

4 The Approach

The unsupervised graph-based mutual reinforcement approach we propose depends on the construction of generalized “extraction patterns” that could match many instances. The patterns are then weighted according to their importance by deploying graph-based mutual reinforcement techniques. The duality between patterns and extracted information (tuples) can be stated as follows: patterns could match different tuples, and tuples in turn could be matched by different patterns. The proposed approach is composed of two main steps, namely initial pattern construction and pattern weighting or induction. Both steps are detailed in the next sub-sections.

4.1 Initial Patterns Construction

As shown in Figure 1, several syntactic, lexical, and semantic analyzers could be applied to the unstructured text. The resulting analyses could be employed in the construction of extraction patterns. It is worth mentioning that the proposed approach is general enough to accommodate any pattern design; the introduced pattern design is for illustration purposes only.

    Text:          American vice President Al Gore said today...
    Entities:      PEOPLE O O PERSON O O...
    POS:           ADJ NOUN_PHRASE NNP VBD CD...
    Tagged Stream: PEOPLE NOUN_PHRASE PERSON VBD CD...

Figure 1: An example of the output of analysers applied to the unstructured text

Initially, we need to start with some templates and patterns to proceed with the induction process. A relatively large amount of text data is tagged with different taggers to produce the previously mentioned pattern styles. An n-gram language model is built on this data and used to construct weighted finite state machines. Paths with low cost (high language model probabilities) are chosen to construct the initial set of templates; the intuition is that paths with low cost (high probability) are frequent and could represent potential candidate patterns.

The resulting initial set of templates is applied to a very large amount of text data to produce all possible patterns. The number of candidate initial patterns could be reduced significantly by specifying the candidate types of entities; for example, we might specify that the first entity could be PERSON or PEOPLE while the second entity could be ORGANIZATION, LOCATION, COUNTRY, etc.

The candidate patterns are then applied to the tagged stream and the unstructured text to collect a set of pairs of patterns and matched tuples.

The following procedure summarizes the Initial Pattern Construction step:
• Select a random set of text data.
• Apply various taggers to the text data and construct template-style data.
• Build an n-gram language model on the template-style data.
• Construct weighted finite state machines from the n-gram language model.
• Choose the n-best paths in the finite state machines.
• Use the best paths as initial templates.
• Apply the initial templates to large text data.
• Construct the initial patterns and the associated tuple sets.

4.2 Pattern Induction

The inherent duality in the relation between patterns and tuples suggests that the problem could be interpreted as a hub authority problem. This problem could be solved by applying the HITS algorithm to iteratively assign authority and hub scores to patterns and tuples respectively.

Figure 2: A bipartite graph representing patterns and tuples

Patterns and tuples are represented by a bipartite graph as illustrated in Figure 2. Each pattern or tuple is represented by a node in the graph. Edges represent matching between patterns and tuples. The pattern induction problem can be formulated as follows: given a very large set of data D containing a large set of patterns P which match a large set of tuples T, the problem is to identify $\tilde{P}$, the set of patterns that match the set of the most correct tuples $\tilde{T}$. The intuition is that the tuples matched by many different patterns tend to be correct, and the patterns matching many different tuples tend to be good patterns. In other words, we want to choose, among the large space of patterns in the data, the most informative, highest-confidence patterns that could identify correct tuples, i.e. choosing the most “authoritative” patterns in analogy with the hub authority problem. However, both $\tilde{P}$ and $\tilde{T}$ are unknown. The induction process proceeds as follows: each pattern p in P is associated with a numerical authority weight $a_p$ which expresses how many tuples match that pattern. Similarly, each tuple t in T has a numerical hub weight $h_t$ which expresses how many patterns match this tuple. The weights are calculated iteratively as follows:

$$a^{(i+1)}(p) = \frac{\sum_{u=1}^{|T(p)|} h^{(i)}(u)}{H^{(i)}} \tag{1}$$

$$h^{(i+1)}(t) = \frac{\sum_{u=1}^{|P(t)|} a^{(i)}(u)}{A^{(i)}} \tag{2}$$

where T(p) is the set of tuples matched by p, P(t) is the set of patterns matching t, $a^{(i+1)}(p)$ is the authority weight of pattern p at iteration (i + 1), and $h^{(i+1)}(t)$ is the hub weight of tuple t at iteration (i + 1). $H^{(i)}$ and $A^{(i)}$ are normalization factors defined as:

$$H^{(i)} = \sum_{p=1}^{|P|} \sum_{u=1}^{|T(p)|} h^{(i)}(u) \tag{3}$$

$$A^{(i)} = \sum_{t=1}^{|T|} \sum_{u=1}^{|P(t)|} a^{(i)}(u) \tag{4}$$

Highly weighted patterns are identified and used for extracting relations.

4.3 Tuple Clustering

The tuple space should be reduced to allow more matching between pattern-tuple pairs. This space reduction could be accomplished by seeking a tuple similarity measure and constructing a weighted undirected graph of tuples. Two tuples are linked with an edge if their similarity measure exceeds a certain threshold. Graph clustering algorithms could be deployed to partition the graph into a set of homogeneous communities or clusters. To reduce the space of tuples, we seek a matching criterion that groups similar tuples together. Using WordNet, we can measure the semantic similarity or relatedness between a pair of concepts (or word senses), and by extension, between a pair of sentences. We use the similarity
measure described in (Wu and Palmer, 1994), which finds the path length to the root node from the least common subsumer (LCS) of the two word senses, i.e. the most specific word sense they share as an ancestor. The similarity score of two tuples, $S_T$, is calculated as follows:

$$S_T = \sqrt{S_{E1}^2 + S_{E2}^2} \tag{5}$$

where $S_{E1}$ and $S_{E2}$ are the similarity scores of the first entities in the two tuples and of their second entities, respectively.

The tuple matching procedure assigns a similarity measure to each pair of tuples in the dataset. Using this measure we can construct an undirected graph G. The vertices of G are the tuples. Two vertices are connected with an edge if the similarity measure between their underlying tuples exceeds a certain threshold. We noticed that the constructed graph consists of a set of semi-isolated groups, as shown in Figure 3. Those groups have a very large number of intra-group edges and a rather small number of inter-group edges. This implies that using a graph clustering algorithm would eliminate those weak inter-group edges and produce separate groups or clusters representing similar tuples. We used the Markov Cluster Algorithm (MCL) for graph clustering (van Dongen, 2000). MCL is a fast and scalable unsupervised clustering algorithm for graphs based on simulation of stochastic flow.

Figure 3: Applying clustering algorithms to the tuple graph (panels: Before Clustering, After Clustering)

An example of a couple of tuples that could be matched by this technique is:

    United States(E2) president(E1)
    US(E2) leader(E1)

A bipartite graph of patterns and tuple clusters is constructed. Weights are assigned to patterns and tuple clusters by iteratively applying the HITS algorithm, and the highly ranked patterns are then used for relation extraction.

5 Experimental Setup

5.1 ACE Relation Detection and Characterization

In this section, we describe Automatic Content Extraction (ACE). ACE is an evaluation conducted by NIST to measure Entity Detection and Tracking (EDT) and Relation Detection and Characterization (RDC). The EDT task is concerned with the detection of mentions of entities and with grouping them together by identifying their coreference. The RDC task detects relations between entities identified by the EDT task. We choose the RDC task to show the performance of the graph-based unsupervised approach we propose. To this end we need to introduce the notion of mentions and entities. Mentions are any instances of textual references to objects like people, organizations, geopolitical entities (countries, cities, etc.), locations, or facilities. Entities, on the other hand, are objects containing all mentions of the same object. Here, we present some examples of ACE entities and relations:

    Spain’s Interior Minister announced this evening the arrest of
    separatist organization Eta’s presumed leader Ignacio Garcia
    Arregui. Arregui, who is considered to be the Eta organization’s
    top man, was arrested at 17h45 Greenwich. The Spanish judiciary
    suspects Arregui of ordering a failed attack on King Juan Carlos
    in 1995.

In this fragment, all the underlined phrases are mentions of the “Eta” organization or of “Garcia Arregui”. There is a management relation between “leader”, which refers to “Garcia Arregui”, and “Eta”.

5.2 Patterns Construction and Induction

We used the LDC English Gigaword Corpus, AFE source from January to August 1996, as a source for unstructured text. This provides a total of 99475 documents containing 36 M words. In the performed experiments, we focus on two types of relations, EMP-ORG relations and GPE-AFF relations, which represent almost 50% of all relations in the ACE RDC task.
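
The induction procedure applied in these experiments is the iterative weighting of Section 4.2. As a rough illustration, the following Python sketch implements the updates of Equations 1 to 4 on a small bipartite graph. The function name mutual_reinforcement, the edge-list input format, and the uniform initial weights are assumptions made for this sketch; the paper does not specify them, and this is not the authors' implementation.

```python
from collections import defaultdict


def mutual_reinforcement(matches, iterations=10):
    """Iterate Equations 1-4 over a bipartite pattern/tuple graph.

    matches: iterable of (pattern, tuple) pairs, one per edge (hypothetical format).
    Returns two dicts: authority weights of patterns, hub weights of tuples.
    """
    tuples_of = defaultdict(set)    # T(p): tuples matched by pattern p
    patterns_of = defaultdict(set)  # P(t): patterns matching tuple t
    for p, t in matches:
        tuples_of[p].add(t)
        patterns_of[t].add(p)

    # Assumption: uniform initial weights (not stated in the paper).
    a = {p: 1.0 for p in tuples_of}
    h = {t: 1.0 for t in patterns_of}

    for _ in range(iterations):
        # Equation 3: sum, over all patterns, of the hub weights of their tuples.
        H = sum(h[t] for p in tuples_of for t in tuples_of[p])
        # Equation 4: sum, over all tuples, of the authority weights of their patterns.
        A = sum(a[p] for t in patterns_of for p in patterns_of[t])
        # Equations 1 and 2: both updates read the weights of iteration i.
        new_a = {p: sum(h[t] for t in tuples_of[p]) / H for p in tuples_of}
        new_h = {t: sum(a[p] for p in patterns_of[t]) / A for t in patterns_of}
        a, h = new_a, new_h
    return a, h


if __name__ == "__main__":
    # Toy edges; the pattern and tuple strings are illustrative only.
    edges = [
        ("GPE POS (PERSON)+", "Chirac/France"),
        ("GPE POS (PERSON)+", "Mugabe/Zimbabwe"),
        ("GPE (PERSON)+", "Mugabe/Zimbabwe"),
        ("GPE CC (PERSON)+", "Cruyff/Barcelona"),
    ]
    authority, hub = mutual_reinforcement(edges, iterations=10)
    for pattern, weight in sorted(authority.items(), key=lambda kv: -kv[1]):
        print(f"{weight:.3f}  {pattern}")
```

Because of the normalization factors, the authority weights sum to one after every iteration, and so do the hub weights, so only the relative ranking of the patterns matters when selecting the highly weighted ones.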
A part-of-speech (POS) tagger and a mention tagger were applied to the data. The pattern design used consists of a mix of the POS tags and the mention tags for the words in the unsupervised data. We use the mention tag if it exists; otherwise we use the POS tag. An example of the analyzed text and the presumed associated pattern is shown:

    Text:    Eta’s presumed leader Arregui …
    Pos:     NNP POS JJ NN NNP
    Mention: ORG 0 0 0 PERSON
    Pattern: ORG(E2) POS JJ NN(R) PERSON(E1)

An n-gram language model (a 5-gram model with back-off to lower-order n-grams) was built on the data tagged in the described pattern style. Weighted finite state machines were constructed with the language model probabilities. The n-best paths (20 k paths) were identified and deployed as the initial template set. Sequences that do not contain the entities of interest, and hence cannot represent relations, were automatically filtered out. This resulted in an initial template set of around 3000 elements. This initial template set was applied to the text data to establish the initial pattern and tuple pairs. The graph-based mutual reinforcement technique was deployed with 10 iterations on the pattern and tuple pairs to weight the patterns.

We conducted two groups of experiments, the first with simple syntactic tuple matching, and the second with semantic tuple clustering as described in Section 4.3.

6 Results and Discussion

We compare our results to a state-of-the-art supervised system similar to the system described in (Kambhatla, 2004). Although it is unfair to compare a supervised system with a completely unsupervised one, we chose to make this comparison in order to test the proposed unsupervised approach on a real task with a defined test set and state-of-the-art performance. The supervised system was trained on 145 K words, which contain 2368 instances of the two relation types we are considering.

System performance is measured using precision, recall and F-measure with various numbers of induced patterns. Table 1 presents the precision, recall and F-measure for the two relations using the presented approach with different numbers of highly weighted patterns. Table 2 presents the same results using semantic tuple matching and clustering, as described in Section 4.3.

    No. of Patterns   Precision   Recall   F-Measure
    1500              35.9        66.3     46.58
    1000              41.2        59.7     48.75
    700               43.1        58.1     49.49
    500               46          56.5     50.71
    400               46.9        52.9     49.72
    200               50.1        44.9     47.36

Table 1: The effect of varying the number of induced patterns on the system performance (syntactic tuple matching)

    No. of Patterns   Precision   Recall   F-Measure
    1500              36.1        67.2     46.97
    1000              43.7        59.6     50.43
    700               44.1        59.3     50.58
    500               46.3        57.2     51.18
    400               47.3        57.6     51.94
    200               48.1        45.9     46.97

Table 2: The effect of varying the number of induced patterns on the system performance (semantic tuple matching)

    System      Precision   Recall   F-Measure
    Sup         67.1        54.2     59.96
    Unsup-Syn   46          56.5     50.71
    Unsup-Sem   47.3        57.6     51.94

Figure 4: A comparison between the supervised system (Sup), the unsupervised system with syntactic tuple matching (Unsup-Syn), and with semantic tuple matching (Unsup-Sem)

The best F-measure is achieved using a relatively small number of induced patterns (400 and 500 patterns), while using more patterns increases the recall but degrades the precision.

Table 2 indicates that the semantic clustering of tuples did not provide a significant improvement, although better performance was achieved with fewer patterns (400 patterns). We suspect that the deployed similarity measure is responsible, and this needs further investigation to determine the reason.

Figure 4 presents the comparison between the proposed unsupervised systems and the reference supervised system. The unsupervised systems achieve good results even in comparison to a state-of-the-art supervised system.

Sample patterns and corresponding matching text are given in Table 3 and Table 4. Table 3 shows some highly ranked patterns, while Table 4 shows examples of low ranked patterns.

    Pattern                          Matches
    GPE (PERSON)+                    Peruvian President Alberto Fujimori
    GPE (PERSON)+                    Zimbabwean President Robert Mugabe
    GPE (PERSON)+                    PLO leader Yasser Arafat
    GPE POS (PERSON)+                Zimbabwe 's President Robert Mugabe
    GPE JJ PERSON                    American clinical neuropsychologist
    GPE JJ PERSON                    American diplomatic personnel
    PERSON IN JJ GPE                 candidates for local government
    ORGANIZATION PERSON              Airways spokesman
    ORGANIZATION PERSON              Ajax players
    PERSON IN DT (ORGANIZATION)+     chairman of the opposition parties
    (ORGANIZATION)+ PERSON           opposition parties chairmans

Table 3: Examples of patterns with high weights

    Pattern                          Matches
    GPE CC (PERSON)+                 Barcelona and Johan Cruyff
    GPE , CC PERSON                  Paris , but Riccardi
    GPE VBZ VBN PERSON               Pyongyang has accepted Gallucci
    GPE VBZ VBN PERSON               Russia has abandoned us
    GPE VBZ VBN P PERSON             Rwanda 's defeated Hutu
    GPE VBZ VBN PERSON               state has pressed Arafat
    GPE VBZ VBN TO VB PERSON         Taiwan has tried to keep Lee
    (PERSON)+ VBD GPE ORGANIZATION   Alfred Streim told German radio
    (PERSON)+ VBD GPE ORGANIZATION   Dennis Ross met Syrian army
    (PERSON)+ VBD GPE ORGANIZATION   Van Miert told EU industry

Table 4: Examples of patterns with low weights

7 Conclusion and Future Work

In this work, a general framework for unsupervised information extraction based on mutual reinforcement in graphs has been introduced. We construct generalized extraction patterns and deploy graph-based mutual reinforcement to automatically identify the most informative patterns. We provide motivation for our approach from a graph theory and graph link analysis perspective.

Experimental results have been presented supporting the applicability of the proposed approach to the ACE Relation Detection and Characterization (RDC) task, demonstrating its applicability to hard information extraction problems. The proposed approach achieves results comparable to a state-of-the-art supervised system, reaching an F-measure of 51.94 compared to 59.96 for the supervised system, which requires a large amount of human-annotated data. The proposed approach represents a powerful unsupervised technique for information extraction in general, and for relation extraction in particular, that requires no seed patterns or examples and achieves significant performance.

In our future work, we plan to focus on generalizing the approach to target more NLP problems.

8 Acknowledgements

We would like to thank Salim Roukos for his invaluable suggestions and support. We would also like to thank Hala Mostafa for helping with the early investigation of this work. Finally, we would like to thank the anonymous reviewers for their constructive criticism and helpful comments.

References

ACE. 2004. The NIST ACE evaluation website. http://www.nist.gov/speech/tests/ace/

Eugene Agichtein and Luis Gravano. 2000. Snowball: Extracting Relations from Large Plain-Text Collections. Proceedings of the 5th ACM Conference on Digital Libraries (DL 2000).

Sergey Brin. 1998. Extracting Patterns and Relations from the World Wide Web. Proceedings of the 1998 International Workshop on the Web and Databases.

Stijn van Dongen. 2000. A Cluster Algorithm for Graphs. Technical Report INS-R0010, National Research Institute for Mathematics and Computer Science in the Netherlands.

Stijn van Dongen. 2000. Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht.

Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. 2004. Web-scale information extraction in KnowItAll (preliminary results). In Proceedings of the 13th World Wide Web Conference, pages 100-109.

Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. 2005. Unsupervised Named-Entity Extraction from the Web: An Experimental Study. Artificial Intelligence, 2005.

Radu Florian, Hany Hassan, Hongyan Jing, Nanda Kambhatla, Xiaqiang Luo, Nicolas Nicolov, and Salim Roukos. 2004. A Statistical Model for multilingual entity detection and tracking. Proceedings of the Human Language Technologies Conference (HLT-NAACL 2004).

Dayne Freitag and Nicholas Kushmerick. 2000. Boosted wrapper induction. The 14th European Conference on Artificial Intelligence Workshop on Machine Learning for Information Extraction.

Rayid Ghani and Rosie Jones. 2002. A Comparison of Efficacy and Assumptions of Bootstrapping Algorithms for Training Information Extraction Systems. Workshop on Linguistic Knowledge Acquisition and Representation: Bootstrapping Annotated Data at the Linguistic Resources and Evaluation Conference (LREC 2002).

Takaaki Hasegawa, Satoshi Sekine, and Ralph Grishman. 2004. Discovering Relations among Named Entities from Large Corpora. Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004).

Taher Haveliwala. 2002. Topic-sensitive PageRank. Proceedings of the 11th International World Wide Web Conference.

Thorsten Joachims. 2003. Transductive Learning via Spectral Graph Partitioning. Proceedings of the International Conference on Machine Learning (ICML 2003).

Nanda Kambhatla. 2004. Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Information Extraction. Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004).

Jon Kleinberg. 1998. Authoritative Sources in a Hyperlinked Environment. Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms.

N. Kushmerick, D.S. Weld, and R.B. Doorenbos. 1997. Wrapper Induction for Information Extraction. Proceedings of the International Joint Conference on Artificial Intelligence.

Winston Lin, Roman Yangarber, and Ralph Grishman. 2003. Bootstrapped Learning of Semantic Classes from Positive and Negative Examples. Proceedings of the 20th International Conference on Machine Learning (ICML 2003) Workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining.

Ion Muslea, Steven Minton, and Craig Knoblock. 1999. A hierarchical approach to wrapper induction. Proceedings of the Third International Conference on Autonomous Agents.

Ted Pedersen, Siddharth Patwardhan, and Jason Michelizzi. 2004. WordNet::Similarity - Measuring the Relatedness of Concepts. Proceedings of the Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL 2004).

Ellen Riloff and Rosie Jones. 1999. Learning dictionaries for information extraction by multilevel bootstrapping. Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI 1999).

Michael Thelen and Ellen Riloff. 2002. A Bootstrapping Method for Learning Semantic Lexicons using Extraction Pattern Contexts. Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002).

Scott White and Padhraic Smyth. 2003. Algorithms for Discovering Relative Importance in Graphs. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Zhibiao Wu and Martha Palmer. 1994. Verb semantics and lexical selection. Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL 1994).

Xiaojin Zhu, Zoubin Ghahramani, and John Lafferty. 2003. Semi-supervised Learning using Gaussian Fields and Harmonic Functions. Proceedings of the 20th International Conference on Machine Learning (ICML 2003).
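
As a supplementary illustration of the pattern design described in Section 5.2, where the mention tag is used if it exists and the POS tag is used otherwise, the following Python sketch merges two aligned tag sequences into a single tagged stream. The function name tagged_stream and the use of "0" as the marker for "no mention" (taken from the example in Section 5.2) are assumptions of this sketch, not a description of the authors' tooling.

```python
def tagged_stream(pos_tags, mention_tags, no_mention="0"):
    """Merge aligned tag sequences: keep the mention tag when present, else the POS tag."""
    if len(pos_tags) != len(mention_tags):
        raise ValueError("tag sequences must be aligned token by token")
    return [m if m != no_mention else p for p, m in zip(pos_tags, mention_tags)]


if __name__ == "__main__":
    # Tag rows copied from the example in Section 5.2.
    pos = "NNP POS JJ NN NNP".split()
    mention = "ORG 0 0 0 PERSON".split()
    print(" ".join(tagged_stream(pos, mention)))  # ORG POS JJ NN PERSON
```

The resulting stream corresponds to the pattern ORG(E2) POS JJ NN(R) PERSON(E1) once the entity and relation roles are assigned; role assignment is left out of the sketch.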