          Unsupervised Information Extraction Approach Using Graph Mutual Reinforcement


        Hany Hassan                       Ahmed Hassan                       Ossama Emam

                            IBM Cairo Technology Development Center
                                          Giza, Egypt
                                    P.O. Box 166 Al-Ahram

    hanyh@eg.ibm.com                hasanah@eg.ibm.com                    emam@eg.ibm.com



Abstract

Information Extraction (IE) is the task of extracting knowledge from unstructured text. We present a novel unsupervised approach for information extraction based on graph mutual reinforcement. The proposed approach does not require any seed patterns or examples. Instead, it depends on redundancy in large data sets and graph-based mutual reinforcement to induce generalized “extraction patterns”. The proposed approach has been used to acquire extraction patterns for the ACE (Automatic Content Extraction) Relation Detection and Characterization (RDC) task. ACE RDC is considered a hard task in information extraction due to the absence of large amounts of training data and inconsistencies in the available data. The proposed approach achieves superior performance, which could be compared to supervised techniques with reasonable training data.

1    Introduction

In this paper we propose a novel and completely unsupervised approach for information extraction. We present a general technique; however, we focus on relation extraction as an important task of Information Extraction. The approach depends on constructing generalized extraction patterns, which could match many instances, and deploys graph-based mutual reinforcement to weight the importance of these patterns. The mutual reinforcement is used to automatically identify the most informative patterns, where patterns that match many instances tend to be correct. Similarly, instances matched by many patterns tend to be correct. The intuition is that large unsupervised data is redundant, i.e. different instances of information could be found many times in different contexts and in different representations. The problem can therefore be seen as a hubs (instances) and authorities (patterns) problem, which can be solved using the Hypertext Induced Topic Selection (HITS) algorithm (Kleinberg, 1998).

HITS is an algorithmic formulation of the notion of authority in web page link analysis, based on a relationship between a set of relevant “authoritative pages” and a set of “hub pages”. The HITS algorithm benefits from the following observation: when a page (hub) links to another page (authority), the former confers authority on the latter.

By analogy to the authoritative web pages problem, we could represent the patterns as authorities and instances as hubs, and use mutual reinforcement between patterns and instances to weight the most authoritative patterns. Highly weighted patterns are then used in extracting information.

The proposed approach does not need any seeds or examples. Human involvement is only needed in determining the entities of interest, i.e. the entities among which we are seeking relations.

The paper proceeds as follows: in Section 2 we discuss previous work, followed by a brief definition of our general notation in Section 3. A detailed description of the proposed approach then follows in Section 4. Section 5 discusses the application of the proposed approach to the problem of detecting semantic relations from text. Section 6 discusses experimental results, while the conclusion is presented in Section 7.

2    Previous Work

Most of the previous work on Information Extraction (IE) focused on supervised learning. Relation Detection and Characterization (RDC) was introduced in the Automatic Content Extraction Program (ACE) (ACE, 2004). The approaches proposed for the ACE RDC task, such as kernel methods (Zelenko et al., 2002) and Maximum Entropy methods (Kambhatla, 2004), required the availability of large sets of human-annotated corpora tagged with relation instances. However, human-annotated instances are limited, expensive, and time-consuming to obtain, due to the lack of experienced human annotators and the low inter-annotator agreement.

Some previous work adopted weakly supervised or unsupervised learning approaches. These approaches have the advantage of not needing large tagged corpora, but they need seed examples or seed extraction patterns. The major drawback of these approaches is their dependency on seed examples or seed patterns, which may lead to limited generalization due to the dependency on handcrafted examples. Some of these approaches are summarized here:

Brin (1998) presented an approach for extracting authorship information as found in book descriptions on the World Wide Web. This technique is based on dual iterative pattern relation extraction, wherein a relation and pattern set is iteratively constructed. This approach has two major drawbacks: the use of handcrafted seed examples to extract more examples similar to these handcrafted seed examples, and the use of a lexicon as the main source for extracting information.

Blum and Mitchell (1998) proposed an approach based on co-training that uses unlabeled data in a particular setting. They exploit the fact that, for some problems, each example can be described by multiple representations.

Riloff & Jones (1999) presented the Meta-Bootstrapping algorithm, which uses an un-annotated training data set and a set of seeds to learn a dictionary of extraction patterns and a domain-specific semantic lexicon. Other works, such as (Thelen & Riloff, 2002) and (Lin et al., 2003), tried to exploit the duality of patterns and their extractions for the purpose of inferring the semantic class of words.

Muslea et al. (1999) introduced an inductive algorithm to generate extraction rules based on user-labeled training examples. This approach suffers from the labeled data bottleneck.

Agichtein et al. (2000) presented an approach using seed examples to generate initial patterns and to iteratively obtain further patterns. Ad-hoc measures were then deployed to estimate the relevancy of the newly obtained patterns. The major drawbacks of this approach are that its dependency on seed examples leads to limited capability of generalization, and that the estimation of pattern relevancy requires the deployment of ad-hoc measures.

Hasegawa et al. (2004) introduced an unsupervised approach for relation extraction that depends on clustering context words between named entities; this approach depends on ad-hoc context similarity between phrases in the context and focused on certain types of relations.

Etzioni et al. (2005) proposed a system for building lists of named entities found on the web. Their system uses a set of eight domain-independent extraction patterns to generate candidate facts.

All approaches proposed so far suffer either from requiring a large amount of labeled data or from a dependency on seed patterns (or examples) that results in limited generalization.

3    General Notation

In graph theory, a graph is a set of objects called vertices joined by links called edges. A bipartite graph, also called a bigraph, is a special graph where the set of vertices can be divided into two disjoint sets with no two vertices of the same set sharing an edge.

The Hypertext Induced Topic Selection (HITS) algorithm is an algorithm for rating, and therefore ranking, web pages. The HITS algorithm makes use of the following observation: when a page (hub) links to another page (authority), the former confers authority on the latter. HITS uses two values for each page, the "authority value" and the "hub value". "Authority value" and "hub value" are defined in terms of one another in a mutual recursion. An authority value is computed as the sum of the scaled hub values that point to that authority. A hub value is the sum of the scaled authority values of the authorities it points to.

A template, as we define for this work, is a sequence of generic forms that could generalize
over the given instances. An example template is:

    GPE POS (PERSON)+

    GPE: Geographical Political Entity
    POS: possessive ending
    PERSON: PERSON Entity

This template could match the sentence: “France’s President Jacques Chirac...”. This template is derived from the representation of the Named Entity tags, Part-of-Speech (POS) tags and semantic tags. The choice of the template representation here is for illustration purposes only; any combination of tags, representations and tagging styles might be used.

A pattern is more specific than a template. A pattern specifies the role played by the tags (first entity, second entity, or relation). An example of a pattern is:

    GPE(E2)      POS       (PERSON)+(E1)

This pattern indicates that the word(s) with the tag GPE in the sentence represent the second entity (Entity 2) in the relation, while the word(s) tagged PERSON represent the first entity (Entity 1) in this relation. The “+” symbol means that the (PERSON) entity is repetitive (i.e. may consist of several tokens).

A tuple, in the notation of this paper, is the result of the application of a pattern to unstructured text. In the above example, one result of applying the pattern to some raw text is the following tuple:

    Entity 1:            Jacques Chirac
    Entity 2:            France
    Relation:            EMP-Executive

4    The Approach

The unsupervised graph-based mutual reinforcement approach we propose depends on the construction of generalized “extraction patterns” that could match many instances. The patterns are then weighted according to their importance by deploying graph-based mutual reinforcement techniques. The duality between patterns and extracted information (tuples) can be stated as follows: patterns could match different tuples, and tuples in turn could be matched by different patterns. The proposed approach is composed of two main steps, namely initial patterns construction and pattern weighting or induction. Both steps are detailed in the next sub-sections.

4.1    Initial Patterns Construction

As shown in Figure 1, several syntactic, lexical, and semantic analyzers could be applied to the unstructured text. The resulting analyses could be employed in the construction of extraction patterns. It is worth mentioning that the proposed approach is general enough to accommodate any pattern design; the introduced pattern design is for illustration purposes only.

                     American vice President Al Gore said today...
    Entities         PEOPLE O O PERSON O O...
    POS              ADJ NOUN_PHRASE NNP VBD CD...
    Tagged Stream    PEOPLE NOUN_PHRASE PERSON VBD CD...

    Figure 1: An example of the output of analysers applied to the unstructured text

Initially, we need to start with some templates and patterns to proceed with the induction process. A relatively large amount of text data is tagged with different taggers to produce the previously mentioned pattern styles. An n-gram language model is built on this data and used to construct weighted finite state machines.

Paths with low cost (high language model probabilities) are chosen to construct the initial set of templates; the intuition is that paths with low cost (high probability) are frequent and could represent potential candidate patterns.

The resulting initial set of templates is applied to a very large text data set to produce all possible patterns. The number of candidate initial patterns could be reduced significantly by specifying the candidate types of entities; for example, we might specify that the first entity could be PERSON or PEOPLE while the second entity could be ORGANIZATION, LOCATION, COUNTRY, etc.

The candidate patterns are then applied to the tagged stream and the unstructured text to collect a set of pattern and matched tuple pairs.

The following procedure summarizes the Initial Pattern Construction step:

• Select a random set of text data.
• Apply various taggers on text data and construct templates style.
• Build n-gram language model on template style data.
• Construct weighted finite state machines from the n-gram language model.
• Choose n-best paths in the finite state machines.
• Use best paths as initial templates.
• Apply initial templates on large text data.
• Construct initial patterns and associated tuples sets.

4.2    Pattern Induction

The inherent duality in the patterns and tuples relation suggests that the problem could be interpreted as a hub authority problem. This problem could be solved by applying the HITS algorithm to iteratively assign authority and hub scores to patterns and tuples respectively.

    Figure 2: A bipartite graph representing patterns (P) and tuples (T)

Patterns and tuples are represented by a bipartite graph as illustrated in Figure 2. Each pattern or tuple is represented by a node in the graph. Edges represent matching between patterns and tuples. The pattern induction problem can be formulated as follows: given a very large set of data D containing a large set of patterns P which match a large set of tuples T, the problem is to identify $\tilde{P}$, the set of patterns that match the set of the most correct tuples $\tilde{T}$. The intuition is that the tuples matched by many different patterns tend to be correct and the patterns matching many different tuples tend to be good patterns. In other words, we want to choose, among the large space of patterns in the data, the most informative, highest confidence patterns that could identify correct tuples, i.e. to choose the most “authoritative” patterns in analogy with the hub authority problem. However, both $\tilde{P}$ and $\tilde{T}$ are unknown. The induction process proceeds as follows: each pattern p in P is associated with a numerical authority weight $a_p$ which expresses how many tuples match that pattern. Similarly, each tuple t in T has a numerical hub weight $h_t$ which expresses how many patterns were matched by this tuple. The weights are calculated iteratively as follows:

    $$a^{(i+1)}(p) = \frac{\sum_{u=1}^{|T(p)|} h^{(i)}(u)}{H^{(i)}} \qquad (1)$$

    $$h^{(i+1)}(t) = \frac{\sum_{u=1}^{|P(t)|} a^{(i)}(u)}{A^{(i)}} \qquad (2)$$

where T(p) is the set of tuples matched by p, P(t) is the set of patterns matching t, $a^{(i+1)}(p)$ is the authoritative weight of pattern p at iteration (i + 1), and $h^{(i+1)}(t)$ is the hub weight of tuple t at iteration (i + 1). $H^{(i)}$ and $A^{(i)}$ are normalization factors defined as:

    $$H^{(i)} = \sum_{p=1}^{|P|} \sum_{u=1}^{|T(p)|} h^{(i)}(u) \qquad (3)$$

    $$A^{(i)} = \sum_{t=1}^{|T|} \sum_{u=1}^{|P(t)|} a^{(i)}(u) \qquad (4)$$

Highly weighted patterns are identified and used for extracting relations.

4.3    Tuple Clustering

The tuple space should be reduced to allow more matching between pattern-tuple pairs. This space reduction could be accomplished by seeking a tuple similarity measure and constructing a weighted undirected graph of tuples. Two tuples are linked with an edge if their similarity measure exceeds a certain threshold. Graph clustering algorithms could be deployed to partition the graph into a set of homogeneous communities or clusters. To reduce the space of tuples, we seek a matching criterion that groups similar tuples together. Using WordNet, we can measure the semantic similarity or relatedness between a pair of concepts (or word senses), and by extension, between a pair of sentences. We use the similarity measure described in (Wu and Palmer, 1994), which finds the path length to the root node from the least common subsumer (LCS) of the two word senses, which is the most specific word sense they share as an ancestor. The similarity score of two tuples, $S_T$, is calculated as follows:

    $$S_T = S_{E1}^{2} + S_{E2}^{2} \qquad (5)$$

where $S_{E1}$ and $S_{E2}$ are the similarity scores of the first entities in the two tuples and of their second entities, respectively.

The tuple matching procedure assigns a similarity measure to each pair of tuples in the dataset. Using this measure we can construct an undirected graph G. The vertices of G are the tuples. Two vertices are connected with an edge if the similarity measure between their underlying tuples exceeds a certain threshold. It was noticed that the constructed graph consists of a set of semi-isolated groups, as shown in Figure 3. Those groups have a very large number of intra-group edges and a rather small number of inter-group edges. This implies that using a graph clustering algorithm would eliminate those weak inter-group edges and produce separate groups or clusters representing similar tuples. We used the Markov Cluster Algorithm (MCL) for graph clustering (Dongen, 2000). MCL is a fast and scalable unsupervised clustering algorithm for graphs based on simulation of stochastic flow.

    Figure 3: Applying Clustering Algorithms to Tuple graph (panels: Before Clustering, After Clustering)

An example of a couple of tuples that could be matched by this technique is:

    United States(E2) president(E1)
    US(E2) leader(E1)

A bipartite graph of patterns and tuple clusters is constructed. Weights are assigned to patterns and tuple clusters by iteratively applying the HITS algorithm, and the highly ranked patterns are then used for relation extraction.

5    Experimental Setup

5.1    ACE Relation Detection and Characterization

In this section, we describe Automatic Content Extraction (ACE). ACE is an evaluation conducted by NIST to measure Entity Detection and Tracking (EDT) and Relation Detection and Characterization (RDC). The EDT task is concerned with the detection of mentions of entities and with grouping them together by identifying their coreference. The RDC task detects relations between entities identified by the EDT task. We choose the RDC task to show the performance of the graph-based unsupervised approach we propose. To this end we need to introduce the notion of mentions and entities. Mentions are any instances of textual references to objects like people, organizations, geopolitical entities (countries, cities, etc.), locations, or facilities. On the other hand, entities are objects containing all mentions of the same object. Here, we present some examples of ACE entities and relations:

    Spain’s Interior Minister announced this evening the arrest of separatist
    organization Eta’s presumed leader Ignacio Garcia Arregui. Arregui, who is
    considered to be the Eta organization’s top man, was arrested at 17h45
    Greenwich. The Spanish judiciary suspects Arregui of ordering a failed
    attack on King Juan Carlos in 1995.

In this fragment, all the underlined phrases are mentions of the “Eta” organization or of “Garcia Arregui”. There is a management relation between “leader”, which refers to “Garcia Arregui”, and “Eta”.

5.2    Patterns Construction and Induction

We used the LDC English Gigaword Corpus, AFE source from January to August 1996, as a source for unstructured text. This provides a total of 99475 documents containing 36 M words. In the performed experiments, we focus on two types of relations, EMP-ORG relations and GPE-AFF relations, which represent almost 50% of all relations in the ACE RDC task.
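The mutual reinforcement updates of Section 4.2 (equations (1) to (4)) can be illustrated with a short sketch. This is a minimal illustration and not the authors' implementation: the function name `mutual_reinforcement`, the uniform initial weights of 1.0, and the toy pattern and tuple identifiers are all assumptions made for the example. Both weights are updated from the values of the previous iteration, as the equations state.

```python
from collections import defaultdict


def mutual_reinforcement(matches, iterations=10):
    """Weight patterns (authorities) and tuples (hubs) on a bipartite graph.

    matches: non-empty iterable of (pattern, tuple_id) pairs, one per edge.
    Returns (a, h): authority weight per pattern, hub weight per tuple.
    """
    tuples_of = defaultdict(set)    # T(p): tuples matched by pattern p
    patterns_of = defaultdict(set)  # P(t): patterns matching tuple t
    for p, t in matches:
        tuples_of[p].add(t)
        patterns_of[t].add(p)

    # Assumption: uniform starting weights (the paper does not specify them).
    a = {p: 1.0 for p in tuples_of}
    h = {t: 1.0 for t in patterns_of}

    for _ in range(iterations):
        # Numerators of equations (1) and (2), both from iteration-i values.
        raw_a = {p: sum(h[t] for t in ts) for p, ts in tuples_of.items()}
        raw_h = {t: sum(a[p] for p in ps) for t, ps in patterns_of.items()}
        # Normalization factors H(i) and A(i), equations (3) and (4).
        H = sum(raw_a.values())
        A = sum(raw_h.values())
        a = {p: v / H for p, v in raw_a.items()}
        h = {t: v / A for t, v in raw_h.items()}
    return a, h


# Hypothetical toy graph: three patterns and four tuples.
matches = [
    ("ORG(E2) POS JJ NN(R) PERSON(E1)", "t1"),
    ("ORG(E2) POS JJ NN(R) PERSON(E1)", "t2"),
    ("ORG(E2) POS JJ NN(R) PERSON(E1)", "t3"),
    ("GPE(E2) POS (PERSON)+(E1)", "t1"),
    ("PERSON(E1) VBD CD", "t4"),
]
a, h = mutual_reinforcement(matches, iterations=10)
ranked_patterns = sorted(a, key=a.get, reverse=True)
```

In this toy graph the pattern that matches the most tuples ends up ranked first, and the weights on each side sum to one because of the normalization factors. Keeping only the top entries of `ranked_patterns` corresponds to selecting the highly weighted patterns used for extraction.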
POS (part of speech) tagger and mention tagger were applied to the data.
The pattern design used consists of a mix of the part of speech (POS) tags
and the mention tags for the words in the unsupervised data. We use the
mention tag if it exists; otherwise we use the part of speech tag. An
example of the analyzed text and the presumed associated pattern is shown:

    Text:    Eta’s presumed leader Arregui …
    Pos:     NNP POS JJ NN NNP
    Mention: ORG 0 0 0 PERSON
    Pattern: ORG(E2) POS JJ NN(R) PERSON(E1)

Table 1 reports results for different numbers of highly weighted patterns.
Table 2 presents the same results using semantic tuple matching and
clustering, as described in section 4.3.

    No. of Patterns    Precision    Recall    F-Measure
    1500               35.9         66.3      46.58
    1000               41.2         59.7      48.75
    700                43.1         58.1      49.49
    500                46           56.5      50.71
    400                46.9         52.9      49.72
    200                50.1         44.9      47.36

Table 1: The effect of varying the number of induced patterns on the
system performance (syntactic tuple matching)

    No. of Patterns    Precision    Recall    F-Measure
    1500               36.1         67.2      46.97
    1000               43.7         59.6      50.43
    700                44.1         59.3      50.58
    500                46.3         57.2      51.18
    400                47.3         57.6      51.94
    200                48.1         45.9      46.97

Table 2: The effect of varying the number of induced patterns on the
system performance (semantic tuple matching)

An n-gram language model (a 5-gram model with back-off to lower-order
n-grams) was built on the data tagged in the described pattern style.
Weighted finite-state machines were constructed with the language model
probabilities. The n-best paths (20 K paths) were identified and deployed
as the initial template set. Sequences that do not contain the entities of
interest, and hence cannot represent relations, were automatically filtered
out. This resulted in an initial template set of around 3000 elements. This
initial template set was applied to the text data to establish initial
pattern and tuple pairs. The graph based mutual reinforcement technique was
deployed with 10 iterations on the pattern and tuple pairs to weight the
patterns.
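Two mechanical steps of this setup, choosing one tag per word and weighting
patterns by mutual reinforcement, can be sketched as follows. This is a
minimal illustration under stated assumptions, not the authors'
implementation: the function names, the "0" marker for "no mention", the
plain HITS-style update and the max-normalization are all assumptions. Only
the tag-selection rule and the 10 iterations come from the text. The sketch
also leaves out the entity-role labels (E1, E2, R) that appear in the
example pattern.

```python
from collections import defaultdict


def build_pattern(pos_tags, mention_tags, no_mention="0"):
    """Pick the mention tag where one exists, otherwise the POS tag."""
    return [m if m != no_mention else p for p, m in zip(pos_tags, mention_tags)]


def mutual_reinforcement(pairs, iterations=10):
    """HITS-style weighting on a bipartite pattern/tuple graph.

    pairs: iterable of (pattern, tuple) matches.
    Assumption: unweighted edges and max-normalization after each round.
    """
    by_pattern, by_tuple = defaultdict(set), defaultdict(set)
    for pattern, tup in pairs:
        by_pattern[pattern].add(tup)
        by_tuple[tup].add(pattern)
    if not by_pattern:
        return {}, {}

    p_weight = {p: 1.0 for p in by_pattern}
    t_weight = {t: 1.0 for t in by_tuple}
    for _ in range(iterations):
        # A pattern scores highly if it matches highly weighted tuples ...
        p_weight = {p: sum(t_weight[t] for t in ts) for p, ts in by_pattern.items()}
        # ... and a tuple scores highly if highly weighted patterns match it.
        t_weight = {t: sum(p_weight[p] for p in ps) for t, ps in by_tuple.items()}
        # Rescale so the largest weight on each side is 1.
        p_max, t_max = max(p_weight.values()), max(t_weight.values())
        p_weight = {p: w / p_max for p, w in p_weight.items()}
        t_weight = {t: w / t_max for t, w in t_weight.items()}
    return p_weight, t_weight


def top_patterns(p_weight, n):
    """Return the n most highly weighted patterns."""
    return sorted(p_weight, key=p_weight.get, reverse=True)[:n]


if __name__ == "__main__":
    pos = "NNP POS JJ NN NNP".split()
    mention = "ORG 0 0 0 PERSON".split()
    print(" ".join(build_pattern(pos, mention)))
```

For the example shown earlier, `build_pattern` yields the tag sequence
ORG POS JJ NN PERSON. Keeping only the first N entries returned by
`top_patterns` corresponds to using different numbers of highly weighted
patterns.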
   We conducted two groups of experiments, the first with simple syntactic
tuple matching, and the second with semantic tuple clustering as described
in section 4.3.

6    Results and Discussion

We compare our results to a state-of-the-art supervised system similar to
the system described in (Kambhatla, 2004). Although it is unfair to compare
a supervised system with a completely unsupervised system, we chose to make
this comparison to test the performance of the proposed unsupervised
approach on a real task with a defined test set and state-of-the-art
performance. The supervised system was trained on 145 K words which contain
2368 instances of the two relation types we are considering.

                 Precision    Recall    F-Measure
    Sup          67.1         54.2      59.96
    Unsup-Syn    46           56.5      50.71
    Unsup-Sem    47.3         57.6      51.94

Figure 4: A comparison between the supervised system (Sup), the
unsupervised system with syntactic tuple matching (Unsup-Syn), and with
semantic tuple matching (Unsup-Sem)
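The F-measure values reported for the three systems are consistent with the
balanced harmonic mean of precision and recall. The following minimal sketch
assumes that equal-weight definition, which the text does not state
explicitly; the function name and the dictionary layout are illustrative
only.

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) for the three compared systems.
reported = {
    "Sup": (67.1, 54.2, 59.96),
    "Unsup-Syn": (46.0, 56.5, 50.71),
    "Unsup-Sem": (47.3, 57.6, 51.94),
}

if __name__ == "__main__":
    for name, (p, r, f) in reported.items():
        print(f"{name}: computed {f_measure(p, r):.2f}, reported {f}")
```

The same function can be applied to each row of the pattern-count tables to
check the F-Measure column against its precision and recall.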
   The system performance is measured using precision, recall and F-Measure
with various numbers of induced patterns. Table 1 presents the precision,
recall and F-measure for the two relations using the presented approach with
the utilization of different numbers of highly weighted patterns.
   Best F-Measure is achieved using a relatively small number of induced
patterns (400 and 500 patterns), while using more patterns increases the
recall but degrades the precision.
   Table 2 indicates that the semantic clustering of tuples did not provide
significant improvement, although better performance was achieved with
fewer patterns (400 patterns). We think that the deployed similarity
measure needs further investigation to determine the reason for this.
   Figure 4 presents the comparison between the proposed unsupervised
systems and the reference supervised system. The unsupervised systems
achieve good results even in comparison to a state-of-the-art supervised
system.
   Sample patterns and corresponding matching text are introduced in Table 3
and Table 4. Table 3 shows some highly ranked patterns while Table 4 shows
examples of low ranked patterns.

    Pattern                         Matches
    GPE (PERSON)+                   Peruvian President Alberto Fujimori
    GPE (PERSON)+                   Zimbabwean President Robert Mugabe
    GPE (PERSON)+                   PLO leader Yasser Arafat
    GPE POS (PERSON)+               Zimbabwe 's President Robert Mugabe
    GPE JJ PERSON                   American clinical neuropsychologist
    GPE JJ PERSON                   American diplomatic personnel
    PERSON IN JJ GPE                candidates for local government
    ORGANIZATION PERSON             Airways spokesman
    ORGANIZATION PERSON             Ajax players
    PERSON IN DT (ORGANIZATION)+    chairman of the opposition parties
    (ORGANIZATION)+ PERSON          opposition parties chairmans

Table 3: Examples of patterns with high weights

    Pattern                           Matches
    GPE CC (PERSON)+                  Barcelona and Johan Cruyff
    GPE , CC PERSON                   Paris , but Riccardi
    GPE VBZ VBN PERSON                Pyongyang has accepted Gallucci
    GPE VBZ VBN PERSON                Russia has abandoned us
    GPE VBZ VBN P PERSON              Rwanda 's defeated Hutu
    GPE VBZ VBN PERSON                state has pressed Arafat
    GPE VBZ VBN TO VB PERSON          Taiwan has tried to keep Lee
    (PERSON)+ VBD GPE ORGANIZATION    Alfred Streim told German radio
    (PERSON)+ VBD GPE ORGANIZATION    Dennis Ross met Syrian army
    (PERSON)+ VBD GPE ORGANIZATION    Van Miert told EU industry

Table 4: Examples of patterns with low weights

7      Conclusion and Future Work

In this work, a general framework for unsupervised information extraction
based on mutual reinforcement in graphs has been introduced. We construct
generalized extraction patterns and deploy graph based mutual reinforcement
to automatically identify the most informative patterns. We provide
motivation for our approach from a graph theory and graph link analysis
perspective. Experimental results have been presented supporting the
applicability of the proposed approach to the ACE Relation Detection and
Characterization (RDC) task, demonstrating its applicability to hard
information extraction problems. The proposed approach achieves remarkable
results comparable to a state-of-the-art supervised system, achieving 51.94
F-measure compared to 59.96 F-measure of the state-of-the-art supervised
system, which requires a huge amount of human annotated data. The proposed
approach represents a powerful unsupervised technique for information
extraction in general and particularly for relation extraction that
requires no seed patterns or examples and achieves significant performance.
   In our future work, we plan to focus on generalizing the approach for
targeting more NLP problems.

8      Acknowledgements

We would like to thank Salim Roukos for his invaluable suggestions and
support. We would also like to thank Hala Mostafa for helping with the
early investigation of this work. Finally we would like to thank the
anonymous reviewers for their constructive criticism and helpful comments.

References

ACE. 2004. The NIST ACE evaluation website.
  http://www.nist.gov/speech/tests/ace/

Eugene Agichtein and Luis Gravano. 2000. Snowball: Extracting Relations
  from Large Plain-Text Collections. Proceedings of the 5th ACM Conference
  on Digital Libraries (DL 2000).

Sergey Brin. 1998. Extracting Patterns and Relations from the World Wide
  Web. Proceedings of the 1998 International Workshop on the Web and
  Databases.

Stijn van Dongen. 2000. A Cluster Algorithm for Graphs. Technical Report
  INS-R0010, National Research Institute for Mathematics and Computer
  Science in the Netherlands.

Stijn van Dongen. 2000. Graph Clustering by Flow Simulation. PhD thesis,
  University of Utrecht.

Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal
  Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. 2004.
  Web-scale information extraction in KnowItAll (preliminary results). In
  Proceedings of the 13th World Wide Web Conference, pages 100-109.

Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal
  Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. 2005.
  Unsupervised Named-Entity Extraction from the Web: An Experimental Study.
  Artificial Intelligence, 2005.

Radu Florian, Hany Hassan, Hongyan Jing, Nanda Kambhatla, Xiaoqiang Luo,
  Nicolas Nicolov, and Salim Roukos. 2004. A Statistical Model for
  multilingual entity detection and tracking. Proceedings of the Human
  Language Technologies Conference (HLT-NAACL 2004).

Dayne Freitag and Nicholas Kushmerick. 2000. Boosted wrapper induction. The
  14th European Conference on Artificial Intelligence Workshop on Machine
  Learning for Information Extraction.

Rayid Ghani and Rosie Jones. 2002. A Comparison of Efficacy and Assumptions
  of Bootstrapping Algorithms for Training Information Extraction Systems.
  Workshop on Linguistic Knowledge Acquisition and Representation:
  Bootstrapping Annotated Data at the Linguistic Resources and Evaluation
  Conference (LREC 2002).

Takaaki Hasegawa, Satoshi Sekine, and Ralph Grishman. 2004. Discovering
  Relations among Named Entities from Large Corpora. Proceedings of The 42nd
  Annual Meeting of the Association for Computational Linguistics (ACL
  2004).

Taher Haveliwala. 2002. Topic-sensitive PageRank. Proceedings of the 11th
  International World Wide Web Conference.

Thorsten Joachims. 2003. Transductive Learning via Spectral Graph
  Partitioning. Proceedings of the International Conference on Machine
  Learning (ICML 2003).

Nanda Kambhatla. 2004. Combining Lexical, Syntactic, and Semantic Features
  with Maximum Entropy Models for Information Extraction. Proceedings of The
  42nd Annual Meeting of the Association for Computational Linguistics (ACL
  2004).

Jon Kleinberg. 1998. Authoritative Sources in a Hyperlinked Environment.
  Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms.

N. Kushmerick, D.S. Weld, and R.B. Doorenbos. 1997. Wrapper Induction for
  Information Extraction. Proceedings of the International Joint Conference
  on Artificial Intelligence.

Winston Lin, Roman Yangarber, and Ralph Grishman. 2003. Bootstrapped
  Learning of Semantic Classes from Positive and Negative Examples.
  Proceedings of the 20th International Conference on Machine Learning
  (ICML 2003) Workshop on The Continuum from Labeled to Unlabeled Data in
  Machine Learning and Data Mining.

Ion Muslea, Steven Minton, and Craig Knoblock. 1999. A hierarchical
  approach to wrapper induction. Proceedings of the Third International
  Conference on Autonomous Agents.

Ted Pedersen, Siddharth Patwardhan, and Jason Michelizzi. 2004.
  WordNet::Similarity - Measuring the Relatedness of Concepts. Proceedings
  of Fifth Annual Meeting of the North American Chapter of the Association
  for Computational Linguistics (NAACL 2004).

Ellen Riloff and Rosie Jones. 2003. Learning dictionaries for information
  extraction by multilevel bootstrapping. Proceedings of the Sixteenth
  National Conference on Artificial Intelligence (AAAI 1999).

Michael Thelen and Ellen Riloff. 2002. A Bootstrapping Method for Learning
  Semantic Lexicons using Extraction Pattern Contexts. Proceedings of the
  2002 Conference on Empirical Methods in Natural Language Processing
  (EMNLP 2002).

Scott White and Padhraic Smyth. 2003. Algorithms for Discovering Relative
  Importance in Graphs. Proceedings of Ninth ACM SIGKDD International
  Conference on Knowledge Discovery and Data Mining.

Zhibiao Wu and Martha Palmer. 1994. Verb semantics and lexical selection.
  Proceedings of the 32nd Annual Meeting of the Association for
  Computational Linguistics (ACL 1994).

Xiaojin Zhu, Zoubin Ghahramani, and John Lafferty. 2003. Semi-supervised
  Learning using Gaussian Fields and Harmonic Functions. Proceedings of the
  20th International Conference on Machine Learning (ICML 2003).

						