Unsupervised Information Extraction Approach Using Graph Mutual Reinforcement

Hany Hassan, Ahmed Hassan, Ossama Emam
IBM Cairo Technology Development Center
Giza, Egypt
P.O. Box 166 Al-Ahram
email@example.com firstname.lastname@example.org email@example.com

Abstract

Information Extraction (IE) is the task of extracting knowledge from unstructured text. We present a novel unsupervised approach for information extraction based on graph mutual reinforcement. The proposed approach does not require any seed patterns or examples. Instead, it depends on redundancy in large data sets and graph based mutual reinforcement to induce generalized “extraction patterns”. The proposed approach has been used to acquire extraction patterns for the ACE (Automatic Content Extraction) Relation Detection and Characterization (RDC) task. ACE RDC is considered a hard task in information extraction due to the absence of large amounts of training data and inconsistencies in the available data. The proposed approach achieves performance that can be compared to supervised techniques trained with a reasonable amount of training data.

1 Introduction

In this paper we propose a novel and completely unsupervised approach for information extraction. We present a general technique; however, we focus on relation extraction as an important task of Information Extraction. The approach depends on constructing generalized extraction patterns, which could match many instances, and deploys graph based mutual reinforcement to weight the importance of these patterns. The mutual reinforcement is used to automatically identify the most informative patterns, where patterns that match many instances tend to be correct. Similarly, instances matched by many patterns tend to be correct. The intuition is that large unsupervised data is redundant, i.e. different instances of information could be found many times in different contexts and in different representations. The problem can therefore be seen as a hubs (instances) and authorities (patterns) problem, which can be solved using the Hypertext Induced Topic Selection (HITS) algorithm (Kleinberg, 1998).

HITS is an algorithmic formulation of the notion of authority in web page link analysis, based on a relationship between a set of relevant “authoritative pages” and a set of “hub pages”. The HITS algorithm benefits from the following observation: when a page (hub) links to another page (authority), the former confers authority over the latter.

By analogy to the authoritative web pages problem, we could represent the patterns as authorities and instances as hubs, and use mutual reinforcement between patterns and instances to weight the most authoritative patterns. Highly weighted patterns are then used in extracting information.

The proposed approach does not need any seeds or examples. Human involvement is only needed in determining the entities of interest, i.e. the entities among which we are seeking relations.

The paper proceeds as follows: in Section 2 we discuss previous work, followed by a brief definition of our general notation in Section 3. A detailed description of the proposed approach then follows in Section 4. Section 5 discusses the application of the proposed approach to the problem of detecting semantic relations from text. Section 6 discusses experimental results, while the conclusion is presented in Section 7.

2 Previous Work

Most of the previous work on Information Extraction (IE) focused on supervised learning. Relation Detection and Characterization (RDC) was introduced in the Automatic Content Extraction Program (ACE) (ACE, 2004). The approaches proposed for the ACE RDC task, such as kernel methods (Zelenko et al., 2002) and Maximum Entropy methods (Kambhatla, 2004), required the availability of a large set of human annotated corpora tagged with relation instances. However, human annotated instances are limited, expensive, and time consuming to obtain, due to the lack of experienced human annotators and the low inter-annotator agreement.

Some previous work adopted weakly supervised or unsupervised learning approaches. These approaches have the advantage of not needing large tagged corpora, but they need seed examples or seed extraction patterns. The major drawback of these approaches is their dependency on seed examples or seed patterns, which may lead to limited generalization due to dependency on handcrafted examples. Some of these approaches are briefed here:

(Brin, 1998) presented an approach for extracting authorship information as found in book descriptions on the World Wide Web. This technique is based on dual iterative pattern relation extraction, wherein a relation and pattern set is iteratively constructed. This approach has two major drawbacks: the use of handcrafted seed examples to extract more examples similar to these handcrafted seed examples, and the use of a lexicon as the main source for extracting information.

(Blum and Mitchell, 1998) proposed an approach based on co-training that uses unlabeled data in a particular setting. They exploit the fact that, for some problems, each example can be described by multiple representations.

(Riloff & Jones, 1999) presented the Meta-Bootstrapping algorithm, which uses an un-annotated training data set and a set of seeds to learn a dictionary of extraction patterns and a domain specific semantic lexicon. Other works, like (Thelen & Riloff, 2002) and (Lin et al., 2003), tried to exploit the duality of patterns and their extractions for the purpose of inferring the semantic class of words.

(Muslea et al., 1999) introduced an inductive algorithm to generate extraction rules based on user labeled training examples. This approach suffers from the labeled data bottleneck.

(Agichtein and Gravano, 2000) presented an approach using seed examples to generate initial patterns and to iteratively obtain further patterns. Ad-hoc measures were then deployed to estimate the relevancy of the newly obtained patterns. The major drawbacks of this approach are that its dependency on seed examples leads to limited capability of generalization, and that the estimation of pattern relevancy requires the deployment of ad-hoc measures.

(Hasegawa et al., 2004) introduced an unsupervised approach for relation extraction that depends on clustering context words between named entities; this approach depends on ad-hoc context similarity between phrases in the context and focused on certain types of relations.

(Etzioni et al., 2005) proposed a system for building lists of named entities found on the web. Their system uses a set of eight domain-independent extraction patterns to generate candidate facts.

All approaches proposed so far suffer either from requiring a large amount of labeled data or from a dependency on seed patterns (or examples) that results in limited generalization.

3 General Notation

In graph theory, a graph is a set of objects called vertices joined by links called edges. A bipartite graph, also called a bigraph, is a special graph where the set of vertices can be divided into two disjoint sets with no two vertices of the same set sharing an edge.

The Hypertext Induced Topic Selection (HITS) algorithm is an algorithm for rating, and therefore ranking, web pages. The HITS algorithm makes use of the following observation: when a page (hub) links to another page (authority), the former confers authority over the latter. HITS uses two values for each page, the "authority value" and the "hub value". "Authority value" and "hub value" are defined in terms of one another in a mutual recursion. An authority value is computed as the sum of the scaled hub values that point to that authority. A hub value is the sum of the scaled authority values of the authorities it points to.

A template, as we define it for this work, is a sequence of generic forms that could generalize over the given instances. An example template is:

    GPE POS (PERSON)+

    GPE: Geographical Political Entity
    POS: possessive ending
    PERSON: PERSON Entity

This template could match the sentence: “France’s President Jacques Chirac...”. This template is derived from the representation of the Named Entity tags, Part-of-Speech (POS) tags and semantic tags. The choice of the template representation here is for illustration purposes only; any combination of tags, representations and tagging styles might be used.

A pattern is more specific than a template. A pattern specifies the role played by the tags (first entity, second entity, or relation). An example of a pattern is:

    GPE(E2) POS (PERSON)+(E1)

This pattern indicates that the word(s) with the tag GPE in the sentence represent the second entity (Entity 2) in the relation, while the word(s) tagged PERSON represent the first entity (Entity 1) in this relation. The “+” symbol means that the (PERSON) entity is repetitive (i.e. may consist of several tokens).

A tuple, in our notation throughout this paper, is the result of the application of a pattern to unstructured text. In the above example, one result of applying the pattern to some raw text is the following tuple:

    Entity 1: Jacques Chirac
    Entity 2: France
    Relation: EMP-Executive

4 The Approach

The unsupervised graph-based mutual reinforcement approach we propose depends on the construction of generalized “extraction patterns” that could match many instances. The patterns are then weighted according to their importance by deploying graph based mutual reinforcement techniques. The duality between patterns and extracted information (tuples) can be stated as follows: patterns could match different tuples, and tuples in turn could be matched by different patterns. The proposed approach is composed of two main steps, namely initial patterns construction and pattern weighting (or induction). Both steps are detailed in the next sub-sections.

4.1 Initial Patterns Construction

As shown in Figure 1, several syntactic, lexical, and semantic analyzers could be applied to the unstructured text. The resulting analyses could be employed in the construction of extraction patterns. It is worth mentioning that the proposed approach is general enough to accommodate any pattern design; the introduced pattern design is for illustration purposes only.

    Text:           American vice President Al Gore said today...
    Entities:       PEOPLE O O PERSON O O...
    POS:            ADJ NOUN_PHRASE NNP VBD CD...
    Tagged Stream:  PEOPLE NOUN_PHRASE PERSON VBD CD...

Figure 1: An example of the output of analysers applied to the unstructured text

Initially, we need to start with some templates and patterns to proceed with the induction process. A relatively large amount of text data is tagged with different taggers to produce the previously mentioned pattern styles. An n-gram language model is built on this data and used to construct weighted finite state machines.

Paths with low cost (high language model probabilities) are chosen to construct the initial set of templates; the intuition is that paths with low cost (high probability) are frequent and could represent potential candidate patterns.

The resulting initial set of templates is applied to a very large text data set to produce all possible patterns. The number of candidate initial patterns could be reduced significantly by specifying the candidate types of entities; for example, we might specify that the first entity could be PERSON or PEOPLE while the second entity could be ORGANIZATION, LOCATION, COUNTRY, etc.

The candidate patterns are then applied to the tagged stream and the unstructured text to collect a set of pattern and matched tuple pairs.

The following procedure summarizes the Initial Pattern Construction step:
• Select a random set of text data.
• Apply various taggers to the text data and construct template-style data.
• Build an n-gram language model on the template-style data.
• Construct weighted finite state machines from the n-gram language model.
• Choose the n-best paths in the finite state machines.
• Use the best paths as initial templates.
• Apply the initial templates to large text data.
• Construct the initial patterns and associated tuple sets.

4.2 Pattern Induction

The inherent duality in the relation between patterns and tuples suggests that the problem could be interpreted as a hub authority problem. This problem could be solved by applying the HITS algorithm to iteratively assign authority and hub scores to patterns and tuples respectively.

Figure 2: A bipartite graph representing patterns and tuples

Patterns and tuples are represented by a bipartite graph as illustrated in Figure 2. Each pattern or tuple is represented by a node in the graph. Edges represent matching between patterns and tuples. The pattern induction problem can be formulated as follows: given a very large set of data D containing a large set of patterns P which match a large set of tuples T, the problem is to identify $\tilde{P}$, the set of patterns that match the set of the most correct tuples $\tilde{T}$. The intuition is that the tuples matched by many different patterns tend to be correct and the patterns matching many different tuples tend to be good patterns. In other words, we want to choose, among the large space of patterns in the data, the most informative, highest confidence patterns that could identify correct tuples, i.e. choose the most “authoritative” patterns in analogy with the hub authority problem. However, both $\tilde{P}$ and $\tilde{T}$ are unknown. The induction process proceeds as follows: each pattern p in P is associated with a numerical authority weight $a_p$ which expresses how many tuples match that pattern. Similarly, each tuple t in T has a numerical hub weight $h_t$ which expresses how many patterns were matched by this tuple. The weights are calculated iteratively as follows:

$$a^{(i+1)}(p) = \frac{\sum_{u=1}^{|T(p)|} h^{(i)}(u)}{H^{(i)}} \tag{1}$$

$$h^{(i+1)}(t) = \frac{\sum_{u=1}^{|P(t)|} a^{(i)}(u)}{A^{(i)}} \tag{2}$$

where T(p) is the set of tuples matched by p, P(t) is the set of patterns matching t, $a^{(i+1)}(p)$ is the authoritative weight of pattern p at iteration (i + 1), and $h^{(i+1)}(t)$ is the hub weight of tuple t at iteration (i + 1). $H^{(i)}$ and $A^{(i)}$ are normalization factors defined as:

$$H^{(i)} = \sum_{p=1}^{|P|} \sum_{u=1}^{|T(p)|} h^{(i)}(u) \tag{3}$$

$$A^{(i)} = \sum_{v=1}^{|T|} \sum_{u=1}^{|P(v)|} a^{(i)}(u) \tag{4}$$

Highly weighted patterns are identified and used for extracting relations.

4.3 Tuple Clustering

The tuple space should be reduced to allow more matching between pattern-tuple pairs. This space reduction could be accomplished by seeking a tuple similarity measure and constructing a weighted undirected graph of tuples. Two tuples are linked with an edge if their similarity measure exceeds a certain threshold. Graph clustering algorithms could be deployed to partition the graph into a set of homogeneous communities or clusters. To reduce the space of tuples, we seek a matching criterion that groups similar tuples together. Using WordNet, we can measure the semantic similarity or relatedness between a pair of concepts (or word senses), and by extension, between a pair of sentences. We use the similarity measure described in (Wu and Palmer, 1994), which finds the path length to the root node from the least common subsumer (LCS) of the two word senses, i.e. the most specific word sense they share as an ancestor. The similarity score of two tuples, $S_T$, is calculated as follows:

$$S_T = \sqrt{S_{E1}^2 + S_{E2}^2} \tag{5}$$

where $S_{E1}$ and $S_{E2}$ are the similarity scores of the first entities in the two tuples and of their second entities, respectively.

The tuple matching procedure assigns a similarity measure to each pair of tuples in the dataset. Using this measure we can construct an undirected graph G. The vertices of G are the tuples. Two vertices are connected with an edge if the similarity measure between their underlying tuples exceeds a certain threshold. It was noticed that the constructed graph consists of a set of semi-isolated groups, as shown in Figure 3. Those groups have a very large number of intra-group edges and a rather small number of inter-group edges. This implies that using a graph clustering algorithm would eliminate those weak inter-group edges and produce separate groups or clusters representing similar tuples. We used the Markov Cluster Algorithm (MCL) for graph clustering (Dongen, 2000). MCL is a fast and scalable unsupervised clustering algorithm for graphs based on simulation of stochastic flow.

Figure 3: Applying Clustering Algorithms to Tuple graph (panels: Before Clustering, After Clustering)

An example of a couple of tuples that could be matched by this technique is:

    United States(E2) president(E1)
    US(E2) leader(E1)

A bipartite graph of patterns and tuple clusters is constructed. Weights are assigned to patterns and tuple clusters by iteratively applying the HITS algorithm, and the highly ranked patterns are then used for relation extraction.

5 Experimental Setup

5.1 ACE Relation Detection and Characterization

In this section, we describe Automatic Content Extraction (ACE). ACE is an evaluation conducted by NIST to measure Entity Detection and Tracking (EDT) and Relation Detection and Characterization (RDC). The EDT task is concerned with the detection of mentions of entities, and grouping them together by identifying their coreference. The RDC task detects relations between entities identified by the EDT task. We choose the RDC task to show the performance of the graph based unsupervised approach we propose. To this end we need to introduce the notion of mentions and entities. Mentions are any instances of textual references to objects like people, organizations, geopolitical entities (countries, cities, etc.), locations, or facilities. On the other hand, entities are objects containing all mentions of the same object. Here, we present some examples of ACE entities and relations:

    Spain’s Interior Minister announced this evening the arrest of separatist organization Eta’s presumed leader Ignacio Garcia Arregui. Arregui, who is considered to be the Eta organization’s top man, was arrested at 17h45 Greenwich. The Spanish judiciary suspects Arregui of ordering a failed attack on King Juan Carlos in 1995.

In this fragment, all the underlined phrases are mentions of the “Eta” organization or of “Garcia Arregui”. There is a management relation between “leader”, which refers to “Garcia Arregui”, and “Eta”.

5.2 Patterns Construction and Induction

We used the LDC English Gigaword Corpus, AFE source from January to August 1996, as a source for unstructured text. This provides a total of 99,475 documents containing 36 million words. In the performed experiments, we focus on two types of relations, EMP-ORG relations and GPE-AFF relations, which represent almost 50% of all relations in the ACE RDC task.

A POS (part of speech) tagger and a mention tagger were applied to the data. The pattern design used consists of a mix between the part of speech (POS) tags and the mention tags for the words in the unsupervised data. We use the mention tag if it exists; otherwise we use the part of speech tag. An example of the analyzed text and the presumed associated pattern is shown:

    Text:     Eta’s presumed leader Arregui …
    Pos:      NNP POS JJ NN NNP
    Mention:  ORG O O O PERSON
    Pattern:  ORG(E2) POS JJ NN(R) PERSON(E1)

An n-gram language model (a 5-gram model with back off to lower order n-grams) was built on the data tagged with the described pattern style. Weighted finite state machines were constructed with the language model probabilities. The n-best paths (20,000 paths) were identified and deployed as the initial template set. Sequences that do not contain the entities of interest, and hence cannot represent relations, were automatically filtered out. This resulted in an initial template set of around 3000 elements. This initial template set was applied to the text data to establish initial pattern and tuple pairs. The graph based mutual reinforcement technique was deployed with 10 iterations on the pattern and tuple pairs to weight the patterns.

We conducted two groups of experiments, the first with simple syntactic tuple matching, and the second with semantic tuple clustering as described in Section 4.3.

6 Results and Discussion

We compare our results to a state-of-the-art supervised system similar to the system described in (Kambhatla, 2004). Although it is unfair to make a comparison between a supervised system and a completely unsupervised system, we chose to make this comparison to test the performance of the proposed unsupervised approach on a real task with a defined test set and state-of-the-art performance. The supervised system was trained on 145,000 words which contain 2368 instances of the two relation types we are considering.

The system performance is measured using precision, recall and F-Measure with various numbers of induced patterns. Table 1 presents the precision, recall and F-measure for the two relations using the presented approach with different numbers of highly weighted patterns. Table 2 presents the same results using semantic tuple matching and clustering, as described in Section 4.3.

    No. of Patterns   Precision   Recall   F-Measure
    1500              35.9        66.3     46.58
    1000              41.2        59.7     48.75
    700               43.1        58.1     49.49
    500               46.0        56.5     50.71
    400               46.9        52.9     49.72
    200               50.1        44.9     47.36

Table 1: The effect of varying the number of induced patterns on the system performance (syntactic tuple matching)

    No. of Patterns   Precision   Recall   F-Measure
    1500              36.1        67.2     46.97
    1000              43.7        59.6     50.43
    700               44.1        59.3     50.58
    500               46.3        57.2     51.18
    400               47.3        57.6     51.94
    200               48.1        45.9     46.97

Table 2: The effect of varying the number of induced patterns on the system performance (semantic tuple matching)

                 Precision   Recall   F-Measure
    Sup          67.1        54.2     59.96
    Unsup-Syn    46.0        56.5     50.71
    Unsup-Sem    47.3        57.6     51.94

Figure 4: A comparison between the supervised system (Sup), the unsupervised system with syntactic tuple matching (Unsup-Syn), and with semantic tuple matching (Unsup-Sem)

The best F-Measure is achieved using a relatively small number of induced patterns (400 and 500 patterns), while using more patterns increases the recall but degrades the precision.

Table 2 indicates that the semantic clustering of tuples did not provide significant improvement, although better performance was achieved with a smaller number of patterns (400 patterns). We think this is related to the deployed similarity measure, which needs further investigation.

Figure 4 presents the comparison between the proposed unsupervised systems and the reference supervised system. The unsupervised systems achieve good results even in comparison to a state-of-the-art supervised system.

Sample patterns and corresponding matching text are introduced in Table 3 and Table 4. Table 3 shows some highly ranked patterns while Table 4 shows examples of low ranked patterns.

    Pattern                         Matches
    GPE (PERSON)+                   Peruvian President Alberto Fujimori
    GPE (PERSON)+                   Zimbabwean President Robert Mugabe
    GPE (PERSON)+                   PLO leader Yasser Arafat
    GPE POS (PERSON)+               Zimbabwe 's President Robert Mugabe
    GPE JJ PERSON                   American clinical neuropsychologist
    GPE JJ PERSON                   American diplomatic personnel
    PERSON IN JJ GPE                candidates for local government
    ORGANIZATION PERSON             Airways spokesman
    ORGANIZATION PERSON             Ajax players
    PERSON IN DT (ORGANIZATION)+    chairman of the opposition parties
    (ORGANIZATION)+ PERSON          opposition parties chairmans

Table 3: Examples of patterns with high weights

    Pattern                           Matches
    GPE CC (PERSON)+                  Barcelona and Johan Cruyff
    GPE , CC PERSON                   Paris , but Riccardi
    GPE VBZ VBN PERSON                Pyongyang has accepted Gallucci
    GPE VBZ VBN PERSON                Russia has abandoned us
    GPE VBZ VBN P PERSON              Rwanda 's defeated Hutu
    GPE VBZ VBN PERSON                state has pressed Arafat
    GPE VBZ VBN TO VB PERSON          Taiwan has tried to keep Lee
    (PERSON)+ VBD GPE ORGANIZATION    Alfred Streim told German radio
    (PERSON)+ VBD GPE ORGANIZATION    Dennis Ross met Syrian army
    (PERSON)+ VBD GPE ORGANIZATION    Van Miert told EU industry

Table 4: Examples of patterns with low weights

7 Conclusion and Future Work

In this work, a general framework for unsupervised information extraction based on mutual reinforcement in graphs has been introduced. We construct generalized extraction patterns and deploy graph based mutual reinforcement to automatically identify the most informative patterns. We provide motivation for our approach from a graph theory and graph link analysis perspective.

Experimental results have been presented supporting the applicability of the proposed approach to the ACE Relation Detection and Characterization (RDC) task, demonstrating its applicability to hard information extraction problems. The proposed approach achieves results comparable to a state-of-the-art supervised system, achieving 51.94 F-measure compared to 59.96 F-measure for the state-of-the-art supervised system, which requires a large amount of human annotated data. The proposed approach represents a powerful unsupervised technique for information extraction in general, and particularly for relation extraction, that requires no seed patterns or examples.

In our future work, we plan to focus on generalizing the approach to target more NLP problems.

8 Acknowledgements

We would like to thank Salim Roukos for his invaluable suggestions and support. We would also like to thank Hala Mostafa for helping with the early investigation of this work. Finally we would like to thank the anonymous reviewers for their constructive criticism and helpful comments.

References

ACE. 2004. The NIST ACE evaluation website. http://www.nist.gov/speech/tests/ace/

Eugene Agichtein and Luis Gravano. 2000. Snowball: Extracting Relations from Large Plain-Text Collections. Proceedings of the 5th ACM Conference on Digital Libraries (DL 2000).

Sergey Brin. 1998. Extracting Patterns and Relations from the World Wide Web. Proceedings of the 1998 International Workshop on the Web and Databases.

Stijn van Dongen. 2000. A Cluster Algorithm for Graphs. Technical Report INS-R0010, National Research Institute for Mathematics and Computer Science in the Netherlands.

Stijn van Dongen. 2000. Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht.

Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. 2004. Web-scale information extraction in KnowItAll (preliminary results). In Proceedings of the 13th World Wide Web Conference, pages 100-109.

Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. 2005. Unsupervised Named-Entity Extraction from the Web: An Experimental Study. Artificial Intelligence, 2005.

Radu Florian, Hany Hassan, Hongyan Jing, Nanda Kambhatla, Xiaqiang Luo, Nicolas Nicolov, and Salim Roukos. 2004. A Statistical Model for multilingual entity detection and tracking. Proceedings of the Human Language Technologies Conference (HLT-NAACL 2004).

Dayne Freitag and Nicholas Kushmerick. 2000. Boosted wrapper induction. The 14th European Conference on Artificial Intelligence Workshop on Machine Learning for Information Extraction.

Rayid Ghani and Rosie Jones. 2002. A Comparison of Efficacy and Assumptions of Bootstrapping Algorithms for Training Information Extraction Systems. Workshop on Linguistic Knowledge Acquisition and Representation: Bootstrapping Annotated Data at the Linguistic Resources and Evaluation Conference (LREC 2002).

Takaaki Hasegawa, Satoshi Sekine, and Ralph Grishman. 2004. Discovering Relations among Named Entities from Large Corpora. Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004).

Taher Haveliwala. 2002. Topic-sensitive PageRank. Proceedings of the 11th International World Wide Web Conference.

Thorsten Joachims. 2003. Transductive Learning via Spectral Graph Partitioning. Proceedings of the International Conference on Machine Learning (ICML 2003).

Nanda Kambhatla. 2004. Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Information Extraction. Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004).

Jon Kleinberg. 1998. Authoritative Sources in a Hyperlinked Environment. Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms.

N. Kushmerick, D.S. Weld, and R.B. Doorenbos. 1997. Wrapper Induction for Information Extraction. Proceedings of the International Joint Conference on Artificial Intelligence.

Winston Lin, Roman Yangarber, and Ralph Grishman. 2003. Bootstrapped Learning of Semantic Classes from Positive and Negative Examples. Proceedings of the 20th International Conference on Machine Learning (ICML 2003) Workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining.

Ion Muslea, Steven Minton, and Craig Knoblock. 1999. A hierarchical approach to wrapper induction. Proceedings of the Third International Conference on Autonomous Agents.

Ted Pedersen, Siddharth Patwardhan, and Jason Michelizzi. 2004. WordNet::Similarity - Measuring the Relatedness of Concepts. Proceedings of the Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL 2004).

Ellen Riloff and Rosie Jones. 1999. Learning dictionaries for information extraction by multilevel bootstrapping. Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI 1999).

Michael Thelen and Ellen Riloff. 2002. A Bootstrapping Method for Learning Semantic Lexicons using Extraction Pattern Contexts. Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002).

Scott White and Padhraic Smyth. 2003. Algorithms for Discovering Relative Importance in Graphs. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Zhibiao Wu and Martha Palmer. 1994. Verb semantics and lexical selection. Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL 1994).

Xiaojin Zhu, Zoubin Ghahramani, and John Lafferty. 2003. Semi-supervised Learning using Gaussian Fields and Harmonic Functions. Proceedings of the 20th International Conference on Machine Learning (ICML 2003).
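
As a supplementary illustration of the pattern induction step in Section 4.2, the iterative weighting of Equations (1)-(4) could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `mutual_reinforcement`, the toy pattern and tuple labels, and the uniform initial weights of 1.0 are all hypothetical choices made for the example; the 10-iteration default mirrors the setting reported in Section 5.2.

```python
from collections import defaultdict


def mutual_reinforcement(pairs, iterations=10):
    """Weight patterns (authorities) and tuples (hubs) from (pattern, tuple) match pairs."""
    tuples_of = defaultdict(set)    # T(p): tuples matched by pattern p
    patterns_of = defaultdict(set)  # P(t): patterns matching tuple t
    for p, t in pairs:
        tuples_of[p].add(t)
        patterns_of[t].add(p)

    # Assumption: uniform starting weights (the paper does not state the initialization).
    a = {p: 1.0 for p in tuples_of}
    h = {t: 1.0 for t in patterns_of}

    for _ in range(iterations):
        # Normalization factors, Eq. (3) and Eq. (4), from the iteration-i weights.
        H = sum(h[t] for p in tuples_of for t in tuples_of[p])
        A = sum(a[p] for t in patterns_of for p in patterns_of[t])
        # Eq. (1) and Eq. (2): both updates read the iteration-i weights.
        new_a = {p: sum(h[t] for t in tuples_of[p]) / H for p in tuples_of}
        new_h = {t: sum(a[p] for p in patterns_of[t]) / A for t in patterns_of}
        a, h = new_a, new_h
    return a, h


# Hypothetical toy data: P1 matches three tuples, P2 shares one of them, P3 is isolated.
toy_pairs = [("P1", "t1"), ("P1", "t2"), ("P1", "t3"), ("P2", "t1"), ("P3", "t4")]
pattern_weights, tuple_weights = mutual_reinforcement(toy_pairs)
for pattern, weight in sorted(pattern_weights.items(), key=lambda kv: -kv[1]):
    print(pattern, round(weight, 4))
```

In this toy graph the pattern that matches the most tuples ends up with the largest authority weight, and the pattern whose only tuple is matched by nothing else ends up with the smallest, which is the behaviour the mutual reinforcement intuition describes. Keeping only the top-k patterns by weight would correspond to the "No. of Patterns" settings in Tables 1 and 2.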