KNOW2: Language understanding technologies for multilingual domain-oriented information access∗

Eneko Agirre, German Rigau (EHU, IxA taldea)
Lluís Padró, Jordi Turmo (UPC, TALP)
Irene Castellón (UB, GRIAL)
Salvador Climent (UOC, GRIAL)

Abstract: The goal of the project is to explore integrated environments allowing the cost-effective deployment of vertical information access portals for specific domains. The project started in January 2010 and will last three years.

Keywords: Natural Language Processing, Syntactic Analysis, Semantic Interpretation, Knowledge Acquisition, Information Extraction, Information Retrieval

1. General description

New forms of (multilingual) information access (MLIA, IA) based on Natural Language Processing (NLP, especially featuring semantic information) are being adopted by major companies such as Google, Microsoft and Yahoo: Question Answering has been deployed (PowerSet, now part of Microsoft; Yahoo Answers; Google), entity-centred IA is being explored (Spock, Yahoo, Silobreaker) alongside new navigation strategies (MMexplorer), and cross-lingual IA has been deployed by major search engines (Google). Our project is based on the idea that automatic text processing, especially in the semantic layer, is already enabling a new generation of MLIA systems. In order to acquire the required knowledge and to process free-running text accurately, our strategy has three interconnected threads: (1) We need to focus on specific domains, and thus apply text mining and domain adaptation techniques to improve NLP tools and resources, including their inference and reasoning capabilities. (2) Users and domain experts need to be included in the loop, via collaborative interfaces to the acquired knowledge. (3) The acquired knowledge should make it possible to build vertical IA portals for specific domains cost-effectively.

2. Relation to other projects

KNOW2 builds on the results of KYOTO and KNOW. KYOTO¹ is a three-year European project that proposes a system allowing people in communities to define the meaning of their words and terms in a shared Wiki platform, so that this meaning becomes anchored across languages and cultures, and so that a computer can use this knowledge to detect knowledge and facts in text. We plan to use and further develop the software and expertise gathered in KYOTO.

KNOW² is the predecessor of KNOW2, and it already enhanced Cross-Lingual IA and Question Answering technology with improved NLP technologies for the open domain. With respect to KNOW, KNOW2 aims to obtain better performance through two main strategies: (i) moving from general to specific domains and (ii) incorporating text mining and collaborative interfaces.

3. Project coordination

The ambitious goals of the project can only be achieved by gathering a critical mass of researchers. For this reason KNOW2 has been designed as a coordinated project integrating the research and the multilingual expertise of three groups, which are structured in three subprojects with well-defined goals:

Subproject 1 (EHU) focuses on management and design, development of collaborative interfaces, reasoning and inference layers, linguistic processors for Basque, question answering, extraction of multilingual lexical knowledge, adaptation of linguistic processors to the domain, integration of the knowledge gathered in the rest of the subprojects, and evaluation.

Subproject 2 (UPC) focuses on the study, evaluation and comparison of advanced text mining techniques to support the building of domain ontologies; this goal involves enhancing machine learning techniques and improving syntactic-semantic processors and knowledge acquisition for text classification, information extraction, question answering and textual entailment.

Subproject 3 (UOC-UB) focuses on linguistic research for developing semantic processors and on building lexical-semantic knowledge bases (WordNets) for Spanish and Catalan using Machine Translation and Computer-Assisted Translation techniques.

4. Specific objectives

The main objective is to improve current MLIA systems with research that enables the construction of an integrated environment allowing the cost-effective deployment of vertical IA portals for specific domains. This breaks down into the following specific objectives:

- Adoption of current standards for the representation of linguistic annotations, both of documents and of semantic resources. This will make interoperability easier and ease the adoption of KNOW2 technology by industry. In addition, all tools and resources developed in KNOW2 will be released under free software licenses.
- Development of robust linguistic processors, including semantic processing, for Basque, Catalan and Spanish; procedures to adapt those processors, and English ones, to the target domain; analysis of discourse structure.
- Development of knowledge mining techniques, which will mine domain texts and enrich current multilingual knowledge bases (adapting them to the domain) with concepts, relations and factual events. The acquisition will be driven by automatically captured document collections.
- Development of a collaborative interface to the domain knowledge. This wiki-style interface will allow the user community to manage the whole process, including editing the acquired concepts, the domain ontologies and the extraction rules.
- Integration of all acquired knowledge in a single Multilingual Central Repository. Development of a semantic engine which will include new techniques for automatic reasoning and inference, and which will be adapted to the domain.
- Development of prototypes for monolingual and multilingual IA to the documents and to the factual information extracted from them, including Information Retrieval, Cross-Lingual IA and Question Answering demonstrators.
- Evaluation of resources, tools and applications in international benchmarks and competitions whenever possible.

5. Defining cases of use in real scenarios

KNOW2 will produce demonstrators and prototypes for different cases of use in real scenarios related to specific domains, such as the environment, the European Parliament, geographic text and/or popular science and technology (including public portals like zientzia.net and BasqueResearch, part of AlphaGalileo). We are currently defining this set of cases of use together with the companies collaborating in the project (EPOs), and we are open to suggestions of any kind from interested companies.

KNOW2 aims to apply state-of-the-art research to real scenarios. The adoption of recent representation standards and free software licenses should facilitate technology transfer to industrial environments.

∗ TIN2009-14715-C04
¹ http://www.kyoto-project.eu
² http://ixa.si.ehu.es/know