Language Technology and the Semantic Web

Thierry Declerck & Paul Buitelaar (Saarland University & DFKI GmbH)

We present collaborative research on combining language technology (LT) with technologies for encoding (domain) knowledge in ontologies, in support of the emerging Semantic Web (SW), or perhaps more appropriately: Semantic Webs.
• MUMIS (multimedia content indexing and searching in the soccer domain; finished in December 2002)
• MuchMore (cross-lingual information retrieval in the medical domain; finished in May 2003)
• Esperonto (developing a Semantic Annotation Service for upgrading the current Web to the Semantic Web; Sept. 2002 to May 2005)

Semantic Web Applications of LT
• Supporting accurate ontology-based semantic annotation of multilingual web documents (Knowledge Markup)
• Supporting ontology learning/construction from linguistically/semantically annotated multilingual text (Knowledge Extraction)
See also the Special Interest Group (SIG-5) OntoWeb-lt on Language Technology in Ontology Development and Use: http://ontoweb-lt.dfki.de

Knowledge Markup and Knowledge Extraction
[Diagram. Labels: Text/Speech; Linguistic Analysis (morpho-syntactic analysis and tagging, semantic class tagging, term/NE recognition, grammatical function tagging, dependency structure analysis); Linguistic and Semantic Annotations; Text/Speech Mining; Concepts, Relations, Events]

Knowledge Markup and Knowledge Extraction (2)
[Diagram. Labels: Text/Speech/Image-Video; Linguistic and Media Analysis; Linguistic, Low-level Image and Semantic Annotations; Text/Speech/Media Mining; Concepts, Relations, Events]

Integration of Language Technology and Domain Knowledge
[Diagram not preserved]

Linguistic Analysis
Language technology tools are needed to support the upgrade of the current web to the Semantic Web (SW) by providing an automatic analysis of the linguistic structure of textual documents.
Free-text documents that undergo linguistic analysis become available as semi-structured documents, from which meaningful units can be extracted automatically (information extraction) and organized through clustering or classification (text mining). Here we focus on the linguistic analysis steps that underlie the extraction tasks: morphological analysis, part-of-speech tagging, chunking, dependency structure analysis, and semantic tagging.

Morphological Analysis
Morphological analysis is concerned with the inflectional, derivational, and compounding processes of word formation; its goal is to determine properties such as the stem and inflectional information. Together with part-of-speech (PoS) information, it delivers the morpho-syntactic properties of a word. For the German word Häusern (houses), the analysis should yield:
[PoS=N NUM=PL CASE=DAT GEN=NEUT STEM=HAUS]

Part-of-Speech Tagging
Part-of-speech (PoS) tagging is the process of determining the correct syntactic class (a part of speech, e.g. noun, verb, etc.) of a particular word given its current context. The word “works” in the following sentences is either a verb or a noun:
He works [N,V] the whole day for nothing.
His works [N,V] have all been sold abroad.
PoS tagging involves disambiguating between multiple candidate part-of-speech tags, as well as guessing the correct tag for unknown words on the basis of context information.

Chunking
Following Abney, chunks are the non-recursive parts of core phrases, such as nominal, prepositional, adjectival and adverbial phrases and verb groups. Chunk parsing is an important step towards making natural language processing robust, since its goal is not to deliver a full analysis of sentences, but to extract just the linguistic fragments that can be identified with certainty.
Even if this strategy fails to produce an analysis of the whole sentence, the partial linguistic information gained is still useful for many applications, such as information extraction and text mining.

Named Entity Detection
Related to chunking is the recognition of so-called named entities (names of institutions and companies, date expressions, etc.). The extraction of named entities is mostly based on a strategy that combines look-up in gazetteers (lists of companies, cities, etc.) with the definition of regular expression patterns. Named entity recognition can be included in the linguistic chunking procedure. The sentence fragment “…the secretary-general of the United Nations, Kofi Annan,…” is then annotated as a nominal phrase that includes two named entities: United Nations, with named entity class organization, and Kofi Annan, with named entity class person.

Dependency Structure Analysis
A dependency structure consists of two or more linguistic units, one of which immediately dominates the others in a syntax tree. The detection of such structures is generally not provided by chunking but builds on top of it. Two main types of dependencies are relevant for our purposes: on the one hand, the internal dependency structure of phrasal units or chunks, and on the other hand the so-called grammatical functions (like subject and direct object).

Internal Dependency Structure
For the internal structure of a phrase, linguistic analysis uses the terms head, complements and modifiers: the head is the dominating node in the syntax tree of a phrase (chunk), complements are necessary qualifiers of the head, and modifiers are optional qualifiers. Consider the following example:
“The shot by Christian Ziege goes over the goal.”
The prepositional phrase “by Christian Ziege” (containing the named entity Christian Ziege) depends on (and modifies) the head noun “shot”.

Grammatical Functions
Grammatical functions determine the role (function) of each linguistic chunk in the sentence and make it possible to identify the actors involved in certain events. For example, in the following sentence the syntactic (and also the semantic) subject is the NP constituent “The shot by Christian Ziege”:
“The shot by Christian Ziege goes over the goal.”
This nominal phrase depends on (and complements) the verb “goes”, whereas the noun “shot” is the head of the NP (it is the shot that goes over the goal, not Christian Ziege!).

Semantic Tagging
Automatic semantic annotation has developed within language technology in recent years in connection with more integrated tasks like information extraction, which require a certain level of semantic analysis. Semantic tagging consists of annotating each content word in a document with a semantic category. Semantic categories are assigned on the basis of a semantic resource such as WordNet for English, or EuroWordNet, which links words across many European languages through a common inter-lingua of concepts.

Semantic Resources
Semantic resources are captured in dictionaries, thesauri, and semantic networks, all of which express, either implicitly or explicitly, an ontology of the world in general or of more specific domains, such as medicine. They can be roughly divided into the following three groups:
• Thesauri: Semantic resources that group together similar words or terms according to a standard set of relations, including broader term, narrower term, sibling, etc.
(like Roget)
• Semantic Lexicons: Semantic resources that group together words (or more complex lexical items) according to lexical semantic relations like synonymy, hyponymy, meronymy, and antonymy (like WordNet)
• Semantic Networks: Semantic resources that group together objects denoted by natural language expressions (terms) according to a set of relations that originate in the nature of the domain of application (like UMLS in the medical domain)

The MeSH Thesaurus
MeSH (Medical Subject Headings) is a thesaurus for indexing articles and books in the medical domain; the index terms can then be used for searching MeSH-indexed databases. MeSH provides for each term a number of term variants that refer to the same concept. It currently includes a vocabulary of over 250,000 terms. The following is a sample entry for the term gene library (MH is the term itself, ENTRY lines are term variants):
MH = Gene Library
ENTRY = Bank, Gene
ENTRY = Banks, Gene
ENTRY = DNA Libraries
ENTRY = Gene Bank
etc.

The WordNet Semantic Lexicon
WordNet was primarily designed as a computational account of the human capacity for linguistic categorization and covers an extensive set of semantic classes (called synsets). Synsets are collections of synonyms, grouping together lexical items according to meaning similarity. Strictly speaking, synsets are made up not of lexical items but of lexical meanings (i.e. senses).
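The point that synsets group senses rather than words can be illustrated with a small data-structure sketch. This is not WordNet's actual storage format or API: the `Sense` class, the synset identifiers and the toy inventory below are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    """One meaning of a lemma; the number distinguishes its senses."""
    lemma: str
    number: int


# Toy inventory. The identifiers are made up for this sketch and are
# not real WordNet offsets.
SYNSETS = {
    "plant.tree": frozenset({Sense("tree", 1)}),
    "diagram.tree": frozenset({Sense("tree", 2), Sense("tree_diagram", 1)}),
    "plant.flora": frozenset(
        {Sense("plant", 1), Sense("flora", 1), Sense("plant_life", 1)}
    ),
}


def synsets_for(lemma):
    """Return the ids of all synsets that contain some sense of `lemma`."""
    return sorted(
        sid for sid, senses in SYNSETS.items()
        if any(s.lemma == lemma for s in senses)
    )


def synonyms(lemma):
    """Lemmas that share at least one synset with `lemma`, excluding itself."""
    found = set()
    for senses in SYNSETS.values():
        lemmas = {s.lemma for s in senses}
        if lemma in lemmas:
            found |= lemmas - {lemma}
    return sorted(found)
```

Because membership is defined per sense, the lemma "tree" appears in two synsets here, and `synonyms("tree")` only returns lemmas that share one of those specific meanings.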
WordNet: An Example
The word 'tree' has two meanings, roughly corresponding to the class of plants and the class of diagrams. Each has its own hierarchy of classes, which are included in more general super-classes:

09396070 tree 0
  09395329 woody_plant 0 ligneous_plant 0
    09378438 vascular_plant 0 tracheophyte 0
      00008864 plant 0 flora 0 plant_life 0
        00002086 life_form 0 organism 0 being 0 living_thing 0
          00001740 entity 0 something 0

10025462 tree 0 tree_diagram 0
  09987563 plane_figure 0 two-dimensional_figure 0
    09987377 figure 0
      00015185 shape 0 form 0
        00018604 attribute 0
          00013018 abstraction 0

CYC: A Semantic Network
CYC is a semantic network of over 1,000,000 manually defined rules that cover a large part of common-sense knowledge about the world. For example, CYC knows that trees are usually outdoors, or that people who have died stop buying things. Each concept in this semantic network is defined as a constant, which can represent a collection (e.g. the set of all people), an individual object (e.g. a particular person), a word (e.g. the English word person), a quantifier (e.g. there exist), or a relation (e.g. a predicate, function, slot, attribute). The entry for the predicate #$mother:
#$mother : (#$mother ANIM FEM)
isa: #$FamilyRelationSlot #$BinaryPredicate
This says that the predicate #$mother takes two arguments, the first of which must be an element of the collection #$Animal, and the second of which must be an element of the collection #$FemaleAnimal.

Word Sense Disambiguation
Words mostly have more than one interpretation, or sense. If natural language were completely unambiguous, there would be a one-to-one relationship between words and senses. In fact, things are much more complicated, because for most words not even a fixed number of senses can be given.
Therefore, we can give restricted solutions to the problem of Word Sense Disambiguation (WSD) only in certain circumstances, depending on what exactly we mean by “sense”.

A Simplified Example of a Domain Ontology
Ontology_1: Movies
  Title: String
  Date: mm/dd/yyyy
  Duration: minutes
  Type: (action, drama, ..)
  Director: String
  Main Actors:
    Name_1: Role:
    Name_2: Role:
    Name_3: Role:
    …

Instances
Ontology_1: Movies
  Title: Lord of the Rings
  Date:
  Duration:
  Type:
  Director: Peter Jackson
  Main Actors:
    Name_1: Role:
    Name_2: Role:
    Name_3: Role:
    …

Example of RDF Schema for the Movie Ontology
[RDF markup not preserved. Surviving comment strings: “Details of company that created special effects in this movie”; “Films that deal solely with police activity”; etc.]

Integration of Ontology and Semantic Lexicon
• An example of a semantic lexicon is WordNet (sometimes also referred to as a “linguistic ontology”)
• Ontologies are domain-specific models, usually lacking linguistic information (PoS, morphology, syntax, etc.)
• To be integrated in one resource, or kept/accessed separately?
• Standardization
  • Format: Web-based standards for lexical semantic representation will increase their uptake (easy plug-and-play, remote access, etc.)
  • Content: Widely used (lexical) semantic resources will lead to (further) semantic standardization

Defining a Linguistic Ontology for the Art World (Tentative)
[Diagram not preserved]

Defining Art World Concepts (Classes, “Synsets”) (Tentative)
[Diagram not preserved. Surviving label: art-world.01]

Defining Properties (“Selection Restrictions”) (Tentative)
[Diagram not preserved]

(Semantic) Lexicons will be…
• …an important part of the Semantic Web
• …represented using markup languages (RDF)
• …accessible in a remote, distributed fashion
• …central to further semantic standardization

Multilingual terminological lexicon, attached to a domain ontology (MUMIS)
<... lang="DE" type="main">Torschuss
<... lang="EN" type="main">shot on goal
<... lang="NL" type="main">schot op doel
ein Angriffsspieler kickt den Ball zu den gegnerischen Tor (definition: an attacking player kicks the ball towards the opponent's goal)
<... lang="DE" type="synonym">Distanzschuss
<... lang="DE" type="synonym">Nachschuss
<... lang="DE" type="synonym">Schuss
<... lang="DE" type="synonym">abzieh

Extension and formalization of the multilingual terminological lexicon, including syncategorematic information. Supporting WSD.
<... lang="DE" type="main" pos="N" mod={"von" concept="Player" | concept="player" gender="gen" | pos="posspron"}>Torschuss
<... lang="DE" type="synonym" pos="V" comp={"SUBJ" concept="Player"}>abzieh
URL: DFB home page/glossary

Integrating Syntactic and Domain Knowledge
Including syntactic analysis for a more accurate tagging of domain-specific semantic annotation.

Abstraction over Syntactic Annotation
Ontology_1: NP
  Head: N
  Mod: {Adj*, PP?, GenNP}
  Spec: {Det?, PossPron?}
  Type: {RefNP, ProNP, DateNP, etc.}
Ontology_2: PP
  Head: Prep
  Type: {LocPP, DatePP, etc.}
  Comp: NP
Ontology_3: Dependencies
  Head, Comp, Mod, Spec
Ontology_4: Grammatical Functions
  Subject, Object, Ind. Object, NP Adjunct, PP Adjunct, etc.

Merging of Syntactic and Domain Knowledge
Example of a possible rule for conceptual annotation:
If (Head of Subj_NP of Verb[type=soccer::shot-on-goal] is a person) => { annotate head of NP with semantic class “soccer::player”; … }
Example of a rule for instance filling (template filling in information extraction):
If (term annotated with concept “soccer::player”) => { try to find information about relations “Team”, “Age” etc. }

NLP-based knowledge markup
[Diagram not preserved]
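The two rules for merging syntactic and domain knowledge can be sketched in Python as follows. The dictionary layout of a clause (`verb`, `subject`, `head`, `ne_class`, `concept`) is a hypothetical stand-in for the output of the linguistic analysis, not the MUMIS annotation format; only the concept names soccer::shot-on-goal and soccer::player and the relations “Team” and “Age” come from the rules above.

```python
SHOT_ON_GOAL = "soccer::shot-on-goal"
PLAYER = "soccer::player"


def annotate_players(clauses):
    """Conceptual annotation: if the head of the subject NP of a
    shot-on-goal verb is a person, mark that head as a player."""
    for clause in clauses:
        verb = clause.get("verb", {})
        subject = clause.get("subject")
        if subject is None or verb.get("type") != SHOT_ON_GOAL:
            continue
        head = subject["head"]
        if head.get("ne_class") == "person":
            head["concept"] = PLAYER
    return clauses


def open_template(head, relations=("Team", "Age")):
    """Instance filling: open empty slots for a term annotated as a player.
    Filling the slots would be a later information extraction step."""
    if head.get("concept") != PLAYER:
        return None
    template = {"instance": head["text"]}
    template.update({relation: None for relation in relations})
    return template


# Hypothetical analysis of a clause whose verb "abzieh" is typed as a
# shot on goal by the terminological lexicon.
clauses = [
    {
        "verb": {"lemma": "abzieh", "type": SHOT_ON_GOAL},
        "subject": {"head": {"text": "Ziege", "ne_class": "person"}},
    }
]
annotate_players(clauses)
template = open_template(clauses[0]["subject"]["head"])
```

Keeping the rule as a function over already-analysed clauses mirrors the division of labour on the slides: the syntactic analysis supplies heads and grammatical functions, and the domain knowledge only decides which semantic class to attach.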
MuchMore: DTD for Annotation
[DTD diagram. Recoverable names: document, sentence, text, token, chunks, chunk, gramrels, gramrel, ewnterms, ewnterm, semrels, semrel, xrceterms, xrceterm, umlsterms, umlsterm; id, from, to, type, pos, lemma, sense, offset, cui, tui, code, pref, msh, term1, term2]

MuchMore: Linguistic Annotation (Lemmatization, POS, Basic Chunking)
Balint syndrom is a combination of symptoms including simultanagnosia, a disorder of spatial and object-based attention, disturbed spatial perception and representation, and optic ataxia resulting from bilateral parieto-occipital lesions.
<... id="w1" pos="NN">Balint
<... id="w2" pos="NN">syndrom
<... id="w3" pos="VBZ" lemma="be">is
<... id="w4" pos="DT" lemma="a">a
<... id="w5" pos="NN" lemma="combination">combination
<... id="w6" pos="IN" lemma="of">of
<... id="w7" pos="NNS" lemma="symptom">symptoms
...
<... id="w20" pos="JJ" lemma="spatial">spatial
<... id="w21" pos="NN" lemma="perception">perception
<... id="w22" pos="CC" lemma="and">and
<... id="w23" pos="NN" lemma="representation">representation

MuchMore: Semantic Annotation (UMLS, EuroWordNet)
Balint syndrom is a combination of symptoms including simultanagnosia, a disorder of spatial and object-based attention, disturbed spatial perception and representation, and optic ataxia resulting from bilateral parieto-occipital lesions.

MUMIS: DTD for Linguistic Annotation
[DTD diagram, document level. Recoverable names: Document, Paragraph, Sentence, Subord-Clause, NP, PP, AP, AdvP, VG, NE]
[DTD diagram, AP. Recoverable names: AP, TYPE, STRUK, AP_AGR, STRING, AP_HEAD, W]
[DTD diagram, VG. Recoverable names: VG, TYPE, VG_TYPE, VG_SUBCAT_STEM, VG_AGR, STRING, SENT_STRING, KLAMMER, STRUK, VG_STRG, VG_HEAD, W, ...]
MUMIS: DTD for Linguistic Annotation
[DTD diagram, clause and word level. Recoverable names: CLAUSE, CLAUSE_TYPE, CLAUSE_SUBJ, CLAUSE_PP_ADJUNKT, CLAUSE_PRED_SUBCAT, CLAUSE_PRED_STRG, CLAUSE_PRED_AGR, CLAUSE_VG_LIST, CLAUSE_PP_LIST, CLAUSE_NP_LIST, SENT_STRING, TYPE, STRING, W, STEM, INFL, POS, TC, ...]

MUMIS: Linguistic Annotation (Lemmatization … Dependency Structure)
Industrie, Handel und Dienstleistungen werden in der ersten Liste aufgeführt, wobei die in Klammern gesetzten Zahlen auf die Mutterfirmen hinweisen.
(Industry, trade and services are mentioned in the first list, in which numbers within brackets point to parent companies.)
<... from="w1" to="w5" type="NP" head="w1,w3,w5"/>
<... from="w6" to="w6" type="VG"/>
<... from="w7" to="w10" type="PP" head="w7" complement="w8,w9,w10"/>
<... from="w11" to="w11" type="VG"/>

MUMIS: Semantic Annotation (Events)
7. Ein Freistoss von Christian Ziege aus 25 Metern geht über das Tor. (A free kick by Christian Ziege from 25 metres goes over the goal.)
<... lang="DE" type="main">zweite Halbzeit
<... lang="EN" type="main">second half
<... lang="ES" type="main">reanudacion
….

Processing with the SCHUG system: Example
[Example not preserved]
Conceptual Annotations for Multimedia Indexing and Retrieval: MUMIS
[Architecture diagram. Recoverable labels: Automatic Speech Recognition (ASR), Speech Signals, Transcripts, Formal Texts, Free Texts, Domain Modeling (DM), Soccer Ontology, Domain Lexicon, Information Extraction (IE), Annotations, Merging, Merged Annotations, Query, User Interface (UI); languages EN, NL, DE]

The first user interface of MUMIS
[Screenshot not preserved]

Esperonto: Partners
• Intelligent Software Components (coordinator): Semantic Web, annotation services.
• UPM: ontology development and evaluation.
• University of Innsbruck: Semantic Web languages.
• Saarland University: multilingual annotation services, using information extraction.
• UNILIV: semantic indexation of Semantic Web content. Routing solutions. Visualization and navigation to make content presentation more user-friendly.
• Residencia de Estudiantes: content provider. Cultural tour test case. Evaluation.
• CIDEM: content provider. Fund finder test case. Evaluation.
• BioVista: content provider. Scientific discovery test case.
Aim
Application Service Provision of Semantic Annotation, Aggregation, Indexing and Routing of Textual, Multimedia, and Multilingual Web Content
The project aims at bridging the gap between the current World Wide Web and the Semantic Web by providing a service to “upgrade” existing content to Semantic Web content. Ontologies play a key role in this effort, together with multilingual natural language analysis of the textual documents currently on the web as free or HTML-encoded text.

Main Goals
• To bridge the gap between the current web and the Semantic Web: SemASP
  • Ontology-based annotation
  • Sources:
    • Static pages
    • Pages dynamically generated from DB
    • Textual and multimedia information
    • Web services
• Added-value knowledge-based services on top of the constructed semantic web
  • Routing based on P2P communication
  • Semantic aggregation
  • Meaning negotiation
• Support multilinguality in ontology construction, ...

[Architecture diagram. Recoverable labels: Agent Applications, Multilingual NL Generation, Visualization, Service Provider, Semantic Web Portal, Agent, Router (several), Semantic indices, Concept instances, Certificate, Workbench, Ontology Repository Service, Multilinguality, Reengineering, Mapping, Tagger/Wrapper (several), SemASP, Maintenance, Multilingual NL Understanding, XML, DAML, OIL, RDF(S), World Wide Web, Web Server Provider, Dynamic Information Provider, Static Information Provider, Multimedia Data Provider]

Ontology-based Annotation
• Annotate documents accurately with concepts and terms described in various semantic resources: EuroWordNet, UMLS, soccer ontology, etc.
• Annotate documents with relations defined in the ontology
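As a rough illustration of ontology-based annotation, the following sketch looks up token sequences in a term lexicon and records stand-off annotations with from/to token identifiers. It is a minimal greedy longest-match sketch under stated assumptions, not the Esperonto or MuchMore annotation service: the function name, the lexicon contents and the output layout are hypothetical.

```python
def annotate_terms(tokens, lexicon, max_len=3):
    """Greedy longest-match lookup of token n-grams in a term lexicon.

    Returns stand-off annotations that refer to tokens as w1, w2, ...
    """
    annotations = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = " ".join(tokens[i:i + n]).lower()
            if key in lexicon:
                annotations.append(
                    {"from": f"w{i + 1}", "to": f"w{i + n}",
                     "concept": lexicon[key]}
                )
                i += n
                break
        else:
            # No term starts at this token; move on.
            i += 1
    return annotations


# Hypothetical lexicon entries mapping terms to ontology concepts.
lexicon = {
    "shot on goal": "soccer::shot-on-goal",
    "christian ziege": "soccer::player",
}
tokens = ["Christian", "Ziege", "takes", "a", "shot", "on", "goal"]
result = annotate_terms(tokens, lexicon)
```

A pure lexicon lookup like this cannot resolve ambiguous terms; that is where the linguistic annotation and word sense disambiguation discussed earlier would have to come in.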
Ontology construction from Text
Various methodologies are under investigation for extracting/learning knowledge from text and encoding it in an ontology (see Ontology Learning Overview, OntoWeb D1.5, http://www.ontoweb.org). Many are based on Machine Learning techniques. We discuss here the possibility of a rule-based approach to partial and shallow ontology construction from text, based on various levels of syntactic patterns annotated in the documents.

Ontology construction from Text: A starting experiment: Medicine
Document set: 65 sample phrases that link symptoms with Rheumatoid Arthritis (RA).

Ontology construction from Text: Apposition and Parenthesis (1)
“The effects of rheumatoid arthritis on bone include structural joint damage (erosions) and osteoporosis”
Linguistic Structure:
[[The effects of rheumatoid arthritis] [on bone]] [include] [[structural joint damage ( erosions )] [ and] [osteoporosis]]
=> The apposition (2 syntactic heads “joint” and “erosions” in one NP) including a parenthesis construction suggests a synonymy relation or a definition.
Heuristic: establishing semantic relations on top of linguistic “head-modifier” constructions.

Ontology construction from Text: Apposition with Parenthesis (2)
“For symptoms of rheumatoid arthritis (pain, joint stiffness), the reference treatment is a nonsteroidal antiinflammatory drug (NSAID) such as diclofenac or ibuprofen.”
Linguistic Structure:
[For symptoms of rheumatoid arthritis ( pain , joint stiffness )] , [the reference treatment] [is] [a nonsteroidal antiinflammatory drug ( NSAID)]
• Suggests a semantic relation between “pain” and “joint stiffness”.
• Classify “pain” and “joint stiffness” as symptoms of RA. The word “symptom” is linguistically annotated as the head of the Compl-NP of the PP starting with “For”.
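The parenthesis heuristic can be sketched as a function over a single NP chunk. This is only a sketch of candidate generation: the function name and the tuple layout are hypothetical, and the head is assumed to be supplied by the linguistic annotation rather than computed here.

```python
import re

# An NP chunk of the form "<phrase> ( <item> , <item> ... )".
PAREN = re.compile(r"^(?P<phrase>[^()]+?)\s*\(\s*(?P<inside>[^()]+?)\s*\)$")


def apposition_candidates(np_chunk, head):
    """Pair the annotated head of an NP chunk with each parenthesised item.

    Returns (head, item, phrase) triples as candidate semantic relations.
    The type of relation is left open and must be decided separately.
    """
    match = PAREN.match(np_chunk.strip())
    if match is None:
        return []
    items = [item.strip() for item in match.group("inside").split(",")]
    return [(head, item, match.group("phrase")) for item in items if item]


candidates = apposition_candidates(
    "symptoms of rheumatoid arthritis ( pain , joint stiffness )",
    head="symptoms",
)
```

For the chunk above this yields one candidate per parenthesised item, each linked to the head “symptoms”. The triples are hypotheses only; they still have to be checked before anything is added to an ontology.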
Ontology construction from Text: Apposition with Parenthesis (3)
The hypothesis needs to be constrained, however:
“In patients with rheumatoid arthritis (RA)” => RA is the abbreviation of rheumatoid arthritis.
And in the sentence “Fourteen consecutive elbows have been treated for rheumatoid arthritis (9 elbows) and for post-traumatic osteoarthrosis (5 elbows) by total elbow replacement with the GSB III implant.”, the parentheses (9 elbows) and (5 elbows) have no semantic relation to the preceding head nouns!

Ontology construction from Text: Apposition with commas
“Etoricoxib, a selective COX2 inhibitor, has been shown to be as effective as non-selective non-steroidal anti-inflammatory drugs in the management of chronic pain in rheumatoid arthritis and osteoarthritis, …”
Linguistic Structure: [Etoricoxib, a selective COX2 inhibitor,] [has been shown]…
The same hypothesis as in the former examples: a semantic relation between “Etoricoxib” and “selective COX2 inhibitor”, probably an “isa” relation.

Ontology construction from Text: Compound Analysis
“Joints destructions”, “joint damage”, “joint disease”, “joint stiffness”, but “joint cartilage”. “Knee joints” vs. “tender joints”.
What can happen to joints, and where are joints located? Can synsets be used to detect such relations? “Joint cartilage” is not a disease.

Ontology construction from Text: PP post-modification
“inflammation of joints”, “synovial lining of joints”
Here too: can synsets be used to group what can happen to joints?
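The compound and PP post-modification patterns can be collected with a small grouping routine. This is a surface-level sketch under stated assumptions: the function name and the group labels are hypothetical, and it works on plain term strings rather than on the chunk annotations used in the projects.

```python
import re


def group_by_position(terms, noun="joint"):
    """Sort terms by the syntactic position that `noun` takes in them."""
    n = rf"{re.escape(noun)}s?"  # singular or plural form of the noun
    groups = {"modifier": [], "head": [], "pp_postmod": []}
    for term in terms:
        t = term.lower().strip()
        if m := re.fullmatch(rf"(.+) of {n}", t):
            groups["pp_postmod"].append(m.group(1))   # "X of joints"
        elif m := re.fullmatch(rf"{n} (.+)", t):
            groups["modifier"].append(m.group(1))     # "joint X"
        elif m := re.fullmatch(rf"(.+) {n}", t):
            groups["head"].append(m.group(1))         # "X joints"
    return groups


terms = [
    "joint damage", "joint disease", "joint stiffness", "joint cartilage",
    "knee joints", "tender joints",
    "inflammation of joints", "synovial lining of joints",
]
groups = group_by_position(terms)
```

The grouping only records where the noun occurs. It puts “cartilage” in the same group as “damage” and “disease”, so telling body parts from conditions still needs a semantic resource such as synsets, which is exactly the open question raised above.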
Ontology construction from Text: Phrase Internal Coordination
“The effects of rheumatoid arthritis on bone include structural joint damage (erosions) and osteoporosis”
Linguistic Structure:
[[The effects of rheumatoid arthritis] [on bone]] [include] [[structural joint damage ( erosions )] [ and] [osteoporosis]]
RA causes structural joint damage AND osteoporosis (interpreting the head noun “effects” as a causation).
Hypothesis: the two heads of an NP coordination are somehow related.

Ontology construction from Text: Phrase Internal Coordination (2)
“A study was conducted to determine the incidence of ulnar and peripheral neuropathy”
Linguistic Structure: … [The incidence of [[ulnar and peripheral] neuropathy]]
• The AP “ulnar and peripheral” modifies the head noun “neuropathy”. The AP is a coordinated one, having two adjectival heads.
• Hypothesis: they correspond to two types of neuropathy.

Ontology construction from Text: Subject Verb Objects (Ind. Obj. etc.)
[Rheumatoid arthritis is an immunologically mediated inflammation of joints of unknown aetiology] and [often leads to disability]
=> RA leads to disability (an effect of ellipsis resolution: RA is detected as the subject of the verb “leads”, even though it is not realised in the text. Reference resolution is very important for knowledge extraction.)
=> Lexical semantic information: collect all objects of “RA leads to …”
=> Suggests causality (verb lead + to)

Ontology construction from Text: Subject Verb Objects (Ind. Obj. etc.)
“These changes constitute hallmarks of synovial cell activation and contribute to both chronic inflammation and hyperplasia”
Online exercise!

Future Work
We still have to identify accurately the subset of linguistic tags describing syntactic/semantic patterns that are relevant for ontology extraction (or even ontology mark-up).
First Conclusions
• Construction of partial and shallow ontologies from (complex) syntactic patterns seems feasible.
• It might seem “expensive” in the sense that documents first have to be (automatically) linguistically annotated. But Machine Learning methods also need a lot of semi-automatically annotated data for training.
• There is a need for a comparative evaluation that takes into account as many parameters as possible.

Practical Sessions (Adrian Raschip)
• Exercise 1: Semi-automatic terminological extension: Romanian and other languages. On the basis of the TMX-encoded MUMIS multilingual terminology
• Exercise 2: (Manual) linguistic annotation of English and Romanian text on soccer
• Exercise 3: Define a soccer ontology in Protégé
• Exercise 4: Search for possible mapping rules between linguistic annotations and relations that might be relevant to extract