
Nlm talk


Enabling the Semantic Web:
The Role of Metadata, Semantics and Domain Ontologies

Vipul Kashyap
Presentation to the National Library of Medicine, 8 January 2002

Outline


What is the Semantic Web?

Metadata, Ontologies and the Semantic Web
– A Three-Level Approach for the Semantic Web

The Semantic Web Fabric: A Collection of Metadata and Ontologies
– Components of the Semantic Web Fabric
– Metadata-based Approach for Heterogeneous Digital Data

OBSERVER: Incremental Query Expansion across Multiple Ontologies
– Ontology Integration and Query Rewriting
– Intensional Loss of Information
– Extensional Loss of Information

Conclusions and Future Work

What is the Semantic Web ?


Semantics:
– “meaning or relationship of meanings, or relating to meaning …” (Webster),
– meaning and use of data (Information System perspective)



Semantic Web:
– An extension of the current web, in which information is given well-defined meaning, better enabling computers and people to work in cooperation [Berners-Lee, Hendler, Lassila, 2001]



“Emergent” Semantic Web:
– a semantic platform for people and applications to collaborate in creating, validating, and using dynamic knowledge where semantics “emerges” from the interactions

Metadata, Ontologies and the Semantic Web
Get the titles, authors, documents and maps published by the United States Geological Survey (USGS) about regions having a population greater than 5000, an area greater than 1000 acres, and a low-density urban area land cover
Domain-specific metadata: terms chosen from domain-specific ontologies

What is Metadata?
- data/information about data
- useful/derived properties of media
- properties/relationships between objects
- may or may not capture information content of underlying data

What are Ontologies?
- collection of terms, definitions and interrelationships
- specification of a representational vocabulary for a shared domain of discourse
- semantically rich metadata capturing the information content of underlying data repositories
- DL (description logic) descriptions organized as a lattice

A Metadata Classification: The Information Pyramid
User Ontologies
Classifications Domain Models

Domain Specific Metadata
(area, population (Census), land-cover, relief (GIS), metadata concept descriptions from ontologies)

Content Descriptive Metadata

Domain Independent (structural) Metadata
(C++ class-subclass relationships, HTML Document Type Definitions, C program structure)

Direct Content Based Metadata
(inverted lists, document vectors, WAIS, Glimpse, LSI)

Content Dependent Metadata
(size, max colors, rows, columns)

Content Independent Metadata
(creation-date, location, type-of-sensor)

Data
(Heterogeneous Types/Media)

A Three-Level Approach for the Semantic Web
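Problem components and the corresponding solution components:
– Vocabulary: Ontological terms (domain, application specific)
– Content: Metadata (content descriptions, intensional)
– Representation: Data (heterogeneous types, media)

Relationships between the levels:
– Ontological terms are used by metadata (vocabulary is used by content)
– Data is abstracted into metadata (representation is abstracted into content)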

The Semantic Web Fabric: A Collection of Metadata Descriptions and Ontologies
[Figure: components of the Semantic Web Fabric. Components shown: User Queries/Information Requests; Ontology Servers connected through an Inter-Ontology Relationships Manager; Metadata Servers, each with a Metadata Repository; a Distributed Computing Infrastructure (J2EE, .NET, CORBA, Agents); and the underlying Data Repositories.]

Metadata-based Approach for handling Heterogeneous Digital Data


Annotation/Association/Extraction of Knowledge with/from Underlying Data
– Structured Databases: mapping concepts in domain ontologies to schema metadata elements
– Text Databases: mapping concepts in domain ontologies to textual metadata

Information Retrieval and Analysis
– Structured Databases: distributed query processing across multiple information sources
– Text Databases: mapping SQL/Description Logic based queries into text retrieval expressions

Re-use of Existing Semantic Knowledge
– Interoperation Across Multiple Ontologies
– Loss of Information

Metadata-based Approach: Describing database objects using DL expressions
ONTOLOGICAL TERMS: AgencyConcept, DocumentConcept (related by the role hasOrganization)
DATABASE OBJECTS: AGENCY(RegNo, Name, Affiliation), DOC(Id, Title, Agency)

“All documents stored in the database have been published by some agency”
Database Documents ⊑ (AND DocumentConcept (hasOrganization AgencyConcept))

Advantages:
– Use of ontologies for an intensional, domain-specific description of data
– Representation of extra information
  – Relationships between objects not represented in the database schema
  – Using terminological relationships in the ontology

Metadata-based Approach: Mapping ontological elements to textual metadata
Domain Specific !!

Ontological element: profession
<ACCRUE>(<SENTENCE>([person.name], <PHRASE>(<Input>)), <SENTENCE>([person.name], <STEM>(appointed), <PHRASE>(<Input>)), <SENTENCE>([person.name], <STEM>(become), <PHRASE>(<Input>)))

Ontological element: person active_in party
<ACCRUE>(<SENTENCE>([person.name], <STEM>(leader), [party.name]), <SENTENCE>([person.name], <STEM>(representing), [party.name]))

Parameterization !! The expressions are parameterized by [person.name], [party.name] and <Input>.

Metadata-based Approach: Mapping DL queries to Topic Expressions
[has_document] from (AND person (FILLS name “Aleksandr Shokhin”) (FILLS profession “Prime Minister”))

<ACCRUE>( <TOPIC>(person), <PHRASE>(<WORD>(Aleksandr), <WORD>(Shokhin)),

<ACCRUE>( <SENTENCE>(<PHRASE>(<WORD>(Aleksandr), <WORD>(Shokhin)), <STEM>(appointed), <PHRASE>(<WORD>(Prime), <WORD>(Minister))), <SENTENCE>(<PHRASE>(<WORD>(Aleksandr), <WORD>(Shokhin)), <STEM>(becomes), <PHRASE>(<WORD>(Prime), <WORD>(Minister)))))
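The step from a DL query to a topic expression can be sketched as template filling. The following is a minimal illustration, not the system's actual translator: the function names `phrase`, `profession_topic` and `person_query_topic` are hypothetical, and the templates are simply copied from the expression shown on this slide (two sentence patterns, with the stems "appointed" and "becomes").

```python
def phrase(text: str) -> str:
    """Render a multi-word value as <PHRASE>(<WORD>(w1), <WORD>(w2), ...)."""
    words = ", ".join(f"<WORD>({w})" for w in text.split())
    return f"<PHRASE>({words})"


def profession_topic(person_name: str, profession: str) -> str:
    """Fill the (assumed) 'profession' template with a name and a profession."""
    name, job = phrase(person_name), phrase(profession)
    sentences = [
        f"<SENTENCE>({name}, <STEM>(appointed), {job})",
        f"<SENTENCE>({name}, <STEM>(becomes), {job})",
    ]
    return f"<ACCRUE>({', '.join(sentences)})"


def person_query_topic(person_name: str, profession: str) -> str:
    """Combine the concept topic, the name phrase and the profession evidence."""
    return (
        f"<ACCRUE>(<TOPIC>(person), {phrase(person_name)}, "
        f"{profession_topic(person_name, profession)})"
    )


expr = person_query_topic("Aleksandr Shokhin", "Prime Minister")
print(expr)
```

The two FILLS constraints of the DL query supply the two parameters; everything else in the output comes from the fixed, domain-specific template.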

Metadata-based Approach: Using DL expressions to reason about information
Database Documents ⊑ (AND DocumentConcept (ALL hasOrganization AgencyConcept))

Query
[hasDocument] for (FILLS hasOrganization “USGS”)

[hasDocument] for (AND DocumentConcept (ALL hasOrganization {“USGS”}))

- Reasoning with DL Expressions
- Ontological Inferences:
  - DocumentConcept
  - (hasOrganization, {“USGS”})

Challenge 1: Use of Multiple Ontologies

Challenge 2: Estimating the Loss of Information

OBSERVER:
Ontology-based System Enhanced with (terminological) Relationships for Vocabulary hEterogeneity Resolution
[Figure: OBSERVER architecture. Components shown: a User Query entering a Query Processor; several nodes, each pairing a Query Processor with a Mappings/Ontology Server over its own Ontologies and data Repositories; and the IRM (Inter-ontology Relationships Manager) holding the Inter-ontology Relationships.]

Controlled and Incremental Query Expansion to a new Ontology
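[Flowchart, read as a sequence of steps]
1. Query Construction: select a domain ontology and construct the query expression.
2. Local Ontology: generate a query plan, map concepts to data, and access the data.
3. MORE? If No, END. If Yes, continue.
4. Select the next ontology.
5. Compute the set of translations of the query.
6. Estimate the loss of information.
7. Choose the translation with minimum loss, then access data again as in step 2.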
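The control flow of this slide can be sketched as a loop over ontologies. This is only an illustration under stated assumptions: `expand_query`, `access`, `translate` and `want_more` are hypothetical names, each candidate translation is assumed to come with a single loss estimate, and the toy data below is invented purely to make the sketch runnable.

```python
from typing import Callable, List, Set, Tuple


def expand_query(
    query: str,
    ontologies: List[str],
    access: Callable[[str, str], Set[str]],
    translate: Callable[[str, str], List[Tuple[str, float]]],
    want_more: Callable[[Set[str]], bool],
) -> Set[str]:
    """Answer the query in the first (local) ontology, then expand one ontology at a time."""
    local, *others = ontologies
    answers = set(access(local, query))
    for ontology in others:
        if not want_more(answers):          # the "MORE?" decision
            break
        candidates = translate(query, ontology)
        if not candidates:
            continue                        # no translation into this ontology
        best, _loss = min(candidates, key=lambda pair: pair[1])
        answers |= access(ontology, best)   # access data with the min-loss translation
    return answers


# Invented toy data: (ontology, term) -> matching items.
TOY_DATA = {
    ("red", "Book"): {"Cosmos"},
    ("blue", "Publication"): {"Cosmos", "Contact"},
    ("blue", "Print-Media"): {"Cosmos", "Contact", "Daily News"},
}
# Invented candidate translations with made-up loss estimates.
TOY_TRANSLATIONS = {("Book", "blue"): [("Print-Media", 0.6), ("Publication", 0.3)]}


def toy_access(ontology: str, term: str) -> Set[str]:
    return TOY_DATA.get((ontology, term), set())


def toy_translate(term: str, ontology: str) -> List[Tuple[str, float]]:
    return TOY_TRANSLATIONS.get((term, ontology), [])


result = expand_query("Book", ["red", "blue"], toy_access, toy_translate, lambda a: True)
print(sorted(result))
```

Because the user stays in control of the "MORE?" decision, the expansion is incremental: each additional ontology is consulted only on request, and only through its least lossy translation.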

Bibliography Data Ontology: The Red Ontology
[Figure: concept hierarchy of the bibliographic-data ontology, rooted at Biblio-Thing. Concepts shown: Document, Conference, Person, Agent, Author, Publisher, Organization, University, Book, Proceedings, Edited-Book, Thesis, Doctoral-Thesis, Master-Thesis, Technical-Report, Miscellaneous-Publication, Periodical-Publication, Journal, Magazine, Newspaper, Technical-Manual, Cartographic-Map, Computer-Program, Artwork, Multimedia-Document.]

http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data/

A subset of WordNet 1.5: The Blue Ontology
[Figure: concept hierarchy of the WordNet 1.5 subset. Concepts shown: Print-Media, Press, Newspaper, Magazine, Publication, Journalism, Periodical, Book, Pictorial, Trade-Book, Brochure, TextBook, Reference-Book, CookBook, SongBook, PrayerBook, Encyclopedia, WordBook, Journals, Series, Instruction-Book, HandBook, Directory, GuideBook, Annual, Manual, Bible, Instructions, Reference-Manual.]

http://www.cogsci.princeton.edu/~wn/w3wn.html

Inter-ontological relationships


Synonyms
– lead to semantics-preserving translations

Hyponyms/Hypernyms
– lead to semantics-altering translations
– typically result in loss of recall and precision

List of Hyponyms
– technical-manual hyponym manual
– book hyponym book
– proceedings hyponym book
– thesis hyponym book
– misc-publication hyponym book
– technical-reports hyponym book
– press hyponym periodical-publication
– periodical hyponym periodical-publication

Ontology Integration and Query Rewriting
[Figure: the integrated hierarchy obtained by linking the two ontologies. Nodes shown: Document, Publication, Periodical-Publication, Periodical, Journal, Series, Pictorial, Book, Trade-Book, Brochure, TextBook, Reference-Book, CookBook, SongBook, PrayerBook, Proceedings, Thesis, Technical-Report, Misc-Publication, Encyclopedia, WordBook, Instruction-Book, HandBook, Directory, Annual, Manual, Bible, GuideBook, Instructions, Technical-Manual, Reference-Manual. Edges are annotated with the constraints (ATLEAST 1 place) and (ATLEAST 1 ISBN), and some nodes with union expressions over their children, e.g. union(Book, Proceedings, ..., Misc-Publication).]

Intensional Loss of Information


Original Query:
– [NAME PAGES] for (AND BOOK (FILLS CREATOR “Carl Sagan”))



Modified Query:
– [NAME PAGES] for (AND document (FILLS doc-author-name “Carl Sagan”))



Terminological Relationships:
– BOOK ≡ (AND PUBLICATION (ATLEAST 1 ISBN))
– PUBLICATION ≡ (AND document (ATLEAST 1 PLACE-OF-PUBLICATION))



Terminological Difference:
– (AND (ATLEAST 1 ISBN) (ATLEAST 1 PLACE-OF-PUBLICATION))



Loss of Information:
– Instead of books authored by Carl Sagan, OBSERVER returns those documents by Carl Sagan that may not have an ISBN or may not have been published
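The terminological difference above can be computed by walking up the definitions from the original term to its replacement and collecting the constraints that get dropped. The sketch below assumes each term is defined as a single parent plus extra constraints, as in the two relationships on this slide; the `DEFINITIONS` table and the function name are illustrative only.

```python
# Each term maps to (parent term, constraints added on top of the parent),
# transcribed from the terminological relationships on the slide.
DEFINITIONS = {
    "BOOK": ("PUBLICATION", ["(ATLEAST 1 ISBN)"]),
    "PUBLICATION": ("document", ["(ATLEAST 1 PLACE-OF-PUBLICATION)"]),
}


def terminological_difference(term: str, translation: str) -> list:
    """Constraints lost when `term` is replaced by its ancestor `translation`."""
    dropped = []
    current = term
    while current != translation:
        if current not in DEFINITIONS:
            raise ValueError(f"{translation} is not an ancestor of {term}")
        parent, constraints = DEFINITIONS[current]
        dropped.extend(constraints)
        current = parent
    return dropped


diff = terminological_difference("BOOK", "document")
print("(AND " + " ".join(diff) + ")")
# prints: (AND (ATLEAST 1 ISBN) (ATLEAST 1 PLACE-OF-PUBLICATION))
```

The printed expression is the intensional description of what the modified query no longer guarantees about its answers.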

Intensional Loss of Information: Disadvantages and Advantages


May not make sense as it mixes two vocabularies,
– e.g., does Book - Book make any sense ?



The problem becomes worse if the two ontologies are in different languages,
– e.g., English and Italian

Makes it hard for the system to differentiate between the various alternatives

On the other hand:
– An information loss interval doesn’t make much sense to the user.

Estimating Loss of Information based on Term Extensions
[Figure: Venn diagram of Ext(Term) and Ext(Translation). The part of Ext(Term) outside the overlap is the loss in recall; the part of Ext(Translation) outside the overlap is the loss in precision.]

Precision = |Ext(Term) ∩ Ext(Translation)| / |Ext(Translation)|

Recall = |Ext(Term) ∩ Ext(Translation)| / |Ext(Term)|

Percentage Loss = 1 - 2 |Ext(Term) ∩ Ext(Translation)| / (|Ext(Term)| + |Ext(Translation)|)
= 1 - 1 / ((1/2)(1/Precision) + (1/2)(1/Recall))

=> 1 - 1 / ((alpha)(1/Precision) + (1 - alpha)(1/Recall)), 0 < alpha < 1
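As a worked example of these measures, the following sketch computes precision, recall and the loss for two small, invented extensions. The function names are illustrative, both extensions are assumed to be non-empty, and treating a zero precision or recall as total loss (the limit of the formula) is an assumption of this sketch.

```python
def precision_recall(term_ext: set, trans_ext: set) -> tuple:
    """Precision and recall of a translation with respect to the original term."""
    common = len(term_ext & trans_ext)
    return common / len(trans_ext), common / len(term_ext)


def loss(precision: float, recall: float, alpha: float = 0.5) -> float:
    """1 - 1 / (alpha/Precision + (1 - alpha)/Recall), with 0 < alpha < 1."""
    if precision == 0 or recall == 0:
        return 1.0  # assumption: no overlap means everything is lost
    return 1 - 1 / (alpha / precision + (1 - alpha) / recall)


# Invented extensions: 4 objects under the term, 3 under the translation, 2 shared.
term_ext = {"d1", "d2", "d3", "d4"}
trans_ext = {"d3", "d4", "d5"}

p, r = precision_recall(term_ext, trans_ext)
print(p, r, loss(p, r))
```

With alpha = 1/2 the result equals 1 - 2·2/(4 + 3) = 3/7, so the extension-based and the precision/recall-based forms of the loss agree on this example.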

Estimating Term Extension Intervals


Intersections
– |Ext(Expr1) ∩ Ext(Expr2)|.low = 0
– |Ext(Expr1) ∩ Ext(Expr2)|.high = min(|Ext(Expr1)|.high, |Ext(Expr2)|.high)

Unions
– |Ext(Expr1) ∪ Ext(Expr2)|.low = max(|Ext(Expr1)|.low, |Ext(Expr2)|.low)
– |Ext(Expr1) ∪ Ext(Expr2)|.high = |Ext(Expr1)|.high + |Ext(Expr2)|.high

Term
– |Ext(Term)|.high = |Ext(Term)|.low = |Ext(Term)|
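These three rules compose, so the size interval of any expression built from terms, intersections and unions can be computed bottom-up. A minimal sketch follows; the `Interval` class, the helper names and the extension sizes are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    low: int
    high: int


def term(size: int) -> Interval:
    """A stored term has a known extension size, so low == high."""
    return Interval(size, size)


def intersection(a: Interval, b: Interval) -> Interval:
    """The operands may be disjoint (0) or one may contain the other (min of highs)."""
    return Interval(0, min(a.high, b.high))


def union(a: Interval, b: Interval) -> Interval:
    """One operand may contain the other (max of lows) or they may be disjoint (sum of highs)."""
    return Interval(max(a.low, b.low), a.high + b.high)


# Made-up extension sizes.
book, journal, proceedings = term(120), term(40), term(25)

periodicals = union(journal, proceedings)
everything = union(book, periodicals)
overlap = intersection(book, everything)
print(periodicals, everything, overlap)
```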

Estimating Intervals of Information Loss


Intervals of Precision and Recall
– Precision.high, Precision.low
– Recall.high, Recall.low

Leads to Intervals of Information Loss
– Loss.low = 1 - 1 / ((1/2)(1/Precision.high) + (1/2)(1/Recall.high))
– Loss.high = 1 - 1 / ((1/2)(1/Precision.low) + (1/2)(1/Recall.low))
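A short numeric sketch of how the precision and recall bounds turn into a loss interval: the best case for loss uses the high bounds, the worst case the low bounds. The function names and the input bounds are invented, and mapping a zero bound to a loss of 1 is an assumption of the sketch.

```python
def harmonic_loss(precision: float, recall: float) -> float:
    """1 - 1 / ((1/2)(1/Precision) + (1/2)(1/Recall))."""
    if precision == 0 or recall == 0:
        return 1.0  # assumption: a zero bound means the loss may be total
    return 1 - 1 / (0.5 / precision + 0.5 / recall)


def loss_interval(p_low: float, p_high: float, r_low: float, r_high: float) -> tuple:
    """(Loss.low, Loss.high) from the precision and recall intervals."""
    return harmonic_loss(p_high, r_high), harmonic_loss(p_low, r_low)


# Invented bounds: precision in [0.5, 1.0], recall in [0.25, 0.8].
low, high = loss_interval(0.5, 1.0, 0.25, 0.8)
print(low, high)
```

Here Loss.low = 1 - 1/1.125 = 1/9 and Loss.high = 1 - 1/3 = 2/3. When a low bound is 0, as in the case without semantic relationships, Loss.high becomes 1.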

Comparison of two translations


Consider two translations:
– Trans1 with bounds low1 and high1
– Trans2 with bounds low2 and high2

Choosing the appropriate translation
– Compute mLoss_i = (low_i + high_i)/2
– if mLoss1 < mLoss2, choose Trans1
– if mLoss2 < mLoss1, choose Trans2
– if mLoss1 = mLoss2, choose the translation with the narrower interval (high_i - low_i)

Need for probabilistic models
– Let (low1, high1) = (10%, 80%) and (low2, high2) = (20%, 60%)
– mLoss2 (40%) < mLoss1 (45%) => Trans2 is chosen
– However, there are cases in which Trans1 incurs a lower loss (10% - 20%)!
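The selection rule can be written as a single comparison on the pair (midpoint, interval width). The sketch below uses the bounds from the slide's example, expressed as integer percentages to keep the arithmetic exact; the function name `choose_translation` is hypothetical.

```python
def choose_translation(bounds: dict) -> str:
    """Pick the translation with the smallest mid-point loss; break ties by narrower interval.

    `bounds` maps a translation name to its (low, high) loss interval in percent.
    """
    def rank(item):
        _name, (low, high) = item
        return ((low + high) / 2, high - low)

    return min(bounds.items(), key=rank)[0]


# The example from the slide: mLoss1 = 45, mLoss2 = 40.
chosen = choose_translation({"Trans1": (10, 80), "Trans2": (20, 60)})
print(chosen)

# A tie on the mid-point (30 vs 30) is resolved by interval width (40 vs 20).
tie = choose_translation({"Trans1": (10, 50), "Trans2": (20, 40)})
print(tie)
```

The rule only looks at the interval end points, which is why it cannot account for the cases where Trans1 actually loses just 10% to 20%; that gap is the motivation for a probabilistic model.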

Semantic Adaptation of Precision and Recall


Term subsumes Translation
– Ext(Translation) ⊆ Ext(Term) ⇒ Ext(Term) ∩ Ext(Translation) = Ext(Translation)
– Precision = 1
– Recall = |Ext(Translation)| / |Ext(Term)|

However: Term and Translation belong to different ontologies
– Ext(Term) = Ext(Term) ∪ Ext(Translation)
– Recall.low = |Ext(Translation)|.low / (|Ext(Translation)|.low + |Ext(Term)|)
– Recall.high = |Ext(Translation)|.high / max(|Ext(Translation)|.high, |Ext(Term)|)

Need to evolve a common framework for relating subsumption and information loss
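The recall bounds for the cross-ontology case follow from the union rule: the enlarged extension of the term is largest when the two extensions are disjoint (sum) and smallest when one contains the other (max). A small sketch with invented sizes, assuming the term's own extension size is known exactly and is positive:

```python
def recall_bounds(term_size: int, trans_low: int, trans_high: int) -> tuple:
    """(Recall.low, Recall.high) when the term subsumes a translation from another ontology.

    Precision is 1 in this case, so only recall carries uncertainty.
    """
    recall_low = trans_low / (trans_low + term_size)        # extensions disjoint
    recall_high = trans_high / max(trans_high, term_size)   # one extension contains the other
    return recall_low, recall_high


# Made-up sizes: 100 objects under the term, between 30 and 150 under the translation.
low, high = recall_bounds(100, 30, 150)
print(low, high)
```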

Semantic Adaptation of Precision and Recall


Translation subsumes Term
– Analogous (dual?) to the previous case
– Recall = 1
– Precision = |Ext(Term)| / |Ext(Translation)|

Cases of no Information Loss
– Translation of a term by the intersection of its immediate parents, which is also its definition
– Translation of a term by the union of its immediate children, if there exists a “covering” relationship between the two

Need for “extensional” inter-ontological relationships
– e.g., 20% of publications are 50% of books
– characterizing degree of overlap

Computation of Precision and Recall in the absence of Semantic Relationships


Precision
– Precision.low = 0
– Precision.high = max[ min(|Ext(Term)|, |Ext(Translation)|.high) / |Ext(Translation)|.high, min(|Ext(Term)|, |Ext(Translation)|.low) / |Ext(Translation)|.low ]

Recall
– Recall.low = 0
– Recall.high = min(|Ext(Term)|, |Ext(Translation)|.high) / |Ext(Term)|
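When nothing is known about how the two terms relate, only the sizes constrain the overlap. The sketch below evaluates the upper bounds for a few invented sizes; the function names are illustrative, the term's extension is assumed non-empty, and skipping a translation bound of 0 (to avoid dividing by zero) is an assumption of this sketch.

```python
def precision_high(term_size: int, trans_low: int, trans_high: int) -> float:
    """Best-case precision: evaluate the overlap ratio at both ends of the translation interval."""
    ratios = [min(term_size, bound) / bound for bound in (trans_low, trans_high) if bound > 0]
    return max(ratios, default=0.0)


def recall_high(term_size: int, trans_high: int) -> float:
    """Best-case recall: the translation covers as much of the term as its size allows."""
    return min(term_size, trans_high) / term_size


# Made-up sizes: 100 objects under the term.
print(precision_high(100, 200, 400), recall_high(100, 400))  # translation larger than the term
print(precision_high(100, 20, 60), recall_high(100, 60))     # translation smaller than the term
```

The lower bounds stay at 0 in both cases, since without a semantic relationship the two extensions may be disjoint.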

Choosing an optimal translation: Local v/s Global Decision Making
[Figure: ontology fragments relating Document, Publication, Journal and Book, with the candidate translations annotated by LOSS(Document, Book), LOSS(Publication, Journal), LOSS(Document, Publication) and LOSS(Journal, Book).]



Local Decision Making
– LOSS(Publication, Journal) > LOSS(Document, Publication)
– Document is chosen as the translation
– But LOSS(Book, Document) > LOSS(Book, Journal) !!

Global Decision Making
– Both translations {Document, Journal} are passed on to the next level
– Journal is chosen as the appropriate translation
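The contrast can be reproduced with any loss values that satisfy the two inequalities on this slide. The numbers below are invented solely for that purpose and do not come from the presentation; only the orderings matter.

```python
# Hypothetical losses measured at the intermediate step, relative to Publication.
step_loss = {
    "Document": 0.2,  # stands for LOSS(Document, Publication)
    "Journal": 0.3,   # stands for LOSS(Publication, Journal)
}

# Hypothetical losses measured against the original term Book.
end_to_end_loss = {
    "Document": 0.6,  # stands for LOSS(Book, Document)
    "Journal": 0.4,   # stands for LOSS(Book, Journal)
}

# Local decision: commit to the best candidate at the intermediate step.
local_choice = min(step_loss, key=step_loss.get)

# Global decision: keep both candidates and compare them against the original term.
global_choice = min(end_to_end_loss, key=end_to_end_loss.get)

print(local_choice, global_choice)
```

Deferring the choice costs more work, because every candidate is carried to the next level, but it avoids discarding the translation that is best for the original term.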

Conclusions


Analysis of the Semantic Web Technology Space
– Proposed a layered approach for analysis
– Identified components of the Semantic Web Fabric

Re-use of pre-existing real world ontologies (“off the shelf”)
– Mapping the ontologies to structured and text databases
– Mechanisms for translation of queries across different ontologies
– Approach for adaptation of information loss based on semantic relationships
– Loss of information measures to determine the semantic appropriateness of a particular ontology and translation



The future Semantic Web will be based on browsing domain specific ontologies and vocabularies
– Need to provide critical underlying infrastructure based on the above

Future Work


Extensions to current work
– Information Extraction from Textual Data
– Evolve a common framework to relate subsumption with loss of information
– Explore relationships with standards such as SQL, XML/RDF based QLs, DAML+OIL
– Complex probabilistic modeling for ranking translations
– Experimentation and Validation of measures for Loss of Information

Bootstrapping, Creation, Validation of Semantic Knowledge:
– Ongoing work in collaboration with Stanford University and University of Georgia (NSF ITR Proposal)
  – Use of statistical clustering to determine central terms
  – Use of consensus analysis across SMEs to enrich terminology and create ontology
  – Use of scalable knowledge composition to re-use existing knowledge and support ontology interoperation
  – Use of IScapes to specify and validate hypotheses and feedback from the process to generate new semantic knowledge
– Interaction of above processes
– Ontology Maintenance and Versioning


								