Transitioning Relational Databases to Ontologies
Farid Cerbah Dassault Aviation farid.cerbah@dassault-aviation.fr
ESWC'08 - Tutorial 3 - Transitioning Applications to Ontologies - June, 1st - Tenerife
Outline
  Problem statement
  Previous work
  The RDBToOnto tool and the RTAXON method
  Improving the process through database optimisation
  A case study in aircraft maintenance
  Extending RDBToOnto
  Conclusion
Problem statement
Relational databases are valuable heterogeneous sources for ontology learning
Better accuracy can be expected than from text corpora
Ontology learning from relational databases is not a new research issue
Limitations of existing support
  Problem often restricted to finding automated ways to import “tables” into ontologies
  Derivation of ontologies with flat structures that look like the source databases
Our contribution
RDBToOnto Platform
  Comprehensive software support for learning fine-tuned ontologies
  A framework that eases the development of, and experimentation with, transitioning methods
RTAXON Method
  Uncovers taxonomies hidden in the data
A motivating example
Typical mappings covered by several methods
Specific to RTAXON
Previous work (1)
RDB -> Ontology Transformation
Database Reverse Engineering
  Many transformation rules from this domain are reused for ontology learning [Behm et al. 1997], [Ramanathan & Hodges 1997], …
  Approaches mostly based on an analysis of the RDB schema
  Data correlations are considered, but with the restriction "Data ≡ Key Values"
    Key inclusion may express inheritance
Exploiting null values semantics [Lammari et al. 2007]
  Partitioning of a table on the basis of null values may reveal concept hierarchies
  Involves data from non-key attributes
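The null-value idea can be illustrated with a minimal sketch. It groups the rows of a table by the set of attributes that are filled in, so that each distinct pattern becomes a candidate sub-concept. This is only an illustration of the principle, not the algorithm of [Lammari et al. 2007]; the class name, the helper methods and the sample rows are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class NullPartitionSketch {

    // Groups row indices by the set of attributes that are filled (non-null).
    // Each distinct set is a candidate sub-concept of the table's concept.
    static Map<Set<String>, List<Integer>> partition(List<Map<String, String>> rows) {
        Map<Set<String>, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < rows.size(); i++) {
            Set<String> filled = new TreeSet<>();
            for (Map.Entry<String, String> cell : rows.get(i).entrySet()) {
                if (cell.getValue() != null) {
                    filled.add(cell.getKey());
                }
            }
            groups.computeIfAbsent(filled, k -> new ArrayList<>()).add(i);
        }
        return groups;
    }

    // Builds a row from alternating column names and values (values may be null).
    static Map<String, String> row(String... namesAndValues) {
        Map<String, String> r = new HashMap<>();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            r.put(namesAndValues[i], namesAndValues[i + 1]);
        }
        return r;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, invented for this illustration.
        List<Map<String, String>> persons = new ArrayList<>();
        persons.add(row("name", "Ann", "salary", "3000", "school", null));
        persons.add(row("name", "Bob", "salary", null, "school", "MIT"));
        persons.add(row("name", "Eve", "salary", "4100", "school", null));
        // Two null patterns appear: {name, salary} and {name, school}.
        System.out.println(partition(persons));
    }
}
```

In this toy table the two patterns could suggest two subclasses of a person concept; whether a pattern is meaningful remains a modelling decision.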
Previous work (2)
Mapping languages and tools
D2RQ
  RDB to OWL/RDF mapping
  Ontology-based access to relational databases
  Rewriting SPARQL queries into SQL
Relational.OWL
  A minimal ontology of "tables" and "columns" and a processor to populate this ontology with data from relational databases
  Can be used to exchange data between databases
Triplify
  Plugin for web applications
  Converts the result of SQL queries into RDF
KAON Reverse
  Software support to interactively map an RDB schema to a predefined ontology
DataMaster
  Protégé plugin to import table data into ontologies
RDBToOnto
A user-oriented tool with a full-fledged user interface
Supports the whole process, from data access to ontology generation
Includes the RTAXON converter
The process is largely automated, but local constraints can be added interactively to refine the ontologies progressively
Types of local constraints
  Table and column exclusion
  Naming patterns for classes and instances
  Categorisation patterns
The RTAXON method
Major improvement over existing methods
  Further refine the classes derived from the schema with subclasses found in the content of the relations
  Focus on reliable categorisation patterns
Access Zones (X 516)
  A/C   Codes   Description                        Type
  F7X   2103    nose cone                          DOOR
  F7X   281FL   windshield retainers               PANEL
  F7X   300ZZ   umbrella access panel              PANEL
  F7X   243DF   No.1 servicing compartment floor   FLOOR
  F7X   342EZ   No.1 rear under pylon fairing      FAIRG
Categorising attribute: Type
Derived classes: Access Zone, with subclasses Floor, Door, Panel and Fairing
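Once a categorising attribute is known, deriving subclasses amounts to grouping rows by its values. The following is a minimal sketch under that assumption, not the RTAXON implementation. The class and method names are hypothetical, and the pairing of zone codes with Type values assumes that the rows of the Access Zones example are listed in the same order in both columns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CategorisationSketch {

    // Maps each distinct value of the categorising column to the identifiers
    // of the rows carrying it. Each key is a candidate subclass; each list
    // holds the instances that would populate it.
    static Map<String, List<String>> subclasses(List<String[]> rows, int idCol, int categCol) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String[] row : rows) {
            result.computeIfAbsent(row[categCol], k -> new ArrayList<>()).add(row[idCol]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Columns: zone code, Type (row pairing assumed from the slide order).
        List<String[]> accessZones = new ArrayList<>();
        accessZones.add(new String[] {"2103", "DOOR"});
        accessZones.add(new String[] {"281FL", "PANEL"});
        accessZones.add(new String[] {"300ZZ", "PANEL"});
        accessZones.add(new String[] {"243DF", "FLOOR"});
        accessZones.add(new String[] {"342EZ", "FAIRG"});
        // Four distinct Type values, hence four candidate subclasses.
        System.out.println(subclasses(accessZones, 0, 1));
    }
}
```

Turning a raw value such as FAIRG into a readable class name such as Fairing is a separate step, which the naming patterns mentioned earlier are meant to cover.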
Two sources involved in the identification of categorising attributes
  Attribute names
    Revealed by lexical clues
  Redundancy in attribute extensions
    Entropy-based approach to find good profiles
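To make the notion of redundancy concrete, the sketch below computes the Shannon entropy of a column's value distribution. A column of all-distinct values reaches the maximum, log2(n), while repeated values lower the entropy. How RTAXON uses entropy to define a "good profile" is not stated on the slides, so no selection threshold is shown; the class name is hypothetical and the values are those of the Access Zones example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EntropySketch {

    // Shannon entropy (in bits) of the value distribution of one column.
    static double entropy(List<String> values) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / values.size();
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    public static void main(String[] args) {
        List<String> type = Arrays.asList("DOOR", "PANEL", "PANEL", "FLOOR", "FAIRG");
        List<String> codes = Arrays.asList("2103", "281FL", "300ZZ", "243DF", "342EZ");
        // Type has a repeated value, so its entropy is lower than that of
        // the all-distinct Codes column.
        System.out.println("Type:  " + entropy(type));
        System.out.println("Codes: " + entropy(codes));
    }
}
```

On a five-row sample the difference is small; over the full extension of a table, a categorising column with a handful of values stands out much more clearly from key-like columns.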
Formal definition of RTAXON
Demo
Optimising the source databases
Another key improvement is the inclusion of a database optimisation step
  Many input databases suffer from data duplication problems
  Optimisation -> eliminate data duplication through the processing of inclusion dependencies
Data duplication (before optimisation)
WorkPackages (X 82)
  WP Number  WP Title                               Company Code  Company Name
  33         Hydraulic Power                        F0086         Parker
  34         Landing Gears                          F564          Messier-Dowty
  34A        Landing Gear Emergency Control System  F0214         Dassault-Aviation
  35         Wheels, Brakes and Braking             B453          ABS
Inclusion dependency
  WorkPackages[Company Code, Company Names] ⊆ Companies[Cage Code, Name]
After optimisation
WorkPackages (X 82)
  WP Number  WP Title                               Company Code
  33         Hydraulic Power                        F0086
  34         Landing Gears                          F564
  34A        Landing Gear Emergency Control System  F0214
  35         Wheels, Brakes and Braking             B453
Companies (shown as X 106 and X 105 in the two views of the slide)
  Cage_Code (PKEY)  Name
  F0086             Parker
  F564              Messier-Dowty
  F0214             Dassault-Aviation
Foreign key relationship
  Between WorkPackages[Company Code] and Companies[Name]
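Checking a given inclusion dependency can be sketched as a set-containment test on projected tuples. The code below is an illustration only, not the RDBToOnto or LATINO implementation; the class and method names are hypothetical, and the rows are the few sample rows readable on the slide, paired in the order in which they appear.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class InclusionDependencySketch {

    // Projects a row onto the given column positions.
    static List<String> project(String[] row, int[] cols) {
        List<String> tuple = new ArrayList<>();
        for (int c : cols) {
            tuple.add(row[c]);
        }
        return tuple;
    }

    // Returns the projected tuples of lhs that have no match in rhs.
    // An empty result means lhs[lhsCols] is included in rhs[rhsCols].
    static List<List<String>> violations(List<String[]> lhs, int[] lhsCols,
                                         List<String[]> rhs, int[] rhsCols) {
        Set<List<String>> known = new HashSet<>();
        for (String[] row : rhs) {
            known.add(project(row, rhsCols));
        }
        List<List<String>> missing = new ArrayList<>();
        for (String[] row : lhs) {
            List<String> tuple = project(row, lhsCols);
            if (!known.contains(tuple)) {
                missing.add(tuple);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Columns: WP Number, Company Code, Company Name.
        List<String[]> workPackages = Arrays.asList(
            new String[] {"33", "F0086", "Parker"},
            new String[] {"34", "F564", "Messier-Dowty"},
            new String[] {"34A", "F0214", "Dassault-Aviation"},
            new String[] {"35", "B453", "ABS"});
        // Columns: Cage_Code, Name (only the three rows visible on the slide).
        List<String[]> companies = Arrays.asList(
            new String[] {"F0086", "Parker"},
            new String[] {"F564", "Messier-Dowty"},
            new String[] {"F0214", "Dassault-Aviation"});
        // B453/ABS is absent from the three sample Companies rows, so it is reported.
        System.out.println(violations(workPackages, new int[] {1, 2},
                                      companies, new int[] {0, 1}));
    }
}
```

Reporting the unmatched tuples rather than a plain yes/no answer lets a user decide whether a near-inclusion should still be treated as a dependency, for instance by adding the missing rows to the referenced table.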
Effect of inclusion dependency processing
Processing inclusion dependencies yields more inter-class relations (i.e. object properties).
Without ID identification
With ID identification
Identification of inclusion dependencies
RDBToOnto includes an editor to interactively define inclusion dependencies
Automated identification of inclusion dependencies
  A data mining approach
  Based on LATINO
    See the presentation on ontology learning by Miha Grčar (JSI) in this tutorial
  Dependencies discovered by LATINO are exported to RDBToOnto and can be validated in the ID editor
Mining inclusion dependencies with LATINO
A case study in aircraft maintenance
KCIT (GATE-based annotator)
RDBToOnto + LATINO
Radiant
OWLIM
The ontology acquisition process
The legacy data
LSA database: a heterogeneous relational database that gathers all information related to maintenance activity
  Required logistic resources
  Aircraft parts (Product tree)
  Scheduling data
Standards: Documents including widely shared conceptual models
The ontology acquisition process
A multi-step transitioning process that favours modular design
Model Bootstrapping + Ontology Normalisation
[Figure. Labels: Reusable Ontologies (MSG-3, SNS/ATA, FOAF), imports, ATA, Model Bootstrapping, Ontology Normalisation, Legacy Data, Ontology Learning Tools, OWLIM/HKS Repository]
The defined RDBToOnto conversion project
75 constraints
Mostly naming patterns and inclusion dependencies
Resulting ontology
  Ontology model
    115 classes, 334 datatypes, 54 object properties
  Population
    49617 class instances, 51449 object property instances
No constraints for categorisation
  The ten hierarchies discovered by RTAXON are relevant
  Good behaviour when faced with categorisation conflicts
The generated class hierarchy
Identified object properties
RDBToOnto extension capabilities
RDBToOnto is a user-oriented tool but it is also a framework
  Written in Java
  OWL as target language (exploiting Jena 2.5 API)
Two types of components can be added
  Database readers to cover more database formats
  Converters to implement new learning methods
    New converters can have their own global options, local constraints and GUI
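Structure of RDBToOnto
[Class diagram: a DBReader produces a Database, which an RDBToOntoConverter takes as input]
DBReader
  Database getDatabase()
  Table ReadData(String name)
  …
  Subclasses: MSAccessReader, DB2Reader
RDBToOntoConverter
  OntModel Convert(Database db)
  OntClass CreateClass(TableDef)
  …
  Subclasses: RTAXON, BasicConverter
Both hierarchies can be extended by the users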
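The two extension points can be pictured with a small stand-alone sketch. It does not use the real RDBToOnto classes or Jena: the interfaces, the in-memory reader and the converter below are hypothetical stand-ins, and the ontology is reduced to a list of class names so that the example runs without external libraries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExtensionSketch {

    // Stand-in for the neutral database model: here only table names.
    static class Database {
        final List<String> tableNames;
        Database(List<String> tableNames) { this.tableNames = tableNames; }
    }

    // Extension point 1: a reader produces the neutral model from some source.
    interface Reader {
        Database getDatabase();
    }

    // Extension point 2: a converter turns the neutral model into an ontology,
    // reduced here to a list of class names instead of a Jena OntModel.
    interface Converter {
        List<String> convert(Database db);
    }

    // Hypothetical reader over an in-memory list, standing in for a real source.
    static class InMemoryReader implements Reader {
        private final List<String> names;
        InMemoryReader(String... names) { this.names = Arrays.asList(names); }
        public Database getDatabase() { return new Database(names); }
    }

    // Hypothetical converter: one class per table (a flat mapping).
    static class TableToClassConverter implements Converter {
        public List<String> convert(Database db) {
            List<String> classes = new ArrayList<>();
            for (String table : db.tableNames) {
                classes.add("Class:" + table);
            }
            return classes;
        }
    }

    public static void main(String[] args) {
        Reader reader = new InMemoryReader("WorkPackages", "Companies");
        Converter converter = new TableToClassConverter();
        System.out.println(converter.convert(reader.getDatabase()));
    }
}
```

Because readers and converters only meet through the shared model, a new database format and a new learning method can be added independently of each other.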
The neutral database model
[Class diagram. Elements: Database, DBSchema, Table, Column, Values, TableDef, Attribute, friendlyNames (String), Key, PrimaryKey, ForeignKey; several associations are one-to-many (*)]
Input to any converter
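One plausible reading of the diagram can be written down as plain Java classes. The associations are not recoverable from the slide text, so the nesting below (schema-level table definitions with attributes and keys, data-level tables with column values) is an assumption, and none of these classes reproduces the actual RDBToOnto source.

```java
import java.util.ArrayList;
import java.util.List;

class NeutralModelSketch {

    static class Attribute {
        final String name;
        Attribute(String name) { this.name = name; }
    }

    // A key lists the attributes it is made of; the two subclasses mirror the diagram.
    static class Key {
        final List<String> attributeNames = new ArrayList<>();
    }
    static class PrimaryKey extends Key { }
    static class ForeignKey extends Key {
        String referencedTable; // assumption: this field is not visible in the slide text
    }

    // Schema-level description of one table.
    static class TableDef {
        final String name;
        final List<Attribute> attributes = new ArrayList<>();
        final List<Key> keys = new ArrayList<>();
        TableDef(String name) { this.name = name; }
    }

    // Data-level content of one table: one list of values per column
    // (values are assumed to be handled as strings).
    static class Table {
        final TableDef def;
        final List<List<String>> columns = new ArrayList<>();
        Table(TableDef def) { this.def = def; }
    }

    // Schema and data together: the structure a converter would receive.
    static class Database {
        final List<TableDef> schema = new ArrayList<>();
        final List<Table> tables = new ArrayList<>();
    }

    // Builds a one-table example with a single-attribute primary key.
    static Database example() {
        TableDef def = new TableDef("Companies");
        def.attributes.add(new Attribute("Cage_Code"));
        def.attributes.add(new Attribute("Name"));
        PrimaryKey pk = new PrimaryKey();
        pk.attributeNames.add("Cage_Code");
        def.keys.add(pk);
        Database db = new Database();
        db.schema.add(def);
        db.tables.add(new Table(def));
        return db;
    }

    public static void main(String[] args) {
        Database db = example();
        System.out.println(db.schema.get(0).name + " has "
            + db.schema.get(0).attributes.size() + " attributes");
    }
}
```

Keeping the schema and the data in one format-independent structure is what allows a converter to ignore which reader produced its input.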
Conclusion
We presented substantial support for transitioning relational databases to ontologies
RDBToOnto and the RTAXON method have been evaluated on significant databases
RTAXON is just a first step, as many extensions can be studied
  Learning two-level hierarchies
  Automatically generating local constraints (e.g. naming patterns)
More resources are available on the TAO project web site, including
  User Guide and demos
  Development Guide
  A fully implemented sample showing how to extend the tool