              Transitioning Relational Databases to Ontologies

                                  Farid Cerbah
                                Dassault Aviation

                    farid.cerbah@dassault-aviation.fr




ESWC'08 - Tutorial 3 - Transitioning Applications to Ontologies - June, 1st - Tenerife
                                                                                 Outline

 Problem statement
 Previous work
 The RDBToOnto tool and the RTAXON method
 Improving the process through database
  optimisation
 A case study in aircraft maintenance
 Extending RDBToOnto
 Conclusion




                                                             Problem statement

 Relational databases are valuable heterogeneous sources for
  ontology learning
    Better accuracy can be expected than from text corpora


 Ontology learning from relational databases is not a new
  research issue

 Limitations of existing support
   Problem often restricted to finding automated ways to
    import “tables” into ontologies

   Derivation of ontologies with flat structure that look
    like the source databases


                                                      Our contribution

 RDBToOnto Platform
 Comprehensive software support for learning
  fine-tuned ontologies

 A framework that eases the development of, and
  experimentation with, transitioning methods

 RTAXON Method
 Finds taxonomies hidden in the data




                                           A motivating example


 [Figure: example of database-to-ontology mappings, annotated "Typical mappings covered by several methods" and "Specific to RTAXON"]




                                                           Previous work (1)
 RDB -> Ontology Transformation
   Database Reverse Engineering
     Many transformation rules from this domain are reused for
      ontology learning
      [Behm et al. 1997], [Ramanathan & Hodges 1997], …

  Approaches mostly based on an analysis of the RDB schema

  Data correlations are considered but
    with the restriction "Data ≡ Key Values"
     Key inclusion may express inheritance

  Exploiting null values semantics [Lammari et al. 2007]
      Partitioning of a table on the basis of null values may
      reveal concept hierarchies
      Involves data from non-key attributes
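
The key-inclusion heuristic mentioned above can be illustrated with a minimal Python sketch. It assumes each table is reduced to the set of its primary key values. The function name `key_inclusion_candidates` and the toy data are hypothetical, and the cited methods are not implemented here.

```python
def key_inclusion_candidates(primary_keys):
    """Return (sub, super) table pairs where the keys of `sub` are a
    proper subset of the keys of `super`.

    primary_keys maps a table name to the set of its primary key values.
    Such pairs are only candidates for an inheritance link.
    """
    pairs = []
    for sub, sub_keys in primary_keys.items():
        for sup, sup_keys in primary_keys.items():
            # `<` on sets tests for a proper subset
            if sub != sup and sub_keys and sub_keys < sup_keys:
                pairs.append((sub, sup))
    return sorted(pairs)


# Hypothetical toy data: every Panel key is also an AccessZone key.
keys = {
    "AccessZone": {"2103", "281FL", "300ZZ", "243DF"},
    "Panel": {"281FL", "300ZZ"},
    "Company": {"F0086", "F564"},
}
candidates = key_inclusion_candidates(keys)
print(candidates)
```

A reported pair would still need validation, since key inclusion may also arise by coincidence.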



                                                            Previous work (2)
 Mapping languages and tools
  D2RQ
     RDB to OWL/RDF mapping
     Ontology-based access to relational databases
     Rewriting SPARQL queries into SQL
   Relational.OWL
      A minimal ontology of 'tables' and 'columns' and a processor to
       populate this ontology with data from relational databases
      Can be used to exchange data between databases
  Triplify
     Plugin for web applications
     Converts the result of SQL queries into RDF
  KAON Reverse
     Software support to interactively map an RDB schema to a
      predefined ontology
  DataMaster
    Protégé Plugin to import table data into ontologies

              RDBToOnto
 A user-oriented tool with a full-fledged
  user interface

 Supports the full process, from data
  access to ontology generation

 Includes the RTAXON converter



 Though automated to a large extent, local constraints can be
  interactively included to progressively refine the ontologies

 Types of local constraints
        Table and column exclusion
        Naming patterns for classes and instances
        Categorisation patterns


                                  The RTAXON method
  Major improvement over existing methods
        Further refine the classes derived from the schema with
         subclasses found in the content of the relations
        Focus on reliable categorisation patterns

                Access Zones (X 516)
 A/C    Codes    Description                        Type (categorising attribute)
 F7X    2103     nose cone                          DOOR
 F7X    281FL    windshield retainers               PANEL
 F7X    300ZZ    umbrella access panel No.1         PANEL
 F7X    243DF    servicing compartment floor No.1   FLOOR
 F7X    342EZ    rear under pylon fairing           FAIRG

 Derived class hierarchy: Access Zone, with subclasses Floor, Door, Panel, Fairing



 Two sources involved in the identification of categ. attributes
          Attribute names
                  Revealed by lexical clues
          Redundancy in attribute extensions
                  Entropy-based approach to find good profiles
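
The two sources of evidence listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the formal RTAXON definition: the clue list `LEXICAL_CLUES`, the entropy band (`low`, `high`) and all function names are hypothetical. The sample rows are taken from the Access Zones table.

```python
import math
from collections import Counter

# Assumed clue list; the slides do not enumerate the lexical clues used.
LEXICAL_CLUES = ("type", "category", "kind", "class")


def normalised_entropy(values):
    """Shannon entropy of the value distribution divided by log2(row count).

    Close to 0 for a constant column, close to 1 when all values differ.
    """
    n = len(values)
    if n < 2:
        return 0.0
    counts = Counter(values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy / math.log2(n)


def has_lexical_clue(column_name):
    lowered = column_name.lower()
    return any(clue in lowered for clue in LEXICAL_CLUES)


def rank_categorising_attributes(rows, low=0.05, high=0.95):
    """Keep columns whose values are redundant but not constant,
    listing columns with a lexical clue in their name first."""
    candidates = []
    for column in rows[0]:
        score = normalised_entropy([row[column] for row in rows])
        if low < score < high:
            candidates.append((column, round(score, 3), has_lexical_clue(column)))
    return sorted(candidates, key=lambda c: (not c[2], c[1]))


rows = [
    {"A/C": "F7X", "Codes": "2103", "Description": "nose cone", "Type": "DOOR"},
    {"A/C": "F7X", "Codes": "281FL", "Description": "windshield retainers", "Type": "PANEL"},
    {"A/C": "F7X", "Codes": "300ZZ", "Description": "umbrella access panel No.1", "Type": "PANEL"},
    {"A/C": "F7X", "Codes": "243DF", "Description": "servicing compartment floor No.1", "Type": "FLOOR"},
    {"A/C": "F7X", "Codes": "342EZ", "Description": "rear under pylon fairing", "Type": "FAIRG"},
]
ranked = rank_categorising_attributes(rows)
best = ranked[0][0]
# Each distinct value of the selected attribute becomes a candidate subclass.
subclasses = sorted({row[best] for row in rows})
print(ranked, subclasses)
```

On this sample only the Type column survives: A/C is constant, while Codes and Description are unique per row and therefore carry no redundancy.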

                                                    Optimising the source databases
 Another key improvement is the inclusion of a database optimisation step
 Many input databases suffer from data duplication problems
 Optimisation -> eliminate data duplication through the processing of
  inclusion dependencies

 Before optimisation: WorkPackages (X 82)
 WP Number   WP Title                                Company Code   Company Name
 33          Hydraulic Power                         F0086          Parker
 34          Landing Gears                           F564           Messier-Dowty
 34A         Landing Gear Emergency Control System   F0214          Dassault-Aviation
 35          Wheels, Brakes and Braking              B453           ABS

 Data duplication (inclusion dependency):
   WorkPackages[CompanyCode, Company Names] ⊆ Companies[Cage Code, Name]

 Companies (X 106 or X 105; both counts appear on the slide)
 Cage_Code (PKEY)   Name
 F0086              Parker
 F564               Messier-Dowty
 F0214              Dassault-Aviation

 After optimisation: WorkPackages (X 82)
 WP Number   WP Title                                Company Code
 33          Hydraulic Power                         F0086
 34          Landing Gears                           F564
 34A         Landing Gear Emergency Control System   F0214
 35          Wheels, Brakes and Braking              B453

 Foreign key relationship between WorkPackages[CompanyCode] and Companies[Name]
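
The optimisation step can be sketched in Python as follows, assuming tables are held as lists of dictionaries. The helper names `holds_inclusion_dependency` and `drop_columns` are hypothetical and do not come from RDBToOnto. The sample uses only the three rows that appear in both tables on the slide.

```python
def holds_inclusion_dependency(src_rows, src_cols, dst_rows, dst_cols):
    """True if every projection of src_rows on src_cols also occurs
    as a projection of dst_rows on dst_cols."""
    targets = {tuple(row[c] for c in dst_cols) for row in dst_rows}
    return all(tuple(row[c] for c in src_cols) in targets for row in src_rows)


def drop_columns(rows, columns):
    """Return copies of the rows without the given columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


work_packages = [
    {"WP Number": "33", "WP Title": "Hydraulic Power",
     "Company Code": "F0086", "Company Name": "Parker"},
    {"WP Number": "34", "WP Title": "Landing Gears",
     "Company Code": "F564", "Company Name": "Messier-Dowty"},
    {"WP Number": "34A", "WP Title": "Landing Gear Emergency Control System",
     "Company Code": "F0214", "Company Name": "Dassault-Aviation"},
]
companies = [
    {"Cage_Code": "F0086", "Name": "Parker"},
    {"Cage_Code": "F564", "Name": "Messier-Dowty"},
    {"Cage_Code": "F0214", "Name": "Dassault-Aviation"},
]

dependency_holds = holds_inclusion_dependency(
    work_packages, ["Company Code", "Company Name"],
    companies, ["Cage_Code", "Name"],
)
# The duplicated name column is dropped only when the dependency holds;
# the remaining code column can then act as a foreign key.
optimised = drop_columns(work_packages, {"Company Name"}) if dependency_holds else work_packages
print(dependency_holds, optimised[0])
```

With the duplicated column gone, the link between the two tables is carried by the code column alone, which is what allows it to surface as an object property rather than as a plain string value.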

            Effect of inclusion dependency processing
 Inclusion dependencies -> more inter-class relations (i.e. object properties).

 [Figure: ontology generated without ID identification, compared with the ontology generated with ID identification]




                       Identification of inclusion dependencies
 RDBToOnto includes an editor to interactively define inclusion
  dependencies




 Automated identification of inclusion dependencies
    A data mining approach based on LATINO
       See the presentation on ontology learning by Miha Grčar (JSI) in this tutorial
    Dependencies discovered by LATINO are exported to RDBToOnto and can be
     validated in the ID editor
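
The slides do not describe how LATINO mines dependencies. Purely as an illustration of what automated identification has to produce, the following Python sketch is a naive baseline that tests every pair of columns from different tables for value inclusion. The function name `discover_unary_inds` and the toy data are hypothetical.

```python
from itertools import permutations


def discover_unary_inds(tables):
    """Return ((table, column), (table, column)) pairs where the values of
    the first column are all contained in the second column.

    tables maps a table name to a dict of column name -> list of values.
    Only columns from different tables are compared.
    """
    columns = [
        ((table, column), set(values))
        for table, cols in tables.items()
        for column, values in cols.items()
    ]
    found = []
    for (lhs, lhs_values), (rhs, rhs_values) in permutations(columns, 2):
        if lhs[0] != rhs[0] and lhs_values and lhs_values <= rhs_values:
            found.append((lhs, rhs))
    return found


# Toy data loosely based on the WorkPackages / Companies example.
tables = {
    "WorkPackages": {
        "WP Number": ["33", "34", "34A"],
        "Company Code": ["F0086", "F564", "F0214"],
    },
    "Companies": {
        "Cage_Code": ["F0086", "F564", "F0214", "B453"],
        "Name": ["Parker", "Messier-Dowty", "Dassault-Aviation", "ABS"],
    },
}
discovered = discover_unary_inds(tables)
print(discovered)
```

Candidates found this way can be spurious, for example when two unrelated columns share small integer values, which is why a validation step such as the ID editor remains useful.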

Mining inclusion dependencies with LATINO




                         A case study in aircraft maintenance




 [Figure: tools involved in the case study: RDBToOnto + LATINO, KCIT (GATE-based annotator), Radiant, OWLIM]




                               The ontology acquisition process

 The legacy data
   LSA database: a heterogeneous relational database
    that gathers all information related to maintenance
    activity
     Required logistic resources
     Aircraft parts (Product tree)
     Scheduling data
   Standards: Documents including widely shared
   conceptual models


 The ontology acquisition process
   A multi-step transitioning process that favours
    modular design



                Model Bootstrapping + Ontology Normalisation


 [Figure: process diagram. Labels: Legacy Data, Ontology Learning Tools, Model Bootstrapping, ATA, Ontology Normalisation, Reusable Ontologies (MSG-3, SNS/ATA, FOAF; linked by "imports"), OWLIM/HKS Repository]




      The defined RDBToOnto conversion project


75 constraints
  Mostly naming patterns and inclusion dependencies
Resulting ontology
  Ontology model
    115 classes, 334 datatypes, 54 object properties
  Population
    49617 class instances, 51449 object property
      instances

No constraints for categorisation
  The ten hierarchies discovered by RTAXON are relevant
  Good behaviour when faced with categorisation conflicts



                      The generated class hierarchy




                              Identified object properties




                           RDBToOnto extension capabilities

 RDBToOnto is a user-oriented tool but it is also
  a framework
   Written in Java
   OWL as target language (exploiting Jena 2.5 API)



 Two types of components can be added
    Database readers to cover more database
    formats
    Converters to implement new learning methods
    New converters can have their specific global
    options, local constraints and GUI



                                            Structure of RDBToOnto

 Database (produced by readers, consumed by converters)

 DBReader                                        RDBToOntoConverter
   Database getDatabase()                          OntModel Convert(Database db)
   Table ReadData(String name)                     OntClass CreateClass(TableDef)
   …                                               …

 Implementations: MSAccessReader, DB2Reader      Implementations: RTAXON, BasicConverter

 (can be extended by the users)
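
RDBToOnto itself is written in Java, and its actual interfaces are documented in the Development Guide. The following Python sketch only mirrors the reader/converter split shown above to illustrate the extension pattern. Every class name, method name and the dictionary-based "ontology" are hypothetical stand-ins, not the tool's API.

```python
from abc import ABC, abstractmethod


class DBReader(ABC):
    """Extension point 1: turns a database source into a neutral model."""

    @abstractmethod
    def get_database(self):
        """Return the neutral model, here a dict: table name -> list of row dicts."""


class Converter(ABC):
    """Extension point 2: turns the neutral model into an ontology."""

    @abstractmethod
    def convert(self, database):
        """Return the ontology, here a dict: class name -> property names."""


class InMemoryReader(DBReader):
    """Hypothetical reader over tables that are already in memory."""

    def __init__(self, tables):
        self._tables = tables

    def get_database(self):
        return self._tables


class TableToClassConverter(Converter):
    """Hypothetical basic converter: one class per table, one property per column."""

    def convert(self, database):
        return {
            name: sorted(rows[0].keys()) if rows else []
            for name, rows in database.items()
        }


reader = InMemoryReader({"Companies": [{"Cage_Code": "F0086", "Name": "Parker"}]})
ontology = TableToClassConverter().convert(reader.get_database())
print(ontology)
```

Because converters depend only on the neutral model, a new reader for another database format can be combined with any existing converter, and the reverse holds for new converters.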

                                    The neutral database model

 [Figure: UML class diagram of the neutral database model, which is the input to any converter. Classes: DBSchema, Database, Table, Column, TableDef, Attribute, Key, PrimaryKey, ForeignKey. Other labels: friendlyNames, Values, String]




                                                                               Conclusion
 We presented substantial tool support for transitioning relational
  databases to ontologies

 RDBToOnto and the RTAXON method have been evaluated on
  significant databases

 RTAXON is just a first step as many extensions can be studied
    Learning two-level hierarchies
    Automatically generating local constraints (e.g. naming patterns)

 More resources are available on the TAO project web site, including
    User Guide and demos
    Development Guide
    A fully implemented sample showing how to extend the tool


