Mining Structured vs. Unstructured Data
Where is the structure and where did the semantics go?
SAP Labs LLC.
Why Mining works for structured data..
[Diagram: layers, top to bottom: Queries, Relational Data Model, Data]
Queries: Rich semantics are usually expressed in queries and reports, which have a priori knowledge of the data model.
Relational Data Model: For relational databases, the data model represents a combination of the data representation specification and its storage as relational data. Sometimes, views can express alternate representational models that differ from the underlying tables.
For relational data:
There is no separation of the semantic data model and the logical storage model
Both coincide in a single data model, and the data definition has limited semantics
The semantics are captured in the richness of the queries, which form well-known
associations based on expert knowledge of relationships in the data model
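The points above can be illustrated with a minimal sketch, assuming a hypothetical two-table schema (`customer`, `sales_order`) that is not part of the original slides. The table definitions carry almost no meaning on their own; the association between orders and regions exists only because the query author knows how to join them. The view shows an alternate representation that differs from the underlying tables.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical schema (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- The data definition says little about meaning: just columns and types.
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE sales_order (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);

        INSERT INTO customer VALUES (1, 'Acme', 'EMEA'), (2, 'Globex', 'APAC');
        INSERT INTO sales_order VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 70.0);

        -- The semantics ("revenue per region") live in the query, which encodes
        -- expert knowledge of how the two tables relate.
        CREATE VIEW revenue_by_region AS
            SELECT c.region AS region, SUM(o.amount) AS revenue
            FROM sales_order AS o
            JOIN customer AS c ON c.id = o.customer_id
            GROUP BY c.region;
        """
    )
    return conn


def revenue_by_region(conn: sqlite3.Connection) -> dict:
    """Read the view, an alternate representation of the underlying tables."""
    rows = conn.execute(
        "SELECT region, revenue FROM revenue_by_region ORDER BY region"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(revenue_by_region(build_demo_db()))
```

Without prior knowledge that `sales_order.customer_id` refers to `customer.id`, the join could not be written, which is the sense in which the query, rather than the schema, holds the semantics.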
SAP AG 2006, xuPA Mid-Term Strategy/ Speaker Name / 2 internal/confidential
What will it take to mine unstructured data?
Why free (text) search is not the answer..
The data has no structural model to which meaningful semantics can be applied
As a result, queries have limited semantics and are not rich enough to get the
The narrow expressiveness of ad hoc search (vs. the richness of pre-defined queries
based on known structure/semantics) limits the relevance of the output
Converting unstructured data to structured data is also not the answer..
Applying an ETL-like technique to convert data to a structured form is limiting
This does not guarantee that all the data of interest can be captured
It provides for only a single (fixed) interpretation of such unstructured data
Can overlaying a semantic model onto the data be the answer?
Extract a semantic (meta) model of interest from the unstructured data
Use the structure/semantics of this model to formulate rich search/query
E.g., techniques used when searching and comparing products
– Relevant attributes from product descriptions are extracted to form a model
– These attributes are used to formulate rich searches/queries and comparisons
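The product-comparison idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any actual SAP technique: the product descriptions, the attribute names (`ram_gb`, `ssd_gb`, `weight_kg`), and the regular expressions are all hypothetical. It contrasts a plain keyword search with a query formulated against the extracted attribute model.

```python
import re

# Hypothetical unstructured product descriptions (illustrative data only).
DESCRIPTIONS = {
    "laptop-a": "Lightweight laptop with 16 GB RAM, 512 GB SSD, weighs 1.2 kg.",
    "laptop-b": "Budget laptop, 8 GB RAM and a 256 GB SSD. Weight: 2.1 kg.",
    "laptop-c": "Workstation laptop: 32 GB RAM, 1024 GB SSD, 2.6 kg.",
}

# Assumed attribute patterns; these form the semantic (meta) model of interest.
PATTERNS = {
    "ram_gb": re.compile(r"(\d+)\s*GB\s+RAM", re.IGNORECASE),
    "ssd_gb": re.compile(r"(\d+)\s*GB\s+SSD", re.IGNORECASE),
    "weight_kg": re.compile(r"(\d+(?:\.\d+)?)\s*kg", re.IGNORECASE),
}


def extract_model(text: str) -> dict:
    """Extract the attributes of interest from one description."""
    model = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            model[name] = float(match.group(1))
    return model


def keyword_search(term: str) -> list:
    """Free-text search: matches on the presence of a word, nothing more."""
    return sorted(
        pid for pid, text in DESCRIPTIONS.items() if term.lower() in text.lower()
    )


def attribute_query(min_ram_gb: float, max_weight_kg: float) -> list:
    """Query formulated against the extracted model rather than the raw text."""
    hits = []
    for pid, text in DESCRIPTIONS.items():
        attrs = extract_model(text)
        ram = attrs.get("ram_gb", 0.0)
        weight = attrs.get("weight_kg", float("inf"))
        if ram >= min_ram_gb and weight <= max_weight_kg:
            hits.append(pid)
    return sorted(hits)


if __name__ == "__main__":
    print(keyword_search("RAM"))        # every description mentions RAM
    print(attribute_query(16, 2.0))     # only products meeting both constraints
```

The keyword search cannot express "at least 16 GB and under 2 kg", while the attribute query can, and the source text is left untouched, so a different model could be extracted from it later.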
Can Mining work for both structured/unstructured data?
[Diagram: layers, top to bottom: Reports/Queries, Simple Semantic (Meta) Data Model, Multiple Storage Model, Data]
Reports/Queries: Queries and search that can leverage the structure of the data model to specify queries and search that are rich in semantics.
Simple Semantic (Meta) Data Model: A simple semantic data representation model for modeling data (structured and unstructured). Meta-data based on ontologies is extracted from the underlying data.
Data Storage Model(s): Multiple storage models including relational, XML, text, etc.
A separate logical data (meta) model distinct from the underlying storage model
Extracted from the data in a non-intrusive fashion and captured as meta-data
Single data representation model can map to multiple storage models
Structure and semantics of meta-data help structure queries, search, reports
Are embedded tags in the data a possible approach to define ontology structures?
Is it feasible to extract such semantic models, and can mining based on them perform well?
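As a rough sketch of the layered idea above, the following shows a single data representation model (records with `name` and `color`) populated from three storage models: relational, XML, and free text. Everything here is an assumption made for illustration: the `product` table, the XML layout, the sentence pattern, and the function names are hypothetical. The stored data is only read, so the meta-data is captured without modifying the sources.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET


def from_relational(conn: sqlite3.Connection) -> list:
    """Map rows of a hypothetical `product` table onto the shared model."""
    rows = conn.execute("SELECT name, color FROM product")
    return [{"name": n, "color": c, "source": "relational"} for n, c in rows]


def from_xml(document: str) -> list:
    """Map <product> elements onto the shared model."""
    root = ET.fromstring(document)
    return [
        {"name": p.findtext("name"), "color": p.findtext("color"), "source": "xml"}
        for p in root.iter("product")
    ]


def from_text(text: str) -> list:
    """Map free text onto the shared model using an assumed sentence pattern."""
    pattern = re.compile(r"\ba (?P<color>\w+) (?P<name>\w+) in stock")
    return [
        {"name": m.group("name"), "color": m.group("color"), "source": "text"}
        for m in pattern.finditer(text)
    ]


def build_metadata() -> list:
    """Extract meta-data from each store; the stores themselves are not changed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (name TEXT, color TEXT)")
    conn.execute("INSERT INTO product VALUES ('Chair', 'red')")
    xml_doc = (
        "<catalog><product><name>Table</name><color>red</color></product></catalog>"
    )
    note = "We have a green Sofa in stock."
    return from_relational(conn) + from_xml(xml_doc) + from_text(note)


def query(metadata: list, **criteria) -> list:
    """One query form over the meta-model, regardless of where the data lives."""
    return sorted(
        rec["name"]
        for rec in metadata
        if all(rec.get(key) == value for key, value in criteria.items())
    )


if __name__ == "__main__":
    meta = build_metadata()
    print(query(meta, color="red"))
    print(query(meta, source="text"))
```

Whether extraction of this kind scales, and how well mining performs on top of it, are the open questions the slide raises; the sketch shows only the shape of the mapping.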