NEMO

Dejing Dou

Computer and Information Science

University of Oregon, Eugene, Oregon



September 2010 @ Kent State University




Where is Eugene, Oregon?

Outline

• Introduction
• Ontology and the Semantic Web
• Biomedical Ontology Development
• Challenges for Data-driven Approaches
• The NEMO Project
  • Mining ERP Ontologies (KDD’07)
  • Modeling NEMO Ontology Databases (SSDBM’08, JIIS’10)
  • Mapping ERP Metrics (PAKDD’10)
• Ongoing Work








What is Ontology?

A formal specification of a vocabulary of domain concepts and the relationships among them.










A Genealogy Ontology

[Figure: genealogy ontology diagram with classes Individual, Gender, Male, Female, Family, Event, BirthEvent, MarriageEvent, DeathEvent, and DivorceEvent, linked by the properties sex, birth, childIn, husband, wife, marriage, and divorce]



• Classes: Individual, Male, Female, Family, MarriageEvent…
• Properties: sex, husband, wife, birth…
• Axioms: If there is a MarriageEvent, there will be a Family related to the husband and wife properties.
• Ontology languages: OWL, KIF, OBO…
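To make the MarriageEvent axiom concrete, here is a minimal Python sketch that stores facts as (subject, property, object) triples and applies the axiom forward. The triple encoding, the assumption that a MarriageEvent individual carries husband and wife values, and the `family_<event>` naming scheme are hypothetical choices for illustration; they are not part of OWL, KIF, or OBO.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def apply_marriage_axiom(facts: Set[Triple]) -> Set[Triple]:
    """Return new triples implied by: a MarriageEvent implies a Family
    related to the same husband and wife (hypothetical encoding)."""
    inferred: Set[Triple] = set()
    events = sorted(s for (s, p, o) in facts if p == "type" and o == "MarriageEvent")
    for event in events:
        family = f"family_{event}"  # hypothetical name for the inferred Family
        inferred.add((family, "type", "Family"))
        inferred.add((family, "marriage", event))
        # Copy the event's husband/wife values onto the inferred Family.
        for (s, p, o) in facts:
            if s == event and p in ("husband", "wife"):
                inferred.add((family, p, o))
    return inferred - facts


facts = {
    ("m1", "type", "MarriageEvent"),
    ("m1", "husband", "john"),
    ("m1", "wife", "mary"),
}
for triple in sorted(apply_marriage_axiom(facts)):
    print(triple)
```

A real system would state this axiom in an ontology language and leave the inference to a reasoner; the loop only shows what the axiom licenses.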

Current WWW

• Most data resources on the WWW are in human-readable format only (e.g., HTML).









[Figure: a human reading the WWW]




The Semantic Web

• One major goal of the Semantic Web is that web-based agents can process and “understand” data [Berners-Lee et al. 2001].
• Ontologies formally describe the semantics of data, and web-based agents can take web documents (e.g., in RDF or OWL) as a set of assertions and draw inferences from them.





[Figure: web-based agents and humans both using the Semantic Web (SW)]
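As a small illustration of an agent taking documents "as a set of assertions and drawing inferences from them", the sketch below forward-chains two RDFS entailment rules (rdfs11: subClassOf is transitive; rdfs9: instances inherit their classes' superclasses) over plain string triples until nothing new appears. It is a toy rather than an RDF library; the class names reuse the genealogy example and `Thing` is a hypothetical top class.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUB, TYPE = "rdfs:subClassOf", "rdf:type"


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply RDFS rules rdfs9 and rdfs11 repeatedly until a fixpoint is reached."""
    facts = set(triples)
    while True:
        sub = [(a, b) for (a, p, b) in facts if p == SUB]
        new: Set[Triple] = set()
        for (a, b) in sub:
            for (c, d) in sub:
                if b == c:
                    new.add((a, SUB, d))  # rdfs11: subClassOf is transitive
        for (x, p, c) in facts:
            if p == TYPE:
                for (a, b) in sub:
                    if a == c:
                        new.add((x, TYPE, b))  # rdfs9: instances inherit superclasses
        if new <= facts:
            return facts
        facts |= new


kb = {
    ("MarriageEvent", SUB, "Event"),
    ("Event", SUB, "Thing"),
    ("m1", TYPE, "MarriageEvent"),
}
closed = rdfs_closure(kb)
print(("m1", TYPE, "Thing") in closed)
```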

Biomedical Ontologies

• The Gene Ontology (GO): standardizes the formal representation of gene and gene product attributes across all species and gene databases (e.g., zebrafish, mouse, fruit fly)
  • Classes: cellular component, molecular function, biological process, …
  • Properties: is_a, part_of
• The Unified Medical Language System (UMLS): a comprehensive thesaurus and ontology of biomedical concepts
• The National Center for Biomedical Ontology (NCBO) at Stanford University
  • More than 200 ontologies (hundreds to thousands of concepts each) and 4 million mappings

Biomedical Ontology Development

• Typically knowledge-driven: a top-down process
• Some basic steps and principles:
  • Discussions among domain experts and ontology engineers
  • Select basic (root) classes and properties (i.e., terms)
  • Go deeper to sub-concepts and relationships; consider modularization if the ontology is expected to be large
  • Add constraints (axioms)
  • Add unique IDs (e.g., URLs) and textual definitions for terms
  • Consistency checking
  • Updating and evolution (e.g., GO is updated every 15 minutes)
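Part of the consistency-checking step can be automated cheaply: an is_a hierarchy must not contain a cycle, and every term should carry a textual definition. The sketch below checks both on plain dictionaries. It is a structural toy, not a substitute for a description-logic reasoner, and the term names and definition text are hypothetical.

```python
from typing import Dict, List, Optional


def find_isa_cycle(parents: Dict[str, List[str]]) -> Optional[List[str]]:
    """Depth-first search for a cycle in the is_a graph; return it or None."""
    state: Dict[str, str] = {}  # "active" while on the DFS path, "done" afterwards
    path: List[str] = []

    def visit(term: str) -> Optional[List[str]]:
        state[term] = "active"
        path.append(term)
        for parent in parents.get(term, []):
            if state.get(parent) == "active":
                return path[path.index(parent):] + [parent]
            if parent not in state:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        state[term] = "done"
        return None

    for term in parents:
        if term not in state:
            cycle = visit(term)
            if cycle:
                return cycle
    return None


def missing_definitions(terms: List[str], definitions: Dict[str, str]) -> List[str]:
    """List terms whose textual definition is absent or blank."""
    return [t for t in terms if not definitions.get(t, "").strip()]


good = {"P100": ["ERPPattern"], "N100": ["ERPPattern"]}
bad = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(find_isa_cycle(good), find_isa_cycle(bad))
print(missing_definitions(["P100", "N100"], {"P100": "hypothetical definition"}))
```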




Challenges:

• Knowledge sharing does not automatically help data sharing
• Annotation (like tags) helps search in text (e.g., papers), but is not well suited to experimental data (e.g., numerical values)
• Three main challenges for knowledge/data sharing:
  • Heterogeneity: different labs use different analysis methods, spreadsheet attributes, and DB schemas.
  • Reusability: knowledge mined from different experimental data may not be consistent or sharable.
  • Scalability: experimental data grow much larger than ontologies, and ontology-based reasoning over large data (e.g., in the ABox) is a headache.


Case Study: EEG data

• Electroencephalogram (EEG) data
• Observing brain functions through EEG
  • Brain activity occurs in the cortex, and cortical activity generates the scalp EEG.
  • EEG data (dense-array, 256 channels) have high temporal (1 msec) but poor spatial (2D) resolution; functional imaging such as fMRI and PET has good spatial (3D) but poor temporal resolution (~1.0 sec).


ERP data and Pattern Analysis

• Event-related potentials (ERPs) are created by averaging segments of EEG data across trials, with each segment time-locked (e.g., every 2 seconds) to a stimulus event or response.
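The averaging step can be sketched as follows, assuming a continuous recording stored as `eeg[channel][sample]` and stimulus onsets given as sample indices. Segments that would run past either end of the recording are skipped; baseline correction and artifact rejection, which real pipelines apply, are omitted. The function name and window parameters are hypothetical.

```python
from typing import List, Sequence


def erp_average(eeg: Sequence[Sequence[float]], onsets: Sequence[int],
                pre: int, post: int) -> List[List[float]]:
    """Average time-locked EEG segments into an ERP.

    Returns erp[channel][k]; k = 0 is the sample `pre` samples before each
    onset, and the window ends `post` samples after the onset.
    """
    n_samples = len(eeg[0])
    epochs = [o for o in onsets if o - pre >= 0 and o + post <= n_samples]
    if not epochs:
        raise ValueError("no complete segments inside the recording")
    return [
        [sum(channel[o + t] for o in epochs) / len(epochs) for t in range(-pre, post)]
        for channel in eeg
    ]


# One toy channel whose value equals the sample index; onset 9 is dropped
# because its window would run past the end of the recording.
eeg = [[float(i) for i in range(10)]]
print(erp_average(eeg, onsets=[2, 6, 9], pre=1, post=2))  # [[3.0, 4.0, 5.0]]
```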









Figure: (A) 128-channel ERPs to visual word and nonword stimuli. (B) Time course of the P100 pattern extracted by PCA. (C) Scalp topography (spatial distribution) of the P100 pattern.



• Existing tools (e.g., Net Station, EEGLAB, APECS, the Dien PCA Toolbox) can process ERP data and perform pattern analysis.


NEMO: NeuroElectroMagnetic Ontologies

• Some challenges in ERP studies
  • Patterns can be difficult to identify, and their definitions vary across research labs. Methods for ERP analysis also differ across research sites.
  • It is hard to compare and share results across experiments and across labs.
• The NEMO (NeuroElectroMagnetic Ontologies) project addresses these challenges by developing ontologies to support ERP data and pattern representation, sharing, and meta-analysis. It has been funded by the NIH as an R01 project since 2009.




Architecture










Progress in Data Driven Approaches

• Mining ERP Ontologies (KDD’07) -- Reusability
• Modeling NEMO Ontology Databases (SSDBM’08, JIIS’10) -- Scalability
• Mapping ERP Metrics (PAKDD’10) -- Heterogeneity










Ontology Mining

• Ontology mining is a process for learning an ontology, including classes, class taxonomy, properties, and axioms, from data.
• Existing ontology mining approaches focus on text mining or web mining (web content, usage, structure, user profiles).
  • Clustering and association rule mining have been used for classes and properties [Li & Zhong @ TKDE 18(4); Maedche & Staab @ EKAW’00; Reinberger et al. @ ODBASE’03].
  • The NetAffx Gene Ontology Mining Tool has been applied to microarray data [Cheng et al. @ Bioinformatics 20(9)].
• Our approach, which is novel, uses hierarchical clustering and classification to mine the class taxonomy, properties, and axioms of a first-generation ERP data-specific ontology from spreadsheets.


Knowledge Reuse in KDD

[Figure: the KDD process (Databases → Data Cleaning / Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation), annotated with the question “Lack of formal semantics?”]


Our Framework (KDD’07)









Figure: A semi-automatic framework for mining ontologies

Four General Procedures



• Classes […] 75%). The remaining factors are assumed to contain “noise”.




Data Preprocessing (2)

• Intensity, spatial, temporal, and functional metrics (attributes) for each factor
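The slides do not define these metrics, so the following is only a hypothetical sketch of how three attributes of the kind listed in the next table might be computed for one factor: TI-max as the latency of the largest absolute value in the factor's time course, IN-mean(ROI) as the mean topography value over a region-of-interest channel list, and SP-min as the 1-based number of the channel with the minimum value. These definitions, the sampling interval, and the ROI are all assumptions.

```python
from typing import Dict, List, Sequence


def factor_metrics(time_course: Sequence[float], topography: Sequence[float],
                   roi: List[int], sample_ms: float) -> Dict[str, float]:
    """Compute illustrative temporal, intensity, and spatial metrics for one factor.

    Assumed definitions (not taken from the NEMO metric specification):
    TI-max = latency of the peak |amplitude|, IN-mean(ROI) = mean over ROI
    channels (0-based indices), SP-min = 1-based channel number of the minimum.
    """
    peak = max(range(len(time_course)), key=lambda i: abs(time_course[i]))
    return {
        "TI-max": peak * sample_ms,
        "IN-mean(ROI)": sum(topography[c] for c in roi) / len(roi),
        "SP-min": min(range(len(topography)), key=lambda c: topography[c]) + 1,
    }


print(factor_metrics([0.1, 0.5, -2.0, 0.3], [1.0, -3.0, 2.0, 0.5], roi=[0, 2], sample_ms=4.0))
```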










ERP Factors after PCA Decomposition



TI-max (ms)   IN-mean (ROI) (µV)   IN-mean (ROCC) (µV)   …   SP-min (channel #)
128           4.2823               4.7245                …   24
96            1.2223               1.3955                …   62
164           -6.6589              -4.7608               …   59
220           -3.635               -2.0782               …   58
244           -0.81322             0.29263               …   65

For Experiment 1 data, number of factors = 474 and 594 (two subject groups)
For Experiment 2 data, number of factors = 588 and 598 (two subject groups)
For Experiment 3 data, number of factors = 708


Mining ERP Classes with Clustering (1)

• We use EM (Expectation-Maximization) clustering
• E.g., for Experiment 1, group 2 data:

Pattern \ Cluster     0     1     2     3
P100                  0    76     0     2
N100                117     1     0    54
lateN1/N2            13    14     0   104
P300                  0    61   110    42
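For intuition about the EM step and the cluster-versus-pattern table above, here is a deliberately reduced sketch: a one-dimensional Gaussian mixture fitted by EM, followed by a cross-tabulation of hard cluster labels against pattern labels. The actual study clustered multi-attribute factor data, and the talk does not specify the implementation; the single attribute, initial means, iteration count, and toy values below are assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def em_gmm_1d(xs: Sequence[float], init_means: Sequence[float],
              n_iter: int = 50) -> Tuple[List[float], List[int]]:
    """Fit a 1-D Gaussian mixture with EM; return the means and hard labels."""
    k, n = len(init_means), len(xs)
    mu = list(init_means)
    mean_x = sum(xs) / n
    var = [max(sum((x - mean_x) ** 2 for x in xs) / n, 1e-6)] * k
    w = [1.0 / k] * k
    resp: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: responsibility of each component j for each point x.
        resp = []
        for x in xs:
            dens = [w[j] * math.exp(-(x - mu[j]) ** 2 / (2 * var[j]))
                    / math.sqrt(2 * math.pi * var[j]) for j in range(k)]
            total = sum(dens) or 1e-300  # guard against underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances from responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            w[j] = nj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return mu, labels


# Toy attribute values with two obvious groups, plus made-up pattern labels.
values = [1.0, 1.2, 0.8, 10.0, 10.3, 9.7]
patterns = ["P100", "P100", "P100", "P300", "P300", "P300"]
means, labels = em_gmm_1d(values, init_means=[0.0, 12.0])
print([round(m, 3) for m in means], labels)
print(Counter(zip(patterns, labels)))  # cross-tab of (pattern, cluster) counts
```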


Mining ERP Classes with Clustering (2)

• We use OWL to represent ERP classes










Mining ERP Class Taxonomy with Hierarchical Clustering

• We use EM clustering in both divisive and agglomerative ways.
• E.g., for Experiment 3 data:










Mining ERP Class Taxonomy with Hierarchical Clustering

• We use OWL to represent the class taxonomy
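Once a hierarchy is known, emitting it in OWL is mostly mechanical. The sketch below turns a parent-to-children mapping (the shape a hierarchical clustering might produce) into Turtle `owl:Class` and `rdfs:subClassOf` statements. The `nemo:` namespace IRI, the class names, and the hierarchy itself are hypothetical placeholders, not the published NEMO ontology.

```python
from typing import Dict, List

PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix nemo: <http://example.org/nemo#> .\n"  # hypothetical namespace
)


def taxonomy_to_turtle(children: Dict[str, List[str]]) -> str:
    """Serialize a parent -> children hierarchy as OWL classes in Turtle."""
    classes = set(children) | {c for kids in children.values() for c in kids}
    lines = [PREFIXES]
    for cls in sorted(classes):
        lines.append(f"nemo:{cls} a owl:Class .")
    for parent in sorted(children):
        for child in sorted(children[parent]):
            lines.append(f"nemo:{child} rdfs:subClassOf nemo:{parent} .")
    return "\n".join(lines)


# Hypothetical two-level hierarchy, for illustration only.
tree = {"ERPPattern": ["EarlyPattern", "LatePattern"], "EarlyPattern": ["P100", "N100"]}
print(taxonomy_to_turtle(tree))
```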










Mining Properties and Axioms with Clustering-based Classification (1)

• We use decision tree learning (C4.5) for classification, with training data labeled by the clustering results.










Mining Properties and Axioms with Clustering-based Classification (2)

• We use OWL to represent datatype properties based on the attributes with the highest information gain (e.g., the top 6).
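Ranking numeric attributes by information gain can be sketched as below: for each attribute, try every midpoint between adjacent distinct values as a binary split (in the spirit of how C4.5 handles continuous attributes), keep the best gain, and return the top-k attributes. C4.5 itself uses gain ratio and builds a full tree; this sketch only computes plain information gain, and the attribute names and toy rows are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_split_gain(values: Sequence[float], labels: Sequence[str]) -> Tuple[float, float]:
    """Return (information gain, threshold) of the best split value <= t vs. > t."""
    base, n = entropy(labels), len(labels)
    best = (0.0, float("nan"))
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best[0]:
            best = (gain, t)
    return best


def top_k_attributes(rows: List[Dict[str, float]], labels: List[str], k: int) -> List[str]:
    """Rank attributes by their best-split information gain and keep the top k."""
    gains = {a: best_split_gain([r[a] for r in rows], labels)[0] for a in rows[0]}
    return sorted(gains, key=lambda a: -gains[a])[:k]


rows = [
    {"TI-max": 100, "IN-mean": 5, "noise": 1},
    {"TI-max": 110, "IN-mean": -5, "noise": 2},
    {"TI-max": 300, "IN-mean": 4, "noise": 1},
    {"TI-max": 320, "IN-mean": -4, "noise": 2},
]
labels = ["early", "early", "late", "late"]
print(top_k_attributes(rows, labels, k=2))
```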










Mining Properties and Axioms with Clustering-based Classification (3)

• We use SWRL to represent axioms. In FOL:
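Axioms of this kind are Horn rules: the body is a conjunction of threshold tests on factor attributes and the head assigns a pattern class, which SWRL can express using comparison built-ins such as `swrlb:greaterThan`. The sketch below evaluates one such rule against a factor. The rule, its attribute names, and its thresholds are hypothetical, since the mined rules themselves are not reproduced in this text.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

OPS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le,
}


@dataclass
class Rule:
    """Horn rule: if every (attribute, op, threshold) test holds, infer `head`."""
    body: List[Tuple[str, str, float]]
    head: str

    def fires(self, factor: Dict[str, float]) -> bool:
        return all(OPS[op](factor[attr], t) for attr, op, t in self.body)


# Hypothetical rule in the spirit of:
#   Factor(x) ∧ TI-max(x, t) ∧ t > 80 ∧ t <= 150 ∧ IN-mean(x, v) ∧ v > 0 → P100(x)
rule = Rule(body=[("TI-max", ">", 80), ("TI-max", "<=", 150), ("IN-mean", ">", 0)], head="P100")
factor = {"TI-max": 120, "IN-mean": 3.0}  # made-up factor values
print(rule.head if rule.fires(factor) else "no inference")
```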










Discovering Axioms among Properties with Association Rule Mining

• We use the Apriori algorithm to find association rules among properties. The split points are determined by the classification rules. In FOL, the rules look like:
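A compact Apriori sketch over discretized property items is shown below. Each transaction is the set of items (for example `TI-max>150`) that hold for one factor, with split points assumed to come from the classification rules; the item names, toy transactions, and support/confidence thresholds are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[Itemset, float]:
    """Return every frequent itemset with its support, found level by level."""
    n = len(transactions)

    def support(itemset: Itemset) -> float:
        return sum(itemset <= t for t in transactions) / n

    level = {s for s in {frozenset([i]) for t in transactions for i in t}
             if support(s) >= min_support}
    frequent: Dict[Itemset, float] = {}
    k = 1
    while level:
        frequent.update({s: support(s) for s in level})
        k += 1
        # Join frequent (k-1)-itemsets, then prune candidates with an infrequent subset.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = {c for c in candidates
                 if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
                 and support(c) >= min_support}
    return frequent


def association_rules(frequent: Dict[Itemset, float],
                      min_conf: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Generate rules X -> Y with confidence = support(X ∪ Y) / support(X)."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]
                if conf >= min_conf:
                    found.append((lhs, itemset - lhs, conf))
    return found


transactions = [
    {"TI-max>150", "IN-mean<0"},
    {"TI-max>150", "IN-mean<0"},
    {"TI-max>150", "IN-mean>=0"},
    {"TI-max<=150", "IN-mean>=0"},
]
frequent = apriori(transactions, min_support=0.5)
for lhs, rhs, conf in association_rules(frequent, min_conf=0.6):
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```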










Rule Optimization

• Idea: $(A \rightarrow B) \land (A \land B \rightarrow C) \Rightarrow (A \rightarrow C)$
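Read as rule chaining, the idea is that once A → B holds, the antecedent B in A ∧ B → C is redundant, so that rule can be shortened to A → C. A minimal sketch, assuming rules are encoded as (set of antecedent atoms, consequent atom) pairs and the rewrite is repeated until nothing changes; the encoding and atom names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], str]  # (antecedent atoms, consequent atom)


def simplify(rules: List[Rule]) -> List[Rule]:
    """Drop antecedent atoms that another rule already derives from the rest.

    Implements (A -> B) and (A ∧ B -> C) => (A -> C), repeated until stable.
    Each rewrite strictly shrinks a body, so the loop terminates.
    """
    current = list(rules)
    changed = True
    while changed:
        changed = False
        for i, (body, head) in enumerate(current):
            for b in sorted(body):
                rest = body - {b}
                if rest and any(h == b and bd <= rest for bd, h in current):
                    current[i] = (rest, head)
                    changed = True
                    break
    return list(dict.fromkeys(current))  # remove duplicate rules, keep order


rules = [(frozenset({"A"}), "B"), (frozenset({"A", "B"}), "C")]
print(simplify(rules))
```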









And










A Partial View of the Mined ERP Data Ontology

• Our first-generation ERP ontology consists of 16 classes, 57 properties, and 23 axioms.










Ontology-based Data Modeling (SSDBM’08, JIIS’10)

• In general, ontologies can be treated as a kind of conceptual model. Because the data (e.g., PCA factors) can be large, we propose storing them in relational databases instead of building a knowledge base.
• We designed database schemas based on our ERP ontologies, which include temporal, spatial, and functional concepts.








Ontology Databases

[Figure: correspondence between ontology and relational database constructs. Ontology side: Class, Datatype, Axioms, Objects, Facts; database side: Relation, Datatype, keys, constraints, views, triggers, tuples. Caption: “Now we have bridged these.”]

Loading time in Lehigh University Benchmark

[Figure: load time for 1.5 million facts (10 universities, 20 departments) and query performance (logarithmic time)]

Ontology-based Data Modeling









• In particular, for the important subsumption axioms (e.g., subClassOf) in the current ERP ontologies, we use SQL triggers and foreign keys to represent them.
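The trigger-plus-foreign-key idea can be sketched with Python's built-in `sqlite3` module: one table per class, a foreign key from the subclass table's id to the superclass table, and a trigger that records every new P100 instance as a pattern. The table and column names are hypothetical, and the original system's DBMS and schema may differ. A `BEFORE INSERT` trigger is used so the superclass row exists before the foreign key is checked; ids are supplied explicitly, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

SCHEMA = """
PRAGMA foreign_keys = ON;
CREATE TABLE pattern (id INTEGER PRIMARY KEY, label TEXT);
-- Subclass table: every p100 row must also exist as a pattern row (foreign key).
CREATE TABLE p100 (
    id INTEGER PRIMARY KEY REFERENCES pattern(id),
    peak_latency_ms REAL
);
-- Subsumption "P100 subClassOf Pattern": create the superclass row first.
CREATE TRIGGER p100_is_a_pattern BEFORE INSERT ON p100
BEGIN
    INSERT OR IGNORE INTO pattern (id, label) VALUES (NEW.id, 'P100');
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO p100 (id, peak_latency_ms) VALUES (1, 120.0)")
print(conn.execute("SELECT id, label FROM pattern").fetchall())

try:
    conn.execute("DELETE FROM pattern WHERE id = 1")  # would orphan the p100 row
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```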






Ontology-based Data Modeling









The ER diagram for the ERP ontology database shows tables (boxes) and foreign key constraints (arrows). The concepts pattern, factor, and channel are the most densely connected.


NEMO Data Mapping (PAKDD’10)

• Motivation
  • Lack of meta-analysis across experiments, because different labs may use different metrics
• Goal of the study
  • Mapping alternative sets of ERP spatial and temporal metrics

Problem definition





Alternative sets of ERP metrics

Challenges

• Semi-structured data
• Uninformative column headers (string similarity matching does not work)
• Numerical values

[Figure: mapping pipeline for Metric Set 1 and Metric Set 2: grouping and reordering, sequence post-processing, cross-spatial join]



• Process all point-sequence curves
• Calculate the Euclidean distance between sequences in the Cartesian product set (cross-spatial join)
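The cross-spatial join can be sketched as computing the Euclidean distance between every pair of point-sequence curves drawn from the two metric sets (their Cartesian product) and, as a simple assumed decision rule, mapping each curve in set 1 to its nearest curve in set 2. The metric names and values are hypothetical, and the curves are assumed to have already been grouped, reordered, and post-processed to equal length.

```python
import math
from itertools import product
from typing import Dict, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("sequences must have equal length after post-processing")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cross_spatial_join(set1: Dict[str, Sequence[float]],
                       set2: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Distance for every (set-1 metric, set-2 metric) pair: the Cartesian product."""
    return {(m1, m2): euclidean(set1[m1], set2[m2]) for m1, m2 in product(set1, set2)}


def nearest_mapping(dist: Dict[Tuple[str, str], float]) -> Dict[str, str]:
    """Map each set-1 metric to the set-2 metric at the smallest distance."""
    best: Dict[str, Tuple[float, str]] = {}
    for (m1, m2), d in dist.items():
        if m1 not in best or d < best[m1][0]:
            best[m1] = (d, m2)
    return {m1: m2 for m1, (_, m2) in best.items()}


set1 = {"latency_peak": [0.0, 1.0, 2.0], "roi_mean": [5.0, 5.0, 4.0]}
set2 = {"alt_roi_mean": [5.1, 4.9, 4.2], "alt_latency": [0.1, 1.1, 1.9]}
print(nearest_mapping(cross_spatial_join(set1, set2)))
```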






Cross-spatial Join

Assumptions and Heuristics

• The two datasets contain the same or similar ERP patterns if they come from the same paradigm (e.g., a visual or auditory oddball paradigm: watching or listening to uncommon or fake words among common words).
• The gold-standard mapping falls along the diagonal cells.









[Figure annotation: wrong mappings]

Precision = 9/13 ≈ 69.2%
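Under the diagonal gold-standard heuristic, precision is the fraction of proposed mappings that pair metric i in one set with metric i in the other. A minimal sketch, assuming mappings are given as (row, column) index pairs; the example pairs are made up to reproduce a 9-of-13 count and are not taken from the actual figure.

```python
from fractions import Fraction
from typing import List, Tuple


def diagonal_precision(mappings: List[Tuple[int, int]]) -> Fraction:
    """Share of proposed (row, column) mappings that lie on the gold-standard diagonal."""
    if not mappings:
        raise ValueError("no mappings proposed")
    correct = sum(1 for i, j in mappings if i == j)
    return Fraction(correct, len(mappings))


# Hypothetical 13 proposed mappings: 9 on the diagonal, 4 off it.
proposed = [(i, i) for i in range(9)] + [(9, 10), (10, 9), (11, 12), (12, 11)]
p = diagonal_precision(proposed)
print(p, f"{float(p):.1%}")  # 9/13 69.2%
```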

Experiment

• Design of experiment data
  • 2 simulated “subject groups” (samples)
    • SG1 = sample 1
    • SG2 = sample 2
  • 2 data decompositions
    • tPCA = temporal PCA decomposition
    • sICA = spatial ICA (Independent Component Analysis) decomposition
  • 2 sets of alternative metrics
    • m1 = metric set 1
    • m2 = metric set 2

Experiment Result









Overall Precision: 84.6%

NEMO Related Ongoing Work



• Application of our framework to other domains: microRNA, medical informatics, gene databases
• Mapping discovery and integration across ontologies related to different modalities (e.g., EEG vs. fMRI)










Joint EEG-fMRI Data Mapping










Joint work with:



Gwen Frishkoff, Jiawei Rong, Robert Frank, Paea LePendu, Haishan Liu, Allen Malony, and Don Tucker






Thanks for your attention!

Any questions?






