NIF as a Multi-Model Semantic Information System
Part 1: Relational, XML, RDF and OWL models
Amarnath Gupta
University of California San Diego

Preamble - 1
• As we design and extend the NIF system we recognize that users will give us data in any form that is convenient for them
  • Standard data may be stored in a flat file
  • Web service output can be in XML
  • Semantic Web enthusiasts may represent data using proper RDF
• However, regardless of the form in which data may be represented
  • The NIF system must treat them (query, index, relate, ...) in a uniform manner
  • The NIF system must utilize the underlying systems to query/access/index data

Preamble - 2
• In this presentation we intend to
  • Explain our perspective on these different data models
  • Provide a background on the data models we consider
  • Offer a sense of the "semantic character" of these data models
  • Present our design philosophy on
    • Where to keep them separate
    • Where to transform them into a common model

What is a Data Model?
• A conceptual data model
  • A formal representation of the users'/application's mental model of data elements and their relationships that should be put in a database, manipulated, queried and operated upon
• A logical data model
  • A formal description of the data model in a logical structure that a computer can use to perform the queries and other operations.
  • In many cases, the same conceptual model can be represented by different logical models
• A physical data model
  • An implementable version of the data model in terms of data structures, access structures (e.g., indices) and the set of low-level operations that a system needs to perform on the data

A Conceptual Model
ORM Model (Terry Halpin)
[Figure: ORM diagram with callouts: Object, Relationship/Role, n-ary Role, Value Type, Uniqueness Constraint, Value Constraint, Inter-relationship Constraint]

A Logical Data Model
• A formal specification of
  • The structure of the data
    • The structure tells us how the data is organized
    • (123, "Purkinje Cell", Cerebellum) (828, Hippocampus, "Hilar Cell") is not structured: the two records place the cell name and the brain region in different positions
    • Often the structure of the data, together with some constraints, represents some semantics
    • If the data are not structured (like free text), the techniques for handling them will be different
  • Operations on this structure
    • Every data model is based on some mathematical principles that define what you can do with the data
  • The nature of data values
    • Data domains and data types
    • Operations on data values

The Relational Data Model
Table (relation name): Neurons

  NeuronID   NeuronName      BrainRegion     NeuroTransmitter   Current
  1          Purkinje Cell   Cerebellum      Glutamate          Transient Na+
  2          Hilar Cell      Dentate Gyrus   GABA               Ca2+

• Callouts on the table
  • Attribute name: a column header (e.g., NeuronName)
  • Tuple: a row
  • Attribute value: cannot be complex
  • Attribute domain: all possible values the attribute can take
  • Candidate key: a set of columns that uniquely determines a row
• The relational model is a set (bag) of tuples model
• Metadata is stored in a separate catalog, which is also relational
• First order constraints
• All queries are about some combination of
  • Selecting rows, columns
  • Combining tables by union, intersection, join
  • Computing data or aggregate functions
  • Grouping and sorting
• A query can return only values
  • Relation names and attribute names cannot serve as variables in a query

Object Relational Model
• Eases some of the problems of the classical relational model
• Data values can be of arbitrary data types
  • Sets (e.g., multiple currents for a neuron)
  • Tuples (e.g., references ordered by year)
  • Time-series (e.g., raw EEG data)
  • Spatial data (e.g., atlases in CCDB)
• Each data type can have its own operations
  • Find all data points within a neighborhood of a spatial location
• Query results are still values
  • Catalog queries and data queries cannot be mixed in a single query
• All industrial-strength DBMSs use some version of this model
• One needs to be a skilled DB programmer to develop custom applications on this model

XML (Two Perspectives)
Document Community
• data = linear text documents
• mark up (annotate) text pieces to describe context, structure, semantics of the marked text

  <physiologicalCondition> Oxidative stress </physiologicalCondition> has been proposed to be involved in the <biologicalProcess context="disease"> pathogenesis </biologicalProcess> of <disease> Parkinson's disease</disease> (PD). A plausible source of <physiologicalCondition> oxidative stress </physiologicalCondition> in <brainRegion> nigral </brainRegion> <neuron> dopaminergic neurons </neuron> is the redox reactions that specifically involve <chemical> dopamine </chemical> and produce various <chemical context="biologicalAgent"> toxic </chemical> molecules.

XML (Two Perspectives)
Database Community
• XML as a (most prominent) example of the semi-structured data model => captures the whole spectrum from highly structured, regular data to unstructured data (relational, object-oriented, marked up text, ...)
• Example from the CARMEN group

  <?xml version="1.0" encoding="utf-8"?>
  <NDTF_Annotation>
    <description>A new annotation file </description>
    <timeMarker>true</timeMarker>
    <timeResolution>0.000001</timeResolution>
    <interval group_id="04">
      <eventNote timeOffset="1237888.230" attachedFile="sound1.wmv" application="realplayer">Text message for the event start.</eventNote>
      <eventNote timeOffset="18958585.232">Text message for the event end.</eventNote>
    </interval>
  </NDTF_Annotation>

XML as a Logical Data Model
• XML is a tree-structured document
• Nodes
  • Element nodes
    • Children can be ordered
    • Recursive elements (parts under parts)
  • Attribute nodes
    • Mandatory or optional
• Edges
  • Sub-element edges
  • Attribute edges
  • IDRef edges
• Constraints
  • References
  • Value restrictions, OneOf
  • Cardinality
• Trees are more flexible than tables
• Any number of nodes can be added anywhere without breaking the model

XML as a Logical Data Model
• XML has its own schema language
• Lets you specify a complex type system
• A database is a collection of XML trees
• Storing XML
  • Mostly relational with some very clever indexing to encode the hierarchy, tree paths, and order
• Querying XML
  • Elements, attribute names, values and structure can be queried
  • Multiple trees can be joined by value
• Example (XPath)
  • http://mousespinal.brain-map.org/imageseries/detail/100002661.xml
  • Find images of the spinal column
  • //image[//structurelabel/text()="SPINAL COLUMN"]/ish_image_path is a tree query
• XQuery and full-text XQuery

Misusing and Abusing XML
• Using XML if your data is relational
  • It will result in flat trees that will suffer from complex querying
• Encoding orders and hierarchies that need special parsing

  <Brand_Mixtures count="2">
    <Brand_Mixture_1> Apo-Levocarb (carbidopa + levodopa) </Brand_Mixture_1>
    <Brand_Mixture_2> Apo-Levocarb CR Controlled-Release Tablets (carbidopa + levodopa) </Brand_Mixture_2>
  </Brand_Mixtures>

• Using implicit multi-valuedness

  <atomArray atomID="a1 a2 a3" elementType="O N C" hydrogenCount="1 1 3">
    <array dictRef="cml:calcCharge" dataType="xsd:decimal" units="cml:electron">0.2 -0.3 0.1</array>
  </atomArray>

Expressing Semantics in XML
• Adorning elements with Namespaces
  • A namespace is identified by a unique URI (Uniform Resource Identifier)
  • To disambiguate between two elements that happen to share the same name
  • To group elements relating to a common idea together

  <item xmlns:bp="http://www.biopax.org/release/biopax-level1.owl#"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <bp:protein ID="Protein1">
      <bp:NAME>Metalloelastase</bp:NAME>
      <bp:XREF>
        <bp:unificationXref rdf:ID="Xref1">
          <bp:ID>NP_304845</bp:ID>
          <bp:DB>RefSeq</bp:DB>
        </bp:unificationXref>
      </bp:XREF>
    </bp:protein>
  </item>

The Problem with XML Semantics
• Two different XML representations of the same kind of information may not be easily unifiable
• What did XML not encode?

Resource Description Framework (RDF)
[Figure: RDF graph of a statement. Node labels: URI(CNTFR-a), URI(modulates), URI(eSNCA-mediated neurotoxicity), URI(membrane-protein), URI(protein-mediated toxicity). Edge and type labels: rdf:Statement, rdf:subject, rdf:predicate, rdf:object, rdf:type, rdf:Property]

The Basic Constructs of RDF
• RDF meta-model basic elements
  • All defined in the rdf namespace
    • http://www.w3.org/1999/02/22-rdf-syntax-ns#
• Types (or classes)
  • Resource: everything that can be identified (with a URI); in the W3C vocabulary this class is rdfs:Resource, from the RDF Schema namespace
  • rdf:Property: specialization of a resource expressing a binary relation between two resources
  • rdf:Statement: a triple with properties rdf:subject, rdf:predicate, rdf:object
• Properties
  • rdf:type: the subject is an instance of the category or class defined by the value
  • rdf:subject, rdf:predicate, rdf:object: relate the elements of a statement triple to a resource of type rdf:Statement.
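The statement constructs above can be sketched in a few lines of Python. This is a minimal illustration, not how NIF stores RDF: triples are plain tuples, the `EX` namespace and the helper names `reify` and `match` are hypothetical, and the example triple assumes one plausible reading of the figure (CNTFR-a modulates eSNCA-mediated neurotoxicity).

```python
# Minimal sketch: RDF statements as plain Python tuples (no RDF library).
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/nif#"  # hypothetical namespace, not from the slides


def reify(statement_uri, triple):
    """Describe a triple as a resource of type rdf:Statement (four triples)."""
    subject, predicate, obj = triple
    return [
        (statement_uri, RDF + "type", RDF + "Statement"),
        (statement_uri, RDF + "subject", subject),
        (statement_uri, RDF + "predicate", predicate),
        (statement_uri, RDF + "object", obj),
    ]


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Assumed reading of the figure: CNTFR-a modulates eSNCA-mediated neurotoxicity.
claim = (EX + "CNTFR-a", EX + "modulates", EX + "eSNCA-mediated-neurotoxicity")

graph = [
    claim,
    (EX + "CNTFR-a", RDF + "type", EX + "membrane-protein"),
    (EX + "eSNCA-mediated-neurotoxicity", RDF + "type", EX + "protein-mediated-toxicity"),
    (EX + "modulates", RDF + "type", RDF + "Property"),
]
graph += reify(EX + "statement1", claim)

# Which resource is the subject of statement1?
subject_of_statement = match(graph, s=EX + "statement1", p=RDF + "subject")[0][2]

# Every typed resource in the graph, found by following rdf:type edges.
typed = {s: o for s, _, o in match(graph, p=RDF + "type")}
```

The point of the sketch is that the statement itself becomes a resource (`statement1`) that other triples can talk about, which is what the rdf:subject, rdf:predicate and rdf:object properties provide.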
Relational Data vis-à-vis RDF
• Node to edge ratio is relatively small in many applications
• Number of relationships need not be fixed at design time
• The general tendency is to keep the number of edge labels small
• Graph-based operations can be performed on RDF; the same operations would require an unspecified number of joins in relational data

RDF Blank Nodes
• RDF allows one to create anonymous objects whose existence is known but details are not
• There exists some neuron to which both NeuronX and NeuronY connect

  <neurons:NeuronX rdf:about="http://neurons.org/Neuron#NeuronX">
    <conn:connectsTo>
      <neurons:Neuron rdf:nodeID="n1"/>
    </conn:connectsTo>
  </neurons:NeuronX>
  <neurons:NeuronY rdf:about="http://neurons.org/Neuron#NeuronY">
    <conn:connectsTo>
      <neurons:Neuron rdf:nodeID="n1"/>
    </conn:connectsTo>
  </neurons:NeuronY>

RDF Schema
• Declaration of vocabularies
  • classes, properties, and relationships defined by a particular community
  • rdfs:Class, rdfs:subClassOf
  • Property-related: rdfs:subPropertyOf
  • relationship of properties to classes: rdfs:domain, rdfs:range
• Provides substructure for inferences based on existing triples
• NOT prescriptive, but descriptive
  • This is different from XML Schema
• Schema language is an expression of the basic RDF model
  • uses meta-model constructs: resources, statements, properties
  • schemas are "legal" RDF graphs and can be expressed in RDF/XML syntax

Examples of RDF Inferencing
• From this we can infer
  • (:alice rdf:type :parent)
  • (:betty rdf:type :parent)
  • (:eve rdf:type :female-person)
  • (:charles rdf:type :person)

RDF as a Logical Data Model
• RDF does not distinguish between different relationships
  • Instance-to-type
  • Instance-to-instance
  • Type-to-instance
• No transitivity inference is possible over, say, rdf:type
• RDF (as well as XML) has lost the notion of the abstract data type like spatial object or time
  • Operations on object types do not mix well with RDF
• Constraints like uniqueness, 1-to-1 relationships, cannot be expressed
• SPARQL, the query language for RDF, is
  • An edge-only language: it cannot express the // construct of XML
  • Blank nodes in a query are treated as variables that are not output in the results
  • Parts of the language are undecidable!
    • A problem is undecidable if it can be proved that there can be no algorithm to solve it

OWL
Components of an OWL Ontology
• Vocabulary (concepts)
• Structure (attributes of concepts and hierarchy)
• Concept-to-concept, concept-to-data, property-to-property relationships
• Logical characteristics of relationships
  • Domain and range restrictions
  • Properties of relations (symmetry, transitivity)
  • Cardinality of relations
• Open world vs. closed world assumptions
  • Contrast to most reasoning systems, which assume anything absent from the knowledge base is not true
  • Need to maintain monotonicity with tolerance for contradictions

OWL Classes
[Figure: class hierarchy. Labels: Class of all classes, rdfs:subClassOf, rdfs:Class]

Basic OWL Constructs
Creating OWL Classes
• disjointWith
  • Neurons are not glial cells
• sameClassAs (equivalence; named owl:equivalentClass in the OWL recommendation)
  • Class Gabaergic neuron is exactly the same class as neurons which have GABA as neurotransmitter
• Enumerations (on instances)
  • Class Cerebellar lobules are Lobule I, Lobule II, …
• Boolean set semantics (on classes)
  • Union (logical disjunction)
    • Class nerve cell is union of neuron, glial cell
  • Intersection (logical conjunction of class with properties)
    • Class hippocampal neurons is the conjunction of things that are of class Neuron and have property (has-soma-located-in) (hippocampus union any class that is (part-of) hippocampus)
  • complementOf (logical negation)
    • Class 'benign tumor' is the complement of class 'malignant tumor'

Properties of OWL Properties
• TransitiveProperty: P(x,y) and P(y,z) => P(x,z)
  • e.g., subclassOf
• SymmetricProperty: P(x,y) iff P(y,x)
  • e.g., is_functionally_related_to
• FunctionalProperty: P(x,y) and P(x,z) => y=z
  • e.g., soma_located_in
• inverseOf: P1(x,y) iff P2(y,x)
  • e.g., regulates and is_regulated_by
• InverseFunctionalProperty: P(y,x) and P(z,x) => y=z
  • e.g., is_isoform_of
• Cardinality
  • Only 0 or 1 in OWL Lite; OWL DL and OWL Full allow any non-negative integer

Instances in OWL
• Instances are distinct from Classes
• In RDF there is no distinction between class and instances
  • <Species, type, Class> <Lion, type, Species> <MyLion, type, Lion> is allowed in RDF
• OWL DL restrictions
  • Type separation
    • A class cannot also be an individual or property
    • A property cannot also be an individual or class

A Rough Comparison
~ RDF and OWL do not represent n-ary roles cleanly

Querying OWL
• There are several languages in the making
  • SPARQL engines (e.g., Virtuoso) are used often
  • Pellet is used for reasoning tasks
    • Subsumption
    • Consistency
  • New, more advanced languages like nSPARQL are coming up
  • vSPARQL is being developed to enable views on SPARQL, which will lead to nested SPARQL queries
• Our goal
  • Develop a query processor for these advanced languages
  • Part of OntoQuest, our ontological information management system

Where does NIF stand in this?
• Not every model is directly inter-convertible with every other model
• NIF is designed to
  • Work with multiple models
    • Ensure that the modeling capability and query capability of every model is preserved in its native form
    • Queries in our system get translated to queries in the native forms of the databases we federate
  • Express the local semantics of any data appropriately by
    • Augmenting the semantic model of the data
    • Connecting the data to NIF's ontology
    • Extending the NIF ontology in the process
  • Develop a mechanism to create a common integrated model over these models
    • This model is an ontological graph that incorporates object and temporal semantics

Example of An Ontological Extension
• Representing time and events
  • Phenotypes, physiology, …
• Instants, intervals, and periods
  • Temporal granularity of observation
• Events
  • Multi-temporal observations based on conditions on properties
  • Modeling states, objects in state, and state transitions
  • One-only, repeatable, and time deictic events
  • Subevents
• History of objects, events, roles
  • Subtype migration, temporal roles and role migration
  • Progression of disease, symptom or recovery states
  • Repeatability
• Considering TOWL and Temporal ORM

Questions?