
Semantic Basics: Markup, Querying, and Reasoning
Marlon Pierce, Community Grids Lab, Indiana University
With slides and help from Sean Bechhofer, Carole Goble, Line Pouchard, and Dave De Roure

Preface: Beyond XML

Reductio ad Absurdum


"Physics is the study of the harmonic oscillator."
• H. L. Richards

"Statistical Mechanics is the study of the Ising Model"
• H. L. Richards

"Web Service standards are the study of <xsd:any> sequences"
• M. E. Pierce, soon to be anonymous

Which Web Service Specs?
<xs:element name="Header" type="tns:Header" />
<xs:complexType name="Header">
  <xs:sequence>
    <xs:any namespace="##any" processContents="lax" minOccurs="0" maxOccurs="unbounded" />
  </xs:sequence>
  <xs:anyAttribute namespace="##other" processContents="lax" />
</xs:complexType>

<xsd:complexType name="SecurityHeaderType">
  <xsd:sequence>
    <xsd:any processContents="lax" minOccurs="0" maxOccurs="unbounded"> </xsd:any>
  </xsd:sequence>
  <xsd:anyAttribute namespace="##other" processContents="lax" />
</xsd:complexType>

Which, What, and Why?


Which is what?
• The first definition (left on the slide) is the SOAP header.
• The second (right on the slide) is taken from the Web Service Secure Messaging Specification.
• You will find this pattern repeated pretty often in web service specifications.

Why?
• We have limited ways of linking several XML schema data models.
• XML maps relationships to trees.
  - Imagine schemas for science applications and computing resources.
  - Link application and computer schemas with <xsd:any>.
  - In my application+computer schema, does application contain computer as child node, or vice versa?
• Graphs are a more natural way of expressing many inter-relationships of concepts.

XML is not enough








XML defines grammars to verify and structure documents.
The grammar enforces constraints on tags.
Different grammars define the same content.
XML lacks a semantic model: it only has a surface model, which is a tree.

Statement: "The Creator of the Resource http://www.w3.org/Home/Lassila is Ora Lassila"

Graph: (http://www.w3.org/Home/Lassila) --Creator--> Ora Lassila

The same content under three different grammars:

<Creator>
  <uri>http://www.w3.org/Home/Lassila</uri>
  <name>Ora Lassila</name>
</Creator>

<Document uri="http://www.w3.org/Home/Lassila">
  <Creator>Ora Lassila</Creator>
</Document>

<Document uri="http://www.w3.org/Home/Lassila" Creator="Ora Lassila"/>
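The point of this slide can be checked mechanically. The sketch below parses the three encodings with Python's standard `xml.etree.ElementTree`, shows that they produce three different trees, and then maps each one to the same (resource, property, value) statement. The `to_triple` mapping is hand-written for these three grammars and is an illustration only; XML itself gives no way to derive it.

```python
import xml.etree.ElementTree as ET

URI = "http://www.w3.org/Home/Lassila"

# Three XML encodings of the same statement, as shown on the slide.
ENCODINGS = [
    "<Creator><uri>http://www.w3.org/Home/Lassila</uri>"
    "<name>Ora Lassila</name></Creator>",
    '<Document uri="http://www.w3.org/Home/Lassila">'
    "<Creator>Ora Lassila</Creator></Document>",
    '<Document uri="http://www.w3.org/Home/Lassila" Creator="Ora Lassila"/>',
]


def tree_shape(elem):
    """Describe a tree by its tags and attribute names only (no values)."""
    children = tuple(tree_shape(child) for child in elem)
    return (elem.tag, tuple(sorted(elem.attrib)), children)


def to_triple(elem):
    """Hypothetical per-grammar mapping to (resource, property, value)."""
    if elem.tag == "Creator":
        return (elem.findtext("uri").strip(), "Creator", elem.findtext("name").strip())
    if "Creator" in elem.attrib:
        return (elem.get("uri"), "Creator", elem.get("Creator"))
    return (elem.get("uri"), "Creator", elem.findtext("Creator").strip())


roots = [ET.fromstring(text) for text in ENCODINGS]
shapes = {tree_shape(root) for root in roots}
triples = {to_triple(root) for root in roots}

print(len(shapes), "distinct trees,", len(triples), "distinct statement")
```

Each grammar needs its own case in `to_triple`, which is the pre-arranged agreement that a shared semantic model is meant to replace.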

XML is not enough
Meaning of XML documents is intuitively clear
• "semantic" markup tags are domain terms

But computers do not have intuition
• Tag names per se do not provide semantics
• The semantics are encoded outside the XML specification

XML makes no commitment on:
  (1) Domain specific ontological vocabulary
  (2) Ontological modeling primitives
• This requires pre-arranged agreement on (1) & (2)

Feasible for closed collaboration
• agents in a small & stable community
• pages on a small & stable intranet




Semantic Web Markups often are expressed in XML but they carry extra meaning.

Enter the Semantic Web/Grid
"The Semantic Web is the representation of data on the World Wide Web. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming."

The Semantic Stack
XML: Defines the syntax for structured documents.
XML Schema: Defines rules for XML dialects (SVG, GML, etc.) and also built-in data types.
RDF: A data model definition language with XML bindings.
RDF Schema: A way to define RDF-based languages (DAML+OIL, OWL).
OWL: An extension of RDF/RDFS with extensive property/relationship definitions for expressing logical relationships.

Semantic Markups


All semantic markup languages should be understood as assertion languages.
• We will assert that certain relationships between resources exist.
• We will express these assertions in RDF, RDFS, and OWL, encoded as XML.



We must still provide tools for processing (and verifying) the assertions.

Resource Description Framework
Overview of RDF basic ideas and XML encoding.

Resource Description Framework (RDF)

RDF is the simplest of the semantic languages.

Basic Idea #1: Triples
• RDF is based on a subject-verb-object statement structure.
• RDF subjects are called resources (classes).
• Verbs (predicates) are called properties.
• Objects (values) may be simple literals or other resources.

Basic Idea #2: Everything is a resource that is named with a URI
• RDF nouns, verbs, and objects are all labeled with URIs.
• Recall that a URI is just a name for a resource.
• It may be a URL, but not necessarily.
• A URI can name anything that can be described.
  - Web pages, creators of web pages, organizations that the creator works for, ...

RDF Graph Model

RDF is defined by a graph model.
Resources are denoted by ovals (nodes).
Lines (arcs) indicate properties.
Squares indicate string literals (no URI).
Resources and properties are labeled by a URI.

[Figure: RDF graph]
(http://.../CMCS/Entries/X) --http://purl.org/dc/elements/1.1/creator--> (http://.../CMCS/People/DrY)
(http://.../CMCS/Entries/X) --http://purl.org/dc/elements/1.1/title--> [H2O]

Encoding RDF in XML


The graph represents two statements.
• Entry X has a creator, Dr. Y. • Entry X has a title, H2O.



In RDF XML, we have the following tags
• <RDF> </RDF> denote the beginning and end of the RDF description.
• <Description>'s "about" attribute identifies the subject of the sentence.
• <Description></Description> enclose the properties and their values.
• We import Dublin Core conventional properties (creator, title) from outside RDF proper.

RDF XML: The Gory Details
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:dc='http://purl.org/dc/elements/1.0/'>
  <rdf:Description rdf:about='http://.../X'>
    <dc:creator rdf:resource='http://…/people/MEP'/>
    <dc:title>H2O</dc:title>
  </rdf:Description>
</rdf:RDF>

Encoding RDF as Triplets




In addition to graphs and XML, RDF may be written as triple "sentences". A triple is just the subject, predicate, and object (in that order) of a graph segment.

<http://.../CMCS/Entries/X> <http://purl.org/dc/elements/1.1/creator> <http://.../CMCS/People/DrY>

• This structure may look trivial but is useful in expressing queries (more later).
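To make the link between the XML encoding and the triple form concrete, here is a minimal sketch that reads a flat `rdf:Description` block with Python's standard `xml.etree.ElementTree` and prints one triple per property. It is not a real RDF parser: it assumes only the simple form used in these slides, and the `example.org` URIs are placeholders for the elided `http://.../` paths.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Placeholder URIs (example.org) stand in for the elided paths in the slides.
RDF_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/CMCS/Entries/X">
    <dc:creator rdf:resource="http://example.org/CMCS/People/DrY"/>
    <dc:title>H2O</dc:title>
  </rdf:Description>
</rdf:RDF>
"""


def triples_from_rdfxml(text):
    """Yield (subject, predicate, object) for flat rdf:Description blocks only."""
    root = ET.fromstring(text)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree spells qualified names as "{namespace}local".
            predicate = prop.tag.replace("{", "").replace("}", "")
            resource = prop.get(f"{{{RDF_NS}}}resource")
            if resource is not None:
                obj = f"<{resource}>"          # object is another resource
            else:
                obj = f'"{(prop.text or "").strip()}"'  # object is a literal
            yield (f"<{subject}>", f"<{predicate}>", obj)


triples = list(triples_from_rdfxml(RDF_XML))
for s, p, o in triples:
    print(s, p, o, ".")
```

In practice a toolkit such as Jena does this work; the sketch only shows that each property element inside a Description becomes one subject-predicate-object sentence.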

Creating RDF Documents


Writing RDF XML (or DAML or OWL) by hand is not easy.
• It's a good way to learn to read and write it, but after you understand it, automate it.

Authoring tools are available
• OntoMat: buggy
• Protégé: preferred by CGL grad students
• IsaViz: another nice tool with very good graphics.

You can also generate these programmatically using Hewlett-Packard Labs' Jena toolkit for Java.
• This is what I did in the previous example.

What is the Advantage?


So far, properties are just conventional URI names.
• All semantic web properties are conventional assertions about relationships between resources.
• RDFS and OWL will offer more precise property capabilities.

But there is a powerful feature we are about to explore…
• Properties provide a powerful way of linking different RDF resources
  - "Nuggets" of information.

For example, a publication is a resource that can be described by RDF
• Author, publication date, URL are all metadata property values.
• But publications have references that are just other publications
• DC's "hasReference" can be used to point from one publication to another.

Publications also have authors
• An author is more than a name
• Also an RDF resource with collections of properties
  - Name, email, telephone number, ...

Graph Model Depicting vCard and DC Linking
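[Figure: RDF graph linking Dublin Core and vCard properties]
(http://.../CMCS/Entry/1) --dc:title--> [H2O]
(http://.../CMCS/Entry/1) --dc:creator--> (http://.../People/DrY)
(http://.../People/DrY) --vcard:EMAIL--> [dry@stateu.edu]
(http://.../People/DrY) --vcard:N--> (name node)
(name node) --vcard:Given--> [given name]
(name node) --vcard:Family--> [family name]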

What Else Does RDF Do?


Collections: typically used as the object of an RDF statement
• Bag: unordered collection of resources or literals.
• Sequence: ordered collection of resources or literals.
• Alternative: collection of resources or literals, from which only one value may be chosen

And that's about it. RDF does not define properties, it just tells you where to put them.
• Definitions are done by specific groups for specific fields (Dublin Core Metadata Initiative, for example).
• RDF Schema provides the rules for defining specific resource classes and properties.

But the graph model has opened some doors
• Linked querying across data models.
• Reasoning about information

RDF Schema



RDF Schema is a rules system for building RDF languages.
• RDF and RDFS are defined in terms of RDFS
• DAML+OIL and OWL are defined by RDFS.

RDF Schema

Take our Dublin Core RDF encoding as an example:
• Can we formalize this process, defining a consistent set of rules?
  - Previous example was valid RDF but how do I formalize the process of writing sentences about creators of entries?
• Can we place restrictions and use inheritance to define resources?
  - What really is the value of "creator"? Can I derive it from another class, like "person"?
• Can we provide restrictions and rules for properties?
  - How can I express the fact that "title" should only appear once?
• Current DC encoding in fact is defined by RDFS.

Some RDFS Classes (Subjects and Values)
RDFS: Resource     The RDFS root element. All other tags derive from Resource.
RDFS: Class        The Class class. Literals and Datatypes are example classes. Classes consist of entities that share properties.
RDFS: Literal      The class for holding Strings and integers. Literals are dead ends in RDF graphs.
RDFS: Datatype     A type of data, a member of the Literal class.
RDFS: XMLLiteral   A datatype for holding XML data.
RDFS: Property     This is the base class for all properties (that is, verbs).

Some RDFS Properties
subClassOf      Indicates the subject is a subclass of the object in a statement.
subPropertyOf   The subject is a subProperty of the property (masquerading as an object).
domain          Restricts a property to only apply to certain classes of subjects.
range           Restricts the values of a property to be members of an indicated class or one of its subclasses.
type            Denotes an instance of a particular class. Actually from RDF, not RDFS.

Sample RDFS: Defining <Property>
<rdfs:Class rdf:ID="Property">
  <rdfs:isDefinedBy rdf:resource="http://.../some/uri"/>
  <rdfs:label>Property</rdfs:label>
  <rdfs:comment>The class of RDF properties.</rdfs:comment>
  <rdfs:subClassOf rdf:resource="http://.../#Resource"/>
</rdfs:Class>

This is the definition of <property>, taken from the RDF schema.
The rdf:ID attribute names this nugget.
<property> has several properties
• <label> and <comment> are self-explanatory.
• <subClassOf> means <property> is a subclass of <resource>.
• <isDefinedBy> points to the human-readable documentation.

Property Relationships and Simple Reasoning


subClassOf:
• Carole is a member of the class <Professor>
• <Professor> is a subclass of <UniversityEmployee>
• So Carole works for a university.

subPropertyOf:
• Marlon hasSibling Susan
• hasSibling is a subproperty of hasRelative
• So Marlon and Susan are related.

Domain and Range:
• hasSibling applies to animal subjects and animal objects, so Marlon is a member of the class <Animal>.
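These three kinds of reasoning can be sketched as a few rules applied repeatedly to a set of triples until nothing new appears. The code below is a simplified illustration, not the full RDFS entailment rule set; the short property names (`type`, `subClassOf`, and so on) are stand-ins for the real RDF/RDFS URIs.

```python
TYPE, SUBCLASS, SUBPROP, DOMAIN, RANGE = (
    "type", "subClassOf", "subPropertyOf", "domain", "range",
)

# The facts stated on the slide, written as (subject, property, object).
facts = {
    ("Carole", TYPE, "Professor"),
    ("Professor", SUBCLASS, "UniversityEmployee"),
    ("Marlon", "hasSibling", "Susan"),
    ("hasSibling", SUBPROP, "hasRelative"),
    ("hasSibling", DOMAIN, "Animal"),
    ("hasSibling", RANGE, "Animal"),
}


def rdfs_closure(triples):
    """Apply four simplified RDFS rules until no new triple is produced."""
    known = set(triples)
    while True:
        new = set()
        for s, p, o in known:
            for s2, p2, o2 in known:
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))   # member of subclass is member of superclass
                if p2 == SUBPROP and p == s2:
                    new.add((s, o2, o))      # statement also holds for the super-property
                if p2 == DOMAIN and p == s2:
                    new.add((s, TYPE, o2))   # subject belongs to the domain class
                if p2 == RANGE and p == s2:
                    new.add((o, TYPE, o2))   # object belongs to the range class
        if new <= known:
            return known
        known |= new


inferred = rdfs_closure(facts)
for triple in sorted(inferred - facts):
    print(triple)
```

Note that domain and range here work as inference rules, not as validation: nothing is rejected, Marlon and Susan are simply concluded to be Animals.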

Web Ontology Language (OWL)
Eeyore: W-O-L. That spells owl. Owl: Bless my soul! So it does!

(Many Slides Courtesy of Sean Bechhofer)

What’s an Ontology?


English definitions tend to be vague to non-specialists
• "A formal, explicit specification of a shared conceptualization"



Clearer definition: an ontology is a taxonomy combined with inference rules
• T. Berners-Lee, J. Hendler, O. Lassila



But really, if you sit down to describe a subject in terms of its classes and their relationships, you are creating an Ontology.

RDFS Limitations


RDFS too weak to describe resources in sufficient detail
• No localised range and domain constraints
  - Can't say that the range of hasChild is person when applied to persons and elephant when applied to elephants
• No existence/cardinality constraints
  - Can't say that all instances of person have a mother that is also a person, or that persons have exactly 2 parents
• No transitive, inverse or symmetrical properties
  - Can't say that isPartOf is a transitive property, that hasPart is the inverse of isPartOf or that touches is symmetrical

Difficult to provide reasoning support
• No "native" reasoners for non-standard semantics
• May be possible to reason via FO axiomatisation

OWL Semantic Layering


Three language "layers":
• OWL Lite
  - A subset of OWL useful for expressing classifications and simple relationships
• OWL DL (Description Logic)
  - Contains all OWL constructions but with limitations that guarantee computational completeness and decidability.
• OWL Full
  - All OWL constructs with no restrictions but no guaranteed processibility.

[Figure: nested layers labeled Full, DL, Lite]

Syntactic Layering
• All legal Lite ontologies are legal DL ontologies.
• All legal DL ontologies are legal Full ontologies

Semantic Layering
• Layers should agree on semantics.

OWL Lite Synopsis


Built on RDFS, with usual RDFS classes (see previous table in these slides).

• Includes a special class, <Thing>, that is the superclass of all OWL classes. • Built in class <Nothing> that is the most specific class (has no instances or subclasses). • Built-in class <Individual> for instances of classes.







Expresses concepts such as equivalent classes, synonymous properties. Allows you to assert that properties can be inverse, transitive, and symmetric.

In OWL, properties may apply to either individuals or to all members of a class. So <worksForIU> applies to Marlon but not Dave.



Some OWL DL and OWL Full Extensions

Class Axioms:
• oneOf: a class can be defined by its members (ex: daysOfWeek defined by members)
  - An Enumeration class
• disjointWith

More Boolean Relationships:
• unionOf, complementOf, intersectionOf

Unrestricted cardinality
• Ex: daysOfWeek has cardinality of 7

Differences Between DL and Full


Both DL and Full use the same OWL vocabulary
• See previous slide.

Difference #1: DL classes and properties cannot also be individuals (instances), and vice versa.
• That is, there is a strict separation between type and subClassOf.
• So if you make <Merlot> an instance (<rdf:type>) of <Wine>, you can't subclass <Merlot> to add additional properties in OWL DL.
• "subClass versus instance" decisions should be made based on the intended use of the ontology.
  - Don't make Merlot an instance if you are developing an ontology to describe your wine collection, which consists of many bottles of Merlot (instances), and you want to use OWL DL

Difference #2: All DL properties are required to be either
• owl:ObjectProperty: used to connect instances of two classes.
• owl:DatatypeProperty: used to connect class instances with XML schema types and RDF literal strings.
• (OWL Full allows us to tag DatatypeProperties as owl:InverseFunctionalProperty, so we can create a string literal instance that uniquely identifies a class instance.)

An OWL Example
An Earth Systems Grid example (Courtesy of Line Pouchard)

An Example Ontology: Climate Data




The example shows how to construct a really simple ontology and instance.
We don't use it to encode all data but rather to encode metadata about data files.
• Where is the data file (URI) that has the temperature associated with this dataset?

Two classes:
• dataset
• parameter

One property:
• hasParameter

Several parameters: cloud_medium, bounds_latitude, temperature
Line Pouchard (ORNL) created this for ESG using Protégé and OilEd.

Let’s Begin


Front matter: OWL ontologies begin with the <Ontology> header.
• A useful place to put metadata about the document.
• Line uses the Dublin Core to establish authorship.

Next, define two classes: dataset and parameter.
• Class definitions are almost trivial.
• We really state what something is by its properties.
  - Deep philosophical arguments here, I'm sure.

Most of the work will go into defining the property, hasParameter.
• Begins on bottom of next slide
• But the full extent of the definition requires a separate slide.

Ontology header With Dublin Core Parameters. Class Definitions

hasParameter Definition

Defining hasParameter

hasParameter domain: it applies to the dataset class.
hasParameter range: it applies to a list of 3 OWL Things
• Cloud_medium, bounds_latitude, and temperature.
• This is done using the awkward RDF list structure.
  - "Give me the first of the rest recursively until I get to nil"

These three OWL Things are then defined.
• They are each of type "parameter"
  - That is, members of the parameter class.
• Each may also be further defined by additional properties and classes.
  - Temperature has units, for example, bounds_latitude needs starting and stopping values in decimal degrees, etc.
• Or it may be out of scope. I may just need to know that the bounds_latitude for a particular dataset is located in some resource with a specific URI.
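The "first of the rest until nil" structure is easier to see in code. The sketch below models an RDF list as linked cells, each with an `rdf:first` value and an `rdf:rest` pointer, and walks it to recover the three parameters. The cell names `_:c1` to `_:c3` are hypothetical labels for what would be anonymous nodes in the real document.

```python
FIRST, REST, NIL = "rdf:first", "rdf:rest", "rdf:nil"

# Each list cell has exactly one rdf:first arc and one rdf:rest arc.
# The cell names are made-up labels for anonymous (blank) nodes.
cells = {
    "_:c1": {FIRST: "cloud_medium", REST: "_:c2"},
    "_:c2": {FIRST: "bounds_latitude", REST: "_:c3"},
    "_:c3": {FIRST: "temperature", REST: NIL},
}


def list_members(head, graph):
    """Walk first/rest links from head until rdf:nil is reached."""
    members = []
    node = head
    while node != NIL:
        members.append(graph[node][FIRST])
        node = graph[node][REST]
    return members


print(list_members("_:c1", cells))
```

Three members cost six triples plus the terminating `rdf:nil`, which is why the encoding looks awkward when written out by hand.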

Parameter: Cloud_medium

Parameter: Bounds_latitude

Parameter: temperature

Finally, Apply It to Something


What is the file PCM.B06.10.dataset1?
• It's a member of the dataset class, which we have defined.

What properties does it have?
• bounds_latitude and cloud_medium, as all such members do.

Where can I get the bounds_latitude for this data set?
• It's in the file indicated by the rdf:resource.

OWL Enriched RDF Metadata about PCM.B06.10.dataset1

Is It Lite, DL, or Full?


Our ontology example is (at least) DL because we include the oneOf property.

OWL Equivalence and Inheritance
<owl:Class rdf:ID="user">
  <owl:equivalentClass rdf:resource="person"/>
</owl:Class>

<owl:Class rdf:about="#magneticSpectrometer">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasMagnets"/>
      <owl:allValuesFrom rdf:resource="#Spectrometer"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>

Other logical relationships that can be asserted:
• inverseOf
• TransitiveProperty
• SymmetricProperty
• FunctionalProperty
• InverseFunctionalProperty

Illustration of Inverse Properties

Querying Semantic Data
The Data Access Working Group (DAWG)

What Is Semantic Querying?




Don't confuse querying with inference.
Querying just means retrieving data from Semantic data models.

For RDF-like structures, this amounts to querying triples
• Post a query to the world of distributed RDF data nuggets.

Examples
• Finding an email address from a person's vCard.
• Searching across subgraphs: get me the email of the author of this document (Dublin Core + vCard).
• Persistent/scheduled queries on updates to several multimedia databases.

The DAWG Working Group


Unfortunately, there are no standards for querying RDF, etc.
• There are solutions, like RDQL/SquishQL
• These are just not "official"

The W3C Data Access Working Group (DAWG) is filling the query gap.

This is a work in progress:
• Formed Feb 2004.
• Use Cases and Requirements: http://www.w3.org/TR/rdf-dawg-uc/
• BRQL Query Language: http://www.w3.org/2001/sw/DataAccess/rq23/





A Simple Query


Consider the following RDF triple
• <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> "BRQL Tutorial"
• Recall this is equivalent to the sentence "book1 [has] title 'BRQL Tutorial'"
• We may have a large set of such triples in our data store.

We want to make a query on this data like this: "What is the title of book1?"

The Query and the Results


We can construct queries on any of the parts of the triple, such as

SELECT ?title
WHERE {
  <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> ?title .
}

This just means "what is the title of book1?"

Result: ?title = "BRQL Tutorial"
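What the query engine does with such a pattern can be sketched in a few lines: compare the pattern with every stored triple, treat terms beginning with `?` as variables, and collect the bindings. This is a toy matcher for a single triple pattern, not an implementation of BRQL or RDQL; the second row in the store is made up so that the filter has something to reject.

```python
TITLE = "http://purl.org/dc/elements/1.1/title"

store = [
    ("http://example.org/book/book1", TITLE, "BRQL Tutorial"),
    ("http://example.org/book/book2", TITLE, "Another Title"),  # made-up row
]


def match(pattern, triples):
    """Return one {variable: value} binding per triple that fits the pattern.

    Pattern terms starting with '?' are variables; all others must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable would need two different values
            elif term != value:
                break      # constant term does not match this triple
        else:
            results.append(binding)
    return results


answer = match(("http://example.org/book/book1", TITLE, "?title"), store)
print(answer)
```

Moving the variable to the subject position, as in `("?book", TITLE, "BRQL Tutorial")`, asks the reverse question with the same function.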

So What?




This was a trivial example in which we posed a query on the triple's object, which was a string.
• Or we may construct queries against subjects or verbs of triples.

But the object of the triple may be a URI (an RDF resource), not just a literal.
For complicated graphs, this means that the query returns a "pointer" to another section of the graph.
This means that we can make linked queries that allow us to navigate graphs.





Linked Queries Across Graph Sections
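[Figure: the Dublin Core + vCard graph again]
(http://.../CMCS/Entry/1) --dc:title--> [H2O]
(http://.../CMCS/Entry/1) --dc:creator--> (http://.../People/DrY)
(http://.../People/DrY) --vcard:EMAIL--> [dry@stateu.edu]
(http://.../People/DrY) --vcard:N--> (name node)
(name node) --vcard:Given--> [given name]
(name node) --vcard:Family--> [family name]

What is the given name of the creator of Entry 1?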
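Answering this question means chaining three lookups: Entry 1 to its creator, the creator to its name node, and the name node to its given name. The sketch below follows such a chain over a small in-memory graph. The shortened node names (`entry:1`, `people:DrY`, `_:name`) and the two name values are placeholders, since the slide does not show them.

```python
# Abbreviated node names; the Given/Family values are placeholders.
graph = {
    ("entry:1", "dc:title", "H2O"),
    ("entry:1", "dc:creator", "people:DrY"),
    ("people:DrY", "vcard:EMAIL", "dry@stateu.edu"),
    ("people:DrY", "vcard:N", "_:name"),
    ("_:name", "vcard:Given", "GIVEN-NAME"),
    ("_:name", "vcard:Family", "FAMILY-NAME"),
}


def objects(subject, predicate, triples):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def follow(start, path, triples):
    """Follow a chain of predicates from start; return every node reached."""
    frontier = [start]
    for predicate in path:
        frontier = [o for node in frontier for o in objects(node, predicate, triples)]
    return frontier


given = follow("entry:1", ["dc:creator", "vcard:N", "vcard:Given"], graph)
email = follow("entry:1", ["dc:creator", "vcard:EMAIL"], graph)
print(given, email)
```

The first hop uses a Dublin Core property and the later hops use vCard properties, so the query crosses two vocabularies without either one knowing about the other.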

What If You Can’t Wait?






BRQL is still a work in progress.
If you need something now, there is Jena's RDQL.
RDQL allows you to pose triple queries similar to BRQL's.
• Jena has a programming interface that allows you to construct and execute these queries against RDF.

Tools for Playing with Things


Jena Toolkit: Java packages from HPLabs for building Semantic Web applications.
• http://www.hpl.hp.com/semweb/ • Both IsaViz and Protégé use this.



IsaViz: A nice authoring/graphing tool
• http://www.w3.org/2001/11/IsaViz/



Protégé: Another ontology authoring tool
• http://protege.stanford.edu/



SiRPAC
• Allows you to parse RDF, convert RDF/XML into graphs and triplets. • http://www.w3.org/RDF/Validator/

Other Tutorials


Original Semantic Grid GGF tutorial material is here:
• http://www.semanticgrid.org/presentations/ontologies-tutorial/

Beginner and Advanced OWL tutorials are here:
• http://www.co-ode.org/resources/
• Lectures cover working examples (pizza ontology) built with Protégé.
• http://www.semanticgrid.org/presentations/ontologies-tutorial/

Advanced OWL Tutorial
Courtesy of Sean Bechhofer

OWL Syntaxes


Abstract Syntax
• Used in the definition of the language and the DL/Lite semantics



OWL as RDF triples (and thus as, e.g. RDF/XML or N3)
• the "official" concrete syntax
• mapping rules describe how to translate from abstract syntax to triples.



XML Presentation Syntax
• XML Schema definition

OWL Ontologies


An OWL ontology consists of a number of Classes, Properties and Individuals
• All identified via URIs.



Classes
• Have "definitions" providing their characteristics



Properties
• Characteristics such as transitivity or functionality • Domains and Ranges



Individuals
• Class membership • Relationships to other individuals • Concrete values.

XML Datatypes in OWL

OWL supports XML Schema primitive datatypes
Clean separation between "object" classes and datatypes

Philosophical reasons:
• Datatypes structured by built-in predicates
• Not appropriate to form new datatypes using ontology language



Practical reasons:
• Ontology language remains simple and compact • Implementability not compromised – can use hybrid reasoner

OWL Class constructors




OWL has a number of operators for constructing class expressions. Boolean operators
• and, or, not



Restrictions
• slot fillers with explicit quantification



Enumerated Classes.
• explicit enumerations of the class members

OWL Class Constructors
Constructor            Example
Classes                Human
intersectionOf (and)   intersectionOf(Human Male)
unionOf (or)           unionOf(Doctor Lawyer)
complementOf (not)     complementOf(Male)
oneOf                  oneOf(john mary)
someValuesFrom         restriction(hasChild someValuesFrom Lawyer)
allValuesFrom          restriction(hasChild allValuesFrom Doctor)
minCardinality         restriction(hasChild minCardinality (2))
maxCardinality         restriction(hasChild maxCardinality (2))

OWL Class constructors


The operators have an associated semantics
• Given in terms of a domain: $\Delta$
• and an interpretation function $I$
  - $I : \text{concepts} \to \wp(\Delta)$
  - $I : \text{properties} \to \wp(\Delta \times \Delta)$
  - $I : \text{individuals} \to \Delta$
• $I$ is then extended to concept expressions.

OWL Constructor Semantics
Constructor      Example                       Semantics
Classes          Human                         $I(\text{Human})$
intersectionOf   intersectionOf(Human Male)    $I(\text{Human}) \cap I(\text{Male})$
unionOf          unionOf(Doctor Lawyer)        $I(\text{Doctor}) \cup I(\text{Lawyer})$
complementOf     complementOf(Male)            $\Delta \setminus I(\text{Male})$
oneOf            oneOf(john mary)              $\{I(\text{john}), I(\text{mary})\}$

OWL Constructor Semantics
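Constructor      Example                                        Semantics
someValuesFrom   restriction(hasChild someValuesFrom Lawyer)    $\{x \mid \exists y.\ \langle x,y\rangle \in I(\text{hasChild}) \wedge y \in I(\text{Lawyer})\}$
allValuesFrom    restriction(hasChild allValuesFrom Doctor)     $\{x \mid \forall y.\ \langle x,y\rangle \in I(\text{hasChild}) \Rightarrow y \in I(\text{Doctor})\}$
minCardinality   restriction(hasChild minCardinality (2))       $\{x \mid \#\{\langle x,y\rangle \in I(\text{hasChild})\} \ge 2\}$
maxCardinality   restriction(hasChild maxCardinality (2))       $\{x \mid \#\{\langle x,y\rangle \in I(\text{hasChild})\} \le 2\}$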
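Because the semantics are plain set operations, they can be tried out directly on a small finite interpretation. The sketch below uses Python sets for the domain and for the extension of each class and property, and writes one function per constructor. The individuals and their relationships are invented for illustration; only the constructor names come from the tables.

```python
# A tiny, made-up interpretation: a domain plus extensions for classes and a property.
DELTA = {"john", "mary", "ann", "bob", "eve"}
I = {
    "Human": {"john", "mary", "ann", "bob", "eve"},
    "Male": {"john", "bob"},
    "Doctor": {"ann"},
    "Lawyer": {"bob"},
    "hasChild": {("john", "ann"), ("john", "bob"), ("mary", "ann")},
}


def intersection_of(c, d):
    return I[c] & I[d]


def union_of(c, d):
    return I[c] | I[d]


def complement_of(c):
    return DELTA - I[c]


def fillers(prop, x):
    """All y such that <x, y> is in the extension of prop."""
    return {y for (a, y) in I[prop] if a == x}


def some_values_from(prop, c):
    return {x for x in DELTA if fillers(prop, x) & I[c]}


def all_values_from(prop, c):
    # Vacuously true for individuals with no fillers at all.
    return {x for x in DELTA if fillers(prop, x) <= I[c]}


def min_cardinality(prop, n):
    return {x for x in DELTA if len(fillers(prop, x)) >= n}


def max_cardinality(prop, n):
    return {x for x in DELTA if len(fillers(prop, x)) <= n}


print(sorted(some_values_from("hasChild", "Lawyer")))
print(sorted(all_values_from("hasChild", "Doctor")))
```

The allValuesFrom result includes every individual with no children at all, which follows from the implication in its definition and often surprises newcomers.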

OWL Axioms


Axioms allow us to add further statements about arbitrary concept expressions and properties
• Disjointness, equivalence, transitivity of properties etc.



An interpretation is then a model of the axioms iff it satisfies every axiom in the ontology.
Axiom               Example                                            Semantics
EquivalentClasses   EquivalentClasses(Man intersectionOf(Human Male))   $I(\text{Man}) = I(\text{Human}) \cap I(\text{Male})$
DisjointClasses     DisjointClasses(Animal Plant)                       $I(\text{Animal}) \cap I(\text{Plant}) = \emptyset$
SameIndividualAs    SameIndividualAs(GeorgeWBush PresidentBush)         $I(\text{GeorgeWBush}) = I(\text{PresidentBush})$

Basic Inference Tasks


Inference can now be defined w.r.t. interpretations/models.

• C subsumes D w.r.t. K iff for every model I of K, $I(D) \subseteq I(C)$
• C is equivalent to D w.r.t. K iff for every model I of K, $I(C) = I(D)$
• C is satisfiable w.r.t. K iff there exists some model I of K s.t. $I(C) \neq \emptyset$

Querying knowledge
• x is an instance of C w.r.t. K iff for every model I of K, $I(x) \in I(C)$
• $\langle x,y\rangle$ is an instance of R w.r.t. K iff for every model I of K, $(I(x), I(y)) \in I(R)$

Why Reasoning?


Why do we want it?
• Semantic Web aims at "machine understanding"
• Understanding closely related to reasoning

Given key role of ontologies in the Semantic Web, it will be essential to provide tools and services to help users:
• Design and maintain high quality ontologies, e.g.:
  - Meaningful: all named classes can have instances
  - Correct: captured intuitions of domain experts
  - Minimally redundant: no unintended synonyms
  - Richly axiomatised: (sufficiently) detailed descriptions
• Answer queries over ontology classes and instances, e.g.:
  - Find more general/specific classes
  - Retrieve annotations/pages matching a given description
• Integrate and align multiple ontologies

Why Decidable Reasoning?

OWL DL constructors/axioms restricted so reasoning is decidable

Consistent with Semantic Web's layered architecture
• XML provides syntax transport layer
• RDF(S) provides basic relational language and simple ontological primitives
• OWL DL provides powerful but still decidable ontology language
• Further layers may (will) extend OWL
  - Will almost certainly be undecidable

Facilitates provision of reasoning services
• Known "practical" algorithms
• Several implemented systems
• Evidence of empirical tractability

Understanding dependent on reliable & consistent reasoning

Other Links

XML Primer
General characteristics of XML

Basic XML

XML consists of human readable tags
Schemas define rules for a particular dialect.
XML Schema is the root, defines the rules for making other XML schemas.
Tree structure: tags must be closed in reverse order that they are opened.
Tags can be modified by attributes
• name, minOccurs
Tags enclose either strings or structured XML

<complexType name="FaultType">
  <sequence>
    <element name="FaultName" type="xsd:string" />
    <element name="MapView"/>
    <element name="CartView"/>
    <element name="MaterialProps" minOccurs="0" />
    <choice>
      <element name="Slip" />
      <element name="Rate" />
    </choice>
  </sequence>
</complexType>

Namespaces and URIs

XML documents can be composed of several different schemas.
Namespaces are used to identify the source schema for a particular tag.
• Resolves name conflicts: "full path"
Values of namespaces are URIs.
• URIs are just structured names.
  - May point to something not electronically retrievable
• URLs are special cases.

<xsd:schema
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:gem="http://commgrids.indiana.edu/GCWS/Schema/GEMCodes/Faults">
  <xsd:annotation> … </xsd:annotation>
  <gem:fault> … </gem:fault>
</xsd:schema>
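The "full path" idea shows up directly when a namespace-aware parser reads such a document. In the sketch below, Python's standard `xml.etree.ElementTree` expands each prefixed tag to `{namespace-URI}localname`, so two tags that are both called `fault` stay distinct. The `urn:example:other` namespace and the empty element bodies are additions made for the illustration.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
GEM = "http://commgrids.indiana.edu/GCWS/Schema/GEMCodes/Faults"

# The element bodies are left empty; "urn:example:other" is a made-up
# second namespace that also defines a tag called "fault".
DOC = f"""
<xsd:schema xmlns:xsd="{XSD}" xmlns:gem="{GEM}">
  <xsd:annotation/>
  <gem:fault/>
  <other:fault xmlns:other="urn:example:other"/>
</xsd:schema>
"""

root = ET.fromstring(DOC)

# Prefixes are gone after parsing: each tag carries its namespace URI.
qualified = [child.tag for child in root]
for tag in qualified:
    print(tag)

# Lookups name the namespace explicitly, so only the GEM fault is found.
gem_faults = root.findall("gem:fault", {"gem": GEM})
print(len(gem_faults))
```

The prefix itself (`gem`, `other`) is only a local abbreviation; what identifies the tag is the URI it is bound to.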

Metadata and the Dublin Core
Define metadata and describe its use in physical and computer science.

What is Metadata?

Common definition: data about data

"Traditional" Examples
• Prescriptions of database structure and contents.
• File names and permissions in a file system.
• HDF5 metadata: describes scientific/numerical data set characteristics such as array sizes, data formats, etc.

Metadata may be queried to learn the characteristics of the data it describes.

Traditional metadata systems are functionally tightly coupled to the data they describe.
• Prescriptive, needed to interact directly with data.

Descriptive Metadata and the Web


Traditional metadata concepts must be extended as systems become more distributed, information becomes broader
• Tight functional integration not as important • Metadata used for information, becomes descriptive. • Metadata may need to describe resources, not just data.



Everything is a resource
• People, computers, software, conference presentations, conferences, activities, projects.



We'll next look at several examples that use metadata, featuring
• Dublin Core: digital libraries • CMCS: chemistry

The Dublin Core: Metadata for Digital Libraries


The Dublin Core is a set of simple name/value properties that can describe online resources.
• Usually Web content but generally usable (CMCS)
• Intended to help classify and search online resources.

DC elements may be either embedded in the data or in a separate repository.
Initial set defined by 1995 Dublin, Ohio meeting.

Thought Experiment: Construct Your Own Metadata Set




Describe yourself: your occupation, your interests, your place of residence, your parents, spouse, children, ...

Take each sentence:
• The verbs become properties
• The verbs' objects are property values.

Metadata is just a collection of these name/value pairs.
For particular fields (like publishing), we can define a conventional set of property names.

The Dublin Core: Metadata for Digital Libraries


The Dublin Core is a set of simple name/value properties that can describe online resources.
• Usually Web content but generally usable (CMCS)
• Intended to help classify and search online library resources.
• Digital library card catalog.

DC elements may be either embedded in the data or in a separate repository.
Initial set defined by 1995 Dublin, Ohio meeting.

Dublin Core Elements


Content elements:
• Subject, title, description, type, relation, source, coverage.



Intellectual property elements:
• Contributor, creator, publisher, rights



Instantiation elements:
• Date, format, identifier, language



In RDF, these are called properties.

Encoding the Dublin Core




DC elements are independent of the encoding syntax.
Rules exist to map the DC into
• HTML
• RDF/XML



We provide more detailed info on RDF/XML encoding in this seminar.

Sample RDF/HTML
<head>
  <title>Expressing Dublin Core in HTML/XHTML meta and link elements</title>
  <meta name="DC.title" content="Expressing Dublin Core in HTML/XHTML meta and link elements" />
  <meta name="DC.creator" content="Andy Powell, UKOLN, University of Bath" />
  <meta name="DC.type" content="Text" />
</head>

Where Do I Put the Dublin Core Metadata?


Dublin core elements may be placed directly in HTML pages.
• Still need DC-aware crawlers or applications to find and use them.



Or you may have a large database of DC entries that are used by DC-aware applications.
• We'll examine a WebDAV-based scheme for chemistry in a second.

Dublin Core Element Refinements






Many of these, and extensible
See http://dublincore.org/documents/dcmi-terms/ for the comprehensive list of elements and refinements
Examples:
• isVersionOf, hasVersion, isReplacedBy, references, isReferencedBy.

OWL DL


Use of OWL vocabulary restricted
• Can't be used to do "nasty things" (i.e., modify OWL)
• No classes as instances

Standard DL/FOL model theory (definitive)
• Direct correspondence with (first order) logic
• Reasoning via DL engines
  - Some problems with oneOf/inverse
• Reasoning for full language via FOL engines
  - Would need built in datatypes for performance


OWL Full


No restriction on use of OWL vocabulary (as long as legal RDF)
• Classes as instances
• Assertions about vocabulary

RDF style model theory
• Reasoning using FOL engines via axiomatisation
• Semantics should correspond with OWL DL for suitably restricted KBs

XML for Knowledge Representation
1. Definition of self-describing data in worldwide standardized, non-proprietary format.
2. Structured data and knowledge exchange for enterprises in various industries.
3. Integration of information from different sources into uniform documents.

Exchange of knowledge bases between different AI languages, knowledge bases and databases, application systems, etc.

But…

History: From RDF to OWL


Two languages developed by extending (part of) RDF
• OIL: developed by group of (largely) European researchers
• DAML-ONT: developed by group of (largely) US researchers (in DARPA DAML programme)

Efforts merged to produce DAML+OIL
• Development was carried out by "Joint EU/US Committee on Agent Markup Languages"
• Extends ("subset" of) RDF

DAML+OIL submitted to W3C as basis for standardisation
• Web-Ontology (WebOnt) Working Group formed
• WebOnt group developed OWL language based on DAML+OIL
• OWL language now a W3C Recommendation (Feb 2004)

RDFS Takeaway


RDFS defines a set of classes and properties that can be used to define new RDF-like languages.
• RDFS actually bootstraps itself.

 

You can express inheritance and restriction.

If you want to learn more, see the specification
• http://www.w3.org/TR/2003/WD-rdf-schema-20030123/

But don't trust the write-up:
• Concepts are best understood by looking at the RDF/XML. English descriptions get convoluted.



If you want to see RDFS in action, see the DC:
• http://dublincore.org/2003/03/24/dces#
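To make the inheritance point concrete, here is a small sketch of what rdfs:subClassOf implies: the relation is transitive, and a resource typed with a class is also an instance of every superclass. It does not use any RDF library, and the class names (ex:ThermochemicalData, ex:ChemistryData, ex:Dataset) and the RdfsInheritance helper are made up for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of rdfs:subClassOf inheritance; not tied to any RDF library.
final class RdfsInheritance {

    // class -> directly declared superclasses (the rdfs:subClassOf statements)
    private final Map<String, Set<String>> directSupers = new HashMap<String, Set<String>>();

    void subClassOf(String sub, String sup) {
        Set<String> supers = directSupers.get(sub);
        if (supers == null) {
            supers = new LinkedHashSet<String>();
            directSupers.put(sub, supers);
        }
        supers.add(sup);
    }

    // All superclasses reachable through rdfs:subClassOf, which is transitive.
    Set<String> superClasses(String cls) {
        Set<String> result = new LinkedHashSet<String>();
        Deque<String> todo = new ArrayDeque<String>();
        todo.add(cls);
        while (!todo.isEmpty()) {
            Set<String> supers = directSupers.get(todo.remove());
            if (supers == null) {
                continue;
            }
            for (String s : supers) {
                if (result.add(s)) {   // visit each class once, so cycles terminate
                    todo.add(s);
                }
            }
        }
        return result;
    }

    // A resource typed as declaredType is also an instance of every superclass.
    Set<String> inferredTypes(String declaredType) {
        Set<String> types = new LinkedHashSet<String>();
        types.add(declaredType);
        types.addAll(superClasses(declaredType));
        return types;
    }

    // Invented example schema.
    static RdfsInheritance sample() {
        RdfsInheritance schema = new RdfsInheritance();
        schema.subClassOf("ex:ThermochemicalData", "ex:ChemistryData");
        schema.subClassOf("ex:ChemistryData", "ex:Dataset");
        return schema;
    }

    public static void main(String[] args) {
        System.out.println(sample().inferredTypes("ex:ThermochemicalData"));
    }
}
```

The same closure applies to rdfs:subPropertyOf, which is how a refined property can be treated as an instance of the more general one.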

Web Ontology Language Requirements
Desirable features identified for Web Ontology Language:


Extends existing Web standards
• Such as XML, RDF, RDFS



Easy to understand and use
• Should be based on familiar KR idioms

  

Of "adequate" expressive power

Formally specified

Possible to provide automated reasoning support

Short History of Description Logics
Phase 1:
• Incomplete systems (Back, Classic, Loom, . . . )
• Based on structural algorithms

Phase 2:
• Development of tableau algorithms and complexity results
• Tableau-based systems for Pspace logics (e.g., Kris, Crack)
• Investigation of optimisation techniques

Phase 3:
• Tableau algorithms for very expressive DLs
• Highly optimised tableau systems for ExpTime logics (e.g., FaCT, DLP, Racer)
• Relationship to modal logic and decidable fragments of FOL

Latest Developments
Phase 4:
• Mature implementations
• Mainstream applications and tools
  Databases:
  • Consistency of conceptual schemata (EER, UML etc.)
  • Schema integration
  • Query subsumption (w.r.t. a conceptual schema)
  Ontologies and Semantic Web (and Grid):
  • Ontology engineering (design, maintenance, integration)
  • Reasoning with ontology-based markup (meta-data)
  • Service description and discovery
• Commercial implementations
  Cerebra system from Network Inference Ltd

What Does This Have to Do with Grid Computing?


RDF resources aren't just web pages
• Can be computer codes, simulation and experimental data, hardware, research groups, algorithms, …

Consider the CMCS chemistry example: they needed to describe the provenance, annotation, and curation of chemistry data.
• Compound X's properties were calculated by Dr. Y.

CMCS maps all of their metadata to the Dublin Core.

The Dublin Core is encoded quite nicely as RDF.

vCard: Representing People with RDF Properties


The Dublin Core tags are best used to represent metadata about "published content"
• Documents, published data

vCards are an IETF standard for representing people
• Typical properties include name, email, organization membership, mailing address, title, etc.
• See http://www.ietf.org/rfc/rfc2426.txt

Like the DC, vCards are independent of (and predate) RDF but map naturally into RDF.
• Each vCard property maps naturally to an RDF property
• See http://www.w3.org/TR/2001/NOTE-vcard-rdf-20010222/

Example: A vCard in RDF/XML
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:vcard='http://www.w3.org/2001/vcard-rdf/3.0#'>
  <rdf:Description rdf:about='http://cgl.indiana.edu/people/GCF'
                   vcard:EMAIL='gcf@indiana.edu'>
    <vcard:FN>Geoffrey Fox</vcard:FN>
    <vcard:N vcard:Given='Geoffrey' vcard:Family='Fox'/>
  </rdf:Description>
</rdf:RDF>
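It can help to see which statements this markup actually makes. The sketch below reads the vCard with the standard Java XML parser and lists the resulting triples. It is deliberately incomplete: it handles only rdf:Description, property attributes, and literal property elements, which is all the example uses, and is nowhere near a full RDF/XML parser. The class name, the helper methods, and the blank node labels (_:b0) are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical, deliberately incomplete RDF/XML reader for the vCard example.
final class VCardTriples {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String XMLNS = "http://www.w3.org/2000/xmlns/";
    static final String VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#";

    static final String SAMPLE =
        "<rdf:RDF xmlns:rdf='" + RDF + "' xmlns:vcard='" + VCARD + "'>"
        + "<rdf:Description rdf:about='http://cgl.indiana.edu/people/GCF'"
        + " vcard:EMAIL='gcf@indiana.edu'>"
        + "<vcard:FN>Geoffrey Fox</vcard:FN>"
        + "<vcard:N vcard:Given='Geoffrey' vcard:Family='Fox'/>"
        + "</rdf:Description></rdf:RDF>";

    // Each triple is {subject, predicate, object}.
    static List<String[]> parse(String rdfXml) {
        List<String[]> triples = new ArrayList<String[]>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(rdfXml)));
            NodeList nodes = doc.getElementsByTagNameNS(RDF, "Description");
            int blankCount = 0;
            for (int i = 0; i < nodes.getLength(); i++) {
                Element description = (Element) nodes.item(i);
                String subject = description.getAttributeNS(RDF, "about");
                // Attributes such as vcard:EMAIL are properties of the subject.
                addPropertyAttributes(description, subject, triples);
                for (Node n = description.getFirstChild(); n != null; n = n.getNextSibling()) {
                    if (n.getNodeType() != Node.ELEMENT_NODE) {
                        continue;
                    }
                    Element property = (Element) n;
                    String predicate = property.getNamespaceURI() + property.getLocalName();
                    String blank = "_:b" + blankCount;
                    List<String[]> nested = new ArrayList<String[]>();
                    addPropertyAttributes(property, blank, nested);
                    if (nested.isEmpty()) {
                        // Plain literal value, e.g. <vcard:FN>Geoffrey Fox</vcard:FN>.
                        triples.add(new String[] {subject, predicate,
                            property.getTextContent().trim()});
                    } else {
                        // Attributes on a property element describe a blank node.
                        blankCount++;
                        triples.add(new String[] {subject, predicate, blank});
                        triples.addAll(nested);
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse RDF/XML", e);
        }
        return triples;
    }

    // Adds one triple per attribute outside the rdf: and xmlns: namespaces.
    private static void addPropertyAttributes(Element e, String subject, List<String[]> out) {
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            String ns = a.getNamespaceURI();
            if (ns == null || ns.equals(RDF) || ns.equals(XMLNS)) {
                continue;
            }
            out.add(new String[] {subject, ns + a.getLocalName(), a.getValue()});
        }
    }

    // First object found for the given subject and predicate, or null.
    static String objectOf(List<String[]> triples, String subject, String predicate) {
        for (String[] t : triples) {
            if (t[0].equals(subject) && t[1].equals(predicate)) {
                return t[2];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (String[] t : parse(SAMPLE)) {
            System.out.println(t[0] + "  " + t[1] + "  " + t[2]);
        }
    }
}
```

Note that the structured name (vcard:N) becomes an anonymous intermediate node carrying the Given and Family properties, while EMAIL and FN attach directly to the person's URI.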

Linking vCard and Dublin Core Resources
 

The real power of RDF is that you can link two independently specified resources through the use of properties.

We do this using URIs as universal pointers
• Identify specific resources (nouns) and specifications for properties (verbs)
• The URIs may optionally be URLs that can be used to fetch the information.

Linking these resource nuggets allows us to pose queries like
• "What is the email address of the creator of this entry in the chemical database?"
• "What other entries reference, directly or indirectly, this data entry?"

Linkages can be made at any time
• Don't have to be designed into the system
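The two example queries can be sketched over a plain list of triples. This is only an illustration of the idea, not how CMCS or Jena implement it: the LinkedResources class is hypothetical, the example.org entry URIs are invented, and so is the statement that links an entry to the vCard resource as its creator. The property URIs are the Dublin Core creator and references terms and the vCard EMAIL property.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory triple store; a real system would use an RDF toolkit.
final class LinkedResources {
    static final String DC_CREATOR = "http://purl.org/dc/elements/1.1/creator";
    static final String DCT_REFERENCES = "http://purl.org/dc/terms/references";
    static final String VCARD_EMAIL = "http://www.w3.org/2001/vcard-rdf/3.0#EMAIL";

    private final List<String[]> triples = new ArrayList<String[]>();

    void add(String subject, String predicate, String object) {
        triples.add(new String[] {subject, predicate, object});
    }

    // All objects of statements with the given subject and predicate.
    List<String> objects(String subject, String predicate) {
        List<String> out = new ArrayList<String>();
        for (String[] t : triples) {
            if (t[0].equals(subject) && t[1].equals(predicate)) {
                out.add(t[2]);
            }
        }
        return out;
    }

    // "What is the email address of the creator of this entry?"
    // Follows dc:creator to a person URI, then vcard:EMAIL from that person.
    List<String> creatorEmails(String entry) {
        List<String> emails = new ArrayList<String>();
        for (String creator : objects(entry, DC_CREATOR)) {
            emails.addAll(objects(creator, VCARD_EMAIL));
        }
        return emails;
    }

    // "What other entries reference, directly or indirectly, this entry?"
    // Breadth-first walk backwards along the references property.
    Set<String> referencedBy(String target) {
        Set<String> seen = new LinkedHashSet<String>();
        Deque<String> queue = new ArrayDeque<String>();
        queue.add(target);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String[] t : triples) {
                if (t[1].equals(DCT_REFERENCES) && t[2].equals(current) && seen.add(t[0])) {
                    queue.add(t[0]);
                }
            }
        }
        seen.remove(target);   // only relevant if the references form a cycle
        return seen;
    }

    // Invented sample data: entry URIs and the creator link are illustrative only.
    static LinkedResources sample() {
        LinkedResources g = new LinkedResources();
        String person = "http://cgl.indiana.edu/people/GCF";
        g.add("http://example.org/chem/entry1", DC_CREATOR, person);
        g.add(person, VCARD_EMAIL, "gcf@indiana.edu");
        g.add("http://example.org/chem/entry2", DCT_REFERENCES, "http://example.org/chem/entry1");
        g.add("http://example.org/chem/entry3", DCT_REFERENCES, "http://example.org/chem/entry2");
        return g;
    }

    public static void main(String[] args) {
        LinkedResources g = sample();
        System.out.println(g.creatorEmails("http://example.org/chem/entry1"));
        System.out.println(g.referencedBy("http://example.org/chem/entry1"));
    }
}
```

Neither the Dublin Core nor the vCard vocabulary had to anticipate the other: the only thing that joins them is the shared person URI, which is why such links can be added after the fact.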

A Simple Jena RDQL Example
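import java.io.FileReader;
// Model, ModelMem, Query, QueryEngine, QueryExecution and QueryResults come from
// the Jena release in use; import them from that release's packages.

// Load the RDF/XML file into an in-memory model.
Model model = new ModelMem();
model.read(new FileReader("a.rdf"), "");  // second argument: base URI (assumed empty here)

// Select each resource ?x together with the value of its vCard EMAIL property.
String queryString =
    "SELECT ?x, ?fname " +
    "WHERE (?x, <http://www.w3.org/2001/vcard-rdf/3.0#EMAIL>, ?fname)";
Query query = new Query(queryString);
query.setSource(model);
QueryExecution qe = new QueryEngine(query);
QueryResults results = qe.exec();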

Building Semantic Markup Languages


XML essentially defines syntax rules for markup languages.
• "Human readable" means humans provide meaning

We also would like some limited ability to encode meaning directly within markup languages.

The semantic markup languages attempt to do that, with increasing sophistication.

Stack indicates direct dependencies: OWL is defined in terms of RDF, RDFS.

Eric Miller, http://www.w3.org/2002/Talks/www2002-w3ct-swintro-em/

Other Semantic Markup Languages


RDF Schema (RDFS):
• Provides formal definitions of RDF
• Also provides language tools for writing more specialized languages.
• We'll examine in more detail.

DARPA Agent Markup Language (DAML):
• DAML+OIL is the language component of the DAML project.
• Defined using RDF/RDFS.

Web-Ontology Language (OWL):
• Developed by the W3C's Web-Ontology Working Group
• Based on/replaces DAML+OIL

What Are Description Logics?


A family of logic based Knowledge Representation formalisms
• Descendants of semantic networks and KL-ONE
• Describe domain in terms of concepts (classes), roles (relationships) and individuals

Distinguished by:
• Formal semantics (typically model theoretic)
  Decidable fragments of FOL
  Closely related to Propositional Modal & Dynamic Logics
• Provision of inference services
  Sound and complete decision procedures for key problems
  Implemented systems (highly optimised)
 


								