Introduction to Semantic Web
Page 1 8/10/2010
• The Web in its current form is an application on the Internet that delivers
information. Ex: browsing daily news
• Current applications involving the web integrate data and information. Ex: online
• The next generation web is expected to integrate a variety of resources and devices
and support knowledge sharing among machines.
– Exploit the economies of scale made possible by machine processing of knowledge.
• How to tell the machines about the resources and how to specify concepts? How
can machines acquire knowledge? How to share knowledge among machines?
How to enable them to make decisions based on these?
• Need to specify resources, concepts, knowledge and other artifacts used in
human decision making in a form usable by machines.
• Machines can then integrate and analyze information and make decisions and
• In this lecture we will examine technology, tools, frameworks, and applications
enabling the next generation web, the semantic web.
• We will also discuss an intelligent search engine serving municipal services in a
real semantic web application (Chapter 4)
References for today’s discussion
1. W3Schools tutorials
2. Alistair Miles, Taxonomies and the semantic web, CISTRANA workshop,
Rutherford Appleton Lab, Feb 2006
HTML, XML, RDF, and OWL
– HTML stands for Hyper Text Markup Language
– An HTML file is a text file containing small markup tags
– The markup tags tell the Web browser how to display the page
– XML stands for eXtensible Markup Language
– XML is a markup language much like HTML
– XML was designed to carry data, not to display data
– XML tags are not predefined. You must define your own tags
– XML is designed to be self-descriptive
– XML is a W3C Recommendation
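The points about XML carrying data under self-defined tags can be illustrated with a minimal sketch. The tag names (note, to, from, subject) and the helper note_to_dict are assumptions invented for this example; only the standard-library parser is real.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: none of these tags are predefined anywhere.
# We made them up, which is exactly what XML permits.
NOTE_XML = """<note>
  <to>Citizen</to>
  <from>City Hall</from>
  <subject>Large item collection</subject>
</note>"""


def note_to_dict(xml_text):
    """Parse the note and return {tag: text} for each child element."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


note = note_to_dict(NOTE_XML)
print(note["subject"])
```

The document only carries the data. How (or whether) it is displayed is left to whatever program reads it.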
HTML, XML, RDF, and OWL (contd.)
– RDF stands for Resource Description Framework
– RDF is a framework for describing resources on the web
– RDF provides a model for data, and a syntax so that
independent parties can exchange and use it
– RDF is designed to be read and understood by computers
– RDF is not designed for being displayed to people
– RDF is commonly written in XML (RDF/XML); other syntaxes such as Turtle also exist
– RDF is a part of the W3C's Semantic Web Activity
– RDF is a W3C Recommendation
• Let's discuss the details.
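As a rough illustration of the RDF data model, the sketch below reads a small RDF/XML document and lists its statements as (subject, predicate, object) triples. The example.org URIs, the ex: vocabulary, and the function extract_triples are assumptions made up for this example. The code handles only literal-valued properties inside rdf:Description and is not a full RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical description of one municipal service.
RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ex="http://example.org/terms#">
  <rdf:Description rdf:about="http://example.org/services/large-item-collection">
    <ex:title>Special collection of large items</ex:title>
    <ex:provider>City government</ex:provider>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(rdf_xml):
    """Return (subject, predicate, object) for each literal property."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes tags as "{namespace}local"; join the two
            # parts to recover the predicate URI.
            predicate = prop.tag[1:].replace("}", "")
            triples.append((subject, predicate, prop.text))
    return triples


triples = extract_triples(RDF_XML)
for triple in triples:
    print(triple)
```

Each triple is a machine-readable statement about a resource, which is what lets independent parties exchange and combine descriptions.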
– OWL stands for Web Ontology Language
– OWL is built on top of RDF
– OWL is for processing information on the web
– OWL was designed to be interpreted by computers
– OWL was not designed for being read by people
– OWL is commonly written in XML (RDF/XML)
– OWL has three sublanguages: OWL Lite, OWL DL, and OWL Full
– OWL is a web standard
[Figure: diagram relating natural language, programming language, ontology, and web ontology]
A programming language is a language with strict syntax for expressing
algorithms (steps) for execution by a computing device.
A web ontology is for expressing web-related concepts.
Web Ontology Language (OWL) is a technology for accomplishing this.
Protégé-OWL is a tool for building and editing OWL ontologies.
Taxonomy and web ontology
• Taxonomy is the science of classification. F: Taxonomy
• An ontology is a specification of a conceptualization. F: Ontology
• XML allows for meaningful tags. T: XML
• Resource Description Framework is a language, commonly written in XML, for
describing resources on the web (www). T: RDF
• Web Ontology Language (OWL) T:OWL
• RDF is an assertional language intended to be used to express
propositions using precise formal vocabularies, particularly those
specified using RDFS [RDF-VOCABULARY], for access and use
over the World Wide Web, and is intended to provide a basic
foundation for more advanced assertional languages with a
similar purpose. The overall design goals emphasize generality
and precision in expressing propositions about any topic, rather
than conformity to any particular processing model.
RDF and OWL
• OWL is a semantic extension of RDF: it
allows for specification of logical
dependencies between information
structures. (as defined by Miles: ref 2)
• OWL works on structured information
• RDF is for structuring information.
• OWL is an information model.
Intelligent Search Engine for online
access to municipal services (Ch 4):
• Citizens can perform 80% of the city services from home
• When somebody is looking for a service one must be able to locate it
• You can collect, categorize and list all the services (.. Taxonomy)
• However searching through this list may not yield expected results
using traditional search engines.
– Search results are based on the description of the services and
co-occurrence of the words in the query.
– Ex: A citizen who wants to dispose of a washing machine has to search for “special
collection of large items”
• Cannot force citizens to learn government language
• When a service is looked up, a set of related services should be made
• Search engine is a first step in the roadmap to citizen self-service
Zaragoza Municipal services
roadmap (Fig. 4.1)
[Figure 4.1 stages: Positioning, Intelligent search engine, Citizen channels, Citizen self-service]
Application of semantic web
• Three ways that Zaragoza used semantic
1. Statistical approach to interpretation of
citizen requests. (fig. 4.3)
2. Enhanced-keyword based approach to
interpretation of citizen requests. (fig. 4.4)
3. Applying semantic distance to interpreting
citizen requests. (fig. 4.5)
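The slides do not define how semantic distance is computed. One common reading, assumed here, is the number of edges on the shortest path between two concepts in a concept hierarchy. The concepts and the PARENT table below are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical fragment of a concept hierarchy: child -> parent.
PARENT = {
    "washing machine": "large item",
    "sofa": "large item",
    "large item": "waste",
    "battery": "hazardous waste",
    "hazardous waste": "waste",
}


def neighbours(concept):
    """Concepts one edge away: the parent plus any children."""
    linked = {child for child, parent in PARENT.items() if parent == concept}
    if concept in PARENT:
        linked.add(PARENT[concept])
    return linked


def semantic_distance(a, b):
    """Edges on the shortest path between a and b, or None if unconnected."""
    seen = {a}
    queue = deque([(a, 0)])
    while queue:  # breadth-first search over the hierarchy
        concept, dist = queue.popleft()
        if concept == b:
            return dist
        for nxt in neighbours(concept):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


print(semantic_distance("washing machine", "sofa"))
print(semantic_distance("washing machine", "battery"))
```

Under this assumption a smaller distance means the request and the service are more closely related, so candidate services can be ranked by distance.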
Usage of the three methods
• The first approach is the cheapest and consumes the fewest
resources; the semantic web approach is
the most expensive.
• The Zaragoza architecture arranges the three in a
pipeline where each stage is
triggered only when the previous stage did not
produce satisfactory results.
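The fall-through behaviour of the pipeline can be sketched as below. The three stage functions are toy stand-ins written for this example, not the Zaragoza implementation, and the satisfaction test (any result at all) is likewise an assumption.

```python
# Hypothetical service catalogue and stop-word list.
SERVICES = ["special collection of large items", "change of address",
            "building permit"]
STOPWORDS = {"a", "an", "the", "of", "for", "to"}


def statistical_stage(query):
    """Cheapest stage (toy): exact match against frequent past queries."""
    frequent = {"change of address": ["change of address"]}
    return frequent.get(query.lower(), [])


def keyword_stage(query):
    """Middle stage (toy): services sharing a non-stop-word with the query."""
    words = set(query.lower().split()) - STOPWORDS
    return [s for s in SERVICES if words & set(s.split())]


def semantic_stage(query):
    """Most expensive stage (toy): map known concepts to services."""
    concepts = {"washing machine": "special collection of large items"}
    return [svc for term, svc in concepts.items() if term in query.lower()]


STAGES = [("statistical", statistical_stage),
          ("keyword", keyword_stage),
          ("semantic", semantic_stage)]


def non_empty(results):
    """Assumed satisfaction test: any result counts as satisfactory."""
    return bool(results)


def run_pipeline(query, stages, is_satisfactory):
    """Run stages in order; stop at the first satisfactory result."""
    results = []
    for name, stage in stages:
        results = stage(query)
        if is_satisfactory(results):
            return name, results
    return None, results


print(run_pipeline("dispose of a washing machine", STAGES, non_empty))
```

Ordering the stages from cheapest to most expensive means the costly semantic stage runs only for queries the earlier stages cannot answer.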
How does it work?
• Traditional search engines retrieve documents based on
occurrences of keywords, whereas Zaragoza SOA (ZS) has an
understanding of its services, information and data.
• ZS knows persons can change addresses, car owners pay taxes,
construction work requires permits, building bars near schools is
not good etc.
• All this information is stored in an ontology: a computer
understandable description of what e-services are.
• This ontology allows ZS to understand citizens' queries and thus
return meaningful results.
• ZS also uses natural language understanding software to
translate free text queries of citizens into the ontology. (see fig.
interaction (Fig. 4.6 modified)
[Figure labels: natural language, NLP, tagger (KT), semantic distance, result]
Search vs. Intelligent Search
Search:
• Search for keywords
• Results in a ranked list of documents
• Users need to invest time and effort to filter the right piece
of information out of the overall results
Intelligent search:
• Search for keywords, semantic concepts.
• Results in the actual relevant document
• Perceived as a search engine that understands the
ZS Domain Ontology
• Development of an ontology starts with
detailed study of the services offered by
• Objective is to extract all relevant terms
belonging to this domain from existing
• ZS ontology contains four main classes:
agent, process, event, object
ZS Domain Ontology (contd.)
• Agent: entity participating in an action
• Process: A series of actions that a
citizen can do using the online services
offered by the city government.
• Event: any social gathering or activity.
• Object: any entity that exists in the city
which can be used for or by a service
offered by the city government.
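The four main classes can be pictured as a small class hierarchy. This is only a sketch: the attributes on Process and the sample instances are assumptions made for illustration, and the ontology's Object class is named CityObject here to avoid confusion with Python's built-in object.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Common root for the four main classes (an assumption of this sketch)."""
    name: str


@dataclass
class Agent(Entity):
    """Entity participating in an action."""


@dataclass
class Event(Entity):
    """Any social gathering or activity."""


@dataclass
class CityObject(Entity):
    """Entity in the city that can be used for or by a service."""


@dataclass
class Process(Entity):
    """Series of actions a citizen can do using the online services."""
    actions: list = field(default_factory=list)
    agents: list = field(default_factory=list)
    objects: list = field(default_factory=list)


# Hypothetical instances.
citizen = Agent("citizen")
washing_machine = CityObject("washing machine")
disposal = Process(
    "special collection of large items",
    actions=["request pickup", "leave item for collection"],
    agents=[citizen],
    objects=[washing_machine],
)
print(disposal.name, [o.name for o in disposal.objects])
```

Linking a Process to the agents and objects it involves is what lets a query about a washing machine reach the large-item collection service.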
Using the ontology
• Approach is to establish a semantic similarity
between a question provided by a citizen and
the FAQs already available.
• Ontology needs to be complete in order to
contain all the necessary terms to satisfy the
• Ontology is completed with a number of
thesauri to identify synonyms. Ex: baby and
• Context information is used to tackle any
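Matching a citizen's question against existing FAQs with the help of a thesaurus can be sketched as follows. The thesaurus entries, stop words and FAQ texts are hypothetical. Plain word overlap (Jaccard similarity) stands in for the ontology-based similarity, which the slides do not spell out.

```python
# Hypothetical thesaurus: maps each synonym to one canonical term.
THESAURUS = {"automobile": "car", "vehicle": "car", "relocate": "move"}
STOPWORDS = {"a", "an", "the", "do", "i", "how", "my", "to", "for", "of"}

# Hypothetical FAQ list.
FAQS = [
    "How do I pay the car tax?",
    "How do I request a building permit?",
]


def normalise(text):
    """Lower-case, drop stop words, and replace synonyms by canonical terms."""
    words = text.lower().replace("?", "").split()
    return {THESAURUS.get(w, w) for w in words if w not in STOPWORDS}


def similarity(question, faq):
    """Jaccard overlap of the normalised term sets (0.0 to 1.0)."""
    a, b = normalise(question), normalise(faq)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_faq(question):
    """Return the FAQ most similar to the question."""
    return max(FAQS, key=lambda faq: similarity(question, faq))


print(best_faq("pay automobile tax"))
```

Without the thesaurus step, "automobile" and "car" would count as different terms and the match would score lower.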
Natural Language Process for ZS
• Knowledge tagger automatically annotates
text according to domain ontology
• Series of linguistic analyzers, sentence
splitters, simple tokenizers, spell checkers
and morphological databases.
• The outcome of this analysis is an annotated text
equivalent of the query.
• Then the query is synthesized in terms of
domain ontology: RDQL, SPARQL, … SQL
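The tagging and query-synthesis steps can be sketched roughly as below. The term table, the ex: vocabulary, and the functions tag and to_sparql are assumptions invented for this example. The sketch covers only tokenizing and term lookup, with no spell checking or morphological analysis.

```python
import re

# Hypothetical ontology terms mapped to the four main classes.
ONTOLOGY_TERMS = {
    "citizen": "Agent",
    "washing machine": "Object",
    "collection": "Process",
    "festival": "Event",
}


def tokenize(text):
    """Simple tokenizer: lower-case alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def tag(text):
    """Return (term, class) annotations for ontology terms found in text."""
    normalised = " ".join(tokenize(text))
    return [
        (term, cls)
        for term, cls in ONTOLOGY_TERMS.items()
        if re.search(rf"\b{re.escape(term)}\b", normalised)
    ]


def to_sparql(annotations):
    """Build a SPARQL query string from annotations (made-up vocabulary)."""
    lines = ["PREFIX ex: <http://example.org/zs#>",
             "SELECT ?service WHERE {"]
    for term, cls in annotations:
        lines.append(f'  ?service ex:involves{cls} "{term}" .')
    lines.append("}")
    return "\n".join(lines)


annotations = tag("How can I get rid of my washing machine?")
print(annotations)
print(to_sparql(annotations))
```

The free-text question becomes a structured query over ontology terms, which can then be run against the semantically described services.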
Semantic Annotation of city
• Collect and index the information about
• Semantic processing results in
ontological entities: concepts, instances,
attributes, and relations
• The output of this process is semantically
described services that can be checked
against citizens' queries.
Overall Architecture of ZS
[Figure components: search systems, web services, ontology systems, NLP systems,
ontology cache, NLP cache, ontology subsystem, NLP subsystem]
• Zaragoza is a powerful SOA that uses
semantic knowledge to better serve its
• Its roadmap is open with ability to
extend the system through its WS
Networked SOA for Zaragoza
[Figure labels: Services, Service, Customer]