Organization and Structure of Information using
Semantic Web Technologies
Jennifer Golbeck, Amy Alford, James Hendler
Semantic Web and Agents Project
Maryland Information and Network Dynamics Laboratory
University of Maryland, College Park
Today's web has millions of pages that are dynamically generated from content
stored in databases. This not only makes managing a large site easier, but is
necessary for fully functioning ecommerce and other large, interactive websites.
These local databases, in one sense, are not full participants in the web. Though
they present normal looking HTML pages, the databases themselves are not
interconnected in any way. Organization X has basically no way of using or
understanding Organization Y's data. If these two want to share or merge
information, the database integration would be a fairly significant undertaking. It
would also be a one-time solution. If Organization Z entered the picture, a new
merging effort would have to be undertaken.
As the web stands, this has not been a significant problem. By design, the web has
been a vehicle for conveying information in a human readable form – computers
had no need to understand the content. As dynamic sources of information have
become omnipresent on the web, the World Wide Web Consortium has
undertaken efforts to make information machine readable. This technology,
collectively called the Semantic Web, allows computers to understand and
communicate with one another. For site designers, this means data from other
sites can be accessed and presented on your own website, and your own public
data can be made easily accessible to anyone. It follows that just as web pages are
currently hyperlinked, data can also be linked to form a second web behind the
scenes, allowing full across-the-web integration of data.
Dynamically generated pages driven by databases are becoming commonplace for
most large websites, and even for medium and small ones. This trend, combined
with the proliferation of non-text media files as one of the primary forms of
content, poses several problems to the current web architecture. Search engines
have difficulty indexing database driven pages, and cannot directly index the raw
data used in the back end. Media searches, such as image or MP3 searches, are
notoriously bad, because there is no text from which to extract keywords that
could be used to index the media.
The nature of the web, with its interconnected information, does not extend to backend
databases or media either. It is usually not possible for a web designer to use
information from an external database to drive their own site. The databases are
not publicly accessible for queries, nor is the underlying organization of the data exposed.
The Semantic Web is a vision for the future of the World Wide Web that will give
meaning to all of this data, as well as making it publicly accessible to anyone who
is interested. While some web sites and designers will want to keep their backend
data proprietary, many will find it in their interest, for design and public interest,
to use semantic encodings.
This chapter will introduce the semantic web, explain how to organize content for
use on the semantic web, and show several examples of how it can be used.
Throughout the discussion, we will describe how the technologies affect the
human factors in web design and use.
What is the Semantic Web?
The World Wide Web today can be thought of as a collection of distributed,
interlinked documents, encoded using (primarily) HTML. Any person can create
their own HTML document, put it online, and point to other pages on the Web.
Since the content of these pages is written in natural language, computers do not
“know” about what is in the page, just how it looks. The Semantic Web makes it
possible for machine-readable annotations to be added, linked to each other, and
used for organizing and accessing Web content. Thus, the Semantic Web offers
new capabilities, made possible by the addition of documents that encode the
"knowledge" about a web page, photo, or database, in a publicly accessible,
machine readable form. Driving the Semantic Web is the organization of content
into specialized vocabularies, called ontologies, which can be used by Web tools
to provide new capabilities. In this section, we present the basic ideas underlying
ontologies, what they can be used for, and how they are encoded on the Web.
Vocabularies and Ontologies
An ontology is a collection of terms used to describe a particular domain. Some
ontologies are broad, covering a wide range of topics, while others are limited
with very precise specifications about a given area.
The general elements that make up an ontology are the following:
• Classes – general categories of things in the domain of interest
• Properties – attributes that instances of those classes may have
• The relationships that can exist between classes and between properties
Ontologies are usually expressed in logic-based languages. This allows people to
use reasoners to analyze the ontologies and the relationships within them. XML
(eXtensible Markup Language) exists to add metadata to applications, but there is
no way to connect or infer information from these statements. For example, if we
say “Sister” is a type of “Sibling” and then say “Jen is the sister of Tom,” there is
no way to automatically infer that Jen is also the sibling of Tom in XML. By
encoding these relationships in a logic based language, these and other more
interesting inferences can be made. Reasoners are the tools that use the logical
statements to make inferences about classes, properties, instances, and their
relationships to each other. These reasoners can be used in advanced applications,
such as semantic portals or intelligent web agents. Making data on the web
accessible to these types of services and applications, and developing ontologies
and the languages for expressing them, is a focus of the emerging Semantic Web.
The “formal” models of the domain enabled by the ontologies provide a number
of new capabilities, but also require extra work with respect to entering the
metadata appropriately, developing the vocabularies, etc. To justify the significant
added effort required for a good encoding of the semantics behind a given
application, users should understand some of the benefits that become available
and the doors that are opened. There are many places where the semantic web can
improve the way things are done on the web now, and add new capabilities
beyond what is currently available. The following sections enumerate some
of the visions for the Semantic Web as put forth by the World Wide Web
Consortium’s Web Ontology Working Group in a document outlining use cases
and requirements for ontologies on the Web (Heflin, 2003).
Web portals
A Web portal is a web site that provides information content on a topic. While the
term has become common for full-web search engines such as Google, portals in
the traditional sense are also domain specific pages that do not necessarily have a
search feature. The goal is to provide users with a centralized place to find links,
newsgroups, and resources on a topic.
For portals to work well, they need to be good sources of information to
encourage the community to participate in maintaining and updating their content.
To create a semantic web portal, where information is well annotated and
maintained in a semantic web format, the same is true. Users need some
motivation to do the markup that makes the site work.
The vision for semantic web portals is to not only make them available as online
web pages, but also to integrate them into tools. On web pages, users can find
resources based on their semantic markup. To encourage users to create their own
metadata, tool integration of portal features is key. For example, if a scientist
authoring a paper or web page uses a particular term from an online ontology, the
semantic web portal feature should return other sources with similar markup.
Results would most certainly include related web pages. They would also provide
links to images, video, audio files, or datasets whose content is described by the
same term. By providing these sorts of useful information and resources, which
could not be found with a standard text-based keyword search, users will be
encouraged to mark up their documents so that they may take advantage of these features.
What allows this system to work more fully is the integration of the markup
process with the portal. The portal provides the most advantage to users while
they are creating their own semantic web documents. Thus, after providing
information to the user, the portal itself is extended when the new markup is
published. This interactive cycle means that semantic web portals will reach out
to incorporate external resources, as well as creating a dense web of semantically annotated documents.
Multimedia collections
Ontologies can be used to provide semantic annotations for collections of images,
audio, or other non-textual objects. Though one may choose to argue for
keywords and natural language processing for computers to understand text
documents, extracting information about media is much more difficult. Though
some file formats do contain information about the file and the media, there is no
way for a machine to understand what is happening in a picture, or the
significance of who is pictured. Ontologies to describe media and its content
address this problem.
Multimedia ontologies can be of two types: media-specific and content-specific.
Media specific ontologies describe the format of files and related information. For
an image, ontologic markup may include file format and file size, plus
information about how the image was produced, such as the camera that took the
photo or focal length. Content-specific ontologies allow an author to describe
what the media is about. For a photo, this could include the date and time it was
taken, where it was taken, who and what is in the picture, and what is happening.
For other media, like sound, attributes like lyrics, chord progressions, or historical
information may also be relevant. Data about the contents can be related to
detailed instances declared in other files.
Web site management
Websites for even small organizations can have large collections of documents
which fall into many categories. These can include news releases and
announcements, papers, forms, contact pages, and downloads. As the number of
documents increases, finding them, without structure, becomes all the more
difficult. Even a taxonomy with a strong hierarchical structure can be insufficient.
This is clearly seen in web directories, such as Yahoo!, where finding a particular
page is difficult, even in a subset of the hierarchy.
An ontology-based web site allows users to search and navigate using specific,
ontologically defined terms. This will make documents easier to find, and cross-
references easier to track down. Later on, this chapter will discuss one website
using semantic markup as its foundation.
Design documentation
Documentation of systems is often very complex. Large sets of documents with
overlapping scopes have several presentation challenges. Since documents are
generally grouped thematically, it is not unusual for several sets of documentation
to address different aspects of the same subproblem. For a client who is trying to
find data on that subproblem alone, there is sometimes no choice but to navigate
through several sets of complex documents. Even when the desired information is
contained in one set, the level of detail can often be overwhelming.
Troubleshooting problems on a website, for example, usually demands a less
detailed analysis from the user when compared to the system administrator.
Ontologies can be used to build an information model which allows the
exploration of documents in a different way. Users can choose to look for specific
topics, even if they are small, and see information on that topic, as well as how it
connects to the documentation of the encompassing categories. Different levels of
abstraction can also be specified, so that depending on user preferences, varied
levels of detail are made available.
Agents and services
The development of intelligent agents for the web is an area of intense interest.
With the evolution of the semantic web, the groundwork is being laid that will
allow agents to understand web based information, and act upon it. Agent tasks
include scheduling and planning (Payne et al., 2002), trust analysis (Golbeck et
al., 2003), ontological mapping, and interaction with web services.
Web services are sets of functions that can be executed over the web. When
services are semantically marked up, they become available for agents to find,
compose, and execute in conjunction with data also found on the semantic web.
Already, there are hundreds of web services, and a fast growing number of agents
and tools (Sirin et al., 2002) that can work with them.
Ubiquitous computing
Ubiquitous computing describes a movement from hard-wired personal
computing devices, to embedding devices in the environment and making them
available to any other wireless device. For these systems to work effectively, each
device needs to make itself known to the environment and advertise what types of
inputs it requires and what it is able to output. When agents are introduced to the
system, needing to configure a collection of services and devices to accomplish a
goal, it is important to have the ability to reason over the descriptions of the
devices and their capabilities.
Semantic Web Example
Consider the example of making a page about a recent trip to Paris. The page
would include some text describing the trip and when it happened, undoubtedly a
picture of the travelers in front of the Eiffel Tower, and perhaps links to the hotel
where the user stayed, to the City of Paris homepage, and some helpful travel
books listed on Amazon. As the web stands now, search engines would index the
page by keywords found in the text of the document, and perhaps by the links
included there. Short of that vague classification, there is no way for a computer
to understand anything about the page. If the date of the trip, for example, were
typed as "June 25-30", there would be no way for a computer to know that the trip
was occurring on June 26th, since it cannot understand dates. For the non-textual
elements, such as the photo, computers have no way of knowing who is in the
picture, what is happening, or where it occurs.
On the semantic web, all of this information and more would be available for
computers to understand. A number of research efforts have explored the
representation of ontological information on the Web (see references 15, 16, 17,
20). A language called DAML+OIL was released in March 2001 as the result of a
joint committee of US and European researchers working together to develop a de
facto standard. In November of 2001, the W3C created the Web Ontology
Working Group to develop a recommendation based on DAML+OIL. The
resulting language, OWL, is emerging as the standard language to use for these
applications, and a set of tools for OWL is being produced as part of the W3C
process and under both US and EU funding. OWL is based on the Resource
Description Framework (RDF) and its extension, RDF Schema.
Using OWL, users can encode the knowledge from the webpage, and point to
knowledge stored on other sites. To understand how this is done, it is necessary to
have a general understanding of how Semantic Web markup works.
With OWL, users define classes, much like classes in a programming language.
These can be sub-classed and instantiated. Properties allow users to define
attributes of classes. In the example above, a "Photo" class would be useful.
Properties of the Photo class may include the URL of the photo on the web, the
date it was taken, the location, references to the people and objects in the picture,
as well as what event is taking place. To describe a particular photo, users would
create instances of the Photo class, and then fill in values for its properties. In a
simple table format, the data may look like this:
Date Taken: June 26, 2001
Location: Parc Du Champ De Mars, Paris, France
Person in Photo: John Doe
Person in Photo: Joe Blog
Object in Photo: Eiffel Tower
On the Semantic Web, resources (collectively Classes, Properties, and Instances
of classes) are all given unique names, and referred to by their URI (Uniform
Resource Identifier). That URI will be the web address of the document
containing the code, with a '#' and the name of the object appended to the end. For
example, if the document describing the trip is at
http://www.example.com/parisTrip.owl, and the photo instance is named
"ParisPhoto", then its URI would be http://www.example.com/parisTrip.owl#ParisPhoto.
Since each resource has a unique name, it allows authors to make reference to
definitions elsewhere. In our ontology above, the author can make definitions of
the two travelers, John Doe and Joe Blog:
JohnDoe
  First Name: John
  Last Name: Doe

JoeBlog
  First Name: Joe
  Last Name: Blog
Then, in the properties of the photo, these definitions can be referenced. Instead of
having just the string "John Doe", the computer will know that the people in the
photo are the same ones defined in the ontology, with all of their properties.
Person in Photo: http://www.example.com/parisTrip.owl#JohnDoe
Person in Photo: http://www.example.com/parisTrip.owl#JoeBlog
It is also possible to make reference to instances defined in other ontologies. In
the simple table above, the property "Object in Photo" is listed as the simple
string "Eiffel Tower." If someone has created a Paris History Ontology with a
formal definition of the Eiffel Tower in another document, the URI of that
resource can be used in place of the string. Thus, our property would become
something like the following, assuming the Paris History Ontology lives at the
hypothetical address http://www.example.com/parisHistory.owl:
Object in Photo: http://www.example.com/parisHistory.owl#EiffelTower
The benefit of this linking is similar to why links are used in HTML documents. If
a web page mentions a book, a link to its listing on Amazon offers many benefits.
Thorough information about the book may not belong on a page that just
mentions it in passing, or an author may not want to retype all of the text that is
nicely presented elsewhere. A link passes users off to another site, and in the
process provides them with more data. References on the Semantic Web are even
better at this. Though the travelers in this example may not know much about the
Eiffel Tower, the authors of the Paris History Ontology may have included all
sorts of interesting data about the history, location, construction, and architecture
of the Eiffel Tower in their definition. By making reference to that definition in
the description of the trip, the computer understands that the Eiffel Tower in the
photo has all of the properties described in the History Ontology. This means,
among other things, that agents and reasoners can connect the properties defined
in the external file to our file.
Once this data is encoded in a machine understandable form, it can be used in
many ways. Of course, the definition of ontologies and resources is not done in
simple tables as above. The next section will give a general overview of the
capabilities of RDF and OWL, used to formally express the semantic data. Once
that is established, we will present several examples of how this semantic data can
be used to produce and augment traditional web content.
Encoding Information on the Semantic Web
The basic unit of information on the Semantic Web, independent of the language,
is the triple. A triple is made up of a subject, predicate, and object or value. In the
example from the previous section, one triple would have subject "JohnDoe",
predicate "age", and value "23". Another would have the same subject, predicate
"First Name", and value "John". One detail that has been skipped over so far,
however, has been the issue of URIs. On the Semantic Web, everything, other
than strings, is represented by its unique URI. Thus, if our ontology is located at
http://www.example.com/parisTrip.owl, the triples will be1:

Subject: http://www.example.com/parisTrip.owl#JohnDoe
Predicate: http://www.example.com/parisTrip.owl#age
Value: "23"

Subject: http://www.example.com/parisTrip.owl#JohnDoe
Predicate: http://www.example.com/parisTrip.owl#firstName
Value: "John"
In the two examples above, the predicates relate the subject to a string value. It is
also possible to relate two resources through a predicate. For example, a photo
instance (given the assumed name "ParisPhoto") can be related to one of the
people pictured in it:

Subject: http://www.example.com/parisTrip.owl#ParisPhoto
Predicate: http://www.example.com/parisTrip.owl#personInPhoto
Object: http://www.example.com/parisTrip.owl#JohnDoe
1 Actually, we slightly simplify the treatment of datatypes, as the details are not
relevant to this chapter. Readers interested in the full details of the RDF encoding
are directed to the Resource Description Framework (RDF) specification.
Each of these triples forms a small graph with two nodes, representing the subject
and object, connected by an edge representing the predicate. The information for
John Doe is represented in a graph as shown below:
Figure 1: Three triples, rooted with http://www.example.com/parisTrip.owl#JohnDoe as the
subject of each
Taking all of the descriptors from the previous section and encoding them as triples
will produce a much more complex graph:
Figure 2: The graph of triples from the Paris example
As documents are linked together, joining terms, these graphs grow to be large,
complex interconnected webs. To create them, we need languages that support the
creation of these relationships. Though there are many languages that can be used
on the semantic web, the most popular is RDF, with OWL emerging as a new,
more powerful extension.
RDF and RDFS
The Resource Description Framework (RDF) - developed by the World-Wide
Web Consortium (W3C) - is the foundational language for the Semantic Web. It,
along with RDF Schema, provides the basis for creating vocabularies and instance
data. This section presents the basics of the syntax and an overview of the features
of RDF and RDFS. There are full books written on RDF, and this chapter is far too
brief to give thorough coverage of the language; the rest of this section gives a
general overview of the syntax of RDF and OWL and their respective features, but
is not intended as a comprehensive guide. Links to thorough
descriptions of both languages, including an RDF primer and an OWL Guide,
each with numerous examples, are available on the W3C’s Semantic Web
Activity website at http://w3.org/2001/sw.
There are several flavors of RDF, but the version this chapter will focus on is
RDF/XML, which is RDF based on XML syntax. The skeleton of an RDF
document is as follows:
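A minimal sketch of this skeleton, following the standard RDF/XML syntax, looks like this:

```xml
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <!-- RDF statements go here -->
</rdf:RDF>
```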
The tag structure is inherited from XML. The RDF tag begins and ends the
document, indicating that the rest of the document will be encoded as RDF. Inside
the rdf:RDF tag is an XML namespace declaration, represented as an xmlns
attribute of the rdf:RDF start-tag. Namespaces are convenient abbreviations for
full URIs. This declaration specifies that tags prefixed with rdf: are part of the
namespace described in the document at http://www.w3.org/1999/02/22-rdf-syntax-ns#.
In the example detailed in previous sections, we made some assumptions. For
creating triples about people, we assumed there was a class of things called
People with defined properties, like "age" and "first name". Similarly, we
assumed a class called "Photo" with its own properties. First, we will look at how
to create RDF vocabularies, and then proceed to creating instances.
RDF provides a way to create instances and associate descriptive properties with
each. RDF does not, however, provide syntax for defining Classes, Properties, and
describing how they relate to one another. To do that, authors use RDF Schema
(RDFS). RDF Schema uses RDF as a base to specify a set of pre-defined RDF
resources and properties that allow users to define Classes and restrict Properties.
The RDFS vocabulary is defined in a namespace identified by the URI reference
http://www.w3.org/2000/01/rdf-schema#, and commonly uses the prefix "rdfs:".
This namespace is added to the rdf tag.
Classes are the main way to describe types of things we are interested in. Classes
are general categories that can later be instantiated. In the previous example, we
want to create a class that can be used to describe photographs. The syntax to
create a class is written:
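A sketch of the declaration, using standard RDFS syntax:

```xml
<rdfs:Class rdf:ID="Photo"/>
```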
The beginning of the tag "rdfs:Class" says that we are creating a Class of things.
The second part, "rdf:ID" is used to assign a unique name to the resource in the
document. Names always need to be enclosed in quotes, and class names are
usually written with the first letter capitalized, though this is not required. Like all
XML tags, the rdfs:Class tag must be closed, and this is accomplished with the "/"
at the end.
Classes can also be subclassed. For example, if an ontology exists that defines a
class called "Image", we could indicate that our Photo class is a subclass of that.
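As a sketch, with a hypothetical URL standing in for the external ontology that defines Image:

```xml
<rdfs:Class rdf:ID="Photo">
  <rdfs:subClassOf rdf:resource="http://www.example.com/media.owl#Image"/>
</rdfs:Class>
```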
The rdfs:subClassOf tag indicates that the class we are defining will be a subclass
of the resource indicated by the rdf:resource attribute. The value for rdf:resource
should be the URI of another class. The subclass relation is transitive. Thus, if X is a
subclass of Y, and Y is a subclass of Z, then X is also a subclass of Z. Classes
may also be subclasses of multiple classes. This is accomplished by simply
adding more rdfs:subClassOf statements.
Properties are used to describe attributes. By default, Properties are not attached
to any particular Class; that is, if a Property is declared, it can be used with
instances of any class. Using elements of RDFS, Properties can be restricted in
both their domain and their range.
All properties in RDF are described as instances of class rdf:Property. Just as
classes are usually named with an initial capital letter, properties are usually
named with an initial lower case letter. To declare the "object in photo" property
that we used to describe instances of our Photo class, we use the following:
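A sketch of the declaration:

```xml
<rdf:Property rdf:ID="objectInPhoto"/>
```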
This creates a Property called "objectInPhoto" which can be attached to any class.
To limit the domain of the property, so it can only be used to describe instances of
the Photo class, we can add a domain restriction:
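A sketch of the restricted declaration:

```xml
<rdf:Property rdf:ID="objectInPhoto">
  <rdfs:domain rdf:resource="#Photo"/>
</rdf:Property>
```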
Here, we use the rdfs:domain tag, which limits which class the Property can be
used to describe. The rdf:resource attribute is used the same way as in the
subclass restriction above, but with a local resource: since the Photo class is
declared in the same namespace (the same file, in this case) as the
"objectInPhoto" property, we can abbreviate the resource reference to just the
local name, "#Photo".
Similar to the rdfs:subClassOf feature for classes, there is an rdfs:subPropertyOf
feature for Properties. In our photo example, the property "person in photo" is a
subset of the "object in photo" property. To define this relation, the following
syntax is used:
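A sketch of the declaration:

```xml
<rdf:Property rdf:ID="personInPhoto">
  <rdfs:subPropertyOf rdf:resource="#objectInPhoto"/>
</rdf:Property>
```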
Sub-properties inherit any restrictions of their parent Properties. In this case, since
the objectInPhoto property has a domain restriction to Photo, the personInPhoto
has the same restriction. We can also add new restrictions. In addition to the domain
restriction, which limits which classes the property can be used to describe, we
can add range restrictions which limit what types of values the property can
accept. For the personInPhoto Property, we should restrict the value to be an
instance of the Person class. Ranges are restricted in the same way as domains:
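A sketch of the declaration, assuming a Person class has been declared in the same file:

```xml
<rdf:Property rdf:ID="personInPhoto">
  <rdfs:subPropertyOf rdf:resource="#objectInPhoto"/>
  <rdfs:range rdf:resource="#Person"/>
</rdf:Property>
```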
Once this structure is set up, our instances can be defined. Consider the previous
triple that we described as plain text:
First Name: Joe
Last Name: Blog
Here, JoeBlog is the subject, and is an instance of the class Person. There are also
Properties for age, first name, and last name. Assuming we have defined the
Person class and its corresponding properties, we can create the JoeBlog instance:
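A sketch of the instance, assuming the Person class and the hypothetical property names firstName and lastName:

```xml
<Person rdf:ID="JoeBlog">
  <firstName>Joe</firstName>
  <lastName>Blog</lastName>
</Person>
```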
In the simplest case, the classes and properties we are using are declared in the
same namespace as where our instances are being defined. If that is not the case,
we use namespace prefixes, just as we used with rdf: and rdfs:. For example, if
there is a property defined in an external file, we can add a prefix of our choosing
to the rdf tag:
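A sketch, using a hypothetical external ontology at http://www.example.com/parisHistory.owl and the prefix "hist:" chosen for it:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:hist="http://www.example.com/parisHistory.owl#">
```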
Once this new namespace is introduced, it can be used to reference classes or
properties in that file:
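For example, a hypothetical property hist:landmark defined in that file could be used to describe a local instance:

```xml
<Photo rdf:ID="ParisPhoto">
  <hist:landmark rdf:resource="http://www.example.com/parisHistory.owl#EiffelTower"/>
</Photo>
```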
OWL
OWL (the Web Ontology Language) is a vocabulary extension of RDFS that
adds the expressivity needed to define classes and their relationships more fully.
Since OWL is built on RDF, any RDF graph forms a valid OWL Full ontology.
However, OWL adds semantics and vocabulary to RDF and RDFS, giving it
more power to express complex relationships.
OWL introduces many new features over what is available in RDF and RDFS.
They include, among others, relations between classes (e.g. disjointness),
cardinality of properties (e.g. "exactly one"), equality, characteristics of properties
(e.g. symmetry), and enumerated classes. Since OWL is based in the knowledge
engineering tradition, expressive power and computational tractability were major
concerns in the drafting of the language. Features of OWL are well documented
online (McGuinness, van Harmelen, 2003), and an overview is given here. Since
OWL is based on RDF, the syntax is basically the same. OWL uses Class and
Property definitions and restrictions from RDF Schema. It also adds the following constructs:
Equality and Inequality:
• equivalentClass – This attribute is used to indicate equivalence between
classes. In particular, it can be used to indicate that a locally defined Class
is the same as one defined in another namespace. Among other things, this
allows properties restricted to a class to be used with the equivalent class.
• equivalentProperty – Just like equivalentClass, this indicates equivalence
between properties.
• sameIndividualAs – This is the third equivalence relation, used to state that
two instances are the same. Though instances in RDF and OWL must have
unique names, there is no assumption that distinct names refer to distinct
entities. This syntax allows authors to create instances
with several names that refer to the same thing.
• differentFrom – This is used just like sameIndividualAs, but to indicate that
two individuals are distinct.
• allDifferent – The allDifferent construct is used to indicate difference
among a collection of individuals. Instead of requiring many long lists of
pairwise "differentFrom" statements, allDifferent has the same effect in a
compact form. AllDifferent is also unique in its use. While the other four
attributes described under this heading are used in the definition of classes,
properties, or instances, allDifferent is a special class for which the property
owl:distinctMembers is defined, which links an instance of
owl:AllDifferent to a list of individuals. The following example, taken from
the OWL Reference (van Harmelen et al., 2003), illustrates the syntax:
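A sketch of the construct, recast with the individuals from the Paris example rather than the Reference's own instances:

```xml
<owl:AllDifferent>
  <owl:distinctMembers rdf:parseType="Collection">
    <Person rdf:about="#JohnDoe"/>
    <Person rdf:about="#JoeBlog"/>
  </owl:distinctMembers>
</owl:AllDifferent>
```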
Property Characteristics:
• inverseOf – This indicates inverse properties. For example,
"picturedInPhoto" for a Person would be the inverseOf the "personInPhoto"
property for Photos.
• TransitiveProperty – Transitive properties state that if A relates to B with a
transitive property, and B relates to C with the same transitive property,
then A relates to C through that property.
• SymmetricProperty – Symmetric properties state that if A has a symmetric
relationship with B, then B has that relationship with A. For example, a
"knows" property could be considered symmetric, since if A knows B, then
B should also know A.
• FunctionalProperty - If a property is a FunctionalProperty, then it has no
more than one value for each individual. "Age" could be considered a
functional property, since no individual has more than one age.
• InverseFunctionalProperty – Inverse functional properties are formally
properties such that their inverse property is a functional property. More
clearly, inverse functional properties are unique identifiers.
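A property receives one of these characteristics by being declared with the corresponding type. As a sketch, a hypothetical "knows" property between Persons declared symmetric:

```xml
<owl:SymmetricProperty rdf:ID="knows">
  <rdfs:domain rdf:resource="#Person"/>
  <rdfs:range rdf:resource="#Person"/>
</owl:SymmetricProperty>
```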
Property Type Restrictions:
• allValuesFrom – This restriction, along with someValuesFrom, is used in
a class as a local restriction on the range of a property. While an rdfs:range
restriction on a Property globally restricts the values a property can take,
allValuesFrom states that for instances of the restricting class, the value for
the restricted Property must be an instance of a specified class.
• someValuesFrom – Just like allValuesFrom, this is a local restriction on the
range of a Property, but it states that at least one value of the restricted
property must come from the specified class.
• intersectionOf – Classes can be subclasses of multiple other classes. The
intersectionOf statement asserts that a class lies directly in the intersection
of two or more classes.
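These constructs are written inside a class definition. A sketch, with illustrative class and property names:

```xml
<owl:Class rdf:ID="PeoplePhoto">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#depicts"/>
      <owl:allValuesFrom rdf:resource="#Person"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>

<owl:Class rdf:ID="WorkingStudent">
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:Class rdf:about="#Student"/>
    <owl:Class rdf:about="#Employee"/>
  </owl:intersectionOf>
</owl:Class>
```

In the first definition, every value of "depicts" on a PeoplePhoto must be a Person; replacing allValuesFrom with someValuesFrom would instead require only that at least one such value exist.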
Cardinality restrictions are made for a class, and specify how many values of a
property can be attached to an instance of that class.
• minCardinality – This limits the minimum number of values of a property
that are attached to an instance. A minimum cardinality of 0 says that the
property is optional, while a minimum cardinality of 1 states that there must
be at least one value of that property attached to each instance.
• maxCardinality - Maximum cardinality restrictions limit the number of
values for a property attached to an instance. A maximum cardinality of 0
means that there may be no values of a given property, while a maximum
cardinality of 1 means that there is at most one. For example, an
UnmarriedPerson should have a maximum cardinality of 0 on the
hasSpouse property, while a MarriedPerson should have a maximum
cardinality of 1.
• cardinality – Cardinality is a convenient shorthand for when maximum
cardinality and minimum cardinality are the same.
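The MarriedPerson/UnmarriedPerson example above might be written as follows (class and property names as in the text):

```xml
<owl:Class rdf:ID="MarriedPerson">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasSpouse"/>
      <owl:maxCardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">1</owl:maxCardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>

<owl:Class rdf:ID="UnmarriedPerson">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasSpouse"/>
      <owl:maxCardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">0</owl:maxCardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
```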
OWL also offers a usability benefit, in addition to the expressiveness described
above. Some syntax was renamed from DAML+OIL, the predecessor to OWL.
This replaced some confusing bits of syntax with more descriptive and
understandable names. Other features, such as qualified cardinality
constraints, which many people considered both confusing and unnecessary, were
removed from OWL altogether. Both OWL and DAML+OIL were based on
RDF and RDFS. DAML+OIL had duplicated some terms from these base
languages, putting identical syntax in two namespaces. This could lead to
questions of whether, say, a daml:Property or an rdf:Property were different,
when they were, in fact, identical. OWL removed any shadowing of the
underlying languages, leaving just the one option to users. Furthermore, OWL
divides the language into three subsets: OWL Lite, which is a subset of OWL DL,
which is, in turn, a subset of OWL Full. The benefit of these three levels is that
the more complex features are preserved in OWL Full, while OWL Lite and OWL
DL offer smaller subsets to the user, each with various features removed.
Tools for Creating Semantic Web Markup
To drive any application, it is necessary to create large amounts of RDF. Though
authoring RDF and OWL by hand is an option, there are many tools available to
make the process more transparent. This section will present a few of the
general-purpose tools used to create content. Both of these tools were developed
in our lab and are available for download.
Users will often want to create Semantic Web markup for individual web pages,
photos, or concepts, rather than making a mass conversion of existing data. One
of several tools available to assist the user in creating instances is the RDF
Instance Creator (RIC) (Golbeck et al., 2002). The tool lets users import existing
ontologies, choose a class from those available, and then create an instance by
simply filling in a form.
Figure 3: The RDF Instance Creator (RIC) in action
When a class is selected, the user is presented with a workspace that lists all of the
known properties of that class. In the screen shot shown in Figure 3, the user is
creating an instance of the class "Athlete". The known properties of Athlete, such
as "weight", "eyeColor", and "height" are shown in the workspace, and the user
can enter the values. Though these first properties just take strings as values, RIC
also allows the user to link objects. The "plays" property shown below, for
example, requires an instance of the "Sport" class as its value. The user can either
create a new instance of a sport to act as the object in the triple, or an existing
instance can be linked in.
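The RDF that RIC produces for such an instance looks roughly like the following sketch (the instance name and property values are hypothetical):

```xml
<Athlete rdf:ID="JaneSmith">
  <weight>135</weight>
  <eyeColor>brown</eyeColor>
  <height>5 ft 6 in</height>
  <plays rdf:resource="#Tennis"/>  <!-- links to an instance of Sport -->
</Athlete>
```

Note that the first three properties take plain string values, while the "plays" value is a resource: a linked instance of the Sport class.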
RIC also facilitates the extension of existing ontologies. Users may add a property
to any existing class, and the RDF for the new property is stored in the local
output file. Users have the capability to add new classes, as well. These may be
independent classes, or subclass any classes that have been imported from other
ontologies. For users who are new to the semantic web with limited understanding
of the underlying languages, a lightweight tool like RIC can hide most of the ugly
details, and jumpstart the instance authoring process.
Ontology Manipulation and Instance Creation
Most people are not ontological engineers, domain experts, logicians, or even
programmers, so it's unlikely that they will be able to read, sort through, and grasp
how to apply large ontologies, much less construct their own. Aside from the
difficulty of learning how to model content in a reasonably correct and formal
way, current Web-focused knowledge engineering tends to involve either an
interruption of normal workflow and techniques (e.g., switching to an RDF editor
to create RDF content which is then linked to an HTML page (McGuinness & van
Harmelen, 2003; Bechhofer & Ng; Staab et al., 2002)) or a wholesale
abandonment of prior practice. While there are many tools for easing ontology
creation and knowledge acquisition, few focus on how normal Web authors work.
Most tools are geared only toward ontology development (Musen et al., 2002).
This forces the author into a two-step situation: either the author must first
create the content and then annotate it, or create all of the content in a
knowledge-creation context and then render it to HTML in some fashion.
SMORE (Semantic Markup, Ontology and RDF Editor) is a tool whose design is
driven by the idea that much Semantic Web based knowledge acquisition will
look more like Web page authoring than traditional knowledge engineering. It
blurs the line between normal content creation and Semantic annotation, but
SMORE also supports ad hoc ontology use, modification, combination, and
extension.
In keeping with the main design principle of seamless integration of content
creation and annotation, SMORE provides built-in support for performing routine
web-oriented tasks in the context of semantic markup. For instance, SMORE
contains a fully featured WYSIWYG text/html editor that allows users to create
and deploy web pages. Besides providing standard features for web page design,
the editor facilitates the generation of semantic markup by acting as a medium
through which the user can compose semantic triples of their data. Users can select
portions of text from the web page and insert them into triple placeholders (that
follow the standard subject-predicate-object model).
When trying to expose the information encoded in natural language to a software
agent, it seems natural to produce a translation of the information. Some of the
information is extracted from the text and encoded as RDF. The process of
creating accurate metadata from text is not terribly difficult, but this problem
becomes more acute with non-textual sources, such as photographs, in part
because the information "in" a photograph is not already encoded in a linguistic
form.
SMORE lets users add triples to a document that describe a particular photo as a
whole. One of its interesting features also allows sub-image annotation. Using
standard drawing-like tools (squares, circles, polygons, etc.), the user delineates a
region of a photo. The user then can represent facts about that region. One crucial
fact the user can assert about these regions is what they depict. Subsequent
annotations can then be about the depicted object. For example, in the screen shot
below, the photo depicts Bonnie, an orangutan housed at the National Zoo. One of
Bonnie’s identifying features is a bulbous forehead, and in this markup, the
feature is marked as a region of the overall photo, semantically described, and
noted in connection with other information about Bonnie.
Figure 4: The SMORE interface, showing the sub-image markup feature
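A sub-image annotation of this kind might, in spirit, look like the following sketch; the "region" and "zoo" vocabularies and the property names here are hypothetical stand-ins, not SMORE's actual output format:

```xml
<region:ImageRegion rdf:ID="foreheadRegion">
  <region:partOf rdf:resource="http://example.org/photos/bonnie.jpg"/>
  <region:depicts rdf:resource="#Bonnie"/>  <!-- hypothetical depiction property -->
</region:ImageRegion>

<zoo:Orangutan rdf:ID="Bonnie">
  <zoo:identifyingFeature>bulbous forehead</zoo:identifyingFeature>
</zoo:Orangutan>
```

Because the region's depicted object is itself a resource, any further statements about Bonnie automatically connect back to that portion of the image.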
Case Study: http://owl.mindswap.org
In addition to the development languages and applications to support the vision of
the Semantic Web, there is a large community working to implement it. In this
section we will look at the implementation details of http://owl.mindswap.org: a
website produced entirely using semantic web technology, and serving as an
example to show how semantic markup can be used as the fundamental structure
for information on websites.
Figure 5: The http://owl.mindswap.org homepage
On a web site generated using Semantic Web technology, as with any good
dynamically generated website using traditional database methods, the average
user does not see anything different from a hard-coded HTML site. This means that
users who are viewing the page do not need to even be aware of the underlying
technology, and the usability of the website is not affected.
The real human factors change arises for the web site managers. Instead of
potentially complicated software with a centralized and engineered database,
information for a Semantic Web based site can be distributed across the web, and
automatically incorporated as dynamic content. For example, in a current database
backed dynamic website, a website that presents the day’s headlines would
potentially have to collect stories and news from a variety of wire services,
convert each source of that data into the database format, and then load it into the
database before it can appear on the page. In a world using Semantic Web
technology, each wire service would maintain its news headlines as RDF or OWL
documents that would be available on the web. To display this information, the
centralized news service would only need to do a one-time description of how the
ontology used for marking up the news of each wire service maps to the ontology
or formatting used for the website. Because the wire services automatically update
their news, the centralized site would merely have to retrieve the latest RDF or
OWL document from each service and use the pre-defined mappings to present
that data on a page. By allowing each source to maintain its own data, the
central site that presents that data is freed from maintaining a central database,
updating that database, and worrying about consistency between central news
service and wire services.
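A wire service's published RDF in such a scenario might look like the following sketch (the "news" vocabulary here is hypothetical; each service would publish against its own ontology):

```xml
<news:Headline rdf:ID="story-042">
  <news:title>Example Wire Story</news:title>
  <news:byline>J. Reporter</news:byline>
  <news:date>2003-10-15</news:date>
  <news:link rdf:resource="http://wire.example.org/stories/042"/>
</news:Headline>
```

The central site's mapping would state, once, that news:title corresponds to its own headline property, news:byline to its author property, and so on; retrieving the latest file and applying the mapping is then fully automatic.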
One may argue that a system of automated retrieval and conversion of data in the
traditional database model is quite similar to the scenario described above. An
even clearer benefit can be seen in the case where a wire service may change the
format of their news. In a system of automated conversion, a human user would
have to manually update the code that does the conversion. A change as simple as
swapping the position of article date and article author, or changing the name of a
field, say from “Author Name” to “Byline” could break a converter. Conversely,
in the Semantic Web model, the ontology dictates the structure of data. As long as
a revised ontology is based on the original version, the system will continue to
function. This gives the maintainer the freedom to update mappings leisurely,
since breakage is less likely to occur.
One final benefit, before explaining how to implement such a site, is that of the
standard format of data. Because all Semantic Web data is based on standards, it
is easy to connect information maintained by separate sources. If a wire service
connects an author to each article, and a separate service maintains information
about authors, Semantic Web technology makes it possible to automatically
connect the two because of the shared data format. Thus, not only is a hyperlinked
web of connected pages built, but a secondary web of connected data emerges.
Users could read an article from a wire service, click on the author's name to get
some biographical information about them from a second service, and then
perhaps click on their hometown to find out more about the place with
information provided by yet another service. In some cases, where the ontologies
of each service interlink themselves, this is a trivial task for the website manager.
Even in more complex cases, the burden on the manager is much lighter because
the common format permits easy linking of data from distributed sources.
Mindswap is the Maryland Information and Network Dynamics Lab Semantic
Web Agents Project, based at the University of Maryland, College Park. The
website http://owl.mindswap.org was created to showcase the tools and
technologies developed in the lab, and to become the first website generated using
Semantic Web technology exclusively.
RDF and OWL are used to store all of the local information for the site. A
collection of ontologies describes any domain relevant concepts, such as people,
news items, downloads, and paper references. The site is divided into categories,
and instance data is presented on each page.
Data exists in two places simultaneously. RDF and OWL files that contain the
data are available on the web server and available for download, allowing
interchange with other sites. RDF is also stored in a backend database, and is
manipulated and accessed via the Redland application framework.
Redland is an object-oriented library written in C that has three major features:
• Classes to represent the core concepts of RDF, including URIs, literals,
nodes, and statements.
• A fast, standards compliant parser for RDF/XML and NTriples.
• A triple store that provides facilities for querying and modifying data.
The triple store is an abstract interface for an underlying data store, whose actual
implementation can be chosen at run time. Data can be stored using Berkeley DB,
a low-level database system; using Parka (Evett et al., 1994), an inferencing
database designed with triples in mind; in memory; or on disk.
Using a database of RDF as the backend for the web site raises the question “why
not just use a standard database?” The answer is that to do so would require
building an extensive hierarchy anyway, and would not be portable to other sites.
With an RDF base, stored in a more traditional database, all of the site's data can
be accessible to anyone on the web. No capabilities are taken away with this
approach, since the RDF can easily be edited or changed, and the backend
database can be updated in real time.
Adding, Editing, and Removing Data
Any authorized user can add, remove, or edit data on the site in real time by using
one of several interfaces for creating new RDF instance data. This includes a web-
based form of the RDF Instance Creator tool (described earlier in this chapter), as well as an
interface for editing raw RDF. In the event that a user just wants to see the
backend data without changing it, each page offers a set of links that show all of
the RDF in either raw text form, or through the web based version of RIC.
The Redland framework not only mirrors RDF found on the owl.mindswap.org
site – it can also import data from other web servers. This allows querying based
on ontologies created by other organizations. Any user can submit the URIs of
their RDF and OWL data through a form on the site. That data is immediately
added to the database, and will appear on any pages that use the same semantics.
If any external pages are changed, users can request an update via the website, so
even the external data in the database is kept consistent with the files.
HTML web pages are generated from the database. For example, one of the
Mindswap ontologies defines a class called "Swapper", which is used to refer to
any members or affiliates of the lab. Subclasses of "Swapper" include "Graduate
Students", "Faculty", and "Alumni" to name a few. The "People" page on the
website queries through Redland for all subclasses of “Swapper”, and retrieves all
instances of each of those subclasses. A nested list in HTML is then used to
represent the hierarchy of types of “Swappers” and information about each
instance.
Figure 6: The People page from http://owl.mindswap.org
Because instances are interlinked, users of the site are not restricted to viewing
information from a specific category. A common example of this is finding items
created by a particular person. Any RDF instance generated for the Mindswap site
can have a “creator” property, which will be an instance of a “Swapper.” This
makes it easy to find and list all RDF entries created by a particular person,
including news items, papers, and software.
Another example of interlinking is that each software project has a property to
express the language used. Because these languages are part of an RDF hierarchy,
with the ultimate superclass “Programming Language,” it is possible with a simple
user interface to show not only projects using, say, Java, but also all projects
using an "Object Oriented Programming Language", since that is part of the
"Programming Language" hierarchy.
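The hierarchy behind this kind of query might be sketched as follows (property names such as "usesLanguage" are illustrative):

```xml
<owl:Class rdf:ID="ProgrammingLanguage"/>

<owl:Class rdf:ID="ObjectOrientedProgrammingLanguage">
  <rdfs:subClassOf rdf:resource="#ProgrammingLanguage"/>
</owl:Class>

<ObjectOrientedProgrammingLanguage rdf:ID="Java"/>

<Project rdf:ID="ExampleProject">
  <usesLanguage rdf:resource="#Java"/>
</Project>
```

A query for projects using an ObjectOrientedProgrammingLanguage matches ExampleProject, because Java is an instance of that class; a query at the ProgrammingLanguage level matches it as well, via the subclass relationship.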
One of the first issues raised in this chapter was the inability of web developers to
use another party's database to drive their own site. With web sites such as this,
that is no longer the case. Since every page is generated in real time using RDF,
the site is not limited to using its own content alone. On the homepage, in-house
news items are presented first, followed by links to and descriptions of news from
the W3C and other semantic web sources. These third party news sources are
seamlessly integrated into the site, since processing them is just a matter of
reading the RDF. Similarly, any other website could use data from the
MINDSWAP RDF files that are also web available.
Web Scraping and Formatted Content
It is not uncommon to find HTML web pages that present data in a well
formatted, structured layout. Tables of data certainly fall into this category, as do
many auto-generated pages, such as eBay listings or Amazon.com products. If
data is available in this format, it can be "scraped" into a tool that can output
RDF.
One of these screen scrapers (Golbeck et al., 2002) is available as part of a
package called SMORE (discussed earlier in this chapter). To scrape a page or set of
pages, the user creates a "wrapper" that describes how the HTML tags in the
document relate to the contents. In the simple HTML below, the person's name is
immediately preceded by the "<b>" tag, and followed by the "</b>" tag. The
email address follows similar rules using the "<i>" tag. An interesting feature of
the scraper is its ability to take information from between tags as well as from
within them. This allows users to scrape the URLs of images or links and mark
them up. In this example code, the URL of a photo is contained within the "img"
tag, and needs to be extracted.
<img src="http://example.com/doe.jpg">
<b>Mr. John Doe</b>
Mr. Doe can be emailed at <i>email@example.com</i>.
By specifying the three points above, the software can extract a simple table of
data from a page. The screen scraper also has the capability to crawl over a
number of pages. This means that even if a server generates a different page for
each person, each page can be scraped and the data can be aggregated. Once a
table of data has been collected, the user can specify how each column should be
translated into RDF. Columns may be turned into class names, instances of
existing classes, or attached to an instance as values for pre-defined properties.
The situation is similar for spreadsheets and simple databases. Since these types
of files are often highly structured in a straightforward way, the step of "scraping"
is unnecessary, and direct conversion to RDF is fairly simple.
Mindswap has two tools, ConvertToRDF (Golbeck et al., 2002) and
ExcelToRDF, which allow users to turn comma delimited data files and Microsoft
Excel spreadsheets into RDF data. In both cases, users specify what Class to
instantiate for each row of data, and which properties of that class correspond to
each column. Consider the following simple database:
Name,Height,Weight,Eye Color,Hair Color
ConvertToRDF uses a simple text file stating that each row corresponds to a
Person (as defined in an existing ontology), and that each column corresponds to a
particular, pre-defined property of Person. With these converter tools, it is trivial
to produce thousands of RDF triples in minutes. Depending on the detail within a
given database, this may result in fairly rich data models with minimal effort
from the user.
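For a hypothetical row "John Doe,72,180,Brown,Brown" mapped to the Person class (the property names here are illustrative, assumed to come from the column mapping), the generated RDF would look roughly like:

```xml
<!-- one instance per row of the input file -->
<Person rdf:ID="JohnDoe">
  <name>John Doe</name>
  <height>72</height>
  <weight>180</weight>
  <eyeColor>Brown</eyeColor>
  <hairColor>Brown</hairColor>
</Person>
```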
References
• Golbeck, J., Grove, M., Parsia, B., Kalyanpur, A., & Hendler, J. (2002). New
Tools for the Semantic Web. Proceedings of the 13th International
Conference on Knowledge Engineering and Knowledge Management
(EKAW 2002). Siguenza, Spain.
• NCI Center for Bioinformatics caCORE.
• McGuinness, D., & van Harmelen, F. (2003). Web Ontology Language
(OWL): Overview. http://www.w3.org/TR/owl-features/
• Heflin, J. (2003). Web Ontology Language (OWL) Use Cases and
Requirements.
• Payne, T., Singh, R., & Sycara, K. (2002). Calendar Agents on the
Semantic Web. IEEE Intelligent Systems, 17(3), 84-86.
• Sirin, E., Hendler, J., & Parsia, B. (2002). Semi-automatic Composition of
Web Services using Semantic Descriptions. Web Services: Modeling,
Architecture and Infrastructure Workshop at ICEIS 2003.
• Golbeck, J., Parsia, B., & Hendler, J. (2003). Trust Networks on the
Semantic Web. Proceedings of Cooperative Intelligent Agents 2003.
• van Harmelen, F., Hendler, J., Horrocks, I., McGuinness, D., Patel-
Schneider, P., & Stein, L. (2003). Web Ontology Language (OWL)
Reference Version 1.0. W3C Working Draft, 21 February 2003.
• Evett, M., Hendler, J., & Spector, L. (1994). Parallel Knowledge
Representation on the Connection Machine. Journal of Parallel and
Distributed Computing, 22, 168-184.
• Musen, M., Fergerson, R., Grosso, W., Noy, N., Crubezy, M., & Gennari,
J. (2002). Component-Based Support for Building Knowledge-Acquisition
Systems. Conference on Intelligent Information Processing (IIP 2000) of
the International Federation for Information Processing World Computer
Congress (WCC 2000). Beijing.
• Bechhofer, G., & Ng, G. OilED. http://img.cs.man.ac.uk/oil/.
• Staab, S., Sure, Y., Erdmann, M., Wenke, D., Angele, J., & Studer, R.
(2002). OntoEdit: Collaborative Ontology Development for the Semantic
Web. Proceedings of the First International Semantic Web Conference
(ISWC 2002). Sardinia, Italy.
• Winkler, J. RDFedt. http://www.jan-winkler.de/dev/e_rdfe.htm.
• The DAML+OIL Language. http://www.daml.org/2001/03/daml+oil-
• SHOE (Simple HTML Ontology Extensions).
• The OIL Language. http://www.ontoknowledge.org/oil/.
• Extensible Markup Language (XML). http://www.w3.org/XML/.
• Resource Description Framework (RDF). http://www.w3.org/RDF/.
• RDF Schema. http://www.w3.org/TR/rdf-schema/.
• Ontobroker. http://ontobroker.aifb.uni-karlsruhe.de/index_ob.html.
• McGuinness, D., & van Harmelen, F. OWL Web Ontology Language.
• Mutton, P., & Golbeck, J. (2003). Visualizing Ontologies and Metadata on
the Semantic Web. Proceedings of Information Visualization 2003.