                                         CHAPTER 1

    1.1 INTRODUCTION

      Increasingly, the World Wide Web (WWW) is used to support and facilitate the delivery of
teaching and learning materials [Barker]. This use has progressed from the augmentation of
conventional courses through web-based training and distance learning to a newer form of
WWW-based education, e-learning. E-learning is not just concerned with providing easy access
to learning resources, anytime, anywhere, via a repository of learning resources; it is also
concerned with supporting features such as the personal definition of learning goals, and
synchronous and asynchronous communication and collaboration between learners, and between
learners and instructors.


      Researchers have proposed that, in an e-learning environment, the educational content
should be oriented around small modules (or learning objects) coupled with associated semantics
(or metadata) to be able to find what one wants, and that these modules be related by a
“dependency network” or “conceptual web” to allow individualised instruction. Such a
dependency network allows, for example, the learning objects to be presented to the student in an
orderly manner, with prerequisite material being presented first. Additionally, in an e-learning
environment, students must be able to add extra material and links (i.e. annotate) to the learning
objects for their own benefit or for that of later students.


      The Semantic Web is a collaborative movement led by the World Wide Web Consortium
(W3C) that promotes common formats for data on the World Wide Web. By encouraging the
inclusion of semantic content in web pages, the Semantic Web aims at converting the current
web of unstructured documents into a "web of data". It builds on the W3C's Resource
Description Framework (RDF).


      According to the W3C, "The Semantic Web provides a common framework that allows
data to be shared and reused across application, enterprise, and community boundaries."




     The term was coined by Tim Berners-Lee, the inventor of the World Wide Web and
director of the World Wide Web Consortium ("W3C"), which oversees the development of
proposed Semantic Web standards. He defines the Semantic Web as "a web of data that can be
processed directly and indirectly by machines."


     While its critics have questioned its feasibility, proponents argue that applications in
industry, biology and human sciences research have already proven the validity of the original
concept.

           The main purpose of the Semantic Web is driving the evolution of the current Web by
enabling users to find, share, and combine information more easily. Humans are capable of using
the Web to carry out tasks such as finding the Irish word for "folder", reserving a library book,
and searching for the lowest price for a DVD. However, machines cannot accomplish all of these
tasks without human direction, because web pages are designed to be read by people, not
machines. The semantic web is a vision of information that can be readily interpreted by
machines, so machines can perform more of the tedious work involved in finding, combining,
and acting upon information on the web.

           The Semantic Web, as originally envisioned, is a system that enables machines to
"understand" and respond to complex human requests based on their meaning. Such an
"understanding" requires that the relevant information sources be semantically structured, a
challenging task.

Tim Berners-Lee originally expressed the vision of the Semantic Web as follows:

I have a dream for the Web [in which computers] become capable of analyzing all the data on
the Web – the content, links, and transactions between people and computers. A ‘Semantic
Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day
mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to
machines. The ‘intelligent agents’ people have touted for ages will finally materialize.

The Semantic Web is regarded as an integrator across different content, information applications
and systems. It has applications in publishing, blogging, and many other areas.

Often the terms "semantics", "metadata", "ontologies" and "Semantic Web" are used
inconsistently. In particular, these terms are used as everyday terminology by researchers and
practitioners, spanning a vast landscape of different fields, technologies, concepts and
application areas. Furthermore, there is confusion with regard to the current status of the
enabling technologies envisioned to realize the Semantic Web. In a paper presented by Gerber,
Barnard and Van der Merwe, the Semantic Web landscape is charted and a brief summary of
related terms and enabling technologies is presented. The architectural model proposed by Tim
Berners-Lee is used as a basis to present a status model that reflects current and emerging
technologies.

   1.2 PROBLEM STATEMENT:

       Today's life demands more interaction among different people to succeed at work; no
work can be done successfully in isolation. Similarly, interaction between the end user and the
contents of the web is in high demand. Our seminar topic, an online quiz as an application of the
Semantic Web, likewise tries to establish a connection with the end user. First the contents are
extracted and, using an API, we obtain an RDF file. The file is then queried using SPARQL to
identify the subject, predicate and object of each sentence. The user is given a sentence and asked
to identify either the subject or the predicate; based on his/her answers, marks are evaluated and
displayed at the end.

   1.3 PROBLEM DESCRIPTION:

          Our main aim is to develop an online vocabulary quiz using the Semantic Web, where
the student is asked to identify the subject, predicate or object of an incomplete sentence
displayed to them, and marks are evaluated accordingly.

         For identifying the subject, predicate and object from the given input file we have made
use of the Open Calais API. This API converts the given input file to RDF format, and on that
RDF file we fire a query using SPARQL to find the different parts of each sentence, viz. subject,
predicate and object. These parts of the sentence are then stored in the database to formulate the
questions. Each question takes the form of a fill-in-the-blank: the user is given four options to
fill the blank, so as to find the proper combination of subject, predicate and object.
       In this case the input file is first uploaded to the Open Calais API. This API makes use
of OWL ontologies and natural language processing to produce the RDF file. This file is then
downloaded to the hard disk and used in SPARQL to extract the required fields. Instead of
uploading a text file as the input to this API, we can also give it the link of an XML website for
the same purpose.




                                        CHAPTER 2

2.1      LITERATURE REVIEW

The following is a brief description of this new set of technologies (which revolve around Web
services), in chronological order of development.

      2.1.1    Semantic Web

First introduced by Tim Berners-Lee, the Semantic Web is a new form of web content that is
meaningful to computers and thus allows them to infer meaningful relationships between data.
Semantic Web technologies are intended to make the data on the Internet available to a much
broader range of consumers (human or machine), preferably through automated agents. The
Semantic Web is not a new web but an extension of the current one, in which information is
given a well-defined meaning, enabling machines to process and "understand" the data that they
merely display at present.

The Semantic Web is all about metadata, not AI, but it can help in AI activities.

The Semantic Web introduces the notion of agents, which are standalone applications that
collect and process semantic data to help end users in their decisions. For example, an agent may
help a user find the least-cost route from Colombo to New York by collecting relevant
information from the airlines.




                               Fig-2.1: Layers in the semantic web

As seen in Fig-2.1, the Semantic Web principles are implemented in layers of Web technologies
and standards. The Unicode and URI layers ensure the use of international character sets and
provide the method of identifying objects in the Semantic Web. The XML layer, with namespace
and schema definitions, assures the integration of Semantic Web definitions with other XML-
based standards. With RDF and RDFS it is possible to make statements about objects with URIs
and to define vocabularies that can be referred to by URIs. This is the layer where we can give
types to resources and links (properties). The Ontology layer supports the evolution of
vocabularies, as it can define relations between different concepts. With the Digital Signature
layer, agents or end users can detect any alterations to documents.

The top three layers (Logic, Proof and Trust) are currently being researched under the W3C, and
simple application demonstrations are being constructed. The Logic layer enables the writing of
rules, while the Proof layer executes these rules. The Trust layer provides a mechanism to
determine whether to trust a given proof. Proof and Trust are very important concepts in the
Semantic Web: if one person says that X is blue and another says that X is not blue, we need a
way to determine which statement to trust.

W3C Semantic Web working groups have established and standardized the lower technological
layers, and it is already possible to implement concrete applications based on this work.
However, at the higher technological layers more research is still in progress. In addition to the
technologies, work is needed to develop tools and easy user interfaces that support users in
understanding metadata and adding metadata to the Web. When more and richer metadata
appears, there will be a huge number of opportunities for various applications.

   2.1.2       OPEN CALAIS WEB SERVICE

           The Calais initiative seeks to help make all the world's content more accessible,
interoperable and valuable via the automated generation of rich semantic metadata, the
incorporation of user-defined metadata, the transportation of those metadata resources
throughout the content ecosystem, and the extension of its capabilities by user-contributed
components.

    COMPONENTS:

        The Calais Web Service is the core and provides for the automated generation of rich
semantic metadata in RDF format.

A series of sample applications demonstrate how the Web Service can be utilized and serve as a
starting point for other development activities.

Active support is provided to developers who want to incorporate Calais capabilities in their
applications and web sites.

The Calais initiative is sponsored by Thomson Reuters and built on ClearForest technology.

    FUNCTIONS:

        From a user perspective it’s pretty simple: You hand the Web Service unstructured text
(like news articles, blog postings, your term paper, etc.) and it returns semantic metadata in RDF
format. What’s happening in the background is a little more complicated.

Using natural language processing and machine learning techniques, the Calais Web Service
examines your text and locates the entities (people, places, products, etc.), facts (John Doe works
for Acme Corporation) and events (Jane Doe was appointed as a Board member of Acme



Corporation). Calais then processes the entities, facts and events extracted from the text and
returns them to the caller in RDF format.

   2.1.3        REST

REST stands for Representational State Transfer. (It is sometimes spelled "ReST".) It relies on a
stateless, client-server, cacheable communications protocol -- and in virtually all cases, the
HTTP protocol is used.

REST is an architecture style for designing networked applications. The idea is that, rather than
using complex mechanisms such as CORBA, RPC or SOAP to connect between machines,
simple HTTP is used to make calls between machines.

In many ways, the World Wide Web itself, based on HTTP, can be viewed as a REST-based
architecture.

RESTful applications use HTTP requests to post data (create and/or update), read data (e.g.,
make queries), and delete data. Thus, REST uses HTTP for all four CRUD
(Create/Read/Update/Delete) operations.

REST is a lightweight alternative to mechanisms like RPC (Remote Procedure Calls) and Web
Services (SOAP, WSDL, et al.), and is much simpler.

Despite being simple, REST is fully-featured; there's basically nothing you can do in Web
Services that can't be done with a RESTful architecture.
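As an illustration of the read operation, below is a minimal Java sketch of a RESTful GET
request; the resource URL is a hypothetical example, and any HTTP client could be used instead:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestGetDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical resource URL; in REST, every resource is addressed by a URL
        URL url = new URL("http://example.com/books/42");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET"); // GET corresponds to Read in the CRUD mapping
        BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line); // print the representation returned by the server
        }
        in.close();
        conn.disconnect();
    }
}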

   2.1.4        RDF

RDF was developed by the W3C as part of its Semantic Web effort. It started as an extension
of the PICS content description technology and has been a W3C recommendation since 1999.

RDF is used as the foundation for representing metadata about Web resources, such as the title,
author and modification date of a Web page, or copyright and licensing information about a Web
document. It provides a syntax for expressing simple statements about resources, where each
statement consists of a subject, a predicate and an object (a triple). In RDF, resources are
identified by URIs, so web-based resources can be described effectively.


      RDF VOCABULARY:

The vocabulary defined by the RDF specification is as follows:

            Classes

                 rdf
                   o rdf:Resource - the class resource, everything
                   o rdf:XMLLiteral - the class of XML literal values
                   o rdf:Property - the class of properties
                   o rdf:Statement - the class of RDF statements
                   o rdf:Alt, rdf:Bag, rdf:Seq - containers of alternatives, unordered
                     containers, and ordered containers (rdfs:Container is a super-class
                     of the three)
                   o rdf:List - the class of RDF lists
                   o rdf:nil - an instance of rdf:List representing the empty list

                 rdfs
                   o rdfs:Literal - the class of literal values, e.g. strings and integers
                   o rdfs:Class - the class of classes
                   o rdfs:Datatype - the class of RDF datatypes
                   o rdfs:Container - the class of RDF containers
                   o rdfs:ContainerMembershipProperty - the class of container membership
                     properties, rdf:_1, rdf:_2, ..., all of which are sub-properties of
                     rdfs:member


            Properties

                 rdf
                   o rdf:type - an instance of rdf:Property used to state that a resource
                     is an instance of a class
                   o rdf:first - the first item in the subject RDF list
                   o rdf:rest - the rest of the subject RDF list after rdf:first
                   o rdf:value - idiomatic property used for structured values
                   o rdf:subject - the subject of the subject RDF statement
                   o rdf:predicate - the predicate of the subject RDF statement
                   o rdf:object - the object of the subject RDF statement
                   o rdf:Statement, together with rdf:subject, rdf:predicate and
                     rdf:object, is used for reification

                 rdfs
                   o rdfs:subClassOf - the subject is a subclass of a class
                   o rdfs:subPropertyOf - the subject is a subproperty of a property
                   o rdfs:domain - a domain of the subject property
                   o rdfs:range - a range of the subject property
                   o rdfs:label - a human-readable name for the subject
                   o rdfs:comment - a description of the subject resource
                   o rdfs:member - a member of the subject resource
                   o rdfs:seeAlso - further information about the subject resource
                   o rdfs:isDefinedBy - the definition of the subject resource

This vocabulary is used as a foundation for RDF Schema where it is extended.

            Serialization formats

              RDF/XML serialization
                  Filename extension:    .rdf
                  Internet media type:   application/rdf+xml
                  Developed by:          World Wide Web Consortium
Two common serialization formats are in use.

                The first is an XML format. This format is often called simply RDF because it was
introduced among the other W3C specifications defining RDF. However, it is important to
distinguish the XML format from the abstract RDF model itself. Its MIME media type,
application/rdf+xml, was registered by RFC 3870, which recommends that RDF documents
follow the new 2004 specifications.

               In addition to serializing RDF as XML, the W3C introduced Notation 3 (or N3) as a
non-XML serialization of RDF models designed to be easier to write by hand, and in some cases
easier to follow. Because it is based on a tabular notation, it makes the underlying triples
encoded in the documents more easily recognizable compared to the XML serialization. N3 is
closely related to the Turtle and N-Triples formats.

Triples may be stored in a triple store.
        An example of an RDF statement is:

<http://www.example.org/index.html> <http://www.example.org/terms/creation-date> "August 16, 1999" .

which states that the creation date of the index.html web page is August 16, 1999.
        With the advent of XML, an XML-based syntax was introduced for writing down and
exchanging RDF documents. It is called RDF/XML, and this specification is currently the most
popular way of writing RDF documents. The above RDF triple can be written in RDF/XML in
the following way:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:exterms="http://www.example.org/terms/">
  <rdf:Description rdf:about="http://www.example.org/index.html">
    <exterms:creation-date>August 16, 1999</exterms:creation-date>
  </rdf:Description>
</rdf:RDF>
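The same triple can also be written in the Turtle notation introduced above; a small sketch,
binding an exterms: prefix to the same example.org namespace:

@prefix exterms: <http://www.example.org/terms/> .

<http://www.example.org/index.html> exterms:creation-date "August 16, 1999" .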
One limitation of RDF is that it has no facilities for defining structures or hierarchies. RDFS
was thus introduced as an extension to RDF, to complement it with a type system. It provides the
facilities needed to specify classes and properties (defined as directed binary relations) in RDF.
In RDFS, a set of new terms has been introduced to define classes, subclasses and the properties
applicable to them.

        For example, the following two statements define a new class called ex:MotorVehicle
and a subclass of it called ex:Van.
ex:MotorVehicle rdf:type rdfs:Class .
ex:Van rdfs:subClassOf ex:MotorVehicle .


In addition to that, custom properties can also be defined:

ex:author rdf:type     rdf:Property .
ex:author rdfs:range ex:Person .
ex:author rdfs:domain ex:Book .
In the above example a property called ex:author has been defined to represent the relationship
between a Person and a Book.
By using these new terms, RDFS can represent much richer metadata about web resources than
RDF alone. For this reason the newer Semantic Web languages (e.g. OWL) proposed by the
W3C use RDFS as a base and extend it. Even before the Semantic Web was introduced, there
had been a lot of applications of RDFS. One particularly good example is the Dublin Core
Metadata Initiative. The Dublin Core metadata standard defines a simple and effective RDFS
element set for describing a wide range of networked resources. The Dublin Core elements are
very popular in the Semantic Web and are frequently used to describe Semantic Web resources
such as semantic web services.
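As a small illustration, a web document can be described with Dublin Core elements as follows
(a sketch in Turtle; the document URI is hypothetical, while dc:title, dc:creator and dc:date are
standard Dublin Core elements):

@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://www.example.org/index.html>
    dc:title   "An Online Vocabulary Quiz" ;
    dc:creator "Jane Doe" ;
    dc:date    "1999-08-16" .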

    2.1.5      OWL

The Semantic Web needs the support of ontologies, where an ontology is defined as an explicit
specification of a conceptualization. Ontologies define the concepts and relationships used to
describe and represent an area of knowledge. To describe such ontologies we need a language
that has the capability to specify at least the following:

       terminology used in a specific context (domain description)
       constraints on properties
       the logical characteristics of properties
       the equivalence of terms across ontologies

OWL has been designed and introduced by the W3C to fulfill the above requirements. OWL
(the Web Ontology Language) is a layer on top of RDFS which defines an additional set of
terms to describe the relationships between resources in a much richer fashion.

DAML, the predecessor of OWL, was the result of a DARPA project. It was later integrated
with OIL, a European Union project, and submitted to the W3C for recommendation. After
much coordination with the ongoing RDF work, it was released as a W3C recommendation in
early 2004 under the name OWL. However, unlike RDF, where triples can be written down in
any format, OWL uses an XML format: the XML format used to write down OWL statements
is part of the OWL standard itself.

       Sub Languages of OWL

The OWL language has to strike a compromise between rich semantics for meaningful
applications (expressive power) and feasibility/implementability. To accommodate these
colliding requirements, three different sublanguages of OWL have been introduced.

OWL Lite - has the least expressive power and is meant for users whose main requirement is a
classification hierarchy.

OWL DL - supports those users who want maximum expressiveness without losing
computational completeness (all entailments are guaranteed to be computed) and decidability (all
computations will finish in finite time) of reasoning systems.

OWL Full - is meant for users who want maximum expressiveness and the syntactic freedom of
RDF, with no computational guarantees.

       OWL Classes and Properties

Many uses of an ontology depend on the ability to reason about individuals. In order to do this
in a useful fashion we need a mechanism to describe the classes that individuals belong to and
the properties that they inherit through class membership. OWL extends the notion of classes
defined in RDFS with its own owl:Class construct:

<owl:Class rdf:ID="Region"/>
<Region rdf:ID="CentralCoastRegion"/>

OWL also extends the notion of an RDFS property by introducing two different types of
properties:
Data-type properties: relations between instances of classes and RDF literals or XML Schema
data types.
Object properties: relations between instances of two classes.
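A minimal sketch contrasting the two property types, assuming hypothetical Book and Person
classes and the usual xsd entity for XML Schema datatypes:

<owl:DatatypeProperty rdf:ID="yearOfPublication">
  <rdfs:domain rdf:resource="#Book"/>
  <rdfs:range rdf:resource="&xsd;integer"/>
</owl:DatatypeProperty>

<owl:ObjectProperty rdf:ID="hasAuthor">
  <rdfs:domain rdf:resource="#Book"/>
  <rdfs:range rdf:resource="#Person"/>
</owl:ObjectProperty>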
OWL also introduces a set of property characteristics, which result in considerable inference
capabilities:



TransitiveProperty - P(x,y) and P(y,z) implies P(x,z)
SymmetricProperty - P(x,y) iff P(y,x)
FunctionalProperty - P(x,y) and P(x,z) implies y = z
inverseOf - P1(x,y) iff P2(y,x)
InverseFunctionalProperty - P(y,x) and P(z,x) implies y = z
For example, the locatedIn property below is defined as a transitive property with domain
owl:Thing (every class defined in OWL is a subclass of owl:Thing).
<owl:ObjectProperty rdf:ID="locatedIn">
 <rdf:type rdf:resource="&owl;TransitiveProperty" />
 <rdfs:domain rdf:resource="&owl;Thing" />
 <rdfs:range rdf:resource="#Region" />
</owl:ObjectProperty>
An OWL inference engine can thus infer that anything, including another region, can be located
inside a region; and that if Sri Lanka is located in Asia and Colombo is located in Sri Lanka,
then Colombo is located in Asia.
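Written out as triples, the same inference looks like this (a sketch; the ex: resources are
hypothetical):

ex:Colombo  ex:locatedIn ex:SriLanka .
ex:SriLanka ex:locatedIn ex:Asia .

Since ex:locatedIn is transitive, a reasoner can add the entailed triple:

ex:Colombo  ex:locatedIn ex:Asia .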

       Constraints on Properties

OWL provides a set of constructs that allow further constraining the range of a property in a
specific context. For example, if an instance is to become a member of a subclass based on the
value of a property, this can be defined using these constructs. Some common constraints of this
nature are given below; a small sketch follows the list:

allValuesFrom: requires that for every instance of the class that has instances of the specified
property, the values of the property are all members of the class indicated by this clause

someValuesFrom: similar to above but requires only one property instance to have a value from
the class indicated by this clause

minCardinality, maxCardinality and cardinality: used to restrict the number of occurrences a
property can have inside a class

hasValue: defines the membership of an instance to a particular class based on a value of one of
its properties
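As a small sketch of these constructs, a hypothetical Book class whose hasAuthor property
(from the earlier property sketch) must take at least one value from the Person class could be
written as:

<owl:Class rdf:ID="Book">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasAuthor"/>
      <owl:someValuesFrom rdf:resource="#Person"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>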


       Equivalence of Terms across Ontologies

In order for the Semantic Web to have the maximum impact, ontologies need to be widely
shared; and in order to minimize the intellectual effort involved, ontologies need to be reused.
To support this, a number of other constructs have been introduced that define equivalence or
difference between OWL classes, properties and individuals.

equivalentClass, equivalentProperty, sameAs - define the equivalence of classes, properties and
instances respectively.

differentFrom, AllDifferent - provide the opposite effect of sameAs.

In addition to the constructs given above, OWL DL and OWL Full have some other constructs
that help in defining complex classes.
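For example, two ontologies that name the same concepts differently could be aligned as follows
(a sketch in triple notation, assuming the usual owl: prefix; the ex1: and ex2: vocabularies are
hypothetical):

ex1:MotorVehicle owl:equivalentClass    ex2:Automobile .
ex1:author       owl:equivalentProperty ex2:writer .
ex1:Colombo      owl:sameAs             ex2:ColomboCity .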

    2.1.6        SPARQL

            SPARQL (pronounced "sparkle", a recursive acronym for SPARQL Protocol and RDF
Query Language) is an RDF query language; that is, a query language able to retrieve and
manipulate data stored in Resource Description Framework format. It was made a standard by
the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium and is
considered one of the key technologies of the Semantic Web. On 15 January 2008, SPARQL 1.0
became an official W3C Recommendation.

            SPARQL allows for a query to consist of triple patterns, conjunctions, disjunctions, and
optional patterns.

            Implementations for multiple programming languages exist. "SPARQL will make a
huge difference," according to Sir Tim Berners-Lee in a May 2006 interview.

         There are tools that allow one to connect to and semi-automatically construct a SPARQL
query for a SPARQL endpoint, for example ViziQuer. In addition, there are tools that translate
SPARQL queries to other query languages, for example to SQL and to XQuery.

        SPARQL allows users to write unambiguous queries. For example, the following query
returns names and emails of every person in the dataset:


PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email
WHERE {
    ?person a foaf:Person .
    ?person foaf:name ?name .
    ?person foaf:mbox ?email .
}
     This query can be distributed to multiple SPARQL endpoints (services that accept SPARQL
queries and return results), computed, and results gathered, a procedure known as federated
query.

QUERY FORMS:
               The SPARQL language specifies four different query variations for different
purposes.
SELECT query
            Used to extract raw values from a SPARQL endpoint, the results are returned in a
table format.
CONSTRUCT query
            Used to extract information from the SPARQL endpoint and transform the results into
valid RDF.
ASK query
          Used to provide a simple True/False result for a query on a SPARQL endpoint.
DESCRIBE query
         Used to extract an RDF graph from the SPARQL endpoint, the contents of which are
left to the endpoint to decide, based on what the maintainer deems useful information.
Each of these query forms takes a WHERE block to restrict the query, although in the case of the
DESCRIBE query the WHERE block is optional.
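For instance, an ASK query of the following form (a small sketch reusing the FOAF vocabulary
from the earlier example) simply reports whether anyone named "Jon Foobar" occurs in the data:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
ASK {
    ?person foaf:name "Jon Foobar" .
}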
Example

      Another SPARQL query example that models the question "What are all the country
capitals in Africa?":

PREFIX abc: <http://example.com/exampleOntology#>
SELECT ?capital ?country

WHERE {
    ?x abc:cityname ?capital ;
     abc:isCapitalOf ?y .
    ?y abc:countryname ?country ;
     abc:isInContinent abc:Africa .
}
      Variables are indicated by a "?" or "$" prefix. Bindings for ?capital and ?country will be
returned.

       The SPARQL query processor will search for sets of triples that match these four triple
patterns, binding the variables in the query to the corresponding parts of each triple. An
important point to note here is the "property orientation": class matches can be conducted solely
through class attributes or properties (see duck typing).

       To make queries concise, SPARQL allows the definition of prefixes and base URIs in a
fashion similar to Turtle. In this query, the prefix "abc" stands for
http://example.com/exampleOntology#.

      2.1.7       Jena API

         Executing SPARQL queries with the Jena API

While the command-line sparql tool is useful for running standalone queries, Java applications
can call on Jena's SPARQL capabilities directly. SPARQL queries are created and executed with
Jena via classes in the com.hp.hpl.jena.query package. Using QueryFactory is the simplest
approach. QueryFactory has various create() methods to read a textual query from a file or from
a String. These create() methods return a Query object, which encapsulates a parsed query.

The next step is to create an instance of QueryExecution, a class that represents a single
execution of a query. To obtain a QueryExecution, call QueryExecutionFactory.create(query,
model), passing in the Query to execute and the Model to run it against. Because the data for the
query is provided programmatically, the query does not need a FROM clause.

For a simple SELECT query, call execSelect(), which returns a ResultSet. The ResultSet allows
you to iterate over each QuerySolution returned by the query, providing access to each bound
variable's value. Alternatively, ResultSetFormatter can be used to output query results in various
formats.

The listing below shows a simple way to put these steps together. It executes a query against
bloggers.rdf and outputs the results to the console.

Executing a simple query using Jena's API

// Imports required by this listing (Jena 2.x package names)
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import com.hp.hpl.jena.query.*;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;

// Open the bloggers RDF graph from the filesystem
InputStream in = new FileInputStream(new File("bloggers.rdf"));
// Create an empty in-memory model and populate it from the graph
Model model = ModelFactory.createDefaultModel();
model.read(in, null); // null base URI, since model URIs are absolute
in.close();
// Create a new query
String queryString =
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/> " +
        "SELECT ?url " +
        "WHERE {" +
        "      ?contributor foaf:name \"Jon Foobar\" . " +
        "      ?contributor foaf:weblog ?url . " +
        "      }";
Query query = QueryFactory.create(queryString);
// Execute the query and obtain results
QueryExecution qe = QueryExecutionFactory.create(query, model);
ResultSet results = qe.execSelect();
// Output query results
ResultSetFormatter.out(System.out, results, query);
// Important - free up resources used running the query
qe.close();
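As an alternative to the ResultSetFormatter call in the listing above, the ResultSet can be
iterated manually; a minimal sketch (it additionally assumes an import of
com.hp.hpl.jena.rdf.model.RDFNode):

while (results.hasNext()) {
    // Each QuerySolution holds one row of variable bindings
    QuerySolution solution = results.nextSolution();
    RDFNode url = solution.get("url"); // the value bound to ?url in this row
    System.out.println(url);
}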




2.2     CURRENT STATUS:

At present, there is no automation in the various test-maker software packages as far as
preparing the question bank is concerned. The examiner has to prepare all the questions manually
by going through the appropriate knowledge resource and then set the appropriate questions for
the test.

2.3     RELEVANCE OF OUR PROBLEM STATEMENT IN THE LINE OF RESEARCH:

Hence, we intend to harness the power of the Semantic Web, natural language processing and
machine learning techniques to automate the knowledge-gathering process to some extent. We
have used the Open Calais web service to achieve this goal in an easy and convenient manner.
Open Calais uses the Semantic Web's Linked Open Data concept to identify entities, facts,
events, etc. Our software is able to gather information directly from raw text sources like text
files, web pages and XML data via the Open Calais output RDF file. The information thus
extracted by querying the output RDF is stored in an RDBMS for future use.

Thus, the semantic web service is a core part of our solution to making an automated test-maker.




                                         CHAPTER 3

    3.1         METHODOLOGY EMPLOYED TO SOLVE THE PROBLEM:

        Our main aim is to develop an online vocabulary quiz using the Semantic Web, where
the student is asked to identify the subject, predicate or object of an incomplete sentence
displayed to them, and marks are evaluated accordingly.

          Part 1: For identifying the subject, predicate and object from the given input file we
have made use of the Open Calais API. This API converts the given input file to RDF format,
and on that RDF file we fire a query using SPARQL to find the different parts of each sentence,
viz. subject, predicate and object. Initially we upload the file to the API, which returns an RDF
file (in XML form) for the given input. This RDF file is then queried using SPARQL to get the
desired parts of the sentence, such as the subject, predicate and object.
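A minimal sketch of such a query is shown below. The predicate names are hypothetical
placeholders in the Calais namespace; the actual property URIs must be taken from the Open
Calais documentation:

PREFIX c: <http://s.opencalais.com/1/pred/>
SELECT ?subject ?verb ?object
WHERE {
    ?relation c:subject ?subject .
    ?relation c:verb    ?verb .
    ?relation c:object  ?object .
}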

          Part 2: Once the parts of each sentence are obtained they are stored in the database
using JDBC connectivity. When the student starts the test, our agent displays a random sentence
as the question, with four options for it. The student is asked to identify the missing part of the
sentence. Based on the responses obtained, his/her score is evaluated and displayed at the end of
the test.
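A minimal sketch of the storage step, assuming a hypothetical triples table in a local MySQL
database (the connection URL, credentials and schema are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class TripleStore {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; assumes a table
        // triples(subject VARCHAR, predicate VARCHAR, object VARCHAR)
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/quiz", "user", "password");
        PreparedStatement ps = con.prepareStatement(
                "INSERT INTO triples (subject, predicate, object) VALUES (?, ?, ?)");
        ps.setString(1, "Jane Doe");          // subject extracted via SPARQL
        ps.setString(2, "was appointed to");  // predicate
        ps.setString(3, "the Board of Acme"); // object
        ps.executeUpdate();
        ps.close();
        con.close();
    }
}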

    3.2         SOFTWARE TOOLS REQUIRED:

The software tools employed in this project include:

    1. NetBeans: This mainly acts as the IDE used to develop the Java program.


    2. Jena: Jena is an open source Semantic Web framework for Java. It provides an API to
        extract data from and write to RDF graphs. The graphs are represented as an abstract
        "model". A model can be sourced with data from files, databases, URLs or a combination
        of these. A Model can also be queried through SPARQL and updated through SPARQL.




    3. Open Calais: This is the web service (API) used in our project. The Calais Web Service
        is the core component and provides for the automated generation of rich semantic
        metadata in RDF format.




    3.3          HARDWARE REQUIREMENTS:

The online vocabulary quiz developed as part of this project will have the following hardware
requirements:

                                         CLIENT SIDE
PROCESSOR            RAM                  HARD DISK SPACE      NETWORK
At least 500 MHz     At least 128 MB      Less than 20 MB      A good Internet connection


                                         SERVER SIDE
PROCESSOR            RAM                  HARD DISK SPACE      NETWORK
At least 1 GHz       At least 256 MB      Less than 20 MB      A high-speed Internet connection




                                         CHAPTER 4

    4.1        BLOCK DIAGRAM:




                            Figure 4.1 Block diagram of the entire system

        Here, the student/end user first logs in to take the test. The questions for the test are not
kept ready beforehand; instead, they are prepared dynamically. The request from the student is
sent to the question-maker system. This system makes use of Semantic Web concepts to prepare
the questions. It sends the raw text to the API for processing, or it can extract the data from a
particular website using an RSS feed and then forward this data to the API. The API makes use
of certain ontologies and then prepares the RDF/XML file for the accepted input. The question-
maker system then uses this RDF file to obtain the subject, predicate and object of each sentence,
using the query language SPARQL discussed earlier. Once the parts of each sentence are
obtained they are stored in the database, and the questions are then given randomly to the
students.

To understand the system better in technical terms, we have the following UML diagrams:

    4.2        USE CASE MODEL:

The following diagram shows the use case model for our project:




                    Figure 4.2 Use case diagram for Online vocabulary test


This shows the functions provided by the system to the student. When the student requests to
start the test, all the actions specified using the «include» stereotype are performed, thus
preparing the questions dynamically.




    4.3        SEQUENCE DIAGRAM:
The following diagram shows the sequence of activities involved when a student starts the test,
illustrating the proper ordering and flow of the actions performed by our system.




                     Figure 4.3 Sequence diagram for online vocabulary test




                                         CHAPTER 5

    5.1        RESULT ANALYSIS:

        We tested the vocabulary quiz with different types of input files on different systems. It
was observed that the result we obtained depends upon two major factors, namely:
       The Internet connection speed (I).
       The processing speed of the system (P).


As the Internet connection speed improves, the time taken by our intelligent agent to retrieve the
RDF file for the given input file decreases, thereby improving the performance of our system.
The other important factor is the processing speed of the system on which the agent runs: once
the RDF file is received, it takes some time to store the subject, predicate and object of each
sentence in the database. Hence, the faster the system, the less time our intelligent agent takes to
complete the task.


        Thus we arrive at the final statement that the result of our system (R) is directly
proportional to the Internet connection speed (I) and the processing speed of the system (P):


                       R ∝ I ......(1)
                       R ∝ P ......(2)




    5.2         SCREENSHOTS:




    Figure 5.1 Screenshot of UI accepting input and output file path and generating RDF file




                        Figure 5.2 Screenshot after generating RDF file
               Figure 5.3 Screenshot of Vocabulary test GUI




                   Figure 5.4 Screenshot of the question




                          Figure 5.5 Final screenshot displaying the score


    5.3         BENEFITS:

       Helps students to test their vocabulary.
       The instructor does not have to set the paper beforehand.
       Easy to use.
       User-friendly GUI.
       Questions are dynamically generated before the test begins.


    5.4         LIMITATIONS:

    1. Input has to be restricted to short and simple sentence structures. This ensures decent
        parsing by Open Calais and more proper output. Complex sentences may be ignored.
    2. The RDF file produced by Open Calais may not always contain all three parameters
        (subject, predicate and object) for each sentence. Sometimes the sentence extraction
        may result in tuples with missing parameters.
    3. As yet, we do not take dynamic input by live parsing of sets of web pages; we only
        take text files as input.
    4. The instructor is still required to at least fill in the appropriate extra options for each
        question.
    5. Forming WH-type questions is not possible at present, so we have restricted ourselves
        to a vocabulary test.
    6. There are no difficulty levels in the test, i.e. no sets of questions sharing a common
        difficulty level within each set. Classification of questions by difficulty is not
        automated.
    7. An Internet connection is required at all times, as the core part of the implementation is
        a web service.




                                         CHAPTER 6

    6.1 CONCLUSION AND FUTURE SCOPE:


        This project was concerned with using Semantic Web related technologies to dynamically
select and compose web services to satisfy client requests, and finally to execute these web
services and return the result to the client. Even if the Semantic Web is out of the research labs,
semantic web services are not. We used the W3C-recommended semantic web services language,
OWL-S, to semantically annotate our web services. The OWL-S specification is still in the R&D
stages and going through rigorous changes as new features are added and existing features are
altered. Even though the OWL-S specification as it currently stands is very rich in describing
web services, reliable third-party library support for manipulating these semantic descriptions is
not yet available. Another consideration is that using these rich descriptions in general contexts
such as the public web is still not very practical, as many web-based agents are still not capable
of reading these specifications. The broker we have implemented in this project is concerned
with some core features of the OWL-S specification and is suitable for a specific context or
domain, where the available core OWL-S features can be used in an intuitive manner to obtain
predictable results (a mandatory requirement for any enterprise-scale application). In that sense,
this project is an ambitious effort to bring the newest developments of academia through to
industry.
Another important fact is that when someone in industry wants to do something related to
semantics (for example in Enterprise Application Integration), they usually go for their own
proprietary/custom formats rather than the available standard formats, due to the complexity and
generality of these formats and the performance considerations of implementing the full
standards. We think this project takes an important step towards identifying a usable set of core
features for industry-oriented, semantic web service related applications.




    6.2 REFERENCES:


    1. F. Zablith, M. Fernandez & M. Rowe, The OU Linked Open Data: production and
        consumption. In Proceedings of Linked Learning 2011: 1st International Workshop on
        eLearning Approaches for the Linked Data Age, at the 8th Extended Semantic Web
        Conference (ESWC2011), CEUR 717, Heraklion, Greece (May 2011). Available at:
        http://ceur-ws.org/Vol-717/paper1.pdf


    2. E. Kaldoudi, N. Dovrolis, D. Giordano & S. Dietze, Educational Resources as Social
        Objects in Semantic Social Networks. In Proceedings of Linked Learning 2011: 1st
        International Workshop on eLearning Approaches for the Linked Data Age at the 8th
        Extended Semantic Web Conference (ESWC2011), Heraklion, Greece (2011).
        Available at: http://ceur-ws.org/Vol-717/paper11.pdf


    3. IEEE paper on “Personal Learning Environments on the Social Semantic Web” by
        Carsten Kebler.


    4. Semantic Web For Dummies by Jeffrey T. Pollock.


    5. Semantic Web Programming by Wiley Publications.


    6. Open Calais API documentation at
        http://www.opencalais.com/documentation/opencalais-documentation.


    7. Online video lecture by Emanuele Della Valle on the realization of semantic web
        applications:
        http://videolectures.net/iswc08_dellaValle_rswa



