A Semi-Automatic Semantic Annotation and Authoring Tool for a Library Help Desk Service

Antti Vehviläinen
Helsinki University of Technology (TKK)
Laboratory of Media Technology
Semantic Computing Research Group
http://www.seco.tkk.fi
Antti.Vehvilainen@tkk.fi

Eero Hyvönen
Helsinki University of Technology (TKK)
Laboratory of Media Technology and University of Helsinki
Semantic Computing Research Group
http://www.seco.tkk.fi
Eero.Hyvonen@tkk.fi

Olli Alm
University of Helsinki and Helsinki University of Technology (TKK)
Semantic Computing Research Group
http://www.seco.tkk.fi
Olli.Alm@tkk.fi


ABSTRACT
This paper discusses how knowledge technologies can be utilized in creating help desk services on the semantic web. To ease the content indexer's work, we propose semi-automatic semantic annotation of natural language text for annotating question-answer (QA) pairs, and case-based reasoning techniques for finding similar questions. To provide answers matching the indexer's and end-user's information needs, methods for combining case-based reasoning with semantic search and browsing are proposed. We integrate different data sources by using large ontologies of upper common concepts, places, and agents. Techniques to utilize these sources in authoring answers are suggested. A prototype implementation of a real-life ontology-based help desk application is presented as a proof of concept. The system is based on a data set of over 20,000 QA pairs and the operational principles of an existing national library help desk service in Finland.

1. INTRODUCTION
Companies and public organizations widely use help desk services in order to solve problems for their customers. The classic example of a help desk service is a call center, where support persons answer questions by phone or by email. As help desk services are being transferred to the Web, it is more and more common that customers also have the possibility to solve their problems by themselves, using the knowledge and content accumulated at the service, without contacting a support person directly [5]. A simple approach, for example, is to publish Frequently Asked Questions (FAQ) lists on the web. The option to use a simple and fast question-answer (QA) self-service is appreciated not only by the customers, but by the authors of the answers, too. Their time is saved if the QA service can automatically provide an answer to the customer. Furthermore, the author herself can use the accumulated QA knowledge of the service, which helps in authoring the answers and improves their quality.

This paper discusses applications of semantic web technologies to help desk services. We focus on QA help desk services, where the database of the service is composed of previously answered questions, i.e., QA pairs. In such a service the user has a question in mind, and the service has two major tasks:

1. Finding relevant previous answers. A search method is needed to find the already answered relevant QA pairs in the repository.

2. Authoring a new answer. An existing QA pair may satisfy the customer's information need, but usually some kind of adaptation of the old answer case is needed. Usually answers are created and modified manually by a human editor.

The research problem of this paper is to investigate how to support semi-automatic answer authoring in a QA help desk service. Our methodology is to use semantic web technologies in content annotation, in utilizing the QA repository, and in integrating information available online on the web with the authoring process and the answers.

In this paper, the term indexing refers to the old, existing way of doing indexing, where index terms are just strings without an ontological reference. The term annotation refers to the new way of using annotation concepts that have an ontological reference.

1.1 The Existing Service
The research is based on a real-life case study: we use the data set of the operational Ask a librarian service (http://www.kirjastot.fi/tietopalvelu) offered nationally in Finland by the editors of the Libraries.fi portal (Libraries.fi provides access to Finnish Library Net Services under one user interface, see http://www.libraries.fi). In this service the clients can send questions to a virtual librarian via email, and a librarian of the service provides an answer within three working days. Some of the questions that the clients send are simple and the librarian can answer them straight away. These include questions about the opening times of a library, how to make an inter-library loan, etc. However, most of the questions require that the
                 Figure 1: Question text, concepts found by Poka and similar questions in Opas UI.



librarian uses more time to investigate the subject of the question. These include questions like "I'm wondering where I could find information about studies of library and information science?" or "I'm giving a presentation on Nokia. Where could I find helpful information?" Answers to these questions typically span a few paragraphs of text and contain some links to useful web sites. The librarians report that on average they use from half an hour to an hour to compose such an answer.

Each QA pair has been indexed using the YSA thesaurus (http://vesa.lib.helsinki.fi) of some 23,000 common Finnish terms. At the moment the data set consists of over 20,000 QA pairs. A keyword-based search service is available on the web for both end-users and answering librarians.

In the service, several problems were identified by enquiring of the librarians employed by the service:

1. Accessing accumulated knowledge. For a newly submitted question, the first thing to do is often to find out whether a similar, or at least related, answer already exists in the knowledge base.

2. Exploiting external resources in authoring. How to integrate different data sources and services, such as library systems on the web, and then use these sources in authoring a new answer?

3. Semantic annotation. How to help the librarian in choosing the appropriate annotation concepts for a new QA pair? This problem was considered especially crucial by the practitioner.

1.2 The Proposed Solution
The problems described above are approached by describing a prototype of a semantic annotation and authoring tool, Opas (http://www.seco.tkk.fi/applications/opas/) [20]. The system is intended to be used by the librarians for authoring answers in the Ask the librarian service. In the following, we first show how semi-automatic semantic annotation can be used to help in choosing concepts for the semantic annotation of QA pairs, based on ontologies. Then the problem of finding relevant answers for a new incoming question is approached using ideas from case-based reasoning (CBR) [1]. It is also shown how a common upper ontology can be used to integrate different data sources to help in authoring answers. We then present the results of the early evaluations conducted with the prototype. In conclusion, the contributions of the work are summarized, related work is discussed, and directions for further research are outlined.

2. SEMI-AUTOMATIC SEMANTIC ANNOTATION
When interviewing the librarians, two problems related to the indexing of QA pairs were brought up: 1) Choosing the appropriate indexing terms for annotating a question-answer pair is often time-consuming and difficult. 2) Different people use different indexing conventions, which makes the content unbalanced. For example, one librarian may use a few general terms to describe an answer, whereas another uses a large number of more detailed terms.

Our approach to these problems is to combine ontology-based semi-automatic annotation [13] and machine reasoning. The idea is to create a knowledge-based system that automatically provides the annotator with a suggestion of potential annotation concepts based on the textual material and other knowledge available, such as the QA database, earlier annotations, and common knowledge about indexing practices. The initial suggestion is then checked and edited by the human editor as she likes. This strategy not only helps the annotator in finding annotation terms (from tens of thousands of choices) but also guides the annotators to use the right terms based on the underlying annotation ontologies. Furthermore, the content is likely to become more balanced, because every annotator starts her job from a suggestion based on the same logic. By encoding indexers' knowledge and common indexing practices as rules, or by using automatic techniques such as collaborative filtering [7], it is possible to help especially novice indexers in their job even further.

Figure 2: Specifying an annotation concept

Figure 3: Annotating a question with an annotation concept that wasn't found in the question text.

As a first step towards such a knowledge-based semi-automatic annotation tool, we created an ontology-based information extraction tool, Poka (http://www.seco.tkk.fi/applications/poka/), for textual data, and integrated it with Opas. The following describes briefly how Poka works.

2.1 Extracting Annotation Concepts
Poka provides the QA indexer with a list of possible annotation concepts as ontological concepts (URIs), and the indexer chooses which concepts she wants to use. The selection of the concepts is based on the words and expressions found in the question and answer.

The librarians currently choose the indexing terms manually from the General Finnish Thesaurus YSA (http://www.vesa.lib.helsinki.fi). The terms in YSA are (with some exceptions) common noun terms, such as dog, astronomy, or child. In addition, the indexer may use free indexing terms that are not explicitly listed in the thesaurus. Free terms can be common nouns, such as names of flowers or animals, or proper nouns, such as person names (e.g., John F. Kennedy) or geographical places (Finland, Beijing). These categories of words, and free indexing terms not explicitly listed in the thesaurus, are treated by Poka in the following way.
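Before the per-category details, the suggest-and-confirm workflow can be sketched as follows. This is a minimal illustration, not the actual Poka or Opas code: the label-to-URI table, the example URIs, and the trivial lemmatizer are invented for the sketch (the real pipeline lemmatizes Finnish text with the FDG parser and maps labels to YSO concepts, as described in the next subsections).

```python
# Sketch of suggest-and-confirm annotation (hypothetical data).
# A real system would lemmatize with a morphological analyser (FDG)
# and use YSO concept URIs instead of these invented ones.
LABEL_TO_URI = {
    "dog": "http://example.org/yso#dog",
    "astronomy": "http://example.org/yso#astronomy",
    "toy": "http://example.org/yso#toy",
}

def lemmatize(text):
    """Stub lemmatizer: lowercases and splits on whitespace."""
    return text.lower().split()

def suggest_concepts(question_text):
    """Return candidate annotation concepts (URIs) whose labels occur
    in the lemmatized question text."""
    return [LABEL_TO_URI[tok] for tok in lemmatize(question_text)
            if tok in LABEL_TO_URI]

def confirm(candidates, accepted_by_indexer):
    """The indexer keeps only the concepts she considers relevant."""
    return [uri for uri in candidates if uri in accepted_by_indexer]

candidates = suggest_concepts("Where can I read about astronomy and my dog")
final = confirm(candidates, {"http://example.org/yso#astronomy"})
```

The key design point is that the tool only proposes URIs found in the text; the human indexer remains the final authority over the annotation.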
Figure 4: An example of an existing QA pair and its index terms



2.1.1 Common Nouns
In order to map common nouns in YSA to corresponding ontology concepts, YSA was transformed into the General Finnish Upper Ontology YSO (http://www.seco.tkk.fi/ontologies/yso/) [11]. YSO contains over 20,000 Finnish indexing concepts organized into 10 major subsumption hierarchies. Each concept is associated with one or more term labels, which allows words and terms to be mapped onto YSO concepts (URIs).

First, the input question is analysed by a morphological analyser and the syntactic parser FDG (http://www.connexor.com, Machinese Syntax) [18], which produces tokenized output of the text in XML form. FDG produces a lemmatized form of the word(s), morphological information, syntactic information, and the type and reference of a functional dependency to another token within a sentence, if one exists.

For concept matching, the labels of YSO concepts are also lemmatized. Lemmatized concepts are indexed in a prefix trie for efficient extraction. Lemmatization of the text and the concept names helps to achieve better recall in the extraction process; syntactic forms of words vary greatly in languages with heavy morphological affixation [17]. The architecture can be extended to support other languages with different language-dependent syntactic parsers.

2.1.2 Place Names
Place name recognition in Poka is based on the same method as common noun recognition. In this case, the place ontology of the MuseumFinland portal [10], extended in the CultureSampo project (http://www.seco.tkk.fi/projects/kulttuurisampo/), is used instead of YSO.

2.1.3 Person Names
Poka's name recognition tool is a rule-based information extraction tool without initial gazetteers. The main idea of the recognizer is first to search for full names within the text at hand. After that, occurrences of first and last names are mapped to full names. Simple coreference resolution within a document is implemented by mapping individual name occurrences to the corresponding unambiguous full name, if one exists. Individual first names and surnames without corresponding full names are discarded.

A strength of Poka's extraction process is that it also recognizes untypical names, unlike tools based on gazetteers, such as tools that use the initial named entity recognition of the GATE framework [3]. The search for potential names starts from the uppercase words of the document. With morphosyntactic clues, some hits can be discarded. For example, first names in Finnish rarely have certain morphological affixation, such as -ssa (similar to the English preposition in) or -lla (the preposition on). The FDG parser's surface-syntactic analysis is also used as a clue for revealing proper names.

Person name recognition may produce false hits. One wrong full-name hit may cause the corresponding wrong first- and last-name occurrences to be mapped to a full name. On the positive side, all the occurrences of a false name can be corrected by discarding the full name.

2.2 Free Annotation Concepts
Poka doesn't always suggest all the annotation concepts that the librarian wants to use, even if the corresponding word can be found in the text to be annotated and the word is considered a legal annotation concept. This always happens with free annotation concepts, which by definition are not included in the ontology explicitly. Obviously, human intervention is necessary in such cases.

Our approach to the problem of extracting free annotation concepts is to provide a mechanism by which the end-users can define new free annotation entries in the ontology and share them with other annotators. A new annotation concept is defined by simply telling the system its class, label,
 Figure 5: An example using an existing QA pair and a link from the link library in authoring an answer.



and an optional comment. For example, the term "leikkiauto" (toy car) is not present in the YSO ontology, because lots of things can be used as toys and it does not make much sense to list them all in the system. On the other hand, the concept toy car is useful from the indexing and information retrieval points of view. In this case, the user can interactively create a new concept as a subclass of an existing ontological concept, here toy ("lelu"), label it, here "leikkiauto" (toy car), and use it in the annotation. When later searching for content using the concept toy ("lelu"), QA pairs annotated with toy car ("leikkiauto") can also be retrieved, with the additional information that in this case the QA pair is about toy cars in particular. The new concept toy car can also be utilized in various ways in the user interface, e.g., as a search category in view-based semantic search [10]. Free indexing terms with the same name can be distinguished by different URIs and an additional comment.

Unknown but relevant annotation concepts without a corresponding concept in the ontologies are also frequently encountered in name recognition, because new names (e.g., names of pop stars) are constantly introduced as time goes by. The same approach used with free annotation concepts can be employed here, too.

In some cases where a word does not have an exact match with an ontological concept, Poka is able to suggest related annotation concepts based on the ontology. Such reasoning can be based, for example, on the morphological structure of a compound word or on the functional dependencies produced by the FDG parser.

2.3 Ranking Annotation Concepts
The previous sections analyzed situations where a semantic annotator produces too few relevant annotation concepts. The reverse problem with automatic semantic annotation is that often too many irrelevant concepts are suggested. Especially if the input text is long, a considerable number of possible annotation concepts are usually found. In such cases it is useful to rank the concepts according to their likely relevance, and to provide the end-user with a simple mechanism for evaluating and deleting the irrelevant annotations.

Opas uses the idea [16] of searching for semantic cluster(s) in the term set to determine the relevance of indexing concepts: terms in semantic clusters are ranked as more relevant than semantically isolated terms. For example, the terms doctor, sickness, and medication form a semantic cluster. For common noun terms, we use the concept relations defined in the YSO ontology to identify these clusters.

In [8], an ontological extension of the classic tf-idf (term frequency – inverse document frequency) method is developed, which enables us to identify synonyms and to utilize the concept hierarchies of the ontology. We apply this work so that more weight is given to concepts that appear frequently in the text but haven't often been used as annotation concepts in previous questions. In addition, Opas can suggest annotation concepts that are usually used together. For example, if the concept aviation has been extracted from a question, and there are lots of questions annotated with both aviation and airplane, the concept airplane can be suggested as an annotation concept, even though it is not explicitly present in the question text.

Our preliminary experiments with annotation concept weighting seem to suggest that relatively more weight should be given to terms that have a high term frequency, and that the effect of inverse document frequency should be relatively smaller. The reasoning behind this is that if, say, the concept poetry appears in a question many times, the concept seems relevant to the question even if it has been used frequently as an annotation concept in previous questions. So, in Opas the main weight is determined by the
Figure 6: A book search based on the index terms and their views that were found in the Helsinki City Library Classification System.



term frequency, whereas inverse document frequency and semantic clusters have a smaller impact on the weight.

2.4 An Example
Figure 1 depicts the first screen that the librarian sees when she has decided to answer a question. The end-user has submitted a question about the life and books of Arto Paasilinna (a Finnish author) (on the left, in the box "Kysymysteksti" (Question Text)). On the right, in the box "Oppaan löytämät käsitteet" (Indexing Concepts Found), there are two common noun concepts, "teokset" (writings) and "esitelmät" (plays). Poka has also identified the person name "Arto Paasilinna". Below the question text, there is the authoring component ("Vastaajan apurit", Authoring Tools), discussed in detail in section 4.

Figure 2 depicts the case where the free annotation concept "leikkiauto" (toy car) is encountered. In this case, Poka analyses the compound term into pieces and suggests the concept "leikkikalu" (toy), because it is found in the YSO ontology as a potentially related concept based on the first part of the compound. The librarian can then define the narrower concept toy car with the label "leikkiautot" (toy cars) by clicking on the link in the middle.

Figure 3 depicts the case where Poka is unable to make any suggestions, and the librarian wants to add the new annotation concept writer ("kirjailijat") to the ontology. As she is typing in the word, Opas uses semantic autocompletion [9] to suggest matching annotation concepts in YSO. The floating box on the bottom right displays information about a concept: its preferred and alternative labels, related concepts, subconcepts, and superconcepts. This information is displayed when the librarian points at the concepts with the mouse. The purpose of the autocompletion component is to
 Figure 7: An example of link library links that are found based on Poka’s annotation concept suggestions.



1) ensure that the indexer uses a concept found in the ontology and 2) suggest semantically related indexing concepts that the librarian perhaps didn't consider.

3. UTILIZING CASE-BASED REASONING TO FIND SIMILAR QUESTIONS
Case-based reasoning (CBR) [1] is a problem-solving paradigm in artificial intelligence where new problems are solved based on previously experienced similar problems, called cases. The CBR cycle consists of four phases: 1) Retrieve the most similar case or cases, 2) Reuse the retrieved case(s) to solve the problem, 3) Revise the proposed solution, and 4) Retain the solution as a new case in the case base.

Since similar QA pairs recur in QA services, we decided to investigate the usefulness of CBR in QA indexing and information retrieval. CBR has been used in help desk applications previously. For example, Goker and Roth-Berghofer [6] argue that CBR can successfully be used in a help desk service, and that by using CBR in a help desk service an organization can strengthen its common knowledge and reduce the time needed to answer a help request. Kai et al. [12] have found that users of a CBR-based help desk system tend to remember solutions longer, since they feel that they've solved the problem themselves, even though the solution was retrieved, and possibly adapted, from the case base.

What Opas brings to the traditional CBR approach is that it integrates semantic annotation into the steps of the CBR cycle. For the first step, Opas contains a CBR component that automatically searches for similar questions based on the concepts that Poka has extracted from the question text. The weighted annotation concept list discussed in section 2.3 is used as the basis for the search, with the following modifications: 1) The concepts that the indexer has selected are given a substantially higher weight, since their relevance has been confirmed by the indexer. 2) The extracted places, names, and specified concepts are given a higher weight due to their specificity.

4. INTEGRATING DIFFERENT DATA SOURCES IN ANSWER AUTHORING
When discussing the current service with the librarians, a few things were remarkable about the information sources that the librarians use when answering a question. Firstly, nearly all of the librarians said that they use the reference library, with real books, to find useful resources. Secondly, even though nearly all the librarians agreed that the questions tend to repeat themselves, not many of them systematically use the question archive to find old similar questions. Besides that, it is remarkable that when the librarians aren't able to answer a question within three working days, they nevertheless send an answer to the client. This answer usually contains pointers to different information resources, for example web sites, that might contain the answer to the question.

Based on the remarks described above, we decided to add an authoring component to Opas. The purpose of this component is to help the librarian compose the answer using different information sources. The authoring component can be seen in figure 1 ("Vastaajan apurit"). What is common to these authoring subcomponents is that each of them uses the annotation concept suggestions produced by Poka to query external resources. The common upper ontology YSO acts as "glue" between the different information resources. In the following, the subcomponents of the authoring component are explained.

4.1 Authoring Using Existing QA Pairs
Existing QA pairs can be used as a basis for composing the new answer. In figure 4 the librarian has opened one of the questions in order to see whether it provides useful information for answering the question. The answer can be used as a basis for the new answer by clicking the link (the white paper sheet with a pen). Figure 5 depicts how the librarian has used an existing answer as a basis for the answer.

As the retrieval of similar QA pairs can be seen as the first step of the CBR cycle, using them in the authoring component can be seen as part of the second step: Reuse the retrieved case(s) to solve the problem.

4.2 Authoring Using a Library Classification System
An ontology for a library classification system was created for Opas, and the Helsinki City Library Classification System (HCLCS, http://hklj.kirjastot.fi/) was then converted into this ontologized form. The basis for the classification ontology is the Simple Knowledge Organisation System (SKOS, http://www.w3.org/2004/02/skos/), and the conversion was made following the guidelines given in [19]. In addition to class hierarchies, the HCLCS contains index terms, and each of these terms has a relation to a library class. For example, the term Treatment of alcoholics has a relation to the library class 371.71 Alcohol policy.

Index terms in the HCLCS also contain views, as can be seen in figure 6. For example, the term pieces of art ("Teokset") embodies different viewpoints, such as bibliographies and art collections. Each of these viewpoints is related to a library class. These relations between index terms and library classes are used to search for books that could be relevant to the answer. The books are searched for based on the library class, as depicted in figure 6. The librarian can use the results of the book search 1) for searching for an answer to the question and 2) for enhancing the answer with links to interesting books.
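The index-term-to-library-class lookup behind this book search could be sketched as follows. Only the Treatment of alcoholics / 371.71 Alcohol policy pairing comes from the text; the other notations, the book titles, and the catalogue stub are invented stand-ins for the HCLCS data and the library catalogue, so this is an illustrative sketch rather than the Opas implementation.

```python
# Hypothetical fragment of the ontologized HCLCS: index terms point
# to library class notations (371.71 is the example from the text;
# 70.1 and all book titles are invented for this sketch).
TERM_TO_CLASS = {
    "treatment of alcoholics": "371.71",
    "pieces of art": "70.1",
}

CLASS_LABELS = {
    "371.71": "Alcohol policy",
    "70.1": "Art collections",
}

# Stub catalogue mapping a library class notation to books in it;
# a real system would query the library catalogue instead.
CATALOGUE = {
    "371.71": ["Alcohol Policy in Finland"],
    "70.1": ["Finnish Art Collections"],
}

def books_for_term(index_term):
    """Map an index term to its library class, then fetch the books
    filed under that class, mirroring the class-based book search
    described above. Unknown terms yield no books."""
    notation = TERM_TO_CLASS.get(index_term.lower())
    if notation is None:
        return []
    return CATALOGUE.get(notation, [])
```

The indirection through the class notation is the point of the design: the index term never has to match book metadata directly, only the shared library class.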
4.3 Authoring Using a Link Library
The editors of Libraries.fi maintain a collection of links to interesting web sites. This link library is categorized using the same classification system that is used in the HCLCS. An ontology was created and the data was converted into an ontologized form in a similar manner as described in the previous section. Figure 7 depicts a screenshot of this link library. The links are categorized by the HCLCS ("Henkilöbibliografiat", "Lastenkirjastotyö", etc.), and the librarian has opened one category to see whether there are interesting links. These links can be added to the answer text, as can be seen in figure 5.

5.   EVALUATION
To evaluate the current version of the prototype and to find out librarians' initial attitudes towards the new version of the system, a few user tests were run with real users of the service. The tests were conducted so that the librarian was first introduced to the prototype and its features. Then, she was asked to answer a question using the prototype. The questions were real questions from the existing version of the service. Finally, the librarian was interviewed about the answering process.

The results of the evaluation were encouraging. All librarians found the features of the prototype useful and said that they would take the prototype into use, if it were possible. The most impressive and useful feature for the librarians seemed to be the authoring features of the prototype, especially the component that automatically searches for existing similar questions. All librarians were also pleased with the authoring features that enable resources (old answers, links, book references) to be added to the answer by clicking a button.

The annotation concept suggestions were welcomed, but not as eagerly as the authoring components. Some of the librarians said that the concept suggestions were entirely irrelevant. The semantic autocompletion component that searches for concepts in YSO was considered useful. Based on the tests, nothing can yet be said about how good the ranking of the concept suggestions was.

When a librarian has not selected and confirmed any of the suggested annotation concepts, the authoring component fetches resources based on all of the concepts in the list. However, when the librarian had selected one or more suggestions to be used, it was confusing that the authoring component still fetched resources related to unselected concepts. Although these resources were given a smaller weight and were thus lower in the result list, it seems that when the librarian has selected one or more concept suggestions or inserted a free annotation concept, the other, unselected concepts should be ignored entirely in the result lists of the authoring components.

6.   DISCUSSION
First experiments with combining semi-automatic semantic annotation and authoring with the ideas of case-based reasoning seem promising. Even though the evaluation of the prototype was not extensive, it can be concluded that Opas would be a valuable tool for librarians if taken into use. However, systematic empirical evaluations of the application are yet to be done.

Currently the book search component does not use semantically annotated content, but instead fetches web pages and parses the results from the HTML content. In consequence, one of the major benefits of the semantic web, disambiguation of terms (for example, "Nokia" as an enterprise and as a city), is not possible. Opas would benefit more from a system with semantically annotated content.

The utilization of case-based reasoning in Opas can be seen as somewhat shallow. The ideas of CBR and the steps of the CBR process fit well with Opas, but the details of each step could be examined more carefully. For example, a framework for similarity assessment presented in [4] could be utilized for the retrieval of similar QA pairs.

A result of the evaluation was that the annotation concept suggestions were not optimal. Sophisticated methods for ranking the suggestions and finding out which concepts really are relevant for a user query should be investigated and developed further.

6.1 Related Work
Some other approaches to searching for similar questions would have been possible as well. For example, Kohonen et al. [15] demonstrate how Self-Organizing Maps (SOM) [14] can be used to organize a vast collection of patent abstracts and then to search the SOM for patents similar to a new patent application. A standard text search using, for example, the Java search engine Lucene12 would also probably yield sufficient results when searching for similar questions. However, these methods do not take the semantics of the text into account, and we want to be able to utilize the semantic relations defined in the common upper ontology YSO.

As for semantic authoring, David Aumueller [2] presents a technique for semantically authoring Wiki pages. The technique is not just for adding annotations to the pages but also for editing the text. His ideas could be applied in authoring the answers.

6.2 Future Work
Currently Opas focuses on the indexer's role in QA applications, but Opas will include the end-users' side, too. Here we work on questions such as: how to classify the QA pairs for semantic view-based search, how to do semantic recommending in order to show other interesting answers, and how to integrate the system with semantic content and services at other locations on the web related to the end-user's information needs. The CBR component that searches for similar questions can be used with small modifications on the end-users' side, too.

Acknowledgments
Our work is a part of the National Semantic Web Ontology Project in Finland (FinnONTO)13, funded by the National Funding Agency for Technology and Innovation (Tekes) and a consortium of 36 public organizations and companies.

12 http://lucene.apache.org
13 http://www.seco.tkk.fi/projects/finnonto/
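The plain text-search baseline mentioned in section 6.1 boils down to ranking stored questions by term-vector similarity. The following is a minimal bag-of-words sketch of that idea, not Lucene itself (Lucene uses more elaborate tf-idf-style scoring), and the example question store is invented:

```python
import math
from collections import Counter

# Minimal sketch of a plain text-search baseline for finding similar
# questions: rank stored questions by cosine similarity of their
# bag-of-words term vectors. Note that this deliberately ignores the
# semantic relations (e.g., those in YSO) that Opas aims to exploit.

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank_similar(query, questions):
    """Return stored questions sharing terms with the query, best first."""
    qv = Counter(query.lower().split())
    scored = [(cosine(qv, Counter(q.lower().split())), q) for q in questions]
    return [q for score, q in sorted(scored, reverse=True) if score > 0]

# Hypothetical QA store.
qa_store = [
    "who wrote the kalevala",
    "opening hours of the library",
]
print(rank_similar("who wrote kalevala", qa_store))
```

Such a baseline matches only surface terms; a query phrased with synonyms of the stored question would score zero, which is exactly the gap that ontology-based semantic relations are meant to close.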
7.   REFERENCES
 [1] A. Aamodt and E. Plaza. Case-based reasoning: foundational issues, methodological variations, and system approaches. AI Communications, 7(1):39–59, 1994.
 [2] D. Aumueller. Semantic authoring and retrieval within a wiki, Aug 2005. Demo paper, 2nd European Semantic Web Conference (ESWC 2005).
 [3] K. Bontcheva, V. Tablan, D. Maynard, and H. Cunningham. Evolving GATE to meet new challenges in language engineering. Natural Language Engineering, 10(3/4):349–373, 2004.
 [4] G. Falkman. Issues in Structured Knowledge Representation: A Definitional Approach with Application to Case-Based Reasoning and Medical Informatics. PhD thesis, Chalmers University of Technology, Göteborg University, 2003.
 [5] S. Foo, S. C. Hui, P. C. Leong, and S. Liu. An integrated help support for customer services over the world wide web: a case study. Computers in Industry, 41(2):129–145, 2000.
 [6] M. Goker and T. Roth-Berghofer. The development and utilization of the case-based help-desk support system HOMER. Engineering Applications of Artificial Intelligence, 12(6):665–680, Dec 1999.
 [7] J. H. Herlocker, J. A. Konstan, and J. Riedl. Explaining collaborative filtering recommendations. In Computer Supported Cooperative Work, pages 241–250. ACM, 2000.
 [8] M. Holi, E. Hyvönen, and P. Lindgren. Integrating tf-idf weighting with fuzzy view-based search. In Proceedings of the ECAI Workshop on Text-Based Information Retrieval (TIR-06), Aug 2006. To be published.
 [9] E. Hyvönen and E. Mäkelä. Semantic autocompletion. In Proceedings of the 1st Asian Semantic Web Conference (ASWC 2006), Beijing, Sep 4–9, 2006. Forthcoming.
[10] E. Hyvönen, E. Mäkelä, M. Salminen, A. Valo, K. Viljanen, S. Saarela, M. Junnila, and S. Kettula. MuseumFinland – Finnish museums on the semantic web. Web Semantics: Science, Services and Agents on the World Wide Web, 3(2–3):224–241, Oct 2005.
[11] E. Hyvönen, A. Valo, V. Komulainen, K. Seppälä, T. Kauppinen, T. Ruotsalo, M. Salminen, and A. Ylisalmi. Finnish national ontologies for the semantic web – towards a content and service infrastructure. In Proceedings of the International Conference on Dublin Core and Metadata Applications (DC 2005), Nov 2005.
[12] H. Kai, P. Raman, W. Carlisle, and J. Cross. A self-improving helpdesk service system using case-based reasoning techniques. Computers in Industry, 30(2):113–125, Sep 1996.
[13] N. Kiyavitskaya, N. Zeni, J. R. Cordy, L. Mich, and J. Mylopoulos. Semi-automatic semantic annotations for web documents. In SWAP 2005, Semantic Web Applications and Perspectives, Proceedings of the 2nd Italian Semantic Web Workshop, University of Trento, Trento, Italy, 14–16 December 2005, 2005.
[14] T. Kohonen. The self-organizing map. Proceedings of the IEEE, 78(9):1464–1480, Sep 1990.
[15] T. Kohonen, S. Kaski, K. Lagus, J. Salojärvi, J. Honkela, V. Paatero, and A. Saarela. Self organization of a massive document collection. IEEE Transactions on Neural Networks, 11(3):574–585, May 2000.
[16] L. Gazendam, V. Malaisé, G. Schreiber, and H. Brugman. Deriving semantic annotations of an audiovisual program from contextual texts. In Semantic Web Annotation of Multimedia (SWAMM'06) workshop, 2006. http://www.cs.vu.nl/~guus/papers/Gazendam06a.pdf.
[17] L. Löfberg, D. Archer, S. Piao, P. Rayson, T. McEnery, K. Varantola, and J.-P. Juntunen. Porting an English semantic tagger to the Finnish language. In Proceedings of the Corpus Linguistics 2003 Conference, pages 457–464. UCREL, Lancaster University, 2003.
[18] P. Tapanainen and T. Järvinen. A non-projective dependency parser. In Proceedings of the 5th Conference on Applied Natural Language Processing, pages 64–71, 1997.
[19] M. van Assem, M. R. Menken, G. Schreiber, J. Wielemaker, and B. Wielinga. A method for converting thesauri to RDF/OWL. In Third International Semantic Web Conference (ISWC 2004), volume 3298, 2004.
[20] A. Vehviläinen, O. Alm, and E. Hyvönen. Combining case-based reasoning and semantic indexing in a question-answer service, June 20, 2006. Poster paper, 1st Asian Semantic Web Conference (ASWC 2006).
