Issue 4, July 2003

  • Welcome
  • Tech Focus: Question Answering
  • The Australasian Language Technology Association
  • The Australasian Language Technology Workshop and Summer School
  • Changes in the Speech World
  • Upcoming Events

Welcome to the fourth issue of LT Update!

LT Update is a free publication from the Centre for Language Technology, produced with the generous support of CSIRO. The Update is a twice-yearly hard and soft copy publication, backed up by timely email alerts, that aims to keep you abreast of developments in the speech and language technologies in Australia and New Zealand. If you’re not yet a subscriber, sign up at LTUpdate. If you are a subscriber and you want to change your subscription details, visit the site and key in the six-character passcode printed on the top of your mailing label.

In this issue, we look at an area of technology that is set to make search engines smarter: in Tech Focus, Diego Mollá explores the emerging area of question answering. Also in this issue, we report on the new Australasian Language Technology Association, and its associated workshop and summer school. Plus: how the speech industry is changing, both nationally and internationally. If what’s here piques your interest, you can find out more via the links for this issue at our website.

What’s your view?

If you have comments on LT Update, or ideas on things you’d like to see us cover, just let us know.

Tech Focus: Question Answering

In each issue of LT Update, we bring you a brief primer on an important area of speech and language technologies. In this issue, Diego Mollá provides an introduction to question answering.

Computers are very good at processing information stored in structured formats such as databases. However, a lot of information (most of the HTML web pages that make up the World Wide Web, for example) is contained in text. Although it’s easy for humans to make sense of text, it’s much less easy for computers to do so.

The availability of all this ‘unstructured data’ in digital form puts us in the situation where, although the information is physically there and accessible to computers, it is difficult to get a computer to find what you need. The aim of a question-answering system is to process a question formulated by the user and to find the answer to that question by searching through the available text.

Question answering has recently attracted intensive research interest, spurred by the competition-based annual Text Retrieval Conference. Current question answering systems employ an array of techniques within the area of language technology. For example, a typical question answering system first tries to locate the documents that are most likely to contain the answer. To achieve this, the system typically uses techniques borrowed from the area of Document Retrieval, preselecting documents in much the same way that a web search engine returns a list of web pages that match a user query.

But this is only the beginning. The preselected documents are further analysed in order to detect text fragments that are likely to contain the answer. So, if you ask ‘What American general is buried in Salzburg?’, the system will filter out fragments that do not contain names of persons; or, if you ask ‘Where does cinnamon come from?’, only text fragments with references to locations are selected. This is largely done by resorting to named-entity recognisers that spot all the names in the text and classify them into a pre-defined list of categories (e.g., person names, organisations, dates, numbers, time expressions, and locations).

Once the system has a selection of these ‘answer candidates’, it typically performs a more in-depth analysis of both the question and the answer. At this stage, the answer candidates may be fed to an answer extraction module that determines the best answer on the grounds of semantic similarity between the question and the answer candidate. This is the approach taken by the system developed by the Language Computer Corporation.

Another popular approach to finding the answer to a question is to methodically determine all the patterns that the answer to a specific type of question should have; this is the approach taken by the Russian company InsightSoft.

Finally, the current availability of huge volumes of text via the World Wide Web has sparked the development of a third approach. Companies like Microsoft and IBM are experimenting with data-intensive approaches based on an extensive use of web searches. Each search uses a variation of the question as search query, and the final answer is determined by looking at the most frequent strings in the retrieved pages.

At Macquarie University we are exploring question answering methods and their integration into web search engines. JustAsk! is the search engine used in Macquarie University’s webpages. It incorporates a natural-language interface front-end, very similar in concept to AskJeeves. AnswerFinder (www.clt.mq.edu.au/Research/answerfinder.html) is an answer extraction system that finds answers by comparing the logical forms of questions and answer candidates.

The technology for question answering systems is maturing. Some web search engines like AskJeeves incorporate simple question answering technology to find web pages that contain the answer to your question; and Google’s ability to return summaries of the web pages tailored to the user question makes it possible to find the answer to some of the questions simply by reading the summary. It’s very likely that, before long, question-answering capabilities will become a standard feature in major web search engines: search engines are set to get smarter!
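
For readers who like to see ideas in code, here is a rough Python sketch of the steps the Tech Focus article describes: preselect documents by keyword overlap, keep only text fragments that mention an entity of the expected type, and pick the most frequent candidate. Everything in it is an assumption made for illustration: the gazetteer (a stand-in for a real named-entity recogniser), the stopword list, the sample documents, and the function names are invented, and the sketch does not reflect how any of the systems named in the article actually work.

```python
import re
from collections import Counter

# Hypothetical gazetteer standing in for a trained named-entity recogniser.
GAZETTEER = {
    "location": ["Sri Lanka", "India", "Salzburg"],
    "person": ["Marco Polo"],
}
STOPWORDS = {"what", "where", "who", "does", "do", "is", "the",
             "a", "of", "in", "from", "come"}


def keywords(text):
    """Return the lower-cased content words of a text."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def expected_type(question):
    """Map the question word to an entity category (a crude heuristic)."""
    first = question.lower().split()[0]
    return {"where": "location", "who": "person"}.get(first)


def retrieve(question, documents, k=2):
    """Document retrieval step: keep the k documents sharing most keywords."""
    q = keywords(question)
    ranked = sorted(documents, key=lambda d: len(q & keywords(d)), reverse=True)
    return ranked[:k]


def answer(question, documents):
    """Return the most frequent entity of the expected type, or None."""
    wanted = expected_type(question)
    if wanted is None:
        return None
    counts = Counter()
    for doc in retrieve(question, documents):
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            # Skip fragments that share no keyword with the question.
            if not keywords(question) & keywords(sentence):
                continue
            for entity in GAZETTEER[wanted]:
                # Ignore entities that the question itself already mentions.
                if entity in sentence and entity.lower() not in question.lower():
                    counts[entity] += 1
    return counts.most_common(1)[0][0] if counts else None


# Invented sample texts, for illustration only.
DOCS = [
    "Cinnamon is a spice. Most true cinnamon comes from Sri Lanka.",
    "Sri Lanka exports cinnamon bark. Cinnamon is also grown in India.",
    "Salzburg is a city in Austria known for its music festival.",
]

print(answer("Where does cinnamon come from?", DOCS))
```

Counting how often each candidate appears is the simplest form of the data-intensive idea mentioned in the article; a fuller system would replace the gazetteer lookup with a proper named-entity recogniser and add the deeper semantic comparison between question and answer candidate.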
The Australasian Language Technology Association

In late 2002, the Australasian Language Technology Association (ALTA) was formed, and in early 2003 the inaugural executive committee was elected. The purpose of ALTA is to promote language technology research and development in Australia and New Zealand; to organise regular events for the exchange of research results and for academic and industrial training; to co-ordinate activities with those of allied fields in the region and with umbrella organisations at the international level; and to engage with institutions in the government, commercial, academic and public sectors in the pursuit of these objectives.

ALTA aims to organise regular events for the exchange of research results and for academic and industrial training, and will co-ordinate activities with other professional societies. ALTA now has a website: visit the site to find out more and to sign up to ALTA’s mailing lists.

Australasian Language Technology Summer School and Australasian Language Technology Workshop

ALTA is organising a week-long combined summer school and workshop from 8 to 12 December 2003 at the University of Melbourne. The Australasian Language Technology Summer School will consist of about 10 short courses, targeted at postgraduate students and researchers in academia and industry. There will be introductory courses on text technologies, speech technologies, statistical language processing and data-intensive linguistics. Advanced courses will be offered on a selection of the following topics: parsing, generation, dialogue systems, information extraction, question answering, agents, machine learning, and human-computer interaction. Courses will take place on 8-9 and 11-12 December.

The Australasian Language Technology Workshop will be held on Wednesday 10 December, and will provide a forum for the presentation and discussion of new research in language technology. A call for papers will be circulated in June. ALTA is also exploring the possibility of hosting an industry night on 10 December, and invites expressions of interest. The aim is to create a forum where language technology developers from industry and academia can present their technologies to the language technology community, and also to specially invited senior figures from industry, education and government. Visit the ALTA web site for more information.

Upcoming Events

National
  • Third Annual Conference for Standards and Process in Publishing (Open Publish 2003): 28-31 July 2003. Star City, Sydney.
  • 16th Australian Joint Conference on Artificial Intelligence (AI’03): 3-5 December 2003. Perth.
  • Australasian Language Technology Summer School and Australasian Language Technology Workshop: 8-12 December 2003. University of Melbourne.
  • The 8th Australian and New Zealand Intelligent Information Systems Conference (ANZIIS2003): 10-12 December 2003. Macquarie University, Sydney.

International
  • 41st Annual Meeting of the Association for Computational Linguistics (ACL-2003): 7-12 July 2003. Sapporo, Japan.
  • 11th ELSNET Summer School on Language and Speech Communication, on Language and Speech Technology in Language Learning: 7-18 July 2003. Lille, France.
  • Seventh Workshop on the Semantics And Pragmatics of Dialogue (DiaBruck 2003): 4-6 September 2003. Saarland University, Germany.
  • Workshop on Mobile and Ubiquitous Information Access: 8 September 2003. Udine, Italy.
  • Recent Advances in Natural Language Processing (RANLP-2003): 10-12 September 2003. Borovets, Bulgaria.
  • International Machine Translation Summit IX: 23-27 September 2003. New Orleans, USA.
  • 1st Indian International Conference on Artificial Intelligence (IICAI-03): 18-20 December 2003. Hyderabad, India.
  • 8th Pacific Rim International Conference on AI (PRICAI): 9-13 August 2004. Auckland, New Zealand.

Changes in the Speech Vendor World

There have been some interesting changes in the speech world in the last six months, both nationally and internationally, and SpeechWorks is a company that figures prominently in both.

Here in Australia, VeCommerce announced in February that it had entered into an alliance with SpeechWorks International, Inc. and Genesys Telecommunications Laboratories, Inc., a subsidiary of Alcatel, to create what was heralded as a new force in the global speech landscape.

And on the international stage, at the end of April, ScanSoft Inc, a Xerox spin-off, agreed to purchase SpeechWorks in a stock-swap transaction valued at US$132 million. That’s hot on the heels of ScanSoft’s acquisition of Philips’ speech-processing business in January 2003.

Who gets LT Update?

LT Update is a product of Macquarie University’s unique teaching program in the human language technologies. This program, funded under the Federal Government’s prestigious Science Lectureships Initiative, is the only teaching program in Australia that focuses on delivering a rich education in the twin areas of spoken language processing and natural language processing, widely viewed as critical technologies in the next few years. LT Update is provided as a service for alumni from this program, so it provides both a community for those with similar interests, and at the same time a very focussed channel to a group of people with particular skills. Thanks to CSIRO’s generous support, subscriptions are currently free: visit LTUpdate to register. You can also access this newsletter electronically via the site, where you’ll also find web links to all the items mentioned in this issue as well as pointers to further resources.

Editor: Robert Dale