					LE 1182 TREE

DELIVERABLE IDENTIFICATION
 Identification number           LE 1182-D-20
 Type                            Management Report
 Title                           The TREE Project - Final Report
 Status                          Draft
 Deliverable                     D20
 Date                            February 3, 1999
 Version                         1.0
 Number of pages                 20
 Author(s)                       Jeremy Ellman, Alan Goddard; MARI Group Ltd
                                 Anders Green, Joakim Nivre; Göteborg University
                                 Luca Gilardoni; Quinary
                                 Alan Wallington; UMIST
                                 Siobhan Walsh; Newcastle City Council
 WP/ Task responsible            WP0
 Project contact point:          Jeremy Ellman
                                 MARI Computer Systems
                                 Wansbeck Business park
                                 Rotary Parkway
                                 Ashington
                                 Northumberland
                                 NE63 8QZ
                                 Tel: +44 191 402 0191
                                 Fax: +44 191 402 1112
                                 E-Mail: Jeremy.Ellman@mari.co.uk
 EC project officer              Pierre-Paul Sondag
 Status                          Public
 Actual distribution             Consortium / EC
 Supplementary notes
 Key words
 Abstract                        This is the project’s Final Report.
 Status of the abstract          Public

 Received on
 Recipient's catalogue number




DOCUMENT EVOLUTION
  Version        Date           Status           Authors       Notes
  1.0            11/11/98       first draft      AG
  1.1            3/2/99         Revised          JE            Circulated for partner feedback
  1.2            9/2/99         Revised          AG            NCC Contribution included for partner
                                                               feedback
                                                                                               Language Engineering Project LE-1182 TREE




1.   Executive Summary
     1.1 Technical achievements
     1.2 Validation results
     1.3 Impact and future prospects
2.   Project Timetable
     2.1 Contractual matters
     2.2 Stages of work
     2.3 External reviews
     2.4 Conferences, exhibitions, user group meetings
     2.5 Other important events
3.   Achievements
     3.1 Demonstrator system or service
     6.5 Contact list of Project User Group.Names




LE-1182 TREE                                                                                     Final Report



1.       Executive Summary
1.1      Technical achievements

The technical objective of the TREE project has been to develop a web-based service offering multilingual access to
job vacancies, allowing users not only to search for jobs in their own language but also to get automatically
generated descriptions of the jobs in the same language.
The TREE system designed to meet this aim has evolved throughout the course of the project. Three successive
prototypes have been built; the first (P01) based on requirements from user groups, and the others (P02 and P03)
based on user feedback on the preceding version. The scope of the job areas covered has increased dramatically
from the Tourism and Leisure domain (P01) to all job sectors (P02 and P03). The architecture also changed
significantly between P02 and P03 to counter the instability caused by some elements of the underlying technology.
In its current version (P03), the system consists of three main modules: the user interface (UI), the database system
(including both a vacancy database and a terminology database), and the generator. (In addition, there are various
tools for importing data into the databases.)
The UI is based on standard Web technologies (HTML and JavaScript), and is accessible through any standard
(frames-supporting) browser. It has been tested with Microsoft Internet Explorer (version 4) and Netscape Communicator
(version 4). The UI now (i.e. P03) supports five languages (Flemish, French, English, Finnish, Swedish), as against
three languages in previous versions.
The core of the UI is the vacancy search mechanism. This uses a hierarchy of list-boxes for the selection of job titles,
and clickable maps to select the desired locations. The search results in a list of hyper-text links (of the format
‘jobtype in location’) to jobs which fulfil the search criteria entered by the user. If there are no appropriate jobs in
the location specified, the geographical scope of the search is progressively widened.
The TREE System is supported by a database system designed to hold vacancy data in a language-neutral format.
This is achieved by storing references to language dependent items, such as the job description, by means of codes
which refer to a terminology database containing equivalent terms in all the languages supported. The database
system, implemented upon an Oracle RDBMS, is therefore split in two main components, one for job vacancies and
the other for terminology.
The vacancy database has been designed with the aim of being generic enough to enable storage of any kind of job
vacancies. The schema is normalised and attributes fall into two main categories: (i) terminology data: attributes
whose values are expected to be term codes that enable translation to different languages (e.g. job title, qualifications
required); (ii) value data: attributes whose meaning is language independent (such as numbers or street addresses).
An API is provided to enable the analyser/loader module to take incoming data and to store vacancies in the correct
format, while access from the search engine is via normal SQL queries.
The terminology database stores terminology deemed relevant to the TREE domain. For each of the vacancy
attributes containing terminology data, a hierarchy of terms is maintained. For each term, it is possible to define
synonyms (including multi-word expressions) in all the supported languages.
The generator module in the TREE system has been developed with the aim of using a natural language grammar
with a simple semantic component to produce multilingual job adverts in different languages from a language
independent database representation. The generator produces HTML-coded texts to be displayed as the result of
searching the job database. The input to the generator is a string representing the general structure of the database
entry for a specific job advert together with the terminology that applies to some particular job. From the internal
structure the generator then builds a syntactically well formed text in the current language using the terminology and
grammar. The generator as such is written in Prolog with some I/O code written in C. In the latest prototype (P03)
the generator is run as a standalone application compiled into binary code using a Warren Abstract Machine
representation of the generator constructed using BinProlog 5.75.
The aim for the generator module was to provide a way to make it possible to use well known linguistic technology
to generate job adverts in natural language. The generator engine as such is not domain dependent nor is the
environment it was implemented in. The database interface, terminology and language grammars need to be
developed but other modules can be reused without much effort. This makes it possible to transfer technology to
other parts of the software industry.
Originally, the TREE system was supposed to contain an analysis module, using information extraction techniques to
gather vacancy data from free text job ads. Throughout the course of the project, however, the emphasis has
changed, becoming more focused on the input of vacancy data from large existing databases. In order to prove the
viability of this approach, two data conversion packages have been implemented, one for vacancy data from VDAB
(Belgium), the other for vacancy data from AMS (Sweden). Another kind of input module has been developed to
allow NCC job vacancies, input via a Lotus Notes form, to be automatically inserted into the database.

1.2       Validation results

For the TREE application to work as intended, it is not enough that the linguistic search mechanism functions
logically, e.g. that definitions for the interpretation of search terms and the presentation of results to the user
are established. The system must also meet the demands of usability from a psychological point of view. If the user
does not understand the functionality of TREE, it does not matter how sophisticated the techniques behind the search
mechanism are; the user will still not find any relevant vacancies. The purpose of the validation study was therefore
to investigate the usability of the TREE user interface from the perspective of human-computer interaction.
The research tasks undertaken in the evaluation of P03 reflect both quantitative and qualitative approaches. In all,
the methods used served to:
     • Monitor (via the system log) and assess user interaction with P03 in user test beds across Europe.
     • Establish user views and attitudes towards TREE (usability issues and information provision).
     • Validate the development of P03 as a user-driven product (compared to the status of P02 and other
       employment-based websites).
For the P03 user trials, a cross-site approach was adopted. Each TREE language with the exception of French was
tested, so the users were English, Swedish, Flemish and Finnish speakers. Three sites, Newcastle, Antwerp and
Göteborg, made up the sample. In Göteborg, the trials included a sample population of twenty Swedish users
and twenty Finnish users. This cross-site approach allows a comparative assessment of user feedback and a
pan-European application of the findings.
The WAMMI (Website Analysis and Measurement Inventory) questionnaire was used as an aid to assessing user
reaction to TREE. WAMMI, which was developed by Nomos, Sweden, is a visitor-orientated evaluation tool for
designing better web sites, as seen through the eyes of the end-user. It is based on a questionnaire that aims to measure
users’ subjective opinions of site attractiveness, control, efficiency, helpfulness and learnability. WAMMI
enables users to tell site developers how useful and usable they found the site.
The cross-site analysis of the results of the P03 TREE user trials is not clear-cut. In the UK, the results of the
pre-set tasks, WAMMI and interviews were in the main supportive of one another; this was not the case for
Flemish and Swedish users. Hypothetically, if users score higher on system navigation performance, one would
expect them to be more receptive to the system. It is therefore surprising that Flemish users, who on average
took twice as long to navigate TREE and, as the system log data suggested, encountered more problems, were
more receptive to TREE than Swedish users.
According to the Nomos website measurement inventory, UK users (a sample that contained proportionally more
inexperienced internet users than other sites) rated TREE well above average for all criteria, assigning a figure of
74/100 for global usability. Similarly, Flemish users gave an above-average rating for global usability, which
indicates that the site speaks the user's language, although negative values were given for other criteria. In
comparison, Swedish users gave a rating to TREE that scored below the general average for attractiveness,
controllability, efficiency, learnability and global usability. Many of these users did however provide feedback that
contained some valid constructive criticism.

1.3       Impact and future prospects

TREE would appear to be well placed for exploitation, particularly in Europe. The continuing expansion of the EU
means that more and more European citizens have the right to move freely between neighbouring EU countries, and
find work there. Indeed, it is one of the EU’s goals to facilitate the ability of EU citizens to live and work in any
other EU country. Compared to some other parts of the globe (e.g. North or South America) Europe has a great
diversity of languages within a relatively small geographical area. Owing to the highly developed transport system
EU citizens can move easily and quickly between EU member states; in the process often moving from one linguistic
zone to another. It is even possible to move between linguistic zones whilst travelling inside the same European
country (e.g. Belgium, Ireland, Spain, and the UK).
However, inability to speak the language(s) in a given country can inhibit prospective employees from looking for
jobs there, which likewise reduces the pool of suitable employees for employers in that country. There is thus a need
for a system such as TREE, which can be accessed from anywhere in the world, and provides multiple linguistic
interfaces to the same group of jobs spread across a number of countries.



There is obviously a need for such a system, but there do not currently seem to be systems other than TREE with this
profile. There are a number of systems offering jobs in one or more countries, but with a UI only in a single
language. There are thus a number of possibilities for exploiting TREE. Partners could host a TREE system and
charge large employers, national employment services, commercial employment agencies, etc. to carry their
vacancies. The software could equally be sold or franchised to a third party to host their own site. Outside Europe,
any other multi-lingual country could be a target for exploitation of TREE, Canada being a strong possibility.

2.        Project Timetable
2.1       Contractual matters

The TREE contract commenced in late November 1995 and terminated on 20 November 1998. The main change
to the composition of the consortium occurred at the end of 1997 when VDAB withdrew from the project. This
created a problem, as VDAB (the Belgian employment service) was a major provider of job vacancy information.
Fortunately talks between GU and AMS (the Swedish employment service) subsequently resulted in AMS’s
participation in the project, as a provider of job vacancy information.

2.2       Stages of work

The progression of the TREE project (as defined in the Technical Annexe) is based upon a three-iteration cycle of:
     implementing a prototype
     testing the prototype
     validating the prototype by means of user-trials, and feeding the results back into the next iteration
The main milestones of the project are the production of the three prototype systems, and their associated user trials
(see table below).
WP Number   WP Name                                        From - To             Comments
WP 0        Project Management                             Nov. 95 – Nov. 98     Co-ordinate and manage consortium. Report to CEC.
WP 1        Requirements Baseline                          Nov. 95 – May 96      Study user requirements; produce TREE functional specification. Risk analysis.
WP 2        Specification, Implementation and Assessment   May 96 – May 97       First iteration of implement-test-validate cycle.
WP 3        Specification, Implementation and Assessment   Feb. 97 – Oct. 97     Second iteration of implement-test-validate cycle.
WP 4        Intermediate Assessment                        Aug. 97 – Nov. 97     Investigate possible TREE products; alter work, exploitation plans accordingly.
WP 5        System Refinement                              Jan. 98 – Aug. 98     Third and final iteration of implement-test-validate cycle.
WP 6        Wider Validation and Dissemination             Sept. 97 – Dec. 98    Publicise TREE. Validate with additional users and feedback to WP 5.
WP 7        Exploitation, Promotion                        April 97 – July 98    Reassess user requirements, dissemination, and promotion. Feedback to WP 5.

2.3       External reviews

Two external reviews took place during the course of the project. The first took place on 14-15 January 1997 at the MARI site
in Ashington, Northumberland, UK. The second review took place on 18 March in Luxembourg.
The results of these reviews are summarised in the ‘Evaluation and Assessment’ section (below).

2.4       Conferences, exhibitions, user group meetings

TREE was presented at the Association for Computational Linguistics' Fifth Conference on Applied Natural
Language Processing, held in Washington DC, USA, in March/April 1997 (proceedings pp. 269-276):
      'Multilingual Generation and Summarization of Job Adverts: the TREE Project' (H. Somers, B. Black, J. Nivre,
      T. Lager, A. Multari, L. Gilardoni, J. Ellman, A. Rogers)
The project was also presented informally by Somers at the Australian Natural Language Postgraduate Workshop
held at the University of Melbourne, January 1998, as part of the Australian Natural Language Processing Fortnight.


Also, see the table under the heading ‘project-level dissemination, awareness, publications, etc’.

2.5      Other important events

See the table under the heading ‘project-level dissemination, awareness, publications, etc’.

3.       Achievements
3.1      Demonstrator system or service
3.1.1 main functions supported
The purpose of the work has been to produce a system in which job seekers can look for job vacancies and get
job adverts presented in a language of their choice, irrespective of the language in which the advert originated. As a part of
this, an on-line search function for European job adverts has been developed, intended for use on the Internet.
In addition, TREE provides users with links to many different sources of information of particular relevance to
those who are looking for work in another EU country. This information covers a range of topics such as
employment law, housing, legal status, etc.


3.1.2 technologies and components

3.1.2.1           Introduction
The TREE system has evolved throughout the course of the project. Three successive prototypes have been built; the
first (P01) based on requirements from user groups, and the others (P02 and P03) based on user feedback on
the preceding version. The scope of the job areas covered has increased dramatically from the Tourism and Leisure
domain (P01) to all job sectors (P02 and P03). The technological base has changed from mSQL (P01), to Oracle and
the Oracle Web Server (P02 and P03). The architecture also changed significantly between P02 and P03 to counter
the instability caused by some elements of the underlying technology.
Also, the UI was changed for each version; initially in response to user opinion, and subsequently when it was found
that the resulting UI made it very difficult for users to actually select the jobs they wanted. Initial versions (P01 and
P02) were in three languages (French, Flemish, English), whereas P03 also supported Finnish and Swedish.

3.1.2.2           User Interface
The UI is based on standard Web technologies (HTML and JavaScript), and is accessible through any standard
(frames-supporting) browser. It has been tested with Microsoft Internet Explorer 4 and Netscape Communicator 4.05.
The UI now (i.e. P03) supports five languages (Flemish, French, English, Finnish, Swedish), as against three
languages in previous versions.
The core of the UI is the vacancy search mechanism. This uses a hierarchy of list-boxes for the selection of job titles,
and clickable maps to select the desired locations. The search results in a list of hyper-text links (of the format
‘jobtype in location’) to jobs that fulfil the search criteria entered by the user. If there are no appropriate jobs in the
location specified, the geographical scope of the search is progressively widened.
Clicking on one of the hyper-text links causes details of the job to be displayed, along with the option of viewing the
original text of the advert.
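The progressive widening of the geographical scope described above can be sketched as follows. This is a minimal illustration in Python, not the project's implementation (the TREE search engine queried the Oracle database via SQL); the region hierarchy, the sample vacancies and all function names are assumptions made for the example.

```python
# Hypothetical region hierarchy: each location maps to the next wider area.
PARENT_REGION = {
    "Ashington": "Northumberland",
    "Northumberland": "UK",
    "Stockholm": "Sweden",
}

# Hypothetical vacancy records: (job title code, location).
VACANCIES = [
    ("linguist", "Stockholm"),
    ("chef", "Northumberland"),
]


def locations_within(region):
    """Return the region itself plus every location nested inside it."""
    found = {region}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT_REGION.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def search(job_code, location):
    """Search the requested location, widening the area until jobs are found."""
    region = location
    while region is not None:
        area = locations_within(region)
        hits = [v for v in VACANCIES if v[0] == job_code and v[1] in area]
        if hits:
            return region, hits
        region = PARENT_REGION.get(region)  # widen to the enclosing region
    return None, []
```

In this sketch a search for a chef in Ashington finds nothing locally and is widened to Northumberland, where the vacancy is found; a search with no match anywhere returns an empty list.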
The UI has evolved considerably over the course of the project. User groups who tried P01 requested a free-text
interface for the entry of search criteria in P02. As a result P02 was developed with a hybrid interface; a free-text
interface as the default, with the option of using a list-based Java applet. However, in practice the free-text approach
was found to be unsatisfactory, as most users who initiated searches failed to find any vacancies matching the job-
titles they had entered. As a result, the P03 interface adopted the strategy described above.

3.1.2.3           DB Structure and Design
The TREE system is supported by a repository designed to hold job advert data in a language-neutral format. This is
done by storing references to language-dependent items, such as the job description, by means of codes which refer
to a terminology repository holding linguistic variants and the hypernym and hyponym relations between terms.
The repository, implemented upon an Oracle RDBMS, is therefore split into two main components, one for job ads
and the other for terminology.
The job ads schema has been designed with the aim of being generic enough to enable storage of any kind of job ad.
The schema is normalised (in the relational sense) and attributes fall into two categories:
• terminology data: attributes whose values are expected to be term codes (fillers) that enable translation into different
  languages; examples are the job description or the qualifications needed
• value data: attributes whose meaning is language independent (such as numbers, e.g. the number of jobs posted,
  or street addresses).
An API is provided to enable the analyser/loader module to take incoming data and to store ads in the correct format,
while access from the search engine is via normal SQL queries.
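The split between terminology data and value data can be illustrated with a small sketch. It is written in Python purely for illustration; the term codes, field names and the `localise` function are hypothetical and do not reproduce the actual Oracle schema or the API mentioned above.

```python
# Hypothetical terminology table: term code -> term in each language.
TERMS = {
    "T001": {"en": "Linguist", "sv": "Lingvist"},
    "T002": {"en": "University education", "sv": "Universitetsutbildning"},
}

# A vacancy stored in language-neutral form (all field names are illustrative).
vacancy = {
    "terminology": {"job_title": "T001", "education": "T002"},  # term codes
    "values": {"number_of_jobs": 2, "postcode": "12345"},       # language independent
}


def localise(record, lang):
    """Resolve term codes for one language; copy value data unchanged."""
    view = {}
    for attribute, code in record["terminology"].items():
        view[attribute] = TERMS[code][lang]
    view.update(record["values"])
    return view
```

The same stored record yields an English or a Swedish view depending only on the language passed in, which is the property the language-neutral format is meant to provide.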
The terminology module provides storage services for terminology entries deemed relevant to the TREE domain. For
each of the job schema attributes deemed to contain linguistic expressions, a hierarchy (strictly a directed acyclic
graph, DAG) expressing taxonomies of terms is maintained. Each term is uniquely identified by a code (the filler),
which is used to fill the schema database, and for each term it is possible to define synonyms (including multi-word
linguistic expressions) in all the languages supported by TREE.
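A term hierarchy of this kind, with per-language synonyms attached to each code, might be modelled as below. This is a sketch under stated assumptions: the term codes, the example terms and the functions `ancestors` and `lookup` are invented for illustration and are not taken from the TREE terminology database.

```python
# Hypothetical term hierarchy as a DAG: term code -> codes of broader terms.
BROADER = {
    "pastry_chef": ["chef", "baker"],   # a term may have several parents
    "chef": ["catering_staff"],
    "baker": ["catering_staff"],
    "catering_staff": [],
}

# Hypothetical synonym lists per term code and language.
SYNONYMS = {
    "chef": {"en": ["chef", "cook"], "fr": ["chef de cuisine", "cuisinier"]},
    "pastry_chef": {"en": ["pastry chef"], "fr": ["pâtissier"]},
}


def ancestors(code):
    """Collect every broader term reachable from a code, visiting each once."""
    seen = set()
    stack = list(BROADER.get(code, []))
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(BROADER.get(term, []))
    return seen


def lookup(expression, lang):
    """Map a (possibly multi-word) expression to its term code, if known."""
    wanted = expression.strip().lower()
    for code, by_lang in SYNONYMS.items():
        if wanted in by_lang.get(lang, []):
            return code
    return None
```

Because a term may have more than one broader term, the traversal records visited nodes so that a shared ancestor is collected only once.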

3.1.2.4           NL Generation
The generator module in the TREE system was developed with the aim of using a natural language grammar with a
simple semantic component to produce multilingual job adverts in different languages from a language independent
database representation. This approach enables flexibility when defining the output from the system, both in terms of
how the output is formatted and in terms of consistency within the generated text, depending on the database content.



[Figure: From database entry to HTML. Schematic overview of the Natural Language Generator of the TREE system, showing the database entry string as input to the core generator engine, which draws on the text grammars and HTML formatting to produce the output HTML text.]
The generator module consists of the following parts: a core generator engine and a module for HTML encoding,
which are language independent, and a grammar, terminology lexicon and postprocessor, specific to each language.
The generator as such is written in Prolog with some I/O code written in C. In the current version of the prototype
the generator is run as a standalone application compiled into binary code using a Warren Abstract Machine
representation of the generator constructed using BinProlog 5.75.
Text Grammars
The grammars developed for the TREE project provide HTML-coded texts to be displayed as the result of searching
a job database. The input to the generator is a string representing the general structure of the database entry for a
specific job advert together with the terminology that applies to some particular job. From the internal structure the
generator then builds a syntactically well formed text in the current language using the terminology and grammar.
The input string reflects the relational structure of the database. The data from the input string is stored internally in
the Prolog engine which operates on its own internal database. Only minor consistency checking is performed during
the generation process to ensure that there is enough data to generate at least some vital information such as the job
title and the location. The generator suppresses parts of the advert text when there is no
string data or when the language-specific term cannot be found. In this way, generated adverts show only information
that adds something relevant to the advert, or that is at least explicitly specified.
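The suppression behaviour can be sketched in a few lines. The example below is in Python rather than the Prolog used by the project, and it is only an illustration: the lexicon, the field names and the choice to return no advert at all when the title or location is missing are assumptions, since the report does not specify what happens in that case.

```python
# Hypothetical English terminology lexicon: term code -> English term.
LEXICON_EN = {"T001": "Linguist", "T010": "University education"}


def generate(entry, lexicon):
    """Build advert lines, dropping any part that lacks data or a term."""
    title = lexicon.get(entry.get("job_title"))
    location = entry.get("location")
    if title is None or location is None:
        return None  # assumption: without the vital information no advert is produced

    lines = [f"{title} in {location}"]
    education = lexicon.get(entry.get("education"))
    if education is not None:          # suppressed when the term is unknown
        lines.append(f"Education requirements: {education}")
    salary = entry.get("salary")
    if salary:                         # suppressed when there is no string data
        lines.append(f"Salary: {salary}")
    return lines
```

An entry whose education code has no English term therefore produces an advert without the education line, rather than an advert with a blank or untranslated field.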
The generator starts by identifying the type of advert to generate. In this version of the system a short version and a
longer version can be generated. The short version only contains information about the job title and the location of
the job. Using the English generator a longer advert might appear as below:


         Linguist in Stockholm (Number of jobs: 2)
            * Description:
                Linguist
            * Education requirements:
                + University education
            * Specifications:
                Work time: full-time day - 40 hours per week - 5 days per week
                Duration: full-time
                Age: from 22 to 28 years
                Experience: some/extensive experience
                Salary: per month, 12000 - 27000 SEK
                Language skills: English (excellent), Dutch (excellent), Danish (good)
            * Contact:
                   in writing with CV for the attention of
                   J. Doe
                   Foo AB
                   Box 2402
                   12345 STOCKHOLM
                   http://www.foo.se
                   doe@foo.se


The grammar rules have the following form, where each B is a grammatical category and each C is a constraint on
the rule (in effect a Prolog call to the database).
Head ---> B1, ..., Bn # C1, ..., Cn.
Optionally, the body can be a list of words. For instance, the grammar rule that presents the URL of the company
looks like this:

       text:sem(url, Vac) --->
         ['<A HREF="'], [Url], ['">'],
         [Url],
         ['</A>']
          #
          url(Vac, sem(url, Url)).

The last clause has the form url(ID, Sem) and is a call to one of the interface predicates to the internal database.
When a value is specified in the URL slot in the database, the string is returned, and the HTML code linking to
the employer is generated and shown in the advert:


           <A HREF="http://www.foo.se">http://www.foo.se</A>
By combining the special functionality of the core generator with the full strength of the Prolog interpreter it is
possible to use Prolog debugging features and control predicates to support the development of grammars.
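For readers unfamiliar with Prolog, the interplay between a rule body and its constraint can be approximated in Python. The sketch below is an analogue of the URL rule only, not a translation of the generator; the in-memory `DATABASE` and both function names are hypothetical.

```python
# Hypothetical internal database: vacancy id -> slot values.
DATABASE = {"vac1": {"url": "http://www.foo.se"}, "vac2": {}}


def url_constraint(vac):
    """Analogue of url(Vac, sem(url, Url)): return the URL or None if unset."""
    return DATABASE.get(vac, {}).get("url")


def text_url(vac):
    """Analogue of the text:sem(url, Vac) rule: emit tokens if the constraint holds."""
    url = url_constraint(vac)
    if url is None:
        return []  # constraint failed, so the rule contributes no text
    return ['<A HREF="', url, '">', url, '</A>']
```

Joining the tokens returned for a vacancy with a URL gives the same anchor element as in the example output, while a vacancy without a URL contributes nothing to the advert.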
Concluding Remarks on Generator
The process of grammar development is highly dependent on the target language. For Swedish, the
grammar was constructed using the English generation grammar as a model, and only minor changes were needed.
Finnish, however, needed great consideration, e.g. in order to handle case properly. Because no Finnish-speaking
Prolog programmer was available, very close co-operation with a Finnish translator was required. Provided that
language experts are available for the language in which a grammar is to be written, programmers fluent
in that language are not strictly necessary for each new grammar.





The aim for the generator module was to provide a way to make it possible to use well known linguistic technology
to generate job adverts in natural language. The generator engine as such is not domain dependent nor is the
environment it was implemented in. The database interface, terminology and language grammars need to be
developed but other modules can be reused without much effort. This makes it possible to transfer technology to
other parts of the software industry.

3.1.2.5           Analysis and NLP Issues
The TREE system is designed to deal with large databases of job advertisements, specifically those of AMS and
VDAB. These databases represent job adverts in a structured form, with a number of fields containing different types
of information e.g. job titles, location, minimum age, qualifications etc, and with a number of different coding
conventions used to represent the information e.g. plain text, numerical codes.
For each database a program reads the adverts from a text file dump. Fields will be skipped if the information is not
to be used by TREE. Otherwise the information will either be converted into TREE codes if there are language
independent terms to represent the information, or left as it is, if the information will be the same in all supported
languages e.g. numbers, or text that will not be translated such as addresses.
Finally, this "TREE format" information is input into the job schema database via a 'C' API.
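The skip, convert or copy treatment of fields described above can be pictured with a minimal sketch. It is not the project's program (which fed the schema database through a 'C' API); the field names, source values and TREE codes below are invented purely for illustration.

```python
# Hypothetical field rules: the field names, source values and "TREE codes"
# are invented for illustration and are not the real AMS, VDAB or TREE codes.
SKIP, CONVERT, COPY = "skip", "convert", "copy"

FIELD_RULES = {
    "internal_ref": (SKIP, None),                # not used by TREE
    "job_title": (CONVERT, {"kok": "JOB:COOK",   # language-independent codes
                            "kelner": "JOB:WAITER"}),
    "min_age": (COPY, None),                     # same in every language
    "address": (COPY, None),                     # text that is never translated
}


def to_tree_format(record):
    """Apply skip/convert/copy to one source record (a dict of field -> value)."""
    out = {}
    for field, value in record.items():
        # Fields without a rule are treated as skipped in this sketch.
        action, table = FIELD_RULES.get(field, (SKIP, None))
        if action == SKIP:
            continue
        if action == CONVERT:
            # Unknown source values are left out rather than guessed.
            if value in table:
                out[field] = table[value]
        else:  # COPY
            out[field] = value
    return out


advert = {"internal_ref": "X1", "job_title": "kok",
          "min_age": "18", "address": "Storgatan 1"}
tree_record = to_tree_format(advert)
```

Keeping the rules in a table rather than in code means that supporting a further source database would, under these assumptions, mainly involve writing a new rule table.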
A further version of the analyser was developed to allow NCC job vacancies, input via a Lotus Notes form
developed as part of the project, to be automatically sent by e-mail, and inserted into the database.
As the scope of TREE became more ambitious with the transition from PO1 to PO3, the role of analysis in
TREE changed, and with it the nature of the analysis module. Paradoxically, given the expansion in TREE's
technical ambitions, the demands made on the analysis module became fewer, and in particular in PO2 and
PO3 the new technology of Information Extraction was not used to anything like the extent that was envisaged at the
start of the project. The original assumptions concerning the role of analysis in TREE are described first below,
followed by the reasons why they were not followed through in PO2 and PO3.
Information Extraction
It was originally assumed that TREE would cover just one employment area, namely the hotel and catering industry.
The number of job titles would correspondingly be limited, as would the likely types of employers (e.g. hotels,
restaurants etc.), the likely qualifications and skills required, and probably the nature of the benefits. Indeed, it would
not be unreasonable to talk about typical catering jobs as opposed to typical jobs in the computing industry. It was
also assumed when TREE was first started that the system design would permit users offering jobs to submit job
advertisements via an e-mail feed, more or less without restrictions.
To deal with this restricted range of job advertisements, the analysis technique chosen fell into the relatively new
paradigm of analogy or example-based processing. In the following paragraphs we explain the analysis process and
discuss our reasons for preferring this to a more traditional string matching or parsing approach.
Example-Based Processing
The input that the TREE system will accept is partially structured, but with much scope for free-text input. One
possible way of analysing this would be to employ a straightforward pattern-matching approach, searching for
"trigger phrases" such as employer:name is seeking job-title, with special processors for analysing the slot-filler
portions of the text. This simple approach has certain advantages over a more complex approach based on traditional
phrase-structure parsing, especially since we are not particularly interested in phrase-structure as such. Furthermore,
there is a clear requirement that our analysis technique be quite robust: since the input is not controlled in any way,
our analysis procedure must be able to extract as much information as possible from the text, but seamlessly ignore -
or at least allocate to the appropriate "unanalysable input" slot - the text which it cannot interpret.
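The trigger-phrase idea can be sketched as follows. This is only an illustration of the pattern-matching alternative discussed above, not code from the project; the single pattern, the slot names and the sample sentences are assumptions.

```python
import re

# One hypothetical trigger pattern: "<employer> is seeking [a|an] <job title>".
# A realistic system would need many such patterns, each written by hand.
TRIGGER = re.compile(
    r"^(?P<employer>.+?) is seeking (?:a |an )?(?P<job_title>.+?)\.?$"
)


def analyse(sentence):
    """Fill slots from a trigger phrase, or fall back to an 'unanalysable' slot."""
    text = sentence.strip()
    match = TRIGGER.match(text)
    if match:
        return {
            "EMPLOYER:NAME": match.group("employer"),
            "JOB-TITLE": match.group("job_title"),
        }
    # Robustness requirement: text that cannot be interpreted is kept, not lost.
    return {"UNANALYSABLE": text}


filled = analyse("Hotel Foo is seeking a head chef.")
unparsed = analyse("Must have own transport")
```

The fallback branch reflects the robustness requirement: input that matches no pattern is still stored, so nothing the advertiser wrote is silently discarded.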
However, both these procedures are essentially "rule-based", in the sense that the linguistic data used for
matching, whether fixed patterns or syntactic rules, must be explicitly listed in a kind of grammar. This implies a
number of disadvantages, which we will mention shortly. An alternative is suggested by the paradigm of
"example-based" processing (Jones, 1996), now becoming quite prevalent in MT (Sumita et al., 1990; Somers, 1993),
though in fact the techniques are very much like those of the longer established paradigm of case-based reasoning.





A flexible approach
In the example-based approach, the "patterns" are listed in the form of model examples, such as typical hotel and
catering job advertisements from a database that would be used by TREE. Semi-fixed phrases are not identified as
such, nor are there any explicit linguistic rules. Instead, a matcher matches new input against the database of already
(correctly) analysed models, and interprets the new input on the basis of a best match (possibly out of several
candidates). Robustness is inherent in the system, since "failure" to analyse is relative.
The main advantage of the example-based approach is that we do not need to make explicit what the linguistic
patterns look like. Instead, the common patterns will be implicit in the database. To see how this works to our
advantage, consider the following. Let us assume that our database of already analysed examples contains an
advertisement which includes the following: Knowledge of Dutch an advantage, and which is linked to a schema
with slots filled roughly as follows:
         SKILLS:LANGUAGE:LANG:nl
         SKILLS:LANGUAGE:REQ:"an advantage"
Now suppose we want to process advertisements containing the following texts:
         Knowledge of the English language needed. (1)
         Some knowledge of Spanish would be helpful. (2)
         Very good knowledge of English. (3)
In the rule-based approach, we would probably need a "rule" which specifies the range of (redundant) modifiers
(assuming our schema does not explicitly store the level of language skill specified), and which states that fillers
for the REQ slot can be a past participle, a predicative adjective or a noun, that they are optional, and so on. Such
rules carry with them a lot of baggage, such as optional elements, alternatives, restrictions and so on. The biggest
baggage is that someone has to write them.
In the example-based approach, we do not need to be explicit about the structure of the stored example or the inputs.
We need to recognise Dutch, English and Spanish as being names of languages, but these words have
"terminological status" in our system. If the system does not know "would be helpful", it will guess that it is a
clarification of the language requirement, even if it may not be able to translate it. Furthermore, we can extend the
"knowledge" of the system simply by adding more examples: if they contain "new" structures, the knowledge base is
extended; if they mirror existing examples, the system still benefits since the evidence for one interpretation or
another is thereby strengthened.
The matching algorithm
The matcher, which has been developed from one first used in the MEG project (Somers et al., 1994), processes the
new text in a linear fashion, having first divided it into manageable portions on the basis of punctuation, lay-out,
formatting and so on. The input is tagged using a standard tagger, e.g. (Brill, 1992). There is no need to train the
tagger on our text type, because the actual tags do not matter, as long as tagging is consistent.
The matching process then involves "sliding" one phrase past the other, identifying "strong" matches (word and tag)
or "weak" (tag only) matches, and allowing for gaps in the match, in a method not unlike dynamic programming. The
matches are then scored accordingly. The result is a set of possible matches linked to correctly filled schemas, so that
even previously unseen words can normally be correctly assigned to the appropriate slot.
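To make the scoring idea concrete, the sketch below uses ordinary local alignment by dynamic programming as a stand-in; it is not the MEG matcher. The weights, the hand-assigned tags (not real tagger output) and the second example with its MIN-AGE slot are all assumptions made for illustration.

```python
# Assumed weights: a strong match (word and tag) outscores a weak one (tag only).
STRONG, WEAK, MISMATCH, GAP = 2, 1, -1, -1


def pair_score(a, b):
    """Score one pair of (word, tag) tokens."""
    (word_a, tag_a), (word_b, tag_b) = a, b
    if tag_a == tag_b:
        return STRONG if word_a.lower() == word_b.lower() else WEAK
    return MISMATCH


def match_score(new, example):
    """Best local alignment score between two tagged phrases, gaps allowed."""
    rows, cols = len(new) + 1, len(example) + 1
    table = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = max(
                0,  # start a fresh alignment here
                table[i - 1][j - 1] + pair_score(new[i - 1], example[j - 1]),
                table[i - 1][j] + GAP,  # skip a token of the new text
                table[i][j - 1] + GAP,  # skip a token of the example
            )
            best = max(best, table[i][j])
    return best


def best_example(new, examples):
    """Return the stored example whose tagged text aligns best with the input."""
    return max(examples, key=lambda ex: match_score(new, ex["tagged"]))


EXAMPLES = [
    {"tagged": [("Knowledge", "NN"), ("of", "IN"), ("Dutch", "NNP"),
                ("an", "DT"), ("advantage", "NN")],
     "schema": {"SKILLS:LANGUAGE:LANG": "nl",
                "SKILLS:LANGUAGE:REQ": "an advantage"}},
    {"tagged": [("Minimum", "JJ"), ("age", "NN"), ("18", "CD")],
     "schema": {"MIN-AGE": "18"}},  # hypothetical slot name
]

new_text = [("Some", "DT"), ("knowledge", "NN"), ("of", "IN"), ("Spanish", "NNP"),
            ("would", "MD"), ("be", "VB"), ("helpful", "JJ")]
winner = best_example(new_text, EXAMPLES)
```

Under these assumptions the input aligns with the stored language-skill example through two strong matches ("knowledge", "of") and one weak match ("Spanish" against "Dutch"), so its schema would be the model for filling the slots of the new text.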
The approach is not without its problems. For example, some slots and their fillers can be quite ambiguous: cf.
moderate German required vs. tall German required (!), while other text portions serve a dual purpose, for example
when the name of the employer also indicates the location. However, the matcher is extremely flexible, and if on-line
or e-mail feedback to the user submitting the job advertisement is assumed, the analysis module should be able to
degrade gracefully in the face of such problems.
Information Extraction in PO2 and PO3
The change from PO1 to PO2 saw a major change in the nature of job input and in the nature of the jobs themselves.
The principal source of jobs would no longer be individuals or companies e-mailing in their job advertisements; it
would be unlikely that the prototypes would accumulate a sufficiently impressive database of job advertisements if
this had remained the case. Instead, a large existing database of job advertisements was to be used. Furthermore, it
was decided to allow job advertisements in all job sectors. Both changes could be accommodated by bringing in the
Flemish Employment Exchange VDAB.
The VDAB job database is structured with different fields containing different types of information e.g. job titles,
location, minimum age, salary, qualifications and so on. One field contains up to 270 characters of free-text for
additional information that the employer might wish to give.
As far as information extraction was concerned there were a number of consequences of this change. Firstly, for
almost all the information that we would wish to extract from a job advertisement, the extraction had already been
done, and the task for analysis was one of converting a VDAB code or representation into the TREE code. To this end, a
program was written that would either "skip", "convert" or "copy" the information in the VDAB fields into the
TREE Schema.
There remained the free-text fields in the VDAB database, and consideration was given to using the MEG analyser to
extract information from these. This proved unsuccessful for two reasons:
Firstly, and most importantly, there was no information in the free-text fields that corresponded to information slots
in the TREE schema; these could all be filled using the information in the structured part of the VDAB database.
Secondly, example-based information extraction requires a corpus of typical examples. However, the point about the
VDAB free-text fields was that information that didn't fit in anywhere else went in these fields. All the standard
aspects of the job advertisement had already been placed in their own fields. Consequently what was typical of the
free-text fields was their very lack of typicality. The option of adding slots to the TREE schema for the sort of
information that could be extracted from the free-text fields was not taken up.
The work on example-based information extraction was not totally abandoned at this stage, although the effort
expended on this particular task was reduced. Instead, work continued on converting the MEG analyser, and a search
was made for job databases that would be broadly comparable to VDAB in terms of size and type of job covered, but
which would be predominantly free-text.
It must be stressed that compared to the original aims, this was an ambitious undertaking. The move to all job sectors
requires a very much larger corpus of model job advertisements than would be the case if we were only dealing with
the hotel and catering trades. Although some aspects of a job advertisement are likely to be common to all job
advertisements, one would not expect an advertisement for the chief executive of a large company to be formatted in
the same way as an advertisement for casual bar staff. Then, once the model examples have been collected, they need
to be analysed correctly by a human. This involves taking the list of schema slots, which represents the information
that TREE considers worth extracting, and then determining what portion of each job advertisement corresponds to
any of the schema slots.
Job Hunter
A search for large databases of job advertisements in predominantly free-text form was made. Surprisingly few
instances were found. Job Hunter <http://www.jobhunter.co.uk>, however, was one such case. This is a compilation
of the job advertisements that appear in most of the principal regional newspapers in the UK.
Unfortunately, using Job Hunter proved much harder than expected for a number of reasons:
     There appeared to be "house styles" with a job from one newspaper having much in common in terms of style
      and layout with other advertisements from the same newspaper. This meant that model examples ideally needed
      to be found for the different newspapers as well as for many of the different job sectors and so a large corpus of
      examples was required. Consequently, a long time was spent associating advertisements with their correct TREE
      slots. A total of 300 jobs were so analysed for a corpus, but the results when used were very poor, suggesting
      that the corpus wasn't large enough.
     Much of the information, in particular geographical and contact information could only be understood by
      someone with local knowledge. This reflects the fact that the advertisements originated in local newspapers. For
      example, the first part of a telephone number representing the local area may be omitted if telephoning within
      the same local area.
These problems, together with the fact that many large databases of jobs were already structured, led us to abandon
example-based information extraction for PO3.

3.2       Software and hardware requirements

The delivered TREE prototype is based on an Oracle RDBMS, which is fed or accessed for maintenance by application
programs mostly written in C, C++, PL/SQL, Java and Prolog (integrated into C), and accessed by Web-based services
implemented on an Oracle Web server with HTML/JavaScript.
The current prototype runs on Sun workstations (Solaris 2.5 or higher) with Oracle 7.3/8. A Java 1.1 compliant virtual
machine is needed for running a tool to inspect terminology, while modification of software modules could require a
Java development environment (Symantec Visual Café 2.5 was used, but others are possible), C/C++ compilers (gcc
2.7.2 or higher) and the Prolog compiler BinProlog 5.75.



The system could be made available on other Unix architectures, provided that Oracle and the necessary compilers are
available for them. Moving to a different RDBMS (e.g. Sybase) or HTTP server (e.g. Apache) does not pose
architectural problems. However, major changes would be necessary, as another mechanism would have to be written to
replace the PL/SQL and its supporting infrastructure.

3.3       Prerequisite software and portability

The software produced by the TREE consortium is strongly tied to the particular software packages on which it is
based: principally the Oracle database server (version 7.3), the Oracle Web Server and the PL/SQL Agent. Although
all major database products are based on ANSI-standard SQL, they all offer their own non-standard SQL extensions.
Likewise they all have their own customised API and/or other access mechanism (PL/SQL in Oracle's case). This
means that portability is not always a simple goal to achieve.
Oracle was chosen because it is an industrial-strength product, which offered an integrated Web-database solution.
PL/SQL is a procedural extension of SQL (itself a non-procedural language). PL/SQL procedures can be compiled
and stored in the Oracle database. They can then be run using the PL/SQL Agent. The Oracle Web Server can be
configured to recognise which cgi-type requests are to PL/SQL procedures/packages. It then channels these requests
to the PL/SQL Agent.
It is thus possible to write PL/SQL procedures/packages which accept input data from an HTML form, use this data
to compose SQL queries to the database, and return a fully formatted HTML page to the browser to display the
result.
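The request flow just described (form input, composed SQL query, formatted HTML page) is sketched below. In TREE this was done with PL/SQL procedures stored in Oracle and run by the PL/SQL Agent; Python with an in-memory SQLite database is used here purely as a stand-in, and the table, columns and sample rows are hypothetical.

```python
import sqlite3
from html import escape


def vacancies_page(conn, job_title):
    """Turn one form field into a query and return the result as an HTML page."""
    rows = conn.execute(
        # A bound parameter keeps the form input out of the SQL text itself.
        "SELECT title, location FROM vacancy WHERE title LIKE ? ORDER BY location",
        (f"%{job_title}%",),
    ).fetchall()
    items = "".join(
        f"<LI>{escape(title)}, {escape(location)}</LI>" for title, location in rows
    )
    return f"<HTML><BODY><H1>Vacancies</H1><UL>{items}</UL></BODY></HTML>"


# Hypothetical table and data standing in for the job schema database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vacancy (title TEXT, location TEXT)")
conn.executemany(
    "INSERT INTO vacancy VALUES (?, ?)",
    [("Chef", "Newcastle"), ("Waiter", "Göteborg"), ("Head chef", "Antwerp")],
)
page = vacancies_page(conn, "chef")
```

The sketch also suggests why portability is limited: the query logic and the page formatting live in one procedure, so replacing the database or web server means rewriting both together.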
The above functionality was found to be a powerful tool in the development of the TREE software (and certainly
preferable to the immensely convoluted Oracle 'C' API, which was the alternative). However, the use of these
technologies does mean that moving to another web server and/or database would require a significant portion of
the TREE code to be rewritten in some other language or script.
On the other hand, a number of elements are wholly or largely portable:
     the HTML (in five linguistic variations)
     the scripts for the creation of the database schema
     the hierarchies of job-codes (and other terms) translated into five languages



3.4       Research results
3.4.1 Key research areas addressed and advances made
There was considerable pressure from users (VDAB, NCC) to cover all employment areas and build a useful and
viable system for multilingual access to job vacancy information. The project was consequently less innovative and
experimental than planned with regard to research objectives. However, two areas deserve to be mentioned.
The first is the area of text generation, where the approach developed in the TREE project has some interesting
features. First of all, it is an integrated approach, combining not only grammar rules, templates and canned text into a
single formalism, but also text planning, sentence planning and sentence realisation into a single efficient process of
text realisation. Secondly, it is a structure-driven approach, which means that the generation process is guided
primarily by the aim of generating a well-formed text, with semantic content being instantiated and refined as a
"side effect", using semantic database constraints as restrictions on the applicability of (text) grammar rules. We
believe this to be a very useful and efficient strategy in applications where the generated texts have a predictable
and fairly rigid (though not completely fixed) structure. The approach is currently being tested on other domains, and
a lengthy article presenting the results is in preparation (Nivre, J. & Lager, T. Constraint-based text generation: A
structure-based approach. Manuscript).
The second area is that of information extraction, which was originally intended to play a major role in the TREE
system. However, for reasons already outlined above, the emphasis of the project shifted in such a way that very little
effort has in fact been devoted to this problem (cf. section 3.1.2.5).

3.5       Other results
3.5.1 Project-level dissemination, awareness, publications, etc.
Consortium members participated in a number of activities to disseminate information on the TREE project, and to
promote the exploitation of the product:






Meeting                                      Date                   Attendee(s)    Location
Project Line Conference                      11-12 January 96       MARI           Luxembourg
LE Concertation Meetings                     November 1995–         MARI           Luxembourg, etc.
                                             November 1998
UK IGC                                       6-7 June 96            NCC            Newcastle
NCC: Exploitation Meeting with               Sept 96                NCC            Newcastle
Siemens Electronics.
presentation to LE symposium                 October 1996           GU             Göteborg
"Communication or Cacophony?
Opportunities for Language Engineering
in Information Society"
presentation at national conference at the   January 1997           Quinary        Rome
Ministry of Telecommunications
 EURHOTEC Trade Show — the pan-              February 1997          NCC            Amsterdam
European Hotel Technology Exhibition
and Conference
Reed International                           25 February 97         MARI           Newcastle
presentation (including demonstration) at    March 1997             MARI
LE Concertation Meeting
presentation at the UNICOM Conference        March 1997             MARI           London
"Natural Language Processing:
Extracting Information for Business
Needs" (Published paper.)
presentation at the Association for          3-4 April 97           UMIST          Washington DC,
Computational Linguistics Fifth                                                    USA
Conference on Applied Natural
Language Processing
participation at the ICT Conference, IEE     April 1997             NCC            Birmingham
(Institute of Electrical Engineers)
presentation at exhibition on business       June 1997              Quinary        Rome
information services (Published paper.)
Gateway to Europe Conference                 October 1997           NCC            Newcastle
presentation at the 3rd ERCIM Workshop       November 1997          MARI           Obernai
"User Interfaces for All"
presentation at the Australian Natural       January 1998           UMIST          Melbourne
Language Postgraduate Workshop
London Enterprise Agency                     2 February 98          YN             London
Centrepoint Streets Ahead Job Agency         3 February 98          YN             London
St Basil's Foyer ENTA Training Centre        11 February 98         YN             Birmingham
London Connection                            26 February 98         YN             London
Joint Meeting – Anti-Poverty Advisory        27 February 98         NCC            London
WG
Telematics Applications Conference           March 1998             NCC, Quinary   Barcelona
Stamford Foyer                               6 March 98             YN             Stamford








Meeting                                     Date                     Attendee(s)             Location
YMCA England Training for Life              18 March 98              YN                      London
Centrepoint Foyers                          19 March 98              YN                      Peterborough
NACRO                                       1 April 98               YN                      London
Newcastle Foyer                             9 April 98               NCC, YN, MARI           Newcastle
GOSIP                                       29 April 98              NCC                     Newcastle
demonstration at language technology        May 1998                 GU                      Göteborg
exhibition
Foyer Federation                            6 May 98                 YN                      London
SAHA                                        7 May 98                 YN                      London
Europe for Youth Conference                 8/9 May 98               YN                      London
Offenders Employment Forum                  18 May 98                YN                      London
Centrepoint – New Deal                      26 May 98                YN                      London
presentation at Transnational EVS           June 1998                NCC
Seminar "On the Move" with French
and UK Foyers, German
Jugendwohnheim and youth projects and
Danish projects, and SOS (DGXXII)
Gateshead Youth Organisations Council       11 June 98               NCC                     Gateshead
Tyneside Foyer                              11 June 98               NCC                     Newcastle
National Foyer Federation Conference        July 1998                NCC                     London
Promotion of TREE at meeting with UK        July 1998                NCC                     Newcastle
cabinet minister
Presentation to Youth Exchange Centre       September 1998           NCC                     Newcastle


3.5.2 Contribution to standards making, licenses and patents generated.
TREE has conformed to the standards of publishing on the World Wide Web. No patents have been generated.

4.       Evaluation and Assessment
4.1      Validation

The results of the user trials were encouraging, although not uniform across all language groups.
The following feedback came for example from the UK user trials:
User's First Impressions: 90% reacted positively to the TREE front-end. In their accounts, most commented that
         TREE appeared simple and basic to use. Only 10% said that TREE appeared daunting at first
         glance.
Information Content; Language and Terms: All users understood the use of language and terms used in TREE.
         Many expressed that the language and terminology was straightforward and catered for an open audience of
         people.
Vacancy Details: Whilst the majority felt that the details provided on TREE were well presented and met their basic
         information requirements, most of these went on to say that more information was needed, particularly if the
         job required relocation to another EC country.
Navigation: 95% said that in general terms, they did not find TREE difficult to navigate.
Functionality of Icons: All users fully understood the purpose of the icons. Some of these said that this reflected the
         demonstration provided by the reviewer.
Interactive Map: Over one half of the sample thought that the map was very user friendly. 45% of users were less
         receptive to the map mechanism. Most of these made references to the difficulty associated with identifying
         towns and specific areas within a city.
Search: User Friendliness: All respondents expressed that the structured search tool was a helpful method for
         retrieving relevant job vacancy details. 95% indicated that they found it very easy to use.
Search Tool Preference: 70% said that they preferred the existing structured search facility rather than a free text
         entry tool. 25% of respondents expressed a preference for a free text entry search tool and one user
         recommended that TREE should have both methods available to users, as the two are complementary to one
         another.



Belgian users were significantly more sophisticated, and gave specific advice on fonts and use of icons. For example
it was reported that the Dutch flag does not represent Flemish to Belgians. Their reaction to TREE was not as
positive as that of UK users (although still acceptable), possibly because they suffered slow access to the system.
Swedish users were less positive about TREE.

4.2       Feedback

Feedback was obtained from both formal reviews (of which there were two), and user groups.
Review reports
Review Jan 97
Following the 2 day review the reviewers agreed that the project should continue but with the proviso that a number
of suggestions should be followed. These proposals were as follows:
     Revisit User Requirements Survey
     Revision of the TREE business plan
     Revision of the TREE project plan
     To find a second 'user' partner to replace ManPower.
The reviewers expressed a generally favourable attitude towards the aims of the project, the cohesion and technical
abilities of the consortium and the technical progress made. One specific comment concerning the user-interface of
the demonstrator was that users should not be allowed to enter free-text for a given form field if there were only a
limited number of valid options and that such form fields should be menu driven (thus limiting users' options to
those available in the menu).
Review March 98
The reviewers noted that since the last review there had been: 'Evident commitment of the consortium to implement
the reviewers' recommendations'. The overall view of the reviewers was favourable, although there were
reservations on some points.

Of the twelve headings given specific scores, two were rated 'poor', seven as 'satisfactory' and three as 'good'. The
areas criticised were 'User involvement and commitment' and 'Ability and commitment to exploit the results'. The
points praised were 'Promotion and dissemination activities', 'Level of European added value' and 'Analysis of
market sectors for application of the results, and exploitation plans'.
The reviewers recommended that:
'In the remainder of the project lifetime, the consortium is to take vigorous effort to stimulate broader user interest in
the project, for instance through dedicated workshops with national, as opposed to local authorities, agencies or
other institutions related to the employment market. Validation should start ASAP, and details on this should be
communicated to the EC (validation plan). Exploitation plans from the public and private users should likewise be
submitted to the EC well before the project is over.'
User group meetings
Throughout the duration of the project, MARI Group Ltd, Newcastle City Council and (in the final year) its
subcontractor YouthNet have publicised the TREE service in an attempt to exploit its potential to employers, job
seekers, careers advisors, existing job vacancy/placement networks and other organisations, and voluntary agencies
working with young people.
Audience reaction and interest has generally been good from the non-governmental organisation (NGO) sector. The
resulting action has been varied, since levels of understanding, availability of IT resources and involvement in
international work also differ enormously between the different agencies. The lack of IT resources among certain
agencies hinders dissemination, since one method is to send information about TREE to projects using e-mail.
The educational sector, on the other hand, tends to be well equipped with IT, to possess a good understanding of the
possibilities of IT, and to react positively. The presentation at the International Youth for Europe Conference, with
delegates from 14 countries, was received positively and led directly to contacts with Careers. A group of delegates
is also planning to use TREE to facilitate trans-national work experience exchanges.
Those NGOs with a clearly defined purpose that are part of a trans-national network in turn create comprehensive
Internet services. For example, the European Youth Hostel Federation has a web site, which has sites for its 14 country
members, which represent 1681 hostels. In turn they are part of the International Youth Hostel Federation, which has
had 300,000 hits on its web site. Bookings can be made on-line. The European Alliance of YMCA (EAY) represents
3 million users and also has a well set-up network, right across Europe, with links to sites in 18 countries and
information about YMCA facilities in another 15 countries. These agencies have tended to be receptive to the
concept of TREE and both these networks have agreed to host a link to TREE on their sites.
There was a positive reaction from the delegates at the Trans-national EVS Seminar "On the Move", but most
delegates were not equipped with Internet facilities to enable them to participate within the deadlines of the EVS
programme.
Those agencies that are starting out and are still at the stage of "conscious ignorance" are trapped in a vicious
circle. Because they are not well equipped with Internet facilities, staff cannot access the web, send e-mail or
participate in information exchange and therefore underestimate the value of the Internet; this opinion was voiced by
some agencies at the Foyer Federation conference.
There are also agencies which have set up web sites but have not maintained them, allowing them to atrophy,
perhaps when there is a management change. Invariably their web site remains under construction for long periods
of time, which in turn means that the number of users does not increase.
Likewise there are variations in the perceived need for TREE, even among those agencies that have IT resources and
expertise. The spread of the use of English within trans-national agencies has reduced the need for TREE: in one
EC Department there has been a move to require all trans-national placements to be posted on a database in
English only, rather than in more EC languages, as would be expected.
Finally, Göteborg University has had close contacts with the Swedish national employment organisation, AMS, a
large potential user of the TREE technology. This has resulted in the use of AMS terminology resources and vacancy
data for the third TREE prototype, as well as support from AMS for user trials of the Swedish version, which were
carried out with real job seekers at AMS offices. During 1998 a series of bilateral meetings between Göteborg
University and AMS took place, both in Göteborg and Stockholm, where one of the items discussed was the future
exploitation of TREE. AMS seems to be interested in an extension of TREE, on the one hand to Swedish immigrant
languages, and on the other hand to the matching of job seekers as well as vacancies.

4.3      Internal collaboration

The members of the Consortium collaborated well together (in the 1998 Project Review, the comment under the heading
'Project cohesion and synergy' is 'Research-industry partnership seems to work well').
Given that the whole basis of the TREE project was its trans-national nature, it would have been difficult to achieve
the same results with a development team drawn from any one country. Being able to specify how the system should
work depended on having viewpoints on a number of different facets of life in a variety of EU countries. Educational
systems, qualifications, unemployment/benefits systems, post codes and other location identifiers were just some of
the subjects which needed a wide overview, to avoid adopting a convention which would map well onto conditions
in one country, but would be wholly inappropriate in another.
Inevitably, collaboration between a number of different organisations adds an overhead compared to working within
a single organisation. However, few individual organisations could furnish the diversity of experience and expertise
needed for this project. As with all collaborative projects, a certain spirit of compromise and collaboration is needed,
given the lack of a hierarchical command structure. The TREE consortium managed to collaborate successfully.
The areas of expertise necessary for the project required organisations from radically different backgrounds. NLP
expertise could probably only have been obtained from an academic organisation. Vacancy data required major
employment services and other large employers, and Web and database skills were most appropriately sought from
commercial companies.
The number of technical partners was not large. However, even with only four partners producing code and software
modules, the iterative cycles of integration become significantly more involved and complex. Experience of partners
has shown that even a slightly higher number of active partners (say six or seven) makes even the best managed
integration exercise a major headache.

5.       Conclusions and future prospects
5.1   Synthesis and conclusion
5.1.1 Technical feasibility
The TREE prototype has been through three iterations, and the architecture is now considered to be stable and reliable.
The run-time section of TREE is built from standard components: a Web server, a database and a CGI-style script
mechanism. In P03 the implementations of these components were Oracle Web Server, Oracle database and PL/SQL,
respectively. Thus the current implementation of TREE is closely linked to the Oracle environment. However, with a
certain amount of work it would be quite possible to port the current TREE code onto any suitable combination of
Web server, database and scripting language.
There are a number of specific issues affecting the technical feasibility of TREE. These are discussed briefly below.
Scalability
TREE is based upon standard technologies and an industrial-scale database. Given an adequate platform and
infrastructure there is no obstacle to TREE's scalability. The main areas where scalability needs to be considered are:
•   the quantity of data in the database. The main source of growth will be the number of vacancies held. As
    well as the vacancy data itself, each vacancy has a number of pre-generated display strings stored in the
    database. For each vacancy there are n*2 database rows of stored strings (where n is the number of languages
    supported, and '2' refers to the fact that both long and short format display strings are stored). Oracle should be
    capable of coping with any foreseeable increases in this area.
•   the number of hits on the Web server. Again, standard Web solutions should be able to cope with this problem.
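The growth in stored display strings described above can be illustrated with a short calculation. This is a minimal sketch only: the function name and the example figures (10,000 vacancies, 5 languages) are hypothetical and are not taken from the project.

```python
FORMATS_PER_LANGUAGE = 2  # long and short display strings, as described above


def display_string_rows(vacancies: int, languages: int) -> int:
    """Rows of pre-generated display strings: n * 2 per vacancy."""
    return vacancies * languages * FORMATS_PER_LANGUAGE


# Hypothetical figures, chosen only to illustrate the growth rate.
print(display_string_rows(10_000, 5))  # 100000
```

The row count grows linearly in both the number of vacancies and the number of languages, so adding a language increases storage by two rows per vacancy.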
Stability
The stability of the TREE system was greatly improved during the development of P03. In P02 there were two
different methods of searching, each of which used its own Oracle Web Request Broker (WRB) Cartridge.
These two cartridges were written in C. They used an API to access the database via PL/SQL routines that had
been compiled and stored in the database. The output from these PL/SQL routines was then fed into the generator
(written in Prolog). The output from the generator (in the appropriate language: French, Flemish, etc.) was written to
the screen: it constituted the results of the user's search.
This method worked but was unstable; any run-time error in one of the cartridges could crash the corresponding Web
Listener, making the site unavailable to the outside world.
For P03 the WRB Cartridges were dispensed with altogether, and the generator was run only offline. Software
was written to run the generator for each vacancy in the database, and generate the corresponding output in each
language for both long and short formats. This output was stored in a new table (generator_cache) in the database.
When a user launched a search the output for the screen could then be obtained by a series of relatively simple
database accesses, using PL/SQL procedures.
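The P03 approach of pre-generating output offline and reading it back at search time can be sketched as follows. This is an illustration under stated assumptions, not the project's code: Python's sqlite3 module stands in for the Oracle database and PL/SQL procedures, generate() is a placeholder for the Prolog generator, and the column names, language codes and sample vacancies are hypothetical. Only the table name generator_cache comes from the report.

```python
import sqlite3

LANGUAGES = ["en", "fr", "sv"]   # hypothetical language codes
FORMATS = ["long", "short"]      # both display formats are stored


def generate(vacancy, language, fmt):
    """Placeholder for the offline generator (Prolog in TREE)."""
    title, place = vacancy
    text = f"[{language}] {title}"
    return text if fmt == "short" else f"{text}, {place}"


def build_cache(conn, vacancies):
    """Offline step: run the generator once per vacancy, language and format."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS generator_cache ("
        "vacancy_id INTEGER, language TEXT, format TEXT, display TEXT, "
        "PRIMARY KEY (vacancy_id, language, format))"
    )
    for vacancy_id, vacancy in vacancies.items():
        for language in LANGUAGES:
            for fmt in FORMATS:
                conn.execute(
                    "INSERT OR REPLACE INTO generator_cache VALUES (?, ?, ?, ?)",
                    (vacancy_id, language, fmt, generate(vacancy, language, fmt)),
                )
    conn.commit()


def display_string(conn, vacancy_id, language, fmt):
    """Search-time step: a plain table lookup, with no generator call."""
    row = conn.execute(
        "SELECT display FROM generator_cache "
        "WHERE vacancy_id = ? AND language = ? AND format = ?",
        (vacancy_id, language, fmt),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
build_cache(conn, {1: ("Nurse", "Göteborg"), 2: ("Welder", "Newcastle")})
print(display_string(conn, 1, "fr", "short"))  # [fr] Nurse
```

Because the generator never runs while a request is being served, a generator failure can at worst leave a cache entry missing; it cannot bring down the Web listener.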
Conclusion
The overall conclusion is that the TREE architecture is realistic and practical. TREE is scalable and, given a certain
amount of effort, portable. The system is easily extensible to different languages, assuming that the necessary
linguistic expertise is available. The user interface has undergone considerable modification, but the user trials
showed that it could still be made more comprehensible and intuitive for the user. In short, the system is
technically feasible.
5.1.2 Economic viability
An initial break-even analysis for a TREE service based at ACE can be guided by current costs at ACE and some
proposed charging structures and presumed distributions of adverts and bulk space contracts.
Starting with the simplest charging scheme of 1 ECU per month per job advert, the annual break-even would be
74,000 adverts. This is based on three not unreasonable assumptions:
•   All adverts are live for only one month.
•   All adverts are single sales, not purchased through bulk 'media space' contracts.
•   TREE has no other sources of income.
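The arithmetic behind this break-even figure can be sketched as follows. The function and parameter names are hypothetical. The annual cost of 74,000 ECU is not quoted in this section; it is inferred here from the stated break-even of 74,000 adverts at 1 ECU each, and should be read as an assumption.

```python
import math


def break_even_adverts(annual_costs_ecu: float,
                       price_per_advert_month_ecu: float = 1.0,
                       months_live: int = 1) -> int:
    """Number of single-sale adverts per year needed to cover annual costs."""
    income_per_advert = price_per_advert_month_ecu * months_live
    return math.ceil(annual_costs_ecu / income_per_advert)


# Inferred from the break-even figure above, not quoted from ACE's accounts.
IMPLIED_ANNUAL_COSTS_ECU = 74_000

print(break_even_adverts(IMPLIED_ANNUAL_COSTS_ECU))                 # 74000
print(break_even_adverts(IMPLIED_ANNUAL_COSTS_ECU, months_live=2))  # 37000
```

The second call is purely illustrative: it shows how sensitive the break-even point is to the first assumption, since adverts that stay live for two months would halve the number needed.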
In addition, if ACE hosted a free CV/resume database, then recruiters could be charged for searches. Links to
professional and trade publications should also be considered.
A TREE service is clearly viable, but much work will be needed to ensure its success. However, a shift in sales
towards bulk contracts substantially changes the break-even point in terms of the number of contracts that need to be
sold. Most of the scenarios below break even after general costs are added.
The conclusion must be that the distribution of contract types will be critical to TREE's success, and that efforts
should be concentrated on the larger contracts (100 advert-weeks and above). Basing contracts on number of adverts
rather than advert-weeks should also be considered.




5.2      Business perspectives
5.2.1 Anticipated development of the market sector addressed
The market sector addressed has undergone major changes during the course of the project. These changes
have been mainly in the area of Internet-based job offering and job search services, but are also due to changes in
regulations throughout Europe. The first factor, the Web explosion, was well anticipated by the project's aim of
exploiting the medium. However, the increase in Web usage has far exceeded the expectations of most observers. As a
consequence, generic job search services have appeared, and at the same time a number of companies now offer job
posting on the Web directly. On the regulation side, the European job market has opened up new possibilities, e.g. by
allowing the activities of temporary job placement agencies, which were restricted or even prohibited in the past.

Two main business models were identified. The first model is exemplified by search services, much along the lines of
Web search engines and news watchers but focused on job postings: they gather job ads on the net and make them
available for search, and are generally supported by advertising or fees for related services. The second model focuses
on the needs of a single organisation, which could be a commercial company, a job agency, a public organisation or
some other agency (TREE participants, NCC, YN and AMS mostly falling into this latter category).

Two capabilities of the TREE project are of particular relevance here: firstly, being able to generate multilingual
descriptions, and secondly, being able to search ads by means of a terminologically rich and sound structure. At
present no existing service known to the authors combines these two elements. Both could play a significant, albeit
perhaps different, role in both of the emerging business models above. The ability to generate multilingual descriptions
could significantly leverage services whenever targets explicitly include users of different languages, as in the case of
Swedish/Finnish for AMS or French/Flemish for VDAB (not to mention potentially the EC itself!). The ability to
sort ads on the basis of a semantically meaningful description could prove a major boost to search services when
adequately coupled with suitable analysis mechanisms. This would be the case whether or not it is applied to multiple
languages: this feature could prove a major improvement even for single-language services.

5.2.2 Importance attached to the project in-house; interest shown by prospective customers
The different TREE partners attach importance to different aspects of the TREE product.
The training arm of the MARI Group has a major involvement in government sponsored training initiatives for
unemployed young people. A number of presentations of TREE have been given to government representatives and
civil servants, who were very interested in the possibilities offered by TREE. Discussions are continuing over the
possibility of funding to use TREE for the dissemination of job vacancies.
UMIST's main interest in TREE has been in information extraction technologies and in the use and availability of
terminological resources. A number of outside bodies have expressed an interest in these technologies. For example,
UMIST was recently invited to exhibit the TREE system at the EUROMAP seminar being held on January 19th in
Cambridge. At this seminar there will be a number of invited speakers from a number of fields including research,
EU, business, etc.
Quinary's main interest in the TREE project has been in refining technology for managing interactions between
structured (db) and unstructured (text) information in a multilingual environment; technology which could be used in
fields completely different from the one which was the primary objective of TREE (the labour market).
Within Göteborg University numerous presentations of the project at internal seminars and workshops have resulted
in collaboration with other internal projects. There is also considerable interest in reusing results of the project
internally; in particular developing a generation package based on the generator developed for the TREE system. As
far as commercial exploitation is concerned, there is potential interest from AMS and the Swedish Immigrants'
Institute to develop a customised version of TREE to support Swedish immigrant languages.
Newcastle City Council have exploited the work done during TREE to develop public access to career opportunities
within the council, which is itself a major local employer.

5.3   Exploitation planning
5.3.1 Benefits to the partners and consortium
The consortium believes that TREE is a product which has real marketing potential. Demonstrations to major
employers and employment services (including the UK and Swedish Employment Services) have provoked a lot of
interest, and contacts are continuing with a view to finalising contracts. The option discussed with these organisations
was the hosting of job vacancies on the current TREE site. However, other TREE partners are now in the process of
installing TREE and the associated Oracle software. They will then be in a position to host job vacancies for third
party organisations themselves.
Another option being examined is selling/franchising TREE software to private employment agencies who want to
publicise their vacancies in different countries: at present there appears to be no other system available which has
TREE's multi-lingual capacity.
UMIST has submitted a proposal under the title of 'ELM Employment for Linguistic Minorities' to the Engineering
and Physical Sciences Research Council (EPSRC) for funding to investigate the possibility of creating a TREE-like
product for 'Non-Indigenous Minority Languages' (NIML) in the UK. It is expected that many of the TREE results
and technologies will be carried over to ELM, and contact will be maintained with other consortium members.


5.3.2 Business plan
MARI
The following options have been identified as being appropriate for the market exploitation of TREE:
Hosted Service
In this configuration ACE, as a consortium member, would undertake to run and manage a TREE service, whilst
MARI would be responsible for selling the 'TREE service' into the market, and would undertake all aspects of
vacancy and service management.
Alternatively, MARI could integrate TREE into an existing Internet/intranet service, or provide it as a stand-alone
element alongside an existing service. The TREE service would then be 'branded' with the customer's name.
Licence Software
In this configuration MARI (and other partners) would 'sell':
•   A complete solution comprising hardware and a software licence for TREE. Such a solution may well require the
    configuration of the customers' systems, including all Internet links.
•   A licence for the TREE service to run with customers' existing service or systems, or as a new service to augment
    their current services. The University of Gothenburg is likely to follow this route with the Swedish National
    Employment Service, AMS.
Technology Skills
MARI and other consortium members would offer the skills they have developed during the TREE project, for
example in development of the database job descriptions and translation into target languages.
Linking of the TREE 'engine' to other management information systems such as Lotus Notes. This is the route that will
be followed by Newcastle City Council.
NCC
Newcastle City Council is preparing to go live with TREE in April 1999. Over 100 employers will use the system
including every area of the City Council and many of its associated bodies including all schools in the Newcastle
area. The TREE system will be fully integrated within Newcastle City Council's Intranet and Internet sites. Via
the Intranet site TREE will be accessible to over 2,000 employees of the Council.
Newcastle City Council's Internet site is currently visited over 3,000 times per week. According to web site
statistics, our main visitors are from universities and independent sector companies. The Internet site is also
available to the general public in all 19 libraries in the City and through five 24-hour on-street information terminals
located across the City.
The TREE system should enable the City Council to save a minimum of £15,000 in its first year of operation through
reduced operating costs. Longer-term benefits have not been determined at present. Use of the TREE system
will, however, help the City Council to meet its commitments to ensuring equality of opportunity and making the
Council more accessible for all of its citizens.
Quinary
Although there is no current plan to directly exploit a TREE-based job search service in Italy, one of Quinary's
three current main directions for business development concerns language-enabled information management
systems, where Quinary expects to get a significant share of next year's revenue from both consulting and
system integration services. In this area, the TREE project has already been presented as a clear example of a
potential area of exploitation for web-related, language-enabled services, and the technology and experience
developed are a key element in Quinary's marketing activities in this business area.




6.      Appendices
6.1     Digitised audio-visual record of project achievements, runnable under Windows 95, 98 or NT:
6.1.1 Screencam of TREE
A screencam of the TREE site was produced, and is included in the TREE CD-ROM.
<<MARI to produce a screencam : this is at final_report/screencam stored as
both .exe and non-exe versions. Backup versions also in PC in conference
room>>


6.1.3 Slide show
A PowerPoint slide show of TREE was produced, and is included in the TREE CD-ROM.
<<MARI to produce a PowerPoint slide show (has NO embedded screencam) : this is at
final_report/slides/FINAL_REP.ppt>>

6.2     List of public deliverables and reports

A list of public deliverables and reports has been compiled, and can be accessed from the TREE WWW site.
<<this has been done, but the link (from the top level index.html page) is commented out until all the deliverables
are finished. File containing the list is …./P03/ui/public_deliverables.htm >>

6.3     Project leaflet and/or brochure

A copy of the TREE brochure is enclosed with this report.

6.4     Papers presented at conferences, published articles, etc
Rome, January 1997. Gilardoni L. presented a paper for Quinary at the Conferenza sul Trattamento Automatico Delle
Lingue Nella Società dell'Informazione - Ministero delle Poste ("Text Classification For the Financial and Business
World"). Published in a special issue of 'La Comunicazione', Istituto Superiore Poste e Telecomunicazioni.
The paper covered LE techniques and their potential in general and included a section on TREE.
The TREE project was presented at the Association for Computational Linguistics' Fifth Conference on Applied Natural
Language Processing in March/April 1997, Washington DC, USA, and at several other conferences:
•   "Multilingual Generation and Summarization of Job Adverts: the TREE Project", Somers H, Nivre J, Multari
    A, Lager T, Gilardoni L, Ellman J, and Black W. In Proc. ANLP 1997 (pp. 269-276).
•   "Foreign Language Information Extraction: An Application in the Employment Domain", Ellman J, Somers
    H, Nivre J, Multari A, Lager T, Gilardoni L, Rogers A, and Black W. In Proc. UNICOM Workshop: Natural
    Language Processing: Extracting Information for Business Needs, March 1997.
•   "Seeing the wood for the TREEs", Phil Turner, Alex R. Rogers, Susan Turner and Jeremy Ellman. In Proc. 3rd
    ERCIM Workshop "User Interfaces for All", Obernai.
The project was also presented informally by Somers at the Australian Natural Language Postgraduate Workshop
held at the University of Melbourne, January 1998, as part of the Australian Natural Language Processing Fortnight.

6.5     Contact list of Project User Group names

User groups (NCC, GU)
Arbetsmarknadsstyrelsen (AMS), SE 171 99 Solna, Sweden. Tel. +46 8 7306000 (Contact: Clas Almén.)



