TOURISM AND TRAVEL LAW: ELECTRONIC RESOURCES FOR

A CORPUS-BASED MULTILINGUAL GENERATION PROJECT*



Gloria CORPAS PASTOR**





1. Introduction



The Internet offers a wealth of information on legal systems and documents, not only

in English as lingua franca, but also in languages with lesser web presence, such as

Spanish, German and Italian. Electronic legal resources are core to the TURICOR

project, as raw material for a corpus-based natural language generation (NLG) system

and as a reliable data reservoir for legal translation and comparative law. This

multidisciplinary R&D project is a joint effort of 22 researchers in Departments of

Translation and Interpreting, Documentation, Philology, History of Law and Legal

Institutions, Commercial Law, and Computing from three Spanish Universities ─

University of Málaga, the headquarters, University of Alcalá de Henares (Madrid) and

University Pablo de Olavide (Seville). The aim of this paper is to describe on-going

research with special reference to corpus building from electronic legal resources.





2. The TURICOR Project – An Overview1



In line with recent developments in corpus-based EBMT (example-based machine

translation), TM (translation memories) systems and other electronic tools (the

translator’s workbench), the TURICOR project sets out to explore the possibilities of

corpus linguistics for automatic text generation and specialised translation. Our final

objective is to develop a prototype NLG system for producing legal documents (tourism

contracts) in each of the four target languages2 in parallel. The starting point will not be

a source text in one language, but a language independent interlingua content

representation to be expressed by means of text sentences in any or all languages

selected. With this aim in mind, a multilingual corpus (both parallel and comparable)

will be compiled from tourism law websites on the Internet. A protocol will be laid out

for searching the WWW, and retrieving, encoding and storing (hyper)texts. The data

extracted from the corpus will provide researchers with a rich gamut of information

about tourism advertising strategies, restricted languages, terminology, specialised



*The research reported in this paper has been carried out in the framework of project TURICOR: A multilingual

corpus of tourism contracts (German, Spanish, English, Italian) for automatic text generation and legal

translation [TURICOR: compilación de un corpus de contratos turísticos (alemán, español, inglés, italiano) para

la generación textual multilingüe y la traducción jurídica]. (Spanish Ministry of Science and Technology, ref.

no. BBF2003-04616, 2003-2006).

** Senior lecturer in Translation and Interpreting (University of Málaga, Spain).

1

This section follows closely G. Corpas Pastor’s paper “TURICOR: Compilación de un corpus de

contratos turísticos (alemán, español, inglés, italiano) para la generación textual multilingüe y la

traducción jurídica”, in E. Ortega Arjonilla et al. (eds.), Panorama actual de la investigación en

traducción e interpretación, Vol. II, Granada, Atrio, 2003, pp. 373-384.

2

As legal systems vary in the case of transnational languages or even within the same country, language

varieties have also been diatopically restricted. Thus, the TURICOR project covers Spain, Germany, Italy,

Great Britain (England and Wales, Scotland, Northern Ireland), the Isle of Man, Eire and the United States

of America.

lexicography, comparative law, translationese, contrastive rhetoric and linguistics.

TURICOR will also prove an invaluable tool for translators’ training and the teaching of

languages for special purposes. In addition, the (e-commerce) tourism industry will

greatly benefit from both the NLG system and the knowledge and lexical databases to

be implemented within the project.



2.1. Background and Hypothesis

The Information Society3 has brought about dramatic changes not only in well-

established areas of scientific and technological development, but also in all aspects of

human life. In fact, there are plenty of websites and tutorials devoted to promoting ICT

(Information and Communication Technology) among students and ordinary citizens. A

representative example is the Internet for Information and Communication Technology

tutorial from the RDN Virtual Training Suite for further education of the University of

Bristol (UK)4 ─ a set of free, open-access online tutorials, initially designed to help

students, lecturers and researchers improve their Internet information literacy and IT

skills.

The availability and pervasiveness of this technology worldwide has led

countries to place increasing emphasis on the opportunities and the fruits promised by

information, communication and multimedia/multilingual technology in the global

village. In this new world order that is driven by knowledge and exchange of

information and ideas, surviving in the information age therefore depends on access to

national and global information networks. Hence the EC's latest policies and programmes

tend to be orientated towards e-learning, e-commerce and all sorts of technology

development. In particular, European scientific research programmes are clearly geared

towards the application of language engineering to the so-called “language industries”

in Europe’s emerging language and speech technology marketplace. Within the

framework of EC multilingual and multicultural research programmes, namely MLIS-

Multilingual Information Society (1996-1999), Human Language Technologies (1998-

2002) and e-Content (2001—), a number of projects5 have been undertaken on

automated processing and translation technologies, such as corpus-and-web-based

machine translation systems6, text summarising and multilingual text generation7.



3

This report is available full text in electronic format, see eEurope: An Information Society for All.

Action Plan. Prepared by the Council and European Commission for the Feira European Council: 19-20

June 2000. http://europa.eu.int/information_society/eeurope/index_en.htm. (2 Dec. 2003).

4

http://www.vts.rdn.ac.uk/ (1 Dec. 2003)

5

Similar Spanish and European R&D projects are listed in Aguayo et al., “Traducción automática y

generación textual: herramientas, grupos y proyectos de investigación”, in G. Corpas Pastor (ed.),

Recursos documentales, terminológicos y tecnológicos para la traducción del discurso jurídico (español,

alemán, inglés, italiano, árabe), Granada, Comares, 2003, pp. 1-32. See also the HLT website:

http://www.hltcentral.org (2 Dec. 2003).

6

Among the most relevant projects are INTERLEX (Developing General and Terminological Multilingual

Databases to be exploited in the Internet from Translation Dictionaries in Electronic Format), MULTEXT

(Multilingual Text Tools and Corpora), NL-TRANSLEX (Machine Translation for Dutch and

English/French/German), TRANSACCOUNT (Translation of Annual Account and Financial Reporting

Documents between IAS and FR Accounting System), TT2 (TransType2 - Computer-Assisted

Translation), MULTEXT-EAST (Multilingual Text Tools And Corpora For Central And Eastern European

Languages), METIS (Statistical Machine Translation UsIng Monolingual Corpora).

7

Some outstanding examples are AGILE (Automatic Generation of Instructions in Languages of Eastern

Europe), APOLLO (An Open Workbench for Multinational Document Creation and Maintenance), GIST

(Generating Instructional Text), MABLE (Multilingual Authoring Of Business Letters), MANDES

(Integrated and Efficient Multilingual Document Management System with Translation and

Tourism plays an increasingly important role not only in national economies and the

European Single Market, but also in the e-commerce of the global village8. Hence, the main

goal of the TURICOR project9 is to survey electronic resources and Internet-driven e-

contents for compiling a virtual, specialised multilingual corpus10 with a view to

implementing a prototype NLG system11 capable of generating tourism contracts (TCs)

in English, German, Italian and Spanish. We are fully aware that so-called tourism law12

is the epitome of an interdisciplinary field, as Pengilley13 rightly pointed out more

than a decade ago:

Except in the case of specific regulatory legislation such as the licensing of travel agents,

for example, there is no such thing as the law of tourism and travel … The law speaks in terms

of general principle and one has to adapt such general principle to specific fact situations in the

travel and tourism industry. There is a law of competition. There is a law of contract. There is a

law of consumer protection. All apply to the travel and tourism industry.

As in the case of travel agent licensing, some tourism contracts are governed by

specific regulations harmonised by international law and/or EC Directives, which are

therefore to be transposed by all Member States in the form of specific laws, regulations

and administrative provisions. Within the scope of the TURICOR project, three main

types of so-called “tourism contracts” have been selected as objects of study, namely,

package travel contracts14, timesharing contracts15 and (air, sea, rail or road)







Layout/Editing Capabilities), METEO (Development and provision of multilingual information service),

MUSI (Multilingual Summarisation Tool for the Internet), etc.

8

See Directive 2000/31/EC of the European Parliament and of the Council of 8 June 2000 on certain

legal aspects of information society services, in particular electronic commerce, in the Internal Market

('Directive on electronic commerce'). Official Journal L 178, 17/07/2000, pp. 1-16. File no. 32000L0031.

http://www.europa.eu.int/scadplus/leg/en/lvb/l24204.htm (2 Dec. 2003).

9

For an overview of the project, see G. Corpas Pastor, “TURICOR: Compilación de un corpus de contratos

turísticos (alemán, español, inglés, italiano) para la generación textual multilingüe y la traducción

jurídica”, in E. Ortega Arjonilla et al. (eds.), Panorama actual de la investigación en traducción e

interpretación, Volume II, Granada, Atrio, 2003, pp. 373-384.

10

According to the Expert Advisory Group on Language Engineering Standards (EAGLES), a corpus is “a

collection of pieces of language that are selected and ordered according to explicit linguistic criteria in

order to be used as a sample of the language” (“Text Corpora Working Group Reading Guide”, EAG-

TCWG-FR-2, 1996. http://www.ilc.cnr.it/EAGLES96/corpintr/corpintr.html (2 Dec. 2003).

11

“Natural Language Generation (NLG) is the subfield of artificial intelligence and computational

linguistics that focuses on computer systems that can produce understandable texts in English or other

human languages. Typically starting from some nonlinguistic (sic) representation of information as input,

NLG systems use knowledge about language and the application domain to automatically produce

documents, reports, explanations, help messages and other kinds of texts”, in E. Reiter and M. Dale,

Building Natural Language Generation Systems, Cambridge, Cambridge University Press, 2000, p. 1.

12

On tourism law and contracts, see, for example, A. Aurioles Martín, Introducción al Derecho

Turístico: Derecho Privado del Turismo, Madrid, Tecnos, 2002; R. Caballero Sánchez (ed.), Legislación

Sobre Turismo, Madrid, McGraw-Hill, 2000; D. Grant and S. Mason, Holiday Law, London, Sweet &

Maxwell, 2003; and M. McDonald, European Community Tourism Law and Policy, Dublin, Blackhall,

2003.

13

W. Pengilley, The Law of Travel and Tourism, London, Blackstone Press, 1990, p. 115.

14

Also package travel, package holidays and package tours or just packages in accordance with Council

Directive 90/314/EEC of 13 June 1990. Official Journal L 158, 23/06/1990, pp. 59-64. File No.

31990L0314. http://www.europa.eu.int/scadplus/leg/en/lvb/l32019.htm (5 Nov. 2003).

15

Also time-share contracts, timeshare contracts, timeshares or contracts relating to the purchase

of the right to use immovable properties on a timeshare basis, as in Directive 94/47/EC of

the European Parliament and the Council of 26 October 1994. Official Journal L 280, 29/10/1994, pp.

83-87. File No. 31994L0047. http://www.europa.eu.int/scadplus/leg/en/lvb/l32016.htm (5 Nov. 2003).

passenger transport contracts16. Other highly demanded contracts in the tourism

industry, subject to no specific regulations, such as travel insurance, hotel management,

or catering contracts, to name but a few, are also to be addressed in further stages of the

project.

In close connection with the interdisciplinary nature of the TURICOR project and

its main goal, the following integrative, full-fledged hypothesis has been adopted as a

starting point: (i) it is possible to set up a protocol for compiling specialised corpora

which are representative of a given economic sector from just Internet electronic

resources; (ii) such specialised Internet-driven corpora could then be used to solve

pressing problems of natural language processing (NLP) research in order to improve

machine translation, translation memories, natural language generation and terminology

management systems; (iii) a corpus-based multilingual NLG system is expected to

significantly contribute to boosting the economic growth and rapid development of a

particular industry sector (e.g. tourism e-commerce and marketing); and (iv) specialised,

multilingual, Internet-driven electronic corpora are an added-value research tool for

spin-off studies on translation, terminology and text-linguistics, on the one hand, and

legal, economic, advertising or marketing issues, on the other hand.



2.2. Objectives and Methods

In the light of the fourfold hypothesis (see section 2.1), our basic goal can be

further elaborated by stating three clear objectives, namely, (a) to build up a

multilingual macrocorpus (Turicor), composed of several parallel and comparable

subcorpora of tourism law documents derived from electronic resources available in the

WWW; (b) to design and implement a Turicor-based information-exchange

standardised computer programme for the automatic production of multilingual

documents; and, finally, (c) to study tourism law and the main textual forms as samples

of specialised communication in restricted registers in the four target languages.

In order to meet our first objective ─ Internet-driven corpus compilation ─,

electronic resources for law and tourism in the WWW will be located and evaluated17,

according to a validation system developed within the framework of a previous R&D

project18, which draws on well-known standards19. Next, a protocolised work





16

International private air transport law is most recently regulated by the Montreal Convention for the

Unification of Certain Rules for International Carriage by Air (28 May 1999), available in .html format

from http://tlc.unn.ac.uk/tlcpg.asp?pageID=5 (5 Dec. 2003). Former Conventions (Warsaw, 1929;

Geneva, 1948; Rome, 1952; Guadalajara, 1961; Montreal, 1978) and Protocols (The Hague, 1955;

Guatemala, 1971; Montreal, 1975 and 1978), as well as Chicago Acts and related Protocols can be

accessed as .PDF full-text bilingual version (English and French) from the Institute of Air & Space Law,

McGill University (Montreal, Canada) website: http://www.iasl.mcgill.ca/index2.htm (4 Dec. 2003).

17

In a previous R&D project (Ref. No. PB98-1399, Spanish Ministry of Education, 1999-2002) it was

found that it is possible at least to find and file package travel general terms and conditions from various

Spanish and German travel agents and tour operators websites. At this stage, the WWW will be searched

for package travel contracts in English and Italian, plus other types of contracts greatly demanded by the

tourism industry (passenger transport, travel insurance, hotel management, catering, on-line bookings of

air fares, hotel rooms, rental cars, etc.) in the four languages involved in the project.

18

For a detailed account of the PB98-1399 project, see the papers edited by G. Corpas Pastor (op. cit.)

and Mª E. Gómez Rojo’s review (in this volume).

19

We refer to J. E. Alexander and M. A. Tate’s Web Wisdom: How to Evaluate and Create Information

Quality on the Web, Mahwah, New Jersey, Lawrence Erlbaum Associates, 1999; and also A. Cooke’s

book A guide to finding quality information on the Internet: selection and evaluation strategies, London,

Library Association Publishing, 1999.

procedure for ad hoc corpus creation will be established following the PB98-1399

project guidelines, including, but not limited to, directions as to the selection of

appropriate information retrieval systems (I.R.S.) for downloading documents (legal

regulations, contract forms and samples) into the corpus database; detailed instructions

about the composition of the various subcorpora (diasystematic constraints of documents,

size, type and format considerations, number of languages, degree of communicative

specialisation, etc.); a set of coding tags (headers and DTD) following the TEI, as

developed in the previous project; a range of off-line web browsers to capture whole

web pages (contents and hypertextual structure) at once; and instructions about corpus

database management, alignment and concordancer tools.

The second objective ─ implementing a corpus-driven multilingual prototype

NLG system ─ will require evaluation and corpus-validation of current state-of-the-art

NLG, EBMT and TM systems according to EAGLES standards20. For sentence

planning and generation, a domain interlingua ontology will be constructed upon a set

of general and tourism law concepts to be contrasted with any translation units obtained

after automatically aligning and segmenting bi-texts. Finally, multilingual NLG

software will be developed on the basis of the language-independent grammar and a

relational database, plus a combination of fuzzy-match algorithms and example-based

MT systems.
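
The fuzzy-match component mentioned above can be illustrated with a minimal sketch. It assumes that translation units are stored as simple source/target pairs and that similarity is measured with the character-based ratio of Python's standard difflib module. The sample units, the function names and the 0.75 threshold are hypothetical and are not taken from the project.

```python
from difflib import SequenceMatcher

# Hypothetical English-Spanish translation units (illustrative only).
TRANSLATION_MEMORY = [
    ("The organiser shall be liable to the consumer.",
     "El organizador será responsable ante el consumidor."),
    ("The consumer may transfer the booking.",
     "El consumidor podrá ceder la reserva."),
]


def similarity(a, b):
    """Return a 0..1 similarity ratio between two segments (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_fuzzy_match(segment, memory, threshold=0.75):
    """Return (source, target, score) for the closest stored unit, or None.

    The threshold is an assumed cut-off below which a match is discarded.
    """
    scored = [(src, tgt, similarity(segment, src)) for src, tgt in memory]
    best = max(scored, key=lambda item: item[2], default=None)
    if best is None or best[2] < threshold:
        return None
    return best
```

A new segment that differs only slightly from a stored unit would retrieve that unit and its translation as a candidate for reuse, whereas an unrelated segment yields no match.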

As a by-product of the two former objectives, the last objective involves

exploiting the data collected during the three-year project in various ways related to

discourse, communication and law. For example, corpus management will provide

invaluable data for terminological databases and formal text prototypes; the legal

discourse of the tourism industry will be finely characterised; national, Community

and international specific regulations governing tourism contracts will be reviewed and

compared; bilingual sub-corpora of original documents and their corresponding translated

(or target) texts will allow research into translationese, legal translation teaching and

transgenre; and even the prototype NLG system might serve as a basis for further

software development in the areas of Translation Technologies and Internet Access

Devices (IADs) for e-commerce and e-advertising (as it is rare to see an e-commerce

website without e-advertising!).





3. ‘Package holidays’ regulations in Eire: a case study



As described in the previous section, a major objective of the project is to mine the

Turicor macrocorpus from the World Wide Web automatically21. The Turicor

multilingual parallel subcorpus is a bi- or multilingual corpus of originals and their

translations into one or more languages. It will include strictly related ‘mirror’

documents in the project's four target languages: (a) Community tourism and travel law

regulations; (b) any bi- or multilingual related websites retrieved from the Internet

(legislation, reference, forms and contracts); and (c) translations from professionals or



20

As redefined by Hovy et al. (eds.). Multilingual Information Management: Current Levels and Future

Abilities. Report for National Science Foundation, 1999. http://www.cs.cmu.edu/~ref/mlim/index.html

(17 Sept. 2003), conformant to ISO/IEC 9126.

21

Documents will be searched and retrieved from the Internet whenever possible. However, some

documents will have to be accessed from other electronic resources (CD-ROMs, for instance) or else

scanned. In addition, it should be pointed out that access to real samples of contracts can be extremely

difficult and time-consuming.

translation students. In turn, the Turicor comparable subcorpus will encompass a

wide range of texts: (a) tourism and travel contract samples, (b) tourism and travel

legal forms, (c) relevant travel agency and tour operator websites, (d) domestic

tourism and travel regulations (Statutory Instruments, Acts of Parliament, Royal

Decrees, relevant judicial decisions, etc.). That is to say, original legal documents that

have been produced independently of each other in the four target languages, but that

are considered to be similar (therefore, comparable) in terms of text type, form and

function, topic, specialisation and so forth.

An initial task in the plan will, then, be to search the web for eligible documents.

To illustrate the point, we will present a case study on automatic location and retrieval

of rules and regulations governing package holidays in the Republic of Ireland (Eire).



3.1. Searching the Internet

A reliable but expensive way to access legal information in the WWW is to

subscribe to commercial services such as Westlaw22, LexisNexis23 or Celex24. However,

the expense may not be worthwhile for simple research purposes, as basic

search skills enable users to access plenty of free electronic resources at a

mouse click.

Any reliable search requires careful selection of relevant key words and

information retrieval systems. While indexation concepts are basic for global search

engines to retrieve web page contents automatically, an appropriate choice of I.R.S. can

be of paramount importance for more structured searches. For example, package

holidays, law and Republic of Ireland have been entered as key words for a first

Boolean search query using Google25 and All the Web26. However, the results obtained

fall far short of expectations, as they contain a lot of noise and irrelevant information on travel

forums and chats, cheap flights and accommodation offers, advertising, tabloid news,

etc.
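
The first search stage and its noise problem can be sketched as follows. The sketch assumes a generic AND-based query syntax (operator syntax differs from one search engine to another) and a hand-made list of noise terms. The function names, the noise list and the sample results are hypothetical and serve only to illustrate the idea of post-filtering hits.

```python
# Hypothetical noise markers, based on the kinds of irrelevant hits described above.
NOISE_TERMS = ("forum", "chat", "cheap flights", "offer", "advert")


def build_boolean_query(phrases):
    """Join key words with AND, quoting multi-word phrases."""
    quoted = [f'"{p}"' if " " in p else p for p in phrases]
    return " AND ".join(quoted)


def filter_noise(results, noise_terms=NOISE_TERMS):
    """Keep (title, snippet) pairs that mention none of the noise terms."""
    kept = []
    for title, snippet in results:
        text = f"{title} {snippet}".lower()
        if not any(term in text for term in noise_terms):
            kept.append((title, snippet))
    return kept
```

For the three key words used in the case study, build_boolean_query would produce a single conjunctive query string. A crude filter of this kind cannot replace the choice of a better-suited retrieval system, which is the point made in the text.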









22

http://web2.westlaw.com (5 Dec. 2003).

23

http://www.lexisnexis.com/ (5 Dec. 2003).

24

http://www.europa.eu.int/celex (5 Dec. 2003). Access to the “Expert Search” option requires a user

name and a password.

25

http://www.google.com (5 Dec. 2003).

26

http://www.alltheweb.com (5 Dec. 2003).

Fig. 1. Global Search Query (All the Web).



Attempts to narrow down the results by refining the key words tended to be

equally unsuccessful. This is partly because indexed key words do not seem to be the

real problem ─ any searches for Eire laws and regulations on package holidays have to

be redirected towards alternative information retrieval systems. A safer strategy would

be to resort to metasearch engines, such as Metacrawler27 and Highway6128, to find law

search engines:









27

http://www.metacrawler.com (5 Dec. 2003).

28

http://www.highway61.com (5 Dec. 2003).

Fig. 2. Metasearch Query (Metacrawler).



From there, the next step is to locate websites devoted exclusively to the

Republic of Ireland legal system or just dealing with Eire legal resources as one of their

sections. A cursory look turned up a good number of useful portals, gateways, legal

indices, resource guides and link pages, among which are the following:

AccessToLaw29, Carrow's Irish Law Links30, Legal-Island31, Lex Scripta: Legal

Megasites32, LLRX - Guide to European Legal Databases33, The Bar Council & Bar

Library of Northern Ireland34. According to Internet evaluation models, these top

quality websites would satisfy the criteria for efficient, valuable electronic information

resources, as they are updated on a regular basis (monthly, weekly or even daily), their

contents are logically ordered and accurate, identification details of webmasters, contact

experts or official bodies are systematically provided, graphic and multimedia design is

user-friendly, related links (e.g. law databases and e-journals) are carefully selected, etc.





29

http://www.accesstolaw.com (5 Dec. 2003).

30

http://www.carrow.com/linkirish.html (5 Dec. 2003).

31

http://www.legal-island.com (5 Dec. 2003).

32

http://www.lexscripta.com/legal/omnibus/megasites.html (5 Dec. 2003).

33

http://www.llrx.com/features/europe.htm (5 Dec. 2003)

34

http://www.barlibrary.com/links.htm (5 Dec. 2003).

Most websites are provided with an internal search engine for quick reference, while

some of them offer WWW search tutorials as yet another asset.
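
The evaluation criteria listed above lend themselves to a simple checklist. The following is a minimal sketch only: it assumes one yes/no judgement per criterion and equal weighting, neither of which is stated in the evaluation models cited. The class name, field names and the 0.8 acceptance threshold are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class SiteEvaluation:
    """One boolean per criterion named in the text; equal weights are assumed."""
    regularly_updated: bool
    content_ordered_and_accurate: bool
    authorship_identified: bool
    user_friendly_design: bool
    links_carefully_selected: bool

    def score(self):
        """Return the fraction of criteria satisfied, from 0.0 to 1.0."""
        values = [getattr(self, f.name) for f in fields(self)]
        return sum(values) / len(values)


def is_acceptable(evaluation, minimum=0.8):
    """Accept a site when it meets at least the assumed minimum score."""
    return evaluation.score() >= minimum
```

A site meeting four of the five criteria would score 0.8 and pass under these assumptions. A real validation system would probably weight the criteria differently and record the evidence for each judgement.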

In order to proceed with the search, we have chosen one of the aforementioned

added-value law directories: AccessToLaw. A well-structured gateway, it covers the United

Kingdom (England and Wales, Scotland, Northern Ireland), the Commonwealth

(Australia, Canada, Gibraltar, Malta, etc.) and other jurisdictions, such as the Channel

Islands, Isle of Man, Republic of Ireland, Europe and major World Law resources. As a

general resource, it provides links to legal search engines and gateways, learned legal

journals and reference books, law electronic libraries and publishers, online solicitors

and barristers, professional organizations, etc. on a wide range of subject areas, such as

criminal law, ecclesiastical law, family law, international law, property or shipping law,

to name but a few.









Fig. 3. Specific search (AccessToLaw - homepage).



As regards the Republic of Ireland (Eire), AccessToLaw offers a wealth of

information within the “Other Jurisdictions” section. For instance, it includes an

electronic full-text version of the Irish Constitution of 1937, plus a list of amendments

effected since the Constitution was enacted in 1937 up to November 2002. Primary

sources can be mainly accessed via links to the Government of Ireland35 and

particularly to the Oireachtas36 and the Law Reform Commission. Acts, Instruments,

decisions, provisions, etc. can also be found through the Irish Law Site hosted by

the University College Cork Law Faculty and its two database initiatives: BAILII

(British and Irish Legal Information Institute) and IRLII (Irish Legal Information

Initiative Site); the personal site of Feargal Quinn, an independent member of the Irish

Senate; and two directories for Northern Ireland and Eire law (the Legal Eagle Links

website of the solicitor D. O'Reilly and the Legal Island Site). Primary legislation (Acts of

the Oireachtas 1997 onwards) and secondary legislation (Statutory Instruments 1922

onwards) are contained in the Irish Statute Book (1922 onwards); court decisions and case law

are arranged by subject and alphabetically37 (Irish Supreme Court and Court of

Criminal Appeal Decisions 1997 onwards; Irish High Court Decisions 1996 onwards;

Irish Competition Authority Decisions 1991 onwards; Irish Information Commissioner's

Decisions 1998 onwards), whereas it is also possible to have access to other Irish law

materials (Irish Law Reform Commission Papers and Reports 1976 onwards, full text

Parliamentary Debates 1919, Bills and Explanatory Memoranda from the Houses of

Oireachtas, latest publications and annual reports issued by central government

departments, agencies and state sponsored bodies).

Secondary sources are also well represented by numerous links to full-text

versions of electronic law journals, legal textbooks (e.g. D. Whelan’s Guide to Irish

Law, 2001), publications by government departments and state organisations,

University teaching materials, dictionaries and directories of legal professions, etc. For

instance, there is direct access to the EPPI (Enhanced British

Parliamentary Papers on Ireland 1801-1922) bibliography database, launched in 2003.



3.2. Retrieving and processing the data

Package holidays rules and regulations can be found in the Irish Statute Book

database (primary legislation, Acts of the Oireachtas). The Package Holidays and

Travel Trade Act, 199538 enables effect to be given to the Council Directive

90/314/EEC of 13 June 1990 of the European Communities on package travel, package

holidays and package tours. It amends39 The Transport (Tour Operators and Travel

Agents) Act, 198240. Both Acts can be cited together as The Transport (Travel Trade)

Acts, 1982 and 1995.

Once the data have been located, the next step for corpus building is to retrieve

the corresponding documents. However, automatic downloading can be impaired by the

fractal, interactive, dynamic and graphical nature of WWW hypertexts. One major

problem is the one-by-one format access to single web pages/nodes, which can turn

downloading into a complex, time-consuming effort. For example, The Package

Holidays and Travel Trade Act, 1995 in HTML version consists of four parts ─ I.

Preliminary and General, II. Regulation of Travel Contract, III. Security, IV.

Amendment of Transport (Tour Operators and Travel Agents) Act, 1982 ─, divided

35

An internal search engine locates information from all government sites.

36

The Oireachtas (Parliament) consists of two Houses – the Dáil Éireann (the House of Representatives,

directly elected) and the Seanad Éireann (the Senate, indirectly elected).

37

The “Courts Service: Ireland” Section includes information on the Irish courts system, court rules,

court offices and law terms, plus a legal diary, press releases, publications and legal links.

38

http://www.irishstatutebook.ie/ZZA17Y1995S1.html (5 Dec. 2003).

39

Also referred to: Companies Act, 1963 (No. 33), Hotel Proprietors Act, 1963 (No. 7), Petty Sessions

(Ireland) Act, 1851 (c. 93) and Public Offices Fee Act, 1879 (c. 58).

40

http://www.irishstatutebook.ie/ZZA3Y1982.html (5 Dec. 2003).

into 34 sections (and subsequent subsections) plus a schedule. Downloading the whole

document means storing all its parts and sections one after the other, either as only one

document or else as 35 shorter documents! (In fact, single sections can be retrieved

individually from the WWW).









Fig. 4. Package Holidays and Travel Trade Act, 1995 (Irish Statute Book Database).



It should be pointed out, though, that global retrieval of WWW fragmented

content can be conveniently speeded up by offline browsers able to retrieve and store

whole websites (contents and navigation design), like GNU Wget41 or WebStripper42.
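
Where a dedicated offline browser is not used, the node-by-node structure of a statute can also be handled by collecting the links on its contents page and fetching each section in turn. The following minimal sketch covers the link collection only, using Python's standard html.parser and urllib.parse modules. The class and function names are hypothetical, and fetching is deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the address of the contents page.
                self.links.append(urljoin(self.base_url, href))


def section_links(index_html, base_url):
    """Return the link targets of a contents page, in order, without duplicates."""
    parser = LinkCollector(base_url)
    parser.feed(index_html)
    seen, ordered = set(), []
    for link in parser.links:
        if link not in seen:
            seen.add(link)
            ordered.append(link)
    return ordered
```

Each address returned could then be downloaded and stored, either as separate section files or concatenated into a single document, which corresponds to the two storage options described above.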

Further problems relate to the loss of meaningful parts of the hypertext, such as

graphic and multimedia components and bullets, logos, banners …, missing navigation

design and reading paths, relevant formal layout and format, etc. All that (and much

more) is lost in a plain text format pre-processed for corpus management purposes,

since the next stage in the project workflow involves conversion of the .HTML

document into .TXT format. For example, the following figure illustrates Part III,

section 25 (“Insurance”):









41

Free software. http://www.gnu.org/software/wget/wget.html (5 Dec. 2003).

42

http://www.webstripper.net (5 Dec. 2003).

Insurance. 25.-(1) The package provider shall have insurance under one or more

appropriate policies with an insurer authorised in respect of such business in a Member

State under which the insurer agrees to indemnify consumers (who shall be insured

persons under the policy), against- (a) the loss of all money paid over by them under

or in contemplation of contracts for relevant packages, and (b) where applicable to the

package concerned, the cost of repatriation of consumers based on administrative

arrangements established by the insurer to enable repatriation of such consumers, in

the event of insolvency of the package provider.(2) The package provider shall ensure

that it is a term of every contract with a consumer that the consumer acquires the

benefit of a policy of a kind mentioned in subsection (1) in the event of the

insolvency of the package provider.(3) In this section "appropriate policy" means one

which does not contain a condition which provides (in whatever terms) that no liability

shall arise under the policy, or that any liability so arising shall cease- (a) in the

event of some specified thing being done or omitted to be done after the happening of

the event giving rise to a claim under the policy, (b) in the event of the failure of

the policy holder to make payments to the insurer in connection with that policy or

with other policies, or (c) unless the policy holder keeps specified records or

provides the insurer with information therefrom.





Fig. 5. Package Holidays and Travel Trade Act, 1995, section 25 (.TXT format).
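
A conversion of this kind can be sketched with Python's standard html.parser module. This is an illustrative sketch only, not the conversion tool used in the project, and the class and function names are hypothetical. It keeps text nodes, drops markup, scripts and style sheets, and collapses whitespace, which is exactly why layout, bullets and navigation design do not survive.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string as one whitespace-normalised line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with spaces keeps block elements apart but may split a word
    # that is interrupted by inline markup.
    return " ".join(" ".join(parser.parts).split())
```

The output is a single run of text similar to Fig. 5, with subsections no longer visually separated.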



Documents retrieved from the Internet are stored in the corpus database both in

their original format (usually .HTML or .PDF) and in a plain format (.TXT) suitable for

corpus management. For each of them, a TEI-conformant DTD is provided. In this case

study, the “package travel” Act would belong to section (d) of the multilingual

comparable corpus43 (like other similar domestic legislation from the remaining

countries covered in the TURICOR project). In addition, it would be stored in both

.HTML and .TXT formats, it would be conveniently identified (DTD file) and it would

include pointers to (i) type of tourism contract [“packages”], (ii) language [“English”],

(iii) type of regulation [“domestic law”] and (iv) jurisdiction [“Eire”].
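
The four pointers can be pictured as entries in a small XML header. The following sketch is loosely modelled on the TEI header and built with Python's standard xml.etree.ElementTree module. It is not validated against any DTD, and the project's actual tag set is not given here, so the element layout, the "type" attribute values and the function name are assumptions.

```python
import xml.etree.ElementTree as ET


def build_header(title, contract_type, language, regulation_type, jurisdiction):
    """Build a TEI-like header element carrying the four pointers named in the text."""
    header = ET.Element("teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = title

    profile = ET.SubElement(header, "profileDesc")
    lang_usage = ET.SubElement(profile, "langUsage")
    ET.SubElement(lang_usage, "language").text = language

    # Classification pointers stored as typed keywords (assumed layout).
    keywords = ET.SubElement(ET.SubElement(profile, "textClass"), "keywords")
    pointers = (
        ("contract", contract_type),
        ("regulation", regulation_type),
        ("jurisdiction", jurisdiction),
    )
    for kind, value in pointers:
        term = ET.SubElement(keywords, "term", {"type": kind})
        term.text = value
    return header


header = build_header(
    "Package Holidays and Travel Trade Act, 1995",
    "packages", "English", "domestic law", "Eire",
)
xml_text = ET.tostring(header, encoding="unicode")
```

Storing the pointers as structured metadata rather than in file names would allow subcorpora to be selected by contract type, language, regulation type or jurisdiction.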





4. Conclusion



This paper has provided a brief summary of the TURICOR project, with a view to

corpus building from Internet electronic resources. A search methodology has been

illustrated by means of a case study on domestic packages regulations in the Republic of

Ireland. This methodology comprises three main stages: (a) global Boolean search, (b)

law metasearch and (c) jurisdiction search. It could be successfully applied to all kinds

of legal searches, be they domestic, international or Community laws. In short, the

TURICOR project is beginning to open new, exciting research avenues for comparative law,

legal translation, documentation and corpus-based NLP and NLG systems.



[Received 6 December 2003. Accepted for publication 14 December 2003]









43

Similarly, Community regulations would belong for instance to the multilingual parallel corpus, as

they are translated into all official EC languages.

