Linguistic Knowledge Generator

Document Sample
Linguistic Knowledge Generator Powered By Docstoc
					                                       Linguistic Knowledge Generator
                                                          Satoshi SEKINE                  *
                        Tokyo Information and Communications Research Laboratory
                                   Matsushita Electric Industrial Co.,Ltd.
            Sofia ANANIADOU                                 Jeremy            J.CARROLL                         Jun'ichi         TSUJII
                                              Centre for Computational Linguistics
                         U n i v e r s i t y of M a n c h e s t e r I n s t i t u t e of S c i e n c e a n d T e c h n o l o g y
                                P O Box 88, Manchester M60 1QD, United K i n g d o m

 1       Introduction                                                        on human introspection. In the adaptation of exist-
                                                                             ing MT systems, linguists add and revise the knowl-
The difficulties in current NLP applications are sel-                        edge by inspecting a large set of system translation
dom due to the lack of appropriate frameworks for                            results, and then try to translate another set of sen-
encoding our linguistic or extra-linguistic knowledge,                       tences from given domains, and so on. The very fact
hut rather to the fact that we do not know in advance                        that this trial and error process is time consuming
what actual znstances of knowledge should be, even                           and not always satisfactory indicates t h a t h u m a n in-
though we know in advance what types of knowledge                            trospection alone cannot effectively reveal regularities
are required.                                                                or closure properties of sublanguages.
   It normally takes a long time and requires painful                           There have been some proposals to aid this pro-
trial and error processes to adapt knowledge, for ex-                        cedure by using programs in combination with huge
ample, in existing MT systems in order to translate                          corpora [4] [51 [13][7]. But the acquisition prog . . . . in
documents of a new text-type and of a new subject                            these reports require huge amounts of sample texts in
domain. Semantic classification schemes for words,                           given domains which often makes these methods un-
for example, usually reflect ontologies of subject do-                       realistic in actual application environments. Further-
mains so that we cannot expect a single classifica-                          more, the input corpora to such learning programs
tion scheme to be effective across different domains.                        are often required to be properly tagged or anno-
To treat different suhlanguages requires different word                      tated, which demands enormous manual effort, mak-
classification schemes. We have to construct appro-                          ing them far less useful.
priate schemes for given sublanguages from scratch                              In order to overcmne the difficulties of these meth-
                                                                             ods, we propose a Linguistic Knowledge Generator
   It has also been reported that not only knowledge
                                                                             (LKG) which working on the principle of "Gradual
concerned with extra-linguistic domains but also syn-                        Approximation" involving both h u m a n introspection
tactic knowledge, such as subcategorization frames of                        and discovery programs.
verbs (which is usually conceived as a part of general
                                                                                In the following section, we will explain the Grad-
language knowledge), often varies from one sublan-
                                                                             ual Approximation approach. Then a scenario which
guage to another [2].
                                                                             embodies the idea and finally we describe an experi-
   Though re-usability of linguistic knowledge is cur-
                                                                             ment which illustrates its use.
rently and intensively prescribed [3], our contention
is that the adaptation of existing knowledge requires
processes beyond mere re-use. T h a t is,                                    2      Gradual A p p r o x i m a t i o n
     1. There are some types of knowledge which we
                                                                             Some of the traditional learning programs which are
        have to discover from scratch, and which should
                                                                             to discover linguistic regularities in a certain dimen-
        be integrated with already existing knowledge.
                                                                             sion requires some amount of training corpus to be
     2. It is often the case t h a t knowledge, which is nor-                represented or structured in a certain way. For ex-
        mally conceived as valid regardless of subject do-                   ample, a program which learns disambiguation rules
        mains, text types etc., should be revised signifi-                   for parts-of-speech may require training d a t a to be
        cantly.                                                              represented as a sequence of words with their correct
                                                                             parts-of-speech. It may count frequencies of trigram
    In practical projects, the ways of achieving such
                                                                             of parts-of-speech in corpora to learn rules for disam-
a d a p t a t i o n and discovery of knowledge rely heavily                  biguation. On the other hand, a program to discover
     *SEKINE     is   currently    a    visitor    at    U.M.I.ST.           the semautic classes of nouns may require input d a t a
*                                                       (sentences) to be accompanied by their correct syn-

AcrEs DE COLING-92, NANTES,23-28 Ao~r 1992                           560             PROC.OF COLING-92, NANTES, AUG. 23-28. 1992
tactic structures, and so on.                                                         hypotheses based on mrperfect descriptions of
   This is also the case for statistical programs. Mean-                              corpora
ingful statistics can be obtained only when they are
                                                                                 2. hypotheses thus proposed result in more accu-
applied to appropriate units of data. Frequencies of
                                                                                    rate, less imperfect knowledge
characters or trigrams of characters, for example~ arc
unlikely to be useful for capturing the structures of                            3. tbe more accurate, less imperfect knowledge in
the semantic domains of a given sublanguage. In                                     2., results m a more accurate description of the
short, discovery processes can be effectively assisted                              corpora
or carried out, if corpora are appropriately repre-
sented for the purpose. However, to represent or tag                               The same process will be repeated from 1., but this
corpora appropriately requires other sorts of linguistic                        time, based on the more accurate descriptions of c o l
or extra-linguistic knowledge or even the very knowl-                           pora than the previous cycle. It will yield further,
edge whicb is to he discovered by the program.                                  more accurate hypothesized knowledge and descrip-
   For example, though corpora amtotated with syn-                              tions of corpora and so on.
tactic structures are usefld tbr discovering semantic
classes, to assign correct syntactic structures to cor-
pora requires semantic knowledge in order to prevent                            3       Algorithm
proliferation of Imssible syntactic structures.
   One possible way of avoiding this chicken-and-egg                            In this sec.tkm, we describe a scenario to illustrate
situation is 1o use roughly approximated, imperfect                             how our idea of the "Gradual Approximation" works
knowledge of semantic domains in order to hypothe-                              to obtain knowledge front actual corpora. The goal of
mac correct syntactic structures of sentences ill cor-                          the scenario is to discover semantic classes of nouns
pora. Because such approximated semantic knowl-                                 which are effective for determining (disambiguating)
edge will contain errors or lack necessary information,                         internal structures of compound nouns, which con-
syntactic structures   &ssiglled to s e n t e n c e s ii1 c o r p o r a         sist of sequences of nmms. Note that, because there
may contain errors or imperfections.                                            is no clear distinction in Japanese between noun
   Ilowever, if a program or Imnran expert could pro-                           phrases a n d conlllOUlld u o n n s consisting of s e q u e n c e s
duce more accurate, less imperfect knowledge of se-                             of nouns, we refer to them collectively as compound
mantic domains from descriptions of corpora (as-                                nouns. The scenario is comprised of three programs,
signed syntactic structures), we could use it to pro-                           ie. Japanese tagging program, Automatic Learning
duce more accurate, less erroneous syntactic descrip-                           Program of Semantic Collocations and clustering pro-
tions of corpora, and repeat the same process again                             gram.
to gain fllrtber zmprovcment botb in knowledge of se-                              There is a phase of human intervention which accel-
mantic domains and in syntactic descriptions of cor-                            erates the calculation, but in this scenario, we try to
pora. Thus, we may be able to converge gradually                                minimize it. In the following, we first give an overview
to botb correct syntactic descriptions of corpora, and                          of the scenario, then explain each program briefly, and
semantic classifications of words.                                              tlnally report on an experiment that fits this scenario.
   In order to support such convergence processes,                              Note that, though we use this simple scenario as an
I,KG has to maintain the following two types of data.                           illustrative example, the same learning program can
                                                                                be used in another inore complex scenario whose aim
  1. knowledge sets of various dimensions (morphol-                             is, for example, to discow~r semantic collocation be-
     ogy, syntax, semantics, pragmatics/ontology of                             tween verbs aml noun/prepositional phrases.
     extra-linguistic domains etc.), which are hypoth-
     esis,eel by humans or by discovery programs, and                           3.1      Scenario
     all of which are imperfect in the sense that they
                                                                                This scenario Lakes a corpus without any significant
     contain erroneous generalizations, lack specific
     information, etc.                                                          annotation as tile input data, and generates, ms the
                                                                                result, plausibility values of collocational relations be-
  2. descriptions of corpora at diverse levels, which                           tween two words and word clusters, based oll the cal-
     are based on the hypothesised knowledge in 1.                              culated semantic distances between words.
     Because of the hypothetical nature of the knowl-                              The diagram illustrating this scenario is shown in
     edge in 1, descriptions based on it inevitably con-                        Figure 1. The first program to be applied is the
     tain errors or lack precision.                                             "Japanese tagging program" which divides a sentence
                                                                                into words and generates lists of possible parts-of-
   Based on these two types of data, both of which                              speech for each word.
contain imperfect, the whole process of discovering                                Sequences of words with parts-of-speeches are then
regularities in sublanguage will be performed as a re-                          used to extract candidates for compound nouns (or
laxation process or a gradual repetitive approxima-                             noun phrases consisting of noun sequences), which
tion process. That is,                                                          are the input for the next program, the "Auto-
                                                                                matic Learning Program for Semantic Collocations"
  1. hunmn specialists or discovery programs make                               (ALPSC). This program constitutes the main part of

Aergs DE COLING-92, NANTES, 23-28 AOf/'r 1992                             561           PROC. OF COLING-92, NANTES, AUO. 23-28, 1992
the scenario and produces tile ahove-ment,ioned out-                       2. it learns by using a combination of linguistic
put                                                                           knowledge and statistical analysis
   Tbe output of the program contain errors. Errors
here mean that the plausibility values assigned to cob                     3. it uses a parser which produces all possible anal-
locations may lead to wrong determinations of com-                            yses
pound noun structures. Such errors are contained in
                                                                           4 it works as a relaxation process
the results, because of the errors in the tagged data,
the insufficient quality of the corpus and inevitable
                                                                           While it is included as a part of a larger repetitive
imperfections in tile learning system.
                                                                         loop, this program itself contains a repetitive loop.
   From the word distance results, word clusters are
computed by the next program, the "Clustering Pro-
gram". Because of tile errors in tile word distance
data, the computed clusters may be counter-intuitive.
We expect human intervention at this stage to formu-
late more intnitively reasonable clusters of nouns.                          Before formally describing the algorithm, the %l-
   After revision of the clusters by human specialists,                  lowing shni)le example illustrates its working.
the scenario enters a second trial. That is, the ALPSC                       A parser produces all possible syntactic descrip-
re-computes plausibilit.y values of collocations and                     tions aluong words in the form of syntactic depen-
word distances based on tile revised clusters, the                       dency structures, The description is represented by a
 "Chlstering Program" generates the next generation                      set of tupies, for example, [head uord, s y n t a c t i c
of dusters, and humans intervene to formulate more                       r e l a t ± o n , argument]. The only syntactic relation
reasonable clusters, and so on, and so forth. It is ex-                  m a tuple is MOD for this scenario, but it can be ei-
pected that word clusters after the (i+l)-th trial be-                   ther a grammatical relation like MOD, SUB,J, OBJ,
comes more intuitively understandable than that of                       etc. or a surface preposition like BY, WITH, etc.
the i-th trial and that the repetition eventually con-                   When two or more tupies share tile same argument
verges towards ideal clusters of nouns and plausibility                  and tile same syntactic-relation, but have different
values, in the sense that they are consistent both with                  head-words, there is an ambiguity.
human introspection and the actual corpus.                                   For example, tile description of a compound noun:
   It should be noted that, while the overall process                    " F i l e t r a n s f e r o p e r a t i o n " contains three tuples:
works as gradual approximation, the key program in
                                                                                 [transfer, MOO, file]
the scenario, the ALPSC also works in tile mode of
                                                                                 [operation, MOD, file]
gradual approximation as explained in Section 3.2.2.
                                                                                 [operation, HOD, transfer]

3.2     Programs          and      Iluman         interven-                 The first two tup]es are redundant, because one
                                                                         word can only be an argument in one of tile tuples.
                                                                         As repeatedly clahned in tile literature of natural lan-
We will explain each program briefly. Ifowevcr the                       gnage understanding, in order to resolve this ambi-
ALPSC is crucial and tmique, so it will be explained                     gmty, a system may have to be able to infer extra-
in greater detail.                                                       linguistic knowledge. A practical problem here is
                                                                         that there is no systematic way of accumulating such
                                                                         extradinguistic knowledge for given subject fields.
3,2.1   Program: J a p a n e s e t a g g i n g p r o g r a m
                                                                            That is, mdess a system ha-s a full range of contex-
This program takes Japanese sentences as an input,                       tual understanding abilities it cannot reject either of
finds word boundaries arid puts all possible parts-of-                   the possihle interpretations as 'hnpossible'. The best
speech for each word under adjacency constraints.                        a system can do, without full understanding abilities,
From the tagged sentences, sequences of nouns are                        is to select more plausible ones or reject less plausi-
extracted for input to the next program.                                 ble ones. This implies that we have to introduce a
                                                                         measure by which we can judge plausibility of :inter-
3.2.2   Program: A n t o m a t i c L e a r n i n g P r o g r a m
        of S e m a n t i c C o l l o c a t i o n s ( A L P S C )            The algorithm we propose computes such measures
                                                                         from given data. It gives a plausibility value to each
This is the key program which computes plausibil-                        possible tuple, based on the sample corpus. For exam-
ity values and word distances. In this scenario, the                     pie, when tile tuples ( t r a n s f e r , ROD, f i l e ) and
ALPSC treats only sequences of nouns, but it can                         (opera~:ion, MOD, f i l e ) are assigned 0.5 and 0.82
generally applied for any structure of syntactic rela-                   as their plausibility, this would show the latter tuple
tionships. It is an unique program with the following                    to be more plausible than the former.
points [8]:                                                                 The algorithm is based on the assumption that
                                                                         the ontological characteristics of the objects and ac-
  l. it does not need a training corpus, which is one of                 tions denoted by words (or linguistic expressions in
     the bottle necks of some other learning programs                    general), and the nature of the ontological relations

AcrEs DE COLING-92, NANTES, 23-28 AOOT 1992                        562          PROC. OF COLING-92, NANTES, AUG. 23-28, 1992
a m o n g tbein, are exhibited, though implicitly in sam-               2. tlypotbesis-tuples have a plausibility value which
pie texts. For example, uouns denoting objects which                       indicates their reliability by a n u m b e r between
belong to tile same ontological classes tend to appear                     0 and 1. If an instance-tuple occurs frequently
m similar linguistic contexts.                                             in the corpus or if it occurs where there are no
   Note that we talk about extra-linguistic 'ontology'                     alternative tuples, the plausibility value for the
for the sake of explaining the basic idea bebind the                       corresponding hypothesis m u s t be large. After
actual algorithm, ttowever, as you will see, we do                         analysing all the sentences of tile corpus, wc get
not represent such things as ontological entities in                       a set of sentences with weighted instanee-tuples.
the actual algorithm. T h e algorithm sinrply counts                       Each instauee-tuple invokes a hypothesis-tuple.
frequencies of co-occurrences a m o n g words, and word                     For each hypothesis-tulflC, we define the plausi-
similarity algorithms interpret such co-occurrences ms                     bility value by the following formula. This for-
contexts.                                                                   m u l a is designed so that tile value does not ex-
   T h e a l g o r i t b m in this progranr computes the plan-              ceed 1.
sibility values of hypothesis-tuples like ( o p e r a t i o n ,
M0D, I i l e ) , etc., basically by counting frequencies of                              v4 = 1. I ] I1- v,,,,)                  (2)
instance-tuples [ o p e r a t i o n , ROD, f ± l e ] , etc. gen                                          i
crated from the input data.
                                                                           At this stage, the word-distances can be used to
Terminology       and notation                                             modify the plausibility wdnes of the hypotbesiu-
                                                                           tupies. T h e word-dist>tnces are either defined
                                                                           externally using h u m a n mtuitiou or calculated
  instance-.tuple {h, r, a] : a token of a dependency rc-                  in the previous cycle with a formula given later
      lat, ion; part of the analysis of a sentence in a                    Disl`ance between words induces a distance be-
      corpus.                                                              tween bypothesis-tuples, 'lk~ speed up the cal-
                                                                           culation and to get, better resull`s, we use sim-
  hypothesis-tuple ( h , r , a ) : a dependency relation;                  ilar hypothesis eft~cts. T h e plausibility value
     an abstraction or type over identical instance-                       of a hypothesis-tuple is modified based on the
     tuples                                                                word distance and plausibility value of a simi-
                                                                           lar hypothesis. For each bypotbesis-tuple, the
  }:yale-- repeat time of the relaxation cycle.                            plausibility-vMue is increased only as a conse-
                                                                           quence of the similar hypothesis-tuple which has
   (;'r,, : Gredit of instance tuple 7' with identification
                                                                           the greatest ellect. T h e new plausibility value
       mmfl, er i, {0, 1]                                                  with similar hypothesis-tulfie effect is calculated
                                                                           10y tile following formula.
   V,,~ : Plausibility value of a hypothesis-tuple T in
       cycle 9. [0, 1]

   D~ (w.,rvb) : distance between words, w~ and wb m
     cycle 9. TO, 11                                                                                                             (3)
                                                                            llere, the hypotbesis-tuple 7" is the hypothesis-
   Algorithm                                                                tuple which has the greatest effect on the
                                                                            hypothesis-tuple "/' (origmM one). Ilypotbems-
   T h e following explanation of the algorithm assumes                     tuple T and T ' }lave all the same elements ex-
 that the i l l l n l t s ar e sentences.                                   cept one, "Fbe distance between ~" and 2/" is
                                                                            the distance betweei1 the different elements, w a
   1 For a sentence we use a simple g r a m m a r to find                   and wb. Ordinarily the dilierence is in the head
     all tuples lmssibly used. Each instance-tuple is                       or a r g u m e n t element, but when the relation is
     then given credit in proportion to the n u m b e r of                  a preposition, it is possible to consider distance
     conlpeting tuples.                                                     front another preposition.

                                      1                                     Distances between words are calculated on the
              (~ =                                          (1)             basis of similarity between hypothesis-tuples
                     number o f competing tuples
                                                                            about them. T b e formula is as follows:
      This credit sbows which rules are suitable for this
      sentence. On the first iteration the split of the                              D~ ( . . . . . ~)       Z,~, (v4 - V4,) e   (4)
      credit between a m b i g u o u s analyses is uniform as                                                       n
      shown above, but on subsequent iterations plau-
      sibility values of tire hypothesis-tuples VT    a-1 be-               7' and 7" are bypothesis-tuldes whose a r g u m e n t s
      fore the iteration are used to give preference to                     are w~ and wb, respectively and whose heads and
      credit for some analyses over others. The formula                     relations are the s a m e fl is a constant parame-
      for this will be given later.                                         ter

ACRES DE COLING-92, NANTES, 23-28 AO~" 1992                       563        PROC. OF COLING-92, NANTES, AUG. 23-28, 1992
  5. T b i s procedure will be repeated f r o m the begin-                             g r a d u a l l y converge on correct clusters by r e p e a t i n g
     ning, but m o d i f y i n g the credits o f i n s t a n c e - t u p i e s         tiffs a p p r o x i m a t i o n .
     between a m b i g u o u s analyses using tim plausibil-                               At this stage, s o m e correct clusters in the produced
     ity values of hypothesis-tuples. T h i s will hope-                               clusters are extracted. T h i s i n f o r m a t i o n will be an
     fully be m o r e accurate t h a n the previous cycle.                             input of the next trial of A L P S C .
     On the first iteration, we used j u s t a constant
     figure for the credits of instanee-tuples.                         But
     this t i m e we can use the plausibility value of the                             4         Experiment
     hypothesis-tuple which was deduced in the pre-
     vious iteration. Hence with each iteration we ex-                                 We conducted an e x p e r i m e n t using c o m p o u n d nouns
     pect m o r e reliable figures, q.'o cMcuLate the new                              f r o m a c o m p u t e r m a n u a l according to the scenario.
     credit of instance-tuple T, we use:                                               T h e resnlt for other relations, for e x a m p l e preposi-
                                                                                       tional a t t a c h m e n t , would be not so differeut f r o m this
                             c~, -          vT~'                          (s)               T h e corpus consisted of 8304 sentences. As the
                                                                                       result of J a p a n e s e t a g g i n g p r o g r a m , 1881 candidates,
                                                                                       616 kinds of c o m p o u n d nouns were extracted.
      Ilere, V.r m the n u m e r a t o r , is the plausibility
                                                                                            T h e n A L P S C took these c o m p o u n d nouns as an in-
      value of a hypothesis-tuple which is the s a m e tu-
                                                                                       put. Tuple relations were supposed between all words
      pie as the instance-tuple T V,~ in the denom-
                                                                                       of all c o m p o u n d nouns with the syntactic relation
      inator are the plausibility values of c o m p e t i n g
                                                                                       ' M O D I F Y ' . A tuple has to have a preceding a r g u m e n t
      hypothesis tnples in the sentence and the plau-
                                                                                       and a following h e a d For example, from a c o m p o u n d
      sibility value of the s a m e hyl)otbesis-tuple itself.
                                                                                       noun with 4 words, 5 a m b i g u o u s tuples and 1 firm tu-
      ct is a coustant p a r a n l e t e r
                                                                                       pie can be extracted, because each element can be the
  6. Iterate step 1 to 5 several times, until the infor-                               a r g u m e n t m only one tuple. An initial credit of 1/3
     m a t i o u is saturated.                                                         was set for each instance-tuple whose a r g u m e n t s are
                                                                                       the first word of the c o m p o u n d noun. Similarly, a
                                                                                       credit i / 2 was set for each instance-tuple in which
3,2.3      Program: C l u s t e r i n g       i)rogram
                                                                                       the second word is an a r g u m e n t .
Word clusters are produced based on the word dis-                                           No word distance i n f o r m a t i o n was introduced in
tance d a t a which are c o m l m t e d in the previous pro-                           the first trial. T h e n the learning process was started.
g r a m A non-overlapping clusters a l g o r i t h m with the                               We have shown the results of first trial in Table 1
m a x i m u m m e t h o d was used. T h e level of the clusters                        and e x a m p l e s m Figure 2
was adjusted e x p e r i m e n t a l l y to get suitable sizes tor                          T h e results were classified as correct or incorrect
bu Foau intervention.                                                                  etc.. ' C o r r e c t ' m e a n s t h a t a hypotbesis-tuple which
                                                                                       has the highest plausibility value is the correct tu-
                                                                                       pie within a m b i g u o u s tuples. 'Incorrect' m e a n s it
3.2.4      H u m a n zT~lerventzon. Select: c l u s t e r s
                                                                                       isn't. 'ludefinite' m e a n s t h a t plausibility values of
T h e clusters m a y m b c r i t errors contained in the word                          s o m e hypothesis-tuples have the s a m e value. 'Un-
distance d a t a T h e errors can be classified into the                               certain' m e a n s t h a t it is impossible to declare which
following two types.                                                                   hypothesis tuple is the best w i t h o u t context.

  1 A Correct cluster overlaps with two or more geu-
    crated clusters.

  2 A generated d u s t e r overlaps with two or more
                                                                                       j     4      tl   41     [       r /             s        r       1        I
    correct c h l s t e r s
                                                                                       I     5      II    4     I       o       I       o        I       2        I
     Note t h a t 'correcU bere m e a n s that it is correct
m t e r m s of truman intuition. To ease the laborious
j o b of correcting these errors by band, we ignore the                                    Table I: Results of e x p e r i m e n t after first A L P S C
 first type of error, which is much harder to remove
 than the second one. It is not ditlieult to remove the
 second type of error, because the n u m b e r of words                                   Tile clustering p r o g r a m produced 44 clusters based
 in a single cluster ranges from two to about thirty,                                  on the word distance data. a s a m p l e of the clusters is
 and this n u m b e r is m a n a g e a b l e for h m n a n s . We try                  shown in Figure 3. T h e average n u m b e r of words in
 to extract purely 'correcU clusters or a subset of a                                  a cluster was 3.43, Each produced cluster contained
 correct cluster, from a generated cluster.                                            one to twenty five words. T h i s is good n u m b e r to
     It is our contention that, thougll chlsters contain                               treat manually. T h e h u m a n intervention to e x t r a c t
errors, and are m i x t u r e s of clusters based on h u m a n                         correct clusters resulted in 26 clusters being selected
intuition and clusters c o m p u t e d by process, we will                             from 44 produced clusters. T h e average n u m b e r of

ACRES DE COLING-92, NANTES, 23-28 Aot'rr 1992                                    564             Paoc. oi: COLING-92, NAYrES, AUO. 23-28, 1992
words in a cluster is 2.96, It took a linguist who is            and syntactic restrictions. This feature of compound
familiar with computer 15 minutes. A sample of the               nouns made it hard to get a higher percentage of cor-
selected clusters is shown in Figure 4.                          rect answers in our experiment. Extra-processing to
  These clusters were used for the second trial of               address these problems can be introduced into our
ALPSC, The results of second trial are shown in Ta-              system.
ble 2.                                                              Because tile process concerns huge amount of lin-
                                                                 guistic data which also has ambiguity, it is inevitable
                                                                 to be experimental. A sort of repetitive progress is
                                                                 needed to make the system smarter. We will need to
                          a0     I     3 !         f~        I   perform a lot of experiments in order to determine
                          s      I     ~/           i        t   the type of the human intervention required, as there
                          0      L-- ° -1           ~        J   seems to be no means of deternfining this theoreti-
[ t~l-U-i~-I-TF-               ~ - -[---TC--/                    cally.
[ _ ~ L ( r 4 . 1 ) L (21.1) I (- ) _~_ (-)_A                       This system is aiming not to sinmlate human
                                                                 linguists who conveutionally have derived linguis-
    Table 2: Results of experiment after second trial            tic knowledge by computer, but to discover a new
                                                                 paradigm where automatic knowledge acquisition
                                                                 prograrns and human effort are effectively combined
                                                                 to generate linguistic knowledge.

5      Discussion
                                                                 6       Acknowledgements
The scenario described above embodies a part of our
ideas. Several other experiments have already been               Wc wouht like to thank our colleagues at the
conducted, based on other scenarios such a.q a see               CCIo and Matsushita, in particular, Mr.a.Phillips,
nario for finding clusters of nouns by which we can re-          MrK.Kageura, Mr.P.Olivier and Mr.Y.Kamm, whose
solve ambiguities caused by prepositional attachment             comments have been very usefifl.
in English. Though this works in a similar fashion
as the one we discussed, it has to treat more serf
ous structural ambiguities and store diverse syntactic           t{cferences
                                                                  [1] Ralph Grishman: Discovery Procedures for Sub-
   Though we have not compared them in detail, it
                                                                      language Selectional Patterns: Initial Experi-
call be expected that the organization of semantic
clusters of nouns tbat emerge in these two scenarios                  ments Comp. Linguistics Vol. 12 No.3 (1986)
will be different from each other. One reflects colloca-          [2] Sofia Auauiadou: Sublanguage studies as the Ba-
tional relations among nouns, while the other reflects                sis for Conqmter Support for Multilingua} Com-
tlmse between nouns and verbs. By merging these two                   munication t'roceedin9 s of 7'ermplarL '90, Kuala
scenarios into one larger scenario, we may be able to                 Lumpur (1990)
obtain more accurate or intuitively reasonable notnr
clusters. We are planning to accumulate a number of                  [3] A Zampolli: Reusable Linguistic Resources (In-
SUCh scenarios and larger scenarios. We hope we can                      vited paper) 5th Conference of the E.A.C.L.
report it. soon.                                                        (199i)
    As t\)r tim result of the particular experiment in the
                                                                     [4] Kenneth Ward Church: A Stochastic Parts Pro-
 previous section, one eighth of the incorrect results
                                                                         gram and Noun Phrase Parser for Unrestricted
have progressed after one trial of the gradual approx-
                                                                         Text 2nd Uon/erer~ce on A.N.L.P (1988)
 iruatiou. This is significant progress in the processing.
 For humans it wmdd be a tremendously laborious job                  [5] Donald llindle anti Mats Rooth: Structural Am-
 as they would be required to examine all the results.                   biguity and l,exical Relations °~9th Conference of
 What bunlans did in the experiment is simply divide                     the A.C.L. (1991)
 the produced clasters.
    Although the clusters are produced by a no-                      [6] Smaja and McKeown: Automatically Extracting
 overlapping clustering algorithm in this experiment,                    and Representing Collocations for language gen-
 we are developing an overlapping clustering program.                    eration 28th (7onfeve,~ce of tt*e A.C.L. (1991)
 l[opefulty it will produce clusters which involve the
                                                                     [7] Uri Zernik and Paul aacobs: Tagging for Learn-
 concept of word sense ambiguity. It will mean that
                                                                         rag: Collecting Thematic Relations from Corpus
 a word can belong to several clusters at a time. 3"he
                                                                         13th UOLING-90 (1990)
 method to produce overlapping clusters is one of our
 current research topics.                                            [8] S.Sekine, J.J.Carroll, S. Ananiadou, a.Tsujii:
    Examining the results, we can say that the clus-                     Automatic Learning for Semantic Collocation
 ter effect is not enough to explain word relatious of                   3rd Conference on A.N.L.P. (1992)
 compound nouns, q'here might be some structural

ACRES DE COL1NG-92, NANKS, 23-28 AoGr 1992               565              Pgoc. OF COLING-92, NAI,rtEs, AUG. 23-28, 1992
              ~_Sentenees )                                {Tr4~(file).a~(high speed),:~(discource).~t,--7"
             [-° .....           I                         :~-V(charaeter).t~(sentence-structure).~
                                                            (words-and-phrase).~-~2~, supposition).~(KkNk).
                                                           ~ll~(chinese-ehareter), -y~ ~e ~ 0 (directory).
                                                           ,~'.~T'/~(pop-up), ~y--
                                                           (letter), ~.~9 (back-),~(white-)}
                                         (,,,oo.,,on)     (~(retrieval)}
                                                         IiPl(for~ard-), 3W-(copy)]
                                                         {~/:i~(multiple).--~7~ (default)}
                  (c,.,o.)                               {~T,(display).~(change)}
                                                         {±(upper-).~,~(management), ~,(delete)}
                                                         {~(expression). 7,~V (font)}
         Figure i. Diagram of the scenario
                                                          Figure 3. Sample of Produced Clusters in first trial

[yJl~--7":group] [~-?: executel
            [l~:pe~ission] [3~:chareter]                 {:~:r(eharacter),~(words-and-phrases),qi~(KANA),
     (~'n,--~" ~ )   O.000                                ~(chinese--character), ~y-(letter)}
     (~%--7" i ~ )   O. 002                              I:~:l~(discourse),~(sentence-structure)}
    (~L,-7" 3L~) O.997               X                   {vw~/~(file),-7~ ~y ~U (directory)}
    (~    I~*q)       ~. 000         o                   t~(maiu-),~PJ(assistanee)}
    (~l~tY~t'-l~)     O. 000                             {g(t~(execute),~(refer)}
[J:v-:error] [~Lw~:management] [n,-~-7:routin]           {~(full-),~(half)}
    (~- ~)       0.505        o                          {~-~,(display).~(change)}
    (~J~---~-7) 0.494                                    I~Lu~(management).~J~,(delete)}
                                                         {~l~(expression). 7~b(font)}
[~1~--7":loop] [147:finish] [-7-xt- : test]              1~,2(input).~2(output)}
     Or,--v" 147")   O.02~                               I~(modifieation).f~(creation)}
     (147 -Y-~,~)    O. 976       x                      (~i(right).~(left)}

         Figure 2. Sample of the Results                         Figure 4. Sample of Selected clusters

                             23-28AO~" 1992
      AcrEsDECOLING-92,NANTES,                          566                              AUG.23-28, 1992

Shared By: