Towards a Combined Attempt at Simultaneous Synset Linkage and
Aadil Amin Kak Nazima Mehdi Aadil A. Lawaye Farooq A. Shiekh Muneera Hakim
Dept. of Linguistics
University of Kashmir
Abstract

The paper focuses on a method for expanding the IndoWordnet during linkage. It proposes that expansion and linkage be carried out simultaneously, and that a parallel ‘storage tool’ be run to store the extra senses of the synset members (language-specific senses) during linkage. The paper also presents an overview of the inclusion of larger units (proverbs, compound words, adjectival phrases).

1 Introduction

The expansion of the wordnet for the ongoing project, Indradhanush, is a proposed step for future implementation. The present paper is a proposal which focuses on carrying out the expansion of the wordnet simultaneously with the process of synset linkage.

The paper has been divided into three sections. Section 1 proposes why both processes, i.e. synset linkage and wordnet expansion, should go on simultaneously. Section 2 proposes a tentative idea as to how it can be done. Section 3 proposes the inclusion of the larger units of language, which have until now not been talked about.

2 Proposals

2.1 Simultaneous Expansion

Why should both processes, i.e. the wordnet expansion and the synset linkage, not go on together? We propose this on the basis of the following:

1. While making common synset entries, the lexicographers may sense at that very moment that there are language-specific associated or related concepts in their language, and they can easily note down the extra senses (language-specific senses) of the synset members then and there. If the expansion is undertaken later on, it will need more time and effort, and some senses will probably be missed.
2. This method can also be a useful tool at times to reduce the confusion and problems faced by lexicographers, by giving them a better and unambiguous view of the concept at hand.
3. This method should also be able to act as a strong facilitator for the future inclusion of sister languages and dialects of the languages being worked on.

2.2 Method of Expansion

Regarding the how of the process, the lexicographers, while linking synsets, shall look for any extra sense which may be specific to their language. While looking for ‘equality in concepts’, it will be natural for the lexicographer to come across (in the mind at least) different senses and other culture-specific concepts. For this purpose a separate interface shall be provided to the users, linked to the main tool provided to the lexicographers. The lexicographers shall enter there the extra senses of the synset members (which are language specific and not presently available in the wordnet). The interface shall have a facility for storing a temporary Synset ID, Concept, Componential Analysis of the concept, Synset Members, Category and Example. The interface shall have a save option and an edit option. The storage tool shall look as follows:
Synset ID              Lexical Category
Concept                Componential Analysis
Synset Members
Example
Save    Edit    Exit

Different concepts/senses should be categorized as follows and simultaneously included in the tool.

2.2.1 Language/Culture Specific Concept

2.2.1.1 Language Specific Senses

This categorization will handle extra senses of the synset members. These members, though already present in the pivot language, carry some extra sense or senses which are specific to the target language. For example:

Entry in Main Tool | Entry in Storage Tool
Concept | Hindi | Kashmiri | Concept | Kashmiri
A colour[...] present in [...] seas | [...] | [...] | [...]cally for drinking. | treesh
The burning of the food while cooking | jalnaa | dazun | Partial overcooking [...] cooking which doesn’t burn the rice but leaves a burning smell. | vItsun
 | | | Partial overcooking (burning) of [...] burn the milk but leaves a [...] | vItsun, [...]

The example shows how easily extra senses come to mind and how these language-specific concepts can be added.

2.2.1.2 Culture Specific Terms

The storage tool shall also be used to store culture-specific terms which have to be included in the wordnet at a later stage. These terms should be tagged as culture specific so that they can be identified later. A common tag shall be provided for the culture-specific terms of all the languages.

2.2.2 Names of Places in a Language

The place names in a language shall also be entered in the storage tool with their conceptual as well as componential analysis. The recurring occurrence and the validity of componential analysis will be dealt with later in a separate section. A tag for the place names in a language should be provided to the lexicographers, e.g. for Srinagar, Shopian, Pulwama, etc.

2.2.3 Flora and Fauna

The lexicographers shall use the flora and fauna sources of the area to include in the storage tool the names of specific plant and animal species which are not covered by the ongoing wordnet concepts.

It has been proposed earlier that all the culture-specific terms, the place names and the names of the specific species have to be transliterated (and coined in some languages like Sanskrit) into all other member languages of the IndoWordnet. It is proposed here that a transliterator tool be incorporated in the system for transliterating those terms into other languages. The above-mentioned specific terms will be stored with their phonetic sounds. The transliterator will take the phonetic form of the language-specific terms of all the languages as its input and will transliterate them into the target language. The transliterator can identify what to transliterate either by the specific tags of the terms or by treating them as a subset of the conceptual pivot of the wordnet.

2.3 Inclusion of Larger Units

In this section we also propose that the larger units of language such as proverbs, compound words, adjectival phrases, etc. be considered as separate units. This is illustrated below.
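One way to read "separate units" in data terms is that a larger unit is indexed under its whole form and is never looked up through its member words. The following is only a minimal sketch under that reading: the class name `LexicalUnit`, the `kind` labels and the helper `build_index` are hypothetical and do not come from the IndoWordnet tools; the two sample entries are taken from the examples discussed in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalUnit:
    form: str     # the whole unit exactly as entered
    kind: str     # assumed labels: "word", "proverb", "compound", "adjectival_phrase"
    concept: str  # gloss of the unit as a whole


def build_index(units):
    """Map each whole form to its unit; member words get no entries of their own."""
    return {u.form: u for u in units}


units = [
    LexicalUnit("utaar-chadau", "compound", "ups and downs in life"),
    LexicalUnit("sanp bhi maray aur lathi bhi na TuuTe", "proverb",
                "giving two benefits by an act"),
]
index = build_index(units)
```

Looking up `index["utaar-chadau"]` returns the compound with its own concept, while a member such as `"utaar"` has no entry unless it is entered separately as a word.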
2.3.1 Proverbs

The proverbs shall be considered as units different from their individual members. These units shall be entered as separate concepts. If the concepts are already present, the proverbs shall be entered as synset members against the concept. The proverbs which are present only in a specific language shall be entered in the storage tool. For example:

Conceptual Pivot | Hindi (L1) | Kashmiri (L2) | (L3) | (L4)
Giving two benefits by an act | sanp bhi maray aur lathi bhi na TuuTe | siikh ti rachun tI kabab ti. (saving seekh (iron rod on which the kebab is roasted) as well as the [...] | |

However, this may not always be the case when there is a proverb-to-proverb linkage. Consider the following example:

Conceptual Pivot | Hindi (L1) | Kashmiri (L2) | (L3) | (L4)
Moving idly here and there | --conceptual equivalent-- | bang'an dәr' natsun (lit. dancing in Can[...] fields) | --conceptual equivalent-- | --do--

Here, we have a proverb in Kashmiri which does not appear to be there in Hindi. In this case, instead of the proverb, Hindi will have a conceptual equivalent.

This can be a useful means of incorporating proverbs in the framework without considering the individual concepts of the words of which the proverb is made.

2.3.2 Compound Words

We propose that all the compound words be taken as separate conceptual units irrespective of the concepts of their members. If the concept of a compound word is present in the main tool, it shall be entered as a synset member, and if it is language specific, it shall be entered in the separate tool.

1. Language-specific compound words in Kashmiri such as yar-bal (body of water-bank), bon-i-bagh (chinar-garden), bati-paji (lit. cooked rice-large basket), labi-thamb (lit. wall-wooden pillar), etc.
2. Compound words conceptually present in more than one language or all other languages, such as utaar-chadau (ups and downs), duup-chaavun (sunshine-shade), idhar-udhar (here-there), etc.

The two groups of categories of compound words shall be included in the wordnet as illustrated in the tables below.

Entries in the Current Working Tool
Conceptual Pivot | Hindi | Kashmiri | L3 | L4 | L5
Ups & downs in life | utaar-chadau | heri-bon | Conceptual equivalent if [...] | --do-- | --do--
[...] | [...]chaavun | [...] | [...] | [...] | [...]
The younger maternal uncle | chota-[...] | lokut-[...] | --do-- | --do-- | --do--
This place or that place | udhar-idhar | hokun-yokun | [...] | --do-- | --do--

Language Specific Entries in the Storage Tool
Concept description | [...]
A wooden block used to support a carpet weaver from both sides | [...] | --Conceptual equivalent--
A specific basket used to carry meals to the field by peas[...] | [...]
A pot in which food [...] | [...]

So, concepts denoted by compound words should be equated with corresponding compound words in other languages wherever possible; where that is not possible, conceptual equivalents should be used.

2.3.3 Adjectival Phrases

The main question is how to include adjectival phrases such as participles and infinitives in Wordnet Indradhanush. Should we take such phrases as separate units or not, given that the sense of these words is different in adjectival phrases than in other contexts? E.g.

udta parinda, chalti gaaDi, gaate jarney

The first word in each phrase is a verb in ordinary contexts but acts as an adjective in these phrases. Thus the change of category changes the concept itself. So for such units we need to give [...]

The adjectival phrases such as the participles can be language specific also. For example: wadwun bacchi (lit. crying baby / a baby who cries at the smallest excuse), wadwun insaan (lit. crying person / a person who cries at the smallest excuse), wadwun nab (lit. weeping sky / when it is very cloudy).

The language-specific adjectival phrases shall be entered in the storage tool with their concepts.
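The proposals in Sections 2.2 and 2.3 share one routing rule: a unit whose concept already exists in the main tool is entered there as a synset member, and a language-specific unit goes to the storage tool with a tag. The sketch below illustrates that rule only. The field list follows the storage-tool interface described in Section 2.2, but every identifier (`Tag`, `StorageEntry`, `Tools`, `enter`, `to_transliterate`), the tag names and the exact-string matching of concepts are assumptions made for illustration, not a description of the actual tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class Tag(Enum):
    # Hypothetical tag names; the paper only states that such tags shall exist.
    LANGUAGE_SPECIFIC_SENSE = "language_specific_sense"
    CULTURE_SPECIFIC = "culture_specific"
    PLACE_NAME = "place_name"
    FLORA_FAUNA = "flora_fauna"
    PROVERB = "proverb"
    COMPOUND = "compound"
    ADJECTIVAL_PHRASE = "adjectival_phrase"


# Tags whose terms are to be transliterated into the other member languages.
TRANSLITERATE = {Tag.CULTURE_SPECIFIC, Tag.PLACE_NAME, Tag.FLORA_FAUNA}


@dataclass
class StorageEntry:
    temp_synset_id: str          # temporary ID, not a final wordnet ID
    concept: str
    componential_analysis: dict  # feature name -> True/False (left empty here)
    synset_members: list
    category: str                # lexical category, e.g. "noun"
    example: str
    tag: Tag


@dataclass
class Tools:
    main: dict = field(default_factory=dict)     # concept -> synset members
    storage: list = field(default_factory=list)  # language-specific entries

    def enter(self, entry: StorageEntry) -> str:
        """Link to an existing concept if there is one, else store for later."""
        if entry.concept in self.main:  # simplification: exact string match
            self.main[entry.concept].extend(entry.synset_members)
            return "main"
        self.storage.append(entry)
        return "storage"

    def to_transliterate(self):
        """Select stored entries by tag, as proposed for the transliterator."""
        return [e for e in self.storage if e.tag in TRANSLITERATE]


tools = Tools(main={"ups and downs in life": ["utaar-chadau"]})
r1 = tools.enter(StorageEntry("tmp-1", "ups and downs in life", {},
                              ["heri-bon"], "noun", "", Tag.COMPOUND))
r2 = tools.enter(StorageEntry("tmp-2", "body of water-bank", {},
                              ["yar-bal"], "noun", "", Tag.COMPOUND))
r3 = tools.enter(StorageEntry("tmp-3", "Srinagar (place name)", {},
                              ["Srinagar"], "noun", "", Tag.PLACE_NAME))
```

In this run heri-bon is linked to the existing concept, yar-bal and Srinagar land in the storage tool, and only Srinagar is selected for transliteration. Real concept matching would need the lexicographer's judgment and not a string comparison.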
2.4 Componential Analysis

It is proposed that a componential analysis be associated with each of the given concepts, as this will convey the concept more clearly and in a more professional manner. Whether or not to incorporate a strict form of componential analysis is a question which has to be thought about in detail. But it appears that (a) if a formal componential analysis frame is agreed upon and (b) if lexicographers are trained in it, the task will become much easier. The formalization of the componential analysis should work as an important tool for removing ambiguity. Furthermore, it can make the concept more formal (a set of yes-es and no-s), which is likely to be more machine friendly.
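To make the "set of yes-es and no-s" concrete, a componential analysis can be held as a mapping from feature names to True or False, and two concepts can then be compared mechanically. This is a sketch under assumptions: the feature names are invented for illustration from the glosses given for dazun and vItsun in Section 2.2.1.1, no agreed feature frame exists yet, and `distinguishing_features` is a hypothetical helper.

```python
def distinguishing_features(a: dict, b: dict) -> set:
    """Return the features that both analyses specify with opposite values."""
    return {f for f in a.keys() & b.keys() if a[f] != b[f]}


# Illustrative feature sets only; not an agreed componential frame.
dazun = {"occurs_while_cooking": True, "food_is_burnt": True}
vitsun = {"occurs_while_cooking": True, "food_is_burnt": False,
          "leaves_burning_smell": True}

diff = distinguishing_features(dazun, vitsun)
```

Here `diff` contains only `food_is_burnt`, which is exactly the contrast the glosses draw between the two senses. A feature present in only one analysis is ignored by this helper, so an agreed frame would also need a rule for unspecified features.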
To conclude, the paper is basically an effort to
make the expansion of IndoWordnet less labori-
ous. It seems more logical to implement both
linkage and expansion simultaneously. The paper
is a tentative step in that direction and, with
modifications and strict formalization, can be
used as a guide for the same.