					                             National University of Singapore




           The Emergence of
             Vocabulary
               The Birth of Buey Tahan




                     Yeo Huimin, Michelle (U012962U)
                     Lam Kar Yan, Karen (U022623Y)
                          Wong Xiu Ming (U023617U)



USC3001 Complexity
4th April 2003
                                                                                                   Review

                       The Emergence of Vocabulary
                                        The birth of buey tahan

Communication between individuals is facilitated by vocabulary, which we understand to be a set of
utterances and meanings. It is of interest to understand how sets of utterances and meanings evolved to
produce a standardised vocabulary. In this paper, we critically review two simulation models
that attempt to explain the emergence of vocabulary: the Imitation model and the
Obverter model. A third model, the Interaction model, is also briefly described. We then evaluate the
applicability of these models in the context of a multilingual society such as Singapore. In particular, we
assess how these models may or may not apply to the “Singlish” vocabulary.


Introduction
It never ceases to amaze people how human languages, in spite of the lack of any blatant orchestration
from a master planner, have attained such high levels of complexity as well as infinitely intricate
structures, syntax and variations. Until now, the study of language emergence and development has been
largely crippled by the lack of an effective way to simulate interactions between communicative agents.
With the advent of unprecedented computing power, however, it is now possible to simulate interactions
between large numbers of agents. In this paper, agent-based simulations of the emergence of vocabulary
are reviewed. Language is distinguished from vocabulary here; learning to understand a language at its
most basic and functional level involves parsing the speech stream into chunks, or words, which reliably
mark distinct meanings and therefore form a vocabulary. Phonology, syntax and lexis develop
hierarchically after many iterations and selection processes; these qualities are secondary products that do
not necessarily increase the functionality of a language, and are thus not considered explicitly in the
following models. Furthermore, vocabulary emergence is relatively simple to model, and effective
models can serve as a foundation for future modifications. Two models, the Imitation
Model and the Obverter Model, are discussed here. A third, the Interaction Model, is also briefly
described. Certain common terms are used in these models and their definitions are stated below:
Utterance (u) – word or sound used by agents in communication with one another. In this paper, signal,
                utterance and word are used interchangeably. Collectively, utterances are denoted by
                U.
Meaning (m) – meaning of a certain utterance. Collectively, meanings are denoted by M.
Homophones – utterances that share the same form or phonetic representation but have different meanings.
Ps – number of agents in the population.
Assumptions shared among the 3 different models discussed here are as follows:
• Initially, before interaction, each agent has his own existing utterance-meaning (U-M)
   mappings.
• Every agent is equally likely to meet and interact with any other agent.

In these models, self-organisation is apparent: agents following simple rules give rise to convergent
behaviour, in the form of complex sets of vocabularies. As the level of convergence between utterances and
meanings increases, a consistent vocabulary emerges.

1   Imitation model

1.1      Description of the Imitation model
It has been observed that imitation is an innate attribute among animals for communication. Based on this
natural instinct, the Imitation model (1) assumes that agents communicate by imitating the utterances
used by each other. In this model, imitation serves as a basic mechanism for inter-agent communication.
One of 2 imitation strategies is adopted as agents try to convey a message and be understood:
1) Imitation in a random direction: either agent A imitates agent B, or vice versa. Each case
   carries the same probability. [This strategy relies purely on random imitation through local
   interaction between agents, and agents are unaware of majority signal usage in the
   population.]

2) Imitation abiding by the majority in the population: agent A imitates B if the utterance
   used by B for a certain meaning is more widely used than that used by A. [Please refer to
   Assumption 2.]
In simulations, a common vocabulary emerges.
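The two strategies can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the simulation code of the original study: each agent holds exactly one utterance per meaning, the initial mappings are random, an interaction concerns a single randomly chosen meaning, and a tie under strategy 2 leaves both agents unchanged. The function names are hypothetical.

```python
import random
from collections import Counter

def make_population(n_agents, n_meanings, n_utterances, rng):
    # Assumed initial state: every agent maps each meaning to a random utterance.
    return [[rng.randrange(n_utterances) for _ in range(n_meanings)]
            for _ in range(n_agents)]

def imitate(population, strategy, rng):
    """One interaction: two distinct agents meet and compare one random meaning."""
    a, b = rng.sample(range(len(population)), 2)
    m = rng.randrange(len(population[a]))
    ua, ub = population[a][m], population[b][m]
    if ua == ub:
        return  # same utterance already, so no imitation incident takes place
    if strategy == 1:
        # Strategy 1: imitation in a random direction, each with probability 0.5.
        if rng.random() < 0.5:
            population[a][m] = ub
        else:
            population[b][m] = ua
    else:
        # Strategy 2: the agent whose utterance is less widely used imitates the other.
        usage = Counter(agent[m] for agent in population)
        if usage[ub] > usage[ua]:
            population[a][m] = ub
        elif usage[ua] > usage[ub]:
            population[b][m] = ua
        # Assumption: on a tie neither utterance is "more widely used", so nothing changes.

def run(n_agents=10, n_meanings=10, n_utterances=10, strategy=1,
        interactions=20000, seed=0):
    rng = random.Random(seed)
    population = make_population(n_agents, n_meanings, n_utterances, rng)
    for _ in range(interactions):
        imitate(population, strategy, rng)
    return population
```

Because agents only ever copy utterances that already exist in the population, the set of utterances in use for a meaning can shrink but never grow, which is what drives convergence in this sketch.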

1.2      Assumptions involved
• Agents in the population use the imitation strategies mentioned above for communication. That is,
    when 2 agents who do not use the same utterance to signify a particular meaning meet, an imitation
    incident takes place.
• Agents employing strategy 2 above have surveyed the population and know the
    majority predisposition of utterance-meaning mappings.
• The fewer the homophones, the more distinct (D) and thus the better the vocabulary. The more
    distinct a vocabulary, the more likely it is to emerge as the commonly used one.

1.3       How it works: Mathematics & Formulae

To assess the efficiency of a vocabulary in conveying messages, 2 parameters are introduced: consistency
(C) as well as distinctiveness (D).

Consistency is the average number of common utterance-meaning (U-M) mappings among the agents.
The higher the consistency of an utterance, the more prevalent or “successful” it is. If there are a total of i
meanings and j utterances in the population, the number of matching U-M pairs out of all such possible
pairs, Ci, would be given by,




The numerator sums, over all utterances uj, the number of ways of choosing 2 agents from those that use
uj for meaning mi. The denominator is the number of ways of choosing 2 agents from the whole
population, Ps.
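Read literally, the description above corresponds to the expression below. It is a reconstruction under stated assumptions rather than a quotation: n_{ij}, the number of agents that use utterance u_j for meaning m_i, and |U|, the number of utterances, are symbols introduced here for illustration.

```latex
C_i \;=\; \frac{\displaystyle\sum_{j=1}^{|U|} \binom{n_{ij}}{2}}{\displaystyle\binom{P_s}{2}},
\qquad \binom{n}{2} = \frac{n(n-1)}{2}
```

Under this reading, C_i = 1 when every agent uses the same utterance for m_i, and C_i = 0 when no two agents agree.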


      Consistency would then be the average Ci of the individual U-M mappings, i.e.


      [Note: C takes positive values; when C=1, all agents have the same vocabulary and C is at its
      maximum]
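As a minimal sketch, consistency can be computed directly from a population in which each agent holds one utterance per meaning. We assume here that Ci is the fraction of agent pairs that share an utterance for meaning mi, and that C is the plain average of Ci over all meanings; the function name and data layout are our own.

```python
from collections import Counter
from math import comb

def consistency(population):
    """population[a][m] is the utterance that agent a uses for meaning m."""
    n_agents = len(population)
    n_meanings = len(population[0])
    total_pairs = comb(n_agents, 2)  # ways of picking 2 agents from the population
    c_values = []
    for m in range(n_meanings):
        usage = Counter(agent[m] for agent in population)
        # Pairs of agents that use the same utterance for meaning m.
        matching_pairs = sum(comb(n, 2) for n in usage.values())
        c_values.append(matching_pairs / total_pairs)
    return sum(c_values) / n_meanings
```

For example, four agents split evenly between two utterances for a single meaning give 2 matching pairs out of 6, so C = 1/3.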

Distinctiveness is a measure of homophony; the fewer the homophones, the more distinct
the utterances, and the more “successful” we deem the vocabulary, owing to its lower level of ambiguity. Consider the Ith
agent: if N is the number of distinct utterances (not homophones) that the Ith agent uses, and ξj is the
probability that the Ith agent correctly interprets utterance uj, the degree of homophony in the U-M
mappings of agent I, DI, is given by:




Thus, distinctiveness, being the average of DI over the population, would be defined as




      [Note: D takes positive values; when D=1, all agents have no homophones in their vocabulary. It
      should be noted that homophones are not necessarily ambiguous due to the context provided by, e.g.
      preceding utterances. This is perhaps why homophony is rife in existent vocabularies. Thus,
      distinctiveness has a somewhat limited value in judging the effectiveness of a vocabulary at
      communicating meanings.]
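One possible reading of these definitions is sketched below. The details are assumptions on our part: each agent holds one utterance per meaning, ξj is taken to be 1/k when the agent uses utterance uj for k different meanings, DI is the mean of ξj over the N distinct utterances the agent uses, and D is the mean of DI over all agents.

```python
from collections import Counter

def agent_distinctiveness(vocabulary):
    """vocabulary[m] is the utterance this agent uses for meaning m."""
    # Number of meanings each utterance stands for; above 1 means a homophone.
    meanings_per_utterance = Counter(vocabulary)
    # Assumed chance of interpreting each utterance correctly.
    xi = [1.0 / k for k in meanings_per_utterance.values()]
    return sum(xi) / len(xi)

def distinctiveness(population):
    return sum(agent_distinctiveness(v) for v in population) / len(population)
```

With this reading, an agent with no homophones scores 1, while an agent using one utterance for two meanings and a second utterance for a third meaning scores (0.5 + 1) / 2 = 0.75.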

1.4       Simulation results

      Simulations are run with M = 10, for various values of Ps and U. The average values of consistency C
      and Distinctiveness D are recorded. Table 1a (below) shows the results for both strategies 1 and 2 in
      100 runs, with each run having 20,000 interactions.

                                         C                        D
                                U=10    U=30    U=50      U=10    U=30    U=50
        Imitation     Ps = 10   1.00    1.00    1.00      0.77    0.92    0.94
        Strategy 1    Ps = 30   0.98    0.98    0.98      0.76    0.92    0.95
                      Ps = 50   0.82    0.80    0.80      0.78    0.92    0.95
        Imitation     Ps = 10   1.00    0.80    0.63      0.74    0.92    0.95
        Strategy 2    Ps = 30   1.00    1.00    1.00      0.73    0.92    0.95
                      Ps = 50   1.00    1.00    1.00      0.74    0.91    0.94
       Table 1a. Simulation results for strategies 1 and 2.


1.4.1 Observations & conclusions from the simulation
    Almost all values of C reach 1.00, meaning that a common vocabulary emerges among the agents in
    the population. For strategy 1, C = 1.00 is reached more slowly as the population Ps increases.
    For higher values of Ps, the 20,000 interactions in each run are no longer sufficient to attain a
    consistency of 1.00. Conversely, for strategy 2, it is generally harder to obtain a C of 1.00 for smaller
    Ps. [In real life, this happens when there is no obvious “majority” utterance for a particular
    meaning, but several utterances for the same meaning in equal proportion. Thus, there is
    “competition” between such U-M mappings, and more utterances (U) decrease the chances of
    complete consistency.]
    A consistent vocabulary is usually reached with strategy 1, and in most cases with strategy
    2, except where competing mappings co-exist, as mentioned above.

      Distinctiveness (D) does not approach 1.00 as closely as consistency does. D < 1 means that homophones
      cannot be totally eliminated. Strategies 1 and 2 give similar results, and the trend is that,
      regardless of Ps, D increases as the number of utterances increases. [This reflects real-life
      situations: even when U (=50) is 5 times the number of M (=10), there are still homophones
      present, i.e. identical signals with different meanings. This is quite realistic, as
      natural languages used by humans also have a certain degree of homophony, even though there are
      many different utterances.]

1.5           Our Critique

1.5.1 Advantages of the Imitation model
    Firstly, this model is very simple, involving only a few variables: U, M and Ps. Thus, it is able to
    show general trends in the emergence of common vocabularies by looking at
    consistency (C) and distinctiveness (D). Also, imitation, to some extent, reflects the way people
    started communicating before a standard set of words was created. This model thus helps us
    understand better how imitation could have contributed to the emergence of language, which is based
    on utterances. However, the emphasis of this paper is on vocabulary and not language. Moreover, the
    occurrence of homophones is taken into account; thus not only C but also D is considered when evaluating
    whether a vocabulary is “successful”.

1.5.2 Disadvantages of the Imitation model
    Many assumptions were made in the Imitation model, and some of them make the model unrealistic.
    For instance, for strategy 2, “following the majority”, agents are assumed to have complete
    knowledge of the majority tendency. In a large population, agents can only
    have regional knowledge, and it is almost impossible for them to sample the whole
    population. Another important point is that no distinction is made between the listener and the
    speaker, i.e. there are no separate speaking and listening matrices for the 2 interacting agents. Also, while
    simplicity in a model is an advantage, it can also be a disadvantage, as the assumptions made to simplify it
    may render it unrealistic.

1.5.3 Possible improvements
    An overall refinement would be to distinguish between the most widely used utterances and the
    most widely understood ones; this is discussed in the next section on the Obverter model.

2   The Obverter Model

2.1     A description of the model
Oliphant and Batali (2) have argued that the “Obverter” model of learning is a more efficient procedure
than the Imitate-Choose model. In the latter, an agent using Imitation Strategy 2, when trying to convey a
meaning, m, chooses the signal that is most widely used by the population. This is in sharp contrast to an
agent who learns by the Obverter model, who chooses the signal that is most widely understood.

2.2     Assumptions involved
• Each agent has surveyed the population and knows the majority predisposition of
    utterance-meaning mappings. However, a variation of the Obverter model that is briefly
    described later in this section, the Unit-statistic procedure, does not require this prior
    knowledge.
• Each agent is capable of learning and exploiting regularities in other agents’ behaviour, i.e. their
    speaking and listening matrices, in his or her efforts at communicating.

2.3      How it works: Mathematics & Formulae
Table 2a gives the average speaking and listening matrices of a hypothetical population, whereby
pij = proportion of population using a particular utterance, uj, for a particular meaning, mi, and
qij = proportion of population who interpret a particular utterance, uj, to be a particular meaning, mi.

pij        u1         u2        u3                         qij        u1         u2        u3
m1         0.500      0.100     0.400                      m1         0.400      0.525     0.325
m2         0.425      0.450     0.125                      m2         0.375      0.425     0.250
m3         0.100      0.475     0.425                      m3         0.225      0.050     0.425
Table 2a. Average speaking (pij) and listening (qij) matrices of a hypothetical population.

From the above table, we can deduce that an agent, A, using the Obverter model would not use u1 (the
most widely-used utterance) to indicate m1, but would instead use u2 (the most widely-understood
utterance). Assuming that all agents in the population use the Obverter model and will thus favour u2
over other utterances for m1, we can set pij (m1, u2) = 1.0 -- a binary representation. However, agent B,
who is interpreting A’s utterance, will consider what the utterance is most widely-used for and not what it
is most widely-understood to be. Thus, agent B will interpret u2 as the meaning m3, and qij (m3, u2) = 1.0.
After applying this process to all other values in Table 2a, we will obtain the following table.






pij        u1         u2        u3                         qij        u1         u2        u3
m1         0.0        1.0       0.0                        m1         1.0        0.0       0.0
m2         0.0        1.0       0.0                        m2         0.0        0.0       0.0
m3         0.0        0.0       1.0                        m3         0.0        1.0       1.0
Table 2b. Speaking (pij) and listening (qij) matrices of a hypothetical population after applying the
Obverter learning process.
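The step from Table 2a to Table 2b can be reproduced with a short Python sketch. The function name `obverter` and the list-of-lists layout (rows are meanings, columns are utterances) are our own choices for illustration; the numbers are those of Table 2a.

```python
# Average speaking (P) and listening (Q) matrices from Table 2a.
P = [[0.500, 0.100, 0.400],
     [0.425, 0.450, 0.125],
     [0.100, 0.475, 0.425]]
Q = [[0.400, 0.525, 0.325],
     [0.375, 0.425, 0.250],
     [0.225, 0.050, 0.425]]

def obverter(p_avg, q_avg):
    """Return binary (speak, listen) matrices derived from population averages."""
    n_m, n_u = len(p_avg), len(p_avg[0])
    speak = [[0.0] * n_u for _ in range(n_m)]
    listen = [[0.0] * n_u for _ in range(n_m)]
    # Speaking: for each meaning, use the utterance most widely understood as it.
    for i in range(n_m):
        j_best = max(range(n_u), key=lambda j: q_avg[i][j])
        speak[i][j_best] = 1.0
    # Listening: for each utterance, assume the meaning it is most widely used for.
    for j in range(n_u):
        i_best = max(range(n_m), key=lambda i: p_avg[i][j])
        listen[i_best][j] = 1.0
    return speak, listen

speak, listen = obverter(P, Q)
```

The speaking matrix is read off the rows of the average listening matrix, and the listening matrix off the columns of the average speaking matrix, which is why the two resulting binary matrices need not agree with each other.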

The effectiveness of communication is measured here by communicative accuracy, which is largely
similar to what the Imitation model defines as Consistency, except that separate matrices for speaking
and listening are considered here. Communicative accuracy is calculated according to the following
formula:

                      Communicative accuracy =
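As a minimal sketch, suppose communicative accuracy is the probability that a meaning sent through the speaking matrix is recovered through the listening matrix, averaged over all meanings, i.e. the sum of pij × qij over all i and j, divided by the number of meanings. This form is an assumption on our part, as are the function and variable names; the matrices are those of Table 2b.

```python
def communicative_accuracy(speak, listen):
    """Mean over meanings of sum_j speak[i][j] * listen[i][j]."""
    n_meanings = len(speak)
    total = 0.0
    for p_row, q_row in zip(speak, listen):
        total += sum(p * q for p, q in zip(p_row, q_row))
    return total / n_meanings

# Binary matrices from Table 2b (rows m1..m3, columns u1..u3).
speak_2b = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
listen_2b = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
accuracy_2b = communicative_accuracy(speak_2b, listen_2b)
```

Under this assumption the matrices of Table 2b give an accuracy of 1/3: only m3 is sent with an utterance (u3) that is also interpreted as m3, whereas m1 and m2 are both sent as u2, which listeners interpret as m3.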


2.3.1 Unit-statistic learning

An agent would not have perfect knowledge of the average send and receive functions of the
population. Thus, an agent bases his or her functions on a finite number of observations, and derives only
approximate values. The more observations an agent makes, the closer his speaking and listening
functions will be to the population average. Oliphant and Batali have also proposed a Unit-Statistic
procedure for agents using the Obverter model. In this procedure, each agent uses a single
observation of each meaning, be it his most confident, most recent, first observed or a random observation,
to create his own speaking and listening functions, and thus makes extremely small demands on
his memory and cognition. Oliphant and Batali’s reasoning is that the higher the usage
frequency of an utterance for a particular meaning, the higher the likelihood of it being observed and thus
the higher the likelihood that the utterance is mapped to that particular meaning. Thus, the consistency
for all meanings will increase with time. It ought to be noted that despite the minimal demands the
Unit-statistic procedure makes on the cognitive capabilities of each agent, its final degree of communicative
accuracy (consistency) surpasses that of the Obs-10 variation in Graph 2a. This may be due to the
inherent qualities of the model, or it may be specific to these simulation runs and thus may not be
observed in other similar simulations.

2.4     Simulation results

Computational simulations were performed to yield more observations on the Obverter model. In each
round of the simulation, an agent is randomly chosen and removed from a hypothetical population of 100
agents who can send 5 signals for 3 meanings. This agent is replaced by another who uses one of the
learning procedures as shown in Table 2c.

Obverter        The Obverter learning procedure.
Obs-25          The Obverter model when observations of 25 episodes of communication are used.
Obs-10          The Obverter model when observations of 10 episodes of communication are used.
Unit-Statistic  The Unit-Statistic learning procedure.
Table 2c. Learning procedures used in the computational simulations.
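A replacement round for the observation-limited procedures (Obs-25, Obs-10) might look like the sketch below. It is an illustration under our own assumptions, not the original simulation: agents hold deterministic speaking and listening mappings, the initial population is random, each observed episode samples one send and one interpretation from random remaining agents, and ties are broken at random. The Unit-Statistic procedure is not covered, and all names are hypothetical.

```python
import random

N_MEANINGS, N_UTTERANCES = 3, 5  # 3 meanings and 5 signals, as in the text

def random_agent(rng):
    speak = [rng.randrange(N_UTTERANCES) for _ in range(N_MEANINGS)]   # meaning -> utterance
    listen = [rng.randrange(N_MEANINGS) for _ in range(N_UTTERANCES)]  # utterance -> meaning
    return speak, listen

def argmax_random_ties(scores, rng):
    best = max(scores)
    return rng.choice([k for k, s in enumerate(scores) if s == best])

def learn(teachers, n_obs, rng):
    """Build a new agent from n_obs observed sends and n_obs observed interpretations."""
    sent = [[0] * N_UTTERANCES for _ in range(N_MEANINGS)]        # estimate of pij
    understood = [[0] * N_UTTERANCES for _ in range(N_MEANINGS)]  # estimate of qij
    for _ in range(n_obs):
        speak = rng.choice(teachers)[0]
        m = rng.randrange(N_MEANINGS)
        sent[m][speak[m]] += 1
        listen = rng.choice(teachers)[1]
        u = rng.randrange(N_UTTERANCES)
        understood[listen[u]][u] += 1
    # Obverter rule: say what is most widely understood, hear what is most widely used.
    new_speak = [argmax_random_ties(understood[m], rng) for m in range(N_MEANINGS)]
    new_listen = [argmax_random_ties([sent[m][u] for m in range(N_MEANINGS)], rng)
                  for u in range(N_UTTERANCES)]
    return new_speak, new_listen

def accuracy(population):
    """Fraction of (speaker, listener, meaning) triples that are decoded correctly."""
    hits = 0
    for speak, _ in population:
        for _, listen in population:
            hits += sum(listen[speak[m]] == m for m in range(N_MEANINGS))
    return hits / (len(population) ** 2 * N_MEANINGS)

def simulate(n_agents=100, n_obs=25, rounds=3000, seed=0):
    rng = random.Random(seed)
    population = [random_agent(rng) for _ in range(n_agents)]
    for _ in range(rounds):
        idx = rng.randrange(n_agents)                    # agent removed this round
        others = population[:idx] + population[idx + 1:]
        population[idx] = learn(others, n_obs, rng)      # replaced by a new learner
    return population
```

Calling `accuracy(simulate(n_obs=10))` and `accuracy(simulate(n_obs=25))` gives a rough way to compare observation budgets, although the values will not necessarily match those reported for the original simulations.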

Graph 2a and Table 2d display the average communicative accuracy of the populations over 10 simulation
runs per procedure, against the number of rounds of simulation.




Graph 2a. Simulation results

Learning Procedure                                     Rounds
                      600          1200            1800             2400            3000
Obverter              0.991        1.000           1.000            1.000           1.000
Obs-25                0.879        0.980           0.988            0.989           0.991
Obs-10                0.431        0.659           0.786            0.826           0.848
Unit-Statistic        0.362        0.483           0.621            0.807           0.944
Table 2d. Average communicative accuracy of a hypothetical population of 100 agents who can send 5
utterances for 3 meanings

2.4.1 Observations & conclusions from the simulation
As can be expected, the Obverter model attained the highest communicative accuracy in the shortest
time, followed by the Obs-25, Obs-10 and Unit-statistic procedures. We can also expect that, for learning
procedures based on a fixed number of observations, the higher the number of observations, the closer
the results will be to the Obverter curve. However, fewer observations may not result
in a curve similar to that of the Unit-statistic procedure. This is because the Unit-statistic procedure
increases communicative accuracy only if some consistency exists in the agents’ usage of utterances, as
opposed to a totally random choice of utterances. The increase in communicative accuracy is apparent in
Graph 2a only because a mere 5 utterances and 3 meanings are used in the simulation; if more utterances and
meanings were used, a similar trend might not be observed.

2.5     Our critique

2.5.1 Advantages of the Obverter model
Firstly, this model acknowledges that the speaking and listening matrices of an agent may differ. The
benefit of this distinction is apparent in cross-cultural communication, when agents can speak according
to one set of U-M mappings and listen using another set that is more relevant to the other party’s cultural
background. Also, in a phenomenon known as “overextension”, speaking matrices are observed to be
smaller than listening matrices. For example, although an agent understands “sad”, “upset”,
“melancholic” and “unhappy” to mean the same thing, he may prefer to use only the word “sad” to
indicate his mood. Thus, his listening matrix consists of 4 U-M mappings, whereas his speaking matrix
has only a single U-M mapping.
Secondly, this model is slightly less mechanistic than the Imitation model because agents can exploit
regularities in other agents’ behaviour, i.e. perceive and predict their responses. As a result,
communication is more dynamic and more realistic.
Thirdly, this model takes into account that perfect knowledge of the population averages of the speaking and
listening matrices is unrealistic, by introducing the Unit-statistic procedure.
Fourthly, some element of choice is incorporated into the Unit-statistic variation of this model, as agents
may use utterances that are a combination of random, favourite and most recent utterances, as well as
those they are more confident of. However, this freedom may also make the
procedure less efficient and effective.

2.5.2 Disadvantages of the Obverter model
Firstly, homophones are not taken into consideration here. As explained in Section 1, the number of
homophones can reasonably be expected to partially determine the efficiency of a vocabulary. Thus, the
omission of this consideration is a flaw.
Secondly, the speaking and listening matrices are represented in binary form. In the simulation, agents
were assumed to have completed the learning process before being added to the population; thus, this binary
representation was fixed permanently the moment it was derived. This is not realistic, because
agents should retain the ability to learn.
Thirdly, the fact that each speaker strives to make up for each listener’s lack of perfect knowledge of the
most widely-used utterances (and vice versa) results in a negating effect that makes this model seem
redundant. This is because the corrections made by a speaker to optimise consistency, and thus
communicative accuracy, are negated by the corrections made by the listener. However, this feature makes
this model quite applicable to the context of communication within multilingual societies.




Fourthly, both Obverter and Imitation models have a common flaw, in that they do not take into account
any “trendsetters” in the population who, instead of following the majority, create their own utterances
and manage to persuade at least part of the population to follow their lead.

3   An Alternative Model
Deemed to be an improvement over the Imitation Model, the Interaction Model (1) is an agent-based
simulation in which words are assumed to be mapped to meanings and vice versa in the agents’ minds in
a probabilistic manner. The agents continually interact and modify their vocabulary according to the level
of success of their interactions. An interaction event consists of a speaker sampling his speaking matrix
for an utterance to convey a randomly chosen meaning, and a listener sampling his listening matrix for a
meaning to attach to an utterance. A successful interaction results in the speaker and listener
strengthening the association between that particular meaning and utterance, while decreasing the other
associations.
Here, roulette wheel sampling is carried out in which probabilities are akin to spaces on a roulette wheel
with the larger probabilities getting larger spaces on the roulette wheel. A random or pseudo-random
number is selected and the space on the wheel to which it corresponds is chosen, thus larger probabilities
which take up larger spaces on the wheel are more likely to encompass that random number.
Convergence is said to have been achieved when 99% of the population shares the same vocabulary.
Convergence occurs abruptly (refer to Graph 3a), akin to first-order phase transitions, which are a
well-known phenomenon in physical and biological systems.
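Roulette wheel sampling and a single interaction event can be sketched as follows. This is a minimal illustration under assumptions of our own: the reinforcement step size `delta`, the renormalisation used to weaken the other associations, and the choice to leave matrices untouched after a failed interaction are not taken from the original model, and the names are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def roulette(weights, rng):
    """Return index i with probability weights[i] / sum(weights)."""
    cumulative = list(accumulate(weights))  # right edge of each space on the wheel
    spin = rng.random() * cumulative[-1]    # where the ball lands
    return min(bisect_right(cumulative, spin), len(weights) - 1)

def reinforce(probs, index, delta):
    """Strengthen one association, then renormalise so the others decrease."""
    probs[index] += delta
    total = sum(probs)
    for k in range(len(probs)):
        probs[k] /= total

def interact(speaker, listener, n_meanings, rng, delta=0.1):
    """speaker['speak'][m] is a distribution over utterances for meaning m;
    listener['listen'][u] is a distribution over meanings for utterance u."""
    m = rng.randrange(n_meanings)                   # randomly chosen meaning
    u = roulette(speaker["speak"][m], rng)          # speaker samples an utterance
    guess = roulette(listener["listen"][u], rng)    # listener samples a meaning
    if guess == m:                                  # successful interaction
        reinforce(speaker["speak"][m], u, delta)
        reinforce(listener["listen"][u], m, delta)
    return guess == m
```

Because every success makes the same pairing more likely to be sampled again, small early advantages are amplified, which is one way to picture how a shared vocabulary could take hold abruptly.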




     Graph 3a.                                                   Table 3b.

When the parameters were varied, an “emergent” property was observed: the speaking matrix is a subset of the
listening matrix, an example of which can be seen in Table 3b. The results correspond to the phenomenon
known as overextension, in which the listening vocabulary is larger than the speaking vocabulary, i.e. we
often understand more obscure words but do not use them in speech. A disadvantage of the model is
that it assumes no conscious effort on the agent’s part in choosing utterances and meanings. Also, using
roulette wheel sampling to simulate the complex processes which lead to choice is very doubtful, even
for agents less “intelligent” than human beings.

4   “Singlish”: A possible application?

We considered these 3 models in the context of the beginnings of Singapore Colloquial English,
otherwise known as “Singlish”. No restriction is placed on the a priori knowledge of each agent, so
each agent can have a native tongue (an existing U-M mapping) before he attempts to communicate with
others who have no knowledge of that native tongue. The emphasis here is that in the course of their
communication, a consensus vocabulary is established. Thus, in a multilingual society like Singapore,
words from different languages may be adopted and incorporated into this consensus “Singlish”
vocabulary. Most Singaporeans move smoothly between Singapore Colloquial English, Standard
English, as well as other languages such as Mandarin, Bahasa Melayu and Tamil.

In this respect, the Imitation model is unsuitable because it does not take into account the linguistic
limitations of agents from different cultures. An utterance that is most widely-used, i.e. from a
predominant language, may not be understood by an agent from the minority population. For example, if
the majority of the population is Mandarin-speaking Chinese, the predominant language will be
Mandarin. Thus, a Chinese who bases his observations on population averages would, according to the
Imitation model, form a vocabulary that consists solely of Mandarin words. Consequently, when he
speaks to a Malay, he would use Mandarin, which the latter would not understand, despite it being the most
widely used language. Inter-cultural communication would fail, rendering the Imitation model inappropriate.

Also, the Interaction model is inapplicable because the agent’s choice of words is assumed to be
unconscious and determined by roulette-wheel sampling. Thus, there is no conscious effort on his part to
use words that the other agent understands. Effective inter-cultural communication cannot occur on these
terms.

The Unit-statistic variation of the Obverter model is the most appropriate candidate for the modelling of
the development of the “Singlish” vocabulary. This is because it states that each agent uses his most
confident, most recent, first observed or a random observation to create his own speaking and listening
matrices. Using the same example mentioned above, the Chinese would recall his last encounters with a
Malay and thus would choose words that he previously used or similar words that the latter would most
likely understand. With iterations through time, ambiguities concerning the U-M mappings in the
Singaporean society will be ironed out and certain expressions like buey tahan (a combination of the
Hokkien word, “buey”, that means “cannot”, and the Malay word, “tahan”, that means “tolerate”)
become common to most; a “Singlish” vocabulary starts to form and continues to evolve.

Perhaps we can, using the Obverter model, retrace the evolution of local language use and follow the
ongoing pidginization and creolization of English to form Singlish as people adapt to a dynamic cultural
and linguistic environment. It would be fascinating to predict if “Singlish” words, and even a “Singlish”
grammar, will eventually arise to form what can be comfortably defined as a language.

Unfortunately, we do not have written documents about language use in the local Singaporean population
in the 19th and early 20th centuries. Written documents such as government records and newspapers
were written in standard native English and consequently do not reflect the actual state of language
use among the locals. However, clues regarding the emergence and evolution of language can be
obtained by observing newly arrived foreign workers in Singapore and following the development of their
“Singlish” vocabulary over time.

5       Other applications

1) Sign language
There are a few kinds of sign languages, and they may or may not have their own set of grammar.
Nonetheless, a model for the emergence of vocabulary may still be applicable to sign languages (3),
especially if that sign language system is not complicated by syntax or grammar structures. For instance,
American Sign Language (ASL) is taught to deaf children without the language rules. These children
express themselves using basic vocabularies. If they want the cookies in the jar, they would only sign for
WANT followed by COOKIE; this helps them express the concept of their desire. With reference to the
models we have discussed, we can use signs (S) to map meanings (M) instead of a set of utterances. So
when agents meet, they would use signs to communicate and the various simulation results for the S-M
mappings can be obtained. In other words, the emergence of an efficient system for sign language may be
modelled similarly to vocabulary due to their common characteristic that grammar and sentence
structures need not be considered in the model.

2) Artificial intelligence – Natural Language Processing
Artificial intelligence (AI) involves the use of computers to model the behavioural aspects of human
reasoning and learning, including vocabulary acquisition. Due to the large potential of AI to perform
human-like tasks, scientists are looking into the possibility of communication between human and
computers using our languages instead of computer programming language that consists of strings of
numbers and symbols (4,5). This is especially so in the case of natural language processing (NLP), which
refers to the use of AI to process natural languages, i.e. the languages humans use to communicate among
ourselves. By modelling the emergence of vocabulary as discussed in previous sections, an efficient
system of learning and acquiring vocabulary can be created in the computer. If AI can advance such that
learning is inherent, communication between human and computers would be easier as commands can be
made in natural languages instead of complicated digital signals. Moreover, such natural languages may
also be used in place of computer language for communication between computers. In this way, a
universal language may be used so as to facilitate all communications. Thus, an attempt to model the
process of emergence of vocabulary may prove to be useful. For instance, if it is found that imitation is
the most efficient way of developing a common vocabulary, we can program the computer so
that commands and communicative utterances are learned by imitation. Based on the simulation
results from the model, the computers can then be configured with the parameters that give the most
consistent vocabulary.
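
As a rough illustration of choosing parameters from simulation results, the sketch below compares a few values of a hypothetical parameter, adopt_prob (the probability that a learner actually copies what it observes), and keeps the value that gives the highest average consistency within a fixed number of interactions. The parameter, the function names and all numerical values are assumptions made for this example and are not taken from the reviewed models.

```python
import random


def mean_consistency(adopt_prob, num_agents=10, num_meanings=3, num_signs=3,
                     interactions=300, trials=20):
    """Average pairwise agreement after a fixed budget of interactions."""
    total = 0.0
    for seed in range(trials):
        rng = random.Random(seed)
        # agents[i][m] is the sign agent i currently uses for meaning m.
        agents = [[rng.randrange(num_signs) for _ in range(num_meanings)]
                  for _ in range(num_agents)]
        for _ in range(interactions):
            learner, model = rng.sample(range(num_agents), 2)
            meaning = rng.randrange(num_meanings)
            # The learner imitates the model only with probability adopt_prob.
            if rng.random() < adopt_prob:
                agents[learner][meaning] = agents[model][meaning]
        pairs = [(a, b) for i, a in enumerate(agents) for b in agents[i + 1:]]
        matches = sum(sa == sb for a, b in pairs for sa, sb in zip(a, b))
        total += matches / (len(pairs) * num_meanings)
    return total / trials


def pick_best(candidates):
    """Return the candidate with the highest mean consistency, plus all scores."""
    scores = {p: mean_consistency(p) for p in candidates}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    best, scores = pick_best([0.1, 0.5, 1.0])
    print("scores:", scores)
    print("chosen adopt_prob:", best)
```

Averaging over several random seeds matters here, since a single run of a stochastic simulation can favour a poor parameter value by chance.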

3) Interactive business systems
The high level of consistency that emerges after large numbers of communicative exchanges between
agents suggests that social behaviour, such as the ability to observe, learn and communicate,
might be desirable for artificial agents that are expected to interact with other natural or
artificial agents. For example, incorporating agent technology in conjunction with AI into e-commerce
systems may improve transmission efficiency and thus the efficiency and effectiveness of business systems
and online customer services such as online troubleshooting. In these agent-mediated e-commerce
systems, issues such as the delegation of authority from a human user to a machine agent, the
identification of the information to be shared by agents, and the handling of propositional attitudes, i.e.
the explicit linkage between consumer attitudes, intentions, beliefs, perceptions, acts and values, need to
be dealt with. Also, for agents to communicate with one another, techniques are needed for
specifying a common vocabulary of terms.


Conclusion
We have reviewed different approaches to the problem of modelling the emergence of language, namely the
Imitation, Obverter and Interaction models, and have recognized potential applications for the results of
modelling language emergence, especially in the field of Artificial Intelligence. However, our review does
not discuss these models with regard to language emergence due to interactions between generations of
agents, i.e. between parent and child, as this would require much greater enquiry into whether it is
fair to treat parent and child as agents with equal learning abilities, as the agents in the models
are. We also recognize that such studies in academic publications lack correlation with real-world
data. This is because observations over many generations are needed to obtain
realistic data, and in a sizeable population of agents it is impossible to trace every interaction and its
outcome. To compensate for this lack of data, we have found it interesting and enlightening to apply
these models in a brief but more concrete study of “Singlish”. The “emergence” of “buey tahan” has
made all the difference in enhancing our understanding of the abstract study of vocabulary emergence.


References
1. Ke, J.Y., Minett, J.W., Au, C., Wang, W.S.-Y. Self-organization and Selection in the Emergence of
   Vocabulary. Complexity 2002, 7(3), 41-54.

2. Oliphant, M., Batali, J. Learning and the Emergence of Coordinated Communication. Centre for
   Research on Language Newsletter 1997, 11(1).

3. http://members.aol.com/DEMP12/aboutsign.html. Cited 04 April 2003.

4. http://www.bartleby.com/65/ar/artifInt.html. Cited 04 April 2003.

5. http://207.158.222.5/aiintro.shtml. Cited 04 April 2003.

6. Gyimóthy, T. Natural Language Processing at the Artificial Intelligence Group in Szeged.
   http://www.ercim.org/publication/Ercim_News/enw26/gyimothy.html



