Richard Veale & Richard Liston Department of Mathematics and Computer Science, Ursinus College

We propose a model of semantic memory, drawing upon ideas from various sources. The crux of the model is the concept system (CS), which is an associative network of nodes connected by weighted edges. We hypothesize that this structure, through its interplay with the closely related Short-Term Memory (STM), can account for many interesting human-like properties of memory. In particular, we are interested in the emergence and function of higher-level concepts and how they affect the agent’s behavior. The CS is accessed through a type of spreading activation, where stimulation spreads from one node to another based on several factors, such as the weight of the link between them. The CS contains both general and atomic nodes. Atomic nodes represent conceptions of idiosyncratic experience, and general nodes may represent anything else. The general nodes offer insight into how higher-level concepts could be stored in memory. Lexical tags, which are “names” that may be fed to the agent simultaneously with its experience and which will be associated with nodes in the STM or the CS, will also be used in the concept-construction process. Pattern mining will be used to identify new concepts.

between such patterns, without explicitly recording them). This theory draws inspiration, ideas, and structures from various sources. We frame descriptions of the model using a very simple agent design, whose sole purpose is to allow direct interaction with the memory model. We refer to research in psychology and cognitive science to support decisions that we make about how to represent a particular task. We also draw upon research related to modeling human-like abilities in computers, particularly those involving artificial neural networks (ANNs).

2. Related Work
The seminal work of Quillian and colleagues [4][5][6] in semantic memory has guided much of the thought on the subject. Quillian used semantic networks of nodes representing tokens and types, connected by links. The knowledge stored in these networks would be accessed through spreading activation, by stimulating one node, and then traveling to adjacent nodes and continuing outwards. The theory did not explain how such a structure would come about in the first place, nor how such a structure could be used for anything except simulating sentence comprehension under the narrow definition he provided. However, the idea of using semantic networks (indeed, the semantic network itself) and the idea that comprehension occurs as a result of spreading activation and the activation status of nodes in such a network have remained central to the study and modeling of semantic memory.

Anderson and Bower [7] propose a theory regarding the associative nature of human memory that draws on older associativist theories for support. They proposed a model of human memory (Human Associative Memory) which involves the interplay of associated concepts and irreducible experiences, which they call “sensations”. The model was the basis for the memory component of Anderson’s [1][2] architecture, ACT (and later ACT-R), which added production rules, among other things, to formulate a full-blown theory of cognition and to account for how such memory would interact with “thought”.

Douglas Hofstadter and his Fluid Analogies Research Group (FARG) [8] produced several programs (e.g. Copycat) based on a general model of how analogical reasoning works, which they take to be the basis of human-like intelligence. The model has both an analogue to the CS (though it contains all concepts from the very start and does not change), and something similar to short-term memory (“the workspace”). They integrate

concept, learning, neural networks, semantic memory, semantic networks, associativism

1. Introduction
Traditional computing methods leave much to be desired in areas where humans seem to flourish. Many extensive models have been suggested (such as [1][2][3]) to reconcile this discrepancy, but as yet none has succeeded to a degree that anyone would consider the enigma of modeling human general intelligence solved. We use the term semantic memory to refer to memory whose relation to things is based primarily on their semantic value or content. Items in memory are categorized, conceptualized, and associated with one another in such a way that they can be used in tasks requiring knowledge of such relations. In this paper we describe a model that accounts for some of the interesting properties that human-like semantic memory exhibits, in particular the extraction of higher-level concepts (in this research represented as the recognition of more complex patterns, and similarities

some connectionist ideas, e.g. of overall system heat (“happiness”), and of stochastic elements jimmying the system from an entirely deterministic result. Overall, however, the model seems far too similar to limited symbolic models which solve limited problems to be expanded to the extent that is necessary to support it as a valid model for explaining how human analogy-making and memory work. Especially in question are aspects of their system such as the static nature of their LTM analogue, which prevents the system from gaining new concepts on its own.

A more recent model is that of Novamente [3]. The memory section of this model seems related to ours. However, details about the deeper aspects of the design are unavailable, and so it is difficult to understand precisely what decisions they have made. They describe it as “[resembling] classic semantic networks but has dynamic aspects that are more similar to neural networks”. This seems very much to be the same type of structure that we have produced in this research. The rest of the model (for theirs is a full-blown model of general AI and consciousness) is significantly different, however.

In a slightly removed field, Burgess and Lund [9] have built a model (HAL, the Hyperspace Analogue to Language) that constructs models of language (and so a certain type of memory) by examining large corpora of text and building associative graphs to represent them. In addition to the use of weighted graph-type structures to represent this, some of the more theoretical observations they make are relevant, particularly their acknowledgement of the necessity of experience-grounding (as described in [10]).

More purely connectionist work has been done in representing memory as well.
The PDP group’s [11][12] model of memory using distributed representations¹ ([12], chapter 19) consists of a large number of “modules”, each in itself an organized neural network, with a certain number of incoming connections from various other modules, and outgoing connections leading to other modules. Signals are received on the input side, and then the module-internal network reaches a sufficiently stable state such that signals are transmitted via some of its outgoing connections. This model is similar to the ones described above in that it stipulates that memory emerges through the interplay of a large number of nodes that do not inherently have much (if any) meaning in themselves. Functioning together, however, they seem to have the ability to encode interesting properties, properties which are strikingly similar to many of the properties of human-like memory. The primary difference is that this model’s

native functions (such as content-directed retrieval) differ as a result of its using distributed representations ([11]). Other systems building on these ideas, such as that described in [13] (a system that uses combinations of several attractor-type neural networks for semantic fact retrieval), have been implemented and tweaked. Finally, making the connection between concept theory and models of memory, Van Loocke [14] has compiled a book relating concepts and connectionism. He examines how certain connectionist theories support philosophical theories of the nature of concepts (prototype theory, etc.), analyzing many in detail, and how such models can be made to account for research done in psychology. Like [15] and [16], he uses reaction-time data from recall to support the model.

3. Description of Model
Throughout this description we will be using a very simple environment. Here, “environment” refers to the realm of experiences that it is possible for the agent to have. The primary purpose of this simple environment is to make the eventual implementation and its debugging simpler. Expanding it to more complex environments will then be only a matter of appropriately modifying the transducers (a.k.a. “sensors”) and input modules (similar to those described in [17]) such that they can appropriately convert a richer set of experience into internal representations. We describe a model that accounts for interesting properties of memory, all the while constraining ourselves to using only functions or abilities that seem native or likely to exist in a real memory of this type. At times, since we are using local representations, computation may become extensive (such as searching for a certain sub-graph within a large graph). It must be remembered that even with distributed representations, though the computation may be parallel in theory, in simulation it is serial, and so may fall prey to the combinatorial explosion before its truly parallel implementation would. This does not change for our model. Indeed, it does pose a major problem, which will be discussed later.

Simple Environment (Three Dots)
The environment we have chosen to use as the realm of possible experience for our agent is a world of two dimensions. One axis, called the “width”, is segmented into three columns. The other, called “length”, is continuous, though in practice it will always be segmented into very small “moments”. A given “scroll” can contain any number of moments. Any column can, at any moment, be either filled or empty. As such, there are eight possible configurations of a single moment, represented as a three-by-one matrix. Experience moves from the top to the bottom, one moment at a time. As such, the movement of the agent along the scroll downwards (or

¹ A network using local representations uses at least one node of its network as the “identifier” for a given feature (in other words, there is at least a one-to-one correspondence between the number of things represented by the network and the number of nodes). In a network using distributed representations, this is not true.

rather, the movement of the scroll upwards past the agent’s transducers) is representative of the passage of time. This is useful because the relation between time and the amount of possible experience is strictly controlled. The scrolls will contain patterns, and it is these patterns that we expect to arise as higher-level concepts in the memory. The reason for choosing this type of experience, versus some sort of linguistic one such as natural language or a limited subset of it, is that we wish to work in an environment more akin to “physical” experience, or at least one which is idiosyncratic and irreducible. Others have made the observation that to attempt to teach an agent language without it having some grounding on which to base it is difficult ([9][10]). In addition, one can avoid all of the complexities of grammar and other uncertain baggage that language brings along with it. Interesting properties might be built up from experience, whereas an agent whose realm is exclusively language is limited in precisely that way. Language, a tool used to communicate in an environment, would be the environment to such an agent. Such an agent would be unable to use language in the normal way.
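The three-dot environment described above can be sketched in a few lines. This is only an illustrative encoding, not the authors' implementation: the tuple-of-booleans representation, the `X`/`.` text format, and the function names `all_moments` and `parse_scroll` are assumptions introduced here.

```python
from itertools import product

def all_moments():
    """Enumerate every possible configuration of a single three-column moment."""
    # Each column is either empty (False) or filled (True).
    return list(product((False, True), repeat=3))

def parse_scroll(text):
    """Build a scroll (a list of moments, top to bottom) from lines such as 'X.X'.

    'X' marks a filled column; any other character marks an empty one.
    The text format is a hypothetical convenience for writing test scrolls.
    """
    scroll = []
    for line in text.strip().splitlines():
        row = line.strip()
        if len(row) != 3:
            raise ValueError(f"each moment needs exactly 3 columns: {row!r}")
        scroll.append(tuple(ch == "X" for ch in row))
    return scroll

# A short scroll in which a single dot moves from left to right and repeats.
scroll = parse_scroll("""
X..
.X.
..X
X..
""")
```

Because each of the three columns is independently filled or empty, the enumeration yields the eight configurations mentioned in the text, and iterating over `scroll` in order corresponds to the passage of time.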

We assume that the agent has transducers which can recognize whether a given square of its current experience scroll is filled or not, and which check each of the three squares of its experience at every moment. This is converted to a simple internal form, which can be (conveniently) represented in exactly the same way that a moment is represented on the scroll. The internal organization of the model is divided into several main static parts (parts that can be recognized in a snapshot taken of any given moment). These will be described in greater detail in later sections.

• The Concept System (CS)
• Short-Term Memory (STM)
• Experiential Memory

In addition to these internal pieces, there are the pieces that interact with the environment (the pieces that make up our agent, minus the memory model). The actual implementation of these will change with the environment.

• Lexical Tag Receiver
• Command Receiver
• Output Location for lexical tags
• Output Location for actuation (scroll etc.)

Our agent has two primary modes. In the “training” mode, experience will be fed to the agent and it will modify itself, but not output anything. In “testing” mode, it may output things, but will not modify itself permanently. Later on, these two can be combined. For the moment, however, it is simpler to separate them. These modes will be specified to the agent using commands, which are not represented in the model per se, but can be thought of as stimulation coming from another part of the brain, or perhaps changes in the ambient environment (such as hormones being introduced into the bloodstream). We have chosen to include them for ease of testing, even if perhaps they are not necessary and a type of such memory could exist free of any such goal-directing external entity.

Static Aspects
I: The Concept System
The Concept System (CS) is a directed, weighted graph connecting nodes (sometimes we will call a single node a concept when referring to the CS).
Each of these nodes is one of three types: atomic, general, or lexical. These nodes have some special properties. Atomic nodes are associated with a single feature of experience (blueness, redness, softness, etc.). In the three-dot environment these are features of a moment, i.e. one of the squares being turned on. These nodes are trained to react whenever something with the feature that they are trained to recognize is presented. One could think of these nodes as recognizing when the pattern of neurons fires that results in the experience of that feature. In a more complex environment these could be represented by embedding a recognizer neural network, and training it through experience to recognize that particular feature. Initially,

Figure 1: An experience scroll. Large arrow indicates order of experience. Smaller arrow points to a single “moment”.

Figure 2: Example of a single “moment”. In our simple environment, such a 3x1 matrix can also be seen as the internal representation of the agent.

the same feature might be mistakenly categorized into two or more nodes, but as experience accrues and more “sensations” are experienced, these will be reconciled. In the simple three-dot environment, this functionality is realized by simply checking for an exact match. This may have the undesirable result of subtracting from the “fuzzy” aspects that the esoteric structures are supposed to add, but since the environment is so simple this cannot be helped. General nodes are simply nodes in the graph that can be stimulated, and have links leading to and from other nodes in the graph. More qualitatively, they will represent the higher-level, emergent concepts. Atomic and general nodes may both be associated with a lexical tag. A lexical tag could be anything (a word, a symbol), though throughout we shall assume that it is an English string. The purpose of this tag is to allow

selective access into the network by way of stimulating just the node that is associated with a lexical tag. Functionally, this serves as a sort of proto-language, though in theory it will primarily encode static concepts. Each of these nodes has the ability to become activated with a certain amount of potential (energy). This activation will spread along links to other nodes, modulated by the weight of the link between them. The maximum weight a link can have is positive one (weights are in the range [0, 1]). This, combined with the fact that each node’s activation will be decreased by a set amount each tick, ensures that activation will be exhausted after a certain number of jumps. How many is determined by 1) how strong the issuing node’s activation was and 2) the weights and number of links over which it has been transmitted. When activation energy gets very close to zero it will have no noticeable effect and so is said to be expended.
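The spreading-activation scheme just described can be illustrated with a minimal sketch. It is one possible reading of the text, not the authors' implementation: the function name `spread_activation`, the dictionary-based graph, and the parameters `decay` (the fixed per-tick loss) and `epsilon` (the level below which energy counts as expended) are all assumptions made for illustration.

```python
def spread_activation(edges, source, energy, decay=0.1, epsilon=0.01):
    """Spread activation outward from `source` and return the activation per node.

    edges: dict mapping node -> {neighbor: weight}, with weights in [0, 1].
    decay: fixed amount subtracted from a pulse at every jump (assumed value).
    epsilon: pulses at or below this level are treated as expended.
    """
    activation = {source: energy}
    frontier = [(source, energy)]
    while frontier:
        next_frontier = []
        for node, potential in frontier:
            remaining = potential - decay  # fixed per-tick loss
            if remaining <= epsilon:
                continue  # this pulse is expended
            for neighbor, weight in edges.get(node, {}).items():
                if not 0.0 <= weight <= 1.0:
                    raise ValueError("weights must lie in [0, 1]")
                passed = remaining * weight  # modulated by the link weight
                if passed > epsilon:
                    activation[neighbor] = activation.get(neighbor, 0.0) + passed
                    next_frontier.append((neighbor, passed))
        frontier = next_frontier
    return activation

# Hypothetical three-node graph: A feeds B strongly and C weakly; B feeds C.
edges = {"A": {"B": 1.0, "C": 0.5}, "B": {"C": 0.5}}
result = spread_activation(edges, "A", energy=1.0)
```

Since every jump subtracts `decay` and no weight exceeds one, each pulse strictly shrinks, so the loop terminates even on cyclic graphs after at most roughly `energy / decay` jumps.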

[Figure 3 node labels: “Pattern A”, “Pattern B”, “1-2 pattern”]

Figure 3: Graphic of a portion of a concept system. General nodes associated with lexical tags (shown as boxes over the nodes) are represented in the lightest shade, atomic nodes (with their associated pattern inside) as a little darker, and general nodes without tags as the darkest shade. Floating edges lead to other nodes/links not in the picture.

Another feature of the CS is that a node may have a link to a link between two other nodes, instead of to another single node, as is “normal”. This type of connection would have the function of either additionally stimulating/inhibiting the connection (if it is directed at the link), or of modulating the activation of the node when potential flows across the link (in the event that it is directed from the link to the node). In addition to drastically increasing the number of patterns of activation that can be achieved given a number of nodes, this type of connection gives us the ability to represent some interesting relations between concepts. To use an intuitive real-world example, a node representing “fire” may become more active when activation travels strongly from “red” to “hot”, but the link between “red” and, say, “car” would be inhibited (preventing “car” from becoming too

activated, even if “car” is normally associated strongly with “red”). One can imagine many situations in which this would be useful, to direct even just a little more precisely which concepts should be active and which should not. Whether or not this type of connection functionally increases the domain expressible by the system is uncertain. Testing an actual implementation will be the best method for determining whether or not these types of connections are needed.
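As a rough illustration of the node-to-link connection discussed above, the sketch below computes the weight a link would effectively have once other nodes stimulate or inhibit it. The additive, signed modulation and the names `effective_weight`, `modulators`, and `strength` are assumptions for this example; the text does not specify the exact rule.

```python
def effective_weight(base_weight, modulators, activation):
    """Weight of a link after node-to-link modulation, clamped to [0, 1].

    modulators: list of (node, strength) pairs aimed at this link; a positive
        strength stimulates the link and a negative one inhibits it (assumed).
    activation: dict mapping node -> current activation level.
    """
    shift = sum(strength * activation.get(node, 0.0)
                for node, strength in modulators)
    return min(1.0, max(0.0, base_weight + shift))

# Hypothetical numbers: "red" -> "car" is normally a strong link (0.8),
# but an active "fire" node inhibits it.
red_to_car = effective_weight(0.8, [("fire", -0.7)], {"fire": 1.0})
```

With "fire" inactive the link keeps its base weight of 0.8, so "car" would still be reached from "red" as usual.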

Diagram 4: Detail of different types of connections. Each edge has a weight. Activation is inhibited if the weight is near zero, and strengthened if nearer to one.

II: Short-Term Memory
There has been much research into Short-Term Memory (STM) in the psychology and cognitive science literature [18]. It is often characterized as having relatively small capacity, and being quickly searchable (at least as compared to Long-Term Memory), and volatile (prone to be overwritten by some new, more immediately relevant, “young upstart” memory). Though STM might be seen as playing a similar functional role in this model, it also takes a very different form than is normally attributed to it. Instead of being serial, and encoding chunks of information in a linguistic form, as many models assume, the STM of this model can be succinctly described as a volatile, limited version of the CS. In other words, the structure is the same as the CS, though the rules for how it changes dynamically over time are modified to make it far more transient. The rules for modifying the weights of links call for more modification than in the CS, resulting in a quickly changing scene as time advances. Nodes that were strongly relevant a moment before may disappear entirely in the next. Two concepts that were very positively related in one moment may have nothing in common in the next. Since the STM is more immediately connected to experience, it arranges itself to reflect trends from only a relatively narrow window of time and experience.

[Diagram 5 node label: “Pattern 5”]

Diagram 5: Example graphic of the STM. Same as the CS, except for the difference in its dynamics over time. All types of nodes may exist.

III: Experiential Memory
In addition to STM, which encodes “processed” information, there is also the need for a memory that stores experience in raw form. This results from the fact that by the time something is ready to be processed, it will already have passed. In our simple environment it will be represented as a sliding window over the scroll, storing only a few moments prior to the current one.
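The experiential memory described above, a sliding window over the scroll, maps naturally onto a bounded queue. The following is a minimal sketch under that assumption; the class name `ExperientialMemory`, its methods, and the default window size are hypothetical.

```python
from collections import deque

class ExperientialMemory:
    """Raw store of the most recent moments, kept as a sliding window."""

    def __init__(self, window=3):
        # A deque with maxlen silently drops the oldest moment when full.
        self._moments = deque(maxlen=window)

    def append(self, moment):
        """Record the newest moment at the end of the window."""
        self._moments.append(tuple(moment))

    def recent(self):
        """Return the stored moments, oldest first."""
        return list(self._moments)

memory = ExperientialMemory(window=3)
for moment in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 0), (0, 1, 0)]:
    memory.append(moment)
```

After five moments have been presented, only the last three remain available in raw form, which mirrors the idea that older experience survives only through what the STM and CS have made of it.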

Diagram 6: Sliding window on experience scroll.

Dynamic Aspects
This section describes the high-level interplay between the parts described above such that the proposed properties will emerge. We hypothesize that interesting properties will emerge which are similar to semantic memory in humans. The way the agent acts at any given moment will be a function of:

• The tendencies of the system (long-term memory, represented by the CS assuming its current form through the history of the agent).
• The recent experiences of the system (short-term memory, being modified by recent experience and modulated by queries to the CS).
• The current experience of the system (modifies short-term memory by adding new information to it and beginning the process that will alter much of the system).
• The “goal” of the agent (in our case determined by commands given to it manually).

Let us follow the activity of the system during a short period of time when an experience is presented to it (in training mode).

Diagram 7: Modular overview of the model and the exchange of information between parts.

1) The experience is first appended to the end of the experiential memory. The window will be only one moment long for this example.
2) The representation of the experience is presented to the STM. A lexical tag may also be simultaneously presented.
3) If an atomic node in the STM corresponds to the experience, it will be stimulated with some amount of activation. The amount that works best will be determined from empirical testing, and may be modulated depending on the number of nodes given activation this moment, and other such factors.
4) If a node in the STM is associated with the same lexical tag as the one presented, it will also be given some amount of activation.
5) If any of the nodes do not already exist, they will be created on the fly. Links will be extended to other nodes that are currently very active. The reasoning behind this is that strength of connection should vary with temporal co-occurrence. A strong STM activation level means that the node was used recently.
6) Once the two nodes exist, a search begins for the corresponding nodes in the CS. This search starts from the nodes corresponding to strongly activated/associated nodes in the STM. A certain amount of energy is expended in the spreading activation from these nodes. When a node is activated past some threshold, this event is reported to the STM. If the content matches, that node (in the CS) is given a large amount of activation. Nodes around it which are activated will also be returned to STM. The STM will be modified based on the tendencies of the system (represented as the CS). If the energy is expended and no appropriate node has been found, it is assumed to have degraded past the point of recall, at least using the current cues. A new node is created in the CS and links are extended to other nodes active in STM, in the same fashion as before.
7) While these actions occur in reaction to new experience being presented, ambient actions will also occur. All weights will degrade by a fixed amount once every moment. Currently activated nodes’ activation will decrease slightly. Activation will spread until energy is expended. Weights between activated nodes will increase by a rule similar to the delta rule, mutatis mutandis. Activated nodes in STM will give activation to corresponding nodes in the CS. Sufficiently active nodes in the CS will be passed up to STM, where modification of weights will occur to a higher degree than in the CS. Patterns will be recognized in the STM, and new concepts created in the STM and activated. (The Novamente [3] engine seems to use a similar process of separating out

new nodes in their memory structure.) The above process repeats until energy is expended (functionally it will continue until slightly after the end of experiential input). The results of these modifications and activations are that temporally co-occurring phenomena (i.e. experiences that tend to occur in close proximity in time, or situations where one tends to always occur near another in time)

will be associated unless other evidence inhibits this. Slowly, over time, patterns will be built in the CS, and so mirrored in the STM. These patterns will be extracted and labeled as new concepts, to be activated and utilized later. And complexity will gradually emerge in the system, giving rise to interesting pattern-recognition abilities (e.g. categorization, conceptualization).
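The ambient weight dynamics described in the process above (a fixed decay every moment, plus strengthening of links between co-active nodes) might look roughly like the sketch below. The exact update rule is left open in the text, so the form used here, moving a weight a fraction `rate` of the way toward one, is an assumption, as are the names `update_weights`, `decay`, `rate`, and `threshold`.

```python
def update_weights(weights, activation, decay=0.01, rate=0.2, threshold=0.5):
    """Return the link weights after one moment of ambient change.

    weights: dict mapping (source, target) -> weight in [0, 1].
    activation: dict mapping node -> current activation level.
    decay: fixed amount by which every weight degrades each moment.
    rate: fraction of the gap to 1.0 closed when both ends are active
        (a delta-rule-like step; the precise rule is an assumption).
    threshold: activation level above which a node counts as active.
    """
    updated = {}
    for (source, target), weight in weights.items():
        weight = max(0.0, weight - decay)  # every link degrades a little
        both_active = (activation.get(source, 0.0) >= threshold
                       and activation.get(target, 0.0) >= threshold)
        if both_active:
            weight += rate * (1.0 - weight)  # temporal co-occurrence strengthens
        updated[(source, target)] = weight
    return updated

weights = {("a", "b"): 0.5, ("a", "c"): 0.5}
new_weights = update_weights(weights, {"a": 1.0, "b": 1.0})
```

Under this reading, the greater volatility of the STM relative to the CS could be expressed simply by calling the same function with larger `decay` and `rate` values for the STM graph.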

[Diagram 8 node label: “Beethoven’s 9th”]

Diagram 8: A possible (high-level) snapshot from the recall task being discussed. At this point in time the second atomic node has been recalled (and output) and the system is moving on to the next. Dotted connections are inhibitory, full lines are stimulatory. We have marked each node’s relative activation by putting rings of different thicknesses about them. Thicker rings represent higher levels of activation. Note that the lexical tag node which is acting as the “handle” into the memory is constantly receiving strong activation, which is spreading to adjacent nodes, its strength modulated by the character and strength of the connection between the two.

Discussion
As motivation for abilities such as that of recognizing and separating out constituents of atomic sensations, Kuhl’s research ([19][20]) shows that untrained babies, and even many higher-order mammals, show the ability to differentiate between many different phonemes. Kuhl also, in the same paper ([19]), speaks strongly to the function of “motherese” in children learning language, which supports the system of lexical tags, which helps to guide the initial creation of higher-level concepts by providing handles that can be called on to selectively activate certain concepts. In terms of temporal co-occurrence being indispensable in learning patterns, Baddeley and Weiskrantz present evidence in their book ([21]) that even in humans, a certain degree of proximity or order is required in the constituent events for any sort of concept or rule to be formed in the mind of the subject. What follows is an example to demonstrate how the recall of a simple episodic memory might take place. For this task, we will assume that only a lexical tag is presented to the agent and that it has the goal of recalling whatever information it deems most pertinent to the presented lexical tag. From the STM, the node corresponding to the lexical tag queried will be activated.
Activation will spread to the nodes most strongly associated with it, modulated by weights.

In this example, the node receiving strongest activation would be the atomic node representing the first array of notes of the pattern. All these nodes would be copied into the STM with appropriate levels of activation and connected by their weights. Experimentation will determine whether there would need to be inhibitory connections between the configurations already played and the nodes representing later configurations in the pattern, such that they are not activated too strongly, and so output preemptively. When an atomic node became sufficiently stimulated, it would be output. The result of this would be the sequence of three-by-one “moments” (configurations) that make up the pattern being played. If too much similar experience were gained, the agent might have trouble recalling each pattern accurately. We argue that this very phenomenon occurs in humans as well, and so the model is similar even in its ostensible drawbacks. A more abstract (and anthropomorphic) description of what is happening is that, by holding in mind that it only “wants” things related to the pattern (represented by the lexical-tag-associated node), and the slowly-degrading-over-time previous pieces of the pattern, each subsequent piece of the pattern can be recalled. Much of the research in neural networks has been in devising proofs that there exists an assignment of weights that can satisfy such a result. We are sufficiently convinced that such an assignment exists. A few problems arise, however, such as how the agent knows when to stop, or how to handle the same configuration being played again within a single pattern.

Relation to Concept Theory
Our model’s approach to concepts can be explained by several different theories of the structure of concepts and how they interact with human memory. Particularly relevant to all representation with neural networks are the prototype and exemplar hypotheses, which are in practice similar.
By the prototype theory, there exists a prototypical example of the concept, to which other “noncore” features may be appended in recognition. In exemplar theory, there exists some exemplar of the class of that object, to which candidates for inclusion in that category will be compared, and only ones that satisfy it to a sufficient degree will be included ([22][23]). Indeed, Van Loocke [14] refers to prototype theory in his analysis of how certain aspects of distributed representations can be mapped to aspects used in categorization of concepts, since it seems that networks will tend to “settle” on a given memory if something sufficiently close to it is excited. In our model, concepts are represented as a single node, and the part that node will play in any action undertaken is defined by the weights of afferent and efferent connections to other nodes in the concept system. In other words, a concept is really only defined insofar as are the other concepts that create its identity, ultimately grounded in physical, idiosyncratic, irreducible experience. One might immediately regard this as answering to the (neo-)

classical concept theory (that higher-level concepts are fuzzily constructed of lower-level ones), i.e. a car is only a car insomuch as it is made up of wheels (80% of cars have them), a chassis (85%) and an engine (96%) ([22][23]), which is not a mistake. However, at the atomic level, it seems to conform to prototype theory, in that the atomic nodes contain esoteric structures (neural networks) trained to recognize a certain type of experience. So, it is possible to argue that, perhaps even more clearly than do wholly distributed models, our model adopts the synthetic view: namely, that at a lower level, prototype theory holds true, but as more complex, composite concepts are brought into being they are defined by (as the name suggests) composing pre-existing concepts. So, the concept “bird” is composed of those pre-existing concepts for wings, a body, feathers, etc.

4. Conclusion
The advantages of such a semantic memory are many. There are many tasks that are very difficult using traditional computational methods, and yet at which humans excel. Many of these seem to involve understanding of the semantic value of a thing. This can be achieved through having the right types of structures and rules, and exercising a correct regimen of training, so that the agent will accrue a history of experience and its structures will be formed functionally similar to a human’s. The necessity of this training is somewhat of a drawback, though no more so than the necessity of raising a human to develop as one wishes. Development of an appropriate training regimen or sufficiently complex and appropriate self-directing rules is another problem entirely, and one that becomes only more complex as the environment does. In this paper a model has been outlined for how such a semantic memory might work. The environment we selected is extremely limited, so as to demonstrate viability. Expansion to other environments is intuitively not difficult. However, because of the requirement that the memory/agent have access to transducers and input modules that can convert experience into an appropriate internal representation, and that certain aspects of the system (such as the recognition capabilities of atomic nodes) be scaled up to account for the increased complexity of such an environment, the problem is realistically probably very difficult. If the environment could be extended further, semantic search (online, for example) would become possible. If the agent had a concept system organized similarly to that of the searcher, it could automatically make the assumptions that the user unconsciously makes. Recently it has been the practice that search engines automatically append similar items to one’s query to narrow down the search. Making such items more accurate with regard to the searcher’s intentions would make the returned items more relevant.

Even with scarce information, perhaps relatively high accuracy could be achieved. Applications also seem possible in the field of Natural Language Processing (NLP), especially in such uses as Machine Translation. With the aid of a correctly constructed semantic memory model, ambiguous words’ senses could be decoded more accurately (since words’ sense often rely on their relation to other words, context, and background knowledge). As such, a more natural translation would be possible. Regarding cognitive science, any model that contributes to our understanding of how the human mind works offers a springboard for discussion. One of the major problems involved in creating such a model is that there is little research that can be concretely applied to it. The questions raised in the creation of this type of model are of this type. We hope that it will elicit discussion, whether it is for or against the model. Future Directions The direction this research will immediately take will be an implementation using the simple environment we have described in this paper, followed by extensive testing. Throughout the research we have encountered many problems that seem to be reconcilable if we convert the model to use distributed representations instead of local ones. As such, that seems the logical direction in which to proceed, keeping in mind data gathered from the implementation and testing. Finally, in the farther future, we hope to expand the model to deal with more complex environments.

[1] J.R. Anderson, The architecture of cognition (Cambridge, MA: Harvard University Press, 1983).
[2] J.R. Anderson & C. Lebiere, The atomic components of thought (Mahwah, NJ: Erlbaum, 1998).
[3] B. Goertzel, M. Looks & C. Pennachin, Novamente: an integrative architecture for artificial general intelligence, Proc. of AAAI Symposium on Achieving Human-Level Intelligence through Integrated Systems and Research, Washington DC, August 2004.
[4] M.R. Quillian, Semantic memory. In: M. Minsky (ed.), Semantic information processing (Cambridge, MA: The MIT Press, 1968).
[5] A.M. Collins & E.F. Loftus, A spreading-activation theory of semantic processing, Psychological Review, 82(6), 1975, 407-428.
[6] A. Bell & M.R. Quillian, Capturing concepts in a semantic net. In: E.L. Jacks (ed.), Associative Information Techniques (New York, NY: American Elsevier, 1971).
[7] J.R. Anderson & G.H. Bower, Human associative memory (Washington DC: VH Winston and Sons, 1973).
[8] D. Hofstadter and the Fluid Analogies Research Group, Fluid concepts and creative analogies: computer models of the fundamental mechanisms of thought (New York, NY: BasicBooks, 1995).
[9] C. Burgess & K. Lund, The dynamics of meaning in memory. In: E. Dietrich & A. Markman (eds.), Cognitive dynamics (Mahwah, NJ: Erlbaum, 2000), 117-156.
[10] S. Harnad, The symbol-grounding problem, Physica D, 42, 1990, 335-346.
[11] D.E. Rumelhart, J.L. McClelland, & The PDP Research Group, Parallel distributed processing: explorations in the microstructure of cognition. Volume 1: foundations (Cambridge, MA: The MIT Press, 1986).
[12] D.E. Rumelhart, J.L. McClelland, & The PDP Research Group, Parallel distributed processing: explorations in the microstructure of cognition. Volume 2: psychological and biological models (Cambridge, MA: The MIT Press, 1986).
[13] E. Ruppin & M. Usher, An attractor neural network model of semantic fact retrieval, International Joint Conference on Neural Networks, 3, 1990, 683-688.
[14] P.R. Van Loocke, The dynamics of concepts, a connectionist model (New York, NY: Springer-Verlag, 1994).
[15] A.M. Collins & M.R. Quillian, Retrieval time from semantic memory, Journal of Verbal Learning and Verbal Behavior, 8, 1969, 240-247.
[16] L.J. Rips, E.J. Shoben, & E.E. Smith, Structure and process in semantic memory: a featural model for semantic decisions, Psychological Review, 81(3), 1974, 214-241.
[17] J.A. Fodor, The modularity of mind: an essay on faculty psychology (Cambridge, MA: The MIT Press, 1983).
[18] M.I. Posner (ed.), Foundations of cognitive science (Cambridge, MA: The MIT Press, 1989).
[19] P.K. Kuhl, A new view of language acquisition, Proc. National Academy of Sciences USA, 97(22), 2000, 11850-11857.
[20] F. Ramus, M.D. Hauser, C. Miller, D. Morris, J. Mehler, Language discrimination by human newborns and by cotton-top tamarin monkeys, Science, 288(5464), 2000, 349-351.
[21] A. Baddeley & L. Weiskrantz (eds.), Attention: selection, awareness, and control: a tribute to Donald Broadbent (Oxford: Oxford University Press, 1993).
[22] E. Margolis & S. Laurence (eds.), Concepts: core readings (Cambridge, MA: The MIT Press, 1999).
[23] E. Margolis & S. Laurence, Concepts. In: E.N. Zalta (ed.), The Stanford encyclopedia of philosophy (Stanford, CA: The Metaphysics Research Lab, Center for the Study of Language and Information, 2006).
