
									This is a draft. I welcome comments and suggestions. Please don’t quote without permission.

                                           Early and often

Suppose you find a bunch of old puzzle pieces in an unlabeled box. And you want to put them
together to see what they make. There aren’t many pieces, which should make your job pretty easy.
And what’s more, most of the pieces have different shapes and depict pretty different-looking parts
of the whole picture. But the catch is that because you don’t have the original box, you don’t know
what the assembled puzzle is supposed to look like. So how do you solve the puzzle? Well, you do
have a little bit of top-down knowledge you can bring to bear. For instance, if you’ve done a lot of
puzzles before, then you know what types of things they tend to depict: portraits, landscapes,
monuments, that kind of thing. But the best solution is probably just to place a first piece
somewhere, then progressively take one piece at a time, and try to fit each one together with the
pieces you’ve already placed. If a piece looks like it fits in a particular spot, you place it there. If
there’s nowhere for it to go, you set it aside until the right spot for it appears.

This piece-by-piece process is in a way very much akin to what you do when you’re processing
sentences. You often don’t know where a sentence is going until you get well into it – sometimes
not until the very end. Moreover, just as the individual pieces of a puzzle have shapes and functions,
so words have particular forms (what they sound like or look like) and meanings (what they
contribute to the meaning of the whole utterance). And just as you might inspect and place puzzle
pieces one at a time, so you are exposed to the words in a sentence one at a time, and have to try to
integrate them one at a time with your ongoing best guess about what the sentence is. That’s because
when you’re reading or listening to language, you are engaging in a dynamical process – it unrolls
over time. You don't see or hear all the words in an utterance at the same time; they come to you
one at a time. First you hear or see the first word, and then the next word, and so on. And they
come with frightening speed – around three words per second.

But so far in this book, we've largely ignored the time dimension – the piece-by-piece, millisecond-
by-millisecond course of understanding sentences. Since language comprehension inherently
happens through time, there’s a lot we’ve been missing. How do you integrate the word you’re
hearing or reading right now with what you’ve previously heard? Do you begin constructing mental
simulation as soon as you hear each word? If so, you may end up following the wrong path, and
needing to start over with different assumptions. Or are you conservative with mental simulation,
waiting until you have enough information to be confident about what you're supposed to simulate?
Can mental simulation actually help you figure out what word you’re looking at or listening to?

In other words, we’ve been ignoring the details of exactly what people are doing as they hear or read
a sentence word by word, in order to know what to simulate. This dynamic process of figuring out
how a sentence is structured and what it’s about is usually known as parsing. Parsing is the process
of fitting together the words of an utterance, assembling them into larger units of grammar and
larger units of meaning. For instance, suppose you read the sentence The rabid monkey is gnawing on the
unconscious scientist. Without even worrying about the meanings of the words, you know that The rabid
monkey is the subject of is gnawing - the monkey is the one doing the gnawing. It’s the gnawer. And you
know that the unconscious scientist is the thing being gnawed – the gnawee. You know this because of the
order of the words – in sentences like this in English, the subject comes before the verb and the
object comes after the verb. You also know that the adjective rabid applies to the monkey, and not
the scientist (at least, for the time being) because adjectives in structures like this modify the nouns
that most immediately follow them. Parsing is the process of figuring out while you’re reading or
hearing these words how they fit together.

It turns out that people fit the parts of a sentence together in much the same way you would fit the
pieces of a puzzle together one at a time. To figure out where a particular puzzle piece you’ve just
picked up fits, one thing you can use is its shape. The shape of a puzzle piece determines what other
pieces it fits together with – a piece with a particularly bulbous tab sticking out of it will only fit
together with another piece with a matching blank. We can think of the shapes of puzzle pieces as
analogous to the grammatical properties of words. Words belong to specific parts of speech, which
only allow them to fit together, through grammar, in particular ways. When you hear or read a word,
you might know that it’s an adjective (like rabid), which tells you that it’s going to fit together with a
noun that it will modify. In parsing, you stick together words that fit together, one at a time.

But puzzle pieces don’t merely have shape; they also depict something. The coloration on their
surface contributes to the image produced by the puzzle as a whole. Interpreting the coloration on a
puzzle piece – or a set of them assembled in some way – is akin on our analogy to the meanings that
words provide. You know that a puzzle piece ought to be interlocked with another piece that it fits
together with not only in terms of its shape, but also its color, as above. In the same way, you know
that rabid modifies monkey not only because adjectives modify nouns, but also because rabid describes
a property of animals, and monkey denotes an animal. In parsing language, people use both of these
criteria – the grammatical categories of words and their meanings – to figure out how to integrate
them with the rest of the words they’ve heard into a coherent idea of what the utterance is telling
you.

One piece at a time

Just like the puzzle-builder, who makes decisions about how to place each piece, one at a time, so
the language understander makes decisions about how to integrate each word as it’s read or heard;
that is, they parse incrementally. There’s a lot of evidence that we process language incrementally,
much more than can be covered here. One of the best pieces of evidence, though, comes from
garden path sentences, so-called because they lead you down a garden path to the wrong set of
expectations. For instance:

        Time flies like an arrow. Fruit flies like a banana.
                                   - Groucho Marx

In reading the second sentence, Fruit flies like a banana, you probably mistakenly took flies as the verb
– after all, it was a verb in the first sentence. You figured that fruit was the subject, and that you were
going to find out what fruit flies similarly to. But then you got to the banana at the end and realized
that, no, flies wasn’t a verb in this sentence but part of a compound noun – fruit flies. And like was
actually the verb. That’s right, you were led down a garden path because the grammar of the
sentence wasn’t what you thought it was.

Now, that example is a little messy, because there could be a lot of reasons why you expect flies to be
the verb. For instance, it could be that you expect the grammar of the second sentence to be like the
grammar of the first, but alternatively, it could be that you expect the meaning of the second to be
like the meaning of the first. But there are some cases where it’s unambiguously the meanings of
words that cause you to experience garden pathing. Consider, for example, the sentence:

        The lawyer cross-examined by the prosecutor confessed.

My guess is that, like most people, when you got to the by the prosecutor part, you did a mental double-
take. Why is this? Well, the trick here is that lawyers tend to cross-examine people. So when you get
to cross-examined, you think you’ve figured out the structure of the sentence. The lawyer is the subject,
and cross-examined is a very apt verb for it to be the subject of. But then you got to the by the prosecutor
and had to go back and reconsider who did what to whom. And we know that part of the reason
you were so easily garden-pathed here is that lawyers tend to do cross-examining. To prove this to
yourself, read the following sentence:

        The witness cross-examined by the prosecutor confessed.

This should have been much easier to understand – you might not have gotten garden-pathed at all.
Research shows that it would still be easier to parse, even if you hadn't just read another similar
sentence.1 The reason has to do with our semantic knowledge about witnesses and lawyers. We
know that cross-examining is the type of thing that lawyers tend to do, not to be subjected to, while
the reverse is true for witnesses. This world knowledge affects our decisions about how to parse
words in the moment-by-moment activity of understanding the sentences they’re part of. So it
appears that the parsing process uses aspects of meaning to come up with a word-by-word current
best guess of what the sentence is about.

One way to tell what people are doing as they’re parsing a sentence is to watch their eyes. One study
had people listen to sentences, while looking at an array of pictures.2 These pictures were of various
sorts of objects that differed in subtle ways. For instance, some of them were edible, like a cake and
an apple, while others, like a cat and table, were not. While they were looking at these arrays of
pictures, the participants heard sentences that described people acting on the objects. For instance, a
sentence might have been The boy ate the apple. What the researchers were interested in was where
people were looking as they listened to these sentences. In particular, they wanted to see whether
people looked more at edible objects when they came to the word ate, even before they heard the
name of the object that was being eaten. And this is precisely what they found - people were looking
more at the set of objects that fit with the meaning of the verb, even before they heard the name of
the relevant object. The straightforward interpretation of this result, and subsequent work that has
corroborated and extended it, is that people make predictions about what the rest of the sentence
will contain – semantically – as soon as words that they have already heard constrain what things
could reasonably follow.

Incremental simulation

So there’s lots of evidence that people assemble a representation of a sentence incrementally, word
by word. But when does simulation kick in? To go back to our puzzle analogy, we know that people
look at the shapes and colors of pieces early on, but when do they start to extrapolate what it
could be that the pieces depict? There’s a range of possibilities, in principle. Perhaps they activate
mental simulation indiscriminately, simulating actions and percepts associated with the current word,
regardless of what else they’ve previously seen or heard. Or maybe they’re a little more cautious,
simulating only when they’ve decided what role a word they’ve just heard plays in the ongoing
utterance. Or, finally, they could be very stingy with simulation, and wait until all the pieces are in
place, or until the end of the utterance, to mentally simulate whatever the combined whole tells
them.

You might think that one of the last two approaches would be the more efficient way to go. It might
help to do at least some parsing before you start to simulate. Before you can construct a mental
simulation to accurately depict The rabid monkey is gnawing on the unconscious scientist, you need to know
whether the word rabid applies to the monkey or the scientist, and the same for unconscious. You also
need to know what the subject of is gnawing is - is the monkey or the scientist the gnawer? So it
would make sense that before simulation of the described scene can take place, there must be a
phase during which the understander deals with the grammar of the sentence to arrange its parts
into sensible assemblies.

But on the other hand, couldn’t it be that as we process words incrementally, we also evoke mental
simulation incrementally? That is, could it be that we mentally simulate as each word comes in, and
while we're trying to parse a sentence into a coherent whole? Let's apply this idea to the garden-path
sentence we saw earlier, The lawyer cross-examined by the prosecutor confessed. When you hear the lawyer, you
might conjure up a mental simulation of a lawyer, though perhaps an unspecific one. Then comes
the word cross-examined. You've got your mental representation of a lawyer, and although the current
parse you have for the sentence is incomplete (and as it will later turn out, incorrect!), you go ahead
and attempt to mentally simulate the lawyer, standing in front of a witness stand, cross-examining.
Whom the lawyer is cross-examining is unclear, so your mental simulation might well exclude details
of what the cross-examinee looks like, or you might simulate a prototypical or stereotypical cross-
examinee. But when you then get to by the prosecutor, your parsing machinery grinds to a halt, and so
does your simulation machinery. And so, once you've collected enough information to re-parse the
sentence as it was intended – the lawyer is in fact the cross-examinee – you can re-simulate the
described scene, now with the lawyer on the witness stand, rather than in front of it. In the next
section, we’ll see some experimental research meant to tease apart exactly when you simulate and
what it is you’re simulating.

The Knob to the rescue

We can break down the issue about when you simulate into two parts. First, do you wait until the
end of an utterance to mentally simulate, or do you simulate in the middle of parsing? And
second, if the answer is that you simulate while the sentence is ongoing, do you simulate just what a
given word itself tells you to simulate – just looking at the individual puzzle piece and extrapolating
what it might depict? Or do you simulate what you think the word is contributing to the sentence as
a whole – do you place the piece in your growing puzzle and imagine what the whole might depict?

To answer these questions, we need to find a way to detect mental simulation over the course of
sentence processing. In fact, there’s a very widely used methodology known as self-paced reading
that provides just this. There are different variants, but the basic idea is that you display parts of a
sentence on a screen one at a time. But the parts don’t appear automatically – you have the person
reading the sentence perform some action to make each piece appear on the screen. Typically,
self-paced reading uses a button, so a participant in an experiment might push the button and see
The and then push the button again to see rabid, and again to see monkey, and so on. Self-paced
reading is often used to detect where people have trouble processing sentences. The more difficult it
is for a person to understand a given word that has just appeared on the screen, the longer they’ll
take to press the button to display the next word.
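
As a minimal sketch of how that measure is derived, assume a hypothetical log of button-press timestamps (the function name and the numbers below are invented for illustration; they are not data from any study). The time a reader spends on a segment is simply the gap between the press that revealed it and the press that moved on to the next one.

```python
def reading_times(segments, press_times_ms):
    """Pair each segment with the time (in ms) it stayed on screen.

    press_times_ms[i] is the moment segment i was revealed; the final
    entry is the press that dismissed the last segment.
    """
    if len(press_times_ms) != len(segments) + 1:
        raise ValueError("need exactly one more press than segments")
    return [
        (segment, press_times_ms[i + 1] - press_times_ms[i])
        for i, segment in enumerate(segments)
    ]


# Hypothetical timestamps, invented for illustration only.
segments = ["The", "rabid", "monkey"]
presses = [0, 310, 720, 1100]
times = reading_times(segments, presses)
for segment, ms in times:
    print(f"{segment}: {ms} ms")
```

A longer gap on one segment than on its neighbors is what would be read as a sign of processing difficulty at that point.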

Self-paced reading can also be adapted to tell us about mental simulation. Researchers at the
University of Rotterdam used a device you might recall from Chapter 3, known as The Knob.3 The
Knob is a very small device – one inch in diameter – that can be used to collect manual responses.
But as the name suggests, you don’t press it, you turn it. The Knob can be configured to act like a
button, so that in order to see the next part of a sentence you’re reading, you just rotate the knob a
very small amount – just 5 degrees. The neat thing about the knob is that you can have people rotate
it either clockwise or counter-clockwise. What the Rotterdam researchers did was to have people use
the knob to pace their own reading of sentences that described manual rotation either clockwise or
counterclockwise. For instance, the two sentences below describe actions involving rotation of the
dominant hand in opposite directions. (The slashes you see show the segments that were presented
with each 5-degree rotation of the knob.)

        Before/the/big race/the driver/took out/his key/and/started/the/car.
        To quench/his/thirst/the/marathon/runner/eagerly/opened/the/water bottle.

With sentences describing clockwise or counterclockwise motion, and responses involving one of
the same two types of motion, you can measure whether there are compatibility effects between the
sentence meaning and the manual response direction on any given portion of the sentence. The
findings were quite clear. When people were rotating the knob in a compatible direction, like the
clockwise rotation shown in the figure below, which is compatible with the direction of turning a
key in the ignition to start a car, they turned the knob fast. And when they were turning it in an
incompatible direction, they were slower. Crucially, this was only true of the rotation actions the
readers made directly after the verb (for instance, start). When they looked at the words before and
after the verb, they found no difference, regardless of whether the knob was being turned in the
same direction as the object in the sentence.

The graph below depicts their results more clearly. What you see there is the average time it took
people to rotate the knob (either clockwise or counterclockwise) while processing different regions
of a sentence. The two lines depict cases where the direction of rotation implied by the sentence
matches or mismatches the direction of action the participant makes to see the next part of the
sentence. As you can see, there’s basically no difference at any of the regions, except in the Verb
region. If you look at the sentences above, you can see why this would be. The relevant verbs there
are started and opened. Neither of them necessarily describes rotation in a particular direction on its
own. But the stuff before the verb in each sentence sets up the context such that, when the verb
appears, it’s pretty clear that the sentence describes rotation in a particular direction.
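
One way to picture the comparison behind such a graph is the rough sketch below. It assumes a hypothetical list of trials, each tagged with a sentence region, a condition ("match" or "mismatch"), and a rotation time in milliseconds. The region labels and numbers are made up for illustration and are not the researchers' data or analysis code.

```python
from collections import defaultdict
from statistics import mean


def match_effect_by_region(trials):
    """Return {region: mean mismatch time - mean match time} in ms.

    trials is an iterable of (region, condition, time_ms) tuples,
    where condition is either "match" or "mismatch".
    """
    times = defaultdict(lambda: {"match": [], "mismatch": []})
    for region, condition, time_ms in trials:
        times[region][condition].append(time_ms)
    return {
        region: mean(by_condition["mismatch"]) - mean(by_condition["match"])
        for region, by_condition in times.items()
    }


# Hypothetical rotation times, invented for illustration only.
trials = [
    ("pre-verb", "match", 400), ("pre-verb", "mismatch", 400),
    ("verb", "match", 380), ("verb", "mismatch", 420),
    ("post-verb", "match", 410), ("post-verb", "mismatch", 410),
]
effects = match_effect_by_region(trials)
print(effects)
```

A positive difference in one region and nothing elsewhere is the pattern the text describes: a compatibility effect confined to the verb.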

So the main point of interest here is that as soon as the verb appears, implying manual rotation in a
particular direction, people mentally simulate rotation in that same direction. This answers our first
question above; people don’t wait until the end of a sentence to start simulating – they do it as early
as possible. But there are two other things of note. First, this effect is produced by words like open
and start that don’t in and of themselves denote rotation in a particular direction, only doing so
within the right context. The same researchers actually conducted another study, to see whether
words other than verbs can produce the mental simulation effects.4 The stimuli they created didn’t
imply rotation in one direction or the other until after the verb. As you can see below, it’s only on
the very last word of the sentence that you can infer which direction the carpenter had to rotate his
or her hand to turn the screw. And sure enough, they found the same match-mismatch difference,
on this last word of the sentence, but not on the words preceding it. So it appears that whenever in a
sentence you get distinguishing information about a mental simulation, you go ahead and mentally
simulate, whether the key word is a verb like started that describes the action, or an adverb like
loosely that’s only, well, loosely associated with it. This suggests an answer to our second question.
People aren’t merely simulating in a word-specific way; they’re simulating as a product of the
word in the particular context it’s in.

        The carpenter/turned/the/screw./The boards/had/been/connected/too/tightly.
        The carpenter/turned/the/screw./The boards/had/been/connected/too/loosely.

Another important outcome of this study is the finding that the simulation effect is ephemeral – by
the time the reader is on to the next part of the sentence (“Post-verb1” in the chart above), there’s
no measurable difference between the matching and mismatching conditions. This last point is
worth thinking a little more about. Why would it be that the measurable simulation of rotating your
hand in a particular direction would disappear immediately after the verb?

Here’s one possibility. If you look at the sentences about starting cars and opening water bottles,
you’ll see that the part of each sentence immediately after the verb indicates the thing that the action
was performed on. At this point, people reading the sentences don’t appear to be representing
manual rotation anymore. Why not? Maybe what’s happening is that with each new word that comes
in, you shift your attention to whatever the new information is that it’s providing, perhaps accessing
the appropriate mental simulation. For instance, in the sentences above, the first word after the verb
is the. Maybe what the tells you is to prepare to mentally simulate some object, perhaps motor aspects
of how you interact with it, or other aspects, like its visual appearance, tactile feel, and so on. Maybe
if the next word were something related to the manual rotation described by the verb, you would be
more likely to continue simulating the rotation as you processed it. But if it’s car, then this draws
your mental simulation capacity to other motor and perceptual things you know about cars. As it
turns out, the Knob has something to tell us about this, too.

What the Knob’s handlers did was to come up with new sentences to present to readers, who once
again knobbed their way through the sentences. The sentences were similar to those in the previous
experiment, except that they now had a specific word after the verb, an adverb. There were two
types of adverb, as shown in the sentences below. One type described the manner of the action
(such as rapidly, slowly, or quickly), as in the first sentence below. The other described an aspect of the
mental state of the person performing the action (like hungrily, obediently, or happily). They figured that
if the adverb was about the person’s mental state, and not the action itself, then this would draw
readers’ mental simulation away from the rotation itself. But if the adverb was about the action, then
they should continue simulating the action. What they found was that adverbs that described the
manner of action, like rapidly, sustained focus of the mental simulation on the action itself even after
the verb – people were faster to perform matching than mismatching actions on both the verb and
the subsequent adverb. But with adverbs like hungrily, which pertain to the emotions or mental states
of the person performing the described action, people showed a significant match-mismatch
difference only for the verb, and not the subsequent adverb. In other words, how long you keep
simulating action depends on whether the following words draw your attention away from that
aspect of the described scene.

        He was/craving a/juicy/pickle./On the/shelf, he/found a/closed jar/which he/opened/rapidly
        He was/craving a/juicy/pickle./On the/shelf, he/found a/closed jar/which he/opened/hungrily

A different strand of research that came out of our lab actually addresses the question of the
temporal dynamics of simulation as well, even though it started out focusing on a very different
issue. We wanted to know whether mental simulation works the same in other languages as it does
in English. A graduate student, Manami Sato, who was working on Japanese, tried to replicate one of the
basic experiments described in Chapter 3. This was the one where people read sentences that
mention objects and imply that they have certain shapes. For instance, each of the sentences below
implies a different shape for the egg, namely the shape to its right.

        Nana put the egg in the fridge.
        Nana put the egg in the pan.

The original finding using English sentences like these was that people were faster to press a button
saying that the depicted object had been mentioned in the sentence when the shape in the picture
matched the one implied by the sentence.5 Sato tried to replicate this finding with Japanese
sentences, expecting to find the same thing – there’s no reason to think that Japanese should be less
coercive in driving simulation of objects’ shapes than English is. But she was surprised to find that
there was no difference in participants’ reaction times, regardless of whether or not the picture of
the object matched the one implied by the sentence. At first we were baffled, so we looked closely at
the sentences we used as stimuli in Japanese. They looked like this:

        Nana-ga reezooko-nonakani tamago-o ireta
        Nana fridge-inside        egg       put
        “Nana put the egg inside the fridge”

        Nana-ga huraipan-nonakani tamago-o ireta
        Nana pan-inside           egg      put
        “Nana put the egg in the pan”

Then it struck us that perhaps we were testing for mental simulation at the wrong time. After all,
Japanese sentences have a different word order than English ones. In English, the verb (like put)
usually comes in the middle of a sentence, and the direct object whose shape we were testing for
(like egg) comes later. But in Japanese, the order is reversed, as you can see from the sentences above.
In Japanese, you build up an idea of what objects are where and what shape they have pretty early in
the sentence. By the time you read the word tamago, meaning “egg”, you have a pretty good idea
what shape it has – you’ve already seen that it’s “in the pan” or “in the refrigerator”, and eggs don’t
tend to be placed into pans intact any more often than they’re cracked open into refrigerators. So
when the Japanese reader finally gets to the verb ireta, meaning “put”, at the very end of the
sentence, there’s no need to perform any more visual simulation of the shape of the object. Instead,
this verb might trigger simulation of something else. For instance, it might yield motor or visual
simulation of the action of putting the egg in the described location.

Sato wanted to test this proposed explanation of the otherwise mysterious difference between
English and Japanese. So she made a little modification to the methodology, changing where in the
sentence the picture appeared. If our explanation was right, then moving the picture earlier in the
sentence – so that instead of coming at the end, it directly followed the object noun (e.g. the word
tamago, meaning “egg”) – this would lead to a difference between the matching and the mismatching
pictures. So Sato conducted another experiment, using the same kind of sentences, but this time
people saw the picture before the verb and had to decide whether it had been mentioned in the
sentence. So, for instance, they would see:

        Nana-ga reezooko-nonakani      tamago-o                   ireta
        Nana    fridge-inside          egg                        put
       “Nana put the egg inside the fridge”

       Nana-ga huraipan-nonakani      tamago-o                   ireta
       Nana    pan-inside             egg                        put
       “Nana put the egg in the pan”

When we looked at the results, all of a sudden, the missing effect was back – people responded
faster to the images when they matched the shape implied by the sentence than when they didn’t.
We can interpret this as evidence from a different methodology that people construct mental
simulations on a word-by-word basis. When they get a key bit of information about what to
simulate, they go ahead and do so. But this simulation is short-lasting, perhaps by necessity. As each
new word comes in, it has to be dealt with, as its own contribution to a described scene is mentally

So far, the story that has emerged suggests that you mentally simulate early, often, and briefly. This
could mean that on occasion you mentally simulate one thing, because you’re pretty certain that you
know what’s being described, only to have to revise your simulation entirely when some new word
appears later in the sentence. For example, imagine you’re a Japanese speaker, and you read:

       Nana-ga reezooko-nonakani tamago-o otoshita
       Nana fridge-inside        egg      dropped
       “Nana dropped the egg inside the fridge”


       Nana-ga huraipan-nouede tamago-o korogashita
       Nana pan-on               egg    rolled
       “Nana rolled the egg on the pan”

These sentences imply unusual shapes for the objects in the mentioned locations – you expect an
egg in a pan to be broken, not capable of rolling. But because in Japanese the verb comes at the end
of the sentence, you don’t actually know until the very last word that the shape of the object is the
unexpected one – that the egg in the fridge is actually broken, or that the egg in the pan is actually
intact. We already know from the experiment described above that prior to the verb, people have
constructed mental simulations of the shapes of objects. In order to understand these sentences,
when the verb appears at the end, they might actually have to construct an alternative simulation of
the object, one with a different shape. So Sato used sentences like the ones above, and presented the
picture this time at the end of each sentence. And sure enough, the match-mismatch effect was
back. People responded faster to the shape depicting the final, implied shape of the object than to
the one that would have seemed more reasonable just one word before. In other words, people
quickly correct and revise the mental simulations they construct when new information overrides the
assumptions they had made previously, and that they had based previous simulation on.

These studies with English and Japanese show that people don’t wait until the end of a sentence to
mentally simulate. They make best-guesses about the motor or perceptual content that’s described
as soon as they have enough information to do so. These mental simulations are short-lived; they
last only as long as the sentence maintains focus on that same aspect of the scene it describes. And
when people find that they’ve simulated in error, they go back and revise their mental
simulations.

And then?

At this point, if you’ve been paying close enough attention, you might be scratching your head.
We’ve now seen two sets of results that seem to contradict one another. The first stretches back to
the early chapters of the book; we’ve seen again and again that a variety of measures, taken at the
end of sentences, show people mentally simulating. For instance, the very first Knob study,
described back in chapter 3, showed that if you have people make judgments about the
meaningfulness of sentences after reading them, by twisting the Knob one way or the other, they do
so faster when the action described by the sentence is in the same direction. But just above, we saw
that while you conjure up mental simulation during a sentence of particular elements of the scene,
like objects or actions, this simulation is soon obscured by simulation effects produced by the next
piece of language in the sentence that you see or hear. If a simulation that emerges during a sentence
subsequently disappears when you get to the next word, then why do we see simulation effects at the
ends of sentences?

The short answer is that we don’t know. But that doesn’t mean we can’t conjecture. For example,
some people have suggested that you go through several steps during sentence processing. There’s a
first stage where you build up the pieces of the sentence from what you hear or read. And then
there’s a second stage where you re-simulate what it was that you were supposed to mentally
simulate.6 This second stage is often referred to as wrap-up.

You can see why performing a wrap-up simulation would be useful in principle. You go through a
lot of twists and turns in simulating as you read or listen to a sentence, but sentences often convey
overriding ideas that you don’t want to jumble up. So once you’re certain you’ve come to a viable
interpretation of a sentence, perhaps a recap is in order. You could use wrap-up for a number of
purposes. For one, it might be something stable and reliable you can use to update your beliefs
about the world. You might not want to rely on the demonstrably ephemeral simulations that you
experience in the middle of a sentence; by contrast, the scene that a sentence ultimately describes
and that you construct in wrap-up might be more useful for this purpose. For the same reason, you
might prefer to use wrap-up (over the short-lived, ephemeral, and often misguided simulations
produced during a sentence) when preparing to act in response to an utterance. The appropriate
response generated in the middle of a sentence like There’s a bomb in my office… might not seem so
appropriate when you get to the end of the sentence …clip art collection.

Recap simulations might be particularly well suited for these purposes, not only because they’re
more likely to be accurate, but also for another reason. They can in principle last longer than
simulations performed during the course of sentence comprehension, because they’re less likely to
be updated or changed on the basis of subsequent input. Holding a stable and (likely) accurate
simulation firmly in mind for a while could be quite useful if you want to use your understanding
of an utterance for some purpose. Right now, the relation between what you do while you’re
processing a sentence and the mental simulations you perform subsequent to it is a ripe area for
future study.

Say it ain’t so

We’ve seen in this chapter that mental simulation in comprehension is dynamic – as you process
word after word of a sentence, you construct mental simulations of your current best guess of what’s
being described. But you’ve probably also had the experience of coming to the end of a sentence,
only to have to do more mental work to figure out what the sentence actually means. For instance,
suppose you’re hiring someone, and you’re scanning through a letter of recommendation for a
particular candidate, when you come upon a sentence like this: I can’t recommend this candidate any more
strongly. Your first impression might be that the recommender is offering their strongest support for
the candidate. That is, that she can’t recommend the candidate any more strongly because there are
no words strong enough to express how unfathomably amazing the candidate is. But upon further
reflection, you might be struck by the possibility that the recommender is actually saying that she
can’t recommend the candidate any more strongly because she doesn’t know much about the
candidate, or because there’s nothing good to say about him. The point here is that the process of
understanding a sentence continues to be dynamic, even after the last word is heard, or the last
punctuation mark read.

To bring this back to our puzzle analogy, assembling a puzzle can go through a similar sequence of
steps, if it depicts something complex. So imagine that you’ve got this puzzle, and you go ahead and
place all the pieces, guessing along the way about what the whole puzzle might be about. But you
can’t figure it out. Then, once you get to the end, you take a look at the assembled image, and it
looks like this:

Now you have to go through a series of simulations to figure out exactly what’s going on in the
image.

Some language makes you similarly go through a set of simulations after the sentence ends. And
nowhere is this more obvious than in negated language. But before we get into the details, a
word is in order about negation. So far, we’ve focused only on language describing purportedly
factual, concrete scenes. But every language in the world also provides means to describe events that
didn’t happen, won’t happen, or aren’t happening. English carries several sorts of negation in its
grammatical arsenal. An entire sentence can be negated like so: The shortstop didn’t hurl the softball at the
first baseman. Parts of a sentence can be individually negated: No shortstop hurled the softball at the first
baseman or The shortstop hurled not one softball at the first baseman, and so on.

Negation is of special interest to accounts of language understanding that use simulation, because it
is a particularly challenging case. That’s because even if there is convincing evidence that people
perform mental simulation when understanding language about things happening, like softball
players throwing things at each other, it’s not immediately obvious how the same process would
apply to language about things not happening, like softball players not throwing things at each other.

What – if anything – do people mentally simulate when they process negated sentences? One
promising idea is that negation is processed through several stages. The basic idea is that when
negated utterances are processed, the first scene to be simulated is the counterfactual scene – that is,
the scene described as not true. So, for instance, in processing Your birthday presents aren't on top of the
refrigerator, the counterfactual situation is the presents in fact being on top of the refrigerator. The
hypothesis is that an understander first simulates this scene. But this counterfactual simulation is
subsequently suppressed or modified, which leads to the second proposed stage in the course of
processing. This is the activation of the factual scene – the scene that the sentence is saying is actually
true. In our example, this would mean a simulation of the presents not on top of the refrigerator, but
somewhere else, as you can see represented schematically below:

time →

[sentence: Your birthday presents aren’t on top of the refrigerator]  →  [counterfactual simulation]  →  [factual simulation]

So, to be clear, the idea is that when you process negated language, you go through a two-step
process, where you first mentally simulate what’s not supposed to be true (the presents on the
fridge) and then simulate what is supposed to be true (the presents not on the fridge). How can we
tell if this is what people are actually doing? Let's begin with the counterfactual situation, which has
been suggested to be activated first during processing. In recent work, German psychologist Barbara
Kaup and her colleagues presented people with negated sentences, like There was not an eagle in the
sky.7 They used the same method described above in the study on Japanese: the sentences mentioned
entities (the eagle, for example) that could differ in their shape, and then 250msec later people saw a
picture that showed that same entity in one of the two shapes. In this case, an eagle in the sky would have
outstretched wings, while another image of an eagle would depict it with its wings folded against its
sides. The question was which image people would process faster. With affirmative sentences, the
outcome is clear – the sentence There was an eagle in the sky produces faster responses to the flying than the
sitting eagle. But the negated version of the sentence, There was not an eagle in the sky, explicitly states
that there is no such flying eagle. Which picture do people process faster in this case? The results
showed significantly faster responses to images when they matched the shape of the object implied
in the sentence's counterfactual situation – that is, of the flying eagle – than when they did not. In
other words, right after a negated sentence, you construct a mental simulation of the counterfactual
scene. If the sentence tells you that there was not an eagle in the sky, you nevertheless simulate a
flying eagle.

There’s other evidence pointing to this same conclusion that comes from a different methodology, a
semantic priming method.8 Rachel Giora and her colleagues had participants first read a sentence of
one of three types (all of which are below): an affirmative sentence containing a priming word (like
sharp below); a negated version of the same sentence; or an antonym sentence – an affirmative
sentence using an antonym of the priming word (like blunt). (These are actually translations of the
original sentences, which were all presented in Hebrew to Hebrew native speakers.)

        Affirmative       This instrument is sharp.
        Negated           This instrument is not sharp.
        Antonym           This instrument is blunt.

Subjects indicated they had understood each sentence by pressing a button, and 100msec later, they
saw a sequence of characters that either formed a word or did not. Their job was to decide if this
target word was a well-formed word in their language, so this was what’s called a lexical decision
task. In the critical trials, the target word was a real word that was either related to the priming word
(e.g. piercing) or unrelated to it (e.g. glowing). What Giora and her colleagues found was that
affirmative sentences led to faster responses to related target words than to unrelated target words.
This isn’t surprising – we know from lots of studies on semantic priming that words are processed
faster after other words that have related meanings. Interestingly, though, the negated sentences did
the same thing, suggesting that even when a word appears in a negated sentence context, it still
activates mental representations of its meaning. And what’s more, we know that this priming in
negated sentences is not due to the factual situation (where the instrument is blunt) being activated,
because in the antonym condition, the related words were not responded to any faster than the
unrelated words were. That is, just activating the idea of bluntness does not prime the word piercing,
but not sharp does prime the word piercing. This suggests that the effect in the Negated condition did
not result from activation of the factual situation, but of the counterfactual situation.

Now, you might be skeptical, and think that this finding alone doesn’t necessarily mean that people
are performing mental simulations when they process negated sentences. It could just be that the word
sharp makes you respond faster to the word piercing because their meanings are related, regardless of
what mental simulation you do or don’t perform. And you would be quite justified in thinking this.
But it does at least jibe with the previous study – the one with the eagle not in the sky – in showing
that in negated sentences, the counterfactual situation is indeed accessed right after the sentence is
processed.

So these different studies both give us evidence for the first part of the hypothesized two-step process
– negated sentences activate mental simulations of the counterfactual situation. Is the factual
situation mentally simulated next? One way to tell what happens at different stages in processing is
to probe people’s mental simulation at different timepoints. The two studies just mentioned
presented the image 250msec and 100msec after the end of the sentences, respectively. But if you
wanted to know what was happening later, you might present the image 750 or 1500msec after the
end of the sentence. This is what Kaup and her colleagues did.9 They made sentences, like those
below, again mentioning objects that could be in different states.

They again had people read the sentences, then push a button to indicate they understood the
sentence. This button press then triggered the presentation of an image on the screen – of the object
in one of those two states. But they introduced a delay of 750 or 1500msec before each picture.
What they found was exactly in keeping with the two-step account. Below, you see how long it took
people to respond to the picture, saying that it had been mentioned in the sentence. On the far left is
the outcome from Kaup’s original study, where the delay before the image appeared was 250msec.
As you can see, people responded faster to the counterfactual image. But at the 750msec delay, there
was no difference between the factual and counterfactual images. And by 1500msec, the trend had
reversed itself – people were now faster to say that the factual image had been mentioned than the
counterfactual one. This is pretty compelling evidence that when you process a negated sentence,
you first construct a mental simulation of the counterfactual scene, and then move on to the factual
one.

So understanding negated language is dynamic – the simulations you construct to understand it
change over time. And both parts of the process can be seen as serving specific functions. The
factual component is in a way the most important, because it encodes the critical information that
allows the understander to act appropriately. If someone tells you that Your present isn’t on the
refrigerator (and if you believe them) you won't expect it to be there and won’t look there for it. You
might instead look for it elsewhere, or ask the speaker Well, where is it then?, reflecting your belief that
it isn't in fact on top of the refrigerator.

So if the factual simulation does all this work, what good is the counterfactual simulation? Here are a
couple of ideas. First, it could be part of the process that allows you to determine what the factual
situation actually is in the first place. By phasing out your simulation of the counterfactual scene (for
instance, the present atop the fridge), you can ensure that what you end up building as your factual
scene is an accurate representation of what the speaker intended. Namely, it has to lack
whatever properties of the counterfactual scene were negated. Second, by going through the
counterfactual phase, you actually take a particular type of trajectory in understanding an utterance
that is different from what you would do if you just heard an affirmative sentence describing the
same factual scene. Compare the umbrella isn’t open with the umbrella is closed. You might end up with
the same factual simulation in processing these two sentences (of a closed umbrella), but you will
have taken different paths to get there. And just as it makes a difference whether you cheated your
way to a 1600 on the SATs, or whether you studied assiduously for months to produce the same
outcome, so it might make a difference what the cognitive path is that you traverse to get to a factual
simulation. The experience you have of simulating and then rejecting a counterfactual simulation
might be part of what it is to feel like you understand a negated utterance.

Meaning on the run

There’s actually something quite profound about this idea – that the process of understanding might
be as important as the outcome. And you can see this not just in negated sentences, but pervasively
throughout language.

Take an example sentence like this:

        My sister was a track star.

You can say it in different ways, with different intonation to make different points. (Intonation is the
tune, emphasis, and speed of an utterance.) For instance, if you’re saying it to explain how your
sister could have possibly just gone out and run a marathon, right after giving birth to her second
child, you might put emphasis on track star, so the sentence sounds like this:

        My sister was a TRACK STAR.

But you can say the sentence in other ways. Suppose your sister was indeed a track star in the past,
but has since taken to sitting on the couch and watching daytime television, and her friend is asking
you to confirm that your sister was indeed at one point the athlete she claims she was. Then you
might say:

        My sister WAS a track star.

What’s the difference in how people will understand these two versions of the same sentence? One
idea, that a graduate student working with me has been pursuing, is that when you emphasize was, it
tells the understander that the thing the sentence describes used to be true, but isn’t anymore. It
could be, and her dissertation research has begun to show this, that sentences like this are like
negated sentences, in that they drive you to go through several phases of simulation. Instead of a
counterfactual and then a factual simulation, though, sentences like My sister WAS a track star lead
you through a simulation of the formerly true scenario (the sister being a track star), followed by a
simulation of the implied, different one (her no longer being a track star). This contrasts with the
one-stage process you might use to understand the more neutral My sister was a TRACK STAR,
where you simulate only the sister as a track star – with this bland intonation, there’s no implication
that the sister has stopped being a track star, so no need for a modified second phase of simulation.

Or consider hyperbole. Suppose your son tells you:

        I’m so hungry I could eat a goat.

He probably doesn’t really mean that he wants you to prepare him a meal with the substantial caloric
value of an entire adult goat. No, he just wants you to infer that he’s very hungry. How do you get to
this conclusion? Well, again, it could be that you pass through a phase in which you do simulate the
described, but unrealistic, scene of your son going to town on a goat carcass, after which point you
come to the more sensible conclusion that your son is hungry, and perform an appropriately
better-mannered simulation.

And it’s not just hyperbole; many other sorts of figurative language offer themselves as excellent
candidates for multi-phase simulation. You might recall metonymy, for example, from Chapter 6.
This is where you use a word that denotes something to refer to something else related to it. For
instance, you can refer to a job candidate as red hat because he was wearing a red hat at the interview.
When you use metonymy like this, you are probably leading people through a process of simulation
that first evokes a mental simulation of the red hat, followed by the rest of the person underneath it.
If you call him something else, say freckle-face, then your simulation will follow a different course.
Even though red hat and freckle-face both lead you to identify the same person, the course of
simulation that you traverse to get there might be quite different.

This dynamicity might be quite important for understanding how meaning works. Note that we’re
stepping here away from what the experimental research itself says, and into a more reflective
mode. But as we’ve seen in this chapter, understanding is dynamic, in at least two ways. First, as you
hear or read words, and parse them incrementally, you construct mental simulations of your current
best guess of what they contribute to the simulation. In puzzle terms, with each new piece, you
daydream about what you now see in the puzzle. And second, some language makes you go through
a series of steps during simulation; for instance, negated sentences lead you to mentally simulate first
a counterfactual and then a factual scene. That is, when you finally put some puzzles together, you
need to go through several steps to understand them. Sometimes the assembled puzzle just makes a
new puzzle.

But this is quite different from how we typically think of meaning. Usually, when we want to know
about meaning, we ask things like what does X mean? and expect the answer to be a description of a
thing or an event, or an idea, as if there were a description that could capture what a word or phrase
or utterance means. I suspect that’s how most people who have never formally studied meaning see
things. But professionals who develop technical theories of meaning almost to a person make the
same assumption. Linguists try to come up with the categories or logical symbols or descriptions
that can aptly describe what it is that words and utterances mean.

But the thing or event or idea that language refers to doesn’t quite capture the entirety of what goes
on when people understand that word or phrase or sentence. The ways that you understand freckle-face
and red hat – the processes that you go through to get from the words to a mental simulation of the
person referred to – are different. It’s true that you can come to the same endstate in different
ways – the same puzzle can be cut into different pieces. But you have different experiences of
getting there. This means that your subjective experience will be different – in each case, you might
experience a simulation of the guy having the particular properties that he’s being referred to in terms
of. You might be more likely later to remember him having those properties. And you might make
different inferences based on those properties (like about his race, or the weather, or what college he
went to).

In other words, maybe what does X mean is the wrong question, or at best maybe it’s only part of the
question. Perhaps the real question is what are the understanding processes that X invokes? Or, to return to
the puzzle analogy, it might be that how you ultimately interpret what the puzzle looks like is only
part of the story. How you put the pieces together, and what steps you go through once you’ve
assembled it may be equally important. And when you think about it, if the picture on the front were
the only interesting part about a puzzle, then you’d never bother to open the box in the first place.

          1. Trueswell, Tanenhaus, & Garnsey (1994)
          2. Altmann & Kamide (1999)
          3. Zwaan & Taylor (2006)
          4. Taylor et al. (2008)
          5. Zwaan et al. (2002)
          6. Townsend & Bever (2001), Zwaan & Taylor (2006)
          7. Kaup, Yaxley, Madden, Zwaan, & Lüdtke (2007)
          8. Giora et al. (2004)
          9. Kaup, Lüdtke, & Zwaan (2006)
