Prediction and embodiment in dialogue
Martin Pickering
University of Edinburgh

Embodiment
• Many researchers assume that cognition is “embodied” (or “grounded”) rather than “abstract” (e.g., Barsalou, 2008)
  – Activates representations associated with the body and actions
• Much of this work argues that language is embodied (e.g., Barsalou, 2008; Glenberg, 2008; Zwaan & Taylor, 2006)
• Similar claims come from experimental psychology and from neuroscience (e.g., Gallese, 2008; Pulvermüller, 2005)

Embodiment of form vs. meaning
• Embodiment of meaning
  – cf. simulation at the content level (Gallese, 2008)
  – Processing (producing or comprehending) walk involves the use of representations involved in the act of walking
  – Provides a component of meaning (no need to make the “strong” claim that all meaning is embodied)
• Embodiment of form
  – cf. simulation at the vehicle level (Gallese, 2008)
  – Comprehending language involves the use of representations involved in the act of producing language
    • Definitionally true for producing language (of course)

Embodiment of meaning
• People’s representations of scene descriptions incorporate spatial perspective (e.g., Bransford & Johnson, 1973)
• Sensibility judgements for a sentence involving movement (close the drawer) are faster if the response involves movement in the same direction (away from the body) than in the opposite direction (towards the body; Glenberg & Kaschak, 2002)
• Participants turned a knob to present sentences a word at a time.
  They were faster to read words (turned down) when they turned the knob in the direction implied by the words (anticlockwise; Zwaan & Taylor, 2006)
• An MEG study showed activation of appropriate motor areas within 170 ms (“foot area” for kick, “mouth area” for eat; Pulvermüller et al., 2005)
  – Compatible with speed of word recognition, hence seems to occur “on line”

Embodiment of form
• Listeners activate appropriate tongue/lip muscles while listening to speech but not non-speech (Fadiga et al., 1995; Watkins et al., 2003)
• Large overlap in cortical areas activated during speech and passive listening (Pulvermüller et al., 2006; Wilson et al., 2004)
• Activation of brain areas associated with production during aspects of comprehension from phonology (Heim et al., 2003) to narrative structure (Mar, 2004)

Effects of embodiment
• Consider effects on overt behaviour
• Effects of form and meaning
• Two kinds of effects
  – Overt imitation
  – Complementary responses

“Slap!”
• Meaning:
  – Overt imitation: produce act of slapping
  – Complementary response: act of flinching
• Form:
  – Overt imitation: utter “slap” as well
  – Complementary response: utter “his face”
  – Note that imitation corresponds to the action, and the complementary response corresponds to the immediate response to that action

Imitative and complementary activation of embodied meaning
• Already discussed standard “embodiment” evidence (Glenberg & Kaschak, 2002, etc.)
• But also complementary activation
  – Participants were presented with words referring to small or large objects, and this affected hand aperture in a subsequent grasping task (Glover et al., 2004)
  – Also evidence of complementary activation from neuroscience
    • e.g., words for graspable objects activating regions associated with grasping (see Martin & Chao, 2001)

Imitative and complementary activation of form
• Imitation: evidence from “alignment” effects in dialogue
  – Tendency to repeat words (Brennan & Clark, 1996), syntax (Branigan et al., 2000) – see below
  – Effects are extremely rapid (Fowler et al., 2003)

Alignment of syntax (Branigan et al., 2000)
[Figure: experimental setup. Participant and confederate (with confederate script), each with a box of cards to be described and a box of selected cards. Example card: GIVE, “The chef giving the jug to the swimmer”.]
Branigan et al., 2000, Cognition; Cleland & Pickering, 2003, JML

Procedure
• Confederate describes card: The chef giving the jug to the swimmer
• Participant selects the card that matches this description
• Participant picks up top card from her box: HAND
• Participant describes card: “The cowboy handing …”

Experiment
• Confederate says
  – Either The chef handing the cake to the swimmer
  – Or The chef handing the swimmer the cake
• Participant describes card (HAND)
  – Either The cowboy handing the banana to the burglar
  – Or The cowboy handing the burglar the banana

Same vs. different verb
• 4 prime conditions:
  – PO-same: The chef handing the cake to the swimmer
  – DO-same: The chef handing the swimmer the cake
  – PO-different: The chef giving the cake to the swimmer
  – DO-different: The chef giving the swimmer the cake
• The cowboy handing the banana to the burglar
[Figure: bar chart. y-axis: % participant says “banana to burglar” (0 to 100); x-axis: same verb vs. different verb; separate bars for confederate says “jug to swimmer” vs. “swimmer jug”.]

Alignment between languages
• Confederate says: El taxi persigue el camión (“The taxi chases the truck”)
  – Participant tends to say: The bullet hits the bottle
• Confederate says: El camión es perseguido por el taxi (“The truck is chased by the taxi”)
  – Participant tends to say: The bottle is hit by the bullet
• Interlocutors align on language-independent representations
  – facilitating rapid shifts between languages
• Relatedness between languages can enhance priming
  – when verbs have same meanings, when word orders are the same
Hartsuiker et al. (2004), Psychological Science; Schoonbaert et al. (2007), Journal of Memory and Language; Bernolet et al. (2007), JEP:LMC

Imitative and complementary activation of form
• Complementary activation
  – Addressees complete speakers’ contributions
    • A: and number 12 is, uh, … B: chair. (Clark & Wilkes-Gibbs, 1986)
  – People are faster at naming a word or picture after a syntactically compatible context than otherwise (Griffin & Bock, 1998; Tyler & Marslen-Wilson, 1977; Wright & Garrett, 1984)
    • As they glide gracefully over the city, flying kites ARE vs. IS
    • Are is complementary to kites (plural verb not noun)

Why?
• For both form and meaning embodiment, why do we sometimes get imitation and sometimes complementarity?
  – Functional explanation: presumably sometimes useful to imitate, sometimes useful to behave in a complementary fashion
  – Mechanistic explanation: suppression of one’s own responses after they occur (e.g., Dell, 1986; see Hartsuiker et al., 2005).
    Similarly, people may activate then if necessary suppress imitative responses
• But why do we get any of it at all?
  – One important purpose appears to be to aid prediction

Covert simulation
• Much evidence for motor involvement during perception
• Massive literature on mirror neuron system (e.g., activation of same neurons during behaviour and observation) that appears to be goal-directed (e.g., Rizzolatti, Gallese, Iacoboni)
• Interference between moving arm and watching other person’s arm movement but not robot arm movement (Kilner et al., 2003)
  – Encoding another’s movements using one’s own motor programs

Covert simulation in language comprehension
• Already suggested that people activate form and meaning representations during comprehension
  – Form: activation of tongue/lip muscles and speech-related areas during listening, etc.
  – Meaning: effects of motor tasks on comprehending action sentences, activation of motor areas during processing of action words

Simulation for prediction
• Why does such simulation occur?
  – For overt imitation?
    • But monkeys have mirror systems, yet don’t ape
    • Instead, overt imitation appears to be a consequence of covert simulation (and of course serves as evidence for covert simulation)
  – To aid action identification, understanding, and memory? (“postdictive” simulation)
    • clearly may occur (e.g., in rehearsal)
    • but perhaps not only purpose
  – Or to aid prediction?
    • Emerging view in cognition (e.g., Prinz, 2006; Wilson & Knoblich, 2005), development (e.g., Csibra, 2007), cognitive neuroscience (e.g., Frith, 2007), computational speech processing (Moore, 2008)
• When understanding language, such prediction could involve simulation of form or meaning

Prediction (Wilson & Knoblich, 2005)
• Prediction gets you “ahead of the game” but only if the target is sufficiently predictable. Two main types:
• Predictable physical movements
  – Including acceleration, rotation etc.
  – We can (fairly) reliably predict where objects will be ahead of time
• Predictable behaviour of other people
  – Again, we can (fairly) reliably predict some aspects of their behaviour

How do we predict other people?
• Experience observing others?
  – This is one possibility and clearly does occur
• Experience of our own behaviour?
  – Works if we are sufficiently like others (which we are in many respects)
  – Therefore use representations of own behaviour as proxy for others’ behaviour
  – Such simulation can be fast, because we have the relevant mechanisms in place (see below), and is arguably non-inferential
• Much evidence that people predict each other’s behaviour by working out “what would I do under these circumstances”
  – e.g., better at predicting outcomes of own behaviour (e.g., dart throwing) than others’ behaviour
  – Hard to explain all this evidence by prior perception of one’s own behaviours

Emulation (Grush, 2004)
• A forward model of an external system that runs simulations of that system in real time (Desmurget & Grafton, 2000; Wolpert, 2001)
  – Before moving your arm you model the path it should take
  – If it deviates, you correct accordingly
  – More rapid than feedback, and works in the absence of feedback
  – Motor system uses emulators extensively to determine if subsequent movements are correct
  – Presumably emulation is also used in monitoring language production (but our current interest is comprehension)

How might prediction occur?
• Perception leads to covert motor simulation
• Simulation drives emulators
• The perceptual system can use such emulators to make predictions when perceiving the behaviour of other people (Wilson & Knoblich, 2005)
  – Because their behaviours are largely the same as the perceiver’s
• Of course this is the case in language comprehension
  – So people can emulate using language production mechanisms
  – At different linguistic levels (words, grammar, meaning …)
  – Particularly strongly in predictable contexts (“high-cloze”)
• Some such emulation can relate to embodied meaning
  – Comprehenders predict the motor activation that would occur if they used those words

Prediction of form in comprehension of monologue
• DeLong et al. (2005):
  – The day was breezy so the boy went outside to fly a kite (predictable)
  – The day was breezy so the boy went outside to fly an airplane (unpredictable)
  – People predict kite and that it begins with a consonant: larger N400 on an than a
• Van Berkum et al. (2005): prediction of gender (in Dutch)
  – The safe … was situated behind a big …
  – Disrupted (reading time and N400) when big has wrong gender for painting.
• Anticipatory eye movements in scene perception (Altmann & Kamide, 1999)
• Prediction of grammar (e.g., Lau et al., 2007; Staub & Clifton, 2006)
• Predicting when others’ utterances are likely to end, based on the meaning of the utterance (de Ruiter et al., 2006)
• Pickering and Garrod (2007) proposed that the production system acts as an emulator during language comprehension
  – Emulator continually predicts the next element using the results of simulation at different levels (meaning, grammar, sound …)
  – Predictions depend on how constraining the context is at each level
  – Also emulation assists in dealing with noisy input (e.g., phoneme restoration effect)
• Will activate both the current word, grammar etc.
  – Potentially leading to overt imitation
• And the predicted word, grammar, etc.
  – Potentially leading to complementary responses
• Pickering and Garrod focused on prediction of form
  – With “meaning” not referring to embodied action representations
  – But similar emulation of motoric representations presumably occurs

Prediction of meaning in comprehension of monologue?
• Claim is that comprehenders should predict embodied meaning
  – e.g., effects such as Zwaan and Taylor (2006) should occur predictively (given strong context)
  – Not tested yet

[Figure: emulation during comprehension. Speech input unfolds in five steps:
  Step 1: Harry went out to fly his red …
  Step 2: Harry went out to fly his red f …
  Step 3: Harry went out to fly his red fl …
  Step 4: Harry went out to fly his red fla …
  Step 5: Harry went out to fly his red flag
Labelled components: language input, input analysis system, Kalman gain, production system (forward model), interpretation. Levels: phonology, syntax, semantics. Values shown include /f/, /fl/, /flæ/, /flæg/, /kait/, Noun, flyable and -flyable, with Ø for empty cells.]

• So far focused on monologue
• But emulation of form and meaning may be particularly useful for dialogue
  – And should lead to both imitative and complementary activation

Why is dialogue so easy?
• “Should” be harder than monologue
  – Dealing with changes on the fly
  – Can’t always plan ahead
  – Working out precisely when to speak and who to speak to
  – Produce and comprehend at same time (because of feedback)
  – Constant task-switching
• Pickering and Garrod (2004) – interlocutors “align” their mental states
  – Conversation is successful when interlocutors come to see the world in the same way
  – but how?

Why emulation is useful for dialogue I
• Dialogue involves regular switches between production and comprehension
  – Interlocutors take turns to take the floor
  – Addressee isn’t passive listener but provides “backchannel” feedback (assertions, queries, etc.)
    • Such feedback enhances quality of narratives (e.g., Bavelas et al., 2000)
• Thus production system is constantly activated during comprehension in dialogue

Why emulation is useful for dialogue II
• Addressee must be constantly prepared to respond, to make a contribution when appropriate (e.g., Sacks et al., 1974)
  – Sometimes a contribution is normatively required (e.g., when asked a non-rhetorical question)
  – Other times it is optional (e.g., after speaker finishes some statements)

And why it is effective
• Interlocutors align at many linguistic levels during dialogue (Pickering & Garrod, 2004)
  – For example, similar activation of words and grammar
  – In particular, their representations are more similar than those of non-interlocutors
• Hence, predictions are more likely to be accurate
  – If we are well-aligned, using my own representations as proxies for your representations is likely to be successful
• Dialogue is therefore a form of joint activity that is particularly likely to benefit from simulation and prediction

Emulation of embodied meaning in dialogue?
• Much dialogue involves interlocutors also interacting with environment
  – e.g., task-oriented dialogue
• Here, predictions about environment are especially useful
• Clark and Krych (2004) had a director instruct a builder to construct a LEGO model
  – When director could see workspace, she changed her language and timed her speech to fit with the builder’s actions
  – Appeared that the builder’s actions were treated as continuous feedback by the director
• Prediction of embodied meaning facilitates rapid perception or performance of the action
  – And therefore also helps make conversation easy and facilitates alignment
• May be other benefits of aligning embodied meaning
  – Essentially another level of alignment (beyond alignment of words, grammar etc.) that supports communicative success
  – Clearly involves “common coding”, as implicated in both production and comprehension

Interactive-Alignment Model
[Figure: interlocutors A and B, each with the same stack of linked levels: situation model, semantic representation, syntactic representation, lexical representation, phonological representation, phonetic representation.]

Key references
• Pickering, M.J., & Garrod, S. (2004). Toward a mechanistic psychology of dialogue. Behavioral and Brain Sciences, 27, 169-225.
• Garrod, S., & Pickering, M.J. (2004). Why is conversation so easy? Trends in Cognitive Sciences, 8, 8-11.
• Pickering, M.J., & Garrod, S. (2007). Do people use language production to make predictions during comprehension? Trends in Cognitive Sciences, 11, 105-110.
• Pickering, M.J., & Garrod, S. (in press). Prediction and embodiment in dialogue. European Journal of Social Psychology.
• Garrod, S., & Pickering, M.J. (in press). Joint action, interactive alignment, and dialogue. Topics in Cognitive Science.
• Pickering, M.J., & Garrod, S. (in press). The use of prediction to drive alignment in dialogue. In G. Semin & G. Echterhoff (Eds.), Grounding sociality: Neurons, minds, and culture. Hove: Psychology Press.
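
Illustrative sketch: emulation as a forward model
• The Emulation (Grush, 2004) slide describes modelling the path of a movement before making it and correcting when it deviates; the comprehension figure labels the correction step “Kalman gain”
• The Python sketch below is one minimal way to picture that loop, not the model from the talk. The class name Emulator, its fields, and the one-dimensional arm-position setting are assumptions made for illustration only

```python
from dataclasses import dataclass


@dataclass
class Emulator:
    """Hypothetical scalar emulator: a forward model plus Kalman-style correction."""

    estimate: float       # current estimate of (say) arm position
    variance: float       # uncertainty of that estimate
    process_noise: float  # uncertainty added by each predicted step
    sensor_noise: float   # variance of the feedback signal (assumed > 0)

    def predict(self, command: float) -> float:
        """Forward model: predict the next position from the motor command alone.

        No feedback is needed, so this step also works when feedback is absent.
        """
        self.estimate += command
        self.variance += self.process_noise
        return self.estimate

    def correct(self, observed: float) -> float:
        """Correct the prediction once feedback arrives; returns the gain used."""
        # The gain weights the prediction error by relative uncertainty:
        # near 1 when the prediction is unreliable, near 0 when feedback is noisy.
        gain = self.variance / (self.variance + self.sensor_noise)
        self.estimate += gain * (observed - self.estimate)
        self.variance *= 1.0 - gain
        return gain


# Usage: predict first, then correct when the observation deviates.
emulator = Emulator(estimate=0.0, variance=1.0, process_noise=0.0, sensor_noise=1.0)
predicted = emulator.predict(command=1.0)  # 1.0, available before any feedback
gain = emulator.correct(observed=3.0)      # gain 0.5, estimate moves to 2.0
```

• With equal uncertainty in prediction and feedback, the gain is 0.5, so the corrected estimate lies halfway between the predicted and observed positions
• In the language case the same loop would run at each level (phonology, syntax, semantics), with the production system supplying the prediction; that mapping is an analogy here and is not implemented in the sketch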