Spatial Language and Attention

Research Areas: 

This project investigates how people deploy attention during spatial language comprehension. The ‘visual world’ account suggests that people look at objects as they are mentioned in a description (Tanenhaus et al., 1995), whereas spatial language accounts (e.g., the Attention Vector Sum model, AVS; Regier & Carlson, 2001) claim that understanding a spatial description requires a shift of attention from the reference object to the located object. The results suggest that people need both mechanisms in order to understand spatial descriptions.


Methods and Research Questions: 

It is well established that spatial language processing requires attentional mechanisms (Carlson & Logan, 2005; Logan, 1994), but how precisely people deploy visual attention during real-time spatial language comprehension is still unclear. The Attention Vector Sum (AVS) model postulates that, to comprehend a spatial description such as “The clock is above the vase”, people must shift their attention from the vase (the ‘reference object’) to the clock (the ‘located object’; e.g., Carlson-Radvansky & Irwin, 1994; Carlson & Logan, 2005; Regier & Carlson, 2001). An alternative account from ‘visual world’ studies suggests that people incrementally inspect objects as they are mentioned. Given the description “The clock is above the vase”, the clock should thus be inspected first, followed by the vase (e.g., Tanenhaus et al., 1995). In sum, these two accounts predict opposing inspection orders, although only the visual world account (and not the AVS model) specifies the time course of visual attention allocation. We examine gaze patterns to objects during spatial language comprehension and evaluate their fit against the predictions of the AVS (reference object -> located object) and visual world (located object -> reference object) accounts.
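The opposing predictions can be made concrete with a small sketch. It is only an illustration, not the analysis used in the project: the function name, the object labels, and the definition of an inspection as a run of consecutive fixations on the same object are all assumptions introduced here.

```python
def first_inspection_order(fixated_objects):
    """Classify a gaze sequence by which mentioned object is inspected first.

    fixated_objects: sequence of labels ('located', 'reference', 'competitor'),
    one per fixation, in temporal order. Labels are hypothetical.
    """
    # Assumption: an "inspection" is a run of consecutive fixations on one object.
    inspections = [obj for i, obj in enumerate(fixated_objects)
                   if i == 0 or obj != fixated_objects[i - 1]]
    try:
        first_located = inspections.index("located")
        first_reference = inspections.index("reference")
    except ValueError:
        # One of the two mentioned objects was never inspected.
        return "undetermined"
    # AVS order: reference -> located; visual world order: located -> reference.
    return "AVS" if first_reference < first_located else "visual world"


print(first_inspection_order(["located", "located", "reference"]))
print(first_inspection_order(["reference", "competitor", "located"]))
```

The first call follows the order of mention (located object, then reference object), and the second follows the reference-to-located shift. Such a classifier ignores timing, which matters here because only the visual world account makes time-course predictions.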

The sentences were in German and had the following format: “The [located object] is [spatial preposition] the [reference object]”, where the spatial preposition could be über (‘above’) or unter (‘under’). Each scene included three objects. The located object and the reference object were the two objects mentioned in the sentence, while the competitor object was the third object, which was not mentioned. The object names in all of the pairs were controlled for number of syllables, article gender, and frequency. The objects were pictures of real objects resized to 300 x 300 pixels and shown on a white background. The objects in each triplet had no functional relations to one another. The design crossed 2 spatial prepositions (über vs. unter) with 2 sentence values (match vs. mismatch). Participants were asked to decide as quickly and accurately as possible whether or not the sentence matched the picture while their eye movements were recorded. At the beginning of the experiment a fixation point appeared in the middle of the screen for 1500 ms, after which the picture and the sentence were presented. An ISI of 2500 ms ended the trial.
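As a minimal sketch, the 2 x 2 design and the timing values stated above can be written out as follows. The variable names are hypothetical and chosen for illustration; nothing here reflects the software actually used to run the experiment.

```python
from itertools import product

# Factors and timing values as stated in the design description.
PREPOSITIONS = ("über", "unter")          # 'above', 'under'
SENTENCE_VALUES = ("match", "mismatch")
FIXATION_POINT_MS = 1500
ISI_MS = 2500

# Fully crossed design: every preposition with every sentence value.
conditions = [
    {"preposition": prep, "sentence_value": value}
    for prep, value in product(PREPOSITIONS, SENTENCE_VALUES)
]

for condition in conditions:
    print(condition)
```

Crossing the two factors yields four conditions, one per combination of preposition and sentence value.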


We analyzed the first three fixations and the first three inspections after people had heard the spatial term (e.g., über ‘above’) and had fixated the reference object. Analyses of the first three fixations after the offset of the spatial preposition, and after a fixation to the reference object, showed that people fixate the reference object more often than the located object. This fixation pattern indicates that people establish reference between the spoken sentence and the scene, corroborating the visual world account. The fixation pattern does not contradict the AVS model, however, since the model does not specify when, after the spatial preposition has been processed, the shift from the reference object to the located object should occur. Indeed, analyses of inspections suggest that people fixate the located object after inspecting the reference object and before deciding whether the description matches the scene. The distribution of fixations during the second noun phrase confirms this view. These latter analyses support the Attention Vector Sum account and provide evidence against the visual world account.
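The selection of the analysis window can be sketched as follows. This is one possible reading of the procedure, not the project's actual analysis code. It assumes that the window starts after the first fixation on the reference object following preposition offset, and the function name, labels, and data layout are hypothetical.

```python
def fixations_after_reference(fixations, preposition_offset_ms, n=3):
    """Return the objects of the first n fixations in the analysis window.

    fixations: list of (onset_ms, object_label) tuples sorted by onset.
    Assumption: the window opens after the first fixation on the reference
    object that begins at or after the offset of the spatial preposition.
    """
    after_offset = [(t, obj) for t, obj in fixations
                    if t >= preposition_offset_ms]
    for i, (_, obj) in enumerate(after_offset):
        if obj == "reference":
            # Keep the next n fixations following that reference fixation.
            return [o for _, o in after_offset[i + 1:i + 1 + n]]
    return []  # reference object never fixated after preposition offset


# Made-up trial for illustration only (not real data).
trial = [(100, "located"), (400, "reference"), (900, "reference"),
         (1200, "reference"), (1500, "located"), (1800, "competitor")]
print(fixations_after_reference(trial, preposition_offset_ms=800))
```

Counting how often each object label appears in the returned lists across trials would give the kind of comparison described above between fixations on the reference object and on the located object.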