Action selection based on multi-modal anchoring

2008-07 till 2011-12
Research Areas: 

The project aims to develop a computational theory of the formation and multi-modal perceptual anchoring of symbolic concepts. This is explored first in experiments with humans, which take a step towards the ontology of semantics and provide insights into pre-linguistic meaning processing in both infants and adults. The consequences are then explored in a computational model that uses cross-modal feature mappings for a multi-modal anchoring approach on a robotic platform.

Methods and Research Questions: 

Humans are adept perceivers of the cross-modal information that is ubiquitous in everyday life. To some extent, the formation of symbolic concepts and their multi-modal anchoring appear to be genetically predetermined. However, there is also evidence that these concepts develop over a certain time span. Both assumptions seem to account for color and tone semantics. Warning signals are pervasive and essential in everyday life. Nevertheless, little is known about the origins of the processing priority of certain low-level cues, such as colors and tones, that are often inherent in warning signals and, moreover, often occur in auditory-visual combination: is there a predisposition to perceive red as more cautionary than green, and to experience high tones as more of a warning than low tones? The project addresses the questions of (1) whether there is such a connection and (2) whether this interconnection is innate or environmentally shaped during a particular time span.

On the basis of this psychological research, semantic factors are emphasized as crucial for cross-modal anchoring. In computational approaches, however, the role of predispositions with regard to certain semiotic congruencies has mostly been ignored or dealt with in a very heuristic and class-specific manner. Here, we systematically explore architectural approaches that are able to exploit semantic factors and go beyond the use of typical amodal cues such as synchrony and spatial localization. This enables a (robotic) agent to cope with more complex situations in a changing and dynamic environment, resulting in improved grounding capabilities in multi-modal perception.

An adult questionnaire study was designed to systematically show both the specificity and the universality of emotional and semantic concepts that are related to colors and tone pitch. The particular aim of this study was to investigate cross-modal interconnections (1). Subjects evaluated four colors (red, green, blue, yellow) and four tone pitches (475 Hz, 700 Hz, 1500 Hz, and 2500 Hz) with regard to the emotions and associations they elicited. In a second step, a series of infant habituation and preferential looking studies is conducted to determine whether infants discriminate between different color-tone connections (2), mirroring the semantic congruencies reported by adults. This would imply a predisposition to perceive these semantics from early on.

So far, semantics play only a minor role in the development of artificial systems for sensor fusion. Hence, different machine learning approaches are evaluated with regard to their use for cross-modal integration. The robotic platform BIRON serves as an artificial agent that provides different sensory modalities, including laser range information as well as visual and auditory data. The anchoring system is designed on the basis of a memory architecture that allows the application of online-learning methods and, thereby, the adaptation from very general feature mappings to very specific ones.


In the adult questionnaire study, we found that red evoked negative emotions and associations similar to those evoked by high pitch, whereas green elicited positive emotions and associations similar to those elicited by low pitch. These results suggest overlapping, congruent psychological qualities for red and high pitch on the one hand, and for green and low pitch on the other.

First results from a series of infant habituation and preferential looking studies provide evidence that 4-month-old infants discriminate a green/low-tone event after having habituated to a red/high-tone event. This indicates that 4-month-olds form a cross-modal connection between tone and color at an earlier age than has been investigated so far. We suggest that this formation might be based on a facilitating factor inherent in the colors and tones, probably in the sense of adult semiotics.

On the basis of intersensory synchrony, a computational approach is being developed that uses previously detected correspondences to learn new dependencies between information from different sensory modalities. This has been shown to facilitate correspondence detection in ambiguous situations where intersensory synchrony alone is not sufficient.
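
The idea of bootstrapping from synchrony can be illustrated with a minimal sketch. It is not the project's implementation: the class name, the event format (dicts with an onset time "t" and a symbolic "feature"), the fixed time window, and the simple co-occurrence counting are all assumptions made for illustration. Unambiguous synchronous pairs are used to learn feature associations, and these learned associations are consulted only when several auditory events fall into the same synchrony window.

```python
from collections import defaultdict


class SynchronyBootstrappedMatcher:
    """Hypothetical sketch: learn cross-modal feature associations from
    unambiguous synchronous events and reuse them when synchrony is ambiguous."""

    def __init__(self, window=0.2):
        # Maximum onset difference (seconds) for two events to count as synchronous.
        self.window = window
        # Co-occurrence counts for (visual feature, auditory feature) pairs.
        self.counts = defaultdict(int)

    def _synchronous(self, visual, audio_events):
        # Keep only auditory events whose onset lies within the window.
        return [a for a in audio_events
                if abs(a["t"] - visual["t"]) <= self.window]

    def match(self, visual, audio_events):
        candidates = self._synchronous(visual, audio_events)
        if not candidates:
            return None  # nothing synchronous, no correspondence
        if len(candidates) == 1:
            # Synchrony alone is sufficient: accept and learn from this pair.
            chosen = candidates[0]
            self.counts[(visual["feature"], chosen["feature"])] += 1
            return chosen
        # Ambiguous case: prefer the candidate with the strongest learned
        # association to the visual feature (ties resolve to the first one).
        return max(
            candidates,
            key=lambda a: self.counts.get((visual["feature"], a["feature"]), 0),
        )


if __name__ == "__main__":
    matcher = SynchronyBootstrappedMatcher(window=0.2)
    # Three unambiguous red/high-tone events build up an association.
    for i in range(3):
        matcher.match({"t": float(i), "feature": "red"},
                      [{"t": i + 0.05, "feature": "high"}])
    # Two tones are now synchronous with a red event; the learned
    # association is used to pick one of them.
    result = matcher.match(
        {"t": 10.0, "feature": "red"},
        [{"t": 10.1, "feature": "low"}, {"t": 10.05, "feature": "high"}],
    )
    print(result["feature"])
```

In this sketch, learning happens only on unambiguous pairs so that uncertain matches do not reinforce themselves; an online-learning system could instead update with a confidence weight.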