Cognitive Robotics with a focus on cognitive vision

Lecture
Date: 11 October 2017
Begin time: 09:00
End time: 16:40
Room: CITEC, 1.204

Mario Fritz

Towards Scalable and Holistic Learning and Inference

Abstract:

With the advance of new sensor technology and abundant data resources, machines can get a detailed “picture” of the real world, unlike ever before. The previously wide gap between these raw data sources and human semantic understanding is starting to close. Driven by big data, increased compute power, and advances in machine learning, we see a new generation of systems emerging that achieve new levels of performance on a range of competences such as visual scene understanding and natural language comprehension, as well as robotic control.

In this talk, I will first elaborate on how we evaluate and advance such computational intelligence with a modern approach to the Turing Test, in which the task is to answer natural language questions about images. Second, I will outline my recent work on deep learning that spans application domains from computer vision through HCI to robotics. Lastly, I will discuss the privacy implications that these new learning techniques have for the individual when deployed in future intelligent systems.
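To make the question-answering evaluation concrete: answers in such benchmarks are often scored not by exact string match but by a soft lexical-similarity measure such as WUPS (a Wu-Palmer-similarity-based score). The Python sketch below is a minimal illustration of that idea, assuming NLTK with WordNet data installed and single-word answers; the 0.9 threshold and 0.1 down-weighting follow the commonly cited WUPS(0.9) variant, and the full multi-word formulation is omitted for brevity.

```python
# Hedged sketch of a WUPS-style soft accuracy for visual question answering.
# Assumes NLTK with WordNet data installed and single-word noun answers.
from nltk.corpus import wordnet as wn

def wup(a: str, b: str) -> float:
    """Max Wu-Palmer similarity over all noun synset pairs of two words."""
    best = 0.0
    for sa in wn.synsets(a, pos=wn.NOUN):
        for sb in wn.synsets(b, pos=wn.NOUN):
            best = max(best, sa.wup_similarity(sb) or 0.0)
    return best

def wups_score(pred: str, truth: str, threshold: float = 0.9) -> float:
    """Soft match: similarities below the threshold are scaled down by 0.1."""
    s = wup(pred, truth)
    return s if s >= threshold else 0.1 * s
```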

Torsten Sattler

The Semantics of Visual Localization and Mapping

Abstract:

3D scene perception is a key ability for robots, as well as for any type of intelligent system designed to operate in the real world. Among 3D scene perception algorithms, methods for 3D mapping reconstruct 3D models of scenes from camera images. Visual localization techniques in turn use these maps to determine the position and orientation of one or more cameras in the world. Visual localization and mapping are thus fundamental problems that need to be solved reliably and robustly in order to enable autonomous agents such as self-driving cars or drones. At the same time, localization and mapping algorithms are key technologies for Mixed and Augmented Reality applications.
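As a concrete illustration of the localization step described above (not the speaker's specific method), the following Python sketch estimates a camera's 6-DoF pose from 2D-3D correspondences with OpenCV's PnP-plus-RANSAC solver. The point arrays and intrinsics are placeholder values standing in for real feature matches against a 3D map.

```python
# Minimal sketch of visual localization: given 2D-3D correspondences between
# image keypoints and map points, estimate the camera pose with PnP + RANSAC.
import numpy as np
import cv2

points_3d = np.random.rand(100, 3).astype(np.float32)  # map points (placeholder)
points_2d = np.random.rand(100, 2).astype(np.float32)  # matched image keypoints
K = np.array([[800, 0, 320],                            # assumed pinhole intrinsics
              [0, 800, 240],
              [0,   0,   1]], dtype=np.float32)

ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    points_3d, points_2d, K, None,
    iterationsCount=1000, reprojectionError=8.0)
if ok:
    R, _ = cv2.Rodrigues(rvec)   # rotation matrix from axis-angle
    cam_pos = -R.T @ tvec        # camera center in world coordinates
```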

Over the last decades, tremendous progress has been made in the area of 3D Computer Vision, including impressive results for localization and mapping. Still, localization and mapping techniques can be rather brittle in challenging scenarios that are highly relevant for practical applications. This talk gives an overview of these challenges and explains how a higher-level understanding of the environment can help to solve some of them. In particular, I will present algorithms for localization and 3D reconstruction that rely on semantic information. This higher level of abstraction allows them to succeed under challenging conditions that could not be handled by previous work relying on purely photometric or geometric cues. I will then outline how these techniques can be extended to tackle a certain family of open problems. I will finally conclude the talk with a set of examples showing that algorithms for 3D scene perception will need to become even “smarter” in order to allow complex scene interactions for robots and other types of intelligent systems.
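One simple way semantic information can harden localization, sketched below in Python, is to reject 2D-3D matches whose semantic labels disagree before running pose estimation, so that, for example, a keypoint on vegetation cannot match a map point on a building. This is a hedged illustration of the general idea, not the speaker's exact algorithm; the label arrays are assumed to come from a semantic segmentation of the query image and a semantically annotated map.

```python
def filter_matches_by_semantics(matches, query_labels, map_labels):
    """Keep only 2D-3D matches whose semantic classes agree.

    matches:      iterable of (keypoint_index, map_point_index) pairs
    query_labels: semantic class per query keypoint (e.g., from a CNN)
    map_labels:   semantic class per 3D map point
    """
    return [(i, j) for (i, j) in matches
            if query_labels[i] == map_labels[j]]
```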

Jörg Stückler

Dense and Direct Methods for Robotic Perception and Learning

Abstract:

Intelligent robots require 3D and semantic scene understanding in order to purposefully act in their environment. I present efficient dense methods for object perception, simultaneous localization and mapping (SLAM), and semantic mapping based on multi-resolution surfel maps. These maps are an intermediate dense 3D representation for RGB-D and range measurements designed for efficient processing on CPUs. Direct visual approaches avoid the extraction of hand-designed intermediate features and enable fast parallel processing of images. I present examples of direct methods that achieve SLAM with RGB-D cameras in indoor environments and with stereo cameras in street scenes in real time. Several of the approaches have been demonstrated in robotic systems such as multicopters that navigate autonomously or personal robots that manipulate objects in everyday environments. Direct dense methods are also ideally suited to incorporating multi-view geometry principles into deep learning. I will detail some of our recent work on such learning approaches for perception, and outline future research directions towards end-to-end robotic learning that couples perception and action.
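To make the "direct" aspect concrete: direct methods align images by minimizing a photometric error rather than by matching extracted features. The Python sketch below shows, under simplifying assumptions (pinhole camera, known reference depth, brightness constancy, nearest-neighbor intensity lookup), the residual such methods minimize; real systems like those discussed in the talk solve for the pose with Gauss-Newton over image pyramids rather than merely evaluating the cost.

```python
# Hedged sketch of the photometric residual minimized by direct methods:
# r(u) = I_ref(u) - I_cur(project(R * backproject(u, d(u)) + t)).
import numpy as np

def photometric_residuals(I_ref, I_cur, depth_ref, K, R, t):
    """Per-pixel intensity errors between a reference frame and a warped view."""
    h, w = I_ref.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    res = []
    for v in range(h):
        for u in range(w):
            d = depth_ref[v, u]
            if d <= 0:
                continue                                   # no valid depth
            p = np.array([(u - cx) * d / fx,
                          (v - cy) * d / fy, d])           # back-project pixel
            q = R @ p + t                                  # rigid-body transform
            if q[2] <= 0:
                continue                                   # behind the camera
            u2 = fx * q[0] / q[2] + cx                     # re-project into I_cur
            v2 = fy * q[1] / q[2] + cy
            if 0 <= int(u2) < w and 0 <= int(v2) < h:
                res.append(I_ref[v, u] - I_cur[int(v2), int(u2)])
    return np.array(res)
```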