Unobtrusive Multi-Modal Situation-Based Augmented Reality
Augmented reality (AR) is the overlay of computer-generated objects onto the real world. Until now, AR has been confined to very specific scenarios in areas such as industry, aviation and the military, where cost and bulkiness are of secondary importance. This is changing rapidly, however, as several companies rush to release products aimed at the consumer market. While AR hardware is making quick progress, the software still struggles with many challenging problems, such as accurate tracking and realistic rendering, which are currently being heavily researched. Past research has, by comparison, somewhat neglected the usability of AR. Tampering with human sensory input is a delicate task and could even be dangerous in certain situations. It is therefore desirable for an AR system to perceive its environment and decide to what degree, and in which way, to augment the current reality. This capability is called context awareness.
To bring AR to the average person, we decided to use mobile phones as the underlying hardware platform wherever possible. Although phones still lack some of the computing power needed for modern AR, they provide a rich interaction platform that is already tied to everyday life, and they now come with various sensors that can be used to infer the current context. The sensors already found in many mobile phones include accelerometers, gyroscopes, magnetometers, photometers, thermometers, proximity sensors and, of course, microphones and cameras. Other components that are not usually regarded as sensors can also provide useful information for determining context, such as GPS, Wi-Fi, Bluetooth and FM radio receivers and clocks; even user input and other software states can serve this purpose.
One example of context awareness would be a phone that automatically silences an incoming call from work because it is evening and its owner is having a meal with his or her partner. Traditionally, such context inference has been very explicit, i.e., based on a fixed set of rules that the developer deemed sensible. This approach is inflexible and requires explicit modelling of every combination of sensors, which makes it hard to transfer rule sets from one device to another. Moreover, system designers cannot foresee all users' preferences, nor all the relevant contexts in which users might find themselves. We therefore want to build a system that learns from user input, so that users can mould it to their needs. This learning ability should also enable the system to make good use of whatever sensor data is available on a given platform, including "smart" environments that might provide additional sensor readings. However, we expect a purely learning-based approach to fail, because all that can reasonably be asked of a user is simple feedback such as "This disturbance was annoying." or "Blocking my view in that situation was not a good thing to do." Such sparse feedback is not enough for a learning algorithm to extract the relevant patterns from complex sensor data, and it is unreasonable to ask a user to reject a large number of unwelcome phone calls just to train a mobile phone to be unobtrusive. To overcome this problem, we decided to develop a hybrid approach in which symbolic rules serve as an interface for manipulating a dynamic subsymbolic mapping function. A few initial rules can kick-start the learning algorithm, and the user can add further rules later whenever guiding the phone's behaviour is particularly desirable. We are optimistic that this approach will strike a good balance between flexibility and ease of use.
However, developing algorithms that achieve these goals while meeting all the requirements above will be a considerable challenge.
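The hybrid idea of rules steering a learned mapping can be illustrated with a minimal sketch. The project description does not specify a learning algorithm, so the sketch assumes one: a kernel-weighted vote (a Nadaraya-Watson estimate) over stored examples stands in for the subsymbolic mapping function. Symbolic rules are compiled into heavily weighted pseudo-examples, and user feedback adds lightly weighted ones, so both act on the same mapping. The binary context features (`evening`, `at_table`, `caller_is_work`), the rule weight of 4, the kernel sharpness of 0.1 and the class names `Rule` and `HybridPolicy` are all hypothetical choices for illustration, not part of this project's system.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical context representation: each feature is a value in [0, 1]
# derived from sensors (e.g. clock -> "evening", camera/microphone -> "at_table").
Context = Dict[str, float]


@dataclass
class Rule:
    """Symbolic rule: whenever `condition` holds, the phone should (not) stay silent."""
    condition: Callable[[Context], bool]
    silence: bool
    weight: float = 4.0  # assumption: one rule counts as four feedback events


@dataclass
class HybridPolicy:
    """Kernel-weighted vote (Nadaraya-Watson estimate) of P(silence | context).

    Rules and user feedback both become weighted examples in one memory,
    so the symbolic layer only edits the data the subsymbolic model sees.
    """
    features: List[str]
    prior: float = 0.0          # default belief: do not silence
    prior_weight: float = 1.0
    sharpness: float = 0.1      # kernel = sharpness ** (L1 distance between contexts)
    examples: List[Tuple[Context, float, float]] = field(default_factory=list)

    def _kernel(self, a: Context, b: Context) -> float:
        dist = sum(abs(a.get(f, 0.0) - b.get(f, 0.0)) for f in self.features)
        return self.sharpness ** dist

    def prob_silence(self, ctx: Context) -> float:
        # Weighted average of example labels, pulled towards the prior.
        num = self.prior_weight * self.prior
        den = self.prior_weight
        for ex_ctx, label, weight in self.examples:
            k = weight * self._kernel(ctx, ex_ctx)
            num += k * label
            den += k
        return num / den

    def add_rule(self, rule: Rule) -> None:
        # Compile the rule into pseudo-examples: every binary context it covers.
        for values in itertools.product([0.0, 1.0], repeat=len(self.features)):
            ctx = dict(zip(self.features, values))
            if rule.condition(ctx):
                self.examples.append((ctx, float(rule.silence), rule.weight))

    def feedback(self, ctx: Context, should_have_silenced: bool) -> None:
        # Sparse user feedback, e.g. "This disturbance was annoying."
        self.examples.append((dict(ctx), float(should_have_silenced), 1.0))


if __name__ == "__main__":
    policy = HybridPolicy(features=["evening", "at_table", "caller_is_work"])
    # Initial rule: silence work calls during an evening meal.
    policy.add_rule(Rule(
        condition=lambda c: c["evening"] > 0.5 and c["at_table"] > 0.5
        and c["caller_is_work"] > 0.5,
        silence=True,
    ))
    dinner = {"evening": 1.0, "at_table": 1.0, "caller_is_work": 1.0}
    commute = {"evening": 1.0, "at_table": 0.0, "caller_is_work": 1.0}
    print(round(policy.prob_silence(dinner), 2))   # 0.8  -> silence
    print(round(policy.prob_silence(commute), 2))  # 0.29 -> ring
    # The user reports that a work call on the evening commute was annoying.
    policy.feedback(commute, should_have_silenced=True)
    print(round(policy.prob_silence(commute), 2))  # 0.58 -> silence
```

In this toy run a single rule settles the dinner case, and a single feedback event is enough to flip a nearby context because the rule's pseudo-example already pulls it towards silence. Weighting rules more heavily than individual feedback events is one possible way to let a few rules compensate for sparse feedback; the weights and the 0.5 decision threshold are tuning assumptions, not values from this project.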