Low latency technology for interactive virtual environments

  • Niedrig-Latenz-Technologie für interaktive virtuelle Umgebungen

Assenmacher, Ingo; Kuhlen, Torsten (Thesis advisor)

Aachen : Publikationsserver der RWTH Aachen University (2009)
Dissertation / PhD Thesis

Aachen, Techn. Hochsch., Diss., 2009


Minimizing system latency has traditionally been an important topic in the development of multi-modal Virtual Environments (VEs). Human perception thresholds have to be met in order to create immersive environments with a high degree of believability. The system latency has to be in the range of milliseconds, which calls for fast interfaces and low system overhead. This thesis provides a comprehensive approach to creating multi-modal VEs with strict low-latency requirements; abstract, flexible, yet real-time capable interfaces for device data handling; and versatile application support mechanisms. In that sense it offers a stable software and conceptual basis for the development of appealing multi-modal environments.
The "Virtueller Kopfhörer" (VirKopf, "virtual headphone") system is an example of a demanding multi-modal environment. It was developed as a joint research project between the Institute of Technical Acoustics and the VR Group at the Department of Computer Science at RWTH Aachen University. It features binaural acoustics, which allows virtual sounds to be placed at arbitrary 3-D positions within the scene, even very close to the user's head. Headphone-less reproduction is supported by dynamic crosstalk cancellation (CTC). The system is designed for immersive CAVE-like environments. The price of this comprehensive system is that the requirements for a precise setup and accurate data processing have to be met very carefully. For example, delivering correct tracking data with low latency is crucial for the successful application of the dynamic CTC. CTC creates a sweet spot that provides a correct sound field impression for the user. In a dynamic system, where the user is free to move arbitrarily, this sweet spot is constantly updated to the current position of the user's ears, which in turn is determined by a tracking device.
Because processing is discrete, a misalignment between the assumed and the real position of the user's ears can occur. A misalignment of more than 1 cm between these positions is enough to cause audible artefacts for the listener, disrupting the 3-D impression of the auralized scene. This is a severe constraint, since in practice the propagation time of the sound waves from the loudspeakers to the user's ears can amount to several milliseconds, which faster tracking hardware cannot compensate for. Predictive tracking can be used to estimate a future position of the user's ears based on observations from the past. However, these algorithms cannot forecast arbitrarily far into the future, so low-latency system support is a mandatory precondition for their successful application. Low-latency processing is not only important for the VirKopf system, but is a general requirement for VR software, especially for device and interaction handling. A versatile, flexible and runtime-optimal VR device driver architecture is introduced. This architecture enables parallel low-latency access to multi-modal data streams and, because it supports driver-level histories, enhanced interaction algorithms. Additionally, the architecture suggests enhanced transformation and application stages which simplify application development in the field of VR. The resulting misalignment of the estimate of the user's head position in the virtual scene is reduced by an adaptive predictive tracking algorithm. The suggested solution features an on-line update strategy based solely on the local development of the tracking sensor's velocity. The coupling of a visual VR system with its acoustic counterpart is defined as a network communication architecture, and its capabilities are explained. The end-to-end latency cost of this audio-visual coupling architecture is examined and discussed in detail. In addition to the optimized system behavior, an application architecture for multi-modal VEs is described.
This approach models VEs as a collection of communicating agents, which makes it possible to build versatile, interactive, multi-modal virtual worlds. A cluster rendering scheme based on a hybrid master-slave architecture is introduced. This scheme is furthermore optimized for minimal-latency state processing from master to slave.
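
The latency argument in the abstract can be illustrated with a little arithmetic. The following is a minimal sketch, not the adaptive algorithm developed in the thesis: it uses a plain constant-velocity extrapolation as a stand-in for predictive tracking. The function names, the speed of sound of 343 m/s, and all example distances and speeds are assumptions made for illustration only.

```python
# Illustrative sketch only; names and numbers are assumptions, not from the thesis.

SPEED_OF_SOUND_M_S = 343.0  # assumed value for dry air at about 20 degrees Celsius


def propagation_delay_s(distance_m):
    """Time sound needs to travel from a loudspeaker to the ear."""
    return distance_m / SPEED_OF_SOUND_M_S


def misalignment_m(head_speed_m_s, latency_s):
    """Distance the head moves while the system is still using an old position."""
    return head_speed_m_s * latency_s


def predict_position(p_prev, t_prev, p_curr, t_curr, lookahead_s):
    """Constant-velocity extrapolation from the two most recent tracker samples.

    Positions are (x, y, z) tuples in metres, times are in seconds.
    This is a simple stand-in, not the adaptive predictor of the thesis.
    """
    dt = t_curr - t_prev
    if dt <= 0:
        raise ValueError("samples must be strictly ordered in time")
    velocity = tuple((c - p) / dt for p, c in zip(p_prev, p_curr))
    return tuple(c + v * lookahead_s for c, v in zip(p_curr, velocity))


if __name__ == "__main__":
    delay = propagation_delay_s(2.0)    # loudspeaker assumed 2 m away
    drift = misalignment_m(0.5, 0.020)  # head at 0.5 m/s, 20 ms total latency
    ahead = predict_position((0.0, 0.0, 0.0), 0.00,
                             (0.01, 0.0, 0.0), 0.01,
                             lookahead_s=0.02)
    print(f"propagation delay: {delay * 1000:.1f} ms")
    print(f"misalignment without prediction: {drift * 100:.1f} cm")
    print("predicted position:", ahead)
```

Under these assumed numbers, a loudspeaker 2 m from the listener adds about 5.8 ms of acoustic propagation time, and a head moving at 0.5 m/s drifts 1 cm, the threshold named above, within 20 ms of total latency. Extrapolation can bridge part of that gap, but its error grows with the look-ahead interval, which is why the remaining system latency has to be kept low.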