Colloquium - Details

Subscribe to the newsletter of the Colloquium Communications Technology to receive timely information about upcoming presentations.

All interested students are cordially invited; registration is not required.

Doctoral Defense: Interactive Reproduction of Binaurally Recorded Signals

Sebastian Nagel
Thursday, November 14, 2024
11:30 AM
IKS 4G | zoom

The goal of acoustic augmented or virtual reality is to artificially evoke a realistic auditory impression in the listener. One technique to achieve this goal is binaural reproduction, which refers to the reproduction of certain audio signals at the two ears of the listener. The ear signals can be obtained by binaural recording or binaural rendering, that is, by spatially sampling a real or virtual sound field with two microphones, one at each ear of a recording head. Such signals are referred to as binaurally recorded signals in the following.

For a moving listener to perceive sound sources as fixed in the environment, the reproduced signals need to match the listener’s movements. State-of-the-art methods for this interactive binaural reproduction generate such signals based on denser spatial samplings of sound fields (i.e., more than two microphones), or they perform real-time binaural rendering. The goal of this thesis is to achieve interactive binaural reproduction based on binaurally recorded signals, that is, on two ear signals originally intended for non-interactive binaural reproduction for a non-moving listener.

This is desirable for two major reasons. First, it makes binaural recordings usable for immersive playback. This potentially improves the user experience in applications such as telecommunications, or for consumer-generated content, where greater technical effort for recording may not be feasible. Second, the resulting methods interoperate seamlessly with established technologies. Technically, binaurally recorded signals are ordinary stereo signals, and their spatial relationships are defined by human anatomy. This eliminates the coordination and standardization efforts that would be required to make the state-of-the-art methods widely usable.

The task of this dissertation is to develop algorithms that interact with a complex biological system (the human auditory system). The ultimate evaluation criterion is the subjective quality of the listening experience for the human listener, which can only be assessed through listening experiments. Therefore, to validate the result, a listening experiment is presented at the end of the dissertation. Before that, algorithms are developed and evaluated on a theoretical basis. Derivations are based on signal models, and evaluations are based on the interpretation of signal properties, both of which are rooted in knowledge of human auditory perception.

The model-based algorithms are developed in two steps. First, binaurally recorded signals containing only coherent sound from a single sound source are considered. The derived algorithms act as a time-variant filter that modifies the spatial properties of the binaurally recorded signals. The filter is parameterized with the measured listener head motion and the source direction estimated from the recorded signals. This novel principle allows the listener to perceive the source in a stable position in the environment. It was first proposed by the author in [NJ18], and it has been patented [NJ23]. This dissertation provides analyses of different filter design methods and architectures.
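The parameterization described above can be illustrated with a minimal sketch. It is not the filter design from [NJ18] or the dissertation. It assumes the textbook Woodworth spherical-head model for the interaural time difference (ITD), a head radius of 8.75 cm, and rotation about the vertical axis only. All function names and constants are hypothetical. The sketch computes only how much the interaural delay would have to change so that a source recorded at a given azimuth stays fixed in the room when the listener turns the head.

```python
import math

# Assumed constants for illustration only (not taken from the dissertation).
HEAD_RADIUS_M = 0.0875
SPEED_OF_SOUND_M_S = 343.0


def wrap_angle(angle_rad):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle_rad + math.pi) % (2.0 * math.pi) - math.pi


def woodworth_itd(azimuth_rad):
    """ITD in seconds for a spherical head (Woodworth model).

    Azimuth 0 is straight ahead. Rear azimuths are mirrored to the
    front, since a sphere yields the same ITD for both.
    """
    az = wrap_angle(azimuth_rad)
    if az > math.pi / 2.0:
        az = math.pi - az
    elif az < -math.pi / 2.0:
        az = -math.pi - az
    return HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (az + math.sin(az))


def itd_correction(source_azimuth_rad, head_yaw_rad):
    """Change in ITD needed to keep the source fixed in the room.

    source_azimuth_rad: estimated source direction relative to the
        recording head.
    head_yaw_rad: measured listener head rotation, same sign convention.
    """
    target_azimuth = wrap_angle(source_azimuth_rad - head_yaw_rad)
    return woodworth_itd(target_azimuth) - woodworth_itd(source_azimuth_rad)
```

Calling `itd_correction` once per head-tracker update yields a time-varying delay target. A complete method would also have to adapt level and spectral cues, which this sketch leaves out.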

The subjective quality of the first algorithms is not sufficient for signals that violate the model assumption of only direct sound, such as reverberant signals. Therefore, signals with a mix of coherent and incoherent sound are considered in the second step. The derived algorithms perform a coherence-adaptive trade-off between the spatial modification of the coherent sound and the preservation of incoherent sound [NHJ20]. Their performance is bounded by the limited ability of linear spatial filters to separate coherent and incoherent signal components [NJ21]. This dissertation provides a thorough analysis of these theoretical limitations. It also proposes perceptually motivated improvements to the filter structure that significantly extend the theoretical limits.
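The idea of a coherence-adaptive trade-off can be illustrated with a deliberately simplified sketch. It is not the algorithm from [NHJ20]. It assumes a broadband, frame-wise coherence estimate (the maximum of the normalized cross-correlation between the two ear signals over a small lag range) and a plain linear crossfade between the original and the spatially modified frame. All function names are hypothetical.

```python
import math


def interaural_coherence(left, right, max_lag):
    """Estimate coherence of one frame as the peak of the normalized
    cross-correlation over lags in [-max_lag, max_lag].

    Returns a value in [0, 1]; 0 is returned for silent frames.
    """
    energy = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    if energy == 0.0:
        return 0.0
    n = len(left)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        best = max(best, abs(acc) / energy)
    return min(best, 1.0)


def coherence_weighted_mix(original, modified, coherence):
    """Crossfade: high coherence favors the spatially modified frame,
    low coherence preserves the original (incoherent) frame."""
    return [coherence * m + (1.0 - coherence) * o
            for o, m in zip(original, modified)]
```

For a frame dominated by direct sound the estimate approaches 1 and the modified frame is used almost unchanged. For diffuse reverberation it drops, and more of the original frame is retained.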

Finally, this dissertation presents a listening experiment to validate the proposed method. Acoustic scenes with speech sources were binaurally recorded at the ears of human test subjects. A real-time prototype with head tracking provided interactive binaural reproduction of these recordings via headphones. Subjects were asked to distinguish, in an indirect comparison, the artificial binaural reproduction from reality (ground-truth signals emitted by the loudspeakers in the same room). Results show that this task was difficult even for expert listeners, indicating that the method provides a natural and plausible listening experience.
