Dereverberation
Motivation
Dereverberation is the removal of unwanted reverberation from signals using signal processing. The first section of this text illustrates what reverberation is and why it is a problem. The second section introduces typical approaches to dereverberation and their applications.
Reverberation
When a sound source is located in an enclosed room, the emitted sound waves are reflected off the room's walls and other surfaces. On their way to a listener, the reflected waves travel a longer distance than the direct sound. In addition, every reflection distorts the signal depending on the characteristics of the reflecting surface. A listener inside the room thus perceives the original source signal together with a superposition of many delayed, attenuated and spectrally colored copies of that signal, which are called reverberation. Figure 1 illustrates this phenomenon.
Figure 1 also shows that there are two types of reverberation with different characteristics: early reverberation and late reverberation. Early reverberation consists of reflections that arrive shortly after the direct sound and can be identified as coming from a certain direction. Late reverberation consists of sound waves that arrive later, have been reflected many times and form a diffuse sound field, where sound comes from all directions equally.
For fixed positions of sender and receiver, the reverberation characteristics of a room can be described by measuring Room Impulse Responses (RIRs). The receiver signal y(k) can then be predicted from the source signal x(k) with the RIR h(k) by convolution:
\[ y(k) = x(k) * h(k) = \sum_{i=0}^{\infty} h(i)\, x(k - i) \qquad (1) \]
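The convolution of a source signal x(k) with an RIR h(k) can be tried out numerically. The following is a minimal sketch using NumPy; the helper name `apply_rir` and the toy values for `h` and `x` are assumptions chosen for illustration, not measured data.

```python
import numpy as np

def apply_rir(x, h):
    """Return y(k) = sum_i h(i) * x(k - i), the full linear convolution."""
    return np.convolve(x, h, mode="full")

# Toy RIR (assumed values): direct sound at tap 0, followed by two
# delayed and attenuated reflections at taps 2 and 5.
h = np.array([1.0, 0.0, 0.5, 0.0, 0.0, 0.25])

# Very short stand-in for a source signal.
x = np.array([1.0, 2.0, 3.0])

y = apply_rir(x, h)
print(y)  # the source plus two delayed, scaled copies of itself
```

The output is longer than the input by `len(h) - 1` samples, which corresponds to the reverberation tail that continues after the source has stopped.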
Figure 2 shows an RIR measured in the lecture hall (4G) in our institute (available here). The different kinds of reverberation are visible in the impulse response and shown in different colors. After a delay of around 15 ms, the first sound arrives via the direct path. It is visible as the first and largest impulse. After that, individual early reflections can be seen as smaller copies of the initial impulse. After some more time, individual reflections no longer stand out. In this phase of late reverberation, the intensity of the reverberation decays exponentially over time.
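The three phases of an RIR (direct sound, early reflections, late reverberation) can be separated by cutting the impulse response into time segments. The sketch below uses a synthetic RIR rather than the measured one. The helper `split_rir`, the 1 ms window for the direct sound and the 50 ms boundary between early and late reverberation are assumptions made for illustration; the text does not specify these boundaries.

```python
import numpy as np

def split_rir(h, fs, early_ms=50.0, direct_ms=1.0):
    """Split an RIR into direct, early and late parts that sum to h.

    Assumptions: the direct sound is the largest-magnitude tap, it is
    given a window of direct_ms, and everything more than early_ms
    after the direct sound counts as late reverberation.
    """
    h = np.asarray(h, dtype=float)
    k_direct = int(np.argmax(np.abs(h)))
    k_early = k_direct + max(1, int(round(direct_ms * 1e-3 * fs)))
    k_late = max(k_early, k_direct + int(round(early_ms * 1e-3 * fs)))
    direct, early, late = (np.zeros_like(h) for _ in range(3))
    direct[:k_early] = h[:k_early]
    early[k_early:k_late] = h[k_early:k_late]
    late[k_late:] = h[k_late:]
    return direct, early, late

# Synthetic RIR (assumed): 15 ms delay, unit direct impulse, then an
# exponentially decaying noise tail as a stand-in for reflections.
fs = 16000
n = fs // 2
k0 = int(round(0.015 * fs))
rng = np.random.default_rng(0)
h = np.zeros(n)
h[k0] = 1.0
t = np.arange(n - k0 - 1) / fs
h[k0 + 1:] = 0.1 * rng.standard_normal(n - k0 - 1) * np.exp(-t / 0.08)

direct, early, late = split_rir(h, fs)

# Noise as a stand-in for speech; each part is convolved separately.
x = rng.standard_normal(4000)
y_direct = np.convolve(x, direct)
y_early = np.convolve(x, early)
y_late = np.convolve(x, late)
```

Because convolution is linear, `y_direct + y_early + y_late` equals the signal obtained with the full RIR, so the contributions can be added one after the other to hear each effect in isolation.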
Effects of Reverberation
With the mathematical relationship in Equation 1 and the measured room impulse response shown in Figure 2, the different kinds of reverberation can be simulated. In the following audio example, speech sounds as if it had been recorded in our lecture room. The different kinds of reverberation are added one after the other:
As can be heard, early reflections are perceived as reinforcement and spectral coloration of the speech. There are some studies indicating that the additional energy from early reflections actually increases speech intelligibility. Late reverberation, in contrast, is perceived as "echo"-like. It makes the source sound more distant and thus harder to understand. This is a problem, especially in applications where increased listening effort is already required, such as narrow-band telephony or hearing aids.
In the spectrogram in the example, it can also be seen that reverberation causes a temporal "smearing" of the speech. The boundaries between different phonemes are blurred and it becomes harder to distinguish them by their spectral distributions. This is a problem for Automatic Speech Recognition (ASR) systems, which need to recognize phonemes to identify the words and sentences in a speech signal.
Dereverberation
Signal processing with dereverberation algorithms is used to mitigate the problems caused by reverberation in different applications.
Dereverberation Algorithms
There are three different categories of dereverberation algorithms:
- Speech Enhancement Approach:
Late reverberation can be seen as additive noise. Thus, some principles of noise reduction algorithms, such as spectral weighting and spectral subtraction, can also be used for dereverberation. To do so, the energy of the reverberation components in the signal needs to be estimated. If there is only one microphone, statistical models of the temporal characteristics (exponential decay) of late reverberation can be used for this. With two microphones, the diffuse character of late reverberation can be exploited.
- Blind Deconvolution:
Under certain conditions and assumptions, a system that describes the room characteristics (like the room impulse response) can be estimated. From this estimate, an inverse system can be constructed that removes reverberation.
- Beamforming:
Because most reflections arrive from a different direction than the direct sound, beamforming can also be used to dereverberate audio signals if multiple microphones are available.
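The single-microphone speech enhancement approach can be illustrated with a small sketch. It assumes a simple exponential-decay model: the late reverberant power in a frame is taken to be the observed power `delay_frames` frames earlier, attenuated according to a known reverberation time `t60`. The function name, all parameters and the gain floor are hypothetical choices for this example, and the sketch is not meant to reproduce any particular published algorithm.

```python
import numpy as np

def late_reverb_gain(power, t60, frame_shift, delay_frames, g_min=0.1):
    """Spectral-subtraction style gains against late reverberation.

    power        : array (frames, bins) with short-time power |Y|^2
    t60          : assumed reverberation time in seconds
    frame_shift  : time between frames in seconds
    delay_frames : frames after which reverberation counts as "late"
    g_min        : gain floor that limits distortion
    """
    if delay_frames < 1:
        raise ValueError("delay_frames must be at least 1")
    power = np.asarray(power, dtype=float)
    # Energy decays by 60 dB within t60: exp(-2 * delta * t60) = 1e-6.
    delta = 3.0 * np.log(10.0) / t60
    decay = np.exp(-2.0 * delta * delay_frames * frame_shift)
    # Late reverberant power: delayed, attenuated copy of the input power.
    late = np.zeros_like(power)
    late[delay_frames:] = decay * power[:-delay_frames]
    # Subtract in the power domain, convert to an amplitude gain, floor it.
    ratio = late / np.maximum(power, 1e-12)
    gain = np.sqrt(np.maximum(1.0 - ratio, 0.0))
    return np.maximum(gain, g_min)

# Toy input: one frequency bin containing only a reverberant tail that
# decays exactly as the model predicts (t60 = 0.5 s, 10 ms frame shift).
frames = np.arange(50)
tail_power = (10.0 ** (-6.0 * frames * 0.01 / 0.5))[:, None]
gains = late_reverb_gain(tail_power, t60=0.5, frame_shift=0.01, delay_frames=5)
```

In this toy case the first five frames pass unchanged, because no late estimate is available yet, and all later frames are attenuated down to the gain floor. In practice the gains would be applied to the short-time spectrum of the reverberant signal before resynthesis, and `t60` would itself have to be estimated.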
Applications of Dereverberation
Telephone speech signals
Reverberation affects the quality of telephone speech signals. Especially in hands-free mode, the distance between the mouth of the speaker and the microphone can make the speech sound distant. This is an example of what speech would sound like recorded with a mobile phone in hands-free position in a reverberant room.
Hearing aids
Hearing aids have a similar problem. With binaural processing, a dual-channel algorithm can improve the signal while retaining the binaural cues necessary for a spatial impression. The following samples should be listened to with headphones, if possible:
Automatic Speech Recognition
Dereverberation can drastically improve the performance of Automatic Speech Recognition. The following video shows an example of what a Kaldi ASR system understood in different sound samples: