Audio-visual enhancement of speech in noise.

A key problem for telecommunication or human-machine communication systems concerns speech enhancement in noise. In this domain, a certain number of techniques exist, all of them based on an acoustic-only approach, that is, the processing of the corrupted audio signal using audio information (from the corrupted signal only or additive audio information). In this paper, an audio-visual approach to the problem is considered, since it has been demonstrated in several studies that viewing the speaker's face improves message intelligibility, especially in noisy environments. A speech enhancement prototype system that takes advantage of visual inputs is developed. A filtering approach is proposed that uses enhancement filters estimated with the help of lip shape information. The estimation process is based on linear regression or simple neural networks using a training corpus. A set of experiments assessed by Gaussian classification and perceptual tests demonstrates that it is indeed possible to enhance simple stimuli (vowel-plosive-vowel sequences) embedded in white Gaussian noise.
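
The estimation step described in the abstract (a regression from lip shape to enhancement filter, fitted on a training corpus) can be illustrated with a minimal sketch. This is not the authors' implementation: the three lip parameters, the eight frequency bands, the function names, and the synthetic, noise-free training data are all assumptions made for illustration, and only the linear regression variant is shown.

```python
import numpy as np


def fit_lip_to_filter(lip_params, filter_gains):
    """Least-squares fit of an affine map from lip parameters to filter gains.

    lip_params:   (n_frames, n_lip) array, hypothetical lip shape features
    filter_gains: (n_frames, n_bands) array, target per-band filter gains
    Returns a (n_lip + 1, n_bands) weight matrix (last row is the bias).
    """
    X = np.hstack([lip_params, np.ones((lip_params.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X, filter_gains, rcond=None)
    return W


def predict_filter(W, lip_params):
    """Estimate per-band filter gains for new lip shape frames."""
    X = np.hstack([lip_params, np.ones((lip_params.shape[0], 1))])
    return X @ W


def apply_filter(noisy_spectrum, gains):
    """Scale a noisy magnitude spectrum band by band; negative gains are clipped to 0."""
    return noisy_spectrum * np.clip(gains, 0.0, None)


# Synthetic "training corpus" (assumption): gains are an exact affine
# function of the lip parameters, so the fit should recover true_W.
rng = np.random.default_rng(0)
n_frames, n_lip, n_bands = 200, 3, 8
true_W = rng.normal(size=(n_lip + 1, n_bands))
lips = rng.uniform(0.0, 1.0, size=(n_frames, n_lip))
gains = np.hstack([lips, np.ones((n_frames, 1))]) @ true_W

W = fit_lip_to_filter(lips, gains)
pred = predict_filter(W, lips[:5])

# Apply the estimated filters to placeholder noisy magnitude spectra.
noisy = np.abs(rng.normal(size=(5, n_bands)))
enhanced = apply_filter(noisy, pred)
```

In a real system the targets would come from paired clean and noisy recordings rather than from a known weight matrix, and the abstract notes that a simple neural network may replace the linear map.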

Main Author: Girin, Laurent.
Other Authors: Schwartz, Jean-Luc; Feng, Gang.
Language: English
Published: 2001
