Audio-visual enhancement of speech in noise.
A key problem for telecommunication or human-machine communication systems concerns speech enhancement in noise. In this domain, a certain number of techniques exist, all of them based on an acoustic-only approach, that is, the processing of the audio corrupted signal using audio information (from th...
Main Authors: | Girin, Laurent; Schwartz, Jean-Luc; Feng, Gang |
---|---|
Format: | |
Language: | English |
Published: | 2001 |
Online Access: | http://ezproxy.villanova.edu/login?url=https://digital.library.villanova.edu/Item/vudl:176083 |
id |
vudl:176083 |
---|---|
record_format |
vudl |
institution |
Villanova University |
collection |
Digital Library |
modeltype_str_mv |
vudl-system:ResourceCollection vudl-system:CollectionModel vudl-system:CoreModel |
datastream_str_mv |
STRUCTMAP AGENTS MEMBER-QUERY RELS-EXT PARENT-QUERY PARENT-LIST-RAW THUMBNAIL MEMBER-LIST-RAW PARENT-LIST DC PROCESS-MD LICENSE LEGACY-METS AUDIT |
hierarchytype |
|
hierarchy_all_parents_str_mv |
vudl:176064 vudl:172968 vudl:641262 vudl:3 vudl:1 |
sequence_vudl_176064_str |
0000000007 |
hierarchy_top_id |
vudl:641262 |
hierarchy_top_title |
Villanova Faculty Publications |
fedora_parent_id_str_mv |
vudl:176064 |
hierarchy_first_parent_id_str |
vudl:176083 |
hierarchy_parent_id |
vudl:176064 |
hierarchy_parent_title |
Feng Gang |
hierarchy_sequence_sort_str |
0000000007 |
hierarchy_sequence |
0000000007 |
spelling |
Audio-visual enhancement of speech in noise. Girin, Laurent. Schwartz, Jean-Luc. Feng, Gang. A key problem for telecommunication or human-machine communication systems concerns speech enhancement in noise. In this domain, a certain number of techniques exist, all of them based on an acoustic-only approach, that is, the processing of the audio corrupted signal using audio information (from the corrupted signal only or additive audio information). In this paper, an audio-visual approach to the problem is considered, since it has been demonstrated in several studies that viewing the speaker's face improves message intelligibility, especially in noisy environments. A speech enhancement prototype system that takes advantage of visual inputs is developed. A filtering process approach is proposed that uses enhancement filters estimated with the help of lip shape information. The estimation process is based on linear regression or simple neural networks using a training corpus. A set of experiments assessed by Gaussian classification and perceptual tests demonstrates that it is indeed possible to enhance simple stimuli (vowel-plosive-vowel sequences) embedded in white Gaussian noise. 2001 Villanova Faculty Authorship vudl:176083 Journal of the Acoustical Society of America 109(6), June 2001, 3007-3020. en |
dc.title_txt_mv |
Audio-visual enhancement of speech in noise. |
dc.creator_txt_mv |
Girin, Laurent. Schwartz, Jean-Luc. Feng, Gang. |
dc.description_txt_mv |
A key problem for telecommunication or human-machine communication systems concerns speech
enhancement in noise. In this domain, a certain number of techniques exist, all of them based on an
acoustic-only approach, that is, the processing of the audio corrupted signal using audio
information (from the corrupted signal only or additive audio information). In this paper, an
audio-visual approach to the problem is considered, since it has been demonstrated in several studies
that viewing the speaker's face improves message intelligibility, especially in noisy environments.
A speech enhancement prototype system that takes advantage of visual inputs is developed. A
filtering process approach is proposed that uses enhancement filters estimated with the help of lip
shape information. The estimation process is based on linear regression or simple neural networks
using a training corpus. A set of experiments assessed by Gaussian classification and perceptual
tests demonstrates that it is indeed possible to enhance simple stimuli (vowel-plosive-vowel
sequences) embedded in white Gaussian noise. |
dc.date_txt_mv |
2001 |
dc.format_txt_mv |
Villanova Faculty Authorship |
dc.identifier_txt_mv |
vudl:176083 |
dc.source_txt_mv |
Journal of the Acoustical Society of America 109(6), June 2001, 3007-3020. |
dc.language_txt_mv |
en |
author |
Girin, Laurent. Schwartz, Jean-Luc. Feng, Gang. |
spellingShingle |
Girin, Laurent. Schwartz, Jean-Luc. Feng, Gang. Audio-visual enhancement of speech in noise. |
author_facet |
Girin, Laurent. Schwartz, Jean-Luc. Feng, Gang. |
dc_source_str_mv |
Journal of the Acoustical Society of America 109(6), June 2001, 3007-3020. |
format |
Villanova Faculty Authorship |
author_sort |
Girin, Laurent. |
dc_date_str |
2001 |
dc_title_str |
Audio-visual enhancement of speech in noise. |
description |
A key problem for telecommunication or human-machine communication systems concerns speech
enhancement in noise. In this domain, a certain number of techniques exist, all of them based on an
acoustic-only approach, that is, the processing of the audio corrupted signal using audio
information (from the corrupted signal only or additive audio information). In this paper, an
audio-visual approach to the problem is considered, since it has been demonstrated in several studies
that viewing the speaker's face improves message intelligibility, especially in noisy environments.
A speech enhancement prototype system that takes advantage of visual inputs is developed. A
filtering process approach is proposed that uses enhancement filters estimated with the help of lip
shape information. The estimation process is based on linear regression or simple neural networks
using a training corpus. A set of experiments assessed by Gaussian classification and perceptual
tests demonstrates that it is indeed possible to enhance simple stimuli (vowel-plosive-vowel
sequences) embedded in white Gaussian noise. |
title |
Audio-visual enhancement of speech in noise. |
title_full |
Audio-visual enhancement of speech in noise. |
title_fullStr |
Audio-visual enhancement of speech in noise. |
title_full_unstemmed |
Audio-visual enhancement of speech in noise. |
title_short |
Audio-visual enhancement of speech in noise. |
title_sort |
audio-visual enhancement of speech in noise. |
publishDate |
2001 |
normalized_sort_date |
2001-01-01T00:00:00Z |
language |
English |
collection_title_sort_str |
audio-visual enhancement of speech in noise. |
relsext.hasModel_txt_mv |
http://hades.library.villanova.edu:8080/rest/vudl-system:ResourceCollection http://hades.library.villanova.edu:8080/rest/vudl-system:CollectionModel http://hades.library.villanova.edu:8080/rest/vudl-system:CoreModel |
relsext.isMemberOf_txt_mv |
http://hades.library.villanova.edu:8080/rest/vudl:176064 |
fgs.ownerId_txt_mv |
diglibEditor |
fgs.type_txt_mv |
http://www.w3.org/ns/ldp#Container http://fedora.info/definitions/v4/repository#Resource http://www.w3.org/ns/ldp#RDFSource http://www.w3.org/ns/ldp#BasicContainer http://www.w3.org/ns/ldp#Resource http://fedora.info/definitions/v4/repository#Container |
fgs.label_txt_mv |
Audio-visual enhancement of speech in noise. |
fgs.lastModifiedBy_txt_mv |
fedoraAdmin |
fgs.state_txt_mv |
Active |
relsext.sortOn_txt_mv |
title |
relsext.hasLegacyURL_txt_mv |
http://digital.library.villanova.edu/Villanova%20Digital%20Collection/Faculty%20Fulltext/Feng%20Gang/FengGang-d14a6e86-9385-4579-a2d0-0e3f1881aafc.xml |
relsext.itemID_txt_mv |
oai:digital.library.villanova.edu:vudl:176083 |
fgs.createdDate_txt_mv |
2013-01-22T05:16:43.434Z |
fgs.createdBy_txt_mv |
fedoraAdmin |
fgs.lastModifiedDate_txt_mv |
2021-04-12T19:19:53.329Z |
relsext.sequence_txt_mv |
vudl:176064#7 |
has_order_str |
no |
agent.name_txt_mv |
Falvey Memorial Library, Villanova University klk |
license.mdRef_str |
http://digital.library.villanova.edu/copyright.html |
license_str |
protected |
has_thumbnail_str |
true |
THUMBNAIL_contentDigest_digest_str |
203c69e18f4f46c81e9892448d2c07cd |
first_indexed |
2014-01-11T22:38:24Z |
last_indexed |
2021-04-12T20:13:36Z |
_version_ |
1785892959088017408 |
subpages |