Audio-visual enhancement of speech in noise.

Bibliographic Details
Main Authors: Girin, Laurent; Schwartz, Jean-Luc; Feng, Gang
Format: Villanova Faculty Authorship
Language: English
Published: 2001
Online Access: http://ezproxy.villanova.edu/login?url=https://digital.library.villanova.edu/Item/vudl:176083
Description: A key problem for telecommunication or human-machine communication systems concerns speech enhancement in noise. In this domain, a certain number of techniques exist, all of them based on an acoustic-only approach, that is, the processing of the corrupted audio signal using audio information (from the corrupted signal only or additive audio information). In this paper, an audio-visual approach to the problem is considered, since it has been demonstrated in several studies that viewing the speaker's face improves message intelligibility, especially in noisy environments. A speech enhancement prototype system that takes advantage of visual inputs is developed. A filtering process approach is proposed that uses enhancement filters estimated with the help of lip shape information. The estimation process is based on linear regression or simple neural networks using a training corpus. A set of experiments assessed by Gaussian classification and perceptual tests demonstrates that it is indeed possible to enhance simple stimuli (vowel-plosive-vowel sequences) embedded in white Gaussian noise.
Source: Journal of the Acoustical Society of America 109(6), June 2001, 3007-3020.