Presentation Abstract / Keywords |
Presentation title |
2013-06-14 13:30
Combining perceptually-motivated spectral shaping with loudness and duration modification for intelligibility enhancement of HMM-based synthetic speech in noise Cassia Valentini-Botinhao(Univ. of Edinburgh)・○Junichi Yamagishi(NII/Univ. of Edinburgh)・Simon King(Univ. of Edinburgh)・Yannis Stylianou(FORTH, Greece) SP2013-47 WIT2013-17 |
Abstract |
(English) |
This paper presents our entry to a speech-in-noise intelligibility enhancement evaluation: the Hurricane Challenge. The system consists of a text-to-speech voice manipulated through a combination of enhancement strategies, each of which is known to be individually successful: a perceptually-motivated spectral shaper based on the Glimpse Proportion measure, dynamic range compression, and adaptation to Lombard excitation and duration patterns. We achieved substantial intelligibility improvements relative to unmodified synthetic speech: 4.9 dB in competing-speaker noise and 4.1 dB in speech-shaped noise. An analysis conducted across this and two other similar evaluations shows that the spectral shaper and the compressor (both of which are loudness boosters) contribute most under higher SNR conditions, particularly for speech-shaped noise. Lombard-adapted changes to duration and excitation are more beneficial in lower SNR conditions and for competing-speaker noise. |
Keywords |
(English) |
Intelligibility of speech in noise / HMM-based speech synthesis / Lombard speech |
Bibliographic information |
IEICE Tech. Rep., vol. 113, no. 76, SP2013-47, pp. 95-100, June 2013. |
Report number |
SP2013-47 |
Date of issue |
2013-06-06 (SP, WIT) |
ISSN |
Print edition: ISSN 0913-5685 Online edition: ISSN 2432-6380 |
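Copyright |
Copyright of papers published in IEICE Technical Reports belongs to the Institute of Electronics, Information and Communication Engineers (IEICE). (License No.: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034) |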
|