Neural Whispered Speech Detection with Imbalanced Learning

Takanori Ashihara; Yusuke Shinohara; Hiroshi Sato; Takafumi Moriya; Kiyoaki Matsui; Yoshikazu Yamaguchi

Presentation	2019-10-26 Neural Whispered Speech Detection with Imbalanced Learning Takanori Ashihara, Yusuke Shinohara, Hiroshi Sato, Takafumi Moriya, Kiyoaki Matsui, Yoshikazu Yamaguchi
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	In this paper, we present a neural whispered speech detection technique that performs utterance-level classification of whispered and non-whispered speech under imbalanced data distributions. Previous studies have shown that machine learning models trained on large numbers of whispered and non-whispered utterances perform remarkably well at whispered speech detection. However, it is often difficult to collect large numbers of whispered utterances. We therefore propose a method for training neural whispered speech detectors from a small number of whispered utterances combined with a large number of non-whispered utterances. In doing so, special care is taken to ensure that neural networks can be trained effectively on severely imbalanced datasets. Specifically, we use a class-aware sampling method for training. To evaluate the networks, we gather test samples recorded with both condenser and smartphone microphones at different distances from the speakers to simulate practical environments. Experiments show the importance of imbalanced learning in enhancing the performance of utterance-level classifiers.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	whispered speech / vocal effort / deep neural networks / imbalanced learning / class-aware sampling
Paper #	SP2019-26,WIT2019-25
Date of Issue	2019-10-19 (SP, WIT)
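
The abstract names class-aware sampling as the technique used to train on severely imbalanced data, but gives no implementation details. The following is a minimal sketch of one common formulation, not the authors' code: each mini-batch slot first draws a class uniformly at random and then draws an utterance from that class, so the rare whispered class fills roughly half of every batch. The function name `class_aware_batches`, the label encoding (0 = non-whispered, 1 = whispered), and the 990:10 class ratio are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def class_aware_batches(labels, batch_size, num_batches, seed=0):
    """Yield mini-batches of dataset indices using class-aware sampling.

    Hypothetical helper: for every slot in a batch, pick a class
    uniformly at random, then pick one example of that class uniformly
    at random (with replacement).
    """
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = sorted(by_class)

    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            cls = rng.choice(classes)                 # uniform over classes
            batch.append(rng.choice(by_class[cls]))   # uniform within class
        yield batch


# Assumed toy dataset: 990 non-whispered (0) and 10 whispered (1) utterances.
labels = [0] * 990 + [1] * 10
batches = list(class_aware_batches(labels, batch_size=32, num_batches=100))

# Fraction of whispered examples actually seen during sampling.
drawn = [labels[i] for batch in batches for i in batch]
whispered_fraction = sum(drawn) / len(drawn)
print(f"whispered fraction in sampled batches: {whispered_fraction:.2f}")
```

Because minority examples are drawn with replacement, each whispered utterance is revisited far more often than each non-whispered one; in practice this kind of sampler is usually paired with data augmentation or regularization to limit overfitting to the few minority examples.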

Conference Information
Committee	WIT / SP
Conference Date	2019-10-26 (2 days)
Place (in Japanese)	(See Japanese page)
Place (in English)	Daiichi Institute of Technology
Topics (in Japanese)	(See Japanese page)
Topics (in English)
Chair	Daisuke Wakatsuki (Tsukuba Univ. of Tech.) / Hisashi Kawai (NICT)
Vice Chair	Shinji Sakou (Nagoya Inst. of Tech.) / Akinobu Ri (Nagoya Inst. of Tech.)
Secretary	Shinji Sakou (Saitama Industrial Tech. Center) / Akinobu Ri (Teikyo Univ.)
Assistant	Manabi Miyagi (Tsukuba Univ. of Tech.) / Minako Hosono (AIST) / Aki Sugano (Nagoya Univ.) / Tomoki Koriyama (Tokyo Inst. of Tech.) / Yusuke Ijima (NTT)

Paper Information
Registration To	Technical Committee on Well-being Information Technology / Technical Committee on Speech
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Neural Whispered Speech Detection with Imbalanced Learning
Sub Title (in English)
Keyword(1)	whispered speech
Keyword(2)	vocal effort
Keyword(3)	deep neural networks
Keyword(4)	imbalanced learning
Keyword(5)	class-aware sampling
1st Author's Name	Takanori Ashihara
1st Author's Affiliation	NTT Corporation (NTT)
2nd Author's Name	Yusuke Shinohara
2nd Author's Affiliation	NTT Corporation (NTT)
3rd Author's Name	Hiroshi Sato
3rd Author's Affiliation	NTT Corporation (NTT)
4th Author's Name	Takafumi Moriya
4th Author's Affiliation	NTT Corporation (NTT)
5th Author's Name	Kiyoaki Matsui
5th Author's Affiliation	NTT Corporation (NTT)
6th Author's Name	Yoshikazu Yamaguchi
6th Author's Affiliation	NTT Corporation (NTT)
Date	2019-10-26
Volume (vol)	vol.119
Number (no)	SP-250,WIT-251
Page	pp.51-56 (SP), pp.51-56 (WIT)
#Pages	6