Phone Labeling Based on Gaussian Mixture Model for Dysarthric Speech Recognition

Yuki Takashima; Toru Nakashika; Tetsuya Takiguchi; Yasuo Ariki

Presentation	2015-06-18 Phone Labeling Based on Gaussian Mixture Model for Dysarthric Speech Recognition Yuki Takashima, Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	This paper investigates speech recognition for a person with an articulation disorder resulting from athetoid cerebral palsy. In our previous work, we proposed a feature extraction method using a convolutional neural network and showed its effectiveness. Training the network with back-propagation requires a teaching signal, and the previous method obtained it by HMM-based forced alignment of the speech data. However, because dysarthric speech fluctuates from utterance to utterance and its phone boundaries are ambiguous, it is difficult to obtain a correct alignment, and the network is likely not trained adequately when the alignment is wrong. We therefore propose a phone labeling method using a Gaussian mixture model, and report speech recognition results for networks trained on the phone alignments calculated by the proposed method.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	articulation disorders / feature extraction / convolutional neural network / bottleneck feature / phoneme labeling
Paper #	PRMU2015-44,SP2015-13,WIT2015-13
Date of Issue	2015-06-11 (PRMU, SP, WIT)

Conference Information
Committee	WIT / SP / ASJ-H / PRMU
Conference Date	2015/6/18(2days)
Place (in Japanese)	(See Japanese page)
Place (in English)
Topics (in Japanese)	(See Japanese page)
Topics (in English)
Chair	Kiyohiko Nunokawa(Tokyo International Univ.) / Kazunori Mano(Shibaura Inst. of Tech.) / Masato Akagi(JAIST) / Eisaku Maeda(NTT)
Vice Chair	Chikamune Wada(Kyushu Inst. of Tech.) / Norihide Kitaoka(Tokushima Univ.) / Shigeto Furukawa(NTT) / Shuji Senda(NEC) / Seiichi Uchida(Kyushu Univ.)
Secretary	Chikamune Wada(Nagoya Inst. of Tech.) / Norihide Kitaoka(AIST) / Shigeto Furukawa(Tsukuba Univ. of Tech.) / Shuji Senda(Tokyo City Univ.) / Seiichi Uchida(Kobe Univ.)
Assistant	Tomohiro Amemiya(NTT) / Takeaki Shionome(Tsukuba Univ. of Tech.) / Manabi Miyagi(Tsukuba Univ. of Tech.) / Takashi Nose(Tohoku Univ.) / Taichi Asami(NTT) / / Kazuaki Kondo(Kyoto Univ.) / Akisato Kimura(NTT)

Paper Information
Registration To	Technical Committee on Well-being Information Technology / Technical Committee on Speech / * / Technical Committee on Pattern Recognition and Media Understanding
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Phone Labeling Based on Gaussian Mixture Model for Dysarthric Speech Recognition
Sub Title (in English)
Keyword(1)	articulation disorders
Keyword(2)	feature extraction
Keyword(3)	convolutional neural network
Keyword(4)	bottleneck feature
Keyword(5)	phoneme labeling
1st Author's Name	Yuki Takashima
1st Author's Affiliation	Kobe University(Kobe Univ.)
2nd Author's Name	Toru Nakashika
2nd Author's Affiliation	The University of Electro-Communications(UEC)
3rd Author's Name	Tetsuya Takiguchi
3rd Author's Affiliation	Kobe University(Kobe Univ.)
4th Author's Name	Yasuo Ariki
4th Author's Affiliation	Kobe University(Kobe Univ.)
Date	2015-06-18
Paper #	PRMU2015-44,SP2015-13,WIT2015-13
Volume (vol)	vol.115
Number (no)	PRMU-98,SP-99,WIT-100
Page	pp.71-76(PRMU), pp.71-76(SP), pp.71-76(WIT)
#Pages	6
Date of Issue	2015-06-11 (PRMU, SP, WIT)