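WFST-based Structured Classification of Features Extracted by Using Deep Neural Networks (High-Accuracy Speech Recognition, Recognition, Understanding, Dialogue, General)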

久保 陽太郎; 堀 貴明; 中村 篤

Presentation	2012-07-21 WFST-based Structured Classification of Features Extracted by Using Deep Neural Networks Yotaro KUBO, Takaaki HORI, Atsushi NAKAMURA
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	Multilayer perceptrons with more than two hidden layers are known to be effective for modeling complex classification processes. However, because of local optima and plateaus in their training objective functions, such perceptrons had not been used in practice. Recently, a heuristic method that initializes the network with values obtained by unsupervised training has made such perceptrons practical. With multiple hidden layers, the total number of units needed to accurately model a nonlinear classification process can become smaller than in a network with a single hidden layer. Consequently, the main contribution of deep processing can be regarded as an enhancement of feature representations. Meanwhile, an approach called structured classification has been attracting the attention of speech researchers because it directly models sequence-to-sequence classification. However, feature transformation is known to be important in this approach, since it typically treats sequence classification as a linear classification process. In this paper, we attempt to combine these two approaches in order to enhance both sides: feature representations and label representations. Specifically, we introduce a structured classification method based on weighted finite-state transducers into multilayer perceptron-based speech recognition systems.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	Automatic Speech Recognition / Structured Classification / Deep Learning / Temporal Features
Paper #	SP2012-57
Date of Issue

Conference Information
Committee	SP
Conference Date	2012/7/12(1days)
Place (in Japanese)	(See Japanese page)
Place (in English)
Topics (in Japanese)	(See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To	Speech (SP)
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	WFST-based Structured Classification of Features Extracted by Using Deep Neural Networks
Sub Title (in English)
Keyword(1)	Automatic Speech Recognition
Keyword(2)	Structured Classification
Keyword(3)	Deep Learning
Keyword(4)	Temporal Features
1st Author's Name	Yotaro KUBO
1st Author's Affiliation	NTT Communication Science Laboratories
2nd Author's Name	Takaaki HORI
2nd Author's Affiliation	NTT Communication Science Laboratories
3rd Author's Name	Atsushi NAKAMURA
3rd Author's Affiliation	NTT Communication Science Laboratories
Date	2012-07-21
Paper #	SP2012-57
Volume (vol)	vol.112
Number (no)	141
Page	pp.-
#Pages	6
Date of Issue