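Speech recognition of asynchronously recorded speech using deep-neural-network-based speech selection and environmental adaptation (Poster/Demo Session, 16th Spoken Language Symposium)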

Bo Ren; Longbiao Wang; Atsuhiko Kai

Presentation	2014-12-16 Speech selection and environmental adaptation for asynchronous speech recording based on deep neural network (Bo Ren, Longbiao Wang, Atsuhiko Kai)
Abstract(in English)	In this paper, we propose a robust distant-talking speech recognition system for asynchronous speech recording. It combines automatic selection of asynchronously recorded speech (from microphones or mobile terminals) with environmental adaptation in a deep neural network based framework. Although applications using mobile terminals have attracted increasing attention, few studies have addressed distant-talking speech recognition with asynchronous mobile terminals. In the proposed system, we use bottleneck features (BNFs) extracted from a deep neural network (DNN) instead of conventional Mel-frequency cepstral coefficients (MFCCs), and combine them with a state-of-the-art DNN acoustic model, environmental adaptation, and automatic asynchronous speech selection. The proposed method was evaluated on a reverberant WSJCAM0 corpus, which was played back through a loudspeaker and recorded by multiple far-field mobile terminals in a meeting room with multiple speakers. Using the bottleneck-feature-based DNN acoustic model with automatic asynchronous speech selection and environmental adaptation, the average word error rate (WER) was reduced from 55.32% for the baseline system to 19.38%, a relative error reduction of 64.97%.
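As a quick check of the reported figures: the relative error reduction rate is the absolute WER drop divided by the baseline WER. The minimal Python sketch below reproduces the 64.97% from the two averages quoted in the abstract. The function name `relative_error_reduction` is illustrative only and does not come from the paper.

```python
def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: (baseline - new) / baseline * 100.

    Illustrative helper (hypothetical name, not from the paper).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# Average WERs (in percent) as reported in the abstract.
baseline = 55.32
proposed = 19.38

reduction = relative_error_reduction(baseline, proposed)
print(f"absolute reduction: {baseline - proposed:.2f} points")
print(f"relative reduction: {reduction:.2f}%")
```

Running it prints an absolute reduction of 35.94 points and a relative reduction of 64.97%, which matches the abstract.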
Keyword(in English)	distant-talking speech recognition / tandem DNN / hybrid DNN / model adaptation / asynchronous speech
Paper #	SP2014-121

Conference Information
Committee	SP
Conference Date	2014/12/8(1days)

Paper Information
Registration To	Speech (SP)
Language	ENG
Title (in English)	Speech selection and environmental adaptation for asynchronous speech recording based on deep neural network
Keyword(1)	distant-talking speech recognition
Keyword(2)	tandem DNN
Keyword(3)	hybrid DNN
Keyword(4)	model adaptation
Keyword(5)	asynchronous speech
1st Author's Name	BO REN
1st Author's Affiliation	Nagaoka University of Technology
2nd Author's Name	LONGBIAO WANG
2nd Author's Affiliation	Nagaoka University of Technology
3rd Author's Name	ATSUHIKO KAI
3rd Author's Affiliation	Shizuoka University
Date	2014-12-16
Volume (vol)	vol.114
Number (no)	365
#Pages	6