Presentation 2014-12-16
Speech selection and environmental adaptation for asynchronous speech recording based on deep neural network
BO REN, LONGBIAO WANG, ATSUHIKO KAI
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In this paper, we propose a robust distant-talking speech recognition system for asynchronous speech recording. The system combines automatic selection of asynchronous speech (from a microphone or mobile terminal) with environmental adaptation in a deep neural network based framework. Although applications using mobile terminals have attracted increasing attention, few studies focus on distant-talking speech recognition with asynchronous mobile terminals. The proposed system uses BottleNeck Features (BNFs) extracted from a Deep Neural Network (DNN) instead of the conventional Mel-Frequency Cepstral Coefficients (MFCCs), together with a state-of-the-art DNN acoustic model, environmental adaptation, and automatic asynchronous speech selection. The proposed method was evaluated on a reverberant WSJCAM0 corpus, which was emitted by a loudspeaker and recorded in a meeting room with multiple speakers by multiple far-field mobile terminals. With the bottleneck-feature-based DNN acoustic model, automatic asynchronous speech selection, and environmental adaptation, the average Word Error Rate (WER) was reduced from 55.32% for the baseline system to 19.38%, a relative error reduction of 64.97%.
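The relative error reduction quoted in the abstract can be checked from the two reported WER figures. The following is a minimal sketch, assuming the usual definition (baseline minus proposed, divided by baseline); the function name relative_error_reduction is hypothetical and does not come from the paper.

```python
def relative_error_reduction(baseline_wer: float, proposed_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Assumes the common definition: (baseline - proposed) / baseline * 100.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - proposed_wer) / baseline_wer * 100.0


# WER values as reported in the abstract (in percent).
reduction = relative_error_reduction(55.32, 19.38)
print(f"{reduction:.2f}%")  # prints "64.97%"
```

The absolute reduction is 35.94 percentage points; dividing by the 55.32% baseline gives the 64.97% relative figure.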
Keyword(in Japanese) (See Japanese page)
Keyword(in English) distant-talking speech recognition / tandem DNN / hybrid DNN / model adaptation / asynchronous speech
Paper # SP2014-121
Date of Issue

Conference Information
Committee SP
Conference Date 2014/12/8 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Speech (SP)
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Speech selection and environmental adaptation for asynchronous speech recording based on deep neural network
Sub Title (in English)
Keyword(1) distant-talking speech recognition
Keyword(2) tandem DNN
Keyword(3) hybrid DNN
Keyword(4) model adaptation
Keyword(5) asynchronous speech
1st Author's Name BO REN
1st Author's Affiliation Nagaoka University of Technology
2nd Author's Name LONGBIAO WANG
2nd Author's Affiliation Nagaoka University of Technology
3rd Author's Name ATSUHIKO KAI
3rd Author's Affiliation Shizuoka University
Date 2014-12-16
Paper # SP2014-121
Volume (vol) vol.114
Number (no) 365
Page pp.-
#Pages 6
Date of Issue