Multi-modal speech recognition using bottleneck features obtained by deep learning

Presentation	2015-10-16: Multi-modal speech recognition using deep bottleneck features. Satoshi Tamura, Hiroshi Ninomiya, Norihide Kitaoka, Shin Osuga, Yurie Iribe, Kazuya Takeda, Satoru Hayamizu
Abstract(in English)	In this paper, we propose a novel multi-modal speech recognition method that uses both speech and lip images, employing Deep BottleNeck Features (DBNFs). First, we combined several kinds of basic visual features and observed a significant improvement in visual-only speech recognition (lipreading). Next, we applied the DBNF technique to MFCCs in the audio modality and to the above features in the visual modality, obtaining audio and visual DBNFs, respectively. Using these DBNFs with multi-stream HMMs, we achieved recognition accuracy above 75% even under heavily noisy conditions. In addition, we found that recognition performance can be substantially improved by performing voice activity detection in the visual modality.
Keyword(in English)	multi-modal speech recognition / lipreading / bottleneck feature / deep learning / voice activity detection
Paper #	SP2015-69
Date of Issue	2015-10-08 (SP)
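The abstract names two standard building blocks: bottleneck features read from a narrow hidden layer of a deep neural network, and multi-stream HMMs that combine per-stream log-likelihoods with stream weights. The following is a minimal pure-Python sketch of both techniques in general under stated assumptions; it is not the authors' implementation. The layer sizes (a hypothetical 39-dimensional MFCC input and a 30-dimensional bottleneck), the sigmoid hidden units, the linear bottleneck output, the random untrained weights, and the audio stream weight of 0.7 are all illustrative assumptions. A real system would first train the network (typically on HMM state targets) and tune the stream weights.

```python
import math
import random


def random_layer(n_in, n_out, rng):
    """Hypothetical untrained layer: weight matrix stored as n_in rows of n_out values."""
    w = [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    b = [0.0] * n_out
    return w, b


def dense(x, weights, bias):
    """Affine layer: y_j = sum_i x_i * W[i][j] + b_j."""
    return [sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
            for j in range(len(bias))]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def bottleneck_features(x, layers, bottleneck_index):
    """Forward x through the DNN and return the bottleneck layer's output.

    Layers after the bottleneck are only needed for training, so they are
    skipped here. Assumption: the bottleneck output is taken before any
    nonlinearity (a linear bottleneck).
    """
    h = x
    for k, (w, b) in enumerate(layers):
        h = dense(h, w, b)
        if k == bottleneck_index:
            return h
        h = sigmoid(h)
    raise ValueError("bottleneck_index out of range")


def multistream_log_likelihood(log_b_audio, log_b_visual, audio_weight):
    """Multi-stream HMM state score with weights summing to 1:
    log b(o) = lambda_a * log b_a(o_a) + (1 - lambda_a) * log b_v(o_v)."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    return audio_weight * log_b_audio + (1.0 - audio_weight) * log_b_visual


rng = random.Random(0)
# Hypothetical sizes: 39-dim input, bottleneck of 30 units as the output of layer index 2.
sizes = [39, 256, 256, 30, 256, 100]
layers = [random_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
frame = [rng.gauss(0.0, 1.0) for _ in range(39)]
dbnf = bottleneck_features(frame, layers, bottleneck_index=2)
print(len(dbnf))
print(multistream_log_likelihood(-50.0, -40.0, audio_weight=0.7))
```

In a multi-stream HMM the weight is usually shifted toward the visual stream as the acoustic noise level rises; the single fixed `audio_weight` above only stands in for that tuning.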

Conference Information
Committee	SP
Conference Date	2015-10-15 (2 days)
Place (in English)	Kobe Univ.
Topics (in English)	Speech interface, Synthesis, Dialogue, Application system, etc.
Chair	Kazunori Mano (Shibaura Inst. of Tech.)
Vice Chair	Norihide Kitaoka (Tokushima Univ.)
Secretary	Norihide Kitaoka (Tokyo City Univ.)
Assistant	Takashi Nose (Tohoku Univ.) / Taichi Asami (NTT)

Paper Information
Registration To	Technical Committee on Speech
Language	Japanese
Title (in English)	Multi-modal speech recognition using deep bottleneck features
Sub Title (in English)
Keyword(1)	multi-modal speech recognition
Keyword(2)	lipreading
Keyword(3)	bottleneck feature
Keyword(4)	deep learning
Keyword(5)	voice activity detection
1st Author's Name	Satoshi Tamura
1st Author's Affiliation	Gifu University (Gifu Univ.)
2nd Author's Name	Hiroshi Ninomiya
2nd Author's Affiliation	Nagoya University (Nagoya Univ.)
3rd Author's Name	Norihide Kitaoka
3rd Author's Affiliation	Tokushima University (Tokushima Univ.)
4th Author's Name	Shin Osuga
4th Author's Affiliation	Aisin Seiki Co., Ltd. (Aisin Seiki)
5th Author's Name	Yurie Iribe
5th Author's Affiliation	Aichi Prefectural University (Aichi Prefectural Univ.)
6th Author's Name	Kazuya Takeda
6th Author's Affiliation	Nagoya University (Nagoya Univ.)
7th Author's Name	Satoru Hayamizu
7th Author's Affiliation	Gifu University (Gifu Univ.)
Date	2015-10-16
Volume (vol)	vol.115
Number (no)	SP-253
Page	pp. 57-62 (SP)
#Pages	6