Presentation | 2015-10-16 Multi-modal speech recognition using deep bottleneck features / Satoshi Tamura, Hiroshi Ninomiya, Norihide Kitaoka, Shin Osuga, Yurie Iribe, Kazuya Takeda, Satoru Hayamizu |
---|---|
Abstract (in English) | In this paper, we propose a novel multi-modal speech recognition method that uses speech and lip images and employs Deep BottleNeck Features (DBNFs). First, we integrated several kinds of basic visual features and observed a significant improvement in visual-only speech recognition (lipreading). Next, we applied the DBNF technique to MFCCs in the audio modality and to the above features in the visual modality, obtaining audio and visual DBNFs, respectively. By using these DBNFs with multi-stream HMMs, we achieved more than 75% recognition accuracy even under heavily noisy conditions. In addition, we found that recognition performance can be further improved by performing voice activity detection in the visual modality. |
Keyword(in English) | multi-modal speech recognition / lipreading / bottleneck feature / deep learning / voice activity detection |
Paper # | SP2015-69 |
Date of Issue | 2015-10-08 (SP) |
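
The abstract names two building blocks: bottleneck features taken from a deep network, and multi-stream HMMs that fuse the audio and visual streams. The Python sketch below illustrates both under loudly stated assumptions; it is not the authors' implementation. Assumed here: a fully connected network with sigmoid hidden units whose bottleneck output is read before any nonlinearity, a single diagonal Gaussian per HMM state and stream (real systems typically use Gaussian mixtures), and the common stream-weighted formulation log b(o) = λ_a log b_a(o_a) + (1 − λ_a) log b_v(o_v). All function names (`bottleneck_features`, `multistream_log_likelihood`, etc.) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy layer representation: (weight rows, bias vector).
Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _affine(x: Sequence[float], layer: Layer) -> List[float]:
    """Compute W x + b for one fully connected layer."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def bottleneck_features(
    x: Sequence[float], layers: Sequence[Layer], bottleneck_index: int
) -> List[float]:
    """Propagate x up to the bottleneck layer and return its linear outputs.

    Layers after the bottleneck (needed only for training) are not evaluated.
    Reading the bottleneck before the nonlinearity is an assumption.
    """
    if not 0 <= bottleneck_index < len(layers):
        raise ValueError("bottleneck_index out of range")
    h: List[float] = list(x)
    for i, layer in enumerate(layers):
        z = _affine(h, layer)
        if i == bottleneck_index:
            return z
        h = [_sigmoid(v) for v in z]
    raise AssertionError("unreachable")


def log_gaussian_diag(
    x: Sequence[float], mean: Sequence[float], var: Sequence[float]
) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def multistream_log_likelihood(
    audio_obs: Sequence[float],
    visual_obs: Sequence[float],
    audio_state: Tuple[Sequence[float], Sequence[float]],
    visual_state: Tuple[Sequence[float], Sequence[float]],
    audio_weight: float,
) -> float:
    """Stream-weighted state log-likelihood: lambda_a * log b_a + (1 - lambda_a) * log b_v.

    Each state argument is a (mean, variance) pair for a single diagonal Gaussian.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    la = log_gaussian_diag(audio_obs, *audio_state)
    lv = log_gaussian_diag(visual_obs, *visual_state)
    return audio_weight * la + (1.0 - audio_weight) * lv
```

In a complete system, each modality would have its own network, the audio and visual bottleneck vectors would feed separate HMM streams, and the audio weight λ_a is often tuned to the acoustic noise level; the abstract does not specify how the report sets these details.
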

Conference Information

Committee | SP |
---|---|
Conference Date | 2015-10-15 (2 days) |
Place (in English) | Kobe Univ. |
Topics (in English) | Speech interface, Synthesis, Dialogue, Application system, etc. |
Chair | Kazunori Mano(Shibaura Inst. of Tech.) |
Vice Chair | Norihide Kitaoka(Tokushima Univ.) |
Secretary | Norihide Kitaoka(Tokyo City Univ.) |
Assistant | Takashi Nose(Tohoku Univ.) / Taichi Asami(NTT) |

Paper Information

Registration To | Technical Committee on Speech |
---|---|
Language | Japanese |
Title (in English) | Multi-modal speech recognition using deep bottleneck features |
Sub Title (in English) | |
Keyword(1) | multi-modal speech recognition |
Keyword(2) | lipreading |
Keyword(3) | bottleneck feature |
Keyword(4) | deep learning |
Keyword(5) | voice activity detection |
1st Author's Name | Satoshi Tamura |
1st Author's Affiliation | Gifu University(Gifu Univ) |
2nd Author's Name | Hiroshi Ninomiya |
2nd Author's Affiliation | Nagoya University(Nagoya Univ) |
3rd Author's Name | Norihide Kitaoka |
3rd Author's Affiliation | Tokushima University(Tokushima Univ) |
4th Author's Name | Shin Osuga |
4th Author's Affiliation | Aisin Seiki Co., Ltd.(Aisin Seiki) |
5th Author's Name | Yurie Iribe |
5th Author's Affiliation | Aichi Prefectural University(Aichi Prefectural Univ) |
6th Author's Name | Kazuya Takeda |
6th Author's Affiliation | Nagoya University(Nagoya Univ) |
7th Author's Name | Satoru Hayamizu |
7th Author's Affiliation | Gifu University(Gifu Univ) |
Date | 2015-10-16 |
Volume (vol) | vol.115 |
Number (no) | SP-253 |
Page | pp. 57-62 (SP) |
#Pages | 6 |