Presentation | 2015-12-02 Automation of high performance system building for large vocabulary speech recognition using evolution strategy with pareto optimality Takafumi Moriya, Tomohiro Tanaka, Takahiro Shinozaki, Shinji Watanabe, Kevin Duh |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Deep neural networks (DNNs) can significantly improve speech recognition performance. However, building a high-performance speech recognition system still requires human experts to tune numerous parameters, and this laborious effort remains a prominent obstacle. In addition, computation time can be prohibitive when training large DNN models. The goal of this paper is to automate the tuning process. We propose to tune DNN-HMM based large vocabulary speech recognition systems using the covariance matrix adaptation evolution strategy (CMA-ES) with multi-objective Pareto optimization. This optimizes systems for both high accuracy and compact model size. Compared to a strong manually tuned configuration borrowed from a similar system, our approach automatically discovered systems with a word error rate (WER) lower by 0.48%, and systems with a 59% smaller model size at the same WER. The optimized training script is released in the Kaldi speech recognition toolkit as the first publicly available recipe for Japanese large vocabulary speech recognition. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | large vocabulary speech recognition / evolution strategy / deep neural network / multi-objective optimization |
Paper # | SP2015-75 |
Date of Issue | 2015-11-25 (SP) |
Conference Information | |
Committee | NLC / IPSJ-NL / SP / IPSJ-SLP |
---|---|
Conference Date | 2015/12/2 (3 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Nagoya Inst of Tech. |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | The Second Natural Language Processing Symposium & The 17th Spoken Language Symposium |
Chair | Koichi Takeuchi(Okayama Univ.) / Kentaro Inui(Tohoku Univ.) / Kazunori Mano(Shibaura Inst. of Tech.) / Koichi Shinoda(Tokyo Inst. of Tech.) |
Vice Chair | Hiroshi Kanayama(IBM) / Makoto Ichise(NTT DoCoMo) / / Norihide Kitaoka(Tokushima Univ.) |
Secretary | Hiroshi Kanayama(Univ. of Tokyo/Hottolink) / Makoto Ichise(Ryukoku Univ.) / (Osaka Univ.) / Norihide Kitaoka(Tohoku Univ.) / (Mixi Co. Ltd.) |
Assistant | Kazutaka Shimada(Kyushu Inst. of Tech.) / Ryuichiro Higashinaka(NTT) / / Takashi Nose(Tohoku Univ.) / Taichi Asami(NTT) |
Paper Information | |
Registration To | Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Automation of high performance system building for large vocabulary speech recognition using evolution strategy with pareto optimality |
Sub Title (in English) | |
Keyword(1) | large vocabulary speech recognition |
Keyword(2) | evolution strategy |
Keyword(3) | deep neural network |
Keyword(4) | multi-objective optimization |
1st Author's Name | Takafumi Moriya |
1st Author's Affiliation | Tokyo Institute of Technology(Tokyo Tech) |
2nd Author's Name | Tomohiro Tanaka |
2nd Author's Affiliation | Tokyo Institute of Technology(Tokyo Tech) |
3rd Author's Name | Takahiro Shinozaki |
3rd Author's Affiliation | Tokyo Institute of Technology(Tokyo Tech) |
4th Author's Name | Shinji Watanabe |
4th Author's Affiliation | Mitsubishi Electric Research Laboratories(MERL) |
5th Author's Name | Kevin Duh |
5th Author's Affiliation | Nara Institute of Science and Technology(NAIST) |
Date | 2015-12-02 |
Paper # | SP2015-75 |
Volume (vol) | vol.115 |
Number (no) | SP-346 |
Page | pp.31-36(SP) |
#Pages | 6 |
Date of Issue | 2015-11-25 (SP) |
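
The abstract describes selecting systems that trade off WER against model size under Pareto optimality. A minimal sketch of that selection step alone is given below, assuming each candidate system is summarized by a (WER, model size) pair and that both objectives are minimized. The function names and the numeric values are hypothetical illustrations, not results or code from the paper, and the CMA-ES search itself is not shown.

```python
from typing import List, Tuple

# Hypothetical summary of one tuned system:
# (WER in %, model size in millions of parameters). Both are minimized.
Candidate = Tuple[float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up values for illustration only.
candidates = [(12.0, 40.0), (11.5, 55.0), (12.0, 30.0), (13.0, 35.0), (11.8, 60.0)]
front = pareto_front(candidates)
print(front)
```

In an evolutionary loop, a front like this would typically be used to rank each generation's candidates before the search distribution is updated; how the paper combines the ranking with CMA-ES is not stated in this record.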