Presentation | 2022-12-01 ASR model adaptation to target domain with large-scale audio data without transcription Takahiro Kinouchi, Daiki Mori, Atsunori Ogawa, Norihide Kitaoka |
---|---|
PDF Download Page |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Speech recognition is now used in a wide range of services and businesses thanks to the advent of high-performance models such as Transformer-based speech recognition models. However, training such a high-performance model from scratch requires a large amount of speech data together with its transcriptions, and preparing these data on one's own is both time-consuming and economically difficult. On the other hand, it is relatively easy to prepare only the speech data of the target domain. In this study, we therefore integrate a wav2vec 2.0 model, pre-trained solely on a large amount of untranscribed target-domain speech, with the decoder module of a Transformer speech recognition model, pre-trained on a large out-of-domain corpus, to obtain a speech recognition model that is reasonably applicable to the target domain. The purpose of this study is to build a speech recognition model for a target domain for which no training data (speech data paired with its transcriptions) exist. (An illustrative sketch of this encoder-decoder coupling is given after this table.) |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | wav2vec 2.0 / domain adaptation / end-to-end speech recognition / Encoder-Decoder model |
Paper # | NLC2022-18,SP2022-38 |
Date of Issue | 2022-11-22 (NLC, SP) |
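The abstract above describes coupling a wav2vec 2.0 encoder, self-supervised pre-trained on untranscribed target-domain audio, with the decoder of a Transformer ASR model trained on an out-of-domain corpus. The report itself does not include code; the following is a minimal sketch of such a coupling, assuming PyTorch and the Hugging Face transformers library. The checkpoint name, model dimensions, and vocabulary size are placeholders, and in the paper's setting the decoder weights would be copied from the out-of-domain ASR model rather than trained from scratch.

```python
# Minimal sketch (not the authors' code): wire a wav2vec 2.0 encoder into a
# Transformer decoder. Checkpoint name, dimensions, and vocabulary size are
# placeholders; in the paper's setting the encoder is pre-trained on untranscribed
# target-domain audio and the decoder comes from an out-of-domain ASR model.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model  # assumed toolkit choice


class Wav2Vec2EncoderTransformerDecoder(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 512, num_layers: int = 6):
        super().__init__()
        # Encoder: wav2vec 2.0; "facebook/wav2vec2-base" is a placeholder checkpoint.
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        self.enc_proj = nn.Linear(self.encoder.config.hidden_size, d_model)

        # Decoder: standard Transformer decoder over output tokens. In the paper's
        # setting its weights would be loaded from the out-of-domain ASR model.
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.output = nn.Linear(d_model, vocab_size)

    def forward(self, waveform: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) raw audio; tokens: (batch, seq) previous tokens.
        memory = self.enc_proj(self.encoder(waveform).last_hidden_state)
        tgt = self.embed(tokens)
        # Causal mask so each position attends only to earlier output tokens.
        seq_len = tokens.size(1)
        causal = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=tokens.device),
            diagonal=1,
        )
        dec = self.decoder(tgt, memory, tgt_mask=causal)
        return self.output(dec)  # (batch, seq, vocab) token logits


if __name__ == "__main__":
    model = Wav2Vec2EncoderTransformerDecoder(vocab_size=5000)
    logits = model(torch.randn(1, 16000), torch.zeros(1, 10, dtype=torch.long))
    print(logits.shape)  # torch.Size([1, 10, 5000])
```

Under these assumptions, only the interface between the two pre-trained components (here, `enc_proj`) would need to be newly learned, which is what makes the combination usable for a target domain with no transcribed speech.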
Conference Information | |
Committee | NLC / IPSJ-NL / SP / IPSJ-SLP |
---|---|
Conference Date | 2022/11/29 (3 days)
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | Mitsuo Yoshida(Univ. of Tsukuba) / Katsuhito Sudoh(NAIST) / Tomoki Toda(Nagoya Univ.)
Vice Chair | Hiroki Sakaji(Univ. of Tokyo) / Takeshi Kobayakawa(NHK) |
Secretary | Hiroki Sakaji(NTT) / Takeshi Kobayakawa(Hiroshima Univ. of Economics) / (DENSO IT Laboratory) / (Hokkai-Gakuen Univ.) / (Tokyo Univ. of Agriculture and Technology)
Assistant | Kanjin Takahashi(Sansan) / Yasuhiro Ogawa(Nagoya Univ.) / Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo)
Paper Information | |
Registration To | Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | ASR model adaptation to target domain with large-scale audio data without transcription |
Sub Title (in English) | |
Keyword(1) | wav2vec 2.0 |
Keyword(2) | domain adaptation |
Keyword(3) | end-to-end speech recognition |
Keyword(4) | Encoder-Decoder model |
1st Author's Name | Takahiro Kinouchi |
1st Author's Affiliation | Toyohashi University of Technology(TUT) |
2nd Author's Name | Daiki Mori |
2nd Author's Affiliation | Toyohashi University of Technology(TUT) |
3rd Author's Name | Atsunori Ogawa
3rd Author's Affiliation | Nippon Telegraph and Telephone Corporation(NTT)
4th Author's Name | Norihide Kitaoka |
4th Author's Affiliation | Toyohashi University of Technology(TUT) |
Date | 2022-12-01 |
Paper # | NLC2022-18,SP2022-38 |
Volume (vol) | vol.122 |
Number (no) | no.287(NLC), no.288(SP)
Page | pp.50-53(NLC), pp.50-53(SP)
#Pages | 4 |
Date of Issue | 2022-11-22 (NLC, SP) |