Presentation | 2022-12-01 Domain and language adaptation of large-scale pretrained model for speech recognition of low-resource language Kak Soky, Sheng Li, Chenhui Chu, Tatsuya Kawahara |
---|---|
PDF Download Page | |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Self-supervised learning (SSL) models are effective for automatic speech recognition (ASR). Due to their huge parameter size, they usually require about 10 hours of data for ASR finetuning. However, that much ASR training data is unavailable for some low-resource languages. Moreover, the SSL pre-trained models were originally trained on European languages, so they may not be well adapted to other domains or languages. To address these challenges, we propose a two-step adaptation method: (1) domain adaptation, which finetunes the pre-trained model on in-domain multilingual datasets, and (2) language adaptation, which finetunes it on datasets in the same language as the target but from different domains. We then investigate the effectiveness of adaptation with only one hour of labeled target data for the ASR task. Experiments on the Extraordinary Chambers in the Courts of Cambodia (ECCC) dataset show that conducting domain adaptation first and then language adaptation is the most effective order, reducing the CER of the baseline by 6.15% and 7.75% on the test and validation sets, respectively. (A minimal illustrative sketch of the two-step finetuning appears below this table.) |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Speech recognition / domain adaptation / language adaptation / low-resource / Khmer language / wav2vec2.0-based / self-supervised learning / large-scale pre-trained model |
Paper # | NLC2022-17,SP2022-37 |
Date of Issue | 2022-11-22 (NLC, SP) |
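The two-step procedure in the abstract can be illustrated concretely. Below is a minimal sketch, assuming a wav2vec2.0 checkpoint finetuned with a CTC head via the HuggingFace transformers Trainer; the checkpoint name, the three dataset variables, and all hyperparameters are illustrative assumptions, not the authors' released code or settings. In practice a padding data collator for speech and a tokenizer built from the target character set are also required.

```python
# Sketch of the two-step adaptation: domain adaptation on in-domain
# multilingual data, then language adaptation on same-language
# out-of-domain data, then finetuning on ~1 hour of labeled target data.
# Checkpoint, datasets, and hyperparameters are illustrative placeholders.
from transformers import Wav2Vec2ForCTC, Trainer, TrainingArguments

# Start from a multilingual SSL pre-trained model; the CTC output head
# is newly initialized (hypothetical choice of base checkpoint).
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-xlsr-53")

def finetune(model, train_dataset, output_dir):
    """Run one CTC finetuning stage on a (speech, transcript) dataset."""
    args = TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=8,  # illustrative hyperparameters
        learning_rate=3e-5,
        num_train_epochs=30,
    )
    Trainer(model=model, args=args, train_dataset=train_dataset).train()
    return model

# (1) Domain adaptation: in-domain *multilingual* data
#     (placeholder variable, e.g. multilingual courtroom speech).
model = finetune(model, in_domain_multilingual_data, "ckpt/domain_adapted")

# (2) Language adaptation: same-language (Khmer), *out-of-domain* data
#     (placeholder variable).
model = finetune(model, khmer_out_of_domain_data, "ckpt/language_adapted")

# Final stage: ~1 hour of labeled target data (ECCC Khmer).
model = finetune(model, one_hour_target_data, "ckpt/target_finetuned")
```

The design choice the paper evaluates is the stage order: per the abstract, running domain adaptation before language adaptation, with the one-hour target finetuning last, gave the largest CER reduction.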
Conference Information | |
Committee | NLC / IPSJ-NL / SP / IPSJ-SLP |
---|---|
Conference Date | 2022/11/29 (3 days)
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | Mitsuo Yoshida(Univ. of Tsukuba) / Katsuhito Sudoh(NAIST) / Tomoki Toda(Nagoya Univ.)
Vice Chair | Hiroki Sakaji(Univ. of Tokyo) / Takeshi Kobayakawa(NHK) |
Secretary | Hiroki Sakaji(NTT) / Takeshi Kobayakawa(Hiroshima Univ. of Economics) / (DENSO IT Laboratory) / (Hokkai-Gakuen Univ.) / (Tokyo Univ. of Agriculture and Technology)
Assistant | Kanjin Takahashi(Sansan) / Yasuhiro Ogawa(Nagoya Univ.) / Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo)
Paper Information | |
Registration To | Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing |
---|---|
Language | ENG |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Domain and language adaptation of large-scale pretrained model for speech recognition of low-resource language |
Sub Title (in English) | |
Keyword(1) | Speech recognition |
Keyword(2) | domain adaptation |
Keyword(3) | language adaptation |
Keyword(4) | low-resource |
Keyword(5) | Khmer language |
Keyword(6) | wav2vec2.0-based |
Keyword(7) | self-supervised learning |
Keyword(8) | large-scale pre-trained model |
1st Author's Name | Kak Soky |
1st Author's Affiliation | Kyoto University(Kyoto University) |
2nd Author's Name | Sheng Li |
2nd Author's Affiliation | National Institute of Information and Communications Technology(NICT) |
3rd Author's Name | Chenhui Chu |
3rd Author's Affiliation | Kyoto University(Kyoto University) |
4th Author's Name | Tatsuya Kawahara |
4th Author's Affiliation | Kyoto University(Kyoto University) |
Date | 2022-12-01 |
Paper # | NLC2022-17,SP2022-37 |
Volume (vol) | vol.122 |
Number (no) | no.287(NLC), no.288(SP)
Page | pp.45-49(NLC), pp.45-49(SP)
#Pages | 5 |
Date of Issue | 2022-11-22 (NLC, SP) |