Presentation 2022-12-01
Domain and language adaptation of large-scale pretrained model for speech recognition of low-resource language
Kak Soky, Sheng Li, Chenhui Chu, Tatsuya Kawahara
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Self-supervised learning (SSL) models are effective for automatic speech recognition (ASR). Due to their huge parameter size, fine-tuning them for ASR usually requires about 10 hours of data, but training data of that size is unavailable for some low-resource languages. Moreover, the SSL models were pre-trained mainly on European languages and thus may not be well adapted to other domains or languages. To address these challenges, we propose a two-step adaptation method: (1) domain adaptation, which fine-tunes the pre-trained model on in-domain multilingual datasets, and (2) language adaptation, which fine-tunes it on datasets of the same language but from different domains. We then investigate the effectiveness of adaptation using only one hour of labeled target data for the ASR task. Experiments on the Extraordinary Chambers in the Courts of Cambodia (ECCC) dataset show that conducting domain adaptation first and then language adaptation is the most effective, reducing the CER of the baseline by 6.15% and 7.75% on the test and validation sets, respectively.
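The report itself contains no code; as a rough illustration of the pipeline the abstract describes, the sketch below chains the two adaptation stages and the final one-hour target fine-tuning using the HuggingFace transformers Trainer API. The base checkpoint name, dataset placeholders, and all hyperparameters are assumptions made for illustration, not the authors' actual configuration.

```python
# Minimal sketch of the two-step adaptation pipeline, assuming a
# wav2vec2.0-style checkpoint fine-tuned with a CTC head via HuggingFace
# transformers. All names below (checkpoint, dataset placeholders,
# hyperparameters) are illustrative assumptions.
from transformers import (
    Trainer,
    TrainingArguments,
    Wav2Vec2ForCTC,
    Wav2Vec2Processor,
)

def finetune(checkpoint: str, train_set, output_dir: str) -> str:
    """Run one fine-tuning stage and return its output directory,
    so stages can be chained checkpoint-to-checkpoint."""
    # Assumes a processor (feature extractor + character tokenizer for the
    # target script) has already been built and saved with the checkpoint.
    processor = Wav2Vec2Processor.from_pretrained(checkpoint)
    model = Wav2Vec2ForCTC.from_pretrained(checkpoint)
    model.freeze_feature_encoder()  # common practice in low-resource fine-tuning
    args = TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=8,
        num_train_epochs=30,
        learning_rate=1e-4,
        save_total_limit=1,
    )
    # A padding data collator for CTC batches is omitted here for brevity.
    Trainer(model=model, args=args, train_dataset=train_set).train()
    model.save_pretrained(output_dir)
    processor.save_pretrained(output_dir)
    return output_dir

# Hypothetical dataset placeholders to be prepared by the reader:
in_domain_multilingual = ...  # ECCC speech in languages other than Khmer
out_of_domain_khmer = ...     # Khmer speech from other domains
eccc_khmer_1h = ...           # ~1 hour of labeled in-domain Khmer target data

ckpt = "facebook/wav2vec2-large-xlsr-53"  # assumed multilingual SSL base model
ckpt = finetune(ckpt, in_domain_multilingual, "exp/step1_domain_adapt")
ckpt = finetune(ckpt, out_of_domain_khmer, "exp/step2_language_adapt")
ckpt = finetune(ckpt, eccc_khmer_1h, "exp/step3_target_finetune")
```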
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Speech recognition / domain adaptation / language adaptation / low-resource / Khmer language / wav2vec2.0-based / self-supervised learning / large-scale pre-trained model
Paper # NLC2022-17,SP2022-37
Date of Issue 2022-11-22 (NLC, SP)

Conference Information
Committee NLC / IPSJ-NL / SP / IPSJ-SLP
Conference Date 2022/11/29 (3 days)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair Mitsuo Yoshida(Univ. of Tsukuba) / Katsuhito Sudoh(NAIST) / Tomoki Toda(Nagoya Univ.)
Vice Chair Hiroki Sakaji(Univ. of Tokyo) / Takeshi Kobayakawa(NHK)
Secretary Hiroki Sakaji(NTT) / Takeshi Kobayakawa(Hiroshima Univ. of Economics) / (DENSO IT Laboratory, Inc.) / (Hokkai-Gakuen Univ.) / (Tokyo Univ. of Agriculture and Technology)
Assistant Kanjin Takahashi(Sansan) / Yasuhiro Ogawa(Nagoya Univ.) / Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo)

Paper Information
Registration To Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Domain and language adaptation of large-scale pretrained model for speech recognition of low-resource language
Sub Title (in English)
Keyword(1) Speech recognition
Keyword(2) domain adaptation
Keyword(3) language adaptation
Keyword(4) low-resource
Keyword(5) Khmer language
Keyword(6) wav2vec2.0-based
Keyword(7) self-supervised learning
Keyword(8) large-scale pre-trained model
1st Author's Name Kak Soky
1st Author's Affiliation Kyoto University
2nd Author's Name Sheng Li
2nd Author's Affiliation National Institute of Information and Communications Technology(NICT)
3rd Author's Name Chenhui Chu
3rd Author's Affiliation Kyoto University
4th Author's Name Tatsuya Kawahara
4th Author's Affiliation Kyoto University
Date 2022-12-01
Paper # NLC2022-17,SP2022-37
Volume (vol) vol.122
Number (no) no.287(NLC), no.288(SP)
Page pp.45-49(NLC), pp.45-49(SP)
#Pages 5
Date of Issue 2022-11-22 (NLC, SP)