Abstract / Keywords
Title
2022-12-01 15:20
Domain and language adaptation of large-scale pretrained model for speech recognition of low-resource language
○Kak Soky (Kyoto University), Sheng Li (NICT), Chenhui Chu, Tatsuya Kawahara (Kyoto University)
NLC2022-17 SP2022-37
Abstract
Self-supervised learning (SSL) models are effective for automatic speech recognition (ASR). Because of their huge parameter size, fine-tuning them for ASR usually requires about 10 hours of data. However, that much ASR training data is unavailable for some low-resource languages. Moreover, the SSL models were originally pre-trained mostly on European languages, so they may not be well adapted to other domains or languages. To address these challenges, we propose a two-step adaptation method: (1) domain adaptation, which fine-tunes the pre-trained model on in-domain multilingual datasets, and (2) language adaptation, which fine-tunes it on target-language datasets from different domains. We then investigate the effectiveness of adaptation using only one hour of labeled target data for the ASR task. Experiments on the Extraordinary Chambers in the Courts of Cambodia dataset show that performing domain adaptation first and then language adaptation is the most effective, reducing the CER of the baseline by 6.15% and 7.75% on the test and validation sets, respectively.
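As a concrete illustration of the two-step recipe, the following is a minimal sketch assuming the Hugging Face transformers Trainer and the public multilingual checkpoint facebook/wav2vec2-large-xlsr-53; the dataset arguments, vocabulary size, and hyperparameters are placeholders rather than the authors' actual setup, and data preparation (feature extraction, label encoding, padding) is omitted.

# Sketch of the two-step adaptation described in the abstract. The three
# dataset objects are hypothetical and must already contain padded
# `input_values` and CTC `labels`.
from transformers import Trainer, TrainingArguments, Wav2Vec2ForCTC

def finetune(model, dataset, output_dir, epochs=10):
    """Run one CTC fine-tuning stage on `dataset` and return the model."""
    args = TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=8,  # placeholder hyperparameters
        learning_rate=1e-4,
        num_train_epochs=epochs,
        save_strategy="no",
    )
    Trainer(model=model, args=args, train_dataset=dataset).train()
    return model

def two_step_adaptation(in_domain_multilingual_ds,
                        target_lang_other_domain_ds,
                        one_hour_target_ds,
                        vocab_size=66):
    # Start from a large-scale multilingual SSL checkpoint; the CTC head is
    # newly initialized for the target character vocabulary (size is a
    # placeholder for an actual Khmer character inventory).
    model = Wav2Vec2ForCTC.from_pretrained(
        "facebook/wav2vec2-large-xlsr-53",
        vocab_size=vocab_size,
        ctc_loss_reduction="mean",
    )
    # (1) Domain adaptation: in-domain speech in other (non-target) languages.
    model = finetune(model, in_domain_multilingual_ds, "stage1_domain")
    # (2) Language adaptation: target-language speech from other domains.
    model = finetune(model, target_lang_other_domain_ds, "stage2_language")
    # Final fine-tuning on ~1 hour of labeled in-domain target-language data.
    return finetune(model, one_hour_target_ds, "stage3_target", epochs=20)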
Keywords
Speech recognition / domain adaptation / language adaptation / low-resource / Khmer language / wav2vec2.0-based / self-supervised learning / large-scale pre-trained model
Bibliographic information
IEICE Technical Report, vol. 122, no. 288, SP2022-37, pp. 45-49, November 2022.
Report number
SP2022-37 |
Date of issue
2022-11-22 (NLC, SP) |
ISSN
Online edition: ISSN 2432-6380 |
Copyright
The copyright of papers published in the Technical Report belongs to IEICE. (License numbers: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034)
PDF download
NLC2022-17 SP2022-37 |
|