Presentation | 2023-03-01 Domain Adaptation for Improving End-to-end ASR Performance of Classroom Speech with Variable Recording Condition. Raufun Nahar, Rino Suzuki, Atsuhiko Kai |
---|---|
Abstract(in English) | Automatic speech recognition (ASR) of real-world speech recorded in real environments remains a challenge in the field of artificial intelligence (AI). Real-environment speech can vary in location, recording medium, devices, and so on. In this research, we focus on recognizing speech recorded in a university classroom. This real-world classroom situation is simulated by re-recording a small amount of data in a classroom: the data are played through a loudspeaker and recorded with a low-quality wireless microphone. Previous research on supervised training of ASR indicates that large-scale transcribed data from the target environment are required. However, recording and transcribing that much data for a desired environment is costly. Therefore, we adopt a DNN-based data augmentation method for an end-to-end ASR model, as well as self-supervised learning (SSL) based feature extraction with an implicit end-to-end model, to perform ASR on classroom data. Fine-tuning the SSL-based ASR with target-domain data helps achieve a 17.9% character error rate for low-audibility data. |
Keyword(in English) | ASR / Real environment / Classroom recording / Self-supervised learning / Data augmentation |
Paper # | EA2022-101,SIP2022-145,SP2022-65 |
Date of Issue | 2023-02-21 (EA, SIP, SP) |
Conference Information | |
Committee | SP / IPSJ-SLP / EA / SIP |
---|---|
Conference Date | 2023/2/28 (2 days) |
Chair | Tomoki Toda(Nagoya Univ.) / Tomoki Toda(Nagoya Univ.) / Kenichi Furuya(Oita Univ.) / Toshihisa Tanaka(Tokyo Univ. Agri.&Tech.) |
Vice Chair | / / Tatsuya Kako(NTT) / Junki Ono(Tokyo Metropolitan Univ.) / Koichi Ichige(Yokohama National Univ.) / Takayuki Nakachi(Ryukyu Univ.) |
Secretary | (NTT) / (Univ. of Electro-Comm.) / Tatsuya Kako(NTT) / Junki Ono(Univ. of Electro-Comm.) / Koichi Ichige(NTT) / Takayuki Nakachi(Ritsumeikan Univ.) |
Assistant | Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo) / Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo) / Masato Nakayama(Osaka Sangyo Univ.) / Kouhei Yatabe(TUAT) / Taichi Yoshida(UEC) / Shoko Imaizumi(Chiba Univ.) |
Paper Information | |
Registration To | Technical Committee on Speech / Special Interest Group on Spoken Language Processing / Technical Committee on Engineering Acoustics / Technical Committee on Signal Processing |
---|---|
Language | ENG |
Title (in English) | Domain Adaptation for Improving End-to-end ASR Performance of Classroom Speech with Variable Recording Condition |
Keyword(1) | ASR / Real environment / Classroom recording / Self-supervised learning / Data augmentation |
1st Author's Name | Raufun Nahar |
1st Author's Affiliation | Shizuoka University(Shizuoka Univ.) |
2nd Author's Name | Rino Suzuki |
2nd Author's Affiliation | Shizuoka University(Shizuoka Univ.) |
3rd Author's Name | Atsuhiko Kai |
3rd Author's Affiliation | Shizuoka University(Shizuoka Univ.) |
Date | 2023-03-01 |
Paper # | EA2022-101,SIP2022-145,SP2022-65 |
Volume (vol) | vol.122 |
Number (no) | EA-387,SIP-388,SP-389 |
Page | pp.153-158 (EA), pp.153-158 (SIP), pp.153-158 (SP) |
#Pages | 6 |
Date of Issue | 2023-02-21 (EA, SIP, SP) |