Presentation 2023-03-01
Domain Adaptation for Improving End-to-end ASR Performance of Classroom Speech with Variable Recording Condition
Raufun Nahar, Rino Suzuki, Atsuhiko Kai
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Automatic speech recognition (ASR) of real-world speech recorded in real environments remains a challenge in the field of artificial intelligence (AI). Real-environment speech can vary in location, recording medium, devices, and so on. In this research, we focus on recognizing speech recorded in a university classroom. The real-world classroom situation is simulated by re-recording a small amount of data in a classroom: the data are played through a loudspeaker and recorded with a low-quality wireless microphone. Previous research on supervised training of ASR indicates that large-scale transcribed data from the target environment are required. However, recording and transcribing that much data for a desired environment is costly. Therefore, we adopt a DNN-based data augmentation method for an end-to-end ASR model, as well as self-supervised learning (SSL) based feature extraction with an implicit end-to-end model, to perform the ASR task on classroom data. Fine-tuning the SSL-based ASR model with target-domain data helps achieve a 17.9% character error rate on low-audibility data.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) ASR, Real environment, Classroom recording, Self-supervised learning, Data augmentation
Paper # EA2022-101,SIP2022-145,SP2022-65
Date of Issue 2023-02-21 (EA, SIP, SP)

Conference Information
Committee SP / IPSJ-SLP / EA / SIP
Conference Date 2023/2/28 (2 days)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair Tomoki Toda(Nagoya Univ.) / Tomoki Toda(Nagoya Univ.) / Kenichi Furuya(Oita Univ.) / Toshihisa Tanaka(Tokyo Univ. Agri.&Tech.)
Vice Chair / / Tatsuya Kako(NTT) / Junki Ono(Tokyo Metropolitan Univ.) / Koichi Ichige(Yokohama National Univ.) / Takayuki Nakachi(Ryukyu Univ.)
Secretary (NTT) / (Univ. of Electro-Comm.) / Tatsuya Kako(NTT) / Junki Ono(Univ. of Electro-Comm.) / Koichi Ichige(NTT) / Takayuki Nakachi(Ritsumeikan Univ.)
Assistant Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo) / Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo) / Masato Nakayama(Osaka Sangyo Univ.) / Kouhei Yatabe(TUAT) / Taichi Yoshida(UEC) / Shoko Imaizumi(Chiba Univ.)

Paper Information
Registration To Technical Committee on Speech / Special Interest Group on Spoken Language Processing / Technical Committee on Engineering Acoustics / Technical Committee on Signal Processing
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Domain Adaptation for Improving End-to-end ASR Performance of Classroom Speech with Variable Recording Condition
Sub Title (in English)
Keyword(1) ASR, Real environment, Classroom recording, Self-supervised learning, Data augmentation
1st Author's Name Raufun Nahar
1st Author's Affiliation Shizuoka University(Shizuoka Univ.)
2nd Author's Name Rino Suzuki
2nd Author's Affiliation Shizuoka University(Shizuoka Univ.)
3rd Author's Name Atsuhiko Kai
3rd Author's Affiliation Shizuoka University(Shizuoka Univ.)
Date 2023-03-01
Paper # EA2022-101,SIP2022-145,SP2022-65
Volume (vol) vol.122
Number (no) EA-387,SIP-388,SP-389
Page pp.153-158(EA), pp.153-158(SIP), pp.153-158(SP)
#Pages 6
Date of Issue 2023-02-21 (EA, SIP, SP)