Domain Adaptation for Improving End-to-end ASR Performance of Classroom Speech with Variable Recording Condition

Raufun Nahar; Rino Suzuki; Atsuhiko Kai

Presentation	2023-03-01 Domain Adaptation for Improving End-to-end ASR Performance of Classroom Speech with Variable Recording Condition Raufun Nahar, Rino Suzuki, Atsuhiko Kai
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	Automatic speech recognition (ASR) of real-world speech recorded in real environments remains a challenge in the field of artificial intelligence (AI). Such speech can vary in location, recording medium, devices, and so on. In this research, we focus on recognizing speech recorded in a university classroom. We simulate this real-world classroom situation by playing a small amount of data through a loudspeaker in a classroom and re-recording it with a low-quality wireless microphone. Previous research on supervised training of ASR indicates that large-scale transcribed data from the target environment is required. However, recording and transcribing that much data for a desired environment is costly. Therefore, we adopt a DNN-based data augmentation method for an end-to-end ASR model, as well as self-supervised learning (SSL) based feature extraction with an implicit end-to-end model, to perform the ASR task on classroom data. Fine-tuning the SSL-based ASR model on target-domain data helps achieve a 17.9% character error rate on low-audibility data.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	ASR, Real environment, Classroom recording, Self-supervised learning, Data augmentation
Paper #	EA2022-101,SIP2022-145,SP2022-65
Date of Issue	2023-02-21 (EA, SIP, SP)

Conference Information
Committee	SP / IPSJ-SLP / EA / SIP
Conference Date	2023/2/28(2 days)
Place (in Japanese)	(See Japanese page)
Place (in English)
Topics (in Japanese)	(See Japanese page)
Topics (in English)
Chair	Tomoki Toda(Nagoya Univ.) / Tomoki Toda(Nagoya Univ.) / Kenichi Furuya(Oita Univ.) / Toshihisa Tanaka(Tokyo Univ. Agri.&Tech.)
Vice Chair	/ / Tatsuya Kako(NTT) / Junki Ono(Tokyo Metropolitan Univ.) / Koichi Ichige(Yokohama National Univ.) / Takayuki Nakachi(Ryukyu Univ.)
Secretary	(NTT) / (Univ. of Electro-Comm.) / Tatsuya Kako(NTT) / Junki Ono(Univ. of Electro-Comm.) / Koichi Ichige(NTT) / Takayuki Nakachi(Ritsumeikan Univ.)
Assistant	Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo) / Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo) / Masato Nakayama(Osaka Sangyo Univ.) / Kouhei Yatabe(TUAT) / Taichi Yoshida(UEC) / Shoko Imaizumi(Chiba Univ.)

Paper Information
Registration To	Technical Committee on Speech / Special Interest Group on Spoken Language Processing / Technical Committee on Engineering Acoustics / Technical Committee on Signal Processing
Language	ENG
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Domain Adaptation for Improving End-to-end ASR Performance of Classroom Speech with Variable Recording Condition
Sub Title (in English)
Keyword(1)	ASR, Real environment, Classroom recording, Self-supervised learning, Data augmentation
1st Author's Name	Raufun Nahar
1st Author's Affiliation	Shizuoka University(Shizuoka Univ.)
2nd Author's Name	Rino Suzuki
2nd Author's Affiliation	Shizuoka University(Shizuoka Univ.)
3rd Author's Name	Atsuhiko Kai
3rd Author's Affiliation	Shizuoka University(Shizuoka Univ.)
Date	2023-03-01
Paper #	EA2022-101,SIP2022-145,SP2022-65
Volume (vol)	vol.122
Number (no)	EA-387,SIP-388,SP-389
Page	pp.153-158(EA), pp.153-158(SIP), pp.153-158(SP)
#Pages	6
Date of Issue	2023-02-21 (EA, SIP, SP)