Data augmentation for speech recognition using time-reversed speech

Takanori Ashihara; Tomohiro Tanaka; Takafumi Moriya; Ryo Masumura; Yusuke Shinohara; Makio Kashino

Presentation	2020-03-02 Data augmentation for ASR system by using locally time-reversed speech Takanori Ashihara, Tomohiro Tanaka, Takafumi Moriya, Ryo Masumura, Yusuke Shinohara, Makio Kashino
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	Data augmentation is a technique for mitigating overfitting and improving the robustness of ASR systems against acoustic variability. It creates artificially augmented data by applying label-preserving transformations so that the model acquires generalization ability. In this paper, we use an auditory illusion as the acoustic transformation for data generation. Various auditory illusions related to speech signals have been reported; among them, we examine locally time-reversed speech for data augmentation. In our previous research, we proposed applying temporal reversal directly to the raw waveform. In contrast, this paper proposes a method that applies the reversal to the feature sequence. Because it does not reverse the raw waveform, the method avoids generating an additional waveform and thus enables online data creation during training. We applied the augmentation to an End-to-End automatic speech recognition task and compared the resulting model with a baseline model on the CSJ corpus. As a result, a relative performance improvement of 8.4% over the baseline was observed.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	automatic speech recognition / End-to-End / locally time-reversed speech / data augmentation / auditory illusion
Paper #	EA2019-110,SIP2019-112,SP2019-59
Date of Issue	2020-02-24 (EA, SIP, SP)
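
The feature-level reversal described in the abstract can be illustrated with a minimal sketch. It assumes the feature sequence is a NumPy array of shape (frames, dims) and that "locally time-reversed" means reversing the frame order inside consecutive fixed-length segments. The function name locally_time_reverse and the parameter segment_frames are hypothetical, and the segment length used here is an arbitrary illustration, not a value taken from the paper.

```python
import numpy as np


def locally_time_reverse(features: np.ndarray, segment_frames: int) -> np.ndarray:
    """Reverse the frame order inside each consecutive segment.

    features: array of shape (frames, dims), e.g. log-mel filterbank frames.
    segment_frames: hypothetical segment length in frames (an assumption).
    """
    if segment_frames < 1:
        raise ValueError("segment_frames must be >= 1")
    out = features.copy()
    num_frames = features.shape[0]
    for start in range(0, num_frames, segment_frames):
        # The last segment may be shorter than segment_frames.
        end = min(start + segment_frames, num_frames)
        out[start:end] = features[start:end][::-1]
    return out


# Toy input: 5 frames with 2 feature dimensions each.
feats = np.arange(10, dtype=np.float32).reshape(5, 2)
augmented = locally_time_reverse(feats, segment_frames=2)
```

Because the transformation only reorders frames that are already in memory, a sketch like this could run inside a training data loader without writing an additional waveform, which is the property the abstract attributes to the feature-level method.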

Conference Information
Committee	SP / EA / SIP
Conference Date	2020/3/2(2days)
Place (in Japanese)	(See Japanese page)
Place (in English)	Okinawa Industry Support Center
Topics (in Japanese)	(See Japanese page)
Topics (in English)
Chair	Hisashi Kawai(NICT) / Kenichi Furuya(Oita Univ.) / Naoyuki Aikawa(TUS)
Vice Chair	Akinobu Ri(Nagoya Inst. of Tech.) / Suehiro Shimauchi(Kanazawa Inst. of Tech.) / Shigeto Takeoka(Shizuoka Inst. of Science and Tech.) / Kazunori Hayashi(Osaka City Univ) / Yukihiro Bandou(NTT)
Secretary	Akinobu Ri(Kyoto Univ.) / Suehiro Shimauchi(Waseda Univ.) / Shigeto Takeoka(NHK) / Kazunori Hayashi(Univ. of Tokyo) / Yukihiro Bandou(Hiroshima Univ.)
Assistant	Tomoki Koriyama(Univ. of Tokyo) / Yusuke Ijima(NTT) / Keisuke Imoto(Ritsumeikan Univ.) / Daisuke Morikawa(Toyama Pref Univ.) / Kenjiro Sugimoto(Waseda Univ.)

Paper Information
Registration To	Technical Committee on Speech / Technical Committee on Engineering Acoustics / Technical Committee on Signal Processing
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Data augmentation for ASR system by using locally time-reversed speech
Sub Title (in English)	Temporal inversion of feature sequence
Keyword(1)	automatic speech recognition
Keyword(2)	End-to-End
Keyword(3)	locally time-reversed speech
Keyword(4)	data augmentation
Keyword(5)	auditory illusion
1st Author's Name	Takanori Ashihara
1st Author's Affiliation	Nippon Telegraph and Telephone Corporation(NTT)
2nd Author's Name	Tomohiro Tanaka
2nd Author's Affiliation	Nippon Telegraph and Telephone Corporation(NTT)
3rd Author's Name	Takafumi Moriya
3rd Author's Affiliation	Nippon Telegraph and Telephone Corporation(NTT)
4th Author's Name	Ryo Masumura
4th Author's Affiliation	Nippon Telegraph and Telephone Corporation(NTT)
5th Author's Name	Yusuke Shinohara
5th Author's Affiliation	Nippon Telegraph and Telephone Corporation(NTT)
6th Author's Name	Makio Kashino
6th Author's Affiliation	Nippon Telegraph and Telephone Corporation(NTT)
Date	2020-03-02
Paper #	EA2019-110,SIP2019-112,SP2019-59
Volume (vol)	vol.119
Number (no)	EA-439,SIP-440,SP-441
Page	pp.53-58(EA), pp.53-58(SIP), pp.53-58(SP)
#Pages	6
Date of Issue	2020-02-24 (EA, SIP, SP)