Presentation 2020-03-02
Data augmentation for ASR system by using locally time-reversed speech
Takanori Ashihara, Tomohiro Tanaka, Takafumi Moriya, Ryo Masumura, Yusuke Shinohara, Makio Kashino
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Data augmentation is one technique for mitigating overfitting and improving the robustness of ASR systems against various acoustic variabilities. It artificially creates additional training data by applying transformations that preserve the class label, which helps the model acquire generalization ability. In this paper, we use an auditory illusion as the acoustic transformation for data generation. Various auditory illusions related to speech signals have been reported; among them, we specifically examine locally time-reversed speech for data augmentation. In our previous work, we applied temporal reversal directly to the raw waveform. In contrast, this paper proposes a method that performs the reversal on the feature sequence. Because it does not reverse the raw waveform, this augmentation eliminates the need to generate additional waveforms and thus enables online data creation during training. We applied the approach to an End-to-End automatic speech recognition task and compared the resulting model with a baseline model on the Corpus of Spontaneous Japanese (CSJ). As a result, we observed a relative improvement of 8.4% over the baseline.
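The core operation described in the abstract, reversing the time order of a feature sequence within short local segments, can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the features arrive as a list of per-frame vectors (e.g. filterbank frames) and that the segments are non-overlapping and of fixed length. The abstract does not state the segment length or how it was chosen. The online policy `augment_online`, with its probability `p` and segment-length range `seg_range`, is hypothetical and exists only to show where on-the-fly application would fit in a training loop.

```python
import random
from typing import List, Optional, Sequence, Tuple

Frame = List[float]


def local_time_reverse(features: Sequence[Frame], segment_len: int) -> List[Frame]:
    """Reverse the frame order inside each non-overlapping window of
    `segment_len` frames, keeping the windows in their original order.
    The last window may be shorter than `segment_len`."""
    if segment_len < 1:
        raise ValueError("segment_len must be >= 1")
    out: List[Frame] = []
    for start in range(0, len(features), segment_len):
        window = features[start:start + segment_len]
        # Copy each frame so the caller's feature sequence is never mutated.
        out.extend(list(frame) for frame in reversed(window))
    return out


def augment_online(
    features: Sequence[Frame],
    p: float = 0.5,
    seg_range: Tuple[int, int] = (2, 8),
    rng: Optional[random.Random] = None,
) -> List[Frame]:
    """Hypothetical on-the-fly policy (assumption, not from the paper):
    with probability p, apply local time reversal using a segment length
    drawn uniformly from seg_range (inclusive); otherwise return a copy."""
    if rng is None:
        rng = random.Random()
    if rng.random() >= p:
        return [list(frame) for frame in features]
    segment_len = rng.randint(seg_range[0], seg_range[1])
    return local_time_reverse(features, segment_len)


if __name__ == "__main__":
    feats = [[float(t)] for t in range(7)]  # 7 frames, 1-dim features
    print(local_time_reverse(feats, 3))
    # → [[2.0], [1.0], [0.0], [5.0], [4.0], [3.0], [6.0]]
```

Because the reversal works on features that have already been extracted, nothing has to be re-synthesized or stored. A different segment length can be drawn for each minibatch, which matches the abstract's point about online data creation during training.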
Keyword(in Japanese) (See Japanese page)
Keyword(in English) automatic speech recognition / End-to-End / locally time-reversed speech / data augmentation / auditory illusion
Paper # EA2019-110,SIP2019-112,SP2019-59
Date of Issue 2020-02-24 (EA, SIP, SP)

Conference Information
Committee SP / EA / SIP
Conference Date 2020-03-02 (2 days)
Place (in Japanese) (See Japanese page)
Place (in English) Okinawa Industry Support Center
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair Hisashi Kawai(NICT) / Kenichi Furuya(Oita Univ.) / Naoyuki Aikawa(TUS)
Vice Chair Akinobu Ri(Nagoya Inst. of Tech.) / Suehiro Shimauchi(Kanazawa Inst. of Tech.) / Shigeto Takeoka(Shizuoka Inst. of Science and Tech.) / Kazunori Hayashi(Osaka City Univ.) / Yukihiro Bandou(NTT)
Secretary Akinobu Ri(Kyoto Univ.) / Suehiro Shimauchi(Waseda Univ.) / Shigeto Takeoka(NHK) / Kazunori Hayashi(Univ. of Tokyo) / Yukihiro Bandou(Hiroshima Univ.)
Assistant Tomoki Koriyama(Univ. of Tokyo) / Yusuke Ijima(NTT) / Keisuke Imoto(Ritsumeikan Univ.) / Daisuke Morikawa(Toyama Pref. Univ.) / Kenjiro Sugimoto(Waseda Univ.)

Paper Information
Registration To Technical Committee on Speech / Technical Committee on Engineering Acoustics / Technical Committee on Signal Processing
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Data augmentation for ASR system by using locally time-reversed speech
Sub Title (in English) Temporal inversion of feature sequence
Keyword(1) automatic speech recognition
Keyword(2) End-to-End
Keyword(3) locally time-reversed speech
Keyword(4) data augmentation
Keyword(5) auditory illusion
1st Author's Name Takanori Ashihara
1st Author's Affiliation Nippon Telegraph and Telephone Corporation(NTT)
2nd Author's Name Tomohiro Tanaka
2nd Author's Affiliation Nippon Telegraph and Telephone Corporation(NTT)
3rd Author's Name Takafumi Moriya
3rd Author's Affiliation Nippon Telegraph and Telephone Corporation(NTT)
4th Author's Name Ryo Masumura
4th Author's Affiliation Nippon Telegraph and Telephone Corporation(NTT)
5th Author's Name Yusuke Shinohara
5th Author's Affiliation Nippon Telegraph and Telephone Corporation(NTT)
6th Author's Name Makio Kashino
6th Author's Affiliation Nippon Telegraph and Telephone Corporation(NTT)
Date 2020-03-02
Volume (vol) vol.119
Number (no) EA-439, SIP-440, SP-441
Page pp. 53-58 (EA), pp. 53-58 (SIP), pp. 53-58 (SP)
#Pages 6