Presentation | 2020-03-02 Data augmentation for ASR system by using locally time-reversed speech Takanori Ashihara, Tomohiro Tanaka, Takafumi Moriya, Ryo Masumura, Yusuke Shinohara, Makio Kashino |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Data augmentation is a technique for mitigating overfitting and improving the robustness of ASR systems against acoustic variability. It artificially creates additional training data by applying label-preserving transformations, which helps the model generalize. In this paper, we use an auditory illusion as the acoustic transformation for data generation. Various auditory illusions related to speech signals have been proposed; among them, we examine locally time-reversed speech for data augmentation. In our previous research, we proposed applying temporal reversal directly to the raw waveform. In contrast, this paper proposes a method that applies the reversal to the feature sequence. Because it does not invert the raw waveform, the method avoids generating an additional waveform and thus enables online data creation during training. We applied the augmentation to an End-to-End automatic speech recognition task and compared the resulting model with a baseline model on the CSJ corpus. As a result, a relative performance improvement of 8.4% over the baseline was observed. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | automatic speech recognition / End-to-End / locally time-reversed speech / data augmentation / auditory illusion |
Paper # | EA2019-110,SIP2019-112,SP2019-59 |
Date of Issue | 2020-02-24 (EA, SIP, SP) |
Conference Information | |
Committee | SP / EA / SIP |
---|---|
Conference Date | 2020/3/2(2days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Okinawa Industry Support Center |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | Hisashi Kawai(NICT) / Kenichi Furuya(Oita Univ.) / Naoyuki Aikawa(TUS) |
Vice Chair | Akinobu Ri(Nagoya Inst. of Tech.) / Suehiro Shimauchi(Kanazawa Inst. of Tech.) / Shigeto Takeoka(Shizuoka Inst. of Science and Tech.) / Kazunori Hayashi(Osaka City Univ) / Yukihiro Bandou(NTT) |
Secretary | Akinobu Ri(Kyoto Univ.) / Suehiro Shimauchi(Waseda Univ.) / Shigeto Takeoka(NHK) / Kazunori Hayashi(Univ. of Tokyo) / Yukihiro Bandou(Hiroshima Univ.) |
Assistant | Tomoki Koriyama(Univ. of Tokyo) / Yusuke Ijima(NTT) / Keisuke Imoto(Ritsumeikan Univ.) / Daisuke Morikawa(Toyama Pref Univ.) / Kenjiro Sugimoto(Waseda Univ.) |
Paper Information | |
Registration To | Technical Committee on Speech / Technical Committee on Engineering Acoustics / Technical Committee on Signal Processing |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Data augmentation for ASR system by using locally time-reversed speech |
Sub Title (in English) | Temporal inversion of feature sequence |
Keyword(1) | automatic speech recognition |
Keyword(2) | End-to-End |
Keyword(3) | locally time-reversed speech |
Keyword(4) | data augmentation |
Keyword(5) | auditory illusion |
1st Author's Name | Takanori Ashihara |
1st Author's Affiliation | Nippon Telegraph and Telephone Corporation(NTT) |
2nd Author's Name | Tomohiro Tanaka |
2nd Author's Affiliation | Nippon Telegraph and Telephone Corporation(NTT) |
3rd Author's Name | Takafumi Moriya |
3rd Author's Affiliation | Nippon Telegraph and Telephone Corporation(NTT) |
4th Author's Name | Ryo Masumura |
4th Author's Affiliation | Nippon Telegraph and Telephone Corporation(NTT) |
5th Author's Name | Yusuke Shinohara |
5th Author's Affiliation | Nippon Telegraph and Telephone Corporation(NTT) |
6th Author's Name | Makio Kashino |
6th Author's Affiliation | Nippon Telegraph and Telephone Corporation(NTT) |
Date | 2020-03-02 |
Paper # | EA2019-110,SIP2019-112,SP2019-59 |
Volume (vol) | vol.119 |
Number (no) | EA-439,SIP-440,SP-441 |
Page | pp.53-58(EA), pp.53-58(SIP), pp.53-58(SP) |
#Pages | 6 |
Date of Issue | 2020-02-24 (EA, SIP, SP) |
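
The feature-level local time reversal described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `locally_time_reverse`, the `segment_len` parameter, the fixed non-overlapping segmentation, and the handling of the final short segment are all assumptions made for illustration. The abstract does not state the segment length or the feature type used in the paper.

```python
from typing import List, Sequence

Frame = List[float]  # one acoustic feature vector (e.g. one frame of filterbank features)


def locally_time_reverse(features: Sequence[Frame], segment_len: int) -> List[Frame]:
    """Reverse the frame order inside each consecutive block of `segment_len` frames.

    Hypothetical helper: the sequence is cut into non-overlapping segments,
    and the frames within each segment are emitted in reverse order.
    The last segment may be shorter than `segment_len`; it is reversed too.
    The contents of each frame are left untouched.
    """
    if segment_len < 1:
        raise ValueError("segment_len must be >= 1")
    out: List[Frame] = []
    for start in range(0, len(features), segment_len):
        segment = features[start:start + segment_len]
        out.extend(reversed(segment))
    return out


# Toy input: 7 frames of 1-dimensional features whose value equals the frame index.
feats = [[float(t)] for t in range(7)]
augmented = locally_time_reverse(feats, segment_len=3)
```

Because the operation only reorders frames that are already in memory, it can be applied per utterance inside a training data loader, which matches the abstract's point that no additional waveform has to be generated.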