Presentation | 2017-03-02 Non-native speech conversion with consistency-aware recursive network and generative adversarial network Keisuke Oyamada, Hirokazu Kameoka, Takuhiro Kaneko, Hiroyasu Ando, Kaoru Hiramatsu, Kunio Kashino |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | This paper deals with the problem of automatically modifying the pronunciation of non-native speech. Since the pronunciation characteristics of non-native speakers tend to depend heavily on the context (such as words), conversion rules must be learned from and applied to a sequence of features rather than a single-frame feature. This paper proposes constructing a neural network that takes a sequence of features as both input and output and guarantees consistency between the features generated within overlapping segments. We further propose applying a recently proposed generative adversarial network (GAN)-based postfilter to the generated feature sequence with the aim of synthesizing natural-sounding speech. Through subjective and quantitative evaluations, we confirmed the superiority of the proposed method over a conventional NN approach in terms of conversion quality. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | |
Paper # | EA2016-139,SIP2016-194,SP2016-134 |
Date of Issue | 2017-02-22 (EA, SIP, SP) |

Conference Information | |
---|---|
Committee | SP / SIP / EA |
Conference Date | 2017/3/1 (2 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Okinawa Industry Support Center |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | Speech, Engineering/Electro Acoustics, Signal Processing, and Related Topics |
Chair | Kazunori Mano(Shibaura Inst. of Tech.) / Makoto Nakashizuka(Chiba Inst. of Tech.) / Mitsunori Mizumachi(Kyushu Inst. of Tech.) |
Vice Chair | Hiroki Mori(Utsunomiya Univ.) / Masahiro Okuda(Univ. of Kitakyushu) / Shogo Muramatsu(Niigata Univ.) / Yoichi Haneda(Univ. of Electro-Comm.) / Suehiro Shimauchi(NTT) |
Secretary | Hiroki Mori(Kobe Univ.) / Masahiro Okuda(Shizuoka Univ.) / Shogo Muramatsu(Ritsumeikan Univ.) / Yoichi Haneda(Chiba Inst. of Tech.) / Suehiro Shimauchi(KDDI R&D Labs.) |
Assistant | Taichi Asami(NTT) / Kei Hashimoto(Nagoya Inst. of Tech.) / Osamu Watanabe(Takushoku Univ.) / Shigeto Takeoka(Shizuoka Inst. of Science and Tech.) / TREVINO Jorge(Tohoku Univ.) |

Paper Information | |
---|---|
Registration To | Technical Committee on Speech / Technical Committee on Signal Processing / Technical Committee on Engineering Acoustics |
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Non-native speech conversion with consistency-aware recursive network and generative adversarial network |
Sub Title (in English) | |
Keyword(1) | |
Keyword(2) | |
Keyword(3) | |
Keyword(4) | |
1st Author's Name | Keisuke Oyamada |
1st Author's Affiliation | University of Tsukuba(Univ. of Tsukuba) |
2nd Author's Name | Hirokazu Kameoka |
2nd Author's Affiliation | Nippon Telegraph and Telephone Corporation(NTT) |
3rd Author's Name | Takuhiro Kaneko |
3rd Author's Affiliation | Nippon Telegraph and Telephone Corporation(NTT) |
4th Author's Name | Hiroyasu Ando |
4th Author's Affiliation | University of Tsukuba(Univ. of Tsukuba) |
5th Author's Name | Kaoru Hiramatsu |
5th Author's Affiliation | Nippon Telegraph and Telephone Corporation(NTT) |
6th Author's Name | Kunio Kashino |
6th Author's Affiliation | Nippon Telegraph and Telephone Corporation(NTT) |
Date | 2017-03-02 |
Paper # | EA2016-139,SIP2016-194,SP2016-134 |
Volume (vol) | vol.116 |
Number (no) | EA-475,SIP-476,SP-477 |
Page | pp.315-320(EA), pp.315-320(SIP), pp.315-320(SP) |
#Pages | 6 |
Date of Issue | 2017-02-22 (EA, SIP, SP) |