Presentation 2017-03-02
Non-native speech conversion with consistency-aware recursive network and generative adversarial network
Keisuke Oyamada, Hirokazu Kameoka, Takuhiro Kaneko, Hiroyasu Ando, Kaoru Hiramatsu, Kunio Kashino
Abstract(in Japanese) (See Japanese page)
Abstract(in English) This paper deals with the problem of automatically modifying the pronunciation of non-native speech. Since the pronunciation characteristics of non-native speakers tend to depend heavily on context (such as the surrounding words), conversion rules must be learned from and applied to a sequence of features rather than a single-frame feature. This paper proposes constructing a neural network that takes a sequence of features as input, produces a sequence of features as output, and guarantees consistency between the generated features within overlapping segments. We further propose applying a recently proposed generative adversarial network (GAN)-based postfilter to the generated feature sequence with the aim of synthesizing natural-sounding speech. Through subjective and quantitative evaluations, we confirmed the superiority of the proposed method over a conventional NN approach in terms of conversion quality.
Keyword(in Japanese) (See Japanese page)
Keyword(in English)
Paper # EA2016-139,SIP2016-194,SP2016-134
Date of Issue 2017-02-22 (EA, SIP, SP)

Conference Information
Committee SP / SIP / EA
Conference Date 2017/3/1 (2 days)
Place (in Japanese) (See Japanese page)
Place (in English) Okinawa Industry Support Center
Topics (in Japanese) (See Japanese page)
Topics (in English) Speech, Engineering/Electro Acoustics, Signal Processing, and Related Topics
Chair Kazunori Mano(Shibaura Inst. of Tech.) / Makoto Nakashizuka(Chiba Inst. of Tech.) / Mitsunori Mizumachi(Kyushu Inst. of Tech.)
Vice Chair Hiroki Mori(Utsunomiya Univ.) / Masahiro Okuda(Univ. of Kitakyushu) / Shogo Muramatsu(Niigata Univ.) / Yoichi Haneda(Univ. of Electro-Comm.) / Suehiro Shimauchi(NTT)
Secretary Hiroki Mori(Kobe Univ.) / Masahiro Okuda(Shizuoka Univ.) / Shogo Muramatsu(Ritsumeikan Univ.) / Yoichi Haneda(Chiba Inst. of Tech.) / Suehiro Shimauchi(KDDI R&D Labs.)
Assistant Taichi Asami(NTT) / Kei Hashimoto(Nagoya Inst. of Tech.) / Osamu Watanabe(Takushoku Univ.) / Shigeto Takeoka(Shizuoka Inst. of Science and Tech.) / TREVINO Jorge(Tohoku Univ.)

Paper Information
Registration To Technical Committee on Speech / Technical Committee on Signal Processing / Technical Committee on Engineering Acoustics
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Non-native speech conversion with consistency-aware recursive network and generative adversarial network
Sub Title (in English)
1st Author's Name Keisuke Oyamada
1st Author's Affiliation University of Tsukuba(Univ. of Tsukuba)
2nd Author's Name Hirokazu Kameoka
2nd Author's Affiliation Nippon Telegraph and Telephone Corporation(NTT)
3rd Author's Name Takuhiro Kaneko
3rd Author's Affiliation Nippon Telegraph and Telephone Corporation(NTT)
4th Author's Name Hiroyasu Ando
4th Author's Affiliation University of Tsukuba(Univ. of Tsukuba)
5th Author's Name Kaoru Hiramatsu
5th Author's Affiliation Nippon Telegraph and Telephone Corporation(NTT)
6th Author's Name Kunio Kashino
6th Author's Affiliation Nippon Telegraph and Telephone Corporation(NTT)
Date 2017-03-02
Paper # EA2016-139,SIP2016-194,SP2016-134
Volume (vol) vol.116
Number (no) EA-475,SIP-476,SP-477
Page pp.315-320(EA), pp.315-320(SIP), pp.315-320(SP)
#Pages 6
Date of Issue 2017-02-22 (EA, SIP, SP)