Presentation | 2018-03-19 Non-parallel and Many-to-Many Voice Conversion Using Variational Autoencoder Conditioned by Phonetic Posteriorgrams and d-vectors Yuki Saito, Yusuke Ijima, Kyosuke Nishida, Shinnosuke Takamichi |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | This paper proposes novel frameworks for non-parallel and many-to-many voice conversion (VC) using variational autoencoders (VAEs). In conventional VAE-based VC, the quality of converted speech is significantly degraded by over-regularization of the latent variables that represent phonetic content. To overcome this issue, this paper proposes a VAE-based non-parallel VC method conditioned not only on speaker codes but also on phonetic posteriorgrams (PPGs) predicted by pre-trained speech recognition models. This paper also extends conventional VC to many-to-many VC, which can convert the characteristics of an arbitrary speaker into those of another. We compare two methods to realize this: 1) speaker code adaptation, and 2) the use of $d$-vectors obtained from pre-trained speaker verification models. Experimental results demonstrate that 1) PPGs successfully improve converted speech quality, and 2) both speaker codes and $d$-vectors can be applied to VAE-based non-parallel and many-to-many VC. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | non-parallel voice conversion / many-to-many voice conversion / variational autoencoders / phonetic posteriorgrams / d-vectors |
Paper # | EA2017-105,SIP2017-114,SP2017-88 |
Date of Issue | 2018-03-12 (EA, SIP, SP) |
Conference Information | |
Committee | SIP / EA / SP / MI |
---|---|
Conference Date | 2018/3/19(2days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | Speech, Engineering/Electro Acoustics, Signal Processing, and Related Topics [SIP, EA, SP]/ Medical Image Engineering, Analysis, Recognition, etc. [MI] |
Chair | Masahiro Okuda(Univ. of Kitakyushu) / Suehiro Shimauchi(NTT) / Yoichi Yamashita(Ritsumeikan Univ.) / Kensaku Mori(Nagoya Univ.) |
Vice Chair | Shogo Muramatsu(Niigata Univ.) / Naoyuki Aikawa(TUS) / Mitsunori Mizumachi(Kyutech) / Hiroki Mori(Utsunomiya Univ.) / Yoshiki Kawata(Tokushima Univ.) / Yuichi Kimura(Kinki Univ.) |
Secretary | Shogo Muramatsu(Chiba Inst. of Tech.) / Naoyuki Aikawa(Takushoku Univ.) / Mitsunori Mizumachi(Akita Pref. Univ.) / Hiroki Mori(Shizuoka Inst. of Science and Tech.) / Yoshiki Kawata(Shizuoka Univ.) / Yuichi Kimura(Meijo Univ.) |
Assistant | Masayoshi Nakamoto(Hiroshima Univ.) / TREVINO Jorge(Tohoku Univ.) / Nobutaka Ito(NTT) / Kei Hashimoto(Nagoya Inst. of Tech.) / Satoshi Kobashikawa(NTT) / Ryo Haraguchi(Univ. of Hyogo) / Yasushi Hirano(Yamaguchi Univ.) |
Paper Information | |
Registration To | Technical Committee on Signal Processing / Technical Committee on Engineering Acoustics / Technical Committee on Speech / Technical Committee on Medical Imaging |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Non-parallel and Many-to-Many Voice Conversion Using Variational Autoencoder Conditioned by Phonetic Posteriorgrams and d-vectors |
Sub Title (in English) | |
Keyword(1) | non-parallel voice conversion |
Keyword(2) | many-to-many voice conversion |
Keyword(3) | variational autoencoders |
Keyword(4) | phonetic posteriorgrams |
Keyword(5) | d-vectors |
1st Author's Name | Yuki Saito |
1st Author's Affiliation | NTT Media Intelligence Laboratories, NTT Corporation/The University of Tokyo(NTT/Univ. of Tokyo) |
2nd Author's Name | Yusuke Ijima |
2nd Author's Affiliation | NTT Media Intelligence Laboratories, NTT Corporation(NTT) |
3rd Author's Name | Kyosuke Nishida |
3rd Author's Affiliation | NTT Media Intelligence Laboratories, NTT Corporation(NTT) |
4th Author's Name | Shinnosuke Takamichi |
4th Author's Affiliation | The University of Tokyo(Univ. of Tokyo) |
Date | 2018-03-19 |
Paper # | EA2017-105,SIP2017-114,SP2017-88 |
Volume (vol) | vol.117 |
Number (no) | EA-515,SIP-516,SP-517 |
Page | pp.21-26(EA), pp.21-26(SIP), pp.21-26(SP) |
#Pages | 6 |
Date of Issue | 2018-03-12 (EA, SIP, SP) |