Presentation | 2018-08-27 [Poster Presentation] An Experimental Study on Transforming the Emotion in Speech using GAN Kenji Yasuda, Ryohei Orihara, Yuichi Sei, Yasuyuki Tahara, Akihiko Ohsuga |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | In domain transfer tasks, deep learning has made it possible to generate more natural and more accurate output. In particular, the advent of GANs (Generative Adversarial Networks) has made it possible to learn transfers between unspecified domains. Voice conversion is an example of domain transfer for speech. It can be regarded as a transformation of the speaker domain, and it has been studied extensively. However, few studies have focused on transforming attributes other than the speaker, and such transformations need to be studied in order to achieve more natural speech synthesis. In this research, we therefore use a model called CycleGAN to perform voice conversion on emotions. The acoustic features that are learned and converted combine the fundamental frequency (F0) and Mel-Frequency Cepstral Coefficients (MFCC). The converter is trained on data that includes multiple speakers. We selected "ANG (anger)", "HAP (happiness)", and "SAD (sadness)" as conversion targets. Evaluation experiments show that the model performs well on conversion to "ANG" for female speakers. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Deep Learning / Domain Transfer / Generative Adversarial Network / Voice Conversion / Speech Processing |
Paper # | SP2018-26 |
Date of Issue | 2018-08-20 (SP) |
Conference Information | |
Committee | SP |
---|---|
Conference Date | 2018/8/27 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Kyoto Univ. |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | Yoichi Yamashita(Ritsumeikan Univ.) |
Vice Chair | Akinobu Ri(Nagoya Inst. of Tech.) |
Secretary | Akinobu Ri(Meijo Univ.) |
Assistant | Satoshi Kobashikawa(NTT) / Tomoki Koriyama(Tokyo Inst. of Tech.) |
Paper Information | |
Registration To | Technical Committee on Speech |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | [Poster Presentation] An Experimental Study on Transforming the Emotion in Speech using GAN |
Sub Title (in English) | |
Keyword(1) | Deep Learning |
Keyword(2) | Domain Transfer |
Keyword(3) | Generative Adversarial Network |
Keyword(4) | Voice Conversion |
Keyword(5) | Speech Processing |
1st Author's Name | Kenji Yasuda |
1st Author's Affiliation | The University of Electro-Communications(UEC) |
2nd Author's Name | Ryohei Orihara |
2nd Author's Affiliation | The University of Electro-Communications(UEC) |
3rd Author's Name | Yuichi Sei |
3rd Author's Affiliation | The University of Electro-Communications(UEC) |
4th Author's Name | Yasuyuki Tahara |
4th Author's Affiliation | The University of Electro-Communications(UEC) |
5th Author's Name | Akihiko Ohsuga |
5th Author's Affiliation | The University of Electro-Communications(UEC) |
Date | 2018-08-27 |
Paper # | SP2018-26 |
Volume (vol) | vol.118 |
Number (no) | SP-198 |
Page | pp.19-22(SP) |
#Pages | 4 |
Date of Issue | 2018-08-20 (SP) |