Presentation | 2017-06-22 Comparisons on Transplant Emotional Expressions in DNN-based TTS Synthesis Katsuki Inoue, Sunao Hara, Masanobu Abe, Nobukatsu Hojo, Yusuke Ijima |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Recent studies have shown that DNN-based speech synthesis can generate more natural synthesized speech than conventional HMM-based speech synthesis. Several studies have applied emotion transplantation methods to diversify synthesized speech in HMM-based speech synthesis. However, it has not been shown whether emotions can be transplanted in DNN-based speech synthesis. In this paper, we compare DNN architectures for transplanting emotional expressions in order to improve the expressiveness of DNN-based TTS synthesis. The following three DNN architectures are examined: (1) Parallel Model, whose output layer consists of both speaker-dependent layers and emotion-dependent layers; (2) Serial Model, whose output layer consists of emotion-dependent layers preceded by speaker-dependent layers; (3) Auxiliary Input Model, whose input layer consists of a speaker ID and an emotion ID as well as linguistic feature vectors. The DNNs were trained using neutral speech uttered by 24 speakers, together with joyful and sad speech uttered by 3 of the 24 speakers. The DNNs were compared by objective and subjective evaluations. When synthesizing an unseen emotion, the evaluation results showed that Parallel Model is much better than Serial Model and slightly better than Auxiliary Input Model. The tests also showed that Serial Model is the best of the three when synthesizing a seen emotion. |
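The three conditioning schemes named in the abstract can be sketched as minimal forward passes. This is a hypothetical illustration, not the authors' implementation: the layer dimensions, the tanh activations, and the additive combination in the Parallel Model are assumptions chosen only to make the structural differences between the three architectures concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: linguistic features, hidden units, acoustic outputs.
D_LING, D_HID, D_OUT = 10, 16, 5
N_SPK, N_EMO = 24, 3  # 24 speakers, 3 emotions (per the abstract)

def dense(x, w, b):
    return np.tanh(x @ w + b)

# Shared hidden stack, common to all three models.
W_h = rng.normal(size=(D_LING, D_HID))
b_h = np.zeros(D_HID)

def hidden(ling):
    return dense(ling, W_h, b_h)

# (1) Parallel Model: speaker-dependent and emotion-dependent output
#     layers sit side by side on the shared stack; summing their
#     outputs is one plausible way to combine them (an assumption).
W_spk_p = rng.normal(size=(N_SPK, D_HID, D_OUT))
W_emo_p = rng.normal(size=(N_EMO, D_HID, D_OUT))

def parallel(ling, spk, emo):
    h = hidden(ling)
    return h @ W_spk_p[spk] + h @ W_emo_p[emo]

# (2) Serial Model: a speaker-dependent layer feeds an
#     emotion-dependent output layer in sequence.
W_spk_s = rng.normal(size=(N_SPK, D_HID, D_HID))
W_emo_s = rng.normal(size=(N_EMO, D_HID, D_OUT))

def serial(ling, spk, emo):
    h = hidden(ling)
    h = np.tanh(h @ W_spk_s[spk])
    return h @ W_emo_s[emo]

# (3) Auxiliary Input Model: one-hot speaker and emotion IDs are
#     appended to the linguistic feature vector at the input layer.
W_aux = rng.normal(size=(D_LING + N_SPK + N_EMO, D_HID))
W_out = rng.normal(size=(D_HID, D_OUT))

def auxiliary(ling, spk, emo):
    s, e = np.eye(N_SPK)[spk], np.eye(N_EMO)[emo]
    h = dense(np.concatenate([ling, s, e]), W_aux, np.zeros(D_HID))
    return h @ W_out

x = rng.normal(size=D_LING)
print(parallel(x, 0, 1).shape, serial(x, 0, 1).shape, auxiliary(x, 0, 1).shape)
```

Note how emotion transplantation differs across the sketches: in the Parallel and Serial Models the emotion-dependent weights are separate modules that can be paired with any speaker's weights, while the Auxiliary Input Model relies on the network generalizing to speaker/emotion ID combinations unseen in training.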
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | speech synthesis / deep neural network / emotional transplantation / multi-task learning |
Paper # | PRMU2017-29,SP2017-5 |
Date of Issue | 2017-06-15 (PRMU, SP) |
Conference Information | |
Committee | PRMU / SP |
---|---|
Conference Date | 2017/6/22 (2 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | Shinichi Sato(NII) / Yoichi Yamashita(Ritsumeikan Univ.) |
Vice Chair | Hironobu Fujiyoshi(Chubu Univ.) / Yoshihisa Ijiri(Omron) / Hiroki Mori(Utsunomiya Univ.) |
Secretary | Hironobu Fujiyoshi(AIST) / Yoshihisa Ijiri(NAIST) / Hiroki Mori(Shizuoka Univ.) |
Assistant | Masato Ishii(NEC) / Yusuke Sugano(Osaka Univ.) / Kei Hashimoto(Nagoya Inst. of Tech.) / Satoshi Kobashikawa(NTT) |
Paper Information | |
Registration To | Technical Committee on Pattern Recognition and Media Understanding / Technical Committee on Speech |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Comparisons on Transplant Emotional Expressions in DNN-based TTS Synthesis |
Sub Title (in English) | |
Keyword(1) | speech synthesis |
Keyword(2) | deep neural network |
Keyword(3) | emotional transplantation |
Keyword(4) | multi-task learning |
1st Author's Name | Katsuki Inoue |
1st Author's Affiliation | Okayama University(Okayama Univ.) |
2nd Author's Name | Sunao Hara |
2nd Author's Affiliation | Okayama University(Okayama Univ.) |
3rd Author's Name | Masanobu Abe |
3rd Author's Affiliation | Okayama University(Okayama Univ.) |
4th Author's Name | Nobukatsu Hojo |
4th Author's Affiliation | Nippon Telegraph and Telephone Corporation(NTT) |
5th Author's Name | Yusuke Ijima |
5th Author's Affiliation | Nippon Telegraph and Telephone Corporation(NTT) |
Date | 2017-06-22 |
Paper # | PRMU2017-29,SP2017-5 |
Volume (vol) | vol.117 |
Number (no) | PRMU-105,SP-106 |
Page | pp.23-28(PRMU), pp.23-28(SP) |
#Pages | 6 |
Date of Issue | 2017-06-15 (PRMU, SP) |