Presentation 2017-06-22
Comparisons on Transplant Emotional Expressions in DNN-based TTS Synthesis
Katsuki Inoue, Sunao Hara, Masanobu Abe, Nobukatsu Hojo, Yusuke Ijima
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Recent studies have shown that DNN-based speech synthesis can generate more natural synthesized speech than conventional HMM-based speech synthesis. Several studies have proposed emotion transplantation methods to diversify synthesized speech in HMM-based speech synthesis. However, whether emotion can be transplanted in DNN-based speech synthesis has not yet been shown. In this paper, we compare DNN architectures for transplanting emotional expressions in order to improve the expressiveness of DNN-based TTS synthesis. The following three DNN architectures are examined. (1) Parallel Model: an output layer consisting of both speaker-dependent layers and emotion-dependent layers. (2) Serial Model: an output layer consisting of emotion-dependent layers preceded by speaker-dependent layers. (3) Auxiliary Input Model: an input layer consisting of a speaker ID and an emotion ID as well as linguistic feature vectors. The DNNs were trained on neutral speech uttered by 24 speakers, and on joyful and sad speech uttered by 3 of the 24 speakers. The DNNs were compared by objective and subjective evaluations. When synthesizing an unseen emotion, the evaluation results showed that the Parallel Model is much better than the Serial Model and slightly better than the Auxiliary Input Model. The tests also showed that the Serial Model is the best of the three when synthesizing a seen emotion.
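The wiring of the three architectures described in the abstract can be illustrated schematically. The sketch below is not the authors' code; all function names are hypothetical, the shared hidden layers are a placeholder, and the "combination" of the parallel branches is assumed to be an element-wise sum. It only shows where the speaker-dependent and emotion-dependent parameters sit in each topology.

```python
# Hypothetical sketch of the three DNN topologies compared in the paper.
# Layers are modeled as plain callables; real models would use trained
# weight matrices and nonlinearities.

def shared_hidden(x):
    """Shared hidden layers common to all speakers/emotions (placeholder)."""
    return x

def parallel_model(x, spk, emo, spk_layers, emo_layers):
    """Parallel Model: speaker-dependent and emotion-dependent output
    layers sit side by side; their outputs are combined (assumed: sum)."""
    h = shared_hidden(x)
    return [a + b for a, b in zip(spk_layers[spk](h), emo_layers[emo](h))]

def serial_model(x, spk, emo, spk_layers, emo_layers):
    """Serial Model: the speaker-dependent layer feeds into the
    emotion-dependent layer at the output."""
    h = shared_hidden(x)
    return emo_layers[emo](spk_layers[spk](h))

def one_hot(i, n):
    """One-hot code for a speaker or emotion ID."""
    return [1.0 if j == i else 0.0 for j in range(n)]

def aux_input_model(x, spk, emo, net, n_spk=24, n_emo=3):
    """Auxiliary Input Model: a single shared network; speaker and emotion
    IDs are appended to the linguistic feature vector at the input."""
    return net(x + one_hot(spk, n_spk) + one_hot(emo, n_emo))
```

Under this sketch, the Parallel Model keeps the emotion branch independent of any one speaker's output layer, which is consistent with its advantage when synthesizing an emotion unseen for the target speaker; the Serial Model specializes the output stack jointly, matching its strength on seen speaker-emotion pairs.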
Keyword(in Japanese) (See Japanese page)
Keyword(in English) speech synthesis / deep neural network / emotional transplantation / multi-task learning
Paper # PRMU2017-29, SP2017-5
Date of Issue 2017-06-15 (PRMU, SP)

Conference Information
Committee PRMU / SP
Conference Date 2017/6/22 (2 days)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair Shinichi Sato(NII) / Yoichi Yamashita(Ritsumeikan Univ.)
Vice Chair Hironobu Fujiyoshi(Chubu Univ.) / Yoshihisa Ijiri(Omron) / Hiroki Mori(Utsunomiya Univ.)
Secretary Hironobu Fujiyoshi(AIST) / Yoshihisa Ijiri(NAIST) / Hiroki Mori(Shizuoka Univ.)
Assistant Masato Ishii(NEC) / Yusuke Sugano(Osaka Univ.) / Kei Hashimoto(Nagoya Inst. of Tech.) / Satoshi Kobashikawa(NTT)

Paper Information
Registration To Technical Committee on Pattern Recognition and Media Understanding / Technical Committee on Speech
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Comparisons on Transplant Emotional Expressions in DNN-based TTS Synthesis
Sub Title (in English)
Keyword(1) speech synthesis
Keyword(2) deep neural network
Keyword(3) emotional transplantation
Keyword(4) multi-task learning
1st Author's Name Katsuki Inoue
1st Author's Affiliation Okayama University (Okayama Univ.)
2nd Author's Name Sunao Hara
2nd Author's Affiliation Okayama University (Okayama Univ.)
3rd Author's Name Masanobu Abe
3rd Author's Affiliation Okayama University (Okayama Univ.)
4th Author's Name Nobukatsu Hojo
4th Author's Affiliation Nippon Telegraph and Telephone Corporation (NTT)
5th Author's Name Yusuke Ijima
5th Author's Affiliation Nippon Telegraph and Telephone Corporation (NTT)
Date 2017-06-22
Paper # PRMU2017-29, SP2017-5
Volume (vol) vol.117
Number (no) no.105 (PRMU), no.106 (SP)
Page pp.23-28 (PRMU), pp.23-28 (SP)
#Pages 6
Date of Issue 2017-06-15 (PRMU, SP)