Presentation | 2018-06-29 Speaker adaptation in speech synthesis based on neural networks including temporal structure modeling Kento Nakao, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | This paper proposes a speaker adaptation technique for speech synthesis based on deep neural networks (DNNs) using a structure of hidden semi-Markov models (HSMMs). Speaker adaptation techniques for DNN-based speech synthesis usually use fixed time alignments estimated by an external aligner. As a result, the acoustic features and temporal structures of speech are adapted separately. To perform speaker adaptation that takes temporal structure into account, a special type of mixture density network (MDN) called MDN-HSMM, which outputs the parameters of HSMMs, is applied. Experimental results show that the proposed method improves the naturalness and speaker similarity of the synthesized speech compared with speaker adaptation based on DNNs. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | statistical parametric speech synthesis / speaker adaptation / neural network / speaker code |
Paper # | PRMU2018-31,SP2018-11 |
Date of Issue | 2018-06-21 (PRMU, SP) |
Conference Information | |
Committee | PRMU / SP |
---|---|
Conference Date | 2018/6/28 (2 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | Shinichi Sato(NII) / Yoichi Yamashita(Ritsumeikan Univ.) |
Vice Chair | Yoshihisa Ijiri(Omron) / Toru Tamaki(Hiroshima Univ.) / Akinobu Ri(Nagoya Inst. of Tech.) |
Secretary | Yoshihisa Ijiri(NEC) / Toru Tamaki(Osaka Univ.) / Akinobu Ri(Kyoto Univ.) |
Assistant | Go Irie(NTT) / Yoshitaka Ushiku(Univ. of Tokyo) / Tomoki Koriyama(Tokyo Inst. of Tech.) / Satoshi Kobashikawa(NTT) |
Paper Information | |
Registration To | Technical Committee on Pattern Recognition and Media Understanding / Technical Committee on Speech |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Speaker adaptation in speech synthesis based on neural networks including temporal structure modeling |
Sub Title (in English) | |
Keyword(1) | statistical parametric speech synthesis |
Keyword(2) | speaker adaptation |
Keyword(3) | neural network |
Keyword(4) | speaker code |
1st Author's Name | Kento Nakao |
1st Author's Affiliation | Nagoya Institute of Technology(NIT) |
2nd Author's Name | Kei Hashimoto |
2nd Author's Affiliation | Nagoya Institute of Technology(NIT) |
3rd Author's Name | Keiichiro Oura |
3rd Author's Affiliation | Nagoya Institute of Technology(NIT) |
4th Author's Name | Yoshihiko Nankaku |
4th Author's Affiliation | Nagoya Institute of Technology(NIT) |
5th Author's Name | Keiichi Tokuda |
5th Author's Affiliation | Nagoya Institute of Technology(NIT) |
Date | 2018-06-29 |
Paper # | PRMU2018-31,SP2018-11 |
Volume (vol) | vol.118 |
Number (no) | PRMU-111,SP-112 |
Page | pp.53-58(PRMU), pp.53-58(SP) |
#Pages | 6 |
Date of Issue | 2018-06-21 (PRMU, SP) |