Presentation 2018-06-29
Speaker adaptation in speech synthesis based on neural networks including temporal structure modeling
Kento Nakao, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
Abstract(in Japanese) (See Japanese page)
Abstract(in English) This paper proposes a speaker adaptation technique for speech synthesis based on deep neural networks (DNNs) that uses the structure of hidden semi-Markov models (HSMMs). Speaker adaptation techniques for DNN-based speech synthesis usually rely on fixed time alignments estimated by an external aligner. As a result, the acoustic features and the temporal structure of speech are adapted separately. To perform speaker adaptation that takes temporal structure into account, a special type of mixture density network (MDN) called MDN-HSMM, which outputs the parameters of HSMMs, is applied. Experimental results show that the proposed method improves the naturalness and speaker similarity of the synthesized speech compared with speaker adaptation based on DNNs.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) statistical parametric speech synthesis / speaker adaptation / neural network / speaker code
Paper # PRMU2018-31,SP2018-11
Date of Issue 2018-06-21 (PRMU, SP)

Conference Information
Committee PRMU / SP
Conference Date 2018/6/28(2days)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair Shinichi Sato(NII) / Yoichi Yamashita(Ritsumeikan Univ.)
Vice Chair Yoshihisa Ijiri(Omron) / Toru Tamaki(Hiroshima Univ.) / Akinobu Ri(Nagoya Inst. of Tech.)
Secretary Yoshihisa Ijiri(NEC) / Toru Tamaki(Osaka Univ.) / Akinobu Ri(Kyoto Univ.)
Assistant Go Irie(NTT) / Yoshitaka Ushiku(Univ. of Tokyo) / Tomoki Koriyama(Tokyo Inst. of Tech.) / Satoshi Kobashikawa(NTT)

Paper Information
Registration To Technical Committee on Pattern Recognition and Media Understanding / Technical Committee on Speech
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Speaker adaptation in speech synthesis based on neural networks including temporal structure modeling
Sub Title (in English)
Keyword(1) statistical parametric speech synthesis
Keyword(2) speaker adaptation
Keyword(3) neural network
Keyword(4) speaker code
1st Author's Name Kento Nakao
1st Author's Affiliation Nagoya Institute of Technology(NIT)
2nd Author's Name Kei Hashimoto
2nd Author's Affiliation Nagoya Institute of Technology(NIT)
3rd Author's Name Keiichiro Oura
3rd Author's Affiliation Nagoya Institute of Technology(NIT)
4th Author's Name Yoshihiko Nankaku
4th Author's Affiliation Nagoya Institute of Technology(NIT)
5th Author's Name Keiichi Tokuda
5th Author's Affiliation Nagoya Institute of Technology(NIT)
Date 2018-06-29
Paper # PRMU2018-31,SP2018-11
Volume (vol) vol.118
Number (no) PRMU-111,SP-112
Page pp.53-58(PRMU), pp.53-58(SP)
#Pages 6
Date of Issue 2018-06-21 (PRMU, SP)