Speaker adaptation in speech synthesis based on neural networks including temporal structure modeling

Presentation	2018-06-29 Speaker adaptation in speech synthesis based on neural networks including temporal structure modeling Kento Nakao, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	This paper proposes a speaker adaptation technique for speech synthesis based on deep neural networks (DNNs) that uses the structure of hidden semi-Markov models (HSMMs). Speaker adaptation techniques for DNN-based speech synthesis usually rely on fixed time alignments estimated by an external aligner. As a result, the acoustic features and the temporal structure of speech are adapted separately. To perform speaker adaptation that takes temporal structure into account, a special type of mixture density network (MDN) called MDN-HSMM, which outputs the parameters of HSMMs, is applied. Experimental results show that the proposed method improves the naturalness and speaker similarity of the synthesized speech compared with speaker adaptation based on DNNs.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	statistical parametric speech synthesis / speaker adaptation / neural network / speaker code
Paper #	PRMU2018-31,SP2018-11
Date of Issue	2018-06-21 (PRMU, SP)

Conference Information
Committee	PRMU / SP
Conference Date	2018/6/28(2days)
Place (in Japanese)	(See Japanese page)
Place (in English)
Topics (in Japanese)	(See Japanese page)
Topics (in English)
Chair	Shinichi Sato(NII) / Yoichi Yamashita(Ritsumeikan Univ.)
Vice Chair	Yoshihisa Ijiri(Omron) / Toru Tamaki(Hiroshima Univ.) / Akinobu Ri(Nagoya Inst. of Tech.)
Secretary	Yoshihisa Ijiri(NEC) / Toru Tamaki(Osaka Univ.) / Akinobu Ri(Kyoto Univ.)
Assistant	Go Irie(NTT) / Yoshitaka Ushiku(Univ. of Tokyo) / Tomoki Koriyama(Tokyo Inst. of Tech.) / Satoshi Kobashikawa(NTT)

Paper Information
Registration To	Technical Committee on Pattern Recognition and Media Understanding / Technical Committee on Speech
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Speaker adaptation in speech synthesis based on neural networks including temporal structure modeling
Sub Title (in English)
Keyword(1)	statistical parametric speech synthesis
Keyword(2)	speaker adaptation
Keyword(3)	neural network
Keyword(4)	speaker code
1st Author's Name	Kento Nakao
1st Author's Affiliation	Nagoya Institute of Technology(NIT)
2nd Author's Name	Kei Hashimoto
2nd Author's Affiliation	Nagoya Institute of Technology(NIT)
3rd Author's Name	Keiichiro Oura
3rd Author's Affiliation	Nagoya Institute of Technology(NIT)
4th Author's Name	Yoshihiko Nankaku
4th Author's Affiliation	Nagoya Institute of Technology(NIT)
5th Author's Name	Keiichi Tokuda
5th Author's Affiliation	Nagoya Institute of Technology(NIT)
Date	2018-06-29
Paper #	PRMU2018-31,SP2018-11
Volume (vol)	vol.118
Number (no)	PRMU-111,SP-112
Page	pp.53-58(PRMU), pp.53-58(SP)
#Pages	6
Date of Issue	2018-06-21 (PRMU, SP)