Study of End-to-End Text-to-Speech that can seamlessly control speaker's individuality by Manipulating Speaker features

Naoki Aotani; Sunao Hara; Masanobu Abe

Presentation	2022-06-17 Study of End-to-End Text-to-Speech that can seamlessly control speaker's individuality by Manipulating Speaker features Naoki Aotani, Sunao Hara, Masanobu Abe
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	In this paper, we investigate an End-to-End speech synthesis scheme that enables seamless control of speaker individuality by using, as auxiliary information, speaker features that represent a speaker’s specific voice quality and average pitch. Speaker individuality is controlled as follows: speaker features are extracted from two speakers and then combined by weighted addition to generate a control feature vector. In this paper, seamless control of speaker individuality refers to gradually changing speaker individuality by adjusting the weights. The speaker features used in this study are vectors that combine two types of information. The first is the x-vector, which is used in the conventional method to control voice quality. The second is the mean fundamental frequency (F0) of the target voice, which controls the average pitch. Here, x-vectors are extracted from speech that has been modified to have the same mean F0 value, which minimizes interactions between the x-vectors and the fundamental frequency. Evaluation experiments showed that speaker individuality can be seamlessly controlled by gradually changing the weights for female-female and male-male speaker pairs, but not for female-male speaker pairs.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	Speech synthesis / End-to-End / Deep Neural Network / speaker’s individuality / Voice Conversion
Paper #	SP2022-14
Date of Issue	2022-06-10 (SP)
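
The weighted combination of speaker features described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy 3-dimensional x-vectors, the F0 values in Hz, and the use of a single weight with complementary coefficients (1 - w and w) are all assumptions made for illustration. The abstract does not state how the features are scaled or how the weights are constrained.

```python
from typing import List, Sequence


def make_speaker_feature(xvector: Sequence[float], mean_f0: float) -> List[float]:
    """Build a speaker feature by appending the mean F0 to an x-vector.

    Hypothetical layout: the abstract only says the two kinds of
    information are combined into one vector.
    """
    return [float(v) for v in xvector] + [float(mean_f0)]


def blend_speaker_features(
    feat_a: Sequence[float], feat_b: Sequence[float], weight: float
) -> List[float]:
    """Weighted sum of two speaker features.

    weight = 0.0 returns speaker A, weight = 1.0 returns speaker B
    (assumed convention; intermediate weights give intermediate vectors).
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("speaker features must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return [(1.0 - weight) * a + weight * b for a, b in zip(feat_a, feat_b)]


if __name__ == "__main__":
    # Toy values for illustration only; real x-vectors are much longer.
    speaker_a = make_speaker_feature([0.2, -0.1, 0.5], mean_f0=220.0)
    speaker_b = make_speaker_feature([-0.3, 0.4, 0.1], mean_f0=180.0)
    for w in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(w, blend_speaker_features(speaker_a, speaker_b, w))
```

Sweeping the weight from 0 to 1 in small steps yields a sequence of control feature vectors that moves gradually from one speaker to the other, which corresponds to what the abstract calls seamless control.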

Conference Information
Committee	SP / IPSJ-MUS / IPSJ-SLP
Conference Date	2022/6/17(2days)
Place (in Japanese)	(See Japanese page)
Place (in English)	Online
Topics (in Japanese)	(See Japanese page)
Topics (in English)
Chair	Tomoki Toda(Nagoya Univ.)
Vice Chair
Secretary	(NTT) / (Univ. of Electro-Comm.)
Assistant	Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo)

Paper Information
Registration To	Technical Committee on Speech / Special Interest Group on Music and Computer / Special Interest Group on Spoken Language Processing
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Study of End-to-End Text-to-Speech that can seamlessly control speaker's individuality by Manipulating Speaker features
Sub Title (in English)
Keyword(1)	Speech synthesis
Keyword(2)	End-to-End
Keyword(3)	Deep Neural Network
Keyword(4)	speaker’s individuality
Keyword(5)	Voice Conversion
1st Author's Name	Naoki Aotani
1st Author's Affiliation	Okayama University(Okayama Univ)
2nd Author's Name	Sunao Hara
2nd Author's Affiliation	Okayama University(Okayama Univ)
3rd Author's Name	Masanobu Abe
3rd Author's Affiliation	Okayama University(Okayama Univ)
Date	2022-06-17
Paper #	SP2022-14
Volume (vol)	vol.122
Number (no)	SP-81
Page	pp.55-60(SP)
#Pages	6
Date of Issue	2022-06-10 (SP)