Presentation 2022-06-17
Study of End-to-End Text-to-Speech that can seamlessly control speaker's individuality by Manipulating Speaker features
Naoki Aotani, Sunao Hara, Masanobu Abe
Abstract(in English) In this paper, we investigate an End-to-End speech synthesis scheme that enables seamless control of speaker individuality. As auxiliary information, it uses speaker features that represent a speaker's characteristic voice quality and average voice pitch. Speaker individuality is controlled as follows: speaker features are extracted from two speakers and combined by weighted addition to generate a control feature vector. In this paper, "seamless control of speaker individuality" refers to a gradual change in speaker individuality as the weights are adjusted. The speaker features used in this study are vectors that combine two types of information. The first is the x-vector, which conventional methods use to control voice quality. The second is the mean fundamental frequency (F0) of the target voice, which controls the average voice pitch. The x-vectors are extracted from speech that has been modified to share the same mean F0, which minimizes the interaction between the x-vectors and F0. The evaluation experiments showed that speaker individuality can be seamlessly controlled by gradually changing the weights for female-female and male-male speaker pairs, but not for female-male speaker pairs.
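The feature manipulation summarized in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, x-vectors are taken as precomputed inputs, F0 contours are lists of Hz values with 0.0 marking unvoiced frames, the mean F0 is appended in Hz (the paper may use another scale), and the two weights are assumed to be (1 - w, w).

```python
# Hypothetical sketch of the speaker-feature manipulation described in the abstract.
# Assumptions (not from the paper): F0 contours are lists of Hz values with 0.0
# for unvoiced frames, x-vectors come from an external extractor, the mean F0 is
# appended in Hz, and the blending weights are (1 - w, w).
from statistics import mean
from typing import List, Sequence


def mean_f0(f0_contour: Sequence[float]) -> float:
    """Average F0 in Hz over voiced frames (frames with F0 > 0)."""
    voiced = [f for f in f0_contour if f > 0.0]
    if not voiced:
        raise ValueError("F0 contour contains no voiced frames")
    return mean(voiced)


def normalize_f0_contour(f0_contour: Sequence[float], common_mean_f0: float) -> List[float]:
    """Scale voiced frames so that the contour's mean F0 equals common_mean_f0.

    Stands in for modifying each speaker's speech to a shared mean F0 before
    x-vector extraction; resynthesizing audio from the new contour is out of scope.
    """
    ratio = common_mean_f0 / mean_f0(f0_contour)
    return [f * ratio if f > 0.0 else 0.0 for f in f0_contour]


def speaker_feature(x_vector: Sequence[float], avg_f0: float) -> List[float]:
    """Concatenate the x-vector (voice quality) with the mean F0 (average pitch)."""
    return [*x_vector, avg_f0]


def blend_features(feat_a: Sequence[float], feat_b: Sequence[float], w: float) -> List[float]:
    """Weighted addition of two speakers' features: w = 0 gives A, w = 1 gives B."""
    if len(feat_a) != len(feat_b):
        raise ValueError("speaker features must have the same dimension")
    if not 0.0 <= w <= 1.0:
        raise ValueError("weight w must lie in [0, 1]")
    return [(1.0 - w) * a + w * b for a, b in zip(feat_a, feat_b)]
```

In use, one would call `speaker_feature` for each speaker (with x-vectors extracted from the F0-normalized speech) and then sweep `w` from 0 to 1 with `blend_features` to obtain a sequence of control vectors for the TTS model. Linear blending keeps both parts of the vector on a single weight, so voice quality and average pitch move together; treating the two parts with separate weights would be a different design.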
Keyword(in English) Speech synthesis / End-to-End / Deep Neural Network / speaker’s individuality / Voice Conversion
Paper # SP2022-14
Date of Issue 2022-06-10 (SP)

Conference Information
Committee SP / IPSJ-MUS / IPSJ-SLP
Conference Date 2022-06-17 (2 days)
Place (in English) Online
Topics (in English)
Chair Tomoki Toda (Nagoya Univ.)
Vice Chair
Secretary (NTT) / (Univ. of Electro-Comm.)
Assistant Ryo Aihara (Mitsubishi Electric) / Daisuke Saito (Univ. of Tokyo)

Paper Information
Registration To Technical Committee on Speech / Special Interest Group on Music and Computer / Special Interest Group on Spoken Language Processing
Language JPN
Title (in English) Study of End-to-End Text-to-Speech that can seamlessly control speaker's individuality by Manipulating Speaker features
Sub Title (in English)
Keyword(1) Speech synthesis
Keyword(2) End-to-End
Keyword(3) Deep Neural Network
Keyword(4) speaker’s individuality
Keyword(5) Voice Conversion
1st Author's Name Naoki Aotani
1st Author's Affiliation Okayama University (Okayama Univ.)
2nd Author's Name Sunao Hara
2nd Author's Affiliation Okayama University (Okayama Univ.)
3rd Author's Name Masanobu Abe
3rd Author's Affiliation Okayama University (Okayama Univ.)
Date 2022-06-17
Volume (vol) vol.122
Number (no) SP-81
Page pp. 55-60 (SP)
#Pages 6