Presentation | 2022-06-17 Study of End-to-End Text-to-Speech that can seamlessly control speaker's individuality by Manipulating Speaker features Naoki Aotani, Sunao Hara, Masanobu Abe |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | In this paper, we investigate an End-to-End speech synthesis scheme that enables seamless control of speaker individuality, using speaker features that represent a speaker’s specific voice quality and average voice height as auxiliary information. Speaker individuality is controlled as follows: speaker features are extracted from two speakers and then added together with weights to generate a control feature vector. In this paper, seamless control of speaker individuality refers to the gradual change in speaker individuality obtained by adjusting the weights. The speaker features used in this study are vectors obtained by combining two types of information. The first is the x-vector, which is used in the conventional method to control voice quality. The second is the average fundamental frequency (F0) of the target voice, which controls the average voice height. Here, the x-vectors are extracted from speech that has been modified to have the same mean F0 value, which minimizes interactions between the x-vectors and the fundamental frequency. The evaluation experiments showed that speaker individuality can be seamlessly controlled by gradually changing the weights for female-female and male-male speaker pairs. However, this did not hold for female-male speaker pairs. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Speech synthesis / End-to-End / Deep Neural Network / speaker’s individuality / Voice Conversion |
Paper # | SP2022-14 |
Date of Issue | 2022-06-10 (SP) |
Conference Information | |
Committee | SP / IPSJ-MUS / IPSJ-SLP |
Conference Date | 2022/6/17(2days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Online |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | Tomoki Toda(Nagoya Univ.) |
Vice Chair | |
Secretary | (NTT) / (Univ. of Electro-Comm.) |
Assistant | Ryo Aihara(Mitsubishi Electric) / Daisuke Saito(Univ. of Tokyo) |
Paper Information | |
Registration To | Technical Committee on Speech / Special Interest Group on Music and Computer / Special Interest Group on Spoken Language Processing |
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Study of End-to-End Text-to-Speech that can seamlessly control speaker's individuality by Manipulating Speaker features |
Sub Title (in English) | |
Keyword(1) | Speech synthesis |
Keyword(2) | End-to-End |
Keyword(3) | Deep Neural Network |
Keyword(4) | speaker’s individuality |
Keyword(5) | Voice Conversion |
1st Author's Name | Naoki Aotani |
1st Author's Affiliation | Okayama University(Okayama Univ) |
2nd Author's Name | Sunao Hara |
2nd Author's Affiliation | Okayama University(Okayama Univ) |
3rd Author's Name | Masanobu Abe |
3rd Author's Affiliation | Okayama University(Okayama Univ) |
Date | 2022-06-17 |
Paper # | SP2022-14 |
Volume (vol) | vol.122 |
Number (no) | SP-81 |
Page | pp.55-60(SP) |
#Pages | 6 |
Date of Issue | 2022-06-10 (SP) |
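
The weighted combination of speaker features described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 3-dimensional toy x-vectors, the use of raw F0 in Hz, and the choice of weights that sum to one are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence


def build_speaker_feature(x_vector: Sequence[float], mean_f0: float) -> List[float]:
    """Append the speaker's mean F0 to the x-vector as one extra dimension.

    Hypothetical helper: the abstract only says the two kinds of
    information are combined into one vector.
    """
    return list(x_vector) + [float(mean_f0)]


def interpolate_speaker_features(
    feat_a: Sequence[float], feat_b: Sequence[float], weight: float
) -> List[float]:
    """Return weight * feat_a + (1 - weight) * feat_b, element by element.

    Assumption: the two weights sum to one, so weight=1.0 reproduces
    speaker A and weight=0.0 reproduces speaker B.
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("speaker features must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(feat_a, feat_b)]


# Toy values only; real x-vectors have far more than three dimensions.
speaker_a = build_speaker_feature([1.0, 0.0, 2.0], mean_f0=200.0)
speaker_b = build_speaker_feature([0.0, 4.0, 2.0], mean_f0=100.0)

# Sweeping the weight moves the control vector gradually from B to A.
for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(w, interpolate_speaker_features(speaker_a, speaker_b, w))
```

In a system of the kind the abstract describes, the resulting control vector would be passed to the synthesizer as auxiliary input in place of a single speaker's feature vector.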