Presentation | 2010-01-22 Statistical sequence-to-frame mapping techniques for voice conversion Yu QIAO, Daisuke SAITO, Nobuaki MINEMATSU |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Voice conversion, the task of transforming one speaker's voice into another's, can be regarded as the problem of finding a mapping function between the voice spaces of two speakers. GMM-based statistical mapping methods [1], [2] have been widely used for voice conversion. However, classical GMM-based techniques use a frame-to-frame mapping function, which largely ignores the contextual information spread over a speech sequence and usually over-smooths the converted speech. HMMs are well known to model the density of a whole speech sequence efficiently and have been successful in speech recognition and synthesis. Inspired by this fact, this paper studies how to use HMMs for voice conversion. We derive an HMM-based sequence-to-frame mapping function through statistical analysis. Unlike previous HMM-based voice conversion methods [3]-[5], which use forced alignment for segmentation and transform the frames aligned to a state with that state's linear transformation, our method uses a soft mapping function: a weighted sum of linear transformations, with the weights given by the HMM posterior probabilities of the frames. We also propose and compare two methods for learning the parameters of the mapping function, namely least-squares error estimation and maximum likelihood estimation. We carried out experiments to evaluate the proposed HMM-based method for voice conversion. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Voice conversion / linear regression / sequence-to-frame mapping / HMM |
Paper # | CQ2009-98,PRMU2009-197,SP2009-138,MVE2009-120 |
Date of Issue |
Conference Information | |
Committee | CQ |
---|---|
Conference Date | 2010/1/14 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Communication Quality (CQ) |
---|---|
Language | ENG |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Statistical sequence-to-frame mapping techniques for voice conversion |
Sub Title (in English) | |
Keyword(1) | Voice conversion |
Keyword(2) | linear regression |
Keyword(3) | sequence-to-frame mapping |
Keyword(4) | HMM |
1st Author's Name | Yu QIAO |
1st Author's Affiliation | Grad. School of Info. Sci. and Tech., Univ. of Tokyo |
2nd Author's Name | Daisuke SAITO |
2nd Author's Affiliation | Grad. School of Engineering, Univ. of Tokyo |
3rd Author's Name | Nobuaki MINEMATSU |
3rd Author's Affiliation | Grad. School of Info. Sci. and Tech., Univ. of Tokyo |
Date | 2010-01-22 |
Volume (vol) | vol.109 |
Number (no) | 373 |
Page | pp.- |
#Pages | 6 |
Date of Issue |
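
The soft mapping described in the abstract (a weighted sum of per-state linear transformations, weighted by HMM state posteriors) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the diagonal-Gaussian emissions, the function names (`state_posteriors`, `convert`, `fit_least_squares`), the toy dimensions, and the synthetic data. The least-squares step simply exploits the fact that the mapping is linear in its parameters once the posteriors are fixed; the paper's exact estimators may differ.

```python
import numpy as np

def state_posteriors(X, pi, trans, means, variances):
    """Scaled forward-backward: gamma[t, k] = P(state_t = k | x_1..x_T).
    Assumes diagonal-Gaussian emissions (an illustrative choice)."""
    diff = X[:, None, :] - means[None, :, :]                      # (T, K, D)
    log_emis = -0.5 * (np.sum(diff**2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    # Per-frame rescaling avoids underflow and cancels in the normalized gamma.
    emis = np.exp(log_emis - log_emis.max(axis=1, keepdims=True))
    T, K = emis.shape
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    alpha[0] = pi * emis[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emis[t]
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (emis[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

def convert(X, gamma, A, bias):
    """Soft mapping: y_t = sum_k gamma[t, k] * (A[k] @ x_t + bias[k])."""
    per_state = np.einsum('kij,tj->tki', A, X) + bias[None]      # (T, K, Dy)
    return np.einsum('tk,tki->ti', gamma, per_state)

def fit_least_squares(X, Y, gamma):
    """With gamma fixed, the mapping is linear in (A, bias), so ordinary
    least squares on posterior-weighted features fits the parameters."""
    T, Dx = X.shape
    K = gamma.shape[1]
    X1 = np.hstack([X, np.ones((T, 1))])                          # append bias term
    Z = (gamma[:, :, None] * X1[:, None, :]).reshape(T, K * (Dx + 1))
    W, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    W = W.reshape(K, Dx + 1, -1)
    A = np.transpose(W[:, :Dx, :], (0, 2, 1))                     # (K, Dy, Dx)
    bias = W[:, Dx, :]                                            # (K, Dy)
    return A, bias

# Toy usage on synthetic data (all values hypothetical).
rng = np.random.default_rng(0)
T, K, Dx, Dy = 200, 3, 4, 4
pi = np.full(K, 1.0 / K)
trans = np.array([[0.90, 0.05, 0.05],
                  [0.05, 0.90, 0.05],
                  [0.05, 0.05, 0.90]])
means = 3.0 * rng.normal(size=(K, Dx))
variances = np.ones((K, Dx))
X = rng.normal(size=(T, Dx)) + means[rng.integers(K, size=T)]
gamma = state_posteriors(X, pi, trans, means, variances)
A_true = rng.normal(size=(K, Dy, Dx))
bias_true = rng.normal(size=(K, Dy))
Y = convert(X, gamma, A_true, bias_true)       # synthetic "target speaker" frames
A_fit, bias_fit = fit_least_squares(X, Y, gamma)
Y_hat = convert(X, gamma, A_fit, bias_fit)
```

Because each posterior depends on the whole input sequence, every converted frame is a function of the full sequence, which is one way to read "sequence-to-frame". A hard-alignment variant would correspond to replacing each row of `gamma` with a one-hot vector.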