Presentation 2010-01-22
Statistical sequence-to-frame mapping techniques for voice conversion
Yu QIAO, Daisuke SAITO, Nobuaki MINEMATSU
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Voice conversion, the task of transforming one speaker's voice into another's, can be regarded as the problem of finding a mapping function between the voice spaces of two speakers. GMM-based statistical mapping methods [1], [2] have been widely used for voice conversion. However, the classical GMM-based techniques use a frame-to-frame mapping function, which largely ignores the contextual information in a speech sequence and usually over-smooths the converted speech. HMMs are well known to model the density of a whole speech sequence efficiently and have been successful in speech recognition and synthesis. Inspired by this, this paper studies how to use HMMs for voice conversion. We derive an HMM-based sequence-to-frame mapping function through statistical analysis. Unlike previous HMM-based voice conversion methods [3]-[5], which use forced alignment for segmentation and transform the frames aligned to a state with that state's linear transformation, our method uses a soft mapping function: a weighted sum of linear transformations, with the weights given by the HMM posterior probabilities of the frames. We also propose and compare two methods for learning the parameters of the mapping function, namely least-squares error estimation and maximum likelihood estimation. We carried out experiments to examine the proposed HMM-based method for voice conversion.
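The soft mapping described in the abstract (a sum of per-state linear transformations weighted by HMM state posteriors) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scalar features, one Gaussian per state, and hand-picked parameters, and the names `state_posteriors` and `convert` are hypothetical. Parameter learning (least-squares or maximum likelihood) is not shown.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian (scalar features keep the sketch short)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def state_posteriors(xs, init, trans, means, variances):
    """Scaled forward-backward: gamma[t][k] = P(state_t = k | x_1..x_T)."""
    T, K = len(xs), len(init)
    emit = [[gaussian_pdf(x, means[k], variances[k]) for k in range(K)] for x in xs]

    # Forward pass, normalising each step to avoid numerical underflow.
    alpha = []
    for t in range(T):
        if t == 0:
            row = [init[k] * emit[0][k] for k in range(K)]
        else:
            row = [emit[t][k] * sum(alpha[t - 1][j] * trans[j][k] for j in range(K))
                   for k in range(K)]
        total = sum(row)
        alpha.append([v / total for v in row])

    # Backward pass, also normalised per step.
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        row = [sum(trans[k][j] * emit[t + 1][j] * beta[t + 1][j] for j in range(K))
               for k in range(K)]
        total = sum(row)
        beta[t] = [v / total for v in row]

    # Posterior is proportional to alpha * beta; renormalise per frame.
    gamma = []
    for t in range(T):
        row = [alpha[t][k] * beta[t][k] for k in range(K)]
        total = sum(row)
        gamma.append([v / total for v in row])
    return gamma


def convert(xs, gamma, slopes, offsets):
    """Soft mapping: y_t = sum_k gamma[t][k] * (slopes[k] * x_t + offsets[k])."""
    return [sum(g * (a * x + b) for g, a, b in zip(gamma[t], slopes, offsets))
            for t, x in enumerate(xs)]


# Toy two-state example with made-up parameters (assumptions, not from the paper).
source = [0.0, 0.1, 5.0, 5.2]
gamma = state_posteriors(source, init=[0.5, 0.5],
                         trans=[[0.9, 0.1], [0.1, 0.9]],
                         means=[0.0, 5.0], variances=[1.0, 1.0])
converted = convert(source, gamma, slopes=[1.0, 0.5], offsets=[2.0, -1.0])
```

Because each posterior depends on the whole input sequence through the forward and backward passes, the output for frame t is a function of all frames, which is what distinguishes this from a frame-to-frame mapping. A hard, forced-alignment mapping corresponds to replacing each row of `gamma` with a one-hot vector.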
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Voice conversion / linear regression / sequence-to-frame mapping / HMM
Paper # CQ2009-98,PRMU2009-197,SP2009-138,MVE2009-120
Date of Issue

Conference Information
Committee CQ
Conference Date 2010/1/14 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Communication Quality (CQ)
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Statistical sequence-to-frame mapping techniques for voice conversion
Sub Title (in English)
Keyword(1) Voice conversion
Keyword(2) linear regression
Keyword(3) sequence-to-frame mapping
Keyword(4) HMM
1st Author's Name Yu QIAO
1st Author's Affiliation Grad. School of Info. Sci. and Tech., Univ. of Tokyo
2nd Author's Name Daisuke SAITO
2nd Author's Affiliation Grad. School of Engineering, Univ. of Tokyo
3rd Author's Name Nobuaki MINEMATSU
3rd Author's Affiliation Grad. School of Info. Sci. and Tech., Univ. of Tokyo
Date 2010-01-22
Paper # CQ2009-98,PRMU2009-197,SP2009-138,MVE2009-120
Volume (vol) vol.109
Number (no) 373
Page pp.-
#Pages 6
Date of Issue