Voice conversion based on statistical sequence-to-frame mapping (General session, Cross-modal)

喬 宇 (Yu QIAO); 齋藤 大輔 (Daisuke SAITO); 峯松 信明 (Nobuaki MINEMATSU)

Presentation	2010-01-22, "Statistical sequence-to-frame mapping techniques for voice conversion", Yu QIAO, Daisuke SAITO, Nobuaki MINEMATSU
Abstract(in English)	Voice conversion, the task of transforming one speaker's voice into another's, can be regarded as the problem of finding a mapping function between the voice spaces of two speakers. GMM-based statistical mapping methods [1], [2] have been widely used for voice conversion. However, the classical GMM-based techniques use a frame-to-frame mapping function, which largely ignores the contextual information present across a speech sequence and usually causes over-smoothing of the converted speech. HMMs are well known to provide an efficient way to model the density of a whole speech sequence and have been used successfully in speech recognition and synthesis. Inspired by this, this paper studies how to use HMMs for voice conversion. We derive an HMM-based sequence-to-frame mapping function through statistical analysis. Unlike previous HMM-based voice conversion methods [3]–[5], which use forced alignment for segmentation and transform the frames aligned to each state with that state's associated linear transformation, our method uses a soft mapping function: a weighted summation of linear transformations, where the weights are the HMM posterior probabilities of the frames. We also propose and compare two methods for learning the parameters of the mapping function, namely least-squares error estimation and maximum likelihood estimation. We carried out experiments to evaluate the proposed HMM-based method for voice conversion.
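The soft mapping described in the abstract can be illustrated with a minimal sketch. It assumes a Gaussian HMM with diagonal covariances whose parameters (`log_pi`, `log_A`, `means`, `variances`) have already been trained on the source speaker, and parallel source/target frames that are already time-aligned (for example by DTW); none of these details are stated in the abstract. Each converted frame is computed as y_t = Σ_j γ_t(j)(A_j x_t + b_j), where γ_t(j) is the posterior of state j given the whole source sequence, obtained by the forward-backward algorithm. Only least-squares error estimation is shown; the maximum likelihood variant is omitted. All function names are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along `axis`; tolerates all -inf slices."""
    m = np.max(a, axis=axis, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)
    with np.errstate(divide="ignore"):
        return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def log_gaussian_diag(X, means, variances):
    """(T, J) log-likelihoods of frames X (T, D) under J diagonal Gaussians (J, D)."""
    diff = X[:, None, :] - means[None, :, :]  # (T, J, D)
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))


def state_posteriors(X, log_pi, log_A, means, variances):
    """Forward-backward in the log domain: gamma[t, j] = P(q_t = j | x_1..x_T)."""
    log_b = log_gaussian_diag(X, means, variances)
    T, J = log_b.shape
    log_alpha = np.empty((T, J))
    log_beta = np.zeros((T, J))
    log_alpha[0] = log_pi + log_b[0]
    for t in range(1, T):
        # alpha_t(j) = sum_i alpha_{t-1}(i) A[i, j] * b_j(x_t)
        log_alpha[t] = logsumexp(log_alpha[t - 1][:, None] + log_A, axis=0) + log_b[t]
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j A[i, j] * b_j(x_{t+1}) * beta_{t+1}(j)
        log_beta[t] = logsumexp(log_A + (log_b[t + 1] + log_beta[t + 1])[None, :], axis=1)
    log_gamma = log_alpha + log_beta
    log_gamma -= logsumexp(log_gamma, axis=1)[:, None]
    return np.exp(log_gamma)


def soft_features(X, gamma):
    """phi_t = gamma_t (x) [x_t; 1], so phi_t @ Theta = sum_j gamma_tj (A_j x_t + b_j)."""
    Z = np.hstack([X, np.ones((X.shape[0], 1))])  # (T, D+1)
    return (gamma[:, :, None] * Z[:, None, :]).reshape(X.shape[0], -1)  # (T, J*(D+1))


def fit_lse(X, Y, gamma):
    """Least-squares estimate of the stacked [A_j, b_j]^T blocks, shape (J*(D+1), D_y)."""
    Theta, *_ = np.linalg.lstsq(soft_features(X, gamma), Y, rcond=None)
    return Theta


def convert(X, gamma, Theta):
    """Apply the soft mapping: posterior-weighted sum of per-state linear transforms."""
    return soft_features(X, gamma) @ Theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    J, D = 2, 2
    log_pi = np.log(np.full(J, 1.0 / J))
    log_A = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
    means = np.array([[-2.0, 0.0], [2.0, 0.0]])
    variances = np.ones((J, D))
    X = rng.normal(size=(200, D)) + means[rng.integers(0, J, size=200)]
    gamma = state_posteriors(X, log_pi, log_A, means, variances)
    Theta_true = rng.normal(size=(J * (D + 1), D))
    Y = convert(X, gamma, Theta_true)  # synthetic target that the model can fit exactly
    Theta = fit_lse(X, Y, gamma)
    print(np.abs(convert(X, gamma, Theta) - Y).max())  # near zero for this synthetic case
```

Because γ_t is computed from the entire source sequence by the forward and backward recursions, each converted frame depends on its context, unlike a frame-wise GMM mapping. With a single-state HMM, γ_t = 1 everywhere and the sketch reduces to one global linear regression.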
Keyword(in English)	Voice conversion / linear regression / sequence-to-frame mapping / HMM
Paper #	CQ2009-98,PRMU2009-197,SP2009-138,MVE2009-120

Conference Information
Committee	CQ
Conference Date	2010/1/14 (1 day)

Paper Information
Registration To	Communication Quality (CQ)
Language	ENG
Title (in English)	Statistical sequence-to-frame mapping techniques for voice conversion
Keyword(1)	Voice conversion
Keyword(2)	linear regression
Keyword(3)	sequence-to-frame mapping
Keyword(4)	HMM
1st Author's Name	Yu QIAO
1st Author's Affiliation	Grad. School of Info. Sci. and Tech., Univ. of Tokyo
2nd Author's Name	Daisuke SAITO
2nd Author's Affiliation	Grad. School of Engineering, Univ. of Tokyo
3rd Author's Name	Nobuaki MINEMATSU
3rd Author's Affiliation	Grad. School of Info. Sci. and Tech., Univ. of Tokyo
Date	2010-01-22
Volume (vol)	vol.109
Number (no)	373
#Pages	6