Presentation 2012-11-08
Effects of speaker adaptive training on arbitrary speaker conversion based on tensor representation
Daisuke SAITO, Nobuaki MINEMATSU, Keikichi HIROSE
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In this paper, speaker adaptive training techniques are introduced to tensor-based arbitrary speaker conversion. In voice conversion (VC) studies, realizing conversion from/to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC), which is based on an eigenvoice Gaussian mixture model (EV-GMM), was proposed. Although EVC can effectively construct the conversion model for arbitrary target speakers using only a few utterances, increasing the number of utterances used to construct the conversion model does not always improve the conversion performance. This is because the EV-GMM method has an inherent problem in its representation of GMM supervectors. We previously proposed a tensor-based speaker space as a solution to this problem, and realized more flexible control of speaker characteristics. In this paper, aiming at a larger improvement in VC performance, speaker adaptive training and tensor-based speaker representation are integrated. The proposed method can construct a flexible and precise conversion model, and experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Voice conversion / Gaussian mixture model / Tucker decomposition / speaker adaptive training
Paper # SP2012-72
Date of Issue

Conference Information
Committee SP
Conference Date 2012/11/1 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Speech (SP)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Effects of speaker adaptive training on arbitrary speaker conversion based on tensor representation
Sub Title (in English)
Keyword(1) Voice conversion
Keyword(2) Gaussian mixture model
Keyword(3) Tucker decomposition
Keyword(4) speaker adaptive training
1st Author's Name Daisuke SAITO
1st Author's Affiliation Graduate School of Information Science and Technology, The University of Tokyo
2nd Author's Name Nobuaki MINEMATSU
2nd Author's Affiliation Graduate School of Engineering, The University of Tokyo
3rd Author's Name Keikichi HIROSE
3rd Author's Affiliation Graduate School of Information Science and Technology, The University of Tokyo
Date 2012-11-08
Paper # SP2012-72
Volume (vol) vol.112
Number (no) 281
Page pp.-
#Pages 6
Date of Issue