Low-delay voice conversion algorithm based on maximum likelihood estimation of spectral parameter trajectory

Takashi Muramatsu; Yamato Ohtani; Tomoki Toda; Hiroshi Saruwatari; Kiyohiro Shikano

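Presentation	2009-01-30 Low-delay voice conversion algorithm based on maximum likelihood estimation of spectral parameter trajectory. Takashi MURAMATSU, Yamato OHTANI, Tomoki TODA, Hiroshi SARUWATARI, Kiyohiro SHIKANO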
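Abstract (Japanese)	This report examines spectral conversion methods and post-processing of the converted spectra, aiming at high-quality, real-time voice conversion (VC). For VC based on Gaussian mixture models, two representative spectral conversion methods have been proposed: 1) minimum mean-square error conversion, which converts frame by frame, and 2) maximum likelihood spectral trajectory conversion, which converts the whole sequence of an utterance simultaneously. The former allows real-time conversion but can produce unnatural spectral transitions. The latter achieves higher conversion performance by taking the dynamic features of the spectrum into account, but it cannot run in real time. To realize real-time conversion that considers dynamic features, we apply a time-recursive algorithm to maximum likelihood spectral trajectory conversion. In addition, statistical VC suffers from excessive smoothing of the spectrum caused by the statistical processing. A maximum likelihood feature conversion method that considers the global variance (GV) of the spectral sequence has been proposed to address this, but it is difficult to apply to low-delay conversion. This report proposes a method that applies a GV-aware post-filter to low-delay processing. Experimental evaluation shows the effectiveness of the proposed methods.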
Abstract (English)	In this paper, we examine a spectral conversion method and post-processing of the converted spectra, aiming at high-quality, real-time voice conversion (VC). Two typical spectral conversion processes have been proposed for VC: 1) frame-based conversion, which converts spectral parameters frame by frame, and 2) trajectory-based conversion, which converts all spectral parameters over an utterance simultaneously. The former is capable of real-time conversion but sometimes causes inappropriate spectral movements. The latter yields converted spectral parameters with proper dynamic characteristics but is not capable of real-time conversion. To realize real-time conversion that considers spectral dynamic characteristics, we propose a time-recursive conversion algorithm based on maximum likelihood estimation of the spectral parameter trajectory. In addition, the converted trajectory is often excessively smoothed by the statistical processing. Although a maximum likelihood feature conversion method that considers global variance (GV) has been proposed, it is difficult to apply to low-delay conversion. We therefore propose a technique that uses a post-filter considering GV. Experimental results show that the proposed methods are effective.
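Keywords (Japanese)	speech synthesis / voice conversion / Gaussian mixture model / maximum likelihood spectral trajectory conversion / low-delay processing
Keywords (English)	speech synthesis / voice conversion / Gaussian mixture model / maximum likelihood estimation / low-delay processing
Report number	SP2008-141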

Technical committee information
Technical committee	SP
Dates	2009/1/22 (one-day meeting)

Paper details
Technical committee applied to	Speech (SP)
Language of text	JPN
Title (Japanese)	スペクトル系列の最尤推定に基づく短遅延声質変換法
Title (English)	Low-delay voice conversion algorithm based on maximum likelihood estimation of spectral parameter trajectory
Keyword (1) (Japanese/English)	音声合成 / speech synthesis
Keyword (2) (Japanese/English)	声質変換 / voice conversion
Keyword (3) (Japanese/English)	混合正規分布モデル / Gaussian mixture model
Keyword (4) (Japanese/English)	最尤スペクトル系列変換 / maximum likelihood estimation
Keyword (5) (Japanese/English)	短遅延処理 / low-delay processing
Author 1 name (Japanese/English)	村松敬司 / Takashi MURAMATSU
Author 1 affiliation (Japanese/English)	奈良先端科学技術大学院大学 / Nara Institute of Science and Technology
Author 2 name (Japanese/English)	大谷大和 / Yamato OHTANI
Author 2 affiliation (Japanese/English)	奈良先端科学技術大学院大学 / Nara Institute of Science and Technology
Author 3 name (Japanese/English)	戸田智基 / Tomoki TODA
Author 3 affiliation (Japanese/English)	奈良先端科学技術大学院大学 / Nara Institute of Science and Technology
Author 4 name (Japanese/English)	猿渡洋 / Hiroshi SARUWATARI
Author 4 affiliation (Japanese/English)	奈良先端科学技術大学院大学 / Nara Institute of Science and Technology
Author 5 name (Japanese/English)	鹿野清宏 / Kiyohiro SHIKANO
Author 5 affiliation (Japanese/English)	奈良先端科学技術大学院大学 / Nara Institute of Science and Technology
Presentation date	2009-01-30
Report number	SP2008-141
Volume (vol)	vol.108
Issue number (no)	422
Page range	pp.-
Number of pages	6