Performance Evaluation of Voice Conversion Based on F0 Quantization and Non-parallel Training (Theme Session, Cross-modal)

Yuhei Ota; Takashi Nose; Takao Kobayashi

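Talk title	2010-01-21 Performance Evaluation of Voice Conversion Based on F0 Quantization and Non-parallel Training (Theme Session, Cross-modal) Yuhei Ota, Takashi Nose, Takao Kobayashi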
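Abstract (Japanese)	We report the results of objective and subjective evaluation experiments, centered on a comparison with a GMM-based voice conversion method, conducted to demonstrate the effectiveness of a voice conversion method based on context-dependent HMMs. In this method, phonetic and prosodic information is extracted from the input speech of a source speaker, and voice conversion is achieved by generating speech from a pre-trained acoustic model of the target speaker based on this information. To model pitch information appropriately, the method uses as context the symbols obtained by coarsely quantizing the F0 values of the training data themselves, rather than the accent information based on manual labeling that has conventionally been used in HMM-based speech synthesis; this makes automatic labeling of the training data possible. Furthermore, whereas previously proposed GMM-based voice conversion methods had difficulty appropriately converting the speaker individuality contained in acoustic features at the phoneme level or spanning multiple phonemes, the HMM-based method can also convert such segmental and suprasegmental features by using context-dependent models related to phonetic and prosodic information. The evaluation showed that the HMM-based method greatly improves naturalness over the conventional method and also yields better results in converting speaker individuality.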
Abstract (English)	This paper reports performance evaluation results for a context-dependent HMM-based voice conversion technique and shows its effectiveness by comparing it with a GMM-based one. In the HMM-based conversion, we first extract the phonetic and prosodic information from the input speech of a source speaker. Converted synthetic speech is then generated from the pre-trained acoustic model of a target speaker. To model the pitch information appropriately, we use a roughly quantized F0 symbol sequence as the prosodic context instead of accent information obtained by manually labeling the training data. By using phonetically and prosodically context-dependent HMMs, the speaker characteristics appearing in segmental and supra-segmental features can also be converted, which is difficult with conventional GMM-based techniques. Objective and subjective experimental results show that the naturalness and speaker individuality of converted speech are significantly improved by using HMM-based voice conversion.
Keywords	voice conversion / HMM-based speech synthesis / prosodic information / F0 quantization / GMM
Report number	CQ2009-60,PRMU2009-159,SP2009-100,MVE2009-82
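
The abstract says that coarsely quantized F0 values replace manually labeled accent information as the prosodic context, but this record does not give the quantization details. The following is only a minimal sketch of the general idea, assuming one representative F0 value per segment, uniform quantization on a log-frequency scale, and a separate symbol for unvoiced segments. The function name `quantize_f0`, the number of levels, the F0 range, and the symbol names (`U`, `Q0`, `Q1`, ...) are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Optional, Sequence


def quantize_f0(
    segment_f0_hz: Sequence[Optional[float]],
    num_levels: int = 4,        # assumed number of coarse levels
    f0_min_hz: float = 80.0,    # assumed lower bound of the F0 range
    f0_max_hz: float = 400.0,   # assumed upper bound of the F0 range
) -> List[str]:
    """Map one representative F0 value per segment to a coarse symbol.

    None or non-positive values are treated as unvoiced and labeled "U".
    Voiced values are clipped to [f0_min_hz, f0_max_hz] and quantized
    uniformly on a log-frequency scale into "Q0" .. "Q{num_levels - 1}".
    """
    if num_levels < 1:
        raise ValueError("num_levels must be at least 1")
    if not 0.0 < f0_min_hz < f0_max_hz:
        raise ValueError("require 0 < f0_min_hz < f0_max_hz")

    log_min = math.log(f0_min_hz)
    step = (math.log(f0_max_hz) - log_min) / num_levels

    symbols: List[str] = []
    for f0 in segment_f0_hz:
        if f0 is None or f0 <= 0.0:
            symbols.append("U")  # unvoiced segment
            continue
        clipped = min(max(f0, f0_min_hz), f0_max_hz)
        level = int((math.log(clipped) - log_min) / step)
        level = min(level, num_levels - 1)  # keep the upper bound in the top bin
        symbols.append(f"Q{level}")
    return symbols
```

Because the symbols are computed directly from measured F0, a sequence such as `quantize_f0([None, 150.0, 200.0])` can be produced for any utterance without manual annotation, which is the property the abstract highlights for automatic labeling of training data.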

Technical committee information
Technical committee	PRMU
Dates held	2010/1/14 (held for 1 day)

Paper details
Submitting committee	Pattern Recognition and Media Understanding (PRMU)
Language of text	JPN
Title (Japanese)	F0量子化と非パラレル学習に基づく声質変換の評価(テーマセッション,クロスモーダル)
Title (English)	Performance Evaluation of Voice Conversion Based on F0 Quantization and Non-parallel Training
Keyword (1)	voice conversion
Keyword (2)	HMM-based speech synthesis
Keyword (3)	prosodic information
Keyword (4)	F0 quantization
Keyword (5)	GMM
First author name (Japanese/English)	太田悠平 / Yuhei OTA
First author affiliation (Japanese/English)	東京工業大学大学院総合理工学研究科 / Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
Second author name (Japanese/English)	能勢隆 / Takashi NOSE
Second author affiliation (Japanese/English)	東京工業大学大学院総合理工学研究科 / Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
Third author name (Japanese/English)	小林隆夫 / Takao KOBAYASHI
Third author affiliation (Japanese/English)	東京工業大学大学院総合理工学研究科 / Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
Presentation date	2010-01-21
Report number	CQ2009-60,PRMU2009-159,SP2009-100,MVE2009-82
Volume (vol)	vol.109
Issue number (no)	374
Page range	pp.-
Number of pages	6