Speaker-dependent conditional restricted Boltzmann machine for voice conversion (Voice Conversion, 15th Spoken Language Symposium)

Toru Nakashika; Tetsuya Takiguchi; Yasuo Ariki

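Presentation title	2013-12-19 Speaker-dependent conditional restricted Boltzmann machine for voice conversion (Voice Conversion, 15th Spoken Language Symposium) Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki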
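Abstract (translated from Japanese)	This study proposes a voice conversion method that uses a conditional restricted Boltzmann machine (CRBM) for each speaker. The aim is to form speaker-dependent spaces that, compared with the original acoustic feature space, suppress phonological content and temporal variation and emphasize speaker individuality, so that the voice quality of the source speaker's speech is more easily converted to that of the target speaker. In the proposed method, the CRBMs of the source and target speakers are first trained independently, using training data prepared for each speaker (which need not be parallel data). Next, the acoustic features of a small amount of parallel data are mapped through the respective CRBMs into speaker-dependent high-dimensional spaces (forward inference of the CRBMs), and the resulting high-order features are converted from one to the other using a neural network (NN). The features obtained from the NN conversion can be transformed back into the original acoustic features by backward inference of the CRBM. In evaluation experiments, the method achieved better accuracy, both subjectively and objectively, than conventional voice conversion methods based on GMMs, NNs, and DBNs.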
Abstract (English)	In this paper, we present a voice conversion (VC) method that utilizes conditional restricted Boltzmann machines (CRBMs) for each speaker to obtain time-invariant speaker-independent spaces where voice features are converted more easily than in the original acoustic feature space. First, we train two CRBMs for a source and a target speaker independently using speaker-dependent training data (the training data need not be parallel). Then, a small amount of parallel data is fed into each CRBM, and the high-order features produced by the CRBMs are used to train a concatenating neural network (NN) between the two CRBMs. Finally, the entire network (the two CRBMs and the NN) is fine-tuned using the acoustic parallel data. Through voice-conversion experiments, we confirmed the high performance of our method in terms of objective and subjective evaluations, comparing it with conventional GMM, NN, and speaker-dependent DBN approaches.
Keywords (Japanese)	Voice conversion (声質変換) / conditional restricted Boltzmann machine / deep learning / speaker emphasis (話者強調)
Keywords (English)	Voice conversion / conditional restricted Boltzmann machine / deep learning / speaker specific features
Report number	SP2013-88

Technical committee information
Technical committee	SP
Conference date	2013/12/12 (one-day event starting on this date)

Paper information details
Registered technical committee	Speech (SP)
Language of text	JPN
Title (Japanese)	話者依存型Conditional Restricted Boltzmann Machineによる声質変換(声質変換,第15回音声言語シンポジウム)
Title (English)	Speaker-dependent conditional restricted Boltzmann machine for voice conversion
Keyword (1) (Japanese/English)	声質変換 / Voice conversion
Keyword (2) (Japanese/English)	conditional restricted Boltzmann machine / conditional restricted Boltzmann machine
Keyword (3) (Japanese/English)	deep learning / deep learning
Keyword (4) (Japanese/English)	話者強調 / speaker specific features
1st author's name (Japanese/English)	中鹿亘 / Toru NAKASHIKA
1st author's affiliation (Japanese/English)	神戸大学大学院システム情報学研究科 / Graduate School of System Informatics, Kobe University
2nd author's name (Japanese/English)	滝口哲也 / Tetsuya TAKIGUCHI
2nd author's affiliation (Japanese/English)	神戸大学自然科学系先端融合研究環 / Organization of Advanced Science and Technology, Kobe University
3rd author's name (Japanese/English)	有木康雄 / Yasuo ARIKI
3rd author's affiliation (Japanese/English)	神戸大学自然科学系先端融合研究環 / Organization of Advanced Science and Technology, Kobe University
Presentation date	2013-12-19
Report number	SP2013-88
Volume (vol)	vol.113
Number (no)	366
Page range	pp.-
Number of pages	6