Neural speech synthesis using local phrase dependency structure information

Nobuyoshi Kaiki; Sakriani Sakti; Satoshi Nakamura

Presentation	2021-06-19 Neural speech synthesis using local phrase dependency structure information, Nobuyoshi Kaiki (NAIST), Sakriani Sakti (NAIST), Satoshi Nakamura (NAIST)
Abstract	To synthesize Japanese speech with natural prosody, we propose introducing into end-to-end text-to-speech synthesis (TTS) new prosodic symbols that represent phrase components based on local phrase dependency structure. In this paper, we propose two models for representing phrase components: 1) a model that adds prosodic symbols indicating the dependency depth at phrase boundaries, and 2) a model that adopts prosodic symbols reflecting a superposition model of phrase and accent components, based on a prosody generation control mechanism. In speech synthesized with these two models, we observed at right-branching boundaries that 1) pauses marking the phrase boundary are generated, and 2) the F0 phrase component is reset. To verify the effect of the two proposed models against a conventional model whose prosodic symbols represent only accent components, we conducted paired-comparison (AB) listening tests. The results confirmed that incorporating local phrase boundary information and a prosody generation model into Japanese end-to-end TTS yields synthetic speech with natural prosody that more accurately reflects the speaker's intention.
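The two models in the abstract can be illustrated with a small sketch. It is not the authors' implementation: this record does not give the exact prosodic symbol inventory, so the depth definition (the number of dependency arcs from the left that skip over the next bunsetsu), the `#<depth>` symbol format, the rule that a boundary is right-branching when a bunsetsu does not modify its immediate neighbour, and all function names are hypothetical. The F0 part uses the standard Fujisaki generation-process formulation with commonly used constants (alpha = 3, beta = 20, gamma = 0.9) as one concrete example of a superposition model of phrase and accent components; whether the paper's symbols follow exactly this formulation is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def boundary_depths(heads: Sequence[int]) -> List[int]:
    """Hypothetical dependency depth at each bunsetsu boundary.

    heads[i] is the index of the bunsetsu that bunsetsu i modifies
    (Japanese dependencies point rightward; the final bunsetsu uses -1).
    Assumed definition: the depth at boundary i|i+1 is the number of arcs
    from some bunsetsu j <= i that skip over bunsetsu i+1.
    """
    for i, h in enumerate(heads[:-1]):
        if h <= i:
            raise ValueError(f"bunsetsu {i} must modify a later bunsetsu")
    return [
        sum(1 for j in range(i + 1) if heads[j] > i + 1)
        for i in range(len(heads) - 1)
    ]


def right_branching_boundaries(heads: Sequence[int]) -> List[int]:
    """Boundaries i|i+1 where bunsetsu i does not modify its neighbour (assumed rule)."""
    return [i for i in range(len(heads) - 1) if heads[i] != i + 1]


def insert_depth_symbols(bunsetsu: Sequence[str], heads: Sequence[int]) -> str:
    """Interleave a hypothetical '#<depth>' prosodic symbol at each boundary."""
    out: List[str] = []
    for i, depth in enumerate(boundary_depths(heads)):
        out += [bunsetsu[i], f"#{depth}"]
    out.append(bunsetsu[-1])
    return " ".join(out)


def log_f0(t: float, fb: float,
           phrase: Sequence[Tuple[float, float]],
           accent: Sequence[Tuple[float, float, float]],
           alpha: float = 3.0, beta: float = 20.0, gamma: float = 0.9) -> float:
    """ln F0(t) under the standard Fujisaki superposition model (assumed here).

    phrase: (onset T0, magnitude Ap) pairs.
    accent: (onset T1, offset T2, amplitude Aa) triples.
    """
    def gp(x: float) -> float:
        # Phrase control mechanism: impulse response of a critically damped system.
        return alpha * alpha * x * math.exp(-alpha * x) if x >= 0 else 0.0

    def ga(x: float) -> float:
        # Accent control mechanism: step response, clipped at gamma.
        return min(1 - (1 + beta * x) * math.exp(-beta * x), gamma) if x >= 0 else 0.0

    value = math.log(fb)
    value += sum(ap * gp(t - t0) for t0, ap in phrase)
    value += sum(aa * (ga(t - t1) - ga(t - t2)) for t1, t2, aa in accent)
    return value
```

For example, with "akai" assumed to modify "hako", `insert_depth_symbols(["akai", "ookina", "hako"], [2, 2, -1])` returns `"akai #1 ookina #0 hako"` and `right_branching_boundaries([2, 2, -1])` returns `[0]`, while a purely left-branching chain with heads `[1, 2, -1]` gets depth 0 at every boundary. Adding a phrase command at that right-branching boundary in `log_f0` raises ln F0 after it, which mirrors the F0 phrase-component reset described in the abstract.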
Keywords	Neural end-to-end text-to-speech synthesis / Local phrase dependency structure / Prosodic symbol
Report No.	SP2021-23
Issue date	2021-06-11 (SP)

Technical committee information
Committee	SP / IPSJ-SLP / IPSJ-MUS
Dates	2021-06-18 (two days)
Venue	Online
Theme	OTOGAKU Symposium 2021
Chairs	Hisashi Kawai (NICT) / Norihide Kitaoka (Toyohashi Univ. of Tech.) / Yoshinari Takegawa (Future Univ. Hakodate)
Secretaries	Shinnosuke Takamichi (Univ. of Tokyo) / Tetsuji Ogawa (Waseda Univ.) / Yuya Akita (Kyoto Univ.) / Yuuki Tachioka (Denso) / Ryoichi Takashima (Kobe Univ.) / Shinnosuke Takamichi (Univ. of Tokyo) / Masanori Morise (Meiji Univ.) / Masaki Matsubara (Univ. of Tsukuba) / Katsutoshi Itoyama (Tokyo Tech) / Satoru Fukayama (AIST) / Yasunori Ohishi (NTT) / Keiji Hirata (Future Univ. Hakodate)
Assistant secretary	Yusuke Ijima (NTT)

Paper details
Submitted to	Technical Committee on Speech / Special Interest Group on Spoken Language Processing / Special Interest Group on Music and Computer
Language of text	Japanese (JPN)
Title	Neural speech synthesis using local phrase dependency structure information
Keyword (1)	Neural end-to-end text-to-speech synthesis
Keyword (2)	Local phrase dependency structure
Keyword (3)	Prosodic symbol
1st author	Nobuyoshi Kaiki
1st author affiliation	Nara Institute of Science and Technology (NAIST)
2nd author	Sakriani Sakti
2nd author affiliation	Nara Institute of Science and Technology (NAIST)
3rd author	Satoshi Nakamura
3rd author affiliation	Nara Institute of Science and Technology (NAIST)
Presentation date	2021-06-19
Report No.	SP2021-23
Volume	vol.121
Number	SP-66
Pages	pp.107-112 (SP)
Number of pages	6
Issue date	2021-06-11 (SP)