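A Study on Multi-Class Local Prosodic Context for Expressive Prosody Generation (Organized Session "Toward the Synthesis of Diverse Speech and Singing Voices", Speech, Language, Dialogue, General)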

Yu Maeno; Takashi Nose; Takao Kobayashi; Tomoki Koriyama; Yusuke Ijima; Hideharu Nakajima; Hideyuki Mizuno; Osamu Yoshioka

Presentation	2013-01-31 A Study on Multi-Class Local Prosodic Context for Expressive Prosody Generation Yu MAENO, Takashi NOSE, Takao KOBAYASHI, Tomoki KORIYAMA, Yusuke IJIMA, Hideharu NAKAJIMA, Hideyuki MIZUNO, Osamu YOSHIOKA
Abstract(in English)	This paper describes a technique for reproducing the local prosodic variability that appears in expressive speech covering various speaking styles. In HMM-based speech synthesis, speech generated from linguistic contexts alone tends to show less prosodic variation than the original speech. To add more variation to the synthetic speech, we define novel phrase-level prosodic contexts from the residual of the prosodic features between the original and synthetic speech of the training data. Specifically, we create prosodic contexts for the F0, duration, and power features from the average difference between the original and synthetic speech in each phrase. We evaluate the potential of the proposed technique under the condition that the appropriate prosodic contexts of the test sentences are known at synthesis time. We also examine whether users can intuitively modify the pitch by adjusting the proposed prosodic contexts.
Keyword(in English)	HMM-based speech synthesis / expressive speech synthesis / prosodic context / unsupervised labeling / audiobook
Paper #	SP2012-112
Date of Issue
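
The phrase-level labeling outlined in the abstract (averaging the difference between original and synthetic prosodic features within each phrase and turning it into a context class) might look roughly like the following. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `phrase_context_labels`, the three class names, the `threshold` value, and the toy log-F0 numbers are all hypothetical, and the original and synthetic frames are assumed to be already time-aligned.

```python
from statistics import mean


def phrase_context_labels(original, synthetic, threshold=0.1):
    """Assign one prosodic-context class per phrase.

    original, synthetic: lists of phrases, each a list of per-frame feature
    values (for example log F0), assumed to be time-aligned.
    threshold and the class names are illustrative assumptions only.
    """
    labels = []
    for orig_phrase, synth_phrase in zip(original, synthetic):
        # Average difference between original and synthetic speech in this phrase.
        residual = mean(o - s for o, s in zip(orig_phrase, synth_phrase))
        if residual > threshold:
            labels.append("high")      # original is clearly above the synthetic value
        elif residual < -threshold:
            labels.append("low")       # original is clearly below the synthetic value
        else:
            labels.append("neutral")   # difference is within the threshold
    return labels


# Toy log-F0 values for three phrases (made up for illustration).
original = [[5.2, 5.3, 5.4], [5.0, 5.0, 5.1], [4.7, 4.8, 4.6]]
synthetic = [[5.0, 5.1, 5.1], [5.0, 5.05, 5.05], [4.9, 5.0, 4.9]]
labels = phrase_context_labels(original, synthetic)
print(labels)
```

The same routine could be applied separately to duration and power values. How many classes to use and where to place the class boundaries are design choices that the abstract does not specify.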

Conference Information
Committee	SP
Conference Date	2013/1/23 (1 day)

Paper Information
Registration To	Speech (SP)
Language	JPN
Title (in English)	A Study on Multi-Class Local Prosodic Context for Expressive Prosody Generation
Keyword(1)	HMM-based speech synthesis
Keyword(2)	expressive speech synthesis
Keyword(3)	prosodic context
Keyword(4)	unsupervised labeling
Keyword(5)	audiobook
1st Author's Name	Yu MAENO
1st Author's Affiliation	Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
2nd Author's Name	Takashi NOSE
2nd Author's Affiliation	Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
3rd Author's Name	Takao KOBAYASHI
3rd Author's Affiliation	Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
4th Author's Name	Tomoki KORIYAMA
4th Author's Affiliation	Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
5th Author's Name	Yusuke IJIMA
5th Author's Affiliation	NTT Media Intelligence Laboratories, NTT Corporation
6th Author's Name	Hideharu NAKAJIMA
6th Author's Affiliation	NTT Media Intelligence Laboratories, NTT Corporation
7th Author's Name	Hideyuki MIZUNO
7th Author's Affiliation	NTT Media Intelligence Laboratories, NTT Corporation
8th Author's Name	Osamu YOSHIOKA
8th Author's Affiliation	NTT Media Intelligence Laboratories, NTT Corporation
Date	2013-01-31
Paper #	SP2012-112
Volume (vol)	vol.112
Number (no)	422
Page	pp.-
#Pages	6