Presentation 2013-01-31
A Study on Multi-Class Local Prosodic Context for Expressive Prosody Generation
Yu MAENO, Takashi NOSE, Takao KOBAYASHI, Tomoki KORIYAMA, Yusuke IJIMA, Hideharu NAKAJIMA, Hideyuki MIZUNO, Osamu YOSHIOKA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) This paper describes a technique for reproducing the local prosodic variability that appears in expressive speech, including various speaking styles. Synthetic speech generated using only linguistic contexts in HMM-based speech synthesis tends to have smaller prosodic variation than the original speech. To add more variation to synthetic speech, we define novel phrase-level prosodic contexts from the residual information of prosodic features between the original and synthetic speech for the training data. Specifically, we create the prosodic contexts for the F0, duration, and power features by using the average difference between the original and synthetic speech in each phrase. We evaluate the potential of the proposed technique under a condition where the appropriate prosodic contexts of the test sentences are known in the synthesis phase. We also examine whether users can intuitively modify the pitch by adjusting the proposed prosodic contexts.
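The phrase-level labeling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `phrase_context_labels`, the three class names, the fixed `threshold`, and the toy log-F0 values are all assumptions made for illustration, since the abstract does not state how many classes are used or how class boundaries are chosen. The sketch only shows the general idea of averaging the original-minus-synthetic difference within each phrase and mapping that average to a discrete context class.

```python
from statistics import mean


def phrase_context_labels(original, synthetic, phrase_bounds, threshold):
    """Assign one prosodic context class per phrase.

    original, synthetic: per-frame feature values (e.g. log F0), same length.
    phrase_bounds: list of (start, end) frame indices, end exclusive.
    threshold: hypothetical class boundary; not taken from the paper.
    """
    labels = []
    for start, end in phrase_bounds:
        # Residual between original and synthetic speech within the phrase.
        diffs = [o - s for o, s in zip(original[start:end], synthetic[start:end])]
        avg = mean(diffs)
        # Map the average difference to one of three assumed classes.
        if avg > threshold:
            labels.append("high")
        elif avg < -threshold:
            labels.append("low")
        else:
            labels.append("neutral")
    return labels


# Toy data: three phrases of 3, 2, and 2 frames (values are made up).
original = [5.0, 5.5, 5.25, 4.5, 4.25, 5.0, 5.0]
synthetic = [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0]
phrase_bounds = [(0, 3), (3, 5), (5, 7)]

labels = phrase_context_labels(original, synthetic, phrase_bounds, threshold=0.2)
print(labels)
```

Under these assumptions, the same routine could be applied separately to duration and power to obtain one context label per feature and phrase, and a user could change a phrase's F0 label by hand to shift the generated pitch.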
Keyword(in Japanese) (See Japanese page)
Keyword(in English) HMM-based speech synthesis / expressive speech synthesis / prosodic context / unsupervised labeling / audiobook
Paper # SP2012-112
Date of Issue

Conference Information
Committee SP
Conference Date 2013/1/23 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Speech (SP)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) A Study on Multi-Class Local Prosodic Context for Expressive Prosody Generation
Sub Title (in English)
Keyword(1) HMM-based speech synthesis
Keyword(2) expressive speech synthesis
Keyword(3) prosodic context
Keyword(4) unsupervised labeling
Keyword(5) audiobook
1st Author's Name Yu MAENO
1st Author's Affiliation Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
2nd Author's Name Takashi NOSE
2nd Author's Affiliation Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
3rd Author's Name Takao KOBAYASHI
3rd Author's Affiliation Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
4th Author's Name Tomoki KORIYAMA
4th Author's Affiliation Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
5th Author's Name Yusuke IJIMA
5th Author's Affiliation NTT Media Intelligence Laboratories, NTT Corporation
6th Author's Name Hideharu NAKAJIMA
6th Author's Affiliation NTT Media Intelligence Laboratories, NTT Corporation
7th Author's Name Hideyuki MIZUNO
7th Author's Affiliation NTT Media Intelligence Laboratories, NTT Corporation
8th Author's Name Osamu YOSHIOKA
8th Author's Affiliation NTT Media Intelligence Laboratories, NTT Corporation
Date 2013-01-31
Paper # SP2012-112
Volume (vol) vol.112
Number (no) 422
Page pp.-
#Pages 6
Date of Issue