Presentation | 2013-01-31 A Study on Multi-Class Local Prosodic Context for Expressive Prosody Generation Yu MAENO, Takashi NOSE, Takao KOBAYASHI, Tomoki KORIYAMA, Yusuke IJIMA, Hideharu NAKAJIMA, Hideyuki MIZUNO, Osamu YOSHIOKA |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | This paper describes a technique for reproducing the local prosodic variability that appears in expressive speech, including various speaking styles. Synthetic speech generated using only linguistic contexts in HMM-based speech synthesis tends to have less prosodic variation than the original speech. To add more variation to synthetic speech, we define novel phrase-level prosodic contexts from the residual information of prosodic features between original and synthetic speech for the training data. Specifically, we create the prosodic contexts of the F0, duration, and power features by using the average difference between original and synthetic speech in each phrase. We evaluate the potential of the proposed technique under a condition where the appropriate prosodic contexts of the test sentences are known in the synthesis phase. We also examine whether users can intuitively modify the pitch by adjusting the proposed prosodic contexts. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | HMM-based speech synthesis / expressive speech synthesis / prosodic context / unsupervised labeling / audiobook |
Paper # | SP2012-112 |
Date of Issue |
Conference Information | |
Committee | SP |
---|---|
Conference Date | 2013/1/23 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Speech (SP) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | A Study on Multi-Class Local Prosodic Context for Expressive Prosody Generation |
Sub Title (in English) | |
Keyword(1) | HMM-based speech synthesis |
Keyword(2) | expressive speech synthesis |
Keyword(3) | prosodic context |
Keyword(4) | unsupervised labeling |
Keyword(5) | audiobook |
1st Author's Name | Yu MAENO |
1st Author's Affiliation | Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology |
2nd Author's Name | Takashi NOSE |
2nd Author's Affiliation | Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology |
3rd Author's Name | Takao KOBAYASHI |
3rd Author's Affiliation | Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology |
4th Author's Name | Tomoki KORIYAMA |
4th Author's Affiliation | Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology |
5th Author's Name | Yusuke IJIMA |
5th Author's Affiliation | NTT Media Intelligence Laboratories, NTT Corporation |
6th Author's Name | Hideharu NAKAJIMA |
6th Author's Affiliation | NTT Media Intelligence Laboratories, NTT Corporation |
7th Author's Name | Hideyuki MIZUNO |
7th Author's Affiliation | NTT Media Intelligence Laboratories, NTT Corporation |
8th Author's Name | Osamu YOSHIOKA |
8th Author's Affiliation | NTT Media Intelligence Laboratories, NTT Corporation |
Date | 2013-01-31 |
Paper # | SP2012-112 |
Volume (vol) | vol.112 |
Number (no) | 422 |
Page | pp.- |
#Pages | 6 |
Date of Issue |