Presentation | 1999/8/5 Simultaneous Modeling of Spectrum, Pitch and State Duration in HMM-based Speech Synthesis Takayoshi Yoshimura, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | In this paper, we describe an HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified HMM framework. In the system, spectrum, pitch and state duration are modeled by continuous density HMMs, multi-space probability distribution HMMs and multi-dimensional Gaussian distributions, respectively. The distributions for the spectral parameter, the pitch parameter and the state duration are clustered independently by using a decision-tree based context clustering technique. Synthetic speech is generated by using a speech parameter generation algorithm from HMM and a mel-cepstrum based vocoding technique. Through informal listening tests, we have confirmed that the proposed system successfully synthesizes natural-sounding speech which resembles the speaker in the training database. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | HMM / text-to-speech synthesis / mel-cepstrum / prosodic model / context clustering |
Paper # | SP99-59 |
Date of Issue |
Conference Information | |
Committee | SP |
---|---|
Conference Date | 1999/8/5 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Speech (SP) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Simultaneous Modeling of Spectrum, Pitch and State Duration in HMM-based Speech Synthesis |
Sub Title (in English) | |
Keyword(1) | HMM |
Keyword(2) | text-to-speech synthesis |
Keyword(3) | mel-cepstrum |
Keyword(4) | prosodic model |
Keyword(5) | context clustering |
1st Author's Name | Takayoshi Yoshimura |
1st Author's Affiliation | Department of Computer Science, Nagoya Inst. of Tech. |
2nd Author's Name | Keiichi Tokuda |
2nd Author's Affiliation | Department of Computer Science, Nagoya Inst. of Tech. |
3rd Author's Name | Takashi Masuko |
3rd Author's Affiliation | Interdisciplinary Graduate School of Science and Engineering, Tokyo Inst. of Tech. |
4th Author's Name | Takao Kobayashi |
4th Author's Affiliation | Interdisciplinary Graduate School of Science and Engineering, Tokyo Inst. of Tech. |
5th Author's Name | Tadashi Kitamura |
5th Author's Affiliation | Department of Computer Science, Nagoya Inst. of Tech. |
Date | 1999/8/5 |
Paper # | SP99-59 |
Volume (vol) | vol.99 |
Number (no) | 255 |
Page | pp.- |
#Pages | 6 |
Date of Issue |