Presentation | 2005/8/19 Phone Duration Modeling Based on Ensemble Learning Junichi YAMAGISHI, Hisashi KAWAI, Toshio HIRAI, Takao KOBAYASHI |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Phone duration, which controls the rhythm and/or tempo of synthetic speech, is one of the important acoustic features for text-to-speech synthesis. Controlling phone duration can be viewed as the problem of estimating a prediction function that takes several phonetic and prosodic features and linguistic information as explanatory variables, and methods based on multiple linear regression or regression trees have been applied to duration prediction. In this study, to improve the prediction accuracy of these methods, we use "ensemble learning", which takes advantage of several prediction models. "Gradient boosting" is examined as a way to efficiently improve the prediction accuracy of regression trees. Gradient boosting is a recursive ensemble learning method that uses the residual error of the prediction models, and it can improve accuracy with a small number of parameters. We apply the algorithm to duration prediction for Japanese and Chinese and discuss its effectiveness. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Phone duration / Ensemble learning / Regression tree / Boosting / Bagging |
Paper # | SP2005-53 |
Date of Issue |
Conference Information | |
Committee | SP |
---|---|
Conference Date | 2005/8/19 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Speech (SP) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Phone Duration Modeling Based on Ensemble Learning |
Sub Title (in English) | |
Keyword(1) | Phone duration |
Keyword(2) | Ensemble learning |
Keyword(3) | Regression tree |
Keyword(4) | Boosting |
Keyword(5) | Bagging |
1st Author's Name | Junichi YAMAGISHI |
1st Author's Affiliation | Spoken Language Communication Research Laboratories, Advanced Telecommunications Research Institute International:Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology |
2nd Author's Name | Hisashi KAWAI |
2nd Author's Affiliation | Spoken Language Communication Research Laboratories, Advanced Telecommunications Research Institute International:KDDI R&D Laboratories |
3rd Author's Name | Toshio HIRAI |
3rd Author's Affiliation | Spoken Language Communication Research Laboratories, Advanced Telecommunications Research Institute International |
4th Author's Name | Takao KOBAYASHI |
4th Author's Affiliation | Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology |
Date | 2005/8/19 |
Paper # | SP2005-53 |
Volume (vol) | vol.105 |
Number (no) | 253 |
Page | pp.- |
#Pages | 6 |
Date of Issue |
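
The abstract describes gradient boosting as recursive ensemble learning that uses the residual error of the prediction models to improve a regression tree. The following is a minimal sketch of that general idea for squared error, not the authors' implementation: each round fits a one-split regression tree (a stump) to the current residuals and adds a damped copy of it to the ensemble. The feature names, the synthetic durations, the stump learner, and the `n_rounds` and `learning_rate` values are all illustrative assumptions and do not come from the paper.

```python
import random


def fit_stump(X, r):
    """Fit a one-split regression tree to targets r by least squares."""
    n, d = len(X), len(X[0])
    best = None  # (sse, feature index, threshold, left mean, right mean)
    for j in range(d):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no feature can be split: fall back to a constant
        m = sum(r) / n
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    j, t, lm, rm = stump
    return lm if row[j] <= t else rm


def fit_gradient_boosting(X, y, n_rounds=30, learning_rate=0.1):
    """Squared-error gradient boosting: each stump is fit to the residuals."""
    base = sum(y) / len(y)  # initial model: the mean duration
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * predict_stump(stump, row)
                for p, row in zip(pred, X)]
    return (base, learning_rate, stumps)


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Synthetic data (hypothetical): [is_vowel, position_in_phrase, speaking_rate]
rng = random.Random(0)
X = [[float(rng.random() < 0.5), rng.random(), rng.uniform(0.8, 1.2)]
     for _ in range(100)]
# Made-up durations in ms: vowels and phrase-final phones are longer.
y = [60.0 + 40.0 * row[0] + 20.0 * (row[1] > 0.8) + rng.gauss(0.0, 5.0)
     for row in X]

model = fit_gradient_boosting(X, y)
baseline_mse = mse(y, [sum(y) / len(y)] * len(y))
boosted_mse = mse(y, [predict(model, row) for row in X])
print("baseline MSE:", baseline_mse)
print("boosted MSE:", boosted_mse)
```

Both errors here are measured on the training data, so the comparison only shows that fitting residuals reduces the fit error round by round; it says nothing about the held-out accuracy that a real duration model would need to report.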