Presentation 2005/8/19
Phone Duration Modeling Based on Ensemble Learning
Junichi YAMAGISHI, Hisashi KAWAI, Toshio HIRAI, Takao KOBAYASHI
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Phone duration, which controls the rhythm and/or tempo of synthetic speech, is one of the important acoustic features for text-to-speech synthesis. Controlling phone duration can be viewed as the problem of estimating a prediction function that takes several phonetic and prosodic features and linguistic information as explanatory variables, and methods based on multiple linear regression or regression trees have been applied to duration prediction. In this study, to improve the prediction accuracy of these methods, we use "ensemble learning", which takes advantage of several prediction models. We examine "gradient boosting" as an efficient way to improve the prediction accuracy of regression trees. Gradient boosting is a recursive ensemble learning method that uses the residual errors of the prediction models, and it can improve accuracy with a small number of parameters. We apply the algorithm to duration prediction for Japanese and Chinese and discuss its effectiveness.
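The residual-fitting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes squared-error loss, depth-1 regression trees (stumps) as the base models, a single scalar feature, and a fixed learning rate. The function names, the feature, and the toy duration values are all hypothetical and chosen only for illustration.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: one threshold on a scalar feature.

    Assumes xs contains at least two distinct values.
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        # Sum of squared errors if each side predicts its own mean.
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Each round fits a new stump to the residuals of the current ensemble."""
    base = mean(ys)                      # initial model: a constant
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for x, p in zip(xs, predictions)]
    return base, stumps, learning_rate

def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

# Toy data (made up, not from the paper): a hypothetical scalar context
# feature and phone durations in milliseconds.
xs = [1, 2, 3, 4, 5, 6]
ys = [60.0, 62.0, 65.0, 90.0, 95.0, 98.0]

model = fit_gradient_boosting(xs, ys)
mse_boosted = mean((y - predict(model, x)) ** 2 for x, y in zip(xs, ys))
mse_constant = mean((y - mean(ys)) ** 2 for y in ys)
```

On this toy data the boosted ensemble should fit the training durations more closely than the constant baseline, since every round removes part of the remaining residual. A real duration model would use many categorical and numerical features and deeper trees.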
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Phone duration / Ensemble learning / Regression tree / Boosting / Bagging
Paper # SP2005-53
Date of Issue

Conference Information
Committee SP
Conference Date 2005/8/19 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Speech (SP)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Phone Duration Modeling Based on Ensemble Learning
Sub Title (in English)
Keyword(1) Phone duration
Keyword(2) Ensemble learning
Keyword(3) Regression tree
Keyword(4) Boosting
Keyword(5) Bagging
1st Author's Name Junichi YAMAGISHI
1st Author's Affiliation Spoken Language Communication Research Laboratories, Advanced Telecommunications Research Institute International:Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
2nd Author's Name Hisashi KAWAI
2nd Author's Affiliation Spoken Language Communication Research Laboratories, Advanced Telecommunications Research Institute International:KDDI R&D Laboratories
3rd Author's Name Toshio HIRAI
3rd Author's Affiliation Spoken Language Communication Research Laboratories, Advanced Telecommunications Research Institute International
4th Author's Name Takao KOBAYASHI
4th Author's Affiliation Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
Date 2005/8/19
Paper # SP2005-53
Volume (vol) vol.105
Number (no) 253
Page pp.-
#Pages 6