Presentation 2001/12/13
Vocal Tract Length Normalization Using Linear Transformation based on Maximum Likelihood Estimation
Jun ROKUI, Mitsuru NAKAI, Hiroshi SHIMODAIRA, Shigeki SAGAYAMA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Vocal tract length normalization (VTLN) is a popular speaker adaptation technique for speech recognition. This study proposes a new VTLN algorithm in which expectation-maximization (EM) based adaptation of HMM parameters to vocal tract length is carried out in the mel-cepstral domain using a linear transformation model. Existing VTLN approaches based on the bilinear transformation apply a specific non-linear frequency warping function in the spectral domain and adapt the HMM parameters in the cepstral domain. In contrast, the proposed approach assumes a linear frequency warping with a single scaling factor and models the equivalent operation in the mel-cepstral domain using a first-order Taylor series approximation. The proposed scheme significantly improves recognition performance in a speaker-independent word recognition task.
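The central idea in the abstract, that a linear frequency warping with one scaling factor acts as a linear transformation on cepstral coefficients and can be linearized with a first-order Taylor expansion, can be illustrated numerically. The sketch below is not the authors' derivation. It assumes a plain (non-mel) cepstrum defined as the cosine-series coefficients of the log spectrum on [0, pi], and the names `warp_matrix`, `warp_derivative`, `alpha`, and `n_cep` are hypothetical, introduced only for illustration.

```python
import numpy as np


def _grid_and_scale(n_cep, n_grid):
    # Midpoint grid on (0, pi); on this grid the cosine basis is exactly orthogonal.
    omega = (np.arange(n_grid) + 0.5) * np.pi / n_grid
    idx = np.arange(n_cep)
    scale = np.full(n_cep, 2.0)
    scale[0] = 1.0  # the order-0 coefficient uses 1/pi instead of 2/pi
    return omega, idx, scale


def warp_matrix(alpha, n_cep, n_grid=4096):
    """Matrix A(alpha) such that c_warped = A(alpha) @ c (assumed plain cepstrum).

    A[m, n] = scale[m] / pi * integral_0^pi cos(n * alpha * w) * cos(m * w) dw,
    evaluated with the midpoint rule.
    """
    omega, idx, scale = _grid_and_scale(n_cep, n_grid)
    basis = np.cos(np.outer(idx, omega))           # cos(m * w)
    warped = np.cos(np.outer(idx, alpha * omega))  # cos(n * alpha * w)
    return (scale[:, None] / n_grid) * (basis @ warped.T)


def warp_derivative(n_cep, n_grid=4096):
    """Derivative dA/dalpha at alpha = 1, used for the first-order Taylor term."""
    omega, idx, scale = _grid_and_scale(n_cep, n_grid)
    basis = np.cos(np.outer(idx, omega))
    # d/dalpha cos(n * alpha * w) at alpha = 1 equals -n * w * sin(n * w)
    dwarped = -(idx[:, None] * omega) * np.sin(np.outer(idx, omega))
    return (scale[:, None] / n_grid) * (basis @ dwarped.T)


if __name__ == "__main__":
    n_cep, alpha = 8, 1.01
    c = 1.0 / (1.0 + np.arange(n_cep)) ** 2  # toy cepstrum, illustrative only
    exact = warp_matrix(alpha, n_cep) @ c
    taylor = (np.eye(n_cep) + (alpha - 1.0) * warp_derivative(n_cep)) @ c
    print("error without warping:  ", np.linalg.norm(exact - c))
    print("error with Taylor model:", np.linalg.norm(exact - taylor))
```

Under these assumptions the warped cepstrum is approximated as `(I + (alpha - 1) * D) @ c`, which is linear in both the cepstrum and the scaling factor. That linearity is the property that would make a maximum likelihood estimate of a single scaling factor tractable inside an EM loop. The approximation degrades as `alpha` moves away from 1 or as the cepstral order grows.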
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Vocal Tract Length Normalization / Linear Transformation / Maximum Likelihood Estimation / Speaker Adaptation / Speaker Normalization
Paper # NLC2001-52,SP2001-87
Date of Issue

Conference Information
Committee SP
Conference Date 2001/12/13 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Speech (SP)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Vocal Tract Length Normalization Using Linear Transformation based on Maximum Likelihood Estimation
Sub Title (in English)
Keyword(1) Vocal Tract Length Normalization
Keyword(2) Linear Transformation
Keyword(3) Maximum Likelihood Estimation
Keyword(4) Speaker Adaptation
Keyword(5) Speaker Normalization
1st Author's Name Jun ROKUI
1st Author's Affiliation Japan Advanced Institute of Science and Technology, Hokuriku. Dept of Information Science
2nd Author's Name Mitsuru NAKAI
2nd Author's Affiliation Japan Advanced Institute of Science and Technology, Hokuriku. Dept of Information Science
3rd Author's Name Hiroshi SHIMODAIRA
3rd Author's Affiliation Japan Advanced Institute of Science and Technology, Hokuriku. Dept of Information Science
4th Author's Name Shigeki SAGAYAMA
4th Author's Affiliation The University of Tokyo. Graduate School of Information Science and Technology
Date 2001/12/13
Paper # NLC2001-52,SP2001-87
Volume (vol) vol.101
Number (no) 522
Page pp.-
#Pages 6
Date of Issue