Presentation | 1998/9/11 Visual Parameter Estimation from Utterance based on the EM Algorithm using Audio-Visual HMMs Eli Yamamoto, Satoshi Nakamura, Kiyohiro Shikano |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | This paper proposes a method for re-estimating the output visual parameters in speech-to-lip-movement synthesis, using audio-visual hidden Markov models (HMMs) and the Expectation-Maximization (EM) algorithm. In previous work, we proposed an HMM-Viterbi method that estimates a visual parameter sequence from an utterance using audio HMMs. That method outputs, for each frame, the visual parameters associated with the HMM state given by the decoded state sequence. However, it has a substantial problem: the deterministic decoding process assigns a single HMM state to each input audio frame, so an incorrect state alignment may produce incorrect visual parameters. The proposed method avoids deterministic decoding by estimating the visual parameters non-deterministically with the EM algorithm. It repeatedly re-estimates the visual parameters while maximizing the likelihood of the audio-visual observation sequence under the audio-visual HMMs. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | hidden Markov models / EM algorithm / image sequence synthesis / multimodal speech processing / lip synchronization |
Paper # | DSP98-86,SP98-65 |
Date of Issue |
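
The contrast the abstract draws between hard state decoding and EM-based re-estimation can be illustrated with a toy example. The following is a minimal sketch and not the authors' formulation: it assumes one-dimensional Gaussian audio and visual features, a visual variance shared by all states, and a two-state left-to-right HMM. All function names, parameter names, and numbers are hypothetical. Under those assumptions, the visual value that maximizes the expected log-likelihood for a frame is the posterior-weighted mean of the per-state visual means, whereas the Viterbi baseline outputs the mean of the single decoded state.

```python
import math

def gauss(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def viterbi(emis, trans, init):
    """Single most likely state path given per-frame emission likelihoods."""
    n = len(init)
    delta = [init[j] * emis[0][j] for j in range(n)]
    back = []
    for t in range(1, len(emis)):
        prev, delta, ptr = delta, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] * trans[i][j])
            ptr.append(best)
            delta.append(prev[best] * trans[best][j] * emis[t][j])
        total = sum(delta)
        delta = [d / total for d in delta]  # rescale to avoid underflow
        back.append(ptr)
    path = [max(range(n), key=lambda j: delta[j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

def posteriors(emis, trans, init):
    """State occupancy probabilities per frame via scaled forward-backward."""
    n, T = len(init), len(emis)
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            row = [init[j] * emis[0][j] for j in range(n)]
        else:
            row = [sum(alpha[-1][i] * trans[i][j] for i in range(n)) * emis[t][j]
                   for j in range(n)]
        c = sum(row)
        alpha.append([a / c for a in row])
        scale.append(c)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(trans[i][j] * emis[t + 1][j] * beta[t + 1][j]
                             for j in range(n)) / scale[t + 1]
    gamma = []
    for t in range(T):
        row = [alpha[t][j] * beta[t][j] for j in range(n)]
        s = sum(row)
        gamma.append([g / s for g in row])
    return gamma

def weighted_visual(gamma, vis_mean):
    """Posterior-weighted visual mean per frame (assumes a shared visual variance)."""
    return [sum(g[j] * vis_mean[j] for j in range(len(vis_mean))) for g in gamma]

def estimate_visual(audio, audio_mean, audio_var, vis_mean, vis_var,
                    trans, init, n_iter=5):
    """Return (hard, soft) visual estimates for an audio feature sequence."""
    n = len(init)
    emis_a = [[gauss(a, audio_mean[j], audio_var) for j in range(n)] for a in audio]
    # Hard baseline: visual mean of the single Viterbi-decoded state per frame.
    hard = [vis_mean[j] for j in viterbi(emis_a, trans, init)]
    # Soft estimate: start from audio-only posteriors, then alternate between
    # audio-visual posteriors (E-step) and re-estimated visual values (M-step).
    soft = weighted_visual(posteriors(emis_a, trans, init), vis_mean)
    for _ in range(n_iter):
        emis_av = [[emis_a[t][j] * gauss(soft[t], vis_mean[j], vis_var)
                    for j in range(n)] for t in range(len(audio))]
        soft = weighted_visual(posteriors(emis_av, trans, init), vis_mean)
    return hard, soft

# Hypothetical toy model: two states, left-to-right topology.
audio = [0.1, 0.2, 0.9, 1.1]
vis_mean = [0.2, 0.8]
hard, soft = estimate_visual(audio, audio_mean=[0.0, 1.0], audio_var=0.25,
                             vis_mean=vis_mean, vis_var=0.04,
                             trans=[[0.7, 0.3], [0.0, 1.0]], init=[1.0, 0.0])
print(hard)
print(soft)
```

In this sketch the hard estimate can only take values from `vis_mean`, so a misaligned frame jumps to the wrong state's value, while the soft estimate is a convex combination of the state means weighted by how probable each state is for that frame.
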
Conference Information | |
Committee | DSP |
---|---|
Conference Date | 1998/9/11 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Digital Signal Processing (DSP) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Visual Parameter Estimation from Utterance based on the EM Algorithm using Audio-Visual HMMs |
Sub Title (in English) | |
Keyword(1) | hidden Markov models |
Keyword(2) | EM algorithm |
Keyword(3) | image sequence synthesis |
Keyword(4) | multimodal speech processing |
Keyword(5) | lip synchronization |
1st Author's Name | Eli Yamamoto |
1st Author's Affiliation | Graduate School of Information Science, Nara Institute of Science and Technology |
2nd Author's Name | Satoshi Nakamura |
2nd Author's Affiliation | Graduate School of Information Science, Nara Institute of Science and Technology |
3rd Author's Name | Kiyohiro Shikano |
3rd Author's Affiliation | Graduate School of Information Science, Nara Institute of Science and Technology |
Date | 1998/9/11 |
Paper # | DSP98-86,SP98-65 |
Volume (vol) | vol.98 |
Number (no) | 262 |
Page | pp.- |
#Pages | 6 |
Date of Issue |