Presentation 1998/9/11
Visual Parameter Estimation from Utterance based on the EM Algorithm using Audio-Visual HMMs
Eli Yamamoto, Satoshi Nakamura, Kiyohiro Shikano
Abstract(in Japanese) (See Japanese page)
Abstract(in English) This paper proposes a method for re-estimating output visual parameters for speech-to-lip movement synthesis using audio-visual hidden Markov models (HMMs) under the Expectation-Maximization (EM) algorithm. In previous work, we proposed an HMM-Viterbi method that estimates a visual parameter sequence from an utterance using audio HMMs. The HMM-Viterbi method outputs the visual parameters associated with each HMM state in the decoded state sequence. However, it has a substantial problem: the deterministic decoding process assigns a single HMM state to each input audio frame, so an incorrect HMM state alignment may produce incorrect visual parameters. The proposed method avoids deterministic decoding by estimating the visual parameters non-deterministically with the EM algorithm. It repeatedly re-estimates the visual parameters while maximizing the likelihood of the audio-visual observation sequence under the audio-visual HMMs.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) hidden Markov models / EM algorithm / image sequence synthesis / multimodal speech processing / lip synchronization
Paper # DSP98-86,SP98-65
Date of Issue

Conference Information
Committee DSP
Conference Date 1998/9/11 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Digital Signal Processing (DSP)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Visual Parameter Estimation from Utterance based on the EM Algorithm using Audio-Visual HMMs
Sub Title (in English)
Keyword(1) hidden Markov models
Keyword(2) EM algorithm
Keyword(3) image sequence synthesis
Keyword(4) multimodal speech processing
Keyword(5) lip synchronization
1st Author's Name Eli Yamamoto
1st Author's Affiliation Graduate School of Information Science, Nara Institute of Science and Technology
2nd Author's Name Satoshi Nakamura
2nd Author's Affiliation Graduate School of Information Science, Nara Institute of Science and Technology
3rd Author's Name Kiyohiro Shikano
3rd Author's Affiliation Graduate School of Information Science, Nara Institute of Science and Technology
Date 1998/9/11
Paper # DSP98-86,SP98-65
Volume (vol) vol.98
Number (no) 262
Page pp.-
#Pages 6
Date of Issue