Presentation 2000/12/14
A Robust End Point Detection by Speaker's Facial Image
Kazumasa MURAI, Keisuke NOMA, Kenichi KUMATANI, Tomoko MATSUI, Satoshi NAKAMURA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In this paper, we propose a method for detecting the end points of speaking sections (EPD: End Point Detection) from visual information. It is well known that EPD accuracy affects speech recognition accuracy. Detecting speech end points from a noisy audio signal is difficult because the speech is masked by the noise. We therefore propose an EPD method that uses images of the speaker's facial motion, which are not affected by audio noise. The method locates the skin area using color information and estimates the area that includes the speech organs. The end points are then detected from the speed at which the image changes. An evaluation experiment confirms that the proposed method is also robust to visual noise: its accuracy is 99.8% both with and without visual noise, whereas audio-based EPD (SNR 25 dB) achieves 97.5%.
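The pipeline outlined in the abstract (skin-color localization, estimation of the speech-organ area, then end point detection from the rate of image change) could be sketched roughly as follows. This is only an illustrative sketch and not the authors' implementation: the RGB skin thresholds, the choice of the lower third of the skin bounding box as the mouth area, the mean absolute frame difference as the motion measure, and all function names are assumptions made for this example.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B), each 0..255
Frame = List[List[Pixel]]             # rows of pixels
Box = Tuple[int, int, int, int]       # top, bottom, left, right (inclusive)


def is_skin(p: Pixel) -> bool:
    """Crude RGB skin test; the thresholds are illustrative assumptions."""
    r, g, b = p
    return (r > 95 and g > 40 and b > 20
            and r > g and r > b and r - min(g, b) > 15)


def lower_face_box(frame: Frame) -> Optional[Box]:
    """Bounding box of skin pixels, cut to its lower third (assumed mouth area)."""
    rows = [y for y, row in enumerate(frame) if any(is_skin(p) for p in row)]
    cols = [x for row in frame for x, p in enumerate(row) if is_skin(p)]
    if not rows:
        return None
    top, bottom = min(rows), max(rows)
    mouth_top = bottom - (bottom - top) // 3
    return mouth_top, bottom, min(cols), max(cols)


def motion(prev: Frame, cur: Frame, box: Box) -> float:
    """Mean absolute per-channel difference between two frames inside box."""
    top, bottom, left, right = box
    total = 0
    count = 0
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            total += sum(abs(a - b) for a, b in zip(prev[y][x], cur[y][x]))
            count += 3
    return total / count


def detect_end_points(frames: List[Frame],
                      threshold: float) -> Optional[Tuple[int, int]]:
    """Return (first, last) frame index whose motion exceeds threshold, or None."""
    box = lower_face_box(frames[0]) if frames else None
    if box is None:
        return None
    active = [i for i in range(1, len(frames))
              if motion(frames[i - 1], frames[i], box) > threshold]
    return (active[0], active[-1]) if active else None
```

Here the region is estimated once from the first frame and the threshold is supplied by the caller; a practical system would presumably track the face over time and smooth the motion signal before thresholding.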
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Speech Recognition / Speaking Section / Facial Image / Skin Color / End Point Detection
Paper # NLC2000-39,SP2000-87
Date of Issue

Conference Information
Committee NLC
Conference Date 2000/12/14 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) A Robust End Point Detection by Speaker's Facial Image
Sub Title (in English)
Keyword(1) Speech Recognition
Keyword(2) Speaking Section
Keyword(3) Facial Image
Keyword(4) Skin Color
Keyword(5) End Point Detection
1st Author's Name Kazumasa MURAI
1st Author's Affiliation ATR Spoken Language Translation Research Laboratories : Graduate School of Information Science, Nara Institute of Science and Technology
2nd Author's Name Keisuke NOMA
2nd Author's Affiliation Graduate School of Information Science, Nara Institute of Science and Technology
3rd Author's Name Kenichi KUMATANI
3rd Author's Affiliation ATR Spoken Language Translation Research Laboratories : Graduate School of Information Science, Nara Institute of Science and Technology
4th Author's Name Tomoko MATSUI
4th Author's Affiliation ATR Spoken Language Translation Research Laboratories
5th Author's Name Satoshi NAKAMURA
5th Author's Affiliation ATR Spoken Language Translation Research Laboratories
Date 2000/12/14
Paper # NLC2000-39,SP2000-87
Volume (vol) vol.100
Number (no) 520
Page pp.-
#Pages 6
Date of Issue