Presentation 2023-03-17
An Effect of Data Augmentation using 3D Models in Machine Lipreading on the Recognition Accuracy
Kazuma Kimura, Kenko Ota
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In this study, we investigate the use of a three-dimensional model of a speaker's face as a data augmentation method for machine lipreading, which estimates the content of speech from information about the mouth alone. Our previous research performed recognition word by word; here we also introduce recognition phoneme by phoneme, as in conventional continuous speech recognition. In the evaluation, we obtained an error rate of 0.2842 on data that was not converted to a three-dimensional model and whose speaker was included among the speakers of the training data. We obtained an error rate of 0.3290 on data that was not converted to a three-dimensional model and whose speaker was not included among the speakers of the training data. In future work, the amount of sentence data will need to be increased to improve the versatility of speech recognition.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Lipreading / 3D Models / Phoneme / Data augmentation
Paper # MICT2022-59
Date of Issue 2023-03-10 (MICT)

Conference Information
Committee EMCJ / MICT
Conference Date 2023/3/17 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English) Kikai-Shinko-Kaikan Bldg
Topics (in Japanese) (See Japanese page)
Topics (in English) Healthcare and Medical Information Communication Technologies, EMC, etc.
Chair Atsuhiro Nishikata(Tokyo Inst. of Tech.) / Hirokazu Tanaka(Hiroshima City Univ.)
Vice Chair Kimihiro Tajima(NTT-AT) / Chika Sugimoto(Yokohama National Univ.) / Daisuke Anzai(Nagoya Inst. of Tech.)
Secretary Kimihiro Tajima(Hokkaido Univ.) / Chika Sugimoto(Hitachi) / Daisuke Anzai(Okayama Pref. Univ.)
Assistant Kiyoto Matsushima(Hitachi) / Kenji Ogata(ADOX) / Toru Matsushima(Kyushu Inst. of Tech.) / Takahiro Ito(Hiroshima City Univ.) / Natsuki Nakayama(Nagoya Univ.) / Takuya Nishikawa(National Cerebral and Cardiovascular Center Hospital)

Paper Information
Registration To Technical Committee on Electromagnetic Compatibility / Technical Committee on Healthcare and Medical Information Communication Technology
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) An Effect of Data Augmentation using 3D Models in Machine Lipreading on the Recognition Accuracy
Sub Title (in English)
Keyword(1) Lipreading
Keyword(2) 3D Models
Keyword(3) Phoneme
Keyword(4) Data augmentation
1st Author's Name Kazuma Kimura
1st Author's Affiliation Nippon Institute of Technology(NIT)
2nd Author's Name Kenko Ota
2nd Author's Affiliation Nippon Institute of Technology(NIT)
Date 2023-03-17
Paper # MICT2022-59
Volume (vol) vol.122
Number (no) MICT-447
Page pp.17-21 (MICT)
#Pages 5
Date of Issue 2023-03-10 (MICT)