Presentation | 2023-03-17 An Effect of Data Augmentation using 3D Models in Machine Lipreading on the Recognition Accuracy Kazuma Kimura, Kenko Ota |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | In this study, we investigate the use of a three-dimensional model of a speaker's face as a data augmentation method for machine lipreading, which estimates the content of speech based only on information from the mouth region. In our previous research, recognition was performed word by word; here we also introduce phoneme-by-phoneme recognition, as in ordinary continuous speech recognition. In the evaluation, we obtained an error rate of 0.2842 on data in which the evaluation speaker was included among the training speakers and which was not converted to a three-dimensional model. We obtained an error rate of 0.3290 on data in which the evaluation speaker was not included among the training speakers and which was not converted to a three-dimensional model. In the future, more sentence data will be needed to improve the versatility of speech recognition. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Lipreading / 3D Models / Phoneme / Data augmentation |
Paper # | MICT2022-59 |
Date of Issue | 2023-03-10 (MICT) |
Conference Information | |
Committee | EMCJ / MICT |
---|---|
Conference Date | 2023/3/17 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Kikai-Shinko-Kaikan Bldg |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | Healthcare and Medical Information Communication Technologies, EMC, etc. |
Chair | Atsuhiro Nishikata(Tokyo Inst. of Tech.) / Hirokazu Tanaka(Hiroshima City Univ.) |
Vice Chair | Kimihiro Tajima(NTT-AT) / Chika Sugimoto(Yokohama National Univ.) / Daisuke Anzai(Nagoya Inst. of Tech.) |
Secretary | Kimihiro Tajima(Hokkaido Univ.) / Chika Sugimoto(Hitachi) / Daisuke Anzai(Okayama Pref. Univ.) |
Assistant | Kiyoto Matsushima(Hitachi) / Kenji Ogata(ADOX) / Toru Matsushima(Kyushu Inst. of Tech.) / Takahiro Ito(Hiroshima City Univ.) / Natsuki Nakayama(Nagoya Univ.) / Takuya Nishikawa(National Cerebral and Cardiovascular Center Hospital) |
Paper Information | |
Registration To | Technical Committee on Electromagnetic Compatibility / Technical Committee on Healthcare and Medical Information Communication Technology |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | An Effect of Data Augmentation using 3D Models in Machine Lipreading on the Recognition Accuracy |
Sub Title (in English) | |
Keyword(1) | Lipreading |
Keyword(2) | 3D Models |
Keyword(3) | Phoneme |
Keyword(4) | Data augmentation |
1st Author's Name | Kazuma Kimura |
1st Author's Affiliation | Nippon Institute of Technology(NIT) |
2nd Author's Name | Kenko Ota |
2nd Author's Affiliation | Nippon Institute of Technology(NIT) |
Date | 2023-03-17 |
Paper # | MICT2022-59 |
Volume (vol) | vol.122 |
Number (no) | MICT-447 |
Page | pp.17-21 (MICT) |
#Pages | 5 |
Date of Issue | 2023-03-10 (MICT) |