Presentation 2020-03-06
A Comparison Study of Neural Sign Language Translation Methods with Spatio-Temporal Features
Kodai Watanabe, Wataru Kameyama
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In Neural Sign Language Translation, a model combining a 2DCNN (2-Dimensional Convolutional Neural Network) called AlexNet with a Seq2Seq neural machine translation model has been proposed. In this model, temporal information is extracted by a GRU (Gated Recurrent Unit) from features whose spatial information has already been lost by the 2DCNN. However, since sign language relies on the position, shape and motion of the hands and fingers, a model that can extract temporal information from features that still contain spatial information seems more suitable. Therefore, in this paper, we propose and compare various methods that extract temporal information at the stage of extracting spatial features from each frame of the video. The comparison experiments with the various spatio-temporal feature extractors suggest that, on the dataset used in this experiment, the number of parameters to be optimized and the sign-language-translation performance are inversely proportional. This seems to be why the model using only Optical Flow, which has the fewest trainable parameters, shows the highest translation performance.
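As a minimal illustration (not the authors' code), the baseline pipeline the abstract criticizes — a 2DCNN collapsing each frame's spatial layout into a feature vector, then a GRU extracting temporal information — can be sketched in NumPy. The feature extractor here is a random-projection stand-in for AlexNet, and all dimensions are illustrative assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def frame_features(frame, d=64):
    # Stand-in for a 2DCNN (e.g. AlexNet) feature extractor:
    # flattens one frame and projects it to a d-dim vector,
    # so the spatial layout is lost at this point.
    rng = np.random.default_rng(0)
    w = rng.standard_normal((frame.size, d)) * 0.01
    return frame.reshape(-1) @ w

def gru_step(h, x, params):
    # One GRU step: update gate z, reset gate r, candidate state n.
    Wz, Uz, Wr, Ur, Wn, Un = params
    z = sigmoid(x @ Wz + h @ Uz)
    r = sigmoid(x @ Wr + h @ Ur)
    n = np.tanh(x @ Wn + (r * h) @ Un)
    return (1 - z) * h + z * n

# Toy video: T frames of H x W grayscale.
T, H, W, d, hdim = 8, 32, 32, 64, 32
video = np.random.default_rng(1).standard_normal((T, H, W))

# Spatial stage: one feature vector per frame.
feats = np.stack([frame_features(f, d) for f in video])   # shape (T, d)

# Temporal stage: GRU over the per-frame features.
rng = np.random.default_rng(2)
params = tuple(rng.standard_normal(s) * 0.1
               for s in [(d, hdim), (hdim, hdim)] * 3)
h = np.zeros(hdim)
for x in feats:
    h = gru_step(h, x, params)
print(h.shape)   # final encoder state, fed to the Seq2Seq decoder
```

The spatio-temporal variants compared in the paper would instead extract temporal information before or while the spatial structure is reduced (e.g. from Optical Flow fields), rather than after it has been flattened as above.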
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Neural Sign Language Translation / Spatio-temporal Features / DNN / Optical Flow
Paper # IMQ2019-68,IE2019-150,MVE2019-89
Date of Issue 2020-02-27 (IMQ, IE, MVE)

Conference Information
Committee IE / IMQ / MVE / CQ
Conference Date 2020/3/5(2days)
Place (in Japanese) (See Japanese page)
Place (in English) Kyushu Institute of Technology
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair Hideaki Kimata(NTT) / Toshiya Nakaguchi(Chiba Univ.) / Kenji Mase(Nagoya Univ.) / Hideyuki Shimonishi(NEC)
Vice Chair Kazuya Kodama(NII) / Keita Takahashi(Nagoya Univ.) / Mitsuru Maeda(Canon) / Kenya Uomori(Osaka Univ.) / Masayuki Ihara(NTT) / Jun Okamoto(NTT) / Takefumi Hiraguri(Nippon Inst. of Tech.)
Secretary Kazuya Kodama(NTT) / Keita Takahashi(NHK) / Mitsuru Maeda(Shizuoka Univ.) / Kenya Uomori(Sony Semiconductor Solutions) / Masayuki Ihara(Nagoya Univ.) / Jun Okamoto(NTT) / Takefumi Hiraguri(Nippon Inst. of Tech.)
Assistant Kyohei Unno(KDDI Research) / Norishige Fukushima(Nagoya Inst. of Tech.) / Hiroaki Kudo(Nagoya Univ.) / Masaru Tsuchida(NTT) / Keita Hirai(Chiba Univ.) / Satoshi Nishiguchi(Osaka Inst. of Tech.) / Masanori Yokoyama(NTT) / Shogo Fukushima(Univ. of Tokyo) / Chikara Sasaki(KDDI Research) / Yoshiaki Nishikawa(NEC) / Takuto Kimura(NTT)

Paper Information
Registration To Technical Committee on Image Engineering / Technical Committee on Image Media Quality / Technical Committee on Media Experience and Virtual Environment / Technical Committee on Communication Quality
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) A Comparison Study of Neural Sign Language Translation Methods with Spatio-Temporal Features
Sub Title (in English)
Keyword(1) Neural Sign Language Translation
Keyword(2) Spatio-temporal Features
Keyword(3) DNN
Keyword(4) Optical Flow
1st Author's Name Kodai Watanabe
1st Author's Affiliation Waseda University(Waseda Univ.)
2nd Author's Name Wataru Kameyama
2nd Author's Affiliation Waseda University(Waseda Univ.)
Date 2020-03-06
Paper # IMQ2019-68,IE2019-150,MVE2019-89
Volume (vol) vol.119
Number (no) IMQ-454,IE-456,MVE-457
Page pp.273-278(IMQ), pp.273-278(IE), pp.273-278(MVE)
#Pages 6
Date of Issue 2020-02-27 (IMQ, IE, MVE)