Detection of human boredom from video

Yuki Tachikawa; Atsushi Nakazawa

Presentation	2022-10-07 Detection of human boredom from video, Yuki Tachikawa (Kyoto Univ.), Atsushi Nakazawa (Kyoto Univ.)
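Abstract (Japanese, translated)	Estimating a person's internal state is an essential component of affective interactive systems. In recent years, interfaces that have users interact with agents have come into use in many settings, but it is not clear whether users actually like such interfaces. If a system could detect a user's boredom with the agent, it could respond, for example by changing the agent's behavior. This study therefore aims to recognize a user's state of boredom from facial images, and we conducted the following experiment. 31 participants carried out a conversation task with an agent on topics assumed to interest them (food, events) and topics assumed not to interest them (geometry, architecture), and video of their faces was recorded during the task. To absorb individual differences between faces and focus only on changes in expression, the face region is normalized in each input image and optical flow is computed. Applying two kinds of networks, a 2D-CNN and a 3D-CNN, to the resulting optical flow sequences for training and estimation gave accuracies of 60% (2D-CNN) and 54% (3D-CNN) for boredom (chance rate = 50%). Training and estimation of the subject's personality (extraversion) and receptiveness to new technology gave 58% (2D-CNN) and 39% (3D-CNN) (chance rate = 25%). These results show that learning the motion of facial expressions with a CNN may make it possible to estimate boredom and personality, and visualizing the trained networks made it possible to estimate which regions serve as the basis for discrimination.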
Abstract (English)	Prediction of an individual's internal state is an essential element of future affective interactive systems. Although user interfaces that use agents (avatars) are becoming popular in various application fields, it is not clear whether users prefer interacting with the agents. If the system can detect a user's boredom with the agent, it can change the agent's behavior to counter that boredom. In this study, we developed an algorithm to recognize the user's 'boredom' from facial images. 31 participants were asked to perform a conversational task with the agent on topics that they were assumed to be interested in (e.g. food and events) and not interested in (e.g. geometry and architecture), and videos of their faces were recorded. For each video, we detected the facial parts, normalized the facial regions in the input images, and computed optical flow. For recognition, two types of networks, a 2D-CNN and a 3D-CNN, were developed. As a result, the recognition rates for boredom were 60% (2D-CNN) and 54% (3D-CNN) (chance rate = 50%). Moreover, we trained the same networks to identify four types of user personality. The resulting accuracies were 33% (2D-CNN) and 39% (3D-CNN) (chance rate = 25%). These results indicate that learning facial expression movements with a CNN can be used to estimate boredom and personality, and that visualization of the trained network can be used to estimate which regions are the basis for discrimination.
Keywords (Japanese)	表情解析 / 飽き / パーソナリティ
Keywords (English)	facial expression analysis / boredom / personality
Report number	MVE2022-27
Publication date	2022-09-29 (MVE)

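Meeting information
Committee	MVE / VRSJ-SIG-MR / IPSJ-EC / HI-SIG-DeMO / VRSJ-SIG-CS
Dates	2022/10/6 (two-day meeting starting on this date)
Venue (Japanese)	Akanko Marimu-kan (阿寒湖まりむ館), Kushiro, Hokkaido (tentative) + online
Venue (English)
Theme (Japanese)	Fundamentals and applications of AR/MR technology, human interface technology, and media information processing technology
Theme (English)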
Chair (Japanese)	清川清(奈良先端大)
Chair (English)	Kiyoshi Kiyokawa (NAIST)
Vice chair (Japanese)	新井田統(KDDI総合研究所)
Vice chair (English)	Sumaru Niida (KDDI Research)
Secretaries (Japanese)	磯山直也(奈良先端大) / 原豪紀(慶大/大日本印刷) / 福嶋政期(東大) / 後藤充裕(NTT)
Secretaries (English)	Naoya Isoyama (NAIST) / Takenori Hara (DNP) / Shogo Fukushima (Univ. of Tokyo) / Mitsuhiro Goto (NTT)
Assistant secretaries (Japanese)	宍戸英彦(筑波大) / 中澤篤志(京大) / 東條直也(KDDI総合研究所) / 萩山直紀(NTT)
Assistant secretaries (English)	Hidehiko Shishido (Univ. of Tsukuba) / Atsushi Nakazawa (Kyoto Univ.) / Naoya Tojo (KDDI Research) / Naoki Hagiyama (NTT)

Paper details
Registered committees	Technical Committee on Media Experience and Virtual Environment / SIG-MR / Special Interest Group on Entertainment Computing / Special Interest Group on Device Media Oriented UI / SIG-CS
Language of text	JPN
Title (Japanese)	動画像からの人の飽きの検出
Subtitle (Japanese)
Title (English)	Detection of human boredom from video
Subtitle (English)
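Keyword (1) (Japanese/English)	表情解析 / facial expression analysis
Keyword (2) (Japanese/English)	飽き / boredom
Keyword (3) (Japanese/English)	パーソナリティ / personality
First author (Japanese/English)	立川悠輝 / Yuki Tachikawa
First author affiliation (Japanese/English)	京都大学 (abbr. 京大) / Kyoto University (abbr. Kyoto Univ.)
Second author (Japanese/English)	中澤篤志 / Atsushi Nakazawa
Second author affiliation (Japanese/English)	京都大学 (abbr. 京大) / Kyoto University (abbr. Kyoto Univ.)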
Presentation date	2022-10-07
Report number	MVE2022-27
Volume (vol)	vol.122
Issue (no)	MVE-200
Page range	pp.52-56 (MVE)
Number of pages	5
Publication date	2022-09-29 (MVE)