Presentation 2008-11-27
A Realtime Multimodal System toward Multiparty Conversation Scene Analysis : Integrating Face Pose Tracking and Speaker Diarization using Multimodal Omnidirectional Sensors
Kazuhiro Otsuka, Shoko Araki, Kentaro Ishizuka, Masakiyo Fujimoto, Junji Yamato
Abstract(in Japanese) (See Japanese page)
Abstract(in English) This paper presents a realtime system for analyzing group meetings that uses a novel omnidirectional camera-microphone system. The goal is to automatically discover the visual focus of attention (VFOA), i.e., "who is looking at whom", in addition to performing speaker diarization, i.e., "who is speaking and when". First, a novel sensing device is presented; it consists of two cameras with two fisheye lenses and a microphone array. Second, from the omnidirectional images captured by the cameras, the position and pose of people's faces are estimated by STCTracker (Sparse Template Condensation Tracker), which achieves realtime tracking by utilizing GPUs (Graphics Processing Units). The face position/pose data is used to estimate the focus of attention in the group. Using the microphone array, robust speaker diarization is carried out by means of VAD (Voice Activity Detection) and DOA (Direction of Arrival) estimation. This paper also presents new 3-D visualization schemes for the analysis results. Using two PCs, one for vision and one for audio processing, the system runs at 27.1 frames/sec on average for 5-person meetings.
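As a rough illustration of the diarization step described in the abstract, the sketch below labels a frame with a speaker by combining a VAD flag with a DOA azimuth. It is not the authors' method: the abstract does not say how a DOA estimate is mapped to a participant, so matching it to the nearest tracked face azimuth is an assumption made here, and the names `angular_diff`, `assign_speaker`, `face_azimuths`, and `tolerance_deg` are hypothetical.

```python
from typing import Dict, Optional


def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def assign_speaker(
    voice_active: bool,
    doa_deg: float,
    face_azimuths: Dict[str, float],
    tolerance_deg: float = 20.0,  # hypothetical matching tolerance
) -> Optional[str]:
    """Return the participant nearest to the DOA, or None if nobody matches.

    Assumption: each participant's azimuth around the sensor is known,
    for example from face tracking on the omnidirectional images.
    """
    # No speech detected in this frame, or nobody is being tracked.
    if not voice_active or not face_azimuths:
        return None
    # Pick the participant whose azimuth is closest to the sound direction.
    nearest = min(
        face_azimuths,
        key=lambda name: angular_diff(doa_deg, face_azimuths[name]),
    )
    # Reject the match if the sound comes from too far off any face.
    if angular_diff(doa_deg, face_azimuths[nearest]) > tolerance_deg:
        return None
    return nearest


if __name__ == "__main__":
    # Five participants seated evenly around the sensor (illustrative values).
    faces = {"A": 0.0, "B": 72.0, "C": 144.0, "D": 216.0, "E": 288.0}
    print(assign_speaker(True, 350.0, faces))
    print(assign_speaker(False, 350.0, faces))
```

Running the function once per frame yields a "who is speaking and when" sequence; a practical system would likely smooth these labels over time rather than trust single frames.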
Keyword(in Japanese) (See Japanese page)
Keyword(in English)
Paper # PRMU2008-119,MVE2008-68
Date of Issue

Conference Information
Committee MVE
Conference Date 2008/11/20 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Media Experience and Virtual Environment (MVE)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) A Realtime Multimodal System toward Multiparty Conversation Scene Analysis : Integrating Face Pose Tracking and Speaker Diarization using Multimodal Omnidirectional Sensors
Sub Title (in English)
Keyword(1)
1st Author's Name Kazuhiro Otsuka
1st Author's Affiliation NTT Communication Science Laboratories, NIPPON TELEGRAPH AND TELEPHONE CORPORATION
2nd Author's Name Shoko Araki
2nd Author's Affiliation NTT Communication Science Laboratories, NIPPON TELEGRAPH AND TELEPHONE CORPORATION
3rd Author's Name Kentaro Ishizuka
3rd Author's Affiliation NTT Communication Science Laboratories, NIPPON TELEGRAPH AND TELEPHONE CORPORATION
4th Author's Name Masakiyo Fujimoto
4th Author's Affiliation NTT Communication Science Laboratories, NIPPON TELEGRAPH AND TELEPHONE CORPORATION
5th Author's Name Junji Yamato
5th Author's Affiliation NTT Communication Science Laboratories, NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Date 2008-11-27
Paper # PRMU2008-119,MVE2008-68
Volume (vol) vol.108
Number (no) 328
Page pp.-
#Pages 8
Date of Issue