Presentation | 2008-11-27 A Realtime Multimodal System toward Multiparty Conversation Scene Analysis : Integrating Face Pose Tracking and Speaker Diarization using Multimodal Omnidirectional Sensors Kazuhiro Otsuka, Shoko Araki, Kentaro Ishizuka, Masakiyo Fujimoto, Junji Yamato |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | This paper presents a realtime system for analyzing group meetings that uses a novel omnidirectional camera-microphone system. The goal is to automatically discover the visual focus of attention (VFOA), i.e., "who is looking at whom", in addition to performing speaker diarization, i.e., "who is speaking and when". First, a novel sensing device is presented; it consists of two cameras with two fisheye lenses and a microphone array. Second, from the omnidirectional images captured by the cameras, the positions and poses of people's faces are estimated by STCTracker (Sparse Template Condensation Tracker), which achieves realtime tracking by utilizing GPUs (Graphics Processing Units). The face position/pose data are used to estimate the focus of attention in the group. Using the microphone array, robust speaker diarization is carried out by voice activity detection (VAD) and direction of arrival (DOA) estimation. This paper also presents new 3-D visualization schemes for the analysis results. Using two PCs, one for vision and one for audio processing, the system runs at 27.1 frames/sec on average for 5-person meetings. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | |
Paper # | PRMU2008-119,MVE2008-68 |
Date of Issue |
Conference Information | |
Committee | MVE |
---|---|
Conference Date | 2008/11/20 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Media Experience and Virtual Environment (MVE) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | A Realtime Multimodal System toward Multiparty Conversation Scene Analysis : Integrating Face Pose Tracking and Speaker Diarization using Multimodal Omnidirectional Sensors |
Sub Title (in English) | |
Keyword(1) | |
1st Author's Name | Kazuhiro Otsuka |
1st Author's Affiliation | NTT Communication Science Laboratories, NIPPON TELEGRAPH AND TELEPHONE CORPORATION |
2nd Author's Name | Shoko Araki |
2nd Author's Affiliation | NTT Communication Science Laboratories, NIPPON TELEGRAPH AND TELEPHONE CORPORATION |
3rd Author's Name | Kentaro Ishizuka |
3rd Author's Affiliation | NTT Communication Science Laboratories, NIPPON TELEGRAPH AND TELEPHONE CORPORATION |
4th Author's Name | Masakiyo Fujimoto |
4th Author's Affiliation | NTT Communication Science Laboratories, NIPPON TELEGRAPH AND TELEPHONE CORPORATION |
5th Author's Name | Junji Yamato |
5th Author's Affiliation | NTT Communication Science Laboratories, NIPPON TELEGRAPH AND TELEPHONE CORPORATION |
Date | 2008-11-27 |
Paper # | PRMU2008-119,MVE2008-68 |
Volume (vol) | vol.108 |
Number (no) | 328 |
Page | pp.- |
#Pages | 8 |
Date of Issue |