Presentation | 2008-12-09 Speaker diarization of multi-party conversations based on audio and visual information integration Kentaro ISHIZUKA, Shoko ARAKI, Kazuhiro OTSUKA, Masakiyo FUJIMOTO, Tomohiro NAKATANI |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | This paper proposes a speaker diarization method that detects "who spoke when" in multi-party conversations by probabilistically integrating audio and visual information. The audio and visual information is captured by a compact system designed to analyze multi-party conversations, consisting of two cameras with fisheye lenses and a triangular array of three microphones. To realize speaker diarization, the proposed method uses the probability distributions of speech presence, speaker locations, and participants' face locations, obtained from a speech activity detector, a direction-of-arrival-based speaker location detector, and a face tracker, respectively. An experiment using real casual conversations revealed the advantages of this integration. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Multi-party conversation analysis / Speaker diarization / Multimodal systems |
Paper # | NLC2008-28,SP2008-83 |
Date of Issue |
Conference Information | |
Committee | NLC |
---|---|
Conference Date | 2008/12/2 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Natural Language Understanding and Models of Communication (NLC) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Speaker diarization of multi-party conversations based on audio and visual information integration |
Sub Title (in English) | |
Keyword(1) | Multi-party conversation analysis |
Keyword(2) | Speaker diarization |
Keyword(3) | Multimodal systems |
1st Author's Name | Kentaro ISHIZUKA |
1st Author's Affiliation | NTT Communication Science Laboratories, NTT Corporation |
2nd Author's Name | Shoko ARAKI |
2nd Author's Affiliation | NTT Communication Science Laboratories, NTT Corporation |
3rd Author's Name | Kazuhiro OTSUKA |
3rd Author's Affiliation | NTT Communication Science Laboratories, NTT Corporation |
4th Author's Name | Masakiyo FUJIMOTO |
4th Author's Affiliation | NTT Communication Science Laboratories, NTT Corporation |
5th Author's Name | Tomohiro NAKATANI |
5th Author's Affiliation | NTT Communication Science Laboratories, NTT Corporation |
Date | 2008-12-09 |
Paper # | NLC2008-28,SP2008-83 |
Volume (vol) | vol.108 |
Number (no) | 337 |
Page | pp.- |
#Pages | 6 |
Date of Issue |