Presentation 2008-12-09
Speaker diarization of multi-party conversations based on audio and visual information integration
Kentaro ISHIZUKA, Shoko ARAKI, Kazuhiro OTSUKA, Masakiyo FUJIMOTO, Tomohiro NAKATANI
Abstract (in English) This paper proposes a speaker diarization method that detects "who spoke when" in multi-party conversations by probabilistically integrating audio and visual information. The audio and visual information is obtained from a compact system designed for analyzing multi-party conversations, consisting of two cameras with fisheye lenses and a triangular array of three microphones. To perform speaker diarization, the proposed method uses the probability distributions of speech presence, speaker locations, and participants' face locations, obtained with a speech activity detector, a direction-of-arrival (DOA) based speaker location detector, and a face tracker, respectively. An experiment using real casual conversations demonstrated the advantages of this integration.
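The record does not give the paper's actual probabilistic model, so the following is only a minimal sketch of how the three cues named in the abstract could be combined for a single frame. It assumes a frame-level speech-presence probability, a DOA posterior over a few azimuth bins, and one tracked face azimuth per participant, each modeled here with a hypothetical Gaussian of width `sigma_deg`. The functions `face_likelihood`, `speaker_probabilities`, and `who_spoke`, as well as the threshold and parameter values, are illustrative assumptions and not the authors' method.

```python
import math
from typing import List, Optional, Sequence


def face_likelihood(theta_deg: float, face_deg: float, sigma_deg: float) -> float:
    """Hypothetical Gaussian score that a sound arriving from theta_deg
    comes from a face tracked at face_deg (angles wrap at 360 degrees)."""
    diff = (theta_deg - face_deg + 180.0) % 360.0 - 180.0  # wrapped difference in [-180, 180)
    return math.exp(-0.5 * (diff / sigma_deg) ** 2)


def speaker_probabilities(p_speech: float,
                          doa_posterior: Sequence[float],
                          bin_angles: Sequence[float],
                          face_angles: Sequence[float],
                          sigma_deg: float = 15.0) -> List[float]:
    """Assumed fusion rule: P(participant k speaks) =
    P(speech) * normalized sum over DOA bins of P(theta) * face score_k(theta)."""
    weights = [
        sum(p * face_likelihood(a, f, sigma_deg) for p, a in zip(doa_posterior, bin_angles))
        for f in face_angles
    ]
    total = sum(weights)
    if total <= 0.0:
        return [0.0] * len(face_angles)
    return [p_speech * w / total for w in weights]


def who_spoke(p_speech: float,
              doa_posterior: Sequence[float],
              bin_angles: Sequence[float],
              face_angles: Sequence[float],
              sigma_deg: float = 15.0,
              threshold: float = 0.5) -> Optional[int]:
    """Return the index of the most probable active participant,
    or None if no participant exceeds the (assumed) decision threshold."""
    probs = speaker_probabilities(p_speech, doa_posterior, bin_angles, face_angles, sigma_deg)
    if not probs:
        return None
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None
```

Applying `who_spoke` frame by frame yields a "who spoke when" label sequence. Multiplying by the speech-presence probability means that a confident DOA or face cue alone cannot produce a speaker label during silence. This is one simple way to realize the integration the abstract describes, not necessarily the paper's.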
Keywords (in English) Multi-party conversation analysis / Speaker diarization / Multimodal systems
Paper # NLC2008-28,SP2008-83

Conference Information
Committee NLC
Conference Date 2008/12/2 (1 day)

Paper Information
Registered to Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in English) Speaker diarization of multi-party conversations based on audio and visual information integration
Keyword(1) Multi-party conversation analysis
Keyword(2) Speaker diarization
Keyword(3) Multimodal systems
1st Author's Name Kentaro ISHIZUKA
1st Author's Affiliation NTT Communication Science Laboratories, NTT Corporation
2nd Author's Name Shoko ARAKI
2nd Author's Affiliation NTT Communication Science Laboratories, NTT Corporation
3rd Author's Name Kazuhiro OTSUKA
3rd Author's Affiliation NTT Communication Science Laboratories, NTT Corporation
4th Author's Name Masakiyo FUJIMOTO
4th Author's Affiliation NTT Communication Science Laboratories, NTT Corporation
5th Author's Name Tomohiro NAKATANI
5th Author's Affiliation NTT Communication Science Laboratories, NTT Corporation
Date 2008-12-09
Paper # NLC2008-28,SP2008-83
Volume (vol) vol.108
Number (no) 337
#Pages 6