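Sound Source Detection by Learning (Theme Session: Machine Learning and Optimization for Computer Vision and Pattern Recognition, General)

Chihiro IKEDA; Yaokai FENG; Seiichi UCHIDA

Presentation title	2010-09-05 Sound Source Detection by Learning (Theme Session: Machine Learning and Optimization for Computer Vision and Pattern Recognition, General) Chihiro IKEDA, Yaokai FENG, Seiichi UCHIDA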
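Abstract (Japanese)	This paper aims at more accurate integrated analysis of images and sound by narrowing the target task. Specifically, the sound source identification problem is treated within the framework of classifier learning with AdaBoost. Here, an AdaBoost classifier obtains its recognition result by a weighted majority vote over the outputs of multiple classifiers (called weak classifiers). For training, positive examples (pixels corresponding to a sound source in the video) and negative examples (pixels corresponding to non-source regions) are prepared, and AdaBoost uses them to construct the weak classifiers so that the misrecognition rate is minimized. The sound source is then identified by the weighted majority vote over the outputs of the resulting weak classifiers. Combining weak classifiers based on image information with weak classifiers based on audio information makes accurate sound source identification possible. However, as a property peculiar to sound source identification, even when multimodal information (sound and image) is available, the audio information may contribute almost nothing to classification, depending on how it is used. This report confirms that situation and proposes a way to resolve it.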
Abstract (English)	Sound source detection in an image is a difficult inverse problem in which the pixels belonging to the sound source area are to be estimated. The purpose of this paper is to consider an accurate sound source detection method using a machine learning framework. Specifically, the proposed method relies on an AdaBoost-based learning scheme for discriminating whether each pixel belongs to a sound source or not. The learning is done by training weak learners to discriminate positive samples (pairs of image features around sound sources and audio features) from negative samples (pairs of image features distant from sound sources and audio features). This learning scheme simply combines the multimodal information (i.e., image and audio): some weak learners discriminate the samples by a single image feature and others by a single audio feature. The performance of this naive implementation, based on a simple combination of multimodal information, was examined experimentally, and its essential problem was revealed together with a possible remedy.
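The weighted majority vote described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the threshold-stump weak learner, the function names (`train_stump`, `adaboost_train`, `adaboost_predict`), and the toy data are all assumptions made for the example. Each sample is a hypothetical pair `[image_feature, audio_feature]`, and each weak learner looks at a single feature, as the abstract describes.

```python
import math

def stump_predict(row, j, t, polarity):
    """Single-feature weak learner: vote `polarity` if feature j exceeds t."""
    return polarity if row[j] > t else -polarity

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: one below the minimum, then midpoints.
        thresholds = [values[0] - 1.0]
        thresholds += [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, t, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best

def adaboost_train(X, y, rounds):
    """Discrete AdaBoost: returns a list of (alpha, feature, threshold, polarity)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, j, t, p = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, t, p))
        # Raise the weight of misclassified samples, lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, t, p))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Weighted majority vote of the weak learners: +1 = source, -1 = non-source."""
    score = sum(a * stump_predict(row, j, t, p) for a, j, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data (assumed): feature 0 is an image feature, feature 1 an audio feature.
# The audio value is shared by every pixel of the same frame.
X = [[0.90, 0.7], [0.80, 0.7], [0.20, 0.7], [0.10, 0.7],   # frame 1
     [0.85, 0.3], [0.75, 0.3], [0.30, 0.3], [0.15, 0.3]]   # frame 2
y = [1, 1, -1, -1, 1, 1, -1, -1]

ensemble = adaboost_train(X, y, rounds=3)
predictions = [adaboost_predict(ensemble, row) for row in X]
features_used = {j for _, j, _, _ in ensemble}
```

In this toy data the audio value is identical for positive and negative pixels of the same frame, so no audio stump can do better than chance and `features_used` contains only the image feature. That is one way audio information could end up contributing nothing to the vote; the abstract does not state whether this is the cause examined in the report.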
Keywords (Japanese)	音源同定 / 学習 / AdaBoost
Keywords (English)	sound source detection / learning / AdaBoost
Report number	PRMU2010-69, IBISML2010-41

Workshop information
Workshop	PRMU
Dates	2010/8/29 (held for 1 day)

Paper details
Workshop applied to	Pattern Recognition and Media Understanding (PRMU)
Language of text	JPN
Title (Japanese)	学習による映像中の音源同定 (Theme Session: Machine Learning and Optimization for Computer Vision and Pattern Recognition, General)
Title (English)	Sound Source Detection by Learning
Keyword (1) (Japanese/English)	音源同定 / sound source detection
Keyword (2) (Japanese/English)	学習 / learning
Keyword (3) (Japanese/English)	AdaBoost / AdaBoost
Author 1 name (Japanese/English)	池田千廣 / Chihiro IKEDA
Author 1 affiliation	Graduate School of Information Science and Electrical Engineering, Kyushu University
Author 2 name (Japanese/English)	フォンヤオカイ / Yaokai FENG
Author 2 affiliation	Faculty of Information Science and Electrical Engineering, Kyushu University
Author 3 name (Japanese/English)	内田誠一 / Seiichi UCHIDA
Author 3 affiliation	Faculty of Information Science and Electrical Engineering, Kyushu University
Presentation date	2010-09-05
Report number	PRMU2010-69, IBISML2010-41
Volume (vol)	vol.110
Issue (no)	187
Page range	pp.-
Number of pages	6