二次形式の大域的最適化によるクラスタリング(第15回情報論的学習理論ワークショップ)

広瀬 俊亮

講演名	2012-11-07 二次形式の大域的最適化によるクラスタリング(第15回情報論的学習理論ワークショップ) 広瀬俊亮,
PDFダウンロードページ	PDFダウンロードページへ
抄録(和)	本稿では、パラメータの推定も含めて大域的な最適解を得ることが出来るクラスタリング手法を提案する。クラスタリングを実行する際の困難として問題を大域的な最適化として定式化することが挙げられる。複数のクラスターへのインディケーターを求めるという問題をSpectral Clusteringのように固有値問題として定式化する場合、固有ベクトルの成分の符号が一定でないために負の確率(データ点のクラスターへの帰属確率)が現れるという問題がある。この為、大域的な最適解が得られても負の確率を抑制する為の処理が必要となり、最終的に得られる解は大域的な最適解ではなくなってしまう。一方で分布の形を仮定すれば凸最適化として定式化できるが、クラスタリング結果が仮定した分布の形に強く依存するのでその方法は実用的ではない。本稿ではこれらの困難を回避し大域的最適解を得られるクラスタリング手法を提案する。核となるアイデアは以下の通りである。第一に、分布の形を仮定しない。第二に、情報量の最大化という指標を採用する。第三に、クラスタリング問題を固有値問題と凸二次最適化とを用いて定式化する。固有値問題において確率そのものではなく確率振幅を固有ベクトルに対応させる。確率振幅とは、二乗すると確率になるものとして定義される。これにより、負の確率の発生を回避しつつ大域的な最適解が得られる。
抄録(英)	This paper addresses the issue of constructing a clustering formulation by which we can derive a global optimum including parameter values. When dealing with clustering problems, it is difficult to formulate them as global optimization. When we formulate clustering problems, where we derive multiple cluster indicators, as eigen value problem like Spectral Clustering, we face a difficulty that negative probabilities appear. This is because components of eigen vectors have not fixed sign. Thus it is necessary to conduct post processing for suppressing the negetive components. Due to the additional processing, final results are not global optimum though the solutions without processing are global optimum. On the other hand, it is possible to construct convex clustering formulations if we assume some probability distibutions. However formulations with probability distribution assumption are not very effective. This is because clustering results strongly depend on the assumed distributions. In this paper, we propose a clustering method, by which the above mentioned difficulties can be avoided. The key ideas are as follows. First, we do not assume any distributions. Second, we adopt information maximization criterion. Third, we formulate a clustering problem as a combination of eigen value problem and convex quadratic programming. In the eigen value problem, we do not recognize eigen vectors as probalities but recognize them as probability amplitude. It is defined that the square of probability amplitude is equal to probability. By adopting probability amplitudes, we can avoid the negative probabilities and thus we can construct global optimization formulation.
キーワード(和)	クラスタリング / 大域的最適化 / 固有値問題 / 確率振幅 / 二次計画 / 情報量最大化
キーワード(英)	clusteirng / global optimization / eigen value problem / probability amplitude / quadratic form / information maximization
資料番号	IBISML2012-41
発行日

研究会情報
研究会	IBISML
開催期間	2012/10/31(から1日開催)
開催地（和）
開催地（英）
テーマ（和）
テーマ（英）
委員長氏名（和）
委員長氏名（英）
副委員長氏名（和）
副委員長氏名（英）
幹事氏名（和）
幹事氏名（英）
幹事補佐氏名（和）
幹事補佐氏名（英）

講演論文情報詳細
申込み研究会	Information-Based Induction Sciences and Machine Learning (IBISML)
本文の言語	JPN
タイトル（和）	二次形式の大域的最適化によるクラスタリング(第15回情報論的学習理論ワークショップ)
サブタイトル（和）
タイトル（英）	Clustering Method based on Global Optimization of Quadratic Forms
サブタイトル（和）
キーワード(1)（和/英）	クラスタリング / clusteirng
キーワード(2)（和/英）	大域的最適化 / global optimization
キーワード(3)（和/英）	固有値問題 / eigen value problem
キーワード(4)（和/英）	確率振幅 / probability amplitude
キーワード(5)（和/英）	二次計画 / quadratic form
キーワード(6)（和/英）	情報量最大化 / information maximization
第 1 著者氏名（和/英）	広瀬俊亮 / Shunsuke HIROSE
第 1 著者所属（和/英）	SAS Institute Japan株式会社コンサルティングサービス部 Consulting Services Department, SAS Institute Japan Ltd.
発表年月日	2012-11-07
資料番号	IBISML2012-41
巻番号（vol）	vol.112
号番号（no）	279
ページ範囲	pp.-
ページ数	6
発行日