Presentation 2014-11-18
Learning from Positive and Unlabeled Data 1 : Classifier Training and Theoretical Analysis
Abstract (Japanese)
Abstract (English) Learning a classifier from positive and unlabeled data is an important class of classification problems that are conceivable in many practical applications. In this paper, we first show that this problem can be solved by cost-sensitive learning between positive and unlabeled data. Then we reveal that convex surrogate loss functions such as the hinge loss may lead to a wrong classification boundary due to an intrinsic bias, and show that the use of non-convex loss functions such as the ramp loss is essential to avoid this problem. We next analyze the excess risk when the class prior is estimated from data, and show that the classification accuracy is not sensitive to class prior estimation if the unlabeled data is dominated by the positive data (this is naturally satisfied in inlier-based outlier detection because inliers are dominant in the unlabeled dataset). Finally, we provide generalization error bounds and show that, for an equal number of labeled and unlabeled samples, the generalization error of learning only from positive and unlabeled samples is no worse than 2√2 times the fully supervised case. These theoretical findings are also validated through experiments.
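The contrast the abstract draws between the hinge loss and the ramp loss can be illustrated with a small sketch. The abstract itself gives no formulas, so the scaled definitions below (each loss multiplied by 1/2) are assumptions, and the helper name `symmetry_gap` is hypothetical. Under these assumptions the ramp loss satisfies loss(z) + loss(-z) = 1 for every margin z, while the hinge loss satisfies it only for |z| <= 1. A surplus of this kind outside that range is one plausible reading of the "intrinsic bias" attributed to convex surrogates.

```python
def hinge_loss(z):
    """Scaled hinge loss, 0.5 * max(0, 1 - z). The 0.5 scaling is an assumption."""
    return 0.5 * max(0.0, 1.0 - z)


def ramp_loss(z):
    """Scaled ramp loss, 0.5 * max(0, min(2, 1 - z)). The 0.5 scaling is an assumption."""
    return 0.5 * max(0.0, min(2.0, 1.0 - z))


def symmetry_gap(loss, z):
    """Return loss(z) + loss(-z) - 1 (hypothetical helper).

    A gap of zero means the loss on a point counted as positive and the
    loss on the same point counted as negative sum to exactly one.
    """
    return loss(z) + loss(-z) - 1.0


if __name__ == "__main__":
    # Compare the two losses at a few margins, inside and outside [-1, 1].
    for z in (0.0, 0.5, 1.0, 2.0, 3.0):
        print(z, symmetry_gap(ramp_loss, z), symmetry_gap(hinge_loss, z))
```

In this sketch the ramp-loss gap stays at zero for every margin, whereas the hinge-loss gap grows once |z| exceeds 1, so a hinge-based objective would carry an extra margin-dependent term.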
Keywords (Japanese)
Keywords (English) Classification / positive and unlabeled learning / class-prior estimation
Report number IBISML2014-65
Publication date

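Technical committee information
Technical committee IBISML
Dates 2014/11/10 (one-day meeting starting on this date)
Venue (Japanese)
Venue (English)
Theme (Japanese)
Theme (English)
Chair (Japanese)
Chair (English)
Vice chair (Japanese)
Vice chair (English)
Secretary (Japanese)
Secretary (English)
Assistant secretary (Japanese)
Assistant secretary (English)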

Paper details
Submitted to technical committee Information-Based Induction Sciences and Machine Learning (IBISML)
Language of text ENG
Title (Japanese)
Subtitle (Japanese)
Title (English) Learning from Positive and Unlabeled Data 1 : Classifier Training and Theoretical Analysis
Subtitle (English)
Keyword (1) (Japanese/English) / Classification
First author name (Japanese/English) / PLESSIS Marthinus Christoffel DU
First author affiliation (Japanese/English)
Department of Complexity Science and Engineering, University of Tokyo
Presentation date 2014-11-18
Report number IBISML2014-65
Volume (vol) vol.114
Issue (no) 306
Page range pp.-
Number of pages 7
Publication date