Learning from Positive and Unlabeled Data 1 : Classifier Training and Theoretical Analysis

Presentation	2014-11-18 Learning from Positive and Unlabeled Data 1 : Classifier Training and Theoretical Analysis PLESSIS Marthinus Christoffel DU, Gang NIU, Masashi SUGIYAMA
Abstract(in English)	Learning a classifier from positive and unlabeled data is an important class of classification problems that arises in many practical applications. In this paper, we first show that this problem can be solved by cost-sensitive learning between positive and unlabeled data. We then reveal that convex surrogate loss functions such as the hinge loss may lead to a wrong classification boundary due to an intrinsic bias, and show that the use of non-convex loss functions such as the ramp loss is essential to avoid this problem. We next analyze the excess risk when the class prior is estimated from data, and show that the classification accuracy is not sensitive to class-prior estimation if the unlabeled data is dominated by the positive data (this is naturally satisfied in inlier-based outlier detection because inliers are dominant in the unlabeled dataset). Finally, we provide generalization error bounds and show that, for an equal number of labeled and unlabeled samples, the generalization error of learning only from positive and unlabeled samples is no worse than 2√2 times that of the fully supervised case. These theoretical findings are also validated through experiments.
Keyword(in English)	Classification / positive and unlabeled learning / class-prior estimation
Paper #	IBISML2014-65
Date of Issue
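
The abstract's point about convex versus non-convex losses can be illustrated with a small numerical sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the scaled ramp loss ℓ(z) = ½·max(0, min(2, 1 − z)), the scaled hinge loss ℓ(z) = ½·max(0, 1 − z), a cost-sensitive PU risk of the form 2π·E_P[ℓ(g)] + E_U[ℓ(−g)] − π, and the toy scores and class prior. Under these assumptions the ramp loss satisfies ℓ(z) + ℓ(−z) = 1, so the PU risk coincides with the ordinary supervised risk, whereas the hinge loss leaves an extra positive term.

```python
def ramp_loss(z):
    """Scaled ramp loss (assumed form): 0.5 * max(0, min(2, 1 - z))."""
    return 0.5 * max(0.0, min(2.0, 1.0 - z))


def hinge_loss(z):
    """Scaled hinge loss (assumed form): 0.5 * max(0, 1 - z)."""
    return 0.5 * max(0.0, 1.0 - z)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def supervised_risk(scores_pos, scores_neg, prior, loss):
    """Ordinary risk: pi * E_P[loss(g)] + (1 - pi) * E_N[loss(-g)]."""
    pos_term = mean(loss(s) for s in scores_pos)
    neg_term = mean(loss(-s) for s in scores_neg)
    return prior * pos_term + (1.0 - prior) * neg_term


def pu_risk(scores_pos, scores_unl, prior, loss):
    """Cost-sensitive PU risk (assumed form):
    2 * pi * E_P[loss(g)] + E_U[loss(-g)] - pi.
    """
    pos_term = mean(loss(s) for s in scores_pos)
    unl_term = mean(loss(-s) for s in scores_unl)
    return 2.0 * prior * pos_term + unl_term - prior


# Hypothetical classifier outputs g(x); the values are made up.
scores_pos = [2.0, 0.5, -0.5, 3.0]
scores_neg = [-2.0, -0.5, 1.0, -3.0]
prior = 0.5  # assumed class prior pi

# Unlabeled sample built as an exact 50/50 mixture of both classes,
# so that it matches the assumed prior.
scores_unl = scores_pos + scores_neg

for name, loss in (("ramp", ramp_loss), ("hinge", hinge_loss)):
    gap = (pu_risk(scores_pos, scores_unl, prior, loss)
           - supervised_risk(scores_pos, scores_neg, prior, loss))
    print(f"{name}: PU risk minus supervised risk = {gap:.4f}")
```

In this toy setting the gap for the hinge loss comes only from positive samples whose score magnitude exceeds 1, where ℓ(z) + ℓ(−z) is greater than 1. This is one way to read the "intrinsic bias" mentioned in the abstract, though the paper itself should be consulted for the exact formulation.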

Conference Information
Committee	IBISML
Conference Date	2014/11/10 (1 day)

Paper Information
Registration To	Information-Based Induction Sciences and Machine Learning (IBISML)
Language	ENG
Title (in English)	Learning from Positive and Unlabeled Data 1 : Classifier Training and Theoretical Analysis
Sub Title (in English)
Keyword(1)	Classification
Keyword(2)	positive and unlabeled learning
Keyword(3)	class-prior estimation
1st Author's Name	PLESSIS Marthinus Christoffel DU
1st Author's Affiliation	Department of Complexity Science and Engineering, University of Tokyo
2nd Author's Name	Gang NIU
2nd Author's Affiliation	Baidu Inc.
3rd Author's Name	Masashi SUGIYAMA
3rd Author's Affiliation	Department of Complexity Science and Engineering, University of Tokyo
Date	2014-11-18
Paper #	IBISML2014-65
Volume (vol)	vol.114
Number (no)	306
Page	pp.-
#Pages	7
Date of Issue