Presentation 2014-11-18
Learning from Positive and Unlabeled Data 1 : Classifier Training and Theoretical Analysis
PLESSIS Marthinus Christoffel DU, Gang NIU, Masashi SUGIYAMA
Abstract(in English) Learning a classifier from positive and unlabeled data is an important class of classification problems that arises in many practical applications. In this paper, we first show that this problem can be solved by cost-sensitive learning between positive and unlabeled data. We then reveal that convex surrogate loss functions such as the hinge loss may lead to a wrong classification boundary due to an intrinsic bias, and show that using non-convex loss functions such as the ramp loss is essential to avoid this problem. Next, we analyze the excess risk when the class prior is estimated from data, and show that the classification accuracy is not sensitive to class-prior estimation if the unlabeled data is dominated by positive data (this is naturally satisfied in inlier-based outlier detection, because inliers are dominant in the unlabeled dataset). Finally, we provide generalization error bounds and show that, for an equal number of labeled and unlabeled samples, the generalization error of learning only from positive and unlabeled samples is no worse than 2√2 times that of the fully supervised case. These theoretical findings are also validated through experiments.
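The cost-sensitive reformulation and the hinge-loss bias described in the abstract can be checked numerically. The sketch below assumes the standard decomposition of the unlabeled marginal, p(x) = π p_P(x) + (1 − π) p_N(x), where π is the positive class prior. This gives the unbiased PU risk π E_P[ℓ(f(x))] + E_U[ℓ(−f(x))] − π E_P[ℓ(−f(x))]. When ℓ(z) + ℓ(−z) = 1, as for the ramp loss ℓ_R(z) = ½ max(0, min(2, 1 − z)), the risk collapses to 2π E_P[ℓ(f(x))] + E_U[ℓ(−f(x))] − π. That is ordinary cost-sensitive learning with positives weighted by 2π and unlabeled points treated as negatives. The function names, the scaled hinge ℓ_H(z) = ½ max(0, 1 − z), and the toy classifier scores are illustrative assumptions, not code from the paper.

```python
from typing import Callable, Sequence

# Hypothetical helper names; a sketch under stated assumptions,
# not the authors' implementation.
Loss = Callable[[float], float]


def ramp_loss(z: float) -> float:
    # l_R(z) = 0.5 * max(0, min(2, 1 - z)); satisfies l(z) + l(-z) = 1 for all z
    return 0.5 * max(0.0, min(2.0, 1.0 - z))


def hinge_loss(z: float) -> float:
    # Scaled hinge l_H(z) = 0.5 * max(0, 1 - z); l(z) + l(-z) > 1 when |z| > 1
    return 0.5 * max(0.0, 1.0 - z)


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def pu_risk(loss: Loss, prior: float,
            f_pos: Sequence[float], f_unl: Sequence[float]) -> float:
    """Unbiased PU risk: pi*E_P[l(f)] + E_U[l(-f)] - pi*E_P[l(-f)]."""
    return (prior * mean([loss(v) for v in f_pos])
            + mean([loss(-v) for v in f_unl])
            - prior * mean([loss(-v) for v in f_pos]))


def pu_risk_cost_sensitive(loss: Loss, prior: float,
                           f_pos: Sequence[float], f_unl: Sequence[float]) -> float:
    """Cost-sensitive form: 2*pi*E_P[l(f)] + E_U[l(-f)] - pi.

    Equals pu_risk only when l(z) + l(-z) = 1 (e.g. the ramp loss).
    """
    return (2.0 * prior * mean([loss(v) for v in f_pos])
            + mean([loss(-v) for v in f_unl])
            - prior)


# Toy classifier outputs f(x) on positive and unlabeled samples (made up).
PRIOR = 0.4
f_pos = [2.0, 0.5, -0.3]
f_unl = [1.5, -2.0, 0.2, -0.7]

for name, loss in [("ramp", ramp_loss), ("hinge", hinge_loss)]:
    gap = (pu_risk(loss, PRIOR, f_pos, f_unl)
           - pu_risk_cost_sensitive(loss, PRIOR, f_pos, f_unl))
    print(f"{name}: unbiased minus cost-sensitive = {gap:.4f}")
```

With the hinge loss, the cost-sensitive objective equals the unbiased risk plus π E_P[ℓ_H(f(x)) + ℓ_H(−f(x)) − 1]. This penalty is never negative, and it is strictly positive whenever some positive sample has |f(x)| > 1. On the toy scores above, the hinge gap is about −0.067, while the ramp gap vanishes up to floating-point rounding. This illustrates why a symmetric, non-convex loss avoids the bias.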
Keyword(in English) Classification / positive and unlabeled learning / class-prior estimation
Paper # IBISML2014-65
Date of Issue

Conference Information
Committee IBISML
Conference Date 2014/11/10 (1 day)
Place (in English)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Information-Based Induction Sciences and Machine Learning (IBISML)
Language ENG
Title (in English) Learning from Positive and Unlabeled Data 1 : Classifier Training and Theoretical Analysis
Sub Title (in English)
Keyword(1) Classification
Keyword(2) positive and unlabeled learning
Keyword(3) class-prior estimation
1st Author's Name PLESSIS Marthinus Christoffel DU
1st Author's Affiliation Department of Complexity Science and Engineering, University of Tokyo
2nd Author's Name Gang NIU
2nd Author's Affiliation Baidu Inc.
3rd Author's Name Masashi SUGIYAMA
3rd Author's Affiliation Department of Complexity Science and Engineering, University of Tokyo
Date 2014-11-18
Volume (vol) vol.114
Number (no) 306
Page pp.-
#Pages 7