Presentation | 2014-11-18 Learning from Positive and Unlabeled Data 1 : Classifier Training and Theoretical Analysis PLESSIS Marthinus Christoffel DU, Gang NIU, Masashi SUGIYAMA |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Learning a classifier from positive and unlabeled data is an important class of classification problems that arise in many practical applications. In this paper, we first show that this problem can be solved by cost-sensitive learning between positive and unlabeled data. Then we reveal that convex surrogate loss functions such as the hinge loss may lead to a wrong classification boundary due to an intrinsic bias, and show that the use of non-convex loss functions such as the ramp loss is essential to avoid this problem. We next analyze the excess risk when the class prior is estimated from data, and show that the classification accuracy is not sensitive to class prior estimation if the unlabeled data is dominated by the positive data (this is naturally satisfied in inlier-based outlier detection because inliers are dominant in the unlabeled dataset). Finally, we provide generalization error bounds and show that, for an equal number of labeled and unlabeled samples, the generalization error of learning only from positive and unlabeled samples is no worse than 2√2 times the fully supervised case. These theoretical findings are also validated through experiments. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Classification / positive and unlabeled learning / class-prior estimation |
Paper # | IBISML2014-65 |
Date of Issue |
Conference Information | |
Committee | IBISML |
---|---|
Conference Date | 2014/11/10 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Information-Based Induction Sciences and Machine Learning (IBISML) |
---|---|
Language | ENG |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Learning from Positive and Unlabeled Data 1 : Classifier Training and Theoretical Analysis |
Sub Title (in English) | |
Keyword(1) | Classification |
Keyword(2) | positive and unlabeled learning |
Keyword(3) | class-prior estimation |
1st Author's Name | PLESSIS Marthinus Christoffel DU |
1st Author's Affiliation | Department of Complexity Science and Engineering, University of Tokyo |
2nd Author's Name | Gang NIU |
2nd Author's Affiliation | Baidu Inc. |
3rd Author's Name | Masashi SUGIYAMA |
3rd Author's Affiliation | Department of Complexity Science and Engineering, University of Tokyo |
Date | 2014-11-18 |
Paper # | IBISML2014-65 |
Volume (vol) | vol.114 |
Number (no) | 306 |
Page | pp.- |
#Pages | 7 |
Date of Issue |
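
The abstract's claim about convex versus non-convex surrogate losses can be illustrated with a small sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the scaled loss definitions, the function names (`ramp_loss`, `hinge_loss`, `symmetry_gap`, `pu_risk`), and the risk form `2 * prior * E_P[loss(g)] + E_U[loss(-g)] - prior`. That form follows from writing the unlabeled density as a mixture of the two classes, and it equals the ordinary supervised risk only when `loss(z) + loss(-z) = 1`. The scaled ramp loss satisfies this identity everywhere, whereas the scaled hinge loss violates it for `|z| > 1`, which is one way to read the "intrinsic bias" the abstract mentions.

```python
def ramp_loss(z):
    """Scaled ramp loss (assumed form): 0.5 * max(0, min(2, 1 - z))."""
    return 0.5 * max(0.0, min(2.0, 1.0 - z))


def hinge_loss(z):
    """Scaled hinge loss (assumed form): 0.5 * max(0, 1 - z)."""
    return 0.5 * max(0.0, 1.0 - z)


def symmetry_gap(loss, z):
    """Return loss(z) + loss(-z) - 1; zero means the PU risk has no extra term."""
    return loss(z) + loss(-z) - 1.0


def pu_risk(scores_pos, scores_unl, prior, loss):
    """Empirical positive-unlabeled risk under the assumed cost-sensitive form.

    scores_pos: classifier outputs g(x) on positive samples
    scores_unl: classifier outputs g(x) on unlabeled samples
    prior:      class prior of the positive class (assumed known here)
    """
    mean_pos = sum(loss(s) for s in scores_pos) / len(scores_pos)
    mean_unl = sum(loss(-s) for s in scores_unl) / len(scores_unl)
    return 2.0 * prior * mean_pos + mean_unl - prior


if __name__ == "__main__":
    for z in (0.5, 3.0):
        print(z, symmetry_gap(ramp_loss, z), symmetry_gap(hinge_loss, z))
    # Toy scores: all positives scored confidently, unlabeled set half positive.
    print(pu_risk([2.0, 2.0], [2.0, -2.0], prior=0.5, loss=ramp_loss))
```

In this toy setting the scorer separates the unlabeled points perfectly and the class prior matches the unlabeled mixture, so the ramp-loss risk comes out to zero. The sketch takes the class prior as given; the abstract's analysis of estimating it from data is not reproduced here.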