Presentation | 2005-02-24 Optimal combination of labeled and unlabeled data for semi-supervised classification Akinori FUJINO, Naonori UEDA, Kazumi SAITO |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Unlabeled data are used to improve classifier accuracy when too few labeled data are available. In probabilistic approaches, the ratio of labeled to unlabeled data used for training affects classifier accuracy, so this ratio should be adjusted to use the unlabeled data effectively. We propose a new method for determining the optimal ratio based on the maximum entropy principle. Text classification experiments on three real data sets confirmed the usefulness of the proposed method. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | EM algorithm / maximum entropy principle / naive Bayes model / text classification |
Paper # | NLC2004-100,PRMU2004-182 |
Date of Issue |
Conference Information | |
Committee | NLC |
---|---|
Conference Date | 2005/2/17 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Natural Language Understanding and Models of Communication (NLC) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Optimal combination of labeled and unlabeled data for semi-supervised classification |
Sub Title (in English) | |
Keyword(1) | EM algorithm |
Keyword(2) | maximum entropy principle |
Keyword(3) | naive Bayes model |
Keyword(4) | text classification |
1st Author's Name | Akinori FUJINO |
1st Author's Affiliation | NTT Communication Science Laboratories, NTT Corporation |
2nd Author's Name | Naonori UEDA |
2nd Author's Affiliation | NTT Communication Science Laboratories, NTT Corporation |
3rd Author's Name | Kazumi SAITO |
3rd Author's Affiliation | NTT Communication Science Laboratories, NTT Corporation |
Date | 2005-02-24 |
Paper # | NLC2004-100,PRMU2004-182 |
Volume (vol) | vol.104 |
Number (no) | 667 |
Page | pp.- |
#Pages | 6 |
Date of Issue |