Presentation 2005-02-24
Optimal combination of labeled and unlabeled data for semi-supervised classification
Akinori FUJINO, Naonori UEDA, Kazumi SAITO
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Unlabeled data can be used to improve the accuracy of classifiers when too few labeled data are available. In probabilistic approaches, the ratio of labeled to unlabeled data used for training affects classifier accuracy, so this ratio should be adjusted to make effective use of the unlabeled data. We propose a new method for determining the optimal ratio based on the maximum entropy principle. Text classification experiments on three real-world datasets confirmed the usefulness of the proposed method.
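To make the role of the labeled/unlabeled ratio concrete, the sketch below shows a semi-supervised multinomial naive Bayes classifier trained with EM, where the soft class counts of unlabeled documents are scaled by a weight `lam` before the M-step. This is a minimal illustration under stated assumptions, not the authors' method: the weighting scheme shown is one common way to express such a ratio, `lam` is taken as a fixed input, and the maximum-entropy procedure the paper proposes for choosing it is not reproduced. All names (`train_nb`, `posterior`, `semi_supervised_nb`, `lam`, `alpha`) are hypothetical, and documents are assumed to be bag-of-words dicts mapping word ids to counts.

```python
import math


def train_nb(docs, resp, n_classes, vocab_size, alpha=1.0):
    """M-step: estimate log class priors and log word probabilities
    from (possibly soft) class weights, with additive smoothing alpha.

    docs: list of dicts {word_id: count}
    resp: list of per-document class-weight lists (same order as docs)
    """
    prior = [alpha] * n_classes
    word = [[alpha] * vocab_size for _ in range(n_classes)]
    for doc, weights in zip(docs, resp):
        for k in range(n_classes):
            if weights[k] == 0.0:
                continue  # document contributes nothing to class k
            prior[k] += weights[k]
            for w, c in doc.items():
                word[k][w] += weights[k] * c
    total = sum(prior)
    log_prior = [math.log(p / total) for p in prior]
    log_word = []
    for k in range(n_classes):
        s = sum(word[k])
        log_word.append([math.log(x / s) for x in word[k]])
    return log_prior, log_word


def posterior(doc, log_prior, log_word):
    """E-step for one document: P(class | doc) under the naive Bayes model."""
    scores = [lp + sum(c * lw[w] for w, c in doc.items())
              for lp, lw in zip(log_prior, log_word)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def semi_supervised_nb(labeled, labels, unlabeled, n_classes, vocab_size,
                       lam, n_iter=10):
    """EM for naive Bayes; unlabeled soft counts are scaled by lam.

    lam = 0 ignores the unlabeled data; lam = 1 weights each unlabeled
    document like a labeled one. (Hypothetical parameterization: the
    paper's maximum-entropy choice of the ratio is not implemented here.)
    """
    docs = labeled + unlabeled
    hard = [[1.0 if k == y else 0.0 for k in range(n_classes)] for y in labels]
    # Initialize from labeled data only.
    params = train_nb(labeled, hard, n_classes, vocab_size)
    for _ in range(n_iter):
        # E-step on unlabeled documents, down-weighted by lam.
        soft = [[lam * p for p in posterior(d, *params)] for d in unlabeled]
        # M-step on labeled (hard) plus weighted unlabeled (soft) counts.
        params = train_nb(docs, hard + soft, n_classes, vocab_size)
    return params


if __name__ == "__main__":
    labeled = [{0: 3, 1: 1}, {2: 2, 3: 2}]
    labels = [0, 1]
    unlabeled = [{0: 2}, {1: 2, 0: 1}, {3: 3}, {2: 1, 3: 1}]
    params = semi_supervised_nb(labeled, labels, unlabeled,
                                n_classes=2, vocab_size=4, lam=0.5)
    print(posterior({0: 1}, *params))  # class 0 should be the more probable
```

Varying `lam` between 0 and 1 and measuring accuracy on held-out labeled data is the simplest way to see why the ratio matters; the abstract's point is that this value should be set by a principled criterion rather than fixed in advance.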
Keyword(in Japanese) (See Japanese page)
Keyword(in English) EM algorithm / maximum entropy principle / naive Bayes model / text classification
Paper # NLC2004-100,PRMU2004-182
Date of Issue

Conference Information
Committee NLC
Conference Date 2005/2/17 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Optimal combination of labeled and unlabeled data for semi-supervised classification
Sub Title (in English)
Keyword(1) EM algorithm
Keyword(2) maximum entropy principle
Keyword(3) naive Bayes model
Keyword(4) text classification
1st Author's Name Akinori FUJINO
1st Author's Affiliation NTT Communication Science Laboratories, NTT Corporation
2nd Author's Name Naonori UEDA
2nd Author's Affiliation NTT Communication Science Laboratories, NTT Corporation
3rd Author's Name Kazumi SAITO
3rd Author's Affiliation NTT Communication Science Laboratories, NTT Corporation
Date 2005-02-24
Paper # NLC2004-100,PRMU2004-182
Volume (vol) vol.104
Number (no) 667
Page pp.-
#Pages 6