Presentation | 2003/10/30 Boosting with Subtree-based Decision Stumps and Its Application to Semi-structured Text Classification (Natural Language Understanding and Models of Communication) Taku Kudo, Yuji Matsumoto |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | The research focus in text classification has expanded from simple topic identification to more challenging tasks, such as opinion/modality identification. For the latter, traditional bag-of-words representations are not sufficient, and a richer, more structural representation is required. Accordingly, learning algorithms must be able to handle the sub-structures observed in text. In this paper, we propose a Boosting algorithm that captures sub-structures embedded in text. The proposal consists of i) decision stumps that use subtrees as features and ii) a Boosting algorithm in which the subtree-based decision stumps are applied as weak learners. We also discuss the relation between our algorithm and SVMs with a Tree Kernel. Three experiments on opinion/modality classification tasks confirm that subtree features are important. Our Boosting algorithm is computationally efficient for classification tasks involving discrete structural features. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Decision Stumps / Boosting / Semi-structured Text Classification / Tree Kernel |
Paper # | NLC2003-33 |
Date of Issue |
Conference Information | |
---|---|
Committee | NLC |
Conference Date | 2003/10/30 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
---|---|
Registration To | Natural Language Understanding and Models of Communication (NLC) |
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Boosting with Subtree-based Decision Stumps and Its Application to Semi-structured Text Classification (Natural Language Understanding and Models of Communication) |
Sub Title (in English) | |
Keyword(1) | Decision Stumps |
Keyword(2) | Boosting |
Keyword(3) | Semi-structured Text Classification |
Keyword(4) | Tree Kernel |
1st Author's Name | Taku Kudo |
1st Author's Affiliation | Graduate School of Information Science, Nara Institute of Science and Technology |
2nd Author's Name | Yuji Matsumoto |
2nd Author's Affiliation | Graduate School of Information Science, Nara Institute of Science and Technology |
Date | 2003/10/30 |
Paper # | NLC2003-33 |
Volume (vol) | vol.103 |
Number (no) | 407 |
Page | pp.- |
#Pages | 8 |
Date of Issue |