Presentation 2003/10/30
Boosting with Subtree-based Decision Stumps and Its application to Semi-structured Text Classification (Natural Language Understanding and Models of Communication)
Taku Kudo, Yuji Matsumoto
Abstract(in Japanese) (See Japanese page)
Abstract(in English) The research focus in text classification has expanded from simple topic identification to more challenging tasks, such as opinion/modality identification. For the latter, traditional bag-of-words representations are not sufficient, and a richer, more structural representation is required. Accordingly, learning algorithms must be able to handle the sub-structures observed in text. In this paper, we propose a Boosting algorithm that captures sub-structures embedded in text. The proposal consists of i) decision stumps that use subtrees as features and ii) a Boosting algorithm in which the subtree-based decision stumps are applied as weak learners. We also discuss the relation between our algorithm and SVM with Tree Kernel. Three experiments on opinion/modality classification tasks confirm that subtree features are important. Our Boosting algorithm is computationally efficient for classification tasks involving discrete structural features.
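The two components named in the abstract can be illustrated with a minimal sketch. It assumes each text has already been converted into a set of subtree strings (subtree enumeration is not shown), uses the textbook AdaBoost weight update, and searches all candidate features exhaustively. The function names, the bracketed feature strings, and the toy data are hypothetical and are not taken from the paper, whose actual procedure may differ.

```python
import math

def stump_predict(feature, sign, x):
    """Decision stump: return `sign` if `feature` occurs in x, else -sign."""
    return sign if feature in x else -sign

def best_stump(X, y, w):
    """Exhaustively pick the (feature, sign) pair with the lowest weighted error."""
    best = None
    for f in sorted(set().union(*X)):
        for s in (+1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(f, s, xi) != yi)
            if best is None or err < best[0]:
                best = (err, f, s)
    return best

def adaboost(X, y, rounds=3):
    """Textbook AdaBoost with presence/absence stumps as weak learners."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, f, s = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, s))
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(f, s, xi))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model

def classify(model, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * stump_predict(f, s, x) for a, f, s in model)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): "not good", "good", "bad", "not bad".
X = [
    {"(good)", "(not)", "(not(good))"},
    {"(good)"},
    {"(bad)"},
    {"(not)", "(bad)", "(not(bad))"},
]
y = [-1, +1, -1, +1]

model = adaboost(X, y, rounds=3)
print([classify(model, x) for x in X])
```

In this toy setting no single word separates the two classes, so the first stumps selected test the two-node features such as `(not(bad))`, which mirrors the abstract's point that bag-of-words features alone are not sufficient for opinion/modality tasks.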
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Decision Stumps / Boosting / Semi-structured Text Classification / Tree Kernel
Paper # NLC2003-33
Date of Issue

Conference Information
Committee NLC
Conference Date 2003/10/30 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Boosting with Subtree-based Decision Stumps and Its application to Semi-structured Text Classification (Natural Language Understanding and Models of Communication)
Sub Title (in English)
Keyword(1) Decision Stumps
Keyword(2) Boosting
Keyword(3) Semi-structured Text Classification
Keyword(4) Tree Kernel
1st Author's Name Taku Kudo
1st Author's Affiliation Graduate School of Information Science, Nara Institute of Science and Technology
2nd Author's Name Yuji Matsumoto
2nd Author's Affiliation Graduate School of Information Science, Nara Institute of Science and Technology
Date 2003/10/30
Paper # NLC2003-33
Volume (vol) vol.103
Number (no) 407
Page pp.-
#Pages 8
Date of Issue