Presentation 2003/10/30
On Selection Criteria of Combinatorial Features for Machine Learning (Natural Language Understanding and Models of Communication)
Hideki ISOZAKI, Tsutomu HIRAO, Jun SUZUKI
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Machine Learning is used for various Natural Language Processing tasks, such as Named Entity Recognition, Important Sentence Extraction, and Dependency Analysis. Features for Machine Learning are found by trial and error. However, it is also possible to find useful features by using statistical measures. For example, PrefixSpan finds frequent word patterns and TidalSMP finds useful feature combinations. Such combinatorial features are often redundant and are not optimized for Machine Learning. Here, we show that a simple reranking method improves the performance of Machine Learning in two tasks: Important Sentence Extraction and English Dependency Analysis.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) PrefixSpan / TidalSMP / SVM / Feature Selection / Machine Learning / Mining
Paper # NLC2003-34
Date of Issue

Conference Information
Committee NLC
Conference Date 2003/10/30 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) On Selection Criteria of Combinatorial Features for Machine Learning (Natural Language Understanding and Models of Communication)
Sub Title (in English)
Keyword(1) PrefixSpan
Keyword(2) TidalSMP
Keyword(3) SVM
Keyword(4) Feature Selection
Keyword(5) Machine Learning
Keyword(6) Mining
1st Author's Name Hideki ISOZAKI
1st Author's Affiliation NTT Communication Science Laboratories, NTT Corporation
2nd Author's Name Tsutomu HIRAO
2nd Author's Affiliation NTT Communication Science Laboratories, NTT Corporation
3rd Author's Name Jun SUZUKI
3rd Author's Affiliation NTT Communication Science Laboratories, NTT Corporation
Date 2003/10/30
Paper # NLC2003-34
Volume (vol) vol.103
Number (no) 407
Page pp.-
#Pages 6
Date of Issue