Presentation | 1995/7/21 A Linear-Time Algorithm for Optimal Generalization of Language Data Hideki Tanaka |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | The proper treatment of structured attributes in inductive learning is attracting considerable attention now that this learning technique is frequently applied to knowledge extraction in natural language processing. In this context, the problem is to find a set of thesaurus nodes that maximally generalizes the words in the learning source while causing the minimum number of errors. The number of candidate node sets, however, explodes as the thesaurus grows, and no efficient algorithm has been discovered so far. In this paper, we propose an algorithm, T^*, that finds the optimal node set in linear time. The algorithm first converts the thesaurus into a directed acyclic graph, which turns this difficult problem into a shortest-path problem on a graph for which an efficient algorithm is available. We then show that T^* can also be used to find the optimally pruned decision tree. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Machine Learning / Structured Attributes / Generalization / Thesaurus / Corpus / Machine Translation |
Paper # | |
Date of Issue |
Conference Information | |
Committee | NLC |
---|---|
Conference Date | 1995/7/21 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Natural Language Understanding and Models of Communication (NLC) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | A Linear-Time Algorithm for Optimal Generalization of Language Data |
Sub Title (in English) | |
Keyword(1) | Machine Learning |
Keyword(2) | Structured Attributes |
Keyword(3) | Generalization |
Keyword(4) | Thesaurus |
Keyword(5) | Corpus |
Keyword(6) | Machine Translation |
1st Author's Name | Hideki Tanaka |
1st Author's Affiliation | NHK Science and Technical Research Laboratories |
Date | 1995/7/21 |
Paper # | |
Volume (vol) | vol.95 |
Number (no) | 169 |
Page | pp.- |
#Pages | 6 |
Date of Issue |
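
The problem stated in the abstract, choosing thesaurus nodes that generalize as many words as possible while causing few errors, can be illustrated with a small sketch. This is not the T^* algorithm, whose details are not given in this record. It is a plain bottom-up dynamic program over a tree-shaped thesaurus, and it does not use the shortest-path reformulation mentioned in the abstract. The cost function (a fixed `node_penalty` per selected node plus the number of minority-label examples under it), the function name `best_cut`, and the toy thesaurus are all assumptions made for illustration.

```python
from collections import Counter


def best_cut(children, leaf_labels, root, node_penalty=1.0):
    """Pick a set of nodes covering every leaf exactly once at minimum cost.

    Hypothetical cost of selecting a node: node_penalty + number of
    examples below it that do not carry the majority label.
    """
    merged = {}  # node -> True if selecting this node beats splitting it

    def visit(node):
        kids = children.get(node, [])
        labels = Counter(leaf_labels.get(node, {}))
        split_cost = 0.0
        for kid in kids:
            kid_labels, kid_cost = visit(kid)
            labels.update(kid_labels)  # accumulate label counts upward
            split_cost += kid_cost
        errors = sum(labels.values()) - max(labels.values(), default=0)
        merge_cost = node_penalty + errors
        # A leaf has no alternative; an inner node is kept if it is no worse.
        merged[node] = (not kids) or merge_cost <= split_cost
        return labels, (merge_cost if merged[node] else split_cost)

    _, cost = visit(root)

    # Second pass: walk down from the root and stop at selected nodes.
    cut, stack = [], [root]
    while stack:
        node = stack.pop()
        if merged[node]:
            cut.append(node)
        else:
            stack.extend(reversed(children[node]))
    return cost, cut


# Toy thesaurus and labelled word counts (illustrative data only).
children = {
    "animal": ["dog", "bird"],
    "dog": ["poodle", "terrier"],
    "bird": ["sparrow", "penguin"],
}
leaf_labels = {
    "poodle": {"A": 3},
    "terrier": {"A": 2},
    "sparrow": {"B": 4},
    "penguin": {"A": 1},
}

cost, cut = best_cut(children, leaf_labels, "animal", node_penalty=0.5)
print(cost, cut)
```

In this toy case the two dog words share a label, so they are generalized to "dog", while "bird" is left split because merging it would misclassify the penguin example. Each node is visited once in each pass, which is why a per-node decision is recorded instead of copying candidate node lists upward.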