Presentation | 2007/7/17 Extracting Low Frequency Terms Using Substring Perplexities Yasuhide MIURA, Hiroshi MASUICHI |
---|---|
Abstract(in English) | This paper describes a method for extracting low-frequency, domain-specific terms using substring perplexities. Given a string, the character n-grams that compose it are extracted and their perplexities in a given corpus are calculated. Similarly, the character n-grams that appear adjacent to the string are extracted and their perplexities are calculated. The ratio of these two kinds of perplexity is used as a score representing the word fitness of the string, that is, how well it fits as a word. In an experiment, n-grams that compose entries in a disease dictionary and an anatomy dictionary, and that appear five times or fewer in a corpus of about 67,000 medical texts, were scored with the proposed method. For comparison, the same n-grams were scored with TermExtract. The proposed method achieved an average accuracy of 70.4% with 1-gram scoring and 83.5% with 2-gram scoring; the 2-gram result is better than the 70.6% obtained with TermExtract. |
Keyword(in English) | Perplexity / Term Extraction / Named Entity Extraction |
Paper # | NLC2007-24 |
Date of Issue |
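The abstract does not give the exact formulas, so the following Python code is only a minimal sketch of one possible reading of the scoring idea, not the authors' implementation. It assumes (none of this is stated in the paper listing) that the perplexity of an n-gram is 2 raised to the entropy of the character that follows it in the corpus, that the "adjacent" n-grams are the n characters immediately to the right of each occurrence of the string, and that the score is the mean perplexity of the adjacent n-grams divided by the mean perplexity of the string's own n-grams, so a higher score means a more word-like string. All function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean


def char_ngrams(text: str, n: int) -> list[str]:
    # Overlapping character n-grams of `text`, e.g. "abc", n=2 -> ["ab", "bc"].
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def occurrences(sub: str, corpus: list[str]):
    # Yield (text, end_index) for every (possibly overlapping) occurrence of `sub`.
    for text in corpus:
        start = text.find(sub)
        while start != -1:
            yield text, start + len(sub)
            start = text.find(sub, start + 1)


def right_perplexity(gram: str, corpus: list[str]) -> float:
    # ASSUMPTION: perplexity = 2 ** H, where H is the entropy (in bits) of the
    # character that immediately follows `gram` in the corpus.
    followers = Counter(text[end] for text, end in occurrences(gram, corpus) if end < len(text))
    total = sum(followers.values())
    if total == 0:
        return 1.0  # no observed continuation, so no observed uncertainty
    entropy = -sum((c / total) * math.log2(c / total) for c in followers.values())
    return 2 ** entropy


def adjacent_ngrams(string: str, corpus: list[str], n: int) -> list[str]:
    # ASSUMPTION: the "adjacent" n-grams are the n characters right after each occurrence.
    return [text[end:end + n] for text, end in occurrences(string, corpus) if end + n <= len(text)]


def word_fitness(string: str, corpus: list[str], n: int = 1) -> float:
    # ASSUMPTION: score = mean perplexity of adjacent n-grams divided by the
    # mean perplexity of the string's own n-grams (higher = more word-like).
    inner = [right_perplexity(g, corpus) for g in char_ngrams(string, n)]
    outer = [right_perplexity(g, corpus) for g in adjacent_ngrams(string, corpus, n)]
    if not inner or not outer:
        return 0.0  # string shorter than n, or never followed by n characters
    return mean(outer) / mean(inner)
```

On the toy corpus `["abxc", "abxd", "abxe", "abxf"]`, where "x" plays the role of a particle following the term "ab", `word_fitness("ab", corpus)` is 4.0 while the partial string "a" scores 1.0. Passing `n=2` gives a 2-gram variant of the same hypothetical score.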
Conference Information | |
Committee | NLC |
---|---|
Conference Date | 2007/7/17 (1 day) |
Paper Information | |
Registration To | Natural Language Understanding and Models of Communication (NLC) |
---|---|
Language | JPN |
Title (in English) | Extracting Low Frequency Terms Using Substring Perplexities |
Sub Title (in English) | |
Keyword(1) | Perplexity |
Keyword(2) | Term Extraction |
Keyword(3) | Named Entity Extraction |
1st Author's Name | Yasuhide MIURA |
1st Author's Affiliation | Corporate Research Group, Fuji Xerox Co., Ltd. |
2nd Author's Name | Hiroshi MASUICHI |
2nd Author's Affiliation | Corporate Research Group, Fuji Xerox Co., Ltd. |
Date | 2007/7/17 |
Volume (vol) | vol.107 |
Number (no) | 158 |
Page | pp.- |
#Pages | 6 |