Presentation | 1996/7/18 Exploiting Text Structure for Topic Identification Tadashi Nomoto, Yuji Matsumoto |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | The paper demonstrates how information on text structure can improve the identification of topical words in texts, where the identification is based on a probabilistic model of text categorization. The texts used are not explicitly structured; instead, a text structure is identified by measuring the similarity between the segments that make up the text and its title. A text structure identified in this way is shown to be a good clue to the parts of the text most relevant to its content. The value of exploiting structural information for topic identification is demonstrated by a set of experiments conducted on 19 MB of Japanese newspaper articles. The paper also brings concepts from Rhetorical Structure Theory (RST) to the statistical analysis of text structure. Finally, it is shown that information on text structure is more effective for large documents than for small ones. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Japanese / discourse / text categorization / topic identification / text structuring |
Paper # | NLC96-16 |
Date of Issue |
Conference Information | |
Committee | NLC |
---|---|
Conference Date | 1996/7/18 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Natural Language Understanding and Models of Communication (NLC) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Exploiting Text Structure for Topic Identification |
Sub Title (in English) | |
Keyword(1) | Japanese |
Keyword(2) | discourse |
Keyword(3) | text categorization |
Keyword(4) | topic identification |
Keyword(5) | text structuring |
1st Author's Name | Tadashi Nomoto |
1st Author's Affiliation | Advanced Research Laboratory, Hitachi Ltd. |
2nd Author's Name | Yuji Matsumoto |
2nd Author's Affiliation | Nara Institute of Science and Technology |
Date | 1996/7/18 |
Paper # | NLC96-16 |
Volume (vol) | vol.96 |
Number (no) | 157 |
Page | pp.- |
#Pages | 8 |
Date of Issue |