Presentation 1996/7/18
Exploiting Text Structure for Topic Identification
Tadashi Nomoto, Yuji Matsumoto
Abstract(in Japanese) (See Japanese page)
Abstract(in English) This paper shows how information about text structure can improve the identification of topical words in texts, where identification is based on a probabilistic model of text categorization. We use texts that are not explicitly structured. Text structure is identified by measuring the similarity between the segments that make up a text and its title. We show that a structure identified in this way gives a good clue to the parts of the text most relevant to its content. The value of exploiting structural information for topic identification is demonstrated by a set of experiments on 19 MB of Japanese newspaper articles. The paper also brings concepts from Rhetorical Structure Theory (RST) to the statistical analysis of text structure. Finally, we show that information on text structure is more effective for large documents than for small ones.
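The segment-to-title comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure the authors use, so cosine similarity over raw term frequencies is an assumption here, as are the whitespace tokenizer (real Japanese text would need a morphological analyzer), the function names `term_counts`, `cosine`, and `rank_segments`, and the toy English data.

```python
import math
from collections import Counter


def term_counts(text):
    """Term frequencies from lowercase whitespace tokens (an illustrative simplification)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_segments(title, segments):
    """Return (similarity, segment_index) pairs, most title-like segment first."""
    title_vec = term_counts(title)
    scored = [(cosine(title_vec, term_counts(seg)), i) for i, seg in enumerate(segments)]
    # Sort by descending similarity; break ties by original segment order.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical toy data, not taken from the paper's corpus.
title = "bank raises interest rates"
segments = [
    "the central bank raises interest rates again",
    "weather was mild across the region",
    "analysts expect rates to stay high",
]
ranking = rank_segments(title, segments)
```

Segments that score highest against the title would be treated as the parts most relevant to the text's content. How the paper turns such scores into a text structure is not specified in this record.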
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Japanese / discourse / text categorization / topic identification / text structuring
Paper # NLC96-16
Date of Issue

Conference Information
Committee NLC
Conference Date 1996/7/18 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Exploiting Text Structure for Topic Identification
Sub Title (in English)
Keyword(1) Japanese
Keyword(2) discourse
Keyword(3) text categorization
Keyword(4) topic identification
Keyword(5) text structuring
1st Author's Name Tadashi Nomoto
1st Author's Affiliation Advanced Research Laboratory, Hitachi Ltd.
2nd Author's Name Yuji Matsumoto
2nd Author's Affiliation Nara Institute of Science and Technology
Date 1996/7/18
Paper # NLC96-16
Volume (vol) vol.96
Number (no) 157
Page pp.-
#Pages 8
Date of Issue