Presentation 2003/10/30
Multiple Document Summarization using Sequential Pattern Mining (Natural Language Understanding and Models of Communication)
Tsutomu Hirao, Jun Suzuki, Hideki Isozaki, Eisaku Maeda
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In this paper, we propose a multiple document summarization method using a sequential pattern mining algorithm. We extract important sentences in three steps: first, we extract term patterns from the target document set using PrefixSpan; second, we identify significant patterns based on the χ² statistic; third, we determine a sentence score using pattern weights based on TF-IDF. Moreover, we propose a kernel-based MMR (Maximal Marginal Relevance) for minimizing redundant sentences. This method employs a similarity measure based on the Extended String Subsequence Kernel instead of cosine similarity. In addition, we define an evaluation measure for data sets that include redundant sentences, i.e., many sentences with the same meaning. The evaluation results show that our extraction method is better than conventional methods and that the kernel-based MMR outperforms conventional MMR.
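As an illustration of the redundancy-control step described in the abstract, the following is a minimal Python sketch of greedy MMR selection. It uses the conventional cosine similarity over bag-of-words vectors; the paper's kernel-based variant replaces this similarity with one based on the Extended String Subsequence Kernel, which is not reproduced here. The function names (`cosine`, `mmr_select`), the trade-off parameter `lam`, and the relevance scores are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two token lists, treated as bags of words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def mmr_select(sentences, relevance, k, lam=0.7, sim=cosine):
    """Greedily pick up to k sentence indices by Maximal Marginal Relevance.

    sentences: list of token lists
    relevance: per-sentence importance scores (hypothetical inputs)
    lam:       trade-off between relevance and redundancy (assumed value)
    sim:       similarity function; a kernel-based similarity could be passed instead
    """
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:

        def mmr_score(i):
            # Redundancy is the highest similarity to any already selected sentence.
            redundancy = max(
                (sim(sentences[i], sentences[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

Because `sim` is a parameter, swapping cosine similarity for another sentence similarity only requires passing a different function; the greedy loop itself is unchanged.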
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Multiple Document Summarization / Sequential Pattern Mining / Kernel Methods / Maximal Marginal Relevance
Paper # NLC2003-30
Date of Issue

Conference Information
Committee NLC
Conference Date 2003/10/30(1days)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Multiple Document Summarization using Sequential Pattern Mining (Natural Language Understanding and Models of Communication)
Sub Title (in English)
Keyword(1) Multiple Document Summarization
Keyword(2) Sequential Pattern Mining
Keyword(3) Kernel Methods
Keyword(4) Maximal Marginal Relevance
1st Author's Name Tsutomu Hirao
1st Author's Affiliation NTT Communication Science Laboratories, NTT Corp
2nd Author's Name Jun Suzuki
2nd Author's Affiliation NTT Communication Science Laboratories, NTT Corp
3rd Author's Name Hideki Isozaki
3rd Author's Affiliation NTT Communication Science Laboratories, NTT Corp
4th Author's Name Eisaku Maeda
4th Author's Affiliation NTT Communication Science Laboratories, NTT Corp
Date 2003/10/30
Paper # NLC2003-30
Volume (vol) vol.103
Number (no) 407
Page pp.-
#Pages 8
Date of Issue