Presentation | 2003/10/30 Multiple Document Summarization using Sequential Pattern Mining(Natural Language Understanding and Models of Communication) Tsutomu Hirao, Jun Suzuki, Hideki Isozaki, Eisaku Maeda |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | In this paper, we propose a multiple document summarization method using a sequential pattern mining algorithm. We extract important sentences in three steps: first, we extract term patterns from the target document set using PrefixSpan; second, we identify significant patterns based on the χ² statistic; third, we determine each sentence's score by weighting the patterns with TF·IDF. Moreover, we propose a kernel-based MMR (Maximal Marginal Relevance) for minimizing redundant sentences. This method employs a similarity measure based on the Extended String Subsequence Kernel instead of cosine similarity. In addition, we define an evaluation measure for data sets that include redundant sentences, i.e., many sentences with the same meaning. The evaluation results show that our extraction method is better than conventional methods and that the kernel-based MMR outperforms conventional MMR. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Multiple Document Summarization / Sequential Pattern Mining / Kernel Methods / Maximal Marginal Relevance |
Paper # | NLC2003-30 |
Date of Issue |
Conference Information | |
Committee | NLC |
---|---|
Conference Date | 2003/10/30 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Natural Language Understanding and Models of Communication (NLC) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Multiple Document Summarization using Sequential Pattern Mining(Natural Language Understanding and Models of Communication) |
Sub Title (in English) | |
Keyword(1) | Multiple Document Summarization |
Keyword(2) | Sequential Pattern Mining |
Keyword(3) | Kernel Methods |
Keyword(4) | Maximal Marginal Relevance |
1st Author's Name | Tsutomu Hirao |
1st Author's Affiliation | NTT Communication Science Laboratories, NTT Corp |
2nd Author's Name | Jun Suzuki |
2nd Author's Affiliation | NTT Communication Science Laboratories, NTT Corp |
3rd Author's Name | Hideki Isozaki |
3rd Author's Affiliation | NTT Communication Science Laboratories, NTT Corp |
4th Author's Name | Eisaku Maeda |
4th Author's Affiliation | NTT Communication Science Laboratories, NTT Corp |
Date | 2003/10/30 |
Paper # | NLC2003-30 |
Volume (vol) | vol.103 |
Number (no) | 407 |
Page | pp.- |
#Pages | 8 |
Date of Issue |