DE2000-29 圧縮接尾辞配列に関する考察

Presentation	2000/7/19 A Note on the Compressed Suffix Arrays Kunihiko Sadakane,
PDF Download Page	PDF download Page Link
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	The suffix array, a kind of full-text indices, is more space-efficient than other full-text indices;however it is still larger than word indices such as the inverted files. Though the recently proposed compressed suffix array solved the problem in part, its index size is always larger than the text because it uses the text itself for keyword queries. In this paper we extend the text search algorithm using the compressed suffix array in order to work without the text itself. We also propose an algorithm to recover the whole or in part of the text. By using the algorithms both text comression and fast queries can be achieved. Experimental results on a large collection of English and Japanese texts and DNA sequences show that the compressed suffix array is more effective than using the inverted files or sequential searches if the number of occurrences of keywords is not too large.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	full-text index / suffix array / compression
Paper #	DE2000-29
Date of Issue

Paper Information
Registration To	Data Engineering (DE)
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	A Note on the Compressed Suffix Arrays
Sub Title (in English)
Keyword(1)	full-text index
Keyword(2)	suffix array
Keyword(3)	compression
1st Author's Name	Kunihiko Sadakane
1st Author's Affiliation	Department of System Information Science Graduate School of Information Sciences, Tohoku University()
Date	2000/7/19
Paper #	DE2000-29
Volume (vol)	vol.100
Number (no)	226
Page	pp.pp.-
#Pages	8
Date of Issue