Presentation 2000/7/19
A Note on the Compressed Suffix Arrays
Kunihiko Sadakane,
PDF Download Page PDF download Page Link
Abstract(in Japanese) (See Japanese page)
Abstract(in English) The suffix array, a kind of full-text indices, is more space-efficient than other full-text indices;however it is still larger than word indices such as the inverted files. Though the recently proposed compressed suffix array solved the problem in part, its index size is always larger than the text because it uses the text itself for keyword queries. In this paper we extend the text search algorithm using the compressed suffix array in order to work without the text itself. We also propose an algorithm to recover the whole or in part of the text. By using the algorithms both text comression and fast queries can be achieved. Experimental results on a large collection of English and Japanese texts and DNA sequences show that the compressed suffix array is more effective than using the inverted files or sequential searches if the number of occurrences of keywords is not too large.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) full-text index / suffix array / compression
Paper # DE2000-29
Date of Issue

Conference Information
Committee DE
Conference Date 2000/7/19(1days)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Data Engineering (DE)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) A Note on the Compressed Suffix Arrays
Sub Title (in English)
Keyword(1) full-text index
Keyword(2) suffix array
Keyword(3) compression
1st Author's Name Kunihiko Sadakane
1st Author's Affiliation Department of System Information Science Graduate School of Information Sciences, Tohoku University()
Date 2000/7/19
Paper # DE2000-29
Volume (vol) vol.100
Number (no) 226
Page pp.pp.-
#Pages 8
Date of Issue