Presentation 2012/12/5
Variable-to-Fixed-Length Encoding for Large Texts Using a Re-Pair Algorithm with Shared Dictionaries
KEI SEKINE, HIROHITO SASAKAWA, SATOSHI YOSHIDA, TAKUYA KIDA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) The Re-Pair algorithm, proposed by Larsson and Moffat in 1999, is a simple grammar-based compression method that achieves an extremely high compression ratio. However, Re-Pair is an offline algorithm that consumes a large amount of memory, so applying it to a very large text requires dividing the text into smaller blocks. If part of the dictionary is shared among all blocks, we expect both the compression speed and the compression ratio to improve. In this paper, we implement this method using variable-to-fixed-length codes and empirically show how its compression speed and ratio vary with three parameters: block size, dictionary size, and shared-dictionary size. Finally, we discuss the trends in compression speed and ratio with respect to these three parameters.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) grammar compression / large text / blocked compression
Paper # Vol.2012-DBS-156No.7
Date of Issue

Conference Information
Committee DE
Conference Date 2012/12/5 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Data Engineering (DE)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Variable-to-Fixed-Length Encoding for Large Texts Using a Re-Pair Algorithm with Shared Dictionaries
Sub Title (in English)
Keyword(1) grammar compression
Keyword(2) large text
Keyword(3) blocked compression
1st Author's Name KEI SEKINE
1st Author's Affiliation
2nd Author's Name HIROHITO SASAKAWA
2nd Author's Affiliation
3rd Author's Name SATOSHI YOSHIDA
3rd Author's Affiliation
4th Author's Name TAKUYA KIDA
4th Author's Affiliation
Date 2012/12/5
Paper # Vol.2012-DBS-156No.7
Volume (vol) vol.112
Number (no) 346
Page pp.-
#Pages 6
Date of Issue