Presentation 2016-05-20
A Grammar-Based Data Compression Algorithm Using Chomsky Normal Form of Context Free Grammar
Mitsuharu Arimura,
PDF Download Page PDF download Page Link
Abstract(in Japanese) (See Japanese page)
Abstract(in English) This paper proposes a new class of grammar-based lossless source code. For the simplicity, a Semi-Chomsky Normal Form (semi-CNF) of context free grammar is introduced. A semi-CNF is a modified form of the CNF proposed in this paper. Grammar-based code is a class of universal data compression algorithm using a context-free grammar. The proposed algorithm translates a given sequence to the irreducible grammar generating only a given sequence in two step. In the first step, a semi-CNF of the set of production rules is constructed using repeated substitution from a pair of symbols or variables to a new variable. In the second step, semi-CNF is translated to an irreducible or smaller grammar by eliminating production rules which are used only once in the other production rules. LZ78, Multilevel Pattern Matching (MPM) and Byte Pair Encoding (BPE) algorithms can be treated as examples of this class of codes. These algorithms use only the first step of this two-step procedure. Therefore, the proposed method can improve the compression performance of these algorithms by the unified procedure. This method has an advantage that, transformation from a given sequence to the grammar is quite simple and efficient, by using the two step algorithm through semi-CNF.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Lossless Data Compression / Context Free Grammar / Chomsky Normal Form
Paper # IT2016-12,EMM2016-12
Date of Issue 2016-05-12 (IT, EMM)

Conference Information
Committee IT / EMM
Conference Date 2016/5/19(2days)
Place (in Japanese) (See Japanese page)
Place (in English) Otaru Economic Center
Topics (in Japanese) (See Japanese page)
Topics (in English) Information Security, Information Theory, Information Hiding, etc.
Chair Yasutada Oohama(Univ. of Electro-Comm.) / Akinori Ito(Tohoku Univ.)
Vice Chair Tadashi Wadayama(Nagoya Inst. of Tech.) / Masashi Unoki(JAIST) / Masaki Kawamura(Yamaguchi Univ.)
Secretary Tadashi Wadayama(Univ. of Electro-Comm.) / Masashi Unoki(Wakayama Univ.) / Masaki Kawamura(Univ. of Electro-Comm.)
Assistant Takuya Kusaka(Okayama Univ.) / Motoi Iwata(Osaka Pref. Univ.) / Kazuhiro Kohno(Kansai Univ.)

Paper Information
Registration To Technical Committee on Information Theory / Technical Committee on Enriched MultiMedia
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) A Grammar-Based Data Compression Algorithm Using Chomsky Normal Form of Context Free Grammar
Sub Title (in English)
Keyword(1) Lossless Data Compression
Keyword(2) Context Free Grammar
Keyword(3) Chomsky Normal Form
1st Author's Name Mitsuharu Arimura
1st Author's Affiliation Shonan Institute of Technology(Shonan Inst. Tech.)
Date 2016-05-20
Paper # IT2016-12,EMM2016-12
Volume (vol) vol.116
Number (no) IT-33,EMM-34
Page pp.pp.69-74(IT), pp.69-74(EMM),
#Pages 6
Date of Issue 2016-05-12 (IT, EMM)