文脈自由文法のチョムスキー標準形を用いた文法圧縮アルゴリズム

有村 光晴

講演名	2016-05-20 文脈自由文法のチョムスキー標準形を用いた文法圧縮アルゴリズム有村光晴(湘南工科大),
PDFダウンロードページ	PDFダウンロードページへ
抄録(和)	本稿では新しい，文法に基いた無歪み情報源符号のクラスを提案する．簡潔さのために，文脈自由文法の準チョムスキー標準形(Semi-Chomsky Normal Form, semi-CNF)を導入する．semi-CNFはCNFを変形したものであり，本稿の中で提案する．文法圧縮符号は文脈自由文法を用いたユニバーサルデータ圧縮アルゴリズムのクラスである．提案するアルゴリズムは，与えられた系列を，その系列のみを生成する既約な文法に二段階で変換する．第一段階では，シンボルもしくは変数のペアから新しい変数への置き換えを繰り返すことで，生成規則の集合のsemi-CNFを構成する．第二段階では，他の生成規則の中で一度しか用いられていない生成規則を削除することで，semi-CNFを既約もしくはより小さな文法に変換する．LZ78，Multilevel Pattern Matching (MPM)法，Byte Pair Encoding (BPE)法が，このクラスの符号として扱うことができるが，これらのアルゴリズムは，この二段階の手続きのうち最初のものしか用いていない．よって，提案する方法によって，統一した手続きでこれらのアルゴリズムの圧縮性能を向上させることができる．この手法には，semi-CNFを経由する二段階アルゴリズムを用いることで，与えられた系列から文法への変換が極めて簡潔で効率的であるという利点がある．
抄録(英)	This paper proposes a new class of grammar-based lossless source code. For the simplicity, a Semi-Chomsky Normal Form (semi-CNF) of context free grammar is introduced. A semi-CNF is a modified form of the CNF proposed in this paper. Grammar-based code is a class of universal data compression algorithm using a context-free grammar. The proposed algorithm translates a given sequence to the irreducible grammar generating only a given sequence in two step. In the first step, a semi-CNF of the set of production rules is constructed using repeated substitution from a pair of symbols or variables to a new variable. In the second step, semi-CNF is translated to an irreducible or smaller grammar by eliminating production rules which are used only once in the other production rules. LZ78, Multilevel Pattern Matching (MPM) and Byte Pair Encoding (BPE) algorithms can be treated as examples of this class of codes. These algorithms use only the first step of this two-step procedure. Therefore, the proposed method can improve the compression performance of these algorithms by the unified procedure. This method has an advantage that, transformation from a given sequence to the grammar is quite simple and efficient, by using the two step algorithm through semi-CNF.
キーワード(和)	無歪みデータ圧縮 / 文脈自由文法 / チョムスキー標準形
キーワード(英)	Lossless Data Compression / Context Free Grammar / Chomsky Normal Form
資料番号	IT2016-12,EMM2016-12
発行日	2016-05-12 (IT, EMM)

研究会情報
研究会	IT / EMM
開催期間	2016/5/19(から2日開催)
開催地（和）	小樽経済センター
開催地（英）	Otaru Economic Center
テーマ（和）	情報セキュリティ，情報理論，情報ハイディング，一般
テーマ（英）	Information Security, Information Theory, Information Hiding, etc.
委員長氏名（和）	大濱靖匡(電通大) / 伊藤彰則(東北大)
委員長氏名（英）	Yasutada Oohama(Univ. of Electro-Comm.) / Akinori Ito(Tohoku Univ.)
副委員長氏名（和）	和田山正(名工大) / 鵜木祐史(北陸先端大) / 川村正樹(山口大)
副委員長氏名（英）	Tadashi Wadayama(Nagoya Inst. of Tech.) / Masashi Unoki(JAIST) / Masaki Kawamura(Yamaguchi Univ.)
幹事氏名（和）	岩本貢(電通大) / 葛岡成晃(和歌山大) / 市野将嗣(電通大) / 薗田光太郎(長崎大)
幹事氏名（英）	Mitsugu Iwamoto(Univ. of Electro-Comm.) / Shigeaki Kuzuoka(Wakayama Univ.) / Masatsugu Ichino(Univ. of Electro-Comm.) / Kotaro Sonoda(Nagasaki Univ.)
幹事補佐氏名（和）	日下卓也(岡山大) / 岩田基(阪府大) / 河野和宏(関西大)
幹事補佐氏名（英）	Takuya Kusaka(Okayama Univ.) / Motoi Iwata(Osaka Pref. Univ.) / Kazuhiro Kohno(Kansai Univ.)

講演論文情報詳細
申込み研究会	Technical Committee on Information Theory / Technical Committee on Enriched MultiMedia
本文の言語	JPN
タイトル（和）	文脈自由文法のチョムスキー標準形を用いた文法圧縮アルゴリズム
サブタイトル（和）
タイトル（英）	A Grammar-Based Data Compression Algorithm Using Chomsky Normal Form of Context Free Grammar
サブタイトル（和）
キーワード(1)（和/英）	無歪みデータ圧縮 / Lossless Data Compression
キーワード(2)（和/英）	文脈自由文法 / Context Free Grammar
キーワード(3)（和/英）	チョムスキー標準形 / Chomsky Normal Form
第 1 著者氏名（和/英）	有村光晴 / Mitsuharu Arimura
第 1 著者所属（和/英）	湘南工科大学(略称：湘南工科大) Shonan Institute of Technology(略称：Shonan Inst. Tech.)
発表年月日	2016-05-20
資料番号	IT2016-12,EMM2016-12
巻番号（vol）	vol.116
号番号（no）	IT-33,EMM-34
ページ範囲	pp.69-74(IT), pp.69-74(EMM),
ページ数	6
発行日	2016-05-12 (IT, EMM)