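Grammar Learning Using Partial Sets of Terminals and the MDL Principle (General; Human Interaction and Pattern Recognition, Media Understanding, and Language Understanding; General)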

Kris M. KITANI; Yoichi SATO; Akihiro SUGIMOTO

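Presentation	2006-10-19 Grammar Learning Using Partial Sets of Terminals and the MDL Principle (General; Human Interaction and Pattern Recognition, Media Understanding, and Language Understanding; General) Kris M. KITANI, Yoichi SATO, Akihiro SUGIMOTO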
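Abstract (translated from Japanese)	Stochastic context-free grammars, which are used for syntactic parsing of natural language, have also been applied to video-based analysis of human activity, and their effectiveness has been reported. Unlike word sequences in documents, however, symbol strings of human activity obtained from video contain a large amount of noise, which makes learning an activity grammar difficult. High-accuracy grammar learning therefore requires identifying a terminal symbol set from which noise symbols have been excluded. This work proposes a method, based on the minimum description length principle, for acquiring a noise-free terminal symbol set together with its associated grammar. The proposed method evaluates all combinations of terminal symbols and quantifies, for each subset, the trade-off between the complexity of the grammar obtained under it and the likelihood of the observed symbol strings. This makes it possible to identify highly rated candidate terminal symbol sets and grammars, and to acquire the basic structure of an activity grammar while removing noise from the symbol strings. Experiments on simulated data demonstrate the effectiveness of the proposed method.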
Abstract (English)	Stochastic Context-Free Grammars (SCFG) have been shown to be useful for applications beyond natural language analysis, specifically vision-based human activity analysis. Vision-based symbol strings differ from natural language strings in that a string of symbols produced from video often contains noise symbols, making grammatical inference very difficult. To obtain reliable results from grammatical inference, it is necessary to identify these noise symbols. In our work, we propose a new technique for identifying the best subset of non-noise terminal symbols and acquiring the best activity grammar. Our approach uses the Minimum Description Length (MDL) principle to evaluate the trade-off between model complexity and data fit, and so to quantify the difference between the results obtained for each terminal subset. The evaluation results are then used to identify a class of candidate terminal subsets and grammars that remove the noise and enable the discovery of the basic structure of an activity. In this paper, we demonstrate the validity of our proposed method through experiments with synthetic data.
Keywords	Grammatical Inference / Syntactic Analysis / Minimum Description Length Principle / Activity Recognition / Context-Free Grammar
Report number	PRMU2006-94
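The subset search described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' formulation: a bigram model stands in for the learned stochastic context-free grammar, `bits_per_rule` is a hypothetical flat cost per model parameter, and each discarded symbol is charged a uniform code of `log2(alphabet_size)` bits so that dropping symbols is not free. The function names `candidate_subsets`, `description_length`, and `best_subset` are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def candidate_subsets(terminals):
    """Yield every non-empty subset of the terminal alphabet."""
    items = sorted(terminals)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def description_length(strings, subset, alphabet_size, bits_per_rule=8.0):
    """Two-part code length in bits for one candidate terminal subset.

    Assumption: a bigram model over the kept symbols stands in for the
    grammar; this is NOT the SCFG learner used in the paper.
    """
    counts = Counter()          # bigram counts over the filtered strings
    context_totals = Counter()  # how often each left-hand symbol occurs
    discarded = 0               # symbols treated as noise under this subset
    for s in strings:
        kept = [sym for sym in s if sym in subset]
        discarded += len(s) - len(kept)
        seq = ["<s>"] + kept + ["</s>"]
        for prev, nxt in zip(seq, seq[1:]):
            counts[(prev, nxt)] += 1
            context_totals[prev] += 1
    # Model cost: a flat (hypothetical) price per distinct bigram rule.
    model_bits = bits_per_rule * len(counts)
    # Data cost: negative log2-likelihood under maximum-likelihood estimates.
    data_bits = -sum(
        n * log2(n / context_totals[prev]) for (prev, _), n in counts.items()
    )
    # Noise cost: each discarded symbol is sent with a uniform code (assumption).
    noise_bits = discarded * log2(alphabet_size)
    return model_bits + data_bits + noise_bits


def best_subset(strings, terminals):
    """Return the subset with the smallest total description length."""
    return min(
        candidate_subsets(terminals),
        key=lambda sub: description_length(strings, sub, len(terminals)),
    )


if __name__ == "__main__":
    # Toy data: the pattern "abc" with an occasional inserted noise symbol "x".
    strings = ["abc", "axbc", "abxc", "abc", "xabc", "abcx", "abc", "abc"]
    print(sorted(best_subset(strings, set("abcx"))))
```

Exhaustive enumeration costs 2^n - 1 evaluations for n terminals, so a sketch like this is only practical for small alphabets.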

Technical committee information
Technical committee	PRMU
Dates	2006/10/12 (1-day meeting)

Paper details
Submitted to	Pattern Recognition and Media Understanding (PRMU)
Language of text	JPN
Title (Japanese)	終端記号の部分集合と最小記述長原理を用いた文法学習(一般,ヒューマンインタラクションとパターン認識・メディア理解・言語理解,一般)
Title (English)	Grammar Learning Using Partial Sets of Terminals and the MDL Principle
Keyword (1) (Japanese/English)	文法学習 / Grammatical Inference
Keyword (2) (Japanese/English)	構文解析 / Syntactic Analysis
Keyword (3) (Japanese/English)	最小記述長原理 / Minimum Description Length Principle
Keyword (4) (Japanese/English)	行動認識 / Activity Recognition
Keyword (5) (Japanese/English)	文脈自由文法 / Context-Free Grammar
First author name (Japanese/English)	木谷(クリス) 真実 / Kris M. KITANI
First author affiliation (Japanese/English)	東京大学生産技術研究所 / Institute of Industrial Science, The University of Tokyo
Second author name (Japanese/English)	佐藤洋一 / Yoichi SATO
Second author affiliation (Japanese/English)	東京大学生産技術研究所 / Institute of Industrial Science, The University of Tokyo
Third author name (Japanese/English)	杉本晃宏 / Akihiro SUGIMOTO
Third author affiliation (Japanese/English)	国立情報学研究所 / National Institute of Informatics
Presentation date	2006-10-19
Report number	PRMU2006-94
Volume (vol)	vol.106
Issue (no)	300
Page range	pp.-
Number of pages	6