Grammar Fragment Acquisition using Syntactic and Semantic Clustering

Kazuhiro Arai; Jeremy H. Wight; Giuseppe Riccardi; Allen L. Gorin

Presentation	1997/12/12 Grammar Fragment Acquisition using Syntactic and Semantic Clustering, Kazuhiro Arai, Jeremy H. Wight, Giuseppe Riccardi, Allen L. Gorin
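Abstract (Japanese)	This paper describes a clustering algorithm that generates clusters of word sequences that are both semantically and syntactically similar, for use in understanding spontaneous speech. In this method, word sequences observed with high frequency in the training data are first selected as clustering candidates. Next, for each selected candidate, probability distributions over preceding contexts, succeeding contexts, and dialogue-system responses are estimated. The similarity between candidate word sequences is defined by the Kullback-Leibler distance for each of the three kinds of probability distributions, so three distances are obtained for each pair of candidates. In clustering, word sequences for which all three distances are small are grouped into the same cluster, forming word-sequence clusters. When the method was applied to AT&T's telephone service guidance task, it generated 246 word sequences that were not observed in the training data and appeared only in the evaluation data. In addition, the call-type classification rate (speech understanding rate) improved by about 3%, confirming the effectiveness of the method.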
Abstract (English)	A new method for automatically acquiring grammar fragments for understanding fluently spoken language is proposed. The goal of this method is to generate a collection of grammar fragments, each representing a set of syntactically and semantically similar phrases. First, phrases observed frequently in the training set are selected as candidates. Each candidate phrase has three associated probability distributions: of succeeding contexts, of preceding contexts, and of associated machine actions. The similarity between candidate phrases is measured by applying the Kullback-Leibler distance to each of the three probability distributions. Candidate phrases that are close in all three distances are clustered into a grammar fragment. This approach detected 246 phrases in the test set that were not present in the training set. Experimental results show that a 3% improvement in call-type classification performance was achieved by introducing these fragments.
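Keywords (Japanese)	speech understanding / preceding and succeeding contexts / Kullback-Leibler distance / word-sequence similarity / word-sequence clustering
Keywords (English)	spoken understanding / preceding and succeeding contexts / Kullback-Leibler distance / phrase similarity / phrase clustering
Report number	SP97-84
Date of issue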
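
The similarity measure and clustering step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The symmetrized form of the Kullback-Leibler distance, the additive smoothing constant `alpha`, the single distance `threshold`, the union-find grouping, and the toy counts and call-type labels below are all assumptions introduced for illustration only.

```python
import math
from collections import Counter
from itertools import combinations

def distribution(counts, vocab, alpha=0.5):
    """Additively smoothed distribution over vocab (alpha is an assumed constant)."""
    total = sum(counts.values()) + alpha * len(vocab)
    return {v: (counts.get(v, 0) + alpha) / total for v in vocab}

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for distributions on the same support."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)

def symmetric_kl(p, q):
    """Symmetrized KL distance (assumed form; the abstract does not give the formula)."""
    return kl(p, q) + kl(q, p)

def three_distances(a, b):
    """Distances over preceding contexts, succeeding contexts, and machine actions."""
    out = {}
    for key in ("prev", "next", "action"):
        vocab = set(a[key]) | set(b[key])
        out[key] = symmetric_kl(distribution(a[key], vocab),
                                distribution(b[key], vocab))
    return out

def cluster(phrases, threshold):
    """Group phrases whose three distances are all below the threshold."""
    parent = {p: p for p in phrases}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(phrases, 2):
        d = three_distances(phrases[a], phrases[b])
        if all(v < threshold for v in d.values()):
            parent[find(a)] = find(b)

    groups = {}
    for p in phrases:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())

# Hypothetical toy counts; phrases, contexts, and call-type labels are invented.
phrases = {
    "calling card": {"prev": Counter({"my": 8, "a": 2}),
                     "next": Counter({"number": 7, "call": 3}),
                     "action": Counter({"CALLING_CARD": 9, "OTHER": 1})},
    "credit card": {"prev": Counter({"my": 7, "a": 3}),
                    "next": Counter({"number": 6, "call": 4}),
                    "action": Counter({"CALLING_CARD": 8, "OTHER": 2})},
    "area code": {"prev": Counter({"the": 9, "an": 1}),
                  "next": Counter({"for": 8, "is": 2}),
                  "action": Counter({"AREA_CODE": 10})},
}

clusters = cluster(phrases, threshold=0.5)
print(clusters)
```

With these toy counts, the two "card" phrases are close in all three distances and fall into one cluster, while "area code" stays separate. Requiring all three distances to be small, rather than only one or their sum, mirrors the abstract's requirement that clustered phrases be similar both syntactically (contexts) and semantically (associated actions).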

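Technical committee information
Committee	SP
Dates	1997/12/12 (1 day)
Venue (Japanese)
Venue (English)
Theme (Japanese)
Theme (English)
Chair (Japanese)
Chair (English)
Vice chair (Japanese)
Vice chair (English)
Secretary (Japanese)
Secretary (English)
Assistant secretary (Japanese)
Assistant secretary (English)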

Paper details
Registered committee	Speech (SP)
Language of text	ENG
Title (Japanese)	Acquisition of a Statistical Grammar Using Semantic and Syntactic Clustering
Subtitle (Japanese)
Title (English)	Grammar Fragment Acquisition using Syntactic and Semantic Clustering
Subtitle (English)
Keyword (1) (Japanese/English)	音声理解 / spoken understanding
Keyword (2) (Japanese/English)	先行・後続コンテキスト / preceding and succeeding contexts
Keyword (3) (Japanese/English)	Kullback-Leibler距離 / Kullback-Leibler distance
Keyword (4) (Japanese/English)	単語列の類似性 / phrase similarity
Keyword (5) (Japanese/English)	単語列クラスタリング / phrase clustering
1st author name (Japanese/English)	荒井和博 / Kazuhiro Arai
1st author affiliation (Japanese/English)	NTTヒューマンインタフェース研究所 / NTT Human Interface Laboratories
2nd author name (Japanese/English)	Wight Jeremy H. / Jeremy H. Wight
2nd author affiliation (Japanese/English)	AT&T Laboratories-Research / AT&T Laboratories-Research
3rd author name (Japanese/English)	Riccardi Giuseppe / Giuseppe Riccardi
3rd author affiliation (Japanese/English)	AT&T Laboratories-Research / AT&T Laboratories-Research
4th author name (Japanese/English)	Gorin Allen L. / Allen L. Gorin
4th author affiliation (Japanese/English)	AT&T Laboratories-Research / AT&T Laboratories-Research
Presentation date	1997/12/12
Report number	SP97-84
Volume (vol)	vol.97
Issue number (no)	442
Page range	pp.-
Number of pages	8
Date of issue