Recognition of Spontaneous Speech by Using a General-Purpose LVCSR with 0-gram and Distinctive Phonetic Features

Shingo ISEJI; Takashi FUKUDA; Kouichi KATSURADA; Tsuneo NITTA

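Presentation	2002/12/13 Recognition of Spontaneous Speech by Using a General-Purpose LVCSR with 0-gram and Distinctive Phonetic Features / Shingo ISEJI, Takashi FUKUDA, Kouichi KATSURADA, Tsuneo NITTA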
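Abstract (translated from Japanese)	This report proposes a method for recognizing dialogue speech with high accuracy by using general-purpose LVCSR software. The proposed method converts the phoneme sequence output by the LVCSR into a sequence of distinctive feature vectors, and then spots keywords by using the dialogue description (vocabulary and grammar) specified by the dialogue manager. The method has two characteristics. (1) By relaxing the linguistic constraint of the LVCSR (0-gram, with an insertion penalty), it makes full use of the LVCSR's high phoneme discrimination ability. (2) By replacing the phoneme sequence output with a sequence of distinctive feature vectors and then performing keyword spotting, it copes with substitution, deletion, and insertion errors. Using dialogue speech data from a map guidance task, the paper examines different degrees of linguistic constraint in the language model, compares the method with a sub-word model, and compares it with a matching method based on a confusion matrix, thereby showing the effectiveness of the proposed method.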
Abstract (English)	This paper describes an attempt to recognize spontaneously spoken dialogue by using general-purpose LVCSR software. In the proposed method, a phoneme string output by the LVCSR is converted into a sequence of vectors of distinctive phonetic features, and keywords assigned by a dialogue manager are then detected in that vector sequence. The method has two advantages: (1) it exploits the precise phoneme discrimination achieved by relaxing the linguistic constraint in the LVCSR, and (2) it copes with substitution, deletion, and insertion errors by combining the conversion of phonemes into distinctive phonetic feature vectors with keyword spotting. In an experiment with a spoken dialogue corpus of a map guidance task, the proposed method shows significant improvements over the LVCSR software.
Keywords	Spoken Dialogue / LVCSR / Keyword Spotting / Language Model / Sub-word Model / Distinctive Phonetic Feature / Confusion Matrix
Report number	NLC2002-79
Publication date
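
The pipeline summarized in the abstract (phoneme sequence, then distinctive feature vectors, then keyword spotting that tolerates substitution, deletion, and insertion errors) can be illustrated with a small Python sketch. This is not the authors' implementation. The seven-dimensional feature table, the normalized Hamming distance, the `gap_cost` value, and the function names `to_vectors`, `feature_distance`, and `spot_keyword` are all assumptions introduced for this example, and the dynamic-programming alignment is a generic stand-in for whatever matching procedure the paper actually uses.

```python
# Hypothetical feature table, NOT the feature set used in the paper.
# Dimensions: (vowel, voiced, nasal, continuant, high, low, back)
FEATURES = {
    "a": (1, 1, 0, 1, 0, 1, 1),
    "i": (1, 1, 0, 1, 1, 0, 0),
    "u": (1, 1, 0, 1, 1, 0, 1),
    "e": (1, 1, 0, 1, 0, 0, 0),
    "o": (1, 1, 0, 1, 0, 0, 1),
    "k": (0, 0, 0, 0, 1, 0, 1),
    "g": (0, 1, 0, 0, 1, 0, 1),
    "s": (0, 0, 0, 1, 0, 0, 0),
    "n": (0, 1, 1, 0, 0, 0, 0),
    "w": (0, 1, 0, 1, 1, 0, 1),
}


def to_vectors(phonemes):
    """Map a phoneme sequence to a sequence of feature vectors."""
    return [FEATURES[p] for p in phonemes]


def feature_distance(u, v):
    """Normalized Hamming distance between two feature vectors (0.0 to 1.0)."""
    return sum(a != b for a, b in zip(u, v)) / len(u)


def spot_keyword(keyword, hypothesis, gap_cost=0.5):
    """Find the best match of `keyword` anywhere inside `hypothesis`.

    Returns (cost per keyword phoneme, end index in the hypothesis).
    The keyword may start and end at any hypothesis position.
    """
    kw, hyp = to_vectors(keyword), to_vectors(hypothesis)
    n, m = len(kw), len(hyp)
    prev = [0.0] * (m + 1)  # zero keyword phonemes consumed: free start
    for i in range(1, n + 1):
        cur = [prev[0] + gap_cost] + [0.0] * m
        for j in range(1, m + 1):
            cur[j] = min(
                prev[j - 1] + feature_distance(kw[i - 1], hyp[j - 1]),  # match or substitution
                prev[j] + gap_cost,      # keyword phoneme missing from hypothesis (deletion)
                cur[j - 1] + gap_cost,   # extra phoneme in hypothesis (insertion)
            )
        prev = cur
    end = min(range(m + 1), key=prev.__getitem__)
    return prev[end] / n, end


if __name__ == "__main__":
    # "eki" recognized as "egi" inside a longer phoneme string.
    recognized = ["s", "o", "n", "o", "e", "g", "i", "w", "a"]
    print(spot_keyword(["e", "k", "i"], recognized))
```

Because /k/ and /g/ differ in only one feature in this toy table, the substitution is penalized far less than it would be under exact phoneme matching, which is the intuition behind matching in feature space rather than on phoneme labels.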

Technical committee information
Technical committee	NLC
Dates	2002/12/13 (one-day meeting)

Paper details
Technical committee applied to	Natural Language Understanding and Models of Communication (NLC)
Language of text	JPN
Title (Japanese)	0-gram汎用LVCSRと音素弁別特徴ベクトルを利用した対話音声認識の検討
Title (English)	Recognition of Spontaneous Speech by Using a General-Purpose LVCSR with 0-gram and Distinctive Phonetic Features
Keyword (1) (Japanese/English)	音声対話 / Spoken Dialogue
Keyword (2) (Japanese/English)	LVCSR / LVCSR
Keyword (3) (Japanese/English)	キーワードスポッティング / Keyword Spotting
Keyword (4) (Japanese/English)	言語モデル / Language Model
Keyword (5) (Japanese/English)	サブワードモデル / Sub-word Model
Keyword (6) (Japanese/English)	音素弁別特徴 / Distinctive Phonetic Feature
Keyword (7) (Japanese/English)	混同行列 / Confusion Matrix
First author (Japanese/English)	伊勢路真吾 / Shingo ISEJI
First author affiliation (Japanese/English)	豊橋技術科学大学大学院工学研究科 / Graduate School of Engineering, Toyohashi University of Technology
Second author (Japanese/English)	福田隆 / Takashi FUKUDA
Second author affiliation (Japanese/English)	豊橋技術科学大学大学院工学研究科 / Graduate School of Engineering, Toyohashi University of Technology
Third author (Japanese/English)	桂田浩一 / Kouichi KATSURADA
Third author affiliation (Japanese/English)	豊橋技術科学大学大学院工学研究科 / Graduate School of Engineering, Toyohashi University of Technology
Fourth author (Japanese/English)	新田恒雄 / Tsuneo NITTA
Fourth author affiliation (Japanese/English)	豊橋技術科学大学大学院工学研究科 / Graduate School of Engineering, Toyohashi University of Technology
Presentation date	2002/12/13
Report number	NLC2002-79
Volume	vol.102
Issue	no.528
Page range	pp.-
Number of pages	6