SP2000-23 Efficient control of LVCSR search space using prosodic information with considerations on the interaction of language model

Shi-wook Lee; Keikichi Hirose

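Presentation title	2000/6/16 SP2000-23 Efficient control of LVCSR search space using prosodic information with considerations on the interaction of language model, Shi-wook Lee, Keikichi Hirose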
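Abstract (translated from Japanese)	Prosodic-syntactic boundaries, as an information source, can be used to constrain the search space in speech recognition. This paper proposes a dynamic beam search method for large vocabulary continuous speech recognition that takes cross-word and within-word transitions into account, and considers how to introduce prosodic-syntactic boundaries into an existing recognition decoder. The problem with a tree-structured lexicon in large vocabulary continuous speech recognition is that the application of the language model score is not aligned with the application of the acoustic model score in beam search, so a huge search space is required. We describe the influence of the language model on the choice of beam width and a strategy that uses prosodic-syntactic boundary information, and propose a method that efficiently reduces the amount of computation. Evaluation experiments were carried out on a 20,000-word task of Japanese newspaper article sentences with an n-gram language model, and the results demonstrate the effectiveness of the proposed algorithm.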
Abstract (English)	Prosodic-syntactic boundaries, as an information source, can be used to constrain the search space for automatic speech recognition. This paper presents a dynamic beam search strategy that accounts for within-word and cross-word paths in Large Vocabulary Continuous Speech Recognition (LVCSR), and an approach that interleaves prosodic-syntactic boundary information with acoustic-phonetic decoding. First, we report an analysis of the pruning procedures and of the interaction between the search and the language model. The problem with a tree-structured lexicon in LVCSR is that the application of the language model score is not matched to the time alignment in data-driven beam search, which results in a huge search space. To reduce computational effort and obtain an efficient search space, we introduce prosodic boundary information into LVCSR. We address the effect of the language model on setting the beam width and a dynamic beam search strategy that uses prosodic-syntactic information. Recognition experiments, carried out on the Japanese Newspaper Article Sentences (JNAS) 20,000-word task with an n-gram language model, demonstrated that, in comparison to the static beam search strategy, the proposed method led to a significant reduction in the search space.
Keywords (Japanese)	大語彙連続音声認識 / 高効率探索 / 韻律情報
Keywords (English)	Large Vocabulary Continuous Speech Recognition (LVCSR) / Efficient search / Prosodic information
Report number	SP2000-23

Technical committee information
Technical committee	SP
Dates	2000/6/16 (1 day)

Paper details
Submitted to committee	Speech (SP)
Language of text	JPN
Title (Japanese)	SP2000-23 言語モデルの役割と韻律情報との相互作用を用いた大語彙連続音声認識の探索空間の最適制御
Title (English)	SP2000-23 Efficient control of LVCSR search space using prosodic information with considerations on the interaction of language model
Keyword (1) (Japanese/English)	大語彙連続音声認識 / Large Vocabulary Continuous Speech Recognition (LVCSR)
Keyword (2) (Japanese/English)	高効率探索 / Efficient search
Keyword (3) (Japanese/English)	韻律情報 / Prosodic information
First author name (Japanese/English)	李時旭 / Shi-wook Lee
First author affiliation (Japanese/English)	東京大学大学院工学系研究科 / School of Engineering, University of Tokyo
Second author name (Japanese/English)	広瀬啓吉 / Keikichi Hirose
Second author affiliation (Japanese/English)	東京大学大学院新領域創成科学研究科 / School of Frontier Sciences, University of Tokyo
Presentation date	2000/6/16
Report number	SP2000-23
Volume	vol.100
Issue number	137
Page range	pp.-
Number of pages	6