An Investigation of Clustering Methods using Speaker-Class Models in Lecture Speech Recognition (Poster Session)

今野 和樹; 大山 拓也; 加藤 正治; 小坂 哲夫

Presentation	2012/12/13 An Investigation of Clustering Methods using Speaker-Class Models in Lecture Speech Recognition (Poster Session) 今野和樹, 大山拓也, 加藤正治, 小坂哲夫
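Abstract (Japanese, translated)	To improve spontaneous speech recognition, this paper examines speaker-class acoustic models built by large-scale speaker clustering with 100 or more clusters. To avoid reducing the training data available per cluster, we use soft clustering, which allows one speaker to belong to multiple clusters. Because one recognition result is obtained per speaker-class acoustic model, the final result must be selected from among them. For this selection, we take as final the recognition result of the speaker-class acoustic model that shows the maximum likelihood among all speaker-class acoustic models. Model selection was performed in two ways: per speaker and per utterance. The evaluation used the Corpus of Spontaneous Japanese (CSJ). Against a baseline word error rate of 21.08%, clustering with the proposed method gave word error rates of 20.59% (per-speaker model selection) and 20.69% (per-utterance model selection). These results show that the proposed method is effective for spontaneous speech recognition.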
Abstract (English)	In this paper, we examine a speaker clustering method that uses more than 100 clusters in order to improve the performance of spontaneous speech recognition. The method uses a soft clustering algorithm that allows a speaker to belong to more than one cluster, which prevents a decrease in the amount of training data per cluster. In the recognition procedure, the system must select one recognition result from the results produced by the speaker-class models. The selection is based on the maximum likelihood among the speaker-class models. We evaluate two selection methods: one selects the model for each speaker, and the other selects the model for each utterance. The evaluation is conducted on the CSJ (Corpus of Spontaneous Japanese). The baseline experiment gave a word error rate of 21.08%, whereas the proposed method gave 20.59% (selection per speaker) and 20.69% (selection per utterance). The results show that the proposed method is effective for spontaneous speech recognition.
Keywords (Japanese)	大語彙連続音声認識 / 話者クラス音響モデル / ハードクラスタリング / ソフトクラスタリング
Keywords (English)	LVCSR / speaker-class model / hard clustering / soft clustering
Report number	SLP-94
Publication date
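
The maximum-likelihood selection described in the abstract can be illustrated with a minimal sketch. The record gives no implementation details, so everything here is an assumption made for illustration: the function names, the data layout, the toy scores, and in particular the choice to sum log-likelihoods over a speaker's utterances for per-speaker selection.

```python
from collections import defaultdict


def select_per_utterance(scores):
    """For each utterance, keep the hypothesis of the model with the highest log-likelihood.

    scores: {utterance_id: {model_id: (log_likelihood, hypothesis)}}
    """
    return {
        utt: max(by_model.values(), key=lambda result: result[0])[1]
        for utt, by_model in scores.items()
    }


def select_per_speaker(scores, speaker_of):
    """Pick one model per speaker, then use it for all of that speaker's utterances.

    Assumption: a model's score for a speaker is the sum of its
    log-likelihoods over that speaker's utterances.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for utt, by_model in scores.items():
        for model, (log_likelihood, _) in by_model.items():
            totals[speaker_of[utt]][model] += log_likelihood
    best_model = {spk: max(per_model, key=per_model.get) for spk, per_model in totals.items()}
    return {utt: by_model[best_model[speaker_of[utt]]][1] for utt, by_model in scores.items()}


# Toy data (hypothetical): two speaker-class models "A" and "B", one speaker, two utterances.
scores = {
    "u1": {"A": (-10.0, "a1"), "B": (-12.0, "b1")},
    "u2": {"A": (-20.0, "a2"), "B": (-15.0, "b2")},
}
speaker_of = {"u1": "s1", "u2": "s1"}

per_utterance = select_per_utterance(scores)
per_speaker = select_per_speaker(scores, speaker_of)
```

In this toy case the two strategies disagree on `u1`: per-utterance selection picks model "A" there, while per-speaker selection commits to model "B" because its total over both utterances is higher.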

Technical committee information
Technical committee	SP
Dates	2012/12/13 (1 day)

Paper details
Submitted to	Speech (SP)
Language of text	JPN
Title (Japanese)	話者クラス音響モデルを用いた講演音声認識におけるクラスタリング手法の各種検討(ポスターセッション)
Subtitle (Japanese)
Title (English)	An Investigation of Clustering Methods using Speaker-Class Models in Lecture Speech Recognition
Subtitle (English)
Keyword (1) (Japanese/English)	大語彙連続音声認識 / LVCSR
Keyword (2) (Japanese/English)	話者クラス音響モデル / speaker-class model
Keyword (3) (Japanese/English)	ハードクラスタリング / hard clustering
Keyword (4) (Japanese/English)	ソフトクラスタリング / soft clustering
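Author 1 name (Japanese/English)	今野和樹
Author 1 affiliation (Japanese/English)	山形大学 / Yamagata University
Author 2 name (Japanese/English)	大山拓也
Author 2 affiliation (Japanese/English)	山形大学 (現在ユニアデックス株式会社) / Yamagata University (presently with UNIADEX Ltd.)
Author 3 name (Japanese/English)	加藤正治
Author 3 affiliation (Japanese/English)	山形大学 / Yamagata University
Author 4 name (Japanese/English)	小坂哲夫
Author 4 affiliation (Japanese/English)	山形大学 / Yamagata University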
Presentation date	2012/12/13
Report number	SLP-94
Volume (vol)	vol.112
Issue (no)	369
Page range	pp.-
Number of pages	6
Publication date