Telephony Speech Phrasing based on Breath Event Detection (Speech Processing, Time-Series Pattern Recognition)

Takashi Fukuda; Osamu Ichikawa; Masafumi Nishimura

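Presentation	2012-02-10 Telephony Speech Phrasing based on Breath Event Detection (Speech Processing, Time-Series Pattern Recognition) Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura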
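Abstract (translated from Japanese)	In recent years, call monitoring technology based on speech recognition has attracted growing attention. Speech recognition for call centers first splits the conversational speech into utterances, removes the silent portions that need no recognition, and then runs recognition on each detected utterance. The utterances should therefore be delimited at contextually meaningful boundaries. Conventional utterance detection techniques, however, focus solely on accurately extracting human speech from noisy input signals, and the unit in which utterances are detected has not been studied. This report focuses on human breath sounds (inhalation sounds) and proposes a method that automatically splits input speech into contextually meaningful units by detecting inhalation sounds in the input signal with high accuracy. The proposed method uses acoustic features specialized for breathing sounds and builds its classifiers in stages to extract inhalation sounds with high accuracy. It achieved an inhalation sound detection accuracy of 97.4% and was confirmed to also contribute to improved speech recognition performance.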
Abstract (English)	In ASR technology for call center conversations, the system usually divides an input signal into separate utterances and eliminates the unneeded silent parts of the signal before running ASR on the detected utterances. The input signal should therefore be split into utterances of a length suited to both ASR performance and readability. However, typical VAD techniques sometimes generate overly long speech segments because they focus only on the length of the pause (non-speech) between sentences. In contrast, it is shown that speakers typically take breaths when speaking more than one sentence or long sentences. These breaths are highly correlated with the major prosodic breaks. In this paper, we focus on the breath events in the pause intervals and attempt to split the input signal into utterances by detecting the breathing events. The proposed method leverages acoustic information that is specialized for breathing sounds, which led to a two-step approach that detects the breath events with an accuracy of 97.4%. In addition, proper speech phrasing based on breath events improved the word error rate in ASR.
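Keywords (translated from Japanese)	Speech phrasing / breath sound (exhalation sound) detection / call monitoring / speech recognition / speech segment detection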
Keywords (English)	Speech phrasing / breath detection / call monitoring / automatic speech recognition / voice activity detection
Report number	PRMU2011-238,SP2011-153

Technical committee information
Technical committee	PRMU
Dates	2012/2/2 (held for 1 day)

Presentation paper details
Submitting committee	Pattern Recognition and Media Understanding (PRMU)
Language of text	JPN
Title (Japanese)	息継ぎ音を利用した電話音声の発話分割(音声処理,時系列パターン認識)
Title (English)	Telephony Speech Phrasing based on Breath Event Detection
Keyword (1) (Japanese/English)	発話分割 / Speech phrasing
Keyword (2) (Japanese/English)	息継ぎ音(呼気音)検出 / breath detection
Keyword (3) (Japanese/English)	コールモニタリング / call monitoring
Keyword (4) (Japanese/English)	音声認識 / automatic speech recognition
Keyword (5) (Japanese/English)	発話区間検出 / voice activity detection
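First author name (Japanese/English)	福田隆 / Takashi FUKUDA
First author affiliation (Japanese/English)	日本アイ・ビー・エム株式会社東京基礎研究所 / IBM Research-Tokyo
Second author name (Japanese/English)	市川治 / Osamu ICHIKAWA
Second author affiliation (Japanese/English)	日本アイ・ビー・エム株式会社東京基礎研究所 / IBM Research-Tokyo
Third author name (Japanese/English)	西村雅史 / Masafumi NISHIMURA
Third author affiliation (Japanese/English)	日本アイ・ビー・エム株式会社東京基礎研究所 / IBM Research-Tokyo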
Presentation date	2012-02-10
Report number	PRMU2011-238,SP2011-153
Volume (vol)	vol.111
Issue (no)	430
Page range	pp.-
Number of pages	6