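Effect of punctuation marks for speech translation unit boundary detection (Document Processing, Translation, and Language Acquisition; 10th Spoken Language Symposium)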

Presentation	2008-12-10 Effect of punctuation marks for speech translation unit boundary detection Tohru SHIMIZU, Satoshi NAKAMURA, Tatsuya KAWAHARA
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	Because automatic speech recognition and translation make more errors on long and complicated utterances, there is an increasing need for utterance segmentation techniques. We have proposed the speech translation unit (STU), a segment of an utterance that a human interpreter treats as a single cognitive unit, together with an STU boundary detection method using an SVM-based chunker that combines lexical and prosodic features. Commas and periods are the most widely used punctuation marks in written text. In this paper, we investigate the characteristics of STUs and punctuation marks and propose an STU boundary detection method that combines STU boundary information with punctuation marks. An experimental evaluation on the CSJ corpus shows that STU boundary detection achieved an F-measure of 0.88 for input text with punctuation marks and 0.86 for input text without punctuation marks, which is better than or equal to the STU boundary detection accuracy of human interpreters (F-measure of 0.84).
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	Speech translation unit (STU) boundary / punctuation marks / chunking / SVM
Paper #	NLC2008-45,SP2008-100
Date of Issue

Paper Information
Registration To	Natural Language Understanding and Models of Communication (NLC)
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Effect of punctuation marks for speech translation unit boundary detection
Sub Title (in English)
Keyword(1)	Speech translation unit (STU) boundary
Keyword(2)	punctuation marks
Keyword(3)	chunking
Keyword(4)	SVM
1st Author's Name	Tohru SHIMIZU
1st Author's Affiliation	Knowledge Creating Communication Research Center, National Institute of Information and Communications Technology:ATR Spoken Language Communication Research Labs.:School of Informatics, Kyoto University
2nd Author's Name	Satoshi NAKAMURA
2nd Author's Affiliation	Knowledge Creating Communication Research Center, National Institute of Information and Communications Technology:ATR Spoken Language Communication Research Labs.
3rd Author's Name	Tatsuya KAWAHARA
3rd Author's Affiliation	Knowledge Creating Communication Research Center, National Institute of Information and Communications Technology:School of Informatics, Kyoto University
Date	2008-12-10
Paper #	NLC2008-45,SP2008-100
Volume (vol)	vol.108
Number (no)	337
Page	pp.-
#Pages	5