Presentation 2008-12-10
Effect of punctuation marks for speech translation unit boundary detection
Tohru SHIMIZU, Satoshi NAKAMURA, Tatsuya KAWAHARA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Because automatic speech recognition and translation produce more errors on long, complicated utterances, demand for utterance segmentation techniques is increasing. We have proposed the speech translation unit (STU), a segment of an utterance that a human interpreter treats as a single cognitive unit, together with an STU boundary detection method that uses an SVM-based chunker combining lexical and prosodic features. Commas and periods are well known to be the most widely used punctuation marks in written text. In this paper, we investigate the characteristics of STUs and punctuation marks, and propose an STU boundary detection method that combines STU boundary information with punctuation marks. An experimental evaluation on the CSJ corpus shows that STU boundary detection achieves an F-measure of 0.88 for input text with punctuation marks and 0.86 for input text without punctuation marks, which is better than or equal to the STU boundary detection accuracy of human interpreters (F-measure of 0.84).
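The abstract reports results as F-measures but does not say how boundaries are matched or scored. As a minimal sketch only, assuming the standard definition (harmonic mean of precision and recall) and exact matching of boundary positions, the score could be computed as follows. The function name and the toy boundary positions are hypothetical and are not taken from the paper or from the CSJ corpus.

```python
def boundary_f_measure(reference, predicted):
    """Return (precision, recall, F-measure) for two collections of
    boundary positions, counting only exact position matches as correct."""
    ref, pred = set(reference), set(predicted)
    true_pos = len(ref & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical toy data: word indices after which a boundary is placed.
reference_boundaries = [3, 7, 12, 18]      # e.g. boundaries marked by an annotator
predicted_boundaries = [3, 7, 13, 18, 21]  # e.g. boundaries output by a detector

precision, recall, f_measure = boundary_f_measure(
    reference_boundaries, predicted_boundaries
)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
```

Under this definition a detector is penalized both for missed boundaries (lower recall) and for spurious ones (lower precision), so a single F-measure allows a system to be compared with human interpreters on the same scale.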
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Speech translation unit boundary (STU) / punctuation marks / chunking / SVM
Paper # NLC2008-45,SP2008-100
Date of Issue

Conference Information
Committee NLC
Conference Date 2008/12/2 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Effect of punctuation marks for speech translation unit boundary detection
Sub Title (in English)
Keyword(1) Speech translation unit boundary (STU)
Keyword(2) punctuation marks
Keyword(3) chunking
Keyword(4) SVM
1st Author's Name Tohru SHIMIZU
1st Author's Affiliation Knowledge Creating Communication Research Center, National Institute of Information and Communication Technology:ATR Spoken Language Communication Research Labs.:School of Informatics, Kyoto University
2nd Author's Name Satoshi NAKAMURA
2nd Author's Affiliation Knowledge Creating Communication Research Center, National Institute of Information and Communication Technology:ATR Spoken Language Communication Research Labs.
3rd Author's Name Tatsuya KAWAHARA
3rd Author's Affiliation Knowledge Creating Communication Research Center, National Institute of Information and Communication Technology:School of Informatics, Kyoto University
Date 2008-12-10
Paper # NLC2008-45,SP2008-100
Volume (vol) vol.108
Number (no) 337
Page pp.-
#Pages 5
Date of Issue