Presentation | 2008-12-10 Effect of punctuation marks for speech translation unit boundary detection Tohru SHIMIZU, Satoshi NAKAMURA, Tatsuya KAWAHARA |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Because automatic speech recognition and translation of long, complicated utterances produce more errors, there is an increasing need for utterance segmentation techniques. We have proposed the speech translation unit (STU), a segment of an utterance that a human interpreter treats as a single cognitive unit, and an STU boundary detection method using an SVM-based chunker that combines lexical and prosodic features. The comma and the period are well known to be the most widely used punctuation marks in written text. In this paper, we investigate the characteristics of STUs and punctuation marks, and propose an STU boundary detection method that combines STU boundary information with punctuation marks. An experimental evaluation on the CSJ corpus shows that STU boundary detection achieved an F-measure of 0.88 for input text with punctuation marks and 0.86 for input text without punctuation marks, which is better than or equal to the STU boundary detection accuracy of human interpreters (F-measure of 0.84). |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Speech translation unit boundary (STU) / punctuation marks / chunking / SVM |
Paper # | NLC2008-45,SP2008-100 |
Date of Issue |
Conference Information | |
Committee | NLC |
---|---|
Conference Date | 2008/12/2 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Natural Language Understanding and Models of Communication (NLC) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Effect of punctuation marks for speech translation unit boundary detection |
Sub Title (in English) | |
Keyword(1) | Speech translation unit boundary (STU) |
Keyword(2) | punctuation marks |
Keyword(3) | chunking |
Keyword(4) | SVM |
1st Author's Name | Tohru SHIMIZU |
1st Author's Affiliation | Knowledge Creating Communication Research Center, National Institute of Information and Communication Technology:ATR Spoken Language Communication Research Labs.:School of Informatics, Kyoto University |
2nd Author's Name | Satoshi NAKAMURA |
2nd Author's Affiliation | Knowledge Creating Communication Research Center, National Institute of Information and Communication Technology:ATR Spoken Language Communication Research Labs. |
3rd Author's Name | Tatsuya KAWAHARA |
3rd Author's Affiliation | Knowledge Creating Communication Research Center, National Institute of Information and Communication Technology:School of Informatics, Kyoto University |
Date | 2008-12-10 |
Paper # | NLC2008-45,SP2008-100 |
Volume (vol) | vol.108 |
Number (no) | 337 |
Page | pp.- |
#Pages | 5 |
Date of Issue |