Presentation | 2005/12/14 Improving recognition performance of spoken documents using similar documents on the Internet Yuusuke Itoh, Hiromitsu Nishizaki, Yoshihiro Sekiguchi |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | This paper describes a technique for improving the recognition performance of spoken documents by adapting a language model with similar documents found on the Internet and by combining the transcriptions produced by several LVCSR systems. A language model and a dictionary built from similar documents, which are likely to be related to the spoken document, reduce the out-of-vocabulary (OOV) rate of the dictionary. We used three kinds of language models in an LVCSR system: (1) a general 20K language model built from newspaper articles (75 months), (2) a topic-adapted language model built from the similar Web documents, and (3) a class-based language model in which only proper nouns are grouped into classes. The three outputs of the LVCSR systems, each obtained with one of these language models, are combined using a simple voting scheme. In our experiment, the proposed method improved recognition performance compared with using only the language model built from newspaper articles. The word correct rate improved from 47.0% to 47.8%, and the word accuracy rate from 37.7% to 39.5%. In particular, the correct rate for proper nouns improved dramatically, from 43.9% to 56.1%. These results show that our technique is effective for automatically transcribing news documents. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Spoken document / speech recognition / WEB document / error correction / language model adaptation |
Paper # | NLC2005-65,SP2005-98 |
Date of Issue |
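
The simple voting scheme mentioned in the abstract could be sketched roughly as follows. The abstract does not say how the three transcriptions are aligned or how ties are broken, so this is only an illustration under stated assumptions: the hypotheses are taken to be already aligned slot by slot, `GAP` is a hypothetical placeholder for "no word in this slot", and ties fall back to a chosen system (the first one by default). The function name `vote` and the sample words are likewise hypothetical.

```python
from collections import Counter

GAP = "<eps>"  # hypothetical marker for an empty slot in an aligned hypothesis


def vote(aligned_hyps, fallback=0):
    """Combine pre-aligned word sequences by per-slot majority vote.

    Assumption: every hypothesis has the same length (alignment is done
    elsewhere). When the top candidates tie, the word from the system at
    index `fallback` is used.
    """
    if len({len(h) for h in aligned_hyps}) != 1:
        raise ValueError("hypotheses must be pre-aligned to equal length")

    combined = []
    for slot in zip(*aligned_hyps):
        counts = Counter(slot).most_common()
        best_word, best_count = counts[0]
        # Tie between the two most frequent candidates: trust the fallback system.
        if len(counts) > 1 and counts[1][1] == best_count:
            best_word = slot[fallback]
        if best_word != GAP:
            combined.append(best_word)
    return combined


# Made-up outputs standing in for the three language-model configurations.
general = ["the", "minister", "visited", "yamanasi", "today"]
adapted = ["the", "minister", "visited", "yamanashi", "today"]
class_based = ["a", "minister", GAP, "yamanashi", "today"]

print(" ".join(vote([general, adapted, class_based])))
```

In this toy input, the misrecognized proper noun from the general model is outvoted by the other two systems, which is the kind of correction the combination step is meant to enable.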
Conference Information | |
Committee | NLC |
---|---|
Conference Date | 2005/12/14 (1 day)
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Natural Language Understanding and Models of Communication (NLC) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Improving recognition performance of spoken documents using similar documents on the Internet |
Sub Title (in English) | |
Keyword(1) | Spoken document |
Keyword(2) | speech recognition |
Keyword(3) | WEB document |
Keyword(4) | error correction |
Keyword(5) | language model adaptation |
1st Author's Name | Yuusuke Itoh |
1st Author's Affiliation | Graduate School of Medical and Engineering Science Department of Education, University of Yamanashi
2nd Author's Name | Hiromitsu Nishizaki |
2nd Author's Affiliation | Graduate School of Medicine and Engineering Science Department of Research, University of Yamanashi |
3rd Author's Name | Yoshihiro Sekiguchi |
3rd Author's Affiliation | Graduate School of Medicine and Engineering Science Department of Research, University of Yamanashi |
Date | 2005/12/14 |
Paper # | NLC2005-65,SP2005-98 |
Volume (vol) | vol.105 |
Number (no) | 493 |
Page | pp.-
#Pages | 6 |
Date of Issue |