Presentation | 2004/12/14. Trigger-Based Language Model Construction by Combining Different Corpora. Carlos TRONCOSO, Tatsuya KAWAHARA, Hirofumi YAMAMOTO, Genichiro KIKUI |
---|---|
Abstract(in English) | We study the trigger-based language model (LM) for large vocabulary continuous speech recognition (LVCSR), which can capture longer-range dependencies between words than an n-gram LM can. In language modeling for LVCSR, a training corpus that matches the target task is typically small and therefore insufficient for reliable probability estimation, whereas large corpora are often too general to capture task-dependent characteristics. The proposed approach addresses this generality-sparseness trade-off by constructing a trigger-based LM in which task-dependent trigger pairs are first extracted from the corpus that matches the task, and the occurrence probabilities of these pairs are then estimated from a huge text corpus. We report evaluation results on ATR's Basic Travel Expression Corpus (BTEC) and on the Corpus of Spontaneous Japanese (CSJ). |
Keyword(in English) | Language Model / Speech Recognition / Trigger-Based Language Model / Text Corpus |
Paper # | NLC2004-60,SP2004-100 |
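
The abstract's two-stage idea (select trigger pairs on the small task-matched corpus, then estimate their probabilities on a large general corpus) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' method: the selection score (sentence-level pointwise mutual information), the word window, the count threshold, and the function names `cooccurrences`, `select_triggers`, and `estimate_trigger_probs` are all hypothetical choices for this example, and the step that combines trigger probabilities with an n-gram LM is not shown.

```python
import math
from collections import Counter

# Hypothetical sketch: the selection criterion, window size and thresholds are
# illustrative assumptions, not parameters reported in the paper.


def cooccurrences(sentences, window):
    """Sentence-level counts over tokenized sentences.

    Returns (pair_counts, word_counts, n_sentences), where pair_counts[(a, b)]
    is the number of sentences in which b occurs within `window` words after a,
    and word_counts[w] is the number of sentences containing w.
    """
    pair_counts = Counter()
    word_counts = Counter()
    for words in sentences:
        pairs_here = set()
        for i, trigger in enumerate(words):
            for target in words[i + 1 : i + 1 + window]:
                pairs_here.add((trigger, target))
        pair_counts.update(pairs_here)   # each pair counted once per sentence
        word_counts.update(set(words))   # each word counted once per sentence
    return pair_counts, word_counts, len(sentences)


def select_triggers(task_sentences, window=5, min_count=2, top_k=100):
    """Stage 1 (assumed criterion): rank pairs seen in the small task-matched
    corpus by pointwise mutual information and keep the top_k."""
    pairs, words, n = cooccurrences(task_sentences, window)
    scored = []
    for (a, b), c in pairs.items():
        if c < min_count:
            continue  # too rare in the task corpus to trust
        pmi = math.log(c * n / (words[a] * words[b]))
        scored.append((pmi, a, b))
    scored.sort(reverse=True)
    return [(a, b) for _, a, b in scored[:top_k]]


def estimate_trigger_probs(triggers, general_sentences, window=5):
    """Stage 2: re-estimate each selected pair on the large general corpus as
    P(target follows within window | sentence contains trigger)."""
    pairs, words, _ = cooccurrences(general_sentences, window)
    return {
        (a, b): pairs[(a, b)] / words[a]
        for a, b in triggers
        if words[a] > 0  # trigger unseen in the large corpus: no estimate
    }


if __name__ == "__main__":
    task = [s.split() for s in [
        "i would like to check in to the hotel",
        "can i check in early at the hotel",
        "please check in for me",
    ]]
    general = [s.split() for s in [
        "we check in at the hotel tonight",
        "check your email in the morning",
        "the hotel was full",
        "check in opens at noon",
    ]]
    triggers = select_triggers(task, window=3, min_count=2, top_k=5)
    print(triggers)
    print(estimate_trigger_probs(triggers, general, window=3))
```

The split mirrors the trade-off described in the abstract: only pairs that are frequent enough in the task corpus are kept (task dependency), while their probabilities come from the much larger corpus (less sparse estimates).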

Conference Information | |
---|---|
Committee | NLC |
Conference Date | 2004/12/14 (1 day) |

Paper Information | |
---|---|
Registration To | Natural Language Understanding and Models of Communication (NLC) |
Language | ENG |
Title (in English) | Trigger-Based Language Model Construction by Combining Different Corpora |
Keyword(1) | Language Model |
Keyword(2) | Speech Recognition |
Keyword(3) | Trigger-Based Language Model |
Keyword(4) | Text Corpus |
1st Author's Name | Carlos TRONCOSO |
1st Author's Affiliation | School of Informatics, Kyoto University |
2nd Author's Name | Tatsuya KAWAHARA |
2nd Author's Affiliation | School of Informatics, Kyoto University |
3rd Author's Name | Hirofumi YAMAMOTO |
3rd Author's Affiliation | Spoken Language Translation Research Laboratories, ATR |
4th Author's Name | Genichiro KIKUI |
4th Author's Affiliation | Spoken Language Translation Research Laboratories, ATR |
Date | 2004/12/14 |
Volume (vol) | vol.104 |
Number (no) | 539 |
#Pages | 6 |