Presentation 2004/12/14
Trigger-Based Language Model Construction by Combining Different Corpora
Carlos TRONCOSO, Tatsuya KAWAHARA, Hirofumi YAMAMOTO, Genichiro KIKUI
Abstract (in English) We study the trigger-based language model (LM) for large vocabulary continuous speech recognition (LVCSR), which can model dependencies between words over longer distances than the n-gram LM can. In language modeling for LVCSR, a training corpus that matches the target task is typically small and therefore insufficient for reliable probability estimation. Large corpora, on the other hand, are often too general to capture task dependency. The proposed approach addresses this trade-off between generality and sparseness by constructing a trigger-based LM in two steps: task-dependent trigger pairs are first extracted from the corpus that matches the task, and the occurrence probabilities of these pairs are then estimated from a huge text corpus. We report evaluation results on ATR's Basic Travel Expression Corpus (BTEC) and on the Corpus of Spontaneous Japanese (CSJ).
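The two-step construction described in the abstract can be illustrated with a small sketch. The abstract does not state the selection criterion or the exact probability model, so this sketch assumes pointwise mutual information (PMI) for ranking candidate pairs in the task-matched corpus, and a relative-frequency estimate of P(target occurs within a window after the trigger) computed on the large corpus. The function names, window size, and thresholds are hypothetical, and the sketch stops at the pair probabilities without showing how they enter the final LM.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window):
    """Count word occurrences and ordered (trigger, target) pairs.

    A pair is counted at most once per trigger occurrence when the target
    appears within `window` words after the trigger (assumed definition).
    """
    unigrams = Counter()
    pairs = Counter()
    for sent in sentences:
        unigrams.update(sent)
        for i, trigger in enumerate(sent):
            for target in set(sent[i + 1:i + 1 + window]):
                if target != trigger:
                    pairs[(trigger, target)] += 1
    return unigrams, pairs


def extract_trigger_pairs(task_sentences, window=5, top_k=100, min_count=2):
    """Step 1 (hypothetical): pick task-dependent pairs from the small corpus by PMI."""
    unigrams, pairs = cooccurrence_counts(task_sentences, window)
    total_words = sum(unigrams.values())
    total_pairs = sum(pairs.values())
    scored = []
    for (a, b), n_ab in pairs.items():
        if n_ab < min_count:  # discard pairs too rare to trust
            continue
        p_ab = n_ab / total_pairs
        p_a = unigrams[a] / total_words
        p_b = unigrams[b] / total_words
        scored.append((math.log(p_ab / (p_a * p_b)), (a, b)))
    scored.sort(reverse=True)
    return [pair for _, pair in scored[:top_k]]


def estimate_trigger_probs(trigger_pairs, general_sentences, window=5):
    """Step 2 (hypothetical): estimate P(target within window | trigger) on the large corpus."""
    unigrams, pairs = cooccurrence_counts(general_sentences, window)
    probs = {}
    for a, b in trigger_pairs:
        if unigrams[a] > 0:  # skip triggers unseen in the large corpus
            probs[(a, b)] = pairs[(a, b)] / unigrams[a]
    return probs


if __name__ == "__main__":
    task = [["book", "a", "room", "for", "tonight"],
            ["book", "a", "room", "please"],
            ["where", "is", "the", "station"]]
    general = [["book", "a", "room"],
               ["book", "the", "flight"],
               ["a", "room", "with", "a", "view"]]
    selected = extract_trigger_pairs(task, window=3, top_k=10)
    print(estimate_trigger_probs(selected, general, window=3))
```

The split mirrors the generality-sparseness argument: the small task corpus only decides *which* pairs matter, while the large corpus supplies the counts used for the probabilities. Because each trigger occurrence contributes at most once per target, every estimate lies between 0 and 1.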
Keyword (in English) Language Model / Speech Recognition / Trigger-Based Language Model / Text Corpus
Paper # NLC2004-60,SP2004-100
Date of Issue

Conference Information
Committee NLC
Conference Date 2004/12/14 (1 day)
Place (in English)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language ENG
Title (in English) Trigger-Based Language Model Construction by Combining Different Corpora
Sub Title (in English)
Keyword(1) Language Model
Keyword(2) Speech Recognition
Keyword(3) Trigger-Based Language Model
Keyword(4) Text Corpus
1st Author's Name Carlos TRONCOSO
1st Author's Affiliation School of Informatics, Kyoto University
2nd Author's Name Tatsuya KAWAHARA
2nd Author's Affiliation School of Informatics, Kyoto University
3rd Author's Name Hirofumi YAMAMOTO
3rd Author's Affiliation Spoken Language Translation Research Laboratories, ATR
4th Author's Name Genichiro KIKUI
4th Author's Affiliation Spoken Language Translation Research Laboratories, ATR
Date 2004/12/14
Paper # NLC2004-60,SP2004-100
Volume (vol) vol.104
Number (no) 539
Page pp.-
#Pages 6