Presentation 2006/12/15
Efficient Language Model Construction for Spoken Dialog Systems by Web Text Selection Considering Domain and Utterance Style
Teruhisa MISU, Tatsuya KAWAHARA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) This paper proposes a bootstrapping method for constructing statistical language models for new spoken dialog systems by collecting and selecting sentences from the World Wide Web (WWW). To generate effective search queries that cover the target domain in full detail, we use a set of documents describing the target domain as seed data. An important issue is how to filter the retrieved Web pages, since not all of the retrieved Web texts are suitable as training data. We introduce an existing dialog corpus from a different domain so that texts in a spoken style are preferred. The proposed method was evaluated on two different tasks, software support and sightseeing guidance, and achieved a significant reduction in word error rate. We show that it is vital to incorporate the dialog corpus in the text selection phase, even though it is not relevant to the target domain.
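The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual selection criterion, so everything here is an assumption made for illustration: candidate Web sentences are scored by their unigram perplexity under a model trained on the domain documents and under a model trained on an out-of-domain dialog corpus, and the sentences with the lowest product of the two are kept. The function names (`train_unigram`, `perplexity`, `select_sentences`) and the `keep` parameter are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return a function giving add-one-smoothed unigram log-probabilities."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserved for unseen words

    def logprob(word):
        return math.log((counts[word] + 1) / (total + vocab))

    return logprob


def perplexity(logprob, sentence):
    """Per-word perplexity of a sentence under a unigram model."""
    words = sentence.lower().split()
    if not words:
        return float("inf")
    return math.exp(-sum(logprob(w) for w in words) / len(words))


def select_sentences(web_sentences, domain_docs, dialog_corpus, keep=2):
    """Keep the sentences with the lowest combined perplexity.

    Assumption: domain fit and spoken-style fit are combined by
    multiplying the two perplexities; the paper may combine them differently.
    """
    domain_lm = train_unigram(domain_docs)
    style_lm = train_unigram(dialog_corpus)
    ranked = sorted(
        web_sentences,
        key=lambda s: perplexity(domain_lm, s) * perplexity(style_lm, s),
    )
    return ranked[:keep]


if __name__ == "__main__":
    # Toy data, invented for this example only.
    domain_docs = ["install the printer driver",
                   "the printer driver update failed"]
    dialog_corpus = ["could you tell me how to get there",
                     "how do i do that please"]
    web_sentences = ["how do i install the printer driver",
                     "stock prices fell sharply today",
                     "printer driver specification sheet revision"]
    for sentence in select_sentences(web_sentences, domain_docs, dialog_corpus):
        print(sentence)
```

In this toy setting, a sentence that is both on-topic and conversational ranks ahead of one that is on-topic but written in a document style, which is the effect the abstract attributes to including the dialog corpus. A practical system would presumably use higher-order n-gram models and a tuned threshold rather than a fixed `keep` count.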
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Speech recognition / Language model / Spoken dialog system / Web text selection
Paper # NLC2006-70,SP2006-126
Date of Issue

Conference Information
Committee SP
Conference Date 2006/12/15 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Speech (SP)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Efficient Language Model Construction for Spoken Dialog Systems by Web Text Selection Considering Domain and Utterance Style
Sub Title (in English)
Keyword(1) Speech recognition
Keyword(2) Language model
Keyword(3) Spoken dialog system
Keyword(4) Web text selection
1st Author's Name Teruhisa MISU
1st Author's Affiliation School of Informatics, Kyoto University
2nd Author's Name Tatsuya KAWAHARA
2nd Author's Affiliation School of Informatics, Kyoto University
Date 2006/12/15
Paper # NLC2006-70,SP2006-126
Volume (vol) vol.106
Number (no) 444
Page pp.-
#Pages 6
Date of Issue