Presentation | 2006/12/15 Efficient Language Model Construction for Spoken Dialog Systems by Web Text Selection Considering Domain and Utterance Style Teruhisa MISU, Tatsuya KAWAHARA |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | This paper proposes a bootstrapping method for constructing statistical language models for new spoken dialog systems by collecting and selecting sentences from the World Wide Web (WWW). To generate effective search queries that cover the target domain in detail, we use a set of documents describing the target domain as seed data. An important issue is how to filter the retrieved Web pages, since not all of the retrieved Web texts are suitable as training data. We introduce an existing dialog corpus from a different domain so that texts in a spoken style are preferred. The proposed method was evaluated on two different tasks, software support and sightseeing guidance, and achieved a significant reduction in word error rate. We show that it is vital to incorporate the dialog corpus in the text selection phase, even though it is not relevant to the target domain. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Speech recognition / Language model / Spoken dialog system / Web text selection |
Paper # | NLC2006-70,SP2006-126 |
Date of Issue |
Conference Information | |
Committee | SP |
---|---|
Conference Date | 2006/12/15 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Speech (SP) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Efficient Language Model Construction for Spoken Dialog Systems by Web Text Selection Considering Domain and Utterance Style |
Sub Title (in English) | |
Keyword(1) | Speech recognition |
Keyword(2) | Language model |
Keyword(3) | Spoken dialog system |
Keyword(4) | Web text selection |
1st Author's Name | Teruhisa MISU |
1st Author's Affiliation | School of Informatics, Kyoto University |
2nd Author's Name | Tatsuya KAWAHARA |
2nd Author's Affiliation | School of Informatics, Kyoto University |
Date | 2006/12/15 |
Paper # | NLC2006-70,SP2006-126 |
Volume (vol) | vol.106 |
Number (no) | 444 |
Page | pp.- |
#Pages | 6 |
Date of Issue |