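Efficient Language Model Construction for Spoken Dialog Systems by Web Text Selection Considering Domain and Utterance Style (Session 7: Systems, 8th Spoken Language Symposium)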

Presentation	2006/12/15. Efficient Language Model Construction for Spoken Dialog Systems by Web Text Selection Considering Domain and Utterance Style. Teruhisa MISU, Tatsuya KAWAHARA
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	This paper proposes a bootstrapping method for constructing statistical language models for new spoken dialog systems by collecting and selecting sentences from the World Wide Web (WWW). To generate effective search queries that cover the target domain in full detail, we use a set of documents describing the target domain as seed data. An important issue is how to filter the retrieved Web pages, since not all of the retrieved Web texts are suitable as training data. We introduce an existing dialog corpus from a different domain so that texts in a spoken style are preferred. The proposed method was evaluated on two different tasks, software support and sightseeing guidance, and achieved a significant reduction in word error rate. We show that it is vital to incorporate the dialog corpus in the text selection phase, even though it is not relevant to the target domain.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	Speech recognition / Language model / Spoken dialog system / Web text selection
Paper #	NLC2006-70,SP2006-126
Date of Issue

Paper Information
Registration To	Speech (SP)
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Efficient Language Model Construction for Spoken Dialog Systems by Web Text Selection Considering Domain and Utterance Style
Sub Title (in English)
Keyword(1)	Speech recognition
Keyword(2)	Language model
Keyword(3)	Spoken dialog system
Keyword(4)	Web text selection
1st Author's Name	Teruhisa MISU
1st Author's Affiliation	School of Informatics, Kyoto University
2nd Author's Name	Tatsuya KAWAHARA
2nd Author's Affiliation	School of Informatics, Kyoto University
Date	2006/12/15
Paper #	NLC2006-70,SP2006-126
Volume (vol)	vol.106
Number (no)	444
Page	pp.-
#Pages	6
Date of Issue