Construction and Validation of Pre-trained Language Model Using Corpus of National and Local Assembly Minutes

Presentation	2023-09-06 Construction and Validation of Pre-trained Language Model Using Corpus of National and Local Assembly Minutes Keiyu Nagafuchi, Eisaku Sato, Yasutomo Kimura, Kazuma Kadowaki, Kenji Araki
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	In recent years, there has been a surge in pre-trained language models built on large-scale corpora derived from Common Crawl data collected from the Web. While such corpora significantly enhance model performance, they can contain false information, potentially leading to hallucinations in AI-generated content. In this study, we focused on the quality and reliability of assembly minutes and constructed a large-scale corpus of these records. Based on this corpus, we developed pre-trained language models and evaluated their performance on general-domain tasks. We also evaluated their effectiveness on political-domain tasks related to the minutes corpus.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	pre-trained language model / corpus construction / assembly minutes / domain adaptation
Paper #	NLC2023-3
Date of Issue	2023-08-30 (NLC)

Conference Information
Committee	NLC
Conference Date	2023-09-06 (2 days)
Place (in Japanese)	(See Japanese page)
Place (in English)	Osaka Metropolitan University, Nakamozu Campus
Topics (in Japanese)	(See Japanese page)
Topics (in English)	The 20th Text Analytics Symposium
Chair	Mitsuo Yoshida (Univ. of Tsukuba)
Vice Chair	Hiroki Sakaji (Univ. of Tokyo) / Takeshi Kobayakawa (NHK)
Secretary	Hiroki Sakaji (rinna) / Takeshi Kobayakawa (Hiroshima Univ. of Economics)
Assistant	Kanjin Takahashi (Sansan) / Yasuhiro Ogawa (Nagoya Univ.)

Paper Information
Registration To	Technical Committee on Natural Language Understanding and Models of Communication
Language	JPN
Title (in Japanese)	国会および地方議会会議録をコーパスとした事前学習済み言語モデルの構築と検証
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Construction and Validation of Pre-trained Language Model Using Corpus of National and Local Assembly Minutes
Sub Title (in English)
Keyword(1)	pre-trained language model
Keyword(2)	corpus construction
Keyword(3)	assembly minutes
Keyword(4)	domain adaptation
1st Author's Name	Keiyu Nagafuchi
1st Author's Affiliation	Hokkaido University (HU)
2nd Author's Name	Eisaku Sato
2nd Author's Affiliation	Otaru University of Commerce (OUC)
3rd Author's Name	Yasutomo Kimura
3rd Author's Affiliation	Otaru University of Commerce (OUC)
4th Author's Name	Kazuma Kadowaki
4th Author's Affiliation	The Japan Research Institute, Limited (JRI)
5th Author's Name	Kenji Araki
5th Author's Affiliation	Hokkaido University (HU)
Date	2023-09-06
Volume (vol)	vol.123
Number (no)	NLC-176
Page	pp.12-17 (NLC)
#Pages	6