Presentation 2023-09-06
Construction and Validation of Pre-trained Language Model Using Corpus of National and Local Assembly Minutes
Keiyu Nagafuchi, Eisaku Sato, Yasutomo Kimura, Kazuma Kadowaki, Kenji Araki
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In recent years, there has been a surge in pre-trained language models trained on large-scale corpora derived from Common Crawl web data. Although such corpora substantially improve model performance, they may also contain false information, which can lead to hallucinations in AI-generated content. In this study, we focused on the quality and reliability of the minutes of national and local assemblies and constructed a large-scale corpus from these records. Using this corpus, we developed pre-trained language models and evaluated their performance on general-domain tasks. In addition, we evaluated their effectiveness on political-domain tasks related to the assembly minutes corpus.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) pre-trained language model / corpus construction / assembly minutes / domain adaptation
Paper # NLC2023-3
Date of Issue 2023-08-30 (NLC)

Conference Information
Committee NLC
Conference Date 2023/9/6 (2 days)
Place (in Japanese) (See Japanese page)
Place (in English) Osaka Metropolitan University, Nakamozu Campus
Topics (in Japanese) (See Japanese page)
Topics (in English) The 20th Text Analytics Symposium
Chair Mitsuo Yoshida(Univ. of Tsukuba)
Vice Chair Hiroki Sakaji(Univ. of Tokyo) / Takeshi Kobayakawa(NHK)
Secretary Hiroki Sakaji(rinna) / Takeshi Kobayakawa(Hiroshima Univ. of Economics)
Assistant Kanjin Takahashi(Sansan) / Yasuhiro Ogawa(Nagoya Univ.)

Paper Information
Registration To Technical Committee on Natural Language Understanding and Models of Communication
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Construction and Validation of Pre-trained Language Model Using Corpus of National and Local Assembly Minutes
Sub Title (in English)
Keyword(1) pre-trained language model
Keyword(2) corpus construction
Keyword(3) assembly minutes
Keyword(4) domain adaptation
1st Author's Name Keiyu Nagafuchi
1st Author's Affiliation Hokkaido University(HU)
2nd Author's Name Eisaku Sato
2nd Author's Affiliation Otaru University of Commerce(OUC)
3rd Author's Name Yasutomo Kimura
3rd Author's Affiliation Otaru University of Commerce(OUC)
4th Author's Name Kazuma Kadowaki
4th Author's Affiliation The Japan Research Institute, Limited(JRI)
5th Author's Name Kenji Araki
5th Author's Affiliation Hokkaido University(HU)
Date 2023-09-06
Volume (vol) vol.123
Number (no) NLC-176
Page pp.12-17 (NLC)
#Pages 6