Presentation | 2023-09-06 Construction and Validation of Pre-trained Language Model Using Corpus of National and Local Assembly Minutes Keiyu Nagafuchi, Eisaku Sato, Yasutomo Kimura, Kazuma Kadowaki, Kenji Araki |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | In recent years, there has been a surge in pre-trained language models trained on large-scale corpora derived from Common Crawl data collected from the Web. While such corpora significantly improve model performance, they can contain false information, which may lead to hallucinations in AI-generated content. In this study, we focused on the quality and reliability of assembly minutes and constructed a large-scale corpus of these records. Based on this corpus, we developed pre-trained language models and evaluated their performance on general-domain tasks. We also evaluated their effectiveness on political-domain tasks related to the minutes corpus. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | pre-trained language model / corpus construction / assembly minutes / domain adaptation |
Paper # | NLC2023-3 |
Date of Issue | 2023-08-30 (NLC) |
Conference Information | |
Committee | NLC |
---|---|
Conference Date | 2023/9/6 (2 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Osaka Metropolitan University, Nakamozu Campus |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | The 20th Text Analytics Symposium |
Chair | Mitsuo Yoshida(Univ. of Tsukuba) |
Vice Chair | Hiroki Sakaji(Univ. of Tokyo) / Takeshi Kobayakawa(NHK) |
Secretary | Hiroki Sakaji(rinna) / Takeshi Kobayakawa(Hiroshima Univ. of Economics) |
Assistant | Kanjin Takahashi(Sansan) / Yasuhiro Ogawa(Nagoya Univ.) |
Paper Information | |
Registration To | Technical Committee on Natural Language Understanding and Models of Communication |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Construction and Validation of Pre-trained Language Model Using Corpus of National and Local Assembly Minutes |
Sub Title (in English) | |
Keyword(1) | pre-trained language model |
Keyword(2) | corpus construction |
Keyword(3) | assembly minutes |
Keyword(4) | domain adaptation |
1st Author's Name | Keiyu Nagafuchi |
1st Author's Affiliation | Hokkaido University(HU) |
2nd Author's Name | Eisaku Sato |
2nd Author's Affiliation | Otaru University of Commerce(OUC) |
3rd Author's Name | Yasutomo Kimura |
3rd Author's Affiliation | Otaru University of Commerce(OUC) |
4th Author's Name | Kazuma Kadowaki |
4th Author's Affiliation | The Japan Research Institute, Limited(JRI) |
5th Author's Name | Kenji Araki |
5th Author's Affiliation | Hokkaido University(HU) |
Date | 2023-09-06 |
Paper # | NLC2023-3 |
Volume (vol) | vol.123 |
Number (no) | NLC-176 |
Page | pp.12-17 (NLC) |
#Pages | 6 |
Date of Issue | 2023-08-30 (NLC) |