Presentation | 2022-05-27 [Poster Presentation] A Transformer for Long Medical Documents Cherubin Mugisha, Incheon Paik |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Natural language processing models are advancing technology by extracting valuable information from different datasets. Biomedical texts present various challenges that require domain-specific models for effective biomedical text mining. Long sequences require a large amount of memory because of the quadratic cost of the attention computation. In this work, we introduce a transformer for long medical documents. We trained this model on different biomedical datasets, and it can handle sequence lengths of up to 4096 tokens. To build our model, we created a byte-level tokenizer using Byte-Pair Encoding, inspired by RoBERTa, and added an attention mechanism that scales linearly with the sequence length. We fine-tuned our model and demonstrated competitive results on question answering tasks. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Transformer / Medical text / Tokenization / MIMIC |
Paper # | SC2022-8 |
Date of Issue | 2022-05-20 (SC) |
Conference Information | |
Committee | SC |
---|---|
Conference Date | 2022/5/27 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Online |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | AI Service and Digital Transformation, and general topics |
Chair | Shinji Kikuchi(NIMS) |
Vice Chair | Yoji Yamato(NTT) / Kosaku Kimura(Fujitsu) |
Secretary | Yoji Yamato(Kobe Univ.) / Kosaku Kimura(Tokyo Univ. of Tech.) |
Assistant | Shin Tezuka(Hitachi) / Takao Nakaguchi(KCGI) |
Paper Information | |
Registration To | Technical Committee on Service Computing |
---|---|
Language | ENG |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | [Poster Presentation] A Transformer for Long Medical Documents |
Sub Title (in English) | |
Keyword(1) | Transformer |
Keyword(2) | Medical text |
Keyword(3) | Tokenization |
Keyword(4) | MIMIC |
1st Author's Name | Cherubin Mugisha |
1st Author's Affiliation | University of Aizu(UoA) |
2nd Author's Name | Incheon Paik |
2nd Author's Affiliation | University of Aizu(UoA) |
Date | 2022-05-27 |
Paper # | SC2022-8 |
Volume (vol) | vol.122 |
Number (no) | SC-50 |
Page | pp.43-53(SC) |
#Pages | 11 |
Date of Issue | 2022-05-20 (SC) |