Presentation 2019-02-08
Development of Phenotyping algorithm from Medical Text-based Data using Machine Learning Methods
Takanori Yamashita, Rieko Izukura, Sachio Hirokawa, Naoki Nakashima
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Electronic medical records (EMRs) accumulated in a Hospital Information System (HIS) are referred to as Real World Data (RWD). Utilization of RWD can help advance promising treatments, detect side effects, and improve the efficiency of medical treatment in ways that were unknown in previous clinical research and intervention studies. Phenotyping algorithms must be developed to achieve high performance when extracting cases from RWD. Although drugs, laboratory tests, diagnoses and surgeries can be expressed as structured data, patient symptoms, the rationales for various medical treatments and patient outcomes are often described in free-text format. In this research, we aimed to develop an algorithm that extracts true interstitial pneumonia cases from unstructured text-based data. A chest physician reviewed the CT reports of 100 sampled cases and diagnosed 48 of them as interstitial pneumonia. Three machine learning methods (Support Vector Machine, Feature Selection and Gradient Boosting Decision Tree) were combined to develop the text-based phenotyping. We extracted 6 keywords as feature words based on their scores from the machine learning methods; when a report contains at least one of these keywords, the PPV is 0.483 and the sensitivity is 0.875.
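The evaluation described in the abstract, flagging a report when it contains at least one feature keyword and then scoring the rule by PPV and sensitivity, can be illustrated with a minimal Python sketch. The keywords, report texts and labels below are hypothetical placeholders (the abstract does not list the six extracted keywords), and the helper names `matches_any` and `ppv_and_sensitivity` are introduced here for illustration only; this is not the authors' implementation.

```python
from typing import Iterable, Sequence, Tuple


def matches_any(text: str, keywords: Iterable[str]) -> bool:
    """Return True if at least one keyword occurs in the report text."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def ppv_and_sensitivity(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float]:
    """Compute PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return ppv, sensitivity


# Hypothetical keywords: NOT the six keywords extracted in the paper.
KEYWORDS = ["reticular", "honeycombing", "ground-glass"]

# Hypothetical CT report snippets with a physician label
# (True = diagnosed as interstitial pneumonia).
REPORTS = [
    ("Reticular shadows in both lower lobes.", True),
    ("Honeycombing at the lung bases.", True),
    ("Diffuse fibrotic change.", True),                   # no keyword: missed case
    ("Ground-glass opacity, likely infection.", False),   # keyword hit: false positive
    ("No abnormal findings.", False),
]

predicted = [matches_any(text, KEYWORDS) for text, _ in REPORTS]
actual = [label for _, label in REPORTS]
ppv, sensitivity = ppv_and_sensitivity(predicted, actual)
print(f"PPV={ppv:.3f}, sensitivity={sensitivity:.3f}")
```

If the reported metrics were computed over the 100 sampled cases (an assumption; the abstract does not say so), they are consistent with 42 of the 48 physician-confirmed cases being matched (42/48 = 0.875) and 87 reports being flagged in total (42/87 ≈ 0.483).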
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Interstitial pneumonia / SVM / GBDT / Phenotyping
Paper # NLC2018-45
Date of Issue 2019-01-31 (NLC)

Conference Information
Committee NLC / IPSJ-IFAT
Conference Date 2019/2/7 (2 days)
Place (in Japanese) (See Japanese page)
Place (in English) Ryukoku University Omiya Campus
Topics (in Japanese) (See Japanese page)
Topics (in English) The 14th Text Analytics Symposium
Chair Takeshi Sakaki(Hottolink)
Vice Chair Mitsuo Yoshida(Toyohashi Univ. of Tech.) / Kazutaka Shimada(Kyushu Inst. of Tech.)
Secretary Mitsuo Yoshida(Ryukoku Univ.) / Kazutaka Shimada(NTT)
Assistant Takeshi Kobayakawa(NHK) / Hiroki Sakaji(Univ. of Tokyo)

Paper Information
Registration To Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Information Fundamentals and Access Technologies
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Development of Phenotyping algorithm from Medical Text-based Data using Machine Learning Methods
Sub Title (in English)
Keyword(1) Interstitial pneumonia
Keyword(2) SVM
Keyword(3) GBDT
Keyword(4) Phenotyping
1st Author's Name Takanori Yamashita
1st Author's Affiliation Kyushu University(Kyushu Univ.)
2nd Author's Name Rieko Izukura
2nd Author's Affiliation Kyushu University(Kyushu Univ.)
3rd Author's Name Sachio Hirokawa
3rd Author's Affiliation Kyushu University(Kyushu Univ.)
4th Author's Name Naoki Nakashima
4th Author's Affiliation Kyushu University(Kyushu Univ.)
Date 2019-02-08
Paper # NLC2018-45
Volume (vol) vol.118
Number (no) NLC-439
Page pp.53-57(NLC)
#Pages 5
Date of Issue 2019-01-31 (NLC)