Presentation | 2018-09-20 Character Image Clustering for Analyzing Machine-Unreadable Historical Document Images Sora Ito, Kengo Terasawa |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Digital archives store and publish large numbers of historical document images, and we believe that showing indexes or tagged keywords would make them more useful. Our laboratory is therefore developing a system that extracts keywords from machine-unreadable historical document images without character recognition. The system first discretizes feature vectors by clustering the character images that those vectors represent. It then expresses sentences as sequences of discretized feature vectors and analyzes them, which makes keyword extraction possible without character recognition. If "separation of clusters" occurs during clustering, meaning that one character class is split into several clusters, the accuracy of keyword extraction decreases. A further problem is that when too many character images are extracted from the historical document images, clustering them all at once is difficult because of the computational cost. To solve these problems, this study proposes a clustering method that suppresses the separation of clusters and can be applied even when the number of character images is very large. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Historical document / Clustering / Document analysis |
Paper # | PRMU2018-46,IBISML2018-23 |
Date of Issue | 2018-09-13 (PRMU, IBISML) |
Conference Information | |
Committee | PRMU / IBISML / IPSJ-CVIM |
---|---|
Conference Date | 2018/9/20(2days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | Shinichi Sato(NII) / Hisashi Kashima(Kyoto Univ.) |
Vice Chair | Yoshihisa Ijiri(Omron) / Toru Tamaki(Hiroshima Univ.) / Masashi Sugiyama(Univ. of Tokyo) / Koji Tsuda(Univ. of Tokyo) |
Secretary | Yoshihisa Ijiri(NEC) / Toru Tamaki(Osaka Univ.) / Masashi Sugiyama(Nagoya Inst. of Tech.) / Koji Tsuda(AIST) |
Assistant | Go Irie(NTT) / Yoshitaka Ushiku(Univ. of Tokyo) / Tomoharu Iwata(NTT) / Shigeyuki Oba(Kyoto Univ.) |
Paper Information | |
Registration To | Technical Committee on Pattern Recognition and Media Understanding / Technical Committee on Information-Based Induction Sciences and Machine Learning / Special Interest Group on Computer Vision and Image Media |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Character Image Clustering for Analyzing Machine-Unreadable Historical Document Images |
Sub Title (in English) | |
Keyword(1) | Historical document |
Keyword(2) | Clustering |
Keyword(3) | Document analysis |
1st Author's Name | Sora Ito |
1st Author's Affiliation | Future University Hakodate(FUN) |
2nd Author's Name | Kengo Terasawa |
2nd Author's Affiliation | Future University Hakodate(FUN) |
Date | 2018-09-20 |
Paper # | PRMU2018-46,IBISML2018-23 |
Volume (vol) | vol.118 |
Number (no) | PRMU-219,IBISML-220 |
Page | pp.67-72(PRMU), pp.67-72(IBISML) |
#Pages | 6 |
Date of Issue | 2018-09-13 (PRMU, IBISML) |
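
The pipeline outlined in the English abstract (cluster character-image feature vectors into discrete labels, then analyze the text as a sequence of those labels) can be illustrated with a small sketch. This is not the authors' method: the abstract names neither the clustering algorithm nor the analysis step, so the sketch substitutes a simple one-pass leader clustering and counts repeated label n-grams as keyword candidates. The function names `leader_cluster` and `frequent_ngrams`, the distance threshold, and the toy 2-D vectors are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import dist


def leader_cluster(vectors, threshold):
    """One-pass leader clustering (an assumed stand-in, not the paper's algorithm).

    Each vector joins the nearest existing leader within `threshold`;
    otherwise it becomes the leader of a new cluster.
    """
    leaders = []  # one representative vector per cluster
    labels = []   # cluster ID assigned to each input vector, in order
    for v in vectors:
        best_id, best_d = None, threshold
        for cid, leader in enumerate(leaders):
            d = dist(v, leader)
            if d <= best_d:
                best_id, best_d = cid, d
        if best_id is None:
            leaders.append(v)
            best_id = len(leaders) - 1
        labels.append(best_id)
    return labels


def frequent_ngrams(labels, n, min_count):
    """Count label n-grams and keep those seen at least `min_count` times."""
    grams = Counter(tuple(labels[i:i + n]) for i in range(len(labels) - n + 1))
    return {g: c for g, c in grams.items() if c >= min_count}


# Toy 2-D "feature vectors" standing in for character images in reading order.
features = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.1, 0.0), (1.0, 0.9), (9.0, 0.0)]
labels = leader_cluster(features, threshold=0.5)
keywords = frequent_ngrams(labels, n=2, min_count=2)
print(labels)
print(keywords)
```

In this toy run the first two "characters" recur later in the sequence, so their label pair appears as a repeated bigram. If the threshold were too tight, the recurring characters would land in different clusters and the repetition would be missed, which is the "separation of clusters" problem the abstract describes.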