Presentation 2018-09-20
Character Image Clustering for Analyzing Machine-Unreadable Historical Document Images
Sora Ito, Kengo Terasawa
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Digital archives store and publish large numbers of historical document images, and we believe that presenting indexes or tagged keywords for those images makes the archives more useful. Our laboratory is therefore developing a system that extracts keywords from machine-unreadable historical document images without character recognition. The system first discretizes feature vectors by clustering the character images that those vectors represent. It then expresses sentences as sequences of discretized feature vectors and analyzes those sequences. This makes keyword extraction possible without character recognition. If "separation of clusters" occurs during clustering, that is, if a single character class is split across several clusters, the accuracy of keyword extraction decreases. A second problem is that when too many character images are segmented from the historical document images, clustering them all at once becomes difficult because of the computational cost. To address both problems, this study proposes a clustering method that restrains the separation of clusters and can be applied even when the number of character images is very large.
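The pipeline outlined in the abstract (cluster character images, replace each image with its cluster ID, then analyze the resulting symbol sequence) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: plain k-means stands in for the proposed clustering method, the toy 2-D vectors stand in for real character-image features, and counting repeated n-grams stands in for the sequence analysis. The function names `nearest`, `kmeans`, and `frequent_ngrams` are hypothetical.

```python
from collections import Counter


def nearest(vec, centroids):
    """Return the index of the centroid closest to vec (squared Euclidean)."""
    dists = [sum((a - b) ** 2 for a, b in zip(vec, c)) for c in centroids]
    return dists.index(min(dists))


def kmeans(vectors, k, iterations=10):
    """Plain k-means (assumed stand-in for the paper's clustering method).

    Initial centroids are simply the first k vectors, which keeps the
    sketch deterministic.
    """
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def frequent_ngrams(symbols, n, min_count=2):
    """Count n-grams of cluster IDs and keep those seen at least min_count times."""
    grams = Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))
    return {gram: count for gram, count in grams.items() if count >= min_count}


# Toy feature vectors for six character images, in reading order (made-up data).
features = [(0.0, 0.1), (5.0, 5.1), (9.9, 0.2), (0.1, 0.0), (5.1, 4.9), (0.0, 0.0)]

centroids = kmeans(features, k=3)

# Discretization: each character image becomes the ID of its nearest cluster.
symbols = [nearest(v, centroids) for v in features]

# Repeated ID sequences are candidate keywords; no character was recognized.
candidates = frequent_ngrams(symbols, n=2)
print(symbols, candidates)
```

In this sketch, a character class that is split across two clusters would receive two different IDs, so its repeated occurrences would no longer match as the same n-gram. That is one way to see why the separation of clusters described in the abstract hurts keyword extraction.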
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Historical document / Clustering / Document analysis
Paper # PRMU2018-46,IBISML2018-23
Date of Issue 2018-09-13 (PRMU, IBISML)

Conference Information
Committee PRMU / IBISML / IPSJ-CVIM
Conference Date 2018/9/20(2days)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair Shinichi Sato(NII) / Hisashi Kashima(Kyoto Univ.)
Vice Chair Yoshihisa Ijiri(Omron) / Toru Tamaki(Hiroshima Univ.) / Masashi Sugiyama(Univ. of Tokyo) / Koji Tsuda(Univ. of Tokyo)
Secretary Yoshihisa Ijiri(NEC) / Toru Tamaki(Osaka Univ.) / Masashi Sugiyama(Nagoya Inst. of Tech.) / Koji Tsuda(AIST)
Assistant Go Irie(NTT) / Yoshitaka Ushiku(Univ. of Tokyo) / Tomoharu Iwata(NTT) / Shigeyuki Oba(Kyoto Univ.)

Paper Information
Registration To Technical Committee on Pattern Recognition and Media Understanding / Technical Committee on Information-Based Induction Sciences and Machine Learning / Special Interest Group on Computer Vision and Image Media
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Character Image Clustering for Analyzing Machine-Unreadable Historical Document Images
Sub Title (in English)
Keyword(1) Historical document
Keyword(2) Clustering
Keyword(3) Document analysis
1st Author's Name Sora Ito
1st Author's Affiliation Future University Hakodate(FUN)
2nd Author's Name Kengo Terasawa
2nd Author's Affiliation Future University Hakodate(FUN)
Date 2018-09-20
Paper # PRMU2018-46,IBISML2018-23
Volume (vol) vol.118
Number (no) PRMU-219,IBISML-220
Page pp.67-72(PRMU), pp.67-72(IBISML)
#Pages 6
Date of Issue 2018-09-13 (PRMU, IBISML)