Presentation 2022-01-28
An Objective Article Search Method from Printed Japanese Contract Document Using Optical Character Recognition
Shixi Chen, Masaki Sakagami, Nobuo Funabiki, Takashi Toshida, Kohei Suga
Abstract(in Japanese) (See Japanese page)
Abstract(in English) A contract is essential for the companies involved to do business with each other successfully. The contract document is an important legal document that defines the formal agreements and consists of multiple articles, each of which describes the agreement on a certain subject or condition. It is useful to automatically search for and extract the article describing a given subject, but contract documents are often filed on printed paper in many companies. In this paper, we propose an objective article search method for printed Japanese contract documents using optical character recognition (OCR) technology. From the recognized characters, the method finds the article whose title contains the subject, or finds the paragraph that closely matches a given keyword list. This list can be generated automatically from sample articles related to the subject in existing contract documents. For evaluation, we implemented the proposed method in Python and applied it to 35 contract documents. The results confirm the effectiveness of the proposal: the objective articles were successfully found in all of them.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) contract document / article / subject / OCR / regular expression
Paper # ICM2021-39,LOIS2021-37
Date of Issue 2022-01-20 (ICM, LOIS)
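
The two search steps named in the abstract (matching the subject against article titles, and matching a keyword list against the text) can be illustrated with a small Python sketch over already-recognized text. This is not the authors' implementation. The heading shape "第N条（title）", the function names, the sample text, and the simple keyword-count score are all assumptions made for illustration, and the keyword step here scores whole article bodies rather than individual paragraphs.

```python
import re

# Assumed heading shape: "第N条（title）" at the start of a line.
# Real OCR output may differ and would need a more tolerant pattern.
ARTICLE_HEADING = re.compile(
    r"^第\s*([0-9０-９一二三四五六七八九十百]+)\s*条\s*[（(]([^）)]*)[）)]",
    re.MULTILINE,
)


def split_articles(text):
    """Split OCR text into (number, title, body) tuples at each heading."""
    matches = list(ARTICLE_HEADING.finditer(text))
    articles = []
    for i, m in enumerate(matches):
        # The body runs from the end of this heading to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.end():end].strip()
        articles.append((m.group(1), m.group(2).strip(), body))
    return articles


def find_by_title(articles, subject):
    """Return every article whose title contains the subject string."""
    return [a for a in articles if subject in a[1]]


def find_by_keywords(articles, keywords):
    """Return the article whose body contains the most distinct keywords.

    Returns None when no keyword occurs in any body. This count-based
    score is a hypothetical stand-in for the matching used in the paper.
    """
    best, best_score = None, 0
    for a in articles:
        score = sum(1 for k in keywords if k in a[2])
        if score > best_score:
            best, best_score = a, score
    return best


# Hypothetical sample text standing in for OCR output.
sample = """第1条（目的）
本契約は業務委託の条件を定める。
第2条（秘密保持）
甲及び乙は相手方の秘密情報を第三者に開示してはならない。
第3条（契約期間）
本契約の有効期間は一年とする。
"""

articles = split_articles(sample)
by_title = find_by_title(articles, "秘密保持")
by_keywords = find_by_keywords(articles, ["秘密情報", "第三者", "開示"])
```

Title matching is tried first because it is cheap and precise when a heading survives OCR intact. The keyword fallback covers documents where the heading is missing or misrecognized.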

Conference Information
Committee LOIS / ICM
Conference Date 2022/1/27(2days)
Place (in Japanese) (See Japanese page)
Place (in English) Online
Topics (in Japanese) (See Japanese page)
Topics (in English) Practical Use of Lifelog, Office Information System, Business Management, etc.
Chair Toru Kobayashi(Nagasaki Univ.) / Kazuhiko Kinoshita(Tokushima Univ.)
Vice Chair Hiroyuki Toda(NTT) / Haruo Ooishi(NTT) / Eiji Takahashi(NEC)
Secretary Hiroyuki Toda(Nagasaki Univ.) / Haruo Ooishi(NTT) / Eiji Takahashi(Bosco)
Assistant Kazuki Fukae(Nagasaki Univ.) / Yoshifumi Kato(NTT)

Paper Information
Registration To Technical Committee on Life Intelligence and Office Information Systems / Technical Committee on Information and Communication Management
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) An Objective Article Search Method from Printed Japanese Contract Document Using Optical Character Recognition
Sub Title (in English)
Keyword(1) contract document
Keyword(2) article
Keyword(3) subject
Keyword(4) OCR
Keyword(5) regular expression
1st Author's Name Shixi Chen
1st Author's Affiliation Okayama University(Okayama Univ.)
2nd Author's Name Masaki Sakagami
2nd Author's Affiliation Okayama University(Okayama Univ.)
3rd Author's Name Nobuo Funabiki
3rd Author's Affiliation Okayama University(Okayama Univ.)
4th Author's Name Takashi Toshida
4th Author's Affiliation Astrolab Inc.(Astrolab)
5th Author's Name Kohei Suga
5th Author's Affiliation Astrolab Inc.(Astrolab)
Date 2022-01-28
Paper # ICM2021-39,LOIS2021-37
Volume (vol) vol.121
Number (no) ICM-354,LOIS-355
Page pp.34-39(ICM), pp.34-39(LOIS)
#Pages 6
Date of Issue 2022-01-20 (ICM, LOIS)