Presentation 2015-12-03
Detection of Mathematical Formula Regions in Images of Scientific Papers by using Deep Learning and OCR
Shintaro Date, Hideki Isozaki
Abstract(in English) We are building a QA system for scientific literature. One can ask questions such as "What is the definition of C-value?", whose answer is a mathematical formula. Thus, mathematical formulas play an important role, much like named entities in open-domain question answering. Unlike math OCR, we are not interested in how a formula is constructed and simply treat formulas as images. In this paper, we present a method for detecting formula images based on deep learning and open-source OCR software. We first applied deep learning to detect mathematical formulas, but it had difficulty detecting in-line formulas. Therefore, we also used open-source OCR software. We report experimental results on papers from the ACL Anthology.
Keyword(in English) QA system / Deep Learning / OCR
Paper # NLC2015-37
Date of Issue 2015-11-26 (NLC)

Conference Information
Committee NLC / IPSJ-NL / SP / IPSJ-SLP
Conference Date 2015-12-02 (3 days)
Place (in English) Nagoya Inst. of Tech.
Topics (in English) The Second Natural Language Processing Symposium & The 17th Spoken Language Symposium
Chair Koichi Takeuchi(Okayama Univ.) / Kentaro Inui(Tohoku Univ.) / Kazunori Mano(Shibaura Inst. of Tech.) / Koichi Shinoda(Tokyo Inst. of Tech.)
Vice Chair Hiroshi Kanayama(IBM) / Makoto Ichise(NTT DoCoMo) / / Norihide Kitaoka(Tokushima Univ.)
Secretary Hiroshi Kanayama(Univ. of Tokyo/Hottolink) / Makoto Ichise(Ryukoku Univ.) / (Osaka Univ.) / Norihide Kitaoka(Tohoku Univ.) / (Mixi Co. Ltd.)
Assistant Kazutaka Shimada(Kyushu Inst. of Tech.) / Ryuichiro Higashinaka(NTT) / / Takashi Nose(Tohoku Univ.) / Taichi Asami(NTT)

Paper Information
Registration To Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language JPN
Title (in English) Detection of Mathematical Formula Regions in Images of Scientific Papers by using Deep Learning and OCR
Keyword(1) QA system
Keyword(2) Deep Learning
Keyword(3) OCR
1st Author's Name Shintaro Date
1st Author's Affiliation Okayama Prefectural University(Okayama Pref. Univ.)
2nd Author's Name Hideki Isozaki
2nd Author's Affiliation Okayama Prefectural University(Okayama Pref. Univ.)
Date 2015-12-03
Volume (vol) vol.115
Number (no) NLC-347
Page pp. 19-24 (NLC)
#Pages 6