Presentation | 2015-12-03 Detection of Mathematical Formula Regions in Images of Scientific Papers by using Deep Learning and OCR Shintaro Date, Hideki Isozaki |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | We are building a QA system for scientific literature. Users can ask questions such as "What is the definition of C-value?", whose answer is a mathematical formula. Mathematical formulas thus play an important role, much like named entities in open-domain question answering. Unlike Math OCR, we are not interested in how a formula is constructed and simply treat formulas as images. In this paper, we present a formula image detection method based on Deep Learning and open source OCR software. First, we tried Deep Learning to detect mathematical formulas, but it was difficult to detect in-line formulas. Therefore, we also used open source OCR software. We show experimental results based on ACL Anthology papers. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | QA system / Deep Learning / OCR |
Paper # | NLC2015-37 |
Date of Issue | 2015-11-26 (NLC) |
Conference Information | |
Committee | NLC / IPSJ-NL / SP / IPSJ-SLP |
---|---|
Conference Date | 2015/12/2 (3 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Nagoya Inst of Tech. |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | The Second Natural Language Processing Symposium & The 17th Spoken Language Symposium |
Chair | Koichi Takeuchi(Okayama Univ.) / Kentaro Inui(Tohoku Univ.) / Kazunori Mano(Shibaura Inst. of Tech.) / Koichi Shinoda(Tokyo Inst. of Tech.) |
Vice Chair | Hiroshi Kanayama(IBM) / Makoto Ichise(NTT DoCoMo) / / Norihide Kitaoka(Tokushima Univ.) |
Secretary | Hiroshi Kanayama(Univ. of Tokyo/Hottolink) / Makoto Ichise(Ryukoku Univ.) / (Osaka Univ.) / Norihide Kitaoka(Tohoku Univ.) / (Mixi Co. Ltd.) |
Assistant | Kazutaka Shimada(Kyushu Inst. of Tech.) / Ryuichiro Higashinaka(NTT) / / Takashi Nose(Tohoku Univ.) / Taichi Asami(NTT) |
Paper Information | |
Registration To | Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Detection of Mathematical Formula Regions in Images of Scientific Papers by using Deep Learning and OCR |
Sub Title (in English) | |
Keyword(1) | QA system |
Keyword(2) | Deep Learning |
Keyword(3) | OCR |
Keyword(4) | |
Keyword(5) | |
1st Author's Name | Shintaro Date |
1st Author's Affiliation | Okayama Prefectural University(Okayama Pref. Univ.) |
2nd Author's Name | Hideki Isozaki |
2nd Author's Affiliation | Okayama Prefectural University(Okayama Pref. Univ.) |
Date | 2015-12-03 |
Paper # | NLC2015-37 |
Volume (vol) | vol.115 |
Number (no) | NLC-347 |
Page | pp.19-24 (NLC) |
#Pages | 6 |
Date of Issue | 2015-11-26 (NLC) |