Presentation 2011-03-10
Image Processing for Historical Newspaper Archives
Takahiro SHIMA, Kengo TERASAWA, Toshio KAWASHIMA
Abstract(in English) We previously studied a fast full-text search method based on word spotting. That method requires the newspaper image to be segmented into character images in advance, and this segmentation is difficult. Optical character recognition (OCR) works only on noise-free documents printed with modern techniques; it cannot be applied to old, degraded document images. We propose an image processing method that improves character segmentation. To split a whole newspaper page into paragraph images, ruled lines are detected with the Hough transform. The resulting paragraph images still contain obstacles to character segmentation, such as ruled lines, ruby characters (small phonetic glosses printed beside kanji), and noise; our algorithms remove them. The proposed system was tested on 20 paragraph images from historical newspapers, and character segmentation accuracy improved to approximately 92%.
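The abstract says ruled lines are found with the Hough transform before the page is cut into paragraphs. The sketch below is an illustration of that general technique only, not the authors' implementation: it assumes a binarized page given as a list of 0/1 rows, 1-degree angle steps, integer rho bins, and hypothetical parameters `threshold` and `tolerance_deg`. Each ink pixel votes for every line rho = x*cos(theta) + y*sin(theta) through it, and only peaks near 0 degrees (vertical rules) or 90 degrees (horizontal rules) are kept. Because a long line also leaks votes into neighbouring angle bins, a practical version would add non-maximum suppression instead of relying on a high threshold alone.

```python
import math
from collections import Counter


def hough_peaks(image, threshold):
    """Vote in a (rho, theta) accumulator; return cells with >= threshold votes.

    image: list of rows of 0/1 ints, where 1 marks an ink pixel.
    theta is sampled in whole degrees over [0, 180); rho is rounded to an int.
    """
    thetas = range(180)
    trig = [(math.cos(math.radians(t)), math.sin(math.radians(t))) for t in thetas]
    acc = Counter()
    for y, row in enumerate(image):
        for x, ink in enumerate(row):
            if ink:
                # Each ink pixel votes once per sampled angle.
                for t, (c, s) in zip(thetas, trig):
                    acc[(round(x * c + y * s), t)] += 1
    peaks = [(rho, t, votes) for (rho, t), votes in acc.items() if votes >= threshold]
    return sorted(peaks, key=lambda p: -p[2])


def ruled_lines(image, threshold, tolerance_deg=2):
    """Keep only near-vertical (theta ~ 0 or ~180) and near-horizontal (theta ~ 90) peaks.

    threshold and tolerance_deg are hypothetical tuning parameters.
    """
    lines = []
    for rho, t, votes in hough_peaks(image, threshold):
        if t <= tolerance_deg or t >= 180 - tolerance_deg:
            lines.append(("vertical", rho, t, votes))    # line x = rho when t == 0
        elif abs(t - 90) <= tolerance_deg:
            lines.append(("horizontal", rho, t, votes))  # line y = rho when t == 90
    return lines


# Synthetic 100x100 "page" with one horizontal rule (y = 40) and one vertical rule (x = 70).
page = [[0] * 100 for _ in range(100)]
for x in range(100):
    page[40][x] = 1
for y in range(100):
    page[y][70] = 1

# Expected: one vertical rule at x = 70 and one horizontal rule at y = 40, 100 votes each.
found = ruled_lines(page, threshold=80)
print(found)
```

The threshold of 80 is chosen for this 100-pixel synthetic example: adjacent 1-degree bins collect only about 58 votes from each rule, so only the two true lines survive. On a real scanned page the threshold would have to scale with the expected rule length.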
Keyword(in English) Historical Document / Full Text Search / Character Segmentation / Optical Character Recognition / Digital Archive / Word Spotting
Paper # PRMU2010-237

Conference Information
Committee PRMU
Conference Date 2011/3/3 (1 day)

Paper Information
Registration To Pattern Recognition and Media Understanding (PRMU)
Language JPN
Title (in English) Image Processing for Historical Newspaper Archives
1st Author's Name Takahiro SHIMA
1st Author's Affiliation Graduate School of Systems Information Science, Future University Hakodate
2nd Author's Name Kengo TERASAWA
2nd Author's Affiliation Graduate School of Systems Information Science, Future University Hakodate
3rd Author's Name Toshio KAWASHIMA
3rd Author's Affiliation Graduate School of Systems Information Science, Future University Hakodate
Date 2011-03-10
Volume (vol) vol.110
Number (no) 467
#Pages 6