Presentation | 2011-03-10 Image Processing for Historical Newspaper Archives Takahiro SHIMA, Kengo TERASAWA, Toshio KAWASHIMA |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | We previously studied a fast full-text search method based on a word spotting technique. This method requires the newspaper image to be segmented into character images in advance, which is a difficult problem. Optical character recognition can be applied only when document images are free of noise and were printed with modern techniques; it cannot be applied to old, degraded document images. We propose an image processing method that improves character segmentation. To segment a whole newspaper image into paragraph images, ruled lines are detected using the Hough transform. The paragraph images contain elements that hinder character segmentation, such as ruled lines, ruby characters, and noise, and our algorithms remove them. The proposed system was tested on 20 paragraph images from historical newspapers. The accuracy of character segmentation improved to approximately 92%. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Historical Document / Full Text Search / Character Segmentation / Optical Character Recognition / Digital Archive / Word Spotting |
Paper # | PRMU2010-237 |
Date of Issue |
Conference Information | |
Committee | PRMU |
---|---|
Conference Date | 2011/3/3 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Pattern Recognition and Media Understanding (PRMU) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Image Processing for Historical Newspaper Archives |
Sub Title (in English) | |
Keyword(1) | Historical Document |
Keyword(2) | Full Text Search |
Keyword(3) | Character Segmentation |
Keyword(4) | Optical Character Recognition |
Keyword(5) | Digital Archive |
Keyword(6) | Word Spotting |
1st Author's Name | Takahiro SHIMA |
1st Author's Affiliation | Graduate School of Systems Information Science, Future University Hakodate |
2nd Author's Name | Kengo TERASAWA |
2nd Author's Affiliation | Graduate School of Systems Information Science, Future University Hakodate |
3rd Author's Name | Toshio KAWASHIMA |
3rd Author's Affiliation | Graduate School of Systems Information Science, Future University Hakodate |
Date | 2011-03-10 |
Paper # | PRMU2010-237 |
Volume (vol) | vol.110 |
Number (no) | 467 |
Page | pp.- |
#Pages | 6 |
Date of Issue |
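
The abstract states that ruled lines are detected with the Hough transform in order to split a newspaper page into paragraph images, but this record gives no implementation details. The following is only a minimal pure-Python sketch of the voting step of a standard Hough line transform, assuming the ink pixels of a binarized page are available as (x, y) pairs. The function name `hough_lines` and the parameters `n_theta` and `min_votes` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def hough_lines(pixels, n_theta=180, min_votes=10):
    """Vote for straight lines through a set of foreground pixels.

    pixels: iterable of (x, y) integer coordinates of ink pixels (assumed input).
    Lines use the normal form rho = x*cos(theta) + y*sin(theta), with theta
    sampled at n_theta evenly spaced angles in [0, 180) degrees.
    Returns (rho, theta_degrees, votes) tuples, strongest line first.
    """
    # Precompute cos/sin for every sampled angle.
    trig = [
        (math.cos(math.pi * k / n_theta), math.sin(math.pi * k / n_theta))
        for k in range(n_theta)
    ]
    votes = Counter()
    for x, y in pixels:
        for k, (c, s) in enumerate(trig):
            rho = round(x * c + y * s)  # quantize rho to 1-pixel bins
            votes[(rho, k)] += 1
    lines = [
        (rho, 180.0 * k / n_theta, v)
        for (rho, k), v in votes.items()
        if v >= min_votes
    ]
    lines.sort(key=lambda line: -line[2])
    return lines


# Toy page: one vertical rule at x = 5 and one horizontal rule at y = 12.
ink = {(5, y) for y in range(30)} | {(x, 12) for x in range(40)}
lines = hough_lines(ink, min_votes=25)
```

In this toy example the horizontal rule appears as the bin with rho = 12 at 90 degrees and the vertical rule as the bin with rho = 5 at 0 degrees. Neighbouring angle bins also collect many votes, so a practical detector would add peak suppression before cutting the page along the detected rules.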