Presentation 2006-07-14
A Proposal of Information Extraction System with Data Cleaning Facility
Yoshiharu ISHIKAWA, Sayumi KUROKAWA, Jianwei ZHANG, Hiroyuki KITAGAWA
Abstract (in Japanese) (See Japanese page)
Abstract (in English) Information extraction, which acquires useful information from large text sources such as the Web, is an important research topic in data engineering. For extraction results to be useful, the errors and noise they contain must be reduced. In this paper, we propose an approach to a high-accuracy information extraction system that integrates data cleaning into information extraction and uses interactive feedback from users. The approach is based on the bootstrap record extraction method and performs data cleaning as part of the record extraction process. User feedback is reflected in the evaluation of both the extracted records and the extraction patterns.
Keyword (in Japanese) (See Japanese page)
Keyword (in English) information extraction / record extraction / data cleaning / bootstrapping
Paper # DE2006-102
Date of Issue

Conference Information
Committee DE
Conference Date 2006/7/7 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Data Engineering (DE)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) A Proposal of Information Extraction System with Data Cleaning Facility
Sub Title (in English)
Keyword(1) information extraction
Keyword(2) record extraction
Keyword(3) data cleaning
Keyword(4) bootstrapping
1st Author's Name Yoshiharu ISHIKAWA
1st Author's Affiliation Information Technology Center, Nagoya University
2nd Author's Name Sayumi KUROKAWA
2nd Author's Affiliation Department of Computer Science, Graduate School of Systems and Information Engineering
3rd Author's Name Jianwei ZHANG
3rd Author's Affiliation Department of Computer Science, Graduate School of Systems and Information Engineering
4th Author's Name Hiroyuki KITAGAWA
4th Author's Affiliation Department of Computer Science, Graduate School of Systems and Information Engineering / Center for Computational Sciences, University of Tsukuba
Date 2006-07-14
Paper # DE2006-102
Volume (vol) vol.106
Number (no) 150
Page pp.-
#Pages 6