Presentation 2001/10/4
Efficient Substructure Discovery from Large Semi-structured Data
Tatsuya ASAI, Kenji ABE, Shinji KAWASOE, Hiroki ARIMURA, Setsuo ARIKAWA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In this paper, we consider a data mining problem for semi-structured data. We present an efficient algorithm for discovering frequent substructures in a given large collection of semi-structured data by modeling the data as labeled ordered trees. The algorithm generalizes the itemset enumeration technique of Bayardo (SIGMOD'98), called the set-enumeration tree, to ordered tree enumeration. Experiments on HTML documents show that the algorithm is efficient and scalable on real-world data.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) semi-structured data / data mining / web mining / HTML / XML / substructure patterns
Paper # DE2001-105
Date of Issue

Conference Information
Committee DE
Conference Date 2001/10/4 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Data Engineering (DE)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Efficient Substructure Discovery from Large Semi-structured Data
Sub Title (in English)
Keyword(1) semi-structured data
Keyword(2) data mining
Keyword(3) web mining
Keyword(4) HTML
Keyword(5) XML
Keyword(6) substructure patterns
1st Author's Name Tatsuya ASAI
1st Author's Affiliation Graduate School of Information Science and Electrical Engineering, Kyushu University
2nd Author's Name Kenji ABE
2nd Author's Affiliation Graduate School of Information Science and Electrical Engineering, Kyushu University
3rd Author's Name Shinji KAWASOE
3rd Author's Affiliation Graduate School of Information Science and Electrical Engineering, Kyushu University
4th Author's Name Hiroki ARIMURA
4th Author's Affiliation Graduate School of Information Science and Electrical Engineering, Kyushu University
5th Author's Name Setsuo ARIKAWA
5th Author's Affiliation Graduate School of Information Science and Electrical Engineering, Kyushu University
Date 2001/10/4
Paper # DE2001-105
Volume (vol) vol.101
Number (no) 342
Page pp.-
#Pages 8
Date of Issue