Presentation 2003/3/6
Extraction of Tag Tree Patterns with Contractible Variables from Semistructured Data
Tetsuhiro MIYAHARA, Yusuke SUZUKI, Takayoshi SHOUDAI, Tomoyuki UCHIDA, Kenichi TAKAHASHI, Hiroaki UEDA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Information extraction from semistructured data is becoming increasingly important. To extract meaningful or interesting content from semistructured data, we need to extract the structured patterns that the data have in common. A tag tree pattern is an edge-labeled tree with ordered children whose structure consists of tags and structured variables. An edge label is a tag, a keyword, or a wildcard, and a variable can be replaced by an arbitrary tree. In particular, a contractible variable matches any subtree, including a singleton vertex. A tag tree pattern is therefore well suited to representing common tree-structured patterns in irregular semistructured data. We present a new method for extracting characteristic tag tree patterns from irregular semistructured data, based on an algorithm that finds a least generalized tag tree pattern explaining the given data. We report experiments in which this method is applied to extracting characteristic tag tree patterns from HTML/XML files.
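To make the notions in the abstract concrete, the following is a minimal Python sketch of matching a tag tree pattern against an edge-labeled ordered tree. It is an illustration under simplifying assumptions, not the authors' algorithm: variables are modeled only as leaf-position placeholders in a child sequence, an ordinary variable is assumed to absorb one or more consecutive child edges, and a contractible variable is assumed to absorb zero or more (the "singleton vertex" case). The names `Edge`, `PEdge`, `Var`, and `matches`, and the use of `"?"` for the wildcard, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Edge:
    """Data tree edge: a tag or keyword label plus the ordered child edges below it."""
    label: str
    children: List["Edge"] = field(default_factory=list)


@dataclass
class Var:
    """Variable placeholder (assumed to occur only at leaf positions in this sketch)."""
    contractible: bool = False


@dataclass
class PEdge:
    """Pattern edge: a tag, a keyword, or the wildcard "?" (hypothetical notation)."""
    label: str
    children: List[Union["PEdge", Var]] = field(default_factory=list)


def matches(pattern: List[Union[PEdge, Var]], data: List[Edge]) -> bool:
    """Check whether a pattern child sequence matches a data child sequence in order."""
    if not pattern:
        return not data
    head, rest = pattern[0], pattern[1:]
    if isinstance(head, Var):
        # Assumption: a contractible variable may absorb zero child edges,
        # an ordinary variable must absorb at least one.
        start = 0 if head.contractible else 1
        return any(matches(rest, data[k:]) for k in range(start, len(data) + 1))
    if not data:
        return False
    first = data[0]
    return (
        head.label in ("?", first.label)
        and matches(head.children, first.children)
        and matches(rest, data[1:])
    )


# Two small HTML-like trees: one with a heading before the paragraph, one without.
doc_with_heading = [Edge("html", [Edge("body", [Edge("h1"), Edge("p")])])]
doc_without_heading = [Edge("html", [Edge("body", [Edge("p")])])]

# Patterns that differ only in whether the variable is contractible.
contractible_pattern = [PEdge("html", [PEdge("body", [Var(contractible=True), PEdge("p")])])]
ordinary_pattern = [PEdge("html", [PEdge("body", [Var(contractible=False), PEdge("p")])])]
```

Under these assumptions, `contractible_pattern` matches both documents, while `ordinary_pattern` matches only `doc_with_heading`, which is the sense in which contractible variables help with irregular data.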
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Information Extraction / Web based mining / semistructured data / HTML / XML file / tag tree pattern
Paper # AI2002-63
Date of Issue

Conference Information
Committee AI
Conference Date 2003/3/6 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Artificial Intelligence and Knowledge-Based Processing (AI)
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Extraction of Tag Tree Patterns with Contractible Variables from Semistructured Data
Sub Title (in English)
Keyword(1) Information Extraction
Keyword(2) Web based mining
Keyword(3) semistructured data
Keyword(4) HTML
Keyword(5) XML file
Keyword(6) tag tree pattern
1st Author's Name Tetsuhiro MIYAHARA
1st Author's Affiliation Faculty of Information Sciences, Hiroshima City University
2nd Author's Name Yusuke SUZUKI
2nd Author's Affiliation Department of Informatics, Kyushu University
3rd Author's Name Takayoshi SHOUDAI
3rd Author's Affiliation Department of Informatics, Kyushu University
4th Author's Name Tomoyuki UCHIDA
4th Author's Affiliation Faculty of Information Sciences, Hiroshima City University
5th Author's Name Kenichi TAKAHASHI
5th Author's Affiliation Faculty of Information Sciences, Hiroshima City University
6th Author's Name Hiroaki UEDA
6th Author's Affiliation Faculty of Information Sciences, Hiroshima City University
Date 2003/3/6
Paper # AI2002-63
Volume (vol) vol.102
Number (no) 709
Page pp.-
#Pages 5
Date of Issue