Presentation | 2003/3/6 Extraction of Tag Tree Patterns with Contractible Variables from Semistructured Data Tetsuhiro MIYAHARA, Yusuke SUZUKI, Takayoshi SHOUDAI, Tomoyuki UCHIDA, Kenichi TAKAHASHI, Hiroaki UEDA |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Information extraction from semistructured data is becoming increasingly important. To extract meaningful or interesting content from semistructured data, we need to extract common structured patterns from it. A tag tree pattern is an edge-labeled tree with ordered children that has tree structures of tags and structured variables. An edge label is a tag, a keyword, or a wildcard, and a variable can be substituted by an arbitrary tree. In particular, a contractible variable matches any subtree, including a singleton vertex. A tag tree pattern is therefore suited to representing common tree-structured patterns in irregular semistructured data. We present a new method for extracting characteristic tag tree patterns from irregular semistructured data by using an algorithm that finds a least generalized tag tree pattern explaining the given data. We report experiments applying this method to the extraction of characteristic tag tree patterns from HTML/XML files. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Information Extraction / Web based mining / semistructured data / HTML / XML file / tag tree pattern |
Paper # | AI2002-63 |
Date of Issue | |
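The abstract's ingredients (edge labels that are tags, keywords, or wildcards, and variables that may be contractible) can be illustrated with a minimal Python sketch. It assumes a simplified, hypothetical model that is not the authors' formal definition. A document is an ordered tree whose edges carry labels. A pattern edge matches one data edge whose label equals a tag, contains a keyword, or is arbitrary (wildcard). A variable stands for a run of consecutive sibling subtrees, which must be non-empty unless the variable is contractible, mirroring the idea that a contractible variable may collapse to a singleton vertex. All names (`Node`, `Edge`, `PEdge`, `PVar`, `PNode`, `match_node`) are illustrative, and the sketch covers only matching, not the paper's least-generalization algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Data tree: ordered children, each reached by an edge carrying a label
# (e.g. an HTML/XML tag name or a text string). Hypothetical encoding.
@dataclass
class Node:
    children: List["Edge"] = field(default_factory=list)

@dataclass
class Edge:
    label: str
    child: Node

# Pattern items in a simplified model, not the paper's formal definition.
@dataclass
class PEdge:
    kind: str        # "tag", "keyword", or "wildcard"
    value: str       # tag name or keyword; ignored for a wildcard
    child: "PNode"

@dataclass
class PVar:
    contractible: bool  # True: the variable may match an empty run

@dataclass
class PNode:
    items: List[Union[PEdge, PVar]] = field(default_factory=list)


def label_ok(p: PEdge, label: str) -> bool:
    """Check a single data edge label against a tag, keyword, or wildcard."""
    if p.kind == "wildcard":
        return True
    if p.kind == "tag":
        return label == p.value
    if p.kind == "keyword":
        return p.value in label
    raise ValueError(f"unknown edge kind: {p.kind}")


def match_node(p: PNode, t: Node) -> bool:
    """True if the pattern vertex p matches the data vertex t."""
    return match_seq(p.items, 0, t.children, 0)


def match_seq(items, i, edges, j) -> bool:
    """Match pattern items[i:] against the ordered data edges[j:] (backtracking)."""
    if i == len(items):
        return j == len(edges)
    item = items[i]
    if isinstance(item, PVar):
        # A variable absorbs edges[j:k]; a non-contractible one needs k > j.
        start = 0 if item.contractible else 1
        for k in range(j + start, len(edges) + 1):
            if match_seq(items, i + 1, edges, k):
                return True
        return False
    # A labeled pattern edge consumes exactly one data edge.
    if (j < len(edges) and label_ok(item, edges[j].label)
            and match_node(item.child, edges[j].child)):
        return match_seq(items, i + 1, edges, j + 1)
    return False


def pattern(contractible: bool) -> PNode:
    """ul -> [li containing 'Price', li with any label, variable]."""
    return PNode([PEdge("tag", "ul", PNode([
        PEdge("tag", "li", PNode([PEdge("keyword", "Price", PNode())])),
        PEdge("tag", "li", PNode([PEdge("wildcard", "", PNode())])),
        PVar(contractible),
    ]))])


# <ul><li>Price: 100</li><li>In stock</li></ul> as an edge-labeled tree.
doc = Node([Edge("ul", Node([
    Edge("li", Node([Edge("Price: 100", Node())])),
    Edge("li", Node([Edge("In stock", Node())])),
]))])

print(match_node(pattern(True), doc))   # True: contractible variable matches nothing
print(match_node(pattern(False), doc))  # False: no sibling left for the variable
```

The only difference between the two patterns is whether the trailing variable is contractible; because both `li` siblings are already consumed, only the contractible variable can match. The backtracking search is exponential in the worst case and is meant for illustration only.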
Conference Information | |
Committee | AI |
---|---|
Conference Date | 2003/3/6 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant | |
Paper Information | |
Registration To | Artificial Intelligence and Knowledge-Based Processing (AI) |
---|---|
Language | ENG |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Extraction of Tag Tree Patterns with Contractible Variables from Semistructured Data |
Sub Title (in English) | |
Keyword(1) | Information Extraction |
Keyword(2) | Web based mining |
Keyword(3) | semistructured data |
Keyword(4) | HTML |
Keyword(5) | XML file |
Keyword(6) | tag tree pattern |
1st Author's Name | Tetsuhiro MIYAHARA |
1st Author's Affiliation | Faculty of Information Sciences, Hiroshima City University |
2nd Author's Name | Yusuke SUZUKI |
2nd Author's Affiliation | Department of Informatics, Kyushu University |
3rd Author's Name | Takayoshi SHOUDAI |
3rd Author's Affiliation | Department of Informatics, Kyushu University |
4th Author's Name | Tomoyuki UCHIDA |
4th Author's Affiliation | Faculty of Information Sciences, Hiroshima City University |
5th Author's Name | Kenichi TAKAHASHI |
5th Author's Affiliation | Faculty of Information Sciences, Hiroshima City University |
6th Author's Name | Hiroaki UEDA |
6th Author's Affiliation | Faculty of Information Sciences, Hiroshima City University |
Date | 2003/3/6 |
Paper # | AI2002-63 |
Volume (vol) | vol.102 |
Number (no) | 709 |
Page | |
#Pages | 5 |