Presentation 2005/7/15
Fast Search Algorithm for Sequences of Hierarchically Structured Data
Masaki MURATA, Masao UTIYAMA, Toshiyuki KANAMARU, Hitoshi ISAHARA,
PDF Download Page PDF download Page Link
Abstract(in Japanese) (See Japanese page)
Abstract(in English) We developed an algorithm for quickly searching sequences of hierarchically structured data, such as the Kyoto Text Corpus, where each word includes information on its part of speech (POS), minor POS and word itself. Using our method, we first make a data item where each data item in a lower level is surrounded by two data items in a higher level. We then connect these data items to make a long string and store the string in a database. We use suffix arrays to retrieve queries on the database. Our experiments showed that our method was 450 times faster than a conventional method at fastest and 64 times faster on average. Our method can be used for other kinds of hierarchically structured data, such as Web applications. Methods that can be used on such data are in high demand. For example, our method can be used to retrieve Web text, that includes hierarchical information of low, middle, and high semantic levels. If we use our method for such Web text, we can query using the terms, "Middle semantic level : technique", "Word : of", and "Low semantic level : administrative organ"; in other words, our retrieval method is more useful and convenient than conventional web retrieval.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Hierarchical Structure / Fast Search / Sequence of Data / Suffix Array / Corpus / Web
Paper # NLC2005-3
Date of Issue

Conference Information
Committee NLC
Conference Date 2005/7/15(1days)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Fast Search Algorithm for Sequences of Hierarchically Structured Data
Sub Title (in English)
Keyword(1) Hierarchical Structure
Keyword(2) Fast Search
Keyword(3) Sequence of Data
Keyword(4) Suffix Array
Keyword(5) Corpus
Keyword(6) Web
1st Author's Name Masaki MURATA
1st Author's Affiliation National Institute of Information and Communications Technology()
2nd Author's Name Masao UTIYAMA
2nd Author's Affiliation National Institute of Information and Communications Technology
3rd Author's Name Toshiyuki KANAMARU
3rd Author's Affiliation Graduate School of Human and Environmental Studies, Kyoto University:National Institute of Information and Communications Technology
4th Author's Name Hitoshi ISAHARA
4th Author's Affiliation National Institute of Information and Communications Technology
Date 2005/7/15
Paper # NLC2005-3
Volume (vol) vol.105
Number (no) 203
Page pp.pp.-
#Pages 8
Date of Issue