Presentation | 2005/7/15 Fast Search Algorithm for Sequences of Hierarchically Structured Data Masaki MURATA, Masao UTIYAMA, Toshiyuki KANAMARU, Hitoshi ISAHARA, |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | We developed an algorithm for quickly searching sequences of hierarchically structured data, such as the Kyoto Text Corpus, where each word includes information on its part of speech (POS), minor POS and word itself. Using our method, we first make a data item where each data item in a lower level is surrounded by two data items in a higher level. We then connect these data items to make a long string and store the string in a database. We use suffix arrays to retrieve queries on the database. Our experiments showed that our method was 450 times faster than a conventional method at fastest and 64 times faster on average. Our method can be used for other kinds of hierarchically structured data, such as Web applications. Methods that can be used on such data are in high demand. For example, our method can be used to retrieve Web text, that includes hierarchical information of low, middle, and high semantic levels. If we use our method for such Web text, we can query using the terms, "Middle semantic level : technique", "Word : of", and "Low semantic level : administrative organ"; in other words, our retrieval method is more useful and convenient than conventional web retrieval. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Hierarchical Structure / Fast Search / Sequence of Data / Suffix Array / Corpus / Web |
Paper # | NLC2005-3 |
Date of Issue |
Conference Information | |
Committee | NLC |
---|---|
Conference Date | 2005/7/15(1days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Natural Language Understanding and Models of Communication (NLC) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Fast Search Algorithm for Sequences of Hierarchically Structured Data |
Sub Title (in English) | |
Keyword(1) | Hierarchical Structure |
Keyword(2) | Fast Search |
Keyword(3) | Sequence of Data |
Keyword(4) | Suffix Array |
Keyword(5) | Corpus |
Keyword(6) | Web |
1st Author's Name | Masaki MURATA |
1st Author's Affiliation | National Institute of Information and Communications Technology() |
2nd Author's Name | Masao UTIYAMA |
2nd Author's Affiliation | National Institute of Information and Communications Technology |
3rd Author's Name | Toshiyuki KANAMARU |
3rd Author's Affiliation | Graduate School of Human and Environmental Studies, Kyoto University:National Institute of Information and Communications Technology |
4th Author's Name | Hitoshi ISAHARA |
4th Author's Affiliation | National Institute of Information and Communications Technology |
Date | 2005/7/15 |
Paper # | NLC2005-3 |
Volume (vol) | vol.105 |
Number (no) | 203 |
Page | pp.pp.- |
#Pages | 8 |
Date of Issue |