Presentation 2010-06-14
All Pairs Similarity Search by Multiple Sorting
Koji TSUDA,
PDF Download Page PDF download Page Link
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Recently it is increasingly common that images and signals are mapped to bit strings called sketches. To build a similarity network necessary for, e.g., semi-supervised learning, it is necessary to find all close pairs in terms of Hamming distance. However, it is too slow to compute all distances and pruning by metricity does not help much. In this presentation, we present the multiple sorting method which combines pattern growth and radix sort. It is shown that it is much faster than cover tree and Lanczos bisection in a large scale image dataset. In this method, the average false negative rate is computable and duplicated discoveries are deliberately avoided. It is possible to generalize the method to edit distance. In an experiment with tens of millions of biological sequences, our method was 20 to 1000 fold faster than suffix-array based methods.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Multiple Sorting Method / All Pairs Similarity Search
Paper # IBISML2010-2
Date of Issue

Conference Information
Committee IBISML
Conference Date 2010/6/7(1days)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Information-Based Induction Sciences and Machine Learning (IBISML)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) All Pairs Similarity Search by Multiple Sorting
Sub Title (in English)
Keyword(1) Multiple Sorting Method
Keyword(2) All Pairs Similarity Search
1st Author's Name Koji TSUDA
1st Author's Affiliation Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology()
Date 2010-06-14
Paper # IBISML2010-2
Volume (vol) vol.110
Number (no) 76
Page pp.pp.-
#Pages 1
Date of Issue