Conference name
Forum on Information Technology (FIT) 2009
Conference code
F
Year held
2009
Publication date
2009/08/20
Session number
5G
Session name
Data Mining
Presentation date
2009/09/03
Presentation venue (room, etc.)
Hall G (Building 9, 2F, Room 922)
Presentation number
F-046
Title
Biological Data Analysis based on Kolmogorov Complexity
Authors
伊藤 公人, Zeugmann Thomas, 朱 ユ
Keywords
data mining, Kolmogorov Complexity, NCD, biological database
Abstract
In this paper, we focus on a simple data mining tool, the normalized compression distance (NCD), proposed by Cilibrasi and Vitányi. The NCD is an information distance between two terms and can be derived from the "similarity metric", which is defined in the context of Kolmogorov complexity. We applied the tool to real biological data, namely DNA sequences of influenza viruses, and used the resulting NCD matrix to cluster DNA sequences of different subtypes. The results show that the NCD is helpful for extracting hidden information from biological databases.
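To make the distance named in the abstract concrete, here is a minimal sketch of the NCD as it is usually defined, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the length of the compressed input. Using zlib as the compressor is an assumption made for illustration (the abstract does not say which compressor the authors used), the function names are hypothetical, and the sequences are synthetic rather than real influenza data.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed input; stands in for C(x)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(seqs):
    """Pairwise NCD matrix, usable as input to a clustering routine."""
    n = len(seqs)
    return [[ncd(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]


def random_dna(rng: random.Random, length: int) -> bytes:
    """Synthetic DNA-like sequence (illustrative data only)."""
    return bytes(rng.choice(b"ACGT") for _ in range(length))


def mutate(rng: random.Random, seq: bytes, n_sites: int) -> bytes:
    """Copy of seq with n_sites positions overwritten by random bases."""
    out = bytearray(seq)
    for pos in rng.sample(range(len(out)), n_sites):
        out[pos] = rng.choice(b"ACGT")
    return bytes(out)


if __name__ == "__main__":
    rng = random.Random(0)
    base = random_dna(rng, 2000)
    close = mutate(rng, base, 40)     # near-copy of base
    unrelated = random_dna(rng, 2000)  # independent sequence
    for row in ncd_matrix([base, close, unrelated]):
        print(["%.3f" % d for d in row])
```

Related sequences should yield smaller values than unrelated ones, which is what lets the matrix drive a clustering step. A general-purpose compressor such as zlib has a limited window, so for long genomes a different compressor may be a better choice.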