Application of F-value to classification problems with numerous numbers of explanatory variables

Tomomasa Nagashima; Yoshifumi Okada; Masahiro Sawai

International Symposium on Nonlinear Theory and its Applications

2009

Session Number: C2L-B

Number: C2L-B2

Publication Date: 2009/10/18

Online ISSN: 2188-5079

DOI: 10.34385/proc.43.C2L-B2

Summary:

The F-value is a statistic that estimates how significantly each variable contributes to discriminant performance, and it has been used in statistical discriminant analysis. However, there appear to be few investigations of real problems that must cope with very large numbers of explanatory variables, on the order of ten thousand. In such cases, it becomes important to extract the variables that are useful for classification from this large pool, because we do not know in advance which variables make a significant contribution. In this paper, we show how the F-value can be used to extract the variables that are important for classifying samples into different classes. The extracted variables are then tested by predicting the classes of unknown samples in DNA microarray (disease) datasets, in which very large numbers of genes serve as explanatory variables. We show that our method achieved 100% accuracy in predicting the classes of unknown samples in these datasets using only a small number of genes.
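The summary does not give the exact definition of the F-value used by the authors, so the following is only a minimal sketch under a stated assumption: each variable (gene) is scored independently with the standard one-way ANOVA F ratio, F = [SS_between / (K - 1)] / [SS_within / (N - K)], where N is the number of samples and K the number of classes, and the highest-scoring variables are kept. The authors' F-value, which comes from discriminant analysis, may be defined differently (for example, as a partial F computed during stepwise selection). The function names `f_values` and `top_k_features` are hypothetical, and the classifier that would then predict the classes of unknown samples is not shown.

```python
from collections import defaultdict


def f_values(X, y):
    """Assumed per-variable one-way ANOVA F ratio (not necessarily the paper's definition).

    X: list of samples, each a list of variable values of equal length.
    y: list of class labels, one per sample.
    Returns one F value per variable (column of X).
    """
    n = len(X)
    p = len(X[0])
    # Group the samples by class label.
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    k = len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 classes and more samples than classes")

    scores = []
    for j in range(p):
        grand_mean = sum(row[j] for row in X) / n
        ss_between = 0.0  # spread of class means around the grand mean
        ss_within = 0.0   # spread of samples around their own class mean
        for rows in groups.values():
            vals = [row[j] for row in rows]
            mean = sum(vals) / len(vals)
            ss_between += len(vals) * (mean - grand_mean) ** 2
            ss_within += sum((v - mean) ** 2 for v in vals)
        if ss_within == 0.0:
            # Perfect separation (or a constant variable): avoid division by zero.
            scores.append(float("inf") if ss_between > 0.0 else 0.0)
        else:
            scores.append((ss_between / (k - 1)) / (ss_within / (n - k)))
    return scores


def top_k_features(X, y, k):
    """Indices of the k variables with the largest F values, in descending order."""
    scores = f_values(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```

As a toy illustration with four samples and three variables, `X = [[1.0, 5.0, 0.0], [1.2, 4.0, 3.0], [3.0, 5.5, 1.0], [3.1, 4.5, 2.0]]` and `y = ["A", "A", "B", "B"]`, the first variable separates the two classes sharply (F ≈ 304.2), the third not at all (F = 0), and `top_k_features(X, y, 2)` returns `[0, 1]`. Scoring each variable independently keeps the cost linear in the number of variables, which matters when there are tens of thousands of genes, but it ignores correlations between genes.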