Gene expression profiling measures the activity of thousands of genes and can help identify cancer biomarkers. However, the large number of genes in each profile imposes a high computational burden on classifiers. To deal with this high-dimensional feature space, we introduce a three-phase feature selection framework, ANOVA-SRC-BPSO. In the first phase, ANOVA-SRC-BPSO identifies the genes most strongly associated with the class labels using analysis of variance (ANOVA) and the F-test. In the second phase, we employ Spearman rank-order correlation (SRC) to eliminate redundant genes. Finally, we use binary particle swarm optimization (BPSO) with a support vector machine (SVM) classifier to select an optimized feature subset. We report the accuracy of ANOVA-SRC-BPSO with the SVM classifier on seven gene expression datasets. Comparisons with fourteen state-of-the-art methods show that ANOVA-SRC-BPSO yields the highest accuracy on five datasets. Moreover, we show that the performance of feature selection approaches is inconsistent across gene expression datasets.
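To make the three phases concrete, the following is a minimal pure-Python sketch of the pipeline shape, not the authors' implementation. The function names, the `top_k` and `max_corr` thresholds, the PSO coefficients (inertia 0.7, acceleration 1.5, velocity clamp 4.0), and the rule of keeping the higher-F gene of a correlated pair are all assumptions made for illustration. The fitness function is a toy stand-in; in the framework described above it would be the accuracy of an SVM trained on the masked gene subset.

```python
import math
import random

def f_statistic(values, labels):
    """One-way ANOVA F-statistic of a single gene across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return math.inf if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(a, b):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    sa = math.sqrt(sum((p - ma) ** 2 for p in ra))
    sb = math.sqrt(sum((q - mb) ** 2 for q in rb))
    return cov / (sa * sb) if sa and sb else 0.0

def filter_genes(X, y, top_k, max_corr):
    """Phases 1-2: keep the top_k genes by F, then drop genes redundant with a kept one."""
    cols = [[row[j] for row in X] for j in range(len(X[0]))]
    ranked = sorted(range(len(cols)), key=lambda j: f_statistic(cols[j], y),
                    reverse=True)[:top_k]
    kept = []
    for j in ranked:
        if all(abs(spearman(cols[j], cols[i])) < max_corr for i in kept):
            kept.append(j)
    return kept

def bpso(n_features, fitness, n_particles=10, n_iter=20, seed=0):
    """Phase 3: binary PSO with a sigmoid transfer function; maximizes fitness(mask)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                v = (0.7 * vel[i][d]
                     + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                     + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, v))
                # Sigmoid of the velocity gives the probability that the bit is 1.
                pos[i][d] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

# Toy data (made up): 6 samples x 4 genes, two classes.
y = [0, 0, 0, 1, 1, 1]
X = [[1.0, 2.0, 5.0, 1.0], [1.2, 5.0, 1.0, 2.0], [0.9, 1.0, 3.0, 3.0],
     [3.0, 7.0, 4.0, 4.5], [3.1, 9.0, 2.0, 3.5], [2.9, 6.0, 3.5, 2.5]]
kept = filter_genes(X, y, top_k=3, max_corr=0.9)  # gene 1 is redundant with gene 0

def toy_fitness(mask):
    """Hypothetical stand-in for SVM accuracy: rewards bit 0, penalizes subset size."""
    return mask[0] - 0.1 * sum(mask)

best_mask, best_score = bpso(len(kept), toy_fitness)
```

On this toy data, gene 2 is cut by the F-ranking, and gene 1 is dropped because its rank order is identical to that of gene 0, leaving genes 0 and 3 for the wrapper stage. Running the two cheap filters before BPSO keeps the search space small, which matters because each fitness evaluation in the real setting would require training a classifier.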
Article ID: 2021L11
Publisher: Canadian Artificial Intelligence Association