Comparing Feature Selection Methods on Metagenomic Data using Random Forest Classifier
DOI:
https://doi.org/10.14738/tecs.121.16525
Keywords:
Feature Selection, Metagenomics, Random Forest, Ensemble feature selection
Abstract
Feature selection (FS) is a data preprocessing strategy that efficiently prepares input data in fields such as metagenomics, where datasets tend to be very high-dimensional. Its objectives include producing lower-dimensional, cleaner input data and building simpler, more coherent machine learning models. One promising application of machine learning is precision medicine, where disease risk is predicted from patient genetic data that must first be preprocessed with feature selection. In this article we give a general overview of feature selection methods and their applicability to disease risk prediction. From these, we selected six FS methods and compared them on two freely available metagenomic datasets, using the same machine learning algorithm (Random Forest) throughout for comparability. Based on the results of the individual FS methods, we then created ensemble feature sets in multiple ways to improve the accuracy of Random Forest predictions.
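As an illustration of how ensemble feature sets might be built from the outputs of individual FS methods, the following is a minimal sketch. The abstract does not state which combination rules were used, so the union, intersection, and majority-vote rules here are assumptions, and the method names and feature names are hypothetical placeholders.

```python
from collections import Counter


def ensemble_feature_sets(selections, min_votes=None):
    """Combine per-method feature selections into candidate ensemble sets.

    selections: dict mapping a method name to the features it selected.
    min_votes:  votes needed for the "majority" set; defaults to a strict
                majority of the methods (an assumed rule, not the paper's).
    """
    sets = [set(features) for features in selections.values()]
    if not sets:
        raise ValueError("at least one feature selection is required")
    if min_votes is None:
        min_votes = len(sets) // 2 + 1

    # Count how many methods selected each feature.
    votes = Counter(feature for s in sets for feature in s)

    return {
        "union": sorted(set.union(*sets)),
        "intersection": sorted(set.intersection(*sets)),
        "majority": sorted(f for f, n in votes.items() if n >= min_votes),
    }


# Hypothetical selections from three FS methods (placeholder names).
selections = {
    "method_a": ["taxon_1", "taxon_2", "taxon_3"],
    "method_b": ["taxon_2", "taxon_3", "taxon_4"],
    "method_c": ["taxon_3", "taxon_5"],
}
result = ensemble_feature_sets(selections)
```

Each resulting feature set could then be used to train the same classifier, so that differences in accuracy reflect the feature set and not the model.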
License
Copyright (c) 2024 Zoltán Pödör, Máté Hekfusz
This work is licensed under a Creative Commons Attribution 4.0 International License.