A Combined Approach for Enhancing the Stability of the Variable Selection Stage in Binary Classification Tasks
Cateni S.; Colla V.; Vannucci M.
2021-01-01
Abstract
Variable selection is an essential tool for gaining knowledge of a problem or phenomenon by identifying the factors that have the greatest influence on it. It is also fundamental to machine learning-based approaches to modelling and classification, since it improves performance and reduces computational cost. Furthermore, in many real-world applications, such as those in the medical field, a large number of variables are jointly observed, but the number of available observations is quite limited. In these cases, variable selection is clearly essential, but standard variable selection approaches become “unstable”: high correlation among variables, or their similar relevance to the considered target, yields multiple solutions with similar performance. In machine learning-based classification, the stability of variable selection, namely its robustness with respect to variations in the classifier training dataset, is as important as the performance of the classifier itself. The paper presents an automatic procedure for variable selection in classification tasks, which ensures excellent stability of the selection and does not require any a priori information on the available data.