A Combined Approach for Enhancing the Stability of the Variable Selection Stage in Binary Classification Tasks

Cateni S.; Colla V.; Vannucci M.
2021-01-01

Abstract

Variable selection is an essential tool for gaining knowledge about a problem or phenomenon, as it identifies the factors that have the greatest influence on it. It is also fundamental to the implementation of machine learning-based approaches to modelling and classification tasks, since it improves performance and reduces computational cost. Furthermore, in many real-world applications, such as those in the medical field, a large number of variables are jointly observed, but the number of available observations is quite limited. In these cases, variable selection is clearly essential, but standard variable selection approaches become “unstable”, because the high correlation among different variables, or their similar relevance with respect to the considered target, gives rise to multiple solutions with similar performance. In machine learning-based classification, the stability of variable selection, namely its robustness with respect to variations in the classifier training dataset, is as important as the performance of the classifier itself. The paper presents an automatic procedure for variable selection in classification tasks, which ensures excellent stability of the selection and does not require any a priori information on the available data.
2021
978-3-030-85098-2
978-3-030-85099-9

Use this identifier to cite or link to this document: https://hdl.handle.net/11382/539990