5-7 Apr 2023 Montpellier (France)
Sparse GEMINI for Joint Discriminative Clustering and Feature Selection
Louis Ohl 1, 2, Pierre-Alexandre Mattei, Frédéric Precioso, Charles Bouveyron
1 : Laboratoire d'Informatique, Signaux et Systèmes de Sophia Antipolis
2 : Equipe Maasai, Inria Sophia Antipolis
Institut National de Recherche en Informatique et en Automatique

Feature selection in clustering is a hard task: it requires discovering relevant clusters and, simultaneously, the variables that are relevant to those clusters. Whereas feature selection algorithms for clustering are often model-based, relying on optimised model selection or strong assumptions on p(x), we introduce Sparse GEMINI, a discriminative clustering model that maximises GEMINI, a geometry-aware generalisation of the mutual information, under a simple \ell_1 penalty. The algorithm avoids the burden of combinatorial feature subset exploration and scales easily to high-dimensional data and large numbers of samples, while requiring only the design of a clustering model p_\theta(y|x). We demonstrate the performance of Sparse GEMINI on synthetic datasets as well as large-scale datasets. Our results show that Sparse GEMINI is a competitive algorithm that can select relevant subsets of variables with respect to the clustering without using relevance criteria or prior hypotheses.
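
The objective described above can be sketched in symbols. This is a hedged illustration, not the authors' exact formulation. It assumes the one-vs-all form of GEMINI, in which a distance D between distributions (for example a maximum mean discrepancy or a Wasserstein distance) compares each cluster distribution p_\theta(x|y) with the data distribution p(x). The symbols D, \lambda and W do not appear in the abstract and are assumptions; in particular, applying the \ell_1 penalty to weights W attached to the input features is an illustrative choice.

```latex
% One-vs-all GEMINI (assumed form): expected distance between each
% cluster distribution and the data distribution.
\mathcal{I}_D(x; y) = \mathbb{E}_{y \sim p_\theta(y)}
    \left[ D\big( p_\theta(x \mid y) \,\|\, p(x) \big) \right]

% Sparse GEMINI (sketch): maximise GEMINI minus an \ell_1 penalty of
% strength \lambda \ge 0 on the weights W acting on the input features
% (W and \lambda are hypothetical names introduced for illustration).
\hat{\theta} = \arg\max_{\theta} \;
    \mathcal{I}_D(x; y) - \lambda \, \lVert W \rVert_1
```

When D is the Kullback-Leibler divergence, the first expression reduces to the standard mutual information between x and y. Under this sketch, a feature whose weights are all driven to zero by the penalty no longer influences p_\theta(y|x), which is how a sparsity penalty can act as feature selection.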
