Authors
Summary
The work we present here aims to compare selection methods
Abstract
In this contribution, we review a number of approaches to feature selection, divided into two broad classes. Some are corpus-based, i.e. they use only the data to assess the relevance of each feature, and aim to identify a small subset of relevant features on which to train categorisation models. Others are model-based, i.e. they assess the relevance of each feature on the basis of the model used for categorisation. This second class of measures makes the model's decisions easier to understand. Furthermore, comparing the two classes provides insight into whether corpus-based feature extraction is selective enough and does not overgenerate compared to model-based selection. Our experimental comparison is mainly based on a collection of medical abstracts provided by the Swiss Institute of Bioinformatics.