Association Francophone de Recherche d’Information (RI) et Applications

Proceedings of CORIA 2008


Ahmad El Sayed, Hakim Hacid, Djamel A. Zighed




The goal of any clustering algorithm is to find the optimal clustering solution with the optimal number of clusters. To evaluate a clustering solution, validity indices are used during or at the end of the clustering process. These indices can be internal, external or relative. In this paper, we provide two main contributions. First, we present an experimental study comparing the major relative indices in the context of agglomerative document clustering. The objective is to highlight the limits of the existing indices in identifying both the optimal clustering solution and the optimal number of clusters in real datasets. Second, we explore the feasibility of using relative indices as stopping criteria in agglomerative clustering algorithms. We present a new method that enhances the clustering process with context-awareness in order to make the indices more reliable for this use.
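To make the idea of a relative index guiding an agglomerative process concrete, the following is a minimal sketch and not the method proposed in this paper. It assumes average-linkage merging, Euclidean distance, and the mean silhouette width as the relative index; all three are illustrative choices, as are the function names and the toy 2-D points, which stand in for document vectors. The sketch scores every level of the hierarchy and keeps the best one, rather than stopping at the first drop in the index.

```python
import math
from itertools import combinations


def average_linkage(points, c1, c2):
    """Mean pairwise Euclidean distance between two clusters of point indices."""
    total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points):
    """Yield the partition obtained after each merge, from n-1 clusters down to 1."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average-linkage distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(points, clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
        yield [sorted(c) for c in clusters]


def silhouette(points, clusters):
    """Mean silhouette width (assumed relative index); undefined for one cluster."""
    if len(clusters) < 2:
        return float("-inf")
    total = 0.0
    for ci, members in enumerate(clusters):
        if len(members) == 1:
            continue  # a singleton contributes 0 by convention
        for i in members:
            # a: mean distance to the other members of the same cluster.
            a = sum(math.dist(points[i], points[j]) for j in members if j != i)
            a /= len(members) - 1
            # b: mean distance to the nearest other cluster.
            b = min(
                sum(math.dist(points[i], points[j]) for j in other) / len(other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            if max(a, b) > 0:
                total += (b - a) / max(a, b)
    return total / len(points)


def best_partition(points):
    """Scan the whole hierarchy and keep the level with the highest index value."""
    best_score, best = float("-inf"), None
    for partition in agglomerate(points):
        score = silhouette(points, partition)
        if score > best_score:
            best_score, best = score, partition
    return best, best_score


# Toy data (hypothetical): three well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
best, score = best_partition(points)
print(len(best), "clusters selected, index value", round(score, 3))
```

Scanning the full hierarchy avoids committing to the first local optimum of the index, at the cost of running the clustering to completion. A true stopping criterion would instead halt as soon as the index degrades, which is exactly where the reliability of the index matters.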

