The authorship attribution problem can be viewed as a text categorization problem. To determine the most effective features for discriminating between writers (the categories), we evaluated seven feature selection functions (e.g., pointwise mutual information, information gain, odds ratio, χ², and correlation coefficient). We also considered two selection functions proposed specifically for authorship attribution. To compare these approaches, we used a newspaper corpus (Glasgow Herald) of 5,408 articles written by twenty columnists. Using the KLD (Zhao & Zobel, 2007) and Delta (Burrows, 2002) attribution schemes, we found that some simple selection functions tend to produce results comparable to those of more complex ones.
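
As a rough illustration of what a feature selection function computes, the sketch below scores a term against one author using two of the measures named above, pointwise mutual information and the chi-square statistic, both in their standard textbook forms derived from a 2×2 document-count table. It is a minimal sketch under stated assumptions, not the implementation used in this study: the toy corpus, the author labels, and the function names `contingency`, `pmi`, and `chi_square` are hypothetical.

```python
import math

def contingency(docs, term, author):
    """Count documents in a 2x2 table for one (term, author) pair.

    a: has term, by author      b: has term, by another author
    c: no term, by author       d: no term, by another author
    """
    a = b = c = d = 0
    for doc_author, tokens in docs:
        has_term = term in tokens
        is_author = doc_author == author
        if has_term and is_author:
            a += 1
        elif has_term:
            b += 1
        elif is_author:
            c += 1
        else:
            d += 1
    return a, b, c, d

def pmi(a, b, c, d):
    """Pointwise mutual information (base 2) between term and author."""
    n = a + b + c + d
    if a == 0:
        return float("-inf")  # term never co-occurs with this author
    return math.log2(a * n / ((a + b) * (a + c)))

def chi_square(a, b, c, d):
    """Chi-square statistic of the 2x2 table (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: a row or column is empty
    return n * (a * d - b * c) ** 2 / denom

# Hypothetical toy corpus: (author label, set of word types in the article).
docs = [
    ("A", {"the", "whilst"}),
    ("A", {"whilst", "of"}),
    ("B", {"the", "of"}),
    ("B", {"the", "upon"}),
]

for term in ("whilst", "the"):
    table = contingency(docs, term, "A")
    print(term, table, pmi(*table), chi_square(*table))
```

In this toy corpus, "whilst" appears only in articles by author A and therefore scores higher on both measures than "the", which appears under both authors. Ranking terms by such scores and keeping the top-ranked ones is the general idea behind selection by a scoring function.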