ARIA

Association Francophone de Recherche d’Information (RI) et Applications

Proceedings of CORIA 2013

Authors

Romain Deveaud, Ludovic Bonnefoy, Patrice Bellot

Summary

In this article we propose an unsupervised method for identifying and

Abstract

In this paper we introduce an unsupervised method for mining and modeling latent search concepts. We use Latent Dirichlet Allocation (LDA), a generative probabilistic topic model, to extract highly specific query-related topics from pseudo-relevant feedback documents. Our approach automatically estimates the number of latent concepts as well as the number of feedback documents needed, without any prior training step. Latent concepts are then weighted to reflect their relative adequacy and are further used to automatically reformulate the initial user query. We also explore the use of different types of information sources for modeling the latent concepts. For this purpose, we use four general information sources of various kinds (web, news, encyclopedic) from which the feedback documents are extracted. We evaluate our approach on two large ad hoc TREC collections. Results show that it significantly improves document retrieval effectiveness, with the best results achieved by combining latent concepts modeled from all available sources.

About

ARIA (Association Francophone de Recherche d’Information (RI) et Applications) is a learned society, registered as a nonprofit association under the French law of 1901. Its purpose is to promote knowledge in the field of Information Retrieval (IR) and in the various scientific fields involved in the design, implementation, and evaluation of Information Retrieval systems.