OATAO - Open Archive Toulouse Archive Ouverte

Statistical Analysis to Establish the Importance of Information Retrieval Parameters

Ayter, Julie and Chifu, Adrian-Gabriel and Déjean, Sébastien and Desclaux, Cecile and Mothe, Josiane. Statistical Analysis to Establish the Importance of Information Retrieval Parameters. (2015) Journal of Universal Computer Science, 21 (13). 1767-1789. ISSN 0948-695X

(Document in English)

PDF (Author's version), 586kB

Official URL: http://dx.doi.org/10.3217/jucs-021-13-1767

Abstract

Search engines rely on models to index documents, match queries against documents, and rank documents. Research in Information Retrieval (IR) aims to define these models and their parameters in order to optimize the results. Using benchmark collections, it has been shown that no single system configuration works best for every query; rather, performance varies from one query to another. It would be interesting if a meta-system could decide which system configuration should process a new query by learning from the context of previous queries. This paper reports an in-depth analysis of more than 80,000 search engine configurations applied to 100 queries, together with the corresponding performance. The goal of the analysis is to identify which configuration responds best to a certain type of query. We considered two approaches to define query types: the first is post-evaluation, based on clustering queries according to the performance measured with Average Precision; the second is pre-evaluation, using query features (including query difficulty predictors) to cluster queries. Globally, we identified two parameters that should be optimized: the retrieving model and the TrecQueryTags process. One could expect such results, as these two parameters are major components of the IR process. However, our work leads to two main conclusions: 1) based on the post-evaluation approach, we found that the retrieving model is the most influential parameter for easy queries, while the TrecQueryTags process is the most influential for hard queries; 2) for pre-evaluation, current query features do not allow queries to be clustered in a way that reveals differences in the influential parameters.
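
As a rough illustration of the post-evaluation idea, the sketch below computes Average Precision for one ranked result list and then labels queries as easy or hard from their mean Average Precision over several configurations. The function names, the sample data, and the fixed threshold are assumptions made for illustration only; the abstract does not detail the clustering procedure used in the paper, and a simple threshold split stands in for it here.

```python
from statistics import mean


def average_precision(ranked_doc_ids, relevant_doc_ids):
    """Average Precision of one ranked list against a set of relevant documents."""
    relevant = set(relevant_doc_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, accumulated only at relevant documents.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def split_queries_by_difficulty(ap_by_query, threshold):
    """Label each query 'easy' or 'hard' from its mean AP over all configurations.

    `threshold` is a hypothetical cut-off, not a value taken from the paper.
    """
    groups = {"easy": [], "hard": []}
    for query_id, ap_values in ap_by_query.items():
        label = "easy" if mean(ap_values) >= threshold else "hard"
        groups[label].append(query_id)
    return groups


# Made-up example: AP of two queries under two configurations.
ap_by_query = {"q1": [0.8, 0.6], "q2": [0.1, 0.2]}
groups = split_queries_by_difficulty(ap_by_query, threshold=0.3)
```

With per-group labels in hand, one could then compare how much each configuration parameter changes Average Precision within the easy group and within the hard group.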

Item Type: Article
Additional Information: The original PDF can be found at: http://www.jucs.org/jucs_21_13/statistical_analysis_to_establish
HAL Id: hal-01592043
Audience (journal): International peer-reviewed journal
Institution: French research institutions > Centre National de la Recherche Scientifique - CNRS (FRANCE)
Université de Toulouse > Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE)
Université de Toulouse > Institut National des Sciences Appliquées de Toulouse - INSA (FRANCE)
Université de Toulouse > Université Toulouse III - Paul Sabatier - UT3 (FRANCE)
Université de Toulouse > Université Toulouse - Jean Jaurès - UT2J (FRANCE)
Université de Toulouse > Université Toulouse 1 Capitole - UT1 (FRANCE)
Deposited On: 12 Sep 2017 12:56
