OATAO - Open Archive Toulouse Archive Ouverte

Factors affecting the effectiveness of biomedical document indexing and retrieval based on terminologies

Dinh, Duy and Tamine, Lynda and Boubekeur, Fatiha. Factors affecting the effectiveness of biomedical document indexing and retrieval based on terminologies. (2013) Artificial Intelligence in Medicine, 57 (2). 155-167. ISSN 0933-3657

(Document in English)

PDF (Author's version), 917 kB

Official URL: http://dx.doi.org/10.1016/j.artmed.2012.08.006

Abstract

The aim of this work is to evaluate a set of indexing and retrieval strategies, based on the integration of several biomedical terminologies, on the available TREC Genomics collections for an ad hoc information retrieval (IR) task.

Materials and methods: We propose a multi-terminology concept extraction approach that selects the best concepts from free text by means of voting techniques. We instantiate this general approach on four terminologies (MeSH, SNOMED, ICD-10 and GO). We focus in particular on the effect of integrating terminologies into a biomedical IR process, and on the utility of voting techniques for combining the concepts extracted from each document into a list of unique concepts.

Results: Experimental studies conducted on the TREC Genomics collections show that our multi-terminology IR approach based on voting techniques yields statistically significant improvements over the baseline. For example, on the 2005 TREC Genomics collection, it achieves an improvement rate of +6.98% in terms of MAP (mean average precision) (p < 0.05) compared to the baseline. In addition, our experimental results show that document expansion using preferred terms, combined with query expansion using terms from top-ranked expanded documents, improves biomedical IR effectiveness.

Conclusion: We have evaluated several voting models for combining concepts drawn from multiple terminologies. Through this study, we have presented a number of factors affecting the effectiveness of a biomedical IR system, including term weighting, query expansion, and document expansion models. An appropriate combination of these factors could improve IR performance.
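To make the idea of voting over several terminologies concrete, the following is a minimal sketch, not the authors' implementation. It assumes each terminology has already produced a list of (concept, score) pairs for one document, that concepts are comparable across terminologies through a normalized string, and that a CombMNZ-style rule (sum of scores multiplied by the number of terminologies that proposed the concept) serves as the voting technique. The function name `fuse_concepts`, the concept names and the scores are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def fuse_concepts(extractions, top_k=5):
    """Merge per-terminology concept lists into one ranked list of unique concepts.

    extractions: dict mapping a terminology name to a list of
                 (concept, score) pairs extracted from one document.
    Returns the top_k (concept, fused_score) pairs, best first.
    """
    score_sum = defaultdict(float)  # sum of scores per concept
    votes = defaultdict(int)        # number of terminologies proposing it

    for concepts in extractions.values():
        for concept, score in concepts:
            score_sum[concept] += score
            votes[concept] += 1

    # CombMNZ-style rule (an assumption): reward concepts that several
    # terminologies agree on by multiplying the score sum by the vote count.
    fused = {c: votes[c] * score_sum[c] for c in score_sum}

    # Sort by fused score (descending), then by name for a stable order.
    ranked = sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Toy input: invented concepts and scores, for illustration only.
example = {
    "MeSH": [("neoplasms", 0.9), ("apoptosis", 0.6)],
    "SNOMED": [("neoplasms", 0.7), ("cell death", 0.4)],
    "GO": [("apoptosis", 0.8)],
    "ICD-10": [("neoplasms", 0.5)],
}

if __name__ == "__main__":
    for concept, score in fuse_concepts(example):
        print(f"{concept}\t{score:.2f}")
```

In this toy input, the concept proposed by three terminologies ranks first, the one proposed by two ranks second, and the singleton ranks last. Other voting rules (for example, summing scores without the vote multiplier) can be swapped in by changing the single line that computes `fused`.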

Item Type: Article
Additional Information: Thanks to the Elsevier editor. The definitive version is available at http://www.sciencedirect.com. The original PDF of the article can be found on the Artificial Intelligence in Medicine website: http://www.sciencedirect.com/science/journal/09333657
HAL Id: hal-01123496
Audience (journal): International peer-reviewed journal
Institution: French research institutions > Centre National de la Recherche Scientifique - CNRS (FRANCE)
Université de Toulouse > Institut National Polytechnique de Toulouse - INPT (FRANCE)
Université de Toulouse > Université Toulouse III - Paul Sabatier - UPS (FRANCE)
Université de Toulouse > Université Toulouse - Jean Jaurès - UT2J (FRANCE)
Université de Toulouse > Université Toulouse 1 Capitole - UT1 (FRANCE)
Other partners > Université Mouloud Mammeri Tizi Ouzou - UMMTO (ALGERIA)
Deposited By: IRIT
Deposited On: 05 Mar 2015 08:08
