OATAO - Open Archive Toulouse Archive Ouverte

DTD based costs for Tree-Edit distance in Structured Information Retrieval

Laitang, Cyril and Pinel-Sauvagnat, Karen and Boughanem, Mohand. DTD based costs for Tree-Edit distance in Structured Information Retrieval. (2013) In: 35th European Conference on Information Retrieval (ECIR 2013), 24 March 2013 - 27 March 2013 (Moscow, Russian Federation).

(Document in English)

Official URL: http://dx.doi.org/10.1007/978-3-642-36973-5_14

Abstract

In this paper we present a Structured Information Retrieval (SIR) model based on graph matching. Our approach combines content propagation, which handles sibling relationships, with a document-query structure matching process. The latter is based on Tree-Edit Distance (TED), the cost of the cheapest sequence of insert, delete, and replace operations that turns one tree into another. To our knowledge, this algorithm has never been used in ad-hoc SIR. As the effectiveness of TED relies on both the input tree and the edit costs, we first present a focused subtree extraction technique that selects the most representative elements of the document with respect to the query. We then describe how we set the TED costs based on the Document Type Definition (DTD). Finally, we discuss our results according to the type of collection (data-oriented or text-oriented). Experiments are conducted on two INEX test sets: the 2010 Datacentric collection and the 2005 Ad-hoc one.
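To make the TED definition in the abstract concrete, the following is a minimal sketch of an ordered tree-edit distance with pluggable operation costs. It is not the authors' implementation: it uses a plain memoized forest recursion rather than an optimized algorithm, and the names `tree`, `tree_edit_distance`, `insert_cost`, `delete_cost`, and `replace_cost`, as well as the example tags and cost values, are assumptions made for illustration. The abstract does not say how the DTD determines the costs, so the custom cost below is purely hypothetical.

```python
from functools import lru_cache


def tree(label, *children):
    """Build an ordered, hashable tree as (label, tuple_of_children)."""
    return (label, tuple(children))


def tree_edit_distance(t1, t2,
                       insert_cost=lambda label: 1,
                       delete_cost=lambda label: 1,
                       replace_cost=lambda a, b: 0 if a == b else 1):
    """Cost of the cheapest insert/delete/replace sequence turning t1 into t2."""

    @lru_cache(maxsize=None)
    def dist(f, g):
        # f and g are forests (tuples of trees); we always work on the
        # rightmost root of each forest.
        if not f and not g:
            return 0
        if not f:
            label, kids = g[-1]
            # Insert the rightmost root of g; its children take its place.
            return dist(f, g[:-1] + kids) + insert_cost(label)
        if not g:
            label, kids = f[-1]
            # Delete the rightmost root of f; its children take its place.
            return dist(f[:-1] + kids, g) + delete_cost(label)
        (l1, k1), (l2, k2) = f[-1], g[-1]
        return min(
            dist(f[:-1] + k1, g) + delete_cost(l1),
            dist(f, g[:-1] + k2) + insert_cost(l2),
            # Map root to root, then solve children and remaining forests.
            dist(k1, k2) + dist(f[:-1], g[:-1]) + replace_cost(l1, l2),
        )

    return dist((t1,), (t2,))


# Hypothetical document subtree and query tree.
document = tree("article", tree("sec", tree("title")))
query = tree("article", tree("title"))

# Unit costs: deleting the intermediate "sec" node is enough.
print(tree_edit_distance(document, query))  # 1

# Hypothetical structure-aware cost: removing "sec" is made cheaper.
cheap_sec = lambda label: 0.5 if label == "sec" else 1
print(tree_edit_distance(document, query, delete_cost=cheap_sec))  # 0.5
```

Swapping the cost functions changes which document structures count as close to the query, which is why the abstract stresses that TED effectiveness depends on the edit costs as well as on the input tree.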

Item Type: Conference or Workshop Item (Paper)
Additional Information: Thanks to the Springer editor. This paper appears in Volume 7814 of Lecture Notes in Computer Science, ISSN: 0302-9743, ISBN: 978-3-642-36972-8. The original PDF is available at: http://link.springer.com/chapter/10.1007%2F978-3-642-36973-5_14
HAL Id: hal-01264568
Audience (conference): International conference proceedings
Uncontrolled Keywords:
Institution: Université de Toulouse > Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE)
French research institutions > Centre National de la Recherche Scientifique - CNRS (FRANCE)
Université de Toulouse > Université Toulouse III - Paul Sabatier - UT3 (FRANCE)
Université de Toulouse > Université Toulouse - Jean Jaurès - UT2J (FRANCE)
Université de Toulouse > Université Toulouse 1 Capitole - UT1 (FRANCE)
Laboratory name:
Deposited On: 07 Dec 2015 10:51
