OATAO - Open Archive Toulouse Archive Ouverte

A data-mining approach for assessing consistency between multiple representations in spatial databases

Sheeren, David and Mustière, Sébastien and Zucker, Jean-Daniel A data-mining approach for assessing consistency between multiple representations in spatial databases. (2009) International Journal of Geographical Information Science, vol. 23 (n° 8). pp. 961-992. ISSN 1365-8816

Document in English

PDF (Author's version), 18 MB. Access restricted to the depositor and repository staff.

Official URL: http://dx.doi.org/10.1080/13658810701791949

Abstract

When different spatial databases are combined, an important issue is the identification of inconsistencies between data. Quite often, representations of the same geographical entities in databases are different and reflect different points of view. In order to fully take advantage of these differences when object instances are associated, a key issue is to determine whether the differences are normal, i.e. explained by the database specifications, or if they are due to erroneous or outdated data in one database. In this paper, we propose a knowledge-based approach to partially automate the consistency assessment between multiple representations of data. Inconsistency detection is viewed as a knowledge-acquisition problem, the source of knowledge being the data. The consistency assessment is carried out by applying a proposed method called MECO. This method is itself parameterized by domain knowledge obtained from a second method called MACO. MACO supports two approaches (direct or indirect) to perform the knowledge acquisition using data-mining techniques. In particular, a supervised learning approach is defined to automate the knowledge acquisition so as to drastically reduce the human domain expert's work. Thanks to this approach, the knowledge-acquisition process is sped up and less expert-dependent. Training examples are obtained automatically upon completion of the spatial data matching. Knowledge extraction from data following this bottom-up approach is particularly useful, since the database specifications are generally complex, difficult to analyse, and manually encoded. Such a data-driven process also sheds some light on the gap between the textual specifications and those actually used to produce the data. The methodology is illustrated and experimentally validated by comparing geometrical representations and attribute values of different vector spatial databases. The advantages and limits of such partially automatic approaches are discussed, and directions for future work are suggested.

Item Type: Article
Audience (journal): International peer-reviewed journal
Institution: French research institutions > Centre National de la Recherche Scientifique - CNRS (FRANCE)
French research institutions > Institut National de l’Information Géographique et forestière - IGN (FRANCE)
Université de Toulouse > Institut National Polytechnique de Toulouse - INPT (FRANCE)
Other partners > Institut National des Sciences Appliquées de Strasbourg - INSA (FRANCE)
French research institutions > Institut de Recherche pour le Développement - IRD (FRANCE)
Other partners > Université de Strasbourg - UNISTRA (FRANCE)
Deposited By: David SHEEREN
Deposited On: 04 Nov 2013 07:57