OATAO - Open Archive Toulouse Archive Ouverte

Predicting the encoding of secondary diagnoses. An experience based on decision trees

Chahbandarian, Ghazar and Souf, Nathalie and Megdiche Bousarsar, Imen and Bastide, Rémi and Steinbach, Jean-Christophe. Predicting the encoding of secondary diagnoses. An experience based on decision trees. (2017) Ingénierie des Systèmes d'Information, 22 (2). 69-94. ISSN 1633-1311

(Document in English)

PDF (Author's version)

Official URL: https://doi.org/10.3166/ISI.22.2.69-94


To measure medical activity, hospitals are required to manually encode the diagnoses for each inpatient episode using the International Classification of Diseases (ICD-10). This task is time-consuming and requires substantial staff training. In this paper, we propose an approach to speed up and facilitate the tedious manual coding of patient information, especially for secondary diagnoses that are not well described in medical resources such as discharge letters and medical records. Our approach leverages data mining techniques, specifically decision trees, to explore medical databases that encode knowledge about such diagnoses. It uses the stored structured information (age, gender, diagnoses count, medical procedures, etc.) to build a decision tree that assigns the appropriate secondary diagnosis code to the corresponding inpatient episode. We evaluated our approach on the PMSI database using fine and coarse levels of diagnosis granularity. Three types of experiments were performed, using different techniques to balance the datasets. The results show a significant variation in evaluation scores between the techniques for the same studied diagnoses. We highlight the efficiency of the random sampling techniques regardless of the type of diagnosis and the type of measure (F1-measure, recall and precision).
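The abstract mentions balancing datasets by random sampling and scoring with precision, recall and F1-measure. The sketch below illustrates one such combination under stated assumptions: it uses random undersampling of the majority class (the abstract does not say which sampling variant the authors used), and the episode tuples `(age, gender, diagnoses_count)`, the labels, and the function names `undersample` and `precision_recall_f1` are hypothetical, not taken from the paper or from the PMSI database.

```python
import random
from collections import Counter


def undersample(rows, labels, seed=0):
    """Randomly keep as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        # Sample without replacement so no episode is duplicated.
        for row in rng.sample(group, n_min):
            balanced.append((row, label))
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [l for _, l in balanced]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy episodes: (age, gender, diagnoses_count).
# Label 1 means the secondary diagnosis was coded; it is rare (10 of 100).
rows = [(60 + i % 30, i % 2, 1 + i % 5) for i in range(100)]
labels = [1 if i < 10 else 0 for i in range(100)]

bal_rows, bal_labels = undersample(rows, labels, seed=42)
print(Counter(bal_labels))
```

The balanced rows could then be passed to any decision tree learner, and its predictions on held-out episodes scored with `precision_recall_f1`. Undersampling discards majority-class episodes, so oversampling the minority class is a common alternative when data is scarce.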

Item Type: Article
HAL Id: hal-02864391
Audience (journal): International peer-reviewed journal
Uncontrolled Keywords:
Institution: Other partners > Centre Hospitalier InterCommunal Castres-Mazamet - CHIC (FRANCE)
French research institutions > Centre National de la Recherche Scientifique - CNRS (FRANCE)
Université de Toulouse > Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE)
Université de Toulouse > Institut National des Sciences Appliquées de Toulouse - INSA (FRANCE)
Université de Toulouse > Université Toulouse III - Paul Sabatier - UT3 (FRANCE)
Université de Toulouse > Université Toulouse - Jean Jaurès - UT2J (FRANCE)
Université de Toulouse > Université Toulouse 1 Capitole - UT1 (FRANCE)
Université de Toulouse > Institut National Universitaire Champollion - INU (FRANCE)
Laboratory name:
UT3 : Université Toulouse 3 Paul Sabatier (France) - INU Champollion : Institut national universitaire Champollion (France) - Castres-Mazamet Technopole (France) - Région Occitanie Midi-Pyrénées (France)
Deposited On: 04 Jun 2020 08:24
