OATAO - Open Archive Toulouse Archive Ouverte

Entropy evaluation based on confidence intervals of frequency estimates : Application to the learning of decision trees

Serrurier, Mathieu and Prade, Henri Entropy evaluation based on confidence intervals of frequency estimates : Application to the learning of decision trees. (2015) In: 32nd International Conference on Machine Learning (ICML 2015), 6 July 2015 - 11 July 2015 (Lille, France).

(Document in English)

Official URL: http://proceedings.mlr.press/v37/serrurier15.pdf

Abstract

Entropy gain is widely used for learning decision trees. However, as we go deeper down the tree, the examples become rarer and the faithfulness of entropy decreases. Thus, misleading choices and over-fitting may occur, and the tree has to be adjusted by using an early-stop criterion or post-pruning algorithms. However, these methods still depend on the choices previously made, which may be unsatisfactory. We propose a new cumulative entropy function based on confidence intervals on frequency estimates that jointly considers the entropy of the probability distribution and the uncertainty around the estimation of its parameters. This function takes advantage of the ability of a possibility distribution to upper bound a family of probabilities previously estimated from a limited set of examples and of the link between possibilistic specificity order and entropy. The proposed measure has several advantages over the classical one. It makes significant choices of split and provides a statistically relevant stopping criterion that allows the learning of trees whose size is well suited w.r.t. the available data. On top of that, it also provides a reasonable estimator of the performance of a decision tree. Finally, we show that it can be used for designing a simple and efficient online learning algorithm.
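
The problem the abstract starts from can be illustrated with a minimal Python sketch. It is not the cumulative entropy function proposed in the paper. It only shows that the classical entropy of empirical frequencies is the same for a small and a large sample with identical class proportions, while a confidence interval on the frequency estimate is much wider for the small sample. The Wilson score interval is used here as one standard choice of interval and is an assumption of this sketch, as are the function names `shannon_entropy` and `wilson_interval`.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the empirical class frequencies."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%).

    Assumption of this sketch: the paper's own interval construction may differ.
    """
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Same 75% / 25% class split, observed on 4 examples and on 400 examples.
for counts in [(3, 1), (300, 100)]:
    n = sum(counts)
    low, high = wilson_interval(counts[0], n)
    print(
        f"n={n}: entropy={shannon_entropy(counts):.3f} bits, "
        f"interval for the majority-class frequency=[{low:.2f}, {high:.2f}]"
    )
```

Both samples yield the same entropy value, so a split criterion based on entropy alone cannot tell a leaf with 4 examples from one with 400. The interval width is the extra information that a criterion built on confidence intervals can exploit.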

Item Type: Conference or Workshop Item (Paper)
Additional Information: This paper appears in ICML'15: Proceedings of the 32nd International Conference on Machine Learning - Volume 37, ISSN: 1938-7228. The original PDF is available at: http://proceedings.mlr.press/v37/serrurier15.pdf
HAL Id: hal-01809356
Audience (conference): International conference proceedings
Institution: French research institutions > Centre National de la Recherche Scientifique - CNRS (FRANCE)
Université de Toulouse > Institut National Polytechnique de Toulouse - INPT (FRANCE)
Université de Toulouse > Université Toulouse III - Paul Sabatier - UPS (FRANCE)
Université de Toulouse > Université Toulouse - Jean Jaurès - UT2J (FRANCE)
Université de Toulouse > Université Toulouse 1 Capitole - UT1 (FRANCE)
Other partners > University of Technology, Sydney - UTS (AUSTRALIA)
Deposited By: IRIT IRIT
Deposited On: 09 May 2018 13:22
