OATAO - Open Archive Toulouse Archive Ouverte

Entropy-based adaptive exploit-explore coefficient for Monte-Carlo path planning

Carmo, Ana Raquel and Delamer, Jean-Alexis and Watanabe, Yoko and Ventura, Rodrigo and Ponzoni Carvalho Chanel, Caroline Entropy-based adaptive exploit-explore coefficient for Monte-Carlo path planning. (2020) In: 10th International Conference on Prestigious Applications of Intelligent Systems (PAIS 2020), a subconference of the 24th European Conference on Artificial Intelligence (ECAI 2020), 31 August 2020 - 3 September 2020 (Virtual, Spain).

(Document in English)

PDF (Author's version)

Official URL: https://ecai2020.eu/papers/pais/28_paper.pdf


Efficient path planning for autonomous vehicles in cluttered environments is a challenging sequential decision-making problem under uncertainty. In this context, this paper implements a partially observable stochastic shortest path (PO-SSP) planning problem for autonomous urban navigation of Unmanned Aerial Vehicles (UAVs). To solve this planning problem, the POMCP-GO algorithm is used, which is a goal-oriented variant of POMCP, one of the fastest state-of-the-art online solvers for partially observable environments based on Monte Carlo planning. This algorithm relies on the Upper Confidence Bounds (UCB1) algorithm as its action selection strategy. UCB1 depends on an exploration constant that is typically tuned empirically. Its best value varies significantly between planning problems, and hence an exhaustive search is required to find the most suitable value. Applied to a complex path planning problem, this exhaustive search may be extremely time consuming. Moreover, it is not suitable for real applications where online planning is needed. Therefore, this paper explores the use of an adaptive exploration coefficient for action selection during planning. A Monte-Carlo value backup approximation is also applied, which is shown empirically to accelerate the convergence of the policy value. Simulation results show that the adaptive exploration coefficient, constrained to a user-defined interval, achieves better convergence and success rates than most hand-tuned fixed coefficients in that interval, although it never matches the results of the best fixed coefficient. Therefore, a compromise must be made between the desired quality of the results and the time one is willing to spend on the exhaustive search for the best coefficient value before planning.
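To make the role of the exploration constant concrete, the following is a minimal Python sketch of UCB1 action selection in its standard reward-maximising form, followed by one possible way to derive a coefficient from entropy. The abstract does not state the paper's formula, so `entropy_coefficient`, its softmax over action values, the `temperature` parameter, and the linear mapping onto the interval `[c_min, c_max]` are all illustrative assumptions and should not be read as the authors' method.

```python
import math


def ucb1_select(q_values, visit_counts, c):
    """Pick the action maximising Q(a) + c * sqrt(ln N / n(a)) (standard UCB1)."""
    # Any action that has never been tried is selected first.
    for action, n in enumerate(visit_counts):
        if n == 0:
            return action
    total = sum(visit_counts)
    scores = [
        q + c * math.sqrt(math.log(total) / n)
        for q, n in zip(q_values, visit_counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def entropy_coefficient(q_values, c_min, c_max, temperature=1.0):
    """Hypothetical adaptive coefficient, NOT the paper's formula.

    Assumption: a softmax over the action values gives a distribution whose
    normalised entropy (0 = one clear best action, 1 = all actions look
    alike) is mapped linearly onto the user-defined interval [c_min, c_max].
    """
    if len(q_values) < 2:
        return c_min
    best = max(q_values)
    # Subtracting the maximum keeps the exponentials numerically stable.
    weights = [math.exp((q - best) / temperature) for q in q_values]
    z = sum(weights)
    probs = [w / z for w in weights]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    normalised = entropy / math.log(len(q_values))
    return c_min + (c_max - c_min) * normalised


# Usage: recompute the coefficient at a node, then select an action with it.
q = [0.4, 0.5, 0.1]
counts = [12, 20, 5]
c = entropy_coefficient(q, c_min=0.1, c_max=2.0)
chosen = ucb1_select(q, counts, c)
```

In this sketch, a node whose action values are nearly indistinguishable receives a coefficient close to `c_max` and is explored more, while a node with one clearly dominant action receives a coefficient close to `c_min`. The direction of that mapping is itself an assumption made for illustration.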

Item Type: Conference or Workshop Item (Paper)
HAL Id: hal-03125159
Audience (journal): International peer-reviewed journal
Audience (conference): International conference proceedings
Uncontrolled Keywords:
Institution: Université de Toulouse > Institut Supérieur de l'Aéronautique et de l'Espace - ISAE-SUPAERO (FRANCE)
French research institutions > Office National d'Etudes et Recherches Aérospatiales - ONERA (FRANCE)
Other partners > Queen's University (CANADA)
French research institutions > Artificial and Natural Intelligence Toulouse Institute - ANITI (FRANCE)
Other partners > Universidade de Lisboa - ULisboa (PORTUGAL)
Laboratory name:
ANITI ANR-19-PI3A-0004
Deposited On: 27 Aug 2020 16:15
