OATAO - Open Archive Toulouse Archive Ouverte

On the Locality of Action Domination in Sequential Decision Making

Rachelson, Emmanuel and Lagoudakis, Michail G. On the Locality of Action Domination in Sequential Decision Making. (2010) In: 11th International Symposium on Artificial Intelligence and Mathematics (ISIAM 2010), 6 January 2010 - 8 January 2010 (Fort Lauderdale, United States).

(Document in English)

Abstract

In the field of sequential decision making and reinforcement learning, it has been observed that good policies for most problems exhibit a significant amount of structure. In practice, this implies that when a learning agent discovers that an action is better than any other in a given state, this action also dominates in a certain neighbourhood around that state. This paper presents new results proving that this notion of locality in action domination can be linked to the smoothness of the environment's underlying stochastic model. Namely, we link the Lipschitz continuity of a Markov Decision Process to the Lipschitz continuity of its policies' value functions and introduce the key concept of influence radius to describe the neighbourhood of states where the dominating action is guaranteed to be constant. These ideas are exploited directly in the proposed Localized Policy Iteration (LPI) algorithm, which is an active learning version of Rollout-based Policy Iteration. Preliminary results on the Inverted Pendulum domain demonstrate the viability and the potential of the proposed approach.
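
To make the idea of an influence radius concrete, the following is a minimal sketch and not the paper's own definition or code. It assumes that, for every action a, the state-action value Q(s, a) is Lipschitz continuous in s with a known constant (called `lipschitz_q` here, a hypothetical name). Under that assumption, if the best action beats the runner-up by a margin m at state s, then moving a distance d changes each value by at most `lipschitz_q * d`, so the same action still dominates whenever d < m / (2 * lipschitz_q). The function name and the dictionary representation of Q-values are illustrative choices.

```python
def influence_radius(q_values, best_action, lipschitz_q):
    """Radius within which `best_action` provably stays dominant.

    Assumption (not taken from the paper): Q(., a) is Lipschitz in the
    state with constant `lipschitz_q` for every action a.

    q_values    -- dict mapping each action to its Q-value at the state
    best_action -- the action believed to dominate at the state
    lipschitz_q -- assumed Lipschitz constant of Q in the state variable
    """
    if lipschitz_q <= 0:
        raise ValueError("lipschitz_q must be positive")
    if len(q_values) < 2:
        raise ValueError("need at least two actions to compare")

    best = q_values[best_action]
    # Value of the strongest competing action at the same state.
    runner_up = max(v for a, v in q_values.items() if a != best_action)
    margin = best - runner_up

    # Each value can drift by at most lipschitz_q * d over a distance d,
    # so the gap shrinks by at most 2 * lipschitz_q * d.
    return max(margin, 0.0) / (2.0 * lipschitz_q)


# Hypothetical numbers: a margin of 2.0 with constant 4.0.
radius = influence_radius({"left": 1.0, "right": 3.0}, "right", 4.0)
```

A non-positive margin yields a radius of zero, meaning nothing can be guaranteed around that state. The constants the paper derives from the Lipschitz continuity of the Markov Decision Process itself may differ from this simplified bound.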

Item Type: Conference or Workshop Item (Paper)
Audience (conference): International conference proceedings
Institution: Other partners > Technical University of Crete (GREECE)
Deposited On: 29 Nov 2017 16:23
