OATAO - Open Archive Toulouse Archive Ouverte

Asymptotic optimal control of Markov-modulated restless bandits

Duran, Santiago and Verloop, Maaike. Asymptotic optimal control of Markov-modulated restless bandits. (2018) In: International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2018), 18 June 2018 - 22 June 2018 (Irvine, United States).

(Document in English)

PDF (Author's version), 568 kB

Official URL: https://doi.org/10.1145/3179410

Abstract

This paper studies optimal control subject to changing conditions, an area that has recently received considerable attention because it arises in many practical situations. Examples include cloud computing systems, where the arrival rate of new jobs fluctuates over time, and systems with time-varying capacity, such as power-aware systems or wireless downlink channels. To study this, we focus on the restless bandit model, which has proved to be a powerful stochastic optimization framework for modeling the scheduling of activities and has been applied extensively to the optimal control of computing systems. This paper is a first step towards the optimal control of restless bandits subject to changing conditions, which are modeled by Markov-modulated environments. We consider the restless bandit problem in an asymptotic regime, obtained by letting the population of bandits grow large and letting the environment change relatively fast. We present sufficient conditions for a policy to be asymptotically optimal and show that a set of priority policies satisfies them. Under an indexability assumption, an averaged version of Whittle's index policy is proved to belong to this set of asymptotically optimal policies. The performance of the averaged Whittle's index policy is evaluated numerically for a multi-class scheduling problem in a wireless downlink subject to changing conditions. Keeping the number of bandits constant, we observe that the averaged Whittle's index policy becomes close to optimal as the speed of the modulated environment increases. https://hal.laas.fr/hal-01696329/

Item Type: Conference or Workshop Item (Paper)
Audience (conference): International conference proceedings
Institution: Université de Toulouse > Institut National Polytechnique de Toulouse - INPT (FRANCE)
French research institutions > Centre National de la Recherche Scientifique - CNRS (FRANCE)
Université de Toulouse > Université Toulouse III - Paul Sabatier - UPS (FRANCE)
Université de Toulouse > Université Toulouse - Jean Jaurès - UT2J (FRANCE)
Université de Toulouse > Université Toulouse 1 Capitole - UT1 (FRANCE)
Deposited By: IRIT IRIT
Deposited On: 22 Feb 2019 09:38
