Asymptotically Optimal Multi-Armed Bandit Policies Under a Cost Constraint

Citation:

Burnetas, A., Kanavetas, O. & Katehakis, M.N., 2016. Asymptotically optimal multi-armed bandit policies under a cost constraint. Probability in the Engineering and Informational Sciences, pp. 1-27.

Abstract:

We consider the multi-armed bandit problem under a cost constraint. Successive samples from each population are i.i.d. with unknown distribution, and each sample incurs a known, population-dependent cost. The objective is to design an adaptive sampling policy that maximizes the expected sum of n samples subject to the constraint that the average cost does not exceed a given bound sample-path-wise. We establish an asymptotic lower bound on the regret of feasible, uniformly fast convergent policies, and construct a class of policies that achieve this bound. We also provide their explicit form for Normal distributions with unknown means and known variances.
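
Illustrative example:

The policies constructed in the paper are derived from the asymptotic lower bound and are not reproduced here. As a minimal sketch of the problem setup the abstract describes, the following Python snippet simulates a bandit with Normal rewards (known variances, unknown means), a known cost per sample from each arm, and a heuristic UCB-style rule that keeps the running average cost below a bound c0. The function constrained_ucb, the feasibility heuristic, and all numerical values are hypothetical choices for the demo, not the paper's policy.

    # Illustrative sketch only: a cost-constrained bandit with Normal rewards.
    # This is NOT the paper's asymptotically optimal policy; it is a simple
    # UCB-style heuristic under the same problem setup (known variances,
    # unknown means, known per-sample costs, average-cost bound c0).
    import math
    import random

    def constrained_ucb(mu, sigma, costs, c0, horizon, seed=0):
        """Return (total reward, average cost) for a heuristic constrained policy."""
        rng = random.Random(seed)
        k = len(mu)
        counts = [0] * k          # samples taken from each arm
        sums = [0.0] * k          # running reward sums per arm
        total_cost = 0.0
        total_reward = 0.0
        for t in range(1, horizon + 1):
            if t <= k:
                # Sample each arm once to initialize the estimates.
                arm = t - 1
            else:
                # UCB index with known variance: sample mean plus confidence radius.
                ucb = [sums[i] / counts[i]
                       + sigma[i] * math.sqrt(2 * math.log(t) / counts[i])
                       for i in range(k)]
                # Among arms whose cost keeps the cumulative cost within c0 * t,
                # pick the highest index; if none is affordable, fall back to the
                # cheapest arm to restore feasibility.
                budget = c0 * t - total_cost
                feasible = [i for i in range(k) if costs[i] <= budget]
                if feasible:
                    arm = max(feasible, key=lambda i: ucb[i])
                else:
                    arm = min(range(k), key=lambda i: costs[i])
            reward = rng.gauss(mu[arm], sigma[arm])
            counts[arm] += 1
            sums[arm] += reward
            total_cost += costs[arm]
            total_reward += reward
        return total_reward, total_cost / horizon

    if __name__ == "__main__":
        # Hypothetical instance: arm 0 is cheap but low-reward, arm 1 costly but high-reward.
        reward, avg_cost = constrained_ucb(mu=[0.5, 1.0], sigma=[1.0, 1.0],
                                           costs=[0.2, 1.0], c0=0.6, horizon=10000)
        print(f"total reward: {reward:.1f}, average cost: {avg_cost:.3f} (bound 0.6)")

The per-round budget check enforces the constraint along the simulated sample path, echoing the sample-path-wise requirement in the abstract; the paper's policies instead achieve the asymptotic regret lower bound, which this heuristic does not claim to do.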

Notes:

Cited by: 0; Article in Press.