Computing Optimal Policies for Markovian Decision Processes Using Simulation

Citation:

Burnetas, A.N. & Katehakis, M.N., 1995. Computing Optimal Policies for Markovian Decision Processes Using Simulation. Probability in the Engineering and Informational Sciences, 9, pp. 525-537.

Abstract:

A simulation method is developed for computing average reward optimal policies for a finite state and action Markovian decision process. It is shown that the method is consistent; i.e., it produces solutions arbitrarily close to optimal. Various types of estimation errors and confidence bounds are examined. Finally, it is shown that the probability distribution of the number of simulation cycles required to compute an ε-optimal policy satisfies a large deviations property.
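For orientation, the sketch below illustrates the general setting the abstract refers to, not the authors' algorithm: a small finite MDP is simulated to estimate transition probabilities and rewards, and relative value iteration is then run on the estimated model to obtain a near-optimal average-reward policy. All sizes, sample counts, and the use of a model-based estimate are assumptions made purely for illustration.

```python
# Illustrative sketch only: simulate transitions to estimate a finite MDP,
# then run relative value iteration (average-reward criterion) on the
# estimate. This is NOT the algorithm of Burnetas & Katehakis (1995);
# every quantity below is assumed for the example.
import numpy as np

rng = np.random.default_rng(0)

# A small "true" MDP used only as the simulator (assumed for the example).
n_states, n_actions = 4, 2
P_true = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
r_true = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

def simulate(s, a):
    """One simulated transition from state s under action a."""
    s_next = rng.choice(n_states, p=P_true[s, a])
    return r_true[s, a], s_next

# Estimate the model by repeatedly simulating every state-action pair.
n_samples = 2000
P_hat = np.zeros((n_states, n_actions, n_states))
r_hat = np.zeros((n_states, n_actions))
for s in range(n_states):
    for a in range(n_actions):
        for _ in range(n_samples):
            rew, s_next = simulate(s, a)
            r_hat[s, a] += rew
            P_hat[s, a, s_next] += 1
        r_hat[s, a] /= n_samples
        P_hat[s, a] /= n_samples

# Relative value iteration on the estimated model.
h = np.zeros(n_states)
for _ in range(5000):
    q = r_hat + P_hat @ h          # Q(s,a) = r(s,a) + sum_j p(j|s,a) h(j)
    h_new = q.max(axis=1)
    h_new -= h_new[0]              # keep the bias vector bounded
    if np.max(np.abs(h_new - h)) < 1e-10:
        h = h_new
        break
    h = h_new

policy = q.argmax(axis=1)          # greedy policy w.r.t. the estimated model
print("estimated-model greedy policy:", policy)
```

With enough simulated samples per state-action pair, the greedy policy of the estimated model is close to optimal for the true model; the paper's contribution concerns how many simulation cycles such a procedure needs and the large deviations behavior of that number.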

Notes:

Cited by: 0
