Consider a finite-state irreducible Markov reward chain. It is shown that there exist simulation estimates and confidence intervals for the expected first-passage times and rewards, as well as for the expected average reward, with 100% coverage probability. The length of the confidence intervals converges to zero with probability one as the sample size increases; it also satisfies a large-deviations property.
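As a point of reference for the quantities being estimated, the sketch below shows a plain Monte Carlo estimate of the expected first-passage time and accumulated reward in a small finite-state Markov reward chain. It is an illustration only: the transition matrix `P`, reward vector `r`, and state labels are hypothetical, and the paper's 100%-coverage confidence-interval construction is not reproduced here.

```python
import random

def simulate_first_passage(P, r, start, target, n_runs=10_000, rng=None):
    """Plain Monte Carlo estimates of the expected first-passage time and
    accumulated reward from `start` to `target`.

    P[i][j] is the transition probability i -> j; r[i] is the one-step
    reward received in state i.  Illustrative only; the paper's
    100%-coverage interval construction is not shown here.
    """
    rng = rng or random.Random(0)
    states = list(range(len(P)))
    total_time, total_reward = 0, 0.0
    for _ in range(n_runs):
        s, steps, reward = start, 0, 0.0
        while s != target:
            reward += r[s]
            s = rng.choices(states, weights=P[s])[0]
            steps += 1
        total_time += steps
        total_reward += reward
    return total_time / n_runs, total_reward / n_runs

# Hypothetical 3-state chain.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.3, 0.4]]
r = [1.0, 2.0, 0.5]
print(simulate_first_passage(P, r, start=0, target=2))
```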
Various methods, such as address-calculation sorts, distribution counting sorts, radix sorts, and bucket sorts, use the values of the numbers being sorted to increase efficiency, but do so at the expense of requiring additional storage space. In this paper, a specific implementation of bucket sort is presented whose primary advantages are that (i) linear average-time performance is achieved with an additional amount of storage equal to any fraction of the number of elements being sorted and (ii) no linked-list data structures are used (all sorting is done with arrays). Analytical and empirical results show the trade-off between the additional storage space used and the improved computational efficiency obtained. Computer simulations show that for lists containing 1,000 to 30,000 uniformly distributed positive integers, the sort developed here is faster than both Quicksort and a standard implementation of bucket sort. Furthermore, its running time grows with list size at a slower rate.
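The following sketch illustrates the general idea of an array-only bucket sort whose auxiliary arrays are a fraction `alpha` of the input size: count keys per bucket, permute elements into their bucket regions in place, then finish each small bucket with insertion sort. The function name, the parameter `alpha`, and the in-place permutation step are assumptions made for this illustration; this is not necessarily the paper's exact algorithm.

```python
import random

def fractional_bucket_sort(a, alpha=0.1):
    """Array-based bucket sort sketch for integer keys.

    Auxiliary storage is a few integer arrays of size about alpha*len(a)
    (bucket counts and offsets); no linked lists are used.  A hypothetical
    illustration in the spirit of the abstract, not the paper's algorithm.
    """
    n = len(a)
    if n < 2:
        return a
    m = max(1, int(alpha * n))                 # number of buckets
    lo, hi = min(a), max(a)
    width = (hi - lo) // m + 1                 # key range covered per bucket

    def bucket(x):
        return (x - lo) // width

    # Count keys per bucket; prefix sums give each bucket's start position.
    count = [0] * m
    for x in a:
        count[bucket(x)] += 1
    begin = [0] * m
    for b in range(1, m):
        begin[b] = begin[b - 1] + count[b - 1]
    end = [begin[b] + count[b] for b in range(m)]

    # Permute in place so every element lands in its own bucket's region.
    next_free = begin[:]
    for b in range(m):
        i = next_free[b]
        while i < end[b]:
            bx = bucket(a[i])
            if bx == b:
                i += 1
                next_free[b] = i
            else:
                j = next_free[bx]              # move a[i] to its bucket
                a[i], a[j] = a[j], a[i]
                next_free[bx] = j + 1

    # Finish with insertion sort inside each (small) bucket.
    for b in range(m):
        for i in range(begin[b] + 1, end[b]):
            key, j = a[i], i - 1
            while j >= begin[b] and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key
    return a

data = [random.randint(0, 10_000) for _ in range(1_000)]
assert fractional_bucket_sort(data) == sorted(data)
```

Smaller values of `alpha` reduce the extra storage but leave more work to the per-bucket insertion sorts, which is the storage/efficiency trade-off the abstract describes.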
In this paper we consider the problem of adaptive control for Markov Decision Processes. We give the explicit form for a class of adaptive policies that possess optimal rate-of-increase properties for the total expected finite-horizon reward, under the assumptions of finite state-action spaces and irreducibility of the transition law. A main feature of the proposed policies is that the choice of actions, at each state and time period, is based on indices that are inflations of the right-hand side of the estimated average-reward optimality equations.
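To make the index idea concrete, the sketch below computes, for each state-action pair, the right-hand side of an estimated average-reward optimality equation (estimated one-step reward plus estimated transition probabilities weighted by a bias vector `h`) and adds an inflation term, then picks the action with the largest index. The specific bonus `sqrt(2 log(t+1) / n)`, the function names, and the data structures are assumptions made for this illustration; the paper's particular inflation, and the optimality guarantees attached to it, are not reproduced here.

```python
import math
from collections import defaultdict

def inflated_index(s, a, n_sa, r_sum, n_saj, h, t, n_states):
    """Hypothetical, simplified index for state s and action a: the
    right-hand side of the estimated average-reward optimality equation,
    inflated by a generic optimism bonus (not the paper's exact term)."""
    n = n_sa[(s, a)]
    if n == 0:
        return float("inf")                         # force initial exploration
    r_hat = r_sum[(s, a)] / n                       # estimated one-step reward
    p_hat = [n_saj[(s, a, j)] / n for j in range(n_states)]
    rhs = r_hat + sum(p_hat[j] * h[j] for j in range(n_states))
    return rhs + math.sqrt(2.0 * math.log(t + 1) / n)   # inflation term

def choose_action(s, actions, n_sa, r_sum, n_saj, h, t, n_states):
    """Pick the action with the largest inflated index at state s."""
    return max(actions, key=lambda a: inflated_index(s, a, n_sa, r_sum,
                                                     n_saj, h, t, n_states))

# Sufficient statistics kept by the controller (empirical counts and sums);
# h is a bias/relative-value vector obtained from the estimated model.
n_sa, r_sum, n_saj = defaultdict(int), defaultdict(float), defaultdict(int)
h = [0.0, 0.0]                                      # e.g. 2 states, initial bias
print(choose_action(0, actions=[0, 1], n_sa=n_sa, r_sum=r_sum,
                    n_saj=n_saj, h=h, t=1, n_states=2))
```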