We study a system in which a service provider offers priority options. We identify the optimal option pricing policy by deriving the optimal number of options a customer would buy and the customer's exercise policy as a function of system congestion, options remaining, time to expiration, and the possibility of balking.
We investigate how a supplier can use a quantity discount schedule to influence the stocking decisions of a downstream buyer that faces a single period of stochastic demand. In contrast to much of the work that has been done on single-period supply contracts, we assume that there are no interactions between the supplier and the buyer after demand information is revealed and that the buyer has better information about the distribution of demand than does the supplier. We characterize the structure of the optimal discount schedule for both all-unit and incremental discounts and show that the supplier can earn larger profits with an all-unit discount.
The multiarmed-bandit problem is often taken as a basic model for the trade-off between the exploration and exploitation required for efficient optimization under uncertainty. In this article, we study the situation in which the unknown performance of a new bandit is to be evaluated and compared with that of a known one over a finite horizon. We assume that the bandits represent random variables with distributions from the one-parameter exponential family. When the objective is to maximize the Bayes expected sum of outcomes over a finite horizon, it is shown that optimal policies tend to simple limits when the length of the horizon is large.
We consider the problem of dynamic admission control in a Markovian loss queueing system with two classes of jobs with different service rates and random revenues. We establish the existence of an optimal monotone policy. We also show that under certain conditions there exist preferred jobs from either class.
In this paper we develop group replacement policies for a service system that dynamically account for the number of customers in the queue. We consider a service system with multiple independent servers operating in parallel and a single queue. Customers arrive in accordance with a Poisson process, and the service time for each customer follows an exponential distribution. The servers are unreliable, with identically exponentially distributed failure times, and the repair time is assumed negligible. We formulate this model as a continuous-time Markov decision process and prove that the optimal group replacement policy has a threshold structure.
In this paper we study a situation in which a broker must manage the procurement of a short-life-cycle product. As the broker observes demand for the item, she learns about the demand process. However, as is often the case in practice, it becomes either more difficult or more expensive to procure the item as the selling season advances. Thus, the broker must trade off higher procurement costs against the benefit of making ordering decisions with better information about demand. Problems of this type arise, for example, in the travel industry, where a travel agent's cost of procuring airline and hotel reservations increases as the date of a vacation package approaches. We develop a newsvendor-like characterization of the optimal procurement policy. In a numerical analysis, we demonstrate how broker procurements tend to cluster just before price increases and how brokers can benefit from explicitly considering the effects of information about demand in their ordering policies.
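The newsvendor-like characterization mentioned in the abstract above can be illustrated with the classic single-period critical-fractile formula. This is a minimal sketch assuming normally distributed demand and a single procurement opportunity; the paper's policy generalizes this to rising procurement costs and learning about demand, and the function name and parameters here are illustrative.

```python
from statistics import NormalDist

def newsvendor_quantity(mu, sigma, unit_cost, price, salvage=0.0):
    """Classic newsvendor order quantity for demand ~ Normal(mu, sigma).

    The optimal quantity Q satisfies P(D <= Q) equal to the critical
    fractile (price - unit_cost) / (price - salvage): the marginal
    benefit of stocking one more unit balances its marginal cost.
    """
    fractile = (price - unit_cost) / (price - salvage)
    return NormalDist(mu, sigma).inv_cdf(fractile)

# Example: demand ~ N(100, 20), unit cost 6, selling price 10.
# Critical fractile = 0.4, so the order is below mean demand.
q = newsvendor_quantity(100, 20, 6, 10)
```

When procurement cost rises over the season, the fractile shrinks at each later opportunity, which is one way to see why orders cluster just before price increases.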
We consider the combined problem of pricing and ordering for a perishable product with unknown demand distribution and censored demand observations resulting from lost sales, faced by a monopolistic retailer. We develop an adaptive pricing and ordering policy with the asymptotic property that the average realized profit per period converges with probability one to the optimal value under complete information on the distribution. The pricing mechanism is modeled as a multiarmed bandit problem, while the order quantity decision, made after the price level is established, is based on a stochastic approximation procedure with multiplicative updates.
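A stochastic approximation scheme with multiplicative updates, as referenced in the abstract above, can be sketched as follows. This is an illustrative form under simplifying assumptions (the paper's exact update rule differs): with censored observations we only learn whether a stockout occurred, so the quantity is scaled up after a stockout and down otherwise, with a diminishing step size, driving the stockout frequency toward the target fractile.

```python
def multiplicative_sa(q0, fractile, demands, step0=0.5):
    """Multiplicative stochastic-approximation update for an order quantity.

    Sales are censored at q, so only the event {D >= q} (a stockout) is
    observed each period. The update scales q by a factor close to 1,
    nudging the long-run stockout frequency toward 1 - fractile.
    Illustrative sketch, not the paper's exact procedure.
    """
    q = q0
    for t, d in enumerate(demands, start=1):
        stockout = 1.0 if d >= q else 0.0
        step = step0 / t  # diminishing step sizes, as in Robbins-Monro
        q *= 1.0 + step * (stockout - (1.0 - fractile))
        q = max(q, 1e-9)  # keep the quantity strictly positive
    return q

# With constant demand 50 and target fractile 0.5, q converges toward 50.
q_final = multiplicative_sa(10.0, 0.5, [50.0] * 2000)
```

Multiplicative (rather than additive) updates keep the iterate positive automatically and make step sizes scale-free, which is convenient when the magnitude of demand is unknown a priori.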
Mathematical analysis and computer simulations are used to evaluate three modifications to Kauffman's NK model in an attempt to incorporate unexplored aspects of epistatic interaction between loci in genome evolution. Two modifications - one to the amount and the other to the distribution of epistatic interaction - further support Kauffman's conclusion that high levels of epistatic interaction lead to a decrease in overall fitness of the genome. The third model, however, provides a condition under which increased epistatic interaction at certain loci results in higher genome fitness.
The unknown performance of a new experiment is to be evaluated and compared with that of an existing one over a finite horizon. The explicit structure of an optimal sequential allocation policy is obtained under pertinent reward/loss functions, when the experiments are characterized by random variables with distributions from the one-parameter exponential family.
Various methods, such as address-calculation sorts, distribution counting sorts, radix sorts, and bucket sorts, use the values of the numbers being sorted to increase efficiency but do so at the expense of requiring additional storage space. In this paper, a specific implementation of bucket sort is presented whose primary advantages are that (i) linear average-time performance is achieved with an additional amount of storage equal to any fraction of the number of elements being sorted and (ii) no linked-list data structures are used (all sorting is done with arrays). Analytical and empirical results show the trade-off between the additional storage space used and the improved computational efficiency obtained. Computer simulations show that for lists containing 1,000 to 30,000 uniformly distributed positive integers, the sort developed here is faster than both Quicksort and a standard implementation of bucket sort. Furthermore, its running time increases with list size at a slower rate.
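The bucket sort idea described above can be sketched as follows. This is a simple illustrative version, not the paper's implementation: it uses per-bucket Python lists rather than the paper's flat-array, linked-list-free layout, but it shows the key trade-off, where the number of buckets (the extra storage) controls how small, and hence how cheap to sort, each bucket is on uniformly distributed input.

```python
def bucket_sort(a, num_buckets=None):
    """Bucket sort for numeric keys (illustrative sketch).

    Keys are distributed into equal-width buckets by value, each bucket
    is sorted, and the buckets are concatenated. With uniformly
    distributed input and a bucket count proportional to len(a), the
    expected total work is linear.
    """
    if not a:
        return []
    n = len(a)
    m = num_buckets or max(1, n // 4)  # extra storage = a fraction of n
    lo, hi = min(a), max(a)
    width = (hi - lo) / m or 1  # guard against all-equal input
    buckets = [[] for _ in range(m)]
    for x in a:
        i = min(int((x - lo) / width), m - 1)
        buckets[i].append(x)
    out = []
    for b in buckets:
        b.sort()  # buckets are small on average, so this is cheap
        out.extend(b)
    return out
```

Raising `num_buckets` buys speed with space; lowering it does the reverse, which is the storage/efficiency trade-off the paper analyzes.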
Consider a finite state irreducible Markov reward chain. It is shown that there exist simulation estimates and confidence intervals for the expected first passage times and rewards as well as the expected average reward, with 100% coverage probability. The length of the confidence intervals converges to zero with probability one as the sample size increases; it also satisfies a large deviations property.
In this paper we consider the problem of adaptive control for Markov decision processes. We give the explicit form of a class of adaptive policies that possess optimal rate-of-increase properties for the total expected finite-horizon reward, under the assumptions of finite state-action spaces and irreducibility of the transition law. A main feature of the proposed policies is that the choice of actions, at each state and time period, is based on indices that are inflations of the right-hand side of the estimated average-reward optimality equations.
We consider the problem of sequential control for a finite state and action Markovian decision process with incomplete information regarding the transition probabilities P ∈ Papprox. Under suitable irreducibility assumptions for Papprox, we construct adaptive policies that maximize the rate of convergence of realized rewards to that of the optimal (nonadaptive) policy under complete information. These adaptive policies are specified via an easily computable index function of states, controls, and statistics, so that in every period one takes a control with the largest index value in the current state.
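The index rule described in the abstract above resembles upper-confidence-bound action selection. The sketch below is hypothetical and simplified: it inflates a plain sample-mean estimate with a UCB-style bonus, whereas the papers' indices inflate the right-hand side of the estimated average-reward optimality equations; all names and the bonus form here are illustrative.

```python
import math

def index_value(avg_reward, visits, total_time):
    """UCB-style inflation of an estimated value (illustrative form).

    Unvisited controls get an infinite index so they are tried first;
    otherwise the sample mean is inflated by a bonus that shrinks as
    the control accumulates observations.
    """
    if visits == 0:
        return float("inf")
    return avg_reward + math.sqrt(2.0 * math.log(total_time) / visits)

def choose_action(estimates, counts, t):
    """Pick the control with the largest index in the current state."""
    scores = [index_value(q, n, t) for q, n in zip(estimates, counts)]
    return max(range(len(scores)), key=scores.__getitem__)
```

The inflation term is what drives exploration: a control with few observations keeps a large index even if its current estimate is mediocre, so it is eventually retried, which is the mechanism behind the convergence-rate guarantees.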