Markov Control with Rare State Observation: Average Optimality

S. Winkelmann

2017, v.23, Issue 1, 1-34

ABSTRACT

This paper investigates the long-term average cost criterion for a Markov decision process (MDP) that is not permanently observable. Each observation of the process incurs a fixed information cost, which enters the performance criterion and precludes arbitrarily frequent state testing. Choosing the rare observation times is part of the control procedure.
In contrast to the theory of partially observable Markov decision processes, we consider an arbitrary continuous-time Markov process on a finite state space without further restrictions on the dynamics or the type of interaction.
Based on the original Markov control theory, we redefine the control model and the average cost criterion for the setting of information costs. We analyze the constant of average costs for the case of ergodic dynamics and present an optimality equation which characterizes the optimal choice of control actions and observation times.
For this purpose, we construct an equivalent freely observable MDP and translate the well-known results from the original theory to the new setting.
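
As a rough illustration of how an information cost enters a long-run average, the following minimal sketch evaluates one fixed policy on a two-state chain. It is not taken from the paper: the two-state model, the assumption that the chosen action is held constant until the next observation, the renewal-reward formula used to compute the average, and all names (`occupation`, `average_cost`, the `policy` dictionary keys) are assumptions made for this example only.

```python
import math


def occupation(start, up, down, tau):
    """Two-state chain with rates `up` (0 -> 1) and `down` (1 -> 0).

    Started in `start`, returns the probability of being in state 1 at
    time tau and the expected time spent in state 1 during [0, tau].
    Requires up + down > 0.
    """
    total = up + down
    pi1 = up / total                      # stationary weight of state 1
    decay = math.exp(-total * tau)
    integ = (1.0 - decay) / total         # integral of exp(-total*s) over [0, tau]
    if start == 0:
        return pi1 * (1.0 - decay), pi1 * (tau - integ)
    return pi1 + (1.0 - pi1) * decay, pi1 * tau + (1.0 - pi1) * integ


def average_cost(policy, info_cost):
    """Long-run average cost of a fixed policy (hypothetical setup).

    policy[i] describes what happens after observing state i: the rates
    `up` and `down` under the chosen action, the running cost rates
    `cost0` and `cost1`, and the waiting time `tau` until the next
    observation. Each observation costs `info_cost`.
    """
    cycle_cost, cycle_len, prob1 = [], [], []
    for start in (0, 1):
        rule = policy[start]
        p1, time1 = occupation(start, rule["up"], rule["down"], rule["tau"])
        running = rule["cost0"] * (rule["tau"] - time1) + rule["cost1"] * time1
        cycle_cost.append(info_cost + running)
        cycle_len.append(rule["tau"])
        prob1.append(p1)

    # Stationary distribution of the chain seen at observation times.
    p01 = prob1[0]            # observed 0, next observation finds 1
    p10 = 1.0 - prob1[1]      # observed 1, next observation finds 0
    mu1 = p01 / (p01 + p10)
    mu0 = 1.0 - mu1

    # Renewal-reward ratio: expected cost per cycle / expected cycle length.
    cost = mu0 * cycle_cost[0] + mu1 * cycle_cost[1]
    length = mu0 * cycle_len[0] + mu1 * cycle_len[1]
    return cost / length


# Uncontrolled example: same dynamics and waiting time in both states.
rule = {"up": 1.0, "down": 3.0, "cost0": 0.0, "cost1": 1.0, "tau": 2.0}
uncontrolled = average_cost({0: rule, 1: rule}, info_cost=0.5)
```

In this uncontrolled case the average reduces to the stationary running cost plus `info_cost / tau`, so observing more often only adds cost. Observation becomes worthwhile only when the action chosen after an observation changes the dynamics, which is the trade-off the abstract describes.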

Keywords: Markov decision processes, partial observability, information costs, average optimality


Laboratory of large random systems, Dept. of Mechanics and Mathematics
Moscow State University, Vorobievy Gory, 119952 Moscow Russia
E-mail: editor@math-mprf.org

The journal was founded in 1995 by prof. Malyshev V. A.
Published by Polymat Publishing Company, Moscow, Russia
ISSN 1024-2953
The journal is published quarterly

Markov Processes And Related Fields

Scientific Journal
