Understanding A Markovian Decision Process

This paper by Richard Bellman discusses the asymptotic behavior of a sequence $\{f_N(i)\}$ generated by a non-linear recurrence relation. The sequence is defined as: \[ f_N(i) = \max_{\mathbf{q}} [ b_i(\mathbf{q}) + \sum_{j=1}^{M} a_{ij}(\mathbf{q}) f_{N-1}(j) ], \quad f_0(i) = c_i, \] where $i = 1, 2, \cdots, M$, $N = 1, 2, \cdots$. The functions $b_i(\mathbf{q})$ and $a_{ij}(\mathbf{q})$ satisfy certain conditions, ensuring that the maximum is taken within the recurrence relations. The main result is a theorem stating that under specific conditions on $b_i(\mathbf{q})$ and $a_{ij}(\mathbf{q})$, the sequence $f_N(i)$ asymptotically behaves as $f_N(i) \sim Nr$, where $r$ is a scalar quantity obtained from maximizing a certain expression involving $b(\mathbf{q})$ and $A(\mathbf{q})$. The paper also discusses the asymptotic behavior of related sequences defined by different recurrence relations and provides a detailed proof of the theorem. It highlights the importance of the condition $a_{ii}(\mathbf{q}) \geq d > 0$ to ensure the system is interlinked rather than separable. Finally, the paper briefly describes a dynamic programming process related to machine maintenance and inspection, where the goal is to minimize the expected cost of producing items by determining optimal inspection and replacement policies. The recurrence relation for this process is derived and connected to the main theorem.This paper by Richard Bellman discusses the asymptotic behavior of a sequence $\{f_N(i)\}$ generated by a non-linear recurrence relation. The sequence is defined as: \[ f_N(i) = \max_{\mathbf{q}} [ b_i(\mathbf{q}) + \sum_{j=1}^{M} a_{ij}(\mathbf{q}) f_{N-1}(j) ], \quad f_0(i) = c_i, \] where $i = 1, 2, \cdots, M$, $N = 1, 2, \cdots$. The functions $b_i(\mathbf{q})$ and $a_{ij}(\mathbf{q})$ satisfy certain conditions, ensuring that the maximum is taken within the recurrence relations. The main result is a theorem stating that under specific conditions on $b_i(\mathbf{q})$ and $a_{ij}(\mathbf{q})$, the sequence $f_N(i)$ asymptotically behaves as $f_N(i) \sim Nr$, where $r$ is a scalar quantity obtained from maximizing a certain expression involving $b(\mathbf{q})$ and $A(\mathbf{q})$. The paper also discusses the asymptotic behavior of related sequences defined by different recurrence relations and provides a detailed proof of the theorem. It highlights the importance of the condition $a_{ii}(\mathbf{q}) \geq d > 0$ to ensure the system is interlinked rather than separable. Finally, the paper briefly describes a dynamic programming process related to machine maintenance and inspection, where the goal is to minimize the expected cost of producing items by determining optimal inspection and replacement policies. The recurrence relation for this process is derived and connected to the main theorem.

A Markovian Decision Process

Vol. 6, No. 5 (1957) | RICHARD BELLMAN