Partially Observable Markov Decision Processes

Matthijs T. J. Spaan. Partially Observable Markov Decision Processes. In Marco Wiering and Martijn van Otterlo, editors, Reinforcement Learning: State of the Art, pp. 387–414, Springer Verlag, 2012.


PDF [206.7 kB]


For reinforcement learning in environments in which an agent has access to a reliable state signal, methods based on the Markov decision process (MDP) have had many successes. In many problem domains, however, an agent suffers from limited sensing capabilities that preclude it from recovering a Markovian state signal from its perceptions. Extending the MDP framework, partially observable Markov decision processes (POMDPs) allow for principled decision making under conditions of uncertain sensing. In this chapter we present the POMDP model by focusing on the differences with fully observable MDPs, and we show how optimal policies for POMDPs can be represented. Next, we give a review of model-based techniques for policy computation, followed by an overview of the available model-free methods for POMDPs. We conclude by highlighting recent trends in POMDP reinforcement learning.
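The "principled decision making under conditions of uncertain sensing" in the chapter rests on maintaining a belief state, i.e. a probability distribution over states that is updated after each action and observation. As a minimal sketch (not code from the chapter; the two-state "tiger"-style model, the state names, and all probabilities below are illustrative assumptions), the standard Bayesian belief update b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s) can be written as:

```python
def belief_update(b, a, o, T, O):
    """Bayesian POMDP belief update.

    b: dict state -> probability (current belief)
    T: T[a][s][s2] = P(s2 | s, a)   (transition model)
    O: O[a][s2][o] = P(o | s2, a)   (observation model)
    Returns the normalized posterior belief after doing a and observing o.
    """
    new_b = {}
    for s2 in b:
        # Predict: probability of landing in s2 under action a.
        pred = sum(T[a][s][s2] * b[s] for s in b)
        # Correct: weight the prediction by the observation likelihood.
        new_b[s2] = O[a][s2][o] * pred
    norm = sum(new_b.values())
    return {s: p / norm for s, p in new_b.items()}

# Illustrative two-state model: a noisy 'listen' action that does not
# change the state and hears the correct side with probability 0.85.
b = {'left': 0.5, 'right': 0.5}
T = {'listen': {'left':  {'left': 1.0, 'right': 0.0},
                'right': {'left': 0.0, 'right': 1.0}}}
O = {'listen': {'left':  {'hear-left': 0.85, 'hear-right': 0.15},
                'right': {'hear-left': 0.15, 'hear-right': 0.85}}}

b2 = belief_update(b, 'listen', 'hear-left', T, O)
# From a uniform prior, hearing 'hear-left' shifts the belief to
# {'left': 0.85, 'right': 0.15}.
```

Because the belief is a sufficient statistic for the observation history, a POMDP can be recast as a fully observable MDP over beliefs, which is the viewpoint underlying the policy representations and solution methods the chapter reviews.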

BibTeX Entry

  @incollection{Spaan12pomdp,
    author =       {Matthijs T. J. Spaan},
    title =        {Partially Observable {M}arkov Decision Processes},
    booktitle =    {Reinforcement Learning: State of the Art},
    publisher =    {Springer Verlag},
    year =         2012,
    editor =       {Marco Wiering and Martijn van Otterlo},
    pages =        {387--414}
  }

Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
