Projects (current)
Projects (past)

Approximate POMDP solving

Partially observable Markov decision processes (POMDPs) provide a rich mathematical framework for agent decision making under uncertainty, however, solving a POMDP exactly is an intractable problem. In the so-called point-based techniques, a set of belief points are first sampled from the belief simplex (by letting the agent interact with the environment) and then planning is performed on these points only. In the references below we developed Perseus, a randomized version of point-based value iteration that is very competitive to other methods in terms of computation time and solution quality. Source code for Perseus is available.

The key idea is that, in each value iteration step, we can improve the value of all points in the belief set by only updating the value and its gradient of a subset of the points, and we select these belief points in a randomized greedy manner. The figure below demonstrates Perseus on an example POMDP with two states by showing three value update stages. The POMDP's belief space is depicted on the x-axis and the y-axis represents the value of each belief. We can quickly improve the value of all belief points by sampling only a subset of the points. The series of figures show how we keep track of which points have not been improved yet, and sample only from these.