Publications

Scalable Safe Policy Improvement via Monte Carlo Tree Search

Alberto Castellini, Federico Bianchi, Edoardo Zorzi, Thiago D. Simão, Alessandro Farinelli, and Matthijs T. J. Spaan. Scalable Safe Policy Improvement via Monte Carlo Tree Search. In International Conference on Machine Learning, pp. 3732–3756, Proceedings of Machine Learning Research 202, 2023.

Abstract

Algorithms for safely improving policies are important to deploy reinforcement learning approaches in real-world scenarios. In this work, we propose an algorithm, called MCTS-SPIBB, that computes safe policy improvement online using a Monte Carlo Tree Search based strategy. We theoretically prove that the policy generated by MCTS-SPIBB converges, as the number of simulations grows, to the optimal safely improved policy generated by Safe Policy Improvement with Baseline Bootstrapping (SPIBB), a popular algorithm based on policy iteration. Moreover, our empirical analysis performed on three standard benchmark domains shows that MCTS-SPIBB scales to significantly larger problems than SPIBB because it computes the policy online and locally, i.e., only in the states actually visited by the agent.
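
As an illustration of the constraint that SPIBB enforces, the following is a minimal sketch of a greedy improvement step for a single state, written from the general description of baseline bootstrapping rather than from the authors' code. It assumes that state-action pairs observed fewer than a threshold number of times keep the baseline policy's probability, and that the remaining probability mass goes to the best-valued action among the sufficiently observed ones. The function name `spibb_greedy_step` and the parameters `q_values`, `baseline_probs`, `counts`, and `n_wedge` are hypothetical names chosen for this example; it does not show the tree search that MCTS-SPIBB adds on top.

```python
from typing import List, Sequence


def spibb_greedy_step(
    q_values: Sequence[float],
    baseline_probs: Sequence[float],
    counts: Sequence[int],
    n_wedge: int,
) -> List[float]:
    """Return an improved action distribution for one state (illustrative only).

    Assumption: an action whose count is below n_wedge is "uncertain" and
    keeps its baseline probability; the rest of the mass moves to the
    highest-valued action among the remaining ones.
    """
    n_actions = len(q_values)
    policy = [0.0] * n_actions
    trusted = []      # actions with enough data to be changed
    free_mass = 0.0   # baseline probability mass that may be reassigned

    for a in range(n_actions):
        if counts[a] < n_wedge:
            # Uncertain pair: copy the baseline probability unchanged.
            policy[a] = baseline_probs[a]
        else:
            trusted.append(a)
            free_mass += baseline_probs[a]

    if trusted:
        # Greedy choice restricted to the sufficiently observed actions.
        best = max(trusted, key=lambda a: q_values[a])
        policy[best] = free_mass

    return policy


if __name__ == "__main__":
    improved = spibb_greedy_step(
        q_values=[1.0, 5.0, 3.0],
        baseline_probs=[0.2, 0.3, 0.5],
        counts=[10, 2, 10],
        n_wedge=5,
    )
    print(improved)
```

In this example the second action has the highest estimated value but was observed only twice, so it keeps its baseline probability, and the mass of the other two actions is shifted to the better of them. When every action in a state is uncertain, the sketch returns the baseline distribution unchanged.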

BibTeX Entry

@InProceedings{Castellini23icml,
  author =       {Alberto Castellini and Federico Bianchi and Edoardo
                  Zorzi and Thiago D. Sim{\~a}o and Alessandro
                  Farinelli and Matthijs T. J. Spaan},
  title =        {Scalable Safe Policy Improvement via {M}onte {C}arlo
                  Tree Search},
  booktitle =    {International Conference on Machine Learning},
  pages =        {3732--3756},
  volume =       202,
  series =       {Proceedings of Machine Learning Research},
  year =         2023
}

Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
