Structure Learning for Safe Policy Improvement

Thiago D. Simão and Matthijs T. J. Spaan. Structure Learning for Safe Policy Improvement. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, pp. 3453–3459, 2019.

Abstract

We investigate how Safe Policy Improvement (SPI) algorithms can exploit the structure of factored Markov decision processes when such structure is unknown a priori. To facilitate the application of reinforcement learning in the real world, SPI provides probabilistic guarantees that policy changes in a running process will improve the performance of this process. However, current SPI algorithms have requirements that might be impractical, such as: (i) availability of a large amount of historical data, or (ii) prior knowledge of the underlying structure. To overcome these limitations we enhance a Factored SPI (FSPI) algorithm with different structure learning methods. The resulting algorithms need fewer samples to improve the policy and require weaker prior knowledge assumptions. In well-factorized domains, the proposed algorithms improve performance significantly compared to a flat SPI algorithm, demonstrating a sample complexity closer to an FSPI algorithm that knows the structure. This indicates that the combination of FSPI and structure learning algorithms is a promising solution to real-world problems involving many variables.

BibTeX Entry

@InProceedings{Simao19ijcai,
  author =       {Thiago D. Sim{\~a}o and Matthijs T. J. Spaan},
  title =        {Structure Learning for Safe Policy Improvement},
  booktitle =    {Proceedings of the Twenty-Eighth International Joint
                  Conference on Artificial Intelligence},
  pages =        {3453--3459},
  year =         2019
}

Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
