Publications

The Role of Diverse Replay for Generalisation in Reinforcement Learning

Max Weltevrede, Matthijs T. J. Spaan, and Wendelin Böhmer. The Role of Diverse Replay for Generalisation in Reinforcement Learning. arXiv:2306.05727, 2023.

Download

pdf 

Abstract

In reinforcement learning (RL), the exploration strategy and the replay buffer are key components of many algorithms. Together they determine which environment data is collected and trained on, and both have been studied extensively in the RL literature. In this paper, we investigate their impact on generalisation in multi-task RL. We test the hypothesis that collecting and training on more diverse data from the training environments improves zero-shot generalisation to new environments and tasks. We motivate mathematically, and show empirically, that generalisation to states that are "reachable" during training improves as the diversity of transitions in the replay buffer increases. Furthermore, we show empirically that the same strategy also improves generalisation to similar but "unreachable" states, which may be due to improved generalisation of the latent representations.
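
To make the idea of a diversity-biased replay buffer concrete, below is a minimal Python sketch of one generic way such a buffer could work. This is an illustrative assumption, not the mechanism studied in the paper: the class name DiverseReplayBuffer, the nearest-neighbour eviction rule, and all shapes are hypothetical.

    import numpy as np

    class DiverseReplayBuffer:
        """Toy replay buffer biased towards diverse transitions.

        Illustrative sketch only (not the paper's method): when full, an
        incoming transition replaces the stored transition whose state is
        closest to it, so the buffer retains a spread-out sample of visited
        states rather than only the most recent ones.
        """

        def __init__(self, capacity, state_dim, rng=None):
            self.capacity = capacity
            self.states = np.zeros((capacity, state_dim))
            self.actions = np.zeros(capacity, dtype=np.int64)
            self.rewards = np.zeros(capacity)
            self.next_states = np.zeros((capacity, state_dim))
            self.size = 0
            self.rng = rng or np.random.default_rng()

        def add(self, s, a, r, s_next):
            if self.size < self.capacity:
                i = self.size
                self.size += 1
            else:
                # Evict the stored transition most similar to the new one,
                # nudging the buffer contents towards broad state coverage.
                dists = np.linalg.norm(self.states[: self.size] - s, axis=1)
                i = int(np.argmin(dists))
            self.states[i], self.actions[i] = s, a
            self.rewards[i], self.next_states[i] = r, s_next

        def sample(self, batch_size):
            # Uniform sampling; the diversity pressure comes from eviction.
            idx = self.rng.integers(0, self.size, size=batch_size)
            return (self.states[idx], self.actions[idx],
                    self.rewards[idx], self.next_states[idx])

Under this eviction rule, a long run of near-duplicate transitions overwrites itself rather than flushing out rarer states, which is one simple way to raise the diversity of the transitions a learner trains on.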

BibTeX Entry

@Misc{Weltevrede23arxiv,
  author =       {Max Weltevrede and Matthijs T. J. Spaan and Wendelin
                  B{\"o}hmer},
  title =        {The Role of Diverse Replay for Generalisation in
                  Reinforcement Learning},
  howpublished = {arXiv:2306.05727},
  year =         2023
}

Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Generated by bib2html.pl (written by Patrick Riley) on Thu Feb 29, 2024 16:15:45 UTC