Publications

Back to the Future: Solving Hidden Parameter MDPs with Hindsight

Canmanie Ponnambalam, Danial Kamran, Thiago D. Simão, Frans A. Oliehoek, and Matthijs T. J. Spaan. Back to the Future: Solving Hidden Parameter MDPs with Hindsight. In Adaptive and Learning Agents, 2022. Workshop at AAMAS22

Abstract

Reinforcement learning is limited by how the task is defined at the start of learning and cannot easily accommodate new information during training. In contrast, humans are capable of learning from hindsight and can easily incorporate new information to gain insight into past experience. Humans also learn in a more modular fashion that facilitates transfer of knowledge across many different types of problems, resulting in flexible and sample-efficient learning. This ability is often missing in reinforcement learning, as agents generally must be retrained from scratch even after minor disruptions or changes in the environment. We aim to empower reinforcement learning agents with a modular approach that allows learning from hindsight, giving them the ability to learn from their past experience after new information is revealed. We address partially observable problems that can be modeled as hidden parameter MDPs (HiP-MDPs), where crucial state information is not observable during action selection but is later revealed. Our work focuses on the benefits of separating the tasks of policy optimization and hidden parameter estimation. By decoupling the two, we enable more data-efficient learning that is flexible to changes in the environment and can readily make use of existing predictors or offline datasets. We demonstrate in discrete and continuous experiments that learning from hindsight offers scalable and sample-efficient performance in HiP-MDPs and enables transfer of knowledge between tasks.
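The decoupling described in the abstract can be illustrated with a small toy sketch. This is not the authors' algorithm or code; it is a minimal illustration under stated assumptions: a one-step task with a binary hidden parameter theta, a noisy cue observed before acting, and theta revealed after each episode. The function name run_hindsight_demo and all parameters (cue_accuracy, lr, epsilon) are hypothetical. The value table is trained in hindsight on the revealed theta, a separate count-based predictor estimates P(theta | cue), and actions are chosen by weighting the values with that prediction.

```python
import random


def run_hindsight_demo(episodes=2000, cue_accuracy=0.8, lr=0.1, seed=0):
    """Toy illustration (assumed setup, not the paper's method)."""
    rng = random.Random(seed)
    # q[theta][action]: values conditioned on the hidden parameter,
    # learned only in hindsight, once theta has been revealed.
    q = [[0.0, 0.0], [0.0, 0.0]]
    # counts[cue][theta]: Laplace-smoothed counts for a separate
    # predictor of the hidden parameter given the observed cue.
    counts = [[1.0, 1.0], [1.0, 1.0]]

    def belief(cue):
        """Return the estimated distribution [P(theta=0), P(theta=1)] for a cue."""
        total = counts[cue][0] + counts[cue][1]
        return [counts[cue][0] / total, counts[cue][1] / total]

    def act(cue, epsilon):
        """Epsilon-greedy choice over belief-weighted values."""
        if rng.random() < epsilon:
            return rng.randrange(2)
        b = belief(cue)
        expected = [b[0] * q[0][a] + b[1] * q[1][a] for a in range(2)]
        return 0 if expected[0] >= expected[1] else 1

    for _ in range(episodes):
        theta = rng.randrange(2)  # hidden while the agent acts
        cue = theta if rng.random() < cue_accuracy else 1 - theta
        action = act(cue, epsilon=0.1)
        reward = 1.0 if action == theta else 0.0
        # Hindsight step: theta is revealed after the episode, so both
        # components are updated with the true value.
        q[theta][action] += lr * (reward - q[theta][action])
        counts[cue][theta] += 1.0

    return q, belief, act


if __name__ == "__main__":
    q, belief, act = run_hindsight_demo()
    print("Q-values per hidden parameter:", q)
    print("Belief given cue 0:", belief(0))
    print("Greedy action for cue 1:", act(1, 0.0))
```

Because the value table and the predictor are updated independently, either one could in principle be swapped out, for example replacing the count-based predictor with an existing estimator, without relearning the other. That is the kind of modularity the abstract argues for, shown here only in a deliberately simplified setting.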

BibTeX Entry

@InProceedings{Ponnambalam22ala,
  author =       {Canmanie Ponnambalam and Danial Kamran and Thiago
                  D. Sim{\~a}o and Frans A. Oliehoek and Matthijs
                  T. J. Spaan},
  title =        {Back to the Future: Solving Hidden Parameter {MDPs}
                  with Hindsight},
  booktitle =    {Adaptive and Learning Agents},
  year =         2022,
  note =         {Workshop at AAMAS22}
}

Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
