Publications

Abstraction-Guided Policy Recovery from Expert Demonstrations

Canmanie T. Ponnambalam, Frans A. Oliehoek, and Matthijs T. J. Spaan. Abstraction-Guided Policy Recovery from Expert Demonstrations. In Proc. of Int. Conf. on Automated Planning and Scheduling, pp. 560–568, 2021.

Abstract

Behavior cloning is a method of automated decision-making that aims to extract meaningful information from expert demonstrations and reproduce the same behavior autonomously. Demonstrations are unlikely to cover the potential problem space exhaustively, which compromises the quality of automation when out-of-distribution states are encountered. Our approach, RECO, jointly learns both an imitation policy and a recovery policy from expert data. The recovery policy steers the agent from unknown states back to the demonstrated states in the data set. While there is, by definition, no data available to learn the recovery policy, we exploit abstractions to generalize beyond the available data and simulate the recovery problem. When the most appropriate abstraction for the given data is unknown, our method selects the best recovery policy from a set generated by several candidate abstractions. In tabular domains, where we assume an agent must call on a human supervisor for help if it is in an unknown state, we show that RECO results in drastically fewer calls without compromising solution quality, even with relatively few trajectories provided by an expert. We also introduce a continuous adaptation of our method and demonstrate the ability of RECO to recover an agent from states where its supervised learning-based imitation policy would otherwise fail.
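
The tabular setting described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation of RECO: the corridor domain, the `step` transition model, and every function name below are assumptions made for illustration. In particular, the recovery policy here is computed by breadth-first search over an assumed known model, which stands in for the abstraction-based simulation of the recovery problem that the paper describes. The sketch only shows the control flow: imitate in demonstrated states, recover in undemonstrated ones, and call the supervisor when neither policy applies.

```python
from collections import Counter, defaultdict, deque

# Hypothetical toy domain: a corridor of states 0..9 with left/right moves.
N_STATES = 10
ACTIONS = (-1, +1)


def step(state, action):
    """Assumed deterministic transition model, clipped to the corridor."""
    return min(max(state + action, 0), N_STATES - 1)


def fit_imitation_policy(demos):
    """Tabular behavior cloning: the majority action per demonstrated state."""
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def fit_recovery_policy(known_states):
    """Breadth-first search outward from the demonstrated states.

    Each undemonstrated state is assigned an action that moves it one step
    closer to the demonstrated set. This relies on the assumed model above
    and is a stand-in for the abstraction-based procedure in the paper.
    """
    recovery = {}
    visited = set(known_states)
    frontier = deque(sorted(known_states))
    while frontier:
        target = frontier.popleft()
        for s in range(N_STATES):
            if s in visited:
                continue
            for a in ACTIONS:
                if step(s, a) == target:
                    recovery[s] = a
                    visited.add(s)
                    frontier.append(s)
                    break
    return recovery


def act(state, imitation, recovery):
    """Imitate if the state was demonstrated, otherwise try to recover."""
    if state in imitation:
        return imitation[state], "imitate"
    if state in recovery:
        return recovery[state], "recover"
    return None, "call_supervisor"


# Hypothetical expert data: the expert walks right through states 4..7.
demos = [(4, +1), (5, +1), (6, +1), (7, +1)]
imitation = fit_imitation_policy(demos)
recovery = fit_recovery_policy(set(imitation))

for s in (5, 2, 9):
    print(s, act(s, imitation, recovery))
print(2, act(2, imitation, {}))  # without a recovery policy
```

With an empty recovery policy, the agent in this sketch has to call the supervisor in every undemonstrated state; with the recovery policy, it is instead steered back toward states where the imitation policy is defined.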

BibTeX Entry

@InProceedings{Ponnambalam21icaps,
  author =       {Canmanie T. Ponnambalam and Frans A. Oliehoek and
                  Matthijs T. J. Spaan},
  title =        {Abstraction-Guided Policy Recovery from Expert
                  Demonstrations},
  booktitle =    {Proc. of Int. Conf. on Automated Planning and
                  Scheduling},
  pages =        {560--568},
  year =         2021
}

Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
