Publications

AlwaysSafe: Reinforcement Learning without Safety Constraint Violations during Training

Thiago D. Simão, Nils Jansen, and Matthijs T. J. Spaan. AlwaysSafe: Reinforcement Learning without Safety Constraint Violations during Training. In Proc. of Int. Conference on Autonomous Agents and Multi Agent Systems, pp. 1226–1235, 2021.

Abstract

Deploying reinforcement learning (RL) involves major concerns around safety. Engineering a reward signal that allows the agent to maximize its performance while remaining safe is not trivial. Safe RL studies how to mitigate such problems. For instance, we can decouple safety from reward using constrained Markov decision processes (CMDPs), where an independent signal models the safety aspects. In this setting, an RL agent can autonomously find tradeoffs between performance and safety. Unfortunately, most RL agents designed for CMDPs only guarantee safety after the learning phase, which might prevent their direct deployment. In this work, we investigate settings where a concise abstract model of the safety aspects is given, a reasonable assumption since a thorough understanding of safety-related matters is a prerequisite for deploying RL in typical applications. Factored CMDPs provide such compact models when a small subset of features describes the dynamics relevant for the safety constraints. We propose an RL algorithm that uses this abstract model to learn policies for CMDPs safely, that is, without violating the constraints. During the training process, this algorithm can seamlessly switch from a conservative policy to a greedy policy without violating the safety constraints. We prove that this algorithm is safe under the given assumptions. Empirically, we show that even if safety and reward signals are contradictory, this algorithm always operates safely and, when they are aligned, this approach also improves the agent's performance.
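
For readers unfamiliar with CMDPs, the decoupling of reward and safety mentioned in the abstract is commonly written as a constrained optimization problem. The following is a sketch of one standard finite-horizon form, not necessarily the formulation used in the paper; the symbols (policy \pi, reward r, cost c, horizon H, and cost bound \hat{c}) are assumptions introduced here for illustration.

```latex
% Assumed notation (not taken from the paper):
%   \pi      policy
%   r, c     reward signal and independent safety-cost signal
%   H        horizon
%   \hat{c}  upper bound on the expected cumulative cost
\max_{\pi} \;
  \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{H-1} r(s_t, a_t) \right]
\quad \text{subject to} \quad
  \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{H-1} c(s_t, a_t) \right] \le \hat{c}
```

Under this reading, a policy is safe when it satisfies the cost constraint, and the abstract's claim is that the constraint holds for the policies executed during training, not only for the final learned policy.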

BibTeX Entry

@InProceedings{Simao21aamas,
  author =       {Thiago D. Sim{\~a}o and Nils Jansen and Matthijs
                  T. J. Spaan},
  title =        {{AlwaysSafe}: Reinforcement Learning without Safety
                  Constraint Violations during Training},
  booktitle =    {Proc. of Int. Conference on Autonomous Agents and
                  Multi Agent Systems},
  pages =        {1226--1235},
  year =         2021
}

Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
