Publications

WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning

Qisong Yang, Thiago D. Simão, Simon H. Tindemans, and Matthijs T. J. Spaan. WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, pp. 10639–10646, 2021.

Download

pdf [1008.5kB]  HTML 

Abstract

Safe exploration is regarded as a key priority area for reinforcement learning research. With separate reward and safety signals, it is natural to cast it as constrained reinforcement learning, where expected long-term costs of policies are constrained. However, it can be hazardous to set constraints on the expected safety signal without considering the tail of the distribution. For instance, in safety-critical domains, worst-case analysis is required to avoid disastrous results. We present a novel reinforcement learning algorithm called Worst-Case Soft Actor Critic, which extends the Soft Actor Critic algorithm with a safety critic to achieve risk control. More specifically, a certain level of conditional Value-at-Risk from the distribution is regarded as a safety measure to judge the constraint satisfaction, which guides the change of adaptive safety weights to achieve a trade-off between reward and safety. As a result, we can optimize policies under the premise that their worst-case performance satisfies the constraints. The empirical analysis shows that our algorithm attains better risk control compared to expectation-based methods.

BibTeX Entry

@InProceedings{Yang21aaai,
  author =       {Qisong Yang and Thiago D. Sim{\~a}o and Simon
                  H. Tindemans and Matthijs T. J. Spaan},
  title =        {{WCSAC}: Worst-Case Soft Actor Critic for
                  Safety-Constrained Reinforcement Learning},
  booktitle =    {Proceedings of the Thirty-Fifth AAAI Conference on
                  Artificial Intelligence},
  pages =        {10639--10646},
  year =         2021
}

Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Generated by bib2html.pl (written by Patrick Riley) on Thu Feb 29, 2024 16:15:45 UTC