TY - JOUR
PY - 2023//
TI - Prescribed safety performance imitation learning from a single expert dataset
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
A1 - Cheng, Zhihao
A1 - Shen, Li
A1 - Zhu, Miaoxi
A1 - Guo, Jiaxian
A1 - Fang, Meng
A1 - Liu, Liu
A1 - Du, Bo
A1 - Tao, Dacheng
SP - ePub
EP - ePub
VL - ePub
IS - ePub
N2 - Existing safe imitation learning (safe IL) methods mainly focus on learning safe policies that are similar to expert ones, but may fail in applications requiring different safety constraints. In this paper, we propose the Lagrangian Generative Adversarial Imitation Learning (LGAIL) algorithm, which can adaptively learn safe policies from a single expert dataset under diverse prescribed safety constraints. To achieve this, we augment GAIL with safety constraints and then relax it into an unconstrained optimization problem by utilizing a Lagrange multiplier. The Lagrange multiplier enables explicit consideration of safety and is dynamically adjusted to balance imitation and safety performance during training. We then apply a two-stage optimization framework to solve LGAIL: (1) a discriminator is optimized to measure the similarity between the agent-generated data and the expert data; (2) forward reinforcement learning is employed to improve the similarity while accounting for safety concerns via the Lagrange multiplier. Furthermore, theoretical analyses of the convergence and safety of LGAIL demonstrate its capability to adaptively learn a safe policy under prescribed safety constraints. Finally, extensive experiments in OpenAI Safety Gym demonstrate the effectiveness of our approach.
LA - en
SN - 0162-8828
UR - http://dx.doi.org/10.1109/TPAMI.2023.3287908
ID - ref1
ER -