TY - JOUR
PY - 2023//
TI - Prescribed safety performance imitation learning from a single expert dataset
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
A1 - Cheng, Zhihao
A1 - Shen, Li
A1 - Zhu, Miaoxi
A1 - Guo, Jiaxian
A1 - Fang, Meng
A1 - Liu, Liu
A1 - Du, Bo
A1 - Tao, Dacheng
SP - ePub
EP - ePub
VL - ePub
IS - ePub
N2 - Existing safe imitation learning (safe IL) methods mainly focus on learning safe policies that are similar to expert ones, but may fail in applications requiring different safety constraints. In this paper, we propose the Lagrangian Generative Adversarial Imitation Learning (LGAIL) algorithm, which can adaptively learn safe policies from a single expert dataset under diverse prescribed safety constraints. To achieve this, we augment GAIL with safety constraints and then relax it into an unconstrained optimization problem by utilizing a Lagrange multiplier. The Lagrange multiplier enables explicit consideration of safety and is dynamically adjusted to balance imitation and safety performance during training. We then apply a two-stage optimization framework to solve LGAIL: (1) a discriminator is optimized to measure the similarity between the agent-generated data and the expert data; (2) forward reinforcement learning is employed to improve the similarity while accounting for safety concerns via the Lagrange multiplier. Furthermore, theoretical analyses of the convergence and safety of LGAIL demonstrate its capability to adaptively learn a safe policy under prescribed safety constraints. Finally, extensive experiments in OpenAI Safety Gym demonstrate the effectiveness of our approach.
LA - en
SN - 0162-8828
UR - http://dx.doi.org/10.1109/TPAMI.2023.3287908
ID - ref1
ER -