Robust Reinforcement Learning for Linear Temporal Logic Specifications with Finite Trajectory Duration
(Best Paper Award Winner!)
by Soroush Mortazavi Moghaddam, Yash Vardhan Pant, and Sebastian Fischmeister
Published on May 27, 2024
Abstract
Linear Temporal Logic (LTL), a formal behavioral specification language, offers a mathematically unambiguous and succinct way to represent operating requirements for a wide variety of Artificial Intelligence (AI) systems, including autonomous and robotic systems. Despite progress, learning policies that reliably satisfy complex LTL specifications in challenging environments remains an open problem. While LTL specifications are evaluated over infinite sequences, this work focuses on solving objectives within a given finite number of steps, as is expected in most real-world applications involving robotic or autonomous systems. We study the problem of generating trajectories of a system that satisfy a given LTLf (LTL over finite traces) specification in an environment with a priori unknown transition probabilities. Our proposed approach builds upon the popular AlphaGo Zero Reinforcement Learning (RL) framework, which has found great success in the two-player game of Go, to learn policies that can satisfy an LTLf specification given a limit on the trajectory duration. Extensive simulations on complex robot motion planning problems demonstrate that our approach achieves higher success rates in satisfying the studied time-constrained specifications than state-of-the-art methods. Importantly, our approach succeeds in cases where the baseline method fails to find any satisfying policies.
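To make the finite-duration setting concrete, here is a minimal sketch (not the authors' implementation) of checking whether a finite trajectory satisfies a simple LTLf-style reach-avoid objective, i.e. "eventually reach the goal within a time bound while always avoiding obstacles". The state representation and the names `goal` and `obstacles` are illustrative assumptions.

```python
def satisfies_reach_avoid(trajectory, goal, obstacles, horizon):
    """Return True iff the trajectory reaches `goal` within `horizon` steps
    without ever entering an obstacle up to that point (a simple LTLf-style
    reach-avoid check over a finite trace)."""
    for state in trajectory[: horizon + 1]:
        if state in obstacles:   # "always not obstacle" is violated
            return False
        if state == goal:        # "eventually goal" satisfied within the bound
            return True
    return False                 # horizon elapsed without reaching the goal

# Illustrative example: a 4-step trajectory on a grid, goal at (2, 2)
traj = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)]
print(satisfies_reach_avoid(traj, goal=(2, 2), obstacles={(1, 2)}, horizon=4))  # True
```

In the paper's setting, a check of this kind would serve as the terminal reward signal for the RL agent: a trajectory earns a positive outcome only if the specification is satisfied before the step limit expires.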