Skip to main content
SearchLoginLogin or Signup

Active Measure Reinforcement Learning for Observation Cost Minimization

Published onJun 08, 2021
Active Measure Reinforcement Learning for Observation Cost Minimization


Markov Decision Processes (MDP) with explicit measurement cost are a class of environments in which the agent learns to maximize the costed return. Here, we define the costed return as the discounted sum of rewards minus the sum of the explicit cost of measuring the next state. The RL agent can freely explore the relationship between actions and rewards but is charged each time it measures the next state. Thus, an optimal agent must learn a policy without making a large number of measurements. We propose the active measure RL framework (Amrl) as a solution to this novel class of problem, and contrast it with standard reinforcement learning under full observability and planning under partially observability. We demonstrate that Amrl-Q agents learn to shift from a reliance on costly measurements to exploiting a learned transition model in order to reduce the number of real-world measurements and achieve a higher costed return. Our results demonstrate the superiority of Amrl-Q over standard RL methods, Q-learning and Dyna-Q, and POMCP for planning under a POMDP in environments with explicit measurement costs.

Article ID: 2021L10

Month: May

Year: 2021

Address: Online

Venue: Canadian Conference on Artificial Intelligence

Publisher: Canadian Artificial Intelligence Association


No comments here
Why not start the discussion?