Optimal decision-making is critical in many organizations. Many of these organizations, such as health-care systems, are structured hierarchically and make sequential decisions. Using data collected over time by such organizations, reinforcement learning (RL) has been applied successfully to many sequential decision-making problems. These applications, however, do not exploit the organizations' hierarchical structures or the ways their layers affect one another, and therefore fall short of learning optimal decision-making policies.
Hierarchical reinforcement learning (HRL) is a powerful tool for solving long-horizon problems with sparse rewards. HRL decomposes an RL problem into a hierarchy of subtasks, each solved individually with RL, so its fundamental concept fits our problem well. Unfortunately, classic HRL frameworks cannot handle multiple agents, perform batch learning, or model concurrent activities, and are therefore not well suited to our setting. In our work, we plan to formalize a new HRL framework capable of building sequential decision-making support models from datasets collected under stochastic behavioral policies. In this paper, we highlight likely obstacles as well as potential solutions to existing problems.
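To make the decomposition idea concrete, the following toy sketch (a hypothetical illustration, not the framework proposed here) shows the basic HRL control pattern: a high-level controller executes a sequence of subtasks, and each subtask runs its own low-level policy until its termination condition holds. The `Subtask` class and the 1-D line environment are invented for illustration only.

```python
class Subtask:
    """A subtask with its own low-level policy and termination condition.

    Toy example: the agent moves on a 1-D integer line and each subtask's
    goal is simply a target position.
    """

    def __init__(self, name, goal):
        self.name = name
        self.goal = goal  # target state for this subtask

    def policy(self, state):
        # Low-level policy: step one unit toward the subtask goal.
        return 1 if state < self.goal else -1

    def terminated(self, state):
        return state == self.goal


def run_hierarchy(state, subtasks, max_steps=100):
    """High-level control: execute each subtask to termination in turn."""
    trajectory = [state]
    for task in subtasks:
        steps = 0
        while not task.terminated(state) and steps < max_steps:
            state += task.policy(state)
            trajectory.append(state)
            steps += 1
    return state, trajectory


# Two subtasks: reach position 3, then return to position 0.
final_state, traj = run_hierarchy(0, [Subtask("go", 3), Subtask("return", 0)])
```

In a full HRL setting each subtask's policy would itself be learned with RL rather than hard-coded, but the control flow (a high-level choice over temporally extended subtasks, each run to termination) is the same.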
Article ID: 2021G08
Publisher: Canadian Artificial Intelligence Association