Recently, there has been unprecedented interest in using reinforcement learning (RL) for recommender systems (RSs), owing to its ability to account for dynamic, long-term user engagement. However, sample inefficiency is a major challenge when applying RL to problems with highly dynamic environments and very large action spaces. In this paper, we present the Imitation, Reinforcement learning based Recommender System (IR2S), which combines RL with imitation learning to alleviate this problem. More specifically, we show that by utilizing demonstrations (available user ratings), IR2S can optimize its behavior faster and more efficiently. The proposed IR2S, built on top of a Deep Q-Network (DQN), outperforms the baselines in our experiments.
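As an illustration of how demonstrations might be combined with Q-learning, the following is a minimal sketch of one common formulation: a one-step temporal-difference loss plus a large-margin imitation term applied only to demonstrated transitions. This is an assumption made for illustration, not the loss used by IR2S, which the abstract does not specify. All function names, the margin of 0.8, the weight `lam`, and the discount `gamma` are hypothetical.

```python
# Illustrative sketch only: a generic "TD loss + imitation loss" objective.
# Names and hyperparameters are hypothetical, not taken from the paper.

def td_loss(q_sa, reward, next_q_values, gamma=0.99, done=False):
    """Squared one-step TD error for a single transition."""
    target = reward if done else reward + gamma * max(next_q_values)
    return (q_sa - target) ** 2


def margin_loss(q_values, demo_action, margin=0.8):
    """Large-margin imitation term.

    It is zero only when the demonstrated action's Q-value exceeds every
    other action's Q-value by at least `margin`.
    """
    augmented = [
        q + (0.0 if a == demo_action else margin)
        for a, q in enumerate(q_values)
    ]
    return max(augmented) - q_values[demo_action]


def combined_loss(q_values, action, reward, next_q_values,
                  is_demo, lam=1.0, gamma=0.99, done=False):
    """TD loss, plus the weighted imitation term for demonstration data."""
    loss = td_loss(q_values[action], reward, next_q_values, gamma, done)
    if is_demo:
        loss += lam * margin_loss(q_values, action)
    return loss
```

In this sketch, a transition drawn from the demonstrations (for example, an item the user actually rated) contributes both terms, while a transition generated by the agent contributes only the TD term. The imitation term pushes the Q-value of the demonstrated action above the others early in training, which is one way demonstrations can reduce the number of interactions needed.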
Article ID: 2022L8
Publisher: Canadian Artificial Intelligence Association