RL 4 Real World: Recommender Systems

Anjaly Parayil
Sep 14, 2021

I read a ton of papers on ML and RL, and I always wonder about the invisible gap between academia and industry. As a researcher interested in AI and coming from an academic background, I often feel the need to catch up with new developments in industry and with how reinforcement learning and machine learning are applied to the latest technologies. I am starting with an important application of reinforcement learning: recommender systems.

Here I will start with a specific example of a recommender system and exactly how it is modeled. With huge corpora of content available, recommender systems are the need of the day; YouTube, the Google Play Store, and Google News are some of the platforms that rely on them. Reinforcement learning-based recommender systems model the problem as a sequential decision process. In other words, the user's choices and actions over time can be taken into account to produce highly effective recommendations.

Image Credits: Unsplash.com

Now let us consider a music recommender system. How can we model it, and which states/features should we take into account? We need states, actions, and rewards. The action is the song selected from the huge corpus of content. Going on to the states of the RL agent, we need to account for the past behavior of the user between and within each track (song); let's say we are interested in the user's reactions over the past 5 steps. The possible user reactions can be fast forward, move backward, no move, and a short, long, or no pause before play. In addition, we also augment the states with features of the song, for instance, beat, acoustics, etc. The reward can be a positive value if the user did not skip the recommendation and zero otherwise. Our aim is to maximize the cumulative reward over N steps. Other important aspects of the problem include the growth of the state space dimension and the choice of a suitable policy for the RL agent. In the next discussion, I will talk more about these and their implementation aspects.
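To make the modeling above a bit more concrete, here is a minimal Python sketch of how the state and reward could be encoded. Everything in it is an assumption for illustration, not a reference implementation: the function names, the one-hot encoding, the split of reactions into "within track" and "between tracks", the two song features (beat, acoustics), the reward value of 1, and the zero-padding for users with fewer than 5 past steps.

```python
from collections import deque

HISTORY_LEN = 5  # past 5 steps of user behavior, as in the text

# Assumed grouping of the user reactions listed above.
WITHIN_TRACK = ("fast_forward", "move_backward", "no_move")
BETWEEN_TRACK = ("short_pause", "long_pause", "no_pause")
NUM_SONG_FEATURES = 2  # e.g. beat and acoustics (illustrative values only)
STEP_DIM = len(WITHIN_TRACK) + len(BETWEEN_TRACK) + NUM_SONG_FEATURES


def one_hot(value, choices):
    """Encode a categorical value as a one-hot list of floats."""
    return [1.0 if value == choice else 0.0 for choice in choices]


def encode_step(within, between, song_features):
    """Encode one past interaction: user reactions plus the song's features."""
    return (one_hot(within, WITHIN_TRACK)
            + one_hot(between, BETWEEN_TRACK)
            + list(song_features))


def build_state(history):
    """Flatten the last HISTORY_LEN encoded steps, zero-padded at the front."""
    recent = list(history)[-HISTORY_LEN:]
    padding = [[0.0] * STEP_DIM] * (HISTORY_LEN - len(recent))
    return [x for step in padding + recent for x in step]


def reward(skipped):
    """Positive reward if the recommended song was not skipped, zero otherwise."""
    return 0.0 if skipped else 1.0


def cumulative_reward(skips):
    """Sum of rewards over N steps; this is the quantity the agent maximizes."""
    return sum(reward(s) for s in skips)


# Usage: two past interactions, then the outcome of three recommendations.
history = deque(maxlen=HISTORY_LEN)
history.append(encode_step("no_move", "short_pause", [0.8, 0.3]))
history.append(encode_step("fast_forward", "no_pause", [0.5, 0.9]))
state = build_state(history)
total = cumulative_reward([False, True, False])
```

Even in this toy version the state already has 5 x 8 = 40 entries, which gives a feel for how quickly the state space dimension grows once more reactions and song features are added.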
