Tabular Q-learning web app

Came alive on March 19, 2020

        In this post, you'll get to see an RL (Reinforcement Learning) agent ( 🙂 ) in action. This agent uses an algorithm called "Tabular Q-learning" to get better at the game "frozen-lake". The goal of the game is to get from the top-left corner ( 🇸 ) to the bottom-right corner ( 🇬 ) without falling into the holes ( 🕳️ ) in the frozen lake ( ❄️ ). If the agent falls into a hole ( 🥶 ) while playing, it's game over and the next game begins. If it succeeds ( 🤑 ), it gets a positive reward from the game, which encourages it to repeat this behaviour in future games.
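
To make the rules concrete, here is a rough sketch of a single move in the game. It assumes the default 4x4 map of Gym's FrozenLake and a non-slippery lake (moves always go where intended), which may not match the environment used in this app. The names `LAKE_MAP`, `MOVES` and `step` are made up for illustration.

```javascript
// Assumption: the default 4x4 FrozenLake layout.
// S = start, F = frozen (safe), H = hole, G = goal.
const LAKE_MAP = ["SFFF", "FHFH", "FFFH", "HFFG"];
const SIZE = LAKE_MAP.length;

// [rowChange, colChange] for actions 0: left, 1: down, 2: right, 3: up.
const MOVES = [[0, -1], [1, 0], [0, 1], [-1, 0]];

// Apply one action from a state (cells are numbered row by row, 0 to 15).
// Simplification: the lake is treated as non-slippery here.
function step(state, action) {
  const row = Math.floor(state / SIZE);
  const col = state % SIZE;
  const [rowChange, colChange] = MOVES[action];
  // Walking into a wall leaves the agent where it is.
  const newRow = Math.min(SIZE - 1, Math.max(0, row + rowChange));
  const newCol = Math.min(SIZE - 1, Math.max(0, col + colChange));
  const tile = LAKE_MAP[newRow][newCol];
  return {
    nextState: newRow * SIZE + newCol,
    reward: tile === "G" ? 1 : 0,          // only reaching the goal pays off
    done: tile === "G" || tile === "H",    // goal or hole ends the game
  };
}
```

In this sketch, falling into a hole ends the game with no reward, and reaching 🇬 ends it with a reward of 1.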

        The policy of an agent, in the context of this game, is the action the agent chooses in a given state. For instance, in the cell to the left of " 🇬 ", the ideal policy is to "move right" whenever the agent is in that cell. The agent learns this ideal policy by playing many games (10,000 for this particular agent), using the "tabular Q-learning" algorithm. The learned policy is shown on the right.
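
For the curious, here is a minimal sketch of the textbook tabular Q-learning update and of how a policy can be read off the table. It is not the code running in this app: the function names, the learning rate `alpha`, the discount `gamma`, and the epsilon-greedy exploration are all assumptions made for illustration.

```javascript
// One row per state, one column per action, all values starting at 0.
function createQTable(nStates, nActions) {
  return Array.from({ length: nStates }, () => new Array(nActions).fill(0));
}

// Standard Q-learning update:
// Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
function updateQ(q, state, action, reward, nextState, done, alpha, gamma) {
  // A finished game has no future value.
  const bestNext = done ? 0 : Math.max(...q[nextState]);
  const target = reward + gamma * bestNext;
  q[state][action] += alpha * (target - q[state][action]);
}

// The learned policy: in each state, pick the action with the highest Q-value.
function greedyAction(q, state) {
  return q[state].indexOf(Math.max(...q[state]));
}

// Assumed exploration scheme: with probability epsilon, try a random action.
function chooseAction(q, state, epsilon, random = Math.random) {
  if (random() < epsilon) {
    return Math.floor(random() * q[state].length);
  }
  return greedyAction(q, state);
}
```

With a 4x4 grid numbered row by row and actions numbered 0: left, 1: down, 2: right, 3: up, the cell to the left of 🇬 is state 14. After a single rewarded "move right" from that cell, `greedyAction(q, 14)` already returns 2 ("move right"), and further games spread that value back towards the start.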

        I coded up the implementation for tabular Q-learning within this awesome JS version of OpenAI Gym's "FrozenLake-v0" environment, and I ended up tweaking the environment code to suit my needs.

        The plot below shows the fraction of games the agent has won so far (on the y-axis) against the number of games it has finished playing (on the x-axis). The agent usually starts winning consistently after 3000 games or so. That is why each game finishes in a flash; otherwise it would take quite some time to reach 3000 games. Go ahead and click on the green "Train Agent" button in the plot below. You'll see that slowly, but surely, the agent gets better at winning the game.
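
The quantity on the y-axis is a running win fraction. A small sketch of how it could be computed from a list of game outcomes is shown here; the function name `winFractions` is hypothetical and this is not necessarily how the plot in this app is built.

```javascript
// outcomes[i] is true if game i was won, false otherwise.
// Returns, for each game, the fraction of games won up to and including it.
function winFractions(outcomes) {
  let wins = 0;
  return outcomes.map((won, i) => {
    if (won) wins += 1;
    return wins / (i + 1);
  });
}
```

For example, a loss followed by two wins gives fractions of 0, 1/2 and 2/3, so the curve climbs as wins become more frequent.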

Agent Wins v/s Number of Games

Credits for JS Environment: frozen-lake.js