In this post, I compare different reinforcement learning algorithms. The code is available in my GitHub repository.
Reinforcement Learning (RL) is a fascinating area of study. In RL, an agent learns to make decisions by mapping situations to actions, with the goal of maximizing a numerical reward. The agent isn’t told exactly what to do; instead, it has to discover the most rewarding actions through trial and error.
The agent’s actions affect not only the immediate reward but also the next situation, and through it all later rewards. This combination of exploration, trial-and-error search, and delayed reward is what makes reinforcement learning unique.
Agent-Oriented Learning
Mini-project: Comparing reinforcement learning algorithms.
For this analysis, we’ll use the OpenAI Gym library, which provides a collection of games and control tasks that can serve as environments for training RL agents.
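As a quick orientation, the snippet below shows the standard Gym interaction loop. It is a minimal sketch assuming the classic (pre-0.26) Gym API, and ‘FrozenLake-v0’ is just a stand-in for any registered environment id:

```python
import gym

# Minimal sketch of the classic Gym interaction loop (pre-0.26 API assumed).
# 'FrozenLake-v0' is only a placeholder for any registered environment id.
env = gym.make('FrozenLake-v0')

state = env.reset()                                   # start a new episode
done = False
while not done:
    action = env.action_space.sample()                # sample a random action
    state, reward, done, info = env.step(action)      # apply it and observe the result
env.close()
```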
We’ll focus on the game “NChain” (the sketch after the list below shows how to load and probe it). In this game:
- The agent moves along a chain of states
- There are two actions: ‘forward’ and ‘back’
- ‘Forward’ moves the agent along the chain without giving a reward
- ‘Back’ returns the agent to the start and provides a small reward
- Reaching the end of the chain gives a big reward
- If the agent keeps moving forward to the end, it can keep getting this big reward
- Sometimes the agent “slips” and does the opposite action
- The observed state is the agent’s current position (from 0 to n-1)
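To make these dynamics concrete, here is a small sketch that loads the game and compares two fixed policies: always ‘forward’ versus always ‘back’. It assumes an older Gym release in which the toy-text NChain-v0 environment is still registered (it was removed from later versions), the classic step/reset API, and Gym’s convention that action 0 is ‘forward’ and action 1 is ‘back’:

```python
import gym

def run_episode(env, action, steps=1000):
    """Repeat a single fixed action and return the total reward collected."""
    env.reset()
    total_reward = 0
    for _ in range(steps):
        _, reward, done, _ = env.step(action)   # 0 = forward, 1 = back
        total_reward += reward
        if done:
            break
    return total_reward

# Assumes a Gym version where the toy-text NChain-v0 environment still exists.
env = gym.make('NChain-v0')

print("Always forward (action 0):", run_episode(env, action=0))
print("Always back    (action 1):", run_episode(env, action=1))
```

Because of the occasional slips, the totals vary from run to run, but they give a feel for how the small immediate reward of ‘back’ competes with the large delayed reward at the end of the chain.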
This game was designed and used by Malcolm J. A. Strens in his paper “A Bayesian Framework for Reinforcement Learning” (ICML 2000).