- Choho Yann Eric CHOHO
- Yedidia AGNIMO
February 2024.
The goal of this project is to implement the Rainbow algorithm and compare it to the DQN algorithm.
The Rainbow algorithm improves on DQN by combining six extensions to deep Q-learning:
- Double Q-learning
- Prioritized Experience Replay
- Dueling Network Architecture
- Multi-step Learning
- Distributional RL
- Noisy Nets
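As an illustration of the multi-step learning component above, here is a minimal sketch (plain Python, not the project's code; the function name and signature are hypothetical) of how an n-step return is accumulated before bootstrapping:

```python
GAMMA = 0.99

def n_step_return(rewards, bootstrap_value, gamma=GAMMA):
    """Discounted n-step return: r_0 + g*r_1 + ... + g^(n-1)*r_{n-1} + g^n * V(s_n)."""
    g = 0.0
    for i, r in enumerate(rewards):
        g += (gamma ** i) * r
    # Bootstrap from the value estimate of the state reached after n steps
    return g + (gamma ** len(rewards)) * bootstrap_value
```

With `gamma=1.0`, three rewards of 1 and a zero bootstrap give a return of 3.0.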
We implement the Rainbow algorithm and compare it to the DQN algorithm and several DQN extensions on the CartPole environment. The algorithms are compared in terms of the score defined in the paper.
The implementation of the Rainbow algorithm is based on the following steps:
- Importing the required libraries
- Defining the hyperparameters
- Defining the agent
- Defining the replay buffer
- Defining the network
- Training the agent
- Evaluating the agent
- Visualizing the agent's performance
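For the "Defining the replay buffer" step, a minimal uniform replay buffer can be sketched as follows (an illustration only; the project's actual buffers are prioritized and named differently):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity ring buffer of (s, a, r, s', done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # old transitions are evicted automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling; prioritized replay would weight by TD error instead
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```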
To run this code, install the required libraries:

- Open your terminal or command prompt.
- Create a virtual environment (optional but recommended): run `python -m venv env` to create a virtual environment named "env".
- Activate the virtual environment:
  - On Windows, run `env\Scripts\activate`.
  - On macOS and Linux, run `source env/bin/activate`.
- Install the required libraries by running `pip install -r requirements.txt`.
The main libraries are:
- torch
- gymnasium (the maintained fork of OpenAI's Gym)
We define the hyperparameters for the Rainbow algorithm.
```python
# Hyperparameters
BATCH_SIZE = 32
LR = 0.0005
EPSILON = 0.0005
GAMMA = 0.99
TARGET_UPDATE = 1000
REPLAY_MEMORY_SIZE = 15000
LEARNING_STARTS = 1000
N_ATOMS = 51
V_MIN = -10
V_MAX = 10
```
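`N_ATOMS`, `V_MIN`, and `V_MAX` define the support of the return distribution used by the distributional (C51) component. A small sketch of how the atom positions follow from these hyperparameters:

```python
N_ATOMS, V_MIN, V_MAX = 51, -10.0, 10.0

# Spacing between adjacent atoms of the categorical distribution
delta_z = (V_MAX - V_MIN) / (N_ATOMS - 1)

# Atom positions: evenly spaced values from V_MIN to V_MAX
support = [V_MIN + i * delta_z for i in range(N_ATOMS)]
```

The network then predicts a probability for each of the 51 atoms, and the Q-value is the expectation over this support.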
- To launch the code: use the `notebook.ipynb`.
- The `utils` folder contains the `Agent` class used for training each Q-algorithm (including Rainbow). The class name is written in uppercase (e.g., `AGENT`), while the neural network classes have names ending with `Network`. Additionally, there are specific buffer classes with names starting with `Buffer`.
- The `Result` folder contains testing on one episode of each algorithm in .mp4 format.
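As an illustration of the dueling architecture used by the network classes, here is a minimal PyTorch sketch (the class name and layer sizes are hypothetical, not the repository's actual code):

```python
import torch
import torch.nn as nn

class DuelingNetwork(nn.Module):
    """Dueling head: separate state-value and advantage streams."""

    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)          # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, x):
        h = self.feature(x)
        v = self.value(h)
        a = self.advantage(h)
        # Subtract the mean advantage so V and A are identifiable
        return v + a - a.mean(dim=1, keepdim=True)
```

For CartPole, `obs_dim=4` and `n_actions=2`, so a batch of states maps to a (batch, 2) tensor of Q-values.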
