# Stable Baselines3 - Train on Atari Games

GitHub repo: [https://github.com/DLR-RM/stable-baselines3](https://github.com/DLR-RM/stable-baselines3)


[RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) is a collection of pre-trained Reinforcement Learning agents using Stable-Baselines3.

It also provides basic scripts for training, evaluating agents, tuning hyperparameters and recording videos.

Documentation is available online: [https://stable-baselines3.readthedocs.io/](https://stable-baselines3.readthedocs.io/)

## Install Dependencies and Stable Baselines Using Pip


```
pip install stable-baselines3[extra]
```

In [1]:
!pip install stable-baselines3[extra] ale-py==0.7.4

Defaulting to user installation because normal site-packages is not writeable
Collecting ale-py==0.7.4
  Downloading ale_py-0.7.4-cp39-cp39-win_amd64.whl (904 kB)
     ------------------------------------ 904.7/904.7 kB 147.9 kB/s eta 0:00:00
Collecting importlib-resources
  Downloading importlib_resources-5.12.0-py3-none-any.whl (36 kB)
Collecting autorom[accept-rom-license]~=0.4.2
  Downloading AutoROM-0.4.2-py3-none-any.whl (16 kB)
Collecting AutoROM.accept-rom-license
  Downloading AutoROM.accept-rom-license-0.6.0.tar.gz (434 kB)
     ------------------------------------ 434.7/434.7 kB 179.9 kB/s eta 0:00:00
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting libtorrent
  Downloading libtorrent-2.0

## Import policy, RL agent, ...

In [2]:
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

## Training on Atari

We will use the Atari wrapper, which downsamples each frame and converts it to grayscale.

About Atari preprocessing: [Frame Skipping and Pre-Processing for Deep Q-Networks on Atari 2600 Games](https://danieltakeshi.github.io/2016/11/25/frame-skipping-and-preprocessing-for-deep-q-networks-on-atari-2600-games/)
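To make the two preprocessing steps concrete, here is a minimal pure-Python sketch of grayscale conversion followed by downsampling on a tiny 4x4 frame. It is an illustration only: the function names `to_grayscale` and `downsample` are hypothetical, the luminance weights and block averaging are assumptions chosen for simplicity, and the actual wrapper operates on full Atari frames with optimized image routines.

```python
# Illustrative only: grayscale conversion plus naive downsampling on a tiny RGB frame.

def to_grayscale(frame):
    """Convert rows of (r, g, b) pixels to luminance values (assumed BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in frame]

def downsample(gray, factor):
    """Shrink the frame by averaging non-overlapping factor x factor blocks."""
    height, width = len(gray), len(gray[0])
    out = []
    for top in range(0, height - factor + 1, factor):
        row = []
        for left in range(0, width - factor + 1, factor):
            block = [gray[top + dy][left + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

# A 4x4 frame: left half white, right half black.
white, black = (255, 255, 255), (0, 0, 0)
frame = [[white, white, black, black] for _ in range(4)]

small = downsample(to_grayscale(frame), factor=2)
print(len(small), len(small[0]))  # 2 2
```

Dropping color and resolution keeps the information the agent needs (where the ball and paddles are) while making each observation much cheaper to process.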

![Pong](https://cdn-images-1.medium.com/max/800/1*UHYJE7lF8IDZS_U5SsAFUQ.gif)

In [3]:
# make_atari_env creates the environments and applies the standard Atari wrappers.
env = make_atari_env('PongNoFrameskip-v4', n_envs=4, seed=0)
# Stack 4 frames
env = VecFrameStack(env, n_stack=4)
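A single frame does not show which way the ball is moving, which is why several consecutive frames are stacked into one observation. The sketch below shows the idea with a fixed-length buffer. `FrameStack` is a hypothetical helper written for this explanation, not the `VecFrameStack` implementation, and filling the buffer with copies of the first frame at reset is an assumption that may differ from what the library does.

```python
from collections import deque

class FrameStack:
    """Keep the n most recent frames (hypothetical helper, not the SB3 class)."""

    def __init__(self, n_stack):
        self.frames = deque(maxlen=n_stack)

    def reset(self, first_frame):
        # Assumption: fill the buffer with the first frame so the stack is always full.
        self.frames.clear()
        for _ in range(self.frames.maxlen):
            self.frames.append(first_frame)
        return list(self.frames)

    def step(self, new_frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(new_frame)
        return list(self.frames)

stack = FrameStack(n_stack=4)
obs = stack.reset("f0")
for name in ["f1", "f2"]:
    obs = stack.step(name)
print(obs)  # ['f0', 'f0', 'f1', 'f2']
```

Strings stand in for image arrays here; with real frames the stacked observation would carry four images, oldest first.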

In [4]:
model = A2C('CnnPolicy', env, verbose=1)
model.learn(total_timesteps=10000)

Using cpu device
Wrapping the env in a VecTransposeImage.
------------------------------------
| time/                 |          |
|    fps                | 304      |
|    iterations         | 100      |
|    time_elapsed       | 6        |
|    total_timesteps    | 2000     |
| train/                |          |
|    entropy_loss       | -1.76    |
|    explained_variance | -0.00417 |
|    learning_rate      | 0.0007   |
|    n_updates          | 99       |
|    policy_loss        | -0.135   |
|    value_loss         | 0.0935   |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 3.35e+03 |
|    ep_rew_mean        | -20.5    |
| time/                 |          |
|    fps                | 310      |
|    iterations         | 200      |
|    time_elapsed       | 12       |
|    total_timesteps    | 4000     |
| train/                |          |
|    entropy_loss       | -1.71    |
|    explained_va

<stable_baselines3.a2c.a2c.A2C at 0x2c199e1b430>
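The `iterations` and `total_timesteps` columns in the log are related by simple arithmetic: each iteration collects a short rollout from every parallel environment. The sketch below reproduces the logged numbers, assuming A2C's default rollout length of 5 steps per environment (`n_steps=5`), which the cell above does not override. The helper name `timesteps_after` is hypothetical.

```python
def timesteps_after(iterations, n_envs, n_steps):
    """Environment steps collected after a number of rollout/update iterations."""
    return iterations * n_envs * n_steps

n_envs = 4   # from make_atari_env(..., n_envs=4)
n_steps = 5  # assumption: A2C default, not set explicitly in the notebook

print(timesteps_after(100, n_envs, n_steps))  # 2000
print(timesteps_after(200, n_envs, n_steps))  # 4000

# Iterations needed to reach the requested total_timesteps=10000.
print(10000 // (n_envs * n_steps))  # 500
```

Under that assumption, the 10000-step run above corresponds to roughly 500 updates, which is very little training for Pong; a mean reward near -20 at this stage is expected.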

## Download / Upload Trained Agent and Continue Training

Save and download the trained model.

In [5]:
# Only available on Google Colab; leave commented out when running locally.
#from google.colab import files

ModuleNotFoundError: No module named 'google.colab'
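The error above appears because `google.colab` only exists inside a Colab runtime. One way to keep a single notebook working in both places is to guard the import, as in the following sketch. `download_if_colab` is a hypothetical helper introduced for illustration; only `files.download` is taken from the commented-out cells of this notebook.

```python
try:
    # Only importable inside a Google Colab runtime.
    from google.colab import files
except ImportError:
    files = None  # running locally: no browser download/upload helpers

def download_if_colab(path):
    """Hypothetical helper: trigger a browser download on Colab, do nothing elsewhere."""
    if files is None:
        print(f"Not on Colab; '{path}' stays on the local disk.")
        return False
    files.download(path)
    return True
```

On a local machine, calling `download_if_colab("a2c_pong.zip")` simply reports that the file is already on disk.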

In [7]:
model.save("a2c_pong")
#files.download("a2c_pong.zip")

Upload the trained agent from your local machine.

In [None]:
#files.upload()

In [9]:
#!du -h a2c*
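The `du` shell command in the cell above is typically not available on a stock Windows setup, and the install log earlier shows a `win_amd64` wheel. A portable alternative is to list the saved files from Python. This is a minimal sketch: `list_checkpoints` is a hypothetical helper, and the `a2c*` pattern simply mirrors the shell glob.

```python
from pathlib import Path

def list_checkpoints(directory=".", pattern="a2c*"):
    """Return (file name, size in bytes) for each matching file, sorted by name."""
    return sorted(
        (path.name, path.stat().st_size)
        for path in Path(directory).glob(pattern)
        if path.is_file()
    )

for name, size in list_checkpoints():
    print(f"{size / 1024:8.1f} KiB  {name}")
```

After `model.save("a2c_pong")` has run, the listing should include `a2c_pong.zip`.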

Load the agent, then continue training.

In [10]:
trained_model = A2C.load("a2c_pong", verbose=1)
env = make_atari_env('PongNoFrameskip-v4', n_envs=4, seed=0)
env = VecFrameStack(env, n_stack=4)
trained_model.set_env(env)

Wrapping the env in a VecTransposeImage.


In [14]:
#trained_model.learn(int(0.5e6))
trained_model.learn(int(1.0e5))

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 3.46e+03 |
|    ep_rew_mean        | -20.5    |
| time/                 |          |
|    fps                | 296      |
|    iterations         | 100      |
|    time_elapsed       | 6        |
|    total_timesteps    | 2000     |
| train/                |          |
|    entropy_loss       | -1.73    |
|    explained_variance | 0        |
|    learning_rate      | 0.0007   |
|    n_updates          | 11848    |
|    policy_loss        | -0.2     |
|    value_loss         | 0.133    |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 3.38e+03 |
|    ep_rew_mean        | -20.8    |
| time/                 |          |
|    fps                | 257      |
|    iterations         | 200      |
|    time_elapsed       | 15       |
|    total_timesteps    | 4000     |
| train/                |          |
|

<stable_baselines3.a2c.a2c.A2C at 0x2c199e14280>

In [15]:
trained_model.save("a2c_pong_3")
#files.download("a2c_pong_3.zip")