# RL Baselines3 Zoo: Training in Colab



GitHub Repo: [https://github.com/DLR-RM/rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo)

Stable-Baselines3 Repo: [https://github.com/DLR-RM/stable-baselines3](https://github.com/DLR-RM/stable-baselines3)


## Install Dependencies



In [1]:
!apt-get install swig cmake ffmpeg freeglut3-dev xvfb

Reading package lists... Done
Building dependency tree       
Reading state information... Done
freeglut3-dev is already the newest version (2.8.1-3).
freeglut3-dev set to manually installed.
cmake is already the newest version (3.10.2-1ubuntu2.18.04.2).
ffmpeg is already the newest version (7:3.4.8-0ubuntu0.2).
The following packages were automatically installed and are no longer required:
  libnvidia-common-460 nsight-compute-2020.2.0
Use 'apt autoremove' to remove them.
The following additional packages will be installed:
  swig3.0
Suggested packages:
  swig-doc swig-examples swig3.0-examples swig3.0-doc
The following NEW packages will be installed:
  swig swig3.0 xvfb
0 upgraded, 3 newly installed, 0 to remove and 42 not upgraded.
Need to get 1,884 kB of archives.
After this operation, 8,094 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 swig3.0 amd64 3.0.12-1 [1,094 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe a

## Clone RL Baselines3 Zoo Repo

In [2]:
!git clone --recursive https://github.com/DLR-RM/rl-baselines3-zoo

Cloning into 'rl-baselines3-zoo'...
remote: Enumerating objects: 3885, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 3885 (delta 13), reused 24 (delta 13), pack-reused 3860
Receiving objects: 100% (3885/3885), 2.61 MiB | 25.01 MiB/s, done.
Resolving deltas: 100% (2521/2521), done.
Submodule 'rl-trained-agents' (https://github.com/DLR-RM/rl-trained-agents) registered for path 'rl-trained-agents'
Cloning into '/content/rl-baselines3-zoo/rl-trained-agents'...
remote: Enumerating objects: 2091, done.        
remote: Total 2091 (delta 0), reused 0 (delta 0), pack-reused 2091        
Receiving objects: 100% (2091/2091), 1.21 GiB | 25.17 MiB/s, done.
Resolving deltas: 100% (454/454), done.
Submodule path 'rl-trained-agents': checked out '72feeb8c2e8985e5382ee61f3542ee023ec81922'


In [3]:
%cd /content/rl-baselines3-zoo/

/content/rl-baselines3-zoo


### Install pip dependencies

In [4]:
!pip install slimevolleygym
!pip install -r requirements.txt

Collecting slimevolleygym
  Downloading slimevolleygym-0.1.0-py3-none-any.whl (17 kB)
Installing collected packages: slimevolleygym
Successfully installed slimevolleygym-0.1.0
Collecting gym==0.21
  Downloading gym-0.21.0.tar.gz (1.5 MB)
Collecting stable-baselines3[docs,extra,tests]>=1.5.0
  Downloading stable_baselines3-1.5.0-py3-none-any.whl (177 kB)
Collecting sb3-contrib>=1.5.0
  Downloading sb3_contrib-1.5.0-py3-none-any.whl (62 kB)
Collecting box2d-py==2.3.8
  Downloading box2d_py-2.3.8-cp37-cp37m-manylinux1_x86_64.whl (448 kB)
Collecting pybullet
  Downloading pybullet-3.2.4-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (91.7 MB)
Collecting gym

## Train an RL Agent


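The trained agent is saved in the `logs/` folder.

Here we train A2C on the `CartPole-v1` environment for 100,000 steps.


To train on Pong (Atari) instead, pass `--env PongNoFrameskip-v4`.

Note: To support a new environment, add an entry for it to the algorithm's hyperparameter file, `hyperparams/<algo>.yml` (for example `hyperparams/a2c.yml`). You can edit the file from the side panel of Google Colab (see https://stackoverflow.com/questions/46986398/import-data-into-google-colaboratory). Environments from external packages, such as `SlimeVolley-v0`, are only registered once their package is imported: without `--gym-packages slimevolleygym`, `train.py` raises `ValueError: SlimeVolley-v0 not found in gym registry`.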
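
The training commands in this notebook share the same few flags, so they can also be assembled programmatically. The following is a minimal sketch, not part of the zoo: `build_train_command` is a hypothetical helper, and it only uses flags that already appear in this notebook's cells (`--algo`, `--env`, `--n-timesteps`, `--gym-packages`).

```python
import shlex


def build_train_command(algo, env_id, n_timesteps, gym_packages=()):
    """Return the argument list for a train.py run (hypothetical helper)."""
    cmd = [
        "python", "train.py",
        "--algo", algo,
        "--env", env_id,
        "--n-timesteps", str(n_timesteps),
    ]
    if gym_packages:
        # Extra modules to import so that their environments get registered
        cmd += ["--gym-packages", *gym_packages]
    return cmd


cartpole_cmd = build_train_command("a2c", "CartPole-v1", 100_000)
slime_cmd = build_train_command(
    "ppo", "SlimeVolley-v0", 100_000, gym_packages=["slimevolleygym"]
)

print(shlex.join(cartpole_cmd))
# python train.py --algo a2c --env CartPole-v1 --n-timesteps 100000
print(shlex.join(slime_cmd))
# python train.py --algo ppo --env SlimeVolley-v0 --n-timesteps 100000 --gym-packages slimevolleygym
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting issues.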

In [None]:
!python train.py --algo a2c --env CartPole-v1 --n-timesteps 100000


#### Evaluate trained agent


Remove `--folder logs/` to evaluate a pretrained agent from the `rl-trained-agents` submodule instead of the one you just trained.

In [None]:
!python enjoy.py --algo a2c --env CartPole-v1 --no-render --n-timesteps 5000 --folder logs/

#### Tune Hyperparameters

We use [Optuna](https://optuna.org/) for optimizing the hyperparameters.

Tune the hyperparameters for PPO on `SlimeVolley-v0` using the TPE sampler and the median pruner, with 2 parallel jobs,
a budget of 10 trials, and a maximum of 20,000,000 steps per trial.
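
To make the role of the median pruner concrete, here is a simplified sketch of the rule it is based on: a running trial is stopped early when its intermediate score falls below the median score that earlier trials reported at the same evaluation step. This is an illustration only, not Optuna's implementation; `should_prune`, its `n_startup_trials` default, and the sample scores are all hypothetical.

```python
from statistics import median


def should_prune(current_value, step, completed_histories, n_startup_trials=2):
    """Simplified median rule, assuming higher scores are better.

    completed_histories holds, for each earlier trial, the list of scores
    it reported at evaluation steps 0, 1, 2, ...
    """
    # Scores that earlier trials reported at the same evaluation step
    peers = [history[step] for history in completed_histories if step < len(history)]
    if len(peers) < n_startup_trials:
        return False  # too few earlier trials to compare against
    return current_value < median(peers)


# Hypothetical mean rewards reported by three earlier trials
histories = [
    [10.0, 20.0, 30.0],
    [12.0, 25.0, 40.0],
    [8.0, 15.0, 22.0],
]

print(should_prune(9.0, 1, histories))   # True: 9.0 is below the median 20.0
print(should_prune(21.0, 1, histories))  # False: 21.0 is above the median 20.0
```

Pruning like this is what lets a small trial budget go further: unpromising hyperparameter sets are abandoned after a few evaluations instead of consuming the full step budget.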

In [None]:
!python train.py --algo ppo --env SlimeVolley-v0 --gym-packages slimevolleygym -n 20000000 -optimize --n-trials 10 --n-jobs 2 --sampler tpe --pruner median

Seed: 2179949613
{'n_envs': 128, 'n_steps': 4096, 'clip_range': 0.2, 'ent_coef': 0.0, 'n_epochs': 10, 'gae_lambda': 0.95, 'gamma': 0.99, 'batch_size': 64, 'n_timesteps': 20000000.0, 'policy': 'MlpPolicy'}
Default hyperparameters for environment (ones being tuned will be overridden):
OrderedDict([('batch_size', 64),
             ('clip_range', 0.2),
             ('ent_coef', 0.0),
             ('gae_lambda', 0.95),
             ('gamma', 0.99),
             ('n_envs', 128),
             ('n_epochs', 10),
             ('n_steps', 4096),
             ('n_timesteps', 20000000.0),
             ('policy', 'MlpPolicy')])
Using 128 environments
Overwriting n_timesteps with n=20000000
Doing 200 intermediate evaluations for pruning based on the number of timesteps. (1 evaluation every 100k timesteps)
Optimizing hyperparameters
Sampler: tpe - Pruner: median
[I 2022-05-03 07:01:42,883] A new study created in memory with name: no-name-dc333739-3768-4927-ad55-fc55b91556f5
Number of fini

### Record a Video

In [None]:
# Set up display; otherwise rendering will fail
import os
os.system("Xvfb :1 -screen 0 1024x768x24 &")
os.environ['DISPLAY'] = ':1'

In [None]:
!python -m utils.record_video --algo a2c --env CartPole-v1 --exp-id 0 -f logs/ -n 1000

### Display the video

In [None]:
import base64
from pathlib import Path

from IPython import display as ipythondisplay

def show_videos(video_path='', prefix=''):
    """
    Display every MP4 file in a folder inline in the notebook.
    Taken from https://github.com/eleurent/highway-env

    :param video_path: (str) Path to the folder containing videos
    :param prefix: (str) Filter the videos, showing only the ones starting with this prefix
    """
    html = []
    for mp4 in Path(video_path).glob("{}*.mp4".format(prefix)):
        # Embed the file contents in the page as a base64 data URI
        video_b64 = base64.b64encode(mp4.read_bytes())
        html.append('''<video alt="{}" autoplay 
                    loop controls style="height: 400px;">
                    <source src="data:video/mp4;base64,{}" type="video/mp4" />
                </video>'''.format(mp4, video_b64.decode('ascii')))
    ipythondisplay.display(ipythondisplay.HTML(data="<br>".join(html)))

In [None]:
show_videos(video_path='logs/a2c/CartPole-v1_1/videos/', prefix='')
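
The path above hardcodes the run folder `CartPole-v1_1`. If training has been launched more than once, the numeric suffix may differ. The following is a minimal sketch for locating the highest-numbered run folder, assuming the `logs/<algo>/<env>_<id>` layout suggested by the paths in this notebook; `latest_run_dir` is a hypothetical helper, and the demo runs against a temporary directory rather than real logs.

```python
import re
import tempfile
from pathlib import Path


def latest_run_dir(log_root, algo, env_id):
    """Return the highest-numbered `<env_id>_<n>` folder under `<log_root>/<algo>/`, or None."""
    algo_dir = Path(log_root) / algo
    if not algo_dir.is_dir():
        return None
    pattern = re.compile(re.escape(env_id) + r"_(\d+)$")
    runs = []
    for child in algo_dir.iterdir():
        match = pattern.match(child.name)
        if child.is_dir() and match:
            runs.append((int(match.group(1)), child))
    if not runs:
        return None
    # Compare on the numeric suffix so that _10 sorts after _2
    return max(runs, key=lambda run: run[0])[1]


# Demo on a throwaway folder structure
with tempfile.TemporaryDirectory() as tmp:
    for run_id in (1, 2, 10):
        (Path(tmp) / "a2c" / f"CartPole-v1_{run_id}").mkdir(parents=True)
    latest = latest_run_dir(tmp, "a2c", "CartPole-v1")
    print(latest.name)  # CartPole-v1_10
```

In the notebook, the result could be used as `show_videos(video_path=latest_run_dir('logs', 'a2c', 'CartPole-v1') / 'videos')`, since `show_videos` wraps its argument in `Path`.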

### Continue Training

Here, we continue training the previous model for 50,000 more steps.

In [None]:
!python train.py --algo a2c --env CartPole-v1 --n-timesteps 50000 -i logs/a2c/CartPole-v1_1/CartPole-v1.zip

In [None]:
!python enjoy.py --algo a2c --env CartPole-v1 --no-render --n-timesteps 1000 --folder logs/