<a href="https://colab.research.google.com/github/tashatsar/DRL-Course/blob/main/unit1_lunar_lander.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction
### The environment 🎮
- [LunarLander-v2](https://www.gymlibrary.dev/environments/box2d/lunar_lander/)

### The library used 📚
- [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/)

### Objectives of this notebook 🏆
At the end of the notebook, you will:
- Be able to use **Gym**, the environment library.
- Be able to use **Stable-Baselines3**, the deep reinforcement learning library.
- Be able to **push your trained agent to the Hub** with a nice video replay and an evaluation score 🔥.

### Install dependencies and create a virtual screen 🔽
The first step is to install the dependencies. We'll install several:

- `gym[box2d]`: Contains the LunarLander-v2 environment 🌛 (we use `gym==0.21`)
- `stable-baselines3[extra]`: The deep reinforcement learning library.
- `huggingface_sb3`: Additional code for Stable-baselines3 to load and upload models from the Hugging Face 🤗 Hub.

To make things easier, we created a script to install all these dependencies.

### Stable-Baselines3 set up:

1️⃣ You **create your environment** (in our case it was done above)

2️⃣ You define the **model you want to use and instantiate this model** `model = PPO("MlpPolicy")`

3️⃣ You **train the agent** with `model.learn` and define the number of training timesteps

```
# Create environment
env = gym.make('LunarLander-v2')

# Instantiate the agent
model = PPO('MlpPolicy', env, verbose=1)
# Train the agent
model.learn(total_timesteps=int(2e5))
```

# Install dependencies and create a virtual screen


In [2]:
!apt install swig cmake

Reading package lists... Done
Building dependency tree       
Reading state information... Done
cmake is already the newest version (3.10.2-1ubuntu2.18.04.2).
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'apt autoremove' to remove it.
Suggested packages:
  swig-doc swig-examples swig3.0-examples swig3.0-doc
The following NEW packages will be installed:
  swig swig3.0
0 upgraded, 2 newly installed, 0 to remove and 20 not upgraded.
Need to get 1,100 kB of archives.
After this operation, 5,822 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 swig3.0 amd64 3.0.12-1 [1,094 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 swig amd64 3.0.12-1 [6,460 B]
Fetched 1,100 kB in 2s (529 kB/s)
Selecting previously unselected package swig3.0.
(Reading database ... 124016 files and directories currently installed.)
Preparing to unpack .../swig3.0_3.0.12-1_amd64.deb ...
Unpack

In [3]:
!pip install -r https://huggingface.co/spaces/ThomasSimonini/temp-space-requirements/raw/main/requirements/requirements-unit1.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting stable-baselines3[extra]
  Downloading stable_baselines3-1.6.2-py3-none-any.whl (170 kB)
Collecting box2d
  Downloading Box2D-2.3.10-cp38-cp38-manylinux1_x86_64.whl (1.3 MB)
Collecting box2d-kengz
  Downloading Box2D-kengz-2.3.3.tar.gz (425 kB)
Collecting huggingface_sb3
  Downloading huggingface_sb3-2.2.4-py3-none-any.whl (9.4 kB)
Collecting pyglet==1.5.1
  Downloading pyglet-1.5.1-py2.py3-none-any.whl (1.0 MB)
Collecting cloudpickle>=1.6
  Downloading cloudpickle-2.2.0-py3-none-any.whl (25 kB)
Collecting huggingface-hub~=0.8
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)

In [4]:
!sudo apt-get update
!apt install python-opengl
!apt install ffmpeg
!apt install xvfb
!pip3 install pyvirtualdisplay

Get:1 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease [3,626 B]
Get:2 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Hit:3 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic InRelease
Ign:4 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Hit:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Hit:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release
Hit:8 http://archive.ubuntu.com/ubuntu bionic InRelease
Hit:9 http://ppa.launchpad.net/cran/libgit2/ubuntu bionic InRelease
Get:10 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Hit:11 http://ppa.launchpad.net/deadsnakes/ppa/ubuntu bionic InRelease
Hit:12 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic InRelease
Get:13 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [83.3 kB]
Fetched 264 kB in 2s (140 kB/s)
Reading 

To make sure the newly installed libraries are used, **it is sometimes necessary to restart the notebook runtime**. The next cell forces the **runtime to crash, so you'll need to reconnect and run the code starting from here**. Thanks to this trick, **we will be able to run our virtual screen.**

In [None]:
# restart the notebook runtime
import os
os.kill(os.getpid(), 9)

In [1]:
# Virtual display
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()

<pyvirtualdisplay.display.Display at 0x7f11f262b0d0>

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Import the packages


In [3]:
import gym

from huggingface_sb3 import load_from_hub, package_to_hub, push_to_hub
from huggingface_hub import notebook_login # To log to our Hugging Face account to be able to upload models to the Hub.

from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.env_util import make_vec_env

## Environment check

In [4]:
import gym

# First, we create our environment called LunarLander-v2
env = gym.make("LunarLander-v2")

# Then we reset this environment
observation = env.reset()

for _ in range(20):
  # Take a random action
  action = env.action_space.sample()
  print("Action taken:", action)

  # Do this action in the environment and get
  # next_state, reward, done and info
  observation, reward, done, info = env.step(action)
  
  # If the episode is done (the lander landed, crashed, or timed out)
  if done:
      # Reset the environment
      print("Environment is reset")
      observation = env.reset()

Action taken: 2
Action taken: 1
Action taken: 3
Action taken: 2
Action taken: 0
Action taken: 2
Action taken: 0
Action taken: 0
Action taken: 0
Action taken: 1
Action taken: 0
Action taken: 1
Action taken: 0
Action taken: 1
Action taken: 1
Action taken: 2
Action taken: 2
Action taken: 2
Action taken: 0
Action taken: 3


Let's see what the Environment looks like:


In [None]:
# We create our environment with gym.make("<name_of_the_environment>")
env = gym.make("LunarLander-v2")
env.reset()
print("_____OBSERVATION SPACE_____ \n")
print("Observation Space Shape", env.observation_space.shape)
print("Sample observation", env.observation_space.sample()) # Get a random observation

_____OBSERVATION SPACE_____ 

Observation Space Shape (8,)
Sample observation [ 0.05059087  0.5583545   0.01732664  1.0521195  -0.73313606 -0.34977174
 -0.7522492   0.6803027 ]


`Observation Space Shape (8,)` tells us that the observation is a vector of size 8, where each value carries a different piece of information about the lander:
- Horizontal coordinate (x)
- Vertical coordinate (y)
- Horizontal speed (x)
- Vertical speed (y)
- Angle
- Angular speed
- Whether the left leg is in contact with the ground
- Whether the right leg is in contact with the ground
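To make the layout of this vector easier to work with, here is a minimal sketch that gives each of the 8 values a name. `LanderObservation` and `decode_observation` are hypothetical helpers, not part of Gym, and the example values are made up. The sketch assumes real observations encode leg contact as 0.0 or 1.0; a random draw from `observation_space.sample()`, like the one printed above, is not a real lander state.

```python
from typing import NamedTuple, Sequence


class LanderObservation(NamedTuple):
    """Hypothetical named view of the 8-value LunarLander observation."""
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    left_leg_contact: bool
    right_leg_contact: bool


def decode_observation(obs: Sequence[float]) -> LanderObservation:
    """Map a raw 8-value observation onto named fields, in the order listed above."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 values, got {len(obs)}")
    x, y, vx, vy, angle, angular_velocity, left, right = (float(v) for v in obs)
    # Assumption: leg contact is reported as 0.0 (no contact) or 1.0 (contact)
    return LanderObservation(x, y, vx, vy, angle, angular_velocity, left > 0.5, right > 0.5)


# A made-up observation, for illustration only
example = decode_observation([0.05, 1.40, 0.01, -0.50, 0.02, 0.00, 0.0, 1.0])
print(example)
```

In the notebook you could pass the `observation` returned by `env.reset()` or `env.step(action)` to `decode_observation`.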


In [None]:
print("\n _____ACTION SPACE_____ \n")
print("Action Space Shape", env.action_space.n)
print("Action Space Sample", env.action_space.sample()) # Take a random action


 _____ACTION SPACE_____ 

Action Space Shape 4
Action Space Sample 0


The action space (the set of possible actions the agent can take) is discrete with 4 actions available 🎮: 

- Do nothing,
- Fire left orientation engine,
- Fire the main engine,
- Fire right orientation engine.
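The integers printed by `env.action_space.sample()` can be given readable names. This is a small sketch, assuming the indices 0 to 3 follow the order of the list above; `LanderAction` is a hypothetical helper, not something Gym provides.

```python
from enum import IntEnum


class LanderAction(IntEnum):
    """Hypothetical names for the 4 discrete LunarLander actions."""
    DO_NOTHING = 0
    FIRE_LEFT_ENGINE = 1
    FIRE_MAIN_ENGINE = 2
    FIRE_RIGHT_ENGINE = 3


# The first few actions from the random rollout earlier in the notebook
actions_taken = [2, 1, 3, 2, 0]
action_names = [LanderAction(a).name for a in actions_taken]
print(action_names)
```

Because `IntEnum` members are plain integers, `env.step(LanderAction.FIRE_MAIN_ENGINE)` should behave like `env.step(2)`.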

Reward function (the function that gives a reward at each timestep) 💰:

- Moving from the top of the screen to the landing pad and coming to a stop is worth about 100 to 140 points.
- Firing the main engine costs 0.3 points per frame.
- Each leg in contact with the ground is worth +10 points.
- The episode finishes if the lander crashes (an additional -100 points) or comes to rest (+100 points).
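To get a feel for how these terms add up, the sketch below tallies a rough episode return from the four items above. It is a simplification, not the environment's actual reward code: `rough_episode_return` is a hypothetical helper, the shaping value of 120 is simply picked from the 100 to 140 range, the frame count is made up, and any reward terms not listed above are ignored.

```python
def rough_episode_return(shaping: float, legs_in_contact: int,
                         main_engine_frames: int, crashed: bool) -> float:
    """Add up the reward terms listed above for one hypothetical episode."""
    total = shaping                      # moving to the pad and stopping
    total += 10 * legs_in_contact        # +10 per leg touching the ground
    total -= 0.3 * main_engine_frames    # main engine cost per frame
    total += -100 if crashed else 100    # crash penalty or come-to-rest bonus
    return total


# Made-up numbers: a clean landing on both legs after 100 frames of main engine
landing = rough_episode_return(shaping=120, legs_in_contact=2,
                               main_engine_frames=100, crashed=False)
# Same flight, but ending in a crash with no leg contact
crash = rough_episode_return(shaping=120, legs_in_contact=0,
                             main_engine_frames=100, crashed=True)
print(f"landing: {landing:.1f}, crash: {crash:.1f}")
```

The gap between the two cases comes mostly from the terminal bonus and penalty.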

# PPO Model

## Model training

In [5]:
env = make_vec_env('LunarLander-v2', n_envs=32)

# parameters 
model = PPO(
    policy = 'MlpPolicy',
    env = env,
    n_steps = 1024, #2048, #1024,
    batch_size = 64, #64, 256, 128
    n_epochs = 3, #4, 12
    gamma = 0.999,
    gae_lambda = 0.95,
    ent_coef = 0.01,
    vf_coef = 1,
    learning_rate = 0.0001,
    verbose=1)

Using cuda device
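The hyperparameters above determine how much data PPO collects before each update. As a rough sketch of the arithmetic: each iteration gathers `n_envs * n_steps` transitions, which matches the `total_timesteps` value of 32768 reported after the first iteration in the training log. The minibatch and iteration counts below are my own back-of-the-envelope figures based on how Stable-Baselines3 is understood to split a rollout, not values printed by the library.

```python
import math

# Values taken from the model definition above
n_envs = 32
n_steps = 1024
batch_size = 64
n_epochs = 3
total_timesteps = 3_000_000  # first training run in this notebook

# Transitions collected per iteration (one rollout across all environments)
rollout_size = n_envs * n_steps

# Each epoch walks over the rollout once in minibatches of `batch_size`
minibatches_per_epoch = rollout_size // batch_size
gradient_steps_per_iteration = minibatches_per_epoch * n_epochs

# Assumption: training proceeds in whole rollouts until the budget is reached
iterations = math.ceil(total_timesteps / rollout_size)

print("rollout_size:", rollout_size)
print("minibatches_per_epoch:", minibatches_per_epoch)
print("gradient_steps_per_iteration:", gradient_steps_per_iteration)
print("iterations for 3M timesteps:", iterations)
```

Keeping `rollout_size` a multiple of `batch_size` avoids a truncated final minibatch.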


In [None]:
# Train it for 3,000,000 timesteps
model.learn(total_timesteps=3000000)

# Specify file name for model and save the model to file
model_name = "ppo-LunarLander-v2"
model.save(model_name)

---------------------------------
| rollout/           |          |
|    ep_len_mean     | 90.8     |
|    ep_rew_mean     | -188     |
| time/              |          |
|    fps             | 4676     |
|    iterations      | 1        |
|    time_elapsed    | 7        |
|    total_timesteps | 32768    |
---------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 107         |
|    ep_rew_mean          | -135        |
| time/                   |             |
|    fps                  | 2955        |
|    iterations           | 2           |
|    time_elapsed         | 22          |
|    total_timesteps      | 65536       |
| train/                  |             |
|    approx_kl            | 0.010284571 |
|    clip_fraction        | 0.0964      |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.38       |
|    explained_variance   | 0.000883    |
|    learning_rate        | 0.

In [6]:
# Train it for 10,000,000 timesteps
model.learn(total_timesteps=10000000)

# Specify file name for model and save the model to file
model_name = "ppo-LunarLander-v2-LR"
model.save(model_name)

Streaming output truncated to the last 5000 lines.
|    value_loss           | 122         |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 621          |
|    ep_rew_mean          | 82.8         |
| time/                   |              |
|    fps                  | 942          |
|    iterations           | 69           |
|    time_elapsed         | 2397         |
|    total_timesteps      | 2260992      |
| train/                  |              |
|    approx_kl            | 0.0027519157 |
|    clip_fraction        | 0.0201       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.906       |
|    explained_variance   | 0.828        |
|    learning_rate        | 0.0001       |
|    loss                 | 248          |
|    n_updates            | 204          |
|    policy_gradient_loss | -8e-05       |
|    value_loss           | 120   

## Evaluation
When you evaluate your agent, you should not use your training environment but create an evaluation environment.

In [None]:
#@title
#baseline, n_envs=16

eval_env = gym.make("LunarLander-v2")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

print('n_steps:', model.n_steps, '\nbatch_size:', model.batch_size, 
      '\nn_epochs:', model.n_epochs, '\ngamma:', model.gamma, 
      '\ngae_lambda:', model.gae_lambda, '\nent_coef:', model.ent_coef, 
      '\nvf_coef:', model.vf_coef)

mean_reward=252.63 +/- 23.703454813878075
n_steps: 1024 
batch_size: 64 
n_epochs: 4 
gamma: 0.999 
gae_lambda: 0.98 
ent_coef: 0.01 
vf_coef: 0.5
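The two numbers printed by the evaluation cell summarize the per-episode returns: their mean and their standard deviation. The sketch below reproduces that summary with the standard library on made-up returns. It assumes, as far as I know, that `evaluate_policy` uses the population standard deviation (NumPy's default), hence `pstdev`. The names `mean_return` and `std_return` are chosen so they don't overwrite the notebook's variables.

```python
import statistics

# Made-up returns for four evaluation episodes (illustration only)
episode_returns = [250.0, 270.0, 230.0, 260.0]

mean_return = statistics.fmean(episode_returns)
# Population standard deviation: divides by N, not N - 1
std_return = statistics.pstdev(episode_returns)

print(f"mean_reward={mean_return:.2f} +/- {std_return:.2f}")
```

With only 10 evaluation episodes, as used in this notebook, both numbers can shift noticeably from one evaluation to the next.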


In [None]:
#@title
# with vf_coef = 0.5, n_envs=32

eval_env = gym.make("LunarLander-v2")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

print('n_steps:', model.n_steps, '\nbatch_size:', model.batch_size, 
      '\nn_epochs:', model.n_epochs, '\ngamma:', model.gamma, 
      '\ngae_lambda:', model.gae_lambda, '\nent_coef:', model.ent_coef, 
      '\nvf_coef:', model.vf_coef, '\nlearning_rate:', model.learning_rate)



mean_reward=131.16 +/- 120.64328577782804
n_steps: 2048 
batch_size: 256 
n_epochs: 8 
gamma: 0.9999 
gae_lambda: 0.98 
ent_coef: 0.005 
vf_coef: 0.5 
learning_rate: 0.003


In [None]:
eval_env = gym.make("LunarLander-v2")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

print('n_steps:', model.n_steps, '\nbatch_size:', model.batch_size, 
      '\nn_epochs:', model.n_epochs, '\ngamma:', model.gamma, 
      '\ngae_lambda:', model.gae_lambda, '\nent_coef:', model.ent_coef, 
      '\nvf_coef:', model.vf_coef, '\nlearning_rate:', model.learning_rate)

mean_reward=101.93 +/- 133.0178490830127
n_steps: 2048 
batch_size: 128 
n_epochs: 3 
gamma: 0.9999 
gae_lambda: 0.98 
ent_coef: 0.005 
vf_coef: 0.5 
learning_rate: 0.0005


In [None]:
eval_env = gym.make("LunarLander-v2")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

print('n_steps:', model.n_steps, '\nbatch_size:', model.batch_size, 
      '\nn_epochs:', model.n_epochs, '\ngamma:', model.gamma, 
      '\ngae_lambda:', model.gae_lambda, '\nent_coef:', model.ent_coef, 
      '\nvf_coef:', model.vf_coef, '\nlearning_rate:', model.learning_rate)

mean_reward=257.45 +/- 19.228081256832613
n_steps: 1024 
batch_size: 64 
n_epochs: 3 
gamma: 0.999 
gae_lambda: 0.98 
ent_coef: 0.01 
vf_coef: 0.5 
learning_rate: 0.001


In [None]:
eval_env = gym.make("LunarLander-v2")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

print('n_steps:', model.n_steps, '\nbatch_size:', model.batch_size, 
      '\nn_epochs:', model.n_epochs, '\ngamma:', model.gamma, 
      '\ngae_lambda:', model.gae_lambda, '\nent_coef:', model.ent_coef, 
      '\nvf_coef:', model.vf_coef, '\nlearning_rate:', model.learning_rate)

mean_reward=269.67 +/- 18.35223236762697
n_steps: 1024 
batch_size: 64 
n_epochs: 3 
gamma: 0.999 
gae_lambda: 0.95 
ent_coef: 0.01 
vf_coef: 0.5 
learning_rate: 0.001


In [None]:
eval_env = gym.make("LunarLander-v2")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

print('n_steps:', model.n_steps, '\nbatch_size:', model.batch_size, 
      '\nn_epochs:', model.n_epochs, '\ngamma:', model.gamma, 
      '\ngae_lambda:', model.gae_lambda, '\nent_coef:', model.ent_coef, 
      '\nvf_coef:', model.vf_coef, '\nlearning_rate:', model.learning_rate)



mean_reward=267.33 +/- 23.059092828887604
n_steps: 1024 
batch_size: 64 
n_epochs: 3 
gamma: 0.999 
gae_lambda: 0.95 
ent_coef: 0.01 
vf_coef: 0.5 
learning_rate: 0.0003


In [None]:
eval_env = gym.make("LunarLander-v2")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

print('n_steps:', model.n_steps, '\nbatch_size:', model.batch_size, 
      '\nn_epochs:', model.n_epochs, '\ngamma:', model.gamma, 
      '\ngae_lambda:', model.gae_lambda, '\nent_coef:', model.ent_coef, 
      '\nvf_coef:', model.vf_coef, '\nlearning_rate:', model.learning_rate)

mean_reward=266.72 +/- 26.410045307229005
n_steps: 1024 
batch_size: 64 
n_epochs: 3 
gamma: 0.999 
gae_lambda: 0.95 
ent_coef: 0.01 
vf_coef: 0.5 
learning_rate: 0.005


In [None]:
eval_env = gym.make("LunarLander-v2")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

print('n_steps:', model.n_steps, '\nbatch_size:', model.batch_size, 
      '\nn_epochs:', model.n_epochs, '\ngamma:', model.gamma, 
      '\ngae_lambda:', model.gae_lambda, '\nent_coef:', model.ent_coef, 
      '\nvf_coef:', model.vf_coef, '\nlearning_rate:', model.learning_rate)

mean_reward=267.92 +/- 13.440775381220826
n_steps: 1024 
batch_size: 64 
n_epochs: 3 
gamma: 0.999 
gae_lambda: 0.95 
ent_coef: 0.01 
vf_coef: 1 
learning_rate: 0.001


In [None]:
eval_env = gym.make("LunarLander-v2")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

print('n_steps:', model.n_steps, '\nbatch_size:', model.batch_size, 
      '\nn_epochs:', model.n_epochs, '\ngamma:', model.gamma, 
      '\ngae_lambda:', model.gae_lambda, '\nent_coef:', model.ent_coef, 
      '\nvf_coef:', model.vf_coef, '\nlearning_rate:', model.learning_rate)

mean_reward=266.88 +/- 18.412456452696656
n_steps: 2048 
batch_size: 256 
n_epochs: 3 
gamma: 0.999 
gae_lambda: 0.95 
ent_coef: 0.01 
vf_coef: 1 
learning_rate: 0.001


## Evaluation after 3M timesteps

In [None]:
eval_env = gym.make("LunarLander-v2")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

print('n_steps:', model.n_steps, '\nbatch_size:', model.batch_size, 
      '\nn_epochs:', model.n_epochs, '\ngamma:', model.gamma, 
      '\ngae_lambda:', model.gae_lambda, '\nent_coef:', model.ent_coef, 
      '\nvf_coef:', model.vf_coef, '\nlearning_rate:', model.learning_rate)



mean_reward=276.99 +/- 15.32538461670048
n_steps: 1024 
batch_size: 64 
n_epochs: 3 
gamma: 0.999 
gae_lambda: 0.95 
ent_coef: 0.01 
vf_coef: 1 
learning_rate: 0.001


## Evaluation after 10M timesteps


In [None]:
eval_env = gym.make("LunarLander-v2")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

print('n_steps:', model.n_steps, '\nbatch_size:', model.batch_size, 
      '\nn_epochs:', model.n_epochs, '\ngamma:', model.gamma, 
      '\ngae_lambda:', model.gae_lambda, '\nent_coef:', model.ent_coef, 
      '\nvf_coef:', model.vf_coef, '\nlearning_rate:', model.learning_rate)



mean_reward=284.45 +/- 22.327406592399512
n_steps: 1024 
batch_size: 64 
n_epochs: 3 
gamma: 0.999 
gae_lambda: 0.95 
ent_coef: 0.01 
vf_coef: 1 
learning_rate: 0.001


## Evaluation after 10M timesteps


In [None]:
eval_env = gym.make("LunarLander-v2")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

print('n_steps:', model.n_steps, '\nbatch_size:', model.batch_size, 
      '\nn_epochs:', model.n_epochs, '\ngamma:', model.gamma, 
      '\ngae_lambda:', model.gae_lambda, '\nent_coef:', model.ent_coef, 
      '\nvf_coef:', model.vf_coef, '\nlearning_rate:', model.learning_rate)



mean_reward=260.70 +/- 16.392729873391172
n_steps: 1024 
batch_size: 64 
n_epochs: 3 
gamma: 0.999 
gae_lambda: 0.95 
ent_coef: 0.01 
vf_coef: 1 
learning_rate: 5e-05


In [7]:
# LR
eval_env = gym.make("LunarLander-v2")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

print('n_steps:', model.n_steps, '\nbatch_size:', model.batch_size, 
      '\nn_epochs:', model.n_epochs, '\ngamma:', model.gamma, 
      '\ngae_lambda:', model.gae_lambda, '\nent_coef:', model.ent_coef, 
      '\nvf_coef:', model.vf_coef, '\nlearning_rate:', model.learning_rate)



mean_reward=258.67 +/- 61.29811657648224
n_steps: 1024 
batch_size: 64 
n_epochs: 3 
gamma: 0.999 
gae_lambda: 0.95 
ent_coef: 0.01 
vf_coef: 1 
learning_rate: 0.0001


# Hub

## Publish the trained model on the Hub

In [9]:
notebook_login()
!git config --global credential.helper store

Token is valid.
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /root/.huggingface/token
Login successful


In [10]:
import gym

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.env_util import make_vec_env

from huggingface_sb3 import package_to_hub

# the name of the environment
env_id = "LunarLander-v2"

# the model architecture used
model_architecture = "PPO"

## a repo_id
## repo_id is the id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name}, for instance ThomasSimonini/ppo-LunarLander-v2)
repo_id = "tashatsar/ppo-LunarLander-v2-LR"

## the commit message
commit_message = "Upload PPO LunarLander-v2-LR trained agent"

# the evaluation env
eval_env = DummyVecEnv([lambda: gym.make(env_id)])

# the package_to_hub function 
package_to_hub(model=model, # trained model
               model_name=model_name, # name of trained model 
               model_architecture=model_architecture, # model architecture used
               env_id=env_id, # name of the environment
               eval_env=eval_env, # Evaluation Environment
               repo_id=repo_id, # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name}, for instance ThomasSimonini/ppo-LunarLander-v2)
               commit_message=commit_message)


ℹ This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to 1min.
This is a work in progress: if you encounter a bug, please open an issue.




Saving video to /tmp/tmplxlzot36/-step-0-to-step-1000.mp4
ℹ Pushing repo tashatsar/ppo-LunarLander-v2-LR to the Hugging Face
Hub
ℹ Your model is pushed to the Hub. You can view your model here:
https://huggingface.co/tashatsar/ppo-LunarLander-v2-LR/tree/main/


'https://huggingface.co/tashatsar/ppo-LunarLander-v2-LR/tree/main/'

## Load a saved LunarLander model from the Hub



In [None]:
from huggingface_sb3 import load_from_hub
repo_id = "Classroom-workshop/assignment2-omar" # The repo_id
filename = "ppo-LunarLander-v2.zip" # The model filename.zip

# A model trained on Python 3.8 is saved with pickle protocol 5,
# but Python 3.6 and 3.7 only support up to protocol 4.
# To stay compatible we need to:
# 1. Install pickle5 (we did this at the beginning of the colab)
# 2. Create a custom empty object that we pass as a parameter to PPO.load()
custom_objects = {
            "learning_rate": 0.0,
            "lr_schedule": lambda _: 0.0,
            "clip_range": lambda _: 0.0,
}

checkpoint = load_from_hub(repo_id, filename)
model = PPO.load(checkpoint, custom_objects=custom_objects, print_system_info=True)
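The comments in the cell above turn on which pickle protocol the current interpreter supports. A quick way to check, using only the standard library, is sketched below; `needs_pickle5_backport` is a hypothetical variable name used for illustration.

```python
import pickle
import sys

# Protocol 5 is built in from Python 3.8; older interpreters stop at protocol 4
print("Python version:", sys.version.split()[0])
print("Highest pickle protocol:", pickle.HIGHEST_PROTOCOL)

# True when this interpreter cannot read protocol-5 pickles on its own
needs_pickle5_backport = pickle.HIGHEST_PROTOCOL < 5
print("Needs pickle5 backport:", needs_pickle5_backport)
```

On Python 3.8 or newer this should report that no backport is needed.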

Let's evaluate this agent:

In [None]:
#@title
eval_env = gym.make("LunarLander-v2")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

# My Lunar Lander! 🚀🌕


In [11]:
%%html

<video controls autoplay><source src="https://huggingface.co/tashatsar/ppo-LunarLander-v2-updates/resolve/main/replay.mp4" type="video/mp4"></video>