<a href="https://colab.research.google.com/github/yaashhnaaa/_BipedalWalker_v3-RL-Models/blob/main/ppo_BipedalWalker_v3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **1. Setup**


### **Install Packages**

In [1]:
# Install necessary packages
!apt install swig cmake ffmpeg xvfb python3-opengl
!pip install stable-baselines3==2.0.0a5 gymnasium[box2d] huggingface_sb3 pyvirtualdisplay imageio[ffmpeg]

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
cmake is already the newest version (3.22.1-1ubuntu1.22.04.2).
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
The following additional packages will be installed:
  freeglut3 libfontenc1 libglu1-mesa libxfont2 libxkbfile1 swig4.0 x11-xkb-utils xfonts-base
  xfonts-encodings xfonts-utils xserver-common
Suggested packages:
  libgle3 python3-numpy swig-doc swig-examples swig4.0-examples swig4.0-doc
The following NEW packages will be installed:
  freeglut3 libfontenc1 libglu1-mesa libxfont2 libxkbfile1 python3-opengl swig swig4.0
  x11-xkb-utils xfonts-base xfonts-encodings xfonts-utils xserver-common xvfb
0 upgraded, 14 newly installed, 0 to remove and 49 not upgraded.
Need to get 9,753 kB of archives.
After this operation, 25.6 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 freeglut3 amd64 2.8.1-6 [74.0 kB]
Get:2 http://

The next cell forces the notebook runtime to restart so that the newly installed libraries are picked up.

In [None]:
import os
os.kill(os.getpid(), 9)

### **Start Virtual Display**

In [1]:
from pyvirtualdisplay import Display
virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()

<pyvirtualdisplay.display.Display at 0x7dbc48169360>

### **Setup Environment**

In [3]:
import gymnasium as gym
env = gym.make("BipedalWalker-v3", hardcore=True)
env.reset()

(array([ 2.7473837e-03,  1.2019667e-06, -1.5675511e-04, -1.6000008e-02,
         9.2188999e-02,  3.6391238e-04,  8.6011821e-01,  1.1139875e-03,
         1.0000000e+00,  3.2585610e-02,  3.6387687e-04,  8.5368711e-01,
        -2.4760552e-04,  1.0000000e+00,  4.4081384e-01,  4.4581994e-01,
         4.6142259e-01,  4.8954999e-01,  5.3410256e-01,  6.0246080e-01,
         7.0914859e-01,  8.8593149e-01,  9.3941510e-01,  1.0000000e+00],
       dtype=float32),
 {})

### **Observation Space**
The observation is a vector of shape (24,), where each value carries different information about the walker:

- **Hull Angle**: The tilt of the walker's main body.
- **Hull Angular Velocity**: The rate at which the hull angle changes.
- **Horizontal Speed**: The speed at which the walker is moving horizontally.
- **Vertical Speed**: The speed at which the walker is moving vertically.
- **Position of Joints**: The positions (angles) of the walker's joints. The walker has 4 joints, so these take up 4 values.
- **Joints Angular Speed**: The rate of change of the angular position of each joint. Again, 4 values for the 4 joints.
- **Legs Contact with Ground**: Whether each leg is in contact with the ground. With two legs, this takes up 2 values.
- **10 Lidar Rangefinder Measurements**: Distance measurements that detect obstacles or terrain features around the walker. There are 10 of these values.
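
To make the layout above concrete, here is a minimal sketch that splits a 24-value observation into named parts. The index layout and all names (`OBS_LAYOUT`, `split_observation`) are assumptions for illustration: they follow the ordering that appears consistent with the `env.reset()` output above (values of 1.0 at indices 8 and 13 for leg contact, lidar in the last 10 slots), so verify them against your installed Gymnasium version.

```python
import numpy as np

# Assumed index layout (hypothetical names, not an official API).
OBS_LAYOUT = {
    "hull_angle": slice(0, 1),
    "hull_angular_velocity": slice(1, 2),
    "horizontal_speed": slice(2, 3),
    "vertical_speed": slice(3, 4),
    "leg1_joints": slice(4, 8),    # hip angle, hip speed, knee angle, knee speed
    "leg1_contact": slice(8, 9),
    "leg2_joints": slice(9, 13),   # same four values for the second leg
    "leg2_contact": slice(13, 14),
    "lidar": slice(14, 24),        # 10 rangefinder readings
}


def split_observation(obs):
    """Return a dict of named views into a single 24-value observation."""
    obs = np.asarray(obs, dtype=np.float32)
    if obs.shape != (24,):
        raise ValueError(f"expected shape (24,), got {obs.shape}")
    return {name: obs[idx] for name, idx in OBS_LAYOUT.items()}


# Usage with a dummy observation whose values equal their indices.
parts = split_observation(np.arange(24))
```

With a real environment, `split_observation(env.reset()[0])` would give the same dictionary for the initial state.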


In [5]:
print("_____OBSERVATION SPACE_____ \n")
print("Observation Space Shape", env.observation_space.shape)
print("Sample observation", env.observation_space.sample()) # Get a random observation

_____OBSERVATION SPACE_____ 

Observation Space Shape (24,)
Sample observation [-9.7903603e-01 -1.8750577e+00 -4.8076162e+00  1.7540438e+00
  3.1009738e+00 -2.5096481e+00  6.7661935e-01  3.2315063e+00
  1.5782069e+00  2.2270032e-03 -1.1264281e+00 -1.3746770e-01
  2.5118129e+00  4.2006259e+00  2.2694233e-01 -3.6284384e-01
  4.4850060e-01  7.5454599e-01 -3.2949325e-01 -4.7509086e-01
  7.4521214e-01 -2.3470896e-01  8.8273644e-01 -6.8168813e-01]


### **Action Space**

Actions are motor speed values in the [-1, 1] range for each of the 4 joints: the hip and knee of both legs.
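
As a small illustration of the [-1, 1] bound, the sketch below clips an arbitrary 4-value action into the valid range before it would be passed to `env.step`. The helper name `clip_action` is hypothetical and the function only uses NumPy.

```python
import numpy as np


def clip_action(raw_action, low=-1.0, high=1.0):
    """Clip a 4-value action into the [low, high] range of the action space."""
    action = np.asarray(raw_action, dtype=np.float32)
    if action.shape != (4,):
        raise ValueError(f"expected shape (4,), got {action.shape}")
    return np.clip(action, low, high)


# Usage: out-of-range values are pulled back to the bounds.
clipped = clip_action([2.0, -3.0, 0.5, 0.0])
```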

In [6]:
print("\n _____ACTION SPACE_____ \n")
print("Action Space Shape", env.action_space.shape)
print("Action Space Sample", env.action_space.sample()) # Take a random action


 _____ACTION SPACE_____ 

Action Space Shape (4,)
Action Space Sample [0.6810404  0.64218205 0.6964488  0.276566  ]


### **Vectorized Environment**
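Create a vectorized environment, which stacks multiple independent environments into a single environment. Running 16 environments side by side gives the agent more diverse experience in each rollout. Note that this training environment uses the default terrain, whereas the replay videos later in the notebook are recorded with `hardcore=True`.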

In [7]:
from stable_baselines3.common.env_util import make_vec_env
env = make_vec_env('BipedalWalker-v3', n_envs=16)
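
To illustrate the idea of stacking independent environments, here is a toy sketch. It is not how Stable-Baselines3 implements `make_vec_env`; `CounterEnv` and `ToyVecEnv` are hypothetical classes that only show how per-environment observations are stacked into one batch.

```python
import numpy as np


class CounterEnv:
    """Hypothetical toy environment: the observation is a running sum of actions."""

    def __init__(self):
        self.total = 0.0

    def reset(self):
        self.total = 0.0
        return np.array([self.total], dtype=np.float32)

    def step(self, action):
        self.total += float(action)
        return np.array([self.total], dtype=np.float32)


class ToyVecEnv:
    """Steps several independent environments and stacks their observations."""

    def __init__(self, n_envs):
        self.envs = [CounterEnv() for _ in range(n_envs)]

    def reset(self):
        return np.stack([env.reset() for env in self.envs])

    def step(self, actions):
        # One action per sub-environment, one stacked observation batch back.
        return np.stack([env.step(a) for env, a in zip(self.envs, actions)])


vec = ToyVecEnv(n_envs=16)
first_obs = vec.reset()              # shape (16, 1)
next_obs = vec.step(np.arange(16))   # each sub-environment gets its own action
```

The policy then sees a batch of 16 observations per step instead of one, which is where the extra diversity comes from.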

# **2. Building the Model**

In [8]:
from stable_baselines3 import PPO
model = PPO(
    policy = 'MlpPolicy',
    env = env,
    n_steps = 2048,
    batch_size = 128,
    n_epochs = 6,
    gamma = 0.999,
    gae_lambda = 0.98,
    ent_coef = 0.01,
    verbose=1)

Using cuda device
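
The hyperparameters above interact with the 16 vectorized environments. The following sketch works out the rollout size and the number of minibatch updates per rollout. The helper `ppo_update_counts` is a hypothetical name, and the arithmetic assumes the rollout size is evenly divisible by `batch_size`, which holds for the values used here.

```python
def ppo_update_counts(n_steps, n_envs, batch_size, n_epochs):
    """Rollout size and number of minibatch updates for one PPO iteration."""
    rollout_size = n_steps * n_envs                      # transitions per rollout
    minibatches_per_epoch = rollout_size // batch_size   # assumes even division
    return {
        "rollout_size": rollout_size,
        "minibatches_per_epoch": minibatches_per_epoch,
        "gradient_steps_per_rollout": minibatches_per_epoch * n_epochs,
    }


# Values taken from the PPO configuration in this notebook.
counts = ppo_update_counts(n_steps=2048, n_envs=16, batch_size=128, n_epochs=6)
```

With these settings each rollout holds 32768 transitions, matching the `total_timesteps` value reported in the training logs.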


# **3. Video Generation**

In [9]:
from wasabi import Printer
import numpy as np
from stable_baselines3.common.base_class import BaseAlgorithm
from pathlib import Path
import tempfile
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import (
    DummyVecEnv,
    VecEnv,
    VecVideoRecorder,
)

In [10]:
msg = Printer()

In [11]:
def generate_replay(
    model: BaseAlgorithm,
    eval_env: VecEnv,
    video_length: int,
    is_deterministic: bool,
    local_path: Path,
):
    """
    Generate a replay video of the agent
    :param model: trained model
    :param eval_env: environment used to evaluate the agent
    :param video_length: length of the video (in timesteps)
    :param is_deterministic: use deterministic or stochastic actions
    :param local_path: path of the local repository
    """
    # Use a temporary directory for the raw video output: SB3 creates
    # -step-0-to-... meta files and other artifacts that we don't want
    # in the repo.
    with tempfile.TemporaryDirectory() as tmpdirname:
        # Step 1: Create the VecVideoRecorder
        env = VecVideoRecorder(
            eval_env,
            tmpdirname,
            record_video_trigger=lambda x: x == 0,
            video_length=video_length,
            name_prefix="",
        )

        obs = env.reset()
        lstm_states = None
        episode_starts = np.ones((env.num_envs,), dtype=bool)

        try:
            for _ in range(video_length):
                action, lstm_states = model.predict(
                    obs,
                    state=lstm_states,
                    episode_start=episode_starts,
                    deterministic=is_deterministic,
                )
                obs, _, episode_starts, _ = env.step(action)

            # Save the video
            env.close()

            # Convert the video with x264 codec
            inp = env.video_recorder.path
            out = local_path
            os.system(f"ffmpeg -y -i {inp} -vcodec h264 {out}")
            print(f"Video saved to: {out}")
        except KeyboardInterrupt:
            pass
        except Exception as e:
            msg.fail(str(e))
            # Add a message for video
            msg.fail(
                "We are unable to generate a replay of your agent"
            )
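
The conversion step in `generate_replay` shells out to ffmpeg through `os.system`. One possible alternative, sketched below, passes the arguments as a list to `subprocess.run`, which avoids quoting problems when paths contain spaces. The helper names `build_ffmpeg_command` and `convert_to_h264` are hypothetical; the ffmpeg flags are the same ones used in the function above.

```python
import subprocess


def build_ffmpeg_command(inp, out):
    """Build the ffmpeg argument list for re-encoding a video with h264."""
    return ["ffmpeg", "-y", "-i", str(inp), "-vcodec", "h264", str(out)]


def convert_to_h264(inp, out):
    """Run ffmpeg and report whether it exited successfully."""
    result = subprocess.run(
        build_ffmpeg_command(inp, out),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


# Usage: inspect the command without running it.
cmd = build_ffmpeg_command("/tmp/in put.mp4", "/content/videos/out.mp4")
```

Calling `convert_to_h264(inp, out)` in place of the `os.system` line would also make failures visible through the return value.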

# **4. Training, Saving, and Recording Videos**

In [12]:
import os

In [13]:
# Create a directory to save the videos
video_dir = "/content/videos"
os.makedirs(video_dir, exist_ok=True)

In [14]:
env_id = "BipedalWalker-v3"
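# Train in 20 rounds and record a replay after each one; adjust the timesteps to your liking.
# Note: each learn() call collects at least one full rollout
# (n_steps * n_envs = 2048 * 16 = 32768 timesteps), even though only 1000 are requested.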
for i in range(0, 20000, 1000):
    model.learn(total_timesteps=1000)
    # Save the model
    model_name = "ppo-BipedalWalker-v3"
    model.save(model_name)
    video_name = f"replay_{i + 1000}.mp4"
    generate_replay(
        model=model,
        eval_env=DummyVecEnv([lambda: Monitor(gym.make(env_id, hardcore=True, render_mode="rgb_array"))]),
        video_length=100,
        is_deterministic=True,
        local_path=os.path.join(video_dir, video_name)
    )

model_name = "ppo-BipedalWalker-v3"
model.save(model_name)


---------------------------------
| rollout/           |          |
|    ep_len_mean     | 386      |
|    ep_rew_mean     | -111     |
| time/              |          |
|    fps             | 2624     |
|    iterations      | 1        |
|    time_elapsed    | 12       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpcpib68a2/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmpcpib68a2/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmpcpib68a2/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmpcpib68a2/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_1000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 359      |
|    ep_rew_mean     | -111     |
| time/              |          |
|    fps             | 2845     |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpkhcprw8r/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmpkhcprw8r/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmpkhcprw8r/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmpkhcprw8r/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_2000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 486      |
|    ep_rew_mean     | -110     |
| time/              |          |
|    fps             | 2820     |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmp4wowx2qi/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmp4wowx2qi/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmp4wowx2qi/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmp4wowx2qi/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_3000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 582      |
|    ep_rew_mean     | -109     |
| time/              |          |
|    fps             | 2896     |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmppdbplgcd/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmppdbplgcd/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmppdbplgcd/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmppdbplgcd/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_4000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 633      |
|    ep_rew_mean     | -107     |
| time/              |          |
|    fps             | 2797     |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpefptgv6h/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmpefptgv6h/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmpefptgv6h/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmpefptgv6h/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_5000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 974      |
|    ep_rew_mean     | -103     |
| time/              |          |
|    fps             | 2804     |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpq3_0a_a0/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmpq3_0a_a0/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmpq3_0a_a0/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmpq3_0a_a0/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_6000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.01e+03 |
|    ep_rew_mean     | -103     |
| time/              |          |
|    fps             | 2801     |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpakny_2ls/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmpakny_2ls/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmpakny_2ls/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmpakny_2ls/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_7000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.13e+03 |
|    ep_rew_mean     | -104     |
| time/              |          |
|    fps             | 2853     |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpuhj2a4bn/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmpuhj2a4bn/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmpuhj2a4bn/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmpuhj2a4bn/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_8000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.51e+03 |
|    ep_rew_mean     | -100     |
| time/              |          |
|    fps             | 2869     |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpd74fovt8/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmpd74fovt8/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmpd74fovt8/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmpd74fovt8/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_9000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.3e+03  |
|    ep_rew_mean     | -96      |
| time/              |          |
|    fps             | 2806     |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpbfx_eb0c/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmpbfx_eb0c/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmpbfx_eb0c/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmpbfx_eb0c/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_10000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.23e+03 |
|    ep_rew_mean     | -90.2    |
| time/              |          |
|    fps             | 2927     |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmplrqpjjrh/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmplrqpjjrh/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmplrqpjjrh/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmplrqpjjrh/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_11000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.43e+03 |
|    ep_rew_mean     | -83.5    |
| time/              |          |
|    fps             | 2795     |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpyi28e_yb/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmpyi28e_yb/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmpyi28e_yb/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmpyi28e_yb/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_12000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.43e+03 |
|    ep_rew_mean     | -78.2    |
| time/              |          |
|    fps             | 2798     |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpygph3mhh/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmpygph3mhh/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmpygph3mhh/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmpygph3mhh/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_13000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.51e+03 |
|    ep_rew_mean     | -72.2    |
| time/              |          |
|    fps             | 2822     |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpb6455bz7/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmpb6455bz7/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmpb6455bz7/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmpb6455bz7/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_14000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.36e+03 |
|    ep_rew_mean     | -68.6    |
| time/              |          |
|    fps             | 2879     |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmp3fglwcf9/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmp3fglwcf9/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmp3fglwcf9/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmp3fglwcf9/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_15000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.36e+03 |
|    ep_rew_mean     | -67.2    |
| time/              |          |
|    fps             | 2902     |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpaytk6yvr/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmpaytk6yvr/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmpaytk6yvr/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmpaytk6yvr/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_16000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.6e+03  |
|    ep_rew_mean     | -52.8    |
| time/              |          |
|    fps             | 2878     |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpmky8p08l/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmpmky8p08l/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmpmky8p08l/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmpmky8p08l/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_17000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.36e+03 |
|    ep_rew_mean     | -59.6    |
| time/              |          |
|    fps             | 3052     |
|    iterations      | 1        |
|    time_elapsed    | 10       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmplhr9r1n6/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmplhr9r1n6/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmplhr9r1n6/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmplhr9r1n6/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_18000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.42e+03 |
|    ep_rew_mean     | -59.7    |
| time/              |          |
|    fps             | 2933     |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmplgjo1e9a/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmplgjo1e9a/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmplgjo1e9a/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmplgjo1e9a/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_19000.mp4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.6e+03  |
|    ep_rew_mean     | -36.7    |
| time/              |          |
|    fps             | 2973     |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 32768    |
---------------------------------
Saving video to /tmp/tmpym2eth9p/-step-0-to-step-100.mp4
Moviepy - Building video /tmp/tmpym2eth9p/-step-0-to-step-100.mp4.
Moviepy - Writing video /tmp/tmpym2eth9p/-step-0-to-step-100.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmpym2eth9p/-step-0-to-step-100.mp4
Video saved to: /content/videos/replay_20000.mp4
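
Every log table above reports `total_timesteps` of 32768 with `iterations` equal to 1, even though each `learn` call requests only 1000 timesteps. The sketch below reproduces that arithmetic under the assumption that the timestep counter restarts on every `learn` call and that training always finishes the rollout it has started. The helper name `timesteps_collected` is hypothetical.

```python
import math


def timesteps_collected(total_timesteps, n_steps, n_envs):
    """Timesteps actually gathered when rollouts cannot be cut short."""
    rollout_size = n_steps * n_envs
    return math.ceil(total_timesteps / rollout_size) * rollout_size


per_call = timesteps_collected(total_timesteps=1000, n_steps=2048, n_envs=16)
n_calls = len(range(0, 20000, 1000))   # the training loop runs 20 times
overall = per_call * n_calls
```

Under these assumptions the whole loop gathers 655360 environment steps rather than the 20000 that the loop bounds suggest.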


In [15]:
with open(os.path.join(video_dir, "filelist.txt"), "w") as f:
    for i in range(0, 20000, 1000):
        video_name = f"replay_{i + 1000}.mp4"
        f.write(f"file '{os.path.join(video_dir, video_name)}'\n")
# Concatenate all the videos into one
os.system(f"ffmpeg -f concat -safe 0 -i {os.path.join(video_dir, 'filelist.txt')} -c copy {os.path.join(video_dir, 'replay_all.mp4')}")

0
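
The file list consumed by ffmpeg's concat demuxer is plain text with one `file '<path>'` entry per line. As a sketch, the entries written by the cell above can be generated by a small helper; `concat_list_lines` is a hypothetical name, and `PurePosixPath` is used on the assumption that the notebook runs on a Linux runtime such as Colab.

```python
from pathlib import PurePosixPath


def concat_list_lines(video_dir, start=1000, stop=20000, step=1000):
    """Return the lines of an ffmpeg concat list for replay_<n>.mp4 files."""
    lines = []
    for n in range(start, stop + 1, step):
        path = PurePosixPath(video_dir) / f"replay_{n}.mp4"
        lines.append(f"file '{path}'")
    return lines


lines = concat_list_lines("/content/videos")
```

Writing `"\n".join(lines) + "\n"` to `filelist.txt` yields the same content as the loop in the cell above.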

# **5. Visualize Final Video**

In [16]:
from IPython.display import HTML
from base64 import b64encode
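# Read the concatenated video; the context manager closes the file afterwards
with open(os.path.join(video_dir, "replay_all.mp4"), "rb") as f:
    mp4 = f.read()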
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=600 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

# **6. Evaluate the Model**

In [17]:
from stable_baselines3.common.evaluation import evaluate_policy

In [18]:
eval_env = Monitor(gym.make("BipedalWalker-v3"))
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

mean_reward=75.39 +/- 5.649985915077246
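
The two numbers printed above summarize the per-episode returns of the evaluation run. As a minimal sketch of that summary, assuming the mean and the population standard deviation are taken over the episode rewards, the calculation looks like this. The reward values are made up for illustration and are not results from this notebook.

```python
import numpy as np


def summarize_rewards(episode_rewards):
    """Mean and population standard deviation of per-episode rewards."""
    rewards = np.asarray(episode_rewards, dtype=np.float64)
    return float(rewards.mean()), float(rewards.std())


# Hypothetical episode rewards, for illustration only.
mean_r, std_r = summarize_rewards([70.0, 80.0, 75.0, 75.0])
print(f"mean_reward={mean_r:.2f} +/- {std_r:.2f}")
```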


# **7. Upload to HuggingFace**

In [19]:
from huggingface_sb3 import load_from_hub, package_to_hub
from huggingface_hub import notebook_login # To log to our Hugging Face account to be able to upload models to the Hub.

In [20]:
notebook_login()
!git config --global credential.helper store


In [21]:
env_id = "BipedalWalker-v3"
model_name = "ppo-BipedalWalker-v3"
model_architecture = "PPO"

repo_id = "YoungMeng/ppo-BipedalWalker-test" # Change with your repo id

# Define the commit message
commit_message = "Upload PPO BipedalWalker-v3 trained agent"

# Create the evaluation env and set the render_mode="rgb_array"
eval_env = DummyVecEnv([lambda: gym.make(env_id, hardcore=True, render_mode="rgb_array")])

package_to_hub(model=model, # trained model
               model_name=model_name, # The name of our trained model
               model_architecture=model_architecture, # The model architecture we used: in our case PPO
               env_id=env_id, # Name of the environment
               eval_env=eval_env,
               repo_id=repo_id,
               commit_message=commit_message)

ℹ This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to 1min.
This is a work in progress: if you encounter a bug, please open an issue.


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Saving video to /tmp/tmpmy_aznmd/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmpmy_aznmd/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmpmy_aznmd/-step-0-to-step-1000.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmpmy_aznmd/-step-0-to-step-1000.mp4
ℹ Pushing repo YoungMeng/ppo-BipedalWalker-test to the Hugging Face Hub


HfHubHTTPError:  (Request ID: Root=1-67111468-209c67e700bca5e45b826f18;1501dead-651d-41e6-a49f-82f118145dec)

403 Forbidden: Authorization error..
Cannot access content at: https://huggingface.co/YoungMeng/ppo-BipedalWalker-test.git/info/lfs/objects/batch.
If you are trying to create or update content, make sure you have a token with the `write` role.

# **8. Load Models from HuggingFace (Optional)**

In [22]:
from huggingface_sb3 import load_from_hub
repo_id = "YoungMeng/ppo-BipedalWalker-test" # The repo_id
filename = "ppo-BipedalWalker-v3.zip" # The model filename.zip

checkpoint = load_from_hub(repo_id, filename)
model = PPO.load(checkpoint, print_system_info=True)

ppo-BipedalWalker-v3.zip:   0%|          | 0.00/175k [00:00<?, ?B/s]

== CURRENT SYSTEM INFO ==
- OS: Linux-6.1.85+-x86_64-with-glibc2.35 # 1 SMP PREEMPT_DYNAMIC Thu Jun 27 21:05:47 UTC 2024
- Python: 3.10.12
- Stable-Baselines3: 2.0.0a5
- PyTorch: 2.4.1+cu121
- GPU Enabled: True
- Numpy: 1.26.4
- Cloudpickle: 2.2.1
- Gymnasium: 0.28.1
- OpenAI Gym: 0.25.2

== SAVED MODEL SYSTEM INFO ==
- OS: Linux-5.15.120+-x86_64-with-glibc2.35 # 1 SMP Wed Aug 30 11:19:59 UTC 2023
- Python: 3.10.12
- Stable-Baselines3: 2.0.0a5
- PyTorch: 2.0.1+cu118
- GPU Enabled: False
- Numpy: 1.23.5
- Cloudpickle: 2.2.1
- Gymnasium: 0.28.1
- OpenAI Gym: 0.25.2





In [23]:
eval_env = Monitor(gym.make("BipedalWalker-v3", hardcore=True))
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

mean_reward=-94.85 +/- 28.125395429640534
