<a href="https://colab.research.google.com/github/wohecha/HuggingFace-Unit1/blob/main/notebooks/unit5/unit5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Unit 5: An Introduction to ML-Agents




### 🎮 Environments:

- [Pyramids](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Examples.md#pyramids)
- SnowballTarget

### 📚 RL-Library:

- [ML-Agents](https://github.com/Unity-Technologies/ml-agents)


We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the GitHub Repo](https://github.com/huggingface/deep-rl-class/issues).

## Objectives of this notebook 🏆

At the end of the notebook, you will:

- Understand how **ML-Agents**, the environment library, works.
- Be able to **train agents in Unity Environments**.


## Prerequisites 🏗️
Before diving into the notebook, you need to:

🔲 📚 **Study [what ML-Agents is and how it works by reading Unit 5](https://huggingface.co/deep-rl-course/unit5/introduction)**  🤗  

# Let's train our agents 🚀

**To validate this hands-on for the certification process, you just need to push your trained models to the Hub**. There are no minimum results to attain to validate this one. But if you want to get good results, you can aim for:

- For `Pyramids` : Mean Reward = 1.75
- For `SnowballTarget` : Mean Reward = 15 or 30 targets hit in an episode.


## Set the GPU 💪
- To **accelerate the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step1.jpg" height='300' style="height:200px;" alt="GPU Step 1">
<!-- height='300' has effect on colab-->
<!-- style="height:200px" has effect on github md rendering -->


- `Hardware Accelerator > GPU`

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step2.jpg" style="height:200px;" height="300" alt="GPU Step 2">
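To confirm that the runtime change took effect, a quick check can help. This is a minimal sketch that assumes an NVIDIA GPU (as on Colab GPU runtimes) and simply looks for a working `nvidia-smi`; `gpu_available` is a hypothetical helper, not part of ML-Agents.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Heuristic: True if `nvidia-smi` exists and lists at least one GPU."""
    # nvidia-smi ships with the NVIDIA driver; CPU-only runtimes usually lack it.
    if shutil.which("nvidia-smi") is None:
        return False
    # `nvidia-smi -L` prints one "GPU <n>: ..." line per visible device.
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU runtime detected:", gpu_available())
```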

## Clone the repository 🔽

- We need to clone the repository that contains **ML-Agents**.

In [None]:
%%capture
# Clone the repository (can take 3min)
!git clone --depth 1 https://github.com/Unity-Technologies/ml-agents

## Set Up the Virtual Environment 🔽
- For **ML-Agents** to run successfully in Colab, Colab's Python version must meet the library's Python requirements.

- We can check for the supported Python version under the `python_requires` parameter in the `setup.py` files. These files are required to set up the **ML-Agents** library for use and can be found in the following locations:
  - `/content/ml-agents/ml-agents/setup.py`
  - `/content/ml-agents/ml-agents-envs/setup.py`

- Colab's current Python version (check it with `!python --version`) does not match the library's `python_requires` parameter. As a result, installation may silently fail, which leads to errors like these when running the commands later:
  - `/bin/bash: line 1: mlagents-learn: command not found`
  - `/bin/bash: line 1: mlagents-push-to-hf: command not found`

- To resolve this, we'll create a virtual environment with a Python version compatible with the **ML-Agents** library.

`Note:` *For future compatibility, always check the `python_requires` parameter in the installation files. If Colab's Python version is not compatible, set the Python version in the script below to the maximum version supported.*

In [None]:
# Colab's Current Python Version (Incompatible with ML-Agents)
!python --version

In [None]:
# Example error when installing with Colab's default Python:
# Package 'mlagents' requires a different Python: 3.13.11 not in '<=3.10.12,>=3.10.1'
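The compatibility check described above can also be done from Python. The sketch below is a simplified stand-in for pip's check: it only understands plain numeric versions with `>=`, `<=`, `>`, `<`, `==`, `!=` (no wildcards or pre-releases), and the helper names are hypothetical. The spec string is the one quoted in the error message above.

```python
import operator
import re
import sys

OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
       "<": operator.lt, "==": operator.eq, "!=": operator.ne}


def parse_version(v: str) -> tuple:
    """'3.10' -> (3, 10, 0); only the first three numeric parts are kept."""
    parts = [int(p) for p in v.strip().split(".")[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version: str, spec: str) -> bool:
    """Check a version against a comma-separated spec like '<=3.10.12,>=3.10.1'."""
    v = parse_version(version)
    for clause in spec.split(","):
        m = re.fullmatch(r"\s*(>=|<=|==|!=|>|<)\s*([\d.]+)\s*", clause)
        if m is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = m.groups()
        if not OPS[op](v, parse_version(bound)):
            return False
    return True


def read_python_requires(setup_py_text: str) -> str:
    """Pull the python_requires="..." value out of a setup.py source string."""
    m = re.search(r"python_requires\s*=\s*[\"']([^\"']+)[\"']", setup_py_text)
    if m is None:
        raise ValueError("python_requires not found")
    return m.group(1)


spec = "<=3.10.12,>=3.10.1"  # from the error message above
current = ".".join(str(p) for p in sys.version_info[:3])
print(f"Python {current} satisfies {spec!r}: {satisfies(current, spec)}")
```

With the repository cloned, `read_python_requires(open("/content/ml-agents/ml-agents/setup.py").read())` should return the spec to check, assuming `python_requires` is written there as a plain string literal.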

In [None]:
# Install virtualenv and create a virtual environment (note: `myenv` is not activated by later cells; the Python switch below is done with Miniconda)
!pip install virtualenv
!virtualenv myenv

# Download and install Miniconda
!wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
!chmod +x Miniconda3-latest-Linux-x86_64.sh
!./Miniconda3-latest-Linux-x86_64.sh -b -f -p /usr/local

In [None]:
# accept terms of service
!conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
!conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

In [None]:
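# Activate Miniconda and install Python 3.10.12 into /usr/local
# Note: each `!` command runs in its own subshell, so `!source` and `!export`
# do not persist to later cells; use %env if a variable must persist.
!source /usr/local/bin/activate
!conda install -q -y --prefix /usr/local python=3.10.12 ujson  # Specify the version here

# Set environment variables for Python and conda paths (no lasting effect with `!export`, see note above)
!export PYTHONPATH=/usr/local/lib/python3.10/site-packages/
!export CONDA_PREFIX=/usr/local/envs/myenv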

In [None]:
# Python Version in New Virtual Environment (Compatible with ML-Agents)
!python --version
#should be 3.10.xx

## Installing the dependencies üîΩ

In [None]:
!pwd
# change notebooks directory (return to root)
%cd /content/


In [None]:
%%capture
# Go inside the repository and install the package (can take 3min)
%cd /content/ml-agents
!pip3 install -e ./ml-agents-envs
!pip3 install -e ./ml-agents

## SnowballTarget ⛄

If you need a refresher on how this environment works, check this section 👉
https://huggingface.co/deep-rl-course/unit5/snowball-target

### Download and move the environment zip file into `./training-envs-executables/linux/`
- Our environment executable is in a zip file.
- We need to download it and place it in `./training-envs-executables/linux/`
- We use a Linux executable because Colab machines run Ubuntu (Linux).

In [None]:
# Here, we create training-envs-executables and linux
!mkdir -p ./training-envs-executables/linux

Download the file SnowballTarget.zip from https://github.com/huggingface/Snowball-Target using `wget`

In [None]:
!wget "https://github.com/huggingface/Snowball-Target/raw/main/SnowballTarget.zip" -O ./training-envs-executables/linux/SnowballTarget.zip

Unzip the SnowballTarget.zip file

In [None]:
%%capture
!unzip -d ./training-envs-executables/linux/ ./training-envs-executables/linux/SnowballTarget.zip

Make sure your file is accessible

In [None]:
!chmod -R 755 ./training-envs-executables/linux/SnowballTarget

In [None]:
!ls -lah ./training-envs-executables/linux/
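The `mkdir`, `wget`, `unzip` and `chmod` cells above can also be written in plain Python. This is a minimal sketch using only the standard library; `download` and `extract_executable` are hypothetical helper names. One detail it makes explicit: Python's `zipfile` does not restore Unix permission bits, so in this version the `chmod 755` step is what makes the extracted binary executable.

```python
import os
import urllib.request
import zipfile
from pathlib import Path


def download(url: str, out_path: str) -> str:
    """Download url to out_path (like the wget cell), creating parent folders."""
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, out_path)
    return out_path


def extract_executable(zip_path: str, dest_dir: str) -> list:
    """Unzip an environment build and give every extracted entry mode 755."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        zf.extractall(dest_dir)
    extracted = []
    for name in names:
        path = os.path.join(dest_dir, name)
        os.chmod(path, 0o755)  # same effect as `chmod -R 755` on the extracted tree
        extracted.append(path)
    return extracted


# Usage (not run here, it downloads the real build):
# env_dir = "/content/ml-agents/training-envs-executables/linux"
# zip_path = download("https://github.com/huggingface/Snowball-Target/raw/main/SnowballTarget.zip",
#                     f"{env_dir}/SnowballTarget.zip")
# extract_executable(zip_path, env_dir)
```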


### Define the SnowballTarget config file
- In ML-Agents, you define the **training hyperparameters in config.yaml files.**

There are multiple hyperparameters. To understand them better, check the explanation of each one in [the documentation](https://github.com/Unity-Technologies/ml-agents/blob/release_20_docs/docs/Training-Configuration-File.md)


So you need to create a `SnowballTarget.yaml` config file in /content/ml-agents/config/ppo/

We give you a first version of this config below (to copy and paste into your `SnowballTarget.yaml` file), **but you should modify it**.

More information about the config file can be found here:  
https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-Configuration-File.md

In [None]:
%cd /content/ml-agents/config/ppo/
!pwd

In [None]:
fname="SnowballTarget.yaml"
with open(fname, "w", encoding="utf-8") as f:
    f.write("""
behaviors:
  SnowballTarget:                    # behavior name
    trainer_type: ppo                # ppo: Proximal Policy Optimization
    summary_freq: 10000
    keep_checkpoints: 10             # maximum number of checkpoints (.onnx) to keep
    checkpoint_interval: 50000       # save a checkpoint every n steps
    max_steps: 200000                # total training steps
    time_horizon: 64
    threaded: false
    hyperparameters:
      learning_rate: 0.0003         # alpha: learning rate
      learning_rate_schedule: linear
      batch_size: 128
      buffer_size: 2048
      beta: 0.005                   # PPO: entropy regularization strength (typical range 1e-4 to 1e-2)
      epsilon: 0.2                  # PPO: clipping threshold for policy updates
      lambd: 0.95                   # lambda for Generalized Advantage Estimation (GAE)
      num_epoch: 3
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
    reward_signals:                  # only use extrinsic reward (no curiosity)
      extrinsic:
        gamma: 0.99                  # gamma: discount rate
        strength: 1.0
""")

with open(fname, "r", encoding="utf-8") as f:
    print(f.read())
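Before training, it can help to sanity-check the values. The sketch below (hypothetical helper `check_ppo_hyperparameters`) compares a few PPO settings against the typical ranges listed in the ML-Agents Training Configuration File docs as we recall them, plus the docs' advice that `buffer_size` be a multiple of `batch_size`. Treat the ranges as assumptions to verify against the docs for your release, not hard limits.

```python
def check_ppo_hyperparameters(hp: dict) -> list:
    """Return human-readable warnings for a PPO `hyperparameters` section."""
    warnings = []
    if hp["buffer_size"] % hp["batch_size"] != 0:
        warnings.append("buffer_size should be a multiple of batch_size")
    # Assumed 'typical ranges' (guidelines only, check the docs for your version).
    typical = {
        "beta": (1e-4, 1e-2),   # entropy regularization strength
        "epsilon": (0.1, 0.3),  # PPO clipping threshold
        "lambd": (0.9, 0.95),   # GAE lambda
        "num_epoch": (3, 10),   # passes over the buffer per update
    }
    for key, (lo, hi) in typical.items():
        if key in hp and not lo <= hp[key] <= hi:
            warnings.append(f"{key}={hp[key]} is outside the typical range [{lo}, {hi}]")
    return warnings


snowball_hp = {  # copied from the SnowballTarget.yaml written above
    "learning_rate": 0.0003, "batch_size": 128, "buffer_size": 2048,
    "beta": 0.005, "epsilon": 0.2, "lambd": 0.95, "num_epoch": 3,
}
print(check_ppo_hyperparameters(snowball_hp) or "no warnings")
```

To check the file itself instead of a copied dict, `yaml.safe_load` from PyYAML (an ML-Agents dependency, so it should already be installed) can read `SnowballTarget.yaml`, and its `["behaviors"]["SnowballTarget"]["hyperparameters"]` mapping can be passed in.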


As an experiment, you should also try to modify some other hyperparameters. Unity provides very [good documentation explaining each of them here](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-Configuration-File.md).

Now that you've created the config file and understand what most hyperparameters do, we're ready to train our agent 🔥.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### Train the agent

To train our agent, we just need to **launch mlagents-learn and select the executable containing the environment.**

We define four parameters:

1. `mlagents-learn <config>`: the path to the hyperparameter config file.
2. `--env`: where the environment executable is.
3. `--run-id`: the name you want to give to your training run.
4. `--no-graphics`: to not launch the visualization during the training.
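The four parameters above map directly onto a command line. As a sketch, a hypothetical helper `build_learn_cmd` can assemble them (plus the optional `--resume` flag mentioned below) as an argument list, which is convenient for printing or for `subprocess.run`; the paths in the usage are the ones this notebook uses.

```python
import shlex


def build_learn_cmd(config: str, env: str, run_id: str,
                    no_graphics: bool = True, resume: bool = False) -> list:
    """Assemble the mlagents-learn arguments described above."""
    cmd = ["mlagents-learn", config, f"--env={env}", f"--run-id={run_id}"]
    if no_graphics:
        cmd.append("--no-graphics")  # don't launch the visualization while training
    if resume:
        cmd.append("--resume")       # continue an interrupted run with the same run id
    return cmd


cmd = build_learn_cmd(
    "/content/ml-agents/config/ppo/SnowballTarget.yaml",
    "/content/ml-agents/training-envs-executables/linux/SnowballTarget/SnowballTarget",
    "SnowballTarget1",
)
print(shlex.join(cmd))
# subprocess.run(cmd, check=True) would start the (long) training run.
```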

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/mlagentslearn.png" height='200' style="height:200px;" alt="MlAgents learn"/>

Train the model and use the `--resume` flag to continue training in case of interruption.

> Using `--resume` may fail the first time; if it does, run the cell again to bypass the error.



The training will take 10 to 35 min depending on your config.  
<u>Note</u>: Mean rewards are printed in the console output as training progresses.
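To track those console mean rewards against the targets given at the top (for example 15 for SnowballTarget), a small parser can help. This sketch assumes each summary line contains `Step: <n>.` followed later on the same line by `Mean Reward: <x>`; that is an assumption about the console format, so adjust the regex to what your output actually prints. It would work on console output saved to a text file (for example with `tee`).

```python
import re

# Assumed format: "... Step: <n>. ... Mean Reward: <x>. ..." on a single line.
SUMMARY_RE = re.compile(r"Step:\s*(\d+)\..*?Mean Reward:\s*(-?\d+(?:\.\d+)?)")


def mean_rewards(log_text: str) -> list:
    """Extract (step, mean_reward) pairs from mlagents-learn console output."""
    return [(int(step), float(reward)) for step, reward in SUMMARY_RE.findall(log_text)]


def reached_target(pairs: list, target: float) -> bool:
    """True once the latest reported mean reward meets the target."""
    return bool(pairs) and pairs[-1][1] >= target
```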

https://www.immersivelimit.com/tutorials/reinforcement-learning-penguins-part-4-unity-ml-agents

In [None]:
import os
!git clone https://github.com/Unity-Technologies/ml-agents
# Import binaries
!mv ../evol.zip .
!unzip evol.zip
!pip install -e ./ml-agents/ml-agents


In [None]:
# Verify that the config file "SnowballTarget.yaml" was written correctly
!cat /content/ml-agents/config/ppo/SnowballTarget.yaml

In [None]:
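# Run from the repository root so the relative --env path and the ./results output folder resolve correctly
%cd /content/ml-agents
!mlagents-learn /content/ml-agents/config/ppo/SnowballTarget.yaml \
--env=./training-envs-executables/linux/SnowballTarget/SnowballTarget \
--run-id="SnowballTarget1" --no-graphics

# A "pkg_resources is deprecated as an API" message may appear; it is only a warning.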

In [None]:
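# Only if training was interrupted: resume the same run id with --resume
%cd /content/ml-agents
!mlagents-learn ./config/ppo/SnowballTarget.yaml --env=./training-envs-executables/linux/SnowballTarget/SnowballTarget --run-id="SnowballTarget1" --no-graphics --resume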

### Push the agent to the 🤗 Hub

- Now that we trained our agent, we’re **ready to push it to the Hub to be able to visualize it playing in your browser 🔥.**

To be able to share your model with the community there are three more steps to follow:

1️⃣ (If it's not already done) create an account on HF ➡ https://huggingface.co/join

2️⃣ Sign in, then store your authentication token from the Hugging Face website.
- Create a new token (https://huggingface.co/settings/tokens) **with write role**

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/create-token.jpg" height='150' style="height:150px;" alt="Create HF Token">

- Copy the token
- Run the cell below and paste the token

In [None]:
from huggingface_hub import notebook_login
notebook_login()

In [None]:
#!huggingface-cli login
#Warning: 'huggingface-cli login' is deprecated. Use 'hf auth login' instead.

If you don't use Google Colab or a Jupyter Notebook, run this command in a terminal instead: `hf auth login` (the older `huggingface-cli login` is deprecated).

Then, we simply need to run `mlagents-push-to-hf`.

We define four parameters:

1. `--run-id`: the name of the training run.
2. `--local-dir`: where the agent was saved; it’s `results/<run_id name>`, so in my case `results/SnowballTarget1`.
3. `--repo-id`: the name of the Hugging Face repo you want to create or update. It’s always `<your huggingface username>/<the repo name>`.
If the repo does not exist, **it will be created automatically**.
4. `--commit-message`: since HF repos are git repositories, you need to define a commit message.

For instance:

```sh
!mlagents-push-to-hf  \
--run-id="SnowballTarget1" \
--local-dir="./results/SnowballTarget1" \
--repo-id="ThomasSimonini/ppo-SnowballTarget"  \
--commit-message="First Push"
```

In [None]:
# model should be located here...
!ls -lah /content/ml-agents/results
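Before pushing, it can save a failed upload to check that the run folder actually contains a model and that the repo id has the `<username>/<repo name>` shape. This is a sketch with hypothetical helpers; it only assumes the run folder holds `.onnx` files somewhere inside it (the watch step below picks `SnowballTarget.onnx`), and it reuses the four `mlagents-push-to-hf` flags listed above.

```python
import glob
import os
import shlex


def find_onnx_models(local_dir: str) -> list:
    """List every .onnx file under a run folder (final model and checkpoints)."""
    return sorted(glob.glob(os.path.join(local_dir, "**", "*.onnx"), recursive=True))


def build_push_cmd(run_id: str, local_dir: str, repo_id: str, commit_message: str) -> list:
    """Assemble mlagents-push-to-hf arguments after basic sanity checks."""
    owner, sep, name = repo_id.partition("/")
    if not (owner and sep and name) or "/" in name:
        raise ValueError(f"repo_id must look like 'username/repo-name', got {repo_id!r}")
    if not find_onnx_models(local_dir):
        raise FileNotFoundError(f"no .onnx model found under {local_dir!r}")
    return ["mlagents-push-to-hf", f"--run-id={run_id}", f"--local-dir={local_dir}",
            f"--repo-id={repo_id}", f"--commit-message={commit_message}"]


# Usage ("your-username" is a placeholder):
# cmd = build_push_cmd("SnowballTarget1", "/content/ml-agents/results/SnowballTarget1",
#                      "your-username/ppo-SnowballTarget", "First Push")
# print(shlex.join(cmd))
```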


In [None]:

#curl -LsSf https://hf.co/cli/install.sh | bash
!hf auth login

In [None]:
# use %cd to move current notebook directory, and not just the subshell with !cd
%cd /content/ml-agents/results/
!pwd

In [None]:
#example:
"""
mlagents-push-to-hf \
--run-id="SnowballTarget1" \
--local-dir="./results/SnowballTarget1" \
--repo-id="ThomasSimonini/ppo-SnowballTarget" \
--commit-message="First Push"
"""

In [None]:
# If you get an error like this one:
#   File "/usr/local/lib/python3.10/site-packages/httpx/_transports/default.py", line 118, in map_httpcore_exceptions
#     raise mapped_exc(message) from exc
#   httpx.ReadTimeout: The read operation timed out
# make sure --local-dir points to the model directory (relative to the current directory).

username="seb-835"            # replace with your Hugging Face username
repo_name="ppo-SnowballTarget"
local_dir="SnowballTarget1"   # relative to /content/ml-agents/results/

!mlagents-push-to-hf \
--run-id="SnowballTarget1" \
--local-dir=$local_dir \
--repo-id=$username/$repo_name \
--commit-message="first commit"


If everything worked, you should see this at the end of the process (but with a different URL 😆):



```
Your model is pushed to the hub. You can view your model here: https://huggingface.co/ThomasSimonini/ppo-SnowballTarget
```

This is the link to your model. It contains a model card that explains how to use it, your TensorBoard logs and your config file. **What’s awesome is that it’s a git repository, which means you can have different commits, update your repository with a new push, etc.**

But now comes the best part: **being able to visualize your agent online 👀.**

### Watch your agent playing 👀

For this step it’s simple:

1. Go here: https://huggingface.co/spaces/ThomasSimonini/ML-Agents-SnowballTarget

2. Launch the game and put it in full screen by clicking on the bottom right button

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/snowballtarget_load.png" height='200' style="height:150px;" alt="Snowballtarget load"/>

3. In step 1, type your username (your username is case sensitive: for instance, my username is ThomasSimonini not thomassimonini or ThOmasImoNInI) and click on the search button.

4. In step 2, select your model repository.

5. In step 3, **choose which model you want to replay**:
  - I have multiple ones, since we saved a model every 50000 timesteps (`checkpoint_interval: 50000` in our config).
  - But since I want the most recent one, I choose `SnowballTarget.onnx`

👉 What’s nice **is to try models from different steps to see how the agent improves.**
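To pick models from different training steps in a predictable order, the checkpoint file names can be sorted by their step number. This sketch assumes checkpoints are named `<BehaviorName>-<step>.onnx` (for example `SnowballTarget-50000.onnx`) and live in a sub-folder of the run folder; both are assumptions, so check the file names in your own `results` folder first.

```python
import os
import re

# Assumed naming: "<BehaviorName>-<step>.onnx"; the final "<BehaviorName>.onnx" has no step.
CKPT_RE = re.compile(r"^(?P<behavior>.+)-(?P<step>\d+)\.onnx$")


def checkpoints_by_step(filenames) -> list:
    """Return (step, filename) pairs for checkpoint files, sorted by training step."""
    found = []
    for name in filenames:
        m = CKPT_RE.match(os.path.basename(name))
        if m:
            found.append((int(m.group("step")), name))
    return sorted(found)


# Usage (folder layout assumed):
# ckpt_dir = "/content/ml-agents/results/SnowballTarget1/SnowballTarget"
# for step, name in checkpoints_by_step(os.listdir(ckpt_dir)):
#     print(step, name)
```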

And don't hesitate to share the best score your agent gets on Discord in the #rl-i-made-this channel 🔥

Let's now try a harder environment called Pyramids...

---

# Pyramids 🏆

---


Download, unzip and move the environment zip file into `./training-envs-executables/linux/`



- Download the Pyramids.zip environment file

In [None]:
# Return to the repository root (the previous cells moved into ./results)
%cd /content/ml-agents
!wget "https://huggingface.co/spaces/unity/ML-Agents-Pyramids/resolve/main/Pyramids.zip" -O ./training-envs-executables/linux/Pyramids.zip

- Unzip the Pyramids.zip file to the desired location


In [None]:
%%capture
!unzip -d ./training-envs-executables/linux/ ./training-envs-executables/linux/Pyramids.zip

- Make sure the executable has read and execute permissions

In [None]:
!chmod -R 755 ./training-envs-executables/linux/Pyramids/Pyramids

### Modify the PyramidsRND config file
- Unlike the first environment, which was a custom one, **Pyramids was made by the Unity team**.
- Therefore, the PyramidsRND config file already exists, at /content/ml-agents/config/ppo/PyramidsRND.yaml
- What does "RND" in PyramidsRND mean?  
RND stands for <b><font color='crimson'>Random Network Distillation</font></b>.  
It's a way to generate curiosity rewards.  
For more information on this technique, please read: https://medium.com/data-from-the-trenches/curiosity-driven-learning-through-random-network-distillation-488ffd8e5938
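The core idea of RND fits in a few lines: a randomly initialised target network is frozen, a predictor network is trained to reproduce its output, and the prediction error serves as the curiosity bonus, so states seen often become less rewarding. The sketch below is a toy with linear "networks" in NumPy to show that mechanism; it is not ML-Agents' implementation (the original technique and ML-Agents use neural networks), and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, feat_dim = 8, 4

W_target = rng.normal(size=(feat_dim, obs_dim))  # random and frozen, never trained
W_pred = rng.normal(size=(feat_dim, obs_dim))    # trained to imitate the target


def intrinsic_reward(obs: np.ndarray) -> float:
    """Curiosity bonus: squared error between predictor and target features."""
    err = W_pred @ obs - W_target @ obs
    return float(err @ err)


def train_predictor(obs: np.ndarray, lr: float = 0.1) -> None:
    """One SGD step on ||W_pred obs - W_target obs||^2 with respect to W_pred."""
    global W_pred
    err = W_pred @ obs - W_target @ obs
    W_pred -= lr * 2.0 * np.outer(err, obs)  # gradient of the squared error


seen = rng.normal(size=obs_dim)
seen /= np.linalg.norm(seen)    # unit-norm observations keep this step size stable
novel = rng.normal(size=obs_dim)
novel /= np.linalg.norm(novel)

before = intrinsic_reward(seen)
for _ in range(20):
    train_predictor(seen)       # the agent keeps visiting the same state
after = intrinsic_reward(seen)
print(f"seen state:  {before:.3f} -> {after:.5f}")
print(f"novel state: {intrinsic_reward(novel):.3f}")
```

The bonus for the repeatedly visited state shrinks, while a state the predictor has not been trained on keeps a larger error, which is what pushes the agent to explore.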

For the training, one thing has to be modified:
- The total training steps hyperparameter (`max_steps`) is too high, since we can hit the benchmark (mean reward = 1.75) in only 1M training steps.  
-> In the file <font color=cyan>config/ppo/PyramidsRND.yaml</font>, set <font color='magenta'>max_steps</font> to <font color=darkviolet>1000000</font>.
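If you prefer to make that change from a cell instead of the file editor, here is a minimal sketch (hypothetical helper `set_max_steps`). It uses a regex rather than a YAML load/dump round trip so the file's comments and layout stay intact; the path in the usage comment is the one given above.

```python
import re
from pathlib import Path


def set_max_steps(path: str, value: int) -> int:
    """Rewrite every `max_steps:` value in a YAML config in place; return the count."""
    text = Path(path).read_text(encoding="utf-8")
    # Keep indentation and key, replace only the value token (comments after it survive).
    new_text, n = re.subn(r"^([ \t]*max_steps:[ \t]*)\S+", rf"\g<1>{value}",
                          text, flags=re.MULTILINE)
    Path(path).write_text(new_text, encoding="utf-8")
    return n


# Usage:
# set_max_steps("/content/ml-agents/config/ppo/PyramidsRND.yaml", 1000000)
```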

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/pyramids-config.png" height='200' style="height:150px;" alt="Pyramids config"/>

As an experiment,  
try to modify some other hyperparameters.  
Unity provides very good [documentation](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-Configuration-File.md) on this topic.

... now we're ready to train our agent 🔥.

### Train the agent
<i>The training will take 30 to 45min depending on your machine</i>.

In [None]:
!mlagents-learn ./config/ppo/PyramidsRND.yaml --env=./training-envs-executables/linux/Pyramids/Pyramids --run-id="Pyramids Training" --no-graphics

### Push the agent to the 🤗 Hub

- Now that we trained our agent, we’re **ready to push it to the Hub to be able to visualize it playing in your browser 🔥.**

In [None]:

#curl -LsSf https://hf.co/cli/install.sh | bash
!hf auth login

In [None]:
# use %cd to move current notebook directory, and not just the subshell with !cd
%cd /content/ml-agents/results/
!pwd

In [None]:
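# Fill in your own values (the run id must match the one used for training above)
run_id = "Pyramids Training"
local_dir = "Pyramids Training"         # relative to /content/ml-agents/results/
repo_id = "your-username/ppo-Pyramids"  # placeholder: <your HF username>/<repo name>
commit_message = "First Push"

!mlagents-push-to-hf --run-id="$run_id" --local-dir="$local_dir" --repo-id="$repo_id" --commit-message="$commit_message"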

In [None]:
#example:
"""
mlagents-push-to-hf \
--run-id="SnowballTarget1" \
--local-dir="./results/SnowballTarget1" \
--repo-id="ThomasSimonini/ppo-SnowballTarget" \
--commit-message="First Push"
"""

### Watch your agent playing 👀

👉 https://huggingface.co/spaces/unity/ML-Agents-Pyramids

# 🎁 Bonus: Why not train on another environment?
Now that you know how to train an agent using ML-Agents, **why not try another environment?**

ML-Agents provides 17 different environments, and we’re building some custom ones. The best way to learn is to try things on your own, have fun.



<img src="https://miro.medium.com/max/1400/0*xERdThTRRM2k_U9f.png" height='350' style="height:200px;" alt="Pyramids config"/>

You have the full list of the Unity official environments here 👉 https://github.com/Unity-Technologies/ml-agents/blob/develop/docs/Learning-Environment-Examples.md

For the demos to visualize your agent 👉 https://huggingface.co/unity

For now we have integrated:
- [Worm](https://huggingface.co/spaces/unity/ML-Agents-Worm) demo where you teach a **worm to crawl**.
- [Walker](https://huggingface.co/spaces/unity/ML-Agents-Walker) demo where you teach an agent **to walk towards a goal**.

That’s all for today. Congrats on finishing this tutorial!

The best way to learn is to practice and try stuff. Why not try another environment? ML-Agents has 17 different environments, but you can also create your own. Check the documentation and have fun!

See you in Unit 6 🔥,

## Keep Learning, Stay awesome 🤗