# Minecraft Reinforcement Learning on Ray cluster with Azure Machine Learning

In this notebook, we run distributed reinforcement learning (RL) at scale with the Ray framework on Azure Machine Learning.<br>
It is based on [this example](https://github.com/tsmatz/minecraft-rl-on-ray-cluster), in which an agent learns to solve a maze in Minecraft.

With Azure Machine Learning, the compute instances are automatically scaled down to 0 when the training has completed.<br>
This example also sends logs (total episodes and mean episode reward in each training iteration) to the Azure Machine Learning workspace.

To run this notebook,

1. Create a new "Machine Learning" resource in [Azure Portal](https://portal.azure.com/).
2. Install Azure Machine Learning CLI v2 on Ubuntu as follows:

```
# install Azure CLI
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
# install AML CLI extension
az extension add --name ml
```

> Note : For practical training, it's better to run on GPU. To do so, change the configuration of this example. (This example is for getting started, and runs on CPU.)

> Note : You can now also use the Python package ```ray-on-aml``` to run a Ray cluster on Azure Machine Learning. (See [here](https://github.com/microsoft/ray-on-aml).)

## 1. Create script for RL training (train_agent.py)

Save a script file (```train_agent.py```) for Ray RLlib training.

In [4]:
import os
script_folder = './script-minecraftrl'
os.makedirs(script_folder, exist_ok=True)

In [5]:
%%writefile script-minecraftrl/train_agent.py
import os
import ray
import ray.tune as tune
import argparse
import mlflow
import mpi4py
from mpi4py import MPI
import socket

parser = argparse.ArgumentParser()
parser.add_argument("--num_workers",
    type=int,
    required=False,
    default=1,
    help="number of ray workers")
parser.add_argument("--num_gpus",
    type=int,
    required=False,
    default=0,
    help="number of gpus")
parser.add_argument("--num_cpus_per_worker",
    type=int,
    required=False,
    default=1,
    help="number of cores per worker")
args = parser.parse_args()

# Function for stopping training once the mean episode reward reaches 85
def stop_check(trial_id, result):
    return result["episode_reward_mean"] >= 85

# Function for logging in Azure Machine Learning workspace
# (Callback on train result to record metrics returned by trainer)
def on_train_result(info):
    mlflow.log_metrics({
        'episode_reward_mean': info["result"]["episode_reward_mean"],
        'episodes_total': info["result"]["episodes_total"]
    })

mpi_comm = MPI.COMM_WORLD
mpi_rank = mpi_comm.Get_rank()
# mpi_rank = int(os.getenv("OMPI_COMM_WORLD_RANK"))

#
# Wait for head and all workers
#
if mpi_rank == 0 :
    # wait for a "ready" message from every other rank
    # (assumes --num_workers equals the number of MPI processes)
    for n in range(args.num_workers):
        if n != 0:
            req = mpi_comm.irecv(source=n, tag=n)
            data = req.wait()
else:
    # send ready message to head
    req = mpi_comm.isend("ready", dest=0, tag=mpi_rank)
    req.wait()

#
# start training (only on rank 0)
#
if mpi_rank == 0 :
    ray.init(address="auto")

    ray.tune.run(
        "IMPALA",
        config={
            "log_level": "WARN",
            "env": "custom_malmo_env:MalmoMazeEnv-v0",
            "num_workers": args.num_workers,
            "num_gpus": args.num_gpus,
            "num_cpus_per_worker": args.num_cpus_per_worker,
            "explore": True,
            "exploration_config": {
                "type": "EpsilonGreedy",
                "initial_epsilon": 1.0,
                "final_epsilon": 0.02,
                "epsilon_timesteps": 500000
            },
            "callbacks": {"on_train_result": on_train_result},
        },
        stop=stop_check,
        checkpoint_at_end=True,
        checkpoint_freq=2,
        local_dir='./outputs'
    )

    # broadcast completion
    data = mpi_comm.bcast({"status":"training done"}, root=0)
else:
    # receive broadcast message from head
    # (till completing job)
    print("waiting for training to complete ...")
    data = mpi_comm.bcast(None, root=0)

Writing script-minecraftrl/train_agent.py
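
To get a feel for the `exploration_config` in this script, the following is a minimal sketch of an annealed epsilon. It assumes the schedule interpolates linearly from `initial_epsilon` to `final_epsilon` over `epsilon_timesteps` and stays constant afterwards. The helper `epsilon_at` is hypothetical and written only for illustration; it is not part of RLlib.

```python
def epsilon_at(timestep, initial_epsilon=1.0, final_epsilon=0.02,
               epsilon_timesteps=500000):
    """Exploration rate at a given timestep (assumed linear annealing)."""
    # Fraction of the annealing period that has elapsed, clamped to [0, 1].
    fraction = min(max(timestep, 0) / epsilon_timesteps, 1.0)
    return initial_epsilon + fraction * (final_epsilon - initial_epsilon)


# Print the exploration rate at a few points of the schedule.
for t in (0, 250000, 500000, 1000000):
    print(t, round(epsilon_at(t), 4))
```

Under this assumption, the agent acts randomly about half of the time at 250,000 timesteps, and 2% of the time from 500,000 timesteps onward.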


## 2. Create script for ray cluster setup (ray_start.sh)

Create a shell script for starting the Ray cluster (head and workers).<br>
The roles in Ray are assigned by MPI rank as follows.

- Rank 0 : Ray Head
- Other ranks : Ray Worker

In [6]:
%%writefile script-minecraftrl/ray_start.sh
export LC_ALL=C.UTF-8  # needed for running Ray
if [ $OMPI_COMM_WORLD_RANK -eq 0 ]
then
  ray start --head --port=6379
else
  ray start --address="$AZ_BATCHAI_MPI_MASTER_NODE:6379" --redis-password="5241590000000000"
fi
unset LC_ALL           # removed for running Malmo
# status=$(ray status)
# if [[ -z "$status" ]]
# then
#     echo "ray not running"

Writing script-minecraftrl/ray_start.sh
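
The branching in `ray_start.sh` can be sketched in Python as follows. This is only an illustration of the role assignment, not a replacement for the shell script. The function `ray_start_command` is hypothetical, and it only builds the argument list (it does not launch Ray). The environment variable names and option values are the ones used in the shell script.

```python
import os


def ray_start_command(rank, master_node, port=6379,
                      redis_password="5241590000000000"):
    """Build the `ray start` arguments for one MPI rank (illustration only)."""
    if rank == 0:
        # Rank 0 becomes the Ray head node.
        return ["ray", "start", "--head", f"--port={port}"]
    # Every other rank joins the head as a Ray worker.
    return ["ray", "start",
            f"--address={master_node}:{port}",
            f"--redis-password={redis_password}"]


if __name__ == "__main__":
    # Fall back to placeholder values when run outside an MPI job.
    rank = int(os.environ.get("OMPI_COMM_WORLD_RANK", "0"))
    master = os.environ.get("AZ_BATCHAI_MPI_MASTER_NODE", "localhost")
    print(" ".join(ray_start_command(rank, master)))
```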


## 3. Submit Job in Azure Machine Learning

### Prepare for connecting to Azure Machine Learning workspace

Log in to Azure and prepare to connect to the Azure Machine Learning (AML) workspace.<br>
Fill in the following subscription ID, AML workspace name, and resource group name.

In [None]:
!az login

In [None]:
!az account set -s {AZURE_SUBSCRIPTION_ID}

In [None]:
my_resource_group = "{AML_RESOURCE_GROUP_NAME}"
my_workspace = "{AML_WORKSPACE_NAME}"

### Create cluster (multiple nodes)

Create a remote cluster with 3 nodes - 1 head node and 2 worker nodes.

Here we use ```Standard_D3_v2``` for VMs, but it's better to use GPU VMs for this training in practical use. (Dockerfile and pip packages should also be changed for running on GPU.)

In [7]:
!az ml compute create --name cluster01 \
  --resource-group $my_resource_group \
  --workspace-name $my_workspace \
  --type amlcompute \
  --min-instances 0 \
  --max-instances 3 \
  --size Standard_D3_v2

{
  "id": "/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/resourceGroups/AML-rg/providers/Microsoft.MachineLearningServices/workspaces/ws01/computes/cluster01",
  "idle_time_before_scale_down": 120,
  "location": "eastus",
  "max_instances": 3,
  "min_instances": 0,
  "name": "cluster01",
  "network_settings": {},
  "provisioning_state": "Succeeded",
  "resourceGroup": "AML-rg",
  "size": "STANDARD_D3_V2",
  "ssh_public_access_enabled": true,
  "tier": "dedicated",
  "type": "amlcompute"
}

### Create AML environment for running Minecraft RL

To generate an AML environment with a custom container image, first prepare a Dockerfile.<br>
In this container image, the following are installed and configured. (See [here](https://github.com/tsmatz/minecraft-rl-on-ray-cluster) for details.)

- Open MPI 3.1.2
- Ray 1.6.0 with TensorFlow 2.x backend
- Project Malmo with Minecraft (needs Java 8)
- Custom Gym environment to run Minecraft agent for Maze (see [here](https://github.com/tsmatz/minecraft-rl-on-ray-cluster/tree/master/Malmo_Maze_Sample/custom_malmo_env))
- MLflow for Azure ML logging

In [8]:
import os
context_folder = './docker-context-minecraftrl'
os.makedirs(context_folder, exist_ok=True)

In [9]:
%%writefile docker-context-minecraftrl/Dockerfile
FROM ubuntu:18.04

#
# Note : This image is configured for running on CPU
# (not configured for running on GPU)
#

WORKDIR /

# Prerequisites settings
RUN apt-get update && \
    apt-get install -y apt-utils git rsync wget bzip2 gcc g++ make

# Install Python
RUN apt-get install -y python3.6 && \
    apt-get install -y python3-pip && \
    pip3 install --upgrade pip
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.6 1

# Install Open MPI

#RUN wget -q https://www.open-mpi.org/software/ompi/v1.10/downloads/openmpi-1.10.4.tar.gz && \
#    tar -xzf openmpi-1.10.4.tar.gz && \
#    cd openmpi-1.10.4 && \
#    ./configure --prefix=/usr/local/mpi && \
#    make -j"$(nproc)" install && \
#    cd .. && \
#    rm -rf /openmpi-1.10.4 && \
#    rm -rf openmpi-1.10.4.tar.gz
#ENV PATH=/usr/local/mpi/bin:$PATH \
#    LD_LIBRARY_PATH=/usr/local/mpi/lib:$LD_LIBRARY_PATH

ENV OPENMPI_VERSION 3.1.2
RUN mkdir /tmp/openmpi && \
    cd /tmp/openmpi && \
    wget https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.2.tar.gz && \
    tar zxf openmpi-3.1.2.tar.gz && \
    cd openmpi-3.1.2 && \
    ./configure --enable-orterun-prefix-by-default && \
    make -j $(nproc) all && \
    make install && \
    ldconfig && \
    rm -rf /tmp/openmpi
RUN pip3 install mpi4py

# Install Java 8 (JDK)
RUN apt-get install -y openjdk-8-jdk
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

# Install Ray with TensorFlow 2.x
RUN pip3 install gym==0.21.0 lxml numpy pillow && \
    pip3 install tensorflow==2.4.1 ray[default]==1.6.0 ray[rllib]==1.6.0 ray[tune]==1.6.0 attrs==19.1.0 pandas

# Install Desktop Components for Headless
RUN apt-get install -y xvfb && \
    echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections && \
    apt-get install -y lxde

# Install mlflow for logging
# (Fix version for python 3.6 support)
RUN pip3 install mlflow==1.23.1 azureml-mlflow==1.44.0

# Install Malmo
RUN pip3 install --index-url https://test.pypi.org/simple/ malmo==0.36.0
ENV MALMO_PATH=/malmo_package
WORKDIR $MALMO_PATH
RUN python3 -c "import malmo.minecraftbootstrap; malmo.minecraftbootstrap.download();"
ENV MALMO_XSD_PATH=$MALMO_PATH/MalmoPlatform/Schemas

WORKDIR /

# Install custom Gym env
RUN git clone https://github.com/tsmatz/minecraft-rl-on-ray-cluster
RUN cd minecraft-rl-on-ray-cluster && \
    pip3 install Malmo_Maze_Sample/

EXPOSE 6379 8265

Writing docker-context-minecraftrl/Dockerfile


Create an AML environment with the above Docker configuration.

In [10]:
%%writefile env_minecraft_rl.yml
$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: minecraft-rl-env
build:
  path: docker-context-minecraftrl

Writing env_minecraft_rl.yml


In [11]:
!az ml environment create --file env_minecraft_rl.yml \
  --resource-group $my_resource_group \
  --workspace-name $my_workspace

Uploading docker-context-minecraftrl (0.0 MBs): 100%|█| 2398/2398 [00:00<00:00,

{
  "build": {
    "dockerfile_path": "Dockerfile",
    "path": "https://ws016125543015.blob.core.windows.net/azureml-blobstore-fd7d98c2-a2bd-44e4-8c0d-52c3bf7b2f7c/LocalUpload/7fbb1733faa9d8afa2f9b0ae7becca6d/docker-context-minecraftrl/"
  },
  "creation_context": {
    "created_at": "2022-08-23T03:35:24.386503+00:00",
    "created_by": "Tsuyoshi Matsuzaki",
    "created_by_type": "User",
    "last_modified_at": "2022-08-23T03:35:24.386503+00:00",
    "last_modified_by": "Tsuyoshi Matsuzaki",
    "last_modified_by_type": "User"
  },
  "id": "azureml:/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/resourceGroups/AML-rg/providers/Microsoft.MachineLearningServices/workspaces/ws01/environments/minecraft-rl-env/versions/1",
  "name": "minecraft-rl-env",
  "os_type": "linux",
  "resourceGroup": "AML-rg",
  "tags": {},
  "version": "1"
}

### Submit Job

Now let's run Minecraft RL training on the Ray cluster.

When the job starts, it launches a Minecraft instance (process) on a specific port. So make sure that the nodes are not already running another job when you start. (If two instances run on the same node, the training will fail.)
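
Since the failure presumably comes from two processes competing for the same port, one way to diagnose it on a node is to test whether the port can still be bound. Below is a minimal sketch using only the Python standard library. The helper `is_port_free` is hypothetical, and the port number 10000 is an assumption (the port actually used depends on how the Malmo environment is configured).

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding failed, e.g. the port is already in use.
            return False
        return True


if __name__ == "__main__":
    port = 10000  # assumed port, adjust to your setup
    print(f"port {port} free: {is_port_free(port)}")
```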

> Note : The first run builds the Docker image, so it takes a long time to start training. (Once the image is registered, subsequent runs start faster.)

In [12]:
%%writefile train_minecraft_rl.yml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: script-minecraftrl
command: |
  bash ray_start.sh
  python train_agent.py --num_workers ${{inputs.num_workers}} --num_gpus ${{inputs.num_gpus}} --num_cpus_per_worker ${{inputs.num_cpus_per_worker}}
  ray stop
inputs:
  num_workers: 3
  num_gpus: 0
  num_cpus_per_worker: 3
environment: azureml:minecraft-rl-env@latest
compute: azureml:cluster01
display_name: minecraft_rl_test
experiment_name: minecraft_rl_test
resources:
  instance_count: 3
distribution:
  type: mpi
  process_count_per_instance: 1
description: Minecraft RL in Ray cluster

Writing train_minecraft_rl.yml
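
In this job, the number of MPI processes is `instance_count` × `process_count_per_instance`, and `train_agent.py` on rank 0 waits for a "ready" message from ranks 1 to `num_workers` - 1. The two numbers therefore need to agree. A small sanity check might look like the following sketch, in which the dictionary simply mirrors the values in the YAML file and the function names are hypothetical.

```python
# Values copied by hand from train_minecraft_rl.yml (illustration only)
job = {
    "inputs": {"num_workers": 3, "num_gpus": 0, "num_cpus_per_worker": 3},
    "resources": {"instance_count": 3},
    "distribution": {"type": "mpi", "process_count_per_instance": 1},
}


def mpi_world_size(job):
    """Total number of MPI processes launched by the job."""
    return (job["resources"]["instance_count"]
            * job["distribution"]["process_count_per_instance"])


def check_job(job):
    """Raise ValueError if num_workers does not match the MPI world size."""
    size = mpi_world_size(job)
    workers = job["inputs"]["num_workers"]
    if workers != size:
        raise ValueError(
            f"num_workers={workers} but the job starts {size} MPI processes")
    return size


print(check_job(job))
```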


In [13]:
!az ml job create --file train_minecraft_rl.yml \
  --resource-group $my_resource_group \
  --workspace-name $my_workspace

Uploading script-minecraftrl (0.0 MBs): 100%|█| 2866/2866 [00:00<00:00, 94838.50

{
  "code": "azureml:/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/resourceGroups/AML-rg/providers/Microsoft.MachineLearningServices/workspaces/ws01/codes/4e30d67a-617a-4d08-b012-6a0d0fc4e11a/versions/1",
  "command": "bash ray_start.sh\npython train_agent.py --num_workers ${{inputs.num_workers}} --num_gpus ${{inputs.num_gpus}} --num_cpus_per_worker ${{inputs.num_cpus_per_worker}}\nray stop\n",
  "compute": "azureml:cluster01",
  "creation_context": {
    "created_at": "2022-08-23T03:37:19.631744+00:00",
    "created_by": "Tsuyoshi Matsuzaki",
    "created_by_type": "User"
  },
  "description": "Minecraft RL in Ray cluster",
  "display_name": "minecraft_rl_test",
  "distribution": {
    "process_count_per_instance": 1,
    "type": "mpi"
  },
  "environment": "azureml:minecraft-rl-env:1",
  "environment_variables": {},
  "experiment_name": "minecraft_rl_test",
  "id": "azureml:/subscri

Go to [Azure Machine Learning studio](https://ml.azure.com/), and see the driver's log on rank 0.<br>
You will find the log outputs (such as the episode count so far and the mean reward) in each training iteration.

![driver log](./azureml_minecraft_rl_ray_cluster/driver_log.jpg)

After a while, you will also see the trained parameters, saved as checkpoints, in the outputs. (You can also check the progress of all results in ```progress.csv```.)

![checkpoint result](./azureml_minecraft_rl_ray_cluster/checkpoint_output.jpg)
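
If you download ```progress.csv``` from the job outputs, you can inspect the learning curve offline. The following is a minimal sketch using the standard `csv` module. It assumes the file has `training_iteration`, `episode_reward_mean`, and `episodes_total` columns (the last two are the metrics logged by `train_agent.py`). The function name `read_progress` and the sample rows are made up for illustration.

```python
import csv
import io


def read_progress(csv_file):
    """Return (iteration, episodes_total, episode_reward_mean) per row."""
    rows = []
    for row in csv.DictReader(csv_file):
        rows.append((int(row["training_iteration"]),
                     int(float(row["episodes_total"])),
                     float(row["episode_reward_mean"])))
    return rows


# Made-up sample data standing in for a real progress.csv
sample = io.StringIO(
    "training_iteration,episodes_total,episode_reward_mean\n"
    "1,12,-250.5\n"
    "2,27,-180.0\n"
)
for iteration, episodes, reward in read_progress(sample):
    print(iteration, episodes, reward)
```

For a real file, pass `open("progress.csv", newline="")` instead of the sample.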

**This training requires about 1 day for completion when it's run on GPU.**<br>
Please cancel this job, if you don't need to continue.

### Remove cluster (Clean-up)

In [None]:
!az ml compute delete --name cluster01 \
  --resource-group $my_resource_group \
  --workspace-name $my_workspace \
  --yes