# 2024 NeurIPS - MyoChallenge

## <center> Welcome to the [**2024 NeurIPS - MyoChallenge:  Physiological Dexterity and Agility in Enhanced Humans**](https://sites.google.com/view/myosuite/myochallenge/myochallenge-2024) </center>

In the [last tutorial](https://colab.research.google.com/drive/1AqC1Y7NkRnb2R1MgjT3n4u02EmSPem88#scrollTo=-mAnRvYjIS4d), you learned how to quickly train a policy for a short time (one that is still likely to produce ***random actions***) and ***how to visualize it***.

This tutorial will help newcomers (like you!) get started on training a policy that matches the baseline. In this tutorial you will learn:


1.   How to start a training script that can reproduce the baseline.
2.   How to tune the reward dictionary.
3.   How to access new attributes (e.g., MPL joint angles) and integrate them into your training.



# 1. Setting the environment

In [6]:
!pip install myosuite==2.5.0 --quiet
!pip install "stable-baselines3[extra]" --quiet
!pip install tqdm --quiet
!pip install mujoco==3.1.2 --quiet
!pip install sk-video --quiet
!pip install torch --quiet
!pip install opencv-python-headless
# %env MUJOCO_GL=egl
import mujoco



### Define a method to show the rendering inside the Colab

In [7]:
from IPython.display import HTML
from base64 import b64encode

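def show_video(video_path, video_width = 400):
  # Read the video as bytes; the context manager closes the file afterwards
  with open(video_path, "rb") as f:
    video_file = f.read()

  # Embed the video in the page as a base64 data URL
  video_url = f"data:video/mp4;base64,{b64encode(video_file).decode()}"
  return HTML(f"""<video autoplay width={video_width} controls><source src="{video_url}"></video>""")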

### All the `MyoSuite` imports needed to run this tutorial

In [8]:
import myosuite
from myosuite.utils import gym
import skvideo.io
import numpy as np
import os
from datetime import datetime
import time
import torch
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv, DummyVecEnv # SubprocVecEnv is for parallel processing, DummyVecEnv is for sequential processing
from stable_baselines3.common.callbacks import EvalCallback, CallbackList, BaseCallback
from tqdm.notebook import tqdm

# How to Train the Baselines?

The baseline behavior was not trained in a single run; it requires multiple rounds of reward tuning via [curriculum learning](https://ronan.collobert.com/pub/2009_curriculum_icml.pdf). The steps below break the task down and closely follow the [Lattice solution](https://github.com/amathislab/myochallenge-lattice) from last year.

1. First train the myoHand to be as close to the object as possible.
2. Train the fingers of the myoHand to wrap around the object.
3. Lift the object with myoHand.
4. Move the object and myoHand as close to MPL as possible.
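
The four stages above can be written down as a small schedule with one set of reward weights per stage. The snippet below is only a sketch: the term names (`reach_object`, `grasp`, `lift`, `reach_mpl`) and the weights are hypothetical placeholders, not the keys used by the environment.

```python
# Hypothetical curriculum: one dictionary of reward weights per training stage.
# The term names and weights are illustrative placeholders only.
CURRICULUM = [
    {"name": "reach", "weights": {"reach_object": 1.0}},
    {"name": "grasp", "weights": {"reach_object": 1.0, "grasp": 1.0}},
    {"name": "lift", "weights": {"reach_object": 0.5, "grasp": 1.0, "lift": 2.0}},
    {"name": "handover", "weights": {"grasp": 1.0, "lift": 1.0, "reach_mpl": 2.0}},
]

def new_terms(stage_index):
    """Return the reward terms a stage introduces compared with the previous one."""
    current = set(CURRICULUM[stage_index]["weights"])
    previous = set(CURRICULUM[stage_index - 1]["weights"]) if stage_index > 0 else set()
    return sorted(current - previous)

for i, stage in enumerate(CURRICULUM):
    print(f"Stage {i + 1} ({stage['name']}): adds {new_terms(i)}")
```

Each stage would be trained until its behavior looks satisfactory, and the next stage would then start from the saved policy rather than from scratch.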


### Making N parallel envs with seeding, Starting the Training, and Logging the Results

In [9]:
def make_env(env_name, idx, seed=0):
    def _init():
        env = gym.make(env_name)
        env.seed(seed + idx)
        return env
    return _init

In [13]:
env_name = 'myoChallengeBimanual-v0'
env = gym.make(env_name)
# reset() returns a tuple (observation, info)
obs, info = env.reset()
print("Observation shape:", obs.shape)
# get_reward_dict() requires the current observation dictionary as its argument
reward_dict = env.unwrapped.get_reward_dict(env.unwrapped.obs_dict)
print("Reward Dictionary:", reward_dict)

In [8]:
# Check if GPU is available
print(torch.cuda.is_available())
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


# Using time to define the unique naming
start_time = time.time()
time_now = datetime.now().strftime("%Y_%m_%d_%H_%M_%S")

# Initiate N parallel envs and create them
num_cpu = 4
env_name = 'myoChallengeBimanual-v0'
envs = DummyVecEnv([make_env(env_name, i) for i in range(num_cpu)]) # This creates all envs in RAM

# Define your own log path
log_path = './MPL_baselines/policy_best_model/' + env_name + '/' + time_now + '/'

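# Define how frequently you want to evaluate the model and where the results are logged.
# eval_freq counts steps per environment: with num_cpu = 4, evaluation runs every 4 * 10000 = 40000 timesteps.
eval_callback = EvalCallback(envs, best_model_save_path=log_path, log_path=log_path, eval_freq=10000, deterministic=True, render=False)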

print('Begin training')

# Define your policy parameters based on your need.
policy_kwargs = {
'activation_fn': torch.nn.modules.activation.ReLU,
'net_arch': {'pi': [256, 256], 'vf': [256, 256]}
}

# Start the training with PPO
model = PPO('MlpPolicy', envs, verbose=0, ent_coef= 0.001, policy_kwargs =policy_kwargs, device=device, batch_size=512)
callback = CallbackList([eval_callback])
model.learn(total_timesteps=1000000, tb_log_name=env_name + "_" + time_now, callback=callback)

True


Begin training


  logger.warn(f"{pre} is not within the observation space.")


Eval num_timesteps=40000, episode_reward=-253.59 +/- 183.68
Episode length: 250.00 +/- 0.00
New best mean reward!
Eval num_timesteps=80000, episode_reward=-181.95 +/- 114.63
Episode length: 250.00 +/- 0.00
New best mean reward!
Eval num_timesteps=120000, episode_reward=-195.82 +/- 100.65
Episode length: 250.00 +/- 0.00
Eval num_timesteps=160000, episode_reward=-248.18 +/- 83.38
Episode length: 250.00 +/- 0.00
Eval num_timesteps=200000, episode_reward=-304.17 +/- 83.20
Episode length: 250.00 +/- 0.00
Eval num_timesteps=240000, episode_reward=-353.84 +/- 59.09
Episode length: 250.00 +/- 0.00
Eval num_timesteps=280000, episode_reward=-305.64 +/- 99.96
Episode length: 250.00 +/- 0.00
Eval num_timesteps=320000, episode_reward=-321.86 +/- 93.93
Episode length: 250.00 +/- 0.00
Eval num_timesteps=360000, episode_reward=-352.38 +/- 82.83
Episode length: 250.00 +/- 0.00
Eval num_timesteps=400000, episode_reward=-144.44 +/- 113.84
Episode length: 250.00 +/- 0.00
New best mean reward!
Eval num_tim

<stable_baselines3.ppo.ppo.PPO at 0x7fc942a8a020>

## Continuing Training with Curriculum Learning.

After you have achieved a satisfying result for the first step, you can use curriculum learning so that the policy builds on what it has already learned in past training. You can define the new model as:


```
model_num =   '2024_07_11_16_05_18' # Just an example, not a real policy loaded.
model = PPO.load('./MPL_baselines/policy_best_model'+ '/'+ env_name + '/' + model_num + r'/best_model', envs, verbose = 0, ent_coef = 0.01, policy_kwargs = policy_kwargs)
```



## Visualizing the Policy

To visualize the policy, follow **Rendering your policy** in [Tutorial 1](https://colab.research.google.com/drive/1AqC1Y7NkRnb2R1MgjT3n4u02EmSPem88#scrollTo=i4w5MRvmXr8k).

In [12]:
env_name = 'myoChallengeBimanual-v0'
model_id = '2024_09_06_01_27_46_5000000'
num_cpu = 4
envs = DummyVecEnv([make_env(env_name, i) for i in range(num_cpu)])
policy_kwargs = {
'activation_fn': torch.nn.modules.activation.ReLU,
'net_arch': {'pi': [256, 256], 'vf': [256, 256]}
}
model = PPO.load('./MPL_baselines/policy_best_model' + '/' + env_name + '/' + model_id + r'/best_model', envs, verbose = 0, ent_coef = 0.01, policy_kwargs = policy_kwargs)

# Render trained policy
frames = []
env = gym.make(env_name)
for _ in range(5): # 5 random targets
    print("\n*")
    ep_rewards = []
    done = False
    obs, info = env.reset()
    for i in range(1, 201):
        if i%10 == 0: print("=", end="")
        obs = env.obsdict2obsvec(env.obs_dict, env.obs_keys)[1]
        # get the next action from the policy
        action, _ = model.predict(obs, deterministic=True)
        geom_1_indices = np.where(env.sim.model.geom_group == 1)
        env.sim.model.geom_rgba[geom_1_indices, 3] = 0
        frame = env.sim.renderer.render_offscreen(
                            width=800,
                            height=800,
                            camera_id=1)
        frames.append(frame)
        # take an action based on the current observation
        # (step() returns obs, reward, terminated, truncated, info)
        obs, reward, terminated, truncated, info = env.step(action)

env.close()


*
*
*
*
*

In [14]:
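import os
import cv2
import numpy as np

# Define the codec and create the VideoWriter object.
# The frame size must match the rendered frames (800 x 800 above); otherwise OpenCV may drop the frames and produce an unplayable file.
os.makedirs('videos', exist_ok=True)  # the output folder must exist
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('videos/output.mp4',fourcc, 20.0, (800,800))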

for i in range(len(frames)):
    out.write(cv2.cvtColor(frames[i], cv2.COLOR_RGB2BGR))

out.release()

# How to tune your reward?

Tuning the reward is one of the most important parts of achieving the baseline behavior, as well as of getting started with the challenge. For this part, we suggest that you ***git-clone*** the myoChallenge2024 template, which will be released soon, so that you don't have to modify the installed package. For now, you can try cloning MyoSuite.

Step 1: Navigate to the base env
`%cd /usr/local/yourpath/myosuite/myosuite/envs/myo/myochallenge`

Step 2: Open `bimanual_v0.py`

Step 3: Add or adjust the reward based on your need in the function [get_reward_dict](https://github.com/MyoHub/myosuite/blob/ec185ab58afc26711a2f18fd3e6b0cf817964f42/myosuite/envs/myo/myochallenge/bimanual_v0.py#L205C9-L205C24).

Step 4: Update your new reward term and weights in [DEFAULT_RWD_KEYS_AND_WEIGHTS](https://github.com/MyoHub/myosuite/blob/ec185ab58afc26711a2f18fd3e6b0cf817964f42/myosuite/envs/myo/myochallenge/bimanual_v0.py#L26C5-L26C33)
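
As a rough mental model of Steps 3 and 4, each reward term computed in `get_reward_dict` is multiplied by its weight, and the products are summed into the reward used for training. The sketch below mimics this in plain Python; the term names, weights, and values are made up for illustration and are not the ones defined in `bimanual_v0.py`.

```python
# Hypothetical reward terms and weights (stand-ins for DEFAULT_RWD_KEYS_AND_WEIGHTS).
RWD_KEYS_AND_WEIGHTS = {
    "reach_dist": -1.0,   # penalize the hand-to-object distance
    "lift_bonus": 2.0,    # hypothetical new term added in Step 3
}

def weighted_reward(reward_terms, weights):
    """Sum weight * term over every key listed in `weights`."""
    return sum(weight * reward_terms[key] for key, weight in weights.items())

# Example values for a single step (made up).
terms = {"reach_dist": 0.25, "lift_bonus": 1.0, "unused_term": 5.0}
print(weighted_reward(terms, RWD_KEYS_AND_WEIGHTS))  # -1.0 * 0.25 + 2.0 * 1.0 = 1.75
```

In this sketch, `unused_term` is computed but has no weight, so it does not influence the result. This is why a new term added in Step 3 also needs an entry in the weights in Step 4.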

# How to access new parameters, such as the ones of MPL?

MuJoCo provides different APIs to query joint angles, positions, orientations, etc. You can explore how some of these are used in [get_obs_dict](https://github.com/MyoHub/myosuite/blob/ec185ab58afc26711a2f18fd3e6b0cf817964f42/myosuite/envs/myo/myochallenge/bimanual_v0.py#L146).

You can also consult the MuJoCo documentation: https://mujoco.readthedocs.io/en/stable/APIreference/index.html
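
Once a quantity can be read from the simulation, integrating it into training amounts to storing it in the observation dictionary and listing its key among the observation keys. The sketch below imitates that flow in plain Python with a fake joint-position list; `mpl_qpos`, the index range, and the helper names are hypothetical, and the real environment builds its observation vector with its own utilities.

```python
# Fake simulator state: a flat list of joint positions (stand-in for the real data).
fake_qpos = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

# Hypothetical index range of the MPL joints inside the joint-position list.
MPL_JOINT_SLICE = slice(0, 2)

def build_obs_dict(qpos, time):
    """Collect named observations; 'mpl_qpos' is the newly added entry."""
    return {
        "time": [time],
        "hand_qpos": qpos[2:],
        "mpl_qpos": qpos[MPL_JOINT_SLICE],
    }

def obs_dict_to_vec(obs_dict, obs_keys):
    """Concatenate the selected entries, in key order, into one flat vector."""
    vec = []
    for key in obs_keys:
        vec.extend(obs_dict[key])
    return vec

obs_dict = build_obs_dict(fake_qpos, time=0.0)
# Only the keys listed here reach the policy, so the new key must be added.
obs_keys = ["time", "hand_qpos", "mpl_qpos"]
print(obs_dict_to_vec(obs_dict, obs_keys))
```

Adding a key changes the length of the observation vector, so a policy trained on the old observation layout would generally need to be retrained.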