# Using Gym Interface with an ML-Agents app

We will use our earlier Tanks project. I'm including a built Tanks executable. You can also build your own in the Unity editor via File -> Build Settings; I'll go through the steps quickly in the lecture.

Once we have an executable, we can interface with it using the mlagents gym wrapper. This exposes the Tanks app data through the standard gym interface, so we can use it the same way we use other gym environments such as CartPole.

To get started, you will need to install the Python mlagents package and the gym_unity package with pip, inside your virtual environment.

See here for the official documentation:
https://github.com/Unity-Technologies/ml-agents/tree/main/gym-unity

NOTE: The gym and python interface for mlagents has changed a bit recently. Depending on the version you have installed, the code may differ! Below I am using mlagents 0.25 (which should install mlagents-envs 0.25), and gym-unity 0.25
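
Because the code depends on the installed release, it can help to print the versions before running anything else. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+). `installed_version` is a helper name made up for this example, and the package names are the PyPI distribution names as I understand them.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for a pip package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as published on PyPI (assumed; adjust if yours differ).
for package in ("mlagents", "mlagents-envs", "gym-unity"):
    print(package, installed_version(package))
```

If any of these print `None`, the package is not installed in the environment the notebook kernel is using.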

In [1]:
import gym
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper


ML Agents has a low level Python interface we'll look at later. We first use this to open the Tanks environment, then we use the `UnityToGymWrapper` on it. When you run the code below, you should see the environment open as a tiny window.

In [2]:
# specify the path to the executable here (relative to where python or jupyter is running)
filename = "./visual-test-0/tanks-ml-tutorial.exe"  

# NOTE: this is the new syntax as of version 0.25 (earlier syntax was slightly different)
unity_env = UnityEnvironment(filename)
gym_env = UnityToGymWrapper(unity_env)


2021-03-26 09:32:44 INFO [environment.py:113] Connected to Unity environment with package version 1.0.5 and communication version 1.0.0
2021-03-26 09:32:44 INFO [environment.py:282] Connected new brain:
VisualTank1?team=0


At this point, you can use gym_env like any other gym environment. You should have seen some log output with some info and maybe some warnings. We're just showing the process, so you can ignore them.

In [3]:
gym_env.reset()
for _ in range(100):
    gym_env.render()
    gym_env.step(gym_env.action_space.sample())
    

If you look at the tiny unity window that opened earlier, you should see the agent performing random actions. In the code above, we're running 100 steps, rendering the frames, and sampling random actions from the agent's action space.

This was just a test. Instead of taking random actions, you would add your algorithm code here, just as with any other gym environment. You can use algorithms from OpenAI Baselines, Google Dopamine, or any other DRL library in Python, built on PyTorch, TensorFlow, or whatever you prefer. Gym is a common interface supported by most DRL libraries, so you can experiment quite a bit. It also works the other way: since ALL Unity ML Agents builds are compatible with Gym, ANY environment you build can be used for experimentation by other researchers who may not be familiar with Unity. They only need to know Python and the common Gym interface!
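
As a rough idea of where algorithm code would plug in, here is a minimal sketch of an episode loop that takes the policy as a function. It assumes the classic Gym API, where `step()` returns `(observation, reward, done, info)`. `run_episode` and `CountdownEnv` are names made up for this example; the stand-in environment only exists so the loop can run without Unity.

```python
import random


def run_episode(env, choose_action, max_steps=1000):
    """Run one episode on a Gym-style env and return (total_reward, steps_taken)."""
    obs = env.reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        action = choose_action(obs)
        # Classic Gym API: step() returns (observation, reward, done, info).
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            return total_reward, step
    return total_reward, max_steps


class CountdownEnv:
    """Hypothetical stand-in env: every episode lasts `length` steps, reward 1 per step."""

    def __init__(self, length=5):
        self.length = length
        self.remaining = length

    def reset(self):
        self.remaining = self.length
        return self.remaining

    def step(self, action):
        self.remaining -= 1
        done = self.remaining == 0
        return self.remaining, 1.0, done, {}


total, steps = run_episode(CountdownEnv(), choose_action=lambda obs: random.randint(0, 2))
print(total, steps)
```

With the Tanks environment, the equivalent call would be something like `run_episode(gym_env, lambda obs: gym_env.action_space.sample(), max_steps=100)`, with the lambda swapped for your own policy.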

Remember that earlier we called `mlagents-learn` and set hyperparameters in a `config.yaml` file. The algorithm was a black box, and we defaulted to PPO. The PPO implementation that mlagents uses is written in Python: if you look in the mlagents repo we downloaded earlier, the nested folder `ml-agents/ml-agents/mlagents/trainers` holds the PPO code, along with the other Python code that runs when we call the `mlagents-learn` command. With gym, we aren't forced to use their code; we can write our own or use open source implementations.

---------------------------

Now we're going to take a look at the ML Agents low level Python API a bit. It is conceptually similar to the gym interface, with some differences in syntax.

One thing to note about the mlagents gym wrapper: it has some limitations. See the documentation page for details. One big limitation is that it only works with a single agent at a time, so multi-agent reinforcement learning (MARL) algorithms such as MADDPG are out. Another relevant limitation is that stacked observations are not currently supported. In Tanks we used stacked observations, so this would not work (we would need to go back into Unity, change the environment to remove the stacking, and rebuild the executable).

If we use the ML Agents low level Python API, we don't have those limitations.

In [6]:
# we have already imported everything earlier
# we were using the gym wrapper on top of the environment we created with UnityEnvironment()
# now, we'll just use it directly

unity_env.reset()
# this is the same as in gym

behavior_name = list(unity_env.behavior_specs.keys())[0]

print(behavior_name)



VisualTank1?team=0


Each behavior_spec represents one of the "Behavior Parameters" in Unity (whatever you named it).

Only the ones which are set for training will show up here, with the name given in their Behavior Parameters. Above, the behavior parameters were named "VisualTank1". The team ID set in the Behavior Parameters (used for multi-agent setups) is appended to the name as well (the "?team=0" above).
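
If you need the behavior name and the team number separately (for example, to group agents by team), the string can be split. This is a small sketch that assumes the `Name?team=N` format seen in the printed output; `split_behavior_name` is a name made up for this example, not part of mlagents.

```python
def split_behavior_name(full_name):
    """Split 'Name?team=N' into (name, team_id); team_id is None if no team suffix is present."""
    name, separator, team = full_name.partition("?team=")
    if not separator:
        return full_name, None
    return name, int(team)


print(split_behavior_name("VisualTank1?team=0"))
```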

In [35]:
bs = unity_env.behavior_specs[behavior_name]
# get the behavior spec for VisualTank1

obs_spec, act_spec = bs
# get the observation info and action info and print it out to look at
print("observations list: ", len(obs_spec))
print(obs_spec, "\n")
# note: act_spec is a (continuous size, discrete branches) pair, so its len() is 2
# no matter how many actions the agent has
print("actions list: ", len(act_spec))
print(act_spec)

observations list:  3
[ObservationSpec(shape=(84, 84, 3), dimension_property=(<DimensionProperty.UNSPECIFIED: 0>, <DimensionProperty.UNSPECIFIED: 0>, <DimensionProperty.UNSPECIFIED: 0>), observation_type=<ObservationType.DEFAULT: 0>, name=''), ObservationSpec(shape=(84, 84, 3), dimension_property=(<DimensionProperty.UNSPECIFIED: 0>, <DimensionProperty.UNSPECIFIED: 0>, <DimensionProperty.UNSPECIFIED: 0>), observation_type=<ObservationType.DEFAULT: 0>, name=''), ObservationSpec(shape=(12,), dimension_property=(<DimensionProperty.UNSPECIFIED: 0>,), observation_type=<ObservationType.DEFAULT: 0>, name='')] 

actions list:  2
Continuous: 0, Discrete: (3, 3, 3)


This is using the visual agent, which has 2 visual observations (pixel inputs) of 84x84. We can see the shape is 84, 84, 3. The 3 is because we have 3 color channels (R,G,B).

You may remember the CameraSensor and RenderTextureSensors in Unity were "automatic", in that you didn't need to set the number of observations in the Behavior Parameters. ML Agents kept these sensor observations separate from our own. If you drill down and look in the list of observation specs above, there are two with (84,84,3) (the camera sensor and render texture sensor) and another with size 12. The size 12 observations are the vector observations I used (6 observations, stacked by 2 = 12).
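
To make the grouping concrete, here is a sketch that sorts observation shapes into visual and vector ones and splits a stacked vector back into its frames. It works on plain tuples and lists so it runs without Unity. The helper names are made up for this example, and I am not assuming which chunk of the stack is the older frame.

```python
def split_observation_shapes(shapes):
    """Return (visual_indices, vector_indices): rank-3 shapes are images, rank-1 are vectors."""
    visual = [i for i, shape in enumerate(shapes) if len(shape) == 3]
    vector = [i for i, shape in enumerate(shapes) if len(shape) == 1]
    return visual, vector


def unstack(vector, stack_size):
    """Split a flat stacked vector into stack_size equal-sized chunks."""
    if len(vector) == 0 or len(vector) % stack_size != 0:
        raise ValueError("vector length must be a non-zero multiple of stack_size")
    width = len(vector) // stack_size
    return [list(vector[i:i + width]) for i in range(0, len(vector), width)]


# Shapes copied from the printed observation specs above.
shapes = [(84, 84, 3), (84, 84, 3), (12,)]
print(split_observation_shapes(shapes))
print(unstack(range(12), stack_size=2))
```

With the real environment, the shapes would come from something like `[spec.shape for spec in obs_spec]`.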

For the actions, we see 0 Continuous actions. We also see Discrete actions (3,3,3). If you remember, we had 3 actions, each with 3 choices.

(nothing, forward, back),
(nothing, left, right),
(nothing, fire weak, fire strong).
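
When debugging, it can be handy to turn a raw action like `(1, 0, 2)` back into words. The sketch below assumes the index order within each branch follows the listing above (0 = nothing, and so on); that ordering is my assumption, so check it against your agent's action code in Unity. `BRANCHES` and `describe_action` are names made up for this example.

```python
# One tuple of choice names per discrete branch, in the order listed above (assumed).
BRANCHES = (
    ("nothing", "forward", "back"),
    ("nothing", "left", "right"),
    ("nothing", "fire weak", "fire strong"),
)


def describe_action(action):
    """Translate one value per branch into the matching choice names."""
    if len(action) != len(BRANCHES):
        raise ValueError("expected one value per branch")
    return tuple(names[int(choice)] for names, choice in zip(BRANCHES, action))


print(describe_action((1, 0, 2)))
```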

Both continuous and discrete actions are listed because, I believe, mlagents is adding the ability for agents to use both types of actions together (instead of choosing only one type).

In [34]:
print(obs_spec[0], "\n")
print(obs_spec[1], "\n")
print(obs_spec[2], "\n")

ObservationSpec(shape=(84, 84, 3), dimension_property=(<DimensionProperty.UNSPECIFIED: 0>, <DimensionProperty.UNSPECIFIED: 0>, <DimensionProperty.UNSPECIFIED: 0>), observation_type=<ObservationType.DEFAULT: 0>, name='') 

ObservationSpec(shape=(84, 84, 3), dimension_property=(<DimensionProperty.UNSPECIFIED: 0>, <DimensionProperty.UNSPECIFIED: 0>, <DimensionProperty.UNSPECIFIED: 0>), observation_type=<ObservationType.DEFAULT: 0>, name='')
ObservationSpec(shape=(12,), dimension_property=(<DimensionProperty.UNSPECIFIED: 0>,), observation_type=<ObservationType.DEFAULT: 0>, name='')


Above, we print out each observation group so you can take a better look.

In [36]:
ds, ts = unity_env.get_steps(behavior_name)
# get data returned from current step
# return value is a tuple of (DecisionSteps, TerminalSteps)


DecisionSteps contains data for all agents that are waiting for an action. TerminalSteps contains data for all agents that have just reached the last step of their episode. If you are handling multiple agents, you need to track each agent's data and episodes yourself: agents may end their episodes on different steps, so your algorithm has to manage this.
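
One way to do that bookkeeping is to keep a running return per agent ID. The sketch below only relies on the `agent_id` and `reward` fields that both step objects carry; `update_returns` and the `Steps` stand-in are made up for this example so it runs without Unity. Terminal steps are handled first, on the assumption that an agent can finish one episode and start the next within the same batch.

```python
from collections import namedtuple


def update_returns(running, finished, decision_steps, terminal_steps):
    """Add this step's rewards to per-agent totals; move ended episodes into `finished`."""
    # Close out episodes first so a restarted agent begins again from zero.
    for agent_id, reward in zip(terminal_steps.agent_id, terminal_steps.reward):
        total = running.pop(int(agent_id), 0.0) + float(reward)
        finished.append((int(agent_id), total))
    for agent_id, reward in zip(decision_steps.agent_id, decision_steps.reward):
        running[int(agent_id)] = running.get(int(agent_id), 0.0) + float(reward)


# Hypothetical stand-in with the same two fields used above.
Steps = namedtuple("Steps", ["agent_id", "reward"])

running, finished = {}, []
update_returns(running, finished, Steps([0, 1], [0.5, 0.5]), Steps([], []))
update_returns(running, finished, Steps([0], [0.5]), Steps([1], [1.0]))
print(running, finished)
```

In a real loop, the two step objects would come from `unity_env.get_steps(behavior_name)` after each `unity_env.step()`.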

In [45]:
print(list(ds[0]))
# print out one agent's decision step data
# the step data includes our agent's data, including
# observations, reward, agent_id, action masks,
# and 2 others I'm not sure about right now
# (sorry, they didn't exist in earlier versions, will need to investigate;
# in more recent mlagents-envs releases the last two fields are named
# group_id and group_reward, so they are probably the same thing here)

obs, reward, agent_id, masks, something, something_else = ds[0]

[[array([[[0.64705884, 0.4627451 , 0.34509805],
        [0.62352943, 0.44705883, 0.3372549 ],
        [0.627451  , 0.44705883, 0.3372549 ],
        ...,
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ]],

       [[0.627451  , 0.4509804 , 0.3372549 ],
        [0.6392157 , 0.45882353, 0.34509805],
        [0.6431373 , 0.45882353, 0.34509805],
        ...,
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ]],

       [[0.64705884, 0.4627451 , 0.34509805],
        [0.6509804 , 0.46666667, 0.34901962],
        [0.6784314 , 0.48235294, 0.35686275],
        ...,
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ]],

       ...,

       [[0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ],
        [0.        ,

In [41]:
print(obs)
# the observation data, includes pixel data for 2 images 84x84x3, and 12 vector observations

[array([[[0.64705884, 0.4627451 , 0.34509805],
        [0.62352943, 0.44705883, 0.3372549 ],
        [0.627451  , 0.44705883, 0.3372549 ],
        ...,
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ]],

       [[0.627451  , 0.4509804 , 0.3372549 ],
        [0.6392157 , 0.45882353, 0.34509805],
        [0.6431373 , 0.45882353, 0.34509805],
        ...,
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ]],

       [[0.64705884, 0.4627451 , 0.34509805],
        [0.6509804 , 0.46666667, 0.34901962],
        [0.6784314 , 0.48235294, 0.35686275],
        ...,
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ]],

       ...,

       [[0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ],
        [0.        , 

In [46]:
print(reward)
# reward for current step

-0.0011


In [47]:
print(agent_id)
# when dealing with multi-agent, we can track agents by ID number

0


In [44]:
print(masks)
# If you recall, for discrete actions, we can mask actions.
# True means the action is masked (cannot be performed on this step for whatever reason)
# In Tanks, if the agent cannon is not ready, it masks the fire actions.

[array([False, False, False]), array([False, False, False]), array([False,  True,  True])]
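
If you pick actions yourself, you would normally want to respect these masks. Here is a minimal sketch of random sampling that skips masked choices, using the same one-mask-per-branch layout as the output above with plain lists in place of numpy arrays. `sample_unmasked` is a name made up for this example.

```python
import random


def sample_unmasked(masks, rng=random):
    """Pick one allowed choice per branch; True in a mask means that choice is not allowed."""
    action = []
    for branch_mask in masks:
        allowed = [i for i, masked in enumerate(branch_mask) if not masked]
        if not allowed:
            raise ValueError("every choice in a branch is masked")
        action.append(rng.choice(allowed))
    return action


# Same masks as printed above: both fire actions are masked in the third branch.
masks = [[False, False, False], [False, False, False], [False, True, True]]
action = sample_unmasked(masks)
print(action)
```

With these masks, the third value is always 0 (nothing), since the cannon is not ready.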


In [52]:
import numpy as np
from mlagents_envs.base_env import ActionTuple

# manually set up an action to perform
# set_actions() now expects an ActionTuple wrapping numpy arrays; passing a bare array fails with
# AttributeError: 'numpy.ndarray' object has no attribute 'continuous'
# we need one row of actions per agent waiting for a decision (there are 2 agents in this environment)
# actions in Tanks are 3 discrete branches, so each row has 3 values
n_agents = len(ds)
discrete_actions = np.tile(np.array([[1, 0, 0]], dtype=np.int32), (n_agents, 1))
agent_actions = ActionTuple(discrete=discrete_actions)

unity_env.set_actions(behavior_name, agent_actions)

In [59]:
unity_env.step()
# slightly different from gym interface here
# we set the data using env.set_actions() then call step()
# but concept is the same

Using the mlagents low level Python API, you can do similar things to what you would do with gym. The API is more involved, but there are also fewer limitations.

Warning - the Python API has changed a few times, and it looks like it changed between when I first started writing this and now (3/26). Be aware that pip install grabs the current version. When the API changes, the changes are listed in the changelog in their GitHub repo.

The set_actions call above is one of the things that changed: it used to take a bare numpy array, and now takes an ActionTuple. I haven't had time to test the new syntax thoroughly, so treat that cell as a sketch. Will update when I have more time.

In [60]:
unity_env.close()  # make sure you close the env when done

2021-03-26 13:19:42 INFO [environment.py:429] Environment shut down with return code 0 (CTRL_C_EVENT).
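
If a cell crashes before `close()` runs, the Unity window can be left open and a later attempt to connect may fail. One way to guard against that is `contextlib.closing`, which calls `close()` even when an exception is raised. The sketch below uses a made-up stand-in environment (`FlakyEnv`) so it runs without Unity; `run_steps` is also just a name for this example.

```python
from contextlib import closing


def run_steps(env, n_steps):
    """Step the env n_steps times, closing it afterwards even if a step raises."""
    with closing(env):
        for _ in range(n_steps):
            env.step()


class FlakyEnv:
    """Hypothetical stand-in for a Unity env that fails on its third step."""

    def __init__(self):
        self.steps = 0
        self.closed = False

    def step(self):
        self.steps += 1
        if self.steps == 3:
            raise RuntimeError("simulated crash")

    def close(self):
        self.closed = True


env = FlakyEnv()
try:
    run_steps(env, n_steps=10)
except RuntimeError as err:
    print("step failed:", err)
print("closed:", env.closed)
```

The same pattern should work with the real environment, for example `with closing(UnityEnvironment(filename)) as unity_env:`, since all it needs is a `close()` method.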
