# Unity ML-Agents Toolkit
## Environment Basics
This notebook walks through the basic functions of the Python API for the Unity ML-Agents Toolkit. For instructions on building a Unity environment, see the [Getting Started with Balance Ball guide](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Getting-Started-with-Balance-Ball.md).

### 1. Set environment parameters

Be sure to set `env_name` to the name of the Unity environment file you want to launch. Ensure that the environment build is in `../envs`.

In [1]:
env_name = "../envs/GridWorld"  # Name of the Unity environment binary to launch
train_mode = True  # Whether to run the environment in training or inference mode
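
A quick pre-flight check can catch a wrong path before the environment is launched. The sketch below is not part of the ML-Agents API: `find_env_builds` is a hypothetical helper that simply globs for anything whose path starts with `env_name`, on the assumption that the build carries a platform-specific suffix such as `.exe`, `.app`, or `.x86_64`.

```python
import glob


def find_env_builds(env_name):
    """Return sorted paths that start with env_name (hypothetical helper)."""
    return sorted(glob.glob(env_name + "*"))


matches = find_env_builds("../envs/GridWorld")
if not matches:
    print("No build found; check the path before creating UnityEnvironment")
else:
    print("Candidate builds:", matches)
```

An empty result here suggests the launch in step 3 would fail with a filename error.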

### 2. Load dependencies

The following cell loads the necessary dependencies and checks the Python version at runtime. The ML-Agents Toolkit (v0.3 onwards) requires Python 3.

In [2]:
import matplotlib.pyplot as plt
import numpy as np
import sys

from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.engine_configuration_channel import EngineConfig, EngineConfigurationChannel

%matplotlib inline

print("Python version:")
print(sys.version)

# Check the Python version
if sys.version_info[0] < 3:
    raise Exception("ERROR: ML-Agents Toolkit (v0.3 onwards) requires Python 3")

Python version:
3.6.3 (v3.6.3:2c5fed8, Oct  3 2017, 18:11:49) [MSC v.1900 64 bit (AMD64)]


### 3. Start the environment
`UnityEnvironment` launches and begins communication with the environment when instantiated.

Environments contain _brains_, which are responsible for deciding the actions of their associated _agents_. Here we take the first agent group reported by the environment (`env.get_agent_groups()[0]`) and use it as the default brain we will be controlling from Python.

In [3]:
engine_configuration_channel = EngineConfigurationChannel()
env = UnityEnvironment(base_port=5006, file_name=env_name, side_channels=[engine_configuration_channel])

# Reset the environment
env.reset()

# Set the default brain to work with
group_name = env.get_agent_groups()[0]
group_spec = env.get_agent_group_spec(group_name)

# Set the time scale of the engine
engine_configuration_channel.set_configuration_parameters(time_scale=3.0)

UnityEnvironmentException: Couldn't launch the GridWorld environment. Provided filename does not match any environments.

### 4. Examine the observation and state spaces
After a reset, the environment provides an initial set of observations and states for all of its agents. In ML-Agents, _states_ refer to a vector of variables corresponding to relevant aspects of the environment for an agent. Likewise, _observations_ refer to a set of relevant pixel-wise visuals for an agent.

In [None]:
# Get the state of the agents
step_result = env.get_step_result(group_name)

# Examine the number of observations per Agent
print("Number of observations : ", len(group_spec.observation_shapes))

# Examine the state space for the first observation for all agents
print("Agent states look like: \n{}".format(step_result.obs[0]))

# Examine the state space for the first observation for the first agent
print("First agent state looks like: \n{}".format(step_result.obs[0][0]))

# Is there a visual observation?
vis_obs = any(len(shape) == 3 for shape in group_spec.observation_shapes)
print("Is there a visual observation?", vis_obs)

# Examine the visual observations
if vis_obs:
    vis_obs_index = next(i for i, v in enumerate(group_spec.observation_shapes) if len(v) == 3)
    print("Agent visual observation looks like:")
    obs = step_result.obs[vis_obs_index]
    plt.imshow(obs[0, :, :, :])
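
The cell above tells vector and visual observations apart purely by the rank of their shapes. The following standalone sketch reproduces that logic without a running environment; the shapes, the agent count, and the zero-filled arrays are made-up stand-ins for what `group_spec.observation_shapes` and `step_result.obs` would hold.

```python
import numpy as np

# Hypothetical shapes: one 8-element vector observation and one 84x84 RGB image
observation_shapes = [(8,), (84, 84, 3)]
n_agents = 2

# Fake batched observations: each array has a leading agent dimension
obs = [np.zeros((n_agents,) + shape, dtype=np.float32) for shape in observation_shapes]

# A rank-3 shape (height, width, channels) marks a visual observation
visual_indices = [i for i, shape in enumerate(observation_shapes) if len(shape) == 3]
vector_indices = [i for i, shape in enumerate(observation_shapes) if len(shape) == 1]

# Image seen by the first agent
first_visual = obs[visual_indices[0]][0]
print(visual_indices, vector_indices, first_visual.shape)
```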


### 5. Take random actions in the environment
Once the environment has been reset, we can step it forward and provide actions to all of its agents. Here we simply choose random actions based on the action space type of the default brain (continuous or discrete).

Once this cell is executed, 10 messages will be printed, each reporting the total reward accumulated during one of the 10 episodes. The Unity environment will then pause, waiting for further signals telling it what to do next. Thus, not seeing any animation is expected when running this cell.

In [None]:
for episode in range(10):
    env.reset()
    step_result = env.get_step_result(group_name)
    done = False
    episode_rewards = 0
    while not done:
        if group_spec.is_action_continuous():
            # Continuous actions: one standard-normal sample per action dimension
            action = np.random.randn(step_result.n_agents(), group_spec.action_size)
        elif group_spec.is_action_discrete():
            # Discrete actions: one random integer per branch, in [0, branch size)
            branch_size = group_spec.discrete_action_branches
            action = np.column_stack([np.random.randint(0, branch_size[i], size=(step_result.n_agents())) for i in range(len(branch_size))])
        env.set_actions(group_name, action)
        env.step()
        step_result = env.get_step_result(group_name)
        # Track reward and termination for the first agent only
        episode_rewards += step_result.reward[0]
        done = step_result.done[0]
    print("Total reward this episode: {}".format(episode_rewards))
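
The action sampling in the loop above can be tried out in isolation. This is a minimal NumPy-only sketch, assuming the same layout as the cell: one row per agent, with either `action_size` continuous values or one integer per discrete branch. The function name `sample_random_actions` and its parameters are illustrative and not part of the ML-Agents API.

```python
import numpy as np


def sample_random_actions(n_agents, action_size=None, branch_sizes=None, rng=None):
    """Sample one random action per agent (hypothetical helper).

    Pass action_size for a continuous space, or branch_sizes for a discrete one.
    """
    rng = np.random.default_rng() if rng is None else rng
    if branch_sizes is not None:
        # Discrete: one integer per branch, drawn from [0, branch size)
        return np.column_stack([rng.integers(0, size, size=n_agents) for size in branch_sizes])
    # Continuous: standard-normal samples, one per action dimension
    return rng.standard_normal((n_agents, action_size))


continuous = sample_random_actions(n_agents=3, action_size=2)
discrete = sample_random_actions(n_agents=3, branch_sizes=[5, 2])
print(continuous.shape, discrete.shape)
```

Both calls return an array of shape `(3, 2)`: three agents, with two continuous dimensions in the first case and two discrete branches in the second.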

### 6. Close the environment when finished
When we are finished using an environment, we can close it with the function below.

In [None]:
env.close()

In [33]:
import time
import numpy as np

In [41]:
start = time.time()
for i in range(200000):
    mask = [False, False, False]
    done = [False, True, False]
    # Element-wise OR of the two boolean lists
    mask = [m or d for m, d in zip(mask, done)]
end = time.time()
print(end - start)

0.16900968551635742


In [62]:
actions = np.array([np.random.randn(2) for _ in range(3)])
print(actions)
print(type(actions))
print(type(actions[0]))

[[ 1.18682235  0.39773558]
 [-0.78240184  0.54044548]
 [ 0.42176573 -1.41088708]]
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>


In [76]:
iterable = (np.random.randn(2) for _ in range(3))
np.fromiter(iterable,float,3)

ValueError: setting an array element with a sequence.
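
With a plain `float` dtype, `np.fromiter` expects every item to be a scalar, which is why feeding it whole arrays fails here. A minimal sketch of two workarounds follows; `make_rows` is a hypothetical helper that recreates the generator from the cell above.

```python
import itertools

import numpy as np


def make_rows():
    # Generator of three length-2 arrays, as in the failing cell
    return (np.random.randn(2) for _ in range(3))


# Option 1: stack the rows into a (3, 2) array
stacked = np.stack(list(make_rows()))

# Option 2: flatten the rows to scalars first, then reshape
flat = np.fromiter(itertools.chain.from_iterable(make_rows()), dtype=float, count=6)
reshaped = flat.reshape(3, 2)

print(stacked.shape, reshaped.shape)
```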

In [77]:
import numpy as np

arr_1 = np.arange(6).reshape((2, 3))
print('arr 1')
print(arr_1)
print(arr_1.dtype)
print(arr_1.shape)

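# Rows of different lengths need dtype=object; recent NumPy versions raise a ValueError without it
arr_2 = np.array([np.arange(3), np.arange(4, 6)], dtype=object)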
print('\narr 2')
print(arr_2)
print(arr_2.dtype)
print(arr_2.shape)

arr 1
[[0 1 2]
 [3 4 5]]
int32
(2, 3)

arr 2
[array([0, 1, 2]) array([4, 5])]
object
(2,)


In [80]:
arr = np.array([np.array([0,0]) for _ in range(3)])
print(arr)
print(arr.dtype)
print(arr.shape)

[[0 0]
 [0 0]
 [0 0]]
int32
(3, 2)


In [86]:
done = [False, False, True]
actions = np.array([np.random.randn(2) if not done[i] else np.array([0,0]) for i in range(3)])
print(actions)

[[0.49622728 0.02465189]
 [0.88064911 0.35953357]
 [0.         0.        ]]


In [100]:
done = [False, True, False]
reward = np.array([0.1,0.1,0.1])
score = np.array([1,1,1])

reward 


[list([1, 2]) list([2, 3]) list([3, 4, 5])] <class 'numpy.ndarray'> (3,)


In [155]:
class Network:
    def __init__(self, param):
        self.param = param
        
class Game:
    def __init__(self):
        pass
    
    def start(self, nets):
        return [nets[i].param for i in range(len(nets))]

n_agents = 12
n_networks = 100
networks = [Network(param=i) for i in range(n_networks)]

game = Game()
result = []
for i in range(0, n_networks, n_agents):
    # Evaluate one batch of at most n_agents networks
    batch_result = game.start(networks[i:i+n_agents])
    result += batch_result
    print(batch_result)
print(result)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
[24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
[36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]
[48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71]
[72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83]
[84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]
[96, 97, 98, 99]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]


In [227]:
lista = [[1,2,3],[4,5,6]]
listb = np.mean(lista, axis=0)
listc = np.mean(lista, axis=0)
print(np.concatenate((listb, listc)))



[2.5 3.5 4.5 2.5 3.5 4.5]


In [30]:
import numpy as np
arr = np.array([1.0])
arr = np.append(arr, 2.0)
print(arr)

[1. 2.]
