# Reinforcement Learning Project

## Setup

To run this notebook, install the pettingzoo package and its MPE dependencies with the following command:

`pip install pettingzoo[mpe]`

### Imports

In [1]:
from pettingzoo.mpe import simple_world_comm_v2
import random
import numpy as np

### Environment Initialisation

In [2]:
MAX_CYCLES = 250
NUM_OF_EPISODES = 10


env = simple_world_comm_v2.env(num_good=2, num_adversaries=4, num_obstacles=1,
                num_food=2, max_cycles=MAX_CYCLES, num_forests=2, continuous_actions=False)
env.reset()

### Policy Function

In [3]:
def random_policy(actions):
    return random.randint(0, actions-1)
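
For repeatable runs, the same uniform random policy could be driven by a seeded generator instead of the global `random` module. This is only a sketch and not part of the original notebook: `make_seeded_policy` and its `seed` parameter are hypothetical names introduced for illustration.

```python
import random


def make_seeded_policy(seed):
    """Return a uniform random policy backed by its own seeded generator."""
    rng = random.Random(seed)  # private generator, independent of the global one

    def policy(actions):
        # Pick an action index in the range 0 .. actions-1, like random_policy.
        return rng.randrange(actions)

    return policy


# Two policies built with the same seed produce the same action sequence.
policy_a = make_seeded_policy(seed=0)
policy_b = make_seeded_policy(seed=0)
actions_a = [policy_a(5) for _ in range(10)]
actions_b = [policy_b(5) for _ in range(10)]
```

A policy built this way takes the same `actions` argument as `random_policy`, so it could be swapped in wherever `random_policy` is called.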

### Running the environment

The `env.render(mode='human')` call will pop open a new window that shows the environment at each time step.

On my machine, at least, this window can only be closed while the cell is running; once the cell finishes, the window freezes and cannot be closed. In these cases, restarting the kernel closes the window, along with any others opened by running the cell multiple times.

Running the cell enough times without restarting the kernel eventually causes the render call to throw an exception. If that happens, restart the kernel and rendering will work again.
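
One thing that may reduce the need for kernel restarts is releasing the render window explicitly once the loop has finished. PettingZoo environments provide an `env.close()` method for this; whether it fully avoids the frozen window on a given machine has not been verified here. The helper below is a hypothetical sketch (`run_with_cleanup` and `run_episodes` are names introduced for illustration) that guarantees `close()` is called even if rendering throws.

```python
def run_with_cleanup(env, run_episodes):
    """Call run_episodes(env) and always close the environment afterwards."""
    try:
        return run_episodes(env)
    finally:
        # Runs on normal completion and when an exception is raised.
        env.close()
```

The body of the episode loop would go in a function passed as `run_episodes`, which receives the environment as its only argument.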

In [4]:
print(f"Agents: {env.agents}")
print()
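# Map each agent name to its column index in the reward arrays.
agent_mapping = {k: v for v, k in enumerate(env.agents)}
# One row per episode, one column per agent.
reward_array = np.zeros((NUM_OF_EPISODES,len(env.agents)))
# Running total per agent. It is not reset between episodes, so each row of
# reward_array holds the total accumulated up to the end of that episode.
cumulative_reward = np.zeros(len(env.agents))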

for episode in range(NUM_OF_EPISODES):
    env.reset()
    for agent in env.agent_iter():
        observation, reward, done, info = env.last()
        cumulative_reward[agent_mapping[agent]] += reward
        
        
        # Various print statements which can help show what is returned at each step.
        # Commented out because they are verbose. Try toggling them on one at a time.
        # print(f"Current Agent: {env.agent_selection}")
        # print(f"Obs: {observation}")
        # print(f"Rew: {reward}")
        # print(f"Done: {done}")
        # print(f"Info: {info}")

        # Renders the environment for each step in a separate window.
        env.render(mode='human')

        # Steps the environment forward.
        if done:
            env.step(None)
            reward_array[episode,agent_mapping[agent]] = cumulative_reward[agent_mapping[agent]]
        else:
            env.step(random_policy(env.action_space(agent).n))

Agents: ['leadadversary_0', 'adversary_0', 'adversary_1', 'adversary_2', 'agent_0', 'agent_1']
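
Because `cumulative_reward` is never reset inside the episode loop, each row of `reward_array` is a running total over all episodes so far rather than the return of a single episode. If per-episode returns are wanted, they can be recovered by differencing consecutive rows. The sketch below uses a small made-up array in place of the real results; `per_episode_rewards` is a hypothetical helper name.

```python
import numpy as np


def per_episode_rewards(running_totals):
    """Convert rows of running totals into per-episode rewards."""
    running_totals = np.asarray(running_totals, dtype=float)
    # prepend=0 keeps the first episode's row unchanged.
    return np.diff(running_totals, axis=0, prepend=0)


# Illustrative values only: 3 episodes, 2 agents.
example_totals = np.array([[1.0, 2.0],
                           [4.0, 6.0],
                           [9.0, 7.0]])
example_rewards = per_episode_rewards(example_totals)
```

Applied to the notebook's results, this would be called as `per_episode_rewards(reward_array)`.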



### Print Reward Array

In [5]:
print(reward_array)

[[-1.36388914e+00  7.41530256e+00 -7.29109056e+00 -1.52976672e+01
  -1.57283010e+03 -3.29932329e+03]
 [-1.32408110e+01 -1.94953344e+01 -3.32330459e+01 -5.08807089e+01
  -5.25807207e+03 -5.20731062e+03]
 [-7.55474508e+01 -6.08977836e+01 -6.50771838e+01 -9.59135860e+01
  -8.95645338e+03 -7.28613425e+03]
 [-1.24291135e+02 -9.31254460e+01 -8.96201471e+01 -1.19633351e+02
  -9.34300684e+03 -8.59137560e+03]
 [-1.31170180e+02 -9.54509942e+01 -9.65099216e+01 -1.22177482e+02
  -1.27656268e+04 -8.94805662e+03]
 [-1.95002646e+02 -1.27302022e+02 -1.32037098e+02 -1.40077209e+02
  -1.30886361e+04 -1.49005916e+04]
 [-2.29094366e+02 -1.51703790e+02 -1.53673958e+02 -1.47199030e+02
  -1.43516085e+04 -1.49495488e+04]
 [-2.50047581e+02 -2.09885983e+02 -2.00458090e+02 -1.53417977e+02
  -1.74972131e+04 -1.72914900e+04]
 [-2.55803701e+02 -2.12969261e+02 -1.82523156e+02 -1.35931579e+02
  -1.82995853e+04 -1.76165589e+04]
 [-2.75444274e+02 -2.31780948e+02 -2.25544091e+02 -1.58941131e+02
  -1.84031584e+04 -1.7961