# Train Agent

Using this notebook you can:
1. train an agent to map states to action values and plot the training process
2. watch a trained agent navigate the environment


### 1. Import packages and declare constants
To watch a trained agent play, set `WEIGHTS_FILE` to the saved weights file as a string, for example `WEIGHTS_FILE = 'output/solution.pth'`.

In [None]:
# import packages and set paths
from unityagents import UnityEnvironment

import torch
import numpy as np
from collections import deque, defaultdict
import matplotlib.pyplot as plt
import time
from dqn_agent import Agent     # Agent class from dqn_agent.py
from dqn import dqn
import pandas as pd

# directory where the weights of the trained network are saved
WEIGHTS_PATH = 'outputs/'
# path to an existing weights file to load; leave as None if there is none
WEIGHTS_FILE = None
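
A mistyped weights path otherwise only surfaces later, when `torch.load` is called. As a minimal sketch, the path can be checked as soon as it is declared. The helper name `resolve_weights_file` is hypothetical and not part of this project.

```python
from pathlib import Path


def resolve_weights_file(weights_file):
    """Return None if no weights file is set, else a Path that is known to exist.

    Hypothetical helper: it only validates the path, it does not load anything.
    """
    if weights_file is None:
        return None
    path = Path(weights_file)
    if not path.is_file():
        raise FileNotFoundError(f"weights file not found: {path}")
    return path
```

Calling `resolve_weights_file(WEIGHTS_FILE)` right after the constants are declared would fail fast on a wrong path and pass `None` through unchanged.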

### 2. Instantiate Unity Environment
If you have set an existing weights file and want to watch the trained agent play, instantiate the environment with `no_graphics=False`.

In [None]:
# instantiate the environment; replace file_name with the path that matches your folder structure
env = UnityEnvironment(file_name="python/Banana.exe", no_graphics=True)

# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]
action_size = brain.vector_action_space_size

env_info = env.reset(train_mode=False)[brain_name]  # reset the environment
state = env_info.vector_observations[0]  # get the current state
state_size = len(state)
# number of agents in the environment
print('Number of agents:', len(env_info.agents))


### 3. Initialise Agent
To train the agent with the dueling network architecture (see details in the [report](/Report.md)), initialise the agent with `duel=True`.

In [None]:
# load existing model, if you've got weights
if WEIGHTS_FILE is not None:
    qnetwork_weights = torch.load(WEIGHTS_FILE) 
else:
    qnetwork_weights = None

# initialise the agent; duel=True selects the dueling Q-network
agent = Agent(state_size, action_size, seed=0, duel=True, qnetwork_weights=qnetwork_weights)
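
The report covers the network that `duel=True` selects. As a rough illustration only, dueling networks commonly combine a scalar state value V(s) with per-action advantages A(s, a) as Q(s, a) = V(s) + A(s, a) - mean(A(s, ·)). The sketch below shows that aggregation in plain Python. The function name `dueling_q_values` and the numbers are hypothetical, and `dqn_agent.Agent` may implement the step differently.

```python
def dueling_q_values(state_value, advantages):
    """Combine a state value and per-action advantages into Q-values.

    Subtracting the mean advantage keeps V and A identifiable:
    the resulting Q-values average to the state value.
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


# made-up value and advantages for four hypothetical actions
q_values = dueling_q_values(1.0, [0.5, -0.5, 1.0, -1.0])
best_action = q_values.index(max(q_values))
```

The greedy action is the one with the largest advantage, since the state value and the mean shift every action equally.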

### 4. Train the agent
If you have loaded existing weights and want to watch the trained agent play, set `eps_start=0` and `eps_end=0` so that the agent never takes random actions.

In [None]:
# train the agent
scores, steps, yellow_bananas, blue_bananas, epsilons = dqn(
    env, agent, WEIGHTS_PATH, brain_name,
    n_episodes=2000, eps_start=1, eps_end=0.01, eps_decay=0.993,
)
env.close()
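
To get a feel for the epsilon parameters, the sketch below assumes the common multiplicative rule `eps = max(eps_end, eps_decay * eps)` applied once per episode. That rule is an assumption about `dqn`, not something confirmed here, and `epsilon_schedule` is a hypothetical helper.

```python
def epsilon_schedule(n_episodes, eps_start, eps_end, eps_decay):
    """Epsilon used in each episode under multiplicative decay with a floor.

    Assumed rule (not taken from dqn.py): eps <- max(eps_end, eps_decay * eps).
    """
    schedule = []
    eps = eps_start
    for _ in range(n_episodes):
        schedule.append(eps)
        eps = max(eps_end, eps_decay * eps)
    return schedule


# the training settings used above
training = epsilon_schedule(2000, eps_start=1, eps_end=0.01, eps_decay=0.993)
# the "watch a trained agent" settings: epsilon stays at zero throughout
watching = epsilon_schedule(5, eps_start=0, eps_end=0, eps_decay=0.993)
```

Under this assumed rule, the training settings reach the 0.01 floor after roughly 650 episodes, so most of the 2000 episodes would run almost greedily.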

### 5. Plot training process

In [None]:
# put observations in lists
columns = ['scores', 'steps', 'yellow_bananas', 'blue_bananas', 'epsilons']
data = [scores, steps, yellow_bananas, blue_bananas, epsilons]

# convert to dataframe
df = pd.DataFrame(dict(zip(columns, data)))

# calculate moving average
df['Yellow Bananas Moving Avg 10'] = df['yellow_bananas'].rolling(window=10).mean()
df['Blue Bananas Moving Avg 10'] = df['blue_bananas'].rolling(window=10).mean()
df.tail()
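
With the default `min_periods`, `rolling(window=10).mean()` returns NaN for the first nine rows, so the moving-average curves only start at the tenth episode. The plain-Python sketch below mirrors that behaviour on a tiny made-up list. `rolling_mean` is a hypothetical stand-in for illustration, not the pandas implementation.

```python
import math


def rolling_mean(values, window):
    """Trailing moving average; NaN until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(math.nan)  # not enough observations yet
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


# made-up counts, window of 2 instead of 10 to keep the example short
smoothed = rolling_mean([2, 4, 6, 8], window=2)
```

If the leading gap is unwanted, pandas accepts `min_periods=1` in `rolling(...)`, which averages over however many rows are available so far.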

In [None]:
# Plot banana collection
plt.plot('Yellow Bananas Moving Avg 10', data=df, marker='', color='olive', linewidth=2)
plt.plot('Blue Bananas Moving Avg 10', data=df, marker='', color='blue', linewidth=2)
plt.legend()

In [None]:
# Plot scores and epsilon
fig, ax1 = plt.subplots()

# plot episode scores
ax1.set_xlabel('Episodes')
ax1.set_ylabel('Episode Scores', color='olive')
ax1.plot(df['scores'], color='olive')
ax1.tick_params(axis='y', labelcolor='olive')

# add a second y-axis for epsilon; both plots share the same x-axis
ax2 = ax1.twinx()
ax2.set_ylabel('Epsilon', color='black')  
ax2.plot(df['epsilons'], color='black')
ax2.tick_params(axis='y', labelcolor='black')

# set layout and show
fig.tight_layout() 
plt.show()