# Navigation

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the first project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893).

### 4. It's Your Turn!

Now it's your turn to train your own agent to solve the environment!  When training the agent, set `train_mode=True`, so that the line for resetting the environment looks like the following:
```python
env_info = env.reset(train_mode=True)[brain_name]
```

In [None]:
from unityagents import UnityEnvironment
from collections import deque
import torch
import numpy as np
import random
import matplotlib.pyplot as plt
from dqn_agent import Agent

In [None]:
# load the Banana collector environment (Linux build)
env = UnityEnvironment(file_name="./Banana_Linux/Banana.x86_64")

Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [None]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 1. Banana environment brief description
There is 1 agent to train.

There are 4 actions: move forward, move backward, turn left, and turn right.

The state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction.
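
For reference, the four discrete actions can be given readable names. The index ordering below follows the Udacity project description (0 forward, 1 backward, 2 left, 3 right) and should be treated as an assumption if your environment build differs. `ACTION_NAMES` and `describe_action` are hypothetical helpers, not part of `unityagents`.

```python
import random

# Index-to-action mapping as given in the Udacity project description
# (assumption: your environment build uses the same ordering).
ACTION_NAMES = {0: "move forward", 1: "move backward", 2: "turn left", 3: "turn right"}

def describe_action(action):
    """Return a human-readable name for a discrete action index."""
    return ACTION_NAMES[action]

# Sample a uniformly random action, as an untrained agent would.
random_action = random.randrange(len(ACTION_NAMES))
print(random_action, describe_action(random_action))
```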

In [None]:
# reset the environment in training mode (train_mode=True) so the agent can be trained
env_info = env.reset(train_mode=True)[brain_name]

# name of the brain
print('Name of the brain:', brain_name)

# number of agents in the environment
print('Number of agents:', len(env_info.agents))

# number of actions
action_size = brain.vector_action_space_size
print('Number of actions:', action_size)

# examine the state space 
state = env_info.vector_observations[0]
print('States look like:', state)
state_size = len(state)
print('States have length:', state_size)


In [None]:

# define the agent
agent = Agent(state_size=state_size, action_size=action_size, seed=40)
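
`Agent` is imported from `dqn_agent.py`, which is not shown in this notebook. As a rough sketch of the epsilon-greedy rule that `agent.act(state, eps)` is assumed to apply, the helper below picks a random action with probability `eps` and the greedy action otherwise. The function `epsilon_greedy` and the Q-values are made up for illustration.

```python
import random
import numpy as np

def epsilon_greedy(q_values, eps, rng=random):
    """Explore with probability eps, otherwise take the action with the highest Q-value."""
    if rng.random() < eps:
        # explore: uniformly random action index
        return rng.randrange(len(q_values))
    # exploit: index of the largest Q-value
    return int(np.argmax(q_values))

# Illustrative Q-values for the 4 actions (not produced by a real network).
q = np.array([0.1, 0.5, 0.2, 0.3])
print(epsilon_greedy(q, eps=0.0))  # always greedy, so this is action 1
print(epsilon_greedy(q, eps=1.0))  # always random, any of 0..3
```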

### Train the agent on the given environment
I store the score of every episode together with the mean score over the last 100 episodes.
Training stops when the agent reaches an average score of 15.
Every 100 episodes, the average score is also reported.
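
The stopping rule described above can be sketched in isolation. `first_solved_episode` is a hypothetical helper and the scores are synthetic. As in the training loop, the mean is taken over fewer than 100 scores until 100 episodes have been played.

```python
from collections import deque
import numpy as np

def first_solved_episode(scores, target=15.0, window=100):
    """Return the 1-based episode at which the mean of the last `window`
    scores first reaches `target`, or None if it never does."""
    recent = deque(maxlen=window)  # old scores drop out automatically
    for i_episode, score in enumerate(scores, start=1):
        recent.append(score)
        if np.mean(recent) >= target:
            return i_episode
    return None

# Synthetic scores: 100 episodes at 10 points, then 100 episodes at 20 points.
fake_scores = [10.0] * 100 + [20.0] * 100
print(first_solved_episode(fake_scores))  # 150
```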

In [None]:
# From the lecture
def dqn(n_episodes=2000, max_t=1000, eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Deep Q-Learning.
    Params
    ======
        n_episodes (int): maximum number of training episodes
        max_t (int): maximum number of timesteps per episode
        eps_start (float): starting value of epsilon, for epsilon-greedy action selection
        eps_end (float): minimum value of epsilon
        eps_decay (float): multiplicative factor (per episode) for decreasing epsilon
    """
    scores = []                        # list containing scores from each episode
    scores_window = deque(maxlen=100)  # last 100 scores
    eps = eps_start                    # initialize epsilon
    score_changes_to_plot = []
    for i_episode in range(1, n_episodes+1):
        env_info = env.reset(train_mode=True)[brain_name]
        state = env_info.vector_observations[0]
        score = 0  # initialize score
        for t in range(max_t):
            action = agent.act(state, eps)
            # note: env.step returns a dict of BrainInfo objects here, unlike the gym-style tuple in Udacity's DQN solution
            env_info = env.step(action)[brain_name] # BrainInfo holds the next state, reward, and done flag

            next_state = env_info.vector_observations[0] # index 0 selects the single agent, matching the shape of `state`
            reward = env_info.rewards[0]
            done = env_info.local_done[0]

            agent.step(state, action, reward, next_state, done)
            state = next_state
            score += reward
            if done:
                break 
        scores_window.append(score)       # save most recent score
        scores.append(score)              # save most recent score
        eps = max(eps_end, eps_decay*eps) # decrease epsilon
        score_changes_to_plot.append(np.mean(scores_window))
        print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)), end="")
        if i_episode % 100 == 0:
            # leave a permanent log entry every 100 episodes
            print('\n{}th episode has passed.'.format(i_episode))
            print('Average Score: {:.2f}'.format(np.mean(scores_window)))
        if np.mean(scores_window)>=15.0:
            # the project's target score is +13, but I aim for a higher average of +15
            print('\nEnvironment solved in {:d} episodes!\tAverage Score: {:.2f}'.format(i_episode-100, np.mean(scores_window)))
            torch.save(agent.qnetwork_local.state_dict(), 'checkpoint_final.pth')
            break
    return scores, score_changes_to_plot
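
To see how quickly exploration fades under the default hyperparameters, the update `eps = max(eps_end, eps_decay*eps)` from `dqn()` can be replayed on its own. `epsilon_schedule` is a hypothetical helper written only for this illustration.

```python
def epsilon_schedule(n_episodes, eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Return the epsilon used in each episode under multiplicative decay with a floor."""
    eps = eps_start
    values = []
    for _ in range(n_episodes):
        values.append(eps)                   # epsilon used during this episode
        eps = max(eps_end, eps_decay * eps)  # same update as in dqn()
    return values

eps_values = epsilon_schedule(2000)
for episode in (1, 100, 500, 1000, 2000):
    print(episode, round(eps_values[episode - 1], 4))
```

With these defaults, epsilon reaches its floor of 0.01 after a little over 900 episodes, so the remaining episodes run almost entirely greedy.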



In [None]:
# Let's start learning!
# Calling dqn() with no arguments uses the default hyperparameters.
scores, score_changes_to_plot = dqn()

In [None]:

# plot the scores
fig = plt.figure()
ax = fig.add_subplot(111)
plt.title("Result of Project 1 Navigation")
plt.plot(np.arange(len(scores)), scores, label="dqn agent")
plt.plot(np.arange(len(scores)), score_changes_to_plot, label="average of scores")
plt.ylabel('Score')
plt.xlabel('Episode #')
plt.legend()  # show the labels given to the two curves
plt.show()

In [None]:
env.close()