# Navigation

---

In this notebook, we will train and evaluate our model for the first project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893).

### 1. Start the Environment

We begin by importing some necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [1]:
from unityagents import UnityEnvironment
import numpy as np
import gym
import matplotlib.pyplot as plt
%matplotlib inline

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Banana.app"`
- **Windows** (x86): `"path/to/Banana_Windows_x86/Banana.exe"`
- **Windows** (x86_64): `"path/to/Banana_Windows_x86_64/Banana.exe"`
- **Linux** (x86): `"path/to/Banana_Linux/Banana.x86"`
- **Linux** (x86_64): `"path/to/Banana_Linux/Banana.x86_64"`
- **Linux** (x86, headless): `"path/to/Banana_Linux_NoVis/Banana.x86"`
- **Linux** (x86_64, headless): `"path/to/Banana_Linux_NoVis/Banana.x86_64"`
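
The mapping in the list above can also be sketched in code. This is only an illustration: `default_banana_path` is a hypothetical helper, not part of Unity ML-Agents, and the `path/to` prefix is the same placeholder used in the list. It also assumes that any machine string other than `x86_64` or `amd64` should fall back to the x86 build.

```python
import platform


def default_banana_path(system=None, machine=None, headless=False):
    """Return the placeholder build path from the list above for a platform.

    Hypothetical helper: the name and the "path/to" prefix are placeholders,
    not part of the Unity ML-Agents API.
    """
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    # Assumption: anything that is not reported as 64-bit x86 uses the x86 build.
    is_64bit = machine in ("x86_64", "amd64")

    if system == "Darwin":
        return "path/to/Banana.app"
    if system == "Windows":
        folder = "Banana_Windows_x86_64" if is_64bit else "Banana_Windows_x86"
        return "path/to/{}/Banana.exe".format(folder)
    if system == "Linux":
        folder = "Banana_Linux_NoVis" if headless else "Banana_Linux"
        binary = "Banana.x86_64" if is_64bit else "Banana.x86"
        return "path/to/{}/{}".format(folder, binary)
    raise ValueError("Unsupported platform: {}".format(system))
```

After replacing the `path/to` prefix with the real download location, the result could be passed as the `file_name` argument of `UnityEnvironment`.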

For instance, if you are using a Mac, then you downloaded `Banana.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Banana.app")
```

In [2]:
env = UnityEnvironment(file_name="/home/mirshad7/deep-reinforcement-learning/p2_continuous-control/Reacher_Linux_NoVis/Reacher.x86_64")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		goal_speed -> 1.0
		goal_size -> 5.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


### 2. Define the Agent 
We first define and initialize the agent. The cell below imports it from the `ddpg_agent.py` script; refer to that script to define a new agent or modify the existing one.

In [3]:
from ddpg_agent import Agent

brain_name = env.brain_names[0]
brain = env.brains[brain_name]
env_info = env.reset(train_mode=True)[brain_name]
state_dim = env_info.vector_observations.shape[1]
action_dim = brain.vector_action_space_size
# number of agents
num_agents = len(env_info.agents)

agent = Agent(state_dim=state_dim, action_dim=action_dim, num_agents=num_agents, seed=np.random.randint(100))

### 3. Train the agent 
Run the code below to train the agent from scratch.


In [4]:
from collections import deque
import torch
import numpy as np
import math
import time


def interact(env, brain_name, agent, num_agents, num_episodes=200, window=30, max_iter=1000):
    scores = []
    scores_window = deque(maxlen=window)
    # get the default brain of UnityML Agents
    # brain_name = env.brain_names[0]
    # brain = env.brains[brain_name]
    for i_episode in range(1, num_episodes+1):
        # Reset env and get current state
        env_info = env.reset(train_mode=True)[brain_name]
        states = env_info.vector_observations
        score = np.zeros(num_agents)
        agent.reset()
        for t in range(max_iter):
            actions = agent.act(states)
            env_info = env.step(actions)[brain_name]
            next_states = env_info.vector_observations
            rewards = env_info.rewards
            dones = env_info.local_done
            agent.step(states, actions, rewards, next_states, dones)
            states = next_states
            score += rewards
            if any(dones):
                break
        scores.append(np.mean(score))
        scores_window.append(np.mean(score))
        print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)), end="")
        torch.save(agent.actor_local.state_dict(), './logging/checkpoint_actor.pth')
        torch.save(agent.critic_local.state_dict(), './logging/checkpoint_critic.pth')
        if np.mean(scores_window)>30:
            scores_filename = "./logging/ddpg_agent_" +str(i_episode) + ".csv"
            np.savetxt(scores_filename, scores, delimiter=",")
        if i_episode % window == 0:
            print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)))
    return scores

scores = interact(env, brain_name, agent, num_agents)

Episode 100	Average Score: 0.93
Episode 200	Average Score: 3.28
Episode 300	Average Score: 7.64
Episode 400	Average Score: 10.67
Episode 500	Average Score: 12.39
Episode 600	Average Score: 13.14
Episode 700	Average Score: 13.53
Episode 800	Average Score: 14.38

Environment solved in 774 episodes!	Average Score: 15.00
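
The `scores_window` deque in `interact` holds only the most recent `window` episode scores, so the printed value is a moving average rather than an average over the whole run. The sketch below isolates that logic using only the standard library. `first_solved_episode` is a hypothetical helper, and its defaults (`window=30`, `target=30.0`) are taken from the `interact` code above, not from the logged output.

```python
from collections import deque


def first_solved_episode(scores, window=30, target=30.0):
    """Return the first 1-based episode whose moving average exceeds target.

    Hypothetical helper mirroring the scores_window logic in interact();
    returns None if the moving average never exceeds the target.
    """
    recent = deque(maxlen=window)  # the oldest score drops out once the window is full
    for i_episode, score in enumerate(scores, start=1):
        recent.append(score)
        # Like interact(), average over however many scores are in the window so far.
        if sum(recent) / len(recent) > target:
            return i_episode
    return None
```

For example, with `window=2` and scores `[10, 20, 40, 50, 60]`, the moving averages are 10, 15, 30, 45 and 55, so the first episode that strictly exceeds a target of 30 is episode 4.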


In [3]:
from plot import plot_results

plot_results(baseline_score=13)

### 4. Watch a smart agent
Run the code below to watch a smart agent navigating inside the environment.


In [4]:
import gym
from nav_agent import Agent
import numpy as np
from unityagents import UnityEnvironment
import matplotlib.pyplot as plt
from monitor import interact
import torch

env = UnityEnvironment(file_name="/home/mirshad7/deep-reinforcement-learning/p1_navigation/Banana_Linux/Banana.x86")
# reset env and extract state_dim and action_dim
brain_name = env.brain_names[0]
brain = env.brains[brain_name]
env_info = env.reset(train_mode=True)[brain_name]
state_dim = len(env_info.vector_observations[0])
action_dim = brain.vector_action_space_size

agent = Agent(state_dim=state_dim, action_dim=action_dim, seed=0)

# Watch a smart agent: load the trained Q-network weights
agent.qNetwork_local.load_state_dict(torch.load('./logging/nav_dqn_model_20200517-174735.pth'))

state = env_info.vector_observations[0]                # get the current state of the single agent
score = 0                                              # initialize the episode score
while True:
    action = agent.act(state)                          # select an action
    env_info = env.step(action)[brain_name]            # send the action to the environment
    next_state = env_info.vector_observations[0]       # get the next state
    reward = env_info.rewards[0]                       # get the reward
    done = env_info.local_done[0]                      # see if the episode finished
    score += reward                                    # update the score
    state = next_state                                 # roll over the state to the next time step
    if done:                                           # exit loop if the episode finished
        break
print('Total score (averaged over agents) this episode: {}'.format(score))

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: BananaBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 37
        Number of stacked Vector Observation: 1
        Vector Action space type: discrete
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Total score (averaged over agents) this episode: 13.0
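
The call `agent.act(state)` is defined in `nav_agent.py`, which is not shown in this notebook. Agents based on a Deep Q-Network commonly select actions epsilon-greedily, and when watching a trained agent the exploration rate is usually set to zero so the highest-valued action is always taken. The sketch below illustrates that idea under those assumptions; `epsilon_greedy` and its parameters are hypothetical and may differ from what `nav_agent.py` actually does.

```python
import random


def epsilon_greedy(q_values, eps=0.0, rng=random):
    """Pick an action index from a list of Q-values.

    With probability eps a uniformly random action is returned (exploration);
    otherwise the index of the highest Q-value is returned (exploitation).
    """
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Four Q-values, one per discrete action, as in the Banana environment.
greedy_action = epsilon_greedy([0.1, 0.7, 0.2, 0.0])
```

With `eps=0.0` the random branch is never taken, so `greedy_action` is the index of the largest Q-value.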


In [5]:
env.close()