# Navigation

---

*Jun Zhu, zhujun981661@gmail.com, 07.2020*

This notebook presents my solution to the first project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893).


### 1. Start the Environment

In [None]:
# Install matplotlib into the notebook kernel before importing it.
!pip install matplotlib

from unityagents import UnityEnvironment
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

import torch
import torch.nn as nn
import torch.nn.functional as F

from dqn_agent import DqnAgent
from utilities import check_environment, play

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Banana.app"`
- **Windows** (x86): `"path/to/Banana_Windows_x86/Banana.exe"`
- **Windows** (x86_64): `"path/to/Banana_Windows_x86_64/Banana.exe"`
- **Linux** (x86): `"path/to/Banana_Linux/Banana.x86"`
- **Linux** (x86_64): `"path/to/Banana_Linux/Banana.x86_64"`
- **Linux** (x86, headless): `"path/to/Banana_Linux_NoVis/Banana.x86"`
- **Linux** (x86_64, headless): `"path/to/Banana_Linux_NoVis/Banana.x86_64"`

For instance, if you are using a Mac, then you downloaded `Banana.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Banana.app")
```

In [None]:
env = UnityEnvironment(file_name="Banana_Linux/Banana.x86_64")

Environments contain **_brains_**, which are responsible for deciding the actions of their associated agents. The `check_environment` helper used in the next section looks up the first available brain, which then serves as the default brain controlled from Python.
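The brain lookup described above might look like the following minimal sketch. The implementation of `check_environment` is not shown in this notebook, so `first_brain_info` is a hypothetical stand-in; it assumes the `brain_names`, `brains`, and `reset` members that the `unityagents` environment exposes in the Nanodegree starter code.

```python
def first_brain_info(env):
    """Return (brain_name, state_size, action_size) for the first brain.

    Hypothetical helper: assumes `env` behaves like a unityagents
    UnityEnvironment with a single agent.
    """
    # The first registered brain is used as the default brain.
    brain_name = env.brain_names[0]
    brain = env.brains[brain_name]

    # Resetting returns a dict mapping brain names to environment info.
    env_info = env.reset(train_mode=True)[brain_name]

    # One observation vector per agent; this project has a single agent.
    state_size = len(env_info.vector_observations[0])
    action_size = brain.vector_action_space_size
    return brain_name, state_size, action_size
```

Calling `first_brain_info(env)` on the Banana environment should yield a state size of `37` and an action size of `4`, matching the description in the next section.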

### 2. Examine the State and Action Spaces

The simulation contains a single agent that navigates a large environment.  At each time step, it has four actions at its disposal:
- `0` - walk forward 
- `1` - walk backward
- `2` - turn left
- `3` - turn right

The state space has `37` dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction.  A reward of `+1` is provided for collecting a yellow banana, and a reward of `-1` is provided for collecting a blue banana.

Run the code cell below to print some information about the environment.

In [None]:
brain_name, state_size, action_size = check_environment(env)
brain_name

### 3. Run the Environment with Random Actions

In [None]:
# play(env, brain_name)
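
A random-action rollout can be sketched as follows. This is not the `play` helper from `utilities` (whose code is not shown here); `run_random_episode` and its `max_steps` parameter are hypothetical, and the sketch assumes the `reset`/`step` interface and the `rewards` and `local_done` fields of the `unityagents` environment.

```python
import numpy as np


def run_random_episode(env, brain_name, action_size, max_steps=1000):
    """Play one episode with uniformly random actions and return the score."""
    env.reset(train_mode=False)
    score = 0.0
    for _ in range(max_steps):
        # Pick one of the discrete actions uniformly at random.
        action = int(np.random.randint(action_size))
        env_info = env.step(action)[brain_name]
        # Single-agent environment: index 0 holds this agent's values.
        score += env_info.rewards[0]
        if env_info.local_done[0]:
            break
    return score
```

A random policy gives a useful baseline: its score should hover around zero, since yellow and blue bananas are collected more or less by chance.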

### 4. Train an Agent with a Deep Q-Network (DQN)

The agent uses the following improvement over vanilla DQN:
- Double DQN
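
Double DQN decouples action selection from action evaluation: the online network picks the greedy next action, and the target network supplies its value. The sketch below shows the standard target computation in PyTorch. It is not taken from `DqnAgent`, whose internals are not shown in this notebook; the function name and the `(batch, 1)` tensor shapes are assumptions.

```python
import torch


def double_dqn_targets(online_net, target_net, rewards, next_states, dones,
                       gamma=0.99):
    """Compute Double DQN regression targets for a batch of transitions.

    Assumes `rewards` and `dones` are float tensors of shape (batch, 1),
    with `dones` equal to 1.0 where the episode ended and 0.0 otherwise.
    """
    with torch.no_grad():
        # The online network selects the greedy action in the next state ...
        best_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ... and the target network evaluates that action.
        next_q = target_net(next_states).gather(1, best_actions)
    # Terminal transitions do not bootstrap from the next state.
    return rewards + gamma * next_q * (1.0 - dones)
```

In vanilla DQN, by contrast, the target network both selects and evaluates the next action (a `max` over its own outputs), which tends to overestimate Q-values.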

In [None]:
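class BananaBrainQNetwork(nn.Module):
    """Fully connected Q-network that maps a state to one Q-value per action."""

    def __init__(self, state_space=37, action_space=4):
        super().__init__()

        # Defaults match the Banana environment: 37 state dimensions, 4 actions.
        self._fc1 = nn.Linear(state_space, 128)
        self._fc2 = nn.Linear(128, 32)
        self._fc_final = nn.Linear(32, action_space)

    def forward(self, state):
        x = state
        # ReLU after each hidden layer; the output layer stays linear.
        for fc in (self._fc1, self._fc2):
            x = F.relu(fc(x))
        return self._fc_final(x)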

In [None]:
actions = np.arange(action_size)

model = BananaBrainQNetwork()

dqn_agent = DqnAgent(model, actions, 
                     replay_memory_size=100000,
                     double_dqn=True)

scores = dqn_agent.train(env,
                         n_episodes=2000,
                         batch_size=64,
                         target_network_update_frequency=4,
                         gamma=0.99,
                         learning_rate=5e-4,
                         output_frequency=50,
                         save_frequency=100,
                         target_score=13)

Visualize the learning history.

In [None]:
_, ax = plt.subplots(1, 1, figsize=(12, 6))
ax.plot(scores)
ax.set_xlabel("Episode", fontsize=16)
ax.set_ylabel("Score", fontsize=16)
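
Per-episode scores are noisy, so a moving average often makes the trend easier to read. The helper below is a minimal sketch; `moving_average` is a hypothetical name, and the default window of 100 episodes assumes that `target_score=13` refers to an average over the last 100 episodes, which this notebook does not state explicitly.

```python
import numpy as np


def moving_average(scores, window=100):
    """Return the moving average of `scores` over `window` episodes.

    The result has len(scores) - window + 1 entries, or is empty when
    fewer than `window` scores are available.
    """
    scores = np.asarray(scores, dtype=float)
    if len(scores) < window:
        return np.array([])
    # A uniform kernel turns the convolution into a windowed mean.
    kernel = np.ones(window) / window
    return np.convolve(scores, kernel, mode="valid")
```

For example, `ax.plot(np.arange(99, len(scores)), moving_average(scores))` would overlay the smoothed curve on the raw scores, aligned to the last episode of each window.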

In [None]:
dqn_agent.play(env)

In [None]:
# env.close()

### 5. Further Improvements