# Project #2 : Continuous Control

## Part 2: Watch the Trained Agent

---

This notebook implements my solution for the Continuous Control project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.

I implemented the Deep Deterministic Policy Gradient (DDPG) training algorithm for the Unity Reacher environment.

The code in this notebook (Continuous_Control.ipynb) and in the ddpg_agent.py and model.py files is derived from the 'ddpg-pendulum' code example used in the nanodegree program. It was modified to train both the single-agent and the multi-agent versions of the Unity Reacher environment, and it stops training once the environment is considered solved. The project requirement is to reach an average score of 30, measured over the last 100 training episodes. In the multi-agent version, an episode's score is the mean of the agents' scores for that episode.
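The solving criterion described above can be sketched as follows. This is a minimal illustration, not the code from ddpg_agent.py or the training notebook: the names `is_solved`, `TARGET_SCORE`, `WINDOW`, and the made-up episode data are assumptions, as is the choice to require a full window of 100 episodes before declaring the environment solved.

```python
from collections import deque
from statistics import mean

TARGET_SCORE = 30.0   # average score required by the project
WINDOW = 100          # number of most recent episodes that are averaged


def is_solved(recent_scores, target=TARGET_SCORE, window=WINDOW):
    """Return True once `window` episode scores are recorded and their mean reaches `target`.

    Assumption: the check only passes when the window is full.
    """
    return len(recent_scores) >= window and mean(recent_scores) >= target


# Hypothetical data: each inner list holds one episode's per-agent scores.
fake_episodes = [[29.0, 33.0]] * 100      # every episode averages 31.0

recent_scores = deque(maxlen=WINDOW)      # old episodes drop out automatically
solved_at = None
for episode, agent_scores in enumerate(fake_episodes, start=1):
    recent_scores.append(mean(agent_scores))   # episode score = mean over agents
    if is_solved(recent_scores):
        solved_at = episode
        break
```

A `deque` with `maxlen=WINDOW` keeps only the most recent 100 episode scores, so the running average never needs explicit slicing.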

### 1. Import libraries

In [1]:
import torch
import numpy as np
from collections import deque
#import random
import time

### 2. Load the Environment and Agent classes 

In [2]:
from unityagents import UnityEnvironment
from ddpg_agent import DDPG_Agent


   *****************
   *** Using CPU ***
   *****************



### 3. Initialize the environment

**_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Reacher.app"`
- **Windows** (x86): `"path/to/Reacher_Windows_x86/Reacher.exe"`
- **Windows** (x86_64): `"path/to/Reacher_Windows_x86_64/Reacher.exe"`
- **Linux** (x86): `"path/to/Reacher_Linux/Reacher.x86"`
- **Linux** (x86_64): `"path/to/Reacher_Linux/Reacher.x86_64"`

For instance, if you are using a Mac, then you downloaded `Reacher.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Reacher.app")
```

In [3]:
# on my iMac 
env = UnityEnvironment(file_name="Reacher.app")

# in my Udacity Workspace
#env = UnityEnvironment(file_name='/data/Reacher_Linux_NoVis/Reacher.x86_64')

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		goal_speed -> 1.0
		goal_size -> 5.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


In [4]:
brain_name = env.brain_names[0]
brain = env.brains[brain_name]
env_info = env.reset(train_mode=False)[brain_name]     # reset the environment    
num_agents = len(env_info.agents)
action_size = brain.vector_action_space_size
states = env_info.vector_observations                  # get the current state (for each agent)
state_size = states.shape[1]

### 4. Initialize the agent 

In [5]:
agent = DDPG_Agent(state_size, action_size, num_agents)

# Load the weights generated during training
agent.actor_local.load_state_dict(torch.load("checkpoint_actor.pth", map_location=lambda storage, loc: storage))
agent.critic_local.load_state_dict(torch.load("checkpoint_critic.pth", map_location=lambda storage, loc: storage))

### 5. Watch the agent interact with its environment

In [6]:
# Run episode
timesteps = 1000
scores = np.zeros(num_agents)                          # initialize the score (for each agent)
for t in range(timesteps):
    actions = agent.act(states)                        # select an action (for each agent)
    actions = np.clip(actions, -1, 1)                  # all actions between -1 and 1
    env_info = env.step(actions)[brain_name]           # send all actions to the environment
    next_states = env_info.vector_observations         # get next state (for each agent)
    rewards = env_info.rewards                         # get reward (for each agent)
    dones = env_info.local_done                        # see if episode finished
    scores += rewards                                  # update the score (for each agent)
    states = next_states                               # roll over states to next time step
    if np.any(dones):                                  # exit loop if episode finished
        break
print('Total score (averaged over agents) : {:.2f}'.format(np.mean(scores)))

Total score (averaged over agents) : 33.55


In [7]:
env.close()