# LunarLanderContinuous-v2
---
In this notebook, you will implement a DDPG agent with OpenAI Gym's LunarLanderContinuous-v2 environment.

### 1. Import the Necessary Packages

In [1]:
import gym.spaces
import random
import torch
import numpy as np
from collections import deque
import matplotlib.pyplot as plt
%matplotlib inline

from ddpg_agent import Agent

### 2. Instantiate the Environment and Agent

Initialize the environment in the code cell below.

In [2]:
env = gym.make('LunarLanderContinuous-v2')
env.seed(0)

# size of each action
action_size = env.action_space.shape[0]
print('Size of each action:', action_size)

# examine the state space 
states = env.observation_space.shape
state_size = states[0]
print('Size of state:', state_size)

WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
Size of each action: 2
Size of state: 8


Before running the next code cell, familiarize yourself with the code in **Step 2** and **Step 3** of this notebook, along with the code in `ddpg_agent.py` and `model.py`.  Once you understand how the different files work together,
- Define the neural network architectures in `model.py`: an actor that maps states to actions, and a critic that maps state-action pairs to Q-values.  This file is mostly empty - it's up to you to define your own networks!
- Finish the `learn` method in the `Agent` class in `ddpg_agent.py`.  The sampled batch of experience tuples is already provided for you; you need only use the local and target actor and critic networks to compute the losses, before taking a step towards minimizing them.

Once you have completed the code in `ddpg_agent.py` and `model.py`, run the code cell below.  (_If you end up needing to make multiple changes and get unexpected behavior, please restart the kernel and run the cells from the beginning of the notebook!_)

You can find the solution files, along with saved model weights for a trained agent, in the `solution/` folder.  (_Note that there are many ways to solve this exercise, and the "solution" is just one way of approaching the problem, to yield a trained agent._)
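
For orientation, the core arithmetic of a DDPG `learn` step can be sketched with plain NumPy arrays. This is a minimal sketch, not the contents of `ddpg_agent.py`: the helper names `td_target` and `soft_update`, and the values of `gamma` and `tau`, are assumptions chosen for illustration. The critic is regressed toward `y = r + gamma * (1 - done) * Q_target(s', actor_target(s'))`, and the target networks then track the local ones by Polyak averaging.

```python
import numpy as np

def td_target(rewards, dones, q_next, gamma=0.99):
    """Bellman target for the critic: y = r + gamma * (1 - done) * q_next.

    q_next is assumed to already hold Q_target(s', actor_target(s'))
    for each sampled transition.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    q_next = np.asarray(q_next, dtype=np.float64)
    return rewards + gamma * (1.0 - dones) * q_next

def soft_update(local_params, target_params, tau=1e-3):
    """Polyak averaging: target <- tau * local + (1 - tau) * target."""
    return [tau * l + (1.0 - tau) * t for l, t in zip(local_params, target_params)]

# Two transitions: the second is terminal, so its bootstrap term is dropped.
y = td_target(rewards=[1.0, -100.0], dones=[0, 1], q_next=[10.0, 5.0], gamma=0.9)
print(y)

# One parameter tensor per network, purely illustrative values.
new_target = soft_update([np.ones(2)], [np.zeros(2)], tau=0.1)
print(new_target[0])
```

In a PyTorch implementation the same two formulas would typically be applied to tensors, with the critic loss taken as the mean squared error between `y` and the local critic's estimate.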

### 3. Train the Agent with DDPG

Run the code cell below to train the agent from scratch.  You are welcome to amend the supplied values of the parameters in the function, to try to see if you can get better performance!

In [3]:
# deque, torch, Agent and matplotlib were already imported in Step 1

agent = Agent(state_size=state_size, action_size=action_size, random_seed=0)

In [4]:
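def ddpg(n_episodes=100000, max_t=50000, print_every=100):
    """Train the agent with DDPG and return the list of per-episode scores.

    n_episodes: maximum number of training episodes
    max_t: maximum number of timesteps per episode
    print_every: logging interval, also the window of the running average
    """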
    scores_deque = deque(maxlen=print_every)
    scores = []
    for i_episode in range(1, n_episodes+1):
        state = env.reset()
        agent.reset()
        score = 0
        for t in range(max_t):
            action = agent.act(state)
            action = np.clip(action, -1, 1)            
            next_state, reward, done, _ = env.step(action)
            agent.step(state, action, reward, next_state, done, t)
            state = next_state
            score += reward
            if done:
                break 
                
        scores_deque.append(score)
        scores.append(score)
        print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_deque)), end="")
                
        if i_episode % print_every == 0:
            print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_deque)))
            
        if np.mean(scores_deque) >= 200.0:            
            torch.save(agent.actor_local.state_dict(), 'checkpoint_actor.pth')
            torch.save(agent.critic_local.state_dict(), 'checkpoint_critic.pth')
            print('\nEnvironment solved in {:d} episodes!\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_deque)))            
            break            
            
            
    return scores

scores = ddpg()

fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(np.arange(1, len(scores)+1), scores)
plt.ylabel('Score')
plt.xlabel('Episode #')
plt.show()

Episode 100	Average Score: -418.53
Episode 200	Average Score: -318.82
Episode 300	Average Score: -229.89
Episode 400	Average Score: -200.34
Episode 500	Average Score: -209.79
Episode 600	Average Score: -199.28
Episode 700	Average Score: -222.77
Episode 800	Average Score: -187.14
Episode 900	Average Score: -165.02
Episode 1000	Average Score: -180.48
Episode 1100	Average Score: -173.88
Episode 1200	Average Score: -160.03
Episode 1300	Average Score: -141.64
Episode 1400	Average Score: -119.29
Episode 1500	Average Score: -125.34
Episode 1600	Average Score: -120.48
Episode 1700	Average Score: -110.83
Episode 1800	Average Score: -169.50
Episode 1900	Average Score: -117.10
Episode 2000	Average Score: -78.46
Episode 2100	Average Score: -108.00
Episode 2200	Average Score: -73.59
Episode 2300	Average Score: -78.71
Episode 2400	Average Score: -110.43
Episode 2500	Average Score: -2.25
Episode 2600	Average Score: -71.20
Episode 2700	Average Score: -100.69
Episode 2800	Average Score: -22.77
Epi

NameError: name 'score_average' is not defined
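
The `Average Score` column in the log above is a mean over the most recent `print_every` episodes, which is why it lags behind individual episode scores. A small sketch of that windowed mean follows; the helper name `rolling_mean` is hypothetical and the scores are made up for illustration, not taken from this run.

```python
from collections import deque

def rolling_mean(scores, window=100):
    """Return the mean of the last `window` scores after each episode."""
    recent = deque(maxlen=window)  # oldest scores are dropped automatically
    means = []
    for s in scores:
        recent.append(s)
        means.append(sum(recent) / len(recent))
    return means

# Illustrative scores only, not results from this notebook.
example_scores = [-300.0, -100.0, 100.0, 300.0]
print(rolling_mean(example_scores, window=2))
```

Plotting such a series on top of the raw per-episode scores can make the training trend easier to read than the raw curve alone.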

### 4. Watch a Smart Agent!

In the next code cell, you will load the trained weights from file to watch a smart agent!

In [None]:
agent.actor_local.load_state_dict(torch.load('checkpoint_actor.pth'))
agent.critic_local.load_state_dict(torch.load('checkpoint_critic.pth'))

state = env.reset()
for t in range(200):
    action = agent.act(state, add_noise=False)
    env.render()
    state, reward, done, _ = env.step(action)
    if done:
        break 

env.close()
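
A single 200-step rollout gives only a rough impression of the policy. One way to get a steadier estimate is to average the undiscounted return over several episodes. The sketch below assumes the older Gym interface used in this notebook (`reset()` returns a state, `step()` returns a 4-tuple). `evaluate` and the stand-in `ToyEnv` are hypothetical names, introduced so the example runs without Gym installed.

```python
def evaluate(env, policy, n_episodes=5, max_t=1000):
    """Run `policy` for several episodes and return the list of episode returns."""
    returns = []
    for _ in range(n_episodes):
        state = env.reset()
        total = 0.0
        for t in range(max_t):
            state, reward, done, _info = env.step(policy(state))
            total += reward
            if done:
                break
        returns.append(total)
    return returns

class ToyEnv:
    """Stand-in environment: pays +1 per step and ends after 3 steps."""
    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        return float(self.t), 1.0, self.t >= 3, {}

episode_returns = evaluate(ToyEnv(), policy=lambda s: 0.0, n_episodes=2)
print(sum(episode_returns) / len(episode_returns))
```

With the objects from this notebook, the analogous call would be along the lines of `evaluate(env, lambda s: agent.act(s, add_noise=False))`, run before `env.close()`.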