___

<a href='http://www.pieriandata.com'><img src='../COURSE_NOTEBOOKS/Pierian_Data_Logo.png'/></a>
___
<center><em>Copyright by Pierian Data Inc.</em></center>
<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>

# Keras-RL DQN Exercise


In this exercise you are going to build your first keras-rl agent for the **Acrobot** environment (https://gym.openai.com/envs/Acrobot-v1/) <br />
The goal in this environment is to maneuver the tip of the robot arm upwards above the line in as few steps as possible.
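To make "above the line" concrete, here is a small sketch of the goal check. It assumes the conventions of Gym's Acrobot source: `theta1` is the angle of the first link measured from hanging straight down, `theta2` is the angle of the second link relative to the first, both links have length 1, and the line sits one link length above the pivot. The function name is hypothetical. Every step before the goal is reached costs a reward of -1, so fewer steps means a higher episode reward.

```python
import math


def acrobot_goal_reached(theta1: float, theta2: float) -> bool:
    """Return True when the tip of the second link is above the target line.

    Assumes Gym's Acrobot conventions (hypothetical helper, not part of Gym):
    theta1 is measured from hanging straight down, theta2 is relative to the
    first link, and both links have length 1.
    """
    # Height of the tip relative to the pivot (positive means above the pivot).
    tip_height = -math.cos(theta1) - math.cos(theta1 + theta2)
    # The line sits one link length above the pivot.
    return tip_height > 1.0


print(acrobot_goal_reached(0.0, 0.0))      # hanging straight down -> False
print(acrobot_goal_reached(math.pi, 0.0))  # both links pointing up -> True
```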

**TASK: Import necessary libraries** <br />

In [12]:
import gym

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense, Flatten
from tensorflow.keras.optimizers import Adam

from rl.agents.dqn import DQNAgent

from rl.memory import SequentialMemory
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy

**TASK: Create the environment** <br />
The name is: *Acrobot-v1*

In [13]:
env_name = "Acrobot-v1"
env = gym.make(env_name)

In [14]:
num_actions = env.action_space.n
num_observations = env.observation_space.shape
print(f"Action Space: {num_actions}")
print(f"Observation Space: {num_observations}")

assert num_actions == 3 and num_observations == (6,), "Wrong environment!"

Action Space: 3
Observation Space: (6,)
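The six numbers in the observation are not raw angles. Assuming the layout used by Gym's Acrobot-v1, `[cos(theta1), sin(theta1), cos(theta2), sin(theta2), theta1_dot, theta2_dot]`, the hypothetical helper below recovers the two joint angles with `atan2`. The three discrete actions correspond to applying a torque of -1, 0 or +1 to the joint between the two links.

```python
import math


def decode_acrobot_observation(obs):
    """Split a 6-element Acrobot observation into angles and angular velocities.

    Assumes the Gym Acrobot-v1 layout (hypothetical helper):
    [cos(theta1), sin(theta1), cos(theta2), sin(theta2), theta1_dot, theta2_dot]
    """
    cos1, sin1, cos2, sin2, theta1_dot, theta2_dot = obs
    # atan2 recovers each angle (in radians) from its sine and cosine.
    theta1 = math.atan2(sin1, cos1)
    theta2 = math.atan2(sin2, cos2)
    return theta1, theta2, theta1_dot, theta2_dot


# Made-up observation: first link horizontal, second link straight, at rest.
sample_obs = [math.cos(math.pi / 2), math.sin(math.pi / 2), 1.0, 0.0, 0.0, 0.0]
print(decode_acrobot_observation(sample_obs))
```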


**TASK: Create the Neural Network for your Deep-Q-Agent** <br />
Take a look at the sizes of the action space and the observation space.
You are free to choose any architecture you want! <br />
Hint: Three hidden layers with 64 neurons each already work.

In [15]:
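model = Sequential()

# Input shape is (window_length, observation size) = (1, 6); Flatten turns it into a vector of 6
model.add(Flatten(input_shape=((1,) + num_observations)))

# Three hidden layers with 64 ReLU units each
model.add(Dense(64))
model.add(Activation('relu'))

model.add(Dense(64))
model.add(Activation('relu'))

model.add(Dense(64))
model.add(Activation('relu'))

# One output per action (its estimated Q-value); linear because Q-values are unbounded
model.add(Dense(num_actions))
model.add(Activation('linear'))

# summary() prints the table itself and returns None, so no print() is needed
model.summary()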

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_1 (Flatten)          (None, 6)                 0         
_________________________________________________________________
dense_4 (Dense)              (None, 64)                448       
_________________________________________________________________
activation_4 (Activation)    (None, 64)                0         
_________________________________________________________________
dense_5 (Dense)              (None, 64)                4160      
_________________________________________________________________
activation_5 (Activation)    (None, 64)                0         
_________________________________________________________________
dense_6 (Dense)              (None, 64)                4160      
_________________________________________________________________
activation_6 (Activation)    (None, 64)               
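The parameter counts in the summary follow from the Dense layer formula: one weight per input/output pair plus one bias per output, while Flatten and Activation layers have no trainable parameters. The sketch below reproduces the numbers for the 6 → 64 → 64 → 64 → 3 architecture, including the output layer that is cut off in the printed summary. The variable names are chosen so they do not overwrite the notebook's `num_observations` and `num_actions`.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Trainable parameters of a Dense layer: weights plus one bias per output unit."""
    return n_in * n_out + n_out


obs_size = 6        # Acrobot observation size
n_actions = 3       # Acrobot action count
hidden = [64, 64, 64]

layer_sizes = [obs_size] + hidden + [n_actions]
params = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
print(params)       # → [448, 4160, 4160, 195]
print(sum(params))  # → 8963
```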

**TASK: Initialize the circular buffer**<br />
Make sure you set the limit appropriately (a limit of 50,000 works well).

In [16]:
memory = SequentialMemory(limit=50000, window_length=1)
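`SequentialMemory` keeps the most recent transitions up to `limit`, and the agent trains on random minibatches drawn from it, which breaks up the correlation between consecutive steps. The class below is a minimal stand-in built on `collections.deque(maxlen=...)` to show the circular behaviour: once the limit is reached, the oldest transition is dropped. It is an illustration only, not keras-rl's implementation, and the class name is hypothetical. With `window_length=1`, as in the cell above, each sample uses a single observation, which is why the network's input shape starts with 1.

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal circular replay buffer (illustrative, not keras-rl's SequentialMemory)."""

    def __init__(self, limit: int):
        # A deque with maxlen silently drops the oldest item once it is full,
        # which gives the circular-buffer behaviour.
        self.buffer = deque(maxlen=limit)

    def append(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


buf = ReplayBuffer(limit=3)
for t in range(5):
    buf.append(state=t, action=0, reward=-1.0, next_state=t + 1, done=False)

print(len(buf))                      # → 3
print([tr[0] for tr in buf.buffer])  # → [2, 3, 4]
print(len(buf.sample(2)))            # → 2
```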

**TASK: Use the epsilon greedy action selection strategy with *decaying* epsilon**

In [17]:
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(),
                              attr='eps',
                              value_max=1.0,    # start with fully random actions
                              value_min=0.1,    # floor reached after nb_steps
                              value_test=0.05,  # epsilon used by dqn.test()
                              nb_steps=150000)
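The schedule can be written out explicitly. The sketch assumes the same linear formula that keras-rl's `LinearAnnealedPolicy` applies during training (epsilon falls from `value_max` to `value_min` over `nb_steps` agent steps and then stays at `value_min`; during `dqn.test()` it uses `value_test` instead), together with a plain epsilon-greedy choice over a list of Q-values. Both function names are hypothetical.

```python
import random


def linear_annealed_eps(step, value_max=1.0, value_min=0.1, nb_steps=150_000):
    """Linearly decay epsilon from value_max to value_min over nb_steps, then hold it."""
    slope = -(value_max - value_min) / nb_steps
    return max(value_min, value_max + slope * step)


def eps_greedy(q_values, eps):
    """With probability eps pick a random action, otherwise the one with the highest Q-value."""
    if random.random() < eps:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


for step in (0, 30_000, 75_000, 150_000, 200_000):
    print(step, round(linear_annealed_eps(step), 4))

print(eps_greedy([0.1, 0.5, -0.2], eps=0.0))  # greedy: picks action 1
```

With `nb_steps=150000`, epsilon is still about 0.82 after 30,000 steps, so a run that stops there spends most of its time acting randomly.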

**TASK: Create the DQNAgent** <br />
Feel free to play with the nb_steps_warmup, target_model_update, batch_size and gamma parameters. <br />
Hint:<br />
You can try *nb_steps_warmup*=1000, *target_model_update*=1000, *batch_size*=32 and *gamma*=0.99 as a first guess.

In [18]:
dqn = DQNAgent(model=model, memory=memory, policy=policy, nb_actions=num_actions,
                nb_steps_warmup=1000, target_model_update=1000, batch_size=32, gamma=0.99)
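Each training step regresses the online network toward Bellman targets computed with a separate target network. The NumPy sketch below computes those targets for a batch, assuming the standard DQN form y = r + gamma * max_a' Q_target(s', a') with no bootstrapping on terminal transitions; the function name and sample values are made up. In keras-rl, a `target_model_update` of 1 or more means the target network's weights are copied from the online network every that many steps (a value below 1 is treated as a soft-update rate), and no training happens during the first `nb_steps_warmup` steps, which is why the training log shows `loss: --` for the first 1,000 steps.

```python
import numpy as np


def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """Bellman targets y = r + gamma * max_a' Q_target(s', a'), with no bootstrap on terminal steps."""
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    # Best next-state Q-value according to the target network, per transition.
    max_next_q = np.max(np.asarray(next_q_target, dtype=np.float64), axis=1)
    return rewards + gamma * (1.0 - dones) * max_next_q


batch_rewards = [-1.0, -1.0, 0.0]
batch_next_q = [[-5.0, -4.0, -6.0],  # Q_target(s', a') for the 3 actions
                [-2.0, -3.0, -2.5],
                [-1.0, -1.0, -1.0]]
batch_dones = [False, False, True]

targets = dqn_targets(batch_rewards, batch_next_q, batch_dones)
print(targets)
```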

**TASK: Compile the model** <br />
Feel free to explore the effects of different optimizers and learning rates.
You can try Adam with a learning rate of 1e-3 as a first guess.

In [19]:
dqn.compile(Adam(learning_rate=1e-3), metrics=['mae'])



**TASK: Fit the model** <br />
150,000 steps should be a very good starting point

In [20]:
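# With the 150,000-step epsilon schedule defined above, stopping at 30,000 steps
# leaves epsilon at about 0.82; the suggested 150,000 steps lets it reach 0.1.
dqn.fit(env, visualize=False, nb_steps=30000, verbose=2)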

Training for 30000 steps ...




   500/30000: episode: 1, duration: 0.322s, episode steps: 500, steps per second: 1552, episode reward: -500.000, mean reward: -1.000 [-1.000, -1.000], mean action: 1.040 [0.000, 2.000],  loss: --, mae: --, mean_q: --, mean_eps: --
  1000/30000: episode: 2, duration: 0.260s, episode steps: 500, steps per second: 1922, episode reward: -500.000, mean reward: -1.000 [-1.000, -1.000], mean action: 1.012 [0.000, 2.000],  loss: --, mae: --, mean_q: --, mean_eps: --




  1500/30000: episode: 3, duration: 2.364s, episode steps: 500, steps per second: 212, episode reward: -500.000, mean reward: -1.000 [-1.000, -1.000], mean action: 1.080 [0.000, 2.000],  loss: 0.008666, mae: 0.632068, mean_q: -0.886856, mean_eps: 0.943750
  2000/30000: episode: 4, duration: 2.096s, episode steps: 500, steps per second: 239, episode reward: -500.000, mean reward: -1.000 [-1.000, -1.000], mean action: 1.046 [0.000, 2.000],  loss: 0.000223, mae: 0.630398, mean_q: -0.918326, mean_eps: 0.921273
  2500/30000: episode: 5, duration: 1.982s, episode steps: 500, steps per second: 252, episode reward: -500.000, mean reward: -1.000 [-1.000, -1.000], mean action: 1.074 [0.000, 2.000],  loss: 0.007087, mae: 1.414952, mean_q: -2.069570, mean_eps: 0.898773
  3000/30000: episode: 6, duration: 2.016s, episode steps: 500, steps per second: 248, episode reward: -500.000, mean reward: -1.000 [-1.000, -1.000], mean action: 1.008 [0.000, 2.000],  loss: 0.001163, mae: 1.408325, mean_q: -2.078

<keras.callbacks.History at 0x23c9639aeb0>

In [21]:
dqn.save_weights('my_weights_acrobat', overwrite=True)



**TASK: Evaluate the model**

In [22]:
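env = gym.make(env_name)
# During testing the policy uses value_test (epsilon = 0.05)
dqn.test(env, visualize=True, nb_episodes=5)
env.close()  # release the environment and its render window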



Testing for 5 episodes ...
Episode 1: reward: -500.000, steps: 500


KeyboardInterrupt: 
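A reward of -500 over 500 steps, as in Episode 1 above, means the episode hit the step limit without reaching the goal. The sketch below summarizes a set of test rewards on that basis. It assumes a reward of -1 per step and a 500-step cap for Acrobot-v1; the function name and rewards are hypothetical, and with keras-rl the rewards would come from `history.history['episode_reward']`, assuming `history = dqn.test(...)` returns a Keras `History` object as `fit` does.

```python
def summarize_test(episode_rewards, max_steps=500):
    """Return (mean reward, episodes that reached the goal, total episodes).

    Assumes reward -1 per step and a max_steps cap, so an episode scoring
    exactly -max_steps timed out without reaching the goal.
    """
    n = len(episode_rewards)
    mean_reward = sum(episode_rewards) / n
    solved = sum(1 for r in episode_rewards if r > -max_steps)
    return mean_reward, solved, n


# Made-up rewards for three test episodes.
mean_reward, solved, n = summarize_test([-500.0, -120.0, -95.0])
print(f"mean reward {mean_reward:.1f}, solved {solved}/{n}")  # → mean reward -238.3, solved 2/3
```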
