___

<a href='http://www.pieriandata.com'><img src='../Pierian_Data_Logo.png'/></a>
___
<center><em>Copyright by Pierian Data Inc.</em></center>
<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>

# Keras RL DQN on Image Environment - Exercise - Solutions



In this notebook you will implement a DQN agent on the famous game of Pong:

**Pong-v0**
(https://gym.openai.com/envs/Pong-v0/) <br />

**TASK: Import the necessary libraries and create the environment. Also extract the number of possible actions.** <br />

In [None]:
%config Completer.use_jedi = False
# https://github.com/keras-rl/keras-rl/blob/master/examples/dqn_atari.py


from PIL import Image  # To transform the image in the Processor
import numpy as np
import gym
from gym.utils import play

# Convolutional Backbone Network
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Flatten, Convolution2D, Permute
from tensorflow.keras.optimizers import Adam

# Keras-RL
from rl.agents.dqn import DQNAgent
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy
from rl.memory import SequentialMemory
from rl.core import Processor
from rl.callbacks import FileLogger, ModelIntervalCheckpoint


In [None]:
env = gym.make("Pong-v0")
nb_actions = env.action_space.n


**TASK: Play the game manually (keys: a and d to move the bars)** <br />

In [None]:
play.play(env)

**TASK: Define an input size and the window length** <br />

In [None]:
IMG_SHAPE = (84, 84)
WINDOW_LENGTH = 4


**TASK: Create the ImageProcessor** <br />
It needs to:
1. Resize the image
2. Convert it to grayscale
3. Standardize it
4. Be memory efficient

Don't forget the reward clipping.

In [None]:
class ImageProcessor(Processor):
    def process_observation(self, observation):
        # First convert the numpy array to a PIL Image
        img = Image.fromarray(observation)
        # Then resize the image
        img = img.resize(IMG_SHAPE)
        # And convert it to grayscale (the "L" stands for luminance)
        img = img.convert("L")
        # Convert the image back to a numpy array and finally return the image
        img = np.array(img)
        return img.astype('uint8')  # saves storage in experience memory

    def process_state_batch(self, batch):
        # We divide the observations by 255 to scale them into the interval [0, 1].
        # This supports the training of the network.
        # We perform this operation here (per batch) instead of in
        # process_observation so that the replay memory can keep storing uint8 frames.
        processed_batch = batch.astype('float32') / 255.
        return processed_batch

    def process_reward(self, reward):
        # Reward clipping: restrict every reward to the range [-1, 1]
        return np.clip(reward, -1., 1.)
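To check the preprocessing steps without starting the emulator, the same operations can be run on a synthetic frame. The sketch below is only an illustration: the helper names `preprocess_frame`, `scale_batch` and `clip_reward` are hypothetical, and the random `210 x 160 x 3` array merely stands in for a raw Atari RGB frame.

```python
import numpy as np
from PIL import Image

IMG_SHAPE = (84, 84)
WINDOW_LENGTH = 4


def preprocess_frame(frame):
    # Hypothetical helper: resize, convert to grayscale, store as uint8
    img = Image.fromarray(frame).resize(IMG_SHAPE).convert("L")
    return np.array(img).astype("uint8")


def scale_batch(batch):
    # Hypothetical helper: scale uint8 pixel values into [0, 1] as float32
    return batch.astype("float32") / 255.0


def clip_reward(reward):
    # Hypothetical helper: restrict the reward to [-1, 1]
    return float(np.clip(reward, -1.0, 1.0))


# Random stand-in for a raw RGB frame (an assumption, not real game output)
rng = np.random.default_rng(0)
fake_frame = rng.integers(0, 256, size=(210, 160, 3), dtype=np.uint8)

small = preprocess_frame(fake_frame)
# Stack WINDOW_LENGTH frames and add a batch axis, as the network input expects
batch = scale_batch(np.stack([small] * WINDOW_LENGTH)[np.newaxis])

print(small.shape, small.dtype)
print(batch.shape, batch.dtype)
print(clip_reward(5.0), clip_reward(-3.0), clip_reward(0.0))
```

The stored frame stays `uint8`, and only the batch that is handed to the network is converted to `float32`.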



**TASK: Design the Convolutional Neural Network** <br />
Hint: Make sure to get the right input shape!

You can try the same architecture as presented in the previous notebook:
1. Conv2D(filters=32, kernel_size=8, stride=4)
2. Conv2D(filters=64, kernel_size=4, stride=2)
3. Conv2D(filters=64, kernel_size=3, stride=1)
4. Dense(512)

Don't forget the activation function.

In [None]:
input_shape = (WINDOW_LENGTH, IMG_SHAPE[0], IMG_SHAPE[1])

model = Sequential()
model.add(Permute((2, 3, 1), input_shape=input_shape))

model.add(Convolution2D(32, (8, 8), strides=(4, 4), kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(Convolution2D(64, (4, 4), strides=(2, 2), kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(Convolution2D(64, (3, 3), strides=(1, 1), kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dense(nb_actions))
model.add(Activation('linear'))
model.summary()  # summary() prints the table itself and returns None
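The shapes reported by the model summary can be cross-checked by hand. The following sketch assumes the Keras default of `padding='valid'` (no padding), for which a convolution maps a side length `n` to `(n - kernel) // stride + 1`. The helper `conv_out` is a hypothetical name used only for this illustration.

```python
def conv_out(size, kernel, stride):
    # Output side length of a convolution without padding
    return (size - kernel) // stride + 1


# (kernel_size, stride, filters) of the three convolutional layers
conv_layers = [(8, 4, 32), (4, 2, 64), (3, 1, 64)]

size = 84  # height and width of the preprocessed frame
sizes = []
for kernel, stride, filters in conv_layers:
    size = conv_out(size, kernel, stride)
    sizes.append(size)
    print(f"Conv {kernel}x{kernel}, stride {stride}: {size} x {size} x {filters}")

# Flatten() turns the last feature map into a single vector
flat_units = size * size * conv_layers[-1][2]
print("Units after Flatten:", flat_units)
```

The side length shrinks from 84 to 20, then 9, then 7, so `Flatten` hands 7 * 7 * 64 = 3136 values to the `Dense(512)` layer.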


**TASK: Create the Replay Memory** <br />


In [None]:
memory = SequentialMemory(limit=1000000, window_length=WINDOW_LENGTH)
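A rough estimate shows why the processor stores frames as `uint8`. The sketch below counts only the raw pixel data for one frame per memory slot and ignores any bookkeeping overhead of the memory object itself, so treat the numbers as a lower bound. `frame_bytes` is a hypothetical helper.

```python
IMG_SHAPE = (84, 84)
MEMORY_LIMIT = 1_000_000  # same value as the `limit` argument above


def frame_bytes(shape, bytes_per_pixel):
    # Raw size of one grayscale frame in bytes
    height, width = shape
    return height * width * bytes_per_pixel


uint8_total = MEMORY_LIMIT * frame_bytes(IMG_SHAPE, 1)    # 1 byte per pixel
float32_total = MEMORY_LIMIT * frame_bytes(IMG_SHAPE, 4)  # 4 bytes per pixel

print(f"uint8:   {uint8_total / 1e9:.2f} GB")
print(f"float32: {float32_total / 1e9:.2f} GB")
```

With a full memory, the pixel data alone takes about 7 GB as `uint8` and four times as much as `float32`. If your machine has less RAM, a smaller `limit` is a reasonable choice.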


**TASK: Create the processor** <br />


In [None]:
processor = ImageProcessor()


**TASK: Define the action selection policy.** <br />
Feel free to try any policy you like. (Hint: a decaying epsilon-greedy policy also works here.)

In [None]:
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1., value_min=.1, value_test=.05,
                              nb_steps=1000000)
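To get a feeling for the schedule, the value of epsilon can be computed by hand. The sketch below assumes that the policy decreases epsilon linearly from `value_max` to `value_min` over `nb_steps` and keeps it at `value_min` afterwards. `linear_annealed_eps` is a hypothetical helper, not part of Keras-RL.

```python
def linear_annealed_eps(step, value_max=1.0, value_min=0.1, nb_steps=1_000_000):
    # Linear decay from value_max to value_min, then constant at value_min
    slope = -(value_max - value_min) / nb_steps
    return max(value_min, slope * step + value_max)


for step in (0, 250_000, 500_000, 1_000_000, 1_500_000):
    print(f"step {step:>9,}: eps = {linear_annealed_eps(step):.3f}")
```

Under this assumption the agent still acts randomly about half of the time after 500,000 steps, which is one reason why the training run below is so long.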


**TASK: Create the agent.** <br />
Don't forget to compile!

In [None]:
dqn = DQNAgent(model=model, nb_actions=nb_actions, policy=policy, memory=memory,
               processor=processor, nb_steps_warmup=50000, gamma=.99, target_model_update=10000,
               train_interval=4, delta_clip=1)

In [None]:
# `learning_rate` replaces the deprecated `lr` argument in TensorFlow 2.x
dqn.compile(Adam(learning_rate=.00025), metrics=['mae'])
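The `delta_clip=1` argument limits how strongly a single large error can influence an update. A common way to achieve this is the Huber loss, which is quadratic for small errors and linear for large ones. The sketch below shows the standard Huber form for a single TD error. It is an illustration of the idea under that assumption, not the library's code, and `huber_loss` is a hypothetical helper.

```python
def huber_loss(td_error, delta_clip=1.0):
    # Quadratic inside [-delta_clip, delta_clip], linear outside
    abs_error = abs(td_error)
    if abs_error < delta_clip:
        return 0.5 * td_error ** 2
    return delta_clip * (abs_error - 0.5 * delta_clip)


for error in (0.5, 1.0, 3.0, -3.0):
    squared = 0.5 * error ** 2
    print(f"error {error:+.1f}: huber = {huber_loss(error):.3f}, squared = {squared:.3f}")
```

For an error of 3 the squared loss is 4.5 while the Huber loss is only 2.5, so outliers pull less on the weights.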


**TASK: Define a checkpoint callback to store the weights during training.** <br />
Please give it a different name from our provided checkpoint to avoid overwriting it.

In [None]:
weights_filename = 'dqn_pong_weights_student.h5f'
checkpoint_weights_filename = 'dqn_pong_weights_student_{step}.h5f'
checkpoint_callback = ModelIntervalCheckpoint(checkpoint_weights_filename, interval=100000)


**TASK: Train the agent.** <br />

In [None]:
dqn.fit(env, nb_steps=1500000, callbacks=[checkpoint_callback], log_interval=10000, visualize=False)

# After training is done, we save the final weights one more time.
dqn.save_weights(weights_filename, overwrite=True)



**TASK: Evaluate the agent.** <br />

In [None]:
dqn.test(env, nb_episodes=5, visualize=True)

**TASK: Load your weights (or the provided ones) and create an agent from those** <br />

In [None]:
# Load the weights
model.load_weights("weights_exercise/dqn_PONG_weights_1500000.h5f")


memory = SequentialMemory(limit=1000000, window_length=WINDOW_LENGTH)
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1, value_min=.1, value_test=.05,
                              nb_steps=100000)

processor = ImageProcessor()

# Initialize the DQNAgent with the new model and updated policy and compile it
dqn = DQNAgent(model=model, nb_actions=nb_actions, policy=policy, memory=memory,
               processor=processor, nb_steps_warmup=50000, gamma=.99, target_model_update=10000)
dqn.compile(Adam(learning_rate=.00025), metrics=['mae'])


In [None]:
dqn.test(env, nb_episodes=5, visualize=True)