# Reinforcement Learning with Keras interface: DQN

This notebook demonstrates how to do reinforcement learning with OpenMined and PySyft using the Keras interface. We apply a Deep Q-Network (DQN) to the `MountainCar-v0` environment from gym. The original code was written by Yash Patel and can be found [here](https://towardsdatascience.com/reinforcement-learning-w-keras-openai-dqns-1eed3a5338c).
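
As orientation, DQN trains the network toward a Bellman target: the observed reward plus the discounted best Q-value of the next state, or the reward alone when the episode has ended. The following is a minimal NumPy-only sketch of that target, not the PySyft API. The function name `bellman_target` and the sample Q-values are made up for illustration, and `gamma=0.85` mirrors the value used in this notebook's `DQN` class.

```python
import numpy as np

def bellman_target(q_current, q_next, action, reward, done, gamma=0.85):
    """Return a copy of q_current with the entry for `action` set to the TD target."""
    target = np.array(q_current, dtype=float, copy=True)
    if done:
        # Terminal transition: there is no future value to bootstrap from.
        target[0, action] = reward
    else:
        # Non-terminal: reward plus the discounted maximum Q-value of the next state.
        target[0, action] = reward + gamma * np.max(q_next)
    return target

# Illustrative Q-values for a 3-action environment, shape (1, 3).
q_current = np.array([[0.2, -0.1, 0.4]])
q_next = np.array([[1.0, 3.0, 2.0]])
print(bellman_target(q_current, q_next, action=1, reward=-1.0, done=False))
```

Only the entry for the action actually taken changes; the other entries keep the network's current predictions, so they contribute no error to the mean-squared loss.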

In [1]:
import syft



In [2]:
import gym
import numpy as np
import random

from syft.interfaces.keras.models import Sequential
from syft.interfaces.keras.layers import Dense, Dropout
from syft.interfaces.keras.optimizers import SGD

from collections import deque
from syft import FloatTensor

Using TensorFlow backend.
lol... Just Kidding... Using OpenMined Backend
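
The agent defined in the next cell explores with an epsilon-greedy policy: every call to `act` multiplies `epsilon` by `epsilon_decay` and clamps the result at `epsilon_min`. To get a feel for how quickly exploration fades, here is a small standalone sketch that counts the decay steps; the helper name `steps_until_floor` is hypothetical, and the default arguments copy the values from the `DQN` class.

```python
def steps_until_floor(epsilon=1.0, epsilon_min=0.01, decay=0.995):
    """Count decay steps until epsilon is clamped at epsilon_min.

    Assumes 0 < decay < 1; otherwise the loop would never terminate.
    """
    steps = 0
    while epsilon > epsilon_min:
        # Same update rule as in DQN.act: decay, then clamp at the floor.
        epsilon = max(epsilon_min, epsilon * decay)
        steps += 1
    return steps

n_steps = steps_until_floor()
print(n_steps)
```

With these values the floor is reached after 919 calls to `act`. Assuming each MountainCar-v0 episode runs to the usual 200-step limit, the agent is already at its minimum exploration rate during the fifth episode.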

In [3]:
class DQN:
    def __init__(self, env):
        self.env     = env
        self.memory  = deque(maxlen=2000)
        
        self.gamma = 0.85
        self.epsilon = 1.0
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.learning_rate = 0.005
        self.tau = .125

        self.model        = self.create_model()
        self.target_model = self.create_model()

    def create_model(self):
        model   = Sequential()
        state_shape  = self.env.observation_space.shape
        model.add(Dense(24, input_shape=state_shape[0], activation="relu"))
        model.add(Dense(48, activation="relu"))
        model.add(Dense(24, activation="relu"))
        model.add(Dense(self.env.action_space.n))
        model.compile(loss='mean_squared_error',
                      optimizer=SGD(lr=0.01), metrics =[])
        return model

    def act(self, state):
        self.epsilon *= self.epsilon_decay
        self.epsilon = max(self.epsilon_min, self.epsilon)
        if np.random.random() < self.epsilon:
            return self.env.action_space.sample()
        return np.argmax(self.model.predict(state))

    def remember(self, state, action, reward, new_state, done):
        self.memory.append([state, action, reward, new_state, done])

    def replay(self):
        batch_size = 32
        state_store = []
        target_store = []
        if len(self.memory) < batch_size: 
            return
        samples = random.sample(self.memory, batch_size)
        for sample in samples:
            state, action, reward, new_state, done = sample
            target = self.target_model.predict(state)
            if done:
                target[0][action] = reward
            else:
                # Bootstrap from the highest Q-value the target network predicts for new_state
                Q_future = np.max(self.target_model.predict(new_state))
                target[0][action] = reward + self.gamma * Q_future
       
            state_store.append(state)
            target_store.append(target)
            
        state_store = np.array(state_store).reshape(batch_size,self.env.observation_space.shape[0])
        target_store = np.array(target_store).reshape(batch_size,self.env.action_space.n)
        
        self.model.fit(state_store, target_store, batch_size=1,epochs=1,verbose=False,validation_data=None)


    def target_train(self):
        weights = self.model.get_weights()
        target_weights = self.target_model.get_weights()
        for i in range(len(target_weights)):
            # Soft update in place: target = (1 - tau) * target + tau * online
            target_weights[i] *= (1 - self.tau)
            target_weights[i] += weights[i] * self.tau

def main():
    env     = gym.make("MountainCar-v0")
    gamma   = 0.9
    epsilon = .95

    trials  = 1000
    trial_len = 500

    # updateTargetNetwork = 1000
    dqn_agent = DQN(env=env)
    steps = []
    for trial in range(trials):
        cur_state = env.reset().reshape(1,2)
        for step in range(trial_len):
            action = dqn_agent.act(cur_state)
            new_state, reward, done, _ = env.step(action)

            # reward = reward if not done else -20
            # MOD - reshape new_state to (1, 2) to match the model's input
            new_state = new_state.reshape(1,2)
            dqn_agent.remember(cur_state, action, reward, new_state, done)
            
            dqn_agent.replay()       # internally iterates default (prediction) model
            dqn_agent.target_train() # iterates target model

            cur_state = new_state
            if done:
                break
        if step >= 199:
            print("Failed to complete in trial {}".format(trial))
        else:
            print("Completed in {} trials".format(trial))
            break

if __name__ == "__main__":
    main()


[2018-01-24 18:34:35,794] Making new env: MountainCar-v0


Number of Batches:32
[... "Number of Batches:32" repeated; duplicate lines and truncated fragments omitted ...]
Failed to complete in trial 5
[...]
Failed to complete in trial 7
[...]
Failed to complete in trial 9
[...]
Failed to complete in trial 11
[...]

KeyboardInterrupt: 
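
The `tau = .125` setting in the `DQN` class is the mixing rate of a soft target update: after each training step, every target-network weight moves a fraction `tau` of the way toward the corresponding online-network weight. Below is a minimal NumPy sketch of that rule, assuming the weights are lists of arrays that support in-place arithmetic; `soft_update` is an illustrative name, not a PySyft or Keras function.

```python
import numpy as np

def soft_update(target_weights, online_weights, tau=0.125):
    """Blend each online weight array into the matching target array in place."""
    for target, online in zip(target_weights, online_weights):
        target *= (1.0 - tau)   # keep (1 - tau) of the old target weights
        target += tau * online  # add tau of the online weights
    return target_weights

# Toy weights: the target network starts at zero, the online network at one.
online = [np.ones((2, 2)), np.ones(2)]
target = [np.zeros((2, 2)), np.zeros(2)]
soft_update(target, online)
print(target[0])
```

Scaling the old target weights before adding the online contribution matters: zeroing them first would discard the target network's history and leave only `tau` times the online weights.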