<a href="https://colab.research.google.com/github/wissam124/iasd-deep-learning-go/blob/master/DeepLearningProject.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Deep Learning Project

This is the page for the Deep Learning Project of the IASD master's program. The goal is to train a network to play the game of Go. To keep training resources comparable, each submitted network must have fewer than 1,000,000 parameters. Teams may have at most two students. The training data comes from self-play games of Facebook's ELF OpenGo program. The training set contains more than 98,000,000 distinct states. The input consists of eight 19x19 planes (color to play, ladders, the current state on two planes, and the two previous states on four planes). The output targets are the policy (a vector of size 361 with 1.0 for the move played and 0.0 for all other moves), the value (1.0 if White won, 0.0 if Black won), and the state at the end of the game (two planes).
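
As a rough illustration of the shapes described above, the following sketch builds one dummy training example with NumPy. The variable names, the example move index, and the row-major mapping from a move index to board coordinates are assumptions made for illustration; they are not taken from the Golois library.

```python
import numpy as np

BOARD = 19
PLANES = 8
MOVES = BOARD * BOARD  # 361 intersections

# One position: eight 19x19 planes, stored channels-last.
x = np.zeros((BOARD, BOARD, PLANES), dtype="float32")

# Policy target: one-hot vector over the 361 intersections.
move_index = 72  # hypothetical move, chosen only for the example
policy = np.zeros(MOVES, dtype="float32")
policy[move_index] = 1.0

# Value target: 1.0 if White won, 0.0 if Black won.
value = np.float32(1.0)

# Assumed row-major mapping from move index to (row, column).
row, col = divmod(move_index, BOARD)
print(x.shape, policy.shape, value, (row, col))
```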

In [0]:
!wget https://www.lamsade.dauphine.fr/~cazenave/DeepLearningProject.zip

--2019-12-30 14:35:35--  https://www.lamsade.dauphine.fr/~cazenave/DeepLearningProject.zip
Resolving www.lamsade.dauphine.fr (www.lamsade.dauphine.fr)... 193.48.71.250
Connecting to www.lamsade.dauphine.fr (www.lamsade.dauphine.fr)|193.48.71.250|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 211774472 (202M) [application/zip]
Saving to: ‘DeepLearningProject.zip.2’


2019-12-30 14:35:56 (10.6 MB/s) - ‘DeepLearningProject.zip.2’ saved [211774472/211774472]



In [0]:
!unzip -j DeepLearningProject.zip
# Copy all files into root directory
# !cp -r DeepLearningProject/* .

Archive:  DeepLearningProject.zip
replace Board.h? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
  inflating: Board.h                 
  inflating: Game.h                  
  inflating: Rzone.h                 
  inflating: compileMAC.sh           
  inflating: compile.sh              
  inflating: ls.sh                   
  inflating: golois.cpp              
  inflating: games.data              
  inflating: golois.py               
  inflating: end.npy                 
  inflating: input_data.npy          
  inflating: policy.npy              
  inflating: value.npy               
  inflating: README                  


In [0]:
!ls -all

total 2789096
drwxr-xr-x 1 root root       4096 Dec 30 14:36 .
drwxr-xr-x 1 root root       4096 Dec 30 13:39 ..
-rw-r--r-- 1 root root     104265 Dec 10 12:17 Board.h
-rwxrwxr-x 1 root root        224 Dec  5 21:19 compileMAC.sh
-rwxr-xr-x 1 root root        156 Nov 23 12:23 compile.sh
drwxr-xr-x 1 root root       4096 Dec 18 16:52 .config
-rw-r--r-- 1 root root  211774472 Dec 10 13:51 DeepLearningProject.zip
-rw-r--r-- 1 root root  211774472 Dec 10 13:51 DeepLearningProject.zip.1
-rw-r--r-- 1 root root  211774472 Dec 10 13:51 DeepLearningProject.zip.2
drwx------ 4 root root       4096 Dec 30 13:52 drive
-rw-rw-r-- 1 root root  288800128 Dec  1 19:27 end.npy
-rw-r--r-- 1 root root       7900 Dec 10 13:36 Game.h
-rw-r--r-- 1 root root  631559220 Dec 10 12:28 games.data
-rw-r--r-- 1 root root       3104 Dec  1 18:18 golois.cpp
-rwxr-xr-x 1 root root     162520 Dec 30 13:51 golois.cpython-36m-x86_64-linux-gnu.so
-rw-r--r-- 1 root root       1915 Dec  1 19:19 golois.py
-rw-rw-r-- 1 root ro

In [0]:
!rm -r golois.py

In [0]:
!pip3 install pybind11



In [0]:
!./compile.sh

In file included from golois.cpp:17:0:
Board.h: In member function ‘bool Board::isCapturedLadder(int, int, Rzone*)’:
    int n1 = nbLiberties (inter, liberties1, stones1, 3);
        ^~
        int n1 = nbLiberties (inter, liberties1, stones1, 3);
            ^~
Board.h: In member function ‘void Board::computeLadders(int)’:
     int other = opponent (color);
         ^~~~~
Board.h: In member function ‘void Board::computeAllLadders(int, bool)’:
     int n1 = nbLiberties (i, liberties1, stones1);
         ^~
   int n1 = nbLiberties (i, liberties1, stones1);
       ^~
     int n1 = nbLiberties (i, liberties1, stones1);
         ^~
   int n1 = nbLiberties (i, liberties1, stone

In [0]:
%tensorflow_version 2.x
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Input, Dense, Conv2D, Flatten, BatchNormalization, Activation, LeakyReLU, add, SpatialDropout2D, ReLU, Softmax, MaxPool2D, Dropout
from tensorflow.keras.optimizers import SGD, Adam, Adadelta
from tensorflow.keras import regularizers
from tensorflow.keras.utils import plot_model
from tensorflow.keras.callbacks import ModelCheckpoint, CSVLogger, EarlyStopping, ReduceLROnPlateau

from matplotlib import pyplot as plt
import numpy as np
import golois

class GoModel():
    def __init__(self, regParam, learningRate, inputDim, outputDim):
        self.regParam = regParam
        self.learningRate = learningRate
        self.inputDim = inputDim
        self.outputDim = outputDim

    def predict(self, x):
        return self.model.predict(x)

    def fit(self, X, y, epochs, verbose, validation_split, batch_size):

        ValCheckpoint = ModelCheckpoint('best_val_loss.h5',
                                monitor='val_loss',
                                verbose=1,
                                save_best_only=True,
                                mode='auto')

        csv_logger = CSVLogger('training.log', separator=',', append=False)

        return self.model.fit(X,
                              y,
                              epochs=epochs,
                              verbose=verbose,
                              validation_split=validation_split,
                              batch_size=batch_size,
                              callbacks=[ValCheckpoint, csv_logger])

    def save_model(self):
        self.model.save('./model_' + 
                        str(self.regParam) + 'reg' +
                        '.h5')

    def summary(self):
        return self.model.summary()

    def plot_model(self):
        plot_model(self.model)

    def display_layers(self):
        pass


class NeuralNet(GoModel):
    def __init__(self, regParam, learningRate, inputDim, outputDim,
                 hiddenLayers, momentum):
        GoModel.__init__(self, regParam, learningRate, inputDim, outputDim)
        self.hidden_layers = hiddenLayers
        self.momentum = momentum
        self.num_layers = len(hiddenLayers)
        self.model = self.buildModel()

    def convLayer(self, x, numFilters, kernelSize):

        x = Conv2D(filters=numFilters,
                   kernel_size=kernelSize,
                   data_format='channels_last',
                   padding='same',
                   use_bias=False,
                   activation='linear',
                   kernel_regularizer=regularizers.l2(self.regParam))(x)

        x = SpatialDropout2D(rate=0.5,
                             data_format='channels_last')(x)

        x = BatchNormalization(axis=-1)(x)

        x = LeakyReLU(alpha=0.3)(x)

        return x

    def residualLayer(self, inputLayer, numFilters, kernelSize):

        x = self.convLayer(inputLayer, numFilters, kernelSize)

        x = Conv2D(filters=numFilters,
                   kernel_size=kernelSize,
                   data_format='channels_last',
                   padding='same',
                   use_bias=False,
                   activation='linear',
                   kernel_regularizer=regularizers.l2(self.regParam))(x)

        x = SpatialDropout2D(rate=0.5,
                             data_format='channels_last')(x)

        x = BatchNormalization(axis=-1)(x)

        x = add([inputLayer, x])

        x = LeakyReLU(alpha=0.3)(x)

        return x

    def value_head(self, x):

        x = Conv2D(filters=1,
                   kernel_size=(1, 1),
                   data_format='channels_last',
                   padding='same',
                   use_bias=False,
                   activation='linear',
                   kernel_regularizer=regularizers.l2(self.regParam))(x)
        
        x = BatchNormalization(axis=-1)(x)

        x = LeakyReLU(alpha=0.3)(x)

        x = Flatten()(x)

        x = Dense(40,
                  use_bias=False,
                  activation='linear',
                  kernel_regularizer=regularizers.l2(self.regParam))(x)

        x = LeakyReLU(alpha=0.3)(x)

        # x = Dropout(0.2)(x)

        x = Dense(1,
                  use_bias=False,
                  activation='sigmoid',
                  kernel_regularizer=regularizers.l2(self.regParam),
                  name='value')(x)

        return x

    def policy_head(self, x):

        x = Conv2D(filters=2,
                   kernel_size=(1, 1),
                   data_format='channels_last',
                   padding='same',
                   use_bias=False,
                   activation='linear',
                   kernel_regularizer=regularizers.l2(self.regParam))(x)

        x = BatchNormalization(axis=-1)(x)

        x = LeakyReLU(alpha=0.3)(x)

        x = Flatten()(x)

        x = Dense(self.outputDim, activation='softmax', name='policy')(x)

        return x

    def buildModel(self):

        mainInput = Input(shape=self.inputDim, name='board')

        x = self.convLayer(mainInput, self.hidden_layers[0]['numFilters'],
                           self.hidden_layers[0]['kernelSize'])

        if len(self.hidden_layers) > 1:
            for h in self.hidden_layers[1:]:
                x = self.residualLayer(x, h['numFilters'], h['kernelSize'])

        value_head = self.value_head(x)
        policy_head = self.policy_head(x)

        model = Model(inputs=[mainInput], outputs=[policy_head, value_head])

        # model.compile(optimizer=SGD(lr=self.learningRate, momentum=self.momentum),
        #               loss={
        #                   'value': 'mse',
        #                   'policy': 'categorical_crossentropy'
        #               },
        #               loss_weights={
        #                   'value': 100,
        #                   'policy':1
        #               },
        #               metrics=['accuracy'])

        model.compile(optimizer=SGD(learning_rate=self.learningRate, momentum=self.momentum),
                      loss={
                          'value': 'mse',
                          'policy': 'categorical_crossentropy'
                      },
                      loss_weights={
                          'value': 1,
                          'policy':1
                      },
                      metrics=['accuracy'])

        return model

In [0]:
tf.__version__

'1.15.0'

In [0]:
def generateData(N=10000, dynamicBatch=True):
    """Return (input_data, policy, value) arrays for N positions."""
    planes = 8
    moves = 361
    if dynamicBatch:
        # To test the network: allocate buffers, then let golois.getBatch
        # fill them in place with games generated by the Golois library.
        input_data = np.random.randint(2, size=(N, 19, 19, planes))
        input_data = input_data.astype('float32')

        policy = np.random.randint(moves, size=(N, ))
        # Pass num_classes so the one-hot width is always 361.
        policy = keras.utils.to_categorical(policy, num_classes=moves)

        value = np.random.randint(2, size=(N, ))
        value = value.astype('float32')

        end = np.random.randint(2, size=(N, 19, 19, 2))
        end = end.astype('float32')

        golois.getBatch(input_data, policy, value, end)
    else:
        # Load the pre-generated arrays shipped with the project archive.
        input_data = np.load('./input_data.npy')
        policy = np.load('./policy.npy')
        value = np.load('./value.npy')
        # end = np.load('./end.npy')

    return input_data, policy, value
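
The policy buffer must always be 361 columns wide, whatever labels happen to be drawn. The sketch below uses a small NumPy stand-in for a one-hot encoder (the helper `one_hot` is hypothetical, written only for this example) to show what can go wrong when the width is inferred from the largest label instead of being passed explicitly, which is how `to_categorical` behaves when `num_classes` is omitted.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """NumPy stand-in for a one-hot encoder (hypothetical helper)."""
    labels = np.asarray(labels)
    if num_classes is None:
        # Width inferred from the data: largest label + 1.
        num_classes = int(labels.max()) + 1
    return np.eye(num_classes, dtype="float32")[labels]

labels = np.array([3, 120, 42])  # no label reaches 360

inferred = one_hot(labels)                    # width depends on the labels
explicit = one_hot(labels, num_classes=361)   # width fixed to the board size

print(inferred.shape, explicit.shape)
```

With a large `N` the largest label is almost always 360, so the problem only shows up for small batches.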

In [0]:
input_data, policy, value = generateData(300000, True)
input_data.shape

(300000, 19, 19, 8)

In [0]:
# Parameters
BATCH_SIZE = 256
EPOCHS = 30
REG_CONST = 0.001
LEARNING_RATE = 0.001
MOMENTUM = 0.9

HIDDEN_CNN_LAYERS = [{
    'numFilters': 64,
    'kernelSize': (7, 7)
}, {
    'numFilters': 64,
    'kernelSize': (5, 5)
}, {
    'numFilters': 64,
    'kernelSize': (5, 5)
}, {
    'numFilters': 64,
    'kernelSize': (3, 3)
}, {
    'numFilters': 64,
    'kernelSize': (3, 3)
}, {
    'numFilters': 64,
    'kernelSize': (3, 3)
}]
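
Since the project caps networks at 1,000,000 parameters, it can help to estimate the size of a configuration before building it. The sketch below is a back-of-the-envelope count for the layer list above, assuming bias-free convolutions, two convolutions per residual layer, and a policy head that flattens two 19x19 planes into a dense layer with bias. It deliberately ignores batch normalization and the value head, so `summary()` remains the authoritative figure.

```python
def conv_params(kernel, c_in, c_out):
    """Number of weights in a bias-free 2D convolution."""
    kh, kw = kernel
    return kh * kw * c_in * c_out

kernels = [(7, 7), (5, 5), (5, 5), (3, 3), (3, 3), (3, 3)]
filters = 64
input_planes = 8

# First entry is a plain convolution on the 8 input planes.
trunk = conv_params(kernels[0], input_planes, filters)
# Every following entry is a residual layer with two convolutions.
for k in kernels[1:]:
    trunk += 2 * conv_params(k, filters, filters)

# Policy head: 2 planes of 19x19 flattened into a 361-way dense layer.
policy_dense = (19 * 19 * 2) * 361 + 361

estimate = trunk + policy_dense
print(trunk, policy_dense, estimate)
```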

In [0]:
nHiddenLayers = len(HIDDEN_CNN_LAYERS)
print(len(HIDDEN_CNN_LAYERS))

6


In [0]:
# Create Go Neural Network
moves = 361
GoNeuralNet = NeuralNet(REG_CONST, LEARNING_RATE,
                        (19, 19, 8), moves, HIDDEN_CNN_LAYERS,
                        MOMENTUM)

In [0]:
# Display summary of neural network
GoNeuralNet.summary()

In [0]:
# # Plot model
# GoNeuralNet.plot_model()
# from IPython.display import Image
# Image('model.png');

In [0]:
GoNeuralNet.fit(input_data,
                {'policy': policy, 'value': value},
                epochs=25,
                verbose=1,
                validation_split=0.1,
                batch_size=BATCH_SIZE)

In [0]:
import pandas as pd
plt.style.use('seaborn')
df = pd.read_csv('./training.log')
epochs = df['epoch']
plt.clf()
f, ax = plt.subplots(2, 3, figsize=(20,10))
ax[0][0].plot(epochs, df['loss'])
ax[0][0].plot(epochs, df['val_loss'])
ax[0][0].legend(['loss', 'val_loss'])
ax[0][0].set_title('Total loss')
ax[0][1].plot(epochs, df['policy_loss'])
ax[0][1].plot(epochs, df['val_policy_loss'])
ax[0][1].legend(['policy_loss', 'val_policy_loss'])
ax[0][1].set_title('Policy loss')
ax[0][2].plot(epochs, df['value_loss'])
ax[0][2].plot(epochs, df['val_value_loss'])
ax[0][2].legend(['value_loss', 'val_value_loss'])
ax[0][2].set_title('Value loss')
ax[1][1].plot(epochs, df['policy_acc'])
ax[1][1].plot(epochs, df['val_policy_acc'])
ax[1][1].legend(['policy_acc', 'val_policy_acc'])
ax[1][1].set_title('Policy accuracy')
ax[1][2].plot(epochs, df['value_acc'])
ax[1][2].plot(epochs, df['val_value_acc'])
ax[1][2].legend(['value_acc', 'val_value_acc'])
ax[1][2].set_title('Value accuracy')

In [0]:
# Save weights and training log
from google.colab import files
# files.download('best_train_loss.h5')
files.download('best_val_loss.h5')
# files.download('training.log')

In [0]:
# Mount drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [0]:
# Load pre-trained model from Drive
from tensorflow.keras.models import load_model
new_model = load_model('./drive/My Drive/DLGO/Dupin_Rynkiewicz_30_64.h5')

In [0]:
# Baseline original model from first submission
# new_model.summary()
new_model.evaluate(input_data, {'policy': policy, 'value': value})



[3.3898697851308186, 2.9809532, 0.21800779, 0.30939, 0.64327]
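
The five numbers returned by `evaluate` are easier to read with labels. The sketch below pairs them with names in the order suggested by the `training.log` columns plotted earlier (total loss, policy loss, value loss, policy accuracy, value accuracy). That ordering is an assumption based on the order of the model's outputs, not something the notebook states. The gap between the total and the two head losses is presumably the L2 regularization penalty.

```python
# Assumed label order: total loss, per-head losses, per-head accuracies.
names = ["loss", "policy_loss", "value_loss", "policy_acc", "value_acc"]
scores = [3.3898697851308186, 2.9809532, 0.21800779, 0.30939, 0.64327]

report = dict(zip(names, scores))

# Whatever is left after subtracting the two head losses from the total.
residual = report["loss"] - report["policy_loss"] - report["value_loss"]

for name, score in report.items():
    print(f"{name:12s} {score:.4f}")
print(f"{'residual':12s} {residual:.4f}")
```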

# Iterative Training

In [0]:
# Load pre-trained model from iterative training
from tensorflow.keras.models import load_model
new_model = load_model('best_val_loss_iter_train.h5')

In [0]:
# Set layers to be trained
my_layers = new_model.layers
for layer in new_model.layers:
    layer.trainable = True
print(new_model.layers[34].trainable)

True


In [0]:
new_model.evaluate(input_data, {'policy': policy, 'value': value})



[3.2712620943196615, 2.9633567, 0.21683921, 0.31052667, 0.64411336]

In [0]:
# Recompile model before retraining
# new_model.compile(optimizer=SGD(lr=0.001, momentum=0.9),
#                       loss={
#                           'value': 'mse',
#                           'policy': 'categorical_crossentropy'
#                       },
#                       loss_weights={
#                           'value': 1,
#                           'policy': 1
#                       },
#                       metrics=['accuracy'])

# Recompile model before retraining
new_model.compile(optimizer=SGD(learning_rate=0.01),
                      loss={
                          'value': 'mse',
                          'policy': 'categorical_crossentropy'
                      },
                      loss_weights={
                          'value': 1,
                          'policy': 1
                      },
                      metrics=['accuracy'])

In [0]:
ValCheckpoint = ModelCheckpoint('best_val_loss_iter_train.h5',
                        monitor='val_loss',
                        verbose=1,
                        save_best_only=True,
                        mode='auto')


# es = EarlyStopping(monitor='val_loss', 
#                    patience=7,
#                    verbose=1,
#                    mode='min', 
#                    restore_best_weights=True)


# reduce_lr = ReduceLROnPlateau(monitor='val_loss', 
#                               factor=0.5,
#                               patience=3,
#                               verbose=1,
#                               mode='min',  
#                               min_lr=0.00001)

for i in range(20):
    input_data_iter, policy_iter, value_iter = generateData(N=100000)


    new_model.fit(input_data_iter, 
                {'policy': policy_iter,
                'value': value_iter},
                epochs=3,
                verbose=1,
                validation_split=0.1,
                batch_size=256,
                callbacks=[ValCheckpoint])
    
    del input_data_iter
    del policy_iter
    del value_iter
    print('end of iteration :', i)
    # new_model.evaluate(input_data, {'policy': policy, 'value': value})

Train on 90000 samples, validate on 10000 samples
Epoch 1/3
Epoch 00001: val_loss improved from inf to 3.41184, saving model to best_val_loss_iter_train.h5
Epoch 2/3
Epoch 00002: val_loss improved from 3.41184 to 3.40709, saving model to best_val_loss_iter_train.h5
Epoch 3/3
Epoch 00003: val_loss did not improve from 3.40709
end of iteration : 0
Train on 90000 samples, validate on 10000 samples
Epoch 1/3
Epoch 00001: val_loss did not improve from 3.40709
Epoch 2/3
Epoch 00002: val_loss did not improve from 3.40709
Epoch 3/3
Epoch 00003: val_loss did not improve from 3.40709
end of iteration : 1
Train on 90000 samples, validate on 10000 samples
Epoch 1/3
Epoch 00001: val_loss did not improve from 3.40709
Epoch 2/3
Epoch 00002: val_loss did not improve from 3.40709
Epoch 3/3
Epoch 00003: val_loss did not improve from 3.40709
end of iteration : 2
Train on 90000 samples, validate on 10000 samples
Epoch 1/3
Epoch 00001: val_loss did not improve from 3.40709
Epoch 2/3
Epoch 00002: val_loss d

In [0]:
# Save weights re-trained model
from google.colab import files
files.download('best_val_loss_iter_train.h5')

In [0]:
# Load pre-trained model result from iterative training
from tensorflow.keras.models import load_model
new_model = load_model('best_val_loss_iter_train.h5')

OSError: ignored

In [0]:
new_model.evaluate(input_data, {'policy': policy, 'value': value})



KeyboardInterrupt: ignored

# Train last dense layer

In [0]:
# Load pre-trained model
from tensorflow.keras.models import load_model
new_model = load_model('best_val_loss_last_layer.h5')
# new_model = load_model('./drive/My Drive/DLGO/best_val_loss.h5')

In [0]:
# Set layers to be trained
my_layers = new_model.layers
for layer in new_model.layers:
    layer.trainable = False
new_model.layers[60].trainable = True

In [0]:
# Recompile model before retraining
# new_model.compile(optimizer=SGD(lr=0.001, momentum=0.9),
#                       loss={
#                           'value': 'mse',
#                           'policy': 'categorical_crossentropy'
#                       },
#                       loss_weights={
#                           'value': 1,
#                           'policy': 1
#                       },
#                       metrics=['accuracy'])

# Recompile model before retraining
new_model.compile(optimizer=SGD(learning_rate=0.01),
                      loss={
                          'value': 'mse',
                          'policy': 'categorical_crossentropy'
                      },
                      loss_weights={
                          'value': 1,
                          'policy': 1
                      },
                      metrics=['accuracy'])

In [0]:
ValCheckpoint = ModelCheckpoint('best_val_loss_last_layer.h5',
                        monitor='val_loss',
                        verbose=1,
                        save_best_only=True,
                        mode='auto')


# EarlyStopping is created inside the loop below so that its state
# is reset for each newly generated batch of data.


# reduce_lr = ReduceLROnPlateau(monitor='val_loss', 
#                               factor=0.5,
#                               patience=3,
#                               verbose=1,
#                               mode='min',  
#                               min_lr=0.00001)

for i in range(10):
    input_data_iter, policy_iter, value_iter = generateData(N=100000)

    es = EarlyStopping(monitor='val_loss', 
                    patience=6,
                    verbose=1,
                    mode='min', 
                    restore_best_weights=True)

    new_model.fit(input_data_iter, 
                {'policy': policy_iter,
                'value': value_iter},
                epochs=15,
                verbose=1,
                validation_split=0.1,
                batch_size=256,
                callbacks=[ValCheckpoint, es])
    
    del input_data_iter
    del policy_iter
    del value_iter
    print('end of iteration :', i)
    # new_model.evaluate(input_data, {'policy': policy, 'value': value})

In [0]:
# Save weights re-trained model
from google.colab import files
files.download('best_val_loss_last_layer.h5')

In [0]:
# Load pre-trained model
from tensorflow.keras.models import load_model
new_model = load_model('best_val_loss_last_layer.h5')
# new_model = load_model('./drive/My Drive/DLGO/best_val_loss.h5')

In [0]:
# Evaluate the re-trained model
new_model.evaluate(input_data, {'policy': policy, 'value': value})



[3.270696885668437, 2.9627993, 0.21683921, 0.31041333, 0.64411336]