# Deep Learning

Moving on to Deep Learning, a subfield of Machine Learning: it can handle larger datasets more easily and, in many cases, with higher accuracy.

Because the code developed earlier is long and slow to run, we first define the essential code blocks that must be executed before the Deep Learning analysis, so that the analysis can be restarted *from scratch* from this point.

In [1]:
import numpy as np
import pandas as pd

## Multilayer Perceptron

In [2]:
X_train_sc = pd.read_csv("Files/x_train_sc.csv", index_col=0)

In [3]:
y_train = pd.read_csv("Files/y_train.csv", index_col=0)
y_train = pd.Series(y_train["tm"])

Having identified the 200 most important features (in the "SL.ipynb" notebook), which allowed reasonably accurate predictions, we will reuse them (selected with the Mutual Information method) to continue the analysis with Deep Learning.

In [4]:
# get the k best scores between features and label -> pearson, spearman, f_regression and mutual_info_regression
def get_k_best_corrs(k, scores):
    idxs = np.argsort(scores)[-k:]
    feats = X_train_sc.columns[idxs]
    scores = np.sort(scores)[-k:]
    return {f: c for f, c in zip(feats, scores)}
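The selection step above can be checked on a small example. This is a minimal sketch of the same top-k logic; the feature names and scores are hypothetical and chosen only for illustration, and the helper `top_k_by_score` is not part of the notebook.

```python
import numpy as np

def top_k_by_score(names, scores, k):
    """Return {name: score} for the k highest scores, in ascending score order."""
    scores = np.asarray(scores, dtype=float)
    idxs = np.argsort(scores)[-k:]  # indices of the k largest scores
    return {names[i]: float(scores[i]) for i in idxs}

# Hypothetical feature names and scores, for illustration only.
toy_names = ["f0", "f1", "f2", "f3", "f4"]
toy_scores = [0.10, 0.50, 0.05, 0.30, 0.20]

top3 = top_k_by_score(toy_names, toy_scores, k=3)
print(top3)
```

Reading each score through the same index used for its name keeps every name paired with its own score; the result is ordered from the lowest to the highest of the selected scores.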

In [6]:
mutual_info = [float(elem.strip()) for elem in open("Files/mutual_info.txt").readlines()]

In [7]:
feature_names = get_k_best_corrs(200, mutual_info).keys()

In [8]:
x_train_mi = X_train_sc.loc[:, feature_names]

In [9]:
x_train_mi.describe()

,AA,DG,LR,_SecondaryStrC3,AI,_SolventAccessibilityT13,GS,VA,ST,GL,...,_SecondaryStrD3001,MolecularWeight,_PolarizabilityD1001,_PolarityD2001,_HydrophobicityD2001,_NormalizedVDWVD1001,S,N,I,Q
count,28403.0,28403.0,28403.0,28403.0,28403.0,28403.0,28403.0,28403.0,28403.0,28403.0,...,28403.0,28403.0,28403.0,28403.0,28403.0,28403.0,28403.0,28403.0,28403.0,28403.0
mean,0.077555,0.0973,0.11578,0.284679,0.096641,0.232627,0.018175,0.107986,0.059385,0.112944,...,0.017738,0.013734,0.021564,0.038799,0.040168,0.036351,0.187294,0.134017,0.274215,0.167674
std,0.073396,0.102072,0.116764,0.062164,0.096481,0.056298,0.019162,0.09655,0.062607,0.102629,...,0.021898,0.020303,0.026867,0.055728,0.055127,0.042654,0.070865,0.057128,0.10822,0.077459
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.02333,0.0,0.032,0.24679,0.0,0.196328,0.0,0.038,0.0,0.036275,...,0.005061,0.005909,0.006335,0.010179,0.011252,0.011002,0.135745,0.096186,0.2063,0.121437
50%,0.062566,0.078431,0.09,0.278174,0.076923,0.231638,0.0156,0.092,0.05124,0.098039,...,0.010732,0.010132,0.012451,0.020096,0.021828,0.021478,0.181189,0.129072,0.2687,0.157973
75%,0.113468,0.151961,0.16,0.315264,0.140659,0.269774,0.0264,0.15,0.095868,0.170588,...,0.021564,0.016075,0.027561,0.044338,0.046582,0.046157,0.230302,0.165098,0.33535,0.206071
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [10]:
y_train

0        48.4
1        48.4
2        49.0
3        55.6
4        48.4
         ... 
28691    51.8
28692    37.2
28693    64.6
28694    50.7
28695    37.6
Name: tm, Length: 28403, dtype: float64

We will build a neural network made of dense (fully connected) layers, the simplest and most widely used type, to predict thermostability values. Networks of this type are also known as "Multilayer Perceptrons" (MLPs).
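To make the idea of stacked dense layers concrete, here is a minimal NumPy sketch of an MLP forward pass. The layer sizes, the random weights, and the linear output layer (a common choice for regression) are assumptions made for illustration; they are not taken from the models trained in this notebook.

```python
import numpy as np

def relu(z):
    """Rectified linear unit, applied element-wise."""
    return np.maximum(z, 0.0)

def mlp_forward(x, weights, biases):
    """Forward pass: ReLU on every hidden layer, linear output layer."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    return a @ weights[-1] + biases[-1]

rng = np.random.default_rng(0)
sizes = [200, 20, 50, 1]  # illustrative: 200 inputs, two hidden layers, 1 output
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

x = rng.random((4, 200))  # 4 hypothetical samples with 200 features each
y_hat = mlp_forward(x, weights, biases)
print(y_hat.shape)  # (4, 1)
```

Each dense layer is a matrix product plus a bias followed by a non-linearity; training adjusts `weights` and `biases` to minimize the prediction error.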

In [33]:
#!pip install scikeras

from keras.models import Sequential
from keras.layers import Dropout, Dense
from keras.constraints import MaxNorm
from keras.optimizers import SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, Nadam
import tensorflow as tf
from sklearn.model_selection import RandomizedSearchCV
from scikeras.wrappers import KerasRegressor

import itertools

Because this type of network (like others) has several hyperparameters that can be tuned to obtain more accurate predictions, we define a function, *create_model()*, that builds a network with a variable number of layers, each with a variable number of nodes (plus other parameters discussed below). A dropout layer is added after each hidden dense layer as regularization against overfitting, and the network is finally compiled with one of the most widely used optimizers, selected through the helper function *_choose_optimizer()*.

The following code blocks were adapted from the code presented at: https://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/

In [45]:
#GRID SEARCH FUNC

def _choose_optimizer(optimizer, learning_rate, momentum):
    if optimizer == "sgd":
        return SGD(learning_rate=learning_rate, momentum=momentum)
    elif optimizer == "rmsprop":
        return RMSprop(learning_rate=learning_rate, momentum=momentum)
    elif optimizer == "adagrad":
        return Adagrad(learning_rate=learning_rate)
    elif optimizer == "adadelta":
        return Adadelta(learning_rate=learning_rate)
    elif optimizer == "adam":
        return Adam(learning_rate=learning_rate)
    elif optimizer == "adamax":
        return Adamax(learning_rate=learning_rate)
    elif optimizer == "nadam":
        return Nadam(learning_rate=learning_rate)
    else:
        raise ValueError("Unrecognized optimizer")


def create_model(first_input, layers, neurons, dropout_rate, weight_constraint, learning_rate, momentum, activation, init_mode='uniform', optimizer='adam'):
    # create model
    model = Sequential()
    for ix in range(layers):
        if ix == 0:
            model.add(Dense(neurons[ix],
                            activation=activation,
                            input_shape=first_input,
                            kernel_initializer=init_mode,
                            kernel_constraint=MaxNorm(weight_constraint)))
        elif ix != layers-1:
            model.add(Dense(neurons[ix],
                            activation=activation,
                            kernel_initializer=init_mode,
                            kernel_constraint=MaxNorm(weight_constraint)))
        else:
            model.add(Dense(1,
                            activation=activation,
                            kernel_initializer=init_mode))

        if ix != layers-1:
            model.add(Dropout(dropout_rate))

    opt = _choose_optimizer(optimizer.lower(), learning_rate, momentum)

    model.compile(loss="mse", optimizer=opt, metrics=["mse"])
    return model
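The two regularizers used in `create_model()`, dropout and the max-norm weight constraint, can be sketched in plain NumPy. This is an illustrative approximation of what the Keras `Dropout` layer (in training mode) and the `MaxNorm` constraint do, not their actual implementation; the helper names `dropout` and `max_norm` and the example values are assumptions.

```python
import numpy as np

def dropout(a, rate, rng):
    """Inverted dropout (training mode): zero out roughly a fraction `rate`
    of the activations and rescale the survivors by 1 / (1 - rate)."""
    if rate == 0.0:
        return a
    keep = rng.random(a.shape) >= rate
    return a * keep / (1.0 - rate)

def max_norm(W, max_value, eps=1e-7):
    """Rescale each column of W (the incoming weights of one unit) so that
    its L2 norm does not exceed max_value."""
    norms = np.sqrt(np.sum(W ** 2, axis=0, keepdims=True))
    desired = np.clip(norms, 0.0, max_value)
    return W * desired / (eps + norms)

rng = np.random.default_rng(0)
activations = np.ones(1000)
dropped = dropout(activations, rate=0.5, rng=rng)
print(np.unique(dropped))  # surviving units are scaled to 2.0

W = np.array([[3.0, 0.3],
              [4.0, 0.4]])  # column norms: 5.0 and 0.5
W_clipped = max_norm(W, max_value=2.5)
print(np.linalg.norm(W_clipped, axis=0))
```

Only the first column is rescaled, because its norm exceeds the limit; the second column already satisfies the constraint and is left essentially unchanged.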

In [41]:
#Parameter values to optimize:
neurons = [grid + (1,) for grid in itertools.product(*[[20,50,100]]*3)]
dropout_rate = np.linspace(0.0, 0.5, 6)
weight_constraint = np.linspace(0.5, 5, 10) #Max norm value each weight parameter can be
learning_rate = np.linspace(0.005, 0.5, 100)
momentum = np.linspace(0.0, 0.9, 4) #9
optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']

#For computational purposes, all layers will have the same  weight_constraint and dropout_rate values

param_dist = dict(model__neurons=neurons,
                  model__dropout_rate=dropout_rate,
                  model__weight_constraint=weight_constraint,
                  model__learning_rate=learning_rate,
                  model__momentum=momentum,
                  model__optimizer=optimizer)
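A quick count shows why a randomized search is used rather than an exhaustive grid search. The sketch below multiplies the number of candidate values of each hyperparameter defined above (for three hidden layers) and compares the total with the number of configurations a search with `n_iter=50` would sample; the variable names are illustrative.

```python
import itertools
import math

hidden_sizes = [20, 50, 100]
n_hidden_layers = 3
neurons = [grid + (1,) for grid in itertools.product(hidden_sizes, repeat=n_hidden_layers)]

# Number of candidate values per hyperparameter, as defined in the cell above.
grid_sizes = {
    "neurons": len(neurons),    # 3**3 = 27 layer-size combinations
    "dropout_rate": 6,          # np.linspace(0.0, 0.5, 6)
    "weight_constraint": 10,    # np.linspace(0.5, 5, 10)
    "learning_rate": 100,       # np.linspace(0.005, 0.5, 100)
    "momentum": 4,              # np.linspace(0.0, 0.9, 4)
    "optimizer": 7,
}

total = math.prod(grid_sizes.values())
n_iter, cv = 50, 5

print(total)           # 4536000 combinations in the full grid
print(n_iter * cv)     # 250 model fits for the randomized search
print(n_iter / total)  # fraction of the grid that is actually sampled
```

With more hidden layers the number of layer-size combinations grows as a power of 3, so the sampled fraction becomes even smaller for the deeper networks.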

In [48]:
param_dist["model__neurons"] = [grid + (1,) for grid in itertools.product(*[[20,50,100]]*3)]

print(f"Retrieving best parameters for a 4 layered fully-connected network (n_iter=50, cv=5):\n|")
model = KerasRegressor(model=create_model, epochs=100, batch_size=10, verbose=0,
                   first_input=[x_train_mi.shape[1]],
                   layers=4,
                   activation="relu")

grid = RandomizedSearchCV(estimator=model, param_distributions=param_dist, n_iter=50, n_jobs=-1, cv=5)
grid.fit(x_train_mi, y_train)
print(f"Best score: {grid.best_score_}")
print(f"Best parameters: {grid.best_params_}")

Retrieving best parameters for a 4 layered fully-connected network (n_iter=50, cv=5):
|
Best score: 0.370032506271634
Best parameters: {'model__weight_constraint': 2.5, 'model__optimizer': 'Adadelta', 'model__neurons': (100, 100, 50, 1), 'model__momentum': 0.9, 'model__learning_rate': 0.465, 'model__dropout_rate': 0.1}


In [44]:
param_dist["model__neurons"] = [grid + (1,) for grid in itertools.product(*[[20,50,100]]*5)]

print(f"Retrieving best parameters for a 6 layered fully-connected network (n_iter=50, cv=5):\n|")
model = KerasRegressor(model=create_model, epochs=100, batch_size=10, verbose=0,
                   first_input=[x_train_mi.shape[1]],
                   layers=6,
                   activation="relu")

grid = RandomizedSearchCV(estimator=model, param_distributions=param_dist, n_iter=1, n_jobs=-1, cv=5)
grid.fit(x_train_mi, y_train)
print(f"Best score: {grid.best_score_}")
print(f"Best parameters: {grid.best_params_}")

Retrieving best parameters for a 6 layered fully-connected network (n_iter=50, cv=5):
|
(20, 50, 50, 20, 100, 1)
Best score: -0.010840052742769002
Best parameters: {'model__weight_constraint': 0.5, 'model__optimizer': 'Adagrad', 'model__neurons': (20, 50, 50, 20, 100, 1), 'model__momentum': 0.3, 'model__learning_rate': 0.255, 'model__dropout_rate': 0.5}


In [47]:
param_dist["model__neurons"] = [grid + (1,) for grid in itertools.product(*[[20,50,100]]*7)]

print(f"Retrieving best parameters for a 8 layered fully-connected network (n_iter=50, cv=5):\n|")
model = KerasRegressor(model=create_model, epochs=100, batch_size=10, verbose=0,
                   first_input=[x_train_mi.shape[1]],
                   layers=8,
                   activation="relu")

grid = RandomizedSearchCV(estimator=model, param_distributions=param_dist, n_iter=1, n_jobs=-1, cv=5)
grid.fit(x_train_mi, y_train)
print(f"Best score: {grid.best_score_}")
print(f"Best parameters: {grid.best_params_}")

Retrieving best parameters for a 8 layered fully-connected network (n_iter=50, cv=5):
|
Best score: -0.009267656321677187
Best parameters: {'model__weight_constraint': 3.5, 'model__optimizer': 'Adamax', 'model__neurons': (20, 50, 100, 20, 100, 20, 20, 1), 'model__momentum': 0.0, 'model__learning_rate': 0.17, 'model__dropout_rate': 0.5}


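As the results show, the best-performing model was the network with 4 dense layers (using the hyperparameter values shown in the output), with a score of **0.370**. Because no `scoring` argument was passed to `RandomizedSearchCV`, `best_score_` is the estimator's default score, which for scikeras' `KerasRegressor` is the coefficient of determination R² (higher is better), not the 'mse'. The 6- and 8-layer networks scored about -0.011 and -0.009, that is, no better than always predicting the mean; note that their code cells set `n_iter=1`, so each of those searches may have evaluated only a single random configuration.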
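When reading these scores, it helps to recall how the coefficient of determination R² behaves, since this is the default score of scikeras' `KerasRegressor` when no `scoring` argument is given to the search. The sketch below is a plain-Python illustration; the helper `r_squared` is hypothetical, and the target values are a handful of `tm` values copied from the `y_train` printout above.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# A few tm values from the y_train printout, for illustration only.
y_true = [48.4, 49.0, 55.6, 51.8, 37.2, 64.6]
mean = sum(y_true) / len(y_true)

r2_perfect = r_squared(y_true, y_true)                # exact predictions
r2_mean = r_squared(y_true, [mean] * len(y_true))     # always predict the mean
r2_constant = r_squared(y_true, [0.0] * len(y_true))  # a poor constant guess

print(r2_perfect, r2_mean, r2_constant)
```

A perfect model scores 1, a model that always predicts the mean scores 0, and anything worse than the mean predictor scores below 0, so scores near or below zero indicate that a network has learned essentially nothing useful.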

Testing a wider range of hyperparameter values and numbers of dense layers could yield an even more accurate model. However, this process is computationally expensive, which makes searching for the best set of hyperparameters difficult.