# Experiment with the Sine Function

In this experiment, we build models with the Keras framework to learn and predict the sine function.
We test several hyperparameters and check how well the networks learn and generalize.

**NOTE:** The model training routines were unified into a single generalized function that receives the hyperparameters and builds distinct neural networks.

## 1. Desktop used to run the experiment

- Dell XPS-8930-A5GM desktop computer
- Intel i7 8700
- RAM 16GB
- HD 2TB
- GeForce GTX 1050 Ti 4GB

## 2. Libraries

- Keras 2.4.3 (Using GPU)
- TensorFlow 2.4.1 (Using GPU)


## 3. Importing required libraries

In [1]:
import os

import pandas as pd
import numpy as np
import math

from sklearn import preprocessing

import tensorflow as tf
from tensorflow import keras

from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD

from sklearn.metrics import mean_squared_error

import matplotlib.pyplot as plt
from datetime import datetime

2022-06-16 22:19:31.197910: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1


## 4. Data preparation

In this assignment, the data-generation function was modified so that the train/test set sizes, the noise scale, and the input range can be changed through its arguments.


In [2]:
def prepare_data(noise_scale=0.1, test_size=0.15,  out_start=0, out_end=2*np.pi, ntot=100, outpath="image.png"):

    
    # Proportion of train size
    train_size = 1 - test_size
    
    # Number of training examples
    ntrain = int(train_size * ntot)

    # Number of test examples
    ntest = int(test_size * ntot)

    #
    #  range of values from random number function generator
    #
    in_start = 0
    in_end = 1
    
    #
    # Mapping the original random values to the desired scale
    #
    slope = (out_end - out_start) / (in_end - in_start)
    in_value = np.random.rand(ntot,1) 
    
    #
    # Final dataset
    #
    x = out_start + slope * (in_value - in_start)

    #
    # Generating noise
    #
    s=np.random.normal(0, noise_scale, size = (ntot,1))

    #
    # Generating output
    #
    y=np.sin(x)+s

    #
    # Dataset splitting
    #
    xtrain, xtest = x[:ntrain], x[ntrain:]
    ytrain, ytest = y[:ntrain], y[ntrain:]

    # Visualizing the train set as a scatter plot (one marker per sample)
    plt.plot(xtrain.T,ytrain.T,color = 'red', marker = "o")
    plt.title("Sine")
    plt.xlabel("Angle")
    plt.ylabel("Sine")
    plt.grid()
    plt.tight_layout()
    plt.savefig(outpath, dpi=200)
    plt.close()

    return xtrain, ytrain, xtest, ytest
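
The sampling performed by `prepare_data` can be illustrated with a small dependency-free sketch. The function name `make_sine_data`, the fixed seed, and the use of `round` instead of `int` are assumptions made for illustration only; the notebook itself uses NumPy and does not seed the generator.

```python
import math
import random

def make_sine_data(ntot=100, test_size=0.3, noise_scale=0.1,
                   out_start=0.0, out_end=2 * math.pi, seed=0):
    """Dependency-free sketch of the sampling done in prepare_data."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    # Map u drawn from [0, 1) linearly onto the interval [out_start, out_end]
    xs = [out_start + (out_end - out_start) * rng.random() for _ in range(ntot)]
    # Noisy target: sin(x) plus zero-mean Gaussian noise
    ys = [math.sin(x) + rng.gauss(0.0, noise_scale) for x in xs]
    # The first part of the samples is the train set, the rest is the test set
    ntrain = round((1 - test_size) * ntot)
    return xs[:ntrain], ys[:ntrain], xs[ntrain:], ys[ntrain:]

xtrain, ytrain, xtest, ytest = make_sine_data()
print(len(xtrain), len(xtest))
```

Because the inputs are drawn at random and never sorted, taking the first samples as the train set is already a random split.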

In [3]:
def fit_model(xtrain, ytrain, xtest, ytest, epochs, lr=0.05, ns=0.1, momentum=0.8, patience=100, batch_size=5, activation="tanh", hidden_layer_size=10):
    """
    Build and train a network with one hidden layer, save its learning
    curves under imgs/, and return the trained model.
    """

    tf.keras.backend.clear_session()
    early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=patience)
    # define model
    model = Sequential([Dense(hidden_layer_size, activation=activation, input_dim=1),
                        Dense(1, activation='linear')
                       ])
                       
    # compile model
    opt = SGD(lr, momentum=momentum)
    model.compile(loss='mean_squared_error', optimizer=opt, metrics=['mse'])
    # fit model
    history = model.fit(xtrain, ytrain, validation_data=(xtest, ytest), epochs=epochs,  batch_size=batch_size, verbose=0, callbacks=[early_stop])

    # plot learning curves
    plt.plot(history.history['loss'], label='train')
    plt.plot(history.history['val_loss'], label='test')
    plt.title('learning rate='+str(lr), pad=-80)

    plt.legend()
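    # %.3f keeps lr=0.005 and lr=0.01 apart; the hidden layer size avoids overwriting plots across network sizes
    output_path = "imgs/model_loss_ns=%.2f_epochs=%d_lr=%.3f_mom=%.2f_activation=%s_pati=%d_hidden=%d.png"  %(ns, epochs, lr, momentum, activation, patience, hidden_layer_size)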
    plt.tight_layout()
    plt.savefig(output_path, dpi=200)
    plt.close()

    
    return model
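
The `patience` argument passed to `EarlyStopping` controls how many consecutive epochs without improvement in `val_loss` are tolerated before training stops. The following is a minimal sketch of that rule in plain Python, not the Keras implementation; the helper name `epochs_run` and the loss values are hypothetical.

```python
def epochs_run(val_losses, patience):
    """Return how many epochs run before a patience-based stop triggers."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Improvement: remember the new best value and reset the counter
            best, wait = loss, 0
        else:
            # No improvement: stop once `patience` such epochs accumulate
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)

losses = [1.0, 0.5, 0.4, 0.45, 0.41, 0.42, 0.43]  # hypothetical validation losses
print(epochs_run(losses, patience=2))
print(epochs_run(losses, patience=10))
```

With this rule, training ends on the weights of the last epoch rather than the best one. Keras offers `restore_best_weights=True` on `EarlyStopping` for the latter; the notebook does not set it.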

## 5. Report for the Exercise

In this report, we try different setups of noise scale (ns), activation function (af), learning rate (lr), early-stopping patience, and hidden layer size.


In [4]:
#
# Creating hyperparameters options
#

noise_scale_list = [0.05, 0.1, 0.2]
act_func_list = ["sigmoid", "relu", "tanh"]
lr_list = [0.005, 0.01, 0.05, 0.1]
patience_list = [100, 200]
hidden_layer_list = [5, 10, 50, 100, 200]

epochs = 2000

# Min and max value of sin function to learn
min_value = 0
max_value = 2*np.pi
batch_size=5

test_ratio = 0.3

num_examples = 100

In the following code, we run the experiments for all combinations of hyperparameters.

_Note:_ On our computer, we ran the experiments using Keras on GPU, and this step took about 2 hours.
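
To give a sense of the size of this search, the sketch below enumerates the same grid with `itertools.product`, which visits the combinations in the same order as nested loops over the lists in the order given. The variable name `grid` is introduced here only for illustration.

```python
from itertools import product

noise_scale_list = [0.05, 0.1, 0.2]
act_func_list = ["sigmoid", "relu", "tanh"]
lr_list = [0.005, 0.01, 0.05, 0.1]
patience_list = [100, 200]
hidden_layer_list = [5, 10, 50, 100, 200]

# Every combination of the five hyperparameters, last list varying fastest
grid = list(product(noise_scale_list, act_func_list, lr_list,
                    patience_list, hidden_layer_list))

print(len(grid))   # 3 * 3 * 4 * 2 * 5 = 360 training runs
print(grid[0])
```

With 360 runs in about 2 hours, each run takes roughly 20 seconds on average.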


In [5]:

data = []
# Make sure the output folder for the plots exists
os.makedirs("imgs", exist_ok=True)
for ns in noise_scale_list:
    for act_funct in act_func_list:
        for lr in lr_list:
            for patience in patience_list:
                for hidden_layer_size in hidden_layer_list:
                    now = datetime.now()
                    print("=== (%s)Training network (ns: %.2f, act_funct: %s; lr: %.3f; patience:%d, hidden_layer_size:%d) ===" % (now.strftime("%d/%m/%Y %H:%M:%S"), ns, act_funct, lr, patience, hidden_layer_size))

                    #
                    # Preparing dataset
                    #
                    # %.3f keeps lr=0.005 and lr=0.01 from mapping to the same file name
                    out_path="imgs/dataset_ns=%.2f_activation=%s_lr=%.3f_pati=%d_hidden=%d.png"  % (ns, act_funct, lr, patience, hidden_layer_size)
                    xtrain, ytrain, xtest, ytest = prepare_data(
                        ntot=num_examples, 
                        noise_scale=ns, 
                        out_start=0, out_end=max_value, 
                        test_size=test_ratio,
                        outpath=out_path
                    )

                    scaler = preprocessing.MinMaxScaler()
                    # fit using the train set
                    scaler.fit(xtrain)
                    # transform the train and test sets with the fitted scaler
                    xtrainN = scaler.transform(xtrain)
                    xtestN = scaler.transform(xtest) 

                    X = np.linspace(0.0 , 2.0 * np.pi, num_examples).reshape(-1, 1)
                    XN = scaler.transform(X)

                    #
                    # Fitting model
                    #
                    
                    model = fit_model(
                        xtrainN, ytrain, xtestN, ytest, epochs=epochs, lr=lr, ns=ns, activation=act_funct, 
                        patience=patience, hidden_layer_size=hidden_layer_size, batch_size=batch_size
                    )

                    #
                    # Making predictions
                    #
                    Y = model.predict(XN)
                    plt.plot(XN,Y,color = 'red', marker = "o", label="model prediction")
                    plt.plot(xtestN.T,ytest.T,color = 'black', marker= "+")
                    plt.title("Sine")
                    plt.xlabel("Angle")
                    plt.ylabel("Sine")
                    plt.grid()
                    # %.3f keeps lr=0.005 and lr=0.01 from mapping to the same file name
                    output_path = "imgs/prediction_ns=%.2f_activation=%s_lr=%.3f_pati=%d_hidden=%d.png"  % (ns, act_funct, lr, patience, hidden_layer_size)
                    plt.tight_layout()
                    plt.legend()
                    plt.savefig(output_path, dpi=200)
                    plt.close()

                    y_pred = model.predict(xtestN)
                    mse = mean_squared_error(ytest, y_pred)

                    data.append([ns, act_funct, lr, patience, hidden_layer_size, mse])

df = pd.DataFrame(data, columns=["Ns", "Activation", "Lr", "patience", "hidden_layer", "mse"])
df.to_excel("results.xlsx", index=False)

=== (16/06/2022 22:19:33)Training network (ns: 0.05, act_funct: sigmoid; lr: 0.005; patience:100, hidden_layer_size:5) ===


2022-06-16 22:19:33.285330: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-06-16 22:19:33.285982: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2022-06-16 22:19:33.326953: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-16 22:19:33.327126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1050 Ti computeCapability: 6.1
coreClock: 1.392GHz coreCount: 6 deviceMemorySize: 3.94GiB deviceMemoryBandwidth: 104.43GiB/s
2022-06-16 22:19:33.327158: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2022-06-16 22:19:33.329135: I tensorflow/stream_executor/platform/

=== (16/06/2022 22:19:51)Training network (ns: 0.05, act_funct: sigmoid; lr: 0.005; patience:100, hidden_layer_size:10) ===
=== (16/06/2022 22:19:58)Training network (ns: 0.05, act_funct: sigmoid; lr: 0.005; patience:100, hidden_layer_size:50) ===
=== (16/06/2022 22:20:04)Training network (ns: 0.05, act_funct: sigmoid; lr: 0.005; patience:100, hidden_layer_size:100) ===
=== (16/06/2022 22:20:15)Training network (ns: 0.05, act_funct: sigmoid; lr: 0.005; patience:100, hidden_layer_size:200) ===
=== (16/06/2022 22:20:25)Training network (ns: 0.05, act_funct: sigmoid; lr: 0.005; patience:200, hidden_layer_size:5) ===
=== (16/06/2022 22:21:36)Training network (ns: 0.05, act_funct: sigmoid; lr: 0.005; patience:200, hidden_layer_size:10) ===
=== (16/06/2022 22:22:40)Training network (ns: 0.05, act_funct: sigmoid; lr: 0.005; patience:200, hidden_layer_size:50) ===
=== (16/06/2022 22:22:56)Training network (ns: 0.05, act_funct: sigmoid; lr: 0.005; patience:200, hidden_layer_size:100) ===
=== (1
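
The scaling step in the loop fits `MinMaxScaler` on the train set only and then applies the same transformation to the test set and to the evaluation grid. A minimal plain-Python sketch of that idea follows; the helper `fit_min_max` and the sample values are hypothetical, and the sketch is not the scikit-learn implementation.

```python
def fit_min_max(values):
    """Learn min and max from the train values and return a transform function."""
    lo, hi = min(values), max(values)

    def transform(v):
        # Same affine map for any data: the train min goes to 0, the train max to 1
        return [(x - lo) / (hi - lo) for x in v]

    return transform

train = [0.5, 2.0, 4.5, 6.0]   # hypothetical train inputs
test = [0.1, 3.25, 6.2]        # hypothetical test inputs
scale = fit_min_max(train)

print(scale(train))
print(scale(test))
```

Values outside the range seen during training map outside $[0, 1]$, so a few test points or grid points may fall slightly below 0 or above 1 after scaling.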

## 6. Results Analysis

After running all combinations of the selected hyperparameters, we analyze the results using the spreadsheet exported after the training step, together with the charts that compare the network predictions to the test set.

### Analysis of the impact of hyperparameters on MSE

First, we analyze the impact of each hyperparameter value on the test-set MSE, averaging over all runs that share that value.

In [10]:
df = pd.read_excel("results.xlsx")

print("Checking noise scale influence")
for noise_scale in noise_scale_list:
    mse = np.mean(df[df["Ns"]==noise_scale]["mse"])
    mse_std = np.std(df[df["Ns"]==noise_scale]["mse"])

    print("noise_scale\t%s\tTest set MSE: %.3f (±%.3f)" % (noise_scale, mse, mse_std))

Checking noise scale influence
noise_scale	0.05	Test set MSE: 0.162 (±0.191)
noise_scale	0.1	Test set MSE: 0.173 (±0.175)
noise_scale	0.2	Test set MSE: 0.222 (±0.161)
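
As a reference point for these numbers: since the targets are $\sin(x)$ plus zero-mean Gaussian noise with standard deviation $ns$, even a model that reproduces the sine exactly has an expected test MSE of $ns^2$. The sketch below computes that floor and checks it by simulation; the function names and the sample count are assumptions made for illustration.

```python
import math
import random

def noise_floor(ns):
    """Expected MSE of the exact sine against targets with N(0, ns) noise."""
    return ns ** 2

def simulated_floor(ns, n=100_000, seed=0):
    """Monte Carlo check: score the true function on freshly sampled noisy data."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.uniform(0.0, 2 * math.pi)
        y = math.sin(x) + rng.gauss(0.0, ns)
        total += (y - math.sin(x)) ** 2
    return total / n

for ns in [0.05, 0.1, 0.2]:
    print(f"ns={ns}: floor={noise_floor(ns):.4f}, simulated={simulated_floor(ns):.4f}")
```

The floors are 0.0025, 0.01, and 0.04, while the averages reported above range from 0.162 to 0.222. This suggests that much of the averaged error comes from configurations that fit the curve poorly rather than from the noise itself.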


In [12]:

print("Checking hidden layer Activation influence")
for activation in act_func_list:
    mse = np.mean(df[df["Activation"]==activation]["mse"])
    mse_std = np.std(df[df["Activation"]==activation]["mse"])

    print("activation\t%s\tTest set MSE: %.3f (±%.3f)" % (activation, mse, mse_std))


Checking hidden layer Activation influence
activation	sigmoid	Test set MSE: 0.248 (±0.205)
activation	relu	Test set MSE: 0.215 (±0.174)
activation	tanh	Test set MSE: 0.094 (±0.099)


In [13]:
print("Checking learning rate influence")
for lr in lr_list:
    mse = np.mean(df[df["Lr"]==lr]["mse"])
    mse_std = np.std(df[df["Lr"]==lr]["mse"])

    print("learning rate\t%.3f\tTest set MSE: %.3f (±%.3f)" % (lr, mse, mse_std))


Checking learning rate influence
learning rate	0.005	Test set MSE: 0.176 (±0.103)
learning rate	0.010	Test set MSE: 0.171 (±0.130)
learning rate	0.050	Test set MSE: 0.164 (±0.180)
learning rate	0.100	Test set MSE: 0.231 (±0.254)
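
For context on what the learning rate controls, the sketch below applies the SGD-with-momentum update rule (velocity = momentum * velocity - lr * gradient, then weight = weight + velocity) to a toy one-dimensional loss. The loss $f(w) = w^2$, the starting point, and the number of steps are assumptions chosen only to show the update; they say nothing about the loss surface of the networks trained here.

```python
def sgd_momentum(grad, w0, lr, momentum, steps):
    """Run `steps` updates of SGD with momentum on a scalar parameter."""
    w, v = w0, 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(w)  # velocity accumulates past gradients
        w = w + v                        # parameter moves along the velocity
    return w

def grad(w):
    # Gradient of the toy loss f(w) = w**2 (an assumption for illustration)
    return 2.0 * w

for lr in [0.005, 0.01, 0.05, 0.1]:
    w_final = sgd_momentum(grad, w0=1.0, lr=lr, momentum=0.8, steps=200)
    print(f"lr={lr}: final w = {w_final:.3e}")
```

The learning rate scales each gradient step, and the momentum term (0.8 in `fit_model`) carries part of the previous step forward.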


In [15]:
print("Checking patience influence")
for patience in patience_list:
    mse = np.mean(df[df["patience"]==patience]["mse"])
    mse_std = np.std(df[df["patience"]==patience]["mse"])

    print("patience\t%d\tTest set MSE: %.3f (±%.3f)" % (patience, mse, mse_std))

Checking patience influence
patience	100	Test set MSE: 0.186 (±0.164)
patience	200	Test set MSE: 0.185 (±0.191)


In [17]:
print("Checking hidden layer size influence")
for hidden_layer in hidden_layer_list:
    mse = np.mean(df[df["hidden_layer"]==hidden_layer]["mse"])
    mse_std = np.std(df[df["hidden_layer"]==hidden_layer]["mse"])

    print("hidden_layer\t%d\tTest set MSE: %.3f (±%.3f)" % (hidden_layer, mse, mse_std))

Checking hidden layer size influence
hidden_layer	5	Test set MSE: 0.141 (±0.159)
hidden_layer	10	Test set MSE: 0.142 (±0.167)
hidden_layer	50	Test set MSE: 0.171 (±0.113)
hidden_layer	100	Test set MSE: 0.217 (±0.199)
hidden_layer	200	Test set MSE: 0.257 (±0.208)


From the above experiments, *on average*, one can note the following:

- In terms of **noise scale**:
    - As expected, the MSE increases when the network is trained on noisier data (from 0.162 at $ns=0.05$ to 0.222 at $ns=0.2$).
    - In the next analysis, we will see whether the models learned the sine function regardless of the noise in the data.
- In terms of **hidden layer activation function**:
    - `tanh` performed best by a wide margin (mean MSE of 0.094, against 0.215 for `relu` and 0.248 for `sigmoid`). One possible explanation is that `tanh` is smooth, zero-centered, and bounded in $(-1, 1)$, which matches the range of the sine function, so that a few units may be enough to approximate the curve.
    - The `relu` and `sigmoid` functions could also learn the sine function, but with poorer performance.
- In terms of **learning rate**:
    - Both the lowest and the highest learning rates performed worse on MSE, with $lr=0.1$ clearly the worst (0.231).
    - The intermediate value $lr=0.05$ achieved the best average MSE (0.164), although its gap to $lr=0.005$ and $lr=0.01$ is small compared to the standard deviations.
- In terms of **patience** in early stopping:
    - The two values tested did not show, on average, a significant difference in MSE (0.186 vs. 0.185).
- In terms of **hidden layer size**:
    - Smaller networks performed best on MSE, and the average error grew with the number of hidden units.
    - For this dataset, we would choose a hidden layer size of $5$ and still get good results.
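
To put the hidden layer sizes in perspective, the sketch below counts the trainable parameters of the architecture used in `fit_model` (one input, one hidden layer, one linear output) and writes out its forward pass for the `tanh` case. The helper names `n_params` and `forward` are hypothetical, and the code is a plain-Python illustration rather than the Keras model.

```python
import math

def n_params(hidden):
    """Trainable parameters of a 1 -> hidden -> 1 dense network."""
    # hidden weights + hidden biases + output weights + one output bias
    return hidden + hidden + hidden + 1

def forward(x, w1, b1, w2, b2):
    """y = b2 + sum_j w2[j] * tanh(w1[j] * x + b1[j])"""
    return b2 + sum(w2j * math.tanh(w1j * x + b1j)
                    for w1j, b1j, w2j in zip(w1, b1, w2))

for h in [5, 10, 50, 100, 200]:
    print(h, n_params(h))
```

With `test_ratio = 0.3` and 100 examples, the train set has 70 points. The network with 5 hidden units has 16 parameters, while the one with 200 hidden units has 601, far more than the number of training points.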

### Analysing the predictions of the best models.