# Logistic regression and simple neural networks
<span style="font-size:9pt;">
author: MWLafarge (m.w.lafarge@tue.nl); affiliation: Eindhoven University of Technology; created: Feb 2020
</span>

## Preliminaries
In this exercise you will implement a logistic regression model using the __Keras__ framework with a TensorFlow backend.  
The documentation of the TensorFlow implementation of the Keras API used in the exercises can be found [here](https://www.tensorflow.org/api_docs/python/tf/keras).

The main goal of this first exercise is to get familiar with the functional Keras API and to develop an intuition for training a logistic regression model on a toy dataset. Furthermore, this notebook demonstrates how to implement the components of a machine-learning model with __tf.keras__, how to train and test the model, and how to interpret the learning process by visualizing the training and test loss curves. In the exercises, you will then use this knowledge to solve two more difficult classification problems by extending the logistic regression classifier to a neural network.

## Import the required libraries


In [None]:
# environment setup
%load_ext autoreload
%autoreload 2
#%matplotlib widget
%matplotlib inline

# system libraries
import sys
import os

# computational libraries
import numpy as np
import tensorflow as tf

# utility functions for this exercise
from utils_ex1 import plot_2d_dataset, plot_learned_landscape, Monitoring

## Import the dataset

Load the datasets that are stored as numpy arrays. The datasets consist of pairs of 2D __features__ (points in 2D space) and binary class __labels__. 

The arrays containing the features have the following dimensions: \[number of samples, number of features\].  

The arrays containing the labels have the following dimensions: \[number of samples\].
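
As a minimal sketch of this array layout, the snippet below builds a small synthetic dataset with the same conventions. The names `example_points` and `example_labels` and the values are hypothetical stand-ins, not the exercise data.

```python
import numpy as np

# Hypothetical stand-in for a loaded dataset (not the exercise data):
# 5 samples with 2 features each, and one binary label per sample.
rng = np.random.default_rng(seed=0)
example_points = rng.normal(size=(5, 2))    # [number of samples, number of features]
example_labels = np.array([0, 1, 1, 0, 1])  # [number of samples]

print(example_points.shape)  # (5, 2)
print(example_labels.shape)  # (5,)

# The first axis indexes samples, so both arrays must agree on its length.
assert example_points.shape[0] == example_labels.shape[0]
```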

In [None]:
PATH_BASE_DIR = os.getcwd() # current working directory

def load_data_helper(path):
    data_dict = np.load(path) 
    data_points = data_dict["points"]
    data_labels = data_dict["labels"]
    
    return data_points, data_labels

def load_data(dataset_name):
    path_train = PATH_BASE_DIR + os.sep + "../data/data_toy_{}_training.npz".format(dataset_name)
    path_test  = PATH_BASE_DIR + os.sep + "../data/data_toy_{}_test.npz".format(dataset_name)
    data_points_train, data_labels_train = load_data_helper(path_train)
    data_points_test, data_labels_test = load_data_helper(path_test)
    
    return data_points_train, data_labels_train, data_points_test, data_labels_test

data_points_train, data_labels_train, data_points_test, data_labels_test = load_data("blobs")

print("Imported training points: ", data_points_train.shape)
print("Imported training labels: ", data_labels_train.shape)

print("Imported test points: ", data_points_test.shape)
print("Imported test labels: ", data_labels_test.shape)

The dataset can be visualized in the following way:

In [None]:
plot_2d_dataset(data_points_train, data_labels_train, data_points_test, data_labels_test)

## Creating a classifier

We want to train a logistic regression model that takes 2D features as input and outputs the likelihood of the class label.

### Defining the graph operations
In symbolic programming, we model the desired likelihood function as a sequence of operations (forming a graph/network) that maps an input placeholder object to an output.

We first need to instantiate all the components we want to use to construct the network:
- __tf.keras.Input()__ to define the input placeholder.  
- __tf.keras.layers.Dense()__ to define densely connected layers (matrix multiplication + bias + non-linear activation).

Since we are training a logistic regression model, only an input and an output layer need to be defined.
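
To make the computation of the output layer concrete, here is a minimal NumPy sketch of what a single dense unit with a sigmoid activation computes. The function names and the weight values are hypothetical and chosen only for illustration; in the notebook the weights are learned by Keras.

```python
import numpy as np

def sigmoid(z):
    """Squash real values into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression(x, weights, bias):
    """Matrix multiplication, bias addition, then sigmoid activation."""
    return sigmoid(x @ weights + bias)

# Two samples with two features each: shape [number of samples, number of features].
x = np.array([[0.0, 0.0],
              [1.0, -1.0]])

# Hypothetical parameters: one weight per input feature, one output unit.
weights = np.array([[2.0],
                    [1.0]])   # shape (2, 1)
bias = np.array([0.0])        # shape (1,)

probs = logistic_regression(x, weights, bias)
print(probs.shape)  # (2, 1): one predicted probability per sample
```

The first sample gives a pre-activation of 0 and therefore a probability of 0.5; the second gives a pre-activation of 1.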

In [None]:
# input layer: samples with two features are expected
# (the shape must be a tuple, hence the trailing comma)
inputs = tf.keras.Input(shape=(2,))

# output layer with sigmoid activation
layerOut = tf.keras.layers.Dense(1, activation="sigmoid") 

### Connecting the graph and instantiating the model
We can now join the graph components to define the model __output__ as a function of the __input placeholder__. Then we can instantiate the model with __tf.keras.Model__.

In [None]:
# logistic regression
outputs = layerOut(inputs) 

# instantiate the full model with the input-output objects
model = tf.keras.Model(inputs, outputs)

### Preparing to train the model

In order to train the model, we need to define a __loss function__ that we want to optimize, and the gradient-descent procedure (__optimizer__) we want to use to update the weights of the model during training.  

The __tf.keras__ library comes with a module with pre-defined standard loss functions (__tf.keras.losses__) and a module with standard optimization algorithms (__tf.keras.optimizers__).
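
As a rough illustration of what these two components do, the NumPy sketch below computes a binary cross-entropy by hand and applies one gradient-descent update with momentum. The helper names, the clipping constant `eps`, and the example values are assumptions made for this sketch; the Keras implementations should be treated as the reference.

```python
import numpy as np

def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean binary cross-entropy; probs are clipped to avoid log(0)."""
    probs = np.clip(probs, eps, 1.0 - eps)
    per_sample = labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs)
    return float(-np.mean(per_sample))

def sgd_momentum_step(weight, velocity, gradient, learning_rate=0.01, momentum=0.9):
    """One update: the velocity accumulates past gradients, then moves the weight."""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity

# An uninformative prediction of 0.5 for every sample costs log(2).
labels = np.array([1.0, 0.0])
probs = np.array([0.5, 0.5])
loss_value = binary_cross_entropy(labels, probs)

# One update of a single scalar weight with a hypothetical gradient of 2.0.
weight, velocity = 1.0, 0.0
weight, velocity = sgd_momentum_step(weight, velocity, gradient=2.0)
print(loss_value, weight, velocity)
```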


In [None]:
# cross-entropy loss between the distribution of ground truth labels and the model predictions
loss = tf.keras.losses.BinaryCrossentropy()

# stochastic gradient descent with momentum
optimizer = tf.keras.optimizers.SGD(
    learning_rate = 0.01,
    momentum      = 0.9)

### Compiling the model
We finally configure the model for training by indicating the loss, the optimizer and performance metrics to be computed during training.  

We use __model.summary()__ to display and check the model architecture we implemented.


In [None]:
model.compile(
    optimizer = optimizer,
    loss      = loss,
    metrics   = ["accuracy"])

model.summary()

## Training the model and monitoring the training process
Now that the model is compiled, it is ready to be trained. The __tf.keras.Model.fit__ function enables starting the training process in a single call.
The function arguments we use in this exercise are:
- __x, y__: numpy arrays representing the training dataset features (x) and targets (y)
- __epochs__: the number of training epochs (the number of times the full training set is processed)
- __batch_size__: the size of the mini-batches
- __validation_data__: tuple of validation (test) features and labels (__x,y__)
- __validation_freq__: the number of epochs between evaluations of the model on the validation data
- __callbacks__: a list of objects derived from __tf.keras.callbacks.Callback__ that are called automatically during the training process. Here the __Monitoring()__ callback is used to monitor the training and test losses during training. This is a custom callback that we have implemented for these exercises.

More details can be found in the documentation of [__tf.keras.Model.fit__](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit).
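
The following sketch shows how these arguments translate into the amount of work done during training. The training-set size `n_train_samples` is a hypothetical value chosen for illustration; the other values match the cell below.

```python
import math

n_train_samples = 1000  # hypothetical size, not the actual dataset size
nb_epochs = 300
batch_size = 32
validation_freq = 10

# Each epoch processes the whole training set in mini-batches;
# the last batch of an epoch may be smaller than batch_size.
steps_per_epoch = math.ceil(n_train_samples / batch_size)

# Every mini-batch triggers one weight update.
total_updates = nb_epochs * steps_per_epoch

# With an integer validation_freq, validation runs after every 10th epoch.
validation_epochs = list(range(validation_freq, nb_epochs + 1, validation_freq))

print(steps_per_epoch, total_updates, len(validation_epochs))
```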


In [None]:
nbEpochs  = 300
batchSize = 32

# fit the model to our data
model.fit(
    x = data_points_train,
    y = data_labels_train,
    
    epochs     = nbEpochs,
    batch_size = batchSize,

    validation_data = (data_points_test[:32,], data_labels_test[:32,]),
    validation_freq = 10,

    verbose   = 0,
    callbacks = [Monitoring(x_range=[0, 10000], refresh_steps=10)])

## Evaluating the model
Once the model is trained, we want to quantify its performance on the held-out test set. 
The __tf.keras.Model.evaluate__ function returns a list including the value of the loss function and the values of the chosen metrics (defined previously when calling __tf.keras.Model.compile__) for the test dataset.
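
For intuition about the accuracy value, the sketch below computes a binary accuracy by hand from predicted probabilities. It assumes a decision threshold of 0.5; the helper name and the example values are hypothetical.

```python
import numpy as np

def binary_accuracy(labels, probs, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    predictions = (probs > threshold).astype(int)
    return float(np.mean(predictions == labels))

# Hypothetical model outputs for four samples and their true labels.
probs = np.array([0.9, 0.2, 0.6, 0.4])
labels = np.array([1, 0, 0, 0])

accuracy = binary_accuracy(labels, probs)
print(accuracy)  # 0.75: the third sample is misclassified
```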

In [None]:
# evaluate the model on test data
eval_out = model.evaluate(
    x = data_points_test,
    y = data_labels_test,
    verbose = 0)

print("Accuracy on test set: ", eval_out[1])

To get a better idea of the learned function we can visualize its output on the 2D plane using the function __plot_learned_landscape__.

In [None]:
plot_learned_landscape(model, data_points_train, data_labels_train, data_points_test, data_labels_test)

# Exercises

## Solving the circle (or bullseye) dataset

The circle dataset can be loaded and visualized in the following way:


In [None]:
data_points_train, data_labels_train, data_points_test, data_labels_test = load_data("circle")
plot_2d_dataset(data_points_train, data_labels_train, data_points_test, data_labels_test)

Train a neural network with one hidden layer that can solve this classification problem.  
You can use the following template to write and test your solution, which is very similar to the code above used to train logistic regression:

In [None]:
#-------------------------------------------------------------------------#

# TODO: define the layers of your neural network
# TODO: connect all the layers from an input object to an output object

#-------------------------------------------------------------------------#

# instantiate the full model with the input-output objects
model = tf.keras.Model(inputs, outputs)

# cross-entropy loss between the distribution of ground truth labels and the model predictions
loss = tf.keras.losses.BinaryCrossentropy()

# stochastic gradient descent with momentum
optimizer = tf.keras.optimizers.SGD(
    learning_rate = 0.01,
    momentum      = 0.9)

model.compile(
    optimizer = optimizer,
    loss      = loss,
    metrics   = ["accuracy"])

model.summary()

nbEpochs  = 300
batchSize = 32

# we fit the model to our data
model.fit(
    x = data_points_train,
    y = data_labels_train,
    
    epochs     = nbEpochs,
    batch_size = batchSize,

    validation_data = (data_points_test[:32,], data_labels_test[:32,]),
    validation_freq = 10,

    verbose   = 0,
    callbacks = [Monitoring(x_range=[0, 10000], refresh_steps=10)])

In [None]:
# Visualizing the learned function
plot_learned_landscape(model, data_points_train, data_labels_train, data_points_test, data_labels_test)

## Solving the spiral dataset

Now, attempt to solve the spiral dataset that can be loaded and visualized in the following way: 

In [None]:
data_points_train, data_labels_train, data_points_test, data_labels_test = load_data("spiral")
plot_2d_dataset(data_points_train, data_labels_train, data_points_test, data_labels_test)

Try to find the smallest neural network that gives an accurate solution. Note that you might also have to modify the learning rate of the stochastic gradient descent algorithm and/or increase the total training time (number of epochs) to reach an acceptable solution in a reasonable amount of time. You can use the following code template:

In [None]:
#-------------------------------------------------------------------------#

# TODO: define the layers of your neural network
# TODO: connect all the layers from an input object to an output object

#-------------------------------------------------------------------------#

# instantiate the full model with the input-output objects
model = tf.keras.Model(inputs, outputs)

# cross-entropy loss between the distribution of ground truth labels and the model predictions
loss = tf.keras.losses.BinaryCrossentropy()

# stochastic gradient descent with momentum
optimizer = tf.keras.optimizers.SGD(
    learning_rate = 0.01,
    momentum      = 0.9)

model.compile(
    optimizer = optimizer,
    loss      = loss,
    metrics   = ["accuracy"])

model.summary()

nbEpochs  = 300
batchSize = 32

# we fit the model to our data
model.fit(
    x = data_points_train,
    y = data_labels_train,
    
    epochs     = nbEpochs,
    batch_size = batchSize,

    validation_data = (data_points_test[:32,], data_labels_test[:32,]),
    validation_freq = 10,

    verbose   = 0,
    callbacks = [Monitoring(x_range=[0, 10000], refresh_steps=10)])

plot_learned_landscape(model, data_points_train, data_labels_train, data_points_test, data_labels_test)

## Overfitting

As it is now, the spiral dataset is relatively difficult to overfit, since the classes are well separated and there is a large number of training samples.  
We can simulate a more difficult problem, on which overfitting is more likely, by adding some noise to the data and subsampling the training dataset:

In [None]:
data_points_train, data_labels_train, data_points_test, data_labels_test = load_data("spiral")

data_points_train = data_points_train[::2,:]
data_labels_train = data_labels_train[::2]

data_points_train = data_points_train + np.random.uniform(high=0.8, size=data_points_train.shape)
data_points_test  = data_points_test  + np.random.uniform(high=0.8, size=data_points_test.shape)

plot_2d_dataset(data_points_train, data_labels_train, data_points_test, data_labels_test)
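
To see what the subsampling and noise steps do in isolation, here is a small sketch on a synthetic array. The array contents and the names `points_sub`, `labels_sub` and `noise` are hypothetical. Note that `uniform(high=0.8)` draws from the interval [0, 0.8), so the noise is non-negative and shifts the points on average.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical dataset: 6 samples with 2 features, all placed at the origin.
points = np.zeros((6, 2))
labels = np.array([0, 0, 1, 1, 0, 1])

# A slice with step 2 keeps every second sample (rows 0, 2 and 4).
points_sub = points[::2, :]
labels_sub = labels[::2]

# Uniform noise; the lower bound defaults to 0.0.
noise = rng.uniform(high=0.8, size=points_sub.shape)
noisy_points = points_sub + noise

print(points_sub.shape, labels_sub)
print(noise.min() >= 0.0, noise.max() < 0.8)
```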

The code below will train a relatively large neural network with this dataset.  
Due to the noisy nature of the data and the smaller number of samples, the network will start to overfit.  
Question: How can you diagnose that the model is overfitting from the training and test loss curves?

In [None]:
inputs = tf.keras.Input(shape=(2,))  # the shape must be a tuple

layer1 = tf.keras.layers.Dense(100, activation="relu")
layer2 = tf.keras.layers.Dense(100, activation="relu")
layer3 = tf.keras.layers.Dense(100, activation="relu")

# output layer with sigmoid activation
layerOut = tf.keras.layers.Dense(1, activation="sigmoid") 

# neural network function
outputs = layerOut(layer3(layer2(layer1(inputs))))

# instantiate the full model with the input-output objects
model = tf.keras.Model(inputs, outputs)

# cross-entropy loss between the distribution of ground truth labels and the model predictions
loss = tf.keras.losses.BinaryCrossentropy()

# stochastic gradient descent with momentum
optimizer = tf.keras.optimizers.SGD(
    learning_rate = 0.05,
    momentum      = 0.9)

model.compile(
    optimizer = optimizer,
    loss      = loss,
    metrics   = ["accuracy"])

model.summary()

In [None]:
nbEpochs  = 1000
batchSize = 32

# we fit the model to our data
model.fit(
    x = data_points_train,
    y = data_labels_train,
    
    epochs     = nbEpochs,
    batch_size = batchSize,

    validation_data = (data_points_test[:32,], data_labels_test[:32,]),
    validation_freq = 10,

    verbose   = 0,
    callbacks = [Monitoring(x_range=[0, 10000], refresh_steps=10)])

In [None]:
# Visualizing the learned function
plot_learned_landscape(model, data_points_train, data_labels_train, data_points_test, data_labels_test)