# Smart Distribution Systems (B-KUL-H00P3A) - Exercise session 1
Prof. Geert Deconinck

This exercise session familiarizes students with the basic concepts of machine learning through a first introduction to the most widely used libraries in data science.

The students will learn to:
* set up a machine learning environment, using state-of-the-art tools such as Keras, TensorFlow and Theano in Python;
* implement and train a two-layer neural network using Keras;
* perform the initial data exploration steps of a real-life forecasting problem in power systems.

First, we need to import the libraries necessary during this exercise session.
* Pandas is a library providing intuitive data structures for the analysis of labeled data. You can think of these structures (DataFrames) as tables. You can find the documentation here: http://pandas.pydata.org/pandas-docs/stable/.
* Numpy is the fundamental package for scientific computing in Python. Simply put, Numpy provides MATLAB-like functionality in Python. Numpy documentation: https://docs.scipy.org/doc/numpy-1.14.0/reference/.
* Matplotlib provides a MATLAB-like plotting interface for Python. Docs: https://matplotlib.org/api/pyplot_api.html.

In [1]:
import pandas as pd
import numpy as np
%matplotlib ipympl
import matplotlib.pyplot as plt

The 'seed' of the numpy random generator determines which (pseudo-)random numbers are generated. <br>
To have the same results between runs and between groups, we set the seed of the random generator. More info: https://stackoverflow.com/questions/21494489/what-does-numpy-random-seed0-do.

In [2]:
np.random.seed(seed=1)
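
A minimal sketch of what seeding does: resetting the seed to the same value reproduces the same sequence of draws. The variable names `first` and `second` are illustrative only and do not appear elsewhere in this session.

```python
import numpy as np

np.random.seed(seed=1)
first = np.random.random_sample(size=3)   # three draws after seeding

np.random.seed(seed=1)                    # reset the generator to the same state
second = np.random.random_sample(size=3)  # the same three draws again

print(np.array_equal(first, second))      # True
```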

## 1 Linear (ridge) regression

To get familiar with the basic syntax of Python and the different libraries used in this exercise session, we first solve a very simple linear regression problem.

We start by generating a simple dataset. <br>
The dataset consists of (x,y) pairs. The x-values are random numbers between 0 and 100. The y-values are constructed by adding a small random number (drawn from a normal distribution with mean 0 and standard deviation 3) to the x-value. As a result, if we want to predict the y-value for an unknown x-value, the best guess is simply x. <br>
Info on the different numpy-functions used: https://docs.scipy.org/doc/numpy/reference/routines.random.html.

In [3]:
a = 0
b = 100
x = np.random.random_sample(size=(50, 1)) # draw 50 random numbers from [0, 1.0) and return them as a vector with shape (50, 1)
x = (b-a) * x + a # rescale to the interval [0, 100)
mean = 0
sigma = 3
r = sigma * np.random.randn(50, 1) + mean # draw 50 numbers from N(0,1) and rescale them to mean 'mean' and standard deviation 'sigma'
y = x + r # add the vector of 50 random numbers to the vector of x-values to get the y-values
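
Both rescaling steps are affine transformations of a standard distribution. As a rough check, and not part of the original exercise, the sketch below repeats them on a much larger sample and inspects the resulting range and statistics. The sample size `n` and the names `u` and `r_check` are assumptions made for this illustration.

```python
import numpy as np

np.random.seed(seed=1)
n = 100_000  # large sample size, chosen here only to make the statistics stable

a, b = 0, 100
u = (b - a) * np.random.random_sample(size=(n, 1)) + a  # uniform on [a, b)

mean, sigma = 0, 3
r_check = sigma * np.random.randn(n, 1) + mean  # normal with standard deviation sigma

print(u.min() >= a, u.max() < b)      # all samples fall inside [a, b)
print(r_check.mean(), r_check.std())  # close to mean and sigma, respectively
```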

Using matplotlib, we can easily plot these points on a scatter plot.

In [4]:
plt.figure()
plt.scatter(x, y)
plt.show()

By way of illustration, we can now train a linear regression model to approximate the function y=f(x). <br>
We will use ridge regression. Therefore, we import 'linear_model' from scikit-learn. Scikit-learn is a Python library built on Numpy that offers easy-to-use machine learning functions. <br>
The documentation of scikit-learn is found here: http://scikit-learn.org/stable/. <br>
And the documentation of linear_model.Ridge(): http://scikit-learn.org/stable/modules/linear_model.html#ridge-regression.

In [5]:
from sklearn import linear_model
model = linear_model.Ridge(alpha=0.5)
model.fit(x, y) # the input-matrix should have the shape [n_samples, n_features]. Thus, in this case [50, 1].

Ridge(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)
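
To see what the fit computes, the sketch below solves the same ridge problem in closed form with plain Numpy. It assumes the usual formulation in which the intercept is not penalized, so inputs and targets are centred first. The names `x_c`, `y_c`, `w` and `intercept` are illustrative. On a fitted scikit-learn model, the attributes `coef_` and `intercept_` hold the corresponding values.

```python
import numpy as np

np.random.seed(seed=1)
x = 100 * np.random.random_sample(size=(50, 1))  # inputs in [0, 100)
y = x + 3 * np.random.randn(50, 1)               # targets: x plus noise with standard deviation 3

alpha = 0.5
x_c = x - x.mean(axis=0)  # centre the inputs so the intercept is not penalized
y_c = y - y.mean(axis=0)  # centre the targets

# Closed-form ridge solution: w = (X^T X + alpha * I)^-1 X^T y
w = np.linalg.solve(x_c.T @ x_c + alpha * np.eye(1), x_c.T @ y_c)
intercept = y.mean(axis=0) - x.mean(axis=0) @ w

print(w, intercept)  # the slope should be close to 1
```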

We have now fitted our regression model using the generated samples. To see whether the model approximates the function y=x, we can generate a new set of x-values and use the model to predict the y-values. <br>
In the next cell, generate a vector called x_val containing 20 random numbers in [0, 100).

In [6]:
x_val_size = 20
x_val = 100 * np.random.random_sample(size=(x_val_size, 1))

We can use our linear model to predict the y-values of x_val and again make a scatter plot. The predicted y-values should lie close to the line y=x.

In [7]:
y_val = model.predict(x_val)
plt.figure()
plt.scatter(x_val, y_val)
plt.plot(x_val, y_val, color='red')
plt.show()

In the previous regression problem, there was a clear linear relation between the inputs (x-values) and the targets (y-values). This is obviously not always the case. To illustrate this point, we will now try to approximate a sine function with the same ridge regression method. <br>
In the next cell, generate a vector (call it x_train) with shape (300,1) containing 300 random numbers in the interval [0, 2*Pi). Calculate the sine of all elements in this vector and call the result y_train. <br>
Hint: check the numpy documentation for sin(x).

In [8]:
x_size = 300
x = np.random.random_sample((x_size,1)) # generate a vector with shape (x_size, 1) of random numbers (in the interval [0, 1) )
x_train = 2*np.pi * x # multiply by 2*Pi (to have random numbers in the interval [0, 2*Pi))
y_train = np.sin(x_train) # calculate the sine of every element of x_train

We plot this training set together with the actual sine-function.

In [9]:
# generate values to plot the actual sine-function
x_sin = np.arange(0, 2*np.pi, 0.1) # an array containing the numbers in [0, 2*pi) with a step of 0.1
y_sin = np.sin(x_sin) # the sine of all these numbers
# plot the sine-function as well as a scatter plot of the training set
plt.figure()
plt.scatter(x_train, y_train)
plt.plot(x_sin, y_sin, color='red')
plt.show()

Now that we again have a vector of inputs and a vector of targets, we can fit the linear model to approximate the sine function.

In [10]:
model = linear_model.Ridge(alpha = .5)
model.fit(x_train, y_train)

Ridge(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

Generate a new vector of x-values (between 0 and 2*Pi) containing roughly 50 test-set values. Calculate both their true y-values and their predictions. Draw one scatter plot including both the targets and the predictions, in red and blue respectively. <br>
Call the x-values: 'x_test'. The true y-values and the prediction: 'y_test' and 'y_pred', respectively.

In [11]:
x_test = 2*np.pi * np.random.random_sample((1, 40)).reshape((-1, 1))
y_test = np.sin(x_test)

In [12]:
y_pred = model.predict(x_test)
plt.figure()
plt.scatter(x_test, y_pred)
plt.scatter(x_test, y_test, color='red')
plt.plot(x_sin, y_sin, color='green', alpha=0.3)
plt.show()

Why do the predictions not represent the sine function?

It is clear that we need other regression techniques to be able to approximate a broader range of functions.
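
The limitation can be quantified with a small sketch. It uses ordinary least squares via `np.polyfit` rather than ridge regression, which is a simplification made for this illustration, and fits the best straight line to the sine on a dense grid. The error that remains is a sizeable fraction of the variance of the sine itself (about 0.5), and no choice of slope and intercept can lower it.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 1000, endpoint=False)  # dense grid on [0, 2*pi)
y = np.sin(x)

slope, intercept = np.polyfit(x, y, deg=1)  # ordinary least-squares straight line
y_line = slope * x + intercept

mse_line = np.mean((y - y_line) ** 2)  # error left over after the best linear fit
print(slope, intercept, mse_line)
print(np.var(y))                       # variance of the sine, for comparison
```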

## 2 Non-linear regression

Since many real-life problems require approximating a non-linear function, we need methods that can do this.

### 2.1 Kernel ridge-regression

Different techniques exist. We will first extend linear ridge regression to a non-linear variant, using the so-called 'kernel trick' (https://towardsdatascience.com/understanding-the-kernel-trick-e0bc6112ef78). Here, it suffices to know the basic idea behind kernel functions. A kernel function implicitly maps the input space to a higher-dimensional space. In this space, our non-linear function might be linear, so we can solve a linear regression problem there. Afterwards, we transform our solution back to the original space. Simply put, we are thus able to approximate a non-linear function in our input space by fitting a linear function in a higher-dimensional space. <br>
In the next piece of code we will use the same inputs and targets (of the sine-function) but now perform ridge regression with a radial basis kernel-function (rbf). More info about radial basis functions can be found here: https://en.wikipedia.org/wiki/Radial_basis_function. <br>
Thereafter, we use the same x_test and y_test you generated during the linear ridge regression to estimate y_pred. We plot both the linear regression solution and the new solution.

In [13]:
from sklearn import kernel_ridge
model = kernel_ridge.KernelRidge(alpha=3.5, kernel='rbf', gamma=0.1)
model.fit(x_train, y_train)
y_pred_kernel = model.predict(x_test)
plt.figure()
plt.subplot(211)
plt.scatter(x_test, y_pred) # the predictions made using ridge regression
plt.scatter(x_test, y_test, color='red') # the ground truth
plt.plot(x_sin, y_sin, color='green', alpha=0.3) # the sine-function, for reference
plt.subplot(212)
plt.scatter(x_test, y_pred_kernel) # the predictions made by kernel ridge regression
plt.scatter(x_test, y_test, color='red')
plt.plot(x_sin, y_sin, color='green', alpha=0.3)
plt.show()
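
For intuition, kernel ridge regression with an RBF kernel can be sketched in a few lines of Numpy. This is a sketch under stated assumptions, not the scikit-learn implementation: it assumes the standard dual formulation, in which the coefficients solve (K + alpha*I) c = y and a prediction is a kernel-weighted sum over the training points. The helper `rbf_kernel` and the names `x_tr`, `y_tr`, `x_te`, `dual_coef` and `y_hat` are illustrative.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Return the matrix K[i, j] = exp(-gamma * ||a[i] - b[j]||^2)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dist)

np.random.seed(seed=1)
x_tr = 2 * np.pi * np.random.random_sample((300, 1))  # training inputs in [0, 2*pi)
y_tr = np.sin(x_tr)
x_te = 2 * np.pi * np.random.random_sample((40, 1))   # test inputs

alpha, gamma = 3.5, 0.1
K = rbf_kernel(x_tr, x_tr, gamma)  # (300, 300) similarities between training points
dual_coef = np.linalg.solve(K + alpha * np.eye(len(x_tr)), y_tr)  # solve (K + alpha*I) c = y
y_hat = rbf_kernel(x_te, x_tr, gamma) @ dual_coef  # kernel-weighted sum per test point

print(np.mean((y_hat - np.sin(x_te)) ** 2))  # test MSE of this sketch
```

Changing `alpha` and `gamma` in this sketch is a quick way to see how regularization strength and kernel width affect the fit.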

How do the results compare?

### 2.2 Neural networks

Nowadays, neural networks (NNs) are widely used for regression purposes. Neural networks have already been explained in the introduction slides. In this exercise session we will be using Keras, a high-level neural network API in Python. Keras can use different backends; here we use the 'tensorflow' backend. Keras makes dealing with NNs easy. You can find the Keras documentation here: https://keras.io/.

In [14]:
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import RMSprop, SGD

  from ._conv import register_converters as _register_converters
Using TensorFlow backend.


Make sure to read the Keras docs dealing with the Sequential model (https://keras.io/getting-started/sequential-model-guide/). The Sequential model is the most fundamental part of Keras and allows you to connect the layers of a NN one after another. Afterwards, the model can be compiled and trained. Keras implements all the low-level functionality needed for this. Next, we create a sequential model and add one hidden layer and one output layer. We use a 'sigmoid' activation function for the hidden layer and a linear activation function for the output layer. To obtain a non-linear regression model, we need at least one non-linear activation function. Since our model requires only one output, the output layer consists of one neuron. We arbitrarily decide to use 12 hidden neurons. Furthermore, we use the RMSprop optimizer and the mean squared error as loss function. <br>
The number of hidden neurons and the activation function can greatly affect the performance of our model. For now, this model suffices, but make sure to experiment with different NN architectures once you try to solve a more complex regression problem. <br>
* More info on activation functions: https://en.wikipedia.org/wiki/Activation_function, https://keras.io/activations/ <br>
* Good practices with regard to NN architecture: https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw <br>
* More info on why we need an optimizer and the different optimizers available: https://towardsdatascience.com/types-of-optimization-algorithms-used-in-neural-networks-and-ways-to-optimize-gradient-95ae5d39529f, https://keras.io/optimizers/
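
As an illustration of the architecture described above, the forward pass of a network with one sigmoid hidden layer of 12 neurons and a single linear output can be written directly in Numpy. This is only a sketch: the weights are drawn at random here, whereas Keras uses its own initializers and then trains the weights. The names `W1`, `b1`, `W2`, `b2` and `forward` are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(seed=1)
n_hidden = 12

# Randomly initialized parameters (illustrative; Keras uses its own initializers)
W1 = np.random.randn(1, n_hidden)   # input -> hidden weights
b1 = np.zeros(n_hidden)             # hidden biases
W2 = np.random.randn(n_hidden, 1)   # hidden -> output weights
b2 = np.zeros(1)                    # output bias

def forward(x):
    hidden = sigmoid(x @ W1 + b1)   # non-linear hidden layer, shape (n_samples, 12)
    return hidden @ W2 + b2         # linear output layer, shape (n_samples, 1)

x_demo = 2 * np.pi * np.random.random_sample((5, 1))
print(forward(x_demo).shape)                   # (5, 1)
print(W1.size + b1.size + W2.size + b2.size)   # 37 trainable parameters
```

Without the sigmoid, the two layers would collapse into a single affine function of x, which is why at least one non-linear activation is needed.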

In [15]:
neurons = [12, 1]
activation_functions = ['sigmoid', 'linear']

model = Sequential()
model.add(Dense(neurons[0], input_dim=1, activation=activation_functions[0]))
model.add(Dense(neurons[1], activation=activation_functions[1]))
    
rprop = RMSprop(lr=0.001, rho=0.9, epsilon=1e-6)
model.compile(loss='mean_squared_error', optimizer=rprop)

We can now fit this model to our earlier defined training set (x_train, y_train). <br> 
Running the next piece of code can take a while (wait until the MSE is printed).

In [16]:
output_training = model.fit(x_train, y_train, epochs=700, batch_size=6, verbose=0) # train the model
mse = output_training.history['loss'][-1] # from the 'history' of the error during training, take the last element (-1)
print('- mse is %.4f' % mse + ' @ ' + str(len(output_training.history['loss']))) # print the final MSE and the epoch at which it occurred

- mse is 0.0259 @ 700
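
The training time follows from the number of weight updates. A quick back-of-the-envelope calculation, assuming the 300 training samples generated by the code above (`x_size`) and one update per mini-batch:

```python
import math

n_samples = 300   # x_size in the cell that generates x_train
batch_size = 6
epochs = 700

updates_per_epoch = math.ceil(n_samples / batch_size)  # one weight update per mini-batch
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, total_updates)  # 50 35000
```

A larger batch size reduces the number of updates per epoch and usually shortens the run, at the cost of fewer weight updates overall.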


Again, we can plot the predictions and the actual y_test values.

In [17]:
# predict the test set y-values
y_pred = model.predict(x_test)
plt.figure()
plt.scatter(x_test, y_pred, alpha=0.7) # plot the y-values predicted by the NN
plt.scatter(x_test, y_test, color='red', alpha=0.4) # plot the actual y-values, a bit transparent
plt.plot(x_sin, y_sin, color='green', alpha=0.3) # plot the sine-function
plt.show()

This plot gives a pretty good idea about the performance of the NN. <br>
Feel free to experiment with different NN architectures, optimization algorithms and activation functions by re-running the previous three cells. <br>
A clear performance measure can give a better indication than a graph. Implement the mean_squared_error function in the next cell. This function should return the mean squared error between the vectors y1 and y2. You can assume the vectors have the shape (n_samples, n_features). <br>
Check the numpy docs!

In [18]:
def mean_squared_error(y1, y2):
    return np.mean(np.power(y1-y2, 2), axis=0)

Use this function to calculate the MSE between y_test and y_pred. <br>
Print it, and check whether it deviates from the error on the training set (printed after training). Why?

In [19]:
mse = mean_squared_error(y_test, y_pred)
print(mse)

[0.03089411]
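
The result is printed as a one-element array because the mean is taken along axis 0, that is, over the samples and separately for every feature column. A small worked example with hand-picked values (`y_true` and `y_hat` are made up for this illustration):

```python
import numpy as np

def mean_squared_error(y1, y2):
    return np.mean(np.power(y1 - y2, 2), axis=0)

y_true = np.array([[1.0], [2.0], [3.0]])   # shape (3, 1): three samples, one feature
y_hat = np.array([[1.0], [2.0], [5.0]])    # only the last prediction is off, by 2

print(mean_squared_error(y_true, y_hat))         # [1.33333333]
print(mean_squared_error(y_true, y_hat).shape)   # (1,)
```

The squared errors are 0, 0 and 4, so the mean over the three samples is 4/3.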


Of course, in real life, data points will almost never exactly match the underlying function: there will always be some noise in the measurements, e.g. due to sensor noise. <br>
Let's make it a bit more challenging for the NN and add noise to the y_train values. As a result, the tuples (x,y) no longer exactly match the sine function. We can show this by plotting the sine and the training set. <br>
You can play with the value of the standard deviation (sigma) and see the result on the accuracy of the prediction (of the test-set).

In [20]:
sigma = 0.1
y_train_noise = y_train + sigma * np.random.randn(y_train.shape[0], y_train.shape[1])
plt.figure()
plt.plot(x_sin, y_sin, color='red')
plt.scatter(x_train, y_train_noise, alpha=0.5)
plt.show()
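
Noise with standard deviation sigma puts a floor of roughly sigma squared on the MSE measured against the noisy targets: even the true sine function has that error with respect to them. The sketch below checks this numerically on a large sample. The sample size and the names `y_clean`, `y_noisy` and `noise_floor` are chosen for this illustration only.

```python
import numpy as np

np.random.seed(seed=1)
sigma = 0.1
x = 2 * np.pi * np.random.random_sample((100_000, 1))  # many points, for a stable estimate
y_clean = np.sin(x)
y_noisy = y_clean + sigma * np.random.randn(*y_clean.shape)

# MSE of the true function measured against the noisy targets
noise_floor = np.mean((y_noisy - y_clean) ** 2)
print(noise_floor, sigma ** 2)  # both close to 0.01
```

A training loss well below this value would suggest that the model is fitting the noise rather than the underlying function.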

In the next cell, recompile the NN, fit the NN to the new (!) training set and calculate the MSE on the test set. Again plot the test set ground truth and predictions. Add the actual sine as a reference.

In [22]:
model.compile(loss='mean_squared_error', optimizer=rprop)
output_training = model.fit(x_train, y_train_noise, epochs=700, batch_size=6, verbose=0) # train the model
mse = output_training.history['loss'][-1] # from the 'history' of the error during training, take the last element (-1)
print('- mse is %.4f' % mse + ' @ ' + str(len(output_training.history['loss']))) # print the final MSE and the epoch at which it occurred
y_pred = model.predict(x_test)
plt.figure()
plt.scatter(x_test, y_pred, alpha=0.7) # plot the y-values predicted by the NN
plt.scatter(x_test, y_test, color='red', alpha=0.4) # plot the actual y-values, a bit transparent
plt.plot(x_sin, y_sin, color='green', alpha=0.3) # plot the sine-function
plt.show()
mse = mean_squared_error(y_test, y_pred)
print(mse)

- mse is 0.0100 @ 700


[0.00292668]


What do you think, is the NN handling the noise well?

Neural networks are known to perform supervised learning (https://en.wikipedia.org/wiki/Supervised_learning) pretty well, as long as they have enough examples. <br>
Next, select 10% of the points from the training set (x_train, y_train_noise). Make sure to select tuples, and not x and y independently. Run the previous cell again, with this new (smaller) training set. <br>
Is the result as good/bad as you expected?

In [23]:
select_size = int(0.1 * x_size)
idxs = np.random.choice(x_size, size=select_size, replace=False) # draw distinct indices, so that no point is selected twice
x_train = x_train[idxs]
y_train_noise = y_train_noise[idxs]
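
When selecting a subset, two details matter: whether indices are drawn with or without replacement, and whether x and y are indexed with the same index array. The sketch below illustrates both on stand-in data. The arrays `x_all` and `y_all` and the index names are hypothetical and are not the training set of this session.

```python
import numpy as np

np.random.seed(seed=1)
n_points = 300
subset_size = int(0.1 * n_points)

x_all = np.arange(n_points).reshape(-1, 1)   # stand-in inputs
y_all = 2 * x_all                            # stand-in targets, paired with x_all

with_repl = np.random.randint(0, n_points, size=subset_size)   # duplicates are possible
without_repl = np.random.permutation(n_points)[:subset_size]   # all indices are distinct

x_sub, y_sub = x_all[without_repl], y_all[without_repl]  # same indices keep the pairs aligned

print(len(np.unique(with_repl)), len(np.unique(without_repl)))  # number of distinct indices
print(np.array_equal(y_sub, 2 * x_sub))  # True: every x is still paired with its own y
```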

Next, we train a NN on a small training set containing a few outliers and plot the training loss and the validation loss as a function of the number of epochs.

In [24]:
sigma = 0.1
train_size = 20 
val_size = 10
nb_outliers = 3
x_train = 2*np.pi * np.random.random_sample((train_size, 1))
y_train = np.sin(x_train) #+ sigma * np.random.randn(train_size, 1)
x_out = 3*np.pi/2 + 0.1 * np.random.randn(nb_outliers, 1)
y_out = 0.1 + 0.1 * np.random.randn(nb_outliers, 1)
x_val = 2*np.pi * np.random.random_sample((val_size, 1))
y_val = np.sin(x_val)
plt.figure()
plt.scatter(x_train, y_train)
plt.scatter(x_out, y_out, color='red')
plt.scatter(x_val, y_val, color='brown')
plt.plot(x_sin, y_sin, color='green', alpha=0.3)
plt.show()
x_train = np.concatenate((x_train, x_out))
y_train = np.concatenate((y_train, y_out))

In [25]:
neurons = [20, 20, 1]
activation_functions = ['sigmoid', 'sigmoid', 'linear']

model = Sequential()
model.add(Dense(neurons[0], input_dim=1, activation=activation_functions[0]))
model.add(Dense(neurons[1], activation=activation_functions[1]))
model.add(Dense(neurons[2], activation=activation_functions[2]))
    
#optim = RMSprop(lr=0.001, rho=0.9, epsilon=1e-6)
optim = SGD()
model.compile(loss='mean_squared_error', optimizer=optim)

In [26]:
output_training = model.fit(x_train, y_train, epochs=10000, batch_size=1, verbose=0, validation_data=(x_val,y_val)) # train the model
mse = output_training.history['loss'][-1] # from the 'history' of the error during training, take the last element (-1)
print('- mse is %.4f' % mse + ' @ ' + str(len(output_training.history['loss']))) # print the final MSE and the epoch at which it occurred

- mse is 0.1084 @ 10000


In [27]:
y_pred = model.predict(x_train)
plt.figure()
plt.scatter(x_train, y_pred, alpha=0.7) # plot the y-values predicted by the NN
plt.scatter(x_train, y_train, color='red', alpha=0.4) # plot the actual y-values, a bit transparent
plt.plot(x_sin, y_sin, color='green', alpha=0.3) # plot the sine-function
plt.show()

In [28]:
plt.figure()
plt.plot(output_training.history['loss'])
plt.plot(output_training.history['val_loss'], color='red')
plt.show()
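
When reading such curves, a useful quantity is the epoch at which the validation loss is lowest. If the training loss keeps falling after that point while the validation loss rises, the network is likely fitting the training points (outliers included) rather than the underlying function. The sketch below uses made-up loss values that stand in for `output_training.history['loss']` and `output_training.history['val_loss']`. They are not results from this notebook.

```python
import numpy as np

# Hypothetical loss values per epoch, standing in for
# output_training.history['loss'] and output_training.history['val_loss']
train_loss = np.array([0.50, 0.30, 0.20, 0.15, 0.12, 0.10, 0.09])
val_loss = np.array([0.52, 0.33, 0.24, 0.22, 0.23, 0.26, 0.30])

best_epoch = int(np.argmin(val_loss))  # zero-based index of the lowest validation loss
print(best_epoch, val_loss[best_epoch])  # 3 0.22
```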