# Smart Distribution Systems (B-KUL-H00P3A) - Exercise session 1
Prof. Geert Deconinck

This exercise session familiarizes students with the basic concepts of machine learning through a first introduction to the most widely used libraries in data science.

The students will learn to:
* set up a machine learning environment, using state-of-the-art tools such as Keras, TensorFlow and Theano in Python;
* implement and train a two-layer neural network using Keras;
* perform the initial data exploration steps of a real-life forecasting problem in power systems.

First, we need to import the libraries necessary during this exercise session.
* Pandas is a library providing intuitive data structures for the analysis of labeled data. You can think of these structures (DataFrames) as tables. Documentation: http://pandas.pydata.org/pandas-docs/stable/.
* NumPy is the fundamental package for scientific computing in Python. Simply put, NumPy provides MATLAB-like functionality in Python. Documentation: https://docs.scipy.org/doc/numpy-1.14.0/reference/.
* Matplotlib provides a MATLAB-like plotting interface for Python. Documentation: https://matplotlib.org/api/pyplot_api.html.

In [7]:
import pandas as pd
import numpy as np
%matplotlib ipympl
import matplotlib.pyplot as plt

The 'seed' of the NumPy random generator determines which (pseudo-)random numbers are generated. <br>
To have the same results between runs and between groups, we set the seed of the random generator. More info: https://stackoverflow.com/questions/21494489/what-does-numpy-random-seed0-do.

In [235]:
np.random.seed(seed=1)
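
The effect of the seed can be checked directly. The following is a minimal sketch (the variable names are illustrative and not used elsewhere in this session): re-seeding with the same value reproduces the same draws, while a different seed gives a different sequence.

```python
import numpy as np

np.random.seed(seed=1)
first_draw = np.random.random_sample(size=3)

np.random.seed(seed=1)   # reset the generator to the same state
second_draw = np.random.random_sample(size=3)

np.random.seed(seed=2)   # a different seed gives a different sequence
third_draw = np.random.random_sample(size=3)

print(np.array_equal(first_draw, second_draw))   # same seed, same numbers
print(np.array_equal(first_draw, third_draw))    # different seed, different numbers

np.random.seed(seed=1)   # restore the seed used in the rest of this session
```

The last line puts the generator back in the state expected by the cells that follow.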

## 1 Linear regression

To get familiar with the basic syntax of Python and the different libraries used in this exercise session, first a really simple linear regression problem is solved. 

We start by generating a simple dataset. <br>
The dataset consists of (x,y) pairs. The x-values are random numbers between 0 and 100. The y-values are constructed by adding a small random number (drawn from a normal distribution with mean 0 and standard deviation 3) to the x-value. As a result, if we want to predict the y-value for an unknown x-value, the best guess is simply to predict x. <br>
Info on the different numpy-functions used: https://docs.scipy.org/doc/numpy/reference/routines.random.html.

In [236]:
a = 0
b = 100
x = np.random.random_sample(size=(50, 1)) # draw 50 random numbers from [0, 1.0) and return them as a vector with shape (50, 1)
x = (b-a) * x + a # rescale to the interval [0, 100)
mean = 0
sigma = 3
r = sigma * np.random.randn(50, 1) + mean # draw 50 numbers from N(0, 1) and rescale them to mean 'mean' and standard deviation 'sigma'
y = x + r # add the vector of 50 random numbers to the vector of x-values to get the y-values
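
As a quick sanity check on a dataset built this way, one can fit an ordinary least-squares line and look at the spread of the noise. The sketch below is only an illustration: it rebuilds a similar dataset with a local generator (`rng`, `x_demo` and `y_demo` are hypothetical names), so its numbers will not match the cell above exactly.

```python
import numpy as np

rng = np.random.RandomState(1)                 # local generator, leaves the global seed untouched
x_demo = 100 * rng.random_sample(size=(50, 1))
y_demo = x_demo + 3 * rng.randn(50, 1)

# np.polyfit expects 1-D arrays; with degree 1 it returns [slope, intercept]
slope, intercept = np.polyfit(x_demo.ravel(), y_demo.ravel(), 1)
noise_std = np.std(y_demo - x_demo)            # empirical spread of the added noise

print('slope: %.3f, intercept: %.3f, noise std: %.3f' % (slope, intercept, noise_std))
```

A slope close to 1, an intercept close to 0 and a noise spread close to 3 are what the construction of the dataset suggests.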

Using matplotlib, we can easily plot these points on a scatter plot.

In [237]:
plt.figure()
plt.scatter(x, y)
plt.show()

By way of illustration, we can now train a linear regression model to approximate the function y=f(x). <br>
We will use ridge regression. Therefore, we import 'linear_model' from scikit-learn. Scikit-learn is a Python library built on NumPy that offers easy-to-use machine learning functions. <br>
The documentation of scikit-learn is found here: http://scikit-learn.org/stable/. <br>
And the documentation of linear_model.Ridge(): http://scikit-learn.org/stable/modules/linear_model.html#ridge-regression.

In [238]:
from sklearn import linear_model
model = linear_model.Ridge(alpha=0.5)
model.fit(x, y) # the input-matrix should have the shape [n_samples, n_features]. Thus, in this case [50, 1].

Ridge(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)
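
To see what ridge regression computes, the sketch below solves the textbook closed-form problem in plain NumPy: centre the data, then solve (XᵀX + alpha·I)w = Xᵀy. This is an illustration under stated assumptions, not scikit-learn's actual solver; the data and the names `X`, `y_obs`, `w` and `b` are hypothetical. The result is expected to be close to the fitted `coef_` and `intercept_` of a Ridge model on the same data, but that comparison is not made here.

```python
import numpy as np

rng = np.random.RandomState(1)                 # local generator, illustrative data
X = 100 * rng.random_sample(size=(50, 1))
y_obs = X + 3 * rng.randn(50, 1)
alpha = 0.5

# Centre inputs and targets so that the intercept is not penalised
X_mean, y_mean = X.mean(axis=0), y_obs.mean(axis=0)
Xc, yc = X - X_mean, y_obs - y_mean

# Solve (Xc^T Xc + alpha * I) w = Xc^T yc for the weight vector w
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
b = y_mean - X_mean @ w                        # intercept recovered from the means

print('slope: %.4f, intercept: %.4f' % (w[0, 0], b[0]))
```

The penalty term `alpha` shrinks the slope slightly towards zero compared with ordinary least squares. For this dataset the effect is tiny, because the sum of squared centred inputs is far larger than 0.5.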

We have now fitted our regression model, called 'model', using our generated samples. To see whether the model approximates the function y=x, we can generate a new set of x-values and use the model to predict the y-values. <br>
In the next cell, generate a vector called x_val containing 20 random numbers in [0, 100).

In [239]:
x_val_size = 20
x_val = 100 * np.random.random_sample(size=(x_val_size, 1)) # draw 20 random numbers from [0, 100) without overwriting x

We can use our linear model to predict the y-values of x_val and again make a scatter plot. The predicted y-values should lie close to the line y=x.

In [240]:
y_val = model.predict(x_val)
plt.figure()
plt.scatter(x_val, y_val)
plt.plot(x_val, y_val, color='red')
plt.show()

In the previous regression problem, there was a clear linear relation between the inputs (x-values) and the targets (y-values). This is obviously not always the case. To illustrate this point, we will now try to approximate a sine function with the same ridge regression method. <br>
In the next cell, generate a vector (call it x_train) with shape (500,1) containing 500 sorted random numbers in the interval [0, 2*Pi). Calculate the sine of all elements in this vector and call this y_train. <br>
Hint: check the numpy documentation for sorting and sin(x).

In [8]:
x_size = 500
x = np.random.random_sample((x_size,1)) # generate a vector with shape (500, 1) of random numbers in the interval [0, 1)
x = 2*np.pi * x # multiply by 2*Pi (to have random numbers in the interval [0, 2*Pi))
x_train = np.sort(x, axis=0) # sort the vector along its first axis
y_train = np.sin(x_train) # calculate the sine of the sorted vector
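
The `axis=0` argument matters for column vectors. By default `np.sort` sorts along the last axis, and in an array of shape (n, 1) every row holds a single element, so nothing moves. A small illustration (the array `col` is a made-up example):

```python
import numpy as np

col = np.array([[3.0], [1.0], [2.0]])     # column vector with shape (3, 1)

sorted_last_axis = np.sort(col)           # default axis=-1: each row has one element, order unchanged
sorted_rows = np.sort(col, axis=0)        # sort down the column

print(sorted_last_axis.ravel())           # [3. 1. 2.]
print(sorted_rows.ravel())                # [1. 2. 3.]
```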

Plot the x- and y-values using matplotlib. Use plot() and not scatter().

In [9]:
plt.figure()
plt.plot(x_train, y_train)
plt.show()

Now that we have again a vector of inputs and targets for our linear model, we can fit it to approximate the sine function.

In [243]:
model = linear_model.Ridge(alpha = .5)
model.fit(x_train, y_train)

Ridge(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

Generate a new vector of x-values (between 0 and 2*Pi). Calculate both their true values and their predictions. Draw one scatter plot including both the targets and the predictions, in red and blue respectively. <br>
Call the x-values: 'x_test'. The true y-values and the prediction: 'y_test' and 'y_pred', respectively.

In [10]:
x_test = 2*np.pi * np.random.random_sample((1, 40)).reshape((-1, 1))
y_test = np.sin(x_test)

In [11]:
y_pred = model.predict(x_test)
plt.figure()
plt.scatter(x_test, y_pred)
plt.scatter(x_test, y_test, color='red')
plt.show()

Why do the predictions not represent the sine function?

It is clear that we need other regression techniques to be able to approximate a broader range of functions.
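
One way to quantify this limitation is to compare the error of the best straight line with the error of simply predicting the mean. The sketch below uses ordinary least squares via `np.polyfit` rather than ridge regression, and its own illustrative data (`x_s`, `y_s`), so it only approximates what the model above does.

```python
import numpy as np

rng = np.random.RandomState(1)                       # local generator, illustrative data
x_s = np.sort(2 * np.pi * rng.random_sample(size=500))
y_s = np.sin(x_s)

slope, intercept = np.polyfit(x_s, y_s, 1)           # best straight line in the least-squares sense
y_line = slope * x_s + intercept

mse_line = np.mean((y_s - y_line) ** 2)
mse_mean = np.mean((y_s - y_s.mean()) ** 2)          # baseline: always predict the mean

print('MSE straight line: %.3f, MSE constant: %.3f' % (mse_line, mse_mean))
```

A straight line can only capture the overall downward trend of the sine over one period, so a substantial part of the error remains however the slope and intercept are chosen.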

## 2 Non-linear regression

Since many real-life problems require approximating a non-linear function, we need methods to do this.

### 2.1 Ridge-regression

Different techniques exist. We will first extend linear ridge regression to a non-linear variant, using the so-called 'kernel trick' (https://towardsdatascience.com/understanding-the-kernel-trick-e0bc6112ef78). Here, it suffices to know the basic idea behind kernel functions. A kernel function implicitly maps the input space to a higher-dimensional space. In this space, our non-linear function might be linear, so we can solve a linear regression problem there. Afterwards, we transform our solution back to the original space. Simply put, we are thus able to approximate a non-linear function in our input space by approximating a linear function in a higher-dimensional space. <br>
In the next piece of code we will use the same inputs and targets (of the sine function) but now perform ridge regression with a radial basis function (rbf) kernel. More info about radial basis functions can be found here: https://en.wikipedia.org/wiki/Radial_basis_function. <br>
Thereafter, we use the same x_test and y_test you generated during the linear ridge regression to estimate y_pred_kernel and compare it with y_pred.

In [250]:
from sklearn import kernel_ridge
model = kernel_ridge.KernelRidge(alpha=3.5, kernel='rbf', gamma=0.1)
model.fit(x_train, y_train)
y_pred_kernel = model.predict(x_test)
plt.figure()
plt.subplot(211)
plt.scatter(x_test, y_pred)
plt.scatter(x_test, y_test, color='red')
plt.subplot(212)
plt.scatter(x_test, y_pred_kernel)
plt.scatter(x_test, y_test, color='red')
plt.show()

How do the results compare?
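
For readers who want to see the mechanics, kernel ridge regression can be sketched in a few lines of NumPy. Build the kernel matrix K on the training inputs, solve (K + alpha·I)c = y for the dual coefficients c, and predict with the kernel between new and training inputs. This is the textbook formulation, not necessarily what scikit-learn does internally; the helper `rbf_kernel` and all variable names below are written for this example only.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Pairwise RBF kernel exp(-gamma * ||a_i - b_j||^2) for samples stored row-wise."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

rng = np.random.RandomState(1)                            # local generator, illustrative data
x_fit = np.sort(2 * np.pi * rng.random_sample(size=(500, 1)), axis=0)
y_fit = np.sin(x_fit)
x_new = 2 * np.pi * rng.random_sample(size=(40, 1))

alpha, gamma = 3.5, 0.1
K = rbf_kernel(x_fit, x_fit, gamma)                       # shape (500, 500)
dual_coef = np.linalg.solve(K + alpha * np.eye(len(x_fit)), y_fit)
y_new = rbf_kernel(x_new, x_fit, gamma) @ dual_coef       # shape (40, 1)

mse_kernel = np.mean((np.sin(x_new) - y_new) ** 2)
print('MSE kernel ridge sketch: %.4f' % mse_kernel)
```

Note that the higher-dimensional feature space never appears explicitly: only kernel values between pairs of inputs are needed.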

### 2.2 Neural networks

Nowadays, neural networks (NNs) are widely used for regression. Neural networks have already been explained in the introduction slides. In this exercise session we will be using Keras, a high-level neural network API in Python. Keras can use different backends; here we use the 'tensorflow' backend. Keras makes dealing with NNs easy. You can find the Keras documentation here: https://keras.io/.

In [12]:
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import RMSprop

Make sure to read the Keras docs dealing with the Sequential model (https://keras.io/getting-started/sequential-model-guide/). The sequential model is the most fundamental part of Keras and allows you to connect the different layers of a NN sequentially. Thereafter, this model can be compiled and trained. All these low-level functionalities have been implemented in Keras. Next, we create a sequential model and add one hidden layer and one output layer. We use a 'relu' activation function for the hidden layer, and a linear activation function for the output layer. In order to have a non-linear regression model, we need at least one non-linear activation function. Since our model only requires one output, our output layer consists of one neuron. We arbitrarily decide to have 12 hidden neurons. Furthermore, we are going to use the RMSprop optimizer and a mean squared error loss. <br>
The number of hidden neurons and the activation function can greatly affect the performance of our model. For now, this model suffices, but make sure to experiment with different NN architectures once you try to solve a more complex regression problem. <br>
* More info on activation functions: https://en.wikipedia.org/wiki/Activation_function, https://keras.io/activations/ <br>
* Good practices with regards to NN architecture: https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw <br>
* More info on why we need an optimizer and the different optimizers available: https://towardsdatascience.com/types-of-optimization-algorithms-used-in-neural-networks-and-ways-to-optimize-gradient-95ae5d39529f, https://keras.io/optimizers/

In [13]:
neurons = [12, 1]
activation_functions = ['relu', 'linear']

model = Sequential()
model.add(Dense(neurons[0], input_dim=1, activation=activation_functions[0]))
model.add(Dense(neurons[1], activation=activation_functions[1]))
    
rprop = RMSprop(lr=0.001, rho=0.9, epsilon=1e-6)
model.compile(loss='mean_squared_error', optimizer=rprop)
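
To make the architecture concrete, the sketch below writes out the forward pass of a 1-12-1 network in plain NumPy, with random (untrained) parameters. It does not use Keras and all names are illustrative. It also shows why a non-linear activation is needed: with an identity activation in the hidden layer, the network collapses to a straight line, which shows up as vanishing second differences on an equally spaced grid.

```python
import numpy as np

rng = np.random.RandomState(0)

# Randomly initialised parameters of a 1-12-1 network (illustrative values, not trained)
W1, b1 = rng.randn(1, 12), rng.randn(12)    # hidden layer: 12 weights + 12 biases
W2, b2 = rng.randn(12, 1), rng.randn(1)     # output layer: 12 weights + 1 bias

def relu(z):
    return np.maximum(z, 0.0)

def identity(z):
    return z

def forward(x, hidden_activation):
    hidden = hidden_activation(x @ W1 + b1)  # shape (n_samples, 12)
    return hidden @ W2 + b2                  # linear output, shape (n_samples, 1)

x_grid = np.linspace(0, 2 * np.pi, 5).reshape(-1, 1)   # equally spaced inputs
y_relu = forward(x_grid, relu)
y_linear = forward(x_grid, identity)

n_params = W1.size + b1.size + W2.size + b2.size
print('trainable parameters:', n_params)
# Second differences are zero when the outputs lie on a straight line
print('identity hidden layer:', np.diff(y_linear, n=2, axis=0).ravel())
print('relu hidden layer:    ', np.diff(y_relu, n=2, axis=0).ravel())
```

The parameter count is 12 + 12 for the hidden layer plus 12 + 1 for the output layer, 37 in total.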

We can now fit this model to our earlier defined training set (x_train, y_train).

In [14]:
output_training = model.fit(x_train, y_train, epochs=500, batch_size=32, verbose=0) # train the model
mse = output_training.history['loss'][-1] # from the 'history' of the error during training, take the last element (-1)
print('- mse is %.4f' % mse + ' @ ' + str(len(output_training.history['loss']))) # print the final MSE and the number of epochs after which it was reached

- mse is 0.1464 @ 500
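
To get a feeling for what the optimizer does during training, the sketch below applies the commonly described RMSprop update rule to a one-dimensional toy problem. It is an illustration only: the function `rmsprop_step`, the toy loss and the larger learning rate of 0.01 are assumptions made for this example, and the Keras implementation may differ in its details.

```python
import numpy as np

def rmsprop_step(w, grad, avg_sq, lr=0.001, rho=0.9, epsilon=1e-6):
    """One RMSprop update; returns the new parameter and the new running average."""
    avg_sq = rho * avg_sq + (1.0 - rho) * grad ** 2      # running average of squared gradients
    w = w - lr * grad / (np.sqrt(avg_sq) + epsilon)      # step scaled by the recent gradient magnitude
    return w, avg_sq

# Toy problem (an assumption for illustration): minimise f(w) = (w - 3)^2
w, avg_sq = 0.0, 0.0
for _ in range(1000):
    grad = 2.0 * (w - 3.0)                               # derivative of the toy loss
    w, avg_sq = rmsprop_step(w, grad, avg_sq, lr=0.01)

print('w after 1000 steps: %.3f' % w)
```

Because each step is divided by the recent gradient magnitude, the step size stays roughly proportional to the learning rate, regardless of how steep the loss is. This also means the parameter keeps moving by small amounts around the minimum rather than settling exactly on it.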
