# Deep Learning - **Homework #1**

* Teaching assistant email: trung@uef.fi
* Deadline: **23:59 - 14/12/2019 (UPDATED)**
* Maximum: **3 points**

Goals:

* Basic Machine Learning understanding
* Perceptron algorithm
* Multi-layer Adaline algorithm

References:

1. Jean-Christophe B. Loiseau, 2019. [Rosenblatt’s perceptron, the first modern neural network][ref_1]. Towards Data Science, Medium.

How to submit:

* Option#1: **File** $\to$ **Download .ipynb** $\to$ _Send the .ipynb file to my email, or submit it on the Moodle page_.
* Option#2: **Share** a read-only notebook link to my email.
* _If you choose to share the notebook, please rename it to your student name and student number. I will take a snapshot of your notebook before the deadline; any modification afterward will be disregarded._

**NOTE**: This is official homework and will be graded.

[ref_1]: https://towardsdatascience.com/rosenblatts-perceptron-the-very-first-neural-network-37a3ec09038a

In [0]:
# All libraries we use for this HW, run this block first!
%tensorflow_version 2.x
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.datasets import load_linnerud
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans
np.random.seed(8)

TensorFlow 2.x selected.


# Question 1
Run and read the following code blocks, then answer the following questions:

1.   Which learning scheme is used (i.e. supervised, unsupervised or reinforcement learning)?
2.   How do we interpret the results?
3.   Are they good results? If not, what is wrong with them?

**NOTE**: you have to answer the three questions above for both _a)_ and _b)_


### **a)** First block

In [0]:
# Description of the dataset
# https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_linnerud.html#sklearn.datasets.load_linnerud
data = load_linnerud()
X = data.data[:, :2]  # only take first 2 features
y = data.target[:, 0]  # only take first target
print("Features name:", data.feature_names[:2])
print("Target name  :", data.target_names[0])

model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)


def plot_helper(chins, situps, weight, prediction):
  plt.figure(figsize=(12, 5))
  ax = plt.subplot(1, 2, 1)
  sns.scatterplot(x="Chins",
                  y="Situps",
                  size='Weight',
                  data=pd.DataFrame({
                      'Chins': chins,
                      'Situps': situps,
                      'Weight': weight
                  }),
                  ax=ax)
  ax = plt.subplot(1, 2, 2)
  sns.scatterplot(x="Chins",
                  y="Situps",
                  size='Prediction',
                  data=pd.DataFrame({
                      'Chins': chins,
                      'Situps': situps,
                      'Prediction': prediction
                  }),
                  ax=ax)

plot_helper(chins=X[:, 0], situps=X[:, 1], weight=y, prediction=y_pred)

### **b)** Second block

In [0]:
# We use the same dataset in a)
model = KMeans(2)
model.fit(X)
y_pred = model.predict(X)

sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y_pred)
plt.xlabel('Chins')
plt.ylabel("Situps")

# Question 2

### **a)** Fill in the `TODO`s in the following code block to: _Create a perceptron and use it to solve an AND classification problem_

The output of the perceptron is calculated as:

$y= \mathrm{F} \big( \sum_{i=0}^D w_i \cdot x_i \big) \tag{1}$

where $x_0=1$, $D$ is the number of input features, and $\mathrm{F}(.)$ is the threshold function, i.e.

$\mathrm{F}(x) =
  \begin{cases}
    1       & \quad \text{if } x \geq \mathrm{THRESHOLD}\\
    0  & \quad \text{if } x < \mathrm{THRESHOLD}
  \end{cases}$
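
As a rough illustration of Equation (1) and the threshold function, the sketch below evaluates a perceptron on two made-up inputs. The helper name `threshold_fn`, the weights, and the inputs are all assumptions chosen by hand for this example; they are not part of the homework data.

```python
import numpy as np

THRESHOLD = 0.5  # assumed here; same value as in the homework code block


def threshold_fn(x, threshold=THRESHOLD):
    """Hypothetical helper: return 1 where x >= threshold, else 0 (elementwise)."""
    return (np.asarray(x) >= threshold).astype(int)


# Hand-picked weights for illustration only; w[0] multiplies x_0 = 1
w = np.array([-0.2, 0.4, 0.6])

# Two made-up inputs, each with x_0 = 1 in the first column
x = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

# Weighted sum over i = 0..D for every example, then the threshold F(.)
weighted_sum = x @ w
outputs = threshold_fn(weighted_sum)
print(weighted_sum, outputs)
```

The first example has a weighted sum below the threshold and is mapped to 0; the second is above it and is mapped to 1.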

The perceptron learning algorithm follows this update equation:

$w_i = w_i - \lambda \cdot \frac{1}{N} \sum_{j=1}^N(\bar{y}^{(j)} - y^{(j)}) x^{(j)}_i \tag{2}$

where $N$ is the total number of training examples, $\bar{y}$ is the predicted value of $y$ (the target variable), $(j)$ is the index of an example, and $\lambda$ is the learning rate.
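
To make the update rule more concrete, here is a minimal sketch of a single update step on a tiny made-up dataset. All numbers (inputs, targets, predictions, starting weights, learning rate) are hypothetical and chosen only so the arithmetic is easy to follow by hand.

```python
import numpy as np

learning_rate = 0.1  # hypothetical value of lambda

# Two made-up training examples (N = 2); the first column is x_0 = 1
X_toy = np.array([[1.0, 0.0],
                  [1.0, 1.0]])
y_true = np.array([0, 1])   # targets y
y_bar = np.array([1, 1])    # pretend predictions: the first one is wrong

w = np.array([0.6, 0.2])    # hypothetical current weights

# For each weight i: average over the examples of (y_bar - y) * x_i
mean_error_times_input = (y_bar - y_true) @ X_toy / len(X_toy)

# Update every weight at once, following Equation (2)
w_new = w - learning_rate * mean_error_times_input
print(mean_error_times_input, w_new)
```

Only the first example is misclassified, and its second input is 0, so only the first weight changes in this step.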

For more detail, see [[1]][ref_1].

[ref_1]: https://towardsdatascience.com/rosenblatts-perceptron-the-very-first-neural-network-37a3ec09038a

In [0]:
# Number of training iterations
NUM_ITERATIONS = 25
# Threshold for 0/1 classification
THRESHOLD = 0.5
# Learning rate
LEARNING_RATE = 1e6

X = np.array(
    [[0, 0],
     [0, 1],
     [1, 0],
     [1, 1]]
)
# TODO: fill appropriate value for y
y = 

# Create perceptron weights (random weights)
weights = np.random.randn(2)

# Train perceptron
for iteration in range(NUM_ITERATIONS):
  # TODO: Calculate predictions with current weights (Equation (1))
  predictions = 

  # Calculate accuracy (not needed for training, but to track the learning progress)
  accuracy = np.mean(predictions == y)
  # Print the accuracy
  print("Iteration %d: Acc %f \t %s" % (iteration, accuracy, str(predictions)))

  # TODO: Update weights according to update rule (Equation (2))
  weights = 

# Print weights for inspection
print(weights)

Iteration 0: Acc 0.750000 	 [0 0 0 0]
Iteration 1: Acc 0.250000 	 [1 1 1 1]
Iteration 2: Acc 0.750000 	 [0 0 0 0]
Iteration 3: Acc 1.000000 	 [0 0 0 1]
Iteration 4: Acc 1.000000 	 [0 0 0 1]
Iteration 5: Acc 1.000000 	 [0 0 0 1]
Iteration 6: Acc 1.000000 	 [0 0 0 1]
Iteration 7: Acc 1.000000 	 [0 0 0 1]
Iteration 8: Acc 1.000000 	 [0 0 0 1]
Iteration 9: Acc 1.000000 	 [0 0 0 1]
Iteration 10: Acc 1.000000 	 [0 0 0 1]
Iteration 11: Acc 1.000000 	 [0 0 0 1]
Iteration 12: Acc 1.000000 	 [0 0 0 1]
Iteration 13: Acc 1.000000 	 [0 0 0 1]
Iteration 14: Acc 1.000000 	 [0 0 0 1]
Iteration 15: Acc 1.000000 	 [0 0 0 1]
Iteration 16: Acc 1.000000 	 [0 0 0 1]
Iteration 17: Acc 1.000000 	 [0 0 0 1]
Iteration 18: Acc 1.000000 	 [0 0 0 1]
Iteration 19: Acc 1.000000 	 [0 0 0 1]
Iteration 20: Acc 1.000000 	 [0 0 0 1]
Iteration 21: Acc 1.000000 	 [0 0 0 1]
Iteration 22: Acc 1.000000 	 [0 0 0 1]
Iteration 23: Acc 1.000000 	 [0 0 0 1]
Iteration 24: Acc 1.000000 	 [0 0 0 1]
[ 250000.02080584  250000.67970345]

### **b)** In theory, the perceptron algorithm should be able to solve the `AND` classification problem (i.e. give 100% accuracy). What is missing from the above procedure? Could you make it work?

# Question 3

Create a multi-layer Adaline in `pytorch` or `tensorflow` (you only need to do part **a)** or part **b)** of this question).

The approximation error for Adaline is given by:

$E = \frac{1}{2} (\bar{y} - y)^2 \tag{3}$

where $\bar{y}$ is the predicted value of $y$ (the target variable).
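
As a quick numeric check of Equation (3), the sketch below evaluates the error for two made-up prediction/target pairs using plain NumPy. The function name `half_squared_error` and the numbers are assumptions for illustration only; the framework-specific versions in the code blocks of this question are still left for you to write.

```python
import numpy as np


def half_squared_error(y_bar, y):
    """Hypothetical helper: elementwise E = 1/2 * (y_bar - y)^2."""
    y_bar = np.asarray(y_bar, dtype=float)
    y = np.asarray(y, dtype=float)
    return 0.5 * (y_bar - y) ** 2


# Made-up predictions and targets (not taken from the linnerud dataset)
errors = half_squared_error([190.0, 180.0], [191.0, 176.0])
print(errors)         # per-example error
print(errors.mean())  # average error over the two examples
```

Note that both arguments must have the same shape for the elementwise difference to pair each prediction with its own target.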

Your task is divided into 2 steps:

1. Fill in the `TODO`s to create a multi-layer Adaline and get the algorithm running.
2. Modify the training procedure to get reasonably better results.

We will use the `linnerud` dataset from `Question 1` as training data.

In [0]:
X = data.data  # take all features
y = data.target[:, 0]  # only take first target
print("Features name:", data.feature_names)
print("Target name  :", data.target_names[0])

Features name: ['Chins', 'Situps', 'Jumps']
Target name  : Weight


### a) Multi-layer Adaline with `pytorch`

Documentation for `pytorch` neural network modules:

https://pytorch.org/docs/stable/nn.html

In [0]:
import torch

# convert data to pytorch tensor
X_pt = torch.from_numpy(X.astype('float32'))
y_pt = torch.from_numpy(y.astype('float32'))

# TODO: modify this single-layer Adaline into a multi-layer Adaline
network = torch.nn.Linear(X.shape[1], 1)

def approximation_error(y_pred, y_true):
  # TODO: finish this function and return the approximation error of Adaline (Equation 3)
  pass

# create Gradient descent optimizer
optimizer = torch.optim.SGD(network.parameters(), lr=0.01)

# iterate for 25 epochs
for i in range(25):
  # zero the parameter gradients
  optimizer.zero_grad()

  # forward + backward + optimize
  y_pred = network(X_pt)
  loss = approximation_error(y_pred, y_pt).mean()
  loss.backward()
  optimizer.step()
  
  # Print out error for monitoring
  print("Epoch %-3d" % i, "Error: ", loss.detach().numpy())

# Evaluate our final prediction
y_pred = network(X_pt).detach().numpy()
plot_helper(chins=X[:, 0], situps=X[:, 1], weight=y, prediction=y_pred.ravel())

### b) Multi-layer Adaline with `tensorflow`

Documentation for `tensorflow` and `keras` neural networks:

https://www.tensorflow.org/guide/keras/overview

In [0]:
import tensorflow as tf
from tensorflow import keras


# TODO: modify this single-layer Adaline into a multi-layer Adaline
network = keras.layers.Dense(1)

def approximation_error(y_pred, y_true):
  # TODO: finish this function and return the approximation error of Adaline (Equation 3)
  pass

# create Gradient descent optimizer
optimizer = keras.optimizers.SGD(learning_rate=0.01)  # `lr` is a deprecated alias

# iterate for 25 epochs
for i in range(25):
  # forward
  with tf.GradientTape() as tape:
    y_pred = network(X)
    loss = tf.reduce_mean(approximation_error(y_pred, y))
  # backward
  gradients = tape.gradient(loss, network.trainable_variables)
  # optimize
  optimizer.apply_gradients(zip(gradients, network.trainable_variables))

  # Print out error for monitoring
  print("Epoch %-3d" % i, "Error: ", loss.numpy())

# Evaluate our final prediction
y_pred = network(X).numpy()
plot_helper(chins=X[:, 0], situps=X[:, 1], weight=y, prediction=y_pred.ravel())