
# Lab-7:
In this lab, we will examine regularization in both regression (Lasso and Ridge) and classification (a Keras network) problems.
We will also look at PCA.

### Objectives:
1. Lasso and Ridge
2. Regularization in Keras
3. PCA

---

## Lasso and Ridge
Both models are regularized forms of linear regression:
Lasso uses L1 regularization and Ridge uses L2 regularization.
Each penalty acts as a constraint region in which the coefficients (weights) must reside.
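
For reference, the two objectives can be written as below. This is a sketch in the form used by scikit-learn's `Lasso` and `Ridge`; the notation is an assumption made here ($w$ for the weights, $n$ for the number of samples, $\alpha$ for the regularization strength), and the intercept is omitted:

$$
\text{Lasso:}\quad \min_{w}\ \frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_1
\qquad\qquad
\text{Ridge:}\quad \min_{w}\ \lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_2^2
$$

Equivalently, each penalty corresponds to a constraint $\lVert w\rVert_1 \le t$ (a diamond-shaped region) or $\lVert w\rVert_2^2 \le t$ (a ball) for some $t$ that depends on $\alpha$.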

### Issues:
1. When to use Lasso?
2. When to use Ridge?
3. Since it is hard to judge each parameter's influence in advance, how can we decide which regularization to use, and how do we choose the value of lambda?
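
As a starting point for the first two questions, here is a minimal sketch on synthetic data (not the lab dataset). The data, the true weights, and `alpha=0.5` are all assumptions chosen for illustration. It shows the usual rule of thumb: Lasso tends to set irrelevant coefficients exactly to zero, while Ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic regression problem: only the first two of ten features matter.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
true_w = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 0])
y_demo = X_demo @ true_w + 0.1 * rng.normal(size=200)

# Same (assumed) regularization strength for both models.
demo_lasso = Lasso(alpha=0.5).fit(X_demo, y_demo)
demo_ridge = Ridge(alpha=0.5).fit(X_demo, y_demo)

# Count the coefficients that are exactly zero in each model.
n_zero_lasso = int(np.sum(demo_lasso.coef_ == 0))
n_zero_ridge = int(np.sum(demo_ridge.coef_ == 0))

print("Lasso coefficients:", np.round(demo_lasso.coef_, 3))
print("Ridge coefficients:", np.round(demo_ridge.coef_, 3))
print("exact zeros (Lasso, Ridge):", n_zero_lasso, n_zero_ridge)
```

If you expect only a few features to matter, Lasso's sparsity is often useful; if many features each contribute a little, Ridge is often the safer choice. Treat this as a heuristic rather than a rule.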


### Loading Boston dataset
Housing price values in the suburbs of Boston.

In [None]:
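# Note: load_boston was removed in scikit-learn 1.2, so this cell requires an older version.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

X, y = load_boston(return_X_y=True)
# Hold out 20% for testing, then 1/8 of the remainder for validation (70/10/20 split overall).
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=1/8, random_state=123)
print(x_train.shape)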

### Fitting both Lasso and Ridge
Task:

Fit two models, Lasso and Ridge, each with the default alpha.
Then print their coefficients and note the difference.

In [None]:
from sklearn.linear_model import Lasso, Ridge



Task: Try different values of alpha for the Lasso regressor and plot the validation loss.
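
The general pattern for choosing alpha on a validation set can be sketched as follows. To leave the Lasso exercise open, this sketch uses Ridge on synthetic data; the data, the alpha grid, and all `demo_*` names are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: five features, unit-variance noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 5))
y_demo = X_demo @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + rng.normal(size=300)

# Split once into a training part and a validation part.
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# Fit one model per candidate alpha and record its validation MSE.
demo_alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
demo_losses = []
for a in demo_alphas:
    demo_model = Ridge(alpha=a).fit(X_tr, y_tr)
    demo_losses.append(mean_squared_error(y_va, demo_model.predict(X_va)))

# The alpha with the lowest validation loss is the one we keep.
demo_best_alpha = demo_alphas[int(np.argmin(demo_losses))]
print("validation MSE per alpha:", demo_losses)
print("best alpha:", demo_best_alpha)
```

The test set is deliberately not touched here; it is reserved for the final measurement after alpha has been chosen.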

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import mean_squared_error
%matplotlib inline

alphas = [2.2, 2, 1.5, 1.3, 1.2, 1.1, 1, 0.3, 0.1]
losses = []
for alpha in alphas:
    # Write (5 lines): create a Lasso regressor with the current alpha value.
    # Fit it to the training set, then predict on the validation set (x_val).
    # Calculate the mean squared error loss, then append it to the losses list.
    
plt.plot(alphas, losses)
plt.xlabel("alpha")
plt.ylabel("Mean squared error")
plt.show()

best_alpha = alphas[np.argmin(losses)]
print("Best value of alpha:", best_alpha)

Measure the loss on the test set using a Lasso regressor with the best alpha.

In [None]:
lasso = Lasso(alpha=best_alpha)
lasso.fit(x_train, y_train)
y_pred = lasso.predict(x_test)
print("MSE on testset:", mean_squared_error(y_test, y_pred))

### Regularization in Keras
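
In Keras, a weight penalty is attached per layer through the `kernel_regularizer` argument. The sketch below is a minimal illustration, not the lab's model: the layer sizes, the L2 strength `1e-4`, and the `demo_*` names are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# An L2 penalty on the layer weights; the strength 1e-4 is an assumed value.
demo_reg = regularizers.l2(1e-4)

demo_net = keras.Sequential([
    keras.Input(shape=(784,)),
    # kernel_regularizer penalizes the weight matrix of this layer only.
    layers.Dense(64, activation="relu", kernel_regularizer=demo_reg),
    layers.Dense(10, activation="softmax", kernel_regularizer=demo_reg),
])

# Each regularized layer contributes one penalty term to the training loss.
print("number of regularization terms:", len(demo_net.losses))
```

`regularizers.l1(...)` and `regularizers.l1_l2(...)` can be swapped in the same way to get an L1 or a combined penalty.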

Task: add regularization in the dense layers in the following model, with 

In [None]:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, BatchNormalization
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras import regularizers

import matplotlib.pyplot as plt
from math import sqrt, ceil

(x_train, y_train), (x_test, y_test) = mnist.load_data() 
train_shape = x_train.shape
test_shape = x_test.shape
print(x_train.shape, x_test.shape)
#Images are 2D. What's the difference in 3D images?
x_train = x_train.reshape(train_shape[0], train_shape[1] * train_shape[2])
x_test = x_test.reshape(test_shape[0], test_shape[1] * test_shape[2])
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)



# Some hyperparameters:


epochs = 5
batch_size = 10
input_size = x_train.shape[1]
num_classes = 10 

# Define your regularizer here (1 line).

def get_model():
    model = Sequential()

    model.add(Dense(units=128,
                    input_dim=input_size,
                    activation='sigmoid'))

    # Add 2 hidden layers with number of units: 32, 64 


    model.add(Dense(units=num_classes, use_bias=True, activation='softmax'))
    #Try to change the optimizer, visit: https://goo.gl/dHFJNy
    #Try to change the loss func, visit: https://goo.gl/xMrooU
    #Try to change the learning rate (learning_rate)
    #In your free time take a look at different variations of GD: https://goo.gl/YFa6XY
    sgd_optimizer = SGD(learning_rate=.01)
    adam_optimizer = Adam(learning_rate=.001)
    model.compile(optimizer=sgd_optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

model = get_model()
model.fit(x=x_train, y=y_train, batch_size=batch_size, epochs=epochs)

loss, acc = model.evaluate(x_test, y_test)
print("Loss:", loss, ", Accuracy:", acc)


## PCA

1. How does PCA reduce data dimensionality?
2. What is an eigenvector?
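
As a hint for the second question, the defining property of an eigenvector can be checked numerically. This is a small sketch on an arbitrary symmetric matrix; `A_demo` is an assumption chosen for illustration and is not the lab data.

```python
import numpy as np

# A small symmetric matrix (covariance matrices are symmetric too).
A_demo = np.array([[2.0, 1.0],
                   [1.0, 2.0]])

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order;
# the eigenvectors are the COLUMNS of the second return value.
demo_vals, demo_vecs = np.linalg.eigh(A_demo)

# Take the eigenvector that belongs to the largest eigenvalue.
v_top = demo_vecs[:, -1]

# Defining property: multiplying by the matrix only rescales the vector.
print("A @ v      :", A_demo @ v_top)
print("lambda * v :", demo_vals[-1] * v_top)
```

The direction of `v_top` is unchanged by `A_demo`; only its length is scaled by the eigenvalue.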

Task: Implement the basic steps of PCA: mean-centering, computing the eigenvectors of the covariance matrix, projecting the data onto the first principal component (PC), and restoring the data back to the original space.
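
The steps listed in the task can be summarized in formulas. The notation is an assumption made here: $X$ is the $n \times d$ data matrix with one sample $x_i$ per row, $\mu$ is the mean vector, and $v_1$ is the unit eigenvector with the largest eigenvalue.

$$
\begin{aligned}
\mu &= \frac{1}{n}\sum_{i=1}^{n} x_i, & X_c &= X - \mu \\
C &= \frac{1}{n-1}\, X_c^{\top} X_c, & C v_k &= \lambda_k v_k \\
z &= X_c\, v_1, & \hat{X} &= z\, v_1^{\top} + \mu
\end{aligned}
$$

Here $z$ holds the one-dimensional projected data and $\hat{X}$ is the restored data, which lies on a line through $\mu$ in the direction of $v_1$.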

### Generating data ###

In [None]:
# N is the sample size
N = 25
# fix the random seed so that we get the same data on every run
np.random.seed(10)
# generate the data: y depends linearly on x, plus uniform noise in [0, 1)
x = np.linspace(-5, -3, N)
y = 10 + 2*x + np.random.random(size=(N,))
data = np.stack([x,y], axis = 1)


plt.title("Data")
plt.plot(data[:,0], data[:,1], '.', color="green")
plt.axis([-6, 2, -3, 6])
plt.grid(True)

### Center data ###

In [None]:
# center data by subtracting mean value from each feature
# pay attention to mean_vector <-- we need it later for restoring our data
mean_vector = ???
data_centered = ???

plt.title("Centered data")
plt.plot(data[:,0], data[:,1], '.', color="green")
plt.plot(data_centered[:,0], data_centered[:,1], '.', color="blue")
plt.axis([-6, 2, -3, 6])
plt.grid(True)

### Covariance matrix ###

In [None]:
# calculate covariance matrix for our centered data
cov_mat = ???
print('Covariance matrix:\n', cov_mat)

# to make sure you understand how covariance is calculated, also compute and print cov(X,Y) by hand
# for centered data: Cov(x, y) = (1 / (n - 1)) * Sum_i(x_i * y_i)
# check that it matches the corresponding entry of the covariance matrix
cov_xy = ???
print('cov(X,Y):', cov_xy)

### Eigenvectors and eigenvalues

In [None]:
# compute eigenvectors and eigenvalues, print them
eig_values, eig_vectors = ???
print('eig_values:', eig_values)
print('eig_vectors:\n', eig_vectors)

# are they already in the needed order?
# if not, order eigenvectors and eigenvalues by eigenvalues, descending (3 lines)
# Note: the eigenvectors are the columns of eig_vectors.


print('\nsorted eig_values:', eig_values)
print('sorted eig_vectors:\n', eig_vectors)

# estimate the fraction of variance retained by each principal component
retained_var = eig_values / eig_values.sum()
print('\nretained variance:', retained_var)

### Project data ###

In [None]:
# project the centered data onto the first principal component
first_pc = ???
projected_data = ???

plt.title("Projected data")
plt.plot(data[:,0], data[:,1], '.', color="green")
plt.plot(data_centered[:,0], data_centered[:,1], '.', color="blue")
plt.plot(projected_data, np.zeros(len(projected_data)), '.', color="red")
plt.axis([-6, 3, -3, 6])
plt.grid(True)

### Restore data back ###

In [None]:
# project the data back to the initial space:
# restored_data = projected_data . first_pc.T + mean_vector
# remember to add mean_vector to the restored data
restored_data = ???

plt.title("Restored data")
plt.plot(data[:,0], data[:,1], '.', color="green")
plt.plot(data_centered[:,0], data_centered[:,1], '.', color="blue")
plt.plot(restored_data[:,0], restored_data[:,1], '.', color="red")
plt.axis([-6, 2, -3, 6])
plt.grid(True)
plt.show()

### scikit-learn implementation ###

In [None]:
# this is to check your solution
from sklearn.decomposition import PCA
pca = PCA(n_components=1)
x_PCA = pca.fit_transform(data)

plt.title("Projected data")
plt.plot(data[:,0], data[:,1], '.', color="green")
plt.plot(data_centered[:,0], data_centered[:,1], '.', color="blue")
plt.plot(x_PCA, np.zeros(len(x_PCA)), '.', color="red")
plt.axis([-6, 3, -3, 6])
plt.grid(True)

print(pca.mean_)
print(pca.components_)
print(pca.explained_variance_)
print(pca.explained_variance_ratio_)
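
When comparing your results with scikit-learn, keep in mind that an eigenvector is only defined up to its sign, so your first principal component (and therefore your projected data) may be the negative of what `PCA` returns. The sketch below illustrates the comparison on separate synthetic data; `pts` and the other names are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic 2D points lying roughly along a line.
rng = np.random.default_rng(0)
t = rng.normal(size=50)
pts = np.stack([t, 2 * t + 0.3 * rng.normal(size=50)], axis=1)

# Manual PCA: center, covariance, eigen-decomposition, sort descending.
pts_centered = pts - pts.mean(axis=0)
pts_cov = np.cov(pts_centered, rowvar=False)   # uses the 1 / (n - 1) normalization
vals, vecs = np.linalg.eigh(pts_cov)           # ascending eigenvalues, column eigenvectors
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
manual_pc = vecs[:, 0]

# scikit-learn PCA on the same points.
sk_pca = PCA(n_components=1).fit(pts)

# Compare absolute values so that a global sign flip does not matter.
same_direction = np.allclose(np.abs(sk_pca.components_[0]), np.abs(manual_pc))
same_variance = np.isclose(sk_pca.explained_variance_[0], vals[0])
print("same direction up to sign:", same_direction)
print("same explained variance:", same_variance)
```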