# <center> TP 2: Regularised Regression and Logistic Regression<br> <small>Réda DEHAK<br> 19 and 22 November 2018</small> </center>

The goals of this lab are:
- Fit generalised linear models with ridge, Lasso or Elastic Net regularisation.
- Test logistic regression on classification problems.

We will use a data file containing measurements from a prostate cancer study, described below.

## Part 1: Regularised Regression 
### Import Data

The following dataset is from Hastie, Tibshirani and Friedman (2009), from a study by Stamey et al. (1989) of prostate cancer, measuring the correlation between the level of prostate-specific antigen (PSA) and several covariates. The covariates are:
- lcavol: log cancer volume
- lweight: log prostate weight
- age: age of the patient
- lbph: log amount of benign prostatic hyperplasia
- svi: seminal vesicle invasion
- lcp: log capsular penetration
- gleason: Gleason score

The response variable is lpsa, the log of the PSA level.

In [None]:
%matplotlib inline
import pickle
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model

In [None]:
# The four arrays are stored one after another in the same pickle file
with open('data2.pkl', 'rb') as fin:
    xtrain = pickle.load(fin)
    ytrain = pickle.load(fin)
    Xtest = pickle.load(fin)
    Ytest = pickle.load(fin)

print('Train data : ', xtrain.shape, ' ', ytrain.shape)
print('Test data : ', Xtest.shape, ' ', Ytest.shape)

### Linear Regression

Using the program of TP 1, compute the linear regression weights $W$ of the model

$$y = g(x) = W^T x =\sum_{d=0}^7 w_d x_d$$
with $x_0 = 1$
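A minimal sketch of this model, assuming (hypothetically) that each sample is stored as a row of an $N \times 7$ array, as scikit-learn expects. The helpers `add_bias` and `predict` are illustrative names, not part of the lab: the first prepends the constant feature $x_0 = 1$, the second evaluates $g(x) = W^T x$ for every sample.

```python
import numpy as np

def add_bias(x):
    """Prepend the constant feature x_0 = 1 to every sample (rows = samples)."""
    x = np.asarray(x, dtype=float)
    return np.hstack([np.ones((x.shape[0], 1)), x])

def predict(W, x):
    """g(x) = W^T x for every sample, using the augmented features."""
    return add_bias(x) @ W

# Example: 3 made-up samples with 7 covariates each
x = np.arange(21, dtype=float).reshape(3, 7)
W = np.zeros(8)
W[0] = 1.5   # intercept w_0
W[1] = 2.0   # weight of the first covariate
print(predict(W, x))  # 1.5 + 2 * x[:, 0] for each sample
```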

Linear regression consists in finding the parameters $W$ that minimize the quadratic error:
$$E(W) = \frac{1}{60}\sum_{i=1}^{60}\left(g(x_i) - y_i\right)^2$$

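The vector $W$ that minimizes $E(W)$ is given by:
$$W = (X X^T)^{-1}X Y$$
where $X$ is the $8 \times 60$ matrix whose $i$-th column is the augmented sample $x_i$ (with $x_0 = 1$), and $Y$ is the vector of the 60 targets $y_i$.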
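A minimal sketch of the exact solution on synthetic, noise-free data (the arrays below are made up, not the contents of data2.pkl). $X$ is built with one column per sample, as in the formula. Solving the linear system $X X^T W = X Y$ with np.linalg.solve gives the same $W$ as multiplying by the inverse, while avoiding forming $(X X^T)^{-1}$ explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 60, 7
x = rng.normal(size=(n, d))                      # synthetic covariates, rows = samples
w_true = np.array([2.0, 0.5, -1.0, 0.0, 3.0, 0.0, 1.5, -0.7])  # w_0 is the intercept
y = w_true[0] + x @ w_true[1:]                   # noise-free targets

# X as in the formula: one column per sample, first row is x_0 = 1
X = np.vstack([np.ones(n), x.T])                 # shape (8, 60)

# W = (X X^T)^{-1} X Y, computed by solving the system rather than inverting
W = np.linalg.solve(X @ X.T, X @ y)
print(W)
```

Without noise, the recovered $W$ matches the weights used to generate the data up to rounding.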

Compute the vector $W$ that minimizes $E(W)$:
- Compute $W$ using the exact solution.
- Compute the error on the test data.

- Check that you obtain the same $W$ with sklearn.linear_model.LinearRegression.
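A sketch of the three steps on synthetic stand-ins for xtrain, ytrain, Xtest and Ytest (the real arrays come from data2.pkl, whose layout is assumed here to be one sample per row). The helpers `design` and `mse` are hypothetical names. LinearRegression fits the intercept separately, so its intercept_ plays the role of $w_0$ and its coef_ holds $w_1, \dots, w_7$.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def design(x):
    """X as in the formula: one column per sample, first row x_0 = 1."""
    x = np.asarray(x, dtype=float)
    return np.vstack([np.ones(len(x)), x.T])

def mse(W, x, y):
    """Mean squared error (1/N) * sum_i (W^T x_i - y_i)^2."""
    return np.mean((W @ design(x) - y) ** 2)

# Synthetic stand-ins for (xtrain, ytrain) and (Xtest, Ytest)
rng = np.random.default_rng(2)
w_true = rng.normal(size=8)
x_tr, x_te = rng.normal(size=(60, 7)), rng.normal(size=(30, 7))
y_tr = w_true[0] + x_tr @ w_true[1:] + 0.3 * rng.normal(size=60)
y_te = w_true[0] + x_te @ w_true[1:] + 0.3 * rng.normal(size=30)

# Exact solution W = (X X^T)^{-1} X Y on the training set
X = design(x_tr)
W = np.linalg.solve(X @ X.T, X @ y_tr)
print("train MSE:", mse(W, x_tr, y_tr), "test MSE:", mse(W, x_te, y_te))

# LinearRegression: intercept_ ~ w_0, coef_ ~ w_1..w_7
reg = LinearRegression().fit(x_tr, y_tr)
W_sk = np.concatenate([[reg.intercept_], reg.coef_])
print("same W as scikit-learn:", np.allclose(W, W_sk))
```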

### Ridge regression

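Ridge regression consists in finding the parameters $W$ that minimize:
$$\frac{1}{60}\sum_{i=1}^{60}\left(W^T x_i - y_i\right)^2 + \alpha \|W\|_2^2$$
Note that linear_model.Ridge minimizes $\sum_{i}\left(W^T x_i - y_i\right)^2 + \alpha \|W\|_2^2$ without the $\frac{1}{60}$ factor, so its alpha corresponds to $60\,\alpha$ in the formula above; with fit_intercept=True it also leaves the intercept $w_0$ unpenalised.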
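Setting the gradient of this objective to zero gives $(X X^T + 60\,\alpha I)\,W = X Y$, with $X$ the $8 \times 60$ matrix of augmented samples. A sketch checking this closed form against linear_model.Ridge on synthetic data, assuming the intercept is handled as the feature $x_0 = 1$ and therefore penalised like the other weights, as in the formula. Because Ridge does not include the $\frac{1}{60}$ factor, its alpha is set to $60\,\alpha$.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
n = 60
x = rng.normal(size=(n, 7))                       # synthetic covariates
y = 1.0 + x @ rng.normal(size=7) + 0.2 * rng.normal(size=n)
X = np.vstack([np.ones(n), x.T])                  # (8, N): bias row x_0 = 1, one column per sample

alpha = 0.1
# Gradient of (1/N)||X^T W - Y||^2 + alpha ||W||^2 set to zero:
# (X X^T + N * alpha * I) W = X Y   (w_0 is penalised too, as in the formula)
W_ridge = np.linalg.solve(X @ X.T + n * alpha * np.eye(X.shape[0]), X @ y)

# Ridge minimises ||y - Xw||^2 + alpha ||w||^2 (no 1/N), hence alpha = N * alpha;
# fit_intercept=False because x_0 = 1 is already one of the features
reg = Ridge(alpha=n * alpha, fit_intercept=False).fit(X.T, y)
print(np.allclose(W_ridge, reg.coef_))
```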

- Using linear_model.Ridge with $\alpha = 0$, check that you obtain the same $W$ as with linear regression.
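A quick sketch of this check on synthetic data: with alpha=0 the penalty vanishes, so Ridge should reproduce LinearRegression up to rounding.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(4)
x = rng.normal(size=(60, 7))                      # synthetic covariates
y = 0.5 + x @ rng.normal(size=7) + 0.1 * rng.normal(size=60)

ols = LinearRegression().fit(x, y)
ridge0 = Ridge(alpha=0.0).fit(x, y)               # no penalty: plain least squares

print(np.allclose(ols.coef_, ridge0.coef_), np.isclose(ols.intercept_, ridge0.intercept_))
```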

In this part, we study the influence of $\alpha$ on the solution of the regression.

- Train a ridge regression for each value of $\alpha$ in np.logspace(-5, 5, 200).
- Plot how each $W_i$ evolves along this sequence of $\alpha$ values.
- Plot how the mean squared error evolves along this sequence of $\alpha$ values.
- Conclude.
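A minimal sketch of the ridge path on synthetic train/test arrays (stand-ins for the lab's data). Keep in mind that Ridge's alpha is not scaled by $\frac{1}{60}$, so it equals $60\,\alpha$ of the formula, and that it leaves the intercept unpenalised.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the train/test split loaded from data2.pkl
rng = np.random.default_rng(5)
w_true = rng.normal(size=7)
x_tr, x_te = rng.normal(size=(60, 7)), rng.normal(size=(30, 7))
y_tr = 1.0 + x_tr @ w_true + 0.5 * rng.normal(size=60)
y_te = 1.0 + x_te @ w_true + 0.5 * rng.normal(size=30)

alphas = np.logspace(-5, 5, 200)
coefs, train_mse, test_mse = [], [], []
for a in alphas:
    reg = Ridge(alpha=a).fit(x_tr, y_tr)
    coefs.append(reg.coef_)                       # w_1..w_7 for this alpha
    train_mse.append(mean_squared_error(y_tr, reg.predict(x_tr)))
    test_mse.append(mean_squared_error(y_te, reg.predict(x_te)))
coefs = np.array(coefs)                           # shape (200, 7): one row per alpha

best = np.argmin(test_mse)
print("alpha with lowest test MSE:", alphas[best])
```

Passing alphas and coefs to plt.semilogx draws one curve per weight, and plt.semilogx(alphas, test_mse) gives the error curve. The weights shrink smoothly towards zero as $\alpha$ grows, but none of them becomes exactly zero.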

In [1]:
import numpy as np

alphas = np.logspace(-5, 5, 200)
...


### Lasso regression

Lasso regression consists in finding the parameters $W$ that minimize:
$$\frac{1}{2 \times 60}\sum_{i=1}^{60}\left(W^T x_i - y_i\right)^2 + \alpha \|W\|_1$$
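The $\ell_1$ penalty is what lets the Lasso set weights exactly to zero. At $W = 0$ the subgradient optimality condition reads $\frac{1}{60}|x_{\cdot j}^T y| \le \alpha$ for every covariate $j$, computed on centred data since the intercept is fitted separately. So for $\alpha$ above the largest of these values every weight is zero. A sketch checking this on synthetic data; alpha_max is our own notation, not a scikit-learn attribute. This formula uses the same $\frac{1}{2 \times 60}$ scaling as linear_model.Lasso, so no rescaling of alpha is needed.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
n = 60
x = rng.normal(size=(n, 7))                       # synthetic covariates
y = 1.0 + x @ np.array([1.5, 0.0, -2.0, 0.0, 0.7, 0.0, 0.0]) + 0.3 * rng.normal(size=n)

# At W = 0, optimality of (1/(2N))||y - Xw||^2 + alpha ||w||_1 requires
# |x_j^T y| / N <= alpha for every j (on centred data, intercept fitted apart)
xc, yc = x - x.mean(axis=0), y - y.mean()
alpha_max = np.max(np.abs(xc.T @ yc)) / n

just_above = Lasso(alpha=1.01 * alpha_max).fit(x, y)
just_below = Lasso(alpha=0.9 * alpha_max).fit(x, y)
print("alpha_max =", alpha_max)
print("non-zero weights above / below:",
      np.count_nonzero(just_above.coef_), np.count_nonzero(just_below.coef_))
```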

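- Using linear_model.Lasso with $\alpha = 0$, check that you obtain the same $W$ as with linear regression (scikit-learn advises against $\alpha = 0$ with Lasso for numerical reasons, so a very small $\alpha$ can be used instead).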
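A sketch of this check on synthetic data. Since coordinate descent converges poorly with alpha=0, a tiny alpha with a tight tolerance is used as a stand-in; the values 1e-8 and tol=1e-12 are illustrative choices, not values from the lab.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(7)
x = rng.normal(size=(60, 7))                      # synthetic covariates
y = 2.0 + x @ rng.normal(size=7) + 0.1 * rng.normal(size=60)

ols = LinearRegression().fit(x, y)
# Tiny alpha instead of 0: the l1 penalty is negligible, so Lasso ~ least squares
lasso = Lasso(alpha=1e-8, tol=1e-12, max_iter=100_000).fit(x, y)
print("largest weight difference:", np.max(np.abs(lasso.coef_ - ols.coef_)))
```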

In this part, we study the influence of $\alpha$ on the solution of the regression.

- Train a Lasso regression for each value of $\alpha$ in np.logspace(-5, 5, 200).
- Plot how each $W_i$ evolves along this sequence of $\alpha$ values.
- Plot how the mean squared error evolves along this sequence of $\alpha$ values.

Compare the results with the ridge solution.
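A sketch of both paths on synthetic data in which several true weights are zero, counting how many fitted weights are exactly zero at each $\alpha$. This makes the qualitative difference visible: the Lasso switches weights off one after another, while ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(8)
x = rng.normal(size=(60, 7))                      # synthetic covariates
y = 1.0 + x @ np.array([1.5, 0.0, -2.0, 0.0, 0.7, 0.0, 0.3]) + 0.3 * rng.normal(size=60)

alphas = np.logspace(-5, 5, 200)
lasso_coefs = np.array([Lasso(alpha=a, max_iter=10_000).fit(x, y).coef_ for a in alphas])
ridge_coefs = np.array([Ridge(alpha=a).fit(x, y).coef_ for a in alphas])

# Number of weights that are exactly zero for each alpha
n_zero_lasso = np.sum(lasso_coefs == 0.0, axis=1)
n_zero_ridge = np.sum(ridge_coefs == 0.0, axis=1)
print("exact zeros (lasso) at a few alphas:", n_zero_lasso[::40])
print("exact zeros (ridge) anywhere on the path:", n_zero_ridge.max())
```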

### Elastic Net regression

Elastic Net regression consists in finding the parameters $W$ that minimize:
$$\frac{1}{2 \times 60}\sum_{i=1}^{60}\left(W^T x_i - y_i\right)^2 + \alpha \times \lambda \|W\|_1 + \frac{1}{2} \alpha \times (1 - \lambda) \|W\|^2_2 $$
where $\lambda \in [0, 1]$ corresponds to the l1_ratio parameter of linear_model.ElasticNet.

- Using linear_model.ElasticNet with $\alpha = 0$, check that you obtain the same $W$ as with linear regression (as for Lasso, a very small $\alpha$ is numerically safer than $\alpha = 0$).
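A sketch of how the Elastic Net objective maps onto linear_model.ElasticNet, with $\lambda$ passed as l1_ratio, on synthetic data. With $\lambda = 1$ it coincides with the Lasso. With $\lambda = 0$, multiplying the objective by $2 \times 60$ gives $\sum_i (W^T x_i - y_i)^2 + 60\,\alpha \|W\|_2^2$, i.e. Ridge with alpha set to $60\,\alpha$. The tolerance and iteration settings are illustrative.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(9)
n = 60
x = rng.normal(size=(n, 7))                       # synthetic covariates
y = 1.0 + x @ rng.normal(size=7) + 0.3 * rng.normal(size=n)

alpha = 0.05
# lambda = 1: pure l1 penalty, same problem as Lasso
enet_l1 = ElasticNet(alpha=alpha, l1_ratio=1.0, tol=1e-10, max_iter=100_000).fit(x, y)
lasso = Lasso(alpha=alpha, tol=1e-10, max_iter=100_000).fit(x, y)

# lambda = 0: multiplying by 2N gives ||y - Xw||^2 + N * alpha * ||w||^2,
# i.e. Ridge with alpha' = N * alpha
enet_l2 = ElasticNet(alpha=alpha, l1_ratio=0.0, tol=1e-10, max_iter=100_000).fit(x, y)
ridge = Ridge(alpha=n * alpha).fit(x, y)

print(np.allclose(enet_l1.coef_, lasso.coef_), np.allclose(enet_l2.coef_, ridge.coef_, atol=1e-6))
```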

In this part, we study the influence of $\alpha$ and $\lambda$ on the solution of the regression.

- Train an ElasticNet regression for each pair of values, with $\alpha$ in np.logspace(-5, 5, 200) and $\lambda$ in np.linspace(0, 1, 50).
- Plot how each $W_i$ evolves over this grid of $(\alpha, \lambda)$ values.
- Plot how the mean squared error evolves over this grid of $(\alpha, \lambda)$ values.

- Which pair $(\alpha, \lambda)$ gives the best solution, i.e. the lowest test error?

- Conclude.
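A sketch of the grid search on synthetic train/test arrays, keeping the pair $(\alpha, \lambda)$ with the lowest test error. The grid is deliberately coarser than the one in the text so the run stays short (an assumption for speed); the loop is the same for the full 200 x 50 grid.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the train/test split loaded from data2.pkl
rng = np.random.default_rng(10)
w_true = np.array([1.5, 0.0, -2.0, 0.0, 0.7, 0.0, 0.3])
x_tr, x_te = rng.normal(size=(60, 7)), rng.normal(size=(30, 7))
y_tr = 1.0 + x_tr @ w_true + 0.5 * rng.normal(size=60)
y_te = 1.0 + x_te @ w_true + 0.5 * rng.normal(size=30)

# Coarser grid than in the text to keep the run short
alphas = np.logspace(-5, 5, 41)
lambdas = np.linspace(0, 1, 11)
test_mse = np.empty((len(alphas), len(lambdas)))
coefs = np.empty((len(alphas), len(lambdas), x_tr.shape[1]))
for i, a in enumerate(alphas):
    for j, lam in enumerate(lambdas):
        reg = ElasticNet(alpha=a, l1_ratio=lam, max_iter=10_000).fit(x_tr, y_tr)
        coefs[i, j] = reg.coef_
        test_mse[i, j] = mean_squared_error(y_te, reg.predict(x_te))

# Pair (alpha, lambda) with the lowest test error
i_best, j_best = np.unravel_index(np.argmin(test_mse), test_mse.shape)
print("best alpha:", alphas[i_best], "best lambda:", lambdas[j_best],
      "test MSE:", test_mse[i_best, j_best])
```

The array test_mse can be displayed with plt.contourf or plt.imshow (with $\alpha$ on a log scale) to see the whole error surface, and coefs[:, j] gives the weight paths for a fixed $\lambda$.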