# Esercitazione 4

### Machine Learning

2025/03/27

## Struttura della soluzione generale

1. Viene creata una funzione `nested_cv` per automatizzare il processo di inner e outer cross-validation.
2. Si chiama la funzione del modello in questione, specificando la lista dei parametri in modo da utilizzare poi il metodo `nested_cv`.
3. Dato che i modelli riportano quasi le stesse metriche di valutazione, viene misurato anche il tempo di esecuzione dei processi.

In [2]:
import numpy as np
from time import time

#### Implementazione di `nested_cv`

Il metodo `nested_cv` ottimizza gli iperparametri mediante la `GridSearchCV` nella parte interna e valuta il modello sulla parte esterna del dataset.

Per ogni iterazione, calcola le metriche di valutazione richieste: $R^2$, $MAE$ (Mean Absolute Error) e $RMSE$ (Root Mean Squared Error) restituendo la *media* e la *deviazione standard* di ciascuna metrica. 

Parametri della funzione `nested_cv`:
- `model`: il modello di regressione in input;
- `param_grid`: il dizionario degli iperparametri per `GridSearchCV`;
- `X, y`: feature e variabile target;
- `outer_splits`: il numero di fold per la Cross-Validation Esterna (`default = 5`);
- `inner_splits`: il numero di fold per la Cross-Validation Interna (`default = 5`);
- `scoring`: lista delle metriche per la valutazione;
- `random_state`: seed per la riproducibilità dell'esperimento.

Output del metodo:
- Il dizionario con la media e la deviazione standard di ciascuna misura di valutazione.

In [33]:
from sklearn.model_selection import KFold, GridSearchCV
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
import numpy as np

def nested_cv_regression(model, param_grid, X, y, outer_splits=5,
                         inner_splits=5, scoring=None, random_state=42, verbose=True):
    if scoring is None:
        scoring = ['r2']  # Default metric

    outer_cv = KFold(n_splits=outer_splits, shuffle=True, random_state=random_state)
    score_results = {metric: [] for metric in scoring}

    best_param_overall = None
    best_r2_score = -np.inf 

    for outer_fold, (train_idx, test_idx) in enumerate(outer_cv.split(X), 1):
        if verbose:
            print(f"\nPerforming Outer Fold {outer_fold}/{outer_splits}")

        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]

        inner_cv = KFold(n_splits=inner_splits, shuffle=True, random_state=random_state)
        if verbose:
            print("Performing GridSearchCV...")

        grid_search = GridSearchCV(model, param_grid, cv=inner_cv,
                                   n_jobs=-1, scoring=scoring[0])
        grid_search.fit(X_train, y_train)

        best_model = grid_search.best_estimator_
        best_params = grid_search.best_params_

        if verbose:
            print(f" Best Params: {best_params}")

        y_pred = best_model.predict(X_test)

        if 'r2' in scoring:
            r2 = r2_score(y_test, y_pred)
            score_results['r2'].append(r2)
            if r2 > best_r2_score:
                best_r2_score = r2
                best_param_overall = best_params
            if verbose:
                print(f" R²: {r2:.4f}")

        if 'mae' in scoring:
            mae = mean_absolute_error(y_test, y_pred)
            score_results['mae'].append(mae)
            if verbose:
                print(f" MAE: {mae:.4f}")

        if 'rmse' in scoring:
            mse = mean_squared_error(y_test, y_pred)
            rmse = np.sqrt(mse)
            score_results['rmse'].append(rmse)
            if verbose:
                print(f" RMSE: {rmse:.4f}")

    result = {}
    for metric, scores in score_results.items():
        result[f"Nested CV {metric.upper()}"] = f"{np.mean(scores):.4f} ± {np.std(scores):.4f}"

    result["Best Parameters with highest R2 score"] = best_param_overall

    return result

Nested Cross-Validation:

> https://machinelearningmastery.com/nested-cross-validation-for-machine-learning-with-python/

#### Scaling del dataset

Il dataset scelto per l'esercitazione è `california_hounsing`, utile per i modelli di regressione. Si è scelto di utilizzare uno `StandardScaler` per normalizzare i dati.

In [16]:
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler

X, y = fetch_california_housing(return_X_y=True)
scaler = StandardScaler()
X_std = scaler.fit_transform(X) 

## Esercizio 1

Confronto di più regressori della stessa categoria (quelli base, con regolarizzazione, robusti) su un dataset concordato con Cross Validation (k-Fold Validation come esterna, GridSearch come interna) e fine tuning degli iperparametri. 

Le metriche da prendere in considerazione sono $R^2$ e $RMSE$.

**Scelta della categoria di Regressori**

I regressori che utilizzano la **regolarizzazione** scelti:
* Ridge
* Lasso
* ElasticNet
* SGDRegressor

### Ridge Regression

La formula della Ridge Regression è la seguente:
$$
J(\mathbf{w})= \text{MSE}(\mathbf{w})+\alpha\dfrac{1}{2}\sum_i w_i^2
$$

La regolarizzazione $\mathcal{l}_2$ è la sfera rappresentata nell'immagine sotto.

<center><img src = 'https://raw.githubusercontent.com/rasbt/python-machine-learning-book-3rd-edition/master/ch04/images/04_05.png' width = 500></center>

In [20]:
import sklearn
print(sklearn.__version__)

1.6.1


In [31]:
import time
from sklearn.linear_model import Ridge

t0 = time.time()

ridge_model = Ridge()

ridge_params = {
    'alpha': [1, 0.1, 0.01, 0.001, 0.0001], 
    'fit_intercept': [True, False],
    'solver': ['svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga']
}

ridge_results = nested_cv_regression(ridge_model, ridge_params, X_std, y, 
                          outer_splits = 5, inner_splits = 5, 
                          scoring = ['r2', 'rmse'])

t1 = time.time()

print(f'Ridge Regression Results: \n {ridge_results}')
print(f'Training Ridge model in {(t1 - t0):.3f}s')


Performing Outer Fold 1/5
Performing GridSearchCV...
 Best Params: {'alpha': 1, 'fit_intercept': True, 'solver': 'saga'}
 R²: 0.5757
 RMSE: 0.7456

Performing Outer Fold 2/5
Performing GridSearchCV...
 Best Params: {'alpha': 1, 'fit_intercept': True, 'solver': 'svd'}
 R²: 0.6137
 RMSE: 0.7264

Performing Outer Fold 3/5
Performing GridSearchCV...
 Best Params: {'alpha': 1, 'fit_intercept': True, 'solver': 'sparse_cg'}
 R²: 0.6086
 RMSE: 0.7137

Performing Outer Fold 4/5
Performing GridSearchCV...
 Best Params: {'alpha': 1, 'fit_intercept': True, 'solver': 'saga'}
 R²: 0.6213
 RMSE: 0.7105

Performing Outer Fold 5/5
Performing GridSearchCV...
 Best Params: {'alpha': 1, 'fit_intercept': True, 'solver': 'lsqr'}
 R²: 0.5875
 RMSE: 0.7451
Ridge Regression Results: 
 {'Nested CV R2': '0.6014 ± 0.0170', 'Nested CV RMSE': '0.7282 ± 0.0150', '\nBest Parameters with highest R2 score': {'alpha': 1, 'fit_intercept': True, 'solver': 'saga'}}
Training Ridge model in 52.285s


### Lasso Regression

La formula della Ridge Regression è la seguente:
$$
J(\mathbf{w}) = \text{MSE}(\mathbf{w}) + \alpha  \sum_i |w|_i =  \dfrac{1}{2N} \|y - \mathbf{Xw}\|^2_2 + \alpha  \|\mathbf{w}\|_1
$$

La regolarizzazione $\mathcal{l}_1$ è il quadrato ruotato rappresentato in figura.

<center><img src = 'https://raw.githubusercontent.com/rasbt/python-machine-learning-book-3rd-edition/master/ch04/images/04_06.png' width = 500></center>

In [32]:
from sklearn.linear_model import Lasso

t0 = time.time()

lasso_model = Lasso() # max_iter = 1000

lasso_params = {
    'alpha': [1, 0.1, 0.01, 0.001, 0.0001], 
    'fit_intercept': [True, False]}

lasso_results = nested_cv_regression(lasso_model, lasso_params, X_std, y, 
                          outer_splits=5, inner_splits=5, 
                          scoring=['r2', 'rmse'])

t1 = time.time()

print(f'Lasso Regression Results: \n {lasso_results}')
print(f'Trained Lasso model in {(t1 - t0):.3f}s')


Performing Outer Fold 1/5
Performing GridSearchCV...
 Best Params: {'alpha': 0.001, 'fit_intercept': True}
 R²: 0.5769
 RMSE: 0.7446

Performing Outer Fold 2/5
Performing GridSearchCV...
 Best Params: {'alpha': 0.001, 'fit_intercept': True}
 R²: 0.6136
 RMSE: 0.7266

Performing Outer Fold 3/5
Performing GridSearchCV...
 Best Params: {'alpha': 0.001, 'fit_intercept': True}
 R²: 0.6080
 RMSE: 0.7141

Performing Outer Fold 4/5
Performing GridSearchCV...
 Best Params: {'alpha': 0.01, 'fit_intercept': True}
 R²: 0.6253
 RMSE: 0.7068

Performing Outer Fold 5/5
Performing GridSearchCV...
 Best Params: {'alpha': 0.001, 'fit_intercept': True}
 R²: 0.5878
 RMSE: 0.7448
Lasso Regression Results: 
 {'Nested CV R2': '0.6023 ± 0.0176', 'Nested CV RMSE': '0.7274 ± 0.0155', '\nBest Parameters with highest R2 score': {'alpha': 0.01, 'fit_intercept': True}}
Trained Lasso model in 2.097s


### ElasticNet

La ElasticNet unisce le regolarizzazioni della Ridge Regression e della Lasso.

$$\min_\mathbf{w} \frac{1}{N}\|\mathbf{Xw}-y\|^2_2 + \alpha\rho\|\mathbf{w}\|_1 + \frac{1}{2}\alpha(1-\rho)\|\mathbf{w}\|_2$$

Il parametro $\alpha$ è della regolarizzazione $\mathcal{l}_2$, mentre $\rho$ è la penalizzazione della $\mathcal{l}_1$.

In [34]:
from sklearn.linear_model import ElasticNet

t0 = time.time()

en_model = ElasticNet()

en_params = {
    'alpha': [1, 0.1, 0.01, 0.001, 0.0001], 
    'fit_intercept': [True, False]}

en_results = nested_cv_regression(en_model, en_params, X_std, y, 
                       outer_splits = 5, inner_splits = 5, 
                       scoring = ['r2', 'rmse'], random_state = 42)
t1 = time.time()

print(f'ElasticNet Regression Results: \n {en_results}')
print(f'Trained ElasticNet model in {(t1 - t0):.3f}s')


Performing Outer Fold 1/5
Performing GridSearchCV...
 Best Params: {'alpha': 0.001, 'fit_intercept': True}
 R²: 0.5766
 RMSE: 0.7449

Performing Outer Fold 2/5
Performing GridSearchCV...
 Best Params: {'alpha': 0.001, 'fit_intercept': True}
 R²: 0.6136
 RMSE: 0.7265

Performing Outer Fold 3/5
Performing GridSearchCV...
 Best Params: {'alpha': 0.001, 'fit_intercept': True}
 R²: 0.6082
 RMSE: 0.7140

Performing Outer Fold 4/5
Performing GridSearchCV...
 Best Params: {'alpha': 0.01, 'fit_intercept': True}
 R²: 0.6251
 RMSE: 0.7069

Performing Outer Fold 5/5
Performing GridSearchCV...
 Best Params: {'alpha': 0.001, 'fit_intercept': True}
 R²: 0.5877
 RMSE: 0.7450
ElasticNet Regression Results: 
 {'Nested CV R2': '0.6022 ± 0.0177', 'Nested CV RMSE': '0.7274 ± 0.0156', 'Best Parameters with highest R2 score': {'alpha': 0.01, 'fit_intercept': True}}
Trained ElasticNet model in 2.118s


### SGD Regression

La regressione mediante discesa del gradiente stocastica è un metodo iterativo.

In [35]:
from sklearn.linear_model import SGDRegressor

t0 = time.time()

SGD_model = SGDRegressor()

sgd_params = {
    'loss': ['squared_error', 'huber', 'epsilon_insensitive', 'squared_epsilon_insensitive'],
    'alpha': [1, 0.1, 0.01, 0.001, 0.0001],
    'penalty': ['l2', 'l1', 'elasticnet', None],
    'fit_intercept': [True, False]
}

sgd_results = nested_cv_regression(SGD_model, sgd_params, X_std, y, 
                        outer_splits = 5, inner_splits = 5, 
                        scoring = ['r2', 'rmse'], random_state = 42)

t1 = time.time()

print(f'SGD Regression Results: \n {sgd_results}')
print(f'Trained SGD model in {(t1 - t0):.3f}s')


Performing Outer Fold 1/5
Performing GridSearchCV...
 Best Params: {'alpha': 0.1, 'fit_intercept': True, 'loss': 'squared_error', 'penalty': 'elasticnet'}
 R²: 0.5440
 RMSE: 0.7730

Performing Outer Fold 2/5
Performing GridSearchCV...
 Best Params: {'alpha': 0.01, 'fit_intercept': True, 'loss': 'epsilon_insensitive', 'penalty': 'l1'}
 R²: 0.5878
 RMSE: 0.7504

Performing Outer Fold 3/5
Performing GridSearchCV...
 Best Params: {'alpha': 0.01, 'fit_intercept': True, 'loss': 'epsilon_insensitive', 'penalty': 'l1'}
 R²: 0.5870
 RMSE: 0.7330

Performing Outer Fold 4/5
Performing GridSearchCV...
 Best Params: {'alpha': 0.001, 'fit_intercept': True, 'loss': 'squared_error', 'penalty': None}
 R²: 0.6202
 RMSE: 0.7115

Performing Outer Fold 5/5
Performing GridSearchCV...
 Best Params: {'alpha': 0.0001, 'fit_intercept': True, 'loss': 'huber', 'penalty': None}
 R²: 0.5488
 RMSE: 0.7793
SGD Regression Results: 
 {'Nested CV R2': '0.5775 ± 0.0282', 'Nested CV RMSE': '0.7495 ± 0.0251', 'Best Parame

## Esercizio 2

Confronto di più regressori di categorie diverse su un dataset concordato con Cross Validation e fine tuning degli iperparametri.

Le metriche da prendere in considerazione sono: $R^2$ e $MAE$.

**Scelta dei Regressori**

I regressori scelti per il confronto sono i seguenti:
* Linear
* RANSAC
* Bayesian Ridge

### Linear Regression

In [37]:
from sklearn.linear_model import LinearRegression

t0 = time.time()

linear_model = LinearRegression()

linear_param = {
    'fit_intercept': [True, False] 
}

linear_results = nested_cv(linear_model, linear_param, X_std, y, 
                           outer_splits = 5, inner_splits = 5, 
                           scoring = ['r2', 'mae'])

t1 = time.time()

print(f'Linear Regression Results: \n {linear_results}')
print(f'Trained linear model in {(t1 - t0):.3f}s')


Performing Outer Fold 1/5
Performing GridSearchCV...
 Best Params: {'fit_intercept': True}
 R²: 0.5758
 MAE: 0.5332

Performing Outer Fold 2/5
Performing GridSearchCV...
 Best Params: {'fit_intercept': True}
 R²: 0.6137
 MAE: 0.5367

Performing Outer Fold 3/5
Performing GridSearchCV...
 Best Params: {'fit_intercept': True}
 R²: 0.6086
 MAE: 0.5292

Performing Outer Fold 4/5
Performing GridSearchCV...
 Best Params: {'fit_intercept': True}
 R²: 0.6213
 MAE: 0.5171

Performing Outer Fold 5/5
Performing GridSearchCV...
 Best Params: {'fit_intercept': True}
 R²: 0.5875
 MAE: 0.5422
Linear Regression Results: 
 {'Nested CV R2': '0.6014 ± 0.0170', 'Nested CV MAE': '0.5317 ± 0.0084', 'Best Parameters per fold': [{'fit_intercept': True}, {'fit_intercept': True}, {'fit_intercept': True}, {'fit_intercept': True}, {'fit_intercept': True}]}
Trained linear model in 0.855s


### RANSAC Regressor

In [38]:
from sklearn.linear_model import RANSACRegressor

t0 = time.time()

ransac_model = RANSACRegressor()

ransac_params = {
    'min_samples': [0.5, 0.7, 0.9],  
    'residual_threshold': [5.0, 10.0, 15.0],  
    'loss': ['absolute_error', 'squared_error']
}

ransac_results = nested_cv_regression(ransac_model, ransac_params, X_std, y, 
                           outer_splits = 5, inner_splits = 5, 
                           scoring = ['r2', 'mae'], random_state = 42)

t1 = time.time()

print(f'RANSAC Regression Results: \n {ransac_results}')
print(f'Trained RANSAC model in {(t1 - t0):.3f}s')


Performing Outer Fold 1/5
Performing GridSearchCV...
 Best Params: {'loss': 'absolute_error', 'min_samples': 0.5, 'residual_threshold': 5.0}
 R²: 0.5754
 MAE: 0.5332

Performing Outer Fold 2/5
Performing GridSearchCV...
 Best Params: {'loss': 'absolute_error', 'min_samples': 0.7, 'residual_threshold': 5.0}
 R²: 0.6132
 MAE: 0.5353

Performing Outer Fold 3/5
Performing GridSearchCV...
 Best Params: {'loss': 'absolute_error', 'min_samples': 0.9, 'residual_threshold': 5.0}
 R²: 0.6104
 MAE: 0.5271

Performing Outer Fold 4/5
Performing GridSearchCV...
 Best Params: {'loss': 'absolute_error', 'min_samples': 0.5, 'residual_threshold': 10.0}
 R²: 0.6213
 MAE: 0.5171

Performing Outer Fold 5/5
Performing GridSearchCV...
 Best Params: {'loss': 'squared_error', 'min_samples': 0.7, 'residual_threshold': 15.0}
 R²: 0.5911
 MAE: 0.5404
RANSAC Regression Results: 
 {'Nested CV R2': '0.6023 ± 0.0167', 'Nested CV MAE': '0.5306 ± 0.0080', 'Best Parameters with highest R2 score': {'loss': 'absolute_err

### Bayesian Ridge

In [40]:
from sklearn.linear_model import BayesianRidge

t0 = time.time()

bayesian_ridge_model = BayesianRidge()

bayesian_param = {
    'alpha_1': [1e-6, 1e-3, 1e-1, 1],
    'alpha_2': [1e-6, 1e-3, 1e-1, 1],
    'lambda_1': [1e-6, 1e-3, 1e-1, 1],
    'lambda_2': [1e-6, 1e-3, 1e-1, 1]
}

bayesian_ridge_results = nested_cv_regression(bayesian_ridge_model,
                                              bayesian_param, X_std, y, 
                                              outer_splits = 5, 
                                              inner_splits = 5, 
                                              scoring = ['r2', 'mae'])

t1 = time.time()

print(f'Bayesian Ridge Regression Results: \n {bayesian_ridge_results}')
print(f'Trained Bayesian Ridge model in {(t1 - t0):.3f}s')


Performing Outer Fold 1/5
Performing GridSearchCV...
 Best Params: {'alpha_1': 1e-06, 'alpha_2': 1, 'lambda_1': 1, 'lambda_2': 1e-06}
 R²: 0.5759
 MAE: 0.5332

Performing Outer Fold 2/5
Performing GridSearchCV...
 Best Params: {'alpha_1': 1e-06, 'alpha_2': 1, 'lambda_1': 1, 'lambda_2': 1e-06}
 R²: 0.6137
 MAE: 0.5367

Performing Outer Fold 3/5
Performing GridSearchCV...
 Best Params: {'alpha_1': 1e-06, 'alpha_2': 1, 'lambda_1': 1, 'lambda_2': 1e-06}
 R²: 0.6085
 MAE: 0.5292

Performing Outer Fold 4/5
Performing GridSearchCV...
 Best Params: {'alpha_1': 1e-06, 'alpha_2': 1, 'lambda_1': 1, 'lambda_2': 1e-06}
 R²: 0.6213
 MAE: 0.5171

Performing Outer Fold 5/5
Performing GridSearchCV...
 Best Params: {'alpha_1': 1e-06, 'alpha_2': 1, 'lambda_1': 1, 'lambda_2': 1e-06}
 R²: 0.5875
 MAE: 0.5422
Bayesian Ridge Regression Results: 
 {'Nested CV R2': '0.6014 ± 0.0170', 'Nested CV MAE': '0.5317 ± 0.0084', 'Best Parameters with highest R2 score': {'alpha_1': 1e-06, 'alpha_2': 1, 'lambda_1': 1, 'l