In [1]:
import numpy as np
import pandas as pd
from pathlib import Path
from sklearn.linear_model import Lasso, ElasticNet, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
import optuna
from sklearn.metrics import (
    r2_score,
    make_scorer
)


## Exercise 4: Regression

In [2]:
DATA_PATH = Path("../data/")

In [3]:
X_train = np.load(DATA_PATH/"regression/X_train.npy")
y_train = np.load(DATA_PATH/"regression/y_train.npy")
X_test = np.load(DATA_PATH/"regression/X_test.npy")
y_test = np.load(DATA_PATH/"regression/y_test.npy")

In [26]:
X_train.shape

(100, 100)

In [4]:
r2 = make_scorer(r2_score)

### Lasso

First, we will use Lasso regression, because we are working with few samples (100) relative to the number of dimensions (100). As we have seen during the course, Lasso is a good fit for such a problem, so let's see how it performs.

The first step is to see how the model performs without hyperparameter tuning, using a small fixed $\alpha = 0.001$ as a baseline (note that this is not scikit-learn's default, which is $\alpha = 1$). We evaluate the model with 5-fold cross-validation and fix the random state to 0 to have reproducible results. We also scale the data with a RobustScaler. This choice was made empirically: we also tried StandardScaler and MinMaxScaler, but the results were not as good. We do not show the results of these tests here for the sake of brevity.
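The scaler comparison mentioned above is not shown in this notebook. A minimal sketch of how such a comparison could be run is given below. It uses synthetic data from `make_regression` as a stand-in for the exercise data (an assumption, as are the names `X_demo`, `y_demo`, and `scaler_scores`), so the scores it prints say nothing about the real dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Synthetic stand-in for the exercise data (assumption: same 100 x 100 shape).
X_demo, y_demo = make_regression(
    n_samples=100, n_features=100, n_informative=10, noise=5.0, random_state=0
)

scalers = {
    "RobustScaler": RobustScaler(),
    "StandardScaler": StandardScaler(),
    "MinMaxScaler": MinMaxScaler(),
}

# Same baseline model for every scaler, so only the scaling step differs.
scaler_scores = {}
for name, scaler in scalers.items():
    pipeline = make_pipeline(
        scaler, Lasso(alpha=0.001, random_state=0, max_iter=10_000)
    )
    cv_scores = cross_val_score(pipeline, X_demo, y_demo, cv=5, scoring="r2")
    scaler_scores[name] = float(np.mean(cv_scores))

for name, score in scaler_scores.items():
    print(f"{name}: mean CV R2 = {score:.3f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, so the held-out fold never influences the scaling statistics.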

In [40]:
reg_pipeline = make_pipeline(
    RobustScaler(),
    Lasso(alpha=0.001, random_state=0)  # let's use a small alpha value as a baseline
)

In [41]:
scores = cross_val_score(reg_pipeline, X_train, y_train, cv=5, scoring=r2)
print(f"Cross-validation score: {np.mean(scores)}")

Cross-validation score: 0.5768188701521931


In [42]:
reg_pipeline.fit(X_train, y_train)
preds = reg_pipeline.predict(X_test)
print(f"Test score: {r2_score(y_test, preds)}")

Test score: 0.6695160578998104


The score on the test set is `0.670`, which is decent, but we can surely do better.

Now, we will use Optuna to find the best value of $\alpha$ for our model.

In [31]:
def objective(trial):
    alpha = trial.suggest_float("alpha", 0, 1)
    
    reg_pipeline = make_pipeline(
        RobustScaler(),
        Lasso(alpha=alpha, random_state=0)
    )
    
    scores = cross_val_score(reg_pipeline, X=X_train, y=y_train, cv=5, n_jobs=-1, scoring=r2)
    return np.mean(scores)
    
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

[I 2023-06-19 10:20:27,005] A new study created in memory with name: no-name-ae0703b5-e0be-443d-9659-bac7b48ff908
[I 2023-06-19 10:20:27,813] Trial 0 finished with value: -0.09421908554927114 and parameters: {'alpha': 0.9240365338057458}. Best is trial 0 with value: -0.09421908554927114.
[I 2023-06-19 10:20:28,065] Trial 1 finished with value: 0.8427808614646327 and parameters: {'alpha': 0.044760968030146175}. Best is trial 1 with value: 0.8427808614646327.
[I 2023-06-19 10:20:28,311] Trial 2 finished with value: -0.09421908554927114 and parameters: {'alpha': 0.785369871508585}. Best is trial 1 with value: 0.8427808614646327.
[I 2023-06-19 10:20:28,325] Trial 3 finished with value: 0.148268223481041 and parameters: {'alpha': 0.16219048504681532}. Best is trial 1 with value: 0.8427808614646327.
[I 2023-06-19 10:20:28,336] Trial 4 finished with value: 0.45208885301285545 and parameters: {'alpha': 0.12398050591117749}. Best is trial 1 with value: 0.8427808614646327.
[I 2023-06-19 10:20:28

In [None]:
fig = optuna.visualization.plot_optimization_history(study)
fig.show()

In [33]:
reg_pipeline = make_pipeline(
    RobustScaler(),
    Lasso(alpha=study.best_params["alpha"], random_state=0)
)

In [34]:
reg_pipeline.fit(X_train, y_train)

In [35]:
preds = reg_pipeline.predict(X_test)

In [36]:
r2_score(y_test, preds)

0.8879002343301418

In [37]:
study.best_params["alpha"]

0.01732009415547434

We obtain a score of `0.888` on the test set when using $\alpha=0.0173$, which beats the target score of 0.84.
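Since the best $\alpha$ is small compared with the search range $[0, 1]$, a log-spaced search can cover the small values more densely. The sketch below is one way to cross-check the tuned value. It swaps Optuna for scikit-learn's `GridSearchCV` and runs on synthetic data, so the grid bounds, the names `alpha_grid` and `search`, and the data are all assumptions rather than part of the exercise.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic stand-in for the exercise data (assumption).
X_demo, y_demo = make_regression(
    n_samples=100, n_features=100, n_informative=10, noise=5.0, random_state=0
)

# 20 candidate values spaced evenly on a log scale between 1e-4 and 1.
alpha_grid = np.logspace(-4, 0, 20)

pipeline = make_pipeline(RobustScaler(), Lasso(random_state=0, max_iter=10_000))

# make_pipeline names the step "lasso", hence the "lasso__alpha" key.
search = GridSearchCV(
    pipeline, param_grid={"lasso__alpha": alpha_grid}, cv=5, scoring="r2"
)
search.fit(X_demo, y_demo)

best_alpha = search.best_params_["lasso__alpha"]
print(f"Best alpha: {best_alpha:.5f}, mean CV R2: {search.best_score_:.3f}")
```

With Optuna itself, a comparable effect can be obtained by passing `log=True` to `trial.suggest_float` with a strictly positive lower bound.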

### ElasticNet

Let's compare Lasso with ElasticNet. We will go through the same steps as before: we first evaluate an untuned baseline ($\alpha = 0.001$, l1_ratio $= 0.5$), then use Optuna to find the best hyperparameters.

In [38]:
elasticnet_pipeline = make_pipeline(
    RobustScaler(),
    ElasticNet(alpha=0.001, l1_ratio=0.5, random_state=0)
)

In [39]:
scores = cross_val_score(elasticnet_pipeline, X_train, y_train, cv=5, scoring=r2)
print(f"Cross-validation score: {np.mean(scores)}")

Cross-validation score: 0.4797631802400579


In [43]:
elasticnet_pipeline.fit(X_train, y_train)
preds = elasticnet_pipeline.predict(X_test)
print(f"Test score: {r2_score(y_test, preds)}")

Test score: 0.47618853099249825


We get a score of `0.476`, which is worse than the Lasso baseline with the same $\alpha$. Let's see if we can do better with Optuna.

In [50]:
def objective_elasticnet(trial):
    alpha = trial.suggest_float("alpha", 0, 1)
    l1_ratio = trial.suggest_float("l1_ratio", 0, 1)
    
    elasticnet_pipeline = make_pipeline(
        RobustScaler(),
        ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=0)
    )
    
    scores = cross_val_score(elasticnet_pipeline, X=X_train, y=y_train, cv=5, n_jobs=-1, scoring=r2)
    return np.mean(scores)
    
study = optuna.create_study(direction='maximize')
study.optimize(objective_elasticnet, n_trials=100)

[I 2023-06-19 10:31:26,390] A new study created in memory with name: no-name-d983a317-8948-4b2d-a978-613a9fc5df9a
[I 2023-06-19 10:31:26,422] Trial 0 finished with value: 0.8694273524778822 and parameters: {'alpha': 0.029429479931544655, 'l1_ratio': 0.7880147131849551}. Best is trial 0 with value: 0.8694273524778822.
[I 2023-06-19 10:31:26,439] Trial 1 finished with value: -0.08987732956146344 and parameters: {'alpha': 0.26952761067480946, 'l1_ratio': 0.8806677990998693}. Best is trial 0 with value: 0.8694273524778822.
[I 2023-06-19 10:31:26,452] Trial 2 finished with value: 0.46215917995926203 and parameters: {'alpha': 0.3192460929789812, 'l1_ratio': 0.27580457715177653}. Best is trial 0 with value: 0.8694273524778822.
[I 2023-06-19 10:31:26,467] Trial 3 finished with value: -0.09421908554927114 and parameters: {'alpha': 0.34838617610248457, 'l1_ratio': 0.7938001406362594}. Best is trial 0 with value: 0.8694273524778822.
[I 2023-06-19 10:31:26,493] Trial 4 finished with value: 0.45943

In [51]:
elasticnet_pipeline = make_pipeline(
    RobustScaler(),
    ElasticNet(alpha=study.best_params["alpha"], l1_ratio=study.best_params["l1_ratio"], random_state=0)
)

In [52]:
scores = cross_val_score(elasticnet_pipeline, X_train, y_train, cv=5, scoring=r2)
print(f"Cross-validation score: {np.mean(scores)}")

Cross-validation score: 0.8733447777571737


In [53]:
elasticnet_pipeline.fit(X_train, y_train)
preds = elasticnet_pipeline.predict(X_test)
print(f"Test score: {r2_score(y_test, preds)}")

Test score: 0.8832858723033203


In [54]:
study.best_params

{'alpha': 0.023830447410053968, 'l1_ratio': 0.7786000832455452}

We get much better results, with a score of `0.883` on the test set when using $\alpha=0.0238$ and `l1_ratio` $=0.7786$. Still, Lasso performs slightly better.
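To see why a tuned ElasticNet lands so close to Lasso, it helps to write out the objective that scikit-learn documents for `ElasticNet`. Here $n$ is the number of samples, $w$ the coefficient vector, and $\rho$ stands for `l1_ratio` (the symbol $\rho$ is our notation, not the library's):

$$
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 \;+\; \alpha \rho \lVert w \rVert_1 \;+\; \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2
$$

With $\rho = 1$ the squared $\ell_2$ term vanishes and the objective is exactly the Lasso objective. The tuned value $\rho \approx 0.78$ keeps most of the penalty on the $\ell_1$ term, so the fitted model behaves much like Lasso.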

### Ridge

Finally, let's look at Ridge regression.

In [55]:
ridge_pipeline = make_pipeline(
    RobustScaler(),
    Ridge(alpha=0.001, random_state=0)
)

In [56]:
scores = cross_val_score(ridge_pipeline, X_train, y_train, cv=5, scoring=r2)
print(f"Cross-validation score: {np.mean(scores)}")

Cross-validation score: 0.11581411054273852


In [57]:
ridge_pipeline.fit(X_train, y_train)
preds = ridge_pipeline.predict(X_test)
print(f"Test score: {r2_score(y_test, preds)}")

Test score: -7.605696412327797


The test score is terrible: a negative $R^2$ means the model predicts worse than a constant equal to the mean of the targets. Maybe we can do better with Optuna.
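A negative score is possible because $R^2 = 1 - SS_{res} / SS_{tot}$, where $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the sum of squared deviations of the targets from their mean. The small sketch below recomputes this by hand on toy values (the helper `r2_manual` and the arrays are illustrative, not part of the exercise):

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # squared residuals
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # spread around the mean
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
mean_pred = np.full_like(y_true, y_true.mean())  # always predict the mean
bad_pred = np.array([4.0, 3.0, 2.0, 1.0])        # predictions in reverse order

print(r2_manual(y_true, y_true))     # 1.0: perfect predictions
print(r2_manual(y_true, mean_pred))  # 0.0: no better than the mean
print(r2_manual(y_true, bad_pred))   # -3.0: worse than the mean
```

Any model whose squared error exceeds that of the mean predictor gets a negative score, and there is no lower bound.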

In [58]:
def objective_ridge(trial):
    alpha = trial.suggest_float("alpha", 0, 1)
    
    ridge_pipeline = make_pipeline(
        RobustScaler(),
        Ridge(alpha=alpha, random_state=0)
    )
    
    scores = cross_val_score(ridge_pipeline, X=X_train, y=y_train, cv=5, n_jobs=-1, scoring=r2)
    return np.mean(scores)

study = optuna.create_study(direction='maximize')
study.optimize(objective_ridge, n_trials=100)

[I 2023-06-19 10:35:44,191] A new study created in memory with name: no-name-89367111-f793-4864-96ff-363283ba2cb7
[I 2023-06-19 10:35:44,233] Trial 0 finished with value: 0.34683022009308717 and parameters: {'alpha': 0.5654800370063418}. Best is trial 0 with value: 0.34683022009308717.
[I 2023-06-19 10:35:44,281] Trial 1 finished with value: 0.22393001783322627 and parameters: {'alpha': 0.17325013793994415}. Best is trial 0 with value: 0.34683022009308717.
[I 2023-06-19 10:35:44,300] Trial 2 finished with value: 0.372627018640712 and parameters: {'alpha': 0.7042851471082987}. Best is trial 2 with value: 0.372627018640712.
[I 2023-06-19 10:35:44,312] Trial 3 finished with value: 0.3781900704134939 and parameters: {'alpha': 0.7385267522770584}. Best is trial 3 with value: 0.3781900704134939.
[I 2023-06-19 10:35:44,324] Trial 4 finished with value: 0.3566811703197168 and parameters: {'alpha': 0.6149391142790094}. Best is trial 3 with value: 0.3781900704134939.
[I 2023-06-19 10:35:44,336] 

In [59]:
ridge_pipeline = make_pipeline(
    RobustScaler(),
    Ridge(alpha=study.best_params["alpha"], random_state=0)
)

In [60]:
scores = cross_val_score(ridge_pipeline, X_train, y_train, cv=5, scoring=r2)
print(f"Cross-validation score: {np.mean(scores)}")

Cross-validation score: 0.4130635995830314


In [61]:
ridge_pipeline.fit(X_train, y_train)
preds = ridge_pipeline.predict(X_test)
print(f"Test score: {r2_score(y_test, preds)}")

Test score: 0.5405781459312688


In [62]:
study.best_params

{'alpha': 0.999943284101946}

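Optimizing $\alpha$ with Optuna gives a test score of `0.54`, which is not enough to beat Lasso or the target score of 0.84. Note that the best value, $\alpha \approx 1.0$, sits at the upper bound of the search range $[0, 1]$, so a wider range might improve Ridge further.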

### Conclusion

Lasso performs the best, with a score of `0.888` on the test set. ElasticNet is close behind with a score of `0.883`. Ridge is not good enough, with a score of `0.54`. A likely reason why Lasso and ElasticNet perform better than Ridge is that we have as many dimensions (100) as samples (100): the $\ell_1$ penalty can set coefficients exactly to zero and thus discard uninformative features, whereas Ridge only shrinks them. Also, the tuned ElasticNet puts most of its penalty on the $\ell_1$ term (l1_ratio $\approx 0.78$), so it is not surprising that it performs almost as well as Lasso.
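The difference between the two penalties can be illustrated by counting nonzero coefficients. The sketch below does this on synthetic data where only 10 of 100 features are informative. The data, the choice $\alpha = 1$, and the variable names are assumptions made for illustration, and nothing here is measured on the exercise data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 100 samples, 100 features, only 10 of them informative.
X_demo, y_demo = make_regression(
    n_samples=100, n_features=100, n_informative=10, noise=5.0, random_state=0
)

lasso = Lasso(alpha=1.0, max_iter=10_000).fit(X_demo, y_demo)
ridge = Ridge(alpha=1.0).fit(X_demo, y_demo)

# The L1 penalty drives some coefficients exactly to zero; L2 only shrinks them.
n_lasso_nonzero = int(np.count_nonzero(lasso.coef_))
n_ridge_nonzero = int(np.count_nonzero(ridge.coef_))

print(f"Nonzero Lasso coefficients: {n_lasso_nonzero} / {lasso.coef_.size}")
print(f"Nonzero Ridge coefficients: {n_ridge_nonzero} / {ridge.coef_.size}")
```

Lasso keeps only a subset of the features, while Ridge typically leaves every coefficient nonzero.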