## Import Libraries
We begin by importing the libraries we need: NumPy, SciPy, and scikit-learn, plus time from the Python standard library to measure how long each fit takes.

In [7]:
import numpy as np
from time import time
from scipy import sparse
from scipy import linalg
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

## Generate Dense Data
Next, we generate some dense data that we will use for the Lasso regression. We use Scikit-learn's make_regression function to generate 200 samples with 5000 features.

In [8]:
X, y = make_regression(
    n_samples=200,       # number of samples
    n_features=5000,     # number of features
    random_state=0
)

## Train Lasso on Dense Data
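Before training, we build a sparse version of the data: entries whose absolute value is below 2.5 are set to 0, and the result is stored in SciPy's Compressed Sparse Column (CSC) format. We then create two Lasso models, one for the sparse matrix and one for the dense array, with alpha set to 1 and the maximum number of iterations set to 1000.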

In [None]:
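# 2. Set entries whose absolute value is below the threshold to 0 to create sparsity
Xs = X.copy()
Xs[np.abs(Xs) < 2.5] = 0.0          # give the matrix a large number of zeros
X_sp = sparse.csc_matrix(Xs)        # convert to SciPy sparse (CSC) format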

In [11]:
# 3. Create the two Lasso models
alpha = 1
sparse_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=1000)
dense_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=1000)
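
For reference, the scikit-learn documentation states the objective that Lasso minimizes as shown here. In this notation (ours, not the notebook's), $n$ is the number of samples, $w$ is the coefficient vector, and $\alpha$ is the alpha parameter set above. With fit_intercept=False, no intercept term is fitted.

$$
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

A larger alpha penalizes the coefficients more strongly and tends to drive more of them to exactly zero.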

## Fit Lasso to Dense Data
We fit sparse_lasso to the sparse matrix X_sp and dense_lasso to the original dense array X using scikit-learn's fit method. We time each fit and print the elapsed time.

In [12]:
# 4. Fit and time: sparse version
t0 = time()
sparse_lasso.fit(X_sp, y)
print(f"Sparse Lasso done in {time() - t0:.3f}s")

# 5. Fit and time: dense version
t0 = time()
dense_lasso.fit(X, y)
print(f"Dense  Lasso done in {time() - t0:.3f}s")

Sparse Lasso done in 0.053s
Dense  Lasso done in 0.045s


## Compare Coefficients of Dense Lasso and Sparse Lasso
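We compare the coefficients of the two models by computing the Euclidean norm of their difference. Note that sparse_lasso was fit on X_sp, which was built from the thresholded copy Xs, while dense_lasso was fit on the unmodified X. The two models therefore saw different inputs, and a large distance is expected here.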

In [13]:
coeff_diff = linalg.norm(sparse_lasso.coef_ - dense_lasso.coef_)
print(f"Distance between coefficients : {coeff_diff:.2e}")

Distance between coefficients : 3.57e+02
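
A distance this large is consistent with the two models having been fit on different inputs. The following is a minimal sketch of how one might confirm that a thresholded sparse matrix no longer holds the same values as the original array. It uses a small random matrix (X_demo, a hypothetical stand-in for X) so that it runs on its own.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 500))    # hypothetical stand-in for X

Xs_demo = X_demo.copy()
Xs_demo[np.abs(Xs_demo) < 2.5] = 0.0        # same thresholding rule as in step 2
X_sp_demo = sparse.csc_matrix(Xs_demo)

# Convert the sparse matrix back to a dense array and compare it with the original.
same_data = np.allclose(X_sp_demo.toarray(), X_demo)
n_changed = np.count_nonzero(X_sp_demo.toarray() != X_demo)

print(f"Same data: {same_data}")
print(f"Entries changed by thresholding: {n_changed} of {X_demo.size}")
```

If the goal is to compare the solvers rather than the datasets, both models need to be fit on the same values, once as a dense array and once as a sparse matrix.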


## Generate Sparse Data
Next, we generate sparse data for the Lasso regression. We copy the dense data from the previous step and set every entry smaller than 2.5 (including all negative values) to 0. We then convert the result to SciPy's Compressed Sparse Column (CSC) format.

In [15]:
Xs = X.copy()
Xs[Xs < 2.5] = 0.0
Xs_sp = sparse.coo_matrix(Xs)
Xs_sp = Xs_sp.tocsc()
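
It can be useful to check how sparse the thresholded matrix actually is, since the benefit of a sparse format depends on the fraction of stored non-zero entries. The sketch below defines a small helper, density (a name introduced here for illustration), and applies it to a random stand-in matrix rather than to the notebook's Xs_sp.

```python
import numpy as np
from scipy import sparse


def density(matrix):
    """Return the fraction of stored non-zero entries in a SciPy sparse matrix."""
    n_rows, n_cols = matrix.shape
    return matrix.nnz / (n_rows * n_cols)


rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 500))    # hypothetical stand-in for X

Xs_demo = X_demo.copy()
Xs_demo[Xs_demo < 2.5] = 0.0                # same rule as above: keep only values >= 2.5
Xs_demo_sp = sparse.csc_matrix(Xs_demo)     # building from a dense array stores only non-zeros

print(f"Density: {density(Xs_demo_sp):.4%}")
```

In the notebook itself, the same helper could be called as density(Xs_sp).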

## Train Lasso on Sparse Data
We now create two Lasso regression models, one to be fit on the sparse representation and one on the dense array of the same thresholded data. We set the alpha parameter to 0.1 and the maximum number of iterations to 10000.

In [16]:
alpha = 0.1
sparse_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
dense_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)

## Fit Lasso to Sparse Data
We fit sparse_lasso to the CSC matrix Xs_sp and dense_lasso to the equivalent dense array Xs using scikit-learn's fit method. We time each fit and print the elapsed time.

In [17]:
t0 = time()
sparse_lasso.fit(Xs_sp, y)
print(f"Sparse Lasso done in {(time() - t0):.3f}s")

t0 = time()
dense_lasso.fit(Xs, y)
print(f"Dense Lasso done in  {(time() - t0):.3f}s")

Sparse Lasso done in 0.213s
Dense Lasso done in  0.949s
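
Single-shot timings such as these can be noisy. One common way to reduce that noise is to repeat each fit a few times and keep the fastest run. The sketch below assumes a helper named best_fit_time (hypothetical, not part of scikit-learn) and a much smaller problem with a lower threshold of 1.0, so that it runs quickly. Its timings are not comparable to the ones printed above.

```python
from time import perf_counter

from scipy import sparse
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso


def best_fit_time(model, X, y, repeats=3):
    """Return the fastest wall-clock fit time in seconds over several runs."""
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        model.fit(X, y)
        timings.append(perf_counter() - start)
    return min(timings)


# Small hypothetical problem; threshold 1.0 is an assumption made for this sketch.
X_demo, y_demo = make_regression(n_samples=50, n_features=200, random_state=0)
X_demo[X_demo < 1.0] = 0.0
X_demo_sp = sparse.csc_matrix(X_demo)

sparse_time = best_fit_time(Lasso(alpha=0.1, fit_intercept=False, max_iter=10000), X_demo_sp, y_demo)
dense_time = best_fit_time(Lasso(alpha=0.1, fit_intercept=False, max_iter=10000), X_demo, y_demo)

print(f"Sparse Lasso best of 3: {sparse_time:.4f}s")
print(f"Dense Lasso best of 3:  {dense_time:.4f}s")
```

perf_counter is used here instead of time because it is a monotonic, high-resolution clock intended for measuring short durations.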


## Compare Coefficients of Dense Lasso and Sparse Lasso
We again compare the coefficients of the dense and sparse Lasso models by computing the Euclidean norm of their difference. This time both models were fit on the same thresholded data, so the distance should be close to zero.

In [18]:
coeff_diff = linalg.norm(sparse_lasso.coef_ - dense_lasso.coef_)
print(f"Distance between coefficients : {coeff_diff:.2e}")

Distance between coefficients : 9.24e-12
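
Besides printing the norm, the agreement can be turned into an explicit check with a tolerance. The sketch below is one way to do that on a small, self-contained problem. The data sizes, the threshold of 1.0, and the tolerance of 1e-6 are all assumptions chosen for illustration, not values from the notebook.

```python
import numpy as np
from scipy import sparse
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Small hypothetical problem so the sketch runs quickly.
X_demo, y_demo = make_regression(n_samples=50, n_features=200, random_state=0)
X_demo[X_demo < 1.0] = 0.0                  # threshold chosen for this sketch only
X_demo_sp = sparse.csc_matrix(X_demo)

dense_model = Lasso(alpha=0.1, fit_intercept=False, max_iter=10000).fit(X_demo, y_demo)
sparse_model = Lasso(alpha=0.1, fit_intercept=False, max_iter=10000).fit(X_demo_sp, y_demo)

# Absolute and relative distance between the two coefficient vectors.
abs_diff = np.linalg.norm(sparse_model.coef_ - dense_model.coef_)
rel_diff = abs_diff / max(np.linalg.norm(dense_model.coef_), 1e-12)

# Element-wise agreement within an assumed absolute tolerance.
coefs_match = np.allclose(sparse_model.coef_, dense_model.coef_, atol=1e-6)

print(f"Absolute distance: {abs_diff:.2e}")
print(f"Relative distance: {rel_diff:.2e}")
print(f"Coefficients match within tolerance: {coefs_match}")
```

In the notebook, the same check would read np.allclose(sparse_lasso.coef_, dense_lasso.coef_, atol=1e-6).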


## Summary
In this lab, we demonstrated scikit-learn's Lasso regression on dense and sparse data. When both models were fit on the same thresholded data, the dense and sparse inputs produced matching coefficients (distance 9.24e-12), and the fit on the sparse representation was faster (0.213s versus 0.949s).