# Regularized (Penalized) Regression Models: Lasso Regression

Another regularization method is Lasso regression.

It differs from Ridge regression in its penalty term: Lasso uses the sum of the absolute values of the coefficients (L1 regularization).
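For reference, the two penalties can be written side by side. The notation here is introduced only for illustration ($n$ samples, feature matrix $X$, target vector $y$, coefficient vector $w$); the form of the objectives follows the scikit-learn documentation for `Ridge` and `Lasso`, with $\alpha$ being the `alpha` parameter.

$$
\text{Ridge:}\quad \min_{w}\ \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2,
\qquad \lVert w \rVert_2^2 = \sum_j w_j^2
$$

$$
\text{Lasso:}\quad \min_{w}\ \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1,
\qquad \lVert w \rVert_1 = \sum_j \lvert w_j \rvert
$$

Because the data-fit term is scaled differently in the two objectives, the same numerical value of `alpha` does not correspond to the same strength of regularization in both models.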

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn import preprocessing
from sklearn.model_selection import cross_val_score

# ignore warning message
import warnings
warnings.filterwarnings('ignore')

df= pd.read_csv('data/winequality-red.csv',sep=';')
# drop result value
df1 = df.drop(columns='quality')
y = df['quality'].values.reshape(-1,1)
scaler = preprocessing.StandardScaler()
# regularization parameter
alpha = 2 ** (-16)
X = df1.values
X_fit = scaler.fit_transform(X)
# result storage: collect one row per alpha and build the DataFrames after the loop
# (DataFrame.append was removed in pandas 2.0)
ridge_coeff_rows, ridge_result_rows = [], []
lasso_coeff_rows, lasso_result_rows = [], []
while alpha <= 2 ** 12:
    # Ridge
    model_ridge = linear_model.Ridge(alpha=alpha)
    model_ridge.fit(X_fit, y)
    mse_ridge = mean_squared_error(y, model_ridge.predict(X_fit))  # training error
    scores_ridge = cross_val_score(model_ridge, X_fit, y, scoring="neg_mean_squared_error", cv=10)
    ridge_coeff_rows.append(np.ravel(model_ridge.coef_))  # flatten: y is 2-D, so coef_ is (1, n_features)
    ridge_result_rows.append([alpha, mse_ridge, -scores_ridge.mean()])
    # Lasso
    model_lasso = linear_model.Lasso(alpha=alpha)
    model_lasso.fit(X_fit, y)
    mse_lasso = mean_squared_error(y, model_lasso.predict(X_fit))  # training error
    scores_lasso = cross_val_score(model_lasso, X_fit, y, scoring="neg_mean_squared_error", cv=10)
    lasso_coeff_rows.append(np.ravel(model_lasso.coef_))
    lasso_result_rows.append([alpha, mse_lasso, -scores_lasso.mean()])
    alpha = alpha * 2

df_ridge_coeff = pd.DataFrame(ridge_coeff_rows, columns=df1.columns)  # regression coefficients for Ridge
df_ridge_result = pd.DataFrame(ridge_result_rows, columns=['alpha','TrainErr','TestErr'])
df_lasso_coeff = pd.DataFrame(lasso_coeff_rows, columns=df1.columns)  # regression coefficients for Lasso
df_lasso_result = pd.DataFrame(lasso_result_rows, columns=['alpha','TrainErr','TestErr'])

In [2]:
print('====================== Ridge Regression ======================')
print(df_ridge_result)
print('====================== Lasso Regression ======================')
print(df_lasso_result)

          alpha  TrainErr   TestErr
0      0.000015  0.416767  0.435185
1      0.000031  0.416767  0.435185
2      0.000061  0.416767  0.435185
3      0.000122  0.416767  0.435185
4      0.000244  0.416767  0.435185
5      0.000488  0.416767  0.435185
6      0.000977  0.416767  0.435185
7      0.001953  0.416767  0.435185
8      0.003906  0.416767  0.435185
9      0.007812  0.416767  0.435185
10     0.015625  0.416767  0.435184
11     0.031250  0.416767  0.435184
12     0.062500  0.416767  0.435182
13     0.125000  0.416767  0.435180
14     0.250000  0.416767  0.435175
15     0.500000  0.416767  0.435165
16     1.000000  0.416767  0.435146
17     2.000000  0.416768  0.435107
18     4.000000  0.416769  0.435033
19     8.000000  0.416774  0.434894
20    16.000000  0.416793  0.434649
21    32.000000  0.416863  0.434265
22    64.000000  0.417102  0.433799
23   128.000000  0.417864  0.433617
24   256.000000  0.420109  0.434870
25   512.000000  0.426075  0.440302
26  1024.000000  0.439988  0

For both models, the training error is lower when the regularization parameter (alpha) is smaller.

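However, the test error (here, the mean 10-fold cross-validation error) does not simply follow the regularization parameter. In the Ridge output above it decreases slightly up to alpha = 128 (0.433617) and then rises again.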
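Since the cross-validation error is not monotonic in alpha, one natural follow-up is to pick the alpha with the smallest cross-validation error. The following is a minimal sketch of that idea, not part of the original notebook. It uses synthetic stand-in data so that it runs without the wine CSV; the names `X_demo`, `y_demo`, and `cv_result` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption): 200 samples, 5 features,
# of which only the first two influence the target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

# Same grid as in the text: alpha doubles from 2**-16 up to 2**12.
alphas = [2.0 ** k for k in range(-16, 13)]

rows = []
for a in alphas:
    scores = cross_val_score(Lasso(alpha=a), X_demo, y_demo,
                             scoring="neg_mean_squared_error", cv=10)
    # The scorer returns negative MSE, so flip the sign to get an error.
    rows.append({"alpha": a, "TestErr": -scores.mean()})
cv_result = pd.DataFrame(rows)

# Row with the smallest mean cross-validation error.
best = cv_result.loc[cv_result["TestErr"].idxmin()]
print(best)
```

With the notebook's own tables, the same selection would be `df_lasso_result.loc[df_lasso_result['TestErr'].idxmin()]`.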

In [3]:
print('====================== Ridge Regression ======================')
df_ridge_coeff



| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.043497 | -0.193967 | -0.035553 | 0.023019 | -0.088183 | 0.045606 | -0.107356 | -0.033737 | -0.063842 | 0.155277 | 0.294243 |
| 1 | 0.043497 | -0.193967 | -0.035553 | 0.023019 | -0.088183 | 0.045606 | -0.107356 | -0.033737 | -0.063842 | 0.155277 | 0.294243 |
| 2 | 0.043497 | -0.193967 | -0.035553 | 0.023019 | -0.088183 | 0.045606 | -0.107356 | -0.033737 | -0.063842 | 0.155276 | 0.294243 |
| 3 | 0.043497 | -0.193967 | -0.035553 | 0.023019 | -0.088183 | 0.045606 | -0.107356 | -0.033737 | -0.063842 | 0.155276 | 0.294243 |
| 4 | 0.043497 | -0.193967 | -0.035552 | 0.023019 | -0.088183 | 0.045606 | -0.107356 | -0.033737 | -0.063842 | 0.155276 | 0.294243 |
| 5 | 0.043497 | -0.193967 | -0.035552 | 0.023019 | -0.088183 | 0.045606 | -0.107356 | -0.033737 | -0.063842 | 0.155276 | 0.294243 |
| 6 | 0.043498 | -0.193966 | -0.035552 | 0.023019 | -0.088183 | 0.045606 | -0.107356 | -0.033738 | -0.063842 | 0.155276 | 0.294242 |
| 7 | 0.043498 | -0.193966 | -0.035552 | 0.023019 | -0.088183 | 0.045606 | -0.107356 | -0.033738 | -0.063842 | 0.155276 | 0.294242 |
| 8 | 0.043498 | -0.193966 | -0.035551 | 0.023019 | -0.088183 | 0.045606 | -0.107355 | -0.033739 | -0.063841 | 0.155276 | 0.294241 |
| 9 | 0.0435 | -0.193965 | -0.03555 | 0.02302 | -0.088183 | 0.045605 | -0.107355 | -0.033741 | -0.06384 | 0.155276 | 0.29424 |


In [4]:
print('====================== Lasso Regression ======================')
df_lasso_coeff



| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.043368 | -0.193945 | -0.035475 | 0.022966 | -0.088174 | 0.045574 | -0.107327 | -0.033629 | -0.063855 | 0.155241 | 0.294283 |
| 1 | 0.043238 | -0.193924 | -0.035397 | 0.022913 | -0.088165 | 0.045542 | -0.107297 | -0.033521 | -0.063867 | 0.155206 | 0.294323 |
| 2 | 0.042979 | -0.193882 | -0.035242 | 0.022808 | -0.088147 | 0.045477 | -0.107239 | -0.033305 | -0.063892 | 0.155136 | 0.294403 |
| 3 | 0.042459 | -0.193797 | -0.034932 | 0.022597 | -0.088111 | 0.045349 | -0.107123 | -0.032872 | -0.063942 | 0.154996 | 0.294563 |
| 4 | 0.041418 | -0.193627 | -0.034312 | 0.022174 | -0.088039 | 0.045092 | -0.10689 | -0.032005 | -0.064042 | 0.154715 | 0.294884 |
| 5 | 0.039335 | -0.193288 | -0.033071 | 0.021328 | -0.087896 | 0.044578 | -0.106424 | -0.030269 | -0.064244 | 0.154153 | 0.295526 |
| 6 | 0.035169 | -0.19261 | -0.03059 | 0.019636 | -0.087609 | 0.043551 | -0.105494 | -0.026798 | -0.064648 | 0.153029 | 0.296811 |
| 7 | 0.026669 | -0.191255 | -0.025623 | 0.016204 | -0.087053 | 0.041512 | -0.103652 | -0.019715 | -0.065552 | 0.15077 | 0.299455 |
| 8 | 0.009667 | -0.188545 | -0.01569 | 0.009339 | -0.085942 | 0.037435 | -0.099968 | -0.005547 | -0.067361 | 0.146252 | 0.304743 |
| 9 | 0.0 | -0.183183 | -0.0 | 0.002591 | -0.081657 | 0.027684 | -0.090231 | -0.0 | -0.060154 | 0.139798 | 0.304033 |


Finally, let us look at the regression coefficients.

As alpha increases, Ridge regression shrinks the regression coefficients little by little.

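Lasso regression, by contrast, can set the coefficients of explanatory variables with little influence to exactly 0. In row 9 of the Lasso table above (alpha = 2^-7, about 0.0078), the coefficients of fixed acidity, citric acid, and density are already 0.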

Compared with Ridge regression, Lasso regression can therefore be interpreted as performing variable selection while fitting the linear model.
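The variable-selection behaviour can be checked directly by counting coefficients that are exactly zero. The sketch below is illustrative only: it uses synthetic data in which 3 of 10 features are informative, and the names `X_demo`, `y_demo`, and `true_coef`, as well as the choice `alpha=0.2`, are assumptions made for this example rather than values from the wine data.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumption): 10 features, only the first 3 have a
# nonzero true coefficient; the remaining 7 are pure noise features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=200)

# Fit both models with the same alpha (not the same effective strength,
# since the two objectives scale the data-fit term differently).
ridge = Ridge(alpha=0.2).fit(X_demo, y_demo)
lasso = Lasso(alpha=0.2).fit(X_demo, y_demo)

# Count coefficients that are exactly zero in each model.
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
n_zero_lasso = int(np.sum(lasso.coef_ == 0))

print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("exact zeros -> Ridge:", n_zero_ridge, "Lasso:", n_zero_lasso)
```

In the notebook itself, the analogous check would be `(df_lasso_coeff == 0).sum(axis=1)`, which gives the number of zeroed coefficients for each alpha.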