### 1. Overfitting
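The goal of modeling is for the model to learn the general patterns in the data. If the model is too complex, however, it tends to over-learn and pick up the noise in the data: it performs well on the training set but usually worse on the test set. The model is then said to be **overfitting**. Some synthetic data is generated below to demonstrate this: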

In [1]:
import os
os.chdir('../')
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
#Generate 200 samples: the first 100 for training, the last 100 for testing
X=np.random.random(size=(200,20))
tw=np.random.random(size=(20,1))*100#true weights w
tb=np.random.random()#true bias b
Y=X.dot(tw)+tb
X=X+np.random.random(size=(200,20))*0.5#add noise to X

In [3]:
from ml_models.linear_model import *
lr=LinearRegression()
lr.fit(X[:100],Y[:100])

In [4]:
#Check the error on the training set and the test set
'train std:',np.std(Y[:100]-lr.predict(X[:100])),'test std:',np.std(Y[100:]-lr.predict(X[100:]))

('train std:', 38.92647827774521, 'test std:', 43.58590001127093)

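### 2. Regularization
As seen above, the error on the test set is noticeably higher than on the training set. Regularization is a common way to keep a model from overfitting; the usual approach is to add an $L_1$ or $L_2$ penalty on the weights $w$ to the loss function. Reusing the derivation from the previous section, only the loss is shown here:  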

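1. Linear regression with an $L_1$ penalty is called Lasso regression. Its loss function is:  
$$
L(w)=\sum_{i=1}^m(y_i-f(x_i))^2+\lambda||w||_1
$$  
2. Linear regression with an $L_2$ penalty is called Ridge regression. Its loss function uses the squared $L_2$ norm, whose gradient $2\alpha w$ is the one used in the code below:  
$$
L(w)=\sum_{i=1}^m(y_i-f(x_i))^2+\alpha||w||_2^2
$$ 
3. If it is unclear whether $L_1$ or $L_2$ works better, the two can be combined. This is called ElasticNet, with the loss function:  
$$
L(w)=\sum_{i=1}^m(y_i-f(x_i))^2+\lambda||w||_1+\alpha||w||_2^2
$$ 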
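As a small worked example, the three losses can be evaluated directly with NumPy. This is only a sketch under stated assumptions: the helper name `penalized_losses` is hypothetical, the model is taken as $f(x)=Xw$ without an intercept, and the $L_2$ term is the squared norm (which matches the `2 * l2_ratio * w` gradient in the implementation further down).

```python
import numpy as np


def penalized_losses(X, y, w, lam=0.1, alpha=0.1):
    """Return (lasso, ridge, elastic_net) losses for f(x) = X @ w (hypothetical helper)."""
    residual = y - X @ w
    sse = float(np.sum(residual ** 2))   # sum of squared errors
    l1 = float(np.sum(np.abs(w)))        # ||w||_1
    l2_sq = float(np.sum(w ** 2))        # ||w||_2^2 (assumed squared norm)
    return sse + lam * l1, sse + alpha * l2_sq, sse + lam * l1 + alpha * l2_sq


# Tiny hand-checkable example: residuals are [3, 5], so the squared error is 34
X_demo = np.array([[1.0, 2.0], [3.0, 4.0]])
w_demo = np.array([[1.0], [-2.0]])
y_demo = np.array([[0.0], [0.0]])
lasso_loss, ridge_loss, enet_loss = penalized_losses(
    X_demo, y_demo, w_demo, lam=0.5, alpha=0.25)
print(lasso_loss, ridge_loss, enet_loss)
```

With $\lambda=0.5$ and $\alpha=0.25$ the penalties add $0.5\cdot 3$ and $0.25\cdot 5$ to the squared error of 34.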
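By tuning the hyperparameters we can control the magnitude of $w$: if $\lambda$ or $\alpha$ is set very large, $w$ is constrained to be very small, while setting them to 0 recovers the original linear regression without a regularization term. A suitable value is usually chosen by cross-validation, based on performance on the validation set. Next, Lasso, Ridge and ElasticNet are implemented on top of the linear regression code from the previous section, with two additional parameters, `l1_ratio` and `l2_ratio`, that control the weights of the $L_1$ and $L_2$ parts of the loss respectively.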
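The cross-validation step mentioned above might look roughly like the following. It is a minimal sketch, not part of the original code: `ridge_fit` and `pick_alpha_by_cv` are hypothetical helpers, the model is a closed-form ridge without an intercept, and the candidate values and synthetic data are made up for illustration.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge weights without intercept: (X^T X + alpha I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def pick_alpha_by_cv(X, y, alphas, n_splits=5, seed=0):
    """Return the alpha with the lowest mean validation MSE over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(X.shape[0]), n_splits)
    mean_errors = []
    for alpha in alphas:
        fold_errors = []
        for k in range(n_splits):
            val_idx = folds[k]
            train_idx = np.concatenate(
                [folds[j] for j in range(n_splits) if j != k])
            w = ridge_fit(X[train_idx], y[train_idx], alpha)
            fold_errors.append(np.mean((y[val_idx] - X[val_idx] @ w) ** 2))
        mean_errors.append(float(np.mean(fold_errors)))
    return alphas[int(np.argmin(mean_errors))], mean_errors


# Synthetic data for illustration only
rng = np.random.default_rng(0)
X_cv = rng.random((100, 20))
y_cv = X_cv @ (rng.random(20) * 100) + rng.normal(scale=5.0, size=100)
candidate_alphas = [0.001, 0.01, 0.1, 1.0, 10.0]
best_alpha, cv_errors = pick_alpha_by_cv(X_cv, y_cv, candidate_alphas)
print(best_alpha, cv_errors)
```

The same loop could wrap the `LinearRegression` class below by swapping `ridge_fit` for a `fit`/`predict` pair.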
### 3. Implementation

In [5]:
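from ml_models import utils  # provides the scalar sign() helper used below


class LinearRegression(object):
    def __init__(self, fit_intercept=True, solver='sgd', if_standard=True, epochs=10, eta=1e-2, batch_size=1,
                 l1_ratio=None, l2_ratio=None):
        """
        :param fit_intercept: whether to fit a bias term
        :param solver: 'sgd' or 'closed_form'
        :param if_standard: whether to standardize the features before fitting
        :param epochs: number of passes over the training data (sgd only)
        :param eta: learning rate (sgd only)
        :param batch_size: mini-batch size (sgd only)
        :param l1_ratio: weight of the L1 penalty; None disables it
        :param l2_ratio: weight of the L2 penalty; None disables it
        """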
        self.w = None
        self.fit_intercept = fit_intercept
        self.solver = solver
        self.if_standard = if_standard
        if if_standard:
            self.feature_mean = None
            self.feature_std = None
        self.epochs = epochs
        self.eta = eta
        self.batch_size = batch_size
        self.l1_ratio = l1_ratio
        self.l2_ratio = l2_ratio
        # vectorize the scalar sign function so it applies element-wise
        self.sign_func = np.vectorize(utils.sign)

    def init_params(self, n_features):
        """
        Initialize the parameters
        :return:
        """
        self.w = np.random.random(size=(n_features, 1))

    def _fit_closed_form_solution(self, x, y):
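        """
        Solve directly with the closed-form solution
        :param x:
        :param y:
        :return:
        """
        if self.l1_ratio is None and self.l2_ratio is None:
            # ordinary least squares via the pseudo-inverse
            self.w = np.linalg.pinv(x).dot(y)
        elif self.l1_ratio is None and self.l2_ratio is not None:
            # ridge: w = (X^T X + l2_ratio * I)^-1 X^T y (the bias column, if present, is penalized too)
            self.w = np.linalg.inv(x.T.dot(x) + self.l2_ratio * np.eye(x.shape[1])).dot(x.T).dot(y)
        else:
            # the L1 penalty has no closed-form solution, so fall back to SGD
            self._fit_sgd(x, y)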

    def _fit_sgd(self, x, y):
        """
        Solve with mini-batch stochastic gradient descent; epochs, eta and batch_size are taken from the instance
        :param x:
        :param y:
        :return:
        """
        x_y = np.c_[x, y]
        # update w (bias included) one mini-batch at a time
        for _ in range(self.epochs):
            np.random.shuffle(x_y)
            for index in range(x_y.shape[0] // self.batch_size):
                batch_x_y = x_y[self.batch_size * index:self.batch_size * (index + 1)]
                batch_x = batch_x_y[:, :-1]
                batch_y = batch_x_y[:, -1:]

                dw = -2 * batch_x.T.dot(batch_y - batch_x.dot(self.w)) / self.batch_size

                # add the L1 and L2 penalty gradients; the bias (last row, only when fit_intercept=True) is not penalized
                n_reg = x.shape[1] - 1 if self.fit_intercept else x.shape[1]
                dw_reg = np.zeros(shape=(x.shape[1], 1))
                if self.l1_ratio is not None:
                    dw_reg[:n_reg] += self.l1_ratio * self.sign_func(self.w[:n_reg]) / self.batch_size
                if self.l2_ratio is not None:
                    dw_reg[:n_reg] += 2 * self.l2_ratio * self.w[:n_reg] / self.batch_size
                dw += dw_reg
                self.w = self.w - self.eta * dw

    def fit(self, x, y):
        # whether to standardize the features
        if self.if_standard:
            self.feature_mean = np.mean(x, axis=0)
            self.feature_std = np.std(x, axis=0) + 1e-8
            x = (x - self.feature_mean) / self.feature_std
        # whether to fit a bias term
        if self.fit_intercept:
            x = np.c_[x, np.ones_like(y)]
        # initialize the parameters
        self.init_params(x.shape[1])
        # train the model
        if self.solver == 'closed_form':
            self._fit_closed_form_solution(x, y)
        elif self.solver == 'sgd':
            self._fit_sgd(x, y)

    def get_params(self):
        """
        Return the coefficients on the original (unstandardized) feature scale
        :return: w,b
        """
        if self.fit_intercept:
            w = self.w[:-1]
            b = self.w[-1]
        else:
            w = self.w
            b = 0
        if self.if_standard:
            w = w / self.feature_std.reshape(-1, 1)
            b = b - w.T.dot(self.feature_mean.reshape(-1, 1))
        return w.reshape(-1), b

    def predict(self, x):
        """
        :param x: ndarray of shape m x n
        :return: m x 1
        """
        if self.if_standard:
            x = (x - self.feature_mean) / self.feature_std
        if self.fit_intercept:
            x = np.c_[x, np.ones(shape=x.shape[0])]
        return x.dot(self.w)

    def plot_fit_boundary(self, x, y):
        """
        Plot the fitted result
        :param x:
        :param y:
        :return:
        """
        plt.scatter(x[:, 0], y)
        plt.plot(x[:, 0], self.predict(x), 'r')
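
The penalty part of the mini-batch update in `_fit_sgd` can be looked at in isolation to see how it shrinks the weights. The function below is a simplified sketch, not the class method itself: `sgd_step` is a hypothetical name, `np.sign` stands in for the vectorized `utils.sign`, and the last row of `w` is assumed to be the bias, which is left unpenalized.

```python
import numpy as np


def sgd_step(w, batch_x, batch_y, eta=1e-2, l1_ratio=None, l2_ratio=None):
    """One mini-batch update; the last row of w is treated as the bias (assumption)."""
    batch_size = batch_x.shape[0]
    # gradient of the mean squared error on this batch
    dw = -2 * batch_x.T @ (batch_y - batch_x @ w) / batch_size
    reg = np.zeros_like(w)
    if l1_ratio is not None:
        reg[:-1] += l1_ratio * np.sign(w[:-1]) / batch_size   # subgradient of |w|
    if l2_ratio is not None:
        reg[:-1] += 2 * l2_ratio * w[:-1] / batch_size        # gradient of w^2
    return w - eta * (dw + reg)


# An all-zero batch makes the data gradient vanish, so only the penalty acts
w0 = np.array([[1.0], [-1.0], [0.5]])
zero_x = np.zeros((1, 3))
zero_y = np.zeros((1, 1))
w_l1 = sgd_step(w0, zero_x, zero_y, eta=0.1, l1_ratio=1.0)
w_l2 = sgd_step(w0, zero_x, zero_y, eta=0.1, l2_ratio=1.0)
print(w_l1.ravel(), w_l2.ravel())
```

In this toy case both penalties pull the two weights toward zero and leave the bias at 0.5; the $L_1$ step has a fixed size of `eta * l1_ratio`, while the $L_2$ step is proportional to the weight.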

### Lasso

In [6]:
lasso=LinearRegression(l1_ratio=0.01)
lasso.fit(X[:100],Y[:100])
#Check the error on the training set and the test set
'train std:',np.std(Y[:100]-lasso.predict(X[:100])),'test std:',np.std(Y[100:]-lasso.predict(X[100:]))

('train std:', 39.739679305872265, 'test std:', 45.36042539946315)

In [7]:
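#Compare with sklearn
from sklearn.linear_model import Lasso
lasso=Lasso()
lasso.fit(X[:100],Y[:100])
#flatten both sides: sklearn's Lasso can return 1-D predictions for an (n,1) target, and Y-pred would then broadcast to an (n,n) matrix
'train std:',np.std(Y[:100].ravel()-lasso.predict(X[:100]).ravel()),'test std:',np.std(Y[100:].ravel()-lasso.predict(X[100:]).ravel())

('train std:', 118.61397787324181, 'test std:', 101.62672135934089)

The values recorded above were computed without the flattening, most likely on a broadcast `(100, 100)` difference, so they overstate the residual spread and are not directly comparable with the custom model's numbers.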
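
One thing worth checking in a comparison like this is array shape. `Y` here has shape `(n, 1)`; if an estimator's `predict` returns a 1-D array of shape `(n,)`, which is an assumption about the sklearn estimator and version in use, the subtraction broadcasts to an `(n, n)` matrix and the resulting std is no longer a residual std. A small NumPy-only sketch of the effect:

```python
import numpy as np

y_true = np.arange(4, dtype=float).reshape(-1, 1)   # shape (4, 1), like Y
y_pred = np.arange(4, dtype=float)                  # shape (4,), a perfect 1-D prediction

broadcast_diff = y_true - y_pred                    # shape (4, 4): every y_i minus every pred_j
aligned_diff = y_true.ravel() - y_pred.ravel()      # shape (4,): element-wise residuals

print(broadcast_diff.shape, np.std(broadcast_diff))
print(aligned_diff.shape, np.std(aligned_diff))
```

Even though the prediction is perfect, the broadcast version reports a clearly non-zero std, which is why flattening (or reshaping the prediction to `(n, 1)`) before subtracting is the safer habit.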

### Ridge

In [8]:
ridge=LinearRegression(l2_ratio=0.01)
ridge.fit(X[:100],Y[:100])
#Check the error on the training set and the test set
'train std:',np.std(Y[:100]-ridge.predict(X[:100])),'test std:',np.std(Y[100:]-ridge.predict(X[100:]))

('train std:', 38.51364038546339, 'test std:', 42.68319768869535)

In [9]:
#Compare with sklearn
from sklearn.linear_model import Ridge
ridge=Ridge()
ridge.fit(X[:100],Y[:100])
'train std:',np.std(Y[:100]-ridge.predict(X[:100])),'test std:',np.std(Y[100:]-ridge.predict(X[100:]))

('train std:', 37.19078218147727, 'test std:', 42.53910405375569)

### ElasticNet

In [10]:
elastic=LinearRegression(l1_ratio=0.01,l2_ratio=0.01)
elastic.fit(X[:100],Y[:100])
#Check the error on the training set and the test set
'train std:',np.std(Y[:100]-elastic.predict(X[:100])),'test std:',np.std(Y[100:]-elastic.predict(X[100:]))

('train std:', 37.39721822175668, 'test std:', 43.91945143391026)

In [11]:
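#Compare with sklearn
from sklearn.linear_model import ElasticNet
elastic=ElasticNet()
elastic.fit(X[:100],Y[:100])
#flatten both sides: sklearn's ElasticNet can return 1-D predictions for an (n,1) target, and Y-pred would then broadcast to an (n,n) matrix
'train std:',np.std(Y[:100].ravel()-elastic.predict(X[:100]).ravel()),'test std:',np.std(Y[100:].ravel()-elastic.predict(X[100:]).ravel())

('train std:', 94.67535553651277, 'test std:', 82.85456858410238)

The values recorded above were computed without the flattening, most likely on a broadcast `(100, 100)` difference, so they overstate the residual spread and are not directly comparable with the custom model's numbers.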

The sign function has been moved into `ml_models.utils`.
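
The body of `utils.sign` is not shown in this section. A plausible minimal version, given that the class wraps it with `np.vectorize`, is a scalar function like the one below; treat the exact implementation as an assumption rather than the actual contents of `ml_models.utils`.

```python
import numpy as np


def sign(x):
    """Scalar sign: 1 for positive, -1 for negative, 0 for zero (assumed behavior)."""
    if x > 0:
        return 1
    elif x < 0:
        return -1
    else:
        return 0


# np.vectorize applies the scalar function element-wise, as in LinearRegression.__init__
sign_func = np.vectorize(sign)
signs = sign_func(np.array([[-2.0], [0.0], [3.5]]))
print(signs.ravel())
```

For plain NumPy arrays `np.sign` gives the same values without the Python-level loop that `np.vectorize` performs.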