3. Implementing Least Squares with Gradient Descent

Assignment: implement linear regression in code and publish it to your own GitHub.

Optional:
1. Checking for convergence
2. Choosing the learning rate
3. Writing the implementation as a class

In [1]:
import numpy as np
import pandas as pd

In [2]:
# Load the training data and split it into features and target
data = pd.read_csv('height_train.csv')
X = data.loc[:,['father_height','mother_height','boy_dummy']].values
y = data.child_height.values

In [3]:
print("Features: \n", X)
print("Targets: \n", y)

Features: 
 [[1.76 1.6  0.  ]
 [1.71 1.63 1.  ]
 [1.7  1.66 0.  ]
 ...
 [1.72 1.6  0.  ]
 [1.77 1.63 1.  ]
 [1.72 1.69 0.  ]]
Targets: 
 [1.66 1.76 1.67 ... 1.63 1.75 1.67]


In [4]:
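class LinearRegressionHW:
    def __init__(self, alpha=0.1, n_rounds=10000, epsilon=1e-9):
        self.alpha = alpha  # learning rate
        self.n_rounds = n_rounds  # maximum number of gradient-descent rounds
        self.epsilon = epsilon  # stop early once the sum of squared residuals is at or below this value

    def fit(self, X, y):
        X = np.c_[X, np.ones(X.shape[0])]  # append a column of ones for the intercept
        y = np.array(y)
        n_features = X.shape[1]
        self.betas = np.zeros(n_features)  # weight vector, initialized to zero
        for i in range(self.n_rounds):
            # residuals under the current weights
            residuals = y - np.dot(X, self.betas)
            # on noisy data the sum of squared residuals rarely gets this small,
            # so the loop normally runs for all n_rounds
            if np.sum(np.square(residuals)) <= self.epsilon:
                break
            # update every beta using the residuals computed above
            for j in range(n_features):
                gradient = -np.mean(residuals*X[:,j])
                self.betas[j] = self.betas[j] - self.alpha*gradient

        return self.betas

    def predict(self, X):
        X = np.c_[X, np.ones(X.shape[0])]  # append a column of ones for the intercept
        return np.dot(X, self.betas)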
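
For the first optional item, the convergence check, one common alternative to thresholding the sum of squared residuals is to stop when the loss no longer changes between rounds. The sketch below is a standalone illustration, not the class above: the function name `fit_gd`, the parameter `tol`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def fit_gd(X, y, alpha=0.1, n_rounds=10000, tol=1e-12):
    """Gradient descent for least squares that stops when the MSE stops improving.

    `tol` is a hypothetical threshold on the change in mean squared error
    between two consecutive rounds.
    """
    X = np.c_[X, np.ones(X.shape[0])]       # append the intercept column
    betas = np.zeros(X.shape[1])
    prev_mse = np.inf
    rounds = 0
    for rounds in range(1, n_rounds + 1):
        residuals = y - X @ betas
        mse = np.mean(residuals ** 2)
        if abs(prev_mse - mse) <= tol:      # loss has stopped changing
            break
        prev_mse = mse
        # same update as the per-feature loop, written as one matrix product
        betas += alpha * (X.T @ residuals) / X.shape[0]
    return betas, rounds

# Synthetic, noise-free data (an assumption, used only for illustration)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = X_demo @ np.array([2.0, -1.0]) + 0.5
betas_demo, rounds_used = fit_gd(X_demo, y_demo)
print(betas_demo, rounds_used)
```

A criterion based on the change in loss can fire even when the residuals never reach zero, which is the usual situation with noisy data.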

In [5]:
# Fit the model, then predict on the test set
model = LinearRegressionHW()
model.fit(X,y)

test_data = pd.read_csv('height_test.csv')
testX = test_data.loc[:,['father_height','mother_height','boy_dummy']].values
model.predict(testX)

array([1.65346656, 1.73538399, 1.71662117, 1.74661988, 1.6312129 ,
       1.78894609, 1.64223067, 1.74629271, 1.7720372 , 1.65652117,
       1.64964856, 1.66055728, 1.73920199, 1.68968355, 1.64932139,
       1.71356656, 1.61277724, 1.656194  , 1.6207404 , 1.66459339,
       1.65673928, 1.65706645, 1.67855672, 1.62706774, 1.65695739,
       1.76069226, 1.66033917, 1.77912792, 1.63426751, 1.73549305,
       1.60186852, 1.76091037, 1.64277595, 1.63481279, 1.7498926 ,
       1.75741954, 1.679102  , 1.7498926 , 1.727748  , 1.65641211,
       1.66448433, 1.731566  , 1.77530993, 1.62052229, 1.76058321,
       1.63873984, 1.64201256, 1.64233973, 1.77476465, 1.66023011,
       1.65303034, 1.731566  , 1.75305627, 1.78643676, 1.68641083,
       1.73505683, 1.72785705, 1.7460746 , 1.720112  , 1.64899423,
       1.58037825, 1.68259283, 1.76472837, 1.68913827, 1.73200222,
       1.63590334, 1.69699238, 1.67561116, 1.63437657, 1.64212162,
       1.76429215, 1.74618366, 1.64964856, 1.73527494, 1.65684

In [6]:
# Predict with sklearn's LinearRegression for comparison
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X, y)
print(lr.coef_, lr.intercept_)
lr.predict(testX)

[0.23959427 0.25013358 0.10030806] 0.8274299645517064


array([1.6523579 , 1.74037007, 1.72765261, 1.74797947, 1.63734989,
       1.77588764, 1.6447485 , 1.74766329, 1.76496186, 1.65422688,
       1.64975117, 1.65704439, 1.7429768 , 1.67652811, 1.64943499,
       1.72578363, 1.6249486 , 1.6539107 , 1.63047824, 1.65986191,
       1.65443767, 1.65475384, 1.66902411, 1.63442698, 1.65464845,
       1.75724706, 1.65683361, 1.76964835, 1.63921886, 1.74047547,
       1.61765538, 1.75745785, 1.64527547, 1.63974583, 1.75005924,
       1.7551673 , 1.66955107, 1.75005924, 1.73515661, 1.65412149,
       1.65975652, 1.73776334, 1.76704162, 1.63026745, 1.75714167,
       1.64245795, 1.64453772, 1.64485389, 1.76651466, 1.65672821,
       1.65193633, 1.73776334, 1.75203361, 1.77454563, 1.67444835,
       1.74005389, 1.73526201, 1.74745251, 1.72994316, 1.64911881,
       1.60338512, 1.67184162, 1.76006458, 1.67600115, 1.73818492,
       1.64079976, 1.68142539, 1.66726052, 1.63932426, 1.64464311,
       1.75964301, 1.7475579 , 1.64975117, 1.74026468, 1.65454

In [7]:
# Compare the predictions of our LinearRegressionHW with those of sklearn's LinearRegression
print("Max difference: ", (model.predict(testX) - lr.predict(testX)).max())
print("Average difference: ", (model.predict(testX) - lr.predict(testX)).mean())

Max difference:  0.02755339950543667
Average difference:  0.0001611706407122351


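The largest signed difference between the two models' predicted child heights is about 2.76 cm, and the mean signed difference is about 0.016 cm. Because positive and negative differences cancel in the mean, the mean understates the typical gap, so the gradient-descent result is close to sklearn's but not identical.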
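
One plausible explanation for the remaining gap, not verified against the original CSV files, is that gradient descent has not fully converged after 10000 rounds: heights cluster around 1.6 to 1.8 m, so the height columns are nearly collinear with the intercept column and the problem is poorly conditioned. The sketch below touches on the second optional item, the learning rate, using synthetic data whose means and spreads are assumptions. For the loss (1/2n)·||y − Xβ||², gradient descent is stable when alpha < 2/λ_max, where λ_max is the largest eigenvalue of XᵀX/n, and the ratio λ_max/λ_min (the condition number) indicates how slowly the slowest direction converges.

```python
import numpy as np

def hessian_eigs(X):
    """Eigenvalues of X^T X / n, the Hessian of the loss (1/2n)*||y - X b||^2."""
    X = np.c_[X, np.ones(X.shape[0])]                 # same intercept column as in fit()
    return np.linalg.eigvalsh(X.T @ X / X.shape[0])   # returned in ascending order

# Synthetic stand-in for the height data (values are assumptions, not the CSV)
rng = np.random.default_rng(0)
n = 500
X_raw = np.c_[rng.normal(1.72, 0.06, n),   # father_height in metres
              rng.normal(1.60, 0.06, n),   # mother_height in metres
              rng.integers(0, 2, n)]       # boy_dummy (0 or 1)
# Standardize each column: zero mean, unit standard deviation
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

stats = {}
for name, X_ in [("raw", X_raw), ("standardized", X_std)]:
    eigs = hessian_eigs(X_)
    stats[name] = {"max_alpha": 2 / eigs[-1], "cond": eigs[-1] / eigs[0]}
    print(name, stats[name])
```

If the features are standardized before fitting, the same mean and standard deviation from the training set have to be applied to the test features before calling `predict`.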