# Linear Regression from Scratch

## Generating the Dataset

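We generate a dataset from a known linear model, then fit a linear regression model to it, that is, we try to recover the parameters of the generating model.

Specifically, the labels are constructed as $y = Xw + b + \epsilon$, where $\epsilon$ is a noise term. Fitting the linear regression model yields estimates $\hat{w}$ and $\hat{b}$ that recover the true parameters $w$ and $b$ as closely as possible.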

In [1]:
import torch

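import LinearRegression as linear  # helper module providing the functions used below

# Ground-truth parameters that the model should recover
true_w = torch.tensor([2, -3.4])
true_b = 4.2
# Generate a synthetic dataset from y = Xw + b + noise
features, labels = linear.synthetic_data(true_w, true_b, 1000)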
print('features:', features[0], '\nlabel:', labels[0])

features: tensor([-0.0985, -1.0528]) 
label: tensor([7.5952])
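
The source of `linear.synthetic_data` is not shown here. As a minimal sketch of what such a function could look like, the block below draws features from a standard normal distribution and adds Gaussian noise to the labels. The feature distribution, the `noise_std` parameter, and its default of 0.01 are assumptions made for illustration, not the module's confirmed behavior.

```python
import torch


def synthetic_data(w, b, num_examples, noise_std=0.01):
    """Generate y = Xw + b + noise (hypothetical stand-in for linear.synthetic_data)."""
    # Features drawn from a standard normal distribution (an assumption)
    X = torch.normal(0.0, 1.0, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    # Additive Gaussian noise; noise_std=0.01 is an assumed default
    y += torch.normal(0.0, noise_std, y.shape)
    # Return the labels as a column vector of shape (num_examples, 1)
    return X, y.reshape((-1, 1))


torch.manual_seed(0)
true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)
print(features.shape, labels.shape)  # torch.Size([1000, 2]) torch.Size([1000, 1])
```

Returning the labels as a column vector matches the shape of the `label` printed above, a tensor holding a single value per example.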


## Reading the Dataset

In [2]:
batch_size = 10
for X, y in linear.data_iter(batch_size, features, labels):
    print(X, '\n', y)
    break

tensor([[ 0.3489,  0.7270],
        [-0.5310,  1.0907],
        [ 1.0167, -0.9754],
        [ 0.5836, -0.3068],
        [-0.0947, -0.5861],
        [ 0.5534, -0.0115],
        [-0.8351,  1.7758],
        [ 0.4035,  0.1887],
        [-0.0557, -1.5214],
        [ 1.1420,  0.1773]]) 
 tensor([[ 2.4133],
        [-0.5602],
        [ 9.5646],
        [ 6.4233],
        [ 5.9990],
        [ 5.3507],
        [-3.5125],
        [ 4.3698],
        [ 9.2586],
        [ 5.8779]])
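
A minibatch iterator like `linear.data_iter` can be sketched as a generator that shuffles the example indices and yields one slice at a time. This is an illustrative version under the assumption that the data is visited in random order; the module's actual implementation is not shown in this notebook.

```python
import random

import torch


def data_iter(batch_size, features, labels):
    """Yield shuffled minibatches (hypothetical stand-in for linear.data_iter)."""
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)  # visit the examples in random order on each pass
    for i in range(0, num_examples, batch_size):
        batch_indices = torch.tensor(indices[i:i + batch_size])
        yield features[batch_indices], labels[batch_indices]


# Toy data: 25 examples with 2 features each
features = torch.arange(50, dtype=torch.float32).reshape(25, 2)
labels = torch.arange(25, dtype=torch.float32).reshape(25, 1)
batch_sizes = [len(X) for X, y in data_iter(10, features, labels)]
print(batch_sizes)  # [10, 10, 5]
```

The last minibatch is smaller whenever the number of examples is not a multiple of `batch_size`.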


## Defining and Initializing the Model

In [11]:
# Initialize the model parameters
w, b = linear.init_params()
# Define the model
net = linear.linreg
# Define the loss function
loss = linear.squared_loss
# Define the optimization algorithm
optimizer = linear.sgd
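
The four building blocks above come from the `LinearRegression` module, whose code is not shown. The following is a rough sketch of how they might be written; the signatures, the initialization scale of 0.01, and the halved squared loss are assumptions chosen for illustration.

```python
import torch


def init_params(num_inputs=2):
    """Small random weights and a zero bias; both track gradients."""
    w = torch.normal(0.0, 0.01, (num_inputs, 1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    return w, b


def linreg(X, w, b):
    """Linear model: Xw + b."""
    return torch.matmul(X, w) + b


def squared_loss(y_hat, y):
    """Per-example squared error, halved so that its derivative is (y_hat - y)."""
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2


def sgd(params, lr, batch_size):
    """Take one minibatch SGD step, then reset the gradients."""
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()


# One update step on a tiny batch
w, b = init_params()
X = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.tensor([[1.0], [2.0]])
l = squared_loss(linreg(X, w, b), y)
l.sum().backward()  # gradients of the summed loss with respect to w and b
sgd([w, b], lr=0.03, batch_size=len(X))
```

Dividing by `batch_size` inside `sgd` turns the gradient of the summed loss into an average, so the effective step size does not depend on the minibatch size.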

## Training

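The full procedure:

- Initialize the model parameters
- Repeat until done:
    - Compute the model's output on the data
    - Compute the loss
    - Optimize: backpropagate the gradients and update the model parameters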
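
The steps above can be sketched as a self-contained training loop. This is not the code of `linear.train_linreg`, which is not shown in this notebook. The data generation, the per-epoch shuffling with `torch.randperm`, and the per-epoch loss report are assumptions that mirror the rest of this section.

```python
import torch

torch.manual_seed(0)

# Synthetic data (assumed setup: y = Xw + b + small Gaussian noise)
true_w, true_b = torch.tensor([2, -3.4]), 4.2
features = torch.normal(0.0, 1.0, (1000, 2))
noise = torch.normal(0.0, 0.01, (1000,))
labels = (features @ true_w + true_b + noise).reshape(-1, 1)

# Step 1: initialize the model parameters
w = torch.normal(0.0, 0.01, (2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

lr, num_epochs, batch_size = 0.03, 3, 10
for epoch in range(num_epochs):
    permutation = torch.randperm(len(features))  # reshuffle every epoch
    for i in range(0, len(features), batch_size):
        idx = permutation[i:i + batch_size]
        X, y = features[idx], labels[idx]
        # Step 2: compute the model output on the minibatch
        y_hat = X @ w + b
        # Step 3: compute the halved squared loss, summed over the minibatch
        l = ((y_hat - y) ** 2 / 2).sum()
        # Step 4: backpropagate, take one SGD step, and reset the gradients
        l.backward()
        with torch.no_grad():
            for param in (w, b):
                param -= lr * param.grad / len(X)
                param.grad.zero_()
    # Report the mean loss over the full dataset after each epoch
    with torch.no_grad():
        epoch_loss = ((features @ w + b - labels) ** 2 / 2).mean()
    print(f'epoch {epoch + 1}, loss {epoch_loss.item():f}')
```

The parameter update runs inside `torch.no_grad()` so that the update itself is not recorded in the autograd graph.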

In [12]:
# Train the model
lr, num_epochs = 0.03, 3
linear.train_linreg(net, loss, optimizer, lr, num_epochs, batch_size, features, labels, w, b)

print("\nDiff between true_w and w: \n", true_w - w.reshape(true_w.shape))
print("Diff between true_b and b: \n", true_b - b)

epoch 1, loss 0.031734
epoch 2, loss 0.000108
epoch 3, loss 0.000051

Diff between true_w and w: 
 tensor([ 0.0005, -0.0005], grad_fn=<SubBackward0>)
Diff between true_b and b: 
 tensor([0.0003], grad_fn=<RsubBackward1>)
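
The `grad_fn=<...>` entries in the printed differences appear because `w` and `b` track gradients, so any tensor computed from them is attached to the autograd graph. A small sketch of reporting the same difference without tracking is shown below; the values of `w` are illustrative stand-ins, not the trained parameters from the run above.

```python
import torch

true_w = torch.tensor([2, -3.4])
# Stand-in for a trained parameter that tracks gradients (illustrative values)
w = torch.tensor([[1.9995], [-3.3995]], requires_grad=True)

tracked = true_w - w.reshape(true_w.shape)
print(tracked.requires_grad)  # True: the result is part of the autograd graph

with torch.no_grad():
    untracked = true_w - w.reshape(true_w.shape)
print(untracked.requires_grad)  # False: no grad_fn is attached
```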
