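##### Logistic regression: computation and training workflow
1. Prepare the data
    * Training data: split the dataset into a training set and a test set
    * Parameter initialization: model parameters (weights and bias) and hyperparameters (learning rate and number of iterations)
2. Forward pass: feed the features and the model parameters into the model to compute y_hat; this step is also called forward computation.
3. Compute the loss: use the negative log-likelihood loss (binary cross-entropy) to measure the loss between y and y_hat.
4. Compute the gradients: use the features (x), y, and y_hat to compute the gradients of the weights and the bias.
5. Update the parameters: update the weights and the bias with the gradients from step 4.
6. Train the model: repeat steps 2 to 5 until the maximum number of iterations is reached.
7. Test the model: feed the test-set x into the model with the parameters trained in step 6, compare the predictions with the test-set y, and compute the accuracy.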
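
For reference, steps 2 to 5 can be written out as follows. The notation is an assumption made for this summary: $m$ training samples $x_i$ with labels $y_i \in \{0, 1\}$, weights $\theta$ (`theta`), bias $b$ (`bias`), and learning rate $\eta$ (`lr`). The code in this notebook additionally adds a small constant inside the logarithms to avoid $\log 0$.

```latex
\begin{aligned}
z_i &= \theta^\top x_i + b, \qquad
\hat{y}_i = \sigma(z_i) = \frac{1}{1 + e^{-z_i}} \\
L(\theta, b) &= -\frac{1}{m}\sum_{i=1}^{m}
  \Big[\, y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \Big] \\
\frac{\partial L}{\partial \theta} &= \frac{1}{m}\sum_{i=1}^{m} (\hat{y}_i - y_i)\, x_i, \qquad
\frac{\partial L}{\partial b} = \frac{1}{m}\sum_{i=1}^{m} (\hat{y}_i - y_i) \\
\theta &\leftarrow \theta - \eta\, \frac{\partial L}{\partial \theta}, \qquad
b \leftarrow b - \eta\, \frac{\partial L}{\partial b}
\end{aligned}
```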

In [1]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import numpy as np

In [99]:
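# 1. Prepare the data
# Dataset: make_classification defaults to 100 samples with binary labels
X, y = make_classification(n_features=10)
# Split into training and test sets (70% / 30%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)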
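
A quick shape check can make the split concrete. The sketch below repeats the data preparation with `random_state=0`, which is an assumption added here only so the run is repeatable; the notebook itself does not fix a seed. With the default 100 samples, the training set holds 70 rows, so every training accuracy printed later is a multiple of 1/70 (for example, 0.8285714 corresponds to 58 of 70 correct).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# random_state is an assumption added for reproducibility only.
X, y = make_classification(n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(X.shape, y.shape)             # all samples and their labels
print(X_train.shape, X_test.shape)  # 70/30 split of the 100 default samples
print(sorted(set(y.tolist())))      # the distinct label values
```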

In [100]:
# Initialize the model parameters
theta = np.random.randn(10) # weights: shape=(10,)
# theta = np.random.randn(1,10) # weights: shape=(1,10)
bias = 0 # bias
# Initialize the hyperparameters
lr = 1e-2 # learning rate
epochs = 1000 # number of iterations

In [101]:
# 2. Forward pass
def forward(x, theta, bias):
    # Linear combination
    z = np.dot(theta, x.T) + bias
    # Activation function (sigmoid)
    y_hat = 1 / (1 + np.exp(-z))
    return y_hat
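
The explicit formula `1 / (1 + np.exp(-z))` can emit overflow warnings when `z` is a large negative number, because `np.exp(-z)` exceeds the floating-point range. One possible alternative, shown as a sketch, is `scipy.special.expit`, which computes the same sigmoid in a numerically stable way. The name `forward_stable` and the sample values are hypothetical and not part of the original notebook.

```python
import numpy as np
from scipy.special import expit  # numerically stable logistic sigmoid


def forward_stable(x, theta, bias):
    """Same computation as forward(), with expit replacing the explicit formula."""
    z = np.dot(theta, x.T) + bias
    return expit(z)


# Hypothetical values chosen to include extreme inputs.
theta = np.array([1.0, -2.0])
batch = np.array([[0.5, 0.25], [1000.0, 0.0], [-1000.0, 0.0]])

batch_out = forward_stable(batch, theta, 0.0)      # one probability per row
single_out = forward_stable(batch[0], theta, 0.0)  # a single sample gives a scalar
print(batch_out)
print(single_out)
```

Because `np.dot(theta, x.T)` works for both a 2-D batch and a 1-D sample, the same function serves training (all rows at once) and the single-sample test at the end of the notebook.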

In [102]:
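# 3. Compute the loss
def loss_fn(y, y_hat):
    # Small constant that keeps the logarithm away from log(0)
    e = 1e-8
    # Per-sample binary cross-entropy (negative log-likelihood)
    return -y * np.log(y_hat + e) - (1 - y) * np.log(1 - y_hat + e)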
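
To see how this loss behaves, it may help to evaluate it on a few hand-picked predictions. The sketch below copies `loss_fn` and uses hypothetical labels and probabilities: a confident correct prediction (0.9 for a positive label) costs about 0.105, while a confident wrong one (0.1 for a positive label) costs about 2.303.

```python
import numpy as np


def loss_fn(y, y_hat):
    e = 1e-8  # same epsilon as the notebook, guards against log(0)
    return -y * np.log(y_hat + e) - (1 - y) * np.log(1 - y_hat + e)


# Hypothetical labels and predicted probabilities.
y = np.array([1, 1, 0, 0])
y_hat = np.array([0.9, 0.1, 0.1, 0.9])

per_sample = loss_fn(y, y_hat)
print(per_sample)         # one loss value per sample
print(per_sample.mean())  # the scalar that the training loop reports
```

Note that `loss_fn` returns one value per sample; the training loop averages them with `np.mean` only when printing.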

In [None]:
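# 4. Compute the gradients
def gradient_fn(x, y, y_hat):
    # Number of samples
    m = x.shape[0]
    # Gradient of the mean loss with respect to the weights
    delta_w = np.dot(y_hat - y, x) / m
    # Gradient of the mean loss with respect to the bias
    delta_b = np.mean(y_hat - y)
    return delta_w, delta_b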
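
One way to gain confidence in these analytic gradients is to compare them with central finite differences of the mean loss. The sketch below is such a check under stated assumptions: `mean_loss`, the seed, the step size `h`, and the random toy data are all hypothetical helpers introduced for illustration, and the loss omits the epsilon term because the toy inputs keep `y_hat` away from 0 and 1.

```python
import numpy as np


def forward(x, theta, bias):
    return 1 / (1 + np.exp(-(np.dot(theta, x.T) + bias)))


def mean_loss(x, y, theta, bias):
    # Hypothetical helper: mean binary cross-entropy as a function of the parameters.
    y_hat = forward(x, theta, bias)
    return np.mean(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat))


def gradient_fn(x, y, y_hat):
    m = x.shape[0]
    return np.dot(y_hat - y, x) / m, np.mean(y_hat - y)


rng = np.random.default_rng(0)  # fixed seed, chosen only for this check
x = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20)
theta = rng.normal(size=3)
bias = 0.1

delta_w, delta_b = gradient_fn(x, y, forward(x, theta, bias))

# Central finite differences of the mean loss, one parameter at a time.
h = 1e-6
numeric_w = np.zeros_like(theta)
for j in range(theta.size):
    step = np.zeros_like(theta)
    step[j] = h
    numeric_w[j] = (mean_loss(x, y, theta + step, bias)
                    - mean_loss(x, y, theta - step, bias)) / (2 * h)
numeric_b = (mean_loss(x, y, theta, bias + h)
             - mean_loss(x, y, theta, bias - h)) / (2 * h)

print(np.max(np.abs(numeric_w - delta_w)))  # largest weight-gradient mismatch
print(abs(numeric_b - delta_b))             # bias-gradient mismatch
```

If the two mismatches are close to zero, the analytic and numeric gradients agree for this toy data.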

In [104]:
# Model training (steps 5 and 6)
for i in range(epochs):
    # Forward pass
    y_hat = forward(X_train, theta, bias)
    # Compute the per-sample loss
    loss = loss_fn(y_train, y_hat)
    # Compute the gradients
    delta_w, delta_b = gradient_fn(X_train, y_train, y_hat)
    # Update the weights and the bias
    theta -= lr * delta_w
    bias -= lr * delta_b
    # Report the mean loss and training accuracy every 100 iterations
    if i % 100 == 0:
        acc = np.mean(np.round(y_hat) == y_train)
        print(f'epoch: {i}, loss: {np.mean(loss)}, acc: {acc}')
print(theta)
print(bias)

epoch: 0, loss: 0.4691719843363235, acc: 0.8285714285714286
epoch: 100, loss: 0.18257798151531204, acc: 0.9428571428571428
epoch: 200, loss: 0.13020869010965674, acc: 0.9714285714285714
epoch: 300, loss: 0.10893855437647666, acc: 0.9857142857142858
epoch: 400, loss: 0.09692184822957814, acc: 0.9857142857142858
epoch: 500, loss: 0.08883132383877079, acc: 0.9857142857142858
epoch: 600, loss: 0.08282836038657056, acc: 0.9857142857142858
epoch: 700, loss: 0.07810766319930995, acc: 0.9857142857142858
epoch: 800, loss: 0.07425142291093716, acc: 0.9857142857142858
epoch: 900, loss: 0.07101540875822354, acc: 0.9857142857142858
[ 0.04423183  1.54467873  0.27714728  0.50581788 -0.00806348 -0.81124878
  1.20053651  3.79124636  0.51692506 -0.26030022]
0.12171880354243696


In [113]:
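# 7. Model testing: check one randomly chosen test sample
idx = np.random.randint(len(X_test))
x_sample = X_test[idx]
y_true = y_test[idx]  # separate name so the full label array y is not overwritten
pred = np.round(forward(x_sample, theta, bias))
print(x_sample)
print(f'y: {y_true}, pred: {pred}')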

[ 0.22173994  0.32231868 -1.60223837  1.39487168 -0.50878385 -0.7399641
 -0.11264061  1.40288028  0.29808052 -0.38996952]
y: 1, pred: 1.0
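
The cell above checks a single random sample, whereas step 7 of the workflow describes an accuracy over the test set. A minimal sketch of that computation follows. The helper name `accuracy` is hypothetical, and the toy parameters and data stand in for the trained `theta`, `bias`, `X_test`, and `y_test`, so that the example runs on its own.

```python
import numpy as np


def forward(x, theta, bias):
    return 1 / (1 + np.exp(-(np.dot(theta, x.T) + bias)))


def accuracy(x, y, theta, bias):
    # Hypothetical helper: fraction of samples whose rounded probability matches the label.
    preds = np.round(forward(x, theta, bias))
    return np.mean(preds == y)


# Toy stand-ins for the trained parameters and the test set.
toy_theta = np.array([1.0, -1.0])
toy_bias = 0.0
toy_X = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 0.0], [0.0, 1.0]])
toy_y = np.array([1, 0, 0, 0])

toy_acc = accuracy(toy_X, toy_y, toy_theta, toy_bias)
print(toy_acc)  # 3 of the 4 toy samples are classified correctly
```

In the notebook itself, the equivalent call would be `accuracy(X_test, y_test, theta, bias)`, assuming the cells above have been run.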
