# Logistic Regression Lab

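## Preparation
### Environment Setup

Make sure the following dependencies are installed, then import and verify them with the code below. If it runs successfully, a new window opens showing a blank figure.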

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from typing import Tuple, List

# display the plot in a separate window
%matplotlib tk

np.random.seed(12)

# enable interactive mode and create a figure
plt.ion()
fig = plt.figure(figsize=(12, 5))

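### Dataset Preparation

You will use the following two-dimensional dataset to train a logistic classifier and observe how the linear decision boundary changes as training proceeds.

The dataset has two features and one label, where the label $ y \in \{-1,1\} $.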
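With labels in $\{-1, 1\}$, one common way to write the logistic loss is $\log(1 + e^{-y\, w^\top x})$, averaged over the samples. The sketch below assumes this form; the function name `logistic_loss`, the averaging, and the tiny arrays are illustrative choices and may differ from what this lab's own implementation uses.

```python
import numpy as np

def logistic_loss(w, X, y):
    """Mean logistic loss for labels y in {-1, +1}.

    X has shape (n, d), w has shape (d,), y has shape (n,).
    """
    margins = y * (X @ w)
    # log(1 + exp(-m)) computed stably as logaddexp(0, -m)
    return np.mean(np.logaddexp(0.0, -margins))

# Tiny illustrative data (not the lab dataset)
X = np.array([[1.0, 2.0], [-1.0, -1.5]])
y = np.array([1, -1])

# With w = 0 every margin is 0, so the loss is log(2)
loss_at_zero = logistic_loss(np.zeros(2), X, y)
print(loss_at_zero)
```

Writing the loss in terms of the margin $y\, w^\top x$ is what makes the $\{-1, 1\}$ label convention convenient: a positive margin means a correct prediction.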

Run the code below to load the dataset and visualize it.

In [2]:
from data_generator import gen_2D_dataset

x_train, y_train = gen_2D_dataset(100, 100, noise = 0)
x_test, y_test = gen_2D_dataset(50, 50, noise = 0.7) 

In [3]:
from vis_util import visualize_2D_dataset, visualize_2D_border

visualize_2D_dataset(x_train, y_train)
visualize_2D_dataset(x_test, y_test)

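## Logistic Regression (10 pts)

In this part, you will learn about logistic regression and complete the code that implements and trains it.

Before running the code in this part, make sure you have completed the code in `logistic.py`.

Once that is done, run the code below; a figure will show how $||w||$, the loss, and the decision boundary change during training.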

In [4]:
from logistic import LogisticRegression

# create a LogisticRegression object 
LR = LogisticRegression()

# fit the model to the training data without regularization (reg = 0)
LR.fit(x_train, y_train, lr=0.1, n_iter=1000,reg=0)

iter: 0, loss: 136.14532709098182, w_module: 8.993435416843157
iter: 10, loss: 0.359759630717664, w_module: 24.24719620628085
iter: 20, loss: 0.13118174150957568, w_module: 24.278731117586574
iter: 30, loss: 0.0809365439907071, w_module: 24.297910118626493
iter: 40, loss: 0.05906695316217151, w_module: 24.31222741327293
iter: 50, loss: 0.04700579115531036, w_module: 24.323904395326306
iter: 60, loss: 0.03950744604858383, w_module: 24.33393739688583
iter: 70, loss: 0.03450940749629354, w_module: 24.342868008663853
iter: 80, loss: 0.03103076711442668, w_module: 24.351025905003333
iter: 90, loss: 0.02854182134128911, w_module: 24.35862694836148
iter: 100, loss: 0.026728971724398174, w_module: 24.365819526626513
iter: 110, loss: 0.025393239126046276, w_module: 24.372709013530542
iter: 120, loss: 0.024401547545706156, w_module: 24.379371839367078
iter: 130, loss: 0.023661333032306053, w_module: 24.385864171657545
iter: 140, loss: 0.023106373306861838, w_module: 24.3922275900966
iter: 150, l

Running the code above shows that, without regularization, $||w||$ keeps growing as the number of training iterations increases.
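One way to see why the norm keeps growing: when the training data are linearly separable, scaling a separating $w$ by a larger constant always lowers the logistic loss, so gradient descent has no finite minimizer to settle on. The sketch below illustrates this on a hypothetical toy dataset, assuming a mean logistic loss with labels in $\{-1, 1\}$; it does not use the lab's `LogisticRegression` class.

```python
import numpy as np

def logistic_loss(w, X, y):
    # mean of log(1 + exp(-y * w.x)), computed stably
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

# Toy linearly separable data (hypothetical, not the lab dataset)
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])

w = np.array([1.0, 1.0])          # a separating direction
assert np.all(y * (X @ w) > 0)    # every margin is positive

scales = [1.0, 2.0, 4.0, 8.0]
losses = [logistic_loss(c * w, X, y) for c in scales]
for c, loss in zip(scales, losses):
    print(f"scale={c}: ||w||={np.linalg.norm(c * w):.3f}, loss={loss:.6f}")
```

The printed loss shrinks as the scale grows, which matches the slowly increasing `w_module` values in the log above.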

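After training, you can use the trained classifier to make predictions. Write code to compute the prediction accuracy on the training set and on the test set.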

In [5]:
# Implement the code to compute the accuracy of logistic regression (LR) on the training and test sets. Note that LR is already trained if you have run the code above.

# accuracy in percent: share of predictions that match the true labels
def compute_acc(y_true, y_pred):
    return (np.sum(y_true == y_pred) / y_true.shape[0]) * 100

# training accuracy
# TODO: compute the y_pred using LR.predict() function
x = np.concatenate((x_train, np.ones((x_train.shape[0], 1))), axis=1)
y_pred = LR.predict(x)[1]

# TODO: compute the accuracy
train_acc = compute_acc(y_train, y_pred)

print("Train accuracy: {}".format(train_acc))

# TODO: test accuracy, proceed similarly as above
x_test_modified = np.concatenate((x_test, np.ones((x_test.shape[0], 1))), axis=1)
y_pred = LR.predict(x_test_modified)[1]

test_acc = compute_acc(y_test, y_pred)

print("Test accuracy: {}".format(test_acc))

Train accuracy: 100.0
Test accuracy: 99.0


In [6]:
# create a LogisticRegression object and train it when using regularization
LR = LogisticRegression()
LR.fit(x_train, y_train, lr=0.1, n_iter=1000,reg=0.1)

iter: 0, loss: 152.45451735975882, w_module: 11.32850084425441
iter: 10, loss: 15.229012221168228, w_module: 17.179193155454783
iter: 20, loss: 12.717067631116134, w_module: 15.612012468648814
iter: 30, loss: 10.806138783472308, w_module: 14.246365049809002
iter: 40, loss: 9.41931685932985, w_module: 13.086407315557896
iter: 50, loss: 8.480656088343297, w_module: 12.136250033768023
iter: 60, loss: 7.899792694049424, w_module: 11.393157726481478
iter: 70, loss: 7.575200935924055, w_module: 10.841599358152875
iter: 80, loss: 7.411459160029551, w_module: 10.452937163736342
iter: 90, loss: 7.336022845908454, w_module: 10.191309999841135
iter: 100, loss: 7.303678380295568, w_module: 10.021462596463838
iter: 110, loss: 7.290506041904443, w_module: 9.914067060190575
iter: 120, loss: 7.28531856668592, w_module: 9.847373376301473
iter: 130, loss: 7.283314996444741, w_module: 9.806442507988576
iter: 140, loss: 7.282548008302744, w_module: 9.781511476891872
iter: 150, loss: 7.282254694905058, w_m

In [7]:
# TODO: Implement the code to compute the accuracy of logistic regression (LR) on the training and test sets. Note that LR is already trained if you have run the code above.

# TODO: compute the y_pred using LR.predict() function
x = np.concatenate((x_train, np.ones((x_train.shape[0], 1))), axis=1)
y_pred = LR.predict(x)[1]

# TODO: compute the accuracy
train_acc = compute_acc(y_train, y_pred)

print("Train accuracy: {}".format(train_acc))

# TODO: test accuracy, proceed similarly as above
x_test_modified = np.concatenate((x_test, np.ones((x_test.shape[0], 1))), axis=1)
y_pred = LR.predict(x_test_modified)[1]

test_acc = compute_acc(y_test, y_pred)

print("Test accuracy: {}".format(test_acc))

Train accuracy: 100.0
Test accuracy: 99.0


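After running the regularized code above, observe how $||w||$ changes and discuss the practical significance of regularization. (Write your answer below.)

In the logged run with `reg=0.1`, $||w||$ rises briefly (from about 11.3 at iteration 0 to about 17.2 at iteration 10) and then decreases steadily, reaching about 9.78 by iteration 140, whereas without regularization it keeps growing. Regularization penalizes large weights: it keeps individual features from receiving very large weights, lowers variance, helps prevent overfitting, and improves the model's ability to generalize.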
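The effect of the penalty on $||w||$ can be reproduced in a few lines. The sketch below is a minimal illustration, assuming plain gradient descent on the mean logistic loss plus an L2 term $\frac{\text{reg}}{2}||w||^2$; the function `train`, the toy data, and the exact form of the penalty are assumptions and may not match the lab's `LogisticRegression.fit`.

```python
import numpy as np

def train(X, y, lr=0.1, n_iter=500, reg=0.0):
    """Gradient descent on mean logistic loss + (reg / 2) * ||w||^2."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        margins = y * (X @ w)
        # derivative of log(1 + exp(-m)) w.r.t. m is -1 / (1 + exp(m))
        coeff = -y / (1.0 + np.exp(margins))
        grad = X.T @ coeff / len(y) + reg * w
        w -= lr * grad
    return w

# Toy linearly separable data (hypothetical, not the lab dataset)
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])

w_plain = train(X, y, reg=0.0)
w_reg = train(X, y, reg=0.1)
print("||w|| without regularization:", np.linalg.norm(w_plain))
print("||w|| with reg=0.1:         ", np.linalg.norm(w_reg))
```

On this toy data the regularized weights end up with a smaller norm, because the `reg * w` term in the gradient pulls $w$ back toward zero at every step while the data term shrinks as the margins grow.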