# Fully-Connected Neural Nets
In this exercise we will implement fully-connected networks using a modular approach. For each layer we will implement a `forward` and a `backward` function. The `forward` function receives inputs, weights, and other parameters, and returns both an output and a `cache` object that stores the data needed for the backward pass, like this:

```python
def layer_forward(x, w):
  """ Receive inputs x and weights w """
  # Do some computations ...
  z = ...  # some intermediate value
  # Do some more computations ...
  out = ...  # the output

  cache = (x, w, z, out) # Values we need to compute gradients

  return out, cache
```

The backward pass receives upstream derivatives and the `cache` object, and returns gradients with respect to the inputs and weights, like this:

```python
def layer_backward(dout, cache):
  """
  Receive dout (derivative of loss with respect to outputs) and cache,
  and compute derivatives with respect to inputs and weights.
  """
  # Unpack cache values
  x, w, z, out = cache

  # Use values in cache to compute derivatives
  dx = ...  # Derivative of loss with respect to x
  dw = ...  # Derivative of loss with respect to w

  return dx, dw
```

After implementing a bunch of layers this way, we will be able to easily combine them to build classifiers with different architectures.
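
To make the `forward`/`backward` pattern concrete, here is a minimal sketch for a toy layer that is not part of the assignment: an elementwise scaling layer `out = x * w`. The names `scale_forward` and `scale_backward` are hypothetical and chosen only for illustration; the layers in `cs231n/layers.py` follow the same shape but compute different things.

```python
import numpy as np

def scale_forward(x, w):
    """Toy layer (illustrative only): out = x * w, with w broadcast over rows."""
    out = x * w
    cache = (x, w)  # values the backward pass will need
    return out, cache

def scale_backward(dout, cache):
    """Return gradients of the loss with respect to x and w."""
    x, w = cache
    dx = dout * w                  # same shape as x
    dw = np.sum(dout * x, axis=0)  # summed over the batch, same shape as w
    return dx, dw

x = np.array([[1.0, 2.0], [3.0, 4.0]])
w = np.array([10.0, 100.0])
out, cache = scale_forward(x, w)
dx, dw = scale_backward(np.ones_like(out), cache)
print(out)
print(dx, dw)
```

Note that the backward function never recomputes the forward pass; everything it needs travels through `cache`.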

In [None]:
# As usual, a bit of setup
from __future__ import print_function
import time
import numpy as np
import matplotlib.pyplot as plt
from cs231n.classifiers.fc_net import *
from cs231n.data_utils import get_CIFAR10_data
from cs231n.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array
from cs231n.solver import Solver

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest' # set the image interpolation mode
plt.rcParams['image.cmap'] = 'gray' # set the default colormap

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
  """ returns relative error """
  return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

In [None]:
# Load the (preprocessed) CIFAR10 data.

data = get_CIFAR10_data()
for k, v in list(data.items()):
  print('%s: ' % k, v.shape)

# Affine layer: forward
Open the file `cs231n/layers.py` and implement the `affine_forward` function.

Once you are done you can test your implementation by running the following:

In [None]:
# Test the affine_forward function

num_inputs = 2
input_shape = (4, 5, 6)
output_dim = 3

input_size = num_inputs * np.prod(input_shape)
weight_size = output_dim * np.prod(input_shape)

x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)
w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)
b = np.linspace(-0.3, 0.1, num=output_dim)

out, _ = affine_forward(x, w, b)
correct_out = np.array([[ 1.49834967,  1.70660132,  1.91485297],
                        [ 3.25553199,  3.5141327,   3.77273342]])

# Compare your output with ours. The error should be around e-9 or less.
print('Testing affine_forward function:')
print('difference: ', rel_error(out, correct_out))

# Affine layer: backward
Now implement the `affine_backward` function and test your implementation using numeric gradient checking.
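
Numeric gradient checking compares an analytic gradient against a finite-difference estimate. The sketch below shows one common way to do this with centered differences. It is an assumption about the general technique, not a copy of `eval_numerical_gradient_array` from `cs231n/gradient_check.py`; the name `numerical_gradient_array` and the step size `h=1e-5` are illustrative.

```python
import numpy as np

def numerical_gradient_array(f, x, dout, h=1e-5):
    """Centered-difference estimate of d(sum(f(x) * dout)) / dx."""
    grad = np.zeros_like(x)
    for ix in np.ndindex(*x.shape):
        old = x[ix]
        x[ix] = old + h
        pos = np.array(f(x), copy=True)  # f evaluated at x + h
        x[ix] = old - h
        neg = np.array(f(x), copy=True)  # f evaluated at x - h
        x[ix] = old                      # restore the original value
        grad[ix] = np.sum((pos - neg) * dout) / (2 * h)
    return grad

# Check against a function whose gradient is known: f(x) = x ** 2.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
dout = rng.standard_normal((3, 4))
dx_num = numerical_gradient_array(lambda v: v ** 2, x, dout)
dx = 2 * x * dout  # analytic gradient via the chain rule
max_abs_diff = np.max(np.abs(dx_num - dx))
print('max abs difference:', max_abs_diff)
```

The centered difference has error that shrinks with `h` squared, which is why checks of this kind can be expected to agree with a correct analytic gradient to many digits.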

In [None]:
# Test the affine_backward function
np.random.seed(231)
x = np.random.randn(10, 2, 3)
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(10, 5)

dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)

_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)

# The error should be around e-10 or less
print('Testing affine_backward function:')
print('dx error: ', rel_error(dx_num, dx))
print('dw error: ', rel_error(dw_num, dw))
print('db error: ', rel_error(db_num, db))

# ReLU activation: forward
Implement the forward pass for the ReLU activation function in the `relu_forward` function and test your implementation using the following:

In [None]:
# Test the relu_forward function

x = np.linspace(-0.5, 0.5, num=12).reshape(3, 4)

out, _ = relu_forward(x)
correct_out = np.array([[ 0.,          0.,          0.,          0.,        ],
                        [ 0.,          0.,          0.04545455,  0.13636364,],
                        [ 0.22727273,  0.31818182,  0.40909091,  0.5,       ]])

# Compare your output with ours. The error should be on the order of e-8
print('Testing relu_forward function:')
print('difference: ', rel_error(out, correct_out))

# ReLU activation: backward
Now implement the backward pass for the ReLU activation function in the `relu_backward` function and test your implementation using numeric gradient checking:

In [None]:
np.random.seed(231)
x = np.random.randn(10, 10)
dout = np.random.randn(*x.shape)

dx_num = eval_numerical_gradient_array(lambda x: relu_forward(x)[0], x, dout)

_, cache = relu_forward(x)
dx = relu_backward(dout, cache)

# The error should be on the order of e-12
print('Testing relu_backward function:')
print('dx error: ', rel_error(dx_num, dx))

## Inline Question 1:

We've only asked you to implement ReLU, but there are a number of different activation functions that one could use in neural networks, each with its pros and cons. In particular, an issue commonly seen with activation functions is getting zero (or close to zero) gradient flow during backpropagation. Which of the following activation functions have this problem? If you consider these functions in the one-dimensional case, what types of input would lead to this behaviour?
1. Sigmoid
2. ReLU
3. Leaky ReLU

$\color{blue}{\textit{Your Answer:}}$ *Fill this in*
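
One way to explore this question is to evaluate each activation's derivative at a few one-dimensional inputs and look at the magnitudes. The sketch below does that; the helper names are hypothetical, and the leaky ReLU slope `alpha=0.01` is an assumed, commonly used value rather than something this assignment specifies.

```python
import numpy as np

def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 where x > 0, else 0."""
    return (x > 0).astype(float)

def leaky_relu_grad(x, alpha=0.01):
    """Derivative of leaky ReLU: 1 where x > 0, else alpha (assumed slope)."""
    return np.where(x > 0, 1.0, alpha)

xs = np.array([-10.0, -1.0, 0.5, 10.0])
for name, fn in [('sigmoid', sigmoid_grad),
                 ('relu', relu_grad),
                 ('leaky relu', leaky_relu_grad)]:
    print(name, fn(xs))
```

Trying inputs of larger magnitude, or a different `alpha`, shows how each derivative behaves at the extremes.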

# "Sandwich" layers
There are some common patterns of layers that are frequently used in neural nets. For example, affine layers are frequently followed by a ReLU nonlinearity. To make these common patterns easy, we define several convenience layers in the file `cs231n/layer_utils.py`.

For now take a look at the `affine_relu_forward` and `affine_relu_backward` functions, and run the following to numerically gradient check the backward pass:

In [None]:
from cs231n.layer_utils import affine_relu_forward, affine_relu_backward
np.random.seed(231)
x = np.random.randn(2, 3, 4)
w = np.random.randn(12, 10)
b = np.random.randn(10)
dout = np.random.randn(2, 10)

out, cache = affine_relu_forward(x, w, b)
dx, dw, db = affine_relu_backward(dout, cache)

dx_num = eval_numerical_gradient_array(lambda x: affine_relu_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_relu_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_relu_forward(x, w, b)[0], b, dout)

# Relative error should be around e-10 or less
print('Testing affine_relu_forward and affine_relu_backward:')
print('dx error: ', rel_error(dx_num, dx))
print('dw error: ', rel_error(dw_num, dw))
print('db error: ', rel_error(db_num, db))

# Loss layers: Softmax
Now implement the loss and gradient for softmax in the `softmax_loss` function in `cs231n/layers.py`. These should be similar to what you implemented in `cs231n/classifiers/softmax.py`. Other loss functions (e.g. `svm_loss`) can also be implemented in a modular way, but this is not required for this assignment.

You can make sure that the implementations are correct by running the following:
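
The test cell for this section expects a loss near 2.3. A short sanity check shows where that number comes from: when all class scores are (nearly) equal, softmax assigns probability 1/C to each class, so the average cross-entropy is log(C), which is about 2.303 for C = 10. The sketch below uses exactly-zero scores and arbitrary labels as assumptions for illustration; it computes only the loss, not the gradient.

```python
import numpy as np

num_classes = 10
scores = np.zeros((5, num_classes))  # equal scores for every class
y = np.array([0, 3, 5, 7, 9])        # arbitrary labels for illustration

# Numerically stable log-softmax: subtract the row max before exponentiating.
shifted = scores - scores.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
loss = -np.mean(log_probs[np.arange(len(y)), y])

print('loss:', loss)
print('log(C):', np.log(num_classes))
```

A freshly initialized network with small weights should therefore start near this value; a much larger initial loss often points to a bug or an overly large weight scale.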

In [None]:
np.random.seed(231)
num_classes, num_inputs = 10, 50
x = 0.001 * np.random.randn(num_inputs, num_classes)
y = np.random.randint(num_classes, size=num_inputs)


dx_num = eval_numerical_gradient(lambda x: softmax_loss(x, y)[0], x, verbose=False)
loss, dx = softmax_loss(x, y)

# Test softmax_loss function. Loss should be close to 2.3 and dx error should be on the order of e-8
print('\nTesting softmax_loss:')
print('loss: ', loss)
print('dx error: ', rel_error(dx_num, dx))

# Two-layer network
Open the file `cs231n/classifiers/fc_net.py` and complete the implementation of the `TwoLayerNet` class. Read through it to make sure you understand the API. You can run the cell below to test your implementation.

In [None]:
np.random.seed(231)
N, D, H, C = 3, 5, 50, 7
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)

std = 1e-3
model = TwoLayerNet(input_dim=D, hidden_dim=H, num_classes=C, weight_scale=std)

print('Testing initialization ... ')
W1_std = abs(model.params['W1'].std() - std)
b1 = model.params['b1']
W2_std = abs(model.params['W2'].std() - std)
b2 = model.params['b2']
assert W1_std < std / 10, 'First layer weights do not seem right'
assert np.all(b1 == 0), 'First layer biases do not seem right'
assert W2_std < std / 10, 'Second layer weights do not seem right'
assert np.all(b2 == 0), 'Second layer biases do not seem right'

print('Testing test-time forward pass ... ')
model.params['W1'] = np.linspace(-0.7, 0.3, num=D*H).reshape(D, H)
model.params['b1'] = np.linspace(-0.1, 0.9, num=H)
model.params['W2'] = np.linspace(-0.3, 0.4, num=H*C).reshape(H, C)
model.params['b2'] = np.linspace(-0.9, 0.1, num=C)
X = np.linspace(-5.5, 4.5, num=N*D).reshape(D, N).T
scores = model.loss(X)
correct_scores = np.asarray(
  [[11.53165108,  12.2917344,   13.05181771,  13.81190102,  14.57198434, 15.33206765,  16.09215096],
   [12.05769098,  12.74614105,  13.43459113,  14.1230412,   14.81149128, 15.49994135,  16.18839143],
   [12.58373087,  13.20054771,  13.81736455,  14.43418138,  15.05099822, 15.66781506,  16.2846319 ]])
scores_diff = np.abs(scores - correct_scores).sum()
assert scores_diff < 1e-6, 'Problem with test-time forward pass'

print('Testing training loss (no regularization)')
y = np.asarray([0, 5, 1])
loss, grads = model.loss(X, y)
correct_loss = 3.4702243556
assert abs(loss - correct_loss) < 1e-10, 'Problem with training-time loss'

model.reg = 1.0
loss, grads = model.loss(X, y)
correct_loss = 26.5948426952
assert abs(loss - correct_loss) < 1e-10, 'Problem with regularization loss'

# Errors should be around e-7 or less
for reg in [0.0, 0.7]:
  print('Running numeric gradient check with reg = ', reg)
  model.reg = reg
  loss, grads = model.loss(X, y)

  for name in sorted(grads):
    f = lambda _: model.loss(X, y)[0]
    grad_num = eval_numerical_gradient(f, model.params[name], verbose=False)
    print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))

# Solver
Open the file `cs231n/solver.py` and read through it to familiarize yourself with the API. After doing so, use a `Solver` instance to train a `TwoLayerNet` that achieves about 36% accuracy on the validation set.

In [None]:
input_size = 32 * 32 * 3
hidden_size = 50
num_classes = 10
model = TwoLayerNet(input_size, hidden_size, num_classes)
solver = None

##############################################################################
# TODO: Use a Solver instance to train a TwoLayerNet that achieves about 36% #
# accuracy on the validation set.                                            #
##############################################################################

##############################################################################
#                             END OF YOUR CODE                               #
##############################################################################

# Debug the training
With the default parameters we provided above, you should get a validation accuracy of about 0.36. This isn't very good.

One strategy for getting insight into what's wrong is to plot the loss function and the accuracies on the training and validation sets during optimization.

Another strategy is to visualize the weights that were learned in the first layer of the network. In most neural networks trained on visual data, the first layer weights typically show some visible structure when visualized.

In [None]:
# Run this cell to visualize training loss and train / val accuracy

plt.subplot(2, 1, 1)
plt.title('Training loss')
plt.plot(solver.loss_history, 'o')
plt.xlabel('Iteration')

plt.subplot(2, 1, 2)
plt.title('Accuracy')
plt.plot(solver.train_acc_history, '-o', label='train')
plt.plot(solver.val_acc_history, '-o', label='val')
plt.plot([0.5] * len(solver.val_acc_history), 'k--')
plt.xlabel('Epoch')
plt.legend(loc='lower right')
plt.gcf().set_size_inches(15, 12)
plt.show()

In [None]:
from cs231n.vis_utils import visualize_grid

# Visualize the weights of the network

def show_net_weights(net):
    W1 = net.params['W1']
    W1 = W1.reshape(3, 32, 32, -1).transpose(3, 1, 2, 0)
    plt.imshow(visualize_grid(W1, padding=3).astype('uint8'))
    plt.gca().axis('off')
    plt.show()

show_net_weights(model)

# Tune your hyperparameters

**What's wrong?** Looking at the visualizations above, we see that the loss is decreasing more or less linearly, which suggests that the learning rate may be too low. Moreover, there is no gap between the training and validation accuracy, suggesting that the model we used has low capacity and that we should increase its size. On the other hand, with a very large model we would expect to see more overfitting, which would manifest itself as a very large gap between the training and validation accuracy.

**Tuning.** Tuning the hyperparameters and developing intuition for how they affect the final performance is a large part of using neural networks, so we want you to get a lot of practice. Below, you should experiment with different values of the various hyperparameters, including hidden layer size, learning rate, number of training epochs, and regularization strength. You might also consider tuning the learning rate decay, but you should be able to get good performance using the default value.

**Approximate results.** You should aim to achieve a classification accuracy of greater than 48% on the validation set. Our best network gets over 52% on the validation set.

**Experiment.** Your goal in this exercise is to get as good a result on CIFAR-10 as you can with a fully-connected neural network (52% could serve as a reference). Feel free to implement your own techniques (e.g. PCA to reduce dimensionality, adding dropout, or adding features to the solver).
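
Sweeping over hyperparameters can be automated with a simple random search. The sketch below is one possible structure, not the course's reference solution: `sample_config`, `random_search`, the sampling ranges, and the callable `train_and_evaluate` are all hypothetical. In practice `train_and_evaluate` would build a model, train it, and return its validation accuracy; here a stand-in function is used so the example runs on its own.

```python
import numpy as np

def sample_config(rng):
    """Draw one hyperparameter setting (ranges are illustrative assumptions)."""
    return {
        'hidden_size': int(rng.choice([50, 100, 200])),
        'learning_rate': float(10 ** rng.uniform(-4, -2)),  # log-uniform
        'reg': float(10 ** rng.uniform(-4, 0)),             # log-uniform
        'num_epochs': int(rng.choice([5, 10, 20])),
    }

def random_search(train_and_evaluate, num_trials=10, seed=0):
    """Return the config with the highest validation accuracy seen."""
    rng = np.random.default_rng(seed)
    best_acc, best_config = -1.0, None
    for _ in range(num_trials):
        config = sample_config(rng)
        val_acc = train_and_evaluate(config)
        if val_acc > best_acc:
            best_acc, best_config = val_acc, config
    return best_config, best_acc

def fake_evaluate(config):
    """Stand-in for real training: peaks when learning_rate is near 1e-3."""
    return 0.5 - 0.1 * abs(np.log10(config['learning_rate']) + 3)

best_config, best_acc = random_search(fake_evaluate, num_trials=20)
print(best_config, best_acc)
```

Learning rate and regularization strength are sampled on a log scale because their useful values typically span several orders of magnitude.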

In [None]:
best_model = None



#################################################################################
# TODO: Tune hyperparameters using the validation set. Store your best trained  #
# model in best_model.                                                          #
#                                                                               #
# To help debug your network, it may help to use visualizations similar to the  #
# ones we used above; these visualizations will have significant qualitative    #
# differences from the ones we saw above for the poorly tuned network.          #
#                                                                               #
# Tweaking hyperparameters by hand can be fun, but you might find it useful to  #
# write code to sweep through possible combinations of hyperparameters          #
# automatically like we did on the previous exercises.                          #
#################################################################################

#################################################################################
#                              END OF YOUR CODE                                 #
#################################################################################

# Test your model!
Run your best model on the validation and test sets. You should achieve above 48% accuracy on both the validation set and the test set.

In [None]:
y_val_pred = np.argmax(best_model.loss(data['X_val']), axis=1)
print('Validation set accuracy: ', (y_val_pred == data['y_val']).mean())

In [None]:
y_test_pred = np.argmax(best_model.loss(data['X_test']), axis=1)
print('Test set accuracy: ', (y_test_pred == data['y_test']).mean())

In [None]:
# Save best model
best_model.save("best_two_layer_net.npy")

## Inline Question 2:

Now that you have trained a neural network classifier, you may find that your testing accuracy is much lower than the training accuracy. In what ways can we decrease this gap? Select all that apply.

1. Train on a larger dataset.
2. Add more hidden units.
3. Increase the regularization strength.
4. None of the above.

$\color{blue}{\textit{Your Answer:}}$

$\color{blue}{\textit{Your Explanation:}}$