## Transform

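Sometimes the features and labels we read from disk cannot be fed into a neural network directly. For example, the data on disk may be images while the network expects a specific image size, or the labels may need to be converted into `One-Hot` form.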

## Building a Neural Network

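The `torch.nn` namespace provides all the building blocks you need to construct a neural network. Every module in PyTorch is a subclass of `nn.Module`.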

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
```

```python
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using {} device".format(device))
```

```
Using cpu device
```


### Defining the Network Class

- [https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html](https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)

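Every network inherits from the `nn.Module` base class. The layers are created in `__init__`, and every `nn.Module` subclass implements the operations applied to the input data in its `forward` method.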

```python
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
```

```python
model = NeuralNetwork().to(device)
print(model)
```

```
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
```


To get something like Keras's `model.summary()`, you can install a third-party package such as:

1. [pytorch-summary](https://github.com/sksq96/pytorch-summary)

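If you only want a rough per-layer overview without installing anything, forward hooks can collect each leaf layer's output shape and parameter count. This is a minimal hand-rolled sketch, not the `pytorch-summary` API. The helper name `summarize` is hypothetical, and `net` is a standalone copy of the `NeuralNetwork` layer stack above. For this stack the total is 784·512 + 512 + 512·512 + 512 + 512·10 + 10 = 669,706 parameters.

```python
import torch
from torch import nn

def summarize(model, input_shape):
    """Hypothetical helper: print output shape and parameter count of each leaf module."""
    rows, hooks = [], []

    def make_hook(name):
        def hook(module, inputs, output):
            n_params = sum(p.numel() for p in module.parameters())
            rows.append((name, tuple(output.shape), n_params))
        return hook

    # Register a forward hook on every leaf module (one without child modules)
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:
            hooks.append(module.register_forward_hook(make_hook(name)))

    # One dummy forward pass with batch size 1 triggers the hooks in execution order
    with torch.no_grad():
        model(torch.zeros(1, *input_shape))

    for h in hooks:
        h.remove()  # detach the hooks so the model is left unchanged

    for name, shape, n in rows:
        print(f"{name:<6} {str(shape):<12} {n}")
    total = sum(p.numel() for p in model.parameters())
    print(f"Total params: {total}")
    return rows, total

net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)
rows, total = summarize(net, (28, 28))
```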
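```python
X = torch.rand(1, 28, 28, device=device)
logits = model(X)                          # shape (1, 10): one raw score per class
pred_probab = nn.Softmax(dim=1)(logits)    # turn scores into probabilities along dim 1
y_pred = pred_probab.argmax(1)             # index of the most probable class
print("Predicted class {}".format(y_pred))
```

```
Predicted class tensor([5])
```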

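Softmax is strictly increasing within each row, so taking `argmax` of the probabilities picks the same class as taking `argmax` of the raw logits. The softmax step only matters when you need the probabilities themselves. Here is a minimal check that uses random logits as a stand-in for a model's output (the batch size of 4 is arbitrary):

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)          # stand-in for model(X) on a batch of 4 samples
probs = nn.Softmax(dim=1)(logits)    # probabilities along the class dimension (dim=1)

pred_from_probs = probs.argmax(1)
pred_from_logits = logits.argmax(1)
print(torch.equal(pred_from_probs, pred_from_logits))
```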

## Flatten

- [https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html)

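`Flatten` merges a contiguous range of dimensions, from `start_dim` to `end_dim`, into a single dimension. With the defaults `start_dim=1` and `end_dim=-1`, the batch dimension (dim 0) is kept and all remaining dimensions are merged into one.

The function signature is: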

```python
torch.nn.Flatten(start_dim=1, end_dim=-1)
```

```python
from torch import nn
input_data = torch.randn(32, 1, 5, 5)
m1 = nn.Sequential(
    nn.Conv2d(1, 32, 5, 1, 1),
    nn.Flatten()
)
output = m1(input_data)
print(output.size())
```

```
torch.Size([32, 288])
```

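The 288 above comes from the convolution. With a 5×5 input, kernel size 5, stride 1 and padding 1, each spatial side becomes (5 + 2·1 − 5) / 1 + 1 = 3, so each sample holds 32 × 3 × 3 = 288 values after flattening. The sketch below also shows how `start_dim` and `end_dim` change which dimensions are merged. The input shape is only an example.

```python
import torch
from torch import nn

x = torch.randn(32, 1, 5, 5)

# Conv2d(1, 32, kernel_size=5, stride=1, padding=1): each side becomes (5 + 2*1 - 5) // 1 + 1 = 3
conv = nn.Conv2d(1, 32, 5, 1, 1)
print(conv(x).shape)                                  # torch.Size([32, 32, 3, 3])

# Default Flatten: keep dim 0 (batch), merge dims 1..-1 -> (32, 1*5*5)
print(nn.Flatten()(x).shape)                          # torch.Size([32, 25])

# start_dim=0 also merges the batch dimension -> (32*1*5*5,)
print(nn.Flatten(start_dim=0)(x).shape)               # torch.Size([800])

# start_dim=2 merges only the two spatial dims -> (32, 1, 5*5)
print(nn.Flatten(start_dim=2)(x).shape)               # torch.Size([32, 1, 25])

# start_dim=1, end_dim=2 merges dims 1 and 2 only -> (32, 1*5, 5)
print(nn.Flatten(start_dim=1, end_dim=2)(x).shape)    # torch.Size([32, 5, 5])
```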

## Linear

- [https://pytorch.org/docs/stable/generated/torch.nn.Linear.html](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)

The function signature is:

```python
torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)
```

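`nn.Linear` applies a linear transformation to the input data: $y = x A^{T} + b$. In the signature, `in_features` is the size of each input sample (the input feature dimension), `out_features` is the size of each output sample (the output feature dimension), and `bias` controls whether a learnable bias term $b$ is added.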

```python
from torch import nn
m2 = nn.Linear(20, 30)
input_data = torch.rand(128, 20)
output = m2(input_data)
print(output.size())
```

```
torch.Size([128, 30])
```

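To check the formula $y = x A^{T} + b$ directly, you can recompute the output by hand from the layer's `weight` and `bias`. Here `weight` plays the role of $A$ and has shape `(out_features, in_features)`. This minimal sketch reuses the shapes of `m2`:

```python
import torch
from torch import nn

layer = nn.Linear(20, 30)
x = torch.rand(128, 20)

y_module = layer(x)
# y = x A^T + b, where A = layer.weight has shape (30, 20) and b = layer.bias has shape (30,)
y_manual = x @ layer.weight.T + layer.bias
print(torch.allclose(y_module, y_manual, atol=1e-5))  # True

# With bias=False the layer computes y = x A^T and has no bias parameter
no_bias = nn.Linear(20, 30, bias=False)
print(no_bias.bias is None)                           # True
```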

You can inspect the weight and bias through the layer's `weight` and `bias` attributes. By default both are initialized from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$, where $k = \frac{1}{in\_features}$.

```python
m2.weight.size()
```

```
torch.Size([30, 20])
```

```python
m2.bias.size()
```

```
torch.Size([30])
```

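Here is a quick sanity check of the default initialization. With `in_features = 20`, $k = 1/20$, so every entry of `weight` and `bias` should lie within $\pm\sqrt{1/20} \approx 0.2236$. This is only a sketch; the layer sizes match `m2`.

```python
import math
import torch
from torch import nn

layer = nn.Linear(20, 30)
bound = math.sqrt(1 / layer.in_features)   # sqrt(k) with k = 1 / in_features

print(round(bound, 4))                            # 0.2236
print(layer.weight.abs().max().item() <= bound)   # True
print(layer.bias.abs().max().item() <= bound)     # True
```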
## ReLU

- [https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html)

Besides linear layers, neural networks also contain nonlinear activation functions. `ReLU` can be written as:

$$
\operatorname{ReLU}(x)=(x)^{+}=\max (0, x)
$$


Its function signature is:

```python
torch.nn.ReLU(inplace=False)
```

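Here is a short sketch of `nn.ReLU` on a fixed tensor. Negative entries become 0 and non-negative entries pass through unchanged. The input values are arbitrary.

```python
import torch
from torch import nn

x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])

relu = nn.ReLU()
print(relu(x).tolist())                # [0.0, 0.0, 0.0, 1.5, 3.0]

# Same element-wise max(0, x) via clamping at 0
print(torch.clamp(x, min=0).tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]

# inplace=True writes the result into the input tensor itself
y = x.clone()
nn.ReLU(inplace=True)(y)
print(y.tolist())                      # [0.0, 0.0, 0.0, 1.5, 3.0]
```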
## Sequential

- [https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html)

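`nn.Sequential` is an ordered container of modules. Input data passes through the modules in exactly the order in which they were passed to the constructor.

With `Sequential` you can quickly assemble a small model.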

```python
model = nn.Sequential(
    nn.Conv2d(1, 20, 5),
    nn.ReLU(),
    nn.Conv2d(20, 64, 5),
    nn.ReLU()
)
print(model)
```

```
Sequential(
  (0): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
  (1): ReLU()
  (2): Conv2d(20, 64, kernel_size=(5, 5), stride=(1, 1))
  (3): ReLU()
)
```


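```python
from collections import OrderedDict
# Keys must be unique: with two 'relu' entries the later one would replace the
# earlier one, and the model would silently end up with only three layers
model = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1, 20, 5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(20, 64, 5)),
    ('relu2', nn.ReLU())
]))
print(model)
```

```
Sequential(
  (conv1): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
  (relu1): ReLU()
  (conv2): Conv2d(20, 64, kernel_size=(5, 5), stride=(1, 1))
  (relu2): ReLU()
)
```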

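Calling a `Sequential` feeds the output of each module into the next one, in registration order. The sketch below checks this against a manual loop. It also shows that modules registered through an `OrderedDict` can be accessed by name as well as by index. The layer sizes and the 28×28 input are arbitrary.

```python
from collections import OrderedDict

import torch
from torch import nn

net = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1, 20, 5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(20, 64, 5)),
    ('relu2', nn.ReLU()),
]))

x = torch.randn(1, 1, 28, 28)

# Manual forward pass: apply each submodule in the order it was registered
out = x
for layer in net:
    out = layer(out)

print(torch.allclose(net(x), out))   # True
print(out.shape)                     # torch.Size([1, 64, 20, 20]): 28 -> 24 -> 20 per side
print(net.conv1 is net[0])           # True: name and index refer to the same module
```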

## Softmax

- [https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html)

The function signature is:

```python
torch.nn.Softmax(dim=None)
```

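Its only constructor argument is `dim`. The tensor passed to softmax usually has more than one dimension (two or three, for example), and softmax is applied along just one of them: the dimension given by `dim`, whose values are rescaled to lie in $[0, 1]$ and sum to 1. The formula is: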

$$
\operatorname{Softmax}\left(x_{i}\right)=\frac{\exp \left(x_{i}\right)}{\sum_{j} \exp \left(x_{j}\right)}
$$



```python
input_data = torch.randn(2, 3)
```

```python
from torch import nn
m = nn.Softmax(dim=1)

output = m(input_data)
output
```

```
tensor([[0.5826, 0.2536, 0.1639],
        [0.7336, 0.1555, 0.1109]])
```

```python
from torch import nn
m = nn.Softmax(dim=0)

output = m(input_data)
output
```

```
tensor([[0.6128, 0.7647, 0.7466],
        [0.3872, 0.2353, 0.2534]])
```

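The formula can be checked directly: exponentiate, then divide by the sum along the chosen `dim`. With `dim=1` each row sums to 1, and with `dim=0` each column does, which matches the two outputs above. This minimal sketch uses a fresh random tensor:

```python
import torch
from torch import nn

x = torch.randn(2, 3)

# Softmax along dim=1 (each row), computed by hand from the formula
manual = torch.exp(x) / torch.exp(x).sum(dim=1, keepdim=True)
print(torch.allclose(nn.Softmax(dim=1)(x), manual))   # True

# Each row sums to 1 for dim=1; each column sums to 1 for dim=0
print(nn.Softmax(dim=1)(x).sum(dim=1))   # values close to [1, 1]
print(nn.Softmax(dim=0)(x).sum(dim=0))   # values close to [1, 1, 1]
```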
## Accessing Model Parameters

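Calling `named_parameters()`, a method inherited from the `nn.Module` base class, returns an iterator over `(name, param)` tuples. The first element is the parameter's name and the second is the parameter tensor itself.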

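Because `named_parameters()` yields plain `(name, tensor)` pairs, you can easily aggregate over them, for example to count trainable parameters. This minimal sketch uses a small standalone model whose sizes are arbitrary. It also shows that freezing a parameter with `requires_grad_(False)` removes it from the trainable count.

```python
import torch
from torch import nn

net = nn.Sequential(
    nn.Linear(4, 3),
    nn.ReLU(),
    nn.Linear(3, 2),
)

# Inside a Sequential, names look like "<submodule index>.<parameter name>";
# ReLU has no parameters, so index 1 does not appear
for name, param in net.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)

# Count trainable values: 4*3 + 3 + 3*2 + 2 = 23
n_trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(n_trainable)       # 23

# Freezing the first layer's weight removes its 12 values from the count
net[0].weight.requires_grad_(False)
n_after_freeze = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(n_after_freeze)    # 11
```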
```python
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
```

```python
for name, param in model.named_parameters():
    print("Layer {} | Size {} | Values {}".format(name, param.size(), param[:2]))
```

Layer linear_relu_stack.0.weight | Size torch.Size([512, 784]) | Values tensor([[ 0.0293,  0.0166,  0.0002,  ...,  0.0328, -0.0073,  0.0030],
        [ 0.0250,  0.0135,  0.0311,  ..., -0.0034,  0.0084,  0.0155]],
       grad_fn=<SliceBackward>)
Layer linear_relu_stack.0.bias | Size torch.Size([512]) | Values tensor([0.0306, 0.0263], grad_fn=<SliceBackward>)
Layer linear_relu_stack.2.weight | Size torch.Size([512, 512]) | Values tensor([[ 0.0401, -0.0128,  0.0366,  ..., -0.0175, -0.0305,  0.0165],
        [ 0.0092, -0.0319,  0.0109,  ...,  0.0228, -0.0025, -0.0107]],
       grad_fn=<SliceBackward>)
Layer linear_relu_stack.2.bias | Size torch.Size([512]) | Values tensor([-0.0333, -0.0229], grad_fn=<SliceBackward>)
Layer linear_relu_stack.4.weight | Size torch.Size([10, 512]) | Values tensor([[ 0.0005, -0.0079,  0.0131,  ...,  0.0030, -0.0120, -0.0071],
        [-0.0024, -0.0185,  0.0096,  ...,  0.0348, -0.0007,  0.0168]],
       grad_fn=<SliceBackward>)
Layer linear_relu_stack.4.bias | S