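Chapter 5 covers building models, accessing and initializing model parameters, designing custom layers and blocks, saving and loading models, and using GPUs for acceleration.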

# 5.0 Two ways to inspect a network's structure

In [52]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self):
        # Call the parent class (nn.Module) constructor for the required setup (initialize the model as needed)
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.out = nn.Linear(256, 10)

    def forward(self, x):
        # The input data is the argument to forward propagation
        out = self.hidden(x)
        out = F.relu(out)
        out = self.out(out)
        out = F.softmax(out, 1)
        # Forward propagation produces the output
        return out
MLP_NET = MLP()

Method 1: print the network structure with print()

In [39]:
print(MLP_NET)

MLP(
  (hidden): Linear(in_features=20, out_features=256, bias=True)
  (out): Linear(in_features=256, out_features=10, bias=True)
)


Method 2: use the summary function from the torchsummary library

In [58]:
from torchsummary import summary
# MLP_NET was created on the CPU, so the dummy input built by summary must live there too
summary(MLP_NET, (1, 20), device="cpu")

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1               [-1, 1, 256]           5,376
            Linear-2                [-1, 1, 10]           2,570
Total params: 7,946
Trainable params: 7,946
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.03
Estimated Total Size (MB): 0.03
----------------------------------------------------------------
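The parameter counts reported by summary can be cross-checked by hand: a Linear layer with `in_features` inputs and `out_features` outputs holds `in_features * out_features` weights plus `out_features` biases. The following is a minimal sketch using the same layer sizes as MLP above; the helper name `count_parameters` is an assumption made for this illustration and is not part of PyTorch or torchsummary.

```python
import torch.nn as nn


def count_parameters(model: nn.Module) -> int:
    """Return the number of trainable parameters in a module (hypothetical helper)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Same layer sizes as the MLP defined above
hidden = nn.Linear(20, 256)
out = nn.Linear(256, 10)

hidden_count = count_parameters(hidden)  # 20 * 256 weights + 256 biases
out_count = count_parameters(out)        # 256 * 10 weights + 10 biases
total = hidden_count + out_count

print(hidden_count, out_count, total)
```

The three printed values should match the `Param #` column and the `Total params` line in the summary output.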


# 5.1 Layers and blocks

Neural network blocks: a block can be a single layer, a component made up of several layers, or the entire model itself.

## 5.1.1 Custom blocks

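Every custom block must provide the following basic functionality:
1. Take input data as the argument to its forward propagation method.
2. Generate an output through the forward propagation method.
3. Compute the gradient of its output with respect to its input, which is accessible through its backpropagation method. In PyTorch, autograd handles this automatically.
4. Store and provide access to the parameters needed for the forward propagation computation.
5. Initialize the model parameters as needed.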

In [24]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self):
        # Call the parent class (nn.Module) constructor for the required setup (initialize the model as needed)
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.out = nn.Linear(256, 10)

    def forward(self, x):
        # The input data is the argument to forward propagation
        out = self.hidden(x)
        out = F.relu(out)
        out = self.out(out)
        out = F.softmax(out, 1)
        # Forward propagation produces the output
        return out
X = torch.randn(1, 20)
MLP_NET = MLP()
out = MLP_NET(X)
out

tensor([[0.1392, 0.0645, 0.1064, 0.1220, 0.1044, 0.1012, 0.0886, 0.1064, 0.0608,
         0.1065]], grad_fn=<SoftmaxBackward0>)
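Items 3 and 4 of the list above need no extra code in PyTorch: assigning layers as attributes registers their parameters, and autograd supplies the backward pass. The sketch below illustrates this under stated assumptions; `TinyMLP` is a hypothetical name for a block with the same layout as MLP (without the softmax), used only for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMLP(nn.Module):  # hypothetical name; same layer layout as MLP above
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.out = nn.Linear(256, 10)

    def forward(self, x):
        return self.out(F.relu(self.hidden(x)))


net = TinyMLP()
x = torch.randn(1, 20, requires_grad=True)

# Item 4: parameters are stored on the module and can be listed by name
param_shapes = {name: tuple(p.shape) for name, p in net.named_parameters()}
print(param_shapes)

# Item 3: no backward method is written by hand; autograd derives it
net(x).sum().backward()
print(x.grad.shape)
```

After `backward()`, every parameter has a populated `.grad`, and `x.grad` holds the gradient of the summed output with respect to the input.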

## 5.1.2 The Sequential block

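This section shows how to build a Sequential class.

A simplified Sequential class only needs two key methods:
1. A method that appends blocks one by one to a list.
2. A forward propagation method.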

In [59]:
import torch
import torch.nn as nn

class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
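        # enumerate() pairs each item of an iterable with its index
        for idx, module in enumerate(args):
            # _modules is an ordered dictionary inherited from nn.Module.
            # It keeps the blocks in the order they were added, and it is
            # where the framework looks up submodules whose parameters
            # need to be initialized.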
            self._modules[str(idx)] = module

    def forward(self, x):
        for block in self._modules.values():
            x = block(x)
        return x
MySequential_NET = MySequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10), nn.Softmax(dim=1))
print(MySequential_NET)
x = torch.randn(1, 20)
out = MySequential_NET(x)
out

MySequential(
  (0): Linear(in_features=20, out_features=256, bias=True)
  (1): ReLU()
  (2): Linear(in_features=256, out_features=10, bias=True)
  (3): Softmax(dim=1)
)


tensor([[0.1137, 0.1208, 0.1127, 0.1065, 0.0688, 0.0893, 0.1104, 0.0822, 0.1098,
         0.0859]], grad_fn=<SoftmaxBackward0>)
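MySequential writes to the private `_modules` dictionary directly. The same registration can be done through the public `nn.Module.add_module(name, module)` method. The sketch below is one possible variant, not the way `nn.Sequential` itself is implemented; the class name `AddModuleSequential` is hypothetical.

```python
import torch
import torch.nn as nn


class AddModuleSequential(nn.Module):  # hypothetical name for this sketch
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            # Public API equivalent of self._modules[str(idx)] = module
            self.add_module(str(idx), module)

    def forward(self, x):
        # Iterate in registration order, as MySequential does
        for block in self._modules.values():
            x = block(x)
        return x


net = AddModuleSequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
y = net(torch.randn(1, 20))
names = [name for name, _ in net.named_children()]
print(names, y.shape)
```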

## 5.1.3 Executing code in forward propagation

Arbitrary code can be added to forward propagation, including Python control flow and any mathematical operations.


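**Note: to apply a nonlinearity inside forward propagation, either call the function from torch.nn.functional (for example F.relu(x)) or instantiate the nn module first and then call it (for example nn.ReLU()(x)). Writing nn.ReLU(x) constructs a module instead of applying the activation, and passing that module on to the next layer raises "TypeError: linear(): argument 'input' (position 1) must be Tensor, not ReLU". The module forms in nn are meant to be used as layers inside a block, for example as arguments to nn.Sequential.**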
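The note above can be reproduced in a few lines. This is a minimal sketch with arbitrary layer sizes chosen for the example; it compares the functional and module forms of ReLU and then triggers the TypeError on purpose.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

linear = nn.Linear(4, 4)
x = torch.randn(2, 4)

# Two equivalent ways to apply ReLU to a tensor
functional_out = F.relu(linear(x))
module_out = nn.ReLU()(linear(x))  # instantiate first, then call
same = torch.equal(functional_out, module_out)

# Common mistake: this builds a ReLU module (x is taken as the `inplace`
# argument) rather than applying the activation to x
not_a_tensor = nn.ReLU(x)

try:
    linear(not_a_tensor)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```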

In [72]:
import torch
import torch.nn as nn
import torch.nn.functional as F
class FixeHiddenMLP(nn.Module):  
    def __init__(self):
        super().__init__()
        self.rand_c = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)
        
    def forward(self, x):
        x = self.linear(x)
        # The constant (non-trainable) tensor takes part in the computation
        x = F.relu(torch.mm(x, self.rand_c) + 1)
        # Reuse the fully connected layer, which is equivalent to two fully connected layers sharing parameters
        x = self.linear(x)
        # Control flow inside forward propagation; note that this operation is unlikely to be used in a real task
        while x.abs().sum() > 1:
            x /= 2
        return x.sum()
        
x = torch.randn((1, 20))
FixeHiddenMLP_Net = FixeHiddenMLP()
out = FixeHiddenMLP_Net(x)
out

tensor(0.4603, grad_fn=<SumBackward0>)
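The constant `rand_c` above is a plain tensor attribute, so it is not included in `state_dict()` and is not moved by `.to(device)`. One possible alternative, shown here as a sketch rather than as the approach used in this chapter, is to register the constant with `nn.Module.register_buffer`. The class name `BufferedHiddenMLP` is hypothetical, and the control-flow loop is left out to keep the example short.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BufferedHiddenMLP(nn.Module):  # hypothetical name for this sketch
    def __init__(self):
        super().__init__()
        # A buffer is saved in state_dict and follows .to(device),
        # but it is not a parameter and receives no gradient updates
        self.register_buffer("rand_c", torch.rand(20, 20))
        self.linear = nn.Linear(20, 20)

    def forward(self, x):
        x = self.linear(x)
        x = F.relu(torch.mm(x, self.rand_c) + 1)
        return self.linear(x).sum()


net = BufferedHiddenMLP()
out = net(torch.randn(1, 20))

param_names = [name for name, _ in net.named_parameters()]
state_keys = list(net.state_dict().keys())
print(param_names)
print(state_keys)
```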

Blocks of different kinds can be mixed and matched using nn.Sequential().

In [78]:
import torch
import torch.nn as nn
class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(20, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU()
        )
        self.linear = nn.Linear(32, 16)

    def forward(self, x):
        x = self.net(x)
        x = self.linear(x)
        return x
x = torch.rand((1, 20))
chimera = nn.Sequential(NestMLP(), nn.Linear(16, 20), FixeHiddenMLP())
print(chimera)
chimera(x)

Sequential(
  (0): NestMLP(
    (net): Sequential(
      (0): Linear(in_features=20, out_features=64, bias=True)
      (1): ReLU()
      (2): Linear(in_features=64, out_features=32, bias=True)
      (3): ReLU()
    )
    (linear): Linear(in_features=32, out_features=16, bias=True)
  )
  (1): Linear(in_features=16, out_features=20, bias=True)
  (2): FixeHiddenMLP(
    (linear): Linear(in_features=20, out_features=20, bias=True)
  )
)


tensor(-0.0997, grad_fn=<SumBackward0>)

## Summary

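+ A block can be made up of many layers, and a block can be made up of many blocks.
+ Sequential chaining of layers and blocks is handled by the Sequential class.
+ Arbitrary control code can be added to forward propagation.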

## Exercises

1. What kinds of problems will occur if MySequential is changed to store its blocks in a Python list?

In [91]:
import torch
import torch.nn as nn
class ListSequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        self.module_list = list(args)
    def forward(self, x):
        for block in self.module_list:
            x = block(x)
        return x

In [92]:
class ModuleSequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            self._modules[str(idx)] = module
    def forward(self, x):
        for block in self._modules.values():
            x = block(x)
        return x

In [94]:
# Blocks stored in a Python list
x = torch.rand((1, 20))
ListSequential_NET = ListSequential(nn.Linear(20, 156), nn.ReLU(), nn.Linear(156, 10))
print("ListSequential_NET:", ListSequential_NET)
ListSequential_NET(x)

ListSequential_NET: ListSequential()


tensor([[-0.0035, -0.0280, -0.0672, -0.2855, -0.1194, -0.1932,  0.1229, -0.1675,
         -0.1212,  0.2715]], grad_fn=<AddmmBackward0>)

In [95]:
# Blocks stored in _modules
x = torch.rand((1, 20))
ModuleSequential_NET = ModuleSequential(nn.Linear(20, 156), nn.ReLU(), nn.Linear(156, 10))
print("ModuleSequential_NET:", ModuleSequential_NET)
ModuleSequential_NET(x)

ModuleSequential_NET: ModuleSequential(
  (0): Linear(in_features=20, out_features=156, bias=True)
  (1): ReLU()
  (2): Linear(in_features=156, out_features=10, bias=True)
)


tensor([[-0.2038,  0.0299, -0.1446,  0.0459,  0.0473, -0.2123, -0.1981,  0.3035,
          0.3139, -0.0897]], grad_fn=<AddmmBackward0>)

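**The comparison shows that storing the blocks in a list does not stop the forward pass from running, but the network structure can no longer be printed. Unlike blocks stored in the default location (_modules), blocks stored in a custom attribute such as a plain list are never "registered" with the parent module. As a consequence, their parameters are also missing from parameters() and state_dict(), so an optimizer built from parameters() would not update them.**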

In [103]:
from torchsummary import summary
try:
    summary(ListSequential_NET, (1, 20), device="cpu")
except Exception as exc:
    print("Error:", exc)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
Error: 'int' object has no attribute 'numpy'


**This test shows that summary() cannot display the structure of a network whose blocks are stored in a custom location either.**
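The registration problem can be checked directly by counting parameters, and `nn.ModuleList` is the standard container for keeping list-like storage while still registering the blocks. The sketch below assumes the same layer sizes as the cells above; the class names `PlainListNet` and `ModuleListNet` and the helper `make_layers` are hypothetical.

```python
import torch
import torch.nn as nn


class PlainListNet(nn.Module):  # hypothetical name: blocks in a Python list
    def __init__(self, *args):
        super().__init__()
        self.blocks = list(args)

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x


class ModuleListNet(nn.Module):  # hypothetical name: blocks in nn.ModuleList
    def __init__(self, *args):
        super().__init__()
        self.blocks = nn.ModuleList(args)

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x


def make_layers():
    """Build a fresh set of layers with the sizes used above."""
    return nn.Linear(20, 156), nn.ReLU(), nn.Linear(156, 10)


plain = PlainListNet(*make_layers())
registered = ModuleListNet(*make_layers())

# Count the parameter tensors each network exposes
plain_count = len(list(plain.parameters()))
registered_count = len(list(registered.parameters()))
print(plain_count, registered_count)

# Both forward passes still run
x = torch.rand(1, 20)
print(plain(x).shape, registered(x).shape)
```

The plain-list version exposes no parameters at all, while the ModuleList version exposes the weight and bias of both Linear layers.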

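2. Implement a block that takes two blocks as arguments, say net1 and net2, and returns the concatenated output of both networks in forward propagation. This is also called a parallel block.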

In [111]:
import torch
import torch.nn as nn
import torch.nn.functional as F
class Block1(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Linear(20, 256),
            nn.ReLU(),
            nn.Linear(256, 128)
        )

    def forward(self, x):
        x = self.block1(x)
        x = F.relu(x)
        return x

class Block2(nn.Module):
    def __init__(self):
        super().__init__()
        self.block2 = nn.Sequential(
            nn.Linear(128, 256),
            nn.ReLU(),
            nn.Linear(256, 10),
            nn.ReLU()
        )
    def forward(self,x):
        x = self.block2(x)
        return x

class Block(nn.Module):
    def __init__(self, block1:nn.Module, block2:nn.Module):
        super().__init__()
        self.block = nn.Sequential(
            block1,
            block2
        )
    def forward(self, x):
        x = self.block(x)
        return x
x = torch.rand((1, 20))
Block_NET = Block(Block1(), Block2())
print("Block_NET:", Block_NET)
Block_NET(x)

Block_NET: Block(
  (block): Sequential(
    (0): Block1(
      (block1): Sequential(
        (0): Linear(in_features=20, out_features=256, bias=True)
        (1): ReLU()
        (2): Linear(in_features=256, out_features=128, bias=True)
      )
    )
    (1): Block2(
      (block2): Sequential(
        (0): Linear(in_features=128, out_features=256, bias=True)
        (1): ReLU()
        (2): Linear(in_features=256, out_features=10, bias=True)
        (3): ReLU()
      )
    )
  )
)


tensor([[0.0378, 0.0195, 0.0119, 0.0000, 0.0000, 0.0499, 0.0000, 0.0000, 0.1264,
         0.0000]], grad_fn=<ReluBackward0>)
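The Block class above chains the two sub-blocks in series, so the output of Block1 becomes the input of Block2. Under the other reading of the exercise, a parallel block feeds the same input to both networks and concatenates their outputs. The following is a minimal sketch of that reading; the class name `Parallel` and the layer sizes are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class Parallel(nn.Module):  # hypothetical name for this sketch
    def __init__(self, net1: nn.Module, net2: nn.Module):
        super().__init__()
        self.net1 = net1
        self.net2 = net2

    def forward(self, x):
        # Both networks see the same input; their outputs are joined
        # along the feature dimension
        return torch.cat((self.net1(x), self.net2(x)), dim=1)


x = torch.rand(1, 20)
net = Parallel(nn.Linear(20, 8), nn.Linear(20, 4))
y = net(x)
print(y.shape)  # 8 + 4 features per sample
```

Both sub-networks must accept the same input shape, and their outputs must agree in every dimension except the one being concatenated.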

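3. Assume that we want to concatenate multiple instances of the same network. Implement a function that generates multiple instances of the same block and builds a larger network from them.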

In [112]:
# Not meaningful here: the dimensions do not match
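The dimension mismatch arises when a block's output width differs from its input width, as with the blocks defined earlier. If the block maps a width back to the same width, instances can be chained freely. The sketch below shows one way to do this under that assumption; the function names `make_block` and `stack_blocks` are hypothetical.

```python
import torch
import torch.nn as nn


def make_block(dim: int) -> nn.Module:
    """Build one block whose input and output widths are both `dim` (hypothetical helper)."""
    return nn.Sequential(nn.Linear(dim, dim), nn.ReLU())


def stack_blocks(block_fn, num_instances: int) -> nn.Sequential:
    """Call block_fn once per instance so that each copy has its own parameters."""
    return nn.Sequential(*[block_fn() for _ in range(num_instances)])


net = stack_blocks(lambda: make_block(20), 3)
y = net(torch.rand(1, 20))
num_params = sum(p.numel() for p in net.parameters())
print(net)
print(y.shape, num_params)
```

Calling the factory once per instance matters: passing the same module object three times would make the three stages share a single set of weights instead of creating independent copies.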