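Chapter 5 covers building models, accessing and initializing model parameters, designing custom layers and blocks, saving and loading models, and accelerating computation with GPUs.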

# 5.0 Two Ways to Inspect a Network's Structure

In [216]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self):
        # Call the constructor of the parent class (nn.Module) to perform the necessary initialization
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.out = nn.Linear(256, 10)

    def forward(self, x):
        # The input data is passed in as the argument of the forward method
        out = self.hidden(x)
        out = F.relu(out)
        out = self.out(out)
        out = F.softmax(out, dim=1)
        # The forward method produces the output
        return out
MLP_NET = MLP()

Method 1: print the network structure with print()

In [217]:
print(MLP_NET)

MLP(
  (hidden): Linear(in_features=20, out_features=256, bias=True)
  (out): Linear(in_features=256, out_features=10, bias=True)
)


Method 2: use the summary() function from the torchsummary library

In [249]:
from torchsummary import summary
summary(MLP_NET, (1, 20), device="cpu")

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1               [-1, 1, 256]           5,376
            Linear-2                [-1, 1, 10]           2,570
Total params: 7,946
Trainable params: 7,946
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.03
Estimated Total Size (MB): 0.03
----------------------------------------------------------------
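The Param # column can be checked by hand: a Linear(in, out) layer has in × out weights plus out biases, so 20 × 256 + 256 = 5,376 and 256 × 10 + 10 = 2,570. A minimal sketch that counts the same numbers with plain PyTorch (no torchsummary needed), assuming the same 20 → 256 → 10 layout as the MLP above:

```python
import torch.nn as nn

# Same layer sizes as the MLP above (20 -> 256 -> 10); ReLU has no parameters
net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))

# numel() returns the number of elements in each parameter tensor
per_layer = {name: p.numel() for name, p in net.named_parameters()}
total = sum(p.numel() for p in net.parameters())
trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)

print(per_layer)
print("total:", total, "trainable:", trainable)
```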


# 5.1 Layers and Blocks

Blocks: a block can be a single layer, a component made up of several layers, or the entire model itself.

## 5.1.1 Custom Blocks

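Every custom block must provide the following basic functionality:
1. Accept input data as the argument of its forward propagation method
2. Generate an output through its forward propagation method
3. Compute the gradient of its output with respect to its input, which can be accessed through its backpropagation function (in PyTorch, autograd usually provides this automatically)
4. Store and provide access to the parameters needed for the forward computation
5. Initialize model parameters as needed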

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self):
        # Call the constructor of the parent class (nn.Module) to perform the necessary initialization
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.out = nn.Linear(256, 10)

    def forward(self, x):
        # The input data is passed in as the argument of the forward method
        out = self.hidden(x)
        out = F.relu(out)
        out = self.out(out)
        out = F.softmax(out, dim=1)
        # The forward method produces the output
        return out
X = torch.randn(1, 20)
MLP_NET = MLP()
out = MLP_NET(X)
out
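Item 3 of the list of required functionality (computing gradients) does not have to be written by hand: since forward is built from differentiable PyTorch operations, autograd derives the backward pass. A small sketch, assuming the MLP defined above (redefined compactly so the cell runs on its own):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.out = nn.Linear(256, 10)

    def forward(self, x):
        return F.softmax(self.out(F.relu(self.hidden(x))), dim=1)

net = MLP()
x = torch.randn(1, 20, requires_grad=True)  # also track gradients w.r.t. the input
y = net(x)
# Backpropagate from one output entry. (y.sum() would always equal 1 after softmax,
# so its gradient would be numerically zero.) No backward() method was written by hand.
y[0, 0].backward()

print(x.grad.shape)                  # gradient w.r.t. the input
print(net.hidden.weight.grad.shape)  # gradient w.r.t. a parameter
```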

## 5.1.2 The Sequential Block

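This section shows how to build a Sequential class.

A simplified Sequential class only needs two key methods:
1. A method that appends blocks one by one to a list
2. A forward propagation method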

In [None]:
import torch
import torch.nn as nn

class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        # enumerate() pairs each item of an iterable with its index
        for idx, module in enumerate(args):
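            # _modules is an ordered dictionary inherited from the parent class nn.Module. It makes the blocks
            # run in the order they were added, and it is where PyTorch looks up submodules (and their
            # parameters) during initialization, printing, saving, and so on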
            self._modules[str(idx)] = module

    def forward(self, x):
        for block in self._modules.values():
            x = block(x)
        return x
MySequential_NET = MySequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10), nn.Softmax(dim=1))
print(MySequential_NET)
x = torch.randn(1, 20)
out = MySequential_NET(x)
out
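Writing to self._modules directly works, but nn.Module also offers the public add_module(name, module) method, which stores the child in the same dictionary. A sketch of that variant; the class name MySequentialV2 is hypothetical:

```python
import torch
import torch.nn as nn

class MySequentialV2(nn.Module):
    """Hypothetical variant of MySequential that registers blocks via add_module()."""
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            # add_module() registers the child in self._modules under the given name
            self.add_module(str(idx), module)

    def forward(self, x):
        # Run the registered blocks in insertion order
        for block in self._modules.values():
            x = block(x)
        return x

net = MySequentialV2(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10), nn.Softmax(dim=1))
out = net(torch.randn(2, 20))
print(out.shape)
print(len(list(net.parameters())))  # weight and bias of both Linear layers
```

Because the children are registered, parameters(), state_dict() and print() all see them, exactly as with the _modules version.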

## 5.1.3 Executing Code in Forward Propagation

We can put arbitrary code into the forward method, whether Python control flow or arbitrary mathematical operations.


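**Note: to apply a nonlinearity inside forward, either call the functions in torch.nn.functional (e.g. F.relu(x)) or call an nn module instance created beforehand (e.g. self.act = nn.ReLU() in \_\_init\_\_, then self.act(x)). Calling a torch.nn class directly on a tensor, as in nn.ReLU(x), constructs a new ReLU module instead of applying ReLU, and the next layer then fails with "TypeError: linear(): argument 'input' (position 1) must be Tensor, not ReLU". The classes in torch.nn are meant to be instantiated as layers inside blocks.**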
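To make the note concrete, this small sketch compares the two working forms with the mistaken call; x is just a random example tensor:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2, 5)

# Functional form: applies ReLU directly to the tensor
a = F.relu(x)

# Module form: instantiate the layer first, then call it on the tensor
relu = nn.ReLU()
b = relu(x)

# The mistake behind the TypeError: this constructs a ReLU *module*
# (x ends up in the `inplace` argument) instead of applying ReLU to x
wrong = nn.ReLU(x)

print(torch.equal(a, b))                                           # both forms agree
print(isinstance(wrong, nn.Module), isinstance(wrong, torch.Tensor))  # a module, not a tensor
```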

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
class FixeHiddenMLP(nn.Module):  
    def __init__(self):
        super().__init__()
        self.rand_c = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)
        
    def forward(self, x):
        x = self.linear(x)
        # A constant (untrained) tensor takes part in the computation
        x = F.relu(torch.mm(x, self.rand_c) + 1)
        # Reuse the fully connected layer; this amounts to two fully connected layers sharing parameters
        x = self.linear(x)
        # Control flow inside forward propagation; this is for demonstration only and is unlikely in real tasks
        while x.abs().sum() > 1:
            x /= 2
        return x.sum()
        
x = torch.randn((1, 20))
FixeHiddenMLP_Net = FixeHiddenMLP()
out = FixeHiddenMLP_Net(x)
out

Blocks can be nested and mixed freely with nn.Sequential()

In [None]:
import torch
import torch.nn as nn
class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(20, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU()
        )
        self.linear = nn.Linear(32, 16)

    def forward(self, x):
        x = self.net(x)
        x = self.linear(x)
        return x
x = torch.rand((1, 20))
chimera = nn.Sequential(NestMLP(), nn.Linear(16, 20), FixeHiddenMLP())
print(chimera)
chimera(x)

## Summary

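+ A block can consist of many layers, and a block can consist of many blocks
+ Sequential connections between layers and blocks, or between blocks, are handled by the Sequential() class
+ Arbitrary control-flow code can be added to forward propagation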

## Exercises

1. What problems arise if MySequential is changed to store its blocks in a Python list?

In [None]:
import torch
import torch.nn as nn
class ListSequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        self.module_list = list(args)
    def forward(self, x):
        for block in self.module_list:
            x = block(x)
        return x

In [None]:
class ModuleSequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            self._modules[str(idx)] = module
    def forward(self, x):
        for block in self._modules.values():
            x = block(x)
        return x

In [None]:
# Blocks stored in a Python list
x = torch.rand((1, 20))
ListSequential_NET = ListSequential(nn.Linear(20, 156), nn.ReLU(), nn.Linear(156, 10))
print("ListSequential_NET:", ListSequential_NET)
ListSequential_NET(x)

In [None]:
# Blocks stored in _modules
x = torch.rand((1, 20))
ModuleSequential_NET = ModuleSequential(nn.Linear(20, 156), nn.ReLU(), nn.Linear(156, 10))
print("ModuleSequential_NET:", ModuleSequential_NET)
ModuleSequential_NET(x)

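**The comparison shows that storing the blocks in a Python list still lets the forward pass run, but printing the network no longer shows its structure. Unlike blocks stored in the default location (_modules), blocks kept in an ordinary list attribute are not "registered" with nn.Module, so their parameters are also missing from parameters() and state_dict(): an optimizer would not update them, .to(device) would not move them, and saving the model would not save them.**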

In [250]:
from torchsummary import summary
try:
    summary(ListSequential_NET, (1, 20), device="cpu")
except Exception as exc:
    print("Error:", exc)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
Error: 'int' object has no attribute 'numpy'


**Testing also shows that summary() cannot display the structure of a network whose blocks are stored in a custom attribute either.**
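If a list-like container is wanted, PyTorch's nn.ModuleList behaves like a Python list but registers every element. A sketch comparing it with the plain-list version; the class names and the make_layers helper are illustrative:

```python
import torch
import torch.nn as nn

class PlainListSequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        self.module_list = list(args)  # plain list: nothing is registered

    def forward(self, x):
        for block in self.module_list:
            x = block(x)
        return x

class ModuleListSequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        self.module_list = nn.ModuleList(args)  # registers every block

    def forward(self, x):
        for block in self.module_list:
            x = block(x)
        return x

def make_layers():
    return nn.Linear(20, 156), nn.ReLU(), nn.Linear(156, 10)

plain = PlainListSequential(*make_layers())
fixed = ModuleListSequential(*make_layers())

x = torch.rand(1, 20)
print(plain(x).shape, fixed(x).shape)  # both forward passes work
print(len(list(plain.parameters())), len(list(fixed.parameters())))
print(len(plain.state_dict()), len(fixed.state_dict()))
```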

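2. Implement a block that takes two blocks as arguments, say net1 and net2, and returns the concatenated outputs of both networks in forward propagation. This is also called a parallel block.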

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
class Block1(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Linear(20, 256),
            nn.ReLU(),
            nn.Linear(256, 128)
        )

    def forward(self, x):
        x = self.block1(x)
        x = F.relu(x)
        return x

class Block2(nn.Module):
    def __init__(self):
        super().__init__()
        self.block2 = nn.Sequential(
            nn.Linear(128, 256),
            nn.ReLU(),
            nn.Linear(256, 10),
            nn.ReLU()
        )
    def forward(self,x):
        x = self.block2(x)
        return x

class Block(nn.Module):
    def __init__(self, block1:nn.Module, block2:nn.Module):
        super().__init__()
        self.block = nn.Sequential(
            block1,
            block2
        )
    def forward(self, x):
        x = self.block(x)
        return x
x = torch.rand((1, 20))
Block_NET = Block(Block1(), Block2())
print("Block_NET:", Block_NET)
Block_NET(x)
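Note that Block above runs Block1 and Block2 one after the other (in series). In the parallel block the exercise describes, both sub-networks receive the same input and their outputs are concatenated. A minimal sketch; ParallelBlock and the layer sizes are illustrative choices, and concatenation is along the feature dimension (dim=1):

```python
import torch
import torch.nn as nn

class ParallelBlock(nn.Module):
    def __init__(self, net1: nn.Module, net2: nn.Module):
        super().__init__()
        # Assigning modules as attributes registers their parameters
        self.net1 = net1
        self.net2 = net2

    def forward(self, x):
        # Both branches see the same input; outputs are joined along the feature axis
        return torch.cat((self.net1(x), self.net2(x)), dim=1)

net1 = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 8))
net2 = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 4))
parallel = ParallelBlock(net1, net2)

x = torch.rand(3, 20)
out = parallel(x)
print(out.shape)  # 8 features from net1 + 4 from net2
```

Both branches must produce the same batch size; only the concatenation dimension may differ.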

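3. Suppose we want to connect multiple instances of the same network. Implement a function that generates multiple instances of the same block and builds a larger network from them.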

In [None]:
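# Not attempted: the dimensions do not match (chaining copies of a block only works when its input and output sizes are equal)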
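Chaining copies becomes possible once each copy maps d features to d features. A sketch under that assumption; make_block and stack_blocks are hypothetical helper names:

```python
import torch
import torch.nn as nn

def make_block(d: int) -> nn.Module:
    # Input and output width are both d, so copies can be chained
    return nn.Sequential(nn.Linear(d, d), nn.ReLU())

def stack_blocks(d: int, n: int) -> nn.Sequential:
    # Each call to make_block creates a fresh instance with its own parameters
    return nn.Sequential(*[make_block(d) for _ in range(n)])

# Adapter layers handle the mismatch at the two ends
big_net = nn.Sequential(nn.Linear(20, 16), stack_blocks(16, 4), nn.Linear(16, 10))
x = torch.rand(2, 20)
print(big_net(x).shape)
```

Writing `[make_block(d)] * n` instead would put the *same* instance into the list n times, which ties all copies' parameters together.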

# 5.2 Parameter Management

This section covers:
+ How to access parameters
+ How to initialize parameters
+ How to share parameters across different model components

## 5.2.1 Parameter Access

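**Use state_dict() to obtain the model's parameter dictionary: each key is a parameter name built from the layer's name (e.g. 0.weight), and each value is the corresponding tensor**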

**Every parameter in the network has a unique name**

In [None]:
import torch
import torch.nn as nn

param_net = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 1)
)

In [None]:
# Get all parameters of the model
print("All Param:", param_net.state_dict())

# Get the parameters of a single layer
print("layer Param:", param_net[2].state_dict())

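+ To do anything with a parameter, we first need to access its underlying numerical values
+ Parameters are composite objects that contain values, gradients, and additional information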

In [None]:
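# Get the bias parameter of the first layer
print("bias:", param_net[0].bias)
# Get the underlying values of the bias
print("bias value:", param_net[0].bias.data)
# Get the gradient of the bias; no backpropagation has been run yet, so it is still None
print("bias gradient:", param_net[0].bias.grad)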
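A quick sketch showing how .grad changes once a backward pass has run; the input and the loss (a plain sum of outputs) are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn

param_net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
before = param_net[0].bias.grad      # None: no backward pass has run yet

x = torch.rand(2, 4)
loss = param_net(x).sum()
loss.backward()                      # autograd fills in .grad for every parameter

after = param_net[0].bias.grad
print(before)
print(after.shape)                   # same shape as the bias itself
```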

**named_parameters() yields the name of each parameter together with the parameter itself (i.e., the weights)**

**Iterate over it to get the parameters of every block**

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
class ParamBlock1(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Linear(20, 256),
            nn.ReLU(),
            nn.Linear(256, 128)
        )

    def forward(self, x):
        x = self.block1(x)
        x = F.relu(x)
        return x

class ParamBlock2(nn.Module):
    def __init__(self):
        super().__init__()
        self.block2 = nn.Sequential(
            nn.Linear(128, 256),
            nn.ReLU(),
            nn.Linear(256, 10),
            nn.ReLU()
        )
    def forward(self,x):
        x = self.block2(x)
        return x

class ParamBlock(nn.Module):
    def __init__(self, block1:nn.Module, block2:nn.Module):
        super().__init__()
        self.block = nn.Sequential(
            block1,
            block2
        )
    def forward(self, x):
        x = self.block(x)
        return x
ParamBlock_NET = ParamBlock(ParamBlock1(), ParamBlock2())
ParamBlock_NET

In [None]:
# Iterate to collect each parameter's name and shape
parameter = [(name, param.shape) for name, param in ParamBlock_NET.named_parameters()]

# Use state_dict() to look up a weight tensor by its parameter name
param_weight = ParamBlock_NET.state_dict()["block.0.block1.0.weight"]

parameter, param_weight

In [None]:
# Access weights directly by indexing into the nested blocks
import torch.nn as nn
def block1():
    return nn.Sequential(
        nn.Linear(4, 8),
        nn.Linear(8,4)
    )

def block2():
    net = nn.Sequential()
    for i in range(4):
        # add_module() adds a child block to the network under the given name
        net.add_module(f"block{i}", block1())
    return net

rgnet = nn.Sequential(block2(), nn.Linear(4, 1))
print("rgnet network struct:", rgnet)
rgnet[0][2][1].weight, rgnet[0][2][1].bias.data
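The index path rgnet[0][2][1] and the dotted names from named_parameters() describe the same location: index 0 of rgnet, then the child registered as "block2", then its second Linear layer. A sketch confirming this with the same rgnet layout:

```python
import torch.nn as nn

def block1():
    return nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 4))

def block2():
    net = nn.Sequential()
    for i in range(4):
        net.add_module(f"block{i}", block1())
    return net

rgnet = nn.Sequential(block2(), nn.Linear(4, 1))

# named_parameters() yields the dotted path of every parameter
params = dict(rgnet.named_parameters())
names = [name for name in params if name.startswith("0.block2.")]
print(names)
# "0.block2.1.weight" names the very same tensor as rgnet[0][2][1].weight
print(params["0.block2.1.weight"] is rgnet[0][2][1].weight)
```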

## 5.2.2 Parameter Initialization

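This section covers:
+ Initializing a network with the initialization methods built into PyTorch
+ Writing custom initialization methods

**The nn.init module provides many initialization methods**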

### Built-in Initialization

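Usage:
1. Define an initialization function
    1. Choose which layers to initialize
    2. Specify how to initialize them
2. Pass the initialization function to the network's apply() method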
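The two steps can be sketched with a different built-in rule, Xavier-uniform weights and zero biases; the function name init_xavier is illustrative:

```python
import math
import torch
import torch.nn as nn

# Step 1: the init function decides which layers to touch and how
def init_xavier(module):
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
# Step 2: apply() calls init_xavier on every submodule (and on net itself)
net.apply(init_xavier)

# Xavier-uniform samples from U(-a, a) with a = sqrt(6 / (fan_in + fan_out))
bound = math.sqrt(6 / (4 + 8))
print(net[0].weight.abs().max().item() <= bound)
print(net[0].bias)
```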

In [245]:
import torch.nn as nn

class InitNet(nn.Module):
    def __init__(self, init=False):
        super().__init__()
        self.net = nn.Sequential()
        for i in range(3):
            self.net.add_module(f"block{i}", self.block())
        if init is True:
            self.net.apply(self.init_normal)
            
    def block(self):
        return nn.Sequential(
            nn.Linear(4, 8),
            nn.Linear(8 ,4)
        )
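    # apply() walks the module tree recursively and calls init_normal on every submodule.
    # The recursion depth equals the nesting depth of submodules, not the number of layers,
    # so Python's recursion limit (1000 by default in CPython; some environments raise it)
    # only matters for extremely deeply nested modules. It was not a problem in testing.
    def init_normal(self, module):
        # Initialize every Linear layer: weights to 0, biases drawn from N(1, 2^2)
        if isinstance(module, nn.Linear):
            nn.init.zeros_(module.weight)
            nn.init.normal_(module.bias, mean=1, std=2)

    def forward(self, x):
        return self.net(x)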
initNet = InitNet(init=True)
print(initNet.state_dict().keys())
print("init weight:", initNet.state_dict()["net.block0.0.weight"])
NoinitNet = InitNet()
print("No init weight:", NoinitNet.state_dict()["net.block0.0.weight"])

odict_keys(['net.block0.0.weight', 'net.block0.0.bias', 'net.block0.1.weight', 'net.block0.1.bias', 'net.block1.0.weight', 'net.block1.0.bias', 'net.block1.1.weight', 'net.block1.1.bias', 'net.block2.0.weight', 'net.block2.0.bias', 'net.block2.1.weight', 'net.block2.1.bias'])
init weight: tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
No init weight: tensor([[-0.4951,  0.0704, -0.3250, -0.4284],
        [-0.3080,  0.2684,  0.1131, -0.0220],
        [ 0.3174, -0.3203, -0.2856,  0.1511],
        [ 0.1238, -0.1565, -0.0840,  0.0787],
        [ 0.4437,  0.3488,  0.1358, -0.3189],
        [ 0.2287, -0.3093,  0.3192,  0.4809],
        [-0.0918,  0.1274, -0.4113,  0.1191],
        [ 0.0545, -0.2957, -0.2685,  0.0310]])


In [246]:
import torch.nn as nn
class InitNet(nn.Module):
    def __init__(self, init=False):
        super().__init__()
        self.net = nn.Sequential()
        for i in range(3):
            self.net.add_module(f"block{i}", self.block())
        if init is True:
            # Approach from the official examples: iterate over all submodules and initialize them
            for m in self.modules():
                if isinstance(m, nn.Linear):
                    nn.init.zeros_(m.weight.data)
                    nn.init.normal_(m.bias.data, mean=1, std=2)
            
    def block(self):
        return nn.Sequential(
            nn.Linear(4, 8),
            nn.Linear(8 ,4)
        )

    def forward(self, x):
        return self.net(x)
initNet = InitNet(init=True)
print("init weight:", initNet.state_dict()["net.block0.0.weight"])

init weight: tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])


### Custom Initialization

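Custom initialization follows exactly the same steps as built-in initialization; the only difference is that you write the initialization rule yourself.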
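As an illustration of a self-written rule (the rule itself is arbitrary): draw each weight from U(-10, 10), then zero out every weight whose absolute value is below 5. The function, named my_init here for illustration, plugs into apply() exactly like a built-in one:

```python
import torch
import torch.nn as nn

def my_init(module):
    if isinstance(module, nn.Linear):
        # Modify parameters in place without recording the operations in autograd
        with torch.no_grad():
            module.weight.uniform_(-10, 10)
            # Keep only weights with |w| >= 5; set the rest to zero
            module.weight.masked_fill_(module.weight.abs() < 5, 0.0)

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
net.apply(my_init)
print(net[0].weight)
```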

## 5.2.3 Tied Parameters

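Purpose: share parameters among several layers. During backpropagation, the gradients from every position that uses the shared layer are summed into the same gradient tensor.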

Implementation:
1. Create a shared layer outside the network
2. Add that same layer object to the network at several positions

In [258]:
import torch
import torch.nn as nn
share_layer = nn.Linear(8, 8)

param_bind = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    share_layer,
    nn.ReLU(),
    share_layer,
    nn.ReLU(),
    nn.Linear(8, 1)
)

x = torch.rand((1, 4))
print(param_bind(x))
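# Check whether the parameter values are identical
print(param_bind[2].weight.data[0] == param_bind[4].weight.data[0])
# Check that both positions refer to the very same Parameter object (not merely equal values)
print(param_bind[2].weight is param_bind[4].weight)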

tensor([[-0.1769]], grad_fn=<AddmmBackward0>)
tensor([True, True, True, True, True, True, True, True])
True
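The claim that gradients add up can be verified by comparing the tied network with an untied twin that starts from identical values. A sketch; the seed, input size and sum-of-outputs loss are arbitrary choices:

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
shared = nn.Linear(8, 8)
tied = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), shared, nn.ReLU(), shared, nn.ReLU(), nn.Linear(8, 1))

# Untied twin: same starting values, but positions 2 and 4 hold independent copies
untied = copy.deepcopy(tied)          # deepcopy preserves the sharing...
untied[4] = copy.deepcopy(untied[2])  # ...so break it explicitly

x = torch.rand(3, 4)
tied(x).sum().backward()
untied(x).sum().backward()

# The shared layer's gradient equals the sum of the gradients of its two uses
print(torch.allclose(tied[2].weight.grad, untied[2].weight.grad + untied[4].weight.grad))
# parameters() lists a shared tensor only once
print(len(list(tied.parameters())), len(list(untied.parameters())))
```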


## Exercises

1. Using the FancyMLP model defined in Section 5.1, access the parameters of the individual layers.

In [263]:
import torch.nn as nn
class InitNet(nn.Module):
    def __init__(self, init=False):
        super().__init__()
        self.net = nn.Sequential()
        for i in range(3):
            self.net.add_module(f"block{i}", self.block())
        if init is True:
            # Approach from the official examples: iterate over all submodules and initialize them
            for m in self.modules():
                if isinstance(m, nn.Linear):
                    nn.init.normal_(m.weight.data, mean=1, std=2)
                    nn.init.zeros_(m.bias.data)
            
    def block(self):
        return nn.Sequential(
            nn.Linear(4, 8),
            nn.Linear(8 ,4)
        )

    def forward(self, x):
        return self.net(x)

In [265]:
initNet = InitNet(init=True)
for name, param in initNet.named_parameters():
    print(name, param.data)

net.block0.0.weight tensor([[ 2.1666,  4.0705,  0.6399,  0.3653],
        [ 3.2700,  1.5498, -0.5912,  3.2030],
        [ 1.6944,  1.5039,  1.0345,  3.2819],
        [ 1.2198,  0.8282, -3.3731, -1.0712],
        [ 0.2552, -3.1542, -1.5566,  0.5375],
        [-0.9542,  0.1490, -1.4006, -1.5407],
        [-1.3326,  3.3476, -3.4974, -0.2168],
        [ 3.8500,  2.9250,  4.0575,  1.1160]])
net.block0.0.bias tensor([0., 0., 0., 0., 0., 0., 0., 0.])
net.block0.1.weight tensor([[ 0.9468, -1.1431, -0.3274, -0.0768,  1.9253, -0.1021,  3.0728,  1.1536],
        [ 1.5374,  2.3898, -0.9710, -1.3642, -0.1980,  2.2700,  2.9126,  0.0541],
        [ 4.2933,  1.1782, -1.9459,  2.9026,  1.7900, -0.5310,  3.5670, -0.2833],
        [ 1.0735,  2.5624, -2.5803,  0.9703,  3.4592, -1.4652, -1.0276, -0.4891]])
net.block0.1.bias tensor([0., 0., 0., 0.])
net.block1.0.weight tensor([[ 1.9721,  1.0801,  2.3878,  2.0278],
        [ 1.4146, -0.6792,  0.9853, -1.9449],
        [-3.4951,  0.9761, -1.3226,  0.9195],
  

2. Consult the documentation of the initialization module to learn about the different initialization methods.

+ calculate_gain()
+ uniform_()
+ normal_()
+ constant_()
+ ones_()
+ zeros_()
+ eye_()
+ dirac_()
+ xavier_uniform_()
+ xavier_normal_()
+ kaiming_uniform_()
+ kaiming_normal_()
+ trunc_normal_()
+ orthogonal_()
+ sparse_()
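A few of the functions listed above, tried on small layers; the layer sizes and constants are arbitrary:

```python
import math
import torch
import torch.nn as nn

layer = nn.Linear(4, 4)

# Recommended gain for a given nonlinearity (sqrt(2) for ReLU)
gain = nn.init.calculate_gain("relu")

# eye_ fills a 2-D tensor with the identity matrix; constant_ fills with one value
nn.init.eye_(layer.weight)
nn.init.constant_(layer.bias, 0.5)

# Kaiming (He) normal init, suited to layers followed by ReLU:
# std = gain / sqrt(fan_in) = sqrt(2) / sqrt(256)
hidden = nn.Linear(256, 128)
nn.init.kaiming_normal_(hidden.weight, mode="fan_in", nonlinearity="relu")

print(gain, math.sqrt(2))
print(layer.weight)
print(hidden.weight.std().item(), math.sqrt(2) / 16)
```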

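3. Build a multilayer perceptron that contains a layer with shared parameters and train it. During training, observe the parameters and gradients of each layer.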

4. Why is sharing parameters a good idea?
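For exercise 3, a minimal training sketch, assuming synthetic regression data (the target is the sum of the four inputs), plain SGD and MSE loss; all of these are illustrative choices, not part of the exercise:

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), shared, nn.ReLU(), shared, nn.ReLU(), nn.Linear(8, 1))

# Synthetic data for illustration only
X = torch.randn(64, 4)
y = X.sum(dim=1, keepdim=True)

optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
losses = []

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(net(X), y)
    loss.backward()
    losses.append(loss.item())
    # The shared layer owns a single weight tensor and a single (accumulated) gradient
    print(f"epoch {epoch}: loss={loss.item():.4f}, "
          f"shared grad norm={net[2].weight.grad.norm().item():.4f}, "
          f"first-layer grad norm={net[0].weight.grad.norm().item():.4f}")
    optimizer.step()

# After training, positions 2 and 4 still refer to the same parameter
print(net[2].weight is net[4].weight)
```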