Home
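Auto Model Compressor (AMC) is a PyTorch-based framework for reducing the computational cost of deep neural networks, measured in floating-point fused multiply-adds (FMAs). It is currently open-sourced internally only.
AMC consists of four main modules: Mathematical Compressors, Trainers, Hyper-Parameter Tuners, and Utils, as shown in the figure below: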

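- Mathematical Compressors: methods that use mathematical techniques to compress the convolutional layers of a neural network with low loss, that is, to reduce their multiply-add count. The framework currently supports two families of methods: channel pruning and low-rank decomposition. Channel pruning uses statistical methods to analyze the sparsity of a convolutional layer and keeps only the feature channels that most affect the layer's output, which reduces the layer's multiply-add count. Low-rank decomposition exploits the fact that a convolution is essentially a matrix operation: using matrix factorization, it splits one convolutional layer into two consecutive convolutional layers with fewer multiply-adds in total. Each method has its strengths. Low-rank decomposition can compress highly sparse convolutional layers with relatively high accuracy, but it increases the depth of the network, which is unfriendly to GPU devices at deployment time. Channel pruning reduces computation without changing the original network structure, but its accuracy is lower than that of low-rank decomposition. Experiments also show that networks for different applications differ in their sensitivity to the two methods, so the method should be chosen case by case.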
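- Trainers: a general implementation for retraining (fine-tuning) the compressed network. The trainers also implement many loss functions and quantization training tools that benefit training results.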
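- Hyper-Parameter Tuners: "agents" that tune parameters such as the compression ratio used for network compression and the learning rate used for retraining. They can tune these parameters automatically with reinforcement learning (Deep Deterministic Policy Gradient, DDPG) or Genetic Algorithms (GA), sparing users from frequent manual tuning.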
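- Utils: implementations for conveniently obtaining a PyTorch network's graph information, computational cost, and other useful information; implementations for parsing a PyTorch model and generating the binary file required by the deployment framework Venus; and some base class definitions.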
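To make the trade-off between the two compressor families concrete, the multiply-add count of a convolutional layer can be worked out by hand. The sketch below is not part of AMC: the helper `conv_fmas` and the layer sizes are illustrative assumptions, and the low-rank case assumes the common form of a k x k convolution with fewer output channels followed by a 1 x 1 convolution.

```python
def conv_fmas(c_in, c_out, kernel, out_h, out_w):
    """Multiply-adds of a dense 2-D convolution (no groups, bias ignored)."""
    k_h, k_w = kernel
    return c_out * out_h * out_w * c_in * k_h * k_w


# Illustrative layer: 3x3 convolution, 64 -> 128 channels, 32x32 output map.
original = conv_fmas(64, 128, (3, 3), 32, 32)

# Channel pruning: keep 32 of the 64 input channels; the layer stays one convolution.
pruned = conv_fmas(32, 128, (3, 3), 32, 32)

# Low-rank decomposition with rank 32: a 3x3 convolution (64 -> 32) followed by
# a 1x1 convolution (32 -> 128), so one layer becomes two.
rank = 32
decomposed = conv_fmas(64, rank, (3, 3), 32, 32) + conv_fmas(rank, 128, (1, 1), 32, 32)

print(original, pruned, decomposed)
```

In this example pruning halves the count while keeping a single layer, whereas the decomposition reaches roughly 31% of the original count at the price of one extra layer.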
Given a trained original model, first use the tools provided by Utils to parse the model into a specific class: class ReconstructedNetwork(nn.Module).
- Step 1: Use a mathematical compressor to compress the Origin Network, producing a Compressed Network.
- Step 2: The compressed network provides a very good initialization. Use a trainer to retrain it, producing a Retrained Network.
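Taking the retrained network as the new original network, Step 1 and Step 2 can be repeated alternately, finally yielding a network that best balances computational cost and accuracy. The tools provided by Utils then convert this network directly into a .vnmodel binary file for deployment on the three target platforms.
The detailed workflow is shown in the figure below: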
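The alternation of Step 1 and Step 2 can be sketched as a short loop. This is only an illustration under stated assumptions: `compress_and_retrain`, its callable arguments, and the stopping rule (stop when accuracy falls below a floor, or after a fixed number of rounds) are hypothetical and not taken from AMC; the toy "network" is a plain dict so the example runs without PyTorch.

```python
def compress_and_retrain(network, compress, retrain, evaluate, cost,
                         min_accuracy, max_rounds=5):
    """Alternate Step 1 (compress) and Step 2 (retrain).

    Stops when a round drops accuracy below min_accuracy or max_rounds is reached.
    Returns the last accepted network and the (cost, accuracy) of each accepted round.
    """
    best = network
    history = []
    for _ in range(max_rounds):
        candidate = retrain(compress(best))   # Step 1 followed by Step 2
        accuracy = evaluate(candidate)
        if accuracy < min_accuracy:           # reject this round and stop
            break
        best = candidate                      # retrained network becomes the new origin
        history.append((cost(best), accuracy))
    return best, history


# Toy stand-ins (hypothetical): a "network" is a dict with a cost and an accuracy in percent.
def toy_compress(net):
    return {"fmas": net["fmas"] // 2, "acc": net["acc"] - 4}


def toy_retrain(net):
    return {"fmas": net["fmas"], "acc": net["acc"] + 2}


final, history = compress_and_retrain(
    {"fmas": 1000, "acc": 95},
    compress=toy_compress, retrain=toy_retrain,
    evaluate=lambda net: net["acc"], cost=lambda net: net["fmas"],
    min_accuracy=90,
)
print(final, history)
```

With these toy functions each round halves the cost and loses two points of accuracy, so two rounds are accepted before the third falls below the floor.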

Code:
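As shown in the figure above, the compressor is initialized with the following inputs:
- origin_net (torch.nn.Module): the original network.
- trainloader (torch.utils.data.DataLoader): loads the training data for the network.
- valloader (torch.utils.data.DataLoader): loads the test data used to evaluate the network.
- trainset_ratio: the fraction of the training data used for compression.
- compress_ratio: the fraction of computation to compress in each convolutional layer.
- sampled_pixels_per_img: the number of pixels sampled from the input data of each convolutional layer.
- checkpointfolder: the directory for output data.
- device: the device on which the original and compressed networks are placed.
- accuracy_first: whether accuracy is prioritized during compression.
- accuracy_threshold: the accuracy threshold used when accuracy is prioritized.
For low-rank decomposition, the nonlinear_case parameter selects between the linear mode and the nonlinear mode.
For channel pruning, lars_alpha_init is the initial value of the penalty factor used when solving the LASSO problem with LARS.
Calling the compress() method runs the compression directly; the compressed network (torch.nn.Module) is stored in the compressor's compressed_net attribute.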
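One way to keep the scalar arguments above together and catch obviously invalid values before a long compression run is a small settings object. The sketch below is hypothetical: `CompressorSettings` is not an AMC class, the value ranges it checks are assumptions inferred from the word "ratio", and the example values are arbitrary. Only the field names come from the description above.

```python
from dataclasses import dataclass


@dataclass
class CompressorSettings:
    """Hypothetical container for the scalar arguments described above (not an AMC class)."""
    trainset_ratio: float          # fraction of the training data used for compression
    compress_ratio: float          # per-layer fraction of computation to compress
    sampled_pixels_per_img: int    # pixels sampled from each convolutional layer's input
    checkpointfolder: str          # output directory
    device: str                    # e.g. "cpu" or "cuda:0"
    accuracy_first: bool           # whether accuracy is prioritized during compression
    accuracy_threshold: float      # threshold used when accuracy is prioritized

    def validate(self):
        # The ranges below are assumptions, not documented AMC constraints.
        if not 0.0 < self.trainset_ratio <= 1.0:
            raise ValueError("trainset_ratio must be in (0, 1]")
        if not 0.0 < self.compress_ratio < 1.0:
            raise ValueError("compress_ratio must be in (0, 1)")
        if self.sampled_pixels_per_img <= 0:
            raise ValueError("sampled_pixels_per_img must be positive")
        return self


# Arbitrary example values.
settings = CompressorSettings(
    trainset_ratio=0.1,
    compress_ratio=0.5,
    sampled_pixels_per_img=10,
    checkpointfolder="./checkpoints",
    device="cpu",
    accuracy_first=True,
    accuracy_threshold=0.9,
).validate()
print(settings)
```

The validated fields could then be passed as keyword arguments to whichever compressor is being constructed, alongside origin_net, trainloader, and valloader.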
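Users familiar with PyTorch can quickly write whatever training task they need. However, a large part of the code in every training task is repeated, because training a PyTorch network is a very fixed procedure. After compressing the Origin Network with a mathematical compressor, how to define a training task simply, quickly, and clearly, with as little code as possible, in order to retrain the Compressed Network is a question worth careful thought. AMC's Trainer is designed for this. In addition, the Trainer integrates tensorboardX, so users can observe and record how training data changes in real time.
In essence, AMC's Trainer fixes the PyTorch training procedure and lets users customize the training of a particular network by customizing certain steps of that procedure. In general, once the PyTorch network (nn.Module) has been designed, customizing the following steps covers almost every training scenario: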
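- 1) Wrap the training/test data and labels in a torch.utils.data.DataLoader.
- 2) Wrap the loss function as a criterion (torch.nn.Module).
- 3) Define how to obtain the inputs of the criterion from each batch yielded by the torch.utils.data.DataLoader.
- 4) Define the optimizer (torch.optim) and the learning rate scheduler (torch.optim.lr_scheduler).
Following these steps, the example below defines a trainer for a classification task to show how to write a custom trainer. Code: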
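import argparse
import os

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from tensorboardX import SummaryWriter

# AMC modules
from trainer.basetrainer import *
from trainer.basedistiller import *
from utils.compressmethod import showFMLAs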
# <class CIFAR10Trainer>
class CIFAR10Trainer(BaseTrainer):
    # <method: __init__>
    def __init__(self,
                 train_loader, val_loader, eval_loader,
                 network,
                 criterion, optimizer, scheduler,
                 epochs,
                 device, tbx_writer, checkpoints_folder, additional_args):
        super(CIFAR10Trainer, self).__init__(
            train_loader=train_loader,
            val_loader=val_loader,
            eval_loader=eval_loader,
            network=network,
            criterion=criterion,
            optimizer=optimizer,
            scheduler=scheduler,
            epochs=epochs,
            device=device,
            tbx_writer=tbx_writer,
            checkpoints_folder=checkpoints_folder,
            additional_args=additional_args
        )
    # <method: __init__>

    # <method: __get_val_batch_preds__>
    def __get_val_batch_preds__(self, batch_val_data, *args, **kwargs):
        """Must be overridden. Returns the network's predictions for a validation batch."""
        inputs = batch_val_data[0]
        if self._device:
            inputs = inputs.to(self._device)
        return self._network(inputs)
    # <method: __get_val_batch_preds__>

    # <method: __get_val_losses__>
    def __get_val_losses__(self, batch_preds, batch_val_data, *args, **kwargs):
        """Must be overridden. Returns the validation loss computed with the criterion."""
        batch_val_labels = batch_val_data[1]
        if self._device:
            batch_val_labels = batch_val_labels.to(self._device)
        loss = self._criterion(batch_preds, batch_val_labels)
        return {'total_loss': loss}
    # <method: __get_val_losses__>

    # <method: __get_train_batch_preds__>
    def __get_train_batch_preds__(self, batch_train_data, *args, **kwargs):
        """Must be overridden. Returns the network's predictions for a training batch."""
        inputs = batch_train_data[0]
        if self._device:
            inputs = inputs.to(self._device)
        return self._network(inputs)
    # <method: __get_train_batch_preds__>

    # <method: __get_train_losses__>
    def __get_train_losses__(self, batch_preds, batch_train_data, *args, **kwargs):
        """Must be overridden. Returns the training loss computed with the criterion."""
        batch_train_labels = batch_train_data[1]
        if self._device:
            batch_train_labels = batch_train_labels.to(self._device)
        loss = self._criterion(batch_preds, batch_train_labels)
        return {'total_loss': loss}
    # <method: __get_train_losses__>

    # <method: __if_stop_trainning__>
    def __if_stop_trainning__(self, *args, **kwargs):
        """Stop once the learning rate has decayed below kwargs['stop_lr']."""
        # Read the learning rate of the last parameter group.
        lr = self._optimizer.param_groups[-1]["lr"]
        return lr < kwargs['stop_lr']
    # <method: __if_stop_trainning__>
# <class CIFAR10Trainer>
if __name__ == "__main__":
    # set argument parser
    parser = argparse.ArgumentParser(description='Arguments of PyTorch Model Auto Compression in CIFAR-10 Project')
    parser.add_argument('--run_name', type=str, default='run_name')
    parser.add_argument('--data_path', type=str, default='./path_to_cifar10_data')
    parser.add_argument('--learning_rate', type=float, default=1e-2)
    parser.add_argument('--momentum', type=float, default=0.9)
    parser.add_argument('--weight_decay', type=float, default=5e-4)
    parser.add_argument('--patience', type=int, default=15)  # ReduceLROnPlateau counts epochs
    parser.add_argument('--batch_size', type=int, default=128)
    parser.add_argument('--num_workers', type=int, default=8)
    parser.add_argument('--checkpoints_folder', type=str, default="./path_to_checkpoints_folder")
    parser.add_argument('--gpu', type=int, default=0)
    parser.add_argument('--epochs', type=int, default=1000)
    args = parser.parse_args()

    # set torch.utils.data.DataLoader of Train and Val ...
    transform_train = transforms.Compose([
        transforms.RandomRotation(degrees=5),
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.RandomVerticalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    transform_test = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    trainset = torchvision.datasets.CIFAR10(root=args.data_path, train=True, download=False, transform=transform_train)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=args.batch_size, shuffle=True, num_workers=args.num_workers)
    valset = torchvision.datasets.CIFAR10(root=args.data_path, train=False, download=False, transform=transform_test)
    valloader = torch.utils.data.DataLoader(valset, batch_size=args.batch_size, shuffle=False, num_workers=args.num_workers)

    # set network (must exist before the optimizer is built from its parameters) ...
    from cifar10.models.resnet import ResNet18
    network = ResNet18()

    # set criterion ...
    criterion = nn.CrossEntropyLoss()

    # set optimizer ...
    optimizer = optim.SGD(network.parameters(), lr=args.learning_rate, momentum=args.momentum, weight_decay=args.weight_decay)

    # set learning rate scheduler ...
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, threshold=1e-5, patience=args.patience, min_lr=1e-6)

    # set tensorboardx writer ...
    tbx_writer = SummaryWriter(log_dir=os.path.join(args.checkpoints_folder, args.run_name))

    # set device ...
    device = 'cuda:{}'.format(args.gpu) if torch.cuda.is_available() else 'cpu'

    # new a cifar10 trainer
    trainer = CIFAR10Trainer(
        train_loader=trainloader,
        val_loader=valloader,
        eval_loader=None,
        network=network,
        criterion=criterion,
        optimizer=optimizer,
        scheduler=scheduler,
        epochs=args.epochs,
        device=device,
        tbx_writer=tbx_writer,
        checkpoints_folder=args.checkpoints_folder,
        additional_args=args
    )
    # train until the learning rate decays below stop_lr
    trainer.__run__(stop_lr=1e-6)
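The hooks overridden in CIFAR10Trainer are only half of the picture; the other half is the fixed loop in the base class that calls them. BaseTrainer's source is not shown here, so the following is a minimal sketch of the general pattern only, assuming nothing about AMC's actual implementation: `TinyBaseTrainer`, `LineFitTrainer`, and the `update` hook are hypothetical, and plain Python floats stand in for tensors so the example runs without PyTorch.

```python
class TinyBaseTrainer:
    """Hypothetical stand-in for a base trainer: the loop is fixed, the hooks are overridden."""

    def __init__(self, train_loader, epochs):
        self._train_loader = train_loader
        self._epochs = epochs

    # Hooks a subclass must override.
    def get_train_batch_preds(self, batch):
        raise NotImplementedError

    def get_train_losses(self, preds, batch):
        raise NotImplementedError

    def update(self, preds, batch):
        raise NotImplementedError

    def if_stop_training(self, epoch_loss):
        return False

    def run(self):
        """Fixed procedure: iterate over epochs and batches, dispatching to the hooks."""
        history = []
        for _ in range(self._epochs):
            total = 0.0
            for batch in self._train_loader:
                preds = self.get_train_batch_preds(batch)
                total += self.get_train_losses(preds, batch)["total_loss"]
                self.update(preds, batch)
            history.append(total / len(self._train_loader))
            if self.if_stop_training(history[-1]):
                break
        return history


class LineFitTrainer(TinyBaseTrainer):
    """Fits y = w * x by gradient descent on the mean squared error."""

    def __init__(self, train_loader, epochs, lr):
        super().__init__(train_loader, epochs)
        self.w = 0.0
        self._lr = lr

    def get_train_batch_preds(self, batch):
        xs, _ = batch
        return [self.w * x for x in xs]

    def get_train_losses(self, preds, batch):
        _, ys = batch
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)
        return {"total_loss": loss}

    def update(self, preds, batch):
        xs, ys = batch
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(ys)
        self.w -= self._lr * grad

    def if_stop_training(self, epoch_loss):
        return epoch_loss < 1e-8


# Two batches of (inputs, targets) sampled from y = 2x.
batches = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0])]
trainer = LineFitTrainer(batches, epochs=200, lr=0.02)
losses = trainer.run()
print(trainer.w, len(losses))
```

The subclass contains only task-specific code (how to predict, how to compute the loss, when to stop), which is the same division of labor that CIFAR10Trainer relies on.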
References:
- [1] He Y, Zhang X, Sun J. Channel pruning for accelerating very deep neural networks[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 1389-1397.
- [2] Zhang X, Zou J, He K, et al. Accelerating very deep convolutional networks for classification and detection[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 38(10): 1943-1955.
- [3] Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning[J]. arXiv preprint arXiv:1509.02971, 2015.
- [4] Man K F, Tang K S, Kwong S. Genetic algorithms: concepts and applications [in engineering design][J]. IEEE transactions on Industrial Electronics, 1996, 43(5): 519-534.
- [5] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.
- [6] Chen G, Choi W, Yu X, et al. Learning efficient object detection models with knowledge distillation[C]//Advances in Neural Information Processing Systems. 2017: 742-751.
- [7] Krizhevsky A, Hinton G. Learning multiple layers of features from tiny images[R]. Technical report, University of Toronto, 2009.