Home
Auto Model Compressor (AMC) is a PyTorch-based framework for reducing the computation amount of a neural network, measured in floating-point fused multiply-adds (FMLAs).
AMC is mainly composed of four modules: Mathematical Compressors, Trainers, Hyper-Parameter Tuners and Utils. This is shown in the image below:

- Mathematical Compressors: A collection of mathematical methods for low-loss compression (reducing the number of floating-point multiplications and additions) of the convolutional layers of a neural network. The framework so far supports two major types of methods: Channel Pruning and Lowrank Decomposition. (Note: Channel Pruning uses statistical methods to analyze the sparsity of a convolutional layer and selectively retains the feature channels that have the greatest influence on the calculation results. Fewer feature channels means less computation in the convolution. Lowrank Decomposition uses matrix decomposition methods (such as SVD/GSVD) to decompose one big convolutional layer into two serial convolutional layers. Both approaches have their merits. Lowrank Decomposition can compress the computation amount of a convolutional layer with a smaller precision loss, but it increases the depth of the neural network, which may be unfriendly to inference on GPU devices. Channel Pruning compresses the computation amount without changing the original structure of the neural network. However, at the same compression ratio, the accuracy of Channel Pruning is not as high as that of Lowrank Decomposition. In addition, experiments show that neural networks for different application types differ in their sensitivity to the two methods. Choose whichever works best for the actual situation.)
- Trainers: A general implementation for retraining (fine-tuning) the compressed neural network. (Note: This module also contains some useful loss functions for improving training or quantization training.)
- Hyper-Parameter Tuners: Contains two types of "Agent", Deep Deterministic Policy Gradient (DDPG) and Genetic Algorithms (GA), for tuning the compression ratios of a neural network.
- Utils: Contains many useful tools, such as recording the graph of a neural network and converting a network to .vnmodel for Venus inference.
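To make the trade-off between the two compressor types concrete, the following sketch counts FMLAs for a single convolutional layer before and after compression. It is only an illustration under stated assumptions: the function names are hypothetical and not part of AMC, bias terms are ignored, and the low-rank variant assumes a k x k convolution into `rank` channels followed by a 1 x 1 convolution, with the spatial output size unchanged.

```python
def conv_fmlas(h_out, w_out, c_in, c_out, k):
    """Fused multiply-adds of one k x k convolution (bias ignored)."""
    return h_out * w_out * c_in * c_out * k * k


def lowrank_fmlas(h_out, w_out, c_in, c_out, k, rank):
    """Assumed decomposition: k x k conv (c_in -> rank), then 1 x 1 conv (rank -> c_out)."""
    first = conv_fmlas(h_out, w_out, c_in, rank, k)
    second = conv_fmlas(h_out, w_out, rank, c_out, 1)
    return first + second


def pruned_fmlas(h_out, w_out, c_in_kept, c_out, k):
    """Channel pruning keeps only c_in_kept input channels of the same layer."""
    return conv_fmlas(h_out, w_out, c_in_kept, c_out, k)


# Example layer: 3 x 3 convolution, 64 -> 64 channels, 32 x 32 output.
full = conv_fmlas(32, 32, 64, 64, 3)
lowrank = lowrank_fmlas(32, 32, 64, 64, 3, rank=16)
pruned = pruned_fmlas(32, 32, 32, 64, 3)

print(full, lowrank, pruned)          # 37748736 10485760 18874368
print(round(lowrank / full, 3))       # 0.278
print(round(pruned / full, 3))        # 0.5
```

In this example the decomposed layer costs roughly 28% of the original but adds one layer to the network, while pruning half of the input channels halves the cost and keeps the layer structure.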
Given an origin model trained in PyTorch, use the tools provided by Utils to parse the model into an object of a specific class (ReconstructedNetwork).
- Step 1: Compress the "Origin Network" with a mathematical compressor to generate the "Compressed Network".
- Step 2: The "Compressed Network" provides very good initial parameters. Use the trainer to retrain (fine-tune) the "Compressed Network" and obtain a "Retrained Network".
Treat the "Retrained Network" as the new "Origin Network", then alternate Step 1 and Step 2 until you obtain an optimal network that balances computation amount and accuracy. Then use the tool in Utils to convert the optimal network to a .vnmodel file. The specific process is shown in the image below:
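The alternation of Step 1 and Step 2 can also be written as a short loop. The sketch below is not AMC code: `compress`, `retrain` and `evaluate` are hypothetical callables standing in for a mathematical compressor, a trainer and a validation run, and the stopping rule (stop once accuracy falls below a minimum) is just one assumed way to define the "balance" mentioned above. The toy networks are plain dictionaries so that the example runs on its own.

```python
def compress_iteratively(network, compress, retrain, evaluate,
                         min_accuracy, max_rounds=10):
    """Alternate compression (Step 1) and retraining (Step 2).

    Keeps the last network whose accuracy is still at least min_accuracy.
    """
    best = network
    for _ in range(max_rounds):
        candidate = retrain(compress(best))   # Step 1, then Step 2
        if evaluate(candidate) < min_accuracy:
            break                             # compressed too far; keep previous
        best = candidate                      # becomes the next "Origin Network"
    return best


# Toy stand-ins: each compression halves the cost and loses 3 accuracy points,
# each retraining recovers 2 of them.
def toy_compress(net):
    return {"fmlas": net["fmlas"] // 2, "accuracy_pct": net["accuracy_pct"] - 3}


def toy_retrain(net):
    return {"fmlas": net["fmlas"], "accuracy_pct": net["accuracy_pct"] + 2}


def toy_evaluate(net):
    return net["accuracy_pct"]


origin = {"fmlas": 1024, "accuracy_pct": 95}
final = compress_iteratively(origin, toy_compress, toy_retrain, toy_evaluate,
                             min_accuracy=92)
print(final)  # {'fmlas': 128, 'accuracy_pct': 92}
```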
Code-1:
import os

import torch
import torch.nn as nn
from cifar10.models.resnet import ResNet18
from graph import reconstructor

network = ResNet18()
# Mark the start of the capture region, run one forward pass, then mark the end.
reconstructor.insertCaptureBoundaryStart(network)
oup = network(torch.rand(1, 3, 32, 32))
reconstructor.insertCaptureBoundaryEnd()
# Retrieve the reconstructed network together with its graph.
reconstructed_network, graph = reconstructor.getReconstructedNetwork(
    ifDraw=True,
    drawPath=os.path.join('./', "origin_net")
)
As Code-1 shows, the only thing you have to do is insert a CaptureBoundary before and after the network's forward pass. reconstructed_network is an object of class ReconstructedNetwork that stores the graph information of the origin network.
Note: The following layer types (torch.nn.Module) and function operations are supported at this stage:
Native types in PyTorch:
- torch.nn.modules.activation.ReLU
- torch.nn.modules.activation.ReLU6
- torch.nn.modules.activation.Tanh
- torch.nn.modules.activation.Sigmoid
- torch.nn.modules.conv.Conv2d
- torch.nn.modules.batchnorm.BatchNorm2d
- torch.nn.modules.pooling.MaxPool2d
- torch.nn.modules.pooling.AvgPool2d
- torch.nn.modules.pooling.AdaptiveAvgPool2d
- torch.nn.modules.upsampling.Upsample
- torch.nn.modules.Linear
- torch.nn.functional.max_pool2d
- torch.nn.functional.avg_pool2d
- torch.nn.functional.upsample
- torch.nn.functional.interpolate
- torch.nn.functional.pixel_shuffle
- torch.nn.functional.relu
- torch.nn.functional.relu6
- torch.nn.functional.tanh
- torch.nn.functional.sigmoid
- torch.add
- torch.cat
- torch.Tensor.view
- torch.Tensor.reshape
- torch.Tensor.permute
- torch.Tensor.squeeze
- torch.Tensor.__add__
- torch.Tensor.__iadd__
Custom types (see graph/modules.py):
- PrunningSampler
- TensorSqueeze
- TensorTranspose
- TensorReshape
- Interpolate
- ElemAdd
- Concatenate
- PixelShuffle
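Before parsing a model, it can be handy to check whether it contains layer types outside the lists above. The following is a minimal sketch of such a check, not a tool shipped with AMC: `qualified_name` and `find_unsupported` are hypothetical helpers, and the three stand-in classes exist only so that the example runs without PyTorch.

```python
def qualified_name(obj):
    """Return '<module>.<class>' for an object, e.g. 'torch.nn.modules.conv.Conv2d'."""
    cls = type(obj)
    return f"{getattr(cls, '__module__', '<unknown>')}.{cls.__qualname__}"


def find_unsupported(layers, supported_names):
    """Return the sorted qualified type names in `layers` that are not supported."""
    found = {qualified_name(layer) for layer in layers}
    return sorted(found - set(supported_names))


# Stand-in layer classes (assumptions for illustration only).
class Conv2d:
    pass


class ReLU:
    pass


class GRU:
    pass


supported = {qualified_name(Conv2d()), qualified_name(ReLU())}
unsupported = find_unsupported([Conv2d(), ReLU(), GRU(), ReLU()], supported)
print(unsupported)  # one entry, the qualified name of GRU
```

With a real model, the `layers` argument could be the leaf modules, for example `(m for m in network.modules() if not list(m.children()))`, and `supported_names` the qualified names listed above. Function-style operations such as torch.add are not modules, so a check like this cannot see them.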
Code-2:
if compress_algorithm == 'lowrankdecomposition':
    if args.verbose:
        print("Using Low-Rank Decomposition Algorithm to compress.\n")
    ca = LowRankDecompostion(
        origin_net = net,
        trainloader = trainloader,
        valloader = valloader,
        trainset_ratio = args.compress_trainset_ratio,
        sampled_pixels_per_img = args.compress_sampled_pixels_per_img,
        compress_ratio = args.compress_ratios,
        checkpointfolder = args.checkpoint_folder,
        device = device,
        verbose = args.verbose,
        drawgraph = args.compress_draw_graph,
        # accuracy is taken into account only when a positive threshold is given
        accuracy_first = args.compress_acc_thresh > 0,
        accuracy_threshold = args.compress_acc_thresh,
        nonlinear_case = args.compress_nonlinear_case,
        args = args
    )
elif compress_algorithm == 'channelprunning':
    if args.verbose:
        print("Using Channel Pruning Algorithm to compress.\n")
    ca = ChannelPrunning(
        origin_net = net,
        trainloader = trainloader,
        valloader = valloader,
        trainset_ratio = args.compress_trainset_ratio,
        sampled_pixels_per_img = args.compress_sampled_pixels_per_img,
        compress_ratio = args.compress_ratios,
        checkpointfolder = args.checkpoint_folder,
        device = device,
        drawgraph = args.compress_draw_graph,
        verbose = args.verbose,
        lars_alpha_init = args.lars_alpha_init,
        accuracy_first = args.compress_acc_thresh > 0,
        accuracy_threshold = args.compress_acc_thresh,
        args = args
    )
else:
    # the exception must be raised, not just constructed
    raise RuntimeError("""Error, Unknown Compress Algorithm : {}""".format(compress_algorithm))
# endif
origin_net = ca.origin_net
ca.compress()
compressed_net = ca.compressed_net
As Code-2 above shows:
- origin_net is the origin network, a torch.nn.Module;
- trainloader is a torch.utils.data.DataLoader that provides the training data;
- valloader and testloader are torch.utils.data.DataLoader objects that provide the validation/test data used to validate the network;
- trainset_ratio is the proportion of the training data used during compression;
- compress_ratio holds the compression ratio of each convolutional layer in the network;
- sampled_pixels_per_img is the number of pixels sampled from the input data of each convolutional layer;
- checkpointfolder is the directory for output data;
- device specifies the device on which compression and network inference run;
- accuracy_first indicates whether accuracy is taken into account during compression;
- accuracy_threshold is the threshold applied when accuracy is taken into account during compression.
For Lowrank Decomposition, the nonlinear_case argument selects between linear mode and nonlinear mode.
For Channel Pruning, lars_alpha_init is the initial value of the LARS penalty factor.
Call the compress() method. Once compression has completed, the compressed network is stored in the compressed_net attribute of the compressor object (ca.compressed_net in Code-2).
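Users who are familiar with PyTorch can quickly write whatever training task they need. However, a large part of the code of every training task is repetitive, because training a PyTorch neural network is in fact a very fixed process. After compressing the Origin Network with a mathematical compressor, how to define a training task for retraining the Compressed Network simply, quickly and clearly, with as little code as possible, is a question worth attention. AMC's Trainer addresses this. In addition, the Trainer integrates tensorboardx, so that users can observe and record how the training data changes in real time. PS: For the usage of tensorboardx, see the tensorboardx Chinese documentation. Also, the latest version of PyTorch has tensorboard built in, and its API is basically the same as that of tensorboardx, so very little needs to change.
In essence, AMC's Trainer fixes the training procedure of a PyTorch neural network and lets you customize the training of a particular network by customizing certain steps of that fixed procedure. In general, once the PyTorch network (nn.Module) has been designed, customizing the following steps is enough to cover almost every training task:
- 1) Wrap the training/test data and labels in a torch.utils.data.DataLoader;
- 2) Wrap the loss function as a criterion (torch.nn.Module);
- 3) Define how to obtain the inputs of the criterion from each batch produced by iterating over the torch.utils.data.DataLoader;
- 4) Define the optimizer (torch.optim) and the learning rate scheduler (torch.optim.lr_scheduler).
Following these steps, the example below defines a custom trainer for a classification task: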
Code-3:
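import argparse
import os

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from tensorboardX import SummaryWriter

from trainer.basetrainer import *
from trainer.basedistiller import *
from utils.compressmethod import showFMLAs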
# <class CIFAR10Trainer>
class CIFAR10Trainer(BaseTrainer):
    # <method: __init__>
    def __init__(self,
                 train_loader, val_loader, eval_loader,
                 network,
                 criterion, optimizer, scheduler,
                 epochs,
                 device, tbx_writer, checkpoints_folder, additional_args):
        super(CIFAR10Trainer, self).__init__(
            train_loader = train_loader,
            val_loader = val_loader,
            eval_loader = eval_loader,
            network = network,
            criterion = criterion,
            optimizer = optimizer,
            scheduler = scheduler,
            epochs = epochs,
            device = device,
            tbx_writer = tbx_writer,
            checkpoints_folder = checkpoints_folder,
            additional_args = additional_args
        )
    # <method: __init__>

    # <method: __get_val_batch_preds__>
    def __get_val_batch_preds__(self, batch_val_data, *args, **kwargs):
        """This method must be overridden. It returns the network predictions for a validation batch."""
        inputs = batch_val_data[0]
        if self._device:
            inputs = inputs.to(self._device)
        return self._network(inputs)
    # <method: __get_val_batch_preds__>

    # <method: __get_val_losses__>
    def __get_val_losses__(self, batch_preds, batch_val_data, *args, **kwargs):
        """This method must be overridden. It computes the validation loss with the loss function."""
        batch_val_lables = batch_val_data[1]
        if self._device:
            batch_val_lables = batch_val_lables.to(self._device)
        # endif
        loss = self._criterion(batch_preds, batch_val_lables)
        return {'total_loss': loss}
    # <method: __get_val_losses__>

    # <method: __get_train_batch_preds__>
    def __get_train_batch_preds__(self, batch_train_data, *args, **kwargs):
        """This method must be overridden. It returns the network predictions for a training batch."""
        inputs = batch_train_data[0]
        if self._device:
            inputs = inputs.to(self._device)
        return self._network(inputs)
    # <method: __get_train_batch_preds__>

    # <method: __get_train_losses__>
    def __get_train_losses__(self, batch_preds, batch_train_data, *args, **kwargs):
        """This method must be overridden. It computes the training loss with the loss function."""
        batch_train_lables = batch_train_data[1]
        if self._device:
            batch_train_lables = batch_train_lables.to(self._device)
        # endif
        loss = self._criterion(batch_preds, batch_train_lables)
        return {'total_loss': loss}
    # <method: __get_train_losses__>

    # <method: __if_stop_trainning__>
    def __if_stop_trainning__(self, *args, **kwargs):
        # read the current learning rate (of the last parameter group)
        for param in self._optimizer.param_groups:
            lr = param["lr"]
        # endfor
        if lr < kwargs['stop_lr']:
            return True
        # endif
        return False
    # <method: __if_stop_trainning__>
# <class CIFAR10Trainer>
if __name__ == "__main__":
    # set argument parser
    parser = argparse.ArgumentParser(description='Arguments of PyTorch Model Auto Compression in CIFAR-10 Project')
    parser.add_argument('--run_name', type = str, default = 'run_name')
    parser.add_argument('--data_path', type = str, default = './path_to_cifar10_data')
    parser.add_argument('--learning_rate', type = float, default = 1e-2)
    parser.add_argument('--momentum', type = float, default = 0.9)
    parser.add_argument('--weight_decay', type = float, default = 5e-4)
    parser.add_argument('--patience', type = int, default = 15)
    parser.add_argument('--batch_size', type = int, default = 128)
    parser.add_argument('--num_workers', type = int, default = 8)
    parser.add_argument('--checkpoints_folder', type = str, default = "./path_to_checkpoints_folder")
    parser.add_argument('--gpu', type = int, default = 0)
    parser.add_argument('--epochs', type = int, default = 1000)
    args = parser.parse_args()
    # set torch.utils.data.DataLoader of Train and Val ...
    transform_train = transforms.Compose([
        transforms.RandomRotation(degrees=5),
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.RandomVerticalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    transform_test = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    trainset = torchvision.datasets.CIFAR10(root=args.data_path, train=True, download=False, transform=transform_train)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=args.batch_size, shuffle=True, num_workers=args.num_workers)
    valset = torchvision.datasets.CIFAR10(root=args.data_path, train=False, download=False, transform=transform_test)
    valloader = torch.utils.data.DataLoader(valset, batch_size=args.batch_size, shuffle=False, num_workers=args.num_workers)
    # set network (created first, because the optimizer needs its parameters) ...
    from cifar10.models.resnet import ResNet18
    network = ResNet18()
    # set criterion ...
    criterion = nn.CrossEntropyLoss()
    # set optimizer ...
    optimizer = optim.SGD(network.parameters(), lr=args.learning_rate, momentum=args.momentum, weight_decay=args.weight_decay)
    # set learning rate scheduler ...
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, threshold=1e-5, patience=args.patience, min_lr=1e-6)
    # set tensorboardx writer ...
    tbx_writer = SummaryWriter(log_dir=os.path.join(args.checkpoints_folder, args.run_name))
    # set device ...
    device = """cuda:{}""".format(args.gpu) if torch.cuda.is_available() else 'cpu'
    # new a cifar10 trainer
    trainer = CIFAR10Trainer(
        train_loader = trainloader,
        val_loader = valloader,
        eval_loader = None,
        network = network,
        criterion = criterion,
        optimizer = optimizer,
        scheduler = scheduler,
        epochs = args.epochs,
        device = device,
        tbx_writer = tbx_writer,
        checkpoints_folder = args.checkpoints_folder,
        additional_args = args
    )
    trainer.__run__(stop_lr = 1e-6)
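Code-3 above defines a class named CIFAR10Trainer (a subclass of BaseTrainer) and overrides the following functions:
- __get_val_batch_preds__: The input parameter "batch_val_data" is the batch of data produced by each iteration over valloader. This function defines how to pick the input data from batch_val_data and feed it through the network's forward pass, and returns the prediction data needed to compute the val-loss.
- __get_val_losses__: The input parameter "batch_preds" is the prediction data returned by __get_val_batch_preds__, and "batch_val_data" is the batch of data produced by each iteration over valloader. This function defines how to compute the loss from these two inputs and returns a dictionary, which must contain a loss under the key 'total_loss'.
- __get_train_batch_preds__: The input parameter "batch_train_data" is the batch of data produced by each iteration over trainloader. This function defines how to pick the input data from batch_train_data and feed it through the network's forward pass, and returns the prediction data needed to compute the train-loss.
- __get_train_losses__: The input parameter "batch_preds" is the prediction data returned by __get_train_batch_preds__, and "batch_train_data" is the batch of data produced by each iteration over trainloader. This function defines how to compute the loss from these two inputs and returns a dictionary, which must contain a loss under the key 'total_loss'; that loss is used for the backward pass of the autograd mechanism.
- __if_stop_trainning__: This function defines the condition checked at the end of each epoch to decide whether to stop the training task. Returning True terminates the training task; returning False continues training.
To train, create a CIFAR10Trainer object, pass in your own train_loader, val_loader, network, criterion, optimizer and scheduler, and then call the __run__ function to start the training task.
PS: If you need to pass in custom parameters, there are two ways:
- When creating the CIFAR10Trainer object, pass in additional_args (an object carrying the custom attributes).
- As in Code-3 above, pass the parameters to __run__ in the form "<parameter name>=<parameter value>", then read the value inside a custom function (such as __if_stop_trainning__) with kwargs["<parameter name>"].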
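The way keyword arguments travel from __run__ into the overridden functions can be illustrated with a toy trainer. The sketch below is not BaseTrainer's implementation, which is not shown in this document: `ToyTrainer` and `ScaleTrainer` are hypothetical classes, the "network" is a single fixed weight, no optimization step is performed, and halving the learning rate after each epoch merely stands in for a scheduler.

```python
class ToyTrainer:
    """Hypothetical stand-in for BaseTrainer: a fixed loop with overridable hooks."""

    def __init__(self, train_batches, epochs, lr):
        self._train_batches = train_batches
        self._epochs = epochs
        self._lr = lr
        self.history = []  # total training loss per epoch

    def __get_train_batch_preds__(self, batch_train_data, *args, **kwargs):
        raise NotImplementedError

    def __get_train_losses__(self, batch_preds, batch_train_data, *args, **kwargs):
        raise NotImplementedError

    def __if_stop_trainning__(self, *args, **kwargs):
        return False

    def __run__(self, **kwargs):
        for _ in range(self._epochs):
            epoch_loss = 0.0
            for batch in self._train_batches:
                preds = self.__get_train_batch_preds__(batch, **kwargs)
                losses = self.__get_train_losses__(preds, batch, **kwargs)
                epoch_loss += losses['total_loss']
            self.history.append(epoch_loss)
            self._lr *= 0.5  # stand-in for a learning rate scheduler step
            # custom keyword arguments of __run__ reach the hook unchanged
            if self.__if_stop_trainning__(**kwargs):
                break


class ScaleTrainer(ToyTrainer):
    """Toy task: predict y as weight * x and use the squared error as loss."""

    def __init__(self, train_batches, epochs, lr, weight):
        super().__init__(train_batches, epochs, lr)
        self._weight = weight

    def __get_train_batch_preds__(self, batch_train_data, *args, **kwargs):
        return self._weight * batch_train_data[0]

    def __get_train_losses__(self, batch_preds, batch_train_data, *args, **kwargs):
        error = batch_preds - batch_train_data[1]
        return {'total_loss': error * error}

    def __if_stop_trainning__(self, *args, **kwargs):
        return self._lr < kwargs['stop_lr']


trainer = ScaleTrainer(train_batches=[(1.5, 3.5)], epochs=100, lr=1.0, weight=2.0)
trainer.__run__(stop_lr=0.1)
print(len(trainer.history), trainer.history[0])  # 4 0.25
```

Although 100 epochs are allowed, the loop ends after 4: the learning rate goes 0.5, 0.25, 0.125, 0.0625, and the last value is below the stop_lr passed to __run__.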
6. How to export the compressed/retrained model to Venus for deployment on supported platforms (Windows/Android/iOS/Linux)?
The compressed network (an object of class ReconstructedNetwork) can be saved to a .pkl file and to a .vnmodel file, using the following methods of class ReconstructedNetwork:
- save(pkl="./path_to_file.pkl"): Saves a ReconstructedNetwork to a .pkl file.
- save_to_vnmodel(self, *inps, **wargs): Exports a ReconstructedNetwork to a .vnmodel file. A .vnmodel file can be read by the Venus netloader.