moyuda edited this page Jul 4, 2019 · 27 revisions

1. Introduction

Auto Model Compressor (AMC) is a PyTorch-based framework for reducing the computation amount of neural networks, measured in floating-point fused multiply-adds (FMLAs).

AMC consists of four main modules: Mathematical Compressors, Trainers, Hyper-Parameter Tuners, and Utils, as shown in the image below:

  • Mathematical Compressors: A collection of mathematical methods for low-loss compression of the convolutional layers of a neural network, that is, for reducing their floating-point multiply-add count. The framework currently supports two major types of methods: Channel Pruning and Low-Rank Decomposition. (Note: Channel Pruning uses statistical methods to analyze the sparsity of a convolutional layer and selectively retains the feature channels that have the greatest influence on the result; fewer feature channels means less computation in the convolution. Low-Rank Decomposition uses matrix decomposition methods (such as SVD/GSVD) to decompose one large convolutional layer into two serial convolutional layers. Both approaches have their merits. Low-Rank Decomposition compresses the computation of a convolutional layer with a smaller loss of precision, but it increases the depth of the network, which may be unfriendly to inference on GPU devices. Channel Pruning compresses the computation without changing the original structure of the network; however, at the same compression ratio, its accuracy is not as high as that of Low-Rank Decomposition. In addition, experiments show that neural networks for different applications differ in their sensitivity to the two methods, so choose the one that best fits the actual situation.)

  • Trainers: A general implementation for retraining (fine-tuning) the compressed neural network. (Note: This module also includes some useful loss functions for improving training or quantization training.)

  • Hyper-Parameter Tuners: Contains two types of "agent", Deep Deterministic Policy Gradient (DDPG) and Genetic Algorithms (GA), for tuning the compression ratios of the neural network.

  • Utils: Contains many useful tools, such as recording the graph of a neural network and converting a network to .vnmodel for Venus inference.
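
To make the trade-off between the two compressor types concrete, the sketch below counts fused multiply-adds for one convolutional layer before and after each method. It assumes a common form of each technique (low-rank: a k x k convolution with `rank` output channels followed by a 1 x 1 convolution; pruning: dropping input channels). The decompositions AMC actually uses may differ, and all function names here are illustrative only.

```python
def conv_fmlas(h_out, w_out, k, c_in, c_out):
    """Fused multiply-adds of a k x k convolution producing an h_out x w_out map."""
    return h_out * w_out * k * k * c_in * c_out


def lowrank_fmlas(h_out, w_out, k, c_in, c_out, rank):
    """Assumed decomposition: k x k conv (c_in -> rank), then 1 x 1 conv (rank -> c_out)."""
    return (conv_fmlas(h_out, w_out, k, c_in, rank)
            + conv_fmlas(h_out, w_out, 1, rank, c_out))


def pruned_fmlas(h_out, w_out, k, c_in, c_out, kept_channels):
    """Assumed pruning: only kept_channels of the c_in input channels are retained."""
    return conv_fmlas(h_out, w_out, k, kept_channels, c_out)


# A 3 x 3 convolution, 64 -> 64 channels, on a 32 x 32 feature map.
original = conv_fmlas(32, 32, 3, 64, 64)
lowrank = lowrank_fmlas(32, 32, 3, 64, 64, rank=16)
pruned = pruned_fmlas(32, 32, 3, 64, 64, kept_channels=32)
print(original, lowrank, pruned)  # 37748736 10485760 18874368
```

In this toy case the low-rank variant keeps about 28% of the original computation at the cost of one extra layer, while pruning half of the input channels keeps 50% and leaves the layer count unchanged.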

2. Process of Model Compression

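Given an original model trained in PyTorch, use the tools provided by Utils to parse the model into a specific class (ReconstructedNetwork).

  • Step 1: Compress the original network with a mathematical compressor to generate the "Compressed Network".

  • Step 2: The "Compressed Network" provides very good initial parameters. Use the trainer to retrain (fine-tune) the "Compressed Network" to obtain a "Retrained Network".

Taking the Retrained Network as the new original network, Step 1 and Step 2 can be repeated alternately. The result is an optimal network that balances computation amount and accuracy. Using the tools provided by Utils, this optimal network can be converted directly to a .vnmodel binary file for deployment on the three platforms.

The detailed process is shown in the figure below: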
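
The alternation of Step 1 and Step 2 can also be sketched in code. This is a minimal illustration, not AMC's own driver: the callables `compress`, `retrain`, `evaluate`, and `fmla_count` are hypothetical stand-ins supplied by the caller, and the stopping rule (an accuracy floor) is an assumption.

```python
def iterative_compress(network, compress, retrain, evaluate, fmla_count,
                       min_accuracy, max_rounds=5):
    """Alternate compression (Step 1) and retraining (Step 2).

    Hypothetical callables: compress(net) -> net, retrain(net) -> net,
    evaluate(net) -> accuracy, fmla_count(net) -> computation amount.
    """
    best = network
    for _ in range(max_rounds):
        candidate = retrain(compress(best))
        if evaluate(candidate) < min_accuracy:
            break  # accuracy dropped too far: keep the previous network
        best = candidate  # the retrained network becomes the next origin network
    return best, fmla_count(best)


# Toy demonstration: a "network" is just (fmlas, accuracy in percent).
def halve(net):
    return (net[0] // 2, net[1] - 3)   # compression costs some accuracy


def recover(net):
    return (net[0], net[1] + 2)        # retraining recovers part of it


best, cost = iterative_compress(
    (1000, 95), halve, recover,
    evaluate=lambda net: net[1],
    fmla_count=lambda net: net[0],
    min_accuracy=92,
)
print(best, cost)  # (125, 92) 125
```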

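3. How to Get the Graph Information of a Neural Network?

The ReconstructedNetwork class (which inherits from nn.Module) is built around an nn.ModuleDict and has two attributes, input_nodes and layer_nodes, which hold the information about the network's input nodes and about each layer of the network. Code-1 shows how to convert an ordinary PyTorch network (nn.Module) into a ReconstructedNetwork.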

Code-1:

import os

import torch
import torch.nn as nn
from cifar10.models.resnet import ResNet18
from graph import reconstructor

network = ResNet18()
# Mark the start of the capture, run one forward pass, then mark the end.
reconstructor.insertCaptureBoundaryStart(network)
oup = network(torch.rand(1, 3, 32, 32))
reconstructor.insertCaptureBoundaryEnd()
# Build the ReconstructedNetwork and draw its graph to ./origin_net.
reconstructed_network, graph = reconstructor.getReconstructedNetwork(
    ifDraw=True,
    drawPath=os.path.join('./', "origin_net")
    )

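As Code-1 shows, inserting a CaptureBoundary before and after the network's forward pass is all that is needed to obtain the graph information. reconstructed_network is an object of the ReconstructedNetwork class.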
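
How `reconstructor` records the graph internally is not described here. As a rough, generic illustration of capturing information during a single forward pass, the sketch below uses standard PyTorch forward hooks to log leaf modules in execution order. It is not AMC's implementation, the helper name `trace_leaf_modules` is made up for this example, and hooks do not see functional calls such as torch.cat.

```python
import torch
import torch.nn as nn


def trace_leaf_modules(network, example_input):
    """Return the class names of leaf modules in the order they execute."""
    order, handles = [], []

    def hook(module, inputs, output):
        order.append(type(module).__name__)

    for module in network.modules():
        if len(list(module.children())) == 0:  # leaf modules only
            handles.append(module.register_forward_hook(hook))
    try:
        with torch.no_grad():
            network(example_input)
    finally:
        for handle in handles:
            handle.remove()  # always detach the hooks afterwards
    return order


toy = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
print(trace_leaf_modules(toy, torch.rand(1, 3, 32, 32)))
```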

Note that the following layer types (torch.nn.Module) and function operations are currently supported:

Native PyTorch:

  • torch.nn.modules.activation.ReLU
  • torch.nn.modules.activation.ReLU6
  • torch.nn.modules.activation.Tanh
  • torch.nn.modules.activation.Sigmoid
  • torch.nn.modules.conv.Conv2d
  • torch.nn.modules.batchnorm.BatchNorm2d
  • torch.nn.modules.pooling.MaxPool2d
  • torch.nn.modules.pooling.AvgPool2d
  • torch.nn.modules.pooling.AdaptiveAvgPool2d
  • torch.nn.modules.upsampling.Upsample
  • torch.nn.modules.Linear
  • torch.nn.functional.max_pool2d
  • torch.nn.functional.avg_pool2d
  • torch.nn.functional.upsample
  • torch.nn.functional.interpolate
  • torch.nn.functional.pixel_shuffle
  • torch.nn.functional.relu
  • torch.nn.functional.relu6
  • torch.nn.functional.tanh
  • torch.nn.functional.sigmoid
  • torch.add
  • torch.cat
  • torch.Tensor.view
  • torch.Tensor.reshape
  • torch.Tensor.permute
  • torch.Tensor.squeeze
  • torch.Tensor.__add__
  • torch.Tensor.__iadd__

Custom types (see graph.modules in the code for details):

  • PrunningSampler
  • TensorSqueeze
  • TensorTranspose
  • TensorReshape
  • Interpolate
  • ElemAdd
  • Concatenate
  • PixelShuffle
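
Before capturing a network, it can help to check whether it contains layers outside the supported set. The sketch below is an assumption-laden helper, not part of AMC: it covers only the native module classes from the list above, and it ignores functional operations and the custom types in graph.modules.

```python
import torch.nn as nn

# Native module types taken from the supported list above.
SUPPORTED_MODULES = (
    nn.ReLU, nn.ReLU6, nn.Tanh, nn.Sigmoid, nn.Conv2d, nn.BatchNorm2d,
    nn.MaxPool2d, nn.AvgPool2d, nn.AdaptiveAvgPool2d, nn.Upsample, nn.Linear,
)


def unsupported_leaf_modules(network):
    """Return (name, class name) for leaf modules outside SUPPORTED_MODULES."""
    return [
        (name, type(module).__name__)
        for name, module in network.named_modules()
        if len(list(module.children())) == 0
        and not isinstance(module, SUPPORTED_MODULES)
    ]


toy = nn.Sequential(nn.Conv2d(3, 8, 3), nn.LeakyReLU(), nn.Dropout(0.1))
print(unsupported_leaf_modules(toy))  # [('1', 'LeakyReLU'), ('2', 'Dropout')]
```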

4. How to Use the Mathematical Compressors?

Code-2:

if compress_algorithm=='lowrankdecomposition':
    if args.verbose:
        print("Using Low-Rank Decomposition Algorithm to compress.\n")
    ca = LowRankDecompostion(
            origin_net              = net, 
            trainloader             = trainloader, 
            valloader               = valloader,
            trainset_ratio          = args.compress_trainset_ratio, 
            sampled_pixels_per_img  = args.compress_sampled_pixels_per_img,
            compress_ratio          = args.compress_ratios, 
            checkpointfolder        = args.checkpoint_folder, 
            device                  = device, 
            verbose                 = args.verbose,
            drawgraph               = args.compress_draw_graph,
            accuracy_first          = True if args.compress_acc_thresh > 0 else False,
            accuracy_threshold      = args.compress_acc_thresh,
            nonlinear_case          = args.compress_nonlinear_case,
            args                    = args
            )
elif compress_algorithm=='channelprunning':
    if args.verbose:
        print("Using Channel Pruning Algorithm to compress.\n")
    ca = ChannelPrunning(
        origin_net              = net,                                 
        trainloader             = trainloader,
        valloader               = valloader,
        trainset_ratio          = args.compress_trainset_ratio, 
        sampled_pixels_per_img  = args.compress_sampled_pixels_per_img,
        compress_ratio          = args.compress_ratios, 
        checkpointfolder        = args.checkpoint_folder, 
        device                  = device, 
        drawgraph               = args.compress_draw_graph,
        verbose                 = args.verbose,
        lars_alpha_init         = args.lars_alpha_init,
        accuracy_first          = True if args.compress_acc_thresh > 0 else False,
        accuracy_threshold      = args.compress_acc_thresh,
        args                    = args
        )
else:
    raise RuntimeError("""Error, Unknown Compress Algorithm : {}""".format(compress_algorithm))
# Run the compression and fetch the compressed network.
origin_net = ca.origin_net
ca.compress()
compressed_net = ca.compressed_net

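As shown in Code-2, the compressor is initialized with the following inputs: origin_net (torch.nn.Module) is the original network; trainloader (torch.utils.data.DataLoader) loads the training data used to train the network; valloader (torch.utils.data.DataLoader) provides the test data used to evaluate it; trainset_ratio is the fraction of the training data used for compression; compress_ratio is the proportion of computation compressed in each convolutional layer; sampled_pixels_per_img is the number of pixels sampled from the input of each convolutional layer; checkpointfolder is the directory for output data; device specifies the device that holds the original and compressed networks; accuracy_first indicates whether accuracy is taken into account during compression; and accuracy_threshold is the accuracy threshold to respect.

For low-rank decomposition, the nonlinear_case parameter selects between linear mode and nonlinear mode.

For channel pruning, lars_alpha_init is the initial value of the penalty factor used when solving the LASSO problem with LARS.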
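
The LASSO problem itself is not stated explicitly here. One common formulation of channel pruning for a single convolutional layer, assumed for illustration rather than taken from AMC's code, fits one coefficient $\beta_c$ per input channel so that the pruned layer still reproduces the sampled outputs $Y$:

```latex
\min_{\beta}\;\; \frac{1}{2N}\,\Bigl\lVert\, Y - \sum_{c=1}^{C} \beta_c\, X_c W_c^{\top} \Bigr\rVert_F^{2} \;+\; \alpha\,\lVert \beta \rVert_1
```

Here $N$ is the number of sampled positions, $X_c$ ($N \times k_h k_w$) holds the sampled input patches of channel $c$, $W_c$ ($n \times k_h k_w$) holds the matching filter weights for the $n$ output channels, and $\alpha$ is the penalty factor. Channels whose $\beta_c$ becomes zero are removed, and a larger $\alpha$ zeroes more channels. Under this reading, lars_alpha_init would be the starting value of $\alpha$.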

Calling the compress() method performs the compression directly; the compressed network (torch.nn.Module) is stored in the compressor's compressed_net attribute.
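
Code-2 reads most of its settings from an `args` object. The sketch below lists the attributes it accesses, using `types.SimpleNamespace` as a stand-in for the parsed command line. The attribute names come from Code-2; every value, and the type of each value, is a placeholder assumption.

```python
from types import SimpleNamespace

# Placeholder values only; valid ranges and types are assumptions.
args = SimpleNamespace(
    verbose=True,
    compress_trainset_ratio=0.1,
    compress_sampled_pixels_per_img=10,
    compress_ratios=0.5,
    checkpoint_folder="./checkpoints",
    compress_draw_graph=False,
    compress_acc_thresh=0.0,
    compress_nonlinear_case=False,   # read by LowRankDecompostion only
    lars_alpha_init=1e-4,            # read by ChannelPrunning only
)

# Same test as in Code-2: accuracy is prioritized only for a positive threshold.
accuracy_first = args.compress_acc_thresh > 0
print(accuracy_first)  # False
```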

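5. How to Customize a Trainer?

Users familiar with PyTorch can quickly write the training task they need. However, a large part of the code is the same from one training task to the next, because training a PyTorch neural network is a very fixed procedure. After compressing the Origin Network with a mathematical compressor, the question is how to define a training task simply, quickly, and clearly, with as little code as possible, in order to retrain the Compressed Network. AMC's Trainer serves this purpose. In addition, the Trainer integrates tensorboardX, so users can observe and record how the training data changes in real time. PS: for tensorboardX usage, see the tensorboardX Chinese documentation. Recent versions of PyTorch also ship with TensorBoard support; its API is essentially the same as tensorboardX's, so very little needs to change.

In essence, AMC's Trainer hard-codes the PyTorch training procedure and lets users customize certain steps of that fixed procedure in order to customize the training of a particular network. In general, once the PyTorch network (nn.Module) has been designed, customizing the following steps covers almost every kind of training task:

  • 1) Wrap the training/test data and labels in a torch.utils.data.DataLoader;
  • 2) Wrap the loss function as a criterion (torch.nn.Module);
  • 3) Define how to obtain the criterion's inputs from each batch produced by iterating over the torch.utils.data.DataLoader;
  • 4) Define the optimizer (torch.optim) and the learning rate scheduler (torch.optim.lr_scheduler).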
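
The idea of a fixed loop with user-defined steps can be sketched without PyTorch. The toy class below borrows the method names that appear in Code-3, but it is only a stand-in written for this example: it is not AMC's BaseTrainer, it has no optimizer or backward pass, and `stop_loss` is a made-up parameter.

```python
class MiniTrainer:
    """Toy fixed training loop with overridable steps (NOT AMC's BaseTrainer)."""

    def __init__(self, train_batches, epochs):
        self._train_batches = train_batches
        self._epochs = epochs
        self.history = []  # summed 'total_loss' per epoch

    # Steps a subclass customizes.
    def __get_train_batch_preds__(self, batch, *args, **kwargs):
        raise NotImplementedError

    def __get_train_losses__(self, preds, batch, *args, **kwargs):
        raise NotImplementedError

    def __if_stop_trainning__(self, *args, **kwargs):
        return False

    # The fixed part of the procedure.
    def __run__(self, **kwargs):
        for _ in range(self._epochs):
            total = 0.0
            for batch in self._train_batches:
                preds = self.__get_train_batch_preds__(batch, **kwargs)
                losses = self.__get_train_losses__(preds, batch, **kwargs)
                total += losses['total_loss']  # a real trainer would backpropagate here
            self.history.append(total)
            if self.__if_stop_trainning__(**kwargs):
                break


class MeanTrainer(MiniTrainer):
    """Predicts the mean of the inputs and scores it with a squared error."""

    def __get_train_batch_preds__(self, batch, *args, **kwargs):
        inputs, _ = batch
        return sum(inputs) / len(inputs)

    def __get_train_losses__(self, preds, batch, *args, **kwargs):
        _, target = batch
        return {'total_loss': (preds - target) ** 2}

    def __if_stop_trainning__(self, *args, **kwargs):
        return self.history[-1] < kwargs['stop_loss']


trainer = MeanTrainer([([1.0, 3.0], 2.0), ([2.0, 4.0], 4.0)], epochs=3)
trainer.__run__(stop_loss=2.0)
print(trainer.history)  # [1.0] (stopped after the first epoch)
```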

Following these steps, the example below defines a trainer for a classification task to show how a custom trainer is written:

Code-3:

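import argparse
import os

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from tensorboardX import SummaryWriter

from trainer.basetrainer import *
from trainer.basedistiller import *
from utils.compressmethod import showFMLAs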

# <class CIFAR10Trainer>
class CIFAR10Trainer(BaseTrainer):
    
    # <method: __init__>
    def __init__(self, 
        train_loader,  val_loader,  eval_loader, 
        network, 
        criterion, optimizer, scheduler, 
        epochs, 
        device, tbx_writer, checkpoints_folder, additional_args):
        super(CIFAR10Trainer, self).__init__(
            train_loader = train_loader, 
            val_loader = val_loader, 
            eval_loader = eval_loader, 
            network = network, 
            criterion = criterion, 
            optimizer = optimizer, 
            scheduler = scheduler, 
            epochs = epochs, 
            device = device, 
            tbx_writer = tbx_writer, 
            checkpoints_folder = checkpoints_folder,
            additional_args = additional_args
            )
        pass
    # <method: __init__>
    
    # <method: __get_val_batch_preds__>
    def __get_val_batch_preds__(self, batch_val_data, *args, **kwargs):
        """ This method must be overridden. Override it to return the network's predictions for a validation batch."""
        inputs = batch_val_data[0]
        if self._device:
            inputs = inputs.to(self._device)
        return self._network(inputs)
    # <method: __get_val_batch_preds__>
    
    # <method: __get_val_losses__>
    def __get_val_losses__(self, batch_preds, batch_val_data, *args, **kwargs):
        """ This method must be overridden. Override it to compute the validation loss with the network's loss function."""
        batch_val_lables = batch_val_data[1]
        if self._device:
            batch_val_lables = batch_val_lables.to(self._device)
        #   endif
        loss = self._criterion(batch_preds, batch_val_lables)
        return {'total_loss': loss}
    # <method: __get_val_losses__>              
    
    # <method: __get_train_batch_preds__>
    def __get_train_batch_preds__(self, batch_train_data, *args, **kwargs):
        """ This method must be overridden. Override it to return the network's predictions for a training batch."""
        inputs = batch_train_data[0]
        if self._device:
            inputs = inputs.to(self._device)
        return self._network(inputs)
    # <method: __get_train_batch_preds__>
    
    # <method: __get_train_losses__>
    def __get_train_losses__(self, batch_preds, batch_train_data, *args, **kwargs):
        """ This method must be overridden. Override it to compute the training loss with the network's loss function."""
        batch_train_lables = batch_train_data[1]
        if self._device:
            batch_train_lables = batch_train_lables.to(self._device)
        #   endif
        loss = self._criterion(batch_preds, batch_train_lables)                
        return {'total_loss': loss}
    # <method: __get_train_losses__>

    # <method: __if_stop_trainning__>
    def __if_stop_trainning__(self, *args, **kwargs):
        for param in self._optimizer.param_groups:
            lr=param["lr"]
        #   endfor
        if lr < kwargs['stop_lr']:
            return True
        #   endif
        return False
    # <method: __if_stop_trainning__>

# <class CIFAR10Trainer>

if __name__ == "__main__":
    # set argument parser
    parser = argparse.ArgumentParser(description='Arguments of PyTorch Model Auto Compression in CIFAR-10 Project')
    parser.add_argument('--run_name', type = str, default = 'run_name')
    parser.add_argument('--data_path', type = str, default = './path_to_cifar10_data')
    parser.add_argument('--learning_rate', type = float, default = 1e-2)
    parser.add_argument('--momentum', type = float, default = 0.9)
    parser.add_argument('--weight_decay', type = float, default = 5e-4)
    parser.add_argument('--patience', type = int, default = 15)
    parser.add_argument('--batch_size', type = int, default = 128)
    parser.add_argument('--num_workers', type = int, default = 8)
    parser.add_argument('--checkpoints_folder', type = str, default = "./path_to_checkpoints_folder")
    parser.add_argument('--gpu', type = int, default = 0)
    parser.add_argument('--epochs', type = int, default = 1000)
    args = parser.parse_args()
    # set torch.utils.data.DataLoader of Train and Val ...
    transform_train = transforms.Compose([
        transforms.RandomRotation(degrees=5),
        transforms.RandomCrop(32, padding=4),  
        transforms.RandomHorizontalFlip(),  
        transforms.RandomVerticalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    transform_test = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    trainset = torchvision.datasets.CIFAR10(root=args.data_path, train=True, download=False, transform=transform_train)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=args.batch_size, shuffle=True, num_workers=args.num_workers)
    valset = torchvision.datasets.CIFAR10(root=args.data_path, train=False, download=False, transform=transform_test)
    valloader = torch.utils.data.DataLoader(valset, batch_size=args.batch_size, shuffle=False, num_workers=args.num_workers)
    # set network (it must exist before its parameters are passed to the optimizer) ...
    from cifar10.models.resnet import ResNet18
    network = ResNet18()
    # set criterion ...
    criterion = nn.CrossEntropyLoss()
    # set optimizer ...
    optimizer = optim.SGD(network.parameters(), lr=args.learning_rate, momentum=args.momentum, weight_decay=args.weight_decay)
    # set learning rate scheduler ...
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, threshold=1e-5, patience=args.patience, min_lr=1e-6)
    # set tensorboardx writer ...
    tbx_writer = SummaryWriter(log_dir=os.path.join(args.checkpoints_folder, args.run_name))
    # set device ...
    device = """cuda:{}""".format(args.gpu) if torch.cuda.is_available() else 'cpu'

    # create a CIFAR-10 trainer
    trainer = CIFAR10Trainer(
        train_loader = trainloader,
        val_loader = valloader,
        eval_loader = None,
        network = network,
        criterion = criterion,
        optimizer = optimizer,
        scheduler = scheduler,
        epochs = args.epochs,
        device = device,
        tbx_writer = tbx_writer,
        checkpoints_folder = args.checkpoints_folder,
        additional_args = args
        )
    trainer.__run__(stop_lr = 1e-6)

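Code-3 above defines a class named CIFAR10Trainer (inheriting from BaseTrainer) and overrides the following methods:

  • __get_val_batch_preds__: The input parameter "batch_val_data" is one batch of data produced by each iteration over valloader. This method defines how to pick the input data from batch_val_data and feed it through the network's forward pass, and it returns the prediction data needed to compute the validation loss.
  • __get_val_losses__: The input parameter "batch_preds" is the prediction data returned by __get_val_batch_preds__, and "batch_val_data" is one batch of data produced by each iteration over valloader. This method defines how to compute the loss from these two inputs and returns a dictionary, which must contain a loss under the key 'total_loss'.
  • __get_train_batch_preds__: The input parameter "batch_train_data" is one batch of data produced by each iteration over trainloader. This method defines how to pick the input data from batch_train_data and feed it through the network's forward pass, and it returns the prediction data needed to compute the training loss.
  • __get_train_losses__: The input parameter "batch_preds" is the prediction data returned by __get_train_batch_preds__, and "batch_train_data" is one batch of data produced by each iteration over trainloader. This method defines how to compute the loss from these two inputs and returns a dictionary, which must contain a loss under the key 'total_loss'; that loss is used for the backward pass of the autograd mechanism.
  • __if_stop_trainning__: This method defines the condition checked at the end of each epoch to decide whether to stop training. Returning True stops the training task; returning False continues training.

To train, create a CIFAR10Trainer object, pass in your own train_loader, val_loader, network, criterion, optimizer, and scheduler, and then call the __run__ method to start the training task.

PS: There are two ways to pass in custom parameters:

  • When creating the CIFAR10Trainer object, pass additional_args (an object with custom attributes).
  • As in Code-3 above, pass arguments to __run__ in the form "<name>=<value>", then read the value inside a custom method (such as __if_stop_trainning__) with kwargs["<name>"].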
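
The two ways of passing custom parameters can be shown side by side in a small stand-alone sketch. The class `StopRule`, the attribute `_additional_args`, and the field `default_stop_lr` are hypothetical names chosen for this example; how BaseTrainer stores additional_args is not documented here.

```python
from types import SimpleNamespace


class StopRule:
    """Hypothetical stand-in showing both ways of passing custom parameters."""

    def __init__(self, additional_args):
        # Way 1: an object carrying custom attributes, given at construction.
        self._additional_args = additional_args

    def __if_stop_trainning__(self, current_lr, *args, **kwargs):
        # Way 2: a keyword argument forwarded from __run__; falls back to way 1.
        floor = kwargs.get('stop_lr', self._additional_args.default_stop_lr)
        return current_lr < floor

    def __run__(self, lrs, **kwargs):
        """Return the index of the epoch at which training would stop."""
        for epoch, lr in enumerate(lrs):
            if self.__if_stop_trainning__(lr, **kwargs):
                return epoch
        return len(lrs)


rule = StopRule(SimpleNamespace(default_stop_lr=1e-4))
lrs = [1e-2, 1e-3, 1e-5, 1e-7]
print(rule.__run__(lrs))                # 2: uses default_stop_lr from additional_args
print(rule.__run__(lrs, stop_lr=1e-6))  # 3: the keyword argument takes precedence
```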

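6. How to Export the Compressed Model, or Deploy It to the Three Platforms (Windows/Android/iOS) with Venus?

The compressed network can be saved as a .pkl, .json, or .vnmodel file. Because the compressed network is an object of the ReconstructedNetwork class, it can be exported to each format with the following methods of that class:

  • Method ReconstructedNetwork.save(pkl="./path_to_file.pkl"): saves a .pkl file, which is convenient for reloading the network in Python.
  • Method ReconstructedNetwork.save_to_json(file="./path_to_file.json"): exports a .json file, which is more human-readable.
  • Method ReconstructedNetwork.save_to_vnmodel(pkl = "./path_to_file.vnmodel", model_base_size_hw = (input_image_height, input_image_width), model_id = 0): exports a .vnmodel file for deployment to the three platforms with Venus.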
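
A small wrapper can export all three formats in one call. The sketch below uses only the method names and keyword arguments listed above; the helper `export_all` and the `FakeNetwork` class are hypothetical. `FakeNetwork` merely records calls so that the example runs without AMC, whereas a real ReconstructedNetwork would write the files.

```python
import os


def export_all(network, out_dir, stem, input_hw, model_id=0):
    """Export `network` to .pkl, .json and .vnmodel using the methods listed above."""
    paths = {ext: os.path.join(out_dir, stem + ext)
             for ext in ('.pkl', '.json', '.vnmodel')}
    network.save(pkl=paths['.pkl'])
    network.save_to_json(file=paths['.json'])
    network.save_to_vnmodel(pkl=paths['.vnmodel'],
                            model_base_size_hw=input_hw, model_id=model_id)
    return paths


class FakeNetwork:
    """Stand-in that only records calls instead of writing files."""

    def __init__(self):
        self.calls = []

    def save(self, pkl):
        self.calls.append(('save', pkl))

    def save_to_json(self, file):
        self.calls.append(('save_to_json', file))

    def save_to_vnmodel(self, pkl, model_base_size_hw, model_id):
        self.calls.append(('save_to_vnmodel', pkl, model_base_size_hw, model_id))


net = FakeNetwork()
paths = export_all(net, 'out', 'resnet18_compressed', input_hw=(32, 32))
print([call[0] for call in net.calls])
```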

