Home
Auto Model Compressor (AMC) is a PyTorch-based framework for reducing the computation amount of a neural network, measured in floating-point fused multiply-adds.
AMC is composed of four main modules: Mathematical Compressors, Trainers, Hyper-Parameters Tuners, and Utils, as shown in the image below:

- Mathematical Compressors: A collection of mathematical methods for low-loss compression of the convolutional layers of a neural network, that is, for reducing their floating-point multiplications and additions. The framework currently supports two major types of methods: Channel Pruning and Lowrank Decomposition. (Note: Channel Pruning uses statistical methods to analyze the sparsity of a convolutional layer and selectively retains the feature channels that have the greatest influence on the results. Fewer feature channels means less computation in the convolution. Lowrank Decomposition uses matrix decomposition methods (such as SVD/GSVD) to decompose a large convolutional layer into two serial convolutional layers. Both approaches have their merits. Lowrank Decomposition compresses a convolutional layer with smaller precision loss, but it increases the depth of the network, which may be unfriendly to inference on GPU devices. Channel Pruning compresses computation without changing the original structure of the network, but at the same compression ratio its accuracy is not as high as that of Lowrank Decomposition. In addition, experiments show that networks for different application types differ in their sensitivity to the two methods, so choose the one that works best for your situation.)
- Trainers: A general implementation for retraining (finetuning) the compressed neural network. (Note: This module also contains some useful loss functions for improving training or quantization training.)
- Hyper-Parameters Tuners: Contains two types of "Agent", Deep Deterministic Policy Gradient (DDPG) and Genetic Algorithms (GA), for tuning the compression ratios of a neural network.
- Utils: Contains many useful tools, such as recording the graph of a neural network and converting a network to .vnmodel for Venus inference.
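The Lowrank Decomposition idea described above can be illustrated with a minimal NumPy sketch. It is not AMC's implementation: the function names `lowrank_decompose` and `fma_per_output_pixel` are hypothetical, a plain truncated SVD stands in for whichever SVD/GSVD variant the framework uses, and a single flattened input patch stands in for one sampled pixel of a real convolution.

```python
import numpy as np

def lowrank_decompose(weight, rank):
    """Split a conv weight (C_out, C_in, kh, kw) into two serial layers.

    Returns (first, second): `first` has shape (rank, C_in, kh, kw) and
    `second` has shape (C_out, rank, 1, 1), i.e. a kh x kw convolution
    followed by a 1 x 1 convolution.
    """
    c_out, c_in, kh, kw = weight.shape
    flat = weight.reshape(c_out, c_in * kh * kw)
    u, s, vh = np.linalg.svd(flat, full_matrices=False)
    first = vh[:rank].reshape(rank, c_in, kh, kw)
    second = (u[:, :rank] * s[:rank]).reshape(c_out, rank, 1, 1)
    return first, second

def fma_per_output_pixel(weight_shapes):
    """Multiply-adds needed per output pixel for a chain of conv weights."""
    return sum(int(np.prod(shape)) for shape in weight_shapes)

rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 32, 3, 3))
first, second = lowrank_decompose(weight, rank=16)

# One flattened input patch (C_in * kh * kw values) stands in for a pixel.
patch = rng.standard_normal(32 * 3 * 3)
exact = weight.reshape(64, -1) @ patch
approx = second.reshape(64, 16) @ (first.reshape(16, -1) @ patch)

original_fma = fma_per_output_pixel([weight.shape])
compressed_fma = fma_per_output_pixel([first.shape, second.shape])
print(original_fma, compressed_fma)  # prints 18432 5632
print(np.linalg.norm(exact - approx) / np.linalg.norm(exact))
```

The random weight used here has no low-rank structure, so the relative error at rank 16 is large. Trained layers are typically far more compressible, and the retraining step is what recovers the remaining accuracy.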
Given an original model trained in PyTorch, use the tools provided by Utils to parse the model into a specific class (ReconstructedNetwork).
- Step 1: Compress the "Origin Network" with a mathematical compressor to generate the "Compressed Network".
- Step 2: The "Compressed Network" provides very good initial parameters. Use the trainer to retrain/finetune the "Compressed Network" and obtain a "Retrained Network".
Treat the "Retrained Network" as the new "Origin Network", then alternate Step 1 and Step 2 until you obtain a network that balances computation amount and accuracy. Then use the tool in Utils to convert that network to a .vnmodel file. The specific process is shown in the image below:
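The alternation of Step 1 and Step 2 can also be written as a short control loop. This is only a sketch of the control flow: `compress_iteratively` and all of its callables (`compress`, `retrain`, `evaluate`, `cost`) are hypothetical placeholders, and the toy "network" is a plain dictionary rather than a ReconstructedNetwork.

```python
def compress_iteratively(network, compress, retrain, evaluate, cost,
                         target_cost, min_score, max_rounds=10):
    """Alternate Step 1 (compress) and Step 2 (retrain).

    The callables are supplied by the caller:
      compress(net) -> compressed net
      retrain(net)  -> retrained net
      evaluate(net) -> score, higher is better
      cost(net)     -> computation amount
    """
    best = network
    for _ in range(max_rounds):
        if cost(best) <= target_cost:
            break  # already small enough
        candidate = retrain(compress(best))
        if evaluate(candidate) < min_score:
            break  # further compression costs too much accuracy
        best = candidate  # the retrained network becomes the new origin
    return best

# Toy stand-ins: each compression halves the cost and loses 4 points of
# accuracy, and each retraining wins 2 points back.
origin = {"cost": 100, "score": 95}
result = compress_iteratively(
    origin,
    compress=lambda net: {"cost": net["cost"] // 2, "score": net["score"] - 4},
    retrain=lambda net: {"cost": net["cost"], "score": net["score"] + 2},
    evaluate=lambda net: net["score"],
    cost=lambda net: net["cost"],
    target_cost=10,
    min_score=90,
)
print(result)  # prints {'cost': 25, 'score': 91}
```

The loop stops either when the cost target is reached or when one more round would push the score below the accepted minimum, which is one simple way to express the balance between computation amount and accuracy.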
Code-1:
import os

import torch
import torch.nn as nn
from cifar10.models.resnet import ResNet18
from graph import reconstructor

network = ResNet18()

# Mark the start of the capture, run one forward pass, then mark the end.
reconstructor.insertCaptureBoundaryStart(network)
oup = network(torch.rand(1, 3, 32, 32))
reconstructor.insertCaptureBoundaryEnd()

# Build the ReconstructedNetwork and its graph from the captured forward pass.
reconstructed_network, graph = reconstructor.getReconstructedNetwork(
    ifDraw=True,
    drawPath=os.path.join('./', "origin_net")
)
As Code-1 shows, all you have to do is insert a CaptureBoundary before and after the network's forward pass. reconstructed_network is an object of class ReconstructedNetwork that stores the graph information of the original network.
Note: The following layer types (torch.nn.Module) and function operations are supported at this stage:
Native PyTorch types:
- torch.nn.modules.activation.ReLU
- torch.nn.modules.activation.ReLU6
- torch.nn.modules.activation.Tanh
- torch.nn.modules.activation.Sigmoid
- torch.nn.modules.conv.Conv2d
- torch.nn.modules.batchnorm.BatchNorm2d
- torch.nn.modules.pooling.MaxPool2d
- torch.nn.modules.pooling.AvgPool2d
- torch.nn.modules.pooling.AdaptiveAvgPool2d
- torch.nn.modules.upsampling.Upsample
- torch.nn.modules.Linear
- torch.nn.functional.max_pool2d
- torch.nn.functional.avg_pool2d
- torch.nn.functional.upsample
- torch.nn.functional.interpolate
- torch.nn.functional.pixel_shuffle
- torch.nn.functional.relu
- torch.nn.functional.relu6
- torch.nn.functional.tanh
- torch.nn.functional.sigmoid
- torch.add
- torch.cat
- torch.Tensor.view
- torch.Tensor.reshape
- torch.Tensor.permute
- torch.Tensor.squeeze
- torch.Tensor.__add__
- torch.Tensor.__iadd__
Custom types (see graph/modules.py):
- PrunningSampler
- TensorSqueeze
- TensorTranspose
- TensorReshape
- Interpolate
- ElemAdd
- Concatenate
- PixelShuffle
Code-2:
if compress_algorithm == 'lowrankdecomposition':
    if args.verbose:
        print("Using Low-Rank Decomposition Algorithm to compress.\n")
    ca = LowRankDecompostion(
        origin_net = net,
        trainloader = trainloader,
        valloader = valloader,
        trainset_ratio = args.compress_trainset_ratio,
        sampled_pixels_per_img = args.compress_sampled_pixels_per_img,
        compress_ratio = args.compress_ratios,
        checkpointfolder = args.checkpoint_folder,
        device = device,
        verbose = args.verbose,
        drawgraph = args.compress_draw_graph,
        accuracy_first = args.compress_acc_thresh > 0,
        accuracy_threshold = args.compress_acc_thresh,
        nonlinear_case = args.compress_nonlinear_case,
        args = args
    )
elif compress_algorithm == 'channelprunning':
    if args.verbose:
        print("Using Channel Pruning Algorithm to compress.\n")
    ca = ChannelPrunning(
        origin_net = net,
        trainloader = trainloader,
        valloader = valloader,
        trainset_ratio = args.compress_trainset_ratio,
        sampled_pixels_per_img = args.compress_sampled_pixels_per_img,
        compress_ratio = args.compress_ratios,
        checkpointfolder = args.checkpoint_folder,
        device = device,
        drawgraph = args.compress_draw_graph,
        verbose = args.verbose,
        lars_alpha_init = args.lars_alpha_init,
        accuracy_first = args.compress_acc_thresh > 0,
        accuracy_threshold = args.compress_acc_thresh,
        args = args
    )
else:
    # Raise the error instead of only constructing it.
    raise RuntimeError("Error, Unknown Compress Algorithm : {}".format(compress_algorithm))

# Run the compression and read the result back from the compressor.
origin_net = ca.origin_net
ca.compress()
compressed_net = ca.compressed_net
As Code-2 above shows:
- origin_net is a torch.nn.Module, the original network;
- trainloader is a torch.utils.data.DataLoader that provides the training data for the network;
- valloader and testloader are torch.utils.data.DataLoader objects that provide the val/test data used to validate the network;
- trainset_ratio is the proportion of the training data used in compression;
- compress_ratio is the compression ratio of each convolutional layer in the network;
- sampled_pixels_per_img is the number of pixels sampled from the input data of each convolutional layer;
- checkpointfolder is the directory for output data;
- device specifies the device on which compression and network inference run;
- accuracy_first specifies whether accuracy is taken into account during compression;
- accuracy_threshold is the threshold applied when accuracy is taken into account during compression.
For Lowrank Decomposition, the argument nonlinear_case selects linear or nonlinear mode.
For Channel Pruning, lars_alpha_init is the initial value of the LARS penalty factor.
Call the compress() method. After compression completes, the compressed network is stored in the compressor's compressed_net attribute (ca.compressed_net in Code-2).
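To make the roles of sampled_pixels_per_img and compress_ratio more concrete, here is a minimal NumPy sketch of channel pruning on sampled pixels. It is an illustration under assumptions, not the ChannelPrunning class: the function `prune_channels` is hypothetical, channels are ranked by a simple output-energy score instead of the LARS-based selection that lars_alpha_init refers to, and the remaining weights are refit by ordinary least squares.

```python
import numpy as np

def prune_channels(x, weight, keep):
    """Keep the `keep` most influential input channels of one conv layer.

    x:      (N, C_in, K) input patches taken at N sampled output pixels,
            with K = kh * kw values per channel
    weight: (C_out, C_in, K) convolution weight
    Returns (kept, new_weight) with new_weight of shape (C_out, keep, K).
    """
    n, c_in, k = x.shape
    c_out = weight.shape[0]
    # Contribution of every input channel to every output: (C_in, N, C_out)
    contrib = np.einsum('nck,ock->cno', x, weight)
    y = contrib.sum(axis=0)                      # original responses (N, C_out)
    energy = (contrib ** 2).sum(axis=(1, 2))     # influence score per channel
    kept = np.sort(np.argsort(energy)[-keep:])
    # Refit the kept channels so they reproduce the original responses.
    x_kept = x[:, kept, :].reshape(n, keep * k)
    solution, *_ = np.linalg.lstsq(x_kept, y, rcond=None)
    new_weight = solution.T.reshape(c_out, keep, k)
    return kept, new_weight

rng = np.random.default_rng(0)
x = rng.standard_normal((200, 8, 9))       # 200 sampled pixels, 8 channels, 3x3
weight = rng.standard_normal((16, 8, 9))
weight[:, [1, 4, 6], :] = 0.0              # three channels that contribute nothing

kept, new_weight = prune_channels(x, weight, keep=5)
y = np.einsum('nck,ock->no', x, weight)
y_pruned = np.einsum('nck,ock->no', x[:, kept, :], new_weight)
print(kept, np.abs(y - y_pruned).max())
```

In this toy case the dropped channels are exactly redundant, so the pruned layer reproduces the original responses up to rounding. On a real layer the refit only reduces the reconstruction error, which is why sampling enough pixels and retraining afterwards both matter.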
Users familiar with PyTorch can quickly write the training tasks they need. However, the code for each training task is largely repetitive, because training a neural network in PyTorch follows a very fixed process. The question is how to customize a training task for a network compressed by a mathematical compressor with as little code as possible, while keeping it fast, simple, and clear. AMC's Trainer was built for this.
AMC's Trainer also integrates tensorboardX, which makes it convenient to observe and record how data changes during training in real time. (PS: See tensorboardX's documentation.)
The following example customizes a trainer for a classification task:
Code-3:
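import argparse
import os

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from tensorboardX import SummaryWriter
from trainer.basetrainer import *
from trainer.basedistiller import *
from utils.compressmethod import showFMLAs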
# <class CIFAR10Trainer>
class CIFAR10Trainer(BaseTrainer):
    # <method: __init__>
    def __init__(self,
                 train_loader, val_loader, eval_loader,
                 network,
                 criterion, optimizer, scheduler,
                 epochs,
                 device, tbx_writer, checkpoints_folder, additional_args):
        super(CIFAR10Trainer, self).__init__(
            train_loader = train_loader,
            val_loader = val_loader,
            eval_loader = eval_loader,
            network = network,
            criterion = criterion,
            optimizer = optimizer,
            scheduler = scheduler,
            epochs = epochs,
            device = device,
            tbx_writer = tbx_writer,
            checkpoints_folder = checkpoints_folder,
            additional_args = additional_args
        )
    # <method: __init__>

    # <method: __get_val_batch_preds__>
    def __get_val_batch_preds__(self, batch_val_data, *args, **kwargs):
        """This method must be overridden. It returns the network's predictions for a validation batch."""
        inputs = batch_val_data[0]
        if self._device:
            inputs = inputs.to(self._device)
        return self._network(inputs)
    # <method: __get_val_batch_preds__>

    # <method: __get_val_losses__>
    def __get_val_losses__(self, batch_preds, batch_val_data, *args, **kwargs):
        """This method must be overridden. It returns the validation loss computed with the loss function."""
        batch_val_lables = batch_val_data[1]
        if self._device:
            batch_val_lables = batch_val_lables.to(self._device)
        # endif
        loss = self._criterion(batch_preds, batch_val_lables)
        return {'total_loss': loss}
    # <method: __get_val_losses__>

    # <method: __get_train_batch_preds__>
    def __get_train_batch_preds__(self, batch_train_data, *args, **kwargs):
        """This method must be overridden. It returns the network's predictions for a training batch."""
        inputs = batch_train_data[0]
        if self._device:
            inputs = inputs.to(self._device)
        return self._network(inputs)
    # <method: __get_train_batch_preds__>

    # <method: __get_train_losses__>
    def __get_train_losses__(self, batch_preds, batch_train_data, *args, **kwargs):
        """This method must be overridden. It returns the training loss computed with the loss function."""
        batch_train_lables = batch_train_data[1]
        if self._device:
            batch_train_lables = batch_train_lables.to(self._device)
        # endif
        loss = self._criterion(batch_preds, batch_train_lables)
        return {'total_loss': loss}
    # <method: __get_train_losses__>

    # <method: __if_stop_trainning__>
    def __if_stop_trainning__(self, *args, **kwargs):
        # Read the learning rate of the last parameter group.
        for param in self._optimizer.param_groups:
            lr = param["lr"]
        # endfor
        # Stop once it falls below the stop_lr keyword argument.
        if lr < kwargs['stop_lr']:
            return True
        # endif
        return False
    # <method: __if_stop_trainning__>
# <class CIFAR10Trainer>
if __name__ == "__main__":
    # set argument parser
    parser = argparse.ArgumentParser(description='Arguments of PyTorch Model Auto Compression in CIFAR-10 Project')
    parser.add_argument('--run_name', type = str, default = 'run_name')
    parser.add_argument('--data_path', type = str, default = './path_to_cifar10_data')
    parser.add_argument('--learning_rate', type = float, default = 1e-2)
    parser.add_argument('--momentum', type = float, default = 0.9)
    parser.add_argument('--weight_decay', type = float, default = 5e-4)
    parser.add_argument('--patience', type = int, default = 15)
    parser.add_argument('--batch_size', type = int, default = 128)
    parser.add_argument('--num_workers', type = int, default = 8)
    parser.add_argument('--checkpoints_folder', type = str, default = "./path_to_checkpoints_folder")
    parser.add_argument('--gpu', type = int, default = 0)
    parser.add_argument('--epochs', type = int, default = 1000)
    args = parser.parse_args()
    # set torch.utils.data.DataLoader of Train and Val ...
    transform_train = transforms.Compose([
        transforms.RandomRotation(degrees=5),
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.RandomVerticalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    transform_test = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    trainset = torchvision.datasets.CIFAR10(root=args.data_path, train=True, download=False, transform=transform_train)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=args.batch_size, shuffle=True, num_workers=args.num_workers)
    valset = torchvision.datasets.CIFAR10(root=args.data_path, train=False, download=False, transform=transform_test)
    valloader = torch.utils.data.DataLoader(valset, batch_size=args.batch_size, shuffle=False, num_workers=args.num_workers)
    # set criterion ...
    criterion = nn.CrossEntropyLoss()
    # set network (created before the optimizer, which needs its parameters) ...
    from cifar10.models.resnet import ResNet18
    network = ResNet18()
    # set optimizer ...
    optimizer = optim.SGD(network.parameters(), lr=args.learning_rate, momentum=args.momentum, weight_decay=args.weight_decay)
    # set learning rate scheduler ...
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, threshold=1e-5, patience=args.patience, min_lr=1e-6)
    # set tensorboardx writer ...
    tbx_writer = SummaryWriter(log_dir=os.path.join(args.checkpoints_folder, args.run_name))
    # set device ...
    device = "cuda:{}".format(args.gpu) if torch.cuda.is_available() else 'cpu'
    # new a cifar10 trainer
    trainer = CIFAR10Trainer(
        train_loader = trainloader,
        val_loader = valloader,
        eval_loader = None,
        network = network,
        criterion = criterion,
        optimizer = optimizer,
        scheduler = scheduler,
        epochs = args.epochs,
        device = device,
        tbx_writer = tbx_writer,
        checkpoints_folder = args.checkpoints_folder,
        additional_args = args
    )
    trainer.__run__(stop_lr = 1e-6)
Code-3 above defines a class CIFAR10Trainer (which inherits from BaseTrainer) and overrides the following functions:
- __get_val_batch_preds__: The input argument "batch_val_data" is one batch of data yielded by valloader on each iteration. This function defines how to select the input data from batch_val_data and feed it to the neural network, and returns the predictions required for the val-loss calculation.
- __get_val_losses__: The input argument "batch_preds" is the prediction data returned by __get_val_batch_preds__, and "batch_val_data" is one batch of data yielded by valloader on each iteration. This function defines how to compute the loss from these two inputs and returns a dictionary that must contain a loss under the key 'total_loss'.
- __get_train_batch_preds__: The input argument "batch_train_data" is one batch of data yielded by trainloader on each iteration. This function defines how to select the input data from batch_train_data and feed it to the neural network, and returns the predictions required for the train-loss calculation.
- __get_train_losses__: The input argument "batch_preds" is the prediction data returned by __get_train_batch_preds__, and "batch_train_data" is one batch of data yielded by trainloader on each iteration. This function defines how to compute the loss from these two inputs and returns a dictionary that must contain a loss under the key 'total_loss', which is used in the backward calculation.
- __if_stop_trainning__: This function defines the condition used to decide, at the end of each epoch, whether to stop the training task. Returning True stops training.
Create a CIFAR10Trainer object with the arguments train_loader, val_loader, network, criterion, optimizer, and scheduler, then call the __run__ function to start training.
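How a base class might drive these five functions can be sketched in plain Python. This is a hypothetical stand-in, not AMC's BaseTrainer: `SketchBaseTrainer`, `ToyTrainer`, `_mean_loss`, and `history` are names invented for the illustration, the "network" is a fixed function, and the backward pass and optimizer step that a real trainer performs are left out.

```python
class SketchBaseTrainer:
    """Hypothetical skeleton showing when each overridden function is called."""

    def __init__(self, train_loader, val_loader, epochs):
        self._train_loader = train_loader
        self._val_loader = val_loader
        self._epochs = epochs
        self.history = []  # (epoch, mean train loss, mean val loss)

    def _mean_loss(self, loader, get_preds, get_losses, **kwargs):
        total, count = 0.0, 0
        for batch in loader:
            preds = get_preds(batch, **kwargs)
            losses = get_losses(preds, batch, **kwargs)
            # A real trainer would backpropagate the training 'total_loss' here.
            total += losses['total_loss']
            count += 1
        return total / max(count, 1)

    def __run__(self, **kwargs):
        for epoch in range(self._epochs):
            train_loss = self._mean_loss(
                self._train_loader, self.__get_train_batch_preds__,
                self.__get_train_losses__, **kwargs)
            val_loss = self._mean_loss(
                self._val_loader, self.__get_val_batch_preds__,
                self.__get_val_losses__, **kwargs)
            self.history.append((epoch, train_loss, val_loss))
            if self.__if_stop_trainning__(**kwargs):
                break


class ToyTrainer(SketchBaseTrainer):
    """Each batch is an (input, label) pair of floats; the 'network' doubles its input."""

    def __get_train_batch_preds__(self, batch, *args, **kwargs):
        return 2.0 * batch[0]

    def __get_train_losses__(self, preds, batch, *args, **kwargs):
        return {'total_loss': abs(preds - batch[1])}

    def __get_val_batch_preds__(self, batch, *args, **kwargs):
        return 2.0 * batch[0]

    def __get_val_losses__(self, preds, batch, *args, **kwargs):
        return {'total_loss': abs(preds - batch[1])}

    def __if_stop_trainning__(self, *args, **kwargs):
        # Stop as soon as the validation loss no longer improves.
        if len(self.history) < 2:
            return False
        return self.history[-1][2] >= self.history[-2][2]


trainer = ToyTrainer(train_loader=[(1.0, 2.0), (2.0, 4.5)],
                     val_loader=[(3.0, 6.0)], epochs=5)
trainer.__run__()
print(trainer.history)  # prints [(0, 0.25, 0.0), (1, 0.25, 0.0)]
```

Because the toy network never learns, the validation loss stays flat and the stop condition fires after the second epoch, well before the five-epoch limit.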
PS: If you need to pass in custom arguments, you can do so in two ways:
- Pass additional_args (an object with custom attributes) when creating the CIFAR10Trainer object.
- As in Code-3 above, pass the arguments in the form "<argument_name>=<argument_value>" when calling a function, then read them inside the called function with kwargs["<argument_name>"].
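Both ways of passing custom arguments can be seen side by side in a small plain-Python sketch. The class `ArgsDemo`, its `_additional_args` attribute, and the `default_stop_lr` field are hypothetical names chosen for the illustration, and the list of learning rates stands in for what an optimizer would report after each epoch.

```python
from types import SimpleNamespace

class ArgsDemo:
    """Hypothetical trainer-like class that reads custom arguments two ways."""

    def __init__(self, additional_args):
        # Way 1: an object with custom attributes, stored at construction time.
        self._additional_args = additional_args

    def __run__(self, learning_rates, **kwargs):
        # Way 2: keyword arguments are forwarded unchanged to the hook.
        for epoch, lr in enumerate(learning_rates):
            if self.__if_stop_trainning__(lr=lr, **kwargs):
                return epoch  # epoch at which training stopped
        return len(learning_rates)

    def __if_stop_trainning__(self, *args, **kwargs):
        # Prefer the keyword argument; fall back to the stored attribute.
        floor = kwargs.get('stop_lr', self._additional_args.default_stop_lr)
        return kwargs['lr'] < floor

demo = ArgsDemo(additional_args=SimpleNamespace(default_stop_lr=1e-3))
schedule = [1e-2, 1e-3, 1e-4, 1e-5]
print(demo.__run__(schedule))                # prints 2
print(demo.__run__(schedule, stop_lr=1e-4))  # prints 3
```

An argparse namespace, as passed to additional_args in Code-3, works the same way as the SimpleNamespace used here, since both expose their values as attributes.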
6. How to export the compressed/retrained model to Venus for deployment on supported platforms (Windows/Android/iOS/Linux)?
The compressed network (an object of class ReconstructedNetwork) can be saved to a .pkl file and to a .vnmodel file using the following methods of class ReconstructedNetwork:
- save(pkl="./path_to_file.pkl"): Saves a ReconstructedNetwork to a .pkl file.
- save_to_vnmodel(self, *inps, **wargs): Exports a ReconstructedNetwork to a .vnmodel file. A .vnmodel file can be read by the Venus netloader.