This is a TensorFlow implementation of the Tensor Train compression method for neural networks. It supports the TT-FC layer [1] and the TT-conv layer [2], which act as fully-connected and convolutional layers respectively, but are much more compact. The TT-FC layer is also faster than its uncompressed analog and allows using hundreds of thousands of hidden units. The experiments folder contains the code to reproduce the experiments from the papers.
[1] Tensorizing Neural Networks
Alexander Novikov, Dmitry Podoprikhin, Anton Osokin, Dmitry Vetrov; In Advances in Neural Information Processing Systems 28 (NIPS-2015) [arXiv].
[2] Ultimate tensorization: compressing convolutional and FC layers alike
Timur Garipov, Dmitry Podoprikhin, Alexander Novikov, Dmitry Vetrov; Learning with Tensors: Why Now and How?, NIPS-2016 workshop [arXiv].
Please cite our work if you write a scientific paper using this code.
In BibTeX format:
@incollection{novikov15tensornet,
author = {Novikov, Alexander and Podoprikhin, Dmitry and Osokin, Anton and Vetrov, Dmitry},
title = {Tensorizing Neural Networks},
booktitle = {Advances in Neural Information Processing Systems 28 (NIPS)},
year = {2015},
}
@article{garipov16ttconv,
author = {Garipov, Timur and Podoprikhin, Dmitry and Novikov, Alexander and Vetrov, Dmitry},
title = {Ultimate tensorization: compressing convolutional and {FC} layers alike},
journal = {arXiv preprint arXiv:1611.03214},
year = {2016}
}
- TensorFlow (tested with v. 1.1.0)
- NumPy
We also published a MATLAB and Theano+Lasagne implementation in a separate repository.
It's just a synonym for a multidimensional array. For example, a matrix is a 2-dimensional tensor.
Good point. Actually, the Tensor Train format coincides with the matrix low-rank format when applied to matrices. For this reason, there is a special matrix Tensor Train format, which basically does two things: it reshapes the matrix into a tensor (say, a 10-dimensional one) and permutes its dimensions in a special way; then it applies the tensor decomposition to the resulting tensor. This approach proved to be more efficient than the matrix low-rank format for the weight matrix of the fully-connected layer.
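The two steps above can be sketched in plain NumPy. This is only an illustration, not the code used in this repository: the function names `matrix_to_tt` and `tt_to_matrix`, the choice of mode sizes, and the `max_rank` parameter are all assumptions made for the example, and the decomposition is done with sequential truncated SVDs (the TT-SVD procedure from the Oseledets paper), whereas the TT layers are normally trained directly in the TT format.

```python
import numpy as np


def matrix_to_tt(W, row_modes, col_modes, max_rank):
    """Hypothetical helper: split matrix W into TT-matrix cores.

    Core k has shape (r_{k-1}, m_k, n_k, r_k) with r_0 = r_d = 1.
    """
    d = len(row_modes)
    assert len(col_modes) == d
    assert W.shape == (int(np.prod(row_modes)), int(np.prod(col_modes)))

    # Step 1: reshape the matrix into a 2d-dimensional tensor and permute
    # its axes so that the k-th row mode sits next to the k-th column mode.
    T = W.reshape(list(row_modes) + list(col_modes))
    T = T.transpose([axis for k in range(d) for axis in (k, d + k)])
    modes = [m * n for m, n in zip(row_modes, col_modes)]

    # Step 2: decompose the tensor with sequential truncated SVDs.
    cores = []
    rank = 1
    rest = T.reshape(rank * modes[0], -1)
    for k in range(d - 1):
        U, S, Vt = np.linalg.svd(rest, full_matrices=False)
        new_rank = min(max_rank, len(S))
        cores.append(
            U[:, :new_rank].reshape(rank, row_modes[k], col_modes[k], new_rank))
        # Carry the singular values over to the remaining part.
        rest = S[:new_rank, None] * Vt[:new_rank]
        rank = new_rank
        rest = rest.reshape(rank * modes[k + 1], -1)
    cores.append(rest.reshape(rank, row_modes[-1], col_modes[-1], 1))
    return cores


def tt_to_matrix(cores):
    """Hypothetical helper: multiply the cores back into a full matrix."""
    d = len(cores)
    result = cores[0]
    for core in cores[1:]:
        # Contract the trailing rank axis with the next core's leading one.
        result = np.tensordot(result, core, axes=1)
    result = result[0, ..., 0]  # drop the boundary ranks r_0 = r_d = 1
    # Axes are now (m_1, n_1, ..., m_d, n_d); move the row modes to the front.
    result = result.transpose(list(range(0, 2 * d, 2)) + list(range(1, 2 * d, 2)))
    rows = int(np.prod([c.shape[1] for c in cores]))
    cols = int(np.prod([c.shape[2] for c in cores]))
    return result.reshape(rows, cols)


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
cores = matrix_to_tt(W, row_modes=(4, 4, 4), col_modes=(4, 4, 4), max_rank=4)
n_tt_params = sum(core.size for core in cores)
print(W.size, n_tt_params)
```

In this toy setting the 64 x 64 matrix is stored in three small 4-dimensional cores instead of one dense array. For a random matrix like the one above the truncated reconstruction is of course only a rough approximation; the sketch is meant to show the reshape, permute, and decompose structure rather than compression quality.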
Look at the original paper: Ivan Oseledets, Tensor-Train decomposition, 2011 [pdf]. You can also check out my (Alexander Novikov's) slides, from slide 3 to 14.
By the way, "train" here means an actual train, with wheels. The name comes from pictures like the one below, which illustrate the Tensor Train format and naturally look like a train (at least they say so).
Unfortunately not (at least not yet).
I want to implement this in Caffe (or other library without autodiff). Any tips on doing the backward pass?
Great! Write to us when you're done or if you have questions along the way.
The MATLAB version of the code has a backward pass implementation for the TT-FC layer. But note that the forward pass is implemented differently in the MATLAB and TensorFlow versions.
We haven't, but this paper uses CP-decomposition to compress the kernel of the convolutional layer: Lebedev V., Ganin Y. et al., Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition [arXiv] [code]. They got nice compression results, but were not able to train CP-conv layers from scratch. They could only train a network with regular convolutional layers, represent those layers in the CP-format, and then fine-tune the rest of the network. Even fine-tuning a CP-conv layer often diverges.