Keras + Tensorpack

Use Keras to define a model and train it with efficient tensorpack trainers.


Keras alone has various overheads and is particularly inefficient with large models. The article "Towards Efficient Multi-GPU Training in Keras with TensorFlow" discusses some of them.

Even on a single GPU, tensorpack can run 1.2x to 2x faster than the equivalent Keras code, and the gap grows as you scale to multiple GPUs. Tensorpack and Horovod are the only two tools I know of that can scale the training of a large Keras model.

Simple Examples:

  • A simple MNIST model written mostly in tensorpack style, but using a Keras model as a symbolic function.
  • The same MNIST model written in Keras style.

ImageNet Example: reproduces exactly the same setting as the tensorpack ResNet example on ImageNet. It has:

  • ResNet-50 model modified from keras.applications. (We put the stride on the 3x3 conv in each bottleneck, which differs from some other implementations.)
  • Multi-GPU data-parallel training and validation that scale:
    • Finished 100 epochs in 19 hours on 8 V100s, with >90% GPU utilization.
    • Still slightly slower than native tensorpack examples.
  • Good accuracy (same as the tensorpack ResNet example).
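
To put the "100 epochs in 19 hours on 8 V100s" figure in perspective, here is a rough back-of-the-envelope calculation of the average throughput it implies. It assumes that "ImageNet" means the ILSVRC-2012 training split (1,281,167 images), which this README does not state explicitly. The 19 hours presumably also cover validation, so the result is only a lower bound on pure training speed, at roughly 1,870 images/s overall.

```python
# Assumption (not stated in this README): the dataset is the ILSVRC-2012
# training split, which contains 1,281,167 images.
TRAIN_IMAGES = 1_281_167
EPOCHS = 100      # from the list above
HOURS = 19        # from the list above
NUM_GPUS = 8      # from the list above


def average_throughput(images, epochs, hours, gpus):
    """Return (total images/s, images/s per GPU), averaged over the whole run."""
    total = images * epochs / (hours * 3600)
    return total, total / gpus


total, per_gpu = average_throughput(TRAIN_IMAGES, EPOCHS, HOURS, NUM_GPUS)
print(f"{total:.0f} images/s in total, {per_gpu:.0f} images/s per GPU")
```

Measuring your own run the same way gives a quick sanity check on whether the multi-GPU setup scales as expected.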


Keras does not respect variable scopes or variable collections, which conflicts with how tensorpack trainers work. Keras support is therefore experimental.

These simple examples run smoothly within tensorpack, but note that a future version of Keras may break them (though this is unlikely).