
# Benchmark CNN with other TF high-level APIs

Tensorpack is 1.2x~5x faster than the equivalent code written in some other TF high-level APIs.

Benchmark settings:

* Hardware: AWS p3.16xlarge (8 Tesla V100s).
* Software: Python 3.6, TF 1.6.0, CUDA 9, cuDNN 7.0.5, Keras 2.1.5, tflearn 0.3.2, tensorpack 0.8.3.
* Measurement: speed is measured in images per second (larger is better). The first epoch is warmup and is excluded from timing; the second and later epochs show no statistically significant difference from each other.
* Data:
  * Real data for CIFAR-10.
  * For ImageNet, data is assumed to be a constant numpy array already available in CPU memory. This is a reasonable setting, because the data always has to arrive in CPU memory from somewhere anyway.
* Source code is here. Each script is under 100 lines, so you can easily run and verify the results yourself.
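The measurement protocol above (images per second, first epoch discarded as warmup) can be sketched in plain Python. The function names and the example epoch times below are hypothetical, chosen only to illustrate the protocol:

```python
def epoch_throughputs(epoch_times, images_per_epoch):
    """Convert per-epoch wall-clock times (seconds) to images/sec."""
    return [images_per_epoch / t for t in epoch_times]

def benchmark_speed(epoch_times, images_per_epoch):
    """Mean images/sec over all epochs except the first (warmup)."""
    speeds = epoch_throughputs(epoch_times[1:], images_per_epoch)
    return sum(speeds) / len(speeds)

# Example: 3 epochs over the 50000 CIFAR-10 training images;
# the slow first epoch (graph building, caches warming) is ignored.
times = [20.0, 6.6, 6.7]
print(round(benchmark_speed(times, 50000)))  # prints 7519
```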

## On a Single GPU

| Task | tensorpack | Keras | tflearn |
| --- | --- | --- | --- |
| Keras Official Cifar10 Example | 7507 | 3448 | 3967 |
| VGG16 on fake ImageNet | 226 | 188 | 114 |
| AlexNet on fake ImageNet | 2633 | 1280 | N/A |
| ResNet50 on fake ImageNet | 318 | 230 | N/A |

## Data Parallel on 8 GPUs

Each script needs only a one-line change to set the number of GPUs.

| | 1 GPU | 2 GPUs | 8 GPUs |
| --- | --- | --- | --- |
| tensorpack+ResNet50 | 318 | 582 | 2177 |
| Keras+ResNet50 | 230 | 291 | 376 |
| tensorpack+VGG16 | 226 | 438 | 1471 |
| Keras+VGG16 | 188 | 320 | 501 |
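The gap in the table is easier to see as scaling efficiency: the fraction of ideal linear speedup achieved when going from 1 GPU to N GPUs. A minimal sketch (the helper name is hypothetical; the numbers are taken from the table above):

```python
def scaling_efficiency(single_gpu_speed, multi_gpu_speed, num_gpus):
    """Fraction of ideal linear speedup achieved on num_gpus GPUs."""
    return multi_gpu_speed / (single_gpu_speed * num_gpus)

# tensorpack+ResNet50: 318 im/s on 1 GPU, 2177 im/s on 8 GPUs -> ~86% of linear
print(round(scaling_efficiency(318, 2177, 8), 2))
# Keras+ResNet50: 230 im/s on 1 GPU, 376 im/s on 8 GPUs -> ~20% of linear
print(round(scaling_efficiency(230, 376, 8), 2))
```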


  1. With a better setting (different batch sizes, etc.) in ../ResNet-MultiGPU/, tensorpack can further reach 2600 im/s for ResNet50 on a p3.16xlarge instance.

  2. It's possible to make Keras faster (with a better input pipeline, by building the data-parallel graph yourself, etc.), but that is NOT how most users use Keras, nor how any of the Keras examples are written.

     Using Keras together with tensorpack is one way to make Keras faster. See the Keras+Tensorpack example.
