
tf_cnn_benchmarks: High performance benchmarks

tf_cnn_benchmarks contains implementations of several popular convolutional models and is designed to be as fast as possible. It supports running both on a single machine and in distributed mode across multiple hosts. See the High-Performance Models guide for more information.
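
For example, a minimal two-host distributed run with one parameter server and one worker might look like the following sketch. The host names and ports are placeholders, and while --job_name, --task_index, --ps_hosts, and --worker_hosts are the standard distributed flags for this script, check python tf_cnn_benchmarks.py --help to confirm them for your version:

# On host1, start the parameter server (host:port values are placeholders):
python tf_cnn_benchmarks.py --job_name=ps --task_index=0 \
--ps_hosts=host1:50000 --worker_hosts=host2:50001 \
--model=resnet50 --variable_update=parameter_server

# On host2, start the worker with the same cluster description:
python tf_cnn_benchmarks.py --job_name=worker --task_index=0 \
--ps_hosts=host1:50000 --worker_hosts=host2:50001 \
--num_gpus=1 --batch_size=32 --model=resnet50 --variable_update=parameter_server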

These models utilize many of the strategies in the TensorFlow Performance Guide. Benchmark results can be found here.

These models are designed for performance. For models that have clean and easy-to-read implementations, see the TensorFlow Official Models.

Getting Started

To run ResNet-50 on a single GPU with synthetic data and no distortions, run:

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet50 --variable_update=parameter_server

Note that the master branch of tf_cnn_benchmarks requires the latest nightly version of TensorFlow. You can install the nightly version by running pip install tf-nightly-gpu in a clean environment, or by installing TensorFlow from source. We occasionally create a branch of tf_cnn_benchmarks, named cnn_tf_vX.Y_compatible, that is compatible with TensorFlow version X.Y. For example, branch cnn_tf_v1.9_compatible works with TensorFlow 1.9.
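
For instance, to pin both the code and TensorFlow to the 1.9 release, the steps might look like this (a sketch assuming the script lives under scripts/tf_cnn_benchmarks in the tensorflow/benchmarks GitHub repository, which is where it is hosted at the time of writing):

# Install the matching TensorFlow release, then check out the compatible branch.
pip install tensorflow-gpu==1.9.0
git clone https://github.com/tensorflow/benchmarks.git
cd benchmarks
git checkout cnn_tf_v1.9_compatible
cd scripts/tf_cnn_benchmarks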

Some important flags are:

  • model: Model to use, e.g. resnet50, inception3, vgg16, or alexnet.
  • num_gpus: Number of GPUs to use.
  • data_dir: Path to the data to process. If not set, synthetic data is used. To use ImageNet data, use these instructions as a starting point. A combined example follows this list.
  • batch_size: Batch size for each GPU.
  • variable_update: The method for managing variables: parameter_server, replicated, distributed_replicated, or independent.
  • local_parameter_device: Device to use as the parameter server: cpu or gpu.
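
As an illustrative sketch combining several of these flags (the flag values and the data path are placeholders, not recommendations), a multi-GPU run on real data might look like:

python tf_cnn_benchmarks.py --num_gpus=4 --batch_size=64 --model=inception3 \
--variable_update=replicated --local_parameter_device=gpu \
--data_dir=/path/to/imagenet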

To see the full list of flags, run python tf_cnn_benchmarks.py --help.

To run ResNet-50 with real data on 8 GPUs, run:

python tf_cnn_benchmarks.py --data_format=NCHW --batch_size=256 \
--model=resnet50 --optimizer=momentum --variable_update=replicated \
--nodistortions --gradient_repacking=8 --num_gpus=8 \
--num_epochs=90 --weight_decay=1e-4 --data_dir=${DATA_DIR} --use_fp16 \
--train_dir=${CKPT_DIR}

This will train a ResNet-50 model on ImageNet with a total batch size of 2048 (256 per GPU across 8 GPUs). The model should train to around 76% accuracy.
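
Once training finishes, the checkpoint in ${CKPT_DIR} can be evaluated on the validation data. tf_cnn_benchmarks has an --eval flag for this; the invocation below is a sketch, so confirm the exact flags with python tf_cnn_benchmarks.py --help:

# Evaluate the trained checkpoint instead of training.
python tf_cnn_benchmarks.py --eval --model=resnet50 --num_gpus=8 \
--batch_size=256 --data_dir=${DATA_DIR} --train_dir=${CKPT_DIR}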

Running the tests

To run the tests, run

pip install portpicker
python run_tests.py && python run_tests.py --run_distributed_tests

Note that the tests require portpicker.

The command above runs a subset of tests that is both fast and fairly comprehensive. Alternatively, all the tests can be run, but this will take a long time:

python run_tests.py --full_tests && python run_tests.py --full_tests --run_distributed_tests

All tests are run on every PR before it is merged, so it is not necessary to pass --full_tests when running the tests yourself.

To run an individual test, such as the method testParameterServer of the test class TfCnnBenchmarksTest in the module benchmark_cnn_test, run:

python -m unittest -v benchmark_cnn_test.TfCnnBenchmarksTest.testParameterServer