
No performance improvement at batch size 128? #30

Closed
zhaoerchao opened this issue Jun 12, 2017 · 5 comments

Comments

@zhaoerchao

I ran the script as follows:

python tf_cnn_benchmarks.py --local_parameter_device=gpu --num_gpus=4 --batch_size=128 --model=resnet50 --variable_update=replicated --nodistortions --nccl True --trace_file ~/timeline.json

But there is no improvement at all; the speed is the same as with batch size 64:

Step Img/sec loss
1 images/sec: 725.2 +/- 0.0 (jitter = 0.0) 7.463
10 images/sec: 736.7 +/- 1.4 (jitter = 2.9) 7.180
20 images/sec: 731.7 +/- 2.4 (jitter = 6.7) 7.048
30 images/sec: 723.7 +/- 2.6 (jitter = 19.3) 6.971
40 images/sec: 719.0 +/- 2.4 (jitter = 15.9) 6.929
50 images/sec: 716.0 +/- 2.1 (jitter = 12.2) 6.898

What's the reason? And why does the speed keep getting slower as the step count increases?

@ekelsen

ekelsen commented Jun 13, 2017

The underlying convolution routines won't get any faster when the batch size goes from 64 to 128, so it isn't surprising that the overall training doesn't speed up either.
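A quick way to see this for yourself is to time a single convolution at both batch sizes and compare images/sec rather than step time. This is my own illustrative sketch, not part of tf_cnn_benchmarks; it is written against the TF 2.x API rather than the TF 1.x build used above, and the layer shape and iteration count are arbitrary choices.

```python
# Time one ResNet-style conv layer at batch 64 and 128 and compare images/sec.
# If the GPU is already saturated at batch 64, images/sec stays roughly flat.
import time
import tensorflow as tf

conv = tf.keras.layers.Conv2D(64, 3, padding="same")

@tf.function
def step(x):
    return conv(x)

for batch in (64, 128):
    x = tf.random.normal([batch, 56, 56, 64])
    step(x)  # warm-up: builds the layer and traces the graph for this shape
    iters = 20
    start = time.time()
    for _ in range(iters):
        y = step(x)
    _ = y.numpy()  # block until the queued GPU work has finished
    elapsed = time.time() - start
    print(f"batch {batch}: {batch * iters / elapsed:.1f} images/sec")
```

If the step time roughly doubles when the batch doubles, images/sec (and therefore overall training throughput) stays where it was.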

@ilovechai

@zhaoerchao @ekelsen These benchmark programs report, for example, around 570 images/sec, whereas running the same model normally gives about half of what the benchmark programs report. Why is that?

@zhaoerchao (Author)

@cryptox31 Did you run the program on the same GPU with the same version of TF?

@ilovechai

@zhaoerchao I am currently running the Inception v3 model with TensorFlow 1.0.1 on 4 Tesla P100 GPUs, and I am not getting optimal results.

@tfboyd (Member)

tfboyd commented Jul 25, 2017

Increasing the batch size will not always increase performance. I am far from an expert, but from my testing, each model and hardware combination has a point where, even with memory to spare, increasing the batch size no longer helps. Increasing the batch size normally helps when the step time is very fast: the larger batch slows the step down just enough to hide the transfer times and other calculations that are impossible to "hide" behind a very fast step. I know that is not a very technical explanation.

One good example is AlexNet: "everyone" runs it at a batch size of 512 or more now, but it used to be run with much smaller batches. I have not been working with ML very long, but if you test AlexNet with 32, 128, 256, and then 512 on most ML platforms you will see a significant speedup as the batch size increases; if I remember correctly, even more so on multi-GPU. (A rough batch-size sweep along these lines is sketched after this comment.)

Finally, the goal is normally to converge to the best possible top_1 accuracy. I know people are training ResNet with large total batch sizes, but I have not seen anyone training with 128 per GPU. Of course, with so much happening in the field, it has likely been done and I just did not see it.

Closing, as this is more or less expected. If you are seeing unexpected results with batch size 64 or 32, please let me know and I will see if I can figure it out.
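For reference, here is a rough way to observe the effect described above. This is my own illustrative sketch, not the benchmark script: the toy model, image size, epoch counts, and batch sizes are arbitrary placeholders, and the absolute numbers depend entirely on your hardware.

```python
# Sweep batch sizes on a tiny CNN and report training images/sec.
# On a fast GPU, small batches leave the device under-utilised; throughput
# grows with batch size until the GPU saturates, after which it flattens out.
import time
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(64, 7, strides=2, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer="sgd",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

num_images = 2048  # same synthetic dataset for every batch size
x = np.random.rand(num_images, 64, 64, 3).astype("float32")
y = np.random.randint(0, 10, size=num_images)

for batch in (32, 64, 128, 256, 512):
    model.fit(x, y, batch_size=batch, epochs=1, verbose=0)  # warm-up
    start = time.time()
    model.fit(x, y, batch_size=batch, epochs=3, verbose=0)
    elapsed = time.time() - start
    print(f"batch {batch}: {3 * num_images / elapsed:.1f} images/sec")
```

Where the images/sec curve flattens is the point at which further increases in batch size stop helping, which matches the behaviour reported in this issue for ResNet-50 at 64 vs 128 per GPU.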

tfboyd closed this as completed Jul 25, 2017
freedomtan pushed a commit to freedomtan/benchmarks that referenced this issue Apr 18, 2018
Merge internal changes into public repository (change 168924045)
shengfuintel pushed a commit to Intel-tensorflow/benchmarks that referenced this issue May 23, 2018
Adding modules/functions common to Q2 POR development