Performance regression when training MaskRCNN with CUDA 11 #1458
Comments
Thanks for reporting. I have not had a chance to use cuda 11, but here are some things that can help root-cause it:
Maybe related: https://github.com/tensorpack/benchmarks/tree/master/ResNet-MultiGPU can be used to check whether there is a regression in a plain ResNet.
Quick update: I saw a ~15% slowdown even in the 1-GPU setting. I haven't had the chance to do any detailed profiling. Were you able to reproduce the slowdown?
Maybe I'll be able to test cuda 11 on a 1080 soon, but it will take a long time before I'm able to access a cuda11-capable V100 machine. I guess you were using cuda 10 with cudnn7 but cuda 11 with cudnn8? It might help to try cudnn 8 with cuda 10 as well, since nvidia provides such a combination. Presumably, cuda and cudnn are the only two variables between the two settings.
Yes, I also tested https://github.com/tensorpack/benchmarks/tree/master/ResNet-MultiGPU and CUDA 11 + CUDNN 8 shows slightly better performance than CUDA 10 + CUDNN 7. This might suggest that the issue is related to some ops in Mask R-CNN.
The two models (plain ResNet and Mask R-CNN) are very different in the way they use cudnn. For Mask R-CNN we use The easiest way to verify this might be to enable tuning with
diff --git i/examples/FasterRCNN/data.py w/examples/FasterRCNN/data.py
index 35d8bd4f..eefe193f 100644
--- i/examples/FasterRCNN/data.py
+++ w/examples/FasterRCNN/data.py
@@ -73,7 +73,8 @@ class TrainingDataPreprocessor:
     def __init__(self, cfg):
         self.cfg = cfg
         self.aug = imgaug.AugmentorList([
-            CustomResize(cfg.PREPROC.TRAIN_SHORT_EDGE_SIZE, cfg.PREPROC.MAX_SIZE),
+            #CustomResize(cfg.PREPROC.TRAIN_SHORT_EDGE_SIZE, cfg.PREPROC.MAX_SIZE),
+            imgaug.Resize((800, 800)),
             imgaug.Flip(horiz=True)
         ])
I often do this when I need to benchmark the full power of the GPUs. Then benchmark the two environments, both with cudnn tuning enabled, and see if they give similar speed. This should rule out differences in algorithm heuristics between cudnn versions, if any.
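The comparison suggested here can be scripted. Below is a minimal sketch, not a tested benchmark harness. It assumes autotuning is toggled through TensorFlow's `TF_CUDNN_USE_AUTOTUNE` environment variable. The helper names `autotune_env` and `benchmark_matrix` and the stack labels are made up for illustration, and the actual training command is left as a placeholder.

```python
import os
from itertools import product


def autotune_env(enabled, base_env=None):
    """Return a copy of the environment with cuDNN autotune switched on or off.

    Assumption: TensorFlow reads TF_CUDNN_USE_AUTOTUNE ("1" or "0") at startup.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TF_CUDNN_USE_AUTOTUNE"] = "1" if enabled else "0"
    return env


def benchmark_matrix(stacks, autotune_options=(True, False)):
    """List every (stack, autotune) combination that should be benchmarked."""
    return [{"stack": s, "autotune": a} for s, a in product(stacks, autotune_options)]


# Hypothetical labels for the two installations being compared.
runs = benchmark_matrix(["cuda10+cudnn7", "cuda11+cudnn8"])
for run in runs:
    env = autotune_env(run["autotune"])
    # Launch training inside the matching installation here, for example
    # subprocess.run(TRAIN_CMD, env=env, check=True), where TRAIN_CMD is a placeholder.
    print(run["stack"], "TF_CUDNN_USE_AUTOTUNE=" + env["TF_CUDNN_USE_AUTOTUNE"])
```

If the two stacks match with autotune on but differ with it off, the gap most likely comes from cudnn's default algorithm choice rather than from the kernels themselves.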
I got access to a machine with a new enough nvidia driver for cuda 11; however, apparently TF 1.15 cannot be built with cuda 11 / cudnn 8: the support was added later at tensorflow/tensorflow@28feb4d, tensorflow/tensorflow@255f590, etc. How did you use TF 1.15 with cuda11/cudnn8? Is there a version maintained elsewhere?
I can reproduce the regression with TF2.3 on 1 GTX1080Ti. The regression comes from cudnn8: cudnn8+cuda10.2 and cudnn8+cuda11 are equally slow, while cudnn7+cuda10.2 is faster. The regression only appears when cudnn autotune is disabled. If I apply the above patch, use So it seems cudnn8 changed some algorithm selection heuristics that affect some convolution shapes used in this R-CNN.
Thanks for the update! Confirmed that Re: TF 1.15 with cuda11, I used the
cuDNNv8 deprecated the old algorithm selection APIs (tensorflow/tensorflow@255f590#diff-3ddecd9a9809669183ca2750a865f73a) and the new API seems to have a regression.
It seems that training MaskRCNN with CUDA 11 is underperforming. However, I did not see this issue when training other models without tensorpack (e.g. the official ResNet). I am not sure whether this is a tensorpack issue or not (and if so, what the root cause would be).
1. What you did:
(1) If you're using examples, what's the command you run:
(2) If you're using examples, have you made any changes to the examples? Paste git status; git diff here: N/A
2. What you observed:
The training throughput is lower than CUDA 10. For 8 * 8 V100 servers, the throughput with CUDA 11 was ~180 samples/sec, while the throughput with CUDA 10 was ~270 samples/sec.
3. What you expected, if not obvious.
Higher throughput expected.
4. Your environment:
Paste the output of this command:
python -c 'import tensorpack.tfutils as u; print(u.collect_env_info())'
If this command failed, tell us your version of Python/TF/tensorpack.
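For reference, the throughput numbers reported under "What you observed" can be turned into a relative slowdown. This is only a small arithmetic sketch using the two figures quoted there; the function name `relative_slowdown` is made up for illustration.

```python
def relative_slowdown(baseline, observed):
    """Fraction of baseline throughput that was lost (0.0 means no regression)."""
    return (baseline - observed) / baseline


cuda10_throughput = 270.0  # samples/sec, as reported for CUDA 10
cuda11_throughput = 180.0  # samples/sec, as reported for CUDA 11

drop = relative_slowdown(cuda10_throughput, cuda11_throughput)
step_time_ratio = cuda10_throughput / cuda11_throughput

print(f"throughput drop: {drop:.1%}")                 # 33.3%
print(f"time per sample ratio: {step_time_ratio:.2f}x")  # 1.50x
```

So the multi-node numbers correspond to roughly a one-third drop in throughput, noticeably larger than the ~15% seen on a single GPU.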