TensorFlow 1.14 StyleGAN2 performance issue on RTX 3090 (multiple GPUs) #44200
Can you please share standalone code to reproduce the issue in our environment? It helps us localize the issue faster. Thanks!
@ravikyram check the code here:- https://github.com/NVlabs/stylegan2
Are you building TensorFlow from source against those CUDA versions (10.2 and 11.1)? On a side note, TF 1.14 is out of the support window; you may want to try a recent TF version such as 2.3, which offers much better performance. Thanks!
@ymodak As mentioned on the StyleGAN2 GitHub page (https://github.com/NVlabs/stylegan2), it is compatible with TF 1.14 and 1.15 only, and 1.15 does not work with CUDA 11.1. I am using an NVIDIA RTX 3090, an Ampere GPU, with CUDA 11.1, so one possible cause could be the one mentioned in https://www.tensorflow.org/install/gpu. Can you please confirm whether this is the main problem and, if so, suggest a solution?
You should switch to TF 2.x. We no longer fix code on TF 1.x.
@Thunder003 That's correct. Your configuration is not using the GPU's computing power due to incompatible CUDA versions. We do not provide TF binaries that support CUDA 11.1 at the moment. On a side note - Current
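To make the compatibility point concrete: a CUDA toolkit can only generate native GPU code for compute capabilities it knows about, and the RTX 3090 is compute capability 8.6, which first appeared in CUDA 11.1. The sketch below encodes that rule for a few well-known architectures; the table and helper names are hypothetical, for illustration only. A binary built with an older toolkit (TF 1.14's prebuilt wheels were built against CUDA 10.0) can run on such a GPU only if it embeds PTX that the driver JIT-compiles at start-up, which is one plausible explanation for the long start-up times reported in this thread.

```python
# Minimum CUDA toolkit version able to emit native code (SASS) for a
# given compute capability. Illustrative subset, not an exhaustive table.
MIN_CUDA_FOR_SM = {
    (7, 0): (9, 0),    # Volta, e.g. V100
    (7, 5): (10, 0),   # Turing, e.g. RTX 2080 Ti
    (8, 0): (11, 0),   # Ampere, e.g. A100
    (8, 6): (11, 1),   # Ampere, e.g. RTX 3090
}


def parse_version(text):
    """Turn a string such as '11.1' or '11.1.1' into (11, 1)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def supports_natively(cuda_version, compute_capability):
    """Return True if this toolkit can compile native code for the GPU.

    False means a binary built with this toolkit can only run on the GPU
    through PTX, which the driver JIT-compiles when the program starts.
    """
    cc = parse_version(compute_capability)
    if cc not in MIN_CUDA_FOR_SM:
        raise ValueError(f"unknown compute capability {compute_capability}")
    return parse_version(cuda_version) >= MIN_CUDA_FOR_SM[cc]


if __name__ == "__main__":
    print(supports_natively("10.0", "8.6"))  # CUDA 10.0 build on RTX 3090
    print(supports_natively("11.1", "8.6"))  # CUDA 11.1 build on RTX 3090
    print(supports_natively("10.0", "7.5"))  # CUDA 10.0 build on RTX 2080 Ti
```

Comparing version tuples rather than floats avoids surprises such as `11.10` sorting below `11.2`.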
Thanks, @ymodak, for your answer. I'm trying to build from source but am getting an error: Could not find any cudnn.h matching version '8' in any subdirectory: I have checked that cuDNN is properly installed at /usr/include/cudnn.h. Following this, I have copied a cudnn.h file to /usr/local/cuda/ and the libcudnn* cuDNN installation files to /usr/local/cuda. Can you please suggest a solution? I am building TF 1.14 with CUDA 11.1 and cuDNN 8.0.4.
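One plausible cause of this error, assuming TF 1.14's configure step looks for the `CUDNN_MAJOR` macro in `cudnn.h`: starting with cuDNN 8, the version macros (`CUDNN_MAJOR`, `CUDNN_MINOR`, `CUDNN_PATCHLEVEL`) moved into a separate `cudnn_version.h`, so a check that reads only `cudnn.h` never sees version 8, even though the file is present. The hypothetical helper below (not TensorFlow's own configure code) reads the version from whichever header defines it; the `/usr/include` path is only an example.

```python
import os
import re

# Matches lines such as "#define CUDNN_MAJOR 8".
_DEFINE = re.compile(
    r"^\s*#\s*define\s+(CUDNN_MAJOR|CUDNN_MINOR|CUDNN_PATCHLEVEL)\s+(\d+)",
    re.M,
)


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from header text, or None if absent."""
    found = dict(_DEFINE.findall(header_text))
    try:
        return tuple(
            int(found[key])
            for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL")
        )
    except KeyError:
        return None


def find_cudnn_version(include_dir):
    """Check cudnn_version.h (cuDNN 8+) first, then cudnn.h (cuDNN 7 and older).

    Returns (header_path, version_tuple), or None if no header defines it.
    """
    for name in ("cudnn_version.h", "cudnn.h"):
        path = os.path.join(include_dir, name)
        if os.path.isfile(path):
            with open(path) as f:
                version = parse_cudnn_version(f.read())
            if version is not None:
                return path, version
    return None


if __name__ == "__main__":
    print(find_cudnn_version("/usr/include"))
```

If this reports the version from `cudnn_version.h` while `cudnn.h` alone yields nothing, copying `cudnn.h` to other directories will not help an older configure script. In that case the build tooling itself would need to know about the new header.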
Note that code might also need to change to support newer versions of CUDA. |
Oh, I have the same problem. I have one RTX 3090 with CUDA 11.1 on Windows 10. I used Conda to install cudatoolkit, which provides CUDA 10.1 and cuDNN 7.6, and I have TensorFlow 1.14 installed. TensorFlow recognizes my RTX 3090, but the training process takes a very long time to start or finish (I had to go to sleep ...). I wonder whether building TensorFlow 1.14 (yes, I need this old version) from source against CUDA 11 and cuDNN 8 would let me use the RTX 3090 properly. Thanks.
@mihaimaruseac, yes, it might need to be changed. To be sure, I'm trying to build from source, but I got stuck on another problem (as mentioned in the quote). Can you take a look at that? If you think it's off-topic, I can open another issue for it. @psycho2012 have you gotten any good images with TF 1.14 on the RTX 3090? I'm only getting black images with the RTX 3090, CUDA 11.1, and TF 1.14 (this is probably a compatibility issue between TF and the CUDA version). If you are getting good images with StyleGAN, can you tell me which versions of CUDA, cuDNN, and TF you are using with the RTX 3090?
@Thunder003 I didn't get any good results.
@psycho2012 can you tell me which versions of CUDA and cuDNN you used with TF 1.14 on the RTX 3090?
@Thunder003 CUDA 11.1 and cuDNN 8.0.4, but I failed to run TF 1.14 on the GPU. I also tried to build TF 1.14 from source, but that failed too. Based on these attempts, I think the current version of TF 1.14 cannot support CUDA 11.1.
@psycho2012 thanks for your answer. It seems TF 1.14 is incompatible with CUDA 11.1. One thing that puzzles me is the error I'm getting when building from source (TF 1.14, CUDA 11.1, cuDNN 8.0.4); please check the image. I have checked the paths and files, and they are correct. Are you getting stuck at the same step? I have also added more paths for the libcudnn files, but the error persists. If you are not stuck at this step, can you tell me the paths you provided, just for reference (I know they may differ)?
@Thunder003 I just tried on Windows but also failed at the configuration step. Maybe CUDA 11.1 is not supported by TF 1.14.
@psycho2012 thanks for your answer. I have confirmed from another source that TF 1.14 is incompatible with CUDA 11.1, and the error probably popped up because of that.
Can you tell me how you solved the problem? I have the same trouble.
@C-SJK For start-up time, you can increase the cache size, but you still may not get good results. I'm only getting black output images. I assume there is an incompatibility between TF 1.14 and CUDA 11.1.
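Assuming the cache meant here is the CUDA driver's JIT compilation cache (the comment does not say which cache), a minimal sketch: the driver reads `CUDA_CACHE_MAXSIZE` (in bytes) and `CUDA_CACHE_PATH` from the environment, so they must be set before TensorFlow initializes CUDA, for example at the top of the training script or in the shell. The helper name and the 2 GiB value are illustrative; CUDA 11-era documentation lists 4 GiB as the maximum cache size.

```python
import os

# Upper limit given in CUDA 11-era documentation (assumption: check the
# docs for your CUDA version).
MAX_JIT_CACHE_BYTES = 4 * 1024**3  # 4 GiB


def configure_cuda_jit_cache(max_bytes, cache_dir=None):
    """Hypothetical helper: set the CUDA JIT cache environment variables.

    Must run before TensorFlow (or anything else) initializes CUDA,
    because the driver reads these variables only at initialization.
    """
    if not 0 < max_bytes <= MAX_JIT_CACHE_BYTES:
        raise ValueError("max_bytes must be in (0, 4 GiB]")
    os.environ["CUDA_CACHE_MAXSIZE"] = str(max_bytes)
    # Make sure JIT caching has not been switched off elsewhere.
    os.environ.pop("CUDA_CACHE_DISABLE", None)
    if cache_dir is not None:
        os.makedirs(cache_dir, exist_ok=True)
        os.environ["CUDA_CACHE_PATH"] = cache_dir


if __name__ == "__main__":
    configure_cuda_jit_cache(2 * 1024**3)  # 2 GiB, illustrative value
    # import tensorflow as tf  # import TF only after the variables are set
```

The first run still pays the full compilation cost; later runs can only skip it if the compiled kernels fit in the cache. As noted above, this targets start-up time only and would not explain black output images.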
NGC, I know it. |
Thanks, I get it!
Did you manage to build TF 1.14 and maybe run StyleGAN2 on top of it?
Or, depending on that tweak:
As stylegan2 doesn't have an issue tracker, I've overloaded this issue onto a related git commit, NVlabs/stylegan2@23f8bed. There is a script we can run to upgrade stylegan2 to TensorFlow 2 automatically; however, we need to address the failed conversions of tf.contrib, such as tf.contrib.memory_stats.MaxBytesInUse. It would probably help if someone from TensorFlow could also guide us in getting this over the line. NVIDIA Labs doesn't want to support TensorFlow 2, but it seems push has come to shove here, unless we're holding out for a StyleGAN3 to drop. FYI @tkarras
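TensorFlow ships `tf_upgrade_v2` for this kind of conversion (presumably the script referred to), and it reports `tf.contrib` symbols it cannot convert. As a rough way to inventory those leftovers across a checkout, the hypothetical helper below lists every dotted `tf.contrib.*` reference with its line number. It is a plain text search, not a parser, so aliased imports (for example `from tensorflow.contrib import ...`) are missed. For `MaxBytesInUse` specifically, later TF 2.x releases offer `tf.config.experimental.get_memory_info`, which reports peak memory use; verify that it exists in the version you target.

```python
import re
from collections import Counter

# Dotted tf.contrib references, e.g. tf.contrib.memory_stats.MaxBytesInUse
_CONTRIB = re.compile(r"\btf\.contrib(?:\.\w+)+")


def find_contrib_uses(source):
    """Return [(line_number, symbol), ...] for each tf.contrib reference."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _CONTRIB.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


def summarize(paths):
    """Count tf.contrib symbols across several Python files."""
    counts = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            counts.update(sym for _, sym in find_contrib_uses(f.read()))
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python find_contrib.py path/to/*.py
    for symbol, n in summarize(sys.argv[1:]).most_common():
        print(f"{n:4d}  {symbol}")
```

Sorting by frequency shows which `tf.contrib` APIs need a replacement most urgently before the rest of the upgrade can proceed.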
I am running the StyleGAN2 model on 4x RTX 3090, and training takes much longer to start than on 1x RTX 3090. However, once training starts, it finishes earlier on 4x than on 1x. I am using CUDA 11.1 and TensorFlow 1.14 in both setups.
Secondly, on 1x RTX 2080 Ti with CUDA 10.2 and TensorFlow 1.14, training takes less time to start than on 1x RTX 3090 with CUDA 11.1 and TensorFlow 1.14. Roughly, for one of the datasets, training takes 5 minutes to start on 1x RTX 2080 Ti, 30-35 minutes on 1x RTX 3090, and 1.5 hours on 4x RTX 3090.
I'll be grateful if anyone can help me to resolve this issue.
I am using Ubuntu 16.04, a Core™ i9-10980XE CPU, and 32 GB of RAM on both the 2080 Ti and 3090 machines.