
TENSORFLOW 1.14 STYLEGAN 2 PERFORMANCE ISSUE ON RTX 3090 MULTIPLE GPU #44200

Closed
Thunder003 opened this issue Oct 21, 2020 · 22 comments
Assignees
Labels
comp:gpu GPU related issues TF 1.14 for issues seen with TF 1.14 type:performance Performance Issue

Comments

@Thunder003

Thunder003 commented Oct 21, 2020

I am running the StyleGAN 2 model on 4x RTX 3090, and training takes much longer to start up than on 1x RTX 3090. Once training does start, though, it finishes earlier on 4x than on 1x. I am using CUDA 11.1 and TensorFlow 1.14 with both GPU setups.

Secondly, with 1x RTX 2080 Ti (CUDA 10.2, TensorFlow 1.14), training starts in much less time than with 1x RTX 3090 (CUDA 11.1, TensorFlow 1.14). Roughly, it takes 5 minutes on 1x RTX 2080 Ti, 30-35 minutes on 1x RTX 3090, and 1.5 hours on 4x RTX 3090 to start training for one of the datasets.

I'll be grateful if anyone can help me resolve this issue.

I am using Ubuntu 16.04, a Core™ i9-10980XE CPU, and 32 GB RAM in both the 2080 Ti and 3090 machines.
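A plausible explanation for the slow start-up (an inference, not confirmed in this thread): TF 1.14 binaries contain no compiled kernels for Ampere (compute capability 8.6), so CUDA JIT-compiles everything from PTX at start-up, once per GPU process. A quick way to confirm the compute capability (the `compute_cap` query field assumes a fairly recent nvidia-smi; older drivers can use the CUDA `deviceQuery` sample instead):

```shell
# Print each GPU's name and compute capability; an RTX 3090 reports 8.6,
# which TF 1.14's pre-built CUDA kernels were never compiled for.
nvidia-smi --query-gpu=name,compute_cap --format=csv
```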

@Thunder003 Thunder003 added the type:performance Performance Issue label Oct 21, 2020
@ravikyram ravikyram added comp:gpu GPU related issues TF 1.14 for issues seen with TF 1.14 labels Oct 21, 2020
@ravikyram
Contributor

@Thunder003

Can you please share standalone code to reproduce the issue in our environment? It helps us localize the issue faster. Thanks!

@ravikyram ravikyram assigned ymodak and unassigned ravikyram Oct 21, 2020
@Thunder003
Author

@ravikyram check the code here: https://github.com/NVlabs/stylegan2


@Thunder003 Thunder003 changed the title TENSORFLOW 1.4 STYLEAGN 2 PERFORMANCE ISSUE ON RTX 3090ti MULTIPLE GPU TENSORFLOW 1.14 STYLEAGN 2 PERFORMANCE ISSUE ON RTX 3090ti MULTIPLE GPU Oct 21, 2020
@Thunder003 Thunder003 changed the title TENSORFLOW 1.14 STYLEAGN 2 PERFORMANCE ISSUE ON RTX 3090ti MULTIPLE GPU TENSORFLOW 1.14 STYLEAGN 2 PERFORMANCE ISSUE ON RTX 3090 MULTIPLE GPU Oct 22, 2020
@ymodak
Contributor

ymodak commented Oct 23, 2020

Are you building TensorFlow from source against those CUDA versions (10.2 and 11.1)?
If you are using pre-built pip packages, then I suspect your GPU is not being utilized for computing, because the TF (2.3) packages we currently ship support CUDA 10.1.
See tested build configurations to know more.

On a side note, TF 1.14 is out of the support window; you may want to try the latest TF versions, such as 2.3, which offer much better performance. Thanks!

@ymodak ymodak added the stat:awaiting response Status - Awaiting response from author label Oct 23, 2020
@Thunder003
Author

@ymodak as mentioned on the StyleGAN 2 GitHub page (https://github.com/NVlabs/stylegan2), it is compatible with TF 1.14 and 1.15 only, and 1.15 is not working with CUDA 11.1. Since I am using an NVIDIA RTX 3090, an Ampere GPU requiring CUDA 11.1, one possible cause is the version mismatch described at https://www.tensorflow.org/install/gpu. Can you please confirm whether this is the main problem, and if so, suggest a solution for it?

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Oct 26, 2020
@mihaimaruseac
Collaborator

You should switch to TF 2.x. We no longer fix code on TF 1.x.

@ymodak
Contributor

ymodak commented Oct 26, 2020

@Thunder003 That's correct. Your configuration is not using the GPU's computing power due to incompatible CUDA versions. We do not provide TF binaries that support CUDA 11.1 at the moment.
For this you may try building TF from source yourself.

On a side note, the current tf-nightly version supports CUDA 11.0.
If you are okay with an unstable version (tf-nightly), you can give it a try, or wait for the upcoming stable TF 2.4 release.
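A minimal sketch of trying the nightly route suggested above (package name as of late 2020; exact CUDA support varies by nightly, so verify against the release notes):

```shell
python -m pip install --upgrade pip
python -m pip install tf-nightly-gpu    # nightly wheels at the time were built against CUDA 11.0
python -c "import tensorflow as tf; print(tf.__version__, tf.test.is_built_with_cuda())"
```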

@ymodak ymodak added the stat:awaiting response Status - Awaiting response from author label Oct 26, 2020
@Thunder003
Author

Thanks, @ymodak, for your answer. I'm trying to build from source but getting this error:

Could not find any cudnn.h matching version '8' in any subdirectory:
''
'include'
'include/cuda'
'include/*-linux-gnu'
'extras/CUPTI/include'
'include/cuda/CUPTI'
of:
'/lib'
'/lib/x86_64-linux-gnu'
'/usr'
'/usr/include/'
'/usr/include/cudnn.h'
'/usr/local/cuda'
'/usr/local/cuda-11.1'
'/usr/local/cuda-11.1/targets/x86_64-linux/lib'
Asking for detailed CUDA configuration...

I have checked that cuDNN is properly installed at /usr/include/cudnn.h. Following this, I copy-pasted a cudnn.h file into /usr/local/cuda/ and the libcudnn* cuDNN installation files into /usr/local/cuda. Can you please tell me a solution for this?

I am building TF 1.14 with CUDA 11.1 & cuDNN 8.0.4
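For what it's worth, this exact configure failure is commonly reported with cuDNN 8: the CUDNN_MAJOR/MINOR/PATCHLEVEL defines moved from cudnn.h into a new cudnn_version.h, which TF 1.14's configure script does not read. A hedged sketch of the usual workaround (paths assume a Debian-style cuDNN install; adjust to where your headers actually live):

```shell
# TF 1.14's configure greps cudnn.h for the version macros, but cuDNN 8
# keeps them in cudnn_version.h - append them back where configure looks.
sudo sh -c 'grep -E "define CUDNN_(MAJOR|MINOR|PATCHLEVEL)" \
    /usr/include/cudnn_version.h >> /usr/include/cudnn.h'
export TF_CUDNN_VERSION=8
export CUDNN_INSTALL_PATH=/usr   # headers in /usr/include, libs in /usr/lib/x86_64-linux-gnu
./configure
```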

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Oct 31, 2020
@mihaimaruseac
Collaborator

Note that the code might also need to change to support newer versions of CUDA.

@psycho2012

psycho2012 commented Nov 3, 2020

Oh, I have the same problem.

I have one RTX 3090 with CUDA 11.1 on Windows 10. I used Conda to install cudatoolkit, which provides CUDA 10.1 and cuDNN 7.6, and I have TensorFlow 1.14 installed. TensorFlow recognized my RTX 3090 fine, but it spent a very long time to begin, or to finish (I had to go to sleep ...), the training process.

I wonder: if I build TensorFlow 1.14 (yes, I need this old version) from source against CUDA 11 and cuDNN 8, can I use the RTX 3090 properly? Thanks.

@Thunder003
Author

@mihaimaruseac, yes, the code might need to change. To make sure, I'm trying to build from source, but I got stuck on another problem (as mentioned in my earlier comment). Can you take a look at that? Or, if you think it's off-topic, I can raise another issue for it.

@psycho2012 have you gotten any good images with TF 1.14 on an RTX 3090? I'm getting only black images with an RTX 3090 running CUDA 11.1 and TF 1.14 (probably a compatibility issue between TF and the CUDA version). If you are getting good images with StyleGAN, can you tell me which versions of CUDA, cuDNN, and TF you are using with the RTX 3090?


@psycho2012

@Thunder003 I didn't get any good results.

@Thunder003
Author

@psycho2012 can you tell me the versions of CUDA and cuDNN you used with TF 1.14 on the RTX 3090?

@psycho2012

@Thunder003 CUDA 11.1 and cuDNN 8.0.4, but I failed to run TF 1.14 on the GPU. I also tried to build TF 1.14 from source, but that failed too. Based on my experiments, I think TF 1.14 cannot support CUDA 11.1.

@Thunder003
Author

Thunder003 commented Nov 5, 2020

@psycho2012 thanks for your answer. It seems TF 1.14 is incompatible with CUDA 11.1. One thing that has me scratching my head is the kind of error I'm getting when building from source (TF 1.14, CUDA 11.1, cuDNN 8.0.4); please check the image. I have checked the paths and files, and they are correct. Are you getting stuck at the same step?

(screenshot of the configure error)

I have also added more paths for the libcudnn files, but the error persists. If you are not stuck at this step, can you tell me the paths you provided, just for reference (I know they may differ)?

@psycho2012

@Thunder003 I just tried on Windows but also failed at the configuration step. Maybe TF 1.14 does not support CUDA 11.1.

@Thunder003
Author

@psycho2012 thanks for your answer. I have confirmed from another source as well that TF 1.14 is incompatible with CUDA 11.1, and the error probably popped up because of that.
I'm closing the issue now.

@C-SJK

C-SJK commented Nov 12, 2020


Can I ask how you solved the problem? I have the same trouble.

@Thunder003
Author

Thunder003 commented Nov 12, 2020

@C-SJK for the start-up time, you can increase the CUDA cache size. But you still may not get good results; I'm getting only black images in the output. I assume there is an incompatibility between TF 1.14 and CUDA 11.1.
I saw that NGC uses TF 1.15.4 with CUDA 11.1, but for me TF 1.15 is not able to detect the GPU.
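The cache-size tip above refers to CUDA's JIT compute cache; a sketch of the environment variables involved (the 2 GB value is illustrative, and the training command is stylegan2's entry point with arguments elided):

```shell
# Keep JIT-compiled PTX kernels between runs instead of recompiling each start-up
export CUDA_CACHE_MAXSIZE=2147483648            # raise the cache limit to 2 GB
export CUDA_CACHE_PATH="$HOME/.nv/ComputeCache" # default location, made explicit
python run_training.py ...
```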

@C-SJK

C-SJK commented Nov 12, 2020


I know about NGC.
Do you know how to use it?
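Roughly, NGC images are run through Docker with GPU passthrough; a sketch (the 20.11-tf1-py3 tag is my guess at the then-current TF 1.15.4 / CUDA 11.1 image; check the NGC catalog for the right tag):

```shell
# Pull and start NVIDIA's TF1 container with all GPUs and the current dir mounted
docker run --gpus all -it --rm \
    -v "$PWD":/workspace \
    nvcr.io/nvidia/tensorflow:20.11-tf1-py3
```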

@C-SJK

C-SJK commented Nov 12, 2020


Thanks, I get it!

@JulianPinzaru

JulianPinzaru commented Nov 17, 2020


@psycho2012 did you manage to build TF 1.14 and maybe run stylegan2 on top of it?
I am struggling to make it work (Ubuntu 18.04, RTX 3090).
The sm_86 error is what I run into. I tried changing an nvcc option from 0 to 1 as suggested, and ran into a segmentation fault.

Setting up TensorFlow plugin "upfirdn_2d.cu": iulian device: 0, name: GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6
Compiling... Loading... Done.
Segmentation fault (core dumped)

or

venv/lib/python3.7/site-packages/tensorflow_core/python/framework/load_library.py", line 61, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /home/username/.cache/dnnlib/tflib-cudacache/fused_bias_act_b854f54134b47f099f4349d891d819ed.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

depending on that tweak:

compile_opts += f' --compiler-options \'{" ".join(tf.sysconfig.get_compile_flags())}\''
# compile_opts += f' --compiler-options \'-fPIC -D_GLIBCXX_USE_CXX11_ABI=1\''
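For what it's worth, the `undefined symbol: ..._cxx1112basic_string...` variant looks like the classic libstdc++ dual-ABI mismatch: the `__cxx11` fragment in the mangled name means the plugin was compiled with `_GLIBCXX_USE_CXX11_ABI=1`, while pre-built TF 1.x wheels were compiled with `0`. A tiny self-contained check of that reading (pure string inspection, no TensorFlow needed):

```python
# The mangled symbol from the error message above
symbol = ("_ZN10tensorflow12OpDefBuilder4AttrENSt7__cxx1112"
          "basic_stringIcSt11char_traitsIcESaIcEEE")

# "__cxx11" inside std::string's mangling marks the new (C++11) libstdc++ ABI;
# its presence means the plugin was built with _GLIBCXX_USE_CXX11_ABI=1.
uses_new_abi = "__cxx11" in symbol
print(uses_new_abi)  # True
```

This is why the commented-out compile option above (forcing the ABI flag to match the TF wheel) changes which error you get.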

@johndpope
Contributor

As stylegan2 doesn't have an issue tracker, I piggybacked this issue onto a git commit.

related - NVlabs/stylegan2@23f8bed

There is a script we can run to upgrade stylegan2 to use TensorFlow 2, automagically:

tf_upgrade_v2 \
  --intree stylegan2/ \
  --outtree stylegan2-tf2/ \
  --reportfile report.txt

However, we need to address the following failed conversions. I beseech NVIDIA to create an "UNSUPPORTED" TensorFlow 2 branch which we can all fix.

Using member tf.contrib.memory_stats.MaxBytesInUse in deprecated module tf.contrib; cannot be converted automatically.
Using member tf.contrib.opt.ScipyOptimizerInterface in deprecated module tf.contrib; cannot be converted automatically.
Using member tf.contrib.opt.GGTOptimizer in deprecated module tf.contrib; cannot be converted automatically.
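For what it's worth, a sketch of how those leftover tf.contrib calls might be mapped by hand (my guesses, not an official migration table; `get_memory_info` requires a recent TF 2.x, and each replacement must be verified per release):

```python
# Hypothetical replacement table for the tf.contrib symbols tf_upgrade_v2
# could not convert automatically.
contrib_replacements = {
    # peak GPU memory use, formerly tf.contrib.memory_stats.MaxBytesInUse()
    "tf.contrib.memory_stats.MaxBytesInUse":
        "tf.config.experimental.get_memory_info('GPU:0')['peak']",
    # no direct core replacement; SciPy-driven optimization must be hand-rolled
    "tf.contrib.opt.ScipyOptimizerInterface": None,
    # GGT optimizer was dropped; substitute a core optimizer such as Adam
    "tf.contrib.opt.GGTOptimizer": "tf.keras.optimizers.Adam",
}

for old, new in contrib_replacements.items():
    print(f"{old} -> {new or 'no direct equivalent'}")
```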

It would probably help if someone from TensorFlow could also guide support in getting this over the line. The NVIDIA labs don't want to support TensorFlow 2, but it seems push has come to shove here. Unless we're holding out for a StyleGAN 3 to drop. FYI @tkarras
