Fail to build TF 1.15 on Cuda 11.1 #43629

Closed
iperov opened this issue Sep 28, 2020 · 22 comments
Assignees
Labels
stat:awaiting response Status - Awaiting response from author subtype:windows Windows Build/Installation Issues TF 1.15 for issues seen on TF 1.15 type:build/install Build and install issues

Comments

@iperov

iperov commented Sep 28, 2020

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows
  • TensorFlow installed from (source or binary): source
  • TensorFlow version: 1.15
  • Python version: 3.6
  • Installed using virtualenv? pip? conda?: pip
  • Bazel version (if compiling from source): 0.25.3
  • GCC/Compiler version (if compiling from source): MSVC 2017
  • CUDA/cuDNN version: 11.1 / 7.6.0
  • GPU model and memory: RTX 2080 TI

Describe the problem

Unable to build TF 1.15 with CUDA 11.1.

Any other info / logs

Execution platform: @bazel_tools//platforms:host_platform
tensorflow/core/kernels/cuda_sparse.cc(212): error C2065: 'cusparseSgtsv': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(212): error C2065: 'cusparseDgtsv': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(212): error C2065: 'cusparseCgtsv': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(212): error C2065: 'cusparseZgtsv': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(224): error C2065: 'cusparseSgtsv_nopivot': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(224): error C2065: 'cusparseDgtsv_nopivot': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(224): error C2065: 'cusparseCgtsv_nopivot': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(224): error C2065: 'cusparseZgtsv_nopivot': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(250): error C2065: 'cusparseSgtsvStridedBatch': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(250): error C2065: 'cusparseDgtsvStridedBatch': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(250): error C2065: 'cusparseCgtsvStridedBatch': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(250): error C2065: 'cusparseZgtsvStridedBatch': undeclared identifier
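
Every error in the log above names a function from the cuSPARSE `gtsv` tridiagonal-solver family. That suggests the CUDA 11.1 headers no longer declare these legacy entry points. This is an inference from the log, not something the thread confirms. A minimal sketch for summarizing such a log is shown below. It assumes each diagnostic sits on a single line (unwrap hard-wrapped terminal output first); `undeclared_identifiers` and `sample_log` are hypothetical names introduced only for illustration.

```python
import re
from collections import defaultdict

# Matches MSVC diagnostics of the form:
#   path(line): error C2065: 'name': undeclared identifier
C2065 = re.compile(
    r"^(?P<path>.+?)\((?P<line>\d+)\): error C2065: "
    r"'(?P<name>[^']+)': undeclared identifier$"
)


def undeclared_identifiers(log_text):
    """Group undeclared identifiers by (file, line) from an MSVC build log."""
    grouped = defaultdict(list)
    for raw in log_text.splitlines():
        match = C2065.match(raw.strip())
        if match:
            key = (match.group("path"), int(match.group("line")))
            grouped[key].append(match.group("name"))
    return dict(grouped)


# Three lines taken from the log in this issue, one per failing source line.
sample_log = """\
tensorflow/core/kernels/cuda_sparse.cc(212): error C2065: 'cusparseSgtsv': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(224): error C2065: 'cusparseSgtsv_nopivot': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(250): error C2065: 'cusparseSgtsvStridedBatch': undeclared identifier
"""

for (path, line), names in undeclared_identifiers(sample_log).items():
    print(f"{path}:{line} -> {', '.join(names)}")
```

Grouping by source line makes it easier to see that only three call sites in `cuda_sparse.cc` are affected, each repeated for the S/D/C/Z precision variants.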
@iperov iperov added the type:build/install Build and install issues label Sep 28, 2020
@bhack
Contributor

bhack commented Sep 28, 2020

You need master for Cuda 11

@iperov
Author

iperov commented Sep 28, 2020

@bhack that is tf 1.15-master

@bhack
Contributor

bhack commented Sep 28, 2020

I meant the master branch.

@iperov
Author

iperov commented Sep 28, 2020

My large application is built on TF 1.

I cannot upgrade it to the TF 2 API, because that would require many modifications and retesting from scratch, which costs time and money.

It seems that TensorFlow(TM) cannot provide backward compatibility for new CUDA versions.
So even a two-year-old application will not support new cards.
This has a serious impact on businesses and companies that use TF.
Where is the guarantee that it will not happen again?

I already very much regret not choosing PyTorch in the first place.
Burn in hell google, die tensorflow.

@bhack
Contributor

bhack commented Sep 28, 2020

@iperov You can express your opinion and also your frustration, but please stay within the bounds of our code of conduct https://github.com/tensorflow/tensorflow/blob/master/CODE_OF_CONDUCT.md /cc @theadactyl

You could explore running TF 1.15 in a Docker container.

@iperov
Author

iperov commented Sep 29, 2020

Why Docker, if the source code simply cannot handle CUDA 11.1?

RTX 3080 does not work with CUDA < 11.1

@Saduf2019
Contributor

Saduf2019 commented Sep 29, 2020

@iperov
Please refer to these configurations.

Please note that, as per our process, we no longer support TF 1.x; you have to upgrade to 2.x. [You may try CUDA toolkit 9 and see whether you face any issues, as 1.x would not support CUDA 9+.]

@Saduf2019 Saduf2019 added subtype:windows Windows Build/Installation Issues stat:awaiting response Status - Awaiting response from author TF 1.15 for issues seen on TF 1.15 labels Sep 29, 2020
@iperov
Author

iperov commented Sep 29, 2020

@Saduf2019

The main issue is not about supporting 1.x.
The issue is that 1.x does not support new cards, starting with the RTX 3000 series.

Is this a problem with CUDA (did NVIDIA break backward compatibility?), or does the TF 1.x source code have bugs?

This is a serious reason not to work with TF and/or CUDA anymore.

Please discuss this topic with your dev team.

@Saduf2019
Contributor

@iperov
As TF 1.x is no longer actively maintained, no work is being done on 1.x. Please upgrade to 2.x, as it is the only line that is supported now; all fixes and upgrades are made on 2.x only.

@iperov
Author

iperov commented Sep 29, 2020

@Saduf2019
I don't need 1.x to be actively maintained.
Just fix RTX 3000 support for 1.15.

@Saduf2019
Contributor

Saduf2019 commented Sep 29, 2020

@iperov
Please refer to these configurations.

Please note that, as per our process, we no longer support TF 1.x; you have to upgrade to 2.x. [You may try CUDA toolkit 9 and see whether you face any issues, as 1.x would not support CUDA 9+.]

As explained above, there is no support for 1.x now; no fixes or changes will be made to 1.x.
Please upgrade to 2.x and let us know if you face any issues. Kindly close this issue if there are no issues with 2.x.

@bhack
Contributor

bhack commented Sep 29, 2020

Please follow our migration guide https://www.tensorflow.org/guide/migrate

@mihaimaruseac
Collaborator

Hi @iperov

To obtain maximum performance, the code is tied to the current version of CUDA. Hence, each branch can only build with exactly one version of CUDA. Upgrading an old branch to a new CUDA version would be extremely costly and very risky, so we don't do that at all.

TF is not alone here. All software moves forward, and every user will have to upgrade at one time or another.

@iperov
Author

iperov commented Sep 30, 2020

@mihaimaruseac
OK. Is there any way to use any version of TensorFlow with RTX 3000-series cards on Windows?

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Oct 2, 2020
@mihaimaruseac
Collaborator

If you want CUDA 11 you can use nightly. If you want to keep 1.x APIs then you can only use CUDA 10.0
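
The one-branch-one-CUDA-version rule described in this thread can be checked before starting a long build. The following is a minimal sketch, not an official tool: only the 1.15 entry comes from this thread, the other entries are assumptions recalled from TensorFlow's published tested build configurations, and `TESTED_CUDA`, `release_line`, and `check_cuda` are hypothetical names.

```python
# CUDA toolkit each TensorFlow release line is expected to build against.
# Only the 1.15 entry is stated in this thread; the others are assumptions
# and should be verified against the official tested-configurations table.
TESTED_CUDA = {
    "1.15": "10.0",  # stated in this thread
    "2.3": "10.1",   # assumption
    "2.4": "11.0",   # assumption
}


def release_line(version):
    """Reduce a version such as '1.15.4' to its 'major.minor' form."""
    parts = version.split(".")
    minor = parts[1] if len(parts) > 1 else "0"
    return f"{parts[0]}.{minor}"


def check_cuda(tf_version, cuda_version):
    """Return (ok, message) for a TensorFlow/CUDA pairing."""
    line = release_line(tf_version)
    expected = TESTED_CUDA.get(line)
    if expected is None:
        return False, f"no entry for TensorFlow {line}; consult the official table"
    actual = release_line(cuda_version)
    if actual != expected:
        return False, f"TensorFlow {line} expects CUDA {expected}, not {actual}"
    return True, f"TensorFlow {line} with CUDA {actual} matches the table"


ok, message = check_cuda("1.15.0", "11.1")
print(ok, message)
```

For the combination reported in this issue the check fails, which matches the maintainers' answer that the 1.15 branch only builds with CUDA 10.0.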

@Saduf2019 Saduf2019 added the stat:awaiting response Status - Awaiting response from author label Oct 5, 2020
@iperov iperov closed this as completed Oct 5, 2020

@muskie82

muskie82 commented Dec 1, 2020

To use TensorFlow 1.1x with CUDA 11.x, I think you should use nvidia-tensorflow.
The installation is quite simple.
(https://developer.nvidia.com/blog/accelerating-tensorflow-on-a100-gpus/)

@VladislavAD

I think you should use nvidia-tensorflow.
(https://developer.nvidia.com/blog/accelerating-tensorflow-on-a100-gpus/)

Is there any way to compile a working libtensorflow.dll with it? I installed CUDA 10.0 on a system with a 3070, and the TF 1.15 tensorflow.dll that I'm currently using doesn't give me any errors, but initialization and inference are extremely slow.
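
One plausible explanation for the slow start, not confirmed anywhere in this thread, is that a CUDA 10.0 build carries no machine code for the RTX 30 series (compute capability 8.6), so the driver has to JIT-compile the embedded PTX at load time. The sketch below illustrates the general CUDA compatibility rule only. The capability lists for the build are hypothetical, and `load_path` is a name introduced here for illustration.

```python
def load_path(device_cc, cubin_ccs, ptx_ccs):
    """Classify how a CUDA binary would load on a given GPU.

    device_cc: (major, minor) compute capability of the GPU.
    cubin_ccs: capabilities the binary carries machine code for.
    ptx_ccs:   capabilities the binary carries PTX for.
    """
    major, minor = device_cc
    # Machine code only runs on the same major version, with an equal or
    # higher minor version.
    if any(m == major and n <= minor for m, n in cubin_ccs):
        return "native"
    # PTX can be JIT-compiled by the driver for any equal or newer GPU.
    if any((m, n) <= (major, minor) for m, n in ptx_ccs):
        return "jit"
    return "unsupported"


# Hypothetical contents of a CUDA 10.0-era build (assumption, not measured).
build_cubins = [(6, 1), (7, 0)]
build_ptx = [(7, 0)]

print("RTX 2080 Ti (7.5):", load_path((7, 5), build_cubins, build_ptx))
print("RTX 3070/3080 (8.6):", load_path((8, 6), build_cubins, build_ptx))
```

Under these assumptions the older card loads machine code directly while the newer one falls back to JIT compilation, which would be consistent with long initialization times and no error messages.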

@anshkumar

Install the NVIDIA driver (455.23). After installing it, check the GPU status with nvidia-smi. Then install TF 1.15 as follows:

sudo apt update
sudo apt install -y python3-dev python3-pip git
pip3 install --upgrade pip setuptools requests

pip install -U virtualenv
virtualenv --system-site-packages -p python3 ./venv
source ./venv/bin/activate

pip install nvidia-pyindex
pip install nvidia-tensorflow[horovod]

This should install TF 1.15 with CUDA 11.1 support.
Test it as follows:
python -c 'import tensorflow as tf; print(tf.__version__)'

python -c 'import tensorflow as tf; print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices("GPU")))'
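
The two checks above can also be kept as a small script, which avoids shell quoting problems. This is a minimal sketch that assumes a TensorFlow release providing `tf.config.experimental.list_physical_devices` (1.14 or later, as far as I know); `tensorflow_report` is a hypothetical helper name.

```python
import importlib


def tensorflow_report():
    """Return a small dict describing the TensorFlow install, if any."""
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        # TensorFlow is missing or failed to load its native libraries.
        return {"installed": False}
    gpus = tf.config.experimental.list_physical_devices("GPU")
    return {
        "installed": True,
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "num_gpus": len(gpus),
    }


report = tensorflow_report()
print(report)
```

A result with `built_with_cuda` true but `num_gpus` equal to zero usually points at a driver or CUDA library mismatch rather than at the Python package itself.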

@iperov
Author

iperov commented Dec 21, 2020

@anshkumar are you a bot or a spammer?

@anshkumar

@iperov why do you think so?
This worked for me. See (https://developer.nvidia.com/blog/accelerating-tensorflow-on-a100-gpus/)

@cocoyen1995

I think you should use nvidia-tensorflow.
(https://developer.nvidia.com/blog/accelerating-tensorflow-on-a100-gpus/)

Is there any way to compile a working libtensorflow.dll with it? I installed CUDA 10.0 on a system with a 3070, and the TF 1.15 tensorflow.dll that I'm currently using doesn't give me any errors, but initialization and inference are extremely slow.

Hi @VladislavAD ,

I have a similar question here.
I tried to run a self-built tensorflow.dll (1.13.1) with CUDA 10.0 and cuDNN 7.4.2 on an RTX 3080. It runs, but the result is totally different from the one I get on an RTX 2080 Ti. Initialization took about 20 minutes in my case, but the inference time is similar to what I see on the RTX 2080 Ti. Are the inference results as expected in your case?

Another question: how did you deal with using TF1's tensorflow.dll on an RTX 30-series GPU?
(Or did you switch to TF2 to solve this issue?)

Thanks in advance for any advice, I'm looking forward to your reply!

@tensorflow tensorflow locked as resolved and limited conversation to collaborators Mar 20, 2021
9 participants