Fail to build TF 1.15 on Cuda 11.1 #43629

iperov · 2020-09-28T19:18:55Z

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows
TensorFlow installed from (source or binary): source
TensorFlow version: 1.15
Python version: 3.6
Installed using virtualenv? pip? conda?: pip
Bazel version (if compiling from source): 0.25.3
GCC/Compiler version (if compiling from source): MSVC 2017
CUDA/cuDNN version: 11.1 / 7.6.0
GPU model and memory: RTX 2080 TI

Describe the problem

unable to build TF 1.15 on Cuda 11.1

Any other info / logs

Execution platform: @bazel_tools//platforms:host_platform
tensorflow/core/kernels/cuda_sparse.cc(212): error C2065: 'cusparseSgtsv': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(212): error C2065: 'cusparseDgtsv': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(212): error C2065: 'cusparseCgtsv': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(212): error C2065: 'cusparseZgtsv': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(224): error C2065: 'cusparseSgtsv_nopivot': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(224): error C2065: 'cusparseDgtsv_nopivot': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(224): error C2065: 'cusparseCgtsv_nopivot': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(224): error C2065: 'cusparseZgtsv_nopivot': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(250): error C2065: 'cusparseSgtsvStridedBatch': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(250): error C2065: 'cusparseDgtsvStridedBatch': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(250): error C2065: 'cusparseCgtsvStridedBatch': undeclared identifier
tensorflow/core/kernels/cuda_sparse.cc(250): error C2065: 'cusparseZgtsvStridedBatch': undeclared identifier
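
All twelve undeclared identifiers belong to cuSPARSE's legacy tridiagonal-solver family (gtsv, gtsv_nopivot, gtsvStridedBatch). As far as I know, these entry points were dropped from the CUDA 11 headers in favor of the gtsv2 variants, which would explain the errors. A minimal Python sketch for checking which of them a given cusparse.h still declares; the helper names are made up for illustration:

```python
import re
from pathlib import Path

# Legacy cuSPARSE tridiagonal-solver families named in the compiler errors above.
LEGACY_FAMILIES = ("gtsv", "gtsv_nopivot", "gtsvStridedBatch")
PRECISIONS = "SDCZ"  # float, double, complex float, complex double


def legacy_symbols():
    """Return the 12 identifiers the TF 1.15 build failed on."""
    return [f"cusparse{p}{fam}" for fam in LEGACY_FAMILIES for p in PRECISIONS]


def missing_symbols(header_text):
    """List legacy symbols that are not declared anywhere in header_text."""
    missing = []
    for name in legacy_symbols():
        # Match whole identifiers only, so cusparseSgtsv2 does not count as cusparseSgtsv.
        if not re.search(rf"\b{re.escape(name)}\b", header_text):
            missing.append(name)
    return missing


def check_header(path):
    """Read a cusparse.h file and report which legacy symbols it lacks."""
    return missing_symbols(Path(path).read_text(errors="replace"))
```

For example, `check_header(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\include\cusparse.h")` (a typical default Windows location, assumed here) should list all twelve names if they are indeed absent from that header.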


bhack · 2020-09-28T21:01:43Z

You need master for Cuda 11

iperov · 2020-09-28T21:15:37Z

@bhack that is tf 1.15-master

bhack · 2020-09-28T21:17:02Z

I meant the master branch.

iperov · 2020-09-28T21:24:24Z

My large app was built with TF 1.

I cannot upgrade it to the TF 2 API, because that would require extensive modifications and retesting from scratch, which costs time and money.

It seems that Tensorflow(TM) cannot provide backward compatibility for new CUDA versions.
So even a two-year-old app will not support new cards.
This has a serious impact on businesses and companies that use TF.
Where is the guarantee that it will not happen again?

I am already very sorry that I did not choose pytorch at first.
Burn in hell google, die tensorflow.

bhack · 2020-09-28T21:37:08Z

@iperov You can express your opinion and also your frustration but please respect the perimeter of our code of conduct https://github.com/tensorflow/tensorflow/blob/master/CODE_OF_CONDUCT.md /cc @theadactyl

I think you can probably explore to use TF 1.15 in a Docker container.

iperov · 2020-09-29T05:47:19Z

Why Docker, if the source code simply cannot handle CUDA 11.1?

RTX 3080 does not work with CUDA < 11.1

Saduf2019 · 2020-09-29T06:26:26Z

@iperov
Please refer to these configurations.

Please note that, as per our process, we do not support TF 1.x; you have to upgrade to 2.x. [You may try CUDA toolkit 9 and see if you face any issues, as 1.x would not support CUDA 9+]

iperov · 2020-09-29T06:44:11Z

@Saduf2019

The main issue is not about support for 1.x.
The issue is that 1.x does not support new cards, starting from the RTX 3000 series.

Is this a problem with CUDA (did NVIDIA break backward compatibility?), or does the TF 1.x source code have bugs?

This is a serious reason not to work with TF and/or CUDA anymore.

Please discuss this topic with your dev team.

Saduf2019 · 2020-09-29T10:48:55Z

@iperov
As TF 1.x is no longer actively maintained, no work is being done on 1.x. Please upgrade to 2.x, as only 2.x is supported now, and all fixes and upgrades are made on 2.x only.

iperov · 2020-09-29T10:52:50Z

@Saduf2019
I don't need 1.x to be actively maintained.
Just fix RTX 3000 support for 1.15.

Saduf2019 · 2020-09-29T13:38:55Z

@iperov
Please refer to these configurations.

Please note that, as per our process, we do not support TF 1.x; you have to upgrade to 2.x. [You may try CUDA toolkit 9 and see if you face any issues, as 1.x would not support CUDA 9+]

As explained above, there is no support for 1.x now, and no fixes or changes will be made to 1.x.
Please upgrade to 2.x and let us know if you face any issues. Kindly close this issue if there are no issues with 2.x.

bhack · 2020-09-29T13:43:29Z

Please follow our migration guide https://www.tensorflow.org/guide/migrate

mihaimaruseac · 2020-09-30T18:56:49Z

Hi @iperov

In order to obtain the maximum performance, code is tied to the current version of CUDA. Hence, each branch can only build with exactly one version of CUDA. It will be extremely costly to upgrade an old branch to a new CUDA version and very risky, so we don't do that at all.

TF is not alone here. Every software moves forward and every user will have to upgrade at one time or another.

iperov · 2020-09-30T19:50:48Z

@mihaimaruseac
ok. Is there any way to use any version of tensorflow with RTX 3k on Windows ?

mihaimaruseac · 2020-10-02T17:11:39Z

If you want CUDA 11 you can use nightly. If you want to keep 1.x APIs then you can only use CUDA 10.0
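
The constraint discussed in this thread can be written down as a small lookup. A rough sketch in Python, assuming the minimum toolkit versions I am aware of for each GPU architecture (compute capability 7.5 for Turing cards such as the RTX 2080 Ti, 8.6 for the RTX 30 series) and the CUDA 10.0 pin for TF 1.x stated above; the names are hypothetical:

```python
# Minimum CUDA toolkit that can generate native code for a compute capability.
# Values reflect my understanding of NVIDIA's release history; verify before relying on them.
MIN_CUDA_FOR_CC = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080 Ti
    (8, 0): (11, 0),  # Ampere A100
    (8, 6): (11, 1),  # Ampere RTX 30 series
}

TF_1_15_CUDA = (10, 0)  # per the maintainer's comment above


def has_native_support(compute_capability, cuda_version):
    """True if the given toolkit can build native kernels for this GPU."""
    return cuda_version >= MIN_CUDA_FOR_CC[compute_capability]
```

With these assumptions, `has_native_support((8, 6), TF_1_15_CUDA)` is false, which matches the earlier remark that the RTX 3080 does not work with CUDA below 11.1.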

muskie82 · 2020-12-01T06:02:24Z

To use tensorflow 1.1x on CUDA11.x, I think you should use nvidia-tensorflow.
The installation is quite simple.
(https://developer.nvidia.com/blog/accelerating-tensorflow-on-a100-gpus/)

VladislavAD · 2020-12-08T11:48:35Z

I think you should use nvidia-tensorflow.
(https://developer.nvidia.com/blog/accelerating-tensorflow-on-a100-gpus/)

Is there any way to compile a working libtensorflow.dll with it? I installed CUDA 10.0 on a system with a 3070, and the tensorflow.dll 1.15 that I'm currently using doesn't give me any errors, but initialization and inference are extremely slow.

anshkumar · 2020-12-21T15:25:10Z

Install the NVIDIA driver (455.23). After installing it, check the status of the GPU using nvidia-smi. Then install tf-1.15 as follows:

sudo apt update
sudo apt install -y python3-dev python3-pip git
pip3 install --upgrade pip setuptools requests

pip install -U virtualenv
virtualenv --system-site-packages -p python3 /venv
source /venv/bin/activate

pip install nvidia-pyindex
pip install nvidia-tensorflow[horovod]

This should install tf-1.15 with cuda 11.1 support.
Test it as follows:
python -c 'import tensorflow as tf; print(tf.__version__)'

python -c 'import tensorflow as tf; print("Num GPUs Available:", len(tf.config.experimental.list_physical_devices("GPU")))'

iperov · 2020-12-21T16:03:30Z

@anshkumar are you a bot or a spammer?

anshkumar · 2020-12-21T16:05:46Z

@iperov why do you think so ?
This worked for me. See (https://developer.nvidia.com/blog/accelerating-tensorflow-on-a100-gpus/)

cocoyen1995 · 2021-03-18T03:51:02Z

I think you should use nvidia-tensorflow.
(https://developer.nvidia.com/blog/accelerating-tensorflow-on-a100-gpus/)

Is there any way to compile a working libtensorflow.dll with it? I installed CUDA 10.0 on a system with a 3070, and the tensorflow.dll 1.15 that I'm currently using doesn't give me any errors, but initialization and inference are extremely slow.

Hi @VladislavAD ,

I have a similar question here.
I tried to run a self-built tensorflow.dll (1.13.1) with CUDA 10.0 and cuDNN 7.4.2 on an RTX 3080. It runs, but the result is totally different from the one I got on an RTX 2080 Ti. Initialization took about 20 minutes in my case, but the inference time is similar to what I saw on the RTX 2080 Ti. Is the inference result as expected in your case?

Another question: how did you deal with using TF1's tensorflow.dll on an RTX 30 series GPU?
(Or did you switch to TF2 to solve this issue?)

Thanks in advance for any advice, I'm looking forward to your reply!
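
One plausible explanation for the long initialization reported here (an assumption, not something confirmed in this thread): a binary built against CUDA 10.0 contains no native kernels for RTX 30 series GPUs, so the driver has to JIT-compile the embedded PTX at startup. If that is the cause, enlarging the driver's compute cache may let later runs reuse the compiled kernels. A sketch using the CUDA environment variables `CUDA_CACHE_DISABLE` and `CUDA_CACHE_MAXSIZE`; the 2 GiB value is an arbitrary choice and the function name is hypothetical. It does not address the differing inference results.

```python
import os


def enlarge_cuda_jit_cache(max_bytes=2 * 1024**3):
    """Set the JIT cache variables; must run before any CUDA library is loaded."""
    os.environ["CUDA_CACHE_DISABLE"] = "0"           # keep the on-disk cache enabled
    os.environ["CUDA_CACHE_MAXSIZE"] = str(max_bytes)  # cache size limit in bytes
    return os.environ["CUDA_CACHE_MAXSIZE"]


enlarge_cuda_jit_cache()
# import tensorflow as tf  # import only after the variables are set
```

For a C or C++ program that loads tensorflow.dll directly, the same variables can be set in the process environment before launch.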

iperov added the type:build/install label Sep 28, 2020

google-ml-butler bot assigned Saduf2019 Sep 28, 2020

Saduf2019 added the subtype:windows, stat:awaiting response, and TF 1.15 labels Sep 29, 2020

tensorflowbutler removed the stat:awaiting response label Oct 2, 2020

Saduf2019 added the stat:awaiting response label Oct 5, 2020

iperov closed this as completed Oct 5, 2020

Saduf2019 mentioned this issue Oct 6, 2020

RTX3080 install RuntimeError: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid #43701

Closed

Saduf2019 mentioned this issue Nov 11, 2020

Issue working with Nvidia rtx-3090 #44753

Closed

Saduf2019 mentioned this issue Dec 8, 2020

Cannot ./configure Tensorflow on Ubuntu 18.04 LTS with CUDA 11.1 #45463

Closed

guicho271828 mentioned this issue Mar 3, 2021

Should I run setup again? guicho271828/latplan#10

Closed

tensorflow locked as resolved and limited conversation to collaborators Mar 20, 2021
