
Configure script automatically selects CUDA/cuDNN path instead of waiting for user input #60760

Open
ramizouari opened this issue Jun 2, 2023 · 4 comments
Labels
stat:awaiting tensorflower · TF 2.10 · type:bug · type:build/install

Comments

ramizouari commented Jun 2, 2023

Issue Type

Bug

Have you reproduced the bug with TF nightly?

No

Source

source

Tensorflow Version

TF 2.10

Custom Code

No

OS Platform and Distribution

Fedora 37

Mobile device

No response

Python version

3.10

Bazel version

5.3.0

GCC/Compiler version

12.3.1

CUDA/cuDNN version

11.8,12.1/8.0

GPU model and memory

GTX 1660 Ti, 6 GB

Current Behaviour?

I have multiple CUDA versions installed, and I am trying to build TensorFlow from source with CUDA support.

The problem lies in configuring the build system using ./configure. The script asks for information relevant to the build. This includes:

  1. Python path
  2. Python packages path
  3. Whether to support ROCm
  4. Whether to support CUDA
  5. Whether to support TensorRT

Now, when I select CUDA support, the script seems to automatically select my CUDA/cuDNN versions and does not give me the possibility to select them manually, which contradicts what the documentation suggests at https://www.tensorflow.org/install/source#gpu_support: "If your system has multiple versions of CUDA or cuDNN installed, explicitly set the version instead of relying on the default"

I was able to trace the issue to the configure.py file.
In fact, I strongly suspect there is a logical error in the section that parses the user input (Line 1244 on branch r2.11):

  environ_save = dict(environ_cp)
  for _ in range(_DEFAULT_PROMPT_ASK_ATTEMPTS):
    if validate_cuda_config(environ_cp):
      cuda_env_names = [
          'TF_CUDA_VERSION',
          'TF_CUBLAS_VERSION',
          'TF_CUDNN_VERSION',
          'TF_TENSORRT_VERSION',
          'TF_NCCL_VERSION',
          'TF_CUDA_PATHS',
          # Items below are for backwards compatibility when not using
          # TF_CUDA_PATHS.
          'CUDA_TOOLKIT_PATH',
          'CUDNN_INSTALL_PATH',
          'NCCL_INSTALL_PATH',
          'NCCL_HDR_PATH',
          'TENSORRT_INSTALL_PATH'
      ]
      # Note: set_action_env_var above already writes to bazelrc.
      for name in cuda_env_names:
        if name in environ_cp:
          write_action_env_to_bazelrc(name, environ_cp[name])
      break

    # Restore settings changed below if CUDA config could not be validated.
    environ_cp = dict(environ_save)

    set_tf_cuda_version(environ_cp)
    set_tf_cudnn_version(environ_cp)
    if is_windows():
      set_tf_tensorrt_version(environ_cp)
    if is_linux():
      set_tf_tensorrt_version(environ_cp)
      set_tf_nccl_version(environ_cp)

    set_tf_cuda_paths(environ_cp)

From my understanding, the script first validates the given environment and only asks for user input if that validation fails.
With that ordering, on the first iteration of the loop, validation runs before any of the prompt functions have populated the required environment variables, so if auto-detection succeeds the user is never asked at all.
I was able to solve the issue by swapping the order as follows:

    environ_save = dict(environ_cp)
    for _ in range(_DEFAULT_PROMPT_ASK_ATTEMPTS):
      # Restore settings changed below if CUDA config could not be validated.
      environ_cp = dict(environ_save)

      set_tf_cuda_version(environ_cp)
      set_tf_cudnn_version(environ_cp)
      if is_windows():
        set_tf_tensorrt_version(environ_cp)
      if is_linux():
        set_tf_tensorrt_version(environ_cp)
        set_tf_nccl_version(environ_cp)

      set_tf_cuda_paths(environ_cp)
      if validate_cuda_config(environ_cp):
        cuda_env_names = [
            'TF_CUDA_VERSION',
            'TF_CUBLAS_VERSION',
            'TF_CUDNN_VERSION',
            'TF_TENSORRT_VERSION',
            'TF_NCCL_VERSION',
            'TF_CUDA_PATHS',
            # Items below are for backwards compatibility when not using
            # TF_CUDA_PATHS.
            'CUDA_TOOLKIT_PATH',
            'CUDNN_INSTALL_PATH',
            'NCCL_INSTALL_PATH',
            'NCCL_HDR_PATH',
            'TENSORRT_INSTALL_PATH'
        ]
        # Note: set_action_env_var above already writes to bazelrc.
        for name in cuda_env_names:
          if name in environ_cp:
            write_action_env_to_bazelrc(name, environ_cp[name])
        break

Standalone code to reproduce the issue

Assumption: Multiple CUDA versions on /usr/local

Command:
./configure

Input Example:
1. [Default Setting]
2. [Default Setting]
3. N
4. y
5. N

Relevant log output

No response

@google-ml-butler google-ml-butler bot added the type:bug Bug label Jun 2, 2023
@SuryanarayanaY SuryanarayanaY added type:build/install Build and install issues TF 2.10 stat:awaiting response Status - Awaiting response from author and removed stat:awaiting response Status - Awaiting response from author labels Jun 2, 2023
@SuryanarayanaY (Collaborator)

Hi @ramizouari ,

TensorFlow preconfigures the paths of the CUDA and cuDNN toolkits that are installed per the official instructions in the documentation using Conda. If the script is able to detect the path automatically, it won't ask the user to provide the paths. If the script cannot detect the path, it will prompt the user for it. Please refer to the example below.

(tf) suryanarayanay@surya-ubuntu22-cuda-test:~/tensorflow$ ./configure
bash: /home/suryanarayanay/miniconda3/envs/tf/lib/libtinfo.so.6: no version information available (required by bash)
You have bazel 5.3.0 installed.
Please specify the location of python. [Default is /home/suryanarayanay/miniconda3/envs/tf/bin/python3]: 


Found possible Python library paths:
  /home/suryanarayanay/miniconda3/envs/tf/lib/python3.9/site-packages
Please input the desired Python library path to use.  Default is [/home/suryanarayanay/miniconda3/envs/tf/lib/python3.9/site-packages]

Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.

Could not find any cuda.h matching version '' in any subdirectory:
        ''
        'include'
        'include/cuda'
        'include/*-linux-gnu'
        'extras/CUPTI/include'
        'include/cuda/CUPTI'
        'local/cuda/extras/CUPTI/include'
of:
        '/lib'
        '/lib/x86_64-linux-gnu'
        '/usr'
        '/usr/lib/x86_64-linux-gnu/libfakeroot'

Asking for detailed CUDA configuration...

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 11]: 11.8


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 2]: 8.6


Please specify the locally installed NCCL version you want to use. [Leave empty to use http://github.com/nvidia/nccl]: 


Please specify the comma-separated list of base paths to look for CUDA libraries and headers. [Leave empty to use the default]:

So if the script is able to identify the path, TensorFlow is only facilitating the user. However, if you want to keep the CUDA and cuDNN libraries in a particular directory, or want to use a particular version of CUDA/cuDNN, you can do this by removing CUDA/cuDNN from the standard download path; the script will then ask you to enter the CUDA path, as seen in the example above.

I would like to know how you installed CUDA/cuDNN and how the path has been set. Also, please confirm whether the auto-detection is causing any particular problem in your case. Please elaborate.

Thanks!

@SuryanarayanaY SuryanarayanaY added the stat:awaiting response Status - Awaiting response from author label Jun 2, 2023
@ramizouari (Author)

Hi @SuryanarayanaY ,
First of all, thank you for your help.

I installed both cuDNN and CUDA via NVIDIA's RPM packages, so they are updated via the package manager.
The installation is in the standard path /usr/local/cuda.

Now, to be more precise, for any update to version xx.y of CUDA, the package manager will:

  1. install the update in the /usr/local/cuda-xx.y folder
  2. set /usr/local/cuda-x and /usr/local/cuda as symbolic links to /usr/local/cuda-xx.y

With this, I effectively have multiple CUDA versions installed under /usr/local/cuda-xx.y, with the latest version acting as the default.

The path is set at login. In fact, my ~/.bashrc file contains these two lines:

export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
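For illustration, the symlink layout described above can be reproduced in a scratch directory (the versions below are examples; the real installs live under /usr/local):

```shell
# Recreate the RPM package layout in a temp dir (hypothetical versions).
root=$(mktemp -d)
mkdir -p "$root/cuda-11.8" "$root/cuda-12.1"
# The package manager points the unversioned path at the newest install.
ln -sfn "$root/cuda-12.1" "$root/cuda"
ls -d "$root"/cuda*          # lists cuda, cuda-11.8, cuda-12.1
readlink "$root/cuda"        # resolves to .../cuda-12.1
```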

However, if you want to keep the CUDA and cuDNN libraries in a particular directory, or want to use a particular version of CUDA/cuDNN, you can do this by removing CUDA/cuDNN from the standard download path; the script will then ask you to enter the CUDA path, as seen in the example above.

I am going to slightly disagree with this.
This would be the logical behaviour when there is exactly one installation (modulo some symbolic links).
But in my case, I have many different installations, and it would be better if the script asked which version I expect.

Also, the documentation itself indicates that the script should behave this way upon detecting multiple CUDA versions, which is not what is happening.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Jun 2, 2023
@SuryanarayanaY (Collaborator)

Hi @ramizouari ,

The script for ./configure can be found here.

If you are interested, please go through the source code, analyse the behaviour, and let us know if you have any pointers on this behaviour.

Thanks!

@SuryanarayanaY (Collaborator)

@nitins17 - Please share your pointers on this issue.

CC - @learning-to-play

@SuryanarayanaY SuryanarayanaY added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jun 13, 2023