
Configure script automatically selects CUDA/cuDNN path instead of waiting for user input #60760

Open
ramizouari opened this issue Jun 2, 2023 · 4 comments
Labels
stat:awaiting tensorflower · TF 2.10 · type:bug · type:build/install

Comments

ramizouari commented Jun 2, 2023

Issue Type

Bug

Have you reproduced the bug with TF nightly?

No

Source

source

Tensorflow Version

TF 2.10

Custom Code

No

OS Platform and Distribution

Fedora 37

Mobile device

No response

Python version

3.10

Bazel version

5.3.0

GCC/Compiler version

12.3.1

CUDA/cuDNN version

11.8,12.1/8.0

GPU model and memory

GTX 1660 Ti, 6 GB

Current Behaviour?

I have multiple CUDA versions installed, and I am trying to build TensorFlow from source with CUDA support.

The problem lies in configuring the build system using ./configure. The script asks for information relevant to the build. This includes:

  1. Python path
  2. Python packages path
  3. Whether to support ROCm
  4. Whether to support CUDA
  5. Whether to support TensorRT

Now, when I select CUDA support, the script seems to automatically select my CUDA/cuDNN versions and does not give me the possibility to select them manually, which contradicts what the documentation suggests at https://www.tensorflow.org/install/source#gpu_support: "If your system has multiple versions of CUDA or cuDNN installed, explicitly set the version instead of relying on the default"

I was able to trace the issue to the configure.py file.
In fact, I strongly suspect there is a logical error in the section that parses the user input (Line 1244 on branch r2.11):

  environ_save = dict(environ_cp)
  for _ in range(_DEFAULT_PROMPT_ASK_ATTEMPTS):
    if validate_cuda_config(environ_cp):
      cuda_env_names = [
          'TF_CUDA_VERSION',
          'TF_CUBLAS_VERSION',
          'TF_CUDNN_VERSION',
          'TF_TENSORRT_VERSION',
          'TF_NCCL_VERSION',
          'TF_CUDA_PATHS',
          # Items below are for backwards compatibility when not using
          # TF_CUDA_PATHS.
          'CUDA_TOOLKIT_PATH',
          'CUDNN_INSTALL_PATH',
          'NCCL_INSTALL_PATH',
          'NCCL_HDR_PATH',
          'TENSORRT_INSTALL_PATH'
      ]
      # Note: set_action_env_var above already writes to bazelrc.
      for name in cuda_env_names:
        if name in environ_cp:
          write_action_env_to_bazelrc(name, environ_cp[name])
      break

    # Restore settings changed below if CUDA config could not be validated.
    environ_cp = dict(environ_save)

    set_tf_cuda_version(environ_cp)
    set_tf_cudnn_version(environ_cp)
    if is_windows():
      set_tf_tensorrt_version(environ_cp)
    if is_linux():
      set_tf_tensorrt_version(environ_cp)
      set_tf_nccl_version(environ_cp)

    set_tf_cuda_paths(environ_cp)

From my understanding, the script first validates the given environment and only asks for user input if that validation fails.
With that ordering, on the first iteration of the loop, validation runs before any of the prompt functions have populated the required environment variables, so if auto-detection succeeds the user is never asked at all.
I was able to solve the issue by swapping the order as follows:

    environ_save = dict(environ_cp)
    for _ in range(_DEFAULT_PROMPT_ASK_ATTEMPTS):
      # Restore settings changed below if CUDA config could not be validated.
      environ_cp = dict(environ_save)

      set_tf_cuda_version(environ_cp)
      set_tf_cudnn_version(environ_cp)
      if is_windows():
        set_tf_tensorrt_version(environ_cp)
      if is_linux():
        set_tf_tensorrt_version(environ_cp)
        set_tf_nccl_version(environ_cp)

      set_tf_cuda_paths(environ_cp)
      if validate_cuda_config(environ_cp):
        cuda_env_names = [
            'TF_CUDA_VERSION',
            'TF_CUBLAS_VERSION',
            'TF_CUDNN_VERSION',
            'TF_TENSORRT_VERSION',
            'TF_NCCL_VERSION',
            'TF_CUDA_PATHS',
            # Items below are for backwards compatibility when not using
            # TF_CUDA_PATHS.
            'CUDA_TOOLKIT_PATH',
            'CUDNN_INSTALL_PATH',
            'NCCL_INSTALL_PATH',
            'NCCL_HDR_PATH',
            'TENSORRT_INSTALL_PATH'
        ]
        # Note: set_action_env_var above already writes to bazelrc.
        for name in cuda_env_names:
          if name in environ_cp:
            write_action_env_to_bazelrc(name, environ_cp[name])
        break

Standalone code to reproduce the issue

Assumption: Multiple CUDA versions on /usr/local

Command:
./configure

Input Example:
1. [Default Setting]
2. [Default Setting]
3. N
4. y
5. N

Relevant log output

No response

@google-ml-butler google-ml-butler bot added the type:bug Bug label Jun 2, 2023
@SuryanarayanaY SuryanarayanaY added type:build/install Build and install issues TF 2.10 stat:awaiting response Status - Awaiting response from author and removed stat:awaiting response Status - Awaiting response from author labels Jun 2, 2023
@SuryanarayanaY (Collaborator)

Hi @ramizouari ,

TensorFlow preconfigures the paths of the CUDA and cuDNN toolkits that are installed per the official instructions in the documentation using Conda. If the script is able to detect the path automatically, it won't ask the user to provide the paths. If the script cannot detect the path, it will prompt the user for it. Please refer to the example below.

(tf) suryanarayanay@surya-ubuntu22-cuda-test:~/tensorflow$ ./configure
bash: /home/suryanarayanay/miniconda3/envs/tf/lib/libtinfo.so.6: no version information available (required by bash)
You have bazel 5.3.0 installed.
Please specify the location of python. [Default is /home/suryanarayanay/miniconda3/envs/tf/bin/python3]: 


Found possible Python library paths:
  /home/suryanarayanay/miniconda3/envs/tf/lib/python3.9/site-packages
Please input the desired Python library path to use.  Default is [/home/suryanarayanay/miniconda3/envs/tf/lib/python3.9/site-packages]

Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.

Could not find any cuda.h matching version '' in any subdirectory:
        ''
        'include'
        'include/cuda'
        'include/*-linux-gnu'
        'extras/CUPTI/include'
        'include/cuda/CUPTI'
        'local/cuda/extras/CUPTI/include'
of:
        '/lib'
        '/lib/x86_64-linux-gnu'
        '/usr'
        '/usr/lib/x86_64-linux-gnu/libfakeroot'

Asking for detailed CUDA configuration...

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 11]: 11.8


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 2]: 8.6


Please specify the locally installed NCCL version you want to use. [Leave empty to use http://github.com/nvidia/nccl]: 


Please specify the comma-separated list of base paths to look for CUDA libraries and headers. [Leave empty to use the default]:

So if the script is able to identify the path, TensorFlow is only facilitating the user. However, if you want to keep the CUDA and cuDNN libraries in a particular directory, or want to use a particular version of CUDA/cuDNN, you can do this by removing CUDA/cuDNN from the standard download path; the script will then ask you to enter the CUDA path, as seen in the example above.

I would like to know how you installed CUDA/cuDNN and how the path has been set. Also, please confirm whether the auto-detection is causing any particular problem in your case. Please elaborate.

Thanks!

@SuryanarayanaY SuryanarayanaY added the stat:awaiting response Status - Awaiting response from author label Jun 2, 2023
@ramizouari (Author)

Hi @SuryanarayanaY ,
First of all, thank you for your help.

I installed both cuDNN and CUDA via NVIDIA's RPM packages, so they are updated via the package manager.
The installation is in the standard path /usr/local/cuda.

Now, to be more precise, for any update to version xx.y of CUDA, the package manager will:

  1. install the update in the /usr/local/cuda-xx.y folder
  2. set /usr/local/cuda-x and /usr/local/cuda as symbolic links to /usr/local/cuda-xx.y

With this, I effectively have multiple CUDA versions installed under /usr/local/cuda-xx.y, with the latest version acting as the default.

The path is set at login. In fact, my ~/.bashrc file contains these two lines:

export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
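For illustration, the symlink layout described above can be reproduced in a scratch directory (the versions below are examples; the real installs live under /usr/local):

```shell
# Recreate the RPM package layout in a temp dir (hypothetical versions).
root=$(mktemp -d)
mkdir -p "$root/cuda-11.8" "$root/cuda-12.1"
# The package manager points the unversioned path at the newest install.
ln -sfn "$root/cuda-12.1" "$root/cuda"
ls -d "$root"/cuda*          # lists cuda, cuda-11.8, cuda-12.1
readlink "$root/cuda"        # resolves to .../cuda-12.1
```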

However, if you want to keep the CUDA and cuDNN libraries in a particular directory, or want to use a particular version of CUDA/cuDNN, you can do this by removing CUDA/cuDNN from the standard download path; the script will then ask you to enter the CUDA path, as seen in the example above.

I am going to slightly disagree with this.
This would be the logical behaviour when there is exactly one installation (modulo some symbolic links).
But in my case, I have many different installations, and it would be better if the script asked which version I expect.

Also, the documentation itself indicates that the script should behave this way upon detecting multiple CUDA versions, which is not what is happening.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Jun 2, 2023
@SuryanarayanaY (Collaborator)

Hi @ramizouari ,

The script for ./configure can be found here.

If you are interested, please go through the source code, analyse the behaviour, and let us know if you have any pointers on this behaviour.

Thanks!

@SuryanarayanaY (Collaborator)

@nitins17 - Please share your pointers on this issue.

CC - @learning-to-play

@SuryanarayanaY SuryanarayanaY added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jun 13, 2023