
RuntimeError: Error building extension 'codebook_cuda': [1/3] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output cuda_kernel.cuda.o.d #52

Closed
sanjeev-bhandari opened this issue Mar 14, 2024 · 2 comments

Comments

@sanjeev-bhandari

Environment: Google Colab
CUDA Info:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   57C    P8              12W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Code run from: the basic AQLM generation demo

Failing cell and its output:

%%capture
output = quantized_model.generate(tokenizer("", return_tensors="pt")["input_ids"].cuda(), max_new_tokens=10)
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py in _run_ninja_build(build_directory, verbose, error_prefix)
   2095         stdout_fileno = 1
-> 2096         subprocess.run(
   2097             command,

26 frames
CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py in _run_ninja_build(build_directory, verbose, error_prefix)
   2110         if hasattr(error, 'output') and error.output:  # type: ignore[union-attr]
   2111             message += f": {error.output.decode(*SUBPROCESS_DECODE_ARGS)}"  # type: ignore[union-attr]
-> 2112         raise RuntimeError(message) from e
   2113 
   2114 

RuntimeError: Error building extension 'codebook_cuda': [1/3] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=codebook_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++17 -c /usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cu -o cuda_kernel.cuda.o 
FAILED: cuda_kernel.cuda.o 
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=codebook_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++17 -c /usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cu -o cuda_kernel.cuda.o 
/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cu(59): error: no suitable user-defined conversion from "nv_bfloat162" to "__half2" exists
              res2 = __hfma2(a[j], b[j], res2);
                             ^

/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cu(59): error: no suitable user-defined conversion from "nv_bfloat162" to "__half2" exists
              res2 = __hfma2(a[j], b[j], res2);
                                   ^

/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cu(59): error: no suitable user-defined conversion from "nv_bfloat162" to "__half2" exists
              res2 = __hfma2(a[j], b[j], res2);
                                         ^

/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cu(147): error: no suitable user-defined conversion from "__nv_bfloat162" to "__half2" exists
              res2 = __hfma2(__hadd2(a0[j], a1[j]), b[j], res2);
                             ^

/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cu(147): error: no suitable user-defined conversion from "nv_bfloat162" to "__half2" exists
              res2 = __hfma2(__hadd2(a0[j], a1[j]), b[j], res2);
                                                    ^

/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cu(147): error: no suitable user-defined conversion from "nv_bfloat162" to "__half2" exists
              res2 = __hfma2(__hadd2(a0[j], a1[j]), b[j], res2);
                                                          ^

6 errors detected in the compilation of "/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cu".
[2/3] c++ -MMD -MF cuda_kernel.o.d -DTORCH_EXTENSION_NAME=codebook_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c /usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cpp -o cuda_kernel.o 
ninja: build stopped: subcommand failed.
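The six compile errors above all have the same shape: the kernel passes `__nv_bfloat162` values to `__hfma2`, and CUDA only defines that bf16 overload for compute capability 8.0 and newer, while the Colab Tesla T4 in the nvidia-smi dump is capability 7.5. That capability gate can be sketched as a pure function (`supports_bf16_intrinsics` is a hypothetical name for illustration, not part of aqlm):

```python
def supports_bf16_intrinsics(major: int, minor: int) -> bool:
    """True if a GPU of this compute capability provides the bf16 math
    intrinsics (e.g. the __nv_bfloat162 overload of __hfma2)."""
    return (major, minor) >= (8, 0)

# The Tesla T4 above is compute capability 7.5, so the kernel cannot build:
supports_bf16_intrinsics(7, 5)   # -> False
# An A100 (capability 8.0) would compile fine:
supports_bf16_intrinsics(8, 0)   # -> True
```

At runtime the capability can be read with `torch.cuda.get_device_capability()` and checked before attempting to build bf16 kernels.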
@BlackSamorez
Collaborator

Hi @sanjeev-bhandari!
aqlm==1.1.1 has problems compiling on older GPUs (compute capability < 8.0). Please downgrade to 1.1.0 for now; we're planning to fix this error and release a new version soon.
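For readers who want the shape of the fix rather than a downgrade: on pre-Ampere architectures one has to widen bf16 pairs to float before the fused multiply-add. A minimal sketch, where `bf16_fma2` is a hypothetical helper for illustration and not AQLM's actual kernel code:

```cuda
#include <cuda_bf16.h>

// Hypothetical helper (not AQLM's actual fix): fused multiply-add on a
// pair of bfloat16 values that also compiles for pre-Ampere GPUs.
__device__ __nv_bfloat162 bf16_fma2(__nv_bfloat162 a,
                                    __nv_bfloat162 b,
                                    __nv_bfloat162 c) {
#if __CUDA_ARCH__ >= 800
    // sm_80 and newer expose the bf16 overload of __hfma2 directly.
    return __hfma2(a, b, c);
#else
    // Older GPUs (e.g. the T4 above, sm_75): widen to float, use fmaf,
    // then round back to bfloat16.
    float2 fa = __bfloat1622float2(a);
    float2 fb = __bfloat1622float2(b);
    float2 fc = __bfloat1622float2(c);
    return __floats2bfloat162_rn(fmaf(fa.x, fb.x, fc.x),
                                 fmaf(fa.y, fb.y, fc.y));
#endif
}
```

The float fallback costs some throughput but keeps the kernel buildable on capability < 8.0 devices.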

@BlackSamorez
Collaborator

aqlm==1.1.2 has been released, fixing this issue. The 1.1.1 version has been yanked.
