
RuntimeError: Error building extension 'codebook_cuda': [1/3] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output cuda_kernel.cuda.o.d #52

Closed
sanjeev-bhandari opened this issue Mar 14, 2024 · 2 comments

Comments

@sanjeev-bhandari

Environment: Google Colab
CUDA Info:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   57C    P8              12W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Code run from: the basic AQLM generation demo

Failing cell and its output:

%%capture
output = quantized_model.generate(tokenizer("", return_tensors="pt")["input_ids"].cuda(), max_new_tokens=10)
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py in _run_ninja_build(build_directory, verbose, error_prefix)
   2095         stdout_fileno = 1
-> 2096         subprocess.run(
   2097             command,

26 frames
CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py in _run_ninja_build(build_directory, verbose, error_prefix)
   2110         if hasattr(error, 'output') and error.output:  # type: ignore[union-attr]
   2111             message += f": {error.output.decode(*SUBPROCESS_DECODE_ARGS)}"  # type: ignore[union-attr]
-> 2112         raise RuntimeError(message) from e
   2113 
   2114 

RuntimeError: Error building extension 'codebook_cuda': [1/3] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=codebook_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++17 -c /usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cu -o cuda_kernel.cuda.o 
FAILED: cuda_kernel.cuda.o 
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=codebook_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++17 -c /usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cu -o cuda_kernel.cuda.o 
/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cu(59): error: no suitable user-defined conversion from "nv_bfloat162" to "__half2" exists
              res2 = __hfma2(a[j], b[j], res2);
                             ^

/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cu(59): error: no suitable user-defined conversion from "nv_bfloat162" to "__half2" exists
              res2 = __hfma2(a[j], b[j], res2);
                                   ^

/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cu(59): error: no suitable user-defined conversion from "nv_bfloat162" to "__half2" exists
              res2 = __hfma2(a[j], b[j], res2);
                                         ^

/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cu(147): error: no suitable user-defined conversion from "__nv_bfloat162" to "__half2" exists
              res2 = __hfma2(__hadd2(a0[j], a1[j]), b[j], res2);
                             ^

/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cu(147): error: no suitable user-defined conversion from "nv_bfloat162" to "__half2" exists
              res2 = __hfma2(__hadd2(a0[j], a1[j]), b[j], res2);
                                                    ^

/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cu(147): error: no suitable user-defined conversion from "nv_bfloat162" to "__half2" exists
              res2 = __hfma2(__hadd2(a0[j], a1[j]), b[j], res2);
                                                          ^

6 errors detected in the compilation of "/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cu".
[2/3] c++ -MMD -MF cuda_kernel.o.d -DTORCH_EXTENSION_NAME=codebook_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c /usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.cpp -o cuda_kernel.o 
ninja: build stopped: subcommand failed.
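The six compile errors above all have the same shape: the kernel passes `__nv_bfloat162` values to `__hfma2`, and CUDA only defines that bf16 overload for compute capability 8.0 and newer, while the Colab Tesla T4 in the nvidia-smi dump is capability 7.5. That capability gate can be sketched as a pure function (`supports_bf16_intrinsics` is a hypothetical name for illustration, not part of aqlm):

```python
def supports_bf16_intrinsics(major: int, minor: int) -> bool:
    """True if a GPU of this compute capability provides the bf16 math
    intrinsics (e.g. the __nv_bfloat162 overload of __hfma2)."""
    return (major, minor) >= (8, 0)

# The Tesla T4 above is compute capability 7.5, so the kernel cannot build:
supports_bf16_intrinsics(7, 5)   # -> False
# An A100 (capability 8.0) would compile fine:
supports_bf16_intrinsics(8, 0)   # -> True
```

At runtime the capability can be read with `torch.cuda.get_device_capability()` and checked before attempting to build bf16 kernels.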
@BlackSamorez
Collaborator

Hi @sanjeev-bhandari!
aqlm==1.1.1 has problems compiling on older GPUs (compute capability < 8.0). Please downgrade to 1.1.0 for now; we're planning to fix this error and release a new version soon.
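For readers who want the shape of the fix rather than a downgrade: on pre-Ampere architectures one has to widen bf16 pairs to float before the fused multiply-add. A minimal sketch, where `bf16_fma2` is a hypothetical helper for illustration and not AQLM's actual kernel code:

```cuda
#include <cuda_bf16.h>

// Hypothetical helper (not AQLM's actual fix): fused multiply-add on a
// pair of bfloat16 values that also compiles for pre-Ampere GPUs.
__device__ __nv_bfloat162 bf16_fma2(__nv_bfloat162 a,
                                    __nv_bfloat162 b,
                                    __nv_bfloat162 c) {
#if __CUDA_ARCH__ >= 800
    // sm_80 and newer expose the bf16 overload of __hfma2 directly.
    return __hfma2(a, b, c);
#else
    // Older GPUs (e.g. the T4 above, sm_75): widen to float, use fmaf,
    // then round back to bfloat16.
    float2 fa = __bfloat1622float2(a);
    float2 fb = __bfloat1622float2(b);
    float2 fc = __bfloat1622float2(c);
    return __floats2bfloat162_rn(fmaf(fa.x, fb.x, fc.x),
                                 fmaf(fa.y, fb.y, fc.y));
#endif
}
```

The float fallback costs some throughput but keeps the kernel buildable on capability < 8.0 devices.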

@BlackSamorez
Collaborator

aqlm==1.1.2 has been released, fixing this issue. The 1.1.1 version has been yanked.
