Hi @ptillet @Jokeren @ThomasRaoux @jlebar
env

```
sys.platform: linux
Python: 3.9.16 (main, Aug 15 2023, 19:38:56) [GCC 8.3.1 20190311 (Red Hat 8.3.1-3)]
CUDA available: True
MUSA available: False
GPU 0,1: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: gcc (GCC) 10.2.1 20210130 (Red Hat 10.2.1-11)
PyTorch: 2.2.2+cu118
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.2 (Git Hash 2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.2 (built against CUDA 12.1)
    - Built with CuDNN 8.7
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF
transformers: 4.40.0
pydantic: 2.6.0
triton: 2.2.0
```
reproduce

```python
import torch
import triton
import triton.language as tl


@triton.jit
def _add_kernel(A, B, C, size, BLOCK: tl.constexpr):
    prog_id = tl.program_id(0)
    offs = prog_id * BLOCK + tl.arange(0, BLOCK)
    a = tl.load(A + offs, mask=offs < size)
    b = tl.load(B + offs, mask=offs < size)
    tl.store(C + offs, a + b, mask=offs < size)


def custom_add(a, b):
    c = torch.empty_like(a)
    size = c.size(0)
    BLOCK = 16
    grid = [triton.cdiv(size, BLOCK)]
    _add_kernel[grid](a, b, c, size, BLOCK=BLOCK)
    return c


def check_env_triton():
    try:
        a = torch.tensor([1, 2], device='cuda')
        b = a.new_tensor([3, 4], device='cuda')
        c = custom_add(a, b)
    except Exception as e:
        print(e)


check_env_triton()
```
error

```
Triton Error [CUDA]: device kernel image is invalid
```
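For what it's worth, this error is often caused by a kernel image compiled for a different GPU architecture than the one it is launched on, or by a stale Triton cache (e.g. `~/.triton/cache`) left over from a different CUDA/driver setup. A minimal sketch for collecting the versions involved before filing or debugging; the helper name `report_cuda_env` is my own, not part of any library:

```python
def report_cuda_env():
    """Collect torch/triton versions and the GPU compute capability, if available."""
    info = {}
    try:
        import torch
        info["torch"] = torch.__version__
        info["torch_cuda"] = torch.version.cuda  # CUDA version torch was built with
        if torch.cuda.is_available():
            # Compute capability of GPU 0, e.g. (8, 0) for A100
            info["capability"] = torch.cuda.get_device_capability(0)
    except ImportError:
        info["torch"] = None
    try:
        import triton
        info["triton"] = triton.__version__
    except ImportError:
        info["triton"] = None
    return info


if __name__ == "__main__":
    for key, value in report_cuda_env().items():
        print(f"{key}: {value}")
```

If the reported compute capability is not covered by the kernel image (or clearing the Triton cache changes the behavior), that would point at a build/arch mismatch rather than a bug in the kernel itself.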
If you need more detailed information, please feel free to contact me at any time. Thanks.
ref InternLM/lmdeploy#1621 (comment)