NVIDIA-SMI 470.82.01 Driver Version: 470.82.01 CUDA Version: 11.4
python3 --version
Python 3.9.5
import spconv
spconv.__version__
'2.2.6'
To reproduce
import spconv.pytorch as spconv
import numpy as np
import torch
num_channels = 32
N = 4
features = torch.rand([N, num_channels])
# Coordinates with shape [N, ndim + 1], batch index must be put in indices[:, 0]
indices_np = np.array([[0, 1, 1, 2], [0, 2, 1, 1], [1, 7, 8, 2], [1, 7, 9, 2]]).astype(np.int32)
indices = torch.from_numpy(indices_np)
spatial_shape = [513, 517, 29]
batch_size = 2
device_idx = 0
x = spconv.SparseConvTensor(features.cuda(device_idx), indices.cuda(device_idx), spatial_shape, batch_size)
conv = spconv.SubMConv3d(num_channels, 16, 3, padding=1, bias=False, indice_key="subm1").cuda(device_idx)
print(conv(x))
Command:
CUDA_LAUNCH_BLOCKING=1 python3 /repro.py
If I run with device_idx = 0, the output is as expected:
<spconv.pytorch.core.SparseConvTensor object at 0x7f9f76e15ca0>
If I run with device_idx = 1, the output is:
[Exception|implicit_gemm_pair]indices=torch.Size([4, 4]),bs=2,ss=[513, 517, 29],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
spconv try to save debug data to /debug.txt, but failed with exception CUDA error: an illegal memory access was encountered. please check your SPCONV_DEBUG_SAVE_PATH
Traceback (most recent call last):
  File "/repro.py", line 17, in <module>
    print(conv(x))
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/spconv/pytorch/conv.py", line 515, in forward
    raise e
  File "/usr/local/lib/python3.9/dist-packages/spconv/pytorch/conv.py", line 492, in forward
    res = ops.get_indice_pairs_implicit_gemm(
  File "/usr/local/lib/python3.9/dist-packages/spconv/pytorch/ops.py", line 546, in get_indice_pairs_implicit_gemm
    SpconvOps.sort_1d_by_key_allocator(pair_mask_tv[j],
RuntimeError: radix_sort: failed on 1st step: cudaErrorIllegalAddress: an illegal memory access was encountered
No debug data was saved to the file given by SPCONV_DEBUG_SAVE_PATH.
To me it looks like it is trying to generate the rules on GPU0 even though all the tensors are on GPU1.
Note that it works fine if we run the same code inside the context manager: with torch.cuda.device(1):
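For anyone hitting the same error, the context-manager workaround can be wrapped in a small helper so the current CUDA device always matches the device of the input tensor. This is only a sketch of the workaround, not a fix for the underlying problem; device_scope and run_on_own_device are hypothetical names, not part of spconv or PyTorch, and the sketch assumes the input exposes its feature tensor as x.features, as SparseConvTensor does.

```python
import contextlib


def device_scope(tensor):
    """Return a context manager that makes the tensor's GPU the current CUDA device.

    For CPU tensors this returns a no-op context.
    (Hypothetical helper, not part of spconv or PyTorch.)
    """
    if not getattr(tensor, "is_cuda", False):
        return contextlib.nullcontext()
    # Imported lazily so the helper can also be loaded on CPU-only setups.
    import torch
    # torch.cuda.device accepts a torch.device or an int index.
    return torch.cuda.device(tensor.device)


def run_on_own_device(module, x):
    """Call module(x) with the current CUDA device set to the device of x.features."""
    with device_scope(x.features):
        return module(x)
```

With the reproduction script above, this would be called as run_on_own_device(conv, x) in place of conv(x), which should behave like wrapping the call in with torch.cuda.device(1): when x lives on GPU1.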
dddab changed the title from "Sparse Conv only works on GPU0" to "Sparse Conv only works on GPU0 without context manager." on Dec 17, 2022