Sparse Conv only works on GPU0 without context manager. #544

Open
dddab opened this issue Dec 17, 2022 · 0 comments

dddab commented Dec 17, 2022

NVIDIA-SMI 470.82.01 Driver Version: 470.82.01 CUDA Version: 11.4

python3 --version
Python 3.9.5

import spconv
spconv.__version__
'2.2.6'

To reproduce

import numpy as np
import spconv.pytorch as spconv
import torch

num_channels = 32
N = 4

features = torch.rand([N, num_channels])
# Coordinates with shape [N, ndim + 1], batch index must be put in indices[:, 0]
indices_np = np.array([[0, 1, 1, 2], [0, 2, 1, 1], [1, 7, 8, 2], [1, 7, 9, 2]]).astype(np.int32)
indices = torch.from_numpy(indices_np)
spatial_shape = [513, 517, 29]
batch_size = 2
device_idx = 0
x = spconv.SparseConvTensor(features.cuda(device_idx), indices.cuda(device_idx), spatial_shape, batch_size)
conv = spconv.SubMConv3d(num_channels, 16, 3, padding=1, bias=False, indice_key="subm1").cuda(device_idx)
print(conv(x))

Command: CUDA_LAUNCH_BLOCKING=1 python3 /repro.py

If I run with device_idx = 0, the output is as expected:

<spconv.pytorch.core.SparseConvTensor object at 0x7f9f76e15ca0>

If I run with device_idx = 1, the output is:

[Exception|implicit_gemm_pair]indices=torch.Size([4, 4]),bs=2,ss=[513, 517, 29],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
spconv try to save debug data to /debug.txt, but failed with exception CUDA error: an illegal memory access was encountered. please check your SPCONV_DEBUG_SAVE_PATH
Traceback (most recent call last):
  File "/repro.py", line 17, in <module>
    print(conv(x))
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/spconv/pytorch/conv.py", line 515, in forward
    raise e
  File "/usr/local/lib/python3.9/dist-packages/spconv/pytorch/conv.py", line 492, in forward
    res = ops.get_indice_pairs_implicit_gemm(
  File "/usr/local/lib/python3.9/dist-packages/spconv/pytorch/ops.py", line 546, in get_indice_pairs_implicit_gemm
    SpconvOps.sort_1d_by_key_allocator(pair_mask_tv[j],
RuntimeError: radix_sort: failed on 1st step: cudaErrorIllegalAddress: an illegal memory access was encountered

No debug data was saved to the path given by SPCONV_DEBUG_SAVE_PATH.

To me, it looks like it is trying to compute the rules on GPU0 even though all the tensors are on GPU1.
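
One way to check this hypothesis is to compare the device a tensor lives on with PyTorch's current CUDA device. The sketch below is illustrative only: `on_current_device` is a hypothetical helper name, not part of spconv or PyTorch, and it assumes the failure comes from work being launched on the current device rather than on the tensor's device.

```python
import torch


def on_current_device(tensor: torch.Tensor) -> bool:
    """Return True if `tensor` lives on PyTorch's current CUDA device.

    CPU tensors are reported as True because no CUDA device is involved.
    Hypothetical helper for illustration; not part of spconv or PyTorch.
    """
    if not tensor.is_cuda:
        return True
    return tensor.device.index == torch.cuda.current_device()
```

With the reproduction above and device_idx = 1, `on_current_device(x.features)` would be expected to return False outside the context manager and True inside `with torch.cuda.device(1):`.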

Note that it works fine if we wrap the call in the context manager `with torch.cuda.device(1):`.
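
The workaround can be sketched as follows. `forward_on_device` is a hypothetical wrapper name; the only behavior taken from this report is that running under `torch.cuda.device(1)` avoids the error. The fallback to a no-op context on machines without CUDA is an assumption added so the snippet runs anywhere.

```python
import contextlib

import torch


def forward_on_device(module, x, device_idx):
    """Call module(x) with `device_idx` set as the current CUDA device.

    Hypothetical helper for illustration. Without CUDA it falls back to a
    no-op context so the call still runs on CPU.
    """
    if torch.cuda.is_available():
        # torch.cuda.device restores the previous current device on exit.
        scope = torch.cuda.device(device_idx)
    else:
        scope = contextlib.nullcontext()
    with scope:
        return module(x)
```

In the reproduction, `print(forward_on_device(conv, x, device_idx))` would replace `print(conv(x))`. Calling `torch.cuda.set_device(device_idx)` once at startup may have a similar effect, but this report only confirms the context manager.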

dddab changed the title from "Sparse Conv only works on GPU0" to "Sparse Conv only works on GPU0 without context manager." on Dec 17, 2022