Possible context issue using pyvkfft in a multithreaded/multigpu environment #26

Closed
kkotyk opened this issue Apr 28, 2023 · 13 comments · Fixed by #27


kkotyk commented Apr 28, 2023

Hey Vince, I'm trying to write an app that delegates work to threads to perform FFTs on different GPUs. Each thread manages a separate GPU and has the basic form:

import cupy as cp
from pyvkfft.fft import fftn

def thread_0():
    cp.cuda.Device(0).use()
    while True:
        data = get_data(...)
        gpu_data = cp.array(data)
        fft = fftn(gpu_data)

def thread_1():
    cp.cuda.Device(1).use()
    while True:
        data = get_data(...)
        gpu_data = cp.array(data)
        fft = fftn(gpu_data)

def main():
    spawn_threads(...)
    while True:
        send_data_0_thread0(...)
        send_data_1_thread1(...)

However, pyvkfft is throwing an exception:

Traceback (most recent call last):
  File "/opt/python/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/opt/python/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/leolabs/radar/20230425T163443/leo-radar/radar/sparta/processing/incoherent_worker.py", line 193, in process_thread
    process_dict['results'] = process_incoherent.process_cpi(process_dict['mode'], process_dict['samples'])
  File "/opt/leolabs/radar/20230425T163443/leo-radar/radar/sparta/processing/process_incoherent.py", line 434, in process_cpi
    self._process_range_subset(cpi_data_gpu, tx_pulses, subset_min_range_idx, pulse_group_rising_edge, range_doppler[rstart:rend])
  File "/opt/leolabs/radar/20230425T163443/leo-radar/radar/sparta/processing/process_incoherent.py", line 231, in _process_range_subset
    dm_pulse = dsp.fft_demod_decimate(input_data, self._padded_tx_pulse, rising_edge[pg_idx], min_range_idx,
  File "/opt/leolabs/radar/20230425T163443/leo-radar/radar/sparta/processing/incoherent_dsp_lib.py", line 199, in fft_demod_decimate
    padded_ranges = FFTBackend.fft(padded_ranges, inplace=True, axes=-1)
  File "/opt/leolabs/radar/20230425T163443/leo-radar/radar/sparta/utils/backends.py", line 55, in fft
    return pyvkfft_lib.fftn(input_data, dest=input_data, ndim=1, axes=axes)
  File "/opt/leolabs/radar/20230425T163443/leo-radar/venv3/lib/python3.9/site-packages/pyvkfft/fft.py", line 205, in fftn
    app.fft(src, dest)
  File "/opt/leolabs/radar/20230425T163443/leo-radar/venv3/lib/python3.9/site-packages/pyvkfft/cuda.py", line 208, in fft
    check_vkfft_result(res, src.shape, src.dtype, self.ndim, self.inplace, self.norm, self.r2c,
  File "/opt/leolabs/radar/20230425T163443/leo-radar/venv3/lib/python3.9/site-packages/pyvkfft/base.py", line 425, in check_vkfft_result
    raise RuntimeError("VkFFT error %d: %s %s" % (res, r.name, s))
RuntimeError: VkFFT error 4039: VKFFT_ERROR_FAILED_TO_LAUNCH_KERNEL C2C (10022,525) complex64 1D inplace norm=1 [cuda]
cuLaunchKernel error: 400, 1 10022 1 - 38 1 1

From what I can tell, this is an access issue where the code is trying to access data on the wrong GPU. Is this an issue with how pyvkfft handles contexts in a multi-GPU environment, or am I not setting something up correctly for pyvkfft? From my debugging, it looks like all my other cupy code is respecting the device/stream context. Please let me know if there is any other information I can provide.


vincefn commented Apr 28, 2023

Dear @kkotyk, could you supply a complete self-contained script which allows reproducing the issue?

From what I see you are using the simple fftn interface, which caches the FFT plans. I would not be surprised if the caching mechanism did not correctly handle multiple threads. If you instantiate cuda.VkFFTApp directly in each thread, I suspect it would work.

Alternatively, you can try passing the optional cuda_stream parameter to the fftn function; IIRC the streams should be different for the two devices, so the FFT plan caching should produce separate plans.

(I normally only use multiprocessing, creating the contexts inside each process, to avoid this. Though what you report does look like a bug if other cupy functions manage the device correctly.)
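
A minimal sketch of the per-thread VkFFTApp approach, assuming the (shape, dtype, ndim, inplace) constructor keywords of the released pyvkfft.cuda API:

import cupy as cp
from pyvkfft.cuda import VkFFTApp

def thread_fn(device_num, data):
    cp.cuda.Device(device_num).use()
    gpu_data = cp.array(data)
    # Build the plan explicitly in this thread/device instead of
    # relying on the plan cache shared by the fftn() interface.
    app = VkFFTApp(gpu_data.shape, gpu_data.dtype, ndim=1, inplace=True)
    app.fft(gpu_data, gpu_data)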


kkotyk commented Apr 29, 2023

Here is a minimal example that replicates the issue on 2 different test machines of mine:

import pyvkfft.fft as fft
import cupy as cp
import numpy as np
import threading
import queue
import time

q = queue.Queue()

def thread_fn(device_num):
    cp.cuda.Device(device_num).use()

    while True:
        data = q.get()
        gpu_data = cp.array(data)
        fft_gpu = fft.fftn(gpu_data, dest=gpu_data)
        print(f"processed data on {device_num}")


def main():
    num_devices = cp.cuda.runtime.getDeviceCount()

    threads = []
    for i in range(num_devices):
        thread = threading.Thread(target=thread_fn, args=(i,))
        thread.daemon = True
        thread.start()
        threads.append(thread)

    while True:
        for i in range(num_devices):
            # use a complex dtype for the C2C transform
            data = np.ones(2**13, dtype=np.complex64)
            q.put(data)

        time.sleep(.5)

if __name__ == '__main__':
    main()

I suspect your intuition is right and the app caching is the problem.


vincefn commented May 1, 2023

I can confirm that taking into account the device when caching solves the issue.

Now I just need to finalise the unit tests; it's a bit messy to manipulate GPU contexts through different backends, so I'll probably need to encapsulate everything in separate processes...
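
A minimal sketch of that separate-process approach (illustrative only, with one worker process per device so contexts never mix):

import multiprocessing

def worker(device_num):
    # Import inside the worker so the CUDA context is created in this process
    import cupy as cp
    from pyvkfft.fft import fftn
    with cp.cuda.Device(device_num):
        a = cp.ones(2**13, dtype=cp.complex64)
        fftn(a, dest=a)

if __name__ == '__main__':
    procs = [multiprocessing.Process(target=worker, args=(i,)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()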

vincefn added a commit that referenced this issue May 1, 2023
…the VkFFTApp using the pyvkfft.fft interface (#26).

Actually use the cuda_stream parameter in the pyvkfft.fft interface.

kkotyk commented May 1, 2023

Thanks for looking at this so quickly. I'm not sure if it helps, but as a user I wouldn't mind if you introduced a prepare_threaded_environment(...) or similar method that could help with some of that messiness, so that you don't have to auto-detect or make assumptions.


vincefn commented May 2, 2023

> I'm not sure if it helps, but as a user I wouldn't mind if you introduced a prepare_threaded_environment(...) or similar method that could help with some of that messiness, so that you don't have to auto-detect or make assumptions.

In the case of cupy, this should be easily taken care of using the Device context manager.

Beyond that, I don't think I can provide any more than examples. Multi-GPU computing can easily be quite complicated.
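
For instance, a minimal sketch with the context manager (the function name here is illustrative):

import cupy as cp
from pyvkfft.fft import fftn

def process_on(device_num, data):
    # Everything allocated and launched inside the block uses this device
    with cp.cuda.Device(device_num):
        gpu_data = cp.asarray(data, dtype=cp.complex64)
        return fftn(gpu_data, dest=gpu_data)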


kkotyk commented May 2, 2023

> Multi-GPU computing can easily be quite complicated.

Truth


kkotyk commented May 10, 2023

Hey Vince, I wanted to check out and test your fixes in this branch, but I'm getting the following issue when I try to install with "pip install .":

Failed to build pyvkfft
Installing collected packages: pyvkfft
  Running setup.py install for pyvkfft ... error
  error: subprocess-exited-with-error

  × Running setup.py install for pyvkfft did not run successfully.
  │ exit code: 1
  ╰─> [69 lines of output]
      VKFFT_GIT_TAG in os.environ ? no
      ['pyvkfft-test = pyvkfft.scripts.pyvkfft_test:main', 'pyvkfft-test-suite = pyvkfft.scripts.pyvkfft_test_suite:main', 'pyvkfft-benchmark = pyvkfft.scripts.pyvkfft_benchmark:main']
      running install
      /laptop/dspenv/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        warnings.warn(
      running build
      running build_py
      creating build/lib.linux-x86_64-cpython-39
      creating build/lib.linux-x86_64-cpython-39/pyvkfft
      copying pyvkfft/config.py -> build/lib.linux-x86_64-cpython-39/pyvkfft
      copying pyvkfft/version.py -> build/lib.linux-x86_64-cpython-39/pyvkfft
      copying pyvkfft/benchmark.py -> build/lib.linux-x86_64-cpython-39/pyvkfft
      copying pyvkfft/__init__.py -> build/lib.linux-x86_64-cpython-39/pyvkfft
      copying pyvkfft/cuda.py -> build/lib.linux-x86_64-cpython-39/pyvkfft
      copying pyvkfft/opencl.py -> build/lib.linux-x86_64-cpython-39/pyvkfft
      copying pyvkfft/fft.py -> build/lib.linux-x86_64-cpython-39/pyvkfft
      copying pyvkfft/accuracy.py -> build/lib.linux-x86_64-cpython-39/pyvkfft
      copying pyvkfft/base.py -> build/lib.linux-x86_64-cpython-39/pyvkfft
      creating build/lib.linux-x86_64-cpython-39/pyvkfft/test
      copying pyvkfft/test/__init__.py -> build/lib.linux-x86_64-cpython-39/pyvkfft/test
      copying pyvkfft/test/test_fft.py -> build/lib.linux-x86_64-cpython-39/pyvkfft/test
      creating build/lib.linux-x86_64-cpython-39/pyvkfft/scripts
      copying pyvkfft/scripts/pyvkfft_test_suite.py -> build/lib.linux-x86_64-cpython-39/pyvkfft/scripts
      copying pyvkfft/scripts/__init__.py -> build/lib.linux-x86_64-cpython-39/pyvkfft/scripts
      copying pyvkfft/scripts/pyvkfft_test.py -> build/lib.linux-x86_64-cpython-39/pyvkfft/scripts
      copying pyvkfft/scripts/pyvkfft_benchmark.py -> build/lib.linux-x86_64-cpython-39/pyvkfft/scripts
      running egg_info
      writing pyvkfft.egg-info/PKG-INFO
      writing dependency_links to pyvkfft.egg-info/dependency_links.txt
      writing entry points to pyvkfft.egg-info/entry_points.txt
      writing requirements to pyvkfft.egg-info/requires.txt
      writing top-level names to pyvkfft.egg-info/top_level.txt
      reading manifest file 'pyvkfft.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      warning: no files found matching 'LICENSE_VkFFT'
      warning: no files found matching 'README_VkFFT.md'
      adding license file 'LICENSE'
      writing manifest file 'pyvkfft.egg-info/SOURCES.txt'
      running build_ext
      building 'pyvkfft._vkfft_cuda' extension
      creating build/temp.linux-x86_64-cpython-39
      creating build/temp.linux-x86_64-cpython-39/src
      /usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -Isrc/vkFFT -Isrc -c src/vkfft_cuda.cu -o build/temp.linux-x86_64-cpython-39/src/vkfft_cuda.o -O3 --ptxas-options=-v -std=c++11 --compiler-options=-fPIC
      src/vkFFT.h(3105): warning #550-D: variable "maxSequenceSharedMemoryPow2" was set but never used
      src/vkFFT.h(13969): warning #68-D: integer conversion resulted in a change of sign
      src/vkFFT.h(15317): warning #68-D: integer conversion resulted in a change of sign
      src/vkfft_cuda.cu(97): error: class "VkFFTConfiguration" has no member "omitDimension"
      src/vkfft_cuda.cu(98): error: class "VkFFTConfiguration" has no member "omitDimension"
      src/vkfft_cuda.cu(99): error: class "VkFFTConfiguration" has no member "omitDimension"
      src/vkfft_cuda.cu(103): error: class "VkFFTConfiguration" has no member "performDCT"
      src/vkfft_cuda.cu(115): error: class "VkFFTConfiguration" has no member "keepShaderCode"
      src/vkfft_cuda.cu(127): error: class "VkFFTConfiguration" has no member "performBandwidthBoost"
      src/vkfft_cuda.cu(138): error: class "VkFFTConfiguration" has no member "groupedBatch"
      src/vkfft_cuda.cu(139): error: class "VkFFTConfiguration" has no member "groupedBatch"
      src/vkfft_cuda.cu(140): error: class "VkFFTConfiguration" has no member "groupedBatch"
      9 errors detected in the compilation of "src/vkfft_cuda.cu".
      error: command '/usr/local/cuda/bin/nvcc' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> pyvkfft

I am able to install a fresh release version with "pip install pyvkfft". Am I missing something in my environment?


vincefn commented May 11, 2023

The current git development version has some changes to prepare for a reorganisation of the VkFFT headers (see #25), so I'm assuming this is the issue.

What version of the VkFFT headers are you using? What is in pyvkfft/src/?
Right now I've switched to the develop branch of VkFFT, and in pyvkfft/src there is a symbolic link from pyvkfft/src/vkFFT to vkfft/vkFFT. But it should also work if you still have the old VkFFT single-file header.

I think I may use a git submodule to make this simpler.

vincefn added a commit that referenced this issue May 11, 2023
…27)

* Take into account the current cuda device when automatically caching the VkFFTApp using the pyvkfft.fft interface (#26).
Actually use the cuda_stream parameter in the pyvkfft.fft interface.

* Add multi-gpu, multi-threaded tests. Use pycuda.driver.Context.get_current() as key for caching.
Prevent F-ordered inplace R2C tests with cupy.

vincefn commented May 11, 2023

Hi @kkotyk, I have just merged a change so that VkFFT will automatically be used as a git submodule, which should make installation easier (I suggest re-checking out pyvkfft, otherwise you may have to manually init the VkFFT submodule).
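
For reference, the standard git commands for that:

git clone --recursive https://github.com/vincefn/pyvkfft.git
# or, inside an existing checkout:
git submodule update --init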


kkotyk commented May 11, 2023

That submodule fix works great; I was easily able to install after that. I ran into another issue trying to run the minimal example I posted before (output from the two threads is interleaved):

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/laptop/git/leo-radar/radar/sparta/processing/extras/test_multi_gpu.py", line 16, in thread_fn
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib64/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/laptop/git/leo-radar/radar/sparta/processing/extras/test_multi_gpu.py", line 16, in thread_fn
    fft_gpu = fft.fftn(gpu_data, gpu_data)
  File "/laptop/dspenv/lib64/python3.9/site-packages/pyvkfft/fft.py", line 214, in fftn
    fft_gpu = fft.fftn(gpu_data, gpu_data)
  File "/laptop/dspenv/lib64/python3.9/site-packages/pyvkfft/fft.py", line 214, in fftn
    app = _get_fft_app(backend, src.shape, src.dtype, inplace, ndim, axes, norm, cuda_stream, cl_queue, devctx,
TypeError: unhashable type: 'Stream'
    app = _get_fft_app(backend, src.shape, src.dtype, inplace, ndim, axes, norm, cuda_stream, cl_queue, devctx,
TypeError: unhashable type: 'Stream'

My guess is that this is a hashability issue raised by the lru_cache you are using here.
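
For illustration, a toy example of how an unhashable argument trips functools.lru_cache (the Stream class below is just a stand-in):

from functools import lru_cache

@lru_cache(maxsize=32)
def get_plan(key):
    return key

class Stream:
    # Defining __eq__ without __hash__ makes instances unhashable,
    # which is exactly what the lru_cache key construction chokes on.
    def __eq__(self, other):
        return self is other

get_plan(Stream())  # TypeError: unhashable type: 'Stream'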


vincefn commented May 12, 2023

The minimal example you gave runs fine as far as I can see; I just tested on Linux with cupy_cuda11x-12.0.0 and Python 3.9.

What system are you using?


vincefn commented May 12, 2023

@kkotyk I have changed the way the cuda stream is passed as an argument, so the lru_cache should hopefully work for you now. I'm still curious as to why it failed for you and not for me.
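
(A sketch of the general idea, hedged as an illustration rather than the exact commit: key the cache on a hashable handle derived from the stream instead of the stream object itself; cupy streams expose an integer .ptr attribute.)

def _stream_key(cuda_stream):
    # Streams expose an integer handle (.ptr in cupy); ints are hashable.
    if cuda_stream is None:
        return None
    return getattr(cuda_stream, 'ptr', id(cuda_stream))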


kkotyk commented May 12, 2023

Hey Vince, your new changes work for me! I'm not sure what the issue was, but it works now!

edit:
I'm using cupy-cuda116==10.5.0 and python 3.9
