{{#include interop_excerpts.md:1:4}}
ArrayFire manages its own memory, runs within its own CUDA stream, and creates custom IDs for devices. As such, most of the interoperability functions focus on reducing potential synchronization conflicts between ArrayFire and CUDA.
{{#include interop_excerpts.md:6:7}}
Function | Purpose
---|---
`Array::new_from_device_ptr` | Construct an ArrayFire `Array` from device memory
`Array::device_ptr` | Obtain a pointer to the device memory (implies `lock`)
`Array::lock` | Removes ArrayFire's control of a device memory pointer
`Array::unlock` | Restores ArrayFire's control over a device memory pointer
`get_device` | Gets the current ArrayFire device ID
`set_device` | Switches ArrayFire to the specified device
`get_device_native_id` | Fetches the CUDA device ID for a given ArrayFire device ID
`set_device_native_id` | Switches the active device to the specified CUDA device ID
`get_stream` | Gets the current CUDA stream used by ArrayFire
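For instance, translating between ArrayFire and CUDA device IDs might look like the following sketch. It assumes the `arrayfire` crate and the interop functions from the table (provided by this repository's `af-cuda-interop` crate) are in scope; it is illustrative, not a complete program.

```rust
// Sketch: ArrayFire device IDs are not guaranteed to match CUDA's own
// enumeration, so translate before calling raw CUDA APIs.
use arrayfire::{get_device, set_device};
use af_cuda_interop::{get_device_native_id, get_stream};

fn main() {
    set_device(0);                             // ArrayFire device 0 ...
    let af_id = get_device();
    let cuda_id = get_device_native_id(af_id); // ... may be a different CUDA ID
    println!("ArrayFire device {} is CUDA device {}", af_id, cuda_id);

    // The raw CUDA stream that ArrayFire enqueues its work on:
    let _stream = get_stream(af_id);
}
```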
By default, ArrayFire manages its own memory and operates in its own CUDA stream, so a small amount of bookkeeping is needed to integrate your custom CUDA kernel.
Ideally, we recommend launching your custom kernels on ArrayFire's CUDA stream. However, this is currently not possible due to a limitation in RustaCUDA: it cannot wrap an existing `cudaStream_t`/`CUstream_t` object. The current workaround is to create a stream of your own and launch the kernel on it.
Notice that since ArrayFire and your kernels do not share the same CUDA stream, you must synchronize explicitly before launching a kernel on your stream that depends on computation ArrayFire carried out earlier. This extra step becomes unnecessary once the above-stated limitation of RustaCUDA's stream is eliminated.
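A minimal sketch of that synchronization handshake, assuming `arrayfire` is imported as `af` and a separately created RustaCUDA stream named `my_stream` already exists (both are assumptions for illustration):

```rust
// Sketch: drain ArrayFire's stream before using its results from our own.
let a = af::randu::<f32>(af::dim4!(10, 10));
a.eval();                    // make sure the computation is enqueued
af::sync(af::get_device());  // block until ArrayFire's stream has finished

// Only now is it safe to launch a custom kernel on `my_stream` that reads
// `a`'s device memory (obtained via `a.device_ptr()`).
```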
This process is best illustrated with a fully worked example:
{{#include ../../cuda-interop/examples/custom_kernel.rs}}
{{#include interop_excerpts.md:9:15}}
- Finish any pending CUDA operations (e.g. `cudaDeviceSynchronize()` or similar stream functions)
- Create ArrayFire arrays from existing CUDA pointers
- Perform operations on ArrayFire arrays
- Instruct ArrayFire to finish operations using `eval` and `sync`
- Obtain pointers to important memory
- Continue your CUDA application
- Free non-managed memory
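Put together, the steps above can be sketched as the following skeleton. The names `stream` and `dptr` are placeholders, and the APIs are assumed from the table earlier; this is an outline, not a complete program.

```rust
// 1. Finish pending CUDA work before handing pointers to ArrayFire.
stream.synchronize().unwrap();

// 2. Wrap an existing CUDA pointer in an ArrayFire Array.
let x = Array::new_from_device_ptr(dptr, dim4!(10, 10));

// 3. Perform ArrayFire operations.
let y = &x + &x;

// 4. Make sure ArrayFire has finished.
y.eval();
af::sync(af::get_device());

// 5. Obtain the device pointer back (this locks the Array).
let y_ptr = unsafe { y.device_ptr() };

// 6. Continue the CUDA application using `y_ptr` ...

// 7. Free any memory ArrayFire does not manage.
```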
To create an `Array` from a device pointer, use one of the following approaches:
1. Using `DeviceBuffer` from RustaCUDA, or a wrapper object for CUDA device memory:

```rust
let mut buffer = memory::DeviceBuffer::from_slice(&v).unwrap();

let array_dptr = Array::new_from_device_ptr(
    buffer.as_device_ptr().as_raw_mut(), dim4!(10, 10));

array_dptr.lock(); // Prevent ArrayFire from freeing memory that `buffer`
                   // still owns; avoids a double free.
```
2. Using a raw pointer returned from the `cuda_malloc` interface exposed by RustaCUDA:

```rust
let mut dptr: *mut f32 = std::ptr::null_mut();
unsafe {
    dptr = memory::cuda_malloc::<f32>(10 * 10).unwrap().as_raw_mut();
}

let array_dptr = Array::new_from_device_ptr(dptr, dim4!(10, 10));
// After ArrayFire takes ownership of the pointer, you can use other
// ArrayFire functions as usual.
```
ArrayFire's memory manager automatically assumes responsibility for any memory provided to it, so ArrayFire could free or reuse that memory at any later time. If this behavior is not desired, you may call `Array::lock` (which, per the table above, removes ArrayFire's control of the pointer) and manage the memory yourself. However, if you do so, be careful not to free the memory while ArrayFire might still be using it!
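Keeping ownership on the application side might look like this sketch, where `lock` removes ArrayFire's control of the pointer as described in the table above (`dptr` is a placeholder for a previously allocated device pointer):

```rust
// Sketch: stop ArrayFire's memory manager from freeing a buffer.
let array = Array::new_from_device_ptr(dptr, dim4!(10, 10));
array.lock();                // ArrayFire will no longer free or reuse dptr
// ... ArrayFire operations involving `array` ...
af::sync(af::get_device());  // ensure no kernel still touches the buffer
// Freeing `dptr` (e.g. via RustaCUDA) is now the application's job.
```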
The seven steps above are best illustrated with a fully worked example:
{{#include ../../cuda-interop/examples/cuda_af_app.rs}}