{{#include interop_excerpts.md:1:4}}
ArrayFire manages its own memory, runs within its own CUDA stream, and creates custom IDs for devices. As such, most of the interoperability functions focus on reducing potential synchronization conflicts between ArrayFire and CUDA.
{{#include interop_excerpts.md:6:7}}
Function | Purpose
---|---
`Array::new_from_device_ptr` | Construct an ArrayFire `Array` from device memory
`Array::device_ptr` | Obtain a pointer to the device memory (implies `lock`)
`Array::lock` | Removes ArrayFire's control of a device memory pointer
`Array::unlock` | Restores ArrayFire's control over a device memory pointer
`get_device` | Gets the current ArrayFire device ID
`set_device` | Switches ArrayFire to the specified device
`get_device_native_id` | Fetches the CUDA device ID for a given ArrayFire device ID
`set_device_native_id` | Switches the active device to the specified CUDA device ID
`get_stream` | Gets the current CUDA stream used by ArrayFire
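For instance, translating between ArrayFire and CUDA device IDs might look like the following sketch. It assumes the `arrayfire` crate and the interop functions from the table (provided by this repository's `af-cuda-interop` crate) are in scope; it is illustrative, not a complete program.

```rust
// Sketch: ArrayFire device IDs are not guaranteed to match CUDA's own
// enumeration, so translate before calling raw CUDA APIs.
use arrayfire::{get_device, set_device};
use af_cuda_interop::{get_device_native_id, get_stream};

fn main() {
    set_device(0);                             // ArrayFire device 0 ...
    let af_id = get_device();
    let cuda_id = get_device_native_id(af_id); // ... may be a different CUDA ID
    println!("ArrayFire device {} is CUDA device {}", af_id, cuda_id);

    // The raw CUDA stream that ArrayFire enqueues its work on:
    let _stream = get_stream(af_id);
}
```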
By default, ArrayFire manages its own memory and operates in its own CUDA stream, so a small amount of bookkeeping is needed to integrate your custom CUDA kernel.
Ideally, we recommend launching your custom kernels on ArrayFire's CUDA stream. However, this is currently not possible due to a limitation in RustaCUDA: it cannot wrap an existing `cudaStream_t`/`CUstream_t` object. The current workaround is to create a stream of your own and launch the kernel on it.
Notice that since ArrayFire and your kernels do not share the same CUDA stream, you must synchronize explicitly before launching a kernel on your stream that depends on computation ArrayFire carried out earlier. This extra step becomes unnecessary once the above-stated limitation of RustaCUDA's stream is eliminated.
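A minimal sketch of that synchronization handshake, assuming `arrayfire` is imported as `af` and a separately created RustaCUDA stream named `my_stream` already exists (both are assumptions for illustration):

```rust
// Sketch: drain ArrayFire's stream before using its results from our own.
let a = af::randu::<f32>(af::dim4!(10, 10));
a.eval();                    // make sure the computation is enqueued
af::sync(af::get_device());  // block until ArrayFire's stream has finished

// Only now is it safe to launch a custom kernel on `my_stream` that reads
// `a`'s device memory (obtained via `a.device_ptr()`).
```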
This process is best illustrated with a fully worked example:
{{#include ../../cuda-interop/examples/custom_kernel.rs}}
{{#include interop_excerpts.md:9:15}}
- Finish any pending CUDA operations (e.g. `cudaDeviceSynchronize()` or similar stream functions)
- Create ArrayFire arrays from existing CUDA pointers
- Perform operations on ArrayFire arrays
- Instruct ArrayFire to finish operations using `eval` and `sync`
- Obtain pointers to important memory
- Continue your CUDA application
- Free non-managed memory
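Put together, the steps above can be sketched as the following skeleton. The names `stream` and `dptr` are placeholders, and the APIs are assumed from the table earlier; this is an outline, not a complete program.

```rust
// 1. Finish pending CUDA work before handing pointers to ArrayFire.
stream.synchronize().unwrap();

// 2. Wrap an existing CUDA pointer in an ArrayFire Array.
let x = Array::new_from_device_ptr(dptr, dim4!(10, 10));

// 3. Perform ArrayFire operations.
let y = &x + &x;

// 4. Make sure ArrayFire has finished.
y.eval();
af::sync(af::get_device());

// 5. Obtain the device pointer back (this locks the Array).
let y_ptr = unsafe { y.device_ptr() };

// 6. Continue the CUDA application using `y_ptr` ...

// 7. Free any memory ArrayFire does not manage.
```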
To create an `Array` from a device pointer, use one of the following approaches:
1. Using `DeviceBuffer` from RustaCUDA, or a wrapper object for CUDA device memory:

```rust
let mut buffer = memory::DeviceBuffer::from_slice(&v).unwrap();

let array_dptr = Array::new_from_device_ptr(
    buffer.as_device_ptr().as_raw_mut(), dim4!(10, 10));

array_dptr.lock(); // Prevent ArrayFire from freeing memory that `buffer`
                   // still owns; avoids a double free.
```
2. Using a raw pointer returned from the `cuda_malloc` interface exposed by RustaCUDA:

```rust
let mut dptr: *mut f32 = std::ptr::null_mut();
unsafe {
    dptr = memory::cuda_malloc::<f32>(10 * 10).unwrap().as_raw_mut();
}

let array_dptr = Array::new_from_device_ptr(dptr, dim4!(10, 10));
// After ArrayFire takes ownership of the pointer, you can use other
// ArrayFire functions as usual.
```
ArrayFire's memory manager automatically assumes responsibility for any memory provided to it, so ArrayFire could free or reuse that memory at any later time. If this behavior is not desired, you may call `Array::lock` (which, per the table above, removes ArrayFire's control of the pointer) and manage the memory yourself. However, if you do so, be careful not to free the memory while ArrayFire might still be using it!
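Keeping ownership on the application side might look like this sketch, where `lock` removes ArrayFire's control of the pointer as described in the table above (`dptr` is a placeholder for a previously allocated device pointer):

```rust
// Sketch: stop ArrayFire's memory manager from freeing a buffer.
let array = Array::new_from_device_ptr(dptr, dim4!(10, 10));
array.lock();                // ArrayFire will no longer free or reuse dptr
// ... ArrayFire operations involving `array` ...
af::sync(af::get_device());  // ensure no kernel still touches the buffer
// Freeing `dptr` (e.g. via RustaCUDA) is now the application's job.
```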
The seven steps above are best illustrated with a fully worked example:
{{#include ../../cuda-interop/examples/cuda_af_app.rs}}