The alpaka library is a header-only C++20 abstraction library for accelerator development.
This is a prototype implementation to evaluate different concepts for the host-side API, kernel language, ... The code is NOT production ready! Currently, I do not follow coding standards and provide updates via pull requests. The history of the development branch may be rewritten via force pushes.
alpaka is licensed under MPL-2.0.
The recipes shown here assume that you have installed the spack packages for the specific compiler versions and that the alpaka source is available at ../alpaka relative to the build folder.
- alpaka_DEP_* controls whether a parallelization framework is used and introduces a dependency on third-party libraries.
- alpaka_EXEC_* activates or deactivates which execution schemas will be used for examples.
  - Execution schemas can be set to OFF in CMake, but you can still use them within your application code.
  - Similarly, an execution schema can be set to ON, but it may not be usable in the application if the API where the executor can be used is deactivated.
spack load gcc@14.1.0
spack load cmake@3.29.1
# -Dalpaka_DEP_OMP=ON is set implicitly; if the compiler does not support OpenMP, only serial code will be generated
cmake ../alpaka -Dalpaka_TESTING=ON -Dalpaka_BENCHMARKS=ON -Dalpaka_EXAMPLES=ON -DBUILD_TESTING=ON
make -j
ctest --output-on-failure
spack load cmake@3.29.1
spack load cuda@12.4.0
# use -DCMAKE_CUDA_ARCHITECTURES=80 to set the GPU architecture
cmake ../alpaka -Dalpaka_TESTING=ON -Dalpaka_BENCHMARKS=ON -Dalpaka_EXAMPLES=ON -Dalpaka_DEP_OMP=OFF -Dalpaka_DEP_CUDA=ON -Dalpaka_EXEC_CpuSerial=OFF
make -j
ctest --output-on-failure
spack load cmake@3.29.1
spack load hip@6.3.4
export CXX=clang++
# use -DCMAKE_HIP_ARCHITECTURES=gfx906 to set the GPU architecture
# with older CMake versions, the architecture sometimes must be set with -DAMDGPU_TARGETS=gfx906
cmake ../alpaka -Dalpaka_TESTING=ON -Dalpaka_BENCHMARKS=ON -Dalpaka_EXAMPLES=ON -Dalpaka_DEP_OMP=OFF -Dalpaka_DEP_HIP=ON -Dalpaka_EXEC_CpuSerial=OFF
make -j
ctest --output-on-failure
If you want to run benchmarks, set at least the following CMake variables:
-DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-ftree-vectorize -march=native"
It is best to deselect the CPU executor CpuOmpBlocksAndThreads with -Dalpaka_EXEC_CpuOmpBlocksAndThreads=OFF.
This executor uses nested parallelism and is very slow.
You can benchmark BabelStream for different numbers of elements, e.g., with a simple loop:
for((i=1;i<10;++i)) ; do ./benchmark/babelstream/babelstream --array-size=$((33554432 * $i)) --number-runs=100; done
All methods and classes in the alpaka namespace can be called from the controller thread (named host) and from the compute device.
- alpaka::onHost can only be called from host.
- alpaka::onAcc can only be called from within a kernel running on the compute device.
Methods starting with onHost::make (e.g., onHost::makeDevice()) create handles to instances; copying such a handle is a shallow copy, not a deep copy.
Methods starting with get (e.g., onHost::getDeviceProperties(...)) provide access to properties of an instance.
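The shallow-copy behavior of handles can be illustrated with a small standalone sketch. It does not use alpaka; the `sketch` namespace, `DeviceState`, `Device`, and the `makeDevice(int)` signature are hypothetical stand-ins, and the use of `std::shared_ptr` is only an assumption about how such a handle could be built.

```cpp
#include <cassert>
#include <memory>

namespace sketch
{
    // Hypothetical state that lives once, independent of how many handles exist.
    struct DeviceState
    {
        int index;
        int enqueuedTasks = 0;
    };

    // Hypothetical handle: copying it copies the pointer, not the state behind it.
    class Device
    {
    public:
        explicit Device(int index) : m_state(std::make_shared<DeviceState>(DeviceState{index}))
        {
        }

        void recordTask() { ++m_state->enqueuedTasks; }
        int enqueuedTasks() const { return m_state->enqueuedTasks; }
        bool sharesStateWith(Device const& other) const { return m_state == other.m_state; }

    private:
        std::shared_ptr<DeviceState> m_state;
    };

    // Factory in the style of the make* functions (name and signature are assumptions).
    inline Device makeDevice(int index)
    {
        return Device{index};
    }
} // namespace sketch
```

In this sketch, a change made through a copied handle is visible through the original handle, because both refer to the same state.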
There are two types of interfaces: a free function interface and an OOP interface for many host objects (e.g., Platform, Device, and Queue).
If you use the free function interface, auto platform = onHost::makePlatform(api::cpu) will return an instance that follows the concepts::Platform concept, but it can only be used in free functions.
If you use the OOP interface, where you can access members like platform.getDevice(...), you transform the instance into a fixed-typed object with onHost::Platform platform = onHost::makePlatform(api::cpu).
Most free functions that can be called from host can be found under onHost.hpp.
Functions callable from within a compute kernel can be found under onAcc.hpp.
A central class for M-dimensional extents, offsets, and indices is Vec.
There are two types of index vectors: Vec, which supports constexpr usage but stores its values in a runtime instance when passed around, and CVec, which is a compile-time index vector that stores the indices in the template signature.
Passing an instance of CVec into a function or kernel will retain the full compile-time knowledge.
Performing calculations like addition, subtraction, etc., with a CVec will result in losing the full compile-time knowledge, and the results will be of type Vec.
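The distinction between the two vector kinds can be illustrated with a small standalone sketch. The types `sketch::Vec` and `sketch::CVec` and the helper `makeTile` below are hypothetical stand-ins, not alpaka's classes; they only show how values stored in a template signature stay usable as constants inside a function, and how an arithmetic operator can return a runtime-style vector instead.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace sketch
{
    // Runtime-style vector: the values live in the object.
    template<typename T, std::size_t N>
    struct Vec
    {
        std::array<T, N> values;

        constexpr T operator[](std::size_t i) const
        {
            return values[i];
        }
    };

    // Compile-time vector: the values live in the type, the object is empty.
    template<typename T, T... Values>
    struct CVec
    {
    };

    // The extents stay constant expressions inside the function, so they can
    // size an array even though they arrived as a function argument.
    template<typename T, T... Values>
    constexpr auto makeTile(CVec<T, Values...>)
    {
        return std::array<T, static_cast<std::size_t>((T{1} * ... * Values))>{};
    }

    // Arithmetic returns the runtime-style type, so the result type no longer
    // encodes the values.
    template<typename T, T... A, T... B>
    constexpr auto operator+(CVec<T, A...>, CVec<T, B...>)
    {
        return Vec<T, sizeof...(A)>{{static_cast<T>(A + B)...}};
    }

    static_assert(std::is_empty_v<CVec<int, 4, 8>>);
    static_assert(makeTile(CVec<int, 4, 8>{}).size() == 32);
    static_assert(std::is_same_v<decltype(CVec<int, 1, 2>{} + CVec<int, 3, 4>{}), Vec<int, 2>>);
} // namespace sketch
```

The `static_assert` lines check the three properties named above: the compile-time vector carries no runtime data, its values can drive a compile-time array size, and the sum of two such vectors has the runtime vector type.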
alpaka is designed so that explicit usage of types is reduced to a minimum.
Most objects should be created with factories (e.g., onHost::makePlatform(api::cpu)) by passing tags (empty C++ structs), such as api::cpu, instead of naming the tag type.
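The tag-based style can be sketched in plain C++ as follows. All names in the `sketch` namespace are hypothetical and only mirror the spelling used in this README; the example shows the general technique of passing an `inline constexpr` tag object to a factory so that user code never has to spell a template argument.

```cpp
#include <cassert>
#include <string_view>

namespace sketch
{
    namespace api
    {
        // Tag types are empty structs; the constexpr objects are what users pass around.
        struct Cpu
        {
        };

        struct Cuda
        {
        };

        inline constexpr Cpu cpu{};
        inline constexpr Cuda cuda{};
    } // namespace api

    // Overloads selected by the tag type give each API its own behavior.
    constexpr std::string_view name(api::Cpu)
    {
        return "cpu";
    }

    constexpr std::string_view name(api::Cuda)
    {
        return "cuda";
    }

    template<typename TApi>
    struct Platform
    {
        TApi api;
    };

    // The factory deduces the tag type from the tag object.
    template<typename TApi>
    constexpr auto makePlatform(TApi api)
    {
        return Platform<TApi>{api};
    }
} // namespace sketch
```

A call such as `auto platform = sketch::makePlatform(sketch::api::cpu);` reads like passing a value, while the compiler still selects everything by type.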
alpaka provides APIs that can be used to generate platforms and query devices.
The following APIs are available:
- api::cpu
- api::cuda
- api::hip
APIs other than api::cpu often introduce third-party library dependencies (e.g., CUDA or ROCm).
You can activate or deactivate these in CMake via alpaka_DEP_*.
Executors describe how compute threads will be executed and mapped to the hierarchy of grids, blocks, and threads.
They can be controlled in CMake via alpaka_EXEC_*.
Disabling an executor in CMake only changes which executors will be used for examples, tests, and benchmarks.
For example, if you disable alpaka_EXEC_CpuSerial in CMake, you can still enqueue kernels that use the serial executor.
queue.enqueue(exec::cpuSerial, Vec{3}, Vec{1}, kernel, 42);
Not every executor is usable with every device queue. You can check this with onHost::isExecutorSupportedBy(exec::cpuSerial, device).
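How such a support check could work at compile time is sketched below in plain C++. The device types, the `gpuCuda` tag, and the `supports` trait are hypothetical and exist only in this example; the sketch is one possible design under these assumptions, not alpaka's implementation of onHost::isExecutorSupportedBy.

```cpp
#include <cassert>

namespace sketch
{
    namespace exec
    {
        // Hypothetical executor tags.
        struct CpuSerial
        {
        };

        struct GpuCuda
        {
        };

        inline constexpr CpuSerial cpuSerial{};
        inline constexpr GpuCuda gpuCuda{};
    } // namespace exec

    // Hypothetical device types.
    struct CpuDevice
    {
    };

    struct CudaDevice
    {
    };

    // Default: an executor/device pairing is not supported.
    template<typename TExec, typename TDevice>
    inline constexpr bool supports = false;

    // Supported pairings are opted in by specialization.
    template<>
    inline constexpr bool supports<exec::CpuSerial, CpuDevice> = true;

    template<>
    inline constexpr bool supports<exec::GpuCuda, CudaDevice> = true;

    // Query in the style of isExecutorSupportedBy: both arguments are used
    // only for their types.
    template<typename TExec, typename TDevice>
    constexpr bool isExecutorSupportedBy(TExec, TDevice const&)
    {
        return supports<TExec, TDevice>;
    }

    static_assert(isExecutorSupportedBy(exec::cpuSerial, CpuDevice{}));
    static_assert(!isExecutorSupportedBy(exec::cpuSerial, CudaDevice{}));
} // namespace sketch
```

Because the answer is a constant expression in this sketch, it could guard an `if constexpr` branch so that unsupported pairings are never instantiated.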
A good starting point for learning how to use alpaka is the tutorial example.