Reduce code redundancy between CPU and CUDA implementation #36

acavelan · 2020-06-04T08:58:48Z

Currently kernels are implemented twice, meaning that if we modify, e.g., momentumAndEnergyIAD.hpp, then we also need to modify cuda/cudaMomentumAndEnergyIAD.cu.

However, the code does the same thing for every particle in both implementations.

For every computeXXX function in sph-exa, we should have a:

namespace kernel {
    inline void computeXXX(int pi, int *clist, ...);
}

function that takes the particle index as a parameter and only does the computation for that one particle. This function should only accept simple variables and raw pointers (by copy), and no references.

Basically, this function should be usable by OpenMP, OpenACC, and CUDA alike.
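To make the constraint concrete, here is a minimal sketch of what such a per-particle function could look like. The argument list, the 1D positions, and the toyWeight helper are assumptions for illustration only, not the actual sph-exa density kernel. Under nvcc the functions would additionally need __host__ __device__ qualifiers, which are left out here.

```cpp
namespace kernel
{

// Illustrative weight only (hypothetical): 1 inside the smoothing length h, 0 outside.
inline double toyWeight(double dist, double h) { return dist < h ? 1.0 : 0.0; }

// Computes the density of ONE particle, selected by pi.
// Only scalars and raw pointers are passed, all by value, no references.
//   clist          : indices of the particles to process
//   neighbors      : flattened neighbor lists, ngmax slots per particle
//   neighborsCount : number of valid slots for each entry of clist
//   x, h, m        : positions (1D for brevity), smoothing lengths, masses
//   ro             : output densities
inline void computeDensity(int pi, const int* clist, const int* neighbors,
                           const int* neighborsCount, int ngmax, const double* x,
                           const double* h, const double* m, double* ro)
{
    const int i = clist[pi];

    double sum = m[i] * toyWeight(0.0, h[i]); // self contribution

    for (int pj = 0; pj < neighborsCount[pi]; ++pj)
    {
        const int j = neighbors[pi * ngmax + pj];

        double dist = x[i] - x[j];
        if (dist < 0.0) dist = -dist;

        sum += m[j] * toyWeight(dist, h[i]);
    }

    ro[i] = sum;
}

} // namespace kernel
```

Because the function has no loop over particles and no directives, the caller is free to decide how pi is distributed: an OpenMP loop, an OpenACC loop, or one CUDA thread per particle.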

The workflow is something like this:

computeDensity(taskList)
-> calls computeDensity(task)
-> calls inline computeDensity(particleArray)
-> calls inline kernel::computeDensity(int pi, int *clist, ...)

computeDensity(task) will handle data movement for OpenMP / OpenACC offloading / CUDA
computeDensity(particleArray) will handle omp / acc directives / CUDA kernel launch

kernel::computeDensity(int pi, int *clist, ...) is identical for all programming models. Data movement and CUDA kernel launch are handled separately in computeDensity(task) and computeDensity(particleArray).
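The layering could be sketched on the CPU side roughly as follows. The Task struct, the argument lists, and the trivial stand-in kernel body are all hypothetical and only serve to show which layer owns which responsibility.

```cpp
#include <vector>

namespace kernel
{
// Stand-in per-particle kernel (hypothetical body): doubles one input value.
inline void computeXXX(int pi, const int* clist, const double* in, double* out)
{
    const int i = clist[pi];
    out[i] = 2.0 * in[i];
}
} // namespace kernel

// Hypothetical task: just the list of particle indices to process.
struct Task
{
    std::vector<int> clist;
};

// computeXXX(particleArray): owns the loop and the directive.
// A CUDA version would launch a kernel here instead of looping.
inline void computeXXX(int n, const int* clist, const double* in, double* out)
{
#pragma omp parallel for
    for (int pi = 0; pi < n; ++pi)
    {
        kernel::computeXXX(pi, clist, in, out);
    }
}

// computeXXX(task): owns data movement. On the CPU there is nothing to copy,
// so it only unpacks raw pointers; an offloading version would transfer buffers here.
inline void computeXXX(const Task& t, const std::vector<double>& in, std::vector<double>& out)
{
    computeXXX(static_cast<int>(t.clist.size()), t.clist.data(), in.data(), out.data());
}

// computeXXX(taskList): iterates over the tasks.
inline void computeXXX(const std::vector<Task>& taskList, const std::vector<double>& in,
                       std::vector<double>& out)
{
    for (const Task& t : taskList)
    {
        computeXXX(t, in, out);
    }
}
```

Only the two middle layers would differ between the CPU and CUDA builds; the per-particle function at the bottom stays untouched.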

The easiest way to do this is probably by starting from the existing CUDA code, which is the most constrained.

The challenge is to compile the CUDA parts independently with nvcc. I am thinking of using a simple #include to import the kernel::computeXXX function. Code structure should look like:

include/sph/
    density.hpp: contains computeDensity(taskList) as well as CPU implementations of computeDensity(task) and computeDensity(particleArray)
    cuda/
        density.cu: contains CUDA implementations of computeDensity(task) and computeDensity(particleArray)
    kernel/
        density.hpp: contains kernel::computeDensity(int pi, int *clist, ...)

kernel/density.hpp is included both in sph/density.hpp and sph/cuda/density.cu.
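One possible way to let both nvcc and the host compiler include the same header is a qualifier macro keyed on __CUDACC__, which nvcc defines when compiling CUDA sources. The sketch below squeezes the three files into one listing for brevity. The macro name SPH_HOST_DEVICE, the function densityGlobal, and the stand-in kernel body are assumptions, not existing sph-exa names.

```cpp
// ---- kernel/density.hpp (shared by both compilers) ----
#ifdef __CUDACC__
#define SPH_HOST_DEVICE __host__ __device__ // seen by nvcc
#else
#define SPH_HOST_DEVICE // plain host compiler: expands to nothing
#endif

namespace kernel
{
// Stand-in body (hypothetical): copies the mass into the density array.
SPH_HOST_DEVICE inline void computeDensity(int pi, const int* clist, const double* m, double* ro)
{
    const int i = clist[pi];
    ro[i] = m[i];
}
} // namespace kernel

// ---- cuda/density.cu (this part is only compiled by nvcc) ----
#ifdef __CUDACC__
// One thread per particle; all pointers must refer to device memory.
__global__ void densityGlobal(int n, const int* clist, const double* m, double* ro)
{
    const int pi = blockIdx.x * blockDim.x + threadIdx.x;
    if (pi < n) kernel::computeDensity(pi, clist, m, ro);
}
#endif

// ---- density.hpp (CPU implementation of computeDensity(particleArray)) ----
inline void computeDensity(int n, const int* clist, const double* m, double* ro)
{
    for (int pi = 0; pi < n; ++pi)
    {
        kernel::computeDensity(pi, clist, m, ro);
    }
}
```

With this layout, a change to the physics only touches kernel/density.hpp, and the .cu file can still be compiled on its own with nvcc.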

Of course, we want the same pattern for all computeXXX functions, not just density.


This was referenced Jun 4, 2020

Gz newtonraphson #37

Merged

GPU Offloading: Overlap communications and computations #38

Closed

acavelan added the enhancement and good first issue labels, then removed the good first issue label, Jun 4, 2020

cypox mentioned this issue Jan 25, 2021

Redundancy #66

Merged

sekelle closed this as completed Feb 7, 2022
