Reduce code redundancy between CPU and CUDA implementation #36

Closed
acavelan opened this issue Jun 4, 2020 · 0 comments
Labels
enhancement New feature or request

acavelan (Collaborator) commented Jun 4, 2020

Currently, each kernel is implemented twice: if we modify, e.g., momentumAndEnergyIAD.hpp, we also need to modify cuda/cudaMomentumAndEnergyIAD.cu.

However, the code does the same thing for every particle.

For every computeXXX function in sph-exa, we should have a:

namespace kernel {
    inline void computeXXX(int pi, int *clist, ...);
}

function that takes the particle index as a parameter and only does the computation for that one particle. This function should accept only simple variables and raw pointers, all passed by value, and no references.

Basically, this function should be usable from OpenMP, OpenACC, and CUDA alike.
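
As a rough sketch of what such a per-particle function could look like: everything after clist in the argument list, the 1D positions, and the plain neighbor-mass sum are made up for illustration and are not the actual sph-exa density formula.

```cpp
#include <cassert>

namespace kernel {

// Hypothetical per-particle routine. It touches only particle clist[pi] and
// receives nothing but plain values and raw pointers, all passed by value.
// The "density" here is a toy: the particle's own mass plus the mass of every
// listed neighbor closer than 2*h (1D positions, no smoothing kernel).
inline void computeDensity(int pi, const int* clist,
                           const int* neighbors, const int* neighborsCount, int ngmax,
                           const double* x, const double* h, const double* m,
                           double* ro)
{
    const int i  = clist[pi];          // global index of the particle to update
    const int nn = neighborsCount[pi]; // number of neighbors stored for pi
    assert(nn <= ngmax);               // each particle owns ngmax neighbor slots

    double sum = m[i];                 // self contribution
    for (int pj = 0; pj < nn; ++pj)
    {
        const int    j  = neighbors[pi * ngmax + pj];
        const double dx = x[i] - x[j];
        if (dx * dx < 4.0 * h[i] * h[i]) sum += m[j];
    }
    ro[i] = sum;                       // the only write: this particle's slot
}

} // namespace kernel
```

Because each call writes only ro[clist[pi]], the calls are independent as long as clist holds no duplicate indices, which is what lets the same body sit inside an omp/acc loop or a CUDA thread. When compiled by nvcc the function would additionally need __host__ __device__ qualifiers.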

The workflow is something like this:

computeDensity(taskList)
-> calls computeDensity(task)
-> calls inline computeDensity(particleArray)
-> calls inline kernel::computeDensity(int pi, int *clist, ...)

computeDensity(task) will handle data movement for OpenMP / OpenACC offloading / CUDA.
computeDensity(particleArray) will handle omp / acc directives / CUDA kernel launch.

kernel::computeDensity(int pi, int *clist, ...) is identical for all programming models, because data movement and CUDA kernel launch are handled separately in computeDensity(task) and computeDensity(particleArray).
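
A minimal CPU-only sketch of this layering follows. The Task and ParticleData structs, their fields, and the placeholder arithmetic in the kernel are assumptions made for the example and do not mirror the real sph-exa types.

```cpp
#include <cassert>
#include <vector>

namespace kernel {
// Stand-in per-particle routine (placeholder arithmetic, not the SPH density).
inline void computeDensity(int pi, const int* clist,
                           const double* m, const double* h, double* ro)
{
    const int i = clist[pi];
    ro[i] = m[i] / h[i];
}
} // namespace kernel

// Hypothetical task: the list of particle indices it is responsible for.
struct Task { std::vector<int> clist; };

// Hypothetical particle storage.
struct ParticleData { std::vector<double> m, h, ro; };

// computeDensity(particleArray): owns the loop and the omp / acc directive.
// The pragma is simply ignored when OpenMP is not enabled.
inline void computeDensity(const int* clist, int n,
                           const double* m, const double* h, double* ro)
{
#pragma omp parallel for
    for (int pi = 0; pi < n; ++pi)
        kernel::computeDensity(pi, clist, m, h, ro);
}

// computeDensity(task): owns data movement. On the CPU there is nothing to
// copy, so it only unpacks the containers into raw pointers.
inline void computeDensity(const Task& t, ParticleData& d)
{
    assert(d.m.size() == d.h.size() && d.m.size() == d.ro.size());
    computeDensity(t.clist.data(), static_cast<int>(t.clist.size()),
                   d.m.data(), d.h.data(), d.ro.data());
}

// computeDensity(taskList): entry point, dispatches one task at a time.
inline void computeDensity(const std::vector<Task>& taskList, ParticleData& d)
{
    for (const Task& t : taskList) computeDensity(t, d);
}
```

A CUDA build would swap only the two middle functions (device copies in the task layer, a kernel launch in the particle-array layer) and leave kernel::computeDensity untouched.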

The easiest way to do this is probably to start from the existing CUDA code, which is the most constrained.

The challenge is to compile the CUDA parts independently with nvcc. I am thinking of using a simple #include to import the kernel::computeXXX function. Code structure should look like:

include/sph/
    density.hpp: contains computeDensity(taskList) as well as CPU implementations of computeDensity(task) and computeDensity(particleArray)
    cuda/
        density.cu: contains CUDA implementations of computeDensity(task) and computeDensity(particleArray)
    kernel/
        density.hpp: contains kernel::computeDensity(int pi, int *clist, ...)

kernel/density.hpp is included both in sph/density.hpp and sph/cuda/density.cu.
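
One way the shared include could work is sketched below, with the three files shown in a single listing. The SPH_HOST_DEVICE macro, the computeDensityCpu / computeDensityCuda / densityKernel names, and the placeholder kernel body are assumptions for illustration; the CUDA part is guarded by __CUDACC__ (defined by nvcc), so the listing also compiles with a plain C++ compiler.

```cpp
#include <cassert>

// ---- would live in include/sph/kernel/density.hpp ----
// Adds CUDA qualifiers only when nvcc compiles the header.
#if defined(__CUDACC__)
#define SPH_HOST_DEVICE __host__ __device__
#else
#define SPH_HOST_DEVICE
#endif

namespace kernel {
// Placeholder body; the real per-particle density computation would go here.
SPH_HOST_DEVICE inline void computeDensity(int pi, const int* clist,
                                           const double* m, double* ro)
{
    const int i = clist[pi];
    ro[i] = 2.0 * m[i];
}
} // namespace kernel

// ---- would live in include/sph/density.hpp (CPU path) ----
inline void computeDensityCpu(const int* clist, int n, const double* m, double* ro)
{
    assert(n >= 0);
    for (int pi = 0; pi < n; ++pi)
        kernel::computeDensity(pi, clist, m, ro);
}

// ---- would live in include/sph/cuda/density.cu (compiled by nvcc only) ----
#if defined(__CUDACC__)
// One thread per entry of clist, calling the same per-particle function.
__global__ void densityKernel(int n, const int* clist, const double* m, double* ro)
{
    const int pi = blockDim.x * blockIdx.x + threadIdx.x;
    if (pi < n) kernel::computeDensity(pi, clist, m, ro);
}

// All pointers are assumed to already point to device memory.
inline void computeDensityCuda(const int* d_clist, int n, const double* d_m, double* d_ro)
{
    if (n == 0) return; // a launch with zero blocks is invalid
    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;
    densityKernel<<<blocks, threads>>>(n, d_clist, d_m, d_ro);
}
#endif
```

With this split, only the .cu file ever needs nvcc, and the per-particle body exists in exactly one place.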

Of course, we want the same pattern for all computeXXX functions, not just density.

acavelan added the enhancement and good first issue labels and removed the good first issue label Jun 4, 2020
cypox mentioned this issue Jan 25, 2021
sekelle closed this as completed Feb 7, 2022