Reduce the memory usage #37

Merged: 4 commits into xcompact3d:main on Feb 16, 2024
Conversation

@semi-h (Member) commented on Feb 14, 2024

I can't believe I missed this!

In transeq at the very end we do
du_x = du_x + du_y + du_z

However, due to the restrictions on running GPU kernels with specific thread and block dimensions, we carry out this operation in two separate calls:
du_x = du_x + du_y
du_x = du_x + du_z

This gives us an opportunity to move the y2x sum up to just below the transeq_y call, and then release du_y, dv_y, and dw_y right after adding them into their _x counterparts, as in the sketch below.
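
For illustration only, here is a minimal host-side Fortran sketch of the reordering, with plain allocatable arrays standing in for the backend's device fields and allocator, and the transeq_* steps indicated only by comments:

```fortran
program reorder_sums_sketch
   implicit none
   integer, parameter :: n = 32
   real, allocatable :: du_x(:), dv_x(:), dw_x(:)
   real, allocatable :: du_y(:), dv_y(:), dw_y(:)
   real, allocatable :: du_z(:), dv_z(:), dw_z(:)

   ! x-direction work (transeq_x) would produce du_x, dv_x, dw_x
   allocate (du_x(n), dv_x(n), dw_x(n))
   du_x = 1.0; dv_x = 1.0; dw_x = 1.0

   ! y-direction work (transeq_y) would produce du_y, dv_y, dw_y
   allocate (du_y(n), dv_y(n), dw_y(n))
   du_y = 2.0; dv_y = 2.0; dw_y = 2.0

   ! Sum the y contributions into the x fields right after transeq_y,
   ! so the three _y fields can be released straight away.
   du_x = du_x + du_y
   dv_x = dv_x + dv_y
   dw_x = dw_x + dw_y
   deallocate (du_y, dv_y, dw_y)

   ! z-direction work (transeq_z) then runs with those three fields freed.
   allocate (du_z(n), dv_z(n), dw_z(n))
   du_z = 3.0; dv_z = 3.0; dw_z = 3.0

   du_x = du_x + du_z
   dv_x = dv_x + dv_z
   dw_x = dw_x + dw_z
   deallocate (du_z, dv_z, dw_z)

   print *, 'du_x(1) =', du_x(1)   ! expect 6.0
end program reorder_sums_sketch
```

The point is purely the ordering: adding the _y fields into the _x fields before the z-direction work starts means fewer fields are live at once.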

This simple fix reduced the memory usage from 18 scalar fields down to 15 (15 GiB for a $512^3$ simulation), without affecting performance at all on the CUDA backend. This total excludes the Poisson solver's memory requirement, which is not yet in the codebase.
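
For context, assuming double-precision (8-byte) values, the 15 GiB figure follows directly from the field count:

$$
512^3 \times 8\,\mathrm{B} = 2^{30}\,\mathrm{B} = 1\ \mathrm{GiB\ per\ field}, \qquad 15 \times 1\,\mathrm{GiB} = 15\,\mathrm{GiB}.
$$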

Any further reduction in memory usage beyond this point would increase the runtime of the simulation. For example, we might be able to get down to 12 fields, which shouldn't be that hard, but it would require some extra reordering operations, so I think it's better not to work on that at this stage.

Assuming the FFT-based Poisson solver will require the equivalent of ~4 scalar fields of memory, we should be able to fit a $1024^3$ simulation on a typical 4xA100 node!
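
Rough arithmetic behind that estimate (again assuming double precision, and taking the ~4 extra Poisson fields at face value):

$$
1024^3 \times 8\,\mathrm{B} = 8\ \mathrm{GiB\ per\ field}, \qquad (15 + 4) \times 8\,\mathrm{GiB} = 152\,\mathrm{GiB} \approx 163\,\mathrm{GB},
$$

which fits comfortably on a node with four 80 GB A100s (320 GB total) and would be marginal on the 40 GB variant (160 GB total).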

We now have separate sum_yintox, sum_zintox, and vecadd subroutines in the backends, all similar at some level. They can all be combined into a single subroutine, as we did with the reorder subroutines, and that's what I'll do next. I'll create an issue to discuss this further rather than including that step in the current PR. I'm happy to merge this one as soon as someone approves.
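
As a purely illustrative sketch (the signature below is hypothetical, not the actual backend API, and it ignores the layout/reordering aspect of sum_yintox and sum_zintox), a unified add could look something like:

```fortran
program unified_add_sketch
   implicit none
   real :: u(8), v(8)

   u = 1.0
   v = 2.0
   call vecadd(v, u, 1.0)    ! v = v + 1.0*u, standing in for sum_*intox / vecadd
   print *, 'v(1) =', v(1)   ! expect 3.0

contains

   ! Hypothetical unified signature: y = y + a*x. In the real backends the
   ! operands may live in different orientations (x-, y- or z-ordered),
   ! which this host-side sketch deliberately ignores.
   subroutine vecadd(y, x, a)
      real, intent(inout) :: y(:)
      real, intent(in) :: x(:)
      real, intent(in) :: a

      y = y + a*x
   end subroutine vecadd
end program unified_add_sketch
```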

@semi-h merged commit 26ccadd into xcompact3d:main on Feb 16, 2024
2 checks passed