Reduce the memory usage #37

semi-h · 2024-02-14T15:32:42Z

I can't believe I missed this!

In transeq at the very end we do
du_x = du_x + du_y + du_z

However, due to the restrictions on running GPU kernels with specific thread and block dimensions, we carry out this operation in 2 separate calls:
du_x = du_x + du_y
du_x = du_x + du_z

And this gives an opportunity to move the y2x sum up to just below the transeq_y call. Then we release du_y, dv_y, dw_y right after adding them into their _x counterparts.
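The effect of moving the sum can be illustrated with a toy model that counts how many derivative fields are allocated at once. This is only a sketch: `peak_live_fields` and the event lists are hypothetical, it tracks only the du/dv/dw fields per direction, and it ignores every other field the solver holds.

```python
def peak_live_fields(schedule):
    """Return the peak number of simultaneously allocated fields.

    `schedule` is a list of ("alloc", name) or ("free", name) events.
    """
    live, peak = set(), 0
    for op, name in schedule:
        if op == "alloc":
            live.add(name)
        elif op == "free":
            live.remove(name)
        peak = max(peak, len(live))
    return peak


def names(direction):
    # Hypothetical naming: du_x, dv_x, dw_x and so on.
    return [f"{c}_{direction}" for c in ("du", "dv", "dw")]


def alloc(direction):
    return [("alloc", n) for n in names(direction)]


def free(direction):
    return [("free", n) for n in names(direction)]


# Both sums at the very end: the _x, _y and _z results all coexist.
late_sum = alloc("x") + alloc("y") + alloc("z") + free("y") + free("z")

# y summed into x right after transeq_y, so the _y fields are released
# before the _z results are allocated.
early_sum = alloc("x") + alloc("y") + free("y") + alloc("z") + free("z")

print(peak_live_fields(late_sum), peak_live_fields(early_sum))
```

In this simplified model the peak drops by three fields, which is the same size of saving as releasing du_y, dv_y and dw_y early.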

This basic fix reduced the memory usage from 18 scalar fields down to 15 (15 GiB for a $512^3$ simulation), without affecting the performance at all for the CUDA backend. The total figure excludes the Poisson solver's memory requirement, which is not yet in the codebase.
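The per-field footprint behind these numbers can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch that assumes double-precision (8-byte) values and ignores padding, halos and any non-field buffers; `field_gib` is a hypothetical helper, not something in the codebase.

```python
BYTES_PER_VALUE = 8  # assumption: double precision


def field_gib(n: int) -> float:
    """Memory of one n^3 scalar field in GiB."""
    return n**3 * BYTES_PER_VALUE / 2**30


per_field = field_gib(512)
before = 18 * per_field  # field count before this change
after = 15 * per_field   # field count after this change

print(f"one field: {per_field:.0f} GiB, before: {before:.0f} GiB, after: {after:.0f} GiB")
```

Under that assumption a single $512^3$ field is exactly 1 GiB, so the field count and the GiB figure coincide.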

Any further reductions in memory usage after this point would result in an increase in the runtime of the simulation. For example, we might be able to reduce it down to 12, which shouldn't be that hard, but it would require some extra reordering operations, and I think it's better not to work on that at this stage.

Assuming that the FFT-based Poisson solver will require ~4 scalar fields' worth of memory, we should be able to fit a $1024^3$ simulation on a typical 4xA100 node!
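A rough estimate of that requirement can be sketched as follows. The assumptions here are mine, not measured figures: double-precision values, 15 solver fields plus the ~4 field equivalents estimated above for the Poisson solver, an even split across the 4 GPUs, and no allowance for halos or workspace. `required_gib` is a hypothetical helper.

```python
BYTES_PER_DOUBLE = 8  # assumption: double precision


def required_gib(n: int, n_fields: int) -> float:
    """Total memory in GiB for n_fields scalar fields of size n^3."""
    return n_fields * n**3 * BYTES_PER_DOUBLE / 2**30


SOLVER_FIELDS = 15   # after this change
POISSON_FIELDS = 4   # rough estimate from the text above
N_GPUS = 4

total_gib = required_gib(1024, SOLVER_FIELDS + POISSON_FIELDS)
per_gpu_gib = total_gib / N_GPUS

print(f"total: {total_gib:.0f} GiB, per GPU: {per_gpu_gib:.0f} GiB")
```

Whether this fits in practice depends on the A100 variant (40 GB or 80 GB per card), which is not specified here.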

Now we have separate sum_yintox, sum_zintox, and vecadd subroutines in the backends, all similar at some level. All of these can be combined into a single subroutine, like we did with the reorder subroutines, and that's what I'll do next. I'll create an issue to discuss this further, and not include this next step in the current PR. I'm happy to merge this one as soon as someone approves.

src/backend.f90

semi-h added 4 commits February 14, 2024 13:39

Implement sum_yintox and sum_zintox subroutines.

6ce9795

Reduce memory usage of the transeq operation.

c1d23dc

Remove redundant sum_yzintox subroutine.

25195df

Reduce memory usage in solver caused by du, dv, dw fields.

a51b274

semi-h mentioned this pull request Feb 14, 2024

Add a new interface in backends for summing two fields #38

Open

semi-h requested review from JamieJQuinn, Nanoseb and pbartholomew08 February 15, 2024 09:38

JamieJQuinn reviewed Feb 15, 2024

src/backend.f90

JamieJQuinn approved these changes Feb 16, 2024

semi-h merged commit 26ccadd into xcompact3d:main Feb 16, 2024
2 checks passed

This was referenced Feb 19, 2024

Reuse result field in transeq_component as a temporary field #40

Open

Investigate removing reordered variables to reduce memory usage #41

Open
