
Add support for scalar product operator #67

Merged · 6 commits · Mar 5, 2024
Conversation

@semi-h (Member) commented on Mar 1, 2024

We need a scalar product operator for calculating the enstrophy in the domain. The CUDA kernel could be implemented in a more performant way (there are plenty of reduction examples out there), but the current implementation is not bad. We only need to run this once every 100–1000 iterations for post-processing, so I'm not too worried about performance.

@pbartholomew08 (Member) commented:

Could we just use the dot_product intrinsic? NVIDIA claim to accelerate all of these.
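For reference, a minimal sketch of what the intrinsic route would look like (the array names u and v are illustrative, not from the PR):

```fortran
program dot_intrinsic_demo
  implicit none
  integer, parameter :: dp = kind(1.d0)
  real(dp) :: u(8), v(8), s

  u = 1.0_dp
  v = 2.0_dp
  ! Fortran's dot_product intrinsic computes sum(u*v) for rank-1 arrays;
  ! compilers such as nvfortran can vectorise or offload this.
  s = dot_product(u, v)
  print *, s   ! 8 elements of 1*2 -> 16
end program dot_intrinsic_demo
```

The catch, as discussed below, is that the intrinsic sums every element it is given, with no way to mask out entries.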

@semi-h (Member, Author) commented on Mar 1, 2024

We'll probably have padding in our data structure when the grid size is not a multiple of SZ, and if we use a library to calculate the dot product it'll be hard to exclude the padded entries.
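A hedged sketch of a padding-aware dot product: it assumes a pencil layout of shape (SZ, n_padded, n_blocks) where only the first n entries along the pencil dimension are valid. The function name, argument names, and where exactly the padding lives are all illustrative, not taken from the PR:

```fortran
! Only sums over the first n entries of each pencil, so padded
! entries beyond n never contribute to the result.
real(dp) function scalar_product_padded(x, y, n, n_blocks) result(s)
  real(dp), intent(in) :: x(:, :, :), y(:, :, :)  ! (SZ, n_padded, n_blocks)
  integer, intent(in) :: n, n_blocks
  integer :: i, j, b

  s = 0._dp
  !$omp parallel do reduction(+:s) private(i) collapse(2)
  do b = 1, n_blocks
    do j = 1, n              ! stop at n; entries j > n are padding
      do i = 1, size(x, 1)
        s = s + x(i, j, b)*y(i, j, b)
      end do
    end do
  end do
  !$omp end parallel do
end function scalar_product_padded
```

An intrinsic dot_product over the flattened arrays would instead include the padded entries, which is the problem described above.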

@JamieJQuinn (Collaborator) left a comment


Looks solid, just needs some documentation.

@@ -304,6 +305,16 @@ subroutine vecadd_omp(self, a, x, b, y)

end subroutine vecadd_omp

real(dp) function scalar_product_omp(self, x, y) result(s)
implicit none

Collaborator review comment:

Suggested change
! TODO incomplete implementation


s = sum_d

call MPI_Allreduce(MPI_IN_PLACE, s, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
Collaborator review comment:

Since we're thinking about mixed/reduced precision, should the precision passed to MPI derive from the actual data type? Not sure what the typical way to do this is in MPI codes. @pbartholomew08, @Nanoseb?
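One common pattern is to select the MPI datatype once from the working-precision kind, so reductions follow whatever dp is set to. A sketch under that assumption (the module and variable names, e.g. MPI_FP, are hypothetical, not from the PR):

```fortran
module mpi_precision
  use mpi
  use iso_fortran_env, only: real32, real64
  implicit none
  integer, parameter :: dp = real64   ! switch to real32 for reduced precision
  integer :: MPI_FP                    ! MPI datatype matching real(dp)
contains
  subroutine init_mpi_fp()
    ! Chosen once at startup; calls like
    !   MPI_Allreduce(MPI_IN_PLACE, s, 1, MPI_FP, MPI_SUM, ...)
    ! then track the working precision automatically.
    if (dp == real64) then
      MPI_FP = MPI_DOUBLE_PRECISION
    else
      MPI_FP = MPI_REAL
    end if
  end subroutine init_mpi_fp
end module mpi_precision
```

MPI also provides MPI_Type_create_f90_real, which constructs a matching datatype directly from precision/range arguments, though the simple if-test above is the more common idiom in existing codes.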

@@ -329,31 +340,27 @@ subroutine copy_into_buffers(u_send_s, u_send_e, u, n, n_blocks)

end subroutine copy_into_buffers

subroutine set_fields_omp(self, u, v, w, u_in, v_in, w_in)
subroutine set_field_omp(self, f, arr)
Collaborator review comment:

Unclear argument names. Perhaps field and data?

@@ -199,6 +199,27 @@ attributes(global) subroutine axpby(n, alpha, x, beta, y)

end subroutine axpby

attributes(global) subroutine scalar_product(s, x, y, n)
Collaborator review comment:

The caller needs to understand that this kernel computes a per-pencil sum in order to set the correct n_threads/blocks; that should be documented here.
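For illustration, a launch that matches that per-pencil assumption might look like the following CUDA Fortran fragment. The variable names (sum_d, u_dev, v_dev) and the exact grid shape are assumptions for the sketch, not the PR's actual launch code:

```fortran
! One thread block per group of pencils, SZ threads per block, so the
! kernel's per-pencil partial sums line up with the thread layout.
type(dim3) :: blocks, threads
blocks  = dim3(n_blocks, 1, 1)
threads = dim3(SZ, 1, 1)
call scalar_product<<<blocks, threads>>>(sum_d, u_dev, v_dev, n)
```

Documenting this contract next to the kernel, as the comment asks, would prevent a caller from launching it with an unrelated grid shape.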

@semi-h semi-h merged commit 85246f3 into xcompact3d:main Mar 5, 2024
2 checks passed
3 participants