Add support for scalar product operator #67
Conversation
Could we just use the …
We'll probably have padding in our data structure when the grid size is not a multiple of SZ, and then if we use a library to calculate the dot product it'll be hard to eliminate the padded entries.
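For illustration, here is a minimal, self-contained sketch of the concern (the flat layout and the names SZ, n_valid, n_padded are placeholders, not the actual blocked data structure): with padding present, an intrinsic or library dot product over the full storage picks up the padded entries, whereas a loop bounded by the number of valid entries does not.

```fortran
! Sketch only: flat arrays padded up to a multiple of SZ.
program padded_dot_product
  implicit none
  integer, parameter :: dp = kind(0.0d0)
  integer, parameter :: SZ = 8
  integer, parameter :: n_valid = 10                            ! actual grid points
  integer, parameter :: n_padded = ((n_valid + SZ - 1)/SZ)*SZ   ! padded storage size
  real(dp) :: x(n_padded), y(n_padded), s
  integer :: i

  x = 1.0_dp   ! the padded tail holds nonzero values here on purpose
  y = 2.0_dp

  ! A library/intrinsic call over the whole storage includes the padding:
  print *, 'over padded storage:', dot_product(x, y)   ! 32.0

  ! Bounding the loop by n_valid eliminates the padded entries:
  s = 0.0_dp
  do i = 1, n_valid
    s = s + x(i)*y(i)
  end do
  print *, 'valid entries only :', s                    ! 20.0
end program padded_dot_product
```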
Looks solid, just needs some documentation.
@@ -304,6 +305,16 @@ subroutine vecadd_omp(self, a, x, b, y)

  end subroutine vecadd_omp

  real(dp) function scalar_product_omp(self, x, y) result(s)
    implicit none
    ! TODO incomplete implementation

    s = sum_d

    call MPI_Allreduce(MPI_IN_PLACE, s, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
Since we're thinking about mixed/reduced precision, should the precision passed to MPI derive from the actual data type? Not sure what the typical way to do this is in MPI codes. @pbartholomew08, @Nanoseb?
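One common pattern (a sketch with assumed module and constant names, not code from this PR) is to select the MPI datatype from the working-precision kind once and then pass it to every reduction, so the datatype follows the data type automatically:

```fortran
! Sketch only: derive the MPI datatype from the working-precision kind.
module m_mpi_precision
  use mpi, only: MPI_DOUBLE_PRECISION, MPI_REAL
  implicit none

  integer, parameter :: sp = kind(0.0)
  integer, parameter :: dp = kind(0.0d0)
  integer, parameter :: wp = dp   ! working precision; switch to sp for reduced precision

contains

  integer function mpi_real_wp() result(dtype)
    ! Return the MPI datatype matching the working-precision kind wp.
    if (wp == dp) then
      dtype = MPI_DOUBLE_PRECISION
    else
      dtype = MPI_REAL
    end if
  end function mpi_real_wp

end module m_mpi_precision
```

The reduction above would then read call MPI_Allreduce(MPI_IN_PLACE, s, 1, mpi_real_wp(), MPI_SUM, ...). The MPI standard also provides MPI_Type_create_f90_real to derive a datatype from a requested precision/range, but a simple kind test like the above is what many codes use.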
@@ -329,31 +340,27 @@ subroutine copy_into_buffers(u_send_s, u_send_e, u, n, n_blocks)

  end subroutine copy_into_buffers

- subroutine set_fields_omp(self, u, v, w, u_in, v_in, w_in)
+ subroutine set_field_omp(self, f, arr)
Unclear argument names. Perhaps `field` and `data`?
@@ -199,6 +199,27 @@ attributes(global) subroutine axpby(n, alpha, x, beta, y)

  end subroutine axpby

  attributes(global) subroutine scalar_product(s, x, y, n)
The caller needs to understand that this is a pencil sum in order to set the correct n_threads/blocks. This should be documented here.
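Something along these lines could document the contract right above the kernel. The body is only an illustrative sketch, assuming a blocked (SZ, n, n_blocks) layout with one thread per pencil and atomic accumulation, which may not match the implementation in this PR; the point is the header spelling out how the launch configuration must be chosen.

```fortran
! Sketch only: a documented pencil-sum kernel with the launch contract stated.
module m_scalar_product_cuda
  use cudafor
  implicit none
  integer, parameter :: dp = kind(0.0d0)

contains

  attributes(global) subroutine scalar_product(s, x, y, n)
    ! Pencil-wise dot product on a blocked (SZ, n, n_blocks) layout:
    ! each thread reduces one pencil of length n, so the kernel must be
    ! launched with threads = SZ and blocks = n_blocks, e.g.
    !   s_d = 0._dp
    !   call scalar_product<<<n_blocks, SZ>>>(s_d, x_d, y_d, n)
    ! s must be zeroed before launch; partial sums are added atomically.
    real(dp), device, intent(inout) :: s
    real(dp), device, intent(in), dimension(:, :, :) :: x, y
    integer, value, intent(in) :: n

    real(dp) :: s_pencil, old
    integer :: i, j, b

    i = threadIdx%x   ! position within the SZ group
    b = blockIdx%x    ! group of pencils

    s_pencil = 0._dp
    do j = 1, n
      s_pencil = s_pencil + x(i, j, b)*y(i, j, b)
    end do
    old = atomicadd(s, s_pencil)
  end subroutine scalar_product

end module m_scalar_product_cuda
```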
We need a scalar product operator for calculating the enstrophy in the domain. The CUDA kernel could be implemented in a more performant way (there are lots of examples out there), but the current implementation is not bad. We only need to run this once every 100/1000 iterations or so for post-processing, so I'm not worried too much.
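For context, the quantity this operator feeds is the domain enstrophy, which (up to the volume weighting the implementation chooses) amounts to three applications of the scalar product to the vorticity components:

$$
\mathcal{E} \;=\; \tfrac{1}{2}\int_{\Omega}\lvert\boldsymbol{\omega}\rvert^{2}\,\mathrm{d}V
\;\approx\; \tfrac{\Delta V}{2}\bigl(\langle\omega_x,\omega_x\rangle + \langle\omega_y,\omega_y\rangle + \langle\omega_z,\omega_z\rangle\bigr)
$$

where ⟨·,·⟩ is the discrete scalar product added here and ΔV the cell volume on a uniform grid (how the weighting and the padded entries are handled is up to the implementation).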