DiRAC/UCL OpenMP additional examples #107
base: main
…sh script missing
…ons / HIP-OpenMP Fortran daxpy
Thanks @qiUip, we'll start looking into this right away.
Thanks @qiUip, the amount of work in this PR is really impressive. We appreciate you finding the time to do this, and are grateful for your contributions.
Most (if not all) of the comments I made are minor, so I am pretty sure we can get this merged quickly. Thanks!
subroutine zeros(a, n)
  implicit none
  integer, intent(in) :: n
  real(4), intent(out) :: a(n)
The intent of `a` should be made `inout`.
ROCM_GPU ?= $(strip $(shell rocminfo |grep -m 1 -E gfx[^0]{1} | sed -e 's/ *Name: *//'))
OPENMP_FLAGS = -fopenmp
`OPENMP_FLAGS` should be defined as `-fopenmp --offload-arch=${ROCM_GPU}`; otherwise, a linking error occurs.
In that case, `OPENMP_OFFLOAD_FLAGS` can be removed.
We could also remove `${FREE_FORM_FLAG}`.
Building and running the example
``` |
Let's mention here that we need `export HSA_XNACK=1`.
@@ -159,3 +159,30 @@ loop rather than worrying about where our array data is located.
You can experiment with these examples on both an MI300A APU and a discrete GPU such as an MI300X or MI200 series GPU. You
should see a performance difference, since the MI300A only has to map the pointer and not move the whole array.

We have one less example to look at. Many scientific codes have multi-dimensional data that need to be operated on.
Did you mean one more example? :)
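For reference, a minimal C++ sketch of the kind of multi-dimensional loop nest such an example might offload. It assumes a flattened row-major array, and the function name `scale_2d` is hypothetical (not taken from the PR). `collapse(2)` fuses the two loops into a single `nx*ny` iteration space, so both dimensions contribute parallelism on the GPU:

```cpp
// Sketch only: scale_2d is a hypothetical helper, not from the training repo.
// a is a flattened row-major array of ny rows by nx columns.
void scale_2d(double *a, int nx, int ny, double factor) {
  // collapse(2) merges the j and i loops into one iteration space of nx*ny.
  // The map clause copies the whole array to the device and back; with
  // unified shared memory it could be omitted.
  #pragma omp target teams loop collapse(2) map(tofrom: a[0:nx*ny])
  for (int j = 0; j < ny; j++) {
    for (int i = 0; i < nx; i++) {
      // Inner index i is contiguous in memory (row-major layout).
      a[j * nx + i] *= factor;
    }
  }
}
```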
{
public:
  double init_value;
Is `init_value` used anywhere?
{
public:
  double init_value;
is this used anywhere?
int N=10000;
double *x = new double[N];

#pragma omp target teams loop map(from:x[0:N])
Is the purpose of this example to show that this approach also works without unified shared memory, provided that a `map` clause is supplied?
Yes, for all the device_routines examples we added the option of either using or not using unified shared memory.
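A minimal C++ sketch of the two variants discussed above, assuming a hypothetical `USE_USM` macro to switch between them (the macro and the `fill_on_device` function are illustrative, not from the PR). With unified shared memory the kernel dereferences the host pointer directly; without it, `map(from: x[0:n])` allocates `x` on the device and copies it back when the target region ends:

```cpp
// Sketch only: USE_USM and fill_on_device are hypothetical names.
#ifdef USE_USM
// Tell the compiler the device can access host allocations directly.
#pragma omp requires unified_shared_memory
#endif

// Fills x[0:n] with x[i] = 2*i in a target region and returns the host-side sum.
double fill_on_device(double *x, int n) {
#ifdef USE_USM
  // Unified shared memory: no map clause needed for the host array.
  #pragma omp target teams loop
#else
  // Discrete memory: copy x back to the host after the kernel completes.
  #pragma omp target teams loop map(from: x[0:n])
#endif
  for (int i = 0; i < n; i++) {
    x[i] = 2.0 * i;
  }

  // In both variants the results are now visible on the host.
  double sum = 0.0;
  for (int i = 0; i < n; i++) {
    sum += x[i];
  }
  return sum;
}
```

When built with `-DUSE_USM` and run on a discrete GPU such as an MI200 or MI300X, `HSA_XNACK=1` is typically also needed at run time for the unified-memory variant to work.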
@@ -0,0 +1,21 @@
# Porting exercise: reduction of multiple scalars in one kernel

README.md from `HPCTrainingExamples/Pragma_Examples/OpenMP/Fortran/9_reduction_array` from the Training Examples repository.
feel free to change the name of the top dir to just 9_reduction_array
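For comparison with the Fortran exercise, a minimal C++ sketch of reducing several scalars in a single kernel; the `Stats` struct and `reduce_scalars` function are hypothetical. Each variable is listed in a `reduction` clause with its own operator, so one `target teams loop` produces all three results. In OpenMP 5.0 and later, reduction variables on a combined `target` construct are implicitly mapped `tofrom`, so only the input array needs an explicit map:

```cpp
// Sketch only: Stats and reduce_scalars are hypothetical names.
struct Stats {
  double sum;
  double sum_sq;
  double max;
};

// Computes sum, sum of squares and maximum of x[0:n] in one kernel.
// Assumes n >= 1, since mx is seeded with x[0].
Stats reduce_scalars(const double *x, int n) {
  double sum = 0.0, sum_sq = 0.0, mx = x[0];
  // Three scalar reductions share one kernel launch.
  #pragma omp target teams loop map(to: x[0:n]) reduction(+: sum, sum_sq) reduction(max: mx)
  for (int i = 0; i < n; i++) {
    sum += x[i];
    sum_sq += x[i] * x[i];
    mx = x[i] > mx ? x[i] : mx;
  }
  return Stats{sum, sum_sq, mx};
}
```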
EXEC = reduction_array
default: ${EXEC}
all: ${EXEC}
the dir has a typo in the name: soluiton
not sure if we introduced it in the first place, but let's fix it :D
add_executable(mem2 mem2.F90)
add_executable(mem3 mem3.F90)
add_executable(mem4 mem4.F90)
add_executable(mem5 mem5.F90)
Can you please remind me whether there was a reason, other than time constraints, for not including mem6 and mem9? Thanks!
No other reason than they were not in the DiRAC materials. We can add them at a later date.
Thanks @gcapodagAMD, I am going on leave for a few weeks, but I will address all these (minor) comments when I return.
Sounds lovely, thanks @qiUip, catch you when you get back.
This PR includes all contributions (working or not) of additional and/or modified examples created for DiRAC AMD GPU training by UCL.
Most of the contributions here are Fortran versions of existing C/CXX examples. Several directories/examples are missing an accompanying README.md (or have one that has not been sufficiently updated), or might not include all the examples in the corresponding C/CXX directory, as those were not part of the DiRAC training materials.
Some iterations with the maintainers might be required to ensure the additions are complete and aligned with all the existing content.