
DiRAC/UCL OpenMP additional examples #107


Open

wants to merge 6 commits into main

Conversation

@qiUip commented Jul 7, 2025

This PR includes all contributions (working or not) of additional and/or modified examples created for DiRAC AMD GPU training by UCL.

Most of the contributions here are Fortran versions of existing C/CXX examples. Several directories/examples are missing an accompanying README.md (or have one that has not been sufficiently updated), or may not include all of the examples in the corresponding C/CXX directory, as those were not part of the DiRAC training materials.

Some iteration with the maintainers might be required to ensure the additions are complete and aligned with all the existing content.

@gcapodagAMD (Collaborator)

Thanks @qiUip, we'll start looking into this right away.

@gcapodagAMD self-assigned this Jul 9, 2025
@gcapodagAMD (Collaborator) left a comment:

Thanks @qiUip, the amount of work in this PR is really impressive. We appreciate you finding the time to do this, and are grateful for your contributions.
Most (if not all) comments I made are minor, so I am pretty sure we can get this merged in quickly. Thanks!

```fortran
subroutine zeros(a, n)
implicit none
integer, intent(in) :: n
real(4), intent(out) :: a(n)
```
Collaborator: the intent of `a` should be made `inout`.
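A sketch of the subroutine with the suggested intent (the loop body and the OpenMP directive are our assumptions; only the interface appears in the diff):

```fortran
subroutine zeros(a, n)
  implicit none
  integer, intent(in)    :: n
  real(4), intent(inout) :: a(n)  ! inout rather than out, per the review
  integer :: i
  ! Zero the array on the device
  !$omp target teams distribute parallel do
  do i = 1, n
     a(i) = 0.0
  end do
end subroutine zeros
```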


```make
ROCM_GPU ?= $(strip $(shell rocminfo |grep -m 1 -E gfx[^0]{1} | sed -e 's/ *Name: *//'))

OPENMP_FLAGS = -fopenmp
```
Collaborator: `OPENMP_FLAGS` should be defined as `-fopenmp --offload-arch=${ROCM_GPU}`, otherwise a linking error shows. In that case, `OPENMP_OFFLOAD_FLAGS` can be removed. We could also remove `${FREE_FORM_FLAG}`.
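Applied to the snippet above, the fix would look roughly like this (other variables and rules omitted):

```make
ROCM_GPU ?= $(strip $(shell rocminfo |grep -m 1 -E gfx[^0]{1} | sed -e 's/ *Name: *//'))

# The offload arch must be present at compile and link time, or linking fails
OPENMP_FLAGS = -fopenmp --offload-arch=${ROCM_GPU}
```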


Building and running the example

Collaborator: let's mention here that we need `export HSA_XNACK=1`.
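The README note might look something like this (wording and executable name are placeholders, not from the PR):

```sh
export HSA_XNACK=1   # enable automatic page migration, required for unified shared memory
make
./<example_executable>
```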

@@ -159,3 +159,30 @@ loop rather than worrying about where our array data is located.
You can experiment with these examples on both a MI300A APU and a discrete GPU such as MI300X or MI200 series GPU. You
should see a performance difference since the MI300A only has to map the pointer and not move the whole array.

We have one less example to look at. Many scientific codes have multi-dimensional data that need to be operated on.
Collaborator: Did you mean one more example? :)

```cpp
{

public:
double init_value;
```
Collaborator: is `init_value` used anywhere?

```cpp
{

public:
double init_value;
```
Collaborator: is this used anywhere?

```cpp
int N=10000;
double *x = new double[N];

#pragma omp target teams loop map(from:x[0:N])
```
Collaborator: is the purpose of this example to show that this approach can also work without unified shared memory, provided that a map is supplied?

Author: Yes, for all the `device_routines` examples we added the option of either using or not using unified shared memory.
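As an illustrative Fortran sketch (ours, not code from the PR): without unified shared memory an explicit map clause moves the data, while with USM (`!$omp requires unified_shared_memory` plus `HSA_XNACK=1`) the map could be dropped:

```fortran
program map_variant   ! hypothetical example name
  implicit none
  integer, parameter :: n = 10000
  double precision :: x(n)
  integer :: i
  ! Explicit map works without unified shared memory;
  ! with USM enabled, map(from: x(1:n)) could be omitted.
  !$omp target teams loop map(from: x(1:n))
  do i = 1, n
     x(i) = dble(i)
  end do
  print *, x(1), x(n)
end program map_variant
```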

@@ -0,0 +1,21 @@
# Porting exercise reduction of multiple scalars in one kernel

README.md from `HPCTrainingExamples/Pragma_Examples/OpenMP/Fortran/9_reduction_array` in the Training Examples repository.
Collaborator: feel free to change the name of the top dir to just `9_reduction_array`.

```make
EXEC = reduction_array
default: ${EXEC}
all: ${EXEC}
```

Collaborator: the dir has a typo in the name: `soluiton`.

Collaborator: not sure if we introduced it in the first place, but let's fix it :D

```cmake
add_executable(mem2 mem2.F90)
add_executable(mem3 mem3.F90)
add_executable(mem4 mem4.F90)
add_executable(mem5 mem5.F90)
```
Collaborator: can you please remind me if there was a reason other than time constraints for not including `mem6` and `mem9`? Thanks!

Author: No other reason than that they were not in the DiRAC materials. We can add them at a later date.

@qiUip (Author) commented Jul 10, 2025

Thanks @gcapodagAMD

I am going on leave for a few weeks but I will address all these (minor) comments when I return.

@gcapodagAMD (Collaborator)

Sounds lovely, thanks @qiUip, catch you when you get back.
