# Using OpenCL on Setonix

## Introduction 

Setonix is a world-class supercomputer, delivering over 27 petaflops of floating point performance using AMD EPYC CPUs and Instinct MI250X GPUs. As of November 2022, Setonix ranked 15th on the [TOP 500](https://top500.org/system/180123/) list of the world's most powerful supercomputers and 4th on the [Green 500](https://www.top500.org/lists/green500/2022/11/) at 57 GigaFLOPS/Watt.

## Official documentation

The [Pawsey Documentation Portal](https://support.pawsey.org.au/documentation/) should be your first port of call when looking for documentation. That source **must take priority** if there is any discrepancy between the official documentation and this material. Documentation specific to using GPUs on Setonix is on this [page](https://support.pawsey.org.au/documentation/display/US/Setonix+GPU+Partition+Quick+Start).

## Access to Setonix

Firstly, you need a username and password to access Setonix. Your **username** and **password** will be given to you prior to the beginning of this workshop. If you are using your regular Pawsey account then you can reset your password [here](https://support.pawsey.org.au/password-reset/).

Access to Setonix is via Secure Shell (SSH). On Linux, macOS, and Windows 10 and higher, an SSH client is available from the command line or terminal application. Otherwise you need to use a client program like [PuTTY](https://www.putty.org/) or [MobaXterm](https://mobaxterm.mobatek.net/download-home-edition.html).

### Access with SSH on the command line

On the command line use **ssh** to access Setonix.

```bash
ssh -Y <username>@setonix.pawsey.org.au
```

#### Passwordless login with SSH

To avoid typing a password on each login you can generate a keypair on your computer, like this:

```bash
ssh-keygen -t rsa
```

Then copy the public key (the file that ends in `.pub`) to your account on Setonix and append it to the `authorized_keys` file in `${HOME}/.ssh`. On your machine run this command:

```bash
scp <filename>.pub <username>@setonix.pawsey.org.au:
```

Then log in to Setonix and run these commands:

```bash
mkdir -p ${HOME}/.ssh
cat <filename>.pub >> ${HOME}/.ssh/authorized_keys
chmod 700 ${HOME}/.ssh
chmod 600 ${HOME}/.ssh/authorized_keys
```

Then you can run 

```bash
ssh <username>@setonix.pawsey.org.au
```

without a password.
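
A per-host entry in the SSH client configuration file can save typing the username and key path each time. The sketch below is an illustration rather than part of the official instructions: the function name `add_setonix_host`, the `setonix` alias, the username `myuser`, and the key path are all hypothetical, and the function writes to whichever config file you pass in (normally `${HOME}/.ssh/config`).

```bash
#!/bin/bash
# Append a Host entry for Setonix to an SSH client config file.
# add_setonix_host is a hypothetical helper, not a Pawsey tool.
# Usage: add_setonix_host <config_file> <username> <private_key_file>
add_setonix_host() {
    config_file="$1"
    username="$2"
    key_file="$3"
    cat >> "${config_file}" <<EOF
Host setonix
    HostName setonix.pawsey.org.au
    User ${username}
    IdentityFile ${key_file}
    ServerAliveInterval 60
EOF
    # ssh rejects a config file that other users can write to
    chmod 600 "${config_file}"
}

# Demonstration against a scratch file so nothing real is modified.
# The tilde is kept literal here; ssh expands it when reading the file.
demo_config="$(mktemp)"
add_setonix_host "${demo_config}" "myuser" "~/.ssh/id_rsa"
cat "${demo_config}"
```

With such an entry in `${HOME}/.ssh/config`, running `ssh setonix` should behave like the longer command above.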

### Access from Windows with the MobaXterm client

If you have an OS that is older than Windows 10 and need a client in a hurry, then download **MobaXterm Home (Portable Edition)** from [this location](https://mobaxterm.mobatek.net/download-home-edition.html). Extract the Zip file and run the application. You might need to accept a firewall notification.

Now go to **Settings -> SSH** and uncheck **"Enable graphical SSH-browser"** in the SSH-browser settings pane. Also enable **"SSH keepalive"** to keep SSH connections active.

<figure style="margin-bottom: 3em; margin-top: 2em; margin-left:auto; margin-right:auto; width:100%">
    <img style="vertical-align:middle" src="../images/MobaXTerm_Settings.svg"> <figcaption style= "text-align:lower; margin:1em; float:bottom; vertical-align:bottom;">Figure: MobaXTerm settings.</figcaption>
</figure>

Close the MobaXTerm settings and start a local terminal.

## Hardware environment on Setonix

On Setonix there are two main kinds of compute nodes:

* CPU nodes with 2 sockets and 128 cores, 256 threads.
* GPU nodes with 1 CPU socket with 64 cores, 128 threads, and 4 MI250X GPU sockets. Each MI250X GPU socket has two GPU compute devices.

### CPU nodes

CPU nodes are based on the AMD<span>&trade;</span> EPYC<span>&trade;</span> 7763 processor in a dual-socket configuration. Each processor has a multi-chip design with 8 chiplets (Core Complex Dies, or CCDs). Shown below is a near-infrared image of an EPYC processor, showing 8 chiplets and an IO die.

<figure style="margin: 1em; margin-left:auto; margin-right:auto; width:50%;">
    <img src="images/EPYC_7702_delidded.jpg">
    <figcaption style= "text-align:lower; margin:1em; float:bottom; vertical-align:bottom;">Near infrared photograph of a de-lidded AMD EPYC CPU with chiplets and IO die. Image credit: <a href="https://commons.wikimedia.org/wiki/File:AMD_Epyc_7702_delidded.jpg">Wikipedia.</a> </figcaption>
</figure>

Each chiplet has 8 cores, and these cores share access to a 32 MB L3 cache. Every core has its own L1 and L2 cache, provides 2 hardware threads, and has access to SIMD units that can perform floating point math on vectors up to 256 bits (8x32-bit floats) wide in a single clock cycle. There are 16 hardware threads available per chiplet. Since every processor has 8 chiplets, there are a total of 64 cores and 128 threads per processor, and 128 cores and 256 threads per node. Here is some cache and performance information for the AMD EPYC 7763 CPU.

| Node | CPU | Base clock freq(GHz) | Peak clock freq (GHz) | Cores | Hardware threads | L1 Cache (KB) | L2 Cache (KB) | L3 cache (MB) | FP SIMD width (bits) | Peak TFLOPs (FP32) |
|:----:|:----:|-----:| -----: | -----: | :----: | :----: | :----: | :----: | :----: | :---: |
| CPU |AMD EPYC 7763 | 2.45 | 3.50 | 64 | 128 | 64x32 | 64x512 | 8x32 | 256 | ~1.79 |
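
The peak FP32 figure in the table appears to be consistent with one 256-bit SIMD operation per core per clock at the peak clock frequency. The arithmetic below is a rough check under that assumption. It does not count fused multiply-add or multiple SIMD pipelines, which would raise the theoretical figure, and the variable names are illustrative only.

```bash
#!/bin/bash
# Rough check of the CPU peak FP32 figure quoted in the table.
# Assumption: each core completes one 256-bit SIMD operation per clock.
cores=64             # cores per EPYC 7763 processor
peak_clock_ghz=3.50  # peak clock frequency from the table
simd_bits=256        # SIMD width in bits
float_bits=32        # size of one FP32 value in bits

# cores x GHz x floats per SIMD operation gives GFLOPs; divide by 1000 for TFLOPs
cpu_peak_tflops=$(awk -v c="${cores}" -v f="${peak_clock_ghz}" \
    -v w="${simd_bits}" -v b="${float_bits}" \
    'BEGIN { printf "%.2f", c * f * (w / b) / 1000.0 }')

echo "Estimated peak: ${cpu_peak_tflops} TFLOPs (FP32) per processor"
```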

Below is an image of a CPU compute blade on Setonix. In this shot there are 8 CPU heatsinks, for a total of four nodes per blade.

<figure style="margin: 1em; margin-left:auto; margin-right:auto; width:100%;">
    <img src="images/cpu_blade.jpg">
    <figcaption style= "text-align:lower; margin:1em; float:bottom; vertical-align:bottom;">A CPU blade on Setonix, showing four compute nodes per blade. Each compute node has two CPU sockets.</figcaption>
</figure>

### GPU nodes

GPU nodes on Setonix have **one** AMD 7A53 'Trento' CPU and **four** MI250X GPU processors. The CPU is a specially-optimized version of the EPYC processor used in the CPU nodes, but otherwise has the same design and architecture. The Instinct<span>&trade;</span> MI250X processor is also a Multi-Chip Module (MCM) design, with two graphics dies (otherwise known as Graphics Compute Dies, or GCDs) that provide two GPU compute devices per processor, as shown below.

<figure style="margin: 1em; margin-left:auto; margin-right:auto; width:100%;">
    <img src="../images/MI250x.png">
    <figcaption style= "text-align:lower; margin:1em; float:bottom; vertical-align:bottom;">AMD Instinct<span>&trade;</span> MI250X compute architecture, showing two GPU devices per processor. Image credit: <a href="https://hc34.hotchips.org/">AMD Instinct<span>&trade;</span> MI200 Series Accelerator and Node Architectures | Hot Chips 34</a></figcaption>
</figure>

Each of the two Graphics Compute Dies (GCDs) in an MI250X appears to OpenCL as an **individual compute device** with its own 64 GB of global memory and 8 MB of L2 cache. Since there are four MI250Xs, **there are a total of 8 GPU compute devices visible to OpenCL per GPU node**. Each compute device has 110 **compute units**, and each compute unit executes instructions over a bank of 4x16 floating point SIMD units that share a 16 KB L1 cache, as seen below:

<figure style="margin: 1em; margin-left:auto; margin-right:auto; width:100%;">
    <img src="images/Setonix-GPU-Compute-Unit.png">
    <figcaption style= "text-align:lower; margin:1em; float:bottom; vertical-align:bottom;">Close-up of an AMD Instinct MI250X compute unit.</figcaption>
</figure>

The interesting thing to note with these compute units is that both 64-bit and 32-bit floating point instructions are executed natively **at the same rate**. Therefore the only performance consideration for 64-bit numbers is the increased bandwidth required to move them around. Below is a table of performance numbers for each of the four dual-GPU MI250X processors in a GPU node.

| Card | Boost clock (GHz) | Compute Units | FP32 Processing Elements | FP64 Processing Elements (equivalent compute capacity) | L1 Cache (KB) | L2 Cache (MB) | Device memory (GB) | Peak TFLOPs (FP32) | Peak TFLOPs (FP64) |
|:----:|:-----| :----- | :----- | :---- | :---- | :---- | :---- | :---- | :---- |
| AMD Instinct MI250X | 1.7 | 2x110 | 2x7040 | 2x7040 | 2x110x16 | 2x8 | 2x64 | 47.9 | 47.9 |
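
The numbers in this table can be reproduced from the compute unit layout described above, assuming each processing element completes one fused multiply-add (counted as 2 floating point operations) per clock at the boost frequency. This is a back-of-the-envelope sketch with illustrative variable names, not a measured result.

```bash
#!/bin/bash
# Back-of-the-envelope peak for one MI250X (two compute devices).
# Assumption: one fused multiply-add (2 FLOPs) per processing element per clock.
gcds=2                     # compute devices per MI250X
compute_units=110          # compute units per compute device
lanes_per_cu=$((4 * 16))   # bank of 4x16 SIMD units per compute unit
boost_clock_ghz=1.7        # boost clock from the table
flops_per_fma=2            # a multiply and an add

elements_per_gcd=$((compute_units * lanes_per_cu))

# devices x elements x GHz x FLOPs per clock gives GFLOPs; divide by 1000 for TFLOPs
gpu_peak_tflops=$(awk -v g="${gcds}" -v e="${elements_per_gcd}" \
    -v f="${boost_clock_ghz}" -v k="${flops_per_fma}" \
    'BEGIN { printf "%.1f", g * e * f * k / 1000.0 }')

echo "Processing elements per compute device: ${elements_per_gcd}"
echo "Estimated peak per MI250X: ${gpu_peak_tflops} TFLOPs"
```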

Below is an installation image of a GPU compute blade with two nodes. Each node has 1 CPU socket and four GPU sockets.

<figure style="margin: 1em; margin-left:auto; margin-right:auto; width:100%;">
    <img src="images/gpu_blade.jpg">
    <figcaption style= "text-align:lower; margin:1em; float:bottom; vertical-align:bottom;">A GPU blade on Setonix, showing two GPU nodes, each node has one CPU socket and four GPU sockets.</figcaption>
</figure>

## Job queues

On Setonix the following queues are available for general use. A special account is needed to access the `gpu` queue. This will usually be your project name followed by the suffix **-gpu**.

| Queue | Max time limit | Processing elements (CPU) | Sockets | Cores per socket | Processing elements per CPU core | Available memory (GB) | Number of OpenCL devices | Memory per OpenCL device (GB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| work | 24 hours | 256 | 2 | 64 | 2 | ~230 | 1 | ~230 |
| long | 96 hours | 256 | 2 | 64 | 2 | ~230 | 1 | ~230 |
| debug | 1 hour | 256 | 2 | 64 | 2 | ~230 | 1 | ~230 |
| highmem | 96 hours | 256 | 2 | 64 | 2 | ~980 | 1 | ~980 |
| copy | 24 hours | 32 | 1 | 64 | 2 | ~118 | 1 | ~118 |
| gpu | 24 hours | 128 | 1 | 64 | 2 | ~230 | 4x2 | 64 |
| gpu-highmem | 24 hours | 128 | 1 | 64 | 2 | ~460 | 4x2 | 64 |
| gpu-dev | 4 hours | 128 | 1 | 64 | 2 | ~230 | 4x2 | 64 |

## Interactive jobs

When compiling software on Setonix it is good practice to compile on a compute node. The following commands help you get an interactive job on either a CPU node or a GPU node. These are listed for information purposes. For the workshop we will use the `salloc` command given in the welcome letter.

### Interactive jobs on CPU nodes

The **work** queue is the queue to use for applications that run exclusively on a CPU node. You can use the following command to get an interactive job that has one MPI process with access to 8 OpenMP threads.

```bash
salloc --account=${PAWSEY_PROJECT} --ntasks=1 --mem=8GB --cpus-per-task=8 --time=4:00:00 --partition=work
```

### Interactive jobs on GPU nodes

Allocations for the **gpu** queue on Setonix need a separate account with the **-gpu** suffix. The following command reserves 1 MPI process with access to 8 OpenMP threads and one GPU (one GCD) for interactive use.             

```bash
salloc --account=${PAWSEY_PROJECT}-gpu --ntasks=1 --mem=8GB --cpus-per-task=8 --time=4:00:00 --gpus-per-task=1 --partition=gpu
```
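
Once the interactive shell starts, it can be reassuring to confirm what was actually granted. The sketch below prints a few environment variables that Slurm normally sets inside a job. `show_allocation` is a hypothetical helper name, and whether `ROCR_VISIBLE_DEVICES` is populated depends on how GPU scheduling is configured at the site, so treat that line as an assumption.

```bash
#!/bin/bash
# Print selected job variables, or "(not set)" when they are absent,
# for example when the function is run outside a Slurm job.
show_allocation() {
    for name in SLURM_JOB_ID SLURM_JOB_PARTITION SLURM_NTASKS \
                SLURM_CPUS_PER_TASK ROCR_VISIBLE_DEVICES; do
        # printenv prints the value of an exported variable and fails if it is unset
        value="$(printenv "${name}" || true)"
        printf '%-22s %s\n' "${name}" "${value:-(not set)}"
    done
}

show_allocation
```

Note that `SLURM_CPUS_PER_TASK` is typically only present when `--cpus-per-task` was given in the request.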

## Building software for Setonix

OpenCL is different from CUDA and HIP in that compilation of kernels is done **within the library of an OpenCL implementation** rather than by a vendor-specific compiler. This means that you are free to use whatever programming environment is most suitable for your needs.

### Software modules

#### Programming environment

There are three main programming environments available on Setonix. Each provides C/C++ and Fortran compilers that build software with knowledge of the MPI libraries available on Setonix. The **PrgEnv-gnu** programming environment loads the GNU compilers for best software compatibility, the **PrgEnv-aocc** environment loads the AMD **aocc** optimising compiler to try to get the best performance from the AMD CPUs on Setonix, and the **PrgEnv-cray** environment loads the well-supported compilers from Cray. Use these commands to find which module to load.

| Programming environment | command to use |
| :--- | :--- |
| AMD | ```module avail PrgEnv-aocc``` |
| Cray | ```module avail PrgEnv-cray``` |
| GNU | ```module avail PrgEnv-gnu``` |

Then the following compiler wrappers are available for use to compile source files:

| Command | Explanation |
| :--- | :--- |
| cc | C compiler |
| CC | C++ compiler |
| ftn | Fortran compiler |

In order to use a GPU-aware MPI library from Cray you also need to load the **craype-accel-amd-gfx90a** module, which is available in all three programming environments.  Load the module with this command.

```bash
module load craype-accel-amd-gfx90a
```

Then set this environment variable to enable GPU support with MPI.

```bash
export MPICH_GPU_SUPPORT_ENABLED=1
```

#### ROCm

The ROCm library from AMD provides both an OpenCL implementation and AMD tools like profilers. You can load the ROCm library with this command:

```bash
module load rocm/5.4.3
```

#### Custom OpenCL environment

OpenCL support comes with the ROCm module. However, the OpenCL headers and ICD loader that come with ROCm are quite old and do not use the latest OpenCL API. There is an OpenCL environment that has been put together specifically for this course. It uses the [Portable OpenCL](http://portablecl.org/) library to utilise CPUs as compute devices, loads the ROCm module, and provides access to OpenCL tools and the latest headers and ICD loader from Khronos. Use these commands to load this environment:

```bash
module use /software/projects/courses01/setonix/opencl/modulefiles
module load PrgEnv-opencl
```

#### Omnitrace support

[Omnitrace](https://github.com/AMDResearch/omnitrace) is a tool for using rocprof to collect **traces**, or information on **when** an application component starts using compute resources, and **for how long** it uses those resources. Currently you will need these modules loaded to access the experimental Omnitrace tools.

```bash
module load rocm/5.0.2
module use /software/projects/courses01/setonix/omnitrace/share/modulefiles
module load omnitrace/1.10.0
```

#### Omniperf support

[Omniperf](https://github.com/AMDResearch/omniperf) is a tool to make low level information collected by **rocprof** accessible. It can perform feats like creating [roofline models](https://en.wikipedia.org/wiki/Roofline_model) of how well your kernels are performing, in relation to the theoretical capability of the compute hardware. The following commands will help you access the experimental Omniperf tools.

```bash
module load cray-python
module load rocm/5.0.2
module use /software/projects/courses01/setonix/omniperf/1.0.8PR2/modulefiles
module load omniperf/1.0.8-PR2
```

### Compiling software with OpenCL and MPI support

You can compile MPI software with OpenCL using the compiler wrapper **CC** from one of the three available programming environments. In order to provide the best chance of reducing compiler issues it is **best practice to compile from the compute node** that you are going to use. Here are some suggested compiler flags.

| Function | flags |
| :--- | :--- |
| Production (compile and link) | ```-g -O2``` |
| Debug (compile and link) | ```-O0 -g``` |
| OpenMP (compile and link)| ```-fopenmp``` |

## Exercise: compile and run your first MPI-enabled OpenCL application

In the file [hello_devices_mpi.cpp](hello_devices_mpi.cpp) is an MPI-enabled OpenCL application that reports on devices and fills a vector. Your task is to compile this file into an executable called **hello_devices_mpi.exe**.

### Compilation steps

#### Task 1. Login and setup

* Log into **setonix.pawsey.org.au**.
```bash
ssh <username>@setonix.pawsey.org.au
```
* Change directory to your space on /scratch.
```bash
cd $MYSCRATCH
```
* Get the course material from GitHub if you don't already have it.
```bash
wget https://github.com/pelagos-consulting/OpenCL_Course/archive/refs/heads/main.zip
unzip -DD main.zip
cd OpenCL_Course-main/course_material/L2_Using_OpenCL_On_Setonix
```
* Get an interactive GPU job on Setonix. The correct command to use will be in the welcome letter, and looks something like this: 
```bash
salloc --account ${PAWSEY_PROJECT}-gpu --ntasks 1 --mem 8GB --cpus-per-task 8 --time 1:00:00 --gpus-per-task 1 --partition gpu
```

* Load the ROCm module

```bash
module load rocm/5.0.2
```

    
#### Task 2. Compile the program with the OpenCL headers and ICD loader from ROCm

* Compile the file [hello_devices_mpi.cpp](hello_devices_mpi.cpp) with the `CC` compiler wrapper. There is an OpenCL header directory in `/opt/rocm/opencl/include` and an ICD loader in `/opt/rocm/opencl/lib`. Use those to compile the application:

```bash
CC -g -fopenmp -O2 -I../include -I/opt/rocm/opencl/include -L/opt/rocm/opencl/lib hello_devices_mpi.cpp -o hello_devices_mpi.exe -lOpenCL
./hello_devices_mpi.exe
```

Notice that there was a compiler warning about `CL_TARGET_OPENCL_VERSION`. This is because we are targeting OpenCL 3.0, but the ROCm headers aren't aware of the API change.

#### Task 3. Compile the program with the OpenCL headers and ICD loader from Khronos

There is an OpenCL header directory from Khronos in `/software/projects/courses01/setonix/opencl/OpenCL-Headers/install/include` and an updated ICD loader in `/software/projects/courses01/setonix/opencl/OpenCL-ICD-Loader/install/lib64`. We can also use those to compile.

```bash
CC -g -fopenmp -O2 -I../include -I/software/projects/courses01/setonix/opencl/OpenCL-Headers/install/include -L/software/projects/courses01/setonix/opencl/OpenCL-ICD-Loader/install/lib64 hello_devices_mpi.cpp -o hello_devices_mpi.exe -lOpenCL
./hello_devices_mpi.exe
```

#### Task 4. Use the PrgEnv-opencl module

The **PrgEnv-opencl** module from `/software/projects/courses01/setonix/opencl/modulefiles` makes available the OpenCL headers and ICD loader from Khronos. It adds the Khronos header path to the **CPATH** environment variable and the ICD loader path to the **LD_LIBRARY_PATH** and **LIBRARY_PATH** environment variables. Now we can just compile without explicitly specifying the header and ICD loader directories.

```bash
module use /software/projects/courses01/setonix/opencl/modulefiles
module load PrgEnv-opencl
CC -g -fopenmp -O2 -I../include hello_devices_mpi.cpp -o hello_devices_mpi.exe -lOpenCL
./hello_devices_mpi.exe
```
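
To see what a module has changed, the colon-separated search paths can be printed one entry per line. This is a generic shell sketch: `list_path_entries` is a hypothetical helper name, and the demonstration value is made up rather than copied from the real module.

```bash
#!/bin/bash
# Print each entry of a colon-separated search path on its own line.
# Usage: list_path_entries "${CPATH}"
list_path_entries() {
    printf '%s\n' "$1" | tr ':' '\n'
}

# Demonstration with a made-up value. On Setonix, try "${CPATH}",
# "${LIBRARY_PATH}" or "${LD_LIBRARY_PATH}" after loading the module.
demo_path="/opt/example/include:/usr/local/include"
list_path_entries "${demo_path}"
```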

The module also adds the [PoCL](http://portablecl.org/) OpenCL implementation so we can also use the CPU as a compute device. This extra implementation was enabled by installing PoCL to `/software/projects/courses01/setonix/opencl/pocl/3.1` and setting the environment variable **OCL_ICD_VENDORS** to `/software/projects/courses01/setonix/opencl/OpenCL_vendors`. In that directory, a file called `pocl.icd` points to the PoCL vendor library at `/software/projects/courses01/setonix/opencl/pocl/3.1/lib64/libpocl.so.2.10.0`.
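
The ICD mechanism is simple enough to sketch by hand: each `.icd` file is a small text file naming a vendor library, and the ICD loader looks for such files in the directory given by **OCL_ICD_VENDORS**. The example below builds a throwaway vendors directory to show the layout. The temporary directory is an assumption for demonstration, the library path is the one quoted above, and nothing here checks that the library exists.

```bash
#!/bin/bash
# Build a throwaway ICD vendors directory to illustrate the layout.
vendors_dir="$(mktemp -d)"
pocl_library="/software/projects/courses01/setonix/opencl/pocl/3.1/lib64/libpocl.so.2.10.0"

# An .icd file is plain text containing the path (or name) of a vendor library
echo "${pocl_library}" > "${vendors_dir}/pocl.icd"

# Point the ICD loader at the directory of .icd files
export OCL_ICD_VENDORS="${vendors_dir}"

echo "OCL_ICD_VENDORS=${OCL_ICD_VENDORS}"
cat "${OCL_ICD_VENDORS}/pocl.icd"
```

Exporting the variable in a live session would override the value set by the module, so this is best tried in a separate shell.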
 
On line 55 of `hello_devices_mpi.cpp`, find the line that sets the type of device to select:
 
```C++
    // Set the target device
    cl_device_type target_device=CL_DEVICE_TYPE_GPU;
```

Change this to

```C++
    // Set the target device
    cl_device_type target_device=CL_DEVICE_TYPE_ALL;
```

Now recompile and run

```bash
CC -g -fopenmp -O2 -I../include hello_devices_mpi.cpp -o hello_devices_mpi.exe -lOpenCL
./hello_devices_mpi.exe
``` 

Now you should see the CPU as a compute device!

#### Bonus task

Try changing the number of GPUs in your request for resources for the interactive job. How many compute devices appear in the output from the above command?

#### Makefile solution

If you get stuck, the example [Makefile](Makefile) contains the above compilation steps. Assuming you loaded the right modules defined above, the make command is run as follows:

```bash
make clean; make
```

The script **run_compile.sh** contains the necessary commands to load the appropriate modules and run the **make** command.

```bash
chmod 700 run_compile.sh
./run_compile.sh
```

## Batch jobs with OpenCL on GPU nodes

Pawsey has extensive documentation available for running jobs, at this [site](https://support.pawsey.org.au/documentation/display/US/Running+Jobs+in+Setonix). Here is some information that is specific to making best use of the GPU nodes on Setonix.

### GPU node configuration

On the GPU nodes of Setonix there is 1 CPU and 8 compute devices. Each of the 8 chiplets in the CPU is intended to have optimal access to one of the 8 available GPU compute devices. Shown below is a hardware diagram of a compute node, where each chiplet is connected optimally to one compute device.

<figure style="margin: 1em; margin-left:auto; margin-right:auto; width:100%;">
    <img src="images/Setonix-GPU-Node.png">
    <figcaption style= "text-align:lower; margin:1em; float:bottom; vertical-align:bottom;">Overall view of a Setonix GPU node, showing the placement of hardware threads and the closest available compute device.</figcaption>
</figure>

From the above diagram we see that the best use of the GPUs occurs when a chiplet accesses the GPU that is closest to it. Work is still being done on making sure that MPI processes map optimally to available compute devices. In the meantime, these interim suggestions will help space out the MPI tasks so each task resides on its own chiplet.

* Use **--ntasks-per-node=8** to allocate up to 8 MPI tasks per node, one task per chiplet/compute device pair.
* Use **--gpus-per-task=1** to allocate 1 compute device per MPI task.
* Use **--cpus-per-task=8** and **--threads-per-core=1** to allocate all 8 cores in a chiplet, with one hardware thread per core, to a single MPI process.
* Use the **--gpu-bind=closest** option to bind each compute device to the closest MPI task.
* Use the **--exclusive** option to have exclusive use of all the resources on a node. This will make your job harder to get through the queues, so use this only if you **absolutely** need all the resources on a gpu node.
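
The arithmetic behind these options is that 8 tasks of 8 cores each fill the 64 cores of the CPU. The loop below sketches which cores each task would occupy, assuming cores are numbered contiguously by chiplet (cores 0-7 on the first chiplet, and so on). That numbering is an assumption for illustration, `core_range` is a hypothetical helper, and the sketch says nothing about which compute device is closest to which chiplet. The placement reported on the node itself is the authority.

```bash
#!/bin/bash
# Sketch of how 8 tasks x 8 cores tile a 64-core GPU-node CPU.
# Assumption: cores are numbered contiguously within each chiplet.
tasks_per_node=8   # --ntasks-per-node
cpus_per_task=8    # --cpus-per-task, with --threads-per-core=1

core_range() {
    # Print the first and last core for a given task index (0-based)
    first=$(( $1 * cpus_per_task ))
    last=$(( first + cpus_per_task - 1 ))
    echo "${first}-${last}"
}

task=0
while [ "${task}" -lt "${tasks_per_node}" ]; do
    echo "task ${task}: cores $(core_range "${task}")"
    task=$(( task + 1 ))
done

echo "cores used: $(( tasks_per_node * cpus_per_task ))"
```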

### Example job script

The suggested job script below will allocate an MPI task for every compute device on a node of Setonix. Then it will allocate 8 OpenMP threads to each MPI task. We can use the helper program [hello_jobstep.cpp](hello_jobstep.cpp) adapted from a [program](https://code.ornl.gov/olcf/hello_jobstep) by Thomas Papatheodore from ORNL. Every software thread executed by the program reports the MPI rank, OpenMP thread, the CPU hardware thread, as well as the GPU and bus IDs of the GPU hardware.

```bash
#!/bin/bash -l

#SBATCH --account=<account>-gpu    # your account
#SBATCH --partition=gpu            # Using the gpu partition
#SBATCH --ntasks=8                 # Total number of tasks
#SBATCH --ntasks-per-node=8        # Set this for 1 mpi task per compute device
#SBATCH --cpus-per-task=8          # How many OpenMP threads per MPI task
#SBATCH --threads-per-core=1       # How many OpenMP threads per core (1 or 2)
#SBATCH --gpus-per-task=1          # How many OpenCL compute devices to allocate to a task
#SBATCH --gpu-bind=closest         # Bind each MPI task to the nearest GPU
#SBATCH --mem=4000M                # Indicate the amount of memory per node when asking for shared resources
#SBATCH --time=00:05:00            # Estimated time in HH:MM:SS

module use /software/projects/courses01/setonix/opencl/modulefiles
module load PrgEnv-opencl
module load craype-accel-amd-gfx90a

# Recompile the software
make clean; make

export MPICH_GPU_SUPPORT_ENABLED=1 # Enable GPU support with MPI

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   #To define the number of OpenMP threads available per MPI task, in this case it will be 8
export OMP_PLACES=cores     #To bind to cores 
export OMP_PROC_BIND=close  #To bind (fix) threads (allocating them as close as possible). This option works together with the "places" indicated above, then: allocates threads in closest cores.
 
# Temporary workaround for avoiding Slingshot issues on shared nodes:
export FI_CXI_DEFAULT_VNI=$(od -vAn -N4 -tu < /dev/urandom)

# Run a job with task placement and $BIND_OPTIONS
#srun -N $SLURM_JOB_NUM_NODES -n $SLURM_NTASKS -c $OMP_NUM_THREADS $BIND_OPTIONS  ./hello_jobstep.exe
srun -N $SLURM_JOB_NUM_NODES -n $SLURM_NTASKS -c $OMP_NUM_THREADS ./hello_jobstep.exe | sort
```

In the file [jobscript.sh](jobscript.sh) is a batch script for the information above. Edit the `<account>` field to include the account to charge to. The value to use will be in the environment variable `$PAWSEY_PROJECT`. 

```bash
echo $PAWSEY_PROJECT
```

Then submit the script to the batch queue with this command

```bash
sbatch jobscript.sh
```

Use this command to check on the progress of your job

```bash
squeue --me
```

If you need to cancel a job and you know its job ID, use this command

```bash
scancel <jobID>
```

Once the job is done, have a look at the `*.out` file and examine how the threads and GPUs are placed.

## Summary

In this section we cover using OpenCL on the Pawsey supercomputer Setonix. This includes logins with SSH; hardware and software environments; and accessing the job queues through interactive and batch jobs. We conclude the chapter with the OpenCL software compilation process on Setonix, and then how to get the best performance in batch jobs by scheduling MPI tasks close to the available compute devices.

<address>
Written by Dr. Toby Potter of <a href="https://www.pelagos-consulting.com">Pelagos Consulting and Education</a>, for the <a href="https://pawsey.org.au">Pawsey Supercomputing Research Centre</a>, and with contributions from the Pawsey team. All trademarks mentioned in this teaching series belong to their respective owners.
</address>