
#  Visualizing OpenMP Scheduling


Enter your name and student ID.

 * Name:
 * Student ID:



# 1. Scheduling
* There are several strategies for partitioning (load-balancing) the iterations of a loop among cores (OpenMP threads, to be more precise)
* This notebook visualizes how the assignment of iterations to threads changes depending on how you execute the loop


# 2. Compilers
* We use [LLVM ver. 18.1.8](https://llvm.org/) (`clang` and `clang++`) in this exercise, as [NVIDIA HPC SDK](https://docs.nvidia.com/hpc-sdk/index.html) (`nvc` and `nvc++`) does not support some of the OpenMP features we use below (taskloop)

## 2-1. Set up LLVM
Execute this before you use LLVM

In [None]:
export PATH=/home/share/llvm/bin:$PATH
export LD_LIBRARY_PATH=/home/share/llvm/lib:/home/share/llvm/lib/x86_64-unknown-linux-gnu:$LD_LIBRARY_PATH

Check if it works (check if full paths of clang/clang++ are shown)

In [None]:
which clang
which clang++

# 3. The OpenMP program that records scheduling
* omp_sched_rec.c in this directory is an OpenMP program that executes a doubly nested loop in several ways
* inspect it by opening it in JupyterLab or any other program you like

## 3-1. Compile

In [None]:
clang -Wall -O3 -fopenmp -std=gnu99 omp_sched_rec.c -o omp_sched_rec

## 3-2. Run
* How it executes the doubly nested loop can be controlled by a few environment variables
  * LB --- select the execution strategy among #pragma omp for, #pragma omp task and #pragma omp taskloop
  * OMP_SCHEDULE --- specify the scheduling method of #pragma omp for

* Some examples are given below


* use 4 cores and #pragma omp for, default scheduling strategy (presumably static)

In [None]:
OMP_NUM_THREADS=4 ./omp_sched_rec 

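* use 4 cores and #pragma omp for, with static scheduling requested explicitly; the effect should be the same as above if the default is indeed static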

In [None]:
OMP_NUM_THREADS=4 OMP_SCHEDULE=static ./omp_sched_rec 
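
* As a rough illustration of static scheduling, the sketch below splits a one-dimensional range of iterations into contiguous blocks of nearly equal size, one per thread
* The function name `static_blocks` is hypothetical, and the exact split is left to the OpenMP implementation, so treat this as a model under that assumption rather than what the runtime does

```python
def static_blocks(n_iters, n_threads):
    """Split range(n_iters) into n_threads contiguous, nearly equal blocks.

    Assumption: one block per thread, sizes differing by at most one.
    """
    base, extra = divmod(n_iters, n_threads)
    blocks = []
    start = 0
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)  # first `extra` threads get one more
        blocks.append((t, start, start + size))
        start += size
    return blocks

for t, lo, hi in static_blocks(10, 4):
    print(f"thread {t}: iterations [{lo}, {hi})")
```

* The assignment is fixed before the loop starts, so it cannot react to iterations that turn out to be more expensive than others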

* use 4 cores and #pragma omp for, with the dynamic scheduling policy

In [None]:
OMP_NUM_THREADS=4 OMP_SCHEDULE=dynamic ./omp_sched_rec 

* use 4 cores and #pragma omp for, with the dynamic scheduling policy and a chunk size of 100 (i.e., 100 iterations are fetched at a time)

In [None]:
OMP_NUM_THREADS=4 OMP_SCHEDULE=dynamic,100 ./omp_sched_rec 
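
* The effect of the chunk size in dynamic scheduling can be explored with a small simulation, in which the thread that becomes idle first always takes the next chunk
* This is a simplified model, not the runtime's implementation: `simulate_dynamic` and the per-iteration `costs` are made up for illustration, and scheduling overhead is ignored

```python
import heapq

def simulate_dynamic(costs, n_threads, chunk=1):
    """Return (owner, finish): owner[i] is the thread that ran iteration i."""
    owner = [None] * len(costs)
    # heap of (time at which the thread becomes idle, thread id)
    idle = [(0, t) for t in range(n_threads)]
    heapq.heapify(idle)
    for lo in range(0, len(costs), chunk):
        now, t = heapq.heappop(idle)       # earliest idle thread takes the chunk
        hi = min(lo + chunk, len(costs))
        for i in range(lo, hi):
            owner[i] = t
        heapq.heappush(idle, (now + sum(costs[lo:hi]), t))
    finish = max(time for time, _ in idle)
    return owner, finish

costs = [8, 1, 1, 1, 1, 1, 1, 1]           # one expensive iteration, seven cheap ones
print(simulate_dynamic(costs, n_threads=2, chunk=1))
print(simulate_dynamic(costs, n_threads=2, chunk=4))
```

* In this model a small chunk lets the second thread absorb all the cheap iterations, while a chunk of 4 ties the expensive iteration to three cheap ones and the loop finishes later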

* use 4 cores and #pragma omp for, with guided self-scheduling

In [None]:
OMP_NUM_THREADS=4 OMP_SCHEDULE=guided ./omp_sched_rec 
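
* Guided scheduling hands out large chunks first and progressively smaller ones, with each chunk size roughly proportional to the number of remaining iterations divided by the number of threads
* The sketch below assumes the simplest such rule, `ceil(remaining / n_threads)`; the function name `guided_chunks` is hypothetical and real runtimes may use a different formula

```python
import math

def guided_chunks(n_iters, n_threads, min_chunk=1):
    """List successive chunk sizes under an assumed guided rule."""
    sizes = []
    remaining = n_iters
    while remaining > 0:
        size = max(min_chunk, math.ceil(remaining / n_threads))
        size = min(size, remaining)        # the last chunk may be smaller
        sizes.append(size)
        remaining -= size
    return sizes

print(guided_chunks(100, 4))
```

* The large early chunks keep the number of scheduling operations low, and the small late chunks even out the finish times of the threads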

* use 4 cores and #pragma omp task using recursive 2D decomposition

In [None]:
OMP_NUM_THREADS=4 LB=task ./omp_sched_rec

* use 4 cores and #pragma omp task using recursive 2D decomposition, with grainsize=100 (i.e., stop generating tasks once a region has fewer than 100 iterations)

In [None]:
OMP_NUM_THREADS=4 LB=task,100 ./omp_sched_rec 
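
* The idea of recursive 2D decomposition with a grain size can be sketched as follows: a rectangle of iterations is halved along its longer side until it has fewer iterations than the grain size
* This is only a model; how omp_sched_rec.c actually chooses the split is not shown here, so the rule "split the longer side in half" and the name `decompose` are assumptions

```python
def decompose(i0, i1, j0, j1, grain):
    """Return the leaf rectangles [i0,i1) x [j0,j1) of a recursive 2D split."""
    n_iters = (i1 - i0) * (j1 - j0)
    if n_iters < grain or n_iters <= 1:
        return [(i0, i1, j0, j1)]          # small enough: run serially
    if i1 - i0 >= j1 - j0:                 # split the longer side in half
        mid = (i0 + i1) // 2
        return (decompose(i0, mid, j0, j1, grain)
                + decompose(mid, i1, j0, j1, grain))
    mid = (j0 + j1) // 2
    return (decompose(i0, i1, j0, mid, grain)
            + decompose(i0, i1, mid, j1, grain))

leaves = decompose(0, 32, 0, 32, grain=100)
print(len(leaves), "leaf rectangles")
```

* In an OpenMP version, each recursive call would typically be spawned as a task, so the leaves are the units that idle threads can pick up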

* use 4 cores and #pragma omp taskloop, with grainsize=100

In [None]:
OMP_NUM_THREADS=4 LB=taskloop,100 ./omp_sched_rec 
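
* With a grainsize, taskloop groups iterations into tasks of at least min(grainsize, total) and fewer than 2 * grainsize iterations each
* The sketch below shows one partition that satisfies this bound; the function name `taskloop_chunks` is hypothetical and an actual runtime is free to choose a different partition within the same bound

```python
def taskloop_chunks(n_iters, grainsize):
    """Return task sizes with grainsize <= size < 2 * grainsize (when possible)."""
    n_tasks = max(1, n_iters // grainsize)   # as many tasks as the bound allows
    base, extra = divmod(n_iters, n_tasks)
    # spread the remainder so that sizes differ by at most one
    return [base + (1 if k < extra else 0) for k in range(n_tasks)]

print(taskloop_chunks(1050, 100))
print(taskloop_chunks(30, 100))
```

* Unlike the chunks of #pragma omp for, these chunks are tasks, so they are scheduled by the tasking runtime rather than by OMP_SCHEDULE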

# 4. Visualization
* Running `omp_sched_rec` leaves a record file with the name `"log.txt"`
* Below is a Python program that visualizes `"log.txt"`
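* Judging from the regular expression in that program, each line of `"log.txt"` holds seven integers: `r i j begin end load thread`
* Before animating, a quick per-thread summary can be useful; the sketch below uses a synthetic sample instead of a real log, the name `per_thread_summary` is hypothetical, and reading `begin` and `end` as timestamps is an assumption based on the field names

```python
from collections import Counter

# Synthetic sample in the assumed "r i j begin end load thread" layout
sample = """\
0 0 0 100 200 5 0
0 0 1 200 350 7 0
0 1 0 100 180 3 1
0 1 1 180 400 9 1
"""

def per_thread_summary(lines):
    """Count cells and accumulate (end - begin) for each thread."""
    cells = Counter()
    busy = Counter()
    for line in lines:
        r, i, j, begin, end, load, thread = map(int, line.split())
        cells[thread] += 1
        busy[thread] += end - begin
    return cells, busy

cells, busy = per_thread_summary(sample.splitlines())
for t in sorted(cells):
    print(f"thread {t}: {cells[t]} cells, busy for {busy[t]} clock units")
```

* To apply it to a real record, pass an open file instead of the sample, e.g. `with open("log.txt") as fp: per_thread_summary(fp)`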


In [None]:
#!/usr/bin/python3

import colorsys
import re
import sys
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython import display 

def read_log(log_txt):
    # each line: r i j begin end load thread (seven non-negative integers)
    # raw strings keep "\d" from being treated as an invalid escape sequence
    p = re.compile(r"(?P<r>\d+) (?P<i>\d+) (?P<j>\d+)"
                   r" (?P<begin>\d+) (?P<end>\d+)"
                   r" (?P<load>\d+) (?P<thread>\d+)")
    R = {}                      # r,i,j -> begin,end,load,thread
    with open(log_txt) as fp:
        for line in fp:
            m = p.match(line)
            assert m, line
            [ r,i,j,begin,end,load,thread ] = [ int(f) for f in m.groups() ]
            R[r,i,j] = (begin,end,load,thread)
    return R

def make_norm_rgb_table(threads):
    tbl = {}
    n = len(threads)
    for i,w in enumerate(threads):
        h = float(i)/float(n)
        r,g,b = colorsys.hsv_to_rgb(h, 1.0, 1.0)
        tbl[w] = r,g,b
    return tbl

def sched_vis(log_txt):
    """
    Animate the schedule recorded in log_txt.
    R : (r,i,j) -> begin,end,load,thread
    """
    R = read_log(log_txt)
    # get the set of unique thread ids
    threads = set([ t for _,_,_,t in R.values() ])
    max_load = max([ l for _,_,l,_ in R.values() ])
    # assign color to each thread
    rgb_table = make_norm_rgb_table(threads)
    # get number of rows and columns
    repeat = max([ r for r,i,j in R.keys() ]) + 1
    M = max([ i for r,i,j in R.keys() ]) + 1
    N = max([ j for r,i,j in R.keys() ]) + 1
    img = np.zeros((M,N,3))
    # figsize is (width, height): the image is N columns wide and M rows tall
    fig = plt.figure(figsize=(N / 72, M / 72))
    im = plt.imshow(img)
    interval_ms = 30
    speed_factor = 0.2
    draw_interval_clock = interval_ms * 1.0e6 * speed_factor
    def sort_key(rec):
        (r,i,j),(begin,end,load,thread) = rec
        return end
    
    def load_data():
        events = sorted(R.items(), key=sort_key)
        max_r = -1
        for c,((r,i,j),(begin,end,load,thread)) in enumerate(events):
            if c == 0 or end - last_update > draw_interval_clock:
                last_update = end
                im.set_data(img)
                yield im,
            rgb = rgb_table[thread]
            #alpha = 1.0 # if load/float(max_load) > 0.005 else 0.4
            assert (r >= max_r)
            if r > max_r:
                img[:,:,:] = np.zeros((M,N,3))
                max_r = r
            img[i,j] = rgb
        im.set_data(img)
        yield im,

    iterator = load_data()
    def update(*args):
        try:
            return next(iterator)
        except StopIteration:
            return im,
    ani = animation.FuncAnimation(fig, update, interval=interval_ms, blit=True)
    html = display.HTML(ani.to_jshtml())
    display.display(html)
    plt.close()
    # plt.show()

# usage:
# sched_vis("log.txt")

* Run one of the commands above to produce `"log.txt"`, then execute the cell below
* It takes some time (> 30 seconds); when it finishes, it shows a playback tool you can interact with

In [None]:
# be patient (> 30 seconds) until it finishes
sched_vis("log.txt")

* Fill in the parameters below as you like, run the command, and visualize the result with `sched_vis` again

In [None]:
OMP_NUM_THREADS= LB= OMP_SCHEDULE= ./omp_sched_rec 