<h1>Exercise #4: Parallelization</h1>
<p>In this exercise we explore the levels of parallelization available in WEST.</p>

<h2>4.1 Download the material</h2>
<p>We focus on the executable <tt><b>wstat.x</b></tt>. First, we download the pseudopotentials and the input files for <tt><b>pw.x</b></tt> and <tt><b>wstat.x</b></tt>.</p>

In [None]:
# pseudopotentials
!wget -N http://www.quantum-simulation.org/potentials/sg15_oncv/upf/Si_ONCV_PBE-1.2.upf
!wget -N http://www.quantum-simulation.org/potentials/sg15_oncv/upf/H_ONCV_PBE-1.2.upf

# input files 
!wget -N http://www.west-code.org/doc/training/silane/pw.in
!wget -N http://www.west-code.org/doc/training/silane/wstat.in

<p><tt><b>wstat.x</b></tt> reads the output of a DFT calculation; therefore, as a first step, we run the DFT calculation with the executable <tt><b>pw.x</b></tt> on 8 cores.</p>

In [None]:
!mpirun -n 8 pw.x -i pw.in > pw.out

<h2>4.2 Parallelization schemes in WEST</h2>
<p>WEST uses up to four layers of parallelism on CPU-based computers:
<ul>
<li>Plane-waves (FFT)
<li>Bands
<li>Spin channels
<li>Eigenpotentials
</ul>
The following command uses <b>N</b> CPU cores, <b>NI</b> images, <b>NK</b> pools, <b>NB</b> band groups, and <b>N</b>/(<b>NI</b>*<b>NK</b>*<b>NB</b>) cores per FFT: <br>
<code>mpirun -n <b>N</b> wstat.x -nimage <b>NI</b> -npool <b>NK</b> -nbgrp <b>NB</b> -i wstat.in > wstat.out <br></code>
With this scheme we achieved good scaling on CPU-based supercomputers such as the Blue Gene/Q (BG/Q) system Mira at Argonne National Laboratory, where WEST makes efficient use of $512$ cores per FFT and $1024$ images, for a total of $N = 512\times 1024 = 524288$ cores.
Details about the implementation are described in <a href="https://pubs.acs.org/doi/full/10.1021/ct500958p#showFigures">J. Chem. Theory Comput. 11, 2680 (2015)</a>:
<img src="https://pubs.acs.org/cms/10.1021/ct500958p/asset/images/medium/ct-2014-00958p_0005.gif" width="50%">
On computers equipped with GPU accelerators, WEST is capable of harnessing the data parallelism provided by GPUs. Again, we achieved good scaling on GPU-accelerated supercomputers such as Summit at Oak Ridge National Laboratory, where WEST makes efficient use of over 25,000 NVIDIA V100 GPUs.
Details about the implementation are described in <a href="https://pubs.acs.org/doi/10.1021/acs.jctc.2c00241">J. Chem. Theory Comput. 18, 4690-4707 (2022)</a>.
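</p>

<p>We now run <tt><b>wstat.x</b></tt> on 2, 4, and 8 cores using 1, 2, or 4 images, with one OpenMP thread per MPI process (<tt>OMP_NUM_THREADS=1</tt>). Each run overwrites <tt>silane.wstat.save/wstat.json</tt>, which contains the timings, so after each run we save a copy whose name records the number of cores and images.</p>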
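<p>The rule of <b>N</b>/(<b>NI</b>*<b>NK</b>*<b>NB</b>) cores per FFT stated above can be checked before launching a run. Below is a minimal Python sketch; the helper <tt>cores_per_fft</tt> is hypothetical (not part of WEST), the mapping of images, pools, and band groups onto eigenpotentials, spin channels, and bands is our reading of the list above, and we assume the cores must split evenly into groups. It prints the layout for the six (N, NI) combinations used in the <tt><b>wstat.x</b></tt> runs below and checks the Mira figure quoted above.</p>

```python
# Minimal sketch: derive the number of cores per FFT from the command-line flags.
# Assumed mapping (not stated explicitly in this notebook): images distribute
# eigenpotentials, pools distribute spin channels, band groups distribute bands,
# and the cores left in each group share the plane-wave (FFT) work.


def cores_per_fft(n: int, nimage: int = 1, npool: int = 1, nbgrp: int = 1) -> int:
    """Return N/(NI*NK*NB), the number of cores that share each FFT.

    Raises ValueError if the N cores cannot be split evenly into groups
    (an even split is an assumption of this sketch).
    """
    groups = nimage * npool * nbgrp
    if n % groups != 0:
        raise ValueError(f"{n} cores cannot be split evenly into {groups} groups")
    return n // groups


# (N, NI) pairs of the wstat.x runs in this exercise; NK = NB = 1.
runs = [(2, 1), (4, 1), (4, 2), (8, 1), (8, 2), (8, 4)]
for n, ni in runs:
    print(f"cores{n}_image{ni}: {cores_per_fft(n, nimage=ni)} core(s) per FFT")

# Mira example: 524288 cores and 1024 images.
print(cores_per_fft(512 * 1024, nimage=1024))  # 512
```

With 8 cores, going from 1 to 4 images reduces the FFT group from 8 cores to 2, trading plane-wave parallelism for parallelism over eigenpotentials.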

In [None]:
%%bash
export OMP_NUM_THREADS=1
mpirun -n 2 wstat.x -nimage 1 -i wstat.in > wstat.out

In [None]:
cp silane.wstat.save/wstat.json wstat_cores2_image1.json

In [None]:
%%bash
export OMP_NUM_THREADS=1
mpirun -n 4 wstat.x -nimage 1 -i wstat.in > wstat.out

In [None]:
cp silane.wstat.save/wstat.json wstat_cores4_image1.json

In [None]:
%%bash
export OMP_NUM_THREADS=1
mpirun -n 4 wstat.x -nimage 2 -i wstat.in > wstat.out

In [None]:
cp silane.wstat.save/wstat.json wstat_cores4_image2.json

In [None]:
%%bash
export OMP_NUM_THREADS=1
mpirun -n 8 wstat.x -nimage 1 -i wstat.in > wstat.out

In [None]:
cp silane.wstat.save/wstat.json wstat_cores8_image1.json

In [None]:
%%bash
export OMP_NUM_THREADS=1
mpirun -n 8 wstat.x -nimage 2 -i wstat.in > wstat.out

In [None]:
cp silane.wstat.save/wstat.json wstat_cores8_image2.json

In [None]:
%%bash
export OMP_NUM_THREADS=1
mpirun -n 8 wstat.x -nimage 4 -i wstat.in > wstat.out

In [None]:
cp silane.wstat.save/wstat.json wstat_cores8_image4.json
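<p>The cells above repeat the same two steps six times: run <tt><b>wstat.x</b></tt>, then copy <tt>wstat.json</tt>. The same scan can be written as a loop. This is a minimal sketch, not part of the original exercise: the helpers <tt>wstat_command</tt> and <tt>run_scan</tt> are hypothetical, and the code assumes that <tt>mpirun</tt> and <tt>wstat.x</tt> are on the <tt>PATH</tt> and that the output directory is <tt>silane.wstat.save</tt>, as in the cells above. Passing <tt>-npool 1 -nbgrp 1</tt> explicitly is assumed to be equivalent to omitting these flags.</p>

```python
import os
import shlex
import shutil
import subprocess


def wstat_command(n: int, nimage: int = 1, npool: int = 1, nbgrp: int = 1,
                  infile: str = "wstat.in", outfile: str = "wstat.out") -> str:
    """Build the shell command line for one wstat.x run (hypothetical helper)."""
    return (f"mpirun -n {n} wstat.x -nimage {nimage} -npool {npool} "
            f"-nbgrp {nbgrp} -i {shlex.quote(infile)} > {shlex.quote(outfile)}")


def run_scan(runs: list[tuple[int, int]], save_dir: str = "silane.wstat.save") -> None:
    """Run wstat.x for each (N, NI) pair and keep a copy of wstat.json."""
    # One OpenMP thread per MPI process, as in the %%bash cells.
    env = dict(os.environ, OMP_NUM_THREADS="1")
    for n, ni in runs:
        # shell=True is needed for the output redirection in the command string.
        subprocess.run(wstat_command(n, nimage=ni), shell=True, check=True, env=env)
        # wstat.json is overwritten by every run, so save a copy named after the layout.
        shutil.copy(os.path.join(save_dir, "wstat.json"), f"wstat_cores{n}_image{ni}.json")
```

For example, <tt>run_scan([(2, 1), (4, 1), (4, 2), (8, 1), (8, 2), (8, 4)])</tt> would repeat the six runs above; <tt>check=True</tt> stops the scan at the first failed run instead of copying a stale <tt>wstat.json</tt>.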

In [None]:
!ls -lrt wstat_*json

<p>We load the JSON files saved after each run.</p>

In [None]:
import json

data = {}

for name in ['cores2_image1', 'cores4_image1', 'cores4_image2', 'cores8_image1', 'cores8_image2', 'cores8_image4']:
    # read wstat_<name>.json
    with open(f'wstat_{name}.json') as file:
        data[name] = json.load(file)

print(json.dumps(data, indent=2))

<p>We plot the wall-clock time of <tt><b>wstat.x</b></tt> for each run.</p>

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# timings
y = {}
c = {}

# 2 cores
for name in ['cores2_image1'] : 
    y[name] = [ data[name]['timing']['WSTAT']['wall:sec'] ]
    c[name] = 'black'

# 4 cores
for name in ['cores4_image1', 'cores4_image2'] :
    y[name] = [ data[name]['timing']['WSTAT']['wall:sec'] ]
    c[name] = 'blue'

# 8 cores
for name in ['cores8_image1','cores8_image2','cores8_image4'] :
    y[name] = [ data[name]['timing']['WSTAT']['wall:sec'] ]
    c[name] = 'green'

print(y)

# plot: one short horizontal segment per run, colored by the number of cores
labels = list(y.keys())
x = list(range(1, len(labels) + 1))

fig, ax = plt.subplots(1, 1)
for xi, name in zip(x, labels):
    for t in y[name]:
        ax.hlines(t, xi - 0.25, xi + 0.25, color=c[name])

plt.xticks(x, labels, rotation='vertical')
plt.ylabel('Time (s)')

plt.title('Parallelization')

plt.show()
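<p>To quantify the scaling beyond the plot, one can compute the speedup $S = T_\mathrm{ref}/T$ and the parallel efficiency $E = S \times N_\mathrm{ref}/N$ of each run relative to a reference run, where $T$ is the wall time and $N$ the number of cores. Below is a minimal sketch; the helpers <tt>parse_run_name</tt> and <tt>scaling_table</tt> are hypothetical (not part of WEST), and they assume run names follow the <tt>cores&lt;N&gt;_image&lt;NI&gt;</tt> pattern used above.</p>

```python
import re


def parse_run_name(name: str) -> tuple[int, int]:
    """Extract (cores, images) from a run name such as 'cores8_image4'."""
    m = re.fullmatch(r"cores(\d+)_image(\d+)", name)
    if m is None:
        raise ValueError(f"unexpected run name: {name!r}")
    return int(m.group(1)), int(m.group(2))


def scaling_table(walltimes: dict[str, float], reference: str) -> dict[str, tuple[float, float]]:
    """Return {run: (speedup, efficiency)} relative to the reference run.

    speedup    S = T_ref / T
    efficiency E = S * N_ref / N   (1.0 means ideal linear scaling)
    """
    n_ref, _ = parse_run_name(reference)
    t_ref = walltimes[reference]
    table = {}
    for name, t in walltimes.items():
        n, _ = parse_run_name(name)
        speedup = t_ref / t
        table[name] = (speedup, speedup * n_ref / n)
    return table
```

In this notebook the wall times are stored as one-element lists in <tt>y</tt>, so the call would be <tt>scaling_table({k: v[0] for k, v in y.items()}, 'cores2_image1')</tt>. Comparing runs with the same number of cores but different numbers of images then shows directly which layout is more efficient for this system.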