![img](https://raw.githubusercontent.com/villegar/xxii-simmac/master/images/logoEN.png)
# Parallel Computing using Rmpi
### by Roberto Villegas-Diaz & Anne Fennell, PhD
### South Dakota State University

<img style="float: right;" src="https://raw.githubusercontent.com/villegar/xxii-simmac/master/images/SDLogo3c.jpg" alt="SDSU" width="200"/>

## Outline
- Overview
- Introduction to Parallel Computing
- Hands-On
- Remarks
- References
- Acknowledgements

## Overview

![meme](https://i.redd.it/9tu18n684z331.jpg)

Source: https://www.reddit.com/r/ProgrammerHumor/comments/bzv8q4/parallelism_be_like/

### Serial Computing

![serial-computing](https://raw.githubusercontent.com/villegar/xxii-simmac/master/images/serialProblem.gif)


- A problem is broken into a discrete series of instructions
- Instructions are executed sequentially one after another
- Executed on a single processor
- Only one instruction may execute at any moment in time

### Parallel Computing
![parallel-computing](https://raw.githubusercontent.com/villegar/xxii-simmac/master/images/parallelProblem.gif)

- A problem is broken into discrete parts that can be solved concurrently
- Each part is further broken down to a series of instructions
- Instructions from each part execute simultaneously on different processors
- An overall control/coordination mechanism is employed
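
As a minimal sketch of the four points above, the snippet below uses the base `parallel` package (an assumption; the hands-on part of this document uses `foreach` and Rmpi instead) to split a sum into parts, compute each part on a separate worker, and combine the partial results on the coordinating process. The problem size and the choice of two workers are illustrative.

```r
library(parallel)

x <- 1:1000000                      # the whole problem (illustrative size)
n_workers <- 2                      # assumption: at least 2 cores are available

# 1. Break the problem into discrete parts (one index set per worker)
parts <- splitIndices(length(x), n_workers)

# 2-3. Each worker executes its own series of instructions on its part
cl <- makeCluster(n_workers)
partial_sums <- parLapply(
  cl, parts,
  function(idx, values) sum(as.numeric(values[idx])),  # as.numeric avoids integer overflow
  values = x
)
stopCluster(cl)

# 4. The coordinating process combines the partial results
total <- Reduce(`+`, partial_sums)
total
```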

### Available Resources

- CeNAT (Centro Nacional de Alta Tecnología), CR: http://www.cenat.ac.cr/en/
- PRACE (Partnership for Advanced Computing in Europe), Europe: http://www.prace-ri.eu
- RIKEN (Kokuritsu Kenkyū Kaihatsu Hōjin Rikagaku Kenkyūsho), Japan: https://www.riken.jp/en/ 
- XSEDE (Extreme Science and Engineering Discovery Environment), USA: https://www.xsede.org

### Top 500 Supercomputers*
![top500](https://raw.githubusercontent.com/villegar/xxii-simmac/master/images/top500-top3.jpg)

Source: https://www.top500.org

\* November 2019

- Rmax - Maximal LINPACK performance achieved
- Rpeak - Theoretical peak performance
- LINPACK is a collection of Fortran subroutines that analyze and solve linear equations and linear least-squares problems. The package solves linear systems whose matrices are general, banded, symmetric indefinite, symmetric positive definite, triangular, and tridiagonal square. In addition, the package computes the QR and singular value decompositions of rectangular matrices and applies them to least-squares problems. LINPACK uses column-oriented algorithms to increase efficiency by preserving locality of reference. 

Sources: https://www.top500.org/project/top500_description/ and https://www.netlib.org/linpack/
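
To give a feel for what the benchmark measures, here is a toy sketch (not the actual benchmark code used by the TOP500 list) that solves a dense linear system in R and converts the elapsed time into an approximate floating-point rate. It assumes the usual leading-order operation count of about $\frac{2}{3}n^3$ for an LU-based solve. The matrix size is arbitrary, and the rate obtained depends heavily on the BLAS/LAPACK library R is linked against.

```r
set.seed(42)
n <- 500                                   # illustrative problem size
A <- matrix(rnorm(n * n), nrow = n)        # dense random matrix
b <- rnorm(n)

# Time the solution of A x = b
elapsed <- system.time(x <- solve(A, b))[["elapsed"]]

# Leading-order operation count for an LU-based solve (lower-order terms ignored)
flops <- (2 / 3) * n^3
rate <- flops / max(elapsed, 1e-6)         # guard against a zero timer reading

cat(sprintf("Approximate rate: %.2f GFLOPS\n", rate / 1e9))

# Sanity check: the residual should be close to zero
max_residual <- max(abs(A %*% x - b))
```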

### HPC system 
![hpc-system](https://raw.githubusercontent.com/villegar/xxii-simmac/master/images/nodesNetwork.gif)

HPC systems use a wide range of configurations; the diagram above is a simplified version for illustration purposes.

### Why Use Parallel Computing?


#### The Real World is Massively Parallel:
- In the natural world, many complex, interrelated events are happening at the same time, yet within a temporal sequence.
- Compared to serial computing, parallel computing is much better suited for modeling, simulating and understanding complex, real world phenomena.

![real-world-applications-1](https://raw.githubusercontent.com/villegar/xxii-simmac/master/images/realWorldCollage1.jpg)

![real-world-applications-2](https://raw.githubusercontent.com/villegar/xxii-simmac/master/images/realWorldCollage2.jpg)

![real-world-applications-3](https://raw.githubusercontent.com/villegar/xxii-simmac/master/images/realWorldCollage3.jpg)

### Why Use Parallel Computing? (cont.)

- Save time and/or money

In theory, throwing more resources at a task will shorten its time to completion, with potential cost savings.
Parallel computers can be built from cheap, commodity components.

- Solve larger and more complex problems

Many problems are so large and/or complex that it is impractical or impossible to solve them on a single computer, especially given limited computer memory.
Example: "Grand Challenge Problems" (en.wikipedia.org/wiki/Grand_Challenge) requiring PetaFLOPS and PetaBytes of computing resources.
Example: Web search engines/databases processing millions of transactions every second.

- Provide concurrency

A single compute resource can only do one thing at a time. Multiple compute resources can do many things simultaneously.
Example: Collaborative Networks provide a global venue where people from around the world can meet and conduct work "virtually".

- Take advantage of non-local resources

Compute resources on a wide area network, or even the Internet, can be used when local compute resources are scarce or insufficient. Two examples follow, each of which has over 1.7 million contributors globally (May 2018):
Example: SETI@home (setiathome.berkeley.edu)
Example: Folding@home (folding.stanford.edu)

- Make better use of underlying parallel hardware

Modern computers, even laptops, are parallel in architecture with multiple processors/cores.
Parallel software is specifically intended for parallel hardware with multiple cores, threads, etc.
In most cases, serial programs run on modern computers "waste" potential computing power.

### The Future: Exascale Computing


1 exaFLOPS = $10^{18}$ floating-point operations per second
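
For a sense of scale, assume a hypothetical single core that sustains $10^{9}$ operations per second (an illustrative figure, not a measurement). The work an exascale machine completes in one second would keep that core busy for roughly 32 years:

$$\frac{10^{18}\ \text{operations}}{10^{9}\ \text{operations/s}} = 10^{9}\ \text{s} \approx \frac{10^{9}\ \text{s}}{3.15 \times 10^{7}\ \text{s/year}} \approx 32\ \text{years}$$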

## Introduction to Parallel Computing

### Game rules

#### Speedup

$$S_p = \frac{T_s}{T_p}$$


<img src="https://raw.githubusercontent.com/villegar/xxii-simmac/master/images/speedup-curve.png" width="500">

where:
 - $p$ is the number of processors
 - $T_s$ is the execution time of the sequential algorithm
 - $T_p$ is the execution time of the parallel algorithm 
 - $S_p = p$ (linear speedup, ideal)
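
The definition can be tried out with a few lines of R. The timings below are hypothetical values chosen for illustration, not measurements from this tutorial, and `speedup()` is a helper name introduced here.

```r
# Speedup S_p = T_s / T_p
speedup <- function(t_serial, t_parallel) t_serial / t_parallel

t_s <- 120                    # hypothetical serial time (seconds)
p   <- c(1, 2, 4, 8)          # number of processors
t_p <- c(120, 63, 34, 20)     # hypothetical parallel times (seconds)

s_p <- speedup(t_s, t_p)

# Compare the observed speedup with the ideal (linear) case S_p = p
data.frame(p = p, T_p = t_p, S_p = round(s_p, 2), ideal = p)
```

In this made-up example the speedup keeps growing with $p$ but falls further below the ideal line each time processors are added.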


### Game rules (cont.)

#### Parallel efficiency 

$$E_p = \frac{S_p}{p} = \frac{T_s}{p T_p}$$
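
A small sketch of the same formula in R, again using hypothetical timings (illustrative values only) and a helper name, `efficiency()`, introduced for this example:

```r
# Parallel efficiency E_p = S_p / p = T_s / (p * T_p)
efficiency <- function(t_serial, t_parallel, p) t_serial / (p * t_parallel)

t_s <- 120                    # hypothetical serial time (seconds)
p   <- c(1, 2, 4, 8)          # number of processors
t_p <- c(120, 63, 34, 20)     # hypothetical parallel times (seconds)

e_p <- efficiency(t_s, t_p, p)
round(e_p, 2)
```

An efficiency of 1 corresponds to linear speedup; values below 1 indicate that part of the added processing power is lost to overhead such as communication and coordination.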

In [1]:
library(Rmpi)

In [13]:
system("lscpu", intern = TRUE)
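
`lscpu` is only available on Linux. A more portable way to count the available cores, assuming the base `parallel` package that ships with R, could look like this:

```r
library(parallel)

n_logical  <- detectCores(logical = TRUE)    # logical CPUs (includes hyper-threads)
n_physical <- detectCores(logical = FALSE)   # physical cores, where the OS reports them

cat("Logical CPUs:  ", n_logical, "\n")
cat("Physical cores:", n_physical, "\n")
```

`detectCores()` can return `NA` on platforms where the count cannot be determined, so it is worth checking the result before using it to size a cluster.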

## Testing installation

### `foreach` library

In [56]:
library(foreach)
foreach (i = 1:3) %do% {
    sqrt(i)
}
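
By default `foreach` returns its results as a list. As a small sketch, the `.combine` argument can collapse them into a vector instead (`roots` is a name introduced for this example):

```r
library(foreach)

roots <- foreach(i = 1:3, .combine = c) %do% {
  sqrt(i)
}
roots   # a numeric vector of length 3 rather than a list
```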

### `foreach` + `doParallel`

In [59]:
library(foreach)
library(doParallel)
numCores <- detectCores() - 1
registerDoParallel(numCores)  # use multicore, set to the number of our cores
foreach (i=1:3) %dopar% {
  sqrt(i)
}
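
When `registerDoParallel()` is given a number of cores, as above, `doParallel` manages the workers implicitly. The sketch below shows one way to check how many workers are registered and to release them afterwards; two workers are assumed purely for illustration.

```r
library(foreach)
library(doParallel)

registerDoParallel(cores = 2)        # assumption: 2 workers, for illustration
n_workers <- getDoParWorkers()       # number of workers foreach will use

roots <- foreach(i = 1:3, .combine = c) %dopar% {
  sqrt(i)
}

stopImplicitCluster()                # release the implicitly created workers
```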

## `Rmpi` library

In [4]:
# Load the R MPI package if it is not already loaded.
if (!is.loaded("mpi_initialize")) {
    library("Rmpi")
}
ns <- mpi.universe.size() - 1
ns

## Getting real

$$\displaystyle \left(\sqrt{-\text{s#@t}}\right)^2$$

## Using `foreach` and `doParallel`

In [48]:
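library(tictoc) # provides tic() and toc()
N <- 400000 # Sample size
tic("serial")
results_serial <- c()
for (i in 1:N) {
    # Append to the vector initialised above (growing a vector this way is slow)
    results_serial <- c(results_serial, sqrt(i))
}
toc()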

serial: 240.949 sec elapsed
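
Much of the serial time above is likely spent growing the result vector one element at a time, since R may copy the vector on each append. As a sketch of a fairer serial baseline (not part of the original benchmark), the loop below preallocates the output, and the last timing uses vectorised `sqrt()` directly. `system.time()` from base R stands in for `tictoc` here, and the variable names are introduced for this example.

```r
N <- 400000

# Preallocated loop: the vector is created once and filled in place
t_prealloc <- system.time({
  results_prealloc <- numeric(N)
  for (i in 1:N) {
    results_prealloc[i] <- sqrt(i)
  }
})[["elapsed"]]

# Vectorised version: a single call over the whole input
t_vector <- system.time(results_vector <- sqrt(1:N))[["elapsed"]]

c(preallocated = t_prealloc, vectorised = t_vector)
```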


In [49]:
library(foreach)
tic("foreach only")
# .combine = "cbind" binds the results into a 1 x N matrix
results_foreach <- foreach (i=1:N, .combine="cbind") %do% {
  sqrt(i)
}
toc()

foreach only: 100.895 sec elapsed


In [50]:
library(foreach)
library(doParallel)

In [54]:
numCores <- detectCores() - 1 # leave one core free for the system
tic("foreach + doParallel")
cl <- makeCluster(numCores)
registerDoParallel(cl)
#registerDoParallel(numCores)  # use multicore, set to the number of our cores
results_parallel <- foreach (i=1:N, .combine="cbind") %dopar% {
  sqrt(i)
}
toc()
stopCluster(cl) # shut down the worker processes

foreach + doParallel: 193.241 sec elapsed
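
In this run the parallel version was slower than `foreach` alone. A likely reason is that each iteration does very little work, so the cost of dispatching 400,000 tiny tasks and combining their results outweighs the gain from extra cores. One common remedy, sketched below under the assumption of two workers, is to hand each worker one large chunk of indices. `splitIndices()` comes from the base `parallel` package; `results_chunked` is a name introduced for this example.

```r
library(parallel)
library(foreach)
library(doParallel)

N <- 400000
n_workers <- 2                          # assumption: adjust to your machine
cl <- makeCluster(n_workers)
registerDoParallel(cl)

chunks <- splitIndices(N, n_workers)    # one block of indices per worker

results_chunked <- foreach(idx = chunks, .combine = c) %dopar% {
  sqrt(idx)                             # vectorised work on the whole chunk
}

stopCluster(cl)                         # shut down the worker processes
```

With this layout only as many tasks as workers are dispatched, and each task is large enough to amortise the communication cost.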


In [28]:
library(tictoc)

In [None]:
# Use ONE of the following when you are done with MPI:
mpi.finalize() # Clean up all MPI states
mpi.exit() # mpi.finalize() + detach Rmpi
mpi.quit() # mpi.finalize() + quit R

In [None]:
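library(Rmpi)
mpi.spawn.Rslaves(nslaves=5)
# Syntax
# mpi.remote.exec(cmd, ..., simplify = TRUE, comm = 1, ret = TRUE)
# Each slave reports its rank and the size of the communicator
mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))

# Each slave sums the integers from 1 to its own rank
mpi.remote.exec(sum(1:mpi.comm.rank()))
mpi.close.Rslaves() # shut down the spawned slaves so new ones can be spawned later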

In [None]:
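mpi.spawn.Rslaves(nslaves=4)
ptm <- proc.time() # start the timer
mpi.iparReplicate(400, mean(rnorm(1000000)))
print(proc.time() - ptm) # elapsed time
mpi.close.Rslaves() # shut down the spawned slaves before finalizing
mpi.finalize()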

## Acknowledgements

- South Dakota State University
- Costa Rica National Center for High Technology (CeNAT)

## References

- https://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf
- https://nceas.github.io/oss-lessons/parallel-computing-in-r/parallel-computing-in-r.html
- https://blog.jupyter.org/a-slideshow-template-for-voil%C3%A0-apps-435f67d10b4f
- https://bioinfomagician.wordpress.com/2013/11/25/mpi-tutorial-for-r-rmpi/

[1] Barney, B., "Introduction to Parallel Computing", Lawrence Livermore National Laboratory
https://computing.llnl.gov/tutorials/parallel_comp/
