## Homework 09:  Parallel Programming 02

## Due Date: Apr 19, 2023, 11:59pm

#### Firstname Lastname: Ching-Tsung Tsai

#### E-mail: ct2840@nyu.edu

#### Enter your solutions and submit this notebook

---

**Problem 1 (40p)**

In this problem, the goal is to calculate the mean and standard deviation of a large list of numbers using reduction, one of the collective operations (see Lecture 11).


Consider $N = 256000$ random variables uniform on $[0, 1]$, call these $x_0, x_1, \dots, x_{N - 1}$.  


Write an MPI program that runs on $16$ processes and outputs the average and standard deviation of $x_0, x_1, \dots, x_{N - 1}$.


To simplify the problem, let one process, say `Process 0`, independently draw all $N$ samples uniformly on $[0, 1]$.

How do you explain the results?


**Instructions:** 
Your program should use mpi4py and collective operations. 
Save your program as `2020_spring_sol09_pr01.py` and run it from the terminal as:

```
mpirun -n 16 python 2020_spring_sol09_pr01.py
```


In [68]:
%%writefile 2020_spring_sol09_pr01.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD    # global communicator
rank = comm.Get_rank()   # current rank
sz = comm.Get_size()     # -n
n = 256000               # total number of samples
chunk = n // sz          # samples per process (n must be divisible by sz)

# Rank 0 draws all n samples; the other ranks pass None to Scatter.
if rank == 0:
    data = np.random.uniform(0, 1, size=[sz, chunk]).astype('f')
else:
    data = None

recvbuf = np.empty(chunk, dtype="f")    # per-process receive buffer
comm.Scatter(data, recvbuf, root=0)     # send one row of data to each process
local_sum = np.sum(recvbuf, dtype=np.float64)
global_sum = comm.allreduce(local_sum, MPI.SUM)   # every process receives the total sum
global_mean = global_sum / n
local_sq_diff = np.sum((recvbuf - global_mean) ** 2, dtype=np.float64)
global_sq_diff = comm.reduce(local_sq_diff, MPI.SUM, 0)   # only rank 0 receives the result
if rank == 0:
    global_sd = (global_sq_diff/n)**0.5
    print(f"Mean by MPI: {global_mean:.5f}, SD by MPI: {global_sd:.5f}")
    print(f"Mean by numpy built-in: {np.mean(data):.5f}, SD numpy built-in: {np.std(data):.5f}")


Overwriting 2020_spring_sol09_pr01.py


In [69]:
!mpirun -n 16 python3 2020_spring_sol09_pr01.py

Mean by MPI: 0.49886, SD by MPI: 0.28846
Mean by numpy built-in: 0.49886, SD numpy built-in: 0.28846
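As a cross-check on the reduction logic, the sketch below mimics the scatter, allreduce, and reduce steps on a single machine using only the standard library. The function name `chunked_mean_std` and the seed are hypothetical choices for illustration, not part of the assignment. For a uniform distribution on $[0, 1]$ the theoretical mean is $1/2$ and the theoretical standard deviation is $1/\sqrt{12} \approx 0.28868$, so both the MPI result and NumPy's result should land close to those values.

```python
import math
import random


def chunked_mean_std(samples, n_chunks):
    """Two-pass mean/SD that mimics Scatter + allreduce + reduce in one process."""
    n = len(samples)
    size = n // n_chunks  # assumes n is divisible by n_chunks
    # "Scatter": give each simulated process one contiguous chunk.
    chunks = [samples[i * size:(i + 1) * size] for i in range(n_chunks)]
    # "allreduce": combine the local sums into one global sum.
    total = sum(sum(c) for c in chunks)
    mean = total / n
    # "reduce": combine the local sums of squared deviations.
    sq_diff = sum(sum((x - mean) ** 2 for x in c) for c in chunks)
    return mean, math.sqrt(sq_diff / n)


random.seed(0)  # arbitrary seed, only to make the sketch repeatable
samples = [random.random() for _ in range(256000)]
mean, sd = chunked_mean_std(samples, 16)
print(f"chunked:     mean={mean:.5f}, sd={sd:.5f}")
print(f"theoretical: mean={0.5:.5f}, sd={1 / math.sqrt(12):.5f}")
```

Because summation can be regrouped freely, splitting the data into 16 chunks changes the result only at the level of floating-point rounding, which is why the MPI values and the NumPy values above agree to the printed precision.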



---
**Problem 2 (60p)**

In this problem, the goal is to demonstrate how to use domain decomposition together with collective operations.

Consider the exponential distribution $X \sim \textrm{Exp}(1)$ with unit mean. Find numerical approximations of the moments of this exponential random variable. 

That is, for a random variable $X$ with density $f(x) = e^{-x}$ for $x \geq 0$, compute the first $15$ moments, where the $k$-th moment is defined as:
$$I_k = \int_{0}^{\infty} x^k f(x) dx.$$


Your program should use MPI collective operations, with the integration performed in parallel by $N=16$ processes over the finite range $[0, M)$, where $M=1000$. Split the range into $N = 16$ partitions with $1000$ increments per partition (see Lectures 10 and 11).

Provide evaluations of $J_1, J_2, \dots, J_{15}$, where $$J_k = \int_{0}^{M} x^k f(x) dx.$$
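The decomposition described above can be sketched without MPI: each of the 16 partitions is integrated separately, and the partial results are summed, which is the role a sum-reduction would play across processes. This is a minimal single-process illustration that assumes the trapezoidal rule; the helper names `local_trapezoid` and `moment` are hypothetical. The value of $k!$ is printed next to each result purely as a reference point.

```python
import math


def local_trapezoid(k, a, b, steps):
    """Trapezoidal rule for x^k * exp(-x) on [a, b] with `steps` increments."""
    h = (b - a) / steps
    total = 0.5 * (a ** k * math.exp(-a) + b ** k * math.exp(-b))
    for i in range(1, steps):
        x = a + i * h
        total += x ** k * math.exp(-x)
    return total * h


def moment(k, M=1000.0, parts=16, steps=1000):
    """Approximate J_k by summing the partial integrals of all partitions."""
    width = M / parts
    # One entry per partition; in an MPI program each rank would own one entry.
    partials = [
        local_trapezoid(k, r * width, (r + 1) * width, steps)
        for r in range(parts)
    ]
    return sum(partials)  # stands in for a reduce with a sum operation


for k in range(1, 16):
    print(f"J{k} ~ {moment(k):.6g}   (k! = {math.factorial(k)})")
```

In an mpi4py version, each rank would evaluate `local_trapezoid` on its own subinterval and the partial results would be combined with a sum reduction on the root process.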


**Instructions:** 

Save your program as 2020_spring_sol09_pr02.py and run it from the terminal as:

```
mpirun -n 16 python 2020_spring_sol09_pr02.py
```


**Bonus Question (10 points):** 

What is the value of $I_k$, as a function of $k$? How can it be derived?
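
One standard route to the bonus question is integration by parts. The derivation below is a sketch of that route and may differ from the approach intended in the lectures. For $k \geq 1$, the boundary term vanishes at both limits, which gives a recurrence:

$$I_k = \int_{0}^{\infty} x^k e^{-x} dx = \left[-x^k e^{-x}\right]_{0}^{\infty} + k \int_{0}^{\infty} x^{k-1} e^{-x} dx = k \, I_{k-1}.$$

Together with the base case $I_0 = \int_{0}^{\infty} e^{-x} dx = 1$, the recurrence yields

$$I_k = k \cdot (k-1) \cdots 1 \cdot I_0 = k!,$$

which is the Gamma function value $\Gamma(k+1)$.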

In [89]:
def I(M, k, n=10000):   
    """
    Approximate the integral of x^k * exp(-x) over [0, M]
    with a left Riemann sum (rectangle rule).
    M: upper bound of the integration
    k: order of the moment
    n: number of sample points, default=10000
    """
    bounds = np.linspace(0,M,n)  # n sample points from 0 to M (inclusive)
    interval = M/n               # rectangle width (the grid spacing is actually M/(n-1))
    sums = 0                     # running total of the integral
    for i in range(n):
        x = bounds[i]
        fx = np.exp(-x)     # f(x)
        y = x**k * fx
        sums += interval*y
    return sums



In [106]:
%%writefile 2020_spring_sol09_pr02.py
from mpi4py import MPI
import numpy as np
comm = MPI.COMM_WORLD    # global communicator
rank = comm.Get_rank()   # current rank
N = comm.Get_size()     # -n 16
def I(M, k, n=10000):   
    """
    Approximate the integral of x^k * exp(-x) over [0, M]
    with a left Riemann sum (rectangle rule).
    M: upper bound of the integration
    k: order of the moment
    n: number of sample points, default=10000
    """
    bounds = np.linspace(0,M,n)  # n sample points from 0 to M (inclusive)
    interval = M/n               # rectangle width (the grid spacing is actually M/(n-1))
    sums = 0                     # running total of the integral
    for i in range(n):
        x = bounds[i]
        fx = np.exp(-x)     # f(x)
        y = x**k * fx
        sums += interval*y
    return sums
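# Rank 0 holds the base upper bound; Bcast sends it to every process.
if rank==0:
    M=np.array([1000])
else:
    M=np.zeros(1, dtype=int)

comm.Bcast(M, root=0)
M += 1000*rank   # rank r integrates over [0, 1000*(r+1)]
# Work is split by moment order rather than by subdomain: rank r computes J_r.
Jk = I(M[0], rank, n=10000)
Jks = comm.gather(Jk, root=0)   # collect one moment per rank on rank 0
if rank==0:
    for i in range(N):
        print(f"The {i}-th moment approximation(J{i}): {Jks[i]}")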


Overwriting 2020_spring_sol09_pr02.py


In [107]:
!mpirun -n 16 python 2020_spring_sol09_pr02.py

The 0-th moment approximation(J0): 1.0507332777775127
The 1-th moment approximation(J1): 0.9965729913945666
The 2-th moment approximation(J2): 1.999732959859084
The 3-th moment approximation(J3): 5.999605379059183
The 4-th moment approximation(J4): 23.997659381104473
The 5-th moment approximation(J5): 119.98784793292702
The 6-th moment approximation(J6): 719.9277930026687
The 7-th moment approximation(J7): 5039.496348113682
The 8-th moment approximation(J8): 40315.96979385826
The 9-th moment approximation(J9): 362843.7120461333
The 10-th moment approximation(J10): 3628437.0924776006
The 11-th moment approximation(J11): 39912808.2229183
The 12-th moment approximation(J12): 478953700.27700853
The 13-th moment approximation(J13): 6226398104.507026
The 14-th moment approximation(J14): 87169573390.50105
The 15-th moment approximation(J15): 1307543600235.6833
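
The printed values can be compared with $k!$, which is the closed form of $I_k$ for the $\textrm{Exp}(1)$ distribution. The short sketch below computes the relative error for a few of the values copied from the output above; the dictionary name `reported` is a hypothetical choice. A plausible reading of the roughly 5% excess in `J0` is that the rectangle rule with a step of about $0.1$ overestimates a decreasing integrand that is nonzero at $x = 0$, whereas for $k \geq 1$ the integrand vanishes at $x = 0$ and that leading error term drops out.

```python
import math

# Values copied from the program output above.
reported = {
    0: 1.0507332777775127,
    1: 0.9965729913945666,
    2: 1.999732959859084,
    15: 1307543600235.6833,
}

# Relative error of each reported value against the exact moment k!.
rel_err = {
    k: abs(value - math.factorial(k)) / math.factorial(k)
    for k, value in reported.items()
}

for k, err in rel_err.items():
    print(f"J{k}: relative error vs {k}! = {err:.2e}")
```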
