# EE4375-2022: Fifth Lab Session: One-Dimensional Galerkin Finite Element Method using Distributed Memory Parallel Computing

This lab session solves the Poisson equation $- \frac{d^2 \, u(x)}{dx^2} = f(x)$ on the unit bar domain $x \in \Omega=(0,1)$, supplied with various boundary conditions and various source terms, using the Galerkin finite element method. The target here is a parallel implementation using distributed memory.
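
As a reminder of what is being discretized, the weak form and the element contributions can be written out. This is a sketch under assumptions that the text above does not fix: homogeneous Dirichlet conditions $u(0)=u(1)=0$, a uniform mesh with nodes $x_k$ and mesh width $h$, linear hat functions, and the trapezoidal rule for the load.

$$
\text{find } u \in H^1_0(\Omega) \text{ such that } \int_0^1 \frac{du}{dx} \, \frac{dv}{dx} \, dx = \int_0^1 f \, v \, dx \quad \text{for all } v \in H^1_0(\Omega)
$$

$$
A^{e_k} = \frac{1}{h} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \qquad
f^{e_k} \approx \frac{h}{2} \begin{pmatrix} f(x_k) \\ f(x_{k+1}) \end{pmatrix}
$$

Here $e_k = [x_k, x_{k+1}]$ denotes element $k$. Summing the element contributions into the global matrix and right-hand side vector is the assembly step that the sections below aim to parallelize.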

This problem can be solved using [GridapDistributed](https://gridap.github.io/GridapDistributed.jl/dev/), as students in the bachelor minor Computational Science and Engineering have convincingly shown. It remains valuable to dig into the details.

General information is available on [parallel computing in Julia](https://juliaparallel.org/resources/) and on [MPI.jl](https://github.com/JuliaParallel/MPI.jl).

## Import Packages

```julia
using LinearAlgebra
using Plots
using LaTeXStrings
using SparseArrays
using BenchmarkTools
```

[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mPrecompiling BenchmarkTools [6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf]


## Section 1: Preprocessing 
- <b> first approach </b>: monolithic in-memory approach: the global mesh, matrix and right-hand side vector are assumed to be available on all processors; the assembly process is the same on all processors; 
- <b> second approach </b>: distributed-memory approach: distributed assembly and solve; include figure here; 
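
Both approaches need a rule that assigns elements to processors. A minimal sketch of one such rule is given below: contiguous blocks of nearly equal size. The function name `element_range` and the 1-based part index `p` are hypothetical and chosen only for illustration.

```julia
# Hypothetical helper: contiguous range of elements owned by part p (1-based)
# when nelements elements are split over nparts parts as evenly as possible.
function element_range(nelements::Int, nparts::Int, p::Int)
    1 <= p <= nparts || throw(ArgumentError("p must lie in 1:nparts"))
    q, r = divrem(nelements, nparts)
    # The first r parts own q + 1 elements, the remaining parts own q elements.
    lo = (p - 1) * q + min(p - 1, r) + 1
    hi = lo + q - 1 + (p <= r ? 1 : 0)
    return lo:hi
end
```

For a mesh of 4 elements and 2 parts, `element_range(4, 2, 1)` gives `1:2` and `element_range(4, 2, 2)` gives `3:4`. The node shared by elements 2 and 3 is then the only place where the two parts have to exchange data.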

## Section 2: Parallel Assembly of Matrix and Right-Hand Side Vector 
- <b> first approach </b>: using a distributed for loop as in the [distributed computing toolbox](https://docs.julialang.org/en/v1/stdlib/Distributed/); similar to the Fresh-Approach example; 
- using pmap; see above; the difference between a for loop and a map might be a matter of taste;
- using lazy_map (Gridap-like);
- <b> second approach </b>: using SharedVector and SharedMatrix from [shared arrays](https://docs.julialang.org/en/v1/stdlib/SharedArrays/), or [Distributed Arrays](https://juliaparallel.org/DistributedArrays.jl/stable/); each processor fills its part of the matrix; decomposition according to elements (as opposed to nodes); 
- using partitioned matrices; 

<div>
<img src="./figures/dag-fem-1d.jpg" width=400 /> 
<center> Figure 1: Directed acyclic graph representation of decomposed mesh of 4 elements on the interval. A finite element discretization is assumed. </center>   
</div>
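
The first approach in the list above (a distributed for loop with a reduction) can be sketched as follows. The sketch assumes a uniform mesh, linear elements and the trapezoidal rule for the load. The names `element_contribution`, `add_contributions` and `assemble_distributed` are hypothetical. Each loop iteration builds the contribution of one element as a global-size sparse matrix and vector, and the reduction sums them. This keeps the code short but is not efficient, and boundary conditions still have to be applied afterwards.

```julia
using Distributed
@everywhere using SparseArrays

# Hypothetical helper: contribution of element k, which connects nodes k and k + 1
# on a uniform mesh with n nodes and mesh width h.
@everywhere function element_contribution(k, n, h, f)
    # Element stiffness matrix (1/h) * [1 -1; -1 1], scattered to global indices.
    A = sparse([k, k, k + 1, k + 1], [k, k + 1, k, k + 1],
               [1.0, -1.0, -1.0, 1.0] ./ h, n, n)
    # Element load vector by the trapezoidal rule.
    xleft = (k - 1) * h
    b = sparsevec([k, k + 1], (h / 2) .* [f(xleft), f(xleft + h)], n)
    return (A, b)
end

# Reduction operator: add the matrices and the vectors of two contributions.
@everywhere add_contributions(s, t) = (s[1] + t[1], s[2] + t[2])

# Assemble the global matrix and right-hand side on the unit interval.
# Without added workers the loop simply runs on the master process.
function assemble_distributed(nelements, f)
    n = nelements + 1
    h = 1 / nelements
    A, b = @distributed (add_contributions) for k in 1:nelements
        element_contribution(k, n, h, f)
    end
    return A, Vector(b)
end
```

A possible use, assuming homogeneous Dirichlet conditions at both end points, is `A, b = assemble_distributed(8, x -> 1.0)` followed by `u = A[2:end-1, 2:end-1] \ b[2:end-1]` for the interior nodes. Workers added with `addprocs` must exist before the `@everywhere` lines are run.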

 

## Section 3: Parallel Linear System Solve 
- <b> first approach </b>: use backslash (nothing to be done); 
- using the sequential preconditioned [conjugate gradient method](https://en.wikipedia.org/wiki/Conjugate_gradient_method) from [IterativeSolvers.jl](https://github.com/JuliaLinearAlgebra/IterativeSolvers.jl);
- <b> second approach </b>: use preconditioned conjugate gradient method; using parallel BLAS1 and BLAS2 functions; using [sparse-matrix multiplication](https://github.com/JuliaInv/ParSpMatVec.jl);
- use PCG with proper [overlap of computation and communication](https://netlib.org/linalg/html_templates/node107.html#SECTION00941100000000000000);  
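
To see which operations of the preconditioned conjugate gradient method need a parallel counterpart, it helps to write the iteration out. The sketch below is a sequential version with a Jacobi (diagonal) preconditioner. The preconditioner choice and the function name `jacobi_pcg` are assumptions made for illustration, and this is not the implementation found in IterativeSolvers.jl. The comments mark where a distributed version would have to communicate.

```julia
using LinearAlgebra
using SparseArrays

# Hypothetical helper: conjugate gradients with a Jacobi preconditioner
# for a symmetric positive definite matrix A, starting from x = 0.
function jacobi_pcg(A, b; tol = 1e-10, maxiter = 10 * length(b))
    x = zeros(length(b))
    bnorm = norm(b)                     # global reduction
    bnorm == 0 && return x, 0
    dinv = 1.0 ./ Vector(diag(A))       # local: each part inverts its own diagonal
    r = copy(b)                         # residual b - A*x for x = 0
    z = dinv .* r                       # local: apply the preconditioner
    p = copy(z)
    rz = dot(r, z)                      # BLAS-1 inner product: global reduction
    for iter in 1:maxiter
        Ap = A * p                      # BLAS-2 product: neighbour communication
        alpha = rz / dot(p, Ap)         # global reduction
        x .+= alpha .* p                # local vector update
        r .-= alpha .* Ap               # local vector update
        norm(r) <= tol * bnorm && return x, iter   # global reduction
        z .= dinv .* r
        rz_new = dot(r, z)              # global reduction
        p .= z .+ (rz_new / rz) .* p    # local vector update
        rz = rz_new
    end
    return x, maxiter
end
```

Only the inner products, the norms and the matrix-vector product touch data of more than one processor. The vector updates and the diagonal preconditioner act on locally owned entries, which is why parallel BLAS-1 and BLAS-2 routines are the building blocks of the second approach.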

### Parallel BLAS-1 Operations 
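
A typical BLAS-1 operation that requires communication is the inner product: each part computes the inner product of its own block, and the partial results are summed. The sketch below mimics this with `pmap` from the Distributed standard library. The function name `distributed_dot` and the keyword `nparts` are hypothetical, and the blocks are cut from vectors that live on the calling process, whereas a genuinely distributed code would keep each block on its owner.

```julia
using Distributed
@everywhere using LinearAlgebra

# Hypothetical helper: inner product of x and y computed block by block.
function distributed_dot(x::AbstractVector, y::AbstractVector; nparts::Int = nworkers())
    length(x) == length(y) || throw(DimensionMismatch("x and y must have equal length"))
    isempty(x) && return zero(eltype(x)) * zero(eltype(y))
    # Contiguous index blocks, at most nparts of them.
    blocks = collect(Iterators.partition(1:length(x), cld(length(x), nparts)))
    # Each task receives only its own slices and returns a single scalar.
    partials = pmap(dot, [x[r] for r in blocks], [y[r] for r in blocks])
    # Global reduction of the local contributions.
    return sum(partials)
end
```

Because the partial sums are added in a different order than in a sequential loop, the result can differ from `dot(x, y)` by rounding errors.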

### Parallel BLAS-2 Operations 
See e.g. [Parallel Linear Algebra by Desprez](https://fdesprez.github.io/teaching/par-comput/lectures/slides/L8-AlLinPar-2p.pdf). 
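
For the matrix-vector product, a common decomposition is by blocks of rows: each part owns a block of rows of the matrix and computes the corresponding block of the result. The sketch below illustrates this with `pmap`. The name `distributed_matvec` and the keyword `nparts` are hypothetical. For simplicity every task receives the full vector `x`, whereas for the tridiagonal matrix of the one-dimensional problem only the entries of the neighbouring nodes would have to be exchanged.

```julia
using Distributed
@everywhere using SparseArrays

# Hypothetical helper: y = A * x computed by blocks of rows.
function distributed_matvec(A::SparseMatrixCSC, x::AbstractVector; nparts::Int = nworkers())
    size(A, 2) == length(x) || throw(DimensionMismatch("size of A and x do not match"))
    nrows = size(A, 1)
    # Contiguous row blocks, at most nparts of them.
    blocks = collect(Iterators.partition(1:nrows, cld(nrows, nparts)))
    # Each task multiplies its own row block with x.
    yblocks = pmap(Ablock -> Ablock * x, [A[r, :] for r in blocks])
    # Concatenate the local results into the global vector.
    return reduce(vcat, yblocks)
end
```

A row-wise decomposition needs no reduction over the result, since each entry of `y` is computed by exactly one part. The communication sits in gathering the required entries of `x`.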

## Section 4: Postprocessing 
Visualize the computed solution. 