# Parallel Computing

Modern scientific computing tasks often involve massive datasets and computationally expensive algorithms.
Problems like large-scale simulations, statistical sampling (e.g., Monte Carlo methods), and real-time data processing demand a level of performance that cannot be achieved with sequential execution alone.

Parallel computing is the practice of dividing a problem into smaller subproblems that can be solved simultaneously.
With the rise of multicore processors, distributed systems, and GPUs, parallel computing is now essential for high-performance computing (HPC).

This lecture introduces key ideas, theoretical limits, and practical tools for parallel computation.

## Theoretical Foundations

Before we explore specific tools and implementations, it's important to understand the theoretical limits of parallel computing.
These foundational concepts help us answer questions like:
* What is the maximum possible speedup if we parallelize a task?
* Where should we invest our effort to gain performance?
* Why do some problems benefit more from parallelization than others?

In addition, scaling analyses provide practical ways to assess real-world performance of parallel codes.

### Amdahl's Law

Let $f$ be the fraction of a program that must be executed sequentially.
The maximum speedup $S$ obtainable with $P$ processors is:
\begin{align}
  S(P) = \frac{1}{f + (1-f)/P}.
\end{align}
As $P \to \infty$, $S \to 1/f$.

**Implication:** Even small sequential portions limit total speedup.

Amdahl's law illustrates that optimizing the serial part of a program can be more impactful than parallelizing the rest.
For example, if 5% of the computation is inherently sequential ($f = 0.05$), the speedup can never exceed $1/f = 20\times$, no matter how many processors are used.
This is especially important when designing algorithms for leadership-class HPC systems (e.g., DOE's Frontier).
Amdahl's law corresponds to "strong scaling" tests, in which the total problem size is held fixed while the number of processors increases.
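The saturation predicted by Amdahl's law is easy to check numerically. The following is a minimal sketch that evaluates the formula above for $f = 0.05$; the function name `amdahl_speedup` and the chosen processor counts are illustrative assumptions, not part of any library.

```python
def amdahl_speedup(f, P):
    """Amdahl's law: speedup on P processors when a fraction f is sequential."""
    return 1.0 / (f + (1.0 - f) / P)

f = 0.05  # 5% of the work is inherently sequential (illustrative value)

for P in (1, 8, 64, 1024, 1_000_000):
    S = amdahl_speedup(f, P)
    # Parallel efficiency S/P shows how much of each processor is used.
    print(f"P = {P:>7d}   speedup = {S:6.2f}   efficiency = {S / P:8.4f}")

# The upper bound as P -> infinity is 1/f.
print("limit:", 1.0 / f)
```

The speedup approaches but never reaches $1/f = 20$, while the efficiency $S/P$ drops toward zero as processors are added.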

### Gustafson's Law

Gustafson's law recognizes that, to fully utilize computing resources, the problem size often needs to scale with the number of processors:
\begin{align}
  S(P) = P - f(P - 1).
\end{align}
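Here, $f$ is the sequential fraction of the run time measured on the $P$-processor system.
The parallel part, $1-f$, would take $P$ times longer on a single processor, so the scaled problem would need $f + (1-f)P = P - f(P-1)$ units of time when run serially.
Because the workload is assumed to grow with $P$, this law avoids Amdahl's pessimism.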

**Implication:** In practice, we often scale up problems as we add more resources.
This law gives a more optimistic and realistic view in scientific computing, where we often increase the resolution or domain size with more computing power.
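Gustafson's law corresponds to "weak scaling" tests, in which the problem size per processor is held fixed while the number of processors increases.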
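To see how different the two views are, the sketch below evaluates both formulas for the same sequential fraction. The function names and the value $f = 0.05$ are illustrative assumptions; note also that $f$ is measured differently in the two laws (on the serial run for Amdahl, on the parallel run for Gustafson), so the comparison is qualitative.

```python
def amdahl_speedup(f, P):
    """Fixed problem size (strong scaling)."""
    return 1.0 / (f + (1.0 - f) / P)

def gustafson_speedup(f, P):
    """Problem size grows with P (weak scaling)."""
    return P - f * (P - 1)

f = 0.05  # illustrative sequential fraction

for P in (1, 8, 64, 1024):
    print(f"P = {P:>5d}   Amdahl = {amdahl_speedup(f, P):7.2f}   "
          f"Gustafson = {gustafson_speedup(f, P):8.2f}")
```

The scaled speedup grows linearly in $P$ with slope $1 - f$, whereas the fixed-size speedup saturates at $1/f$.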

### Flynn's Taxonomy

A classification of computer architectures:

* SISD: Single Instruction Single Data (standard CPU)
* SIMD: Single Instruction Multiple Data (vector processors, GPUs)
* MISD: Multiple Instruction Single Data (rare, mostly theoretical)
* MIMD: Multiple Instruction Multiple Data (clusters, multicore CPUs)

Flynn's taxonomy helps us map programming models to the underlying hardware.
For instance, OpenMP typically targets MIMD systems with shared memory, while SIMD models underpin GPUs and vectorized CPU instructions.
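The distinction between SIMD and MIMD also shows up as two programming patterns: applying one operation to many data elements versus running different operations at the same time. The sketch below illustrates only the structure of these patterns using Python's standard-library thread pool; it does not issue hardware vector instructions, and the helper functions `square`, `total`, and `largest` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

def total(xs):
    return sum(xs)

def largest(xs):
    return max(xs)

data = [1, 2, 3, 4]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Data-parallel pattern (SIMD-like): the same operation on every element.
    squares = list(pool.map(square, data))

    # Task-parallel pattern (MIMD-like): different operations run concurrently.
    fut_total = pool.submit(total, data)
    fut_largest = pool.submit(largest, data)
    results = (fut_total.result(), fut_largest.result())

print(squares)   # [1, 4, 9, 16]
print(results)   # (10, 4)
```

On real hardware, the first pattern maps naturally onto vector units and GPUs, while the second needs independent instruction streams such as separate cores or nodes.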