# Noisy and Repellant

Your task in this assignment is to parallelize a serial application using OpenMP, and to use profiling tools to assess different approaches to that parallelization.

This assignment is intended for the CPU-only nodes, where we can get a lot of CPU thread concurrency.

```bash
module use $CSE6230_DIR/modulefiles
module load cse6230
```



Last time, we modelled non-interacting particles as an example of a streaming kernel.  This time, we are going to advance things a bit and have particles that interact with each other and with the "media" in which they move.

Both gravitational and electrical forces are often modeled as deriving from a simple *potential*: the potential between two particles $p_1$ and $p_2$ is a function of their distance:

$$ \varphi(p_1, p_2) = k \frac{e_1 e_2}{\|r_1 - r_2\|}, $$

where $r_1$ and $r_2$ are their positions, $e_1$ and $e_2$ are their charges, and $k$ is a scaling factor.  In this assignment we will assume $e_i = 1$ for all particles.  The force acting on $p_1$ due to the potential is the negative of its gradient with respect to $r_1$,

$$ F(p_1, p_2) = -\nabla_{r_1} \varphi(p_1, p_2). $$
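Carrying out the gradient explicitly may help when reading the code later. This is a worked step that uses only the definitions above, not a formula quoted from the assignment source:

$$ F(p_1, p_2) = -\nabla_{r_1} \left( k \frac{e_1 e_2}{\|r_1 - r_2\|} \right) = k e_1 e_2 \frac{r_1 - r_2}{\|r_1 - r_2\|^3}. $$

With $k > 0$ the force on $p_1$ points along $r_1 - r_2$, away from $p_2$ (repulsion); with $k < 0$ it points toward $p_2$ (attraction).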

The whole equal-and-opposite thing in physics implies that a force with the same magnitude and opposite direction acts on $p_2$: $F(p_2, p_1) = - F(p_1, p_2)$.  In this toy problem, all particle masses are equal to 1, so the acceleration due to a force is equal to the force.  The *total acceleration* experienced by a particle is the sum of the forces from all other particles:

$$ \partial v_i / \partial t = \sum_{j \neq i} F(p_i, p_j).$$

Now we see what makes this different from the streaming kernel we studied last week: updating one particle involves contributions from all other particles, so each time step requires $O(N_p^2)$ interactions!
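To make the pairwise structure concrete, here is a minimal Python sketch of the acceleration sum. It is an illustration, not the assignment's C code: the function name `accelerations`, the choice of three dimensions, and the list-of-tuples layout are all assumptions made for this example. It uses the equal-and-opposite property so that each pair is visited once.

```python
import math


def accelerations(positions, k=1.0):
    """Return the acceleration of each particle from all pairwise forces.

    positions: list of (x, y, z) tuples (3D is an assumption of this sketch).
    Unit charges and unit masses are assumed, as in the text.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Displacement r_i - r_j and its length.
            d = [positions[i][c] - positions[j][c] for c in range(3)]
            dist = math.sqrt(sum(x * x for x in d))
            # F(p_i, p_j) = k (r_i - r_j) / |r_i - r_j|^3
            scale = k / dist**3
            for c in range(3):
                acc[i][c] += scale * d[c]  # force on particle i
                acc[j][c] -= scale * d[c]  # equal and opposite force on j
    return acc
```

The double loop visits $N_p (N_p - 1) / 2$ pairs, which is where the quadratic cost comes from. Note that the inner loop writes to both `acc[i]` and `acc[j]`, which is the kind of shared update that needs care once the outer loop is split across threads.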

Particles affected only by potentials are a good model for particles moving in a vacuum, but sometimes we want to model particles moving in a medium, where they collide frequently and randomly with other particles.  This type of motion is called [Brownian motion](https://en.wikipedia.org/wiki/Brownian_motion). We'll skip a bunch of statistical physics and jump to the conclusion: whereas numerical time-stepping of a classical force often takes the form of an update like

$$ x_{i,t+1} = x_{i,t} + f \Delta{t}, $$

Brownian motion looks like,

$$ x_{i,t+1} = x_{i,t} + \sqrt{2 d \Delta{t}}z, $$

where $d$ is the diffusion coefficient and $z$ is a realization of a standard normal random variable.
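The Brownian update above can be sketched in a few lines of Python. This is an illustration under stated assumptions: $z$ is drawn independently per coordinate from a standard normal distribution, and the function name `brownian_step` and its arguments are hypothetical, not taken from the assignment code.

```python
import math
import random


def brownian_step(x, d, dt, rng):
    """Advance each coordinate in x by one Brownian step of length dt.

    x:   list of coordinates
    d:   diffusion coefficient
    dt:  time step
    rng: a random.Random instance (assumed source of standard normal draws)
    """
    sigma = math.sqrt(2.0 * d * dt)
    return [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]


# Usage: one step for three coordinates with a seeded generator.
rng = random.Random(0)
new_x = brownian_step([0.0, 0.0, 0.0], d=1.0, dt=1.0e-5, rng=rng)
```

The displacement scales with $\sqrt{\Delta t}$ rather than $\Delta t$, so halving the step size reduces the typical noise displacement by only a factor of $\sqrt{2}$.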

If we have charged particles moving in a medium, then both potential and noisy contributions affect the motion of the particle.  Our program for this assignment includes both!  That makes this program more complicated to model and to optimize: we cannot reduce the performance down to the behavior of one kernel, but must try to evaluate when each kernel is the bottleneck.

Before you start with the actual assignment, let me show you what all of this looks like.
We run the program with `make runcloud`, like last time, and many of the variables that define the behavior of the target are the same: `NP` is the number of particles, `DT` is the step size, and `NT` is the number of steps. `K` is the potential coefficient $k$: negative values cause particles to attract and positive values cause particles to repel.  `D` is the diffusion coefficient of Brownian motion.

We can make this example look most like the last assignment by turning off Brownian motion and choosing negative $k$:

```bash
make runcloud NP=32 DT=1.e-5 NT=400 D=0. K=-1.
```

```text
make clean
make[1]: Entering directory `/nv/coc-ice/tisaac3/srv/rep/cse6230/assignments/interacting-particles'
rm -f *.o cloud cloud2
make[1]: Leaving directory `/nv/coc-ice/tisaac3/srv/rep/cse6230/assignments/interacting-particles'
make cloud
make[1]: Entering directory `/nv/coc-ice/tisaac3/srv/rep/cse6230/assignments/interacting-particles'
icc -std=c99 -g -Wall -fPIC -O3 -xHost -qopt-report=5 -I../../utils -I../../utils/tictoc   -c -o verlet.o verlet.c
icc: remark #10397: optimization reports are generated in *.optrpt files in the output location
icc -std=c99 -g -Wall -fPIC -O3 -xHost -qopt-report=5 -I../../utils -I../../utils/tictoc   -c -o verlet_stream_and_noise.o verlet_stream_and_noise.c
icc: remark #10397: optimization reports are generated in *.optrpt files in the output location
icc -std=c99 -g -Wall -fPIC -O3 -xHost -qopt-report=5 -I../../utils -I../../utils/tictoc   -c -o verlet_accelerate.o verlet_accelerate.c
icc: remark #10397: optimization reports are generated in *.opt
```

You'll notice the output describes the "Hamiltonian" of the system.  This is like the total energy, and is theoretically conserved.  When I ran the above, it was conserved to 5 decimal places.  But these $n$-body systems are chaotic, and in particular they are unstable when particles attract each other.  If we run the same program for just twice as long:

```bash
make runcloud NP=32 DT=1.e-5 NT=800 D=0. K=-1.
```


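For reference, the Hamiltonian mentioned above can be sketched as kinetic plus potential energy. The program's exact definition is not shown here, so this is an assumption: the standard form for unit masses and unit charges, with the pair potential $k / \|r_i - r_j\|$ from earlier. The function name `hamiltonian` and the data layout are hypothetical.

```python
import math


def hamiltonian(positions, velocities, k=1.0):
    """Total energy: sum of |v_i|^2 / 2 plus the sum over pairs of k / |r_i - r_j|.

    Assumes unit masses and unit charges, as in the text.
    positions, velocities: lists of equal-length coordinate tuples.
    """
    kinetic = 0.5 * sum(c * c for v in velocities for c in v)
    potential = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            potential += k / math.dist(positions[i], positions[j])
    return kinetic + potential
```

With negative $k$ the potential term becomes very large in magnitude as two particles approach each other, so a close encounter resolved with too large a time step is one plausible way for the computed value to drift.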
For this assignment, I made a tool for us to visualize our simulations.  We can generate videos of our simulations, and we can embed them in this notebook.  We do this with the `make vizcloud` target, which takes the same arguments as `make runcloud`, plus `CHUNK` (the number of time steps between frames of the video) and `VIZNAME` (the basename of the output video).  Like so:

```bash
make vizcloud NP=32 DT=1.e-5 NT=800 D=0. K=-1. CHUNK=10 VIZNAME="attract"
```

