# Monte Carlo integration

<a href="http://www.physics.adelaide.edu.au/cssm/lattice/" target="_blank"><img src="https://raw.githubusercontent.com/wlough/CU-Phys2600-Fall2025/main/lectures/img/baryon-potential-lattice.gif" /></a>

## PHYS 2600: Scientific Computing

## Lecture 21

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

Previously, we introduced the __Monte Carlo method__: using random inputs to a deterministic process (usually a complicated one!), and then obtaining useful information from the distribution of outputs:

$$
\langle f(x) \rangle = \sum_i f(x_i) p(x_i) \\
\approx \frac{1}{N} \sum_i f(x_i) \ \ \ \ {\rm if}\ p(x_i)\ {\rm constant}
$$
(The second line applies when $p(x_i)$ is constant.)  Here $f$ represents the deterministic process, and we approximate the result with a sample of size $N$.  The _central limit theorem_ tells us that this converges as $N$ increases, and that the expected uncertainty (the standard error) scales as $\sigma_{\rm SEM} \propto 1/\sqrt{N}$. We saw this in tutorial 20:

In [None]:
def random_means(n, T):  # T trials of sampling n random numbers on (0,1) and averaging
    mean_array = np.zeros(T)
    for i in range(T):
        mean_array[i] = np.mean(np.random.rand(n))
    return mean_array


plt.hist(random_means(500, 1000))
plt.hist(random_means(2000, 1000))
plt.hist(random_means(8000, 1000))
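
To check the $1/\sqrt{N}$ scaling with numbers rather than by eye, we can compare the spread of the sample means against the prediction.  This is a minimal sketch: the helper name `spread_of_means` and the seeded generator from `np.random.default_rng` are choices made here, not part of the tutorial code.  It uses the fact that a single uniform draw on $(0,1)$ has standard deviation $1/\sqrt{12}$, so the mean of $n$ draws should have standard error $1/\sqrt{12n}$.

```python
import numpy as np

rng = np.random.default_rng(2600)  # seed chosen arbitrarily, for reproducibility


def spread_of_means(n, T):
    """Standard deviation of T sample means, each from n uniform draws on [0, 1)."""
    means = rng.random((T, n)).mean(axis=1)
    return means.std(ddof=1)


sample_sizes = [500, 2000, 8000]
spreads = [spread_of_means(n, 1000) for n in sample_sizes]

for n, s in zip(sample_sizes, spreads):
    predicted = 1 / np.sqrt(12 * n)  # sigma / sqrt(n), with sigma = 1/sqrt(12)
    print(f"n = {n:5d}: measured spread = {s:.5f}, predicted = {predicted:.5f}")
```

Each factor of 4 in $n$ should cut the spread roughly in half.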


Random processes like radioactive decay are a natural application of Monte Carlo.  Another common use is for systems which are too complex to predict in detail, so they can be treated as random (modeling of financial markets, for instance).

But there are a few clever ways to exploit Monte Carlo even for _problems that had no randomness to begin with._  The most important example is __Monte Carlo integration__.

## Monte Carlo integration

Every integral has a _geometric_ interpretation - the first example we all encounter is that the integral

$$
\int_a^b dx\ f(x)
$$

<img src="https://raw.githubusercontent.com/wlough/CU-Phys2600-Fall2025/main/lectures/img/mcint-sketch-1.png" width=400px style="float:right;"/>

is equal to the area under the curve $f(x)$ between $a \leq x \leq b$. 

Geometry is easy to exploit in terms of probability!  Suppose we print out a picture of $f(x)$ on a piece of paper with width $(b-a)$ and height $h$.  Then if we __throw darts__ at the picture which land randomly, the probability of a dart landing _below the curve_ is

$$
p({\rm below}) = \frac{\textrm{area under } f(x)}{\textrm{area of plot}} = \frac{1}{h(b-a)} \int_a^b dx\ f(x).
$$

This gives us a way to calculate the integral using a random process!



Of course, doing the actual experiment would be time-consuming - much easier to do this on a computer with random numbers!  We draw random points $(x_i, y_i)$ uniformly in the region $a \leq x \leq b$ and $y_0 \leq y \leq y_0 + h$, and count how many satisfy $y_i < f(x_i)$.  Taking $y_0 = 0$ (with $f(x) \geq 0$), we apply our formula:  

$$
\int_a^b dx\ f(x) = A \times p({\rm below}) = A \frac{N({\rm below})}{N({\rm total})},
$$
where $A = h(b-a)$ is the area of our sampling region.  (If $y_0 \neq 0$, the same count gives the area between the line $y = y_0$ and the curve, so we add $y_0 (b-a)$ to recover the integral.)

<img src="https://raw.githubusercontent.com/wlough/CU-Phys2600-Fall2025/main/lectures/img/mcint-sketch-2.png" width=600px />

Obviously, the sampling region has to fully contain $f(x)$ - i.e. we'll get the wrong answer if we cut the function off.
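
Here is a minimal sketch of the dart-throwing recipe in code.  The function name `hit_or_miss`, its arguments, and the test integrand $\sin(x)$ on $[0, \pi]$ are illustrative choices, and the sketch assumes the sampling box starts at $y = 0$ and that $0 \leq f(x) \leq h$ everywhere on $[a, b]$.

```python
import numpy as np


def hit_or_miss(f, a, b, h, n, rng):
    """Estimate the integral of f on [a, b], assuming 0 <= f(x) <= h there."""
    x = rng.uniform(a, b, n)          # random x positions of the darts
    y = rng.uniform(0.0, h, n)        # random y positions of the darts
    n_below = np.count_nonzero(y < f(x))
    area = h * (b - a)                # area of the sampling region
    return area * n_below / n


rng = np.random.default_rng(21)       # arbitrary seed, for reproducibility
estimate = hit_or_miss(np.sin, 0.0, np.pi, 1.0, 100_000, rng)
print(f"hit-or-miss estimate: {estimate:.4f} (exact value: 2)")
```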

The "randomly sampling an area" version of this (aka "__hit-or-miss Monte Carlo__") is intuitive, but it's really a special case of a more general trick.  

If we start with an integral of the form
$$
I = \int d\mathbf{x}\ f(\mathbf{x}),
$$
we can multiply and divide by some _probability density function $p(\mathbf{x})$_, which must be non-zero for any $\mathbf{x}$ we're integrating over:

$$
I = \int d\mathbf{x}\ p(\mathbf{x}) \frac{f(\mathbf{x})}{p(\mathbf{x})} = \left\langle \frac{f(\mathbf{x})}{p(\mathbf{x})} \right\rangle_p
$$

In the last equality, we're noticing that $I$ is just the __expectation value__ $\langle f(\mathbf{x}) / p(\mathbf{x}) \rangle$ with respect to $p(\mathbf{x})$.  

The last step is to draw random numbers!  If we use the RNG to draw $N$ points $\mathbf{x}_i$ randomly from $p(\mathbf{x})$, then the expectation value is just an average:
$$
I \approx \frac{1}{N} \sum_i \frac{f(\mathbf{x}_i)}{p(\mathbf{x}_i)}.
$$
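
As a sketch of this formula with a non-uniform $p$, consider $I = \int_0^\infty dx\ x^2 e^{-x} = 2$.  The choice $p(x) = e^{-x}$ (the exponential distribution) is an assumption made for this example; it is a valid density on $[0, \infty)$ and NumPy can draw from it directly.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed, for reproducibility


def f(x):
    """Integrand: x^2 exp(-x) on [0, infinity)."""
    return x**2 * np.exp(-x)


def p(x):
    """Exponential probability density with unit scale, non-zero on [0, infinity)."""
    return np.exp(-x)


n = 200_000
x = rng.exponential(scale=1.0, size=n)  # draw the x_i from p(x)
estimate = np.mean(f(x) / p(x))         # average of f/p over the samples

print(f"estimate = {estimate:.3f} (exact value: 2)")
```

Note that the samples never need to cover an infinite range "by hand": drawing from $p$ takes care of where the points land.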

We have a lot of freedom to choose $p(\mathbf{x})$, but the most common choice is the __uniform__ distribution: if our integration volume is $V$, then $p(\mathbf{x}) = 1/V$, and we have

$$
I = V \langle f(\mathbf{x}) \rangle_p \approx \frac{V}{N} \sum_i f(\mathbf{x}_i)
$$

where the last step approximates the expectation value using $N$ Monte Carlo samples.  

Thanks to the central limit theorem, we even know what the error is on our integral estimate!  It can be proven that

$$
I \approx V \langle f(\mathbf{x}) \rangle \pm V \sqrt{\frac{\langle f(\mathbf{x})^2 \rangle - \langle f(\mathbf{x}) \rangle^2}{N}} 
$$

so the error on $I$ goes as $1/\sqrt{N}$.
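
A minimal sketch of the uniform-sampling estimate together with its error bar, in one dimension where $V = b - a$.  The function name `mc_integrate` and the test integrand are illustrative choices, not a fixed interface.

```python
import numpy as np


def mc_integrate(f, a, b, n, rng):
    """Return (estimate, error) for the integral of f on [a, b] using n uniform samples."""
    x = rng.uniform(a, b, n)
    fx = f(x)
    volume = b - a
    mean = fx.mean()                         # <f>
    variance = (fx**2).mean() - mean**2      # <f^2> - <f>^2
    return volume * mean, volume * np.sqrt(variance / n)


rng = np.random.default_rng(105)             # arbitrary seed, for reproducibility
estimate, error = mc_integrate(np.sin, 0.0, np.pi, 100_000, rng)
print(f"I = {estimate:.4f} +/- {error:.4f} (exact value: 2)")
```

The quoted error is a one-sigma statistical uncertainty, so the estimate should land within it of the true value only about two times out of three.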

Here's a sketch showing how the more direct Monte Carlo integration algorithm works: we choose points $\{x_i\}$ randomly, evaluate $f(x_i)$, and then average.

<img src="https://raw.githubusercontent.com/wlough/CU-Phys2600-Fall2025/main/lectures/img/mcint-sketch-mean-value.png" width=600px />

We can think of this as an application of the __mean-value theorem__ from calculus: in 1-d, there exists some $c$ such that

$$
\frac{1}{b-a} \int_a^b dx\ f(x) = f(c).
$$

In the direct case, we use random numbers to estimate the "average value" $f(c)$.  This is clearly more efficient than hit-or-miss, because we're picking points in one less dimension - just $x$ instead of $(x,y)$.
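
One way to check the efficiency claim is to repeat both estimates many times and compare how much they scatter.  This sketch assumes the integrand $\sin(x)$ on $[0, \pi]$ with a sampling box of height $h = 1$; the variable names and the number of trials are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(117)  # arbitrary seed, for reproducibility
a, b, h = 0.0, np.pi, 1.0
n, trials = 1000, 200

hit_or_miss = np.empty(trials)
mean_value = np.empty(trials)
for t in range(trials):
    x = rng.uniform(a, b, n)
    y = rng.uniform(0.0, h, n)
    # Hit-or-miss: area of the box times the fraction of points below the curve
    hit_or_miss[t] = h * (b - a) * np.mean(y < np.sin(x))
    # Mean-value: width of the interval times the average of f
    mean_value[t] = (b - a) * np.mean(np.sin(x))

print(f"hit-or-miss: mean = {hit_or_miss.mean():.4f}, spread = {hit_or_miss.std(ddof=1):.4f}")
print(f"mean-value:  mean = {mean_value.mean():.4f}, spread = {mean_value.std(ddof=1):.4f}")
```

Both methods average to the same answer, but for the same number of samples the mean-value estimates should scatter less.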

## The curse of dimensionality

Although our sketches have all been one-dimensional, Monte Carlo integration is easy to use in arbitrary numbers of dimensions.  In fact, if the dimension $d$ is large enough, __random sampling almost always beats grid-based integration!__

Monte Carlo wins because of how the error scales.  Suppose we are doing an integral in $d=10$, and we need to make the error smaller by a factor of 2.  For grid-based integration (assuming the error is proportional to the grid spacing), this means we have to reduce the grid spacing, $dx \rightarrow dx/2$. 

But in ten dimensions, that means the number of points in our grid becomes:

$$
N_{\rm grid}' = \left(\frac{dx}{dx'}\right)^{10} N_{\rm grid} = 2^{10} N_{\rm grid} = 1024 N_{\rm grid}.
$$

So we have to use about __1,000 times more points__ to reduce the error by a factor of 2.  But for Monte Carlo integration, the error $\sigma \sim 1/\sqrt{N}$, so we just need __4 times more samples__ to reduce the error by a factor of 2 - much better!

This analysis is very crude: for example, we might do better with a _non-uniform_ grid which just uses many points in areas where the integrand changes sharply.  But in practice, Monte Carlo usually starts to beat grid integration around $d = 8$ or so.
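
The counting argument can be written out as a short calculation.  This is a rough sketch: the function names are made up for illustration, and the `order` parameter is an added assumption that the grid error scales as $dx^{\rm order}$.  With `order=1` it reproduces the crude estimate; `order=4` mimics a Simpson-type rule, for which the grid cost only overtakes the Monte Carlo cost once $d > 8$, consistent with the rule of thumb.

```python
def grid_cost_factor(error_reduction, d, order=1):
    """Factor by which the number of grid points grows.

    Assumes the grid error scales as dx**order, so dx must shrink by
    error_reduction**(1/order) along each of the d axes.
    """
    return error_reduction ** (d / order)


def monte_carlo_cost_factor(error_reduction):
    """Factor by which the number of samples grows, using error ~ 1/sqrt(N)."""
    return error_reduction ** 2


for d in (1, 3, 10):
    print(
        f"d = {d:2d}: grid x{grid_cost_factor(2, d):.0f}, "
        f"4th-order grid x{grid_cost_factor(2, d, order=4):.2f}, "
        f"Monte Carlo x{monte_carlo_cost_factor(2)}"
    )
```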

Monte Carlo integration does not save us completely from an effect known as the __curse of dimensionality__: numerical integrals are much harder in high dimensions.

For Monte Carlo, the problem which appears as $d$ becomes very large is that the integration volume can become very small compared to the sampling volume.  (In 3 dimensions, a sphere fills about 52\% of the cube that just encloses it ($V_s/V_c = \pi/6 \approx 0.52$).  In 10 dimensions, the sphere fills only about 0.25\% of the cube!)  This leads to __poor sampling__:

<img src="https://raw.githubusercontent.com/wlough/CU-Phys2600-Fall2025/main/lectures/img/mcint-sketch-3.png" width=400px />
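
To see how quickly the sphere shrinks inside its cube, we can compare the exact volume ratio with a direct sampling estimate.  This sketch uses the standard formula $V_d = \pi^{d/2} / \Gamma(d/2 + 1)$ for the volume of the unit ball; the function names and sample count are illustrative choices.

```python
import math
import numpy as np


def exact_fraction(d):
    """Volume of the unit d-ball divided by the volume of the cube [-1, 1]^d."""
    ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return ball / 2**d


def sampled_fraction(d, n, rng):
    """Fraction of n uniform points in [-1, 1]^d that land inside the unit ball."""
    points = rng.uniform(-1.0, 1.0, size=(n, d))
    return np.mean(np.sum(points**2, axis=1) <= 1.0)


rng = np.random.default_rng(10)  # arbitrary seed, for reproducibility
for d in (2, 3, 10):
    exact = exact_fraction(d)
    sampled = sampled_fraction(d, 200_000, rng)
    print(f"d = {d:2d}: exact = {exact:.5f}, sampled = {sampled:.5f}")
```

In $d = 10$ only a few hundred of the 200,000 points land inside the ball, so almost all of the sampling effort is wasted.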

The same problem occurs with sharply-peaked functions in lower dimensions.  Drawing uniformly from a region where $f(x)$ is mostly zero results in very bad estimates for $I$.

In cases like this, we need to pick a better $p(x)$.  If we have a good idea of the shape of the function, we can do this by hand - otherwise, the technique of __importance sampling__ can algorithmically adjust our random draws to better match a particular function.
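
Here is a sketch of picking a better $p(x)$ by hand for a sharply-peaked integrand.  Everything specific in it is an assumption made for illustration: the integrand is a narrow Gaussian bump of width $s = 0.01$ on $[-1, 1]$, and the hand-picked $p(x)$ is a normal distribution twice as wide as the bump.  Samples that fall outside $[-1, 1]$ simply contribute zero.

```python
import numpy as np

rng = np.random.default_rng(143)   # arbitrary seed, for reproducibility
s = 0.01                           # width of the peak in f
exact = s * np.sqrt(2 * np.pi)     # tails beyond |x| = 1 are negligible
n = 2000


def f(x):
    """Sharply-peaked integrand on [-1, 1]."""
    return np.exp(-x**2 / (2 * s**2))


# Uniform sampling on [-1, 1]: p(x) = 1/V with V = 2
x_u = rng.uniform(-1.0, 1.0, n)
uniform_est = 2.0 * f(x_u).mean()
uniform_err = 2.0 * f(x_u).std() / np.sqrt(n)

# Hand-picked p(x): normal distribution centered on the peak
sigma_p = 2 * s
x_p = rng.normal(0.0, sigma_p, n)
p = np.exp(-x_p**2 / (2 * sigma_p**2)) / (sigma_p * np.sqrt(2 * np.pi))
inside = np.abs(x_p) <= 1.0        # f counts as zero outside the integration range
weights = np.where(inside, f(x_p) / p, 0.0)
importance_est = weights.mean()
importance_err = weights.std() / np.sqrt(n)

print(f"exact:            {exact:.5f}")
print(f"uniform p(x):     {uniform_est:.5f} +/- {uniform_err:.5f}")
print(f"hand-picked p(x): {importance_est:.5f} +/- {importance_err:.5f}")
```

With the same number of samples, the error bar from the hand-picked $p(x)$ should come out much smaller, because nearly every draw lands where $f(x)$ is non-zero.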


There are two common importance-sampling algorithms (and Python modules) you should be aware of:

1. __Adaptive sampling__ does the integral repeatedly, looking for where the contributions are large and moving the random points around accordingly.  The `vegas` module ([official documentation](https://vegas.readthedocs.io/en/latest/)) is the gold standard of adaptive sampling.
2. __Markov-chain Monte Carlo__ uses a fancy random-number generation algorithm called a _Markov chain_ to automatically attempt to draw random samples where the function to be integrated is sharply peaked.  The `emcee` module ([official documentation](https://emcee.readthedocs.io/)) provides a robust MCMC algorithm designed for difficult functions.

We won't cover `vegas` or `emcee` in this class; they are power tools that are complicated to use and understand.  For simple problems, it's better to just write your own Monte Carlo integrator, as we'll do in the tutorial.  If you encounter an integral you can't do that way, _then_ it's time to learn one of those modules.

## Tutorial 21

Let's do some integrals!