# Normal approximations

<hr />

**a)** Imagine I have a univariate continuous distribution with PDF $f(y)$ that has a maximum at $y^*$. Assume that the first and second derivatives of $f(y)$ are defined and continuous near $y^*$. Show by expanding the log PDF of this distribution in a Taylor series about $y^*$ that the distribution is locally Normal near the maximum.

In performing the Taylor expansion, how is the scale parameter $\sigma$ of the Normal approximation related to the log PDF of the distribution it is approximating?

**b)** Another way you can approximate a distribution as Normal is to use its mean and variance as the parameters of the approximate Normal. We will call this technique "equating moments." Can you do this if the distribution you are approximating has heavy tails, say like a Cauchy distribution? Why or why not?

**c)** Make plots of the PDF and CDF of the following distributions with their Normal approximations as derived from the Taylor series and by equating moments. Do you have any comments about the approximations?

<!-- - Student-t with *µ = 0*, *σ* = 1, and *ν* = 4
- Cauchy with *µ = 0*, *σ* = 1 -->
- Beta with *α* = *β* = 10
- Gamma with *α* = 5 and *β* = 2

**d)** Discrete distributions are also often approximated as Normal. In fact, early studies of the Normal distribution arose from its use as an approximation of the Binomial distribution. Use the method of equating moments to make a plot of the PMF of the Binomial distribution and the PDF of the Normal approximation of the Binomial distribution for:

- Binomial with *N* = 100 and *θ* = 0.1.
- Binomial with *N* = 10 and *θ* = 0.1.

Comment on what you see.

<!--I do not ask for CDF for Binomial because plotting CDFs for discrete distributions means you have to make staircases, which is kind of tricky. -->

## Solution

<hr>

In [1]:
import numpy as np
import scipy.stats as st

import bokeh.io
import bokeh.plotting
import bokeh.layouts

bokeh.io.output_notebook()

**a)** This is described in the [Distribution Explorer](https://distribution-explorer.github.io/continuous/normal.html#related-distributions). Importantly, the location parameter of the approximate Normal is the mode, $\mu = y^*$, and its variance is the negative inverse of the second derivative of the log PDF evaluated at the mode,

\begin{align}
\sigma^2 = \left(-\left.\frac{\mathrm{d}^2\,\ln f(y)}{\mathrm{d}y^2}\right|_{y=y^*}\right)^{-1}.
\end{align}
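
For completeness, the expansion behind this result can be sketched as follows, assuming the second derivative of $\ln f$ at $y^*$ is strictly negative so that $y^*$ is a proper maximum. Expanding to second order about $y^*$,

\begin{align}
\ln f(y) \approx \ln f(y^*) + \left.\frac{\mathrm{d}\,\ln f(y)}{\mathrm{d}y}\right|_{y=y^*}(y - y^*) + \frac{1}{2}\left.\frac{\mathrm{d}^2\,\ln f(y)}{\mathrm{d}y^2}\right|_{y=y^*}(y - y^*)^2.
\end{align}

The first derivative vanishes at the maximum, so with $\sigma^2$ defined as above, exponentiating gives

\begin{align}
f(y) \approx f(y^*)\,\exp\left[-\frac{(y - y^*)^2}{2\sigma^2}\right],
\end{align}

which has the functional form of a Normal PDF with location parameter $\mu = y^*$ and scale parameter $\sigma$.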

**b)** You cannot approximate a heavy-tailed distribution this way because, if the tails are heavy enough, the second, or even the first, moment may not exist. This is the case for the Cauchy distribution, for which neither the mean nor the variance is defined.
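
One way to see the missing second moment numerically is to integrate $y^2 f(y)$ for a standard Cauchy distribution over a finite window $[-L, L]$ and watch the result grow with $L$ instead of settling to a limit. The following is a minimal sketch; the helper name `truncated_second_moment` is made up for this illustration.

```python
import numpy as np
import scipy.integrate
import scipy.stats as st


def truncated_second_moment(L):
    """Integral of y**2 * f(y) over [-L, L] for the standard Cauchy PDF f."""
    # The integrand is even in y, so integrate over [0, L] and double.
    half, _ = scipy.integrate.quad(lambda y: y**2 * st.cauchy.pdf(y), 0, L)
    return 2 * half


# The truncated integral keeps growing as the window widens.
for L in [10, 100, 1000]:
    print(L, truncated_second_moment(L))
```

The integral has the closed form $(2/\pi)(L - \arctan L)$, which grows linearly in $L$, so there is no finite variance to hand to a Normal distribution.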

**c)** The mean and variance of the Beta and Gamma distributions are, referring to the Distribution Explorer:

- **Beta.** mean: $\alpha/(\alpha + \beta)$, variance: $\alpha\beta/\left[(\alpha + \beta)^2(1 + \alpha + \beta)\right]$
- **Gamma.** mean: $\alpha/\beta$, variance: $\alpha/\beta^2$

To do the Taylor expansion, we need the mode of each and additionally the value of the second derivative of the log PDF at the mode. We will start with the Beta distribution.

\begin{align}
\frac{\mathrm{d}\,\ln f(y;\alpha, \beta)}{\mathrm{d}y} = \frac{\mathrm{d}}{\mathrm{d}y}\,\left[(\alpha-1)\ln y + (\beta - 1)\ln (1-y)\right] = \frac{\alpha-1}{y} - \frac{\beta - 1}{1 - y}.
\end{align}

Setting this derivative equal to zero and solving for $y$ to get the $y$ for which the Beta PDF is maximal gives

\begin{align}
y^* = \frac{\alpha - 1}{\alpha + \beta - 2}.
\end{align}

Since we must have $0 \le y^* \le 1$, we assume that $\alpha, \beta > 1$, which is the case for this problem. Computing the second derivative yields

\begin{align}
\frac{\mathrm{d}^2\,\ln f(y;\alpha, \beta)}{\mathrm{d}y^2} = -\frac{\alpha - 1}{y^2} - \frac{\beta-1}{(1-y)^2}.
\end{align}

Negating and inverting this evaluated at $y^*$ gives the variance. So, we have arrived at the mean and variance of our approximate Normal distribution,

\begin{align}
&\mu = \frac{\alpha - 1}{\alpha + \beta - 2},\\[1em]
&\sigma^2 = \left(\frac{\alpha - 1}{\mu^2} + \frac{\beta-1}{(1-\mu)^2}\right)^{-1} = \frac{(\alpha-1)(\beta-1)}{(\alpha + \beta - 2)^3}.
\end{align}
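
As a sanity check on the algebra, the mode and curvature can also be found numerically. The sketch below is one possible way to do it, not part of the original solution: the helper `normal_from_mode`, its step size `h`, and the search bounds are all assumptions chosen for illustration.

```python
import numpy as np
import scipy.optimize
import scipy.stats as st


def normal_from_mode(logpdf, bounds, h=1e-4):
    """Return (mu, sigma) of the Normal approximation about the mode."""
    # Locate the mode by minimizing the negative log PDF on the interval.
    res = scipy.optimize.minimize_scalar(
        lambda y: -logpdf(y), bounds=bounds, method="bounded"
    )
    mu = res.x
    # Central finite-difference estimate of the second derivative at the mode.
    d2 = (logpdf(mu + h) - 2 * logpdf(mu) + logpdf(mu - h)) / h**2
    return mu, 1 / np.sqrt(-d2)


alpha, beta = 10, 10
mu_num, sigma_num = normal_from_mode(
    lambda y: st.beta.logpdf(y, alpha, beta), bounds=(0.01, 0.99)
)

# Closed-form results from the derivation above
mu_exact = (alpha - 1) / (alpha + beta - 2)
sigma_exact = np.sqrt((alpha - 1) * (beta - 1) / (alpha + beta - 2) ** 3)

print(mu_num, mu_exact)
print(sigma_num, sigma_exact)
```

The numerical and closed-form values should agree to within the tolerance of the optimizer and the finite-difference step.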

Let's make a plot!

In [2]:
alpha = 10
beta = 10

mu_taylor = (alpha - 1) / (alpha + beta - 2)
sigma_taylor = 1 / np.sqrt(
    (alpha - 1) / mu_taylor ** 2 + (beta - 1) / (1 - mu_taylor) ** 2
)

mu_moment = alpha / (alpha + beta)
sigma_moment = np.sqrt(alpha * beta / (alpha + beta) ** 2 / (1 + alpha + beta))

y = np.linspace(0, 1, 200)
f = st.beta.pdf(y, alpha, beta)
taylor_f = st.norm.pdf(y, mu_taylor, sigma_taylor)
moment_f = st.norm.pdf(y, mu_moment, sigma_moment)
F = st.beta.cdf(y, alpha, beta)
taylor_F = st.norm.cdf(y, mu_taylor, sigma_taylor)
moment_F = st.norm.cdf(y, mu_moment, sigma_moment)

def plot_curves(y, f, taylor_f, moment_f, F, taylor_F, moment_F, title):
    p_cdf = bokeh.plotting.figure(
        frame_height=200,
        frame_width=300,
        toolbar_location="above",
        x_axis_label="y",
        y_axis_label="F(y)",
        title=title,
        x_range=[y.min(), y.max()],
    )

    p_pdf = bokeh.plotting.figure(
        frame_height=200,
        frame_width=300,
        toolbar_location="above",
        x_axis_label="y",
        y_axis_label="f(y)",
    )
    
    p_pdf.x_range = p_cdf.x_range

    p_pdf.line(y, f, line_width=2, color="#1f77b3")
    p_pdf.line(y, taylor_f, line_width=2, color="#ff7e0e")
    p_pdf.line(y, moment_f, line_width=2, color="#2ba02b")
    
    p_cdf.line(y, F, line_width=2, color="#1f77b3", legend_label='exact')
    p_cdf.line(y, taylor_F, line_width=2, color="#ff7e0e", legend_label='Taylor')
    p_cdf.line(y, moment_F, line_width=2, color="#2ba02b", legend_label='moments')
    p_cdf.legend.location = "bottom_right"

    return bokeh.layouts.gridplot([p_pdf, p_cdf], ncols=2)

bokeh.io.show(plot_curves(y, f, taylor_f, moment_f, F, taylor_F, moment_F, "Beta"))

The Normal approximation, both from the Taylor series and from equating moments, is close to the Beta distribution. In general, Normal approximations work well when a peak in a distribution is symmetric (and when $\alpha$ and $\beta$ are both large, the Beta distribution is approximately Normal anyway).

We can do the same procedure for the Gamma distribution to get the Taylor series approximation. The first and second derivatives of the log PDF are

\begin{align}
&\frac{\mathrm{d}\,\ln f(y;\alpha, \beta)}{\mathrm{d}y} = \frac{\mathrm{d}}{\mathrm{d}y}\,\left[(\alpha-1)\ln y - \beta y\right] = \frac{\alpha-1}{y} - \beta, \\[1em]
&\frac{\mathrm{d}^2\,\ln f(y;\alpha, \beta)}{\mathrm{d}y^2} = -\frac{\alpha - 1}{y^2}.
\end{align}

Setting the first derivative equal to zero gives $y^* = (\alpha-1)/\beta$. Evaluating the second derivative at $y^*$ gives $-\beta^2/(\alpha-1)$, so the variance of the approximate Normal is $\sigma^2 = (\alpha-1)/\beta^2$. We can again make plots.

In [3]:
alpha = 5
beta = 2

mu_taylor = (alpha - 1) / beta
sigma_taylor = np.sqrt(alpha - 1) / beta

mu_moment = alpha / beta
sigma_moment = np.sqrt(alpha / beta ** 2)

y = np.linspace(0, 10, 200)
f = st.gamma.pdf(y, alpha, loc=0, scale=1 / beta)
taylor_f = st.norm.pdf(y, mu_taylor, sigma_taylor)
moment_f = st.norm.pdf(y, mu_moment, sigma_moment)
F = st.gamma.cdf(y, alpha, loc=0, scale=1 / beta)
taylor_F = st.norm.cdf(y, mu_taylor, sigma_taylor)
moment_F = st.norm.cdf(y, mu_moment, sigma_moment)

bokeh.io.show(plot_curves(y, f, taylor_f, moment_f, F, taylor_F, moment_F, "Gamma"))

In this case, the Normal approximation breaks down because of the asymmetry of the peak in the Gamma distribution. The parameters are also such that the peak in the Gamma distribution is close to zero, so the Normal approximations spill over into $y < 0$, where the Gamma distribution has no probability mass.
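
The asymmetry can be quantified with the skewness, which is zero for any Normal distribution. A short sketch using SciPy's built-in moment calculations, where the variable names are just for illustration:

```python
import numpy as np
import scipy.stats as st

# Skewness of Beta(10, 10); the distribution is symmetric about 1/2
skew_beta = float(st.beta.stats(10, 10, moments="s"))

# Skewness of Gamma(alpha=5, beta=2); SciPy uses scale = 1 / beta
skew_gamma = float(st.gamma.stats(5, scale=1 / 2, moments="s"))

# For the Gamma distribution, the skewness is 2 / sqrt(alpha)
print(skew_beta, skew_gamma, 2 / np.sqrt(5))
```

The symmetric Beta distribution has no skewness, whereas the Gamma distribution with $\alpha = 5$ is appreciably right-skewed, which is consistent with the poorer Normal approximation.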

In the CDFs, it looks as though the equating moments procedure gave a better approximation. This can be misleading, though, since the left tails of the CDF hit zero too soon for both of the Normal approximations. To assess the approximations, we can look at the probability mass around the peak, say $1.5 \le y \le 2.5$. The exact value can be calculated for the Gamma distribution.

In [4]:
prob_mass = st.gamma.cdf(2.5, alpha, loc=0, scale=1 / beta) - st.gamma.cdf(
    1.5, alpha, loc=0, scale=1 / beta
)

prob_mass

np.float64(0.3747699594585599)

Let's see how far off the Taylor-approximated Normal is.

In [5]:
prob_mass - (
    st.norm.cdf(2.5, mu_taylor, sigma_taylor)
    - st.norm.cdf(1.5, mu_taylor, sigma_taylor)
)

np.float64(-0.008154963089466338)

This is off by about 0.008, or about 0.008 / 0.375 ≈ 2%. Now let's check the equating moments approximation.

In [6]:
prob_mass - (
    st.norm.cdf(2.5, mu_moment, sigma_moment)
    - st.norm.cdf(1.5, mu_moment, sigma_moment)
)

np.float64(0.06031664421990868)

This error is much more substantial, roughly seven times that of the Taylor approximation (about 0.060 versus 0.008). Note, though, that in making the comparison, we had to be careful to use the CDF to look at the total mass within a range; the entire CDF is shifted substantially to the left in the Taylor approximation. So, neither approximation is particularly good, each being bad in its own way.
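
A complementary summary, offered here only as a sketch, is the largest absolute difference between the exact CDF and each approximate CDF on a grid. This is a different metric from the local probability mass above, and the grid range and resolution are arbitrary choices.

```python
import numpy as np
import scipy.stats as st

alpha, beta = 5, 2

# Normal parameters from the Taylor expansion and from equating moments
mu_taylor, sigma_taylor = (alpha - 1) / beta, np.sqrt(alpha - 1) / beta
mu_moment, sigma_moment = alpha / beta, np.sqrt(alpha) / beta

y = np.linspace(0, 10, 2001)
F = st.gamma.cdf(y, alpha, loc=0, scale=1 / beta)

# Largest absolute CDF discrepancy on the grid for each approximation
max_diff_taylor = np.max(np.abs(F - st.norm.cdf(y, mu_taylor, sigma_taylor)))
max_diff_moment = np.max(np.abs(F - st.norm.cdf(y, mu_moment, sigma_moment)))

print(max_diff_taylor, max_diff_moment)
```

By this measure the moment-matched Normal tracks the whole CDF more closely, while the Taylor-based Normal does better on the mass near the peak, which illustrates how each approximation is good at something different.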

**d)** The mean and variance of the Binomial distribution are respectively $N\theta$ and $N\theta(1-\theta)$, which we can use to parametrize an approximate Normal distribution. Let's start by making a plot with $N = 100$.

In [7]:
N = 100
theta = 0.1

y = np.arange(0, N + 1)  # the Binomial support is 0, 1, ..., N
y_norm = np.linspace(0, N, 200)

binom = st.binom.pmf(y, N, theta)
norm = st.norm.pdf(y_norm, N * theta, np.sqrt(N * theta * (1 - theta)))

p = bokeh.plotting.figure(
    frame_width=300,
    frame_height=200,
    x_axis_label='n',
    y_axis_label='P(n; N, θ)',
)
p.line(y_norm, norm, color='orange', line_width=2)
p.scatter(y, binom)

bokeh.io.show(p)

In inspecting the plot, we see that we get qualitatively good matching around the peak, but by zooming (or plotting on a log scale if we wish), we notice that the Normal distribution tail falls off faster than that of the Binomial.
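
Instead of zooming, the tails can be compared directly on a log scale. The following sketch evaluates the log PMF and log PDF at a few right-tail points; the specific values of `n_tail` are arbitrary choices for illustration.

```python
import numpy as np
import scipy.stats as st

N, theta = 100, 0.1
mu, sigma = N * theta, np.sqrt(N * theta * (1 - theta))

# Points in the right tail, well past the peak near n = 10
n_tail = np.array([20, 25, 30])

log_binom = st.binom.logpmf(n_tail, N, theta)
log_norm = st.norm.logpdf(n_tail, mu, sigma)

for n, lb, lnorm in zip(n_tail, log_binom, log_norm):
    print(n, lb, lnorm)
```

At each of these points the Normal log PDF lies below the Binomial log PMF, and the gap widens further into the tail.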

Let's now move to smaller $N$, $N = 10$.

In [8]:
N = 10
theta = 0.1

y = np.arange(0, N + 1)  # the Binomial support is 0, 1, ..., N
y_norm = np.linspace(0, N, 200)

binom = st.binom.pmf(y, N, theta)
norm = st.norm.pdf(y_norm, N * theta, np.sqrt(N * theta * (1 - theta)))

p = bokeh.plotting.figure(
    frame_width=300,
    frame_height=200,
    x_axis_label='n',
    y_axis_label='P(n; N, θ)',
)
p.line(y_norm, norm, color='orange', line_width=2)
p.scatter(y, binom)

bokeh.io.show(p)

Here, we do not match as well, and the tail again falls off too fast. In fact, the Central Limit Theorem says that the Binomial distribution will approach a Normal distribution as $N\to\infty$ (not derived here), and this is qualitatively what we see in this exercise. The approximation gets worse as $N$ gets smaller.

A plot of the CDFs (not shown here) demonstrates that the Normal distribution is shifted rightward from the Binomial it is approximating.
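
Without building the staircase plot, the CDFs can still be compared at the integers. This is a rough sketch under the assumption that evaluating both CDFs at $n = 0, 1, \ldots, N$, with no continuity correction, is an adequate summary; the helper `max_cdf_diff` is a name invented here.

```python
import numpy as np
import scipy.stats as st


def max_cdf_diff(N, theta):
    """Largest |Binomial CDF - Normal CDF| over the integers 0, ..., N."""
    n = np.arange(0, N + 1)
    mu, sigma = N * theta, np.sqrt(N * theta * (1 - theta))
    return np.max(np.abs(st.binom.cdf(n, N, theta) - st.norm.cdf(n, mu, sigma)))


for N in [10, 100]:
    print(N, max_cdf_diff(N, 0.1))
```

The discrepancy is markedly larger for $N = 10$ than for $N = 100$, in line with the qualitative picture from the PMF plots.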

In summary, Normal approximations can be useful, but they can be very poor when used in the wrong context.

## Computing environment

In [9]:
%load_ext watermark
%watermark -v -p numpy,scipy,bokeh,jupyterlab

Python implementation: CPython
Python version       : 3.13.5
IPython version      : 9.4.0

numpy     : 2.2.6
scipy     : 1.16.0
bokeh     : 3.7.3
jupyterlab: 4.4.5

