# Sampling from a bivariate Normal distribution {#exr-sample-bivariate-normal}

<hr>

Draw 4000 samples from a bivariate Normal distribution with mean $\boldsymbol{\mu} = (10, 20)$ and covariance matrix

\begin{align}
\mathsf{\Sigma} = \begin{pmatrix}
4 & -2 \\
-2 & 6
\end{pmatrix}
\end{align}

using each of the following three methods.

**a)** Using NumPy.

**b)** Using Stan's built-in random number generator (that is, in the `generated quantities` block).

**c)** Using Stan's MCMC sampler.

Make plots of the samples to show they are consistent.

## Solution

<hr>

In [1]:
# Colab setup ------------------
import os, sys, subprocess
if "google.colab" in sys.modules:
    cmd = "pip install --upgrade bebi103 iqplot arviz cmdstanpy watermark"
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    import cmdstanpy; cmdstanpy.install_cmdstan()
    from bebi103.stan import install_cmdstan_colab
    install_cmdstan_colab()
# ------------------------------

import numpy as np

import cmdstanpy
import arviz as az

import iqplot
import bebi103

import bokeh.io
import bokeh.plotting
bokeh.io.output_notebook()

**a)** To sample using NumPy, we use `rng.multivariate_normal()`.

In [2]:
rng = np.random.default_rng()
mu = np.array([10, 20])
Sigma = np.array([[4, -2], [-2, 6]])

samples_np = rng.multivariate_normal(mu, Sigma, size=4000)

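As a quick sanity check on the NumPy draws, we can compare the sample mean and sample covariance to the values we specified. The sketch below repeats the sampling in a self-contained way; the seed is an arbitrary choice made here for reproducibility and is not part of the original solution.

```python
import numpy as np

# Arbitrary seed, assumed here only so the check is reproducible
rng = np.random.default_rng(3252)
mu = np.array([10, 20])
Sigma = np.array([[4, -2], [-2, 6]])

samples = rng.multivariate_normal(mu, Sigma, size=4000)

# Sample mean (length 2) and sample covariance (2x2).
# rowvar=False tells np.cov that each column is a variable.
sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples, rowvar=False)

print(sample_mean)
print(sample_cov)
```

With 4000 draws, both should land close to `mu` and `Sigma`, though not exactly on them because of sampling noise.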


**b)** To sample using Stan's random number generator (not MCMC), we use the Stan code below.

```stan
generated quantities {
  vector[2] y;

  {
    vector[2] mu = [10, 20]';
    matrix[2, 2] Sigma = [[4, -2], [-2, 6]]; 

    y = multi_normal_rng(mu, Sigma);
  }

}
```

Note that `mu` and `Sigma` are declared inside the nested curly braces, which makes them local variables. Local variables are not stored in the output; only `y` is.
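To build some intuition for what a multivariate Normal random number generator has to do, here is a minimal NumPy sketch of one standard approach: transform independent standard Normal draws using the Cholesky factor of the covariance matrix. This illustrates the general technique and is not meant as a description of how `multi_normal_rng` is implemented internally. The seed and the name `samples_chol` are assumptions made for this example.

```python
import numpy as np

# Arbitrary seed, assumed here only so the sketch is reproducible
rng = np.random.default_rng(3252)
mu = np.array([10.0, 20.0])
Sigma = np.array([[4.0, -2.0], [-2.0, 6.0]])

# Lower-triangular Cholesky factor L, satisfying L @ L.T == Sigma
L = np.linalg.cholesky(Sigma)

# Independent standard Normal draws, one row per sample
z = rng.standard_normal(size=(4000, 2))

# Each row is mu + L @ z_i, which has mean mu and covariance Sigma
samples_chol = mu + z @ L.T

print(samples_chol.mean(axis=0))
print(np.cov(samples_chol, rowvar=False))
```

If $\mathbf{z}$ has identity covariance, then $\boldsymbol{\mu} + \mathsf{L}\mathbf{z}$ has covariance $\mathsf{L}\mathsf{L}^\mathsf{T} = \mathsf{\Sigma}$, which is why the transformation works.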

Now, let's compile and sample!

In [3]:
with bebi103.stan.disable_logging():
    sm = cmdstanpy.CmdStanModel(stan_file='bivariate_normal.stan')
    samples = sm.sample(iter_sampling=4000, chains=1, fixed_param=True, show_progress=False)
    samples_stan = az.from_cmdstanpy(samples)

**c)** Now let's do it with MCMC. The Stan code is:

```stan
parameters {
  vector[2] y;
}

model {
  vector[2] mu = [10, 20]';
  matrix[2, 2] Sigma = [[4, -2], [-2, 6]];

  y ~ multi_normal(mu, Sigma);
}
```
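Unlike the random number generator, the MCMC sampler only needs to evaluate the log density of `y`, which is what the `model` block specifies (up to additive constants). As a rough illustration of the quantity involved, here is a NumPy sketch of the full bivariate Normal log density. The function name `log_density` is hypothetical and introduced only for this example.

```python
import numpy as np

mu = np.array([10.0, 20.0])
Sigma = np.array([[4.0, -2.0], [-2.0, 6.0]])


def log_density(y, mu, Sigma):
    """Log PDF of a multivariate Normal evaluated at a single point y."""
    d = len(mu)
    diff = np.asarray(y) - mu

    # Quadratic form diff^T Sigma^{-1} diff, computed with a linear solve
    # instead of forming the inverse explicitly
    quad = diff @ np.linalg.solve(Sigma, diff)

    # Log determinant of Sigma (sign is discarded; Sigma is positive definite)
    _, logdet = np.linalg.slogdet(Sigma)

    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)


# The density is highest at the mean and falls off away from it
print(log_density(mu, mu, Sigma))
print(log_density(mu + np.array([2.0, 0.0]), mu, Sigma))
```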

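Let's compile and sample! With CmdStanPy's defaults of four chains and 1000 sampling iterations per chain, this gives 4000 draws in total.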

In [4]:
with bebi103.stan.disable_logging():
    sm = cmdstanpy.CmdStanModel(stan_file='bivariate_normal_mcmc.stan')
    samples = sm.sample(show_progress=False)
    samples_stan_mcmc = az.from_cmdstanpy(samples)

To compare all of the samples, let's start by making a scatter plot.

In [5]:
p = bokeh.plotting.figure(
    frame_width=300, frame_height=300, x_axis_label="y[0]", y_axis_label="y[1]"
)

p.scatter(samples_np[:, 0], samples_np[:, 1], size=2, alpha=0.2, legend_label="Numpy")
p.scatter(
    samples_stan.posterior.y.squeeze().sel(y_dim_0=0),
    samples_stan.posterior.y.squeeze().sel(y_dim_0=1),
    size=2,
    alpha=0.2,
    color='orange',
    legend_label="Stan",
)
p.scatter(
    samples_stan_mcmc.posterior.y.stack(dim=dict(sample=('chain', 'draw'))).sel(y_dim_0=0),
    samples_stan_mcmc.posterior.y.stack(dim=dict(sample=('chain', 'draw'))).sel(y_dim_0=1),
    size=2,
    alpha=0.2,
    color='tomato',
    legend_label="Stan MCMC",
)

p.legend.click_policy = 'hide'

bokeh.io.show(p)

The samples all seem to overlap, a good sign!

We can also check the ECDFs of the marginal distributions. First, for `y[0]`.

In [6]:
p = iqplot.ecdf(samples_np[:, 0])
p = iqplot.ecdf(
    samples_stan.posterior.y.squeeze().sel(y_dim_0=0),
    q='y[0]',
    p=p,
    line_kwargs=dict(color="orange"),
)
p = iqplot.ecdf(
    samples_stan_mcmc.posterior.y.stack(dim=dict(sample=('chain', 'draw'))).sel(y_dim_0=0),
    p=p,
    line_kwargs=dict(color="tomato"),
)

bokeh.io.show(p)

Nearly identical! Let's check the other marginal distribution.

In [7]:
p = iqplot.ecdf(samples_np[:, 1])
p = iqplot.ecdf(
    samples_stan.posterior.y.squeeze().sel(y_dim_0=1),
    q='y[1]',
    p=p,
    line_kwargs=dict(color="orange"),
)
p = iqplot.ecdf(
    samples_stan_mcmc.posterior.y.stack(dim=dict(sample=('chain', 'draw'))).sel(y_dim_0=1),
    p=p,
    line_kwargs=dict(color="tomato"),
)

bokeh.io.show(p)

Everything looks great!
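If we want a number to go along with the visual comparison of ECDFs, one option is the maximum vertical distance between two ECDFs (the two-sample Kolmogorov-Smirnov statistic). Below is a minimal NumPy sketch; the function name `ks_distance` is hypothetical, and the two test samples are generated here from the `y[0]` marginal, a Normal with mean 10 and standard deviation 2, rather than taken from the Stan output.

```python
import numpy as np


def ks_distance(x, y):
    """Maximum vertical distance between the ECDFs of 1D samples x and y."""
    x = np.sort(np.asarray(x))
    y = np.sort(np.asarray(y))

    # The ECDFs only change at observed data points, so it suffices
    # to compare them on the pooled set of points
    grid = np.concatenate([x, y])

    # ECDF value at each grid point: fraction of samples <= that point
    ecdf_x = np.searchsorted(x, grid, side="right") / len(x)
    ecdf_y = np.searchsorted(y, grid, side="right") / len(y)

    return np.max(np.abs(ecdf_x - ecdf_y))


# Arbitrary seed, assumed here only so the sketch is reproducible
rng = np.random.default_rng(3252)
a = rng.normal(10, 2, size=4000)
b = rng.normal(10, 2, size=4000)

print(ks_distance(a, b))
```

The same function could be applied to `samples_np[:, 0]` and the flattened Stan draws of `y[0]`. A small distance is reassuring, but because MCMC draws are autocorrelated, it is best read as a descriptive measure and not as a formal hypothesis test.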

In [8]:
bebi103.stan.clean_cmdstan()

## Computing environment

In [9]:
%load_ext watermark
%watermark -v -p numpy,cmdstanpy,arviz,bebi103,iqplot,jupyterlab
print("cmdstan   :", bebi103.stan.cmdstan_version())

Python implementation: CPython
Python version       : 3.12.11
IPython version      : 9.1.0

numpy     : 2.1.3
cmdstanpy : 1.2.5
arviz     : 0.21.0
bebi103   : 0.1.27
iqplot    : 0.3.7
jupyterlab: 4.3.7

cmdstan   : 2.36.0
