# Lecture 18: Bootstrapping and Covariances

### Nov. 3, 2025

# Agenda

- Introduction (this notebook, *00_Introduction*)
- Linear algebra and matrix reminder
- Bootstrapping (*01_bootstrapping*)

## The Big(ger) Guns

If you have a linear problem you are trying to solve (or if you can linearize it), you can do *much* better than this iterative mumbo-jumbo. You can solve it in one shot! Let's go back to our $y_i = mx_i + b + n_i$ example, but extend it to two dimensions: $z_i = ax_i + by_i + c + n_i$, where $n_i$ is the noise on the $i^{\rm th}$ measurement.

You know what $x_i$ and $y_i$ are (they are the coordinates of your measurement), and you measured $z_i$. Setting the noise aside for a moment, you can frame all $N$ measurements at once as a matrix multiplication:

\begin{equation}
\begin{pmatrix} 
x_0 & y_0 & 1 \\ 
x_1 & y_1 & 1 \\ 
\vdots & \vdots & \vdots \\ 
x_{N-1} & y_{N-1} & 1 
\end{pmatrix}
\begin{pmatrix} 
a \\ 
b \\ 
c 
\end{pmatrix}
=
\begin{pmatrix} 
z_0 \\ 
z_1 \\ 
\vdots \\ 
z_{N-1} 
\end{pmatrix}
\end{equation}

Let's call the matrix on the left $\mathbf{A}$, the vector of parameters we want to solve for $\mathbf{\theta}$, and the vector of measurements $\mathbf{z}$. Then the above equation reads:
\begin{equation}
\mathbf{A}\cdot\mathbf{\theta} = \mathbf{z}
\end{equation}
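Because $\mathbf{A}$ is not a square matrix, it cannot be inverted directly. But $\mathbf{A}^\dagger \mathbf{A}$ is square (in this case, 3x3), and it is invertible as long as the columns of $\mathbf{A}$ are linearly independent. Here $\dagger$ is the conjugate transpose, which for our real-valued $\mathbf{A}$ is just the transpose. Multiplying both sides by $\mathbf{A}^\dagger$, we can re-write the above as: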
\begin{equation}
\mathbf{A}^\dagger\mathbf{A}\cdot\mathbf{\theta} = \mathbf{A}^\dagger\mathbf{z}
\end{equation}
Then, constructing the matrix inverse $(\mathbf{A}^\dagger\mathbf{A})^{-1}$ and applying it to both sides, we have:
\begin{equation}
\mathbf{\theta} = (\mathbf{A}^\dagger\mathbf{A})^{-1}\mathbf{A}^\dagger\mathbf{z}
\end{equation}
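
To make the one-shot solve concrete, here is a minimal NumPy sketch. The "true" values of $a$, $b$, $c$, the noise level, the number of points, and the random seed are all made up for illustration; they are not from the lecture. The cross-check against `np.linalg.lstsq` is only there to show that the normal-equations answer agrees with NumPy's own least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Hypothetical "true" plane parameters, used only to fake some data
a_true, b_true, c_true = 2.0, -1.0, 0.5
n_meas = 200
x = rng.uniform(-1, 1, n_meas)
y = rng.uniform(-1, 1, n_meas)
z = a_true * x + b_true * y + c_true + rng.normal(0, 0.1, n_meas)

# Design matrix A: one row (x_i, y_i, 1) per measurement, shape (n_meas, 3)
A = np.column_stack([x, y, np.ones(n_meas)])

# Normal equations: (A^T A) theta = A^T z.
# np.linalg.solve avoids forming the inverse explicitly.
theta = np.linalg.solve(A.T @ A, A.T @ z)

# Cross-check with NumPy's built-in least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(A, z, rcond=None)

print("normal equations:", theta)
print("np.linalg.lstsq: ", theta_lstsq)
```

Using `np.linalg.solve` rather than `np.linalg.inv` is a design choice, not a requirement: it solves the same 3x3 system but tends to be better behaved numerically.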

The final flourish, if not all measurements have the same noise, is inverse-variance weighting. If we assume the noise on each measurement is independent, we can write down a diagonal noise matrix $\mathbf{N}$ whose $i^{\rm th}$ diagonal entry is $\sigma_i^2$, the noise variance of the $i^{\rm th}$ measurement. Then $\mathbf{N}^{-1}$ is the inverse-variance weighting.
Multiplying both sides of $\mathbf{A}\cdot\mathbf{\theta} = \mathbf{z}$ by $\mathbf{A}^\dagger\mathbf{N}^{-1}$ (instead of just $\mathbf{A}^\dagger$) and running through the same math gives the final answer:
\begin{equation}
\mathbf{\theta} = (\mathbf{A}^\dagger\mathbf{N}^{-1}\mathbf{A})^{-1}\mathbf{A}^\dagger\mathbf{N}^{-1}\mathbf{z}
\end{equation}
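
The weighted version can be sketched the same way. Again, the data here are simulated under assumptions of my own choosing (per-measurement $\sigma_i$ drawn uniformly between 0.05 and 0.5, a made-up `theta_true`, an arbitrary seed). Because $\mathbf{N}$ is diagonal, multiplying by $\mathbf{N}^{-1}$ is just a per-measurement weight of $1/\sigma_i^2$, so there is no need to build the full matrix; the explicit `np.diag` version is included only as a check.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

n_meas = 300
x = rng.uniform(-1, 1, n_meas)
y = rng.uniform(-1, 1, n_meas)
sigma = rng.uniform(0.05, 0.5, n_meas)   # made-up noise level per measurement
theta_true = np.array([2.0, -1.0, 0.5])  # hypothetical (a, b, c)

A = np.column_stack([x, y, np.ones(n_meas)])
z = A @ theta_true + rng.normal(0, sigma)  # independent noise, std sigma_i

# N is diagonal, so N^{-1} is just a weight 1/sigma_i^2 per measurement.
w = 1.0 / sigma**2
At_Ninv = A.T * w  # same as A.T @ np.diag(w), without the n x n matrix

# theta = (A^T N^-1 A)^-1 A^T N^-1 z
theta_w = np.linalg.solve(At_Ninv @ A, At_Ninv @ z)

# Explicit version with the full diagonal matrix, as a sanity check
N_inv = np.diag(w)
theta_explicit = np.linalg.solve(A.T @ N_inv @ A, A.T @ N_inv @ z)

print("weighted fit:", theta_w)
```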

## Linear Algebra Reminder

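The Gaussian log-likelihood for $N$ independent measurements $y_n$, with model predictions $f_n$ and uncertainties $\sigma_n$,

$\log p (\{y_n\} \,| \, \theta) = - \frac{1}{2} \sum_{n=1}^{N} \left[ \frac{(y_n-f_n)^2}{\sigma_n^2} + \log(2\pi \sigma_n^2) \right] $

can be written in matrix form (a form that carries over unchanged when $C$ is no longer diagonal) as:

$\log p (\{y_n\} \,| \, \theta) = - \frac{1}{2} r^{\rm T} C^{-1} r - \frac{1}{2}\log \det C - \frac{N}{2}\log(2\pi) $

where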

$ r= \begin{pmatrix}
y_1-f_1 \\
\vdots \\
y_N-f_N
\end{pmatrix} $

and 

$ C= \begin{pmatrix}
\sigma_1^2 & & 0 \\
& \ddots & \\
0 & & \sigma_N^2
\end{pmatrix} $
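
As a quick sanity check that the sum form and the matrix form really are the same thing for a diagonal $C$, here is a small sketch. The model predictions `f`, the uncertainties `sigma`, and the seed are placeholders invented for the check; any values would do.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed

n = 50
sigma = rng.uniform(0.5, 2.0, n)  # made-up per-point uncertainties
f = np.linspace(0.0, 1.0, n)      # placeholder model predictions f_n
y = f + rng.normal(0, sigma)      # simulated data y_n

r = y - f  # residual vector

# Sum form of the log-likelihood
logp_sum = -0.5 * np.sum(r**2 / sigma**2 + np.log(2 * np.pi * sigma**2))

# Matrix form, with C = diag(sigma_n^2)
C = np.diag(sigma**2)
sign, logdet = np.linalg.slogdet(C)  # log det C, computed stably
logp_mat = (
    -0.5 * r @ np.linalg.solve(C, r)  # -1/2 r^T C^-1 r
    - 0.5 * logdet
    - 0.5 * n * np.log(2 * np.pi)
)

print(logp_sum, logp_mat)
```

`np.linalg.slogdet` is used instead of `np.log(np.linalg.det(C))` because the determinant of a large covariance matrix can easily overflow or underflow.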
