# Stochastic Lanczos Quadrature

Let $\vec{A}\in\R^{n\times n}$ be a symmetric matrix with eigendecomposition $\vec{A} = \sum_{i=1}^{n} \lambda_i \vec{u}_i \vec{u}_i^\T$, where $\lambda_i$ are the eigenvalues and $\vec{u}_i$ are the orthonormal eigenvectors of $\vec{A}$.
Matrix functions
\begin{equation*}
f(\vec{A}) := \sum_{i=1}^{n} f(\lambda_i) \vec{u}_i \vec{u}_i^\T
\end{equation*}
arise in a variety of applications.
In many settings, we are interested in approximating the *spectral sum*
\begin{equation*}
\tr(f(\vec{A})) = \sum_{i=1}^{n} f(\lambda_i).
\end{equation*}
A natural approach is to combine the implicit trace estimation algorithms discussed earlier in this chapter with black-box methods for approximating $\vec{x}\mapsto \vec{x}^\T f(\vec{A})\vec{x}$ or $\vec{x}\mapsto f(\vec{A})\vec{x}$.
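
For small matrices, the definitions above can be evaluated directly from an eigendecomposition, which is useful as a reference when testing approximate methods. The following is a minimal NumPy sketch; the helper names `matrix_function` and `spectral_sum` and the random test matrix are illustrative assumptions, not part of the text.

```python
import numpy as np

def matrix_function(A, f):
    """Return f(A) = sum_i f(lambda_i) u_i u_i^T for a symmetric matrix A."""
    lam, U = np.linalg.eigh(A)      # eigenvalues and orthonormal eigenvectors
    return (U * f(lam)) @ U.T       # scale column i of U by f(lambda_i)

def spectral_sum(A, f):
    """Return tr(f(A)) = sum_i f(lambda_i) for a symmetric matrix A."""
    return np.sum(f(np.linalg.eigvalsh(A)))

# Illustrative test matrix (an assumption): a random symmetric matrix.
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
A = (B + B.T) / 2

fA = matrix_function(A, np.exp)
# The trace of f(A) and the sum of f over the eigenvalues agree up to rounding.
print(np.trace(fA), spectral_sum(A, np.exp))
```

This direct approach costs $O(n^3)$ operations and requires storing $\vec{A}$ explicitly, which is what motivates the matrix-free estimators discussed in this section.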


## The Lanczos Method for Matrix Functions

The Lanczos method can be used to approximate the maps $\vec{x}\mapsto \vec{x}^\T f(\vec{A})\vec{x}$ and $\vec{x}\mapsto f(\vec{A})\vec{x}$.

:::{prf:definition} Krylov Subspace
Given a matrix $\vec{A}\in\R^{n\times n}$ and a vector $\vec{x}\in\R^n$, the *Krylov subspace* of order $k$ is defined as
\begin{equation*}
\mathcal{K}_k(\vec{A}, \vec{x}) := \text{span}\{\vec{x}, \vec{A}\vec{x}, \ldots, \vec{A}^{k-1}\vec{x}\}.
\end{equation*}
:::

When applied to a symmetric matrix $\vec{A}$ for $k$ iterations, the Lanczos method produces an orthonormal basis $\vec{Q}\in\R^{n\times k}$ for the Krylov subspace $\mathcal{K}_k(\vec{A}, \vec{x})$ and a symmetric tridiagonal matrix $\vec{T}\in\R^{k\times k}$ such that $\vec{T} = \vec{Q}^\T\vec{A}\vec{Q}$.
The output of the Lanczos method can then be used to approximate $f(\vec{A})\vec{x}$ and $\vec{x}^\T f(\vec{A})\vec{x}$.
This requires $k-1$ matrix-vector products with $\vec{A}$.

:::{prf:definition} Lanczos Method
The Lanczos approximations to $f(\vec{A})\vec{x}$ and $\vec{x}^\T f(\vec{A})\vec{x}$ are respectively given by
\begin{equation*}\begin{aligned}
\Call{Lan-FA}_k(f;\vec{A},\vec{x})&:= \|\vec{x}\|\vec{Q} f(\vec{T})\vec{e}_1 \\
\Call{Lan-QF}_k(f;\vec{A},\vec{x})&:=  \|\vec{x}\|^2\vec{e}_1^\T f(\vec{T})\vec{e}_1.
\end{aligned}\end{equation*}
:::
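
The definition above can be sketched in NumPy as follows. This is one possible implementation under stated assumptions, not a reference one: the function names `lanczos`, `lan_fa`, and `lan_qf` are illustrative, full reorthogonalization is added here for numerical stability (the text does not prescribe it), and the sketch assumes the Krylov subspace has full dimension $k$ (no breakdown).

```python
import numpy as np

def lanczos(A, x, k):
    """Run k Lanczos iterations on symmetric A starting from x.

    Returns Q (n x k, orthonormal columns) and the symmetric tridiagonal
    matrix T = Q^T A Q (k x k). Uses full reorthogonalization and assumes
    no breakdown (all off-diagonal entries of T are nonzero).
    """
    n = x.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)        # diagonal of T
    beta = np.zeros(k - 1)     # off-diagonal of T
    Q[:, 0] = x / np.linalg.norm(x)
    for j in range(k):
        w = A @ Q[:, j]        # one product with A per iteration in this sketch
        alpha[j] = Q[:, j] @ w
        if j == k - 1:
            break
        # Orthogonalize against all previous basis vectors (done twice).
        for _ in range(2):
            w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        beta[j] = np.linalg.norm(w)
        Q[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return Q, T

def lan_fa(f, A, x, k):
    """Lanczos approximation ||x|| Q f(T) e_1 to f(A) x."""
    Q, T = lanczos(A, x, k)
    theta, S = np.linalg.eigh(T)
    fT_e1 = S @ (f(theta) * S[0, :])          # f(T) e_1 = S f(Theta) S^T e_1
    return np.linalg.norm(x) * (Q @ fT_e1)

def lan_qf(f, A, x, k):
    """Lanczos approximation ||x||^2 e_1^T f(T) e_1 to x^T f(A) x."""
    _, T = lanczos(A, x, k)
    theta, S = np.linalg.eigh(T)
    return (x @ x) * np.sum(f(theta) * S[0, :] ** 2)

# Illustrative comparison against a dense eigendecomposition (test data assumed).
rng = np.random.default_rng(0)
n, k = 100, 30
B = rng.standard_normal((n, n))
A = B @ B.T / n
x = rng.standard_normal(n)

Q, T = lanczos(A, x, k)
lam, U = np.linalg.eigh(A)
exact_fa = U @ (np.exp(lam) * (U.T @ x))
exact_qf = x @ exact_fa
print(np.linalg.norm(exact_fa - lan_fa(np.exp, A, x, k)))
print(abs(exact_qf - lan_qf(np.exp, A, x, k)))
```

Evaluating $f(\vec{T})$ through an eigendecomposition of $\vec{T}$ costs $O(k^3)$ in this sketch, which is typically negligible compared to the products with $\vec{A}$ when $k \ll n$.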

It is well-known that the accuracy of these approximations is related to how well $f(x)$ can be approximated by polynomials on the interval $[\lambda_n,\lambda_1]$, where $\lambda_n$ and $\lambda_1$ denote the smallest and largest eigenvalues of $\vec{A}$.

:::{prf:theorem} 
:label: thm:lanczos_FA
\begin{equation*}
\begin{aligned}
\| f(\vec{A})\vec{x} - \Call{Lan-FA}_k(f;\vec{A},\vec{x}) \| &\leq 2 \|\vec{x}\| \min_{\deg(p)<k} \left(\max_{x\in[\lambda_n,\lambda_1]} | f(x) - p(x) |\right) \\
| \vec{x}^\T f(\vec{A})\vec{x} - \Call{Lan-QF}_k(f;\vec{A},\vec{x}) | &\leq 2 \|\vec{x}\|^2 \min_{\deg(p)<2k-1} \left(\max_{x\in[\lambda_n,\lambda_1]} | f(x) - p(x) |\right).
\end{aligned}
\end{equation*}
:::

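In particular, when $f$ is a polynomial of degree less than $k$ (respectively, less than $2k-1$), the minimum on the right-hand side is zero, so $\Call{Lan-FA}_k$ (respectively, $\Call{Lan-QF}_k$) is exact.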




## Stochastic Lanczos Quadrature

Stochastic Lanczos Quadrature (SLQ) is the simple combination of the [Girard--Hutchinson estimator](./girard-hutchinson.ipynb) with the Lanczos method.

````{prf:definition} Stochastic Lanczos Quadrature
The *Stochastic Lanczos Quadrature* (SLQ) estimator for the spectral sum $\tr(f(\vec{A}))$ is given by
\begin{equation*}
\Call{SLQ}_{k,m}(f;\vec{A}) := \frac{1}{m} \sum_{i=1}^{m} \Call{Lan-QF}_k(f;\vec{A},\vec{x}_i),
\end{equation*}
where $\vec{x}_i$ are independent standard Gaussian vectors.
````
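
A minimal NumPy sketch of the SLQ estimator is given below. The names `lanczos_tridiag` and `slq`, the use of full reorthogonalization, and the test matrix are assumptions made for illustration; the sketch also assumes the Lanczos iteration does not break down.

```python
import numpy as np

def lanczos_tridiag(A, x, k):
    """Return the k x k tridiagonal Lanczos matrix T for symmetric A and start x.

    Full reorthogonalization is used for stability (an added choice);
    assumes no breakdown.
    """
    n = x.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k - 1)
    Q[:, 0] = x / np.linalg.norm(x)
    for j in range(k):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        if j == k - 1:
            break
        for _ in range(2):  # orthogonalize twice against previous vectors
            w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        beta[j] = np.linalg.norm(w)
        Q[:, j + 1] = w / beta[j]
    return np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

def slq(f, A, k, m, rng=None):
    """Average of Lan-QF_k(f; A, x_i) over m independent standard Gaussian x_i."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    total = 0.0
    for _ in range(m):
        x = rng.standard_normal(n)
        theta, S = np.linalg.eigh(lanczos_tridiag(A, x, k))
        # ||x||^2 e_1^T f(T) e_1, written as a quadrature rule with
        # nodes theta_j and weights S[0, j]^2.
        total += (x @ x) * np.sum(f(theta) * S[0, :] ** 2)
    return total / m

# Illustrative comparison with the exact spectral sum (test data assumed).
rng = np.random.default_rng(0)
n = 200
B = rng.standard_normal((n, n))
A = B @ B.T / n

est = slq(np.exp, A, k=20, m=100, rng=np.random.default_rng(1))
exact = np.sum(np.exp(np.linalg.eigvalsh(A)))
print(est, exact)
```

The estimate is random, so the two printed values should only agree to within the statistical error of the Girard--Hutchinson estimator, which decays like $1/\sqrt{m}$.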

A simple application of the triangle inequality gives a bound on the expected squared error of the SLQ estimator.

:::{prf:theorem} 

For any $k,m\geq 1$, the SLQ estimator uses $(k-1)m$ matrix-vector products with $\vec{A}$ and satisfies
\begin{equation*}
\EE\left[ | \tr(f(\vec{A})) - \Call{SLQ}_{k,m}(f;\vec{A}) |^2 \right]
\leq \frac{4\| f(\vec{A}) \|_\F^2}{m} + 24n^2 \left( \min_{\deg(p)<2k-1} \max_{x\in[\lambda_n,\lambda_1]} | f(x) - p(x) | \right)^2.
\end{equation*}
:::



:::{admonition} Proof
:class: dropdown

By the triangle inequality and since $(x+y)^2\leq 2(x^2+y^2)$, we have
\begin{equation*}\begin{aligned}
\hspace{1em}&\hspace{-1em}| \tr(f(\vec{A})) - \Call{SLQ}_{k,m}(f;\vec{A}) |^2
\\&\leq 2\left| \tr(f(\vec{A})) - \frac{1}{m}\sum_{i=1}^{m} \vec{x}_i^\T f(\vec{A})\vec{x}_i \right|^2 + 2\left| \frac{1}{m}\sum_{i=1}^{m} \left( \vec{x}_i^\T f(\vec{A})\vec{x}_i - \Call{Lan-QF}_k(f;\vec{A},\vec{x}_i) \right) \right|^2.
\end{aligned}\end{equation*}

Note that 
\begin{equation*}
\EE\left[ \left| \tr(f(\vec{A})) - \frac{1}{m}\sum_{i=1}^{m} \vec{x}_i^\T f(\vec{A})\vec{x}_i \right|^2 \right]
= 
\VV\left[ \widehat{\tr}_m(f(\vec{A})) \right]
= \frac{2 \| f(\vec{A}) \|_\F^2}{m},
\end{equation*}
where $\widehat{\tr}_m(\cdot)$ is the [Girard--Hutchinson estimator](./girard-hutchinson.ipynb#def:girard_hutchinson_estimator).

Next, by the triangle inequality and {prf:ref}`thm:lanczos_FA`, 
\begin{equation*}\begin{aligned}
\hspace{2em}&\hspace{-2em}
\left| \frac{1}{m}\sum_{i=1}^{m} \left( \vec{x}_i^\T f(\vec{A})\vec{x}_i - \Call{Lan-QF}_k(f;\vec{A},\vec{x}_i) \right) \right|
\\&\leq \frac{1}{m}\sum_{i=1}^{m} \left| \vec{x}_i^\T f(\vec{A})\vec{x}_i - \Call{Lan-QF}_k(f;\vec{A},\vec{x}_i) \right|
\\&\leq \frac{2}{m} \sum_{i=1}^{m} \|\vec{x}_i\|^2 \min_{\deg(p)<2k-1} \max_{x\in[\lambda_n,\lambda_1]} | f(x) - p(x) |.
\end{aligned}\end{equation*}
Hence, 
\begin{equation*}\begin{aligned}
\hspace{2em}&\hspace{-2em}
\EE\left[ \left| \frac{1}{m}\sum_{i=1}^{m} \left( \vec{x}_i^\T f(\vec{A})\vec{x}_i - \Call{Lan-QF}_k(f;\vec{A},\vec{x}_i) \right) \right|^2 \right]
\\&\leq \frac{4}{m^2}\EE\left[  \left(\sum_{i=1}^{m} \|\vec{x}_i\|^2 \right)^2 \right] \left( \min_{\deg(p)<2k-1} \max_{x\in[\lambda_n,\lambda_1]} | f(x) - p(x) | \right)^2.
\end{aligned}\end{equation*}
Now, note that $\sum_{i=1}^{m} \|\vec{x}_i\|^2$ is a chi-squared random variable with $mn$ degrees of freedom, which has mean $mn$ and variance $2mn$ (see [Wikipedia](https://en.wikipedia.org/wiki/Chi-squared_distribution)).
Therefore,
\begin{equation*}
\EE\left[  \left(\sum_{i=1}^{m} \|\vec{x}_i\|^2 \right)^2 \right] = 2mn + (mn)^2 \leq 3(mn)^2.
\end{equation*}
Combining all of the above gives the result: the first term contributes $2\cdot 2\| f(\vec{A}) \|_\F^2/m$, and the second contributes at most $2\cdot \frac{4}{m^2}\cdot 3(mn)^2 = 24n^2$ times the squared polynomial approximation error.

:::

Note that for "nice" functions $f(x)$ (for example, functions analytic in a neighborhood of $[\lambda_n,\lambda_1]$), the error of the best polynomial approximation decreases exponentially with $k$ {cite:p}`trefethen_19`.
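
The exponential decay can be observed numerically. The sketch below uses Chebyshev interpolation as a stand-in for the best polynomial approximation (the interpolation error is an upper bound on the best approximation error, here estimated on a fine grid); the interval $[0,4]$ and the choice $f(x)=\exp(x)$ are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

a, b = 0.0, 4.0                      # assumed interval containing the spectrum
xs = np.linspace(a, b, 2001)         # fine grid used to estimate the max error

degrees = [2, 4, 8, 16]
errors = []
for deg in degrees:
    # Interpolate exp at Chebyshev points mapped to [a, b].
    p = Chebyshev.interpolate(np.exp, deg, domain=[a, b])
    errors.append(np.max(np.abs(np.exp(xs) - p(xs))))

for deg, err in zip(degrees, errors):
    print(f"degree {deg:2d}: max error ~ {err:.2e}")
```

Each doubling of the degree reduces the error by many orders of magnitude until it reaches the level of floating-point rounding.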



## Randomized Matrix Free Quadrature





````{prf:definition}

The *spectral density* of $\vec{A}$ is the probability measure 
```{math}
\varphi(x) := \frac{1}{n} \sum_{i=1}^{n} \delta(x - \lambda_i),
```
where $\delta(x)$ is the Dirac delta at zero.

````

We are interested in approximating the spectral density of $\vec{A}$.

Observe that 
```{math}
\tr(f(\vec{A})) = n\int_{-\infty}^{\infty} f(x) \varphi(x) \d{x}.
```
Likewise,
```{math}
\Phi(\alpha) := n\int_{-\infty}^{\alpha} \varphi(x) \d{x} = \tr(f_\alpha(\vec{A})),\quad f_\alpha(x) = \begin{cases}
1 & \text{if } x \leq \alpha,\\
0 & \text{otherwise.}
\end{cases}
```
Hence approximating the spectral density and approximating spectral sums are equivalent.
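
As a small illustration of the identity for $\Phi(\alpha)$, the sketch below evaluates $\tr(f_\alpha(\vec{A}))$ from the eigenvalues of a tiny matrix, which simply counts the eigenvalues that are at most $\alpha$. The function name `spectral_cdf` and the diagonal test matrix are illustrative assumptions.

```python
import numpy as np

def spectral_cdf(A, alpha):
    """Phi(alpha) = tr(f_alpha(A)): the number of eigenvalues of A that are <= alpha."""
    lam = np.linalg.eigvalsh(A)
    f_alpha = (lam <= alpha).astype(float)   # step function applied to each eigenvalue
    return int(np.sum(f_alpha))

# Assumed test matrix with eigenvalues 1, 2, 2, 5.
A = np.diag([1.0, 2.0, 2.0, 5.0])
print(spectral_cdf(A, 0.5), spectral_cdf(A, 2.5), spectral_cdf(A, 10.0))  # → 0 3 4
```

Since $f_\alpha$ is discontinuous, it is not well approximated by low-degree polynomials near $\alpha$, so this choice of $f$ is a harder case for polynomial-based bounds than smooth functions such as $\exp$.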



