---
title: Randomly Pivoted Cholesky
description: Randomized pivoting strategies and sampling approaches for Cholesky decomposition
keywords: [Cholesky decomposition, randomized pivoting, sampling-based factorization, symmetric positive definite, pivot selection, numerical stability]
numbering:
  equation:
    enumerator: 7.%s
    continue: true
  proof:theorem:
    enumerator: 7.%s
    continue: true
  proof:algorithm:
    enumerator: 7.%s
    continue: true
  proof:definition:
    enumerator: 7.%s
    continue: true
  proof:proposition:
    enumerator: 7.%s
    continue: true
---

Suppose $\vec{A}$ is positive definite and recall the Nyström approximation
```{math}
\vec{A} \langle \vec{\Omega}\rangle := \vec{A}\vec{\Omega} ( \vec{\Omega}^\T\vec{A}\vec{\Omega})^+ \vec{\Omega}^\T\vec{A}.
```
When the columns of $\vec{\Omega}$ are a subset of the columns of the identity, $\vec{A}\vec{\Omega}$ corresponds to subsampling the columns of $\vec{A}$.
In particular, let $S$ be a tuple of $k$ distinct indices from $\{1, \ldots, n\}$ specifying the *pivots* (columns of $\vec{A}$) we will observe, so that $\vec{\Omega} = \vec{I}[:,S]$.
Then 
```{math}
:label: def-nystrom-pivot
\vec{A} \langle \vec{\Omega}\rangle = \vec{A}[:,S] \vec{A}[S,S]^+ \vec{A}[S,:] =: \vec{A} \langle S\rangle.
```
Since $\vec{A}$ is symmetric, $\vec{A}[S,:] = \vec{A}[:,S]^\T$, so we can compute $\vec{A} \langle S\rangle$ having observed only $\vec{A}[:,S]$, which contains just $kn$ entries of $\vec{A}$.
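
As a rough illustration of the formula above, the following NumPy sketch forms the Nyström approximation from the selected columns alone. The function name `nystrom_from_pivots` and the small random test matrix are assumptions made for this example, not part of the original text.

```python
import numpy as np

def nystrom_from_pivots(A, S):
    """Form A<S> = A[:, S] A[S, S]^+ A[S, :] using only the columns A[:, S]."""
    C = A[:, S]                      # n x k block of observed columns
    W = C[S, :]                      # k x k core matrix A[S, S], read off from C
    # By symmetry of A, A[S, :] equals C^T, so no further entries are needed.
    return C @ np.linalg.pinv(W) @ C.T

# Hypothetical test problem: a small, well-conditioned positive definite matrix.
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = B @ B.T + np.eye(6)
S = [4, 1, 3]                        # distinct pivot indices
A_S = nystrom_from_pivots(A, S)
```

In this sketch the approximation has rank at most $k$ and agrees with $\vec{A}$ on the observed columns, which gives a quick sanity check.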

The key question is how best to choose the columns of $\vec{A}$. 
Ideally, we would like to choose $k$ columns so that the Nyström approximation is competitive with the best rank-$k$ approximation to $\vec{A}$. 


## Partial Cholesky factorization with pivoting


It turns out that we can compute {prf:ref}`def-nystrom-pivot` by running a Cholesky factorization algorithm with pivoting and stopping after $k$ steps.

:::{prf:algorithm} partial Cholesky factorization with pivoting
:label: alg-partial-cholesky
**Input:** $\vec{A}\in\R^{n\times n}$, pivots $S = (s_1, \ldots, s_k)$

1. Initialize $\vec{F}_0 = []\in\R^{n\times 0}$.
1. For $i=1,\ldots, k$
    - $\tilde{\vec{g}}_i = \vec{A}[:,s_i] - \vec{F}_{i-1}\vec{F}_{i-1}[s_i,:]^\T$
    - $ \vec{g}_i = \tilde{\vec{g}}_i / \sqrt{\tilde{\vec{g}}_i[s_i]}$
    - $\vec{F}_i = [\vec{F}_{i-1} ~ \vec{g}_i  ] \in \R^{n\times i}$

**Output:** $\vec{F}_k\vec{F}_k^\T$
:::


Note that the "textbook" Cholesky factorization algorithms maintain $\vec{A} - \vec{F}_i \vec{F}_i^\T$ directly.
On the other hand, anticipating that we will terminate for some $k<n$,  {prf:ref}`alg-partial-cholesky` only computes the necessary parts of $\vec{A} - \vec{F}_i\vec{F}_i^\T$ as they are needed.
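
The steps of the partial Cholesky algorithm above can be sketched in NumPy as follows. This is a minimal sketch, assuming the pivots are distinct and $\vec{A}$ is positive definite (so that every pivot entry is positive); the function name `partial_cholesky` and the test matrix are illustrative choices.

```python
import numpy as np

def partial_cholesky(A, S):
    """Return F (n x k) such that F F^T follows the partial Cholesky steps for pivots S."""
    n, k = A.shape[0], len(S)
    F = np.zeros((n, k))
    for i, s in enumerate(S):
        # Column s of the residual A - F_{i-1} F_{i-1}^T, formed only when needed.
        g = A[:, s] - F[:, :i] @ F[s, :i]
        # Rescale by the square root of the pivot entry (assumed positive).
        F[:, i] = g / np.sqrt(g[s])
    return F

# Hypothetical test problem: a small, well-conditioned positive definite matrix.
rng = np.random.default_rng(1)
B = rng.standard_normal((8, 8))
A = B @ B.T + np.eye(8)
S = [5, 0, 2]
F = partial_cholesky(A, S)
```

Only the columns `A[:, s]` for `s` in `S` are ever read, and the factor is preallocated rather than grown column by column, which is a design choice of this sketch rather than something the algorithm requires.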


:::{prf:theorem}
The output $\vec{F}_k\vec{F}_k^\T$ of {prf:ref}`alg-partial-cholesky` with pivots $S$ is equal to $\vec{A} \langle S\rangle$.
:::


:::{prf:proof}
:class: dropdown
:enumerated: false

See Ethan Epperly's [blog post](https://www.ethanepperly.com/index.php/2022/10/24/nystrom-cholesky-and-schur/).
:::



## Adaptive pivoting


Nothing in {prf:ref}`alg-partial-cholesky` requires that the pivot $s_i$ be chosen prior to step $i$!
In particular, we can choose the $i$-th pivot adaptively, based on the approximation $\vec{A}\langle S_{i-1}\rangle = \vec{F}_{i-1}\vec{F}_{i-1}^\T$, where $S_{i-1} := (s_1, \ldots, s_{i-1})$.
While computing the error $\vec{A} - \vec{A}\langle S_{i-1}\rangle$ would let us look for the column that reduces the error the most, we want to avoid looking at all of the entries of $\vec{A}$.


Amazingly, we can find good pivots without observing all of $\vec{A}$.
Towards this end, note that:
1. The error $\vec{A} - \vec{A}\langle \vec{\Omega} \rangle$ of a Nyström approximation is positive semidefinite.
1. For any positive semidefinite $\vec{E}$, $\|\vec{E}\|\leq \|\vec{E}\|_\F \leq \tr(\vec{E})$.
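
For completeness, here is a short justification of the second fact, written in terms of the eigenvalues $\lambda_1 \geq \cdots \geq \lambda_n \geq 0$ of $\vec{E}$ (this notation is introduced only for this remark). Since the eigenvalues are nonnegative,

```{math}
\|\vec{E}\| = \lambda_1 
\leq \Big( \sum_{j=1}^{n} \lambda_j^2 \Big)^{1/2} = \|\vec{E}\|_\F
\leq \sum_{j=1}^{n} \lambda_j = \tr(\vec{E}),
```

where the last inequality holds because $\big(\sum_j \lambda_j\big)^2 = \sum_j \lambda_j^2 + \sum_{j\neq l} \lambda_j\lambda_l \geq \sum_j \lambda_j^2$. The trace depends only on the diagonal of $\vec{E}$, so it bounds the error without requiring access to the off-diagonal entries.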

By computing the $n$ diagonal entries of $\vec{A}$, we can keep track of $\operatorname{diag}(\vec{A} - \vec{A}\langle S_{i-1}\rangle )$ and use this to choose the pivot. 
One approach is to greedily choose the pivot as the index of the largest entry of $\operatorname{diag}(\vec{A} - \vec{A}\langle S_{i-1}\rangle )$.
However, this approach has a tendency to focus on outlier entries.
Instead, we can sample the pivot with probability proportional to the entries of $\operatorname{diag}(\vec{A} - \vec{A}\langle S_{i-1}\rangle )$.
This results in the *Randomly Pivoted Cholesky* algorithm introduced in {cite:p}`chen_epperly_tropp_webber_25`.


:::{prf:algorithm} Randomly pivoted Cholesky
:label: alg-rp-cholesky
**Input:** $\vec{A}\in\R^{n\times n}$, number of pivots $k$

1. Initialize $\vec{F}_0 = []\in\R^{n\times 0}$, $\vec{d}_0 = \operatorname{diag}(\vec{A})$
1. For $i=1,\ldots, k$
    - Sample $s_i$ so that $\PP[s_i = j] \propto \vec{d}_{i-1}[j]$
    - $\tilde{\vec{g}}_i = \vec{A}[:,s_i] - \vec{F}_{i-1}\vec{F}_{i-1}[s_i,:]^\T$
    - $ \vec{g}_i = \tilde{\vec{g}}_i / \sqrt{\tilde{\vec{g}}_i[s_i]}$
    - $\vec{F}_i = [\vec{F}_{i-1} ~ \vec{g}_i  ] \in \R^{n\times i}$
    - $\vec{d}_i = \vec{d}_{i-1} - \operatorname{diag}(\vec{g}_i\vec{g}_{i}^\T)$

**Output:** $\vec{F}_k\vec{F}_k^\T$
:::
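
The algorithm above can be sketched in NumPy as follows. This is an illustrative sketch, not the reference implementation from the cited paper: the function name `rp_cholesky`, the `rng` argument, and the two floating-point safeguards (clipping the residual diagonal at zero and zeroing the entry of a chosen pivot) are assumptions added here. It also assumes the residual diagonal never sums to zero, which holds when $\vec{A}$ is positive definite and $k < n$.

```python
import numpy as np

def rp_cholesky(A, k, rng=None):
    """Return (F, pivots) with F of shape n x k, following randomly pivoted Cholesky."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    F = np.zeros((n, k))
    d = np.diag(A).astype(float)     # diagonal of the current residual
    pivots = []
    for i in range(k):
        # Sample a pivot with probability proportional to the residual diagonal.
        s = rng.choice(n, p=d / d.sum())
        # Column s of the residual A - F_{i-1} F_{i-1}^T.
        g = A[:, s] - F[:, :i] @ F[s, :i]
        F[:, i] = g / np.sqrt(g[s])
        # Update the residual diagonal: diag(g_i g_i^T) is the entrywise square of g_i.
        d = np.clip(d - F[:, i] ** 2, 0.0, None)   # safeguard against roundoff (assumption)
        d[s] = 0.0                                  # a chosen pivot is never resampled
        pivots.append(int(s))
    return F, pivots

# Hypothetical test problem: a small, well-conditioned positive definite matrix.
B = np.random.default_rng(2).standard_normal((10, 10))
A = B @ B.T + np.eye(10)
F, pivots = rp_cholesky(A, 3, rng=0)
```

Apart from the diagonal of $\vec{A}$, the sketch reads only the $k$ sampled columns. Replacing the sampling line with `s = int(np.argmax(d))` would give the greedy pivoting rule discussed earlier.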