---
title: Block Krylov Iteration
description: Advanced techniques for improved accuracy and robustness in randomized low-rank approximation
keywords: [subspace iteration, block Krylov, power iteration, Krylov subspace, iterative methods, spectral gap, convergence acceleration]
numbering:
  equation:
    enumerator: 5.%s
    continue: true
  proof:theorem:
    enumerator: 5.%s
    continue: true
  proof:algorithm:
    enumerator: 5.%s
    continue: true
  proof:definition:
    enumerator: 5.%s
    continue: true
  proof:proposition:
    enumerator: 5.%s
    continue: true
---



## Motivation 

[Subspace Iteration](./subspace-iteration.ipynb) can dramatically improve on the performance of the [Randomized SVD](./randomized-svd.ipynb) when the singular values of $\vec{A}$ decay slowly (a heavy tail).
The key observation is that the singular value tail of $(\vec{A}\vec{A}^\T)^q\vec{A}$ is much lighter than that of $\vec{A}$.
This is because the singular values of $(\vec{A}\vec{A}^\T)^q\vec{A}$ are $\sigma_i^{2q+1}$, and the polynomial $x^{2q+1}$ shrinks the tail singular values of $\vec{A}$ toward zero relative to the leading ones.
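
As a small numerical check of this observation, the following NumPy sketch compares the singular values of $\vec{A}$ with those of $(\vec{A}\vec{A}^\T)^q\vec{A}$. The matrix size, random seed, and value of $q$ are arbitrary choices made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 20))  # illustrative test matrix
q = 2                              # illustrative number of power steps

# Form (A A^T)^q A explicitly (only sensible for a tiny example like this one).
B = np.linalg.matrix_power(A @ A.T, q) @ A

s_A = np.linalg.svd(A, compute_uv=False)  # singular values of A, descending
s_B = np.linalg.svd(B, compute_uv=False)  # singular values of (A A^T)^q A

# Ratio of smallest to largest singular value: a crude measure of tail weight.
tail_ratio_A = s_A[-1] / s_A[0]
tail_ratio_B = s_B[-1] / s_B[0]

print("singular values match sigma^(2q+1):", np.allclose(s_B, s_A ** (2 * q + 1), rtol=1e-6))
print("tail ratio of A:          ", tail_ratio_A)
print("tail ratio of (AA^T)^q A: ", tail_ratio_B)
```

Since every ratio $\sigma_i/\sigma_1$ is at most one, raising it to the power $2q+1$ can only make it smaller, which is the sense in which the tail becomes lighter.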

From a computational perspective, there is little that is special about $x^{2q+1}$. In fact, we can replace this monomial with any polynomial of the same degree containing only odd-degree terms, without significantly changing the cost of the algorithm.
Thus, we might consider an approximation of the form:
\begin{equation}
\widehat{\vec{A}} = \vec{Q}\vec{Q}^\T \vec{A},
\quad 
\vec{Q} = \Call{orth}(p(\vec{A}\vec{A}^\T)\vec{A}\vec{\Omega})
,\quad \deg(p) \leq q.
\end{equation}
It's conceivable that, if we choose a good polynomial, this method could be more accurate than subspace iteration.[^cheb]
The difficulty is that we don't know how to choose a good polynomial in advance, since we don't know the singular values of $\vec{A}$.
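
To give a feel for how much the choice of polynomial can matter, the sketch below compares a degree-$q$ Chebyshev polynomial with the monomial $x^q$ at a point slightly outside $[-1,1]$. The values of `q` and `x` are arbitrary illustrative choices, and the sketch does not address how such a polynomial would be scaled to the (unknown) singular values of $\vec{A}$.

```python
import numpy as np
from numpy.polynomial import Chebyshev

q = 20    # polynomial degree (illustrative)
x = 1.01  # evaluation point just outside [-1, 1] (illustrative)

# Degree-q Chebyshev polynomial of the first kind, T_q, evaluated at x.
cheb_value = Chebyshev.basis(q)(x)

# Monomial of the same degree at the same point.
monomial_value = x ** q

# For x >= 1, T_q(x) = cosh(q * arccosh(x)); used here as a sanity check.
closed_form = np.cosh(q * np.arccosh(x))

print("T_q(x)      =", cheb_value)
print("x^q         =", monomial_value)
print("closed form =", closed_form)
```

Both polynomials are bounded by one in magnitude on $[-1,1]$, so the larger value just outside the interval corresponds to a sharper separation between the two regions.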

[^cheb]: For instance, outside the interval $[-1,1]$, Chebyshev polynomials grow substantially faster than monomials of the same degree.


## Block Krylov Iteration

Observe that, with $t = q+1$,
\begin{equation}
\{ p(\vec{A}\vec{A}^\T)\vec{A}\vec{\Omega} : \deg(p) \leq q \}
= \mathcal{K}_t(\vec{A}\vec{A}^\T,\vec{A}\vec{\Omega}),
\end{equation}
where $\mathcal{K}_t(\vec{A}\vec{A}^\T,\vec{A}\vec{\Omega})$ is the block Krylov subspace
\begin{equation}
\mathcal{K}_t(\vec{A}\vec{A}^\T,\vec{A}\vec{\Omega}) := \operatorname{span}\left\{ \vec{A}\vec{\Omega}, (\vec{A}\vec{A}^\T)\vec{A}\vec{\Omega}, \ldots, (\vec{A}\vec{A}^\T)^{t-1}\vec{A}\vec{\Omega} \right\}.
\end{equation}

Computing a basis for $\mathcal{K}_t(\vec{A}\vec{A}^\T,\vec{A}\vec{\Omega})$ requires the same number of matrix-vector products with $\vec{A}$ and $\vec{A}^\T$ as subspace iteration.
This suggests the *Randomized Block Krylov Iteration* (RBKI) approximation:
\begin{equation}
\widehat{\vec{A}} = \vec{Q}\vec{Q}^\T \vec{A},
\quad 
\vec{Q} = \Call{orth}(\mathcal{K}_t(\vec{A}\vec{A}^\T,\vec{A}\vec{\Omega})).
\end{equation}
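Intuitively, by projecting onto a larger space, we can get a better approximation: the Krylov subspace contains the range of $(\vec{A}\vec{A}^\T)^{t-1}\vec{A}\vec{\Omega}$, so the RBKI error is never larger than that of $t-1$ steps of subspace iteration started from the same $\vec{\Omega}$.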
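
The following NumPy code is a minimal sketch of this construction, not a tuned implementation. The function names `rbki_basis` and `subspace_iteration_basis`, the test matrix with singular values $1/\sqrt{i}$, and the parameters `n`, `b`, `t` are all hypothetical choices made for illustration. Orthonormalizing each block before the next product is an added stabilization step; it leaves the span of each block, and hence the Krylov subspace, unchanged in exact arithmetic.

```python
import numpy as np

def rbki_basis(A, Omega, t):
    """Orthonormal basis for the block Krylov subspace K_t(A A^T, A Omega)."""
    block, _ = np.linalg.qr(A @ Omega)
    blocks = [block]
    for _ in range(t - 1):
        # Apply A A^T to the previous block, then re-orthonormalize it.
        block, _ = np.linalg.qr(A @ (A.T @ block))
        blocks.append(block)
    # Orthonormalize all t blocks together.
    Q, _ = np.linalg.qr(np.hstack(blocks))
    return Q

def subspace_iteration_basis(A, Omega, q):
    """Orthonormal basis for the range of (A A^T)^q A Omega."""
    Q, _ = np.linalg.qr(A @ Omega)
    for _ in range(q):
        Q, _ = np.linalg.qr(A @ (A.T @ Q))
    return Q

rng = np.random.default_rng(0)
n, b, t = 200, 5, 4  # illustrative sizes: dimension, block size, depth

# Illustrative test matrix with slowly decaying singular values 1/sqrt(i).
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 1.0 / np.sqrt(np.arange(1, n + 1))
A = (U * sigma) @ V.T

Omega = rng.standard_normal((n, b))

Q_krylov = rbki_basis(A, Omega, t)
Q_power = subspace_iteration_basis(A, Omega, t - 1)

# Spectral-norm error of the approximation Q Q^T A for each basis.
err_krylov = np.linalg.norm(A - Q_krylov @ (Q_krylov.T @ A), 2)
err_power = np.linalg.norm(A - Q_power @ (Q_power.T @ A), 2)

print("RBKI error:               ", err_krylov)
print("subspace iteration error: ", err_power)
```

Both bases are built from the same products with $\vec{A}$ and $\vec{A}^\T$, but `Q_krylov` has $tb$ columns while `Q_power` has only $b$, so the improved accuracy comes with extra storage and orthogonalization work.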

## Convergence Guarantees



RBKI with roughly $\sqrt{q}$ iterations can achieve guarantees similar to those of subspace iteration with $q$ iterations.
This was first proved in {cite:p}`musco_musco_15`; see also {cite:p}`tropp_webber_23` for a non-asymptotic analysis.


### Block-size versus depth

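The tradeoff between the block size $b$ (the number of columns of $\vec{\Omega}$) and the depth $t$ is studied in {cite:p}`meyer_musco_musco_24` and {cite:p}`chen_epperly_meyer_musco_rao_25`.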


:::{prf:theorem} 
Let $\widehat{\vec{A}}$ be the rank-$k$ approximation to $\vec{A}$ produced by RBKI with block size $b$ after $t$ iterations. 
Then for some
\begin{equation*}
t = \tilde{O}\left( \frac{k/b}{\sqrt{\varepsilon}}  \right),
\end{equation*}
it holds that
\begin{equation*}
\|\vec{A} - \widehat{\vec{A}}\| \leq (1+\varepsilon) \|\vec{A} - \llbracket \vec{A} \rrbracket_k\|.
\end{equation*}
::: 

Non-asymptotic bounds for RSVD, RSI, and RBKI can be found in {cite:p}`tropp_webber_23`.