---
title: Sparse Sketching Matrices
description: Introduction to sparse sketching matrices including CountSketch and SparseStack methods for efficient subspace embeddings with reduced computational cost.
keywords: [sparse sketching, CountSketch, SparseStack, subspace embedding, randomized linear algebra, sparse matrices, embedding dimension, Rademacher distribution]
numbering:
  equation:
    enumerator: 2.%s
    continue: true
  proof:theorem:
    enumerator: 2.%s
    continue: true
  proof:algorithm:
    enumerator: 2.%s
    continue: true
  proof:definition:
    enumerator: 2.%s
    continue: true
  proof:proposition:
    enumerator: 2.%s
    continue: true
---

Sparse sketching matrices reduce the cost of generating and applying a sketch by making the sketching matrix sparse.


## CountSketch

The classical example of a sparse sketching matrix is the CountSketch matrix {cite:p}`clarkson_woodruff_13`.

:::{prf:definition}
We say a matrix $\vec{S}\in\R^{k\times n}$ is a *CountSketch matrix* if it has the distribution
\begin{equation*}
\vec{S} = \begin{bmatrix}
| & | && |\\
\rho_1\vec{e}_{s_1} & \rho_2\vec{e}_{s_2} & \cdots & \rho_n\vec{e}_{s_n} \\
| & | && |
\end{bmatrix}
,\qquad
\begin{aligned}
&\rho_i\sim \Call{Rademacher}\text{ iid}\\
&s_i \sim \Call{Unif}(\{1,\ldots,k\})\text{ iid}.
\end{aligned}
\end{equation*}
:::
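For concreteness, here is a minimal sketch of how one might generate a CountSketch matrix with NumPy and SciPy, stored in compressed sparse column (CSC) format. The function name `countsketch` is our own. Rows are 0-indexed, so $s_i$ is drawn from $\{0,\ldots,k-1\}$ rather than $\{1,\ldots,k\}$.

```python
import numpy as np
import scipy.sparse

def countsketch(n, k, rng):
    """Return a k x n CountSketch matrix in CSC format.

    Column i has a single nonzero rho_i in row s_i, where rho_i is a random
    sign and s_i is uniform on {0, ..., k-1} (0-based row indices).
    rng is a numpy.random.Generator, e.g. np.random.default_rng(seed).
    """
    rows = rng.integers(0, k, size=n)             # s_i
    signs = 2.0 * rng.integers(0, 2, size=n) - 1  # rho_i in {-1, +1}
    indptr = np.arange(n + 1)                     # exactly one nonzero per column
    return scipy.sparse.csc_matrix((signs, rows, indptr), shape=(k, n))

S = countsketch(1000, 50, np.random.default_rng(0))
```

Storing $\vec{S}$ in CSC format is natural here because the distribution is specified column by column.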

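Since a CountSketch matrix has a single nonzero entry per column, it can be applied to $\vec{A}$ in $O(\nnz(\vec{A}))$ operations.
However, its embedding dimension scales poorly: the guarantee below requires $k$ to grow quadratically with the subspace dimension $d$.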
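The $O(\nnz(\vec{A}))$ cost comes from computing $\vec{S}\vec{A}$ by hashing: row $i$ of $\vec{A}$ is multiplied by $\rho_i$ and added into row $s_i$ of the output, so each entry of $\vec{A}$ is touched once. The following minimal NumPy sketch does this for a dense $\vec{A}$, where the cost is proportional to the number of entries. The name `apply_countsketch` is hypothetical. A sparse $\vec{A}$ would instead loop over its stored nonzeros.

```python
import numpy as np

def apply_countsketch(A, rows, signs, k):
    """Compute S @ A for the CountSketch S defined by (rows, signs), without forming S.

    rows[i] = s_i (0-based) and signs[i] = rho_i for i = 0, ..., n-1.
    """
    SA = np.zeros((k, A.shape[1]))
    # Add rho_i * A[i, :] into row s_i; np.add.at accumulates repeated row indices
    np.add.at(SA, rows, signs[:, None] * A)
    return SA

# Usage: draw the hash and signs, then sketch a 1000 x 5 matrix down to 50 rows
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 5))
rows = rng.integers(0, 50, size=1000)
signs = 2.0 * rng.integers(0, 2, size=1000) - 1
SA = apply_countsketch(A, rows, signs, 50)
```

We use `np.add.at` rather than `SA[rows] += ...`, because the latter would silently drop contributions when two rows of $\vec{A}$ hash to the same row of the output.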

:::{prf:theorem}

Fix any subspace $V\subset\R^n$ of dimension $d$.
A CountSketch matrix $\vec{S}$ is a subspace embedding for $V$ with distortion $\varepsilon$ with constant probability for some
\begin{equation*}
k = O\left( \frac{d^2}{\varepsilon^2} \right).
\end{equation*}
:::
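To explore this bound numerically, one can measure the distortion of a given sketch on a given subspace. The helper below is a sketch under two assumptions. First, distortion means the smallest $\varepsilon$ with $(1-\varepsilon)\|\vec{x}\| \le \|\vec{S}\vec{x}\| \le (1+\varepsilon)\|\vec{x}\|$ for all $\vec{x}\in V$; other conventions use squared norms. Second, $V$ is given as the range of a matrix $\vec{U}$ with orthonormal columns. The name `embedding_distortion` is ours.

```python
import numpy as np

def embedding_distortion(S, U):
    """Smallest eps with (1-eps)||x|| <= ||S x|| <= (1+eps)||x|| on range(U).

    Assumes U has orthonormal columns, so x = U y with ||x|| = ||y|| and the
    extreme values of ||S x|| / ||x|| are the extreme singular values of S U.
    """
    SU = np.asarray(S @ U)
    sigma = np.linalg.svd(SU, compute_uv=False)
    if SU.shape[0] < SU.shape[1]:
        # Fewer rows than dim(V): S maps some nonzero x in V to zero
        sigma = np.append(sigma, 0.0)
    return max(sigma.max() - 1.0, 1.0 - sigma.min())

# Usage: distortion of a (dense) CountSketch on a random 10-dimensional subspace
rng = np.random.default_rng(0)
n, d, k = 2000, 10, 400
U, _ = np.linalg.qr(rng.standard_normal((n, d)))
S = np.zeros((k, n))
S[rng.integers(0, k, size=n), np.arange(n)] = rng.choice([-1.0, 1.0], size=n)
eps = embedding_distortion(S, U)
```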


## SparseStack Sketch

The poor scaling of the embedding dimension of CountSketch can be remedied by increasing the number of nonzero entries per column.
There are several ways to do this, but one of the most promising is to simply stack several independent CountSketch matrices on top of one another.


:::{prf:definition}
:label: def:sparse-stack-sketch
We say a matrix $\vec{S}\in\R^{k\times n}$ is a *SparseStack matrix* with sparsity $\zeta$ if it has the distribution
\begin{equation*}
\vec{S} = \frac{1}{\sqrt{\zeta}}\begin{bmatrix}
\rho_{1,1}\vec{e}_{s_{1,1}} & \rho_{1,2}\vec{e}_{s_{1,2}} & \cdots & \rho_{1,n}\vec{e}_{s_{1,n}} \\
\rho_{2,1}\vec{e}_{s_{2,1}} & \rho_{2,2}\vec{e}_{s_{2,2}} & \cdots & \rho_{2,n}\vec{e}_{s_{2,n}} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{\zeta,1}\vec{e}_{s_{\zeta,1}} & \rho_{\zeta,2}\vec{e}_{s_{\zeta,2}} & \cdots & \rho_{\zeta,n}\vec{e}_{s_{\zeta,n}}
\end{bmatrix}
,\qquad
\begin{aligned}
&\rho_{i,j}\sim \Call{Rademacher}\text{ iid}\\
&s_{i,j} \sim \Call{Unif}(\{1,\ldots,k/\zeta\})\text{ iid}.
\end{aligned}
\end{equation*}
:::
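A short calculation shows why a $1/\sqrt{\zeta}$ normalization (also used in the implementation later in this section) is the natural one. Write $\vec{S}_b$ for the $b$-th CountSketch block, so that $\vec{S}$ is $1/\sqrt{\zeta}$ times the vertical stack of $\vec{S}_1,\ldots,\vec{S}_\zeta$. Then
\begin{equation*}
\mathbb{E}\big[(\vec{S}_b^\mathsf{T}\vec{S}_b)_{ij}\big]
= \mathbb{E}\big[\rho_{b,i}\rho_{b,j}\,\vec{e}_{s_{b,i}}^\mathsf{T}\vec{e}_{s_{b,j}}\big]
= \begin{cases} 1 & i = j\\ 0 & i\neq j\end{cases}
,\qquad
\mathbb{E}\big[\vec{S}^\mathsf{T}\vec{S}\big]
= \frac{1}{\zeta}\sum_{b=1}^{\zeta}\mathbb{E}\big[\vec{S}_b^\mathsf{T}\vec{S}_b\big]
= \vec{I}.
\end{equation*}
The first identity holds because $\rho_{b,i}^2 = 1$, while for $i\neq j$ the signs $\rho_{b,i}$ and $\rho_{b,j}$ are independent, have mean zero, and are independent of the row indices $s_{b,i}$ and $s_{b,j}$.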

This is equivalent to generating $\zeta$ independent CountSketch matrices of height $k/\zeta$ and stacking them on top of one another.
Since each column has exactly $\zeta$ nonzero entries, a SparseStack matrix can be applied to $\vec{A}$ in $O(\zeta\nnz(\vec{A}))$ operations.
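The stacking view translates directly into code. The following sketch builds a SparseStack matrix by vertically stacking $\zeta$ independent CountSketch blocks with `scipy.sparse.vstack` and then rescaling by $1/\sqrt{\zeta}$. It assumes $\zeta$ divides $k$, and both function names are our own. It is less direct than building the CSC arrays in one pass, but it makes the equivalence explicit.

```python
import numpy as np
import scipy.sparse

def countsketch(n, k, rng):
    """k x n CountSketch: one random sign per column, in a uniformly random row."""
    rows = rng.integers(0, k, size=n)
    signs = 2.0 * rng.integers(0, 2, size=n) - 1
    return scipy.sparse.csc_matrix((signs, rows, np.arange(n + 1)), shape=(k, n))

def sparse_stack_by_vstack(n, k, zeta, rng):
    """Stack zeta independent CountSketch blocks of height k // zeta.

    Assumes zeta divides k. Each column then has exactly zeta nonzeros
    (one per block), each equal to +-1/sqrt(zeta).
    """
    assert k % zeta == 0, "this simple version assumes zeta divides k"
    blocks = [countsketch(n, k // zeta, rng) for _ in range(zeta)]
    S = scipy.sparse.vstack(blocks, format="csc")
    S.data /= np.sqrt(zeta)  # rescale so that E[S^T S] = I
    return S

S = sparse_stack_by_vstack(1000, 60, 4, np.random.default_rng(0))
```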

SparseStack matrices are known to give a subspace embedding with the optimal embedding dimension (up to sub-polylogarithmic factors), even when the sparsity $\zeta$ is only about logarithmic in $d$.
The following is due to {cite:p}`chenakkod_derezinski_dong_25`.

:::{prf:theorem}

Fix any subspace $V\subset\R^n$ of dimension $d$.
A SparseStack matrix $\vec{S}$ is a subspace embedding for $V$ with distortion $\varepsilon \geq d^{-O(1)}$ with constant probability for some
\begin{equation*}
k = \tilde{O}\left(\frac{d}{\varepsilon^2}\right)
,\quad
\zeta = \tilde{O}\left(\frac{\log(d)}{\varepsilon}\right).
\end{equation*}
In this theorem statement, $\tilde{O}(\cdot)$ hides sub-polylogarithmic factors.[^otilde]
:::

[^otilde]: That is, $k = O(d \log(d)^{o(1)}/\varepsilon^2)$ and $\zeta = O(\log(d)^{1+o(1)}/\varepsilon)$.


### Implementation

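It is relatively easy to generate SparseStack matrices efficiently in a high-level language like Python.
The implementation below also handles the case where $\zeta$ does not divide $k$ by giving the first $k \bmod \zeta$ blocks one extra row.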


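```python
import numpy as np
import scipy as sp
import scipy.sparse  # make sure the sparse submodule is loaded

def sparse_stack_sketch(n, k, zeta, rng):
    """Generate a k x n SparseStack matrix with zeta nonzeros per column.

    rng is a numpy.random.Generator, e.g. np.random.default_rng(seed).
    The k rows are split into zeta blocks: the first k % zeta blocks have
    k // zeta + 1 rows and the rest have k // zeta rows (assumes k >= zeta).
    """
    k_rem = k % zeta
    k_loc = k // zeta

    # C[j, b] = row (within block b) of the nonzero in column j of block b
    C = np.zeros((n, zeta), dtype=int)
    C[:, :k_rem] = rng.integers(0, k_loc + 1, size=(n, k_rem))
    C[:, k_rem:] = rng.integers(0, k_loc, size=(n, zeta - k_rem))

    # Shift by the first row of each block to get global row indices
    offsets = np.cumsum([0] + [k_loc + 1] * k_rem + [k_loc] * (zeta - k_rem - 1))
    C += offsets

    # CSC arrays: column j stores rows C[j, :] with random signs scaled by 1/sqrt(zeta)
    indices = C.flatten()
    values = np.sqrt(1 / zeta) * (2 * rng.integers(0, 2, size=n * zeta) - 1)
    indptr = np.arange(0, n + 1) * zeta
    S = sp.sparse.csc_matrix((values, indices, indptr), shape=(k, n))

    return S
```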


### Comparison with sparse sign sketches

A perhaps more common sparse sketching distribution, typically called the *SparseSign* distribution, does not separate the rows of the sketch into blocks.
Instead, each column has exactly $\zeta$ random signs in uniformly random positions.
We make the following observations:
- The current best theoretical bounds on the embedding dimension and sparsity of SparseStack sketches are better than the corresponding bounds for SparseSign sketches.
- It is somewhat more tedious to generate SparseSign sketches than SparseStack sketches, but this can be done efficiently with a bit of care {cite:p}`chen_niroula_ray_subrahmanya_pistoia_kumar_25`.
- The embedding dimension of SparseStack matrices can easily be adjusted by stacking on more CountSketch matrices.
  This may be useful in practice, where an algorithm might need to adjust the embedding dimension on the fly.
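To illustrate why SparseSign sketches take more care, here is a deliberately simple and inefficient sketch of a SparseSign generator. In each column it places the $\zeta$ nonzeros at the positions of the $\zeta$ smallest of $k$ independent uniform keys, which gives a uniformly random set of $\zeta$ distinct rows but costs $O(nk)$ time and memory rather than $O(n\zeta)$. The $1/\sqrt{\zeta}$ scaling mirrors the SparseStack implementation above. The function name is hypothetical, and this is not the efficient method of {cite:p}`chen_niroula_ray_subrahmanya_pistoia_kumar_25`.

```python
import numpy as np
import scipy.sparse

def sparse_sign_simple(n, k, zeta, rng):
    """k x n SparseSign sketch: zeta distinct uniformly random rows per column.

    Simple but not efficient: it draws an n x k array of uniform keys,
    so it uses O(nk) work and memory. Assumes zeta <= k.
    """
    # Indices of the zeta smallest of k iid uniforms form a uniformly random
    # zeta-subset of {0, ..., k-1}; this is done independently for each column.
    keys = rng.random((n, k))
    rows = np.argpartition(keys, zeta - 1, axis=1)[:, :zeta]
    rows = np.sort(rows, axis=1)  # keep row indices sorted within each column
    signs = (2.0 * rng.integers(0, 2, size=(n, zeta)) - 1) / np.sqrt(zeta)
    indptr = np.arange(n + 1) * zeta
    return scipy.sparse.csc_matrix((signs.ravel(), rows.ravel(), indptr), shape=(k, n))

S = sparse_sign_simple(500, 40, 4, np.random.default_rng(0))
```

In contrast with the SparseStack construction, the rows within a column must be sampled without replacement from the full range, which is what makes an efficient implementation more delicate.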


Thus, we recommend SparseStack sketches as the default sparse sketch.

