# Sparse Coding

## Orthonormal Basis (ONB)
Pros: 
* Fast to compute inverse
* Energy preservation: $\|U^T x\|^2 = \|x\|^2$

Given $x$ and an orthonormal matrix $U$, compute $$z = U^T x$$

Approximate $x$ as $\hat{x} = U \hat{z}$, where
* $U U^T = U^T U = I$
* $\hat{z}_i = z_i$ if $|z_i| > \epsilon$, else $0$
    * (Thresholding)
    
Reconstruction error: $\|x - \hat{x}\|^2 = \sum_{d \notin \sigma} \langle x, u_d \rangle ^2$, where $\sigma = \{d : |z_d| > \epsilon\}$ is the set of retained coefficients
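
The thresholding step and the error identity above can be checked numerically. The following is a minimal sketch in plain Python (no external libraries). The normalized 4×4 Hadamard matrix used as $U$, the signal `x`, and the threshold `eps` are illustrative assumptions, not part of these notes.

```python
# Columns of an assumed orthonormal basis U (normalized 4x4 Hadamard matrix).
U_cols = [
    [0.5,  0.5,  0.5,  0.5],
    [0.5,  0.5, -0.5, -0.5],
    [0.5, -0.5,  0.5, -0.5],
    [0.5, -0.5, -0.5,  0.5],
]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def analyze(U_cols, x):
    """z = U^T x: one inner product per basis vector."""
    return [dot(u, x) for u in U_cols]

def threshold(z, eps):
    """Keep coefficients with |z_i| > eps, zero out the rest."""
    return [z_i if abs(z_i) > eps else 0.0 for z_i in z]

def synthesize(U_cols, z):
    """x_hat = U z: weighted sum of the basis vectors."""
    D = len(U_cols[0])
    return [sum(z_d * u[k] for z_d, u in zip(z, U_cols)) for k in range(D)]

x = [4.0, 4.2, 1.0, 1.1]        # hypothetical signal
z = analyze(U_cols, x)
z_hat = threshold(z, eps=0.5)   # hypothetical threshold
x_hat = synthesize(U_cols, z_hat)

# Squared reconstruction error vs. energy of the discarded coefficients.
error = sum((a - b) ** 2 for a, b in zip(x, x_hat))
dropped = sum(z_d ** 2 for z_d, kept in zip(z, z_hat) if kept == 0.0)
```

In this example only the first two coefficients survive the threshold, and `error` agrees with `dropped` up to floating-point rounding, as the formula predicts.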

Bases:
* Fourier basis: Optimal for periodic functions, global approximation
    * E.g. stripes/checker patterns in images (high pass filter)
* Wavelet basis: Optimal for local approximations of non-periodic functions
* PCA basis: Optimal given the data covariance $\Sigma$


![](bad-localization.png)

* Here a Fourier basis is a horrible idea: its atoms are global, so a localized signal is approximated poorly

## Haar Wavelets
Used to form an __orthonormal basis__ (after scaling each vector to unit norm)

Wavelets (for $D = 4$, unnormalized):
* Scaling: $\phi(x) = [1,1,1,1]$
* Mother: $W(x) = [1,1,-1,-1]$
* Dilated: $W(2x) = [1,-1,0,0]$
* Translated: $W(2x-1) = [0,0,1,-1]$

![](wavelets.png)
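
As a quick sanity check, the four vectors listed above can be normalized and tested for orthonormality. This is a small sketch in plain Python. The helper names (`dot`, `normalize`) are introduced here for illustration only.

```python
import math

# The four D = 4 Haar vectors (unnormalized).
haar = [
    [1, 1, 1, 1],     # scaling function
    [1, 1, -1, -1],   # mother wavelet W(x)
    [1, -1, 0, 0],    # dilated W(2x)
    [0, 0, 1, -1],    # dilated and translated W(2x - 1)
]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def normalize(v):
    """Scale v to unit Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    return [c / norm for c in v]

basis = [normalize(v) for v in haar]

# Gram matrix U^T U; it equals the identity iff the basis is orthonormal.
gram = [[dot(a, b) for b in basis] for a in basis]
```

The raw vectors are already mutually orthogonal. The normalization (by $2$, $2$, $\sqrt{2}$, $\sqrt{2}$) is what makes `gram` the identity.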

### Overcomplete Basis
$U \in R^{D\times L}$ with $L$ atoms and $D = \dim(\text{data})$.
Overcomplete means $L > D$, i.e. more atoms than data dimensions.

Decoding: find the sparsest representation
* Solve $z^* \in \arg\min_z \|z\|_0$
* s.t. $x = Uz$
* NP-hard; instead approximate with the 1-norm, which is convex and can be solved as a linear program
* Can also approximate greedily with Matching Pursuit
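
To see why the 1-norm version is a linear program, one standard trick is to split $z$ into its positive and negative parts. The auxiliary variables $z^+$ and $z^-$ are introduced here for illustration and do not appear elsewhere in these notes:

$$\min_{z^+,\, z^- \geq 0} \; \mathbf{1}^T (z^+ + z^-) \quad \text{s.t.} \quad U (z^+ - z^-) = x, \qquad z = z^+ - z^-$$

At an optimum at most one of $z^+_i$, $z^-_i$ is nonzero for each $i$, so the objective equals $\|z\|_1$.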

### Matching Pursuit
Approximate signal $x$ in dictionary $U$ by a coefficient vector $z$ with at most $K$ nonzero entries.

Objective: $$z^* \in argmin_z \|x - Uz\|_2$$
such that $$\|z\|_0 \leq K$$

Algorithm (atoms $u_i$ assumed to have unit norm)
* Init $z \leftarrow 0$, residual $r \leftarrow x$
* While $\|z\|_0 < K$ do
    * Select the atom with the smallest angle to the residual: $i^* = \arg\max_i | \langle u_i, r \rangle |$
    * Update approx: $z_{i^*} \leftarrow z_{i^*} + \langle u_{i^*}, r \rangle$
    * Update residual: $r \leftarrow r - \langle u_{i^*}, r \rangle u_{i^*}$
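
The loop can be sketched in plain Python as follows. This is a minimal illustration, assuming unit-norm atoms and tracking the residual in a separate variable `r`. The `max_iter` guard and the tiny example dictionary are assumptions added here, not part of the notes.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def matching_pursuit(atoms, x, K, max_iter=1000):
    """Greedy K-sparse approximation of x.

    atoms: list of unit-norm atoms (the columns of U).
    Returns the coefficient vector z and the final residual r.
    """
    z = [0.0] * len(atoms)
    r = list(x)  # the residual starts as the signal itself
    for _ in range(max_iter):  # guard: MP may keep re-selecting the same atoms
        if sum(1 for c in z if c != 0.0) >= K:
            break
        corr = [dot(u, r) for u in atoms]
        i = max(range(len(atoms)), key=lambda j: abs(corr[j]))
        if abs(corr[i]) < 1e-12:  # residual is numerically orthogonal to all atoms
            break
        z[i] += corr[i]                                    # update coefficient
        r = [r_k - corr[i] * u_k for r_k, u_k in zip(r, atoms[i])]  # update residual
    return z, r

s = 1 / math.sqrt(2)
atoms = [[1.0, 0.0], [0.0, 1.0], [s, s]]  # hypothetical dictionary, L = 3 > D = 2
x = [2 * s, 2 * s]                        # equals 2 * atoms[2]
z, r = matching_pursuit(atoms, x, K=1)
```

Here the diagonal atom has the largest correlation with `x`, so a single step puts all the weight on `z[2]` and leaves an (almost) zero residual. With only the two standard basis vectors, two nonzero coefficients would be needed.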

### Coherence
Measure of linear dependence between the (unit-norm) atoms of a dictionary $$m(U) = \max_{i \neq j} | u_i ^T u_j|$$

Let $B$ be an ONB.
* It holds that $m(B) = 0$.
* If a unit-norm atom $u$ is added to $B$ then $m([B,u]) \geq \frac{1}{\sqrt{D}}$
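
Both statements can be illustrated on a small example. The sketch below, in plain Python, uses the standard basis of $R^4$ as $B$ and a hypothetical extra atom `u` spread evenly over all coordinates. Both choices are assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def coherence(atoms):
    """m(U): largest absolute inner product between two distinct unit-norm atoms."""
    n = len(atoms)
    return max(abs(dot(atoms[i], atoms[j]))
               for i in range(n) for j in range(i + 1, n))

D = 4
# Standard basis of R^D as an example ONB.
B = [[1.0 if k == d else 0.0 for k in range(D)] for d in range(D)]
# Hypothetical extra unit-norm atom, equally correlated with every basis vector.
u = [1 / math.sqrt(D)] * D

m_onb = coherence(B)             # 0 for an ONB
m_extended = coherence(B + [u])  # 1 / sqrt(D) for this particular u
```

Since the squared inner products of a unit-norm `u` with the $D$ basis vectors sum to 1, the largest one is at least $1/\sqrt{D}$. The evenly spread atom used here attains exactly that value.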

### Exact Recovery
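Exact recovery of a $K$-sparse $z$ is guaranteed when $K < \frac{1}{2} (1 + \frac{1}{m(U)})$, so lower coherence allows less sparse signals to be recovered.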

### Compressive Sensing
Predicting is compressing. 

Compressing the data while gathering it saves space.

Let $x \in R^D$ be $K$-sparse in ONB $U$.

Let $y \in R^M$ with $y_i = \langle w_i, x \rangle$
* $y$ consists of $M$ linear combinations of the signal, with $M < D$
* $y = Wx = WUz = \Theta z$ where $\Theta = WU \in R^{M \times D}$

Reconstruct $x$ from $y$ by finding 
* $z^* \in argmin_z \|z\|_0$
* s.t. $y = \Theta z$
* Either with MP or by relaxing to 1-norm

Given $z^*$, $x$ can now be reconstructed as $\tilde{x} = U z^*$
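
The whole pipeline (measure, recover $z$, reconstruct $x$) can be sketched end to end in plain Python. Everything concrete here is an assumption for illustration: the normalized Hadamard matrix as ONB $U$, Gaussian random rows for $W$, the sizes $D = 8$, $M = 4$, and a deliberately simple $K = 1$ signal. Because the columns of $\Theta$ are generally not unit norm, this Matching Pursuit variant normalizes the correlations.

```python
import math
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def matvec(A, v):
    return [dot(row, v) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[dot(row, col) for col in Bt] for row in A]

def hadamard_onb(D):
    """Normalized Hadamard matrix (D a power of two), used as an example ONB."""
    H = [[1.0]]
    while len(H) < D:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return [[v / math.sqrt(D) for v in row] for row in H]

def matching_pursuit(Theta, y, K, max_iter=100):
    """MP on the columns of Theta, which need not be unit norm."""
    cols = transpose(Theta)
    norms = [math.sqrt(dot(c, c)) for c in cols]
    z = [0.0] * len(cols)
    r = list(y)
    for _ in range(max_iter):
        if sum(1 for c in z if c != 0.0) >= K:
            break
        scores = [dot(c, r) / n for c, n in zip(cols, norms)]  # normalized correlations
        i = max(range(len(cols)), key=lambda j: abs(scores[j]))
        if abs(scores[i]) < 1e-12:
            break
        step = scores[i] / norms[i]  # coefficient w.r.t. the unnormalized column
        z[i] += step
        r = [r_k - step * c_k for r_k, c_k in zip(r, cols[i])]
    return z

random.seed(0)
D, M, K = 8, 4, 1
U = hadamard_onb(D)
z_true = [0.0] * D
z_true[5] = 3.0                      # x is 1-sparse in U
x = matvec(U, z_true)

W = [[random.gauss(0, 1) for _ in range(D)] for _ in range(M)]  # measurement matrix
y = matvec(W, x)                     # M < D measurements
Theta = matmul(W, U)

z_rec = matching_pursuit(Theta, y, K)
x_rec = matvec(U, z_rec)             # reconstruction from only M numbers
```

With $K = 1$ the measurement $y$ is a multiple of one column of $\Theta$, so a single MP step identifies it as long as no two columns are parallel. For larger $K$, plain MP is not guaranteed to recover $z$, and the 1-norm relaxation is the usual alternative.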