<h1 align = "center">Randomized Singular Value Decomposition</h1>

<h6 align = "center">Author: Xinyu Chen</h6>

In both machine learning and signal processing, matrix decomposition is a foundational tool for critical applications such as data compression, dimensionality reduction, and sparsity learning. When the goal is to approximate a data matrix by a low-rank structure, the Singular Value Decomposition (SVD) is often the best choice. However, computing an accurate SVD of a large-scale data matrix efficiently is computationally challenging. Many methods address this problem by applying randomized linear algebra, and one of the most important methods for fast SVD is the randomized SVD. This post introduces the preliminaries and the essential idea of the randomized SVD. To help readers gain a better understanding, we also provide the corresponding Python implementation.
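
As a small illustration of the low-rank approximation mentioned above, the following sketch truncates a full SVD to its leading singular triplets. The function name `truncated_svd_approx` and the synthetic rank-5 test matrix are assumptions made for this example only; they are not part of the original notebook.

```python
import numpy as np

def truncated_svd_approx(mat, rank):
    """Rank-`rank` approximation of `mat` built from its leading singular triplets."""
    U, s, Vt = np.linalg.svd(mat, full_matrices=False)
    return U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]

rng = np.random.default_rng(0)
# Hypothetical test matrix: a 200 x 100 matrix of exact rank 5.
low_rank = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
approx = truncated_svd_approx(low_rank, 5)
# Relative Frobenius-norm error of the rank-5 approximation.
rel_err = np.linalg.norm(low_rank - approx) / np.linalg.norm(low_rank)
print(rel_err)
```

Because the test matrix has exact rank 5, keeping five singular triplets should reproduce it up to floating-point error. The cost of the full SVD in the first line of the function is what the randomized approach tries to avoid.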

> To reproduce this notebook, please first clone or download the **tensor-learning** repository ([https://github.com/xinychen/tensor-learning](https://github.com/xinychen/tensor-learning)) to your computer.

### Power Iterations



In [33]:
import numpy as np
np.seterr(divide='ignore', invalid='ignore')

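def power_iteration(mat, Phi, power_iter = 3):
    # Sketch the column space of mat with the random test matrix Phi.
    B = mat @ Phi
    # Each pass multiplies by mat @ mat.T, which emphasizes the dominant singular directions.
    for _ in range(power_iter):
        B = mat @ (mat.T @ B)
    # Orthonormalize the sketch to obtain a basis Q for the approximate range of mat.
    Q, _ = np.linalg.qr(B)
    return Q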
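
Repeated multiplication by `mat @ mat.T` without intermediate orthonormalization can lose the weaker singular directions to round-off when the spectrum decays quickly. A common remedy, sketched below, is to re-orthonormalize with a QR factorization after every multiplication. The name `power_iteration_stable` and the small test matrices are assumptions for illustration; this variant does not come from the original notebook.

```python
import numpy as np

def power_iteration_stable(mat, Phi, power_iter=3):
    # Orthonormalize after every multiplication so the columns of the sketch
    # stay well conditioned throughout the iteration.
    Q, _ = np.linalg.qr(mat @ Phi)
    for _ in range(power_iter):
        Q, _ = np.linalg.qr(mat.T @ Q)
        Q, _ = np.linalg.qr(mat @ Q)
    return Q

rng = np.random.default_rng(0)
mat = rng.standard_normal((300, 200))   # hypothetical small test matrix
Phi = rng.standard_normal((200, 10))    # random test matrix with 10 columns
Q = power_iteration_stable(mat, Phi)
print(Q.shape)
print(np.allclose(Q.T @ Q, np.eye(10)))  # columns of Q should be orthonormal
```

The extra QR factorizations cost more per iteration, so whether the trade-off is worthwhile depends on how quickly the singular values of the matrix decay.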

In [None]:
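def rsvd(mat, rank, power):
    dim2 = mat.shape[1]
    # Random Gaussian test matrix with `rank` columns.
    Phi = np.random.randn(dim2, rank)
    A = mat @ Phi
    # Power iterations (the loop is skipped when power == 0).
    for _ in range(power):
        A = mat @ (mat.T @ A)
    # Orthonormal basis for the approximate range of mat.
    Q, _ = np.linalg.qr(A)
    # SVD of the small projected matrix; fast_svd is not defined in this section
    # and is assumed to be defined elsewhere in the notebook.
    U_tilde, S, V = fast_svd(Q.T @ mat)
    return Q @ U_tilde, S, V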
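
Since `fast_svd` is not defined in this section, the following self-contained sketch follows the same steps with `np.linalg.svd` standing in for it. The name `rsvd_sketch`, the `seed` parameter, and the rank-20 test matrix are assumptions for illustration. Note that `np.linalg.svd` returns the singular values as a 1-D vector, which may differ from what `fast_svd` returns.

```python
import numpy as np

def rsvd_sketch(mat, rank, power=2, seed=None):
    rng = np.random.default_rng(seed)
    # Random Gaussian test matrix with `rank` columns.
    Phi = rng.standard_normal((mat.shape[1], rank))
    A = mat @ Phi
    # Power iterations to emphasize the dominant singular directions.
    for _ in range(power):
        A = mat @ (mat.T @ A)
    # Orthonormal basis for the approximate range of mat.
    Q, _ = np.linalg.qr(A)
    # np.linalg.svd is used here in place of fast_svd (an assumption).
    U_tilde, s, Vt = np.linalg.svd(Q.T @ mat, full_matrices=False)
    return Q @ U_tilde, s, Vt

rng = np.random.default_rng(1)
# Hypothetical test matrix: a 500 x 300 matrix of exact rank 20.
mat = rng.standard_normal((500, 20)) @ rng.standard_normal((20, 300))
U, s, Vt = rsvd_sketch(mat, rank=20, power=1, seed=0)
# Relative Frobenius-norm reconstruction error.
rel_err = np.linalg.norm(mat - U @ np.diag(s) @ Vt) / np.linalg.norm(mat)
print(U.shape, s.shape, Vt.shape)
print(rel_err)
```

For a matrix whose rank does not exceed the number of sampled columns, the random sketch captures the whole column space, so the reconstruction error should be close to floating-point precision.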

### Testing `fast_svd` against `numpy.linalg.svd`

In [2]:
import time

mat = np.random.rand(10000, 18000)
start = time.time()
U, S, V = fast_svd(mat)
end = time.time()
print(np.diag(S))
print(end - start)

[6708.2749416    67.51813228   67.46474809 ...    9.96231502    9.92972879
    9.86411576]
617.9077150821686


In [3]:
start = time.time()
U, S, V = np.linalg.svd(mat, full_matrices = False)
end = time.time()
print(S)
print(end - start)

[6708.2749416    67.51813228   67.46474809 ...    9.96231502    9.92972879
    9.86411576]
736.6943187713623


### Testing `rsvd` against `fast_svd`

In [34]:
import time

mat = np.random.rand(10000, 9000)
start = time.time()
U, S, V = fast_svd(mat)
end = time.time()
print(np.diag(S[:10]))
print(end - start)

[4743.47345882   56.30120971   56.14880174   56.13332907   56.03742922
   56.00312683   55.98918016   55.8925917    55.86384718   55.80551466]
387.3423180580139


In [43]:
import time

start = time.time()
U, S, V = rsvd(mat, 100, 2)
end = time.time()
print(np.diag(S[:10]))
print(end - start)

[4743.47345882   52.08820533   52.03989705   51.98184477   51.84658782
   51.78052861   51.68910186   51.59906955   51.51632298   51.43246594]
1.8465938568115234
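
In the outputs above, the leading singular value from `rsvd` matches the reference, while the following ones are smaller (about 52 versus 56). A likely explanation is that a uniformly random matrix has a nearly flat spectrum after its first singular value, which is a hard case for randomized methods. The sketch below measures this gap on a smaller matrix. The helper `rsvd_values` and the matrix size are assumptions for illustration, and `np.linalg.svd` again stands in for `fast_svd`.

```python
import numpy as np

def rsvd_values(mat, rank, power, rng):
    """Singular values from a randomized SVD (np.linalg.svd replaces fast_svd)."""
    Phi = rng.standard_normal((mat.shape[1], rank))
    A = mat @ Phi
    for _ in range(power):
        A = mat @ (mat.T @ A)
    Q, _ = np.linalg.qr(A)
    return np.linalg.svd(Q.T @ mat, compute_uv=False)

rng = np.random.default_rng(0)
mat = rng.random((1000, 900))  # smaller analogue of the 10000 x 9000 test matrix
exact = np.linalg.svd(mat, compute_uv=False)
approx = rsvd_values(mat, rank=100, power=2, rng=rng)
# Relative error of the ten leading singular values.
rel_err = np.abs(approx[:10] - exact[:10]) / exact[:10]
print(rel_err)
```

The approximate singular values are those of a projection of `mat`, so they cannot exceed the exact ones. Increasing `rank` or `power` should narrow the gap, at the price of more computation.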
