<h1 align = "center">Randomized Singular Value Decomposition</h1>

<h6 align = "center">Author: Xinyu Chen</h6>

The accurate and efficient decomposition of large data matrices is one of the cornerstones of modern computational mathematics and data science.

To reproduce this notebook, first clone or download the **tensor-learning** repository ([https://github.com/xinychen/tensor-learning](https://github.com/xinychen/tensor-learning)) to your computer.

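Suppose we want the SVD of $\boldsymbol{X}\in\mathbb{R}^{m\times n}$,
\begin{equation}
\boldsymbol{X}=\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^{\top},
\end{equation}
where $m\ll n$. Instead of factoring $\boldsymbol{X}$ directly, we can first compute the SVD of the much smaller $m\times m$ matrix $\boldsymbol{X}\boldsymbol{X}^\top$:
\begin{equation}
\boldsymbol{X}\boldsymbol{X}^\top=\boldsymbol{U}\tilde{\boldsymbol{\Sigma}}\tilde{\boldsymbol{V}}^{\top}.
\end{equation}
Since $\boldsymbol{X}\boldsymbol{X}^\top=\boldsymbol{U}\boldsymbol{\Sigma}^{2}\boldsymbol{U}^{\top}$, the left singular vectors are shared, and the remaining factors follow as $\boldsymbol{\Sigma}=\tilde{\boldsymbol{\Sigma}}^{1/2}$ and $\boldsymbol{V}=\boldsymbol{X}^{\top}\boldsymbol{U}\boldsymbol{\Sigma}^{-1}$ (assuming $\boldsymbol{X}$ has full row rank, so that $\boldsymbol{\Sigma}$ is invertible).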
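
The identity above can be checked numerically on a small matrix. The following is a minimal sketch, not part of the original notebook: the variable names (`rng`, `recon_err`, `s_direct`), the matrix size, and the fixed seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the check is repeatable
X = rng.standard_normal((4, 2000))      # small wide matrix, m << n

# SVD of the m x m Gram matrix: X X^T = U diag(s_tilde) U^T
U, s_tilde, _ = np.linalg.svd(X @ X.T)
s = np.sqrt(s_tilde)                    # singular values of X
V = (X.T @ U) / s                       # V = X^T U Sigma^{-1}, scaled column by column

# The recovered factors should reproduce X and agree with a direct SVD.
recon_err = np.linalg.norm(X - (U * s) @ V.T) / np.linalg.norm(X)
s_direct = np.linalg.svd(X, compute_uv=False)
print(recon_err)
print(np.max(np.abs(s - s_direct)))
```

Both printed values should be close to machine precision for a well-conditioned matrix like this one. Forming $\boldsymbol{X}\boldsymbol{X}^\top$ squares the condition number, so for ill-conditioned matrices the smallest singular values may be recovered less accurately than with a direct SVD.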


In [1]:
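import numpy as np
# Silence divide/invalid warnings, e.g. when a singular value is numerically zero.
np.seterr(divide='ignore', invalid='ignore')

def fast_svd(mat):
    """SVD of a matrix with very different dimensions via the small Gram matrix."""
    dim1, dim2 = mat.shape
    if dim1 <= dim2:
        # mat @ mat.T is dim1 x dim1 and symmetric positive semidefinite,
        # so its singular values are the squared singular values of mat.
        U, s_tilde, _ = np.linalg.svd(mat @ mat.T, full_matrices=False)
        s = np.sqrt(s_tilde)
        S = np.diag(s)         # singular values as a diagonal matrix
        V = (mat.T @ U) / s    # same as mat.T @ U @ inv(S), done by column scaling
        return U, S, V         # mat = U @ S @ V.T
    else:
        # Tall matrix: factor the transpose instead. Note that this branch
        # returns the transposed factors, so mat = U.T @ S @ V here.
        U0, S, V0 = fast_svd(mat.T)
        U = V0.T
        V = U0.T
        return U, S, V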

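def rsvd(mat, rank, q):
    """Rank-`rank` randomized SVD of mat with q power iterations."""
    dim1, dim2 = mat.shape
    if dim1 >= dim2:
        Phi = np.random.randn(dim2, rank)    # random test matrix
        A = mat @ Phi                        # sketch of the column space of mat
        for k in range(q):
            A = mat @ (mat.T @ A)            # power iterations sharpen the sketch
        Q, R = np.linalg.qr(A)               # orthonormal basis of the sketch
        # Project mat itself (not the sketch A) onto the basis, then factor
        # the small rank x dim2 matrix.
        U_tilde, S, V = fast_svd(Q.T @ mat)
        return Q @ U_tilde, S, V
    else:
        # Wide matrix: factor the transpose and swap the roles of U and V.
        V, S, U = rsvd(mat.T, rank, q)
        return U, S, V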
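
The notebook does not call `rsvd` itself, so a quick check of the randomized approach may help. The sketch below is self-contained and is a variant rather than the function above: the helper name `rsvd_sketch`, the `seed` parameter, and the test matrix are hypothetical, it factors the projected matrix with `np.linalg.svd` instead of `fast_svd`, and it re-orthonormalizes after each power iteration.

```python
import numpy as np

def rsvd_sketch(X, rank, q=1, seed=0):
    """Hypothetical helper: rank-`rank` randomized SVD with q power iterations."""
    rng = np.random.default_rng(seed)
    Phi = rng.standard_normal((X.shape[1], rank))   # random test matrix
    Q, _ = np.linalg.qr(X @ Phi)                    # basis for the sketched range
    for _ in range(q):
        # Re-orthonormalize after each pass to limit round-off growth.
        Q, _ = np.linalg.qr(X @ (X.T @ Q))
    B = Q.T @ X                                     # small rank x n projection of X
    U_tilde, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ U_tilde, s, Vt

# Test matrix of exact rank 5, so a rank-5 sketch should recover it.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 80))
U, s, Vt = rsvd_sketch(X, rank=5, q=1)
rel_err = np.linalg.norm(X - (U * s) @ Vt) / np.linalg.norm(X)
print(rel_err)
```

Because the test matrix has exact rank 5, the relative error should be near machine precision. For matrices whose singular values decay slowly, the error depends on `rank` and `q`, and in practice the sketch is often drawn with a few more columns than the target rank.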

In [2]:
import time

mat = np.random.rand(3, 10000000)
start = time.time()
U, S, V = fast_svd(mat)
end = time.time()
print(U)
print(np.diag(S))
print(end - start)

[[-0.57733574  0.73637114 -0.35276194]
 [-0.57749445 -0.67368348 -0.46114069]
 [-0.57722059 -0.06251493  0.8141918 ]]
[2886.34380676  913.29543057  912.88546298]
0.3979451656341553


In [3]:
start = time.time()
U, S, V = fast_svd(mat.T)
end = time.time()    # stop the clock before printing, as in the other cells
print(V.T)
print(np.diag(S))
print(end - start)

[[-0.57733574  0.73637114 -0.35276194]
 [-0.57749445 -0.67368348 -0.46114069]
 [-0.57722059 -0.06251493  0.8141918 ]]
[2886.34380676  913.29543057  912.88546298]
0.4226951599121094


In [4]:
start = time.time()
# NumPy returns S as a 1-D array and V already transposed.
U, S, V = np.linalg.svd(mat, full_matrices=False)
end = time.time()
print(U)
print(S)
print(end - start)

[[-0.57733574  0.73637114 -0.35276194]
 [-0.57749445 -0.67368348 -0.46114069]
 [-0.57722059 -0.06251493  0.8141918 ]]
[2886.34380676  913.29543057  912.88546298]
0.9292600154876709
