## Low-Tubal-Rank Smoothing Tensor Completion Imputer (LSTC-Tubal)

This notebook shows how to implement an LSTC-Tubal imputer on several real-world, large-scale data sets. To handle missing values in multivariate time series data, the method takes into account both low-rank structure and temporal smoothness. To make the model scalable, we also integrate a linear transform into the model. For an in-depth discussion of the LSTC-Tubal imputer, please see [1].

<div class="alert alert-block alert-info">
<font color="black">
<b>[1]</b> Xinyu Chen, Yixian Chen, Lijun Sun (2020). <b>Scalable low-rank tensor learning for spatiotemporal traffic data imputation</b>. arXiv: 2008.03194. <a href="https://arxiv.org/abs/2008.03194" title="PDF"><b>[PDF]</b></a> <a href="https://doi.org/10.5281/zenodo.3939792" title="data"><b>[data]</b></a> 
</font>
</div>


### Define the LSTC-Tubal imputer kernel

We start by introducing some necessary functions that rely on `NumPy`.

<div class="alert alert-block alert-warning">
<ul>
<li><b><code>ten2mat</code>:</b> <font color="black">Unfold tensor as matrix by specifying mode.</font></li>
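<li><b><code>mat2ten</code>:</b> <font color="black">Fold matrix as tensor by specifying dimension (i.e., tensor size) and mode.</font></li>
<li><b><code>unitary_transform</code>, <code>inv_unitary_transform</code>:</b> <font color="black">Apply a linear transform <code>Phi</code> along the third mode of a tensor, and undo it.</font></li>
<li><b><code>tsvt_unitary</code>, <code>tsvt_dct</code>:</b> <font color="black">Implement tensor Singular Value Thresholding (SVT) on the frontal slices in the transformed domain, using either <code>Phi</code> or the discrete cosine transform.</font></li>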
</ul>
</div>

In [1]:
import numpy as np

def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')

def mat2ten(mat, dim, mode):
    index = list()
    index.append(mode)
    for i in range(dim.shape[0]):
        if i != mode:
            index.append(i)
    return np.moveaxis(np.reshape(mat, list(dim[index]), order = 'F'), 0, mode)
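
As a quick sanity check, here is a minimal sketch on a toy tensor (the toy data and variable names are ours and purely illustrative): unfolding along a mode and folding back along the same mode should recover the original tensor. The two helpers are repeated so that the snippet runs on its own.

```python
import numpy as np

def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order='F')

def mat2ten(mat, dim, mode):
    index = [mode] + [i for i in range(dim.shape[0]) if i != mode]
    return np.moveaxis(np.reshape(mat, list(dim[index]), order='F'), 0, mode)

toy = np.arange(24, dtype=float).reshape(2, 3, 4)   # illustrative 2 x 3 x 4 tensor
dim = np.array(toy.shape)

# Unfold along each mode, then fold back and compare with the original.
unfolded_shapes = [ten2mat(toy, mode).shape for mode in range(3)]
roundtrip_ok = all(
    np.array_equal(mat2ten(ten2mat(toy, mode), dim, mode), toy) for mode in range(3)
)
print(unfolded_shapes, roundtrip_ok)
```

Note that `mat2ten` expects `dim` as a NumPy array, since it indexes `dim` with a list.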

In [2]:
def unitary_transform(tensor, Phi):
    return np.einsum('kt, ijk -> ijt', Phi, tensor)

def inv_unitary_transform(tensor, Phi):
    return np.einsum('kt, ijt -> ijk', Phi, tensor)
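
The two transforms are inverses of each other only when `Phi` is orthogonal (unitary). The sketch below checks this on toy data; the random orthogonal `Phi` obtained from a QR decomposition is an illustrative assumption, not the matrix the imputer actually learns.

```python
import numpy as np

def unitary_transform(tensor, Phi):
    return np.einsum('kt, ijk -> ijt', Phi, tensor)

def inv_unitary_transform(tensor, Phi):
    return np.einsum('kt, ijt -> ijk', Phi, tensor)

rng = np.random.default_rng(0)
toy = rng.standard_normal((4, 3, 5))
# Random orthogonal matrix, used here only for illustration.
Phi, _ = np.linalg.qr(rng.standard_normal((5, 5)))

transformed = unitary_transform(toy, Phi)
recovered = inv_unitary_transform(transformed, Phi)

# An orthogonal transform preserves the Frobenius norm and is exactly invertible.
print(np.allclose(recovered, toy))
print(np.isclose(np.linalg.norm(transformed), np.linalg.norm(toy)))
```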

In [3]:
def tsvt_unitary(tensor, Phi, tau):
    dim = tensor.shape
    X = np.zeros(dim)
    tensor = unitary_transform(tensor, Phi)
    for t in range(dim[2]):
        u, s, v = np.linalg.svd(tensor[:, :, t], full_matrices = False)
        r = len(np.where(s > tau)[0])
        if r >= 1:
            s = s[: r]
            s[: r] = s[: r] - tau
            X[:, :, t] = u[:, : r] @ np.diag(s) @ v[: r, :]
    return inv_unitary_transform(X, Phi)

from scipy.fftpack import dctn, idctn

def tsvt_dct(tensor, tau):
    dim = tensor.shape
    X = np.zeros(dim)
    tensor = dctn(tensor, axes = (2,), norm = 'ortho')
    for t in range(dim[2]):
        u, s, v = np.linalg.svd(tensor[:, :, t], full_matrices = False)
        r = len(np.where(s > tau)[0])
        if r >= 1:
            s = s[: r]
            s[: r] = s[: r] - tau
            X[:, :, t] = u[:, : r] @ np.diag(s) @ v[: r, :]
    return idctn(X, axes = (2,), norm = 'ortho')
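
To see what the thresholding does, the following sketch re-implements the DCT variant compactly and compares the singular values of one transformed frontal slice before and after shrinkage. The name `tsvt_dct_sketch` is hypothetical, and we assume `scipy.fft` here rather than the `scipy.fftpack` import used above.

```python
import numpy as np
from scipy.fft import dct, idct

def tsvt_dct_sketch(tensor, tau):
    """Soft-threshold the singular values of each frontal slice in the DCT domain."""
    transformed = dct(tensor, axis=2, norm='ortho')
    out = np.zeros_like(transformed)
    for t in range(tensor.shape[2]):
        u, s, vt = np.linalg.svd(transformed[:, :, t], full_matrices=False)
        # Shrink every singular value by tau and clip at zero.
        out[:, :, t] = (u * np.maximum(s - tau, 0.0)) @ vt
    return idct(out, axis=2, norm='ortho')

rng = np.random.default_rng(0)
tensor = rng.standard_normal((6, 5, 4))
tau = 0.5
shrunk = tsvt_dct_sketch(tensor, tau)

# Singular values of the first transformed slice, before and after thresholding.
before = np.linalg.svd(dct(tensor, axis=2, norm='ortho')[:, :, 0], compute_uv=False)
after = np.linalg.svd(dct(shrunk, axis=2, norm='ortho')[:, :, 0], compute_uv=False)
print(np.allclose(after, np.maximum(before - tau, 0.0)))
```

In the imputer below, the threshold is `1 / rho`, so the shrinkage becomes weaker as `rho` grows over the iterations.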

<div class="alert alert-block alert-warning">
<ul>
<li><b><code>compute_mape</code>:</b> <font color="black">Compute the value of Mean Absolute Percentage Error (MAPE).</font></li>
<li><b><code>compute_rmse</code>:</b> <font color="black">Compute the value of Root Mean Square Error (RMSE).</font></li>
</ul>
</div>

> Note that $$\mathrm{MAPE}=\frac{1}{n} \sum_{i=1}^{n} \frac{\left|y_{i}-\hat{y}_{i}\right|}{y_{i}}, \quad\mathrm{RMSE}=\sqrt{\frac{1}{n} \sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}},$$ where $n$ is the total number of estimated values, and $y_i$ and $\hat{y}_i$ are the actual value and its estimate, respectively. MAPE is reported here as a fraction; multiply it by 100 to express it as a percentage.

In [4]:
def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    return  np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])
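
A small worked example may help when reading the logs further down. The three toy values are ours; the functions are repeated so that the snippet runs on its own.

```python
import numpy as np

def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])

var = np.array([10.0, 20.0, 40.0])       # toy ground truth
var_hat = np.array([11.0, 18.0, 40.0])   # toy estimates

mape = compute_mape(var, var_hat)   # (0.1 + 0.1 + 0.0) / 3
rmse = compute_rmse(var, var_hat)   # sqrt((1 + 4 + 0) / 3)
print(mape, rmse)
```

Because `compute_mape` divides by the true values, it should only be evaluated on entries where the ground truth is nonzero, which is how the imputer below selects its test positions.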

The main idea behind the LSTC-Tubal imputer is to approximate partially observed data with both a low-tubal-rank structure and temporal smoothness. The following `imputer` kernel takes these inputs:

<div class="alert alert-block alert-warning">
<ul>
<li><b><code>dense_tensor</code>:</b> <font color="black">Ground-truth tensor used for validation. If it is not available, you could use <code>dense_tensor = sparse_tensor.copy()</code> instead.</font></li>
<li><b><code>sparse_tensor</code>:</b> <font color="black">Partially observed tensor in which missing entries are stored as zeros.</font></li>
<li><b><code>rho0</code>:</b> <font color="black">Initial penalty parameter for ADMM, e.g., <code>rho0 = 0.0005</code>. Inside the loop it grows by 5% per iteration, capped at <code>1e5</code>.</font></li>
<li><b><code>lambda0</code>:</b> <font color="black">Weight for the temporal smoothing term, e.g., <code>lambda0 = 5 * rho0</code>. If <code>lambda0 = 0</code>, the smoothing term is dropped and the imputer reduces to low-tubal-rank tensor completion without smoothing.</font></li>
<li><b><code>epsilon</code>:</b> <font color="black">Stopping criterion on the relative change between successive estimates, e.g., <code>epsilon = 0.001</code>. </font></li>
<li><b><code>maxiter</code>:</b> <font color="black">Maximum number of iterations, e.g., <code>maxiter = 50</code>. </font></li>
<li><b><code>sparse_Psi</code>:</b> <font color="black">If <code>True</code> (default), the difference operators are built as SciPy sparse matrices; otherwise dense NumPy arrays are used.</font></li>
<li><b><code>transform</code>:</b> <font color="black">Linear transform along the third mode: <code>"unitary"</code> (default; <code>Phi</code> is computed from the data and refreshed every 10 iterations) or <code>"dct"</code>.</font></li>
</ul>
</div>


In [5]:
def imputer(dense_tensor, sparse_tensor, rho0, lambda0, epsilon, maxiter, 
            sparse_Psi = True, transform = "unitary"):
    """Low-Tubal-Rank Smoothing Tensor Completion, LSTC-Tubal-imputer."""
    
    dim = np.array(sparse_tensor.shape)
    dt = int(np.prod(dim) / dim[0])           # np.int was removed in NumPy 1.24
    sparse_mat = ten2mat(sparse_tensor, 0)
    pos_missing = np.where(sparse_mat == 0)
    pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
    var = dense_tensor[pos_test]
    
    T = np.zeros(dim)                         # \boldsymbol{\mathcal{T}}
    Z = sparse_mat.copy()                     # \boldsymbol{Z}
    Z[pos_missing] = np.mean(sparse_mat[sparse_mat != 0])
    it = 0
    last_mat = sparse_mat.copy()
    snorm = np.linalg.norm(sparse_mat, 'fro')
    del dense_tensor, sparse_tensor, sparse_mat
    rho = rho0
    Phis = []
    if transform == "unitary":
        temp1 = ten2mat(mat2ten(Z, dim, 0), 2)
        _, Phi = np.linalg.eig(temp1 @ temp1.T)
        Phis.append(Phi)
        del temp1
    if lambda0 > 0:
        if sparse_Psi == True:
            from scipy import sparse
            from scipy.sparse.linalg import inv as inv
            Psi1 = sparse.coo_matrix((np.ones(dt - 1), (np.arange(0, dt - 1), np.arange(0, dt - 1))), 
                                     shape = (dt - 1, dt)).tocsr()
            Psi2 = sparse.coo_matrix((np.ones(dt - 1), (np.arange(0, dt - 1), np.arange(0, dt - 1) + 1)), 
                                     shape = (dt - 1, dt)).tocsr()
            temp0 = Psi2 - Psi1
            temp0 = temp0.T @ temp0
            Imat = sparse.coo_matrix((np.ones(dt), (np.arange(0, dt), np.arange(0, dt))), shape = (dt, dt)).tocsr()
            const = rho * inv(temp0 + rho * Imat / lambda0).todense() / lambda0
        elif sparse_Psi == False:
            Psi1 = np.append(np.eye(dt - 1), np.zeros((dt - 1, 1)), axis = 1)
            Psi2 = np.append(np.zeros((dt - 1, 1)), np.eye(dt - 1), axis = 1)
            temp0 = Psi2 - Psi1
            temp0 = temp0.T @ temp0
            const = rho * np.linalg.inv(temp0 + rho * np.eye(dt) / lambda0) / lambda0
        del Psi1, Psi2, temp0
    while True:
        rho = min(rho * 1.05, 1e5)
        if transform == "unitary":
            X = tsvt_unitary(mat2ten(Z, dim, 0) - T / rho, Phi, 1 / rho)
        elif transform == "dct":
            X = tsvt_dct(mat2ten(Z, dim, 0) - T / rho, 1 / rho)
        mat_hat = ten2mat(X, 0)
        temp = ten2mat(X + T / rho, 0)
        if lambda0 > 0:
            Z[pos_missing] = (temp @ const)[pos_missing]
        elif lambda0 == 0:
            Z[pos_missing] = temp[pos_missing]
        T = T + rho * (X - mat2ten(Z, dim, 0))
        tol = np.linalg.norm((mat_hat - last_mat), 'fro') / snorm
        last_mat = mat_hat.copy()
        it += 1
        if it % 10 == 0:
            if transform == "unitary":
                temp1 = ten2mat(mat2ten(Z, dim, 0) - T / rho, 2)
                _, Phi = np.linalg.eig(temp1 @ temp1.T)
                Phis.append(Phi)
                del temp1
        if it % 5 == 0:
            print('Iter: {}'.format(it))
            print('Tolerance: {:.6}'.format(tol))
            print('MAPE: {:.6}'.format(compute_mape(var, X[pos_test])))
            print('RMSE: {:.6}'.format(compute_rmse(var, X[pos_test])))
            print()
        if (tol < epsilon) or (it >= maxiter):
            break

    print('Total iteration: {}'.format(it))
    print('Tolerance: {:.6}'.format(tol))
    print('Imputation MAPE: {:.6}'.format(compute_mape(var, X[pos_test])))
    print('Imputation RMSE: {:.6}'.format(compute_rmse(var, X[pos_test])))
    print()
    
    return X, Phis
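
Reading the code above, the matrix `const` appears to be the closed-form solution of a smoothing subproblem: with the first-order difference operator `D = Psi2 - Psi1`, multiplying by `const` minimizes `(lambda0 / 2) * ||Z D^T||_F^2 + (rho / 2) * ||Z - Y||_F^2` over `Z`. This is our interpretation of the code rather than a statement taken from [1]; the sketch below checks it numerically on a toy series with illustrative sizes and weights.

```python
import numpy as np

dt, rho, lambda0 = 6, 2e-3, 1e-3   # toy length and weights (illustrative values)
Psi1 = np.append(np.eye(dt - 1), np.zeros((dt - 1, 1)), axis=1)
Psi2 = np.append(np.zeros((dt - 1, 1)), np.eye(dt - 1), axis=1)
D = Psi2 - Psi1                    # (D @ z)[t] = z[t + 1] - z[t]
const = rho * np.linalg.inv(D.T @ D + rho * np.eye(dt) / lambda0) / lambda0

y = np.array([[1.0, 5.0, 2.0, 6.0, 3.0, 7.0]])   # one jagged toy series (a single row)
z_smooth = y @ const

# Gradient of the smoothing objective at z_smooth; it should vanish at the minimizer.
gradient = lambda0 * z_smooth @ D.T @ D + rho * (z_smooth - y)
roughness_before = np.sum((y @ D.T) ** 2)
roughness_after = np.sum((z_smooth @ D.T) ** 2)
print(np.allclose(gradient, 0.0), roughness_after < roughness_before)
```

A larger ratio `lambda0 / rho` smooths more strongly, which is why the experiments below set `lambda0 = c * rho` and tune the coefficient `c`.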

To choose reasonable values for `rho` and `lambda0`, please use cross-validation on your own data set.
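
One simple way to do this is a hold-out split: hide a share of the observed entries, impute, and score the hidden entries. The helper `holdout_split` below is a hypothetical sketch of that idea (its name, default rate, and the toy data are our assumptions); it is not part of the original notebook.

```python
import numpy as np

def holdout_split(sparse_tensor, holdout_rate=0.1, seed=0):
    """Hide a random share of the observed (nonzero) entries for validation."""
    rng = np.random.default_rng(seed)
    observed = sparse_tensor != 0
    hide = observed & (rng.random(sparse_tensor.shape) < holdout_rate)
    train_tensor = sparse_tensor.copy()
    train_tensor[hide] = 0          # hidden entries are treated as missing
    return train_tensor, hide

rng = np.random.default_rng(1)
toy_dense = rng.uniform(1.0, 10.0, size=(5, 6, 7))                # toy ground truth
toy_sparse = toy_dense * (rng.random(toy_dense.shape) > 0.3)      # roughly 30% missing
train_tensor, hide = holdout_split(toy_sparse, holdout_rate=0.2)
print(int(hide.sum()), 'observed entries held out')
```

Since `imputer` scores the entries that are nonzero in its first argument and zero in its second, calling `imputer(toy_sparse, train_tensor, rho, c * rho, epsilon, maxiter)` for several candidate values of `c` reports MAPE and RMSE on the held-out entries only.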

### Guangzhou urban traffic speed data set
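
The experiments in this and the following sections use three missing patterns. The sketch below builds the corresponding masks for a toy tensor size so that the differences are easy to inspect. The sizes are illustrative, and we use `np.random.default_rng` here, whereas the cells below use `np.random.seed`.

```python
import numpy as np

rng = np.random.default_rng(1000)
dim1, dim2, dim3 = 20, 12, 10      # toy sizes (illustrative)
missing_rate = 0.3

# Random missing (RM): each entry is dropped independently.
rm_mask = np.round(rng.random((dim1, dim2, dim3)) + 0.5 - missing_rate)

# Non-random missing (NM): all entries along axis 1 share one draw,
# so a whole fiber is either kept or dropped.
nm_mask = np.round(rng.random((dim1, dim3)) + 0.5 - missing_rate)[:, None, :]
nm_mask = np.broadcast_to(nm_mask, (dim1, dim2, dim3))

# Block-out missing (BM): windows of consecutive columns of the mode-0
# unfolding are kept or dropped together, for all rows at once.
block_window = 6
vec = rng.random(dim2 * dim3 // block_window)
vec = np.array([vec] * block_window).reshape(dim2 * dim3, order='F')
bm_mask = np.round(vec + 0.5 - missing_rate)

print(rm_mask.mean(), nm_mask.mean(), bm_mask.mean())
```

In each case an entry is kept when the uniform draw is at least `missing_rate`, so the share of kept entries is about `1 - missing_rate` on average.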

In [6]:
import numpy as np
import time
import scipy.io

for r in [0.3, 0.7, 0.9]:
    print('Missing rate = {}'.format(r))
    missing_rate = r

    ## Random Missing (RM)
    dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
    dim1, dim2, dim3 = dense_tensor.shape
    np.random.seed(1000)
    sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)

    rho = 2e-3
    epsilon = 1e-4
    maxiter = 100

    ## Test LSTC-Tubal model
    c = 0.5
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()

Missing rate = 0.3
- coefficient = 0.5
- lambda = 0.001

Iter: 5
Tolerance: 0.0339055
MAPE: 0.0816114
RMSE: 3.55365

Iter: 10
Tolerance: 0.015534
MAPE: 0.0869997
RMSE: 3.68022

Iter: 15
Tolerance: 0.0101271
MAPE: 0.0914731
RMSE: 3.83991

Iter: 20
Tolerance: 0.00406181
MAPE: 0.0890576
RMSE: 3.73821

Iter: 25
Tolerance: 0.00694859
MAPE: 0.0879192
RMSE: 3.69289

Iter: 30
Tolerance: 0.00233711
MAPE: 0.0842906
RMSE: 3.54463

Iter: 35
Tolerance: 0.00505137
MAPE: 0.0820066
RMSE: 3.44847

Iter: 40
Tolerance: 0.00152744
MAPE: 0.0785244
RMSE: 3.31015

Iter: 45
Tolerance: 0.00345151
MAPE: 0.0760273
RMSE: 3.2118

Iter: 50
Tolerance: 0.00101696
MAPE: 0.0729792
RMSE: 3.09422

Iter: 55
Tolerance: 0.00203064
MAPE: 0.070447
RMSE: 2.99814

Iter: 60
Tolerance: 0.000719256
MAPE: 0.0679943
RMSE: 2.90794

Iter: 65
Tolerance: 0.00105983
MAPE: 0.0659066
RMSE: 2.83334

Iter: 70
Tolerance: 0.000559079
MAPE: 0.0641273
RMSE: 2.77186

Iter: 75
Tolerance: 0.000625885
MAPE: 0.0626239
RMSE: 2.72214

Iter: 80
...

In [7]:
import numpy as np
import time
import scipy.io

for r in [0.3, 0.7]:
    print('Missing rate = {}'.format(r))
    missing_rate = r

    ## Non-random Missing (NM)
    dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
    dim1, dim2, dim3 = dense_tensor.shape
    np.random.seed(1000)
    sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim3) + 0.5 - missing_rate)[:, None, :]

    rho = 2e-3
    epsilon = 1e-4
    maxiter = 100

    ## Test LSTC-Tubal model
    c = 0.5
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()

Missing rate = 0.3
- coefficient = 0.5
- lambda = 0.001

Iter: 5
Tolerance: 0.0349911
MAPE: 0.106468
RMSE: 4.40039

Iter: 10
Tolerance: 0.014471
MAPE: 0.110656
RMSE: 4.49791

Iter: 15
Tolerance: 0.0102471
MAPE: 0.10936
RMSE: 4.48112

Iter: 20
Tolerance: 0.00433245
MAPE: 0.108289
RMSE: 4.43659

Iter: 25
Tolerance: 0.00652971
MAPE: 0.108549
RMSE: 4.44286

Iter: 30
Tolerance: 0.00223584
MAPE: 0.107327
RMSE: 4.39491

Iter: 35
Tolerance: 0.00476492
MAPE: 0.107262
RMSE: 4.38347

Iter: 40
Tolerance: 0.00142669
MAPE: 0.106407
RMSE: 4.35337

Iter: 45
Tolerance: 0.00329114
MAPE: 0.105918
RMSE: 4.32984

Iter: 50
Tolerance: 0.000854379
MAPE: 0.105647
RMSE: 4.31931

Iter: 55
Tolerance: 0.00215351
MAPE: 0.105163
RMSE: 4.29847

Iter: 60
Tolerance: 0.000570412
MAPE: 0.105272
RMSE: 4.30039

Iter: 65
Tolerance: 0.00132423
MAPE: 0.105052
RMSE: 4.28845

Iter: 70
Tolerance: 0.000451006
MAPE: 0.105394
RMSE: 4.29611

Iter: 75
Tolerance: 0.000785851
MAPE: 0.105472
RMSE: 4.29499

Iter: 80
Tolerance: 0.00040894

In [8]:
import numpy as np
import time
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

## Block-out Missing (BM)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape

dim_time = dim2 * dim3
block_window = 6
vec = np.random.rand(int(dim_time / block_window))
temp = np.array([vec] * block_window)
vec = temp.reshape([dim2 * dim3], order = 'F')

sparse_tensor = mat2ten(ten2mat(dense_tensor, 0) * np.round(vec + 0.5 - missing_rate)[None, :], np.array([dim1, dim2, dim3]), 0)

rho = 2e-3
epsilon = 1e-4
maxiter = 100

## Test LSTC-Tubal model
c = 0.5
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

- coefficient = 0.5
- lambda = 0.001

Iter: 5
Tolerance: 0.03461
MAPE: 0.112687
RMSE: 4.60526

Iter: 10
Tolerance: 0.0149629
MAPE: 0.11541
RMSE: 4.6532

Iter: 15
Tolerance: 0.0104335
MAPE: 0.108051
RMSE: 4.46065

Iter: 20
Tolerance: 0.00434682
MAPE: 0.10619
RMSE: 4.37842

Iter: 25
Tolerance: 0.00663737
MAPE: 0.105334
RMSE: 4.33563

Iter: 30
Tolerance: 0.00225535
MAPE: 0.103183
RMSE: 4.25256

Iter: 35
Tolerance: 0.00491697
MAPE: 0.102221
RMSE: 4.21268

Iter: 40
Tolerance: 0.00144698
MAPE: 0.100398
RMSE: 4.14394

Iter: 45
Tolerance: 0.00352217
MAPE: 0.098765
RMSE: 4.07512

Iter: 50
Tolerance: 0.000924037
MAPE: 0.0973901
RMSE: 4.02533

Iter: 55
Tolerance: 0.00235544
MAPE: 0.0954983
RMSE: 3.95077

Iter: 60
Tolerance: 0.000673535
MAPE: 0.0944666
RMSE: 3.91553

Iter: 65
Tolerance: 0.00156583
MAPE: 0.0927913
RMSE: 3.85488

Iter: 70
Tolerance: 0.000585279
MAPE: 0.0920092
RMSE: 3.82931

Iter: 75
Tolerance: 0.00106065
MAPE: 0.0906916
RMSE: 3.7862

Iter: 80
Tolerance: 0.000547515
MAPE: 0.0900578


### Hangzhou metro passenger flow data set

In [9]:
import numpy as np
import time
import scipy.io

for r in [0.3, 0.7, 0.9]:
    print('Missing rate = {}'.format(r))
    missing_rate = r

    ## Random Missing (RM)
    dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
    dim1, dim2, dim3 = dense_tensor.shape
    np.random.seed(1000)
    sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)

    rho = 5e-5
    epsilon = 1e-4
    maxiter = 100

    ## Test LSTC-Tubal model
    c = 0.5
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()

Missing rate = 0.3
- coefficient = 0.5
- lambda = 2.5e-05

Iter: 5
Tolerance: 0.0841764
MAPE: 0.335029
RMSE: 43.9859

Iter: 10
Tolerance: 0.0299786
MAPE: 0.246127
RMSE: 36.9057

Iter: 15
Tolerance: 0.030533
MAPE: 0.229951
RMSE: 36.143

Iter: 20
Tolerance: 0.0162763
MAPE: 0.21991
RMSE: 34.2334

Iter: 25
Tolerance: 0.0348231
MAPE: 0.244862
RMSE: 40.2828

Iter: 30
Tolerance: 0.0183668
MAPE: 0.22858
RMSE: 36.3817

Iter: 35
Tolerance: 0.040103
MAPE: 0.268237
RMSE: 38.8651

Iter: 40
Tolerance: 0.0133048
MAPE: 0.253102
RMSE: 35.5673

Iter: 45
Tolerance: 0.0269873
MAPE: 0.269733
RMSE: 36.8757

Iter: 50
Tolerance: 0.00820885
MAPE: 0.257953
RMSE: 34.1665

Iter: 55
Tolerance: 0.0170719
MAPE: 0.263062
RMSE: 32.5649

Iter: 60
Tolerance: 0.00517466
MAPE: 0.253162
RMSE: 31.2686

Iter: 65
Tolerance: 0.0107674
MAPE: 0.25465
RMSE: 30.8183

Iter: 70
Tolerance: 0.00352722
MAPE: 0.249927
RMSE: 30.1643

Iter: 75
Tolerance: 0.00697863
MAPE: 0.248177
RMSE: 29.6807

Iter: 80
Tolerance: 0.0023473
MAPE: 0.244852

In [10]:
import numpy as np
import time
import scipy.io

for r in [0.3, 0.7]:
    print('Missing rate = {}'.format(r))
    missing_rate = r

    ## Non-random Missing (NM)
    dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
    dim1, dim2, dim3 = dense_tensor.shape
    np.random.seed(1000)
    sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim3) + 0.5 - missing_rate)[:, None, :]

    rho = 5e-5
    epsilon = 1e-4
    maxiter = 100

    ## Test LSTC-Tubal model
    c = 0.5
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()

Missing rate = 0.3
- coefficient = 0.5
- lambda = 2.5e-05

Iter: 5
Tolerance: 0.112376
MAPE: 0.345695
RMSE: 87.8088

Iter: 10
Tolerance: 0.034613
MAPE: 0.266214
RMSE: 98.0403

Iter: 15
Tolerance: 0.0406396
MAPE: 0.242923
RMSE: 71.7122

Iter: 20
Tolerance: 0.0213477
MAPE: 0.231426
RMSE: 74.6764

Iter: 25
Tolerance: 0.0546175
MAPE: 0.272454
RMSE: 75.4825

Iter: 30
Tolerance: 0.0184937
MAPE: 0.252607
RMSE: 74.0558

Iter: 35
Tolerance: 0.0406889
MAPE: 0.276211
RMSE: 73.001

Iter: 40
Tolerance: 0.0138578
MAPE: 0.266165
RMSE: 73.0144

Iter: 45
Tolerance: 0.0250296
MAPE: 0.263535
RMSE: 70.5482

Iter: 50
Tolerance: 0.00811676
MAPE: 0.252991
RMSE: 70.2787

Iter: 55
Tolerance: 0.0165441
MAPE: 0.254578
RMSE: 68.4803

Iter: 60
Tolerance: 0.00526396
MAPE: 0.251133
RMSE: 68.208

Iter: 65
Tolerance: 0.0106659
MAPE: 0.252077
RMSE: 67.0521

Iter: 70
Tolerance: 0.0033541
MAPE: 0.25059
RMSE: 66.8197

Iter: 75
Tolerance: 0.00758279
MAPE: 0.253047
RMSE: 66.9

Iter: 80
Tolerance: 0.00244803
MAPE: 0.255914
...

In [11]:
import numpy as np
import time
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

## Block-out Missing (BM)
dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape

dim_time = dim2 * dim3
block_window = 6
vec = np.random.rand(int(dim_time / block_window))
temp = np.array([vec] * block_window)
vec = temp.reshape([dim2 * dim3], order = 'F')

sparse_tensor = mat2ten(ten2mat(dense_tensor, 0) * np.round(vec + 0.5 - missing_rate)[None, :], np.array([dim1, dim2, dim3]), 0)

rho = 5e-5
epsilon = 1e-4
maxiter = 100

## Test LSTC-Tubal model
c = 0.5
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

- coefficient = 0.5
- lambda = 2.5e-05

Iter: 5
Tolerance: 0.0963544
MAPE: 0.417931
RMSE: 58.4411

Iter: 10
Tolerance: 0.0468751
MAPE: 0.309683
RMSE: 64.0052

Iter: 15
Tolerance: 0.042451
MAPE: 0.293201
RMSE: 48.3605

Iter: 20
Tolerance: 0.0199752
MAPE: 0.28997
RMSE: 49.1178

Iter: 25
Tolerance: 0.0500219
MAPE: 0.306351
RMSE: 54.4646

Iter: 30
Tolerance: 0.0177108
MAPE: 0.296414
RMSE: 52.7315

Iter: 35
Tolerance: 0.0344754
MAPE: 0.308968
RMSE: 44.9934

Iter: 40
Tolerance: 0.0127351
MAPE: 0.307222
RMSE: 45.1822

Iter: 45
Tolerance: 0.0211597
MAPE: 0.287813
RMSE: 42.3138

Iter: 50
Tolerance: 0.00782266
MAPE: 0.287281
RMSE: 41.8265

Iter: 55
Tolerance: 0.0147149
MAPE: 0.292228
RMSE: 37.94

Iter: 60
Tolerance: 0.00493089
MAPE: 0.292814
RMSE: 37.769

Iter: 65
Tolerance: 0.00948246
MAPE: 0.281092
RMSE: 36.4727

Iter: 70
Tolerance: 0.00327784
MAPE: 0.284402
RMSE: 36.2675

Iter: 75
Tolerance: 0.00685624
MAPE: 0.28982
RMSE: 35.7604

Iter: 80
Tolerance: 0.002348
MAPE: 0.294248
RMSE: 35.7701

...

### Seattle freeway traffic speed data set

In [12]:
import numpy as np
import pandas as pd
import time
import scipy.io

for r in [0.3, 0.7, 0.9]:
    print('Missing rate = {}'.format(r))
    missing_rate = r

    ## Random missing (RM)
    dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0).values
    dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288]).transpose(0, 2, 1)
    dim1, dim2, dim3 = dense_tensor.shape
    np.random.seed(1000)
    sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)

    rho = 2e-3
    epsilon = 1e-4
    maxiter = 100

    ## Test LSTC-Tubal model
    c = 0.5
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()

Missing rate = 0.3
- coefficient = 0.5
- lambda = 0.001

Iter: 5
Tolerance: 0.0237854
MAPE: 0.064921
RMSE: 3.95787

Iter: 10
Tolerance: 0.00775803
MAPE: 0.0702511
RMSE: 4.12394

Iter: 15
Tolerance: 0.00576277
MAPE: 0.0711923
RMSE: 4.15995

Iter: 20
Tolerance: 0.00211853
MAPE: 0.068638
RMSE: 4.02897

Iter: 25
Tolerance: 0.00357092
MAPE: 0.0672279
RMSE: 3.9568

Iter: 30
Tolerance: 0.00118089
MAPE: 0.0645297
RMSE: 3.82267

Iter: 35
Tolerance: 0.0020781
MAPE: 0.0626989
RMSE: 3.73012

Iter: 40
Tolerance: 0.000766374
MAPE: 0.060399
RMSE: 3.62255

Iter: 45
Tolerance: 0.00133621
MAPE: 0.0585678
RMSE: 3.54038

Iter: 50
Tolerance: 0.00060013
MAPE: 0.0567339
RMSE: 3.46385

Iter: 55
Tolerance: 0.000749786
MAPE: 0.0551952
RMSE: 3.40537

Iter: 60
Tolerance: 0.000488568
MAPE: 0.0538794
RMSE: 3.3604

Iter: 65
Tolerance: 0.00052614
MAPE: 0.0527726
RMSE: 3.32751

Iter: 70
Tolerance: 0.000407211
MAPE: 0.0519287
RMSE: 3.30894

Iter: 75
Tolerance: 0.000402456
MAPE: 0.0512949
RMSE: 3.3003

Iter: 80
...

In [13]:
import numpy as np
import pandas as pd
import time
import scipy.io

for r in [0.3, 0.7]:
    print('Missing rate = {}'.format(r))
    missing_rate = r

    ## Non-random Missing (NM)
    dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0).values
    dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288]).transpose(0, 2, 1)
    dim1, dim2, dim3 = dense_tensor.shape
    np.random.seed(1000)
    sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim3) + 0.5 - missing_rate)[:, None, :]

    rho = 5e-4
    epsilon = 1e-4
    maxiter = 100

    ## Test LSTC-Tubal model
    c = 0.5
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()

Missing rate = 0.3
- coefficient = 0.5
- lambda = 0.00025

Iter: 5
Tolerance: 0.0299196
MAPE: 0.101585
RMSE: 5.77838

Iter: 10
Tolerance: 0.0152888
MAPE: 0.106548
RMSE: 5.74134

Iter: 15
Tolerance: 0.0133867
MAPE: 0.105349
RMSE: 5.70047

Iter: 20
Tolerance: 0.00889953
MAPE: 0.105052
RMSE: 5.6654

Iter: 25
Tolerance: 0.00835735
MAPE: 0.106866
RMSE: 5.77564

Iter: 30
Tolerance: 0.00426289
MAPE: 0.106063
RMSE: 5.72156

Iter: 35
Tolerance: 0.00610597
MAPE: 0.105521
RMSE: 5.71873

Iter: 40
Tolerance: 0.00246262
MAPE: 0.104314
RMSE: 5.65591

Iter: 45
Tolerance: 0.00475178
MAPE: 0.104243
RMSE: 5.63846

Iter: 50
Tolerance: 0.00162015
MAPE: 0.103305
RMSE: 5.58463

Iter: 55
Tolerance: 0.00352357
MAPE: 0.103089
RMSE: 5.57913

Iter: 60
Tolerance: 0.00105374
MAPE: 0.102445
RMSE: 5.55061

Iter: 65
Tolerance: 0.00251924
MAPE: 0.102257
RMSE: 5.5256

Iter: 70
Tolerance: 0.000712566
MAPE: 0.102123
RMSE: 5.5137

Iter: 75
Tolerance: 0.00161909
MAPE: 0.102035
RMSE: 5.50596

Iter: 80
Tolerance: 0.00048265
...

In [14]:
import numpy as np
import pandas as pd
import time
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

## Block-out Missing (BM)
dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0).values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288]).transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
block_window = 12
vec = np.random.rand(int(dim2 * dim3 / block_window))
temp = np.array([vec] * block_window)
vec = temp.reshape([dim2 * dim3], order = 'F')
sparse_tensor = mat2ten(dense_mat * np.round(vec + 0.5 - missing_rate)[None, :], np.array([dim1, dim2, dim3]), 0)

rho = 2e-3
epsilon = 1e-4
maxiter = 100

## Test LSTC-Tubal model
c = 0.5
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

- coefficient = 0.5
- lambda = 0.001

Iter: 5
Tolerance: 0.0221242
MAPE: 0.173966
RMSE: 7.93363

Iter: 10
Tolerance: 0.00855929
MAPE: 0.154986
RMSE: 7.23505

Iter: 15
Tolerance: 0.00685016
MAPE: 0.130482
RMSE: 6.44735

Iter: 20
Tolerance: 0.00244091
MAPE: 0.122929
RMSE: 6.24301

Iter: 25
Tolerance: 0.00426487
MAPE: 0.115615
RMSE: 6.00621

Iter: 30
Tolerance: 0.00131616
MAPE: 0.11285
RMSE: 5.92239

Iter: 35
Tolerance: 0.00299315
MAPE: 0.110456
RMSE: 5.81346

Iter: 40
Tolerance: 0.000883322
MAPE: 0.109021
RMSE: 5.76534

Iter: 45
Tolerance: 0.00218362
MAPE: 0.106377
RMSE: 5.65105

Iter: 50
Tolerance: 0.000672387
MAPE: 0.105239
RMSE: 5.61055

Iter: 55
Tolerance: 0.00132483
MAPE: 0.103608
RMSE: 5.54313

Iter: 60
Tolerance: 0.000543
MAPE: 0.102507
RMSE: 5.50559

Iter: 65
Tolerance: 0.001015
MAPE: 0.101034
RMSE: 5.44116

Iter: 70
Tolerance: 0.000500484
MAPE: 0.100044
RMSE: 5.40455

Iter: 75
Tolerance: 0.000687955
MAPE: 0.0988675
RMSE: 5.36066

Iter: 80
Tolerance: 0.000429524
MAPE: 0.0979674
...

### Portland highway traffic volume data set

In [15]:
import numpy as np
import pandas as pd
import time
import scipy.io

for r in [0.3, 0.7, 0.9]:
    print('Missing rate = {}'.format(r))
    missing_rate = r

    # Random Missing (RM)
    dense_mat = np.load('../datasets/Portland-data-set/volume.npy')
    dim1, dim2 = dense_mat.shape
    dim = np.array([dim1, 96, 31])
    dense_tensor = mat2ten(dense_mat, dim, 0)
    np.random.seed(1000)
    sparse_tensor = mat2ten(dense_mat * np.round(np.random.rand(dim1, dim2) + 0.5 - missing_rate), dim, 0)

    rho = 1e-4
    epsilon = 1e-4
    maxiter = 100

    ## Test LSTC-Tubal model
    c = 0.5
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()

Missing rate = 0.3
- coefficient = 0.5
- lambda = 5e-05

Iter: 5
Tolerance: 0.0539294
MAPE: 0.322747
RMSE: 23.5301

Iter: 10
Tolerance: 0.0220826
MAPE: 0.223969
RMSE: 20.7758

Iter: 15
Tolerance: 0.0205212
MAPE: 0.223859
RMSE: 21.5432

Iter: 20
Tolerance: 0.0096372
MAPE: 0.224298
RMSE: 21.5161

Iter: 25
Tolerance: 0.0110835
MAPE: 0.231334
RMSE: 21.5076

Iter: 30
Tolerance: 0.00404289
MAPE: 0.22457
RMSE: 20.9471

Iter: 35
Tolerance: 0.00664838
MAPE: 0.226517
RMSE: 20.6806

Iter: 40
Tolerance: 0.00172961
MAPE: 0.216342
RMSE: 20.2077

Iter: 45
Tolerance: 0.00391649
MAPE: 0.215055
RMSE: 19.8938

Iter: 50
Tolerance: 0.00114424
MAPE: 0.20865
RMSE: 19.5905

Iter: 55
Tolerance: 0.00210309
MAPE: 0.207061
RMSE: 19.4231

Iter: 60
Tolerance: 0.00102263
MAPE: 0.203456
RMSE: 19.2948

Iter: 65
Tolerance: 0.00133306
MAPE: 0.201453
RMSE: 19.2351

Iter: 70
Tolerance: 0.000955342
MAPE: 0.199334
RMSE: 19.2641

Iter: 75
Tolerance: 0.00101769
MAPE: 0.197947
RMSE: 19.3576

Iter: 80
Tolerance: 0.000878462
...

In [16]:
import numpy as np
import pandas as pd
import time
import scipy.io

for r in [0.3, 0.7]:
    print('Missing rate = {}'.format(r))
    missing_rate = r

    # Non-random Missing (NM)
    dense_mat = np.load('../datasets/Portland-data-set/volume.npy')
    dim1, dim2 = dense_mat.shape
    dim = np.array([dim1, 96, 31])
    dense_tensor = mat2ten(dense_mat, dim, 0)
    np.random.seed(1000)
    sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim[2]) + 0.5 - missing_rate)[:, None, :]

    rho = 1e-4
    epsilon = 1e-4
    maxiter = 100

    ## Test LSTC-Tubal model
    c = 0.5
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()

Missing rate = 0.3
- coefficient = 0.5
- lambda = 5e-05

Iter: 5
Tolerance: 0.0573916
MAPE: 0.378467
RMSE: 26.8127

Iter: 10
Tolerance: 0.0209886
MAPE: 0.257043
RMSE: 25.1931

Iter: 15
Tolerance: 0.0202561
MAPE: 0.247885
RMSE: 24.8527

Iter: 20
Tolerance: 0.0106517
MAPE: 0.245348
RMSE: 24.6681

Iter: 25
Tolerance: 0.0102658
MAPE: 0.244661
RMSE: 24.3385

Iter: 30
Tolerance: 0.00397567
MAPE: 0.240069
RMSE: 24.257

Iter: 35
Tolerance: 0.00639444
MAPE: 0.243122
RMSE: 24.1769

Iter: 40
Tolerance: 0.00153542
MAPE: 0.236702
RMSE: 24.2208

Iter: 45
Tolerance: 0.00365476
MAPE: 0.237195
RMSE: 24.2308

Iter: 50
Tolerance: 0.000896753
MAPE: 0.236319
RMSE: 24.3629

Iter: 55
Tolerance: 0.00199182
MAPE: 0.238037
RMSE: 24.5167

Iter: 60
Tolerance: 0.000787365
MAPE: 0.239495
RMSE: 24.7621

Iter: 65
Tolerance: 0.0013694
MAPE: 0.242717
RMSE: 25.0044

Iter: 70
Tolerance: 0.000797217
MAPE: 0.246336
RMSE: 25.3165

Iter: 75
Tolerance: 0.00106972
MAPE: 0.249847
RMSE: 25.5771

Iter: 80
Tolerance: 0.000833111
...

In [17]:
import numpy as np
import time
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

## Block-out Missing (BM)
dense_mat = np.load('../datasets/Portland-data-set/volume.npy')
dim1, dim2 = dense_mat.shape
dim = np.array([dim1, 96, 31])
dense_tensor = mat2ten(dense_mat, dim, 0)
block_window = 4
vec = np.random.rand(int(dim2 / block_window))
temp = np.array([vec] * block_window)
vec = temp.reshape([dim2], order = 'F')
sparse_tensor = mat2ten(dense_mat * np.round(vec + 0.5 - missing_rate)[None, :], dim, 0)

rho = 1e-4
epsilon = 1e-4
maxiter = 100

## Test LSTC-Tubal model
c = 0.5
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

- coefficient = 0.5
- lambda = 5e-05

Iter: 5
Tolerance: 0.0591194
MAPE: 0.387187
RMSE: 35.7207

Iter: 10
Tolerance: 0.0218681
MAPE: 0.301665
RMSE: 32.0719

Iter: 15
Tolerance: 0.0209589
MAPE: 0.271332
RMSE: 27.0552

Iter: 20
Tolerance: 0.0102781
MAPE: 0.270356
RMSE: 26.8936

Iter: 25
Tolerance: 0.0112466
MAPE: 0.279285
RMSE: 27.0194

Iter: 30
Tolerance: 0.00408242
MAPE: 0.273967
RMSE: 26.521

Iter: 35
Tolerance: 0.0072072
MAPE: 0.269371
RMSE: 26.0728

Iter: 40
Tolerance: 0.00196622
MAPE: 0.261443
RMSE: 25.6547

Iter: 45
Tolerance: 0.00474108
MAPE: 0.264
RMSE: 25.1968

Iter: 50
Tolerance: 0.00132238
MAPE: 0.257685
RMSE: 24.8741

Iter: 55
Tolerance: 0.00294526
MAPE: 0.255799
RMSE: 24.595

Iter: 60
Tolerance: 0.00106377
MAPE: 0.251882
RMSE: 24.4312

Iter: 65
Tolerance: 0.00188703
MAPE: 0.250216
RMSE: 24.2039

Iter: 70
Tolerance: 0.000998828
MAPE: 0.248775
RMSE: 24.1377

Iter: 75
Tolerance: 0.00139954
MAPE: 0.247881
RMSE: 24.0604

Iter: 80
Tolerance: 0.000954969
MAPE: 0.248062
RMSE: 24.08

### License

<div class="alert alert-block alert-danger">
<b>This work is released under the MIT license.</b>
</div>