## Low-Rank Autoregressive Tensor Completion (LATC)

This notebook shows how to implement a LATC (with truncated nuclear norm) imputer on some real-world traffic data sets. To overcome the problem of missing values within multivariate time series data, this method takes into account both low-rank structure and time series regression. For an in-depth discussion of LATC, please see [1].

<div class="alert alert-block alert-info">
<font color="black">
<b>[1]</b> Xinyu Chen, Mengying Lei, Nicolas Saunier, Lijun Sun (2021). <b>A Low-Rank Autorgressive Tensor Completion for Spatiotemporal Traffic Data Imputation</b>. arXiv:xxxx.xxxxx. <a href="https://arxiv.org/abs/xxxx.xxxxx" title="PDF"><b>[PDF]</b></a> 
</font>
</div>


### Define LATC-imputer kernel

We start by introducing some necessary functions that relies on `Numpy`.

<div class="alert alert-block alert-warning">
<ul>
<li><b><code>ten2mat</code>:</b> <font color="black">Unfold tensor as matrix by specifying mode.</font></li>
<li><b><code>mat2ten</code>:</b> <font color="black">Fold matrix as tensor by specifying dimension (i.e, tensor size) and mode.</font></li>
<li><b><code>svt_tnn</code>:</b> <font color="black">Implement the process of Singular Value Thresholding (SVT).</font></li>
</ul>
</div>

In [1]:
import numpy as np

def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')

def mat2ten(mat, dim, mode):
    index = list()
    index.append(mode)
    for i in range(dim.shape[0]):
        if i != mode:
            index.append(i)
    return np.moveaxis(np.reshape(mat, list(dim[index]), order = 'F'), 0, mode)

In [2]:
def svt_tnn(mat, tau, theta):
    [m, n] = mat.shape
    if 2 * m < n:
        u, s, v = np.linalg.svd(mat @ mat.T, full_matrices = 0)
        s = np.sqrt(s)
        idx = np.sum(s > tau)
        mid = np.zeros(idx)
        mid[: theta] = 1
        mid[theta : idx] = (s[theta : idx] - tau) / s[theta : idx]
        return (u[:, : idx] @ np.diag(mid)) @ (u[:, : idx].T @ mat)
    elif m > 2 * n:
        return svt_tnn(mat.T, tau, theta).T
    u, s, v = np.linalg.svd(mat, full_matrices = 0)
    idx = np.sum(s > tau)
    vec = s[: idx].copy()
    vec[theta : idx] = s[theta : idx] - tau
    return u[:, : idx] @ np.diag(vec) @ v[: idx, :]

<div class="alert alert-block alert-warning">
<ul>
<li><b><code>compute_mape</code>:</b> <font color="black">Compute the value of Mean Absolute Percentage Error (MAPE).</font></li>
<li><b><code>compute_rmse</code>:</b> <font color="black">Compute the value of Root Mean Square Error (RMSE).</font></li>
</ul>
</div>

> Note that $$\mathrm{MAPE}=\frac{1}{n} \sum_{i=1}^{n} \frac{\left|y_{i}-\hat{y}_{i}\right|}{y_{i}} \times 100, \quad\mathrm{RMSE}=\sqrt{\frac{1}{n} \sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}},$$ where $n$ is the total number of estimated values, and $y_i$ and $\hat{y}_i$ are the actual value and its estimation, respectively.

In [3]:
def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    return  np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])

How to create $\boldsymbol{\Psi}_{0},\boldsymbol{\Psi}_{1},\ldots,\boldsymbol{\Psi}_{d}$?

In [4]:
from scipy import sparse
from scipy.sparse.linalg import spsolve as spsolve

def generate_Psi(dim_time, time_lags):
    Psis = []
    max_lag = np.max(time_lags)
    for i in range(len(time_lags) + 1):
        row = np.arange(0, dim_time - max_lag)
        if i == 0:
            col = np.arange(0, dim_time - max_lag) + max_lag
        else:
            col = np.arange(0, dim_time - max_lag) + max_lag - time_lags[i - 1]
        data = np.ones(dim_time - max_lag)
        Psi = sparse.coo_matrix((data, (row, col)), shape = (dim_time - max_lag, dim_time))
        Psis.append(Psi)
    return Psis

In [5]:
import numpy as np

# Example
dim_time = 5
time_lags = np.array([1, 3])
Psis = generate_Psi(dim_time, time_lags)
print('Psi_0:')
print(Psis[0].toarray())
print()
print('Psi_1:')
print(Psis[1].toarray())
print()
print('Psi_2:')
print(Psis[2].toarray())
print()

Psi_0:
[[0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]

Psi_1:
[[0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]]

Psi_2:
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]]



The main idea behind LATC-imputer is to approximate partially observed data with both low-rank structure and time series dynamics. The following `imputer` kernel includes some necessary inputs:

<div class="alert alert-block alert-warning">
<ul>
<li><b><code>dense_tensor</code>:</b> <font color="black">This is an input which has the ground truth for validation. If this input is not available, you could use <code>dense_tensor = sparse_tensor.copy()</code> instead.</font></li>
<li><b><code>sparse_tensor</code>:</b> <font color="black">This is a partially observed tensor which has many missing entries.</font></li>
<li><b><code>time_lags</code>:</b> <font color="black">Time lags, e.g., <code>time_lags = np.array([1, 2, 3])</code>. </font></li>
<li><b><code>alpha</code>:</b> <font color="black">Weights for tensors' nuclear norm, e.g., <code>alpha = np.ones(3) / 3</code>. </font></li>
<li><b><code>rho</code>:</b> <font color="black">Learning rate for ADMM, e.g., <code>rho = 0.0005</code>. </font></li>
<li><b><code>lambda0</code>:</b> <font color="black">Weight for time series regressor, e.g., <code>lambda0 = 5 * rho</code></font></li>
<li><b><code>epsilon</code>:</b> <font color="black">Stop criteria, e.g., <code>epsilon = 0.0001</code>. </font></li>
<li><b><code>maxiter</code>:</b> <font color="black">Maximum iteration to stop algorithm, e.g., <code>maxiter = 100</code>. </font></li>
</ul>
</div>

In [6]:
def imputer(dense_mat, sparse_mat, time_lags, rho0, lambda0, theta, epsilon, maxiter, K = 3):
    """Low-Rank Autoregressive Matrix Completion, LAMC-imputer."""
    
    dim = np.array(sparse_mat.shape)
    d = len(time_lags)
    max_lag = np.max(time_lags)
    pos_missing = np.where(sparse_mat == 0)
    pos_test = np.where((dense_mat != 0) & (sparse_mat == 0))
    dense_test = dense_mat[pos_test]
    del dense_mat
    
    T = np.zeros(dim)
    Z = sparse_mat.copy()
    Z[pos_missing] = np.mean(sparse_mat[sparse_mat != 0])
    A = 0.001 * np.random.rand(dim[0], d)
    Psis = generate_Psi(dim[1], time_lags)
    iden = sparse.coo_matrix((np.ones(dim[1]), (np.arange(0, dim[1]), np.arange(0, dim[1]))), shape = (dim[1], dim[1]))
    it = 0
    ind = np.zeros((d, dim[1] - max_lag), dtype = np.int_)
    for i in range(d):
        ind[i, :] = np.arange(max_lag - time_lags[i], dim[1] - time_lags[i])
    last_mat = sparse_mat.copy()
    snorm = np.linalg.norm(sparse_mat, 'fro')
    rho = rho0
    while True:
        B = []
        for m in range(dim[0]):
            Psis0 = Psis.copy()
            for i in range(d):
                Psis0[i + 1] = A[m, i] * Psis[i + 1]
            B.append(Psis0[0] - sum(Psis0[1 :]))
        for k in range(K):
            rho = min(rho * 1.05, 1e5)
            X = svt_tnn(Z - T / rho, 1 / rho, theta)
            temp0 = rho / lambda0 * (X + T / rho)
            mat = np.zeros(dim)
            for m in range(dim[0]):
                mat[m, :] = spsolve(B[m].T @ B[m] + rho * iden / lambda0, temp0[m, :])
            Z[pos_missing] = mat[pos_missing]
            T = T + rho * (X - Z)
        for m in range(dim[0]):
            Vm = Z[m, ind].T
            A[m, :] = np.linalg.pinv(Vm) @ Z[m, max_lag :]
        tol = np.linalg.norm((X - last_mat), 'fro') / snorm
        last_mat = X.copy()
        it += 1
        if it % 200 == 0:
            print('Iter: {}'.format(it))
            print('Tolerance: {:.6}'.format(tol))
            print('MAPE: {:.6}'.format(compute_mape(dense_test, X[pos_test])))
            print('RMSE: {:.6}'.format(compute_rmse(dense_test, X[pos_test])))
            print()
        if (tol < epsilon) or (it >= maxiter):
            break

    print('Total iteration: {}'.format(it))
    print('Tolerance: {:.6}'.format(tol))
    print('Imputation MAPE: {:.6}'.format(compute_mape(dense_test, X[pos_test])))
    print('Imputation RMSE: {:.6}'.format(compute_rmse(dense_test, X[pos_test])))
    print()
    
    return X

> We use `spslove` of `scipy.sparse.linalg` for updating $\boldsymbol{Z}$ because computing the inverse of a large matrix directly is computationally expensive.

### Guangzhou urban traffic speed data set

In [12]:
import time
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

## Random Missing (RM)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
np.random.seed(1000)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim1, dim2 * dim3])
sparse_mat = sparse_tensor.reshape([dim1, dim2 * dim3])
del dense_tensor, sparse_tensor

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15, 20, 25, 30]:
        start = time.time()
        time_lags = np.arange(1, 7)
        rho = 1e-4
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

0.1
5
Total iteration: 27
Tolerance: 6.96911e-05
Imputation MAPE: 0.096086
Imputation RMSE: 4.04553

Running time: 363 seconds

0.1
10
Total iteration: 36
Tolerance: 9.52972e-05
Imputation MAPE: 0.0953806
Imputation RMSE: 4.06367

Running time: 456 seconds

0.1
15
Total iteration: 42
Tolerance: 8.90414e-05
Imputation MAPE: 0.0961029
Imputation RMSE: 4.14698

Running time: 556 seconds

0.1
20
Total iteration: 46
Tolerance: 8.86395e-05
Imputation MAPE: 0.0973817
Imputation RMSE: 4.24509

Running time: 632 seconds

0.1
25
Total iteration: 47
Tolerance: 8.72375e-05
Imputation MAPE: 0.0991777
Imputation RMSE: 4.35809

Running time: 636 seconds

0.1
30
Total iteration: 48
Tolerance: 9.09497e-05
Imputation MAPE: 0.101092
Imputation RMSE: 4.47111

Running time: 561 seconds

0.2
5
Total iteration: 27
Tolerance: 6.9516e-05
Imputation MAPE: 0.0960945
Imputation RMSE: 4.04472

Running time: 321 seconds

0.2
10
Total iteration: 36
Tolerance: 8.38063e-05
Imputation MAPE: 0.0953331
Imputation RMSE: 4

Best parameters:

- Coefficient $c=1$
- Weight parameter $\lambda=15$

In [13]:
import time
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.7

## Random Missing (RM)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
np.random.seed(1000)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim1, dim2 * dim3])
sparse_mat = sparse_tensor.reshape([dim1, dim2 * dim3])
del dense_tensor, sparse_tensor

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15, 20, 25, 30]:
        start = time.time()
        time_lags = np.arange(1, 7)
        rho = 1e-4
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

0.1
5
Total iteration: 35
Tolerance: 9.60843e-05
Imputation MAPE: 0.105683
Imputation RMSE: 4.42063

Running time: 442 seconds

0.1
10
Total iteration: 44
Tolerance: 8.64132e-05
Imputation MAPE: 0.111698
Imputation RMSE: 4.93094

Running time: 556 seconds

0.1
15
Total iteration: 47
Tolerance: 9.00451e-05
Imputation MAPE: 0.120519
Imputation RMSE: 5.58798

Running time: 590 seconds

0.1
20
Total iteration: 49
Tolerance: 9.58098e-05
Imputation MAPE: 0.132996
Imputation RMSE: 6.33466

Running time: 664 seconds

0.1
25
Total iteration: 51
Tolerance: 9.20908e-05
Imputation MAPE: 0.144456
Imputation RMSE: 6.97178

Running time: 603 seconds

0.1
30
Total iteration: 52
Tolerance: 9.16057e-05
Imputation MAPE: 0.156511
Imputation RMSE: 7.5526

Running time: 611 seconds

0.2
5
Total iteration: 35
Tolerance: 9.61497e-05
Imputation MAPE: 0.10559
Imputation RMSE: 4.41419

Running time: 411 seconds

0.2
10
Total iteration: 44
Tolerance: 8.66549e-05
Imputation MAPE: 0.111073
Imputation RMSE: 4.8615



Best parameters:

- Coefficient $c=5$
- Weight parameter $\lambda=15$

In [None]:
import time
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.9

## Random Missing (RM)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
np.random.seed(1000)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim1, dim2 * dim3])
sparse_mat = sparse_tensor.reshape([dim1, dim2 * dim3])
del dense_tensor, sparse_tensor

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15, 20, 25, 30]:
        start = time.time()
        time_lags = np.arange(1, 7)
        rho = 1e-4
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

Best parameters:

- Coefficient $c=5$
- Weight parameter $\lambda=5$

In [10]:
import time
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

## Non-random Missing (NM)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
np.random.seed(1000)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim3) + 0.5 - missing_rate)[:, None, :]
dense_mat = dense_tensor.reshape([dim1, dim2 * dim3])
sparse_mat = sparse_tensor.reshape([dim1, dim2 * dim3])
del dense_tensor, sparse_tensor

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 7)
        rho = 1e-4
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

0.1
5
Total iteration: 27
Tolerance: 9.31395e-05
Imputation MAPE: 0.101791
Imputation RMSE: 4.25672

Running time: 323 seconds

0.1
10
Total iteration: 39
Tolerance: 9.27791e-05
Imputation MAPE: 0.100999
Imputation RMSE: 4.27739

Running time: 451 seconds

0.1
15
Total iteration: 45
Tolerance: 8.87543e-05
Imputation MAPE: 0.103114
Imputation RMSE: 4.43917

Running time: 519 seconds

0.2
5
Total iteration: 27
Tolerance: 9.34238e-05
Imputation MAPE: 0.10178
Imputation RMSE: 4.25482

Running time: 310 seconds

0.2
10
Total iteration: 40
Tolerance: 8.58911e-05
Imputation MAPE: 0.100945
Imputation RMSE: 4.27338

Running time: 461 seconds

0.2
15
Total iteration: 45
Tolerance: 8.8778e-05
Imputation MAPE: 0.102815
Imputation RMSE: 4.41664

Running time: 518 seconds

1
5
Total iteration: 29
Tolerance: 8.65196e-05
Imputation MAPE: 0.101757
Imputation RMSE: 4.24309

Running time: 333 seconds

1
10
Total iteration: 38
Tolerance: 8.96605e-05
Imputation MAPE: 0.100585
Imputation RMSE: 4.24479

Runn

Best parameters:

- Coefficient $c=5$
- Weight parameter $\lambda=10$

In [11]:
import time
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.7

## Non-random Missing (NM)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
np.random.seed(1000)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim3) + 0.5 - missing_rate)[:, None, :]
dense_mat = dense_tensor.reshape([dim1, dim2 * dim3])
sparse_mat = sparse_tensor.reshape([dim1, dim2 * dim3])
del dense_tensor, sparse_tensor

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 7)
        rho = 1e-4
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

0.1
5
Total iteration: 39
Tolerance: 9.37384e-05
Imputation MAPE: 0.114547
Imputation RMSE: 4.77448

Running time: 527 seconds

0.1
10
Total iteration: 48
Tolerance: 8.67785e-05
Imputation MAPE: 0.125124
Imputation RMSE: 5.54422

Running time: 691 seconds

0.1
15
Total iteration: 50
Tolerance: 8.68138e-05
Imputation MAPE: 0.13751
Imputation RMSE: 6.29824

Running time: 833 seconds

0.2
5
Total iteration: 37
Tolerance: 9.33293e-05
Imputation MAPE: 0.114034
Imputation RMSE: 4.75143

Running time: 522 seconds

0.2
10
Total iteration: 46
Tolerance: 9.9363e-05
Imputation MAPE: 0.123562
Imputation RMSE: 5.4073

Running time: 719 seconds

0.2
15
Total iteration: 49
Tolerance: 9.49456e-05
Imputation MAPE: 0.133323
Imputation RMSE: 6.05905

Running time: 780 seconds

1
5
Total iteration: 38
Tolerance: 9.7932e-05
Imputation MAPE: 0.112276
Imputation RMSE: 4.65979

Running time: 511 seconds

1
10
Total iteration: 45
Tolerance: 8.73126e-05
Imputation MAPE: 0.117872
Imputation RMSE: 5.04149

Runnin

Best parameters:

- Coefficient $c=5$
- Weight parameter $\lambda=5$

In [12]:
import time
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

## Block-out Missing (BM)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape

dim_time = dim2 * dim3
block_window = 6
vec = np.random.rand(int(dim_time / block_window))
temp = np.array([vec] * block_window)
vec = temp.reshape([dim2 * dim3], order = 'F')

sparse_tensor = mat2ten(ten2mat(dense_tensor, 0) * np.round(vec + 0.5 - missing_rate)[None, :], np.array([dim1, dim2, dim3]), 0)
dense_mat = dense_tensor.reshape([dim1, dim2 * dim3])
sparse_mat = sparse_tensor.reshape([dim1, dim2 * dim3])
del dense_tensor, sparse_tensor

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 7)
        rho = 1e-4
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

0.1
5
Total iteration: 37
Tolerance: 8.52874e-05
Imputation MAPE: 0.180602
Imputation RMSE: 6.81957

Running time: 587 seconds

0.1
10
Total iteration: 41
Tolerance: 7.45306e-05
Imputation MAPE: 0.17946
Imputation RMSE: 6.78048

Running time: 529 seconds

0.1
15
Total iteration: 49
Tolerance: 5.85836e-05
Imputation MAPE: 0.17861
Imputation RMSE: 6.75226

Running time: 613 seconds

0.2
5
Total iteration: 38
Tolerance: 9.53689e-05
Imputation MAPE: 0.157536
Imputation RMSE: 6.03583

Running time: 461 seconds

0.2
10
Total iteration: 41
Tolerance: 8.49283e-05
Imputation MAPE: 0.155873
Imputation RMSE: 5.98347

Running time: 479 seconds

0.2
15
Total iteration: 49
Tolerance: 6.7125e-05
Imputation MAPE: 0.154526
Imputation RMSE: 5.94424

Running time: 607 seconds

1
5
Total iteration: 41
Tolerance: 8.92924e-05
Imputation MAPE: 0.132635
Imputation RMSE: 5.38683

Running time: 492 seconds

1
10
Total iteration: 43
Tolerance: 9.36033e-05
Imputation MAPE: 0.130029
Imputation RMSE: 5.31964

Runni

Best parameters:

- Coefficient $c=10$
- Weight parameter $\lambda=15$

### Hangzhou metro passenger flow data set

In [13]:
import time
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

## Random Missing (RM)
dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
np.random.seed(1000)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim1, dim2 * dim3])
sparse_mat = sparse_tensor.reshape([dim1, dim2 * dim3])
del dense_tensor, sparse_tensor

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 7)
        rho = 1e-4
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

0.1
5
Total iteration: 36
Tolerance: 8.91948e-05
Imputation MAPE: 0.232225
Imputation RMSE: 45.0044

Running time: 49 seconds

0.1
10
Total iteration: 38
Tolerance: 9.34579e-05
Imputation MAPE: 0.240361
Imputation RMSE: 47.4795

Running time: 50 seconds

0.1
15
Total iteration: 41
Tolerance: 9.9146e-05
Imputation MAPE: 0.260085
Imputation RMSE: 49.6169

Running time: 55 seconds

0.2
5
Total iteration: 31
Tolerance: 9.07232e-05
Imputation MAPE: 0.226515
Imputation RMSE: 42.9396

Running time: 39 seconds

0.2
10
Total iteration: 36
Tolerance: 9.13275e-05
Imputation MAPE: 0.232166
Imputation RMSE: 43.9497

Running time: 48 seconds

0.2
15
Total iteration: 41
Tolerance: 8.87669e-05
Imputation MAPE: 0.248529
Imputation RMSE: 45.5564

Running time: 53 seconds

1
5
Total iteration: 37
Tolerance: 9.14285e-05
Imputation MAPE: 0.219899
Imputation RMSE: 44.9002

Running time: 46 seconds

1
10
Total iteration: 38
Tolerance: 8.68934e-05
Imputation MAPE: 0.221564
Imputation RMSE: 45.4364

Running ti

Best parameters:

- Coefficient $c=0.2$
- Weight parameter $\lambda=5$

In [14]:
import time
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.7

## Random Missing (RM)
dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
np.random.seed(1000)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim1, dim2 * dim3])
sparse_mat = sparse_tensor.reshape([dim1, dim2 * dim3])
del dense_tensor, sparse_tensor

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 7)
        rho = 1e-4
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

0.1
5
Total iteration: 45
Tolerance: 8.64094e-05
Imputation MAPE: 0.292239
Imputation RMSE: 69.8626

Running time: 68 seconds

0.1
10
Total iteration: 45
Tolerance: 9.14226e-05
Imputation MAPE: 0.349963
Imputation RMSE: 80.2176

Running time: 69 seconds

0.1
15
Total iteration: 46
Tolerance: 9.34278e-05
Imputation MAPE: 0.409283
Imputation RMSE: 83.8329

Running time: 72 seconds

0.2
5
Total iteration: 44
Tolerance: 9.27933e-05
Imputation MAPE: 0.272882
Imputation RMSE: 58.3302

Running time: 70 seconds

0.2
10
Total iteration: 45
Tolerance: 9.64703e-05
Imputation MAPE: 0.315546
Imputation RMSE: 65.8737

Running time: 64 seconds

0.2
15
Total iteration: 46
Tolerance: 9.35691e-05
Imputation MAPE: 0.365661
Imputation RMSE: 69.764

Running time: 74 seconds

1
5
Total iteration: 49
Tolerance: 9.31952e-05
Imputation MAPE: 0.253041
Imputation RMSE: 51.2579

Running time: 79 seconds

1
10
Total iteration: 49
Tolerance: 9.84967e-05
Imputation MAPE: 0.265703
Imputation RMSE: 53.1601

Running ti

Best parameters:

- Coefficient $c=1$
- Weight parameter $\lambda=5$

In [15]:
import time
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.9

## Random Missing (RM)
dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
np.random.seed(1000)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim1, dim2 * dim3])
sparse_mat = sparse_tensor.reshape([dim1, dim2 * dim3])
del dense_tensor, sparse_tensor

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 7)
        rho = 1e-4
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

0.1
5
Total iteration: 48
Tolerance: 9.99477e-05
Imputation MAPE: 0.494564
Imputation RMSE: 101.208

Running time: 105 seconds

0.1
10
Total iteration: 51
Tolerance: 9.07879e-05
Imputation MAPE: 0.648042
Imputation RMSE: 108.852

Running time: 80 seconds

0.1
15
Total iteration: 58
Tolerance: 9.42627e-05
Imputation MAPE: 0.667801
Imputation RMSE: 108.158

Running time: 101 seconds

0.2
5
Total iteration: 49
Tolerance: 9.36387e-05
Imputation MAPE: 0.42179
Imputation RMSE: 91.7059

Running time: 86 seconds

0.2
10
Total iteration: 50
Tolerance: 9.79365e-05
Imputation MAPE: 0.560791
Imputation RMSE: 100.361

Running time: 120 seconds

0.2
15
Total iteration: 57
Tolerance: 9.8656e-05
Imputation MAPE: 0.575001
Imputation RMSE: 100.326

Running time: 107 seconds

1
5
Total iteration: 55
Tolerance: 8.99808e-05
Imputation MAPE: 0.329058
Imputation RMSE: 71.9563

Running time: 95 seconds

1
10
Total iteration: 56
Tolerance: 9.69177e-05
Imputation MAPE: 0.385003
Imputation RMSE: 78.7119

Running

Best parameters:

- Coefficient $c=10$
- Weight parameter $\lambda=5$

In [16]:
import time
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

## Non-random Missing (NM)
dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
np.random.seed(1000)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim3) + 0.5 - missing_rate)[:, None, :]
dense_mat = dense_tensor.reshape([dim1, dim2 * dim3])
sparse_mat = sparse_tensor.reshape([dim1, dim2 * dim3])
del dense_tensor, sparse_tensor

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 7)
        rho = 1e-4
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

0.1
5
Total iteration: 37
Tolerance: 9.75974e-05
Imputation MAPE: 0.236537
Imputation RMSE: 73.5825

Running time: 50 seconds

0.1
10
Total iteration: 41
Tolerance: 9.53985e-05
Imputation MAPE: 0.254801
Imputation RMSE: 80.5959

Running time: 54 seconds

0.1
15
Total iteration: 43
Tolerance: 8.96128e-05
Imputation MAPE: 0.277609
Imputation RMSE: 83.4242

Running time: 56 seconds

0.2
5
Total iteration: 35
Tolerance: 9.45346e-05
Imputation MAPE: 0.229312
Imputation RMSE: 67.0813

Running time: 51 seconds

0.2
10
Total iteration: 40
Tolerance: 8.68989e-05
Imputation MAPE: 0.243648
Imputation RMSE: 71.8384

Running time: 54 seconds

0.2
15
Total iteration: 42
Tolerance: 9.84784e-05
Imputation MAPE: 0.264876
Imputation RMSE: 73.4558

Running time: 55 seconds

1
5
Total iteration: 41
Tolerance: 9.33975e-05
Imputation MAPE: 0.223234
Imputation RMSE: 68.7565

Running time: 54 seconds

1
10
Total iteration: 44
Tolerance: 9.74703e-05
Imputation MAPE: 0.231276
Imputation RMSE: 71.4145

Running t

Best parameters:

- Coefficient $c=0.2$
- Weight parameter $\lambda=5$

In [17]:
import time
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.7

## Non-random Missing (NM)
dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
np.random.seed(1000)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim3) + 0.5 - missing_rate)[:, None, :]
dense_mat = dense_tensor.reshape([dim1, dim2 * dim3])
sparse_mat = sparse_tensor.reshape([dim1, dim2 * dim3])
del dense_tensor, sparse_tensor

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 7)
        rho = 1e-4
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

0.1
5
Total iteration: 43
Tolerance: 9.2328e-05
Imputation MAPE: 0.391182
Imputation RMSE: 114.653

Running time: 71 seconds

0.1
10
Total iteration: 45
Tolerance: 9.49186e-05
Imputation MAPE: 0.444395
Imputation RMSE: 116.755

Running time: 77 seconds

0.1
15
Total iteration: 47
Tolerance: 9.40615e-05
Imputation MAPE: 0.502293
Imputation RMSE: 120.034

Running time: 79 seconds

0.2
5
Total iteration: 46
Tolerance: 8.71626e-05
Imputation MAPE: 0.362834
Imputation RMSE: 106.14

Running time: 80 seconds

0.2
10
Total iteration: 45
Tolerance: 9.78469e-05
Imputation MAPE: 0.394065
Imputation RMSE: 111.096

Running time: 69 seconds

0.2
15
Total iteration: 47
Tolerance: 8.6772e-05
Imputation MAPE: 0.45361
Imputation RMSE: 114.598

Running time: 65 seconds

1
5
Total iteration: 50
Tolerance: 8.86142e-05
Imputation MAPE: 0.282958
Imputation RMSE: 80.1618

Running time: 69 seconds

1
10
Total iteration: 50
Tolerance: 9.49733e-05
Imputation MAPE: 0.327609
Imputation RMSE: 98.0196

Running time:

Best parameters:

- Coefficient $c=5$
- Weight parameter $\lambda=5$

In [18]:
import time
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

## Block-out Missing (BM)
dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape

dim_time = dim2 * dim3
block_window = 6
vec = np.random.rand(int(dim_time / block_window))
temp = np.array([vec] * block_window)
vec = temp.reshape([dim2 * dim3], order = 'F')

sparse_tensor = mat2ten(ten2mat(dense_tensor, 0) * np.round(vec + 0.5 - missing_rate)[None, :], np.array([dim1, dim2, dim3]), 0)
dense_mat = dense_tensor.reshape([dim1, dim2 * dim3])
sparse_mat = sparse_tensor.reshape([dim1, dim2 * dim3])
del dense_tensor, sparse_tensor

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 7)
        rho = 1e-4
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

0.1
5
Total iteration: 41
Tolerance: 8.84986e-05
Imputation MAPE: 0.990984
Imputation RMSE: 84.0888

Running time: 55 seconds

0.1
10
Total iteration: 41
Tolerance: 9.02685e-05
Imputation MAPE: 0.999072
Imputation RMSE: 83.0202

Running time: 54 seconds

0.1
15
Total iteration: 42
Tolerance: 8.00465e-05
Imputation MAPE: 0.996694
Imputation RMSE: 82.8555

Running time: 54 seconds

0.2
5
Total iteration: 41
Tolerance: 9.44289e-05
Imputation MAPE: 0.549392
Imputation RMSE: 70.3621

Running time: 55 seconds

0.2
10
Total iteration: 42
Tolerance: 8.83414e-05
Imputation MAPE: 0.533817
Imputation RMSE: 68.8891

Running time: 54 seconds

0.2
15
Total iteration: 42
Tolerance: 9.26899e-05
Imputation MAPE: 0.529892
Imputation RMSE: 68.5892

Running time: 55 seconds

1
5
Total iteration: 45
Tolerance: 9.38246e-05
Imputation MAPE: 0.320449
Imputation RMSE: 66.3201

Running time: 58 seconds

1
10
Total iteration: 46
Tolerance: 8.8802e-05
Imputation MAPE: 0.307794
Imputation RMSE: 66.0309

Running ti

Best parameters:

- Coefficient $c=1$
- Weight parameter $\lambda=10$

### Seattle freeway traffic speed data set

In [20]:
import numpy as np
import pandas as pd
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

## Random missing (RM)
dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0).values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288]).transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
np.random.seed(1000)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim1, dim2 * dim3])
sparse_mat = sparse_tensor.reshape([dim1, dim2 * dim3])
del dense_tensor, sparse_tensor

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 7)
        rho = 1e-4
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

0.1
5
Total iteration: 26
Tolerance: 1.56469e-05
Imputation MAPE: 0.0615075
Imputation RMSE: 3.78069

Running time: 431 seconds

0.1
10
Total iteration: 31
Tolerance: 6.65135e-05
Imputation MAPE: 0.0605066
Imputation RMSE: 3.751

Running time: 498 seconds

0.1
15
Total iteration: 34
Tolerance: 7.04919e-05
Imputation MAPE: 0.059849
Imputation RMSE: 3.73199

Running time: 542 seconds

0.2
5
Total iteration: 26
Tolerance: 2.51772e-05
Imputation MAPE: 0.0616754
Imputation RMSE: 3.78488

Running time: 443 seconds

0.2
10
Total iteration: 31
Tolerance: 6.87392e-05
Imputation MAPE: 0.0606254
Imputation RMSE: 3.75334

Running time: 578 seconds

0.2
15
Total iteration: 34
Tolerance: 7.01717e-05
Imputation MAPE: 0.0599216
Imputation RMSE: 3.73267

Running time: 618 seconds

1
5
Total iteration: 27
Tolerance: 8.57012e-05
Imputation MAPE: 0.0633312
Imputation RMSE: 3.82976

Running time: 457 seconds

1
10
Total iteration: 31
Tolerance: 9.14476e-05
Imputation MAPE: 0.0619627
Imputation RMSE: 3.7866

Best parameters:

- Coefficient $c=0.1$
- Weight parameter $\lambda=15$

In [21]:
import numpy as np
import pandas as pd
import scipy.io
np.random.seed(1000)

missing_rate = 0.7

## Random missing (RM)
dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0).values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288]).transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
np.random.seed(1000)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim1, dim2 * dim3])
sparse_mat = sparse_tensor.reshape([dim1, dim2 * dim3])
del dense_tensor, sparse_tensor

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 7)
        rho = 1e-4
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

0.1
5
Total iteration: 27
Tolerance: 9.57687e-05
Imputation MAPE: 0.0797933
Imputation RMSE: 4.71099

Running time: 502 seconds

0.1
10
Total iteration: 36
Tolerance: 8.91404e-05
Imputation MAPE: 0.0792212
Imputation RMSE: 4.72228

Running time: 588 seconds

0.1
15
Total iteration: 38
Tolerance: 9.82608e-05
Imputation MAPE: 0.0806456
Imputation RMSE: 4.83656

Running time: 736 seconds

0.2
5
Total iteration: 28
Tolerance: 7.79547e-05
Imputation MAPE: 0.0799515
Imputation RMSE: 4.71119

Running time: 568 seconds

0.2
10
Total iteration: 29
Tolerance: 9.76257e-05
Imputation MAPE: 0.0791482
Imputation RMSE: 4.71205

Running time: 615 seconds

0.2
15
Total iteration: 39
Tolerance: 8.97194e-05
Imputation MAPE: 0.0802693
Imputation RMSE: 4.80806

Running time: 723 seconds

1
5
Total iteration: 35
Tolerance: 9.07712e-05
Imputation MAPE: 0.0821485
Imputation RMSE: 4.75289

Running time: 552 seconds

1
10
Total iteration: 35
Tolerance: 9.59027e-05
Imputation MAPE: 0.0802372
Imputation RMSE: 4.7

Best parameters:

- Coefficient $c=1$
- Weight parameter $\lambda=10$

In [22]:
import numpy as np
import pandas as pd
import scipy.io
np.random.seed(1000)

missing_rate = 0.9

## Random missing (RM)
dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0).values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288]).transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
np.random.seed(1000)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim1, dim2 * dim3])
sparse_mat = sparse_tensor.reshape([dim1, dim2 * dim3])
del dense_tensor, sparse_tensor

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 7)
        rho = 1e-4
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

0.1
5
Total iteration: 43
Tolerance: 8.66792e-05
Imputation MAPE: 0.105945
Imputation RMSE: 6.08593

Running time: 832 seconds

0.1
10
Total iteration: 52
Tolerance: 8.70562e-05
Imputation MAPE: 0.123448
Imputation RMSE: 7.3116

Running time: 971 seconds

0.1
15
Total iteration: 53
Tolerance: 8.79003e-05
Imputation MAPE: 0.141236
Imputation RMSE: 8.57607

Running time: 950 seconds

0.2
5
Total iteration: 41
Tolerance: 8.8733e-05
Imputation MAPE: 0.105343
Imputation RMSE: 6.02993

Running time: 622 seconds

0.2
10
Total iteration: 52
Tolerance: 8.67689e-05
Imputation MAPE: 0.118301
Imputation RMSE: 6.93044

Running time: 858 seconds

0.2
15
Total iteration: 52
Tolerance: 9.49733e-05
Imputation MAPE: 0.133957
Imputation RMSE: 7.96725

Running time: 818 seconds

1
5
Total iteration: 44
Tolerance: 9.87288e-05
Imputation MAPE: 0.105591
Imputation RMSE: 5.908

Running time: 656 seconds

1
10
Total iteration: 50
Tolerance: 9.19702e-05
Imputation MAPE: 0.108513
Imputation RMSE: 6.16701

Runnin

Best parameters:

- Coefficient $c=1$
- Weight parameter $\lambda=5$

In [None]:
import numpy as np
import pandas as pd
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

## Non-random Missing (NM)
dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0).values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288]).transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
np.random.seed(1000)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim3) + 0.5 - missing_rate)[:, None, :]
dense_mat = dense_tensor.reshape([dim1, dim2 * dim3])
sparse_mat = sparse_tensor.reshape([dim1, dim2 * dim3])
del dense_tensor, sparse_tensor

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 7)
        rho = 1e-4
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

Best parameters:

- Coefficient $c=0.2$
- Weight parameter $\lambda=15$

In [24]:
import numpy as np
import pandas as pd
import scipy.io
np.random.seed(1000)

missing_rate = 0.7

## Non-random Missing (NM)
dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0).values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288]).transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
np.random.seed(1000)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim3) + 0.5 - missing_rate)[:, None, :]
dense_mat = dense_tensor.reshape([dim1, dim2 * dim3])
sparse_mat = sparse_tensor.reshape([dim1, dim2 * dim3])
del dense_tensor, sparse_tensor

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 7)
        rho = 1e-4
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

0.1
5
Total iteration: 33
Tolerance: 9.36951e-05
Imputation MAPE: 0.0974706
Imputation RMSE: 5.60495

Running time: 580 seconds

0.1
10
Total iteration: 41
Tolerance: 8.87281e-05
Imputation MAPE: 0.10283
Imputation RMSE: 6.05378

Running time: 739 seconds

0.1
15
Total iteration: 44
Tolerance: 9.55696e-05
Imputation MAPE: 0.109031
Imputation RMSE: 6.57883

Running time: 780 seconds

0.2
5
Total iteration: 33
Tolerance: 8.66587e-05
Imputation MAPE: 0.0976847
Imputation RMSE: 5.59776

Running time: 602 seconds

0.2
10
Total iteration: 39
Tolerance: 9.98807e-05
Imputation MAPE: 0.101275
Imputation RMSE: 5.99241

Running time: 673 seconds

0.2
15
Total iteration: 44
Tolerance: 9.64849e-05
Imputation MAPE: 0.108827
Imputation RMSE: 6.50087

Running time: 709 seconds

1
5
Total iteration: 40
Tolerance: 8.86125e-05
Imputation MAPE: 0.100555
Imputation RMSE: 5.6334

Running time: 586 seconds

1
10
Total iteration: 40
Tolerance: 9.67059e-05
Imputation MAPE: 0.101868
Imputation RMSE: 5.78247

Ru

Best parameters:

- Coefficient $c=0.1$
- Weight parameter $\lambda=5$

In [25]:
import numpy as np
import pandas as pd
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

## Block-out Missing (BM)
dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0).values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288]).transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
block_window = 12
vec = np.random.rand(int(dim2 * dim3 / block_window))
temp = np.array([vec] * block_window)
vec = temp.reshape([dim2 * dim3], order = 'F')
sparse_tensor = mat2ten(dense_mat * np.round(vec + 0.5 - missing_rate)[None, :], np.array([dim1, dim2, dim3]), 0)

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 7)
        rho = 1e-4
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

0.1
5
Total iteration: 33
Tolerance: 9.38222e-05
Imputation MAPE: 0.282421
Imputation RMSE: 14.6145

Running time: 464 seconds

0.1
10
Total iteration: 41
Tolerance: 8.82233e-05
Imputation MAPE: 0.287119
Imputation RMSE: 15.0232

Running time: 574 seconds

0.1
15
Total iteration: 44
Tolerance: 9.52253e-05
Imputation MAPE: 0.290991
Imputation RMSE: 15.329

Running time: 623 seconds

0.2
5
Total iteration: 33
Tolerance: 8.66756e-05
Imputation MAPE: 0.281815
Imputation RMSE: 14.5643

Running time: 466 seconds

0.2
10
Total iteration: 39
Tolerance: 9.98613e-05
Imputation MAPE: 0.286323
Imputation RMSE: 14.9432

Running time: 550 seconds

0.2
15
Total iteration: 45
Tolerance: 8.77388e-05
Imputation MAPE: 0.289598
Imputation RMSE: 15.1825

Running time: 633 seconds

1
5
Total iteration: 40
Tolerance: 8.86121e-05
Imputation MAPE: 0.277965
Imputation RMSE: 14.2584

Running time: 564 seconds

1
10
Total iteration: 41
Tolerance: 8.94229e-05
Imputation MAPE: 0.280744
Imputation RMSE: 14.4375

Run

Best parameters:

- Coefficient $c=10$
- Weight parameter $\lambda=5$

### Portland highway traffic volume data set

In [None]:
import numpy as np
import pandas as pd
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

# Random Missing (RM)
dense_mat = np.load('../datasets/Portland-data-set/volume.npy')
dim1, dim2 = dense_mat.shape
dim = np.array([dim1, 96, 31])
np.random.seed(1000)
sparse_mat = dense_mat * np.round(np.random.rand(dim1, dim2) + 0.5 - missing_rate)

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 5)
        rho = 1e-5
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

In [None]:
import numpy as np
import pandas as pd
import scipy.io
np.random.seed(1000)

missing_rate = 0.7

# Random Missing (RM)
dense_mat = np.load('../datasets/Portland-data-set/volume.npy')
dim1, dim2 = dense_mat.shape
dim = np.array([dim1, 96, 31])
np.random.seed(1000)
sparse_mat = dense_mat * np.round(np.random.rand(dim1, dim2) + 0.5 - missing_rate)

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 5)
        rho = 1e-5
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

In [14]:
import numpy as np
import pandas as pd
import scipy.io
np.random.seed(1000)

missing_rate = 0.9

# Random Missing (RM)
dense_mat = np.load('../datasets/Portland-data-set/volume.npy')
dim1, dim2 = dense_mat.shape
dim = np.array([dim1, 96, 31])
np.random.seed(1000)
sparse_mat = dense_mat * np.round(np.random.rand(dim1, dim2) + 0.5 - missing_rate)

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 5)
        rho = 1e-5
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

0.1
5
Total iteration: 57
Tolerance: 9.06487e-05
Imputation MAPE: 0.279643
Imputation RMSE: 27.6974

Running time: 1137 seconds

0.1
10
Total iteration: 59
Tolerance: 9.8772e-05
Imputation MAPE: 0.284015
Imputation RMSE: 30.3293

Running time: 1156 seconds

0.1
15
Total iteration: 61
Tolerance: 9.90556e-05
Imputation MAPE: 0.294737
Imputation RMSE: 32.0589

Running time: 1156 seconds

0.2
5
Total iteration: 56
Tolerance: 8.93487e-05
Imputation MAPE: 0.278623
Imputation RMSE: 27.2739

Running time: 1049 seconds

0.2
10
Total iteration: 58
Tolerance: 9.60742e-05
Imputation MAPE: 0.281845
Imputation RMSE: 29.0939

Running time: 1087 seconds

0.2
15
Total iteration: 61
Tolerance: 9.09197e-05
Imputation MAPE: 0.28897
Imputation RMSE: 29.8038

Running time: 1175 seconds

1
5
Total iteration: 57
Tolerance: 9.35199e-05
Imputation MAPE: 0.274377
Imputation RMSE: 25.9512

Running time: 1220 seconds

1
10
Total iteration: 58
Tolerance: 9.15977e-05
Imputation MAPE: 0.272092
Imputation RMSE: 25.595

Best parameters:

- Coefficient $c=5$
- Weight parameter $\lambda=15$

In [15]:
import numpy as np
import pandas as pd
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

# Non-random Missing (NM)
dense_mat = np.load('../datasets/Portland-data-set/volume.npy')
dim1, dim2 = dense_mat.shape
dim = np.array([dim1, 96, 31])
dense_tensor = mat2ten(dense_mat, dim, 0)
np.random.seed(1000)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim[2]) + 0.5 - missing_rate)[:, None, :]
sparse_mat = ten2mat(sparse_tensor, 0)
del dense_tensor, sparse_tensor

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 5)
        rho = 1e-5
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

0.1
5
Total iteration: 41
Tolerance: 8.59493e-05
Imputation MAPE: 0.201585
Imputation RMSE: 21.7109

Running time: 775 seconds

0.1
10
Total iteration: 49
Tolerance: 9.38177e-05
Imputation MAPE: 0.203197
Imputation RMSE: 26.8456

Running time: 922 seconds

0.1
15
Total iteration: 49
Tolerance: 9.68219e-05
Imputation MAPE: 0.200768
Imputation RMSE: 24.6295

Running time: 924 seconds

0.2
5
Total iteration: 45
Tolerance: 8.74286e-05
Imputation MAPE: 0.201804
Imputation RMSE: 21.6384

Running time: 844 seconds

0.2
10
Total iteration: 49
Tolerance: 9.21641e-05
Imputation MAPE: 0.203391
Imputation RMSE: 25.8576

Running time: 918 seconds

0.2
15
Total iteration: 50
Tolerance: 8.6589e-05
Imputation MAPE: 0.200926
Imputation RMSE: 26.267

Running time: 936 seconds

1
5
Total iteration: 41
Tolerance: 9.99243e-05
Imputation MAPE: 0.202043
Imputation RMSE: 20.9876

Running time: 764 seconds

1
10
Total iteration: 49
Tolerance: 9.75457e-05
Imputation MAPE: 0.201701
Imputation RMSE: 23.6411

Runn

Best parameters:

- Coefficient $c=10$
- Weight parameter $\lambda=5$

In [11]:
import numpy as np
import pandas as pd
import scipy.io
np.random.seed(1000)

missing_rate = 0.7

# Non-random Missing (NM)
dense_mat = np.load('../datasets/Portland-data-set/volume.npy')
dim1, dim2 = dense_mat.shape
dim = np.array([dim1, 96, 31])
dense_tensor = mat2ten(dense_mat, dim, 0)
np.random.seed(1000)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim[2]) + 0.5 - missing_rate)[:, None, :]
sparse_mat = ten2mat(sparse_tensor, 0)
del dense_tensor, sparse_tensor

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 5)
        rho = 1e-5
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

0.1
5
Total iteration: 49
Tolerance: 8.92426e-05
Imputation MAPE: 0.263487
Imputation RMSE: 30.2899

Running time: 1029 seconds

0.1
10
Total iteration: 52
Tolerance: 9.15674e-05
Imputation MAPE: 0.262784
Imputation RMSE: 32.8445

Running time: 1142 seconds

0.1
15
Total iteration: 53
Tolerance: 9.79261e-05
Imputation MAPE: 0.265488
Imputation RMSE: 33.4139

Running time: 1084 seconds

0.2
5
Total iteration: 49
Tolerance: 9.59027e-05
Imputation MAPE: 0.27235
Imputation RMSE: 51.2019

Running time: 987 seconds

0.2
10
Total iteration: 52
Tolerance: 9.50635e-05
Imputation MAPE: 0.262761
Imputation RMSE: 37.9079

Running time: 1072 seconds

0.2
15
Total iteration: 52
Tolerance: 9.71327e-05
Imputation MAPE: 0.266855
Imputation RMSE: 39.8665

Running time: 999 seconds

1
5
Total iteration: 50
Tolerance: 8.62551e-05
Imputation MAPE: 0.267448
Imputation RMSE: 45.1185

Running time: 986 seconds

1
10
Total iteration: 51
Tolerance: 9.33135e-05
Imputation MAPE: 0.263259
Imputation RMSE: 34.2221


Best parameters:

- Coefficient $c=10$
- Weight parameter $\lambda=5$

In [None]:
import time
import numpy as np
import pandas as pd
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

## Block-out Missing (BM)
dense_mat = np.load('../datasets/Portland-data-set/volume.npy')
dim1, dim2 = dense_mat.shape
dim = np.array([dim1, 96, 31])
block_window = 4
vec = np.random.rand(int(dim2 / block_window))
temp = np.array([vec] * block_window)
vec = temp.reshape([dim2], order = 'F')
sparse_mat = dense_mat * np.round(vec + 0.5 - missing_rate)[None, :]

for c in [1/10, 1/5, 1, 5, 10]:
    for theta in [5, 10, 15]:
        start = time.time()
        time_lags = np.arange(1, 5)
        rho = 1e-5
        lambda0 = c * rho
        print(c)
        print(theta)
        epsilon = 1e-4
        maxiter = 100
        mat_hat = imputer(dense_mat, sparse_mat, time_lags, rho, lambda0, theta, epsilon, maxiter)
        end = time.time()
        print('Running time: %d seconds'%(end - start))
        print()

### License

<div class="alert alert-block alert-danger">
<b>This work is released under the MIT license.</b>
</div>