## Low-Tubal-Rank Autoregressive Tensor Completion Imputer (LATC-Tubal-imputer)

This notebook shows how to implement the LATC-Tubal imputer on real-world, large-scale data sets. To impute missing values in multivariate time series data, the method combines a low-rank structure with time series autoregression. To keep the model scalable, it also integrates a linear (unitary) transform into the LATC model. For an in-depth discussion of LATC-Tubal-imputer, please see [1].

<div class="alert alert-block alert-info">
<font color="black">
<b>[1]</b> Xinyu Chen, Yixian Chen, Lijun Sun (2020). <b>Scalable low-rank autoregressive tensor learning for spatiotemporal traffic data imputation</b>. arXiv: 2008.03194. <a href="https://arxiv.org/abs/2008.03194" title="PDF"><b>[PDF]</b></a> <a href="https://doi.org/10.5281/zenodo.3939792" title="data"><b>[data]</b></a> 
</font>
</div>


In [1]:
import numpy as np
from numpy.linalg import inv as inv

### Define LATC-imputer kernel

We start by introducing some necessary functions that rely on `NumPy`.

<div class="alert alert-block alert-warning">
<ul>
<li><b><code>ten2mat</code>:</b> <font color="black">Unfold tensor as matrix by specifying mode.</font></li>
<li><b><code>mat2ten</code>:</b> <font color="black">Fold matrix as tensor by specifying dimension (i.e., tensor size) and mode.</font></li>
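<li><b><code>tsvt</code>:</b> <font color="black">Implement tensor Singular Value Thresholding (SVT): apply <code>unitary_transform</code> along the third mode, threshold the singular values of each frontal slice, and map back with <code>inv_unitary_transform</code>.</font></li>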
</ul>
</div>

In [2]:
def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')

def mat2ten(mat, dim, mode):
    index = list()
    index.append(mode)
    for i in range(dim.shape[0]):
        if i != mode:
            index.append(i)
    return np.moveaxis(np.reshape(mat, list(dim[index]), order = 'F'), 0, mode)
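
As a quick sanity check, the following sketch folds and unfolds a small toy tensor along every mode and confirms that `mat2ten` undoes `ten2mat`. The tensor `toy` and its shape are made up for illustration; the two functions are repeated so the snippet runs on its own.

```python
import numpy as np

def ten2mat(tensor, mode):
    # Unfold: bring `mode` to the front, then flatten the rest in Fortran order.
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order='F')

def mat2ten(mat, dim, mode):
    # Fold: reshape in Fortran order, then move the leading axis back to `mode`.
    index = [mode] + [i for i in range(dim.shape[0]) if i != mode]
    return np.moveaxis(np.reshape(mat, list(dim[index]), order='F'), 0, mode)

toy = np.arange(24, dtype=float).reshape(2, 3, 4)   # hypothetical toy tensor
dim = np.array(toy.shape)

round_trip_ok = []
for mode in range(3):
    unfolded = ten2mat(toy, mode)                    # shape: (dim[mode], product of the others)
    round_trip_ok.append(np.array_equal(mat2ten(unfolded, dim, mode), toy))

print(round_trip_ok)
```

Note that `mat2ten` expects `dim` to be a NumPy array (it uses `dim.shape` and fancy indexing), not a plain tuple.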

In [3]:
def unitary_transform(tensor, Phi):
    return np.einsum('kt, ijk -> ijt', Phi, tensor)

def inv_unitary_transform(tensor, Phi):
    return np.einsum('kt, ijt -> ijk', Phi, tensor)
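
The two transforms above are inverses of each other only when `Phi` is orthogonal (unitary). The sketch below checks this on random data; the orthogonal matrix is generated with a QR decomposition purely for illustration, whereas the notebook derives `Phi` from the data inside the imputer kernel.

```python
import numpy as np

def unitary_transform(tensor, Phi):
    return np.einsum('kt, ijk -> ijt', Phi, tensor)

def inv_unitary_transform(tensor, Phi):
    return np.einsum('kt, ijt -> ijk', Phi, tensor)

rng = np.random.default_rng(0)
toy = rng.standard_normal((3, 4, 5))                 # hypothetical toy tensor
Phi, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # random orthogonal matrix (assumption)

transformed = unitary_transform(toy, Phi)            # mixes the third mode with Phi
recovered = inv_unitary_transform(transformed, Phi)  # applies Phi again to undo the mixing

print(np.allclose(recovered, toy))
print(np.isclose(np.linalg.norm(transformed), np.linalg.norm(toy)))
```

Because `Phi` is orthogonal, the transform also preserves the Frobenius norm of the tensor, which the second check illustrates.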

In [4]:
def tsvt(tensor, Phi, tau):
    dim = tensor.shape
    X = np.zeros(dim)
    tensor = unitary_transform(tensor, Phi)
    for t in range(dim[2]):
        u, s, v = np.linalg.svd(tensor[:, :, t], full_matrices = False)
        r = len(np.where(s > tau)[0])
        if r >= 1:
            s = s[: r]
            s[: r] = s[: r] - tau
            X[:, :, t] = u[:, : r] @ np.diag(s) @ v[: r, :]
    return inv_unitary_transform(X, Phi)
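
To see what `tsvt` does, the sketch below applies it to a small random tensor and compares singular values in the transformed domain before and after thresholding. The toy tensor, the random orthogonal `Phi`, and the value of `tau` are assumptions chosen for illustration; the functions are copied from the cells above so the snippet is self-contained.

```python
import numpy as np

def unitary_transform(tensor, Phi):
    return np.einsum('kt, ijk -> ijt', Phi, tensor)

def inv_unitary_transform(tensor, Phi):
    return np.einsum('kt, ijt -> ijk', Phi, tensor)

def tsvt(tensor, Phi, tau):
    dim = tensor.shape
    X = np.zeros(dim)
    tensor = unitary_transform(tensor, Phi)
    for t in range(dim[2]):
        u, s, v = np.linalg.svd(tensor[:, :, t], full_matrices=False)
        r = len(np.where(s > tau)[0])
        if r >= 1:
            s = s[:r] - tau                          # soft-threshold the kept singular values
            X[:, :, t] = u[:, :r] @ np.diag(s) @ v[:r, :]
    return inv_unitary_transform(X, Phi)

rng = np.random.default_rng(0)
toy = rng.standard_normal((4, 5, 3))                 # hypothetical toy tensor
Phi, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random orthogonal matrix (assumption)
tau = 0.5

shrunk = tsvt(toy, Phi, tau)
before = np.linalg.svd(unitary_transform(toy, Phi)[:, :, 0], compute_uv=False)
after = np.linalg.svd(unitary_transform(shrunk, Phi)[:, :, 0], compute_uv=False)

# Each singular value of a transformed slice is reduced by tau and clipped at zero.
print(np.allclose(after, np.maximum(before - tau, 0)))
```

With `tau = 0` nothing is thresholded, so `tsvt(toy, Phi, 0.0)` returns the input tensor up to floating-point error.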

<div class="alert alert-block alert-warning">
<ul>
<li><b><code>compute_mape</code>:</b> <font color="black">Compute the value of Mean Absolute Percentage Error (MAPE).</font></li>
<li><b><code>compute_rmse</code>:</b> <font color="black">Compute the value of Root Mean Square Error (RMSE).</font></li>
</ul>
</div>

> Note that $$\mathrm{MAPE}=\frac{1}{n} \sum_{i=1}^{n} \frac{\left|y_{i}-\hat{y}_{i}\right|}{y_{i}}, \quad\mathrm{RMSE}=\sqrt{\frac{1}{n} \sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}},$$ where $n$ is the total number of estimated values, and $y_i$ and $\hat{y}_i$ are the actual value and its estimation, respectively. The code in this notebook reports MAPE as a fraction; multiply it by 100 to express it as a percentage.

In [5]:
def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])
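
A small worked example may help when reading the logs later in this notebook. The three speed values below are made up; as written, `compute_mape` returns a fraction, so a value of 0.05 corresponds to 5%.

```python
import numpy as np

def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])

var = np.array([50.0, 60.0, 40.0])       # hypothetical ground-truth speeds
var_hat = np.array([45.0, 63.0, 40.0])   # hypothetical imputed speeds

# Relative errors are 5/50, 3/60 and 0/40, so the mean is (0.10 + 0.05 + 0) / 3.
mape = compute_mape(var, var_hat)
# Squared errors are 25, 9 and 0, so RMSE is sqrt(34 / 3).
rmse = compute_rmse(var, var_hat)

print('MAPE: {:.6}'.format(mape))
print('RMSE: {:.6}'.format(rmse))
```

Both functions expect 1-D arrays of non-zero ground-truth values, which is what `dense_tensor[pos_test]` provides inside the imputer.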

The main idea behind LATC-imputer is to approximate partially observed data with both low-rank structure and time series dynamics. The following `imputer` kernel includes some necessary inputs:

<div class="alert alert-block alert-warning">
<ul>
<li><b><code>dense_tensor</code>:</b> <font color="black">This is an input which has the ground truth for validation. If this input is not available, you could use <code>dense_tensor = sparse_tensor.copy()</code> instead.</font></li>
<li><b><code>sparse_tensor</code>:</b> <font color="black">This is a partially observed tensor which has many missing entries.</font></li>
<li><b><code>time_lags</code>:</b> <font color="black">Time lags, e.g., <code>time_lags = np.array([1, 2, 3])</code>. </font></li>
<li><b><code>rho0</code>:</b> <font color="black">Initial penalty parameter for ADMM, e.g., <code>rho0 = 0.0005</code>. The kernel increases it by 5% at every iteration, up to a cap of <code>1e5</code>. </font></li>
<li><b><code>lambda0</code>:</b> <font color="black">Weight for time series regressor, e.g., <code>lambda0 = 5 * rho</code>. If <code>lambda0 = 0</code>, the autoregressive term is switched off and the imputer reduces to a pure low-rank tensor completion based on the transformed tensor nuclear norm (the counterpart of High-accuracy Low-Rank Tensor Completion, or HaLRTC, in the original LATC-imputer).</font></li>
<li><b><code>epsilon</code>:</b> <font color="black">Stopping criterion on the relative change between iterations, e.g., <code>epsilon = 0.001</code>. </font></li>
<li><b><code>maxiter</code>:</b> <font color="black">Maximum number of iterations, e.g., <code>maxiter = 50</code>. </font></li>
</ul>
</div>


In [6]:
def imputer(dense_tensor, sparse_tensor, time_lags, rho0, lambda0, epsilon, maxiter):
    """Low-Tubal-Rank Autoregressive Tensor Completion, LATC-Tubal-imputer."""
    dim = np.array(sparse_tensor.shape)
    dim_time = int(np.prod(dim) / dim[0])     # np.int was removed in recent NumPy releases
    d = len(time_lags)
    max_lag = np.max(time_lags)
    sparse_mat = ten2mat(sparse_tensor, 0)
    pos_missing = np.where(sparse_mat == 0)
    pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
    
    T = np.zeros(dim)                         # \boldsymbol{\mathcal{T}}
    Z = sparse_mat.copy()                     # \boldsymbol{Z}
    Z[pos_missing] = np.mean(sparse_mat[sparse_mat != 0])
    A = 0.001 * np.random.rand(dim[0], d)     # \boldsymbol{A}
    it = 0
    ind = np.zeros((d, dim_time - max_lag), dtype = np.int_)
    for i in range(d):
        ind[i, :] = np.arange(max_lag - time_lags[i], dim_time - time_lags[i])
    last_mat = sparse_mat.copy()
    snorm = np.linalg.norm(sparse_mat, 'fro')
    rho = rho0
    temp1 = ten2mat(mat2ten(Z, dim, 0), 2)
    _, Phi = np.linalg.eig(temp1 @ temp1.T)
    del temp1
    if (dim_time > 5e3) and (dim_time <= 1e4):
        sample_rate = 0.2
    elif dim_time > 1e4:
        sample_rate = 0.1
    while True:
        rho = min(rho * 1.05, 1e5)
        X = tsvt(mat2ten(Z, dim, 0) - T / rho, Phi, 1 / rho)
        mat_hat = ten2mat(X, 0)
        mat0 = np.zeros((dim[0], dim_time - max_lag))
        temp2 = ten2mat(rho * X + T, 0)
        if lambda0 > 0:
            if dim_time <= 5e3:
                for m in range(dim[0]):
                    Qm = mat_hat[m, ind].T
                    A[m, :] = np.linalg.pinv(Qm) @ Z[m, max_lag :]
                    mat0[m, :] = Qm @ A[m, :]
            elif dim_time > 5e3:
                for m in range(dim[0]):
                    idx = np.arange(0, dim_time - max_lag)
                    np.random.shuffle(idx)
                    idx = idx[: int(sample_rate * (dim_time - max_lag))]
                    Qm = mat_hat[m, ind].T
                    A[m, :] = np.linalg.pinv(Qm[idx[:], :]) @ Z[m, max_lag:][idx[:]]
                    mat0[m, :] = Qm @ A[m, :]
            Z[pos_missing] = np.append((temp2[:, : max_lag] / rho), (temp2[:, max_lag :] + lambda0 * mat0) 
                                       / (rho + lambda0), axis = 1)[pos_missing]
        else:
            Z[pos_missing] = temp2[pos_missing] / rho
        T = T + rho * (X - mat2ten(Z, dim, 0))
        tol = np.linalg.norm((mat_hat - last_mat), 'fro') / snorm
        last_mat = mat_hat.copy()
        it += 1
        if it % 10 == 0:
            temp1 = ten2mat(mat2ten(Z, dim, 0) - T / rho, 2)
            _, Phi = np.linalg.eig(temp1 @ temp1.T)
            del temp1
        if it % 1 == 0:
            print('Iter: {}'.format(it))
            print('Tolerance: {:.6}'.format(tol))
            var = dense_tensor[pos_test]
            var_hat = X[pos_test]
            print('MAPE: {:.6}'.format(compute_mape(var, var_hat)))
            print('RMSE: {:.6}'.format(compute_rmse(var, var_hat)))
            print()
        if (tol < epsilon) or (it >= maxiter):
            break

    print('Total iteration: {}'.format(it))
    print('Tolerance: {:.6}'.format(tol))
    var = dense_tensor[pos_test]
    var_hat = X[pos_test]
    print('Imputation MAPE: {:.6}'.format(compute_mape(var, var_hat)))
    print('Imputation RMSE: {:.6}'.format(compute_rmse(var, var_hat)))
    print()
    
    return X
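
Inside the kernel, `time_lags` is turned into the index matrix `ind`, and each row of `mat_hat[m, ind].T` collects the lagged values used to predict one time step. The sketch below reproduces that construction on a single toy series and fits the autoregressive coefficients with a pseudo-inverse, as the kernel does per row. The lags, the series, and `true_coef` are invented for this demonstration and are not taken from the notebook's experiments.

```python
import numpy as np

time_lags = np.array([1, 3])        # toy lags (the experiments below use 12 lags)
true_coef = np.array([0.5, 0.3])    # made-up AR coefficients for this demo
dim_time = 12
d = len(time_lags)
max_lag = np.max(time_lags)

# Build a series that follows x_t = 0.5 * x_{t-1} + 0.3 * x_{t-3} exactly.
series = np.zeros(dim_time)
series[:max_lag] = [1.0, 2.0, 3.0]
for t in range(max_lag, dim_time):
    series[t] = true_coef @ series[t - time_lags]

# Same index construction as in the imputer kernel.
ind = np.zeros((d, dim_time - max_lag), dtype=np.int_)
for i in range(d):
    ind[i, :] = np.arange(max_lag - time_lags[i], dim_time - time_lags[i])

Qm = series[ind].T                                 # row j holds x_{t-1}, x_{t-3} for t = max_lag + j
coef = np.linalg.pinv(Qm) @ series[max_lag:]       # least-squares AR fit

print(Qm.shape)
print(np.round(coef, 6))
```

Because the toy series is generated without noise, the least-squares fit recovers the coefficients used to build it; on real data the fit is only approximate.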

To choose reasonable values for these parameters, tune them by cross-validation on your own data set.
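
One possible way to set up such a validation, sketched under assumptions: hide a further fraction of the observed entries, run the imputer on the reduced tensor, and compare parameter settings on the hidden entries. The helper `make_validation_split`, its arguments, and the toy data are hypothetical and not part of the original notebook.

```python
import numpy as np

def make_validation_split(sparse_tensor, holdout_rate=0.1, seed=0):
    """Hypothetical helper: hide a fraction of the observed (non-zero) entries."""
    rng = np.random.default_rng(seed)
    observed = sparse_tensor != 0
    holdout = observed & (rng.random(sparse_tensor.shape) < holdout_rate)
    train_tensor = sparse_tensor.copy()
    train_tensor[holdout] = 0                      # zeros mark missing entries, as in the notebook
    return train_tensor, holdout

# Toy stand-in for a partially observed tensor (sensor x time of day x day).
toy = np.random.default_rng(1).uniform(20, 70, size=(5, 6, 4))
toy[0, :, 0] = 0                                   # pretend this fiber is already missing

train_tensor, holdout = make_validation_split(toy, holdout_rate=0.2)
print(int(holdout.sum()), 'entries held out for validation')

# With the imputer defined above, a search over candidate values could look like:
# for rho in [1e-5, 1e-4, 1e-3]:
#     tensor_hat = imputer(toy, train_tensor, time_lags, rho, 5 * rho, epsilon, maxiter)
```

Passing the original partially observed tensor as `dense_tensor` makes the kernel's printed MAPE and RMSE refer to the held-out entries only, since `pos_test` selects positions that are non-zero in the first argument and zero in the second.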

### California data - 4W

We generate **random missing (RM)** values on the California (PeMS) traffic speed data set.

In [7]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_tensor = np.random.rand(data.values.shape[0], 288, 4 * 7)

missing_rate = 0.3

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

del data, random_tensor, binary_tensor
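
The masking rule in the cell above can be read as follows: `np.round(u + 0.5 - missing_rate)` equals 1 when the uniform draw `u` exceeds `missing_rate` and 0 otherwise, so each entry is kept with probability `1 - missing_rate`. The sketch below checks this on synthetic draws; the tensor size and the use of `np.random.default_rng` are choices made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(1000)
missing_rate = 0.3
random_tensor = rng.random((50, 288, 7))            # toy size, not the real data set

binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
threshold_mask = (random_tensor > missing_rate).astype(float)

agreement = np.mean(binary_tensor == threshold_mask)   # fraction of entries where both rules agree
observed_rate = binary_tensor.mean()                    # should be close to 1 - missing_rate

print('Agreement with thresholding: {:.4f}'.format(agreement))
print('Observed rate: {:.4f}'.format(observed_rate))
```

Multiplying the data by this mask sets the dropped entries to zero, which is why the imputer treats zeros as missing values.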

In [8]:
import time
start = time.time()
time_lags = np.array([1, 2, 3, 4, 5, 6, 286, 287, 288, 289, 290, 291])
rho = 1e-4
lambda0 = 5 * rho
epsilon = 1e-3
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, time_lags, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes' % ((end - start)/60.0))

Iter: 1
Tolerance: 0.647327
MAPE: 0.0899981
RMSE: 6.28607

Iter: 2
Tolerance: 0.0690296
MAPE: 0.0599087
RMSE: 4.81448

Iter: 3
Tolerance: 0.0344016
MAPE: 0.0518156
RMSE: 4.401

Iter: 4
Tolerance: 0.0243698
MAPE: 0.0464286
RMSE: 4.03114

Iter: 5
Tolerance: 0.0245212
MAPE: 0.0415564
RMSE: 3.66794

Iter: 6
Tolerance: 0.0195316
MAPE: 0.0384225
RMSE: 3.42873

Iter: 7
Tolerance: 0.0183613
MAPE: 0.0351059
RMSE: 3.17804

Iter: 8
Tolerance: 0.0165536
MAPE: 0.0324769
RMSE: 2.97633

Iter: 9
Tolerance: 0.0155779
MAPE: 0.0299752
RMSE: 2.77773

Iter: 10
Tolerance: 0.0140358
MAPE: 0.0279574
RMSE: 2.61118

Iter: 11
Tolerance: 0.0315772
MAPE: 0.0318502
RMSE: 2.81174

Iter: 12
Tolerance: 0.0148439
MAPE: 0.0279472
RMSE: 2.51987

Iter: 13
Tolerance: 0.0153364
MAPE: 0.0246935
RMSE: 2.29322

Iter: 14
Tolerance: 0.010391
MAPE: 0.0233228
RMSE: 2.18711

Iter: 15
Tolerance: 0.00865738
MAPE: 0.0221988
RMSE: 2.09393

Iter: 16
Tolerance: 0.00824002
MAPE: 0.0212484
RMSE: 2.01201

Iter: 17
Tolerance: 0.00752405
MAPE

In [9]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_tensor = np.random.rand(data.values.shape[0], 288, 4 * 7)

missing_rate = 0.7

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

del data, random_tensor, binary_tensor

In [10]:
import time
start = time.time()
time_lags = np.array([1, 2, 3, 4, 5, 6, 286, 287, 288, 289, 290, 291])
rho = 1e-4
lambda0 = 5 * rho
epsilon = 1e-3
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, time_lags, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes' % ((end - start)/60.0))

Iter: 1
Tolerance: 1.49565
MAPE: 0.111673
RMSE: 7.6865

Iter: 2
Tolerance: 0.0659524
MAPE: 0.0885881
RMSE: 6.42707

Iter: 3
Tolerance: 0.0626322
MAPE: 0.072184
RMSE: 5.4087

Iter: 4
Tolerance: 0.0543781
MAPE: 0.062498
RMSE: 4.9748

Iter: 5
Tolerance: 0.0381568
MAPE: 0.0586561
RMSE: 4.87148

Iter: 6
Tolerance: 0.0274268
MAPE: 0.055733
RMSE: 4.74114

Iter: 7
Tolerance: 0.0257445
MAPE: 0.0522394
RMSE: 4.49304

Iter: 8
Tolerance: 0.0277895
MAPE: 0.0486647
RMSE: 4.1938

Iter: 9
Tolerance: 0.0279944
MAPE: 0.0458032
RMSE: 3.94779

Iter: 10
Tolerance: 0.024903
MAPE: 0.0439645
RMSE: 3.79358

Iter: 11
Tolerance: 0.0526873
MAPE: 0.0458262
RMSE: 3.85918

Iter: 12
Tolerance: 0.0203436
MAPE: 0.0454827
RMSE: 3.83438

Iter: 13
Tolerance: 0.0194672
MAPE: 0.0424796
RMSE: 3.62383

Iter: 14
Tolerance: 0.0224751
MAPE: 0.0388502
RMSE: 3.37363

Iter: 15
Tolerance: 0.0223525
MAPE: 0.03592
RMSE: 3.17719

Iter: 16
Tolerance: 0.0198079
MAPE: 0.0339244
RMSE: 3.04197

Iter: 17
Tolerance: 0.0172621
MAPE: 0.0323508


We generate **non-random missing (NM)** values on the PeMS traffic speed data set. Then, we conduct the imputation experiment.

In [11]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 4 * 7)

missing_rate = 0.3

### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

del data, random_matrix, binary_tensor
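
In the NM scenario, one random draw per (sensor, day) pair decides whether that whole day of readings is kept or dropped, so missing values come in day-long blocks. The sketch below reproduces the double loop on toy dimensions and shows an equivalent broadcasting form; the sizes and the random generator are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1000)
n_sensors, n_intervals, n_days = 6, 288, 28         # toy number of sensors
missing_rate = 0.3
random_matrix = rng.random((n_sensors, n_days))

# Loop form, as in the notebook cell.
binary_loop = np.zeros((n_sensors, n_intervals, n_days))
for i1 in range(n_sensors):
    for i2 in range(n_days):
        binary_loop[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)

# Broadcasting form: compute one flag per (sensor, day) and repeat it over the day.
day_mask = np.round(random_matrix + 0.5 - missing_rate)
binary_vec = np.broadcast_to(day_mask[:, None, :], binary_loop.shape).copy()

print(np.array_equal(binary_loop, binary_vec))
# Every (sensor, day) fiber is either fully observed or fully missing.
print(np.all(binary_vec.min(axis=1) == binary_vec.max(axis=1)))
```

The broadcasting form avoids the Python-level double loop, which may matter for tensors with many sensors.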

In [12]:
import time
start = time.time()
time_lags = np.array([1, 2, 3, 4, 5, 6, 286, 287, 288, 289, 290, 291])
rho = 1e-4
lambda0 = 0.5 * rho
epsilon = 1e-3
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, time_lags, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 1
Tolerance: 0.645668
MAPE: 0.0918986
RMSE: 6.39562

Iter: 2
Tolerance: 0.0651272
MAPE: 0.064569
RMSE: 5.04311

Iter: 3
Tolerance: 0.0340665
MAPE: 0.0568177
RMSE: 4.71683

Iter: 4
Tolerance: 0.0265416
MAPE: 0.0539692
RMSE: 4.56606

Iter: 5
Tolerance: 0.0211723
MAPE: 0.0533843
RMSE: 4.50619

Iter: 6
Tolerance: 0.0185128
MAPE: 0.0540366
RMSE: 4.49844

Iter: 7
Tolerance: 0.0168523
MAPE: 0.0551118
RMSE: 4.52062

Iter: 8
Tolerance: 0.014793
MAPE: 0.0558892
RMSE: 4.54371

Iter: 9
Tolerance: 0.0130702
MAPE: 0.0562918
RMSE: 4.55667

Iter: 10
Tolerance: 0.0116926
MAPE: 0.0564333
RMSE: 4.56024

Iter: 11
Tolerance: 0.0280666
MAPE: 0.0570605
RMSE: 4.60032

Iter: 12
Tolerance: 0.0149335
MAPE: 0.0571899
RMSE: 4.60708

Iter: 13
Tolerance: 0.0111733
MAPE: 0.0572616
RMSE: 4.6082

Iter: 14
Tolerance: 0.00834148
MAPE: 0.0574338
RMSE: 4.61597

Iter: 15
Tolerance: 0.0074744
MAPE: 0.0575274
RMSE: 4.61977

Iter: 16
Tolerance: 0.00683718
MAPE: 0.0575498
RMSE: 4.61959

Iter: 17
Tolerance: 0.00607056
MAPE

In [13]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 4 * 7)

missing_rate = 0.7

### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

del data, random_matrix, binary_tensor

In [14]:
import time
start = time.time()
time_lags = np.array([1, 2, 3, 4, 5, 6, 286, 287, 288, 289, 290, 291])
rho = 1e-4
lambda0 = 0.5 * rho
epsilon = 1e-3
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, time_lags, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 1
Tolerance: 1.49254
MAPE: 0.111823
RMSE: 7.70025

Iter: 2
Tolerance: 0.0658237
MAPE: 0.0942654
RMSE: 6.62477

Iter: 3
Tolerance: 0.0633283
MAPE: 0.0814255
RMSE: 5.91185

Iter: 4
Tolerance: 0.0440083
MAPE: 0.0738461
RMSE: 5.56295

Iter: 5
Tolerance: 0.0285614
MAPE: 0.0695315
RMSE: 5.39384

Iter: 6
Tolerance: 0.0243852
MAPE: 0.0667351
RMSE: 5.28698

Iter: 7
Tolerance: 0.0218139
MAPE: 0.0650192
RMSE: 5.21544

Iter: 8
Tolerance: 0.0189064
MAPE: 0.0641718
RMSE: 5.16519

Iter: 9
Tolerance: 0.0174167
MAPE: 0.064123
RMSE: 5.13684

Iter: 10
Tolerance: 0.0169784
MAPE: 0.0646846
RMSE: 5.13167

Iter: 11
Tolerance: 0.0345356
MAPE: 0.0643317
RMSE: 5.05135

Iter: 12
Tolerance: 0.0161364
MAPE: 0.0649247
RMSE: 5.06307

Iter: 13
Tolerance: 0.0137539
MAPE: 0.0655172
RMSE: 5.07885

Iter: 14
Tolerance: 0.0121573
MAPE: 0.0659727
RMSE: 5.09141

Iter: 15
Tolerance: 0.0106798
MAPE: 0.0662878
RMSE: 5.10095

Iter: 16
Tolerance: 0.00974281
MAPE: 0.0664785
RMSE: 5.1072

Iter: 17
Tolerance: 0.00887178
MAPE: 

### California data - 8W

We generate **random missing (RM)** values on the California (PeMS) traffic speed data set.

In [15]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-8w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 8 * 7]), 0)
random_tensor = np.random.rand(data.values.shape[0], 288, 8 * 7)

missing_rate = 0.3

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

del data, random_tensor, binary_tensor
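
The `mat2ten` call in the cell above folds a sensor-by-time matrix into a sensor x time-of-day x day tensor. The sketch below illustrates the index mapping on synthetic numbers, assuming that each row of the CSV stores one sensor's readings in chronological order with 288 readings per day; the matrix contents are invented.

```python
import numpy as np

def mat2ten(mat, dim, mode):
    index = [mode] + [i for i in range(dim.shape[0]) if i != mode]
    return np.moveaxis(np.reshape(mat, list(dim[index]), order='F'), 0, mode)

n_sensors, n_intervals, n_days = 3, 288, 8 * 7
# Synthetic stand-in for data.values: every entry is unique, so positions can be traced.
mat = np.arange(n_sensors * n_intervals * n_days, dtype=float).reshape(n_sensors, -1)

tensor = mat2ten(mat, np.array([n_sensors, n_intervals, n_days]), 0)

sensor, interval, day = 2, 100, 5
# Column `day * n_intervals + interval` of the matrix lands at [sensor, interval, day].
print(tensor[sensor, interval, day] == mat[sensor, day * n_intervals + interval])
```

Under this layout, the lags 286 to 291 in `time_lags` sit around one full day (288 steps), while the lags 1 to 6 cover the most recent readings.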

In [16]:
import time
start = time.time()
time_lags = np.array([1, 2, 3, 4, 5, 6, 286, 287, 288, 289, 290, 291])
rho = 1e-4
lambda0 = 5 * rho
epsilon = 1e-3
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, time_lags, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 1
Tolerance: 0.649777
MAPE: 0.0876833
RMSE: 6.14259

Iter: 2
Tolerance: 0.070026
MAPE: 0.0571981
RMSE: 4.71698

Iter: 3
Tolerance: 0.0310767
MAPE: 0.0510803
RMSE: 4.40944

Iter: 4
Tolerance: 0.0219511
MAPE: 0.0466115
RMSE: 4.10429

Iter: 5
Tolerance: 0.0238603
MAPE: 0.0421061
RMSE: 3.75455

Iter: 6
Tolerance: 0.0206909
MAPE: 0.0389598
RMSE: 3.49763

Iter: 7
Tolerance: 0.0181997
MAPE: 0.0360279
RMSE: 3.26706

Iter: 8
Tolerance: 0.0174
MAPE: 0.0331661
RMSE: 3.04239

Iter: 9
Tolerance: 0.016257
MAPE: 0.0306099
RMSE: 2.83431

Iter: 10
Tolerance: 0.0144016
MAPE: 0.0285893
RMSE: 2.66436

Iter: 11
Tolerance: 0.0324301
MAPE: 0.0324726
RMSE: 2.85919

Iter: 12
Tolerance: 0.0152632
MAPE: 0.0284876
RMSE: 2.56016

Iter: 13
Tolerance: 0.0155099
MAPE: 0.0251744
RMSE: 2.32883

Iter: 14
Tolerance: 0.0106503
MAPE: 0.0237285
RMSE: 2.217

Iter: 15
Tolerance: 0.00884107
MAPE: 0.0225821
RMSE: 2.12273

Iter: 16
Tolerance: 0.00823973
MAPE: 0.0216343
RMSE: 2.04153

Iter: 17
Tolerance: 0.00756475
MAPE: 0.

In [17]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-8w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 8 * 7]), 0)
random_tensor = np.random.rand(data.values.shape[0], 288, 8 * 7)

missing_rate = 0.7

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

del data, random_tensor, binary_tensor

In [18]:
import time
start = time.time()
time_lags = np.array([1, 2, 3, 4, 5, 6, 286, 287, 288, 289, 290, 291])
rho = 1e-4
lambda0 = 5 * rho
epsilon = 1e-3
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, time_lags, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 1
Tolerance: 1.50177
MAPE: 0.114411
RMSE: 7.84667

Iter: 2
Tolerance: 0.0733078
MAPE: 0.0871832
RMSE: 6.27937

Iter: 3
Tolerance: 0.0690878
MAPE: 0.0683821
RMSE: 5.24177

Iter: 4
Tolerance: 0.0543815
MAPE: 0.0601464
RMSE: 4.93727

Iter: 5
Tolerance: 0.0364426
MAPE: 0.058073
RMSE: 4.91355

Iter: 6
Tolerance: 0.0251957
MAPE: 0.0557268
RMSE: 4.79312

Iter: 7
Tolerance: 0.0255973
MAPE: 0.0522814
RMSE: 4.54155

Iter: 8
Tolerance: 0.0278937
MAPE: 0.0490479
RMSE: 4.26969

Iter: 9
Tolerance: 0.0278595
MAPE: 0.046556
RMSE: 4.04071

Iter: 10
Tolerance: 0.0255182
MAPE: 0.044749
RMSE: 3.87351

Iter: 11
Tolerance: 0.0549496
MAPE: 0.0459392
RMSE: 3.91982

Iter: 12
Tolerance: 0.0229806
MAPE: 0.0463003
RMSE: 3.95228

Iter: 13
Tolerance: 0.0200112
MAPE: 0.0433907
RMSE: 3.74169

Iter: 14
Tolerance: 0.0241304
MAPE: 0.0393483
RMSE: 3.45239

Iter: 15
Tolerance: 0.024629
MAPE: 0.0360455
RMSE: 3.22315

Iter: 16
Tolerance: 0.0217436
MAPE: 0.0340846
RMSE: 3.08097

Iter: 17
Tolerance: 0.0182841
MAPE: 0.03

We generate **non-random missing (NM)** values on the California (PeMS) traffic speed data set. Then, we conduct the imputation experiment.

In [19]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-8w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 8 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 8 * 7)

missing_rate = 0.3

### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

del data, random_matrix, binary_tensor

In [20]:
import time
start = time.time()
time_lags = np.array([1, 2, 3, 4, 5, 6, 286, 287, 288, 289, 290, 291])
rho = 1e-4
lambda0 = 0.5 * rho
epsilon = 1e-3
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, time_lags, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 1
Tolerance: 0.647614
MAPE: 0.0890179
RMSE: 6.22464

Iter: 2
Tolerance: 0.0667528
MAPE: 0.0603672
RMSE: 4.8821

Iter: 3
Tolerance: 0.0309992
MAPE: 0.0541497
RMSE: 4.63106

Iter: 4
Tolerance: 0.0250838
MAPE: 0.0516975
RMSE: 4.50059

Iter: 5
Tolerance: 0.021561
MAPE: 0.0508511
RMSE: 4.42577

Iter: 6
Tolerance: 0.0185431
MAPE: 0.0515386
RMSE: 4.42017

Iter: 7
Tolerance: 0.0169387
MAPE: 0.052422
RMSE: 4.43529

Iter: 8
Tolerance: 0.0154603
MAPE: 0.0530191
RMSE: 4.44826

Iter: 9
Tolerance: 0.0132661
MAPE: 0.0533255
RMSE: 4.45664

Iter: 10
Tolerance: 0.0121098
MAPE: 0.0534091
RMSE: 4.45809

Iter: 11
Tolerance: 0.0293285
MAPE: 0.0539415
RMSE: 4.49082

Iter: 12
Tolerance: 0.0145783
MAPE: 0.0539238
RMSE: 4.49049

Iter: 13
Tolerance: 0.011676
MAPE: 0.0538161
RMSE: 4.48398

Iter: 14
Tolerance: 0.00866769
MAPE: 0.0539322
RMSE: 4.48995

Iter: 15
Tolerance: 0.00739479
MAPE: 0.0539995
RMSE: 4.49276

Iter: 16
Tolerance: 0.00690332
MAPE: 0.0540032
RMSE: 4.49179

Iter: 17
Tolerance: 0.00634713
MAPE

In [21]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-8w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 8 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 8 * 7)

missing_rate = 0.7

### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

del data, random_matrix, binary_tensor

In [22]:
import time
start = time.time()
time_lags = np.array([1, 2, 3, 4, 5, 6, 286, 287, 288, 289, 290, 291])
rho = 1e-4
lambda0 = 0.5 * rho
epsilon = 1e-3
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, time_lags, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 1
Tolerance: 1.50012
MAPE: 0.114658
RMSE: 7.85768

Iter: 2
Tolerance: 0.0733968
MAPE: 0.0919281
RMSE: 6.45897

Iter: 3
Tolerance: 0.0681923
MAPE: 0.0747051
RMSE: 5.55924

Iter: 4
Tolerance: 0.0483264
MAPE: 0.0656827
RMSE: 5.19655

Iter: 5
Tolerance: 0.0327675
MAPE: 0.0622502
RMSE: 5.09503

Iter: 6
Tolerance: 0.0243698
MAPE: 0.0602758
RMSE: 5.0326

Iter: 7
Tolerance: 0.0217602
MAPE: 0.0586976
RMSE: 4.9622

Iter: 8
Tolerance: 0.0212728
MAPE: 0.0576869
RMSE: 4.89451

Iter: 9
Tolerance: 0.0212193
MAPE: 0.0576729
RMSE: 4.85462

Iter: 10
Tolerance: 0.0190769
MAPE: 0.058626
RMSE: 4.85156

Iter: 11
Tolerance: 0.0365553
MAPE: 0.0587581
RMSE: 4.78112

Iter: 12
Tolerance: 0.0175795
MAPE: 0.0598376
RMSE: 4.81113

Iter: 13
Tolerance: 0.0145984
MAPE: 0.0605057
RMSE: 4.83005

Iter: 14
Tolerance: 0.0130442
MAPE: 0.0608484
RMSE: 4.83788

Iter: 15
Tolerance: 0.0118377
MAPE: 0.0610011
RMSE: 4.84133

Iter: 16
Tolerance: 0.0106785
MAPE: 0.0610421
RMSE: 4.84307

Iter: 17
Tolerance: 0.00955053
MAPE: 0.

### License

<div class="alert alert-block alert-danger">
<b>This work is released under the MIT license.</b>
</div>