## Low-Tubal-Rank Autoregressive Tensor Completion Imputer (LATC-Tubal-imputer)

This notebook shows how to implement a LATC imputer on some real-world data sets (e.g., PeMS traffic speed data, Guangzhou traffic speed data). To overcome the problem of missing values within multivariate time series data, this method takes into account both low-rank structure and time series regression. For an in-depth discussion of LATC-imputer, please see [1].

<div class="alert alert-block alert-info">
<font color="black">
<b>[1]</b> Author Name (2020). <b>Low-Rank Autoregressive Tensor Completion for Multivariate Time Series Forecasting</b>. arXiv.2006. <a href="xx" title="PDF"><b>[PDF]</b></a> 
</font>
</div>


In [1]:
import numpy as np
from numpy.linalg import inv as inv

### Define LATC-imputer kernel

We start by introducing some necessary functions that rely on `NumPy`.

<div class="alert alert-block alert-warning">
<ul>
<li><b><code>ten2mat</code>:</b> <font color="black">Unfold tensor as matrix by specifying mode.</font></li>
<li><b><code>mat2ten</code>:</b> <font color="black">Fold matrix as tensor by specifying dimension (i.e., tensor size) and mode.</font></li>
<li><b><code>tsvt</code>:</b> <font color="black">Implement tensor Singular Value Thresholding: apply a discrete cosine transform along the third mode, soft-threshold the singular values of each frontal slice, and invert the transform.</font></li>
</ul>
</div>

In [2]:
def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')

def mat2ten(mat, dim, mode):
    index = list()
    index.append(mode)
    for i in range(dim.shape[0]):
        if i != mode:
            index.append(i)
    return np.moveaxis(np.reshape(mat, list(dim[index]), order = 'F'), 0, mode)
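
As a quick sanity check, the sketch below confirms that `mat2ten` undoes `ten2mat` for every mode. The toy tensor and its shape `(2, 3, 4)` are assumptions made for illustration, and the two helpers are repeated so that the snippet runs on its own.

```python
import numpy as np

def ten2mat(tensor, mode):
    # Move the chosen mode to the front, then flatten the remaining modes (Fortran order).
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order='F')

def mat2ten(mat, dim, mode):
    # Reshape with the chosen mode first, then move it back to its original position.
    index = [mode] + [i for i in range(dim.shape[0]) if i != mode]
    return np.moveaxis(np.reshape(mat, list(dim[index]), order='F'), 0, mode)

toy = np.arange(24, dtype=float).reshape(2, 3, 4)   # toy tensor (assumption)
dim = np.array(toy.shape)

roundtrip_ok = []
for mode in range(toy.ndim):
    mat = ten2mat(toy, mode)
    roundtrip_ok.append(np.array_equal(mat2ten(mat, dim, mode), toy))
    print(mode, mat.shape, roundtrip_ok[-1])
```

Note that `mat2ten` expects `dim` as a NumPy array, because it indexes `dim` with a list of mode positions.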

In [3]:
from scipy.fftpack import dctn, idctn

def tsvt(tensor, tau):
    dim = tensor.shape
    X = np.zeros(dim)
    tensor = dctn(tensor, axes = (2,), norm = 'ortho')
    for t in range(dim[2]):
        u, s, v = np.linalg.svd(tensor[:, :, t], full_matrices = False)
        r = len(np.where(s > tau)[0])
        if r >= 1:
            s = s[: r]
            s[: r] = s[: r] - tau
            X[:, :, t] = u[:, : r] @ np.diag(s) @ v[: r, :]
    return idctn(X, axes = (2,), norm = 'ortho')
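
A small check of how the thresholding behaves at its two extremes, using a random toy tensor (shape and seed are assumptions). With `tau = 0` nothing is shrunk and the input should be recovered, while a very large `tau` removes every singular value. The function is restated in a slightly condensed form so that the snippet is self-contained.

```python
import numpy as np
from scipy.fftpack import dctn, idctn

def tsvt(tensor, tau):
    dim = tensor.shape
    X = np.zeros(dim)
    # Orthonormal DCT along the third mode.
    tensor = dctn(tensor, axes=(2,), norm='ortho')
    for t in range(dim[2]):
        u, s, v = np.linalg.svd(tensor[:, :, t], full_matrices=False)
        r = int(np.sum(s > tau))          # number of singular values that survive
        if r >= 1:
            X[:, :, t] = u[:, :r] @ np.diag(s[:r] - tau) @ v[:r, :]
    return idctn(X, axes=(2,), norm='ortho')

rng = np.random.default_rng(0)
toy = rng.random((4, 5, 6))               # toy tensor (assumption)

same = tsvt(toy, 0.0)                     # no shrinkage
zeros = tsvt(toy, 1e6)                    # everything is thresholded away
shrunk = tsvt(toy, 0.5)                   # partial shrinkage

print(np.allclose(same, toy), np.allclose(zeros, 0.0))
print(np.linalg.norm(shrunk) <= np.linalg.norm(toy))
```

Because the transform uses `norm = 'ortho'`, it preserves the Frobenius norm, so shrinking singular values in the transformed domain can only reduce the norm of the result.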

<div class="alert alert-block alert-warning">
<ul>
<li><b><code>compute_mape</code>:</b> <font color="black">Compute the value of Mean Absolute Percentage Error (MAPE).</font></li>
<li><b><code>compute_rmse</code>:</b> <font color="black">Compute the value of Root Mean Square Error (RMSE).</font></li>
</ul>
</div>

> Note that $$\mathrm{MAPE}=\frac{1}{n} \sum_{i=1}^{n} \frac{\left|y_{i}-\hat{y}_{i}\right|}{y_{i}} \times 100, \quad\mathrm{RMSE}=\sqrt{\frac{1}{n} \sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}},$$ where $n$ is the total number of estimated values, and $y_i$ and $\hat{y}_i$ are the actual value and its estimate, respectively. The `compute_mape` function below omits the factor of 100 and returns MAPE as a fraction.

In [4]:
def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    return  np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])
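
A worked example on three made-up values (the numbers are assumptions chosen so the arithmetic is easy to follow). Both functions expect one-dimensional arrays of strictly non-zero ground-truth values, which is what the kernel passes in.

```python
import numpy as np

def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])

var = np.array([10.0, 20.0, 40.0])        # toy ground truth
var_hat = np.array([11.0, 18.0, 40.0])    # toy estimates

mape = compute_mape(var, var_hat)         # (0.1 + 0.1 + 0.0) / 3
rmse = compute_rmse(var, var_hat)         # sqrt((1 + 4 + 0) / 3)
print('MAPE: {:.4f}, RMSE: {:.4f}'.format(mape, rmse))
```

The MAPE here is about 0.067, a fraction rather than a percentage, which matches the scale of the values printed in the experiment logs.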

The main idea behind LATC-imputer is to approximate partially observed data with both low-rank structure and time series dynamics. The following `imputer` kernel includes some necessary inputs:

<div class="alert alert-block alert-warning">
<ul>
<li><b><code>dense_tensor</code>:</b> <font color="black">This is an input which has the ground truth for validation. If this input is not available, you could use <code>dense_tensor = sparse_tensor.copy()</code> instead.</font></li>
<li><b><code>sparse_tensor</code>:</b> <font color="black">This is a partially observed tensor which has many missing entries.</font></li>
<li><b><code>time_lags</code>:</b> <font color="black">Time lags, e.g., <code>time_lags = np.array([1, 2, 3])</code>. </font></li>
<li><b><code>rho0</code>:</b> <font color="black">Initial value of the ADMM penalty parameter (learning rate), e.g., <code>rho0 = 0.0005</code>. The kernel multiplies it by 1.05 at every iteration, up to a cap of <code>1e5</code>.</font></li>
<li><b><code>lambda0</code>:</b> <font color="black">Weight for time series regressor, e.g., <code>lambda0 = 5 * rho</code>. If <code>lambda0 = 0</code>, then the autoregressive term is dropped and this imputer reduces to a plain low-tubal-rank tensor completion.</font></li>
<li><b><code>epsilon</code>:</b> <font color="black">Stopping criterion: the algorithm stops once the relative change between successive estimates falls below this value, e.g., <code>epsilon = 0.001</code>. </font></li>
<li><b><code>maxiter</code>:</b> <font color="black">Maximum number of iterations, e.g., <code>maxiter = 50</code>. </font></li>
</ul>
</div>
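
To make the `time_lags` input concrete, the sketch below reproduces the index matrix `ind` that the kernel builds from it, on a toy series. The lags `[1, 3]`, the length `8`, and the series values are assumptions for illustration.

```python
import numpy as np

time_lags = np.array([1, 3])      # toy lags (assumption)
dim_time = 8                      # toy number of time steps (assumption)
d = len(time_lags)
max_lag = np.max(time_lags)

# Same construction as in the kernel: row i holds the indices t - time_lags[i]
# for every target time t = max_lag, ..., dim_time - 1.
ind = np.zeros((d, dim_time - max_lag), dtype=np.int_)
for i in range(d):
    ind[i, :] = np.arange(max_lag - time_lags[i], dim_time - time_lags[i])

series = np.arange(dim_time) * 10.0   # toy series: the value at time t is 10 * t
Qm = series[ind].T                    # one row per target time, one column per lag
targets = series[max_lag:]            # values the autoregression tries to predict

print(ind)
print(Qm)
print(targets)
```

Each row of `Qm` lines up with one entry of `targets`: column `i` holds the value observed `time_lags[i]` steps earlier. This is the design matrix the kernel uses to fit the per-series coefficients `A[m, :]` by least squares.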


In [5]:
def imputer(dense_tensor, sparse_tensor, time_lags, rho0, lambda0, epsilon, maxiter):
    """Low-Tubal-Rank Autoregressive Tensor Completion, LATC-Tubal-imputer."""
    dim = np.array(sparse_tensor.shape)
    dim_time = int(np.prod(dim) / dim[0])
    d = len(time_lags)
    max_lag = np.max(time_lags)
    sparse_mat = ten2mat(sparse_tensor, 0)
    pos_missing = np.where(sparse_mat == 0)
    pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
    
    T = np.zeros(dim)                         # \boldsymbol{\mathcal{T}}
    Z = sparse_mat.copy()                     # \boldsymbol{Z}
    Z[pos_missing] = np.mean(sparse_mat[sparse_mat != 0])
    A = 0.001 * np.random.rand(dim[0], d)     # \boldsymbol{A}
    it = 0
    ind = np.zeros((d, dim_time - max_lag), dtype = np.int_)
    for i in range(d):
        ind[i, :] = np.arange(max_lag - time_lags[i], dim_time - time_lags[i])
    last_mat = sparse_mat.copy()
    snorm = np.linalg.norm(sparse_mat, 'fro')
    rho = rho0
    if (dim_time > 5e3) and (dim_time <= 1e4):
        sample_rate = 0.2
    elif dim_time > 1e4:
        sample_rate = 0.1
    while True:
        rho = min(rho * 1.05, 1e5)
        X = tsvt(mat2ten(Z, dim, 0) - T / rho, 1 / rho)
        mat_hat = ten2mat(X, 0)
        mat0 = np.zeros((dim[0], dim_time - max_lag))
        temp2 = ten2mat(rho * X + T, 0)
        if lambda0 > 0:
            if dim_time <= 5e3:
                for m in range(dim[0]):
                    Qm = mat_hat[m, ind].T
                    A[m, :] = np.linalg.pinv(Qm) @ Z[m, max_lag :]
                    mat0[m, :] = Qm @ A[m, :]
            elif dim_time > 5e3:
                for m in range(dim[0]):
                    idx = np.arange(0, dim_time - max_lag)
                    np.random.shuffle(idx)
                    idx = idx[: int(sample_rate * (dim_time - max_lag))]
                    Qm = mat_hat[m, ind].T
                    A[m, :] = np.linalg.pinv(Qm[idx[:], :]) @ Z[m, max_lag:][idx[:]]
                    mat0[m, :] = Qm @ A[m, :]
            Z[pos_missing] = np.append((temp2[:, : max_lag] / rho), (temp2[:, max_lag :] + lambda0 * mat0) 
                                       / (rho + lambda0), axis = 1)[pos_missing]
        else:
            Z[pos_missing] = temp2[pos_missing] / rho
        T = T + rho * (X - mat2ten(Z, dim, 0))
        tol = np.linalg.norm((mat_hat - last_mat), 'fro') / snorm
        last_mat = mat_hat.copy()
        it += 1
        if it % 1 == 0:
            print('Iter: {}'.format(it))
            print('Tolerance: {:.6}'.format(tol))
            var = dense_tensor[pos_test]
            var_hat = X[pos_test]
            print('MAPE: {:.6}'.format(compute_mape(var, var_hat)))
            print('RMSE: {:.6}'.format(compute_rmse(var, var_hat)))
            print()
        if (tol < epsilon) or (it >= maxiter):
            break

    print('Total iteration: {}'.format(it))
    print('Tolerance: {:.6}'.format(tol))
    var = dense_tensor[pos_test]
    var_hat = X[pos_test]
    print('Imputation MAPE: {:.6}'.format(compute_mape(var, var_hat)))
    print('Imputation RMSE: {:.6}'.format(compute_rmse(var, var_hat)))
    print()
    
    return X
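
Read off from the loop above (and not quoted from [1]), one iteration performs the updates below. The notation is an assumption chosen to mirror the code: $\boldsymbol{\mathcal{X}}$ is `X`, $\boldsymbol{Z}$ is `Z`, $\boldsymbol{\mathcal{T}}$ is `T`, $\boldsymbol{a}_m$ is `A[m, :]`, $\boldsymbol{Q}_m$ is `Qm`, $h$ is `max_lag`, $\lambda_0$ is `lambda0`, and $\boldsymbol{W}$ is `temp2`, the mode-0 unfolding of $\rho\boldsymbol{\mathcal{X}}+\boldsymbol{\mathcal{T}}$.

$$
\begin{aligned}
\rho &\leftarrow \min\left(1.05\,\rho,\ 10^{5}\right), \\
\boldsymbol{\mathcal{X}} &\leftarrow \operatorname{tsvt}\left(\operatorname{fold}(\boldsymbol{Z})-\boldsymbol{\mathcal{T}}/\rho,\ 1/\rho\right), \\
\boldsymbol{a}_{m} &\leftarrow \boldsymbol{Q}_{m}^{\dagger}\,\boldsymbol{z}_{m,\,h+1:} \quad \text{for each series } m, \\
z_{m,t} &\leftarrow
\begin{cases}
w_{m,t}/\rho, & t \leq h, \\
\left(w_{m,t}+\lambda_{0}\,\boldsymbol{q}_{m,t}^{\top}\boldsymbol{a}_{m}\right)/\left(\rho+\lambda_{0}\right), & t > h,
\end{cases}
\quad \text{for missing entries } (m,t), \\
\boldsymbol{\mathcal{T}} &\leftarrow \boldsymbol{\mathcal{T}}+\rho\left(\boldsymbol{\mathcal{X}}-\operatorname{fold}(\boldsymbol{Z})\right).
\end{aligned}
$$

Here $\boldsymbol{q}_{m,t}^{\top}$ is the row of $\boldsymbol{Q}_m$ holding the lagged values of the unfolded $\boldsymbol{\mathcal{X}}$ for target time $t$, and $\dagger$ denotes the pseudo-inverse. Observed entries of $\boldsymbol{Z}$ are never changed. When `lambda0 = 0`, the third update is skipped and every missing entry is set to $w_{m,t}/\rho$. When `dim_time` exceeds `5e3`, $\boldsymbol{a}_m$ is fitted on a random subset of the rows of $\boldsymbol{Q}_m$ (a share `sample_rate` of them) instead of all rows.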

To choose reasonable parameter values, please use cross-validation on your data set.
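
One possible way to set up such a validation run is sketched below; it is an assumption about the workflow, not a procedure given in the notebook. The helper `make_validation_split` is hypothetical: it hides a share of the observed entries, so that the kernel's own `pos_test` logic (ground truth non-zero, input zero) scores exactly those held-out entries.

```python
import numpy as np

def make_validation_split(sparse_tensor, holdout_rate=0.1, seed=0):
    """Hypothetical helper: hide a fraction of the observed (non-zero) entries."""
    rng = np.random.default_rng(seed)
    observed = sparse_tensor != 0
    holdout = observed & (rng.random(sparse_tensor.shape) < holdout_rate)
    train_tensor = sparse_tensor.copy()
    train_tensor[holdout] = 0             # 0 marks a missing entry, as in the kernel
    return train_tensor, holdout

# Toy partially observed tensor (assumption): positive values, about 30% set to zero.
rng = np.random.default_rng(1)
toy_sparse = rng.random((5, 6, 7)) + 1.0
toy_sparse[rng.random(toy_sparse.shape) < 0.3] = 0

train_tensor, holdout = make_validation_split(toy_sparse, holdout_rate=0.2)
print('Held-out entries:', int(holdout.sum()))

# For each candidate (rho, lambda0) one would then call, e.g.,
#   imputer(toy_sparse, train_tensor, time_lags, rho, lambda0, epsilon, maxiter)
# so that the MAPE/RMSE printed by the kernel are measured on the held-out entries only.
```

The candidate with the lowest held-out error can then be reused for the final run on the full partially observed tensor.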

### California data - 4W

We generate **random missing (RM)** values on the California traffic speed data set.

In [6]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('F:/PeMS/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_tensor = np.random.rand(data.values.shape[0], 288, 4 * 7)

missing_rate = 0.3

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_tensor, binary_tensor
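
The rounding trick used for the RM mask can be checked on synthetic numbers. In the sketch below, the tensor shape and the random generator are assumptions; only the masking expression is taken from the cell above.

```python
import numpy as np

rng = np.random.default_rng(1000)
missing_rate = 0.3
random_tensor = rng.random((50, 288, 7))  # toy shape (assumption)

# For u uniform on [0, 1), round(u + 0.5 - missing_rate) is 1 whenever u exceeds
# missing_rate and 0 otherwise, so each entry is kept with probability 1 - missing_rate.
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
observed_share = binary_tensor.mean()
print('Observed share: {:.3f}'.format(observed_share))
```

Multiplying the data by this mask sets the dropped entries to zero, which is why the kernel treats zeros as missing values.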

In [7]:
import time
start = time.time()
time_lags = np.array([1, 2, 3, 4, 5, 6, 286, 287, 288, 289, 290, 291])
rho = 1e-4
lambda0 = 5 * rho
epsilon = 1e-3
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, time_lags, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes' % ((end - start)/60.0))

Iter: 1
Tolerance: 0.647975
MAPE: 0.0944086
RMSE: 6.5952

Iter: 2
Tolerance: 0.0679353
MAPE: 0.0638386
RMSE: 5.10411

Iter: 3
Tolerance: 0.0399947
MAPE: 0.0540608
RMSE: 4.54673

Iter: 4
Tolerance: 0.0295381
MAPE: 0.0471602
RMSE: 4.05591

Iter: 5
Tolerance: 0.0243795
MAPE: 0.0425867
RMSE: 3.71311

Iter: 6
Tolerance: 0.0203427
MAPE: 0.0389105
RMSE: 3.45172

Iter: 7
Tolerance: 0.0183293
MAPE: 0.0357975
RMSE: 3.21849

Iter: 8
Tolerance: 0.0164267
MAPE: 0.0331746
RMSE: 3.01535

Iter: 9
Tolerance: 0.015533
MAPE: 0.0306994
RMSE: 2.81998

Iter: 10
Tolerance: 0.0145263
MAPE: 0.0285015
RMSE: 2.64195

Iter: 11
Tolerance: 0.0129102
MAPE: 0.0266796
RMSE: 2.49219

Iter: 12
Tolerance: 0.0116661
MAPE: 0.0251317
RMSE: 2.36429

Iter: 13
Tolerance: 0.0102477
MAPE: 0.0239025
RMSE: 2.25907

Iter: 14
Tolerance: 0.00918486
MAPE: 0.022897
RMSE: 2.16989

Iter: 15
Tolerance: 0.0084404
MAPE: 0.0219867
RMSE: 2.08824

Iter: 16
Tolerance: 0.00807166
MAPE: 0.0211645
RMSE: 2.01276

Iter: 17
Tolerance: 0.0074147
MAPE:

In [8]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('F:/PeMS/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_tensor = np.random.rand(data.values.shape[0], 288, 4 * 7)

missing_rate = 0.7

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_tensor, binary_tensor

In [9]:
import time
start = time.time()
time_lags = np.array([1, 2, 3, 4, 5, 6, 286, 287, 288, 289, 290, 291])
rho = 1e-4
lambda0 = 5 * rho
epsilon = 1e-3
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, time_lags, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes' % ((end - start)/60.0))

Iter: 1
Tolerance: 1.49571
MAPE: 0.113089
RMSE: 7.76575

Iter: 2
Tolerance: 0.0647284
MAPE: 0.0920512
RMSE: 6.65963

Iter: 3
Tolerance: 0.0591993
MAPE: 0.0776198
RMSE: 5.81298

Iter: 4
Tolerance: 0.0546168
MAPE: 0.0669306
RMSE: 5.29258

Iter: 5
Tolerance: 0.0442553
MAPE: 0.0609046
RMSE: 5.03258

Iter: 6
Tolerance: 0.0321335
MAPE: 0.0577185
RMSE: 4.86737

Iter: 7
Tolerance: 0.0291536
MAPE: 0.054056
RMSE: 4.58559

Iter: 8
Tolerance: 0.0305637
MAPE: 0.0501129
RMSE: 4.26418

Iter: 9
Tolerance: 0.0289928
MAPE: 0.0472448
RMSE: 4.02719

Iter: 10
Tolerance: 0.0252492
MAPE: 0.0451126
RMSE: 3.85483

Iter: 11
Tolerance: 0.0222237
MAPE: 0.0430661
RMSE: 3.70661

Iter: 12
Tolerance: 0.0205093
MAPE: 0.0408726
RMSE: 3.55653

Iter: 13
Tolerance: 0.0199107
MAPE: 0.0386989
RMSE: 3.40893

Iter: 14
Tolerance: 0.0190647
MAPE: 0.0368012
RMSE: 3.27759

Iter: 15
Tolerance: 0.0180378
MAPE: 0.0350949
RMSE: 3.1562

Iter: 16
Tolerance: 0.0172212
MAPE: 0.0335058
RMSE: 3.03997

Iter: 17
Tolerance: 0.0165435
MAPE: 0.

We generate **non-random missing (NM)** values on the California traffic speed data set. Then, we conduct the imputation experiment.

In [10]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('F:/PeMS/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 4 * 7)

missing_rate = 0.3

### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_matrix, binary_tensor
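
In the NM scenario a single keep-or-drop decision is made per (first index, third index) pair and copied along the whole second mode, so entire fibers go missing at once. The sketch below, on a toy shape (assumption), shows that the double loop is equivalent to a broadcast of the rounded matrix. The broadcast form is an optional alternative, not the notebook's code.

```python
import numpy as np

rng = np.random.default_rng(1000)
missing_rate = 0.3
shape = (6, 288, 7)                       # toy tensor shape (assumption)
random_matrix = rng.random((shape[0], shape[2]))

# Loop version, as in the cell above.
loop_mask = np.zeros(shape)
for i1 in range(shape[0]):
    for i2 in range(shape[2]):
        loop_mask[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)

# Broadcast version: round once, then repeat each decision along the second mode.
fiber_mask = np.round(random_matrix + 0.5 - missing_rate)
vector_mask = np.broadcast_to(fiber_mask[:, None, :], shape).copy()

print(np.array_equal(loop_mask, vector_mask))
```

The `.copy()` turns the read-only broadcast view into a regular array, which matters if the mask is modified later.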

In [11]:
import time
start = time.time()
time_lags = np.array([1, 2, 3, 4, 5, 6, 286, 287, 288, 289, 290, 291])
rho = 1e-4
lambda0 = 0.5 * rho
epsilon = 1e-3
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, time_lags, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 1
Tolerance: 0.646364
MAPE: 0.0962517
RMSE: 6.69797

Iter: 2
Tolerance: 0.0648808
MAPE: 0.0700553
RMSE: 5.45557

Iter: 3
Tolerance: 0.0389321
MAPE: 0.0624619
RMSE: 5.11984

Iter: 4
Tolerance: 0.0255076
MAPE: 0.0602923
RMSE: 4.99137

Iter: 5
Tolerance: 0.0208353
MAPE: 0.0599435
RMSE: 4.92852

Iter: 6
Tolerance: 0.018834
MAPE: 0.0605519
RMSE: 4.91958

Iter: 7
Tolerance: 0.0157746
MAPE: 0.0614398
RMSE: 4.93784

Iter: 8
Tolerance: 0.0151608
MAPE: 0.0620163
RMSE: 4.95196

Iter: 9
Tolerance: 0.0135707
MAPE: 0.0623529
RMSE: 4.9612

Iter: 10
Tolerance: 0.0120095
MAPE: 0.0625017
RMSE: 4.96423

Iter: 11
Tolerance: 0.0103715
MAPE: 0.0625537
RMSE: 4.96458

Iter: 12
Tolerance: 0.00907752
MAPE: 0.0625375
RMSE: 4.96239

Iter: 13
Tolerance: 0.00824932
MAPE: 0.0624952
RMSE: 4.95941

Iter: 14
Tolerance: 0.00782988
MAPE: 0.0624567
RMSE: 4.95668

Iter: 15
Tolerance: 0.007424
MAPE: 0.0624259
RMSE: 4.95471

Iter: 16
Tolerance: 0.00669401
MAPE: 0.0624023
RMSE: 4.9529

Iter: 17
Tolerance: 0.00616104
MAP

In [12]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('F:/PeMS/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 4 * 7)

missing_rate = 0.7

### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_matrix, binary_tensor

In [13]:
import time
start = time.time()
time_lags = np.array([1, 2, 3, 4, 5, 6, 286, 287, 288, 289, 290, 291])
rho = 1e-4
lambda0 = 0.5 * rho
epsilon = 1e-3
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, time_lags, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 1
Tolerance: 1.4926
MAPE: 0.113239
RMSE: 7.77958

Iter: 2
Tolerance: 0.0634211
MAPE: 0.0966828
RMSE: 6.78675

Iter: 3
Tolerance: 0.0684241
MAPE: 0.0847372
RMSE: 6.16802

Iter: 4
Tolerance: 0.0443705
MAPE: 0.0781492
RMSE: 5.9053

Iter: 5
Tolerance: 0.0301521
MAPE: 0.0745838
RMSE: 5.79075

Iter: 6
Tolerance: 0.0226262
MAPE: 0.0723058
RMSE: 5.71328

Iter: 7
Tolerance: 0.0216501
MAPE: 0.0709006
RMSE: 5.64807

Iter: 8
Tolerance: 0.019948
MAPE: 0.0703148
RMSE: 5.59668

Iter: 9
Tolerance: 0.0186526
MAPE: 0.0705044
RMSE: 5.57024

Iter: 10
Tolerance: 0.017631
MAPE: 0.0711691
RMSE: 5.56546

Iter: 11
Tolerance: 0.0158797
MAPE: 0.0720321
RMSE: 5.57634

Iter: 12
Tolerance: 0.0145892
MAPE: 0.0728361
RMSE: 5.59397

Iter: 13
Tolerance: 0.0128383
MAPE: 0.0734539
RMSE: 5.61141

Iter: 14
Tolerance: 0.0117443
MAPE: 0.0738465
RMSE: 5.62444

Iter: 15
Tolerance: 0.0108161
MAPE: 0.0740458
RMSE: 5.63181

Iter: 16
Tolerance: 0.00987681
MAPE: 0.0740936
RMSE: 5.6336

Iter: 17
Tolerance: 0.00885359
MAPE: 0.0

### California data - 8W

We generate **random missing (RM)** values on the California traffic speed data set.

In [14]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('F:/PeMS/California-data-set/pems-8w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 8 * 7]), 0)
random_tensor = np.random.rand(data.values.shape[0], 288, 8 * 7)

missing_rate = 0.3

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_tensor, binary_tensor

We use `imputer` to fill in the missing entries and measure performance metrics on the ground truth.

In [15]:
import time
start = time.time()
time_lags = np.array([1, 2, 3, 4, 5, 6, 286, 287, 288, 289, 290, 291])
rho = 1e-4
lambda0 = 5 * rho
epsilon = 1e-3
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, time_lags, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 1
Tolerance: 0.650439
MAPE: 0.0922593
RMSE: 6.47367

Iter: 2
Tolerance: 0.0715977
MAPE: 0.0611626
RMSE: 4.99684

Iter: 3
Tolerance: 0.0350422
MAPE: 0.0531356
RMSE: 4.54456

Iter: 4
Tolerance: 0.0263502
MAPE: 0.0477805
RMSE: 4.14968

Iter: 5
Tolerance: 0.0252656
MAPE: 0.0429635
RMSE: 3.7791

Iter: 6
Tolerance: 0.0205193
MAPE: 0.0396681
RMSE: 3.52896

Iter: 7
Tolerance: 0.018556
MAPE: 0.0364488
RMSE: 3.28614

Iter: 8
Tolerance: 0.0172542
MAPE: 0.0336325
RMSE: 3.07082

Iter: 9
Tolerance: 0.0156939
MAPE: 0.0312252
RMSE: 2.87596

Iter: 10
Tolerance: 0.0148986
MAPE: 0.0290014
RMSE: 2.69139

Iter: 11
Tolerance: 0.0132
MAPE: 0.0271945
RMSE: 2.53815

Iter: 12
Tolerance: 0.0119333
MAPE: 0.0256399
RMSE: 2.40611

Iter: 13
Tolerance: 0.0106391
MAPE: 0.0243231
RMSE: 2.29332

Iter: 14
Tolerance: 0.0095974
MAPE: 0.0232523
RMSE: 2.19823

Iter: 15
Tolerance: 0.00867217
MAPE: 0.0223415
RMSE: 2.11507

Iter: 16
Tolerance: 0.00811921
MAPE: 0.0215108
RMSE: 2.03924

Iter: 17
Tolerance: 0.00760742
MAPE: 

In [16]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('F:/PeMS/California-data-set/pems-8w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 8 * 7]), 0)
random_tensor = np.random.rand(data.values.shape[0], 288, 8 * 7)

missing_rate = 0.7

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_tensor, binary_tensor

In [17]:
import time
start = time.time()
time_lags = np.array([1, 2, 3, 4, 5, 6, 286, 287, 288, 289, 290, 291])
rho = 1e-4
lambda0 = 5 * rho
epsilon = 1e-3
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, time_lags, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 1
Tolerance: 1.50181
MAPE: 0.115507
RMSE: 7.90727

Iter: 2
Tolerance: 0.0701509
MAPE: 0.0913721
RMSE: 6.58161

Iter: 3
Tolerance: 0.0667147
MAPE: 0.0733852
RMSE: 5.59095

Iter: 4
Tolerance: 0.0569729
MAPE: 0.063722
RMSE: 5.1699

Iter: 5
Tolerance: 0.0400148
MAPE: 0.0606872
RMSE: 5.07977

Iter: 6
Tolerance: 0.0299589
MAPE: 0.0576342
RMSE: 4.89819

Iter: 7
Tolerance: 0.0293981
MAPE: 0.0538754
RMSE: 4.6161

Iter: 8
Tolerance: 0.0302902
MAPE: 0.0503293
RMSE: 4.32357

Iter: 9
Tolerance: 0.0291948
MAPE: 0.047626
RMSE: 4.08616

Iter: 10
Tolerance: 0.0261618
MAPE: 0.0457268
RMSE: 3.9166

Iter: 11
Tolerance: 0.0226413
MAPE: 0.0439465
RMSE: 3.7774

Iter: 12
Tolerance: 0.0207613
MAPE: 0.0417889
RMSE: 3.62949

Iter: 13
Tolerance: 0.0204608
MAPE: 0.0394228
RMSE: 3.47336

Iter: 14
Tolerance: 0.0200625
MAPE: 0.0372693
RMSE: 3.32964

Iter: 15
Tolerance: 0.0190464
MAPE: 0.0354865
RMSE: 3.20511

Iter: 16
Tolerance: 0.0178856
MAPE: 0.0339559
RMSE: 3.09056

Iter: 17
Tolerance: 0.0171258
MAPE: 0.0325

We generate **non-random missing (NM)** values on the California traffic speed data set. Then, we conduct the imputation experiment.

In [18]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('F:/PeMS/California-data-set/pems-8w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 8 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 8 * 7)

missing_rate = 0.3

### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_matrix, binary_tensor

In [19]:
import time
start = time.time()
time_lags = np.array([1, 2, 3, 4, 5, 6, 286, 287, 288, 289, 290, 291])
rho = 1e-4
lambda0 = 0.5 * rho
epsilon = 1e-3
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, time_lags, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 1
Tolerance: 0.648323
MAPE: 0.0936155
RMSE: 6.55669

Iter: 2
Tolerance: 0.0679914
MAPE: 0.0649966
RMSE: 5.21787

Iter: 3
Tolerance: 0.0358925
MAPE: 0.0580778
RMSE: 4.92035

Iter: 4
Tolerance: 0.0260883
MAPE: 0.0556847
RMSE: 4.77966

Iter: 5
Tolerance: 0.022088
MAPE: 0.05495
RMSE: 4.69992

Iter: 6
Tolerance: 0.0188376
MAPE: 0.0555414
RMSE: 4.68686

Iter: 7
Tolerance: 0.0168084
MAPE: 0.0563209
RMSE: 4.69612

Iter: 8
Tolerance: 0.0155689
MAPE: 0.0568531
RMSE: 4.7071

Iter: 9
Tolerance: 0.0140152
MAPE: 0.057172
RMSE: 4.71478

Iter: 10
Tolerance: 0.0122906
MAPE: 0.0573
RMSE: 4.71668

Iter: 11
Tolerance: 0.010718
MAPE: 0.0573203
RMSE: 4.7154

Iter: 12
Tolerance: 0.00966136
MAPE: 0.0572778
RMSE: 4.71234

Iter: 13
Tolerance: 0.00850409
MAPE: 0.0572259
RMSE: 4.70946

Iter: 14
Tolerance: 0.00806491
MAPE: 0.057195
RMSE: 4.7076

Iter: 15
Tolerance: 0.00736355
MAPE: 0.0571559
RMSE: 4.70506

Iter: 16
Tolerance: 0.0067228
MAPE: 0.0571326
RMSE: 4.70335

Iter: 17
Tolerance: 0.00641414
MAPE: 0.057

In [20]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('F:/PeMS/California-data-set/pems-8w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 8 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 8 * 7)

missing_rate = 0.7

### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_matrix, binary_tensor

In [21]:
import time
start = time.time()
time_lags = np.array([1, 2, 3, 4, 5, 6, 286, 287, 288, 289, 290, 291])
rho = 1e-4
lambda0 = 0.5 * rho
epsilon = 1e-3
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, time_lags, rho, lambda0, epsilon, maxiter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 1
Tolerance: 1.50017
MAPE: 0.115751
RMSE: 7.91847

Iter: 2
Tolerance: 0.0694456
MAPE: 0.0947315
RMSE: 6.66189

Iter: 3
Tolerance: 0.0720332
MAPE: 0.0785047
RMSE: 5.84366

Iter: 4
Tolerance: 0.0473167
MAPE: 0.0702011
RMSE: 5.51556

Iter: 5
Tolerance: 0.0333616
MAPE: 0.0668626
RMSE: 5.41358

Iter: 6
Tolerance: 0.0238857
MAPE: 0.064609
RMSE: 5.32947

Iter: 7
Tolerance: 0.0228198
MAPE: 0.0628356
RMSE: 5.23838

Iter: 8
Tolerance: 0.0216481
MAPE: 0.0619071
RMSE: 5.16358

Iter: 9
Tolerance: 0.0209039
MAPE: 0.0620874
RMSE: 5.12754

Iter: 10
Tolerance: 0.019037
MAPE: 0.0630599
RMSE: 5.12777

Iter: 11
Tolerance: 0.0168624
MAPE: 0.0642418
RMSE: 5.14778

Iter: 12
Tolerance: 0.0153232
MAPE: 0.065241
RMSE: 5.17241

Iter: 13
Tolerance: 0.0138611
MAPE: 0.0659242
RMSE: 5.19321

Iter: 14
Tolerance: 0.0128146
MAPE: 0.0663108
RMSE: 5.20723

Iter: 15
Tolerance: 0.0113424
MAPE: 0.0664713
RMSE: 5.21451

Iter: 16
Tolerance: 0.0103315
MAPE: 0.0664708
RMSE: 5.21569

Iter: 17
Tolerance: 0.0093422
MAPE: 0.0

### License

<div class="alert alert-block alert-danger">
<b>This work is released under the MIT license.</b>
</div>