# About this Notebook

This notebook mainly implements a Low-Rank Tensor Completion (LRTC) model with Truncated Nuclear Norm (TNN) regularization.


## Quick Run

This notebook is publicly available for any use as part of our data imputation project. See [**transdim - GitHub**](https://github.com/xinychen/transdim).


## Low-Rank Tensor Completion

We start by importing the necessary dependencies.

In [2]:
import numpy as np
from numpy.linalg import inv as inv

### Tensor Unfolding (`ten2mat`) and Matrix Folding (`mat2ten`)

The unfolding below relies on `numpy.reshape`. See the Stack Overflow discussion "Using numpy reshape to perform 3rd rank tensor unfold operation" [[**link**](https://stackoverflow.com/questions/49970141/using-numpy-reshape-to-perform-3rd-rank-tensor-unfold-operation)].

In [3]:
def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')

In [4]:
X = np.array([[[1, 2, 3, 4], [3, 4, 5, 6]], 
              [[5, 6, 7, 8], [7, 8, 9, 10]], 
              [[9, 10, 11, 12], [11, 12, 13, 14]]])
print('tensor size:')
print(X.shape)
print('original tensor:')
print(X)
print()
print('(1) mode-1 tensor unfolding:')
print(ten2mat(X, 0))
print()
print('(2) mode-2 tensor unfolding:')
print(ten2mat(X, 1))
print()
print('(3) mode-3 tensor unfolding:')
print(ten2mat(X, 2))

tensor size:
(3, 2, 4)
original tensor:
[[[ 1  2  3  4]
  [ 3  4  5  6]]

 [[ 5  6  7  8]
  [ 7  8  9 10]]

 [[ 9 10 11 12]
  [11 12 13 14]]]

(1) mode-1 tensor unfolding:
[[ 1  3  2  4  3  5  4  6]
 [ 5  7  6  8  7  9  8 10]
 [ 9 11 10 12 11 13 12 14]]

(2) mode-2 tensor unfolding:
[[ 1  5  9  2  6 10  3  7 11  4  8 12]
 [ 3  7 11  4  8 12  5  9 13  6 10 14]]

(3) mode-3 tensor unfolding:
[[ 1  5  9  3  7 11]
 [ 2  6 10  4  8 12]
 [ 3  7 11  5  9 13]
 [ 4  8 12  6 10 14]]


In [5]:
def mat2ten(mat, tensor_size, mode):
    index = list()
    index.append(mode)
    for i in range(tensor_size.shape[0]):
        if i != mode:
            index.append(i)
    return np.moveaxis(np.reshape(mat, list(tensor_size[index]), order = 'F'), 0, mode)
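
As a quick sanity check, the sketch below confirms that `mat2ten` inverts `ten2mat` for every mode of the small example tensor used above. Both functions are restated so that the block runs on its own. Note that `mat2ten` expects `tensor_size` to be a NumPy array, because it is indexed with a list.

```python
import numpy as np

def ten2mat(tensor, mode):
    # Move the chosen mode to the front, then flatten the remaining modes in Fortran order.
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order='F')

def mat2ten(mat, tensor_size, mode):
    # tensor_size must be a NumPy array so that indexing it with a list works.
    index = [mode] + [i for i in range(tensor_size.shape[0]) if i != mode]
    return np.moveaxis(np.reshape(mat, list(tensor_size[index]), order='F'), 0, mode)

X = np.array([[[1, 2, 3, 4], [3, 4, 5, 6]],
              [[5, 6, 7, 8], [7, 8, 9, 10]],
              [[9, 10, 11, 12], [11, 12, 13, 14]]])
dim = np.array(X.shape)

# Unfold along each mode, fold back, and compare with the original tensor.
roundtrip_ok = all(np.array_equal(mat2ten(ten2mat(X, k), dim, k), X)
                   for k in range(X.ndim))
print(roundtrip_ok)  # True
```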

### Singular Value Thresholding (SVT) for TNN

In [5]:
def svt_tnn(mat, alpha, rho, theta):
    """This is a Numpy dependent singular value thresholding (SVT) process."""
    u, s, v = np.linalg.svd(mat, full_matrices = 0)
    vec = s.copy()
    vec[theta :] = s[theta :] - alpha / rho
    vec[vec < 0] = 0
    return np.matmul(np.matmul(u, np.diag(vec)), v)

**Potential alternative for this**:

If you prefer, a comparably efficient singular value thresholding (SVT) process can be built on the randomized SVD (`randomized_svd`) of `sklearn`.

In [15]:
from sklearn.utils.extmath import randomized_svd

In [16]:
def svt_tnn(mat, alpha, rho, theta):
    """This is a sklearn dependent singular value thresholding (SVT) process."""
    u, s, v = randomized_svd(mat, n_components = min(mat.shape[0], mat.shape[1]), n_iter = 1)
    vec = s.copy()
    vec[theta :] = s[theta :] - alpha / rho
    vec[vec < 0] = 0
    return np.matmul(np.matmul(u, np.diag(vec)), v)

**Understanding this code**:

- **`line 1`**: Necessary inputs including any input matrix $\boldsymbol{X}$, weight of Truncated Nuclear Norm (TNN) regularization $\alpha$, learning rate $\rho$, and positive integer number $\theta$ for nuclear norm truncation.

- **`line 2`**: Compute the Singular Value Decomposition (SVD) for any matrix $\boldsymbol{X}$ with `numpy.linalg.svd` (i.e., SVD function in `Numpy`'s linear algebra package).

- **`line 3-5`**: Truncate singular values $\sigma_{\theta+1},...$ with the following rule:

\begin{equation}
\sigma_{i}=\left[\sigma_{i}(\boldsymbol{X})-\frac{\alpha}{\rho}\right]_{+}.
\end{equation}

- **`line 6`**: Return the resulting matrix.
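
To make the truncation rule concrete, here is a minimal worked example on a diagonal matrix, whose singular values can be read off directly. The matrix and the values `alpha = 2`, `rho = 1`, `theta = 1` are illustrative assumptions. The Numpy-based `svt_tnn` is restated so the block is self-contained.

```python
import numpy as np

def svt_tnn(mat, alpha, rho, theta):
    u, s, v = np.linalg.svd(mat, full_matrices=False)
    vec = s.copy()
    # Keep the first `theta` singular values, shrink the rest by alpha / rho.
    vec[theta:] = s[theta:] - alpha / rho
    vec[vec < 0] = 0
    return u @ np.diag(vec) @ v

mat = np.diag([5.0, 3.0, 1.0])           # singular values 5, 3, 1
out = svt_tnn(mat, alpha=2.0, rho=1.0, theta=1)

# The largest value is untouched, 3 shrinks to 1, and 1 is clipped to 0.
print(np.round(out, 6))
```

With `theta = 1`, only the largest singular value is exempt from shrinkage, so a larger `theta` preserves more of the dominant structure.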

### Define Performance Metrics

- **RMSE**
- **MAPE**

In [17]:
def Compute_RMSE(var, var_hat):
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])

In [18]:
def Compute_MAPE(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]
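
A small numerical check of the two metrics follows. The vectors are made-up values chosen so that the results are easy to verify by hand. Both functions expect one-dimensional arrays (as produced by indexing a tensor with `pos_test`), and MAPE divides by the ground truth, so the true values must be nonzero.

```python
import numpy as np

def Compute_RMSE(var, var_hat):
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])

def Compute_MAPE(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

truth = np.array([10.0, 20.0, 40.0])      # hypothetical ground truth
estimate = np.array([11.0, 18.0, 44.0])   # hypothetical estimates

rmse = Compute_RMSE(truth, estimate)      # sqrt((1 + 4 + 16) / 3) = sqrt(7)
mape = Compute_MAPE(truth, estimate)      # (0.1 + 0.1 + 0.1) / 3 = 0.1
print(rmse, mape)
```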

### Define LRTC-TNN Function with `Numpy`

In [19]:
def LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, maxiter):
    """Low-Rank Tensor Completion with Truncated Nuclear Norm, LRTC-TNN."""
    
    dim = np.array(sparse_tensor.shape)
    pos_train = np.where(sparse_tensor != 0)
    pos_missing = np.where(sparse_tensor == 0)
    pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
    
    X = np.zeros(np.insert(dim, 0, len(dim))) # \boldsymbol{\mathcal{X}}
    T = np.zeros(np.insert(dim, 0, len(dim))) # \boldsymbol{\mathcal{T}}
    Z = sparse_tensor.copy()
    for it in range(maxiter):
        for k in range(len(dim)):
            X[k] = mat2ten(svt_tnn(ten2mat(Z - T[k] / rho, k), alpha[k], rho, int(np.ceil(theta * dim[k]))), dim, k)
        Z[pos_missing] = np.mean(X + T / rho, axis = 0)[pos_missing]
        T = T + rho * (X - np.broadcast_to(Z, np.insert(dim, 0, len(dim))))
        tensor_hat = np.einsum('k, kmnt -> mnt', alpha, X)
        if (it + 1) % 50 == 0:
            print('Iter: {}'.format(it + 1))
            print('RMSE: {:.6}'.format(Compute_RMSE(dense_tensor[pos_test], tensor_hat[pos_test])))
            print()

    print('Imputation MAPE: {:.6}'.format(Compute_MAPE(dense_tensor[pos_test], tensor_hat[pos_test])))
    print('Imputation RMSE: {:.6}'.format(Compute_RMSE(dense_tensor[pos_test], tensor_hat[pos_test])))
    print()
    
    return tensor_hat

**Understanding this code**:

- **`line 18-19`**: Update $\boldsymbol{\mathcal{Z}}_{k}^{l+1},k=1,2,3$.

- **`line 20-22`**: Update $\boldsymbol{\mathcal{X}}_{k}^{l+1}$ by

\begin{equation}
\boldsymbol{\mathcal{X}}_{k}^{l+1}=\mathcal{P}_{\Omega}(\boldsymbol{\mathcal{Y}})+\mathcal{P}_{\Omega}^{\perp}\left(\boldsymbol{\mathcal{Z}}_{k}^{l+1}-\frac{1}{\rho}\boldsymbol{\mathcal{T}}_{k}^{l}\right),k=1,2,3.
\end{equation}

- **`line 23`**: Update $\boldsymbol{\mathcal{T}}_{k}^{l+1}$ by

\begin{equation}
\boldsymbol{\mathcal{T}}_{k}^{l+1}=\boldsymbol{\mathcal{T}}_{k}^{l}+\rho\left(\boldsymbol{\mathcal{X}}_{k}^{l+1}-\boldsymbol{\mathcal{Z}}_{k}^{l+1}\right).
\end{equation}
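
The iteration can be exercised on synthetic data without any of the data sets. The sketch below is a condensed restatement of the functions defined earlier (metric printing removed), applied to a hypothetical rank-one tensor with roughly 20% of its entries zeroed out. The tensor size, random seed, and hyperparameter values are illustrative assumptions, not tuned settings.

```python
import numpy as np

def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order='F')

def mat2ten(mat, dim, mode):
    index = [mode] + [i for i in range(len(dim)) if i != mode]
    return np.moveaxis(np.reshape(mat, list(dim[index]), order='F'), 0, mode)

def svt_tnn(mat, alpha, rho, theta):
    u, s, v = np.linalg.svd(mat, full_matrices=False)
    vec = s.copy()
    vec[theta:] = s[theta:] - alpha / rho
    vec[vec < 0] = 0
    return u @ np.diag(vec) @ v

def lrtc_sketch(sparse_tensor, alpha, rho, theta, maxiter):
    """Same update order as LRTC above: X (per mode), then Z, then T."""
    dim = np.array(sparse_tensor.shape)
    pos_missing = np.where(sparse_tensor == 0)
    stacked = np.insert(dim, 0, len(dim))   # one copy of the tensor per mode
    X = np.zeros(stacked)
    T = np.zeros(stacked)
    Z = sparse_tensor.copy()
    for _ in range(maxiter):
        for k in range(len(dim)):
            r = int(np.ceil(theta * dim[k]))
            X[k] = mat2ten(svt_tnn(ten2mat(Z - T[k] / rho, k), alpha[k], rho, r), dim, k)
        # Only missing entries of Z are updated; observed entries stay fixed.
        Z[pos_missing] = np.mean(X + T / rho, axis=0)[pos_missing]
        T = T + rho * (X - np.broadcast_to(Z, stacked))
    return np.einsum('k, kmnt -> mnt', alpha, X)

# Hypothetical rank-one tensor with strictly positive entries,
# so that a zero in sparse_tensor always means "missing".
rng = np.random.default_rng(0)
a, b, c = rng.uniform(1, 2, 8), rng.uniform(1, 2, 6), rng.uniform(1, 2, 5)
dense_tensor = np.einsum('i,j,k->ijk', a, b, c)
binary_tensor = (rng.random(dense_tensor.shape) > 0.2).astype(float)
sparse_tensor = dense_tensor * binary_tensor

tensor_hat = lrtc_sketch(sparse_tensor, np.ones(3) / 3, rho=0.1, theta=0.2, maxiter=50)
pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
rmse = np.sqrt(np.mean((dense_tensor[pos_test] - tensor_hat[pos_test]) ** 2))
print('RMSE on held-out entries: {:.4f}'.format(rmse))
```

The returned estimate is the `alpha`-weighted sum of the per-mode tensors `X[k]`, which is why `alpha` is chosen to sum to one in all experiments.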

## Data Organization

### 1) Matrix Structure

We consider a dataset of $m$ discrete time series $\boldsymbol{y}_{i}\in\mathbb{R}^{f},i\in\left\{1,2,...,m\right\}$. The time series may have missing elements. We express the spatio-temporal dataset as a matrix $Y\in\mathbb{R}^{m\times f}$ with $m$ rows (e.g., locations) and $f$ columns (e.g., discrete time intervals),

$$Y=\left[ \begin{array}{cccc} y_{11} & y_{12} & \cdots & y_{1f} \\ y_{21} & y_{22} & \cdots & y_{2f} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{mf} \\ \end{array} \right]\in\mathbb{R}^{m\times f}.$$

### 2) Tensor Structure

We consider a dataset of $m$ discrete time series $\boldsymbol{y}_{i}\in\mathbb{R}^{nf},i\in\left\{1,2,...,m\right\}$. The time series may have missing elements. We partition each time series into intervals of predefined length $f$. We express each partitioned time series as a matrix $Y_{i}$ with $n$ rows (e.g., days) and $f$ columns (e.g., discrete time intervals per day),

$$Y_{i}=\left[ \begin{array}{cccc} y_{11} & y_{12} & \cdots & y_{1f} \\ y_{21} & y_{22} & \cdots & y_{2f} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nf} \\ \end{array} \right]\in\mathbb{R}^{n\times f},i=1,2,...,m,$$

therefore, the resulting structure is a tensor $\mathcal{Y}\in\mathbb{R}^{m\times n\times f}$.
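
In NumPy, this partitioning amounts to a single `reshape` of the $m\times nf$ matrix. The sketch below uses small hypothetical sizes to show how matrix columns map to tensor indices.

```python
import numpy as np

m, n, f = 2, 3, 4                            # hypothetical sizes
Y = np.arange(m * n * f).reshape(m, n * f)   # m series, each of length n * f
tensor = Y.reshape(m, n, f)                  # series i, interval block d, step t

# Entry (i, d, t) of the tensor is entry (i, d * f + t) of the matrix.
print(tensor.shape)
print(tensor[1, 2, 3] == Y[1, 2 * f + 3])
```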

## Missing Data Imputation

Next, we apply the LRTC-TNN function defined above to missing data imputation on the following spatiotemporal multivariate time series data sets:

- **Guangzhou data set**: [Guangzhou urban traffic speed data set](https://doi.org/10.5281/zenodo.1205228).
- **Birmingham data set**: [Birmingham parking data set](https://archive.ics.uci.edu/ml/datasets/Parking+Birmingham).
- **Hangzhou data set**: [Hangzhou metro passenger flow data set](https://doi.org/10.5281/zenodo.3145403).
- **Seattle data set**: [Seattle freeway traffic speed data set](https://github.com/zhiyongc/Seattle-Loop-Data).

The original data sets have been adapted for our experiments and are available in the `datasets` folder.

### Experiments on Guangzhou Data Set

In [9]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.2

# =============================================================================
### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)
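
The rounding expression used for the RM scenario can be checked on a handful of values. The sketch assumes that the entries of `random_tensor` lie in $[0, 1)$, which the notebook does not state explicitly, and uses a made-up vector as a stand-in.

```python
import numpy as np

missing_rate = 0.2
# Hypothetical stand-in for a few entries of random_tensor.
random_values = np.array([0.05, 0.19, 0.21, 0.50, 0.99])

# Values below missing_rate round to 0 (missing), values above round to 1 (observed).
binary = np.round(random_values + 0.5 - missing_rate)
print(binary)  # [0. 0. 1. 1. 1.]
```

Because `LRTC` identifies missing entries by `sparse_tensor == 0`, any entry that is already zero in `dense_tensor` is also treated as missing after the multiplication.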

In [54]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.002
theta = 0.30
maxiter = 200
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 2.84958

Iter: 100
RMSE: 2.84361

Iter: 150
RMSE: 2.84324

Iter: 200
RMSE: 2.84309

Imputation MAPE: 0.0662603
Imputation RMSE: 2.84309

Running time: 127 seconds


In [31]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [32]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.002
theta = 0.30
maxiter = 200
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 7.74311

Iter: 100
RMSE: 3.10197

Iter: 150
RMSE: 3.11387

Iter: 200
RMSE: 3.11496

Imputation MAPE: 0.0722136
Imputation RMSE: 3.11496

Running time: 142 seconds


In [33]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.2

# =============================================================================
### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)
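
In the NM scenario, a single random value per `(i1, i2)` pair decides whether the whole fiber along the third axis is observed or dropped. The double loop can also be written with broadcasting. The sketch below checks that both forms agree on a small random stand-in for `random_matrix` (the shape and seed are arbitrary assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (4, 3, 5)                          # hypothetical tensor size
random_matrix = rng.random(shape[:2])      # stand-in for the stored random_matrix
missing_rate = 0.2

# Loop version, as in the cell above.
binary_loop = np.zeros(shape)
for i1 in range(shape[0]):
    for i2 in range(shape[1]):
        binary_loop[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)

# Broadcasting version: one mask value per (i1, i2), repeated along the last axis.
mask_2d = np.round(random_matrix + 0.5 - missing_rate)
binary_vec = np.broadcast_to(mask_2d[:, :, None], shape).copy()

print(np.array_equal(binary_loop, binary_vec))  # True
```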

In [37]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.002
theta = 0.05
maxiter = 200
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 6.57525

Iter: 100
RMSE: 4.21873

Iter: 150
RMSE: 3.97082

Iter: 200
RMSE: 3.97132

Imputation MAPE: 0.0938764
Imputation RMSE: 3.97132

Running time: 134 seconds


In [20]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.4

# =============================================================================
### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [21]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.0005
theta = 0.05
maxiter = 200
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 4.07891

Iter: 100
RMSE: 4.06736

Iter: 150
RMSE: 4.06617

Iter: 200
RMSE: 4.06606

Imputation MAPE: 0.0956982
Imputation RMSE: 4.06606

Running time: 123 seconds


**Experiment results** of missing data imputation using LRTC-TNN:

|  scenario |`alpha` (vector input)|`rho`|`theta`|`maxiter`|       mape |      rmse |
|:----------|-----:|---------:|---------:|----------:|----------:|----------:|
|**0.2, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.002 | 0.30 | 200 | **0.0663** | **2.84**|
|**0.4, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.002 | 0.30 | 200 | **0.0722** | **3.11**|
|**0.2, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.002 | 0.05 | 200 | **0.0939** | **3.97**|
|**0.4, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ |0.0005 | 0.05 | 200 | **0.0957** | **4.07**|


### Experiments on Birmingham Data Set


In [42]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.1

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [47]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.0005
theta = 0.20
maxiter = 200
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 135.489

Iter: 100
RMSE: 12.3854

Iter: 150
RMSE: 12.3673

Iter: 200
RMSE: 12.3742

Imputation MAPE: 0.0411626
Imputation RMSE: 12.3742

Running time: 2 seconds


In [49]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.3

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [59]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.0005
theta = 0.15
maxiter = 200
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 281.746

Iter: 100
RMSE: 109.418

Iter: 150
RMSE: 21.0593

Iter: 200
RMSE: 17.507

Imputation MAPE: 0.0521031
Imputation RMSE: 17.507

Running time: 2 seconds


In [60]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.1

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [67]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.0002
theta = 0.10
maxiter = 200
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 225.961

Iter: 100
RMSE: 20.8381

Iter: 150
RMSE: 20.8942

Iter: 200
RMSE: 20.9185

Imputation MAPE: 0.0877233
Imputation RMSE: 20.9185

Running time: 2 seconds


In [68]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.3

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [75]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.0002
theta = 0.05
maxiter = 200
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 416.54

Iter: 100
RMSE: 186.726

Iter: 150
RMSE: 57.7627

Iter: 200
RMSE: 47.65

Imputation MAPE: 0.13257
Imputation RMSE: 47.65

Running time: 2 seconds


**Experiment results** of missing data imputation using LRTC-TNN:

|  scenario |`alpha` (vector input)|`rho` | `theta` |`maxiter`|       mape |      rmse |
|:----------|-----:|---------:|---------:|----------:|----------:|----------:|
|**0.1, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.0005 | 0.20 | 200 | **0.0412** | **12.37**|
|**0.3, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.0005 | 0.15 | 200 | **0.0521** | **17.51**|
|**0.1, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.0002 | 0.10 | 200 | **0.0877** | **20.92**|
|**0.3, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.0002 | 0.05 | 200 | **0.1326** | **47.65**|


### Experiments on Hangzhou Data Set

In [76]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.2

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [83]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.0002
theta = 0.10
maxiter = 200
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 39.063

Iter: 100
RMSE: 24.9086

Iter: 150
RMSE: 24.6781

Iter: 200
RMSE: 24.6533

Imputation MAPE: 0.181488
Imputation RMSE: 24.6533

Running time: 13 seconds


In [89]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [90]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.0002
theta = 0.10
maxiter = 200
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 48.7718

Iter: 100
RMSE: 35.2227

Iter: 150
RMSE: 26.5963

Iter: 200
RMSE: 25.7474

Imputation MAPE: 0.187988
Imputation RMSE: 25.7474

Running time: 13 seconds


In [92]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.2

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [94]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.0002
theta = 0.05
maxiter = 200
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 66.9782

Iter: 100
RMSE: 32.2313

Iter: 150
RMSE: 31.6912

Iter: 200
RMSE: 31.7906

Imputation MAPE: 0.20379
Imputation RMSE: 31.7906

Running time: 11 seconds


In [103]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.4

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [105]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.0002
theta = 0.05
maxiter = 200
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 83.7915

Iter: 100
RMSE: 54.7268

Iter: 150
RMSE: 33.813

Iter: 200
RMSE: 32.3401

Imputation MAPE: 0.210977
Imputation RMSE: 32.3401

Running time: 13 seconds


**Experiment results** of missing data imputation using LRTC-TNN:

|  scenario |`alpha` (vector input)|`rho`|`theta`|`maxiter`|       mape |      rmse |
|:----------|-----:|---------:|---------:|--------:|----------:|----------:|
|**0.2, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.0002 | 0.10 | 200 | **0.1815** | **24.65**|
|**0.4, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.0002 | 0.10 | 200 | **0.1880** | **25.75**|
|**0.2, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.0002 | 0.05 | 200 | **0.2038** | **31.79**|
|**0.4, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.0002 | 0.05 | 200 | **0.2110** | **32.34**|


### Experiments on Seattle Data Set

In [106]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
RM_mat = pd.read_csv('../datasets/Seattle-data-set/RM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
RM_mat = RM_mat.values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])
RM_tensor = RM_mat.reshape([RM_mat.shape[0], 28, 288])

missing_rate = 0.2

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(RM_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [108]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.001
theta = 0.30
maxiter = 200
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 3.01931

Iter: 100
RMSE: 3.01648

Iter: 150
RMSE: 3.01518

Iter: 200
RMSE: 3.01493

Imputation MAPE: 0.0460414
Imputation RMSE: 3.01493

Running time: 228 seconds


In [112]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
RM_mat = pd.read_csv('../datasets/Seattle-data-set/RM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
RM_mat = RM_mat.values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])
RM_tensor = RM_mat.reshape([RM_mat.shape[0], 28, 288])

missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(RM_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [113]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.001
theta = 0.30
maxiter = 200
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 3.22771

Iter: 100
RMSE: 3.22321

Iter: 150
RMSE: 3.22076

Iter: 200
RMSE: 3.21982

Imputation MAPE: 0.0501131
Imputation RMSE: 3.21982

Running time: 214 seconds


In [120]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
NM_mat = NM_mat.values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])

missing_rate = 0.2

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))
for i1 in range(binary_tensor.shape[0]):
    for i2 in range(binary_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [121]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.0005
theta = 0.05
maxiter = 200
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 4.20882

Iter: 100
RMSE: 4.18975

Iter: 150
RMSE: 4.18834

Iter: 200
RMSE: 4.18746

Imputation MAPE: 0.069471
Imputation RMSE: 4.18746

Running time: 217 seconds


In [117]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
NM_mat = NM_mat.values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])

missing_rate = 0.4

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))
for i1 in range(binary_tensor.shape[0]):
    for i2 in range(binary_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [119]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.0005
theta = 0.05
maxiter = 200
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 15.0981

Iter: 100
RMSE: 4.55854

Iter: 150
RMSE: 4.51553

Iter: 200
RMSE: 4.51677

Imputation MAPE: 0.0762777
Imputation RMSE: 4.51677

Running time: 229 seconds


**Experiment results** of missing data imputation using LRTC-TNN:

|  scenario |`alpha` (vector input)|`rho`|`theta`|`maxiter`|       mape |      rmse |
|:----------|-----:|---------:|---------:|---------:|----------:|----------:|
|**0.2, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 0.30 | 200 | **0.0460** | **3.01**|
|**0.4, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 0.30 | 200 | **0.0501** | **3.22**|
|**0.2, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ |0.0005 | 0.05 | 200 | **0.0695** | **4.19**|
|**0.4, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ |0.0005 | 0.05 | 200 | **0.0763** | **4.52**|
