# About this Notebook

This notebook discusses a Low-Rank Tensor Completion (LRTC) model called High accuracy LRTC (HaLRTC), proposed in the following article:

> Ji Liu, Przemyslaw Musialski, Peter Wonka, Jieping Ye, 2013. **Tensor completion for estimating missing values in visual data**. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1): 208-220.


## Quick Run

This notebook is publicly available for any use as part of our data imputation project, [**transdim - GitHub**](https://github.com/xinychen/transdim).


We start by importing the necessary dependencies.

In [1]:
import numpy as np
from numpy.linalg import inv

# Part 1: Tensor Unfolding (`ten2mat`) and Matrix Folding (`mat2ten`)

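Tensor unfolding is implemented with `np.moveaxis` followed by a column-major (`order = 'F'`) `np.reshape`, following the Stack Overflow question *Using numpy reshape to perform 3rd rank tensor unfold operation* [[**link**](https://stackoverflow.com/questions/49970141/using-numpy-reshape-to-perform-3rd-rank-tensor-unfold-operation)]. Note that `mode` is 0-based: `ten2mat(X, 0)` returns the mode-1 unfolding.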
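As a cross-check of the column ordering, the sketch below rebuilds each unfolding with explicit loops, assuming the common convention (e.g., Kolda and Bader, 2009) in which the remaining indices are enumerated with the earliest one varying fastest. `unfold_loops` is a hypothetical helper written only for this comparison.

```python
import itertools
import numpy as np

def ten2mat(tensor, mode):
    # Same one-liner as in the notebook
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order='F')

def unfold_loops(tensor, mode):
    """Reference mode-k unfolding: the remaining indices are enumerated
    with the earliest index varying fastest (column-major order)."""
    shape = tensor.shape
    others = [n for n in range(tensor.ndim) if n != mode]
    ncols = int(np.prod([shape[n] for n in others]))
    mat = np.zeros((shape[mode], ncols), dtype=tensor.dtype)
    for idx in itertools.product(*(range(s) for s in shape)):
        col, stride = 0, 1
        for n in others:  # column index j = sum_n i_n * J_n
            col += idx[n] * stride
            stride *= shape[n]
        mat[idx[mode], col] = tensor[idx]
    return mat

X = np.arange(24).reshape(3, 2, 4)
for k in range(X.ndim):
    print(k, np.array_equal(ten2mat(X, k), unfold_loops(X, k)))
```

The loop version is much slower; it only makes the index map $j=\sum_{n\neq k} i_n J_n$ explicit, where $J_n$ is the product of the sizes of the earlier non-$k$ modes.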

In [2]:
import numpy as np
def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')

In [3]:
X = np.array([[[1, 2, 3, 4], [3, 4, 5, 6]], 
              [[5, 6, 7, 8], [7, 8, 9, 10]], 
              [[9, 10, 11, 12], [11, 12, 13, 14]]])
print('tensor size:')
print(X.shape)
print('original tensor:')
print(X)
print()
print('(1) mode-1 tensor unfolding:')
print(ten2mat(X, 0))
print()
print('(2) mode-2 tensor unfolding:')
print(ten2mat(X, 1))
print()
print('(3) mode-3 tensor unfolding:')
print(ten2mat(X, 2))

tensor size:
(3, 2, 4)
original tensor:
[[[ 1  2  3  4]
  [ 3  4  5  6]]

 [[ 5  6  7  8]
  [ 7  8  9 10]]

 [[ 9 10 11 12]
  [11 12 13 14]]]

(1) mode-1 tensor unfolding:
[[ 1  3  2  4  3  5  4  6]
 [ 5  7  6  8  7  9  8 10]
 [ 9 11 10 12 11 13 12 14]]

(2) mode-2 tensor unfolding:
[[ 1  5  9  2  6 10  3  7 11  4  8 12]
 [ 3  7 11  4  8 12  5  9 13  6 10 14]]

(3) mode-3 tensor unfolding:
[[ 1  5  9  3  7 11]
 [ 2  6 10  4  8 12]
 [ 3  7 11  5  9 13]
 [ 4  8 12  6 10 14]]


In [4]:
def mat2ten(mat, tensor_size, mode):
    index = list()
    index.append(mode)
    for i in range(tensor_size.shape[0]):
        if i != mode:
            index.append(i)
    return np.moveaxis(np.reshape(mat, list(tensor_size[index]), order = 'F'), 0, mode)
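`mat2ten` undoes `ten2mat` for the same mode. Because it reads `tensor_size.shape[0]` and indexes `tensor_size` with a list, it expects a NumPy array of dimensions rather than a tuple, which is why `HaLRTC` passes `np.array([dim1, dim2, dim3])`. A quick round-trip check on a random tensor (the seed and shape are arbitrary):

```python
import numpy as np

def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order='F')

def mat2ten(mat, tensor_size, mode):
    # Put `mode` first, keep the other modes in their original order
    index = [mode] + [i for i in range(tensor_size.shape[0]) if i != mode]
    return np.moveaxis(np.reshape(mat, list(tensor_size[index]), order='F'), 0, mode)

X = np.random.default_rng(0).standard_normal((3, 2, 4))
size = np.array(X.shape)  # mat2ten expects a NumPy array of dimensions
for k in range(X.ndim):
    assert np.array_equal(mat2ten(ten2mat(X, k), size, k), X)
print('mat2ten inverts ten2mat for every mode')
```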

# Part 2: High accuracy Low-Rank Tensor Completion (HaLRTC)
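For reference, HaLRTC (Liu et al., 2013) solves the problem below. It is restated here with symbols chosen to match the code that follows, not the paper's own notation: $\mathcal{Y}$ is the partially observed tensor (`sparse_tensor`), $\Omega$ the set of observed positions, $d$ the tensor order, $\alpha_k$ the weights (`alpha`) and $\mathcal{Z}_{k,(k)}$ the mode-$k$ unfolding of $\mathcal{Z}_k$:

$$\begin{aligned}\min_{\mathcal{X},\,\mathcal{Z}_1,\dots,\mathcal{Z}_d}\ & \sum_{k=1}^{d}\alpha_k\big\|\mathcal{Z}_{k,(k)}\big\|_*\\ \text{s.t.}\ & \mathcal{X}=\mathcal{Z}_k,\quad k=1,\dots,d,\qquad \mathcal{P}_{\Omega}(\mathcal{X})=\mathcal{P}_{\Omega}(\mathcal{Y}).\end{aligned}$$

Writing $\mathcal{T}_k$ for the dual variables (`T[:, :, :, k]`), $\mathcal{D}_{\tau}(A)=U\max(\Sigma-\tau I,0)V^\top$ for singular value thresholding (`svt`), $(\cdot)_{(k)}$ for `ten2mat` and $\operatorname{fold}_k$ for `mat2ten`, each ADMM iteration of the `HaLRTC` function performs

$$\begin{aligned}\mathcal{Z}_k &\leftarrow \operatorname{fold}_k\Big(\mathcal{D}_{\alpha_k/\rho}\big((\mathcal{X}+\mathcal{T}_k/\rho)_{(k)}\big)\Big),\quad k=1,\dots,d,\\ \mathcal{X} &\leftarrow \frac{1}{d}\sum_{k=1}^{d}\big(\mathcal{Z}_k-\mathcal{T}_k/\rho\big),\qquad \mathcal{P}_{\Omega}(\mathcal{X})\leftarrow\mathcal{P}_{\Omega}(\mathcal{Y}),\\ \mathcal{T}_k &\leftarrow \mathcal{T}_k+\rho\,\big(\mathcal{X}-\mathcal{Z}_k\big),\quad k=1,\dots,d.\end{aligned}$$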

In [5]:
def svt(mat, lambda0): ## Singular value thresholding (SVT)
    u, s, vt = np.linalg.svd(mat, full_matrices = False)
    vec = np.maximum(s - lambda0, 0) # shrink each singular value by lambda0, clipping at zero
    
    return np.matmul(np.matmul(u, np.diag(vec)), vt)
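SVT is the proximal operator of the scaled nuclear norm $\lambda\|\cdot\|_*$: it subtracts `lambda0` from every singular value and clips at zero, so small singular values vanish and the rank drops. The small check below (random matrix and threshold chosen arbitrarily) compares the singular values before and after thresholding.

```python
import numpy as np

def svt(mat, lambda0):
    # Equivalent to the svt cell: U diag(max(s - lambda0, 0)) V^T
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    return u @ np.diag(np.maximum(s - lambda0, 0)) @ vt

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
tau = 1.0
s_before = np.linalg.svd(A, compute_uv=False)
s_after = np.linalg.svd(svt(A, tau), compute_uv=False)
print(np.round(s_before, 3))
print(np.round(s_after, 3))  # each value is max(s_before - tau, 0)
```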

In [6]:
def HaLRTC(dense_tensor, sparse_tensor, alpha, rho, maxiter):
    """High accuracy Low-Rank Tensor Completion, HaLRTC."""
    
    dim0 = sparse_tensor.ndim
    dim1, dim2, dim3 = sparse_tensor.shape
    pos = np.where((dense_tensor != 0) & (sparse_tensor == 0))
    position = np.where(sparse_tensor != 0)
    binary_tensor = np.zeros((dim1, dim2, dim3))
    binary_tensor[position] = 1
    tensor_hat = sparse_tensor.copy()
    
    Z = np.zeros((dim1, dim2, dim3, dim0)) # auxiliary tensors \mathcal{Z}_k, stacked along the last axis (n1*n2*n3*d)
    T = np.zeros((dim1, dim2, dim3, dim0)) # dual variables \mathcal{T}_k, stacked along the last axis (n1*n2*n3*d)
    
    for iters in range(maxiter):
        for k in range(dim0):
            Z[:, :, :, k] = mat2ten(svt(ten2mat(tensor_hat + T[:, :, :, k] / rho, k), 
                                        alpha[k] / rho), np.array([dim1, dim2, dim3]), k)
        tensor_hat = np.mean(Z - T / rho, axis = 3)
        tensor_hat[position] = sparse_tensor[position]
        for k in range(dim0):
            T[:, :, :, k] = T[:, :, :, k] + rho * (tensor_hat - Z[:, :, :, k])

        rmse = np.sqrt(np.sum((dense_tensor[pos] - tensor_hat[pos]) ** 2) / dense_tensor[pos].shape[0])
        if (iters + 1) % 200 == 0:
            print('Iter: {}'.format(iters + 1))
            print('RMSE: {:.6}'.format(rmse))
            print()

    if maxiter >= 100:
        final_mape = np.sum(np.abs(dense_tensor[pos] - tensor_hat[pos]) / dense_tensor[pos]) / dense_tensor[pos].shape[0]
        final_rmse = np.sqrt(np.sum((dense_tensor[pos] - tensor_hat[pos]) ** 2) / dense_tensor[pos].shape[0])
        print('Imputation MAPE: {:.6}'.format(final_mape))
        print('Imputation RMSE: {:.6}'.format(final_rmse))
        print()

    return tensor_hat
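Before running on real data, the same ADMM loop can be sanity-checked on a synthetic tensor whose answer is known. The sketch below is a hypothetical compact restatement (`halrtc_sketch`, no logging) that takes an explicit boolean `mask` of observed entries instead of inferring it from nonzeros; the rank-1 tensor, seed, `rho = 0.02` and 300 iterations are illustrative choices, not settings from the experiments below.

```python
import numpy as np

def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order='F')

def mat2ten(mat, shape, mode):
    # shape is a plain tuple here (the notebook's mat2ten expects a NumPy array)
    index = [mode] + [i for i in range(len(shape)) if i != mode]
    return np.moveaxis(np.reshape(mat, [shape[i] for i in index], order='F'), 0, mode)

def svt(mat, tau):
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    return u @ np.diag(np.maximum(s - tau, 0)) @ vt

def halrtc_sketch(sparse, mask, alpha, rho, maxiter):
    """Hypothetical compact restatement of the HaLRTC loop."""
    d = sparse.ndim
    x = sparse.copy()
    z = np.zeros((d,) + sparse.shape)  # auxiliary tensors Z_k (stacked on the first axis)
    t = np.zeros((d,) + sparse.shape)  # dual variables T_k
    for _ in range(maxiter):
        for k in range(d):
            z[k] = mat2ten(svt(ten2mat(x + t[k] / rho, k), alpha[k] / rho), sparse.shape, k)
        x = np.mean(z - t / rho, axis=0)
        x[mask] = sparse[mask]  # observed entries stay fixed
        for k in range(d):
            t[k] += rho * (x - z[k])
    return x

rng = np.random.default_rng(0)
a, b, c = (rng.uniform(1, 2, 10) for _ in range(3))
truth = np.einsum('i,j,k->ijk', a, b, c)  # rank-1 ground truth
mask = rng.random(truth.shape) > 0.2      # roughly 20% of entries missing
sparse = np.where(mask, truth, 0.0)

est = halrtc_sketch(sparse, mask, np.ones(3) / 3, rho=0.02, maxiter=300)
rmse = np.sqrt(np.mean((est[~mask] - truth[~mask]) ** 2))
rmse_zero = np.sqrt(np.mean(truth[~mask] ** 2))
print(f'missing-entry RMSE: {rmse:.4f} (zero-fill baseline: {rmse_zero:.4f})')
```

The zero-fill baseline is the error of leaving missing entries at 0, so it gives a floor any useful completion should beat.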

# Part 3: Data Organization

## 1) Matrix Structure

We consider a dataset of $m$ discrete time series $\boldsymbol{y}_{i}\in\mathbb{R}^{f},i\in\left\{1,2,...,m\right\}$. The time series may have missing elements. We express the spatio-temporal dataset as a matrix $Y\in\mathbb{R}^{m\times f}$ with $m$ rows (e.g., locations) and $f$ columns (e.g., discrete time intervals),

$$Y=\left[ \begin{array}{cccc} y_{11} & y_{12} & \cdots & y_{1f} \\ y_{21} & y_{22} & \cdots & y_{2f} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{mf} \\ \end{array} \right]\in\mathbb{R}^{m\times f}.$$

## 2) Tensor Structure

We consider a dataset of $m$ discrete time series $\boldsymbol{y}_{i}\in\mathbb{R}^{nf},i\in\left\{1,2,...,m\right\}$. The time series may have missing elements. We partition each time series into $n$ consecutive segments of predefined length $f$ and express each partitioned time series as a matrix $Y_{i}$ with $n$ rows (e.g., days) and $f$ columns (e.g., discrete time intervals per day),

$$Y_{i}=\left[ \begin{array}{cccc} y_{11} & y_{12} & \cdots & y_{1f} \\ y_{21} & y_{22} & \cdots & y_{2f} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nf} \\ \end{array} \right]\in\mathbb{R}^{n\times f},i=1,2,...,m,$$

and stacking the $m$ matrices $Y_{1},\dots,Y_{m}$ gives a third-order tensor $\mathcal{Y}\in\mathbb{R}^{m\times n\times f}$.
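Assuming each row of the $m\times nf$ data matrix stores one location's series in chronological order (all $f$ intervals of day 1, then day 2, and so on), a row-major reshape produces this tensor directly; Part 7 does exactly this with `dense_mat.reshape([dense_mat.shape[0], 28, 288])`. A toy illustration with made-up sizes:

```python
import numpy as np

m, n, f = 3, 4, 5  # locations, days, intervals per day (illustrative sizes)
Y = np.arange(m * n * f).reshape(m, n * f)  # each row: one location, days laid out back to back
tensor = Y.reshape(m, n, f)                 # row-major reshape: tensor[i, j, t] = Y[i, j * f + t]

i, j, t = 2, 1, 3
print(tensor[i, j, t] == Y[i, j * f + t])  # True
print(tensor[i].shape)                     # (4, 5): the matrix Y_i (days x intervals)
```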

# Part 4: Experiments on Guangzhou Data Set

In [7]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.2

# =============================================================================
### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)
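The RM mask `np.round(random_tensor + 0.5 - missing_rate)` is a threshold in disguise. Assuming `random_tensor` holds i.i.d. Uniform[0, 1) draws (the notebook does not state how the `.mat` file was generated), an entry is kept exactly when its random value exceeds `missing_rate`, so each entry is missing independently with probability `missing_rate`. The stand-in array below is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
random_tensor = rng.random((20, 50, 100))  # synthetic stand-in, assumed Uniform[0, 1)
missing_rate = 0.2

binary_tensor = np.round(random_tensor + 0.5 - missing_rate)   # notebook's expression
threshold_mask = (random_tensor > missing_rate).astype(float)  # equivalent threshold rule

print(np.array_equal(binary_tensor, threshold_mask))
print(1 - binary_tensor.mean())  # empirical missing fraction, close to missing_rate
```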

In [8]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.01
maxiter = 1000
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 3.3328

Iter: 400
RMSE: 3.33242

Iter: 600
RMSE: 3.33247

Iter: 800
RMSE: 3.33245

Iter: 1000
RMSE: 3.33245

Imputation MAPE: 0.0815113
Imputation RMSE: 3.33245

Running time: 961 seconds


In [9]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [10]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.01
maxiter = 1000
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 3.65684

Iter: 400
RMSE: 3.61437

Iter: 600
RMSE: 3.61432

Iter: 800
RMSE: 3.61427

Iter: 1000
RMSE: 3.61426

Imputation MAPE: 0.0887212
Imputation RMSE: 3.61426

Running time: 982 seconds


In [11]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.2

# =============================================================================
### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)
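In the NM scenario one random value per (location, day) pair decides the whole day, so entire fibers along the last mode are dropped together. The double loop can be replaced by computing the 2-D mask once and repeating it along the last axis; the sizes and random matrix below are synthetic stand-ins, assumed Uniform[0, 1).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, f = 5, 7, 12                  # illustrative sizes (locations, days, intervals)
random_matrix = rng.random((m, n))  # synthetic stand-in for random_matrix
missing_rate = 0.2

# Notebook's loop
loop_mask = np.zeros((m, n, f))
for i1 in range(m):
    for i2 in range(n):
        loop_mask[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)

# Vectorized equivalent: one 0/1 value per (location, day), repeated along the last axis
vec_mask = np.repeat(np.round(random_matrix + 0.5 - missing_rate)[:, :, None], f, axis=2)

print(np.array_equal(loop_mask, vec_mask))
# Every (location, day) fiber is either fully observed or fully missing
fiber_sums = loop_mask.sum(axis=2)
print(np.all((fiber_sums == 0) | (fiber_sums == f)))
```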

In [12]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.01
maxiter = 1000
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 6.3564

Iter: 400
RMSE: 4.2148

Iter: 600
RMSE: 4.20912

Iter: 800
RMSE: 4.20865

Iter: 1000
RMSE: 4.20858

Imputation MAPE: 0.104594
Imputation RMSE: 4.20858

Running time: 938 seconds


In [13]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.4

# =============================================================================
### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [14]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.01
maxiter = 1000
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 17.2941

Iter: 400
RMSE: 5.09387

Iter: 600
RMSE: 4.42171

Iter: 800
RMSE: 4.38976

Iter: 1000
RMSE: 4.38132

Imputation MAPE: 0.108805
Imputation RMSE: 4.38132

Running time: 958 seconds


**Experiment results** of missing data imputation using HaLRTC:

| missing rate, scenario | `alpha` (vector input) | `rho` | `maxiter` | MAPE | RMSE |
|:----------|-----:|---------:|---------:|----------:|----------:|
|**0.2, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.01 | 1000 | **0.0815** | **3.33**|
|**0.4, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.01 | 1000 | **0.0887** | **3.61**|
|**0.2, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.01 | 1000 | **0.1046** | **4.21**|
|**0.4, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.01 | 1000 | **0.1088** | **4.38**|
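The MAPE and RMSE columns come from the final block of `HaLRTC`, which evaluates only entries that are missing from `sparse_tensor` and nonzero in `dense_tensor` (zeros in the ground truth are treated as unavailable and excluded). A standalone version of that computation, with a hypothetical name `imputation_metrics` and a hand-checkable toy input:

```python
import numpy as np

def imputation_metrics(dense_tensor, sparse_tensor, tensor_hat):
    """MAPE and RMSE over entries missing from sparse_tensor but nonzero in dense_tensor."""
    pos = np.where((dense_tensor != 0) & (sparse_tensor == 0))
    err = dense_tensor[pos] - tensor_hat[pos]
    mape = np.mean(np.abs(err) / dense_tensor[pos])
    rmse = np.sqrt(np.mean(err ** 2))
    return mape, rmse

dense = np.array([10.0, 20.0, 0.0, 40.0])    # third entry has no ground truth
sparse = np.array([10.0, 0.0, 0.0, 0.0])     # first entry is observed
estimate = np.array([10.0, 22.0, 5.0, 36.0])
print(imputation_metrics(dense, sparse, estimate))  # MAPE = 0.1, RMSE = sqrt(10), about 3.162
```

Only the second and fourth entries are scored: their relative errors are 2/20 and 4/40, and their squared errors are 4 and 16.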


# Part 5: Experiments on Birmingham Data Set


In [15]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.1

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [16]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.001
maxiter = 1000
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 17.4024

Iter: 400
RMSE: 17.3576

Iter: 600
RMSE: 17.3514

Iter: 800
RMSE: 17.351

Iter: 1000
RMSE: 17.3511

Imputation MAPE: 0.0485009
Imputation RMSE: 17.3511

Running time: 11 seconds


In [17]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.3

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [18]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.001
maxiter = 1000
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 27.1823

Iter: 400
RMSE: 26.8319

Iter: 600
RMSE: 26.7962

Iter: 800
RMSE: 26.7923

Iter: 1000
RMSE: 26.7919

Imputation MAPE: 0.0664143
Imputation RMSE: 26.7919

Running time: 11 seconds


In [19]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.1

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [20]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.001
maxiter = 1000
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 62.3232

Iter: 400
RMSE: 35.3005

Iter: 600
RMSE: 34.7651

Iter: 800
RMSE: 34.7215

Iter: 1000
RMSE: 34.7161

Imputation MAPE: 0.0946557
Imputation RMSE: 34.7161

Running time: 11 seconds


In [21]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.3

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [22]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.001
maxiter = 1000
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 341.995

Iter: 400
RMSE: 158.246

Iter: 600
RMSE: 105.057

Iter: 800
RMSE: 94.7617

Iter: 1000
RMSE: 92.5922

Imputation MAPE: 0.14828
Imputation RMSE: 92.5922

Running time: 11 seconds


**Experiment results** of missing data imputation using HaLRTC:

| missing rate, scenario | `alpha` (vector input) | `rho` | `maxiter` | MAPE | RMSE |
|:----------|-----:|---------:|---------:|----------:|----------:|
|**0.1, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 1000 | **0.0485** | **17.35**|
|**0.3, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 1000 | **0.0664** | **26.79**|
|**0.1, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 1000 | **0.0947** | **34.72**|
|**0.3, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 1000 | **0.1483** | **92.59**|


# Part 6: Experiments on Hangzhou Data Set

In [23]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.2

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [24]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.001
maxiter = 1000
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 28.978

Iter: 400
RMSE: 28.8789

Iter: 600
RMSE: 28.8787

Iter: 800
RMSE: 28.8787

Iter: 1000
RMSE: 28.8787

Imputation MAPE: 0.182614
Imputation RMSE: 28.8787

Running time: 69 seconds


In [25]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [26]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.001
maxiter = 1000
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 32.221

Iter: 400
RMSE: 31.818

Iter: 600
RMSE: 31.8141

Iter: 800
RMSE: 31.8141

Iter: 1000
RMSE: 31.8141

Imputation MAPE: 0.1901
Imputation RMSE: 31.8141

Running time: 63 seconds


In [27]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.2

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [28]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.001
maxiter = 1000
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 58.1452

Iter: 400
RMSE: 43.093

Iter: 600
RMSE: 40.8895

Iter: 800
RMSE: 40.5774

Iter: 1000
RMSE: 40.5329

Imputation MAPE: 0.202926
Imputation RMSE: 40.5329

Running time: 61 seconds


In [29]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.4

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [30]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.001
maxiter = 1000
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 75.0597

Iter: 400
RMSE: 58.435

Iter: 600
RMSE: 54.4693

Iter: 800
RMSE: 53.5021

Iter: 1000
RMSE: 53.2638

Imputation MAPE: 0.214657
Imputation RMSE: 53.2638

Running time: 62 seconds


**Experiment results** of missing data imputation using HaLRTC:

| missing rate, scenario | `alpha` (vector input) | `rho` | `maxiter` | MAPE | RMSE |
|:----------|-----:|---------:|---------:|----------:|----------:|
|**0.2, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 1000 | **0.1826** | **28.88**|
|**0.4, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 1000 | **0.1901** | **31.81**|
|**0.2, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 1000 | **0.2029** | **40.53**|
|**0.4, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 1000 | **0.2147** | **53.26**|


# Part 7: Experiments on Seattle Data Set

In [31]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
RM_mat = pd.read_csv('../datasets/Seattle-data-set/RM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
RM_mat = RM_mat.values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])
RM_tensor = RM_mat.reshape([RM_mat.shape[0], 28, 288])

missing_rate = 0.2

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(RM_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [32]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.01
maxiter = 1000
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 3.49451

Iter: 400
RMSE: 3.4773

Iter: 600
RMSE: 3.4773

Iter: 800
RMSE: 3.4773

Iter: 1000
RMSE: 3.4773

Imputation MAPE: 0.0594576
Imputation RMSE: 3.4773

Running time: 1233 seconds


In [33]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
RM_mat = pd.read_csv('../datasets/Seattle-data-set/RM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
RM_mat = RM_mat.values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])
RM_tensor = RM_mat.reshape([RM_mat.shape[0], 28, 288])

missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(RM_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [34]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.01
maxiter = 1000
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 5.95185

Iter: 400
RMSE: 3.84158

Iter: 600
RMSE: 3.8411

Iter: 800
RMSE: 3.8411

Iter: 1000
RMSE: 3.8411

Imputation MAPE: 0.0677406
Imputation RMSE: 3.8411

Running time: 1304 seconds


In [35]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
NM_mat = NM_mat.values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])

missing_rate = 0.2

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))
for i1 in range(binary_tensor.shape[0]):
    for i2 in range(binary_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [36]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.01
maxiter = 1000
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 34.6249

Iter: 400
RMSE: 14.4031

Iter: 600
RMSE: 5.06751

Iter: 800
RMSE: 4.70646

Iter: 1000
RMSE: 4.70185

Imputation MAPE: 0.0882067
Imputation RMSE: 4.70185

Running time: 1282 seconds


In [37]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
NM_mat = NM_mat.values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])

missing_rate = 0.4

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))
for i1 in range(binary_tensor.shape[0]):
    for i2 in range(binary_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [38]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 0.001
maxiter = 1000
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 5.27941

Iter: 400
RMSE: 5.2794

Iter: 600
RMSE: 5.2794

Iter: 800
RMSE: 5.2794

Iter: 1000
RMSE: 5.2794

Imputation MAPE: 0.102027
Imputation RMSE: 5.2794

Running time: 1234 seconds


**Experiment results** of missing data imputation using HaLRTC:

| missing rate, scenario | `alpha` (vector input) | `rho` | `maxiter` | MAPE | RMSE |
|:----------|-----:|---------:|---------:|----------:|----------:|
|**0.2, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.01 | 1000 | **0.0595** | **3.48**|
|**0.4, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.01 | 1000 | **0.0677** | **3.84**|
|**0.2, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.01 | 1000 | **0.0882** | **4.70**|
|**0.4, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 1000 | **0.1020** | **5.28**|
