# About this Notebook

This notebook discusses a Low-Rank Tensor Completion (LRTC) model called High accuracy LRTC (HaLRTC), which was proposed in the following article:

> Ji Liu, Przemyslaw Musialski, Peter Wonka, Jieping Ye, 2013. **Tensor completion for estimating missing values in visual data**. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1): 208-220.


## Quick Run

This notebook is publicly available as part of our data imputation project; see [**transdim - GitHub**](https://github.com/xinychen/transdim).


We start by importing the necessary dependencies.

In [1]:
import numpy as np
from numpy.linalg import inv as inv

# Part 1: Tensor Unfolding (`ten2mat`) and Matrix Folding (`mat2ten`)

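Tensor unfolding is implemented with `np.moveaxis` and `np.reshape`, following the discussion "Using numpy reshape to perform 3rd rank tensor unfold operation" [[**link**](https://stackoverflow.com/questions/49970141/using-numpy-reshape-to-perform-3rd-rank-tensor-unfold-operation)]. The helper `ten2mat` moves the chosen mode to the first axis and flattens the remaining axes in column-major (`order = 'F'`) order.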

In [2]:
import numpy as np
def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')

In [3]:
X = np.array([[[1, 2, 3, 4], [3, 4, 5, 6]], 
              [[5, 6, 7, 8], [7, 8, 9, 10]], 
              [[9, 10, 11, 12], [11, 12, 13, 14]]])
print('tensor size:')
print(X.shape)
print('original tensor:')
print(X)
print()
print('(1) mode-1 tensor unfolding:')
print(ten2mat(X, 0))
print()
print('(2) mode-2 tensor unfolding:')
print(ten2mat(X, 1))
print()
print('(3) mode-3 tensor unfolding:')
print(ten2mat(X, 2))

tensor size:
(3, 2, 4)
original tensor:
[[[ 1  2  3  4]
  [ 3  4  5  6]]

 [[ 5  6  7  8]
  [ 7  8  9 10]]

 [[ 9 10 11 12]
  [11 12 13 14]]]

(1) mode-1 tensor unfolding:
[[ 1  3  2  4  3  5  4  6]
 [ 5  7  6  8  7  9  8 10]
 [ 9 11 10 12 11 13 12 14]]

(2) mode-2 tensor unfolding:
[[ 1  5  9  2  6 10  3  7 11  4  8 12]
 [ 3  7 11  4  8 12  5  9 13  6 10 14]]

(3) mode-3 tensor unfolding:
[[ 1  5  9  3  7 11]
 [ 2  6 10  4  8 12]
 [ 3  7 11  5  9 13]
 [ 4  8 12  6 10 14]]


In [4]:
def mat2ten(mat, tensor_size, mode):
    index = list()
    index.append(mode)
    for i in range(tensor_size.shape[0]):
        if i != mode:
            index.append(i)
    return np.moveaxis(np.reshape(mat, list(tensor_size[index]), order = 'F'), 0, mode)
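Since `mat2ten` is meant to undo `ten2mat`, a quick round-trip check can confirm that folding an unfolded tensor returns the original. The sketch below restates both helpers so it runs on its own; the test tensor `X` and the name `round_trip_ok` are illustrative choices, not part of the original notebook.

```python
import numpy as np

def ten2mat(tensor, mode):
    # Same definition as in this notebook: mode-k unfolding in column-major order.
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order='F')

def mat2ten(mat, tensor_size, mode):
    # Same logic as in this notebook: fold a mode-k unfolding back into a tensor.
    index = [mode] + [i for i in range(tensor_size.shape[0]) if i != mode]
    return np.moveaxis(np.reshape(mat, list(tensor_size[index]), order='F'), 0, mode)

# Illustrative 3 x 2 x 4 tensor (an assumption for this check).
X = np.arange(24).reshape(3, 2, 4)
size = np.array(X.shape)

# Folding the mode-k unfolding should reproduce X for every mode k.
round_trip_ok = all(
    np.array_equal(mat2ten(ten2mat(X, k), size, k), X) for k in range(X.ndim)
)
print(round_trip_ok)
```

Note that `mat2ten` expects `tensor_size` as a NumPy array rather than a tuple, because it reads `tensor_size.shape[0]` and indexes with a list.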

In [5]:
def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    return  np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])
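As a small worked example of the two metrics, the sketch below evaluates them on three hand-picked values (the arrays `truth` and `estimate` are assumptions made for illustration). MAPE divides by the true values, so it only makes sense on nonzero entries; the notebook ensures this later by evaluating only positions where `dense_tensor != 0`.

```python
import numpy as np

def compute_mape(var, var_hat):
    # Mean absolute percentage error over a 1-D array of nonzero true values.
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    # Root mean squared error over a 1-D array.
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])

# Hypothetical true values and estimates.
truth = np.array([10.0, 20.0, 40.0])
estimate = np.array([11.0, 18.0, 40.0])

mape = compute_mape(truth, estimate)  # (0.1 + 0.1 + 0.0) / 3
rmse = compute_rmse(truth, estimate)  # sqrt((1 + 4 + 0) / 3)
print(mape, rmse)
```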

# Part 2: High accuracy Low-Rank Tensor Completion (HaLRTC)

In [6]:
def svt(mat, tau):
    [m, n] = mat.shape
    if 2 * m < n:
        u, s, v = np.linalg.svd(mat @ mat.T, full_matrices = 0)
        s = np.sqrt(s)
        tol = n * np.finfo(float).eps * np.max(s)
        idx = np.sum(s > max(tau,tol))
        mid = (s[:idx] - tau) / s[:idx]
        return u[:,:idx] @ np.diag(mid) @ u[:,:idx].T @ mat
    elif m > 2 * n:
        return svt(mat.T, tau).T
    u, s, v = np.linalg.svd(mat, full_matrices = 0)
    idx = np.sum(s > tau)
    return u[:,:idx] @ np.diag(s[:idx]-tau) @ v[:idx,:]
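For wide matrices (`2 * m < n`), `svt` above works with the small `m x m` matrix `mat @ mat.T` instead of decomposing `mat` directly. The sketch below is a sanity check under stated assumptions, not the notebook's own code: `svt_direct` and `svt_gram` are hypothetical names, and the random test matrix is chosen only for illustration. It checks that the Gram-matrix route gives the same result as plain soft-thresholding of the singular values.

```python
import numpy as np

def svt_direct(mat, tau):
    # Reference: shrink every singular value by tau and clip at zero.
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    return u @ np.diag(np.maximum(s - tau, 0)) @ vt

def svt_gram(mat, tau):
    # Wide-matrix shortcut: the singular values of mat are the square roots
    # of the singular values of the small matrix mat @ mat.T.
    u, s2, _ = np.linalg.svd(mat @ mat.T, full_matrices=False)
    s = np.sqrt(s2)
    keep = s > tau
    scale = (s[keep] - tau) / s[keep]
    # u_k diag(scale) u_k^T mat equals u_k diag(s_k - tau) v_k^T.
    return u[:, keep] @ np.diag(scale) @ u[:, keep].T @ mat

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 30))  # wide test matrix (illustrative)
tau = 1.0

same_result = np.allclose(svt_direct(A, tau), svt_gram(A, tau))
print(same_result)
```

The shortcut is attractive when one dimension is much smaller than the other, because the decomposition is then performed on an `m x m` matrix rather than an `m x n` one.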

In [7]:
def HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter):
    """High accuracy Low-Rank Tensor Completion, HaLRTC."""
    
    dim0 = sparse_tensor.ndim
    dim1, dim2, dim3 = sparse_tensor.shape
    pos_missing = np.where(sparse_tensor == 0)
    pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
    tensor_hat = sparse_tensor.copy()
    
    Z = np.zeros((dim1, dim2, dim3, dim0)) # \boldsymbol{\mathcal{Z}} (n1*n2*n3*d)
    T = np.zeros((dim1, dim2, dim3, dim0)) # \boldsymbol{\mathcal{T}} (n1*n2*n3*d)
    last_tensor = sparse_tensor.copy()
    snorm = np.sqrt(np.sum(sparse_tensor ** 2))
    it = 0
    while True:
        rho = min(rho * 1.05, 1e5)
        for k in range(dim0):
            Z[:, :, :, k] = mat2ten(svt(ten2mat(tensor_hat + T[:, :, :, k] / rho, k), 
                                        alpha[k] / rho), np.array([dim1, dim2, dim3]), k)
        tensor_hat[pos_missing] = np.mean(Z - T / rho, axis = 3)[pos_missing]
        for k in range(dim0):
            T[:, :, :, k] = T[:, :, :, k] + rho * (tensor_hat - Z[:, :, :, k])
        tol = np.sqrt(np.sum((tensor_hat - last_tensor) ** 2)) / snorm
        last_tensor = tensor_hat.copy()
        it += 1
        if (it + 1) % 50 == 0:
            print('Iter: {}'.format(it + 1))
            print('RMSE: {:.6}'.format(compute_rmse(dense_tensor[pos_test], tensor_hat[pos_test])))
            print()
        if (tol < epsilon) or (it >= maxiter):
            break

    print('Imputation MAPE: {:.6}'.format(compute_mape(dense_tensor[pos_test], tensor_hat[pos_test])))
    print('Imputation RMSE: {:.6}'.format(compute_rmse(dense_tensor[pos_test], tensor_hat[pos_test])))
    print()

    return tensor_hat

# Part 3: Data Organization

## 1) Matrix Structure

We consider a dataset of $m$ discrete time series $\boldsymbol{y}_{i}\in\mathbb{R}^{f},i\in\left\{1,2,...,m\right\}$. The time series may have missing elements. We express the spatio-temporal dataset as a matrix $Y\in\mathbb{R}^{m\times f}$ with $m$ rows (e.g., locations) and $f$ columns (e.g., discrete time intervals),

$$Y=\left[ \begin{array}{cccc} y_{11} & y_{12} & \cdots & y_{1f} \\ y_{21} & y_{22} & \cdots & y_{2f} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{mf} \\ \end{array} \right]\in\mathbb{R}^{m\times f}.$$

## 2) Tensor Structure

We consider a dataset of $m$ discrete time series $\boldsymbol{y}_{i}\in\mathbb{R}^{nf},i\in\left\{1,2,...,m\right\}$. The time series may have missing elements. We partition each time series into intervals of predefined length $f$. We express each partitioned time series as a matrix $Y_{i}$ with $n$ rows (e.g., days) and $f$ columns (e.g., discrete time intervals per day),

$$Y_{i}=\left[ \begin{array}{cccc} y_{11} & y_{12} & \cdots & y_{1f} \\ y_{21} & y_{22} & \cdots & y_{2f} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nf} \\ \end{array} \right]\in\mathbb{R}^{n\times f},i=1,2,...,m,$$

therefore, the resulting structure is a tensor $\mathcal{Y}\in\mathbb{R}^{m\times n\times f}$.
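The partitioning described above can be sketched with a single reshape, assuming each row of a matrix `Y` stores one time series of length $nf$ with the $n$ days laid out one after another. The sizes `m`, `n`, `f` and the toy matrix below are hypothetical values chosen for illustration.

```python
import numpy as np

m, n, f = 2, 3, 4  # hypothetical: 2 locations, 3 days, 4 intervals per day

# Toy matrix of shape (m, n * f): row i is the full time series of location i.
Y = np.arange(m * n * f).reshape(m, n * f)

# Split the time axis into (day, interval-within-day) to get an m x n x f tensor.
tensor = Y.reshape(m, n, f)

# Entry (i, d, t) of the tensor is entry (i, d * f + t) of the matrix.
i, d, t = 1, 2, 3
print(tensor[i, d, t] == Y[i, d * f + t])
```

Reshaping back with `tensor.reshape(m, n * f)` recovers the matrix form, so the two structures hold the same data in different layouts.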

# Part 4: Experiments on Guangzhou Data Set

In [8]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.2

# =============================================================================
### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)
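The expression `np.round(random_tensor + 0.5 - missing_rate)` turns uniform random numbers into a 0/1 mask: values below `missing_rate` round to 0 (treated as missing) and values above it round to 1 (observed). The sketch below illustrates this on synthetic numbers; it assumes the stored `random_tensor` holds values drawn uniformly from [0, 1), which the dataset files are not shown to guarantee here.

```python
import numpy as np

missing_rate = 0.2

# Hand-picked values: those below missing_rate are rounded to 0 (masked out).
random_values = np.array([0.05, 0.19, 0.21, 0.90])
example_mask = np.round(random_values + 0.5 - missing_rate)
print(example_mask)

# Synthetic stand-in for random_tensor (an assumption, not the real data file).
rng = np.random.default_rng(0)
random_tensor = rng.random((50, 40, 50))
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)

# The share of zeros in the mask should be close to missing_rate.
zero_fraction = 1 - binary_tensor.mean()
print(zero_fraction)
```

Because missing entries are encoded as zeros in `sparse_tensor`, true zeros in `dense_tensor` are indistinguishable from missing values; `HaLRTC` accounts for this by evaluating errors only where `dense_tensor != 0`.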

In [9]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 3.30616

Imputation MAPE: 0.0813338
Imputation RMSE: 3.32578

Running time: 16 seconds


In [10]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [11]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 3.58806

Imputation MAPE: 0.0886173
Imputation RMSE: 3.61067

Running time: 17 seconds


In [12]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.5

# =============================================================================
### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [13]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-4
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Imputation MAPE: 0.0930198
Imputation RMSE: 3.77161

Running time: 7 seconds


In [14]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.6

# =============================================================================
### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [15]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-4
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Imputation MAPE: 0.0982231
Imputation RMSE: 3.9581

Running time: 10 seconds


In [16]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.7

# =============================================================================
### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [17]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-4
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Imputation MAPE: 0.104459
Imputation RMSE: 4.17598

Running time: 11 seconds


In [18]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']

missing_rate = 0.2

# =============================================================================
### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)
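In the non-random missing scenario, one random number per pair of leading indices `(i1, i2)` decides whether the whole fiber along the third axis is dropped. The double loop can also be written with broadcasting; the sketch below compares the two on synthetic data. The array sizes and the names `loop_mask` and `broadcast_mask` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (5, 4, 6)                        # hypothetical tensor shape
random_matrix = rng.random(shape[:2])    # stand-in for the stored random_matrix
missing_rate = 0.4

# Loop version, as in the notebook: one decision per (i1, i2) fiber.
loop_mask = np.zeros(shape)
for i1 in range(shape[0]):
    for i2 in range(shape[1]):
        loop_mask[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)

# Broadcasting version: compute the 2-D mask once, then repeat it along axis 2.
mask_2d = np.round(random_matrix + 0.5 - missing_rate)
broadcast_mask = np.broadcast_to(mask_2d[:, :, None], shape).copy()

print(np.array_equal(loop_mask, broadcast_mask))
```

Both versions give the same mask; the broadcasting form simply avoids the Python-level loops.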

In [19]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 4.20067

Imputation MAPE: 0.104497
Imputation RMSE: 4.20627

Running time: 15 seconds


In [20]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']

missing_rate = 0.4

# =============================================================================
### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [21]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 4.36471

Imputation MAPE: 0.108782
Imputation RMSE: 4.37534

Running time: 17 seconds


In [22]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']

missing_rate = 0.5

# =============================================================================
### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [23]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 4.50592

Imputation MAPE: 0.113016
Imputation RMSE: 4.51998

Running time: 17 seconds


In [24]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']

missing_rate = 0.6

# =============================================================================
### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [25]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 4.66611

Imputation MAPE: 0.118022
Imputation RMSE: 4.68937

Running time: 18 seconds


In [26]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']

missing_rate = 0.7

# =============================================================================
### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [27]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-4
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 4.97112

Imputation MAPE: 0.12656
Imputation RMSE: 4.97011

Running time: 16 seconds


**Experiment results** of missing data imputation using HaLRTC:

|  scenario |`alpha` (vector input)|`rho`|`maxiter`|       mape |      rmse |
|:----------|-----:|---------:|---------:|----------:|----------:|
|**0.2, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.01 | 1000 | **0.0815** | **3.33**|
|**0.4, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.01 | 1000 | **0.0887** | **3.61**|
|**0.2, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.01 | 1000 | **0.1046** | **4.21**|
|**0.4, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.01 | 1000 | **0.1088** | **4.38**|


# Part 5: Experiments on Birmingham Data Set


In [28]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.1

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [29]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 16.8984

Imputation MAPE: 0.0478406
Imputation RMSE: 17.1755

Running time: 0 seconds


In [30]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.3

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [31]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 26.433

Imputation MAPE: 0.0658561
Imputation RMSE: 26.6339

Running time: 0 seconds


In [32]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.5

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [33]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 39.657

Imputation MAPE: 0.0912206
Imputation RMSE: 39.8505

Running time: 0 seconds


In [34]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.6

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [35]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 54.8144

Imputation MAPE: 0.111388
Imputation RMSE: 55.0234

Running time: 0 seconds


In [36]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.7

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [37]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 73.1056

Imputation MAPE: 0.141341
Imputation RMSE: 73.4255

Running time: 0 seconds


In [38]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']

missing_rate = 0.1

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [39]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 34.2752

Imputation MAPE: 0.0937521
Imputation RMSE: 34.5154

Running time: 0 seconds


In [40]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']

missing_rate = 0.3

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [41]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 91.451

Imputation MAPE: 0.146875
Imputation RMSE: 91.6576

Running time: 0 seconds


In [42]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']

missing_rate = 0.5

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [43]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 194.457

Imputation MAPE: 0.192377
Imputation RMSE: 194.818

Running time: 0 seconds


In [44]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']

missing_rate = 0.6

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [45]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 303.963

Imputation MAPE: 0.241191
Imputation RMSE: 303.416

Running time: 0 seconds


In [46]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']

missing_rate = 0.7

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [47]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 388.64

Imputation MAPE: 0.293385
Imputation RMSE: 387.755

Running time: 0 seconds


**Experiment results** of missing data imputation using HaLRTC:

|  scenario |`alpha` (vector input)|`rho`|`maxiter`|       mape |      rmse |
|:----------|-----:|---------:|---------:|----------:|----------:|
|**0.1, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 1000 | **0.0485** | **17.35**|
|**0.3, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 1000 | **0.0664** | **26.79**|
|**0.1, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 1000 | **0.0947** | **34.72**|
|**0.3, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 1000 | **0.1483** | **92.59**|


# Part 6: Experiments on Hangzhou Data Set

In [48]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.2

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [49]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Imputation MAPE: 0.182712
Imputation RMSE: 28.8714

Running time: 1 seconds


In [50]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [51]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 31.8073

Imputation MAPE: 0.190249
Imputation RMSE: 31.8102

Running time: 1 seconds


In [52]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.5

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [53]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 33.2532

Imputation MAPE: 0.195149
Imputation RMSE: 33.256

Running time: 1 seconds


In [54]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.6

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [55]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 36.1837

Imputation MAPE: 0.200866
Imputation RMSE: 36.1915

Running time: 1 seconds


In [56]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.7

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [57]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 40.0645

Imputation MAPE: 0.209469
Imputation RMSE: 40.0769

Running time: 1 seconds


In [58]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']

missing_rate = 0.2

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)
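
In the NM scenario, one random number per (`i1`, `i2`) pair decides whether the whole third-mode fiber is kept or dropped. As a minimal sketch on hypothetical stand-in data (small shapes and a fixed seed, not the Hangzhou files), the double loop can also be written with NumPy broadcasting; the two masks should be identical.

```python
import numpy as np

# Hypothetical stand-in data; shapes are chosen only for illustration.
rng = np.random.default_rng(0)
dense_tensor = rng.random((4, 5, 6)) + 1.0
random_matrix = rng.random((4, 5))
missing_rate = 0.4

# Loop version, as written in the cells above.
loop_mask = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        loop_mask[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)

# Broadcast version: compute one 0/1 flag per fiber, then repeat it
# along the third mode by multiplying with an all-ones tensor.
fiber_flags = np.round(random_matrix + 0.5 - missing_rate)
broadcast_mask = fiber_flags[:, :, None] * np.ones(dense_tensor.shape)

print(np.array_equal(loop_mask, broadcast_mask))
```

Because every fiber is either fully observed or fully missing, the NM scenario is harder than RM at the same missing rate, which is consistent with the larger errors reported for NM in this notebook.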

In [59]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Imputation MAPE: 0.202969
Imputation RMSE: 40.5109

Running time: 1 seconds


In [60]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']

missing_rate = 0.4

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [61]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Imputation MAPE: 0.214628
Imputation RMSE: 53.1454

Running time: 1 seconds


In [62]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']

missing_rate = 0.5

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [63]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 60.6308

Imputation MAPE: 0.228757
Imputation RMSE: 60.6308

Running time: 1 seconds


In [64]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']

missing_rate = 0.6

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [65]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 91.9103

Imputation MAPE: 0.239317
Imputation RMSE: 91.9153

Running time: 1 seconds


In [66]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']

missing_rate = 0.7

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [67]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 107.589

Imputation MAPE: 0.262338
Imputation RMSE: 107.631

Running time: 1 seconds


**Experiment results** of missing data imputation using HaLRTC:

| scenario |`alpha` (vector input)|`rho`|`maxiter`|       MAPE |      RMSE |
|:----------|-----:|---------:|---------:|----------:|----------:|
|**0.2, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 1000 | **0.1826** | **28.88**|
|**0.4, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 1000 | **0.1901** | **31.81**|
|**0.2, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 1000 | **0.2029** | **40.53**|
|**0.4, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 1000 | **0.2147** | **53.26**|
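
For reference, the two metrics in the table can be sketched as follows. This assumes that errors are evaluated only on entries that are nonzero in `dense_tensor` but were masked out in `sparse_tensor`, and that the data are positive (as traffic measurements are); the helper name `imputation_errors` and the toy arrays are hypothetical, and the `HaLRTC` function defined earlier may differ in detail.

```python
import numpy as np

def imputation_errors(dense_tensor, sparse_tensor, tensor_hat):
    """Return (MAPE, RMSE) over held-out entries (hypothetical helper)."""
    # Held-out entries: true value is known (nonzero) but was masked to zero.
    pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
    actual = dense_tensor[pos_test]
    estimate = tensor_hat[pos_test]
    mape = np.mean(np.abs(actual - estimate) / actual)
    rmse = np.sqrt(np.mean((actual - estimate) ** 2))
    return mape, rmse

# Toy example: the entries 20 and 30 are held out.
dense = np.array([[[10.0, 20.0], [30.0, 40.0]]])
mask = np.array([[[1.0, 0.0], [0.0, 1.0]]])
sparse = dense * mask
estimate = np.array([[[10.0, 22.0], [27.0, 40.0]]])

mape, rmse = imputation_errors(dense, sparse, estimate)
print('MAPE: %.4f' % mape)
print('RMSE: %.4f' % rmse)
```

MAPE is a relative error and therefore unitless, while RMSE is expressed in the units of the data, so the two columns of the table are not directly comparable across data sets.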


# Part 7: Experiments on Seattle Data Set

In [68]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
RM_mat = pd.read_csv('../datasets/Seattle-data-set/RM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
RM_mat = RM_mat.values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])
RM_tensor = RM_mat.reshape([RM_mat.shape[0], 28, 288])

missing_rate = 0.2

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(RM_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)
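
The reshape above turns the sensor-by-time matrix into a sensor × day × time-of-day tensor. The sketch below illustrates the index mapping on a small synthetic matrix, assuming the columns are ordered in time and that each of the 28 days contributes 288 consecutive columns (five-minute intervals would give 288 per day, but that resolution is an assumption here). The sensor count and the values are hypothetical.

```python
import numpy as np

n_sensors, n_days, n_intervals = 3, 28, 288  # n_sensors is a toy value

# Synthetic matrix whose entries encode their own flat position.
dense_mat = np.arange(n_sensors * n_days * n_intervals, dtype=float)
dense_mat = dense_mat.reshape(n_sensors, n_days * n_intervals)

# Same call pattern as in the cell above (NumPy's default row-major order).
dense_tensor = dense_mat.reshape([dense_mat.shape[0], n_days, n_intervals])

# Day d of sensor s corresponds to columns d*288 ... (d+1)*288 - 1 of row s.
s, d = 1, 5
day_slice = dense_mat[s, d * n_intervals:(d + 1) * n_intervals]
same_day = np.array_equal(dense_tensor[s, d, :], day_slice)

# Flattening the last two modes recovers the original matrix.
round_trip = dense_tensor.reshape(n_sensors, -1)
print(same_day, np.array_equal(round_trip, dense_mat))
```

With this layout, the second mode captures day-to-day regularity and the third mode captures within-day patterns, which is the low-rank structure the completion model relies on.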

In [69]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 3.38495

Imputation MAPE: 0.0592615
Imputation RMSE: 3.46856

Running time: 24 seconds


In [70]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
RM_mat = pd.read_csv('../datasets/Seattle-data-set/RM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
RM_mat = RM_mat.values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])
RM_tensor = RM_mat.reshape([RM_mat.shape[0], 28, 288])

missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(RM_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [71]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 3.73783

Imputation MAPE: 0.0675857
Imputation RMSE: 3.83403

Running time: 24 seconds


In [72]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
RM_mat = pd.read_csv('../datasets/Seattle-data-set/RM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
RM_mat = RM_mat.values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])
RM_tensor = RM_mat.reshape([RM_mat.shape[0], 28, 288])

missing_rate = 0.5

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(RM_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [73]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 3.96685

Imputation MAPE: 0.0729742
Imputation RMSE: 4.0707

Running time: 25 seconds


In [74]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
RM_mat = pd.read_csv('../datasets/Seattle-data-set/RM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
RM_mat = RM_mat.values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])
RM_tensor = RM_mat.reshape([RM_mat.shape[0], 28, 288])

missing_rate = 0.6

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(RM_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [75]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 4.22189

Imputation MAPE: 0.079017
Imputation RMSE: 4.33587

Running time: 27 seconds


In [76]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
RM_mat = pd.read_csv('../datasets/Seattle-data-set/RM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
RM_mat = RM_mat.values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])
RM_tensor = RM_mat.reshape([RM_mat.shape[0], 28, 288])

missing_rate = 0.7

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(RM_tensor + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [77]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-4
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Imputation MAPE: 0.0889422
Imputation RMSE: 4.76464

Running time: 15 seconds


In [78]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
NM_mat = NM_mat.values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])

missing_rate = 0.2

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))
for i1 in range(binary_tensor.shape[0]):
    for i2 in range(binary_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [79]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 4.61608

Imputation MAPE: 0.0879117
Imputation RMSE: 4.68977

Running time: 24 seconds


In [80]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
NM_mat = NM_mat.values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])

missing_rate = 0.4

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))
for i1 in range(binary_tensor.shape[0]):
    for i2 in range(binary_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [81]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 5.21986

Imputation MAPE: 0.101906
Imputation RMSE: 5.27444

Running time: 27 seconds


In [82]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
NM_mat = NM_mat.values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])

missing_rate = 0.5

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))
for i1 in range(binary_tensor.shape[0]):
    for i2 in range(binary_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [83]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 5.58773

Imputation MAPE: 0.111927
Imputation RMSE: 5.63865

Running time: 32 seconds


In [84]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
NM_mat = NM_mat.values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])

missing_rate = 0.6

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))
for i1 in range(binary_tensor.shape[0]):
    for i2 in range(binary_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [85]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 6.06276

Imputation MAPE: 0.12373
Imputation RMSE: 6.10839

Running time: 28 seconds


In [86]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
NM_mat = NM_mat.values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])

missing_rate = 0.7

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))
for i1 in range(binary_tensor.shape[0]):
    for i2 in range(binary_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [87]:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 5e-5
epsilon = 1e-4
maxiter = 200
HaLRTC(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 50
RMSE: 6.95703

Imputation MAPE: 0.143418
Imputation RMSE: 6.93915

Running time: 26 seconds


**Experiment results** of missing data imputation using HaLRTC:

| scenario |`alpha` (vector input)|`rho`|`maxiter`|       MAPE |      RMSE |
|:----------|-----:|---------:|---------:|----------:|----------:|
|**0.2, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.01 | 1000 | **0.0595** | **3.48**|
|**0.4, RM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.01 | 1000 | **0.0677** | **3.84**|
|**0.2, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.01 | 1000 | **0.0882** | **4.70**|
|**0.4, NM**| $\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)$ | 0.001 | 1000 | **0.1020** | **5.28**|
