# Generalized Higher-Order Orthogonal Iteration for Tensor Decomposition and Completion

This notebook shows how to implement a gHOI imputer and apply it to three real-world data sets (PeMS traffic speed data, Guangzhou traffic speed data, and Electricity data). To handle missing values in multivariate time series data, the method combines a Tucker decomposition of the data tensor with a low-rank structure on the core tensor. For an in-depth discussion of gHOI, please see [1].

<div class="alert alert-block alert-info">
<font color="black">
<b>[1]</b> Yuanyuan Liu, Fanhua Shang, Wei Fan, James Cheng, Hong Cheng (2014). <b>Generalized Higher-Order Orthogonal Iteration for Tensor Decomposition and Completion</b>. NIPS Proceedings. <a href="https://papers.nips.cc/paper/5476-generalized-higher-order-orthogonal-iteration-for-tensor-decomposition-and-completion.pdf" title="PDF"><b>[PDF]</b></a>
</font>
</div>


In [2]:
import numpy as np
import time
from numpy.linalg import inv

### Define gHOI-imputer kernel

We start by introducing some helper functions that rely on `NumPy`.

<div class="alert alert-block alert-warning">
<ul>
<li><b><code>ten2mat</code>:</b> <font color="black">Unfold a tensor into a matrix along a specified mode.</font></li>
<li><b><code>mat2ten</code>:</b> <font color="black">Fold a matrix back into a tensor, given the tensor dimensions (i.e., tensor size) and the mode.</font></li>
<li><b><code>tucker_combine</code>:</b> <font color="black">Combine a core tensor and factor matrices into a full tensor (Tucker reconstruction).</font></li>
</ul>
</div>

### Tensor folding and unfolding

In [3]:
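def ten2mat(tensor, mode):
    # Mode-n unfolding: move axis `mode` to the front, then flatten the
    # remaining axes in Fortran (column-major) order.
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')

def mat2ten(mat, dim, mode):
    # Inverse of ten2mat. `dim` must be a NumPy array holding the full tensor
    # shape, since it is indexed with a list below.
    index = list()
    index.append(mode)
    for i in range(dim.shape[0]):
        if i != mode:
            index.append(i)
    return np.moveaxis(np.reshape(mat, list(dim[index]), order = 'F'), 0, mode)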
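A quick sanity check of the two helpers: unfolding along each mode and folding back should recover the original tensor, and each mode-n unfolding should have shape (I_n, product of the other dimensions). The sketch restates both functions so that it runs on its own; the 2×3×4 test tensor is arbitrary.

```python
import numpy as np

def ten2mat(tensor, mode):
    # Mode-n unfolding with Fortran-order flattening (same as above).
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order='F')

def mat2ten(mat, dim, mode):
    # Inverse of ten2mat; `dim` is the full tensor shape as a NumPy array.
    index = [mode] + [i for i in range(dim.shape[0]) if i != mode]
    return np.moveaxis(np.reshape(mat, list(dim[index]), order='F'), 0, mode)

X = np.arange(24).reshape(2, 3, 4)
dim = np.array(X.shape)
for mode in range(X.ndim):
    X_n = ten2mat(X, mode)
    print(mode, X_n.shape)          # 0 (2, 12) / 1 (3, 8) / 2 (4, 6)
    assert np.array_equal(mat2ten(X_n, dim, mode), X)

# Fortran order means the first remaining axis varies fastest across columns:
print(ten2mat(X, 0)[0])             # [ 0  4  8  1  5  9  2  6 10  3  7 11]
```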

## Tensor Tucker combination

In [4]:
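def tucker_combine(var, skip = False, skipn = 0):
    # var = [G, U_1, ..., U_N]: compute G x_1 U_1 x_2 U_2 ... x_N U_N.
    # If skip is True, the product along (0-based) mode `skipn` is omitted.
    annotation = 'qwertyuiop'  # einsum subscripts; supports up to 10 modes
    G = var[0]
    R = G.shape
    dim_N = len(R)
    anno = annotation[:dim_N]
    W = G.copy()
    for n in range(len(var) - 1):
        if skip and n == skipn:
            continue
        # Mode-n product: contract axis n of W with the second axis of var[n + 1];
        # the letter 'n' (absent from `annotation`) labels the new axis.
        target = anno.replace(anno[n], 'n')
        mul_type = anno + ', n' + annotation[n] + '->' + target
        W = np.einsum(mul_type, W, var[n + 1])
    return W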
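As an independent cross-check of the Tucker model that `tucker_combine` evaluates, the sketch below rebuilds a tensor with `np.tensordot` instead of `einsum` and verifies the standard matricized identity $\mathbf{X}_{(1)} = \mathbf{U}_1 \mathbf{G}_{(1)} (\mathbf{U}_3 \otimes \mathbf{U}_2)^\top$, which holds for the Fortran-order unfolding used by `ten2mat`. The helper `mode_n_product` is a hypothetical name introduced here, not part of the notebook, and the sizes are arbitrary.

```python
import numpy as np

def ten2mat(tensor, mode):
    # Same Fortran-order mode-n unfolding as in the notebook.
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order='F')

def mode_n_product(tensor, mat, mode):
    # Multiply `tensor` by `mat` (shape J x I_mode) along axis `mode`.
    # tensordot appends the new axis last, so move it back into position.
    return np.moveaxis(np.tensordot(tensor, mat, axes=(mode, 1)), -1, mode)

rng = np.random.default_rng(0)
G = rng.random((2, 3, 4))                                   # core, ranks (2, 3, 4)
U = [rng.random((5, 2)), rng.random((6, 3)), rng.random((7, 4))]

# Tucker reconstruction: X = G x_1 U1 x_2 U2 x_3 U3.
X = G
for n, U_n in enumerate(U):
    X = mode_n_product(X, U_n, n)

# Matricized form along the first mode.
X1 = U[0] @ ten2mat(G, 0) @ np.kron(U[2], U[1]).T
print(X.shape, np.allclose(ten2mat(X, 0), X1))               # (5, 6, 7) True
```

The Kronecker form is handy for reasoning about cost, but it materializes a large matrix; the sequential mode-n products (as in `tucker_combine`) avoid that.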

## Function to randomly initialize variables

In [5]:
def init_variables(dim, R):
    G = np.random.rand(*R)
    dim_N = len(dim)
    U = []
    for i in range(dim_N):
        U.append(np.random.rand(dim[i], R[i]))
        
    V = []
    for i in range(dim_N):
        V.append(ten2mat(G, i))
    
    Y = []
    for i in range(dim_N):
        Y.append(np.zeros_like(V[i]))
    return G, U, V, Y

## Loss calculator

In [6]:
def losscal(V, Y, G, U, X, X_pre, mu, lambda_l):
    loss = 0
    dim_N = len(X.shape)
    for i in range(dim_N):
        s = np.linalg.svd(V[i], compute_uv=False)  # singular values only (nuclear norm)
        mat = ten2mat(G, i) - V[i]
        loss += np.sum(s) + np.einsum('ij, ij', Y[i], mat) + mu / 2 * np.sum(np.square(mat))
    loss += lambda_l * np.sum(np.square(X - X_pre))
    return loss
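Read directly off the code, `losscal` evaluates the quantity below, where $N$ is `dim_N`, $\lambda$ is `lambda_l`, $\|\cdot\|_*$ is the nuclear norm (the sum of singular values, `np.sum(s)`), $\mathbf{G}_{(i)}$ is `ten2mat(G, i)`, and $\mathcal{X}_{\mathrm{pre}}$ is the data tensor before the latest imputation step. The first three terms follow the augmented-Lagrangian form of the constraints $\mathbf{G}_{(i)} = \mathbf{V}_i$; the last term measures how much $\mathcal{X}$ changed between iterations rather than a residual on the observed entries. `imputer` uses this value only to decide when to increase the rank.

$$
L=\sum_{i=1}^{N}\left(\|\mathbf{V}_i\|_*+\left\langle \mathbf{Y}_i,\ \mathbf{G}_{(i)}-\mathbf{V}_i\right\rangle+\frac{\mu}{2}\left\|\mathbf{G}_{(i)}-\mathbf{V}_i\right\|_F^2\right)+\lambda\left\|\mathcal{X}-\mathcal{X}_{\mathrm{pre}}\right\|_F^2
$$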

## Error calculator
<div class="alert alert-block alert-warning">
<ul>
<li><b><code>compute_mape</code>:</b> <font color="black">Compute the value of Mean Absolute Percentage Error (MAPE).</font></li>
<li><b><code>compute_rmse</code>:</b> <font color="black">Compute the value of Root Mean Square Error (RMSE).</font></li>
</ul>
</div>

> Note that $$\mathrm{MAPE}=\frac{1}{n} \sum_{i=1}^{n} \frac{\left|y_{i}-\hat{y}_{i}\right|}{y_{i}}, \quad\mathrm{RMSE}=\sqrt{\frac{1}{n} \sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}},$$ where $n$ is the total number of estimated values, and $y_i$ and $\hat{y}_i$ are the actual value and its estimation, respectively. `compute_mape` returns MAPE as a fraction; multiply it by 100 to express it as a percentage.

In [7]:
def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])
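A small worked example makes the two metrics concrete. The three values are made up for illustration. Inside `imputer`, the metrics are evaluated only at `pos_test`, i.e., entries that are missing in `sparse_tensor` but nonzero in `dense_tensor`, which keeps the MAPE denominator away from zero.

```python
import numpy as np

def compute_mape(var, var_hat):
    # Mean absolute percentage error, returned as a fraction (not x100).
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    # Root mean square error, in the units of the data.
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])

y = np.array([50.0, 40.0, 20.0])       # actual values
y_hat = np.array([45.0, 44.0, 20.0])   # estimates
print(compute_mape(y, y_hat))  # (5/50 + 4/40 + 0/20) / 3 = 0.2 / 3, about 0.0667
print(compute_rmse(y, y_hat))  # sqrt((25 + 16 + 0) / 3) = sqrt(41 / 3), about 3.6968
```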

## Function validity test
<div class="alert alert-block alert-warning">
<ul>
<li><b><code>tucker_combine</code>:</b> <font color="black">Check <code>tucker_combine</code> against an explicit loop-based computation.</font></li>
<li><b><code>padding</code>:</b> <font color="black">Check zero-padding of a tensor with <code>np.pad</code>, which <code>imputer</code> uses when the ranks increase.</font></li>
</ul>
</div>

### `tucker_combine` test

In [9]:
G = np.random.rand(1, 2, 3)
U_1 = np.random.rand(3, 1)
U_2 = np.random.rand(2, 2)
U_3 = np.random.rand(1, 3)
var = [G, U_1, U_2, U_3]
tucker_cb = tucker_combine(var)
print(tucker_cb.shape)
dim1 = U_1.shape[0]
dim2 = U_2.shape[0]
dim3 = U_3.shape[0]

R1 = U_1.shape[1]
R2 = U_2.shape[1]
R3 = U_3.shape[1]

GU = np.zeros((dim1, R2, R3))
for i in range(dim1):
    for j in range(R2):
        for k in range(R3):
            GU[i,j,k] = np.matmul(G[:, j, k], U_1[i, :])

GUU = np.zeros((dim1, dim2, R3))
for i in range(dim1):
    for j in range(dim2):
        for k in range(R3):
            GUU[i,j,k] = np.matmul(GU[i, :, k], U_2[j, :])
            
GUUU = np.zeros((dim1, dim2, dim3))
for i in range(dim1):
    for j in range(dim2):
        for k in range(dim3):
            GUUU[i,j,k] = np.matmul(GUU[i, j, :], U_3[k, :])
print('Func: tucker_combine result:')
print(tucker_cb)
print()
print('Ground Truth:')
print(GUUU)
print()

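# Optional check (uncomment to run). Use np.allclose rather than np.array_equal:
# einsum and the explicit loops may sum in a different order, so exact equality
# can fail because of floating-point round-off.
# print('Does tucker_cb equal GUUU?', np.allclose(tucker_cb, GUUU))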

(3, 2, 1)
Func: tucker_combine result:
[[[0.09250088]
  [0.07387468]]

 [[0.22082989]
  [0.17636305]]

 [[0.5817401 ]
  [0.46459951]]]

Ground Truth:
[[[0.09250088]
  [0.07387468]]

 [[0.22082989]
  [0.17636305]]

 [[0.5817401 ]
  [0.46459951]]]



### Padding test

In [10]:
B = np.array([[[1,1],[2,2]],[[3,3],[4,4]]])
delta = np.array([2, 1, 0])
# Append delta[i] zeros at the end of axis i (nothing is added at the start).
N_tuple = ()
for i in range(len(delta)):
    N_tuple += ((0, delta[i]), )
C = np.pad(B, N_tuple, 'constant', constant_values=(0))
print(C)

[[[1 1]
  [2 2]
  [0 0]]

 [[3 3]
  [4 4]
  [0 0]]

 [[0 0]
  [0 0]
  [0 0]]

 [[0 0]
  [0 0]
  [0 0]]]


## Generalized Higher-Order Orthogonal Iteration Imputer
The `imputer` kernel below takes the following inputs:

<div class="alert alert-block alert-warning">
<ul>
<li><b><code>dense_tensor</code>:</b> <font color="black">The fully observed tensor used as ground truth for evaluation. If it is not available, you can pass <code>dense_tensor = sparse_tensor.copy()</code> instead; the algorithm still runs, but the reported MAPE and RMSE are then undefined (NaN) because no held-out entries remain.</font></li>
<li><b><code>sparse_tensor</code>:</b> <font color="black">The partially observed tensor, with missing entries stored as zeros.</font></li>
<li><b><code>r</code>:</b> <font color="black">Initial multilinear (Tucker) rank of the approximating tensor, e.g., <code>r = np.array([10, 10, 10])</code>.</font></li>
<li><b><code>R_max</code>:</b> <font color="black">Upper bound on the multilinear rank, e.g., <code>R_max = np.array([80, 80, 80])</code>.</font></li>
<li><b><code>lambda_l</code>:</b> <font color="black">Weight of the sum-of-squared-residuals term, e.g., <code>lambda_l = 1</code>.</font></li>
<li><b><code>rho</code>:</b> <font color="black">Factor by which <code>mu</code> is multiplied after each iteration, e.g., <code>rho = 1.01</code>.</font></li>
<li><b><code>mu0</code>:</b> <font color="black">Initial value of the ADMM penalty parameter <code>mu</code>, e.g., <code>mu0 = 0.0005</code>.</font></li>
<li><b><code>mu_max</code>:</b> <font color="black">Upper bound on <code>mu</code>, e.g., <code>mu_max = 0.01</code>.</font></li>
<li><b><code>delta</code>:</b> <font color="black">Per-mode step by which the rank is increased, e.g., <code>delta = np.array([5, 5, 5])</code>.</font></li>
<li><b><code>epsilon</code>:</b> <font color="black">Rank-increase threshold: ranks grow when the relative change of the loss, |1 - L/L_pre|, is at most <code>epsilon</code>, e.g., <code>epsilon = 0.2</code>.</font></li>
<li><b><code>maxiter</code>:</b> <font color="black">Maximum number of iterations, e.g., <code>maxiter = 100</code>.</font></li>
</ul>
</div>
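The interplay of `r`, `R_max`, `delta`, and `epsilon` is easiest to see in isolation. The following is a minimal sketch, assuming orthonormal factor matrices, of the rank-increasing step that `imputer` performs; `increase_rank` is a hypothetical name, and the zero-padding of `V` and `Y` to the new core size (illustrated by the padding test above) is left out.

```python
import numpy as np

def increase_rank(U, R, R_max, delta, L, L_pre, epsilon, rng):
    """Hypothetical helper mirroring the rank-increasing step in `imputer`.

    If |1 - L / L_pre| <= epsilon, each mode's rank grows by
    min(delta, R_max - R), and U[i] gains that many new columns taken from
    the orthogonal complement of its current column space.
    """
    if np.abs(1 - L / L_pre) > epsilon:
        return U, R
    step = np.minimum(delta, R_max - R)
    U_new = []
    for i, U_i in enumerate(U):
        H = rng.random((U_i.shape[0], step[i]))
        # Project H onto the orthogonal complement of span(U_i).
        U_hat = (np.eye(U_i.shape[0]) - U_i @ U_i.T) @ H
        U_new.append(np.concatenate((U_i, U_hat), axis=1))
    return U_new, R + step

rng = np.random.default_rng(0)
# Orthonormal starting factors (reduced QR of random matrices), ranks (2, 2).
U = [np.linalg.qr(rng.random((6, 2)))[0], np.linalg.qr(rng.random((5, 2)))[0]]
R, R_max, delta = np.array([2, 2]), np.array([4, 3]), np.array([3, 3])
U, R = increase_rank(U, R, R_max, delta, L=99.0, L_pre=100.0, epsilon=0.2, rng=rng)
print(R)                                              # [4 3]: capped by R_max
print([U_i.shape for U_i in U])                       # [(6, 4), (5, 3)]
print(np.allclose(U[0][:, :2].T @ U[0][:, 2:], 0))    # True
```

As in `imputer`, the new columns are orthogonal to the old ones but not normalized; the next orthogonal-iteration update of `U[i]` restores orthonormality.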

In [39]:
def imputer(dense_tensor, sparse_tensor, r, R_max, lambda_l, rho, mu0, mu_max, delta, epsilon, maxiter):
    X = sparse_tensor.copy()
    mu = mu0
    R = r.copy()
    dim = sparse_tensor.shape
    dim_N = len(dim)
    G, U, V, Y = init_variables(dim, R)
    pos = np.where(sparse_tensor == 0)
    pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
    start_time = time.time()
    L_pre = np.inf
    for iteration in range(maxiter):
        # Update unitary matrices
        for i in range(dim_N):
            parse_list = [X]
            for j in range(dim_N):
                parse_list.append(U[j].T)
            M = tucker_combine(parse_list, skip = True, skipn = i)
            N = np.zeros(R)
            for j in range(dim_N):
                N = N + mat2ten(V[j]-Y[j]/mu, R, j)
            MN = np.matmul(ten2mat(M, i), ten2mat(N, i).T)
            Ud, sd, Vd = np.linalg.svd(MN, full_matrices=0)
            U[i] = np.matmul(Ud, Vd)
        
        # Update core tensor
        parse_list = [X]
        for j in range(dim_N):
            parse_list.append(U[j].T)
        G = lambda_l / (lambda_l + dim_N * mu) * tucker_combine(parse_list, skip = False)
        for j in range(dim_N):
            G = G + mu / (lambda_l + dim_N * mu) * mat2ten(V[j]-Y[j]/mu, R, j)
        
        # Update auxiliary matrices
        for i in range(dim_N):
            Us, ss, Vs = np.linalg.svd(ten2mat(G, i) + Y[i]/mu, full_matrices=0)
            vec = ss - 1 / mu
            vec[vec <= 0] = 0
            V[i] = np.matmul(np.matmul(Us, np.diag(vec)), Vs)
        
        # Update data tensor (imputation)
        parse_list = [G]
        for i in range(dim_N):
            parse_list.append(U[i])
        X_hat = tucker_combine(parse_list, skip = False)
        X_pre = X.copy()
        X[pos] = X_hat[pos].copy()

        # Update multiplier
        for i in range(dim_N):
            Y[i] = Y[i] + mu * (ten2mat(G, i) - V[i])
        
        # Update parameter mu
        mu = min(mu * rho, mu_max)

#         # Stop criteria
#         GVD = []
#         for i in range(dim_N):
#             GVD.append(np.sum(np.square(ten2mat(G, i) - V[i])))
#         tolerance = max(GVD)
#         if tolerance < tol:
#             break
        
        # Rank increasing
        L = losscal(V, Y, G, U, X, X_pre, mu, lambda_l)
        lcr = np.abs(1 - L/L_pre)
        L_pre = L
        delta_c = delta.copy()
        if lcr <= epsilon:
            for i in range(dim_N):
                delta_c[i] = min(delta[i], R_max[i]-R[i])
                if delta_c[i] != 0:
                    H = np.random.rand(dim[i], delta_c[i])
                    U_hat = np.matmul((np.eye(dim[i]) - np.matmul(U[i], U[i].T)), H)
                    U[i] = np.concatenate((U[i], U_hat), axis=1)

            R_pre = R.copy()
            R = R + delta_c
            delta_tuple = ()
            for i in range(dim_N):
                delta_tuple += ((0, delta_c[i]), )
                
            for i in range(dim_N):
                W_cal = mat2ten(V[i], R_pre, i)
                W_cal_c = np.pad(W_cal, delta_tuple, 'constant', constant_values=(0))
                V[i] = ten2mat(W_cal_c, i)

            for i in range(dim_N):
                W_cal = mat2ten(Y[i], R_pre, i)
                W_cal_c = np.pad(W_cal, delta_tuple, 'constant', constant_values=(0))
                Y[i] = ten2mat(W_cal_c, i)
        
            
        if (iteration + 1) % 50 == 0:
            print('Iteration: %d, Time cost: %ds'%(iteration + 1, time.time() - start_time))
#             print('Tolerance: {:.6}'.format(tolerance))
            print('MAPE: {:.6}'.format(compute_mape(dense_tensor[pos_test], X[pos_test])))
            print('RMSE: {:.6}'.format(compute_rmse(dense_tensor[pos_test], X[pos_test])))
            print('Current rank:')
            print(R)
            print()
            start_time = time.time()
            
    print('Total iteration: %d'%(iteration + 1))
#     print('Tolerance: {:.6}'.format(tolerance))
    print('Imputation MAPE: {:.6}'.format(compute_mape(dense_tensor[pos_test], X[pos_test])))
    print('Imputation RMSE: {:.6}'.format(compute_rmse(dense_tensor[pos_test], X[pos_test])))
    print('Current rank:')
    print(R)
    return X
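Two steps inside `imputer` have closed forms that can be checked in isolation. The factor update `U[i] = np.matmul(Ud, Vd)` is the orthogonal Procrustes solution: among matrices with orthonormal columns, `Ud @ Vd` maximizes the trace of `U.T @ MN`. The auxiliary update shrinks the singular values of `ten2mat(G, i) + Y[i]/mu` by `1/mu` and clips them at zero (singular value thresholding). The helpers `procrustes` and `svt` are hypothetical names for those lines, not functions defined in the notebook.

```python
import numpy as np

def procrustes(MN):
    # Orthonormal U (same shape as MN) maximizing trace(U^T MN):
    # with the thin SVD MN = P diag(s) Qt, the maximizer is U = P Qt.
    P, _, Qt = np.linalg.svd(MN, full_matrices=False)
    return P @ Qt

def svt(A, tau):
    # Singular value thresholding: subtract tau from every singular value and
    # clip at zero; this is the proximal operator of tau * (nuclear norm).
    P, s, Qt = np.linalg.svd(A, full_matrices=False)
    return P @ np.diag(np.maximum(s - tau, 0.0)) @ Qt

rng = np.random.default_rng(1)
MN = rng.random((8, 3))
U = procrustes(MN)
print(np.allclose(U.T @ U, np.eye(3)))                # True: orthonormal columns

A = np.diag([5.0, 2.0, 0.5])
print(np.linalg.svd(svt(A, 1.0), compute_uv=False))   # approximately [4, 1, 0]
```

The thresholding is what lets the ranks of `V[i]` drop below the current core size, so the low-rank structure on the core tensor comes from this step.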

### Guangzhou data

We generate **random missing (RM)** values on the Guangzhou traffic speed data set.
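The RM mask below relies on a rounding trick: for $u \in [0, 1)$, `np.round(u + 0.5 - missing_rate)` is 1 when $u > p$ and 0 when $u < p$, so each entry is kept independently with probability $1 - p$. A minimal sketch, assuming the stored `random_tensor` holds independent uniform draws on $[0, 1)$; synthetic data of an arbitrary size stands in for the `.mat` file.

```python
import numpy as np

rng = np.random.default_rng(42)
missing_rate = 0.4
random_tensor = rng.random((30, 20, 10))   # stand-in for the stored random tensor

# Same rule as the notebook: 1 = observed, 0 = missing.
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)

# Equivalent, more explicit form of the same mask.
explicit = (random_tensor > missing_rate).astype(float)
print(np.array_equal(binary_tensor, explicit))   # True
print(1 - binary_tensor.mean())                  # fraction missing, close to 0.4
```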

In [40]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.2

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

dense_tensor = np.transpose(dense_tensor, [0, 2, 1])
sparse_tensor = np.transpose(sparse_tensor, [0, 2, 1])
print('Tensor shape:')
print(dense_tensor.shape)

Tensor shape:
(214, 144, 61)


We use `imputer` to fill in the missing entries and evaluate MAPE and RMSE against the ground truth.

In [54]:
import time
start = time.time()
r = np.array([10, 10, 10])
R_max = np.array([80, 80, 80])
delta = np.array([1, 1, 1])
epsilon = 0.2
lambda_l = 1
rho = 1.01
mu0 = 0.00001
mu_max = 0.01
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, r, R_max, lambda_l, rho, mu0, mu_max, delta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iteration: 50, Time cost: 9s
MAPE: 0.11697
RMSE: 5.57439
Current rank:
[11 11 11]

Iteration: 100, Time cost: 25s
MAPE: 0.130908
RMSE: 6.14031
Current rank:
[56 56 56]

Total iteration: 100
Imputation MAPE: 0.130908
Imputation RMSE: 6.14031
Current rank:
[56 56 56]
Running time: 35 seconds


In [55]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.4

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

dense_tensor = np.transpose(dense_tensor, [0, 2, 1])
sparse_tensor = np.transpose(sparse_tensor, [0, 2, 1])

del tensor, random_tensor,binary_tensor

In [56]:
import time
start = time.time()
r = np.array([10, 10, 10])
R_max = np.array([80, 80, 80])
delta = np.array([1, 1, 1])
epsilon = 0.2
lambda_l = 1
rho = 1.01
mu0 = 0.00001
mu_max = 0.01
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, r, R_max, lambda_l, rho, mu0, mu_max, delta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iteration: 50, Time cost: 9s
MAPE: 0.0981221
RMSE: 4.14299
Current rank:
[11 11 11]

Iteration: 100, Time cost: 15s
MAPE: 0.0883596
RMSE: 3.76896
Current rank:
[34 34 34]

Total iteration: 100
Imputation MAPE: 0.0883596
Imputation RMSE: 3.76896
Current rank:
[34 34 34]
Running time: 25 seconds


In [57]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.6

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

dense_tensor = np.transpose(dense_tensor, [0, 2, 1])
sparse_tensor = np.transpose(sparse_tensor, [0, 2, 1])

del tensor, random_tensor,binary_tensor

In [58]:
import time
start = time.time()
r = np.array([10, 10, 10])
R_max = np.array([80, 80, 80])
delta = np.array([1, 1, 1])
epsilon = 0.2
lambda_l = 1
rho = 1.01
mu0 = 0.00001
mu_max = 0.01
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, r, R_max, lambda_l, rho, mu0, mu_max, delta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iteration: 50, Time cost: 18s
MAPE: 0.0910087
RMSE: 3.91056
Current rank:
[46 46 46]

Iteration: 100, Time cost: 79s
MAPE: 0.120026
RMSE: 5.34915
Current rank:
[80 80 80]

Total iteration: 100
Imputation MAPE: 0.120026
Imputation RMSE: 5.34915
Current rank:
[80 80 80]
Running time: 98 seconds


We generate **non-random missing (NM)** values on the Guangzhou traffic speed data set and then conduct the imputation experiment.
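In the NM scenario, one random number is drawn per (first-axis, second-axis) pair and the resulting 0/1 value is copied along the last axis, so whole fibers go missing at once. Assuming the raw Guangzhou axes are (road segment, day, time interval), this removes a segment's entire day. The sketch below uses toy sizes and synthetic random numbers, and shows that broadcasting reproduces the double loop used in the notebook.

```python
import numpy as np

rng = np.random.default_rng(7)
missing_rate = 0.4
n_segments, n_days, n_intervals = 20, 15, 12       # toy sizes, not the real data
random_matrix = rng.random((n_segments, n_days))   # one draw per (segment, day)

# Broadcast the (segment, day) mask along the time-interval axis.
mask = np.round(random_matrix + 0.5 - missing_rate)
binary_tensor = np.broadcast_to(mask[:, :, None], (n_segments, n_days, n_intervals))

print(binary_tensor.shape)                                               # (20, 15, 12)
# Every fiber along the last axis is entirely 0 or entirely 1.
print(np.all(binary_tensor.min(axis=2) == binary_tensor.max(axis=2)))    # True
```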

In [59]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']

missing_rate = 0.2

### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

dense_tensor = np.transpose(dense_tensor, [0, 2, 1])
sparse_tensor = np.transpose(sparse_tensor, [0, 2, 1])

del tensor, random_matrix, binary_tensor

In [60]:
import time
start = time.time()
r = np.array([10, 10, 10])
R_max = np.array([80, 80, 80])
delta = np.array([1, 1, 1])
epsilon = 0.2
lambda_l = 1
rho = 1.01
mu0 = 0.00001
mu_max = 0.01
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, r, R_max, lambda_l, rho, mu0, mu_max, delta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iteration: 50, Time cost: 11s
MAPE: 0.0986107
RMSE: 4.17167
Current rank:
[16 16 16]

Iteration: 100, Time cost: 11s
MAPE: 0.0983929
RMSE: 4.16396
Current rank:
[16 16 16]

Total iteration: 100
Imputation MAPE: 0.0983929
Imputation RMSE: 4.16396
Current rank:
[16 16 16]
Running time: 23 seconds


In [61]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']

missing_rate = 0.4

### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

dense_tensor = np.transpose(dense_tensor, [0, 2, 1])
sparse_tensor = np.transpose(sparse_tensor, [0, 2, 1])

del tensor, random_matrix, binary_tensor

In [62]:
import time
start = time.time()
r = np.array([10, 10, 10])
R_max = np.array([80, 80, 80])
delta = np.array([1, 1, 1])
epsilon = 0.2
lambda_l = 1
rho = 1.01
mu0 = 0.00001
mu_max = 0.01
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, r, R_max, lambda_l, rho, mu0, mu_max, delta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iteration: 50, Time cost: 8s
MAPE: 0.103144
RMSE: 4.43527
Current rank:
[10 10 10]

Iteration: 100, Time cost: 13s
MAPE: 0.105151
RMSE: 4.67524
Current rank:
[37 37 37]

Total iteration: 100
Imputation MAPE: 0.105151
Imputation RMSE: 4.67524
Current rank:
[37 37 37]
Running time: 22 seconds


In [63]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']

missing_rate = 0.6

### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

dense_tensor = np.transpose(dense_tensor, [0, 2, 1])
sparse_tensor = np.transpose(sparse_tensor, [0, 2, 1])

del tensor, random_matrix, binary_tensor

In [64]:
import time
start = time.time()
r = np.array([10, 10, 10])
R_max = np.array([80, 80, 80])
delta = np.array([1, 1, 1])
epsilon = 0.2
lambda_l = 1
rho = 1.01
mu0 = 0.00001
mu_max = 0.01
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, r, R_max, lambda_l, rho, mu0, mu_max, delta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iteration: 50, Time cost: 9s
MAPE: 0.108579
RMSE: 4.9738
Current rank:
[15 15 15]

Iteration: 100, Time cost: 31s
MAPE: 0.136828
RMSE: 6.286
Current rank:
[64 64 64]

Total iteration: 100
Imputation MAPE: 0.136828
Imputation RMSE: 6.286
Current rank:
[64 64 64]
Running time: 40 seconds


### PeMS data

In [65]:
dense_mat = np.load('../datasets/PeMS-data-set/pems.npy')
random_tensor = np.load('../datasets/PeMS-data-set/random_tensor.npy')

missing_rate = 0.2

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_mat = np.multiply(dense_mat, ten2mat(binary_tensor, 0))

sparse_tensor = mat2ten(sparse_mat, np.array(binary_tensor.shape), 0)
dense_tensor = mat2ten(dense_mat, np.array(binary_tensor.shape), 0)

del dense_mat, random_tensor, binary_tensor
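The PeMS data are stored as a matrix (one row per sensor) and folded into a third-order tensor with `mat2ten(..., 0)`. With mode 0, that call reduces to a Fortran-order `reshape`, so each block of consecutive columns becomes one slice along the last axis. Assuming the columns are consecutive 5-minute intervals (288 per day, 44 days, matching the shape used in the NM cells below), every slice along the last axis is one day. A toy example with 2 sensors, 3 intervals per day, and 2 days:

```python
import numpy as np

n_sensors, per_day, n_days = 2, 3, 2        # toy sizes; PeMS here uses 288 and 44
mat = np.arange(n_sensors * per_day * n_days).reshape(n_sensors, -1)
# With mode 0, mat2ten amounts to this Fortran-order reshape.
tensor = np.reshape(mat, (n_sensors, per_day, n_days), order='F')

print(mat)
# [[ 0  1  2  3  4  5]
#  [ 6  7  8  9 10 11]]
print(tensor[:, :, 0])   # first day = first 3 columns
# [[0 1 2]
#  [6 7 8]]
print(tensor[:, :, 1])   # second day = next 3 columns
# [[ 3  4  5]
#  [ 9 10 11]]
```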

In [66]:
import time
start = time.time()
r = np.array([10, 10, 10])
R_max = np.array([80, 80, 80])
delta = np.array([1, 1, 1])
epsilon = 0.2
lambda_l = 1
rho = 1.01
mu0 = 0.00001
mu_max = 0.01
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, r, R_max, lambda_l, rho, mu0, mu_max, delta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iteration: 50, Time cost: 14s
MAPE: 0.0876598
RMSE: 5.96094
Current rank:
[13 13 13]

Iteration: 100, Time cost: 45s
MAPE: 0.0541411
RMSE: 3.68486
Current rank:
[62 62 62]

Total iteration: 100
Imputation MAPE: 0.0541411
Imputation RMSE: 3.68486
Current rank:
[62 62 62]
Running time: 60 seconds


In [67]:
dense_mat = np.load('../datasets/PeMS-data-set/pems.npy')
random_tensor = np.load('../datasets/PeMS-data-set/random_tensor.npy')

missing_rate = 0.4

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_mat = np.multiply(dense_mat, ten2mat(binary_tensor, 0))

sparse_tensor = mat2ten(sparse_mat, np.array(binary_tensor.shape), 0)
dense_tensor = mat2ten(dense_mat, np.array(binary_tensor.shape), 0)

del dense_mat, random_tensor, binary_tensor

In [68]:
import time
start = time.time()
r = np.array([10, 10, 10])
R_max = np.array([80, 80, 80])
delta = np.array([1, 1, 1])
epsilon = 0.2
lambda_l = 1
rho = 1.01
mu0 = 0.00001
mu_max = 0.01
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, r, R_max, lambda_l, rho, mu0, mu_max, delta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iteration: 50, Time cost: 21s
MAPE: 0.0746132
RMSE: 5.05658
Current rank:
[29 29 29]

Iteration: 100, Time cost: 66s
MAPE: 0.0533752
RMSE: 3.63289
Current rank:
[76 76 76]

Total iteration: 100
Imputation MAPE: 0.0533752
Imputation RMSE: 3.63289
Current rank:
[76 76 76]
Running time: 88 seconds


In [69]:
dense_mat = np.load('../datasets/PeMS-data-set/pems.npy')
random_tensor = np.load('../datasets/PeMS-data-set/random_tensor.npy')

missing_rate = 0.6

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_mat = np.multiply(dense_mat, ten2mat(binary_tensor, 0))

sparse_tensor = mat2ten(sparse_mat, np.array(binary_tensor.shape), 0)
dense_tensor = mat2ten(dense_mat, np.array(binary_tensor.shape), 0)

del dense_mat, random_tensor, binary_tensor

In [70]:
import time
start = time.time()
r = np.array([10, 10, 10])
R_max = np.array([80, 80, 80])
delta = np.array([1, 1, 1])
epsilon = 0.2
lambda_l = 1
rho = 1.01
mu0 = 0.00001
mu_max = 0.01
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, r, R_max, lambda_l, rho, mu0, mu_max, delta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iteration: 50, Time cost: 27s
MAPE: 0.0654111
RMSE: 4.41019
Current rank:
[46 46 46]

Iteration: 100, Time cost: 96s
MAPE: 0.0623232
RMSE: 4.27831
Current rank:
[80 80 80]

Total iteration: 100
Imputation MAPE: 0.0623232
Imputation RMSE: 4.27831
Current rank:
[80 80 80]
Running time: 124 seconds


In [71]:
dense_mat = np.load('../datasets/PeMS-data-set/pems.npy')
random_matrix = np.load('../datasets/PeMS-data-set/random_matrix.npy')

missing_rate = 0.2

### Nonrandom missing (NM) scenario:
binary_tensor = np.zeros((dense_mat.shape[0], 288, 44))
for i1 in range(dense_mat.shape[0]):
    for i2 in range(44):
        binary_tensor[i1,:,i2] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
binary_mat = ten2mat(binary_tensor, 0)
sparse_mat = np.multiply(dense_mat, binary_mat)

sparse_tensor = mat2ten(sparse_mat, np.array(binary_tensor.shape), 0)
dense_tensor = mat2ten(dense_mat, np.array(binary_tensor.shape), 0)

del dense_mat, random_matrix, binary_tensor

In [72]:
import time
start = time.time()
r = np.array([10, 10, 10])
R_max = np.array([80, 80, 80])
delta = np.array([1, 1, 1])
epsilon = 0.2
lambda_l = 1
rho = 1.01
mu0 = 0.00001
mu_max = 0.01
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, r, R_max, lambda_l, rho, mu0, mu_max, delta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iteration: 50, Time cost: 19s
MAPE: 0.09342
RMSE: 6.87262
Current rank:
[21 21 21]

Iteration: 100, Time cost: 52s
MAPE: 0.100346
RMSE: 7.7148
Current rank:
[68 68 68]

Total iteration: 100
Imputation MAPE: 0.100346
Imputation RMSE: 7.7148
Current rank:
[68 68 68]
Running time: 71 seconds


In [73]:
dense_mat = np.load('../datasets/PeMS-data-set/pems.npy')
random_matrix = np.load('../datasets/PeMS-data-set/random_matrix.npy')

missing_rate = 0.4

### Nonrandom missing (NM) scenario:
binary_tensor = np.zeros((dense_mat.shape[0], 288, 44))
for i1 in range(dense_mat.shape[0]):
    for i2 in range(44):
        binary_tensor[i1,:,i2] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
binary_mat = ten2mat(binary_tensor, 0)
sparse_mat = np.multiply(dense_mat, binary_mat)

sparse_tensor = mat2ten(sparse_mat, np.array(binary_tensor.shape), 0)
dense_tensor = mat2ten(dense_mat, np.array(binary_tensor.shape), 0)

del dense_mat, random_matrix, binary_tensor

In [74]:
import time
start = time.time()
r = np.array([10, 10, 10])
R_max = np.array([80, 80, 80])
delta = np.array([1, 1, 1])
epsilon = 0.2
lambda_l = 1
rho = 1.01
mu0 = 0.00001
mu_max = 0.01
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, r, R_max, lambda_l, rho, mu0, mu_max, delta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iteration: 50, Time cost: 14s
MAPE: 0.0948288
RMSE: 6.4998
Current rank:
[12 12 12]

Iteration: 100, Time cost: 41s
MAPE: 0.103716
RMSE: 7.47094
Current rank:
[60 60 60]

Total iteration: 100
Imputation MAPE: 0.103716
Imputation RMSE: 7.47094
Current rank:
[60 60 60]
Running time: 55 seconds


In [75]:
dense_mat = np.load('../datasets/PeMS-data-set/pems.npy')
random_matrix = np.load('../datasets/PeMS-data-set/random_matrix.npy')

missing_rate = 0.6

### Nonrandom missing (NM) scenario:
binary_tensor = np.zeros((dense_mat.shape[0], 288, 44))
for i1 in range(dense_mat.shape[0]):
    for i2 in range(44):
        binary_tensor[i1,:,i2] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
binary_mat = ten2mat(binary_tensor, 0)
sparse_mat = np.multiply(dense_mat, binary_mat)

sparse_tensor = mat2ten(sparse_mat, np.array(binary_tensor.shape), 0)
dense_tensor = mat2ten(dense_mat, np.array(binary_tensor.shape), 0)

del dense_mat, random_matrix, binary_tensor

In [76]:
import time
start = time.time()
r = np.array([10, 10, 10])
R_max = np.array([80, 80, 80])
delta = np.array([1, 1, 1])
epsilon = 0.2
lambda_l = 1
rho = 1.01
mu0 = 0.00001
mu_max = 0.01
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, r, R_max, lambda_l, rho, mu0, mu_max, delta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iteration: 50, Time cost: 17s
MAPE: 0.1281
RMSE: 11.4652
Current rank:
[26 26 26]

Iteration: 100, Time cost: 66s
MAPE: 0.154614
RMSE: 12.654
Current rank:
[76 76 76]

Total iteration: 100
Imputation MAPE: 0.154614
Imputation RMSE: 12.654
Current rank:
[76 76 76]
Running time: 84 seconds


### Electricity data

- **Random Missing (RM)**:

In [77]:
dense_mat = np.load('../datasets/Electricity-data-set/electricity35.npy')
random_tensor = np.load('../datasets/Electricity-data-set/random_tensor.npy')

missing_rate = 0.2

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_mat = np.multiply(dense_mat, ten2mat(binary_tensor, 0))

sparse_tensor = mat2ten(sparse_mat, np.array(binary_tensor.shape), 0)
dense_tensor = mat2ten(dense_mat, np.array(binary_tensor.shape), 0)

del dense_mat, random_tensor, binary_tensor

In [78]:
import time
start = time.time()
r = np.array([10, 10, 10])
R_max = np.array([80, 80, 80])
delta = np.array([1, 1, 1])
epsilon = 0.2
lambda_l = 1
rho = 1.01
mu0 = 0.00001
mu_max = 0.01
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, r, R_max, lambda_l, rho, mu0, mu_max, delta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iteration: 50, Time cost: 4s
MAPE: 0.158298
RMSE: 2516.01
Current rank:
[51 51 51]

Iteration: 100, Time cost: 28s
MAPE: 0.150148
RMSE: 2514.23
Current rank:
[80 80 80]

Total iteration: 100
Imputation MAPE: 0.150148
Imputation RMSE: 2514.23
Current rank:
[80 80 80]
Running time: 32 seconds


In [79]:
dense_mat = np.load('../datasets/Electricity-data-set/electricity35.npy')
random_tensor = np.load('../datasets/Electricity-data-set/random_tensor.npy')

missing_rate = 0.4

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_mat = np.multiply(dense_mat, ten2mat(binary_tensor, 0))

sparse_tensor = mat2ten(sparse_mat, np.array(binary_tensor.shape), 0)
dense_tensor = mat2ten(dense_mat, np.array(binary_tensor.shape), 0)

del dense_mat, random_tensor, binary_tensor

In [80]:
import time
start = time.time()
r = np.array([10, 10, 10])
R_max = np.array([80, 80, 80])
delta = np.array([1, 1, 1])
epsilon = 0.2
lambda_l = 1
rho = 1.01
mu0 = 0.00001
mu_max = 0.01
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, r, R_max, lambda_l, rho, mu0, mu_max, delta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iteration: 50, Time cost: 1s
MAPE: 0.166842
RMSE: 4322.99
Current rank:
[27 27 27]

Iteration: 100, Time cost: 14s
MAPE: 0.165652
RMSE: 4323.51
Current rank:
[75 75 75]

Total iteration: 100
Imputation MAPE: 0.165652
Imputation RMSE: 4323.51
Current rank:
[75 75 75]
Running time: 16 seconds


In [81]:
dense_mat = np.load('../datasets/Electricity-data-set/electricity35.npy')
random_tensor = np.load('../datasets/Electricity-data-set/random_tensor.npy')

missing_rate = 0.6

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_mat = np.multiply(dense_mat, ten2mat(binary_tensor, 0))

sparse_tensor = mat2ten(sparse_mat, np.array(binary_tensor.shape), 0)
dense_tensor = mat2ten(dense_mat, np.array(binary_tensor.shape), 0)

del dense_mat, random_tensor, binary_tensor

In [82]:
import time
start = time.time()
r = np.array([10, 10, 10])
R_max = np.array([80, 80, 80])
delta = np.array([1, 1, 1])
epsilon = 0.2
lambda_l = 1
rho = 1.01
mu0 = 0.00001
mu_max = 0.01
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, r, R_max, lambda_l, rho, mu0, mu_max, delta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iteration: 50, Time cost: 2s
MAPE: 0.18421
RMSE: 5314.5
Current rank:
[38 38 38]

Iteration: 100, Time cost: 21s
MAPE: 0.191164
RMSE: 5314.52
Current rank:
[80 80 80]

Total iteration: 100
Imputation MAPE: 0.191164
Imputation RMSE: 5314.52
Current rank:
[80 80 80]
Running time: 24 seconds


- **Nonrandom Missing (NM)**:

In [83]:
dense_mat = np.load('../datasets/Electricity-data-set/electricity35.npy')
random_matrix = np.load('../datasets/Electricity-data-set/random_matrix.npy')

missing_rate = 0.2

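### Nonrandom missing (NM) scenario:
# The mask has size (sensors, 24 hours, 35 days); each draw in random_matrix
# keeps or hides an entire day of one sensor
binary_tensor = np.zeros((dense_mat.shape[0], 24, 35))
for i1 in range(dense_mat.shape[0]):
    for i2 in range(35):
        binary_tensor[i1,:,i2] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)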
binary_mat = ten2mat(binary_tensor, 0)
sparse_mat = np.multiply(dense_mat, binary_mat)

sparse_tensor = mat2ten(sparse_mat, np.array(binary_tensor.shape), 0)
dense_tensor = mat2ten(dense_mat, np.array(binary_tensor.shape), 0)

del dense_mat, random_matrix, binary_tensor
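In the nonrandom-missing scenario, each draw in `random_matrix` switches a whole 24-hour fiber on or off, so one day of a sensor is either fully observed or fully missing. The double loop above can be written as a single `np.repeat` along the hour axis. The sketch below uses a synthetic `random_matrix` with a toy sensor count and assumes uniform draws on [0, 1), as in the RM case.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensor, n_hour, n_day = 5, 24, 35              # toy sensor count; 24 x 35 as in the notebook
random_matrix = rng.random((n_sensor, n_day))    # synthetic stand-in for random_matrix.npy
missing_rate = 0.4

# Loop version, as in the notebook cell
loop_tensor = np.zeros((n_sensor, n_hour, n_day))
for i1 in range(n_sensor):
    for i2 in range(n_day):
        loop_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)

# Vectorized version: build the (sensor, day) mask once, then repeat it over 24 hours
day_mask = np.round(random_matrix + 0.5 - missing_rate)                 # (n_sensor, n_day)
vec_tensor = np.repeat(day_mask[:, np.newaxis, :], n_hour, axis=1)      # (n_sensor, 24, n_day)

print(np.array_equal(loop_tensor, vec_tensor))  # True
# Each (sensor, day) fiber is constant across the 24 hours
print(np.all(vec_tensor.min(axis=1) == vec_tensor.max(axis=1)))  # True
```

`np.repeat` returns a writable copy, so the result can be passed to `ten2mat` just like the loop-built mask.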

In [84]:
import time
start = time.time()
r = np.array([10, 10, 10])
R_max = np.array([80, 80, 80])
delta = np.array([1, 1, 1])
epsilon = 0.2
lambda_l = 1
rho = 1.01
mu0 = 0.00001
mu_max = 0.01
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, r, R_max, lambda_l, rho, mu0, mu_max, delta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iteration: 50, Time cost: 3s
MAPE: 0.215312
RMSE: 7632.65
Current rank:
[43 43 43]

Iteration: 100, Time cost: 24s
MAPE: 0.227758
RMSE: 7633.02
Current rank:
[80 80 80]

Total iteration: 100
Imputation MAPE: 0.227758
Imputation RMSE: 7633.02
Current rank:
[80 80 80]
Running time: 27 seconds


In [85]:
dense_mat = np.load('../datasets/Electricity-data-set/electricity35.npy')
random_matrix = np.load('../datasets/Electricity-data-set/random_matrix.npy')

missing_rate = 0.4

### Nonrandom missing (NM) scenario:
binary_tensor = np.zeros((dense_mat.shape[0], 24, 35))
for i1 in range(dense_mat.shape[0]):
    for i2 in range(35):
        binary_tensor[i1,:,i2] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
binary_mat = ten2mat(binary_tensor, 0)
sparse_mat = np.multiply(dense_mat, binary_mat)

sparse_tensor = mat2ten(sparse_mat, np.array(binary_tensor.shape), 0)
dense_tensor = mat2ten(dense_mat, np.array(binary_tensor.shape), 0)

del dense_mat, random_matrix, binary_tensor

In [86]:
import time
start = time.time()
r = np.array([10, 10, 10])
R_max = np.array([80, 80, 80])
delta = np.array([1, 1, 1])
epsilon = 0.2
lambda_l = 1
rho = 1.01
mu0 = 0.00001
mu_max = 0.01
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, r, R_max, lambda_l, rho, mu0, mu_max, delta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iteration: 50, Time cost: 3s
MAPE: 0.208571
RMSE: 8679.02
Current rank:
[47 47 47]

Iteration: 100, Time cost: 26s
MAPE: 0.229431
RMSE: 8679.35
Current rank:
[80 80 80]

Total iteration: 100
Imputation MAPE: 0.229431
Imputation RMSE: 8679.35
Current rank:
[80 80 80]
Running time: 30 seconds


In [87]:
dense_mat = np.load('../datasets/Electricity-data-set/electricity35.npy')
random_matrix = np.load('../datasets/Electricity-data-set/random_matrix.npy')

missing_rate = 0.6

### Nonrandom missing (NM) scenario:
binary_tensor = np.zeros((dense_mat.shape[0], 24, 35))
for i1 in range(dense_mat.shape[0]):
    for i2 in range(35):
        binary_tensor[i1,:,i2] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
binary_mat = ten2mat(binary_tensor, 0)
sparse_mat = np.multiply(dense_mat, binary_mat)

sparse_tensor = mat2ten(sparse_mat, np.array(binary_tensor.shape), 0)
dense_tensor = mat2ten(dense_mat, np.array(binary_tensor.shape), 0)

del dense_mat, random_matrix, binary_tensor

In [88]:
import time
start = time.time()
r = np.array([10, 10, 10])
R_max = np.array([80, 80, 80])
delta = np.array([1, 1, 1])
epsilon = 0.2
lambda_l = 1
rho = 1.01
mu0 = 0.00001
mu_max = 0.01
maxiter = 100
tensor_hat = imputer(dense_tensor, sparse_tensor, r, R_max, lambda_l, rho, mu0, mu_max, delta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iteration: 50, Time cost: 3s
MAPE: 0.318191
RMSE: 9480.63
Current rank:
[33 33 33]

Iteration: 100, Time cost: 17s
MAPE: 0.3292
RMSE: 9480.51
Current rank:
[80 80 80]

Total iteration: 100
Imputation MAPE: 0.3292
Imputation RMSE: 9480.51
Current rank:
[80 80 80]
Running time: 20 seconds


### License

<div class="alert alert-block alert-danger">
<b>This work is released under the MIT license.</b>
</div>