# Bayesian Augmented Tensor Factorization

**Published**: September 30, 2020

**Author**: Yixian Chen [[**GitHub homepage**](https://github.com/yxnchen)], Xinyu Chen [[**GitHub homepage**](https://github.com/xinychen)]

**Download**: This Jupyter notebook is at our GitHub repository. If you want to evaluate the code, please download the notebook from the [**transdim**](https://github.com/xinychen/transdim/blob/master/imputer/BATF.ipynb) repository.

This notebook shows how to implement the Bayesian Augmented Tensor Factorization (BATF) model on some real-world data sets. In the following, we will discuss:

- What the BATF is.

- How to implement BATF mainly using Python `numpy` with high efficiency.

- How to make imputation on some real-world spatiotemporal datasets.

To overcome the problem of missing values within multivariate time series data, this model takes into account low-rank tensor structure by folding data along day dimension. For an in-depth discussion of BATF, please see [1].

<div class="alert alert-block alert-info">
<font color="black">
<b>[1]</b> Xinyu Chen, Zhaocheng He, Yixian Chen, Yuhuan Lu, Jiawei Wang (2019). <b>Missing traffic data imputation and pattern discovery with a Bayesian augmented tensor factorization model</b>. Transportation Research Part C: Emerging Technologies, 104: 66-77. <a href="https://doi.org/10.1016/j.trc.2019.03.003" title="PDF"><b>[PDF]</b></a> 
</font>
</div>

We start by importing the necessary dependencies. We will make use of `numpy` and `scipy`.

In [61]:
import numpy as np
from scipy.linalg import khatri_rao as kr_prod
from numpy.linalg import inv
from numpy import cov

### CP decomposition

#### CP Combination (`cp_combine`)

- **Definition**:

The CP decomposition factorizes a tensor into a sum of outer products of vectors. For example, for a third-order tensor $\mathcal{Y}\in\mathbb{R}^{m\times n\times f}$, the CP decomposition can be written as

$$\hat{\mathcal{Y}}=\sum_{s=1}^{r}\boldsymbol{u}_{s}\circ\boldsymbol{v}_{s}\circ\boldsymbol{x}_{s},$$
or element-wise,

$$\hat{y}_{ijt}=\sum_{s=1}^{r}u_{is}v_{js}x_{ts},\forall (i,j,t),$$
where vectors $\boldsymbol{u}_{s}\in\mathbb{R}^{m},\boldsymbol{v}_{s}\in\mathbb{R}^{n},\boldsymbol{x}_{s}\in\mathbb{R}^{f}$ are columns of factor matrices $U\in\mathbb{R}^{m\times r},V\in\mathbb{R}^{n\times r},X\in\mathbb{R}^{f\times r}$, respectively. The symbol $\circ$ denotes vector outer product.

- **Example**:

Given matrices $U=\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \\ \end{array} \right]\in\mathbb{R}^{2\times 2}$, $V=\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ \end{array} \right]\in\mathbb{R}^{3\times 2}$ and $X=\left[ \begin{array}{cc} 1 & 5 \\ 2 & 6 \\ 3 & 7 \\ 4 & 8 \\ \end{array} \right]\in\mathbb{R}^{4\times 2}$, then if $\hat{\mathcal{Y}}=\sum_{s=1}^{r}\boldsymbol{u}_{s}\circ\boldsymbol{v}_{s}\circ\boldsymbol{x}_{s}$, then, we have

$$\hat{Y}_1=\hat{\mathcal{Y}}(:,:,1)=\left[ \begin{array}{ccc} 31 & 42 & 65 \\ 63 & 86 & 135 \\ \end{array} \right],$$
$$\hat{Y}_2=\hat{\mathcal{Y}}(:,:,2)=\left[ \begin{array}{ccc} 38 & 52 & 82 \\ 78 & 108 & 174 \\ \end{array} \right],$$
$$\hat{Y}_3=\hat{\mathcal{Y}}(:,:,3)=\left[ \begin{array}{ccc} 45 & 62 & 99 \\ 93 & 130 & 213 \\ \end{array} \right],$$
$$\hat{Y}_4=\hat{\mathcal{Y}}(:,:,4)=\left[ \begin{array}{ccc} 52 & 72 & 116 \\ 108 & 152 & 252 \\ \end{array} \right].$$

In [62]:
def cp_combine(var):
    return np.einsum('is, js, ts -> ijt', var[0], var[1], var[2])

In [63]:
factor = [np.array([[1, 2], [3, 4]]), np.array([[1, 3], [2, 4], [5, 6]]), 
          np.array([[1, 5], [2, 6], [3, 7], [4, 8]])]
print(cp_combine(factor))
print()
print('tensor size:')
print(cp_combine(factor).shape)

[[[ 31  38  45  52]
  [ 42  52  62  72]
  [ 65  82  99 116]]

 [[ 63  78  93 108]
  [ 86 108 130 152]
  [135 174 213 252]]]

tensor size:
(2, 3, 4)


### Vector combination (`vec_combine`)

In [64]:
## 1st solution
def vec_combine(vector):
    tensor = 0
    d = len(vector)
    for i in range(d):
        ax = [len(vector[i]) if j == i else 1 for j in range(d)]
        tensor = tensor + vector[i].reshape(ax, order = 'F')
    return tensor

## 2nd solution
def vec_combine(vector):
    return (vector[0][:, np.newaxis, np.newaxis] + vector[1][np.newaxis, :, np.newaxis]
            + vector[2][np.newaxis, np.newaxis, :])

In [65]:
vector = []
for i in range(3):
    vector.append(np.array([i + 1 for i in range(i + 2)]))
print(vector)
print(vec_combine(vector))
print()
print(vector[0][1] + vector[1][1] + vector[2][2])
print(vec_combine(vector)[1, 1, 2])

[array([1, 2]), array([1, 2, 3]), array([1, 2, 3, 4])]
[[[3 4 5 6]
  [4 5 6 7]
  [5 6 7 8]]

 [[4 5 6 7]
  [5 6 7 8]
  [6 7 8 9]]]

7
7


### Tensor Unfolding (`ten2mat`)

Using numpy reshape to perform 3rd rank tensor unfold operation. [[**link**](https://stackoverflow.com/questions/49970141/using-numpy-reshape-to-perform-3rd-rank-tensor-unfold-operation)]

In [66]:
def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')

In [67]:
X = np.array([[[1, 2, 3, 4], [3, 4, 5, 6]], 
              [[5, 6, 7, 8], [7, 8, 9, 10]], 
              [[9, 10, 11, 12], [11, 12, 13, 14]]])
print('tensor size:')
print(X.shape)
print('original tensor:')
print(X)
print()
print('(1) mode-1 tensor unfolding:')
print(ten2mat(X, 0))
print()
print('(2) mode-2 tensor unfolding:')
print(ten2mat(X, 1))
print()
print('(3) mode-3 tensor unfolding:')
print(ten2mat(X, 2))

tensor size:
(3, 2, 4)
original tensor:
[[[ 1  2  3  4]
  [ 3  4  5  6]]

 [[ 5  6  7  8]
  [ 7  8  9 10]]

 [[ 9 10 11 12]
  [11 12 13 14]]]

(1) mode-1 tensor unfolding:
[[ 1  3  2  4  3  5  4  6]
 [ 5  7  6  8  7  9  8 10]
 [ 9 11 10 12 11 13 12 14]]

(2) mode-2 tensor unfolding:
[[ 1  5  9  2  6 10  3  7 11  4  8 12]
 [ 3  7 11  4  8 12  5  9 13  6 10 14]]

(3) mode-3 tensor unfolding:
[[ 1  5  9  3  7 11]
 [ 2  6 10  4  8 12]
 [ 3  7 11  5  9 13]
 [ 4  8 12  6 10 14]]


### Define Performance Metrics

- **RMSE**
- **MAPE**

In [68]:
def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    return  np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])

### Define BATF with `Numpy`

In [80]:
def update_global_mu(sparse_tensor, bias_tensor, factor_tensor, pos_obs, tau_eps, tau0):
    tau_mu = tau_eps * len(pos_obs[0]) + tau0
    Ez = sparse_tensor[pos_obs] - bias_tensor[pos_obs] - factor_tensor[pos_obs]
    return 1 / tau_mu * tau_eps * np.sum(Ez)

In [85]:
def update_bias_vector(sparse_tensor, mu_glb, factor_tensor, vector, ind, tau_eps, tau0, dim):
    for k in range(len(dim)):
        temp = vector.copy()
        temp[k] = np.zeros((dim[k]))
        bias_tensor = vec_combine(temp)
        Ef = ind * (sparse_tensor - mu_glb - bias_tensor - factor_tensor)
        tau_bias = np.sum(ten2mat(tau_eps * ind, k), axis = 1) + tau0
        vector[k] = tau_eps / tau_bias * np.sum(ten2mat(Ef, k), axis = 1)
    return vec_combine(vector)

In [94]:
def update_factor(sparse_tensor, mu_glb, bias_tensor, factor, EUUT, ind, k, tau_eps, Sigma_U, beta0 = 1):
    dim, rank = factor[k].shape
    dim = factor[k].shape[0]
    factor_bar = np.mean(factor[k], axis = 0)
    temp = dim / (dim + beta0)
    var_mu_hyper = temp * factor_bar
    var_W_hyper = inv(np.eye(rank) + (dim - 1) * cov(factor[k], rowvar = False) 
                      + temp * beta0 * np.outer(factor_bar, factor_bar))
    var_Lambda_hyper = (dim + rank) * var_W_hyper
    
    idx = list(filter(lambda x: x != k, range(len(factor))))
    var1 = kr_prod(EUUT[idx[1]], EUUT[idx[0]]).T
    var2 = kr_prod(factor[idx[1]], factor[idx[0]]).T
    var3 = (var1 @ ten2mat(tau_eps * ind, k).T).reshape([rank, rank, dim]) + var_Lambda_hyper[:, :, np.newaxis]
    Ew = ten2mat(ind * (sparse_tensor - mu_glb - bias_tensor), k).T
    var4 = tau_eps * var2 @ Ew + (var_Lambda_hyper @ var_mu_hyper)[:, np.newaxis]
    for i in range(dim):
        Sigma_U[k][:, :, i] = inv(var3[:, :, i])
        factor[k][i, :] = inv(var3[:, :, i]) @ var4[:, i]
    EUUT[k] = (Sigma_U[k].reshape(rank * rank, dim, order = 'F') + kr_prod(factor[k].T, factor[k].T)).T
    
    return factor[k], EUUT[k]

In [92]:
def BATF_VB(dense_tensor, sparse_tensor, vector, factor, max_iter):
    """Bayesian Augmented Tensor Factorization (BATF) with variational Bayes."""

    dim = np.array(sparse_tensor.shape)
    rank = factor[0].shape[1]
    if np.isnan(sparse_tensor).any() == False:
        ind = sparse_tensor != 0
        pos_obs = np.where(ind)
        pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
    elif np.isnan(sparse_tensor).any() == True:
        pos_test = np.where((dense_tensor != 0) & (np.isnan(sparse_tensor)))
        ind = ~np.isnan(sparse_tensor)
        pos_obs = np.where(ind)
        sparse_tensor[np.isnan(sparse_tensor)] = 0
    num_obs = len(pos_obs[0])

    tau_eps = 1
    tau0 = 1
    alpha = 1e-6
    beta = 1e-6

    # init variables
    mu_glb = 0
    Sigma_U = []
    EUUT = []
    for k in range(len(dim)):
        Sigma_U.append(np.tile(np.eye(rank), (1,dim[k])).reshape(rank, rank, dim[k], order='F'))
        EUUT.append(Sigma_U[k].reshape(rank * rank, dim[k], order='F').T)
    bias_tensor = vec_combine(vector)
    factor_tensor = cp_combine(factor)

    for it in range(max_iter):
        mu_glb = update_global_mu(sparse_tensor, bias_tensor, factor_tensor, pos_obs, tau_eps, tau0)
        bias_tensor = update_bias_vector(sparse_tensor, mu_glb, factor_tensor, vector, ind, tau_eps, tau0, dim)
        for k in range(len(dim)):
            factor[k], EUUT[k] = update_factor(sparse_tensor, mu_glb, bias_tensor, factor, EUUT, 
                                               ind, k, tau_eps, Sigma_U)
        factor_tensor = cp_combine(factor)
        tensor_hat = mu_glb + bias_tensor + factor_tensor
        if (it + 1) % 5 == 0:
            print('Iter: {}'.format(it + 1))
            print('MAPE: {:.6}'.format(compute_mape(dense_tensor[pos_test], tensor_hat[pos_test])))
            print('RMSE: {:.6}'.format(compute_rmse(dense_tensor[pos_test], tensor_hat[pos_test])))
            print()

        # update precision tau_eps
        Emuglb2 = num_obs * mu_glb ** 2
        Ebias2 = bias_tensor[pos_obs] @ bias_tensor[pos_obs]
        Efactor2 = 0
        temp0 = []
        for j in range(rank):
            for k in range(len(dim)):
                temp0.append(EUUT[k][:, j * rank : (j + 1) * rank])
            Efactor2 = Efactor2 + np.sum(binary_tensor.flatten('F') @ khatri_rao(khatri_rao(temp0[2], temp0[1]), temp0[0]))
        Ecomb = (np.sum(mu_glb * bias_tensor[pos_obs]) + np.sum(mu_glb * factor_tensor[pos_obs])
                 + bias_tensor[pos_obs] @ factor_tensor[pos_obs])
        EYstar2 = Emuglb2 + Ebias2 + Efactor2 + 2 * Ecomb
        Eerr = sparse_tensor.flatten('F') @ sparse_tensor.flatten('F') - \
               2 * sparse_tensor.flatten('F') @ tensor_hat.flatten('F') + EYstar2
        tau_eps = (alpha + 0.5 * num_obs) / (beta + 0.5 * Eerr)

        # evaluate lower bound
        # ...

    print('Final results:')
    print('Imputation MAPE: {:.6}'.format(compute_mape(dense_tensor[pos_test], tensor_hat[pos_test])))
    print('Imputation RMSE: {:.6}'.format(compute_rmse(dense_tensor[pos_test], tensor_hat[pos_test])))
    print()

    return tensor_hat, mu_glb, vector, factor

## Data Organization

### 1) Matrix Structure

We consider a dataset of $m$ discrete time series $\boldsymbol{y}_{i}\in\mathbb{R}^{f},i\in\left\{1,2,...,m\right\}$. The time series may have missing elements. We express spatio-temporal dataset as a matrix $Y\in\mathbb{R}^{m\times f}$ with $m$ rows (e.g., locations) and $f$ columns (e.g., discrete time intervals),

$$Y=\left[ \begin{array}{cccc} y_{11} & y_{12} & \cdots & y_{1f} \\ y_{21} & y_{22} & \cdots & y_{2f} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{mf} \\ \end{array} \right]\in\mathbb{R}^{m\times f}.$$

### 2) Tensor Structure

We consider a dataset of $m$ discrete time series $\boldsymbol{y}_{i}\in\mathbb{R}^{nf},i\in\left\{1,2,...,m\right\}$. The time series may have missing elements. We partition each time series into intervals of predifined length $f$. We express each partitioned time series as a matrix $Y_{i}$ with $n$ rows (e.g., days) and $f$ columns (e.g., discrete time intervals per day),

$$Y_{i}=\left[ \begin{array}{cccc} y_{11} & y_{12} & \cdots & y_{1f} \\ y_{21} & y_{22} & \cdots & y_{2f} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nf} \\ \end{array} \right]\in\mathbb{R}^{n\times f},i=1,2,...,m,$$

therefore, the resulting structure is a tensor $\mathcal{Y}\in\mathbb{R}^{m\times n\times f}$.

In [75]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [93]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 10
vector = []
factor = []
for k in range(len(dim)):
    vector.append(5.0 * np.random.randn(dim[k],))
    factor.append(1.5 * np.random.randn(dim[k], rank))
max_iter = 100
BATF_VB(dense_tensor, sparse_tensor, vector, factor, max_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 5
MAPE: 0.10423
RMSE: 4.38207

Iter: 10
MAPE: 0.102899
RMSE: 4.34359

Iter: 15
MAPE: 0.102486
RMSE: 4.32888

Iter: 20
MAPE: 0.102267
RMSE: 4.31988

Iter: 25
MAPE: 0.102138
RMSE: 4.3141

Iter: 30
MAPE: 0.102033
RMSE: 4.31037

Iter: 35
MAPE: 0.101902
RMSE: 4.30752

Iter: 40
MAPE: 0.101776
RMSE: 4.3054

Iter: 45
MAPE: 0.101688
RMSE: 4.30406

Iter: 50
MAPE: 0.101645
RMSE: 4.30351

Iter: 55
MAPE: 0.101647
RMSE: 4.3038

Iter: 60
MAPE: 0.101685
RMSE: 4.3047

Iter: 65
MAPE: 0.101726
RMSE: 4.30553

Iter: 70
MAPE: 0.101751
RMSE: 4.30595

Iter: 75
MAPE: 0.101765
RMSE: 4.30603

Iter: 80
MAPE: 0.101774
RMSE: 4.30595

Iter: 85
MAPE: 0.101782
RMSE: 4.30582

Iter: 90
MAPE: 0.101791
RMSE: 4.30572

Iter: 95
MAPE: 0.101803
RMSE: 4.30569

Iter: 100
MAPE: 0.101818
RMSE: 4.30575

Final results:
Imputation MAPE: 0.101818
Imputation RMSE: 4.30575

Running time: 161 seconds


In [None]:
a = np.random.rand(3, 4, 5)
b = a.flatten()
c = a.flatten('F')
print(np.sum(np.abs(b - c)))

## Evaluation on London Movement Speed Data

In [None]:
import numpy as np
np.random.seed(1000)

mask_rate = 0.20

dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
pos_obs = np.where(dense_mat != 0)
num = len(pos_obs[0])
sample_ind = np.random.choice(num, size = int(mask_rate * num), replace = False)
sparse_mat = dense_mat.copy()
sparse_mat[pos_obs[0][sample_ind], pos_obs[1][sample_ind]] = 0

dense_tensor = dense_mat.reshape([dense_mat.shape[0], 30, 24])
sparse_tensor = sparse_mat.reshape([sparse_mat.shape[0], 30, 24])
del dense_mat, sparse_mat

**Question**: Given only the partially observed data $\mathcal{Y}\in\mathbb{R}^{m\times n\times f}$, how can we impute the unknown missing values?

The main influential factors for such imputation model are:

- `rank`.

- `burn_iter`.

- `gibbs_iter`.

In [None]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 10
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.rand(dim[k], rank))
burn_iter = 100
gibbs_iter = 50
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))

In [95]:
# guangzhou speed data
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.3

# =============================================================================
### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario:
# binary_tensor = np.zeros(dense_tensor.shape)
# for i1 in range(dense_tensor.shape[0]):
#     for i2 in range(dense_tensor.shape[1]):
#         binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [None]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 80
vector = []
factor = []
for k in range(len(dim)):
    vector.append(5.0 * np.random.randn(dim[k],))
    factor.append(1.5 * np.random.randn(dim[k], rank))
max_iter = 100
BATF_VB(dense_tensor, sparse_tensor, vector, factor, max_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 5
MAPE: 0.0878859
RMSE: 3.76203

Iter: 10
MAPE: 0.0861377
RMSE: 3.69592

Iter: 15
MAPE: 0.0854057
RMSE: 3.67054



### License

<div class="alert alert-block alert-danger">
<b>This work is released under the MIT license.</b>
</div>