# Chapter 1: Tensor Factorization with Alternating Least Squares (ALS)

---
**About this chapter**: In many real-world applications, data are naturally multi-dimensional tensors rather than two-dimensional matrices. In this chapter, we first give a brief overview of the tensor factorization family. Then, we provide a tensor factorization implementation based on iterative Alternating Least Squares (ALS), which is a good starting point for understanding tensor factorization. Finally, we adapt two public real-world datasets (an urban traffic speed dataset from Guangzhou, China and a metro station passenger flow dataset from Hangzhou, China) to third-order tensors and evaluate tensor factorization techniques on the missing data imputation task.

---

## 1.1 Tensor Factorization Family

**1) Tucker Factorization**

The idea of tensor decomposition/factorization is to find a low-rank structure that approximates the original data. Tucker factorization decomposes a tensor into a set of factor matrices and one small core tensor [[**wiki**](https://en.wikipedia.org/wiki/Tucker_decomposition)]. Formally, given a third-order tensor $\mathcal{Y}\in\mathbb{R}^{M\times N\times T}$, the Tucker form of a tensor (also known as Tucker decomposition/factorization) with rank $\left(R_1,R_2,R_3\right)$ is defined as

$$\mathcal{Y}\approx\mathcal{G}\times_1 U\times_2 V\times_3 X,$$
where $\mathcal{G}\in\mathbb{R}^{R_1\times R_2\times R_3}$ is the core tensor, $U\in\mathbb{R}^{M\times R_1},V\in\mathbb{R}^{N\times R_2},X\in\mathbb{R}^{T\times R_3}$ are the factor matrices, and $\times_n$ denotes the mode-$n$ product of a tensor with a matrix.
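
For reference, the mode-$n$ product can be written out element-wise. Under the usual convention, and taking the first mode as an example (the other two modes are analogous), multiplying the core tensor by $U$ replaces the index $r_1$ with the index $i$:

$$\left(\mathcal{G}\times_1 U\right)_{ir_2r_3}=\sum_{r_1=1}^{R_1}g_{r_1r_2r_3}u_{ir_1},$$
so that $\mathcal{G}\times_1 U\in\mathbb{R}^{M\times R_2\times R_3}$. Applying the same operation along modes 2 and 3 with $V$ and $X$ yields a tensor of size $M\times N\times T$.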

Element-wise, for the $(i,j,t)$-th entry of tensor $\mathcal{Y}$, the Tucker factorization can be rewritten as

$$y_{ijt}\approx\sum_{r_1=1}^{R_1}\sum_{r_2=1}^{R_2}\sum_{r_3=1}^{R_3}g_{r_1r_2r_3}u_{ir_1}v_{jr_2}x_{tr_3},$$
where $i\in\left\{1,2,...,M\right\}$, $j\in\left\{1,2,...,N\right\}$, and $t\in\left\{1,2,...,T\right\}$.

In [1]:
import numpy as np

def tucker_combine(core_tensor, mat1, mat2, mat3):
    # Contract each mode of the core tensor with its factor matrix.
    return np.einsum('abc, ia, jb, tc -> ijt', core_tensor, mat1, mat2, mat3)

In [7]:
import numpy as np
dim1 = 2
dim2 = 2
dim3 = 3
r1 = 2
r2 = 2
r3 = 2
core_tensor = np.random.rand(r1, r2, r3)
mat1 = np.random.rand(dim1, r1)
mat2 = np.random.rand(dim2, r2)
mat3 = np.random.rand(dim3, r3)
tensor = tucker_combine(core_tensor, mat1, mat2, mat3)
print(tensor)

[[[0.62385494 0.61868749 0.51256948]
  [0.32577527 0.32350121 0.26792985]]

 [[0.40753966 0.41597023 0.34228418]
  [0.21298488 0.21772561 0.17909269]]]
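
As a sanity check on the element-wise formula, the sketch below evaluates the triple sum with explicit loops and compares the result with the `einsum` contraction. The helper name `tucker_combine_loops`, the variable names, and the fixed random seed are illustrative assumptions and are not part of the original notebook.

```python
import numpy as np

def tucker_combine_loops(core_tensor, mat1, mat2, mat3):
    """Evaluate the Tucker element-wise formula with explicit loops.

    Hypothetical helper for illustration only; far slower than einsum.
    """
    r1, r2, r3 = core_tensor.shape
    dim1, dim2, dim3 = mat1.shape[0], mat2.shape[0], mat3.shape[0]
    tensor = np.zeros((dim1, dim2, dim3))
    for i in range(dim1):
        for j in range(dim2):
            for t in range(dim3):
                # Triple sum over the core tensor indices (r_1, r_2, r_3).
                for a in range(r1):
                    for b in range(r2):
                        for c in range(r3):
                            tensor[i, j, t] += (core_tensor[a, b, c]
                                                * mat1[i, a] * mat2[j, b] * mat3[t, c])
    return tensor

rng = np.random.default_rng(0)  # assumed seed, only for reproducibility
core = rng.random((2, 2, 2))
A = rng.random((2, 2))
B = rng.random((2, 2))
C = rng.random((3, 2))

via_einsum = np.einsum('abc,ia,jb,tc->ijt', core, A, B, C)
via_loops = tucker_combine_loops(core, A, B, C)
print(np.allclose(via_einsum, via_loops))
```

Both routes compute the same sum, so they should agree up to floating-point rounding. The loop version is only meant to make the indices in the formula explicit.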


**2) CP Factorization**

Another commonly used type of tensor factorization is CANDECOMP/PARAFAC (CP) factorization. This form assumes that a data tensor is approximated by a sum of outer products of a few factor vectors. Specifically, given a third-order tensor $\mathcal{Y}\in\mathbb{R}^{M\times N\times T}$, CP factorization with rank $R$ is

$$\mathcal{Y}\approx\sum_{r=1}^{R}\boldsymbol{u}_{r}\circ\boldsymbol{v}_{r}\circ\boldsymbol{x}_{r},$$
where the vector $\boldsymbol{u}_{r}\in\mathbb{R}^{M}$ is the $r$-th column of factor matrix $U\in\mathbb{R}^{M\times R}$, and the vectors $\boldsymbol{v}_{r}\in\mathbb{R}^{N}$ and $\boldsymbol{x}_{r}\in\mathbb{R}^{T}$ are likewise the $r$-th columns of factor matrices $V\in\mathbb{R}^{N\times R}$ and $X\in\mathbb{R}^{T\times R}$, respectively. Since the outer product of three vectors is a rank-one tensor, CP factorization approximates the original data by a sum of $R$ rank-one tensors.

Element-wise, for any $(i,j,t)$-th entry in tensor $\mathcal{Y}$, we have

$$y_{ijt}\approx\sum_{r=1}^{R}u_{ir}v_{jr}x_{tr},$$
where $i\in\left\{1,2,...,M\right\}$, $j\in\left\{1,2,...,N\right\}$, and $t\in\left\{1,2,...,T\right\}$. In the CP form, the symbol $\circ$ denotes the vector outer product.

- **Example of CP combination**:

Given matrices $U=\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \\ \end{array} \right]\in\mathbb{R}^{2\times 2}$, $V=\left[ \begin{array}{cc} 1 & 3 \\ 2 & 4 \\ 5 & 6 \\ \end{array} \right]\in\mathbb{R}^{3\times 2}$ and $X=\left[ \begin{array}{cc} 1 & 5 \\ 2 & 6 \\ 3 & 7 \\ 4 & 8 \\ \end{array} \right]\in\mathbb{R}^{4\times 2}$, if $\hat{\mathcal{Y}}=\sum_{r=1}^{R}\boldsymbol{u}_{r}\circ\boldsymbol{v}_{r}\circ\boldsymbol{x}_{r}$ with $R=2$, then we have

$$\hat{Y}_1=\hat{\mathcal{Y}}(:,:,1)=\left[ \begin{array}{ccc} 31 & 42 & 65 \\ 63 & 86 & 135 \\ \end{array} \right],$$
$$\hat{Y}_2=\hat{\mathcal{Y}}(:,:,2)=\left[ \begin{array}{ccc} 38 & 52 & 82 \\ 78 & 108 & 174 \\ \end{array} \right],$$
$$\hat{Y}_3=\hat{\mathcal{Y}}(:,:,3)=\left[ \begin{array}{ccc} 45 & 62 & 99 \\ 93 & 130 & 213 \\ \end{array} \right],$$
$$\hat{Y}_4=\hat{\mathcal{Y}}(:,:,4)=\left[ \begin{array}{ccc} 52 & 72 & 116 \\ 108 & 152 & 252 \\ \end{array} \right].$$

In [9]:
import numpy as np

def cp_combine(mat1, mat2, mat3):
    # Sum over the shared rank index r of the three factor matrices.
    return np.einsum('ir, jr, tr -> ijt', mat1, mat2, mat3)

In [12]:
U = np.array([[1, 2], [3, 4]])
V = np.array([[1, 3], [2, 4], [5, 6]])
X = np.array([[1, 5], [2, 6], [3, 7], [4, 8]])
print(cp_combine(U, V, X))
print()
print('tensor size:')
print(cp_combine(U, V, X).shape)

[[[ 31  38  45  52]
  [ 42  52  62  72]
  [ 65  82  99 116]]

 [[ 63  78  93 108]
  [ 86 108 130 152]
  [135 174 213 252]]]

tensor size:
(2, 3, 4)
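
To connect the `einsum` call back to the definition, the sketch below builds the same tensor as an explicit sum of $R$ rank-one tensors, one outer product per column. The helper name `cp_combine_outer` and the variable `Y_hat` are illustrative assumptions and are not part of the original notebook.

```python
import numpy as np

def cp_combine_outer(mat1, mat2, mat3):
    """Sum R rank-one tensors u_r ∘ v_r ∘ x_r (hypothetical helper)."""
    rank = mat1.shape[1]
    tensor = np.zeros((mat1.shape[0], mat2.shape[0], mat3.shape[0]))
    for r in range(rank):
        # Outer product of the r-th columns: shape (M,) x (N,) x (T,) -> (M, N, T).
        u_v = np.multiply.outer(mat1[:, r], mat2[:, r])
        tensor += np.multiply.outer(u_v, mat3[:, r])
    return tensor

U = np.array([[1, 2], [3, 4]])
V = np.array([[1, 3], [2, 4], [5, 6]])
X = np.array([[1, 5], [2, 6], [3, 7], [4, 8]])

Y_hat = cp_combine_outer(U, V, X)
print(Y_hat[:, :, 0])  # first frontal slice, corresponding to Y_hat(:, :, 1) in the text
print(np.allclose(Y_hat, np.einsum('ir,jr,tr->ijt', U, V, X)))
```

Note that NumPy indexes from zero, so the slice `Y_hat[:, :, 0]` corresponds to $\hat{Y}_1$ in the worked example, and the printed output of `cp_combine` above lists the tensor along its first axis rather than by frontal slices.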


## 1.2 Optimization Problem

