Tensors are higher-order extensions of matrices that can encode multi-dimensional data.

![tensor_illustration](../img/tensor_cartoon.jpg)

In this tutorial we show how to manipulate tensors as MXNet NDArrays and write, from scratch, the basic tensor functions as they are defined in [TensorLy](http://tensorly.github.io).

In [1]:
import mxnet.ndarray as nd

# 1. Creating a Tensor

A tensor can be represented in multiple ways. The simplest is the slice representation through multiple matrices.

Let's take for this example the tensor $\tilde X$ defined by its frontal slices:

$$
   X_1 = 
   \left[
   \begin{matrix}
   0  & 2  & 4  & 6\\
   8  & 10 & 12 & 14\\
   16 & 18 & 20 & 22
   \end{matrix}
   \right]
$$

and 

$$
   X_2 =
   \left[
   \begin{matrix}
   1  & 3  & 5  & 7\\
   9  & 11 & 13 & 15\\
   17 & 19 & 21 & 23
   \end{matrix}
   \right]
$$




In Python, this tensor can be expressed as an MXNet NDArray:

In [2]:
X = nd.arange(24).reshape((3, 4, 2))

In [3]:
X


[[[  0.   1.]
  [  2.   3.]
  [  4.   5.]
  [  6.   7.]]

 [[  8.   9.]
  [ 10.  11.]
  [ 12.  13.]
  [ 14.  15.]]

 [[ 16.  17.]
  [ 18.  19.]
  [ 20.  21.]
  [ 22.  23.]]]
<NDArray 3x4x2 @cpu(0)>

You can view the frontal slices by fixing the last axis:

In [4]:
X[:, :, 0]


[[  0.   2.   4.   6.]
 [  8.  10.  12.  14.]
 [ 16.  18.  20.  22.]]
<NDArray 3x4 @cpu(0)>

In [5]:
X[:, :, 1]


[[  1.   3.   5.   7.]
 [  9.  11.  13.  15.]
 [ 17.  19.  21.  23.]]
<NDArray 3x4 @cpu(0)>
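The slice representation and the array above describe the same object. As a quick check, the sketch below rebuilds the tensor by stacking the two frontal slices along the last axis. It uses NumPy as a stand-in for MXNet NDArrays (an assumption, so that the snippet runs without MXNet), and the names `X1`, `X2` and `X_np` are only illustrative.

```python
import numpy as np

# Frontal slices X_1 and X_2 from the definition above
X1 = np.array([[0, 2, 4, 6],
               [8, 10, 12, 14],
               [16, 18, 20, 22]])
X2 = X1 + 1  # X_2 holds the odd entries

# Stack the slices along a new last axis to get a (3, 4, 2) tensor
X_np = np.stack([X1, X2], axis=-1)

# Same tensor as the one built with arange + reshape
assert X_np.shape == (3, 4, 2)
assert np.array_equal(X_np, np.arange(24).reshape((3, 4, 2)))
```

Stacking along the last axis is what makes `X_np[:, :, k]` return the k-th frontal slice.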

# 3. Basic Tensor Operations

## 3.1 Unfolding

Also called **matricization**, **unfolding** a tensor means reading its elements in a given order so as to obtain a matrix instead of a tensor.

It is done by stacking the **fibers** of the tensor into a matrix.

![tensor_illustration](../img/tensor_fibers.png)
Illustration: *Nonnegative Matrix and Tensor Factorizations*, Andrzej Cichocki, Rafal Zdunek, Anh Huy Phan, and Shun-ichi Amari, John Wiley & Sons, 2009.



### Definition
For a tensor of size $(I_1, I_2, \cdots, I_N)$, the n-mode unfolding of this tensor will be of size $(I_n, I_1 \times \cdots \times I_{n-1} \times I_{n+1} \times \cdots \times I_N)$ and is obtained by reading the tensor as a matrix with the $n$-th dimension first.


Specifically, given a tensor $\tilde X \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, the
mode-n unfolding of $\tilde X$ is a matrix $\mathbf{X}_{[n]} \in \mathbb{R}^{I_n \times M}$,
with $M = \prod\limits_{\substack{k=1,\\k \neq n}}^N I_k$, and is defined by
the mapping from element $(i_1, i_2, \cdots, i_N)$ to $(i_n, j)$, with


$$
    j = \sum\limits_{\substack{k=1,\\k \neq n}}^N i_k \times \prod\limits_{\substack{m=k+1,\\m \neq n}}^N I_m.
$$
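The index mapping can be checked numerically. The sketch below is a minimal illustration, assuming NumPy as a stand-in for MXNet NDArrays and zero-based indices; `unfold_np` and `column_index` are hypothetical helper names. It computes $j$ from the formula and compares it with an unfolding obtained by moving the chosen mode to the front and reshaping in C order.

```python
import itertools
import numpy as np

def unfold_np(tensor, mode):
    """Move `mode` to the front, then reshape in C order (NumPy stand-in)."""
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1))

def column_index(index, shape, mode):
    """Column j of element `index` in the mode-`mode` unfolding, per the formula."""
    j = 0
    for k, i_k in enumerate(index):
        if k == mode:
            continue
        # product of the sizes of the later modes, skipping `mode`
        stride = 1
        for m in range(k + 1, len(shape)):
            if m != mode:
                stride *= shape[m]
        j += i_k * stride
    return j

X_np = np.arange(24).reshape((3, 4, 2))
for mode in range(X_np.ndim):
    unfolded = unfold_np(X_np, mode)
    for index in itertools.product(*(range(s) for s in X_np.shape)):
        j = column_index(index, X_np.shape, mode)
        # element (i_1, ..., i_N) must land at position (i_n, j)
        assert unfolded[index[mode], j] == X_np[index]
```

The exhaustive loop is only practical for small tensors; it is meant as a sanity check of the formula, not as a way to compute unfoldings.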

### Convention

Traditionally, mode-1 unfolding denotes the unfolding along the first dimension.
However, to be consistent with Python indexing, which always starts at zero,
we start indexing modes at zero, as done in TensorLy.

Therefore `unfold(tensor, 0)` will unfold said tensor along its first dimension.
   

### Example

For instance, using the $\tilde X$ previously defined:
$$
   X_1 = 
   \left[
   \begin{matrix}
   0  & 2  & 4  & 6\\
   8  & 10 & 12 & 14\\
   16 & 18 & 20 & 22
   \end{matrix}
   \right]
$$

and 

$$
   X_2 =
   \left[
   \begin{matrix}
   1  & 3  & 5  & 7\\
   9  & 11 & 13 & 15\\
   17 & 19 & 21 & 23
   \end{matrix}
   \right]
$$

The 0-mode unfolding of $\tilde X$:

$$
   \tilde X_{[0]} =
   \left[ \begin{matrix}
      0 & 1 & 2 & 3 & 4 & 5 & 6 & 7\\
      8 & 9 & 10 & 11 & 12 & 13 & 14 & 15\\
      16 & 17 & 18 & 19 & 20 & 21 & 22 & 23\\
   \end{matrix} \right]
$$

The 1-mode unfolding is given by:

$$
   \tilde X_{[1]} =
   \left[ \begin{matrix}
      0 & 1 & 8 & 9 & 16 & 17\\
      2 & 3 & 10 & 11 & 18 & 19\\
      4 & 5 & 12 & 13 & 20 & 21\\
      6 & 7 & 14 & 15 & 22 & 23\\
   \end{matrix} \right]
$$

Finally, the 2-mode unfolding is the unfolding along the last axis:

$$
    \tilde X_{[2]} =
   \left[ \begin{matrix}
      0 & 2 & 4 & 6 & 8 & 10 & 12 & 14 & 16 & 18 & 20 & 22\\
      1 & 3 & 5 & 7 & 9 & 11 & 13 & 15 & 17 & 19 & 21 & 23\\
   \end{matrix} \right]
$$

### In MXNet

Let's define the unfolding function in MXNet. Given the mode $n$ along which to unfold, it will take a tensor, put the $n$-th dimension first, and matricize the result. Note that our definition of unfolding corresponds to a C-ordering of the elements. MXNet also has a C ordering of the elements, making that matricization a simple reshaping.

In [1]:
def unfold(tensor, mode):
    """Returns the mode-`mode` unfolding of `tensor` with modes starting at `0`.
    
    Parameters
    ----------
    tensor : ndarray
    mode : int
           indexing starts at 0, therefore mode is in ``range(0, tensor.ndim)``
    
    Returns
    -------
    ndarray
        unfolded_tensor of shape ``(tensor.shape[mode], -1)``
    """
    return nd.reshape(nd.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1))

In [7]:
unfold(X, mode=0)


[[  0.   1.   2.   3.   4.   5.   6.   7.]
 [  8.   9.  10.  11.  12.  13.  14.  15.]
 [ 16.  17.  18.  19.  20.  21.  22.  23.]]
<NDArray 3x8 @cpu(0)>

In [8]:
unfold(X, mode=1)


[[  0.   1.   8.   9.  16.  17.]
 [  2.   3.  10.  11.  18.  19.]
 [  4.   5.  12.  13.  20.  21.]
 [  6.   7.  14.  15.  22.  23.]]
<NDArray 4x6 @cpu(0)>

In [9]:
unfold(X, mode=2)


[[  0.   2.   4.   6.   8.  10.  12.  14.  16.  18.  20.  22.]
 [  1.   3.   5.   7.   9.  11.  13.  15.  17.  19.  21.  23.]]
<NDArray 2x12 @cpu(0)>

## 3.2 Folding

Folding is the inverse operation: we reshape the matrix into a tensor and move the first dimension back to its original place.

In [10]:
def fold(unfolded_tensor, mode, shape):
    """Refolds the mode-`mode` unfolding into a tensor of shape `shape`
    
    Parameters
    ----------
    unfolded_tensor : ndarray
        unfolded tensor of shape ``(shape[mode], -1)``
    mode : int
        the mode of the unfolding
    shape : tuple
        shape of the original tensor before unfolding
    
    Returns
    -------
    ndarray
        folded_tensor of shape `shape`
    """
    full_shape = list(shape)
    mode_dim = full_shape.pop(mode)
    full_shape.insert(0, mode_dim)
    return nd.moveaxis(nd.reshape(unfolded_tensor, full_shape), 0, mode)

In [11]:
unfolding = unfold(X, 1)
original_shape = X.shape
fold(unfolding, mode=1, shape=original_shape)


[[[  0.   1.]
  [  2.   3.]
  [  4.   5.]
  [  6.   7.]]

 [[  8.   9.]
  [ 10.  11.]
  [ 12.  13.]
  [ 14.  15.]]

 [[ 16.  17.]
  [ 18.  19.]
  [ 20.  21.]
  [ 22.  23.]]]
<NDArray 3x4x2 @cpu(0)>
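Since folding is meant to undo unfolding, a round trip along any mode should return the original tensor. The sketch below checks this for every mode. It mirrors the two functions above with NumPy (an assumption, so that it runs without MXNet); `unfold_np` and `fold_np` are illustrative names, not part of any library.

```python
import numpy as np

def unfold_np(tensor, mode):
    """Mode-`mode` unfolding: move `mode` first, then reshape in C order."""
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1))

def fold_np(unfolded, mode, shape):
    """Inverse of unfold_np for a tensor of original shape `shape`."""
    full_shape = list(shape)
    mode_dim = full_shape.pop(mode)
    full_shape.insert(0, mode_dim)
    return np.moveaxis(np.reshape(unfolded, full_shape), 0, mode)

X_np = np.arange(24).reshape((3, 4, 2))

# fold(unfold(X, n), n, X.shape) should give back X for every mode n
round_trips = [
    np.array_equal(fold_np(unfold_np(X_np, mode), mode, X_np.shape), X_np)
    for mode in range(X_np.ndim)
]
assert all(round_trips)
```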

## 3.3 n-mode product

Also known as **tensor contraction**. This is a natural generalization of matrix-vector and matrix-matrix product. When multiplying a tensor by a matrix or a vector, we now have to specify the **mode** $n$ along which to take the product.

### Tensor times matrix

In that case we are doing an operation analogous to a matrix multiplication on the $n$-th mode. Given a tensor $\tilde X$ of size $(I_1, I_2, \cdots, I_N)$, and a matrix $M$ of size $(D, I_n)$, the $n$-mode product of $\tilde X$ by $M$ is written $\tilde X \times_n M$ and is of size $(I_1, \cdots, I_{n-1}, D, I_{n+1}, \cdots, I_N)$.


One simple way to mathematically define the n-mode product is using the unfolding: if we write $\tilde R = \tilde X \times_n M$, then we have:

$$
    \tilde R_{[n]} = M \times \tilde X_{[n]}
$$

As a consequence, to get the n-mode product of $\tilde X$ by $M$, we can simply take a matrix product between $M$ and the unfolding of $\tilde X$ along the $n^{th}$ dimension, and refold the result into a tensor of shape $(I_1, \cdots, I_{n-1}, D, I_{n+1}, \cdots, I_N)$.
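The unfolding identity can be compared against the element-wise definition of the product. The sketch below does this for a mode-1 product, assuming NumPy as a stand-in for MXNet NDArrays; `unfold_np`, `fold_np` and `mode_dot_matrix_np` are hypothetical helper names introduced for the illustration.

```python
import numpy as np

def unfold_np(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1))

def fold_np(unfolded, mode, shape):
    full_shape = list(shape)
    mode_dim = full_shape.pop(mode)
    full_shape.insert(0, mode_dim)
    return np.moveaxis(np.reshape(unfolded, full_shape), 0, mode)

def mode_dot_matrix_np(tensor, matrix, mode):
    """n-mode product with a matrix: multiply the unfolding, then refold."""
    new_shape = list(tensor.shape)
    new_shape[mode] = matrix.shape[0]  # I_n is replaced by D
    return fold_np(matrix @ unfold_np(tensor, mode), mode, new_shape)

X_np = np.arange(24.0).reshape((3, 4, 2))
M_np = np.arange(20.0).reshape((5, 4))

R = mode_dot_matrix_np(X_np, M_np, mode=1)

# Element-wise definition: R[i, d, k] = sum_j M[d, j] * X[i, j, k]
R_direct = np.einsum('dj,ijk->idk', M_np, X_np)

assert R.shape == (3, 5, 2)
assert np.allclose(R, R_direct)
```

Note that the refolded shape must already carry $D$ in position $n$; refolding with the original shape would fail because the number of elements has changed.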

### Tensor times vector

In that case we are contracting over the $n$-th mode by multiplying it with a vector. Given a tensor $\tilde X$ of size $(I_1, I_2, \cdots, I_N)$, and a vector $v$ of size $(I_n)$, the $n$-mode product of $\tilde X$ by $v$ is written $\tilde X \times_n v$ and is of size $(I_1, \cdots, I_{n-1}, I_{n+1}, \cdots, I_N)$: we have essentially summed over (or contracted over) the $n$-th dimension.

![tensor_illustration](../img/tensor_contraction.png)
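As a small worked example of the contraction, the sketch below sums out mode 1 of a (3, 4, 2) tensor against a vector of size 4, once from the element-wise definition and once through the unfolding. NumPy is used as a stand-in for MXNet NDArrays (an assumption), and the variable names are illustrative.

```python
import numpy as np

X_np = np.arange(24.0).reshape((3, 4, 2))
v_np = np.arange(4.0)

# Element-wise definition: r[i, k] = sum_j v[j] * X[i, j, k]
r = np.einsum('j,ijk->ik', v_np, X_np)

# Same result via the unfolding: v @ X_[1] is a vector of length 3 * 2
# whose entries follow the C order of the remaining modes,
# so a plain reshape (no axis move) restores the tensor
unfolded = np.reshape(np.moveaxis(X_np, 1, 0), (4, -1))
r_unfold = (v_np @ unfolded).reshape((3, 2))

assert r.shape == (3, 2)
assert np.allclose(r, r_unfold)
```

Because the contracted mode disappears, the result only needs to be reshaped to the remaining dimensions rather than folded back along a mode.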


### Example

We will write a function `mode_dot` that works transparently for multiplying a tensor by a matrix or a vector, along a given mode.

In [12]:
def mode_dot(tensor, matrix_or_vector, mode):
    """n-mode product of a tensor by a matrix or a vector at the specified mode.

    Parameters
    ----------
    tensor : ndarray
        tensor of shape ``(i_1, ..., i_k, ..., i_N)``
    matrix_or_vector : ndarray
        1D or 2D array of shape ``(J, i_k)`` or ``(i_k, )``
        matrix or vectors to which to n-mode multiply the tensor
    mode : int

    Returns
    -------
    ndarray
        `mode`-mode product of `tensor` by `matrix_or_vector`
        * of shape :math:`(i_1, ..., i_{k-1}, J, i_{k+1}, ..., i_N)` if matrix_or_vector is a matrix
        * of shape :math:`(i_1, ..., i_{k-1}, i_{k+1}, ..., i_N)` if matrix_or_vector is a vector
    """
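    new_shape = list(tensor.shape)

    if matrix_or_vector.ndim == 2:
        # tensor times matrix case: the size of mode `mode` becomes J
        new_shape[mode] = matrix_or_vector.shape[0]
        is_vector = False
    elif matrix_or_vector.ndim == 1:
        # tensor times vector case: make sure the sizes are correct
        # (we are contracting over one dimension which then disappears)
        if len(new_shape) > 1:
            new_shape.pop(mode)
        else:
            new_shape = [1]
        is_vector = True
    else:
        raise ValueError('matrix_or_vector must be a 1D or 2D array')

    # This is the actual operation: we use the equivalent formulation
    # of the n-mode product using the unfolding
    res = nd.dot(matrix_or_vector, unfold(tensor, mode))

    if is_vector:
        # the result is a vector whose entries follow the C-ordering of the
        # remaining modes, so a plain reshape restores the tensor
        return nd.reshape(res, new_shape)
    # tensor times matrix: refold the result along the same mode
    return fold(res, mode, new_shape)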


#### Tensor times matrix

With the tensor $\tilde X$ of size (3, 4, 2) we defined previously, let's define a matrix M of size (5, 4) to multiply along the second mode:

In [13]:
M = nd.arange(4*5).reshape((5, 4))
print(M.shape)

(5, 4)


Keep in mind indexing starts at zero, so the second mode is represented by `mode=1`:

In [14]:
res = mode_dot(X, M, mode=1)

As expected, the result is of shape (3, 5, 2):

In [15]:
res.shape

(3, 5, 2)

#### Tensor times vector

Similarly, we can contract along mode 1 with a vector of size 4 (our tensor is of size (3, 4, 2)).


In [16]:
v = nd.arange(4)
print(v.shape)

(4,)


In [17]:
res = mode_dot(X, v, mode=1)

Since we have multiplied by a vector, we have effectively contracted out one mode of the tensor so the result is a matrix:

In [18]:
res.shape

(3, 2)