```python
from tensor_visualizations import *
```

# Tensor Derivatives in Deep Learning

In this notebook, we discuss the key derivatives needed to understand the computations in the backpropagation (BP) step of the learning process.  
The goal is practical understanding rather than a complete treatment, so some aspects of the topic are not covered in depth.

**Table of Contents**
1. [Notation](#Notation)
2. [Scalar Derivatives](#Scalar-Derivatives)
    - [Scalar w.r.t. Vector](#Scalar-w.r.t.-Vector)
    - [Scalar w.r.t. Matrix](#Scalar-w.r.t.-Matrix)
    - [Scalar w.r.t. 3D Tensor](#Scalar-w.r.t.-3D-Tensor)
3. [Vector Derivatives](#Vector-Derivatives)
    - [Vector w.r.t. Scalar](#Vector-w.r.t.-Scalar)
    - [Vector w.r.t. Vector (Jacobian)](#Vector-w.r.t.-Vector-(Jacobian))
    - [Vector w.r.t. Matrix](#Vector-w.r.t.-Matrix)
    - [Vector w.r.t. 3D Tensor](#Vector-w.r.t.-Tensor)
4. [Matrix Derivatives](#Matrix-Derivatives)
    - [Matrix w.r.t. Scalar](#Matrix-w.r.t.-Scalar)
    - [Matrix w.r.t. Vector](#Matrix-w.r.t.-Vector)
    - [Matrix w.r.t. Matrix](#Matrix-w.r.t.-Matrix)
    - [Matrix w.r.t. Tensor](#Matrix-w.r.t.-Tensor)
5. [Summary Table](#Summary-Table)


## Notation

- Scalars: $s, x, y$
- Vectors: $\mathbf{a} \in \mathbb{R}^m$, $\mathbf{v} \in \mathbb{R}^n$
- Matrices: $\mathbf{X} \in \mathbb{R}^{m \times n}$
- Tensors: $\boldsymbol{\mathcal{T}} \in \mathbb{R}^{m \times n \times p}$
- Indices: $i, j, k, l$ (e.g., $a_i$, $X_{ij}$, $\mathcal{T}_{ijk}$)
- Dimensions: $m, n, p, q, r$
- Partial derivatives: $\frac{\partial}{\partial x}$

**Example:**
- $\mathbf{a} = [a_1, a_2, \ldots, a_m]^T$
- $\mathbf{X} = [X_{ij}]_{i=1,\ldots,m;\;j=1,\ldots,n}$
- $\boldsymbol{\mathcal{T}} = [\mathcal{T}_{ijk}]_{i=1,\ldots,m;\;j=1,\ldots,n;\;k=1,\ldots,p}$

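The code sketches in this notebook assume NumPy is available. As a quick orientation, here is how each object in the notation maps to a NumPy array shape. Note that the text indexes from 1 while NumPy indexes from 0; the concrete values below are arbitrary illustrations.

```python
import numpy as np

s = np.float64(2.0)                    # scalar: shape ()
a = np.arange(1.0, 4.0)               # vector a in R^3: shape (3,)
X = np.arange(6.0).reshape(2, 3)      # matrix X in R^{2x3}: shape (2, 3)
T = np.arange(24.0).reshape(2, 3, 4)  # tensor T in R^{2x3x4}: shape (2, 3, 4)

# 1-based text indices become 0-based NumPy indices: X_{12} -> X[0, 1], T_{231} -> T[1, 2, 0]
print(np.shape(s), a.shape, X.shape, T.shape)   # () (3,) (2, 3) (2, 3, 4)
print(X[0, 1], T[1, 2, 0])                       # 1.0 20.0
```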
---
---

# Scalar Derivatives

### Scalar w.r.t. Vector

$$\boxed{\frac{\partial s}{\partial \mathbf{v}} = \mathbf{a}}$$

**Definition:**

Suppose $\mathbf{a}, \mathbf{v} \in \mathbb{R}^n$ and consider the scalar $s = \mathbf{v}^T \mathbf{a} = \sum_{i=1}^n v_i a_i$.

The gradient of $s$ with respect to the vector $\mathbf{v}$ is:

$$
\frac{\partial s}{\partial \mathbf{v}} =
\begin{bmatrix}
    \frac{\partial s}{\partial v_1} \\
    \frac{\partial s}{\partial v_2} \\
    \vdots \\
    \frac{\partial s}{\partial v_n}
\end{bmatrix}
=
\begin{bmatrix}
    a_1 \\
    a_2 \\
    \vdots \\
    a_n
\end{bmatrix}
= \mathbf{a}
$$

<div style="background-color:#e6f7ff; color:#000; border-left:5px solid #1890ff; padding:10px;">
<b>Key Result:</b> The gradient of a scalar with respect to a vector is a vector of the same length.
</div>

**Output Dimension:**

If $\mathbf{v} \in \mathbb{R}^n$ then $\frac{\partial s}{\partial \mathbf{v}} \in \mathbb{R}^n$.

**Iterative Process:**

For $i = 1$ to $n$:<br>
&nbsp;&nbsp;Compute $\frac{\partial s}{\partial v_i}$

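The loop above can be sketched with central finite differences, assuming NumPy. Each pass nudges one entry $v_i$ and measures how $s = \mathbf{v}^T\mathbf{a}$ changes; the result should match $\mathbf{a}$. The helper `grad_scalar_wrt_vector` and the sample values are hypothetical.

```python
import numpy as np

def grad_scalar_wrt_vector(f, v, eps=1e-6):
    """Central-difference gradient of a scalar function f at vector v (hypothetical helper)."""
    grad = np.zeros_like(v, dtype=float)
    for i in range(v.shape[0]):                  # "For i = 1 to n", 0-based here
        step = np.zeros_like(v, dtype=float)
        step[i] = eps
        grad[i] = (f(v + step) - f(v - step)) / (2 * eps)  # approximates ds/dv_i
    return grad

a = np.array([2.0, -1.0, 0.5])
v = np.array([0.3, 1.2, -0.7])
s = lambda v: v @ a                              # s = v^T a

g = grad_scalar_wrt_vector(s, v)
print(g.shape)                                   # (3,): same length as v
print(np.allclose(g, a))                         # True: ds/dv = a
```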
---
---

```python
visualize_scalar_wrt_vector_simple(n=3)
```


### Scalar w.r.t. Matrix

$$\boxed{\frac{\partial s}{\partial \mathbf{A}}}$$

**Definition:**

Let $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $s = f(\mathbf{A})$ be a scalar function of $\mathbf{A}$.

The gradient of $s$ with respect to $\mathbf{A}$ is the $m \times n$ matrix:

$$
\frac{\partial s}{\partial \mathbf{A}} =
\begin{bmatrix}
    \frac{\partial s}{\partial A_{11}} & \frac{\partial s}{\partial A_{12}} & \cdots & \frac{\partial s}{\partial A_{1n}} \\
    \frac{\partial s}{\partial A_{21}} & \frac{\partial s}{\partial A_{22}} & \cdots & \frac{\partial s}{\partial A_{2n}} \\
    \vdots & \vdots & \ddots & \vdots \\
    \frac{\partial s}{\partial A_{m1}} & \frac{\partial s}{\partial A_{m2}} & \cdots & \frac{\partial s}{\partial A_{mn}}
\end{bmatrix}
$$

<div style="background-color:#e6f7ff; color:#000; border-left:5px solid #1890ff; padding:10px;">
<b>Key Result:</b> The gradient of a scalar with respect to a matrix is a matrix of the same shape.
</div>

**Output Dimension:**

If $\mathbf{A} \in \mathbb{R}^{m \times n}$ then $\frac{\partial s}{\partial \mathbf{A}} \in \mathbb{R}^{m \times n}$.

**Iterative Process:**

For $i = 1$ to $m$:<br>
&nbsp;&nbsp;For $j = 1$ to $n$:<br>
&nbsp;&nbsp;&nbsp;&nbsp;Compute $\frac{\partial s}{\partial A_{ij}}$

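A minimal sketch of the double loop, assuming NumPy and using the illustrative function $s = \sum_{ij} A_{ij}^2$ (not from the text), whose gradient is known to be $2\mathbf{A}$. The helper name is hypothetical.

```python
import numpy as np

def grad_scalar_wrt_matrix(f, A, eps=1e-6):
    """Central-difference gradient of scalar f with respect to matrix A (hypothetical helper)."""
    m, n = A.shape
    grad = np.zeros((m, n))
    for i in range(m):                    # For i = 1 to m
        for j in range(n):                # For j = 1 to n
            step = np.zeros((m, n))
            step[i, j] = eps
            grad[i, j] = (f(A + step) - f(A - step)) / (2 * eps)
    return grad

A = np.array([[1.0, -2.0, 0.5],
              [3.0,  0.0, -1.5]])
s = lambda A: np.sum(A ** 2)              # squared Frobenius norm

G = grad_scalar_wrt_matrix(s, A)
print(G.shape)                            # (2, 3): same shape as A
print(np.allclose(G, 2 * A))              # True: d(sum of A_ij^2)/dA_ij = 2 A_ij
```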
---
---

```python
visualize_scalar_wrt_matrix_simple(m=2, n=3)
```


### Scalar w.r.t. 3D Tensor

$$\boxed{\frac{\partial s}{\partial \mathcal{T}}}$$

**Definition:**

Suppose $\mathcal{T} \in \mathbb{R}^{m \times n \times p}$ and $s$ is a scalar-valued function of $\mathcal{T}$, that is, $s = f(\mathcal{T})$ where $f : \mathbb{R}^{m \times n \times p} \to \mathbb{R}$.

The gradient of $s$ with respect to $\mathcal{T}$ is the $m \times n \times p$ tensor:

$$
\frac{\partial s}{\partial \mathcal{T}} = \left[\, \frac{\partial s}{\partial \mathcal{T}_{ijk}} \,\right]_{i=1,\ldots,m;\;j=1,\ldots,n;\;k=1,\ldots,p}
$$

<div style="background-color:#e6f7ff; color:#000; border-left:5px solid #1890ff; padding:10px;">
<b>Key Result:</b> The gradient of a scalar with respect to a 3D tensor is a tensor of the same shape.
</div>

**Output Dimension:**

If $\mathcal{T} \in \mathbb{R}^{m \times n \times p}$ then $\frac{\partial s}{\partial \mathcal{T}} \in \mathbb{R}^{m \times n \times p}$.

**Iterative Process:**

For $i = 1$ to $m$:<br>
&nbsp;&nbsp;For $j = 1$ to $n$:<br>
&nbsp;&nbsp;&nbsp;&nbsp;For $k = 1$ to $p$:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Compute $\frac{\partial s}{\partial \mathcal{T}_{ijk}}$

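The three nested loops can be collapsed with `np.ndindex`, which visits every $(i, j, k)$ in the same order as the loops above (last index fastest). This is a sketch assuming NumPy; the function $s = \sum_{ijk} \sin(\mathcal{T}_{ijk})$ is an illustrative choice whose gradient is $\cos(\mathcal{T})$ elementwise, and the helper name is hypothetical.

```python
import numpy as np

def grad_scalar_wrt_tensor(f, T, eps=1e-6):
    """Central-difference gradient of scalar f with respect to array T of any rank (hypothetical helper)."""
    grad = np.zeros(T.shape)
    for idx in np.ndindex(T.shape):       # idx = (i, j, k)
        step = np.zeros(T.shape)
        step[idx] = eps
        grad[idx] = (f(T + step) - f(T - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
T = rng.standard_normal((2, 2, 3))        # m=2, n=2, p=3
s = lambda T: np.sum(np.sin(T))

G = grad_scalar_wrt_tensor(s, T)
print(G.shape)                                  # (2, 2, 3)
print(np.allclose(G, np.cos(T), atol=1e-6))     # True
```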
---
---

```python
visualize_scalar_wrt_tensor_simple(m=2, n=2, p=3)
```


### RGB Example: Gradient of a Scalar with Respect to a 3D Tensor

Let’s see what the gradient looks like for a $2\times2$ RGB image tensor $T \in \mathbb{R}^{2 \times 2 \times 3}$, where $T_{ijk}$ is the value of the pixel at row $i$, column $j$ in channel $k$ ($k=1$ is Red, $k=2$ is Green, $k=3$ is Blue).

Suppose:
$$
s = f(T) = T_{111} + 2T_{122} + 3T_{213}
$$

#### Iteration Process
We want to compute the gradient tensor $\frac{\partial s}{\partial T}$, which has the same shape as $T$ ($2 \times 2 \times 3$).  
We fill in each entry by iterating over all $i = 1,2$, $j = 1,2$, $k = 1,2,3$:

For each $(i, j, k)$:
- Compute $\frac{\partial s}{\partial T_{ijk}}$

**Result, pixel by pixel:**

| Pixel $(i,j)$ | $k=1$ (Red) | $k=2$ (Green) | $k=3$ (Blue) |
|:------------:|:----------:|:------------:|:------------:|
| (1,1)        | 1          | 0            | 0            |
| (1,2)        | 0          | 2            | 0            |
| (2,1)        | 0          | 0            | 3            |
| (2,2)        | 0          | 0            | 0            |

**Tensor notation:**
$$
\frac{\partial s}{\partial T} =
\begin{bmatrix}
  \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} & \begin{bmatrix} 0 & 2 & 0 \end{bmatrix} \\
  \begin{bmatrix} 0 & 0 & 3 \end{bmatrix} & \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}
\end{bmatrix}
$$

- Row 1, Col 1: $[1,\,0,\,0]$ (Red: $1$, Green: $0$, Blue: $0$)
- Row 1, Col 2: $[0,\,2,\,0]$ (Red: $0$, Green: $2$, Blue: $0$)
- Row 2, Col 1: $[0,\,0,\,3]$ (Red: $0$, Green: $0$, Blue: $3$)
- Row 2, Col 2: $[0,\,0,\,0]$ (all zeros)

<div style="background-color:#e6f7ff; color:#000; border-left:5px solid #1890ff; padding:10px;">

<b>Key Result:</b> The gradient tensor has the same shape as $T$, and each entry is the partial derivative of $s$ with respect to the corresponding entry of $T$. Entries of $T$ that do not appear in $s$ get a zero partial derivative; because $s$ is linear here, each remaining entry is simply the coefficient of that $T_{ijk}$ in $s$.
</div>

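The table can be reproduced in NumPy, assuming NumPy is available and shifting the text's 1-based indices to 0-based ones. Because $s$ is linear, the gradient tensor is just the coefficient tensor, so $s(T)$ should equal the sum of the elementwise product of $\frac{\partial s}{\partial T}$ and $T$; the random image is an illustrative assumption.

```python
import numpy as np

def s(T):
    # s = T_111 + 2 T_122 + 3 T_213 from the text, written with 0-based indices
    return T[0, 0, 0] + 2 * T[0, 1, 1] + 3 * T[1, 0, 2]

# Gradient tensor read off the table: axis 0 = row i, axis 1 = column j, axis 2 = channel k (R, G, B)
G = np.zeros((2, 2, 3))
G[0, 0, 0] = 1.0   # pixel (1,1), Red
G[0, 1, 1] = 2.0   # pixel (1,2), Green
G[1, 0, 2] = 3.0   # pixel (2,1), Blue

rng = np.random.default_rng(1)
T = rng.uniform(0.0, 1.0, size=(2, 2, 3))   # a random 2x2 RGB image

# For a linear s, s(T) equals sum over all (i, j, k) of G_ijk * T_ijk
print(np.isclose(s(T), np.sum(G * T)))   # True
print(G[1, 0])                           # [0. 0. 3.]  (row 2, column 1)
```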

# Vector Derivatives


---
---

### Vector w.r.t. Scalar

$$\boxed{\frac{\partial \mathbf{a}}{\partial s}}$$

**Definition:**

Suppose $s \in \mathbb{R}$ is a scalar and $\mathbf{a} \in \mathbb{R}^n$ is a vector defined as a function of $s$:
$$
\mathbf{a} = f(s)
$$
where $f : \mathbb{R} \to \mathbb{R}^n$.

The derivative of the vector $\mathbf{a}$ with respect to the scalar $s$ is a vector of the same length as $\mathbf{a}$, where each entry is the partial derivative of $a_i$ with respect to $s$:
$$
\frac{\partial \mathbf{a}}{\partial s} =
\begin{bmatrix}
    \frac{\partial a_1}{\partial s} \\
    \frac{\partial a_2}{\partial s} \\
    \vdots \\
    \frac{\partial a_n}{\partial s}
\end{bmatrix}
$$

<div style="background-color:#e6f7ff; color:#000; border-left:5px solid #1890ff; padding:10px;">
<b>Key Result:</b> The derivative of a vector with respect to a scalar is a vector of the same length.
</div>

**Output Dimension:**

If $\mathbf{a} \in \mathbb{R}^{n}$ then $\frac{\partial \mathbf{a}}{\partial s} \in \mathbb{R}^{n}$.

**Iterative Process:**

For $i = 1$ to $n$:<br>
&nbsp;&nbsp;Compute $\frac{\partial a_i}{\partial s}$


```python
visualize_vector_wrt_scalar_simple(m=3)
```



---

<div align="center">

**Example:**  
</div>

Let $$\mathbf{a} = \begin{bmatrix} s^2 \\ 3s \\ \sin(s) \end{bmatrix}$$

Then,
$$
\frac{\partial \mathbf{a}}{\partial s} =
\begin{bmatrix}
    \frac{\partial}{\partial s}(s^2) \\
    \frac{\partial}{\partial s}(3s) \\
    \frac{\partial}{\partial s}(\sin(s))
\end{bmatrix}
=
\begin{bmatrix}
    2s \\
    3 \\
    \cos(s)
\end{bmatrix}
$$

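A quick numerical check of this example, assuming NumPy. Since $s$ is a single number, one central difference differentiates all components at once, so the "for $i$" loop collapses into a single vectorized step. The evaluation point `s0` is arbitrary.

```python
import numpy as np

def a(s):
    # a(s) = [s^2, 3s, sin(s)] from the example
    return np.array([s ** 2, 3 * s, np.sin(s)])

def da_ds_exact(s):
    return np.array([2 * s, 3.0, np.cos(s)])

def da_ds_numeric(s, eps=1e-6):
    # One central difference moves every component a_i at once
    return (a(s + eps) - a(s - eps)) / (2 * eps)

s0 = 0.8
print(da_ds_numeric(s0).shape)                          # (3,)
print(np.allclose(da_ds_numeric(s0), da_ds_exact(s0)))  # True
```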

---
---

### Vector w.r.t. Vector (Jacobian)

$$\boxed{\frac{\partial \mathbf{a}}{\partial \mathbf{v}}}$$

**Definition:**

Suppose $\mathbf{a} \in \mathbb{R}^m$ is a vector-valued function of $\mathbf{v} \in \mathbb{R}^n$:
$$
\mathbf{a} = f(\mathbf{v})
$$
where $f : \mathbb{R}^n \to \mathbb{R}^m$.

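The derivative of $\mathbf{a}$ with respect to $\mathbf{v}$ is called the **Jacobian matrix**. Row $i$ corresponds to the output $a_i$ and column $j$ to the input $v_j$ (the *numerator-layout* convention; some texts use the transpose). Each entry is: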
$$
\left[\frac{\partial \mathbf{a}}{\partial \mathbf{v}}\right]_{ij} = \frac{\partial a_i}{\partial v_j}
$$

So the Jacobian is an $m \times n$ matrix:
$$
\frac{\partial \mathbf{a}}{\partial \mathbf{v}} =
\begin{bmatrix}
    \frac{\partial a_1}{\partial v_1} & \frac{\partial a_1}{\partial v_2} & \cdots & \frac{\partial a_1}{\partial v_n} \\
    \frac{\partial a_2}{\partial v_1} & \frac{\partial a_2}{\partial v_2} & \cdots & \frac{\partial a_2}{\partial v_n} \\
    \vdots & \vdots & \ddots & \vdots \\
    \frac{\partial a_m}{\partial v_1} & \frac{\partial a_m}{\partial v_2} & \cdots & \frac{\partial a_m}{\partial v_n}
\end{bmatrix}
$$

<div style="background-color:#e6f7ff; color:#000; border-left:5px solid #1890ff; padding:10px;">

<b>Key Result:</b> The derivative of a vector with respect to a vector is the Jacobian matrix, with shape $(m, n)$.

</div>

**Output Dimension:**

If $\mathbf{a} \in \mathbb{R}^m$ and $\mathbf{v} \in \mathbb{R}^n$, then $\frac{\partial \mathbf{a}}{\partial \mathbf{v}} \in \mathbb{R}^{m \times n}$.

**Iterative Process:**

For $i = 1$ to $m$:<br>
&nbsp;&nbsp;For $j = 1$ to $n$:<br>
&nbsp;&nbsp;&nbsp;&nbsp;Compute $\frac{\partial a_i}{\partial v_j}$


```python
visualize_vector_wrt_vector_simple(m=3, n=3)
```




---

<div align="center">

**Example:**  

</div>

Let $$\mathbf{a} = \begin{bmatrix} v_1^2 \\ v_1 + v_2 \\ \sin(v_2) \end{bmatrix}, \quad \mathbf{v} \in \mathbb{R}^2$$

Compute the Jacobian:
- $a_1 = v_1^2$
- $a_2 = v_1 + v_2$
- $a_3 = \sin(v_2)$

So,
$$
\frac{\partial \mathbf{a}}{\partial \mathbf{v}} =
\begin{bmatrix}
    \frac{\partial a_1}{\partial v_1} & \frac{\partial a_1}{\partial v_2} \\
    \frac{\partial a_2}{\partial v_1} & \frac{\partial a_2}{\partial v_2} \\
    \frac{\partial a_3}{\partial v_1} & \frac{\partial a_3}{\partial v_2}
\end{bmatrix}
=
\begin{bmatrix}
    2v_1 & 0 \\
    1 & 1 \\
    0 & \cos(v_2)
\end{bmatrix}
$$

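The Jacobian of this example can be built column by column, assuming NumPy: perturbing $v_j$ changes every $a_i$ at once, so each finite difference fills an entire column $j$ and only $n$ passes are needed. The helper names and the point `v` are hypothetical.

```python
import numpy as np

def f(v):
    # a = [v1^2, v1 + v2, sin(v2)] from the example (0-based: v[0] = v1, v[1] = v2)
    return np.array([v[0] ** 2, v[0] + v[1], np.sin(v[1])])

def jacobian_exact(v):
    return np.array([[2 * v[0], 0.0],
                     [1.0,      1.0],
                     [0.0,      np.cos(v[1])]])

def jacobian_numeric(f, v, eps=1e-6):
    m, n = f(v).shape[0], v.shape[0]
    J = np.zeros((m, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        # Perturbing v_j fills column j: da_i/dv_j for every i
        J[:, j] = (f(v + step) - f(v - step)) / (2 * eps)
    return J

v = np.array([1.5, -0.4])
J = jacobian_numeric(f, v)
print(J.shape)                              # (3, 2): (m, n)
print(np.allclose(J, jacobian_exact(v)))    # True
```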
---
---

### Vector w.r.t. Matrix

$$\boxed{\frac{\partial \mathbf{a}}{\partial \mathbf{X}}}$$

**Definition:**

Suppose $\mathbf{a} \in \mathbb{R}^m$ is a vector-valued function of a matrix $\mathbf{X} \in \mathbb{R}^{n \times p}$:
$$
\mathbf{a} = f(\mathbf{X})
$$
where $f : \mathbb{R}^{n \times p} \to \mathbb{R}^m$.

The derivative of $\mathbf{a}$ with respect to $\mathbf{X}$ is a rank-3 tensor of shape $(m, n, p)$:
$$
\left[\frac{\partial \mathbf{a}}{\partial \mathbf{X}}\right]_{i j k} = \frac{\partial a_i}{\partial X_{j k}}
$$

<div style="background-color:#e6f7ff; color:#000; border-left:5px solid #1890ff; padding:10px;">

<b>Key Result:</b> The derivative of a vector with respect to a matrix is a rank-3 tensor of shape $(m, n, p)$.
</div>

**Output Dimension:**

If $\mathbf{a} \in \mathbb{R}^m$ and $\mathbf{X} \in \mathbb{R}^{n \times p}$, then $\frac{\partial \mathbf{a}}{\partial \mathbf{X}} \in \mathbb{R}^{m \times n \times p}$.

**Iterative Process:**

For $i = 1$ to $m$:<br>
&nbsp;&nbsp;For $j = 1$ to $n$:<br>
&nbsp;&nbsp;&nbsp;&nbsp;For $k = 1$ to $p$:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Compute $\frac{\partial a_i}{\partial X_{j k}}$


```python
visualize_vector_wrt_matrix_simple(m=2, n=2, p=3)
```




---

**Example:**

Let $\mathbf{a} \in \mathbb{R}^2$, $\mathbf{X} \in \mathbb{R}^{2 \times 2}$, and define:
- $a_1 = X_{11} + 2X_{12}$
- $a_2 = 3X_{21} + 4X_{22}$

Compute the derivatives:
- $\frac{\partial a_1}{\partial X_{11}} = 1$, $\frac{\partial a_1}{\partial X_{12}} = 2$, $\frac{\partial a_1}{\partial X_{21}} = 0$, $\frac{\partial a_1}{\partial X_{22}} = 0$
- $\frac{\partial a_2}{\partial X_{11}} = 0$, $\frac{\partial a_2}{\partial X_{12}} = 0$, $\frac{\partial a_2}{\partial X_{21}} = 3$, $\frac{\partial a_2}{\partial X_{22}} = 4$

So, the tensor $\frac{\partial \mathbf{a}}{\partial \mathbf{X}}$ is:
$$
\frac{\partial \mathbf{a}}{\partial \mathbf{X}} =
\begin{bmatrix}
  \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix} \\
  \begin{bmatrix} 0 & 0 \\ 3 & 4 \end{bmatrix}
\end{bmatrix}
$$
- The first matrix is for $a_1$
- The second matrix is for $a_2$

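A sketch that recovers this rank-3 tensor numerically, assuming NumPy. The output array is allocated with shape `f(X).shape + X.shape`, i.e. $(m,) + (n, p)$, and each perturbation of $X_{jk}$ fills the slice `D[:, j, k]`. The helper name and the matrix `X` are hypothetical.

```python
import numpy as np

def a(X):
    # a_1 = X_11 + 2 X_12,  a_2 = 3 X_21 + 4 X_22  (0-based indices below)
    return np.array([X[0, 0] + 2 * X[0, 1],
                     3 * X[1, 0] + 4 * X[1, 1]])

def derivative_numeric(f, X, eps=1e-6):
    D = np.zeros(f(X).shape + X.shape)       # (m,) + (n, p) -> (m, n, p)
    for idx in np.ndindex(X.shape):          # idx = (j, k)
        step = np.zeros(X.shape)
        step[idx] = eps
        D[(Ellipsis,) + idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return D

X = np.array([[0.2, -1.0],
              [0.7,  2.5]])
D = derivative_numeric(a, X)
expected = np.array([[[1.0, 2.0], [0.0, 0.0]],    # slice for a_1
                     [[0.0, 0.0], [3.0, 4.0]]])   # slice for a_2
print(D.shape)                   # (2, 2, 2)
print(np.allclose(D, expected))  # True
```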
---
---

### Vector w.r.t. Tensor

$$\boxed{\frac{\partial \mathbf{a}}{\partial \mathcal{T}}}$$

**Definition:**

Suppose $\mathbf{a} \in \mathbb{R}^m$ is a vector-valued function of a 3-D tensor $\mathcal{T} \in \mathbb{R}^{n \times p \times q}$:
$$
\mathbf{a} = f(\mathcal{T})
$$
where $f : \mathbb{R}^{n \times p \times q} \to \mathbb{R}^m$.

The derivative of $\mathbf{a}$ with respect to $\mathcal{T}$ is a rank-4 tensor of shape $(m, n, p, q)$:
$$
\left[\frac{\partial \mathbf{a}}{\partial \mathcal{T}}\right]_{i j k l} = \frac{\partial a_i}{\partial \mathcal{T}_{j k l}}
$$

<div style="background-color:#e6f7ff; color:#000; border-left:5px solid #1890ff; padding:10px;">

<b>Key Result:</b> The derivative of a vector with respect to a 3-D tensor is a rank-4 tensor of shape $(m, n, p, q)$.
</div>

**Output Dimension:**

If $\mathbf{a} \in \mathbb{R}^m$ and $\mathcal{T} \in \mathbb{R}^{n \times p \times q}$, then $\frac{\partial \mathbf{a}}{\partial \mathcal{T}} \in \mathbb{R}^{m \times n \times p \times q}$.

**Iterative Process:**

For $i = 1$ to $m$:<br>
&nbsp;&nbsp;For $j = 1$ to $n$:<br>
&nbsp;&nbsp;&nbsp;&nbsp;For $k = 1$ to $p$:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;For $l = 1$ to $q$:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Compute $\frac{\partial a_i}{\partial \mathcal{T}_{j k l}}$



```python
visualize_vector_wrt_tensor_simple(m=2, n=2, p=2, q=2)
```




---

**Example:**

Let $\mathbf{a} \in \mathbb{R}^2$, $\mathcal{T} \in \mathbb{R}^{2 \times 2 \times 2}$, and define:
- $a_1 = \mathcal{T}_{111} + 2\mathcal{T}_{122}$
- $a_2 = 3\mathcal{T}_{211} - \mathcal{T}_{222}$

Compute the derivatives:
- $\frac{\partial a_1}{\partial \mathcal{T}_{111}} = 1$, $\frac{\partial a_1}{\partial \mathcal{T}_{122}} = 2$, all other $\frac{\partial a_1}{\partial \mathcal{T}_{jkl}} = 0$
- $\frac{\partial a_2}{\partial \mathcal{T}_{211}} = 3$, $\frac{\partial a_2}{\partial \mathcal{T}_{222}} = -1$, all other $\frac{\partial a_2}{\partial \mathcal{T}_{jkl}} = 0$

So, the tensor $\frac{\partial \mathbf{a}}{\partial \mathcal{T}}$ has shape $(2, 2, 2, 2)$, and only the above entries are nonzero.

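Because both components are linear, the rank-4 tensor can be written down directly from the coefficients. The sketch below, assuming NumPy, checks that contracting it with $\mathcal{T}$ over the last three axes reproduces $\mathbf{a}$, that is $a_i = \sum_{jkl} \frac{\partial a_i}{\partial \mathcal{T}_{jkl}} \mathcal{T}_{jkl}$, and then lists the nonzero entries in the text's 1-based notation. The random `T` is illustrative.

```python
import numpy as np

def a(T):
    # a_1 = T_111 + 2 T_122,  a_2 = 3 T_211 - T_222  (0-based indices below)
    return np.array([T[0, 0, 0] + 2 * T[0, 1, 1],
                     3 * T[1, 0, 0] - T[1, 1, 1]])

# D[i, j, k, l] = da_i / dT_jkl; constant because a is linear in T
D = np.zeros((2, 2, 2, 2))
D[0, 0, 0, 0] = 1.0
D[0, 0, 1, 1] = 2.0
D[1, 1, 0, 0] = 3.0
D[1, 1, 1, 1] = -1.0

rng = np.random.default_rng(2)
T = rng.standard_normal((2, 2, 2))
print(np.allclose(a(T), np.einsum('ijkl,jkl->i', D, T)))  # True

# Nonzero entries, reported with the 1-based indices used in the text
for i, j, k, l in np.argwhere(D):
    print(f"da_{i + 1}/dT_{j + 1}{k + 1}{l + 1} = {D[i, j, k, l]:g}")
```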
---
---

# Matrix Derivatives

### Matrix w.r.t. Scalar

$$\boxed{\frac{\partial \mathbf{A}}{\partial x}}$$

**Definition:**

Suppose $\mathbf{A} \in \mathbb{R}^{m \times n}$ is a matrix-valued function of a scalar $x$:
$$
\mathbf{A} = f(x)
$$
where $f : \mathbb{R} \to \mathbb{R}^{m \times n}$.

The derivative of $\mathbf{A}$ with respect to $x$ is a matrix of the same shape:
$$
\left[\frac{\partial \mathbf{A}}{\partial x}\right]_{ij} = \frac{\partial A_{ij}}{\partial x}
$$

<div style="background-color:#e6f7ff; color:#000; border-left:5px solid #1890ff; padding:10px;">
<b>Key Result:</b> The derivative of a matrix with respect to a scalar is a matrix of the same shape.
</div>

**Output Dimension:**

If $\mathbf{A} \in \mathbb{R}^{m \times n}$, then $\frac{\partial \mathbf{A}}{\partial x} \in \mathbb{R}^{m \times n}$.

**Iterative Process:**

For $i = 1$ to $m$:<br>
&nbsp;&nbsp;For $j = 1$ to $n$:<br>
&nbsp;&nbsp;&nbsp;&nbsp;Compute $\frac{\partial A_{ij}}{\partial x}$



```python
visualize_matrix_wrt_scalar_simple(m=2, n=2)
```




---

**Example:**

Let $\mathbf{A} \in \mathbb{R}^{2 \times 2}$, $x \in \mathbb{R}$, and define:
- $A_{11} = x^2$, $A_{12} = \sin x$
- $A_{21} = e^x$, $A_{22} = 3x$

Compute the derivatives:
- $\frac{\partial A_{11}}{\partial x} = 2x$
- $\frac{\partial A_{12}}{\partial x} = \cos x$
- $\frac{\partial A_{21}}{\partial x} = e^x$
- $\frac{\partial A_{22}}{\partial x} = 3$

$$\therefore \frac{\partial \mathbf{A}}{\partial x} = \begin{bmatrix} 2x & \cos x \\ e^x & 3 \end{bmatrix}$$

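A numerical check of this example, assuming NumPy. As in the vector case, a single central difference in $x$ differentiates all $m \times n$ entries at once; the point `x0` is arbitrary.

```python
import numpy as np

def A(x):
    return np.array([[x ** 2,    np.sin(x)],
                     [np.exp(x), 3 * x]])

def dA_dx_exact(x):
    return np.array([[2 * x,     np.cos(x)],
                     [np.exp(x), 3.0]])

x0, eps = 0.5, 1e-6
dA_dx = (A(x0 + eps) - A(x0 - eps)) / (2 * eps)   # every entry differentiated at once
print(dA_dx.shape)                                 # (2, 2)
print(np.allclose(dA_dx, dA_dx_exact(x0)))         # True
```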
---
---

### Matrix w.r.t. Vector

$$\boxed{\frac{\partial \mathbf{A}}{\partial \mathbf{x}}}$$

**Definition:**

Suppose $\mathbf{A} \in \mathbb{R}^{m \times n}$ is a matrix-valued function of a vector $\mathbf{x} \in \mathbb{R}^p$:
$$
\mathbf{A} = f(\mathbf{x})
$$
where $f : \mathbb{R}^p \to \mathbb{R}^{m \times n}$.

The derivative of $\mathbf{A}$ with respect to $\mathbf{x}$ is a rank-3 tensor of shape $(m, n, p)$:
$$
\left[\frac{\partial \mathbf{A}}{\partial \mathbf{x}}\right]_{i j k} = \frac{\partial A_{ij}}{\partial x_k}
$$

<div style="background-color:#e6f7ff; color:#000; border-left:5px solid #1890ff; padding:10px;">

<b>Key Result:</b> The derivative of a matrix with respect to a vector is a rank-3 tensor of shape $(m, n, p)$.
</div>

**Output Dimension:**

If $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{x} \in \mathbb{R}^p$, then $\frac{\partial \mathbf{A}}{\partial \mathbf{x}} \in \mathbb{R}^{m \times n \times p}$.

**Iterative Process:**

For $i = 1$ to $m$:<br>
&nbsp;&nbsp;For $j = 1$ to $n$:<br>
&nbsp;&nbsp;&nbsp;&nbsp;For $k = 1$ to $p$:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Compute $\frac{\partial A_{ij}}{\partial x_k}$

---


```python
visualize_matrix_wrt_vector_simple(m=2, n=2, p=3)
```



**Example:**

Let $\mathbf{A} \in \mathbb{R}^{2 \times 2}$, $\mathbf{x} \in \mathbb{R}^2$, and define:
- $A_{11} = x_1^2 + x_2$
- $A_{12} = \sin x_1$
- $A_{21} = e^{x_2}$
- $A_{22} = x_1 x_2$

Compute the derivatives:
- $\frac{\partial A_{11}}{\partial x_1} = 2x_1$, $\frac{\partial A_{11}}{\partial x_2} = 1$
- $\frac{\partial A_{12}}{\partial x_1} = \cos x_1$, $\frac{\partial A_{12}}{\partial x_2} = 0$
- $\frac{\partial A_{21}}{\partial x_1} = 0$, $\frac{\partial A_{21}}{\partial x_2} = e^{x_2}$
- $\frac{\partial A_{22}}{\partial x_1} = x_2$, $\frac{\partial A_{22}}{\partial x_2} = x_1$

$\therefore \frac{\partial \mathbf{A}}{\partial \mathbf{x}}$ is a $(2,2,2)$ tensor with these entries.

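The $(2,2,2)$ tensor for this example can be assembled slice by slice, assuming NumPy: perturbing $x_k$ yields the whole matrix $\frac{\partial \mathbf{A}}{\partial x_k}$, stored in `D[:, :, k]`. The helper names and the point `x` are hypothetical.

```python
import numpy as np

def A(x):
    return np.array([[x[0] ** 2 + x[1], np.sin(x[0])],
                     [np.exp(x[1]),     x[0] * x[1]]])

def dA_dx_exact(x):
    D = np.zeros((2, 2, 2))                  # D[i, j, k] = dA_ij / dx_k
    D[0, 0] = [2 * x[0], 1.0]
    D[0, 1] = [np.cos(x[0]), 0.0]
    D[1, 0] = [0.0, np.exp(x[1])]
    D[1, 1] = [x[1], x[0]]
    return D

def dA_dx_numeric(A, x, eps=1e-6):
    m, n = A(x).shape
    D = np.zeros((m, n, x.shape[0]))
    for k in range(x.shape[0]):
        step = np.zeros_like(x)
        step[k] = eps
        D[:, :, k] = (A(x + step) - A(x - step)) / (2 * eps)   # slice k holds dA/dx_k
    return D

x = np.array([0.9, -0.3])
D = dA_dx_numeric(A, x)
print(D.shape)                           # (2, 2, 2)
print(np.allclose(D, dA_dx_exact(x)))    # True
```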
---
---

### Matrix w.r.t. Matrix

$$\boxed{\frac{\partial \mathbf{A}}{\partial \mathbf{X}}}$$

**Definition:**

Suppose $\mathbf{A} \in \mathbb{R}^{m \times n}$ is a matrix-valued function of a matrix $\mathbf{X} \in \mathbb{R}^{p \times q}$:
$$
\mathbf{A} = f(\mathbf{X})
$$
where $f : \mathbb{R}^{p \times q} \to \mathbb{R}^{m \times n}$.

The derivative of $\mathbf{A}$ with respect to $\mathbf{X}$ is a rank-4 tensor of shape $(m, n, p, q)$:
$$
\left[\frac{\partial \mathbf{A}}{\partial \mathbf{X}}\right]_{i j k l} = \frac{\partial A_{ij}}{\partial X_{kl}}
$$

<div style="background-color:#e6f7ff; color:#000; border-left:5px solid #1890ff; padding:10px;">

<b>Key Result:</b> The derivative of a matrix with respect to a matrix is a rank-4 tensor of shape $(m, n, p, q)$.
</div>

**Output Dimension:**

If $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{X} \in \mathbb{R}^{p \times q}$, then $\frac{\partial \mathbf{A}}{\partial \mathbf{X}} \in \mathbb{R}^{m \times n \times p \times q}$.

**Iterative Process:**

For $i = 1$ to $m$:<br>
&nbsp;&nbsp;For $j = 1$ to $n$:<br>
&nbsp;&nbsp;&nbsp;&nbsp;For $k = 1$ to $p$:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;For $l = 1$ to $q$:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Compute $\frac{\partial A_{ij}}{\partial X_{kl}}$

---


```python
visualize_matrix_wrt_matrix_simple(m=2, n=2, p=2, q=2)
```



**Example:**

Let $\mathbf{A} \in \mathbb{R}^{2 \times 2}$, $\mathbf{X} \in \mathbb{R}^{2 \times 2}$, and define:
- $A_{11} = X_{11}^2 + X_{12}$
- $A_{12} = \sin X_{21}$
- $A_{21} = e^{X_{22}}$
- $A_{22} = X_{11} X_{22}$

Compute the derivatives:
- $\frac{\partial A_{11}}{\partial X_{11}} = 2X_{11}$, $\frac{\partial A_{11}}{\partial X_{12}} = 1$, all other $=0$
- $\frac{\partial A_{12}}{\partial X_{21}} = \cos X_{21}$, all other $=0$
- $\frac{\partial A_{21}}{\partial X_{22}} = e^{X_{22}}$, all other $=0$
- $\frac{\partial A_{22}}{\partial X_{11}} = X_{22}$, $\frac{\partial A_{22}}{\partial X_{22}} = X_{11}$, all other $=0$

$\therefore \frac{\partial \mathbf{A}}{\partial \mathbf{X}}$ is a $(2,2,2,2)$ tensor with these entries.

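In backpropagation this rank-4 tensor is rarely built explicitly; the chain rule contracts it with the upstream gradient $\mathbf{G} = \frac{\partial L}{\partial \mathbf{A}}$ of a scalar loss $L$. The sketch below assumes NumPy and uses an illustrative layer $\mathbf{A} = \mathbf{X}\mathbf{W}$ with $\mathbf{X} \in \mathbb{R}^{m \times r}$, $\mathbf{W} \in \mathbb{R}^{r \times n}$ and a random $\mathbf{G}$ (none of these come from the text). It builds the full $(m, n, m, r)$ tensor, contracts it, and compares the result with the well-known shortcut $\frac{\partial L}{\partial \mathbf{X}} = \mathbf{G}\mathbf{W}^T$.

```python
import numpy as np

rng = np.random.default_rng(3)
m, r, n = 2, 3, 4
X = rng.standard_normal((m, r))
W = rng.standard_normal((r, n))

# Full derivative of A = X @ W w.r.t. X: J[i, j, k, l] = dA_ij / dX_kl.
# Since A_ij = sum_l X_il W_lj, the entry is W_lj when k == i and 0 otherwise.
J = np.zeros((m, n, m, r))
for i in range(m):
    for j in range(n):
        for l in range(r):
            J[i, j, i, l] = W[l, j]

G = rng.standard_normal((m, n))                 # upstream gradient dL/dA (random for illustration)

dL_dX_full = np.einsum('ij,ijkl->kl', G, J)     # chain rule: sum over i, j of G_ij * dA_ij/dX_kl
dL_dX_fast = G @ W.T                            # same result without the rank-4 tensor

print(J.shape)                                  # (2, 4, 2, 3)
print(np.allclose(dL_dX_full, dL_dX_fast))      # True
```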

---
---

### Matrix w.r.t. Tensor

$$\boxed{\frac{\partial \mathbf{A}}{\partial \mathcal{T}}}$$

**Definition:**

Suppose $\mathbf{A} \in \mathbb{R}^{m \times n}$ is a matrix-valued function of a 3-D tensor $\mathcal{T} \in \mathbb{R}^{p \times q \times r}$:
$$
\mathbf{A} = f(\mathcal{T})
$$
where $f : \mathbb{R}^{p \times q \times r} \to \mathbb{R}^{m \times n}$.

The derivative of $\mathbf{A}$ with respect to $\mathcal{T}$ is a rank-5 tensor of shape $(m, n, p, q, r)$:
$$
\left[\frac{\partial \mathbf{A}}{\partial \mathcal{T}}\right]_{i j k l s} = \frac{\partial A_{ij}}{\partial \mathcal{T}_{k l s}}
$$

<div style="background-color:#e6f7ff; color:#000; border-left:5px solid #1890ff; padding:10px;">

<b>Key Result:</b> The derivative of a matrix with respect to a 3-D tensor is a rank-5 tensor of shape $(m, n, p, q, r)$.
</div>

**Output Dimension:**

If $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathcal{T} \in \mathbb{R}^{p \times q \times r}$, then $\frac{\partial \mathbf{A}}{\partial \mathcal{T}} \in \mathbb{R}^{m \times n \times p \times q \times r}$.

**Iterative Process:**

For $i = 1$ to $m$:<br>
&nbsp;&nbsp;For $j = 1$ to $n$:<br>
&nbsp;&nbsp;&nbsp;&nbsp;For $k = 1$ to $p$:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;For $l = 1$ to $q$:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;For $s = 1$ to $r$:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Compute $\frac{\partial A_{ij}}{\partial \mathcal{T}_{k l s}}$

---



```python
visualize_matrix_wrt_tensor_simple(m=2, n=2, p=2, q=2, r=2)
```




**Example:**

Let $\mathbf{A} \in \mathbb{R}^{2 \times 2}$, $\mathcal{T} \in \mathbb{R}^{2 \times 2 \times 2}$, and define:
- $A_{11} = \mathcal{T}_{111} + 2\mathcal{T}_{122}$
- $A_{12} = \sin \mathcal{T}_{211}$
- $A_{21} = e^{\mathcal{T}_{222}}$
- $A_{22} = \mathcal{T}_{111} \mathcal{T}_{222}$

Compute the derivatives:
- $\frac{\partial A_{11}}{\partial \mathcal{T}_{111}} = 1$, $\frac{\partial A_{11}}{\partial \mathcal{T}_{122}} = 2$, all other $=0$
- $\frac{\partial A_{12}}{\partial \mathcal{T}_{211}} = \cos \mathcal{T}_{211}$, all other $=0$
- $\frac{\partial A_{21}}{\partial \mathcal{T}_{222}} = e^{\mathcal{T}_{222}}$, all other $=0$
- $\frac{\partial A_{22}}{\partial \mathcal{T}_{111}} = \mathcal{T}_{222}$, $\frac{\partial A_{22}}{\partial \mathcal{T}_{222}} = \mathcal{T}_{111}$, all other $=0$

$\therefore \frac{\partial \mathbf{A}}{\partial \mathcal{T}}$ is a $(2,2,2,2,2)$ tensor with these entries.

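A numerical sketch of this rank-5 example, assuming NumPy. The output is allocated as `f(T).shape + T.shape`, i.e. $(m, n) + (p, q, r)$, and filled by perturbing one $\mathcal{T}_{kls}$ at a time. Setting every entry of `T` to 0.5 is an arbitrary choice that keeps all six partial derivatives listed above away from zero; the helper name is hypothetical.

```python
import numpy as np

def A(T):
    # Entries from the example, written with 0-based indices
    return np.array([[T[0, 0, 0] + 2 * T[0, 1, 1], np.sin(T[1, 0, 0])],
                     [np.exp(T[1, 1, 1]),          T[0, 0, 0] * T[1, 1, 1]]])

def derivative_numeric(f, T, eps=1e-6):
    D = np.zeros(f(T).shape + T.shape)       # (m, n) + (p, q, r)
    for idx in np.ndindex(T.shape):          # idx = (k, l, s)
        step = np.zeros(T.shape)
        step[idx] = eps
        D[(Ellipsis,) + idx] = (f(T + step) - f(T - step)) / (2 * eps)
    return D

T = np.full((2, 2, 2), 0.5)
D = derivative_numeric(A, T)
print(D.shape)                                   # (2, 2, 2, 2, 2)
print(np.count_nonzero(np.abs(D) > 1e-8))        # 6
print(np.isclose(D[1, 1, 0, 0, 0], T[1, 1, 1]))  # True: dA_22/dT_111 = T_222
```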
---
---
---

# Summary Table
<div align="center">

| Output Type | Input Type | Derivative Shape | Formula (Componentwise) | Iterative Process |
|-------------|------------|------------------|------------------------|------------------|
| Scalar ($s$) | Vector ($\mathbf{x} \in \mathbb{R}^n$) | $(n,)$ | $\frac{\partial s}{\partial x_i}$ | for $i=1$ to $n$:<br>&nbsp;&nbsp;$\frac{\partial s}{\partial x_i}$ |
| Scalar ($s$) | Matrix ($\mathbf{X} \in \mathbb{R}^{m \times n}$) | $(m, n)$ | $\frac{\partial s}{\partial X_{ij}}$ | for $i=1$ to $m$:<br>&nbsp;&nbsp;for $j=1$ to $n$:<br>&nbsp;&nbsp;&nbsp;&nbsp;$\frac{\partial s}{\partial X_{ij}}$ |
| Scalar ($s$) | 3D Tensor ($\mathcal{T} \in \mathbb{R}^{m \times n \times p}$) | $(m, n, p)$ | $\frac{\partial s}{\partial \mathcal{T}_{ijk}}$ | for $i=1$ to $m$:<br>&nbsp;&nbsp;for $j=1$ to $n$:<br>&nbsp;&nbsp;&nbsp;&nbsp;for $k=1$ to $p$:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$\frac{\partial s}{\partial \mathcal{T}_{ijk}}$ |
| Vector ($\mathbf{a} \in \mathbb{R}^m$) | Scalar ($x$) | $(m,)$ | $\frac{\partial a_i}{\partial x}$ | for $i=1$ to $m$:<br>&nbsp;&nbsp;$\frac{\partial a_i}{\partial x}$ |
| Vector ($\mathbf{a} \in \mathbb{R}^m$) | Vector ($\mathbf{x} \in \mathbb{R}^n$) | $(m, n)$ | $\frac{\partial a_i}{\partial x_j}$ | for $i=1$ to $m$:<br>&nbsp;&nbsp;for $j=1$ to $n$:<br>&nbsp;&nbsp;&nbsp;&nbsp;$\frac{\partial a_i}{\partial x_j}$ |
| Vector ($\mathbf{a} \in \mathbb{R}^m$) | Matrix ($\mathbf{X} \in \mathbb{R}^{n \times p}$) | $(m, n, p)$ | $\frac{\partial a_i}{\partial X_{jk}}$ | for $i=1$ to $m$:<br>&nbsp;&nbsp;for $j=1$ to $n$:<br>&nbsp;&nbsp;&nbsp;&nbsp;for $k=1$ to $p$:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$\frac{\partial a_i}{\partial X_{jk}}$ |
| Vector ($\mathbf{a} \in \mathbb{R}^m$) | 3D Tensor ($\mathcal{T} \in \mathbb{R}^{n \times p \times q}$) | $(m, n, p, q)$ | $\frac{\partial a_i}{\partial \mathcal{T}_{jkl}}$ | for $i=1$ to $m$:<br>&nbsp;&nbsp;for $j=1$ to $n$:<br>&nbsp;&nbsp;&nbsp;&nbsp;for $k=1$ to $p$:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for $l=1$ to $q$:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$\frac{\partial a_i}{\partial \mathcal{T}_{jkl}}$ |
| Matrix ($\mathbf{A} \in \mathbb{R}^{m \times n}$) | Scalar ($x$) | $(m, n)$ | $\frac{\partial A_{ij}}{\partial x}$ | for $i=1$ to $m$:<br>&nbsp;&nbsp;for $j=1$ to $n$:<br>&nbsp;&nbsp;&nbsp;&nbsp;$\frac{\partial A_{ij}}{\partial x}$ |
| Matrix ($\mathbf{A} \in \mathbb{R}^{m \times n}$) | Vector ($\mathbf{x} \in \mathbb{R}^p$) | $(m, n, p)$ | $\frac{\partial A_{ij}}{\partial x_k}$ | for $i=1$ to $m$:<br>&nbsp;&nbsp;for $j=1$ to $n$:<br>&nbsp;&nbsp;&nbsp;&nbsp;for $k=1$ to $p$:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$\frac{\partial A_{ij}}{\partial x_k}$ |
| Matrix ($\mathbf{A} \in \mathbb{R}^{m \times n}$) | Matrix ($\mathbf{X} \in \mathbb{R}^{p \times q}$) | $(m, n, p, q)$ | $\frac{\partial A_{ij}}{\partial X_{kl}}$ | for $i=1$ to $m$:<br>&nbsp;&nbsp;for $j=1$ to $n$:<br>&nbsp;&nbsp;&nbsp;&nbsp;for $k=1$ to $p$:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for $l=1$ to $q$:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$\frac{\partial A_{ij}}{\partial X_{kl}}$ |
| Matrix ($\mathbf{A} \in \mathbb{R}^{m \times n}$) | 3D Tensor ($\mathcal{T} \in \mathbb{R}^{p \times q \times r}$) | $(m, n, p, q, r)$ | $\frac{\partial A_{ij}}{\partial \mathcal{T}_{kls}}$ | for $i=1$ to $m$:<br>&nbsp;&nbsp;for $j=1$ to $n$:<br>&nbsp;&nbsp;&nbsp;&nbsp;for $k=1$ to $p$:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for $l=1$ to $q$:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for $s=1$ to $r$:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$\frac{\partial A_{ij}}{\partial \mathcal{T}_{kls}}$ |

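This table summarizes the shapes, componentwise formulas, and iterative processes for derivatives of scalars, vectors, and matrices with respect to scalars, vectors, matrices, and 3D tensors. In every row the derivative's shape is the output's shape followed by the input's shape, with a scalar contributing no axes; for example, a matrix in $\mathbb{R}^{m \times n}$ differentiated with respect to a vector in $\mathbb{R}^p$ gives shape $(m, n, p)$.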

---
---
---