# Machine Learning Notes - Backpropagation

## A List of Related Posts
1. [PyTorch]({% post_url 2021-05-04-machine-learning-pytorch %})
2. [Loss function]({% post_url 2021-05-07-machine-learning-loss %})
3. [Backpropagation (this post)]({% post_url 2021-05-07-machine-learning-backpropagation %})

## Backpropagation Introduction

## Chain Rule

$$
y = f_1(x) \\
L = f_2(y)
$$
Chain rule:

$$
\frac{\partial{L}}{\partial{y}} = \frac{\partial{f_2(y)}}{\partial{y}} \\
\frac{\partial{L}}{\partial{x}} = \frac{\partial{L}}{\partial{y}}\frac{\partial{y}}{\partial{x}} = \frac{\partial{f_2(y)}}{\partial{y}} \frac{\partial{f_1(x)}}{\partial{x}}
$$

In [4]:
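# Chain rule for scalars: y = 2x, L = 3y

import torch

x = torch.tensor([5], requires_grad=True, dtype=torch.float64)
y = 2 * x        # forward: y = f_1(x)
L = 3 * y        # forward: L = f_2(y)

# Manual backward pass, from L back to x
dL = 1           # dL/dL
dy = 3 * dL      # dL/dy = df_2/dy = 3
dx = 2 * dy      # dL/dx = dL/dy * dy/dx = 3 * 2

L.backward()     # autograd computes the same gradient into x.grad
print(f'dL/dx: {dx}')
print(f'Pytorch autograd {x.grad}')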

Output:

dL/dx: 6
Pytorch autograd tensor([6.], dtype=torch.float64)
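
A hand-derived gradient like the one above can also be sanity-checked with a central finite difference, which needs no autograd at all. The following is a minimal sketch in plain Python. The helper name `numerical_grad` and the step size `eps` are assumptions made for this illustration and are not part of the original notebook.

```python
# Finite-difference check of the scalar chain rule (illustrative sketch).

def f1(x):
    return 2 * x      # y = f_1(x)

def f2(y):
    return 3 * y      # L = f_2(y)

def numerical_grad(fn, x, eps=1e-6):
    """Central-difference estimate of d fn / d x at a scalar x (hypothetical helper)."""
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)

x = 5.0
analytic = 3 * 2      # dL/dy * dy/dx from the chain rule
numeric = numerical_grad(lambda v: f2(f1(v)), x)

print(f'analytic dL/dx: {analytic}')
print(f'numeric  dL/dx: {numeric:.6f}')
```

The two values should agree up to floating-point error. For non-linear functions the agreement is only approximate and depends on the choice of `eps`.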


$$
\begin{bmatrix}
y_1 \\
y_2
\end{bmatrix} = f(\begin{bmatrix}
x_1 \\
x_2 \\
x_3
\end{bmatrix}) = \begin{bmatrix}
f_{1}(x_1, x_2, x_3) \\
f_{2}(x_1, x_2, x_3)
\end{bmatrix}\\
L = g(\begin{bmatrix}
y_1 \\
y_2
\end{bmatrix})
$$
Chain rule:

$$
\frac{\partial{L}}{\partial{y}} = \begin{bmatrix} 
    \frac{\partial{g(y)}}{\partial{y_1}} \\ 
    \frac{\partial{g(y)}}{\partial{y_2}} \\ 
\end{bmatrix}\\
\frac{\partial{L}}{\partial{x}} = J_x^y \frac{\partial{L}}{\partial{y}} =
\begin{bmatrix} 
    \frac{\partial{f_1(x)}}{\partial{x_1}} & \frac{\partial{f_2(x)}}{\partial{x_1}}\\ 
    \frac{\partial{f_1(x)}}{\partial{x_2}} & \frac{\partial{f_2(x)}}{\partial{x_2}}\\ 
    \frac{\partial{f_1(x)}}{\partial{x_3}} & \frac{\partial{f_2(x)}}{\partial{x_3}}
\end{bmatrix} \frac{\partial{L}}{\partial{y}}
$$

In [8]:
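# Gradients with respect to a vector (chain rule with a Jacobian)

import torch

x = torch.tensor([1, 2, 3], requires_grad=True, dtype=torch.float64)

# Forward pass: y = f(x), L = g(y)
y = torch.stack([x[0] + 2*x[1], x[1] + 2*x[2]])
L = y.sum()

# Manual backward pass
dL = 1
dy = dL * torch.tensor([1, 1])                # dL/dy: g sums y, so each entry is 1
J = torch.tensor([[1, 0], [2, 1], [0, 2]])    # row i holds dy_1/dx_i, dy_2/dx_i
dx = J.matmul(dy)                             # dL/dx = J * dL/dy

L.backward()                                  # autograd computes the same gradient into x.grad
print(f'dL/dx: {dx}')
print(f'Pytorch autograd {x.grad}')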

Output:

dL/dx: tensor([1, 3, 2])
Pytorch autograd tensor([1., 3., 2.], dtype=torch.float64)
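
The hand-written matrix above can be cross-checked against autograd. The sketch below uses `torch.autograd.functional.jacobian`, which returns the Jacobian with one row per output (shape 2×3 here), so it is transposed before being multiplied by $\partial L / \partial y$. The function name `f` and the variable names are chosen for this example only.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Same mapping as in the cell above: y_1 = x_1 + 2*x_2, y_2 = x_2 + 2*x_3
    return torch.stack([x[0] + 2 * x[1], x[1] + 2 * x[2]])

x = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)

J = jacobian(f, x)                           # shape (2, 3): rows index y, columns index x
dL_dy = torch.ones(2, dtype=torch.float64)   # L = y.sum(), so dL/dy is all ones
dL_dx = J.T @ dL_dy                          # the transpose is the 3x2 matrix used in these notes

print(J)
print(dL_dx)
```

The transpose is the only difference between the two conventions: the 3×2 matrix written by hand in the notes is `J.T` here.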


## Local Gradient

... ---> $\textbf{x}$ ---> $f(\textbf{x})$ ---> $\textbf{y}$ ---> ... ---> L

$$
\frac{\partial{L}}{\partial{\textbf{x}}} = J^{\textbf{y}}_\textbf{x}\frac{\partial{L}}{\partial{\textbf{y}}} 
$$
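$J^{\textbf{y}}_\textbf{x}$ is the Jacobian of $\textbf{y}$ with respect to $\textbf{x}$, laid out so that entry $(i, j)$ is $\partial y_j / \partial x_i$. It has one row per component of $\textbf{x}$ and one column per component of $\textbf{y}$, which is the transpose of the more common row-per-output layout.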
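
The local-gradient idea can be sketched as a block that only knows its own Jacobian: `forward` maps $\textbf{x}$ to $\textbf{y}$, and `backward` turns the incoming $\partial L / \partial \textbf{y}$ into $\partial L / \partial \textbf{x}$ without knowing anything about the rest of the graph. The class name `LinearBlock` and its methods are hypothetical, written in plain Python purely for illustration.

```python
class LinearBlock:
    """Hypothetical block computing y = W x for a fixed weight matrix W."""

    def __init__(self, W):
        self.W = W  # list of rows; shape (len(y), len(x))

    def forward(self, x):
        # y_j = sum_i W[j][i] * x_i
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.W]

    def backward(self, dL_dy):
        # Local gradient: dy_j/dx_i = W[j][i], so
        # dL/dx_i = sum_j W[j][i] * dL/dy_j  (W transposed times dL/dy)
        n_in = len(self.W[0])
        return [sum(self.W[j][i] * dL_dy[j] for j in range(len(self.W)))
                for i in range(n_in)]


# Same mapping as the vector example: y_1 = x_1 + 2*x_2, y_2 = x_2 + 2*x_3
block = LinearBlock([[1, 2, 0],
                     [0, 1, 2]])
y = block.forward([1, 2, 3])
dL_dx = block.backward([1, 1])   # upstream gradient for L = y_1 + y_2

print(y)
print(dL_dx)
```

Because `backward` needs only the upstream gradient and the block's own weights, blocks like this can be chained: each one passes its `dL_dx` to the block before it as that block's `dL_dy`.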

## Common Computation Blocks - Forward and Backward Pass