# Deep Learning with PyTorch I (intro)
- Part II: [Deep Learning w PyTorch II (define a nn)](https://github.com/tm1611/Deep-Learning/blob/master/Deep%20Learning%20w%20PyTorch%20II%20(define%20a%20nn).ipynb)
- Part III: [Deep Learning w PyTorch III (training nn, theory)](https://github.com/tm1611/Deep-Learning/blob/master/Deep%20Learning%20w%20PyTorch%20III%20(training%20nn%2C%20theory).ipynb)
- Part IV: [Deep Learning w PyTorch IV (mnist, mlp)](https://github.com/tm1611/Deep-Learning/blob/master/Deep%20Learning%20w%20PyTorch%20IV%20(mnist%2C%20mlp).ipynb)
- Part V: [Deep Learning w PyTorch V (fmnist, mlp, inference and validation)](https://github.com/tm1611/Deep-Learning/blob/master/Deep%20Learning%20w%20PyTorch%20V%20(fmnist%2C%20mlp%2C%20inference%20and%20validation).ipynb)

## 1. Introduction
This series of notebooks briefly introduces the basics of deep learning (DL) using PyTorch. After introducing the building blocks needed to define and train a network, we will apply them to prediction tasks.

DL is based on neural networks. A neural network consists of individual units, each loosely modeled on a neuron in the human brain. Each unit takes a number of inputs, multiplies each by a weight, and sums the results into a linear combination. This weighted sum is then passed to an activation function to produce the unit's output.

<img src="images/one_neuron.png">

For a single unit with two inputs, and more generally with any number of inputs, this can be written as:

$$
\begin{align}
y &= f(w_1 x_1 + w_2 x_2 + b) \\
y &= f\left(\sum_i w_i x_i +b \right)
\end{align}
$$

or, for the weighted sum $h = \sum_i w_i x_i$ alone (the bias $b$ is added afterwards), in vectorized form ([dot product](https://en.wikipedia.org/wiki/Dot_product), i.e. the inner product of two vectors):

$$
h = \begin{bmatrix}
x_1 \, x_2 \cdots  x_n
\end{bmatrix}
\cdot 
\begin{bmatrix}
           w_1 \\
           w_2 \\
           \vdots \\
           w_n
\end{bmatrix}
$$
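
A minimal sketch of this weighted sum, assuming small hypothetical values for `x` and `w` that are not taken from the notebook. It only illustrates that the explicit sum and the dot product give the same number:

```python
import torch

# Hypothetical inputs and weights, chosen only for illustration
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.5, -1.0, 2.0])

# Explicit weighted sum: sum_i w_i * x_i
h_sum = (w * x).sum()

# Same value via the dot product of the two vectors
h_dot = torch.dot(x, w)

print(h_sum, h_dot)
```

The bias and the activation function would then be applied to this value to obtain the unit's output.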

#### Resources: Mathematics
Neural networks rely heavily on two branches of mathematics:
1. Linear Algebra
2. Multivariable Calculus 

Resources to revisit some concepts:
- [Essence of linear algebra (3B1B)](https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab)
- [Essence of calculus (3B1B)](https://www.youtube.com/playlist?list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr)
- [Neural Networks (3B1B)](https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi)

For a more academic approach towards the mathematical aspects:
- [MIT OCW: Linear Algebra (18.06SC)](https://ocw.mit.edu/courses/mathematics/18-06sc-linear-algebra-fall-2011/)
- [MIT OCW: Single Variable Calculus (18.01SC)](https://ocw.mit.edu/courses/mathematics/18-01sc-single-variable-calculus-fall-2010/)
- [MIT OCW: Multivariable Calculus (18.02SC)](https://ocw.mit.edu/courses/mathematics/18-02sc-multivariable-calculus-fall-2010/index.htm)

#### Introducing tensors
Neural networks are based on linear algebra operations on tensors. Tensors generalize matrices in that they can have more than two dimensions:
- Vector: 1-dimensional tensor
- Matrix: 2-dimensional tensor
- Higher order arrays: n-dimensional tensors. 

<img src="images/Tensor.PNG">

Intuitively, a 3-dimensional tensor is like a cube in which further matrix-like layers of numbers sit behind the front one. A Rubik's cube can be seen as a tensor of dimensions [3,3,3]. Working with tensors in Python requires `torch`, which provides an n-dimensional array type (the tensor).

```python
# Standard imports
import numpy as np

# Torch imports
import torch

# check version
print(torch.__version__)
```

```
1.0.0
```


#### Tensors in Python
Tensors in Python can be thought of as matrices in higher dimensions. A 3x2x2 tensor can be thought of as 3 stacked 2x2 matrices and so forth. 

```python
# 3x4 array.
a = torch.randn(3,4)
print("a (3x4):\n",a)
print("\n",a[2,1])

# 3x3x3 array
b = torch.randn(3,3,3)
print("\nb (3x3x3):\n",b)
print("\n",b[2,0,1])

# 5x3 array
c = torch.rand(5,3)
print("\nc (5x3):\n",c)
```

```
a (3x4):
 tensor([[-0.0065,  2.8250,  1.0579,  0.0723],
        [-1.1421, -0.5649, -0.2128,  1.3161],
        [ 1.3855, -0.2905,  0.0922, -0.5330]])

 tensor(-0.2905)

b (3x3x3):
 tensor([[[ 1.1005,  1.4149,  1.0838],
         [-1.1407,  0.2112,  0.3487],
         [ 0.8094,  0.0327,  0.2239]],

        [[ 0.0356, -1.0853,  0.1023],
         [ 0.5790, -1.6186,  0.1011],
         [ 1.7942, -0.0531, -1.7610]],

        [[ 2.0220, -0.5716, -0.7366],
         [ 1.6217,  0.0141,  1.6517],
         [-0.9199, -0.2983, -0.7970]]])

 tensor(-0.5716)

c (5x3):
 tensor([[0.7249, 0.9768, 0.8921],
        [0.3772, 0.2931, 0.1563],
        [0.0411, 0.2572, 0.8437],
        [0.7373, 0.7870, 0.3610],
        [0.7794, 0.0548, 0.0207]])
```
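
The picture of a 3x2x2 tensor as three stacked 2x2 matrices can be checked directly. The following is a small sketch with hypothetical matrices `m0`, `m1`, `m2` (filled with constants so that each one is easy to recognize), using `torch.stack` to build the tensor:

```python
import torch

# Three hypothetical 2x2 matrices, filled with 0, 1 and 2 respectively
m0 = torch.zeros(2, 2)
m1 = torch.ones(2, 2)
m2 = torch.full((2, 2), 2.0)

# Stack along a new leading axis -> shape (3, 2, 2)
t = torch.stack([m0, m1, m2])

print(t.dim())      # number of axes
print(t.shape)      # size along each axis
print(t[1])         # the second 2x2 matrix
print(t[2, 0, 1])   # single element: matrix 2, row 0, column 1
```

Indexing with one index selects a whole matrix, while three indices select a single number, as in `b[2,0,1]` above.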


#### Simple network with linear algebra
The following cell uses linear algebra to compute the output of a single unit with a sigmoid activation function.

```python
# sigmoid activation
def sigmoid(x):
    """Sigmoid activation function
    Arguments
    ---------
    x: torch.Tensor
    """
    return 1/(1+torch.exp(-x))

# random data x 
torch.manual_seed(42)
x = torch.randn(1,5)

# weights
w = torch.randn_like(x)

# bias (intercept)
bias = torch.randn(1,1)

print("x:\n",x,"\nw:\n",w,"\nbias:\n",bias)

# Calculate the output
h = torch.sum(x*w) + bias
y = sigmoid(h)
print("\ny:\n",y)

# Using matrix multiplication
y_mm = sigmoid(torch.mm(w, x.reshape(5,1)) + bias)
print("\ny_mm:\n", y_mm)
```

```
x:
 tensor([[ 0.3367,  0.1288,  0.2345,  0.2303, -1.1229]]) 
w:
 tensor([[-0.1863,  2.2082, -0.6380,  0.4617,  0.2674]]) 
bias:
 tensor([[0.5349]])

y:
 tensor([[0.6018]])

y_mm:
 tensor([[0.6018]])
```
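
As a cross-check, here is a sketch that repeats the computation with the input on the left, `x` (1x5) times the transpose of `w` (5x1), and compares the hand-written `sigmoid` with PyTorch's built-in `torch.sigmoid`. The variable names `y_sum`, `y_mm` and `y_builtin` are chosen here for illustration:

```python
import torch

def sigmoid(x):
    """Sigmoid activation, as defined in the cell above."""
    return 1 / (1 + torch.exp(-x))

torch.manual_seed(42)
x = torch.randn(1, 5)
w = torch.randn_like(x)
bias = torch.randn(1, 1)

# Element-wise product and sum
y_sum = sigmoid(torch.sum(x * w) + bias)

# Matrix product: (1x5) @ (5x1) -> (1x1)
y_mm = sigmoid(torch.mm(x, w.t()) + bias)

# Same pre-activation, passed through the built-in sigmoid
y_builtin = torch.sigmoid(torch.mm(x, w.t()) + bias)

print(torch.allclose(y_sum, y_mm), torch.allclose(y_mm, y_builtin))
```

All three routes should agree up to floating point rounding, since they compute the same weighted sum.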


#### Change size of matrix
For some operations it becomes important to change the shape of a tensor, e.g. to flatten it or to turn a row vector into a column vector:

* `w.reshape(a, b)` returns a tensor with the same data as `w` and size `(a, b)`. Where possible this is a view that shares memory with `w`; otherwise it is a copy, with the data written to another part of memory.
* `w.resize_(a, b)` returns the same tensor with a different shape. However, if the new shape results in fewer elements than the original tensor, some elements will be removed from the tensor (but not from memory). If the new shape results in more elements than the original tensor, the new elements will be uninitialized in memory. Note that the underscore at the end of the method name denotes that the method is performed [**in-place**](https://discuss.pytorch.org/t/what-is-in-place-operation/16244) in PyTorch.
* `w.view(a, b)` returns a new tensor of size `(a, b)` that shares the same underlying data as `w`, without copying.
* In general, it is good advice to use these methods for flattening (i.e. reducing a 2D array to a 1D array) or for turning a row vector of size `(1, n)` into a column vector of size `(n, 1)`, which coincides with its [transpose](https://en.wikipedia.org/wiki/Transpose). For a general matrix, reshaping does not flip the matrix over its diagonal; use `w.t()` to obtain the transpose.
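
The differences between these methods can be made concrete with a small sketch. The tensor `w` below is a hypothetical 2x3 example, and `data_ptr()` is used to check whether two tensors start at the same memory address:

```python
import torch

w = torch.arange(6.0).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# view: new shape, same underlying memory
v = w.view(3, 2)
print(v.data_ptr() == w.data_ptr())

# Reshaping a (2, 3) matrix to (3, 2) is not the same as transposing it
print(torch.equal(w.reshape(3, 2), w.t()))

# The transpose is non-contiguous, so reshape has to copy the data here
wt = w.t()
flat = wt.reshape(6)
print(flat.data_ptr() == w.data_ptr())

# Flattening a contiguous tensor to 1D; -1 lets PyTorch infer the size
print(w.view(-1))
```

The reshaped matrix keeps the elements in their original row-by-row order, whereas the transpose swaps rows and columns.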

#### Stacking matrices
The real fun starts when individual units are stacked into layers, and layers into a network of neurons. Matrix algebra is needed here because the weights of a layer now form a matrix instead of a vector.

<img src="images/stacking_weights.png">

The hidden layer ($h_1$ and $h_2$ here) can be calculated as

$$
\vec{h} = [h_1 \, h_2] = 
\begin{bmatrix}
x_1 \, x_2 \cdots \, x_n
\end{bmatrix}
\cdot 
\begin{bmatrix}
           w_{11} & w_{12} \\
           w_{21} &w_{22} \\
           \vdots &\vdots \\
           w_{n1} &w_{n2}
\end{bmatrix}
$$

The output for this small network can be found by treating the hidden layer as inputs for the output unit. The network output is expressed as

$$
y =  f_2 \! \left(\, f_1 \! \left(\vec{x} \, \mathbf{W_1}\right) \mathbf{W_2} \right)
$$
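
As a hedged aside, the same expression can be annotated with the matrix sizes used later in this notebook ($\mathbf{W_1}$ of size $n \times 2$, $\mathbf{W_2}$ of size $2 \times 1$). The formula above omits the bias terms, while the code further down adds them as `B1` and `B2`. Writing them in as $\vec{b}_1$ and $b_2$ is an assumption about the intended full form:

$$
\underbrace{y}_{1 \times 1} = f_2 \! \left( f_1 \! \left( \underbrace{\vec{x}}_{1 \times n} \, \underbrace{\mathbf{W_1}}_{n \times 2} + \underbrace{\vec{b}_1}_{1 \times 2} \right) \underbrace{\mathbf{W_2}}_{2 \times 1} + \underbrace{b_2}_{1 \times 1} \right)
$$

The inner dimensions of each product have to match, which is why $\mathbf{W_1}$ has as many rows as there are inputs and $\mathbf{W_2}$ has as many rows as there are hidden units.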

#### Network with hidden layer in matrix notation
- 3 variables $x_1, x_2, x_3$ per observation: each observation is an input vector of size (1x3), so k observations form a matrix of size (kx3).
- 1 output: Hence, 1 output node. The output value is then passed to an activation function. 
- The output is calculated from the input using matrix multiplications (feedforward). We start with random weights. 
- Every layer has a bias, which can be thought of as the intercept in a regression model.

Steps: 
- Initialize weight matrices of the right dimensions (with random numbers for now).
- Calculate the output.

Matrix dimensions:
- W1 (3,2) -> Input variables to hidden layer
- W2 (2,1) -> hidden layer to output node

```python
# seed
torch.manual_seed(42)

# one observation for x
x = torch.randn(1,3)

# Size of each layer
n_input = x.shape[1]
n_hidden = 2 
n_output = 1

# Weight matrices 
W1 = torch.randn(n_input, n_hidden)
W2 = torch.randn(n_hidden, n_output)

# Biases for both layers
B1 = torch.randn(1, n_hidden)
B2 = torch.randn(1, n_output)

print("W1:\n", W1)
print("W2:\n", W2)
print("B1:\n", B1)
print("B2:\n", B2)

# Calculate output
h = sigmoid(torch.mm(x, W1) + B1)
print("\nhidden layer h:\n", h)
output = sigmoid(torch.mm(h,W2) + B2)
print("\noutput:\n", output)
```

```
W1:
 tensor([[ 0.2303, -1.1229],
        [-0.1863,  2.2082],
        [-0.6380,  0.4617]])
W2:
 tensor([[0.2674],
        [0.5349]])
B1:
 tensor([[0.8094, 1.1103]])
B2:
 tensor([[-1.6898]])

hidden layer h:
 tensor([[0.6711, 0.7549]])

output:
 tensor([[0.2485]])
```
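
The same feedforward computation carries over from one observation to a (kx3) matrix of observations. The sketch below assumes a hypothetical `k = 4` and uses the built-in `torch.sigmoid` so that it is self-contained. The biases of shape (1x2) and (1x1) are broadcast across the k rows:

```python
import torch

torch.manual_seed(42)

k = 4                          # hypothetical number of observations
X = torch.randn(k, 3)          # k observations with 3 variables each

W1 = torch.randn(3, 2)         # input variables -> hidden layer
W2 = torch.randn(2, 1)         # hidden layer -> output node
B1 = torch.randn(1, 2)
B2 = torch.randn(1, 1)

# (k x 3) @ (3 x 2) -> (k x 2); B1 is broadcast over the rows
H = torch.sigmoid(torch.mm(X, W1) + B1)

# (k x 2) @ (2 x 1) -> (k x 1); one output per observation
out = torch.sigmoid(torch.mm(H, W2) + B2)

print(H.shape, out.shape)
```

Each row of `out` is the network output for the corresponding row of `X`, and the weight matrices keep the same dimensions regardless of k.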


#### Conversion NumPy to Torch and back
To convert a NumPy array to a Torch tensor, use `torch.from_numpy()`. To convert a tensor to a NumPy array, use the `.numpy()` method.

```python
# random numbers
d = np.random.rand(2,3)
e = torch.from_numpy(d)

print("d:\n", d)
print("e:\n", e)
print("\ntorch to numpy of e:\n", e.numpy())
print("\ne*2:\n",e.mul_(2))
```

```
d:
 [[0.58265814 0.13556693 0.9434121 ]
 [0.95882513 0.11286997 0.89693295]]
e:
 tensor([[0.5827, 0.1356, 0.9434],
        [0.9588, 0.1129, 0.8969]], dtype=torch.float64)

torch to numpy of e:
 [[0.58265814 0.13556693 0.9434121 ]
 [0.95882513 0.11286997 0.89693295]]

e*2:
 tensor([[1.1653, 0.2711, 1.8868],
        [1.9177, 0.2257, 1.7939]], dtype=torch.float64)
```


Memory is shared between the NumPy array and the Torch tensor (aliasing), so if you change the values of one object in-place, the other changes as well.
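
A short sketch of this aliasing, using a hypothetical array of ones so that the effect is easy to see. It contrasts the in-place `mul_` with the out-of-place `mul`, and shows that copying the array first (here with NumPy's `copy()`) breaks the link:

```python
import numpy as np
import torch

d = np.ones((2, 3))
e = torch.from_numpy(d)            # e shares memory with d

e.mul_(2)                          # in-place: d is modified as well
print(d)

f = e.mul(2)                       # out-of-place: new tensor, e and d unchanged
print(d)

g = torch.from_numpy(d.copy())     # copy first, so g does not alias d
g.add_(1)                          # in-place on g only
print(d)
```

If independent data is needed, copying before converting is one way to avoid surprising side effects.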

Regarding underscores: a trailing `_` in a method name denotes that the method is performed in-place (see [in-place operations](https://discuss.pytorch.org/t/what-is-in-place-operation/16244)).

### Next
In the next notebook we will see how a neural network can be defined with PyTorch, in particular how to formulate a multilayer perceptron that can identify simple images such as those in the MNIST or Fashion-MNIST data.