# 02 Deep Learning Theory
## Dr. Tristan Behrens

In the following we will learn about the essential Deep Learning building blocks. We will learn

- when ANNs are really deep, and
- about the purpose of activation functions.

## Supervised vs. Unsupervised Learning vs. Reinforcement Learning.

| Supervised Learning        | Unsupervised Learning            | Reinforcement Learning                 |
| -------------------------- | -------------------------------- | -------------------------------------- |
| mapping inputs to outputs  | looks for undiscovered patterns  | agents act in environments             |
| classification with labels | no labels/targets                | maximizing a cumulative reward         |
| regression with targets    | principal component analysis     | balances exploration and exploitation  |
| optimize and generalize    | clustering                       | Markov Decision Processes              |

## Data Preprocessing.

Relevant steps:

- Data acquisition.
- Analysis/visualization.
- Cleaning.
- Balancing.
- Normalization/standardization.
- Encoding.
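
As a rough illustration of the normalization/standardization and encoding steps, here is a minimal NumPy sketch. The toy arrays `features` and `labels` are hypothetical and only serve to show the arithmetic; real pipelines often use library helpers instead.

```python
import numpy as np

# Hypothetical toy data: 4 samples, 2 features on very different scales.
features = np.array([[1.0, 200.0],
                     [2.0, 400.0],
                     [3.0, 600.0],
                     [4.0, 800.0]])
labels = np.array([0, 2, 1, 2])  # hypothetical integer class labels

# Min-max normalization: rescale each feature column to [0, 1].
f_min = features.min(axis=0)
f_max = features.max(axis=0)
normalized = (features - f_min) / (f_max - f_min)

# Standardization: zero mean and unit variance per feature column.
standardized = (features - features.mean(axis=0)) / features.std(axis=0)

# One-hot encoding: one column per class, a single 1 per row.
num_classes = labels.max() + 1
one_hot = np.eye(num_classes)[labels]

print(normalized)
print(standardized.mean(axis=0), standardized.std(axis=0))
print(one_hot)
```

Both rescaling variants put the features on comparable scales, which usually makes training a neural network easier.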

Important concept: Mechanical Turk [sic!].

**Question:** How much time is usually spent on data preprocessing?

## The Math behind Fully Connected Neural Networks.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

![](https://www.researchgate.net/profile/Mohamed_Zahran6/publication/303875065/figure/fig4/AS:371118507610123@1465492955561/A-hypothetical-example-of-Multilayer-Perceptron-Network.png)
(Image copyright Mohamed Zahran)

---

**Question:** Why do we want to have multiple layers?

---
First, we define a Neural Network using TensorFlow Keras and keep it in mind throughout this lesson. We used a similar model for solving the MNIST problem; this one, however, is slightly simplified.

In [None]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(512, input_shape=(784,)))
model.add(tf.keras.layers.Dense(10))
model.summary()

---
Let us now move away from TensorFlow and try to emulate what the model does using pure NumPy.

A fully connected or dense layer can be written like this:

```
y = dot(W, x) + b
```

What are the terms in the formula? 

- `x` is the input vector (tensor),
- `W` is a weights matrix (tensor) and belongs to the trainable parameters,
- `b` is a bias vector (tensor) and also belongs to the trainable parameters, and
- `dot` is the dot product (matrix multiplication).

Note: Of course such layers can easily be stacked.
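
To make the formula concrete, here is a small worked example with hand-picked numbers. The names `x_small`, `W_small` and `b_small` and their values are made up for illustration.

```python
import numpy as np

# Hypothetical tiny layer: 3 inputs, 2 outputs.
x_small = np.array([1.0, 2.0, 3.0])        # input vector, shape (3,)
W_small = np.array([[1.0, 0.0, -1.0],
                    [0.5, 0.5, 0.5]])      # weights, shape (2, 3)
b_small = np.array([0.5, -1.0])            # bias, shape (2,)

# Each output is a weighted sum of all inputs plus one bias value.
y_small = np.dot(W_small, x_small) + b_small
print(y_small)  # values: -1.5 and 2.0
```

The first output is `1*1 + 0*2 - 1*3 + 0.5 = -1.5`, the second is `0.5*1 + 0.5*2 + 0.5*3 - 1 = 2.0`.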

---
Next, we will build a multi-layer perceptron (MLP) using two fully connected layers by hand. 

Idea:

```
y = mlp(x)
```
or

```
h = layer_1(x)
y = layer_2(h)
```
or
```
h = dot(w1, x) + b1
y = dot(w2, h) + b2
```

In [None]:
# Input.
x = np.random.uniform(size=(784,))
print("x: ", x.shape)

# First Layer.
w1 = np.random.randn(512, 784)
b1 = np.random.randn(512)
print("w1:", w1.shape)
print("b1:", b1.shape)

# Latent result.
h = np.dot(w1, x) + b1
print("h: ", h.shape)

# Second layer.
w2 = np.random.randn(10, 512)
b2 = np.random.randn(10)
print("w2:", w2.shape)
print("b2:", b2.shape)

# Output.
y = np.dot(w2, h) + b2
print("y: ", y.shape)
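
The cell above processes a single input vector. In practice, inputs usually arrive in batches with one sample per row. The following sketch shows the same two layers applied to a whole batch. The `*_demo` names and the batch size of 32 are assumptions chosen for illustration, and the random values are freshly generated rather than taken from the cell above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch of 32 inputs, one per row.
X_batch = rng.uniform(size=(32, 784))

# Same layer shapes as in the single-vector example, with fresh random values.
w1_demo = rng.standard_normal((512, 784))
b1_demo = rng.standard_normal(512)
w2_demo = rng.standard_normal((10, 512))
b2_demo = rng.standard_normal(10)

# Row-wise form: multiply by the transposed weights so each row is one sample.
H_batch = X_batch @ w1_demo.T + b1_demo   # shape (32, 512)
Y_batch = H_batch @ w2_demo.T + b2_demo   # shape (32, 10)

# The first row matches the single-vector computation.
y_single = np.dot(w2_demo, np.dot(w1_demo, X_batch[0]) + b1_demo) + b2_demo
print(Y_batch.shape, np.allclose(Y_batch[0], y_single))
```

The bias vectors are broadcast across all rows of the batch. Keras's `Dense` layer stores its kernel with shape `(inputs, units)`, which corresponds to the transposed weight matrices used here.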

---
Visualization always helps. How does the MLP work visually?

In [None]:
figsize = (20, 10)

plt.figure(figsize=figsize)
plt.title("Input x")
plt.imshow(x.reshape(1, -1), cmap="inferno")
plt.axis("off")
plt.show()
plt.close()

plt.figure(figsize=figsize)
plt.subplot(1, 2, 1)
plt.title("Weights w1")
plt.imshow(w1, cmap="inferno")
plt.axis("off")
plt.subplot(1, 2, 2)
plt.title("Biases b1")
plt.imshow(b1.reshape(1, -1), cmap="inferno")
plt.axis("off")
plt.show()
plt.close()

plt.figure(figsize=figsize)

plt.title("Hidden h = dot(w1, x) + b1")
plt.imshow(h.reshape(1, -1), cmap="inferno")
plt.axis("off")
plt.show()
plt.close()

plt.figure(figsize=figsize)
plt.subplot(1, 2, 1)
plt.title("Weights w2")
plt.imshow(w2, cmap="inferno")
plt.axis("off")
plt.subplot(1, 2, 2)
plt.title("Biases b2")
plt.imshow(b2.reshape(1, -1), cmap="inferno")
plt.axis("off")
plt.show()
plt.close()

plt.figure(figsize=figsize)
plt.title("Output y = dot(w2, h) + b2")
plt.imshow(y.reshape(1, -1), cmap="inferno")
plt.axis("off")
plt.show()
plt.close()

### Question: Is this Neural Network really deep?

Start with:
```
h = dot(w1, x) + b1
y = dot(w2, h) + b2
```

### What is the solution?

...

## Activation Functions.

TensorFlow/Keras provides many activation functions. It is also possible to implement new ones.

https://keras.io/api/layers/activations/

### Linear.

In [None]:
def linear(x):
    y = []
    for number in x:
        y.append(number)
    return np.array(y)
    
x = np.linspace(-2.0, 2.0, 100).astype("float32")
y = linear(x)

plt.scatter(x, y)
plt.title("Linear")
plt.show()
plt.close()

### Rectified Linear Unit (ReLU).

Easy and efficient to compute. A common first choice for hidden layers, that is, for any layer that is not an output layer.

In [None]:
def relu(x):
    y = []
    for number in x:
        if number < 0.0:
            y.append(0.0)
        else:
            y.append(number)
    return np.array(y)
    
x = np.linspace(-4.0, 4.0, 100).astype("float32")
y = relu(x)

plt.scatter(x, y)
plt.title("ReLU")
plt.show()
plt.close()
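
The loop-based version makes the definition explicit. As a sketch of why ReLU is cheap to compute, the same function can be written as a single vectorized NumPy call. The name `relu_vectorized` is hypothetical.

```python
import numpy as np

def relu_vectorized(x):
    # Element-wise maximum of x and 0, without a Python loop.
    return np.maximum(x, 0.0)

x_demo = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu_vectorized(x_demo))  # negatives become 0, the rest is unchanged
```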

### Sigmoid.

Usually seen as output activation for binary classifiers or multi-label classifiers, where each output is an independent binary decision.

In [None]:
def sigmoid(x):
    y = 1.0 / (1.0 + np.exp(-x))
    return y
    
x = np.linspace(-4.0, 4.0, 100).astype("float32")
y = sigmoid(x)

plt.scatter(x, y)
plt.title("Sigmoid")
plt.show()
plt.close()

### Tanh.

Another choice for hidden layers, or for outputs between -1 and 1. Computationally more expensive than ReLU.

In [None]:
def tanh(x):
    y = np.tanh(x)
    return y
    
x = np.linspace(-4.0, 4.0, 100).astype("float32")
y = tanh(x)

plt.scatter(x, y)
plt.title("tanh")
plt.show()
plt.close()

### Softmax.

Usually seen as output activation in categorical classifiers. Sometimes used as a gating function in advanced ANN architectures.

In [None]:
def softmax(x):
    y = np.exp(x) / np.sum(np.exp(x), axis=0)
    return y
    
x = np.random.randn(10).astype("float32")
y = softmax(x)

plt.plot(x, label="x")
plt.plot(y, label="y = softmax(x)")
plt.title("softmax")
plt.legend()
plt.show()
plt.close()
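
Two properties make softmax a natural fit for categorical outputs: all results are positive and sum to 1, and adding the same constant to every input does not change the result. The second property permits subtracting the maximum before exponentiating, which avoids overflow for large inputs. The following is a minimal sketch. The name `softmax_stable` and the example values are assumptions for illustration.

```python
import numpy as np

def softmax_stable(x):
    # Subtracting the maximum leaves the result unchanged
    # but keeps np.exp from overflowing for large inputs.
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

logits = np.array([2.0, 1.0, 0.1])
probs = softmax_stable(logits)

print(probs.sum())                                         # 1 up to rounding
print(np.argmax(probs) == np.argmax(logits))               # largest input wins
print(np.allclose(probs, softmax_stable(logits + 100.0)))  # shift invariance
print(softmax_stable(np.array([1000.0, 1001.0])))          # stays finite
```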

### Other Activation functions.

- LeakyReLU.
- PReLU.
- Softplus.
- Softsign.
- SELU.
- ELU.
- Exponential.
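
A few of these can be sketched in NumPy from their standard definitions. The function names, parameter names and default values below are chosen for illustration and may differ from the Keras defaults.

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    # Like ReLU, but negative inputs keep a small slope alpha (illustrative value).
    return np.where(x > 0.0, x, alpha * x)

def elu(x, alpha=1.0):
    # Exponential branch for negative inputs, saturating at -alpha.
    return np.where(x > 0.0, x, alpha * (np.exp(x) - 1.0))

def softplus(x):
    # Smooth approximation of ReLU: log(1 + exp(x)).
    return np.log1p(np.exp(x))

def softsign(x):
    # Similar shape to tanh, with outputs in (-1, 1).
    return x / (1.0 + np.abs(x))

x_demo = np.linspace(-4.0, 4.0, 9)
for fn in (leaky_relu, elu, softplus, softsign):
    print(fn.__name__, np.round(fn(x_demo), 3))
```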

### Fixing our example.

Let us make sure that our ANN is really deep.

In [None]:
# Input.
x = np.random.uniform(size=(784,))
print("x: ", x.shape)

# Latent result.
h = relu(np.dot(w1, x) + b1)
print("h: ", h.shape)

# Output.
y = softmax(np.dot(w2, h) + b2)
print("y: ", y.shape)
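
Why do the activations matter? Without them, two stacked layers satisfy `dot(w2, dot(w1, x) + b1) + b2 = dot(dot(w2, w1), x) + (dot(w2, b1) + b2)`, which is again a single linear layer. The following sketch checks this numerically. The small layer sizes and the `*_demo` names are assumptions chosen only to keep the check quick.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small hypothetical layer sizes, chosen only for this check.
x_demo = rng.uniform(size=(8,))
w1_demo, b1_demo = rng.standard_normal((5, 8)), rng.standard_normal(5)
w2_demo, b2_demo = rng.standard_normal((3, 5)), rng.standard_normal(3)

# Two linear layers without activation ...
y_two_layers = np.dot(w2_demo, np.dot(w1_demo, x_demo) + b1_demo) + b2_demo

# ... equal one linear layer with merged weights and bias.
w_merged = np.dot(w2_demo, w1_demo)
b_merged = np.dot(w2_demo, b1_demo) + b2_demo
y_one_layer = np.dot(w_merged, x_demo) + b_merged
print(np.allclose(y_two_layers, y_one_layer))  # True

# With a ReLU in between, the merged layer generally no longer matches.
h_relu = np.maximum(np.dot(w1_demo, x_demo) + b1_demo, 0.0)
y_with_relu = np.dot(w2_demo, h_relu) + b2_demo
print(np.allclose(y_with_relu, y_one_layer))
```

The second comparison fails whenever at least one hidden value is negative before the ReLU, which is the typical case for random weights.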

---

And in TensorFlow.

In [None]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(512, input_shape=(784,), activation="relu"))
model.add(tf.keras.layers.Dense(10, activation="softmax"))
model.summary()

## Summary.

In this notebook we have seen the essential mathematical building blocks for ANNs. We have learned how to stack Neural Network layers and about different activation functions. And we have seen how we can ensure that a Deep Neural Network is really deep.