# 03 Losses, Stochastic Gradient Descent
## Dr. Tristan Behrens

In the following, we will learn about some essential Deep Learning building blocks. We will look at

- the most common loss functions,
- the intuition behind Stochastic Gradient Descent, and
- strategies to overcome underfitting and overfitting.

## Make sure that we have TensorFlow 2 enabled.

In [None]:
# Colab-only magic command; outside Google Colab, install TensorFlow 2 and drop this line.
%tensorflow_version 2.x

## Imports.

In [None]:
import tensorflow as tf
import numpy as np
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt

## Loss Functions and Their Use.

These four loss functions are the most common:

- Binary Crossentropy (BCE), mainly used for binary classification,
- Categorical Crossentropy (CCE), mainly used for multi-class classification,
- Mean Squared Error (MSE), mainly used for regression, and
- Mean Absolute Error (MAE), also mainly used for regression.

### Binary Crossentropy.

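Crossentropy, in layman's terms, measures how far a predicted probability distribution is from the true distribution. Binary Crossentropy is used when we have two classes. Keras treats every entry of `y_pred` as the probability of an independent yes/no label, which is why the example below has two values per sample.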

In [None]:
y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
bce = tf.keras.losses.BinaryCrossentropy()
bce(y_true, y_pred).numpy()
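To see what the Keras loss computes, here is a minimal NumPy sketch of the binary crossentropy formula $-\frac{1}{N}\sum_i \left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$ applied to the same values. The helper `binary_crossentropy` and its `eps` clipping value are our own assumptions; the clipping only keeps `log()` finite, similar in spirit to Keras's internal safeguard.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary crossentropy over all entries (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 so that log() stays finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    per_entry = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    # Averaging over all entries equals "mean per row, then mean over the batch"
    # when every row has the same length.
    return per_entry.mean()

y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
print(binary_crossentropy(y_true, y_pred))  # ≈ 0.815
```

Three of the four entries contribute $-\log 0.4 \approx 0.916$ and one contributes $-\log 0.6 \approx 0.511$, which averages to about 0.815.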

### Categorical Crossentropy.

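We use Categorical Crossentropy when we have more than two classes and the labels are one-hot encoded. For integer class labels, Keras provides `tf.keras.losses.SparseCategoricalCrossentropy` instead.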

In [None]:
y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
cce = tf.keras.losses.CategoricalCrossentropy()
cce(y_true, y_pred).numpy()
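A minimal NumPy sketch of the same computation, $-\frac{1}{N}\sum_i \sum_c y_{i,c}\log p_{i,c}$, assuming one-hot labels. The helper name and `eps` are our own; like Keras, the sketch first rescales each row of predictions to sum to 1 and clips away exact zeros (note the `0` in the first prediction). The last lines show that, with integer labels, only the predicted probability of the true class matters.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical crossentropy over a batch of one-hot labels (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Normalise each row to a probability distribution, then clip to avoid log(0).
    y_pred = y_pred / y_pred.sum(axis=-1, keepdims=True)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # With one-hot labels only the true class's log-probability survives the sum.
    per_sample = -np.sum(y_true * np.log(y_pred), axis=-1)
    return per_sample.mean()

y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
print(categorical_crossentropy(y_true, y_pred))  # ≈ 1.177

# Same loss from integer labels: pick the probability of the true class per sample.
labels = np.array([1, 2])
probs = np.asarray(y_pred, dtype=float)
sparse_loss = -np.mean(np.log(probs[np.arange(len(labels)), labels]))
print(sparse_loss)  # ≈ 1.177
```

The second sample dominates the loss: the model gives its true class only 0.1, so it contributes $-\log 0.1 \approx 2.303$.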

### Mean Squared Error.

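Measures the average of the squares of the errors, i.e. of the differences between predictions and targets. Because the errors are squared, large errors are penalized much more heavily than small ones. Mostly used in regression problems.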

In [None]:
y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]
mse = tf.keras.losses.MeanSquaredError()
mse(y_true, y_pred).numpy()
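Written out by hand, MSE is $\frac{1}{N}\sum_i (\hat{y}_i - y_i)^2$. A minimal NumPy sketch on the same values (the variable names are our own):

```python
import numpy as np

y_true = np.array([[0., 1.], [0., 0.]])
y_pred = np.array([[1., 1.], [1., 0.]])

errors = y_pred - y_true          # prediction minus target, element-wise
mse_value = np.mean(errors ** 2)  # average of the squared errors
print(mse_value)  # 0.5
```

Two of the four entries are off by 1 and two are exact, so the mean squared error is $2/4 = 0.5$.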

### Mean Absolute Error.

Measures the average of the absolute values of the errors. Mostly used in regression problems, too; it is less sensitive to outliers than MSE.

In [None]:
y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]
mae = tf.keras.losses.MeanAbsoluteError()
mae(y_true, y_pred).numpy()
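Since both examples above happen to give 0.5, here is a small NumPy sketch with made-up numbers that shows where MSE and MAE differ. Both prediction vectors have the same total absolute error, but in the second one it is concentrated in a single outlier. The helper names are our own.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)

def mean_absolute_error(y_true, y_pred):
    return np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true)))

y_true = np.zeros(4)
spread_errors = np.array([1., 1., 1., 1.])  # four moderate errors
one_outlier = np.array([0., 0., 0., 4.])    # same total error, in one point

print(mean_absolute_error(y_true, spread_errors), mean_absolute_error(y_true, one_outlier))  # 1.0 1.0
print(mean_squared_error(y_true, spread_errors), mean_squared_error(y_true, one_outlier))    # 1.0 4.0
```

MAE rates both predictions equally, while MSE rates the single large error four times worse. This is why MAE is often preferred when the data contains outliers.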

## Intuition behind Stochastic Gradient Descent.

![](http://www.its.caltech.edu/~nazizanr/imgs/nonconvex3.jpg)

(Image copyright Navid Azizan, Caltech)
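The picture shows a loss surface with many hills and valleys. Stochastic Gradient Descent walks downhill on it. At every step it estimates the gradient of the loss on a small random mini-batch and moves the parameters a little against that gradient: $\theta \leftarrow \theta - \eta \nabla_\theta L_{\text{batch}}(\theta)$. Below is a minimal NumPy sketch. It assumes a synthetic linear-regression problem ($y = 3x + 2$ plus noise), the MSE loss from above, and a hand-picked learning rate and batch size. In Keras, this role is played by `tf.keras.optimizers.SGD`.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic regression data (an assumption for illustration): y = 3x + 2 plus noise.
x = rng.uniform(-1.0, 1.0, size=1000)
y = 3.0 * x + 2.0 + rng.normal(0.0, 0.1, size=1000)

w, b = 0.0, 0.0        # model parameters, starting at zero
learning_rate = 0.1    # step size eta (hand-picked)
batch_size = 32

for epoch in range(20):
    order = rng.permutation(len(x))  # shuffle the data once per epoch
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = x[idx], y[idx]
        error = (w * xb + b) - yb    # prediction minus target on the mini-batch
        # Gradients of the mini-batch MSE, mean(error**2), w.r.t. w and b.
        grad_w = 2.0 * np.mean(error * xb)
        grad_b = 2.0 * np.mean(error)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(f"w = {w:.2f}, b = {b:.2f}")  # close to the true values 3 and 2
```

Each mini-batch gives only a noisy estimate of the full gradient. That noise is the "stochastic" part: it makes every step cheap, and on non-convex surfaces like the one pictured it can help the optimizer escape shallow local minima.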

## Overfitting, Underfitting, Best Practices.

Ways to overcome underfitting:

- longer training,
- a bigger Neural Network architecture.

Ways to overcome overfitting:

- more data,
- better data,
- data augmentation,
- early stopping,
- a smaller Neural Network architecture,
- dropout and other regularization techniques.
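Two of these strategies are easy to write down directly. The sketch below is a NumPy illustration under our own assumptions, not the Keras implementation. Early stopping watches the validation loss and stops once it has not improved for `patience` epochs. Inverted dropout zeroes a random fraction `rate` of activations during training and scales the survivors by $1/(1 - \text{rate})$. In Keras, the corresponding building blocks are `tf.keras.callbacks.EarlyStopping` (e.g. `monitor="val_loss"`, `patience=3`) and `tf.keras.layers.Dropout(rate)`. The helper names and the validation losses are made up for illustration.

```python
import numpy as np

def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Hypothetical helper: training stops once the validation loss has not
    improved for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


def inverted_dropout(activations, rate, rng):
    """Hypothetical helper: zero a fraction `rate` of activations and rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True = keep this unit
    return activations * mask / keep_prob


# Made-up validation losses: improving until epoch 3, then getting worse (overfitting).
val_losses = [1.0, 0.8, 0.6, 0.55, 0.56, 0.58, 0.60, 0.62]
print(early_stopping_epoch(val_losses, patience=3))  # (6, 3)

rng = np.random.default_rng(0)
activations = np.ones(10_000)
dropped = inverted_dropout(activations, rate=0.5, rng=rng)
print(dropped.mean())  # close to 1.0: rescaling keeps the expected activation unchanged
```

Training stops at epoch 6, but the best model was the one from epoch 3. The Keras callback can roll back to those weights with `restore_best_weights=True`. Dropout is applied only during training; at inference time all units are kept, and thanks to the rescaling no further correction is needed.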

## Summary.

In this notebook, we have learned how loss functions are used to assess the quality of a Neural Network's predictions. On top of that, we looked at the intuition behind our learning algorithm, Stochastic Gradient Descent. Finally, we discussed underfitting and overfitting, along with ways to counter both.