<a href="https://colab.research.google.com/github/tae898/DeepLearning/blob/master/Chapter_04_Numerical_Computation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 4.1 Overflow and Underflow

As the book says, most deep learning developers and engineers don't have to think much about overflow and underflow. Most of us, including me, use high-level libraries. If you are curious, you can always dig into the low-level libraries, where other people have already done a lot of this work. You may be surprised to find some "hacks" there too, which might not make 100% sense to you if you are a math nerd.

As of writing this text (30 July 2020), most deep learning developers use either TensorFlow or PyTorch as their main deep learning framework. The low-level implementations are written by the contributors to those libraries.

Nonetheless, it's always fun to try out some examples!

In [None]:
import numpy as np

Softmax is a simple and popular function used throughout deep learning. You can think of it as a generalized logistic function, which we learned about in [the previous chapter](https://github.com/tae898/DeepLearning/blob/master/Chapter03_Probability_and_Information_Theory.ipynb).

In [None]:
def logistic(scalar):
    """The logistic function.
    
    Parameters
    ----------
    scalar: a float-like

    Returns
    -------
    logistic: a float-like      
    
    """
    return 1 / (1 + np.exp(-scalar))

def softmax(vec):
    """Softmax function.
    
    Parameters
    ----------
    vec: a numpy-array like
        a vector-like
    
    Returns
    -------
    softmax: a numpy-array like
        a vector-like

    """
    return np.exp(vec) / np.exp(vec).sum()

The input to the softmax function should be a vector of length greater than one.

Let's say $x=[-2, 1.5, 0.5]$. When we plug this vector into the softmax function, it returns a probability distribution. 

In [None]:
x = [-2, 1.5, 0.5]
softmax(x)

array([0.02159923, 0.71526828, 0.26313249])

Each value in the returned vector is a probability and the sum of them should be 1, since it's a probability distribution.
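If you want to double-check this, here is a small sketch. It repeats the `softmax` definition from above so that it runs on its own; the variable name `probs` is just my choice for this example.

```python
import numpy as np

def softmax(vec):
    # Same definition as in the cell above.
    return np.exp(vec) / np.exp(vec).sum()

probs = softmax([-2, 1.5, 0.5])
print(probs.sum())         # total probability mass
print(np.all(probs > 0))   # every entry is strictly positive here
```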

Let's recall the logistic function that we have learned in the previous chapter. It expects a scalar real number as input and outputs a probability, whose value is between 0 and 1. This can be thought of as the softmax function when the input vector has length 2. I will show you below.

Let's say 

$$x = [x_{1}, x_{2}] \tag{1}$$

When we plug this vector into the softmax function, then the output is 

$$\left[\frac{e^{x_1}}{e^{x_1} + e^{x_2}}, \frac{e^{x_2}}{e^{x_1} + e^{x_2}}\right] \tag{2}$$

This can be rewritten as

$$\left[\frac{1}{1 + e^{-(x_1 - x_2)}}, \frac{1}{1 + e^{x_1 - x_2}}\right] \tag{3}$$

From the softmax function's point of view, feeding in $x=[x_1 - x_2, 0]$ gives exactly the same output as equations (2) and (3). Let's do the math.

$$\left[\frac{e^{x_1-x_2}}{e^{x_1-x_2} + e^{0}}, \frac{e^{0}}{e^{x_1-x_2} + e^{0}}\right] = \left[\frac{1}{1 + e^{-(x_1 - x_2)}}, \frac{1}{1 + e^{x_1 - x_2}}\right] \tag{4}$$

This means that when the input to the softmax is a vector of length 2, we can always make it look like $[t, 0]$ by subtracting the second element from both entries.

Recall that $\mathrm{logistic}(t) = \frac{1}{1+e^{-t}}$, which is exactly the first element of equations (3) and (4) when $t=x_1 - x_2$.

What this is telling us is that, when the softmax input is a vector of length 2, we can just simplify it to the logistic (sigmoid) function. Having two elements as input to the softmax is redundant. We don't need to calculate the probabilities twice, since once we have worked out the probability of the first element, $p$, the second must be $1-p$ anyway.
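We can also check this numerically. The sketch below redefines `logistic` and `softmax` so that it is self-contained; the values of `x1` and `x2` are arbitrary numbers I picked for illustration.

```python
import numpy as np

def logistic(t):
    return 1 / (1 + np.exp(-t))

def softmax(vec):
    return np.exp(vec) / np.exp(vec).sum()

x1, x2 = 0.7, -1.3                 # arbitrary example values
p = softmax(np.array([x1, x2]))    # two-element softmax
q = logistic(x1 - x2)              # logistic of the difference

print(np.isclose(p[0], q))         # first probability matches logistic(x1 - x2)
print(np.isclose(p[1], 1 - q))     # second probability is its complement
```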

Enough with maths. Let's go back to overflow and underflow.

The cell below will give you an overflow warning.

In [None]:
x = np.array([1e10, 0.1, -123])
softmax(x)

array([nan,  0.,  0.])

Obviously, calculating $e^{10^{10}}$ results in a number far too big for a float. It overflows to `inf`, and then `inf / inf` gives `nan` for the first element.
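To get a feeling for where the overflow starts, here is a small sketch, assuming the default 64-bit floats that numpy uses. The variable names are mine.

```python
import numpy as np

info = np.finfo(np.float64)
largest = info.max              # largest finite float64, about 1.8e308
threshold = np.log(largest)     # about 709.78; exp() overflows above this
print(largest, threshold)

# Silence the overflow warning just for this demonstration.
with np.errstate(over="ignore"):
    finite = np.exp(709.0)      # still representable
    overflowed = np.exp(710.0)  # too large, becomes inf
print(finite, overflowed)
```

So with float64, any softmax input larger than roughly 709 is already enough to trigger the problem; we don't need anything as extreme as $10^{10}$.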

As the book says, we can subtract the maximum value from every element of the input vector. This doesn't change the output, because the common factor $e^{-\max_i x_i}$ cancels between the numerator and the denominator.

In [None]:
x = np.array([1e10, 0.1, -123])
x = x - max(x)
softmax(x)

array([1., 0., 0.])
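We can fold the subtraction into the function itself. The following is only a minimal sketch of that idea; `stable_softmax` is a name I made up here, not something from the book or from a library.

```python
import numpy as np

def stable_softmax(vec):
    """Softmax with the maximum subtracted first (illustrative helper)."""
    vec = np.asarray(vec, dtype=float)
    shifted = vec - vec.max()   # largest entry becomes 0, so exp() cannot overflow
    exps = np.exp(shifted)
    return exps / exps.sum()    # the sum is at least 1, so no division by 0

print(stable_softmax([1e10, 0.1, -123]))
print(stable_softmax([-2, 1.5, 0.5]))
```

For well-behaved inputs such as $[-2, 1.5, 0.5]$ it should return the same probabilities as the plain version.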

Let's try underflow.

In [None]:
x = np.array([-1e10, -5e10])
softmax(x)

array([nan, nan])

`invalid value encountered in true_divide` is the warning numpy gives for an invalid operation such as $0/0$. Here both exponentials underflow to 0, so every element is computed as $0/0$, which is `nan`.
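Here is a small sketch that separates the two cases. I use `np.errstate` to silence the warnings so that only the values are shown; the variable names are mine.

```python
import numpy as np

# Silence the warnings so only the values are shown.
with np.errstate(divide="ignore", invalid="ignore"):
    zero = np.exp(np.array([-1e10]))         # underflows to exactly 0.0
    zero_over_zero = zero / zero             # 0/0 -> nan ("invalid value")
    one_over_zero = np.array([1.0]) / zero   # 1/0 -> inf ("divide by zero")

print(zero, zero_over_zero, one_over_zero)
```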

This can also be solved by subtracting the maximum value.

In [None]:
x = np.array([-1e10, -5e10])
x = x - max(x)
softmax(x)

array([1., 0.])

The log softmax mentioned in the book is nothing but the composition of log and softmax: $\log(\mathrm{softmax}(\pmb{x}))$.

Let's say the vector $x=[-1000, 0.1]$. 

In [None]:
x = np.array([-1000, 0.1])
softmax(x)

array([0., 1.])

As you can see, $-1000$ is already so negative that $e^{-1000}$ underflows to 0, so the softmax function gives it a probability of exactly 0.

In [None]:
np.log(softmax(x))

array([-inf,   0.])

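That's why we get `-inf` above: the first probability is exactly 0, and $\log(0)$ is $-\infty$ (numpy also emits a `divide by zero encountered in log` warning).

One hack we can use is to add a very small value to the probabilities so that none of them is 0.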

In [None]:
x = np.array([-1000, 0.1])
z = softmax(x)
z += 1e-100
np.log(z)

array([-230.2585093,    0.       ])
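Note that the result of the hack depends on the small value we picked: $-230.26$ is just $\log(10^{-100})$, not the true log probability. An alternative is to never form the probabilities at all and use $\log(\mathrm{softmax}(\pmb{x}))_i = x_i - \max_j x_j - \log\sum_k e^{x_k - \max_j x_j}$. Below is a minimal sketch of that; `log_softmax` is my own helper name for this example, and I am not claiming this is how any particular library implements it.

```python
import numpy as np

def log_softmax(vec):
    """log(softmax(vec)) computed without forming the probabilities (illustrative helper)."""
    vec = np.asarray(vec, dtype=float)
    shifted = vec - vec.max()                        # largest entry becomes 0
    return shifted - np.log(np.exp(shifted).sum())   # the sum is at least 1, so log is safe

print(log_softmax([-1000, 0.1]))
```

With this version the first entry should come out as roughly $-1000.1$, which is the actual log probability, instead of `-inf` or an arbitrary floor.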

# 4.2 Poor Conditioning

In [None]:
import numpy as np

In [None]:
from numpy import linalg as LA

In [None]:
w

array([1., 2., 3.])