### Problems with squared error when training a neural network
>If the desired output is 1 and the actual output is 0.00000001, there is almost no gradient for a logistic unit to fix the error. E.g., on a plateau the slope is almost horizontal (close to 0).
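
To make the vanishing gradient concrete, here is a minimal sketch for a single logistic unit. It assumes the squared error is written as $E = \frac{1}{2}(t-y)^2$, so that $\frac{\partial E}{\partial z} = (y-t)\,y\,(1-y)$. The names `sigmoid`, `z`, and `grad_squared_error` are illustrative and do not come from the lecture.

```python
import numpy as np

def sigmoid(z):
    # Logistic unit: squashes the total input z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

t = 1.0                           # desired output
z = np.log(1e-8 / (1.0 - 1e-8))   # total input chosen so the output is about 1e-8
y = sigmoid(z)

# Squared error E = 0.5 * (t - y)**2, so dE/dz = (y - t) * y * (1 - y)
grad_squared_error = (y - t) * y * (1.0 - y)

print("output y:", y)
print("dE/dz for squared error:", grad_squared_error)
```

The factor $y(1-y)$ is tiny when $y$ is close to 0 or 1, so the gradient stays near zero even though the error is as large as it can be.
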
### Is there a different cost function that works better? 
### The Softmax Function:

>$x = \{0,0,0,0,0,0,0,0,...0\}^N$

>$y_i=\frac{e^{x_i}}{\sum_{j\in{group}}{e^{x_j}}}$

>Every $y_i$ lies between 0 and 1 and the outputs sum to 1, so we can interpret them as a probability distribution.

### Derivative of softmax function:
>$\frac{\partial{y_i}}{\partial{x_i}}=y_i(1-y_i)$
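
The formula above is the diagonal case. For $i \neq j$ the derivative is $\frac{\partial{y_i}}{\partial{x_j}} = -y_i y_j$, so the full Jacobian can be written as $\mathrm{diag}(y) - y y^T$. The following is a small sketch that checks both cases against finite differences. The helper name `softmax_jacobian` and the test vector are assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    # Plain softmax, fine for the small inputs used here
    exps = np.exp(x)
    return exps / np.sum(exps)

def softmax_jacobian(x):
    # J[i, j] = dy_i/dx_j = y_i * (delta_ij - y_j)
    y = softmax(x)
    return np.diag(y) - np.outer(y, y)

x = np.array([0.5, -1.0, 2.0])
y = softmax(x)
J = softmax_jacobian(x)

# Central finite differences, one input coordinate at a time
eps = 1e-6
J_num = np.zeros((3, 3))
for j in range(3):
    step = np.zeros(3)
    step[j] = eps
    J_num[:, j] = (softmax(x + step) - softmax(x - step)) / (2 * eps)

print("diagonal matches y*(1-y):", np.allclose(np.diag(J), y * (1 - y)))
print("matches finite differences:", np.allclose(J, J_num, atol=1e-6))
```
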

### Problems with softmax:
>**Overflow:** When $x_i$ is too large, $e^{x_i}$ overflows to inf and softmax returns nan

>**Underflow:** When every $x_i$ is very negative, each $e^{x_i}$ underflows to zero and the denominator becomes zero

>**Fix:** Use $x_i-\max_k{(x_k)}$ instead of $x_i$

>>$y_i=\frac{e^{x_i-\max_k{(x_k)}}}{\sum_{j\in{group}}{e^{x_j-\max_k{(x_k)}}}}$
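
Subtracting the maximum does not change the result, because the common factor $e^{-\max_k{(x_k)}}$ cancels between numerator and denominator. As a rough sketch of the underflow case, the snippet below uses very negative inputs. The example vector and the names `naive` and `stable` are assumptions for illustration.

```python
import numpy as np

x = np.array([-1000.0, -1001.0, -1002.0])

# Naive version: every np.exp underflows to 0.0, so we compute 0 / 0
with np.errstate(invalid="ignore", divide="ignore"):
    naive = np.exp(x) / np.sum(np.exp(x))

# Shifted version: the largest exponent becomes 0, so the sum is at least 1
shifted = x - np.max(x)
stable = np.exp(shifted) / np.sum(np.exp(shifted))

print("naive: ", naive)
print("stable:", stable)
```

The shifted inputs are `[0, -1, -2]`, which give the same softmax as the original inputs would in exact arithmetic.
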


### So what is the right cost function to use with softmax? (multiclass classification)

>$C = -\sum_{j}{t_j\log{y_j}}$

>where $t_j$ is the target value.

>It is the negative log probability of the right answer.

>It's called **cross-entropy**

>E.g: C has a very big gradient when the target value is 1 and the output is almost zero
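
When cross-entropy is combined with softmax, the gradient with respect to the inputs simplifies to $\frac{\partial{C}}{\partial{x_i}} = y_i - t_i$. The sketch below checks this numerically for a one-hot target whose correct class receives an output close to zero. The logits, the target, and the helper names are assumptions chosen for illustration.

```python
import numpy as np

def softmax(x):
    # Shift by the max for numerical stability
    exps = np.exp(x - np.max(x))
    return exps / np.sum(exps)

def cross_entropy(x, t):
    # C = -sum_j t_j * log(y_j), with y = softmax(x) and a one-hot target t
    return -np.sum(t * np.log(softmax(x)))

x = np.array([-10.0, 5.0, 5.0])   # the correct class (index 0) gets a tiny output
t = np.array([1.0, 0.0, 0.0])     # one-hot target

y = softmax(x)
grad = y - t                      # analytic gradient dC/dx

# Central finite-difference check of the analytic gradient
eps = 1e-6
grad_num = np.zeros_like(x)
for i in range(len(x)):
    step = np.zeros_like(x)
    step[i] = eps
    grad_num[i] = (cross_entropy(x + step, t) - cross_entropy(x - step, t)) / (2 * eps)

print("softmax output:", y)
print("analytic gradient:", grad)
print("numeric gradient: ", grad_num)
```

The gradient for the correct class is close to -1 even though its output is almost zero, which is the behaviour squared error fails to provide.
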

[Video(Hinton)](https://www.youtube.com/watch?v=mlaLLQofmR8&t=10s)

[Video(Ng)](https://www.youtube.com/watch?v=LLux1SW--oM)

In [62]:
import numpy as np

x = np.random.rand(5)
print(x)

[0.81245063 0.71863571 0.19172764 0.33516107 0.40421493]


### Softmax:

In [63]:
def softmax(x):
  # Naive softmax: exponentiate, then normalize so the outputs sum to 1
  exps = np.exp(x)
  return exps/np.sum(exps)

print(softmax(x))
print("check sum:",sum(softmax(x)))

[0.26786012 0.24387358 0.14398972 0.16619725 0.17807934]
check sum: 1.0


### Stable Softmax:

In [85]:
def stable_softmax(x):
  # Subtract the max so the largest exponent is 0 and np.exp cannot overflow
  tmp = x-np.max(x)
  exps = np.exp(tmp)
  return exps/np.sum(exps)

x = np.random.rand(5)+100000

#Compare the two
print(softmax(x))
print("check",sum(softmax(x)))

print(stable_softmax(x))
print("check sum:",sum(stable_softmax(x)))

[nan nan nan nan nan]
check nan
[0.24745667 0.16267377 0.26770832 0.19848325 0.12367799]
check sum: 1.0


  
  


In [0]:
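def cross_entropy(X,t):
    y = stable_softmax(X)
    # for two classes: treat each y[i] as the predicted probability that t[i] is 1
    loss = -np.sum(t*np.log(y)+(1-t)*np.log(1-y))
    # multiclass version with a one-hot target t:
    #loss = -np.sum(t*np.log(y))
    return loss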

In [132]:
#compare with the implementation in sklearn
from sklearn.metrics import log_loss

t = np.random.randint(2, size=5)
print(cross_entropy(x,t))

print("sklearn:",log_loss(t,stable_softmax(x),normalize = False))

6.1190060844735585
sklearn: 6.1190060844735585
