![softmax_equation](./softmax_and_cross_entropy/softmax_equation.png)

![softmax_layer](./softmax_and_cross_entropy/softmax_layer.png)

The outputs of a softmax layer are probabilities, so they sum to 1.

In [1]:
import torch
import torch.nn as nn
import numpy as np

In [2]:
# A plain NumPy implementation for illustration only; in practice we use the built-in PyTorch function.
def softmax(x):
    # axis=0 sums down the rows, i.e. over the class scores within each column
    return np.exp(x) / np.sum(np.exp(x), axis=0)

x = np.array([[2],[1],[0.5]])
print(f'shape is {x.shape} and dimension is {x.ndim}')
print(f'softmax numpy: {softmax(x)}')

shape is (3, 1) and dimension is 2
softmax numpy: [[0.62853172]
 [0.2312239 ]
 [0.14024438]]
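
The plain implementation above can overflow for large scores, because np.exp grows very quickly. A common workaround, sketched here on the assumption that the inputs are columns of class scores as above, is to subtract the column maximum before exponentiating; the result is mathematically unchanged. The name `stable_softmax` is a hypothetical helper, not a NumPy function.

```python
import numpy as np

def stable_softmax(x):
    # hypothetical helper: subtract the per-column max so np.exp never sees large positive values
    shifted = x - np.max(x, axis=0, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=0, keepdims=True)

x = np.array([[2], [1], [0.5]])
probs = stable_softmax(x)
print(probs)          # same values as the plain softmax above
print(probs.sum())    # sums to 1 (up to floating-point rounding)

big = np.array([[1000.0], [999.0], [998.5]])
print(stable_softmax(big))  # finite values; the plain version would produce nan here
```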


# WITH PYTORCH BUILT-IN SOFTMAX

<span style = 'color:cyan'>The idea is to put deep-learning-oriented functions in torch.nn.functional and keep general-purpose functions directly under torch. softmax was deemed to fall into the former category, sigmoid into the latter. According to the linked forum discussion, torch.softmax exists by accident, which is why it was not documented at the time. [click here](https://discuss.pytorch.org/t/why-there-isnt-a-method-named-torch-softmax/90554)</span>

In [3]:
x = torch.tensor([[2],[1],[0.5]])
print(f'dimension is {x.dim()}, size is {x.size()}')
outputs = nn.functional.softmax(x,dim = 0)
print(outputs)

dimension is 2, size is torch.Size([3, 1])
tensor([[0.6285],
        [0.2312],
        [0.1402]])
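
The `dim` argument decides which axis is normalized. As a small sketch, assuming a batch laid out as n_samples x n_classes (the layout used later for nn.CrossEntropyLoss), `dim = 1` makes each row sum to 1, whereas `dim = 0` normalizes down each column. The tensor `batch` is made up for illustration.

```python
import torch
import torch.nn.functional as F

# made-up batch: 2 samples (rows), 3 class scores each (columns)
batch = torch.tensor([[2.0, 1.0, 0.5],
                      [0.1, 0.2, 3.0]])

row_probs = F.softmax(batch, dim=1)   # normalize across classes for each sample
col_probs = F.softmax(batch, dim=0)   # normalize across samples for each class

print(row_probs.sum(dim=1))  # each row sums to 1
print(col_probs.sum(dim=0))  # each column sums to 1
```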


![cross_entropy_equation](./softmax_and_cross_entropy/cross_entropy_equation.png)

# One-Hot Encoding Vs Label Encoding

![One_Hot vs Label](./softmax_and_cross_entropy/label_vs_one-hot.png)

## <span style = 'color:cyan'>For more information, see</span> [here](https://www.kaggle.com/alexisbcook/categorical-variables)
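
As a small illustration of the two encodings, class indices (label encoding) can be converted to one-hot rows and back. This sketch uses np.eye for the conversion; the label values are invented for the example, and each sample is a row here rather than a column.

```python
import numpy as np

labels = np.array([2, 0, 1])   # label encoding: one class index per sample (made-up values)
num_classes = 3

# pick row i of the identity matrix for class i
one_hot = np.eye(num_classes, dtype=int)[labels]
print(one_hot)

# argmax along each row recovers the label encoding
recovered = np.argmax(one_hot, axis=1)
print(recovered)
```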

In [4]:
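def cross_entropy(actual,predicted):
    # actual is one-hot encoded, so only the log-probability of the true class contributes
    loss = -np.sum(actual * np.log(predicted))
    return loss   # unnormalized; dividing by the number of samples would give the mean loss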

In [5]:
# y must be one hot encoded
# if class 0 : [1 0 0]
# if class 1 : [0 1 0]
# if class 2 : [0 0 1]
Y = np.array([[1],[0],[0]])
print(f'shape is {Y.shape}')

shape is (3, 1)


In [6]:
# y_pred has probabilities
Y_pred_good = np.array([[0.8],[0.05],[0.15]])
Y_pred_bad = np.array([[0.1],[0.3],[0.6]])
l1 = cross_entropy(Y,Y_pred_good)
l2 = cross_entropy(Y,Y_pred_bad)
print(f'Loss on good prediction is {l1:.3f}')
print(f'Loss on bad prediction is {l2:.3f}')

Loss on good prediction is 0.223
Loss on bad prediction is 2.303
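
Because Y is one-hot with a 1 at class 0, the sum in cross_entropy collapses to the negative log of the probability assigned to class 0. The two losses can be checked by hand with this short sketch:

```python
import numpy as np

# Y = [1, 0, 0], so only the class-0 probability matters
loss_good = -np.log(0.8)   # good prediction gives class 0 a probability of 0.8
loss_bad = -np.log(0.1)    # bad prediction gives class 0 a probability of 0.1
print(f'{loss_good:.3f}')  # 0.223
print(f'{loss_bad:.3f}')   # 2.303
```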


# WITH PYTORCH BUILT-IN CROSS ENTROPY

## **<span style = 'color:cyan'>nn.CrossEntropyLoss applies nn.LogSoftmax + nn.NLLLoss (negative log likelihood loss)</span>**

## <span style = 'color:cyan'>We do not need a Softmax in the last layer</span>

## Labels <span style = 'color:cyan'>must not be</span> one-hot encoded; they are class indices

## Y_pred has<span style = 'color:cyan'> raw scores</span> (logits), not probabilities
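
The LogSoftmax + NLLLoss decomposition can be checked directly. This is a minimal sketch with made-up scores for a single sample; both routes should give the same loss.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 1.0, 0.1]])  # raw scores for one sample (made-up values)
target = torch.tensor([0])                # class index, not one-hot

ce = nn.CrossEntropyLoss()(logits, target)

log_probs = nn.LogSoftmax(dim=1)(logits)  # step 1: log of the softmax probabilities
nll = nn.NLLLoss()(log_probs, target)     # step 2: negative log-probability of the true class

print(ce.item(), nll.item())  # the two values agree
```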



In [7]:
loss = nn.CrossEntropyLoss()
# 3 samples, each labelled with a class index (0, 1 or 2)
Y = torch.tensor([2,0,1])
print(Y.shape)


torch.Size([3])


In [8]:
# Each row, e.g. [2.0,1,0.1], holds raw scores (logits), not softmax outputs.
# Shape is n_samples x n_classes = 3 samples, each scored against 3 possible classes.
Y_pred_good = torch.tensor([[0.7,2,3],[2.0,1,0.1],[1,7,4]])
Y_pred_bad = torch.tensor([[0.5,2.1,0.3],[0.7,2,3],[9,0.9,5]])
print(f'shape is {Y_pred_bad.shape}')

shape is torch.Size([3, 3])


In [9]:
l1 = loss(Y_pred_good,Y)
l2 = loss(Y_pred_bad,Y)
print(f'Loss on good prediction is {l1:.3f}')
print(f'Loss on bad prediction is {l2:.3f}')

Loss on good prediction is 0.284
Loss on bad prediction is 4.305
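
By default nn.CrossEntropyLoss averages the loss over the samples (reduction='mean'). As a sketch of where 0.284 comes from, passing reduction='none' returns one loss per sample, and their mean matches the default result.

```python
import torch
import torch.nn as nn

Y = torch.tensor([2, 0, 1])
Y_pred_good = torch.tensor([[0.7, 2, 3], [2.0, 1, 0.1], [1, 7, 4]])

per_sample = nn.CrossEntropyLoss(reduction='none')(Y_pred_good, Y)  # one loss per sample
mean_loss = nn.CrossEntropyLoss()(Y_pred_good, Y)                   # default reduction='mean'

print(per_sample)
print(f'{per_sample.mean().item():.3f}', f'{mean_loss.item():.3f}')  # both print 0.284
```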


In [10]:
_, prediction1 = torch.max(Y_pred_good,dim = 1)
_, prediction2 = torch.max(Y_pred_bad, dim = 1)
print(prediction1)
print(prediction2)

tensor([2, 0, 1])
tensor([1, 2, 0])


### How do we get tensor([2, 0, 1])?
Y_pred_good is [ [ 0.7, 2, 3 ], [ 2.0, 1, 0.1 ], [ 1, 7, 4 ] ] and
dim = 1 is passed to torch.max, so the maximum is taken along each row of Y_pred_good. torch.max returns both the max values and their indices; we keep only the indices.
* the first row [ 0.7, 2, 3 ] has its max value at index 2.
* the second row [ 2.0, 1, 0.1 ] has its max value at index 0.
* the third row [ 1, 7, 4 ] has its max value at index 1.
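
When only the indices are needed, torch.argmax gives the same result as the second output of torch.max. The sketch below also compares the predictions with the labels to get an accuracy; the variable names `acc_good` and `acc_bad` are introduced here for illustration.

```python
import torch

Y = torch.tensor([2, 0, 1])
Y_pred_good = torch.tensor([[0.7, 2, 3], [2.0, 1, 0.1], [1, 7, 4]])
Y_pred_bad = torch.tensor([[0.5, 2.1, 0.3], [0.7, 2, 3], [9, 0.9, 5]])

pred_good = torch.argmax(Y_pred_good, dim=1)  # same indices as torch.max(..., dim=1)[1]
pred_bad = torch.argmax(Y_pred_bad, dim=1)

acc_good = (pred_good == Y).float().mean().item()  # fraction of correct predictions
acc_bad = (pred_bad == Y).float().mean().item()
print(acc_good, acc_bad)  # 1.0 0.0
```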


## Multiclass problem

In [12]:
class NeuralNet2(nn.Module):
    def __init__(self,input_size,hidden_size,num_classes):
        super(NeuralNet2,self).__init__()
        self.linear1 = nn.Linear(input_size,hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size,num_classes)
        
    def forward(self,x):
        out = self.linear1(x)
        out = self.relu(out)
        # no softmax at the end: nn.CrossEntropyLoss expects raw scores
        out = self.linear2(out)
        return out

In [None]:
model = NeuralNet2(input_size=28*28, hidden_size=5, num_classes=3)
criterion = nn.CrossEntropyLoss() # applies LogSoftmax + NLLLoss internally
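
To see how the model output and the criterion fit together, here is a minimal sketch of one forward pass. It uses an nn.Sequential stand-in with the same layers as NeuralNet2 so that the block runs on its own; the random batch and the labels are made up.

```python
import torch
import torch.nn as nn

# nn.Sequential stand-in with the same layers as NeuralNet2 (Linear -> ReLU -> Linear)
model = nn.Sequential(nn.Linear(28*28, 5), nn.ReLU(), nn.Linear(5, 3))
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 28*28)        # made-up batch of 4 flattened 28x28 inputs
y = torch.tensor([0, 2, 1, 2])   # made-up class indices, one per sample

logits = model(x)                # shape 4 x 3: raw scores, no softmax applied
loss = criterion(logits, y)      # scalar loss averaged over the batch
print(logits.shape, loss.item())
```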

## Binary Classification problem

![binary classification problem](./softmax_and_cross_entropy/binary_classification_problem.png)

In [None]:
class NeuralNet1(nn.Module):
    # the output layer has a single unit in a binary classification problem.
    def __init__(self,input_size,hidden_size):
        super(NeuralNet1,self).__init__()
        self.linear1 = nn.Linear(input_size,hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size,1)
        
    def forward(self,x):
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        # sigmoid at the end: nn.BCELoss expects probabilities between 0 and 1
        y_pred = torch.sigmoid(out)
        return y_pred

In [None]:
model = NeuralNet1(input_size=28*28, hidden_size=5)
criterion = nn.BCELoss()
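
An alternative worth knowing is nn.BCEWithLogitsLoss, which folds the sigmoid into the loss so the model can return raw scores. The sketch below, with made-up outputs and labels, checks that both routes give the same value; note that the targets are floats with the same shape as the model output.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[0.8], [-1.2], [2.5]])   # made-up raw outputs of the last linear layer
targets = torch.tensor([[1.0], [0.0], [1.0]])   # binary labels as floats, same shape as the output

loss_a = nn.BCELoss()(torch.sigmoid(logits), targets)   # sigmoid in the model, as in NeuralNet1
loss_b = nn.BCEWithLogitsLoss()(logits, targets)        # sigmoid folded into the loss

print(loss_a.item(), loss_b.item())  # the two values agree up to rounding
```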