## Softmax and Cross-Entropy Loss
------
Softmax and cross-entropy loss are commonly used together in machine learning for multi-class classification tasks. Let's understand each concept individually and then explore how they are related.

**Softmax Function:** The softmax function takes a vector of real numbers as input and transforms it into a probability distribution. It is typically used in multi-class classification problems to convert raw scores (logits) into probabilities. It exponentiates each element of the input vector and divides by the sum of those exponentials, so every output lies between 0 and 1 and the outputs sum to 1. This normalization allows us to interpret the outputs as the probabilities of belonging to the different classes.

***The softmax function is defined as follows for an input vector x of length K:***
- softmax(x)_i = exp(x_i) / sum_{j=1..K} exp(x_j), for i = 1, ..., K
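As a small illustration of the definition above, the following sketch applies it step by step using only the standard library. The function name `softmax_by_hand` and the example scores are chosen here for illustration and do not come from any library.

```python
import math

def softmax_by_hand(scores):
    """Apply the softmax definition directly: exponentiate, then normalize."""
    exps = [math.exp(s) for s in scores]   # exp(x_i) for every element
    total = sum(exps)                      # sum of exp(x_j) over j = 1..K
    return [e / total for e in exps]       # each exponential divided by the sum

probs = softmax_by_hand([2.0, 1.0, 0.1])
print([round(p, 4) for p in probs])  # [0.659, 0.2424, 0.0986]
print(sum(probs))                    # 1.0 up to floating-point rounding
```

The largest score receives the largest probability, and the ordering of the scores is preserved.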

**Cross-Entropy Loss:** Cross-entropy loss, or simply "cross-entropy," is a loss function commonly used in classification tasks. It quantifies the dissimilarity between the predicted probability distribution (obtained from softmax) and the target distribution (typically a one-hot encoded vector representing the true class).

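Given a set of N training samples, each belonging to one of K classes, let p_i denote the predicted probability distribution for the i-th sample and y_i its one-hot encoded true label, with p_{i,k} and y_{i,k} their entries for class k. The cross-entropy loss is computed as follows:
- CrossEntropyLoss = - (1/N) * sum_{i=1..N} sum_{k=1..K} y_{i,k} * log(p_{i,k})

The inner sum is taken over all K classes, and the outer sum is taken over all N training samples. Because y_i is one-hot, only the term for the true class is nonzero, so each sample contributes the negative log of the probability assigned to its true class.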
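The formula above can be sketched in NumPy as follows. This is a minimal illustration, not a library implementation: the function name `cross_entropy`, the `eps` clipping constant, and the example arrays are assumptions made here.

```python
import numpy as np

def cross_entropy(probs, targets, eps=1e-12):
    """Mean cross-entropy over N samples.

    probs:   (N, K) predicted probabilities, each row summing to 1
    targets: (N, K) one-hot encoded true labels
    eps:     small constant (an assumption here) that avoids log(0)
    """
    probs = np.clip(probs, eps, 1.0)
    # Inner sum over classes (axis=1), then the average over samples.
    return -np.mean(np.sum(targets * np.log(probs), axis=1))

# Two samples, three classes; the true classes are 0 and 1.
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
good = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1]])
bad = np.array([[0.1, 0.3, 0.6],
                [0.5, 0.2, 0.3]])

loss_good = cross_entropy(good, targets)
loss_bad = cross_entropy(bad, targets)
print(loss_good, loss_bad)  # the predictions that favor the true classes give the smaller loss
```

Predictions that put most of their mass on the true class give a loss near 0, while predictions that put little mass there are penalized heavily.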

## Relation between Softmax and Cross-Entropy Loss
--------

The softmax function is typically applied to the logits or raw scores obtained from the final layer of a neural network before making predictions. It transforms these scores into probabilities that sum up to 1. The cross-entropy loss, on the other hand, measures the difference between these predicted probabilities and the true class labels.

During training, the softmax function is used to obtain the predicted probabilities, and then the cross-entropy loss is computed based on these probabilities and the true labels. The goal is to minimize the cross-entropy loss, which effectively encourages the model to predict higher probabilities for the correct classes and lower probabilities for the incorrect classes.
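To see why minimizing the loss raises the probability of the correct class, here is a toy sketch in NumPy. It is a simplification: the logits themselves are treated as the trainable parameters (there is no network), and the starting scores and learning rate are hypothetical. It relies on the standard result that, for a one-hot label y, the gradient of the cross-entropy with respect to the logits is p - y.

```python
import numpy as np

def softmax(z):
    exps = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return exps / np.sum(exps)

logits = np.array([0.5, 1.5, 0.2])   # hypothetical starting scores
y = np.array([1.0, 0.0, 0.0])        # one-hot label: class 0 is correct
lr = 0.5                             # hypothetical learning rate

history = []
for step in range(5):
    p = softmax(logits)
    history.append(p[0])             # probability of the correct class
    grad = p - y                     # gradient of cross-entropy w.r.t. the logits
    logits = logits - lr * grad      # one gradient-descent step

print(history)  # the correct-class probability rises at every step
```

Each step raises the logit of the correct class and lowers the others, which is the behavior the paragraph above describes.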

In summary, the softmax function is used to transform logits into probabilities, while the cross-entropy loss measures the dissimilarity between predicted probabilities and true class labels, providing a gradient for training the model in a supervised learning setting.
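The relationship can also be checked in PyTorch. The sketch below, with example values chosen here for illustration, compares `nn.CrossEntropyLoss` against a manual softmax followed by the negative log of the true-class probability. Note that `nn.CrossEntropyLoss` expects raw logits and applies log-softmax internally, so softmax should not be applied to its input beforehand.

```python
import torch
import torch.nn as nn

# Raw logits for one sample with three classes, and the index of the true class.
logits = torch.tensor([[2.0, 1.0, 0.1]])
target = torch.tensor([0])

# Built-in loss: takes logits of shape (N, K) and class indices of shape (N,).
loss_builtin = nn.CrossEntropyLoss()(logits, target)

# Manual version: softmax, then negative log of the true class's probability.
probs = torch.softmax(logits, dim=1)
loss_manual = -torch.log(probs[0, target[0]])

print(loss_builtin.item(), loss_manual.item())  # both approximately 0.417
```

The two values agree because the true class here has softmax probability of about 0.659, and -log(0.659) is about 0.417.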


In [1]:
import torch 
import torch.nn as nn
import numpy as np

In [2]:
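def softmax(x):
    # Subtract the max before exponentiating to avoid overflow; the result is unchanged
    exps = np.exp(x - np.max(x, axis=0))
    return exps / np.sum(exps, axis=0)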

In [4]:
x = np.array([2.0, 1.0, 0.1])
outputs = softmax(x)
print('softmax numpy:', outputs)

softmax numpy: [0.65900114 0.24243297 0.09856589]


In [5]:
x = torch.tensor([2.0, 1.0, 0.1])
outputs = torch.softmax(x, dim = 0)
print('softmax torch:', outputs)

softmax torch: tensor([0.6590, 0.2424, 0.0986])
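The `dim` argument matters once the input is a batch rather than a single vector. The following sketch uses a hypothetical batch of two samples to show the difference between normalizing over classes and normalizing over samples.

```python
import torch

# A hypothetical batch of two samples with three class scores each.
batch = torch.tensor([[2.0, 1.0, 0.1],
                      [0.5, 0.5, 0.5]])

# dim=1 normalizes across the classes of each sample, so each row sums to 1.
per_sample = torch.softmax(batch, dim=1)
print(per_sample)
print(per_sample.sum(dim=1))  # each entry is 1 up to rounding

# dim=0 would instead normalize across the samples within each column.
per_column = torch.softmax(batch, dim=0)
print(per_column.sum(dim=0))  # each entry is 1 up to rounding
```

For classification with inputs of shape (N, K), `dim=1` is usually the intended choice, since the probabilities should sum to 1 over the classes of each sample.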
