Cross-Entropy Loss is used for classification problems where we predict probabilities for multiple classes. 
It measures how different the predicted probability distribution is from the actual distribution.

Why Do We Need Cross-Entropy?

Unlike MSE (which is good for regression), cross-entropy is designed for probability-based predictions.

It strongly penalizes confident but incorrect predictions: the loss grows without bound as the probability assigned to the true class approaches 0, which discourages the model from being confidently wrong.
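To make the penalty concrete, here is a minimal sketch of the per-example loss, which is just the negative log of the probability given to the true class. The helper name `true_class_loss` and the probability values are made up for illustration.

```python
import math

def true_class_loss(p: float) -> float:
    """Cross-entropy contribution of one example, given the probability p
    that the model assigned to the true class (illustrative helper)."""
    return -math.log(p)

# Illustrative probabilities, from a confident correct prediction (0.9)
# down to a confident wrong one (0.01 left for the true class).
probs = [0.9, 0.5, 0.1, 0.01]
losses = [true_class_loss(p) for p in probs]

for p, loss in zip(probs, losses):
    print(f"p(true class) = {p:<4}  loss = {loss:.4f}")
```

The loss rises from about 0.105 at p = 0.9 to about 4.605 at p = 0.01, so a confident mistake costs far more than a hesitant one.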

In multi-class classification it is typically paired with a softmax activation, which turns the model's raw scores (logits) into a probability distribution over the classes.

For binary classification (2 classes):

$$L = -\left(y \log(\hat{y}) + (1 - y) \log(1 - \hat{y})\right)$$

For multi-class classification (more than 2 classes):

$$L = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)$$

Where:

y is the true label (one-hot encoded for multi-class).

ŷ is the predicted probability for each class.

C is the number of classes.
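As a rough check of the two formulas, the sketch below evaluates each one by hand on a single example. The label and probability values are assumptions chosen for illustration, not data from this notebook.

```python
import torch

# Binary case: y is 0 or 1, y_hat is the predicted probability of class 1.
y = torch.tensor(1.0)
y_hat = torch.tensor(0.8)
binary_loss = -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat))

# Multi-class case (C = 3): y is one-hot, y_hat is a probability vector
# that sums to 1. Only the true class (index 1) contributes to the sum.
y_onehot = torch.tensor([0.0, 1.0, 0.0])
y_hat_multi = torch.tensor([0.1, 0.7, 0.2])
multi_loss = -(y_onehot * torch.log(y_hat_multi)).sum()

print(f"Binary loss:      {binary_loss.item():.4f}")
print(f"Multi-class loss: {multi_loss.item():.4f}")
```

Because the true label is one-hot, the multi-class sum collapses to the negative log of the probability predicted for the correct class, here -log(0.7).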

In [3]:
import torch

import torch.nn as nn

In [4]:
# Sample logits (before softmax)

logits = torch.tensor([[2.0, 1.0, 0.1],   # Example 1
                       [0.5, 2.5, 1.2],   # Example 2
                       [1.0, 0.3, 3.1]])  # Example 3

In [5]:
# True labels (class indices)

true_labels = torch.tensor([0, 1, 2])  # Correct class for each example

In [6]:
# Define Cross-Entropy Loss function

criterion = nn.CrossEntropyLoss()

In [7]:
# Compute loss

loss = criterion(logits, true_labels)

print(f"Cross-Entropy Loss: {loss.item():.4f}")

Cross-Entropy Loss: 0.3091


Explanation:

Logits: These are raw scores before applying softmax (3 classes per sample).

True Labels: Given as class indices (not one-hot encoded).

CrossEntropyLoss:

Applies softmax internally (in log form, as log-softmax), so it expects raw logits rather than probabilities.

Converts the logits into log-probabilities.

Computes the negative log-likelihood for the correct class.

Output: The final loss value, averaged over the three examples (the default reduction is the mean).
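The steps listed above can be reproduced by hand to see where the 0.3091 comes from. This is a minimal sketch using `torch.nn.functional.log_softmax` and plain indexing. It mirrors what the loss computes conceptually and is not a description of PyTorch's internal implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.5, 1.2],
                       [1.0, 0.3, 3.1]])
true_labels = torch.tensor([0, 1, 2])

# Step 1: log-softmax turns each row of logits into log-probabilities.
log_probs = F.log_softmax(logits, dim=1)

# Step 2: pick the log-probability of the correct class for each example.
picked = log_probs[torch.arange(len(true_labels)), true_labels]

# Step 3: negate and average over the batch (the default reduction).
manual_loss = -picked.mean()

# Compare against the built-in loss on the same inputs.
builtin_loss = nn.CrossEntropyLoss()(logits, true_labels)
print(f"Manual: {manual_loss.item():.4f}  Built-in: {builtin_loss.item():.4f}")
```

Both values should agree, which confirms that the logits must be passed in raw: applying softmax beforehand would normalize them twice.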