# Types of Errors and Loss Functions in PyTorch
We will go through several kinds of loss functions.


## Mean Absolute Error Loss (L1 loss)   
It is the average of the absolute differences between the target values and the values predicted by the model. It is calculated as:
$$\frac{1}{n}\sum_{i=1}^{n}|y_{i}-\hat{y}_{i}|$$   

This can be used for regression problems. Because MAE uses the absolute error rather than the squared error, it is less sensitive to outliers than MSE.
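To connect the formula to the API, here is a minimal sketch that computes MAE by hand and compares it with `nn.L1Loss`. The tensors `pred` and `true` are made-up values chosen so the arithmetic is easy to check; they are not part of the notebook's data.

```python
import torch
import torch.nn as nn

# Hypothetical predictions and targets (illustrative values only).
pred = torch.tensor([2.0, 4.0, 6.0])
true = torch.tensor([1.0, 5.0, 9.0])

# MAE written out from the formula: mean of |y_i - y_hat_i|.
mae_manual = (true - pred).abs().mean()

# The same quantity from the built-in criterion (default reduction is "mean").
mae_builtin = nn.L1Loss()(pred, true)

# Both equal (1 + 1 + 3) / 3, roughly 1.6667.
print(mae_manual, mae_builtin)
```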

In [2]:
import torch
import torch.nn as nn

input = torch.randn(5, 5, requires_grad=True)
target = torch.randn(5, 5)

# Mean Absolute Error
mae = nn.L1Loss()
output = mae(input, target) 
output.backward()

print('input: ', input) 
print('target: ', target)
print('output: ', output)


input:  tensor([[ 0.4343,  0.9707,  1.5393,  0.2493,  1.3898],
        [ 0.6490, -0.6834,  1.5204,  0.8786, -0.0912],
        [ 1.3075,  0.3545,  0.1328, -0.3996, -0.0953],
        [-0.0098, -0.0437, -1.0588,  0.6889,  0.6560],
        [-1.3469, -1.1643, -0.0500, -0.8692,  0.7422]], requires_grad=True)
target:  tensor([[ 1.8299, -1.4373, -0.0747,  0.1170,  0.7938],
        [-3.0914,  0.5931,  2.5211,  0.4198,  0.2509],
        [ 0.2228, -0.0819, -0.8674,  0.0950,  0.2390],
        [-0.9124, -0.0527, -1.1152, -0.3570, -0.6817],
        [-0.9945, -1.9955,  0.0796, -0.6463, -0.0321]])
output:  tensor(0.8791, grad_fn=<MeanBackward0>)


## Mean Squared Error Loss (L2 Loss)   
It is the average of the squared differences between the target values and the values predicted by the model. It is calculated as:
$$\frac{1}{n}\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}$$

This loss punishes large mistakes (badly wrong predictions) heavily, because the errors are squared, and penalizes small mistakes only lightly. It is well suited to regression problems and is a common default choice for regression in PyTorch.
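The effect of squaring can be seen with a small sketch that compares MAE and MSE on the same predictions, once with a clean target and once with a single outlier. All values here are hypothetical and chosen for easy arithmetic.

```python
import torch
import torch.nn as nn

pred = torch.tensor([1.0, 2.0, 3.0, 4.0])
clean = torch.tensor([1.0, 2.0, 3.0, 5.0])     # last error is 1
outlier = torch.tensor([1.0, 2.0, 3.0, 14.0])  # last error is 10

mae, mse = nn.L1Loss(), nn.MSELoss()

mae_clean, mae_out = mae(pred, clean), mae(pred, outlier)
mse_clean, mse_out = mse(pred, clean), mse(pred, outlier)

# MAE grows linearly with the error: 0.25 -> 2.5 (10x).
print("MAE:", mae_clean.item(), "->", mae_out.item())
# MSE grows quadratically with the error: 0.25 -> 25.0 (100x).
print("MSE:", mse_clean.item(), "->", mse_out.item())
```

A single large error therefore dominates the MSE far more than the MAE, which is why MAE is often described as more robust to outliers.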

In [3]:
mse_loss = nn.MSELoss()
output = mse_loss(input, target)
output.backward()

print('input: ', input) 
print('target: ', target)
print('output: ', output)


input:  tensor([[ 0.4343,  0.9707,  1.5393,  0.2493,  1.3898],
        [ 0.6490, -0.6834,  1.5204,  0.8786, -0.0912],
        [ 1.3075,  0.3545,  0.1328, -0.3996, -0.0953],
        [-0.0098, -0.0437, -1.0588,  0.6889,  0.6560],
        [-1.3469, -1.1643, -0.0500, -0.8692,  0.7422]], requires_grad=True)
target:  tensor([[ 1.8299, -1.4373, -0.0747,  0.1170,  0.7938],
        [-3.0914,  0.5931,  2.5211,  0.4198,  0.2509],
        [ 0.2228, -0.0819, -0.8674,  0.0950,  0.2390],
        [-0.9124, -0.0527, -1.1152, -0.3570, -0.6817],
        [-0.9945, -1.9955,  0.0796, -0.6463, -0.0321]])
output:  tensor(1.4232, grad_fn=<MseLossBackward0>)


## PyTorch Negative Log-Likelihood Loss Function (NLLLoss) 
NLLLoss expects log-probabilities as input, so it is typically applied to models whose final layer is a (log-)softmax. Softmax predicts the probability of each class, `LogSoftmax` takes its logarithm, and NLLLoss computes the negative log-likelihood of the true class. It is calculated as:
$$\frac{1}{n}\sum_{i=1}^{n}-y_{i}\log(\hat{y}_{i})$$

This loss function punishes the model for assigning a low probability to the correct class and rewards it for assigning a high probability. It is used for multiclass classification problems.
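As a rough illustration of what `nn.NLLLoss` does with log-probabilities, the sketch below picks out the log-probability of the correct class for each sample by hand and compares the result with the built-in criterion. The names `logits` and `labels` and the tensor sizes are assumptions made for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 3)           # 4 samples, 3 classes (raw scores)
labels = torch.tensor([0, 2, 1, 2])  # true class index for each sample

# Convert raw scores to log-probabilities along the class dimension.
log_probs = nn.LogSoftmax(dim=1)(logits)

# Select the log-probability of the true class for each sample,
# then average the negated values.
picked = log_probs.gather(1, labels.unsqueeze(1)).squeeze(1)
nll_manual = -picked.mean()

# Built-in criterion applied to the same log-probabilities.
nll_builtin = nn.NLLLoss()(log_probs, labels)

print(nll_manual, nll_builtin)
```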

In [6]:
target = torch.tensor([0, 1, 2, 3, 4])

m = nn.LogSoftmax(dim=1)
nll_loss = nn.NLLLoss()
output = nll_loss(m(input), target)
output.backward()

print('input: ', input) 
print('target: ', target)
print('output: ', output)


input:  tensor([[ 0.4343,  0.9707,  1.5393,  0.2493,  1.3898],
        [ 0.6490, -0.6834,  1.5204,  0.8786, -0.0912],
        [ 1.3075,  0.3545,  0.1328, -0.3996, -0.0953],
        [-0.0098, -0.0437, -1.0588,  0.6889,  0.6560],
        [-1.3469, -1.1643, -0.0500, -0.8692,  0.7422]], requires_grad=True)
target:  tensor([0, 1, 2, 3, 4])
output:  tensor(1.7902, grad_fn=<NllLossBackward0>)


## PyTorch Cross-Entropy Loss Function (CrossEntropyLoss)
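Cross-entropy measures the difference between two probability distributions. In the context of classification, it quantifies the difference between the predicted probability distribution and the true distribution. For two classes it is known as binary cross-entropy (BCE) and is written as:
$$\frac{1}{n}\sum_{i=1}^{n}-y_{i}\log(\hat{y}_{i})-(1-y_{i})\log(1-\hat{y}_{i})$$

Where $y_{i}$ is the true label (0 or 1) and $\hat{y}_{i}$ is the predicted probability of the positive class. `nn.CrossEntropyLoss` is the multiclass version: it takes raw scores (logits) and integer class labels, and is equivalent to applying `LogSoftmax` followed by `NLLLoss`.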

The logarithmic terms in the BCE loss ensure that when the model is wrong, especially with high confidence, the penalty is severe. If the model is only slightly incorrect in its prediction, the BCE loss won't penalize it as heavily as when the model is very confident and wrong.   

Consider Mean Squared Error (MSE) for a binary classification task. While MSE will penalize wrong predictions, its penalty for a single prediction is at most 1 when predictions are probabilities, whereas the BCE penalty grows without bound as a prediction approaches the wrong extreme. Thus, BCE is more suited for tasks where we want the model not just to predict correctly but also with high confidence.
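The comparison between BCE and MSE can be sketched numerically. The example below uses `nn.BCELoss` and `nn.MSELoss` with `reduction="none"` to show the per-sample penalty for three hypothetical predictions of a positive example, ranging from mildly right to confidently wrong. The probabilities are made up for illustration.

```python
import torch
import torch.nn as nn

y = torch.tensor([1.0, 1.0, 1.0])   # true label is 1 in every case
p = torch.tensor([0.6, 0.3, 0.01])  # predicted probability of class 1

# Per-sample penalties (no averaging) so the cases can be compared.
bce = nn.BCELoss(reduction="none")(p, y)  # equals -log(p) when y = 1
mse = nn.MSELoss(reduction="none")(p, y)  # equals (1 - p)^2

# BCE: roughly 0.51, 1.20, 4.61 (keeps growing as p -> 0).
print("BCE:", bce)
# MSE: roughly 0.16, 0.49, 0.98 (never exceeds 1).
print("MSE:", mse)
```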

BCE becomes especially crucial in applications where being confidently wrong can have severe consequences. For instance, in medical diagnosis, a model that is very confident about an incorrect diagnosis could lead to inappropriate treatment; spam SMS detection is another example. In such cases, BCE is a better choice than MSE.

In [8]:
target = torch.empty(5, dtype=torch.long).random_(5)
cross_entropy_loss = nn.CrossEntropyLoss()
output = cross_entropy_loss(input, target)
output.backward()

print('input: ', input) 
print('target: ', target)
print('output: ', output)

input:  tensor([[ 0.4343,  0.9707,  1.5393,  0.2493,  1.3898],
        [ 0.6490, -0.6834,  1.5204,  0.8786, -0.0912],
        [ 1.3075,  0.3545,  0.1328, -0.3996, -0.0953],
        [-0.0098, -0.0437, -1.0588,  0.6889,  0.6560],
        [-1.3469, -1.1643, -0.0500, -0.8692,  0.7422]], requires_grad=True)
target:  tensor([1, 4, 2, 0, 4])
output:  tensor(1.7043, grad_fn=<NllLossBackward0>)
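The `grad_fn=<NllLossBackward0>` in the output above hints at how `nn.CrossEntropyLoss` relates to the previous section. A minimal sketch, using freshly generated `logits` and hand-picked `labels` that are assumptions of this example, checks that it matches `LogSoftmax` followed by `NLLLoss`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(5, 5)               # raw scores: 5 samples, 5 classes
labels = torch.tensor([1, 4, 2, 0, 4])   # class index per sample

# Cross-entropy applied directly to raw scores.
ce = nn.CrossEntropyLoss()(logits, labels)

# The same computation in two explicit steps.
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), labels)

print(ce, nll)
```

This is why `nn.CrossEntropyLoss` should be given raw scores rather than softmax outputs.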


## Hinge Embedding Loss Function (Hinge Loss)   
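`nn.HingeEmbeddingLoss` measures the loss for an input $x$ and a target $y$ that is either 1 or -1. It differs from the classic hinge loss $\max(0,1-y_{i}\hat{y}_{i})$ used for training margin classifiers. Per element it is calculated as:
$$l_{i}=\begin{cases}x_{i}, & y_{i}=1\\ \max(0,\ margin-x_{i}), & y_{i}=-1\end{cases}$$
and the result is averaged over all elements by default, with a default margin of 1.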

This is typically used to measure whether two inputs are similar or dissimilar, for example with a pairwise distance as the input. It is also used for learning nonlinear embeddings and for semi-supervised learning.
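As a sketch of the intended usage, the example below feeds `nn.HingeEmbeddingLoss` targets of 1 and -1 and reproduces the result by hand: the input itself where the target is 1, and `max(0, margin - x)` where the target is -1. The values in `x` stand in for hypothetical pairwise distances.

```python
import torch
import torch.nn as nn

x = torch.tensor([0.2, 1.5, 0.4, 2.0])    # hypothetical distances
y = torch.tensor([1.0, 1.0, -1.0, -1.0])  # 1 = similar, -1 = dissimilar
margin = 1.0

# Per-element rule written out explicitly.
per_elem = torch.where(y == 1, x, torch.clamp(margin - x, min=0))
manual = per_elem.mean()

# Built-in criterion with the same margin.
builtin = nn.HingeEmbeddingLoss(margin=margin)(x, y)

# Per-element values are about [0.2, 1.5, 0.6, 0.0], so the mean is about 0.575.
print(manual, builtin)
```

Similar pairs are penalized by their distance, while dissimilar pairs are penalized only when they are closer than the margin.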

In [9]:
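# Note: nn.HingeEmbeddingLoss is defined for targets of 1 or -1
# (for example torch.randn(5, 5).sign()). The raw random floats used
# here run without error but are not valid labels.
target = torch.randn(5, 5)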

hinge_loss = nn.HingeEmbeddingLoss()
output = hinge_loss(input, target)
output.backward()

print('input: ', input) 
print('target: ', target)
print('output: ', output)

input:  tensor([[ 0.4343,  0.9707,  1.5393,  0.2493,  1.3898],
        [ 0.6490, -0.6834,  1.5204,  0.8786, -0.0912],
        [ 1.3075,  0.3545,  0.1328, -0.3996, -0.0953],
        [-0.0098, -0.0437, -1.0588,  0.6889,  0.6560],
        [-1.3469, -1.1643, -0.0500, -0.8692,  0.7422]], requires_grad=True)
target:  tensor([[-5.0075e-01,  7.5170e-01, -1.0888e+00,  5.2347e-01,  2.0991e+00],
        [-1.3888e+00,  2.0310e-02,  1.7041e+00, -8.8872e-01, -2.8956e-01],
        [-5.3095e-02, -1.8192e+00,  1.1073e-01,  1.0953e+00, -9.6332e-01],
        [ 1.5442e-03,  8.9922e-01, -4.1892e-01, -2.0913e+00, -9.8721e-01],
        [ 8.7475e-01, -9.7459e-02,  1.0512e-01,  1.1073e+00, -3.4975e-01]])
output:  tensor(1.0703, grad_fn=<MeanBackward0>)


## Margin Ranking Loss Function (Margin Loss)
This computes a criterion on the relative ordering of two inputs x1 and x2, given a target y that is either 1 or -1. It is used for learning to rank. It is calculated as:
$$\frac{1}{n}\sum_{i=1}^{n}\max(0,-y_{i}(x_{1i}-x_{2i})+margin)$$

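A target of y = 1 means x1 should be ranked higher than x2, and y = -1 means the opposite. The loss is zero whenever the pair is ordered correctly by at least the margin.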
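The formula can be checked against `nn.MarginRankingLoss` with a small sketch that mixes both target values. The scores in `x1` and `x2` and the margin of 0.5 are assumptions chosen so the result is easy to verify by hand.

```python
import torch
import torch.nn as nn

x1 = torch.tensor([1.0, 0.0, 0.5])
x2 = torch.tensor([0.5, 0.5, 0.5])
y = torch.tensor([1.0, 1.0, -1.0])  # 1: x1 should rank higher, -1: x2 should
margin = 0.5

# Formula written out: max(0, -y * (x1 - x2) + margin), averaged.
manual = torch.clamp(-y * (x1 - x2) + margin, min=0).mean()

# Built-in criterion with the same margin.
builtin = nn.MarginRankingLoss(margin=margin)(x1, x2, y)

# Per-pair losses are [0.0, 1.0, 0.5], so the mean is 0.5.
print(manual, builtin)
```

The first pair is correctly ordered by exactly the margin and costs nothing, the second is ordered the wrong way, and the third is a tie that falls short of the margin.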

In [10]:
input_one = torch.randn(5, requires_grad=True)
input_two = torch.randn(5, requires_grad=True)
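# Note: torch.rand samples from [0, 1), so every target here is 1
# (see the printed output); torch.randn(5).sign() would give a mix of 1 and -1.
target = torch.rand(5).sign()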

ranking_loss = nn.MarginRankingLoss()
output = ranking_loss(input_one, input_two, target)
output.backward()

print("input one:", input_one)
print("input two:", input_two)
print("target:", target)
print("output:", output)

input one: tensor([ 1.0745, -0.1528,  1.2359,  0.8667,  1.0300], requires_grad=True)
input two: tensor([ 1.0977, -0.1903, -0.5532,  0.2529,  0.0613], requires_grad=True)
target: tensor([1., 1., 1., 1., 1.])
output: tensor(0.0047, grad_fn=<MeanBackward0>)
