In [5]:
import torch
import torch.nn as nn

### Softmax and Crossentropy

In [2]:
x = torch.tensor([2.0, 1.0, 0.1])

PyTorch provides a softmax function, ```torch.softmax()```, that takes the input tensor and an argument ```dim``` specifying the axis/dimension along which softmax is applied.

In [4]:
outputs = torch.softmax(x, dim=0)
print(outputs)

tensor([0.6590, 0.2424, 0.0986])
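
As a rough cross-check of what ```torch.softmax``` computes, the same values can be reproduced by hand: exponentiate each entry and divide by the sum of the exponentials. The names ```manual``` and ```builtin``` below are illustrative and not part of the original notebook.

```python
import torch

# Same scores as in the cell above.
x = torch.tensor([2.0, 1.0, 0.1])

# Softmax by hand: exponentiate, then normalize so the entries sum to 1.
exp_x = torch.exp(x)
manual = exp_x / exp_x.sum(dim=0)

# Built-in version for comparison.
builtin = torch.softmax(x, dim=0)

print(manual)
print(torch.allclose(manual, builtin))  # True if both agree up to rounding
```

This direct form is only for illustration; for large scores the exponentials can overflow, which is one reason to prefer the built-in function.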


Using Cross Entropy loss

In [6]:
loss = nn.CrossEntropyLoss()

PyTorch's ```CrossEntropyLoss()``` combines ```LogSoftmax``` and ```NLLLoss()``` internally, so **the predictions should be passed as raw scores (logits), without applying a softmax**. Additionally, the **targets should be given as class labels (integer class indices) and not as one-hot vectors**.

In [7]:
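# Target: one class index per sample (here, a single sample of class 0)
y = torch.tensor([0])
# Predictions are raw scores of shape (n_samples, n_classes)
y_pred_good = torch.tensor([[2.0, 1.0, 0.1]])
y_pred_bad = torch.tensor([[0.5, 1.0, 0.3]])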
l1 = loss(y_pred_good, y)
l2 = loss(y_pred_bad, y)
print(l1.item())
print(l2.item())

0.4170299470424652
1.243420124053955
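
The statement that ```CrossEntropyLoss``` is ```LogSoftmax``` followed by ```NLLLoss``` can be checked with a minimal sketch. It reuses the "good" prediction from the notebook; the variable names (```logits```, ```ce```, ```nll```, ```manual```) are chosen here for illustration only.

```python
import torch
import torch.nn as nn

y = torch.tensor([0])                      # true class index
logits = torch.tensor([[2.0, 1.0, 0.1]])   # raw scores, shape (1, 3)

# One step: cross entropy directly on the raw scores.
ce = nn.CrossEntropyLoss()(logits, y)

# Two steps: log-softmax over the class dimension, then negative log-likelihood.
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, y)

# By hand: minus the log of the softmax probability of the true class.
manual = -torch.log(torch.softmax(logits, dim=1)[0, 0])

print(ce.item(), nll.item(), manual.item())
```

All three values should agree up to floating-point rounding, which is why applying softmax before ```CrossEntropyLoss``` would apply it twice.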


Getting the actual predictions

To get the predicted class, the ```max()``` function can be used, either on the softmax output or directly on the raw scores (softmax preserves the ordering of the scores, so the index of the maximum is the same). It takes the argument ```dim```, which specifies the axis/dimension along which the maximum is to be found. It returns a named tuple holding the maximum values in its ```values``` field and their corresponding indices in its ```indices``` field.

In [14]:
pred_good = torch.max(torch.softmax(y_pred_good, 1), 1)
pred_bad = torch.max(torch.softmax(y_pred_bad, 1), 1)
print(pred_good.indices)
print(pred_bad.indices)


tensor([0])
tensor([1])
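
The following sketch illustrates the two points made about ```max()```: the result can be unpacked like a regular tuple, and the predicted index does not depend on whether softmax was applied. Both predictions from the notebook are stacked into one batch; the names ```logits```, ```from_probs``` and ```from_logits``` are illustrative.

```python
import torch

# Both example predictions stacked into one batch of shape (2, 3).
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 1.0, 0.3]])

# torch.max with dim returns a named tuple that can also be unpacked.
result = torch.max(logits, dim=1)
values, indices = result

# The predicted class is the same with or without softmax.
from_probs = torch.argmax(torch.softmax(logits, dim=1), dim=1)
from_logits = torch.argmax(logits, dim=1)

print(indices)
print(torch.equal(from_probs, from_logits))
```

When only the indices are needed, ```torch.argmax``` avoids carrying the maximum values around.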


In a binary classification problem, when using ```BCELoss```, a sigmoid must be applied to the model's final output so that the loss receives probabilities. In a multi-class classification problem, when using ```CrossEntropyLoss```, softmax must not be applied to the final output, because the loss applies it internally.
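
A minimal sketch of the binary case, assuming a single hypothetical raw output of 0.8 for a positive sample (these numbers are made up for illustration). It also compares against ```nn.BCEWithLogitsLoss```, which is not mentioned in the notebook but folds the sigmoid into the loss in the same way ```CrossEntropyLoss``` folds in the softmax.

```python
import torch
import torch.nn as nn

logit = torch.tensor([0.8])    # hypothetical raw model output for one sample
target = torch.tensor([1.0])   # BCELoss expects float targets in [0, 1]

# BCELoss expects probabilities, so sigmoid is applied first.
prob = torch.sigmoid(logit)
bce = nn.BCELoss()(prob, target)

# BCEWithLogitsLoss takes the raw output and applies sigmoid internally.
bce_logits = nn.BCEWithLogitsLoss()(logit, target)

print(bce.item(), bce_logits.item())
```

The two losses should match up to rounding; passing the raw output to ```BCELoss``` without the sigmoid would be a mistake, since values outside [0, 1] are not valid probabilities.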

### Activation Functions

PyTorch provides the standard activation functions. These are available either as modules in ```torch.nn``` or as functions directly in ```torch```:
- ```torch.nn.Sigmoid``` or ```torch.sigmoid```
- ```torch.nn.Softmax``` or ```torch.softmax```
- ```torch.nn.ReLU``` or ```torch.relu```
- ```torch.nn.Tanh``` or ```torch.tanh```
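
As a small illustration of the two forms listed above, the sketch below applies ReLU once as a module and once as a function. The input values are arbitrary. The module form is typically instantiated once (for example in a model's constructor) and then called, while the function form is called directly.

```python
import torch
import torch.nn as nn

x = torch.tensor([-1.0, 0.0, 2.0])

# Module form: create the layer object, then call it on the input.
relu_module = nn.ReLU()
out_module = relu_module(x)

# Function form: call directly on the tensor.
out_function = torch.relu(x)

print(out_module)
print(torch.equal(out_module, out_function))
```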

However, for certain activation functions the functional form is documented under ```torch.nn.functional``` rather than directly under ```torch```, such as:
- ```torch.nn.LeakyReLU``` or ```torch.nn.functional.leaky_relu```
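
A brief sketch of the LeakyReLU case, using the module from ```torch.nn``` and the function from ```torch.nn.functional```. The input values and the slope of 0.01 (which is also the default) are chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([-2.0, 0.0, 3.0])

# Module form from torch.nn.
out_module = nn.LeakyReLU(negative_slope=0.01)(x)

# Function form from torch.nn.functional.
out_function = F.leaky_relu(x, negative_slope=0.01)

# Negative inputs are scaled by the slope instead of being zeroed out.
print(out_module)
print(torch.equal(out_module, out_function))
```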