# Loss Function

For our problem, the output from the model has shape:

$[N, C]$

Our labels are one-hot encoded and have the same shape:

$[N, C]$

where $N$ is the number of samples, $C$ is the number of classes, and each label entry is either 0 or 1.
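As a small illustration of the label layout, the sketch below builds one-hot targets with `torch.nn.functional.one_hot`. The class indices and the choice of three classes are made up for the example and are not taken from the real dataset.

```python
import torch
import torch.nn.functional as F

# Hypothetical class indices for N = 4 samples and C = 3 classes
class_idx = torch.tensor([0, 2, 1, 2])

# one_hot returns an int64 tensor of shape [N, C]; BCELoss needs floats
labels = F.one_hot(class_idx, num_classes=3).float()

print(labels.shape)  # torch.Size([4, 3])
print(labels)
```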

In PyTorch, torch.nn.BCELoss() implements [Binary Cross Entropy Loss](https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html).

It expects the following shapes:

***

Input: $(N, *)$ where $*$ means any number of additional dimensions

Target: $(N, *)$, same shape as the input

Output: scalar. If reduction is 'none', then $(N, *)$, same shape as the input.

***
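To make the shape rules concrete, here is a minimal sketch that compares the default reduction with `reduction="none"` and checks both against binary cross entropy written out by hand. The tensors are random stand-ins, and the `[4, 3]` shape is an assumption chosen for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Made-up probabilities and 0/1 targets with shape [N, C] = [4, 3]
probs = torch.rand(4, 3).clamp(0.01, 0.99)
target = torch.randint(0, 2, (4, 3)).float()

# Default reduction='mean' returns a scalar
mean_loss = nn.BCELoss()(probs, target)

# reduction='none' keeps the per-element losses, same shape as the input
elementwise = nn.BCELoss(reduction="none")(probs, target)

# Binary cross entropy written out by hand
manual = -(target * probs.log() + (1 - target) * (1 - probs).log())

print(mean_loss.shape)    # torch.Size([])
print(elementwise.shape)  # torch.Size([4, 3])
print(torch.allclose(elementwise, manual))
print(torch.allclose(mean_loss, manual.mean()))
```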

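BCELoss expects probabilities, so we apply the sigmoid function to the model outputs to ensure they are in the range 0 to 1.

Alternatively, we could use nn.BCEWithLogitsLoss(), which applies the sigmoid internally and takes the raw outputs (logits) instead.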
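The PyTorch loss that folds the sigmoid in is `nn.BCEWithLogitsLoss`. The short sketch below, using made-up logits and targets, suggests that the two routes give the same value.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Made-up raw model outputs (logits) and 0/1 targets, shape [N, C] = [4, 3]
logits = torch.randn(4, 3)
target = torch.randint(0, 2, (4, 3)).float()

# Option 1: apply the sigmoid ourselves, then BCELoss
loss_a = nn.BCELoss()(torch.sigmoid(logits), target)

# Option 2: pass the logits straight to BCEWithLogitsLoss
loss_b = nn.BCEWithLogitsLoss()(logits, target)

print(loss_a.item(), loss_b.item())
```

The PyTorch documentation describes the combined version as more numerically stable. If it is used, the model should return raw logits, so a final sigmoid layer would have to be dropped.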

[A good example](https://jbencook.com/cross-entropy-loss-in-pytorch/)

In [22]:
# imports
import torch
from torch import nn
from torchvision.models import resnet18

## Accuracy

In [13]:
correct = 0

In [14]:
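# pred and y: a [1, 3] batch of predictions and targets, assumed defined earlier; count entries whose 0.5-thresholded prediction matches the label
correct += torch.sum((pred >= 0.5).float() == y.float()).item()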

In [15]:
correct

2

In [16]:
correct / (1*3)

0.6666666666666666

In [17]:
# fake outputs from model and target
pred2 = torch.tensor([[0.5, 0., 1.]])
y2 = torch.tensor([[1., 1., 0.]])
correct += torch.sum((pred2 >= 0.5).float() == y2.float()).item()

In [18]:
correct / (2*3)

0.5
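The running count above can be wrapped in a small helper. This is only a sketch: the name `multilabel_accuracy` and the `threshold` argument are hypothetical and not part of PyTorch.

```python
import torch

def multilabel_accuracy(pred, y, threshold=0.5):
    """Fraction of label entries predicted correctly after thresholding."""
    hits = ((pred >= threshold).float() == y.float()).sum().item()
    return hits / y.numel()

# Same fake batch as in the cell above
pred2 = torch.tensor([[0.5, 0., 1.]])
y2 = torch.tensor([[1., 1., 0.]])

acc = multilabel_accuracy(pred2, y2)
print(acc)  # 1 of the 3 entries matches
```

Dividing by `y.numel()` counts every label entry, which is the same denominator as the `batches * 3` used above.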

# Fake Model

In [47]:
class NeuralNetwork(nn.Module):
    def __init__(self, NUM_CLASSES, DROPOUT_RATE):
        super(NeuralNetwork, self).__init__()
        self.convolutions = nn.Sequential(*(list(resnet18().children())[0:-1]))
        self.dropout = nn.Dropout(DROPOUT_RATE)
        self.dense = nn.Linear(512, NUM_CLASSES)
        self.out = nn.Sigmoid()

    def forward(self, X):
        batch_size = X.shape[0]
        # Extracts a 512-dim feature vector per image from the resnet18 conv layers
        # (resnet18() is built without pretrained weights here)
        x = self.convolutions(X).reshape(batch_size, -1)
        # Fully connected dense layer to NUM_CLASSES outputs (19 in this notebook)
        output = self.dense(self.dropout(x))
        # Sigmoid activations on output to infer class probabilities
        output_probs = self.out(output)
        return output_probs

In [57]:
model = NeuralNetwork(19, 0.1)
X = torch.randn([5, 3, 512, 512])

pred = model(X)
pred.shape

torch.Size([5, 19])

In [58]:
pred

tensor([[0.6489, 0.5964, 0.3269, 0.5840, 0.6586, 0.4613, 0.4889, 0.4423, 0.5369,
         0.6509, 0.4612, 0.5653, 0.2981, 0.5687, 0.5420, 0.5841, 0.3994, 0.5663,
         0.4315],
        [0.6424, 0.5848, 0.4159, 0.5126, 0.6514, 0.4954, 0.4241, 0.3870, 0.5765,
         0.5841, 0.4682, 0.5380, 0.3206, 0.5519, 0.5673, 0.6103, 0.3598, 0.4929,
         0.4171],
        [0.6688, 0.5651, 0.3660, 0.5471, 0.6974, 0.3379, 0.4272, 0.4477, 0.6643,
         0.5606, 0.4750, 0.5422, 0.3074, 0.5331, 0.5420, 0.5435, 0.3296, 0.5959,
         0.4321],
        [0.6132, 0.6145, 0.3807, 0.5609, 0.6192, 0.4404, 0.4685, 0.4996, 0.6604,
         0.5599, 0.4437, 0.5957, 0.2183, 0.4819, 0.4476, 0.5802, 0.3807, 0.5231,
         0.4161],
        [0.6484, 0.6554, 0.3867, 0.5343, 0.5818, 0.3799, 0.5081, 0.3948, 0.6358,
         0.6064, 0.4788, 0.5642, 0.2464, 0.5235, 0.5132, 0.5608, 0.3935, 0.5355,
         0.4068]], grad_fn=<SigmoidBackward>)

# Fake Loss Function
Using the fake model, let's get the loss function working on fake data.

In [67]:
# Loss function
loss_fn = nn.BCELoss()

In [64]:
# First we have to make fake Ground Truths
ground_truth = torch.full((5, 19), 1)
ground_truth.shape

torch.Size([5, 19])

In [65]:
ground_truth

tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])

In [70]:
# compute loss 
loss = loss_fn(pred, ground_truth.float())
loss

tensor(0.7060, grad_fn=<BinaryCrossEntropyBackward>)
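As a sanity check on that number: when every target is 1, binary cross entropy reduces to the mean of $-\log(p)$. The sketch below shows this with random stand-in probabilities rather than the real model output.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for the model output: probabilities with shape [5, 19]
fake_pred = torch.rand(5, 19).clamp(0.05, 0.95)
all_ones = torch.ones(5, 19)

loss = nn.BCELoss()(fake_pred, all_ones)

# With every target equal to 1, only the log(p) term survives
by_hand = -fake_pred.log().mean()

print(loss.item(), by_hand.item())
```

Predictions hovering around 0.5 give a loss near 0.69 (that is, $-\ln 0.5$), which fits the value reported for the untrained model above.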

We can then call loss.backward() to compute the gradient of the loss with respect to each weight.

<code>loss.backward() computes dloss/dx for every parameter x which has requires_grad=True. These are accumulated into x.grad for every parameter x. In pseudo-code:

x.grad += dloss/dx

optimizer.step updates the value of x using the gradient x.grad. For example, the SGD optimizer performs:

x += -lr * x.grad

optimizer.zero_grad() clears x.grad for every parameter x in the optimizer. It’s important to call this before loss.backward(), otherwise you’ll accumulate the gradients from multiple passes.</code>
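The SGD update in the pseudo-code can be checked on a single parameter. This is a toy sketch: the parameter `x`, the loss $(3x)^2$ and the learning rate are all made up for illustration.

```python
import torch

# A single hypothetical parameter and a toy loss: loss = (3 * x)^2
x = torch.tensor([1.0], requires_grad=True)
lr = 0.1
opt = torch.optim.SGD([x], lr=lr)

opt.zero_grad()
loss = ((3 * x) ** 2).sum()
loss.backward()           # dloss/dx = 18 * x = 18 at x = 1
grad = x.grad.clone()

# x += -lr * x.grad, computed by hand before the optimizer runs
expected = x.detach().clone() - lr * grad
opt.step()

print(grad)        # tensor([18.])
print(x.detach())  # matches `expected`
```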

# Fake Optimiser

N.B. In PyTorch, we need to set the gradients to zero before starting backpropagation because PyTorch accumulates the gradients on subsequent backward passes. This is convenient when training RNNs. So, the default action is to accumulate (i.e. sum) the gradients on every loss.backward() call.

Because of this, you should zero out the gradients at the start of each training step so that the parameter update is done correctly. Otherwise the gradient would be a sum of old and new gradients and would point in some other direction than the intended direction towards the minimum (or maximum, in case of maximization objectives).
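The accumulation behaviour is easy to see on one weight. A minimal sketch, using a made-up parameter `w` and the toy loss $w^2$:

```python
import torch

w = torch.tensor([2.0], requires_grad=True)

# First backward pass: d(w^2)/dw = 2w = 4
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added on top
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# Zeroing first gives the gradient of the latest pass only
w.grad.zero_()
(w ** 2).sum().backward()
fresh = w.grad.clone()

print(first, accumulated, fresh)  # tensor([4.]) tensor([8.]) tensor([4.])
```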

[Why Choose Adam](https://debuggercafe.com/adam-algorithm-for-deep-learning-optimization/)

In [77]:
# init
optim = torch.optim.Adam(model.parameters(), lr=0.0001) # we could put a scheduler here.
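The comment above mentions a scheduler. One possible way to attach one is sketched below, assuming `torch.optim.lr_scheduler.StepLR` and a made-up schedule that halves the learning rate every epoch. The tiny linear model and the fake loss are stand-ins, not the notebook's model.

```python
import torch
from torch import nn

# Tiny stand-in model; the notebook's model would be used in practice
tiny = nn.Linear(4, 2)
opt = torch.optim.Adam(tiny.parameters(), lr=0.0001)

# Assumed schedule: halve the learning rate after every epoch
scheduler = torch.optim.lr_scheduler.StepLR(opt, step_size=1, gamma=0.5)

lrs = []
for epoch in range(3):
    # One fake training step per epoch
    opt.zero_grad()
    loss = tiny(torch.randn(8, 4)).pow(2).mean()
    loss.backward()
    opt.step()

    lrs.append(opt.param_groups[0]["lr"])
    scheduler.step()  # called once per epoch, after the optimizer step

print(lrs)
```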

We use an optimizer such as Adam [because](https://deeplearningdemystified.com/article/fdl-4) we want to avoid certain traps on the way to minimal loss (for example local optima), which it does by changing how individual weights are updated.

To use the optimiser we first initialize it (as above) and then, in the training loop, run:

<code>#Backpropagation
optimizer.zero_grad()
loss.backward()
optimizer.step()</code>
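Put together with a forward pass, one complete training step could look like the sketch below. The small `nn.Sequential` network and the random tensors are assumptions standing in for the real model and data.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Tiny stand-in for the notebook's model: 4 features -> 3 class probabilities
net = nn.Sequential(nn.Linear(4, 3), nn.Sigmoid())
loss_fn = nn.BCELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.0001)

X = torch.randn(8, 4)                      # fake inputs
y = torch.randint(0, 2, (8, 3)).float()    # fake 0/1 targets

before = [p.detach().clone() for p in net.parameters()]

# Forward pass
loss = loss_fn(net(X), y)

# Backpropagation
optimizer.zero_grad()
loss.backward()
optimizer.step()

# At least one parameter should have moved after the step
changed = any(
    not torch.equal(b, p.detach()) for b, p in zip(before, net.parameters())
)
print(loss.item(), changed)
```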

# Fake Training Loop
Using fake inputs and predictions, I am going to show what the training loop should look like.

In [79]:
def train_loop(model, optim, dataloader, loss_fn, USE_GPU=False):
    # How long is our dataset
    size = len(dataloader.dataset)
    
    # Firstly set model to training
    model.train()
    # if we are using a GPU send the model to device
    if USE_GPU:
        model = model.cuda()
    
    # Logging and Stats
    loss_log = list()
    
    # Make sure gradient tracking is enabled
    with torch.set_grad_enabled(True):
        for batch_num, (X, y) in enumerate(dataloader):
            # The batch has to live on the same device as the model
            if USE_GPU:
                X, y = X.cuda(), y.cuda()
            
            # Compute prediction and loss (BCELoss needs float targets)
            pred = model(X)
            loss = loss_fn(pred, y.float())
            
            # Backpropagation
            optim.zero_grad()
            loss.backward()
            optim.step()
            
            # Logging and Stats
            if batch_num % 100 == 0:
                loss_value, current = loss.item(), batch_num * len(X)
                loss_log.append(loss_value)
                print(f"loss: {loss_value:>7f}  [{current:>5d}/{size:>5d}]")
                
    return loss_log
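The loop above needs a dataloader to iterate over. Below is a minimal sketch of a fake one, assuming `TensorDataset` and small made-up sizes (32x32 images rather than the 512x512 used earlier, so that it runs quickly).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Assumed sizes, kept small so the example runs quickly
N, C = 10, 19
fake_X = torch.randn(N, 3, 32, 32)              # fake images
fake_y = torch.randint(0, 2, (N, C)).float()    # fake 0/1 labels, shape [N, C]

dataloader = DataLoader(TensorDataset(fake_X, fake_y), batch_size=4)

print(len(dataloader.dataset))  # 10 samples
print(len(dataloader))          # 3 batches: 4 + 4 + 2
for X, y in dataloader:
    print(X.shape, y.shape)
```

A dataloader built this way yields `(X, y)` pairs, which is what the loop unpacks, and `len(dataloader.dataset)` gives the `size` used for logging.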

# Fake Testing Loop

In [None]:
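def test_loop(model, optim, dataloader, loss_fn, USE_GPU=False):
    # optim is unused here; it is kept so the signature matches train_loop
    num_batches = len(dataloader)
    test_loss, correct, total = 0, 0, 0

    # Switch off dropout and use batch norm running statistics
    model.eval()
    if USE_GPU:
        model = model.cuda()

    with torch.no_grad():
        for X, y in dataloader:
            if USE_GPU:
                X, y = X.cuda(), y.cuda()
            pred = model(X)
            test_loss += loss_fn(pred, y.float()).item()
            # Threshold at 0.5 and count matching label entries, as in the Accuracy section
            correct += ((pred >= 0.5).float() == y.float()).sum().item()
            total += y.numel()

    # loss_fn already averages within a batch, so average over batches
    test_loss /= num_batches
    correct /= total
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")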