# Loss functions and optimizers


## Basics

A network needs more than a mapping from input to output: it also needs an objective to learn toward. We define this learning objective as a function that accepts two arguments:

1. The network's output
2. The desired output

The loss function's job is to return a single number, the *loss value*, that measures how far the network's prediction is from the desired output. The smaller the loss, the closer the prediction.

Using the loss value, we calculate the gradients of the loss with respect to the network's parameters and adjust those parameters to decrease the loss value.
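As a minimal sketch of this idea, the snippet below uses a hypothetical one-parameter "network" (`prediction = w * x`, not from the original notes) to show a loss function returning a single number, the gradient of that number with respect to the parameter, and one manual adjustment that lowers the loss. The data and the learning rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical one-parameter "network": prediction = w * x
w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([3.0, 6.0, 9.0])    # desired output (generated with w = 3)

loss_fn = nn.MSELoss()
prediction = w * x                        # the network's output
loss = loss_fn(prediction, target)        # a single scalar tensor
loss.backward()                           # fills w.grad with d(loss)/dw

learning_rate = 0.01                      # illustrative value
with torch.no_grad():
    w_new = w - learning_rate * w.grad    # one manual gradient-descent step

new_loss = loss_fn(w_new * x, target)     # loss after the adjustment
print(loss.item(), w.grad.item(), new_loss.item())
```

The gradient is negative here, so subtracting it moves `w` up toward 3 and the loss drops. In practice an optimizer performs this update for every parameter.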

## Loss functions

Loss functions reside in the PyTorch module `torch.nn` (usually imported as `nn`). The most common ones are listed below.

`nn.MSELoss`: Mean squared error between the two arguments; the standard loss for regression problems

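`nn.BCELoss` and `nn.BCEWithLogitsLoss`: Binary cross-entropy loss, used when the output is a single probability value (binary classification). `nn.BCELoss` expects probabilities (for example, the output of a sigmoid), while `nn.BCEWithLogitsLoss` expects raw scores and applies the sigmoid internally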

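`nn.CrossEntropyLoss` and `nn.NLLLoss`: Maximum likelihood criteria used in multi-class classification problems. `nn.CrossEntropyLoss` expects raw scores for each class, while `nn.NLLLoss` expects log-probabilities (for example, the output of `log_softmax`)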
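The sketch below calls each of the losses listed above on small hand-made tensors (the values are arbitrary illustrations). It also shows that the paired losses agree once their inputs are in the form each one expects.

```python
import torch
import torch.nn as nn

# Regression: mean squared error between prediction and target
mse_loss = nn.MSELoss()(torch.tensor([2.5, 0.0]), torch.tensor([3.0, 0.0]))

# Binary classification: one raw score (logit) per sample, float targets in {0, 1}
logits = torch.tensor([0.8, -1.2, 2.0])
targets = torch.tensor([1.0, 0.0, 1.0])
bce_from_probs = nn.BCELoss()(torch.sigmoid(logits), targets)   # needs probabilities
bce_from_logits = nn.BCEWithLogitsLoss()(logits, targets)       # applies sigmoid itself

# Multi-class classification: one row of class scores per sample, integer class targets
class_logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
class_targets = torch.tensor([0, 2])
ce = nn.CrossEntropyLoss()(class_logits, class_targets)         # takes raw scores
nll = nn.NLLLoss()(torch.log_softmax(class_logits, dim=1), class_targets)

print(mse_loss.item())
print(torch.allclose(bce_from_probs, bce_from_logits))
print(torch.allclose(ce, nll))
```

Working from raw scores (`nn.BCEWithLogitsLoss`, `nn.CrossEntropyLoss`) is generally the more numerically stable choice, since the sigmoid or softmax is folded into the loss computation.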

## Optimizers

An optimizer's goal is to take the gradients of the model parameters and change those parameters in order to decrease the loss value.

Optimizers reside in `torch.optim`. Common ones are:

`SGD`: Vanilla stochastic gradient descent with an optional momentum extension

`RMSprop`: An optimizer proposed by Geoffrey Hinton that scales each parameter's step by a running average of its squared gradients

`Adagrad`: An adaptive gradient optimizer

`Adam`: A popular optimizer that combines ideas from `RMSprop` and `Adagrad`
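A minimal sketch of how these optimizers are constructed and used follows. The tiny `nn.Linear` model, the learning rates, and the momentum value are illustrative assumptions, not recommendations. Each optimizer receives the parameters it should update, and `step()` applies the update from the gradients currently stored on those parameters.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
net = nn.Linear(4, 2)   # hypothetical tiny model, just to have parameters

# Every optimizer is built from the parameters it is responsible for
optimizers = {
    "SGD": optim.SGD(net.parameters(), lr=0.01, momentum=0.9),
    "RMSprop": optim.RMSprop(net.parameters(), lr=0.001),
    "Adagrad": optim.Adagrad(net.parameters(), lr=0.01),
    "Adam": optim.Adam(net.parameters(), lr=0.001),
}

optimizer = optimizers["SGD"]
before = net.weight.detach().clone()

loss = net(torch.ones(3, 4)).pow(2).mean()   # arbitrary scalar to differentiate
loss.backward()                              # populate .grad on the parameters
optimizer.step()                             # update the parameters in place
optimizer.zero_grad()                        # clear gradients for the next step

print(torch.equal(before, net.weight))       # the weights have changed
```

Switching optimizers only changes the constructor line; the `backward()`, `step()`, `zero_grad()` sequence stays the same.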

## Blueprint of training loop

```
import torch

# net, loss_function, optimizer, data and iterate_batches are assumed to be defined elsewhere
for batch_x, batch_y in iterate_batches(data, batch_size=32):  #1
    batch_x_t = torch.tensor(batch_x)                          #2
    batch_y_t = torch.tensor(batch_y)                          #3
    out_t = net(batch_x_t)                                     #4
    loss_t = loss_function(out_t, batch_y_t)                   #5
    loss_t.backward()                                          #6
    optimizer.step()                                           #7
    optimizer.zero_grad()                                      #8
```

One full pass over the dataset is called an *epoch*. The numbered comments in the loop correspond to these steps:

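1. Iterate over the dataset in batches of a fixed size
2. Convert the batch of inputs (`batch_x`) to a tensor
3. Convert the batch of targets (`batch_y`) to a tensor
4. Feed the input batch through the network
5. Pass the network's output and the desired output to the loss function
6. Calculate the gradients by calling `backward()` on the loss. Autograd walks the computation graph recorded during the forward pass and computes the gradient for every leaf tensor that requires gradients. Gradients accumulate in each tensor's `.grad` attribute
7. Apply the gradients: the optimizer uses them to update the parameters
8. Reset the gradients to zero so they do not accumulate into the next batch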
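The blueprint can be made runnable with a few stand-ins. In the sketch below, the synthetic data (`y = 2x + 1` plus noise), the `iterate_batches` helper, the single-layer model, and all hyperparameters are assumptions chosen for illustration; only the body of the inner loop follows the steps above.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic regression data stored as plain Python lists (illustrative only)
xs_t = torch.rand(256, 1) * 2 - 1
ys_t = 2.0 * xs_t + 1.0 + 0.01 * torch.randn(256, 1)
data = (xs_t.tolist(), ys_t.tolist())

def iterate_batches(data, batch_size):
    """Hypothetical helper: yield consecutive (x, y) slices of the dataset."""
    x_all, y_all = data
    for start in range(0, len(x_all), batch_size):
        yield x_all[start:start + batch_size], y_all[start:start + batch_size]

net = nn.Linear(1, 1)
loss_function = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(20):                          # 20 full passes over the data
    total, batches = 0.0, 0
    for batch_x, batch_y in iterate_batches(data, batch_size=32):
        batch_x_t = torch.tensor(batch_x)
        batch_y_t = torch.tensor(batch_y)
        out_t = net(batch_x_t)
        loss_t = loss_function(out_t, batch_y_t)
        loss_t.backward()
        optimizer.step()
        optimizer.zero_grad()
        total += loss_t.item()
        batches += 1
    epoch_losses.append(total / batches)         # mean batch loss for this epoch

print(epoch_losses[0], epoch_losses[-1])
print(net.weight.item(), net.bias.item())
```

Tracking the mean loss per epoch is a simple way to confirm that training is making progress; here the learned weight and bias should end up close to the 2 and 1 used to generate the data.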