# Common PyTorch Mistakes and Ways to Mitigate Them

This guide lists common PyTorch mistakes and how to avoid them, so that you do not waste time when training neural networks.

## 1. You did not overfit a single batch at first

The logic here is simple: if your model cannot even overfit a single batch, it will not train properly on the full dataset.

    BATCH_SIZE = 1

    # Grab one batch from the train loader and reuse it for every epoch
    data, targets = next(iter(train_loader))

    for epoch in range(num_epochs):
        # Comment out the line below so that training only sees the single batch above
        # for batch_idx, (data, targets) in enumerate(train_loader):
        ...  # the usual forward pass, loss, backward pass, and optimizer step
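
The single-batch check can be sketched end to end as follows. This is a minimal sketch under stated assumptions: the random `data` and `targets` tensors stand in for one batch from a real train loader, and the small model, learning rate, and step count are illustrative choices, not values from this guide.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for one batch taken from a train loader
data = torch.randn(8, 20)
targets = torch.randint(0, 3, (8,))

# Illustrative model and optimizer (assumptions, not prescribed by the guide)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

initial_loss = criterion(model(data), targets).item()

# Train repeatedly on the same batch; the loss should collapse toward zero
for step in range(300):
    optimizer.zero_grad()
    loss = criterion(model(data), targets)
    loss.backward()
    optimizer.step()

final_loss = criterion(model(data), targets).item()
print(f"loss on the single batch: {initial_loss:.4f} -> {final_loss:.4f}")
```

If the loss refuses to approach zero on a batch this small, the problem is most likely in the model or the training loop rather than in the data volume.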



## 2. You forgot to set training or evaluation mode

Before checking the model's accuracy, switch it to evaluation mode with *model.eval()*, and switch back with *model.train()* before you resume training.

In evaluation mode, dropout is disabled and batch normalization uses its running statistics instead of per-batch statistics. Skipping this step typically gives worse and noisier accuracy numbers.
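
A small sketch of the mode switch, assuming a toy model with a dropout layer. The model and input are hypothetical, and wrapping evaluation in `torch.no_grad()` is an extra habit added here, not something this guide prescribes.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy model containing dropout
model = nn.Sequential(nn.Linear(10, 10), nn.Dropout(p=0.5))
x = torch.randn(4, 10)

model.eval()  # dropout is disabled in evaluation mode
with torch.no_grad():
    out_a = model(x)
    out_b = model(x)

# With dropout disabled, two forward passes on the same input agree exactly
eval_outputs_match = torch.equal(out_a, out_b)
print(eval_outputs_match)

model.train()  # switch back before resuming training
```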

## 3. You forgot to use .zero_grad()

Call *optimizer.zero_grad()* before *loss.backward()*. If you do not zero the gradients, they accumulate across batches, so each update uses the summed gradients of all previous batches rather than just the current one.
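
The accumulation is easy to see on a single parameter. This is a minimal sketch with a hypothetical one-element weight `w` and a toy loss of `2 * w`, whose gradient is always 2.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# Two backward passes without zeroing in between: the gradients add up
(2 * w).sum().backward()
(2 * w).sum().backward()
accumulated = w.grad.clone()  # tensor([4.])

# Zeroing first gives the gradient of the current pass only
optimizer.zero_grad()
(2 * w).sum().backward()
fresh = w.grad.clone()  # tensor([2.])

print(accumulated, fresh)
```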

## 4. Using Softmax with CrossEntropy

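Do not put Softmax in the output layer when training with CrossEntropy loss. *nn.CrossEntropyLoss()* already combines *nn.LogSoftmax()* and *nn.NLLLoss()* in one single class, so it expects raw logits, and an extra Softmax layer means softmax is effectively applied twice.

The model will usually still train, often with only a slight drop in accuracy, but it is still something that should be prevented.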
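
The sketch below compares the loss on raw logits with the loss after an extra softmax. The random `logits` and the `targets` tensor are hypothetical placeholders for a model's output and labels.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical raw model outputs (logits) for 4 samples and 5 classes
logits = torch.randn(4, 5)
targets = torch.tensor([0, 2, 1, 4])

criterion = nn.CrossEntropyLoss()

# Correct usage: pass raw logits
loss_correct = criterion(logits, targets)

# Equivalent manual form: log-softmax followed by negative log-likelihood
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# The mistake: softmax in the model, then CrossEntropyLoss on top of it
loss_double = criterion(F.softmax(logits, dim=1), targets)

print(loss_correct.item(), loss_manual.item(), loss_double.item())
```

The first two values agree, while the third differs because the probabilities are squashed a second time before the loss is computed.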

## 5. Using bias when using BatchNorm

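Set **bias=False** on the convolutional or linear layer that feeds directly into a BatchNorm layer. The bias is unnecessary there because BatchNorm subtracts the mean, which cancels any constant offset, and BatchNorm has its own learnable shift parameter.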
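
One way to convince yourself is to compare two convolutions that share weights, one with a bias and one without, followed by the same BatchNorm layer in training mode. The layer sizes and input shape below are arbitrary assumptions chosen for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

x = torch.randn(8, 3, 16, 16)  # hypothetical batch of images

conv_bias = nn.Conv2d(3, 6, kernel_size=3, padding=1, bias=True)
conv_no_bias = nn.Conv2d(3, 6, kernel_size=3, padding=1, bias=False)

with torch.no_grad():
    # Share the weights so that the bias is the only difference
    conv_no_bias.weight.copy_(conv_bias.weight)
    # Make the bias clearly nonzero
    conv_bias.bias.uniform_(-1.0, 1.0)

bn = nn.BatchNorm2d(6)  # training mode: normalizes with per-batch statistics

out_bias = bn(conv_bias(x))
out_no_bias = bn(conv_no_bias(x))

# The per-channel mean subtraction removes the bias, so the outputs agree
same_output = torch.allclose(out_bias, out_no_bias, atol=1e-4)
print(same_output)
```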

## 6. Using view() as permute

Using view() does not have the same effect as using permute(). view() keeps the elements in their original memory order and only regroups them to fit the requested shape, whereas permute() actually reorders the dimensions.
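
A tiny tensor makes the difference visible. In this sketch both results have shape (3, 2), but only permute() produces the transpose.

```python
import torch

x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])  # shape (2, 3)

# view() reads the elements in memory order and regroups them
viewed = x.view(3, 2)       # [[1, 2], [3, 4], [5, 6]]

# permute() swaps the axes, which here is a true transpose
permuted = x.permute(1, 0)  # [[1, 4], [2, 5], [3, 6]]

print(viewed)
print(permuted)
```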

## 7. Using bad data augmentation 

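Do not use augmentations that change what the target output should be. For example, applying RandomVerticalFlip and RandomHorizontalFlip with a probability of 1.0 to images of digits rotates every image by 180 degrees, so a 9 can end up looking like a 6 while its label stays 9.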
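
The sketch below illustrates why two flips applied with certainty can alter the meaning of an image. It uses plain `torch.flip` on a hypothetical 3x3 "image" as a stand-in for the torchvision transforms, on the assumption that a flip with probability 1.0 behaves like a flip along the height or width axis.

```python
import torch

# Hypothetical 3x3 single-channel "image", laid out as (channels, height, width)
image = torch.arange(9.0).reshape(1, 3, 3)

# A vertical flip followed by a horizontal flip, both always applied
flipped = torch.flip(image, dims=[1, 2])

# The same result as rotating the image by 180 degrees
rotated = torch.rot90(image, k=2, dims=[1, 2])

print(torch.equal(flipped, rotated))
```

For orientation-sensitive classes such as handwritten digits, a 180-degree rotation is not a label-preserving transform.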

## 8. Not shuffling the data 

For most datasets, set *shuffle=True* in the DataLoader used for training.

*The exception is time-series data, where you would not want to shuffle because the order matters when training the model.*
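
A short sketch of the two settings, using a hypothetical dataset of the integers 0 to 9 so that the ordering is easy to inspect.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical dataset: the integers 0..9
dataset = TensorDataset(torch.arange(10))

# shuffle=False keeps the original order (what you want for time series)
ordered_loader = DataLoader(dataset, batch_size=5, shuffle=False)
ordered = torch.cat([batch for (batch,) in ordered_loader])

# shuffle=True draws a new random order at the start of every epoch
shuffled_loader = DataLoader(dataset, batch_size=5, shuffle=True)
shuffled = torch.cat([batch for (batch,) in shuffled_loader])

print(ordered.tolist())
print(shuffled.tolist())
```

Every sample still appears exactly once per epoch when shuffling; only the order changes.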

## 9. Not Normalizing the data

When building transforms.Compose(), include *transforms.Normalize* with the mean and standard deviation computed from your own data.
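
The per-channel statistics can be computed directly from the training images. This is a sketch that assumes the whole dataset fits in one tensor of shape (N, C, H, W); the random `images` tensor is a hypothetical placeholder. The resulting `mean` and `std` are the values you would pass to *transforms.Normalize*.

```python
import torch

torch.manual_seed(0)

# Hypothetical dataset of 100 RGB images, deliberately not zero-centered
images = torch.rand(100, 3, 8, 8) * 5 + 2

# One mean and one standard deviation per channel
mean = images.mean(dim=(0, 2, 3))
std = images.std(dim=(0, 2, 3))

# Normalization subtracts the channel mean and divides by the channel std
normalized = (images - mean[None, :, None, None]) / std[None, :, None, None]

print(mean, std)
print(normalized.mean(dim=(0, 2, 3)), normalized.std(dim=(0, 2, 3)))
```

For datasets too large to hold in memory, the same statistics would have to be accumulated batch by batch instead.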

## 10. Not clipping the gradients (when using RNNs, LSTMs, GRUs)

Without gradient clipping, recurrent models can suffer from the exploding gradient problem.

Example:

    optimizer.zero_grad()
    loss.backward()
    # Clip the total gradient norm after backward() and before optimizer.step()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1)
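
A fuller sketch of where clipping sits in a training step. The LSTM, the linear `head`, and the deliberately large random inputs are hypothetical and only serve to produce sizeable gradients.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical recurrent model: an LSTM followed by a linear head
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
params = list(lstm.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)

# Large-valued inputs and targets, chosen to provoke large gradients
x = torch.randn(2, 50, 4) * 100
y = torch.randn(2, 1) * 100

optimizer.zero_grad()
output, _ = lstm(x)
loss = nn.functional.mse_loss(head(output[:, -1]), y)
loss.backward()

# clip_grad_norm_ rescales the gradients in place and returns the norm before clipping
norm_before = torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in params))

optimizer.step()

print(f"gradient norm: {norm_before.item():.4f} -> {norm_after.item():.4f}")
```

Note the trailing underscore: `clip_grad_norm_` is the in-place function, and it must be called after `loss.backward()` and before `optimizer.step()`.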

