# Pre-trained models

Why spend time designing, coding or training models yourself, when someone else has already done it for you?!

PyTorch's `torchvision` library has a `models` module which makes it easy to download state-of-the-art models, with or without pre-trained weights.

See the documentation for this module [here](https://pytorch.org/docs/stable/torchvision/models.html).

In [None]:
from torchvision import transforms, models

# Download VGG-11 with weights pre-trained on ImageNet
model = models.vgg11(pretrained=True)

We can use these models "off the shelf", without needing to train them, in the same way as the models that we usually build and train ourselves.

If for some reason we didn't want the pretrained weights, we could also set `pretrained=False` and just get back the model architecture.

Let's load in an image to pass through our model to make a prediction.

In [1]:
import torch
from PIL import Image
import torch.nn.functional as F

img = Image.open('plane.jpg')
img.show()

## Preparing your input


### Normalise your input

The classification models have all been trained on the [ImageNet](http://www.image-net.org/) dataset.

During this training procedure, each example was normalised before being passed to the model.
This means that our model is trained to process normalised examples. 
An unnormalised example will look very different to a normalised example, and the model will not know how to process it.
If you forget to normalise each input, your predictions will suck.

We do not need to compute the normalisation parameters for the ImageNet dataset ourselves; they're commonly found in the documentation and in other people's code.
These are the per-channel means and standard deviations of the pixel intensities in the red, green and blue channels, computed over the training images.
PyTorch has a transform for easily applying this normalisation.
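
To make the normalisation step concrete, here is a minimal sketch of the arithmetic that a per-channel normalisation performs: subtract the channel mean, then divide by the channel standard deviation. The 4x4 random "image" is a hypothetical stand-in for a real photo, and the statistics are the ImageNet values used later in this notebook.

```python
import torch

# ImageNet channel statistics (red, green, blue), as used later in this notebook.
mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])

# Hypothetical stand-in for an image tensor: 3 channels, 4x4 pixels, values in [0, 1].
torch.manual_seed(0)
image = torch.rand(3, 4, 4)

# Reshape the statistics to (3, 1, 1) so they broadcast over height and width.
normalised = (image - mean[:, None, None]) / std[:, None, None]

print(normalised.shape)
print(normalised[0, 0, 0], (image[0, 0, 0] - 0.485) / 0.229)  # same value, two ways
```

A pixel equal to its channel mean maps to 0, and a pixel one standard deviation above the mean maps to 1.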

### Resize your input 
All of the classification models consist of convolutional layers whose output is flattened and passed through linear layers.
The linear layers require a fixed-size input.
The output of the conv layers changes if its input size changes, so we'll need to resize (warp) each input image to the expected size (which depends on the model).
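
The dependence on input size can be seen with a small sketch. The two-layer `features` network and the `classifier` below are hypothetical miniatures, not part of any torchvision model; they only illustrate how the flattened size changes with the image size.

```python
import torch
from torch import nn

# Hypothetical miniature feature extractor.
features = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),  # keeps height and width
    nn.MaxPool2d(2),                            # halves height and width
)
# Linear layer sized for 32x32 inputs only: 8 channels * 16 * 16 = 2048 features.
classifier = nn.Linear(8 * 16 * 16, 10)

flat_sizes = {}
for size in (32, 64):
    batch = torch.zeros(1, 3, size, size)
    flat = torch.flatten(features(batch), start_dim=1)
    flat_sizes[size] = flat.shape[1]
    try:
        classifier(flat)
        print(size, flat.shape[1], "ok")
    except RuntimeError:
        print(size, flat.shape[1], "shape mismatch in the linear layer")
```

The 32x32 input flattens to 2048 features and passes through; the 64x64 input flattens to 8192 features and is rejected by the linear layer.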

We'll compose these transforms into a single transform as shown below.

In [None]:
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

img = transform(img)

What's the size of any input to a torch model?

Let me phrase that in a way that should make the answer more obvious...

What's the first dimension in any input to a torch model?

THE BATCH DIMENSION

The only thing our input image is missing now is the batch dimension.
We can add that by "unsqueezing" the tensor at dimension 0, which inserts a new dimension of size 1 at the front.

In [None]:
print(img.shape)
img = img.unsqueeze(0)
print(img.shape)

Now we can pass our input through the model. 
Before we do that, though, we need to put the model into evaluation mode. Some layers, such as dropout and batch normalisation, behave differently during training and evaluation, and calling `model.eval()` tells every layer that it is now being used for evaluation.

In [None]:
model.eval()
pred = model(img)
print(pred)
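
To see why evaluation mode matters, here is a minimal sketch using a standalone dropout layer, which is one of the layer types found in the VGG classifier. The input of ones is just an illustrative assumption.

```python
import torch
from torch import nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

dropout.train()
train_out = dropout(x)  # random elements are zeroed, survivors are scaled by 1 / (1 - p) = 2

dropout.eval()
eval_out = dropout(x)   # dropout is a no-op in evaluation mode

print(train_out)
print(eval_out)
```

In training mode the output is random, so the same image could give different predictions on every call. In evaluation mode the layer returns its input unchanged.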

When using any of the classification models, the output is a vector of logits (what we usually input to a softmax function to produce a probability distribution over classifications).
To find the predicted class, take the argmax of these logits.
This will be an integer index which represents a class.
To see the mapping between index and class, see [here](https://gist.github.com/ageitgey/4e1342c10a71981d0b491e1b8227328b).

In [None]:

# print(pred.shape)
pred = torch.argmax(pred, dim=1)
print(pred)
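
If you want probabilities or the top few candidates rather than a single index, one possible approach is sketched below. The 5-class `logits` tensor is a hypothetical stand-in for the model output (an ImageNet classifier would produce 1000 values per image).

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for a batch of 1 image over 5 classes.
logits = torch.tensor([[1.0, 3.0, 0.5, 2.0, -1.0]])

# Softmax turns logits into a probability distribution over classes.
probs = F.softmax(logits, dim=1)

# The 3 most likely classes and their probabilities, highest first.
top_probs, top_idx = torch.topk(probs, k=3, dim=1)

print(probs.sum(dim=1))  # sums to 1 for each image in the batch
print(top_idx)           # class indices, to be looked up in the index-to-class mapping
print(top_probs)
```

Softmax preserves the ordering of the logits, so the first entry of `top_idx` is the same index that `torch.argmax` returns.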


Try out some of the other classification models. Then try out some of the models for more complicated tasks like segmentation. 