# **DLIP Tutorial - PyTorch**
# Part 1: Inference using pre-trained model (classification)

Y.-K. Kim  (updated 2024. 5. 9)

This tutorial classifies an image using a pre-trained CNN model provided by PyTorch (torchvision).

The models were pre-trained on the **ImageNet** dataset (1000 classes).


## For Colab Usage

1. First, download this notebook.
2. Then, open it in Colab.

# Preparation

In [None]:
import torch
from torchvision import transforms
from torchvision import models
from torchsummary import summary

import cv2 as cv
from matplotlib import pyplot as plt
from PIL import Image



### GPU Setting

In [None]:
# Use the GPU if available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

print(f"Using {device} device")
if torch.cuda.is_available():
    print(f'Device name: {torch.cuda.get_device_name(0)}')

# Load a pre-trained model from TorchVision


Let's list the models and architectures available in the torchvision `models` module (see: https://pytorch.org/vision/stable/models.html).

In [None]:
dir(models)

Notice that there is one entry called AlexNet and one called alexnet. The capitalised name refers to the Python class (AlexNet) whereas alexnet is a convenience function that returns the model instantiated from the AlexNet class. These convenience functions can have different parameter sets. 

Similarly, `densenet121`, `densenet161`, `densenet169`, and `densenet201` all return instances of the `DenseNet` class, with 121, 161, 169, and 201 layers, respectively.

### Load Pretrained VGG-16
We use VGG-16 in this tutorial and check its architecture with `summary` (from the torchsummary package).

In [None]:
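# Load VGG-16 with ImageNet weights (`weights=` replaces the deprecated `pretrained=True` since torchvision 0.13)
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
model.eval()  # evaluation mode: disables dropout
model = model.to(device)  # works on both GPU and CPU, unlike model.cuda()

summary(model, (3, 224, 224), device=device)  # torchsummary defaults to device="cuda"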
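As a sanity check on the `summary` output, the VGG-16 parameter count can be reproduced by hand. This sketch assumes the standard VGG-16 layout (13 convolutions with 3x3 kernels, max pooling after each of the 5 blocks, then three fully connected layers, with torchvision's 7x7 adaptive average pooling in between); `vgg16_param_count` and `VGG16_CFG` are illustrative names, not part of torchvision.

```python
# Sketch: count VGG-16 parameters from its layer configuration.
# "M" marks a 2x2 max-pooling layer; numbers are conv output channels.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_param_count(num_classes=1000, input_size=224):
    """Return (total parameter count, spatial size after the last pooling layer)."""
    total = 0
    in_ch = 3  # RGB input
    spatial = input_size
    for v in VGG16_CFG:
        if v == "M":
            spatial //= 2  # 2x2 max pooling halves height and width
        else:
            total += (in_ch * 3 * 3 + 1) * v  # 3x3 kernel weights + one bias per filter
            in_ch = v
    # The classifier sees 512 channels x 7 x 7 (after torchvision's adaptive avg pooling)
    fc_sizes = [in_ch * 7 * 7, 4096, 4096, num_classes]
    for n_in, n_out in zip(fc_sizes[:-1], fc_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total, spatial


total, spatial = vgg16_param_count()
print(f"{total:,} parameters, final feature map {spatial}x{spatial}")
```

The total should match the "Total params" line printed by `summary`; about 89% of the parameters sit in the three fully connected layers.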

# Test image preparation

In this tutorial, we download a single test image from the following URL.

In [None]:
# Download an example image from URL
# (urllib.URLopener is Python 2 only; in Python 3 use urllib.request)
import urllib.request

url = "https://3.bp.blogspot.com/-W__wiaHUjwI/Vt3Grd8df0I/AAAAAAAAA78/7xqUNj8ujtY/s1600/image02.png"
filename = "test_image.png"  # the source file is a PNG
urllib.request.urlretrieve(url, filename)

# Show the image (OpenCV loads BGR, matplotlib expects RGB)

# from google.colab.patches import cv2_imshow
# img = cv.imread(filename)
# cv2_imshow(img)
img = cv.imread(filename)
dst = cv.cvtColor(img, cv.COLOR_BGR2RGB)
plt.imshow(dst)
plt.show()


# Inference using pretrained model

All pre-trained models expect input images normalized in the same way, i.e., mini-batches of 3-channel RGB images of shape (N x 3 x H x W), where N is the batch size and H and W are expected to be at least 224. The images have to be loaded into a range of [0, 1] and then normalized per channel using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225].
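To make the normalization concrete, here is a small worked example on single pixels. It assumes an 8-bit image, so `ToTensor` first divides each value by 255, after which `Normalize` applies (x - mean[c]) / std[c] to each channel c; `normalize_pixel` is an illustrative helper, not a torchvision function.

```python
# Per-channel normalization applied to one 8-bit RGB pixel:
#   x_norm = (x / 255 - mean[c]) / std[c]
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Map an 8-bit (R, G, B) pixel to the values the model actually receives."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD))


white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
print("white:", white)
print("black:", black)
```

After normalization, pixel values land roughly between -2.1 (black) and 2.6 (white), centered near 0 for "average" ImageNet colors.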

In [None]:
# Resize, center-crop, convert to a [0, 1] tensor, then normalize
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
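The geometry of `Resize(256)` followed by `CenterCrop(224)` can be sketched with plain arithmetic: an integer size scales the shorter side to 256 while keeping the aspect ratio, and the center crop then cuts a 224x224 window from the middle. The exact rounding below is an assumption modeled on torchvision's behavior and may differ slightly between versions; `resize_then_center_crop` is a hypothetical helper.

```python
def resize_then_center_crop(width, height, resize=256, crop=224):
    """Return ((resized_w, resized_h), (left, top, right, bottom)) for the crop box."""
    short, long_side = min(width, height), max(width, height)
    # Shorter side becomes `resize`; longer side scales proportionally (truncated)
    new_short, new_long = resize, int(resize * long_side / short)
    if width <= height:
        new_w, new_h = new_short, new_long
    else:
        new_w, new_h = new_long, new_short
    # Center the crop window inside the resized image
    left = int(round((new_w - crop) / 2.0))
    top = int(round((new_h - crop) / 2.0))
    return (new_w, new_h), (left, top, left + crop, top + crop)


print(resize_then_center_crop(640, 480))  # landscape image
print(resize_then_center_crop(500, 500))  # square image
```

For a 640x480 image, the resize step produces 341x256, and only the central 224x224 region is passed to the model, so content near the left and right edges is discarded.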

Here's a sample execution.

The model outputs 1000 unnormalized scores (logits), one per ImageNet class. Applying softmax turns them into probabilities that sum to 1.

In [None]:
# Sample execution (requires torchvision)
# Resize, center-crop to 224x224, and normalize
input_image = Image.open(filename).convert("RGB")  # PNGs may be RGBA or palette-based; the model expects 3 channels
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0)  # add a batch dimension: (3, 224, 224) -> (1, 3, 224, 224)

# Move the input (and the model) to the selected device
input_batch = input_batch.to(device)
model = model.to(device)

# Forward process
with torch.no_grad():
    output = model(input_batch)
# Tensor of shape (1, 1000): unnormalized scores over ImageNet's 1000 classes
#print(output[0])

# The output has unnormalized scores. To get probabilities, you can run a softmax on it.
probabilities = torch.nn.functional.softmax(output[0], dim=0)
print(probabilities)
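For reference, softmax can be written out in a few lines of plain Python. This is a sketch of the math that `torch.nn.functional.softmax` computes, using the common trick of subtracting the maximum score first so `exp` cannot overflow; the three scores are made-up values, not model output.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three classes
probs = softmax(logits)
print(probs, sum(probs))
```

Softmax preserves the ranking of the scores, so the class with the largest logit also has the largest probability.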

The output is a vector of 1000 probabilities, so to interpret it we need the list of class labels.

We load the labels from a text file that lists all 1000 ImageNet classes, one per line; the (zero-based) line index is the class index.

In [None]:
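# Download ImageNet labels
import urllib.request  # `import urllib` alone does not guarantee urllib.request is loaded

label_url = 'https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt'
label_file = 'imagenet_classes.txt'  # separate name so the image `filename` is not overwritten
urllib.request.urlretrieve(label_url, label_file)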
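The mapping between line number and class index can be illustrated without downloading anything. This sketch uses an in-memory stand-in for `imagenet_classes.txt` with only three lines (an assumption for illustration; the real file has 1000), and builds both directions of the lookup.

```python
import io

# Illustrative stand-in for imagenet_classes.txt: one label per line
sample_file = io.StringIO("tench\ngoldfish\ngreat white shark\n")

# Line i (counting from 0) holds the label for class index i
categories = [line.strip() for line in sample_file]
index_of = {name: i for i, name in enumerate(categories)}

print(categories[1])      # label for class index 1
print(index_of["tench"])  # class index for a label
```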

Next, we find the index with the highest probability; this index is the predicted class.
In this tutorial, we print the top-5 classes together with their probabilities.

In [None]:
# Read the categories
with open("imagenet_classes.txt", "r") as f:
    categories = [s.strip() for s in f.readlines()]

# Show top 5 categories per image
top5_prob, top5_catid = torch.topk(probabilities, 5)
for i in range(top5_prob.size(0)):
    print(categories[top5_catid[i]], top5_prob[i].item())
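The top-k step is simple enough to sketch in plain Python, which may help clarify what `torch.topk` returns: the k largest values and their indices, which are then looked up in the label list (`torch.argmax` gives the single best index). The probabilities and labels below are made up for illustration, and `top_k` is a hypothetical helper.

```python
import heapq


def top_k(probabilities, categories, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    best = heapq.nlargest(k, range(len(probabilities)), key=probabilities.__getitem__)
    return [(categories[i], probabilities[i]) for i in best]


probs = [0.05, 0.60, 0.10, 0.25]       # made-up probabilities
labels = ["cat", "dog", "fox", "wolf"]  # made-up labels
print(top_k(probs, labels, k=2))
print("prediction:", top_k(probs, labels, k=1)[0][0])  # top-1 is the argmax
```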