LeViT
====

**LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference**
 * Paper: https://arxiv.org/abs/2104.01136

![LeViT Architecture](../assets/levit_architecture.png)

In [3]:
from PIL import Image
import timm
import torch


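# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load LeViT-128 with its pretrained ImageNet-1k weights.
model = timm.create_model(
    'levit_128.fb_dist_in1k', pretrained=True
)
# Switch to evaluation mode and move the weights to the selected device.
model.eval().to(device);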

# Get the model-specific preprocessing transforms (resize, normalization).
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(
    **data_config, is_training=False
)

In [5]:
image_path = "../samples/plants.jpg"
img = Image.open(image_path).convert("RGB")

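# Preprocess the image, add a batch dimension, and run a forward pass.
# Gradients are not needed for inference, so tracking is disabled.
with torch.no_grad():
    output = model(transforms(img).unsqueeze(0).to(device))

# Convert the logits to percentages and keep the five most likely classes.
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)

for i in range(5):
    print(f"Class: {top5_class_indices[0][i].item()}, Probability: {top5_probabilities[0][i].item():.2f}%")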


Class: 738, Probability: 62.17%
Class: 883, Probability: 27.89%
Class: 725, Probability: 0.65%
Class: 503, Probability: 0.47%
Class: 773, Probability: 0.46%
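
To make the post-processing above concrete, here is a minimal pure-Python sketch of what the softmax and top-k steps compute. It is not how PyTorch implements them, and the logits are made-up values for a hypothetical 6-class problem, not LeViT outputs; `softmax_percent` and `top_k` are illustrative helper names.

```python
import math


def softmax_percent(logits):
    """Numerically stable softmax, scaled to percentages."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [100.0 * e / total for e in exps]


def top_k(values, k):
    """Return (value, index) pairs for the k largest entries, largest first."""
    ranked = sorted(enumerate(values), key=lambda pair: pair[1], reverse=True)
    return [(value, index) for index, value in ranked[:k]]


# Hypothetical logits for a 6-class problem (not LeViT outputs).
example_logits = [0.5, 2.0, -1.0, 3.5, 0.0, 1.0]
for prob, idx in top_k(softmax_percent(example_logits), k=3):
    print(f"Class: {idx}, Probability: {prob:.2f}%")
```

Because softmax distributes 100% across all classes, the five probabilities printed above sum to 91.64%, and the remainder is spread over the other ImageNet-1k classes.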
