#### Mobilenet
- MobileNetV3Small (embedding generation using Cross Entropy Loss, 2048) -> validation accuracy is 60% after 2 rounds of  -> `mobilenet_v3_ft_epoch_6.pth`
- MobileNetV3Large (embedding generation using Cross Entropy Loss, 2048) -> validation accuracy is 65% -> after 1 round of ft -> `mobilenet_large_v3_ft_epoch_5.pth`

#### EfficientNet
- EfficientNet B3 (embedding generation using CE, 2048) -> validation accuracy is 67% -> after 2 rounds of ft -> `efficientnet_b3_ftft_epoch_3` ----Kaggle 64%
- EfficientNet V2 M (embedding generation using CE, 2048) -> validation accuracy is 64% -> after 2 rounds of ft -> `efficientnet_v2_m_ftft_epoch_3`
- EfficientNet B4 (embedding generation using CE, 2048) -> validation accuracy is 64.7% -> after 2 rounds of ft -> ``
- EfficientNet B4 with Aug (embedding generation using CE, 2048) -> validation accuracy is 57.7% -> after 2 rounds of ft
- EfficientNet B3 With AUG (embedding generation using CE, 4096) -> validation accuracy is 65% -> after 1 round of ft (second round didn't change anything) -> `efficientnet_b3_aug_ft_epoch_3`

#### Others
- Max Vit T (without AUG) -> 45-50% after ft
- Deep Encoder 52% after ft (WITH AUG)
- Convnext Tiny (AUG) 1000 embedding and  -> 52%
- Small(no AUG) 2048 embedding -> 50 (Convnexts converge very fast, but get stuck there)
- **ConvV4 (no AUG) 1024 embedding -> 78.6% -> `convv4_ft_5` -> 75% Kaggle**
- **ConvV3 (no AUG) 1024 embedding -> 78.4% (lost the 79.5) -> `convv3_ft_3` -> ?% Kaggle**
- Conv V5 2048 -> 62%
- **Conv V6 1024 -> 79.3% -> `convv6_ftft_epoch_3`**

#### Ensemble
- MobileNetv3 large, EfficientNet B3, EfficientNet b4 -> 70.9% accuracy on validation, 68.2% Kaggle
- MobileNetv3 large, EfficientNet B3, EfficientNet b4, efficient Net v2M -> 71.1 accuracy on validation, ? on Kaggle
- MobileNetv3 samll, MobileNetv3 large, EfficientNet V3, EfficientNet b4, efficient Net v2M -> 70.1%, mobilenet v3 small decreases efficiency
- MobileNetv3 large, EfficientNet B3, Efficient net B3 with Aug, EfficientNet b4, efficient Net v2M -> 73.7 accuracy on validation, 72 on Kaggle
- MobileNetv3 large, EfficientNet B3, Efficient net B3 with Aug, EfficientNet b4, efficient Net v2M, convv4 -> 79.15 accuracy on validation -> Kaggle ?
- MobileNetv3 large (65), EfficientNet B3 (67), Efficient net B3 with Aug (65), EfficientNet b4 (65), convv4 (79) -> 76.45 accuracy on validation
- EfficientNet B3 (67), Efficient net B3 with Aug (65), convv4 (70) -> 78.75 accuracy on validation
- EfficientNet B3 (67), Efficient net B3 with Aug (65), EfficientNet b4 (65), convv4 (79) -> 79.55 accuracy on validation **this is probably better than our current submission**
- EfficientNet B3 (67), Efficient net B3 with Aug (65), efficient Net v2M , convv4 (79) -> 79.3 accuracy on validation
- EfficientNet B3 (67), Efficient net B3 with Aug (65), efficient Net v2M (65), convv3, convv4 (79) -> 81.15% accuracy on validation (80.25 now)
- EfficientNet B3 (67), Efficient net B3 with Aug (65), EfficientNet b4  (65), convv3, convv4 (79) -> 80.5% accuracy on validation
- EfficientNet B3 (67), Efficient net B3 with Aug (65), convv3, convv4 (79) -> 79.9 accuracy on validation
- EfficientNet B3 (67), convv3, convv4 (79) -> 80.85% accuracy on validation
- Efficient net B3 with Aug (65), convv3, convv4 (79) -> 80.85%
- convv3, convv4 -> 79.55%
- **EfficientNet B3 (67), Efficient net B3 with Aug (65), efficient Net v2M  (65), convv3, convv4 (78), conv6 (79) -> 81.5% accuracy on validation**
- convv3, convv4 (78), conv6 (79) -> 81.45

In [1]:
import timm
import torch
import torch.optim as optim
import torchvision.utils as vutils
from torchvision import models
import torchvision.transforms as T
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
from torch.nn import functional as F
import numpy as np
import pandas as pd
import os
from PIL import Image

import math
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, accuracy_score, classification_report

device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

cuda


In [2]:
from custom_classifiers import ConvEncoderV3, ConvEncoderV4, ConvEncoderV5, ConvEncoderV6
from pytorch_classifiers import (
    EfficientNetB3,
    EfficientNetB4,
    ConvNextT, 
    MobileNetV3Small, 
    MobileNetV3Large, 
    SwinV2TEncoder,
    EfficientNetV2M,
    ensemble_majority_voting
)
from utils import imshow, CustomDataset, TestDataset

In [3]:
torch.cuda.empty_cache()
import gc
gc.collect()

0

### Data Setup

In [5]:
train_df = pd.read_csv('dataset/train.csv')
valid_df = pd.read_csv('dataset/val.csv')
test_df = pd.read_csv('dataset/test.csv')

print(f"Train: {train_df.shape}; Valid: {valid_df.shape}; Test: {test_df.shape}")

Train: (13000, 2); Valid: (2000, 2); Test: (5000, 1)


In [6]:
train_df.columns, train_df.head()

(Index(['Image', 'Class'], dtype='object'),
                                       Image  Class
 0  0be195e0-eb16-4f29-ac7c-196dec9da47d.png     79
 1  28045419-b3b2-415b-9085-b4d241944235.png     94
 2  b7078f35-d239-4dd6-babb-1af7be1b9364.png     79
 3  0f54f663-2953-432b-bdd4-9b9f7a78bfb9.png     23
 4  ba11dda2-37d7-4d28-8bbb-128d452a171c.png     88)

In [7]:
train_path = 'dataset/train_images/'
val_path = 'dataset/val_images/'
test_path = 'dataset/test_images/'

In [8]:
train_image_path = os.path.join(train_path,train_df['Image'][0])
print(train_image_path)

dataset/train_images/0be195e0-eb16-4f29-ac7c-196dec9da47d.png


In [9]:
image = Image.open(train_image_path)

In [10]:
train_image_names = train_df['Image'].tolist()
train_image_labels = train_df['Class'].tolist()
# print(train_image_names[:5], train_image_labels[:5])

train_image_paths = [os.path.join(train_path, image_name)for image_name in train_image_names]
print(train_image_paths[:5], train_image_labels[:5])

['dataset/train_images/0be195e0-eb16-4f29-ac7c-196dec9da47d.png', 'dataset/train_images/28045419-b3b2-415b-9085-b4d241944235.png', 'dataset/train_images/b7078f35-d239-4dd6-babb-1af7be1b9364.png', 'dataset/train_images/0f54f663-2953-432b-bdd4-9b9f7a78bfb9.png', 'dataset/train_images/ba11dda2-37d7-4d28-8bbb-128d452a171c.png'] [79, 94, 79, 23, 88]


In [11]:
val_image_names = valid_df['Image'].tolist()
val_image_labels = valid_df['Class'].tolist()

val_image_paths = [os.path.join(val_path, image_name)for image_name in val_image_names]
print(val_image_paths[:5], val_image_labels[:5])

['dataset/val_images/e91a8fbc-d3ba-4b39-8c2f-04c14de78e5e.png', 'dataset/val_images/7c40819b-c3ce-4a91-9e98-c3df11b63623.png', 'dataset/val_images/d54269d7-fe86-4112-9c0f-99cc6ab8d9c0.png', 'dataset/val_images/cbf9ac9e-0859-4b54-ae65-347587b45deb.png', 'dataset/val_images/6aafce3f-9002-44e0-9a99-ffe9b49c9bac.png'] [32, 85, 41, 97, 62]


In [12]:
test_image_names = test_df['Image'].tolist()
test_image_paths = [os.path.join(test_path, image_name)for image_name in test_image_names]
print(test_image_paths[:5])

['dataset/test_images/046f61c4-b825-459a-8b2d-07503f5b94a5.png', 'dataset/test_images/67db001f-e287-4950-ac49-6683b493d1a4.png', 'dataset/test_images/9f1d36a1-f046-4c5d-9e8a-0a3758ff605c.png', 'dataset/test_images/5ffef91a-aaf9-4d0d-a219-83a9f5282361.png', 'dataset/test_images/c00af570-0000-4f8f-a3f2-c37a981bfdb1.png']


### Data Loaders

In [13]:
# Define image transformations
transform = T.Compose([
    # T.RandomHorizontalFlip(p=0.5),
    # T.RandomVerticalFlip(p=0.5),
    # T.RandomRotation(30),
    # T.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5),
    T.ToTensor(),
    # T.Normalize([0.42835271, 0.40658227, 0.34071648], [0.2144312,  0.21884131, 0.20464434])
    T.Normalize((0.5,), (0.5,)),
])

In [14]:
val_transform = T.Compose([
    # transforms.Resize((64, 26424)),  # Resize the image to 224x224 (MobileNetV3 input size)
    # T.RandomHorizontalFlip(),  # Randomly flip the images horizontally
    # T.RandomRotation(10),  # Randomly rotate images in the range (-10, 10) degrees
    T.ToTensor(),  # Convert PIL image to tensor
    T.Normalize((0.5),(0.5)),  # Normalization
])

In [15]:
train_dataset = CustomDataset(train_image_paths, train_image_labels, transform = transform)
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)

In [16]:
val_dataset = CustomDataset(val_image_paths, val_image_labels, transform = val_transform)
val_loader = DataLoader(val_dataset, batch_size=128, shuffle=False)

## Conv V6

In [22]:
model = ConvEncoderV6(embedding_dim=1024)
model.load_state_dict(torch.load('convv6_ftft_epoch_3.pth')['model_state_dict'])
model = model.to(device)

In [23]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.00001)
# optimizer.load_state_dict(torch.load('convnext_tiny_aug_7.pth')['optimizer_state_dict'])

In [24]:
# Training loop
num_epochs = 3 # Number of epochs
for epoch in range(1, num_epochs+1):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:  # Assuming train_loader is your DataLoader
        images, labels = images.to(device), labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Print statistics
    print(f'Epoch {epoch}/{num_epochs}, Loss: {running_loss/len(train_loader)}')

    # Validation loop (optional, but recommended)
    model.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in val_loader:  # Assuming val_loader is your validation DataLoader
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    val_loss /= len(val_loader)
    val_accuracy = 100 * correct / total
    print(f'Validation Loss: {val_loss:.4f}, Accuracy: {val_accuracy:.2f}%')

    if epoch % 1 == 0 and epoch != 0:
       # Save checkpoint
       checkpoint = {
           'epoch': epoch,
           'model_state_dict': model.state_dict(),
           'optimizer_state_dict': optimizer.state_dict(),
           'loss': val_loss,
       }
       torch.save(checkpoint, f'convv6_ftft_epoch_{epoch}.pth')

print('Training Complete')

Epoch 1/3, Loss: 0.2412593969527413
Validation Loss: 0.6346, Accuracy: 79.20%
Epoch 2/3, Loss: 0.23333924205279818
Validation Loss: 0.6312, Accuracy: 78.85%
Epoch 3/3, Loss: 0.2306116392799452
Validation Loss: 0.6324, Accuracy: 79.30%
Training Complete


## Conv Encoder V5

In [19]:
# Example usage
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ConvEncoderV5(embedding_dim=2048)
model.load_state_dict(torch.load('convv5_epoch_6.pth')['model_state_dict'])
model = model.to(device)

In [20]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.00001)
# optimizer.load_state_dict(torch.load('convnext_tiny_aug_7.pth')['optimizer_state_dict'])

In [21]:
# Training loop
num_epochs = 20 # Number of epochs
for epoch in range(1, num_epochs+1):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:  # Assuming train_loader is your DataLoader
        images, labels = images.to(device), labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Print statistics
    print(f'Epoch {epoch}/{num_epochs}, Loss: {running_loss/len(train_loader)}')

    # Validation loop (optional, but recommended)
    model.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in val_loader:  # Assuming val_loader is your validation DataLoader
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    val_loss /= len(val_loader)
    val_accuracy = 100 * correct / total
    print(f'Validation Loss: {val_loss:.4f}, Accuracy: {val_accuracy:.2f}%')

    if epoch % 1 == 0 and epoch != 0:
       # Save checkpoint
       checkpoint = {
           'epoch': epoch,
           'model_state_dict': model.state_dict(),
           'optimizer_state_dict': optimizer.state_dict(),
           'loss': val_loss,
       }
       torch.save(checkpoint, f'convv5_ft_epoch_{epoch}.pth')

print('Training Complete')

Epoch 1/20, Loss: 0.029198386505538344
Validation Loss: 1.3514, Accuracy: 61.95%
Epoch 2/20, Loss: 0.022266855235120245
Validation Loss: 1.3406, Accuracy: 62.20%
Epoch 3/20, Loss: 0.019844524845407874
Validation Loss: 1.3454, Accuracy: 61.30%



KeyboardInterrupt



## ConvEncoder v4

In [85]:
# Example usage
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ConvEncoderV4(embedding_dim=1024)
model.load_state_dict(torch.load('convv4_ft_5.pth')['model_state_dict'])
model = model.to(device)


In [86]:
input_image = torch.randn(1, 3, 64, 64).to(device) # Example input tensor (batch_size, channels, height, width)
output = model(input_image)
print(output.shape)

torch.Size([1, 1024])


In [59]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.00001)
# optimizer.load_state_dict(torch.load('convnext_tiny_aug_7.pth')['optimizer_state_dict'])

In [3]:
# Training loop
num_epochs = 4 # Number of epochs
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:  # Assuming train_loader is your DataLoader
        images, labels = images.to(device), labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Print statistics
    print(f'Epoch {epoch + 1}/{num_epochs}, Loss: {running_loss/len(train_loader)}')

    # Validation loop (optional, but recommended)
    model.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in val_loader:  # Assuming val_loader is your validation DataLoader
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    val_loss /= len(val_loader)
    val_accuracy = 100 * correct / total
    print(f'Validation Loss: {val_loss:.4f}, Accuracy: {val_accuracy:.2f}%')

    if epoch % 1 == 0:
       # Save checkpoint
       checkpoint = {
           'epoch': epoch + 1,
           'model_state_dict': model.state_dict(),
           'optimizer_state_dict': optimizer.state_dict(),
           'loss': val_loss,
       }
       torch.save(checkpoint, f'convv4_ftft_{epoch+1}.pth')

print('Training Complete')

In [29]:
preds = []
val_loss = 0.0
correct = 0
total = 0
with torch.no_grad():
    for images, labels in val_loader:  # Assuming val_loader is your validation DataLoader
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        loss = criterion(outputs, labels)
        val_loss += loss.item()
        _, predicted = torch.max(outputs.data, 1)
        preds.extend(predicted)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

In [30]:
preds = [pred.item() for pred in preds]
print(preds[:5], len(preds))

[32, 85, 72, 97, 62] 2000


In [31]:
accuracy = accuracy_score(val_image_labels, preds)
print(f"Overall Accuracy: {accuracy * 100:.2f}%")

Overall Accuracy: 43.95%


In [32]:
# Generate the confusion matrix
cm = confusion_matrix(val_image_labels, preds)

# Summing the diagonal elements gives the total number of correct predictions
correct_predictions = np.trace(cm)
total_predictions = cm.sum()

print(f"Number of Correct Predictions: {correct_predictions}")
print(f"Number of Incorrect Predictions: {total_predictions - correct_predictions}")

Number of Correct Predictions: 879
Number of Incorrect Predictions: 1121


## Conv V3

In [58]:
model = ConvEncoderV3(embedding_dim=1024)
model.load_state_dict(torch.load('convv3_ftft_2_78.4.pth')['model_state_dict'])
model = model.to(device)

In [59]:
input_image = torch.randn(1, 3, 64, 64).to(device) # Example input tensor (batch_size, channels, height, width)
output = model(input_image)
print(output.shape)

torch.Size([1, 1024])


In [60]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.0005)
# optimizer.load_state_dict(torch.load('convnext_tiny_aug_7.pth')['optimizer_state_dict'])

In [61]:
# Training loop
num_epochs = 20 # Number of epochs
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:  # Assuming train_loader is your DataLoader
        images, labels = images.to(device), labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Print statistics
    print(f'Epoch {epoch + 1}/{num_epochs}, Loss: {running_loss/len(train_loader)}')

    # Validation loop (optional, but recommended)
    model.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in val_loader:  # Assuming val_loader is your validation DataLoader
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    val_loss /= len(val_loader)
    val_accuracy = 100 * correct / total
    print(f'Validation Loss: {val_loss:.4f}, Accuracy: {val_accuracy:.2f}%')

    if epoch % 1 == 0:
       # Save checkpoint
       checkpoint = {
           'epoch': epoch + 1,
           'model_state_dict': model.state_dict(),
           'optimizer_state_dict': optimizer.state_dict(),
           'loss': val_loss,
       }
       torch.save(checkpoint, f'convv3_ftft_{epoch+1}.pth')

print('Training Complete')

Epoch 1/20, Loss: 0.18260784330321292
Validation Loss: 1.1287, Accuracy: 73.40%
Epoch 2/20, Loss: 0.14187204859712543
Validation Loss: 1.0575, Accuracy: 73.95%
Epoch 3/20, Loss: 0.11881564225198007
Validation Loss: 1.2171, Accuracy: 71.00%


KeyboardInterrupt: 

## ConvNextT Small

In [5]:
model = ConvNextT().to(device)
# model.load_state_dict(torch.load('convnext_tiny_aug_7.pth')['model_state_dict'])

In [20]:
input_image = torch.randn(1, 3, 64, 64).to(device) # Example input tensor (batch_size, channels, height, width)
output = model(input_image)
print(output.shape)

torch.Size([1, 2048])


In [23]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.0001)
# optimizer.load_state_dict(torch.load('convnext_tiny_aug_7.pth')['optimizer_state_dict'])

In [6]:
# Training loop
num_epochs = 15 # Number of epochs
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:  # Assuming train_loader is your DataLoader
        images, labels = images.to(device), labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Print statistics
    print(f'Epoch {epoch + 1}/{num_epochs}, Loss: {running_loss/len(train_loader)}')

    # Validation loop (optional, but recommended)
    model.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in val_loader:  # Assuming val_loader is your validation DataLoader
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    val_loss /= len(val_loader)
    val_accuracy = 100 * correct / total
    print(f'Validation Loss: {val_loss:.4f}, Accuracy: {val_accuracy:.2f}%')

    if epoch % 2 == 0 and epoch != 0:
       # Save checkpoint
       checkpoint = {
           'epoch': epoch + 1,
           'model_state_dict': model.state_dict(),
           'optimizer_state_dict': optimizer.state_dict(),
           'loss': val_loss,
       }
       torch.save(checkpoint, f'convnext_tiny_2048_{epoch+1}.pth')

print('Training Complete')

In [25]:
model.load_state_dict(torch.load('convnext_tiny_2048_6.pth')['model_state_dict'])

<All keys matched successfully>

## MobileNetV3 Small

### Declare Model

In [18]:
model = MobileNetV3Small(num_classes=100, embedding_dim=2048).to(device)

In [20]:
input_image = torch.randn(1, 3, 64, 64).to(device) # Example input tensor (batch_size, channels, height, width)
output = model(input_image)
print(output.shape)

torch.Size([1, 2048])


In [25]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.00001)

### Training Loop

In [7]:
# Training loop
num_epochs = 10 # Number of epochs
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:  # Assuming train_loader is your DataLoader
        images, labels = images.to(device), labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Print statistics
    print(f'Epoch {epoch + 1}/{num_epochs}, Loss: {running_loss/len(train_loader)}')

    # Validation loop (optional, but recommended)
    model.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in val_loader:  # Assuming val_loader is your validation DataLoader
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    val_loss /= len(val_loader)
    val_accuracy = 100 * correct / total
    print(f'Validation Loss: {val_loss:.4f}, Accuracy: {val_accuracy:.2f}%')

    if epoch % 2 == 0 and epoch != 0:
       # Save checkpoint
       checkpoint = {
           'epoch': epoch + 1,
           'model_state_dict': model.state_dict(),
           'optimizer_state_dict': optimizer.state_dict(),
           'loss': val_loss,
       }
       torch.save(checkpoint, f'mobilenet_v3_ftft_epoch_{epoch+1}.pth')

print('Training Complete')

In [24]:
# model.load_state_dict(torch.load('mobilenet_v3_epoch_21.pth')['model_state_dict'])

<All keys matched successfully>

In [29]:
model.load_state_dict(torch.load('mobilenet_v3_ft_epoch_6.pth')['model_state_dict'])

<All keys matched successfully>

### Embedding Gen

In [17]:
basic_transform = T.Compose([
    # T.Resize((64, 64)), # -> all are already 64 * 64
    T.ToTensor(),
    # T.Normalize([0.42835271, 0.40658227, 0.34071648], [0.2144312,  0.21884131, 0.20464434])
    T.Normalize((0.5,), (0.5,)),
])

In [18]:
# Function to process an image and get embedding
def get_embedding(image_path, encoder, flatten=True):
    image = Image.open(image_path)
    image = basic_transform(image).unsqueeze(0).to(device)  # Add batch dimension

    with torch.no_grad():
        embedding = encoder(image)
    
    if flatten:
        # Flatten the embedding
        embedding = torch.flatten(embedding, start_dim=0)
    
    return embedding

In [47]:
embedding = get_embedding(os.path.join(train_path, train_image_names[0]), model, flatten=True)
embedding.shape

torch.Size([2048])

In [36]:
train_embeddings = [get_embedding(img_path, model).cpu() for img_path in train_image_paths]
val_embeddings = [get_embedding(img_path, model).cpu() for img_path in val_image_paths]
test_embeddings = [get_embedding(img_path, model).cpu() for img_path in test_image_paths]

In [37]:
import pickle
# Store data (serialize)
with open('train_embeddings_mobilenet_v3_small.pkl', 'wb') as handle:
    pickle.dump(train_embeddings, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('val_embeddings_mobilenet_v3_small.pkl', 'wb') as handle:
    pickle.dump(val_embeddings, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('test_embeddings_mobilenet_v3_small.pkl', 'wb') as handle:
    pickle.dump(test_embeddings, handle, protocol=pickle.HIGHEST_PROTOCOL)

In [38]:
train_embeddings[:5]

[tensor([-0.0525,  2.1514, -0.0647,  ...,  1.6340,  1.5746,  1.7384]),
 tensor([ 0.0904,  2.1105, -0.3276,  ...,  1.5987,  1.3902,  1.5455]),
 tensor([ 0.1437,  2.1723, -0.3820,  ...,  1.3958,  1.3506,  1.4212]),
 tensor([-0.1090,  1.8496, -0.2678,  ...,  1.2653,  1.1023,  1.1181]),
 tensor([ 0.1378,  2.2169, -0.3170,  ...,  1.1384,  1.2770,  1.2246])]

## MobileNetV3 Big

In [27]:
model = MobileNetV3Large(num_classes=100, embedding_dim=2048).to(device)

In [30]:
input_image = torch.randn(1, 3, 64, 64).to(device) # Example input tensor (batch_size, channels, height, width)
output = model(input_image)
print(output.shape)

torch.Size([1, 2048])


In [42]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.AdamW(model.parameters(), lr=0.0001)

### Training Loop

In [8]:
# Training loop
num_epochs = 10 # Number of epochs
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:  # Assuming train_loader is your DataLoader
        images, labels = images.to(device), labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Print statistics
    print(f'Epoch {epoch + 1}/{num_epochs}, Loss: {running_loss/len(train_loader)}')

    # Validation loop (optional, but recommended)
    model.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in val_loader:  # Assuming val_loader is your validation DataLoader
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    val_loss /= len(val_loader)
    val_accuracy = 100 * correct / total
    print(f'Validation Loss: {val_loss:.4f}, Accuracy: {val_accuracy:.2f}%')

    if epoch % 2 == 0 and epoch != 0:
       # Save checkpoint
       checkpoint = {
           'epoch': epoch + 1,
           'model_state_dict': model.state_dict(),
           'optimizer_state_dict': optimizer.state_dict(),
           'loss': val_loss,
       }
       torch.save(checkpoint, f'mobilenet_large_v3_ft_epoch_{epoch+1}.pth')

print('Training Complete')

In [41]:
 # model.load_state_dict(torch.load('mobilenet_large_v3_epoch_22.pth')['model_state_dict'])

<All keys matched successfully>

In [44]:
 model.load_state_dict(torch.load('mobilenet_large_v3_ft_epoch_5.pth')['model_state_dict'])

<All keys matched successfully>

In [48]:
train_embeddings = [get_embedding(img_path, model).cpu() for img_path in train_image_paths]
val_embeddings = [get_embedding(img_path, model).cpu() for img_path in val_image_paths]
test_embeddings = [get_embedding(img_path, model).cpu() for img_path in test_image_paths]

In [49]:
import pickle
# Store data (serialize)
with open('train_embeddings_mobilenet_v3_large.pkl', 'wb') as handle:
    pickle.dump(train_embeddings, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('val_embeddings_mobilenet_v3_large.pkl', 'wb') as handle:
    pickle.dump(val_embeddings, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('test_embeddings_mobilenet_v3_large.pkl', 'wb') as handle:
    pickle.dump(test_embeddings, handle, protocol=pickle.HIGHEST_PROTOCOL)

## EfficientNetB3 (with and without AUG)


In [17]:
model = EfficientNetB3(num_classes=100, embedding_dim=4096).to(device)

In [18]:
input_image = torch.randn(1, 3, 64, 64).to(device) # Example input tensor (batch_size, channels, height, width)
output = model(input_image)
print(output.shape)

torch.Size([1, 4096])


In [19]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.00001)

In [9]:
# Training loop
num_epochs = 10 # Number of epochs
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:  # Assuming train_loader is your DataLoader
        images, labels = images.to(device), labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Print statistics
    print(f'Epoch {epoch + 1}/{num_epochs}, Loss: {running_loss/len(train_loader)}')

    # Validation loop (optional, but recommended)
    model.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in val_loader:  # Assuming val_loader is your validation DataLoader
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    val_loss /= len(val_loader)
    val_accuracy = 100 * correct / total
    print(f'Validation Loss: {val_loss:.4f}, Accuracy: {val_accuracy:.2f}%')

    if epoch % 1 == 0 and epoch != 0:
       # Save checkpoint
       checkpoint = {
           'epoch': epoch + 1,
           'model_state_dict': model.state_dict(),
           'optimizer_state_dict': optimizer.state_dict(),
           'loss': val_loss,
       }
       torch.save(checkpoint, f'efficientnet_b3_aug_ft_epoch_{epoch+1}.pth')

print('Training Complete')

In [22]:
# model.load_state_dict(torch.load('efficientnet_b3_aug_epoch_67.pth')['model_state_dict'])

<All keys matched successfully>

In [21]:
model.load_state_dict(torch.load('efficientnet_b3_aug_ft_epoch_3.pth')['model_state_dict'])

<All keys matched successfully>

In [31]:
model.load_state_dict(torch.load('efficientnet_b3_ftft_epoch_3.pth')['model_state_dict'])

<All keys matched successfully>

In [38]:
preds = []
with torch.no_grad():
    for images, labels in val_loader:  # Assuming val_loader is your validation DataLoader
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        loss = criterion(outputs, labels)
        val_loss += loss.item()
        _, predicted = torch.max(outputs.data, 1)
        preds.extend(predicted)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

In [39]:
preds = [pred.item() for pred in preds]
print(preds[:5], len(preds))

[32, 85, 41, 74, 33] 2000


In [40]:
accuracy = accuracy_score(val_image_labels, preds)
print(f"Overall Accuracy: {accuracy * 100:.2f}%")

Overall Accuracy: 64.40%


In [41]:
# Generate the confusion matrix
cm = confusion_matrix(val_image_labels, preds)

# Summing the diagonal elements gives the total number of correct predictions
correct_predictions = np.trace(cm)
total_predictions = cm.sum()

print(f"Number of Correct Predictions: {correct_predictions}")
print(f"Number of Incorrect Predictions: {total_predictions - correct_predictions}")

Number of Correct Predictions: 1288
Number of Incorrect Predictions: 712


### Test Loader

In [50]:
test_dataset = TestDataset(test_image_paths, transform = val_transform)
test_data_loader = DataLoader(test_dataset, batch_size=128, shuffle=False)

In [51]:
test_preds = []
with torch.no_grad():
    for images in test_data_loader:  # Assuming val_loader is your validation DataLoader
        images = images.to(device)
        outputs = model(images)
        # loss = criterion(outputs, labels)
        # val_loss += loss.item()
        _, predicted = torch.max(outputs.data, 1)
        test_preds.extend(predicted.cpu().int())
        # total += labels.size(0)
        # correct += (predicted == labels).sum().item()

In [52]:
test_preds = [pred.item() for pred in test_preds]
test_preds[:5]

[67, 16, 40, 13, 69]

In [53]:
test_df['Class'] = test_preds

In [54]:
test_df.head()

Unnamed: 0,Image,Class
0,046f61c4-b825-459a-8b2d-07503f5b94a5.png,67
1,67db001f-e287-4950-ac49-6683b493d1a4.png,16
2,9f1d36a1-f046-4c5d-9e8a-0a3758ff605c.png,40
3,5ffef91a-aaf9-4d0d-a219-83a9f5282361.png,13
4,c00af570-0000-4f8f-a3f2-c37a981bfdb1.png,69


In [55]:
test_df.to_csv('submission_convv4.csv', index=False)

### Generate Embs

In [43]:
train_embeddings = [get_embedding(img_path, model).cpu() for img_path in train_image_paths]
val_embeddings = [get_embedding(img_path, model).cpu() for img_path in val_image_paths]
test_embeddings = [get_embedding(img_path, model).cpu() for img_path in test_image_paths]

In [44]:
import pickle
# Store data (serialize)
with open('train_embeddings_efficientnet_b3.pkl', 'wb') as handle:
    pickle.dump(train_embeddings, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('val_embeddings_efficientnet_b3.pkl', 'wb') as handle:
    pickle.dump(val_embeddings, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('test_embeddings_efficientnet_b3.pkl', 'wb') as handle:
    pickle.dump(test_embeddings, handle, protocol=pickle.HIGHEST_PROTOCOL)

## Swin V2 T

In [21]:
model = SwinV2TEncoder(embedding_dim=4096).to(device)

In [22]:
input_image = torch.randn(1, 3, 64, 64).to(device) # Example input tensor (batch_size, channels, height, width)
output = model(input_image)
print(output.shape)

torch.Size([1, 4096])


In [23]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [11]:
# Training loop
num_epochs = 100 # Number of epochs
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:  # Assuming train_loader is your DataLoader
        images, labels = images.to(device), labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Print statistics
    print(f'Epoch {epoch + 1}/{num_epochs}, Loss: {running_loss/len(train_loader)}')

    # Validation loop (optional, but recommended)
    model.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in val_loader:  # Assuming val_loader is your validation DataLoader
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    val_loss /= len(val_loader)
    val_accuracy = 100 * correct / total
    print(f'Validation Loss: {val_loss:.4f}, Accuracy: {val_accuracy:.2f}%')

    if epoch % 6 == 0 and epoch != 0:
       # Save checkpoint
       checkpoint = {
           'epoch': epoch + 1,
           'model_state_dict': model.state_dict(),
           'optimizer_state_dict': optimizer.state_dict(),
           'loss': val_loss,
       }
       torch.save(checkpoint, f'swin_v2_t_epoch_{epoch+1}.pth')

print('Training Complete')

In [31]:
# model.load_state_dict(torch.load('swin_v2_t_epoch_31.pth')['model_state_dict'])

<All keys matched successfully>

## Efficient Net B4

In [18]:
model = EfficientNetB4(num_classes=100, embedding_dim=2048).to(device)

In [20]:
input_image = torch.randn(1, 3, 64, 64).to(device) # Example input tensor (batch_size, channels, height, width)
output = model(input_image)
print(output.shape)

torch.Size([1, 2048])


In [32]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.0001)

In [12]:
# Training loop
num_epochs = 10 # Number of epochs
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:  # Assuming train_loader is your DataLoader
        images, labels = images.to(device), labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Print statistics
    print(f'Epoch {epoch + 1}/{num_epochs}, Loss: {running_loss/len(train_loader)}')

    # Validation loop (optional, but recommended)
    model.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in val_loader:  # Assuming val_loader is your validation DataLoader
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    val_loss /= len(val_loader)
    val_accuracy = 100 * correct / total
    print(f'Validation Loss: {val_loss:.4f}, Accuracy: {val_accuracy:.2f}%')

    if epoch % 2 == 0 and epoch != 0:
       # Save checkpoint
       checkpoint = {
           'epoch': epoch + 1,
           'model_state_dict': model.state_dict(),
           'optimizer_state_dict': optimizer.state_dict(),
           'loss': val_loss,
       }
       torch.save(checkpoint, f'efficientnet_b4_aug_ftft_epoch_{epoch+1}.pth')

print('Training Complete')

In [25]:
# model.load_state_dict(torch.load('efficientnet_b4_aug_epoch_67.pth')['model_state_dict'])
# optimizer.load_state_dict(torch.load('efficientnet_b4_aug_epoch_67.pth')['optimizer_state_dict'])

In [31]:
model.load_state_dict(torch.load('efficientnet_b4_aug_ft_epoch_6.pth')['model_state_dict'])

<All keys matched successfully>

In [32]:
# model.load_state_dict(torch.load('efficientnet_b4_ftft_epoch_2.pth')['model_state_dict'])

<All keys matched successfully>

## Efficient Net V2 m

In [20]:
model = EfficientNetV2M(num_classes=100, embedding_dim=2048).to(device)

In [22]:
input_image = torch.randn(1, 3, 64, 64).to(device) # Example input tensor (batch_size, channels, height, width)
output = model(input_image)
print(output.shape)

torch.Size([1, 2048])


In [31]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.00001)

In [14]:
# Training loop
num_epochs = 10 # Number of epochs
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:  # Assuming train_loader is your DataLoader
        images, labels = images.to(device), labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Print statistics
    print(f'Epoch {epoch + 1}/{num_epochs}, Loss: {running_loss/len(train_loader)}')

    # Validation loop (optional, but recommended)
    model.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in val_loader:  # Assuming val_loader is your validation DataLoader
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    val_loss /= len(val_loader)
    val_accuracy = 100 * correct / total
    print(f'Validation Loss: {val_loss:.4f}, Accuracy: {val_accuracy:.2f}%')

    if epoch % 2 == 0 and epoch != 0:
       # Save checkpoint
       checkpoint = {
           'epoch': epoch + 1,
           'model_state_dict': model.state_dict(),
           'optimizer_state_dict': optimizer.state_dict(),
           'loss': val_loss,
       }
       torch.save(checkpoint, f'efficientnet_v2_m_ftft_epoch_{epoch+1}.pth')

print('Training Complete')

In [27]:
# model.load_state_dict(torch.load('efficientnet_v2_m_epoch_36.pth')['model_state_dict'])

<All keys matched successfully>

In [30]:
model.load_state_dict(torch.load('efficientnet_v2_m_ft_epoch_5.pth')['model_state_dict'])

<All keys matched successfully>

In [33]:
train_embeddings = [get_embedding(img_path, model).cpu() for img_path in train_image_paths]
val_embeddings = [get_embedding(img_path, model).cpu() for img_path in val_image_paths]
test_embeddings = [get_embedding(img_path, model).cpu() for img_path in test_image_paths]

In [34]:
import pickle
# Store data (serialize)
with open('train_embeddings_efficientnet_v2_m.pkl', 'wb') as handle:
    pickle.dump(train_embeddings, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('val_embeddings_efficientnet_v2_m.pkl', 'wb') as handle:
    pickle.dump(val_embeddings, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('test_embeddings_efficientnet_v2_m.pkl', 'wb') as handle:
    pickle.dump(test_embeddings, handle, protocol=pickle.HIGHEST_PROTOCOL)

## Ensemble Voting

#### Mobilenet
- MobileNetV3Small (embedding generation using Cross Entropy Loss, 2048) -> validation accuracy is 60% after 2 rounds of  -> `mobilenet_v3_ft_epoch_6.pth`
- MobileNetV3Large (embedding generation using Cross Entropy Loss, 2048) -> validation accuracy is 65% -> after 1 round of ft -> `mobilenet_large_v3_ft_epoch_5.pth`

#### EfficientNet
- EfficientNet B3 (embedding generation using CE, 2048) -> validation accuracy is 67% -> after 2 rounds of ft -> `efficientnet_b3_ftft_epoch_3` ----Kaggle 64%
- EfficientNet V2 M (embedding generation using CE, 2048) -> validation accuracy is 64% -> after 2 rounds of ft -> `efficientnet_v2_m_ftft_epoch_3`
- EfficientNet B4 (embedding generation using CE, 2048) -> validation accuracy is 64.7% -> after 2 rounds of ft -> ``
- EfficientNet B4 with Aug (embedding generation using CE, 2048) -> validation accuracy is 57.7% -> after 2 rounds of ft
- EfficientNet B3 With AUG (embedding generation using CE, 4096) -> validation accuracy is 65% -> after 1 round of ft (second round didn't change anything) -> `efficientnet_b3_aug_ft_epoch_3`

#### Others
- Max Vit T (without AUG) -> 45-50% after ft
- Deep Encoder 52% after ft (WITH AUG)
- Convnext Tiny (AUG) 1000 embedding and  -> 52%
- Small(no AUG) 2048 embedding -> 50 (Convnexts converge very fast, but get stuck there)
- **ConvV4 (no AUG) 1024 embedding -> 78.6% -> `convv4_ft_5` -> 75% Kaggle**
- **ConvV3 (no AUG) 1024 embedding -> 78.4% (lost the 79.5) -> `convv3_ft_3` -> ?% Kaggle**
- Conv V5 2048 -> 62%
- **Conv V6 1024 -> 79.3% -> `convv6_ftft_epoch_3`**
- 
#### Ensemble
- MobileNetv3 large, EfficientNet B3, EfficientNet b4 -> 70.9% accuracy on validation, 68.2% Kaggle
- MobileNetv3 large, EfficientNet B3, EfficientNet b4, efficient Net v2M -> 71.1 accuracy on validation, ? on Kaggle
- MobileNetv3 samll, MobileNetv3 large, EfficientNet V3, EfficientNet b4, efficient Net v2M -> 70.1%, mobilenet v3 small decreases efficiency
- MobileNetv3 large, EfficientNet B3, Efficient net B3 with Aug, EfficientNet b4, efficient Net v2M -> 73.7 accuracy on validation, 72 on Kaggle
- MobileNetv3 large, EfficientNet B3, Efficient net B3 with Aug, EfficientNet b4, efficient Net v2M, convv4 -> 79.15 accuracy on validation -> Kaggle ?
- MobileNetv3 large (65), EfficientNet B3 (67), Efficient net B3 with Aug (65), EfficientNet b4 (65), convv4 (79) -> 76.45 accuracy on validation
- EfficientNet B3 (67), Efficient net B3 with Aug (65), convv4 (70) -> 78.75 accuracy on validation
- EfficientNet B3 (67), Efficient net B3 with Aug (65), EfficientNet b4 (65), convv4 (79) -> 79.55 accuracy on validation **this is probably better than our current submission**
- EfficientNet B3 (67), Efficient net B3 with Aug (65), efficient Net v2M , convv4 (79) -> 79.3 accuracy on validation
- EfficientNet B3 (67), Efficient net B3 with Aug (65), efficient Net v2M (65), convv3, convv4 (79) -> 81.15% accuracy on validation (80.25 now)
- EfficientNet B3 (67), Efficient net B3 with Aug (65), EfficientNet b4  (65), convv3, convv4 (79) -> 80.5% accuracy on validation
- EfficientNet B3 (67), Efficient net B3 with Aug (65), convv3, convv4 (79) -> 79.9 accuracy on validation
- EfficientNet B3 (67), convv3, convv4 (79) -> 80.85% accuracy on validation
- Efficient net B3 with Aug (65), convv3, convv4 (79) -> 80.85%
- convv3, convv4 -> 79.55%
- EfficientNet B3 (67), Efficient net B3 with Aug (65), efficient Net v2M  (65), convv3, convv4 (78), conv6 (79) -> 81.5% accuracy on validation
- convv3, convv4 (78), conv6 (79) -> 81.45

In [27]:
convv3 = ConvEncoderV3(embedding_dim=1024)
convv3 = convv3.to(device)

In [28]:
convv4 = ConvEncoderV4(embedding_dim=1024)
convv4 = convv4.to(device)

In [36]:
convv6 = ConvEncoderV6(embedding_dim=1024)
convv6 = model.to(device)

In [30]:
mobilenetv3 = MobileNetV3Large(num_classes=100, embedding_dim=2048).to(device)

In [31]:
efficientnetb3 = EfficientNetB3(num_classes=100, embedding_dim=2048).to(device)
efficientnetb3_aug = EfficientNetB3(num_classes=100, embedding_dim=4096).to(device)

In [32]:
efficientnetb4 = EfficientNetB4(num_classes=100, embedding_dim=2048).to(device)

In [33]:
efficientnetv2m = EfficientNetV2M(num_classes=100, embedding_dim=2048).to(device)

In [43]:
convv3.load_state_dict(torch.load('convv3_ftft_2_78.4.pth')['model_state_dict'])
convv4.load_state_dict(torch.load('convv4_ft_5.pth')['model_state_dict'])
convv6.load_state_dict(torch.load('convv6_ftft_epoch_3.pth')['model_state_dict'])
# mobilenetv3.load_state_dict(torch.load('mobilenet_large_v3_ft_epoch_5.pth')['model_state_dict'])
efficientnetb3.load_state_dict(torch.load('efficientnet_b3_ftft_epoch_3.pth')['model_state_dict'])
efficientnetb3_aug.load_state_dict(torch.load('efficientnet_b3_aug_ft_epoch_3.pth')['model_state_dict'])
efficientnetb4.load_state_dict(torch.load('efficientnet_b4_ftft_epoch_3.pth')['model_state_dict'])
efficientnetv2m.load_state_dict(torch.load('efficientnet_v2_m_ftft_epoch_3.pth')['model_state_dict'])

<All keys matched successfully>

In [44]:
# Assuming you have a list of models
models = [convv6, convv3, convv4, efficientnetb3, efficientnetb3_aug, efficientnetv2m]

In [45]:
# Your validation loop
preds = []
total = 0
correct = 0
val_loss = 0
with torch.no_grad():
    for images, labels in val_loader:  # Assuming val_loader is your validation DataLoader
        images, labels = images.to(device), labels.to(device)

        # Get ensemble predictions
        ensemble_pred = ensemble_majority_voting(models, images)
        preds.extend([pred.cpu().item() for pred in ensemble_pred])

        total += labels.size(0)
        correct += (ensemble_pred.to(device) == labels).sum().item()

# Calculate accuracy and average loss if needed
accuracy = 100 * correct / total
average_loss = val_loss / len(val_loader)
print(f'Accuracy: {accuracy}')


Accuracy: 81.5


### Create test submission

In [46]:
test_dataset = TestDataset(test_image_paths, transform = val_transform)
test_data_loader = DataLoader(test_dataset, batch_size=128, shuffle=False)

In [47]:
# Your validation loop
test_preds = []
total = 0
correct = 0
val_loss = 0
total = 2000
with torch.no_grad():
    for images in test_data_loader:  # Assuming val_loader is your validation DataLoader
        images = images.to(device)

        # Get ensemble predictions
        ensemble_pred = ensemble_majority_voting(models, images)
        test_preds.extend([pred.cpu().item() for pred in ensemble_pred])
        # correct += (ensemble_pred.to(device) == labels).sum().item()

In [48]:
len(test_preds)

5000

In [49]:
test_df['Class'] = test_preds

In [50]:
test_df.head()

Unnamed: 0,Image,Class
0,046f61c4-b825-459a-8b2d-07503f5b94a5.png,67
1,67db001f-e287-4950-ac49-6683b493d1a4.png,91
2,9f1d36a1-f046-4c5d-9e8a-0a3758ff605c.png,40
3,5ffef91a-aaf9-4d0d-a219-83a9f5282361.png,13
4,c00af570-0000-4f8f-a3f2-c37a981bfdb1.png,69


In [51]:
test_df.to_csv('ensemble_voting_b3_b3aug_v2m_cv3_cv4_cv6.csv', index=False)