# Training a Naive Model - Patch-Detector

## 1. Introduction

In this notebook, we will train a naive model using the same structure as in **training_and_augmentation.ipynb**, but without data augmentation, oversampling, or focal loss.

In [14]:
import numpy as np
import torch
import torchvision
from torch.utils import data
from torchvision import transforms    
from torch.utils.data import random_split
from torch import nn

In [15]:
preprocesser = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(), 
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # ImageNet statistics expected by the pretrained model
])
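
To make the effect of this pipeline concrete, the sketch below reproduces its arithmetic in plain Python, without torchvision: how `Resize(256)` followed by `CenterCrop(224)` changes the image size, and how `Normalize` maps a pixel that `ToTensor` has already scaled to [0, 1]. The helper names and the 640 x 480 example image are illustrative assumptions, not part of the notebook.

```python
# Plain-Python sketch of the preprocessing arithmetic (illustrative only).

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def resized_shape(height, width, size=256):
    """Scale the shorter side to `size`, keeping the aspect ratio."""
    if height <= width:
        return size, int(size * width / height)
    return int(size * height / width), size


def center_crop_shape(crop=224):
    """A center crop with an integer size yields a square crop x crop window."""
    return crop, crop


def normalize_pixel(rgb):
    """Apply (value - mean) / std per channel to a pixel with values in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]


after_resize = resized_shape(480, 640)   # hypothetical image, 480 rows x 640 columns
after_crop = center_crop_shape()
mean_pixel = normalize_pixel(IMAGENET_MEAN)   # a pixel equal to the mean maps to zeros
white_pixel = normalize_pixel([1.0, 1.0, 1.0])

print(after_resize, after_crop)
print(mean_pixel, white_pixel)
```

Every image therefore reaches the model as a 3 x 224 x 224 tensor whose channels are centered and scaled with the ImageNet statistics.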

### 3.1 Data Separation

We will start by dividing the expanded dataset into a training set and a test set.

The code below first deletes the `train` and `test` subfolders of the output directory if they already exist, so that repeated runs do not accumulate duplicate images.


In [16]:
expanded_data_path = "../data/expanded_data" #path to expanded data


In [17]:
import os
import shutil
from sklearn.model_selection import train_test_split
from PIL import Image

output_dir = "../data/split_dataset_naive_model" #output directory

test_size = 0.2  # keep the test split small so that most images remain available for training
random_state = 42

def clear_and_create_dir(path): # function to clean the directory and avoid duplicate files
    if os.path.exists(path):
        shutil.rmtree(path)  
    os.makedirs(path)      
    
for split in ['train', 'test']: 
    split_dir = os.path.join(output_dir, split)
    clear_and_create_dir(split_dir)

images = []
labels = []

for label in os.listdir(expanded_data_path):
    class_dir = os.path.join(expanded_data_path, label)
    if not os.path.isdir(class_dir):
        continue
    for img_name in os.listdir(class_dir):
        if img_name.lower().endswith(('.jpg', '.png', '.jpeg')):
            images.append(os.path.join(class_dir, img_name))
            labels.append(label)

train_imgs, test_imgs, train_labels, test_labels = train_test_split(
    images, labels, test_size=test_size, stratify=labels, random_state=random_state
)

for split in ['train', 'test']:
    split_dir = os.path.join(output_dir, split)
    for label in set(labels):
        os.makedirs(os.path.join(split_dir, label), exist_ok=True)

def copy_images(img_list, label_list, split):
    for img_path, label in zip(img_list, label_list):
        dest_dir = os.path.join(output_dir, split, label)
        shutil.copy(img_path, dest_dir)

copy_images(train_imgs, train_labels, 'train')
copy_images(test_imgs, test_labels, 'test')
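
Because the split is stratified, both folders should contain roughly the same proportion of each class. A quick way to check this is to count the copied files per class. The helper below is a minimal sketch: `count_images_per_class` and the class names in the demo are hypothetical, and the demo runs on a temporary directory rather than on the real dataset.

```python
import os
import tempfile

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')


def count_images_per_class(split_dir):
    """Return {class_name: number_of_images} for one split directory."""
    counts = {}
    for label in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, label)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files next to the class folders
        counts[label] = sum(
            1 for name in os.listdir(class_dir)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts


# Demo on a throwaway directory with made-up class names and empty files.
with tempfile.TemporaryDirectory() as root:
    for label, n_images in [('class_a', 3), ('class_b', 1)]:
        os.makedirs(os.path.join(root, label))
        for i in range(n_images):
            open(os.path.join(root, label, f'img_{i}.png'), 'w').close()
    demo_counts = count_images_per_class(root)

print(demo_counts)
```

In the notebook it could be called as `count_images_per_class(os.path.join(output_dir, 'train'))` and again with `'test'` to compare the class proportions of the two splits.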

### 3.3 Data Loaders

In [18]:
from torch.utils.data import DataLoader  # torch, torchvision and os are already imported above
train_path  = os.path.join(output_dir, "train")
test_path  = os.path.join(output_dir, "test")

train_dataset = torchvision.datasets.ImageFolder(root=train_path, transform=preprocesser)
test_dataset = torchvision.datasets.ImageFolder(root=test_path, transform=preprocesser)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)


## 5. Model Architecture

### 5.1. Model Selection

We will use a pretrained model with a large number of parameters, utilizing its default weights in almost all the layers. This helps us because training such a large model from scratch on our device would be infeasible within a reasonable time. Moreover, we can leverage the patterns learned by the pretrained model and apply fine-tuning to adapt it specifically to our task.

The pretrained model we will use is **MobileNetV2_64**. We chose it because it is a lightweight model (allowing us to train some of its layers) while still being powerful enough for our task. This model expects input images of size (3, 224, 224), which is why the preprocessor defined earlier resizes and center-crops every image to 224 x 224.

In [19]:
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights

weights = MobileNet_V2_Weights.DEFAULT  
model_mobilnet_v2 = mobilenet_v2(weights=weights)

### 5.2. Fine-Tuning

We freeze every layer of this model.

In [20]:
for param in model_mobilnet_v2.parameters():
    param.requires_grad = False

Then we make the last two weight-carrying layers trainable: the final linear layer and the last convolutional layer. We also replace the final linear layer so that it outputs a single logit per image, since this is a binary classification problem.

In [21]:
model_mobilnet_v2.classifier[1] = nn.Linear(1280, 1)
for param in model_mobilnet_v2.classifier[1].parameters():
    param.requires_grad = True

for param in model_mobilnet_v2.features[18][0].parameters():
    param.requires_grad = True
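
To get a feeling for how small the trainable part is, the sketch below counts the parameters of the two unfrozen layers by hand. It assumes that `features[18][0]` is a 1 x 1 convolution from 320 to 1280 channels without bias, which is how torchvision's MobileNetV2 is laid out to the best of our knowledge. The helper names are illustrative.

```python
def linear_params(in_features, out_features, bias=True):
    """Number of parameters of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Number of parameters of a square-kernel 2D convolution (groups=1)."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


head = linear_params(1280, 1)  # the new nn.Linear(1280, 1)
last_conv = conv2d_params(320, 1280, kernel_size=1, bias=False)  # assumed shape of features[18][0]
trainable = head + last_conv

print(head, last_conv, trainable)
```

The actual figure can be confirmed in the notebook with `sum(p.numel() for p in model_mobilnet_v2.parameters() if p.requires_grad)`.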

## 6. Training

### 6.1. Loss function and optimizer

In [22]:
criterion = torch.nn.CrossEntropyLoss()
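
One caveat about this choice: the head defined above emits a single logit per image, and the training loop in this notebook passes float targets of the same shape. In that setting `CrossEntropyLoss` treats the targets as class probabilities and applies a softmax over a single class, so the predicted probability is always 1 and the loss is identically zero. The plain-Python sketch below reproduces that arithmetic and compares it with binary cross-entropy on logits, the formula behind `torch.nn.BCEWithLogitsLoss`, which is the usual pairing for a single-logit head. The helper names and sample values are illustrative, not taken from the notebook.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def cross_entropy_soft_target(logits, target_probs):
    """Cross-entropy between softmax(logits) and a vector of class probabilities."""
    return sum(-t * lp for t, lp in zip(target_probs, log_softmax(logits)))


def bce_with_logits(logit, target):
    """Binary cross-entropy on one logit: softplus(z) - target * z."""
    softplus = max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))
    return softplus - target * logit


# Made-up (logit, label) pairs, including clearly wrong predictions.
samples = [(2.0, 1.0), (-1.0, 0.0), (0.5, 1.0), (3.0, 0.0)]

# With a single class the log-softmax is always 0, so every loss is zero.
ce_losses = [cross_entropy_soft_target([z], [y]) for z, y in samples]

# Binary cross-entropy penalizes each sample, most strongly the wrong ones.
bce_losses = [bce_with_logits(z, y) for z, y in samples]

print(ce_losses)
print(bce_losses)
```

A loss that is zero for every input also has zero gradients, so it cannot drive any weight updates.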

The optimizer should update only the trainable layers. We will use the Adam optimizer, a popular and effective choice for many tasks.

In [23]:
params_to_update = [p for p in model_mobilnet_v2.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(params_to_update, lr=1e-3)

### 6.2. Training iteration

We will not use a learning rate scheduler. Each epoch takes a long time to run on our device, so a scheduler could prolong training unnecessarily. Additionally, since our dataset is small, the loss will tend to decrease regardless, and overfitting is likely to occur quickly.

In [24]:
import time

num_epochs = 20
losses = torch.zeros(num_epochs)

model_mobilnet_v2.train()  
start = time.time()

for epoch in range(num_epochs):
    counter = 0
    current_loss = 0.0
    print(f"Epoch {epoch + 1}/{num_epochs}")

    for X, y in train_loader:
        optimizer.zero_grad() 
        y = y.unsqueeze(1).float()
        outputs = model_mobilnet_v2(X)
        l = criterion(outputs, y)
        l.backward()
        optimizer.step()

        counter += 1
        current_loss += l.item()  

    losses[epoch] = current_loss / counter

end = time.time()
print(f'The training lasted {end - start} seconds.')


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
The training lasted 1906.2442400455475 seconds.


In [25]:
losses

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

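As the output shows, the recorded loss is exactly zero in every epoch, so it carries no information about training progress. This happens because the model emits a single logit per image and `CrossEntropyLoss` applies a softmax over that single value: the resulting probability is always 1, its logarithm is 0, and both the loss and its gradients vanish. The trainable weights therefore do not change during these 20 epochs.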


## 7. Evaluation

### 7.1. Classification on test set

To classify the samples in the test set, we will use a threshold $t$: if the predicted probability of belonging to the positive class exceeds $t$, we classify the sample as positive. Our model will likely struggle to detect Waldo due to the scarcity of positive samples, but for this naive baseline we keep the default threshold $t = 0.5$.
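
The decision rule can be sketched in plain Python as follows. The logits are made-up values and `classify` is a hypothetical helper. The second call uses a lower threshold only to show how the rule trades missed positives for false alarms.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(logits, threshold=0.5):
    """Label a sample 1 when its predicted probability exceeds the threshold."""
    return [int(sigmoid(z) > threshold) for z in logits]


logits = [-2.0, -0.1, 0.0, 0.3, 4.0]  # made-up model outputs

default_preds = classify(logits)                 # t = 0.5
lenient_preds = classify(logits, threshold=0.3)  # a lower t flags more positives

print(default_preds)
print(lenient_preds)
```

A logit of exactly 0 corresponds to a probability of 0.5, which does not exceed $t = 0.5$ and is therefore classified as negative.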

In [26]:
threshold = 0.5

In [27]:
model_mobilnet_v2.eval() # We set the model to evaluation mode because now we want to predict, not train.
all_preds = []
all_labels = []

with torch.no_grad():  # gradients are not needed for inference
    for X, y in test_loader:
        outputs = model_mobilnet_v2(X)
        probs = torch.sigmoid(outputs) # We need to apply sigmoid to convert logits into probabilities.
        preds = (probs > threshold).int()
        
        all_preds.extend(preds.squeeze(1).cpu().numpy())
        all_labels.extend(y.cpu().numpy())

### 7.2. Metrics

In [29]:
from sklearn.metrics import accuracy_score, recall_score, f1_score

In [30]:
accuracy = accuracy_score(all_labels, all_preds)
print(accuracy)

0.4329608938547486


In [31]:
recall = recall_score(all_labels, all_preds, average='weighted')  
print(recall)

0.4329608938547486


In [32]:
class_1_recall = recall_score(all_labels, all_preds, labels=[1], average=None)
print(class_1_recall)

[0.33333333]


In [33]:
class_0_recall = recall_score(all_labels, all_preds, labels=[0], average=None)
print(class_0_recall)

[0.4335206]
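
The accuracy and the weighted recall printed above are identical, and this is not a coincidence: weighting each class recall by its share of the samples always adds up to the fraction of correct predictions. The sketch below checks this on a small made-up example in plain Python. The helper names and labels are illustrative and do not come from the test set.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def recall_per_class(y_true, y_pred):
    """Recall of each class: correctly predicted samples / samples of that class."""
    recalls = {}
    for c in sorted(set(y_true)):
        support = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls[c] = hits / support
    return recalls


def weighted_recall(y_true, y_pred):
    """Average of the class recalls, weighted by class frequency."""
    recalls = recall_per_class(y_true, y_pred)
    n = len(y_true)
    return sum(r * sum(1 for t in y_true if t == c) / n for c, r in recalls.items())


y_true = [0, 0, 0, 0, 0, 0, 1, 1]  # made-up, imbalanced labels
y_pred = [0, 0, 0, 1, 1, 0, 1, 0]

per_class = recall_per_class(y_true, y_pred)
acc = accuracy(y_true, y_pred)
w_rec = weighted_recall(y_true, y_pred)

print(per_class, acc, w_rec)
```

On imbalanced data the per-class recalls are therefore the more informative numbers, because the weighted average is dominated by the majority class.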


In [90]:
torch.save(model_mobilnet_v2.state_dict(), "../model/waldo_detector_64x64_v1.0.pth")

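As we can see, the metrics in this case are quite poor. The zero loss does not indicate a good fit: it is a consequence of pairing a single-logit output with `CrossEntropyLoss`, which leaves the new classification layer at its random initialization. The recall values (about 0.43 for class 0 and 0.33 for class 1) show that the model is not collapsing onto a single class; it is simply performing no better than chance.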
