# Inference Notebook for Soil Classification Part 1

## **1. Imports and Setup**

This section loads the libraries the notebook needs for inference.

We're using:
- PyTorch for building and running the model
- torchvision for the ResNet architecture and image transforms
- PIL for reading image files
- Pandas for assembling the submission file

In [None]:
import os
import pandas as pd

import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset
from torchvision import models, transforms
from PIL import Image

## **2. Model Evaluation**

Here we rebuild the ResNet-50 architecture that the saved weights expect,
replace its final layer with a 4-class head, and load our best saved model.
The model is then moved to the available device and switched to evaluation
mode, so layers such as dropout and batch normalization behave
deterministically during inference.

In [None]:
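# Build the architecture only; the trained weights are loaded from disk below,
# so downloading ImageNet weights is unnecessary
# (on torchvision < 0.13, use pretrained=False instead of weights=None)
model = models.resnet50(weights=None)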

# Replace the final layer to match our number of classes
num_classes = 4
model.fc = nn.Linear(model.fc.in_features, num_classes)

In [None]:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_path = "../data/model.pth"

# Reload our trained weights. map_location remaps tensors saved on a CUDA
# device onto whichever device is available, so no CPU fallback is needed.
state_dict = torch.load(model_path, map_location=device)
model.load_state_dict(state_dict)

model = model.to(device)
model.eval()
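
To see the loading pattern in isolation, here is a minimal sketch that round-trips a state dict through `torch.load` with `map_location`. The tiny `nn.Linear` model and the in-memory buffer are stand-ins chosen for illustration; they are assumptions, not part of the project.

```python
import io

import torch
from torch import nn

# Hypothetical tiny model standing in for the ResNet-50 above
tiny_model = nn.Linear(4, 2)

# Save the state dict to an in-memory buffer instead of a .pth file
buffer = io.BytesIO()
torch.save(tiny_model.state_dict(), buffer)
buffer.seek(0)

# map_location remaps every stored tensor onto the requested device
state_dict = torch.load(buffer, map_location=torch.device("cpu"))

# Load the weights into a fresh copy of the same architecture
restored = nn.Linear(4, 2)
restored.load_state_dict(state_dict)
restored.eval()

weights_match = torch.equal(tiny_model.weight, restored.weight)
print(weights_match)
```

The same call works whether the checkpoint was written on a GPU or a CPU machine, which is why a single `map_location` argument can replace a try/except fallback.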


## **3. Prepare `submission.csv`**

For the test set, we need a custom dataset class because the test images don't have labels and aren't organized in class folders. This custom class handles loading test images and keeping track of their filenames so we can create proper submissions.

This is a common pattern when working with competition datasets where test data comes as a flat directory of images.
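
A flat directory can also contain stray files (for example `.DS_Store` or notes), and `os.listdir` returns names in arbitrary order. The sketch below shows one way to collect only image files in a reproducible order. The helper name `list_image_files` and the extension list are assumptions for illustration.

```python
import os
import tempfile

# Assumed set of image extensions; adjust to the actual dataset
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".webp", ".gif")


def list_image_files(directory):
    """Return image filenames in `directory`, sorted for a reproducible order."""
    return sorted(
        name
        for name in os.listdir(directory)
        if name.lower().endswith(IMAGE_EXTENSIONS)
        and os.path.isfile(os.path.join(directory, name))
    )


# Small demonstration on a throwaway directory
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("b.jpg", "a.PNG", "notes.txt", ".DS_Store"):
        open(os.path.join(tmp_dir, name), "w").close()
    demo_files = list_image_files(tmp_dir)

print(demo_files)
```

A helper like this could supply the filename list inside a dataset's `__init__` in place of a bare `os.listdir` call.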

In [None]:
transform = transforms.Compose([
    transforms.Resize((224, 224)),           # Resize to model's expected input size
    transforms.ToTensor(),                   # Convert to PyTorch Tensor
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # Normalize to match ImageNet stats
                         std=[0.229, 0.224, 0.225])
])
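
To make the numbers in the transform concrete, here is a small dependency-free sketch of what happens to a single pixel: `ToTensor` scales 8-bit values into the range 0 to 1, and `Normalize` then subtracts the per-channel mean and divides by the per-channel standard deviation. The function name `normalize_pixel` is hypothetical, and the sketch ignores the channel reordering that `ToTensor` also performs.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to the values the model receives."""
    # Scaling step performed by ToTensor: [0, 255] -> [0.0, 1.0]
    scaled = [channel / 255.0 for channel in rgb]
    # Per-channel standardization performed by Normalize
    return [
        (value - mean) / std
        for value, mean, std in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)
    ]


normalized = normalize_pixel((128, 128, 128))
print([round(v, 3) for v in normalized])
```

These statistics must match the ones used during training; otherwise the model sees inputs from a different distribution than it learned on.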

In [None]:
class TestDataset(Dataset):
    def __init__(self, test_dir, transform=None):
        self.test_dir = test_dir
        self.transform = transform
        # os.listdir returns names in arbitrary order; sort for reproducibility
        self.image_names = sorted(os.listdir(test_dir))

    def __len__(self):
        return len(self.image_names)

    def __getitem__(self, idx):
        img_name = self.image_names[idx]
        img_path = os.path.join(self.test_dir, img_name)
        image = Image.open(img_path).convert('RGB')
        if self.transform:
            image = self.transform(image)
        return image, img_name 

# Create test dataset and loader
test_dir = "../data/soil_classification-2025/test"
test_dataset = TestDataset(test_dir, transform=transform)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

print(f"Found {len(test_dataset)} test images")

The final step is generating predictions for all test images and creating a submission file. We process images in batches for efficiency and convert the model's numeric predictions back to soil type names.
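
The index-to-name conversion can be sketched without any tensors. In the example below, plain Python lists stand in for the logits the model returns, and the logit values themselves are made up for illustration. The class with the highest score in each row wins, which mirrors taking the indices from `torch.max(outputs, 1)`.

```python
idx_to_class = {0: "Alluvial soil", 1: "Black Soil", 2: "Clay soil", 3: "Red soil"}

# Hypothetical logits for a batch of three images (one row per image)
logits = [
    [2.1, 0.3, -1.0, 0.5],
    [-0.2, 0.1, 3.4, 0.0],
    [0.0, 0.9, 0.2, 1.7],
]


def argmax(row):
    """Return the index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)


predicted = [idx_to_class[argmax(row)] for row in logits]
print(predicted)
```

Logits do not need to be converted to probabilities first, because softmax preserves the ordering of the scores.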

In [None]:
image_ids = []
predicted_labels = []

# Taken from Training Notebook
idx_to_class = {0: 'Alluvial soil', 1: 'Black Soil', 2: 'Clay soil', 3: 'Red soil'}

print("Generating predictions for test set...")

with torch.no_grad():
    for images, image_names in test_loader:
        images = images.to(device)
        outputs = model(images)           # Raw logits
        _, preds = torch.max(outputs, 1) # Get class index with highest score

        preds = preds.cpu().tolist()      # Plain Python ints, usable as dict keys

        # Convert predictions to class names and store with image names
        for img_name, pred_idx in zip(image_names, preds):
            image_ids.append(img_name)
            predicted_labels.append(idx_to_class[pred_idx])

# Create DataFrame for submission
submission_df = pd.DataFrame({
    "image_id": image_ids,
    "soil_type": predicted_labels
})

submission_df.to_csv("submission.csv", index=False)

print("✅ Submission file created successfully!")
print(f"Total predictions made: {len(submission_df)}")
print("\nSample predictions:")
print(submission_df.head())
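
Before uploading, it can help to check the file for obvious problems. The sketch below is one possible check using only the standard library. The function name `validate_submission`, the sample filenames, and the inline CSV text are all hypothetical; only the column names and class labels come from the cells above.

```python
import csv
import io

VALID_LABELS = {"Alluvial soil", "Black Soil", "Clay soil", "Red soil"}


def validate_submission(rows, expected_ids):
    """Return a list of problems found in the submission rows."""
    problems = []
    seen_ids = [row["image_id"] for row in rows]
    if len(seen_ids) != len(set(seen_ids)):
        problems.append("duplicate image_id values")
    missing = set(expected_ids) - set(seen_ids)
    if missing:
        problems.append(f"missing predictions for {sorted(missing)}")
    bad_labels = {row["soil_type"] for row in rows} - VALID_LABELS
    if bad_labels:
        problems.append(f"unknown labels {sorted(bad_labels)}")
    return problems


# Hypothetical CSV content standing in for submission.csv
sample_csv = io.StringIO(
    "image_id,soil_type\n"
    "img_001.jpg,Red soil\n"
    "img_002.jpg,Clay soil\n"
)
rows = list(csv.DictReader(sample_csv))
problems = validate_submission(rows, expected_ids=["img_001.jpg", "img_002.jpg"])
print(problems or "submission looks consistent")
```

To run the check on the real file, the rows could be read with `csv.DictReader(open("submission.csv", newline=""))` and compared against `test_dataset.image_names`.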