# Homework 1 Coding Part

4. [15pts] Training a Convolutional Neural Network (CNN) on EuroSAT for Image Classification

In this assignment, you will train a deep learning model from scratch to classify images in the EuroSAT dataset. EuroSAT is a dataset of 27,000 RGB satellite images ($64\times 64$ pixels) covering 10 land-cover classes. It is derived from Sentinel-2 satellite data and is used for remote sensing classification tasks.

You are required to complete the following code by **filling in your own architecture and training function.** In the sections marked "**To be implemented by students**", replace `pass` with your own implementation.

After completing the implementation, answer the following questions and submit a report in Markdown/PDF format.
- Estimate the number of parameters and the feature map sizes at each layer.
- Report the training accuracy and loss over epochs, and the accuracy on the test set.
- Compare training and test error with and without Batch Normalization and Dropout layers.
- Please also submit the Jupyter Notebook (.ipynb) with your complete, executable code. (You can simply edit this notebook file.)

### Notes:
- **Google Colab or AutoDL is recommended for training if you don't have a local GPU.**
- **Submission Deadline: November 2, 11:59 PM**

The answers to the questions above are in Problem 4 of $\texttt{hw1\_2025213446.pdf}$.

### 1. Setup: Load Dataset & Preprocessing

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.utils.data import DataLoader, random_split
import time

# Ensure GPU usage
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Data transformations
transform = transforms.Compose([
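    transforms.Resize((64, 64)),  # EuroSAT images are already 64x64; kept as a safeguard
    transforms.ToTensor(),        # PIL image (H, W, C) uint8 -> float tensor (C, H, W) in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # per-channel standardization (ImageNet statistics)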
])

# Load EuroSAT dataset
dataset = datasets.EuroSAT(root="./data", transform=transform, download=True)

# Split dataset into 80% training and 20% testing
train_size = int(0.8 * len(dataset))
test_size = len(dataset) - train_size
train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

# Data loaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=4, pin_memory=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False, num_workers=4, pin_memory=True)

Using device: cuda
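The `Normalize` step rescales each channel as $x' = (x - \mu_c)/\sigma_c$ after `ToTensor()` has mapped 8-bit pixel values into $[0, 1]$. The mean/std values used here are the standard ImageNet statistics, not statistics computed on EuroSAT itself. The pure-Python sketch below mimics the two steps for a single pixel to show the resulting value range; `normalize_pixel` is a hypothetical helper, not a torchvision function.

```python
# Mimic ToTensor() + Normalize(mean, std) for one RGB pixel, using the same
# ImageNet statistics as the transform above. normalize_pixel is a
# hypothetical helper for illustration only.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb_uint8):
    """Map an (R, G, B) tuple of 0-255 ints to standardized floats."""
    # v / 255.0 plays the role of ToTensor(); the rest is Normalize().
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb_uint8, MEAN, STD))


black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
print("black ->", [round(c, 4) for c in black])
print("white ->", [round(c, 4) for c in white])
```

Under these statistics the network inputs lie roughly between -2.12 (pure black, red channel) and 2.64 (pure white, blue channel), i.e. approximately zero-centred with unit-scale spread, which tends to make optimization with Adam better behaved than raw $[0, 1]$ inputs.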


In [2]:
# Test the input size
data_iter = iter(train_loader)
images, labels = next(data_iter)
print(f"Input size: {images.size()}, label size: {labels.size()}")
# (B, 3, 64, 64)
# (B,)

Input size: torch.Size([64, 3, 64, 64]), label size: torch.Size([64])
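The batch dimension printed above is 64 for every full batch, but neither split size is a multiple of 64 and neither loader sets `drop_last`, so the last batch of each loader is smaller. A quick sketch of that arithmetic, assuming the 27,000-image dataset and the 80/20 split from Section 1 (`batch_layout` is a hypothetical helper):

```python
import math

N_IMAGES = 27_000   # EuroSAT RGB, as stated in the introduction
BATCH_SIZE = 64     # matches the DataLoader settings in Section 1

train_size = int(0.8 * N_IMAGES)      # same rule as the notebook
test_size = N_IMAGES - train_size


def batch_layout(n, batch_size):
    """Return (number of batches, size of the last batch) when drop_last=False."""
    n_batches = math.ceil(n / batch_size)
    last = n - (n_batches - 1) * batch_size
    return n_batches, last


print("train:", train_size, batch_layout(train_size, BATCH_SIZE))
print("test: ", test_size, batch_layout(test_size, BATCH_SIZE))
```

This gives 21,600 training images in 338 batches (the last one holding 32) and 5,400 test images in 85 batches (the last one holding 24). Note also that `random_split` accepts a `generator` argument, for example `torch.Generator().manual_seed(0)`, if you want the same split after restarting the notebook.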


### 2. Define the Neural Network (**To be implemented by students**)

In [3]:
# Conv -> (BatchNorm) -> ReLU -> (Dropout2d) block, so BatchNorm and Dropout can be toggled per experiment
class MyConvLayer(nn.Module):
    def __init__(self, in_c, out_c, use_bn=False, p_drop=0.0):
        super().__init__()
        layers = [nn.Conv2d(in_c, out_c, kernel_size=3, stride=1, padding=1, bias=not use_bn)]
        if use_bn:
            layers.append(nn.BatchNorm2d(out_c, momentum=0.15))
        layers.append(nn.ReLU())
        if p_drop > 0:
            layers.append(nn.Dropout2d(p_drop))
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)


class MyCNN(nn.Module):
    def __init__(self, use_bn=False, p_drop=0.0):
        super(MyCNN, self).__init__()
        # - Define your CNN model architecture
        # - Experiment with different layers, number of filters, kernel sizes
        # - Try using BatchNorm, Dropout, and deeper architectures

        self.use_bn = use_bn  # whether to use BatchNorm
        self.p_drop = p_drop  # dropout probability

        # Input: (B, 3, 64, 64)
        self.block = nn.Sequential(
            MyConvLayer(3, 32, use_bn, p_drop),     # (B, 32, 64, 64)
            nn.MaxPool2d(2),                        # (B, 32, 32, 32)
            MyConvLayer(32, 64, use_bn, p_drop),    # (B, 64, 32, 32)
            nn.MaxPool2d(2),                        # (B, 64, 16, 16)
            MyConvLayer(64, 128, use_bn, p_drop),   # (B, 128, 16, 16)
            nn.MaxPool2d(2),                        # (B, 128, 8, 8)
            MyConvLayer(128, 256, use_bn, p_drop),  # (B, 256, 8, 8)
            nn.MaxPool2d(2),                        # (B, 256, 4, 4)
            MyConvLayer(256, 512, use_bn, p_drop),  # (B, 512, 4, 4)
            nn.MaxPool2d(2),                        # (B, 512, 2, 2)

            nn.Flatten(),                           # (B, 512*2*2) = (B, 2048)
            nn.Linear(512*2*2, 256),                # (B, 256)
            nn.ReLU(),
            nn.Dropout(p=p_drop),
            nn.Linear(256, 10)                      # (B, 10)
        )

    def forward(self, x):
        # Return raw logits (no softmax): nn.CrossEntropyLoss applies log-softmax internally.
        return self.block(x)
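`MyConvLayer` passes `bias=not use_bn` to `nn.Conv2d`. The reasoning is that BatchNorm subtracts the per-channel batch mean, so any constant bias added by the convolution cancels out, and BatchNorm's own learnable shift $\beta$ takes over that role. The pure-Python sketch below is a simplified, single-channel, training-mode normalization without the learnable $\gamma$ and $\beta$ (`batch_norm_1ch` is a hypothetical helper, not the PyTorch implementation); it shows that adding a constant to the inputs leaves the normalized output unchanged.

```python
import statistics


def batch_norm_1ch(xs, eps=1e-5):
    """Training-mode normalization of one channel (no learnable gamma/beta).

    Uses the biased (population) variance, as BatchNorm does when
    normalizing with the current batch statistics.
    """
    mu = statistics.fmean(xs)
    var = statistics.pvariance(xs, mu)
    return [(x - mu) / (var + eps) ** 0.5 for x in xs]


activations = [1.0, 2.0, 4.0, 7.0]   # toy conv outputs for one channel
conv_bias = 3.0                      # a constant per-channel bias
with_bias = [a + conv_bias for a in activations]

out_plain = batch_norm_1ch(activations)
out_biased = batch_norm_1ch(with_bias)
print([round(v, 6) for v in out_plain])
print(max(abs(a - b) for a, b in zip(out_plain, out_biased)))  # ~0: the bias cancels
```

Dropping the conv bias therefore loses nothing in expressiveness and saves `out_c` parameters per layer.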

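For the first report question, the parameter counts and feature-map sizes can be derived by hand from the layer list above. The sketch below is a pure-Python calculator that assumes exactly the `MyCNN` layout shown: five 3x3 convolutions with stride 1 and padding 1, each followed by 2x2 max-pooling, then `Linear(2048, 256)` and `Linear(256, 10)`. The helper names `conv_params`, `linear_params`, and `summarize` are hypothetical.

```python
# Analytical parameter and feature-map-size calculator for the MyCNN layout:
# 3x3 convs with stride 1 and padding 1 keep H and W, 2x2 max-pooling halves
# them. ReLU, Dropout, pooling and Flatten have no learnable parameters.


def conv_params(in_c, out_c, k=3, use_bn=False):
    weights = in_c * out_c * k * k
    bias = 0 if use_bn else out_c      # conv bias is disabled when BN follows
    bn = 2 * out_c if use_bn else 0    # BN's learnable gamma and beta
    return weights + bias + bn


def linear_params(in_f, out_f):
    return in_f * out_f + out_f        # weight matrix + bias


def summarize(use_bn=False, channels=(3, 32, 64, 128, 256, 512),
              size=64, hidden=256, n_classes=10):
    """Return ([(layer, output shape without batch dim, #params)], total)."""
    rows, total = [], 0
    for in_c, out_c in zip(channels[:-1], channels[1:]):
        p = conv_params(in_c, out_c, use_bn=use_bn)
        rows.append((f"Conv {in_c}->{out_c}", (out_c, size, size), p))
        size //= 2
        rows.append(("MaxPool2d(2)", (out_c, size, size), 0))
        total += p
    flat = channels[-1] * size * size
    rows.append(("Flatten", (flat,), 0))
    for in_f, out_f in ((flat, hidden), (hidden, n_classes)):
        p = linear_params(in_f, out_f)
        rows.append((f"Linear {in_f}->{out_f}", (out_f,), p))
        total += p
    return rows, total


rows, total = summarize(use_bn=False)
for name, shape, p in rows:
    print(f"{name:<18} {str(shape):<16} {p:>10,}")
print(f"Total trainable parameters (no BN):   {total:,}")
print(f"Total trainable parameters (with BN): {summarize(use_bn=True)[1]:,}")
```

Under these assumptions the baseline has 2,095,690 trainable parameters, and 2,096,682 with BatchNorm (each BN layer adds $2C$ parameters while the conv bias of $C$ is removed). Roughly 56% of the total sits in the last convolution (1,180,160) and about 25% in the first fully connected layer (524,544). Dropout adds no parameters. You can cross-check against PyTorch with `sum(p.numel() for p in model.parameters() if p.requires_grad)`.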
### 3. Define the Training Function (**To be implemented by students**)

In [4]:
def train_model(model, train_loader, criterion, optimizer, device, epochs=10):
    """
    Train the model and measure performance.
    - Record training time per epoch
    - Report training loss and accuracy
    - Measure training time per model architecture
    """
    model.train()
    start_time = time.time()

    for epoch in range(epochs):
        epoch_start_time = time.time()
        train_loss = 0.0
        correct = 0
        total = 0
        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            train_loss += loss.item() * inputs.size(0)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        train_loss /= total
        train_acc = correct / total * 100
        epoch_end_time = time.time()
        print(f"Epoch {epoch+1} / {epochs}, time: {epoch_end_time - epoch_start_time:.2f}s, Loss: {train_loss:.4f}, Accuracy: {train_acc:.4f}")


    end_time = time.time()
    print(f"Training completed in {end_time - start_time:.2f} seconds.")
    print(f"Training Loss: {train_loss:.4f} | Training Acc: {train_acc:.2f}%")
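`train_model` accumulates `loss.item() * inputs.size(0)` and divides by `total` at the end, rather than averaging the per-batch losses. Because `nn.CrossEntropyLoss` returns a per-batch mean by default and the last training batch here holds only 32 of 64 images, a plain average of batch means would over-weight that last batch. A toy illustration with made-up batch losses (the helper names are hypothetical):

```python
def weighted_epoch_mean(batch_means, batch_sizes):
    """Per-sample average, as computed in train_model and evaluate_model."""
    total = sum(batch_sizes)
    return sum(m * n for m, n in zip(batch_means, batch_sizes)) / total


def naive_epoch_mean(batch_means):
    """Average of batch means; biased when batches differ in size."""
    return sum(batch_means) / len(batch_means)


batch_means = [0.25, 0.25, 1.0]   # made-up mean losses for three batches
batch_sizes = [64, 64, 32]        # the last batch is half-size

print(weighted_epoch_mean(batch_means, batch_sizes))  # 0.4
print(naive_epoch_mean(batch_means))                  # 0.5
```

With 338 batches per epoch the difference is small in practice, but the weighted form gives the exact per-image average loss.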

### 4. Model Training

In [5]:
# Instantiate the model and move to device
model = MyCNN().to(device)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
train_model(model, train_loader, criterion, optimizer, device, epochs=10)

Epoch 1 / 10, time: 3.21s, Loss: 1.2289, Accuracy: 51.8380
Epoch 2 / 10, time: 3.15s, Loss: 0.6791, Accuracy: 75.5556
Epoch 3 / 10, time: 3.48s, Loss: 0.5189, Accuracy: 81.7130
Epoch 4 / 10, time: 3.50s, Loss: 0.4194, Accuracy: 85.5185
Epoch 5 / 10, time: 2.86s, Loss: 0.3378, Accuracy: 88.4074
Epoch 6 / 10, time: 2.57s, Loss: 0.2628, Accuracy: 90.9352
Epoch 7 / 10, time: 2.66s, Loss: 0.2170, Accuracy: 92.6296
Epoch 8 / 10, time: 2.63s, Loss: 0.1904, Accuracy: 93.4074
Epoch 9 / 10, time: 2.89s, Loss: 0.1935, Accuracy: 93.2315
Epoch 10 / 10, time: 2.70s, Loss: 0.1302, Accuracy: 95.5093
Training completed in 29.66 seconds.
Training Loss: 0.1302 | Training Acc: 95.51%


### 5. Model Evaluation

In [6]:
def evaluate_model(model, test_loader, device):
    model.eval()
    correct = 0
    total = 0
    test_loss = 0.0
    criterion = nn.CrossEntropyLoss()

    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)

            outputs = model(images)
            loss = criterion(outputs, labels)

            test_loss += loss.item() * images.size(0)
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    test_loss = test_loss / total
    test_acc = correct / total * 100
    print(f"Test Loss: {test_loss:.4f} | Test Acc: {test_acc:.2f}%")

# Evaluate the trained model
evaluate_model(model, test_loader, device)

Test Loss: 0.3003 | Test Acc: 90.96%
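`evaluate_model` calls `model.eval()` before the test loop. In evaluation mode, `nn.Dropout` and `nn.Dropout2d` act as identity maps, and `nn.BatchNorm2d` normalizes with its running mean and variance instead of the current batch's statistics. For dropout, PyTorch uses the "inverted" scheme: during training, kept activations are scaled by $1/(1-p)$, so no rescaling is needed at test time. Below is a minimal pure-Python sketch of that scheme on a flat list of activations; `inverted_dropout` is a hypothetical helper, and note that `nn.Dropout2d` (used in `MyConvLayer`) zeroes entire channels rather than individual elements.

```python
import random


def inverted_dropout(xs, p, training, rng=random):
    """Zero each element with probability p in training mode and scale the
    survivors by 1 / (1 - p); act as the identity in evaluation mode."""
    if not training or p == 0.0:
        return list(xs)
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else x * keep_scale for x in xs]


rng = random.Random(0)
acts = [1.0] * 10_000
p = 0.05                                  # the dropout rate used in Section 6

train_out = inverted_dropout(acts, p, training=True, rng=rng)
eval_out = inverted_dropout(acts, p, training=False)

print(sum(train_out) / len(train_out))   # close to 1.0: expectation is preserved
print(eval_out == acts)                  # True: identity at evaluation time
```

Forgetting `model.eval()` would keep dropout active and make BatchNorm depend on the composition of each test batch, so the reported test accuracy would be noisier and usually lower.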


### 6. Additional Experiments

Test whether adding Batch Normalization and Dropout layers improves the model's performance, and compare the results with the baseline model that uses neither layer.
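For the report's comparison it can help to convert accuracies into errors (error = 100 - accuracy, in %) and look at the generalization gap, i.e. test error minus training error. A small sketch, applied here only to the two baseline runs (no BatchNorm, no Dropout) whose complete logs appear in this notebook; `error_summary` is a hypothetical helper.

```python
def error_summary(train_acc, test_acc):
    """Convert accuracies (%) to errors (%) and the test-minus-train gap."""
    train_err = 100.0 - train_acc
    test_err = 100.0 - test_acc
    return {
        "train_err": round(train_err, 2),
        "test_err": round(test_err, 2),
        "gap": round(test_err - train_err, 2),
    }


# Final-epoch training accuracy and test accuracy (%) logged in this notebook
baseline_runs = {
    "Sections 4-5 run": (95.51, 90.96),
    "Section 6, BN=False, Dropout=0.0": (95.48, 90.91),
}

for name, (train_acc, test_acc) in baseline_runs.items():
    print(name, error_summary(train_acc, test_acc))
```

Both baseline runs show a gap of roughly 4.5 to 4.6 percentage points; the same helper can be applied to the BatchNorm and Dropout runs once their final numbers are recorded.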

In [7]:
def pipeline(use_bn, p_drop):
    print("=========================================================================")
    print(f'Working with BatchNorm={use_bn}, Dropout={p_drop}')

    model = MyCNN(use_bn=use_bn, p_drop=p_drop).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    train_model(model, train_loader, criterion, optimizer, device, epochs=10)
    evaluate_model(model, test_loader, device)


# without batch normalization, without dropout
pipeline(use_bn=False, p_drop=0.0)

# with batch normalization, without dropout
pipeline(use_bn=True, p_drop=0.0)

# without batch normalization, with dropout
pipeline(use_bn=False, p_drop=0.05)

# with batch normalization, with dropout
pipeline(use_bn=True, p_drop=0.05)

Working with BatchNorm=False, Dropout=0.0
Epoch 1 / 10, time: 2.57s, Loss: 1.1805, Accuracy: 54.2546
Epoch 2 / 10, time: 2.86s, Loss: 0.6425, Accuracy: 76.6991
Epoch 3 / 10, time: 2.84s, Loss: 0.4885, Accuracy: 82.8704
Epoch 4 / 10, time: 2.75s, Loss: 0.3912, Accuracy: 86.3519
Epoch 5 / 10, time: 3.22s, Loss: 0.3300, Accuracy: 88.5324
Epoch 6 / 10, time: 3.51s, Loss: 0.2716, Accuracy: 90.3056
Epoch 7 / 10, time: 3.46s, Loss: 0.2211, Accuracy: 92.2500
Epoch 8 / 10, time: 3.02s, Loss: 0.1982, Accuracy: 92.9583
Epoch 9 / 10, time: 3.20s, Loss: 0.1499, Accuracy: 94.7222
Epoch 10 / 10, time: 2.87s, Loss: 0.1286, Accuracy: 95.4769
Training completed in 30.29 seconds.
Training Loss: 0.1286 | Training Acc: 95.48%
Test Loss: 0.2956 | Test Acc: 90.91%
Working with BatchNorm=True, Dropout=0.0
Epoch 1 / 10, time: 2.71s, Loss: 0.8595, Accuracy: 69.1019
Epoch 2 / 10, time: 2.64s, Loss: 0.5206, Accuracy: 81.7037
Epoch 3 / 10, time: 2.56s, Loss: 0.3939, Accuracy: 86.4769
Epoch 4 / 10, time: 2.58s, Los

### 7. Architecture of the Constructed CNN Model

In [8]:
print(model)  # the baseline model from In [5] (no BatchNorm, no Dropout)

MyCNN(
  (block): Sequential(
    (0): MyConvLayer(
      (block): Sequential(
        (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU()
      )
    )
    (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (2): MyConvLayer(
      (block): Sequential(
        (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU()
      )
    )
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (4): MyConvLayer(
      (block): Sequential(
        (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU()
      )
    )
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): MyConvLayer(
      (block): Sequential(
        (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU()
      )
    )
    (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, c