In [21]:
pip install datasets



In [22]:
from datasets import load_dataset

dataset = load_dataset('microsoft/cats_vs_dogs')
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['image', 'labels'],
        num_rows: 23410
    })
})


In [23]:
import torch, torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
from torch.optim.lr_scheduler import StepLR
import numpy as np
import matplotlib.pyplot as plt

# Task
Define image transformations for the `cats_vs_dogs` dataset including resizing, conversion to tensor, and normalization.

## Define Image Transformations

### Subtask:
Define the necessary image transformations (e.g., resizing, converting to tensor, normalization) for the input to the CNN using `torchvision.transforms`.


**Reasoning**:
I need to define image transformations as specified in the instructions. This involves importing `torchvision.transforms`, composing a transformation pipeline with `Resize`, `ToTensor`, and `Normalize`, and storing it in a variable named `transform`.



In [24]:
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

print("Image transformation pipeline defined.")

Image transformation pipeline defined.
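
To make the normalization step concrete, the sketch below reproduces the per-channel arithmetic in plain Python. It assumes 8-bit input pixels, which `ToTensor` first scales from [0, 255] to [0, 1], and then applies `(x - mean) / std` as `Normalize` does. The helper name `normalize_pixel` is hypothetical and exists only for illustration; it is not part of torchvision.

```python
# Per-channel statistics used in the pipeline above (the common ImageNet values).
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb_uint8):
    """Mimic ToTensor + Normalize for a single RGB pixel (hypothetical helper)."""
    scaled = [v / 255.0 for v in rgb_uint8]  # ToTensor: [0, 255] -> [0, 1]
    return [(x - m) / s for x, m, s in zip(scaled, MEAN, STD)]

white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
print("white:", [round(v, 3) for v in white])
print("black:", [round(v, 3) for v in black])
```

After normalization the values are no longer confined to [0, 1]: a black pixel maps to roughly -2 in every channel and a white pixel to a little over +2. This matters when plotting transformed images later, since they must be de-normalized first.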


## Preprocess Dataset and Create DataLoaders

### Subtask:
Apply the defined transformations to the `cats_vs_dogs` dataset. Split the dataset into training and validation sets. Create `DataLoader` instances for efficient batching of the data during training and evaluation.


**Reasoning**:
I need to define a function that applies `transform` to each image in a batch, which is necessary before splitting and loading the data. The function will be attached to the dataset and applied on the fly whenever examples are accessed.



In [25]:
def apply_transforms(batch):
    batch['image'] = [transform(img) for img in batch['image']]
    return batch

print("Transformation application function defined.")

Transformation application function defined.


**Reasoning**:
Now that the transformation function is defined, I will apply it to the dataset using `with_transform`, then split the dataset into training and validation sets, and finally create DataLoader instances for both.



In [26]:
transformed_dataset = dataset['train'].with_transform(apply_transforms)

splits = transformed_dataset.train_test_split(test_size=0.2)
train_dataset = splits['train']
val_dataset = splits['test']

batch_size = 64

train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=batch_size)

print(f"Training dataset size: {len(train_dataset)}")
print(f"Validation dataset size: {len(val_dataset)}")
print(f"Number of training batches: {len(train_dataloader)}")
print(f"Number of validation batches: {len(val_dataloader)}")

Training dataset size: 18728
Validation dataset size: 4682
Number of training batches: 293
Number of validation batches: 74
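
The batch counts printed above follow directly from the dataset sizes. As a quick check, this sketch recomputes them in plain Python, assuming the `DataLoader` default of keeping the final partial batch (`drop_last=False`). The function `num_batches` is a hypothetical helper, not a PyTorch API.

```python
import math

def num_batches(num_samples, batch_size, drop_last=False):
    """Number of batches a loader yields for a dataset of `num_samples` items."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)

n_train, n_val, batch_size = 18728, 4682, 64  # sizes taken from the output above

train_batches = num_batches(n_train, batch_size)
val_batches = num_batches(n_val, batch_size)
last_train_batch = n_train % batch_size or batch_size
last_val_batch = n_val % batch_size or batch_size

print(train_batches, val_batches)        # number of batches per epoch
print(last_train_batch, last_val_batch)  # size of the final, partial batch
```

Both loaders end on a partial batch, so any per-batch statistic computed later gives that last batch the same weight as a full one.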


## Define CNN Model Architecture

### Subtask:
Implement a simple Convolutional Neural Network (CNN) using `torch.nn.Module`. This will include convolutional layers, pooling layers, and fully connected layers for classification.


**Reasoning**:
To implement the CNN model, I will define a class `SimpleCNN` that inherits from `torch.nn.Module`. This class will contain the convolutional, pooling, and fully connected layers as specified, along with the `forward` pass definition.



In [27]:
class SimpleCNN(nn.Module):
    def __init__(self, num_classes=2):
        super(SimpleCNN, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), # Input: 3 channels (RGB), Output: 32 channels
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),    # Output size: 112x112

            nn.Conv2d(32, 64, kernel_size=3, padding=1), # Input: 32 channels, Output: 64 channels
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),    # Output size: 56x56

            nn.Conv2d(64, 128, kernel_size=3, padding=1), # Input: 64 channels, Output: 128 channels
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),    # Output size: 28x28

            nn.Conv2d(128, 256, kernel_size=3, padding=1), # Input: 128 channels, Output: 256 channels
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)     # Output size: 14x14
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 14 * 14, 512),            # Fully connected layer
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes)               # Output layer for 2 classes (cat or dog)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x

print("SimpleCNN model architecture defined.")

SimpleCNN model architecture defined.
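
The size comments in the model and the `256 * 14 * 14` input to the first linear layer can be checked with the standard output-size formula for convolution and pooling. The sketch below traces a 224x224 input through the four blocks and counts the learnable parameters. It uses plain Python only, and the helper names `conv2d_out` and `maxpool2d_out` are hypothetical.

```python
def conv2d_out(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1

def maxpool2d_out(size, kernel_size, stride):
    """Spatial output size of max pooling along one dimension."""
    return (size - kernel_size) // stride + 1

channels = [3, 32, 64, 128, 256]
size = 224
conv_params = 0
for c_in, c_out in zip(channels, channels[1:]):
    size = conv2d_out(size, kernel_size=3, padding=1)    # 3x3 conv, padding 1 keeps the size
    size = maxpool2d_out(size, kernel_size=2, stride=2)  # 2x2 pool halves it
    conv_params += c_in * c_out * 3 * 3 + c_out          # weights + biases
    print(f"{c_in:>3} -> {c_out:>3} channels, feature map {size}x{size}")

flat_features = channels[-1] * size * size
fc_params = (flat_features * 512 + 512) + (512 * 2 + 2)
total_params = conv_params + fc_params
print("flattened features:", flat_features)
print("total parameters:", total_params)
```

Almost all of the parameters sit in the first fully connected layer, which is why changing the input resolution or the number of pooling stages has such a large effect on model size.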


## Initialize Model, Loss Function, and Optimizer

### Subtask:
Instantiate the defined CNN model. Define the loss function (e.g., `nn.CrossEntropyLoss`) and an optimizer (e.g., `torch.optim.Adam`) for training the model.


**Reasoning**:
To instantiate the model, define the loss function, and optimizer, I will write a code block that creates an instance of `SimpleCNN`, sets up `nn.CrossEntropyLoss` and `torch.optim.Adam`, and moves the model to the appropriate device (GPU if available, otherwise CPU).



In [28]:
model = SimpleCNN(num_classes=2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

print(f"Model initialized and moved to {device}.")
print(f"Loss function: {criterion}")
print(f"Optimizer: {optimizer}")

Model initialized and moved to cpu.
Loss function: CrossEntropyLoss()
Optimizer: Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    decoupled_weight_decay: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.001
    maximize: False
    weight_decay: 0
)
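
`nn.CrossEntropyLoss` expects raw logits and applies log-softmax internally, which is why `SimpleCNN` ends in a plain `Linear` layer with no softmax. As an illustration, the sketch below computes the same quantity for a single sample in plain Python. The function `cross_entropy` is a hypothetical stand-in and the logit values are made up.

```python
import math

def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class for one sample."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]

right = cross_entropy([2.0, 0.5], target=0)       # model favors the correct class
wrong = cross_entropy([2.0, 0.5], target=1)       # model favors the wrong class
uninformed = cross_entropy([0.0, 0.0], target=0)  # model has no preference

print(f"right: {right:.4f}, wrong: {wrong:.4f}, uninformed: {uninformed:.4f}")
```

With two classes, a model that has learned nothing scores about `log(2)`, roughly 0.693, so a training loss stuck near that value suggests the network is still guessing.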


## Train the Model

### Subtask:
Implement the training loop to train the CNN model over several epochs. This will involve iterating through the training DataLoader, performing forward passes, calculating loss, backpropagating gradients, and updating model weights. Include basic validation during training.


**Reasoning**:
To implement the training loop as described, I need to write a Python code block that iterates through a specified number of epochs, performs training and validation steps, calculates loss and accuracy, and stores these metrics.



In [None]:
num_epochs = 10

train_losses = []
train_accuracies = []
val_losses = []
val_accuracies = []

for epoch in range(num_epochs):
    model.train()
    running_train_loss = 0.0
    correct_train = 0
    total_train = 0

    for i, batch in enumerate(train_dataloader):
        images = batch['image'].to(device)
        labels = batch['labels'].to(device)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_train_loss += loss.item()
        _, predicted = torch.max(outputs.data, 1)
        total_train += labels.size(0)
        correct_train += (predicted == labels).sum().item()

    epoch_train_loss = running_train_loss / len(train_dataloader)
    epoch_train_accuracy = 100 * correct_train / total_train
    train_losses.append(epoch_train_loss)
    train_accuracies.append(epoch_train_accuracy)

    model.eval()
    running_val_loss = 0.0
    correct_val = 0
    total_val = 0

    with torch.no_grad():
        for i, batch in enumerate(val_dataloader):
            images = batch['image'].to(device)
            labels = batch['labels'].to(device)

            outputs = model(images)
            loss = criterion(outputs, labels)

            running_val_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total_val += labels.size(0)
            correct_val += (predicted == labels).sum().item()

    epoch_val_loss = running_val_loss / len(val_dataloader)
    epoch_val_accuracy = 100 * correct_val / total_val
    val_losses.append(epoch_val_loss)
    val_accuracies.append(epoch_val_accuracy)

    print(f'Epoch [{epoch+1}/{num_epochs}], '
          f'Train Loss: {epoch_train_loss:.4f}, '
          f'Train Acc: {epoch_train_accuracy:.2f}%, '
          f'Val Loss: {epoch_val_loss:.4f}, '
          f'Val Acc: {epoch_val_accuracy:.2f}%')

print("Training complete.")
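
One detail of the loop above is worth spelling out: dividing the summed batch losses by `len(train_dataloader)` averages the batch means, so a smaller final batch counts as much as a full one. The sketch below contrasts that with a per-sample average, using made-up `(mean loss, batch size)` pairs. Both helper names are hypothetical.

```python
def mean_of_batch_means(batches):
    """What the loop computes: every batch has equal weight."""
    return sum(loss for loss, _ in batches) / len(batches)

def per_sample_mean(batches):
    """Weight each batch mean by its size to recover the mean over samples."""
    total = sum(loss * n for loss, n in batches)
    count = sum(n for _, n in batches)
    return total / count

# Hypothetical (mean loss, batch size) pairs; the last batch is partial.
batches = [(0.60, 64), (0.50, 64), (0.90, 40)]

print(f"mean of batch means: {mean_of_batch_means(batches):.4f}")
print(f"per-sample mean:     {per_sample_mean(batches):.4f}")
```

With hundreds of batches per epoch the two numbers are usually very close, so the loop's approach is a reasonable simplification. The accuracy figures are unaffected because they are already accumulated per sample.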

**Reasoning**:
The `RuntimeError` raised by the training loop indicates that some images are grayscale (1 channel), while the normalization step expects 3 channels. To address this, I will add `transforms.Lambda(lambda image: image.convert("RGB"))` to the transformation pipeline defined earlier, so that every image is converted to RGB before `ToTensor` and `Normalize` run. This prevents the channel mismatch during the training loop.



In [30]:
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.Lambda(lambda image: image.convert("RGB")), # Add this line to ensure 3 channels
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

print("Image transformation pipeline defined.")

Image transformation pipeline defined.
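
For a grayscale image, the RGB conversion amounts to copying the single luminance channel into three identical channels, so the three-value `mean` and `std` line up with the data. The sketch below shows that idea on a tiny channel-first image stored as nested lists. It is a plain-Python stand-in, not the Pillow implementation, and `to_three_channels` is a hypothetical helper.

```python
def to_three_channels(image_chw):
    """Return a 3-channel copy of a channel-first image given as nested lists."""
    if len(image_chw) == 3:
        return [[row[:] for row in channel] for channel in image_chw]
    if len(image_chw) == 1:
        gray = image_chw[0]
        return [[row[:] for row in gray] for _ in range(3)]  # replicate luminance
    raise ValueError(f"unsupported channel count: {len(image_chw)}")

gray_image = [[[0, 128], [255, 64]]]  # shape (1, 2, 2)
rgb_image = to_three_channels(gray_image)

print(len(gray_image), "->", len(rgb_image), "channels")
print(rgb_image[0] == rgb_image[1] == rgb_image[2])
```

Other image modes, such as palette or RGBA images, are converted differently, which is a good reason to let the imaging library handle the conversion in the real pipeline rather than hand-rolling it.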


**Reasoning**:
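Now that the `transform` pipeline has been updated to handle grayscale images, I re-define `apply_transforms`. Because the function looks up the global `transform` each time it is called, the earlier definition would already pick up the new pipeline; re-running the cell simply keeps the notebook state explicit.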



In [31]:
def apply_transforms(batch):
    batch['image'] = [transform(img) for img in batch['image']]
    return batch

print("Transformation application function defined.")

Transformation application function defined.


**Reasoning**:
Now that `apply_transforms` has been re-defined against the updated `transform` pipeline, I re-create the transformed dataset, the splits, and the `DataLoader` instances (`train_dataloader` and `val_dataloader`) so that every object in the pipeline is rebuilt from the corrected definitions. Note that `train_test_split` is called without a `seed`, so this run produces a different random split than the first one, even though the sizes are the same.



In [32]:
transformed_dataset = dataset['train'].with_transform(apply_transforms)

splits = transformed_dataset.train_test_split(test_size=0.2)
train_dataset = splits['train']
val_dataset = splits['test']

batch_size = 64

train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=batch_size)

print(f"Training dataset size: {len(train_dataset)}")
print(f"Validation dataset size: {len(val_dataset)}")
print(f"Number of training batches: {len(train_dataloader)}")
print(f"Number of validation batches: {len(val_dataloader)}")

Training dataset size: 18728
Validation dataset size: 4682
Number of training batches: 293
Number of validation batches: 74


**Reasoning**:
Now that the `transform` pipeline has been updated, and the `apply_transforms` function and `DataLoader` instances have been re-created to use the corrected transformations, I will re-execute the training loop. This should resolve the `RuntimeError` related to image channel mismatch and allow the model to train as intended.



In [None]:
num_epochs = 10

train_losses = []
train_accuracies = []
val_losses = []
val_accuracies = []

for epoch in range(num_epochs):
    model.train()
    running_train_loss = 0.0
    correct_train = 0
    total_train = 0

    for i, batch in enumerate(train_dataloader):
        images = batch['image'].to(device)
        labels = batch['labels'].to(device)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        print(f'{i} ')
        loss.backward()
        optimizer.step()

        running_train_loss += loss.item()
        _, predicted = torch.max(outputs.data, 1)
        total_train += labels.size(0)
        correct_train += (predicted == labels).sum().item()

    epoch_train_loss = running_train_loss / len(train_dataloader)
    epoch_train_accuracy = 100 * correct_train / total_train
    train_losses.append(epoch_train_loss)
    train_accuracies.append(epoch_train_accuracy)

    model.eval()
    running_val_loss = 0.0
    correct_val = 0
    total_val = 0

    with torch.no_grad():
        for i, batch in enumerate(val_dataloader):
            images = batch['image'].to(device)
            labels = batch['labels'].to(device)

            outputs = model(images)
            loss = criterion(outputs, labels)

            running_val_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total_val += labels.size(0)
            correct_val += (predicted == labels).sum().item()
            print(f'{i + 1}/{len(val_dataloader)} ')  # validation progress: batch / total batches

    epoch_val_loss = running_val_loss / len(val_dataloader)
    epoch_val_accuracy = 100 * correct_val / total_val
    val_losses.append(epoch_val_loss)
    val_accuracies.append(epoch_val_accuracy)

    print(f'Epoch [{epoch+1}/{num_epochs}], '
          f'Train Loss: {epoch_train_loss:.4f}, '
          f'Train Acc: {epoch_train_accuracy:.2f}%, '
          f'Val Loss: {epoch_val_loss:.4f}, '
          f'Val Acc: {epoch_val_accuracy:.2f}%')

print("Training complete.")

0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
