<a href="https://colab.research.google.com/github/vijaygwu/IntroToDeepLearning/blob/main/CNNwithAndWithoutPIL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Explanation of the Code**

This notebook compares two ways of feeding the MNIST dataset to a PyTorch model. The first wraps the dataset in a custom class that converts each sample back to a `PIL` image before transforming it; the second uses `torchvision.datasets.MNIST` directly with its `transform` argument. Both versions train the same simple fully connected network to classify handwritten digits, and the results are compared in terms of run time and test accuracy.

Let's go through the code step by step.

---

### **1. Importing Required Libraries**

```python
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
from torchvision import datasets, transforms
from PIL import Image
import time
```

- **`torch`**: Core PyTorch library.
- **`torch.nn`**: Provides modules to build neural networks.
- **`torch.optim`**: Contains optimizers like SGD, Adam, etc.
- **`torch.nn.functional`**: Contains functions like `relu` and `cross_entropy` that are commonly used in neural networks.
- **`torch.utils.data.DataLoader`**: Loads datasets in batches during training.
- **`torchvision.datasets`**: Provides popular datasets like MNIST.
- **`torchvision.transforms`**: Contains functions to transform data, such as converting images to tensors and normalizing them.
- **`PIL`**: Used for handling image files manually.
- **`time`**: For tracking execution time of training loops.

---


In [15]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
from torchvision import datasets, transforms
from PIL import Image
import time



### **2. Version 1: Using PIL for Image Handling**

#### Custom Dataset Class

```python
class MNISTDataset(Dataset):
    def __init__(self, mnist_data, transform=None):
        self.mnist_data = mnist_data  # Store the MNIST dataset from torchvision.
        self.transform = transform    # Transformation like converting to tensor and normalization.

    def __len__(self):
        return len(self.mnist_data)  # Return the number of images in the dataset.

    def __getitem__(self, index):
        img, label = self.mnist_data[index]  # Get an image and its label.
        img = transforms.ToPILImage()(img)   # Convert the tensor image back to a PIL image.

        if self.transform:
            img = self.transform(img)  # Apply the specified transformations (to tensor, normalization).

        return img, label  # Return the transformed image and its label.
```

- **Purpose**: This class wraps the `torchvision.datasets.MNIST` dataset and allows the manual conversion of images from tensors back to `PIL` images for custom processing. It also applies transformations like converting the image back to a tensor and normalizing it.
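
The wrapper pattern described above relies only on the `__len__` and `__getitem__` methods that `DataLoader` expects. A minimal sketch of that protocol, assuming a synthetic in-memory stand-in for MNIST (the names `WrappedDataset` and `fake_base` are hypothetical and not part of the notebook):

```python
import math

import torch
from torch.utils.data import DataLoader, Dataset


class WrappedDataset(Dataset):
    """Hypothetical wrapper: delegates to a base dataset and applies a transform."""

    def __init__(self, base, transform=None):
        self.base = base
        self.transform = transform

    def __len__(self):
        # DataLoader uses this to work out how many batches to produce.
        return len(self.base)

    def __getitem__(self, index):
        img, label = self.base[index]
        if self.transform is not None:
            img = self.transform(img)
        return img, label


# Synthetic stand-in for MNIST: 100 random 1x28x28 "images" with labels 0-9.
torch.manual_seed(0)
fake_base = [(torch.rand(1, 28, 28), i % 10) for i in range(100)]

wrapped = WrappedDataset(fake_base, transform=lambda x: x * 2.0)
loader = DataLoader(wrapped, batch_size=64, shuffle=False)

images, labels = next(iter(loader))
num_batches = len(loader)  # ceil(100 / 64) batches; the last one is smaller
print(images.shape, labels.shape, num_batches)
```

With the real training set of 60,000 images and `batch_size=64`, the same arithmetic gives `math.ceil(60000 / 64)` batches per epoch.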
  
#### Dataset Loading and Transformations

```python
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert the PIL image to a tensor.
    transforms.Normalize((0.1307,), (0.3081,))  # Normalize the data with the mean and std of MNIST.
])

train_data = datasets.MNIST(root='./mnist_data', train=True, download=True, transform=transforms.ToTensor())
test_data = datasets.MNIST(root='./mnist_data', train=False, download=True, transform=transforms.ToTensor())

train_dataset_pil = MNISTDataset(train_data, transform=transform)
test_dataset_pil = MNISTDataset(test_data, transform=transform)

train_loader_pil = DataLoader(dataset=train_dataset_pil, batch_size=64, shuffle=True)
test_loader_pil = DataLoader(dataset=test_dataset_pil, batch_size=64, shuffle=False)
```

- **Transformations**:
  - `ToTensor()`: Converts the PIL image to a PyTorch tensor.
  - `Normalize((0.1307,), (0.3081,))`: Standard normalization for the MNIST dataset (mean and std are specific to MNIST).
  
- **Datasets**:
  - `datasets.MNIST`: Automatically downloads and loads the MNIST dataset if it's not already available.
  
- **DataLoader**: The `DataLoader` is used to load the data in batches, with `shuffle=True` for the training data to randomize the order of images for better generalization.
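
To make the normalization step concrete, here is a small sketch of the arithmetic that `Normalize((0.1307,), (0.3081,))` applies to each pixel, namely `(x - mean) / std`. The helper `normalize` is a hypothetical stand-in written with plain tensor operations, not the torchvision implementation:

```python
import torch

MNIST_MEAN, MNIST_STD = 0.1307, 0.3081


def normalize(x, mean=MNIST_MEAN, std=MNIST_STD):
    """Shift and scale pixel values that are already in the [0, 1] range."""
    return (x - mean) / std


# A black pixel, a pixel at the dataset mean, and a white pixel.
pixels = torch.tensor([0.0, 0.1307, 1.0])
normalized = normalize(pixels)
print(normalized)  # approximately -0.4242, 0.0, 2.8215
```

After normalization the inputs are roughly centered on zero, which is why the same mean and standard deviation are reused for both the training and test sets.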

---



In [16]:
###############################################
# Version 1: Using PIL                        #
###############################################

# Custom Dataset Class for MNIST using PIL and torchvision
# This class wraps the torchvision MNIST dataset but loads images using PIL to allow for manual control over image processing.
class MNISTDataset(Dataset):
    def __init__(self, mnist_data, transform=None):
        self.mnist_data = mnist_data  # Store the dataset passed in (torchvision MNIST dataset).
        self.transform = transform    # Transformation (like converting to tensors and normalizing).

    def __len__(self):
        return len(self.mnist_data)  # Return the number of items in the dataset.

    def __getitem__(self, index):
        # Get the image and label at the specified index from the original dataset.
        img, label = self.mnist_data[index]

        # Convert the image from a tensor back to a PIL image for further processing.
        img = transforms.ToPILImage()(img)

        # Apply any transformations (like converting back to a tensor and normalizing).
        if self.transform:
            img = self.transform(img)

        # Return the processed image and its corresponding label.
        return img, label

# Set up transforms (convert to tensor and normalize)
# We need to convert the images to tensors and normalize them (mean and std values are specific to MNIST).
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert PIL image to PyTorch tensor.
    transforms.Normalize((0.1307,), (0.3081,))  # Normalize the data with MNIST-specific mean and std.
])

# Download the MNIST dataset using torchvision.
# The dataset will be downloaded if not already present in './mnist_data'.
train_data = datasets.MNIST(root='./mnist_data', train=True, download=True, transform=transforms.ToTensor())
test_data = datasets.MNIST(root='./mnist_data', train=False, download=True, transform=transforms.ToTensor())

# Wrap the torchvision MNIST dataset with our custom dataset class, which uses PIL for image handling.
train_dataset_pil = MNISTDataset(train_data, transform=transform)
test_dataset_pil = MNISTDataset(test_data, transform=transform)

# DataLoader for batching. Batching helps in loading a set of images at once during training.
# shuffle=True ensures that the training data is shuffled each epoch for better generalization.
train_loader_pil = DataLoader(dataset=train_dataset_pil, batch_size=64, shuffle=True)
test_loader_pil = DataLoader(dataset=test_dataset_pil, batch_size=64, shuffle=False)



### **3. Version 2: Without PIL for Image Handling**

```python
train_dataset_no_pil = datasets.MNIST(root='./mnist_data', train=True, download=True, transform=transform)
test_dataset_no_pil = datasets.MNIST(root='./mnist_data', train=False, download=True, transform=transform)

train_loader_no_pil = DataLoader(dataset=train_dataset_no_pil, batch_size=64, shuffle=True)
test_loader_no_pil = DataLoader(dataset=test_dataset_no_pil, batch_size=64, shuffle=False)
```

- **Difference**: In this version, `torchvision.datasets.MNIST` is used directly with its `transform` argument. Internally the dataset still hands each sample to the transform as a `PIL` image, but the conversion to a tensor happens only once; there is no extra tensor-to-`PIL`-to-tensor round trip in a custom wrapper.

- **Advantages**: This is more efficient when working with standard datasets like MNIST because each sample is converted to a tensor only once and no wrapper class is needed.

---



In [17]:
###############################################
# Version 2: Without PIL                      #
###############################################

# In this version, we use the dataset directly as provided by torchvision, without wrapping it in a custom dataset class.

# Transformations (convert to tensor and normalize)
transform = transforms.Compose([
    transforms.ToTensor(),  # Directly convert the images to PyTorch tensors.
    transforms.Normalize((0.1307,), (0.3081,))  # Normalize the data (mean and std specific to MNIST).
])

# Download and load the MNIST dataset directly.
# The dataset will be downloaded and directly loaded without any manual PIL processing.
train_dataset_no_pil = datasets.MNIST(root='./mnist_data', train=True, download=True, transform=transform)
test_dataset_no_pil = datasets.MNIST(root='./mnist_data', train=False, download=True, transform=transform)

# DataLoader for batching. Same as the PIL version.
train_loader_no_pil = DataLoader(dataset=train_dataset_no_pil, batch_size=64, shuffle=True)
test_loader_no_pil = DataLoader(dataset=test_dataset_no_pil, batch_size=64, shuffle=False)



### **4. Neural Network Architecture**

```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28*28, 128)  # Fully connected layer 1: from input size (28*28) to 128 neurons.
        self.fc2 = nn.Linear(128, 64)     # Fully connected layer 2: from 128 neurons to 64.
        self.fc3 = nn.Linear(64, 10)      # Output layer: 10 neurons for 10 digit classes.

    def forward(self, x):
        x = x.view(-1, 28*28)  # Flatten the 28x28 image into a vector of size 784.
        x = F.relu(self.fc1(x))  # Apply ReLU activation to the first layer.
        x = F.relu(self.fc2(x))  # Apply ReLU activation to the second layer.
        x = self.fc3(x)          # No activation here (cross-entropy will handle softmax).
        return x
```

- **Network Overview**:
  - Input size is `28x28` (since MNIST images are 28x28 pixels).
  - Two hidden layers with ReLU activation.
  - The final output layer has 10 neurons (one for each digit class).

- **Purpose**: The network takes in a flattened image, processes it through two fully connected layers with ReLU activation, and then outputs a vector of 10 scores (one for each digit).
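
As a quick sanity check on the architecture, the sketch below rebuilds the same three-layer network, counts its trainable parameters, and passes a random batch through it. The random batch is an assumption standing in for real MNIST images:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)  # Flatten each image to 784 values.
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)       # Raw scores (logits), one per class.


model = Net()

# Weights plus biases: (784*128 + 128) + (128*64 + 64) + (64*10 + 10)
num_params = sum(p.numel() for p in model.parameters())

batch = torch.randn(4, 1, 28, 28)  # Four fake grayscale images.
logits = model(batch)
print(num_params, logits.shape)
```

The first layer holds most of the parameters, since it connects all 784 input pixels to 128 hidden units.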

---



In [18]:
###############################################
# Shared Neural Network Code                  #
###############################################

# Define a simple fully connected neural network for classification.
# The model has three layers: two hidden layers with ReLU activation and one output layer for classification.
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Input size is 28*28 (since MNIST images are 28x28 pixels).
        self.fc1 = nn.Linear(28*28, 128)  # First fully connected layer, 128 neurons.
        self.fc2 = nn.Linear(128, 64)     # Second fully connected layer, 64 neurons.
        self.fc3 = nn.Linear(64, 10)      # Output layer, 10 neurons (for 10 digit classes).

    def forward(self, x):
        # Flatten the input tensor (28x28 pixels) into a vector of size 784.
        x = x.view(-1, 28*28)
        x = F.relu(self.fc1(x))  # Apply ReLU activation to the first layer.
        x = F.relu(self.fc2(x))  # Apply ReLU activation to the second layer.
        x = self.fc3(x)          # Output layer (no activation, as we'll use CrossEntropyLoss).
        return x

# Create separate models for the two versions (PIL and No PIL).
model_pil = Net()      # Model for the PIL version.
model_no_pil = Net()   # Model for the non-PIL version.


### **5. Loss Function and Optimizer**

```python
criterion = nn.CrossEntropyLoss()  # Cross-entropy loss for classification tasks.
optimizer_pil = optim.SGD(model_pil.parameters(), lr=0.01)  # Optimizer for the PIL version.
optimizer_no_pil = optim.SGD(model_no_pil.parameters(), lr=0.01)  # Optimizer for the non-PIL version.
```

- **`CrossEntropyLoss`**: This loss function is used for classification tasks. It combines `LogSoftmax` and `NLLLoss` (negative log-likelihood loss) in one function, which is why the network outputs raw scores without a final softmax.
- **`SGD` Optimizer**: Stochastic Gradient Descent is used for optimization. The learning rate is set to 0.01 for both models.
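
The relationship between `CrossEntropyLoss`, `LogSoftmax`, and `NLLLoss` can be checked numerically. This is a small sketch on random logits and targets, which are assumptions used only for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 10)            # Raw scores for 8 samples, 10 classes.
targets = torch.randint(0, 10, (8,))   # One class index per sample.

# Cross-entropy applied directly to the raw scores.
ce = nn.CrossEntropyLoss()(logits, targets)

# The same quantity computed in two explicit steps.
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(ce.item(), nll.item())
```

The two values should match up to floating-point rounding.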

---



In [19]:


# Use CrossEntropyLoss for classification tasks and SGD for optimization.
criterion = nn.CrossEntropyLoss()
optimizer_pil = optim.SGD(model_pil.parameters(), lr=0.01)  # Optimizer for the PIL version.
optimizer_no_pil = optim.SGD(model_no_pil.parameters(), lr=0.01)  # Optimizer for the non-PIL version.


### **6. Training and Testing Loop for PIL Version**

#### Training Loop

```python
start_time_pil = time.time()

for epoch in range(5):
    model_pil.train()  # Set the model to training mode.
    running_loss = 0.0
    
    for batch_idx, (data, target) in enumerate(train_loader_pil):
        optimizer_pil.zero_grad()  # Clear previous gradients.
        output = model_pil(data)  # Forward pass through the network.
        loss = criterion(output, target)  # Compute the loss.
        loss.backward()  # Backward pass to compute gradients.
        optimizer_pil.step()  # Update model weights.
        running_loss += loss.item()  # Accumulate the loss.
    
    print(f'PIL Version - Epoch {epoch+1}, Training Loss: {running_loss/len(train_loader_pil):.4f}')
```

- **Training Process**:
  - **Forward pass**: The data is passed through the network to make predictions.
  - **Loss calculation**: The difference between the predicted output and true labels is computed using cross-entropy loss.
  - **Backward pass**: The gradients of the loss with respect to the model parameters are computed.
  - **Optimization**: The optimizer updates the model parameters based on the gradients.
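
The four steps above can be traced on a tiny model. The sketch below assumes a single `nn.Linear` layer and random data in place of the MNIST network, and checks that plain SGD (no momentum or weight decay) updates each weight as `weight - lr * gradient`:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(4, 3)  # Tiny stand-in for the MNIST network.
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

data = torch.randn(5, 4)
target = torch.tensor([0, 2, 1, 1, 0])

optimizer.zero_grad()                   # Clear old gradients.
loss = criterion(model(data), target)   # Forward pass and loss.
loss.backward()                         # Backward pass fills .grad.

weight_before = model.weight.detach().clone()
grad = model.weight.grad.detach().clone()

optimizer.step()                        # Apply the update.

# What plain SGD should have produced.
expected = weight_before - 0.01 * grad
print(torch.allclose(model.weight.detach(), expected))
```

Calling `zero_grad()` first matters because PyTorch accumulates gradients across `backward()` calls by default.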

#### Testing Loop

```python
model_pil.eval()  # Set the model to evaluation mode (changes the behavior of layers such as dropout and batch norm).
correct_pil = 0
total_pil = 0

with torch.no_grad():  # No need to compute gradients during evaluation.
    for data, target in test_loader_pil:
        outputs = model_pil(data)  # Forward pass.
        _, predicted = torch.max(outputs.data, 1)  # Get the predicted class.
        total_pil += target.size(0)  # Increment the total number of samples.
        correct_pil += (predicted == target).sum().item()  # Count correct predictions.

accuracy_pil = 100 * correct_pil / total_pil  # Compute accuracy.
print(f'PIL Version - Test Accuracy: {accuracy_pil:.2f}%')
```

- **Evaluation**: The model is evaluated on the test set by making predictions, comparing them to the true labels, and calculating accuracy.
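
The accuracy calculation, and the difference between `model.eval()` and `torch.no_grad()`, can be illustrated on hand-written values. The logits, targets, and the small `nn.Sequential` model below are assumptions chosen for the example:

```python
import torch
import torch.nn as nn

# Four samples, three classes; the last prediction is wrong on purpose.
logits = torch.tensor([[0.1, 2.0, 0.3],
                       [1.5, 0.2, 0.1],
                       [0.0, 0.1, 3.0],
                       [2.0, 1.0, 0.5]])
targets = torch.tensor([1, 0, 2, 1])

# torch.max along dim 1 returns (values, indices); the indices are the predictions.
_, predicted = torch.max(logits, 1)
accuracy = 100 * (predicted == targets).sum().item() / targets.size(0)
print(predicted.tolist(), accuracy)

# eval() switches layer behavior (e.g. dropout off) but leaves autograd on.
model = nn.Sequential(nn.Linear(3, 3), nn.Dropout(p=0.5))
model.eval()
grad_enabled_after_eval = torch.is_grad_enabled()

# no_grad() is what actually disables gradient tracking.
with torch.no_grad():
    grad_enabled_in_no_grad = torch.is_grad_enabled()

print(grad_enabled_after_eval, grad_enabled_in_no_grad)
```

This is why the testing loop uses both calls: `eval()` for layer behavior and `no_grad()` to skip gradient bookkeeping.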

---


In [20]:

###############################################
# Training and Testing for PIL Version        #
###############################################

# Record start time for the PIL version to measure training time.
start_time_pil = time.time()

# Training loop for PIL version
for epoch in range(5):  # We train the model for 5 epochs.
    model_pil.train()  # Set the model to training mode.
    running_loss = 0.0  # Variable to track loss over the epoch.

    # Loop over batches of data in the training set.
    for batch_idx, (data, target) in enumerate(train_loader_pil):
        optimizer_pil.zero_grad()  # Zero the gradients (required before every backward pass).
        output = model_pil(data)   # Forward pass: get predictions from the model.
        loss = criterion(output, target)  # Calculate the loss (how far predictions are from true labels).
        loss.backward()  # Backward pass: compute gradients.
        optimizer_pil.step()  # Update model weights based on gradients.
        running_loss += loss.item()  # Accumulate the loss.

    # Print the average loss for the epoch.
    print(f'PIL Version - Epoch {epoch+1}, Training Loss: {running_loss/len(train_loader_pil):.4f}')

# Testing the model for PIL version
model_pil.eval()  # Set the model to evaluation mode (affects layers such as dropout and batch norm).
correct_pil = 0   # To count how many predictions were correct.
total_pil = 0     # To count the total number of examples.

# Loop through the test dataset.
with torch.no_grad():  # No need to compute gradients during evaluation.
    for data, target in test_loader_pil:
        outputs = model_pil(data)  # Forward pass: get predictions.
        _, predicted = torch.max(outputs.data, 1)  # Get the index of the highest score as the prediction.
        total_pil += target.size(0)  # Increment the total number of examples.
        correct_pil += (predicted == target).sum().item()  # Count correct predictions.

# Record end time for the PIL version (taken after the test loop, so the interval covers training and evaluation).
end_time_pil = time.time()
training_time_pil = end_time_pil - start_time_pil  # Elapsed time for training plus evaluation.
accuracy_pil = 100 * correct_pil / total_pil  # Calculate accuracy as a percentage.

print(f'PIL Version - Test Accuracy: {accuracy_pil:.2f}%')
print(f'PIL Version - Training Time: {training_time_pil:.2f} seconds')


PIL Version - Epoch 1, Training Loss: 0.8008
PIL Version - Epoch 2, Training Loss: 0.3135
PIL Version - Epoch 3, Training Loss: 0.2565
PIL Version - Epoch 4, Training Loss: 0.2187
PIL Version - Epoch 5, Training Loss: 0.1904
PIL Version - Test Accuracy: 94.78%
PIL Version - Training Time: 117.36 seconds


In [21]:

###############################################
# Training and Testing for No PIL Version     #
###############################################

# Record start time for the non-PIL version.
start_time_no_pil = time.time()

# Training loop for no PIL version (same as PIL version, but using the non-PIL data loader).
for epoch in range(5):
    model_no_pil.train()
    running_loss = 0.0

    # Loop over batches of data in the training set.
    for batch_idx, (data, target) in enumerate(train_loader_no_pil):
        optimizer_no_pil.zero_grad()  # Zero the gradients.
        output = model_no_pil(data)   # Forward pass.
        loss = criterion(output, target)  # Calculate the loss.
        loss.backward()  # Backward pass: compute gradients.
        optimizer_no_pil.step()  # Update model weights.
        running_loss += loss.item()  # Accumulate the loss.

    # Print the average loss for the epoch.
    print(f'No PIL Version - Epoch {epoch+1}, Training Loss: {running_loss/len(train_loader_no_pil):.4f}')

# Testing the model for no PIL version
model_no_pil.eval()  # Set the model to evaluation mode.
correct_no_pil = 0   # To count how many predictions were correct.
total_no_pil = 0     # To count the total number of examples.

# Loop through the test dataset.
with torch.no_grad():  # No need to compute gradients during evaluation.
    for data, target in test_loader_no_pil:
        outputs = model_no_pil(data)  # Forward pass.
        _, predicted = torch.max(outputs.data, 1)  # Get the index of the highest score as the prediction.
        total_no_pil += target.size(0)  # Increment the total number of examples.
        correct_no_pil += (predicted == target).sum().item()  # Count correct predictions.

# Record end time for the non-PIL version (taken after the test loop, so the interval covers training and evaluation).
end_time_no_pil = time.time()
training_time_no_pil = end_time_no_pil - start_time_no_pil  # Elapsed time for training plus evaluation.
accuracy_no_pil = 100 * correct_no_pil / total_no_pil  # Calculate accuracy as a percentage.

print(f'No PIL Version - Test Accuracy: {accuracy_no_pil:.2f}%')
print(f'No PIL Version - Training Time: {training_time_no_pil:.2f} seconds')




No PIL Version - Epoch 1, Training Loss: 0.8548
No PIL Version - Epoch 2, Training Loss: 0.3192
No PIL Version - Epoch 3, Training Loss: 0.2609
No PIL Version - Epoch 4, Training Loss: 0.2233
No PIL Version - Epoch 5, Training Loss: 0.1952
No PIL Version - Test Accuracy: 94.50%
No PIL Version - Training Time: 68.90 seconds


### **7. Timing and Comparison**

```python
# Timing and accuracy are tracked for both versions.
end_time_pil = time.time()
training_time_pil = end_time_pil - start_time_pil

print(f'PIL Version - Training Time: {training_time_pil:.2f} seconds')

# Repeat the same for the non-PIL version.
```

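- **Timing**: The `time.time()` function is used to measure how long each version takes. In the notebook cells the end time is recorded after the test loop, so the reported "training time" covers both training and evaluation. Because both versions are measured the same way, the two times can still be compared directly.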

- **Results Comparison**:
  - Training times and accuracies for both the `PIL` and non-`PIL` versions are printed side by side to compare the performance of each approach.
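
Since the two versions differ only in how data is loaded, it can help to time a pass over the loader on its own, without any training. The sketch below assumes a synthetic `TensorDataset`; the helper `time_one_pass` is hypothetical and uses `time.perf_counter()`, which is generally better suited to measuring intervals than `time.time()`:

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def time_one_pass(loader):
    """Iterate over a loader once without training; return (seconds, samples seen)."""
    start = time.perf_counter()
    seen = 0
    for data, _ in loader:
        seen += data.size(0)
    return time.perf_counter() - start, seen


# Synthetic stand-in for an image dataset: 256 fake 1x28x28 images.
torch.manual_seed(0)
fake = TensorDataset(torch.rand(256, 1, 28, 28), torch.randint(0, 10, (256,)))
loader = DataLoader(fake, batch_size=64, shuffle=False)

elapsed, seen = time_one_pass(loader)
print(f"Data loading only: {elapsed:.4f} s for {seen} samples")
```

Running the same helper on `train_loader_pil` and `train_loader_no_pil` would isolate the data-loading cost from the cost of the forward and backward passes.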

---



In [22]:
###############################################
# Results Comparison                          #
###############################################

# Print a side-by-side comparison of the training times and test accuracies.
print("\n================= Comparison =================")
print(f"Training Time (PIL): {training_time_pil:.2f} seconds")
print(f"Training Time (No PIL): {training_time_no_pil:.2f} seconds")
print(f"Test Accuracy (PIL): {accuracy_pil:.2f}%")
print(f"Test Accuracy (No PIL): {accuracy_no_pil:.2f}%")


Training Time (PIL): 117.36 seconds
Training Time (No PIL): 68.90 seconds
Test Accuracy (PIL): 94.78%
Test Accuracy (No PIL): 94.50%


## **Using PIL vs. Not Using PIL**

| **Aspect**                  | **Using PIL**                                      | **Without PIL**                                     |
|-----------------------------|----------------------------------------------------|----------------------------------------------------|
| **Image Handling**           | Converts image tensors back to **PIL** images in a custom wrapper. This allows for custom image processing using `PIL` (Python Imaging Library). | Uses the samples as `torchvision.datasets.MNIST` provides them and converts each one to a tensor once. This avoids the overhead of the extra format round trip. |
| **Custom Dataset Class**     | Requires a custom `MNISTDataset` class to load images using `PIL` and apply transformations manually. | Does not require a custom dataset class. The `torchvision.datasets.MNIST` dataset is used directly with its `transform` argument. |
| **Transformations**          | The images are first converted back to **PIL** images and then transformed back to tensors using `transforms.ToTensor()`. This allows for more flexibility with custom image handling if needed. | Transformations (e.g., `ToTensor()` and normalization) are applied once through the dataset's `transform` argument using `torchvision.transforms`. No hand-written PIL conversion is needed. |
| **Efficiency**               | **Less efficient**: Converting images from tensors to **PIL** and then back to tensors introduces overhead, making this approach slower, especially for large datasets. | **More efficient**: Skipping the round trip avoids unnecessary conversions, making it faster (68.90 s versus 117.36 s in the run above) and better suited to larger datasets. |
| **Flexibility**              | **More flexible**: If custom image processing beyond what `torchvision.transforms` offers is required (e.g., specific `PIL` filters or drawing operations), the wrapper gives a place to call `PIL` directly. | **Less flexible**: `torchvision.transforms` covers common needs such as resizing, cropping, and standard augmentations, but it might not cover advanced or specific custom operations that **PIL** can handle. |
| **Code Complexity**          | **Higher complexity**: Requires a custom dataset class to manage PIL conversions and manual handling of transformations. This adds extra code and complexity. | **Lower complexity**: Simply using `torchvision.datasets.MNIST` directly with transformations reduces code complexity, making it easier to implement and maintain. |
| **Use Case Suitability**     | Suitable if you need **custom image preprocessing** or manipulation (e.g., resizing, filtering, augmentation) before converting to tensors. Common in projects requiring advanced preprocessing beyond normalization or conversion. | Suitable for most standard datasets where the focus is on efficient loading and training. Common in projects where you need fast, **out-of-the-box dataset handling**, especially for widely used datasets like MNIST. |
| **Training Time**            | Takes longer due to the additional conversion steps between tensor and PIL images. This extra step increases the overall training time, which is especially noticeable with large datasets or many epochs. | Faster since each image is converted to a tensor only once. Avoiding the extra PIL round trip reduces unnecessary overhead, improving training time. |
| **Code Maintenance**         | More difficult to maintain, especially if adding or modifying the transformations requires working through a custom dataset class. | Easier to maintain since you rely on PyTorch's well-documented and widely-used data handling functionality. |
| **Memory Overhead**          | Somewhat higher transient memory usage since each sample is briefly held in several formats while it is converted. | Lower overhead since each sample is converted once and then stays in tensor format, which is native to PyTorch. |
| **Transform Customizability** | Provides full control over how images are loaded, processed, and transformed. You can create custom pipelines involving PIL methods before converting to a tensor. | Less customizable but still allows common transformations like normalization, resizing, and data augmentation with `torchvision.transforms`. Custom transformations can still be added but in a more restricted environment. |

---

### **Key Points:**

1. **Performance**:
   - The **without PIL** approach is faster and more efficient because it skips the unnecessary tensor-to-PIL-to-tensor round trip. In the run above it finished in 68.90 seconds versus 117.36 seconds for the PIL version, with comparable test accuracy (94.50% versus 94.78%). The saving matters more as datasets grow.

2. **Flexibility**:
   - The **using PIL** approach offers more flexibility for custom image manipulation. For instance, if you need to perform advanced image preprocessing, like applying filters, specific augmentations, or detailed custom transformations, the PIL approach gives you more control.
   - However, **without PIL** is still capable of common transformations like resizing, normalization, and flipping, but it's more constrained to the functionalities provided by `torchvision.transforms`.

3. **Complexity**:
   - **Using PIL** adds complexity because it requires creating a custom dataset class and manually handling image conversions. This additional code increases the risk of bugs and makes the code more difficult to maintain.
   - **Without PIL** is simpler and easier to manage since you're using PyTorch's built-in functions for handling datasets and transformations.

4. **Use Cases**:
   - **Using PIL** is more appropriate when working with custom datasets where you might need non-standard image preprocessing steps.
   - **Without PIL** is ideal for standard tasks like MNIST classification, where the dataset is already structured and doesn't require complex image manipulations. This approach is faster and easier to implement.

---

### **Which Approach Should You Use?**

- **Use `PIL`** when:
  - You need **advanced image preprocessing**.
  - You're working with **custom datasets** that require custom image handling.
  - You want **fine-grained control** over how images are loaded and processed.

- **Skip `PIL` (Use tensors directly)** when:
  - You're working with **standard datasets** like MNIST, CIFAR, etc.
  - You prioritize **efficiency** and **simplicity**.
  - You want to reduce **code complexity** and **training time**.

In conclusion, for most typical scenarios like MNIST classification, **not using PIL** is the better choice due to its efficiency, simplicity, and ease of use. However, **using PIL** offers more control when you need custom processing for complex datasets.