# Section 02 — Data Augmentation and Data Loaders

This section defines the preprocessing and augmentation pipelines applied to the CT scan images and creates the data loaders built on them.

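Training images pass through the transforms below, in this order. The three random transforms are the augmentation, intended to improve generalization; the others are deterministic preprocessing:
- Grayscale conversion to 1 channel
- Resizing to 224×224
- Random horizontal flip (p = 0.5)
- Random rotation (up to ±10°)
- Brightness and contrast jitter (±10% each)
- Conversion to tensor and normalization (mean 0.5, std 0.5)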

Validation and test images use only resizing, grayscale conversion, and normalization.
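
The normalization step can be checked by hand. As a minimal sketch, assuming pixel values have already been scaled to [0, 1] (which is what `ToTensor` does for 8-bit images), normalizing with mean 0.5 and std 0.5 maps that range onto [-1, 1]. The helper `normalize_pixel` is hypothetical and only mirrors the per-channel arithmetic `(x - mean) / std`:

```python
def normalize_pixel(x, mean=0.5, std=0.5):
    """Mirror the per-channel arithmetic of Normalize: (x - mean) / std."""
    return (x - mean) / std

# 8-bit intensities (black, mid-gray, white) scaled to [0, 1], as ToTensor would produce
scaled = [0 / 255, 128 / 255, 255 / 255]
normalized = [normalize_pixel(v) for v in scaled]
print(normalized)  # black maps to -1.0, white maps to 1.0, mid-gray lands near 0
```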

Data loaders are created for:
- train_loader
- val_loader
- test_loader

These loaders are reused in all subsequent notebooks.

In [8]:
# import necessary libraries
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

IMAGE_SIZE = 224
BATCH_SIZE = 32

In [None]:
# Define data augmentation and normalization for training set
train_transforms = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.1, contrast=0.1),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5])
])

# Define deterministic preprocessing (no augmentation) for validation and test sets
test_val_transforms = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5])
])

In [10]:
# Load datasets
train_dataset = datasets.ImageFolder("data/train",      transform=train_transforms)
val_dataset   = datasets.ImageFolder("data/validation", transform=test_val_transforms)
test_dataset  = datasets.ImageFolder("data/test",       transform=test_val_transforms)

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader   = DataLoader(val_dataset,   batch_size=BATCH_SIZE, shuffle=False)
test_loader  = DataLoader(test_dataset,  batch_size=BATCH_SIZE, shuffle=False)
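
To illustrate how batches come out of a loader, here is a minimal sketch that uses random tensors instead of the image folders, so it runs without the data on disk. `fake_dataset` and `fake_loader` are hypothetical names, and the 70-sample size is arbitrary. With `drop_last` left at its default of `False`, the last batch is smaller than `BATCH_SIZE` whenever the dataset size is not a multiple of it.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

IMAGE_SIZE = 224
BATCH_SIZE = 32

# Hypothetical stand-in for the real data: 70 random single-channel images
images = torch.rand(70, 1, IMAGE_SIZE, IMAGE_SIZE)
labels = torch.randint(0, 4, (70,))  # four classes, as in the dataset
fake_dataset = TensorDataset(images, labels)
fake_loader = DataLoader(fake_dataset, batch_size=BATCH_SIZE, shuffle=False)

# Collect the shape of every image batch the loader yields
batch_shapes = [tuple(x.shape) for x, _ in fake_loader]
print(batch_shapes)  # two full batches of 32, then a final batch of 6
```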

In [11]:
# Print dataset statistics
print("Classes found:", train_dataset.classes)
print("Training samples:", len(train_dataset))
print("Validation samples:", len(val_dataset))
print("Testing samples:", len(test_dataset))

Classes found: ['COVID', 'Lung_Opacity', 'Normal', 'Viral Pneumonia']
Training samples: 14814
Validation samples: 3175
Testing samples: 3176
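
The counts above can be turned into split fractions and batch counts. The following sketch hard-codes the printed sample counts and class names; the alphabetical index assignment mirrors how `ImageFolder` maps sorted class folder names to integer labels. Variable names are illustrative.

```python
import math

BATCH_SIZE = 32
counts = {"train": 14814, "validation": 3175, "test": 3176}
total = sum(counts.values())

for split, n in counts.items():
    fraction = n / total
    batches = math.ceil(n / BATCH_SIZE)  # drop_last defaults to False
    print(f"{split}: {n} samples, {fraction:.1%} of {total}, {batches} batches per epoch")

# Class folders are sorted by name and numbered from 0
classes = ["COVID", "Lung_Opacity", "Normal", "Viral Pneumonia"]
class_to_idx = {name: idx for idx, name in enumerate(sorted(classes))}
print(class_to_idx)
```

The split works out to roughly 70/15/15 across 21165 images, with 463 training batches and 100 batches each for validation and testing at a batch size of 32.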
