# CNN Example – Histopathology Image Classification (PathMNIST)

This notebook walks through building and optimising a Convolutional Neural Network (CNN) with **PyTorch** to classify pathology images in the **PathMNIST** subset of the *MedMNIST* collection.

**Learning goals**
1. Download and prepare an image dataset with `medmnist`
2. Build a simple CNN using `torch.nn`
3. Use GPU acceleration when available
4. Track training/validation loss & accuracy
5. Evaluate with a classification report and confusion matrix
6. Experiment with data augmentation and deeper architectures

In [None]:
# Install dependencies (uncomment as needed)
# !pip install -q torch torchvision medmnist scikit-learn matplotlib

In [None]:
import torch
import torchvision
from torch import nn, optim
from torchvision import transforms
from torch.utils.data import DataLoader
import medmnist
from medmnist import INFO, Evaluator
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import numpy as np

### Load PathMNIST (9 classes)

### PathMNIST Dataset (MedMNIST v2)

- **Samples:** 107,180 hematoxylin–eosin (H&E) stained image patches (28 × 28 × 3) from colorectal tissue slides.  
- **Task type:** *Multi‑class classification*  
- **Target:** one integer class label per patch (histopathology tissue category)  
- **Number of classes:** **9**

| Class ID | Tissue category (NCT-CRC-HE-100K reference)            |
|----------|--------------------------------------------------------|
| `0`      | Adipose                                                |
| `1`      | Background                                             |
| `2`      | Debris                                                 |
| `3`      | Lymphocytes                                            |
| `4`      | Mucus                                                  |
| `5`      | Smooth muscle                                          |
| `6`      | Normal colon mucosa                                    |
| `7`      | Cancer‑associated stroma                               |
| `8`      | Colorectal adenocarcinoma epithelium                   |

Each patch is an RGB image, downsampled from the original 224 × 224 pixels to 28 × 28 pixels.  
The objective is to correctly assign each patch to one of the nine colorectal tissue types.
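
Before training, it can help to check how patches are distributed across the nine classes. The helper below is a minimal sketch: the name `class_counts` is hypothetical, the demo data is synthetic, and it assumes that labels arrive as a NumPy array of shape `(N, 1)` and that class names are stored in a dict keyed by string IDs, as `INFO['pathmnist']['label']` appears to be.

```python
import numpy as np

def class_counts(labels, label_names):
    """Return {class name: count} for an integer label array of shape (N,) or (N, 1)."""
    flat = np.asarray(labels).reshape(-1)
    counts = np.bincount(flat, minlength=len(label_names))
    return {label_names[str(i)]: int(c) for i, c in enumerate(counts)}

# Synthetic stand-ins for the dataset labels and the class-name dict (assumed layout).
demo_labels = np.array([[0], [2], [2], [1]])
demo_names = {"0": "adipose", "1": "background", "2": "debris"}
demo_counts = class_counts(demo_labels, demo_names)
print(demo_counts)
```

On the real data this would presumably be called as `class_counts(train_dataset.labels, info['label'])` once the dataset has been loaded.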


In [None]:
data_flag = 'pathmnist'
download = True
info = INFO[data_flag]
DataClass = getattr(medmnist, info['python_class'])

train_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[.5, .5, .5], std=[.5, .5, .5])  # one value per RGB channel
])

train_dataset = DataClass(split='train', transform=train_transforms, download=download)
val_dataset   = DataClass(split='val',   transform=train_transforms, download=download)
test_dataset  = DataClass(split='test',  transform=train_transforms, download=download)

train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
val_loader   = DataLoader(val_dataset,   batch_size=128, shuffle=False)
test_loader  = DataLoader(test_dataset,  batch_size=128, shuffle=False)

n_classes = len(info['label'])
print('Classes:', n_classes)
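
One detail worth checking before writing the training loop is the label shape. The sketch below uses a synthetic batch and assumes that the loaders yield labels of shape `(B, 1)`, whereas `nn.CrossEntropyLoss` expects class indices of shape `(B,)` with dtype `int64`. The random logits are only a stand-in for model output.

```python
import torch
from torch import nn

# Synthetic labels mimicking the assumed MedMNIST layout: shape (B, 1).
labels = torch.tensor([[0], [3], [8], [1]])

logits = torch.randn(4, 9)            # stand-in for model output over 9 classes
targets = labels.squeeze(1).long()    # (B, 1) -> (B,), int64 class indices
loss = nn.CrossEntropyLoss()(logits, targets)
print(targets.shape, loss.item())
```

Passing the unsqueezed `(B, 1)` labels straight to the loss would typically raise a shape error, so the squeeze is usually done once per batch inside the training and evaluation functions.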

### Define CNN model
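
A possible starting point for this section is sketched below. It is not a prescribed architecture: the class name `SimpleCNN`, the layer widths, the dropout rate, and the choice of Adam with a learning rate of `1e-3` are all assumptions made for illustration. The model takes 3 × 28 × 28 inputs and returns one raw logit per class.

```python
import torch
from torch import nn

class SimpleCNN(nn.Module):
    """Two conv blocks followed by a small classifier head (illustrative only)."""

    def __init__(self, in_channels=3, n_classes=9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),  # 28x28 -> 28x28
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                       # 28x28 -> 14x14
            nn.Conv2d(32, 64, kernel_size=3, padding=1),           # 14x14 -> 14x14
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                       # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128),
            nn.ReLU(inplace=True),
            nn.Dropout(0.3),
            nn.Linear(128, n_classes),  # raw logits; CrossEntropyLoss applies log-softmax
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SimpleCNN(n_classes=9).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```

In the notebook, `n_classes=9` can be replaced with the `n_classes` variable computed when the data is loaded.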

### Training loop

In [None]:
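def train_epoch(loader):
    """Run one training pass over `loader`; return (mean loss, accuracy)."""
    raise NotImplementedError("TODO: implement the training pass")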

In [None]:
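def eval_epoch(loader):
    """Evaluate on `loader` without updating weights; return (mean loss, accuracy)."""
    raise NotImplementedError("TODO: implement the evaluation pass")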
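
One way the two functions above could be filled in is sketched here. The factory `make_epoch_fns` is a hypothetical helper, not part of PyTorch or MedMNIST. It binds a model, loss, optimizer, and device, and returns a pair of functions that each take only a loader, which matches how the epoch loop calls them. It assumes a mean-reduced loss and labels of shape `(B, 1)`.

```python
import torch

def make_epoch_fns(model, criterion, optimizer, device):
    """Return (train_epoch, eval_epoch); each maps a loader to (mean loss, accuracy)."""

    def _run(loader, train):
        model.train(train)  # train=True enables dropout etc.; False switches to eval mode
        total_loss, correct, seen = 0.0, 0, 0
        with torch.set_grad_enabled(train):
            for x, y in loader:
                x = x.to(device)
                y = y.to(device).reshape(-1).long()  # assumed (B, 1) labels -> (B,)
                logits = model(x)
                loss = criterion(logits, y)
                if train:
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
                # Weight the (assumed mean-reduced) batch loss by the batch size.
                total_loss += loss.item() * y.size(0)
                correct += (logits.argmax(dim=1) == y).sum().item()
                seen += y.size(0)
        return total_loss / seen, correct / seen

    def train_epoch(loader):
        return _run(loader, train=True)

    def eval_epoch(loader):
        return _run(loader, train=False)

    return train_epoch, eval_epoch
```

With a model, loss, optimizer, and device already defined, `train_epoch, eval_epoch = make_epoch_fns(model, criterion, optimizer, device)` would rebind the two names so the epoch loop can run unchanged.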

In [None]:
num_epochs = 10
train_losses, val_losses = [], []
train_accs, val_accs = [], []
for epoch in range(num_epochs):
    tr_loss, tr_acc = train_epoch(train_loader)
    v_loss, v_acc  = eval_epoch(val_loader)
    train_losses.append(tr_loss)
    val_losses.append(v_loss)
    train_accs.append(tr_acc)
    val_accs.append(v_acc)
    print(f'Epoch {epoch+1}/{num_epochs}: train acc={tr_acc:.3f} val acc={v_acc:.3f}')

### Plot learning curves
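
A minimal plotting sketch for this section follows. The function name `plot_learning_curves` and the figure layout are assumptions. It expects four equal-length lists of per-epoch values, such as the `train_losses`, `val_losses`, `train_accs`, and `val_accs` lists filled by the epoch loop.

```python
import matplotlib.pyplot as plt

def plot_learning_curves(train_losses, val_losses, train_accs, val_accs):
    """Plot loss and accuracy per epoch side by side; return the Figure."""
    epochs = range(1, len(train_losses) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

    ax_loss.plot(epochs, train_losses, label="train")
    ax_loss.plot(epochs, val_losses, label="val")
    ax_loss.set(xlabel="epoch", ylabel="loss", title="Loss")
    ax_loss.legend()

    ax_acc.plot(epochs, train_accs, label="train")
    ax_acc.plot(epochs, val_accs, label="val")
    ax_acc.set(xlabel="epoch", ylabel="accuracy", title="Accuracy")
    ax_acc.legend()

    fig.tight_layout()
    return fig
```

A widening gap between the training and validation curves is the usual sign of overfitting, which is one motivation for the augmentation ideas listed at the end of the notebook.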

### Test set evaluation

In [None]:
# Evaluate on the test set
# Print the classification report, confusion matrix, AUC, and ROC curve
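
The evaluation step could look roughly like the sketch below. The helpers `collect_predictions` and `summarize` are hypothetical names. The code assumes labels of shape `(B, 1)` and reports a macro-averaged one-vs-rest AUC via scikit-learn, which is one reasonable choice among several and requires every class to appear in the evaluated labels.

```python
import numpy as np
import torch
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

@torch.no_grad()
def collect_predictions(model, loader, device):
    """Return (y_true, y_prob): integer labels (N,) and softmax probabilities (N, C)."""
    model.eval()
    labels, probs = [], []
    for x, y in loader:
        logits = model(x.to(device))
        # float64 softmax keeps each row summing to 1 within scikit-learn's tolerance
        probs.append(torch.softmax(logits.double(), dim=1).cpu().numpy())
        labels.append(y.reshape(-1).cpu().numpy())  # assumed (B, 1) labels -> (B,)
    return np.concatenate(labels), np.concatenate(probs)

def summarize(y_true, y_prob):
    """Print a classification report; return (confusion matrix, macro one-vs-rest AUC)."""
    y_pred = y_prob.argmax(axis=1)
    print(classification_report(y_true, y_pred, digits=3, zero_division=0))
    cm = confusion_matrix(y_true, y_pred, labels=np.arange(y_prob.shape[1]))
    auc = roc_auc_score(y_true, y_prob, multi_class="ovr")  # needs every class in y_true
    return cm, auc
```

A typical call would be `y_true, y_prob = collect_predictions(model, test_loader, device)` followed by `cm, auc = summarize(y_true, y_prob)`, assuming `model` and `device` are defined earlier in the notebook.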

### Save the model
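
A common pattern is to save only the model parameters (the `state_dict`) and reload them into a freshly constructed model of the same architecture. The sketch below shows a round trip with a tiny stand-in model. The helper names and the filename `pathmnist_cnn.pt` are hypothetical.

```python
import os
import tempfile

import torch
from torch import nn

def save_weights(model, path):
    """Save only the parameters (state_dict) rather than the whole pickled module."""
    torch.save(model.state_dict(), path)

def load_weights(model, path, device="cpu"):
    """Load parameters into an already-constructed model of the same architecture."""
    state = torch.load(path, map_location=device)
    model.load_state_dict(state)
    return model.eval()  # eval mode for inference

# Round trip with a tiny stand-in model; swap in the trained CNN and a real path.
demo = nn.Linear(4, 2)
with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "pathmnist_cnn.pt")  # hypothetical filename
    save_weights(demo, ckpt)
    restored = load_weights(nn.Linear(4, 2), ckpt)

same = torch.equal(demo.weight, restored.weight)
print(same)
```

Saving the `state_dict` keeps the checkpoint independent of the notebook's class definitions on disk, at the cost of having to rebuild the model object before loading.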

> **Next steps**: Try data augmentation (`RandomHorizontalFlip`, `RandomRotation`), deeper models like ResNet‑18 via `torchvision.models`, focal loss for class imbalance, or mixed‑precision training.