# CSE 398 Deep Learning Final Project
## Detecting Diabetic Retinopathy in Enhanced Retinal Fundus Images
### DeepNet Architecture Tests with Low-Quality and Enhanced Low-Quality Retinal Fundus Images

James Hoffmeister

### tdqm

We install TDQM to have progress bars while training. This helps understand how fast the models may train and validate data.

In [1]:
!pip install tqdm

Defaulting to user installation because normal site-packages is not writeable

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


## Imports and Transforms

We define transforms thattranslate the original 512x512 image format into 224x224 imagenet standard for the DenseNet model. We also add some preprocessing to reduce overfitting.

In [2]:
from torchvision import datasets, transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
from torch.utils.data import random_split
import os
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from torch.utils.data import Subset
from sklearn.utils.class_weight import compute_class_weight
from torch.utils.data import WeightedRandomSampler
import torch
from collections import Counter
from tqdm import tqdm

if torch.cuda.is_available:
  device = torch.device("cuda")
  print('Using CUDA')
else:
  device = torch.device("cpu")
  print('Could not use CUDA')

# image transform to DenseNet input architecture
transform_train = transforms.Compose([
    transforms.RandomResizedCrop(512, scale=(0.8, 1.0)),  # Random crop but not too crazy
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),  # Mild rotation
    transforms.ColorJitter(
        brightness=0.2,
        contrast=0.2,
        saturation=0.2,
        hue=0.02
    ),
    transforms.RandomApply([
        transforms.GaussianBlur(kernel_size=(5, 9), sigma=(0.1, 5))
    ], p=0.3),  # Some random blurring
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],  # Standard ImageNet normalization
        std=[0.229, 0.224, 0.225]
    ),
])

transform_val = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5]*3, std=[0.5]*3)
])

print("Transformers defined")


  warn(f"Failed to load image Python extension: {e}")


Using CUDA
Transformers defined


## Data Loading

We load the data from the directories using a filtered image folder object that detects only images that may be used for training and testing. We stratify the data before splitting it into training and validation subsets randomly. 

I am more used to Numpy than Torch so sometimes for tensor operations I default to Numpy.

In [3]:
# this was only necessary because magic had some invisible file
# in my folder that I had to tell the code to ignore ¯\_(ツ)_/¯
class FilteredImageFolder(ImageFolder):
    def find_classes(self, directory):
        # ignore the .ipynb file hidden in my train folder
        classes = [d for d in os.listdir(directory) if os.path.isdir(os.path.join(directory, d)) and not d.startswith('.')]
        classes.sort()
        class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)}
        return classes, class_to_idx

# directory with data
data_dir = "/home/jupyter-jah823/train"

# using imagefolder to extract data
dataset = FilteredImageFolder(root=data_dir, transform=None)

# stratified split
targets = dataset.targets
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_id, val_id = next(splitter.split(np.zeros(len(targets)), targets))

# subsets
train_dataset = Subset(dataset, train_id)
val_dataset = Subset(dataset, val_id)

# add transformations
train_dataset.dataset.transform = transform_train
val_dataset.dataset.transform = transform_val

print("Subsets defined")

Subsets defined


## Weighted Sampling

We weight the sampler because the class distribution is heavily skewed to the class 0. We use power-log scaling.

In [4]:
# weighted sampler to deal with class imbalance

train_targets = torch.tensor([dataset.targets[i] for i in train_dataset.indices], device=device)
num_classes = train_targets.max().item() + 1
class_counts = torch.bincount(train_targets, minlength=num_classes)
class_weights_per_class = 1. / (class_counts.float() ** 0.5)
class_weights_per_class[0] *= 0.15
class_weights_per_class[1] *= 3
class_weights_per_class[2] *= 2
class_weights_per_class[3] *= 3
class_weights_per_class[4] *= 1
samples_weight = class_weights_per_class[train_targets]
sampler = WeightedRandomSampler(weights=samples_weight, num_samples=len(samples_weight), replacement=True)

# load data
train_loader = DataLoader(train_dataset, batch_size=64, sampler=sampler, num_workers=4)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False, num_workers=4)

import collections

# Set model to eval mode if necessary
sampled_labels = []

# Iterate over ONE epoch (you don't need to train, just inspect batches)
c = tqdm(train_loader)
for inputs, labels in c:
    sampled_labels.extend(labels.cpu().numpy())  # collect all sampled labels

# Now count occurrences
counter = collections.Counter(sampled_labels)

# Display percentages
total = sum(counter.values())
for label, count in counter.items():
    print(f"Class {label}: {count} samples ({100.0 * count / total:.2f}%)")

print("Data loaders prepared!")

100%|██████████| 157/157 [00:39<00:00,  3.94it/s]

Class 1: 3489 samples (34.77%)
Class 2: 3339 samples (33.28%)
Class 3: 2084 samples (20.77%)
Class 0: 559 samples (5.57%)
Class 4: 563 samples (5.61%)
Data loaders prepared!





## Model Definition

We choose DenseNet121 for this test. This is considerably larger than the other models being trained for this experiment. It is also similar to one of the models used in the original paper that developed the EyeQ dataset. We freeze all but a few of the layers (denseblock4, norm5, classifier). The classifier has dropout to ensure there is no overfitting. 

In [5]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models

# load densenet121 pretrained model
def make_model():
    
    model = models.densenet121(weights='IMAGENET1K_V1')
    num_ftrs = model.classifier.in_features
    # added dropout to fix overfitting
    model.classifier = nn.Sequential(
        nn.Dropout(0.5),
        nn.Linear(model.classifier.in_features, 5)  # 0-4
    )

    for name, param in model.named_parameters():
        if 'denseblock3' in name or 'denseblock4' in name or 'norm5' in name or 'classifier' in name:
            param.requires_grad = True
        else:
            param.requires_grad = False

    trainable_params = filter(lambda p: p.requires_grad, model.parameters())

    model = model.to(device)
    
    return model, trainable_params

## Loss, Optimizer, and Scheduler

We use focal loss and the Adam optimizer. The Focal Loss attempts to ensure that underrepresented classes are weighted higher for loss, ensuring a more even distribution of accuracies across classes. We use Cosine-Annealing Learning Rate scheduler to smoothly decrease learning rate over time.

In [6]:
import torch
import torch.nn as nn
import torch.nn.functional as F

criterions = [nn.CrossEntropyLoss(label_smoothing=0.1)]

# using an ADAM optimizer, mentioned in lecture
# also regularizing with weight decay since initial tests showed heavy overfitting

print("Lossoperational")

Lossoperational


In [7]:
def mixup_data(x, y, alpha=0.4):
    '''Compute the mixup data. Return mixed inputs, pairs of targets, and lambda'''
    if alpha > 0:
        lam = np.random.beta(alpha, alpha)
    else:
        lam = 1

    batch_size = x.size(0)
    index = torch.randperm(batch_size).to(x.device)

    mixed_x = lam * x + (1 - lam) * x[index, :]
    y_a, y_b = y, y[index]
    return mixed_x, y_a, y_b, lam

def mixup_criterion(criterion, pred, y_a, y_b, lam):
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)

## Model Training Loop

The training loop iterates through the epochs and prints training set accuracy, validation set accuracy, and validation accuracy stratified by class. This model has difficulties with memorizing the training data due to its size, so it overfits heavily. Additionally, because of the low number of samples in classes 1 and 3, there is a marked decrease in the accuracy of these classes. Potential fixes could include increased image transformations and oversampling of underrepresented classes. Additionally, the decreasing accuracy for the underrepresented classes implies that either the model is failing to learn the features of these classes due to the lack of training examples. This is the highest performing model I could achieve with DenseNet. Note that overall validation set accuracy is heavily dependent on the class 0 accuracy since the validation sets are representative of the data, which contains many more class 0 images than other classes combined. 

In [None]:
from sklearn.metrics import confusion_matrix
import numpy as np
import matplotlib.pyplot as plt # inspied by hw2

# training loop
def train(model, train_loader, val_loader, optimizer, scheduler, criterion, verbose, epochs):
    
    # for plotting, thanks hw2
    train_losses = []
    val_accuracies = []
    train_accuracies = []
    
    # for number of epochs
    for epoch in range(epochs):
        # switch model to train mode
        model.train()
        # vars for recording
        total_loss = 0.0
        correct = 0
        total = 0

        # loading bar for easy timing
        batch_iterator = tqdm(train_loader, desc=f"Epoch {epoch+1}/{epochs}", leave=False) if verbose else train_loader

        for inputs, targets in batch_iterator:
            inputs, targets = inputs.to(device), targets.to(device)

            inputs, targets_a, targets_b, lam = mixup_data(inputs, targets, alpha=0.4)

            outputs = model(inputs)

            loss = mixup_criterion(criterion, outputs, targets_a, targets_b, lam)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            total_loss += loss.item()

            _, predicted = torch.max(outputs.data, 1)
            correct += (lam * (predicted == targets_a).sum().item() + (1 - lam) * (predicted == targets_b).sum().item())
            total += outputs.size(0)

            if verbose:
                batch_iterator.set_postfix(loss=loss.item())


        # calculate accuracy and print
        train_acc = correct / total
        train_losses.append(total_loss)
        train_accuracies.append(train_acc)
        print(f"\nEpoch [{epoch+1}/{epochs}] - Train Loss: {total_loss:.4f}, Train Acc: {train_acc:.4f}")

        # Validation loop
        model.eval()
        val_loss = 0.0
        val_correct = 0
        val_total = 0
        
        val_iterator = tqdm(val_loader, desc=f"Validating {epoch+1}/{epochs}", leave=False) if verbose else val_loader
        
        all_preds = []
        all_labels = []
        
        correct_per_class = torch.zeros(5)
        total_per_class = torch.zeros(5)
        
        with torch.no_grad():
            # for each image in validation set
            for images, labels in val_iterator:
                images = images.to(device)
                labels = labels.to(device)

                outputs = model(images)
                loss = criterion(outputs, labels)
                val_loss += loss.item()
                
                _, predicted = torch.max(outputs.data, 1)
                val_total += labels.size(0)
                val_correct += (predicted == labels).sum().item()
                
                # classwise accuracy
                for label, pred in zip(labels, predicted):
                    total_per_class[label] += 1
                    if label == pred:
                        correct_per_class[label] += 1
                
                if verbose: val_iterator.set_postfix(loss=loss.item())

        val_acc = val_correct / val_total
        val_accuracies.append(val_acc)
        print(f"Validation Acc: {val_acc:.4f}")
        
        for i in range(5):
            if total_per_class[i] > 0:
                acc = 100 * correct_per_class[i] / total_per_class[i]
                print(f"Accuracy for class {i}: {acc:.2f}%")
            else:
                print(f"Class {i} has no samples.")
    
        # step the scheduler to reduce learning rate
        avg_val_loss = val_loss / len(val_iterator)

        # step the scheduler
        scheduler.step()
        
    return train_losses, train_accuracies, val_accuracies

for c in criterions:
    model, trainable_params = make_model()
    
    optimizer = optim.Adam(trainable_params, lr=1e-4, weight_decay=1e-4)

    # scheduler to bring down learning rate to reduce overfitting
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)
    
    train_losses, train_accuracies, val_accuracies = train(model, train_loader, val_loader, optimizer, scheduler, c, verbose=True, epochs=20)
    epochs_range = range(1, len(train_losses)+1)

    plt.figure(figsize=(12,5))

    plt.subplot(1,2,1)
    plt.plot(epochs_range, train_losses, label='Training Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Training Loss / Epochs')
    plt.legend()

    plt.subplot(1,2,2)
    plt.plot(epochs_range, train_accuracies, label='Training Accuracy')
    plt.plot(epochs_range, val_accuracies, label='Validation Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.title('Training vs Validation Accuracy')
    plt.legend()

    plt.tight_layout()
    plt.show()


                                                                         


Epoch [1/20] - Train Loss: 205.0342, Train Acc: 0.4984


                                                                           

Validation Acc: 0.1375
Accuracy for class 0: 0.00%
Accuracy for class 1: 77.47%
Accuracy for class 2: 38.95%
Accuracy for class 3: 56.72%
Accuracy for class 4: 50.00%


                                                                         


Epoch [2/20] - Train Loss: 166.2321, Train Acc: 0.6851


                                                                           

Validation Acc: 0.1479
Accuracy for class 0: 0.00%
Accuracy for class 1: 64.84%
Accuracy for class 2: 55.52%
Accuracy for class 3: 35.82%
Accuracy for class 4: 56.00%


                                                                         


Epoch [3/20] - Train Loss: 149.0445, Train Acc: 0.7595


                                                                           

Validation Acc: 0.1546
Accuracy for class 0: 0.11%
Accuracy for class 1: 51.65%
Accuracy for class 2: 67.68%
Accuracy for class 3: 31.34%
Accuracy for class 4: 52.00%


                                                                         


Epoch [4/20] - Train Loss: 142.6888, Train Acc: 0.7811


                                                                           

Validation Acc: 0.1570
Accuracy for class 0: 1.08%
Accuracy for class 1: 49.45%
Accuracy for class 2: 64.36%
Accuracy for class 3: 32.84%
Accuracy for class 4: 58.00%


                                                                         


Epoch [5/20] - Train Loss: 137.1742, Train Acc: 0.7948


                                                                           

Validation Acc: 0.1642
Accuracy for class 0: 1.24%
Accuracy for class 1: 35.16%
Accuracy for class 2: 75.97%
Accuracy for class 3: 31.34%
Accuracy for class 4: 58.00%


                                                                         


Epoch [6/20] - Train Loss: 136.4299, Train Acc: 0.7954


                                                                           

Validation Acc: 0.1774
Accuracy for class 0: 2.60%
Accuracy for class 1: 35.16%
Accuracy for class 2: 80.11%
Accuracy for class 3: 22.39%
Accuracy for class 4: 56.00%


                                                                         


Epoch [7/20] - Train Loss: 131.7641, Train Acc: 0.8094


                                                                           

Validation Acc: 0.2164
Accuracy for class 0: 8.77%
Accuracy for class 1: 39.56%
Accuracy for class 2: 72.10%
Accuracy for class 3: 32.84%
Accuracy for class 4: 52.00%


                                                                         


Epoch [8/20] - Train Loss: 133.9250, Train Acc: 0.8006


                                                                           

Validation Acc: 0.2260
Accuracy for class 0: 9.74%
Accuracy for class 1: 38.46%
Accuracy for class 2: 75.14%
Accuracy for class 3: 28.36%
Accuracy for class 4: 52.00%


                                                                         


Epoch [9/20] - Train Loss: 130.0829, Train Acc: 0.8134


                                                                           

Validation Acc: 0.2635
Accuracy for class 0: 14.94%
Accuracy for class 1: 41.21%
Accuracy for class 2: 72.65%
Accuracy for class 3: 29.85%
Accuracy for class 4: 54.00%


                                                                          


Epoch [10/20] - Train Loss: 133.8055, Train Acc: 0.8038


                                                                            

Validation Acc: 0.2491
Accuracy for class 0: 13.15%
Accuracy for class 1: 35.71%
Accuracy for class 2: 75.14%
Accuracy for class 3: 28.36%
Accuracy for class 4: 52.00%


                                                                          


Epoch [11/20] - Train Loss: 128.2669, Train Acc: 0.8243


                                                                            

Validation Acc: 0.2595
Accuracy for class 0: 14.56%
Accuracy for class 1: 39.56%
Accuracy for class 2: 73.20%
Accuracy for class 3: 28.36%
Accuracy for class 4: 52.00%


                                                                          


Epoch [12/20] - Train Loss: 129.8935, Train Acc: 0.8122


                                                                            

Validation Acc: 0.2658
Accuracy for class 0: 15.42%
Accuracy for class 1: 41.21%
Accuracy for class 2: 72.10%
Accuracy for class 3: 29.85%
Accuracy for class 4: 52.00%


                                                                          


Epoch [13/20] - Train Loss: 126.4930, Train Acc: 0.8203


                                                                            

Validation Acc: 0.2539
Accuracy for class 0: 13.74%
Accuracy for class 1: 33.52%
Accuracy for class 2: 76.24%
Accuracy for class 3: 29.85%
Accuracy for class 4: 52.00%


Epoch 14/20:  56%|█████▌    | 88/157 [00:22<00:14,  4.81it/s, loss=1.1]  

In [None]:
#model_path = "DR_DenseNet_Model.pth"
#torch.save(model.state_dict(), model_path)

## Test Set Experiments

We want to determine if the enhanced reject quality images have an effect on the accuracy of DenseNet121 model trained on the EyeQ dataset. We begin by defining the test dataset import object that will help load the datasets. 

In [None]:
import pandas as pd # for CSV from EyeQ dataset
import torch
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import os

# test image loader
class TestDataset(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        self.root_dir = root_dir
        self.transform = transform
        df = pd.read_csv(csv_file)
        
        # necessary since enhanced images are PNG and rejects are jpeg
        def find_existing_file(image_name):
            base_name = os.path.splitext(image_name)[0]
            for ext in ['.jpeg', '.png']:
                candidate = os.path.join(root_dir, base_name + ext)
                if os.path.exists(candidate):
                    return candidate
            return None

        # keep only rows with images in the directory
        df['full_path'] = df['image'].apply(find_existing_file)
        df = df[df['full_path'].notnull()]

        # reset index
        self.annotations = df.reset_index(drop=True)

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        # get image path
        img_path = self.annotations.iloc[index]['full_path']
        # convert image to RGB
        image = Image.open(img_path).convert("RGB")
        # add label according to EyeQ dataset
        label = self.annotations.iloc[index]['DR_grade']

        # transform
        if self.transform:
            image = self.transform(image)

        return image, label



In [None]:

# transform to imagenet standard
test_transforms = transforms.Compose([
    transforms.Resize((224, 224)),   # imagenet size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# load reject images (all quality 0)
reject_test_dataset = TestDataset(csv_file='Label_EyeQ_test.csv', root_dir='test/original/Reject', transform=test_transforms)
reject_test_loader = DataLoader(reject_test_dataset, batch_size=64, shuffle=False)

# load images enhanced with Cofe-Net
enhanced_test_dataset = TestDataset(csv_file='Label_EyeQ_test.csv', root_dir='test/enhanced', transform=test_transforms)
enhanced_test_loader = DataLoader(enhanced_test_dataset, batch_size=64, shuffle=False)

print("Test sets defined")

In [None]:
import matplotlib.pyplot as plt

reject_labels = reject_test_dataset.annotations['DR_grade']
reject_class_counts = reject_labels.value_counts().sort_index()

enhanced_labels = enhanced_test_dataset.annotations['DR_grade']
enhanced_class_counts = enhanced_labels.value_counts().sort_index()

# rejects
plt.figure(figsize=(8, 6))
plt.bar(reject_class_counts.index, reject_class_counts.values, color='skyblue')
plt.xlabel('Class Label')
plt.ylabel('Number of Samples')
plt.title('Class Distribution - Reject Fundus Images Test Set')
plt.xticks(reject_class_counts.index)
plt.grid(axis='y')
plt.show()

# enhanced
plt.figure(figsize=(8, 6))
plt.bar(enhanced_class_counts.index, enhanced_class_counts.values, color='skyblue')
plt.xlabel('Class Label')
plt.ylabel('Number of Samples')
plt.title('Class Distribution - Enhanced Fundus Images Test Set')
plt.xticks(enhanced_class_counts.index)
plt.grid(axis='y')
plt.show()


In [None]:

model.eval()
reject_all_preds = []
reject_all_labels = []

reject_correct_per_class = torch.zeros(5)
reject_total_per_class = torch.zeros(5)

with torch.no_grad():
    reject_test_iterator = tqdm(reject_test_loader, desc="Testing reject images...", leave=False)
    reject_test_correct = 0
    reject_test_total = 0
    for images, labels in reject_test_iterator:
        images = images.to(device)
        labels = labels.to(device)

        outputs = model(images)
        _, preds = torch.max(outputs, 1)
        
        # classwise accuracy
        for label, pred in zip(labels, preds):
            reject_total_per_class[label] += 1
            if label == pred:
                reject_correct_per_class[label] += 1
        
        reject_test_total += labels.size(0)
        reject_test_correct += (preds == labels).sum().item()

        reject_all_preds.extend(preds.cpu().numpy())
        reject_all_labels.extend(labels.cpu().numpy())

reject_test_accuracy = reject_test_correct / reject_test_total
print(f"Reject Test Acc: {reject_test_accuracy:.4f}")

for i in range(5):
    if reject_total_per_class[i] > 0:
        acc = 100 * reject_correct_per_class[i] / reject_total_per_class[i]
        print(f"Accuracy for class {i}: {acc:.2f}%")
    else:
        print(f"Class {i} has no samples.")
        
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

reject_cm = confusion_matrix(reject_all_labels, reject_all_preds)
reject_disp = ConfusionMatrixDisplay(confusion_matrix=reject_cm, display_labels=[0, 1, 2, 3, 4])
reject_disp.plot(cmap='Blues')
plt.title('Confusion Matrix')
plt.show()


In [None]:
model.eval()
enhanced_all_preds = []
enhanced_all_labels = []

enhanced_correct_per_class = torch.zeros(5)
enhanced_total_per_class = torch.zeros(5)

with torch.no_grad():
    enhanced_test_iterator = tqdm(enhanced_test_loader, desc="Testing enhanced images...", leave=False)
    enhanced_test_correct = 0
    enhanced_test_total = 0
    for images, labels in enhanced_test_iterator:
        images = images.to(device)
        labels = labels.to(device)

        outputs = model(images)
        _, preds = torch.max(outputs, 1)
        
        # classwise accuracy
        for label, pred in zip(labels, preds):
            enhanced_total_per_class[label] += 1
            if label == pred:
                enhanced_correct_per_class[label] += 1
        
        enhanced_test_total += labels.size(0)
        enhanced_test_correct += (preds == labels).sum().item()

        enhanced_all_preds.extend(preds.cpu().numpy())
        enhanced_all_labels.extend(labels.cpu().numpy())

enhanced_test_accuracy = enhanced_test_correct / enhanced_test_total
print(f"Enhanced Test Acc: {enhanced_test_accuracy:.4f}")

for i in range(5):
    if enhanced_total_per_class[i] > 0:
        acc = 100 * enhanced_correct_per_class[i] / enhanced_total_per_class[i]
        print(f"Accuracy for class {i}: {acc:.2f}%")
    else:
        print(f"Class {i} has no samples.")

enhanced_cm = confusion_matrix(enhanced_all_labels, enhanced_all_preds)
enhanced_disp = ConfusionMatrixDisplay(confusion_matrix=enhanced_cm, display_labels=[0, 1, 2, 3, 4])
enhanced_disp.plot(cmap='Blues')
plt.title('Confusion Matrix')
plt.show()