<a href="https://colab.research.google.com/github/tinayiluo0322/Explainable-Autonomous-Driving-Object-Detection-Using-Yolov5-Model-and-D-RISE-XAI-Technique/blob/main/Explainable_Autonomous_Driving_Object_Detection_Using_Yolov5_Model_and_D_RISE_XAI_Technique_(Clear_All_Outputs).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AIPI 590 - XAI | XAI with Yolov5 for Object Detection in Autonomous Driving

### Explainable Autonomous Driving Object Detection Using Yolov5 Model and D-RISE XAI Technique
### Luopeiwen Yi

## Introduction

This notebook delves into the interpretability of object detection models, specifically within the context of autonomous driving. In this study, I use the YOLOv5 model, a state-of-the-art object detection algorithm, to analyze two sample images from the widely used KITTI dataset, which captures real-world driving scenarios, including various objects like cars and pedestrians. The KITTI dataset provides a diverse and challenging environment, making it ideal for evaluating object detection models in autonomous driving tasks.

The notebook is structured into two main phases. First, I test the performance of a pre-trained YOLOv5 model by running inference on the two selected images. This initial evaluation assesses YOLOv5’s ability to accurately detect and localize key objects in complex driving scenes. Next, I apply the D-RISE (Detector Randomized Input Sampling for Explanation) technique to generate visual explanations for YOLOv5's predictions on these images. D-RISE works by masking random parts of the input images and measuring the impact on detection confidence, ultimately producing heatmaps that reveal the most influential regions for each detected object.

By combining YOLOv5’s detection capabilities with the interpretability offered by D-RISE, this notebook provides a comprehensive analysis of the model’s decision-making process. The goal is to enhance transparency and trust in object detection models used in autonomous vehicles, offering insights into how the model prioritizes different regions of an image during detection. This can lead to improved safety and reliability in real-world driving applications, making the integration of explainable AI techniques critical in autonomous driving systems.

In [None]:
!git clone https://github.com/ultralytics/yolov5
%cd yolov5
!pip install -r requirements.txt

In [None]:
# Install the third-party libraries imported below
!pip install lime shap alibi Pillow

# Standard library
import json
import os
import random
import shutil
import warnings

# Numerics, deep learning, and image handling
import cv2
import matplotlib.patches as patches
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.backends.cudnn as cudnn
import torchvision
from PIL import Image as PILImage  # aliased so it does not clash with IPython.display.Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from tqdm import tqdm

# Explainability libraries
import shap
from alibi.explainers import AnchorImage
from lime import lime_image
from skimage.segmentation import mark_boundaries

# Colab and notebook helpers
from google.colab import drive, output
from google.colab.patches import cv2_imshow
from IPython.display import Image, display

# YOLOv5 helpers: `utils` is the package inside the cloned yolov5 directory
# (the current working directory), so it is not installed from PyPI
from utils.general import non_max_suppression, box_iou

warnings.filterwarnings("ignore")
output.enable_custom_widget_manager()

Check GPU Availability

In [None]:
# Ensure that all operations are deterministic on GPU (if used) for reproducibility
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# Fetching the device that will be used throughout this notebook
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

if torch.cuda.is_available():
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    print("Using CPU")

Set Seed

In [None]:
# Set a global random seed
GLOBAL_SEED = 42

def set_global_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# Set the global seed
set_global_seed(GLOBAL_SEED)

Access Original KITTI Training Data

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Set the path to the KITTI dataset folder in Google Drive
dataset_path = '/content/drive/MyDrive/KITTI_Dataset/'

# List the files in the dataset folder to confirm upload
if os.path.isdir(dataset_path):
    print(f"Dataset directory contents: {os.listdir(dataset_path)}")
else:
    print(f"Directory {dataset_path} not found")

## Pre-trained YOLOv5 Performance on Sample KITTI Dataset Image

Since we are using the pre-trained YOLOv5 model as is, we directly test its performance on sample images from KITTI's original training dataset.

In [None]:
# Load the pretrained YOLOv5 model
model = torch.hub.load('ultralytics/yolov5', 'yolov5m', pretrained=True)

# Set model to evaluation mode
model.eval()

Image 007480

In [None]:
# Load the image using OpenCV
image_path_007480 = '/content/drive/MyDrive/KITTI_Dataset/original_training/training_images/007480.png'
image_007480 = cv2.imread(image_path_007480)

# Display the image
cv2_imshow(image_007480)

In [None]:
# Run inference with the hub model; cv2.imread returns BGR, so flip the channels to RGB first
results_007480 = model(image_007480[..., ::-1])
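The results object returned by the hub model exposes its detections as one tensor per image, with rows of `[xmin, ymin, xmax, ymax, confidence, class_id]` in `results.xyxy[0]`. As a minimal sketch of how such rows could be turned into readable text, the helper below works on a plain NumPy array. The name `summarize_detections`, the example rows, and the two-entry name dictionary are all hypothetical and only illustrate the layout; they are not outputs of this notebook.

```python
import numpy as np

def summarize_detections(det, names):
    """Format (N, 6) rows of [xmin, ymin, xmax, ymax, conf, class_id] as text lines."""
    det = np.asarray(det, dtype=np.float32).reshape(-1, 6)
    lines = []
    for xmin, ymin, xmax, ymax, conf, cls in det:
        # Fall back to a generic label when the class ID is not in the mapping
        label = names.get(int(cls), f"class_{int(cls)}")
        lines.append(
            f"{label}: conf={conf:.2f}, box=({xmin:.0f}, {ymin:.0f}, {xmax:.0f}, {ymax:.0f})"
        )
    return lines

# Made-up rows for illustration only (assumption, not real detections)
example = np.array([[100, 150, 220, 260, 0.91, 2],
                    [400, 140, 430, 230, 0.55, 0]])
example_names = {0: "person", 2: "car"}  # small subset of the COCO class names

lines = summarize_detections(example, example_names)
for line in lines:
    print(line)
```

In this notebook the inputs would presumably be `results_007480.xyxy[0].cpu().numpy()` and `model.names`, assuming `model.names` is a dictionary; if it is a list, `dict(enumerate(model.names))` gives the same shape.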

In [None]:
# Run detect.py on the same image and save the annotated result to Drive.
# The yolov5m weights match the hub model loaded above, so both paths use the same detector.
!python detect.py --weights yolov5m.pt --img 640 --conf 0.25 --source /content/drive/MyDrive/KITTI_Dataset/original_training/training_images/007480.png --project /content/drive/MyDrive/KITTI_Dataset/sample_detection --name yolov5_results --exist-ok

In [None]:
# List all files in the detection result folder to find the correct image
output_dir = '/content/drive/MyDrive/KITTI_Dataset/sample_detection/yolov5_results'
print(os.listdir(output_dir))

In [None]:
# Display the annotated image that detect.py saved to the results folder
display(Image(filename='/content/drive/MyDrive/KITTI_Dataset/sample_detection/yolov5_results/007480.png'))

Image 000015

In [None]:
# Load the image using OpenCV
image_path_000015 = '/content/drive/MyDrive/KITTI_Dataset/original_training/training_images/000015.png'
image_000015 = cv2.imread(image_path_000015)

# Display the image
cv2_imshow(image_000015)

In [None]:
# Run inference with the hub model; cv2.imread returns BGR, so flip the channels to RGB first
results_000015 = model(image_000015[..., ::-1])

In [None]:
# Run detect.py on the same image and save the annotated result to Drive.
# The yolov5m weights match the hub model loaded above, so both paths use the same detector.
!python detect.py --weights yolov5m.pt --img 640 --conf 0.25 --source /content/drive/MyDrive/KITTI_Dataset/original_training/training_images/000015.png --project /content/drive/MyDrive/KITTI_Dataset/sample_detection --name yolov5_results --exist-ok

In [None]:
# List all files in the detection result folder to find the correct image
output_dir = '/content/drive/MyDrive/KITTI_Dataset/sample_detection/yolov5_results'
print(os.listdir(output_dir))

In [None]:
# Display the annotated image that detect.py saved to the results folder
display(Image(filename='/content/drive/MyDrive/KITTI_Dataset/sample_detection/yolov5_results/000015.png'))

## XAI Technique with D-RISE

D-RISE

[Black-box Explanation of Object Detectors via Saliency Maps](https://arxiv.org/pdf/2006.03204)

[D-RISE usage demo with Faster R-CNN](https://github.com/hysts/pytorch_D-RISE/blob/main/demo.ipynb)

D-RISE (Detector Randomized Input Sampling for Explanation) is an explainability technique specifically designed for object detection models. It builds upon the original RISE method, which creates visual explanations for image classifiers by analyzing how randomly masked versions of the input image affect the model’s output. D-RISE extends this concept to object detection by generating heatmaps that highlight the importance of different regions in an image for detecting specific objects. By randomly masking parts of the input and observing the resulting changes in detection confidence, D-RISE provides insights into which parts of the image are most influential for each detected object.

Here we implement this technique to help explain the decisions made by YOLOv5, aiming to make the models more interpretable and trustworthy for critical applications like autonomous driving.
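To make the mask-and-weight idea concrete before applying it to YOLOv5, here is a minimal sketch on a synthetic problem. It assumes a stand-in "detector": `toy_detector_score` is a hypothetical function that simply returns the fraction of a fixed target region left visible by the mask. The grid size, image size, and number of masks are illustrative choices, not values from the D-RISE paper.

```python
import numpy as np

def random_binary_mask(h, w, grid=4, keep_prob=0.5, rng=None):
    """Sample a coarse grid of 0/1 cells and upsample it to (h, w) by block repetition."""
    rng = np.random.default_rng() if rng is None else rng
    cells = (rng.random((grid, grid)) < keep_prob).astype(np.float32)
    block = np.ones((int(np.ceil(h / grid)), int(np.ceil(w / grid))), dtype=np.float32)
    # np.kron expands every cell into a block; the crop handles sizes not divisible by grid
    return np.kron(cells, block)[:h, :w]

def toy_detector_score(mask, target):
    """Hypothetical detector stand-in: fraction of the target region that stays visible."""
    ymin, ymax, xmin, xmax = target
    return float(mask[ymin:ymax, xmin:xmax].mean())

def toy_saliency(h, w, target, n_masks=500, seed=0):
    """Average the masks, each weighted by the score obtained on the masked input."""
    rng = np.random.default_rng(seed)
    saliency = np.zeros((h, w), dtype=np.float32)
    for _ in range(n_masks):
        mask = random_binary_mask(h, w, rng=rng)
        saliency += mask * toy_detector_score(mask, target)
    return saliency / n_masks

target = (8, 16, 8, 16)  # (ymin, ymax, xmin, xmax) of the synthetic "object"
sal = toy_saliency(32, 32, target)

inside = sal[8:16, 8:16].mean()
outside_region = np.ones((32, 32), dtype=bool)
outside_region[8:16, 8:16] = False
outside = sal[outside_region].mean()
print(f"mean saliency inside target: {inside:.3f}, outside: {outside:.3f}")
```

Pixels inside the target end up with a higher average weight than pixels elsewhere, because masks that keep the target visible are exactly the ones that receive a high score. The heatmaps produced later in this notebook follow the same logic, with YOLOv5 in place of the toy score.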

Get the Dataset Label for Visualization

In [None]:
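# The pre-trained YOLOv5 checkpoint predicts the 80 COCO classes (0: person, 1: bicycle, 2: car, ...),
# not KITTI's label set, so take the ID-to-name mapping from the model itself
names = model.names
class_mapping = dict(names) if isinstance(names, dict) else dict(enumerate(names))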

Application of D-RISE on Image 007480

In [None]:
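# Re-read the image and convert it to RGB, the channel order the YOLOv5 hub model expects;
# reading from disk keeps this cell safe to re-run
image_007480 = cv2.cvtColor(cv2.imread(image_path_007480), cv2.COLOR_BGR2RGB)
# cv2_imshow expects BGR, so convert back for display only
cv2_imshow(cv2.cvtColor(image_007480, cv2.COLOR_RGB2BGR))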

In [None]:
# YOLOv5 Inference
results = model(image_007480)
detections = results.xywh[0]  # Extract bounding boxes (xywh format) and confidence

# Convert YOLOv5 bounding boxes (x_center, y_center, width, height) to (xmin, ymin, xmax, ymax)
def yolo_to_corner(bbox):
    x_center, y_center, width, height = bbox[:4]
    xmin = x_center - (width / 2)
    ymin = y_center - (height / 2)
    xmax = x_center + (width / 2)
    ymax = y_center + (height / 2)
    return [xmin, ymin, xmax, ymax]

# D-RISE mask generation
def generate_mask(image_size, grid_size, prob_thresh):
    image_w, image_h = image_size
    grid_w, grid_h = grid_size
    cell_w, cell_h = np.ceil(image_w / grid_w).astype(int), np.ceil(image_h / grid_h).astype(int)
    up_w, up_h = (grid_w + 1) * cell_w, (grid_h + 1) * cell_h

    mask = (np.random.uniform(0, 1, size=(grid_h, grid_w)) < prob_thresh).astype(np.float32)
    mask = cv2.resize(mask, (up_w, up_h), interpolation=cv2.INTER_LINEAR)

    offset_w = np.random.randint(0, cell_w)
    offset_h = np.random.randint(0, cell_h)
    mask = mask[offset_h:offset_h + image_h, offset_w:offset_w + image_w]
    return mask

# Apply mask to image
def mask_image(image, mask):
    return ((image.astype(np.float32) / 255 * np.dstack([mask] * 3)) * 255).astype(np.uint8)

# IOU Function
def iou(box1, box2):
    box1 = np.asarray(box1)
    box2 = np.asarray(box2)
    tl = np.maximum(box1[:2], box2[:2])
    br = np.minimum(box1[2:], box2[2:])
    intersection = np.prod(np.maximum(br - tl, 0))
    area1 = np.prod(box1[2:] - box1[:2])
    area2 = np.prod(box2[2:] - box2[:2])
    union = area1 + area2 - intersection
    # Guard against degenerate (zero-area) boxes
    return intersection / union if union > 0 else 0.0

def generate_saliency_map(image, target_box, prob_thresh=0.5, grid_size=(16, 16), n_masks=1000, seed=0):
    np.random.seed(seed)
    image_h, image_w = image.shape[:2]
    saliency_map = np.zeros((image_h, image_w), dtype=np.float32)

    for _ in tqdm(range(n_masks)):
        mask = generate_mask(image_size=(image_w, image_h), grid_size=grid_size, prob_thresh=prob_thresh)
        masked_image = mask_image(image, mask)

        # Perform inference on masked image
        results_masked = model(masked_image)
        preds = results_masked.xywh[0].cpu().numpy()

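        # Score the mask by the best IoU x confidence over all detections on the masked image.
        # This simplifies the D-RISE paper's similarity, which also compares class probabilities.
        score = 0
        for pred in preds:
            pred_box = yolo_to_corner(pred[:4])
            iou_score = iou(target_box, pred_box)
            score = max(score, iou_score * pred[4])  # pred[4] is the confidence score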

        saliency_map += mask * score

    return saliency_map
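The weight given to each mask is the product of an IoU and a confidence, so a worked example with round numbers may help when reading the heatmaps. The sketch below restates the box conversion and IoU in self-contained form; the helper names `xywh_to_xyxy` and `box_iou_xyxy`, the two boxes, and the confidence of 0.8 are illustrative assumptions.

```python
import numpy as np

def xywh_to_xyxy(box):
    """Convert (x_center, y_center, width, height) to (xmin, ymin, xmax, ymax)."""
    xc, yc, w, h = box
    return np.array([xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2], dtype=float)

def box_iou_xyxy(a, b):
    """Intersection over union of two corner-format boxes."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    tl = np.maximum(a[:2], b[:2])               # top-left corner of the overlap
    br = np.minimum(a[2:], b[2:])               # bottom-right corner of the overlap
    inter = np.prod(np.clip(br - tl, 0, None))  # zero when the boxes do not overlap
    union = np.prod(a[2:] - a[:2]) + np.prod(b[2:] - b[:2]) - inter
    return float(inter / union) if union > 0 else 0.0

target = xywh_to_xyxy([50, 50, 40, 40])  # corners (30, 30, 70, 70)
pred = xywh_to_xyxy([60, 50, 40, 40])    # same size, shifted 10 px to the right

# Overlap is 30 x 40 = 1200 px; union is 1600 + 1600 - 1200 = 2000 px
iou_value = box_iou_xyxy(target, pred)
confidence = 0.8                         # illustrative confidence on the masked image
mask_weight = iou_value * confidence

print(f"IoU = {iou_value:.2f}, mask weight = {mask_weight:.2f}")
```

A mask that shifts or weakens the detection therefore contributes less to the saliency map, and a mask that removes the detection entirely contributes nothing.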

In [None]:
# Generate and plot a D-RISE saliency map for every detected object
if len(detections) > 0:
    # Create a figure with subplots
    n_detections = len(detections)
    fig, axes = plt.subplots(n_detections, 3, figsize=(15, 5 * n_detections))
    if n_detections == 1:
        axes = axes.reshape(1, -1)

    for idx, detection in enumerate(detections):
        target_box = yolo_to_corner(detection[:4].cpu().numpy())
        class_id = int(detection[5].item())  # Get the class ID
        class_name = class_mapping.get(class_id, f'Unknown_{class_id}')  # Get the class name
        confidence = detection[4].item()  # Get the confidence score

        # Generate the saliency map for the detected object
        saliency_map = generate_saliency_map(image_007480, target_box, prob_thresh=0.5, grid_size=(16, 16), n_masks=1000)

        # Normalize saliency map to [0, 1]; the epsilon avoids division by zero for a flat map
        saliency_map = (saliency_map - saliency_map.min()) / (saliency_map.max() - saliency_map.min() + 1e-8)

        # Overlay the saliency map on the image
        image_with_bbox = image_007480.copy()
        cv2.rectangle(image_with_bbox, tuple(map(int, target_box[:2])), tuple(map(int, target_box[2:])), (0, 255, 0), 2)

        # Add text to the image
        label = f'{class_name} {confidence:.2f}'
        cv2.putText(image_with_bbox, label, (int(target_box[0]), int(target_box[1] - 10)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

        # Original Image with bounding box
        axes[idx, 0].imshow(image_with_bbox)
        axes[idx, 0].set_title(f'Object {idx+1}: {class_name}')
        axes[idx, 0].axis('off')

        # Saliency Map
        axes[idx, 1].imshow(saliency_map, cmap='jet')
        axes[idx, 1].set_title(f'Object {idx+1}: {class_name} Saliency Map')
        axes[idx, 1].axis('off')

        # Saliency Map Overlay
        axes[idx, 2].imshow(image_with_bbox)
        axes[idx, 2].imshow(saliency_map, cmap='jet', alpha=0.5)
        axes[idx, 2].set_title(f'Object {idx+1}: {class_name} Saliency Map Overlay')
        axes[idx, 2].axis('off')

    plt.tight_layout()

    # Save the figure as an image file
    output_path = '/content/drive/MyDrive/KITTI_Dataset/d_rise_output_multiple_007480_with_labels.png'
    plt.savefig(output_path, dpi=300, bbox_inches='tight')
    plt.close(fig)

    print(f"Output image saved to: {output_path}")
    print(f"Number of objects detected: {n_detections}")

    # display the saved image
    from IPython.display import Image, display
    display(Image(filename=output_path))
else:
    print("No detections found in the image.")

Resize the Image

In [None]:
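# Resize to 640x640 (this does not preserve the aspect ratio)
image_007480_resized = cv2.resize(image_007480, (640, 640))
# The image is RGB at this point; cv2_imshow expects BGR, so convert for display only
cv2_imshow(cv2.cvtColor(image_007480_resized, cv2.COLOR_RGB2BGR))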

In [None]:
# YOLOv5 Inference
results = model(image_007480_resized)
detections = results.xywh[0]  # Extract bounding boxes (xywh format) and confidence

In [None]:
# Generate and plot a D-RISE saliency map for every detected object
if len(detections) > 0:
    # Create a figure with subplots
    n_detections = len(detections)
    fig, axes = plt.subplots(n_detections, 3, figsize=(15, 5 * n_detections))
    if n_detections == 1:
        axes = axes.reshape(1, -1)

    for idx, detection in enumerate(detections):
        target_box = yolo_to_corner(detection[:4].cpu().numpy())
        class_id = int(detection[5].item())  # Get the class ID
        class_name = class_mapping.get(class_id, f'Unknown_{class_id}')  # Get the class name
        confidence = detection[4].item()  # Get the confidence score

        # Generate the saliency map for the detected object
        saliency_map = generate_saliency_map(image_007480_resized, target_box, prob_thresh=0.5, grid_size=(16, 16), n_masks=1000)

        # Normalize saliency map to [0, 1]; the epsilon avoids division by zero for a flat map
        saliency_map = (saliency_map - saliency_map.min()) / (saliency_map.max() - saliency_map.min() + 1e-8)

        # Overlay the saliency map on the image
        image_with_bbox = image_007480_resized.copy()
        cv2.rectangle(image_with_bbox, tuple(map(int, target_box[:2])), tuple(map(int, target_box[2:])), (0, 255, 0), 2)

        # Add text to the image
        label = f'{class_name} {confidence:.2f}'
        cv2.putText(image_with_bbox, label, (int(target_box[0]), int(target_box[1] - 10)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

        # Original Image with bounding box
        axes[idx, 0].imshow(image_with_bbox)
        axes[idx, 0].set_title(f'Object {idx+1}: {class_name}')
        axes[idx, 0].axis('off')

        # Saliency Map
        axes[idx, 1].imshow(saliency_map, cmap='jet')
        axes[idx, 1].set_title(f'Object {idx+1}: {class_name} Saliency Map')
        axes[idx, 1].axis('off')

        # Saliency Map Overlay
        axes[idx, 2].imshow(image_with_bbox)
        axes[idx, 2].imshow(saliency_map, cmap='jet', alpha=0.5)
        axes[idx, 2].set_title(f'Object {idx+1}: {class_name} Saliency Map Overlay')
        axes[idx, 2].axis('off')

    plt.tight_layout()

    # Save the figure as an image file
    output_path = '/content/drive/MyDrive/KITTI_Dataset/d_rise_output_multiple_007480_resize_with_labels.png'
    plt.savefig(output_path, dpi=300, bbox_inches='tight')
    plt.close(fig)

    print(f"Output image saved to: {output_path}")
    print(f"Number of objects detected: {n_detections}")

    # display the saved image
    from IPython.display import Image, display
    display(Image(filename=output_path))
else:
    print("No detections found in the image.")

Application of D-RISE on Image 000015

In [None]:
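# Re-read the image and convert it to RGB, the channel order the YOLOv5 hub model expects;
# reading from disk keeps this cell safe to re-run
image_000015 = cv2.cvtColor(cv2.imread(image_path_000015), cv2.COLOR_BGR2RGB)
# cv2_imshow expects BGR, so convert back for display only
cv2_imshow(cv2.cvtColor(image_000015, cv2.COLOR_RGB2BGR))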

In [None]:
# YOLOv5 Inference
results = model(image_000015)
detections = results.xywh[0]  # Extract bounding boxes (xywh format) and confidence

In [None]:
# Generate and plot a D-RISE saliency map for every detected object
if len(detections) > 0:
    # Create a figure with subplots
    n_detections = len(detections)
    fig, axes = plt.subplots(n_detections, 3, figsize=(15, 5 * n_detections))
    if n_detections == 1:
        axes = axes.reshape(1, -1)

    for idx, detection in enumerate(detections):
        target_box = yolo_to_corner(detection[:4].cpu().numpy())
        class_id = int(detection[5].item())  # Get the class ID
        class_name = class_mapping.get(class_id, f'Unknown_{class_id}')  # Get the class name
        confidence = detection[4].item()  # Get the confidence score

        # Generate the saliency map for the detected object
        saliency_map = generate_saliency_map(image_000015, target_box, prob_thresh=0.5, grid_size=(16, 16), n_masks=1000)

        # Normalize saliency map to [0, 1]; the epsilon avoids division by zero for a flat map
        saliency_map = (saliency_map - saliency_map.min()) / (saliency_map.max() - saliency_map.min() + 1e-8)

        # Overlay the saliency map on the image
        image_with_bbox = image_000015.copy()
        cv2.rectangle(image_with_bbox, tuple(map(int, target_box[:2])), tuple(map(int, target_box[2:])), (0, 255, 0), 2)

        # Add text to the image
        label = f'{class_name} {confidence:.2f}'
        cv2.putText(image_with_bbox, label, (int(target_box[0]), int(target_box[1] - 10)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

        # Original Image with bounding box
        axes[idx, 0].imshow(image_with_bbox)
        axes[idx, 0].set_title(f'Object {idx+1}: {class_name}')
        axes[idx, 0].axis('off')

        # Saliency Map
        axes[idx, 1].imshow(saliency_map, cmap='jet')
        axes[idx, 1].set_title(f'Object {idx+1}: {class_name} Saliency Map')
        axes[idx, 1].axis('off')

        # Saliency Map Overlay
        axes[idx, 2].imshow(image_with_bbox)
        axes[idx, 2].imshow(saliency_map, cmap='jet', alpha=0.5)
        axes[idx, 2].set_title(f'Object {idx+1}: {class_name} Saliency Map Overlay')
        axes[idx, 2].axis('off')

    plt.tight_layout()

    # Save the figure as an image file
    output_path = '/content/drive/MyDrive/KITTI_Dataset/d_rise_output_multiple_000015_with_labels.png'
    plt.savefig(output_path, dpi=300, bbox_inches='tight')
    plt.close(fig)

    print(f"Output image saved to: {output_path}")
    print(f"Number of objects detected: {n_detections}")

    # display the saved image
    from IPython.display import Image, display
    display(Image(filename=output_path))
else:
    print("No detections found in the image.")

Resize the Image

In [None]:
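# Resize to 640x640 (this does not preserve the aspect ratio)
image_000015_resized = cv2.resize(image_000015, (640, 640))
# The image is RGB at this point; cv2_imshow expects BGR, so convert for display only
cv2_imshow(cv2.cvtColor(image_000015_resized, cv2.COLOR_RGB2BGR))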

In [None]:
# YOLOv5 Inference
results = model(image_000015_resized)
detections = results.xywh[0]  # Extract bounding boxes (xywh format) and confidence

In [None]:
# Generate and plot a D-RISE saliency map for every detected object
if len(detections) > 0:
    # Create a figure with subplots
    n_detections = len(detections)
    fig, axes = plt.subplots(n_detections, 3, figsize=(15, 5 * n_detections))
    if n_detections == 1:
        axes = axes.reshape(1, -1)

    for idx, detection in enumerate(detections):
        target_box = yolo_to_corner(detection[:4].cpu().numpy())
        class_id = int(detection[5].item())  # Get the class ID
        class_name = class_mapping.get(class_id, f'Unknown_{class_id}')  # Get the class name
        confidence = detection[4].item()  # Get the confidence score

        # Generate the saliency map for the detected object
        saliency_map = generate_saliency_map(image_000015_resized, target_box, prob_thresh=0.5, grid_size=(16, 16), n_masks=1000)

        # Normalize saliency map to [0, 1]; the epsilon avoids division by zero for a flat map
        saliency_map = (saliency_map - saliency_map.min()) / (saliency_map.max() - saliency_map.min() + 1e-8)

        # Overlay the saliency map on the image
        image_with_bbox = image_000015_resized.copy()
        cv2.rectangle(image_with_bbox, tuple(map(int, target_box[:2])), tuple(map(int, target_box[2:])), (0, 255, 0), 2)

        # Add text to the image
        label = f'{class_name} {confidence:.2f}'
        cv2.putText(image_with_bbox, label, (int(target_box[0]), int(target_box[1] - 10)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

        # Original Image with bounding box
        axes[idx, 0].imshow(image_with_bbox)
        axes[idx, 0].set_title(f'Object {idx+1}: {class_name}')
        axes[idx, 0].axis('off')

        # Saliency Map
        axes[idx, 1].imshow(saliency_map, cmap='jet')
        axes[idx, 1].set_title(f'Object {idx+1}: {class_name} Saliency Map')
        axes[idx, 1].axis('off')

        # Saliency Map Overlay
        axes[idx, 2].imshow(image_with_bbox)
        axes[idx, 2].imshow(saliency_map, cmap='jet', alpha=0.5)
        axes[idx, 2].set_title(f'Object {idx+1}: {class_name} Saliency Map Overlay')
        axes[idx, 2].axis('off')

    plt.tight_layout()

    # Save the figure as an image file
    output_path = '/content/drive/MyDrive/KITTI_Dataset/d_rise_output_multiple_000015_resize_with_labels.png'
    plt.savefig(output_path, dpi=300, bbox_inches='tight')
    plt.close(fig)

    print(f"Output image saved to: {output_path}")
    print(f"Number of objects detected: {n_detections}")

    # display the saved image
    from IPython.display import Image, display
    display(Image(filename=output_path))
else:
    print("No detections found in the image.")