Correct confusion matrix calculation-function evaluate_detection_batch #1853


Open · wants to merge 6 commits into base: develop

Conversation


@panagiotamoraiti commented May 27, 2025

Description

This fixes the issue where predicted bounding boxes were matched to ground truth boxes solely based on IoU, without considering class agreement during the matching process. Currently, if a predicted box has a higher IoU but the wrong class, it gets matched first, and the correct prediction with the right class but lower IoU is discarded. This leads to miscounting true positives and false positives, resulting in an inaccurate confusion matrix.

The change modifies the matching logic (method evaluate_detection_batch) to consider IoU and class agreement simultaneously, ensuring that only predictions meeting both the IoU threshold and the class requirement are matched to ground truths. This results in a correct confusion matrix.
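
As a rough illustration of the idea (a standalone sketch, not the code in this PR; the function name and inputs are made up for the example), a greedy matcher that requires both IoU and class agreement could look like this:

import numpy as np

def match_class_aware(ious, gt_classes, det_classes, iou_threshold=0.5):
    # ious: (num_gt, num_det) IoU matrix; gt_classes: (num_gt,); det_classes: (num_det,)
    # Returns a list of (gt_idx, det_idx) pairs.
    matches = []
    used_dets = set()
    for gt_idx, gt_class in enumerate(gt_classes):
        # Candidates must clear the IoU threshold AND predict the same class
        candidates = np.where(
            (ious[gt_idx] > iou_threshold) & (det_classes == gt_class)
        )[0]
        candidates = [d for d in candidates if d not in used_dets]
        if not candidates:
            continue  # this ground truth stays unmatched (FN)
        best = max(candidates, key=lambda d: ious[gt_idx, d])
        matches.append((gt_idx, int(best)))
        used_dets.add(int(best))
    return matches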

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How has this change been tested? Please provide a test case or example of how you tested the change.

I had an image with 2 TP and 1 FP detections, but the confusion matrix reported 1 TP, 2 FP, and 1 FN. The FP bbox with the wrong class had a higher overlap, so the TP was discarded; in the end that bbox was also discarded because of its wrong class id. With this fix, the confusion matrix correctly reports 2 TP and 1 FP.

I also ran this on a large dataset. Another script that I developed and used extensively in a previous project gives the following results, which now match the confusion matrix; before the fix they did not match.

Test Set:
Ground Truth Objects: 481
True Positives: 469
False Positives: 11
False Negatives: 12

Validation Set:
Ground Truth Objects: 1073
True Positives: 1037
False Positives: 23
False Negatives: 36

Train Set:
Ground Truth Objects: 3716
True Positives: 3674
False Positives: 52
False Negatives: 42

@CLAassistant commented May 27, 2025

CLA assistant check
All committers have signed the CLA.

@panagiotamoraiti (Author) commented Jun 6, 2025

To evaluate the confusion matrix, you can use the following code:

import numpy as np
import supervision as sv

# Define class names
class_names = ['cat', 'dog', 'rabbit']

# Ground truth detections (5 objects: one cat, one dog, three rabbits)
gt = sv.Detections(
    xyxy=np.array([
        [0, 0, 2, 2],   # cat
        [3, 3, 5, 5],   # dog
        [6, 6, 8, 8],   # rabbit
        [6, 15, 9, 16], # rabbit
        [2, 2, 3, 3],   # rabbit
    ]),
    class_id=np.array([0, 1, 2, 2, 2])
)

# Predicted detections (6 predictions)
preds = sv.Detections(
    xyxy=np.array([
        [0, 0, 2, 2],
        [3, 3, 5, 5], 
        [6, 6, 8, 8], 
        [9, 9, 11, 11], # FP 
        [10, 10, 12, 12], # FP
        [2, 2, 3, 3],  # rabbit GT box, predicted as dog (misclassified)
    ]),
    class_id=np.array([0, 1, 2, 0, 1, 1]),  # note: the last rabbit GT is predicted as dog (misclassified)
    confidence=np.array([0.9, 0.7, 0.8, 0.6, 0.7, 0.7])
)

# Generate confusion matrix
cm = sv.ConfusionMatrix.from_detections(
    predictions=[preds],
    targets=[gt],
    classes=class_names,
    conf_threshold=0.5,
    iou_threshold=0.5
)

print("Confusion Matrix:\n", cm.matrix)

I've confirmed that it works with many examples.
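
For reference, supervision's matrix layout puts ground-truth classes on the rows, predicted classes on the columns, and uses the extra last column/row for unmatched ground truths (FN) and unmatched predictions (FP). Assuming that layout, and continuing from the snippet above, a rough way to recover aggregate counts like the ones reported earlier is shown below (whether wrong-class matches should count as FP depends on the convention of the comparison script):

m = cm.matrix                    # shape: (num_classes + 1, num_classes + 1)
class_block = m[:-1, :-1]        # true class (rows) vs. predicted class (cols)

tp = int(np.trace(class_block))              # matched with the correct class
wrong_class = int(class_block.sum() - tp)    # matched, but with the wrong class
fp = int(m[-1, :-1].sum()) + wrong_class     # unmatched predictions + wrong-class matches
fn = int(m[:-1, -1].sum())                   # ground truths with no match

print(f"TP={tp}  FP={fp}  FN={fn}")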

@soumik12345 (Contributor) left a comment

Hi @panagiotamoraiti, thanks for the PR!
I have a few comments regarding your logic; it would be great if you could address them and add some unit tests.

        # For each GT, find best matching detection (highest IoU > threshold)
        for gt_idx, gt_class in enumerate(true_classes):
            candidate_det_idxs = np.where(iou_batch[gt_idx] > iou_threshold)[0]

The selection of the best match is happening based solely on IoU, which means a wrong-class prediction can still be chosen over a right-class one if it has a higher IoU.
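
One possible way to address this, sketched with the same variable names as the quoted snippet (not a drop-in patch): prefer same-class candidates when picking the best match, so that a wrong-class detection with a higher IoU cannot shadow a correct-class one.

# Prefer candidates whose predicted class matches the ground truth class;
# fall back to the remaining candidates only so misclassifications can still be counted.
same_class_idxs = candidate_det_idxs[
    detection_classes[candidate_det_idxs] == gt_class
]
pool = same_class_idxs if len(same_class_idxs) > 0 else candidate_det_idxs
best_det_idx = pool[np.argmax(iou_batch[gt_idx, pool])]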

Comment on lines +306 to +328
+        # For each GT, find best matching detection (highest IoU > threshold)
+        for gt_idx, gt_class in enumerate(true_classes):
+            candidate_det_idxs = np.where(iou_batch[gt_idx] > iou_threshold)[0]
+
-        for i, true_class_value in enumerate(true_classes):
-            j = matched_true_idx == i
-            if matches.shape[0] > 0 and sum(j) == 1:
-                result_matrix[
-                    true_class_value, detection_classes[matched_detection_idx[j]]
-                ] += 1  # TP
+            if len(candidate_det_idxs) == 0:
+                # No matching detection → FN for this GT
+                result_matrix[gt_class, num_classes] += 1
+                continue
+
+            best_det_idx = candidate_det_idxs[
+                np.argmax(iou_batch[gt_idx, candidate_det_idxs])
+            ]
+            det_class = detection_classes[best_det_idx]
+
+            if best_det_idx not in matched_det_idx:
+                # Count as matched regardless of class:
+                # same class → TP, different class → misclassification
+                result_matrix[gt_class, det_class] += 1
+                matched_gt_idx.add(gt_idx)
+                matched_det_idx.add(best_det_idx)
+            else:
-                result_matrix[true_class_value, num_classes] += 1  # FN
+                # Detection already matched, GT is FN
+                result_matrix[gt_class, num_classes] += 1

It seems that this logic iterates through the ground truth boxes and, for each one, finds the best-matching detection box, i.e., the one with the highest IoU above the threshold that hasn't been matched yet.

The issue with this logic is that the matching process depends on the order of the ground truth boxes in true_classes. So, if a single detection box has a high IoU with multiple ground truth boxes, it will be matched with the first ground truth box that is processed, which can lead to inconsistent and incorrect confusion matrices, as the result will vary depending on the order of the ground truths in the input data.
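
For what it's worth, a common order-independent alternative (sketched with the variable names from the snippet above; not a ready-made patch) is to rank all candidate (ground truth, detection) pairs globally by IoU and match them greedily:

# Collect all pairs above the IoU threshold and process them in descending
# IoU order, so the outcome does not depend on the order of the inputs.
gt_idxs, det_idxs = np.where(iou_batch > iou_threshold)
order = np.argsort(-iou_batch[gt_idxs, det_idxs])

matched_gt_idx, matched_det_idx = set(), set()
for gt_idx, det_idx in zip(gt_idxs[order], det_idxs[order]):
    if gt_idx in matched_gt_idx or det_idx in matched_det_idx:
        continue
    matched_gt_idx.add(gt_idx)
    matched_det_idx.add(det_idx)
    # same class → TP cell, different class → misclassification cell
    result_matrix[true_classes[gt_idx], detection_classes[det_idx]] += 1

# Anything left unmatched becomes FN (ground truths) or FP (detections).
for gt_idx, gt_class in enumerate(true_classes):
    if gt_idx not in matched_gt_idx:
        result_matrix[gt_class, num_classes] += 1
for det_idx, det_class in enumerate(detection_classes):
    if det_idx not in matched_det_idx:
        result_matrix[num_classes, det_class] += 1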
