Correct confusion matrix calculation-function evaluate_detection_batch #1853
base: develop
Conversation
For evaluating the confusion matrix you can use the following code:
I've confirmed that it works with many examples.
Hi @panagiotamoraiti, thanks for the PR!
I have a few comments regarding your logic; it would be great if you could address them and add some unit tests.
)
# For each GT, find best matching detection (highest IoU > threshold)
for gt_idx, gt_class in enumerate(true_classes):
    candidate_det_idxs = np.where(iou_batch[gt_idx] > iou_threshold)[0]
The best match is selected based solely on IoU, which means a wrong-class prediction can still be chosen over a correct-class one if it has a higher IoU.
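One way to address this, sketched below under the assumption that the surrounding variables (`iou_batch`, `gt_class`, `detection_classes`, `iou_threshold`) mean what they do in the hunk above: among detections over the IoU threshold, prefer those whose class agrees with the ground truth, and only fall back to a different-class candidate when no same-class one exists. The helper name and its standalone form are illustrative, not the PR's actual code.

```python
from typing import Optional

import numpy as np


def pick_best_detection(
    gt_idx: int,
    gt_class: int,
    iou_batch: np.ndarray,          # shape (num_gt, num_det)
    detection_classes: np.ndarray,  # shape (num_det,)
    iou_threshold: float,
) -> Optional[int]:
    """Return the index of the best detection for one ground truth, or None.

    Candidates must exceed the IoU threshold; same-class candidates are
    preferred even when a different-class candidate has a higher IoU.
    """
    candidate_det_idxs = np.where(iou_batch[gt_idx] > iou_threshold)[0]
    if len(candidate_det_idxs) == 0:
        return None

    same_class_idxs = candidate_det_idxs[
        detection_classes[candidate_det_idxs] == gt_class
    ]
    # Prefer class-agreeing candidates; otherwise fall back to any candidate.
    pool = same_class_idxs if len(same_class_idxs) > 0 else candidate_det_idxs
    return int(pool[np.argmax(iou_batch[gt_idx, pool])])
```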
-for i, true_class_value in enumerate(true_classes):
-    j = matched_true_idx == i
-    if matches.shape[0] > 0 and sum(j) == 1:
-        result_matrix[
-            true_class_value, detection_classes[matched_detection_idx[j]]
-        ] += 1  # TP
-    else:
-        result_matrix[true_class_value, num_classes] += 1  # FN
+# For each GT, find best matching detection (highest IoU > threshold)
+for gt_idx, gt_class in enumerate(true_classes):
+    candidate_det_idxs = np.where(iou_batch[gt_idx] > iou_threshold)[0]
+
+    if len(candidate_det_idxs) == 0:
+        # No matching detection → FN for this GT
+        result_matrix[gt_class, num_classes] += 1
+        continue
+
+    best_det_idx = candidate_det_idxs[
+        np.argmax(iou_batch[gt_idx, candidate_det_idxs])
+    ]
+    det_class = detection_classes[best_det_idx]
+
+    if best_det_idx not in matched_det_idx:
+        # Count as matched regardless of class:
+        # same class → TP, different class → misclassification
+        result_matrix[gt_class, det_class] += 1
+        matched_gt_idx.add(gt_idx)
+        matched_det_idx.add(best_det_idx)
+    else:
+        # Detection already matched, GT is FN
+        result_matrix[gt_class, num_classes] += 1
It seems that this logic iterates through the ground truth boxes and, for each one, finds the best-matching detection box, i.e., the one with the highest IoU above the threshold that hasn't been matched yet.
The issue with this logic is that the matching process depends on the order of the ground truth boxes in true_classes. So, if a single detection box has a high IoU with multiple ground truth boxes, it will be matched with the first ground truth box that is processed, which can lead to inconsistent and incorrect confusion matrices, as the result will vary depending on the order of the ground truths in the input data.
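A common way to remove this order dependence is to match globally rather than per ground truth: gather all (ground truth, detection) pairs above the IoU threshold, sort them by IoU in descending order, and greedily accept pairs whose boxes have not been used yet. The sketch below is illustrative only; the function name and the optional class-agreement mask are assumptions, not the PR's code.

```python
from typing import List, Set, Tuple

import numpy as np


def match_gt_to_detections(
    iou_batch: np.ndarray,          # shape (num_gt, num_det)
    true_classes: np.ndarray,       # shape (num_gt,)
    detection_classes: np.ndarray,  # shape (num_det,)
    iou_threshold: float,
    require_same_class: bool = True,
) -> List[Tuple[int, int]]:
    """Greedy one-to-one matching that does not depend on ground-truth order."""
    iou = iou_batch.copy()
    if require_same_class:
        # Zero out pairs whose classes disagree so they can never match.
        iou[true_classes[:, None] != detection_classes[None, :]] = 0.0

    gt_idxs, det_idxs = np.where(iou > iou_threshold)
    # Visit candidate pairs from highest IoU to lowest.
    order = np.argsort(-iou[gt_idxs, det_idxs])

    matches: List[Tuple[int, int]] = []
    used_gt: Set[int] = set()
    used_det: Set[int] = set()
    for gt_idx, det_idx in zip(gt_idxs[order], det_idxs[order]):
        if gt_idx in used_gt or det_idx in used_det:
            continue
        matches.append((int(gt_idx), int(det_idx)))
        used_gt.add(int(gt_idx))
        used_det.add(int(det_idx))
    return matches
```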
Description
This fixes the issue where predicted bounding boxes were matched to ground truth boxes based solely on IoU, without considering class agreement during the matching process. Currently, if a predicted box has a higher IoU but the wrong class, it gets matched first, and the correct prediction with the right class but a lower IoU is discarded. This leads to miscounting true positives and false positives, resulting in an inaccurate confusion matrix.
The change modifies the matching logic (in the method evaluate_detection_batch) to incorporate both IoU and class agreement simultaneously, ensuring that only predictions which meet the IoU threshold and have the correct class are matched to ground truths. This results in a correct confusion matrix.
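For concreteness, here is a hedged sketch of the bookkeeping the description implies, once a one-to-one matching has been computed: matched pairs increment the (gt_class, det_class) cell, unmatched ground truths go to the extra FN column, and unmatched detections go to the extra FP row. The helper below and its signature are assumptions for illustration, not the literal code in the PR.

```python
from typing import List, Tuple

import numpy as np


def update_confusion_matrix(
    matches: List[Tuple[int, int]],   # matched (gt_idx, det_idx) pairs
    true_classes: np.ndarray,         # shape (num_gt,)
    detection_classes: np.ndarray,    # shape (num_det,)
    num_classes: int,
) -> np.ndarray:
    """Build a (num_classes + 1) x (num_classes + 1) confusion matrix.

    The last column counts false negatives (unmatched ground truths) and
    the last row counts false positives (unmatched detections).
    """
    result_matrix = np.zeros((num_classes + 1, num_classes + 1), dtype=int)

    matched_gt = {gt_idx for gt_idx, _ in matches}
    matched_det = {det_idx for _, det_idx in matches}

    # Matched pairs: same class lands on the diagonal (TP),
    # different class lands off-diagonal (misclassification).
    for gt_idx, det_idx in matches:
        result_matrix[true_classes[gt_idx], detection_classes[det_idx]] += 1

    # Unmatched ground truths → FN column.
    for gt_idx, gt_class in enumerate(true_classes):
        if gt_idx not in matched_gt:
            result_matrix[gt_class, num_classes] += 1

    # Unmatched detections → FP row.
    for det_idx, det_class in enumerate(detection_classes):
        if det_idx not in matched_det:
            result_matrix[num_classes, det_class] += 1

    return result_matrix
```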
Type of change
How has this change been tested, please provide a testcase or example of how you tested the change?
I had an image with 2 TP and 1 FP detections, but the confusion matrix reported 1 TP, 2 FP and 1 FN. The FP bbox with the wrong class had a higher overlap, so the TP was discarded; in the end that bbox was also discarded due to its wrong class id. Now my confusion matrix correctly reports 2 TP and 1 FP detections.
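As a starting point for the unit tests requested in the review, here is a hedged pytest-style sketch of that scenario. It assumes evaluate_detection_batch is callable on sv.ConfusionMatrix and takes (N, 6) predictions [x1, y1, x2, y2, confidence, class_id] and (M, 5) targets [x1, y1, x2, y2, class_id]; the parameter names, box coordinates, and expected cell indices are illustrative assumptions that would need to be checked against the actual signature.

```python
import numpy as np
import supervision as sv


def test_wrong_class_high_iou_does_not_steal_the_match():
    # Two ground truth boxes, both of class 0.
    targets = np.array(
        [
            [0.0, 0.0, 10.0, 10.0, 0],
            [20.0, 20.0, 30.0, 30.0, 0],
        ]
    )
    # Three predictions: one correct box per GT, plus one wrong-class box
    # that overlaps the first GT more tightly than the correct prediction.
    predictions = np.array(
        [
            [0.0, 0.0, 10.0, 10.0, 0.95, 1],    # wrong class, IoU = 1.0 with GT 0
            [0.0, 0.0, 9.0, 10.0, 0.90, 0],     # right class, slightly lower IoU
            [20.0, 20.0, 30.0, 30.0, 0.90, 0],  # right class, matches GT 1
        ]
    )

    # Assumed signature; adjust to the real one if it differs.
    result = sv.ConfusionMatrix.evaluate_detection_batch(
        predictions=predictions,
        targets=targets,
        num_classes=2,
        conf_threshold=0.25,
        iou_threshold=0.5,
    )

    assert result[0, 0] == 2        # both class-0 GTs counted as TP
    assert result[2, 1] == 1        # the wrong-class box counted as FP
    assert result[:, 2].sum() == 0  # no false negatives
```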
I also ran this on a big dataset; another script I developed and have used extensively in a previous project gives the following results, which now match the confusion matrix. Before I corrected the logic, they didn't match.
Test Set:
Ground Truth Objects: 481
True Positives: 469
False Positives: 11
False Negatives: 12
Validation Set:
Ground Truth Objects: 1073
True Positives: 1037
False Positives: 23
False Negatives: 36
Train Set:
Ground Truth Objects: 3716
True Positives: 3674
False Positives: 52
False Negatives: 42