### Fixes

* performance (`DetectionResults.detections_by_class` memoized)
* performance (`DetectionResults.num_gt_class` memoized)
* "crowd" GT detection handling (OK for T_IoU = 0.5 at least!)
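For reference on the "crowd" fix: pycocotools (`pycocotools.mask.iou`, which takes a per-GT `iscrowd` flag) computes the overlap with a crowd GT differently. The union is replaced by the detection's own area, so a detection lying entirely inside a crowd region scores 1 however large the region is. Below is a minimal sketch for `[x, y, w, h]` boxes. `box_iou` is a hypothetical helper written for illustration, not part of `evaldets`.

```python
def box_iou(dt, gt, gt_is_crowd=False):
    """IoU of two [x, y, w, h] boxes, using COCO's crowd rule for the denominator."""
    dx, dy, dw, dh = dt
    gx, gy, gw, gh = gt
    # width / height of the intersection rectangle (0 if the boxes do not overlap)
    iw = max(0.0, min(dx + dw, gx + gw) - max(dx, gx))
    ih = max(0.0, min(dy + dh, gy + gh) - max(dy, gy))
    inter = iw * ih
    if gt_is_crowd:
        # crowd GT: a detection may match any sub-region, so divide by the detection area only
        denom = dw * dh
    else:
        # regular GT: standard intersection over union
        denom = dw * dh + gw * gh - inter
    return inter / denom if denom > 0 else 0.0


# a 20x20 detection inside a 100x100 region
print(box_iou([10, 10, 20, 20], [0, 0, 100, 100]))                    # regular GT
print(box_iou([10, 10, 20, 20], [0, 0, 100, 100], gt_is_crowd=True))  # crowd GT
```

The same small detection overlaps the large region only weakly as a regular GT, but fully as a crowd GT.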

### To do

* "crowd" GT detection handling for T_IoU > 0.5
* per-class AP
* small / medium / large AP...
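For the small / medium / large item: pycocotools' default `Params.areaRng` uses the ranges below (in pixels², with labels `areaRngLbl`). `COCOeval` ignores a GT whose annotation `area` (the segmentation area, not the box area) falls outside the range, and likewise ignores an *unmatched* detection outside it. Both bounds are inclusive, so an area of exactly 32² counts as both small and medium. `area_labels` is a hypothetical helper; how it would plug into `DetectionResults(area_rng=...)` is an open question here.

```python
# pycocotools' default area ranges (Params.areaRng / areaRngLbl), in pixels^2
AREA_RANGES = {
    "all": (0, 1e5 ** 2),
    "small": (0, 32 ** 2),
    "medium": (32 ** 2, 96 ** 2),
    "large": (96 ** 2, 1e5 ** 2),
}


def area_labels(area):
    """Names of the COCO area ranges containing `area` (both bounds inclusive, as in COCOeval)."""
    return [name for name, (lo, hi) in AREA_RANGES.items() if lo <= area <= hi]


print(area_labels(500))    # a small object
print(area_labels(1024))   # exactly 32**2: on the small / medium boundary
print(area_labels(10000))  # above 96**2 = 9216
```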

In [1]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from evaldets.api import *
from uo.utils import *

In [2]:
dr = DetectionResults('../reval_05/baseline_05/evaluator_dump_R101_101/', area_rng=None, iou_thresh=None, debug=0, cache=0)
dr.finish_cocoeval()

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.404
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.603
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.432
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.240
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.443
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.522
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.336
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.532
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.563
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.376
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.599
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.715


AveP for a single class (`person`, class index 0) at $T_{IoU} = 0.5$.

In [3]:
np.mean(dr.coco.eval["precision"][0, :, 0, 0, -1])

0.8071831344949841
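The indexing above relies on the layout of `eval["precision"]` in pycocotools: an array of shape (T, R, K, A, M). The axes are 10 IoU thresholds, 101 recall thresholds, K classes (positions in `catIds`, not category ids), 4 area ranges and 3 `maxDets` settings, with -1 marking settings that have no data. `[0, :, 0, 0, -1]` therefore means $T_{IoU} = 0.5$, all recall points, the first class, area "all", 100 detections. The helpers below make that explicit. `iou_index` and `single_class_ap` are hypothetical names, and the constants assume COCO's default `Params`. Like COCO's own summary, the sketch drops -1 entries before averaging.

```python
import numpy as np

# COCO's default evaluation parameters (pycocotools Params for bbox / segm)
IOU_THRS = np.linspace(0.5, 0.95, 10)            # T = 10
REC_THRS = np.linspace(0.0, 1.0, 101)            # R = 101
AREA_LBLS = ["all", "small", "medium", "large"]  # A = 4
MAX_DETS = [1, 10, 100]                          # M = 3


def iou_index(t_iou):
    """Position of T_IoU along the first axis of eval['precision']."""
    matches = np.flatnonzero(np.isclose(IOU_THRS, t_iou))
    if matches.size == 0:
        raise ValueError(f"{t_iou} is not one of COCO's IoU thresholds")
    return int(matches[0])


def single_class_ap(precision, t_iou, class_idx, area="all", max_dets=100):
    """AP as the mean over the 101 recall points, skipping -1 (missing) entries."""
    p = precision[iou_index(t_iou), :, class_idx,
                  AREA_LBLS.index(area), MAX_DETS.index(max_dets)]
    p = p[p > -1]
    return float(np.mean(p)) if p.size else float("nan")


print(iou_index(0.5), iou_index(0.75))  # the indices used as [0, ...] and [5, ...] in this notebook
```

Under these assumptions, `single_class_ap(dr.coco.eval["precision"], 0.5, 0)` mirrors the expression in In [3].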

In [4]:
dr.average_precision('person')

0.8071831344949841
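The values in that precision array come from COCO's 101-point interpolation in `COCOeval.accumulate`. Precision is first made non-increasing in recall (a running maximum from the right). Then, for each recall threshold $r \in \{0, 0.01, \dots, 1\}$, the precision is taken at the first operating point whose recall reaches $r$, or 0 if $r$ is never reached. AP is the mean of these 101 values. The sketch below re-implements that step for checking `average_precision` by hand. `coco_ap` is a hypothetical helper. It assumes the cumulative counts are ordered by descending score and that `num_gt` (the number of non-ignored GT boxes) is positive.

```python
import numpy as np


def coco_ap(tp_sum, fp_sum, num_gt, rec_thrs=np.linspace(0.0, 1.0, 101)):
    """101-point interpolated AP from cumulative TP / FP counts (detections sorted by score)."""
    tp_sum = np.asarray(tp_sum, dtype=float)
    fp_sum = np.asarray(fp_sum, dtype=float)
    recall = tp_sum / num_gt
    precision = tp_sum / (tp_sum + fp_sum + np.spacing(1))
    # make precision non-increasing in recall (running max from the right)
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # for each recall threshold, the first operating point whose recall reaches it
    idx = np.searchsorted(recall, rec_thrs, side="left")
    q = np.zeros(len(rec_thrs))
    ok = idx < len(precision)
    q[ok] = precision[idx[ok]]  # thresholds never reached contribute 0
    return float(q.mean())


# detections in score order: TP, FP, TP, with 3 GT boxes in total
print(coco_ap([1, 1, 2], [0, 1, 1], num_gt=3))
```

In the example, recall thresholds up to 1/3 get precision 1, those up to 2/3 get 2/3, and the rest get 0, giving 56/101.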

mAP.5 matches COCO's value up to floating-point rounding:

In [5]:
dr.mean_average_precision()

0.6025011979805461

In [6]:
np.mean(dr.coco.eval["precision"][0, :, :, 0, -1])

0.602501197980546
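The two numbers differ only in the last digit. Assuming `mean_average_precision` averages the per-class APs, this is expected. Every class contributes the same 101 recall points, so the mean of the per-class means equals one flat mean over the (R, K) slice, but the summation order differs and floating-point addition is not associative. The check below uses random stand-in data, not the real precision array. One caveat: if a class had no GT, its slice would hold -1 entries, which a plain `np.mean` would include while COCO's summary drops them.

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for eval["precision"][t, :, :, 0, -1]: 101 recall points x 80 classes
p = rng.random((101, 80))

map_flat = np.mean(p)  # one mean over all R*K entries
map_per_class = np.mean([np.mean(p[:, k]) for k in range(p.shape[1])])  # mean of per-class APs

# same quantity summed in a different order: equal up to rounding, not necessarily bit-for-bit
print(map_flat, map_per_class, np.isclose(map_flat, map_per_class))
```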

mAP.75, however, does not match COCO's value (0.4319 vs. 0.4268):

In [7]:
np.mean(dr.coco.eval["precision"][5, :, :, 0, -1])

0.4318813315869904

In [8]:
dr.mean_average_precision(0.75)

0.4268057493526962

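Why? Probably because some detections we count as FPs are matched to "crowd" GT regions by COCO, which then ignores them instead of counting them as FPs...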
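To make the hypothesis concrete, here is a simplified sketch of the greedy matching that pycocotools' `COCOeval.evaluateImg` runs per image and class. It is reduced to a single threshold, and crowd GTs are the only "ignore" GTs (no area-range filtering). Detections are visited in descending score order and regular GTs before crowd ones. A regular GT can be matched once; a crowd GT can absorb any number of detections, and a detection matched to a crowd GT is ignored. A detection whose IoU with its regular GT drops below 0.75 can therefore still reach 0.75 against a crowd region and be ignored rather than counted as an FP. `match_one_image` is a hypothetical name, not an `evaldets` function, and the IoU values are made up.

```python
import numpy as np


def match_one_image(ious, gt_crowd, t):
    """Greedy COCO-style matching for one image, one class, one IoU threshold.

    ious: (D, G) array, rows = detections already sorted by descending score.
    gt_crowd: length-G booleans; crowd GTs play the role of COCO's "ignore" GTs.
    Returns one label per detection: 'tp', 'fp' or 'ignored'.
    """
    ious = np.asarray(ious, dtype=float)
    gt_crowd = np.asarray(gt_crowd, dtype=bool)
    # COCO visits regular GTs before ignored (crowd) ones
    order = np.argsort(gt_crowd, kind="stable")
    ious, gt_crowd = ious[:, order], gt_crowd[order]
    gt_taken = np.zeros(len(gt_crowd), dtype=bool)
    labels = []
    for d in range(ious.shape[0]):
        best_iou, m = min(t, 1 - 1e-10), -1
        for g in range(len(gt_crowd)):
            # a regular GT can be matched only once; crowd GTs can absorb many detections
            if gt_taken[g] and not gt_crowd[g]:
                continue
            # already matched a regular GT: never fall back to a crowd GT
            if m > -1 and not gt_crowd[m] and gt_crowd[g]:
                break
            if ious[d, g] < best_iou:
                continue
            best_iou, m = ious[d, g], g
        if m == -1:
            labels.append("fp")
        elif gt_crowd[m]:
            labels.append("ignored")
        else:
            gt_taken[m] = True
            labels.append("tp")
    return labels


# one detection: IoU 0.6 with a regular GT, 0.8 with a crowd region
print(match_one_image([[0.6, 0.8]], [False, True], 0.5))   # matched to the regular GT
print(match_one_image([[0.6, 0.8]], [False, True], 0.75))  # too low for the GT, absorbed by the crowd
```

A matcher that skips the crowd fallback would label the second case an FP, which is the kind of discrepancy investigated below.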

In [9]:
# cumulative TP / FP counts for 'person' at T_IoU = 0.75: ours vs. COCO's
TP = dr._tp_sum('person', 0.75)
FP = dr._fp_sum('person', 0.75)
TP_coco = dr.coco.eval['tp_sums'][0][5]
FP_coco = dr.coco.eval['fp_sums'][0][5]
(TP == TP_coco).all()

False

In [10]:
TP

array([1.000e+00, 2.000e+00, 3.000e+00, ..., 7.053e+03, 7.053e+03,
       7.053e+03])

In [11]:
TP_coco

array([1.000e+00, 2.000e+00, 3.000e+00, ..., 7.176e+03, 7.176e+03,
       7.176e+03])

In [12]:
from evaldets.visualization import show_detection

In [13]:
dets = dr.all_detections_by_class('person')
# find the first detection where the cumulative TP counts diverge
for i, d in enumerate(dets):
    if TP[i] != TP_coco[i]:
        break
i, TP[i], TP_coco[i], d

(3961,
 3562.0,
 3563.0,
 {'image_id': 12670,
  'score': 0.8061718940734863,
  'iou': 0.9999568597729189,
  'gt_id': 900100012670,
  'category': 'person',
  'x': 330.01025390625,
  'y': 292.5989990234375,
  'w': 108.38092041015625,
  'h': 134.40679931640625})
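The loop above finds the first divergence, but if the arrays never differ it silently leaves `i` at the last index. A vectorized alternative is sketched below. `first_mismatch` is a hypothetical helper; it assumes both arrays have the same length, as the cumulative sums here do.

```python
import numpy as np


def first_mismatch(a, b):
    """Index of the first position where two equal-length arrays differ, or None if they match."""
    diff = np.flatnonzero(np.asarray(a) != np.asarray(b))
    return int(diff[0]) if diff.size else None


print(first_mismatch([1, 2, 3], [1, 2, 4]))
print(first_mismatch([1, 2], [1, 2]))
```

In this notebook, `first_mismatch(TP, TP_coco)` would stand in for the loop. Fetching the detection as `dets[i]` additionally assumes `all_detections_by_class` returns an indexable sequence.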

In [14]:
# show_detection(d)

In [15]:
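# COCO's per-image evaluation records for image 12670, category 1 (person);
# evalImgs holds None for (image, category) pairs with neither GT nor detections
eis = [x for x in dr.coco.evalImgs if x and x['image_id'] == 12670 and x['category_id'] == 1]
# len(eis), eis[0]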

In [16]:
dr.coco.eval['dtigs'][0][5][i]

False

In [17]:
# find the first detection where the cumulative FP counts diverge
for i, d in enumerate(dets):
    if FP[i] != FP_coco[i]:
        break
i, FP[i], FP_coco[i], d

(1968,
 87.0,
 86.0,
 {'image_id': 143961,
  'score': 0.9081220030784607,
  'iou': 0.5864054907152463,
  'gt_id': 1203308,
  'category': 'person',
  'x': 129.54359436035156,
  'y': 150.02687072753906,
  'w': 159.7234649658203,
  'h': 203.3919219970703})

In [18]:
# show_detection(d)

In [19]:
dr.coco.eval['dtigs'][0][5][i]

True
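An ignored detection changes neither count. In `COCOeval.accumulate`, pycocotools builds its cumulative sums from per-detection "matched" and "ignored" flags (a detection matched to a crowd GT has both set), counting TP = matched and not ignored, FP = not matched and not ignored. The `dtigs`, `tp_sums` and `fp_sums` entries are not part of the stock `COCOeval.eval` dict. Assuming they hold those ignore flags and cumulative sums, the sketch below shows why an ignored detection leaves both sums flat where our FP count grows by one. `cumulative_tp_fp` is a hypothetical helper.

```python
import numpy as np


def cumulative_tp_fp(matched, ignored):
    """Cumulative TP / FP counts in the style of pycocotools' COCOeval.accumulate.

    matched, ignored: per-detection booleans, detections sorted by descending score.
    Ignored detections add to neither count.
    """
    matched = np.asarray(matched, dtype=bool)
    ignored = np.asarray(ignored, dtype=bool)
    tps = matched & ~ignored
    fps = ~matched & ~ignored
    return np.cumsum(tps).astype(float), np.cumsum(fps).astype(float)


# 3rd detection matched a crowd GT: matched and ignored, so both sums stay flat there
tp, fp = cumulative_tp_fp([True, False, True, True], [False, False, True, False])
print(tp, fp)
```

Counting that third detection as an FP instead would give FP sums of 0, 1, 2, 2, one ahead of COCO's from that index on.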

Re-classifying TP / FP before computing mAP.75. First, the values before re-matching:

In [20]:
dr.mean_average_precision(t_iou=0.75)

0.4268057493526962

In [21]:
dr.average_precision('person', t_iou=0.75)

0.5574921347420048

In [22]:
dr.match_detections(5)

In [23]:
dr.mean_average_precision(t_iou=0.75)

0.4318813315869905

In [24]:
TP = dr._tp_sum('person', 0.75)
FP = dr._fp_sum('person', 0.75)
TP_coco = dr.coco.eval['tp_sums'][0][5]
FP_coco = dr.coco.eval['fp_sums'][0][5]
(TP == TP_coco).all()

True

In [25]:
np.mean(dr.coco.eval["precision"][5, :, 0, 0, -1])

0.5667977368947763

In [26]:
dr.average_precision('person', t_iou=0.75)

0.5667977368947763

In [27]:
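for name, idx in dr.names.items():
    cocoAveP = np.mean(dr.coco.eval["precision"][5, :, idx, 0, -1])  # COCO's AP at T_IoU = 0.75
    drAveP = dr.average_precision(name, t_iou=0.75)
    # compare with a tolerance: the two values may be accumulated in a different order
    if not np.isclose(cocoAveP, drAveP):
        print(f"Error for {idx} {name}: {cocoAveP=} {drAveP=}")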
