
mAP bug at higher --conf #1466

Closed
glenn-jocher opened this issue Nov 21, 2020 · 8 comments · Fixed by #1645 or #5567
Labels
bug Something isn't working

Comments

@glenn-jocher
Member

glenn-jocher commented Nov 21, 2020

A recent modification to the PR curve computation in pull request #1206 introduced a bug whereby mAP increases at higher --conf thresholds. This was caused by a change to the 'sentinel values' on the P and R vectors here:

    # Append sentinel values to beginning and end
    mrec = recall  # np.concatenate(([0.], recall, [recall[-1] + 1E-3]))
    mpre = precision  # np.concatenate(([0.], precision, [0.]))

The appropriate solution would be either to reinstate the old code, which drops the curves to zero after their last data point, or to interpolate them to zero at recall = 1. I'll experiment with both and implement a fix soon.
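A minimal numpy sketch of why removing the sentinels inflates AP (hypothetical helper name and example data, not the actual test.py code): np.interp clamps to its endpoint values, so without a terminal zero the precision envelope is held flat all the way out to recall = 1:

```python
import numpy as np

def ap_101(mrec, mpre):
    # Precision envelope (monotonically non-increasing), then 101-point
    # trapezoidal integration over recall in [0, 1] (COCO-style sketch)
    mpre = np.flip(np.maximum.accumulate(np.flip(mpre)))
    x = np.linspace(0, 1, 101)
    y = np.interp(x, mrec, mpre)
    return ((y[1:] + y[:-1]) / 2 * np.diff(x)).sum()

# A high --conf run: recall never exceeds 0.2
recall = np.array([0.1, 0.2])
precision = np.array([1.0, 0.9])

# Without sentinels, np.interp holds precision at 0.9 for recall > 0.2
ap_buggy = ap_101(recall, precision)                          # ~0.92

# With the old sentinels, the curve drops to zero past the last point
ap_fixed = ap_101(np.concatenate(([0.0], recall, [recall[-1] + 1e-3])),
                  np.concatenate(([0.0], precision, [0.0])))  # ~0.20
```

The flat extension credits the model with precision 0.9 over the entire unreached recall range, which is exactly the inflation seen at high --conf.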

This does not affect any operations using the default test.py --conf 0.001, so I would imagine almost no users would be impacted by this, but it needs fixing in any case.

@glenn-jocher glenn-jocher added the bug Something isn't working label Nov 21, 2020
@glenn-jocher
Member Author

A third option would be to extrapolate the curves to zero based on their last known derivatives. I think np.interp has an option for this baked in, which could be used in conjunction with np.clip(0, 1).

@glenn-jocher
Member Author

Update on this: np.interp does not have built-in extrapolation capability; we would need to move to scipy for that, so I think I will simply turn back the clock on the code updates introduced in PR #1206.
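For reference, a minimal sketch of the extrapolation route that was considered (example data, not the actual test.py code): since np.interp only clamps beyond its endpoints, the linear extrapolation from the last known derivative has to be done by hand (or via scipy.interpolate.interp1d with fill_value="extrapolate"):

```python
import numpy as np

# Example high-conf PR data: the curve ends at recall = 0.5
recall = np.array([0.1, 0.3, 0.5])
precision = np.array([0.95, 0.9, 0.8])

x = np.linspace(0, 1, 101)
p = np.interp(x, recall, precision)   # flat (clamped) beyond recall[-1]

# Extend using the last known derivative, then clamp into [0, 1]
slope = (precision[-1] - precision[-2]) / (recall[-1] - recall[-2])
beyond = x > recall[-1]
p[beyond] = np.clip(precision[-1] + slope * (x[beyond] - recall[-1]), 0.0, 1.0)
```

With this example the extrapolated precision at recall = 1 is 0.55 rather than zero, which shows why derivative-based extrapolation can still overstate the tail of a high-conf curve.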

@imyhxy
Contributor

imyhxy commented Oct 11, 2021

@glenn-jocher Hey man, the current yolov5 codebase has this problem again. Can you solve this?

@imyhxy
Contributor

imyhxy commented Oct 11, 2021

python val.py --weights weights/yolov5s.pt --data data/coco.yaml --verbose --name coco --conf 0.7
val: data=data/coco.yaml, weights=['weights/yolov5s.pt'], batch_size=32, imgsz=640, conf_thres=0.7, iou_thres=0.6, task=val, device=, single_cls=False, augment=False, verbose=True, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=coco, exist_ok=False, half=False
YOLOv5 🚀 gitlab-584-g6b4eb27 torch 1.9.0 CUDA:0 (NVIDIA GeForce RTX 2060, 5934MB)

Fusing layers... 
Model Summary: 224 layers, 7266973 parameters, 0 gradients
val: Scanning '11_mscoco/YOLO/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupted: 100%|██████████| 5000/5000 [00:00<?, ?it/s]
dataset: using NoneType
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|██████████| 157/157 [01:05<00:00,  2.39it/s]
                 all       5000      36335      0.902      0.238      0.572      0.461
              person       5000      10777      0.962      0.379      0.671      0.541
             bicycle       5000        314       0.98      0.153      0.568      0.432
                 car       5000       1918      0.918      0.276      0.599      0.485
          motorcycle       5000        367      0.953      0.166      0.561       0.45
            airplane       5000        143      0.984      0.434      0.711      0.621
                 bus       5000        283       0.95       0.47      0.718      0.627
               train       5000        190      0.986      0.363      0.674      0.564
               truck       5000        414      0.933     0.0676      0.502      0.395
                boat       5000        424       0.93     0.0943      0.514      0.371
       traffic light       5000        634      0.932      0.151      0.543      0.373
        fire hydrant       5000        101      0.983      0.564      0.778      0.658
           stop sign       5000         75      0.956      0.573      0.773      0.714
       parking meter       5000         60          1      0.383      0.692      0.553
               bench       5000        411      0.932     0.0998      0.519      0.419
                bird       5000        427      0.929      0.215      0.576      0.446
                 cat       5000        202      0.925      0.243      0.582      0.461
                 dog       5000        218      0.911       0.33      0.629      0.548
               horse       5000        272      0.942      0.474      0.706      0.579
               sheep       5000        354      0.873       0.37      0.628      0.512
                 cow       5000        372       0.92      0.341      0.639      0.528
            elephant       5000        252      0.877      0.536       0.68      0.551
                bear       5000         71      0.921      0.493      0.713      0.632
               zebra       5000        266      0.974       0.56      0.773      0.642
             giraffe       5000        232      0.973      0.612      0.798      0.678
            backpack       5000        371          1     0.0135      0.507      0.325
            umbrella       5000        407       0.89      0.179      0.536       0.43
             handbag       5000        540          1    0.00741      0.504      0.416
                 tie       5000        252      0.957      0.179      0.568      0.425
            suitcase       5000        299      0.955       0.14      0.547      0.451
             frisbee       5000        115      0.922      0.617      0.787      0.639
                skis       5000        241      0.952      0.083      0.518       0.39
           snowboard       5000         69      0.857      0.087      0.474      0.349
         sports ball       5000        260       0.91       0.35      0.634      0.504
                kite       5000        327      0.907      0.269      0.588      0.462
        baseball bat       5000        145          1      0.159      0.579      0.372
      baseball glove       5000        148      0.929      0.351      0.646      0.434
          skateboard       5000        179      0.915       0.48      0.711      0.533
           surfboard       5000        267       0.94      0.176       0.56      0.424
       tennis racket       5000        225      0.948      0.404       0.68      0.467
              bottle       5000       1013      0.935      0.155      0.546      0.444
          wine glass       5000        341      0.938      0.223       0.58      0.467
                 cup       5000        895      0.893      0.225       0.56      0.479
                fork       5000        215      0.952      0.093      0.523      0.425
               knife       5000        325      0.857     0.0185      0.439      0.366
               spoon       5000        253       0.75     0.0119      0.381       0.33
                bowl       5000        623      0.939      0.172      0.554      0.474
              banana       5000        370      0.909     0.0541      0.481      0.386
               apple       5000        236      0.727     0.0339      0.377      0.272
            sandwich       5000        177      0.778     0.0791      0.427      0.324
              orange       5000        285      0.774     0.0842      0.435      0.407
            broccoli       5000        312      0.929     0.0417      0.484      0.357
              carrot       5000        365      0.833     0.0274      0.432      0.359
             hot dog       5000        125          1      0.136      0.568      0.457
               pizza       5000        284       0.93      0.327      0.628      0.512
               donut       5000        328      0.832      0.241      0.538      0.489
                cake       5000        310      0.875      0.113      0.498       0.39
               chair       5000       1771      0.933      0.118      0.527       0.43
               couch       5000        261      0.914      0.203      0.559      0.475
        potted plant       5000        342      0.919     0.0994      0.507      0.371
                 bed       5000        163          1      0.092      0.546      0.413
        dining table       5000        695          1    0.00576      0.503      0.328
              toilet       5000        179      0.956       0.48      0.717      0.613
                  tv       5000        288      0.984      0.441      0.714      0.569
              laptop       5000        231      0.929      0.394       0.67      0.582
               mouse       5000        106      0.906      0.547       0.74       0.61
              remote       5000        283      0.846     0.0777      0.463      0.382
            keyboard       5000        153      0.912       0.34      0.639      0.507
          cell phone       5000        262      0.847      0.191      0.525      0.431
           microwave       5000         55      0.917        0.4      0.665      0.548
                oven       5000        143      0.897      0.182      0.544      0.427
             toaster       5000          9          0          0          0          0
                sink       5000        225      0.895      0.227      0.565       0.46
        refrigerator       5000        126      0.973      0.286      0.631      0.531
                book       5000       1129          1    0.00266      0.501      0.468
               clock       5000        267      0.967      0.547      0.762       0.57
                vase       5000        274      0.849      0.226      0.529      0.414
            scissors       5000         36          1     0.0833      0.542      0.488
          teddy bear       5000        190      0.974        0.2      0.588      0.498
          hair drier       5000         11          0          0          0          0
          toothbrush       5000         57          1     0.0175      0.509      0.458
Speed: 0.2ms pre-process, 4.7ms inference, 0.5ms NMS per image at shape (32, 3, 640, 640)

Evaluating pycocotools mAP... saving runs/val/coco.03/yolov5s_predictions.json...
loading annotations into memory...
Done (t=0.52s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.03s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=8.40s).
Accumulating evaluation results...
DONE (t=1.68s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.183
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.236
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.208
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.055
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.229
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.269
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.151
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.194
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.194
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.055
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.241
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.290
Results saved to runs/val/coco.03

Process finished with exit code 0

@glenn-jocher
Member Author

@imyhxy thanks for raising this issue again! I'll add a TODO to investigate.

TODO: investigate higher mAP at higher --conf bug in val.py, possibly related to curve extrapolation towards the (1, 0) x,y point

@glenn-jocher
Member Author

glenn-jocher commented Oct 13, 2021

@imyhxy this is associated with extrapolation of the PR curve in #4563, which brought us into alignment with the Detectron2 and MMDetection mAP computations. Before this PR the curve fell to zero at the last data point (no matter where that was on the x axis), but PR #4563 updated this to connect the last point linearly to (1, 0). Higher confidence thresholds lack data on the right side of the curve, so the extrapolation error is greater:

-    mrec = np.concatenate(([0.], recall, [recall[-1] + 0.01]))
-    mpre = np.concatenate(([1.], precision, [0.]))
+    mrec = np.concatenate(([0.0], recall, [1.0]))
+    mpre = np.concatenate(([1.0], precision, [0.0]))
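The effect is easy to reproduce with a small numpy sketch (hypothetical helper name and example data): when the curve ends early, connecting its last point linearly to (1, 0) adds a large triangle of area that the old drop-to-zero sentinels did not:

```python
import numpy as np

def ap_101(mrec, mpre):
    # Precision envelope + 101-point trapezoidal integration (sketch)
    mpre = np.flip(np.maximum.accumulate(np.flip(mpre)))
    x = np.linspace(0, 1, 101)
    y = np.interp(x, mrec, mpre)
    return ((y[1:] + y[:-1]) / 2 * np.diff(x)).sum()

# High-conf run: recall never exceeds 0.2
recall = np.array([0.1, 0.2])
precision = np.array([1.0, 0.9])

# Old sentinels: drop to zero just past the last data point
ap_old = ap_101(np.concatenate(([0.0], recall, [recall[-1] + 0.01])),
                np.concatenate(([1.0], precision, [0.0])))    # ~0.20

# PR #4563 sentinels: connect the last point linearly to (1, 0)
ap_new = ap_101(np.concatenate(([0.0], recall, [1.0])),
                np.concatenate(([1.0], precision, [0.0])))    # ~0.56
```

The shorter the measured curve (i.e. the higher --conf is), the larger the triangle under the linear connection to (1, 0), which matches the inflated per-class mAP values reported above.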

--conf 0.001

!python val.py --weights yolov5m.pt --data coco.yaml --img 640 --iou 0.65 --half --conf 0.001

val: data=/content/yolov5/data/coco.yaml, weights=['yolov5m.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.65, task=val, device=, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True
YOLOv5 🚀 v6.0-3-g20a809d torch 1.9.0+cu111 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)

Fusing layers... 
Model Summary: 290 layers, 21172173 parameters, 0 gradients
val: Scanning '../datasets/coco/val2017' images and labels...4952 found, 48 missing, 0 empty, 0 corrupted: 100% 5000/5000 [00:01<00:00, 2837.14it/s]
val: New cache created: ../datasets/coco/val2017.cache
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [01:19<00:00,  1.99it/s]
                 all       5000      36335       0.71      0.582      0.633      0.439
Speed: 0.1ms pre-process, 7.8ms inference, 1.7ms NMS per image at shape (32, 3, 640, 640)

Evaluating pycocotools mAP... saving runs/val/exp/yolov5m_predictions.json...
loading annotations into memory...
Done (t=0.89s)
creating index...
index created!
Loading and preparing results...
DONE (t=5.89s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=89.06s).
Accumulating evaluation results...
DONE (t=15.01s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.452
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.639
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.492
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.280
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.506
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.576
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.354
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.586
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.641
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.467
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.703
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.784
Results saved to runs/val/exp

(PR curve plot, --conf 0.001)

--conf 0.500

!python val.py --weights yolov5m.pt --data coco.yaml --img 640 --iou 0.65 --half --conf 0.5

val: data=/content/yolov5/data/coco.yaml, weights=['yolov5m.pt'], batch_size=32, imgsz=640, conf_thres=0.5, iou_thres=0.65, task=val, device=, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True
YOLOv5 🚀 v6.0-3-g20a809d torch 1.9.0+cu111 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)

Fusing layers... 
Model Summary: 290 layers, 21172173 parameters, 0 gradients
val: Scanning '../datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupted: 100% 5000/5000 [00:00<?, ?it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [01:08<00:00,  2.31it/s]
                 all       5000      36335      0.811      0.499      0.667      0.527
Speed: 0.1ms pre-process, 7.8ms inference, 1.0ms NMS per image at shape (32, 3, 640, 640)

Evaluating pycocotools mAP... saving runs/val/exp2/yolov5m_predictions.json...
loading annotations into memory...
Done (t=0.83s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.21s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=12.85s).
Accumulating evaluation results...
DONE (t=1.96s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.358
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.474
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.397
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.173
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.413
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.497
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.283
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.394
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.397
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.185
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.453
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.558
Results saved to runs/val/exp2

(PR curve plot, --conf 0.5)

glenn-jocher added a commit that referenced this issue Nov 8, 2021
Partially addresses invalid mAPs at higher confidence threshold issue #1466.
@glenn-jocher glenn-jocher linked a pull request Nov 8, 2021 that will close this issue
@glenn-jocher glenn-jocher removed the TODO label Nov 8, 2021
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this issue Aug 26, 2022
Partially addresses invalid mAPs at higher confidence threshold issue ultralytics#1466.
@smohan-ambarella

Is there a plan for fixing this issue? The latest code on master still shows this warning.

@glenn-jocher
Member Author

@smohan-ambarella there is no bug. If you don't want to be warned, don't modify arguments.

SecretStar112 added a commit to SecretStar112/yolov5 that referenced this issue May 24, 2023
Partially addresses invalid mAPs at higher confidence threshold issue ultralytics/yolov5#1466.