Evaluation metrics implementation VS pycocotools #13363

Closed · 1 task done
aliencaocao opened this issue Jun 4, 2024 · 5 comments
Labels: question (Further information is requested)
@aliencaocao

Search before asking

Question

The mAP50 and mAP50:95 results I get from running pycocotools and Ultralytics' built-in model.val() are very different, and while they correlate most of the time, this is not always the case. What am I missing here?
I need a fair comparison against other models that use pycocotools for evaluation.

Additional

For pycocotools one would get a full printout like:

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.197
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.574
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.073
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.184
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.256
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.100
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.289
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.289
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.296
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.264

Is there any way I can reproduce this with Ultralytics' implementation?
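
For reference, the printout above comes from a standard COCOeval run along these lines (the file paths are placeholders for my own ground-truth and detection files):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('instances_val.json')          # ground-truth annotations in COCO format
coco_dt = coco_gt.loadRes('detections.json')  # model detections in COCO results format
coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()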

@glenn-jocher (Member)

Hello,

Thank you for reaching out with your query regarding the differences in evaluation metrics between pycocotools and Ultralytics' built-in model.val() method.

The discrepancies you're observing might be due to several factors, including differences in the IoU thresholds, area ranges, and maximum detections (maxDets) settings used during evaluation. Ultralytics' model.val() method and pycocotools might not use identical default settings for these parameters.

To align the evaluation metrics more closely with those provided by pycocotools, you can adjust the IoU thresholds and other relevant parameters in the YOLOv8 validation configuration to match those used by pycocotools. This should help in achieving a fairer comparison between different models.

If you need specific guidance on how to adjust these settings or further assistance, please feel free to ask!

@aliencaocao (Author) commented Jun 4, 2024

Yes, what are the parameters used by Ultralytics, and how can I replicate them with pycocotools in another repo?

Or, if it's easier, how do I replicate pycocotools by changing the Ultralytics implementation?

@glenn-jocher (Member)

Hello!

To align the evaluation metrics between Ultralytics and pycocotools, you can adjust the parameters in Ultralytics' validation settings to match those typically used by pycocotools. Here are the key parameters you might consider:

  1. IoU Thresholds: pycocotools often uses a range of IoU thresholds from 0.50 to 0.95 for calculating average precision. Ensure your Ultralytics validation settings use the same range.

  2. Area Ranges: pycocotools categorizes object detections into small, medium, and large based on their area. You can specify similar categorizations in Ultralytics settings if not already set.

  3. Max Detections (maxDets): This parameter is crucial for calculating average recall. Set this to the same values used in pycocotools (e.g., 1, 10, 100).

To adjust these settings in Ultralytics, you can modify the validation configuration file or pass these parameters directly through the CLI or Python API. For example:

model.val(data='dataset.yaml', imgsz=640, conf=0.25, iou=0.6, max_det=100)

This should help you achieve comparable evaluation metrics between the two tools. If you need more specific adjustments or further assistance, please let me know! 🚀
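
If you would rather score the Ultralytics model with pycocotools itself, one option (a sketch with placeholder names, not an exact drop-in) is to export COCO-format predictions during validation via save_json=True and feed them to the same COCOeval calls you ran above:

from ultralytics import YOLO

model = YOLO('yolov8n.pt')  # placeholder weights; use your trained checkpoint
# save_json=True writes COCO-format detections (predictions.json) into the validation run directory
model.val(data='dataset.yaml', imgsz=640, save_json=True)

The resulting predictions.json can then be loaded with loadRes on your ground-truth COCO object and summarized with COCOeval. Note that the image IDs in predictions.json need to match the IDs in your ground-truth JSON for the evaluation to line up.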

@aliencaocao (Author)

Thank you.

@glenn-jocher (Member)

Hello,

Thank you for reaching out! To help us investigate the issue effectively, could you please provide a minimum reproducible code example? This will allow us to replicate the problem on our end and work towards a solution. You can find guidelines on how to create a minimum reproducible example here.

Additionally, please ensure that you are using the latest versions of torch and ultralytics. If you haven't already, you can upgrade your packages with the following commands:

pip install --upgrade torch
pip install --upgrade ultralytics

Once you've updated your packages and provided the reproducible code, we'll be able to dive deeper into the issue. If you have any other questions or need further assistance, feel free to ask! 😊
