Performance metrics per class for comparison #5880
Comments
@KristofferK metrics are automatically displayed per class. No action is required.
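For reference, a hedged example of requesting the per-class table explicitly: the `--verbose` flag asks val.py to report P, R and mAP broken down by class in addition to the "all" row. The weights path, data file name and image size below are illustrative placeholders.

```bash
# Hedged example: --verbose prints a per-class breakdown of P, R and mAP
# in addition to the "all" row. Paths and image size are placeholders.
python val.py --weights runs/train/exp/weights/best.pt --data data_config.yaml --img 640 --verbose
```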
Thank you for the very quick response, Glenn! ~~When I use val.py, I do not seem to be able to use "--task test". Do you have any idea how to fix this? It works with "--task val", "--task study", etc. I am on the latest commit.~~ (Fixed: I did not have a test split defined in data_config.yaml.) I still have a problem with the metrics though, as you will see further down.
Also, even when I use "--task val", the precision from this run is far lower than the precision (and the other metrics as well) that I saw in my results.csv during training. How come? With --task val, the "all classes" row gives me 0.35 P, 0.675 R, 0.411 mAP@.5. In results.csv, the last row (almost the best one; some earlier rows are slightly better) has 0.99754 P, 0.98379 R, 0.98431 mAP@.5.
This seems like a huge difference. I assumed the metrics in results.csv were for the validation set. Is this not the case? Is it for the training set? Wouldn't it be overfitting to use the training set here? Thanks in advance.
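As an aside, a minimal sketch for pulling those numbers out of results.csv to compare against the val.py printout. The path and column names are assumptions based on a typical YOLOv5 training run and may differ between versions.

```python
# Minimal sketch for inspecting training-time metrics, assuming a standard
# YOLOv5 results.csv; the column names below are typical but may vary by version.
import pandas as pd

df = pd.read_csv("runs/train/exp/results.csv")
df.columns = [c.strip() for c in df.columns]  # YOLOv5 pads column names with spaces

last = df.iloc[-1]  # final epoch
print(f"P={last['metrics/precision']:.3f}  "
      f"R={last['metrics/recall']:.3f}  "
      f"mAP@.5={last['metrics/mAP_0.5']:.3f}")
```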
@KristofferK the --task argument lets you specify any split; the default is val.
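For illustration, a hedged sketch of selecting splits. The weights path is a placeholder, and evaluating the test split assumes data_config.yaml defines a test: entry, which is exactly what was missing above.

```bash
# Hedged examples; --task test requires a "test:" entry in the dataset YAML
python val.py --weights runs/train/exp/weights/best.pt --data data_config.yaml --task val
python val.py --weights runs/train/exp/weights/best.pt --data data_config.yaml --task test
```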
Thanks again. I solved the --task issue by adding "test" to my data_config.yaml. As for the metrics, it seems to be a matter of toggling --conf-thres and --iou-thres, since when I run test.py I get very good results, but val.py does not reflect these good results. I will experiment with it further. Thank you for the quick responses.
I would however still like to know whether results.csv reflects the training set or the validation set. Would it not be overfitting to select a model based on training-set results? It's just that the metrics in my results.csv differ a lot from those I get from val.py. Should I rather create a new issue for this?
@KristofferK there is no test.py. Training always runs on the validation set; this is standard practice in any ML workflow. You can browse the code here: Lines 352 to 367 in 7bf04d9
@glenn-jocher That is my mistake. I meant that when I run train.py, the metrics in results.csv are very good. When I use detect.py, I get good annotations. When I use val.py, I do not get good metrics, not even with --task val. I have included some results; hopefully they make sense.
But when I use val.py, I do not get the same results at all. This is with manually set confidence:
This is with default confidence flags:
Am I missing a flag to make val.py give me the same good results I see in results.csv? When I use detect.py it does give me good annotations, and those annotations have high confidence (0.95+ in general), which is why I don't understand the low precision and recall when I use val.py. In case it matters, this is how I trained the model.
@KristofferK I don't know what you're asking. Your metrics are your metrics; there's nothing for me to do here. You obtain metrics on your dataset by running val.py with the same settings as train.py, i.e.
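A hedged sketch of what "the same settings" could look like in practice, using the data file name from this thread and illustrative values for everything else:

```bash
# Hedged sketch: evaluate with the same --data and --img used for training
# (paths, image size and base weights are illustrative, not the thread's actual values)
python train.py --data data_config.yaml --img 640 --weights yolov5s.pt
python val.py   --data data_config.yaml --img 640 --weights runs/train/exp/weights/best.pt
```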
That's it.
But shouldn't the metrics from val.py match the metrics from train.py? If train.py reports metrics for the validation set and val.py reports metrics for the validation set (--task val or no --task at all), then surely these should be the same?
@KristofferK the metrics reported during training after each epoch (P, R, mAP) come from the last.pt model at that point in time. They will differ when you run val.py with best.pt, because last.pt is not the same checkpoint as best.pt.
I believe that would make sense if the metrics I got from val.py were better. But when I use val.py with best.pt, the metrics are worse than those reported for last.pt during training.
Maybe your conf-thres and iou-thres are different.
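To rule both suggestions in or out, a hedged example that evaluates last.pt with explicit thresholds. The paths and the threshold values are assumptions; in particular, the default confidence thresholds of val.py and detect.py (roughly 0.001 vs 0.25) may differ between YOLOv5 versions.

```bash
# Hedged example: evaluate the checkpoint the training loop actually used (last.pt)
# and pin the thresholds explicitly, since val.py and detect.py use different defaults.
python val.py --weights runs/train/exp/weights/last.pt --data data_config.yaml \
              --conf-thres 0.001 --iou-thres 0.6
```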
@KristofferK 👋 Hi, thanks for letting us know about this possible problem with YOLOv5 🚀. We've created a few short guidelines below to help users provide what we need in order to get started investigating a possible problem.

How to create a Minimal, Reproducible Example

When asking a question, people will be better able to provide help if you provide code that they can easily understand and use to reproduce the problem. This is referred to by community members as creating a minimum reproducible example. Your code that reproduces the problem should be:
For Ultralytics to provide assistance, your code should also be:
If you believe your problem meets all the above criteria, please close this issue and raise a new one using the 🐛 Bug Report template with a minimum reproducible example to help us better understand and diagnose your problem. Thank you! 😃
It seems to be because train.py uses --single-cls for its reported metrics (despite single_cls being false in opt.yaml), while val.py does not use --single-cls by default. If I add --single-cls to my val.py arguments, it produces the same results I had from train.py, even though I did not use --single-cls for train.py. But will this not result in suboptimally trained models, if these models are trained as a single-class classifier? Or is single-cls only used when printing to the console and results.csv? (See the sketch below for the two runs I am comparing.)
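For anyone trying to reproduce this observation, a hedged sketch of the two runs being contrasted; the weights path is a placeholder.

```bash
# Hedged sketch of the comparison described above (weights path is a placeholder)
python val.py --weights runs/train/exp/weights/best.pt --data data_config.yaml               # multi-class (default)
python val.py --weights runs/train/exp/weights/best.pt --data data_config.yaml --single-cls  # reportedly matches the train.py printout
```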
@KristofferK
@glenn-jocher But even when I do not specify --single-cls during training (train.py), the printed metrics (in the terminal and in results.csv) seem to be metrics for a single-class classifier, i.e. they ignore whether the class was correctly predicted. This must be the case, since when I run val.py with --single-cls I get the same metrics as I get from train.py without --single-cls. So the metrics printed during training are single-class metrics, even though we are training a multi-class model?
@KristofferK I have no idea what you are asking. If you are not trying to force a dataset into single-class mode, there is no reason to use this argument. Train and Val operate correctly. If you have a reproducible bug, submit a bug report with code to reproduce it.
Search before asking
Question
Hello,
Is it possible to easily compare the performance on different classes in a multiclass model? I've trained it on a custom dataset with 4 classes. Let's say Class A, B, C, and D.
How would I see e.g. the precision, recall and mAP per class? This is to see if my model is better at a certain class or if it is equally good amongst them all.
I suspect that my model is very good on class A, good on B and C, while it is mediocre at class D. This is what it looks like when I manually inspect the annotations on the test results, but I'd like actual performance metrics to support my hypothesis.
Thanks in advance.
Additional
No response