Keypoint loss too relaxed for custom number of keypoints #2543
Thank you for reaching out and providing such detailed feedback on the YOLOv8 pose/keypoint training script, @davyneven.
@davyneven Thanks for the nice suggestions! I actually tested a facial landmark dataset, WiderFace, to verify the generality of the loss function, and after a few epochs the predictions looked good. Since WiderFace does not provide keypoint annotations in its val set, I don't really know what the metrics look like.
I can also confirm this observation (I actually wanted to ask a different question on this, but since the issue is already open). I think it makes sense that box mAP converges much faster, but I saw the same thing with the keypoints converging much later: trained for 300 epochs, box mAP was at 0.98/0.99 as of epoch 10, while the keypoint metrics were barely above 0 until after epoch 100.
@CalinLucian thank you for bringing this up! It is true that the bounding box mAP metric in YOLOv8 tends to converge faster than the keypoint metric in some cases. This is due to several factors, such as the complexity of the task, the amount and quality of the data, and the learning rate. Since the keypoint task is usually more complex than the bounding box task, it might require more data and longer training to converge. Additionally, the learning rate might need to be adjusted to stabilize the training process and help the keypoint metric converge faster.

In any case, it is important to keep monitoring both metrics during training to ensure that the model is performing well on both tasks. A good strategy is to use a combination of loss functions that can balance the importance of the two tasks, and to adjust the learning rate and other hyperparameters based on the feedback from the metrics. We appreciate your feedback and will consider ways to improve the training process for keypoint tasks in future updates.
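The "combination of loss functions" idea above amounts to a weighted multi-task total. A minimal sketch follows; the function name is illustrative, but the default gains mirror the `box=7.5` and `pose=12.0` hyperparameters that Ultralytics exposes for training:

```python
def total_loss(box_loss, pose_loss, box_gain=7.5, pose_gain=12.0):
    # Scale each task's loss by its gain so neither task dominates the
    # gradient; tuning these gains rebalances box vs. keypoint learning.
    return box_gain * box_loss + pose_gain * pose_loss

print(total_loss(0.2, 0.1))  # 7.5*0.2 + 12.0*0.1 = 2.7
```

Raising `pose_gain` (or lowering `box_gain`) shifts optimization pressure toward the keypoints when the box branch has already converged.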
@glenn-jocher Can I use a custom metric (such as Pose R/Pose P) as the optimizing metric when saving the best model for pose? Otherwise, I think the default is mAP50-95 for the bbox, which is not helpful given the different points in time at which the metrics converge (e.g. at epoch 10, mAP could be 0.954 while Pose R is 0.0123; at epoch 60, mAP could be 0.953 while Pose R is 0.63, but with, say, patience=50 (imagine mAP kept hovering around that value), training stops and the best model is loaded from epoch 10). In addition, how can I use a custom loss function? Thank you kindly for your time.
Thank you for your question, @TimbusCalin. In YOLOv8 you can use a custom metric as the optimizing metric when saving the best model for the keypoint/pose task. You can define this metric as a function that takes the predicted keypoints and the ground-truth keypoints as input and outputs a scalar value measuring the accuracy of the model, then use that function in place of the default fitness criterion when selecting the best checkpoint. Regarding custom loss functions, you can define your own loss function that takes the model predictions and the ground-truth annotations as input and outputs a scalar error value, and use it during training. To implement these features in YOLOv8, you can modify the relevant parts of the code, such as the training script and the model architecture, and refer to the PyTorch documentation on defining custom metrics and loss functions for more details. I hope this answers your questions. If you have any further questions or suggestions, please don't hesitate to reach out to the Ultralytics team or the YOLOv8 community.
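To make the "custom fitness" idea concrete, here is a small sketch of best-epoch selection under a pose-weighted fitness, using the numbers from the question above. This is not an actual Ultralytics API; the weights and record layout are made up for illustration:

```python
# Hypothetical per-epoch validation records (values from the example above)
history = [
    {"epoch": 10, "box_map": 0.954, "pose_r": 0.0123},
    {"epoch": 60, "box_map": 0.953, "pose_r": 0.63},
]

def fitness(m):
    # Assumed weighting: favor Pose recall heavily over box mAP50-95,
    # so the saved "best" checkpoint tracks keypoint quality.
    return 0.9 * m["pose_r"] + 0.1 * m["box_map"]

best = max(history, key=fitness)
print(best["epoch"])  # epoch 60 wins despite near-identical box mAP
```

With the default box-mAP fitness, epoch 10 would win by a hair; the pose-weighted fitness picks epoch 60 instead.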
Hi @davyneven, what does this mean: "(and lowering the pose weight to 5 instead of 12)"?
Hi @LiangYong1216, thanks for your question. |
@LiangYong1216 indeed, as @glenn-jocher indicates, you can set pose=5 in the train function. However, this is only necessary if you also change the loss function, as I mentioned above; if you don't change the loss function, there is no need to lower the pose weight.
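For reference, the pose loss gain can be lowered from the CLI as well. A sketch, with `coco8-pose.yaml` standing in for your own dataset config:

```shell
# Train a pose model with the pose loss gain lowered from its default 12.0 to 5.0
yolo pose train model=yolov8s-pose.pt data=coco8-pose.yaml epochs=100 pose=5.0
```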
Hello @davyneven @glenn-jocher, thanks for your prompt reply. I set pose=5 on my dataset, but it doesn't seem to work: after 30 epochs all P and R are 1 during training. The loss function uses your second scheme, and the model I use is yolov8s-pose.pt. Each image in my dataset has 52 keypoints, and the annotations are definitely correct. The model seems good, but there is still a certain gap compared to the labels. How can I improve it?
Hi @LiangYong1216, thank you for your question. One way to improve model performance is to add data augmentation during training, such as flipping, scaling, rotating, and shifting the images, to increase the variability of the dataset. Another approach is transfer learning: pretrain the model on a larger dataset similar to your own before fine-tuning it on your dataset, which can help the model learn more general features relevant to your task.

You can also try adjusting hyperparameters such as the learning rate, the batch size, and the number of layers, to see if this improves performance. Lastly, you can visualize the results by plotting the predicted keypoints on top of the input images and comparing them to the ground truth, to identify which keypoints the model may be struggling with. This can help you decide where to focus your efforts. I hope this helps! If you have any further questions or concerns, please don't hesitate to ask.
Back to standard L1 loss such as described in ultralytics#2543 for better keypoint detection
Search before asking
YOLOv8 Component
Training
Bug
I'm playing around with your pose/keypoint training script on custom keypoint datasets but stumbled upon a few issues.
I noticed that the keypoint metrics (mAP50/mAP50-95) are way too high when comparing the locations of predicted keypoints with the ground truth. I am getting an mAP50-95 close to 0.99, while visually the keypoints on the training set are still way off their targets.
By going through the codebase, I might have found what is causing this (and how to solve it):
First of all, I believe the sigmas for custom keypoints are not correctly initialized (too forgiving). Currently they are set to `torch.ones(nkpt, device=self.device) / nkpt`, where I believe you forgot to add a `/ 10`. For example, for nkpt = 2 this leads to a sigma of 0.5 for each keypoint, which is way too forgiving: given a bbox of 100 by 200 pixels, an offset of 20 pixels would still yield an OKS of 0.99.
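The arithmetic behind that 0.99 can be checked with a small sketch, assuming the COCO-style OKS formula exp(-d² / (2 · area · k²)) with k = 2σ (the exact form used inside the codebase may differ slightly):

```python
import math

def oks_per_keypoint(d, area, sigma):
    # COCO-style OKS for a single keypoint at pixel distance d from
    # the ground truth, for an object of the given bbox area
    k = 2 * sigma
    return math.exp(-d**2 / (2 * area * k**2))

area = 100 * 200   # bbox of 100 x 200 pixels
d = 20             # prediction is 20 px off target

print(oks_per_keypoint(d, area, 0.5))   # sigma = 1/nkpt for nkpt=2: ~0.990
print(oks_per_keypoint(d, area, 0.05))  # the tighter sigma proposed below: ~0.368
```

With sigma = 0.5 a 20-pixel miss is scored as near-perfect, which is exactly why the metric saturates early.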
This leads to two issues: the obvious one is that the metrics are way too optimistic, but more importantly, the loss is too relaxed, so the predicted keypoints no longer move once they are already in (not so) close proximity to the target keypoint.
One way to fix the metric would indeed be to lower the sigma value, to for example 0.05 (instead of adding a `/ 10`, I believe a fixed value for each keypoint would be better). However, as you are using the OKS metric as a loss function, this leads to very low gradients, since the derivative of the Gaussian is almost zero when the prediction is too far off, which is the case at the beginning of training. This might be addressed by better initialization, but that is too difficult to tune for any custom dataset. In my case, after 100 epochs, I still had an mAP50 of 0. I believe there are two options to solve this issue.
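The vanishing-gradient argument can be made explicit with the analytic derivative, assuming a Gaussian OKS-style loss 1 − exp(−d² / (2 · area · k²)) with k = 2σ (a sketch, not the actual YOLOv8 loss code):

```python
import math

def oks_loss_and_grad(d, area, sigma):
    # loss = 1 - exp(-e) with e = d^2 / (2 * area * k^2), k = 2 * sigma;
    # d(loss)/dd = exp(-e) * d / (area * k^2), which collapses to ~0
    # whenever exp(-e) has already saturated (i.e. the prediction is far off)
    k2 = (2 * sigma) ** 2
    e = d**2 / (2 * area * k2)
    return 1 - math.exp(-e), math.exp(-e) * d / (area * k2)

area = 100 * 200
for d in (20, 200):  # near-miss vs. far-off prediction, tight sigma = 0.05
    loss, grad = oks_loss_and_grad(d, area, 0.05)
    print(f"d={d}: loss={loss:.4f} grad={grad:.2e}")
```

With the tight sigma, a far-off prediction (d = 200) sits on the flat tail of the Gaussian: the loss is maximal but the gradient is effectively zero, so training stalls exactly when the keypoints most need to move.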
So, to cut my story short, I believe that to train a custom keypoint task the default OKS loss fails, and you have to resort to the standard L1/L2 loss, or a combination of the two.
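A minimal masked L1 keypoint loss in this spirit could look like the sketch below (plain Python for illustration; a real implementation would operate on batched tensors, and this is not the actual Ultralytics code):

```python
def l1_keypoint_loss(pred, gt, mask):
    # pred, gt: lists of (x, y) keypoint coordinates; mask: 1.0 if the
    # keypoint is labeled/visible, else 0.0 (so padded points add no loss).
    # Unlike the Gaussian OKS loss, the L1 gradient does not vanish for
    # far-off predictions, so keypoints keep moving early in training.
    num = sum(m * (abs(px - gx) + abs(py - gy))
              for (px, py), (gx, gy), m in zip(pred, gt, mask))
    return num / (sum(mask) + 1e-9)

pred = [(0.5, 0.5), (0.2, 0.2)]
gt   = [(0.4, 0.5), (0.0, 0.0)]
mask = [1.0, 0.0]                  # second keypoint not labeled
print(l1_keypoint_loss(pred, gt, mask))  # only the visible point counts: ~0.1
```

Combining this with the OKS term (e.g. a weighted sum) would keep OKS's scale invariance while borrowing L1's healthy gradients far from the target.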
You can download my custom (white board markers) dataset here: https://drive.google.com/file/d/1csR0EkzGb3EQ1MEEvlXfsyVF2pbqUUw9/view?usp=sharing
Environment
No response
Minimal Reproducible Example
No response
Additional
No response
Are you willing to submit a PR?