Keypoint loss too relaxed for custom number of keypoints #2543

Closed
davyneven opened this issue May 11, 2023 · 13 comments
Labels: bug (Something isn't working), Stale

Comments

@davyneven

Search before asking

  • I have searched the YOLOv8 issues and found no similar bug report.

YOLOv8 Component

Training

Bug

I'm playing around with your pose/keypoint training script on custom keypoint datasets but stumbled upon a few issues.

I noticed that the keypoint metrics (mAP50/mAP50-95) are way too high when comparing the locations of predicted keypoints with the ground truth. I am getting an mAP50-95 close to 99, while visually the keypoints on the training set are still way off their targets.

By going through the codebase, I might have found what is causing this (and how to solve it):

First of all, I believe the sigmas for custom keypoints are not correctly initialized (too forgiving). Currently they are set to
torch.ones(nkpt, device=self.device) / nkpt where I believe you forgot to add a / 10.

For example, for nkpt = 2 this leads to a sigma of 0.5 for each keypoint, which is far too forgiving: given a bbox of 100 by 200 pixels, an offset of 20 pixels would still yield an OKS of 0.99 ...
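
To make those numbers concrete, here is a quick standalone check using the cocoeval form of OKS, oks = exp(-d² / (2 · area · (2σ)²)); this is an illustrative sketch, not code from the repo:

import math

sigma = 0.5        # 1 / nkpt for nkpt = 2 under the current initialization
area = 100 * 200   # bbox area in pixels
d2 = 20 ** 2       # squared offset for a 20-pixel miss

e = d2 / ((2 * sigma) ** 2 * area * 2)
print(math.exp(-e))  # ~0.990: a 20 px miss barely dents the OKS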

This leads to two issues: the obvious one is that the metrics are far too optimistic, but more importantly, the loss is too relaxed, so the predicted keypoints stop moving once they are in (not so) close proximity to the target keypoint.

One way to fix the metric would indeed be to lower the sigma value, to for example 0.05 (rather than adding / 10, I believe a fixed value for each keypoint would be better). However, since the OKS metric is also used as the loss function, this leads to very low gradients: the derivative of the Gaussian is almost zero when the prediction is too far off, which is the case at the beginning of training. This might be addressed by better initialization, but that is too difficult to tune for an arbitrary custom dataset. In my case, after 100 epochs I still had an mAP50 of 0 ...
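
To illustrate the vanishing gradient, a small standalone sketch (sigma = 0.05 as proposed above; the 100 px offset is an assumed early-training miss):

import torch

sigma, area = 0.05, 100 * 200
d = torch.tensor(100.0, requires_grad=True)  # prediction 100 px off target

e = d ** 2 / ((2 * sigma) ** 2 * area * 2)   # cocoeval-style exponent
loss = 1 - torch.exp(-e)                     # OKS-style loss term
loss.backward()
print(loss.item(), d.grad.item())  # loss saturates at ~1.0, gradient ~7e-12: almost no learning signal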

I believe there are two options to solve this issue:

  1. Using a different loss function. I managed to get very good results using the standard L1-loss (and lowering the pose weight to 5 instead of 12):
import torch
import torch.nn as nn


class KeypointLoss(nn.Module):

    def __init__(self, sigmas) -> None:
        super().__init__()
        self.sigmas = sigmas  # kept for interface parity; unused by this L1 variant

    def forward(self, pred_kpts, gt_kpts, kpt_mask, area):
        # Plain L1 distance between predicted and ground-truth keypoints
        d = (pred_kpts[..., 0] - gt_kpts[..., 0]).abs() + (pred_kpts[..., 1] - gt_kpts[..., 1]).abs()
        # Rescale so masked-out (unlabeled) keypoints don't dilute the mean
        kpt_loss_factor = (torch.sum(kpt_mask != 0) + torch.sum(kpt_mask == 0)) / (torch.sum(kpt_mask != 0) + 1e-9)
        return kpt_loss_factor * (d * kpt_mask).mean()
  2. What also works is a combination of the two, which might be best, since you are then still directly optimizing the OKS metric (like how BCE and Dice losses are combined for semantic segmentation); see the usage sketch after this list:
    def forward(self, pred_kpts, gt_kpts, kpt_mask, area):
        d = (pred_kpts[..., 0] - gt_kpts[..., 0]) ** 2 + (pred_kpts[..., 1] - gt_kpts[..., 1]) ** 2
        d_abs = (pred_kpts[..., 0] - gt_kpts[..., 0]).abs() + (pred_kpts[..., 1] - gt_kpts[..., 1]).abs()
        kpt_loss_factor = (torch.sum(kpt_mask != 0) + torch.sum(kpt_mask == 0)) / (torch.sum(kpt_mask != 0) + 1e-9)
        # e = d / (2 * (area * self.sigmas) ** 2 + 1e-9)  # from formula
        e = d / (2 * self.sigmas) ** 2 / (area + 1e-9) / 2  # from cocoeval
        # 10% OKS term (directly optimizes the metric) + 90% L1 term (healthy gradients far from target)
        return kpt_loss_factor * (((1 - torch.exp(-e)) * 0.1 + d_abs * 0.9) * kpt_mask).mean()
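
For reference, a minimal sketch of how either variant could be exercised with dummy tensors; the shapes below are assumptions inferred from the forward signature, not taken from the repo:

import torch

nkpt = 2
sigmas = torch.full((nkpt,), 0.05)        # fixed per-keypoint sigma, as proposed above
loss_fn = KeypointLoss(sigmas)

pred_kpts = torch.rand(8, nkpt, 2) * 100  # predicted (x, y) for 8 assigned instances
gt_kpts = torch.rand(8, nkpt, 2) * 100    # ground-truth (x, y)
kpt_mask = torch.ones(8, nkpt)            # 1 = keypoint labeled/visible
area = torch.full((8, 1), 100.0 * 200.0)  # per-instance bbox area

print(loss_fn(pred_kpts, gt_kpts, kpt_mask, area))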

So, to cut a long story short: I believe that for training a custom keypoint task the default OKS loss fails, and you have to resort to the standard L1/L2 loss or a combination of the two.

You can download my custom (white board markers) dataset here: https://drive.google.com/file/d/1csR0EkzGb3EQ1MEEvlXfsyVF2pbqUUw9/view?usp=sharing

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@davyneven added the bug (Something isn't working) label on May 11, 2023
@github-actions

👋 Hello @davyneven, thank you for your interest in YOLOv8 🚀! We recommend a visit to the YOLOv8 Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Install

Pip install the ultralytics package including all requirements in a Python>=3.7 environment with PyTorch>=1.7.

pip install ultralytics

Environments

YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

[links to verified environments]

Status

[Ultralytics CI status badge] If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

Thank you for reaching out and providing such detailed feedback on the YOLOv8 pose/keypoint training script, @davyneven.
We appreciate your research and will take it into consideration when updating the codebase.
It's great to hear that you found a solution for custom keypoint tasks and were able to adapt the loss function to better suit your needs.
Thank you for also providing your dataset, which will be helpful for testing and improving the repo.
If you have any further questions or suggestions, please don't hesitate to reach out to the Ultralytics team or the YOLOv8 community.

@Laughing-q
Member

Laughing-q commented May 12, 2023

@davyneven Thanks for the nice suggestions! I actually tested a facial landmark dataset -- WiderFace -- to verify the generality of the loss function, and after a few epochs the predictions looked good. Since WiderFace does not provide keypoint info in its val set, I don't really know what the metrics look like.
Also, in my experiments the pose part converges noticeably more slowly than the bbox part; it usually needs more data and more epochs to converge.
Yes, I agree with you that the current loss function is not well suited to cases with only 2 or 3 keypoints. I like the L1 loss idea, and I think it could make the keypoint part converge faster.
Could you make a PR so we can actually experiment on it? Thanks!

@CalinLucian

CalinLucian commented May 15, 2023

I can also confirm this observation (I actually wanted to ask a different question about this, but since the issue is already open). It makes sense that the box mAP converges much faster, but I had the same observation of the keypoints converging much later: trained for 300 epochs, box mAP was at 0.98/0.99 as of epoch 10, while the keypoint metrics were barely starting to rise above 0 after epoch 100.

@glenn-jocher
Member

@CalinLucian thank you for bringing this up! It is true that the bounding box mAP metric in YOLOv8 tends to converge faster compared to the keypoint metric in some cases. This is due to several factors, such as the complexity of the task, the amount and quality of the data, and the learning rate. Since the keypoint task is usually more complex than the bounding box task, it might require more data and longer training to converge. Additionally, the learning rate might need to be adjusted to stabilize the training process and help the keypoint metric to converge faster.

In any case, it is important to keep monitoring both metrics during training to ensure that the model is performing well on both tasks. A good strategy is to use a combination of loss functions that can balance the importance of the two tasks, and to adjust the learning rate and other hyperparameters based on the feedback from the metrics. We appreciate your feedback and will consider ways to improve the training process for keypoint tasks in future updates.

@TimbusCalin

TimbusCalin commented May 17, 2023

@glenn-jocher Can I use a custom metric (such as Pose R / Pose P) as the optimizing metric when saving the best model for the pose task? Otherwise I think the default is bbox mAP50-95, which is not helpful given how differently the two converge: e.g. at epoch 10 box mAP could be 0.954 with Pose R at 0.0123, while at epoch 60 box mAP could be 0.953 with Pose R at 0.63; but with, say, patience=50 (imagine box mAP stayed around that value), training stops and the best model is loaded from epoch 10.

In addition, how can I use a custom loss function?

Thank you kindly for your time.

@glenn-jocher
Member

Thank you for your question, @TimbusCalin. In YOLOv8, you can use a custom metric as an optimizing metric when saving the best model for the keypoint/pose task. You can define this metric as a function that takes as input the predicted keypoints and the ground truth keypoints, and outputs a scalar value that measures the accuracy of the model. You can then pass this function to the ModelCheckpoint callback during training, and set it as the monitor parameter. The callback will save the model that achieved the highest value for your custom metric during training.
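
For illustration only, a minimal sketch of what such a scalar keypoint metric could look like (a PCK-style score; the function name and the 5-pixel threshold are hypothetical, not an existing ultralytics API):

import torch

def pck_metric(pred_kpts, gt_kpts, kpt_mask, thresh=5.0):
    """Fraction of labeled keypoints predicted within `thresh` pixels of the ground truth."""
    d = torch.linalg.norm(pred_kpts - gt_kpts, dim=-1)  # per-keypoint Euclidean distance
    correct = (d < thresh) & (kpt_mask != 0)            # only count labeled keypoints
    return correct.sum() / (kpt_mask != 0).sum().clamp(min=1)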

Regarding custom loss functions, you can define your own loss function that takes as input the model predictions and the ground truth annotations, and outputs a scalar value that measures the error of the model. You can then use this loss function during training by passing it to the compile function of the model, along with an optimizer and other parameters.

To implement these features in YOLOv8, you can modify the relevant parts of the code, such as the training script and the model architecture. You can refer to the TensorFlow documentation on how to define custom metrics and loss functions for more details.

I hope this answers your questions. If you have any further questions or suggestions, please don't hesitate to reach out to the Ultralytics team or the YOLOv8 community.

@LiangYong1216

Hi @davyneven, what does this mean: "(and lowering the pose weight to 5 instead of 12)"?
So your idea is to change pose=12 to pose=5 during training? Is my understanding correct that the code should be model.train(data='header.yaml', epochs=1000, imgsz=640, batch=128, device="1, 2", dropout=0.5, pose=5)?

@glenn-jocher
Member

Hi @LiangYong1216, thanks for your question.
The code pose=12 is used to define the weighting factor for the pose/keypoint loss during training. By default, the pose weighting factor is set to 12, which means that the pose loss will contribute more towards the total loss than the bounding box and classification losses.
In the case mentioned by @davyneven, they found that reducing the pose weighting factor to 5 resulted in better convergence of the keypoint metric.
To implement this change, you can modify the pose parameter in the train command as follows: model.train(data='header.yaml', epochs=1000, imgsz=640, batch=128, device="1, 2", dropout=0.5, pose=5). This will set the weighting factor for the pose loss to 5 during training.
I hope this helps! If you have any further questions, please don't hesitate to ask.

@davyneven
Author

@LiangYong1216 indeed, as @glenn-jocher indicates, you can set pose=5 in the train function. However, this is only necessary if you also change the loss function as I described above; if you don't change the loss function, there is no need to lower the pose weight.

@LiangYong1216

Hello @davyneven @glenn-jocher, thanks for your prompt reply. I set pose=5 on my dataset, but it doesn't seem to work: after 30 epochs all P and R values are 1 during training. The loss function uses your second scheme, and the model I use is yolov8s-pose.pt. Each image in my dataset has 52 keypoints, and the labels are definitely correct. The model seems decent, but there is still a certain gap compared to the labels. How can I improve it?

@glenn-jocher
Member

Hi @LiangYong1216, thank you for your question.
If the Precision (P) and Recall (R) values are already at 1 after 30 epochs, it suggests that the model is possibly overfitting to the training data, meaning it is memorizing the examples instead of generalizing to new examples. This could be due to a variety of factors, such as a small dataset, imbalanced data, noisy data, or the model architecture.

One way to improve the model performance is to add data augmentation techniques during training, such as flipping, scaling, rotating, and shifting the images, to increase the variability of the dataset. Another approach is to use transfer learning, by pretraining the model on a larger dataset that is similar to your own dataset, before fine-tuning it on your dataset. This can help the model to learn more general features that are relevant to your task.

You can also try adjusting the hyperparameters of the model, such as the learning rate, the batch size, and the number of layers, to see if this improves the performance. Lastly, you can try visualizing the results of the model, by plotting the predicted keypoints on top of the input images, and comparing them to the ground truth keypoints, to identify which keypoints the model may be struggling with. This can help you to identify where to focus your efforts to improve the model.
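
As a concrete starting point, a sketch that turns up a few of the built-in augmentation hyperparameters (the parameter names follow the ultralytics train settings; the values are illustrative, not tuned):

from ultralytics import YOLO

model = YOLO('yolov8s-pose.pt')
model.train(
    data='header.yaml',  # the dataset yaml from the comment above
    epochs=300,
    imgsz=640,
    degrees=10.0,        # random rotation
    translate=0.2,       # random translation
    scale=0.5,           # random scaling
    fliplr=0.5,          # horizontal flip; for pose, requires flip_idx in the dataset yaml
)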

I hope this helps! If you have any further questions or concerns, please don't hesitate to ask.

@github-actions

github-actions bot commented Jul 9, 2023

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

@github-actions github-actions bot added the Stale label Jul 9, 2023
@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Jul 19, 2023
Og31330 added a commit to Og31330/ultralytics that referenced this issue on Mar 10, 2024:
"Back to standard L1 loss such as described in ultralytics#2543, for better keypoint detection"