Keypoint loss too relaxed for custom number of keypoints #2543

Closed
davyneven opened this issue May 11, 2023 · 13 comments
Labels: bug (Something isn't working), Stale

Comments

@davyneven

Search before asking

  • I have searched the YOLOv8 issues and found no similar bug report.

YOLOv8 Component

Training

Bug

I'm playing around with your pose/keypoint training script on custom keypoint datasets but stumbled upon a few issues.

I noticed that the keypoint metrics (mAP50/mAP50-95) are way too high when comparing the locations of predicted keypoints with the ground truth. I am getting an mAP50-95 close to 99, while visually the keypoints on the training set are still way off their targets.

By going through the codebase, I might have found what is causing this (and how to solve it):

First of all, I believe the sigmas for custom keypoints are not correctly initialized (too forgiving). Currently they are set to
torch.ones(nkpt, device=self.device) / nkpt where I believe you forgot to add a / 10.

For example, for nkpt = 2 this leads to a sigma of 0.5 for each keypoint, which is far too forgiving: given a bbox of 100 by 200 pixels, an offset of 20 pixels would still yield an OKS of 0.99 ...
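
To make those numbers concrete, here is a quick standalone check using the cocoeval form of OKS, oks = exp(-d² / (2 · area · (2σ)²)); this is an illustrative sketch, not code from the repo:

import math

sigma = 0.5        # 1 / nkpt for nkpt = 2 under the current initialization
area = 100 * 200   # bbox area in pixels
d2 = 20 ** 2       # squared offset for a 20-pixel miss

e = d2 / ((2 * sigma) ** 2 * area * 2)
print(math.exp(-e))  # ~0.990: a 20 px miss barely dents the OKS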

This leads to two issues: the obvious one is that the metrics are far too optimistic, but more importantly, the loss is too relaxed, so the predicted keypoints stop moving once they are in (not so) close proximity to the target keypoint.

One way to fix the metric would indeed be to lower the sigma value, to for example 0.05 (rather than adding / 10, I believe a fixed value for each keypoint would be better). However, since the OKS metric is also used as the loss function, this leads to very low gradients: the derivative of the Gaussian is almost zero when the prediction is too far off, which is the case at the beginning of training. This might be addressed by better initialization, but that is too difficult to tune for an arbitrary custom dataset. In my case, after 100 epochs I still had an mAP50 of 0 ...
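
To illustrate the vanishing gradient, a small standalone sketch (sigma = 0.05 as proposed above; the 100 px offset is an assumed early-training miss):

import torch

sigma, area = 0.05, 100 * 200
d = torch.tensor(100.0, requires_grad=True)  # prediction 100 px off target

e = d ** 2 / ((2 * sigma) ** 2 * area * 2)   # cocoeval-style exponent
loss = 1 - torch.exp(-e)                     # OKS-style loss term
loss.backward()
print(loss.item(), d.grad.item())  # loss saturates at ~1.0, gradient ~7e-12: almost no learning signal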

I believe there are two options to solve this issue:

  1. Using a different loss function. I managed to get very good results using the standard L1-loss (and lowering the pose weight to 5 instead of 12):
import torch
import torch.nn as nn


class KeypointLoss(nn.Module):

    def __init__(self, sigmas) -> None:
        super().__init__()
        self.sigmas = sigmas  # kept for interface parity; unused by this L1 variant

    def forward(self, pred_kpts, gt_kpts, kpt_mask, area):
        # Plain L1 distance between predicted and ground-truth keypoints
        d = (pred_kpts[..., 0] - gt_kpts[..., 0]).abs() + (pred_kpts[..., 1] - gt_kpts[..., 1]).abs()
        # Rescale so masked-out (unlabeled) keypoints don't dilute the mean
        kpt_loss_factor = (torch.sum(kpt_mask != 0) + torch.sum(kpt_mask == 0)) / (torch.sum(kpt_mask != 0) + 1e-9)
        return kpt_loss_factor * (d * kpt_mask).mean()
  2. What also works is a combination of the two, which might be best, since you are then still directly optimizing the OKS metric (like how BCE and Dice losses are combined for semantic segmentation); see the usage sketch after this list:
    def forward(self, pred_kpts, gt_kpts, kpt_mask, area):
        d = (pred_kpts[..., 0] - gt_kpts[..., 0]) ** 2 + (pred_kpts[..., 1] - gt_kpts[..., 1]) ** 2
        d_abs = (pred_kpts[..., 0] - gt_kpts[..., 0]).abs() + (pred_kpts[..., 1] - gt_kpts[..., 1]).abs()
        kpt_loss_factor = (torch.sum(kpt_mask != 0) + torch.sum(kpt_mask == 0)) / (torch.sum(kpt_mask != 0) + 1e-9)
        # e = d / (2 * (area * self.sigmas) ** 2 + 1e-9)  # from formula
        e = d / (2 * self.sigmas) ** 2 / (area + 1e-9) / 2  # from cocoeval
        # 10% OKS term (directly optimizes the metric) + 90% L1 term (healthy gradients far from target)
        return kpt_loss_factor * (((1 - torch.exp(-e)) * 0.1 + d_abs * 0.9) * kpt_mask).mean()
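
For reference, a minimal sketch of how either variant could be exercised with dummy tensors; the shapes below are assumptions inferred from the forward signature, not taken from the repo:

import torch

nkpt = 2
sigmas = torch.full((nkpt,), 0.05)        # fixed per-keypoint sigma, as proposed above
loss_fn = KeypointLoss(sigmas)

pred_kpts = torch.rand(8, nkpt, 2) * 100  # predicted (x, y) for 8 assigned instances
gt_kpts = torch.rand(8, nkpt, 2) * 100    # ground-truth (x, y)
kpt_mask = torch.ones(8, nkpt)            # 1 = keypoint labeled/visible
area = torch.full((8, 1), 100.0 * 200.0)  # per-instance bbox area

print(loss_fn(pred_kpts, gt_kpts, kpt_mask, area))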

So, to cut a long story short: I believe that for training a custom keypoint task the default OKS loss fails, and you have to resort to the standard L1/L2 loss or a combination of the two.

You can download my custom (white board markers) dataset here: https://drive.google.com/file/d/1csR0EkzGb3EQ1MEEvlXfsyVF2pbqUUw9/view?usp=sharing

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@davyneven added the bug (Something isn't working) label on May 11, 2023
@github-actions

👋 Hello @davyneven, thank you for your interest in YOLOv8 🚀! We recommend a visit to the YOLOv8 Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Install

Pip install the ultralytics package including all requirements in a Python>=3.7 environment with PyTorch>=1.7.

pip install ultralytics

Environments

YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

[links to verified environments]

Status

[Ultralytics CI status badge] If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

Thank you for reaching out and providing such detailed feedback on the YOLOv8 pose/keypoint training script, @davyneven.
We appreciate your research and will take it into consideration when updating the codebase.
It's great to hear that you found a solution for custom keypoint tasks and were able to adapt the loss function to better suit your needs.
Thank you for also providing your dataset, which will be helpful for testing and improving the repo.
If you have any further questions or suggestions, please don't hesitate to reach out to the Ultralytics team or the YOLOv8 community.

@Laughing-q
Member

Laughing-q commented May 12, 2023

@davyneven Thanks for the nice suggestions! I actually tested a facial landmark dataset -- WiderFace -- to verify the generality of the loss function, and after a few epochs the predictions looked good. Since WiderFace does not provide keypoint info in its val set, I don't really know what the metrics look like.
Also, in my experiments the pose part converges noticeably more slowly than the bbox part; it usually needs more data and more epochs to converge.
Yes, I agree with you that the current loss function is not well suited to cases with only 2 or 3 keypoints. I like the L1 loss idea, and I think it could make the keypoint part converge faster.
Could you make a PR so we can actually experiment on it? Thanks!

@CalinLucian

CalinLucian commented May 15, 2023

I can also confirm this observation (I actually wanted to ask a different question about this, but since the issue is already open). It makes sense that the box mAP converges much faster, but I had the same observation of the keypoints converging much later: trained for 300 epochs, box mAP was at 0.98/0.99 as of epoch 10, while the keypoint metrics were barely starting to rise above 0 after epoch 100.

@glenn-jocher
Member

@CalinLucian thank you for bringing this up! It is true that the bounding box mAP metric in YOLOv8 tends to converge faster compared to the keypoint metric in some cases. This is due to several factors, such as the complexity of the task, the amount and quality of the data, and the learning rate. Since the keypoint task is usually more complex than the bounding box task, it might require more data and longer training to converge. Additionally, the learning rate might need to be adjusted to stabilize the training process and help the keypoint metric to converge faster.

In any case, it is important to keep monitoring both metrics during training to ensure that the model is performing well on both tasks. A good strategy is to use a combination of loss functions that can balance the importance of the two tasks, and to adjust the learning rate and other hyperparameters based on the feedback from the metrics. We appreciate your feedback and will consider ways to improve the training process for keypoint tasks in future updates.

@TimbusCalin

TimbusCalin commented May 17, 2023

@glenn-jocher Can I use a custom metric (such as Pose R / Pose P) as the optimizing metric when saving the best model for the pose task? Otherwise I think the default is bbox mAP50-95, which is not helpful given how differently the two converge: e.g. at epoch 10 box mAP could be 0.954 with Pose R at 0.0123, while at epoch 60 box mAP could be 0.953 with Pose R at 0.63; but with, say, patience=50 (imagine box mAP stayed around that value), training stops and the best model is loaded from epoch 10.

In addition, how can I use a custom loss function?

Thank you kindly for your time.

@glenn-jocher
Member

Thank you for your question, @TimbusCalin. In YOLOv8, you can use a custom metric as an optimizing metric when saving the best model for the keypoint/pose task. You can define this metric as a function that takes as input the predicted keypoints and the ground truth keypoints, and outputs a scalar value that measures the accuracy of the model. You can then pass this function to the ModelCheckpoint callback during training, and set it as the monitor parameter. The callback will save the model that achieved the highest value for your custom metric during training.
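
For illustration only, a minimal sketch of what such a scalar keypoint metric could look like (a PCK-style score; the function name and the 5-pixel threshold are hypothetical, not an existing ultralytics API):

import torch

def pck_metric(pred_kpts, gt_kpts, kpt_mask, thresh=5.0):
    """Fraction of labeled keypoints predicted within `thresh` pixels of the ground truth."""
    d = torch.linalg.norm(pred_kpts - gt_kpts, dim=-1)  # per-keypoint Euclidean distance
    correct = (d < thresh) & (kpt_mask != 0)            # only count labeled keypoints
    return correct.sum() / (kpt_mask != 0).sum().clamp(min=1)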

Regarding custom loss functions, you can define your own loss function that takes as input the model predictions and the ground truth annotations, and outputs a scalar value that measures the error of the model. You can then use this loss function during training by passing it to the compile function of the model, along with an optimizer and other parameters.

To implement these features in YOLOv8, you can modify the relevant parts of the code, such as the training script and the model architecture. You can refer to the TensorFlow documentation on how to define custom metrics and loss functions for more details.

I hope this answers your questions. If you have any further questions or suggestions, please don't hesitate to reach out to the Ultralytics team or the YOLOv8 community.

@LiangYong1216

Hi @davyneven, what does this mean: "(and lowering the pose weight to 5 instead of 12)"?
So your idea is to change pose=12 to pose=5 during training? Is my understanding correct that the code should be model.train(data='header.yaml', epochs=1000, imgsz=640, batch=128, device="1, 2", dropout=0.5, pose=5)?

@glenn-jocher
Member

Hi @LiangYong1216, thanks for your question.
The code pose=12 is used to define the weighting factor for the pose/keypoint loss during training. By default, the pose weighting factor is set to 12, which means that the pose loss will contribute more towards the total loss than the bounding box and classification losses.
In the case mentioned by @davyneven, they found that reducing the pose weighting factor to 5 resulted in better convergence of the keypoint metric.
To implement this change, you can modify the pose parameter in the train command as follows: model.train(data='header.yaml', epochs=1000, imgsz=640, batch=128, device="1, 2", dropout=0.5, pose=5). This will set the weighting factor for the pose loss to 5 during training.
I hope this helps! If you have any further questions, please don't hesitate to ask.

@davyneven
Author

@LiangYong1216 indeed, as @glenn-jocher indicates, you can set pose=5 in the train function. However, this is only necessary if you also change the loss function as I described above; if you don't change the loss function, there is no need to lower the pose weight.

@LiangYong1216

Hello @davyneven @glenn-jocher, thanks for your prompt reply. I set pose=5 on my dataset, but it doesn't seem to work: after 30 epochs all P and R values are 1 during training. The loss function uses your second scheme, and the model I use is yolov8s-pose.pt. Each image in my dataset has 52 keypoints, and the labels are definitely correct. The model seems decent, but there is still a certain gap compared to the labels. How can I improve it?

@glenn-jocher
Member

Hi @LiangYong1216, thank you for your question.
If the Precision (P) and Recall (R) values are already at 1 after 30 epochs, it suggests that the model is possibly overfitting to the training data, meaning it is memorizing the examples instead of generalizing to new examples. This could be due to a variety of factors, such as a small dataset, imbalanced data, noisy data, or the model architecture.

One way to improve the model performance is to add data augmentation techniques during training, such as flipping, scaling, rotating, and shifting the images, to increase the variability of the dataset. Another approach is to use transfer learning, by pretraining the model on a larger dataset that is similar to your own dataset, before fine-tuning it on your dataset. This can help the model to learn more general features that are relevant to your task.

You can also try adjusting the hyperparameters of the model, such as the learning rate, the batch size, and the number of layers, to see if this improves the performance. Lastly, you can try visualizing the results of the model, by plotting the predicted keypoints on top of the input images, and comparing them to the ground truth keypoints, to identify which keypoints the model may be struggling with. This can help you to identify where to focus your efforts to improve the model.
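
As a concrete starting point, a sketch that turns up a few of the built-in augmentation hyperparameters (the parameter names follow the ultralytics train settings; the values are illustrative, not tuned):

from ultralytics import YOLO

model = YOLO('yolov8s-pose.pt')
model.train(
    data='header.yaml',  # the dataset yaml from the comment above
    epochs=300,
    imgsz=640,
    degrees=10.0,        # random rotation
    translate=0.2,       # random translation
    scale=0.5,           # random scaling
    fliplr=0.5,          # horizontal flip; for pose, requires flip_idx in the dataset yaml
)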

I hope this helps! If you have any further questions or concerns, please don't hesitate to ask.

@github-actions

github-actions bot commented Jul 9, 2023

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

@github-actions github-actions bot added the Stale label Jul 9, 2023
@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Jul 19, 2023
Og31330 added a commit to Og31330/ultralytics that referenced this issue on Mar 10, 2024:
"Back to standard L1 loss such as described in ultralytics#2543, for better keypoint detection"