
Classification Loss: CE vs BCE #3

Closed · glenn-jocher opened this issue Sep 4, 2018 · 4 comments · Label: help wanted

glenn-jocher (Member) commented Sep 4, 2018

When developing the training code, I found that replacing Binary Cross-Entropy (BCE) loss with Cross-Entropy (CE) loss significantly improves Precision, Recall, and mAP. All three show roughly 2X improvements with CE, even though the YOLOv3 paper specifies BCE for these loss terms in darknet.

The two loss terms are on lines 162 and 163 of models.py. If anyone has insight into this phenomenon, I'd be very interested to hear it. For now, you can swap the two back and forth there. Note that SGD fails to converge with either BCE or CE, so that issue appears to be independent of this one.
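For reference, a minimal sketch of the swap in question (not the exact lines from models.py; `pred_cls` and the target tensors here are hypothetical stand-ins for the matched class predictions and labels):

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # independent sigmoid per class (darknet / YOLOv3 paper)
ce = nn.CrossEntropyLoss()    # softmax competition across classes

pred_cls = torch.randn(8, 80)            # e.g. 8 matched anchors, 80 COCO classes (raw logits)
target_idx = torch.randint(0, 80, (8,))  # integer class labels, as CE expects
target_onehot = torch.zeros(8, 80).scatter_(1, target_idx.unsqueeze(1), 1.0)  # one-hot, as BCE expects

loss_bce = bce(pred_cls, target_onehot)  # formulation stated in the paper
loss_ce = ce(pred_cls, target_idx)       # variant reported above to roughly double P/R/mAP
```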

[Figure: ce_vs_bce — plot comparing Precision, Recall, and mAP for CE vs BCE training runs]

xyutao commented Sep 6, 2018

BCE computes sigmoid predictions independently for each class, while CE introduces inter-class competition via the softmax. Under BCE, an instance is allowed to be both class A and class B at the same time, which suits multi-label tasks (e.g. the Open Images dataset). But for single-label instances (e.g. COCO), BCE can produce high-scoring false positives and hurt AP.
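A toy illustration of this point (illustrative values only, not code from this repo): with independent sigmoids, two classes can both score near 1 for the same box, whereas softmax forces them to compete for probability mass:

```python
import torch

logits = torch.tensor([4.0, 3.5, -2.0])  # hypothetical class logits for one box

torch.sigmoid(logits)          # ~[0.98, 0.97, 0.12]: two near-certain classes at once
torch.softmax(logits, dim=0)   # ~[0.62, 0.38, 0.002]: scores sum to 1, runner-up suppressed

# On a single-label dataset like COCO, that second 0.97 sigmoid score would
# surface as a high-confidence false positive and drag down AP.
```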

glenn-jocher added the help wanted label and self-assigned this issue on Sep 9, 2018
glenn-jocher mentioned this issue on Sep 13, 2018
nirbenz commented Oct 10, 2018

While this is true in theory, the YOLOv3 paper also clearly states that BCE is a big part of the model's general success (on COCO and PASCAL VOC).
Looking at models.py, I'm actually not sure the commented-out lines do that. It is supposed to be binary classification (BCE) per class, in a somewhat one-vs-all manner.

dtmoodie commented Mar 14, 2019

Hello,

I'm working on getting BCE loss to work in a multi-label task. The majority of my classes follow a hierarchical one-vs-all classification, but a few leaves of the hierarchical tree can take multiple states. I'm experimenting with using BCE for the entire tree, as in the original darknet paper, but I have yet to get any good results: my loss decreases significantly, yet in the end my classification predictions are completely wrong (calling a car a street sign ^_-).
Has anyone else had any success getting BCE to work?

glenn-jocher (Member, Author) commented

@dtmoodie hello,

Using BCE for hierarchical multi-label classification can be challenging, especially if some leaves in the hierarchical tree have multiple states; this can lead to unexpected results such as misclassified predictions.

One potential approach to consider is adapting the loss function or the model architecture to better handle the hierarchical structure and multiple states. Additionally, experimenting with different loss functions or model configurations tailored to hierarchical multi-label classification tasks might yield improved results.
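For example, one possible adaptation along these lines (a hedged sketch only, not an Ultralytics API; `sibling_groups` and `multilabel_idx` are hypothetical index lists describing your tree) is to apply CE within each group of mutually exclusive siblings and reserve BCE for the leaves that can genuinely co-occur:

```python
import torch
import torch.nn.functional as F

def hierarchical_loss(logits, targets_onehot, sibling_groups, multilabel_idx):
    """logits, targets_onehot: [N, C] tensors. sibling_groups: lists of class
    indices that are mutually exclusive. multilabel_idx: indices of leaves
    allowed to co-occur. Assumes each sample has exactly one positive class
    within every sibling group."""
    loss = logits.new_zeros(())
    for group in sibling_groups:
        # softmax competition only among siblings that exclude each other
        loss = loss + F.cross_entropy(logits[:, group],
                                      targets_onehot[:, group].argmax(dim=1))
    if multilabel_idx:
        # independent sigmoids where multiple states are genuinely allowed
        loss = loss + F.binary_cross_entropy_with_logits(
            logits[:, multilabel_idx], targets_onehot[:, multilabel_idx])
    return loss

# e.g. classes 0-2 are exclusive vehicle types, 3-4 are exclusive sign types,
# and classes 5-6 (say, "lights on" / "towing") may co-occur:
# hierarchical_loss(logits, targets, sibling_groups=[[0, 1, 2], [3, 4]], multilabel_idx=[5, 6])
```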

If you'd like further guidance on this, feel free to consult the Ultralytics Docs for additional insights and considerations while experimenting with BCE loss in your multi-label task.

Keep up the great work, and best of luck with your experimentation!
