fix comment misdirection during scaling loss #36987

techkang · 2025-03-26T02:31:32Z

What does this PR do?

If the gradient accumulation (GA) loss bug is not fixed during loss computation, the loss must be scaled by the number of gradient accumulation steps. This scaling is not solely for reporting purposes, as the previous comment described it. https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py#L3764
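A minimal sketch of why the scaling affects training and not just the logged value. It is not taken from `trainer.py`: the scalar model, the data, and the helper `grad_mse` are assumptions made for illustration, and the micro-batches are assumed to be equal in size. Because differentiation is linear, dividing each micro-batch loss by the number of accumulation steps divides its gradient by the same factor.

```python
# Toy model: prediction = w * x, loss = mean squared error.
# grad_mse is a hypothetical helper, not part of transformers.
def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)**2) with respect to w."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

ga_steps = 2  # number of gradient accumulation steps
micro_batches = [(xs[0:2], ys[0:2]), (xs[2:4], ys[2:4])]

# Reference: gradient of the mean loss over the full batch.
full_grad = grad_mse(w, xs, ys)

# Accumulating unscaled micro-batch gradients sums them,
# so the result is ga_steps times too large.
unscaled_grad = sum(grad_mse(w, mx, my) for mx, my in micro_batches)

# Dividing each micro-batch loss (and hence its gradient) by ga_steps
# recovers the full-batch gradient when micro-batches are equal in size.
scaled_grad = sum(grad_mse(w, mx, my) / ga_steps for mx, my in micro_batches)

print(full_grad, unscaled_grad, scaled_grad)
```

Without the division, the accumulated gradient in this toy case is `ga_steps` times the full-batch gradient, which acts like an unintended change to the learning rate.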

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@SunMarc @muellerzr @ArthurZucker

github-actions · 2025-03-26T02:31:44Z

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

Rocketknight1 · 2025-03-26T12:50:35Z

cc @muellerzr @SunMarc

SunMarc · 2025-03-26T14:55:09Z

src/transformers/trainer.py

+            # Finally we need to normalize the loss if GA loss bug is not fixed during compute loss
            if not self.model_accepts_loss_kwargs and self.compute_loss_func is None:
                loss = loss / self.args.gradient_accumulation_steps


I feel like this condition is not strong enough. Users can override compute_loss and not use num_items_in_batch at all when calculating the loss (e.g. the trl case, as I explained here). The issue is that right now it won't scale the loss.
Maybe we should create a variable to enable/disable the GA fix, but in transformers we enable it since we are calculating the loss the right way.
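One way the suggestion above could look, as a rough sketch only. The function `maybe_scale_loss` and the flag `loss_is_ga_normalized` are hypothetical names invented here and do not exist in transformers; when the flag is left unset, the sketch falls back to the condition shown in the diff.

```python
def maybe_scale_loss(loss, ga_steps, model_accepts_loss_kwargs,
                     compute_loss_func=None, loss_is_ga_normalized=None):
    """Divide the loss by ga_steps unless it is already normalized.

    loss_is_ga_normalized is a hypothetical explicit switch. If it is None,
    it is inferred the same way as the condition in the diff above.
    """
    if loss_is_ga_normalized is None:
        # Current heuristic: assume the loss is already normalized when the
        # model accepts loss kwargs or a custom loss function is supplied.
        loss_is_ga_normalized = (
            model_accepts_loss_kwargs or compute_loss_func is not None
        )
    if loss_is_ga_normalized:
        return loss
    return loss / ga_steps


# Heuristic says "not normalized": the loss is scaled.
print(maybe_scale_loss(8.0, 4, model_accepts_loss_kwargs=False))
# A subclass that ignores the batch item count can opt back into scaling.
print(maybe_scale_loss(8.0, 4, model_accepts_loss_kwargs=True,
                       loss_is_ga_normalized=False))
```

An explicit switch like this would let a subclass that overrides the loss computation state whether its loss is already normalized, instead of relying on the heuristic alone.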

SunMarc

LGTM ! Left some thoughts regarding the condition below

github-actions bot marked this pull request as draft March 26, 2025 02:31

techkang marked this pull request as ready for review March 26, 2025 14:39

github-actions bot requested review from muellerzr and SunMarc March 26, 2025 14:39

SunMarc reviewed Mar 26, 2025

techkang closed this May 15, 2025

techkang force-pushed the main branch from 29b53db to 0f77ca7 Compare May 15, 2025 15:28

techkang mentioned this pull request May 16, 2025

fix bug in distributed loss test #38166

Merged

5 tasks
