Corrections for gradient centralization example. #2113


Open · wants to merge 1 commit into master

Conversation

albertoesmp

The gradient centralization example reused the model already trained without gradient centralization (NGC) when applying gradient centralization (GC). The GC results therefore looked better only because the GC model was not trained from scratch but started from the trained NGC weights. This fix uses independent models for GC and NGC and thus yields a fair comparison between the two strategies.
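
A minimal sketch of the factory pattern this fix relies on (the review below refers to it as make_model(); the architecture shown here is illustrative, not the example's actual network):

```python
import keras
from keras import layers


def make_model():
    # Each call re-runs the layer initializers and returns a freshly
    # initialized model, so the GC and NGC runs start from independent
    # weights instead of sharing a single trained instance.
    inputs = keras.Input(shape=(300, 300, 3))
    x = layers.Rescaling(1.0 / 255)(inputs)
    x = layers.Conv2D(16, 3, activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    return keras.Model(inputs, outputs)


model_ngc = make_model()  # trained with plain RMSprop
model_gc = make_model()   # trained with the GC optimizer variant
```

Because each call produces fresh weights, model_gc no longer inherits whatever model_ngc learned during the NGC run.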


google-cla bot commented May 29, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up-to-date status, view the checks section at the bottom of the pull request.

@JyotinderSingh (Contributor) commented:

@albertoesmp Thank you for this fix. Could you please resolve the CLA issue mentioned in the comment above?

gemini-code-assist bot left a comment:


Code Review

This pull request correctly addresses a significant issue in the example where the same model instance was reused for training with and without Gradient Centralization (GC), leading to an unfair comparison. The introduction of a make_model() factory function is an excellent fix for this.

However, my review has uncovered a more critical issue that persists after these changes. The GCRMSprop optimizer is implemented by overriding the get_gradients method. This method is part of a legacy Keras API and is no longer called during the training loop in Keras 3. As a result, the Gradient Centralization logic is never actually applied. The example is currently comparing a model with a standard RMSprop optimizer against another model with the exact same standard RMSprop optimizer.

To fix this, the optimizer customization needs to be updated to the Keras 3 API, likely by overriding the compute_gradients method. I've left detailed comments with a suggested implementation in the relevant files. Addressing this is crucial for the example to correctly demonstrate the effects of Gradient Centralization.
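
For reference, one possible shape for such an update (a minimal sketch, assuming Keras 3 with the TensorFlow backend, where fit() routes gradients through the optimizer's apply_gradients(); it intercepts gradients there rather than through the compute_gradients hook suggested above, and keeps the example's GCRMSprop name):

```python
import keras
from keras import ops


class GCRMSprop(keras.optimizers.RMSprop):
    """RMSprop that centralizes gradients before applying them (sketch)."""

    def apply_gradients(self, grads_and_vars):
        centralized = []
        for grad, var in grads_and_vars:
            if grad is not None and len(grad.shape) > 1:
                # Gradient centralization (Yong et al., 2020): subtract the
                # mean over every axis except the last (output) axis, so each
                # output unit's gradient has zero mean. Rank-1 tensors such as
                # biases are left untouched, as in the original example.
                axes = list(range(len(grad.shape) - 1))
                grad = grad - ops.mean(grad, axis=axes, keepdims=True)
            centralized.append((grad, var))
        return super().apply_gradients(centralized)
```

The GC model would then be compiled against this optimizer, e.g. model_gc.compile(loss="binary_crossentropy", optimizer=GCRMSprop(learning_rate=1e-4)) (loss and learning rate illustrative), while the NGC model keeps a stock RMSprop.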
