Fix sparse updates for optimizers using DecoupledWeightDecay. #21789

Merged
@@ -26,6 +26,7 @@
 from tensorflow.python.training import momentum as momentum_opt
 from tensorflow.python.training import optimizer
 from tensorflow.python.util.tf_export import tf_export
+from tensorflow.python.ops import array_ops


 class DecoupledWeightDecayExtension(object):
@@ -159,8 +160,8 @@ def _decay_weights_op(self, var):

   def _decay_weights_sparse_op(self, var, indices, scatter_add):
     if not self._decay_var_list or var in self._decay_var_list:
-      return scatter_add(var, indices, -self._weight_decay * var,
-                         self._use_locking)
+      update = -self._weight_decay * array_ops.gather(var, indices)
+      return scatter_add(var, indices, update, self._use_locking)
     return control_flow_ops.no_op()

   # Here, we overwrite the apply functions that the base optimizer calls.
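The reason for the change: `scatter_add` expects an update tensor whose first dimension lines up with `indices`, so the decay term must be computed on the gathered rows (`var[indices]`), not on the full variable. A minimal NumPy sketch of this behavior (the helper name and signature are hypothetical, not part of the TensorFlow API; `np.add.at` stands in for `scatter_add`):

```python
import numpy as np

def decay_weights_sparse(var, indices, weight_decay):
    """Hypothetical sketch: apply decoupled weight decay only to the
    rows touched by a sparse update, mirroring the PR's gather-based fix."""
    var = var.copy()
    # Gather the affected rows first, as in the patched code:
    # update = -weight_decay * array_ops.gather(var, indices)
    update = -weight_decay * var[indices]
    # np.add.at is the scatter_add equivalent: it adds `update`
    # row-by-row at the given indices, leaving other rows untouched.
    np.add.at(var, indices, update)
    return var

weights = np.ones((4, 2))
decayed = decay_weights_sparse(weights, indices=np.array([0, 2]),
                               weight_decay=0.1)
# Rows 0 and 2 are decayed to 0.9; rows 1 and 3 are unchanged.
```

Passing `-weight_decay * var` (the pre-fix code) would hand `scatter_add` a tensor shaped like the whole variable rather than like `indices`, which is the shape mismatch this PR resolves.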