[Feat] Support Horovod sync train #205
Conversation
Hi @a6802739, thank you for your contribution!
Review threads on tensorflow_recommenders_addons/dynamic_embedding/python/ops/dynamic_embedding_optimizer.py (resolved).
Force-pushed from da36228 to 7475fe3.
Force-pushed from 058054a to 5d212c8.
Force-pushed from 5d212c8 to f0b1ae2.
Force-pushed from aa8df5e to d0bbbed.
```python
if grad is None:
    aggregated_grad.append(None)  # pass-through.
    continue
elif isinstance(grad, ops.Tensor):
    aggregated_grad.append(hvd.allreduce(grad, op=hvd.Sum))
```
And just for discussion: I noticed that in recent versions Horovod implements lots of features and optimizations in its `DistributedOptimizer`, like tensor fusion, grouped allreduce, Adasum, gradient compression, etc. Using the `hvd.allreduce` API directly may not reuse these features.
Yeah. Tensor fusion is controlled by an environment variable and is turned on by default, and for most recommendation scenarios I think there is no need to care about grouped allreduce or gradient compression. But I think we could find a better way to expose the reduction operation to the user. If we directly let users specify `hvd.Sum` or `hvd.Adasum`, they would have to import `horovod` themselves before calling `apply_gradients`. Alternatively, we could let users specify the reduction method as a string like `"sum"` and map it to the corresponding `hvd` op, but if Horovod adds a new reduction op, we would have to change the code to stay compatible with the newest Horovod version.
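The string-based option discussed above could be sketched like this. This is a minimal, hypothetical sketch: `resolve_reduction_op` and the `SUM`/`AVERAGE` stand-in constants are illustrative only (a real implementation would lazily import `horovod.tensorflow` and map to the actual `hvd.Sum`/`hvd.Average` ops), not part of TFRA or Horovod.

```python
# Stand-in constants for hvd.Sum / hvd.Average so the sketch stays
# self-contained without importing horovod.
SUM, AVERAGE = object(), object()

# Mapping from user-facing reduction names to the underlying ops.
_REDUCTION_OPS = {"sum": SUM, "average": AVERAGE}

def resolve_reduction_op(name):
    """Map a user-supplied reduction name (e.g. "sum") to the matching op.

    Raises ValueError with the list of supported names on an unknown
    reduction, so new Horovod ops only require extending _REDUCTION_OPS.
    """
    try:
        return _REDUCTION_OPS[name.lower()]
    except KeyError:
        raise ValueError(
            f"unknown reduction '{name}'; expected one of {sorted(_REDUCTION_OPS)}")
```

One design note: centralizing the mapping in a single dict means supporting a new Horovod reduction op is a one-line change, which addresses the compatibility concern above.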
```python
if grad is None:
    aggregated_grad.append(None)  # pass-through.
    continue
elif isinstance(grad, ops.Tensor):
```
MLP layers downstream from the embedding may also generate `IndexedSlices` gradients; we need to account for that.
Sorry, I never noticed that MLP layers could generate `IndexedSlices` gradients. Could you give me an example?
> Sorry, I never noticed that MLP layers could generate `IndexedSlices` gradients. Could you give me an example?
For example:

```python
var = tf.Variable(...)
params = de.get_variable(...)
emb = de.embedding_lookup(...)
latent_tensor = sum_pooling(emb)
... = some_func(latent_tensor, var)
```
where `some_func` is defined as:
```python
def some_func(latent, var):
    mask = tf.greater_equal(var, threshold)
    pos = tf.where(mask)
    selected = tf.gather(var, pos)
    ...
```
This `some_func` code will make the gradient with respect to `var` become `IndexedSlices`.
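To see why a gather produces a sparse gradient, here is a minimal pure-Python sketch. The `gather`/`gather_grad` helpers are hypothetical illustrations of the mechanics, not TensorFlow APIs:

```python
def gather(table, indices):
    # Forward pass of a gather: select a few rows from the table.
    return [table[i] for i in indices]

def gather_grad(indices, upstream):
    # The gradient w.r.t. the table only touches the gathered rows, so it
    # is naturally represented as sparse (index, value) pairs -- which is
    # what TensorFlow's IndexedSlices encodes -- rather than a mostly-zero
    # dense tensor the size of the whole table.
    return list(zip(indices, upstream))
```

So any op built on `tf.gather`, not just embedding lookups, can hand the optimizer an `IndexedSlices` gradient.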
I see, thanks. I have now deleted the `isinstance(grad, ops.Tensor)` check, so Horovod will check the gradient type and handle it itself.
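The resulting aggregation loop could then look roughly like this. This is a hedged sketch, not the actual TFRA code: `aggregate_gradients` and the injected `allreduce` callable (standing in for `hvd.allreduce`) are illustrative names.

```python
def aggregate_gradients(grads_and_vars, allreduce):
    """Aggregate gradients across workers without any type dispatch.

    None gradients pass through untouched; everything else is handed to
    the injected allreduce callable (e.g. hvd.allreduce), which is
    trusted to handle both dense Tensors and IndexedSlices, so no
    isinstance check is needed here.
    """
    aggregated = []
    for grad, var in grads_and_vars:
        if grad is None:
            aggregated.append((None, var))  # pass-through.
        else:
            aggregated.append((allreduce(grad), var))
    return aggregated
```

Keeping the type dispatch inside the communication layer means this loop stays correct even if Horovod changes how it represents sparse gradients.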
Force-pushed from 8f415dd to 635f55a.
Force-pushed from 635f55a to e9e525f.
LGTM. The `TrainableWrapper`'s grad is kept the same as before.