Support for Tensorflow eager mode #670

Merged
merged 28 commits into uber:master on Jan 9, 2019

Conversation

4 participants
@kuroko1t
Contributor

kuroko1t commented Nov 30, 2018

This is a PR adding support for TensorFlow eager mode. Allreduce, Broadcast, and Allgather pass the unit tests and the MNIST sample.

I implemented allreduce as a wrapper around tf.GradientTape, analogous to how graph mode wraps tf.train.Optimizer.
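For orientation, here is a minimal end-to-end sketch of the eager workflow this PR enables, written against the API names that settle out later in this thread (DistributedGradientTape, broadcast_variables); the toy model and data are placeholders rather than the PR's MNIST example, and the broadcast_variables argument order shown follows current Horovod releases.

import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()
tf.enable_eager_execution()

# Toy model and data; the model's variables are created lazily on the first call.
model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
opt = tf.train.GradientDescentOptimizer(0.01 * hvd.size())
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random_normal([256, 784]),
     tf.random_uniform([256], maxval=10, dtype=tf.int64))).batch(32)

for batch, (images, labels) in enumerate(dataset):
    with tf.GradientTape() as tape:
        loss = tf.losses.sparse_softmax_cross_entropy(labels, model(images))
    # Wrap the tape so that gradient() averages the gradients across workers.
    tape = hvd.DistributedGradientTape(tape)
    grads = tape.gradient(loss, model.variables)
    opt.apply_gradients(zip(grads, model.variables))
    if batch == 0:
        # Variables only exist after the first step in eager mode, so the
        # rank-0 broadcast runs once here (see the discussion further down).
        hvd.broadcast_variables(model.variables, root_rank=0)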

@CLAassistant

CLAassistant commented Nov 30, 2018

CLA assistant check
All committers have signed the CLA.

@tgaddair
Collaborator

tgaddair left a comment

Thanks for the PR @kuroko1t! I added a few comments; please let me know if I misunderstood something, as I'm not super familiar with tf.GradientTape(). Thanks!

from common import mpi_env_rank_and_size


class MPITests(tf.test.TestCase):

@tgaddair

tgaddair Nov 30, 2018

Collaborator

Apart from enabling eager execution, is there any difference between these tests and test_tensorflow.py? Given the expected difficulty of keeping these tests in sync, I think we should remove this file and instead use multiple test fixtures with test_tensorflow.py to enable/disable eager execution:

https://docs.pytest.org/en/latest/example/parametrize.html#indirect-parametrization-with-multiple-fixtures
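For reference, the linked pytest pattern looks roughly like this (a generic sketch; the fixture and test names are illustrative, not from this PR):

import pytest


@pytest.fixture
def execution_mode(request):
    # The value is supplied by the indirect parametrization below.
    return request.param


@pytest.mark.parametrize("execution_mode", ["graph", "eager"], indirect=True)
def test_allreduce_in_both_modes(execution_mode):
    # A real fixture would enable or disable eager execution here; this stub
    # only demonstrates the parametrization mechanics.
    assert execution_mode in ("graph", "eager")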

@kuroko1t

kuroko1t Dec 9, 2018

Contributor

Thanks for your comment. I modified the unit tests and used subTest for the eager tests. Parametrization only works for pytest-style test methods, and both modes still need to be executed.

@@ -85,8 +85,10 @@ def _allreduce(tensor, name=None):
        A tensor of the same shape and type as `tensor`, summed across all
        processes.
    """
-    if name is None:
+    if name is None and not tf.executing_eagerly():

@tgaddair

tgaddair Nov 30, 2018

Collaborator

Is it necessary to make eager execution a special case here? For some things, like the upcoming autotuning framework, we rely on differentiating tensors by their name for things like detecting loops.

@kuroko1t

kuroko1t Dec 10, 2018

Contributor

In eager mode, tensor.name is meaningless, so I added this code (see the linked reference).
If we need to distinguish tensors, we need something to use in place of tensor.name.
I am looking into that.

@tgaddair

tgaddair Dec 14, 2018

Collaborator

Makes sense. One issue here is that older versions of TF (< 1.7) do not support tf.executing_eagerly(). I made a change in #704 to address this by adding a new utility function. I think this should work for now.

@kuroko1t

kuroko1t Dec 18, 2018

Contributor

I see. I updated it.
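For completeness, a minimal sketch of the kind of version guard the #704 utility provides (the helper name here is hypothetical; see #704 for the actual function):

import tensorflow as tf


def _executing_eagerly():
    # tf.executing_eagerly() only exists in TF >= 1.7; older releases cannot
    # run eagerly at all, so report False there.
    if hasattr(tf, 'executing_eagerly'):
        return tf.executing_eagerly()
    return False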


class DistributedGradientTape(tf.GradientTape):
    """A tape that wraps another tf.GradientTape, using an allreduce to
    average gradient values before applying gradients to model weights."""

@tgaddair

tgaddair Nov 30, 2018

Collaborator

Please add comments for all the parameters.

@kuroko1t

kuroko1t Dec 10, 2018

Contributor

I added a comment.

        super(DistributedGradientTape, self).__init__(
            persistent=False, watch_accessed_variables=True)

    def __enter__(self, *args, **kwargs):

@tgaddair

tgaddair Nov 30, 2018

Collaborator

One option to cut down on some of this delegation: we can use a technique similar to what we do for PyTorch and Keras, where we dynamically create a new class by wrapping the input optimizer. On the flip side, that can require coupling your distributed tape to the state of the input tape, which may not be ideal.

@kuroko1t

kuroko1t Dec 14, 2018

Contributor

I changed it to create a new class.
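For reference, a rough sketch of the dynamic-class technique being discussed, loosely following the constructor arguments that appear in a later diff in this thread; it assumes a TF version whose GradientTape takes (persistent, watch_accessed_variables) and omits the device/compression options of the real implementation.

import tensorflow as tf
import horovod.tensorflow as hvd


class _DistributedGradientTape(tf.GradientTape):
    def __init__(self, persistent, watch_accessed_variables, tape):
        super(self.__class__, self).__init__(persistent, watch_accessed_variables)
        self._tape = tape

    def gradient(self, target, sources, output_gradients=None):
        gradients = super(self.__class__, self).gradient(
            target, sources, output_gradients)
        # Average each gradient across workers before handing it back.
        return [hvd.allreduce(grad) if grad is not None else grad
                for grad in gradients]


def DistributedGradientTape(gradtape):
    # Subclass the caller's tape class at runtime, graft the wrapper's methods
    # onto it, and rebuild the tape around the already-recorded state.
    cls = type(gradtape.__class__.__name__, (gradtape.__class__,),
               dict(_DistributedGradientTape.__dict__))
    return cls(gradtape._persistent, gradtape._watch_accessed_variables,
               gradtape._tape)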

Resolved (outdated): horovod/tensorflow/__init__.py

tgaddair requested a review from alsrgv on Nov 30, 2018

# from rank 0 to all other processes. This is necessary to ensure consistent
# initialization of all workers when training is started with random weights
# or restored from a checkpoint.
hvd.bcast(0, mnist_model.variables) if batch == 0 else None

@tgaddair

tgaddair Dec 1, 2018

Collaborator

I think you can move this up above the main loop. No need to track the broadcast in the chain rule.

@kuroko1t

kuroko1t Dec 11, 2018

Contributor

Thanks. I think this is where the model and its variables are loaded, so the broadcast needs to execute after this line.

kuroko1t added some commits Dec 9, 2018

@@ -97,6 +97,17 @@ def broadcast_global_variables(root_rank):
     return tf.group(*[tf.assign(var, broadcast(var, root_rank))
                       for var in tf.global_variables()])

+def bcast(root_rank, variables):

@tgaddair

tgaddair Dec 14, 2018

Collaborator

Can you rename this to broadcast_variables?

@kuroko1t

kuroko1t Dec 18, 2018

Contributor

Sure, I renamed it to broadcast_variables.

    def gradient(self, target, sources, output_gradients=None):
        gradients = super(self.__class__, self).gradient(target, sources, output_gradients)
        if size() > 1:
            averaged_gradients = []

@tgaddair

tgaddair Dec 14, 2018

Collaborator

Can you refactor this per the changes to DistributedOptimizer in #704? Basically, without using tf.contrib.eager.defun, this will be very slow.

@kuroko1t

kuroko1t Dec 18, 2018

Contributor

OK, I updated it.
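For reference, the shape of the defun-based refactor being requested looks roughly like this (modeled on the DistributedOptimizer changes in #704; the helper name and device handling are illustrative):

import tensorflow as tf
import horovod.tensorflow as hvd


def _make_allreduce_grads_fn(device=''):
    def allreduce_grads(grads):
        with tf.device(device):
            # Average every gradient across workers; pass Nones through.
            return [hvd.allreduce(grad) if grad is not None else grad
                    for grad in grads]

    if hasattr(tf, 'contrib') and hasattr(tf.contrib, 'eager'):
        # Compiling the per-gradient loop into a single graph function avoids
        # dispatching each allreduce op eagerly, which is what makes the naive
        # Python loop slow.
        return tf.contrib.eager.defun(allreduce_grads)
    return allreduce_grads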

dtypes = [tf.int32, tf.int64, tf.float16, tf.float32, tf.float64]
dims = [1, 2, 3]

def graph_and_eager(dtype, dim):

@tgaddair

tgaddair Dec 14, 2018

Collaborator

There's an annotation you can use to accomplish much the same thing:

from tensorflow.python.framework.test_util import run_in_graph_and_eager_modes

@kuroko1t

kuroko1t Dec 18, 2018

Contributor

Thanks, I changed it to use run_in_graph_and_eager_modes.
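To illustrate, the decorator-based version looks roughly like this (a placeholder test, not one of the PR's; depending on the TF version the decorator can also be applied without parentheses):

import tensorflow as tf
from tensorflow.python.framework.test_util import run_in_graph_and_eager_modes

import horovod.tensorflow as hvd


class MPITests(tf.test.TestCase):

    @run_in_graph_and_eager_modes()
    def test_horovod_allreduce_sum(self):
        # The sum of an all-ones tensor across workers should equal hvd.size().
        hvd.init()
        with tf.device("/cpu:0"):
            tensor = tf.ones([17, 17], dtype=tf.float32)
            summed = hvd.allreduce(tensor, average=False)
        diff = self.evaluate(tf.reduce_max(tf.abs(summed - tensor * hvd.size())))
        self.assertEqual(diff, 0.0)


if __name__ == '__main__':
    tf.test.main()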

@alsrgv
Collaborator

alsrgv left a comment

Thanks for the PR! Added a few comments as well.

Resolved: horovod/tensorflow/__init__.py
               dict(_DistributedGradientTape.__dict__))
    return cls(gradtape._persistent, gradtape._watch_accessed_variables,
               gradtape._tape,
               device_dense='', device_sparse='',

@alsrgv

alsrgv Dec 18, 2018

Collaborator

We should pass device_dense, etc. through correctly.

@kuroko1t

kuroko1t Dec 20, 2018

Contributor

I updated it.

Resolved: horovod/tensorflow/__init__.py

kuroko1t added some commits Dec 20, 2018

@alsrgv
Collaborator

alsrgv left a comment

Code largely looks good. Let's make sure it passes all the tests. Added a few comments.

Resolved (outdated): examples/tensorflow_mnist_eager.py
Resolved (outdated): examples/tensorflow_mnist_eager.py
Resolved: examples/tensorflow_mnist_eager.py
Resolved (outdated): horovod/tensorflow/__init__.py
Resolved (outdated): test/test_tensorflow.py
Resolved (outdated): horovod/tensorflow/__init__.py
Resolved (outdated): test/test_tensorflow.py
@alsrgv
Collaborator

alsrgv left a comment

Left a few comments.

def main(_):
    if not hasattr(tf, 'enable_eager_execution'):

@alsrgv

alsrgv Jan 3, 2019

Collaborator

We don't need this check in the example, just in the tests. We can skip running the example for old TF via an if statement in .travis.yml.

@alsrgv

alsrgv Jan 8, 2019

Collaborator

Let's get rid of this check.



if __name__ == '__main__':
    run_all_in_graph_and_eager_modes(MPITests)

@alsrgv

alsrgv Jan 3, 2019

Collaborator

I wonder... could we have a single IF here to test whether eager mode is available, and then use it, without changing the rest of the code?
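One way the single-IF idea could look (a sketch only; it assumes run_all_in_graph_and_eager_modes is the tensorflow.python.framework.test_util class decorator, and the test body is a stand-in):

import tensorflow as tf
from tensorflow.python.framework.test_util import run_all_in_graph_and_eager_modes


class MPITests(tf.test.TestCase):

    def test_placeholder(self):
        self.assertTrue(True)


if __name__ == '__main__':
    if hasattr(tf, 'enable_eager_execution'):
        # Eager mode is available (TF >= 1.7): run every test in both modes.
        MPITests = run_all_in_graph_and_eager_modes(MPITests)
    tf.test.main()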

@alsrgv

Collaborator

alsrgv commented Jan 8, 2019

@kuroko1t, thanks for the updates - left one more comment.

@alsrgv

alsrgv approved these changes Jan 8, 2019

Collaborator

alsrgv left a comment

LGTM, @tgaddair, could you take a look as well?

@tgaddair
Collaborator

tgaddair left a comment

LGTM!

alsrgv merged commit 65bb358 into uber:master on Jan 9, 2019

2 checks passed

continuous-integration/travis-ci/pr: The Travis CI build passed
license/cla: Contributor License Agreement is signed.

alsrgv referenced this pull request on Jan 10, 2019

Merged

Tensor fusion for allgather #732
