
RFC: Stateful Containers with tf.Module #56

Merged
merged 9 commits into tensorflow:master on Feb 21, 2019

Conversation

@tomhennigan (Member) commented Jan 17, 2019

This review is open for public comment until 2019-02-11

Status: Accepted
Author(s): DeepMind: tomhennigan@google.com, mareynolds@google.com; Google: agarwal@google.com, apassos@google.com, dberth@google.com
Sponsor: apassos@google.com
Updated: 2019-01-16

Summary

We propose a lightweight base class (tf.Module) for stateful containers in TensorFlow. This base class offers three main features:

  • Naming - variables and ops created inside modules are created within a name scope so they can easily be tracked back to their creator.
  • Variable tracking - variables referenced by a module (directly or via dependencies on other modules) can be retrieved via a simple API.
  • Module tracking - given a module, a simple API exposes all modules that it depends on.

Users are encouraged to subclass tf.Module to create their own stateful components, and we expect and encourage an ecosystem of third-party modules to evolve.

Thanks in advance for your feedback!
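For a concrete picture of the three features above, here is a minimal sketch of the intended usage (the Dense and MLP classes are illustrative, not part of the RFC text, and the exact variable names depend on the scoping rules discussed later in this thread):

```
import tensorflow as tf

class Dense(tf.Module):
  def __init__(self, input_features, output_features, name=None):
    super(Dense, self).__init__(name=name)
    with self.name_scope:  # opt-in name scoping for variables created here
      self.w = tf.Variable(
          tf.random.normal([input_features, output_features]), name='w')
      self.b = tf.Variable(tf.zeros([output_features]), name='b')

  @tf.Module.with_name_scope
  def __call__(self, x):
    return tf.matmul(x, self.w) + self.b

class MLP(tf.Module):
  def __init__(self, name=None):
    super(MLP, self).__init__(name=name)
    with self.name_scope:
      self.hidden = Dense(8, 16, name='hidden')
      self.out = Dense(16, 2, name='out')

  def __call__(self, x):
    return self.out(tf.nn.relu(self.hidden(x)))

mlp = MLP()
mlp(tf.ones([1, 8]))
print(len(mlp.variables))    # 4  -- variable tracking via object attributes
print(len(mlp.submodules))   # 2  -- module tracking
print(mlp.hidden.w.name)     # e.g. 'mlp/hidden/w:0' -- naming
```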

@ewilderj ewilderj changed the title RFC tf.Module RFC: Stateful Containers with tf.Module Jan 17, 2019
@ewilderj ewilderj added this to Needs attention in RFC management via automation Jan 17, 2019
@ewilderj ewilderj moved this from Needs attention to Open reviews in RFC management Jan 17, 2019

**`tf.Variable` creation and reuse**

- Modules should be able to create and reuse variables.

@alextp (Member) commented Jan 17, 2019

What do you mean by reuse here?

My understanding is that modules only create variables, and reuse is done by reusing modules, right?

@tomhennigan (Author, Member) commented Jan 18, 2019

Your understanding is correct. Variable creation (self.w = tf.Variable(..)) and reuse (tf.whatever(some_input, self.w)) are enabled via OOP (no magic).

We considered re-implementing variable_scope and get_variable but felt that the OOP design was a lot cleaner and more Pythonic.
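As an illustration of that OOP pattern (the Linear class below is hypothetical, not from the RFC): variables are created once as attributes, and "reuse" is simply reusing the object.

```
import tensorflow as tf

class Linear(tf.Module):
  def __init__(self, in_features, out_features, name=None):
    super(Linear, self).__init__(name=name)
    self.w = tf.Variable(
        tf.random.normal([in_features, out_features]), name='w')

  def __call__(self, x):
    return tf.matmul(x, self.w)   # reuse is attribute access, no get_variable

linear = Linear(4, 4)
y1 = linear(tf.ones([1, 4]))      # both calls share the same tf.Variable;
y2 = linear(tf.ones([1, 4]))      # sharing weights means sharing the object
assert len(linear.variables) == 1
```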

def apply_gradients(self, vars_and_grads):
  for variable, grad in vars_and_grads:
    variable.assign_sub(grad * self.learning_rate)

@alextp (Member) commented Jan 17, 2019

I'd like to comment here on the following points:

  1. Module should be the base class for tf.keras.Layer
  2. We explicitly chose not to force a call signature or build() strategy
  3. Pros and cons of the name Module (familiarity with pytorch being a big issue here)

@tomhennigan (Author, Member) commented Jan 18, 2019

Agreed re (1), and re (2) I call out why a bit in the detailed design if anyone wants more context.

Re (3) I feel strongly that Module is the right name and what users will expect. The term Module and its usage in ML is almost as old as I am, the oldest reference I know about (h/t @jekbradbury for the reference) is from SN/Lush [0]:

In Lush, building and training a complex system is done by assembling basic blocks, called modules. Modules are subclasses of the class gb-module.

[0] http://lush.sourceforge.net/lush-manual/51d13a16626fd68f.html

@awav commented Jan 17, 2019

@tomhennigan, That is awesome! It is very similar to what we have in GPflow - the Parameterized class. With this change in TensorFlow we will be able to serialize our objects (currently we have a custom saver) and integrate them with Keras models.

Additionally, the RFC could be extended with array-like modules, but don't confuse these with a Keras sequence. A Keras sequence implies dependencies between layers/modules, whereas an array-like module is a different approach for organising submodules, with indexing restricted to integers.

E.g. in GPflow we can construct a sum of kernels (modules) and represent it as a super-kernel class or a combination kernel. Essentially, a combination kernel is a list of kernels, and when we call, let's say, the __call__ method, it sums over the sub-kernels' results.

I'm happy to give more examples if needed. Thanks!

'custom_name/v:0'

Modules created inside methods of other modules inherit their parent module

@leandro-gracia-gil commented Jan 18, 2019

How would variable scopes be affected in graph mode? For example, let's say we have a module named 'bar' that creates a tf.Variable 'v' and a tf.Tensor 't' inside __call__. Then we call our module while inside a 'foo' name scope. What would the names be in that case? Maybe 'bar/v' for the tf.Variable and 'foo/bar/t' for the tf.Tensor?

@tomhennigan (Author, Member) commented Jan 18, 2019

Whilst the design works in TF1 we have designed with TF2 in mind. In TF2 there are no variable scopes (we use name scope to name variables and tensors). In eager mode tf.Tensor does not have a name property, although if you inspect the graph from a @tf.function you do see tensor names.

Module inherits the name_scope from when the module is constructed. I've run the following with TF1 and as you can see the 'use' name_scope makes no difference inside the module but is applied when using the result of calling the module:

class BarModule(tf.Module):
  def __call__(self):
    v = tf.Variable(1., name='v')
    t = tf.constant(1., name='t')
    return v, t

with tf.Graph().as_default():  # Enable TF1 graph mode.
  with tf.name_scope('create'):
    bar = BarModule()

  with tf.name_scope('use'):
    for i in range(2):
      print('iter', i)
      v, t = bar()
      x = v * t
      print('- v', v.name)
      print('- t', t.name)
      print('- x', x.name)
      print()
iter 0
- v create/bar_module/v:0
- t create/bar_module/t:0
- x use/mul:0

iter 1
- v create/bar_module/v_1:0
- t create/bar_module/t_1:0
- x use/mul_1:0
self.v = tf.Variable(1., name='v')
>>> m = MyModule()
>>> m.name

@leandro-gracia-gil commented Jan 18, 2019

What happens if the module name ends in '/'? Usually that is used to set absolute name scopes ignoring the current stack. Similarly, what happens if a child module does it?

@tomhennigan (Author, Member) commented Jan 18, 2019

In the experimental implementation of this I've enforced (in the constructor) that the module name is a valid Python 2 identifier [0], so / is not allowed.

[0] https://docs.python.org/2/reference/lexical_analysis.html#identifiers

@leandro-gracia-gil commented Jan 18, 2019

Thank you very much for this proposal! We have a remarkably similar class in our custom framework, where we use it as the base for all our own layers and composable networks. It makes me happy to see something like this being added to TF 2.0.

Given the similarity, I'll try to keep an eye on the review in case I notice anything that reminds me of problems we had developing or using our own approach, but so far I'd say it looks pretty good.

@guillaumekln commented Jan 18, 2019

I was under the impression that inheriting from tf.keras.layers.Layer was the recommended way to create stateful and reusable components in TensorFlow 2.0. This proposal makes it more nuanced. IMO it should clarify when one should extend tf.Module versus extend tf.keras.layers.Layer.

@tomhennigan (Author, Member) commented Jan 18, 2019

I was under the impression that inheriting from tf.keras.layers.Layer was the recommended way to create stateful and reusable components in TensorFlow 2.0. This proposal makes it more nuanced. IMO it should clarify when one should extend tf.Module versus extend tf.keras.layers.Layer.

If this RFC is accepted @alextp and I hope that we can make tf.keras.layers.Layer extend tf.Module (instead of Checkpointable as it does today). Module can help make the implementation of Layer simpler by implementing part of its API (e.g. the Layer.variables property). We have not attempted to do this yet and I'm not sure if there would be any technical blockers.

Conceptually I see Keras Layers as enriched Modules with additional features (e.g. the build(shapes)/call(inputs) pattern, json/yaml serialization, regularizers, losses, metrics, masking etc). If these features are useful to you, or if you are using your subclass in a Keras Model, then Keras Layer is a good choice.

If the additional functionality of Keras is not useful for you then you may prefer to subclass tf.Module which is simpler with far fewer features. If tf.Module is not useful you may want to subclass Checkpointable or just object 😄

@tomhennigan (Author, Member) commented Jan 18, 2019

@tomhennigan, That is awesome! It is very similar to what we have in GPflow - the Parameterized class. With this change in TensorFlow we will be able to serialize our objects (currently we have a custom saver) and integrate them with Keras models.

Serialization (as in saving/restoring parameters) should work great (since we extend Checkpointable). I think you'll have a better time integrating with Keras Model if you extend Layer instead. We're hoping to make tf.Module a base class for Keras Layer but that is not included as part of this proposal (see this comment for more context).

Additionally, the RFC could be extended with array-like modules, but don't confuse these with a Keras sequence. A Keras sequence implies dependencies between layers/modules, whereas an array-like module is a different approach for organising submodules, with indexing restricted to integers.

E.g. in GPflow we can construct a sum of kernels (modules) and represent it as a super-kernel class or a combination kernel. Essentially, a combination kernel is a list of kernels, and when we call, let's say, the __call__ method, it sums over the sub-kernels' results.

I think this should "just work". I tried creating a simple example, please let me know if I've completely misunderstood what you're trying to achieve:

class MulKernel(tf.Module):
  def __init__(self, m, dtype=tf.float32, name=None):
    super(MulKernel, self).__init__(name=name)
    self.m = tf.Variable(tf.constant(m, dtype=dtype), name='m')

  def __call__(self, x):
    return self.m * x

class ArrayModule(tf.Module):
  def __init__(self, kernels):
    super(ArrayModule, self).__init__()
    self.kernels = kernels

  def __call__(self, x):
    x = tf.convert_to_tensor(x, name='x')
    results = [kernel(x) for kernel in self.kernels]
    return tf.reduce_sum(results)

a = ArrayModule([MulKernel(i, name='mul_%d' % i) for i in range(10)])
y = a(1.)

Which produces this rather pretty graph:

[screenshot of the resulting graph]

@property
def variables(self):
"""Returns a tuple of variables owned by this module and it's submodules.

@superbobry (Member) commented Jan 18, 2019

I think list is a better fit conceptually, since the type in question is [Variable] rather than (Variable, Variable, ...). Maybe we could simply remove the type from the docstring or replace it with something less restrictive e.g. "collection"?

The same applies to other *variables properties below.

@tomhennigan (Author, Member) commented Jan 18, 2019

Thanks for the IRL discussion, I'll use "Collection" in the docstring and still return a tuple at least for now.

@josh11b (Member) commented Jan 18, 2019

In general, returning a tuple makes the API a bit more efficient since you don't have to return a copy to avoid callers mutating your internal data structures.
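A tiny illustration of this point (the Holder class is hypothetical and not tied to the TF API): if the internal storage is already a tuple, the property can hand it back directly, and callers cannot mutate internal state.

```
class Holder(object):
  def __init__(self):
    self._items = (1, 2, 3)   # internal storage is already immutable

  @property
  def items(self):
    return self._items        # safe to return as-is; no defensive copy needed

h = Holder()
assert 2 in h.items and len(h.items) == 3
# h.items[0] = 99  would raise TypeError: tuples do not support item assignment
```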

@tomhennigan (Author, Member) commented Jan 18, 2019

We agree returning a tuple is the right decision, but the signature/contract of the method should be less specific to allow this to change in future (e.g. if there is a faster immutable collection we might want to use that instead).

If we could include type hints in TF code I would go with the following:

@property
def variables(self) -> typing.Collection[tf.Variable]:
  ...

@chr1sj0nes (Contributor) commented Feb 4, 2019

Perhaps typing.Sequence would be better here? Importantly, the returned variables have a well-defined, deterministic ordering.

your base class.
- Encouraging a lazy pattern might make it harder to create modules with
multiple forward methods (e.g. a VAE module might be implemented with an
`encode` and `decode` method with no `__call__`).

@superbobry (Member) commented Jan 18, 2019

Would the users have to override __call__ with NotImplementedError in that case?

@tomhennigan (Author, Member) commented Jan 18, 2019

We prototyped it such that you didn't have to implement __call__ (e.g. __call__ was not an @abstractmethod), but if users called your module then we raise NotImplementedError. Even this has downsides though, like introspection/generated docs making it look like you had a __call__ method.
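For context, a sketch of the "multiple forward methods" case mentioned above (the TinyVAE class is made up for illustration): nothing requires a __call__ to exist, and variable tracking still works.

```
import tensorflow as tf

class TinyVAE(tf.Module):
  def __init__(self, latent_size=2, name=None):
    super(TinyVAE, self).__init__(name=name)
    self.enc_w = tf.Variable(tf.random.normal([4, latent_size]), name='enc_w')
    self.dec_w = tf.Variable(tf.random.normal([latent_size, 4]), name='dec_w')

  def encode(self, x):
    return tf.matmul(x, self.enc_w)

  def decode(self, z):
    return tf.matmul(z, self.dec_w)

vae = TinyVAE()
x_hat = vae.decode(vae.encode(tf.ones([1, 4])))   # no __call__ anywhere
assert len(vae.trainable_variables) == 2          # variables still tracked
```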

@gehring commented Jan 18, 2019

@tomhennigan, I agree 100% with this decision. There is no need to bloat the tf.Module class more than is absolutely required. Since the goal of this class is to handle stateful DAGs of modules, it is best to focus efforts on that.

As pointed out in this doc, there are many use cases where we would want to handle stateful DAGs but don't require a __call__ method. I see it as the responsibility of other libraries/sub-packages to extend the functionality of tf.Module. I find specialized classes much easier to extend and reason about.

each leaf. This means variables nested in properties will be found (e.g.
variables in `list`s or `tuple`s on the module).

This "walk" function is used to provide the following properties:

@superbobry (Member) commented Jan 18, 2019

Are these properties computed once per instance, or are they fully dynamic?

@tomhennigan (Author, Member) commented Jan 18, 2019

Fully dynamic; trying to reason about when to invalidate the caches would be non-trivial.

@gehring commented Jan 18, 2019

I'm thinking the performance cost of having this be dynamic should be well documented, so that users know to cache the results of these calls if they are using eager execution and need good performance. Though if everything works seamlessly with tf.function, then maybe this wouldn't be necessary, and encouraging users to use tf.function for better performance would be sufficient.

@tomhennigan (Author, Member) commented Jan 18, 2019

Agreed re documentation, I'll add a note in the docstrings.

For a sequential model of 100 linear layers it takes 2ms to collect all 200 parameters (at least on my workstation). This could be worse for modules with deep hierarchies or large nests.

Assuming the call to trainable_variables (or whatever) is inside the tf.function this cost is only paid when the function is traced (~once).
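A sketch of that point, with a made-up Scale module and a plain SGD update: reading trainable_variables inside a tf.function means the Python-side traversal happens when the function is traced, not on every step.

```
import tensorflow as tf

class Scale(tf.Module):
  def __init__(self, name=None):
    super(Scale, self).__init__(name=name)
    self.w = tf.Variable(2., name='w')

  def __call__(self, x):
    return self.w * x

model = Scale()
lr = 0.1

@tf.function
def train_step(x, y):
  with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x) - y))
  variables = model.trainable_variables      # walked at trace time (~once)
  grads = tape.gradient(loss, variables)
  for v, g in zip(variables, grads):
    v.assign_sub(lr * g)
  return loss

train_step(tf.constant([1., 2.]), tf.constant([2., 4.]))  # traces, then runs
```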

@awav commented Jan 18, 2019

@tomhennigan

I think you'll have a better time integrating with Keras Model if you extend Layer instead.

Our finest abstraction is not actually a deep learning model (is it correct that Keras is still a deep-learning-dedicated framework?), and not a model at all. tf.Module perfectly fits our needs. But in some cases these abstract blocks can be included in a layer composition, e.g. the head of the sequence is a CNN and the output of the CNN passes through a GP to get the final classification result, https://arxiv.org/abs/1511.02222.

If this RFC is accepted @alextp and I hope that we can make tf.keras.layers.Layer extend tf.Module (instead of Checkpointable as it does today). Module can help make the implementation of Layer simpler by implementing part of its API (e.g. the Layer.variables property). We have not attempted to do this yet and I'm not sure if there would be any technical blockers.

I'd say making Keras layers depend on tf.Module should be one of the priorities for this RFC - unification of interfaces in TensorFlow. It lowers the barrier for users who inherit from tf.Module and want to take advantage of Keras features.

Serialization (as in saving/restoring parameters) should work great (since we extend Checkpointable)

That's great!

I think this should "just work". I tried creating a simple example, please let me know if I've completely misunderstood what you're trying to achieve: ...

If tf.Module can find submodules in an attribute that is actually a list, and properties like variables, trainable_variables, etc. return the correct objects, then the problem is solved.

@tomhennigan (Author, Member) commented Jan 18, 2019

@awav

I'd say making Keras layers depend on tf.Module should be one of the priorities for this RFC - unification of interfaces in TensorFlow.

We've been mindful to design tf.Module in such a way that it could be a stateful container for a number of components in TensorFlow (not just tf.keras), but I think integrating it well with tf.keras requires a separate design/RFC process. I'm not planning to address that here.

If tf.Module can find submodules in an attribute that is actually a list, and properties like variables, trainable_variables, etc. return the correct objects, then the problem is solved.

Yep, this works as you would expect (we use tf.contrib.framework.nest.flatten(prop) for each object property and walk all leaves).

@martinwicke (Member) commented Jan 18, 2019 (comment minimized)

@tomhennigan (Author, Member) commented Jan 18, 2019

Or Iterable? We could adjust the docstring now.

Collection is better since it indicates the returned value supports __contains__ and __len__ (Iterable just indicates it has __iter__). Agreed updating the docstring is clearest.

Come 2020, we will drop Py2 support, and all this typing can actually
happen for real.

Can't wait 😄 Count me in on a fixit to go through and add types!
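For what it's worth, a small sketch of the distinction being discussed (function names here are hypothetical): typing.Collection guarantees __len__ and __contains__ in addition to iteration, whereas typing.Iterable only guarantees __iter__.

```
from typing import Collection, Iterable, TypeVar

T = TypeVar('T')

def count_variables(variables: Collection[T]) -> int:
  return len(variables)            # a Collection supports len() and `in`

def first_variable(variables: Iterable[T]) -> T:
  return next(iter(variables))     # an Iterable only promises iteration
```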

tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 21, 2019
Addresses comments from tensorflow/community#56.

PiperOrigin-RevId: 230225963
@sguada (Member) commented Jan 25, 2019

+1 for this proposal, it strikes a great balance between abstraction and usability.


**`tf.Module` tracking**

- Modules should track dependencies (e.g. a sequential should provide access

@lc0 commented Jan 30, 2019

Something that started on twitter -> https://twitter.com/lc0d3r/status/1090351739537145858

Since we are trying to also

track dependencies (e.g. a sequential should provide access to its layers) and support checkpointing as a graph.

Would it make sense to think about the next step and have, at the model level, an approachable way to access the DAG?

The idea is that once you inherit from tf.Module you can run something similar to Keras model.summary and, even more exciting, model.plot_model.

Sorry for the slightly confusing way of formulating this suggestion.

@tomhennigan (Author, Member) commented Jan 30, 2019

Hey @lc0, thanks for the suggestion! We've been thinking about something similar. I'm not sure if we should solve this inside tf.Module or if this is something we should leave to framework authors, what do you think?

FWIW the current module APIs expose a lot of useful information to do a modest job of this outside of Module:

import tabulate

def weight_size(m):
  size = 0
  for v in m.variables:
    size += v.numpy().nbytes

  suffix = 'B'
  suffixes = ['KB', 'MB', 'GB', 'TB']
  while size > 1024 and suffixes:
    size /= 1024
    suffix = suffixes.pop(0)
  return '%d %s' % (size, suffix)

def summarise(module):
  rows = []
  for m in module.submodules:
    row = [m.name, m.__class__.__name__, weight_size(m)]
    rows.append(row)
  print(tabulate.tabulate(rows, headers=['attr', 'class', 'size'], tablefmt='grid'))
+----------+---------+--------+
| attr     | class   | size   |
+==========+=========+========+
| linear_0 | Linear  | 8 KB   |
+----------+---------+--------+
| linear_1 | Linear  | 4 MB   |
+----------+---------+--------+
| linear_2 | Linear  | 4 MB   |
+----------+---------+--------+
| linear_3 | Linear  | 40 KB  |
+----------+---------+--------+

@episodeyang commented Sep 16, 2019

The table seems excessive. Here is something with less chrome:

 attr      class    size  
-------------------------
 linear_0  Linear   8 KB   
 linear_1  Linear   4 MB   
 linear_2  Linear   4 MB   
 linear_3  Linear   40 KB  

Alternatively, you can use the PyTorch pattern, which IMO is cleaner:

YourFantasticModel: (
    (linear_0) Linear(100, 20, 100) 8 KB
    (linear_1) Linear(20, 40, 100) 3.2 KB
    (linear_2) Linear(40, 20, 100) 3.2 KB
    (linear_3) Linear(20, 20, 100) 1.6 KB
)
>>> m.name_scope.name
'my_module/'
>>> m.v.name
'my_module/v:0'

@pluskid commented Feb 1, 2019

Are the variable names (with scope) attached upon variable creation? An alternative might be the PyTorch way, where a variable does not know about its (parent's) scope, and the full scoped name is only constructed on the fly depending on which parent module we are dealing with (because a module can be reused in different parent modules, and they might even be constructed externally and passed in via the constructor of a custom module).

I'm thinking about the following pseudocode example:

class FunnyAutoEncoder(tf.Module):
  def __init__(self, encoder, decoder, name=None):
    super(FunnyAutoEncoder, self).__init__(name=name)
    self.encoder = encoder
    self.decoder = decoder

  def __call__(self, x):
    z = self.encoder(x)
    z2 = tf.funny_op(z)
    return self.decoder(z2)


encoder = MLP(...)
decoder = MLP(...)
ae = FunnyAutoEncoder(encoder, decoder)

then ae.encoder.layerX.w.name will probably not show anything related to being under an auto-encoder? Instead, it will just be under a top-level 'mlp/' scope?

I think a major scenario for having full scoped names available is to iterate through all the variables in a module and do some inspection/processing of each variable according to its name. In this case, it might be more useful to have the (scoped) names reflect what role each variable plays in the module.

However, I guess one drawback might be detecting potential duplication. For example, if one does this:

encoder = ResidualBlock(...)
decoder = encoder
ae = FunnyAutoEncoder(encoder, decoder)

then the same variables might be listed twice under different (scoped) names. But I guess this situation is rather rare, and it is probably better to explicitly define a FunnyAutoEncoderSharedEncoderDecoder that has only one encoder_and_decoder field.

@tomhennigan (Author, Member) commented Feb 1, 2019

So currently the proposal is that name scopes are determined when the module is constructed, and from that point they are fixed per module. For name based filtering you can do something like this:

encoder = MLP(..., name='encoder')
decoder = MLP(..., name='decoder')
ae = FunnyAutoEncoder(encoder, decoder)

encoder_vars = encoder.variables
# .. or ..
encoder_vars = [v for v in ae.variables if v.name.startswith('encoder/')]

It's worth pointing out that you can also enter a name scope and build modules inside it if you want to group a set of modules that way:

with tf.name_scope('encoder'):
  encoder = MLP(...)  # name_scope will be `encoder/mlp/`
decoder = MLP(..., name='decoder')  # name_scope := `decoder/`
ae = FunnyAutoEncoder(encoder, decoder)  # name_scope := `funny_auto_encoder/`

Is that flexible enough for your use cases? We've had a similar pattern in Sonnet and I've not heard any complaints from users about it, but perhaps there's something I'm missing!

@pluskid commented Feb 1, 2019

@tomhennigan Thanks! I'm not sure if it is because I'm too used to the other way of thinking. Let me state the use cases in my mind and we can see if they are easy to do in the current proposal. I think the main inflexibility of the current approach is that, in order to have a meaningful name scope hierarchy, we can only construct the model top-down according to the hierarchy. It will be difficult, for example, to construct some part of the component (e.g. the encoder from the example above) without knowing beforehand what that component is going to be used for (i.e. in your example, you explicitly pass the name='encoder' argument, indicating you know that it is going to be used as an encoder).

This problem might be more pronounced when one wants to write a generic library of commonly used (sub)nets. For example, library1 (say Sonnet 2) has a Module that defines a default ResNet implementation. Library2 is a general RL training framework that allows one to pass in a Module to be used for each specific component (e.g. the vision part of the agent). Now as the user, if I want to use the ResNet in the vision part of the agent, I might need some hack to figure out what name to pass in when constructing the ResNet to make it consistent with the naming hierarchy in the agent model. Moreover, if the agent is grouping all its vision code under the name scope agent/vision, then I cannot pass agent/vision as the name= parameter to ResNet because it is not a proper Python variable name.

My guess is that this is not a problem in Sonnet (v1) because Sonnet (v1) is still a graph-based library with an explicit build() function. In other words, when you create the Python object, the underlying graph building (and therefore the naming and scoping) has not been called yet. Therefore, one can still change the name scoping when build is actually called. Here is some pseudo-code of my understanding of what might happen in Sonnet v1:

class Agent(...):
  def __init__(self, vision_net, ...):
    self.vision_net = vision_net
    # ....

  def build(self, ...):
    with tf.name_scope(self.name + '/vision'):
      self.vision_net.build(...)

On the other hand, with the current proposal, users could (and, it seems, are recommended to) create variables in the Python object's __init__ function, which will attach a name to each variable upon creation. Therefore, one needs to know the exact name of the parent hierarchy when creating that Python object. Once the Module object is created, the parent module can no longer re-wire its name scope as in the case of Sonnet V1. Is my understanding correct here?

@brilee (Member) commented Feb 5, 2019

I agree with @pluskid that dependency injection should at least be possible. My preference is for DI to be the recommended way to build up modules, but I'd understand if others disagreed.

@tomhennigan (Author, Member) commented Feb 8, 2019

Thanks for the IRL discussions @pluskid, @brilee, @superbobry and @chr1sj0nes.

To summarize our discussions there are many cases where there are multiple paths to weights in a model (e.g. model.encoder.w might be the same instance as model.decoder.w) and this is somewhat confusing since in TensorFlow variables have a single, immutable name (e.g. w's name would be encoder/w:0 in both cases if encoder created it).

Perhaps the "right" solution is to remove the name from variables in Python, and instead use Python variable names to "name" variables. Sadly this is not in scope for TensorFlow 2, and we agreed that the current strategy of variable naming at creation time was OK for simple cases and better than not naming.

We discussed that the Python variables or object structure can be used instead to associate a path with each parameter, and in the above case you would have two paths to w. We agreed that it would be convenient to have some easy way to traverse the structure of arbitrary Modules using some simple protocol (e.g. somewhere inside the model there is a property called encoder and somewhere inside the model there is a property called decoder) to filter these paths.

In tensorflow/tensorflow@1d5fd92 I added with_path to Module's _flatten method. This allows you to do exactly that. For example you can find paths in some arbitrary model to all tf.Variables inside a (possibly deeply nested) Module in a property called encoder:

>>> class MyModule(tf.Module):
...   @property
...   def state_dict(self):
...     return dict(self._flatten(
...         predicate=lambda v: isinstance(v, tf.Variable), with_path=True))

>>> mod = MyModule()
>>> mod.encoder = Encoder()
>>> mod.decoder = mod.encoder

>>> mod.state_dict
{('encoder', 'w'): <tf.Variable: ...>,
 ('decoder', 'w'): <tf.Variable: ...>}

>>> {k: v for k, v in mod.state_dict.items() if 'encoder' in k}
{('encoder', 'w'): <tf.Variable: ...>}

I hope that helps 😄. h/t @gehring who was asking about how to determine "depth" in another comment. I think this with_path option to _flatten (or the recursive flag) would help solve that too.
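For the depth question specifically, here is a hypothetical sketch (not part of the proposal) of how with_path could be used: the length of each path is a proxy for how deeply nested a variable is inside the module tree.

```
import tensorflow as tf

class DepthAware(tf.Module):
  @property
  def variable_depths(self):
    # Maps each path (a tuple of attribute names / indices) to its length.
    return {path: len(path)
            for path, v in self._flatten(
                predicate=lambda v: isinstance(v, tf.Variable),
                with_path=True)}
```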

... self.b = tf.Variable(tf.zeros([output_features]), name='b')
...
... def __call__(self, x):
... x = tf.convert_to_tensor(x, name='x')

@pluskid commented Feb 1, 2019

I feel it is important to avoid having users call tf.convert_to_tensor every time they define their own __call__ function. One might want to define an implicit pre-call hook to do the conversion automatically. Or a better approach might be to explicitly require the API to only accept Tensors (or nested tuples of tensors) as inputs, and ask the user to do an explicit conversion before calling from the top level.

This is because many of the modules will probably be used as building blocks in bigger architectures (e.g. residual blocks inside a ResNet). They will not directly interact with the user most of the time, so there is probably no need for them to call convert_to_tensor every time, but it might be better for the API to enforce consistent behavior.

@gehring commented Feb 1, 2019

While I agree with you that it would be good for any callable subclass of tf.Module to have some automatic conversion to tensor (e.g., in an implementation of Layer 2.0), this is beyond the scope of tf.Module. The Dense class example only serves to demonstrate how one would subclass tf.Module. Let me try to lay out what I understand the purpose of tf.Module to be and what the reason for its limited scope is. (I am not affiliated with the TF team)

After the shift to object-oriented storage of tf.Variables, variable reuse is done by storing variables within object attributes. It would be impractical and clunky to store all variables in a unique container object (which would essentially recreate TF 1.0's collections). To avoid this, we prefer to have variables contained within the objects that would use them, e.g., kernel and bias variables as attributes of the Dense instance that uses them. While this new way of organizing variables has many advantages, it does make checkpointing and operating on the "global" variables more difficult (e.g., collecting variables for gradient descent).

By design, tf.Module is lightweight and focuses on addressing the challenges of managing nested stateful objects. Its sole purpose is to abstract away the DAG of modules for the purpose of operating on all relevant variables. This allows us to act like our top-level/root module explicitly contains all our variables. This is done by providing traversal methods which collect all the variables. As a bonus, tf.Module automatically handles the complexities of checkpointing when variables are allowed to be created before and after the checkpoint is loaded, e.g., hydration and instantiation of variables.

By limiting tf.Module's responsibilities to managing nested stateful modules, we end up with code that is easy to understand, document and extend. As a result, any developer wishing to implement some stateful containers can easily do so without intricate knowledge of how best to manage "global" state by subclassing tf.Module. Since handling stateful DAGs of modules is a very common requirement, tf.Module could easily be used by most sub-packages and libraries with stateful operations, e.g., tf.train, tf.metrics, tf.estimator. (Note, this specific proposal is arguing for the existence/inclusion of tf.Module in TF 2.0. The advantages of using tf.Module as a base class throughout tensorflow will likely be examined on case by case basis in a separate proposal.)
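To make that concrete, a small sketch (class names made up) of treating the root module as if it contained all of its variables, including checkpointing through it; this relies on tf.Module extending the trackable/Checkpointable base as the RFC states.

```
import tensorflow as tf

class Leaf(tf.Module):
  def __init__(self, name=None):
    super(Leaf, self).__init__(name=name)
    self.v = tf.Variable(1., name='v')

class Root(tf.Module):
  def __init__(self, name=None):
    super(Root, self).__init__(name=name)
    self.left = Leaf(name='left')
    self.right = Leaf(name='right')

root = Root()
print([v.name for v in root.variables])   # variables from both leaves
ckpt = tf.train.Checkpoint(model=root)
path = ckpt.save('/tmp/root_ckpt')        # saves the whole module tree
ckpt.restore(path)                        # ...and restores it
```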

@pluskid commented Feb 1, 2019

@gehring Yes, I agree with you. I don't think doing automatic conversion is a good idea. But enforcing that the inputs are Tensors is probably good (even if only implicitly, as a statement in the documentation). Anyhow, this is a minor thing in the whole proposal.

@tomhennigan (Author, Member) commented Feb 1, 2019

+1 to what @gehring says. The decision of whether to call convert_to_tensor is entirely up to whoever subclasses tf.Module, there is no requirement to do so 😄

@awav commented Feb 4, 2019

I'm strongly against auto-conversion to tensors and agree with @tomhennigan; it is up to the scientist, engineer or anyone else what to do with non-tensor types.

@awav awav referenced this pull request Feb 2, 2019
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 4, 2019
I've made this "private" since this exposes the implementation of modules, which
in typical Python code does not represent part of the classes public API. As
such I think having the leading _ makes it clear (at least to the linter) that
you're using an "internal" method. As always this is a "sign" not a cop.

As a concrete example, I strongly encourage the following usage:

    >>> class FooModule(...):
    ...   @property
    ...   def hash_tables(self):
    ...     return tuple(self._flatten(predicate=IS_HASH_TABLE))

But in general we should discourage usage outside of modules:

    >>> m = FooModule()
    >>> hash_tables = tuple(m._flatten(predicate=IS_HASH_TABLE))  # Naughty.

c.f. tensorflow/community#56

PiperOrigin-RevId: 232246092
A Member left a review comment:

Love the proposal.


- Each module instance must have a name scope. This name should include the
name scope from when the module is constructed.
- Every method on the module must enter this name scope before calling user

@brilee (Member) commented Feb 5, 2019

Name scopes aren't reentrant; do you have a plan for how we can avoid creating ops with confusing names like 'mymodule/mymodule/mymodule/relu_2'?

@tomhennigan (Author, Member) commented Feb 5, 2019

In eager mode op names are not a thing, so that's good.

For tf.function, names are uniquified within the function body, so you would only get an _N suffix if there were actually two relus within the same module (or you called your module multiple times sequentially).

As a nit, scope names are re-entrant (as in you can do with tf.name_scope("foo/"): ... as many times as you like) but op names need to be unique.

Example: https://gist.github.com/tomhennigan/a183a6570d2a6fb9bce08758ee2d78b6
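A small sketch of that nit, under the assumption of TF2-style APIs building into a graph (so that op names are observable): a scope name ending in '/' is treated as fully qualified and can be re-entered, while op names inside it are uniquified.

```
import tensorflow as tf

g = tf.Graph()
with g.as_default():
  with tf.name_scope('foo/'):       # fully qualified, re-enterable
    a = tf.constant(1., name='x')   # -> 'foo/x'
  with tf.name_scope('foo/'):       # same scope again, no 'foo_1'
    b = tf.constant(2., name='x')   # -> 'foo/x_1' (op name uniquified)
print(a.name, b.name)
```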

@brilee (Member) commented Feb 5, 2019

Thanks for the correction. I see now that this works, although I'm still a bit confused at where the _1 suffix comes from.

@tf.function
def f(x):
  name_scope = tf.name_scope('blah')
  with name_scope:
    with name_scope:
      return x + 1

g = f.get_concrete_function(tf.TensorSpec([1, 64], tf.float32)).graph
for node in g.as_graph_def().node:
  print(node.name)
  
...
x
blah_1/add/y
blah_1/add
Identity

@tomhennigan (Author, Member) commented Feb 5, 2019

In this case you're creating the name scope in graph mode (inside the function), and in graph mode name scopes are de-duped. I guess the function is being traced twice (hence 'blah' followed by 'blah_1'). If you use a fully qualified name scope (e.g. tf.name_scope('blah/')) then the add op will be de-duped, not the name scope.

With tf.Module we use the module name to create a unique scope in the constructor [0] and then re-enter the fully qualified name whenever you use with module.name_scope: ... [1].

This is an interesting point though: in eager mode name scopes are not de-duped but in graph mode they are. We have a test explicitly defending this behavior [2], and I expect in TF2 almost all users to create modules outside a tf.function (aka eagerly) and to call them inside tf.functions.

[0] https://github.com/tensorflow/tensorflow/blob/e0585bc351b19da39610cc20f6d7622b439dca4d/tensorflow/python/module/module.py#L176-L177
[1] https://github.com/tensorflow/tensorflow/blob/e0585bc351b19da39610cc20f6d7622b439dca4d/tensorflow/python/module/module.py#L194-L198
[2] https://github.com/tensorflow/tensorflow/blob/e0585bc351b19da39610cc20f6d7622b439dca4d/tensorflow/python/module/module_test.py#L92-L109
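A sketch of that expected TF2 pattern (the Affine class and forward function are illustrative): the module, and hence its name scope, is created eagerly, and the module is then called from inside a tf.function where its ops are traced.

```
import tensorflow as tf

class Affine(tf.Module):
  def __init__(self, name=None):
    super(Affine, self).__init__(name=name)
    self.w = tf.Variable(3., name='w')
    self.b = tf.Variable(0.5, name='b')

  def __call__(self, x):
    return self.w * x + self.b

affine = Affine()              # constructed eagerly; scope fixed here

@tf.function
def forward(x):
  return affine(x)             # ops created when the function is traced

print(forward(tf.constant(2.)))
```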


tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 5, 2019
Addresses feedback on tensorflow/community#56.

PiperOrigin-RevId: 232457135
Thanks brilee@ for spotting!
@googlebot googlebot added the cla: yes label Feb 5, 2019
tomhennigan added 2 commits Feb 5, 2019
This wasn't ever required and including it might cause confusion.

h/t pluskid@ gehring@ and awav@
yield variables inside nested structures (lists etc) but not in other
modules.
"""
@property
def submodules(self):

@gehring commented Feb 5, 2019

@tomhennigan Is this done depth first or breadth first? I feel like it could be a good thing for tf.Module to offer a way to infer the depth of submodules/variables. Without owned_*, what would be the best way for a user to do this without re-implementing tf.Module's logic?

@tomhennigan (Author, Member) commented Feb 5, 2019

Breadth first.

So there are two options (frustratingly some recent commits haven't been sync'd to GitHub so I can't just point you at the code!).

Firstly, we've made it really easy for subclasses to implement this if they want by exposing _flatten:

class MyModule(tf.Module):
  @property
  def owned_submodules(self):
    return tuple(self._flatten(recursive=False, predicate=lambda v: isinstance(v, tf.Module)))

I also discussed with brilee@ and pluskid@ earlier about their feedback on variable names. We were aligned that what they both wanted was not really Variable.name, but rather the path to all variables (e.g. allowing the same variable instance to be reachable via multiple paths). We're planning to add an option to _flatten to allow flattening with paths. So for example in the case where you have:

encoder = Encoder()
decoder = encoder
auto = Auto(encoder, decoder)

You can implement variables_and_paths in one line with self._flatten(..., with_path=True):

>>> auto.variables_and_paths()
{"encoder.layers[0].w": <tf.Variable ...>,
 "encoder.layers[1].w": <tf.Variable ...>, 
 "decoder.layers[0].w": <tf.Variable ...>,
 "decoder.layers[1].w": <tf.Variable ...>}

It seems to me like this is another way of representing depth. WDYT?

@tomhennigan (Author, Member) commented Feb 5, 2019

Actually the CL has been pushed 😄, _flatten is in tensorflow/tensorflow@5076adf

@gehring commented Feb 6, 2019

I didn't have a particular use case in mind, but I believe it is likely that someone will require that type of info (if not simply for human readability). The with_path solution is lightweight and should pretty much cover any use cases related to the topology of the variables/sub-modules. I'm 100% on board, nice job!

@tomhennigan (Author, Member) commented Feb 8, 2019

FYI with_path is in tensorflow/tensorflow@1d5fd92 😄

tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 7, 2019
```
>>> class MyModule(tf.Module):
...   @property
...   def state_dict(self):
...     return dict(self._flatten(
...         predicate=lambda v: isinstance(v, tf.Variable), with_path=True))

>>> mod = MyModule()
>>> mod.encoder = Encoder()
>>> mod.decoder = mod.encoder
>>> mod.state_dict
{('encoder', 'w'): <tf.Variable: ...>,
 ('decoder', 'w'): <tf.Variable: ...>}
```

h/t tensorflow/community#56

PiperOrigin-RevId: 232908045
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 8, 2019
An issue with decorating methods in the metaclass is that when users annotate
methods with decorators that return Objects (e.g. `tf.function`) they lose the
ability to access methods on those objects (other than call) because our
`with_name_scope` decorator is "on the outside".

A simple answer to this from other decorators is to copy the __dict__ from the
wrapped object to the wrapper (or use `__getattr__`). However in our case this
is not safe, since properties like `get_concrete_function` on `Function` would
need special casing (so the returned callable also entered the module name
scope before tracing).

After a bunch of discussion (thanks superbobry@ and chr1sj0nes@!) about how we
could do this in horrible ways, it seems that the cleanest solution is to allow
a Function to be rewrapped (iff it has not been traced). This pushes the
decorator down the stack of decorators and allows tf.Module to preserve
@tf.function annotated methods as Functions.

h/t tensorflow/community#56

PiperOrigin-RevId: 233034181
See tensorflow/tensorflow@5076adf for more context.
tomhennigan added 2 commits Feb 9, 2019
Thanks k-w-w@!
@ewilderj ewilderj moved this from Open reviews to Accepted RFCs in RFC management Feb 15, 2019
Returns:
A collection of variables for the current module (sorted by attribute name)
followed by variables from all submodules recursively (depth first).
"""

@MarkDaoust commented Feb 15, 2019

I'm a bit late to the party, but I don't see this in the discussion anywhere:

Keras layers and models implement a trainable flag as well.
When set to False, none of the variables in the module or its submodules are returned by trainable_variables.

The current implementation of Module does not recognize a module-level trainable flag; it's not clear if that's intentional or an omission.

Looking at the implementation... it would require a separate predicate for recursion.

This seems like a minor change that could reduce surprises and increase compatibility, especially with the plan to make Layer and Model subclasses of Module.

WDYT?

@tomhennigan (Author, Member) commented Feb 15, 2019

This is intentional. Module uses the immutable "trainable" bit on tf.Variable to determine which variables are trainable. If you want to implement layer freezing like Keras does for a Module subclass, you can do:

class KerasLike(tf.Module):

  def __init__(self, trainable=True, name=None):
    super(KerasLike, self).__init__(name=name)
    self.trainable = trainable

  @property
  def trainable_variables(self):
    if not self.trainable:
      return ()
    return super(KerasLike, self).trainable_variables

@MarkDaoust commented Feb 21, 2019

Great. Thanks for the clarifications.

@ewilderj ewilderj merged commit 2c16981 into tensorflow:master Feb 21, 2019
1 check passed
cla/google All necessary CLAs are signed
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 25, 2019
After attempting to integrate `tf.Module` into existing codebases (e.g.
`tf.keras`) we've found that the automatic name scoping is too invasive (e.g.
changing op and variable names) and it is desirable to disable it ~everywhere.

We propose that name scoping for `tf.Module` becomes opt-in:

>>> class MyModule(tf.Module):
...
...   @tf.Module.with_name_scope
...   def auto_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       self.w = tf.Variable(1., name='w')
...     return x * self.w
...
...   def manual_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       with self.name_scope:
...         self.w = tf.Variable(1., name='w')
...     return x * self.w
...
...   def no_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       self.w = tf.Variable(1., name='w')
...     return x * self.w

We will move opt-out name scoping into Sonnet:

>>> class MyModule(snt.Module):
...
...   def auto_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       self.w = tf.Variable(1., name='w')
...     return x * self.w
...
...   @snt.no_name_scope
...   def no_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       self.w = tf.Variable(1., name='w')
...     return x * self.w

In TF2 name scopes are cosmetic and this should be less of a big deal. We might
consider encouraging users who want to filter on names to instead use flatten
to extract a state dictionary for their objects (c.f.
tensorflow/community#56 (comment)).

I have moved the automatic name scoping logic (Metaclass etc) and associated
tests into Sonnet 2.

PiperOrigin-RevId: 235540184
karllessard added a commit to karllessard/community that referenced this pull request May 10, 2019

* RFC: Stateful Containers with tf.Module (tensorflow#56)

* Create 20190117-tf-module.md

* Update 20190117-tf-module.md

* Loosen return type for variable properties.

* Use Dense consistently.

Thanks brilee@ for spotting!

* Remove convert_to_tensor from examples.

This wasn't ever required and including it might cause confusion.

h/t pluskid@ gehring@ and awav@

* Remove owned_* methods.

* Document `_flatten`

See tensorflow/tensorflow@5076adf for more context.

* Fix typo in module name.

Thanks k-w-w@!

* Update 20190117-tf-module.md

sonnet-copybara pushed a commit to deepmind/sonnet that referenced this pull request May 29, 2019
h/t. tensorflow/community#56

PiperOrigin-RevId: 250436499
Change-Id: I9332d90f6331756f12907566af9a9e23f86aded8