
RFC: Stateful Containers with tf.Module #56

Merged
merged 9 commits into from Feb 21, 2019

Conversation

tomhennigan
Member

@tomhennigan tomhennigan commented Jan 17, 2019

This review is open for public comment until 2019-02-11

Status: Accepted
Author(s): DeepMind: tomhennigan@google.com, mareynolds@google.com; Google: agarwal@google.com, apassos@google.com, dberth@google.com
Sponsor: apassos@google.com
Updated: 2019-01-16

Summary

We propose a lightweight base class (tf.Module) for stateful containers in TensorFlow. This base class offers three main features:

  • Naming - variables and ops created inside modules are created within a name scope so they can easily be tracked back to their creator.
  • Variable tracking - variables referenced by a module (directly or via dependencies on other modules) can be retrieved via a simple API.
  • Module tracking - given a module, a simple API exposes all modules that it depends on.

Users are encouraged to subclass tf.Module to create their own stateful components, and we expect and encourage an ecosystem of third-party modules to evolve.
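For illustration, here is a minimal sketch (not taken from the RFC text; the Dense/MLP classes and attribute names are just examples) of the kind of subclass this enables and how the three features surface:

```
import tensorflow as tf

class Dense(tf.Module):
  """An illustrative dense layer built on tf.Module."""

  def __init__(self, input_features, output_features, name=None):
    super(Dense, self).__init__(name=name)
    with self.name_scope:  # Naming: variables get the module's name scope prefix.
      self.w = tf.Variable(
          tf.random.normal([input_features, output_features]), name='w')
      self.b = tf.Variable(tf.zeros([output_features]), name='b')

  def __call__(self, x):
    return tf.matmul(x, self.w) + self.b

class MLP(tf.Module):
  def __init__(self, name=None):
    super(MLP, self).__init__(name=name)
    self.hidden = Dense(10, 20)
    self.out = Dense(20, 1)

  def __call__(self, x):
    return self.out(self.hidden(x))

mlp = MLP()
assert len(mlp.variables) == 4    # Variable tracking walks submodules too.
assert len(mlp.submodules) == 2   # Module tracking exposes dependencies.
```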

Thanks in advance for your feedback!

@ewilderj ewilderj changed the title RFC tf.Module RFC: Stateful Containers with tf.Module Jan 17, 2019
@ewilderj ewilderj added the RFC: Proposed RFC Design Document label Jan 17, 2019
@ewilderj ewilderj added this to Needs attention in RFC management via automation Jan 17, 2019
@ewilderj ewilderj moved this from Needs attention to Open reviews in RFC management Jan 17, 2019

**`tf.Variable` creation and reuse**

- Modules should be able to create and reuse variables.
Contributor

What do you mean by reuse here?

My understanding is that modules only create variables, and reuse is done by reusing modules, right?

Member Author

Your understanding is correct. Variable creation (self.w = tf.Variable(..)) and reuse (tf.whatever(some_input, self.w)) are enabled via OOP (no magic).

We considered re-implementing variable_scope and get_variable but felt that the OOP design was a lot cleaner and more Pythonic.
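To make the distinction concrete, a small sketch (illustrative, not from the RFC): creation happens once in __init__, and reuse is just attribute access on subsequent calls.

```
import tensorflow as tf

class Linear(tf.Module):
  def __init__(self, n_in, n_out, name=None):
    super(Linear, self).__init__(name=name)
    # Creation: a plain attribute holding the variable.
    self.w = tf.Variable(tf.random.normal([n_in, n_out]), name='w')

  def __call__(self, x):
    # Reuse: ordinary attribute access, no variable_scope/get_variable.
    return tf.matmul(x, self.w)

layer = Linear(8, 8)
y = layer(layer(tf.ones([1, 8])))  # Reusing == calling the same object again.
assert len(layer.variables) == 1   # Still a single weight.
```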

def apply_gradients(self, vars_and_grads):
  for variable, grad in vars_and_grads:
    variable.assign_sub(grad * self.learning_rate)
```
Contributor

I'd like to comment here on the following points:

  1. Module should be the base class for tf.keras.Layer
  2. We explicitly chose not to force a call signature or build() strategy
  3. Pros and cons of the name Module (familiarity with pytorch being a big issue here)

Member Author

Agreed re (1), and re (2) I call out why a bit more in the detailed design if anyone wants more context.

Re (3) I feel strongly that Module is the right name and what users will expect. The term Module and its usage in ML is almost as old as I am, the oldest reference I know about (h/t @jekbradbury for the reference) is from SN/Lush [0]:

In Lush, building and training a complex system is done by assembling basic blocks, called modules. Modules are subclasses of the class gb-module.

[0] http://lush.sourceforge.net/lush-manual/51d13a16626fd68f.html

@awav
awav commented Jan 17, 2019

@tomhennigan, that is awesome! It is very similar to what we have in GPflow - the Parameterized class. With this change in TensorFlow we will be able to serialize our objects (currently we have a custom saver) and integrate them with Keras models.

The RFC could also be extended with array-like modules, but don't confuse them with a Keras sequence. A Keras sequence implies dependency between layers/modules, whereas an array-like module is a different approach for organising submodules, with indexing restricted to integers.

E.g. in GPflow we can construct a sum of kernels (modules) and represent it as a super-kernel class or a combination kernel. Essentially, a combination kernel is a list of kernels, and when we call, say, the __call__ method, it sums over the sub-kernels' results.

I'm happy to give more examples if needed. Thanks!

'custom_name/v:0'
```

Modules created inside methods of other modules inherit their parent module

How would variable scopes be affected in graph mode? For example, let's say we have a module named 'bar' that creates a tf.Variable 'v' and a tf.Tensor 't' inside __call__. Then, we call our module while inside a 'foo' name scope. What would be the names in that case? Maybe 'bar/v' for the tf.Variable and 'foo/bar/t' for the tf.Tensor?

Member Author

Whilst the design works in TF1 we have designed with TF2 in mind. In TF2 there are no variable scopes (we use name scope to name variables and tensors). In eager mode tf.Tensor does not have a name property, although if you inspect the graph from a @tf.function you do see tensor names.

Module inherits the name_scope from when the module is constructed. I've run the following with TF1 and as you can see the 'use' name_scope makes no difference inside the module but is applied when using the result of calling the module:

class BarModule(tf.Module):
  def __call__(self):
    v = tf.Variable(1., name='v')
    t = tf.constant(1., name='t')
    return v, t

with tf.Graph().as_default():  # Enable TF1 graph mode.
  with tf.name_scope('create'):
    bar = BarModule()

  with tf.name_scope('use'):
    for i in range(2):
      print('iter', i)
      v, t = bar()
      x = v * t
      print('- v', v.name)
      print('- t', t.name)
      print('- x', x.name)
      print()
iter 0
- v create/bar_module/v:0
- t create/bar_module/t:0
- x use/mul:0

iter 1
- v create/bar_module/v_1:0
- t create/bar_module/t_1:0
- x use/mul_1:0

self.v = tf.Variable(1., name='v')

>>> m = MyModule()
>>> m.name


What happens if the module name ends in '/'? Usually that is used to set absolute name scopes ignoring the current stack. Similarly, what happens if a child module does it?

Member Author

In the experimental implementation of this I've enforced (in the constructor) that the module name is a valid Python 2 identifier [0], so / is not allowed.

[0] https://docs.python.org/2/reference/lexical_analysis.html#identifiers

@leandro-gracia-gil

Thank you very much for this proposal! We have a remarkably similar class in our custom framework, where we use it as the base for all our own layers and composable networks. It makes me happy to see something like this being added to TF 2.0.

Given the similarity, I'll try to keep an eye on the review in case I notice anything that reminds me of any problems we had developing or using our own approach, but so far I'd say it looks pretty good.

@guillaumekln

I was under the impression that inheriting from tf.keras.layers.Layer was the recommended way to create stateful and reusable components in TensorFlow 2.0. This proposal makes it more nuanced. IMO it should clarify when one should extend tf.Module versus extend tf.keras.layers.Layer.

@tomhennigan
Member Author

I was under the impression that inheriting from tf.keras.layers.Layer was the recommended way to create stateful and reusable components in TensorFlow 2.0. This proposal makes it more nuanced. IMO it should clarify when one should extend tf.Module versus extend tf.keras.layers.Layer.

If this RFC is accepted @alextp and I hope that we can make tf.keras.layers.Layer extend tf.Module (instead of Checkpointable as it does today). Module can help make the implementation of Layer simpler by implementing part of its API (e.g. the Layer.variables property). We have not attempted to do this yet and I'm not sure if there would be any technical blockers.

Conceptually I see Keras Layers as enriched Modules with additional features (e.g. the build(shapes)/call(inputs) pattern, JSON/YAML serialization, regularizers, losses, metrics, masking, etc.). If these features are useful to you, or if you are using your subclass in a Keras Model, then Keras Layer is a good choice.

If the additional functionality of Keras is not useful for you then you may prefer to subclass tf.Module, which is simpler and has far fewer features. If tf.Module is not useful you may want to subclass Checkpointable or just object 😄

@tomhennigan
Member Author

@tomhennigan, That is awesome! It is very similar to what we have in GPflow - Parameterized class. With this change in TensorFlow we will be able to serialize our objects (currently we have custom saver) and integrate them with Keras models.

Serialization (as in saving/restoring parameters) should work great (since we extend Checkpointable). I think you'll have a better time integrating with Keras Model if you extend Layer instead. We're hoping to make tf.Module a base class for Keras Layer but that is not included as part of this proposal (see this comment for more context).

The RFC could also be extended with array-like modules, but don't confuse them with a Keras sequence. A Keras sequence implies dependency between layers/modules, whereas an array-like module is a different approach for organising submodules, with indexing restricted to integers.

E.g. in GPflow we can construct a sum of kernels (modules) and represent it as a super-kernel class or a combination kernel. Essentially, a combination kernel is a list of kernels, and when we call, say, the __call__ method, it sums over the sub-kernels' results.

I think this should "just work". I tried creating a simple example, please let me know if I've completely misunderstood what you're trying to achieve:

class MulKernel(tf.Module):
  def __init__(self, m, dtype=tf.float32, name=None):
    super(MulKernel, self).__init__(name=name)
    self.m = tf.Variable(tf.constant(m, dtype=dtype), name='m')

  def __call__(self, x):
    return self.m * x

class ArrayModule(tf.Module):
  def __init__(self, kernels):
    super(ArrayModule, self).__init__()
    self.kernels = kernels

  def __call__(self, x):
    x = tf.convert_to_tensor(x, name='x')
    results = [kernel(x) for kernel in self.kernels]
    return tf.reduce_sum(results)

a = ArrayModule([MulKernel(i, name='mul_%d' % i) for i in range(10)])
y = a(1.)

Which produces this rather pretty graph:

[screenshot of the resulting graph]


@property
def variables(self):
"""Returns a tuple of variables owned by this module and it's submodules.
Member

I think list is a better fit conceptually, since the type in question is [Variable] rather than (Variable, Variable, ...). Maybe we could simply remove the type from the docstring or replace it with something less restrictive e.g. "collection"?

The same applies to other *variables properties below.

Member Author

Thanks for the IRL discussion, I'll use "Collection" in the docstring and still return a tuple at least for now.


In general, returning a tuple makes the API a bit more efficient since you don't have to return a copy to avoid callers mutating your internal data structures.

Member Author

We agree returning a tuple is the right decision, but the signature/contract of the method should be less specific to allow this to change in future (e.g. if there is a faster immutable collection we might want to use that instead).

If we could include type hints in TF code I would go with the following:

@property
def variables(self) -> typing.Collection[tf.Variable]:
  ...

Contributor

@chr1sj0nes chr1sj0nes Feb 4, 2019

Perhaps typing.Sequence would be better here? Importantly, the returned variables have a well-defined, deterministic ordering.

your base class.
- Encouraging a lazy pattern might make it harder to create modules with
multiple forward methods (e.g. a VAE module might be implemented with an
`encode` and `decode` method with no `__call__`).
Member

Would the users have to override __call__ with NotImplementedError in that case?

Member Author

We prototyped it such that you didn't have to implement __call__ (e.g. __call__ was not @abstractmethod), but if users called your module then we would throw NotImplementedError. Even this has downsides though, like introspection/generated docs making it look like you had a __call__ method.
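For context, a rough sketch of what that rejected prototype might have looked like (hypothetical code, not the shipped API):

```
import tensorflow as tf

class CallableModule(tf.Module):
  """Hypothetical base: __call__ exists but is optional to override."""

  def __call__(self, *args, **kwargs):
    # Not an @abstractmethod, so subclasses without a forward method still
    # instantiate fine; calling them simply fails at run time.
    raise NotImplementedError(
        '%s does not implement __call__.' % type(self).__name__)

class VAE(CallableModule):
  def encode(self, x):   # Multiple forward methods, no __call__ needed.
    ...

  def decode(self, z):
    ...
```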


@tomhennigan, I agree 100% with this decision. There is no need to bloat the tf.Module class more than is absolutely required. Since the goal of this class is to handle stateful DAGs of modules, it is best to focus efforts on that.

As pointed out in this doc, there are many use cases where we would want to handle stateful DAGs but don't require a __call__ method. I see it as the responsibility of other libraries/sub-packages to extend the functionality of tf.Module. I find specialized classes much easier to extend and reason about.

each leaf. This means variables nested in properties will be found (e.g.
variables in `list`s or `tuple`s on the module).

This "walk" function is used to provide the following properties:
Member

Are these properties computed once per instance, or are they fully dynamic?

Member Author

Fully dynamic, trying to reason about when to invalidate the caches would be non-trivial.


I'm thinking the performance cost of having this be dynamic should be well documented so that users know to cache the results of these calls when using eager execution and good performance is required. Though, if everything works seamlessly with tf.function, then maybe this wouldn't be necessary and encouraging users to use tf.function for better performance would be sufficient.

Member Author

Agreed re documentation, I'll add a note in the docstrings.

For a 100-layer linear sequential it takes 2 ms to collect all 200 parameters (at least on my workstation). This could be worse for modules with deep hierarchies or large nests.

Assuming the call to trainable_variables (or whatever) is inside the tf.function this cost is only paid when the function is traced (~once).
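A sketch of that pattern (illustrative modules, assuming the behaviour described above): the property walk happens while the tf.function is traced, not on every step.

```
import tensorflow as tf

class Linear(tf.Module):
  def __init__(self, n_in, n_out, name=None):
    super(Linear, self).__init__(name=name)
    self.w = tf.Variable(tf.random.normal([n_in, n_out]), name='w')
    self.b = tf.Variable(tf.zeros([n_out]), name='b')

  def __call__(self, x):
    return tf.matmul(x, self.w) + self.b

class MLP(tf.Module):
  def __init__(self, name=None):
    super(MLP, self).__init__(name=name)
    self.layers = [Linear(16, 16) for _ in range(4)]

  def __call__(self, x):
    for layer in self.layers:
      x = tf.nn.relu(layer(x))
    return x

mlp = MLP()

@tf.function
def train_step(x):
  with tf.GradientTape() as tape:
    loss = tf.reduce_sum(mlp(x))
  # The module-graph traversal below runs while tracing; later calls reuse
  # the traced function, so the traversal cost is paid roughly once.
  params = mlp.trainable_variables
  grads = tape.gradient(loss, params)
  for p, g in zip(params, grads):
    p.assign_sub(0.01 * g)
  return loss

train_step(tf.ones([2, 16]))
```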

@awav

awav commented Jan 18, 2019

@tomhennigan

I think you'll have a better time integrating with Keras Model if you extend Layer instead.

The finest abstraction is not actually a deep learning model (is it correct that Keras is still a deep-learning-dedicated framework?), and not a model at all. tf.Module perfectly fits our needs. But in some cases these abstract blocks can be included in a layer composition, e.g. the head of the sequence is a CNN and the output of the CNN passes through a GP to get the final classification result, https://arxiv.org/abs/1511.02222.

If this RFC is accepted @alextp and I hope that we can make tf.keras.layers.Layer extend tf.Module (instead of Checkpointable as it does today). Module can help make the implementation of Layer simpler by implementing part of it's API (e.g. the Layer.variables property). We have not attempted to do this yet and I'm not sure if there would be any technical blockers.

I'd say making Keras layers depend on tf.Module should be one of the priorities for this RFC - unification of interfaces in TensorFlow. It lowers the barrier for users who inherit from tf.Module and want to take advantage of Keras features.

Serialization (as in saving/restoring parameters) should work great (since we extend Checkpointable)

That's great!

I think this should "just work". I tried creating a simple example, please let me know if I've completely misunderstood what you're trying to achieve: ...

If tf.Module can find submodules in an attribute that is actually a list, and properties like variables, trainable_variables, etc. return the correct objects, then the problem is solved.

@tomhennigan
Member Author

@awav

I'd say making Keras layers depend on tf.Module should be one of the priorities for this RFC - unification of interfaces in TensorFlow.

We've been mindful to design tf.Module in such a way that it could be a stateful container for a number of components in TensorFlow (not just tf.keras), but I think integrating it well with tf.keras requires a separate design/RFC process. I'm not planning to address that here.

If tf.Module can find submodules in an attribute that is actually a list, and properties like variables, trainable_variables, etc. return the correct objects, then the problem is solved.

Yep, this works as you would expect (we use tf.contrib.framework.nest.flatten(prop) for each object property and walk all leaves).
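A tiny sketch of what "just works" means here (illustrative; the attribute names are arbitrary): variables hidden inside plain Python containers are still discovered.

```
import tensorflow as tf

class Container(tf.Module):
  def __init__(self, name=None):
    super(Container, self).__init__(name=name)
    # Each attribute is flattened (nest-style) before being walked, so
    # variables nested in lists/tuples are found just like direct attributes.
    self.in_list = [tf.Variable(1.0, name='a'), tf.Variable(2.0, name='b')]
    self.in_tuple = (tf.Variable(3.0, name='c'),)

c = Container()
assert len(c.variables) == 3
```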

@martinwicke
Member

martinwicke commented Jan 18, 2019 via email

@tomhennigan
Member Author

Or Iterable? We could adjust the docstring now.

Collection is better since it indicates the returned value supports __contains__ and __len__ (Iterable just indicates it has __iter__). Agreed, updating the docstring is clearest.

Come 2020, we will drop Py2 support, and all this typing can actually
happen for real.

Can't wait 😄 count me in on a fixit to go through and add types!

tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 21, 2019
Addresses comments from tensorflow/community#56.

PiperOrigin-RevId: 230225963
@sguada
Member

sguada commented Jan 25, 2019

+1 for this proposal, it strikes a great balance between abstraction and usability.


**`tf.Module` tracking**

- Modules should track dependencies (e.g. a sequential should provide access
Contributor

Something that started on twitter -> https://twitter.com/lc0d3r/status/1090351739537145858

Since we are trying to also

track dependencies (e.g. a sequential should provide access to its layers) and support checkpointing as a graph.

Would it make sense to think about the next step and have, at the model level, an approachable way to access the DAG?

The idea is that once you inherit from tf.Module you can run something similar to Keras' model.summary and, even more exciting, model.plot_model.

Sorry for the somewhat confusing way of formulating this suggestion.

Member Author

Hey @lc0, thanks for the suggestion! We've been thinking about something similar. I'm not sure if we should solve this inside tf.Module or if this is something we should leave to framework authors, what do you think?

FWIW the current module APIs expose a lot of useful information to do a modest job of this outside of Module:

import tabulate

def weight_size(m):
  size = 0
  for v in m.variables:
    size += v.numpy().nbytes

  suffix = 'B'
  suffixes = ['KB', 'MB', 'GB', 'TB']
  while size > 1024 and suffixes:
    size /= 1024
    suffix = suffixes.pop(0)
  return '%d %s' % (size, suffix)

def summarise(module):
  rows = []
  for m in module.submodules:
    row = [m.name, m.__class__.__name__, weight_size(m)]
    rows.append(row)
  print(tabulate.tabulate(rows, headers=['attr', 'class', 'size'], tablefmt='grid'))
+----------+---------+--------+
| attr     | class   | size   |
+==========+=========+========+
| linear_0 | Linear  | 8 KB   |
+----------+---------+--------+
| linear_1 | Linear  | 4 MB   |
+----------+---------+--------+
| linear_2 | Linear  | 4 MB   |
+----------+---------+--------+
| linear_3 | Linear  | 40 KB  |
+----------+---------+--------+

@geyang geyang Sep 16, 2019

The table seems excessive. Here is something with less chrome:

 attr      class    size  
-------------------------
 linear_0  Linear   8 KB   
 linear_1  Linear   4 MB   
 linear_2  Linear   4 MB   
 linear_3  Linear   40 KB  

Alternatively, you can use the PyTorch pattern, which IMO is cleaner:

YourFantasticMode: (
    (linear_0) Linear(100, 20, 100) 8 KB
    (linear_1) Linear(20, 40, 100) 3.2 KB
    (linear_2) Linear(40, 20, 100) 3.2 KB
    (linear_3) Linear(20, 20, 100) 1.6 KB
)

>>> m.name_scope.name
'my_module/'
>>> m.v.name
'my_module/v:0'

Are the variable names (with scope) attached upon variable creation? An alternative might be the PyTorch way, where a variable does not know about its (parent's) scope, and the full scoped name is only constructed on the fly depending on which parent module we are dealing with (because a module can be re-used in different parent modules, and they might even be constructed externally and passed in via the constructor of a custom module).

I'm thinking about the following pseudocode example:

class FunnyAutoEncoder(tf.Module):
  def __init__(self, encoder, decoder, name=None):
    super(FunnyAutoEncoder, self).__init__(name=name)
    self.encoder = encoder
    self.decoder = decoder

  def __call__(self, x):
    z = self.encoder(x)
    z2 = tf.funny_op(z)
    return self.decoder(z2)


encoder = MLP(...)
decoder = MLP(...)
ae = FunnyAutoEncoder(encoder, decoder)

then ae.encoder.layerX.w.name will probably not show anything related to being under an auto encoder? Instead, it will be just with a top-level 'mlp/' scope?

I think a major scenario for having full scoped names available is to iterate through all the variables in a module, and do some inspection/processing of the variables according to their names. In this case, it might be more useful to have the (scoped) names reflect what role each variable is playing in the module.

However, I guess one drawback might be detecting potential duplication. For example, if one does this:

encoder = ResidualBlock(...)
decoder = encoder
ae = FunnyAutoEncoder(encoder, decoder)

then the same variables might be listed twice under different (scoped) names. But I guess this situation is rather rare, and it is probably better to explicitly define a FunnyAutoEncoderSharedEncoderDecoder that has only one encoder_and_decoder field.

Member Author

So currently the proposal is that name scopes are determined when the module is constructed, and from that point the scope is fixed per module. For name-based filtering you can do something like this:

encoder = MLP(..., name='encoder')
decoder = MLP(..., name='decoder')
ae = FunnyAutoEncoder(encoder, decoder)

encoder_vars = encoder.variables
# .. or ..
encoder_vars = [v for v in ae.variables if v.name.startswith('encoder/')]

It's worth pointing out that you can also enter a name scope and build modules inside it if you want to group a set of modules that way:

with tf.name_scope('encoder'):
  encoder = MLP(...)  # name_scope will be `encoder/mlp/`
decoder = MLP(..., name='decoder')  # name_scope := `decoder/`
ae = FunnyAutoEncoder(encoder, decoder)  # name_scope := `funny_auto_encoder/`

Is that flexible enough for your use cases? We've had a similar pattern in Sonnet and I've not heard any complaints from users about it, but perhaps there's something I'm missing!


@tomhennigan Thanks! I'm not sure if it is because I'm too used to the other way of thinking. Let me state the use cases in my mind and we can see if they are easy to do in the current proposal. I think the main inflexibility of the current approach is that in order to have a meaningful name scope hierarchy, we can only construct the model top-down according to the hierarchy. It will be difficult, for example, to construct some part of the component (e.g. the encoder from the example above) without knowing beforehand what that component is going to be used for (i.e. in your example, you explicitly pass the name='encoder' argument, indicating you know that it is going to be used as an encoder).

This problem might be more pronounced when one wants to write a generic library of commonly used (sub)nets. For example, library1 (say Sonnet 2) has a Module that defines a default ResNet implementation. Library2 is a general RL training framework that allows one to pass in a Module to be used for each specific component (e.g. the vision part of the agent). Now, as the user, if I want to use the ResNet in the vision part of the agent, I might need to use some hack to figure out what name to pass in when constructing the ResNet to make it consistent with the naming hierarchy in the agent model. Moreover, if the agent is grouping all its vision code under the name scope agent/vision, then I cannot pass agent/vision as the name= parameter to ResNet because it is not a proper Python identifier.

My guess for why this is not a problem in Sonnet (v1) is that Sonnet (v1) is still a graph-based library with an explicit build() function. In other words, when you create the Python object, the underlying graph building (and therefore the naming and scoping) has not happened yet. Therefore, one can still change the name scoping when build() is actually called. Here is some pseudo-code of my understanding of what might happen in Sonnet V1:

class Agent(...):
  def __init__(self, vision_net, ...):
    self.vision_net = vision_net
    # ....

  def build(self, ...):
    with tf.name_scope(self.name + '/vision'):
      self.vision_net.build(...)

On the other hand, with the current proposal, users could (and it seems it is recommended to) create variables in the Python object's __init__ function, which will attach a name to the variable upon creation. Therefore, one needs to know the exact name of the parent hierarchy upon creating that Python object. Once the Module object is created, the parent module can no longer re-wire its name scope as in the case of Sonnet V1. Is my understanding correct here?


I agree with @pluskid that dependency injection should at least be possible. My preference is for DI to be the recommended way to build up modules, but I'd understand if others disagreed.

Member Author

Thanks for the IRL discussions @pluskid, @brilee, @superbobry and @chr1sj0nes.

To summarize our discussions there are many cases where there are multiple paths to weights in a model (e.g. model.encoder.w might be the same instance as model.decoder.w) and this is somewhat confusing since in TensorFlow variables have a single, immutable name (e.g. w's name would be encoder/w:0 in both cases if encoder created it).

Perhaps the "right" solution is to remove the name from variables in Python, and instead use Python variable names to "name" variables. Sadly this is not in scope for TensorFlow 2, and we agreed that the current strategy of variable naming at creation time was OK for simple cases and better than not naming.

We discussed that the Python variables or object structure can be used instead to associate a path with each parameter, and in the above case you would have two paths to w. We agreed that it would be convenient to have some easy way to traverse the structure of arbitrary Modules using some simple protocol (e.g. somewhere inside the model there is a property called encoder and somewhere inside the model there is a property called decoder) to filter these paths.

In tensorflow/tensorflow@1d5fd9219 I added with_path to Module's _flatten method. This allows you to do exactly that. For example you can find paths in some arbitrary model to all tf.Variables inside a (possibly deeply nested) Module in a property called encoder:

>>> class MyModule(tf.Module):
...   @property
...   def state_dict(self):
...     return dict(self._flatten(
...         predicate=lambda v: isinstance(v, tf.Variable), with_path=True))

>>> mod = MyModule()
>>> mod.encoder = Encoder()
>>> mod.decoder = mod.encoder

>>> mod.state_dict
{('encoder', 'w'): <tf.Variable: ...>,
 ('decoder', 'w'): <tf.Variable: ...>}

>>> {k: v for k, v in mod.state_dict.items() if 'encoder' in k}
{('encoder', 'w'): <tf.Variable: ...>}

I hope that helps 😄. h/t @gehring who was asking about how to determine "depth" in another comment. I think this with_path option to _flatten (or the recursive flag) would help solve that too.
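For example, a hypothetical helper sketched on top of that with_path behaviour (variable_depths is not part of the proposal; it just illustrates how path length can stand in for depth):

```
import tensorflow as tf

class DepthAwareModule(tf.Module):
  @property
  def variable_depths(self):
    # Each entry of _flatten(with_path=True) is a (path, leaf) pair, so the
    # length of the path tells you how deeply nested each variable is.
    return {path: len(path)
            for path, v in self._flatten(
                predicate=lambda v: isinstance(v, tf.Variable),
                with_path=True)}
```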

... self.b = tf.Variable(tf.zeros([output_features]), name='b')
...
... def __call__(self, x):
... x = tf.convert_to_tensor(x, name='x')

I feel it is important to avoid having users call tf.convert_to_tensor every time they define their own __call__ function. One might want to define an implicit pre-call hook to do the conversion automatically. Or a better approach might be to explicitly require the API to only accept Tensors (or nested tuples of tensors) as inputs, and ask users to do the explicit conversion before calling from the top level.

This is because many of the modules will probably be used as building blocks in bigger architectures (e.g. residual blocks inside a ResNet). They will not directly interact with the user most of the time, so there is probably no need for them to call convert_to_tensor every time, but it might be better to enforce that the API has consistent behavior.

@gehring gehring Feb 1, 2019

While I agree with you that it would be good for any callable subclass of tf.Module to have some automatic conversion to tensor (e.g., in an implementation of Layer 2.0), this is beyond the scope of tf.Module. The Dense class example only serves to demonstrate how one would subclass tf.Module. Let me try to lay out what I understand the purpose of tf.Module to be and what the reason for its limited scope is. (I am not affiliated with the TF team)

After the shift to object-oriented storage of tf.Variable objects, variable reuse is done by storing variables within object attributes. It would be impractical and clunky to store all variables in a unique container object (which would essentially recreate TF 1.0's collections). To avoid this, we prefer to have variables contained within the objects that use them, e.g., kernel and bias variables as attributes of the Dense instance that uses them. While this new way of organizing variables has many advantages, it does make checkpointing and operating on the "global" variables more difficult (e.g., collecting variables for gradient descent).

By design, tf.Module is lightweight and focuses on addressing the challenges of managing nested stateful objects. Its sole purpose is to abstract away the DAG of modules for the purpose of operating on all relevant variables. This allows us to act like our top-level/root module explicitly contains all our variables. This is done by providing traversal methods which collect all the variables. As a bonus, tf.Module automatically handles the complexities of checkpointing when variables are allowed to be created before and after the checkpoint is loaded, e.g., hydration and instantiation of variables.

By limiting tf.Module's responsibilities to managing nested stateful modules, we end up with code that is easy to understand, document and extend. As a result, any developer wishing to implement some stateful containers can easily do so without intricate knowledge of how best to manage "global" state by subclassing tf.Module. Since handling stateful DAGs of modules is a very common requirement, tf.Module could easily be used by most sub-packages and libraries with stateful operations, e.g., tf.train, tf.metrics, tf.estimator. (Note, this specific proposal is arguing for the existence/inclusion of tf.Module in TF 2.0. The advantages of using tf.Module as a base class throughout tensorflow will likely be examined on case by case basis in a separate proposal.)


@gehring Yes, I agree with you. I don't think doing automatic conversion is a good idea. But enforcing that the inputs are Tensors is probably good (even if only implicitly, as a statement in the documentation). Anyhow, this is a minor thing in the whole proposal.

Member Author

+1 to what @gehring says. The decision of whether to call convert_to_tensor is entirely up to whoever subclasses tf.Module; there is no requirement to do so 😄

@awav awav Feb 4, 2019

I'm strongly against auto-conversion to tensors and agree with @tomhennigan - it is up to the scientist, engineer, or anyone else what to do with non-tensor types.

@awav awav mentioned this pull request Feb 2, 2019
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 4, 2019
I've made this "private" since this exposes the implementation of modules, which
in typical Python code does not represent part of the class's public API. As
such I think having the leading _ makes it clear (at least to the linter) that
you're using an "internal" method. As always this is a "sign" not a cop.

As a concrete example, I strongly encourage the following usage:

    >>> class FooModule(...):
    ...   @property
    ...   def hash_tables(self):
    ...     return tuple(self._flatten(predicate=IS_HASH_TABLE))

But in general we should discourage usage outside of modules:

    >>> m = FooModule()
    >>> hash_tables = tuple(m._flatten(predicate=IS_HASH_TABLE))  # Naughty.

c.f. tensorflow/community#56

PiperOrigin-RevId: 232246092
@brilee brilee left a comment

Love the proposal.


- Each module instance must have a name scope. This name should include the
name scope from when the module is constructed.
- Every method on the module must enter this name scope before calling user

Name scopes aren't reentrant; do you have a plan for how we can avoid creating ops with confusing names like 'mymodule/mymodule/mymodule/relu_2'?

Member Author

In eager op names are not a thing so that's good.

For tf.function names are uniquified within the function body, so you would only get an _N suffix if there were actually two relus within the same module (or you called your module multiple times sequentially).

As a nit, scope names are re-entrant (as in you can do with tf.name_scope("foo/"): ... as many times as you like) but op names need to be unique.

Example: https://gist.github.com/tomhennigan/a183a6570d2a6fb9bce08758ee2d78b6


Thanks for the correction. I see now that this works, although I'm still a bit confused about where the _1 suffix comes from.

@tf.function
def f(x):
  name_scope = tf.name_scope('blah')
  with name_scope:
    with name_scope:
      return x + 1

g = f.get_concrete_function(tf.TensorSpec([1, 64], tf.float32)).graph
for node in g.as_graph_def().node:
  print(node.name)
  
...
x
blah_1/add/y
blah_1/add
Identity

Member Author

In this case you're creating the name scope in graph mode (inside the function), and in graph mode name scopes are de-duped. I guess the function is being traced twice (hence 'blah' followed by 'blah_1'). If you use a fully qualified name scope (e.g. tf.name_scope('blah/')) then the add op will be de-duped, not the name_scope.

With tf.Module we use the module name to create a unique scope in the constructor [0] and then re-enter the fully qualified name whenever you use with module.name_scope: ... [1].

This is an interesting point though: in eager mode name scopes are not de-duped but in graph mode they are. We have a test explicitly defending this behavior [2] and I expect in TF2 almost all users to create modules outside a tf.function (aka eagerly) and to call them inside tf.functions.

[0] https://github.com/tensorflow/tensorflow/blob/e0585bc351b19da39610cc20f6d7622b439dca4d/tensorflow/python/module/module.py#L176-L177
[1] https://github.com/tensorflow/tensorflow/blob/e0585bc351b19da39610cc20f6d7622b439dca4d/tensorflow/python/module/module.py#L194-L198
[2] https://github.com/tensorflow/tensorflow/blob/e0585bc351b19da39610cc20f6d7622b439dca4d/tensorflow/python/module/module_test.py#L92-L109
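A small sketch of the two behaviours described above (run inside a tf.function, i.e. graph mode; the names in the comments are what I'd expect, mirroring the gist linked earlier):

```
import tensorflow as tf

@tf.function
def f(x):
  with tf.name_scope('blah'):    # Relative name: a second entry would be
    a = x + 1.0                  # uniquified to 'blah_1/...'.
  with tf.name_scope('blah/'):   # Fully qualified name: re-enters 'blah/',
    b = x + 1.0                  # so the *op* is de-duped (e.g. 'blah/add_1').
  return a + b

g = f.get_concrete_function(tf.TensorSpec([], tf.float32)).graph
print([n.name for n in g.as_graph_def().node])
```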


tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 5, 2019
Thanks brilee@ for spotting!
This wasn't ever required and including it might cause confusion.

h/t pluskid@ gehring@ and awav@
yield variables inside nested structures (lists etc) but not in other
modules.
"""

@property
def submodules(self):

@tomhennigan Is this done depth first or breadth first? I feel like it could be a good thing for tf.Module to offer a way to infer the depth of submodules/variables. Without owned_*, what would be the best way for a user to do this without re-implementing tf.Module's logic?

Member Author

Breadth first.

So there are two options (frustratingly some recent commits haven't been sync'd to GitHub so I can't just point you at the code!).

Firstly, we've made it really easy for subclasses to implement this if they want by exposing _flatten:

class MyModule(tf.Module):
  @property
  def owned_submodules(self):
    return tuple(self._flatten(recursive=False, predicate=lambda v: isinstance(v, tf.Module)))

I also discussed with brilee@ and pluskid@ earlier their feedback on variable names. We were aligned that what they both wanted was not really Variable.name, but rather the path to all variables (e.g. allowing the same variable instance to be reachable via multiple paths). We're planning to add an option to _flatten to allow flattening with paths. So for example in the case where you have:

encoder = Encoder()
decoder = encoder
auto = Auto(encoder, decoder)

You can implement variables_and_paths in one line with self._flatten(..., with_path=True):

>>> auto.variables_and_paths()
{"encoder.layers[0].w": <tf.Variable ...>,
 "encoder.layers[1].w": <tf.Variable ...>, 
 "decoder.layers[0].w": <tf.Variable ...>,
 "decoder.layers[1].w": <tf.Variable ...>}

It seems to me like this is another way of representing depth. WDYT?

Member Author

Actually the CL has been pushed 😄, _flatten is in tensorflow/tensorflow@5076adf6


I didn't have a particular use case in mind but I believe it is likely for someone to require that type of info (if not simply for human readability). The with_path solution is lightweight and should pretty much cover any use cases related to the topology of the variables/sub-modules. I'm 100% on board, nice job!

Member Author

FYI with_path is in tensorflow/tensorflow@1d5fd921 😄

tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 7, 2019
```
>>> class MyModule(tf.Module):
...   @property
...   def state_dict(self):
...     return dict(self._flatten(
...         predicate=lambda v: isinstance(v, tf.Variable), with_path=True))

>>> mod = MyModule()
>>> mod.encoder = Encoder()
>>> mod.decoder = mod.encoder
>>> mod.state_dict
{('encoder', 'w'): <tf.Variable: ...>,
 ('decoder', 'w'): <tf.Variable: ...>}
```

h/t tensorflow/community#56

PiperOrigin-RevId: 232908045
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 8, 2019
An issue with decorating methods in the metaclass is that when users annotate
methods with decorators that return objects (e.g. `tf.function`) they lose the
ability to access methods on those objects (other than __call__) because our
`with_name_scope` decorator is "on the outside".

A simple answer to this from other decorators is to copy the __dict__ from the
wrapped object to the wrapper (or use `__getattr__`). However in our case this
is not safe, since properties like `get_concrete_function` on `Function` would
need special casing (so the returned callable also entered the module name
scope before tracing).

After a bunch of discussion (thanks superbobry@ and chr1sj0nes@!) about how we
could do this in horrible ways, it seems that the cleanest solution is to allow
a Function to be rewrapped (iff it has not been traced). This pushes the
decorator down the stack of decorators and allows tf.Module to preserve
@tf.function annotated methods as Functions.

h/t tensorflow/community#56

PiperOrigin-RevId: 233034181
@ewilderj ewilderj added RFC: Accepted RFC Design Document: Accepted by Review and removed RFC: Proposed RFC Design Document labels Feb 15, 2019
@ewilderj ewilderj moved this from Open reviews to Accepted RFCs in RFC management Feb 15, 2019
Returns:
A collection of variables for the current module (sorted by attribute name)
followed by variables from all submodules recursively (depth first).
"""
Member

I'm a bit late to the party, but I don't see this in the discussion anywhere:

Keras layers and models implement a trainable flag as well. When set to False, none of the variables in the module or its submodules are returned by trainable_variables.

The current implementation of Module does not recognize a module-level trainable flag; it's not clear if that's intentional or an omission.

Looking at the implementation... it would require a separate predicate for recursion.

This seems like a minor change that could reduce surprises and increase compatibility, especially with the plan to make Layer and Model subclasses of Module.

WDYT?

Member Author

This is intentional. Module uses the immutable "trainable" bit on tf.Variable to determine which variables are trainable. If you want to implement layer freezing like Keras does for a Module subclass you can do:

class KerasLike(tf.Module):

  def __init__(self, trainable=True, name=None):
    super(KerasLike, self).__init__(name=name)
    self.trainable = trainable

  @property
  def trainable_variables(self):
    if not self.trainable:
      return ()
    return super(KerasLike, self).trainable_variables

Member

Great. Thanks for the clarifications.

@ewilderj ewilderj merged commit 2c16981 into tensorflow:master Feb 21, 2019
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 25, 2019
After attempting to integrate `tf.Module` into existing codebases (e.g.
`tf.keras`) we've found that the automatic name scoping is too invasive (e.g.
changing op and variable names) and it is desirable to disable it ~everywhere.

We propose that name scoping for `tf.Module` becomes opt-in:

>>> class MyModule(tf.Module):
...
...   @tf.Module.with_name_scope
...   def auto_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       self.w = tf.Variable(1., name='w')
...     return x * self.w
...
...   def manual_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       with self.name_scope:
...         self.w = tf.Variable(1., name='w')
...     return x * self.w
...
...   def no_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       self.w = tf.Variable(1., name='w')
...     return x * self.w

We will move opt-out name scoping into Sonnet:

>>> class MyModule(snt.Module):
...
...   def auto_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       self.w = tf.Variable(1., name='w')
...     return x * self.w
...
...   @snt.no_name_scope
...   def no_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       self.w = tf.Variable(1., name='w')
...     return x * self.w

In TF2 name scopes are cosmetic and this should be less of a big deal. We might
consider encouraging users who want to filter on names to instead use flatten
to extract a state dictionary for their objects (c.f.
tensorflow/community#56 (comment)).

I have moved the automatic name scoping logic (Metaclass etc) and associated
tests into Sonnet 2.

PiperOrigin-RevId: 235540184
karllessard added a commit to karllessard/tensorflow-community that referenced this pull request May 10, 2019
* RFC: Stateful Containers with tf.Module (tensorflow#56)

* Create 20190117-tf-module.md

* Update 20190117-tf-module.md

* Loosen return type for variable properties.

* Use Dense consistently.

Thanks brilee@ for spotting!

* Remove convert_to_tensor from examples.

This wasn't ever required and including it might cause confusion.

h/t pluskid@ gehring@ and awav@

* Remove owned_* methods.

* Document `_flatten`

See tensorflow/tensorflow@5076adf6 for more context.

* Fix typo in module name.

Thanks k-w-w@!

* Update 20190117-tf-module.md

sonnet-copybara pushed a commit to google-deepmind/sonnet that referenced this pull request May 29, 2019
h/t. tensorflow/community#56

PiperOrigin-RevId: 250436499
Change-Id: I9332d90f6331756f12907566af9a9e23f86aded8