
RFC: Stateful Containers with tf.Module #56

Merged
merged 9 commits into from Feb 21, 2019

Conversation

tomhennigan
Member

@tomhennigan tomhennigan commented Jan 17, 2019

This review is open for public comment until 2019-02-11

Status: Accepted
Author(s): DeepMind: tomhennigan@google.com, mareynolds@google.com; Google: agarwal@google.com, apassos@google.com, dberth@google.com
Sponsor: apassos@google.com
Updated: 2019-01-16

Summary

We propose a lightweight base class (tf.Module) for stateful containers in TensorFlow. This base class offers three main features:

  • Naming - variables and ops created inside modules are created within a name scope so they can easily be tracked back to their creator.
  • Variable tracking - variables referenced by a module (directly or via dependencies on other modules) can be retrieved via a simple API.
  • Module tracking - given a module, a simple API exposes all modules that it depends on.

Users are encouraged to subclass tf.Module to create their own stateful components, and we expect and encourage an ecosystem of third-party modules to evolve.
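For illustration, here is a minimal sketch (not taken from the RFC text; the Dense/MLP classes and attribute names are just examples) of the kind of subclass this enables and how the three features surface:

```
import tensorflow as tf

class Dense(tf.Module):
  """An illustrative dense layer built on tf.Module."""

  def __init__(self, input_features, output_features, name=None):
    super(Dense, self).__init__(name=name)
    with self.name_scope:  # Naming: variables get the module's name scope prefix.
      self.w = tf.Variable(
          tf.random.normal([input_features, output_features]), name='w')
      self.b = tf.Variable(tf.zeros([output_features]), name='b')

  def __call__(self, x):
    return tf.matmul(x, self.w) + self.b

class MLP(tf.Module):
  def __init__(self, name=None):
    super(MLP, self).__init__(name=name)
    self.hidden = Dense(10, 20)
    self.out = Dense(20, 1)

  def __call__(self, x):
    return self.out(self.hidden(x))

mlp = MLP()
assert len(mlp.variables) == 4    # Variable tracking walks submodules too.
assert len(mlp.submodules) == 2   # Module tracking exposes dependencies.
```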

Thanks in advance for your feedback!

@ewilderj ewilderj changed the title RFC tf.Module RFC: Stateful Containers with tf.Module Jan 17, 2019
@ewilderj ewilderj added the RFC: Proposed RFC Design Document label Jan 17, 2019
@ewilderj ewilderj added this to Needs attention in RFC management via automation Jan 17, 2019
@ewilderj ewilderj moved this from Needs attention to Open reviews in RFC management Jan 17, 2019

**`tf.Variable` creation and reuse**

- Modules should be able to create and reuse variables.
Contributor

What do you mean by reuse here?

My understanding is that modules only create variables, and reuse is done by reusing modules, right?

Member Author

Your understanding is correct. Variable creation (self.w = tf.Variable(..)) and reuse (tf.whatever(some_input, self.w)) are enabled via OOP (no magic).

We considered re-implementing variable_scope and get_variable but felt that the OOP design was a lot cleaner and more Pythonic.
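To make the distinction concrete, a small sketch (illustrative, not from the RFC): creation happens once in __init__, and reuse is just attribute access on subsequent calls.

```
import tensorflow as tf

class Linear(tf.Module):
  def __init__(self, n_in, n_out, name=None):
    super(Linear, self).__init__(name=name)
    # Creation: a plain attribute holding the variable.
    self.w = tf.Variable(tf.random.normal([n_in, n_out]), name='w')

  def __call__(self, x):
    # Reuse: ordinary attribute access, no variable_scope/get_variable.
    return tf.matmul(x, self.w)

layer = Linear(8, 8)
y = layer(layer(tf.ones([1, 8])))  # Reusing == calling the same object again.
assert len(layer.variables) == 1   # Still a single weight.
```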

def apply_gradients(self, vars_and_grads):
  for variable, grad in vars_and_grads:
    variable.assign_sub(grad * self.learning_rate)
```
Contributor

I'd like to comment here on the following points:

  1. Module should be the base class for tf.keras.Layer
  2. We explicitly chose not to force a call signature or build() strategy
  3. Pros and cons of the name Module (familiarity with pytorch being a big issue here)

Member Author

Agreed re (1), and re (2) I call out why a bit more in the detailed design if anyone wants more context.

Re (3) I feel strongly that Module is the right name and what users will expect. The term Module and its usage in ML is almost as old as I am, the oldest reference I know about (h/t @jekbradbury for the reference) is from SN/Lush [0]:

In Lush, building and training a complex system is done by assembling basic blocks, called modules. Modules are subclasses of the class gb-module.

[0] http://lush.sourceforge.net/lush-manual/51d13a16626fd68f.html

@awav
awav commented Jan 17, 2019

@tomhennigan, that is awesome! It is very similar to what we have in GPflow - the Parameterized class. With this change in TensorFlow we will be able to serialize our objects (currently we have a custom saver) and integrate them with Keras models.

The RFC could also be extended with array-like modules, but don't confuse them with a Keras sequence. A Keras sequence implies dependency between layers/modules, whereas an array-like module is a different approach for organising submodules, with indexing restricted to integers.

E.g. in GPflow we can construct a sum of kernels (modules) and represent it as a super-kernel class or a combination kernel. Essentially, a combination kernel is a list of kernels, and when we call, say, the __call__ method, it sums over the sub-kernels' results.

I'm happy to give more examples if needed. Thanks!

'custom_name/v:0'
```

Modules created inside methods of other modules inherit their parent module

How would variable scopes be affected in graph mode? For example, let's say we have a module named 'bar' that creates a tf.Variable 'v' and a tf.Tensor 't' inside __call__. Then, we call our module while inside a 'foo' name scope. What would be the names in that case? Maybe 'bar/v' for the tf.Variable and 'foo/bar/t' for the tf.Tensor?

Member Author

Whilst the design works in TF1 we have designed with TF2 in mind. In TF2 there are no variable scopes (we use name scope to name variables and tensors). In eager mode tf.Tensor does not have a name property, although if you inspect the graph from a @tf.function you do see tensor names.

Module inherits the name_scope from when the module is constructed. I've run the following with TF1 and as you can see the 'use' name_scope makes no difference inside the module but is applied when using the result of calling the module:

class BarModule(tf.Module):
  def __call__(self):
    v = tf.Variable(1., name='v')
    t = tf.constant(1., name='t')
    return v, t

with tf.Graph().as_default():  # Enable TF1 graph mode.
  with tf.name_scope('create'):
    bar = BarModule()

  with tf.name_scope('use'):
    for i in range(2):
      print('iter', i)
      v, t = bar()
      x = v * t
      print('- v', v.name)
      print('- t', t.name)
      print('- x', x.name)
      print()
iter 0
- v create/bar_module/v:0
- t create/bar_module/t:0
- x use/mul:0

iter 1
- v create/bar_module/v_1:0
- t create/bar_module/t_1:0
- x use/mul_1:0

self.v = tf.Variable(1., name='v')

>>> m = MyModule()
>>> m.name


What happens if the module name ends in '/'? Usually that is used to set absolute name scopes ignoring the current stack. Similarly, what happens if a child module does it?

Member Author

In the experimental implementation of this I've enforced (in the constructor) that the module name is a valid Python 2 identifier [0], so / is not allowed.

[0] https://docs.python.org/2/reference/lexical_analysis.html#identifiers

@leandro-gracia-gil

Thank you very much for this proposal! We have a remarkably similar class in our custom framework, where we use it as the base for all our own layers and composable networks. It makes me happy to see something like this being added to TF 2.0.

Given the similarity, I'll try to keep an eye on the review in case I notice anything that reminds me of any problems we had developing or using our own approach, but so far I'd say it looks pretty good.

@guillaumekln

I was under the impression that inheriting from tf.keras.layers.Layer was the recommended way to create stateful and reusable components in TensorFlow 2.0. This proposal makes it more nuanced. IMO it should clarify when one should extend tf.Module versus extend tf.keras.layers.Layer.

@tomhennigan
Member Author

I was under the impression that inheriting from tf.keras.layers.Layer was the recommended way to create stateful and reusable components in TensorFlow 2.0. This proposal makes it more nuanced. IMO it should clarify when one should extend tf.Module versus extend tf.keras.layers.Layer.

If this RFC is accepted @alextp and I hope that we can make tf.keras.layers.Layer extend tf.Module (instead of Checkpointable as it does today). Module can help make the implementation of Layer simpler by implementing part of its API (e.g. the Layer.variables property). We have not attempted to do this yet and I'm not sure if there would be any technical blockers.

Conceptually I see Keras Layers as enriched Modules with additional features (e.g. the build(shapes)/call(inputs) pattern, JSON/YAML serialization, regularizers, losses, metrics, masking, etc.). If these features are useful to you, or if you are using your subclass in a Keras Model, then Keras Layer is a good choice.

If the additional functionality of Keras is not useful for you then you may prefer to subclass tf.Module, which is simpler and has far fewer features. If tf.Module is not useful you may want to subclass Checkpointable or just object 😄

@tomhennigan
Member Author

@tomhennigan, That is awesome! It is very similar to what we have in GPflow - Parameterized class. With this change in TensorFlow we will be able to serialize our objects (currently we have custom saver) and integrate them with Keras models.

Serialization (as in saving/restoring parameters) should work great (since we extend Checkpointable). I think you'll have a better time integrating with Keras Model if you extend Layer instead. We're hoping to make tf.Module a base class for Keras Layer but that is not included as part of this proposal (see this comment for more context).

The RFC could also be extended with array-like modules, but don't confuse them with a Keras sequence. A Keras sequence implies dependency between layers/modules, whereas an array-like module is a different approach for organising submodules, with indexing restricted to integers.

E.g. in GPflow we can construct a sum of kernels (modules) and represent it as a super-kernel class or a combination kernel. Essentially, a combination kernel is a list of kernels, and when we call, say, the __call__ method, it sums over the sub-kernels' results.

I think this should "just work". I tried creating a simple example, please let me know if I've completely misunderstood what you're trying to achieve:

class MulKernel(tf.Module):
  def __init__(self, m, dtype=tf.float32, name=None):
    super(MulKernel, self).__init__(name=name)
    self.m = tf.Variable(tf.constant(m, dtype=dtype), name='m')

  def __call__(self, x):
    return self.m * x

class ArrayModule(tf.Module):
  def __init__(self, kernels):
    super(ArrayModule, self).__init__()
    self.kernels = kernels

  def __call__(self, x):
    x = tf.convert_to_tensor(x, name='x')
    results = [kernel(x) for kernel in self.kernels]
    return tf.reduce_sum(results)

a = ArrayModule([MulKernel(i, name='mul_%d' % i) for i in range(10)])
y = a(1.)

Which produces this rather pretty graph:

[screenshot of the resulting graph]


@property
def variables(self):
"""Returns a tuple of variables owned by this module and it's submodules.
Member

I think list is a better fit conceptually, since the type in question is [Variable] rather than (Variable, Variable, ...). Maybe we could simply remove the type from the docstring or replace it with something less restrictive e.g. "collection"?

The same applies to other *variables properties below.

Member Author

Thanks for the IRL discussion, I'll use "Collection" in the docstring and still return a tuple at least for now.


In general, returning a tuple makes the API a bit more efficient since you don't have to return a copy to avoid callers mutating your internal data structures.

Member Author

We agree returning a tuple is the right decision, but the signature/contract of the method should be less specific to allow this to change in future (e.g. if there is a faster immutable collection we might want to use that instead).

If we could include type hints in TF code I would go with the following:

@property
def variables(self) -> typing.Collection[tf.Variable]:
  ...

Contributor

@chr1sj0nes chr1sj0nes Feb 4, 2019

Perhaps typing.Sequence would be better here? Importantly, the returned variables have a well-defined, deterministic ordering.

your base class.
- Encouraging a lazy pattern might make it harder to create modules with
multiple forward methods (e.g. a VAE module might be implemented with an
`encode` and `decode` method with no `__call__`).
Member

Would the users have to override __call__ with NotImplementedError in that case?

Member Author

We prototyped it such that you didn't have to implement __call__ (e.g. __call__ was not @abstractmethod), but if users called your module then we would throw NotImplementedError. Even this has downsides though, like introspection/generated docs making it look like you had a __call__ method.
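For context, a rough sketch of what that rejected prototype might have looked like (hypothetical code, not the shipped API):

```
import tensorflow as tf

class CallableModule(tf.Module):
  """Hypothetical base: __call__ exists but is optional to override."""

  def __call__(self, *args, **kwargs):
    # Not an @abstractmethod, so subclasses without a forward method still
    # instantiate fine; calling them simply fails at run time.
    raise NotImplementedError(
        '%s does not implement __call__.' % type(self).__name__)

class VAE(CallableModule):
  def encode(self, x):   # Multiple forward methods, no __call__ needed.
    ...

  def decode(self, z):
    ...
```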


@tomhennigan, I agree 100% with this decision. There is no need to bloat the tf.Module class more than is absolutely required. Since the goal of this class is to handle stateful DAGs of modules, it is best to focus efforts on that.

As pointed out in this doc, there are many use cases where we would want to handle stateful DAGs but don't require a __call__ method. I see it as the responsibility of other libraries/sub-packages to extend the functionality of tf.Module. I find specialized classes much easier to extend and reason about.

each leaf. This means variables nested in properties will be found (e.g.
variables in `list`s or `tuple`s on the module).

This "walk" function is used to provide the following properties:
Member

Are these properties computed once per instance, or are they fully dynamic?

Member Author

Fully dynamic, trying to reason about when to invalidate the caches would be non-trivial.


I'm thinking the performance cost of having this be dynamic should be well documented so that users know to cache the results of these calls when using eager execution and good performance is required. Though, if everything works seamlessly with tf.function, then maybe this wouldn't be necessary and encouraging users to use tf.function for better performance would be sufficient.

Member Author

Agreed re documentation, I'll add a note in the docstrings.

For a 100-layer linear sequential it takes 2 ms to collect all 200 parameters (at least on my workstation). This could be worse for modules with deep hierarchies or large nests.

Assuming the call to trainable_variables (or whatever) is inside the tf.function this cost is only paid when the function is traced (~once).
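A sketch of that pattern (illustrative modules, assuming the behaviour described above): the property walk happens while the tf.function is traced, not on every step.

```
import tensorflow as tf

class Linear(tf.Module):
  def __init__(self, n_in, n_out, name=None):
    super(Linear, self).__init__(name=name)
    self.w = tf.Variable(tf.random.normal([n_in, n_out]), name='w')
    self.b = tf.Variable(tf.zeros([n_out]), name='b')

  def __call__(self, x):
    return tf.matmul(x, self.w) + self.b

class MLP(tf.Module):
  def __init__(self, name=None):
    super(MLP, self).__init__(name=name)
    self.layers = [Linear(16, 16) for _ in range(4)]

  def __call__(self, x):
    for layer in self.layers:
      x = tf.nn.relu(layer(x))
    return x

mlp = MLP()

@tf.function
def train_step(x):
  with tf.GradientTape() as tape:
    loss = tf.reduce_sum(mlp(x))
  # The module-graph traversal below runs while tracing; later calls reuse
  # the traced function, so the traversal cost is paid roughly once.
  params = mlp.trainable_variables
  grads = tape.gradient(loss, params)
  for p, g in zip(params, grads):
    p.assign_sub(0.01 * g)
  return loss

train_step(tf.ones([2, 16]))
```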

@awav

awav commented Jan 18, 2019

@tomhennigan

I think you'll have a better time integrating with Keras Model if you extend Layer instead.

The finest abstraction is not actually a deep learning model (is it correct that Keras is still a deep-learning-dedicated framework?), and not a model at all. tf.Module perfectly fits our needs. But in some cases these abstract blocks can be included in a layer composition, e.g. the head of the sequence is a CNN and the output of the CNN passes through a GP to get the final classification result, https://arxiv.org/abs/1511.02222.

If this RFC is accepted @alextp and I hope that we can make tf.keras.layers.Layer extend tf.Module (instead of Checkpointable as it does today). Module can help make the implementation of Layer simpler by implementing part of it's API (e.g. the Layer.variables property). We have not attempted to do this yet and I'm not sure if there would be any technical blockers.

I'd say making Keras layers depend on tf.Module should be one of the priorities for this RFC - unification of interfaces in TensorFlow. It lowers the barrier for users who inherit from tf.Module and want to take advantage of Keras features.

Serialization (as in saving/restoring parameters) should work great (since we extend Checkpointable)

That's great!

I think this should "just work". I tried creating a simple example, please let me know if I've completely misunderstood what you're trying to achieve: ...

If tf.Module can find submodules in an attribute that is actually a list, and properties like variables, trainable_variables, etc. return the correct objects, then the problem is solved.

@tomhennigan
Member Author

@awav

I'd say making Keras layers depend on tf.Module should be one of the priorities for this RFC - unification of interfaces in TensorFlow.

We've been mindful to design tf.Module in such a way that it could be a stateful container for a number of components in TensorFlow (not just tf.keras), but I think integrating it well with tf.keras requires a separate design/RFC process. I'm not planning to address that here.

If tf.Module can find submodules in an attribute that is actually a list, and properties like variables, trainable_variables, etc. return the correct objects, then the problem is solved.

Yep, this works as you would expect (we use tf.contrib.framework.nest.flatten(prop) for each object property and walk all leaves).
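A tiny sketch of what "just works" means here (illustrative; the attribute names are arbitrary): variables hidden inside plain Python containers are still discovered.

```
import tensorflow as tf

class Container(tf.Module):
  def __init__(self, name=None):
    super(Container, self).__init__(name=name)
    # Each attribute is flattened (nest-style) before being walked, so
    # variables nested in lists/tuples are found just like direct attributes.
    self.in_list = [tf.Variable(1.0, name='a'), tf.Variable(2.0, name='b')]
    self.in_tuple = (tf.Variable(3.0, name='c'),)

c = Container()
assert len(c.variables) == 3
```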

@martinwicke
Member

martinwicke commented Jan 18, 2019 via email

@tomhennigan
Member Author

Or Iterable? We could adjust the docstring now.

Collection is better since it indicates the returned value supports __contains__ and __len__ (Iterable just indicates it has __iter__). Agreed, updating the docstring is clearest.

Come 2020, we will drop Py2 support, and all this typing can actually
happen for real.

Can't wait 😄 count me in on a fixit to go through and add types!

tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 21, 2019
Addresses comments from tensorflow/community#56.

PiperOrigin-RevId: 230225963
@sguada
Member

sguada commented Jan 25, 2019

+1 for this proposal, it strikes a great balance between abstraction and usability.


**`tf.Module` tracking**

- Modules should track dependencies (e.g. a sequential should provide access
Contributor

Something that started on twitter -> https://twitter.com/lc0d3r/status/1090351739537145858

Since we are trying to also

track dependencies (e.g. a sequential should provide access to its layers) and support checkpointing as a graph.

Would it make sense to think about the next step and have, at the model level, an approachable way to access the DAG?

The idea is that once you inherit from tf.Module you can run something similar to Keras' model.summary and, even more exciting, model.plot_model.

Sorry for the somewhat confusing way of formulating this suggestion.

Member Author

Hey @lc0, thanks for the suggestion! We've been thinking about something similar. I'm not sure if we should solve this inside tf.Module or if this is something we should leave to framework authors, what do you think?

FWIW the current module APIs expose a lot of useful information to do a modest job of this outside of Module:

import tabulate

def weight_size(m):
  size = 0
  for v in m.variables:
    size += v.numpy().nbytes

  suffix = 'B'
  suffixes = ['KB', 'MB', 'GB', 'TB']
  while size > 1024 and suffixes:
    size /= 1024
    suffix = suffixes.pop(0)
  return '%d %s' % (size, suffix)

def summarise(module):
  rows = []
  for m in module.submodules:
    row = [m.name, m.__class__.__name__, weight_size(m)]
    rows.append(row)
  print(tabulate.tabulate(rows, headers=['attr', 'class', 'size'], tablefmt='grid'))
+----------+---------+--------+
| attr     | class   | size   |
+==========+=========+========+
| linear_0 | Linear  | 8 KB   |
+----------+---------+--------+
| linear_1 | Linear  | 4 MB   |
+----------+---------+--------+
| linear_2 | Linear  | 4 MB   |
+----------+---------+--------+
| linear_3 | Linear  | 40 KB  |
+----------+---------+--------+

@geyang geyang Sep 16, 2019

The table seems excessive. Here is something with less chrome:

 attr      class    size  
-------------------------
 linear_0  Linear   8 KB   
 linear_1  Linear   4 MB   
 linear_2  Linear   4 MB   
 linear_3  Linear   40 KB  

Alternatively, you can use the PyTorch pattern, which IMO is cleaner:

YourFantasticMode: (
    (linear_0) Linear(100, 20, 100) 8 KB
    (linear_1) Linear(20, 40, 100) 3.2 KB
    (linear_2) Linear(40, 20, 100) 3.2 KB
    (linear_3) Linear(20, 20, 100) 1.6 KB
)

>>> m.name_scope.name
'my_module/'
>>> m.v.name
'my_module/v:0'

Are the variable names (with scope) attached upon variable creation? An alternative might be the PyTorch way, where a variable does not know about its (parent's) scope, and the full scoped name is only constructed on the fly depending on which parent module we are dealing with (because a module can be re-used in different parent modules, and they might even be constructed externally and passed in via the constructor of a custom module).

I'm thinking about the following pseudocode example:

class FunnyAutoEncoder(tf.Module):
  def __init__(self, encoder, decoder, name=None):
    super(FunnyAutoEncoder, self).__init__(name=name)
    self.encoder = encoder
    self.decoder = decoder

  def __call__(self, x):
    z = self.encoder(x)
    z2 = tf.funny_op(z)
    return self.decoder(z2)


encoder = MLP(...)
decoder = MLP(...)
ae = FunnyAutoEncoder(encoder, decoder)

then ae.encoder.layerX.w.name will probably not show anything related to being under an auto encoder? Instead, it will be just with a top-level 'mlp/' scope?

I think a major scenario for having full scoped names available is to iterate through all the variables in a module, and do some inspection/processing of the variables according to their names. In this case, it might be more useful to have the (scoped) names reflect what role each variable is playing in the module.

However, I guess one drawback might be detecting potential duplication. For example, if one does this:

encoder = ResidualBlock(...)
decoder = encoder
ae = FunnyAutoEncoder(encoder, decoder)

then the same variables might be listed twice under different (scoped) names. But I guess this situation is rather rare, and it is probably better to explicitly define a FunnyAutoEncoderSharedEncoderDecoder that has only one encoder_and_decoder field.

Member Author

So currently the proposal is that name scopes are determined when the module is constructed, and from that point the scope is fixed per module. For name-based filtering you can do something like this:

encoder = MLP(..., name='encoder')
decoder = MLP(..., name='decoder')
ae = FunnyAutoEncoder(encoder, decoder)

encoder_vars = encoder.variables
# .. or ..
encoder_vars = [v for v in ae.variables if v.name.startswith('encoder/')]

It's worth pointing out that you can also enter a name scope and build modules inside it if you want to group a set of modules that way:

with tf.name_scope('encoder'):
  encoder = MLP(...)  # name_scope will be `encoder/mlp/`
decoder = MLP(..., name='decoder')  # name_scope := `decoder/`
ae = FunnyAutoEncoder(encoder, decoder)  # name_scope := `funny_auto_encoder/`

Is that flexible enough for your use cases? We've had a similar pattern in Sonnet and I've not heard any complaints from users about it, but perhaps there's something I'm missing!


@tomhennigan Thanks! I'm not sure if it is because I'm too used to the other way of thinking. Let me state the use cases in my mind and we can see if they are easy to do in the current proposal. I think the main inflexibility of the current approach is that in order to have a meaningful name scope hierarchy, we can only construct the model top-down according to the hierarchy. It will be difficult, for example, to construct some part of the component (e.g. the encoder from the example above) without knowing beforehand what that component is going to be used for (i.e. in your example, you explicitly pass the name='encoder' argument, indicating you know that it is going to be used as an encoder).

This problem might be more pronounced when one wants to write a generic library of commonly used (sub)nets. For example, library1 (say Sonnet 2) has a Module that defines a default ResNet implementation. Library2 is a general RL training framework that allows one to pass in a Module to be used for each specific component (e.g. the vision part of the agent). Now, as the user, if I want to use the ResNet in the vision part of the agent, I might need to use some hack to figure out what name to pass in when constructing the ResNet to make it consistent with the naming hierarchy in the agent model. Moreover, if the agent is grouping all its vision code under the name scope agent/vision, then I cannot pass agent/vision as the name= parameter to ResNet because it is not a proper Python identifier.

My guess for why this is not a problem in Sonnet (v1) is that Sonnet (v1) is still a graph-based library with an explicit build() function. In other words, when you create the Python object, the underlying graph building (and therefore the naming and scoping) has not happened yet. Therefore, one can still change the name scoping when build() is actually called. Here is some pseudo-code of my understanding of what might happen in Sonnet V1:

class Agent(...):
  def __init__(self, vision_net, ...):
    self.vision_net = vision_net
    # ....

  def build(self, ...):
    with tf.name_scope(self.name + '/vision'):
      self.vision_net.build(...)

On the other hand, with the current proposal, users could (and it seems it is recommended to) create variables in the Python object's __init__ function, which will attach a name to the variable upon creation. Therefore, one needs to know the exact name of the parent hierarchy upon creating that Python object. Once the Module object is created, the parent module can no longer re-wire its name scope as in the case of Sonnet V1. Is my understanding correct here?


I agree with @pluskid that dependency injection should at least be possible. My preference is for DI to be the recommended way to build up modules, but I'd understand if others disagreed.

Member Author

Thanks for the IRL discussions @pluskid, @brilee, @superbobry and @chr1sj0nes.

To summarize our discussions there are many cases where there are multiple paths to weights in a model (e.g. model.encoder.w might be the same instance as model.decoder.w) and this is somewhat confusing since in TensorFlow variables have a single, immutable name (e.g. w's name would be encoder/w:0 in both cases if encoder created it).

Perhaps the "right" solution is to remove the name from variables in Python, and instead use Python variable names to "name" variables. Sadly this is not in scope for TensorFlow 2, and we agreed that the current strategy of variable naming at creation time was OK for simple cases and better than not naming.

We discussed that the Python variables or object structure can be used instead to associate a path with each parameter, and in the above case you would have two paths to w. We agreed that it would be convenient to have some easy way to traverse the structure of arbitrary Modules using some simple protocol (e.g. somewhere inside the model there is a property called encoder and somewhere inside the model there is a property called decoder) to filter these paths.

In tensorflow/tensorflow@1d5fd9219 I added with_path to Module's _flatten method. This allows you to do exactly that. For example you can find paths in some arbitrary model to all tf.Variables inside a (possibly deeply nested) Module in a property called encoder:

>>> class MyModule(tf.Module):
...   @property
...   def state_dict(self):
...     return dict(self._flatten(
...         predicate=lambda v: isinstance(v, tf.Variable), with_path=True))

>>> mod = MyModule()
>>> mod.encoder = Encoder()
>>> mod.decoder = mod.encoder

>>> mod.state_dict
{('encoder', 'w'): <tf.Variable: ...>,
 ('decoder', 'w'): <tf.Variable: ...>}

>>> {k: v for k, v in mod.state_dict.items() if 'encoder' in k}
{('encoder', 'w'): <tf.Variable: ...>}

I hope that helps 😄. h/t @gehring who was asking about how to determine "depth" in another comment. I think this with_path option to _flatten (or the recursive flag) would help solve that too.
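For example, a hypothetical helper sketched on top of that with_path behaviour (variable_depths is not part of the proposal; it just illustrates how path length can stand in for depth):

```
import tensorflow as tf

class DepthAwareModule(tf.Module):
  @property
  def variable_depths(self):
    # Each entry of _flatten(with_path=True) is a (path, leaf) pair, so the
    # length of the path tells you how deeply nested each variable is.
    return {path: len(path)
            for path, v in self._flatten(
                predicate=lambda v: isinstance(v, tf.Variable),
                with_path=True)}
```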

... self.b = tf.Variable(tf.zeros([output_features]), name='b')
...
... def __call__(self, x):
... x = tf.convert_to_tensor(x, name='x')

I feel it is important to avoid having users call tf.convert_to_tensor every time they define their own __call__ function. One might want to define an implicit pre-call hook to do the conversion automatically. Or a better approach might be to explicitly require the API to only accept Tensors (or nested tuples of tensors) as inputs, and ask users to do the explicit conversion before calling from the top level.

This is because many of the modules will probably be used as building blocks in bigger architectures (e.g. residual blocks inside a ResNet). They will not directly interact with the user most of the time, so there is probably no need for them to call convert_to_tensor every time, but it might be better to enforce that the API has consistent behavior.

@gehring gehring Feb 1, 2019

While I agree with you that it would be good for any callable subclass of tf.Module to have some automatic conversion to tensor (e.g., in an implementation of Layer 2.0), this is beyond the scope of tf.Module. The Dense class example only serves to demonstrate how one would subclass tf.Module. Let me try to lay out what I understand the purpose of tf.Module to be and what the reason for its limited scope is. (I am not affiliated with the TF team)

After the shift to object-oriented storage of tf.Variable objects, variable reuse is done by storing variables within object attributes. It would be impractical and clunky to store all variables in a unique container object (which would essentially recreate TF 1.0's collections). To avoid this, we prefer to have variables contained within the objects that use them, e.g., kernel and bias variables as attributes of the Dense instance that uses them. While this new way of organizing variables has many advantages, it does make checkpointing and operating on the "global" variables more difficult (e.g., collecting variables for gradient descent).

By design, tf.Module is lightweight and focuses on addressing the challenges of managing nested stateful objects. Its sole purpose is to abstract away the DAG of modules for the purpose of operating on all relevant variables. This allows us to act like our top-level/root module explicitly contains all our variables. This is done by providing traversal methods which collect all the variables. As a bonus, tf.Module automatically handles the complexities of checkpointing when variables are allowed to be created before and after the checkpoint is loaded, e.g., hydration and instantiation of variables.

By limiting tf.Module's responsibilities to managing nested stateful modules, we end up with code that is easy to understand, document and extend. As a result, any developer wishing to implement some stateful containers can easily do so without intricate knowledge of how best to manage "global" state by subclassing tf.Module. Since handling stateful DAGs of modules is a very common requirement, tf.Module could easily be used by most sub-packages and libraries with stateful operations, e.g., tf.train, tf.metrics, tf.estimator. (Note, this specific proposal is arguing for the existence/inclusion of tf.Module in TF 2.0. The advantages of using tf.Module as a base class throughout tensorflow will likely be examined on case by case basis in a separate proposal.)


@gehring Yes, I agree with you. I don't think doing automatic conversion is a good idea. But enforcing that the inputs are Tensors is probably good (even if only implicitly, as a statement in the documentation). Anyhow, this is a minor thing in the whole proposal.

Member Author

+1 to what @gehring says. The decision of whether to call convert_to_tensor is entirely up to whoever subclasses tf.Module; there is no requirement to do so 😄

@awav awav Feb 4, 2019

I'm strongly against auto-conversion to tensors and agree with @tomhennigan - it is up to the scientist, engineer, or anyone else what to do with non-tensor types.

@awav awav mentioned this pull request Feb 2, 2019
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 4, 2019
I've made this "private" since this exposes the implementation of modules, which
in typical Python code does not represent part of the class's public API. As
such I think having the leading _ makes it clear (at least to the linter) that
you're using an "internal" method. As always this is a "sign" not a cop.

As a concrete example, I strongly encourage the following usage:

    >>> class FooModule(...):
    ...   @property
    ...   def hash_tables(self):
    ...     return tuple(self._flatten(predicate=IS_HASH_TABLE))

But in general we should discourage usage outside of modules:

    >>> m = FooModule()
    >>> hash_tables = tuple(m._flatten(predicate=IS_HASH_TABLE))  # Naughty.

c.f. tensorflow/community#56

PiperOrigin-RevId: 232246092
@brilee brilee left a comment

Love the proposal.


- Each module instance must have a name scope. This name should include the
name scope from when the module is constructed.
- Every method on the module must enter this name scope before calling user

Name scopes aren't reentrant; do you have a plan for how we can avoid creating ops with confusing names like 'mymodule/mymodule/mymodule/relu_2'?

Member Author

In eager op names are not a thing so that's good.

For tf.function names are uniquified within the function body, so you would only get an _N suffix if there were actually two relus within the same module (or you called your module multiple times sequentially).

As a nit, scope names are re-entrant (as in you can do with tf.name_scope("foo/"): ... as many times as you like) but op names need to be unique.

Example: https://gist.github.com/tomhennigan/a183a6570d2a6fb9bce08758ee2d78b6


Thanks for the correction. I see now that this works, although I'm still a bit confused about where the _1 suffix comes from.

@tf.function
def f(x):
  name_scope = tf.name_scope('blah')
  with name_scope:
    with name_scope:
      return x + 1

g = f.get_concrete_function(tf.TensorSpec([1, 64], tf.float32)).graph
for node in g.as_graph_def().node:
  print(node.name)
  
...
x
blah_1/add/y
blah_1/add
Identity

Member Author

In this case you're creating the name scope in graph mode (inside the function), and in graph mode name scopes are de-duped. I guess the function is being traced twice (hence 'blah' followed by 'blah_1'). If you use a fully qualified name scope (e.g. tf.name_scope('blah/')) then the add op will be de-duped, not the name_scope.

With tf.Module we use the module name to create a unique scope in the constructor [0] and then re-enter the fully qualified name whenever you use with module.name_scope: ... [1].

This is an interesting point though: in eager mode name scopes are not de-duped but in graph mode they are. We have a test explicitly defending this behavior [2] and I expect in TF2 almost all users to create modules outside a tf.function (aka eagerly) and to call them inside tf.functions.

[0] https://github.com/tensorflow/tensorflow/blob/e0585bc351b19da39610cc20f6d7622b439dca4d/tensorflow/python/module/module.py#L176-L177
[1] https://github.com/tensorflow/tensorflow/blob/e0585bc351b19da39610cc20f6d7622b439dca4d/tensorflow/python/module/module.py#L194-L198
[2] https://github.com/tensorflow/tensorflow/blob/e0585bc351b19da39610cc20f6d7622b439dca4d/tensorflow/python/module/module_test.py#L92-L109
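A small sketch of the two behaviours described above (run inside a tf.function, i.e. graph mode; the names in the comments are what I'd expect, mirroring the gist linked earlier):

```
import tensorflow as tf

@tf.function
def f(x):
  with tf.name_scope('blah'):    # Relative name: a second entry would be
    a = x + 1.0                  # uniquified to 'blah_1/...'.
  with tf.name_scope('blah/'):   # Fully qualified name: re-enters 'blah/',
    b = x + 1.0                  # so the *op* is de-duped (e.g. 'blah/add_1').
  return a + b

g = f.get_concrete_function(tf.TensorSpec([], tf.float32)).graph
print([n.name for n in g.as_graph_def().node])
```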


tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 5, 2019
Thanks brilee@ for spotting!
This wasn't ever required and including it might cause confusion.

h/t pluskid@ gehring@ and awav@
yield variables inside nested structures (lists etc) but not in other
modules.
"""

@property
def submodules(self):

@tomhennigan Is this done depth first or breadth first? I feel like it could be a good thing for tf.Module to offer a way to infer the depth of submodules/variables. Without owned_*, what would be the best way for a user to do this without re-implementing tf.Module's logic?

Member Author

Breadth first.

So there are two options (frustratingly some recent commits haven't been sync'd to GitHub so I can't just point you at the code!).

Firstly, we've made it really easy for subclasses to implement this if they want by exposing _flatten:

class MyModule(tf.Module):
  @property
  def owned_submodules(self):
    return tuple(self._flatten(recursive=False, predicate=lambda v: isinstance(v, tf.Module)))

I also discussed with brilee@ and pluskid@ earlier their feedback on variable names. We were aligned that what they both wanted was not really Variable.name, but rather the path to all variables (e.g. allowing the same variable instance to be reachable via multiple paths). We're planning to add an option to _flatten to allow flattening with paths. So for example in the case where you have:

encoder = Encoder()
decoder = encoder
auto = Auto(encoder, decoder)

You can implement variables_and_paths in one line with self._flatten(..., with_path=True):

>>> auto.variables_and_paths()
{"encoder.layers[0].w": <tf.Variable ...>,
 "encoder.layers[1].w": <tf.Variable ...>, 
 "decoder.layers[0].w": <tf.Variable ...>,
 "decoder.layers[1].w": <tf.Variable ...>}

It seems to me like this is another way of representing depth. WDYT?

Member Author

Actually the CL has been pushed 😄, _flatten is in tensorflow/tensorflow@5076adf6


I didn't have a particular use case in mind but I believe it is likely for someone to require that type of info (if not simply for human readability). The with_path solution is lightweight and should pretty much cover any use cases related to the topology of the variables/sub-modules. I'm 100% on board, nice job!

Member Author

FYI with_path is in tensorflow/tensorflow@1d5fd921 😄

tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 7, 2019
```
>>> class MyModule(tf.Module):
...   @property
...   def state_dict(self):
...     return dict(self._flatten(
...         predicate=lambda v: isinstance(v, tf.Variable), with_path=True))

>>> mod = MyModule()
>>> mod.encoder = Encoder()
>>> mod.decoder = mod.encoder
>>> mod.state_dict
{('encoder', 'w'): <tf.Variable: ...>,
 ('decoder', 'w'): <tf.Variable: ...>}
```

h/t tensorflow/community#56

PiperOrigin-RevId: 232908045
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 8, 2019
An issue with decorating methods in the metaclass is that when users annotate
methods with decorators that return objects (e.g. `tf.function`) they lose the
ability to access methods on those objects (other than __call__) because our
`with_name_scope` decorator is "on the outside".

A simple answer to this from other decorators is to copy the __dict__ from the
wrapped object to the wrapper (or use `__getattr__`). However in our case this
is not safe, since properties like `get_concrete_function` on `Function` would
need special casing (so the returned callable also entered the module name
scope before tracing).

After a bunch of discussion (thanks superbobry@ and chr1sj0nes@!) about how we
could do this in horrible ways, it seems that the cleanest solution is to allow
a Function to be rewrapped (iff it has not been traced). This pushes the
decorator down the stack of decorators and allows tf.Module to preserve
@tf.function annotated methods as Functions.

h/t tensorflow/community#56

PiperOrigin-RevId: 233034181
@ewilderj ewilderj added RFC: Accepted RFC Design Document: Accepted by Review and removed RFC: Proposed RFC Design Document labels Feb 15, 2019
@ewilderj ewilderj moved this from Open reviews to Accepted RFCs in RFC management Feb 15, 2019
Returns:
A collection of variables for the current module (sorted by attribute name)
followed by variables from all submodules recursively (depth first).
"""
Member

I'm a bit late to the party, but I don't see this in the discussion anywhere:

Keras layers and models implement a trainable flag as well. When set to False, none of the variables in the module or its submodules are returned by trainable_variables.

The current implementation of Module does not recognize a module-level trainable flag; it's not clear if that's intentional or an omission.

Looking at the implementation... it would require a separate predicate for recursion.

This seems like a minor change that could reduce surprises and increase compatibility, especially with the plan to make Layer and Model subclasses of Module.

WDYT?

Member Author

This is intentional. Module uses the immutable "trainable" bit on tf.Variable to determine which variables are trainable. If you want to implement layer freezing like Keras does for a Module subclass you can do:

class KerasLike(tf.Module):

  def __init__(self, trainable=True, name=None):
    super(KerasLike, self).__init__(name=name)
    self.trainable = trainable

  @property
  def trainable_variables(self):
    if not self.trainable:
      return ()
    return super(KerasLike, self).trainable_variables

Member

Great. Thanks for the clarifications.

@ewilderj ewilderj merged commit 2c16981 into tensorflow:master Feb 21, 2019
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 25, 2019
After attempting to integrate `tf.Module` into existing codebases (e.g.
`tf.keras`) we've found that the automatic name scoping is too invasive (e.g.
changing op and variable names) and it is desirable to disable it ~everywhere.

We propose that name scoping for `tf.Module` becomes opt-in:

>>> class MyModule(tf.Module):
...
...   @tf.Module.with_name_scope
...   def auto_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       self.w = tf.Variable(1., name='w')
...     return x * self.w
...
...   def manual_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       with self.name_scope:
...         self.w = tf.Variable(1., name='w')
...     return x * self.w
...
...   def no_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       self.w = tf.Variable(1., name='w')
...     return x * self.w

We will move opt-out name scoping into Sonnet:

>>> class MyModule(snt.Module):
...
...   def auto_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       self.w = tf.Variable(1., name='w')
...     return x * self.w
...
...   @snt.no_name_scope
...   def no_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       self.w = tf.Variable(1., name='w')
...     return x * self.w

In TF2 name scopes are cosmetic and this should be less of a big deal. We might
consider encouraging users who want to filter on names to instead use flatten
to extract a state dictionary for their objects (c.f.
tensorflow/community#56 (comment)).

I have moved the automatic name scoping logic (Metaclass etc) and associated
tests into Sonnet 2.

PiperOrigin-RevId: 235540184
karllessard added a commit to karllessard/tensorflow-community that referenced this pull request May 10, 2019
* RFC: Stateful Containers with tf.Module (tensorflow#56)

* Create 20190117-tf-module.md

* Update 20190117-tf-module.md

* Loosen return type for variable properties.

* Use Dense consistently.

Thanks brilee@ for spotting!

* Remove convert_to_tensor from examples.

This wasn't ever required and including it might cause confusion.

h/t pluskid@ gehring@ and awav@

* Remove owned_* methods.

* Document `_flatten`

See tensorflow/tensorflow@5076adf6 for more context.

* Fix typo in module name.

Thanks k-w-w@!

* Update 20190117-tf-module.md

sonnet-copybara pushed a commit to google-deepmind/sonnet that referenced this pull request May 29, 2019
h/t. tensorflow/community#56

PiperOrigin-RevId: 250436499
Change-Id: I9332d90f6331756f12907566af9a9e23f86aded8