RFC: Variables in TensorFlow 2.0 #11

Merged Sep 19, 2018 (4 commits)

rfcs/20180817-variables-20.md (140 additions)

# Variables in TensorFlow 2.0

| Status        | Proposed                                                 |
| :------------ | :------------------------------------------------------- |
| **Author(s)** | apassos@google.com                                       |
| **Sponsor**   | wicke@google.com, joshl@google.com, ashankar@google.com  |
| **Updated**   | 2018-08-17                                               |


## Objective

The API for TensorFlow variables has many drawbacks: impossible-to-reason-about semantics, reliance on global scopes, and reliance on global collections. As the TensorFlow API moves to become more pythonic and object oriented, with the Keras layers and models and the object-based serialization, we no longer have a need for much of this global infrastructure around variables.


## Main changes

The tf.Variable API will change in the following ways for TF 2.0:



* tf.Variable will become an abstract base class with a well-defined interface and a scoped factory to construct instances
* users will be able to implement their own variable-like objects by subclassing tf.Variable and adding a scoped factory function to use those variables
* variable_scope and get_variable will be removed
* the tf 1.0 version of variable_scope and get_variable will be left in tf.compat.v1
* to control variable naming users can use tf.name_scope + tf.Variable
* whether a variable is shared across sessions / processes will be controlled by a constructor argument to tf.Variable; no other type of scope reuse will be done in the framework
* scoped partitioning will be implemented as a factory function at first
* libraries and users are encouraged to reuse variables by reusing their objects, like Keras layers do
* custom_getters will have the following API: [variable_creator_scope](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/variable_scope.py#L2395)
**Comment (Member):**

So variable_scope will be replaced by name_scope, right? Also, the URL for variable_creator_scope links to a blank line; could you give more details about the function (say, some examples)?

**Author reply:**

Fixed the link. The documentation has examples of how it's used.
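
For reference, a minimal sketch of how a creator passed to `tf.variable_creator_scope` composes with the default constructor, based on the linked API as it later shipped; the creator receives the next creator in the chain plus the constructor arguments. This is an illustration, not text from the RFC:

```python
import tensorflow as tf

def logging_creator(next_creator, **kwargs):
    # Inspect (or rewrite) the constructor arguments, then delegate to the
    # next creator in the chain, which eventually builds the variable.
    print("creating variable:", kwargs.get("name"))
    return next_creator(**kwargs)

with tf.variable_creator_scope(logging_creator):
    v = tf.Variable(1.0, name="v")  # logging_creator intercepts this call
```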

* the default implementation of the tf.Variable interface will be ResourceVariable
* RefVariable will be kept in tf.compat.v1 and will be the default implementation for tf.compat.v1.Variable
* tf.compat.v1.Variable will have a use_resource argument to control whether a resource variable or a ref variable will be created
* symbols like tf.assign* will be removed in favor of methods in tf.Variable
**Comment (@ageron, Sep 5, 2018):**

Please make item assignment possible:

```python
>>> import tensorflow as tf
>>> a = tf.Variable([1, 2, 3])
>>> a[1] = 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Variable' object does not support item assignment
```

* It's already possible via other methods (tf.scatter*) but really cumbersome.
* Since it's already possible, I assume it can't be that hard to implement (but maybe I'm missing something).
* It would make teaching TensorFlow easier ("it's just like NumPy").
* It's one of those little things that makes some people prefer PyTorch: they can say "PyTorch is just like NumPy", but it's harder to say this about TensorFlow when something as fundamental to NumPy is missing.
* I have run into real-life use cases where I really needed it (porting a library from NumPy to TensorFlow to make it run on a GPU).

Please, pretty please with sugar on top? ;-)

Edit: Alex pointed out that it will be possible in TF 1.11 with a[1].assign(5).
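
For illustration, the sliced-assignment form mentioned in the edit looks like the following with resource variables (a small sketch assuming eager execution):

```python
import tensorflow as tf

a = tf.Variable([1, 2, 3])
a[1].assign(5)        # item assignment via the slice's assign method
print(a.numpy())      # [1 5 3]
```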

* in tf.compat.v1 these symbols will be marked as deprecated and will call the corresponding methods in the Variable object instead


## Detailed changes


### tf.Variable class

The tf.Variable class will be an abstract base class which defines a tf.Variable interface. Initially this interface will have enough abstract methods such that the user-visible API of tf.Variable does not change.

**Comment:**

> Initially this interface will have enough abstract methods such that the user-visible API of tf.Variable does not change.

I'm not sure this makes sense: did it mean to read "enough concrete methods"? Adding many abstract methods doesn't change the user-visible tf.Variable API (for those using the existing/TensorFlow 1.x tf.Variable API).

**Author reply:**

This change has already been implemented. If you look at tf.Variable now it's a class with no implementations of methods, and most concrete instances are instances of subclasses (RefVariable for the old ones and ResourceVariable for the new ones).


There will be two main implementations of this interface: RefVariable, with the legacy ref edges, available only in tf.compat.v1, and ResourceVariable, which is the default for the v2 API. PartitionedVariable, MirroredVariable, _UnreadVariable, CastVariable, etc, are other implementations which are part of the core library. None of these implementations will be publicly visible, only tf.Variable will be.
**Comment:**

Nit: please escape the underscore in _UnreadVariable; markdown thinks you are trying to put text in italics (I think `\_UnreadVariable` works).


Constructing variables is done by calling tf.Variable(*args, **kwargs). Under the hood this will call a hierarchy of scoped constructor functions, similar to what is now done in variable_scope.variable. Each such constructor function can do some combination of:
**Comment (Member):**

1. Could you explain why we chose tf.Variable(*args, **kwargs), rather than tf.get_variable, to construct variables?

   > The tf.Variable class will be an abstract base class which defines a tf.Variable interface.

   If tf.Variable will be an abstract base class, how can tf.Variable(*args, **kwargs) be called?

2. Could you explain what the scoped constructor functions are?

**Author reply:**

  1. tf.get_variable was created to handle silent sharing of variables in the graph. This behavior is being removed.

  2. See the link I updated about variable_creator_scope

**Comment:**

1. Will it be possible to recover tf.Variable objects only from a graph or graph_def, just like it's now possible to do with tf.Variable.from_proto? We work a lot with managing models restored purely from graph def files, without necessarily having all the code that produced the original graph. The ability to restore basic TF objects such as tf.Variables directly from graph def data only is a must for us.

2. How is the above affected by tf.Variable types written by users?

3. Will it be possible to explicitly recreate or recover tf.Variable objects from other non-Python pieces of data in some way?

**Comment:**

> 1. tf.get_variable was created to handle silent sharing of variables in the graph. This behavior is being removed.
> 2. See the link I updated about variable_creator_scope

A related question: instead of calling tf.Variable, why not call the factory function directly, since tf.Variable is supposed to call a factory function anyway?

**Comment:**

Hi @alextp.
Could you please show an example of how to create a PartitionedVariable via the tf.Variable(*args, **kwargs) API? My question is whether the user should pass an indicator of which kind of concrete Variable to create. Does that mean the parameters *args and **kwargs are exposed to users without any limit?

**Comment (@sjain-stanford, Nov 8, 2018):**

> 1. Will it be possible to recover tf.Variable objects only from a graph or graph_def, just like it's now possible to do with tf.Variable.from_proto? We work a lot with managing models restored purely from graph def files, without necessarily having all the code that produced the original graph. The ability to restore basic TF objects such as tf.Variables directly from graph def data only is a must for us.
> 2. How is the above affected by tf.Variable types written by users?
> 3. Will it be possible to explicitly recreate or recover tf.Variable objects from other non-Python pieces of data in some way?

+1 on this. It is crucial for us to restore them from serialized graphdefs. Currently we use a RestoredVariable class inheriting from tf.Variable, but RefVariable changes in TF1.11 are breaking this inheritance. See issues #23591, #22648.




* calling a base constructor to actually create a variable
* returning preexisting variables
* changing some arguments to the base constructor, and maybe calling it multiple times

This is implemented by having a custom metaclass for tf.Variable which, when asked to construct a tf.Variable directly, will call the factory functions, but when asked to construct a subclass of tf.Variable, will bypass the factories and construct the subclass normally.
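
A toy sketch of the dispatch pattern described above; `_default_creator` and the class bodies are placeholders for illustration, not TensorFlow internals:

```python
def _default_creator(*args, **kwargs):
    # Stand-in for the innermost scoped creator function.
    return ResourceVariable(*args, **kwargs)

class _VariableMetaclass(type):
    def __call__(cls, *args, **kwargs):
        if cls is Variable:
            # Variable(...) called directly: route through the creator chain,
            # which may return any Variable subclass.
            return _default_creator(*args, **kwargs)
        # A subclass called directly (e.g. ResourceVariable(...)): construct it.
        return super().__call__(*args, **kwargs)

class Variable(metaclass=_VariableMetaclass):
    """Abstract interface; concrete behavior lives in subclasses."""

class ResourceVariable(Variable):
    def __init__(self, initial_value):
        self._value = initial_value

v = Variable(1.0)
assert isinstance(v, ResourceVariable) and isinstance(v, Variable)
```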

**Comment:**

It would be good to include a justification for why the client API should be calling the constructor to an abstract base class instead of having users explicitly call the type of variable they want. This document just says "it will do this complicated thing" without saying what the rationale is.

**Author reply:**

The goal is that the user should not have to know what type they want. For example, code called under distribution strategies might create MirroredVariables when the user calls tf.Variable. Think of tf.Variable as a factory function for which isinstance also works.


The tf.Variable interface will make no reference to graph collections, and tf.Variable will not add the Variable to any collections by default. tf.compat.v1.Variable, on the other hand, will have the collections argument and respect the existing semantics for it. Things which currently rely on collections (saving / loading, Optimizer.minimize, etc) will instead be expected to be passed either a list of variables or a CheckpointableBase-inheriting object.
**Comment (@facaiy, Aug 18, 2018):**

So tf.global_variables_initializer will be deprecated as well, right?

Can we let the variable take care of initialization by itself? I find it awkward to force users to call sess.run(tf.global_variables_initializer()) before training. When a variable is read, it in fact already knows whether it has been initialized.

**Author reply:**

global_variables_initializer will be deprecated, yes. I agree there could be a better solution to initialization but it's not in scope for this change.

Note that if eager is turned on by default and variables are created from eager then they're already automatically initialized even if most code runs inside graph functions, so most people in tf 2 will hopefully not be affected by this.



### Variable sharing

Sharing within a model will not be a part of the public API for tf.Variable. Users are strongly encouraged to share variables by sharing a reference to their objects.

**Comment:**

Could we add an example of what that canonical approach for sharing variables will be? There are a large number of models that relied on tf.get_variable() (as it was pushed to be the standard way to create/access variables), so demonstrating what the new uses would look like would be beneficial.

**Author reply:**

The canonical approach to sharing variables is by sharing their objects, as in Keras layers and Keras models, tf.make_template, and other ways of doing that.
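
As a concrete illustration of that object-based sharing pattern (a sketch using a Keras layer; not text from the RFC), reusing the layer object reuses its variables:

```python
import tensorflow as tf

# Sharing by sharing the object: both calls use the same kernel and bias.
shared = tf.keras.layers.Dense(16, activation="relu")

x1 = tf.random.normal([4, 8])
x2 = tf.random.normal([4, 8])
y1, y2 = shared(x1), shared(x2)   # no scopes or get_variable involved
```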

**Comment (@samjabrahams, Aug 23, 2018):**

Ok, I'm glad you brought up tf.make_template, as it's my preferred way of sharing weights between training/eval/inference in a single graph, but I'm wondering what the plan is to support tf.make_template given that it heavily relies on the existing variable_scope and naming semantics in order to work. There's a comment at the bottom which mentions it potentially being in scope, but I wonder what the mechanisms would look like without collections or special naming semantics.


That said, the tf.compat.v1.variable_scope library can be made self-contained if we replace the per-graph variable scope stack with a module-global weak key dictionary from graphs to scope objects, and we call the protected methods to access graph collections. This will remain available for users who are unwilling to port their libraries to object-based sharing, since the support burden of maintaining that file in tf.compat.v1 is negligible and the volume of code written against it is large.


### Checkpointing

Checkpointing will be done in tf 2.0 via the object-oriented checkpointing API.
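
A sketch of what object-based checkpointing looks like, using the `tf.train.Checkpoint` / `tf.train.CheckpointManager` names as they later shipped in TF 2 (an assumption about the final API, not part of this RFC's text):

```python
import tensorflow as tf

# Variables are discovered by walking object attributes, not graph collections.
dense = tf.keras.layers.Dense(4)
step = tf.Variable(0, dtype=tf.int64)
ckpt = tf.train.Checkpoint(step=step, dense=dense)
manager = tf.train.CheckpointManager(ckpt, "/tmp/tf_ckpts", max_to_keep=3)

manager.save()                            # write a checkpoint
ckpt.restore(manager.latest_checkpoint)   # restore from the newest one
```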

**Comment:**

Link to the API for reference.


### Optimizers

The Optimizer.minimize method will no longer work if it's passed a Tensor and no list of variables. Users are expected to pass the list of variables to minimize wrt or pass an object which implements the CheckpointableBase interface to let the optimizer find the variables. The behavior of tf.compat.v1.Optimizer will not change.
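
For illustration, a sketch of passing the variable list explicitly, written against the Keras optimizer API as it later shipped in TF 2 (an assumption about the eventual surface, not code from the RFC):

```python
import tensorflow as tf

model = tf.keras.layers.Dense(4)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
x = tf.random.normal([8, 3])
model(x)  # build the layer so its variables exist

# The variables to optimize are passed explicitly; nothing is read from
# a global collection.
loss = lambda: tf.reduce_mean(tf.square(model(x)))
optimizer.minimize(loss, var_list=model.trainable_variables)
```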


### Assignment operations

Instead of having free functions which access internal state of variables, reading from and writing to variables will be done via methods. Current tf.assign*(variable, ...) will become variable.assign*(...). tf.compat.v1 will keep the old aliases, but they will call the new methods instead.
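
For example, the method-based forms look like the following (a small illustration):

```python
import tensorflow as tf

v = tf.Variable(1.0)
v.assign(2.0)       # was tf.assign(v, 2.0)
v.assign_add(0.5)   # was tf.assign_add(v, 0.5)
v.assign_sub(1.0)   # was tf.assign_sub(v, 1.0)
print(v.numpy())    # 1.5
```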

This is an easy LSC to make (once the current operations are modified to return a RefVariable object instead of a Ref tensor) and will make the code more homogeneous and pythonic.


### Ref edges versus resources

TensorFlow graphs need to represent state (information which survives calls to session.run, or generally information produced by an op which depends on something other than the content of its input tensors) so most nontrivial programs can be useful. Examples of state are input pipelines, model parameters, queues, mutexes, and random number generators.

There are a number of ways of representing state in TensorFlow directly in the graph, but the most robust and flexible is using resource handles. A **resource handle** is a regular immutable Tensor which represents a name to a shared out-of-graph resource (any C++ class inheriting from ResourceBase can be used as a resource). The resource handle itself doesn't change during the program execution. The resource pointed to by a handle lives on a specific device (so while it's possible to serialize resource handle tensors it's usually not a good idea), and can be accessed by any op which runs on that device and has access to the resource handle tensor. These ops can do things such as reading from the resource, modifying the resource, initializing the resource, and deleting it.

A resource handle is a scalar tensor of dtype DT_RESOURCE (or dtypes.resource in Python), and can be manipulated like any other Tensor: you can concatenate resources, they can go through conditionals, you can slice into them, etc. This means that while it's often possible to determine statically whether two operations can access the same resource, some graphs might be structured in ways which make this difficult.

When you can determine statically that two ops touch the same resource you can make inferences about the state of the resource when one op is executing solely by looking at the graph. For example, if there is a path formed of control or data edges connecting a resource-using op O to a resource-using op O', you know that O' is guaranteed to see the effects of O on the resource and, conversely, that O is guaranteed to not see the effects of O' on the resource. If, on the other hand, there is no path in the graph connecting ops O and O' which use the same resource then whether one sees the effects of the other is undefined, and might vary from one execution to another.

Resource variables were the motivating case for introducing the explicit notion of resources to TensorFlow graphs. This was done to avoid complicated issues related to the lack of a memory model for the deprecated ref-edge-based variables and allow compilation of TensorFlow graphs containing mutable state.

A resource-based variable is the simplest type of resource. What's stored in the device's resource manager is a pair of a Tensor and a mutex. The main operation to read the value of a variable is read_variable_op, and it simply outputs a Tensor which has the same value as the Tensor in the resource handle state. There are many ops which write to the resource (assign_variable_op, assign_add_variable_op, resource_apply_gradient_descent, etc), and the basic properties of the resource edges ensure that it's possible to order reading and writing ops to avoid undefined behavior.
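
To make the read/write split concrete, here is a sketch that drives the underlying resource ops directly through `tf.raw_ops` (used only for illustration; normal code would call the `tf.Variable` methods instead):

```python
import tensorflow as tf

v = tf.Variable(3.0)

# v.handle is the DT_RESOURCE tensor naming the out-of-graph state.
tf.raw_ops.AssignVariableOp(resource=v.handle, value=tf.constant(5.0))
value = tf.raw_ops.ReadVariableOp(resource=v.handle, dtype=tf.float32)
print(float(value))   # 5.0 -- a snapshot; later writes don't mutate it
```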

These ops are currently implemented using copy-on-write, but they could also be implemented using copy-on-read or other, more complex, mechanisms, as long as the semantics of the read-before-writes and write-before-read are respected and as long as no mutation is done to the Tensor returned by a read_variable_op after it's been read. Here are two examples of why mutating a Tensor returned by a read_variable_op might be dangerous:
**Comment (@facaiy, Aug 18, 2018):**

I have trouble understanding the sentence. Do you mean that:

```python
v = tf.Variable(xxxxx)
v_read = v.read_variable_op()
```

v is mutable, while v_read is not?

**Author reply:**

yes, exactly




* tf.cond predicates: a tf.cond takes a boolean tensor as a predicate and conditionally executes ops in the true or false branch of the conditional based on the value of the predicate. The way this is implemented in TensorFlow, to allow for graph pruning and non-strict execution, is that there are many "switch" ops in the graph, each of which looks at the value of the predicate and decides which operations downstream from it can execute. If the predicate is a variable and one branch modifies the value of this variable, we would like to ensure that, because the "read" operation happened before the switch ops, only one branch of the conditional will execute. If, instead, writing to a variable could mutate the value of the tensor returned by "read", then a subset of both branches could execute, leading to hard-to-debug errors.
* gating gradients: when computing the backward pass and training a deep neural network, there is by default no in-graph ordering between the operation that updates a layer's parameters based on its gradients and the operation that uses the value of those parameters to compute the gradient with respect to the previous layer. If the value of a variable were allowed to change after it was read, it would be possible for the post-update value to be used in the backward pass, leading to incorrect gradients for the layers closer to the input of the network.

These are just two examples of how it's much harder to reason about TensorFlow programs when the value of a variable can change after it was read.

Before resource handles, TensorFlow variables were represented using a "ref" edge. A ref edge is a pair of pointers, one to a Tensor and one to a mutex, owned by something other than the TF runtime. When an op expects a ref tensor, its input has to be a ref tensor; when an op expects a non-ref tensor but its input is a ref tensor, the pointer is silently dereferenced. This means that normal tensor objects in the graph can silently alias a mutable tensor, and hence two ops with the same input can see it having different values. Which value will be seen can depend on execution-specific details such as whether the variables are on a local or remote device, and in general it's not easy to ensure that a read happens before or after a specific write.


### Internal resource variable ops

We will expose the internal ops used to implement ResourceVariable as tf.experimental.variable_operations (name TBD). This way users and libraries can, if they need to, modify the behavior of variables at will.
**Comment (@facaiy, Aug 18, 2018):**

If you'll allow a slight digression, what is the role of the tf.experimental module? Would it become the next tf.contrib? cc @martinwicke



## Migration plan

The migration plan is roughly as follows. TODO(apassos): flesh out this section with cost estimates.



1. Implement the abstract base class and factory function scope under the hood
1. Expose the factory function scope as tf.variable_creator_scope
1. LSC to change tf.variable_scope / tf.get_variable to tf.compat.v1.*
1. Removal of tf.variable_scope and tf.get_variable from the tf 2 namespace
1. Implement the subclass to be returned from tf.assign*
1. LSC to change tf.assign*(v, …) to v.assign*(...)
1. Change the implementation of tf.compat.v1.variable_scope to not rely on a per-graph variable scope stack
1. Remove the get_variable_scope and related public methods from tf.Graph (leaving them on tf.compat.v1.Graph)
1. Implement PartitionedVariable as a subclass of the tf.Variable interface
1. Add a partitioner scope to the tf 2.0 API
1. Add a deprecation warning to the tf.compat.v1 partitioned variable scope with a migration warning
1. [questionable] Implement a variable creator factory function which calls get_variable under the hood
1. Make this function active in all tf.compat.v1 endpoints which currently call get_variable (with a decorator, probably)
1. Change the behavior in tf2 to call tf.Variable (which will redirect to tf.get_variable in tf.compat.v1, keeping the existing behavior but cleaning the codebase)
1. [WARNING: checkpoint-breaking change] drop calls to variable_scope in parts of our API which use it. Right now they are: feature_column, rnn, canned estimators, optimizer slots, TPU estimator. Most can be replaced with judicious use of name= arguments
1. [optional] Implement tf v2 make_template which does not rely on variable_scope internally and uses a factory creator function to track and reuse variables

**Comment:**

Going to request that this be a requirement instead of optional.

**Comment:**

Alternatively, expand layer-based APIs to make it easier to reuse existing variables imperatively.

**Comment:**

Also want to request make_template in v2.
One question: right now in make_template, one can use get_variable to create reused (shared) variables and tf.Variable(trainable=False) to create local (unshared) variables. After get_variable is deprecated, I wonder what the alternative should be.

**Comment:**

@alextp could you comment on how make_template will be supported and how a user should create shared and unshared variables inside make_template ?
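
One possible shape of the object-based answer (a sketch assuming `tf.Module`, which shipped after this RFC; an illustration, not an official reply): variables created once on the object are shared across calls, while per-call temporaries are just ordinary tensors.

```python
import tensorflow as tf

class Scale(tf.Module):
    """Variables created in __init__ are reused (shared) on every call."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(2.0)          # shared across calls

    def __call__(self, x):
        centered = x - tf.reduce_mean(x)   # per-call temporary, not a variable
        return self.w * centered

scale = Scale()
y1 = scale(tf.constant([1.0, 2.0, 3.0]))   # both calls share scale.w
y2 = scale(tf.constant([4.0, 5.0, 6.0]))
print(scale.trainable_variables)           # (<tf.Variable ...>,)
```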



## Questions and Discussion Topics

1. How should we deal with the deprecation of model building APIs?