
RFC: Standardizing Composite Operations In TensorFlow #113

Merged: 9 commits merged into tensorflow:master on Aug 20, 2019

Conversation

@marksandler2 (Contributor) commented Jun 11, 2019

Review period will close 2019-06-25

Standardizing composite ops in TensorFlow to support efficient inference.

| Status | Accepted |
| :--- | :--- |
| **Author(s)** | Mark Sandler (sandler@google.com) |
| **Sponsor** | Alexandre Passos (apassos@google.com) |
| **Updated** | 2019-08-15 |

## Objective

The objective is to create a simple API for adding new composite ops in a way
that lets downstream inference tooling process them automatically and robustly.
The developers of a composite op should be able to iterate on their
implementation, or even replace it with a standard TensorFlow op, without
breaking any existing tools.

Why? Composite ops often provide building blocks for complex models. However,
when implemented naively these ops can become unusable due to architectural
(hardware) details, and this is especially true for embedded and specialized
hardware. It is therefore preferable for downstream tools to be able to extract
such composite ops and, for instance, provide specialized implementations.

The current proposal concentrates on supporting inference-only transformations
and optimizations. However, we leave the door open for follow-up gradient
optimization support; see the appendix for a few possible ways forward for gradients.
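
As a rough illustration of the kind of API this RFC is after, here is a minimal sketch. It assumes the `implements` annotation is exposed through `tf.function` (in current TensorFlow releases the closest argument is `experimental_implements`); the op name and function body below are illustrative, not part of the proposal.

```python
import tensorflow as tf

# Hypothetical composite op: matmul against a low-rank factorization u @ v.
# The `implements` name tells downstream tooling what this function computes,
# so it may substitute a specialized kernel; otherwise the body below is used.
@tf.function(experimental_implements="google.matmul_low_rank_matrix")
def matmul_low_rank_matrix(a, u, v):
  return tf.matmul(a, tf.matmul(u, v))
```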

@ewilderj ewilderj changed the title RFC on standartizing composite operations in tensorflow. RFC: Standardizing Composite Operations In TensorFlow Jun 11, 2019
@ewilderj ewilderj added this to Needs attention in RFC management via automation Jun 11, 2019
@ewilderj ewilderj added the RFC: Proposed RFC Design Document label Jun 11, 2019
@ewilderj ewilderj moved this from Needs attention to Open reviews in RFC management Jun 11, 2019

2. If we go org/name, do we need centralized registry of accepted "org" names?

3. Do we need `reference` field? The idea is that it points to the source of
Member:

I like the idea of requiring a reference. However, I feel that a more natural way to document an op is by just writing a Python docstring which could contain one or more references if needed.

Contributor Author:

Right, the idea was that automatic tools could rely on it as a human-readable hash (or use a non-TensorFlow implementation as a source of ground truth), but perhaps this would be too confusing.

Member:

I do like the idea, but I'm also in favor of just keeping references in doc string. If a downstream developer is re-implementing a composite op I think they'll be thoroughly reading the doc.


* Tooling is not forward compatible.

## Questions and Discussion Topics
Member:

Do you want to cover specialization in this RFC? I can imagine a situation where an efficient implementation is device-specific e.g. TPU vs GPU. A recent use-case we had in Sonnet is LSTM which has a CuDNN-RNN implementation, a "block" implementation in contrib/rnn and a fallback via tf.nn.dynamic_rnn(...).

Contributor Author:

Interesting question. My intent was to leave it up to tooling to decide what it wants to do. In this sense the playing field is level for both core TensorFlow and downstream tools (such as TF Lite). If TensorFlow has an efficient implementation it can simply use it, or it can rely on the function implementation, or it can try both and decide what is better at runtime. Basically it brings what TensorFlow already does with conv2d up to the API level. Or perhaps I misread this comment. Do you have some specific suggestion on specialization in mind?

Member:

Sorry if my initial comment was unclear. Specialization itself will have to be implemented in downstream tools. I was actually thinking about the implementation selector in Grappler while writing that. However, to allow that, an op should somehow indicate that it's targeting a specific device, or is device-agnostic.

Contributor Author:

I see; if I understand it right, the idea is basically that different block implementations could be more efficient on different target devices, and Grappler (or downstream tooling for that matter) should be able to put the correct implementation in based on the device it is targeting. I can certainly see the utility of doing that, though I think we will need to think about how to avoid it becoming very messy. Will add this to the discussion list.

Contributor:

@qlzh727 I think your specialization work should commute with this, can you comment?


Arguments:

* `implements`: we propose using namespaces with the top-level namespace
Member:

What happens if there already is a function with a given name?

Contributor Author:

Good question. We can follow the Python semantics of silently overriding such functions, or we can log an error message. I think crashing would probably be undesirable. Namespaces should help us avoid most name conflicts.

Member:

I'm mildly in favor of crashing. Reasoning about failures in the presence of silent overrides is tricky. Note also that Python overriding is lexical (unless you monkey-patch which is usually frowned upon), whereas here you could accidentally override an op by importing a third-party module.

Contributor:

Since the "implements" is an indication for downstream tools to replace the current subgraph with a more optimized implementation, conflicts should not be a problem and should not involve any override. Having multiple function providing a different implementation of the same algorithm seems fine to me.

The problem is more if multiple downstream tools are looking for the same string to replace with a different implementation, but I'm not sure how you could catch it here.
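
For concreteness, a downstream tool's matching step might look roughly like the sketch below. This is an illustration only; the attribute key `"_implements"` and the exact serialized form are assumptions about how the annotation would appear in a `GraphDef`, not something specified by this RFC.

```python
def find_composite_functions(graph_def, wanted="google.matmul_low_rank_matrix"):
  """Return names of library functions annotated as implementing `wanted`."""
  matches = []
  for fn in graph_def.library.function:    # repeated FunctionDef
    if "_implements" in fn.attr:           # assumed attribute key
      if fn.attr["_implements"].s.decode("utf-8") == wanted:
        matches.append(fn.signature.name)
  return matches
```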


```
@function.library_function(
implements="google.matmul_low_rank_matrix",
Member:

For any composite ops moved from contrib, would you track down the original developers, or will everything in core fall under the google namespace?



If so, we might also consider having another attribute 'aliases' to allow
history of names for backward compatibility.

2. If we go org/name, do we need centralized registry of accepted "org" names?
Member:

I would vote no. IMO this adds bureaucracy that can probably be handled with an error raised when overwriting an existing namespace.function.

future collisions with function names. One option is to adopt a Java-style
"org.opname", basically using the developing organization as a disambiguating
factor. Another alternative is to use semantic namespaces e.g. `linalg`. A
third is to use org.semantic.opname.
Member:

org.semantic.opname for future proofing with extra disambiguation would be my vote.

6) *Simpler automatic conversion* As TensorFlow moves forward it is conceivable
that its computation description language (today: GraphDef, tomorrow: MLIR)
will be fully separated from the underlying TensorFlow kernel implementations.
This kind of standard attribute allows for more automatic conversions.


Just an observation: overall, the approach makes sense to me and is definitely progress from where we are today. Relying on user-supplied names is kind of brittle (compared to a strongly typed or well-defined schema), but at the same time it is also very flexible.

@tomhennigan (Member) left a comment:

LGTM overall. Would like to make sure the design review covers name collisions and the possibility of aligning definitions with MLIR.

For example:

```
@function.library_function(
Member:

Where is this symbol in the public API? Is @tf.library_function a new symbol, or are you proposing adding reference and implements to @tf.function?

Contributor Author:

The thought was to differentiate user functions (something that is very unlikely to be of interest downstream) from library functions. However, it might not be needed and could be differentiated by the attributes.



* `reference`: The goal of this attribute is to provide a stable reference for the function
implementation details. This could be a reference to a non-TensorFlow library (e.g. numpy.linalg.special_matmul)
or an arXiv reference. Downstream tooling can use this value as an opaque cache key to verify that
it implements the right flavor of the function.
Contributor:

The impact of this field on the downstream tool is blurry to me. The "implements" should be enough to replace a subgraph, and this field seems redundant to me right now.
(I can see it being useful for documentation purposes though.)

Contributor:

(This paragraph is obsolete; the field is dropped, right?)

def f(..):
pass
[0] https://github.com/tensorflow/mlir/blob/master/g3doc/OpDefinitions.md#operation-definition
[1] https://github.com/tensorflow/mlir/blob/master/g3doc/OpDefinitions.md#operation-name
Contributor:

My understanding is that the "name" could match the MLIR operation name used by the downstream tool: so "google.mm_low_rank_matrix" should be enough. Since there should be a registration in MLIR for the operation itself, you shouldn't replicate here the summary and description though?

@googlebot:

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

ℹ️ Googlers: Go here for more info.


@googlebot googlebot added cla: no and removed cla: yes labels Jul 3, 2019
@joker-eph (Contributor):

Talking to @alextp last week, I was wondering about the interaction with gradients here: the annotated function would be mutated and the downstream tool would likely not get what it expects when looking at the function.

@marksandler2 (Contributor Author):

> Talking to @alextp last week, I was wondering about the interaction with gradients here: the annotated function would be mutated and the downstream tool would likely not get what it expects when looking at the function.

@joker-eph
The idea is that the downstream tooling has two options: either replace the TensorFlow implementation with its own, or just blindly follow the existing TensorFlow implementation. In either case the assumption is that the function itself should never change (just the implementation). Or are you referring to the fact that the gradient of a function won't be annotated as a function (unless provided as a custom gradient)? That indeed might be undesirable. I think the solution would be to ensure TensorFlow annotates gradients of functions as functions as well...

@joker-eph (Contributor):

> Talking to @alextp last week, I was wondering about the interaction with gradients here: the annotated function would be mutated and the downstream tool would likely not get what it expects when looking at the function.
>
> @joker-eph
> The idea is that the downstream tooling has two options: either replace the TensorFlow implementation with its own, or just blindly follow the existing TensorFlow implementation. In either case the assumption is that the function itself should never change (just the implementation). Or are you referring to the fact that the gradient of a function won't be annotated as a function (unless provided as a custom gradient)? That indeed might be undesirable. I think the solution would be to ensure TensorFlow annotates gradients of functions as functions as well...

I think taking the gradient of a function requires changing the function to expose temporary values from the forward pass and pass them to the gradient function.
In some ways, taking the gradient of a function is very specific to a particular implementation of that function, since another implementation could lead to a different gradient function.

It seems to me that a downstream transformation that wants to use a more efficient implementation for such a function would need to match the pair of the original function and the gradient function and replace both of them together to be correct.

@marksandler2 (Contributor Author) commented Jul 9, 2019:

> I think taking the gradient of a function requires changing the function to expose temporary values from the forward pass and pass them to the gradient function.
> In some ways, taking the gradient of a function is very specific to a particular implementation of that function, since another implementation could lead to a different gradient function.
>
> It seems to me that a downstream transformation that wants to use a more efficient implementation for such a function would need to match the pair of the original function and the gradient function and replace both of them together to be correct.

Let me see if I understood. The actual gradient dL/dx of y = F(x), when viewed as dL/dx = G(x, y, dL/dy), needs to stay the same no matter the implementation.

However, when implemented as a symbolic gradient, this function actually has dependencies on all intermediate tensors inside F of the composite op. There isn't much we can do about that, and I don't think we need to expose it. If downstream tooling wants to provide a gradient, it would either need to implement its own with the exposed arguments "x, y, dL/dy" (note it can pull whatever intermediate state it wants from its own representation), or recompute symbolically (and hence effectively not be able to use the efficient implementation when training), or alternatively it can mark up all intermediate tensors as recomputable and compute them from x.

To summarize: I think if downstream provides its own implementation of the forward pass, it won't be able to use it for training automatically, because the gradient requires the composite implementation. That's ok. We might want to support optional gradient functions (I think tf.function already supports this) that downstream tooling can implement if it so chooses. I'll add this to the discussion.
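
One way to make the "optional gradient function" idea concrete is sketched below, pairing a composite forward function with an explicit gradient via `tf.custom_gradient` so that a downstream tool could, in principle, match and replace the pair together. This is an illustration under that assumption, not part of the accepted proposal; the fused softmax cross-entropy example is chosen only because it comes up later in the review.

```python
import tensorflow as tf

@tf.custom_gradient
def composite_softmax_xent(logits, labels):
  # Composite forward pass.
  loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

  def grad(dloss):
    # Gradient expressed only in terms of the exposed inputs, so a replacement
    # implementation of the forward pass does not invalidate it.
    dlogits = (tf.nn.softmax(logits) - labels) * dloss[..., tf.newaxis]
    return dlogits, None  # no gradient w.r.t. labels
  return loss, grad
```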

@joker-eph (Contributor):

> To summarize: I think if downstream provides its own implementation of the forward pass, it won't be able to use it for training automatically, because the gradient requires the composite implementation. That's ok. We might want to support optional gradient functions (I think tf.function already supports this) that downstream tooling can implement if it so chooses. I'll add this to the discussion.

Right, replacing a function with the gradient computation seems to me like a key use-case that we will want to support. Not having a solution for this makes this proposal much less attractive.


```
@function.library_function(
implements="google.matmul_low_rank_matrix",
Contributor:

Do we plan to use composite ops for more complex structures (e.g. a Keras Layer)? It's helpful for downstream tools to identify complex structures without pattern matching.

In this case, using a single implements name might be insufficient. For example, a Keras RNN layer has many parameters (e.g. return_sequence, go_backwards) that can change the implementation. IMHO there are a few potential solutions:

- Encode the parameters in the implementation name, e.g. implements="keras.layers.RNN,go_backwards=False,return_sequence=True"
- Store parameters in separate attributes

Any thoughts around this?

@miaout17 (Contributor) commented Jul 16, 2019:

A even simpler & very real example is Conv2D layer. A Keras Conv2D or TFLite Conv2D layer is composing TensorFlow Conv2D -> Add -> Activation (e.g. Relu, Relu6...etc).

Ideally we need a way to know what activation function to use without looking into the function body -- Either make the implementation name like "Conv2D,activation=Relu6", or design something smarter.
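
As a hedged sketch of the first option above (parameters encoded in the `implements` string), the annotation might look like the following. The name format and the use of `experimental_implements` are assumptions for illustration; nothing here is a settled convention.

```python
import tensorflow as tf

# Hypothetical composite Conv2D + bias + Relu6 block whose `implements` name
# also encodes the activation, so tooling need not inspect the function body.
@tf.function(experimental_implements="keras.layers.Conv2D,activation=Relu6")
def conv2d_bias_relu6(x, w, b):
  y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="SAME")
  y = tf.nn.bias_add(y, b)
  return tf.nn.relu6(y)
```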

@googlebot:

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@googlebot googlebot removed the cla: no label Aug 7, 2019
@marksandler2 (Contributor Author):

Notes from design Meeting: Standardizing Composite Ops

https://docs.google.com/document/d/177_TwgR_fPTP-yrlukGCmoZfGpQJjtxxlPrKkqKuPk8/edit#

@ewilderj (Contributor):

@marksandler2 can you copy the notes from the design meeting directly into the comments here? That doc is not publicly viewable. Alternatively, grant me (ewj@google) access to the doc and I can do it. LMK. Thanks!

@ewilderj (Contributor) commented Aug 14, 2019:

Notes from review meeting

This list reflects questions raised and the consensus discussed during the TensorFlow review process.

  1. We should provide namespacing for the operation names early on to avoid future collisions with
    function names. One option is to adopt a Java-style "org.opname", basically using the developing
    organization as a disambiguating factor. Another alternative is to use semantic namespaces, e.g. linalg.
    A third is to use org.semantic.opname.

    Note: since many of these functions will live in user codebases we obviously can't and shouldn't
    police them; however, having a recommended style guide will simplify things later on.

    Should there be a graduation process, where ops eventually move to a higher tier, e.g. a dedicated
    top-tier op like tensorflow.opname? If so, we might also consider having another attribute 'aliases' to
    keep a history of names for backward compatibility.

Consensus was that this decision will be deferred to a later date: "we can cross that bridge when we get there".

  2. If we go org/name, do we need a centralized registry of accepted "org" names?

Q: What will prevent users from placing files in old locations?
A: Not TensorFlow engineering team's responsibility.

Consensus: there is no incentive to do this intentionally.

  3. Do we need the reference field? The idea is that it points to the source of the authoritative ground-truth
    implementation (likely not TensorFlow based), which this op is faithfully trying to reproduce. Should
    this reference be structured? (For example, it could point to an existing scipy/numpy/pillow library, or it
    could point to a paper.) Should we make this optional, or get rid of it altogether and delegate
    this to documentation?

Pros: this can be machine-readable as a hash, and is better than having nothing.

Cons: accuracy cannot be guaranteed.

Q: What does the reference add over the canonical org name?
A: Org takes care of this, from one standpoint - LSTM is a good example (edge cases). Would prefer not to have a load-bearing reference field: should just have name and documentation. The canonical implementation is the implementation in TensorFlow, in terms of ops.

  4. Name collisions: crash, no crash, etc.

The consensus appears to be that no-crash is the most natural outcome despite initial misgivings. The argument that tilted in this direction is that the definition that will be invoked is actually well defined by Python code semantics (e.g. user code would call, say, python_lib.something.my_conv_op, which declares (myai.my_conv_op)), and the user intent is clear. What the downstream tooling will do is up to the downstream tooling, as long as it follows the contract. If there are two implementations available and different parts of the code call different ones, we might end up with two definitions in the same function, but every invocation is still well defined in the graph itself, and thus this is preferable.

Q: Should this be Python semantics or Java semantics? Should we crash, or should we allow users to provide a different implementation?
A: If we have a crash, then we can always opt for no-crash later. This is the safer approach.
Q: What if I'm using a library that internally creates a function to implement the same op? Either there is no downstream tool that understands the function, or there are slightly different versions of the same function. Would this lead to subtle bugs?
Q: Weak vs. strong linkage?

A: @alextp : This sounds like a poor choice to me, because it seems like the whole idea is to provide a single source of truth without hardcoding it into TensorFlow. If that's what we're doing, then we shouldn't support overriding.

Response: We don't see this proposal as being a single source of truth: it should be more like hints. If you try to impose a single source of truth, you need a centralized registry and we don't have one of those right now. It's safe to replace the body of this function with a new implementation, but there are no guarantees. The most natural implementation of this does not crash. If you crash, you're adding complexity that would not be there otherwise. This is a step up in the sense that we define the interface for the operations.

For now there is no way to build downstream tooling external to Google. Override would allow you to do it without tooling. We should have some standardized way of doing graph rewrites within TensorFlow. TensorFlow has a mechanism for registering operations and kernels, and it would be great to transfer this ownership to the user; we would like an additional mechanism for registering kernels.

@alextp: This can come later. A function could bind either to a function or to a kernel (lightweight). We should make kernels and user-defined functions equal citizens, and that means you need a priority. It becomes target-specific once you pick a deployment strategy. By forcing this, you lose the debuggability in eager.

Q: When should we make the selection? -> Defer this decision to another proposal. **We agree on no-crash, and the annotation should not be target-specific.**

  5. Aligning the definitions with MLIR

I wonder if we should consider more metadata here. For example, within a dialect, MLIR op definitions
[0] include a name [1], summary, and description. Maybe we can align with them? Concretely I suggest
considering something like:

    @tf.function(
        op_def=tf.OpDef(
            dialect="google",
            name="mm_low_rank_matrix",
            summary="..",
            description="..",
        ))
    def f(..):
      pass

[0] https://github.com/tensorflow/mlir/blob/master/g3doc/OpDefinitions.md#operation-definition
[1] https://github.com/tensorflow/mlir/blob/master/g3doc/OpDefinitions.md#operation-name

No, because MLIR dialects are not well aligned with the semantic dialects that we are considering. E.g. MLIR dialects follow framework dialects (e.g. TPU, or tensorflow, etc.), instead of "this is a group of related ops".
Maybe instead of doing org.op we should do dialect.op? Instead of "org", we move to change the term to "dialect". dialect.function_name would be verified to exist in MLIR vs. being a custom situation. This would mean that no one could put something into the TensorFlow dialect, because it has a predefined schema. If you know the dialect, you validate it, and if it doesn't exist, then you ignore it. A dialect is like a C++ namespace or class. You can say your dialect is open (use ops without registering). If you use an existing dialect, then you follow the existing preferences.

Consensus: Maybe “dialect” is not a great match, because it describes a tool rather than a framework. We’re not going to do alignment, because it’s not safe.

  6. Should this belong to the existing tf.function, or is a new alias preferable, e.g. tf.library_function?

Agnostic. Use this as an argument to tf.function.

  7. Should there be a specialization mechanism, where different block implementations are more efficient on different target hardware? (Both for TensorFlow and downstream tooling.)

Yes, eventually, but it seems not critical to have from the get-go.

There should be a specialization mechanism, as the user might not be aware that the function is being called. We need a way for vendors to be able to specify a custom kernel. The current implementation allows downstream tooling to substitute a custom version; it can't substitute the composite op implementation.

  8. Should function_def have dedicated fields for describing these, or is just using the existing attrs map (map<string, AttrValue>) a good option?

Consensus: named attributes (attrs).

  9. What about the backward pass? If downstream tooling is looking to support backprop efficiently, there will need to be more hooks to implement the gradient functions. In particular, the backprop pass won't be able to use the efficient forward implementation, because backprop requires internal tensors (unless there is a matching efficient gradient implementation).

From @joker-eph : "Right, replacing a function with the gradient computation seems to me like a key use-case that we will want to support. Not having a solution for this makes this proposal much less attractive."

I think we ran out of time on this one, but the proposal is that this will probably be left unspecified in this iteration. The potential path forward is to automatically wrap the gradient into its own function that downstream tooling can identify and replace with its own implementation. If the tooling needs to support both the forward and backward passes, it seems that to benefit from this the tooling would need to provide both implementations (and do internal caching), or simply rely on the default implementation.

Recommendation: @alextp to walk through an example using softmax cross entropy with logits. What would the graph look like with fused implementations? Make sure that diffusing works in a way that doesn't break it. Look for a function with 1 output vs. 2. Naive implementation: have a hook, but that would mean that you wouldn't have a link into the Tensor. Would have to reimplement softmax, which would be a performance bottleneck. We need to address this with the simplest training example.

  10. @tomhennigan: Performance implications of wrapping too much stuff into tf.function. One thing to note is that there is an overhead to calling a @tf.function from eager mode (currently ~100us) and I suspect we will want to be careful about adding it around lots of TensorFlow's public API without careful consideration (e.g. if 100us of overhead would dominate the runtime of the composite op).

The consensus is that we will eventually make tf.function fast enough that it won't be an issue, and given that this is likely to have very gradual roll out we will have time to adapt.
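
For context, the per-call overhead being discussed can be measured with a small, hedged sketch like the one below (the ~100us figure above was as of mid-2019; results depend heavily on TensorFlow version and hardware).

```python
import time
import tensorflow as tf

@tf.function
def tiny(x):
  return x + 1.0

x = tf.constant(1.0)
tiny(x)  # trace once so only eager-call overhead is measured below

n = 1000
start = time.perf_counter()
for _ in range(n):
  tiny(x)
per_call_us = (time.perf_counter() - start) / n * 1e6
print(f"~{per_call_us:.1f} us per tf.function call from eager mode")
```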

@ewilderj ewilderj self-assigned this Aug 15, 2019
@ewilderj (Contributor):

Hi @marksandler2, a couple of nits before I can merge this:

- there's a typo in the file name: "standartizing" vs "standardizing".
- please change the Status in the header from Proposed to Accepted.

Thank you!

* Discussion what should live in "core" tensorflow (e.g. `tf.xxx.my_cool_op` )
* Ultra complex functions (e.g. trained models) that are unlikely to get
specialized implementations in hardware.
* Immediate support for replacing gradients of composite ops.
Contributor:

Suggested change
* Immediate support for replacing gradients of composite ops.
* Support for uses of the annotation during training, taking gradients eliminates the annotation.

Contributor Author:

Reworded to avoid mentioning annotations, since we haven't introduced them yet:

Support for processing gradients (and forward inference in the presence of gradients)
for composite ops.



### Behind the scenes changes
When a new tf.function is created, behind the scenes TensorFlow creates up to 3 functions that are stored in the graph. These functions are:
Contributor:

Is that true? I thought that @tf.function would not do that, but that taking the gradient of a function would create the two extra functions?

Contributor Author:

Well, function creation is lazy in the first place... so tf.function creates no functions at all. They are created when you actually invoke the function and/or take its gradient. So I think the statement is technically correct: note the "up-to" :)
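
A hedged illustration of the lazy behavior described above (API names as in current TF 2.x; exact behavior may differ by version):

```python
import tensorflow as tf

@tf.function
def f(x):
  return x * 2.0

# Nothing is traced at definition time. A ConcreteFunction (and its FunctionDef
# in the graph's function library) only comes into existence once the function
# is called or explicitly traced:
cf = f.get_concrete_function(tf.TensorSpec([], tf.float32))
print(cf)
```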

@marksandler2 (Contributor Author):

> Hi @marksandler2, a couple of nits before I can merge this:
>
> - there's a typo in the file name: "standartizing" vs "standardizing".
> - please change the Status in the header from Proposed to Accepted.
>
> Thank you!

@ewilderj Done!

@ewilderj ewilderj merged commit a3315fb into tensorflow:master Aug 20, 2019
RFC management automation moved this from Open reviews to Accepted RFCs Aug 20, 2019
@ewilderj ewilderj added RFC: Accepted RFC Design Document: Accepted by Review and removed RFC: Proposed RFC Design Document labels Aug 20, 2019
@lgeiger commented Nov 18, 2020:

The work on TFR (Composable TensorFlow) seems to overlap a bit with this RFC and follows a very similar rationale. Would you mind sharing whether there are plans to unify the approaches in the future, or to replace one with the other?
