From a05e0da52abdf3aa96ebdff51f635ede59d22ba4 Mon Sep 17 00:00:00 2001 From: Saurabh Saxena Date: Tue, 3 Dec 2019 20:00:26 -0500 Subject: [PATCH 1/8] First commit --- 20191203-single-eager-graph-path.md | 9 +++++++++ 1 file changed, 9 insertions(+) create mode 100644 20191203-single-eager-graph-path.md diff --git a/20191203-single-eager-graph-path.md b/20191203-single-eager-graph-path.md new file mode 100644 index 000000000..328b5604c --- /dev/null +++ b/20191203-single-eager-graph-path.md @@ -0,0 +1,9 @@ +# Single python code path for eager and graph + +| Status | Proposed | +:-------------- |:---------------------------------------------------- | +| **RFC #** | | +| **Author(s)** | Saurabh Saxena (srbs@google.com) | +| **Sponsor** | Alex Passos, Gaurav Jain | +| **Updated** | 2019-12-03 | + From a920a4d902bef24aa259496f6d063021a4bf3019 Mon Sep 17 00:00:00 2001 From: Saurabh Saxena Date: Wed, 4 Dec 2019 13:18:42 -0500 Subject: [PATCH 2/8] Move to rfcs/ --- .../20191203-single-eager-graph-path.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename 20191203-single-eager-graph-path.md => rfcs/20191203-single-eager-graph-path.md (100%) diff --git a/20191203-single-eager-graph-path.md b/rfcs/20191203-single-eager-graph-path.md similarity index 100% rename from 20191203-single-eager-graph-path.md rename to rfcs/20191203-single-eager-graph-path.md From 789ce192a6324f54240c8262e70014bd65d0d4a9 Mon Sep 17 00:00:00 2001 From: Saurabh Saxena Date: Wed, 4 Dec 2019 18:35:29 -0500 Subject: [PATCH 3/8] Add RFC and FuncGraph CUJs addendum doc. --- rfcs/20191203-single-eager-graph-path.md | 321 +++++++++++++++++- .../20191203-func-graph-cujs.md | 112 ++++++ 2 files changed, 431 insertions(+), 2 deletions(-) create mode 100644 rfcs/20191203-single-eager-graph-path/20191203-func-graph-cujs.md diff --git a/rfcs/20191203-single-eager-graph-path.md b/rfcs/20191203-single-eager-graph-path.md index 328b5604c..bec009b4f 100644 --- a/rfcs/20191203-single-eager-graph-path.md +++ b/rfcs/20191203-single-eager-graph-path.md @@ -3,7 +3,324 @@ | Status | Proposed | :-------------- |:---------------------------------------------------- | | **RFC #** | | -| **Author(s)** | Saurabh Saxena (srbs@google.com) | -| **Sponsor** | Alex Passos, Gaurav Jain | +| **Author** | Saurabh Saxena (srbs@google.com) | +| **Sponsors** | Alex Passos, Gaurav Jain | | **Updated** | 2019-12-03 | + +## Objective + +This proposal discusses merging the graph building and eager op-dispatch code-paths in python and moving the FuncGraph capturing logic and gradient tape bookkeeping into C++. + +## Motivation + +### Graph building performance + +Graph-building time performance has been a key bottleneck in enabling implementation of large models in TF2. + +* Capturing external tensors: In analysis of graph-building time for [BERT](https://github.com/tensorflow/models/tree/master/official/nlp/bert) we found that ~20% time of building the body graph of a tf.while_loop is spent in `FuncGraph.capture`. We also extensively perform capturing when building gradients of functional ops since the backward function requires access to intermediate tensors of the forward function. This includes 2 parts, both of which we could potentially perform in C++. + * [Creating](https://github.com/tensorflow/tensorflow/blob/6a70aa6d438259cabd23c09808db4cf51a2e5377/tensorflow/python/framework/func_graph.py#L1118) the placeholders. These can be many (154630 in BERT’s while loop). 
Building these in python means we incur the python op building overheads, Python->C SWIG costs and maintain the captures mapping in python. + * [Copying](https://github.com/tensorflow/tensorflow/blob/6a70aa6d438259cabd23c09808db4cf51a2e5377/tensorflow/python/framework/func_graph.py#L1120) the handle data (for resources and variants). Handle data contains information about the shape and type of the _contained_ entity of a `DT_RESOURCE`/`DT_VARIANT` type. Copying handle data requires a python-c round-trip since the handle data is contained in either `EagerTensor._handle_data` (for EagerTensors) or `InferenceContext.output_handle_shapes_and_types` (for Graph tensors). +* Automatic control deps: We add control dependencies to a `tf.function`’s nodes as a post-processing step to make sure that any side-effects occur in program order. This can easily be done in C/C++. +* Gradient Tape: The tape needs to keep track of the forward ops to build gradients (or actually compute the gradient in the case of forward-mode diff). This is currently triggered in gen_xyz_ops.py. We can move this to C++ as well. + + +### Cross-language support + +There have been various [requests](https://github.com/tensorflow/tensorflow/issues/28195) for providing APIs for building `tf.function` and v2 control flow in non-python frontends. Moving capturing logic to the C/C++ layer is the first step towards enabling this. The full details for this will be fleshed out in follow-up proposals, however, we do analyse how this proposal addresses use-cases of `FuncGraph` later in this doc. + + +### Shape Inference + +C++ shape inference in FuncGraphs fails if a shape tensor relies on the constant value of a captured placeholder because we do not have information about graph nesting available there. We currently work around this c6c1f2ff3bc979f420d8fffa2b6e02268f711bf6 by explicitly calling [maybe_set_static_shape](https://github.com/tensorflow/tensorflow/blob/15715cb2c8e877c18f8d969cc51a37ff26e8397b/tensorflow/python/ops/random_ops.py#L78) in Python because we have the graph hierarchy available there. One alternative @allenlavoie suggested was to replace the placeholders with their constant value tensors if possible, guarded by a size threshold but it was unclear what this threshold should be. Having information about the nested graphs and captures etc during shape inference could help avoid this problem. + + +### Consistent execution environments + +(Contributed by @allenlavoie) We currently rely on Python exporting SavedModels which are compatible with Session-based execution, where the Session owns variable memory and it is retrieved by executing variable nodes with fixed names. TensorFlow Serving for example still uses Sessions. This compatibility mode is quite different than the 2.x Python eager execution memory model where the language bindings associate memory with variable objects, and is likely going to be a source of confusion and bugs. This effort lays necessary groundwork for implementing FuncGraph in C/C++ and hence brings us closer to executing SavedModels the same way during serving (from C++) that we execute them during development (TF 2.x Python). + +References: + +1. TODO(saxenasaurabh): Link to FuncGraph CUJs doc. + +## Design Proposal + +Basically we want to get rid of the graph-building part in gen_*_ops.py and get rid of gradient tape bookkeeping in both graph and eager modes. 
For example: + + +``` +def batch_matrix_band_part(input, num_lower, num_upper, name=None): + _ctx = _context._context or _context.context() + tld = _ctx._thread_local_data + ~~if tld.is_eager:~~ + try: + _result = _pywrap_tensorflow.TFE_Py_FastPathExecute( + _ctx._context_handle, tld.device_name, "BatchMatrixBandPart", name, + tld.op_callbacks, input, num_lower, num_upper) + return _result + except _core._FallbackException: + try: + return batch_matrix_band_part_eager_fallback( + input, num_lower, num_upper, name=name, ctx=_ctx) + except _core._SymbolicException: + pass # Add nodes to the TensorFlow graph. + except _core._NotOkStatusException as e: + _ops.raise_from_not_ok_status(e, name) + ~~# Add nodes to the TensorFlow graph. + _, _, _op, _outputs = _op_def_library._apply_op_helper( + "BatchMatrixBandPart", input=input, num_lower=num_lower, + num_upper=num_upper, name=name) + _result = _outputs[:] + if _execute.must_record_gradient(): + _attrs = ("T", _op._get_attr_type("T")) + _inputs_flat = _op.inputs + _execute.record_gradient( + "BatchMatrixBandPart", _inputs_flat, _attrs, _result) + _result, = _result + return _result~~ + +def batch_matrix_band_part_eager_fallback(input, num_lower, num_upper, name, ctx): + _attr_T, (input,) = _execute.args_to_matching_eager([input], ctx) + num_lower = _ops.convert_to_tensor(num_lower, _dtypes.int64) + num_upper = _ops.convert_to_tensor(num_upper, _dtypes.int64) + _inputs_flat = [input, num_lower, num_upper] + _attrs = ("T", _attr_T) + _result = _execute.execute(b"BatchMatrixBandPart", 1, inputs=_inputs_flat, + attrs=_attrs, ctx=ctx, name=name) + ~~if _execute.must_record_gradient(): + _execute.record_gradient( + "BatchMatrixBandPart", _inputs_flat, _attrs, _result)~~ + _result, = _result + return _result +``` + + +1. Graph building will implicitly happen in `TFE_Py_Execute` which is called from `xyz_eager_fallback`. +1. `TF_EagerContext` makes the call to `Tape.RecordGradient` so we no longer need to call it from Python. +1. The Graph stack will be maintained in `TF_EagerContext` (see below) which includes the graph hierarchy and captures made from each graph. + + +## Detailed Design + + +### API +A high-level overview of the anticipated API. + +**C API** + + +``` +// TODO: control dependencies, auto control dependencies, callbacks + +// A TF_EagerContext knows whether we're in eager mode or in graph mode, keeps +// track of gradient tapes, etc. +typedef struct TF_EagerContext TF_EagerContext; + +TF_EagerContext* TF_NewEagerContext(TFE_Context* ctx); +void TF_DeleteEagerContext(TF_EagerContext* c); + +// The context is executing eagerly if there are no graphs in the stack. We +// check when popping a graph from the stack that it is indeed the one we +// expected to avoid bugs. +int TF_EagerContextIsExecutingEagerly(TF_EagerContext* c); +void TF_EagerContextEnterGraph(TF_EagerContext* c, TF_Graph* g); +void TF_EagerContextExitGraph(TF_EagerContext* c, TF_Graph* g, TF_Status* s); + +// A TF_TensorHandle is a union type of TFE_TensorHandle (eager tensor) and +// TF_Output (graph tensor). +typedef struct TF_TensorHandle TF_TensorHandle; + +// Note: takes ownership of t. 
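+
+// Illustrative usage sketch (an assumption, not part of the proposed API
+// surface; error handling elided). `h` is a TF_TensorHandle that may wrap
+// either an eager tensor or a graph tensor, e.g. an output of
+// TF_ExecuteOperation (declared further below):
+//
+//   if (TF_TensorHandleHasValue(h)) {
+//     TFE_TensorHandle* value = TF_TensorHandleToTensor(h, status);
+//   } else {
+//     TF_Output node_output = TF_TensorHandleToGraphTensor(h, status);
+//   }
+//   TF_TensorHandleDecRef(h);
+//
+// Conversion and reference-counting functions; TF_TensorHandleFromTensor
+// takes ownership of its argument, per the note above:
+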
+TF_TensorHandle* TF_TensorHandleFromTensor(TFE_TensorHandle* t); +void TF_TensorHandleDecRef(TF_TensorHandle* t); +void TF_TensorHandleIncRef(TF_TensorHandle* t); +TFE_TensorHandle* TF_TensorHandleToTensor(TF_TensorHandle* t, TF_Status* s); +TF_Output TF_TensorHandleToGraphTensor(TF_TensorHandle* t, TF_Status* s); +int TF_TensorHandleHasValue(TF_TensorHandle* t); +TF_DataType TF_TensorHandleDataType(TF_TensorHandle* t); + +// When in graph mode accessing a tensor from outside the graph will trigger +// capturing logic similar to what we have in FuncGraph. These methods let you +// inspect the capturing metadata before popping the graph from the graph stack. +int TF_EagerContextNumCaptures(TF_EagerContext* c, TF_Graph* g, TF_Status* s); +void TF_EagerContextCapturedValues(TF_EagerContext* c, TF_Graph* g, + TF_TensorHandle** captures, TF_Status* s); +void TF_EagerContextCapturedPlaceholders(TF_EagerContext* c, TF_Graph* g, + TF_Output* captures, + TF_Status* s); + +// Allows specifying a custom capturing function. To be use to implement +// custom capturing logic for tf.while_loop. `captured` must be in the current +// context graph. +typedef void(*CaptureCallback)(TF_EagerContext* c, + TF_Graph* source_graph, + TF_TensorHandle* source, + TF_TensorHandle** captured, + TF_Status* s); +void TF_EagerContextPushCaptureCallback(TF_EagerContext* c, + CaptureCallback* callback, + TF_Graph* graph, TF_Status* s); +void TF_EagerContextPopCaptureCallback(TF_EagerContext* c, + TF_Graph* graph, TF_Status* s); + +// Needed for updating the captured tensor in tf.function, tf.cond grad func, VariableTensor. +void TF_EagerContextUpdateCaptureForPlaceholder(TF_EagerContext* c, TF_Graph* g, + TF_TensorHandle* placeholder, + TF_TensorHandle* new_capture, + TF_Status* s); + +// TF_OutputList just lets us not specify the number of outputs of an operation +// beforehand. This forces a memory allocation in the runtime, which is bad, but +// it allows for generic code. +typedef struct TF_OutputList TF_OutputList; +TF_OutputList* TF_NewOutputList(); +void TF_DeleteOutputList(TF_OutputList* o); +int TF_OutputListNumOutputs(TF_OutputList* o); +TF_TensorHandle* TF_OutputListOutput(TF_OutputList* o, int i); + +// A TF_AbstractOp is the metadata we need to execute an operation in either +// eager or graph mode. +typedef struct TF_AbstractOp TF_AbstractOp; +TF_AbstractOp* TF_NewAbstractOp(TF_EagerContext* c, const char* const op_name, + TF_Status* s); +void TF_DeleteAbstractOp(TF_AbstractOp* op); + +// TODO: we need a way of specifying attrs + +// TF_ExecuteOperation will, if in eager mode, execute, if in graph mode, maybe +// capture some inputs and then add a node in the graph, and after +// execution/node creation it'll go and record things that happened in any tape +// which happens to be active. +void TF_ExecuteOperation(TF_AbstractOp* op, int num_inputs, + TF_TensorHandle* const * inputs, TF_Status* s, + TF_OutputList* o); + +// TF_Tape is just a specialization of tensorflow::eager::Tape on +// TF_TensorHandle values and gradients. 
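+
+// Illustrative sketch (an assumption, not part of the proposed API surface):
+// recording one op on a tape and asking for a gradient with the declarations
+// below. `eager_ctx`, `x` and `status` are assumed to already exist;
+// attribute handling and error checking are elided.
+//
+//   TF_Tape* tape = TF_NewTape(eager_ctx, /*persistent=*/0);
+//   TF_ContextPushTape(eager_ctx, tape);
+//   TF_TapeWatch(eager_ctx, x);
+//   TF_AbstractOp* matmul = TF_NewAbstractOp(eager_ctx, "MatMul", status);
+//   TF_OutputList* outputs = TF_NewOutputList();
+//   TF_TensorHandle* inputs[] = {x, x};
+//   TF_ExecuteOperation(matmul, 2, inputs, status, outputs);  // records on any active tape
+//   TF_TensorHandle* y = TF_OutputListOutput(outputs, 0);
+//   TF_OutputList* grads = TF_NewOutputList();
+//   TF_TapeGradient(tape, /*num_sources=*/1, &x, /*num_targets=*/1, &y,
+//                   grads, status);
+//   TF_ContextPopTape(eager_ctx, tape, status);
+//   TF_DeleteTape(tape);
+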
+typedef struct TF_Tape TF_Tape; +TF_Tape* TF_NewTape(TF_EagerContext* c, int persistent); +void TF_DeleteTape(TF_Tape* t); + +void TF_ContextPushTape(TF_EagerContext* ctx, TF_Tape* tape); +void TF_ContextPopTape(TF_EagerContext* ctx, TF_Tape* tape, TF_Status* s); +void TF_TapeWatch(TF_EagerContext* ctx, TF_TensorHandle* t); +void TF_TapeGradient(TF_Tape* t, int num_sources, TF_TensorHandle** sources, + int num_targets, TF_TensorHandle** targets, + TF_OutputList* gradients, TF_Status* s); + +// A GradientFunction is what we execute at runtime when computing a gradient; +// it takes some closure-captured values from the forward pass and the output +// gradients of the op and produces the input gradients of the op. +typedef void (*GradientFunction)(int num_output_gradients, + TF_TensorHandle* const * output_gradients, + TF_TensorHandle** input_gradients, + TF_Status* s, void* closure); +typedef void (*GradientFunctionDeleter)(GradientFunction function, + void* closure); + +// A GradientFunctionRegisterer is the code that will run during the forward +// pass to find out which gradient function should be pushed into the tape. It +// has access to all inputs and outputs of an operation and gets to choose which +// ones to pack into the closure which will be available to the GradientFunction +// at runtime. +typedef void (*GradientFunctionRegisterer)( + TF_EagerContext* c, int num_inputs, TF_TensorHandle* const* inputs, + TF_OutputList* outputs, GradientFunction* gradient_function, + GradientFunctionDeleter* gradient_function_deleter, + void* registerer_closure, void** gradient_function_closure); + +void TF_TapeCustomGradient(TF_EagerContext* ctx, + int num_inputs, + TF_TensorHandle** inputs, + int num_outputs, + TF_TensorHandle** outputs, + GradientFunctionRegisterer registerer, + void* registerer_closure); + +// Registers a gradient function to run given an op name. +void TF_ContextRegisterGradientFunction(TF_EagerContext* ctx, + const char* op_name, + GradientFunctionRegisterer registerer, + void* registerer_closure); +``` + + +**Python API** + + +``` +class EagerContextManager(object): + def __init__(self, c_graph): + self._c_graph = c_graph + def __enter__(self): + c_api.TF_EagerContextEnterGraph(ctx, self._c_graph) + def __exit__(self): + c_api.TF_EagerContextExitGraph(ctx, self._c_graph) + +class _FuncGraphBase(object): + def __init__(): + self._c_graph = c_api.TF_NewGraph() + @contextmanager + def as_default(): + # Note: This means that the graph hierarchy is no longer maintained in python. + return EagerContextManager(self._c_graph) +``` + + +We will implement a new subclass for `FuncGraph` that will replace `Graph`. We will try to keep as much of the logic as possible in C++ and expose that using pybind or somesuch. Here’s a discussion of some of the features that `FuncGraph` inherits from `Graph` which we will need to support. This list may not be exhaustive and we are hoping to add support for other things as needed. + + + +1. `apply_op_helper` + `create_op_internal` contain a lot of op _preparation_ logic which will need to be moved to C++. For example: + 1. [Uniquifying op names](https://github.com/tensorflow/tensorflow/blob/41228d7f14496ff661e7c22361a987b0255cf812/tensorflow/python/framework/ops.py#L3297). + 1. [Checking](https://github.com/tensorflow/tensorflow/blob/41228d7f14496ff661e7c22361a987b0255cf812/tensorflow/python/framework/op_def_library.py#L319-L327) deprecated op versions. 
[Graph version](https://github.com/tensorflow/tensorflow/blob/41228d7f14496ff661e7c22361a987b0255cf812/tensorflow/python/framework/ops.py#L2946) is already maintained in C++ so this should be fine. + 1. [Type checking](https://github.com/tensorflow/tensorflow/blob/41228d7f14496ff661e7c22361a987b0255cf812/tensorflow/python/framework/op_def_library.py#L53). + 1. There is a lot of logic for building attrs in python. For this we could possibly re-use the existing implementation in the pywrap C++ eager layer ([1](https://github.com/tensorflow/tensorflow/blob/41228d7f14496ff661e7c22361a987b0255cf812/tensorflow/python/eager/pywrap_tfe_src.cc#L755), [2](https://github.com/tensorflow/tensorflow/blob/41228d7f14496ff661e7c22361a987b0255cf812/tensorflow/python/eager/pywrap_tfe_src.cc#L850)) + 1. apply_op_helper calls [convert_to_tensor](https://github.com/tensorflow/tensorflow/blob/6d7926bb87c1a91ffd110aa3407c003b2ae54009/tensorflow/python/framework/op_def_library.py#L421) to convert python scalars to Tensors. This will happen in python for now and may move to a python specific C++ layer in the future. +1. We need some form of context management to handle a variety of context managers we have in Graph e.g. [control dependencies](https://github.com/tensorflow/tensorflow/blob/6d7926bb87c1a91ffd110aa3407c003b2ae54009/tensorflow/python/framework/ops.py#L4345), control flow contexts (for XlaControlFlowContext), [colocate_with](https://github.com/tensorflow/tensorflow/blob/23275fb35cf17482d147f88ce7d8f4ce9c2376f3/tensorflow/python/framework/ops.py#L4115), [name_scope](https://github.com/tensorflow/tensorflow/blob/6d7926bb87c1a91ffd110aa3407c003b2ae54009/tensorflow/python/framework/ops.py#L3918), [_attr_scope_map](https://github.com/tensorflow/tensorflow/blob/6d7926bb87c1a91ffd110aa3407c003b2ae54009/tensorflow/python/framework/ops.py#L4587), [_kernel_label_map](https://github.com/tensorflow/tensorflow/blob/6d7926bb87c1a91ffd110aa3407c003b2ae54009/tensorflow/python/framework/ops.py#L4653) etc. We will look into whether this can be implemented using a generic op callback mechanism. The same mechanism can be used for implementing op callbacks as well. +1. We will perform a case-by-case analysis of APIs of `Graph` to decide which of those should be supported in `_FuncGraphBase`. + 1. Certain APIs related to [feeding and fetching](https://github.com/tensorflow/tensorflow/blob/6d7926bb87c1a91ffd110aa3407c003b2ae54009/tensorflow/python/framework/ops.py#L4788-L4805) probably don’t make sense for FuncGraph. + 1. APIs for fetching Operations and Tensors: These APIs rely on a [dict of Operations](https://github.com/tensorflow/tensorflow/blob/6d7926bb87c1a91ffd110aa3407c003b2ae54009/tensorflow/python/framework/ops.py#L2721) maintained in Graph. Currently this dict is built _actively_ as operations are created in the graph. We could choose to populate this cache lazily as well. + 1. In each Graph we maintain a [dict](https://github.com/tensorflow/tensorflow/blob/6d7926bb87c1a91ffd110aa3407c003b2ae54009/tensorflow/python/framework/ops.py#L2757) of EagerDefinedFunction/DefinedFunction used in the graph directly or in a sub-function. In nested functions we probably spend quadratic time in [copying](https://github.com/tensorflow/tensorflow/blob/23275fb35cf17482d147f88ce7d8f4ce9c2376f3/tensorflow/python/eager/function.py#L488-L497) the inner functions all the way to the eager context and use quadratic (in the number of functions) memory. 
Storing `_EagerDefinedFunction` references in the global graph has been a common source of memory leaks which @kkimdev has been valiantly fighting with. I think we should try to register functions directly in the global eager context. We can just keep weakrefs to the _EagerDefinedFunction so that we don’t interfere with memory management. @kkimdev pointed out that we still need to maintain some reference to the list of functions used inside a ConcreteFunction so that we can add those to the [SavedModel](https://github.com/tensorflow/tensorflow/blob/23275fb35cf17482d147f88ce7d8f4ce9c2376f3/tensorflow/python/saved_model/save.py#L593). + +Some implementation notes: + +1. Need to add RefCounting for eager tensor handles. + 1. If a graph captures an EagerTensor, the code creating the EagerTensor should not delete it. + 1. How do you write the gradient function of add, which just wants to forward the output gradient to the two inputs + + +### Analysing some FuncGraph CUJs + + +**tf.function** + +When building the gradient (Stateful)PartitionedCall op, a captured tensor in the forward graph needs to be resolved to a forward call op’s output. This will still be possible to do in python. + +**tf.cond/tf.switch_case** + +Similar to tf.function, during gradient computation forward graph intermediates need to be mapped to forward op’s outputs. This currently updates the FuncGraph.captures map which can be done using `TF_EagerContextUpdateCaptureForPlaceholder`. Note however that tf.function does not actually update FuncGraph.captures and simply uses the new captures for building the gradient op. We may be able to avoid calling the API to update captures here if we do the same. Not sure if there any behavior relying on this though. Higher order derivatives using tf.gradients maybe? + +**tf.while_loop** + +tf.while_loop intercepts the default capturing mechanism of FuncGraph with custom behavior. In tf.while_loop, when a forward pass tensor needs to be captured we have to add an [accumulator](https://github.com/tensorflow/tensorflow/blob/6d7926bb87c1a91ffd110aa3407c003b2ae54009/tensorflow/python/ops/while_v2.py#L1012) and then capture the output of the While op corresponding to that accumulator. + +To support this we will provide a `TF_EagerContext{Push|Pop}CaptureCallback` API which will register a callback function to perform the logic in [_WhileBodyGradFuncGraph._capture_helper](https://github.com/tensorflow/tensorflow/blob/6d7926bb87c1a91ffd110aa3407c003b2ae54009/tensorflow/python/ops/while_v2.py#L933). + +We could leverage this to unify the gradient graph captures resolving behavior of `tf.function`/`tf.cond`/`tf.while_loop` all of which have their own recipes right now. + +### Automatic Control Dependencies + +Automatic control dependencies (ACD) will move to C++ as well. However instead of being post-hoc it will now be performed _during_ graph building. The current design has certain limitations e.g. control dependencies across function boundaries are performed at the function level which is prohibitive for performance. There are ongoing discussions on ways to improve this. Other issues have come up in `tf.data` and `tf.distribute` for example because ACD only tracks direct dependencies. Ideally we should use this opportunity to address these shortcomings. However the details of this redesign are left to another doc to avoid diluting this doc. + +### Open questions + +1. 
Keras seems to be using [non-public APIs](https://github.com/tensorflow/tensorflow/blob/6d7926bb87c1a91ffd110aa3407c003b2ae54009/tensorflow/python/keras/engine/base_layer.py#L2511) for directly building NodeDef and adding that to the graph. This is necessary for supporting Keras's Functional API (Model.add_loss, Model.add_metric, and auto-Lambda layers). We need to figure out if/how to support that. There are ongoing efforts to use just the public API of TF in tf.keras but the timelines for that are unclear. + + +## Appendix + +**Definition: Capturing** + +Capturing is the process used to allow users to write functions which can reference tensors that are not directly passed as function inputs or are not passed as loop_vars in a call to tf.while_loop. In FuncGraph, when an external tensor is captured we create a [placeholder](https://github.com/tensorflow/tensorflow/blob/23275fb35cf17482d147f88ce7d8f4ce9c2376f3/tensorflow/python/framework/func_graph.py#L649) just like any other input and add that placeholder to the list of [FuncGraph.inputs](https://github.com/tensorflow/tensorflow/blob/23275fb35cf17482d147f88ce7d8f4ce9c2376f3/tensorflow/python/framework/func_graph.py#L672) and store the mapping from the external tensor to the placeholder in [FuncGraph._captures](https://github.com/tensorflow/tensorflow/blob/23275fb35cf17482d147f88ce7d8f4ce9c2376f3/tensorflow/python/framework/func_graph.py#L671).Capturing is triggered in `_create_op_internal` which is overridden in FuncGraph. + diff --git a/rfcs/20191203-single-eager-graph-path/20191203-func-graph-cujs.md b/rfcs/20191203-single-eager-graph-path/20191203-func-graph-cujs.md new file mode 100644 index 000000000..4586bdef9 --- /dev/null +++ b/rfcs/20191203-single-eager-graph-path/20191203-func-graph-cujs.md @@ -0,0 +1,112 @@ +# CUJs for FuncGraph + +| **Author** | Saurabh Saxena (srbs@google.com) | +:-------------- |:---------------------------------------------------- | +| **Updated** | 2019-12-03 | + + +### tf.function + + +### **Forward** + + + +1. An empty FuncGraph is created. +1. [Placeholders](https://github.com/tensorflow/tensorflow/blob/6a70aa6d438259cabd23c09808db4cf51a2e5377/tensorflow/python/framework/func_graph.py#L1205) are created in it corresponding to the input_signature. Note the signature can contain CompositeTensors which are flattened. The input structure is maintained in [structured_input_signature](https://github.com/tensorflow/tensorflow/blob/6a70aa6d438259cabd23c09808db4cf51a2e5377/tensorflow/python/framework/func_graph.py#L906). + 1. We seem to be [always](https://github.com/tensorflow/tensorflow/blob/6a70aa6d438259cabd23c09808db4cf51a2e5377/tensorflow/python/framework/func_graph.py#L1237) capturing variables even though they are unused. Can that be avoided? +1. The python_func is called with the above input placeholders as args. This can trigger creation of new placeholders by capturing. The captured tensors can be symbolic tensors from outer graphs or eager tensors. +1. FuncGraph.structured_outputs is populated with the structured tensors(containing CompositeTensors, IndexedSlices etc.). FuncGraph.outputs is built by flattening the structure and CompositeTensors in structured_outputs and by removing any Nones. + 1. We call [capture](https://github.com/tensorflow/tensorflow/blob/6a70aa6d438259cabd23c09808db4cf51a2e5377/tensorflow/python/framework/func_graph.py#L1015) on the tensors in the list of outputs to handle the case when the function is simply returning an external tensor. Solutions: + 1. 
We could replace this with creating an Identity node in the forward graph which would implicitly capture the external tensor. However, these Identity nodes are not necessary and might cause performance problems later. + 1. Can we avoid doing the capturing in func_graph_from_py_func? Idea: We keep Nones in the list of structured_outputs and not in the list of outputs. We could do the same for external outputs. These can get repackaged just like we [repackage](https://github.com/tensorflow/tensorflow/blob/6a70aa6d438259cabd23c09808db4cf51a2e5377/tensorflow/python/eager/function.py#L1911-L1913) Nones. + +**Backward** + + + +1. An empty FuncGraph is created. +1. input_signature is [constructed](https://github.com/tensorflow/tensorflow/blob/6a70aa6d438259cabd23c09808db4cf51a2e5377/tensorflow/python/eager/function.py#L644) from the incoming grads and [placeholders](https://github.com/tensorflow/tensorflow/blob/6a70aa6d438259cabd23c09808db4cf51a2e5377/tensorflow/python/framework/func_graph.py#L1205) are created in it corresponding to the input_signature. +1. The gradient [function](https://github.com/tensorflow/tensorflow/blob/6a70aa6d438259cabd23c09808db4cf51a2e5377/tensorflow/python/eager/function.py#L649) is called in this FuncGraph. This triggers capturing of intermediate tensors in the forward FuncGraph or one of its outer graphs in case custom_gradients are involved. Note that we already created placeholders for incoming grads so those are not captured. When building the gradient PartitionedCall op, this external capture will be replaced with a Placeholder in the current graph if the capture is not already in the current graph. The external capture is now a capture in the current graph (graph containing the gradient PartitionedCall). There are a few cases in the resolution: + 1. The external tensor is in one of the outer graphs of the current graph. In this case nothing needs to be done. + 1. The external tensor is not in the current hierarchy. + 1. If it is in the forward graph it gets [added](https://github.com/tensorflow/tensorflow/blob/6a70aa6d438259cabd23c09808db4cf51a2e5377/tensorflow/python/eager/function.py#L688) to the list of outputs and the forward op is [updated](https://github.com/tensorflow/tensorflow/blob/6a70aa6d438259cabd23c09808db4cf51a2e5377/tensorflow/python/eager/function.py#L715) with new outputs and this tensor is [resolved](https://github.com/tensorflow/tensorflow/blob/6a70aa6d438259cabd23c09808db4cf51a2e5377/tensorflow/python/eager/function.py#L723-L728) to an op output. + 1. If it is in an outer graph of the forward graph, nothing needs to be done (yet). + 1. If it is in an inner graph of the forward graph, an error is raised (this should never happen). +1. FuncGraph.structured_outputs is populated with the structured tensors(containing CompositeTensors, IndexedSlices etc.). FuncGraph.outputs is built by flattening the structure and CompositeTensors in structured_outputs and by removing any Nones. + + +### tf.cond/tf.switch_case + +**Forward** + + + +1. Build graphs for branch functions. +1. Find the superset of input tensors needed by all branch functions and update signatures of all branch functions so that they [match](https://github.com/tensorflow/tensorflow/blob/f540109342f8b7cb9b96163dae455013249c3128/tensorflow/python/ops/cond_v2.py#L494) by creating dummy placeholders. This requires resetting FuncGraph.inputs and FuncGraph.captures. + 1. 
Supporting this would require either ResetInputs, ResetCaptures APIs or adding new If/Case ops that don’t need this signature matching (b/143286622). + 1. Another option is to not support resetting inputs and captures at all and let the consumers take care of this when generating the FunctionDef. However this would mean that the FunctionDef would not match the FuncGraph which may cause problems in [gradient computation](https://github.com/tensorflow/tensorflow/blob/f540109342f8b7cb9b96163dae455013249c3128/tensorflow/python/ops/cond_v2.py#L109) which use the forward cached FuncGraph and expects the forward op’s FunctionDef to be generated 1-1 from the forward FuncGraph. + +**Backward** + + + +1. Build the grad func for each branch using tf.gradients. +1. Similar to forward pass, add dummy inputs to make input signatures match. +1. Any needed intermediates in the forward graph are wrapped in Optionals and are added to the list of forward graph [outputs](https://github.com/tensorflow/tensorflow/blob/f540109342f8b7cb9b96163dae455013249c3128/tensorflow/python/ops/cond_v2.py#L151-L152). +1. Similar to tf.function, we resolve any external captures to the forward op’s outputs. + + +### tf.while_loop + +**Forward** + + + +1. Build the [cond](https://github.com/tensorflow/tensorflow/blob/c29529aa7d55bc66b040917a36acdb5722231043/tensorflow/python/ops/while_v2.py#L141) FuncGraph using a signature built from the input loop vars. Cond function can capture external tensors which show up in cond_graph.external_captures. +1. Build the [body](https://github.com/tensorflow/tensorflow/blob/c29529aa7d55bc66b040917a36acdb5722231043/tensorflow/python/ops/while_v2.py#L186) FuncGraph using the same signature as the cond. However in the body function [capture](https://github.com/tensorflow/tensorflow/blob/c29529aa7d55bc66b040917a36acdb5722231043/tensorflow/python/ops/while_v2.py#L162-L165) the external captures of cond first. At this point the full signature, i.e. original input signature with loop vars + captures, matches in cond and body. + 1. The explicit capture is needed here to make the signatures of cond and body to match. This can be avoided if we allow signatures of cond and body to diverge. +1. Now body_graph has some extra external captures. These are captured in the [cond_graph](https://github.com/tensorflow/tensorflow/blob/c29529aa7d55bc66b040917a36acdb5722231043/tensorflow/python/ops/while_v2.py#L206-L213). So in effect the external captures of body cond_graph and body_graph are effectively cond-graph-captures + body-graph-captures. + +**Backward** + + + +1. Build the gradient graph for the forward graph just like for other functional ops. +1. Since a while loop can run for multiple iterations, if the backwards pass needs to capture a forward tensor there are two cases: + 1. If the tensor’s value potentially varies across iterations, in the forward graph the tensor is [accumulated](https://github.com/tensorflow/tensorflow/blob/c29529aa7d55bc66b040917a36acdb5722231043/tensorflow/python/ops/while_v2.py#L1012) in a TensorList (think: stack). Note: now the forward op has an extra input, the empty stack, and an extra output which contains the list of values of the tensor in multiple iterations. The forward graph stack is captured in the backward graph and a value is popped from it to use as the intermediate value for that tensor. + 1. 
If the tensor’s value is invariant across loop iterations, we directly [capture](https://github.com/tensorflow/tensorflow/blob/c29529aa7d55bc66b040917a36acdb5722231043/tensorflow/python/ops/while_v2.py#L978) the forward tensor in the backward graph. + + +### Autograph + +FuncGraph is used as a temporary graph to evaluate the type of a while loop’s conditional expression. See [while_stmt](https://github.com/tensorflow/tensorflow/blob/6a70aa6d438259cabd23c09808db4cf51a2e5377/tensorflow/python/autograph/operators/control_flow.py#L739). Created ops, if any, are discarded immediately - we only need to test whether the expression evaluates to a Tensor or not, and if a tf.while_loop is created, they will be created again by the while_loop itself. + +This might not require a FuncGraph - any regular graph is suitable for this purpose. + + +### Serialization/SavedModel + +Serialization + + + +1. The Trackable object graph is crawled to find all functions. An error is raised if trying to save an unsaveable FuncGraph. + 1. FuncGraph has a `_saveable` property which is used to denote whether a FuncGraph can be saved to a SavedModel. This seems to have only [one usage](https://github.com/tensorflow/tensorflow/blob/99f0e90b384cfb255103a8965bec0d11a7995e49/tensorflow/python/keras/backend.py#L311) right now in Keras to mark functions that capture the symbolic learning phase to be unsaveable. +1. For every ConcreteFunction + 1. Its captured non-resource non-variant tensors are [converted](https://github.com/tensorflow/tensorflow/blob/23275fb35cf17482d147f88ce7d8f4ce9c2376f3/tensorflow/python/saved_model/save.py#L280-L298) into graph constants. + 1. The graph is converted to a [FunctionDef](https://github.com/tensorflow/tensorflow/blob/23275fb35cf17482d147f88ce7d8f4ce9c2376f3/tensorflow/python/saved_model/save.py#L593) and is written to the MetaGraphDef graph’s function library. + 1. An [entry](https://github.com/tensorflow/tensorflow/blob/99f0e90b384cfb255103a8965bec0d11a7995e49/tensorflow/core/protobuf/saved_object_graph.proto#L32) is added to the object graph proto which stores the node ids of the captured inputs in the object graph and the input/output structures. +1. To enable loading the SavedModel with Sessions, placeholders are [created](https://github.com/tensorflow/tensorflow/blob/23275fb35cf17482d147f88ce7d8f4ce9c2376f3/tensorflow/python/saved_model/save.py#L341) in the graph for non-captured inputs. Then a (Stateful)PartitionedCall op is created in the graph, by feeding the placeholders + constants as inputs to the call op. A SignatureDef is then created for the call op and added to the MetaGraphDef. + 1. This requires access to FuncGraph.inputs, captures and external_captures and assumes that placeholders for captures are at the rear of FuncGraph.inputs. + +Deserialization + + + +1. Concrete functions are [created](https://github.com/tensorflow/tensorflow/blob/99f0e90b384cfb255103a8965bec0d11a7995e49/tensorflow/python/saved_model/load.py#L113-L115) for all graph library functions. + 1. This probably instantiates ConcreteFunctions for non-top-level functions as well. Is that necessary? +1. The captures map is initialized by using the [bound_inputs](https://github.com/tensorflow/tensorflow/blob/99f0e90b384cfb255103a8965bec0d11a7995e49/tensorflow/core/protobuf/saved_object_graph.proto#L107) field of the SavedConcreteFunction proto. + 1. 
This makes a call to [replace_capture](https://github.com/tensorflow/tensorflow/blob/99f0e90b384cfb255103a8965bec0d11a7995e49/tensorflow/python/saved_model/load.py#L184) and then a separate call to [capture](https://github.com/tensorflow/tensorflow/blob/99f0e90b384cfb255103a8965bec0d11a7995e49/tensorflow/python/saved_model/load.py#L200). This is done because we already have the internal placeholders created and we just need to update the captures map. The call to FuncGraph.capture records the capture on the tape. +1. Input/output structures are [initialized](https://github.com/tensorflow/tensorflow/blob/99f0e90b384cfb255103a8965bec0d11a7995e49/tensorflow/python/saved_model/load.py#L155-L157). + 1. Seems like structured_outputs only contains the structure but not really the tensors e.g. in the original FuncGraph.structured_outputs. From 60cf538c4f8138b0f9685b3d1d9584478a7d7b6d Mon Sep 17 00:00:00 2001 From: Saurabh Saxena Date: Wed, 4 Dec 2019 18:44:27 -0500 Subject: [PATCH 4/8] Correctly format diffed code. --- rfcs/20191203-single-eager-graph-path.md | 34 ++++++++++++------------ 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/rfcs/20191203-single-eager-graph-path.md b/rfcs/20191203-single-eager-graph-path.md index bec009b4f..1cbe451f1 100644 --- a/rfcs/20191203-single-eager-graph-path.md +++ b/rfcs/20191203-single-eager-graph-path.md @@ -48,11 +48,11 @@ References: Basically we want to get rid of the graph-building part in gen_*_ops.py and get rid of gradient tape bookkeeping in both graph and eager modes. For example: -``` +```diff def batch_matrix_band_part(input, num_lower, num_upper, name=None): _ctx = _context._context or _context.context() tld = _ctx._thread_local_data - ~~if tld.is_eager:~~ +- if tld.is_eager: try: _result = _pywrap_tensorflow.TFE_Py_FastPathExecute( _ctx._context_handle, tld.device_name, "BatchMatrixBandPart", name, @@ -66,18 +66,18 @@ def batch_matrix_band_part(input, num_lower, num_upper, name=None): pass # Add nodes to the TensorFlow graph. except _core._NotOkStatusException as e: _ops.raise_from_not_ok_status(e, name) - ~~# Add nodes to the TensorFlow graph. - _, _, _op, _outputs = _op_def_library._apply_op_helper( - "BatchMatrixBandPart", input=input, num_lower=num_lower, - num_upper=num_upper, name=name) - _result = _outputs[:] - if _execute.must_record_gradient(): - _attrs = ("T", _op._get_attr_type("T")) - _inputs_flat = _op.inputs - _execute.record_gradient( - "BatchMatrixBandPart", _inputs_flat, _attrs, _result) - _result, = _result - return _result~~ +- # Add nodes to the TensorFlow graph. 
+- _, _, _op, _outputs = _op_def_library._apply_op_helper( +- "BatchMatrixBandPart", input=input, num_lower=num_lower, +- num_upper=num_upper, name=name) +- _result = _outputs[:] +- if _execute.must_record_gradient(): +- _attrs = ("T", _op._get_attr_type("T")) +- _inputs_flat = _op.inputs +- _execute.record_gradient( +- "BatchMatrixBandPart", _inputs_flat, _attrs, _result) +- _result, = _result +- return _result~~ def batch_matrix_band_part_eager_fallback(input, num_lower, num_upper, name, ctx): _attr_T, (input,) = _execute.args_to_matching_eager([input], ctx) @@ -87,9 +87,9 @@ def batch_matrix_band_part_eager_fallback(input, num_lower, num_upper, name, ctx _attrs = ("T", _attr_T) _result = _execute.execute(b"BatchMatrixBandPart", 1, inputs=_inputs_flat, attrs=_attrs, ctx=ctx, name=name) - ~~if _execute.must_record_gradient(): - _execute.record_gradient( - "BatchMatrixBandPart", _inputs_flat, _attrs, _result)~~ +- if _execute.must_record_gradient(): +- _execute.record_gradient( +- "BatchMatrixBandPart", _inputs_flat, _attrs, _result) _result, = _result return _result ``` From 78d2a13462f719e67508cbde9c4d43d5ba506d0e Mon Sep 17 00:00:00 2001 From: Saurabh Saxena Date: Wed, 4 Dec 2019 18:46:36 -0500 Subject: [PATCH 5/8] Remove haning tildas. --- rfcs/20191203-single-eager-graph-path.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/20191203-single-eager-graph-path.md b/rfcs/20191203-single-eager-graph-path.md index 1cbe451f1..b397208f2 100644 --- a/rfcs/20191203-single-eager-graph-path.md +++ b/rfcs/20191203-single-eager-graph-path.md @@ -77,7 +77,7 @@ def batch_matrix_band_part(input, num_lower, num_upper, name=None): - _execute.record_gradient( - "BatchMatrixBandPart", _inputs_flat, _attrs, _result) - _result, = _result -- return _result~~ +- return _result def batch_matrix_band_part_eager_fallback(input, num_lower, num_upper, name, ctx): _attr_T, (input,) = _execute.args_to_matching_eager([input], ctx) From ee02db84fc90ac0060984b985f62d3bc45f32956 Mon Sep 17 00:00:00 2001 From: Saurabh Saxena Date: Thu, 9 Jan 2020 18:48:13 -0500 Subject: [PATCH 6/8] Minor API updates. --- rfcs/20191203-single-eager-graph-path.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/rfcs/20191203-single-eager-graph-path.md b/rfcs/20191203-single-eager-graph-path.md index b397208f2..fef421a03 100644 --- a/rfcs/20191203-single-eager-graph-path.md +++ b/rfcs/20191203-single-eager-graph-path.md @@ -125,6 +125,8 @@ void TF_DeleteEagerContext(TF_EagerContext* c); int TF_EagerContextIsExecutingEagerly(TF_EagerContext* c); void TF_EagerContextEnterGraph(TF_EagerContext* c, TF_Graph* g); void TF_EagerContextExitGraph(TF_EagerContext* c, TF_Graph* g, TF_Status* s); +// Cleans up captures and other graph metadata in the eager context. +void TF_EagerContextDeleteGraph(TF_EagerContext* c, TF_Graph* g, TF_Status* s); // A TF_TensorHandle is a union type of TFE_TensorHandle (eager tensor) and // TF_Output (graph tensor). @@ -181,8 +183,8 @@ TF_TensorHandle* TF_OutputListOutput(TF_OutputList* o, int i); // A TF_AbstractOp is the metadata we need to execute an operation in either // eager or graph mode. 
typedef struct TF_AbstractOp TF_AbstractOp; -TF_AbstractOp* TF_NewAbstractOp(TF_EagerContext* c, const char* const op_name, - TF_Status* s); +TF_AbstractOp* TF_NewAbstractOp(TF_EagerContext* c, const char* const op_type, + const char* const op_name, TF_Status* s); void TF_DeleteAbstractOp(TF_AbstractOp* op); // TODO: we need a way of specifying attrs From ca37121b46b785a5f64a0e735a5c0793c4ca942b Mon Sep 17 00:00:00 2001 From: Saurabh Saxena Date: Mon, 13 Jan 2020 15:42:57 -0500 Subject: [PATCH 7/8] Update RFC # --- rfcs/20191203-single-eager-graph-path.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/20191203-single-eager-graph-path.md b/rfcs/20191203-single-eager-graph-path.md index fef421a03..42db1684e 100644 --- a/rfcs/20191203-single-eager-graph-path.md +++ b/rfcs/20191203-single-eager-graph-path.md @@ -2,7 +2,7 @@ | Status | Proposed | :-------------- |:---------------------------------------------------- | -| **RFC #** | | +| **RFC #** | 184 | | **Author** | Saurabh Saxena (srbs@google.com) | | **Sponsors** | Alex Passos, Gaurav Jain | | **Updated** | 2019-12-03 | From 1f28c31d44ae3d086f276c44d91b374cb1db3e2e Mon Sep 17 00:00:00 2001 From: Saurabh Saxena Date: Mon, 13 Jan 2020 16:12:11 -0500 Subject: [PATCH 8/8] Resolve keras open question and mark RFC as accepted --- rfcs/20191203-single-eager-graph-path.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/rfcs/20191203-single-eager-graph-path.md b/rfcs/20191203-single-eager-graph-path.md index 42db1684e..f8dce0512 100644 --- a/rfcs/20191203-single-eager-graph-path.md +++ b/rfcs/20191203-single-eager-graph-path.md @@ -1,8 +1,8 @@ # Single python code path for eager and graph -| Status | Proposed | +| Status | Accepted | :-------------- |:---------------------------------------------------- | -| **RFC #** | 184 | +| **RFC #** | [184](https://github.com/tensorflow/community/pull/184) | | **Author** | Saurabh Saxena (srbs@google.com) | | **Sponsors** | Alex Passos, Gaurav Jain | | **Updated** | 2019-12-03 | @@ -317,7 +317,8 @@ Automatic control dependencies (ACD) will move to C++ as well. However instead o ### Open questions -1. Keras seems to be using [non-public APIs](https://github.com/tensorflow/tensorflow/blob/6d7926bb87c1a91ffd110aa3407c003b2ae54009/tensorflow/python/keras/engine/base_layer.py#L2511) for directly building NodeDef and adding that to the graph. This is necessary for supporting Keras's Functional API (Model.add_loss, Model.add_metric, and auto-Lambda layers). We need to figure out if/how to support that. There are ongoing efforts to use just the public API of TF in tf.keras but the timelines for that are unclear. +1. Keras seems to be using [non-public APIs](https://github.com/tensorflow/tensorflow/blob/6d7926bb87c1a91ffd110aa3407c003b2ae54009/tensorflow/python/keras/engine/base_layer.py#L2511) for directly building NodeDef and adding that to the graph. This is necessary for supporting Keras's Functional API (Model.add_loss, Model.add_metric, and auto-Lambda layers). We need to figure out if/how to support that. There are ongoing efforts to use just the public API of TF in tf.keras but the timelines for that are unclear. + 1. In the design review it was concluded that we should either be able to change Keras to use public python APIs or replace the internal python API calls with C API calls. ## Appendix
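
**Illustration: custom capturing through the proposed CaptureCallback hook**

The tf.while_loop discussion above relies on intercepting FuncGraph's default capturing. Below is a minimal sketch of how that interception might look with the `CaptureCallback` hook from the Detailed Design section. `AddForwardAccumulator`, `PopFromAccumulator` and the surrounding `BuildWhileGradBody` wiring are hypothetical placeholders standing in for the accumulator logic of `_WhileBodyGradFuncGraph._capture_helper`; only the `CaptureCallback`, `TF_EagerContext{Enter,Exit}Graph` and `TF_EagerContext{Push,Pop}CaptureCallback` declarations come from this RFC.

```
// Sketch only: route gradient-graph captures of forward intermediates through
// an accumulator instead of plain placeholder creation.
static void WhileGradCaptureCallback(TF_EagerContext* c,
                                     TF_Graph* source_graph,
                                     TF_TensorHandle* source,
                                     TF_TensorHandle** captured,
                                     TF_Status* s) {
  // Hypothetical helper: add a TensorList accumulator for `source` in the
  // forward graph and expose it as a new output of the forward While op.
  TF_TensorHandle* accumulator = AddForwardAccumulator(source_graph, source, s);
  // Hypothetical helper: capture the accumulator into the gradient graph and
  // pop the value for the current iteration; the popped value stands in for
  // the requested `source` tensor.
  *captured = PopFromAccumulator(c, accumulator, s);
}

void BuildWhileGradBody(TF_EagerContext* ctx, TF_Graph* grad_body_graph,
                        TF_Status* status) {
  CaptureCallback cb = &WhileGradCaptureCallback;
  TF_EagerContextEnterGraph(ctx, grad_body_graph);
  TF_EagerContextPushCaptureCallback(ctx, &cb, grad_body_graph, status);
  // ... build the gradient body ops here; any capture of a forward-graph
  // tensor now goes through WhileGradCaptureCallback ...
  TF_EagerContextPopCaptureCallback(ctx, grad_body_graph, status);
  TF_EagerContextExitGraph(ctx, grad_body_graph, status);
}
```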