
RFC: New tf.print #14

Merged · 5 commits · Feb 21, 2019
Conversation

@tomerk (Contributor) commented Aug 24, 2018

Review period open until 2018-09-10

New tf.print

Status: Proposed
Author(s): Tomer Kaftan (Google)
Sponsor: Asim Shankar (Google)
Updated: 2018-08-24

A proposal for a new tf.print operator, as part of TF 2.0 development.

Summary

This doc proposes tf.print, a new TensorFlow printing approach that closely
mirrors the standard Python print API whether or not code is executing
eagerly. It also provides long-requested functionality for both eager and
session-based execution, such as more meaningful tensor summarization,
support for printing nested data structures that contain tensors, and
controllable logging levels.
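
For orientation, a minimal usage sketch of the proposed API, mirroring the examples discussed later in this thread:

```python
import sys
import tensorflow as tf

tensor = tf.range(10)
# Prints python primitives, tensors, and nested structures alike;
# in graph mode the returned op must still be run or depended upon.
print_op = tf.print("tensors:", tensor, {2: tensor * 2},
                    output_stream=sys.stdout)
```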

@googlebot commented: CLAs look good, thanks!

@ewilderj added the RFC: Proposed (RFC Design Document) and 2.0 (TensorFlow 2.0 development) labels Aug 27, 2018
@ewilderj ewilderj added this to Open reviews in RFC management Aug 27, 2018
output stream or logging level. The inputs may be dense or sparse Tensors,
primitive python objects, data structures that contain Tensors, and printable
python objects. Printed tensors will recursively show the first and last
`summarize` elements of each dimension.
Member commented:

Is there any way to print the whole elements of a tensor, say, summarize=None or -1 ?

@tomerk (Contributor Author) replied:

We can go ahead and add that functionality.
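
For illustration, a sketch of what that could look like, with summarize=-1 as the hypothetical sentinel under discussion (not yet part of the proposal):

```python
import tensorflow as tf

tensor = tf.range(100)
# Hypothetical: a negative (or None) `summarize` requests printing every
# element instead of the first/last `summarize` of each dimension.
print_op = tf.print(tensor, summarize=-1)
```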

tensor = tf.range(10)
print_op = tf.print("tensors:", tensor, {2: tensor * 2},
                    output_stream=sys.stdout)
with tf.control_dependencies([print_op]):
    doubled_tensor = tensor * 2
Comment:

This doesn't seem much better than tf.Print is right now, from a graph perspective. Don't the problems listed in the overview continue to exist? Except now, instead of just an Identity, I have to add control deps.

@tomerk (Contributor Author) replied on Aug 30, 2018:

From a session-based graph building perspective those problems do continue to exist. My understanding is this is a lower priority to solve if we're moving to an eager & graph function-first world.

We could add support for tf.print as a context manager in session-based graph building mode that automatically adds the control dependency, so users could say:

with tf.print(...):
    ...
    out = ...
session.run(out)

Instead of having to explicitly add it with tf.control_dependencies.

Alternatively, it might be better to suggest that users add the return value of tf.print to their session.run code rather than add control_dependencies, if that's a better user experience. That would be harder w/ control flow and functions in graph mode though.

We could even try implicitly adding tf.prints to a graph collection that automatically gets executed at session.run. I'm more hesitant to do that, because it would require a lot of care with control flow & functions, and to work nicely with graph pruning.
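
A sketch of the session.run alternative, reusing the names from the excerpt above:

```python
import sys
import tensorflow as tf

tensor = tf.range(10)
print_op = tf.print("tensors:", tensor, output_stream=sys.stdout)
doubled_tensor = tensor * 2
with tf.Session() as sess:
    # Run the print op alongside the real fetches, instead of wiring it
    # in as a control dependency.
    sess.run([print_op, doubled_tensor])
```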

Comment:

FWIW, I agree with @karmel and feel that the new proposal is a worse experience for session-based graph users. If solving this problem is a low priority for the team, please reconsider deprecating tf.Print until a better solution is available.

print_op = tf.print("tensors:", tensor, {2: tensor * 2},
                    output_stream=sys.stdout)
with tf.control_dependencies([print_op]):
    doubled_tensor = tensor * 2
Comment:

nit: it's potentially slightly confusing to use tensor * 2 both as the tensor being printed and here, since it becomes less clear which doubled tensor is being printed below.

@tomerk (Contributor Author) replied:

Ah that's true, I can update this example.


Note: This op is only partially compatible with Jupyter notebooks and colabs.
Because it prints to the C++ standard out / standard error, this will go
in the notebook kernel's console output, not in the notebook cell output.
Comment:

This seems like a fairly big caveat. Isn't one of the primary use-cases running in a notebook or similar, when you're debugging some graph/op?

@tomerk (Contributor Author) replied:

Yeah, I think so too. There's unfortunately no clean solution because python notebooks don't support this very well. As I mention lower down in the doc:

*   Python notebooks require complex capturing logic to send C++ output to
    notebook cells, and the solutions are often not totally portable across
    operating systems / C++ runtimes.
*   Mixing C++ output and python print output requires regular, careful
    flushing to prevent things happening in strange orders. This definitely
    has portability issues.
*   If we want to do something about this, a possible approach could be to
    have the PrintV2 kernel execute a python print if we detect a
    jupyter/colab notebook environment and the current device has a
    python runtime, rather than using the C++ logging/printing methods.
*   Alternatively, we could provide utilities to capture the C++
    stdout/stderr in Jupyter notebook outputs as part of TensorFlow.
*   With either approach, we would have to be very careful w/ device
    placement in distributed multi-machine/multi-device settings to ensure
    that the print device is the notebook kernel CPU.

There is some internal tooling for colabs that captures stderr and prints it to the cell output, which we could look into releasing. But it's not perfect: it only flushes/prints the log output once the cell finishes executing. This can be painful w/ long-running execution, so we'd need to add a regular poll for flushing.
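
To make the capture idea concrete, a rough POSIX-only sketch (not part of the proposal) of redirecting the process-level stderr file descriptor, which is where the C++ output lands:

```python
import os
import sys
import tempfile

def capture_cpp_stderr(fn):
    """POSIX-only sketch: capture C++-level stderr (fd 2) while fn runs,
    then replay it on the Python-level stream a notebook cell can show."""
    saved_fd = os.dup(2)
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 2)
        try:
            fn()  # e.g. run ops that execute PrintV2
        finally:
            os.dup2(saved_fd, 2)  # restore the real stderr
            os.close(saved_fd)
        tmp.seek(0)
        # The flushing caveats discussed above still apply: C++ buffers
        # may hold output until the underlying stream is flushed.
        sys.stderr.write(tmp.read().decode())
```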

printed output will be separated by spaces. Inputs may be python
primitives, tensors, data structures such as dicts and lists that
may contain tensors (with the data structures possibly nested in
arbitrary ways), and printable python objects.
Comment:

Does this include data structures with mixed types? I.e., [python int, tensor, python object, python list of tensors, ...]

@tomerk (Contributor Author) replied:

Yes, those print fine as well. I can add some text about that to the description.
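
For example, a sketch of the intended behavior with a mixed, nested structure:

```python
import tensorflow as tf

tensor = tf.range(3)
# Mixed python int, tensor, string, and a nested list/dict of tensors:
tf.print(1, tensor, "some object", [tensor, {"doubled": tensor * 2}])
```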


output_stream: The output stream or logging level to print to. Defaults to
sys.stderr, but sys.stdout, tf.logging.info, tf.logging.warning,
tf.logging.error, and tf.logging.fatal are also supported.
Comment:

In the case of colabs + similar, where do tf.log messages end up going?

@tomerk (Contributor Author) replied:

The tf.log messages should end up going to stderr, with extra logging context & the log filtering levels applied. If the device is set to the colab kernel, this should end up in the kernel console stderr logs. (But I can check this to confirm).
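
A usage sketch of the output_stream argument as described in the excerpt above:

```python
import sys
import tensorflow as tf

tensor = tf.range(10)
tf.print("to stdout:", tensor, output_stream=sys.stdout)
# Routes through the TF logging machinery at INFO level:
tf.print("to the info log:", tensor, output_stream=tf.logging.info)
```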


```python
@tf_export("strings.format")
def string_format(template, inputs, placeholder="%s", summarize=3, name=None):
```
Comment:

Any reason the default placeholder shouldn't reflect "new style" string formatting?

@tomerk (Contributor Author) replied on Aug 30, 2018:

I'm not totally sure which formatting style on that page you're referring to. If you just mean changing the default placeholder to "{}", that's totally fine and easy to do.

If you mean adding support for positional arguments, for modifiers of the number format, or even for full-fledged python formatted string literals (https://docs.python.org/3/reference/lexical_analysis.html#f-strings), there are more steps involved. I'm hesitant to spend too much effort on string formatting as part of this design, because the focus is on the printing functionality. At the same time, we don't want the API of this design to limit adding better functionality to the strings.format operator in the future.

Below I list a bunch of possibilities for how string formatting could be made more fully-fledged (although I'm not sure if any of these are worthwhile right now):

A lot of complexity comes about from having to combine both python string processing and c++ string processing. We need to process python code to do the python object formatting & template substitution, and then have to use c++ code to process the template with python objects inserted, format tensors, and substitute them into the template.

If you are referring to support for {} and positional arguments: We could carefully combine the nest utilities & python formatting to convert the string into a template that either the c++ abseil strings substitute method or printf can support.

  • The abseil template would only be able to support 10 tensors positionally referenced in the template.
  • Printf would be able to support NL_ARGMAX positionally specified tensors (which is implementation-dependent, but is at least 9).
  • These would probably be slower than the current strsplit/strcat-based implementation of the kernel, but the absl method should be faster than the printf based one.
  • Printf might be more susceptible to security flaws.
  • We would have to be careful if any python objects inserted into the template include strings that conflict w/ the printf or absl substitute placeholders. It would probably be easier to escape / modify the printed objects correctly for absl substitute than for printf.

If you mean printf named argument support where tensors in the format string are requested as strings (but python objects may be formatted w/ the number formatting): Same as the above support for {} and positional arguments.

If you mean printf style number formatting for tensors: We would no longer be able to call directly into strsplit/substitute/printf methods. We would need to manually check the string to find where tensors need to be inserted and how their elements should be formatted. We would then need to extend the tensor summarization code w/ support for printing elements using different number formats.

If you mean fully-fledged python f-strings: This would be really hard and would involve manually checking the python environment scopes / manually parsing & rewriting the python code to hook references to tensors.

Comment:

To clarify, all I meant was having {} be the default, since that also plays nicely with ints, floats, and others by default, whereas %s does not. Plus, one assumes that people will gradually expect that to be the way they write string formatters as py3 takes over.
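
With {} as the default placeholder, usage would look something like this sketch against the proposed strings.format API:

```python
import tensorflow as tf

tensor = tf.range(10)
# One "{}" per entry in `inputs`, substituted in order:
formatted = tf.strings.format("tensor: {}, doubled: {}",
                              (tensor, tensor * 2))
```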

inputs: A list of `Tensor` objects, or a single Tensor.
The list of tensors to format into the template string. If a solitary
tensor is passed in, the input tensor will automatically be wrapped as a
list.
Comment:

Can this also support python primitives/objects as the print method does? Imagine:

def my_func(tensor, some_param=5):
    raise ValueError(tf.strings.format(
        "You passed in {} with param {}, which is not allowed",
        (tensor, some_param)))

Seems easier than having to tf.strings.format the tensor, then normal-string-format that with the param.

@tomerk (Contributor Author) replied:

See my above comment on supporting {} and positional arguments. This is definitely possible, but there would be some caveats caused by using absl string substitute or the c++ printf.

printing not work. In colab they might switch to tf.print then suddenly be
confused why nothing is printing.
* It may make sense for eager graph functions to automatically hook the
python print method to replace it with calls to `tf.Print`. This would
Comment:

You mean tf.print here presumably?

@tomerk (Contributor Author) replied:

Yes, will update that!

* This also means that the structure of printed lists/dicts is only
captured the first time a `tf.print` is executed. If tensors are added to or
removed from a list printed in `tf.print` multiple times, those changes to the
list will not be captured when printing again!
Comment:

This seems like it will be a confusing gotcha for users. Is there some way to indicate that the value is stale?

@tomerk (Contributor Author) replied:

I just ran this by @akshayka, and this actually shouldn't be an issue.
Non-eager tf control flow doesn't support modifying python objects anyway, and the tfe.defun graph functions should automatically detect that the cached value is stale and will re-generate the ops.

I'll remove this caveat from the doc.

@karmel commented Sep 13, 2018:

Design Review notes

  • Most comments were addressed in the existing CLs.
  • Concerns that in graph mode, tf.print is still painful:
    • Could use a context manager (`with tf.print(...)`), but that's not better.
    • Any way to get automatic control deps? defun.
    • This concern mostly goes away in 2.0, so it is fine to leave as-is for now.
    • → Include a defun example and test (see the sketch after these notes).
  • For the registration of the op, no special formatting for now -- just a placeholder attribute. Seems fine for now.
  • There is an explicit `with device(cpu)`; is this a problem?
    • joshl: Seems like it will fix more than it breaks.
    • apassos: Not safe for non-eager TPU code.
    • There are places this would help -- GPU code block in eager.
    • Then the user should specify `with cpu`.
    • Eager should respect soft placement, but that's a separate issue.
    • → Remove explicit device placement.
  • Concerns about memory/performance if you are copying the tensor to print onto the CPU?
    • Turn off printing for performance; that is expected.
    • But memory? Shouldn't be a concern generally: if the tensor fits on the GPU, it will fit on the CPU it is being transferred to.
    • Don't worry for now. If we need to add summary ops later, we can.
  • Add a deprecation warning to tf.Print.
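
As a sketch of the defun example requested in the notes, assuming the tf.contrib.eager.defun API of the time: inside a defun, automatic control dependencies make the print execute without explicit wiring.

```python
import tensorflow as tf

tf.enable_eager_execution()
tfe = tf.contrib.eager

@tfe.defun
def double(x):
    # Inside a defun, the PrintV2 op picks up automatic control
    # dependencies, so it runs without explicit control_dependencies.
    tf.print("x:", x)
    return x * 2

result = double(tf.range(10))
```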

@mitar commented Sep 16, 2018:

I do not think this is a good design if we want compatibility with Python logging. Currently, the caller of tf.print decides what the output stream is. I think this is not good. Ideally, with tf.print you would just send data to be printed/logged, and then from a central location you could configure exactly where, how, and at which levels messages should be logged. It should not be the role of tf.print to decide whether something goes to stderr or a logging stream; it should maybe just set the level of the message.

In this way, libraries get a clear and standard way to log messages, and users of those libraries can then configure how they want to display those messages (iPython output, stderr, stdout, redirecting to Python logging, saving to a database, whatever).

@ewilderj ewilderj moved this from Open reviews to Awaiting Committee in RFC management Sep 18, 2018
@ewilderj (Contributor) commented:

@tomerk are there notes from the review meeting to add to this PR comment thread? And are there any updates?

Pending adding those, I'm ready to merge this.

@tomerk (Contributor Author) commented Sep 19, 2018:

Hi @ewilderj, @karmel added the notes above. Before merging I still need to figure out whether it makes sense to do anything re: @mitar's comment first, or whether that should come later (e.g., having tf.print default to None as the logging level, in which case it would look it up from a global config).

@ewilderj (Contributor) commented:

Great, thanks. Sorry I missed the notes. Please feel free to ping me when further action is needed.

@ewilderj ewilderj moved this from Awaiting Committee Notes to In Revision in RFC management Sep 19, 2018
@tomerk (Contributor Author) commented Sep 19, 2018:

Okay, so:
@mitar Thank you for your suggestion! To do it we would need to set up some sort of global configuration for Tensorflow logging & automated print op device placement (e.g. see the concerns mentioned in the doc re: printing in colabs). We've decided that this is out of scope for this specific design doc. However, we can always look into extending tf.print's default behavior in the future to support this.
@ewilderj you're free to merge this now, thank you!

@mitar commented Sep 19, 2018:

However, we can always look into extending tf.print's default behavior in the future to support this.

How do you see this being possible if we now train everyone to hard-code where to output logs? Then later on we will face the problem of either ignoring what they explicitly requested (output to stdout) or not being able to suppress it.

I really believe that providing the community tooling here is critical, because otherwise there will be so much noise that this logging will become useless. Especially in code where TF is used deep inside some other codebase.

Also, why not simply build this on top of Python logging? Then TF does not have to reinvent anything here. Just use Python logging, and TF can be just a Python logger.
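
A sketch of the configuration model being described here, assuming TF messages were routed through a standard Python logger (not how tf.print actually behaves):

```python
import logging
import sys

# If tf.print routed messages through a logger, users could decide
# centrally which levels get through and where output goes:
logger = logging.getLogger("tensorflow")
logger.setLevel(logging.INFO)                # filter by level
handler = logging.StreamHandler(sys.stdout)  # or a file, DB handler, ...
logger.addHandler(handler)
```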

tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 19, 2018
This CL adds a new `tf.print` operator that more closely aligns with the standard python `print` method, and deprecates the old `tf.Print` operator (to be removed in v2.0).

It follows the design doc specified in tensorflow/community#14 and additionally incorporates the community feedback and design review decisions.

This CL adds two new internal graph operators: a StringFormat operator that formats a template string with a list of input tensors to insert into the string and outputs a string scalar containing the result, and a PrintV2 operator that prints a string scalar to a specified output stream or logging level.

The formatting op is exposed at `tf.strings.Format`. A new python method is exposed at `tf.print` that takes a list of inputs that may be nested structures and may contain tensors, formats them nicely using the formatting op, and returns a PrintV2 operator that prints them. In Eager mode and inside defuns this PrintV2 operator will automatically be executed, but in graph mode it will need to be either added to `sess.run`, or used as a control dependency for other operators being executed.

As compared to the previous print function, the new print function:
- Has an API that more closely aligns with the standard python3 print
- Supports changing the print logging level/output stream
- Allows printing arbitrary (optionally nested) data structures, as opposed to just flat lists of tensors
- Supports printing sparse tensors
- Changes the printed tensor format to show a more meaningful summary (recursively printing the first and last elements of each tensor dimension, instead of just the first few elements of the tensor regardless of dimension).

PiperOrigin-RevId: 213709924
@tomerk (Contributor Author) commented Sep 19, 2018:

Hi @mitar,
With regard to not training users improperly now: I'll run it by some folks again to see about making the output_stream kwarg default to None instead of stderr, and perhaps adding a default_print_output_stream to the context (which itself would default to stderr).

As for your second question, as well as why adding python-based printing & custom logging functionality to tf.print would be out of scope for this specific document:
Building tf.print on top of python print itself (using tf.py_func) was my first inclination as well.
Unfortunately we can't naively build tf.print like this because that places an operator in the graph whose device requires a python runtime. This would cause issues when running in distributed settings, on TPU devices, on embedded devices, and when using non-python front-ends.

So, properly adding support for this would require configuration options to determine if there is a python runtime, and what device is the client. The print / string_format operator placement then needs to carefully happen according to these configurations. At the same time, we would still need to support exporting the graph to environments without python runtimes, so we would have to be careful to make sure this doesn't cause device placement issues for the exported graphs.

Additionally, the operator kernel itself would need to know whether to switch between a py_func for printing and the C++ logging operations. We would probably still need to maintain some level of consistency between the functionality of both.

This is somewhat related both to the "Device Locations" section in the document, and the various caveats mentioned w/ respect to Colabs.
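
For reference, a sketch of the naive py_func-based approach described above, which pins the op to a device with a Python runtime:

```python
import numpy as np
import tensorflow as tf

def naive_tf_print(*inputs):
    # Sketch only: wrapping python print in tf.py_func places an op in
    # the graph whose device must have a Python runtime, which breaks
    # TPU / embedded / non-python-frontend execution as noted above.
    def _print(*values):
        print(*values)
        return np.int64(0)  # dummy output so the op has a fetchable result
    return tf.py_func(_print, list(inputs), tf.int64)
```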

@mitar commented Sep 19, 2018:

I'll run it by some folks again to see about making the output_stream kwarg default to None instead of stderr, and perhaps adding a default_print_output_stream to the context (which itself would default to stderr).

I think this would be better. Or maybe even not have output_stream on print at all, and just have it configured through the context?

@ewilderj (Contributor) commented:

@tomerk I'll wait for re-confirmation, if you're making changes as per the recent comments.

ganny26 added a commit to ganny26/tensorflow that referenced this pull request Sep 21, 2018
* Add --config=v2 option to the .bazelrc file.

PiperOrigin-RevId: 213027176

* Populate custom name in registration.

PiperOrigin-RevId: 213028338

* Disable the flaky test case in timeline_test

PiperOrigin-RevId: 213034078

* Convert more kernel signatures to use runtime shapes.

PiperOrigin-RevId: 213037039

* Disable flaky gpu_base_test

PiperOrigin-RevId: 213040362

* Added TFE_OpSetAttrTensor() to eager C API.

Also added some experimental C APIs for facilitate the use of eager C APIs in
S4TF compiler.

PiperOrigin-RevId: 213041780

* Generalize TransformFilter method in preparation of NHWC Conv support

PiperOrigin-RevId: 213049674

* [TF:XLA] Remove special base case from BatchDot that has been redundant ever since xla::DotGeneral was added.

PiperOrigin-RevId: 213052269

* Disable flaky keras_test.

PiperOrigin-RevId: 213053512

* Refactored some of the metrics code in compile function for better readability.
- Logic change: Moved getting metric name and function out of the training/eval loops in eager mode
- Moved setting metric attributes on the model out the function which calls metric functions.

PiperOrigin-RevId: 213060143

* Fixed documentation of Optimizer.minimize() for eager mode to match behavior of Optimizer.compute_gradients().

PiperOrigin-RevId: 213060585

* Fix spelling in error message

PiperOrigin-RevId: 213062112

* Makes tf.Variable arguments (non-captured) DT_RESOURCE function inputs.

Previously, tf.Variable arguments to a defun-d Python function were made captured inputs. This change makes it possible to parameterize functions on DT_RESOURCE inputs.

PiperOrigin-RevId: 213064739

* Switch to Eigen::Index in Tensorflow kernels.

Mixing index type doesn't work well with latest Eigen.

PiperOrigin-RevId: 213067224

* Revert PR tensorflow#21997: Fixes the formatting issue pointed out at tensorflow#21762

It breaks. should be s/input_shape/inputs_shape.

PiperOrigin-RevId: 213070141

* Make accessed variable ordering deterministic again when constructing defuns

PiperOrigin-RevId: 213074939

* fix bug of lacking axis when using array.ops.concat in unwrap_and_concat function

* compat: Update forward compatibility horizon to 2018-09-15

PiperOrigin-RevId: 213100589

* [TPU] Deprecate the computation_shape attribute to the TpuReplicate op in lieu of a new num_cores_per_replica.

PiperOrigin-RevId: 213111326

* compat: Update forward compatibility horizon to 2018-09-16

PiperOrigin-RevId: 213161736

* Introduce gmock matchers for TensorFlow nodes

I need these to write readable unit tests for TF graph transformations.  All of
my use cases will live inside tensorflow/compiler so putting it in
tensorflow/compiler/jit for now; but we can move these out if other users are
interested.

In the future we may want to auto-generate type safe versions of these from the
op registrations like we generate C++ wrappers today.

PiperOrigin-RevId: 213186810

* Conditionally allow changing a non-fusion computation root_instruction shape.

PiperOrigin-RevId: 213191899

* Update broken link to intro on ADAGRAD

* Fix some typos in the doc for XlaDynamicSlice

phawkins@ suggested these in cr/212715067 but I accidentally made the changes in
another client.

PiperOrigin-RevId: 213208811

* Improve TFLite iOS doc.

PiperOrigin-RevId: 213210253

* Add ZerosLike to schema.

PiperOrigin-RevId: 213212445

* Implement ZerosLike

PiperOrigin-RevId: 213227615

* Add fill to schema.

PiperOrigin-RevId: 213234759

* compat: Update forward compatibility horizon to 2018-09-17

PiperOrigin-RevId: 213234942

* revised a parameter error

Hi, i found that when firstly use `interpreter `as a parameter pass into `eval_model` function, wrong spell mistake of `interpreter_quant`.

* [XLA:TF] Enable int8 and uint8 support in the bridge for CPU/GPU

The test changes are awkward. None of these are XLA bugs, it's just that the op
definitions in tensorflow are really inconsistent. I tried to infer whether the
limitation is on signed types, index types or just arbitrary. In the latter
case just int8/uint8 is blacklisted, we should probably lift that requirement
at some point.

PiperOrigin-RevId: 213243906

* README s/tensorflow.contrib/tensorflow.python/.

PiperOrigin-RevId: 213262445

* Convert more kernel signatures to use runtime shapes.

PiperOrigin-RevId: 213275003

* Convert more kernel signatures to use runtime shapes.

PiperOrigin-RevId: 213281730

* Removing unused code comment in AutoGraph error rewriting.

PiperOrigin-RevId: 213282302

* [tf.data] Adding support for `tf.data.AUTOTUNE` as a special value for the `num_parallel_calls` argument of `tf.data.Dataset.map()`, `tf.data.Dataset.interleave()`, and `tf.contrib.data.map_and_batch()`.

When `tf.data.AUTOTUNE` is specified, the level of parallelism is determined at runtime. The underlying mechanism instruments the input pipeline to build a performance model and then uses the model to find the optimal values for the parallelism knobs.

PiperOrigin-RevId: 213283297

* Increase tolerance in linalg_grad_test to fix tensorflow#19935

Fixes tensorflow#19935

PiperOrigin-RevId: 213286535

* Minor docstring change: update link to saved_model_cli.

PiperOrigin-RevId: 213296537

* [Java]: Release 1.11.0-rc0

PiperOrigin-RevId: 213305616

* Fix and complete StreamExecutor's DoFusedConvolve:
* bias_nd is set to have CUDNN_DATA_FLOAT, even though BiasType is not float.
* double is supported but not exposed through the public interface.
* DoFusedConvolveImpl has duplicated information in its template parameter list.

PiperOrigin-RevId: 213308435

* Numerics tweak to symmetric quantization.

PiperOrigin-RevId: 213314024

* Do not segfault in Conv2d/3d if cuDNN version is too low.

PiperOrigin-RevId: 213315830

* Convert more kernel signatures to use runtime shapes.

PiperOrigin-RevId: 213316034

* [XLA] Allow adding extra instructions in HloComputation::CloneWithReplacements

PiperOrigin-RevId: 213316504

* GradientTape: Documentation formatting tweak.

PiperOrigin-RevId: 213318051

* [XLA] Add ReduceWindow test.

PiperOrigin-RevId: 213322116

* Raise error on encountering bad indentation during Autograph parsing.

PiperOrigin-RevId: 213324570

* Move from deprecated self.test_session() to self.cached_session().

self.test_session() has been deprecated in 9962eb5 as its name confuses readers of the test. Moving to cached_session() instead which is more explicit about:
* the fact that the session may be reused.
* the session is not closed even when doing a "with self.test_session()" statement.

PiperOrigin-RevId: 213326167

* Add missing `watch` call to GradientTape documentation.

PiperOrigin-RevId: 213326503

* Move from deprecated self.test_session() to self.cached_session().

self.test_session() has been deprecated in 9962eb5 as its name confuses readers of the test. Moving to cached_session() instead which is more explicit about:
* the fact that the session may be reused.
* the session is not closed even when doing a "with self.test_session()" statement.

PiperOrigin-RevId: 213326581

* Add support for predicting models with learning_phase.

PiperOrigin-RevId: 213327633

* Compute `axes` and `free` statically during graph creation.

PiperOrigin-RevId: 213327709

* Tweak test tolerance in segment_reduction_ops_test.py, which is otherwise flaky.

PiperOrigin-RevId: 213327863

* Improve the error messages in custom_export_strategy.

PiperOrigin-RevId: 213334465

* Use a single thread in eager if inter_op_parallelism_threads isn't specified.

PiperOrigin-RevId: 213336463

* Keep only weak references to variables in graph functions

This enables cleanup of the variables referenced in defunned methods of objects when the object is garbage collected. Since one PolymorphicFunction is created per @Defun, decorated methods before this change held on to all of the variables referenced in that method for any instance of the class (i.e. variables which should have been object-scoped were scoped to the lifetime of the class definition).

Raises an exception if variables used in the function have been deleted when it is called, which means no local variables.

PiperOrigin-RevId: 213337256

* Fix testing bug where partitioned primals wasn't actually being tested (constructing Variable directly instead of get_variable under scope with partitioner).

PiperOrigin-RevId: 213345447

* Add benchmarks comparing Mkl vs Default Conv2D ops.

PiperOrigin-RevId: 213346439

* Fix _check_is_tensor like _check_is_tensor_or_operation was fixed in tensorflow#22264.

PiperOrigin-RevId: 213346485

* Add api_docs_relpath option.
Eliminate error when copying a file to itself.

PiperOrigin-RevId: 213349424

* Move OvicBenchmarker class from app folder to source folder.

PiperOrigin-RevId: 213349833

* Add generic fallback optimized implementations for dilated DepthwiseConv.

PiperOrigin-RevId: 213350122

* Remove tensorflow/contrib/linalg library.  linalg remains in core.

PiperOrigin-RevId: 213352573

* Fix GraphConstructor and import_graph_def bug with variadic ops.

Prior to this change,
GraphConstructor::PopulateMissingUnusedInputMapKey() didn't correctly
compute the number of outputs for ops with variadic outputs. This
meant that missing_unused_input_map_keys could contain spurious
entries for unused variadic outputs, which could trigger a ValueError
in import_graph_def.

This also adds a new util method in node_def_util.h, NumOutputsForNode().

PiperOrigin-RevId: 213353158

* Fixing the documentation of the parse_sequence_example function.

PiperOrigin-RevId: 213354240

* [tf.data] Introducing `tf.data.Dataset.window(size, shift, stride, drop_remainder)`, which can be used for combining elements of input dataset into "windows". A window
is itself a finite dataset and, among other things, can be used for generalized batching (see tensorflow/community#5 for details).

PiperOrigin-RevId: 213360134

* Add basic op resolver registration to TFLite C API

PiperOrigin-RevId: 213360279

* Update 1.11.0-rc0 version strings to 1.11.0-rc1 (tensorflow#22284)

* Make HLO liveness analysis correctly handle computations with side effect instructions.

PiperOrigin-RevId: 213361904

* Changing `OpInputList` so that it is a forward iterator and taking advantage of the fact in the tf.data kernels.

PiperOrigin-RevId: 213361953

* Increase test timeout for dnn_tree_combined_estimator_test to de-flake.

PiperOrigin-RevId: 213363558

* Fixed bug where a mixture of Variable and PartitionedVariable would break SDCA.  Added new test that fails with `IndexError: list index out of range` in `_get_partitioned_update_ops` without the corresponding fix.

Note that the effect of this bug is minimal, because for Estimator users, it only applies to sparse features that are not partitionable (e.g. [1,]), since all variables are created with the same partitioner in Estimator).

PiperOrigin-RevId: 213365956

* Remove unnecessary side-effect test, since HLO liveness now reports correct
liveness information if a control flow computation contains side effect
instructions.

PiperOrigin-RevId: 213367995

* Update ops-related pbtxt files.

PiperOrigin-RevId: 213368723

* Eliminate VisitableAllocator.

The visitor pattern is used to allow pre-registration of memory for
DMA access, e.g. for fast GPU/CPU i/o and for RDMA networking.  The
VisitableAllocator interface was introduced to support this use some
time ago, prior to SubAllocators. Memory registration works best if
it's done infrequently, on large pieces of memory, rather than on
every piece that's dynamically allocated/freed.  This usage pattern
fits the SubAllocator better than a general Allocator.  This change
moves memory allocation visitor access to SubAllocator and eliminates
the VisitableAllocator subclass of Allocator.

This change also more rigorously enforces the requirement that all
Visitors be declared prior to memory allocation begining.  This is
accomplished by requiring that Visitors be provided to the SubAllocator
constructor.

This refactoring will ease an upcoming CL introducing
NUMA specific CPU devices.  It also should fix some performance
pitfalls (e.g. accidental use of PoolAllocator) introduced by an
earlier refactoring of ProcessState that was also in preparation for
NUMA.  It restores the default use of the cpu_allocator() value (i.e.
no SubAllocator) by model executions that don't use allocation
visitors (since visitor registration must precede the first allocation,
hence can be detected at that time).

PiperOrigin-RevId: 213371553

* Add type checking at the beginning of tpu.shard().

Otherwise a message like "TypeError: Tensor objects are only iterable when eager execution is enabled. To iterate over this tensor use tf.map_fn." will be thrown, which is confusing.

PiperOrigin-RevId: 213371676

* Remove some dead code after migration from python to C.

PiperOrigin-RevId: 213372027

* Increase test timeout for image_grad_test to de-flake.

PiperOrigin-RevId: 213372241

* Num elements fastpath for eager tensors.

PiperOrigin-RevId: 213377426

* Break cwise_opt_test.py into 3 files to speed up testing, since we are up against the 50 shard limit.

PiperOrigin-RevId: 213377776

* Add Keras TPU support for the new metrics.

PiperOrigin-RevId: 213378552

* Register fp16 reduce_max on GPU

PiperOrigin-RevId: 213383647

* Fix unused variable error on powerpc.

PiperOrigin-RevId: 213386145

* [tf.data] Fixing an error in the optimization loop.

PiperOrigin-RevId: 213386401

* Refactor out the metadata_ops set from const_analysis to a per-op bit; NFC

PiperOrigin-RevId: 213389224

* Automated rollback of commit 185aa89

PiperOrigin-RevId: 213394522

* Support scoped_allocator_ops for renamed device.

This fixes tensorflow#22274.

Signed-off-by: Bairen Yi <byi@connect.ust.hk>

* [XLA] Refactor conv_ops emitters to make them reusable.

PiperOrigin-RevId: 213398930

* compat: Update forward compatibility horizon to 2018-09-18

PiperOrigin-RevId: 213414462

* Simplify the interface of conversion_call to allow a ConversionOptions object that can be more easily extended. Currently any new argument needs changing a lot of call sites and there is redundancy in argument documentation.

Note: this does not modify the public symbols yet - it's not clear whether we want to complicate their interface. However we may want to use it in to_graph and to_code.
PiperOrigin-RevId: 213433379

* Add a fuzzer to test DecodeCompressed

PiperOrigin-RevId: 213441868

* Automated rollback of commit 19d66a9

PiperOrigin-RevId: 213453719

* Creating an InstantiatedCapturedFunction that captures the instantiated state of a function to be executed, separating it out from the non instantiated regular state such as function name, captured inputs etc.

This allows us to truly separate Dataset kernel creation from Iterator creation i.e. each time a dataset is created that uses functions, we create only a CapturedFunction whereas we create an InstantiatedCapturedFunction each time a new iterator is created.

PiperOrigin-RevId: 213456128

* Extend template expansion support for arithmetic expressions.

PiperOrigin-RevId: 213462334

* [SE] Restore int8x4 data types if that's the requested DataLayout for fused conv

This broke in a recent refactoring.

PiperOrigin-RevId: 213497416

* Link to readme for distribution strategy from distribute.py and package init file, so that folks looking at API documentation can find the readme as well.

PiperOrigin-RevId: 213499832

* Only start_step/end_step on GradientTape if executing eagerly.

This prevents creating a context where none is required.

PiperOrigin-RevId: 213500408

* Register FakeResourceUpdateOp for the right op

Before this CL the PartiallyDeclusterPassTest.DontDuplicateResourceVarOps test
was buggy, in that it wasn't testing what it was supposed to test.

PiperOrigin-RevId: 213501558

* Eliminate VisitableAllocator.

The visitor pattern is used to allow pre-registration of memory for
DMA access, e.g. for fast GPU/CPU i/o and for RDMA networking.  The
VisitableAllocator interface was introduced to support this use some
time ago, prior to SubAllocators. Memory registration works best if
it's done infrequently, on large pieces of memory, rather than on
every piece that's dynamically allocated/freed.  This usage pattern
fits the SubAllocator better than a general Allocator.  This change
moves memory allocation visitor access to SubAllocator and eliminates
the VisitableAllocator subclass of Allocator.

This change also more rigorously enforces the requirement that all
Visitors be declared prior to memory allocation begining.  This is
accomplished by requiring that Visitors be provided to the SubAllocator
constructor.

This refactoring will ease an upcoming CL introducing
NUMA specific CPU devices.  It also should fix some performance
pitfalls (e.g. accidental use of PoolAllocator) introduced by an
earlier refactoring of ProcessState that was also in preparation for
NUMA.  It restores the default use of the cpu_allocator() value (i.e.
no SubAllocator) by model executions that don't use allocation
visitors (since visitor registration must precede the first allocation,
hence can be detected at that time).

PiperOrigin-RevId: 213505655

* Clean up remove_negation pass in Grappler.

PiperOrigin-RevId: 213520177

* Add error reporting TFLIte C API

PiperOrigin-RevId: 213526489

* [TF:XLA] Document that the order of control predecessors matters.

PiperOrigin-RevId: 213528296

* Automated rollback of commit b1ff7c2

PiperOrigin-RevId: 213528716

* Updates documentation of Estimator.predict to note that an issue with yielding and graph context.

PiperOrigin-RevId: 213528782

* "Isolate" must-be-constant side effecting operations

I first tried to fix this issue in cr/209996730 but didn't quite fix the problem
for for XLA_* devices.  A node assigned to an XLA_* device must be compiled so
the cr/209996730 fix of simply not compiling the nodes doesn't generalize to
XLA_* devices.  Instead we now "isolate" these nodes, only putting them in a
trivial one-node cluster.  For non-XLA devices even this trivial cluster is
ignored because of flags->tf_xla_min_cluster_size.

I was initially considering a more principled data-flow-analysis based solution
but then decided the upfront work isn't worth it until I see a clear motivating
example.

PiperOrigin-RevId: 213531437

* Convert more kernel signatures to use runtime shapes.

PiperOrigin-RevId: 213536334

* Reject RESHAPE if new_shape tensor is not provided.

PiperOrigin-RevId: 213541006

* Return OrderedDict as eval results should be sorted by global_step key.

PiperOrigin-RevId: 213541935

* Add ConstantScalar, WithPredicate, Disjunction, and OpAnyOrder (where Op
is a commutative binary operator) to the XLA pattern matcher.

PiperOrigin-RevId: 213543953

* Convert the new metric instances to (value_op, update_op) tuple in the EstimatorSpec.

PiperOrigin-RevId: 213548081

* Add a new function to load kernel libraries and library folders.

PiperOrigin-RevId: 213549838

* Add layout information to logging.

PiperOrigin-RevId: 213551652

* Go: Update generated wrapper functions for TensorFlow ops.
PiperOrigin-RevId: 213552354

* Update the grappler plugin to support the @Defun generated function and ops.

PiperOrigin-RevId: 213554813

* [tf.data] Add a test for state persistence between iterators over the same MapDataset.

PiperOrigin-RevId: 213555982

* Getting DNNModel to work with the new feature columns.

PiperOrigin-RevId: 213561495

* First commit for functional while loop.
Supports single and double derivatives but does not supporting nesting yet.

tensorflow/community#13

PiperOrigin-RevId: 213565971

* Putting `NodeExecStatsWrapper` behind an interface and providing a light-weight statistics collector for tf.data performance modeling.

PiperOrigin-RevId: 213566889

* [TF:XLA] Change HloPtrComparator to work across HLO modules. Declaring the method out of line does not increase compile time.

PiperOrigin-RevId: 213571783

* Add xla.compile(), a low-level API that compiles graph with XLA.

PiperOrigin-RevId: 213574904

* Modify Timeline Analysis to consider allocations in order.

PiperOrigin-RevId: 213589710

* Implement sort op for CPU.

Also don't allow parallelization for the sort op in parallel_task_assignment.

PiperOrigin-RevId: 213592046

* Replace DLOG(FATAL) with an Unimplemented error.

In tensorflow we don't have DLOG, and we should not use LOG(FATAL).

PiperOrigin-RevId: 213595376

* Enable XlaSort and TopKV2 for CPU backend.

PiperOrigin-RevId: 213595499

* compat: Update forward compatibility horizon to 2018-09-19

PiperOrigin-RevId: 213595705

* Run CPU tests remotely.

Being able to run CPU tests remotely while running GPU tests locally required
multiple changes:
1. Unify how we tag GPU tests in TF; we now always use tf_cuda_tests_tags().
2. Tag tests using tf_cuda_tests_tags() with 'local' and 'gpu'; this makes
   them not run on non-gpu builds and always runs them locally.

PiperOrigin-RevId: 213601626

* jacobian: manually setting the output shape in the output.

PiperOrigin-RevId: 213610324

* Enable tests for CPU and GPU backends that involve XlaSort.

PiperOrigin-RevId: 213611371

* [TF:XLA] Enable ClipByValue test for integer types

This has been fixed a while ago. Even though TF allows ClipByValue for complex
types it's not implemented anywhere (and it doesn't make sense for complex
numbers) so blacklist complex types.

PiperOrigin-RevId: 213615429

* Distributions should raise the original exception (log_prob not implemented) instead of the fallback exception (prob not implemented).

Additionally, in a nested structure of transformed distributions, it can be useful to know which distribution is raising this error.

PiperOrigin-RevId: 213618306

* Enable while_test for the GPU backend.

PiperOrigin-RevId: 213618350

* Add interface for HLO passes which run on HloModuleGroup.
Derive HloModulePass and HloModuleGroupPass from HloPassInterface which run module-scoped and module-group-scoped respectively. Replace all existing uses of HloPassInterface with HloModulePass because all existing passes are module-scoped. Also rewrite HloPassPipeline to support both module-scoped and module-group-scoped passes.

PiperOrigin-RevId: 213629604

* Automated rollback of commit 9fe1778

PiperOrigin-RevId: 213630404

* Treat kDomain instruction as a pure pass-through in HloValue

It doesn't access the data in any way similarly to kTuple so it should
be handled the same way.

PiperOrigin-RevId: 213630620

* Add build rules for mnist_softmax_xla.py so it can work internally.

PiperOrigin-RevId: 213637804

* Convert more kernel signatures to use runtime shapes.

PiperOrigin-RevId: 213640434

* [XLA:CPU] Add an emitter for erfinv(double) and erfinv(half).

This is used by the random number generator. Same algorithm as for float, just with more
precision. fp16 is upcasted to fp32 and then processed with the float algorithm.

PiperOrigin-RevId: 213648736

* Convert more kernel signatures to use runtime shapes.

PiperOrigin-RevId: 213651158

* Fix estimator_training test flakiness.

PiperOrigin-RevId: 213653403

* Return error message with illegal input rather than check-failing in op_kernel.

PiperOrigin-RevId: 213653853

* Force-place embedding variables on CPUs ein eager mode.

This avoids problems which happen because most optimizers do not have sparse updating gpu kernels implemented.

Fixes tensorflow#22042

PiperOrigin-RevId: 213654354

* Fix documentation markdown

PiperOrigin-RevId: 213655969

* Enable large constant array deduping by default.
If this causes trouble (makes graph visualizations harder to read, etc)
then consider increasing the default value of dedupe_array_min_size_bytes.

PiperOrigin-RevId: 213656796

* Python interface for Boosted Trees model explainability (currently includes directional feature contributions); fixed ExampleDebugOutputs bug where it errors with empty trees.

PiperOrigin-RevId: 213658470

* Add a space to the error message.

PiperOrigin-RevId: 213661062

* Re-enable flaky keras_test

PiperOrigin-RevId: 213665390

* Remove non-determinism in model-parallel compilation

PiperOrigin-RevId: 213667385

* Fixed broken links

* [XLA:TF] Re-disable testRandomUniformIsInRange

The bug is still there and makes this test flakily fail with fp16.

PiperOrigin-RevId: 213669453

* Convert more kernel signatures to use runtime shapes.

PiperOrigin-RevId: 213673402

* Adds an experimental package group to allow Swift and ObjC targets to depend on the "c_api" target.

PiperOrigin-RevId: 213673549

* Simplify ir_emitter_unnested so that it doesn't take a look at conv
custom call and try to understand what's inside. convolution_thunk does
it anyway.

PiperOrigin-RevId: 213676051

* Fixes in ResolveReorderAxes.
The main issue is we were keeping the input array, updating it in place and discarding the output array. That was a problem when the input array had multiple consumer ops. Now we're keeping the output array instead, which is the correct thing to do. However, in order to minimize disruption, we keep using the input array's name whenever possible, by means of some array renamings.

PiperOrigin-RevId: 213678219

* Two improvements in resolve_tensorflow_matmul:
1. Before inserting a new Transpose node, check if there already is one that
   may be reused. In practice, there are two cases: either the array being
   transposed is a constant (by far the most common case) or it's not.
    * If it is constant, then this doesn't really make a difference:
      ResolveConstantTranspose runs anyway, eliminating these Transpose nodes
      and also mootifying this change as it leaves no Transpose node to be
      reused. So in that case, constant-array-deduping is really the only
      thing that prevents duplication of data.
    * If it is not constant, that's where this new logic really helps, as
      the resulting Transpose nodes are here to stay in the final graph,
      and this avoids inserting more than are needed.
2. transpose_a is not supported. However, rather than CHECK-fail, it's more
   useful to have this graph transformation bail with a log message. The
   resulting 'unresolved' MatMul node could still be handled in some way
   at the TFLite level, or we could end up having support for MatMul per se.

PiperOrigin-RevId: 213678294

* Remove the CHECK added for debugging.

PiperOrigin-RevId: 213681549

* Fixes bits/bytes unit error in comment.

PiperOrigin-RevId: 213684048

* [tf.data] MapVectorization optimization: C++ conversion framework to vectorize a MapDefun function. Also implements conversion for two ops: Cast and Unpack.

PiperOrigin-RevId: 213686720

* Remove LOG(INFO) in MetaOptimizer:Optimize as this currently produces a large number of debugging outputs in the INFO log that look like:

I0917 16:20:11.073992    9191 meta_optimizer.cc:334] Starting optimization for grappler item: tf_graph
I0917 16:20:11.079458    9191 meta_optimizer.cc:334] Starting optimization for grappler item: tf_graph
I0917 16:20:11.084827   12447 meta_optimizer.cc:334] Starting optimization for grappler item: tf_graph
I0917 16:20:11.089359   12447 meta_optimizer.cc:334] Starting optimization for grappler item: tf_graph

After this change those lines will simply no longer appear.

RELNOTES: n/a
PiperOrigin-RevId: 213690759

* Added ABSL_DEPRECATED annotations to various deprecated TensorFlow functions.

PiperOrigin-RevId: 213693027

* Add min/max version for depthwise conv.

PiperOrigin-RevId: 213698663

* Allow the tape tensor to have unknown shapes.

This is done by making the TapeTensor a template rather than a concrete struct.

PiperOrigin-RevId: 213700425

* Create a steps_per_run variable to be updated correctly in the fit loop to make sure we run fit for the right number of steps.

PiperOrigin-RevId: 213706042

* Boosted trees: Add error messages when tree complexity parameter is not properly set.

PiperOrigin-RevId: 213706101

* This CL adds a new `tf.print` operator that more closely aligns with the standard python `print` method, and deprecates the old `tf.Print` operator (to be removed in in v2.0).

It follows the design doc specified in tensorflow/community#14 and additionally incorporates the community feedback and design review decisions.

This CL adds two new internal graph operators: a StringFormat operator that formats a template string with a list of input tensors to insert into the string and outputs a string scalar containing the result, and a PrintV2 operator that prints a string scalar to a specified output stream or logging level.

The formatting op is exposed at `tf.strings.Format`. A new python method is exposed at `tf.print` that takes a list of inputs that may be nested structures and may contain tensors, formats them nicely using the formatting op, and returns a PrintV2 operator that prints them. In Eager mode and inside defuns this PrintV2 operator will automatically be executed, but in graph mode it will need to be either added to `sess.run`, or used as a control dependency for other operators being executed.

As compared to the previous print function, the new print function:
- Has an API that more closely aligns with the standard python3 print
- Supports changing the print logging level/output stream
- allows printing arbitrary (optionally nested) data structures as opposed to just flat lists of tensors
- support printing sparse tensors
- changes printed tensor format to show more meaningful summary (recursively print the first and last elements of each tensor dimension, instead of just the first few elements of the tensor irregardless of dimension).

PiperOrigin-RevId: 213709924

* Go: Update generated wrapper functions for TensorFlow ops.
PiperOrigin-RevId: 213716034

* [XLA] Add R2 strided slice test.

PiperOrigin-RevId: 213718019

* Add VerifiedHloModule class.
VerifiedHloModule is derived from HloModule and verifies itself on destruction. This is designed to be used in HloVerifiedTestBase. This replaces the current mechanism which verifies HloModules in the TearDown method. The VerifiedHloModule approach is cleaner (less state on the test object) and more capable because these verified HLO modules can be passed to methods which require taking ownership of the module (eg, HlotestBase::Execute).

This change required some changes to the parser which enables constructing the parsed HloModule into an already allocated HloModule. Some trivial changes to HloModule are required as well.

PiperOrigin-RevId: 213718126

* Allow setting a global override for the "allow_growth" GPU option via the TF_FORCE_GPU_ALLOW_GROWTH environment variable.

PiperOrigin-RevId: 213728460

* TOCO transformations updated to support dilated depthwise convolution.

PiperOrigin-RevId: 213729750

* Update ops-related pbtxt files.

PiperOrigin-RevId: 213729979

* Fix the error message thrown when running eval on pod

PiperOrigin-RevId: 213730668

* Copy Tensor._handle_data from external_capture to placeholder for Variant tensors in Graph mode defun.
This allows inferring the shape of values popped from TensorLists inside defuns.
Remove "Resource" from {Set|Get}ResourceHandleShapeAndType since the same functions are re-usable for variants.
Eager mode fix coming in a future changelist.

PiperOrigin-RevId: 213735462

* BEGIN_PUBLIC
It's desirable to run int64 compute on GPU. Rolling back the folowing CL.

*** Original change description ***

Register a new Sum op for T:int64 and Tidx:int32

END_PUBLIC

Automated rollback of commit a9a5929

PiperOrigin-RevId: 213736058

* Update TF Lite subsite

PiperOrigin-RevId: 213737482

* Internal change.

PiperOrigin-RevId: 213749129

* Fix typo error in grapper remapper optimizer.

* Speeds up _random_flip for batched images.

PiperOrigin-RevId: 213753728

* Add feature_group_count parameter of Convolution op to xla_client.py.

This parameter has been added to HLO to support depthwise convolution.

PiperOrigin-RevId: 213761790

* Add AOT test case for XlaSort.

The only tensorflow op that uses XlaSort is nn.top_k, so we add a test case
using nn.top_k.

PiperOrigin-RevId: 213763591

* Automated rollback of commit 31c0857

PiperOrigin-RevId: 213764810

* Internal change.

PiperOrigin-RevId: 213770000

* Automated rollback of commit da3357e

PiperOrigin-RevId: 213771631

* compat: Update forward compatibility horizon to 2018-09-20

PiperOrigin-RevId: 213773990

* [XLA:TF] Whitelist quantized types for CPU/GPU

These have the same behavior as unquantized types so we can just pass them
through to XLA (which converts them to unquantized types). They're supposed to
be used with special ops, none of which are currently implemented by XLA.
Casting (without quantization) and basic math works fine though.

These do not have a corresponding numpy type, so only tests using TF types will
see them.

PiperOrigin-RevId: 213781650

* Fix typo in _EnforceShapeInvariant.

PiperOrigin-RevId: 213801006

* Callbacks should count the steps correctly in the multi step case

PiperOrigin-RevId: 213829360

* [tf.data] Use vectorization_utils::VectorizeMapDefun in MapVectorization optimization

PiperOrigin-RevId: 213840320

* [SE] Use absl instead of TF classes where an absl version exists

With the exception of StrCat all of these are using absl already, this change
just removes one layer of indirection.

PiperOrigin-RevId: 213846036

* [data-stats] Adds number of filtered elements as scalar summary, also adds number of filtered elements to monitoring counter.

PiperOrigin-RevId: 213846793

* Moving tpu_embedding_config.proto to tpu_embedding_configuration.proto, refactoring it, adding several new fields and an EmbeddingOutputLayout message to provide experimental support for controlling the embedding output.

PiperOrigin-RevId: 213849572

* Replace the OrderedDict with a basic list/dict solution. OrderedDict is problematic to use in eager because of the circular references it creates.

PiperOrigin-RevId: 213862402

* Fix _handle_data of variant and resource type outputs of While op in while_v2.

tensorflow/community#13

PiperOrigin-RevId: 213862844

* Add searchsorted (ie lower/upper bound) op.

PiperOrigin-RevId: 213863392
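
A short sketch of the new op (assuming the documented tf.searchsorted signature, with the sorted sequence along the innermost dimension):

import tensorflow as tf

seq = tf.constant([[1.0, 3.0, 5.0, 7.0]])
vals = tf.constant([[5.0]])
tf.searchsorted(seq, vals, side="left")   # -> [[2]], the lower bound
tf.searchsorted(seq, vals, side="right")  # -> [[3]], the upper bound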

* Modify docs under contrib/distributions to point to tfp.

PiperOrigin-RevId: 213866466

* Updating doc references to tf.distributions to point to tfp.distributions.

PiperOrigin-RevId: 213867606

* Simplifies the ResourceVariable constructor.

PiperOrigin-RevId: 213872127

* This CL adds a Keras-based mobilenet_v2 feature extractor for object detection models.

As part of this CL, we use the Keras mobilenet_v2 application's keyword argument layer injection API to allow the generated network to support the object detection hyperparameters.

PiperOrigin-RevId: 213872175

* [tf.data] Fixes for two recently introduced use-after-free bugs.

1. In ParallelMapIterator, do not call `cond_var_.notify_all()` without holding
   the associated mutex. In some cases, the iterator may have been deleted
   between releasing the lock and notifying the condition variable, which
   leads to a use-after-free. This change applies this style to all uses of
   condition variables in tensorflow/core/kernels/data/.

2. In CapturedFunction::RunAsync(), do not use `shared_ptr` to manage
   the lifetime of objects that (potentially) borrow from runtime
   objects. The present code runs the destructor after the `done()`
   callback is called, but the `done()` callback may be the last
   action in a session, and thus trigger destruction of those borrowed
   objects. In that case, the `shared_ptr` destructor may use the
   borrowed objects after they are freed.

PiperOrigin-RevId: 213872829

* Update ops-related pbtxt files.

PiperOrigin-RevId: 213873471

* Implement TF graph capture.

PiperOrigin-RevId: 213875284

* Fix bug in Pow optimizer rule when broadcasting is involved.
Minor cleanup by moving the helper function ShapesEqual to GraphProperties and adding unit tests for it.

PiperOrigin-RevId: 213876779

* Include the print function in the list of special functions: its name is not found in the namespace in Python 3.

PiperOrigin-RevId: 213879813

* [Java]: Release 1.11.0-rc1

PiperOrigin-RevId: 213882538

* [XLA] Don't create mixed precision operations accidentally

The reshape we created changed the element type unintentionally.

PiperOrigin-RevId: 213883142

* Remove restriction on scope for bypass operators. Previously, the scope had to be of the form 'scope/<arbitrary_text>'. Relax the restriction to handle empty scopes, and enable this change to work for both fused and unfused batch norm layers.

PiperOrigin-RevId: 213883621

* Fix missing TODO.

PiperOrigin-RevId: 213885561

* [tf.data] Some vectorization cleanup

PiperOrigin-RevId: 213886813

* Add more specific ReLU implementation tests.

PiperOrigin-RevId: 213890403

* This CL moves the tf.print logging level tests that are sensitive to OS & environment configurations to a separate test target, and disables running them on Windows.

PiperOrigin-RevId: 213895372

* Split XlaLaunch into XlaCompile and XlaRun; NFC

This CL splits the functionality in XlaLaunch into two separate operations:

 - XlaCompile, responsible for compiling a TF function into a LocalExecutable
 - XlaRun, responsible for executing a LocalExecutable created by XlaCompile

This CL is a stepping stone towards implementing lazy compilation for TF/XLA.
The XlaCompile op is spec'ed to return a boolean indicating whether the
compilation was successful.  Right now that boolean is always set to true by
XlaCompile and its value is otherwise ignored, but in the future it will be used
to indicate whether the TF function was compiled or not, and thus whether we
should execute XlaRun or just directly call the TF function.

XlaLaunch still exists, and will be created by create_xla_launch_op.cc.  In the
future we may consider removing it altogether.  build_xla_launch_ops.cc, now
renamed to build_xla_ops.cc, creates a XlaCompile/XlaRun pair instead of
XlaLaunch.

This CL is organized as follows:

 - jit/ops/xla_ops.cc gets two new XLA-specific operations, XlaCompile and
   XlaRun, described above.  XlaRun redundantly takes the must-be-constant
   inputs to the TensorFlow cluster to keep the implementation simple (simple in
   the sense of similar to XlaLaunch), but I will remove this in a subsequent
   cleanup CL.

 - jit/kernels/xla_ops.cc implements XlaCompile and XlaRun in a fairly
   straightforward manner.  XlaCompile compiles the TF function, puts it in a
   process-global storage, XlaExecutableClosureStore, and produces an int64 key.
   XlaRun uses the key to read out the LocalExecutable and execute it.  I'm not
   sure if XlaExecutableClosureStore should be a resource like
   XlaCompilationCache; I did not immediately see any reason to make it so.

 - There are changes to the various _device files to register XlaCompile and
   XlaRun for the XLA_* devices.

 - Finally, I had to fix some tests that were expecting XlaLaunch in the
   execution timeline.

PiperOrigin-RevId: 213895405

* Change all YAML booleans from True/False to true/false.

PiperOrigin-RevId: 213896057

* It is more computationally efficient to represent resize bilinear as a
depthwise convolution instead of a full convolution, now that depthwise convolution exists in XLA.

PiperOrigin-RevId: 213896333

* [tf.data] Moving auto-tuning optimizations into a background thread, refactoring the API for exposing tunable parameters, and removing `model::Node` from the public API.

PiperOrigin-RevId: 213907565

* Fixes regression to tf.Print that removed square braces around printed tensors.

PiperOrigin-RevId: 213912507

* Support 16-way model parallelism.

PiperOrigin-RevId: 213913013

* Updating doc references from tf.distributions to tfp.distributions.

PiperOrigin-RevId: 213915666

* Update links to tf lite site.

PiperOrigin-RevId: 213917881

* Update links to install pages.

PiperOrigin-RevId: 213917946

* Add an API which gives explicit control over shard sizes and introspection into the number of shards used. This is a variant of threadpool::parallelFor.

PiperOrigin-RevId: 213920649

* Make threading.local not an instance member of collective ops because in python3 threading.local cannot be pickled.

PiperOrigin-RevId: 213928766
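
For context, a minimal reproduction of the underlying Python 3 behavior (the exact error text varies by Python version):

import pickle
import threading

pickle.dumps(threading.local())  # raises TypeError: cannot pickle '_thread._local' object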

* Return model format from LoadSessionBundleOrSavedModelBundle(),
allowing callers to know if we up-converted a SessionBundle to
SavedModel format.

PiperOrigin-RevId: 213937542

* Fix cub include path so that TensorFlow compiles when used as a bazel dependency.

PiperOrigin-RevId: 213942340

* Move from deprecated self.test_session() to self.cached_session().

self.test_session() has been deprecated in 9962eb5 as its name confuses readers of the test. Moving to cached_session() instead, which is more explicit about:
* the fact that the session may be reused.
* the session is not closed even when doing a "with self.test_session()" statement.

PiperOrigin-RevId: 213944355
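
The migration looks roughly like this (a sketch; some_op is a placeholder, and both methods come from TensorFlowTestCase):

# Before (deprecated; the name wrongly suggests a fresh session per test):
with self.test_session() as sess:
  sess.run(some_op)

# After (explicit that the session may be reused and is not closed on exit):
with self.cached_session() as sess:
  sess.run(some_op)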

* Move from deprecated self.test_session() to self.cached_session().

self.test_session() has been deprecated in 9962eb5 as its name confuses readers of the test. Moving to cached_session() instead, which is more explicit about:
* the fact that the session may be reused.
* the session is not closed even when doing a "with self.test_session()" statement.

PiperOrigin-RevId: 213944932

* keras/training.py: Improve error message.

Inspired by:
https://stackoverflow.com/questions/52428939/eager-mode-optimizers/

PiperOrigin-RevId: 213948133

* Internal change.

PiperOrigin-RevId: 213948394

* [TF:XLA] Bump open source llvm revision to r342644

PiperOrigin-RevId: 213952786

* [XLA:CPU] Re-enable half float tests for unary ops

This was blocked by an LLVM bug, which was fixed in r342542.

PiperOrigin-RevId: 213953743

* compat: Update forward compatibility horizon to 2018-09-21

PiperOrigin-RevId: 213955428

* Added fetch support for attrs classes.

Given a class (using the attrs library):

import attr

@attr.s()
class SampleAttr(object):
  field_1 = attr.ib()
  field_2 = attr.ib()

we will be able to run

obj = SampleAttr(tensor_1, tensor_2)
session.run(obj)  # equivalent to session.run([obj.field_1, obj.field_2])

Please note, this does not need nest flatten support (which is only relevant to the feed_dict argument).

Also, the information in __attrs_attrs__ is provided for extensions (as per the docs: http://www.attrs.org/en/stable/extending.html#extending-metadata) like this and is not an "implementation detail".

PiperOrigin-RevId: 213963978

* Use weakrefs where absolutely safe to do so, in order to reduce the number of circular references. Replace unnecessary OrderedDict with a regular dict.

PiperOrigin-RevId: 213982097

* [TPU] Change the TPU DeviceAssignment class to use a flatter (replica, logical core) indexing scheme for cores.

Previously the DeviceAssignment class mixed both a general concept (a mapping from (replica, logical core) to physical TPU core) and a specific instantiation of that concept, by imposing a particular 3D grid structure on the logical core numbers. This was excessive: while the physical core numbers have a particular structure, there is no need to impose any particular structure on the logical core numbers.

This change simplifies the DeviceAssignment scheme, changing it so logical cores within a replica are numbered sequentially without any particular semantics.

PiperOrigin-RevId: 213984629
@tomerk
Contributor Author

tomerk commented Feb 7, 2019

Hi @ewilderj, a follow-up on all this:
tf.print now successfully prints to Colab notebooks, as a result of adding a log-listener registration mechanism to the TensorFlow C API (which tf.print uses) and registering a listener in interactive environments that prints to the Python stdout.

For now, though, we will be leaving output_stream in tf.print's API because it matches how Python's own print function works. If people want to extend this in the future, they will be able to extend TensorFlow's source with new log listeners, or expose Python log-listener registration beyond interactive environments, and do whatever they want with Python's stdout.
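
For illustration, the parallel with Python's print looks roughly like this (a sketch; in the current API, output_stream defaults to sys.stderr):

import sys
import tensorflow as tf

tf.print("loss:", 0.25, output_stream=sys.stdout)  # analogous to print("loss:", 0.25, file=sys.stdout)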

@ewilderj
Contributor

Thanks for the update.

So to be clear, will this design proposal be resumed, or should I just archive it?

@ewilderj ewilderj added RFC: Accepted RFC Design Document: Accepted by Review and removed RFC: Proposed RFC Design Document labels Feb 12, 2019
@ewilderj ewilderj moved this from In Revision to Accepted RFCs in RFC management Feb 12, 2019
@ewilderj ewilderj merged commit 0530779 into tensorflow:master Feb 21, 2019
karllessard added a commit to karllessard/tensorflow-community that referenced this pull request May 10, 2019
* Adding a doc to deprecate collections

* Responding to Karmel's comments

* Minor fix to VariableTracker sample code

* RFC for random numbers in TensorFlow 2.0

* Changes after some feedback

* Removed 'global_seed' in the main code and showed the design with 'global_seed' in the Questions section.

* Some changes after feedback

* A tweak

* Change after feedback

* A tweak

* changes

* changes

* fix link

* new-rfc

* changes

* Update rfcs/20181225-tf-backend.md

Co-Authored-By: alextp <apassos@google.com>

* Added some considerations about tf.function

* Renamed the internal name "op_generator" to "global_generator"

* Changed seed size from 256 to 1024 bits

* Initial signpost for community meetings

Adding this so there is basic information about how to find the community calendar and get invited to meetings.

* Add iCal link too

* changes

* Initial version of embedding and partitioned variable RFC.

* Fix one formatting issue.

* Fix another formatting issue.

* Use markdown language for the table instead of HTML.

* Add tensorflow/io R Package CRAN release instructions (tensorflow#53)

* Added Design Review Notes

* Make a clear distinction between embedding variables and load-balancing
variables.

* Added decisions below each question, and "how to use generators with distribution strategies".

* Adopted Dong Lin's suggestions

* Add a paragraph pointing out the problem with the `partition_strategy` argument.

* RFC: Move from tf.contrib to addons (tensorflow#37)

* Checkpoint addons RFC for review

* Add code review to RFC

Add future pull request information to criteria

Update modified date

added some description

RFC Move to addons

* Add weight decay optimizers

* Remove conv2d_in_plane

* Add group_norm

* Accept addons RFC

* Update alternatives since `DynamicPartition` and `DynamicStitch` do have GPU kernels.

* Add a section for saving and restore `PartitionedVariable`.

* Mention that variable types can be nested, attention needs to be paid to their saving and restoring mechanism.

* Create README.md (tensorflow#57)

* Split `_state_var` into `_state_var` and `_alg_var` (due to implementation concerns) and changed the status to "Accepted"

* Updated timestamp

* Moved the auto-selection of algorithm from `create_rng_state` to `Generator.__init__`

* Update according to the discussion

* Move performance heuristics into the Distribution Strategy level. We will not expose knobs for users to control;
* Emphasize that embedding support in v2 will all be via the `Embedding` layer. Users can use `tf.compat.v1` to handle embeddings themselves;
* Mention that the default `partition_strategy` in v1 `embedding_lookup` is "mod", which may break users' models when they update to TF 2.0;
* We want to prioritize shuffling embeddings after the 2.0 release;
* We have plans to serialize and deserialize the `Embedding` layer and Distribution Strategies to allow loading a saved model onto a different number of partitions.

* Update release binary build command for SIG IO (tensorflow#58)

This PR updates the release binary build command for SIG IO.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add Bryan to SIG IO release team (tensorflow#59)

* Change to accepted

* Add link to TensorFlow IO R package

* Updated link for the friction log. (tensorflow#64)

* Switch DistStrat revised API examples to TensorFlow 2 style. (tensorflow#63)

* RFC: Attention for Dense Networks on Keras (tensorflow#54)

* Design review for "Attention for Dense Networks"

* RFC: Stateful Containers with tf.Module (tensorflow#56)

* Create 20190117-tf-module.md

* Update 20190117-tf-module.md

* Loosen return type for variable properties.

* Use Dense consistently.

Thanks brilee@ for spotting!

* Remove convert_to_tensor from examples.

This wasn't ever required and including it might cause confusion.

h/t pluskid@ gehring@ and awav@

* Remove owned_* methods.

* Document `_flatten`

See tensorflow/tensorflow@5076adf6 for more context.

* Fix typo in module name.

Thanks k-w-w@!

* Update 20190117-tf-module.md

* RFC: New tf.print (tensorflow#14)

* New tf.print proposal

* Attempt to fix table of contents

* Removed not-working TOC label

* Minor updates to the doc.

* Update tf.print to be accepted

* Added design review notes

* Marking doc as accepted

* Update cond_v2 design doc (tensorflow#70)

* Update to bring in line with implementation

* Added the symbol map to the RFC.

* Updated testing section of the Community site.

* Removed the 100%, formatting tweaks.

* Update CHARTER.md

* Change contact email address

I will leave my current company soon, so update my email.

* Create README.md

* Logos for SIGs

* Update README.md

* Update addons owners (tensorflow#85)

Add Yan Facai as another project lead.

* Created a FAQ for TF 2.0. (tensorflow#78)

Adding 2.0 related FAQ to the Testing group.

* Request and charter for SIG JVM (tensorflow#86)

Chartering docs for SIG JVM

* Update CODEOWNERS

Add @karllessard, @sjamesr and @tzolov as code owners for sigs/jvm.

* Update CODEOWNERS

Add missing /

* Update CODEOWNERS

Add @dynamicwebpaige as owner for sigs/testing/

* Update RFC with current information (tensorflow#89)

Make current to SIG Addons

* RFC: TF on Demand Project (tensorflow#69)

* Adding an RFC for TF on Demand Project.

* modified one line in tf-on-demand md file.

* Changing RFC status from PROPOSED to ACCEPTED.

* RFC: SavedModel Save/Load in 2.x (tensorflow#34)

* RFC for SavedModel Save/Load in 2.x

* Minor edits and a discussion topic for load() with multiple MetaGraphs

* Tweak to the "Imported representations of signatures" section

* Update "Importing existing SavedModels" with the .signatures change

* Update RFC and add review notes

* Status -> accepted

* Update CHARTER.md

New leads.

* Update 20180920-unify-rnn-interface.md (tensorflow#81)

Typo fix.

* Update yyyymmdd-rfc-template.md

Adding "user benefit" section into the RFC template, to encourage articulating the benefit to users in a clear way.

* Update while_v2 design doc (tensorflow#71)

* Update while_v2 design doc, include link to implementation

* Update TF 2.0 FAQ to link to TensorBoard TF 2.0 tutorial (tensorflow#94)

* CLN: update sig addons logo png (tensorflow#99)

* Add SIG Keras

Add a reference link to Keras' governance repository for SIG Keras.

* RFC: String Tensor Unification (tensorflow#91)

* RFC: String Tensor Unification

* Updated rfcs/20190411-string-unification.md

Updated TFLite sections to address feedback from @jdduke.  Marked as
Accepted.

* Start RFC for tensor buffers